How to frequently modify a list of triangles?

imaginaryhuman · May 16, 2011

I have a large list of triangles and it appears you have to upload an array which is exactly the size of the number of triangles you need. You can't just send part of an array. This means if I want to remove some triangles then I have to allocate a new array with the new length and copy over all the triangle data. Isn't there a faster way of removing various triangles from an array and getting the mesh updated without all this allocation/copying?

Antitheory · May 16, 2011

ImaginaryHuman said: ↑

I have a large list of triangles and it appears you have to upload an array which is exactly the size of the number of triangles you need. You can't just send part of an array. This means if I want to remove some triangles then I have to allocate a new array with the new length and copy over all the triangle data. Isn't there a faster way of removing various triangles from an array and getting the mesh updated without all this allocation/copying?

Is it really all that trouble though? It's just an array of integers.

You can work with a List<int> behind the scenes and then just call ToArray() every time you need to put it back into the mesh.
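A minimal sketch of the List<int> approach suggested above, written in plain C# so it compiles outside Unity. `TriangleList` and its members are hypothetical names, not Unity API; the only Unity-specific step would be assigning the result of `ToArray()` to `mesh.triangles`. Note that the swap-remove used here changes triangle order, which matters if draw order is significant (for example with alpha blending).

```csharp
using System.Collections.Generic;

// Hypothetical helper: keeps triangle indices in a List<int> and
// produces the exact-length array a mesh needs on demand.
public sealed class TriangleList
{
    private readonly List<int> indices = new List<int>();

    public int TriangleCount { get { return indices.Count / 3; } }

    public void Add(int a, int b, int c)
    {
        indices.Add(a);
        indices.Add(b);
        indices.Add(c);
    }

    // Removes triangle t by overwriting it with the last triangle
    // (swap-remove), so nothing after it has to shift down.
    public void RemoveAt(int t)
    {
        int src = indices.Count - 3;
        int dst = t * 3;
        for (int i = 0; i < 3; i++) indices[dst + i] = indices[src + i];
        indices.RemoveRange(src, 3);
    }

    // Allocates a new int[] on every call; in Unity the result would be
    // assigned to mesh.triangles.
    public int[] ToArray() { return indices.ToArray(); }
}
```

This keeps removal cheap, but `ToArray()` still allocates and copies every time it is called, which is the cost the original question is about.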

imaginaryhuman · May 16, 2011

No, the integers are not an issue. And this would apply to vertices/normals/UVs as well. Reassigning a whole array of data every frame is the issue.

Like say I have a multi-screen game level with thousands of objects strewn throughout, but I want to remove a few objects, and they're all part of one mesh, would it be better to modify the list and reupload the entire thing or let the GPU cull the out-of-screen triangles?

Beyond that it'd be really helpful to be able to upload only part of an array without having to have the array be the exact size of the mesh.

Eric5h5 · May 16, 2011

ImaginaryHuman said: ↑

Like say I have a multi-screen game level with thousands of objects strewn throughout, but I want to remove a few objects, and they're all part of one mesh, would it be better to modify the list and reupload the entire thing or let the GPU cull the out-of-screen triangles?

Well, I think they shouldn't really all be part of one mesh in the first place. There's such a thing as too much combining....

ImaginaryHuman said: ↑

Beyond that it'd be really helpful to be able to upload only part of an array without having to have the array be the exact size of the mesh.

The mesh has to be uploaded to the GPU, so there's not much you can do about it. The same way you have to call Apply() on a texture and the entire thing is uploaded, even if you just change one pixel.

--Eric

imaginaryhuman · May 16, 2011

This is for a custom sprite/particle system. It is necessary to have thousands of particles on one mesh and to move the vertices of each quad. So I'd like to be able to Remove some particles without having to reallocate a whole new array at the required length. The only other way I can see to do it is to split the system into multiple batches and only reupload the groups that change, which slightly defeats the purpose of gaining speed by putting them all in a one drawcall mesh.

imaginaryhuman · May 16, 2011

Also my other question is, when you set new triangles/vertex data, does it copy the entire contents of the array into some other internal array, or does Unity just reference your array to get its data, ie could I modify the array after I set it initially and have it automatically be live or do you HAVE to set the array again every time you modify something?

Krobill · May 16, 2011

It may be more efficient to 'degenerate' the unused polygons by merging all their vertices into the same position, and to restore them when you need them. It's a kind of geometry pooling system, and it is certainly faster than constantly updating all the data arrays of the Mesh class, because if you change the number of vertices you also have to change normals, UVs, etc...

If you target the PC/Mac platforms though, for a 2D sprite system, manual particle merging has become quite obsolete. Dynamic batching is able to merge thousands of quads on the fly, and if you use a pool system to avoid constant instantiation / destruction of the GameObjects, it's actually easier. You then only have to update meshes on the particular frames / updates where you need to change UVs and quad size if you need texture animations. At least that's what we do here and it works quite well.
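A rough sketch of the 'degenerate quad' pooling idea from this post, in plain C#. `Vec3` is a stand-in for `UnityEngine.Vector3` so the code compiles outside Unity, and `QuadPool`, `Acquire` and `Release` are hypothetical names. It assumes four vertices per quad and a vertex array that is allocated once and never resized.

```csharp
// Stand-in for UnityEngine.Vector3 so the sketch compiles outside Unity.
public struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical pool of quads sharing one fixed-size vertex array.
public sealed class QuadPool
{
    public readonly Vec3[] Vertices;   // 4 vertices per quad, never resized
    private readonly bool[] inUse;

    public QuadPool(int quadCount)
    {
        Vertices = new Vec3[quadCount * 4];
        inUse = new bool[quadCount];
    }

    // Returns the index of a free quad, or -1 when the pool is full.
    public int Acquire()
    {
        for (int q = 0; q < inUse.Length; q++)
        {
            if (!inUse[q]) { inUse[q] = true; return q; }
        }
        return -1;
    }

    // Hides a quad by collapsing its four vertices onto one point, which
    // gives both of its triangles zero area so nothing is rasterized.
    public void Release(int quad)
    {
        int first = quad * 4;
        Vec3 p = Vertices[first];
        for (int i = 1; i < 4; i++) Vertices[first + i] = p;
        inUse[quad] = false;
    }
}
```

The array lengths never change, so normals, UVs and triangles can stay as they are; only the vertex array needs to be reassigned to the mesh after a change. The linear scan in `Acquire` is kept for brevity; a free list would avoid it.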

Krobill · May 16, 2011

ImaginaryHuman said: ↑

Also my other question is, when you set new triangles/vertex data, does it copy the entire contents of the array into some other internal array, or does Unity just reference your array to get its data, ie could I modify the array after I set it initially and have it automatically be live or do you HAVE to set the array again every time you modify something?

No, Unity does not reference your array, and yes, you have to copy the whole thing back to the Unity 'black box' every time you modify it.

imaginaryhuman · May 16, 2011

One thing I notice is that the length of the triangles array can be different from the length of all other vertex arrays. So maybe I could let the vertex data arrays stay as they are and then only upload new triangle arrays each frame? Maybe this is what you were describing as pooling - i.e. don't change the vertex data arrays, maybe just change the triangle array entries for triangles that are disappearing to point to a single `dummy triangle` or something?

How does the dynamic batching work?

Krobill · May 16, 2011

The triangles Array only determines which poly uses which vertices. Most of the time you don't touch that but I guess you could make polys disappear by reassigning which vertices they use... In a dynamic animated 2D system it's usual to have to update all vertices positions on a per frame basis so it's really the vertices Array that you need to change the most. Though for procedural animations you can handle in fact quite a lot with vertex shaders, leave the Mesh data alone, and spare a lot of CPU power.

Dynamic batching is quite a long subject and leads sometimes to heated conversations around here ^^
To make things short, and from our experience only:
GameObjects associated with a simple quad mesh with the same Material will be batched
- unless there are too many of them (it's a looooooot usually)
- unless they mix uniform and non-uniform scaling: uniformly and non-uniformly scaled objects can't be batched together (some are handled by GPU and others by CPU I guess...)
- a very important subtlety concerning alpha-blended objects (which is pretty much every one of them in 2D games ^^): objects can't be batched together if objects with a different material are positioned between them in terms of depth along the camera axis (position of the pivot, not the actual polygons). This is due to the automatic z-sorting of Unity3D to avoid alpha blend rendering artifacts. You have to group your objects using the same materials in depth ranges to maximize dynamic batching.

imaginaryhuman · May 16, 2011

Interesting.

Would you suggest to avoid dynamic batching entirely (since it has some overhead) and just go with your own larger combined mesh with, say, several thousand quads on it, and then `pool` the hidden quads by changing the vertices or triangles?

You kind of made it sound like you can do many many thousands of quads (e.g. 100,000?) with a single dynamic batch so long as you follow some rules... but is it more efficient to do that than to have a few meshes with 16384 vertices each?

imaginaryhuman · May 16, 2011

Also a related question.. if you are putting lots of `sprite` quads on a mesh and moving the vertices `manually` each frame (if they are sprites that move), how do you deal with rotation and scaling? Do you store like an angle and radius from a `handle` coordinate for each vertex and then recalculate all of the vertex coordinates using trig each frame? How fast is that and is there a better way? (thinking of 2d bone joints/hinge)
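One way to picture the manual approach asked about here: instead of storing an angle and radius per vertex, compute one sine and cosine per quad and build the four corners from two rotated half-extent axes. This is only a sketch in plain C#; `QuadMath.WriteCorners`, the flat `float[]` layout and the corner order are assumptions for illustration, and nothing in the thread says this is faster than letting Unity's transforms do the work.

```csharp
using System;

public static class QuadMath
{
    // Writes the four corners of a rotated, scaled quad centred on (cx, cy)
    // into xy starting at offset, as x0,y0,x1,y1,x2,y2,x3,y3.
    // halfW and halfH already include any scale factor.
    public static void WriteCorners(float[] xy, int offset,
        float cx, float cy, float halfW, float halfH, float angleRad)
    {
        float c = (float)Math.Cos(angleRad);
        float s = (float)Math.Sin(angleRad);

        // Rotated half-extent axes of the quad.
        float ux = c * halfW, uy = s * halfW;    // local +X axis
        float vx = -s * halfH, vy = c * halfH;   // local +Y axis

        xy[offset + 0] = cx - ux - vx; xy[offset + 1] = cy - uy - vy; // bottom-left
        xy[offset + 2] = cx + ux - vx; xy[offset + 3] = cy + uy - vy; // bottom-right
        xy[offset + 4] = cx + ux + vx; xy[offset + 5] = cy + uy + vy; // top-right
        xy[offset + 6] = cx - ux + vx; xy[offset + 7] = cy - uy + vy; // top-left
    }
}
```

That is two trig calls and a handful of multiplies and adds per quad, regardless of how the corners are laid out.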

Krobill · May 16, 2011

I certainly recommend to USE dynamic batching, not avoid it ! I proposed a large 'pool' mesh if you wish to stick to the manual merging option but today I wouldn't recommend that. It's indeed a lot easier to handle rotation and scaling through transforms of gameObject rather than to have to recalculate vertices positions for each quad.

imaginaryhuman · May 16, 2011

OK but if we have a batch of say 30 quads on a single mesh in order to reduce draw calls, and let several of those dynamically batch with manual handling of vertex positions for rotation, isn't that better than having hundreds of game objects with only 1 quad each and matrices for rotation?

imaginaryhuman · May 16, 2011

And although it may be easier to do transforms with game objects is it faster?

bigmisterb · May 16, 2011

ImaginaryHuman said: ↑

OK but if we have a batch of say 30 quads on a single mesh in order to reduce draw calls, and let several of those dynamically batch with manual handling of vertex positions for rotation, isn't that better than having hundreds of game objects with only 1 quad each and matrices for rotation?

This all depends on how you set it up. If you had 1 object with 30 quads and adjusted them every frame, then no, I think that would be slower than 30 quad objects adjusting position and rotation.

That specific type of object is built into Unity, it is called a particle emitter.

To rotate a single quad to face the camera manually, you have to compute a look-at rotation and then make 4 more calls along that direction to get each vertex, whereas with a GameObject you only compute one look-at rotation and let Unity's matrix do the rest. Far simpler, far less CPU usage.

Draw calls have nothing to do with cpu time eaten up by your programming, it more has to do with the number of objects compounded by the number of faces compounded again by the number of textures. Optimization of the programming is solely the responsibility of the programmer. I wouldn't see where creating one object and doing 10+ math functions per quad would reduce your draw calls or increase frame rate to any greater or lesser extent.

imaginaryhuman · May 16, 2011

Yah, I'm kinda replacing Unity's particle system with a more advanced one.

I think the only way to find out will be to do some speed tests of my own. As usual, some approaches work better in certain situations, others in a different situation. So I think I will go with several approaches overall and let the user decide, or figure out a way to monitor it in realtime and choose the best method.

I don't think I like the idea of thousands of game objects, though, just for the sake of easier rotation. Doing easier rotations in 2D can probably be optimized faster than the numerous calculations needed for matrices. I also know that generally speaking having a large mesh containing multiple quads is faster on most systems than lots of meshes - which is the whole point behind products like SpriteManager.

Quietus2 · May 16, 2011

BigMisterB said: ↑

Draw calls have nothing to do with cpu time eaten up by your programming, it more has to do with the number of objects compounded by the number of faces compounded again by the number of textures. Optimization of the programming is solely the responsibility of the programmer. I wouldn't see where creating one object and doing 10+ math functions per quad would reduce your draw calls or increase frame rate to any greater or lesser extent.

That's not really true. A draw call is inherently a cpu bound transaction, as it's the cpu sending data to the gpu. The gpu is really good at processing large batches of triangles, but is slow changing internal states such as switching materials.

Processing everything as one large triangle strip with a single material you end up with a single draw call. Same thing with deferred lighting, where you trade lower drawcalls for an increase in fillrate.

ImaginaryHuman: I ran into a similar situation myself, and considered pre-staging the mesh data with a linked list for blazing fast add/removal of vertices at arbitrary positions and only a single array copy. That is, until I realized how terribly linked lists of structs work in C# as opposed to ANSI C. It was one of those times where I wanted to punch managed applications in the face.

Krobill · May 16, 2011

I may be mistaken but SpriteManager2 doesn't rely as much on 'manual' merging as version 1 did. I think you still have the option to do so but SpriteManager does take advantage of dynamic batching. The bigger part of SpriteManager is handling texture animations and atlases creation...

When you say you don't like being forced to use entire GameObjects to handle 2D sprites (because obviously they are much more complex than what's needed), you have to remember one thing: all the code that you'll produce is managed code, with noticeably inferior performance compared to native C++ code. When you use Unity's GameObjects you rely much more on native code than on what you have to implement in the managed part of your application. So even if what Unity is doing is more complex than what you do, it'll perform better.

You also have to keep in mind that if you want to implement a 2D system of alpha-blended sprites merged into a single mesh, you have to ensure that your triangles are declared in back-to-front order. If your quads are moving along the depth axis, you have to modify the order constantly. This kind of sorting will badly hurt your performance for large collections of quads, whereas Unity can do it for you for a fraction of the cost if you use GameObjects. Choosing a solution somewhere in between, with small batches of quads, forbids you from 'intersecting' those groups along the depth axis if they are alpha blended...

I strongly encourage you to run some performance tests to see what the best solution is for your needs.
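To make the back-to-front requirement concrete, here is a sketch of what a manually merged mesh would have to do whenever depths change: sort the quads by distance and rewrite the index array in that order. It is plain C#; `QuadSorter.BuildBackToFront` is a hypothetical name, and the sketch assumes four vertices per quad, `order.Length == depth.Length`, `triangles.Length >= depth.Length * 6`, and a winding of (v, v+1, v+2) and (v, v+2, v+3).

```csharp
using System;

public static class QuadSorter
{
    // Rebuilds the index array so quads are emitted farthest first.
    // depth[q] is the distance of quad q from the camera (larger = farther).
    // order and triangles are caller-owned buffers that are reused each call.
    public static void BuildBackToFront(float[] depth, int[] order, int[] triangles)
    {
        int n = depth.Length;
        for (int q = 0; q < n; q++) order[q] = q;

        // Farthest first. Array.Sort is not stable, so quads at exactly
        // the same depth may end up in either order.
        Array.Sort(order, (a, b) => depth[b].CompareTo(depth[a]));

        for (int i = 0; i < n; i++)
        {
            int v = order[i] * 4;   // first vertex of this quad
            int t = i * 6;          // first index slot for this quad
            triangles[t + 0] = v;
            triangles[t + 1] = v + 1;
            triangles[t + 2] = v + 2;
            triangles[t + 3] = v;
            triangles[t + 4] = v + 2;
            triangles[t + 5] = v + 3;
        }
    }
}
```

The sort is O(n log n) in managed code on every frame where depths change, which is the cost being warned about above. In Unity the resulting array would then be assigned to `mesh.triangles`.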

imaginaryhuman · May 16, 2011

Yah, you have a point about the game objects, but front-to-back movement won't be much of a common thing for 2D. I've also heard some other people say that compiled script code can be near to native C++ speed?

I will have to do some tests to confirm what works the fastest I guess.

Krobill · May 16, 2011

even if they don't move, the simple act of placing a new quad between existing ones (you have to handle z-ordering at least a bit) will force you to rearrange the order. And think 2.5D, not that uncommon! ^^

I don't know how Mono works exactly but .NET on which it is based is using JIT compilation. It is faster than older interpreted environments but doesn't quite match native code speed. To be fair, advanced use of JIT can lead to better performances than native c++ code in some special situations but I am not even sure you have access to these tricks with Mono...

imaginaryhuman · May 16, 2011

Ok point taken about 2.5D. So in your personal system you're using right now you just generate hundreds of game objects with one quad on each?

Krobill · May 16, 2011

Yup, that's what we do. Though we do not instantiate and destroy GameObjects all the time; we use a pool and enable / disable them on demand. We update UVs only when the animated texture requires it, and vertex positions only if the new 'frame' of the sprite has a different size from the previous one. The key to good performance is to keep the number of accesses (get / set) to Unity's unmanaged code to a minimum.

imaginaryhuman · May 17, 2011

Sounds good. Thanks for the tips.

Alesk · Jun 7, 2011

Hi,

I'm also working on a custom particle system and I've encountered something weird while updating my single mesh displaying 16000 particles.
Here is a benchmark of each assignment step:

mesh.vertices = vertexArray -> 1.49 ms
mesh.uv = uvArray -> 0.46 ms
mesh.colors = colorArray -> 5.20 ms
mesh.triangles = triangleArray -> 2.80 ms

How do you explain such a long time for colors?

tomnullpointer · Jun 7, 2011

Just out of interest, I had a similar issue and I ended up using this.

Code (csharp):

// Resize arrays to the number of vertices actually produced for this iteration.
System.Array.Resize(ref m_Vertices, m_TotalVertexes);
System.Array.Resize(ref m_TriangleIndexes, m_TotalVertexes);
System.Array.Resize(ref m_Normals, m_TotalVertexes);

In my situation I'm making a huge max-size array, then filling it procedurally and using the above code to clip it to the actual length I've filled in during the production process.

I also wrote a custom particle (sprite decal) system and batched my own etc. It was faster than Unity but not significantly more so than having loads of GameObjects. In my version I had a max-length array to account for x number of particles (4 vertexes each etc). My manager then just found free slots, assigned the appropriate vectors and reassigned the whole list to a Unity mesh filter. There was loads of redundancy, since with few particles I was essentially drawing the majority of the particles offscreen, but it did make for a less spammy GameObject situation.

imaginaryhuman · Jun 8, 2011

Interesting, thanks for the performance results.

My concern about resizing arrays is that it has to be done every frame and that basically usually works by a) allocating new memory for the array and b) copying the content of the old array to the new one - and that can be a significant overhead. That's what makes it prohibitive.

Alesk · Mar 30, 2012

ImaginaryHuman said: ↑

Interesting, thanks for the performance results.

My concern about resizing arrays is that it has to be done every frame and that basically usually works by a) allocating new memory for the array and b) copying the content of the old array to the new one - and that can be a significant overhead. That's what makes it prohibitive.

My solution to that is quite "simple": I've made multiple static arrays with different lengths, and I fill the most appropriate one with my data before assigning it to the mesh.
When more entries are available than I need, I set the unused ones at the end to 0 so there are no ghost triangles.
Of course it uses more memory, but it's quite acceptable in this case.
Rebuilding an array for 16000 particles (so 64000 vertices) takes nearly 9 ms on my CPU (Q6600); with my solution it's always 0 ms.
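A sketch of how the preallocated-array idea might look for the index array, in plain C#. `IndexBuckets`, `Pick` and `ClearTail` are hypothetical names and the bucket sizes are up to the caller; the sketch assumes sizes are passed in ascending order and are multiples of 3, so that the zeroed tail forms whole (0, 0, 0) triangles, which have no area.

```csharp
using System;

// Hypothetical set of index arrays allocated once and reused every frame.
public sealed class IndexBuckets
{
    private readonly int[][] buckets;

    // sizes must be ascending and multiples of 3.
    public IndexBuckets(params int[] sizes)
    {
        buckets = new int[sizes.Length][];
        for (int i = 0; i < sizes.Length; i++) buckets[i] = new int[sizes[i]];
    }

    // Returns the smallest preallocated array that can hold 'used' indices,
    // or null if even the largest one is too small.
    public int[] Pick(int used)
    {
        foreach (int[] b in buckets)
        {
            if (b.Length >= used) return b;
        }
        return null;
    }

    // Zeroes every index from 'used' to the end of the array so leftover
    // entries from a previous frame do not show up as ghost triangles.
    public static void ClearTail(int[] indices, int used)
    {
        Array.Clear(indices, used, indices.Length - used);
    }
}
```

Each frame the caller would pick a bucket, write the live indices at the front, clear the tail, and assign the array to the mesh. No allocation happens after start-up, at the price of the extra memory mentioned above.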

You can see my first result here : http://www.alesk.fr/demo/smoke/ (edit : link leads now to the last update of this demo)

As you can see the assignation part is decreasing with the particle count, but the arrays init time is always 0, since I only update prebuilt arrays

By the way, I'm interested in screenshots of the numbers displayed while 16000 particles are shown, along with your CPU + GPU details of course.

imaginaryhuman · Jun 8, 2011

Yea, see, the allocation overhead is a lot... good idea to just waste some space with padding standard-size arrays rather than do a resize.

I got 54fps on ATI 1600 iMac from 2006 with all particles on-screen fullscreen.

Alesk · Jun 8, 2011

Good !

I'll now try to split the sorting over multiple frames.

Anyway, does anyone have an explanation for the high time cost of assigning the colors array?

imaginaryhuman · Jun 9, 2011

One of the big bottlenecks is that a) you have to update your own arrays with new vertices/data, b) set the new arrays which copies them to Unity's internal array, and then c) upload them to the vertex buffers etc. So there's like 3 copies being done.

Also I've been finding that using a Color[] array for texture pixels is really silly.. I mean, who needs a 4-byte float to store a 1-byte color resolution? which ends up being 16 bytes per vertex, or even 64 bytes for a single quad. Unity really needs a function to take a byte array as input for these kind of functions.

In OpenGL in another language it's possible to draw about 5 screens full of pixels directly over the graphics bus per frame, and yet in Unity I could only get it to do about 20% of that amount.

Alesk · Mar 30, 2012

ImaginaryHuman said: ↑

One of the big bottlenecks is that a) you have to update your own arrays with new vertices/data, b) set the new arrays which copies them to Unity's internal array, and then c) upload them to the vertex buffers etc. So there's like 3 copies being done..

Yeah, I'm aware of that, but compared to the UV time (0.5ms) I don't understand why it takes so long to copy the colors (2.5ms). UVs are 2 floats where colors are 4 floats, so that's basically twice the amount of data. So why is the update time 5 times longer when it should be only 2?

ImaginaryHuman said: ↑

Also I've been finding that using a Color[] array for texture pixels is really silly.. I mean, who needs a 4-byte float to store a 1-byte color resolution? which ends up being 16 bytes per vertex, or even 64 bytes for a single quad. Unity really needs a function to take a byte array as input for these kind of functions.

Maybe that's because they treat the data in a HDR way internally... not sure about that.
But then the color values should not be clamped to 1f... So I totally agree with you, having a 4-byte Color object to define RGBA values would be much faster than the 16-byte object we have now.
Unless this implies the engine has to convert it back to floats internally to keep working; I'm not sure whether that would eat much time or not.

Maybe it's time to put this in the features request list
I think I'll also ask for a Mesh.Build() function, to let us control when the data should be sent to the gpu.

Noisecrime · Jun 9, 2011

ImaginaryHuman said: ↑

Unity really needs a function to take a byte array as input for these kind of functions..

Completely agree, would also go a long way to help improve Unity native texture updating too, since unless you happen to be using floats, you go through byte>float>byte conversions before it even uploads the data to the gpu.

imaginaryhuman · Jun 10, 2011

Yup. Colors really are not a good way to store a texture, not to mention the amount of wasted memory storing 16 bytes for 1 pixel.

I don't know what Unity's graphics engine is doing under the hood but maybe they're sending the floats to the gpu for colors and letting the graphics driver convert the format, but either way it would be way better to have a simple RGBA 8-bit color format.

Noisecrime · Jun 10, 2011

ImaginaryHuman said: ↑

Yup. Colors really are not a good way to store a texture, not to mention the amount of wasted memory storing 16 bytes for 1 pixel.

I don't know what Unity's graphics engine is doing under the hood but maybe they're sending the floats to the gpu for colors and letting the graphics driver convert the format, but either way it would be way better to have a simple RGBA 8-bit color format.

As far as I know (with regards to opengl at least) colour data is always in bytes and i don't see any good reason why they'd allow floats, since as you say its massively wasteful on memory.

BTW I added a request to the wishlist some time ago, though i'm not sure how much attention or influence the wishlist forum has on future developments.

As you said the simple solution is to just allow support for bytearrays.

Eric5h5 · Jul 26, 2011

And hey, now Unity 3.4 has Color32 (along with Get/SetPixels32), so problem solved.

--Eric

imaginaryhuman · Jul 27, 2011

Excellent news! Thanks for the tip-off.

Have you tested it yet, is it faster?

imaginaryhuman · Jul 27, 2011

In my comparison I make it about 20ms to upload a 512x512 Color array, versus about 13ms to upload a Color32 array. It's certainly still not as fast as it could be - it should be 3-4 times faster, but it's a welcome improvement.

Eric5h5 · Jul 27, 2011

ImaginaryHuman said: ↑

Have you tested it yet, is it faster?

Yep, it's over twice as fast for reading and more like 5X faster for writing.

--Eric

Eric5h5 · Jul 27, 2011

ImaginaryHuman said: ↑

In my comparison I make it about 20ms to upload a 512x512 Color array, versus about 13ms to upload a Color32 array. It's certainly still not as fast as it could be - it should be 3-4 times faster, but it's a welcome improvement.

What system do you have? I'm definitely getting a 550% speed improvement with SetPixels32 compared to SetPixels on a 1024x1024 texture.

--Eric

hippocoder · Jul 27, 2011

It depends on the purpose too; for example, scaling breaks dynamic batching.

imaginaryhuman · Jul 27, 2011

I only tested it in the editor, that's probably why.

imaginaryhuman · Jul 27, 2011

I get 55 fps for 512x512 SetPixels + Apply, versus 80 fps with SetPixels32.

I get 88 vs 193 fps for grab.

ATI X1600 on a 2006 iMac.

Eric5h5 · Jul 27, 2011

I tested in the editor too, Radeon 5870 on a 2010 Mac Pro. I'm not sure about that testing methodology though. I timed a loop of 100 SetPixel calls using Time.realtimeSinceStartup; using frames per second would seem to involve too many other factors.

--Eric

Noisecrime · Jul 27, 2011

I suspect the testing methodologies employed by you two are measuring different things.

I'd concur with Eric5h5 in terms of how much faster the copy/assignment has become. However, I suspect what ImaginaryHuman is seeing and testing includes the time taken to upload the data to the GPU. I think testing SetPixels in a tight loop probably means Unity isn't uploading to the card.
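One way to keep the two measurements apart is to time each step on its own rather than reading frames per second. The helper below is a plain C# sketch using `System.Diagnostics.Stopwatch`; `Bench.AverageMs` is a hypothetical name, and Eric5h5's own test used `Time.realtimeSinceStartup` rather than this.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs 'work' once as a warm-up, then 'iterations' more times, and
    // returns the average wall-clock milliseconds per timed call.
    public static double AverageMs(Action work, int iterations)
    {
        work(); // warm-up so one-time costs such as JIT compilation are excluded

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) work();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

In Unity, passing a delegate that only calls `SetPixels32` would measure the managed-to-native copy, while a delegate that calls `SetPixels32` followed by `Apply()` would also include whatever work `Apply()` does on the CPU side. Any GPU work that completes asynchronously afterwards would not be captured by either.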

Eric5h5 · Jul 27, 2011

Noisecrime said: ↑

I think testing SetPixels in a tightloop, probably means Unity isn't uploading to the card.

Indeed, it only does that when you use Apply().

--Eric

imaginaryhuman · Jul 27, 2011

Well, of course it has to include the Apply. With the Apply I see no more than a 2x speedup, but it's good to know that SetPixels32 itself is several times faster. It's just that the Apply is no faster. And part of that problem is still the fact that you have to write your changes to memory, use SetPixels to copy that memory into Unity's internal storage and then upload it over the graphics bus. If we could simply skip SetPixels32 altogether and just transfer from your own Color32 buffer straight to graphics memory, that would be a big improvement.

Eric5h5 · Jul 27, 2011

Apply() isn't going to be any faster than before; it doesn't have anything to do with SetPixels vs. SetPixels32 speed. It's a separate topic from what's been discussed in this thread... speaking of which, unfortunately using a Color32 array for mesh colors doesn't currently work.

--Eric

imaginaryhuman · Aug 22, 2011

What would really be a great addition to Unity (and maybe it does this already, but I have no idea?) is to offer an `Apply` which cleverly understands which `dirty rects` I've modified using SetPixel/SetPixels and only uploads the portions of a texture I've modified to the gfx card when you do an Apply, instead of uploading the entire texture again. And the same should go for any copying from memory to the readable version of the texture, it should only copy what's changed. That would make it so much more flexible for doing things like spooling graphics over the course of time where you don't want to upload an entire texture all at once.

imaginaryhuman · Sep 29, 2011

Another great addition to Unity would be the ability to upload to only a small part of a texture using something like ApplyRect() so it only uploads the parts you actually changed with Setpixels (based on dirty rects algorithm?) rather than having to upload the entire texture after a small change.
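For reference, the bookkeeping half of the dirty-rect idea is small. The sketch below, in plain C#, only tracks the bounding rectangle of modified pixels; `DirtyRect` is a hypothetical class, and the partial upload itself (the `ApplyRect()` wished for above) is not something this thread says Unity provides.

```csharp
// Hypothetical tracker for the bounding rectangle of modified pixels.
public sealed class DirtyRect
{
    private int minX, minY, maxX, maxY;   // inclusive bounds

    public bool IsEmpty { get; private set; }

    public DirtyRect() { Clear(); }

    public int X { get { return minX; } }
    public int Y { get { return minY; } }
    public int Width { get { return IsEmpty ? 0 : maxX - minX + 1; } }
    public int Height { get { return IsEmpty ? 0 : maxY - minY + 1; } }

    // Call after an upload to start tracking from scratch.
    public void Clear()
    {
        IsEmpty = true;
        minX = minY = maxX = maxY = 0;
    }

    // Grows the rectangle to include pixel (x, y).
    public void Mark(int x, int y)
    {
        if (IsEmpty)
        {
            minX = maxX = x;
            minY = maxY = y;
            IsEmpty = false;
            return;
        }
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
    }
}
```

A single bounding rectangle is the simplest variant: two small edits in opposite corners grow it to cover the whole texture, so a real implementation might keep several rectangles or a tile grid instead.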
