Unity 5.6 GPU Instancing.

Discussion in 'General Graphics' started by flapyfox, Mar 24, 2017.

  1. flapyfox

    flapyfox

    Joined:
    Sep 9, 2008
    Posts:
    408
    Just saw this:
    https://blogs.unity3d.com/2016/12/13/unity-5-6-beta-is-now-available/

    "The new DrawMeshInstancedIndirect function allows you to draw many instances of the same mesh using an instanced shader with arguments supplied from a ComputeBuffer. This new way of rendering instances via script has almost no CPU overhead."

    What does that mean? If I have a cube composed of 12 triangles and instance it 100 times, does that mean it only counts as 12 triangles, like instancing in an offline renderer?

    And how does this affect RAM? Can I instance 100 cubes on an iPhone, for example, and have them count as only 1 cube in RAM?
     
  2. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,338
    Instancing has been in since Unity 5.4. As far as the GPU is concerned it's still rendering the same number of polygons, and in some ways instancing is a little slower, so rendering 100 cubes using instancing will be about the same or slightly slower than a single mesh made of 100 cubes, but it's still 1200 triangles. However it can save a ton of time on the CPU.

    To understand this you need to understand what the CPU does for rendering.

    In the most basic case of rendering 100 cubes, the actual cube only exists in memory once and the CPU tells the GPU:
    "render this one cube at this location"
    "render this one cube at this location"
    "render this one cube at this location"
    etc. 100 times. That takes a while on the CPU, and can even sometimes mean the GPU is idle waiting for the CPU to tell it what to do next as it's possible for it to finish rendering one cube faster than the CPU can say what to do next. In this case there's only the one cube mesh and basically just the position is being updated over and over.

    Batching, which Unity uses by default, combines those 100 cubes into a single big mesh on the CPU and then tells the GPU to render that big mesh all at once. Static batching and dynamic batching differ here slightly, but generally it's faster to dynamically generate a single big mesh of those 100 cubes and then only have to tell the GPU to render it once than having to do each cube individually. This uses more memory as there's now potentially a new 1200 poly mesh being created on the CPU and uploaded to the GPU every frame.

    Instancing works by having the CPU create a list of positions and then telling the GPU: "render this one cube at these 100 locations". Only one cube mesh is in memory, and it's cheaper than building the big giant mesh on the CPU. On the GPU this ends up being potentially slightly slower than the single big mesh, because as far as the GPU is concerned it's still drawing 100 individual cubes, just like the first method. The difference is that now it doesn't have to wait for the CPU.
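    As a rough illustration of the three approaches described above (individual draws, batching, instancing), here is a small Python bookkeeping sketch. It is not Unity code and measures nothing; the `FrameCost` fields and function names are hypothetical, and it only counts what each approach asks of the CPU, the GPU, and mesh memory for a 12-triangle cube.

```python
from dataclasses import dataclass

TRIANGLES_PER_CUBE = 12


@dataclass
class FrameCost:
    draw_commands: int        # CPU -> GPU "render this" commands per frame
    triangles_drawn: int      # triangles the GPU still has to draw
    triangles_in_memory: int  # mesh triangles that must be stored
    transforms_uploaded: int  # per-instance positions sent as a list


def individual_draws(n: int) -> FrameCost:
    # One command per cube; the single cube mesh is reused every time.
    return FrameCost(n, n * TRIANGLES_PER_CUBE, TRIANGLES_PER_CUBE, 0)


def batched(n: int) -> FrameCost:
    # The CPU builds one big mesh (source cube + combined copy), then one command.
    combined = n * TRIANGLES_PER_CUBE
    return FrameCost(1, combined, TRIANGLES_PER_CUBE + combined, 0)


def instanced(n: int) -> FrameCost:
    # One command plus a list of n positions; only one cube mesh in memory.
    return FrameCost(1, n * TRIANGLES_PER_CUBE, TRIANGLES_PER_CUBE, n)


if __name__ == "__main__":
    for name, fn in [("individual", individual_draws),
                     ("batched", batched),
                     ("instanced", instanced)]:
        print(name, fn(100))
```

    In every case the GPU draws 1200 triangles for 100 cubes; what changes is how many commands the CPU issues and how much mesh data exists.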

    It's also possible to have the GPU calculate those positions in a compute shader, however the CPU still has to say how many to render.

    The new implementation released with 5.6 differs from what's been available since 5.4 in that the compute shader can also determine the number of instances to draw, whereas before the CPU always had to say how many. That's really the only change. It does mean the CPU just has to tell the GPU: "render this one cube however many times and wherever these compute buffers say", which is about as little work as possible for the CPU.
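    To make "the compute buffers say how many" a little more concrete, here is a Python sketch of an indirect-draw argument buffer. The five-unsigned-integer layout (index count per instance, instance count, start index, base vertex, start instance) is my understanding of what Unity documents for DrawMeshInstancedIndirect, so treat it as an assumption and check the scripting reference. The helper names are hypothetical, and the "GPU write" is only simulated on the CPU here.

```python
import struct

# Assumed layout: five little-endian 32-bit unsigned ints.
ARGS_FORMAT = "<5I"


def pack_indirect_args(index_count_per_instance: int,
                       instance_count: int,
                       start_index: int = 0,
                       base_vertex: int = 0,
                       start_instance: int = 0) -> bytes:
    """Build the raw bytes that would be uploaded once to the args buffer."""
    return struct.pack(ARGS_FORMAT, index_count_per_instance, instance_count,
                       start_index, base_vertex, start_instance)


def overwrite_instance_count(args: bytes, new_count: int) -> bytes:
    """Stand-in for a compute shader rewriting the second uint on the GPU."""
    buf = bytearray(args)
    struct.pack_into("<I", buf, 4, new_count)  # byte offset 4 = second uint
    return bytes(buf)


if __name__ == "__main__":
    # A 12-triangle cube has 36 indices.
    args = pack_indirect_args(index_count_per_instance=36, instance_count=100)
    args = overwrite_instance_count(args, 250)
    print(struct.unpack(ARGS_FORMAT, args))
```

    The point of the design is that the CPU never needs to read the instance count back: it issues the same single draw command regardless of what the GPU wrote.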
     
  3. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    3,020
    The huge difference between dynamic batching, automatic instancing (Unity 5.4), and API-based instancing (Unity 5.5/5.6) is that API-based instancing forces the instancing to actually work reliably. This means the performance is very predictable, with few draw calls and high frame rates.

    The dynamic batching and automatic instancing options are pretty hit or miss. With Unity 5.4's automatic instancing, I could place thousands of identical objects in the scene and the automatic instancing would only group 2-7 units together instead of hundreds or a thousand. This resulted in a large number of draw calls and a relatively low frame rate when dealing with thousands of laser blasts.

    In my game, I got a 6X speed boost to the frame rate by using the API based instancing in Unity 5.5 instead of the automatic instancing in Unity 5.4 or the dynamic batching. With the API based instancing in Unity 5.5, I can force Unity to instance hundreds of identical units together in a single draw call. I render thousands of laser blasts in my game with just a handful of total draw calls. In my case, static batching was never an option, because the objects were all moving.

    I have not experimented yet with the API instancing in Unity 5.6 vs Unity 5.5, but I have heard Unity 5.6 could offer even more performance in this regard.
     
  4. flapyfox

    flapyfox

    Joined:
    Sep 9, 2008
    Posts:
    408
    Regarding "it doesn't have to wait for the CPU": is this a mistake? Did you mean that it has to wait for the CPU (because it's instancing)?
     
  5. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,338
    No, that is not a mistake. Technically they all have to wait some amount, but:

    Drawing 100 cubes without instancing or batching, the GPU has to wait for the CPU before drawing each individual cube.

    Drawing 100 cubes with dynamic batching, the GPU waits for the CPU to build and upload the batched mesh each frame, but renders all "100 cubes" at once.

    Drawing 100 cubes with static batching is similar to dynamic batching in that the GPU effectively renders all "100 cubes" at once, but the mesh isn't regenerated every frame.

    Drawing 100 cubes with traditional instancing, the GPU waits for the CPU to give it a list of 100 positions and then renders each cube individually, but doesn't have to wait for the CPU between cubes.

    In all of these cases but static batching the CPU is calculating the positions of the cubes every frame. The CPU is also deciding that there should be 100 cubes.

    Drawing 100 cubes with the new DrawMeshInstancedIndirect in 5.6 means the CPU passes some parameters to a compute shader and tells the GPU "render this mesh wherever the compute buffers tell you to render". The GPU then does all of the work deciding the positions and number to render.

    Drawing 100 cubes using DrawProceduralIndirect is similar to the new 5.6 stuff in that you're deciding where and how many cubes to render fully on the GPU, but by generating a single mesh with all of those cubes in it, like a GPU version of dynamic batching. You also have to pass the vertex data you need to the compute shader manually.

    The main idea is to reduce the amount of work the CPU has to do, and the amount of data it has to pass to the GPU, each frame.
     
  6. flapyfox

    flapyfox

    Joined:
    Sep 9, 2008
    Posts:
    408
    And can those 100 instanced cubes each have their own place in the lightmap?
    I am thinking of the interior of a building with, let's say, 20 staircases, where everything is lightmapped. How would I benefit from the new instancing in that case?
     
  7. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,338
    No, you would see no benefit from this. Instanced geometry is for dynamic geometry, which means it can't be lightmapped by Unity's lightmapper, and the new instancing stuff is for drawing / instancing geometry manually from script.
     
  8. FiveFingerStudios

    FiveFingerStudios

    Joined:
    Apr 22, 2016
    Posts:
    510
    I'm curious: how exactly do I take advantage of this?

    I'm working on a zombie game and want to get as many of them on the screen as possible. I'm currently creating 50 of them and disabling them until I need them active. At that time, I enable them and move them to the correct location on the map.

    I'm currently able to show about 40 - 45 zombies on the screen at once, but anything beyond that and I get performance issues. I know that them being animated causes an issue (which is another topic). But as far as forcing Unity to perform GPU instancing... how is this done? I've been looking for an example, but haven't been able to find one anywhere.
     
  9. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,338
  10. FiveFingerStudios

    FiveFingerStudios

    Joined:
    Apr 22, 2016
    Posts:
    510
    Thanks for the link. That video is very impressive!

    Unfortunately, I'm too much of a noob to be able to code something like that. What do you think of this asset in the store?

    https://www.assetstore.unity3d.com/en/#!/content/26009

    It seems to be able to do exactly what we are discussing. It seems to be trading CPU/GPU cycles for memory, which, if you can spare it, is a great benefit.
     
  11. bgolus

    bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,338
    That asset works by baking animations into multiple static meshes and playing back animation by just swapping which mesh is being used. It's instancing friendly if you keep the number of animations and the framerate low and are drawing a ton of units, like many thousands (basically you need several times more units than there are unique frames of commonly used animation, so that each baked mesh is shared by many units).
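    The "more units than unique frames" rule of thumb can be sketched in Python. This is only bookkeeping, not how the asset works internally: units showing the same baked mesh can share one instanced draw, so the average batch size is units divided by distinct meshes in use. The 20-frame figure and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def instanced_batches(unit_meshes: Sequence[Hashable]) -> Counter:
    """Count units per baked mesh; each distinct mesh needs its own draw."""
    return Counter(unit_meshes)


def average_batch_size(unit_meshes: Sequence[Hashable]) -> float:
    """Average number of units sharing one instanced draw."""
    batches = instanced_batches(unit_meshes)
    if not batches:
        return 0.0
    return len(unit_meshes) / len(batches)


if __name__ == "__main__":
    unique_frames = 20  # hypothetical: e.g. 2 animations x 10 baked frames
    few_units = [i % unique_frames for i in range(40)]
    many_units = [i % unique_frames for i in range(4000)]
    print(average_batch_size(few_units))   # small batches, little benefit
    print(average_batch_size(many_units))  # large batches, instancing pays off
```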

    That said, it's a perfectly valid technique to use even without instancing, and I've seen games with 15,000+ units using that technique. For instance, I suspect the Total War series has been doing something like this for nearly 2 decades, though it also looked like they combined groups of units into single meshes.
     
  12. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    3,020
    You could use that asset to bake characters and a number of preset poses for each of those characters. Then you could send those through the GPU instancing to support thousands of units in the scene. This approach allows for massive scalability. The only downside is that you won't get completely smooth animations like you are used to seeing with skinned meshes.

    Also, to really work with this, you will need to stop using a game object for each unit. Instead, you will want to keep an array of a custom struct that maintains all of the information about the various units in the scene. The struct would include position, rotation, animation frame (since you will need to manually manage that), and hit points (and anything else). Then you would create a single game object that would hold your C# class that manages that information. Each frame, your C# class will need to enumerate all of the units and push that information out to the GPU Instancing API. You may want to create a separate C# class for each type of unit. And if you want to get really slick, you should add some code to make sure units standing near each other do not play the same animation frame/pose at the same time, since that will look quirky to a user.
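    A language-neutral sketch of that manager, written in Python rather than Unity C# so it can run on its own. Every name here is hypothetical. It keeps a flat list of unit records, staggers starting frames so neighbours differ, and each frame groups live units by baked frame into chunks. The 1023 chunk size is my understanding of the per-call limit of Unity's Graphics.DrawMeshInstanced; treat it as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAX_INSTANCES_PER_CALL = 1023  # assumed per-call limit; verify for your version


@dataclass
class Unit:
    position: Tuple[float, float, float]
    rotation_y: float
    animation_frame: int
    hit_points: int


def spawn_units(count: int, frame_count: int) -> List[Unit]:
    # Stagger starting frames so adjacent units do not share a pose.
    return [Unit((float(i), 0.0, 0.0), 0.0, i % frame_count, 100)
            for i in range(count)]


def advance_frames(units: List[Unit], frame_count: int) -> None:
    # Animation is managed manually: step every unit to its next baked frame.
    for unit in units:
        unit.animation_frame = (unit.animation_frame + 1) % frame_count


def build_draw_calls(units: List[Unit],
                     max_per_call: int = MAX_INSTANCES_PER_CALL
                     ) -> List[Tuple[int, List[Unit]]]:
    """Group live units by baked frame, then split groups into chunks."""
    by_frame: Dict[int, List[Unit]] = {}
    for unit in units:
        if unit.hit_points > 0:
            by_frame.setdefault(unit.animation_frame, []).append(unit)
    calls = []
    for frame in sorted(by_frame):
        group = by_frame[frame]
        for start in range(0, len(group), max_per_call):
            calls.append((frame, group[start:start + max_per_call]))
    return calls
```

    In a real project each `(frame, chunk)` pair would become one instanced draw of that frame's baked mesh, with the chunk's positions and rotations converted to matrices.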

    One hybrid solution would be to use the GPU instancing tricks I described to create a massive zombie horde, but only use that for zombies in the distance or around the edges of the scene. Use normal game objects and skinned meshes for the dozen zombies closest to the player. This approach would let you have a massive number of zombies (literally thousands in view) and have the better looking skinned mesh animations shown to the player in the nearest zombies. You could develop a custom object pool that dynamically switched zombies between the rigid static mesh poses (in GPU Instancing) and the more fluid looking skinned meshes in the nearby zombies. Players would love it, because a player's mind will tell them that all of the zombies share the attributes of the ones nearby. Think of it as LOD applied to animations.
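    The selection step of that hybrid approach can be sketched as follows, again in plain Python with hypothetical names. It sorts zombies by distance to the player, gives the nearest dozen (the budget mentioned above) the skinned path, and leaves the rest for instanced baked poses. The pooling and the visual transition between the two are not shown.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def split_by_distance(player: Vec3,
                      zombies: Sequence[Vec3],
                      skinned_budget: int = 12
                      ) -> Tuple[List[int], List[int]]:
    """Return (skinned_indices, instanced_indices), nearest first."""
    order = sorted(range(len(zombies)),
                   key=lambda i: math.dist(player, zombies[i]))
    return order[:skinned_budget], order[skinned_budget:]


if __name__ == "__main__":
    player = (0.0, 0.0, 0.0)
    horde = [(float(x), 0.0, 0.0) for x in range(1, 31)]
    skinned, instanced = split_by_distance(player, horde)
    print(len(skinned), "skinned,", len(instanced), "instanced")
```

    Re-running the split every frame could make zombies near the boundary flip back and forth, so a real implementation would probably add some hysteresis or re-evaluate less often.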
     
  13. FiveFingerStudios

    FiveFingerStudios

    Joined:
    Apr 22, 2016
    Posts:
    510
    Thanks for the help guys.

    I will use the MeshAnimator asset to do exactly that. I was planning on using it for only distant zombies, but will try to do the hybrid solution. I'm hoping I can just code the transition between normal animation and the GPU instancing so that it's seamless.

    Ahh... I played Total War back in the day. At that time I didn't have a clue how they got so many units on the screen, but I appreciated the massive numbers.