CPU/GPU Skinning

Discussion in 'Editor & General Support' started by ChrisWalsh, Sep 6, 2010.

  1. ChrisWalsh

    ChrisWalsh

    Joined:
    Apr 20, 2010
    Posts:
    47
    Hi,
    I just added some skinned meshes to my scene and was wondering about the skinning (especially in Unity 3), and what exactly is being done on the CPU/GPU...

    As far as I know there are 3 parts to rendering a skinned animation (assuming this is just 1 anim, and the renderer is interpolating between keyframes on that 1 anim):

    1. Blending of two keyframes' animation matrices to get a third, interpolated set of matrices, using quaternions to blend between the poses.

    2. Creating the matrix palette (by traversing the hierarchy of blended matrices to find final matrices).

    3. Creating the object/world-space verts by doing weighted multiplies of each vert by the matrices in the matrix palette. If lighting, also skin the normals. If normal mapped, also skin the tangents.
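    For illustration only (not necessarily how any engine implements it), step 3 for a single vertex might look roughly like this in C#, given a precomputed matrix palette and per-vertex bone indices/weights:

    Code (csharp):
    using UnityEngine;

    public static class SkinningSketch
    {
        // Linear-blend skinning of one vertex (step 3). 'palette' is assumed to already
        // hold boneToWorld * bindPose per bone, i.e. the result of steps 1 and 2.
        public static Vector3 SkinVertex(Vector3 v, int[] boneIndices, float[] weights, Matrix4x4[] palette)
        {
            Vector3 result = Vector3.zero;
            for (int i = 0; i < boneIndices.Length; i++)
            {
                // Weighted sum of the vertex transformed by each influencing bone's matrix.
                result += weights[i] * palette[boneIndices[i]].MultiplyPoint3x4(v);
            }
            return result; // Normals/tangents are skinned the same way, but as directions.
        }
    }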

    When referring to 'skinning is done on the CPU'... am I right in thinking this is just parts 1) and 2) above? Surely not 3)? I've got a 5k-vert character in my scene and I'm seeing in the profiler in Unity 3 that it's using 25% of my CPU doing skinning :(

    It's not doing the actual skinning on the CPU is it?

    Thanks,
    Chris.
     
    XCO and asdzxcv777 like this.
  2. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    BUUUMP. Where's the in-depth authoritative info on how both CPU and GPU skinning work technically behind the curtains? Knowing how things are done in detail allows for better-informed design decisions early on. I know about the rendering pipeline in general, rasterization, shading etc. Now what happens exactly for skinned-mesh-renderers?
     
    XCO likes this.
  3. MakeCodeNow

    MakeCodeNow

    Joined:
    Feb 14, 2014
    Posts:
    1,246
    #1 and #2 are always on the CPU and are usually called animation. #3 is skinning, and it can be GPU or CPU. There are rare cases where CPU skinning is preferable, but generally GPU skinning is much faster.
     
  4. MakeCodeNow

    MakeCodeNow

    Joined:
    Feb 14, 2014
    Posts:
    1,246
    PS - GPU skinning requires Pro.
     
  5. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Thanks MCN! Would still be hugely interested in how it's done on the GPU when GPU skinning is enabled. It doesn't seem to be happening in vertex shaders, or does it? GPU skinning basically means some kind of shader needs to run for this purpose - what shaders?
     
  6. MakeCodeNow

    MakeCodeNow

    Joined:
    Feb 14, 2014
    Posts:
    1,246
    Yeah, GPU skinning means it's done in a vertex shader. Unity has a lot of magic in its shader pipeline, which is probably why you don't see that code, but I'm not exactly sure. If you're curious as to how it's often done, Google "matrix palette skinning shader".
     
    asdzxcv777 likes this.
  7. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,658
    GPU Skinning is D3D11-only, AFAIK. The skinning is performed by the vertex shader, and the results written to a new VB via StreamOut (which is why D3D11 is required). The new VB is then rendered similarly to any other mesh.
     
  8. AlkisFortuneFish

    AlkisFortuneFish

    Joined:
    Apr 26, 2013
    Posts:
    973
    Nope, it's supported on OGLES 3.0 as well.

    Yep, that's how it works on DX11. On OGLES 3.0 it uses transform feedback. Here's the relevant change from the 4.2 release notes:

    • GPU Skinning (requires Unity Pro)
      • Completely automatic, no custom shaders needed.
      • Works on DirectX 11 (via stream-out), OpenGL ES 3.0 (via transform feedback) and Xbox 360 (via memexport). Other platforms will continue to use CPU skinning.
     
    superpig likes this.
  9. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,658
    Ah, yes. Thanks for the correction.
     
  10. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Okay, very interesting stuff guys, thanks! So contrary to the suggestion by @MakeCodeNow , while there exists a technique called "matrix palette skinning" that uses vertex shaders, this isn't used in Unity; instead they pick one DX11-specific technique (stream-out) for DX11, another (transform feedback) for GLES3, and although transform feedback should be available in most current OpenGL (not ES) driver implementations, everything else falls back to CPU. Good to know..
     
  11. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,658
    FWIW the D3D11 and GLES3 techniques are effectively the same, they just have different names for the pipeline stages ("StreamOut" vs "Transform Feedback").
     
  12. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    I'm not entirely sure it's that straightforward though. I've been looking into this myself recently, as I had tried to use GPU skinning on Android by forcing GLES 3.0, only to re-read the 4.2 release notes and discover it's Pro only.

    The problem I've got is that it's all a little too black-box. Sure, Unity says that it's supported on DX11 and GLES 3.0, but I also remember reading about GPU skinning being available on iOS and then disabled due to performance issues. I have no idea if that statement is correct or misinformation, and that's the problem: you never quite know if it's working on your target platform.

    Furthermore, it would be really nice to be able to enable/disable GPU skinning on demand for a project, or even a specific SkinnedMeshRenderer, since mileage will vary in terms of performance based on the target hardware. As far as I can tell there are no options to do this. For example, if your project is CPU bound and you have a decent GPU then GPU skinning is likely to be a win; conversely, if you have a poor GPU but idle CPU cores then CPU skinning might be better.

    Really it's all a bit disappointing that we had to wait until the latest APIs to get a feature that can easily work with older versions and older hardware. I guess there must be complexities within Unity that made it impractical to implement older-style GPU skinning.
     
  13. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Well, they always go for "runs for many use-cases on a broad range of hardware", but what baffles me is that there isn't a complete SkinnedMeshRenderer replacement in the Asset Store yet ;) perhaps one based on augmenting existing vertex shaders with matrix-palette skinning to fully ignore those exotic "DX11 or GLES3 only" features..

    Resending full meshes to the GPU every frame is just stupidly wasteful. This whole practice should have been banned with the introduction of the programmable vertex stage over a decade ago :D Now there are a few pitfalls and limitations when going with the vertex stage, but they're kind of manageable IMHO. Oh well, if U5 doesn't improve on this and there's still nothing on the Asset Store in another half a year or so, I'll have to consider giving it my best shot myself ...
     
  14. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    lol, guess what I've been working on today ;)

    But don't get too excited: without access to vertex streams to pass bone indices and weights, plus the fact that I've always found skeleton animation math hard, I might not get anywhere with it. I do have something working, by which I mean the mesh 'moves', but it's just a grossly distorted mess at the moment. I'm hoping it's because I've got the bone transformations wrong in the shader, but that sort of stuff is really hard to debug.

    There are various technical limitations, such as the max number of bones and working out how to provide the bone indices and weights, but it looks feasible.
     
  15. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    I trust you have already collected all the various matrix-palette-vertex-shader tutorials, GLSL or HLSL, from Google.. when I took a quick first glance a few results looked pretty promising but I didn't bookmark them.

    Not sure about "vertex streams"; from my cursory high-level research it seemed some implementations just set lots of matrix and/or quaternion uniforms. So I guess, going back to the very first post in this thread, or rather the first reply by @MakeCodeNow , it might be feasible to still perform steps 1 & 2 on parallel CPU cores and prepare everything for vertex-stage uniforms, so that the vertex shader only needs to apply a couple of matrix transformations for step 3. That's still bound to be much faster than transforming 10000s of vertices on the CPU and reuploading them each frame.. which is the current out-of-box state of affairs outside of DX11 and GLES3.
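    A rough sketch of that division of labour, using only standard Unity scripting APIs (the uniform names here are made up, and the per-bone string lookups shown are exactly the cost that comes up later in the thread):

    Code (csharp):
    using UnityEngine;

    // Sketch only: let Unity's animation system drive the bone Transforms (steps 1 & 2),
    // build the matrix palette from them each frame and push it to a custom material,
    // so a matrix-palette vertex shader can do step 3. Uniform names are hypothetical.
    public class BonePaletteUploader : MonoBehaviour
    {
        public SkinnedMeshRenderer source;  // provides the animated bone transforms and bind poses
        public Material skinningMaterial;   // custom shader assumed to declare _BoneMatrix0.._BoneMatrixN

        Matrix4x4[] bindPoses;

        void Start()
        {
            bindPoses = source.sharedMesh.bindposes;
        }

        void LateUpdate()
        {
            Transform[] bones = source.bones;
            for (int i = 0; i < bones.Length; i++)
            {
                // Palette matrix = animated bone transform * inverse bind pose.
                Matrix4x4 palette = bones[i].localToWorldMatrix * bindPoses[i];
                skinningMaterial.SetMatrix("_BoneMatrix" + i, palette); // per-call string lookup; slow, as discussed later
            }
        }
    }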

    BUT probably that's what you're struggling with already and the above is adding no real insights :D I'm definitely fairly noobish on this particular topic.. when you need a second pair of shader-coding eyes or some external beta-tester lemme know!
     
    Last edited: Jul 9, 2014
  16. MakeCodeNow

    MakeCodeNow

    Joined:
    Feb 14, 2014
    Posts:
    1,246
    Interesting info. I had no idea that Unity's GPU skinning was GLES 3.0/DX11 only. That's a remarkably high min spec for a concept that's been a good idea since the original Xbox. As folks elsewhere said, CPU skinning is worth it if your GPU is crappy, or if you render the same character lots of times per frame, but in my experience you pretty much always need to vectorize and parallelize the CPU skinning for it to be reasonably efficient. Animation as well is naturally parallelizable at many points in the pipeline. We were doing that early in the PS360 generation and Unity definitely should be doing it now (if they are not already).

    If anyone does go for matrix palette skinning, note that you will have bone limits, and you will have to split your meshes (or fall back to CPU skinning) if they go over that limit. This is just the way it's always been, and one of the reasons stream approaches are awesome, but so far too much hardware still lives in the land of DX9/ES2.
     
  17. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    You need a vertex stream or two to provide the bone indices and bone weights to the vertex shader; much like the tangents you pass in for normal mapping, you need this data per vertex.

    My current hacky solution, because we don't have the ability to add arbitrary streams, is to use the vertex color, but that means I'm limited to 2 bones per vertex and the weighting is quantised to 1/255 intervals. There are a couple of alternative options that should provide improved accuracy and up to 4 bones, but I want to get the basics working first.
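    For what it's worth, an editor-time sketch of that packing (two strongest influences per vertex, weights quantised to 8 bits; purely illustrative, not Noisecrime's actual code) might look like:

    Code (csharp):
    using UnityEngine;

    // Illustration only: pack the two strongest bone influences per vertex into the
    // vertex colour channel, since there is no arbitrary vertex stream to use instead.
    public static class BoneWeightPacker
    {
        public static void PackIntoColors(Mesh mesh)
        {
            BoneWeight[] weights = mesh.boneWeights;
            Color32[] colors = new Color32[weights.Length];
            for (int i = 0; i < weights.Length; i++)
            {
                float w0 = weights[i].weight0;
                float w1 = weights[i].weight1;
                // Renormalise to the two kept influences; the second weight is (1 - first) in the shader.
                float first = (w0 + w1) > 0f ? w0 / (w0 + w1) : 1f;
                // r/g = bone indices, b = quantised first weight. In the shader these channels
                // arrive as 0..1 values, so the indices need rescaling by 255 before use.
                colors[i] = new Color32((byte)weights[i].boneIndex0,
                                        (byte)weights[i].boneIndex1,
                                        (byte)(first * 255f), 255);
            }
            mesh.colors32 = colors;
        }
    }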

    As MakeCodeNow mentions, though, there are further restrictions to deal with, such as the number of matrices that can be passed into a Unity shader. For Shader Model 3 that appears to be around 56 bones currently, though it may be less with more complex shaders. This can be improved and feasibly could be doubled, but again I want to get the basics working first and check there aren't any silly or strange roadblocks that make the concept pointless to implement.

    Talking of which, after much banging my head against a wall and randomly trying out different transformation/matrix combinations (because I had no idea why the mesh got messed up), I've actually got skinning working on the GPU for a simple test model/animation I knocked up in Blender. Might be a bit early to be sure, need to test with a proper animated character, plus it's like 5 am, but I'm pretty happy with the progress.

    Oh, one other positive aspect: even if skinning doesn't work, there is the possibility of using this technique for non-hardware instancing of objects in a single draw call, without having to batch or combine models. Indeed I'm pretty sure others have already done that.
     
  18. KristianDoyle

    KristianDoyle

    Joined:
    Feb 3, 2009
    Posts:
    63
    I wouldn't mind betting that 'skinning' is the number one expensive calculation that occurs through the character pipeline. Usually. When Motion Builder brought GPU skinning into MB2011, you could raise the mesh density by about a factor of 10 (with a half-decent GPU). Realtime motion editing with dense meshes. If you want to find out if it's available - do that to your meshes and test the performance ;)
     
  19. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,658
    Yeah, I was disappointed when I found that out. IIRC the main reason is that this way it's entirely separated from the rest of the shading pipeline; if we were to use traditional matrix palette skinning then we'd have to be patching our shaders (and/or surface shaders) to do all that work, which would require changes to the shader compiler, etc. Not that any of that sounded impossible to me, but I guess it's just one of those 'nobody's done it yet' things.
     
  20. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    This is something I really don't get about Unity. What you've stated is pretty much the go-to explanation whenever anybody queries why Unity's graphics are lagging behind in hardware support, usability, feature completeness etc. It's been like this for many years now, so why don't they just hire or re-allocate one or two specialists to focus on this stuff?

    It's pretty important, and it clearly hurts Unity in that many features simply cannot be implemented by developers because the API, hooks or functionality don't exist.
     
  21. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Not to derail the thread, but I agree with you, and it's the same with array textures. They've been around for ages, are genius for many problems (an artifact-free and less painful, seam-free alternative to atlasing, for one) and any decent renderer including Unity should provide a hook to be able to use them in shaders. It's just a standard feature of current-gen GPUs and graphics APIs, even GLES3. You ask about it, you get "dunno, never seemed important to us, nobody seems to need it, probably one day, no probably not U5, ah well that's just how life goes isn't it". WTF, they have so many developers! 100s by now I think? Even 10s is a lot. Probably all been working on their GUI for the last couple years, the lot of them :D

    OK, back to skinning..

    Wow, that sounds like some extremely cool hackery.. but right now I can't really guess how that could work even on the most basic level? (Instancing is like array textures: another "has been around for many years now in gfx APIs/GPUs, so a renderer should provide at least some kind of low-level access to it, but UT, nope"...)
     
    Last edited: Jul 11, 2014
  22. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    GPUSkinning_wip_01.jpg
    Work In Progress - Increased the bone count support from 56 to 74, which is just enough to support good old Unity's Lerpz model (69 bones). Unfortunately switching to float3x4 instead of float4x4 prevents the shader being run on GLES or OpenGL, because GLSL 1.2 and below don't support non-square matrices! It can easily be fixed by constructing a float4x4 in the shader, but it's a bit of a shame. Anyway, I've still got an alternative to test that should give me close to 100-bone support in SM3.0.

    Still lots of issues to resolve, such as how to obtain the bones without putting an animated skinned mesh into the scene (which of course defeats the purpose), but I'm making progress. Then I'll have to add shadow collector/caster passes too, and probably a load of other stuff that I'm forgetting, to surface shaders.
     
    metaleap likes this.
  23. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Awesome stuff!

    I ran into this once with GLES2.. BUT the GLES3 shading language (officially called ESSL3 :D ) supports those according to the web :D Whether Unity knows this is another topic, but GLES3 means Android 4.3 or higher, and iOS users always update to their latest OS version anyway ;) I say don't worry about GLES2, unless you're targeting it yourself of course..

    (Ridiculously, the GLES3 emulation seems to be gone in 4.5, only GLES2 in there now. Perhaps that's because GLES3 is functionally equivalent now to the SM3 desktop-GL version Unity is using in GL mode? Dunno...)

    I say go for it :D I gotta wonder, are there any projects out there that on the one hand need high-performance skinning of many high-bone models but on the other hand still need to somehow support ancient SM2? Even current-gen mid-range mobiles have SM3 by and large, and people replace those devices almost every year (or perhaps every two in the case of non-power-users that are content with Candy Crush and Angry Birds).

    I guess an Editor script could extract/process/prepare that data at design-time from a selected Skinned Mesh.. having an out-of-box SkinnedMeshRenderer game-object in the scene that isn't used is indeed a bad idea, it's still uploaded to the GPU AFAIK and one way or the other consumes resources needlessly.

    Ultimately I think you might perhaps want a cginc for the core vertex logic ... people with Surface Shaders could include that in a custom vertex function (which might carry over to shadow passes with a bit of luck), people with custom vertex shaders could include it in all applicable passes.....

    Wouldn't worry about shader patching too much, initially power users can do that manually and before long someone will come up with a patching script, similar to the one for Forward path in the Shadow Softener asset package.
     
    Last edited: Jul 9, 2014
  24. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Well, that's the problem: I don't know of a method to force Unity shaders to use GLES 3, but to be fair I've not even bothered doing a search for that yet. It's not a big issue; it should be easy enough to have a multi-shader that falls back to constructing a float4x4 for older shader models.


    That's quite a good point, but regardless, if this all works out I'd like to support as wide a range as possible and let the developer decide what they want.


    This is a bit of a sticking point and a decision I've been putting off. Currently I rely on the SkinnedMeshRenderer (SMR) to obtain the animated bones. Certainly I don't want to be bothering with trying to animate the bones myself; I'm pretty sure Unity will be doing that in native, and probably optimised threaded, code. However I'm not sure where that leaves me.

    Ideally I'd want to include just the bone transforms/gameObjects in the scene and have them animated, since that way you can still attach colliders to them, but a few quick tests show that I need to have the correct mesh applied to the SMR to make it work. Hence why in the screenshot above you have two copies: one is the Unity SMR, the other is a static mesh being GPU skinned. This was useful for comparison and judging the error of bone weighting, but currently is also a requirement. I need to dig into the SMR and animations to see exactly what sort of access I have. Mind you, Mecanim may add a whole new level of complexity on top, again a good reason for trying to use the SMR if only I could prevent it from skinning.


    Yeah this might be doable and worthwhile. Have to see.
     
  25. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Addendum:

    I was wrong about the SMR needing the correct mesh; it actually seems to work with no mesh at all. I think when I tried this before I switched the mesh to a cube, thinking having none would break it; it appears the opposite is true.

    Could be that I've messed up somewhere, but if I can use an SMR with the original bones and just set the target mesh to none, so Unity takes care of all animation and bone updates, plus the SMR bone gameObjects allow for attaching other models or colliders, and the SMR doesn't add any skinning overhead, then this might work out well.

    In fact it might be very cool, as it seems to still maintain the original object's renderer bounds despite the mesh no longer existing, meaning the animation still adheres to the culling type setting. The animation will continue regardless though; I seem to remember being caught out by this before, believing that culling would 'pause' the animation rather than just stop it from updating, whereas in fact it seems it stops updating but the internal timing continues as normal.
     
  26. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    You're right, that'd be extremely convenient. How lucky that this "loophole" exists; if it works as you're hoping, we only need to lobby UT not to accidentally "fix" this rather advantageous oddity down the road... ;)

    Of course I don't have access to the Unity internal source code but logically speaking:
    • in CPU-skinning mode, it would have to have a copy of the mesh verts in order to skin them, right? I think it's unlikely that Unity creates a full copy of the model in RAM when you first assign a mesh and then keeps it when the mesh is set to None. But "unlikely" doesn't mean impossible, so yeah, let's wait for authoritative input
    • in GPU-skinning mode, it doesn't matter: with a custom GPU solution like yours one would turn off the out-of-box GPU-skinning option anyway
     
  27. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Yep, that would be my only worry. Without using the SMR it really would be a massive pain at every turn. I'm not even sure using an SMR with no mesh will maintain the integrity of, say, Unity's pass to collect potential shadow casters in the scene, and I know of no method to 'inform' Unity that a gameObject should be included. Lots of unknowns really, many of which are due to the black-box nature of using an engine.

    OMG are you stalking me, or Aras, or both of us ? ;)

    LOL funny how fluid information is these days with social networks, really never expected to see that.

    Anyway, according to his reply he doesn't think there is any skinning overhead, probably for the same reasons you gave and the same reason I tried it: it just wouldn't make sense that skinning could occur without a mesh. Of course, as you point out, you never quite know what's going on under the hood.
     
  28. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Yeah haha, sorry, I first wanted to reply directly on Twitter (following both so the tweet popped up in my feed) but I'm still struggling with 140 chars after all these years :p so I put it here. (Also perhaps a subtle nudge to raise awareness of this thread and, by implication, of the necessity of keeping your newly discovered loophole open ;)
     
  29. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Ok, I have some sad news: custom GPU skinning isn't going to work ;(

    There's no technical issue; it's sadly a practical issue with how Unity passes data to the shader that is the sticking point, and the fact that it takes a massive amount of time to do so in U4.3.4 (not tested newer versions yet).

    The problem is the call

    material.SetVector(name, vector4)
    which has to be called three times for every bone to transfer the matrix data to the shader. This is compounded if your model uses multiple shaders, as each must be passed the bone matrix data. That means for a model with 69 bones and just a single material, that line alone (performed 69*3 times) took almost 5 ms per frame!
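    For context, the pattern being described is roughly the following (uniform names are made up): one Vector4 per row of a 3x4 palette matrix, so three calls per bone.

    Code (csharp):
    using UnityEngine;

    // Illustrative only: upload each bone's 3x4 palette matrix as three Vector4 rows.
    // The fourth row of an affine matrix is always (0, 0, 0, 1), so it is skipped.
    public static class PaletteRows
    {
        public static void Upload(Material material, Matrix4x4[] palette)
        {
            for (int i = 0; i < palette.Length; i++)
            {
                material.SetVector("_BoneRow0_" + i, palette[i].GetRow(0));
                material.SetVector("_BoneRow1_" + i, palette[i].GetRow(1));
                material.SetVector("_BoneRow2_" + i, palette[i].GetRow(2));
            }
        }
    }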

    This is a shame because the actual math per bone each frame and extracting the vector4 from the matrix takes around 0.03 ms.

    While I could return to using SetMatrix(), which would greatly reduce the number of supported bones (74 down to 56) and in turn reduce the number of Material.SetMatrix calls, I still think it will be a huge hit, probably 1.5 - 2 ms. That may just be acceptable for a single material, but for multiple meshes, or a mesh with multiple materials, the overhead is just huge and makes the whole thing impractical.

    I'll do some additional testing, try newer versions of Unity, check if there are other methods to pass the bone matrix data to the shader, but at the moment this is looking like the end of the road.
     
  30. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    I wouldn't throw it away and abandon all hope just yet, but put it out there in some form or shape:
    • SetVector might become speedier with the next Mono version, next year's machines, v4.5 or even v5 --- all these are more likely than CPU skinning (process 10000s verts and reupload per frame, per character) becoming speedier to a similar degree. I don't see this as a long-term bottleneck especially as you scale up number of characters, over time.
    • uniform arrays are semi-supported / faked, are you already using those? If not, maybe someone (could even be me down the road) has a go at them and finds them speedier. Or perhaps at some point UT replaces internally the current fake-arrays with real-arrays (perhaps when some legacy gfx generation is finally dropped, happens every other year or so) and the same code becomes that much faster just like that
    • SetMatrix with 56 bones might also still work for many custom use-cases where the art style dictates simpler meshes (who knows, bugs or pet games) but the game-design requires more animated meshes than out-of-box CPU-skinning could handle....
    • Crucially, I also wouldn't use the SetVector(string name, ...) overload when it's called so many times; there is another overload without the string, which I imagine saves a costly dictionary lookup somewhere and, depending on other factors, possibly some string allocation. Now I'm not entirely sure whether that overload is a new 4.5 thing, but I think not. I haven't used it yet because I don't set that many uniforms per frame yet, but yeah, you don't wanna use the Shader.SetFoo(string) or Material.SetFoo(string) overloads for "setting many uniforms per frame".
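    For reference, the string-less overloads work off integer IDs obtained via Shader.PropertyToID; a minimal sketch, with placeholder property names:

    Code (csharp):
    using UnityEngine;

    // Sketch only, with placeholder property names: cache the integer IDs once,
    // then use the int-based SetVector overload in the per-frame loop so no
    // string lookup happens while uploading the bone rows.
    public class BoneRowIds
    {
        readonly int[] ids;

        public BoneRowIds(int boneCount)
        {
            ids = new int[boneCount * 3];
            for (int i = 0; i < boneCount; i++)
            {
                ids[i * 3 + 0] = Shader.PropertyToID("_BoneRow0_" + i);
                ids[i * 3 + 1] = Shader.PropertyToID("_BoneRow1_" + i);
                ids[i * 3 + 2] = Shader.PropertyToID("_BoneRow2_" + i);
            }
        }

        public void Upload(Material material, Matrix4x4[] palette)
        {
            for (int i = 0; i < palette.Length; i++)
            {
                material.SetVector(ids[i * 3 + 0], palette[i].GetRow(0));
                material.SetVector(ids[i * 3 + 1], palette[i].GetRow(1));
                material.SetVector(ids[i * 3 + 2], palette[i].GetRow(2));
            }
        }
    }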
    It's pretty cool progress, I think. Another option for pushing data into a shader is a texture. Intuitively, at first, I can't imagine writing an in-memory texture every frame, uploading it, and the vertex-shader texture fetch all taken together being faster than Unity's apparently slow-ish SetVector implementation. On the other hand, all graphics-API, GPU/driver and (some) engine developers are constantly optimizing texture access and throughput performance (more, bigger, higher-res textures seems a top-priority gfx investment in those circles), plus it would be a really small texture.. Again, if what you've got so far is out there for others to play with, someone, again quite possibly even yours truly quite soonish, might well play with that alternative in case you get fed up with the topic for now :D

    Also, what kind of machine/OS setup are you using where you got the approx. 5 ms? Again, happy to test perf on my (on paper) "beast of a machine", maybe there's already "hope for the future". As I wrote here, U4.5 skinning of just one single skinned mesh kills my 2014-Quadro-workstation FPS from 90ish down to 20ish, so theirs is a big fat fail, and I'm pretty optimistic your solution might still fare quite a bit better. And I'm probably not the only one.
     
    Last edited: Jul 10, 2014
  31. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Yeah, well, um, actually it's not as bad as I thought ;)

    Just tested in 4.5.0f6 and it has obviously fixed the setting of shader properties: now I get 0.14 ms instead of 5 ms for the case above, and for the test where I have 69 bones and 7 materials on a single animated mesh it's no longer 35 ms (yes, it really was that bad in 4.3.4) but just 0.73 ms!

    Not sure what Unity did, but clearly SetVector(), and maybe the other material shader-property setters, were seriously broken in 4.3.4. Going to have to go back over some client projects and see if I can upgrade them to 4.5, because that level of performance regression in 4.3.4 is simply unacceptable.

    However, thinking about it, I'm wondering just what level of overhead I can have from setting shader parameters vs having Unity do the skinning on the CPU. I don't know if I can get hard numbers for that, but I am a little concerned that regardless of what I do, just feeding the materials/shaders with the bone data every frame might end up being too much of an overhead.

    Ultimately I think I'll only know for sure once I get to the stage where I can test this on the Android project which started the whole process and has a 30k vertex/polygon skinned mesh! It's heavy enough that without the model I gain 15-20 fps, so it will be easy to see if it was worth it.

    Well technically you were right it did become speedier, but only because it was clearly so broken in 4.3.4. ;)

    I don't think such a function will ever really get much faster overall, and to be honest my main issue with SetVector or SetMatrix was that for an array we have to send each element one at a time! Sending the bone matrices to the shader should be a single call with an array, but Unity doesn't support that yet.

    So yeah to your second point I am using the hacky workaround for sending arrays.

    Good point about using nameID instead of a string, I'd forgotten about that. I did already cache the name strings, but using an int should obviously be even better, especially when I'm potentially calling it 1449 times for 69 bones and 7 materials (3 * bones * numMaterials). I will give that a test in both 4.3.4 and 4.5.

    Edit: Results for nameID are in:
    4.3.4 worst case went from 35 ms down to 33.5 ms
    4.5.0 worst case went from 0.73 ms down to 0.29 ms
    So yeah, pretty useful tip that, thanks.


    There are of course some alternatives, such as using textures to provide the bone matrices, but I'd really want to avoid that if possible.

    Yeah it is, I'm pretty pleased both from a technical standpoint and a personal one, since some of it has been challenging. It's also why I was so gutted to find such poor performance in my post above, particularly after spending the morning documenting how I see the overall system working and feeling good about that. Plus I got shadows working, and other things.

    Well, it's still my baby for the moment, but if I hit a dead end I'll open source it, and if it works I'll probably release it on the Asset Store, so I might need some testers.
     
    Last edited: Jul 10, 2014
  32. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Major props man, such a fantastic project, and I'd say something like this is needed by lots of folk.

    Well, my mind keeps going back to how CPU skinning typically processes 10000s of vertices per character and reuploads all of them per character, per frame. Got more characters than CPU cores? You're gonna feel it (even if they patch up whatever crazy bad regression 4.5.1 seems to have introduced to CPU skinning). So while sending 100s of vector and/or matrix uniforms might feel a bit overkill to you, it's gotta be the smarter choice as you scale up the number of skinned characters..

    Awesome stuff! Keep this thread in the loop and I'll definitely happily buy and review on Asset Store as soon as you choose to put it there!
     
  33. MakeCodeNow

    MakeCodeNow

    Joined:
    Feb 14, 2014
    Posts:
    1,246
    Quick note that I've found vertex texture fetching on mobile to be really slow, even on fairly modern devices that "support" it.
     
  34. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Interesting, thanks for the heads up.

    Thanks for the vote of confidence.

    In the meantime, for your enjoyment, what happens when quaternions go bad:



    Kind of nightmare-ish effect. Would love to 'tame' this and use it as an actual effect, perhaps in a 'glitch' project ;)
     
  35. cluh

    cluh

    Joined:
    Jul 21, 2012
    Posts:
    4
    One quick thing. Have you tried using MaterialPropertyBlock instead of material.SetVector() ? It's supposed to be a tad faster and may even clean up your code a bit.
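    A minimal sketch of that idea (the property name here is a placeholder, and this uses the Unity 4-era Add* methods, which later versions rename to Set*): fill one block per frame and apply it to the renderer, rather than calling Set* on every material individually.

    Code (csharp):
    using UnityEngine;

    // Minimal sketch with a placeholder property name: write the per-frame values into
    // a MaterialPropertyBlock and apply it to the whole renderer in one call.
    public class PropertyBlockSketch : MonoBehaviour
    {
        public Renderer targetRenderer;
        MaterialPropertyBlock block;

        void LateUpdate()
        {
            if (block == null) block = new MaterialPropertyBlock();
            block.Clear();
            // Unity 4-era API; newer versions use SetVector instead of AddVector.
            block.AddVector("_ExampleBoneRow0", new Vector4(1f, 0f, 0f, 0f));
            targetRenderer.SetPropertyBlock(block);
        }
    }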

    I had been experimenting with getting GPU skinning working too, and was pointed to this thread by metaleap. I didn't get near as far as you though, I've only had a very limited time to look at it. I got caught up on how to neatly index the matrices in the shader, as well as the correct construction of the bone matrices that are passed into the shader.

    Best of luck!
     
  36. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Yeah, it's on my to-do list of things to look at once I've got everything else working. Still a bit concerned that I may run into some technical limitation, so I don't want to spend too much time on auxiliary aspects at this point.

    Though I'm not entirely sure that MaterialPropertyBlocks would necessarily help in this case. From memory I thought they were more geared towards providing an easier and more efficient method of having instances of materials.

    I guess it might help when a skinned mesh has multiple shaders though, since they all need the same vectors set, but I wonder how easy this would be for developers to interface with if they have their own custom shader properties.

    Hmm, reading the docs more closely, these might well be an easier and more efficient way to deal with multiple materials on a mesh/renderer. The use of renderer.SetPropertyBlock() suggests that the values are passed on to all materials and shaders on the renderer that need them. If true, that certainly cuts down the amount of work I've got to do, but I'm puzzled as to how it works in this case.

    I mean if I have an animated mesh with four very different materials/shaders, each one using various different properties, or even the same property but you want it to be different for each material, how can that work?

    It's also unclear whether using SetPropertyBlock will reset/void any shader properties that don't exist in the property block but are used by the shader.

    Basically the Unity docs are not in-depth enough, which I guess brings us all the way back to the start of this whole thread ;)
     
    Last edited: Jul 10, 2014
  37. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    ... and deeper down the rabbit hole that is Unity we go ;)

    So out of curiosity I decided to quickly add in MaterialPropertyBlock (MPB) in Unity 4.3.4 to see if it would help when I had multiple materials on the same animating mesh and got some weird results. Turns out I inadvertently laid the blame for the massive performance drop on the wrong thing, well sort of ;)

    I happened to have added shadow passes to the shader prior to noticing the performance issue. By chance I switched the shader back to an earlier version that didn't have the shadows and the performance was back to normal. Now the weird thing is that this isn't simply a case of shadow passes decreasing performance, since the timings I reported were, according to the profiler, coming directly from the function where I used SetVector(), not from the GPU results.

    Looking at the Profiler, the GPU does decrease in performance when adding the shadows, but the biggest drop by far (80%) is still coming from the CPU and the script. Now either the Profiler is bugged or something weirder is going on, where the fact that my shadow passes also need the bone matrices (supplied by vectors) is causing a huge hit in the script that is setting them on the material shader.

    So, long story short, it seems a weird combination of a custom shadow pass that required the same vectors as the main shader pass, set from a script, was the ultimate cause of all the heartache. I think U4.5.0 still fixed whatever this problem was, though.

    The real wtf though is that switching to MPB also happens to fix the problem in U4.3.4, even when the shadow passes are active! Even stranger, when switching from my old 'update shader properties' function, whilst it is running, to one that uses MPB, performance improves dramatically from 33.5 ms per frame (MainThread stats of 35 ms) to just 0.05 ms, though I then also get a Gfx.WaitForPresent of 1.5 ms and in total 1.5 ms for MainThread.

    So to recap: in U4.3.4, setting many material shader properties from a function, with a shader that has shadow passes (collector and caster) which use those properties, performs badly. Remove the shadow passes and it's fine; use MPB and it's fine, even with shadow passes!

    Man Unity is one complex beast ;)

    Here are the results, all in U4.3.4, for a character with 59 bones, 7 materials, 2 directional lights, 1 with hard shadows.
    Note: All tests in editor, but I'd be surprised if that has anything to do with the results.

    Shader With Shadow Passes:
    Function (SetVector called 1239 times per frame) 33.5 ms - MainThread 35 ms - Renderer 0.9 ms

    Shader No Shadow Passes:
    Function (SetVector called 1239 times per frame) 0.26 ms - MainThread 1.2 ms - Renderer 0.8 ms

    Shader With Shadow Passes & MPB:
    Function (SetVector called 177 times per frame) 0.05 ms - MainThread 1.5 ms - Renderer 1.4 ms

    Shader No Shadow Passes & MPB:
    Function (SetVector called 177 times per frame) 0.05 ms - MainThread 1.2 ms - Renderer 1.0 ms
     
  38. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Wow MPB seems like DA WIN.. crazy, good to know!

    Out of curiosity, do you need to change shaders at all to use MPB? From the naming I initially thought maybe they're the Unity name for GL's "uniform blocks", but that would require new syntax on the shader side as well. And from the docs they seem like a slightly different thing altogether.
     
  39. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    No shader changes at all, but I can't quite get a handle on MPB myself.
     
  40. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,658
    Supposedly, MaterialPropertyBlock is just a bag of material/property values, such that when Unity is actually binding the material properties it does something like:

    Code (csharp):
    foreach (shader property in material.shader)
    {
        if (materialPropertyBlock != null && materialPropertyBlock contains property)
        {
            set shader property(materialPropertyBlock.getProperty(property))
        }
        else
        {
            set shader property(material.getProperty(property))
        }
    }
    but I guess there must be some interesting things going on under the hood, if it's so much faster...
     
  41. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Thanks for the update.

    Yeah, there was definitely something screwy going on with using SetVector() and shadow passes that MaterialPropertyBlocks seem to avoid. I could easily see it just being more efficient in terms of a tight loop around a single shader for a material, but that doesn't really explain why you need shadow passes in the shader for SetVector to fail so spectacularly.
     
  42. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    I'd rather suspect simply what you basically found already: there was a SetVector regression in 4.3.4, but only in certain circumstances, not always. Perhaps rather than MPB being magically a gazillion times faster, the regression simply didn't get triggered when using them? Well, you're gonna find out once you go back to 4.5 for measurements ;)
     
  43. cluh

    cluh

    Joined:
    Jul 21, 2012
    Posts:
    4
    The cool thing about MPB is that it allows you to change the uniform values on the materials of a renderer without having to instance them. The downside that I've seen with it is that you can only apply one property block to a renderer, so if 2 different sets of code use it, only one will be applied.

    In my experience, you can use a hybrid between MPB and material.Set...(). In the case of both setting the same value it should favor the value in the MPB.
     
  44. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Small update.

    Tested on my Nexus 7 and the performance was good. A test scene using Unity skinning came in at 45 fps, the same scene using my custom GPU skinning hit 60 fps (not sure if it is being limited by the device to that value). So there definitely seems to be some benefit.

    Alas, one small problem. Despite forcing GLES 3.0 I was only able to get 33 bones passed to the shader. The adb log states 'GLSL: array sizes larger than 99 not supported'. Assuming each of those is a Vector4 and I need 3 per bone, that's why I only have 33 bones active. This is a bit of a pain, as in SM 3.0 I can have 72 bones, so I'm kind of puzzled as to why GLES 3.0 is so limited (almost half the number).

    I'll have to go back and check ShaderLab to see if there is a pragma setting I'm missing. I'm already using '#pragma target 3.0' and '#pragma glsl', not sure what else I can do.

    I still have an alternative to 3x4 matrices to try, but that will only gain me about 15 more bones, so I really need to find out what the max number of uniforms supported by GLES 2 and GLES 3 is, as well as whether I'm missing something in the shader. Otherwise I guess I'll have to look into splitting models into ones with smaller bone counts, but I was really hoping to avoid that as it seems like a lot of hassle.

    It's a real shame, as the process for enabling GPU skinning is really simple and elegant. Indeed, unlike Unity's, it will be possible to enable/disable it at start time and probably at runtime too.

    Edit:
    Moving to the Mac and doing further testing, it would appear that arrays in GLSL are actually limited to 99 elements! That is going to be a huge pain. I was able to get all the bones working by passing in two bone matrix arrays of 33 bones each, but performance was worse than Unity skinning, taking about 50% more time. Not sure why, whether it's the branch in the shader code or what, but it's not good either way.
     
    Last edited: Jul 11, 2014
    IgorAherne likes this.
  45. Nims

    Nims

    Joined:
    Nov 11, 2013
    Posts:
    86
    What an amazing thread this is, keep it up, I am awed!
     
    IgorAherne likes this.
  46. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    So this only occurs in GL? What about under Windows but running with -force-opengl?

    Yeah, all GLES hardware caps at 60 Hz max; this is impossible to circumvent. Measuring the actual GPU-only ms/frame is also pretty much undoable within Unity scripting, so as long as you're hitting 60 you're good.

    If you hit the same limits, lower than spec'd for SM3, both on the Mac and on GLES, then I have to strongly assume a hard-coded Unity "safe-guard" limit rather than an actual GL-context error message.. doesn't Unity only do "fake arrays", i.e. you declare an array "arr" and it generates uniforms such as arr_0, arr_1, arr_2 etc.? Isn't it highly likely they capped it at 99 a few years ago, then never revised the limit or queried it from the GL context? Guess we should ping @Aras for this one too! Or perhaps wait until he gets back from holidays.. :D
     
  47. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    So far it happens on GLES 2.0, GLES 3.0 and maybe OpenGL on Mac (need to retest and double-check project settings); not tried testing OpenGL on Windows yet, I'll give that a go.



    Yeah I thought I read something about that. I should have made a note of the delta per frame from my display instead of just framerate.


    Well, if it is an artificial limit, then it's only applied to GLES/OpenGL, which is very strange. It's not a limit on the number of uniforms passed in, it is literally a limit on the number of elements an array can have. I was doing a search for GLSL array limits, but then got sidetracked ;) Going to test sending bone matrices instead of vectors; it will reduce max bones to 54 in SM3, but maybe GLES 3.0 has a higher limit? Plus if I ever get round to implementing my secret weapon I could double the bone count, so maybe even using matrices could be worth it for OpenGL devices.
     
  48. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Cool, but keep the previous approach handy somewhere, as a toggle or at least commented-out code! It seems to have worked very well outside GLES / Mac GL, which, while not immediately helpful for your own current use case, will still be useful for many projects / future customers.

    Perhaps, and if not then GLES 3.1 (which was already announced in reaction to Apple's Metal and will be deployed with Android L) might! I say make it somewhat configurable to allow for future devices (I have no optimism about Unity's own skinning getting that much better even in v5, it doesn't seem to be a priority at all, and even if it did, you get another minor update and bam, a regression, best to just skip it), rather than limiting it strictly/hardcoded to yesterday's constraints. :)

    What's your Nexus running, GLES 2 or GLES 3? If the latter and you're pushing from U4.5, it's best to select "force ES3" in the Android Player Settings, because otherwise, from my limited experience, I really wouldn't rely on an ES3 device actually auto-selecting a GLES3 context "like it could and should".....
     
  49. Noisecrime

    Noisecrime

    Joined:
    Apr 7, 2010
    Posts:
    2,054
    Don't worry, I never delete anything, and currently it should be possible to provide a codebase that supports all variations.

    You mentioned before being unhappy with Unity's GPU skinning, can you explain why? It might help me focus my custom version.

    Tried that, no effect.

    As far as I can tell this GLSL max array count of 99 is an artificial Unity limitation, which is a bit of a bummer.
     
  50. metaleap

    metaleap

    Joined:
    Oct 3, 2012
    Posts:
    589
    Main gripe is CPU skinning being "quite slow at times" and "not a smart idea anyway"; and GPU skinning's main problem is availability. I've got Pro right now, but it still doesn't seem to be working outside of GLES 3 or D3D11. I have a couple of crucial but rather intricately tricky shaders that I don't think I'll ever make work in DX. And why should I? All Windows machines have had very good GL drivers for the last 2-3 years, even Intel HD 4000 or higher. (On Win and Macs GL4 is fairly widespread by now, not that I could take advantage of that with Unity.) So I can totally just run my projects with -force-opengl on Win, and Mac and Linux gamers get GL anyway. I'd hate writing extra versions or #defines for quite a few "tricky trickery" shaders just to support DX, nope. So in desktop GL, as far as I can tell, even though transform feedback has been a standard GL feature for a number of years, we don't get U-Pro's GPU skinning as of now. Also, while I'm invested in U4 Pro, I'm not sure I'll move on to U5 Pro tbh, considering the current cost/benefit looks of it. And you're an Indie user too, right? GPU skinning is Pro-only. So at some point I'd need a non-Pro, works-everywhere-with-SM3 GPU skinning, like many others..

    Btw. I don't think that Unity would oppose this; they allow many projects that provide "missing Pro features" to Indie users (even some sort of RenderTexture-less 'Image Effects' projects, also NavMesh and Umbra replacements etc.) and still make bank from those anyway.

    OK I'm not too familiar with Nexus devices, it does have Android 4.3 or higher right? There's a couple free "GL-ES stats & diagnostics" apps for Android, in case you're not-right-now-but-would-like-to-be 100% certain whether your app runs on ES2 or ES3.

    Well but on the bright side it might be easier to get this hard-limit upped by pestering @Aras than lobbying say Khronos to change their specs!