Could Batched Vector Ops Speed Up Unity?

Discussion in 'Scripting' started by Arowx, Jun 26, 2014.

Thread Status:
Not open for further replies.
  1. Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    OK, you've written your game: the bullets are flying and the enemies are moving. Then you need to speed things up a bit, so you pull up the profiler and start optimising the problem areas.

    But what about the meta view? Each frame you are updating or manipulating a load of Vectors or Quaternions. In some cases you might even have an enemy manager that loops through all of them and updates them.

    As I found out with a simple benchmark, each of these updates takes time: on PC, with Deep Profile enabled, Unity took 12ms to get and set transform.position 2000 times. Part of that time is the overhead of calling the functions and passing the values, on top of doing the calculation and returning the result.

    So could Unity benefit from batched getters, setters and operations? E.g. TransformArray.positions = positionArray;

    Or Vector3Array += scalarArrayOrVectorArray;

    Then Unity could, under the hood, take advantage of platform-specific speed-ups, e.g. SIMD instructions, and other optimisations.

    Would you use batched features for performance optimisation if Unity provided them?
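
To make the proposal concrete, here is a minimal C++ sketch of what a batched setter and a batched add could look like on the engine side. Everything in it is hypothetical: `Vec3`, `TransformArray`, `set_positions` and `add` are illustrative names, not Unity APIs, and the sketch only assumes that positions are stored contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an engine vector type; not Unity's Vector3.
struct Vec3 {
    float x, y, z;
};

// Hypothetical batched container: all positions live in one contiguous
// array, so a single call can update every element.
struct TransformArray {
    std::vector<Vec3> positions;

    // One call replaces N individual position setters.
    void set_positions(const std::vector<Vec3>& new_positions) {
        assert(new_positions.size() == positions.size());
        positions = new_positions;
    }

    // Batched "positions += offsets": a tight loop over contiguous data,
    // which a compiler may be able to auto-vectorize.
    void add(const std::vector<Vec3>& offsets) {
        assert(offsets.size() == positions.size());
        for (std::size_t i = 0; i < positions.size(); ++i) {
            positions[i].x += offsets[i].x;
            positions[i].y += offsets[i].y;
            positions[i].z += offsets[i].z;
        }
    }
};
```

The point of the design is that the per-call overhead is paid once per batch rather than once per element. Whether that yields a real speed-up would have to be measured.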
     
  2. Sharp-Development

    Joined:
    Nov 14, 2013
    Posts:
    353
    I definitely would. A few additional tips and tricks:

    1. transform.position is way slower than transform.localPosition. The same goes for rotation. Why is that so?
    Well, Unity stores only the local transformation matrix, so it regenerates the world values every time those properties are read.
    What I did: I implemented my own transform hierarchy, moved every call in a batched fashion to another thread, updated the world/local properties of my transform class there, and then simply updated Unity with the local properties. I got major FPS increases out of it. For example, 1000 walking objects run at about 250-750 FPS depending on the CPU.

    2. When dealing with Vectors/Quaternions, it's faster to do the math yourself using the x, y, z, w components directly, ditching every operator overload. This gives a performance increase of about 20-50% in vector math.

    3. Mono.SIMD... Unfortunately it doesn't work in Unity. I once recreated vectors and quaternions with Mono.SIMD. However, later on I noticed that Unity actually doesn't support it. Pretty much a waste of time.
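
The first tip can be sketched roughly as follows, in C++ for illustration. This is not the poster's implementation and not Unity's internals. It assumes a hypothetical flat hierarchy (`Hierarchy`, `local`, `parent` are made-up names) that stores only local offsets, handles translation only, and keeps parents before their children in the array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical flat hierarchy: node i stores only its local offset and the
// index of its parent (-1 for a root). Translation only, for brevity.
struct Hierarchy {
    std::vector<Vec3> local;
    std::vector<int> parent;

    // Per-access cost: every call walks the chain up to the root again.
    Vec3 world_position(std::size_t i) const {
        Vec3 w = local[i];
        for (int p = parent[i]; p >= 0; p = parent[static_cast<std::size_t>(p)]) {
            const Vec3& l = local[static_cast<std::size_t>(p)];
            w.x += l.x;
            w.y += l.y;
            w.z += l.z;
        }
        return w;
    }

    // Batched alternative: one pass over all nodes, reusing each parent's
    // already computed world position. Assumes parents precede children.
    std::vector<Vec3> all_world_positions() const {
        std::vector<Vec3> world(local.size());
        for (std::size_t i = 0; i < local.size(); ++i) {
            world[i] = local[i];
            if (parent[i] >= 0) {
                const std::size_t p = static_cast<std::size_t>(parent[i]);
                assert(p < i);  // ordering assumption stated above
                world[i].x += world[p].x;
                world[i].y += world[p].y;
                world[i].z += world[p].z;
            }
        }
        return world;
    }
};
```

Reading `world_position` for every node costs time proportional to the node's depth on each call, while the batched pass touches each node once. A real transform would also carry rotation and scale, which this sketch leaves out.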
     
  3. Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Cool, so that's a +1.

    But the core of Unity runs on C++, so it could take advantage of native platform-specific SIMD features; C# would just be passing in the values to use. It would even allow for the possibility of processing these updates on GPUs, e.g. via OpenCL/DirectCompute.
     
  4. Sharp-Development

    Joined:
    Nov 14, 2013
    Posts:
    353
    Sure, though I was talking about managed SIMD in user code right there. ;)
     
  5. Brainswitch

    Joined:
    Apr 24, 2013
    Posts:
    270
    Unity's version of Mono supports an old version of Mono.SIMD, at least on Windows+x86. But because it is an old version, a lot of features are missing, and since you need to convert between Mono.SIMD and Unity classes and structs, the performance is only rarely higher (and quite often lower). It also has a tendency to crash.
     
  6. Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Just to clarify the SIMD issue: I'm not referring to the C# library.

    With batching, the work would be done within the Unity C++ game engine, which can therefore use an appropriate platform-specific SIMD instruction set or even offload the work onto the GPU.

    In effect, this would provide Unity with a very good way to optimise the batch performance of Vector, Quaternion and Matrix operations.
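
As a rough illustration of what "platform-specific SIMD inside the C++ engine" could mean, the sketch below adds two flat float arrays using SSE intrinsics when the compiler reports SSE support, and a plain scalar loop otherwise. The function name `add_in_place` and the flat `x0, y0, z0, x1, ...` layout are assumptions made for the example, not anything Unity exposes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics, only included where available
#endif

// Adds b to a element-wise, in place. Both are flat float arrays, e.g.
// packed vector components (x0, y0, z0, x1, y1, z1, ...).
void add_in_place(std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::size_t i = 0;
#if defined(__SSE__)
    // SIMD path: process four floats per instruction (unaligned loads).
    for (; i + 4 <= a.size(); i += 4) {
        const __m128 va = _mm_loadu_ps(&a[i]);
        const __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&a[i], _mm_add_ps(va, vb));
    }
#endif
    // Scalar path: handles the tail, or everything when SSE is unavailable.
    for (; i < a.size(); ++i) {
        a[i] += b[i];
    }
}
```

The caller sees one function regardless of platform. An engine could select a different instruction set (or a GPU path) behind the same batched call.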
     
  7. Sharp-Development

    Joined:
    Nov 14, 2013
    Posts:
    353
    Oh really? That's interesting, since I've actually used the Mono.SIMD library that comes with Unity.
     
  8. Ferb

    Joined:
    Jan 4, 2014
    Posts:
    25
    I've spent a lot of today looking into this. When you used it, did you ever run any benchmarks to see whether it seemed to be working, and on what platforms? The Mono project page currently lists support for SIMD instructions on ARM processors as a low-priority task, so it can't make a difference on most mobiles.
     
  9. Sharp-Development

    Joined:
    Nov 14, 2013
    Posts:
    353
    Yes, to be honest, I did benchmarks after my comment above, and it turned out that there is in fact absolutely no performance benefit. Mono.SIMD uses some kind of fallback in case SIMD is not supported. When I benchmarked, it seemed that no SIMD instructions were executed, only the fallbacks, which results in zero performance increase.
     
  10. hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Transforms have a lot of overhead; you could try using command buffers in Unity 5 and see if that's an improvement. A transform has to worry about its children, the hierarchy and so forth, and it evaluates the entire hierarchy on each access, including whatever math is required.
     
  11. Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    @hippocoder So my idea of Relative Staticness could boost Unity performance, since an object with static child components could be moved by manipulating a single transform.
     
  12. hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Moving an object with static child components doesn't make sense for a transform (it isn't a child then), but it can be approximated by calculating said child object transforms in C# instead. Command buffers or drawmesh simplify this process, assuming you want a bunch of child 'objects' to not have transforms and to be drawn without the transform overhead.

    Naturally, calculating this yourself might well be faster, since you can apply various optimisations based on what you need, cutting out a lot of fluff.

    It's a bit ambiguous what you want, but it's possible to code it in Unity right now - just not using transforms.
     
  13. Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    With Relative Static you are letting Unity know that the sub-components will not be 'moved' in relation to each other, so their meshes can be baked into the non-static parent, improving performance.

    You're right, but it does give you editor and even runtime flexibility. And any game where objects are combined into a solid compound object would benefit from a performance boost.

    e.g. Upgradable vehicles, weapons or Units could have multiple types of armour or components. These could be added as static sub components and baked into the Unit, reducing the calculations needed to transform the Units.

    You could combine sub objects at runtime and apply static to them knowing that Unity will combine them into a single transform for performance.
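
The baking idea can be sketched like this, in C++ for illustration. It is a simplified, hypothetical model (`BakedObject`, `bake_child`, `move` and `world_vertex` are made-up names, and only translation is handled): child geometry is merged once into the parent's local space, after which moving the whole compound object touches a single transform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical compound object: children are baked into the parent once,
// so only the parent's position changes per frame.
struct BakedObject {
    Vec3 position{0.0f, 0.0f, 0.0f};  // the single transform that moves
    std::vector<Vec3> vertices;       // all geometry, in parent-local space

    // Bake a child mesh that sits at a fixed offset relative to the parent.
    // After this, the child has no transform of its own.
    void bake_child(const std::vector<Vec3>& child_vertices, Vec3 child_offset) {
        for (const Vec3& v : child_vertices) {
            vertices.push_back(Vec3{v.x + child_offset.x,
                                    v.y + child_offset.y,
                                    v.z + child_offset.z});
        }
    }

    // Moving the compound object is one update, however many children
    // were baked in.
    void move(Vec3 delta) {
        position.x += delta.x;
        position.y += delta.y;
        position.z += delta.z;
    }

    // World-space vertex, derived from the single parent position.
    Vec3 world_vertex(std::size_t i) const {
        assert(i < vertices.size());
        const Vec3& v = vertices[i];
        return Vec3{v.x + position.x, v.y + position.y, v.z + position.z};
    }
};
```

The trade-off is the one raised earlier in the thread: once baked, a child can no longer move relative to its parent without re-baking.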
     