Built in Math still slow! (faster implementations here)

Discussion in 'Scripting' started by half_voxel, Nov 1, 2010.

  1. half_voxel

    half_voxel

    Joined:
    Oct 20, 2007
    Posts:
    978
    Some of you may know that a while ago, another user ran some tests and found that some math functions (primarily the Lerp functions) were slower than they needed to be. He was able to write faster implementations that did the same thing as the built-in ones. This resulted in Unity updating the built-in functions to run faster.

    I have run some tests to see if I could write faster implementations of some other Mathf functions, and it turned out that I could.

    Abs (float) - 86% faster
    Abs (int) - 105% faster
    Sign (int) - 60% faster
    Clamp01 (int) - 63% faster
    Repeat - 291% faster
    Approximately - 5209% faster!

    The Approximately function in particular could get super fast! It seems to have become slower since 2.6, because I ran the same test script on 2.6 and there my implementation was "only" something like 250% faster.
    It almost seems like I must have written faster, but incorrect code.

    It looks like it is mostly the functions operating on ints that are a lot slower than they could be.

    Here's a webplayer if you want to see for yourself - http://www.arongranberg.com/unity/mathf-performance-test/
    The script executes the functions 50000 times and measures the time it took for them to execute, then it compares it to my implementation and calculates a % value (100% equals no difference).
    All values are random (refreshed every loop) and range from -100000 to 100000... I think.
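
    The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the attached script: the names MathBench, Measure and Percent are hypothetical, and the percentage formula (built-in time divided by custom time) is an assumption based on "100% equals no difference".

```csharp
using System;
using System.Diagnostics;

public static class MathBench
{
    // Runs `body` the given number of times and returns the elapsed milliseconds.
    // Note: the delegate call itself adds overhead to both sides of the comparison.
    public static double Measure(Action body, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            body();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Assumed convention: 100 means no difference, 200 means the custom
    // version needed half the time of the built-in one.
    public static double Percent(double builtInMs, double customMs)
    {
        return builtInMs / customMs * 100.0;
    }
}
```

    A call such as MathBench.Measure(() => Math.Abs(-3.5f), 50000) would time one candidate; timing each candidate several times and discarding the first run tends to give steadier numbers.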

    I have attached the script if anyone wants to take a look (had to attach it as .txt because the uploader didn't want to upload a .cs file).
     

    Last edited: Nov 1, 2010
  2. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Abs, Sign, Clamp (int), Approximately and Repeat are faster here; the rest are slower.
     
  3. dgutierrezpalma

    dgutierrezpalma

    Joined:
    Dec 22, 2009
    Posts:
    404
    Sturestone, can you test it on other hardware?

    I remember that 2-3 months ago I was trying to optimize a low-level C algorithm and I tried two different optimizations: one of them was much faster on my current desktop and the other one was faster on my 5-year-old laptop. So it is possible that your optimizations are faster on your computer but slower on other computers.
     
  4. half_voxel

    half_voxel

    Joined:
    Oct 20, 2007
    Posts:
    978
    I have tested it on the two computers I have available, one old iMac and one new iMac. The implementations I wrote about in the first post were faster on both.

    I posted a link to a webplayer so you can test it yourself, post your results!
     
  5. dgutierrezpalma

    dgutierrezpalma

    Joined:
    Dec 22, 2009
    Posts:
    404
    I'm at my Ubuntu desktop at this moment, so I can't check it now (no webplayer for Linux :(). I'll try it later from my MacBook Pro ;)
     
  6. Chris-Sinclair

    Chris-Sinclair

    Joined:
    Jun 14, 2010
    Posts:
    1,326
    Here are my results, a bit of a mixed bag of improvements and losses:


    I'm on:
    Windows XP Service Pack 3, 32bit
    Intel Core 2 E8400 (3.0 GHz)
     
  7. half_voxel

    half_voxel

    Joined:
    Oct 20, 2007
    Posts:
    978
    Yup, I'm glad not all of my implementations were faster; otherwise Unity would have done a really bad job with Mathf.
     
  8. xomg

    xomg

    Joined:
    Sep 27, 2010
    Posts:
    330
    Well that's depressing to hear, especially for something like max() on an integer. I'd guess/hope there'll be reasons for this beyond poor implementation.
     
  9. Chris-Sinclair

    Chris-Sinclair

    Joined:
    Jun 14, 2010
    Posts:
    1,326
    I think your clamp methods can be faster if you put a single line like:

    return a > c ? c : a < b ? b : a;

    (similarly for Clamp01)

    You avoid the re-assignment of "a" and the extra checks if "a" is greater than "c". Although, I don't know the current Unity behaviour if the user supplies garbage (min greater than max).
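
    Written out as full methods, the single-expression clamp suggested above might look like this. It is a sketch only: the class name ClampSketch and the parameter names are illustrative, and it makes no claim about how Mathf.Clamp is implemented.

```csharp
public static class ClampSketch
{
    // Single-expression clamp: the upper bound is checked first, so a value
    // above max returns immediately without being compared against min.
    public static float Clamp(float value, float min, float max)
    {
        return value > max ? max : value < min ? min : value;
    }

    // Same idea with the bounds fixed to 0 and 1.
    public static float Clamp01(float value)
    {
        return value > 1f ? 1f : value < 0f ? 0f : value;
    }
}
```

    With garbage input (min greater than max) this version lets the upper-bound check win: ClampSketch.Clamp(5f, 10f, 0f) returns 0.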

    And the Lerp method could probably be made faster if the Mathf.Clamp call was skipped and its content copy/pasted into the method (avoid invoking the nested method), but that might not be the case unless the custom Clamp logic can be made equal/faster than the built-in Mathf.
     
    Last edited: Nov 1, 2010
  10. half_voxel

    half_voxel

    Joined:
    Oct 20, 2007
    Posts:
    978
    Thanks FizixMan

    This is what I am getting now.
    My implementations seem to be at least as fast as the built-in ones now.

    Updated the webplayer.
     

  11. shawn

    shawn

    Unity Technologies

    Joined:
    Aug 4, 2007
    Posts:
    552
    Hey! Thanks for looking into this. :)

    Some comments:

    Code (csharp):
    Abs (float)
    Abs (int)
    Unity calls Mono's Math.Abs internally, so there is added method-call overhead. This is silly, so tomorrow I'm going to implement it so that we don't waste time by calling into Mono.

    Code (csharp):
    Sign (float)
    Clamp01 (float)
    Unity's implementation is faster because you are comparing floats against ints and then casting the clamped results from int to float.

    Code (csharp):
    Sign (int)
    Clamp01 (int)
    Your implementation is faster because we don't have overloads for ints, so there are implicit casts. I'm going to add overloads tomorrow.
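
    For illustration, int versions that avoid the int-to-float conversion could look like the sketch below. The class name IntOverloads is hypothetical and this is not Unity's code; returning 1 for zero follows the documented behaviour of Mathf.Sign.

```csharp
public static class IntOverloads
{
    // Stays in integer arithmetic, so no implicit int -> float cast is needed.
    // Returns 1 for zero and positive values, -1 for negative values.
    public static int Sign(int value)
    {
        return value >= 0 ? 1 : -1;
    }

    // Integer clamp to the range [0, 1].
    public static int Clamp01(int value)
    {
        return value > 1 ? 1 : value < 0 ? 0 : value;
    }
}
```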

    Code (csharp):
    Clamp (float)
    Clamp (int)
    Clamp01 (float)
    Clamp01 (int)
    All of your clamps are doing an unnecessary ternary operation if the value is greater than the max.

    Code (csharp):
    Min (float)
    Min (int)
    Max (float)
    Max (int)
    Lerp (float)
    These are identical to our implementations. And your test shows approx 100% on these, so that makes sense.

    Code (csharp):
    Approximately (float)
    Your implementation degrades at larger values. For example, if both a and b are very big but differ by only one bit, they'd logically be approximately the same. Your implementation would likely return false, since adding or subtracting such a small constant has no effect at such a coarse floating-point resolution.

    Also, it's very dangerous for us to change Approximately, since it may break backwards compatibility.
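
    The large-value problem can be illustrated with two small sketches. Both are assumptions made for illustration: FixedEpsilon stands in for a fixed-tolerance check (the actual test script is not shown in the thread), and RelativeEpsilon is one common way to scale the tolerance, not necessarily what Unity ships.

```csharp
using System;

public static class ApproxSketch
{
    // Hypothetical fixed-tolerance comparison: fine near 1.0, but far too
    // strict once adjacent floats are more than 1e-6 apart.
    public static bool FixedEpsilon(float a, float b)
    {
        return Math.Abs(a - b) < 1e-6f;
    }

    // Tolerance scaled by the magnitude of the larger operand.
    public static bool RelativeEpsilon(float a, float b)
    {
        float scale = Math.Max(Math.Abs(a), Math.Abs(b));
        return Math.Abs(a - b) <= 1e-6f * scale;
    }
}
```

    Around 1e8 neighbouring floats are 8 apart, so FixedEpsilon(1e8f, 100000008f) is false even though the two values differ by a single bit, while RelativeEpsilon accepts them. A purely relative check is very strict near zero, so a small absolute floor is often combined with it.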

    Code (csharp):
    Repeat (float)
    Your implementation does not work with negative values.
     
  12. Eric5h5

    Eric5h5

    Volunteer Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,401
    As long as you're improving math functions, can you take a look at SpeedLerp? Mathf.Lerp, Mathf.InverseLerp, Mathf.SmoothStep, Vector2.Lerp, Vector3.Lerp, Vector4.Lerp, and Color.Lerp are all consistently slower than my own implementations across G5, Xeon, and ARM processors. It would be nice if you could make that script obsolete. ;)

    --Eric
     
  13. shawn

    shawn

    Unity Technologies

    Joined:
    Aug 4, 2007
    Posts:
    552
    Mind sharing your script?

    InverseLerp and SmoothStep are a bit more involved, so I'll need to spend some time looking at our implementation, but quickly looking over our Lerps I'm not sure how they can be sped up. I'm curious what you're doing to make them faster.
     
  14. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Pretty great to see Unity dev team interested in the little optimisations too :) keep it up!
     
  15. Eric5h5

    Eric5h5

    Volunteer Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,401
    Yep, I linked to it in my previous message (SpeedLerp).

    --Eric
     
  16. shawn

    shawn

    Unity Technologies

    Joined:
    Aug 4, 2007
    Posts:
    552
    Oh, I didn't notice the underline. Maybe we should make that a bit more readable :)
     
    Last edited: Nov 1, 2010
  17. shawn

    shawn

    Unity Technologies

    Joined:
    Aug 4, 2007
    Posts:
    552
    Looks like the only difference between the Lerps is that your implementation early outs when the value is outside of 0-1 range (instead of clamping) which is fair enough, but the worst case scenarios seem to be identical.
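
    The two Lerp strategies being compared can be sketched side by side. Neither method is copied from Unity or from SpeedLerp; the class and method names are made up for illustration.

```csharp
public static class LerpSketch
{
    // Clamp-then-interpolate: t is clamped inline to [0, 1] and the
    // interpolation is always evaluated.
    public static float LerpClamped(float from, float to, float t)
    {
        float clamped = t > 1f ? 1f : t < 0f ? 0f : t;
        return from + (to - from) * clamped;
    }

    // Early-out: an out-of-range t returns an endpoint immediately and
    // skips the multiply-add. For t inside (0, 1) the work is the same.
    public static float LerpEarlyOut(float from, float to, float t)
    {
        if (t <= 0f) return from;
        if (t >= 1f) return to;
        return from + (to - from) * t;
    }
}
```

    Both return the same results; any speed difference would only show up when t frequently falls outside the 0-1 range.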
     
  18. Eagle32

    Eagle32

    Joined:
    Jun 20, 2010
    Posts:
    89
    i5 750
     

  19. half_voxel

    half_voxel

    Joined:
    Oct 20, 2007
    Posts:
    978
    Ah. I knew it was too fast to be correct ;)
    Just out of curiosity, how does your implementation work?

    And btw, why is U3's implementation of Approximately so much slower than U2.6's implementation?
    When running the test on U3 I get 5000%, but on 2.6 I get around 300%. It might be my implementation that magically got faster on U3, though.

    Ah, yeah, missed that.
    ≈200% faster is quite a lot though, I shall see if I can get it to work faster than yours and still get it to work with negative values.

    ....

    Okay, it seems like I can't find a faster implementation of Repeat.
    Here's what I got, it's running at about 90-100% on Unity 2.6 (haven't tested it on U3).

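    Code (csharp):
    // Wraps a into the range [0, b] using the floating-point % operator.
    public static float Repeat (float a, float b) {
        if (a < 0F) {
            // a % b is zero or negative here, so shift it up by b.
            return b + (a % b);
        }
        return a % b;
    }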
    It doesn't return exactly the same values as the built-in, though: for the inputs -20 and 5, Mathf returns 0 but my implementation returns 5. Both are within the range, though (0 <= 5 <= 5).
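
    The difference at exact negative multiples can be reproduced with the sketch below. ModRepeat mirrors the code above; FloorRepeat is a floor-based alternative written for comparison and is an assumption, not Unity's source.

```csharp
using System;

public static class RepeatSketch
{
    // Same logic as the %-based version above: for a = -20, b = 5 the
    // remainder is (negative) zero, so the result is b itself.
    public static float ModRepeat(float a, float b)
    {
        if (a < 0f)
        {
            return b + (a % b);
        }
        return a % b;
    }

    // Floor-based wrap: subtracts the largest multiple of length that does
    // not exceed t, which maps exact negative multiples to 0.
    public static float FloorRepeat(float t, float length)
    {
        return t - (float)Math.Floor(t / length) * length;
    }
}
```

    For the inputs -20 and 5, ModRepeat gives 5 and FloorRepeat gives 0; for -1 and 5 both give 4.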
     
    Last edited: Nov 2, 2010
  20. half_voxel

    half_voxel

    Joined:
    Oct 20, 2007
    Posts:
    978
    I would also comment on the documentation regarding Repeat, nothing big but anyway.

    It says:
    Doesn't the modulo operator work with floating point numbers?
    I am using it in my implementation, it works great.

    [Edit] Never mind about this, this is not the right place to comment on it
     
    Last edited: Nov 7, 2010
  21. shawn

    shawn

    Unity Technologies

    Joined:
    Aug 4, 2007
    Posts:
    552
    Because the 2.x implementation had the same problem that your implementation has, so it was rewritten for 3.0 (where we can break backwards compatibility).
     
  22. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Well you could have a Mathf3. class ;)
     
  23. zumwalt

    zumwalt

    Joined:
    Apr 18, 2007
    Posts:
    2,287
    [Attached image: 2010-11-07_1504_mathf.png]
    Started it, went out for a bit, came back, interesting results... :)
    I guess not everyone's math is very good; I know mine isn't.
     
  24. half_voxel

    half_voxel

    Joined:
    Oct 20, 2007
    Posts:
    978
    Yeah... that doesn't look very good, probably number overflow.
    But hey! My implementation was Infinity% faster in some cases :D
     
  25. Kamyker

    Kamyker

    Joined:
    May 14, 2013
    Posts:
    1,091
    10 years later, I did some tests for my FFT.

    Using Unity 2019.2, Burst, Mathematics, IL2CPP, x86_64 standalone Windows:

    Very weird results:

    Base to compare all of them:
    double d = Math.Sin(x);

    200ms

    float d = (float)Math.Sin(x);

    200ms but slightly slower in editor

    float d = Mathf.Sin(x);

    170ms but 230ms+ in editor

    Using the Unity.Mathematics version (using static Unity.Mathematics.math;):

    float d = sin(x);

    2200+ ms. That's not a typo; it's somehow very slow despite using the same code as Mathf.

    I wonder if MathF in the new .NET Standard 2.1 could be faster than all of them.
     
  26. tim_jones

    tim_jones

    Unity Technologies

    Joined:
    May 2, 2019
    Posts:
    287
    That Unity.Mathematics.math timing doesn't look right. When benchmarking Burst code, it's important to set CompileSynchronously=true, and throw away timing measurements from the first call, as described here

    https://docs.unity3d.com/Packages/com.unity.burst@1.3/manual/index.html#synchronous-compilation

    Specifically this part:

     
  27. Kamyker

    Kamyker

    Joined:
    May 14, 2013
    Posts:
    1,091
    Last edited: Mar 7, 2020