Unity Community


Page 1 of 2 12 LastLast
Results 1 to 20 of 26

  1. Location
    Paris
    Posts
    3,730

    The fastest string Contains/IndexOf (very very fast)

    UPDATE : the cake is a lie !
    see this post, thanks to the below discussion, this method is faster in a very particular case (and as a matter of fact I was always falling into that case), which itself can be treated as fast with an alternate better fitting solution (see NPFS3000 post).
    So this thread still have the fastest solution, but it's not in this OP ^^

    _______________________________

    Hello,

    Vet coders will undoubtedly know this, but this is something I found a while ago after trying to milk the fastest possible string Contains. So it might help some people who didn't come to be aware of that.
    In Unity, String comparisons and operations are much needed for animationClip-heavy games, or even just for complicated state machines. Actually, the default string.Contains() or IndexOf() are horribly slow.

    But it comes to a simple option to make it far much faster. Just ignoring culture and casing by turning it to an IndexOf test, while putting "System.StringComparison.OrdinalIgnoreCase" in the function (as Contains() doesn't have this option afaik). Unless you have tons of different state names / animationClips, you don't need to separate them by case. So here we go.


    Test 1 (native Contains method) :

    Code:  
    1. for (int i = 0; i < 100000; i++)
    2. return _clipName.Contains("VeryLongName");

    Test 2 (Contains OrdinalIgnoreCase method) :

    Code:  
    1. for (int i = 0; i < 100000; i++)
    2. return _clipName.IndexOf("VeryLongName", System.StringComparison.OrdinalIgnoreCase) != -1;


    Test 1 : 16 ms.
    Test 2 : 4 ms.

    So "emulating" a Contains() by using an IndexOf with OrdinalIgnoreCase is 400% faster than the builtin Contains().

    Remember it's tested on a PC (Intel i7 920), so you can expect 10 to 30 times longer durations on mobile.
    Last edited by n0mad; 03-07-2012 at 01:54 AM.

  2. Cbp Cbp is offline

    Location
    Copenhagen
    Posts
    4
    Hey,

    How many times did you run that test, they should be very close to each other since Mono string contains is implemented using the IndexOf like this

    Code:  
    1. public bool Contains (String value)
    2. {
    3.     return IndexOf (value) != -1;
    4. }

    https://github.com/mono/mono/blob/ma...stem/String.cs


  3. Location
    California
    Posts
    1,657
    Quote Originally Posted by Cbp View Post
    How many times did you run that test

    Not enough to be significant and if that is the code he used to test, then it is wrong.


  4. Location
    California
    Posts
    1,657
    I am sure there is a mistake in here, but this is my code and my results:
    Code:  
    1. using System;
    2. using System.Collections.Generic;
    3. using System.Linq;
    4. using System.Text;
    5. using System.Diagnostics;
    6.  
    7. namespace ConsoleApplication1
    8. {
    9.     class Program
    10.     {
    11.         static string _clipNameShort = "VeryShortName";
    12.         static string _clipNameLong = "VeryLongName";
    13.         static bool _resultBool;
    14.         static int _resultIndex;
    15.  
    16.         static void Main(string[] args)
    17.         {
    18.             const int PermittedRuns = 10000000;
    19.             Stopwatch sw;
    20.  
    21.             sw = Stopwatch.StartNew();
    22.             for (int i = 0; i < PermittedRuns; i++)
    23.             {
    24.                 _resultBool = _clipNameShort.Contains("VeryLongName");
    25.             }
    26.  
    27.             sw.Stop();
    28.             Console.WriteLine(String.Format("Short Time Contains Not Found= {0:N0}", sw.ElapsedMilliseconds));
    29.  
    30.  
    31.             sw = Stopwatch.StartNew();
    32.             for (int i = 0; i < PermittedRuns; i++)
    33.             {
    34.                 _resultBool = _clipNameLong.Contains("VeryLongName");
    35.             }
    36.  
    37.             sw.Stop();
    38.             Console.WriteLine(String.Format("Long Time Contains Found = {0:N0}", sw.ElapsedMilliseconds));
    39.  
    40.  
    41.             sw = Stopwatch.StartNew();
    42.             for (int i = 0; i < PermittedRuns; i++)
    43.             {
    44.                 _resultIndex = _clipNameShort.IndexOf("VeryLongName");
    45.             }
    46.  
    47.             sw.Stop();
    48.             Console.WriteLine(String.Format("Short Time IndexOf Not Found = {0:N0}", sw.ElapsedMilliseconds));
    49.  
    50.  
    51.             sw = Stopwatch.StartNew();
    52.             for (int i = 0; i < PermittedRuns; i++)
    53.             {
    54.                 _resultIndex = _clipNameLong.IndexOf("VeryLongName");
    55.             }
    56.  
    57.             sw.Stop();
    58.             Console.WriteLine(String.Format("Long Time IndexOf Found = {0:N0}", sw.ElapsedMilliseconds));
    59.  
    60.             sw = Stopwatch.StartNew();
    61.             for (int i = 0; i < PermittedRuns; i++)
    62.             {
    63.                 _resultBool = _clipNameShort.IndexOf("VeryLongName", System.StringComparison.OrdinalIgnoreCase) != -1;
    64.             }
    65.  
    66.             sw.Stop();
    67.             Console.WriteLine(String.Format("Short Time IndexOf Ignore Case Not Found = {0:N0}", sw.ElapsedMilliseconds));
    68.  
    69.             sw = Stopwatch.StartNew();
    70.             for (int i = 0; i < PermittedRuns; i++)
    71.             {
    72.                 _resultBool = _clipNameLong.IndexOf("VeryLongName", System.StringComparison.OrdinalIgnoreCase) != -1;
    73.             }
    74.  
    75.             sw.Stop();
    76.             Console.WriteLine(String.Format("Long Time IndexOf Ignore Case Found = {0:N0}", sw.ElapsedMilliseconds));
    77.  
    78.         }
    79.     }
    80. }
    And the results:
    Code:  
    1. Short Time Contains Not Found= 573
    2. Long Time Contains Found = 611
    3. Short Time IndexOf Not Found = 975
    4. Long Time IndexOf Found = 1,041
    5. Short Time IndexOf Ignore Case Not Found = 1,461
    6. Long Time IndexOf Ignore Case Found = 819
    The results are interesting, and in complete conflict with the OP, so I am curious to know what the OP is seeing and what deviation from expected results of the 400% speedup I am getting.

    @n0mad: Can you post the full block of code you used to test the speed?
    Last edited by JustinLloyd; 03-06-2012 at 11:11 AM.


  5. Location
    Paris
    Posts
    3,730
    Hello and thanks for your interest
    The full block are just in the OP, I intended to strip it from any "noise" by only limiting it to function, not attribution.

    I put a System.Diagnostics.Stopwatch before the test, and read the result after the loop.

    Full code would be :
    Test 1 :
    Code:  
    1. string _clipName = "s_LP";
    2. sw = new Stopwatch();
    3. sw.Start();
    4. for (int i = 0; i < 100000; i++)
    5.   _clipName.Contains("VeryLongName");
    6. sw.Stop();
    7. Debug.Log(sw.ElapsedMilliseconds);

    Test 2 :
    Code:  
    1. string _clipName = "s_LP";
    2. sw = new Stopwatch();
    3. sw.Start();
    4. for (int i = 0; i < 100000; i++)
    5.   _clipName.IndexOf("VeryLongName", System.StringComparison.OrdinalIgnoreCase) != -1;
    6. sw.Stop();
    7. Debug.Log(sw.ElapsedMilliseconds);


    I ran the tests several times, and results are always the same. It's interesting to see that you have different results though ?
    Also, @Cbp, I insist on using OrdinalIgnoreCase, if it's not specified, speed is the same for IndexOf and Contains.

    edit : what's more intriguing about your results Justin is that ignoring case should always be faster, in theory ?

    p.s : _clipName in my tests would be any incoming clip, and never contained "VeryLongName" in any case. I didn't think it would have its importance, but it seems it has after your results.
    Last edited by n0mad; 03-06-2012 at 03:50 PM.

  6. Cbp Cbp is offline

    Location
    Copenhagen
    Posts
    4
    edit : what's more intriguing about your results Justin is that ignoring case should always be faster, in theory ?
    ignoring case Will always be slower, since it has to make both strings to uppercase and then compare them, each char is compared with it's byte value, and therefore the comparison will be the same speed as case sensitive matching.

    p.s : _clipName in my tests would be any incoming clip, and never contained "VeryLongName" in any case. I didn't think it would have its importance, but it seems it has after your results.
    the indexOf only return the first match and should return as soon as this happens, so it does not need to compare the rest of the string, and will be faster, im not total sure it does this tho.


  7. Location
    Gold Coast, Australia
    Posts
    3,593
    Code:  
    1.        static void Main(string[] args)
    2.         {
    3.             bool result;
    4.             string _clipName = "NeryLongNamee";  //vary the length of this
    5.  
    6.             int iterations = 1000000;
    7.  
    8.             var sw = new Stopwatch();
    9.  
    10.             sw.Start();
    11.             for (int i = 0; i < iterations; i++)
    12.                 result = _clipName.Contains("VeryLongName");
    13.             sw.Stop();
    14.  
    15.             Console.WriteLine(sw.ElapsedMilliseconds);
    16.  
    17.             sw = new Stopwatch();
    18.  
    19.             sw.Start();
    20.             for (int i = 0; i < iterations; i++)
    21.                 result = _clipName.IndexOf("VeryLongName", System.StringComparison.OrdinalIgnoreCase) != -1;
    22.             sw.Stop();
    23.  
    24.             Console.WriteLine(sw.ElapsedMilliseconds);
    25.  
    26.             Console.ReadLine();
    27.         }

    After 30s of experimentation I deduced the following:

    If the string we are searching is longer than the string we are looking for, IndexOf is much faster. If it is the same it's a bit slower. If it's shorter than it's much slower.

    For example searching the string "NeryLongNamwe" 1Mn times Contains takes 60ms while indexof takes 160!


    Want the best performance? Add in a short circuit:

    Code:  
    1. result = _clipName.Length < "VeryLongName".Length ? false : _clipName.Contains("VeryLongName");

    Fastest in all tests.


  8. Location
    California
    Posts
    1,657
    Well their string search code uses those there fancy algorithms. The more variation of letters being searched, the faster. The longer the string being searched for, the faster. The longer the string being searched through, the faster (in terms of relative comparison.) IIRC .NET uses something like KMP or Boyer-Moore -- too lazy to look it up at this time. KMP is an algorithm that almost all programmers eventually discover independently and still one of the fastest general purpose string search algorithms.


  9. Posts
    3,768
    I know that you need to use strings for animationClips and so on, but I still don't see this having a huge impact. The string based stuff doesn't have to happen every frame, does it? The only reliance on strings is when you need to change the state of animation(s) on an object, right?


  10. Location
    Paris
    Posts
    3,730
    @NPSF3000 : thanks for the tip ! That's a very good idea

    @Cbp : I tested the stuff and yes, ignoring case is effectively a bit slower .... I was misguided by some other dev blogs who ran some similar tests than the one I posted in the OP :/

    @angrypenguin : string searching can be very intensive for complex state machines and AIs. For example, in my fight AI, some behaviors cannot be calculated without scanning opponent's potential actions, all depending on a lot of potential parameters. I already organized them with enums, but there are tons of different action names, so besides the basic "action family" sorting, I still have to make the AI analyze action names if I want it to be accurate enough for a hardcore audience.
    This routine is run every 1 to 2 seconds for performance optim, but ideally I'd like it to run every 100 ms (which is the average hardcore fighting gamer reaction time). In the end, this can lead into some serious string work, so milking for the best optimization possible is necessary, to avoid interference with all the already CPU consuming tasks (like drawcall, engine, player input responsivity).

    ______________________

    Ok I ran some deeper tests too, and it seems that NPSF3000 conclusion is the right one, but OrdinalIgnoreCase does have some faster cases :

    Not found, short pattern (< 10 chars) :
    - short pattern searched in long string; pattern not found : IndexOf slightly faster than Contains
    - short pattern searched in short string; pattern not found : IndexOf equal to Contains, IndexOf OrdinalIgnoreCase faster than both

    Found, short pattern (< 10 chars) :
    - short pattern searched in long string; pattern found : IndexOf equal to Contains
    - short pattern searched in short string; pattern found : IndexOf slightly faster than Contains

    Not found, long pattern (> 20 chars) :
    - long pattern searched in long string; pattern not found : IndexOf slightly faster than Contains, IndexOf OrdinalIgnoreCase faster than both
    - long pattern searched in short string; pattern not found : IndexOf equal to Contains, IndexOf OrdinalIgnoreCase much faster than both (the 400% case, but well searching a long pattern in a short string is automatically false by definition)

    Found, long pattern (> 20 chars) :
    - long pattern searched in long string; pattern found : IndexOf slightly faster than Contains.

    In all the cases where OrdinalIgnoreCase wasn't specified, it was twice slower than IndexOf/Contains default methods.
    Where OrdinalIgnoreCase is faster, it's only 20% faster.

    Conclusion : I should delete the OP asap IndexOf and Contains are generally prefered to OrdinalIgnoreCase.
    Initially my tests were run on any incoming animation name, and it was always returning the 400% faster result. This can be explained by the fact that all my anim names are using short names, except some special ones, which are the ones containing the patterns I search.
    So my case was very, very specific, and lead to think that OrdinalIgnoreCase would be more performant as I would always fall in the case "long pattern in short string", but NPSF3000 got a far better solution posted above.

    My apologies for the misguiding results in the OP ! And thank you all for having taken the time to try it yourself and correct me Now I should spread the info into all those blogs I saw a while ago having the same results as in the OP ...
    Last edited by n0mad; 03-07-2012 at 01:48 AM.


  11. Location
    Paris
    Posts
    3,730
    Additionally, while we're at string operations optimization, I also found a script on Stackoverflow that is faster than the base string.Replace().
    Once again, I tested it and it was faster, but I may have fallen into another ultra specific case as the one above.
    So if anybody is interested in testing it, or just using it, here it is :

    Code:  
    1. static public string Replace(string original, string pattern, string replacement, int stringBuilderInitialSize)
    2.     {
    3.         if (original == null)
    4.         {
    5.             return null;
    6.         }
    7.  
    8.         if (String.IsNullOrEmpty(pattern))
    9.         {
    10.             return original;
    11.         }
    12.  
    13.  
    14.         int posCurrent = 0;
    15.         int lenPattern = pattern.Length;
    16.         int idxNext = original.IndexOf(pattern, StringComparison.OrdinalIgnoreCase);
    17.         StringBuilder result = new StringBuilder(stringBuilderInitialSize < 0 ? Math.Min(4096, original.Length) : stringBuilderInitialSize);
    18.  
    19.         while (idxNext >= 0)
    20.         {
    21.             result.Append(original, posCurrent, idxNext - posCurrent);
    22.             result.Append(replacement);
    23.  
    24.             posCurrent = idxNext + lenPattern;
    25.  
    26.             idxNext = original.IndexOf(pattern, posCurrent, StringComparison.OrdinalIgnoreCase);
    27.         }
    28.  
    29.         result.Append(original, posCurrent, original.Length - posCurrent);
    30.  
    31.         return result.ToString();
    32.     }


  12. Posts
    3,768
    Are you dynamically adding new states to these state machines/AIs?

    Even if you are, if performance is key then I'd strongly consider running some kind of pre-pass to convert all of those strings into ints (or similar) for use once the real-time part has started. If performance is important, why work with strings at all?


  13. Location
    Paris
    Posts
    3,730
    Quote Originally Posted by angrypenguin View Post
    Are you dynamically adding new states to these state machines/AIs?

    Even if you are, if performance is key then I'd strongly consider running some kind of pre-pass to convert all of those strings into ints (or similar) for use once the real-time part has started. If performance is important, why work with strings at all?
    There are 510 animations per character, dynamically converting them at launch would take forever.
    And because there are 510 to 700 anim names per char, I on't have any other choice but to work with strings.
    I'm not even overdoing the number of different anims, it's just the base 4 punches/kicks types (Light/Fierce), declined into basic fighting game moves (close, far, crouch, jump, specials, hit, crouch hit, etc).

    Plus, now that my engine is written, I would have to convert tons and tons of string tests to enums. It's impossible.

    The other reason why I chosed strings over enums is that they let me use concatenation for simpler action triggering routines.
    Last edited by n0mad; 03-07-2012 at 04:10 AM.


  14. Location
    Norway
    Posts
    152
    Quote Originally Posted by n0mad View Post
    there are tons of different action names
    Maybe the Rabin-Karp algorithm is a better choice for you if what you actually need is a multi pattern search. Wikipedia has a list of string search algorithms that you can take a look at to see if there might be a better choice for you.


  15. Location
    Paris
    Posts
    3,730
    Quote Originally Posted by steego View Post
    Maybe the Rabin-Karp algorithm is a better choice for you if what you actually need is a multi pattern search. Wikipedia has a list of string search algorithms that you can take a look at to see if there might be a better choice for you.
    Thanks for the tip, that's a great link !


  16. Location
    Sweden
    Posts
    1,415
    I am curious, no matter what your results were, what n0mad ment by this:

    Quote Originally Posted by n0mad
    So it might help some people who didn't come to be aware of that.
    In Unity, String comparisons and operations are much needed for animationClip-heavy games, or even just for complicated state machines. Actually, the default string.Contains() or IndexOf() are horribly slow.
    My Open Source Projects: UdpKit, IronJS, Free Unity Assets
    Need unity consulting? Contact me over PM or Here


  17. Location
    Paris
    Posts
    3,730
    Quote Originally Posted by fholm View Post
    I am curious, no matter what your results were, what n0mad ment by this:
    I ment this :


    Quote Originally Posted by n0mad View Post
    There are 510 animations per character, dynamically converting them at launch would take forever.
    And because there are 510 to 700 anim names per char, I on't have any other choice but to work with strings.
    I'm not even overdoing the number of different anims, it's just the base 4 punches/kicks types (Light/Fierce), declined into basic fighting game moves (close, far, crouch, jump, specials, hit, crouch hit, etc).

    Plus, now that my engine is written, I would have to convert tons and tons of string tests to enums. It's impossible.

    The other reason why I chosed strings over enums is that they let me use concatenation for simpler action triggering routines.


  18. Location
    Sweden
    Posts
    1,415
    Wait, you did one animation for "punch jump" and one for "punch standing"? or am I miss understanding something? having 500+ animations seems... excessive to say the least.
    My Open Source Projects: UdpKit, IronJS, Free Unity Assets
    Need unity consulting? Contact me over PM or Here


  19. Location
    Paris
    Posts
    3,730
    Quote Originally Posted by fholm View Post
    Wait, you did one animation for "punch jump" and one for "punch standing"?
    Exactly

    - (Light | Fierce) + (stand | crouch | jump | closeStand) + (Punch | Kick)
    - combos (average of 3 anims x 6 x 8 Martial Arts, as seen in the latest post of Kinetic Damage's thread, there's 10 min of video)
    - specials (4 per 8 Martial Arts, each containing 2 - 15 different animations, sometimes more)
    - grabs (1 per Art, with (launchGrab | DoGrab | FailGrab) + (enemyGrabReaction | enemyGrabRejection))
    - and hit animations : (Light | Fierce | Falling) + (stand | crouch | jump) + (direction of hit : top | left | right | down), plus other misc anims like standing up/stun/etc, all divived into separate splits to enable refactoring.

    And they're all different, so I couldn't use Anim Mixing (or it would have been situational, so I ran for a full non-mixing, non-anim-reusing approach).

    These are not special at all, it's the basis for all 1vs1 fighting games.

    edit : I just counted, there are 1694 AnimationClips in total.
    Last edited by n0mad; 03-07-2012 at 06:58 AM.


  20. Location
    Germany
    Posts
    1,215
    Quote Originally Posted by n0mad View Post
    @angrypenguin : string searching can be very intensive for complex state machines and AIs. For example, in my fight AI, some behaviors cannot be calculated without scanning opponent's potential actions, all depending on a lot of potential parameters. I already organized them with enums, but there are tons of different action names, so besides the basic "action family" sorting, I still have to make the AI analyze action names if I want it to be accurate enough for a hardcore audience.
    1. What kind of situation are you exactly taking about or what do the (productive, not just some test code for synthetical tests) code looks like? In which situation you need to compare many strings and for what reasons. If you don't mind posting an actuall piece of code you currently use in your project here, so we can have a look at it.

    Maybe HashTables/Dictionary could be an alternative to whatever you are attempting to do. The Hash tabels usually store only the hash as a key. So if you search inside this HashTable/Dictionary, your input string will be calculated into a hash and only the hash will be compared with the keys in the HashTable

    Quote Originally Posted by n0mad View Post
    This routine is run every 1 to 2 seconds for performance optim, but ideally I'd like it to run every 100 ms (which is the average hardcore fighting gamer reaction time). In the end, this can lead into some serious string work, so milking for the best optimization possible is necessary, to avoid interference with all the already CPU consuming tasks (like drawcall, engine, player input responsivity).
    2. If your AI code is that time critical, you shouldn't use Enums neither. Enums are also quite slow, compared to let's say an int and constants. Of course with int + constants you have to be bit more careful, because it's possible to accidently have invalid values there.

    Code:  
    1. public enum {
    2.    State1,
    3.    State2,
    4.    State3
    5. }
    6. // you can do
    7. public class StateConstants {
    8.     public const STATE1 = 0;
    9.     public const STATE2 = 1;
    10.     public const STATE3 = 2;
    11. }
    12.  
    13. ...
    14. int state = STATE3; // at compile time the STATE3 willl be replaced with 2, so there won't be any method or variable access in the final IL code
    The const values will be replaced (at compile time) with the values. So it's same as if you do
    Code:  
    1.  if(state==2)
    2.      ....
    just that it's more readable for you.
    Last edited by Tseng; 03-07-2012 at 08:31 AM.
    C# Developer
    www.Atomic-Gear.net
    Games:
    The Pirate Game: Android Market Link

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •