gstring: GC free string for Unity!

Discussion in 'Scripting' started by vexe, Jul 5, 2015.

  1. vexe

    Joined:
    May 18, 2013
    Posts:
    644
    Hey guys,

    so I wanted to share this gstring business with you. gstring (gcfreestring) is a string wrapper that uses pointers to mutate the string when performing misc string operations. The purpose is to be able to perform most of the common operations we do on strings (concat, format, replace, etc.) without any allocation.

    gstrings are not meant to be stored as member variables. Instead, you quickly declare them in a 'gstring block', use them for whatever string operation you want, then dispose of them. The nice thing is that you don't have to dispose of gstrings manually: once you're in a block, all assignments are registered so that when the block/scope ends, all used gstrings are disposed.

    But what if you want to keep/store the result you calculated rather than dispose of it? This is where 'intern' comes in. There's a runtime intern (cache) table of strings (similar to .NET's intern table for string constants). You write string str = result.Intern(); which basically says: if the string is in the intern (cache) table, return it; otherwise allocate new memory for it and store it in the table, so that next time we ask for it, it's there.
    The nice thing about interning is that you can pre-intern your strings via the static method gstring.Intern.
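
    To make the intern idea concrete, here is a minimal sketch of a cache table that can be probed with a character buffer, so a hit returns the stored string without allocating. This is not gstring's actual implementation (that lives in the attachment); the names InternTable, Hash, Matches and the buffer-based Intern signature are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a runtime intern (cache) table.
// NOT gstring's real code: it only illustrates "return the cached
// string if present, otherwise allocate once and remember it".
public sealed class InternTable
{
    // Buckets keyed by a hash of the characters, so a lookup can be
    // done from a char buffer without first building a string.
    private readonly Dictionary<int, List<string>> buckets =
        new Dictionary<int, List<string>>();

    private static int Hash(char[] buffer, int length)
    {
        unchecked
        {
            int h = 17;
            for (int i = 0; i < length; i++) h = h * 31 + buffer[i];
            return h;
        }
    }

    private static bool Matches(string s, char[] buffer, int length)
    {
        if (s.Length != length) return false;
        for (int i = 0; i < length; i++)
            if (s[i] != buffer[i]) return false;
        return true;
    }

    public string Intern(char[] buffer, int length)
    {
        int hash = Hash(buffer, length);
        List<string> bucket;
        if (!buckets.TryGetValue(hash, out bucket))
        {
            bucket = new List<string>();
            buckets.Add(hash, bucket);
        }

        // Hit: hand back the existing instance, nothing is allocated.
        for (int i = 0; i < bucket.Count; i++)
            if (Matches(bucket[i], buffer, length)) return bucket[i];

        // Miss: allocate the string once and remember it.
        string created = new string(buffer, 0, length);
        bucket.Add(created);
        return created;
    }
}
```

    Pre-interning then just means calling Intern for the known strings at load time, so that later lookups during gameplay are always hits.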

    NOTES:
    1- The class is not designed with concurrency/threading in mind; it's meant to be used in Unity
    2- I did not consider culture-specific behaviour either
    3- Again, you shouldn't have gstring members in your class. All gstring instances are meant to be disposed. You just quickly open up a gstring.Block() and use gstrings in it; if you want to store a result you get back from a gstring operation, use Intern
    4- You don't need the gmcs and smcs files if you compile to a DLL and use the DLL instead.

    The public API is identical to that of 'string'

    More details in this video:



    Attached is gstring with the test seen in the video.

    Note I just wrote it a couple of days ago so I didn't have time to test it in my game, so there might be bugs.

    Hope you find it useful!

    Cheers!
     

    Last edited: Jul 5, 2015
  2. vexe

    Joined:
    May 18, 2013
    Posts:
    644
    Re-uploaded video, louder volume.
     
  3. blizzy

    Joined:
    Apr 27, 2014
    Posts:
    775
    Hmm, while in principle this is pretty clever, I don't really see the point. You need to be doing lots and lots of string operations to reap a real benefit.

    Edit: typo
     
    Last edited: Jul 6, 2015
  4. JamesLeeNZ

    Joined:
    Nov 15, 2011
    Posts:
    5,616
    Was thinking the exact same thing.
     
  5. vexe

    Joined:
    May 18, 2013
    Posts:
    644
    I guess it depends on your game/usage.

    A small example: I use a lot of dictionaries to store my data (e.g. inventory items). Each key in that dictionary represents a property/piece of data relevant to that item, e.g. a 'weapon' could have multiple fire rates, stored as 'FireRate0', 'FireRate1', etc. To get the fire rate at a particular index (i) I'd have to return weapon["FireRate" + i]; which is a string concat. I was using constants to avoid having to concat like that, i.e.:

    Code (csharp):
    public float GetFireRate(int index)
    {
        switch (index)
        {
            case 0: return this["FireRate0"];
            case 1: return this["FireRate1"];
            // ... one case per remaining index
            default: throw new System.ArgumentOutOfRangeException("index");
        }
    }

    With a gcfree string I could just:

    Code (csharp):
    public float GetFireRate(int index)
    {
        using (gstring.Block()) return this[gstring.Concat("FireRate", index)];
    }

    I don't know about you but that just seems way nicer.

    This also lets me enumerate my dictionary, reading those rate values without allocating anything.

    More about this dictionary approach if you're interested is mentioned here https://github.com/vexe/VFW/issues/38

    So yeah, depends on your game. For me, gstring improves many areas of my codebase: tidier, cleaner, less cluttered, etc. I just want to program; I'm sick of having to worry about the GC for every single line I write, it's just frustratingly stupid.

    Programmer happiness matters to me; whatever helps raise my morale and overall spirit, I'm all game.
     
  6. JamesLeeNZ

    Joined:
    Nov 15, 2011
    Posts:
    5,616
    Couple of thoughts...

    You could build up those strings in code, then re-use them later as required. Build once, use forever. No need to worry about additional string allocations then.
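
    A minimal sketch of that "build once, use forever" suggestion, assuming the indices are small and known up front. KeyCache and its members are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper: precompute "Prefix0", "Prefix1", ... once,
// then hand back the same string instances on every lookup.
public sealed class KeyCache
{
    private readonly string[] keys;

    public KeyCache(string prefix, int count)
    {
        keys = new string[count];
        for (int i = 0; i < count; i++)
            keys[i] = prefix + i; // allocates here, once per key
    }

    // No allocation: just an array read.
    // Throws IndexOutOfRangeException for an index outside [0, count).
    public string Get(int index)
    {
        return keys[index];
    }
}
```

    With var fireRateKeys = new KeyCache("FireRate", 4); a call to fireRateKeys.Get(i) returns the same instance every time, whereas "FireRate" + i builds a new string each time it runs. The trade-off is that the set of keys has to be known when the cache is built.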

    You say you enumerate a dictionary? I assume you mean you do something like this...
    Dictionary["example0"].value
    Dictionary["example1"].value

    Seems silly to me. If I need to enumerate a collection, I'll make it an array or a list; if I need direct access via a key, it's a dictionary. I use ints as my keys though, because I can always find a transform ID (GetInstanceID()) or an enumeration.

    The only thing so far I've found I use strings for are tags/layers/invoke functions. All other strings are loaded from xml files.

    strings should be a last choice of ways to find/manipulate stuff, but each to their own.
     
  7. Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    That was my first thought. It's also a bit of an annoyance to have restrictions thrown on how you use strings.

    I'm not convinced. I would generally use an indexable collection to handle this.

    I'm sure there are edge cases where using strings heavily makes sense. But most of the time there are other ways.
     
  8. novashot

    Joined:
    Dec 12, 2009
    Posts:
    373
    ... I usually feel the same as the other guys who posted here... but (maybe I had a beer too many and see things skewed...) it seems you are being beaten up for releasing, for free, code that helped you out. I do not care if I will ever use it or not, but thanks for throwing out something you made that someone else may find helpful.
     
  9. Munchy2007

    Joined:
    Jun 16, 2013
    Posts:
    1,732
    I agree with this.

    I too was surprised and somewhat saddened by the amount of negativity that followed the opening post.
     
  10. blizzy

    Joined:
    Apr 27, 2014
    Posts:
    775
    I also appreciate the amount of work and the open source code. That has nothing to do with what I previously stated, though. I didn't mean to bash his work or something like that.
     
  11. Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    Nah, no negativity was intended. Experienced coders are just direct. It comes from spending all day talking in unambiguous language to a computer. Computers don't require please and thank you. But they do require perfect syntax.
     
  12. passerbycmc

    Joined:
    Feb 12, 2015
    Posts:
    1,741
    Ya, programmers always seem a little blunt in their language. We just tend to get in our own little boxes of thinking. Like, I personally know I don't use strings enough in a way that would cause too many allocations, and I would actually find working with mutable reference types for strings rather annoying. The way I use strings definitely makes more sense with immutable strings that behave like values.
     
  13. vexe

    Joined:
    May 18, 2013
    Posts:
    644
    Appreciate all the feedback guys!

    I believe that information sharing is an ethic a programmer should have. Pretty much all the tools I write are based on 'need' - different games have different problems, and different problems require different solutions. I was just sharing one of the solutions to one of the problems in my game. If you happen to find this solution useful, beneficial or helpful to you in any way, shape or form, I'd be very happy if you take it and apply it! If you don't, well, hey, now you know that such a solution exists; if you ever need it, it's there. You never know, it might inspire solutions to other problems. Sa'ul good, man (get it? :D)

    I just have one comment to Mr @JamesLeeNZ, cause I'm afraid he didn't get the full picture of the system/design/structure I was using, it's on me, I didn't convey my thoughts properly.

    That does sound silly if the dictionary only had 'example' related data, if that was the case then yeah sure a sequence is more appropriate. But that dictionary doesn't just hold 'example' data, there's more to it than that.

    Let me take a step back, let's take the classical game development problem of an inventory system, with items. The inventory in my game is very similar to that in the classical Resident Evils (0, 1, 2, 3) - You basically have different items: Usable (key items), Consumables (health), Combinables (health, weapons and key items), Equippables (weapons) - And then I ask you to design a system/come up with a solution to solve this problem. How would you go about it? - I don't know about you, but I've asked this question to many of my fellow programmers, and I can almost guarantee that you will hear the same solution from a lot of other programmers. They will come up with some sort of inheritance-based solution, where you have a base Item class -> Weapon, -> KeyItem, -> Health and maybe interfaces ICombinable, IConsumable, etc. Or some sort of composition-based where you have a base Item, and it 'has a' references to Combinable/Consumable/etc objects containing the relevant pieces of data.

    Well, I've tried all those solutions. They're all flawed, ridiculously complex, impractical, inflexible, hard to use and hard to 'externalize' to disk. In pre-production you'll be in a chaotic development stage: you're not sure what you want, so you keep adding/removing pieces of data, experimenting with things, etc. That makes it hard to serialize/deserialize things, because now you have so many versions of your objects that you'd have to use a serializer with good version tolerance (which means your serializer needs to write metadata about versions, which makes the whole thing slow and complex...)

    What if I told you about a solution that:
    1- Lets you represent 'all' items in a single Item class!
    2- Each different item only contains the pieces of data relevant to it.
    3- Extremely easy to [de]serialize, your serializer doesn't need to support versioning nor inheritance.
    4- It's data driven, very easy to extern the items data to files, the output format is very simple. the parser is trivial.
    5- Could be used to represent the internal structure of many other problems: FSM, Behaviour/Dialogue Trees, and pretty much anything that has 'properties' or 'different pieces of data' related to it (our Items example)

    I was very fortunate to work with the super awesome @Xelnath, who introduced me to the idea (its origins go back to Lua). It opened my eyes and totally changed the way I approach these problems. The idea is that you use a dictionary with the 'key' being the name of the 'property' or the relevant 'piece of data', and the 'value' being a struct that holds a 'float' and a 'string' (this depends on your needs). This wraps the most commonly used data types: int, float and bool (represented by that float) and string (represented by that string field). You then provide implicit conversion operators between the struct and those data types.
    At the end of it all, you'd be able to write the following code:

    Code (csharp):
    var handgun = new Item();
    handgun["Name"] = "Beretta";
    handgun["Damage"] = 45f;
    handgun["IsStackable"] = true;
    handgun["Id"] = 123;

    I don't know about you, but that syntax turns me on!
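
    For readers who want to see the shape of the structure behind that syntax, here is a minimal sketch. The names GameData, DataContainer and Item come from this post, but the field names (Number, Text) and every detail of the bodies are assumptions for illustration, not the implementation linked further down.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the value struct: one float slot standing in for
// int/float/bool and one string slot. The layout is an assumption.
public struct GameData
{
    public float Number;
    public string Text;

    // Implicit conversions INTO the struct.
    public static implicit operator GameData(float v)  { return new GameData { Number = v }; }
    public static implicit operator GameData(int v)    { return new GameData { Number = v }; }
    public static implicit operator GameData(bool v)   { return new GameData { Number = v ? 1f : 0f }; }
    public static implicit operator GameData(string v) { return new GameData { Text = v }; }

    // Implicit conversions OUT of the struct.
    public static implicit operator float(GameData d)  { return d.Number; }
    public static implicit operator int(GameData d)    { return (int)d.Number; }
    public static implicit operator bool(GameData d)   { return d.Number != 0f; }
    public static implicit operator string(GameData d) { return d.Text; }
}

// Sketch of the container: a dictionary from property name to GameData.
public class DataContainer
{
    private readonly Dictionary<string, GameData> data =
        new Dictionary<string, GameData>();

    public GameData this[string key]
    {
        get { return data[key]; }   // throws KeyNotFoundException if missing
        set { data[key] = value; }
    }
}

// An item is just a container; typed properties can wrap the keys.
public class Item : DataContainer
{
    public float Damage
    {
        get { return this["Damage"]; }
        set { this["Damage"] = value; }
    }
}
```

    One consequence of storing ints in the float slot is that very large integers lose precision, so this layout suits small IDs and counters rather than arbitrary 32-bit values.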

    Of course, instead of string literals one should use constants in a static class. You might ask, why not use enums? Well, simply cause they're not flexible enough. Take for example, each item could be combined with multiple other items, so you'll have:

    Code (csharp):
    handgun["CombineWith0"] = "HandgunAmmoA";
    handgun["CombineWith1"] = "HandgunAmmoB";
    handgun["CombineWith2"] = "HGUpgradeParts";

    Sure, I can keep defining enums, but come on... that's just tedious. I don't mind my keys being strings in this case. They're also more debugger-friendly: some debuggers display enums as numbers, which is not helpful for me. I want to know what that number represents; I don't want to have to remember what each of those numbers means (again, programmer happiness).

    This approach drastically improved my progress, I encourage everyone to try it.

    This is how your data might look on disk: http://i.imgur.com/QMEa34j.png
    The inspector is very friendly too: http://i.imgur.com/8uJOK2u.png

    As you can see the output format is very simple, no need for fancy parsers, etc.
    Code (csharp):
    HEADER
    {
        KEY = VALUE,
        etc.
    }

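
    To show how little parsing a format like this needs, here is a minimal sketch of a reader. It assumes one KEY = VALUE entry per line, braces on their own lines, and an optional trailing comma; ContainerParser and Parse are hypothetical names, and the real file format in the screenshot above may differ in details.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reader for the HEADER { KEY = VALUE, ... } layout.
// Values are returned as raw strings; converting them to numbers or
// bools is left to the caller.
public static class ContainerParser
{
    public static Dictionary<string, Dictionary<string, string>> Parse(string text)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        Dictionary<string, string> current = null; // block being filled, if any
        string pendingHeader = null;               // header seen, '{' not yet

        foreach (string raw in text.Split('\n'))
        {
            string line = raw.Trim(); // also drops a trailing '\r'
            if (line.Length == 0) continue;

            if (line == "{")
            {
                if (pendingHeader == null)
                    throw new FormatException("'{' without a header");
                current = new Dictionary<string, string>();
                result[pendingHeader] = current;
                pendingHeader = null;
            }
            else if (line == "}")
            {
                current = null;
            }
            else if (current == null)
            {
                pendingHeader = line; // outside a block: this is a header
            }
            else
            {
                int eq = line.IndexOf('=');
                if (eq < 0)
                    throw new FormatException("Expected KEY = VALUE: " + line);
                string key = line.Substring(0, eq).Trim();
                string value = line.Substring(eq + 1).Trim().TrimEnd(',').Trim();
                current[key] = value;
            }
        }
        return result;
    }
}
```

    Because every item is just a bag of key/value pairs, adding or removing a property in the file needs no change to this reader, which is the version-tolerance point made above.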
    I call that structure DataContainer and the struct GameData (you might call them PropertyContainer and GameProperty, etc.). Here's my implementation of it: http://pastebin.com/JYZxj2nr

    Your "Item" class just inherits DataContainer and provides quick access to the container data via properties http://i.imgur.com/vbxTF3t.png

    Here's a sample test project https://copy.com/jFZf2rY6TLkRMRyy - you have an inventory class, with items. You read items from file and store them in the mentioned structure.

    So yeah, this was a small example of why I'd need allocation free string concat at runtime. Like we've seen, an item could have more than one, say fire rate, damage type/amount, combine pair, etc. Their keys could be "DamageAmount0", "DamageAmount1" etc. To get the damage at a specific index, you could either return item["DamageAmount" + index] or do the switch I mentioned above ^ - I just think it's more convenient to use the former solution: string concat, but it's less attractive if it allocates due to the + index. This is where gstring comes in to let us do that concat with 0 allocation.

    I hope you understand now the need to enumerate a dictionary the way mentioned at the end of this post https://github.com/vexe/VFW/issues/38 Still think it's 'silly'? :p

    I believe (at least for my use-cases, and runtime gameplay programming) that this 'data container' approach renders OOP obsolete, along with all its non-applicable, redundant, spoon-fed, nonsensical over-complicated hyped, over-praised terminologies (SOLID etc) - You have 'different' type of zombies in your game? Sure, write a Zombie class. Each zombie has different data? Sure, give 'Zombie' a 'data container' and populate the different types of zombies with their corresponding data. Zombies have different behaviours? Sure, write your different behaviour methods (move, attack etc) and have a delegate that points to the right method. No need for vtable BS.

    I should stop here, that's a different beast, for a different rant :D

    Cheers all!
     
    Last edited: Jul 6, 2015
  14. vexe

    Joined:
    May 18, 2013
    Posts:
    644
    Oh and... another area where gstring could be very beneficial: editor code, and the internal/backend/core of frameworks. E.g. in VFW, I deal with a decent amount of string/regex Replace, concatenations, etc. in editor update. I do all sorts of hacks (memoize, etc.) to keep allocation at bay. I would much rather be able to do "STRING" + "STRING" without having to worry whether or not it will slow things down. You might say it won't and it's not a big deal, but for me it is. Every bit matters, because this is code used by many others. I want my core code to be fast, and my users to have a smooth editor experience.
     
  15. blizzy

    Joined:
    Apr 27, 2014
    Posts:
    775
    All the ranting about OOP aside, you did not address my initial point: You need to be doing lots of string operations to reap a benefit in terms of garbage collection.
     
  16. vexe

    Joined:
    May 18, 2013
    Posts:
    644
    @blizzy yes my reply was to clear the misunderstanding on that particular dictionary structure/string concat example.

    To answer you, AFAIK, (correct me if I'm wrong) even if you had a very simple (doesn't need to be a lot or complex) string operation, like a trivial string concat in Update, it would all add up eventually to trigger a collection cycle, which might inconveniently occur in a sensitive/intense point of time in the game.
     
  17. Nodrap

    Joined:
    Nov 4, 2011
    Posts:
    83
    I watched a video about the profiler and they showed the garbage collection allocated graph going up and snapping back down and in passing he mentioned you can write garbage collection free code - what a great idea. I never even thought you could do that in C#. I come from C++ and so was used to doing this but this example of garbage free string code is just what you need for this. This is also now very important for VR development where framerate glitches can make the user sick. This kind of stuff is definitely needed.
     
  18. Dameon_

    Joined:
    Apr 11, 2014
    Posts:
    542
    What are the advantages of using this over StringBuilder? StringBuilder generates very little garbage in the first place, and can even be extended so that it generates 0 garbage, and doesn't have any of the limitations this approach seems to come with. I suppose for throwaway strings this has some advantages, but a StringBuilder can even be used for throwaway strings and kept and recycled. StringBuilder can even be adapted to a pooling strategy.
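
    For comparison, a sketch of the recycling pattern described here: one StringBuilder kept alive and reset between uses. Two allocation points remain in general: the builder's internal buffer grows when its capacity is exceeded, and ToString() always creates a new string. BuilderKeys, AppendNonNegative and ContentEquals are hypothetical names; the digit-by-digit append is there because some runtimes format through a temporary string inside Append(int).

```csharp
using System;
using System.Text;

// Hypothetical helper built around a single recycled StringBuilder.
// Not thread-safe: the shared builder is meant for main-thread use.
public static class BuilderKeys
{
    // One builder, created once; 64 chars is an arbitrary capacity.
    private static readonly StringBuilder shared = new StringBuilder(64);

    // Appends a non-negative int digit by digit, so no temporary
    // string is needed for the number.
    private static void AppendNonNegative(StringBuilder sb, int value)
    {
        if (value >= 10) AppendNonNegative(sb, value / 10);
        sb.Append((char)('0' + value % 10));
    }

    // Fills the shared builder with prefix + index and returns it.
    public static StringBuilder Build(string prefix, int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException("index");
        shared.Length = 0; // reset the contents, keep the buffer
        shared.Append(prefix);
        AppendNonNegative(shared, index);
        return shared;
    }

    // Compares builder contents to a string without calling ToString().
    public static bool ContentEquals(StringBuilder sb, string s)
    {
        if (sb.Length != s.Length) return false;
        for (int i = 0; i < s.Length; i++)
            if (sb[i] != s[i]) return false;
        return true;
    }
}
```

    Note that a Dictionary<string, T> lookup still needs an actual string, so at that point ToString() (one allocation) is unavoidable with this approach unless the result is looked up in some cache first.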
     
  19. Firestar9114

    Joined:
    Jun 16, 2018
    Posts:
    11
    Came here to ask this. Can you explain when StringBuilder generates garbage and how you would extend StringBuilder to remove all garbage generation, or link materials explaining it? I find garbage generation quite annoying.
     
  20. winxalex

    Joined:
    Jun 29, 2014
    Posts:
    166
    The Unity guys have built something:
    Code (CSharp):
    using System;
    using UnityEngine;
    using System.Collections.Generic;

    namespace ws.winx.unity.core {


    // Instance of this class can be created as assets.
    // Each instance contains collections of data from
    // the Saver monobehaviours they have been referenced
    // by.  Since assets exist outside of the scene, the
    // data will persist ready to be reloaded next time
    // the scene is loaded.  Note that these assets
    // DO NOT persist between loads of a build and can
    // therefore NOT be used for saving the gamestate to
    // disk.
    [CreateAssetMenu]
    public class SaveData : ResettableScriptableObject
    {
        // This nested class is a lighter replacement for
        // Dictionaries.  This is required because Dictionaries
        // are not serializable.  It has a single generic type
        // that represents the type of data to be stored in it.
        [Serializable]
        public class KeyValuePairLists<T>
        {
            public List<string> keys = new List<string>();      // The keys are unique identifiers for each element of data.
            public List<T> values = new List<T>();              // The values are the elements of data.


            public void Clear ()
            {
                keys.Clear ();
                values.Clear ();
            }


            public void TrySetValue (string key, T value)
            {
                // Find the index of the keys and values based on the given key.
                int index = keys.FindIndex(x => x == key);

                // If the key was found (index is not -1)...
                if (index > -1)
                {
                    // ... set the value at that index to the given value.
                    values[index] = value;
                }
                else
                {
                    // Otherwise add a new key and a new value to the collection.
                    keys.Add (key);
                    values.Add (value);
                }
            }


            public bool TryGetValue (string key, ref T value)
            {
                // Find the index of the keys and values based on the given key.
                int index = keys.FindIndex(x => x == key);

                // If the key was found (index is not -1)...
                if (index > -1)
                {
                    // ... set the reference value to the value at that index and return that the value was found.
                    value = values[index];
                    return true;
                }

                // Otherwise, return that the value was not found.
                return false;
            }
        }


        // These are collections for various different data types.
        public KeyValuePairLists<bool> boolKeyValuePairLists = new KeyValuePairLists<bool> ();
        public KeyValuePairLists<int> intKeyValuePairLists = new KeyValuePairLists<int>();
        public KeyValuePairLists<string> stringKeyValuePairLists = new KeyValuePairLists<string>();
        public KeyValuePairLists<Vector3> vector3KeyValuePairLists = new KeyValuePairLists<Vector3>();
        public KeyValuePairLists<Quaternion> quaternionKeyValuePairLists = new KeyValuePairLists<Quaternion>();


        public override void Reset ()
        {
            boolKeyValuePairLists.Clear ();
            intKeyValuePairLists.Clear ();
            stringKeyValuePairLists.Clear ();
            vector3KeyValuePairLists.Clear ();
            quaternionKeyValuePairLists.Clear ();
        }


        // This is the generic version of the Save function which takes a
        // collection and value of the same type and then tries to set a value.
        private void Save<T>(KeyValuePairLists<T> lists, string key, T value)
        {
            lists.TrySetValue(key, value);
        }


        // This is similar to the generic Save function, it tries to get a value.
        private bool Load<T>(KeyValuePairLists<T> lists, string key, ref T value)
        {
            return lists.TryGetValue(key, ref value);
        }


        // This is a public overload for the Save function that specifically
        // chooses the generic type and calls the generic version.
        public void Save (string key, bool value)
        {
            Save(boolKeyValuePairLists, key, value);
        }


        public void Save (string key, int value)
        {
            Save(intKeyValuePairLists, key, value);
        }


        public void Save (string key, string value)
        {
            Save(stringKeyValuePairLists, key, value);
        }


        public void Save (string key, Vector3 value)
        {
            Save(vector3KeyValuePairLists, key, value);
        }


        public void Save (string key, Quaternion value)
        {
            Save(quaternionKeyValuePairLists, key, value);
        }


        // This works the same as the public Save overloads except
        // it calls the generic Load function.
        public bool Load (string key, ref bool value)
        {
            return Load(boolKeyValuePairLists, key, ref value);
        }


        public bool Load (string key, ref int value)
        {
            return Load (intKeyValuePairLists, key, ref value);
        }


        public bool Load (string key, ref string value)
        {
            return Load (stringKeyValuePairLists, key, ref value);
        }


        public bool Load (string key, ref Vector3 value)
        {
            return Load(vector3KeyValuePairLists, key, ref value);
        }


        public bool Load (string key, ref Quaternion value)
        {
            return Load (quaternionKeyValuePairLists, key, ref value);
        }
    }
    }

    I was thinking of using ExpandoObject instead of Dictionaries, now that C# in Unity is moving beyond the old Mono.

    @JamesLeeNZ you're speaking as if you always know at the start which strings you will use, as if you know everything from the start and can hardcode and buffer everything. What about things that are created dynamically at runtime (using lots of reflection), dynamic delegates, etc., and called every frame? What then? You are stuck in a static way of thinking in the 21st century, when C# gives you lots of possibilities. Regardless of vexe's initial request, the guy is breaking the ice over your head. One more thanks to vexe for FastReflection.
     
    Last edited: Apr 16, 2019
  21. ChrisVitei

    Joined:
    Sep 18, 2013
    Posts:
    5
    This is great! Thank you!

    Nothing more to add other than it's been incredibly useful for me :)
     
  22. palex-nx

    Joined:
    Jul 23, 2018
    Posts:
    1,748
  23. Mortuus17

    Joined:
    Jan 6, 2020
    Posts:
    105
    Don't know...
    My test had:
    0.02ms - 0.03ms (70/30), 146-148 bytes of garbage per frame.
    Using your library I get 75 bytes of garbage in 80% of all frames, taking 0.07ms (the "test" is a conditional concat of 5 strings), so so much for GC free.
    And considering that GC ultimately costs CPU time, I personally think that this library just strictly makes your code slower.