
General performance/optimization tips for Unity

Discussion in 'Scripting' started by PhilSA, Feb 16, 2016.

  1. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    So the idea came up in another thread to start a thread where we can all share general performance, optimization, and "best practices" tips. I guess the Scripting forum is the most appropriate place for this. I have a few ideas in mind, so I'll start:

    Disclaimer 1:
    I am not a genius and I fully expect to be wrong about at least half of what I'm about to write. The goal of this thread is to provide some sort of starting point for a discussion on optimization and performance in Unity and see where the discussion leads us. If I need to, I'll make corrections as we go.

    Disclaimer 2: "It depends"

    Disclaimer 3: I am giving these tips in a "keep that in mind for your future projects" kind of way, not in a "from now on you must absolutely do this" kind of way

    1. Consider not using MonoBehaviours on everything
    According to this Unity blog post, the way MonoBehaviours call their "magic methods" is quite suboptimal. Here's a simple test that illustrates this:

    We will make 10k GameObjects move every frame using two different methods. In the first method, we attach a MonoBehaviour to each GameObject, and that MonoBehaviour is what makes it move. In the second method, we have only one MonoBehaviour in the entire scene, and it is responsible for creating entity instances that each move their associated object.

    EDIT: Made some improvements to the test to better compare MonoBehaviours and basic classes.

    Test 1: MonoBehaviours on everything:
    Code (CSharp):
    public class Mover : MonoBehaviour
    {
        Transform _transform;

        void Start()
        {
            _transform = gameObject.GetComponent<Transform>();
        }

        void Update()
        {
            _transform.position += Vector3.forward * Time.deltaTime;
        }
    }
    Code (CSharp):
    public class MoverInit : MonoBehaviour
    {
        // This is a prefab of an empty GameObject with a "Mover" script on it
        public GameObject moverPrefab;

        void Awake()
        {
            for (int i = 0; i < 10000; i++)
            {
                Instantiate(moverPrefab);
            }
        }
    }

    Test 2: Object manager (only one MonoBehaviour in the entire scene):
    Code (CSharp):
    public class Entity
    {
        public Transform associatedTransform;

        public void OnUpdate()
        {
            associatedTransform.position += Vector3.forward * Time.deltaTime;
        }
    }
    Code (CSharp):
    public class BasicManager : MonoBehaviour
    {
        private Entity[] entities = new Entity[10000];

        void Start()
        {
            for (int i = 0; i < entities.Length; i++)
            {
                Entity newEntity = new Entity();
                GameObject newGO = new GameObject("newobject");
                newEntity.associatedTransform = newGO.GetComponent<Transform>();
                entities[i] = newEntity;
            }
        }

        void Update()
        {
            for (int i = 0; i < entities.Length; i++)
            {
                entities[i].OnUpdate();
            }
        }
    }


    Here are the results, in average ms per frame (profiler screenshots omitted):
    • A MonoBehaviour on every object: 3.15 ms per frame
    • Each object represented by a basic class with a reference to its Transform: 0.9 ms per frame

    As you can see, the difference between the two is pretty incredible. Now, that doesn't mean making an entire game without MonoBehaviours will necessarily triple your framerate, because a real game has a lot more going on than scripts on empty GameObjects, but it will most likely make a good difference.

    In theory, it should be possible to make an entire game using only one MonoBehaviour that calls Update() for everything else. All your game entities would be standard C# classes, instantiated and updated by the main MonoBehaviour, each with a link to their model (a ScriptableObject asset) and their view (their GameObject in the scene). This would give you a pretty solid MVC architecture! Though it should be noted that I haven't had the chance to put this into practice yet.
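
    For illustration, here's a minimal sketch of that idea (class names are hypothetical, and this is only a skeleton, not a full MVC implementation): one MonoBehaviour owns a list of plain C# entities and ticks them every frame.
    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    // A plain C# entity: no MonoBehaviour, just a reference to its view
    public class GameEntity
    {
        public Transform view;

        public void Tick(float deltaTime)
        {
            view.position += Vector3.forward * deltaTime;
        }
    }

    // The single MonoBehaviour in the scene that drives everything
    public class MainLoop : MonoBehaviour
    {
        public GameObject entityPrefab;
        private List<GameEntity> entities = new List<GameEntity>();

        public GameEntity Spawn()
        {
            GameEntity entity = new GameEntity();
            entity.view = ((GameObject)Instantiate(entityPrefab)).transform;
            entities.Add(entity);
            return entity;
        }

        void Update()
        {
            float dt = Time.deltaTime;
            for (int i = 0; i < entities.Count; i++)
            {
                entities[i].Tick(dt);
            }
        }
    }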

    2. Arrays and Lists
    • Never use foreach to iterate over a List. It is much slower than for(int i = 0; ...) (see the sketch after the links below).
    • foreach and for(...) are almost the same for Arrays, though.
    • Whenever you have a collection of things with a specific size (or a size that doesn't change very often, like inventory slots, players in an online game, etc...), use Arrays instead of Lists. Arrays are always More-Performant™
    • Unity has a handy ArrayUtility class for doing Contains() or Find() operations on Arrays (note that it lives in the UnityEditor namespace, so it's editor-only)
    More on that here:
    http://programmers.stackexchange.com/questions/221892/should-i-use-a-list-or-an-array
    http://stackoverflow.com/questions/454916/performance-of-arrays-vs-lists
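
    To make the first bullet concrete, here's the recommended pattern as a minimal sketch (hypothetical example):
    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    public class LoopExample : MonoBehaviour
    {
        private List<int> numbers = new List<int>();

        void Update()
        {
            // Preferred for Lists: an indexed for loop allocates nothing
            for (int i = 0; i < numbers.Count; i++)
            {
                Debug.Log(numbers[i]);
            }

            // Avoided: under Unity's old Mono runtime, foreach over a List
            // could box the enumerator and generate garbage every frame:
            //foreach (int n in numbers) { Debug.Log(n); }
        }
    }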

    3. Static objects
    Many optimizations are made for objects that we know will never move. For all of those objects, don't forget to check the "Static" box in the top-right corner of the Inspector. Taking the time to do this for every object that will never move can make a big difference.

    http://docs.unity3d.com/Manual/StaticObjects.html

    4. Never use concave mesh colliders
    Concave mesh collider collisions are ridiculously expensive and should never be used in any context whatsoever. Instead, always use convex decomposition. I highly suggest you implement your own convex decomposition solution for Unity, but if you really, reeeaaaaalllllllly don't want to, there are some solutions on the asset store.

    5. Data-oriented design
    This is kind of a huge subject and I won't take the time to explain it here in detail, so instead I'll point you to a great talk on the subject (video embed not shown).
    Data-oriented programming, simply put, is programming in a computer-friendly way instead of a human-friendly way. There is a lot to learn about how computers process data and how you can arrange your code to make it easier and faster for the computer to process. I don't think it's realistic to make your entire game 100% data-oriented, but there may be key parts of your systems that would no doubt benefit a lot from it.

    More examples/explanation here: http://stackoverflow.com/questions/1641580/what-is-data-oriented-design

    6. Be careful with the asset store
    The Unity community really likes its asset store, but it can sometimes be more of a curse than a blessing. What often happens when you start building a project from all sorts of asset store packages is that your project eventually starts crumbling under its own lack of architectural cohesion and quickly becomes a nightmare to manage. Not only that, but you can never be sure that whatever you get from the asset store is made in an efficient way, not even with highly popular packages. In fact, generic, all-purpose systems have a high risk of not being very efficient (or well-adapted) by nature.

    In the long run, you will benefit immensely from taking the time to do things by yourself. Especially simple gameplay systems like inventory systems, weapon systems and such. You will become a better programmer, you will understand your code and be able to change it easily, and you will have a system that is better suited for your project's specific needs.

    Speaking of generic, all-purpose systems.... just because Unity provides users with pre-made, out-of-the-box systems doesn't mean they are ideally suited for any and every game ever. Take Unity's built-in navmesh system for example. Is it a good idea to use it for your new MMO crafting game with procedural destructible planets and wildlife? Probably not. Nothing's preventing you from making your own navmesh system instead. You can even replace the entire physics engine if your project requires it. There is a lot you can do even without source code access (though the lack of source code access is still a colossal handicap. Don't get me wrong here!)

    7. There is a tiny price to pay for every inheritance level
    Here's a test output of calling various class functions 100k times (screenshot omitted):
    As you can see, the deeper your inheritance goes, the costlier it gets. The price is very small, though (keep in mind this is 100000 calls and a 2.5ms difference). Basically, it's about 0.000005ms per override level per function call. That is really not a whole lot of milliseconds.

    I gotta admit I'm putting this tip here for information purposes only, because in practice, I really don't think it's worth complicating your architecture for a performance gain as tiny as this. But if you really end up having to make 100k calls to a function that is overridden 8 times one day, do consider composition instead of inheritance.

    8. Quick tips
    • Avoid using Linq at runtime as much as possible. It's really slow
    • Never ever use Reflection at runtime
    • Never use Unity's SendMessage() system. Not only is it much more costly than direct calls, but needing SendMessage() is very often a sign that your code architecture is flawed and could be improved.
    • Please avoid using any sort of "Find()" method to get references to other objects in your scene. This approach is super popular among Unity users (blame the early learning resources for that), but it is neither safe, elegant, nor efficient. Instead, come up with a plan to link your objects together through managers that instantiate entities and keep references to all of them. For instance, you can have a GameManager that instantiates the Player and Enemies at the start of the game and keeps references to them at the moment of instantiation. Later, if an Enemy needs a reference to the Player, it asks the GameManager (which it kept a reference to when it got instantiated) for the Player's reference (see the sketch after this list).
    • Always cache your transform/rigidbody/renderer/etc components on Awake() or Start() in order to use them later on Update()
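
    Here's a rough sketch of the manager-driven wiring described above (all class names hypothetical; assume Player is a MonoBehaviour defined elsewhere):
    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    public class GameManager : MonoBehaviour
    {
        public GameObject playerPrefab;
        public GameObject enemyPrefab;

        // Everyone gets their references from here; no Find() anywhere
        public Player Player { get; private set; }
        private List<Enemy> enemies = new List<Enemy>();

        void Awake()
        {
            Player = ((GameObject)Instantiate(playerPrefab)).GetComponent<Player>();
            for (int i = 0; i < 5; i++)
            {
                Enemy enemy = ((GameObject)Instantiate(enemyPrefab)).GetComponent<Enemy>();
                enemy.Init(this); // the enemy keeps a reference back to its manager
                enemies.Add(enemy);
            }
        }
    }

    public class Enemy : MonoBehaviour
    {
        private GameManager manager;

        public void Init(GameManager gameManager)
        {
            manager = gameManager;
        }

        void Update()
        {
            // Ask the manager for the Player reference when needed
            Vector3 target = manager.Player.transform.position;
        }
    }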
     
    Last edited: Apr 14, 2016
    Drayanlia, NotaNaN, rv0000s and 15 others like this.
  2. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Just a short note: it's by far better to measure in milliseconds, not FPS. Not sure about the inventory thing. Use the best data structure for the job instead, because a) it's unlikely anyone would want a flat hierarchy of inventory items that somehow numbers in the thousands, b) using a List instead still isn't enough to dent a game's performance, and c) you wouldn't do it each frame.

    So my advice is: premature optimisation's pretty evil too.
     
  3. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    Well, sure. But this is a thread about performance considerations after all.
     
  4. kru

    kru

    Joined:
    Jan 19, 2013
    Posts:
    452
    This is entirely possible, and quite workable. With a few wrapper classes to communicate with Unity components on the main thread, you can offload the majority of your game to plain old C# objects. One can turn Unity into a real ECS and take advantage of CPU cache locality.
     
  5. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    I should add that the only exception would be anything that requires collision events; MonoBehaviours are the only way to work with those.

    I really hope Unity can expose more of the PhysX API one day
     
    rahulk1991 likes this.
  6. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,537
    topic 1, Consider not using Monobehaviours on everything

    The example isn't really a Monobehaviour thing, it's an 'Update' thing. The overhead isn't coming from too many MonoBehaviours, just that Update is being processed individually on multiple MonoBehaviours.
     
    Arsonistic likes this.
  7. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    Yeah, and now that I think about it, there's also the fact that the movement vector is calculated only once for everyone in the "one manager" version. At least it still shows that there are ways of arranging things more efficiently.

    There's no doubt that the test isn't completely accurate. However, I'm fairly certain it has been said somewhere that there is an overhead associated with simply having MonoBehaviours around. Even empty ones.
     
  8. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,537
    Also, it'd be nice if topic 2, arrays and lists, went into more depth about what the problem really is... it's actually related to the same reason Linq can be slow.

    It really has to do with the fact that 'foreach' treats the object you're looping over as an 'IEnumerable', calling 'IEnumerable.GetEnumerator' on it to get an 'IEnumerator' object. Because it's treated as an interface, even if the returned enumerator is a struct, it gets boxed on the heap.

    Code (csharp):
    foreach(var obj in lst)
    {
        //statement
    }

    //BECOMES

    IEnumerator<T> e = ((IEnumerable<T>)lst).GetEnumerator(); //boxing here
    try
    {
        T obj;
        while(e.MoveNext())
        {
            obj = (T)e.Current;
            //statement
        }
    }
    finally
    {
        e.Dispose();
    }

    These objects boxed on the heap increase the garbage size, causing more GC calls.

    The only reason 'foreach' on arrays is OK is that the compiler optimizes foreach loops over arrays into plain indexed loops, avoiding the boxed IEnumerator.

    The same goes for Linq, which treats collections as IEnumerables, as well as generating a lot of garbage for its temporary representations, leading to more GC as well.

    This also goes for collections other than Array and List<T> that you may be using, for example Dictionary<TKey,TValue>, HashSet<T>, etc.

    One might notice that the actual 'GetEnumerator' on these classes (when not cast to IEnumerable) returns a struct enumerator rather than a class, which avoids the GC. This means you can use GetEnumerator to loop over collections, especially collections that aren't indexed (HashSet<T> and Dictionary<TKey,TValue> are not indexed).

    Code (csharp):
    var dict = new Dictionary<string, int>();
    dict["a"] = 5;
    dict["b"] = 12;
    dict["c"] = 15;

    var e = dict.GetEnumerator();
    while(e.MoveNext())
    {
        Debug.Log(e.Current.Key);
    }
    NOTE - the Mono version of Dictionary returns NEW collections when you access the 'Values' or 'Keys' properties of the dictionary. This too causes GC slowdown. So if you want to loop over the keys or values, loop over the dictionary directly and just access the respective property of the enumerator's Current entry (note my example above).

    This is opposed to the .NET version of Dictionary. I only recently discovered this looking through the Mono source code... I was very disappointed in the Mono implementation when I saw this.
     
    Last edited: Feb 16, 2016
  9. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,537
    Well yeah, of course, any new object is going to come with some sort of overhead, be it its instantiation cost, the memory it takes up, and so on.
     
  10. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    Made a new test for the MonoBehaviours thing, this time comparing basic C# classes against MonoBehaviours. This should give us a much better idea of the real cost of MonoBehaviours:

    This is the code for the basic C# classes version. The MonoBehaviours version is the same as the initial one.
    Code (CSharp):
    public class Entity
    {
        public Transform associatedTransform;

        public void OnUpdate()
        {
            associatedTransform.position += Vector3.forward * Time.deltaTime;
        }
    }
    Code (CSharp):
    public class BasicManager : MonoBehaviour
    {
        private Entity[] entities = new Entity[10000];

        void Start()
        {
            for (int i = 0; i < entities.Length; i++)
            {
                Entity newEntity = new Entity();
                GameObject newGO = new GameObject("newobject");
                newEntity.associatedTransform = newGO.GetComponent<Transform>();
                entities[i] = newEntity;
            }
        }

        void Update()
        {
            for (int i = 0; i < entities.Length; i++)
            {
                entities[i].OnUpdate();
            }
        }
    }
    And the results (profiler screenshots omitted):
    2.9 ms versus 8.2 ms. Still a pretty huge difference, and this test is as close to 1:1 as it can get, in my opinion.

    So in short:
    • Having a MonoBehaviour on every object: 8.2 ms per frame
    • Having each object represented by a basic class with a reference to its Transform: 2.9 ms per frame
    • Having each transform updated directly by the manager: 2 ms per frame

    Interesting fact: I decided to replace the entities array with a List. The time per frame was 3.3 ms.
    Granted, if an iteration over 10000 elements only means a 0.4 ms difference compared with an Array, using Lists probably won't be what's slowing down your game. Still nice to know.
     
    Last edited: Feb 16, 2016
  11. Dameon_

    Dameon_

    Joined:
    Apr 11, 2014
    Posts:
    542
    Having 10000 entities using Update isn't necessarily proof that MonoBehaviours are evil, but that using Update indiscriminately is evil. If you have 10000 entities using Update, you've gone horribly wrong.

    MonoBehaviours can save you hundreds of dev hours if they're properly used, with little impact on your framerate. For some limited cases it's better to use a custom class, but you should be considering whether something needs to be a MonoBehaviour before you inherit from it.

    I think that optimization is a lot more complicated than just "do this, don't do this." It's not a real optimization if it costs you dev hours for speed increases that the player won't ever notice. For example, this thread might have somebody bending over backwards to avoid MonoBehaviours and fighting Unity's built in MonoBehaviour magic. That could cost a lot of dev hours not just in writing custom code to effect the same behaviour, but in loss of editor functionality.

    Another example is the fear of boxing/unboxing. Some people will drive themselves crazy avoiding it, and jump through all kinds of hoops, but it's only an issue if you're doing it every frame with a lot of objects. You can easily spend all kinds of time avoiding one of the key features of .net, losing all kinds of time developing workarounds for something that could save you a lot of trouble.

    Personally, I find my biggest optimizations occur on a large scale. For example, using an event driven system rather than constantly polling for state changes is a huge optimization. Replacing a list with an array is a negligible optimization.
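
    To illustrate that last point, here's a minimal sketch (hypothetical names) of raising a C# event when a value changes instead of polling it every frame:
    Code (CSharp):
    using System;
    using UnityEngine;

    public class Health : MonoBehaviour
    {
        public event Action<int> HealthChanged;

        private int current = 100;

        public void TakeDamage(int amount)
        {
            current -= amount;
            // The event fires only when something actually happens...
            if (HealthChanged != null) HealthChanged(current);
        }
    }

    public class HealthBar : MonoBehaviour
    {
        public Health target;

        void OnEnable() { target.HealthChanged += OnHealthChanged; }
        void OnDisable() { target.HealthChanged -= OnHealthChanged; }

        private void OnHealthChanged(int value)
        {
            // ...instead of comparing target's health in Update() every frame
        }
    }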
     
  12. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,436
    Awesome thread, thanks for making it!

    Just a quick question: you are doing all the performance profiling in builds instead of the editor, right?
     
  13. Nigey

    Nigey

    Joined:
    Sep 29, 2013
    Posts:
    1,129
    Speculative generality is a known code smell. I consider over-optimization to be the game development equivalent.

    Note: On avoiding MonoBehaviour. I looked into that because of the same blog entry. For a script to be a non-MonoBehaviour, it cannot be attached to a GameObject, and the only way to attach a script to a GameObject is THROUGH MonoBehaviour. So you might as well have a MonoBehaviour script with a 'CustomUpdate()' instead of the magic 'Update()', quite apart from the fact that you can't do any of the other standard things without MonoBehaviour. For me, I found I was trying to fix a problem that not only wasn't there yet, but was likely never to come to pass.
     
    Last edited: Feb 16, 2016
    Martin_H likes this.
  14. TheSniperFan

    TheSniperFan

    Joined:
    Jul 18, 2013
    Posts:
    712
    I'm reposting an old post of mine.


    Here are some general performance tips:

    1. Don't believe, verify!
    What? Instead of blindly believing that your code runs fast, you should put it to the test. The profiler is your best friend when it comes to this.
    Why? Because we're humans and humans make errors. If you just look at the code, isolated from the actual environment it runs in, chances are that you overlook some factors that are important.
    Example? The way I go about it is by stress-testing every module I write. After I've written one - say, an item component that lets the player pick something up - I create an empty test scene, think about how many such items could exist in a level of my game, and add considerably more (because better safe than sorry). Before I run the scene with the profiler attached, it is absolutely crucial that I think about my expectations first. How many CPU cycles should 800 items that are not being interacted with use? Then I test it.
    Believe me when I tell you that you're in for some surprises, if you do it that way.

    2. Don't measure your performance in FPS:
    What? When game-developers measure performance, they don't look at the framerate, but the frametime. You never target 60 fps, you target 16 ms. You never target 30 fps, you target 33 ms.
    Why? Because the framerate averages the actual performance over the course of one second, which makes it completely useless. One second is a very long time for your computer.
    Also because differences in framerate are non-linear. When you measure that a certain part of your system costs you 10 fps, that doesn't tell you anything, because it depends on the framerate you were running at before. Frametime is linear: something that takes 5 ms always takes 5 ms.
    Example? You're running at 60 fps and a script brings you down to 54 fps. You're running at 600 fps and a script brings you down to 400 fps. Which caused the bigger performance hit? (The first one: dropping from 60 to 54 fps costs about 1.9 ms per frame, while dropping from 600 to 400 fps costs only about 0.8 ms.)

    3. Consistent performance is absolutely crucial:
    What? It's better to have slightly worse, but consistent performance, than having your performance jump all over the place.
    Why? Think about the players who are close to the minimal system requirements. When their performance varies greatly, it varies between playable and unplayable, which easily renders the game unenjoyable.
    Example? There are few things more frustrating than enjoying a game for hours, then having to put it down due to a performance hit in a late chapter that makes it impossible to play any further. That stuff shouldn't happen. If the player can start, he should be able to finish.

    4. Optimize for the most common case, not the most generic one:
    What? The title says it all.
    Why? Because being fast in a case that barely ever happens doesn't matter.
    Example? A ladder script that you can attach to anything you want the player to be able to climb. What's the most common case for a ladder? I'll tell you what it isn't, and that's the player actually climbing it. So before making sure that your climbing code is efficient, you should make sure that the component is disabled when nobody is climbing it in the first place. (See point 2 in the second list - the overhead involved in calling Update() functions is huge.)

    5. Prevent last minute decision making:
    What? Last-minute decision making means that you're doing calculations despite already knowing that you don't need them. switch and if/else statements are particularly problematic, and so are Update() functions that consist of one huge if/else.
    Why? You're wasting CPU cycles for literally nothing. Use the information you've got to the fullest.
    Example? Calculating some movement vectors and discarding them afterwards, instead of first checking whether they should be calculated at all. Obvious? Sure. But there are far more elusive cases out there, some of which have devastating performance implications.

    6. Don't pretend your components exist in a vacuum, don't over-abstract, and don't be afraid of coupling:
    What? When you write components for Unity, it's typically considered to be good practice to write your components in a way that you can drag them onto any object and they Just Work™. This can be a very bad thing to do in not just a few cases.
    Why? Because by virtue of over-abstraction, your model of the world can be completely detached from the reality in which the software actually runs. You're rarely dealing with "an item", "a ladder" or "an enemy". The reality is that you're dealing with "a set of items", "a set of ladders" or "a set of enemies". Write your classes accordingly and couple what belongs together.
    Example? You have written a "selector" that lets you pick up "items". The item's Update() function handles them being picked up and held in front of you among other things. Typically, when regarding "an item" as if it existed in a vacuum, you'd probably use something like a boolean variable that determines whether the item is being held or not. The selector would change it from the outside. This is terrible, terrible code. Instead, look at the actual reality of your game: A set of items where either one or none is held at a time. Suddenly it makes a lot more sense to write good code. Namely, ditching the boolean logic of the item class, making the Update() function hold the object directly, without any further checks and using the selector to enable/disable the desired object. This is a prime example of rules 4, 5 and 6 and creates a tremendous performance increase, because it reduces the runtime complexity of your items from O(n) to O(1).
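
    A sketch of that selector idea, under the assumptions above (hypothetical names): the item's Update() runs only while the item is actually held, and the selector is the single place that switches items on and off.
    Code (CSharp):
    using UnityEngine;

    public class HeldItem : MonoBehaviour
    {
        public Transform holdAnchor; // assigned by the selector

        // Runs only while this component is enabled, i.e. while held
        void Update()
        {
            transform.position = holdAnchor.position;
        }
    }

    public class Selector : MonoBehaviour
    {
        private HeldItem current;

        public void PickUp(HeldItem item, Transform anchor)
        {
            if (current != null) current.enabled = false;
            current = item;
            current.holdAnchor = anchor;
            current.enabled = true; // O(1): only the held item ever updates
        }

        public void Drop()
        {
            if (current != null) current.enabled = false;
            current = null;
        }
    }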

    Here are some more Unity-specific performance tips I apply when writing my code for Unity:

    1. Zero background noise:
    What? What I refer to as "background noise" is heap allocations that happen during each update loop. Of course this also includes FixedUpdate() and LateUpdate().
    Why? Automatic memory management is very dangerous for real-time applications. You cannot control when the garbage collector kicks in, and when it does, your application comes to a complete stop for an uncertain period of time. I have an Intel Xeon E3-1231 v3, which plays in a whole other league than what mobile developers have to work with, and the GC still takes a few ms to complete.
    Example? You write a player script and run the profiler. If you're standing still, you should allocate exactly nothing. However, this is not enough because standing perfectly still is not the most common case. Usually, the player will move his character. This means that all the common operations (moving, rotating the camera, jumping, leaning, crouching and whatnot) are not allowed to allocate memory during each update-loop. If you need some resources on the heap, allocate them beforehand and reuse them.
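
    A small sketch of the "allocate beforehand, reuse forever" idea (hypothetical example):
    Code (CSharp):
    using UnityEngine;

    public class GroundProbe : MonoBehaviour
    {
        // Allocated once, reused for every query; no per-frame garbage
        private RaycastHit[] hits = new RaycastHit[16];

        void FixedUpdate()
        {
            // The NonAlloc variant writes into the preallocated buffer
            // instead of returning a freshly allocated array
            int count = Physics.RaycastNonAlloc(transform.position, Vector3.down, hits, 2f);
            for (int i = 0; i < count; i++)
            {
                // inspect hits[i]...
            }
        }
    }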

    2. Disable unused scripts:
    What? Manage which scripts have their Update(), LateUpdate() and FixedUpdate() functions called at all. If it's not doing anything, it shouldn't use any CPU cycles.
    Why? Scalability with regard to the overhead of calling those functions. If you merely use a boolean variable to skip the actual logic of your update loop, you're underestimating the overhead involved in calling it at all. From what I've seen, the overhead of the Update() function being called seems to be orders of magnitude larger than comparing a boolean variable. I found this out during my stress-testing sessions.
    Example? Say you wrote a script that you can drag onto a GameObject to make it play a sound when it falls on the floor, and let's assume that you have placed a lot of them. Do you really need the objects in that building on the far side of your level to be updated if the player isn't anywhere close to them? No, because the player wouldn't be able to hear them falling anyway. (A sketch of one way to do this follows below.)
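
    One possible sketch of this (hypothetical): a manager that periodically toggles distant scripts off, so their Update() is never even called.
    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    public class UpdateCuller : MonoBehaviour
    {
        public Transform player;
        public float activeRadius = 30f;
        public List<MonoBehaviour> managed = new List<MonoBehaviour>();

        void Start()
        {
            // Check distances a few times per second, not every frame
            InvokeRepeating("Cull", 0f, 0.25f);
        }

        void Cull()
        {
            float sqrRadius = activeRadius * activeRadius;
            for (int i = 0; i < managed.Count; i++)
            {
                MonoBehaviour m = managed[i];
                bool near = (m.transform.position - player.position).sqrMagnitude < sqrRadius;
                // Disabled behaviours receive no Update()/FixedUpdate() calls at all
                if (m.enabled != near) m.enabled = near;
            }
        }
    }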

    3. Cache stuff:
    What? The components you use a lot should be cached. Other objects you work with, should be cached (if you already know them). You get the idea.
    Why? GetComponent<T>() and GameObject.Find() are very, very slow calls. Just use them once, instead of every frame.
    Example? You do your GetComponent<T>() calls in Awake() and store the results.
     
    idbrii, Kiwasi, PhilSA and 2 others like this.
  15. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    I can understand that people are wary of premature optimization, but keep in mind that this thread isn't saying "from now on you must always do all this". The goal of this thread is to inform people about all the performance pitfalls we can think of, so that they are aware of what they can do to improve their game's performance. This is the time and the place to discuss all these things.

    Keep in mind that my tip doesn't say "Never use MonoBehaviours". It says "Consider not using MonoBehaviours on everything". I did suggest at some point that it could be possible to make an entire game without MonoBehaviours, but that was just an observation. The 10k behaviours test only serves to show that a MonoBehaviour update takes a lot more time than a call to a regular class method.

    It is very common to see Unity users put MonoBehaviours on everything. For instance, many bullet-hell games have a MonoBehaviour on each bullet. In that case it becomes a totally great idea to make a single (or a few) "bullet managers" instead (see the sketch below). The same goes for "DestroyAfterXSeconds" behaviours on every sound effect, decal, and particle effect that is spawned in a game; a manager would be great for that. This is what I'm hoping people will learn from this tip.
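
    A bullet manager along those lines might look like this (a minimal sketch with hypothetical names, ignoring pooling and collisions):
    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    public class BulletManager : MonoBehaviour
    {
        private struct Bullet
        {
            public Transform view;
            public Vector3 velocity;
            public float lifeRemaining;
        }

        private List<Bullet> bullets = new List<Bullet>();

        public void Spawn(Transform view, Vector3 velocity, float lifetime)
        {
            bullets.Add(new Bullet { view = view, velocity = velocity, lifeRemaining = lifetime });
        }

        void Update()
        {
            float dt = Time.deltaTime;
            // One Update() call moves every bullet; no per-bullet MonoBehaviour
            for (int i = bullets.Count - 1; i >= 0; i--)
            {
                Bullet b = bullets[i];
                b.view.position += b.velocity * dt;
                b.lifeRemaining -= dt;
                if (b.lifeRemaining <= 0f)
                {
                    Destroy(b.view.gameObject);
                    bullets.RemoveAt(i);
                }
                else
                {
                    bullets[i] = b; // write the modified struct back
                }
            }
        }
    }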
     
    Last edited: Feb 16, 2016
  16. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    Actually, I forgot about that. I updated the test with the build profiling results. Both are faster, but with about the same relative difference as in the editor.
     
    Martin_H likes this.
  17. TheSniperFan

    TheSniperFan

    Joined:
    Jul 18, 2013
    Posts:
    712
    It is perfectly fine to use in-editor profiling for fast iterations. However, you should always do the final checks using an actual build of your game.
    As your projects grow, you're likely to end up with more and more code that runs only in either the editor or builds. It will become next to impossible to keep track of every line of code that behaves differently depending on where it runs.
    Just be safe and make sure that your final tests are made using builds.


    Another one for the thread:

    Keep your transform hierarchies flat
    I hadn't really thought about that one until the last thread (not this one, the one that "spawned" this one), but it's such an obvious performance hog.
    The transform hierarchy is a tree. In order to calculate the actual transform of an object (world position, orientation & scale), you need to look at its own transform component and every parent transform, all the way up to the root. You add the positions and multiply all the rotations and scales on the way.
    Let's just look at the transform component. It consists of three "pieces of data", if you will.
    If you have n GameObjects in your scene, you need to evaluate 3*n pieces of data in order to prepare the transform components of them all.
    When you (ab)use a GameObject as a directory, as is often done, you drastically increase the amount of data that needs to be evaluated. If you put everything under one root GameObject (typically used as a crutch for world-streaming before the SceneManager was a thing), you double the amount of work, as you now have to evaluate 2*3*n pieces of data.


    Unfortunately, Unity's hierarchy view pushes developers into this pit, because it lacks any other form of grouping. Simple layers for GameObjects would be an amazing addition to the editor. Nothing fancy, just something like what you see in Blender. I might write an editor extension to do just that; it depends on whether the editor extension API lets me do it without too much of a headache.

    EDIT:
    I should specify that I mean layers that exist purely in-editor and aren't tied to the physics engine.
     
    Philip-Rowlands and Martin_H like this.
  18. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    Basically, it needs what UE4 has: literal 'folders' to organize your GameObjects that have no impact on anything other than grouping things in the UI.

    At first I was upset about UE4's lack of a concept of "hierarchy" in its scene, but now it's really starting to make sense to me
     
    Last edited: Feb 16, 2016
  19. TheSniperFan

    TheSniperFan

    Joined:
    Jul 18, 2013
    Posts:
    712
    I just put the transform hierarchy post I made to the test. Here is the gist of it:

    Is there a performance difference?
    Yes.

    Should I bother?
    No. (And that's coming from someone who values performance A LOT.)
    The performance difference is really that small.

    5000 GameObjects (moving right using a central manager):
    As root objects: ~4.3ms
    Inside a hierarchy at depth 1: ~4.5ms
    Inside a hierarchy at depth 10: ~6.8ms

    (I used the standard cube as template. So every object had its own collider attached to it.)
     
  20. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,436
    I made similar tests with physics objects; I might even have posted the results in one of the threads I made in the past. As far as I remember, I came to a similar conclusion. I'm (ab)using transforms as folders because at the moment having stuff organized is worth more to me than very minor performance gains. I can still rather easily change this further down the line if I feel my priorities have changed.



    @hippocoder: Is there a chance we can get this thread pinned to the top? I feel like it's really useful for a great many people and will continue to be even if no further replies come in for a while.
     
  21. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    This reminded me of something super duper important that not many people know:
    Whenever you call this.transform in a MonoBehaviour, it actually does a GetComponent (or an equivalent) for the transform component every time. So you should always cache your transform component in Awake() or Start() in order to use it later in Update().

    I can prove it by re-using my 10k MonoBehaviours test where each behaviour moves its transform in Update(). If I don't cache the transform, and instead use this.transform to access it every frame, the average ms per frame jumps from 8 to 11 ms.

    The same goes for Camera.main and all sorts of gameObject.something properties.
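
    The caching pattern being recommended, as a minimal sketch (hypothetical class name):
    Code (CSharp):
    using UnityEngine;

    public class CachedReferences : MonoBehaviour
    {
        private Transform _transform;
        private Camera _mainCamera;

        void Awake()
        {
            // Pay the lookup cost once...
            _transform = GetComponent<Transform>();
            _mainCamera = Camera.main; // Camera.main performs a search internally
        }

        void Update()
        {
            // ...and reuse the cached references every frame
            _transform.LookAt(_mainCamera.transform);
        }
    }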

    EDIT: Okay, that's pretty weird. What if I told you that this:
    Code (CSharp):
    void Update()
    {
        gameObject.GetComponent<Transform>().position += Vector3.forward * Time.deltaTime;
    }
    Performs slightly better than this:
    Code (CSharp):
    void Update()
    {
        this.transform.position += Vector3.forward * Time.deltaTime;
    }
    The gameObject.GetComponent<Transform>() version is actually 1 ms faster...
     
    Last edited: Feb 16, 2016
  22. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    I dunno; if it becomes super useful then it's probably better to put it on the wiki or something. The reason being that the data can go out of date with Unity versions, so it should also explicitly state the version. Plus, it should be peer reviewed rather than taken as gospel. Following the thread though, see what happens.

    Then there's the whole IL2CPP and ongoing optimisations...
     
  23. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    Speaking of out of date optimisations:

    What version are you using? From 5.0 the transforms are cached by the engine anyway (They were supposed to be, I seldom have much need for this sort of micro-optimisation). And none of the GameObject.somethingElse properties work.
     
  24. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    I thought I heard that too, but it seems that for whatever reason, this.transform is heavier than GetComponent<Transform>(), which is in turn heavier than a manually cached transform

    The test I made was on 5.3.2f1
     
  25. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,537
    Well, of course a manually cached transform is going to be fastest. It's already referenced locally in Mono, with no overhead of a function call (this.transform is just a property accessor, i.e. a function call) and no overhead of communicating with the internal Unity runtime.

    Thing is, I just tested, and 'this.transform' is faster than 'this.GetComponent<Transform>()' on my system. This was with 5.3.1f1.

    The code displays running averages:
    Code (csharp):
    using UnityEngine;

    public class ZTestScript02 : MonoBehaviour {

        public int Count = 500000;

        private Transform _t;
        double _directMS;
        double _propertyMS;
        double _genericMS;
        double _typeMS;

        void Awake()
        {
            _t = this.GetComponent<Transform>();
        }

        void Update()
        {
            var watch = new System.Diagnostics.Stopwatch();

            watch.Start();
            for (int i = 0; i < this.Count; i++)
            {
                var t = _t;
            }
            watch.Stop();
            _directMS += (watch.Elapsed.Milliseconds - _directMS) * 0.1d;
            watch.Reset();

            watch.Start();
            for (int i = 0; i < this.Count; i++)
            {
                var t = this.transform;
            }
            watch.Stop();
            _propertyMS += (watch.Elapsed.Milliseconds - _propertyMS) * 0.1d;
            watch.Reset();

            watch.Start();
            for (int i = 0; i < this.Count; i++)
            {
                var t = this.GetComponent<Transform>();
            }
            watch.Stop();
            _genericMS += (watch.Elapsed.Milliseconds - _genericMS) * 0.1d;
            watch.Reset();

            watch.Start();
            for (int i = 0; i < this.Count; i++)
            {
                var t = this.GetComponent(typeof(Transform)) as Transform;
            }
            watch.Stop();
            _typeMS += (watch.Elapsed.Milliseconds - _typeMS) * 0.1d;
            watch.Reset();
        }

        void OnGUI()
        {
            //display average access time
            GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _directMS)));
            GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _propertyMS)));
            GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _genericMS)));
            GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _typeMS)));
        }
    }

    Speeds on my i7-2600K at 3.4 GHz with 32 GB RAM and a GeForce GTX 960 running Windows 10, Unity version 5.3.1f1.
    Times are for copying references 500,000 times each:
    direct: ~2 ms
    property: ~14 ms
    GetComponent<T>: ~30 ms
    GetComponent(typeof): ~51 ms

    I'm testing merely copying a reference to the transform, as this is the rawest access I can think of. No modifying or anything, as that would just dilute the results.

    The transform property appears to be slightly over twice as fast as GetComponent<T>.

    Note, GetComponent<T> is faster than GetComponent(typeof), and this is because Unity has changed the way GetComponent<T> retrieves components, in a rather ingenious way if I may say so. It used to be just a forwarding method to the typeof version, but now it does this:

    from decompiled UnityEngine.dll:
    Code (csharp):
    [WrapperlessIcall]
    [MethodImpl(MethodImplOptions.InternalCall)]
    internal void GetComponentFastPath(System.Type type, IntPtr oneFurtherThanResultValue);

    [SecuritySafeCritical]
    public unsafe T GetComponent<T>()
    {
      CastHelper<T> castHelper = new CastHelper<T>();
      this.GetComponentFastPath(typeof (T), new IntPtr((void*) &castHelper.onePointerFurtherThanT));
      return castHelper.t;
    }
    Note that 'CastHelper<T>' is a struct like this:
    Code (csharp):
    using System;

    namespace UnityEngine
    {
      internal struct CastHelper<T>
      {
        public T t;
        public IntPtr onePointerFurtherThanT;
      }
    }
    So basically they create a struct on the stack in the method 'GetComponent<T>', then pass in the memory address of the field that immediately follows a field typed to what we want the result to be. Since we know how the struct packs, the internal code knows the reference should be placed 4/8 bytes (32/64-bit) before the address passed in.

    This is nice since, on the internal side, they don't have to deal with preparing the reference for return. Instead, all the internal side does is set the pointer at that address to the address of the component, which it has stored internally anyway, and you don't have a cast on the Mono side, which you would if you called 'GetComponent(typeof)' since that returns it typed as Component.

    It's literally just a pointer-sized integer copy.

    With that knowledge, this COULD result in higher efficiency on some machines versus the way the transform property works... maybe? Are you using OSX, perchance?
     
  26. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    I actually just redid the test at home on a different machine with the exact same code I had earlier and my results are now similar to yours. Dunno what happened. I remember double-checking that I didn't mix things up because it really surprised me, so I don't think it was just a mistake.

    I was on windows x64 in both cases.
     
  27. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,436
    But wouldn't it be better to test a more common situation? Here you are retrieving the same thing over and over, which seems pointless to me; the optimization for that would simply be to stop doing it, right? But what if you have half a million objects from which you want to retrieve their individual transforms? That sounds a lot closer to a realistic use case to me. I had a gut feeling that your attempt not to dilute the results diluted them even further from being applicable to real-world scenarios. It's just a gut feeling that I wanted to test, and I might have done something wrong because I'm still a noob with these things. So please have a look at my alternate test proposal and tell me what you think.

    I've copied your original code first, let it run a little and the times are:
    1.615 ms
    11.986 ms
    27.365 ms
    44.459 ms
    On an i7 with win7 64bit.

    And now my twist on your test:
    Code (csharp):
    using UnityEngine;

    public class ZTestScript06 : MonoBehaviour {

        private int Count = 500000;

        private Transform _t;
        double _directMS;
        double _propertyMS;
        double _genericMS;
        double _typeMS;
        GameObject[] array;

        void Awake()
        {
            _t = this.GetComponent<Transform>();
            array = new GameObject[Count];

            for (int i = 0; i < this.Count; i++)
            {
                array[i] = new GameObject();
            }
        }

        void Update()
        {
            var watch = new System.Diagnostics.Stopwatch();

            watch.Start();
            for (int i = 0; i < this.Count; i++)
            {
                //var t = _t;
                var t = array[i];
            }
            watch.Stop();
            _directMS += (watch.Elapsed.Milliseconds - _directMS) * 0.1d;
            watch.Reset();

            watch.Start();
            for (int i = 0; i < this.Count; i++)
            {
                //var t = this.transform;
                var t = array[i].transform;
            }
            watch.Stop();
            _propertyMS += (watch.Elapsed.Milliseconds - _propertyMS) * 0.1d;
            watch.Reset();

            watch.Start();
            for (int i = 0; i < this.Count; i++)
            {
                //var t = this.GetComponent<Transform>();
                var t = array[i].GetComponent<Transform>();
            }
            watch.Stop();
            _genericMS += (watch.Elapsed.Milliseconds - _genericMS) * 0.1d;
            watch.Reset();

            watch.Start();
            for (int i = 0; i < this.Count; i++)
            {
                //var t = this.GetComponent(typeof(Transform)) as Transform;
                var t = array[i].GetComponent(typeof(Transform)) as Transform;
            }
            watch.Stop();
            _typeMS += (watch.Elapsed.Milliseconds - _typeMS) * 0.1d;
            watch.Reset();
        }

        void OnGUI()
        {
            //display average access time
            GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _directMS)));
            GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _propertyMS)));
            GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _genericMS)));
            GUILayout.Label(new GUIContent(string.Format("{0:0.000}ms", _typeMS)));
        }
    }

    1.996 ms //doesn't retrieve the transform actually and just gets the object
    62.069 ms
    75.015 ms
    93.616 ms

    The differences seem a lot less pronounced now. Would you agree that this is a more useful test of real-world performance than the one originally proposed?
     
  28. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,537
    Well not really.

    The point of the test wasn't to demonstrate a real world situation. It was an attempt to measure the speed of each relative to one another, so to know which is actually the fastest, and not necessarily by how much.

    Note how, in your results, the actual differences between them are:

    accessing this:
    generic is ~15.4ms slower than property
    typeof is ~17ms slower than generic

    accessing the array:
    generic is ~13ms slower than property
    typeof is ~18ms slower than generic

    The differences in speed are actually roughly the same; they've all scaled up in cost by about 50ms, every one of them, similarly. With one exception, the direct access... and with good reason.

    UnityEngine.Objects (GameObject, Transform, scripts, etc.) have two parts to them: the Mono/.NET object and the C++ object. They sit in two completely different parts of memory. When you call through to the internal Unity code, it needs to look up the internal object related to the Mono/.NET object.

    Now... I don't know how this lookup is done, personally. Maybe the instanceID is a hash into some hashtable, maybe they cache the pointer address somewhere; I just don't know. What I do know is that there is SOME cost to it, and it's reasonable to assume that as the scene grows in size, the lookup grows in cost.

    And your example demonstrates that growth in cost. All methods had a similar increase in the cost of calling them, about 50 ms. It's reasonable to blame this on that lookup.

    And the fact that the direct access doesn't come with that cost is easily explained as well. There's no communication between Mono and the Unity internal code; all it is, is a direct memory access of the Mono heap. Accessing an array through the [index] accessor versus accessing a variable is pretty much identical in speed.

    You can run a test to prove it.

    Here is a simple Console application that demonstrates it:

    Code (csharp):
    using System;

    namespace Console01
    {
        internal class Program
        {
            public static void Main()
            {
                const int COUNT = 500000;
                const int LOOP = 5000000;
                int[] arr = new int[COUNT];
                int no = 0;

                double noMS = 0d;
                double lowMS = 0d;
                double highMS = 0d;
                var watch = new System.Diagnostics.Stopwatch();
                var rand = new Random();

                while(true)
                {
                    watch.Start();
                    for (int i = 0; i < LOOP; i++)
                    {
                        var j = no.ToString();
                    }
                    watch.Stop();
                    noMS += (watch.Elapsed.Milliseconds - noMS) * 0.1d;
                    watch.Reset();

                    int low = rand.Next(10);
                    watch.Start();
                    for (int i = 0; i < LOOP; i++)
                    {
                        var j = arr[low].ToString();
                    }
                    watch.Stop();
                    lowMS += (watch.Elapsed.Milliseconds - lowMS) * 0.1d;
                    watch.Reset();

                    int high = rand.Next(COUNT - 11, COUNT - 1);
                    watch.Start();
                    for (int i = 0; i < LOOP; i++)
                    {
                        var j = arr[high].ToString();
                    }
                    watch.Stop();
                    highMS += (watch.Elapsed.Milliseconds - highMS) * 0.1d;
                    watch.Reset();

                    Console.WriteLine("{0:0.000}ms : {1:0.000}ms : {2:0.000}ms", noMS, lowMS, highMS);
                    System.GC.Collect(); //force collect all those strings, otherwise GC may throw off the StopWatch
                }
            }
        }
    }

    So yeah, I wasn't going for a real world scenario (none of these examples are real world). I was just trying to compare the raw difference in property, generic GetComponent, and typeof GetComponent.

    What your results do help show others, though, is that there is an increased cost to calling these methods when you have more objects in the scene.

    But the relative difference between these methods remains the same.





    NOW... and @PhilSA may find this interesting: there was a really WEIRD result I found when running your code.

    For smaller values of 'Count', below 10,000... the property accessor of Transform ended up being slower than the generic GetComponent call. Results similar to what PhilSA was claiming earlier.

    At 10,000 items, with the array, I was getting:

    property: ~1ms
    generic: ~0.1ms
    typeof: 0.2ms to 1.1ms (this one was weird, it jumped all over the place, and changed every time I played the benchmark)

    Even though the typeof access jumped all over, the property and generic access were rock solid.

    And this throws a whole wrench into the benchmark in general.

    I have some theories as to why it might be happening... but I'm not exactly sure, so I can't really say. Maybe the transform property algorithm has a cost with a near-linear growth curve, whereas GetComponent has a cost with a polynomial growth curve.

    So, for object counts lower than some N, GetComponent ends up being faster, but ends up slower as N increases.

    Think of plotting a line and a parabola, both passing through (0,0). For values of x near 0, the line is above the parabola, but as x approaches infinity, the parabola climbs much higher than the line.

    parabola_vs_line.png
     
    Last edited: Feb 17, 2016
    EmmetOT, PhilSA and Martin_H like this.
  29. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,436
    Very interesting, thanks a lot for the explanation!
     
  30. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    I just made a test for that thing I put in my quick tips section:
    And I can now confirm that the performance difference of virtual functions and overrides is completely meaningless. The price of 100,000 overridden function calls was about 0.5 ms more than the non-virtual function. So yeah... not worth it.

    I'm removing that item from the list

    EDIT:
    Hold on, maybe it does matter after all.

    The inheritance depth of a class does potentially have a noticeable effect on performance. Here's my test output (screenshot omitted):
    The cost increases with each inheritance level, so maybe don't go too crazy with huge inheritance chains in which stuff happens at every frame
     
    Last edited: Feb 19, 2016
    Martin_H likes this.
  31. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,537
    Yeah, because basically as you add classes to the inheritance chain and override, you have a function call for each level. They're effectively independent functions from one another, getting called up the chain.

    There is an extra cost the first time it's called, when the JIT compiler has to build the function chain. But that's a one-time cost.

    But the overall overhead is mostly negligible. It's really only noticeable for small functions, where the function does so little that its body is comparatively equal to the cost of a function call.

    If you had to do a 5-minute job, and it took you 3 minutes to prepare for it, that 3-minute preparation time is HUGE. If you added 3 more 5-minute jobs, each taking 3 minutes to prep for, the prep time would annoy you, as you just spent 32 minutes doing 20 minutes of work. You might want to reorganize the work so that you only prep once: 3 minutes prep, 20 minutes work, 23 minutes overall.

    Whereas if you had an hour-long job, the 3 minutes wouldn't even be noticeable.

    Virtual methods, and overrides in general, are usually for integral tasks, or heavy tasks that need possible modification. Seldom are they for small minor tasks... a property getter is seldom overridable.

    Whereas with a constructor, it's mandatory that it be overridable; you don't even have to flag it as such, it is implicitly virtual, since it's integral.

    In Unity, virtual methods would be integral for things like Unity messages: Start, Awake, Update, OnTriggerEnter, etc. They should always be marked virtual, unless you don't plan to allow that class to ever be inherited from, in which case you usually mark it sealed so that it can't be inherited from. On small or one-man teams you may overlook this, but on large teams or in distributed libraries/APIs it's super important, so that users understand the intent of the class.

    A big reason why sealed is important is that, technically, if I inherit from your class, I can declare a method with the same name as your method and just mark it 'new'. And if I'm stuck in a corner (and don't understand the implications) I might feel forced to use new, since I can't actually override your method.

    Code (csharp):
    public class FooA : MonoBehaviour
    {
        public void OnTriggerEnter()
        {
            Debug.Log("FooA");
        }
    }

    public class FooB : FooA
    {
        public new void OnTriggerEnter()
        {
            Debug.Log("FooB");
        }
    }
    In this case, Unity will call FooB's OnTriggerEnter, since it's the first method with that name found when reflecting over the type.

    This also happens if FooA marks it as private and I just define FooB.OnTriggerEnter without new (since it's private, you don't even need the new; private members are scoped to the class they're declared in). That's a good reason to always make your message handlers protected or public, unless the class is marked sealed: that way, anyone who wants to implement that message gets a warning and is forced to notice that the class hierarchy already uses it.
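    To make that private case concrete (a minimal sketch, reusing the same hypothetical FooA/FooB):

    Code (csharp):
    1.  
    2. public class FooA : MonoBehaviour
    3. {
    4.     // Private, so derived classes can't even see it...
    5.     private void OnTriggerEnter()
    6.     {
    7.         Debug.Log("FooA");
    8.     }
    9. }
    10.  
    11. public class FooB : FooA
    12. {
    13.     // ...so this compiles without 'new' and without any warning,
    14.     // yet Unity will call this one and FooA's handler never runs
    15.     private void OnTriggerEnter()
    16.     {
    17.         Debug.Log("FooB");
    18.     }
    19. }
    20.  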
     
    Last edited: Feb 19, 2016
    PhilSA likes this.
  32. MV10

    MV10

    Joined:
    Nov 6, 2015
    Posts:
    1,889
    I can't quite figure out what this is supposed to mean. Don't make last-minute decisions, but also don't calculate something and discard it... If you don't test whether you need the calc, how do you avoid the unnecessary calc?

    On the plus side, skimming this thread convinced me to finally spend 30 or 40 minutes adding multicast delegation for a single manager-style Update instead of MonoBehaviour Update calls everywhere. The improvement was fairly significant. Good bang for the buck, in my case, at least at very high res on sub-optimal hardware (i.e. where it counts). Slowly but surely we're shedding the baggage learned from all the tutorials...

    Perhaps in one sense, delegating update calls could be a special-case example of what I think you're trying to say in that quoted point above: "turn off" the call on a cached but currently unused instance, for example?
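    For anyone curious, the pattern boils down to something like this (a minimal sketch; the UpdateManager/OnTick names and the Action-based event are my own choices, not from earlier in the thread):

    Code (csharp):
    1. public class UpdateManager : MonoBehaviour
    2. {
    3.     // Multicast delegate: one Unity Update fans out to every subscriber
    4.     public static event System.Action OnTick;
    5.  
    6.     void Update()
    7.     {
    8.         if (OnTick != null) OnTick();
    9.     }
    10. }
    11.  
    12. public class Spinner
    13. {
    14.     private readonly Transform _transform;
    15.  
    16.     public Spinner(Transform transform)
    17.     {
    18.         _transform = transform;
    19.         UpdateManager.OnTick += Tick;
    20.     }
    21.  
    22.     // "Turning off" an instance is just unsubscribing it
    23.     public void Disable() { UpdateManager.OnTick -= Tick; }
    24.  
    25.     private void Tick()
    26.     {
    27.         _transform.Rotate(0f, 90f * Time.deltaTime, 0f);
    28.     }
    29. }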
     
  33. kru

    kru

    Joined:
    Jan 19, 2013
    Posts:
    452
    It might be better worded as: Structure your methods to exit as early as possible.

    A contrived example:
    Code (csharp):
    1. public void OperateOnSomeGameObject(GameObject obj)
    2. {
    3.      float possibleUselessCalculation = obj.transform.position.magnitude;
    4.  
    5.      if (obj.activeSelf == true)
    6.      {
    7.           // do something with possibleUselessCalculation...
    8.      }
    9. }
    If you know ahead of time that the calculation isn't needed, then don't perform it. In the above code, it'd be better to put the .magnitude calculation inside the if block, which is where it's used. These can sometimes be tricky to track down, especially in monster methods with excessive branching. That's why it's a good idea to keep methods small and easy to parse.
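    Rearranged to exit early, the same contrived example becomes (with the variable renamed accordingly):

    Code (csharp):
    1. public void OperateOnSomeGameObject(GameObject obj)
    2. {
    3.      // Exit early when there's nothing to do
    4.      if (!obj.activeSelf)
    5.           return;
    6.  
    7.      // The calculation now only runs when it's actually used
    8.      float usefulCalculation = obj.transform.position.magnitude;
    9.      // do something with usefulCalculation...
    10. }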
     
    TheSniperFan likes this.
  34. TheSniperFan

    TheSniperFan

    Joined:
    Jul 18, 2013
    Posts:
    712
    @MV10 :
    @kru nailed it. Here's a more realistic example, using a player that cannot be controlled while in midair:

    Code (csharp):
    1. private void Update() {
    2.   // Read input
    3.   float mouseInputX = Input.GetAxis("Mouse X");
    4.   float mouseInputY = Input.GetAxis("Mouse Y");
    5.   float moveHorizontal = Input.GetAxis("Horizontal");
    6.   float moveVertical = Input.GetAxis("Vertical");
    7.  
    8.   // Calculate the base values for the movement
    9.   Vector3 desiredMove = MovementFromInput(moveHorizontal, moveVertical);
    10.   Quaternion newPlayerRotation = PlayerRotationFromInput(mouseInputX, mouseInputY);
    11.  
    12.   // Apply environment based modifiers
    13.   desiredMove = MovementSlopePenalty(desiredMove);
    14.   desiredMove = FloorMaterialPenalty(desiredMove, GetCurrentFloorMaterial());
    15.  
    16.   // More logic
    17.   ...
    18.  
    19.   // Apply when grounded
    20.   if(IsGrounded()) {
    21.     rigidbody.AddForce(desiredMove, ForceMode.Impulse);
    22.     transform.rotation = newPlayerRotation;
    23.   }
    24. }
    It makes a lot more sense to structure the code like this:

    Code (csharp):
    1. private void Update() {
    2.   if(IsGrounded()) {
    3.     // Read input
    4.     float mouseInputX = Input.GetAxis("Mouse X");
    5.     float mouseInputY = Input.GetAxis("Mouse Y");
    6.     float moveHorizontal = Input.GetAxis("Horizontal");
    7.     float moveVertical = Input.GetAxis("Vertical");
    8.  
    9.     // Calculate the base values for the movement
    10.     Vector3 desiredMove = MovementFromInput(moveHorizontal, moveVertical);
    11.     Quaternion newPlayerRotation = PlayerRotationFromInput(mouseInputX, mouseInputY);
    12.  
    13.     // Apply environment based modifiers
    14.     desiredMove = MovementSlopePenalty(desiredMove);
    15.     desiredMove = FloorMaterialPenalty(desiredMove, GetCurrentFloorMaterial());
    16.  
    17.     // More logic
    18.     ...
    19.  
    20.     // Apply
    21.     rigidbody.AddForce(desiredMove, ForceMode.Impulse);
    22.     transform.rotation = newPlayerRotation;
    23.   }
    24. }

    As you can see, you need to perform the grounded check anyway, so it's much better to do it first, which lets you skip all the expensive player logic while the player is in midair.

    Think of it this way:
    Only do calculations if you know for sure that you will need them. If you don't know whether that's the case, find out first.
    Now, I don't want to rule out the possibility of exceptions, but in general, this is how you should handle it.
     
  35. MV10

    MV10

    Joined:
    Nov 6, 2015
    Posts:
    1,889
    I'd call that common sense :) ... The original wording threw me, but I appreciate the clarification nonetheless.
     
    Polymorphik likes this.
  36. TheSniperFan

    TheSniperFan

    Joined:
    Jul 18, 2013
    Posts:
    712
    @MV10 :
    You say that, but this is a really, really simple example. As your code gets more and more complex, those problems get more and more difficult to spot.
    The above example was a single method inside a single component. Once your processes start having dependencies across methods, classes, even components, this can get tricky. Especially if your workflow is iterative. These are the kinds of issues you usually find while refactoring.
     
  37. Polymorphik

    Polymorphik

    Joined:
    Jul 25, 2014
    Posts:
    599
    Common sense? Is that in the Asset Store? :p
     
  38. battou

    battou

    Joined:
    Jan 25, 2011
    Posts:
    222
    Wow! The tip about MonoBehaviour is awesome! Just checked it myself. But man, managing objects in the scene without access to the inspector is soooooo inconvenient.(((
     
  39. larku

    larku

    Joined:
    Mar 14, 2013
    Posts:
    1,422
    As they say: common sense ain't so common.
     
    Martin_H and Polymorphik like this.
  40. Polymorphik

    Polymorphik

    Joined:
    Jul 25, 2014
    Posts:
    599
     
    Arsonistic, Martin_H and larku like this.
  41. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,364
    Not available at runtime.
     
  42. Suddoha

    Suddoha

    Joined:
    Nov 9, 2013
    Posts:
    2,824
    You can keep your MonoBehaviours and still apply that event pattern. Just avoid naming your update methods like the ones Unity invokes, or Unity will call them as well.

    Besides that, don't start to completely eliminate MonoBehaviours from your game. Many things can be implemented differently: ScriptableObjects are extremely useful as data and behaviour providers, normal classes can be serialized, and some classes just shouldn't be a component at all.

    But you usually want to keep some behaviours, for good reasons. One of them is all the other methods that Unity invokes internally. Another is that you'd lose major inspector features that make Unity so great to work with, especially for a game/level designer, for prototyping and the like.
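    As an example of the ScriptableObject-as-data-provider idea (a minimal sketch; the EnemyStats name and fields are made up for illustration):

    Code (csharp):
    1. // An asset you create once, tweak in the inspector, and share
    2. // across any number of objects, without it being a component
    3. [CreateAssetMenu(menuName = "Data/Enemy Stats")]
    4. public class EnemyStats : ScriptableObject
    5. {
    6.     public float maxHealth = 100f;
    7.     public float moveSpeed = 3.5f;
    8.     public float attackDamage = 10f;
    9. }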
     
  43. Mafutta

    Mafutta

    Joined:
    Sep 30, 2014
    Posts:
    45
  44. FlightOfOne

    FlightOfOne

    Joined:
    Aug 1, 2014
    Posts:
    668
    I know this is an old post, but this is incredible! I had thought that calling update on individual behaviours vs. as a batch in an array made no real difference, but this nearly doubles performance! Some planning and thinking is involved beforehand, but the performance gain is well worth it. Thanks for the tip!
     
  45. Suddoha

    Suddoha

    Joined:
    Nov 9, 2013
    Posts:
    2,824
    If that surprises you that much, perhaps the new Entity Component System (ECS) is worth taking a look at, unless you have already done that.
     
    Antypodish and FlightOfOne like this.
  46. FlightOfOne

    FlightOfOne

    Joined:
    Aug 1, 2014
    Posts:
    668
    Thanks! Yep, I indeed have, but I'm holding off on using any ECS stuff in projects until it's out of preview and officially supported for production.
     
    Suddoha likes this.