Performance overhead for vertex function in surface shader

Discussion in 'Shaders' started by Johannski, Apr 3, 2017.

  1. Johannski

    Hey there,

    I'm optimizing my shaders right now and had the idea to put some stuff I don't need to calculate for every pixel into a custom vertex function (for a surface shader). For example getting a rotated world point:

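    Code (Cg/HLSL):
    // Custom vertex function for a surface shader (#pragma surface ... vertex:vert).
    void vert (inout appdata_full v, out Input o)
    {
        // Zero-initialize every member of the Input struct.
        UNITY_INITIALIZE_OUTPUT(Input, o);
        // Object space to world space, once per vertex.
        o.customWorldPos = mul(unity_ObjectToWorld, v.vertex);
        // ApplyRotation is defined elsewhere in the shader (not shown).
        o.customWorldPos = ApplyRotation(o.customWorldPos);
    }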
    Is there any overhead involved in calling a custom vertex function? Developing for iOS :)
     
  2. bgolus

    There's obviously the cost of that math being in the vertex shader, which should be less than the cost of doing it in the fragment shader unless you have a really high-polygon mesh (in which case I would say look into mesh LODs ;)). There is also a cost for transferring data from the vertex shader to the fragment shader. It's not a ton, but it does exist: if you had a quad mesh that was only being rendered to 4 pixels, doing the work in the vertex shader would be noticeably more expensive than doing it in the fragment shader.

    However, this is generally considered good practice for mobile and mid-range desktop (at least for GPUs from a few years ago). On desktop / console it can be a bit of a toss-up, but often doing it all in the fragment shader is faster if it means not having to transfer additional data, because the processing power of modern GPUs has grown so much while the memory bandwidth (where much of the cost comes from) hasn't increased as significantly. Mobile, while getting surprisingly powerful and still quite bandwidth constrained, generally falls on the side of vertex & transfer over doing it in the fragment shader. It all depends on just how costly the operation is, of course.

    A transform matrix operation on a position vector is probably going to be enough work to justify the cost, whereas just an offset and scale (i.e. pos * float3(0.5, 0.5, 0.5) + float3(1,0,3)) is probably not worth it.
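
    For comparison, here is a minimal sketch of the per-pixel version of the rotation from the first post. Several things in it are assumptions, not taken from the thread: the rotation comes from a hypothetical float4x4 property named _Rotation (set from script with Material.SetMatrix), the lighting model is Lambert, and the result is written to Albedo only so that the value is actually used. It relies on the built-in worldPos member of the surface shader Input struct.

    ```hlsl
    // Goes inside CGPROGRAM ... ENDCG of a surface shader.
    // Assumed pragma: #pragma surface surf Lambert   (no vertex: modifier needed)

    // Hypothetical rotation matrix, set from script via Material.SetMatrix.
    float4x4 _Rotation;

    struct Input
    {
        float3 worldPos; // built-in surface shader input: world-space position
    };

    void surf (Input IN, inout SurfaceOutput o)
    {
        // Runs once per covered pixel instead of once per vertex.
        float3 rotated = mul(_Rotation, float4(IN.worldPos, 1.0)).xyz;

        // Placeholder use so the compiler does not strip the calculation.
        o.Albedo = frac(rotated);
    }
    ```

    As far as I know, worldPos is itself interpolated from the vertex stage, so in this particular sketch both variants pass one position to the fragment shader and the per-pixel one mainly adds a matrix multiply per pixel. Profiling on the target device is the only reliable way to pick between the two.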
     
  3. Johannski

    Awesome, thanks for the detailed answer! I did suspect that it is faster for simple vertex/fragment shaders; I just feared that UNITY_INITIALIZE_OUTPUT(Input,o); might have an overhead I don't know of. Indeed it is a multiplication with a TSR matrix. :)
     
  4. bgolus

    UNITY_INITIALIZE_OUTPUT(Input,o); just sets everything in the struct to zero to avoid uninitialized variables. If you actually make sure you set all of your parameters (and you absolutely should!) then that macro will effectively be skipped in the compiled shader; otherwise any member you don't set is just left at zero. Personally I dislike that macro since it hides errors that would otherwise let you know your vertex shader transfers data that will never be used.
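
    A minimal sketch of that alternative: drop the macro and assign every member of Input by hand. The fade member and its use of vertex color are hypothetical, added only to show a second output. This assumes Input contains only custom members; as far as I know, built-in members such as uv_MainTex would also need a value, or the macro would have to stay.

    ```hlsl
    // Goes inside CGPROGRAM ... ENDCG of a surface shader.
    // Assumed pragma: #pragma surface surf Lambert vertex:vert

    struct Input
    {
        float3 customWorldPos; // custom data passed from vertex to fragment
        float  fade;           // hypothetical second member, for illustration
    };

    void vert (inout appdata_full v, out Input o)
    {
        // No UNITY_INITIALIZE_OUTPUT here. On HLSL compilers the macro is
        // assumed to expand to roughly: o = (Input)0;
        // Instead, each member gets an explicit value.
        o.customWorldPos = mul(unity_ObjectToWorld, v.vertex).xyz;
        o.fade = saturate(v.color.a);
    }
    ```

    With an HLSL-based compiler, forgetting one of these assignments should produce an "output parameter not completely initialized" error instead of silently passing zero, which is the kind of error the macro hides.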