Shader optimization pointers?

Discussion in 'Shaders' started by hackborn, Feb 11, 2016.

  1. hackborn

    Joined:
    Feb 11, 2016
    Posts:
    5
    This might well be too general a question, but I wrote a shader that sets color from vertex color, intending it to be an optimized best case for performance, and I'm surprised to find that it runs more slowly than the "Particle Premultiply Blend" shader in Unity's built-in "Default-Particle" material. For reference, here's my shader:

    Code (csharp):
    Shader "Custom/NewSurfaceShader" {
        Properties {
            _Color ("Color", Color) = (1,1,1,1)
        }
        SubShader {
            Tags { "QUEUE"="Transparent" "IGNOREPROJECTOR"="true" "RenderType"="Transparent" }
            // Tags { "RenderType"="Opaque" }
            // LOD 200

            CGPROGRAM
            #pragma surface surf Standard
            #pragma vertex vert

            #pragma target 2.0

            // XXX I would really like to use my own appdata in vert() because I assume
            // there's overhead with getting the full data, but the shader won't
            // compile with this.
            struct appdata {
                float4 color : COLOR;
            };

            struct Input {
                float2 uv_MainTex;
                float4 vertexColor;
            };

            half _Glossiness;
            half _Metallic;
            fixed4 _Color;

            void vert (inout appdata_full v, out Input o) {
                UNITY_INITIALIZE_OUTPUT(Input, o);
                o.vertexColor = v.color;
            }

            void surf (Input IN, inout SurfaceOutputStandard o) {
                o.Albedo = IN.vertexColor;
            }
            ENDCG
        }
        FallBack "Diffuse"
    }

    I've been comparing it against the ParticleShader, which is pretty extensive, so I'm not quite sure what to be looking for. I've already made one change -- the default shader is set to #pragma target 3.0 and I found moving it to 2.0 improved performance considerably.

    I guess my basic question is this: I know these shaders behave fundamentally differently. I'm pretty sure the Particle shader does a texture lookup and then tints the result with a single colour, whereas mine interpolates colors from each vertex. However, I always thought texture lookups were very slow, so if anything I expected mine to be faster. Is that wrong? Is vertex interpolation going to be slower, so that I should give up on that route, or is there some other difference someone could point out?

    thanks much!
     
  2. smd863

    Joined:
    Jan 26, 2014
    Posts:
    292
    A surface shader automatically applies lighting to the values supplied by the surf function, and if you use "#pragma surface surf Standard", it is using the full Standard physically-based lighting functions which are very expensive. Setting the target to 2.0 will make the shader use a cheaper BRDF function, but it is still much slower than unlit or legacy lighting shaders.

    The existing particle shader is written as a straight vertex and fragment shader, and does no lighting calculations at all. It will be very fast. You might be able to speed it up if you remove the texture lookup, but texture lookups aren't always slow. The graphics card may be able to pre-fetch the texture (especially when you are using the UVs straight from the vertex shader), so the lookup may take no time at all. Of course, it will still take an extra instruction or two to multiply the texture in, and once you hit your texture bandwidth limit, lookups will start to hurt you. In practice, though, an extra texture lookup or two could have very little effect on actual performance.
     
  3. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,348
    Shaders that use "#pragma surface" are converted into a vert/frag shader (actually several) behind the scenes. You can select the shader and click "Show generated code" to see the full shader. It's a lot of code, especially for the Standard shader, and there's more than what the generated file shows, as many of the functions live in external cginc files.
     
  4. rageingnonsense

    Joined:
    Dec 3, 2014
    Posts:
    99
    Show Generated Code is absolutely the way to go. You'll see that it generates A LOT of code compared to your surface shader.

    If you want something for optimal performance, you will need to start from an unlit vertex shader and do it all manually. You could also take the generated code and rip it apart.
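
    As a rough illustration of the "do it all manually" route, here is a minimal sketch of an unlit vertex/fragment shader that outputs only the interpolated vertex color. The shader name "Custom/UnlitVertexColor" and the blend mode are assumptions made for the example, not something taken from the built-in particle shader. Note that in a plain vert/frag shader you can declare your own small appdata struct, so only the attributes you actually read are fetched.

    ```shaderlab
    // Hypothetical example shader: unlit, vertex color only, no texture, no lighting.
    Shader "Custom/UnlitVertexColor" {
        SubShader {
            Tags { "Queue"="Transparent" "IgnoreProjector"="True" "RenderType"="Transparent" }

            // Assumption: ordinary alpha blending. A premultiplied-alpha setup
            // would use "Blend One OneMinusSrcAlpha" instead.
            Blend SrcAlpha OneMinusSrcAlpha
            ZWrite Off

            Pass {
                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #include "UnityCG.cginc"

                // Only the vertex attributes this shader reads.
                struct appdata {
                    float4 vertex : POSITION;
                    fixed4 color  : COLOR;
                };

                // Data interpolated from the vertex stage to the fragment stage.
                struct v2f {
                    float4 pos   : SV_POSITION;
                    fixed4 color : COLOR;
                };

                v2f vert (appdata v) {
                    v2f o;
                    // Object space to clip space. On older Unity versions the
                    // equivalent is mul(UNITY_MATRIX_MVP, v.vertex).
                    o.pos = UnityObjectToClipPos(v.vertex);
                    o.color = v.color;
                    return o;
                }

                fixed4 frag (v2f i) : SV_Target {
                    // No texture read and no lighting: just the interpolated color.
                    return i.color;
                }
                ENDCG
            }
        }
    }
    ```

    The whole shader is one matrix multiply per vertex and a pass-through per pixel, which makes it a useful baseline to compare other variants against.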
     
  5. hackborn

    Joined:
    Feb 11, 2016
    Posts:
    5
    Oh thanks much, that explains a lot of it. I'm getting pretty interested in the default particle shader now. Even after rewriting mine as a standard vert/frag shader, performance is improved but still not as fast as the particle shader. So I tried nulling out the particle shader's texture, and its performance divebombed. Then I tried replacing its texture with a 1x1 image, and performance still took a dip. That's very odd to me: I could see mipmapping improving performance with a large texture, but what's there to improve on something 1x1? Anyway, I'm now getting very close numbers, a couple of FPS off, so it's not that big a deal, but it's very interesting to me.
     
  6. hackborn

    Joined:
    Feb 11, 2016
    Posts:
    5
    I was looking at the generated code until I found you could download the Unity built-in shaders, then I switched over to that. I'm just dipping my toes in the water here, and the original is much, much easier to read.
     
  7. rageingnonsense

    Joined:
    Dec 3, 2014
    Posts:
    99
    You'll want to do both, honestly. You obviously need the built-in shader source, but the generated code helps explain what a surface shader really is (a shortcut).
     
  8. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,348
    A nice, basic, well-optimized unlit particle shader is going to be 1 texture read, ~8 vertex math operations and a tiny handful of fragment math operations (also called ALU ops, for Arithmetic Logic Unit operations). The basic Standard shader, even running the simplified #pragma target 2.0 version, is going to be multiple texture reads and close to 100 ALU ops between the vertex shader and fragment shader, maybe more. Changing the size of the texture isn't going to make up for a 10-fold increase in math operations. Lighting, especially using the physically based Standard shader, is not cheap.

    You can use a simplified lighting model like Lambert or Blinn-Phong and it'll be a lot faster than the Standard model, or you can go full vertex lighting, which is (generally) even cheaper. Doing work in the vertex shader is often cheaper because an object on screen usually has fewer vertices than pixels, so the math is done fewer times. If you have a lot of very dense meshes or are drawing very small particles, it might not be any faster and might even be slower, as there is a cost to the data that's passed between the vertex shader and pixel shader.
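
    For the simplified-lighting route, a minimal sketch of a surface shader that keeps the vertex color as albedo but swaps the Standard model for the built-in Lambert model might look like this. The shader name "Custom/VertexColorLambert" is hypothetical, and the opaque render type is an assumption for the example. It relies on the surface shader convention that an Input member declared as float4 with the COLOR semantic receives the interpolated per-vertex color, so no custom vertex function is needed.

    ```shaderlab
    // Hypothetical example shader: vertex color as albedo, lit with Lambert.
    Shader "Custom/VertexColorLambert" {
        SubShader {
            Tags { "RenderType"="Opaque" }

            CGPROGRAM
            // Lambert is a built-in diffuse-only lighting model for surface
            // shaders; it pairs with SurfaceOutput, not SurfaceOutputStandard.
            #pragma surface surf Lambert
            #pragma target 2.0

            struct Input {
                // Filled in automatically with the interpolated vertex color.
                float4 color : COLOR;
            };

            void surf (Input IN, inout SurfaceOutput o) {
                o.Albedo = IN.color.rgb;
            }
            ENDCG
        }
        FallBack "Diffuse"
    }
    ```

    Comparing the "Show generated code" output of this against the Standard version is a quick way to see how much of the cost comes from the lighting model rather than from anything in surf.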
     
  9. hackborn

    Joined:
    Feb 11, 2016
    Posts:
    5
    Thanks much everyone, tremendously helpful info.