Optimize shader

Discussion in 'Shaders' started by adam_mehman, May 27, 2015.

  1. adam_mehman

    Joined: Dec 11, 2014 | Posts: 104
    Hi guys, I have a shader that is perfect (on PC) for what I need, but there is a problem on mobile platforms: it is not optimized for them and it lags a lot.

    Can someone help me and tell me how to optimize this shader:


    Code (CSharp):
    // Per pixel bumped refraction.
    // Uses a normal map to distort the image behind, and
    // an additional texture to tint the color.

    Shader "FX/Glass/Stained BumpDistort" {
    Properties {
        _BumpAmt  ("Distortion", range (0,128)) = 10
        _MainTex ("Tint Color (RGB)", 2D) = "white" {}
        _BumpMap ("Normalmap", 2D) = "bump" {}
    }

    Category {

        // We must be transparent, so other objects are drawn before this one.
        Tags { "Queue"="Transparent" "RenderType"="Opaque" }

        SubShader {

            // This pass grabs the screen behind the object into a texture.
            // We can access the result in the next pass as _GrabTexture
            GrabPass {
                Name "BASE"
                Tags { "LightMode" = "Always" }
            }

            // Main pass: Take the texture grabbed above and use the bumpmap to perturb it
            // on to the screen
            Pass {
                Name "BASE"
                Tags { "LightMode" = "Always" }

    CGPROGRAM
    #pragma vertex vert
    #pragma fragment frag
    #pragma fragmentoption ARB_precision_hint_fastest
    #include "UnityCG.cginc"

    struct appdata_t {
        float4 vertex : POSITION;
        float2 texcoord: TEXCOORD0;
    };

    struct v2f {
        float4 vertex : POSITION;
        float4 uvgrab : TEXCOORD0;
        float2 uvbump : TEXCOORD1;
        float2 uvmain : TEXCOORD2;
    };

    float _BumpAmt;
    float4 _BumpMap_ST;
    float4 _MainTex_ST;

    v2f vert (appdata_t v)
    {
        v2f o;
        o.vertex = mul(UNITY_MATRIX_MVP, v.vertex);
        #if UNITY_UV_STARTS_AT_TOP
        float scale = -1.0;
        #else
        float scale = 1.0;
        #endif
        o.uvgrab.xy = (float2(o.vertex.x, o.vertex.y*scale) + o.vertex.w) * 0.5;
        o.uvgrab.zw = o.vertex.zw;
        o.uvbump = TRANSFORM_TEX( v.texcoord, _BumpMap );
        o.uvmain = TRANSFORM_TEX( v.texcoord, _MainTex );
        return o;
    }

    sampler2D _GrabTexture;
    float4 _GrabTexture_TexelSize;
    sampler2D _BumpMap;
    sampler2D _MainTex;

    half4 frag( v2f i ) : COLOR
    {
        // calculate perturbed coordinates
        half2 bump = UnpackNormal(tex2D( _BumpMap, i.uvbump )).rg; // we could optimize this by just reading the x & y without reconstructing the Z
        float2 offset = bump * _BumpAmt * _GrabTexture_TexelSize.xy;
        i.uvgrab.xy = offset * i.uvgrab.z + i.uvgrab.xy;

        half4 col = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
        half4 tint = tex2D( _MainTex, i.uvmain );
        return col * tint;
    }
    ENDCG
            }
        }

        // ------------------------------------------------------------------
        // Fallback for older cards and Unity non-Pro

        SubShader {
            Blend DstColor Zero
            Pass {
                Name "BASE"
                SetTexture [_MainTex] {    combine texture }
            }
        }
    }

    }
     
  2. Noisecrime

    Joined: Apr 7, 2010 | Posts: 2,054
    I very much doubt the actual shader code is much of a bottleneck, or that there is much in it that can be optimised. The performance problem is most likely down to the shader requiring a 'GrabPass', meaning it grabs the current camera render and copies it to a texture. This can cause performance issues on slow GPUs, and I guess mobile hardware does not much like it.

    One thing: do you have multiple materials/objects using this shader? If so, you may find that it is doing a grab pass for each material/model being rendered! That will obviously hurt performance even more, so one optimization you can make is to use a single global grab pass for all materials/models. It does mean that the GrabPass may not be 100% correct, but for the increase in performance it can give, it's a decent trade-off. I discuss this in the thread linked to in my signature below.

    If you only have a single material/model using this shader, then I'm not sure what you can do to optimize it.
     
  3. adam_mehman

    So, I have one object that needs this material attached for a certain time, after which it gets deactivated.
    I use it on the object to get a stealth-mode effect.

    If you have any better idea I would be thankful for the help.
     
  4. ChiuanWei

    Joined: Jan 29, 2012 | Posts: 131
    :( I have used GrabPass for a distortion effect on an iPhone 5,
    and the frame rate is far too slow! :(
     
  5. adam_mehman

    Yes I know. That is why we need an optimized one.
     
  6. ChiuanWei

    I found a solution:
    Use a new Camera that renders to a target texture, and control when this camera updates (disable or enable it).
    Then call Shader.SetGlobalTexture("_GrabTexture", renderTexture);
    This works, and it is a lot faster!
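
    A minimal sketch of that setup, as an illustration rather than the poster's actual code. The class name BackgroundCapture, the downscale field and the SetCapturing method are hypothetical. It assumes the GrabPass { } block has been removed from the shader (otherwise Unity still performs the grab), and that the capture camera's culling mask excludes the distorting object so it is not rendered into its own background.

```csharp
using UnityEngine;

// Hypothetical helper (not from the thread): a second camera renders the
// background into a RenderTexture that shaders read as _GrabTexture.
[RequireComponent(typeof(Camera))]
public class BackgroundCapture : MonoBehaviour
{
    // Assumption: a reduced-resolution copy is good enough for a distortion effect.
    public int downscale = 2;

    Camera captureCamera;
    RenderTexture target;

    void OnEnable()
    {
        captureCamera = GetComponent<Camera>();

        int divisor = Mathf.Max(1, downscale);
        int width = Mathf.Max(1, Screen.width / divisor);
        int height = Mathf.Max(1, Screen.height / divisor);

        // 16-bit depth buffer so the captured scene is depth-sorted correctly.
        target = new RenderTexture(width, height, 16);
        captureCamera.targetTexture = target;

        // Any shader declaring "sampler2D _GrabTexture" now samples this texture.
        Shader.SetGlobalTexture("_GrabTexture", target);

        // Precaution (assumption): publish the texel size explicitly in case it
        // is not filled in automatically for a globally set texture.
        Shader.SetGlobalVector("_GrabTexture_TexelSize",
            new Vector4(1f / width, 1f / height, width, height));
    }

    // Call with true while the effect is visible and false otherwise,
    // so the extra scene render only happens when it is needed.
    public void SetCapturing(bool on)
    {
        if (captureCamera != null)
        {
            captureCamera.enabled = on;
        }
    }

    void OnDisable()
    {
        if (captureCamera != null)
        {
            captureCamera.targetTexture = null;
        }
        if (target != null)
        {
            target.Release();
            Destroy(target);
            target = null;
        }
    }
}
```

    Attach it to a second camera that mirrors the main camera, and call SetCapturing(true) only while the effect is on screen.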
     
  7. Zicandar

    Joined: Feb 10, 2014 | Posts: 388
    Code (CSharp):
    // calculate perturbed coordinates
    half4 packedNormals = tex2D( _BumpMap, i.uvbump );
    // Sampling _MainTex here, before the offset math, lets the hardware start the
    // fetch early. It may even become "free", because the grab-texture sample
    // below is a dependent read that has to wait for the math anyway.
    half4 tint = tex2D( _MainTex, i.uvmain );
    half2 bump = UnpackNormal(packedNormals).rg;
    float2 offset = bump * _BumpAmt * _GrabTexture_TexelSize.xy;
    i.uvgrab.xy = offset * i.uvgrab.z + i.uvgrab.xy;

    half4 col = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
    return col * tint;
    Move the _MainTex sample up as shown, ahead of the offset math. It won't get you anywhere near good performance because of how GrabPass works, but it is most likely an optimization, since the _MainTex sample doesn't depend on anything else.
    GrabPass in general will always be slow, though, as it needs to copy the entire screen, and it does that once for every object with a grab pass shader...
     
  8. adam_mehman

    I didn't understand you well. Can you please be more precise?
     
  9. adam_mehman

    So you think this won't give any better performance on mobiles?
     
  10. naxel

    Joined: Sep 12, 2012 | Posts: 14
    This one should be slightly better on some mobiles.

    Code (CSharp):
    // Per pixel bumped refraction.
    // Uses a normal map to distort the image behind, and
    // an additional texture to tint the color.
    Shader "FX/Glass/Stained BumpDistort" {
    Properties {
        _BumpAmt  ("Distortion", range (0,128)) = 10
        _MainTex ("Tint Color (RGB)", 2D) = "white" {}
        _BumpMap ("Normalmap", 2D) = "bump" {}
    }
    Category {
        // We must be transparent, so other objects are drawn before this one.
        Tags { "Queue"="Transparent" "RenderType"="Opaque" }
        SubShader {
            // This pass grabs the screen behind the object into a texture.
            // We can access the result in the next pass as _GrabTexture
            GrabPass {
                Name "BASE"
                Tags { "LightMode" = "Always" }
            }

            // Main pass: Take the texture grabbed above and use the bumpmap to perturb it
            // on to the screen
            Pass {
                Name "BASE"
                Tags { "LightMode" = "Always" }

    CGPROGRAM
    #pragma vertex vert
    #pragma fragment frag
    #pragma fragmentoption ARB_precision_hint_fastest
    #include "UnityCG.cginc"
    struct appdata_t {
        float4 vertex : POSITION;
        float2 texcoord: TEXCOORD0;
    };
    struct v2f {
        float4 vertex : POSITION;
        half4 uvgrab : TEXCOORD0;
        half4 uvbump : TEXCOORD1; // xy: bump map UV, zw: per-vertex offset scale
        half2 uvmain : TEXCOORD2;
    };
    half   _BumpAmt;
    half4  _BumpMap_ST;
    half4  _MainTex_ST;
    half4  _GrabTexture_TexelSize;
    v2f vert (appdata_t v)
    {
        v2f o;
        o.vertex = mul(UNITY_MATRIX_MVP, v.vertex);
    #if UNITY_UV_STARTS_AT_TOP
        float scale = -1.0;
    #else
        float scale = 1.0;
    #endif
        o.uvgrab.xy = (half2(o.vertex.x, o.vertex.y*scale) + o.vertex.w) * 0.5;
        o.uvgrab.zw = o.vertex.zw;

        // bake fragment shader computations here:
        // the scale goes into uvbump.zw and the bias is subtracted from uvgrab.xy, so the
        // fragment shader gets (2*bump - 1) * _BumpAmt * z * texelSize from the raw 0..1 value
        o.uvbump.zw = (2.0*_BumpAmt*o.uvgrab.z)*_GrabTexture_TexelSize.xy;
        o.uvgrab.xy = o.uvgrab.xy - (_BumpAmt*o.uvgrab.z)*_GrabTexture_TexelSize.xy;

        o.uvbump.xy = TRANSFORM_TEX( v.texcoord, _BumpMap );
        o.uvmain = TRANSFORM_TEX( v.texcoord, _MainTex );
        return o;
    }
    sampler2D _GrabTexture;
    sampler2D _BumpMap;
    sampler2D _MainTex;
    fixed4 frag( v2f i ) : COLOR
    {
        // calculate perturbed coordinates
        // Raw .rg read without UnpackNormal: this assumes the normal map is stored as
        // plain RGB (not DXT5nm-swizzled), which is the usual case on mobile.
        half2 bump  = tex2D( _BumpMap, i.uvbump.xy ).rg;
        i.uvgrab.xy = i.uvbump.zw*bump + i.uvgrab.xy;

        fixed4 col  = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
        fixed4 tint = tex2D( _MainTex, i.uvmain );
        return col*tint;
    }
    ENDCG
            }
        }
        // ------------------------------------------------------------------
        // Fallback for older cards and Unity non-Pro

        SubShader {
            Blend DstColor Zero
            Pass {
                Name "BASE"
                SetTexture [_MainTex] {    combine texture }
            }
        }
    }
    }
    Key changes are:
    - Reduced precision for some calculations
    - Some calculations are moved from fragment shader to vertex shader

    Also, the texture formats you use with this shader matter a lot. Try not to use 24/32bpp formats.
    On which devices is the performance really bad?
     
  11. naxel

    And of course, follow ChiuanWei's advice. As I understand it, he recommends rendering the whole scene with your own camera using the render-to-texture technique. After that, use a second camera to render a fullscreen quad with this texture, plus your object with the custom effect. Pass the texture from the first pass to the shader using Shader.SetGlobalTexture().
    This should work much better, because Unity's GrabPass implementation breaks CPU/GPU parallelization.
     
  12. ChiuanWei

    Yes! That's what I mean. ;)
     
  13. Zicandar

    Actually, what I suggested should give better performance no matter what; however, the whole idea of using grab passes is quite bad.
     
  14. adam_mehman


    Thank you very much, I'll try it as soon as I get home. I have a really bad performance issue on the Samsung Galaxy S3.
     
  15. adam_mehman

    Ok, I'll do my best to understand what I need to do.
     
  16. Zicandar

    Well, what you really need to figure out is what is causing the performance issues.
    With GPUs it isn't always what you expect! Some optimizations might slow things down, some might have no effect, etc.
     
  17. ChiuanWei

    In Unity 5, I have used Command Buffers instead of GrabPass, and that works.
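
    A rough sketch of what a command-buffer replacement for GrabPass can look like in Unity 5, offered as an illustration and not as the poster's actual code. The class name GrabWithCommandBuffer and the choice of CameraEvent.AfterSkybox are assumptions. It also assumes the GrabPass { } block has been removed from the shader, so that the shader reads the global _GrabTexture set here.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper (not from the thread): copies the screen once per frame
// into a temporary texture and exposes it to shaders as _GrabTexture.
[RequireComponent(typeof(Camera))]
public class GrabWithCommandBuffer : MonoBehaviour
{
    // Assumption: opaque geometry and the skybox are drawn by this point, and
    // the distorting object is in the Transparent queue, so it renders later.
    const CameraEvent GrabEvent = CameraEvent.AfterSkybox;

    Camera cam;
    CommandBuffer buffer;

    void OnEnable()
    {
        cam = GetComponent<Camera>();

        buffer = new CommandBuffer();
        buffer.name = "Screen copy for distortion";

        int grabId = Shader.PropertyToID("_GrabTexture");

        // -1, -1 requests a temporary texture at the camera's pixel size;
        // 0 means no depth buffer is needed for a plain colour copy.
        buffer.GetTemporaryRT(grabId, -1, -1, 0, FilterMode.Bilinear);

        // Copy whatever has been rendered so far into the temporary texture.
        buffer.Blit(BuiltinRenderTextureType.CurrentActive, grabId);

        // Make the copy visible to every shader that samples _GrabTexture.
        buffer.SetGlobalTexture("_GrabTexture", grabId);

        cam.AddCommandBuffer(GrabEvent, buffer);
    }

    void OnDisable()
    {
        if (cam != null && buffer != null)
        {
            cam.RemoveCommandBuffer(GrabEvent, buffer);
        }
        buffer = null;
    }
}
```

    Attached to the main camera, this does one screen copy per frame no matter how many objects sample the texture. Disabling the component while the effect is hidden removes the copy entirely.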
     
  18. adam_mehman

    I know that the issue is the shader, because when I disable it the game runs smoothly.
     
  19. adam_mehman

  20. naxel

    It's not really the shader, but the grab pass. On the SGS 3 (Mali-400 MP) the shader consumes just 4 cycles per pixel, which is not enough to cause serious problems. But GrabPass initiates a glReadPixels call, which stalls the CPU: it waits for the GPU to finish, then copies the framebuffer content to a texture, and only after that does the CPU continue processing. On average it costs you about 30 additional milliseconds or more per frame, which is a performance killer (a 60 fps frame budget is only about 16.7 ms).
    BTW, the render-to-texture approach will suffer from depth test issues and is only appropriate if the object is not occluded.