Hi guys, I have one shader that is perfect (on PC) for what I need, but there is a problem on mobile platforms: it is not optimized for them and it lags a lot. Can someone help me and tell me how to optimize this shader?

Code (CSharp):
// Per pixel bumped refraction.
// Uses a normal map to distort the image behind, and
// an additional texture to tint the color.

Shader "FX/Glass/Stained BumpDistort" {
Properties {
    _BumpAmt ("Distortion", range (0,128)) = 10
    _MainTex ("Tint Color (RGB)", 2D) = "white" {}
    _BumpMap ("Normalmap", 2D) = "bump" {}
}

Category {

    // We must be transparent, so other objects are drawn before this one.
    Tags { "Queue"="Transparent" "RenderType"="Opaque" }

    SubShader {

        // This pass grabs the screen behind the object into a texture.
        // We can access the result in the next pass as _GrabTexture
        GrabPass {
            Name "BASE"
            Tags { "LightMode" = "Always" }
        }

        // Main pass: Take the texture grabbed above and use the bumpmap to perturb it
        // on to the screen
        Pass {
            Name "BASE"
            Tags { "LightMode" = "Always" }

CGPROGRAM
#pragma vertex vert
#pragma fragment frag
#pragma fragmentoption ARB_precision_hint_fastest
#include "UnityCG.cginc"

struct appdata_t {
    float4 vertex : POSITION;
    float2 texcoord: TEXCOORD0;
};

struct v2f {
    float4 vertex : POSITION;
    float4 uvgrab : TEXCOORD0;
    float2 uvbump : TEXCOORD1;
    float2 uvmain : TEXCOORD2;
};

float _BumpAmt;
float4 _BumpMap_ST;
float4 _MainTex_ST;

v2f vert (appdata_t v)
{
    v2f o;
    o.vertex = mul(UNITY_MATRIX_MVP, v.vertex);
    #if UNITY_UV_STARTS_AT_TOP
    float scale = -1.0;
    #else
    float scale = 1.0;
    #endif
    o.uvgrab.xy = (float2(o.vertex.x, o.vertex.y*scale) + o.vertex.w) * 0.5;
    o.uvgrab.zw = o.vertex.zw;
    o.uvbump = TRANSFORM_TEX( v.texcoord, _BumpMap );
    o.uvmain = TRANSFORM_TEX( v.texcoord, _MainTex );
    return o;
}

sampler2D _GrabTexture;
float4 _GrabTexture_TexelSize;
sampler2D _BumpMap;
sampler2D _MainTex;

half4 frag( v2f i ) : COLOR
{
    // calculate perturbed coordinates
    half2 bump = UnpackNormal(tex2D( _BumpMap, i.uvbump )).rg; // we could optimize this by just reading the x & y without reconstructing the Z
    float2 offset = bump * _BumpAmt * _GrabTexture_TexelSize.xy;
    i.uvgrab.xy = offset * i.uvgrab.z + i.uvgrab.xy;

    half4 col = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
    half4 tint = tex2D( _MainTex, i.uvmain );
    return col * tint;
}
ENDCG
        }
    }

    // ------------------------------------------------------------------
    // Fallback for older cards and Unity non-Pro

    SubShader {
        Blend DstColor Zero
        Pass {
            Name "BASE"
            SetTexture [_MainTex] { combine texture }
        }
    }
}

}
I very much doubt the actual shader code is much of a bottleneck, or that there is much in it that can be optimised. The performance problem is most likely down to the shader requiring a 'GrabPass', meaning it's grabbing the current camera render and copying it to a texture. This can cause performance issues on slow GPUs, and I guess mobile hardware doesn't like it much.

One thing: do you have multiple materials/objects using this shader? If so, you may find that it is doing a GrabPass for each material/model being rendered! That will obviously hurt performance even more, so one optimization you can make is to use a single global grab pass for all materials/models. It does mean that the GrabPass may not be 100% correct, but for the increase in performance it can give, it's a decent trade-off. I discuss this in the thread linked to in my signature below.

If you only have a single material/model using this shader, then I'm not sure what you can do to optimize it.
So, I have one object that needs to have this material attached for a certain time, and after that it gets deactivated. I use it on the object to get a stealth-mode effect. If you have a better idea, I would be thankful for the help.
I found a solution: use a new Camera that renders into a target texture, and control when this camera updates (enable or disable it). Then use Shader.SetGlobalTexture("_GrabTexture", renderTexture); This works, and it is a lot faster!
Code (CSharp):
// calculate perturbed coordinates
half4 packedNormals = tex2D( _BumpMap, i.uvbump );
half4 tint = tex2D( _MainTex, i.uvmain );
// Putting this here should start the sampling before doing more math,
// giving the hardware more time to sample the main tex, possibly even making
// this sampling "free" due to the dependent nature of the grab pass sampling.
half2 bump = UnpackNormal(packedNormals).rg;
float2 offset = bump * _BumpAmt * _GrabTexture_TexelSize.xy;
i.uvgrab.xy = offset * i.uvgrab.z + i.uvgrab.xy;

half4 col = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
return col * tint;

Swap the order of the two samples so that _MainTex is read before the grab texture, as above. It won't get you anywhere near good performance because of how GrabPass works, but it is most likely an optimization, since the _MainTex sample doesn't depend on anything else. GrabPass in general will always be slow, though, as it needs to copy the entire screen, and it does that once for every object with a grab pass shader.
This one should be slightly better on some mobiles.

Code (CSharp):
// Per pixel bumped refraction.
// Uses a normal map to distort the image behind, and
// an additional texture to tint the color.

Shader "FX/Glass/Stained BumpDistort" {
Properties {
    _BumpAmt ("Distortion", range (0,128)) = 10
    _MainTex ("Tint Color (RGB)", 2D) = "white" {}
    _BumpMap ("Normalmap", 2D) = "bump" {}
}

Category {

    // We must be transparent, so other objects are drawn before this one.
    Tags { "Queue"="Transparent" "RenderType"="Opaque" }

    SubShader {

        // This pass grabs the screen behind the object into a texture.
        // We can access the result in the next pass as _GrabTexture
        GrabPass {
            Name "BASE"
            Tags { "LightMode" = "Always" }
        }

        // Main pass: Take the texture grabbed above and use the bumpmap to perturb it
        // on to the screen
        Pass {
            Name "BASE"
            Tags { "LightMode" = "Always" }

CGPROGRAM
#pragma vertex vert
#pragma fragment frag
#pragma fragmentoption ARB_precision_hint_fastest
#include "UnityCG.cginc"

struct appdata_t {
    float4 vertex : POSITION;
    float2 texcoord: TEXCOORD0;
};

struct v2f {
    float4 vertex : POSITION;
    half4 uvgrab : TEXCOORD0;
    half4 uvbump : TEXCOORD1;
    half2 uvmain : TEXCOORD2;
};

half _BumpAmt;
half4 _BumpMap_ST;
half4 _MainTex_ST;
half4 _GrabTexture_TexelSize;

v2f vert (appdata_t v)
{
    v2f o;
    o.vertex = mul(UNITY_MATRIX_MVP, v.vertex);
    #if UNITY_UV_STARTS_AT_TOP
    float scale = -1.0;
    #else
    float scale = 1.0;
    #endif
    o.uvgrab.xy = (half2(o.vertex.x, o.vertex.y*scale) + o.vertex.w) * 0.5;
    o.uvgrab.zw = o.vertex.zw;

    // bake fragment shader computations here
    o.uvbump.zw = (2.0*_BumpAmt*o.uvgrab.z)*_GrabTexture_TexelSize.xy;
    o.uvgrab.xy = o.uvgrab.xy - (_BumpAmt*o.uvgrab.z)*_GrabTexture_TexelSize.xy;

    o.uvbump.xy = TRANSFORM_TEX( v.texcoord, _BumpMap );
    o.uvmain = TRANSFORM_TEX( v.texcoord, _MainTex );
    return o;
}

sampler2D _GrabTexture;
sampler2D _BumpMap;
sampler2D _MainTex;

fixed4 frag( v2f i ) : COLOR
{
    // calculate perturbed coordinates
    // raw 0..1 channels are read here; the 2*x - 1 unpack is folded into
    // uvbump.zw and uvgrab.xy in the vertex shader
    half2 bump = tex2D( _BumpMap, i.uvbump.xy ).rg;
    i.uvgrab.xy = i.uvbump.zw*bump + i.uvgrab.xy;

    fixed4 col = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
    fixed4 tint = tex2D( _MainTex, i.uvmain );
    return col*tint;
}
ENDCG
        }
    }

    // ------------------------------------------------------------------
    // Fallback for older cards and Unity non-Pro

    SubShader {
        Blend DstColor Zero
        Pass {
            Name "BASE"
            SetTexture [_MainTex] { combine texture }
        }
    }
}

}

Key changes are:
- Reduced precision for some calculations
- Some calculations are moved from the fragment shader to the vertex shader

Also, it matters a lot which texture formats you are using with this shader. Try not to use 24/32bpp formats.

On which devices is performance really bad?
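To spell out why moving the math into the vertex shader gives the same distorted coordinates as the original fragment code, here is a short derivation. It is only a sketch: it assumes the normal map is stored as plain RGB, so that unpacking a channel is just 2b - 1 (where Unity uses the DXT5nm layout, UnpackNormal reads different channels and this does not hold). The symbols are my own shorthand: b is the raw `.rg` texture value, A is `_BumpAmt`, t is `_GrabTexture_TexelSize.xy`, z is `uvgrab.z` and uv is `uvgrab.xy`.

$$
\begin{aligned}
uv' &= uv + (2b - 1)\,A\,t\,z && \text{(original fragment shader)} \\
    &= \underbrace{(uv - A\,z\,t)}_{\text{new } \texttt{uvgrab.xy}} + \underbrace{(2\,A\,z\,t)}_{\texttt{uvbump.zw}}\; b && \text{(terms regrouped)}
\end{aligned}
$$

Both bracketed terms depend only on per-vertex values, so they can be computed in the vertex shader and interpolated, leaving a single multiply-add per pixel.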
And of course follow ChiuanWei's advice. As I understand it, he recommends rendering the whole scene with your own camera using the render-to-texture technique. After that, use a second camera to render a fullscreen quad with this texture, plus your object with the custom effect. Pass the texture you rendered in the first pass to the shader using Shader.SetGlobalTexture(). This should work much better because Unity's GrabPass implementation breaks CPU/GPU parallelism.
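For anyone who wants to try the render-to-texture route, here is a minimal sketch of the script side. It is not ChiuanWei's actual code. The class name `GlobalGrabTexture`, the `grabCamera` and `downscale` fields, and the half-resolution default are all my own assumptions for illustration. It also assumes you have removed the GrabPass block from the shader so that it samples the global `_GrabTexture` instead, and that the second camera's culling mask excludes the distorting object.

```csharp
using UnityEngine;

// Hypothetical helper (not from this thread): renders the scene into a
// RenderTexture with a second camera and exposes it to all shaders as
// _GrabTexture, so the shader no longer needs its own GrabPass.
public class GlobalGrabTexture : MonoBehaviour
{
    // Assumed setup: a second camera matching the main camera, with the
    // distorting object excluded through its culling mask.
    public Camera grabCamera;

    // Assumed default: render at half resolution to save fill rate.
    public int downscale = 2;

    private RenderTexture grabTexture;

    void OnEnable()
    {
        int width = Mathf.Max(1, Screen.width / Mathf.Max(1, downscale));
        int height = Mathf.Max(1, Screen.height / Mathf.Max(1, downscale));

        // 16-bit depth buffer so the grab camera can depth-test its own scene.
        grabTexture = new RenderTexture(width, height, 16);
        grabCamera.targetTexture = grabTexture;
        grabCamera.enabled = true;

        // Same name the shader already samples.
        Shader.SetGlobalTexture("_GrabTexture", grabTexture);
    }

    void OnDisable()
    {
        // Stop paying for the extra camera while the effect is inactive.
        if (grabCamera != null)
        {
            grabCamera.enabled = false;
            grabCamera.targetTexture = null;
        }

        if (grabTexture != null)
        {
            grabTexture.Release();
            Destroy(grabTexture);
            grabTexture = null;
        }
    }
}
```

Enabling this component only while the stealth effect is active means the extra camera costs nothing the rest of the time, which matches the "set disable or active" part of the suggestion.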
Actually, what I suggested should give better performance no matter what. However, the entire idea of using grab passes is quite bad.
Thank you very much, I'll try it as soon as I get home. The really bad performance is on a Samsung Galaxy S3.
Well, what you really need to figure out is what is causing the performance issues. With GPUs it isn't always what you expect! Some optimizations might slow things down, some might have no effect, etc.
You mean I should try the approach from this link on the mobile platform? http://blogs.unity3d.com/2015/02/06/extending-unity-5-rendering-pipeline-command-buffers/
It's not really the shader, it's the grab pass. On the SGS 3 (Mali-400 MP) the shader consumes just 4 cycles per pixel, which is not enough to cause serious problems. But GrabPass initiates a glReadPixels call, which stalls the CPU: it waits for the GPU to finish, then copies the framebuffer content to a texture, and only after that does the CPU continue processing. On average it costs you about 30 additional milliseconds or more per frame, which is a performance killer.

BTW, the render-to-texture approach will suffer from depth test issues and is only appropriate if the object is not occluded.
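To put that roughly 30 ms figure in perspective, here is some back-of-the-envelope arithmetic. The 60 fps target is my assumption, not something stated in this thread.

$$
\begin{aligned}
\text{frame budget at 60 fps} &= \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms} \\
\text{ceiling from a 30 ms stall alone} &= \frac{1000\ \text{ms}}{30\ \text{ms}} \approx 33\ \text{fps} \\
\text{a frame that otherwise fits the budget} &: 16.7 + 30 = 46.7\ \text{ms} \approx 21\ \text{fps}
\end{aligned}
$$

So even with a free shader, the stall by itself uses up almost two full 60 fps frame budgets.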