Optimize shader

adam_mehman · May 27, 2015

Hi guys, I have a shader that works perfectly (on PC) for what I need, but there is a problem on mobile. It is not optimized for mobile and it lags a lot.

Can someone help me and tell me how to optimize this shader:

Code (ShaderLab):

// Per pixel bumped refraction.
// Uses a normal map to distort the image behind, and
// an additional texture to tint the color.
Shader "FX/Glass/Stained BumpDistort" {
    Properties {
        _BumpAmt ("Distortion", range (0,128)) = 10
        _MainTex ("Tint Color (RGB)", 2D) = "white" {}
        _BumpMap ("Normalmap", 2D) = "bump" {}
    }

    Category {
        // We must be transparent, so other objects are drawn before this one.
        Tags { "Queue"="Transparent" "RenderType"="Opaque" }

        SubShader {
            // This pass grabs the screen behind the object into a texture.
            // We can access the result in the next pass as _GrabTexture
            GrabPass {
                Name "BASE"
                Tags { "LightMode" = "Always" }
            }

            // Main pass: Take the texture grabbed above and use the bumpmap to perturb it
            // on to the screen
            Pass {
                Name "BASE"
                Tags { "LightMode" = "Always" }

                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #pragma fragmentoption ARB_precision_hint_fastest
                #include "UnityCG.cginc"

                struct appdata_t {
                    float4 vertex : POSITION;
                    float2 texcoord: TEXCOORD0;
                };

                struct v2f {
                    float4 vertex : POSITION;
                    float4 uvgrab : TEXCOORD0;
                    float2 uvbump : TEXCOORD1;
                    float2 uvmain : TEXCOORD2;
                };

                float _BumpAmt;
                float4 _BumpMap_ST;
                float4 _MainTex_ST;

                v2f vert (appdata_t v)
                {
                    v2f o;
                    o.vertex = mul(UNITY_MATRIX_MVP, v.vertex);
                    #if UNITY_UV_STARTS_AT_TOP
                    float scale = -1.0;
                    #else
                    float scale = 1.0;
                    #endif
                    o.uvgrab.xy = (float2(o.vertex.x, o.vertex.y*scale) + o.vertex.w) * 0.5;
                    o.uvgrab.zw = o.vertex.zw;
                    o.uvbump = TRANSFORM_TEX( v.texcoord, _BumpMap );
                    o.uvmain = TRANSFORM_TEX( v.texcoord, _MainTex );
                    return o;
                }

                sampler2D _GrabTexture;
                float4 _GrabTexture_TexelSize;
                sampler2D _BumpMap;
                sampler2D _MainTex;

                half4 frag( v2f i ) : COLOR
                {
                    // calculate perturbed coordinates
                    half2 bump = UnpackNormal(tex2D( _BumpMap, i.uvbump )).rg; // we could optimize this by just reading the x & y without reconstructing the Z
                    float2 offset = bump * _BumpAmt * _GrabTexture_TexelSize.xy;
                    i.uvgrab.xy = offset * i.uvgrab.z + i.uvgrab.xy;
                    half4 col = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
                    half4 tint = tex2D( _MainTex, i.uvmain );
                    return col * tint;
                }
                ENDCG
            }
        }

        // ------------------------------------------------------------------
        // Fallback for older cards and Unity non-Pro
        SubShader {
            Blend DstColor Zero
            Pass {
                Name "BASE"
                SetTexture [_MainTex] { combine texture }
            }
        }
    }
}
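The two `uvgrab` lines in the vertex shader map clip-space positions to grab-texture coordinates, and the divide by w happens later inside `tex2Dproj`. A minimal CPU-side model in Python (the function name `grab_uv` is hypothetical, and the `UNITY_PROJ_COORD` macro is ignored) shows that after the divide the result lands in the 0..1 range, with V flipped when `UNITY_UV_STARTS_AT_TOP` is set:

```python
def grab_uv(clip_x, clip_y, clip_w, uv_starts_at_top=True):
    """Model of the shader's uvgrab.xy computation followed by the
    divide-by-w that tex2Dproj performs (hypothetical helper)."""
    scale = -1.0 if uv_starts_at_top else 1.0
    # o.uvgrab.xy = (float2(x, y*scale) + w) * 0.5
    u = (clip_x + clip_w) * 0.5
    v = (clip_y * scale + clip_w) * 0.5
    # tex2Dproj divides xy by the fourth component (o.uvgrab.w == clip w)
    return u / clip_w, v / clip_w


# Left screen edge (x = -w) maps to u = 0, right edge (x = +w) to u = 1.
print(grab_uv(-2.0, 0.0, 2.0))  # (0.0, 0.5)
print(grab_uv(2.0, 2.0, 2.0))   # (1.0, 0.0): top edge, V flipped
```

Because the xy values are kept pre-multiplied by w until the fragment stage, the distortion offset added in `frag` (scaled by `i.uvgrab.z`) is applied before the perspective divide.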

Noisecrime · May 27, 2015

I very much doubt the actual shader code is much of a bottleneck, or that there is much that can be optimised. The performance problem is most likely down to the shader requiring a 'GrabPass', meaning it's grabbing the current camera render and copying it to a texture. This can cause performance issues on slow GPUs, and I guess mobile doesn't like it much.

One thing: do you have multiple materials/objects using this shader? If so, you may find that it is doing a GrabPass for each material/model being rendered! That will obviously hurt performance even more, so one optimization you can make is to use a single global GrabPass for all materials/models. It does mean the GrabPass may not be 100% correct, but for the increase in performance it can give, it's a decent trade-off. I discuss this in the thread linked in my signature below.
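To put the per-object cost in numbers: each GrabPass copies the whole screen, so N objects with their own grab pass copy N screens per frame, while a shared grab texture is copied once. (In Unity a shared grab is usually done with a named pass, `GrabPass { "_SomeName" }`, which the documentation says grabs only once per frame, for the first object that uses that name.) A rough Python sketch, assuming 4 bytes per pixel and ignoring driver overhead; the resolution and object count are example values, not measurements:

```python
def grab_bytes_per_frame(width, height, grab_passes, bytes_per_pixel=4):
    """Bytes copied per frame if every grab pass copies the full screen."""
    return width * height * bytes_per_pixel * grab_passes


# Example: a 1280x720 screen.
per_object = grab_bytes_per_frame(1280, 720, grab_passes=5)  # 5 objects, one grab each
shared = grab_bytes_per_frame(1280, 720, grab_passes=1)      # one shared grab texture
print(per_object / 2**20, shared / 2**20)  # MiB copied per frame: 17.578125 3.515625
```

The copy volume alone scales linearly with the number of grabbing objects; on top of that, each grab is also a synchronization point, which is usually the bigger cost.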

If you only have a single material/model using this shader, then I'm not sure what you can do to optimize it.

adam_mehman · May 27, 2015

So, I have one object that needs to have this material attached for a certain time, after which it gets deactivated.
I use it on the object to get a stealth-mode effect.

If you have a better idea, I would be thankful for the help.

ChiuanWei · May 28, 2015

I have used GrabPass for a distortion effect on iPhone 5,
and the FPS is tooooooo slow!!!

adam_mehman · May 28, 2015

Yes I know. That is why we need an optimized one.

ChiuanWei · May 28, 2015

babazookZ said:

Yes I know. That is why we need an optimized one.

I found a solution:
use a new Camera that renders into a target texture, and control when this camera updates (disable or enable it).
Use Shader.SetGlobalTexture("_GrabTexture", renderTexture);
This works!! And it's a big optimization!!!

Zicandar · May 28, 2015

Code (ShaderLab):

// calculate perturbed coordinates
half4 packedNormals = tex2D( _BumpMap, i.uvbump );
half4 tint = tex2D( _MainTex, i.uvmain ); // putting this here should start the sampling before doing more math,
// giving the hardware more time to sample the main tex, possibly even making this sampling "free" due to the
// dependent nature of the sampling of the grab pass.
half2 bump = UnpackNormal(packedNormals).rg;
float2 offset = bump * _BumpAmt * _GrabTexture_TexelSize.xy;
i.uvgrab.xy = offset * i.uvgrab.z + i.uvgrab.xy;
half4 col = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
return col * tint;

Swap the order of these two lines. (It won't give anywhere near good performance because of how GrabPass works, but it's most likely an optimization, since the _MainTex sample doesn't depend on anything else.)
Still, GrabPass in general will always be slow, as it needs to copy the entire screen, and it does that once for every object with a GrabPass shader...

adam_mehman · May 28, 2015

ChiuanWei said:

I found a solution:
use a new Camera that renders into a target texture, and control when this camera updates (disable or enable it).
Use Shader.SetGlobalTexture("_GrabTexture", renderTexture);
This works!! And it's a big optimization!!!

I didn't understand you well. Can you please be more precise?

adam_mehman · May 28, 2015

Zicandar said:

Swap the order of these two lines. (It won't give anywhere near good performance because of how GrabPass works, but it's most likely an optimization, since the _MainTex sample doesn't depend on anything else.)
Still, GrabPass in general will always be slow, as it needs to copy the entire screen, and it does that once for every object with a GrabPass shader...

So you think this won't give any better performance on mobiles?

naxel · May 29, 2015

This one should be slightly better on some mobiles.

Code (ShaderLab):

// Per pixel bumped refraction.
// Uses a normal map to distort the image behind, and
// an additional texture to tint the color.
Shader "FX/Glass/Stained BumpDistort" {
    Properties {
        _BumpAmt ("Distortion", range (0,128)) = 10
        _MainTex ("Tint Color (RGB)", 2D) = "white" {}
        _BumpMap ("Normalmap", 2D) = "bump" {}
    }

    Category {
        // We must be transparent, so other objects are drawn before this one.
        Tags { "Queue"="Transparent" "RenderType"="Opaque" }

        SubShader {
            // This pass grabs the screen behind the object into a texture.
            // We can access the result in the next pass as _GrabTexture
            GrabPass {
                Name "BASE"
                Tags { "LightMode" = "Always" }
            }

            // Main pass: Take the texture grabbed above and use the bumpmap to perturb it
            // on to the screen
            Pass {
                Name "BASE"
                Tags { "LightMode" = "Always" }

                CGPROGRAM
                #pragma vertex vert
                #pragma fragment frag
                #pragma fragmentoption ARB_precision_hint_fastest
                #include "UnityCG.cginc"

                struct appdata_t {
                    float4 vertex : POSITION;
                    float2 texcoord: TEXCOORD0;
                };

                struct v2f {
                    float4 vertex : POSITION;
                    half4 uvgrab : TEXCOORD0;
                    half4 uvbump : TEXCOORD1;
                    half2 uvmain : TEXCOORD2;
                };

                half _BumpAmt;
                half4 _BumpMap_ST;
                half4 _MainTex_ST;
                half4 _GrabTexture_TexelSize;

                v2f vert (appdata_t v)
                {
                    v2f o;
                    o.vertex = mul(UNITY_MATRIX_MVP, v.vertex);
                    #if UNITY_UV_STARTS_AT_TOP
                    float scale = -1.0;
                    #else
                    float scale = 1.0;
                    #endif
                    o.uvgrab.xy = (half2(o.vertex.x, o.vertex.y*scale) + o.vertex.w) * 0.5;
                    o.uvgrab.zw = o.vertex.zw;
                    // bake fragment shader computations here:
                    // the *2 scale and -1 bias of the normal unpack are folded into these two values
                    o.uvbump.zw = (2.0*_BumpAmt*o.uvgrab.z)*_GrabTexture_TexelSize.xy;
                    o.uvgrab.xy = o.uvgrab.xy - (_BumpAmt*o.uvgrab.z)*_GrabTexture_TexelSize.xy;
                    o.uvbump.xy = TRANSFORM_TEX( v.texcoord, _BumpMap );
                    o.uvmain = TRANSFORM_TEX( v.texcoord, _MainTex );
                    return o;
                }

                sampler2D _GrabTexture;
                sampler2D _BumpMap;
                sampler2D _MainTex;

                fixed4 frag( v2f i ) : COLOR
                {
                    // calculate perturbed coordinates (raw .rg in [0,1], no UnpackNormal needed)
                    half2 bump = tex2D( _BumpMap, i.uvbump.xy ).rg;
                    i.uvgrab.xy = i.uvbump.zw*bump + i.uvgrab.xy;
                    fixed4 col = tex2Dproj( _GrabTexture, UNITY_PROJ_COORD(i.uvgrab));
                    fixed4 tint = tex2D( _MainTex, i.uvmain );
                    return col*tint;
                }
                ENDCG
            }
        }

        // ------------------------------------------------------------------
        // Fallback for older cards and Unity non-Pro
        SubShader {
            Blend DstColor Zero
            Pass {
                Name "BASE"
                SetTexture [_MainTex] { combine texture }
            }
        }
    }
}

Key changes:
- Reduced precision for some calculations
- Some calculations moved from the fragment shader to the vertex shader
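The vertex-shader baking works because the fragment offset is an affine function of the raw normal-map value, and every factor except the interpolated z is constant across the triangle, so interpolating the precomputed values gives the same result. A small Python check, assuming the normal map is unpacked as `rg * 2 - 1` (which, as far as I know, is what Unity's `UnpackNormal` does on platforms without DXT5nm packing, such as mobile GLES; with DXT5nm it reads other channels and the raw `.rg` read would not match). The helper names `original_uv` and `baked_uv` are hypothetical:

```python
import math


def original_uv(uv, tex_rg, bump_amt, clip_z, texel):
    """Original fragment-shader math: unpack to [-1, 1], then offset."""
    bump = [c * 2.0 - 1.0 for c in tex_rg]  # UnpackNormal(...).rg, non-DXT5nm case
    return [uv[k] + bump[k] * bump_amt * texel[k] * clip_z for k in range(2)]


def baked_uv(uv, tex_rg, bump_amt, clip_z, texel):
    """naxel's version: scale and bias precomputed in the vertex shader."""
    scale = [2.0 * bump_amt * clip_z * texel[k] for k in range(2)]   # o.uvbump.zw
    base = [uv[k] - bump_amt * clip_z * texel[k] for k in range(2)]  # o.uvgrab.xy
    # fragment shader: i.uvgrab.xy = i.uvbump.zw * bump + i.uvgrab.xy
    return [scale[k] * tex_rg[k] + base[k] for k in range(2)]


args = ([0.5, 0.5], [0.75, 0.2], 10.0, 3.0, [1 / 1280, 1 / 720])
a, b = original_uv(*args), baked_uv(*args)
print(all(math.isclose(x, y) for x, y in zip(a, b)))  # True
```

The fragment shader is left with one texture read and one multiply-add before the dependent grab-texture fetch.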

Also, the texture formats you use with this shader matter a lot. Try not to use 24/32 bpp formats.
On which devices is performance really bad?
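To see why the format matters, compare the memory (and therefore bandwidth per full read) of one texture at different bit depths. This Python sketch assumes a 1024x1024 texture with a single mip level; 4 bpp stands in for a compressed format such as ETC1 (Android) or PVRTC 4-bit (iOS):

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size of a single mip level in bytes."""
    return width * height * bits_per_pixel // 8


for name, bpp in [("RGBA32", 32), ("RGB24", 24), ("RGB565", 16), ("ETC1 / PVRTC 4-bit", 4)]:
    size = texture_bytes(1024, 1024, bpp)
    print(f"{name:>20}: {size / 2**20:.2f} MiB")
```

A 4 bpp compressed texture is an eighth of the RGBA32 size, which matters for the `_BumpMap` and `_MainTex` reads on bandwidth-limited mobile GPUs.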

naxel · May 29, 2015

And of course, follow ChiuanWei's advice. As I understand it, he recommends rendering the whole scene with your own camera into a render texture. After that, use a second camera to render a fullscreen quad with this texture, plus your object with the custom effect. Pass the texture you rendered in the first pass to the shader using Shader.SetGlobalTexture().
This should work much better, because Unity's GrabPass implementation breaks CPU/GPU parallelism.

ChiuanWei · May 29, 2015

naxel said:

And of course, follow ChiuanWei's advice. As I understand it, he recommends rendering the whole scene with your own camera into a render texture.

Yes! That's what I mean.

Zicandar · May 29, 2015

babazookZ said:

So you think this won't give any better performance on mobiles?

Actually, what I suggested should give better performance no matter what; however, the whole idea of using grab passes is quite bad.

adam_mehman · May 29, 2015

naxel said:

This one should be slightly better on some mobiles.

On which devices is performance really bad?

Thank you very much, I'll try it as soon as I get home. I have really bad performance issues on the Samsung Galaxy S3.

adam_mehman · May 29, 2015

naxel said:

And of course, follow ChiuanWei's advice. As I understand it, he recommends rendering the whole scene with your own camera into a render texture.

Ok, I'll do my best to understand what I need to do.

Zicandar · May 29, 2015

babazookZ said:

I have really bad performance issues on the Samsung Galaxy S3.

Well, what you really need to figure out is what is causing the performance issues.
With GPUs it isn't always what you expect! Some optimizations might slow things down, some might have no effect, etc.

ChiuanWei · May 29, 2015

naxel said:

This should work much better, because Unity's GrabPass implementation breaks CPU/GPU parallelism.

In Unity 5, I have used Command Buffers instead of GrabPass, and that works.

adam_mehman · May 29, 2015

Zicandar said:

Well, what you really need to figure out is what is causing the performance issues.
With GPUs it isn't always what you expect! Some optimizations might slow things down, some might have no effect, etc.

I know the issue is the shader, because when I disable it the game runs smoothly.

adam_mehman · May 29, 2015

ChiuanWei said:

In Unity 5, I have used Command Buffers instead of GrabPass, and that works.

You mean I should try the approach from this link on mobile?
http://blogs.unity3d.com/2015/02/06/extending-unity-5-rendering-pipeline-command-buffers/

naxel · May 29, 2015

It's not really the shader, it's the GrabPass. On the SGS3 (Mali-400 MP) the shader costs just 4 cycles per pixel, which is not enough to cause serious problems. But GrabPass triggers a glReadPixels call, which stalls the CPU: it waits for the GPU to finish, the framebuffer contents are copied to a texture, and only after that does the CPU continue processing. On average this costs you about 30 additional milliseconds or more per frame, which is a performance killer.
BTW, the render-to-texture approach will suffer from depth-test issues and is only appropriate if the object is not occluded.
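The 30 ms figure is easy to put in context with frame-time arithmetic. A rough Python sketch, assuming a scene that would otherwise render in 1000/60 ≈ 16.7 ms (60 fps) and treating the GrabPass stall as a fixed cost added to every frame (the helper name `fps_with_stall` is hypothetical):

```python
def fps_with_stall(base_frame_ms, stall_ms):
    """Frame rate if a fixed stall is added on top of the normal frame time."""
    return 1000.0 / (base_frame_ms + stall_ms)


base = 1000.0 / 60  # ~16.7 ms: a scene that would run at 60 fps
print(round(fps_with_stall(base, 0.0)))   # 60
print(round(fps_with_stall(base, 30.0)))  # 21
```

Under these assumptions a 30 ms stall alone drops a 60 fps scene to about 21 fps, whereas a 4-cycle-per-pixel shader is a comparatively small part of the frame.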
