So, I started trying to optimize one of my surface shaders. I thought that, since I have some (really basic) understanding of Cg, I would port it to Cg and then hopefully get better performance on iOS. The shader I started with is this:

Code (csharp):
```
Shader "Reflective/Diffuse_Custom_Fresnel_Texture" {
    Properties {
        _MainTex ("Base (RGB) RefStrength (A)", 2D) = "white" {}
        _Cube ("Reflection Cubemap", Cube) = "_Skybox" { TexGen CubeReflect }
    }
    SubShader {
        LOD 200
        Tags { "RenderType"="Opaque" }

        CGPROGRAM
        #pragma surface surf Lambert approxview noambient

        sampler2D _MainTex;
        samplerCUBE _Cube;

        struct Input {
            half2 uv_MainTex;
            half3 worldRefl;
            half3 viewDir;
        };

        void surf (Input IN, inout SurfaceOutput o) {
            fixed4 tex = tex2D(_MainTex, IN.uv_MainTex);
            fixed3 reflcol = texCUBE (_Cube, IN.worldRefl);
            fixed rim = 1.1 - dot (IN.viewDir, o.Normal);
            o.Albedo = tex.rgb;
            o.Emission = reflcol.rgb * rim * tex.a;
        }
        ENDCG
    }
    FallBack "Reflective/VertexLit"
}
```

I wanted it to work with lightmaps, have a cubemap, and a pseudo-Fresnel. So I started working on the Cg version. After a while where everything was pink, I finally got it to work!
Here is the shader:

Code (csharp):
```
Shader "Reflective/Diffuse_Custom_Fresnel_CG" {
    Properties {
        _MainTex ("Base (RGB) RefStrength (A)", 2D) = "white" {}
        _Cube ("Reflection Cubemap", Cube) = "_Skybox" { TexGen CubeReflect }
    }
    SubShader {
        LOD 200
        Tags { "RenderType"="Opaque" "Queue"="Geometry" }
        Pass {
            CGPROGRAM
            #pragma vertex vert approxview
            #pragma fragment frag
            #include "UnityCG.cginc"

            struct appdata {
                fixed4 vertex : POSITION;
                fixed3 normal : NORMAL;
            };

            struct v2f {
                fixed4 pos : SV_POSITION;
                fixed3 normalDir : TEXCOORD0;
                fixed2 uv : TEXCOORD2;
                fixed3 viewDir : TEXCOORD1;
                fixed2 uv2 : TEXCOORD3;
                fixed3 color : COLOR;
            };

            uniform fixed4 _MainTex_ST;
            uniform sampler2D _MainTex;
            uniform samplerCUBE _Cube;
            uniform half4 unity_LightmapST;
            uniform sampler2D unity_Lightmap;

            v2f vert (appdata_full v) {
                v2f o;
                o.pos = mul (UNITY_MATRIX_MVP, v.vertex);
                fixed3 viewDir = normalize(ObjSpaceViewDir(v.vertex));
                fixed dotProduct = (1.1 - dot(viewDir, v.normal));
                fixed4x4 modelMatrix = _Object2World;
                fixed4x4 modelMatrixInverse = _World2Object;
                o.viewDir = fixed3(mul(modelMatrix, v.vertex) - fixed4(_WorldSpaceCameraPos, 1.0));
                o.normalDir = normalize(fixed3(mul(fixed4(v.normal, 0.0), modelMatrixInverse)));
                o.uv = v.texcoord1.xy * unity_LightmapST.xy + unity_LightmapST.zw;
                o.color = dotProduct;
                o.uv2 = TRANSFORM_TEX(v.texcoord, _MainTex);
                return o;
            }

            fixed4 frag(v2f i) : COLOR {
                fixed4 texcol = tex2D(_MainTex, i.uv2);
                fixed3 reflectedDir = reflect(i.viewDir, normalize(i.normalDir));
                fixed4 refcol = texCUBE(_Cube, reflectedDir);
                refcol *= texcol.a;
                texcol.rgb *= DecodeLightmap(tex2D(unity_Lightmap, i.uv));
                refcol.rgb *= i.color.rgb;
                texcol += refcol;
                return texcol;
            }
            ENDCG
        }
    }
}
```

This one looks *almost* identical to the first one (the way the cubemap is placed is subtly different), but it performs worse: I got an extra 2 ms on a scene where this shader was covering about a third of the screen. And this is why I'm here. I'm obviously out of my depth.
I did a lot of looking up on the wiki and in Cg examples to manage to write this shader, so I understand the basics of the code and what does what, but I have no clue why it is performing worse. I may be doing something fundamentally wrong that I cannot see right now. I even looked at the compiled GLES output, and the shaders were similar. In fact, the second one used fewer temp vars, so it should be more efficient, right? The main difference was in how the cubemap was applied, which makes me think this line is the suspect:

```
fixed3 reflectedDir = reflect(i.viewDir, normalize(i.normalDir));
```

But I don't know; as I said, I am clearly out of my depth. How can I optimize this shader (for iOS)?
```
fixed3 reflectedDir = reflect(i.viewDir, normalize(i.normalDir));
```

I'd start by moving the above line to the vertex shader. You're doing a lot of calculation in that one line, and it runs per-pixel; moving it also saves you a TEXCOORD interpolator. You are also normalizing some values multiple times, which you want to avoid. And you are converting matrices to fixed precision in the vertex shader (`fixed4x4 modelMatrix = _Object2World;`); I would avoid that, because the conversion is probably costing you more than the calculation itself. I'd also pass the UVs from the vertex shader to the fragment shader as half2s. I've found fixed precision to be slower than half precision in the vertex shader, and sometimes, depending on what needs to be converted, keeping some things as floats in the vertex shader is faster still. I think you will find you get a significant speed increase once you have fixed those few things.

Edit: Actually, looking at the shader, there are other approaches you could take, but they diverge from the idea of optimizing what you already have. There are some less correct but faster ways of achieving some of these effects.
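To make the first suggestion concrete, here is a rough sketch of the vertex/fragment pair with the reflection vector computed per-vertex (untested; the `reflDir` member and the TEXCOORD layout are my own naming, and casting `_Object2World` to a 3x3 for the normal assumes roughly uniform scaling):

```
// Sketch only: per-vertex reflection vector, interpolated to the fragment stage.
// This frees the TEXCOORD that previously carried normalDir and moves the
// normalize/reflect work from per-pixel to per-vertex.
struct v2f {
    float4 pos     : SV_POSITION;
    half2  uv      : TEXCOORD0;   // lightmap UV
    half2  uv2     : TEXCOORD1;   // main texture UV
    half3  reflDir : TEXCOORD2;   // world-space reflection vector
    fixed3 color   : COLOR;       // pseudo-Fresnel term
};

v2f vert (appdata_full v) {
    v2f o;
    o.pos = mul(UNITY_MATRIX_MVP, v.vertex);

    // World-space position and normal; the 3x3 cast assumes uniform scaling.
    float3 worldPos    = mul(_Object2World, v.vertex).xyz;
    half3  worldNormal = normalize(mul((float3x3)_Object2World, v.normal));
    half3  viewDir     = worldPos - _WorldSpaceCameraPos;

    // reflect() once per vertex; texCUBE does not need a normalized direction.
    o.reflDir = reflect(viewDir, worldNormal);

    o.color = 1.1 - dot(normalize(ObjSpaceViewDir(v.vertex)), v.normal);
    o.uv    = v.texcoord1.xy * unity_LightmapST.xy + unity_LightmapST.zw;
    o.uv2   = TRANSFORM_TEX(v.texcoord, _MainTex);
    return o;
}

fixed4 frag (v2f i) : COLOR {
    fixed4 texcol = tex2D(_MainTex, i.uv2);
    // No per-pixel reflect/normalize any more, just the cubemap lookup.
    fixed4 refcol = texCUBE(_Cube, i.reflDir) * texcol.a;
    texcol.rgb *= DecodeLightmap(tex2D(unity_Lightmap, i.uv));
    texcol.rgb += refcol.rgb * i.color;
    return texcol;
}
```

The trade-off is that the reflection vector is now linearly interpolated across the triangle, so the cubemap placement will shift slightly on low-poly meshes, which is usually acceptable for this kind of effect.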
For shader optimization on iOS, I would recommend getting the compiled GLSL shader from Unity and analysing it with PVRShaman from Imagination Technologies (http://www.imgtec.com/powervr/insider/powervr-pvrshaman.asp), which shows approximately how many GPU cycles each line requires.
I'm still quite new to this myself, but if you're using appdata_full, do you still need to declare appdata as a struct? I'm not sure you're using it. I'm looking at Unity's examples http://docs.unity3d.com/Documentation/Components/SL-VertexProgramInputs.html as a reference. I may be wrong though...
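To illustrate what I mean: since `vert()` takes `appdata_full`, which UnityCG.cginc already defines (with position, normal, both texcoord sets, and more), the hand-written struct appears to be dead code and could, as far as I can tell, simply be deleted:

```
#include "UnityCG.cginc"   // already defines appdata_full (vertex, normal,
                           // texcoord, texcoord1, tangent, color, ...)

// This custom struct is never referenced anywhere, so it can go:
// struct appdata {
//     fixed4 vertex : POSITION;
//     fixed3 normal : NORMAL;
// };

v2f vert (appdata_full v) {   // appdata_full comes from UnityCG.cginc
    // ... body unchanged ...
}
```

Removing it shouldn't change the compiled output (an unused struct costs nothing at runtime), but it does make the source clearer.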