Why does passing a vector to the fragment shader for calculation cause problems?

Discussion in 'Shaders' started by tswalk, Apr 21, 2014.

  1. tswalk

    I came across a strange issue while building a custom shader targeting D3D11 feature level 9.1 on the Surface ARM device (which uses a Tegra 3). I really can't explain it, so I hope someone can shed some light on why this is happening.

    I tried to implement some of the techniques I've read about: computing the specular term per vertex and applying its contribution in the fragment shader, sacrificing quality for speed. However, no matter what fakery I employed, nothing pushed my frame rate above 30 FPS. I tried BRDF and other LUT techniques, raw calculations, and even a home-brewed (and weird) ShadeSH9 variation... nothing was working, and I was about to hulk smash when I noticed this:


    (This is the standard approach: the specular value is passed from vertex to fragment as a float3.)
    30 FPS

    Code (csharp):
    // float3 from vertex to fragment of o.spec
    diffuse.rgb *= (rSH.rgb + lightmap.rgb);  // diffuse * (normal map + lightmap)
    diffuse.rgb += i.spec * cSG.r;            // diffuse + (specularity * mask)
    return float4(diffuse, 1);
    // 24 instructions, 4 temp regs, 0 temp arrays:
    // ALU 17 float, 0 int, 0 uint
    // TEX 4 (0 load, 0 comp, 0 bias, 0 grad)
    // FLOW 1 static, 0 dynamic
    // "ps_4_0_level_9_1

    (This passes the specular from vertex to fragment as a single float; it was my Hail Mary attempt at optimizing.)
    60 FPS

    Code (csharp):
    // float from vertex to fragment of o.spec (converted to float3)
    diffuse.rgb *= (rSH.rgb + lightmap.rgb);         // diffuse * (normal map + lightmap)
    float3 nspec = float3(i.spec, i.spec, i.spec);   // rebuild float3... magic happens.
    diffuse.rgb += nspec * cSG.r;                    // diffuse + (specularity * mask)
    // 24 instructions, 4 temp regs, 0 temp arrays:
    // ALU 17 float, 0 int, 0 uint
    // TEX 4 (0 load, 0 comp, 0 bias, 0 grad)
    // FLOW 1 static, 0 dynamic
    // "ps_4_0_level_9_1

    Hmm, hulk no smash. It's apparently a free operation (the reported instruction counts are identical), yet it yields so much... yes, an additional 30 frames per second. How odd.
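
    For reference, the only source-level difference between the two variants is the width of the interpolator that carries the specular term. Here is a minimal sketch, assuming hypothetical struct names and TEXCOORD slots (my actual vertex-to-fragment struct is not shown in this post, so everything except `uv0` and `spec` is illustrative only):

```hlsl
// Hypothetical vertex-to-fragment structs. The struct names, the pos field,
// and the TEXCOORD slot assignments are assumptions made for illustration.

// Variant A (30 FPS in my test): specular interpolated as a float3.
struct v2f_vec
{
    float4 pos  : SV_POSITION;
    float2 uv0  : TEXCOORD0;
    float3 spec : TEXCOORD1;
};

// Variant B (60 FPS in my test): specular interpolated as a single float.
struct v2f_scalar
{
    float4 pos  : SV_POSITION;
    float2 uv0  : TEXCOORD0;
    float  spec : TEXCOORD1;
};

// Variant B's fragment-side step: rebuild the float3 from the scalar,
// then add the masked specular contribution to the diffuse color.
float3 ApplySpecScalar(float3 diffuse, float spec, float mask)
{
    float3 nspec = float3(spec, spec, spec);
    return diffuse + nspec * mask;
}
```

    Both variants compile to the same reported instruction count, so whatever differs is presumably not visible in these statistics alone.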

    I thought it must have something to do with my texture: perhaps it's not compressed correctly, or I'm doing something goofy. Changing compression methods did not change anything. However, using different swizzle ("smearing") combinations did!

    The cSG variable above is a simple tex2D fetch:

    Code (csharp):
    float4 cSG = tex2D(_myimage, i.uv0);

    Here are some of my notes:

    Code (csharp):
    // d += float  * float   ok!    60fps
    // d += float3 * float4  bad!   if float3 is passed from vert to frag
    // d += float3 * float4  flaky! ~45fps (bounces between 55 and 35) if float is passed, float3 is built in frag, and float4 is smeared as cSG.rgaa
    // d += float3 * float4  ok!    solid 60fps if float3 is built in frag and float4 is smeared as cSG.rgga
    // d += float3 * float4  ok!    solid 60fps if float3 is built in frag and float4 is smeared as cSG.rrrg, where the gloss channel (G) is used as an additional mask
    // d += float4 * float   bad!   30fps (i.spec * cSG.r) where spec is float4
    // d += float3 * float   bad!   30fps (i.spec * cSG.r) where spec is float3  << this is what I was originally after
    // d += float3 * float3  bad!   30fps (i.spec * cSG.rrr)

    Without going through the trouble of frame captures and draw-call debugging in Visual Studio for each combination above to get the bytecode... is there any reason for this strange behavior?

    If it is possibly a bug, I'll do some bytecode dumps and submit a simple repro... otherwise, if there is a simple answer (I'm still learning this), I'll take my lesson and move on.