
Optimized way to support up to 32 vertex point lights in a single pass (mobile)?

Discussion in 'Shaders' started by colin299, May 13, 2017.

  1. colin299

    Joined:
    Sep 2, 2013
    Posts:
    181
    Hello, I want to support up to 32 vertex point lights in an uber shader for a mobile game (it has to support OpenGL ES 2.0).
    I would like to hear some suggestions about the best way to do it before I commit my current code.

    Why not just use what Unity provides?
    Unity's default point lights are calculated per pixel and added with additive blending through extra draw calls. I expect that to perform poorly on mobile (more draw calls, a heavy fragment shader, and a lot of overdraw, none of which I want).
    I believe Unity has some vertex-lit LightModes, but I am not sure how they work or how to add them to my uber shader.

    What I have done so far
    So I first implemented vertex point lights in a naive way (not relying on any of Unity's lighting system, just a MonoBehaviour and some global vector arrays to tell the shader which lights to render). My uber shader will ALWAYS:
    1. compute the additive colors of 32 separate vertex point lights in the vertex shader
    2. add them up in the vertex shader
    3. pass this single additive color (the result of step 2) to the fragment shader through a fixed4 interpolator
    4. just add that result to the final fragment shader color (so the per-pixel calculation is just an ADD).
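
    The four steps above can be sketched as a small CPU-side model. This is Python, not shader code, and it is only a sketch under assumptions: the post does not give the falloff formula, so the linear falloff to zero at the light's range is assumed, and the names point_light, vertex_light_sum and fragment_color are made up for illustration.

```python
import math

MAX_LIGHTS = 32  # fixed loop count, as described in the post


def point_light(vertex_pos, vertex_normal, light_pos, light_color, light_range):
    """Additive color of one point light at one vertex (assumed falloff model)."""
    to_light = [l - v for l, v in zip(light_pos, vertex_pos)]
    dist = math.sqrt(sum(c * c for c in to_light))
    if dist == 0.0 or dist >= light_range:
        return (0.0, 0.0, 0.0)
    n_dot_l = max(0.0, sum(n * c for n, c in zip(vertex_normal, to_light)) / dist)
    atten = 1.0 - dist / light_range  # assumption: linear falloff to zero at range
    return tuple(ch * n_dot_l * atten for ch in light_color)


def vertex_light_sum(vertex_pos, vertex_normal, lights):
    """Steps 1-2: evaluate all MAX_LIGHTS slots and add them up.

    lights is a list of (position, color, range) tuples. Unused slots are
    padded with black lights, so the loop always runs MAX_LIGHTS times.
    """
    black = ((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0)
    slots = list(lights) + [black] * (MAX_LIGHTS - len(lights))
    total = [0.0, 0.0, 0.0]
    for pos, color, rng in slots[:MAX_LIGHTS]:
        contrib = point_light(vertex_pos, vertex_normal, pos, color, rng)
        total = [t + c for t, c in zip(total, contrib)]
    return tuple(total)


def fragment_color(base_color, interpolated_light):
    """Step 4: the per-pixel work is a single add (step 3 is the interpolation)."""
    return tuple(b + l for b, l in zip(base_color, interpolated_light))
```

    Note that the padding with black lights is exactly why 1 light costs the same as 32 in this scheme: the loop length never changes.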

    I tested this solution on several mobile devices:
    - on some of them the fps did not change at all (they are currently fragment bound by image effects), which is good!
    - on others it cost 2~3 ms (maybe 32 point lights in the vertex shader are just too heavy)
    So it is not perfect, but it seems to be at least a practical solution.

    What is the problem?
    When no point light is active and visible, I just turn off the whole point light part with a shader keyword,
    which is perfect (zero performance cost).

    But sometimes I need only a few point lights (1~3). Because my solution supports only 0 or 32 vertex point lights (either off or on), setting the remaining unused point lights (20+) to black seems like a big waste: the cost of having 1 or 32 point lights is the same.
    I want to skip every unnecessary calculation for performance reasons.

    How do other developers handle "single pass, multiple lights"?
    To improve the performance of my solution, I downloaded The Lab Renderer by Valve and studied how they handle dynamic point lights on VR platforms.
    https://www.assetstore.unity3d.com/en/#!/content/63141

    I believe I found the most important part by reading their vr_lighting.cginc:
    Code (CSharp):
    //-------------------------------------//
    // Point, spot, and directional lights //
    //-------------------------------------//
    int nNumLightsUsed = 0;
    [ loop ] for ( int i = 0; i < g_nNumLights; i++ )
    {
        float3 vPositionToLightRayWs = g_vLightPosition_flInvRadius[ i ].xyz - vPositionWs.xyz;
        float flDistToLightSq = dot( vPositionToLightRayWs.xyz, vPositionToLightRayWs.xyz );
        if ( flDistToLightSq > g_vLightFalloffParams[ i ].z ) // .z stores radius squared of light
        {
            // Outside light range
            continue;
        }

        if ( dot( vNormalWs.xyz, vPositionToLightRayWs.xyz ) <= 0.0 )
        {
            // Backface cull pixel to this light
            continue;
        }
        //.......
    It seems The Lab Renderer uses a dynamic for loop (the iteration count is controlled by Shader.SetGlobalInt()) to skip unwanted lighting calculations, and inside the loop it uses "if(condition) continue;" to further cull lights whose result would be black (and so would not affect rendering).
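
    To make the two early-out tests in the excerpt concrete, here is a small CPU-side model in Python. It is only a sketch of the part of the loop quoted above: the function name lights_surviving_cull and the (position, radius) tuple layout are made up for illustration, and num_lights plays the role of g_nNumLights.

```python
def lights_surviving_cull(position, normal, lights, num_lights):
    """Return the indices of lights that pass both early-out tests.

    lights: list of (light_position, radius) tuples.
    num_lights: loop bound supplied from outside, like a uniform.
    """
    survivors = []
    for i in range(num_lights):
        light_pos, radius = lights[i]
        to_light = [l - p for l, p in zip(light_pos, position)]
        dist_sq = sum(c * c for c in to_light)
        if dist_sq > radius * radius:
            continue  # outside light range
        if sum(n * c for n, c in zip(normal, to_light)) <= 0.0:
            continue  # surface faces away from this light
        survivors.append(i)  # the full lighting math would run only here
    return survivors
```

    Lights beyond num_lights are never visited, and lights that are out of range or behind the surface skip the expensive part of the iteration.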

    ------------------------------------------------
    So the first question:
    Is writing a for loop in a mobile vertex shader a good idea? (In my situation, about 50k vertices will run the vertex point light calculation per frame.)
    I know that a for loop with a fixed iteration count is safe (the iteration count is hardcoded in the shader), and on the other hand I know there will be a dynamic branching cost if the iteration count is calculated in the shader.
    But what about setting the iteration count through a uniform (set by Shader.SetGlobalInt(), just like The Lab Renderer does)?

    The second question:
    If "for loop with the iteration count set by a uniform (Shader.SetGlobalInt())" is safe to use,
    is writing "if(condition) continue;" to cull lights a good idea (again, just like The Lab Renderer does)?
    ---------------------------------------------------------
    All I want is to add many vertex point lights on mobile platforms without costing too much performance.
    If there is a totally different but better solution, I would like to hear it as well.
    Thank you!
     
    Last edited: May 13, 2017
  2. neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,493
  3. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,350
    Any kind of branching on an OpenGL ES 2.0 device will be very expensive, not because of the branching itself really, but because there's no dynamic flow control. That means if you have an if statement with two options, both will always be calculated and the results of one option thrown away. Note that this is true regardless of it being a "static" branch (the condition is purely dependent on a value coming from a uniform) or a "dynamic" branch (the value is something calculated in the shader). In the case of a "dynamic loop", it'll always be doing the maximum iterations the compiler guesses the shader could do and throwing out the results for all iterations past the "dynamic" size. Sometimes the compiler will just error if it can't determine a count, or if that count is too high.
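
    A rough way to picture this is the CPU-side Python model below. It is a sketch of the behaviour described above, not of any specific GPU or compiler: MAX_ITERATIONS stands in for the upper bound the compiler picks, and the function names are made up for illustration.

```python
MAX_ITERATIONS = 32  # assumed compile-time upper bound of the loop


def flattened_loop(contributions, active_count):
    """Model of a 'dynamic' loop on hardware without real flow control.

    Every iteration up to MAX_ITERATIONS is executed; results past
    active_count are multiplied by zero instead of being skipped.
    """
    total = 0.0
    evaluated = 0
    for i in range(MAX_ITERATIONS):
        value = contributions[i] if i < len(contributions) else 0.0
        evaluated += 1  # the work is done regardless of active_count
        keep = 1.0 if i < active_count else 0.0
        total += value * keep
    return total, evaluated


def real_dynamic_loop(contributions, active_count):
    """Model of hardware with real flow control: unused iterations never run."""
    total = 0.0
    evaluated = 0
    for i in range(active_count):
        total += contributions[i]
        evaluated += 1
    return total, evaluated
```

    Both functions return the same sum for the same inputs, but the flattened version always reports 32 evaluated iterations, which is the cost being described here.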

    So, the answer to #1 is yes, the for loop is fine, but setting a dynamic iteration count via a uniform is not. The answer to #2 is also no: because OpenGL ES 2.0 can't do dynamic loops or dynamic flow control, using "continue" still runs all of the code for that iteration and just doesn't use the result. If including the result isn't a problem, using "continue" will be just as slow or slower than leaving it out.

    The short version is using 32 point lights on OpenGL ES 2.0 is going to be expensive no matter what. Your best option is to use multiple keywords to enable several fixed iteration counts. It might also be cheaper to calculate the number of lights that touch each game object / renderer bounds and only pass data for the lights needed, as well as set the keyword for the smallest iteration count that is still larger than the number of lights affecting it. That'll obviously increase the number of draw calls, so depending on your scene this may or may not be faster than having everything iterate over 32 lights all the time.
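
    The CPU-side selection logic for that suggestion might look something like the Python sketch below. Everything specific in it is an assumption for illustration: the set of fixed counts (0, 4, 8, 16, 32), the keyword names, and the use of a bounding sphere for the renderer instead of an axis-aligned box. In Unity this logic would live in a C# script instead.

```python
VARIANT_COUNTS = (0, 4, 8, 16, 32)  # hypothetical fixed loop sizes, one keyword each


def lights_touching_bounds(bounds_center, bounds_radius, lights):
    """Keep lights whose range sphere overlaps the renderer's bounding sphere.

    lights: list of (position, range) tuples.
    """
    touching = []
    for pos, rng in lights:
        dist_sq = sum((a - b) ** 2 for a, b in zip(pos, bounds_center))
        reach = rng + bounds_radius
        if dist_sq <= reach * reach:
            touching.append((pos, rng))
    return touching


def pick_variant(light_count):
    """Smallest fixed iteration count that still covers light_count."""
    for count in VARIANT_COUNTS:
        if light_count <= count:
            return count
    return VARIANT_COUNTS[-1]  # more lights than slots: clamp to the largest variant


def keyword_for(count):
    """Hypothetical keyword name; None means the light code is compiled out."""
    return None if count == 0 else "VERTEX_POINT_LIGHTS_%d" % count
```

    For a renderer touched by 2 lights this picks the 4-light variant, so the vertex shader loops 4 times instead of 32. The price is that renderers with different light sets can no longer share the same global light arrays.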

    The other option is for the devices you know are slow, only allow 8 or 16 lights, or simplify the lighting calculation (no dot product, use distance attenuation only, for example).
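
    As an illustration of the simplified version, here is a CPU-side Python sketch of a distance-only light with a reduced light cap. The squared-distance falloff is just one cheap choice assumed for this example (it avoids both the square root and the dot product with the normal), and the function names and the default cap of 8 are made up.

```python
def distance_only_intensity(vertex_pos, light_pos, light_range):
    """Falloff from squared distance only: no normal, no sqrt (assumed model)."""
    dist_sq = sum((l - v) ** 2 for l, v in zip(light_pos, vertex_pos))
    return max(0.0, 1.0 - dist_sq / (light_range * light_range))


def cheap_vertex_light(vertex_pos, lights, max_lights=8):
    """Sum at most max_lights lights using the distance-only falloff.

    lights: list of (position, color, range) tuples.
    """
    total = [0.0, 0.0, 0.0]
    for pos, color, rng in lights[:max_lights]:
        k = distance_only_intensity(vertex_pos, pos, rng)
        total = [t + c * k for t, c in zip(total, color)]
    return tuple(total)
```

    The visual trade-off is that surfaces facing away from a light are lit as brightly as those facing it.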

    For Valve's The Lab Renderer, all of those shaders are set up to only compile for dx11, which changes things significantly. That supports flow control and real dynamic loops, so something like continue or dynamic loop iteration counts are possible and can actually be a significant boost. Static branching is basically free, and dynamic branching on AMD GCN GPUs is known to cost 6 instructions, so if what you're doing is more than that then it's a win.
     