
Compute shader synchronization

Discussion in 'Shaders' started by Furtys, Jun 26, 2017.

  1. Furtys

    Hello!

    I need your help today! I am starting to work with compute shaders on a really simple use case:
    I have a depth camera and I want to calculate the bounding box of an object close to the camera.

    I have too many pixels to process, so I want to use GPGPU: a compute shader that processes the pixels in parallel.

    My current problem is that when I run the program, I get the same min and max coordinates. I think all my groups and threads are writing to my StructuredBuffers at the same time.

    Do you have any idea how to synchronize these writes?

    Thanks a lot!

    PS: Sorry for my English, I'm French :D

    Here is the code of my compute shader:

    Code (HLSL Compute Shader):
    #pragma kernel ComputeBoundingBox
    // Size of a thread group in the x, y and z directions; y and z are just 1
    #define thread_group_size_x 1024
    #define thread_group_size_y 1
    #define thread_group_size_z 1
    // Size of the depth frame (no trailing semicolon: a #define is pasted verbatim)
    #define width 512
    #define height 424

    // dataBuffer = depth data from the camera
    // minBuffer, maxBuffer = arrays of size 3 holding the min/max x, y and z
    // mask = image area to process
    RWStructuredBuffer<float> dataBuffer;
    globallycoherent RWStructuredBuffer<float> minBuffer;
    globallycoherent RWStructuredBuffer<float> maxBuffer;
    RWStructuredBuffer<float> mask;

    [numthreads(thread_group_size_x, thread_group_size_y, thread_group_size_z)]
    void ComputeBoundingBox(uint3 id : SV_DispatchThreadID)
    {
        // Per-thread locals: non-static HLSL globals are uniform constants,
        // not per-thread storage, so they should not be assigned in the kernel
        float xValue = (id.x + 1) % width;
        float yValue = (id.x + 1) / width;
        float zValue = dataBuffer[id.x];

        if (mask[id.x] > 0.49)
        {
            if (zValue > 500 && zValue < 1500)
            {
                // Unsynchronized compare-then-write on shared entries
                // (this is where I suspect the problem is)
                if (xValue < minBuffer[0])
                    minBuffer[0] = xValue;
                else if (xValue > maxBuffer[0])
                    maxBuffer[0] = xValue;
                if (yValue < minBuffer[1])
                    minBuffer[1] = yValue;
                else if (yValue > maxBuffer[1])
                    maxBuffer[1] = yValue;
                if (zValue < minBuffer[2])
                    minBuffer[2] = zValue;
                else if (zValue > maxBuffer[2])
                    maxBuffer[2] = zValue;
            }
        }
    }
     
    Last edited: Jun 26, 2017
  2. LukasCh

    Unity Technologies

    Yes, this does require synchronization. You can try the interlocked operations: https://msdn.microsoft.com/en-us/library/windows/desktop/ff471411(v=vs.85).aspx. That said, I really recommend rethinking this code, because performance is going to be extremely poor:
    - All your threads and thread groups are going to contend for the same few buffer entries, since basically all of the work is inside critical sections
    - You have a lot of branching, which means only a few threads in each group will actually be doing work
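
    To make both points concrete, here is a minimal sketch of one possible rework, not a definitive implementation. It relies on several assumptions that are not in the original post: minBuffer and maxBuffer are switched to uint (the HLSL InterlockedMin/InterlockedMax intrinsics only accept int and uint), the CPU side resets minBuffer to 0xFFFFFFFF and maxBuffer to 0 before each dispatch, id.x is treated as a zero-based pixel index, and the names GROUP_SIZE, WIDTH, HEIGHT, groupMin and groupMax are made up for illustration. Each group first reduces into groupshared memory, then issues at most six global atomics, which should reduce the contention described above.

```hlsl
#pragma kernel ComputeBoundingBox

#define GROUP_SIZE 1024
#define WIDTH 512u
#define HEIGHT 424u

// Depth frame and mask, one float per pixel (declared as in the original post).
RWStructuredBuffer<float> dataBuffer;
RWStructuredBuffer<float> mask;

// ASSUMPTION: 3 uints each (x, y, z) instead of floats, because the
// Interlocked* intrinsics only accept int/uint. The CPU side is assumed to
// reset minBuffer to 0xFFFFFFFF and maxBuffer to 0 before each dispatch.
RWStructuredBuffer<uint> minBuffer;
RWStructuredBuffer<uint> maxBuffer;

// Per-group partial bounds, kept in group memory so that most atomic
// traffic never reaches the global buffers.
groupshared uint groupMin[3];
groupshared uint groupMax[3];

[numthreads(GROUP_SIZE, 1, 1)]
void ComputeBoundingBox(uint3 id : SV_DispatchThreadID,
                        uint localIndex : SV_GroupIndex)
{
    // The first three threads reset this group's partial bounds.
    if (localIndex < 3)
    {
        groupMin[localIndex] = 0xFFFFFFFFu;
        groupMax[localIndex] = 0u;
    }
    GroupMemoryBarrierWithGroupSync();

    // ASSUMPTION: id.x is a zero-based index into a row-major frame.
    if (id.x < WIDTH * HEIGHT)
    {
        float z = dataBuffer[id.x];
        if (mask[id.x] > 0.49 && z > 500 && z < 1500)
        {
            uint x = id.x % WIDTH;
            uint y = id.x / WIDTH;
            // z is positive here, and positive IEEE 754 floats keep their
            // ordering when their bits are reinterpreted as uint.
            uint zBits = asuint(z);

            // No else-if: the same pixel can set both the min and the max.
            InterlockedMin(groupMin[0], x);
            InterlockedMax(groupMax[0], x);
            InterlockedMin(groupMin[1], y);
            InterlockedMax(groupMax[1], y);
            InterlockedMin(groupMin[2], zBits);
            InterlockedMax(groupMax[2], zBits);
        }
    }
    GroupMemoryBarrierWithGroupSync();

    // Three threads per group merge the partial bounds into the global
    // result; groups that found no valid pixel (min > max) skip the atomics.
    if (localIndex < 3 && groupMin[localIndex] <= groupMax[localIndex])
    {
        InterlockedMin(minBuffer[localIndex], groupMin[localIndex]);
        InterlockedMax(maxBuffer[localIndex], groupMax[localIndex]);
    }
}
```

    With a 512 x 424 frame and 1024 threads per group, this would be dispatched with 212 groups in x (512 * 424 / 1024 = 212). After reading the buffers back, entry 2 of each buffer holds float bits and has to be reinterpreted as a float on the CPU; if minBuffer[0] is still 0xFFFFFFFF, no pixel passed the mask and depth tests.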