Hello! I need your help today!

I am starting to work with compute shaders on a really simple use case: I have a depth camera and I want to calculate the bounding box of an object near the camera. I have too many pixels to process on the CPU, so I want to use GPGPU, compute shaders and parallelization to compute this.

I currently have a problem: when I run my program, I get the same min and max coordinates. So I think that all my groups and threads write to my StructuredBuffers at the same time. Do you have an idea of how to handle that?

Thanks a lot!

PS: Sorry for my English, I'm French.

Here is the code of my compute shader:

Code (HLSL Compute Shader):

    #pragma kernel ComputeBoundingBox

    //We define the size of a group in the x, y and z directions, z direction will just be one
    #define thread_group_size_x 1024
    #define thread_group_size_y 1
    #define thread_group_size_z 1

    //Size of the depthData frame
    #define width 512;
    #define height 424;

    //DataBuffer = depthData of the camera
    //minBuffer, maxBuffer, array of size 3 with min/max x, y and z
    //mask = image area to process
    RWStructuredBuffer<float> dataBuffer;
    globallycoherent RWStructuredBuffer<float> minBuffer;
    globallycoherent RWStructuredBuffer<float> maxBuffer;
    RWStructuredBuffer<float> mask;

    float xValue = 0, yValue = 0, zValue = 0;

    [numthreads(thread_group_size_x, thread_group_size_y, thread_group_size_z)]
    void ComputeBoundingBox(uint3 id : SV_DispatchThreadID)
    {
        xValue = (id.x + 1) % width;
        yValue = (id.x + 1) / width;
        zValue = dataBuffer[id.x];

        if (mask[id.x] > 0.49)
        {
            if (zValue > 500 && zValue < 1500)
            {
                if (xValue < minBuffer[0])
                    minBuffer[0] = xValue;
                else if (xValue > maxBuffer[0])
                    maxBuffer[0] = xValue;

                if (yValue < minBuffer[1])
                    minBuffer[1] = yValue;
                else if (yValue > maxBuffer[1])
                    maxBuffer[1] = yValue;

                if (zValue < minBuffer[2])
                    minBuffer[2] = zValue;
                else if (zValue > maxBuffer[2])
                    maxBuffer[2] = zValue;
            }
        }
    }
Yes, this does require synchronization. You can try the interlocked operations: https://msdn.microsoft.com/en-us/library/windows/desktop/ff471411(v=vs.85).aspx

That said, I really recommend that you rethink this code, because performance is going to be extremely poor:

- All your threads and thread groups are going to fight for locks, since basically all of the work happens in critical sections.
- You have a lot of branching, which means that only a few threads in each group will actually be doing work at any time.
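
To make the suggestion concrete, here is a minimal sketch of one way the kernel could be restructured. It is not a tested drop-in replacement, and it rests on several assumptions that are not in the original post: the min/max buffers are changed to `uint` (the HLSL `InterlockedMin`/`InterlockedMax` functions only accept `int`/`uint`), the C# side resets `minBuffer` to `0xFFFFFFFF` and `maxBuffer` to `0` before each dispatch, pixel coordinates are 0-based, the trailing semicolons on the size `#define`s are dropped, and the group size of 256 is an arbitrary choice. Each group first reduces into `groupshared` memory, so only three threads per group touch the global buffers.

```hlsl
#pragma kernel ComputeBoundingBox

// Assumed group size; any value up to 1024 that suits the hardware would do.
#define GROUP_SIZE 256
// Size of the depth frame (no trailing semicolons in the macros).
#define WIDTH 512
#define HEIGHT 424

RWStructuredBuffer<float> dataBuffer;   // depth data of the camera
RWStructuredBuffer<float> mask;         // image area to process
// Assumption: uint instead of float, because the interlocked min/max
// intrinsics work on int/uint only. Index 0 = x, 1 = y, 2 = z.
// Assumption: the CPU side fills minBuffer with 0xFFFFFFFF and maxBuffer
// with 0 before every dispatch.
RWStructuredBuffer<uint> minBuffer;
RWStructuredBuffer<uint> maxBuffer;

// Per-group partial results, shared by all threads of one group.
groupshared uint gMin[3];
groupshared uint gMax[3];

[numthreads(GROUP_SIZE, 1, 1)]
void ComputeBoundingBox(uint3 id : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    // The first three threads of each group reset the shared accumulators.
    if (gi < 3)
    {
        gMin[gi] = 0xFFFFFFFF;
        gMax[gi] = 0;
    }
    GroupMemoryBarrierWithGroupSync();

    // Guard against dispatches that cover more threads than pixels.
    if (id.x < WIDTH * HEIGHT)
    {
        float z = dataBuffer[id.x];

        // A single combined test instead of nested branches.
        if (mask[id.x] > 0.49 && z > 500 && z < 1500)
        {
            uint x = id.x % WIDTH;
            uint y = id.x / WIDTH;
            // For non-negative floats the raw bit pattern sorts in the same
            // order as the value, so z can be compared as a uint.
            uint zBits = asuint(z);

            // Atomic updates on fast group-shared memory. Min and max are
            // updated independently (no "else"), so a single pixel can set both.
            InterlockedMin(gMin[0], x);
            InterlockedMax(gMax[0], x);
            InterlockedMin(gMin[1], y);
            InterlockedMax(gMax[1], y);
            InterlockedMin(gMin[2], zBits);
            InterlockedMax(gMax[2], zBits);
        }
    }
    GroupMemoryBarrierWithGroupSync();

    // Only three threads per group write to the global buffers.
    if (gi < 3)
    {
        InterlockedMin(minBuffer[gi], gMin[gi]);
        InterlockedMax(maxBuffer[gi], gMax[gi]);
    }
}
```

With this layout, the x and y results can be read back as plain integers, while the z entries hold float bit patterns that have to be reinterpreted as floats on the CPU side. If no pixel passes the mask and depth test, the buffers simply keep their initial values, which the caller can use to detect an empty bounding box.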