
ComputeBuffer.GetData takes 15 - 20ms?!?!

Discussion in 'Scripting' started by derPuppeteer, Mar 7, 2015.

  1. derPuppeteer

    derPuppeteer

    Joined:
    Mar 21, 2014
    Posts:
    23
    Hi!

    I wrote a rather simple compute shader and it works fine, but resultBuffer.GetData(someArray) almost always takes forever! (Mostly 15-20ms, but sometimes anywhere from 1ms to 50ms.)

    What am I doing wrong?
    Some Code:
    Code (csharp):
    // Initialize the input buffers, fill them with SetData from some arrays and bind them to the compute shader
    ComputeBuffer result = new ComputeBuffer(100, sizeof(int));
    computeShader.SetBuffer(kernel, "Result", result);
    computeShader.Dispatch(kernel, 100, 1, 1);
    int[] resData = new int[100];
    result.GetData(resData); // Takes forever! WTF!
    // Do something cool with the results here!
    result.Release();
    Thank you very much!
     
  2. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,658
    GetData has to wait for the GPU to actually carry out the work you've dispatched, so calling GetData immediately after Dispatch like you're doing there is going to mean you're stuck waiting for the GPU's work queue to clear, and for it to carry out the kernel, and for the results to be copied back to system memory.

    Try doing other work after Dispatch(), so that the GPU has a chance to get things done - you could even yield for a frame and retrieve the results later that way.
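A minimal sketch of the "yield for a frame" idea, assuming a kernel named "CSMain" that writes ints into a buffer named "Result" and is declared with [numthreads(16,1,1)]. The class name DeferredReadback and its fields are hypothetical, not from the original post. GetData can still block if the GPU has not finished by the next frame; the wait is just usually much shorter.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical component: dispatches the kernel, then reads the results one frame later.
public class DeferredReadback : MonoBehaviour
{
    public ComputeShader shader;   // assumed to contain a kernel named "CSMain"

    const int Count = 100;         // number of elements in the result buffer
    const int GroupSize = 16;      // assumed to match [numthreads(16,1,1)] in the shader

    ComputeBuffer result;
    readonly int[] resData = new int[Count];

    // Unity runs Start as a coroutine when it returns IEnumerator.
    IEnumerator Start()
    {
        result = new ComputeBuffer(Count, sizeof(int));
        int kernel = shader.FindKernel("CSMain");
        shader.SetBuffer(kernel, "Result", result);

        // Dispatch takes thread-group counts, so round up to cover every element.
        int groups = (Count + GroupSize - 1) / GroupSize;

        while (true)
        {
            shader.Dispatch(kernel, groups, 1, 1);

            // Give the GPU a frame to work through its queue before asking for the data.
            yield return null;

            result.GetData(resData);
            // Do something with resData here.
        }
    }

    void OnDestroy()
    {
        if (result != null)
        {
            result.Release();
            result = null;
        }
    }
}
```

The buffer is created once and reused, rather than being allocated and released every frame.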
     
  3. derPuppeteer

    Thank you! My compute shader is slower than I expected... I tried that just as you wrote your post, and you are right!
     
  4. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    8,532
    Have you tried putting this on another thread?

    I haven't ever used ComputeShader in Unity, but I don't see why they'd force you to access it only on the main thread.
     
  5. derPuppeteer

    I have not tried that. But my problem has now turned into a compute shader speed problem:
    Code (csharp):
    [numthreads(16,1,1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        Result[id.x] = 1.0f;
    }
    This rather simple shader takes pretty much the same amount of time (~20ms) as the one I was using. Is there anything else I could have done wrong?
    I will try to create a minimal example script that shows the problem.

    EDIT: Even if the compute shader does not compile, the GetData after Dispatch takes > 20 ms.
     
  6. derPuppeteer

    Here are my example scripts. GetData takes ~20ms according to the profiler!
    Code (csharp):
    using UnityEngine;

    public class wtfScript : MonoBehaviour {

        public ComputeShader wtf;

        // Update is called once per frame
        void Update () {
            ComputeBuffer result = new ComputeBuffer(300, sizeof(float));
            int kernel = wtf.FindKernel("CSMain");
            wtf.SetBuffer(kernel, "Result", result);
            // Dispatch takes thread-group counts, not thread counts: the kernel is declared
            // with numthreads(16,1,1), so 19 groups are enough to cover 300 elements.
            wtf.Dispatch(kernel, (300 + 15) / 16, 1, 1);
            float[] resultData = new float[300];

            // Blocks until the GPU has executed the kernel and copied the data back.
            result.GetData(resultData);
            result.Release();
            Debug.Log("Data[0]: " + resultData[0]);
        }
    }
    Shader:
    Code (csharp):
    // Each #kernel tells which function to compile; you can have many kernels
    #pragma kernel CSMain

    // Output buffer, bound from C# with ComputeShader.SetBuffer
    RWStructuredBuffer<float> Result;

    [numthreads(16,1,1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        Result[id.x] = 1.0f;
    }
    EDIT: Okay, I should have tried it in a new scene! No problems there! But it seems to interact with post-processing effects like my custom SSAO and the Unity DX11 DoF! If you add them, GetData suddenly spikes in the profiler!
    Possibly a bug?
    EDIT2: But only in my game scene. Just adding DX11 DoF to the new scene is not enough...
    EDIT3: But my custom SSAO script, set to 32 samples and added to the empty scene, is enough!!
    Any ideas?
     
    Last edited: Mar 8, 2015
  7. superpig

    The issue is not the time taken by your compute shader itself; it's the fact that it's not executed immediately when you call Dispatch(), but is instead queued to be executed by the graphics card. So that 20ms wait you're seeing is down to the GPU needing to clear through its queue of work before it gets to your kernel (which then probably runs in <1ms).
     
  8. derPuppeteer

    Thank you very much!
    That also makes a lot of sense!
    Are there any hints on how to schedule those compute shaders correctly?
    It seems at the moment that my main thread is always behind my render thread even though the computeshader script has the highest priority.

    If you don't mind, I'll abuse this thread for another question:
    The game I am working on always has some cube-planets with a lot of objects on them.
    In order to reduce the amount of stuff that is rendered, I use raymarching (with this compute shader) to figure out which objects are visible and which are covered by the planet. Objects that are not visible are deactivated with gameObject.SetActive(false). This is done every frame.
    Is there a better way to create a custom object-culling solution?

    I could theoretically collect the results of the computeshader one frame later. (Should work for slow camera movement)
    But this can of course not be a permanent solution for everyone..
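One way to collect the results later without stalling is sketched below. It assumes a newer Unity version that ships AsyncGPUReadback (it did not exist when this thread was written), and a hypothetical kernel "CSMain", declared with [numthreads(16,1,1)], that writes 1 for visible and 0 for hidden into an int buffer named "Result". The class name and the objects array are illustrative only.

```csharp
using Unity.Collections;
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical culling component: applies visibility results whenever the GPU delivers them.
public class AsyncCullingReadback : MonoBehaviour
{
    public ComputeShader shader;    // assumed kernel "CSMain" writing 1 (visible) or 0 (hidden)
    public GameObject[] objects;    // assumed non-empty; one buffer element per object

    const int GroupSize = 16;       // assumed to match [numthreads(16,1,1)]

    ComputeBuffer visibility;
    int kernel;
    bool pending;                   // true while a readback request is in flight

    void Start()
    {
        visibility = new ComputeBuffer(objects.Length, sizeof(int));
        kernel = shader.FindKernel("CSMain");
        shader.SetBuffer(kernel, "Result", visibility);
    }

    void Update()
    {
        // Keep only one request in flight; skip frames until it completes.
        if (pending) return;

        int groups = (objects.Length + GroupSize - 1) / GroupSize;
        shader.Dispatch(kernel, groups, 1, 1);

        pending = true;
        AsyncGPUReadback.Request(visibility, OnReadback);
    }

    // Called on the main thread once the data has been copied back.
    void OnReadback(AsyncGPUReadbackRequest request)
    {
        pending = false;
        if (request.hasError || visibility == null) return;

        // The NativeArray is only valid for a short time, so consume it here.
        NativeArray<int> data = request.GetData<int>();
        for (int i = 0; i < data.Length && i < objects.Length; i++)
        {
            if (objects[i] != null)
                objects[i].SetActive(data[i] != 0);
        }
    }

    void OnDestroy()
    {
        if (visibility != null)
        {
            visibility.Release();
            visibility = null;
        }
    }
}
```

The results arrive one or more frames late, so the same caveat about fast camera movement applies.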
     
  9. hippocoder

    hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    You can delay the camera movement by one frame too; then you wouldn't get sly pop-in. Or does it not work like that?
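A rough sketch of that idea, assuming the culling pass reads the pose of a separate transform (called target here, a hypothetical name) while the rendering camera carrying this script lags it by exactly one frame:

```csharp
using UnityEngine;

// Hypothetical helper, attached to the rendering camera.
// "target" is the transform that game logic moves and that the culling pass reads.
public class DelayedCamera : MonoBehaviour
{
    public Transform target;

    Vector3 pendingPosition;
    Quaternion pendingRotation;
    bool hasPending;

    void LateUpdate()
    {
        // Apply the pose recorded last frame, so rendering matches the culling
        // results that were computed for that pose.
        if (hasPending)
        {
            transform.position = pendingPosition;
            transform.rotation = pendingRotation;
        }

        // Record this frame's pose to be applied next frame.
        pendingPosition = target.position;
        pendingRotation = target.rotation;
        hasPending = true;
    }
}
```

This trades one frame of camera latency for consistent visibility, which may or may not be acceptable depending on the game.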
     
  10. derPuppeteer

    Yes, I could do that. Good idea! Thank you!
    Still kinda hacky and annoying!