
[Idea] Unity with C# to GPU power!

Discussion in 'General Discussion' started by Arowx, Jan 7, 2015.

  1. TheSniperFan

    TheSniperFan

    Joined:
    Jul 18, 2013
    Posts:
    712
Yeah, but what you can come up with isn't relevant here, is it? Unless you have more ideas than the entire Unity community combined, I guarantee you that, if you were to give programmers an API for this, sooner or later someone would come up with something really great.

    While it would be cool, such a thing isn't going to happen, simply because it's pretty much impossible.
     
  2. darkhog

    darkhog

    Joined:
    Dec 4, 2012
    Posts:
    2,218
Not every game uses up the whole GPU. If a game is graphically simple, say a Minecraft clone, most of the GPU sits idle. Why not use it fully to make, e.g., the world generator faster? Most of the performance problems Minecraft and its clones have lie in slow world generation; you can test this by running MC with the normal generator, then with the flatlands one, and comparing FPS values.
     
  3. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,965
    Most of Minecraft's problems are not with world generation but with simulation and loading previously generated chunks back into memory. The former can be very difficult to thread properly and still achieve improvements while the latter is primarily due to the sluggish nature of hard drives.

    World generation is a minor problem that should only be problematic if everyone is constantly moving to unexplored regions.
     
    Last edited: Apr 19, 2015
  4. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,617
    I'm sure they would. Just because I can't think of something isn't evidence that it doesn't exist, and I may well think of something under different circumstances. Like I said in that very quote, I'm not implying that it's useless. Like I also said in that same quote, "it has to be considered across the whole system", a part of which is weighing up pros and cons of investing into this vs. putting the same resources into something else.

    If you're talking about entertainment games you're most probably right. In the simulation space, though, it's not necessarily true because often little thought is given to what something looks like, and often performance and graphics are secondary to simulation fidelity. At least one of my projects comes to mind where the CPU is getting hammered while the GPU is almost idling.
     
    Last edited: Apr 19, 2015
    Seneral likes this.
  5. darkhog

    darkhog

    Joined:
    Dec 4, 2012
    Posts:
    2,218
Someone should build a native plugin that allows for that until Unity supports it by default.
     
    Seneral likes this.
  6. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
Definitely, I'm just too inexperienced with advanced .NET features and the general structure of a GPU program to convert the existing solutions.
But by native solutions do you mean building up something similar from scratch, or building a wrapper? If it's the first case, you need someone really advanced. And performance leaks a bit using a C++ wrapper, I guess.

A general question about downgrading .NET applications: is it possible to cut some higher .NET functions (e.g. out of the 4.5 source code) and implement them in .NET 3.5 manually? If so, I'll give it another try; I'm currently stuck at these features:
- CancellationToken (struct used by threading)
- ConcurrentDictionary (a dictionary in System.Collections.Concurrent used instead of Dictionary when multithreading)
- Tuple (some funny type I never heard of, but seems useful, like structs)

Correct me if I'm wrong with these;)
     
  7. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,965
    The token itself may be a struct, but the system it belongs to is considerably more complex. It is essentially a unified framework for cancelling thread operations. Any alternative is likely to involve a lot of changes to the code.

    https://msdn.microsoft.com/en-us/library/dd997364(v=vs.110).aspx

    This is basically a thread safe dictionary.

A tuple is simply an ordered, fixed-size group of values, potentially of different types.
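For anyone weighing a .NET 3.5 backport, a small self-contained sketch of what these three .NET 4 types do (plain C#, nothing Unity-specific; the demo names are hypothetical):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class Net4FeatureDemo
{
    static void Main()
    {
        // ConcurrentDictionary: a dictionary safe to mutate from multiple threads.
        var counts = new ConcurrentDictionary<string, int>();
        counts.AddOrUpdate("chunks", 1, (key, old) => old + 1);
        counts.AddOrUpdate("chunks", 1, (key, old) => old + 1);
        Console.WriteLine(counts["chunks"]); // 2

        // Tuple: a fixed-size, ordered group of values of possibly different types.
        var pixel = Tuple.Create(16, 0.945f);
        Console.WriteLine(pixel.Item1); // 16

        // CancellationToken: cooperative cancellation for threaded work.
        var cts = new CancellationTokenSource();
        cts.Cancel();
        Console.WriteLine(cts.Token.IsCancellationRequested); // True
    }
}
```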
     
  8. Tomnnn

    Tomnnn

    Joined:
    May 23, 2013
    Posts:
    4,148
It's funny that it's even an issue in their case, since they have air as a block, so they could use object pooling for out-of-range chunks instead of mass deleting and instantiating.
     
  9. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,965
    Minecraft is designed poorly in general. The chunk size is 16 by 16 by the height of the map which is typically 256. If the chunk size were smaller, such as 16 by 16 by 16, you could more easily optimize the game.

    You could generate only the chunks that were immediately reachable rather than being forced to generate areas in the chunk that you may never visit. If the chunk were a single block type, such as air, you could more quickly create the mesh representing the chunk.

    Smaller chunks would help reduce processing needs, memory footprint, and load times of previously generated chunks.
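To put rough numbers on that (the 16×16×256 and 16×16×16 sizes come from the posts above; the flattening function is a common voxel-indexing scheme, not Minecraft's actual code):

```csharp
using System;

class ChunkMath
{
    // Flatten 3D block coordinates within a chunk into a 1D array index.
    static int BlockIndex(int x, int y, int z, int sizeX, int sizeY)
    {
        return x + sizeX * (y + sizeY * z);
    }

    static void Main()
    {
        // A 16x16x256 column chunk holds 65,536 blocks...
        Console.WriteLine(16 * 16 * 256); // 65536

        // ...while a 16x16x16 cubic chunk holds 4,096, so it can be
        // generated, meshed, and streamed in much smaller units.
        Console.WriteLine(16 * 16 * 16); // 4096

        // The same flattening scheme works for either chunk shape:
        Console.WriteLine(BlockIndex(1, 2, 3, 16, 16)); // 1 + 16*(2 + 16*3) = 801
    }
}
```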
     
    Last edited: Apr 19, 2015
    Tomnnn likes this.
  10. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    Thanks for the link, I'll look into it. I'll figure out if I could get similar results using a similar system...
     
  11. mgear

    mgear

    Joined:
    Aug 3, 2010
    Posts:
    9,350
  12. Tomnnn

    Tomnnn

    Joined:
    May 23, 2013
    Posts:
    4,148
    Maybe they couldn't figure out how to handle people mining 16x16x256 holes in their world (like most quarry mods do), so they had no other choice? That's my best guess. The other thing I see people point at for the source of much lag is flowing water. You can tell when there are many underground rivers because 1 chunk will have slightly worse FPS than other chunks, haha.

    Maybe what would work best would be even more options regarding how the chunking process works.
     
  13. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,965
    Mojang has yet to provide us with a modding API and the game has been this way long before mods were available.
     
  14. Tomnnn

    Tomnnn

    Joined:
    May 23, 2013
    Posts:
    4,148
    Heh, I know, and 1.8 Forge is taking a loooong time due to some updates with item IDs and the like people are having fun trying to get around...

    I vote HQM for best minecraft mod!
     
  15. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,965
    At times I wonder why Forge doesn't build their own cube game with a real modding API. :p
     
    Tomnnn likes this.
  16. tswalk

    tswalk

    Joined:
    Jul 27, 2013
    Posts:
    1,109
Steering back towards the topic... I think you need to be careful about what is sent to be computed on the GPU. I honestly am not that familiar with it, but I think alternate use cases could be had.

    Perhaps some parallel AI threads for quick distance checks or basic logic pattern analysis?.. I don't know, just blowing hot air.
     
  17. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    I currently maintain a Unity port of Cloo (OpenCL library for C#):
    https://github.com/thinksquirrel/cloo-unity

    It has some changes to compile on all platforms, and changes the backend (the native code wrapper) to use an interface instead. The reason for that is so that Cloo can be built as a DLL, but the interface implementations can use Unity's platform dependent compilation.

    I don't have the interface implementations on GitHub but you can find it in Fluvio Free for now (I plan on having a more GitHub friendly version of that up soon):

    https://www.assetstore.unity3d.com/en/#!/content/2888

    The files in question are CL10.cs, CL11.cs, and CL12.cs. The cool thing about this is you can make a backend that uses your own OpenCL implementation on platforms that don't support it (if you can compile the kernels to some form and run it on the CPU).
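The backend-swap pattern described here can be sketched in isolation. All names below are hypothetical stand-ins, not Cloo's actual types; the point is that the DLL calls through a static interface slot that platform-dependent code must fill, which is also why an unset slot surfaces as a NotImplementedException rather than a P/Invoke failure on an unsupported platform:

```csharp
using System;

// Stand-in for the native-call surface (hypothetical; Cloo's real
// interface is much larger and lives in Cloo.Bindings).
interface INativeCL
{
    int GetDeviceCount();
}

static class CLInterfaceSketch
{
    static INativeCL _impl;

    // Platform-dependent code injects an implementation once at startup.
    public static void SetInterface(INativeCL impl) { _impl = impl; }

    public static int GetDeviceCount()
    {
        if (_impl == null)
            throw new NotImplementedException("No OpenCL backend was set.");
        return _impl.GetDeviceCount();
    }
}

// One possible backend; a real one would P/Invoke into the OpenCL driver.
class FakeBackend : INativeCL
{
    public int GetDeviceCount() { return 1; }
}

class Program
{
    static void Main()
    {
        CLInterfaceSketch.SetInterface(new FakeBackend());
        Console.WriteLine(CLInterfaceSketch.GetDeviceCount()); // 1
    }
}
```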
     
  18. SunnyChow

    SunnyChow

    Joined:
    Jun 6, 2013
    Posts:
    360
    I know Unity already supports GPU computing, but it's only for DirectX 11. And all my projects are for mobile :(
     
  19. darkhog

    darkhog

    Joined:
    Dec 4, 2012
    Posts:
    2,218
    Interesting. Does this work on Mac/Linux?
     
  20. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
The release notes of Fluvio note that OpenCL is available on Windows, Mac and Linux, with automatic fallback to the CPU if there's no other OpenCL-compatible device, but OpenCL computing is currently not supported in the editor. These are the "limitations" I found. Besides common driver bugs, of course.

The Asset Store page of Fluvio says it's available for desktop, as well as Web Player and mobile, but I'm not sure if that's meant for OpenCL or not.

Anyway, seems like a solid solution! Looking forward to using it in some terrain erosion solution; seems like the way to go for me.
     
  21. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
@Thinksquirrel Fluvio's OpenCL is not usable in the editor, right? Is this related to your Cloo conversion library, or will we still be able to use it in the editor?
     
  22. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
Uhh, is there any documentation available for Cloo? There's no official one, and the tutorials for general OpenCL just go over the starting process with something like OpenCLTemplate (which of course is not available for Unity;) ). The API also seems undocumented, and as I have near-zero experience with OpenCL I'm totally lost.

Don't even know where to start when just creating a new context with lots of undocumented parameters isn't possible.
A new kernel accepts no parameters. Well then... Found no way to compile my C99 code btw; ComputeKernel.Program.Build doesn't accept a source parameter.

Again, I'm lost. Can anyone please provide a starter script to learn with?

EDIT: Never mind. Looked into Clootils from GitHub (was using Fluvio's libraries until then...) and found everything I needed;) And the functions of the source code were documented, but those in the DLL not? Crazy.
     
    Last edited: Apr 21, 2015
  23. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    Hi guys, to answer some questions.

    Yep! I haven't tested on Linux but it should work as long as A) an OpenCL driver is installed B) The path to the library is correctly specified (Mono does a pretty good job at guessing your shared library path), and C) There aren't any weird bugs that were overlooked either in upstream Cloo or my implementation.

    It is usable in the editor, including outside of Play mode =) Fluvio specifically had a few OpenCL bugs in an earlier beta (especially with NVIDIA cards). The current version has a workaround for those bugs, and a version that will be up this week fixes those issues entirely.

    Yea, definitely take a look at Clootils, it's a great place to get started.
     
    darkhog, Ryiah and Seneral like this.
  24. darkhog

    darkhog

    Joined:
    Dec 4, 2012
    Posts:
    2,218
Cool! I'll certainly use this when I actually need the power of the GPU; for now everything runs fine CPU-only for me.
     
  25. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
@Thinksquirrel I encountered a problem; it started to throw a NotImplementedException: The requested feature is not implemented. today. It can be tracked down to when I try to access the ICL12, or, as I tried later, any other version (1.0 or 1.1). I made sure with a Cudafy tool that both my CPU and GPU are capable of running OpenCL 1.2. The script that throws the error is CLInterface, when I access the ICL12. Seems none of the versions is set up. The method below these variables ( public static void SetInterface(ICL12 cl) ) has no references whatsoever (neither has any other member of CLInterface besides CL11 and CL12), which means it's never called at all. Do I have to set up my OpenCL manually beforehand? Wouldn't make any sense, as it worked yesterday... Do you know why this occurred?
     
  26. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    Hi Seneral, as stated above:

    So for a step by step (off the top of my head, not looking at the code at the moment):
    • Download Fluvio Free and grab CL10.cs, CL11.cs, CL12.cs. Again, these will be in the GitHub project at some point.
    • Somewhere before you call any OpenCL code (in the Awake function of a script for example), you need to call:
Code (csharp):
Cloo.Bindings.CLInterface.SetInterface(new Cloo.Bindings.CL12());
    After this, everything in Cloo will work as normal. Basically, I've replaced all the native API calls with that interface, but if it's not explicitly set, then a NotImplementedException is thrown. If the native P/Invoke stuff is built in to the DLL by default, then the Unity project would not build for unsupported platforms, like iOS for example.

    As to why it may have worked before: If you have Fluvio in your project, it does this behind the scenes with a helper gameobject, but only when Fluvio is used (meaning you need to have a fluid in your scene). So for general purpose, non-Fluvio OpenCL, you'll need to do this yourself.
     
    Last edited: Apr 23, 2015
  27. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    Ok, that makes sense, thank you xD
     
  28. Breyer

    Breyer

    Joined:
    Nov 10, 2012
    Posts:
    412
Definitely worth a shot! How do you deal with the Web Player? I guess you completely skip Cloo and fall back to the CPU? Anyway, since the Web Player is slowly dying this isn't that important, just curious. A more interesting topic is WebGL.
     
  29. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,965
    OpenGL ES 3.1 has compute shaders. If Unity were to ever implement GPGPU, that's most likely the route they'd use. A specification does exist for WebCL, but it looks like Mozilla is favoring OpenGL ES 3.1 over it.

    http://en.wikipedia.org/wiki/WebCL
    http://en.wikipedia.org/wiki/OpenGL_ES#OpenGL_ES_3.1
     
    Last edited: Apr 24, 2015
  30. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    9,042
Compute shaders are sick! Unity does leverage Metal for iOS; not sure if that part is supported though.
     
  31. Lockethane

    Lockethane

    Joined:
    Sep 15, 2013
    Posts:
    114
http://forum.unity3d.com/threads/ios-8-metal.273784/ Looks like Metal is still out for now, unless they put it in 5.1, but it's not in the beta notes.
     
  32. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    For platforms that don't support OpenCL, I have the following fallbacks:

    1) Fallback to Unity's native compute shader pipeline - DX11, Consoles, Metal (when Unity decides to make that), OpenGL ES 3.1 with Unity 5.1+
    2) Fallback to the CPU (using a multithreaded implementation)

    The CPU fallback is manual (it doesn't use the library). In the future, I'd like to have it use the library and just have a C#, CPU-based OpenCL implementation (it would just skip the compilation step and you would feed it some delegates).

    Fun fact, OpenCL in my tests is a decent amount faster than DX11 compute shaders, especially when buffer reads are needed. They're a bit trickier to use in combination with geometry shaders and stuff though.
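A toy version of such a delegate-based CPU path (entirely hypothetical, not Fluvio's actual code): skip the kernel compilation step and run a C# delegate once per work item across threads.

```csharp
using System;
using System.Threading.Tasks;

class CpuFallbackSketch
{
    // A "kernel" is just a delegate taking the work-item id and shared buffers.
    delegate void Kernel(int globalId, float[] src, float[] dst);

    static void Dispatch(Kernel kernel, int globalSize, float[] src, float[] dst)
    {
        // Parallel.For plays the role of the OpenCL runtime's dispatcher.
        Parallel.For(0, globalSize, i => kernel(i, src, dst));
    }

    static void Main()
    {
        var src = new float[] { 1f, 2f, 3f, 4f };
        var dst = new float[4];

        // The "kernel body": double each element.
        Dispatch((i, s, d) => d[i] = s[i] * 2f, src.Length, src, dst);

        Console.WriteLine(dst[3]); // 8
    }
}
```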
     
  33. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
Nice, looks like some serious effort you're making there;) Thank you!
Regarding making it more "user-friendly", CUDAfy did a great implementation there: it uses the attribute [Cudafy], and every method tagged with it is automatically compiled in some way so it can be used. I don't really know how solid that is, as you can't compile every piece of C# code; I guess it generates it under the hood, which seems very complicated... Anyway, this is definitely something I would call user-friendly;) As it's open source, one could look into their implementation of course.

Regarding Web Players, there seems to be a limitation: you can't use Images as far as I know. You would have to cast to a pointer (IntPtr), which can only be called from within unsafe { fixed () { } } and thus cannot be used in Web Players, as they don't seem to allow unsafe code. If I'm not missing anything, of course.
     
  34. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
So there is another aspect to this, tied in with the higher-level API Fluvio uses that merges the functionality between OpenCL and DX11. I have a translation layer (done with a huge set of defines) that lets you write the same code between HLSL and OpenCL:

    So code like:

Code (csharp):
FLUVIO_PLUGIN_BUFFER_RW_0(int);

FLUVIO_KERNEL(myKernelName)
{
    FLUVIO_BUFFER_RW myBuffer = FluvioGetPluginBuffer(0);

    // Do stuff
}
In OpenCL, it'll show up in the kernel arguments:

Code (csharp):
__kernel void myKernelName(__global int* fluvio_pluginData0)
{
    int id = get_global_id(0);
    global int* myBuffer = fluvio_pluginData0;
}
In DX11, it'll show up as a RWStructuredBuffer declaration:

Code (csharp):
RWStructuredBuffer<int> fluvio_pluginData0;

void myKernelName(int3 fluvio_dispatchThreadId : SV_DispatchThreadID)
{
    // here, get_global_id is a macro created for DX11 that will return fluvio_dispatchThreadId, similar to OpenCL
    int id = fluvio_dispatchThreadId.x;
    RWStructuredBuffer<int> myBuffer = fluvio_pluginData0;
}
    This example isn't exact as I'm not looking at the code, but the translation layer handles all sorts of stuff and lets you write the same code for OpenCL and DX11.

    Again, this is all Fluvio-specific so I don't have any plans to put that into Cloo, but is an example of how you can streamline things. I didn't go with the CUDAfy approach since that wouldn't work well with AoT platforms, but the C# code is similar enough that I can just copy-paste and rename a few things to maintain the same code.

Regarding Web Player and other platforms: these just don't support OpenCL at all - only desktops (and *some* Android devices) support it, so expect compute shaders to eventually arrive on those platforms through Unity (using OpenGL/GLES/Vulkan/Metal) instead.
     
    Last edited: Apr 25, 2015
    Seneral likes this.
  35. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
A quick question: what's wrong with either of these lines
Code (csharp):
CLkernel.SetValueArgument<float> (2, 0.945f);
CLkernel.SetValueArgument<int> (3, 32);
when the third and fourth arguments (indices 2 & 3) are
Code (csharp):
global read_only float* value,
global read_only int* pictureWidth
?
Seems there IS a problem (or bug?), as the compiler spat out this error code:
Code (csharp):
InvalidArgumentSizeComputeException: OpenCL error code detected:  InvalidArgumentSize.
in both lines where I set the value arguments.
A MemoryArgument caused no trouble before, so what's wrong here?
Any ideas or things I'm missing?
     
  36. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    IIRC, value arguments are single numbers/structs passed by value, so you need to remove the pointer:

Code (csharp):
global read_only float value,
global read_only int pictureWidth
     
  37. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
That's exactly what I tried before, but then a BuildProgramFailure exception was thrown.
For clarification, here's the simplified OpenCL program:
Code (csharp):
string multiplyPixel = @"
kernel void multiplyPixelValue (
   global read_only float* src_pixel,
   global write_only float* dest_pixel,
   global read_only float value,
   global read_only int pictureWidth )
{
   int x = get_global_id (0);
   int y = get_global_id (1);
}";
That was why I asked;)
And even if this would compile, how would I access the variable?
Code (csharp):
value;
// OR
value[0]
Thanks!
     
  38. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
Ah whoops, your kernel should be like this:

Code (csharp):
kernel void multiplyPixelValue (
    global read_only float* src_pixel,
    global write_only float* dest_pixel,
    float value,
    int pictureWidth )
{
    int x = get_global_id (0);
    int y = get_global_id (1);
}
Memory specifiers (global/local, read_only/write_only/read_write, etc.) are for memory objects (buffers).

Your value argument would be accessed like this:

Code (csharp):
float my_value = value;
Memory objects/buffers would be accessed like:

Code (csharp):
// get_index would be some function you define
// to get your current buffer index from the x and y positions.
int pixel_index = get_index(x, y);

float current_pixel = src_pixel[pixel_index];
dest_pixel[pixel_index] = do_something_to(current_pixel);
     
    Seneral likes this.
  39. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
Thanks, it's now working, as far as simple image manipulation goes:)
Though it seems OpenCL built-in functions are not available?
I'm trying to access fmax and fmin to clamp an int:
Code (csharp):
int clamped = fmax (7, 12);
(Of course stripped down to ints)
But this causes an error. Why?

Besides, I'm getting nasty crashes (yes, blue screens...) when calling the functions too often (constantly writing to GPU memory). What's the correct way of cleaning up used GPU memory? I'm actually trying to Dispose every memory buffer, command queue, kernel, program and context after use, but on the second run, right after disposing everything, Unity crashes.
     
  40. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
Built-in functions are definitely available - fmax returns a float, doesn't it? You should be using max/min instead for integer types, I think.

    And yea, OpenCL is NOT forgiving of any issues in your host code. Best thing I can recommend is to set things up *exactly* like Clootils and modify from there. And after EVERY call you should check for errors. If you're still getting a bluescreen or hard crash, it's probably a runtime error with your kernel (out of range index, null reference, etc).
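For reference, a sketch of the integer builtins in use. The kernel string is an assumed example, not code from this thread, and the host-side Clamp mirror exists only to show what the builtin computes:

```csharp
using System;

class OpenCLIntBuiltins
{
    // The kernel-side fix: min/max/clamp for integers, fmin/fmax for floats.
    const string Kernel = @"
kernel void clampValues (global int* values)
{
    int i = get_global_id (0);
    values[i] = clamp (values[i], 7, 12); // == min(max(x, 7), 12)
}";

    // Host-side mirror of OpenCL's integer clamp, for intuition only.
    static int Clamp(int x, int lo, int hi)
    {
        return Math.Min(Math.Max(x, lo), hi);
    }

    static void Main()
    {
        Console.WriteLine(Clamp(20, 7, 12)); // 12
        Console.WriteLine(Clamp(3, 7, 12));  // 7
        Console.WriteLine(Clamp(9, 7, 12));  // 9
    }
}
```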
     
  41. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    Thank you, makes sense xD Sorry for asking the most basic OpenCL questions... should've figured it out myself;)
     
  42. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
@Thinksquirrel I'm not sure if it's related to my setup, but in order for OpenCL to recognize the changes I make to the source code, I always have to restart Unity, which is kind of annoying. I guess it's just OpenCL caching my source code; that's the only reason I can think of. Are you experiencing the same, and do you know how to fix it?
     
    Last edited: May 1, 2015
  43. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    Are you recompiling the program? You need to do this every time you change the source. I haven't experienced this issue (I currently recompile during OnEnable, but plan to do a bit of caching in the future).
     
  44. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
Yes, I do; everything, even the setup of the interface, is done in one function so far.
But tbh it was my fault. I relied on all my variables being reset each time I run the function again. Which they should be, as I call Dispose on every GPU memory object and clear every host variable. But it turns out they aren't, and I got confused: when I took out the code of the OpenCL functions and nothing changed, I concluded it was because the source wasn't being reloaded.
Now I have to find out why they don't seem to be reset...
Sorry about that:)
     
  45. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    Ah yea, as a side note with low-level code like this, never assume that a variable will be initialized properly between runs. You need to set kernel arguments every time you recompile (I currently set a lot of them every frame in fact).
     
  46. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    Thanks, I do that.
Just to be sure: I'm currently implementing an algorithm consisting of multiple iterations, each consisting of multiple steps. Those steps are separate OpenCL functions because of synchronisation. I currently only set the arguments once, right after compilation. The kernels are used multiple times, once in each iteration.
1. Can I assume that, even though I'm not updating the arguments, since it's a pointer it will update in each iteration/step when a change is made, as long as I'm doing a synchronisation point?
2. How can I be sure all data is cleared? I'm currently doing everything as in Clootils, but I'm doing image manipulation, and I can watch the image evolve out of previous results, even though I (I assume) reset all data, both on the CPU (setting everything to null) and on the GPU (disposing every memory object/resource). It still gets something from previous runs. Everything is, as I said, set up in that one function. Theoretically no data could have escaped my clean-up (again, assuming disposing memory objects clears data on the GPU...). Any ideas?

EDIT
1: Yes
2: Fixed by adding the AllocateHostPointer ComputeMemoryFlag to every memory object.
     
    Last edited: May 4, 2015
  47. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
@Thinksquirrel Is there a way of pointing to a device memory buffer outside of Cloo? What I'm trying to do is create a write_only image with a data pointer to a RenderTexture, and write to that RenderTexture from a GPU kernel. I used GetNativeTexturePtr to get the pointer. My problem now is that it throws an InvalidHostPointer exception when I try the UseHostPointer ComputeMemoryFlag. My kernel code:
Code (csharp):
kernel void WriteToRenderTexture (
    write_only image2d_t terrainDataRT,
    global float* terrainHeight )
{
    int x = get_global_id (0);
    int y = get_global_id (1);

    int2 coords = { x, y };
    float4 data = { 0, terrainHeight [x + y * get_global_size (0) * 3], 0, 0 };
    write_imagef (terrainDataRT, coords, data );
}
and my terrainDataRT definition:
Code (csharp):
renderTexture = new RenderTexture (width, height, 3, RenderTextureFormat.RGB565);

terrainDataRT = new ComputeImage2D (CLcontext, ComputeMemoryFlags.WriteOnly | ComputeMemoryFlags.CopyHostPointer,
                new ComputeImageFormat (ComputeImageChannelOrder.Rgba, ComputeImageChannelType.Float),
                width, height, 0, renderTexture.GetNativeTexturePtr ());
How would you go about this? What I basically need is to visualize some image data (terrain stuff) in Unity without costly streaming back and forth between host and device memory every frame.

EDIT: I know it's UseHOSTPointer, so the pointer is definitely wrong. I need another way of getting the buffer.
     
  48. lilymontoute

    lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    OpenCL has extensions for sharing between OpenCL and OpenGL/DX9/DX11, which will avoid copying data around when possible. You'll need to check the spec, make sure the driver supports those extensions, and call those extension functions manually (Cloo has the API for getting extensions and getting extension function pointers).

    If the extension you need is not supported, you have to copy manually through the CPU (which is slow of course, but no other way). I don't know of any desktop card that doesn't support it though (some cards have vendor specific extensions to do it that you may need to use).
     
  49. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    Thanks, I'll do that! Had not looked into "Interop" yet. I'll try it!
     
  50. Seneral

    Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
@Thinksquirrel Ok, I looked into those sparsely documented interoperations between OpenGL/D3D and OpenCL, but as far as I've read, the OpenGL/D3D environment has to create my context in both cases, and I'd need to pass that context to the extension functions. Also, I'd need to set up a huge system of extensions, check whether each is supported, switch to another one if not, and who knows what else...
Seems like a thing scaled too large to implement efficiently, and if what I understood so far is right, I won't even be able to do it, as Unity doesn't give me access to its OpenGL/D3D environment.

I'm totally new to this, but that's what I've got so far. Is that right?
Did you already make something like a cross-graphics-API and cross-platform interop compatible with most devices in Unity? I guess, compared to that effort, it'd be easier to just copy the data over and write to that RenderTexture...