Search Unity

  1. Unity Asset Manager is now available in public beta. Try it out now and join the conversation here in the forums.
    Dismiss Notice

5.6.0b9 + IL2CPP Memory Profiler + UWP (Windows Store App - same PC)

Discussion in '5.6 Beta' started by Arkade, Mar 6, 2017.

  1. Arkade

    Arkade

    Joined:
    Oct 11, 2012
    Posts:
    655
    I'm having great difficulty trying to get the new MemoryProfiler to work with Unity 5.6.0b9. Is it expected to work with it?

    I've had it work *twice* in 2 days of trying. As you can imagine, I'm a little ... disappointed. I'll try and keep this post focused on useful information. I really hope it won't come across as a rant.

    My configuration is Windows 10, Unity 5.6.0b9, targeting Windows Store (UWP), Autoconnect Profiler enabled (thanks to @Tautvydas-Zilys for helping on Twitter), building with Visual Studio 2015 Enterprise on a 32GB i7 on SDD. Tiny test-case app created (happy to share). (hope that's most of the required info)

    I'm going to list a few problems I had since there seem to be a few. Some of them are easier to workaround than others. Obviously it would be preferable none of them existed.

    Profiler restarting

    Once the profiler has been closed, I find it less likely to work a second time. Not always but sometimes.
    Workaround: Restart Unity. Exacerbates next problem...

    Zomibe Unity processes

    Profiler will *never* work if there's an old Unity process loitering. I presume the old process has the local rendezvous socket the app tries to autoconnect back to the Profiler at.
    I'm not sure of all the things that cause the zombie process. Perhaps starting Visual Studio from Unity (e.g. double-click a MonoBehaviour) and leaving it running while quitting Unity (I've seen some with VS child process). Having the Profiler work then fail definitely seems to cause (see below re. busy spinning on select()).
    Workaround: Use Process Explorer (or whatever the default one is) to check whether there are any Unity processes after quitting Unity. Kill 'em.

    Older Unity versions to profile newer builds?

    Given all the documentation talks about using Unity 5.3.4, I'd hoped Unity 5.4.0 or Unity 5.5.0 might work to profile my binary. I never got this to work.
    Investigating with Process Explorer, I found the app built with 5.6.0b9, when it profiles with the old profiler successfully, connects on port 34999. This is LISTENING on 5.6.0b9 but is not on 5.4.0 nor 5.5.0. I concluded it's just changed.

    RequestNewSnapshot() failure and "Skipping profile frame. Receiver can not keep up with the amount of data sent"

    This is the biggie.

    Once the default Profiler is successfully connected, pressing "Take Snapshot" in the new Memory Profiler almost guarateeably seems to do nothing. (I've left it for an hour just in case.) On investigating the app and Editor log, nothing happens for quite a while then the App log fills with reams of:

    Skipping profile frame. Receiver can not keep up with the amount of data sent

    (Filename: C:\buildslave\unity\build\Runtime/Network/PlayerCommunicator/GeneralConnection.cpp Line: 376)

    Aside:
    Searching the forum, it's mentioned in this forum thread but that focused on a Unity-side error relating to size being too large and it's since been fixed. I hope that means it's not that.​

    Watching in Process Explorer, Unity consumes 1 core but achieves nothing.
    Investigating the source, I added printf() (and fflush()) to MemoryInformation.cpp CaptureManagedMemorySnapshot() and FreeCapturedManagedMemorySnapshot() and see them called successfully. I added Debug.Log() to MemoryProfilerWindow.IncomingSnapshot() but, in the fail-case, it is not called (and the 'old' profiler stays frozen). In the case when it works, it *is* called (perfectly reasonably) and the 'old' profiler resumes scrolling. I also added a log message to confirm it was registered. Yep, that's fine.

    Checking Process Explorer again, 1 system-level thread is on CPU and its stack revolves roughly around variants of this:
    ntoskrnl.exe!KeSynchronizeExecution+0x3f26
    ntoskrnl.exe!KeWaitForMultipleObjects+0x109c
    ntoskrnl.exe!KeWaitForMultipleObjects+0xb3f
    ntoskrnl.exe!KeWaitForSingleObject+0x377
    ntoskrnl.exe!KiCheckForKernelApcDelivery+0x650
    ntoskrnl.exe!KiCheckForKernelApcDelivery+0x27a
    ntoskrnl.exe!KeSynchronizeExecution+0x25d3
    ntoskrnl.exe!KiCheckForKernelApcDelivery+0x8
    ntoskrnl.exe!ObfReferenceObject+0x79
    ntoskrnl.exe!ObReferenceObjectByHandle+0x288
    ntoskrnl.exe!ObReferenceObjectByHandle+0x2e
    ntoskrnl.exe!NtDeviceIoControlFile+0x1bf
    ntoskrnl.exe!NtDeviceIoControlFile+0x56
    ntoskrnl.exe!setjmpex+0x3a73
    ntdll.dll!ZwDeviceIoControlFile+0x14
    mswsock.dll!Tcpip6_WSHGetSocketInformation+0x11d9
    WS2_32.dll!select+0x1d3
    Unity.exe+0xe56ce8
    Unity.exe+0x146c09d
    Unity.exe+0x11af24c
    KERNEL32.DLL!BaseThreadInitThunk+0x14
    ntdll.dll!RtlUserThreadStart+0x21

    So I'd guess it's busy-looping select()ing on some sockets. Unsurprisingly there's one between app and Editor.
    So I'd hazard that the Editor is expecting more data?

    Sadly that's near the depth I can pursue this too. I can duplicate any of this into a bug report if that'd be helpful.

    On a positive note, it's been fascinating getting to know more about IL2CPP -- especially for my actual eventual need (of trying to get this MemoryProfiler working!)

    Background (skippable):
    I’m *actually* trying to track down causes of the STW (stop the world) pauses in a 4 year old, not-yet-published game using many 3’rd party libraries. At least one of those libraries is highly multi-threaded. The allocations are largely not UnityEngine.Object subclasses (i.e. they’re POCOs -- is Plain Old C# Objects a thing?! I'm an old Java engineer).

    I state all this in case you say "don't use X, use Y instead!". (you never know!)

    Obviously I originally thought the default Profiler might help but all it shows is one section labelled "managed heap" (IIRC) that is *really big*! I remembered the blog post about the new Memory Profiler *and* that it only works against IL2CPP builds. My game is really unlikely to run on my phone but I also remembered 5.6 saying it can build IL2CPP for desktop Windows Store apps! So I figured: update game from 5.4.0 to 5.6.0 (done -- working nicely and looking forward to using all the new and shiny ;-) ). Then before trying building for UWP (not yet done), I'd prove the new Profiler with a simple test app. ...aaaand then the 'fun' began.

    So there you have it.

    Lastly, I'd quickly like to mention that I *love* that you've made so much of this open-source! It's a real pleasure to actually be able to look into this and it fills me with ideas for changes to the MemoryProfiler and GC introspection (e.g. Is that juicy-looking Profiler.cpp stuff the API the Editor's Profiler works against? Can BoehmGC be modified to record and report GC'd object counts since given time so we can see what's causing janks? (e.g. 2000 Shot instances taking 256 bytes each and 1 float array taking 2MB implies pool Shots, etc.))

    Thanks for reading and in advance for any ideas!
    Rupert.
     
  2. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    Hi!

    I'll address some of the points I know about.

    It is expected to work. Anything else is a bug.

    Sorry, this is not supported.

    How large is your managed heap size?

    I don't think process explorer is a right tool to investigate this. Windows Performance Analyzer would work much better: http://files.unity3d.com/zilys/ETWPerfGuide/data/CPUUsageSampledProvider.html

    I made example script in 5.6.0b10:

    Code (csharp):
    1. public class NewBehaviourScript : MonoBehaviour
    2. {
    3.    static int[] integerThings = new int[24 * 1024 * 1024];
    4.    static double[] floatingThings = new double[24 * 1024 * 1024];
    5.    static bool[] boolThings = new bool[24 * 1024 * 1024];
    6.  
    7.    void Start()
    8.    {
    9.        Debug.Log(integerThings.Length);
    10.    }
    11. }
    And indeed, I saw a similar "editor stuck eating CPU cycles" thing. I profiled a bit and realized that it was actually crawling the heap, taking it's precious time. It took 2 minutes on my machine to finish crawling the project with only this one script in it. I pushed an optimization commit here, which brings it down to 5 seconds. Perhaps you're facing the same issue? I'd give that commit a shot.

    Actually we added IL2CPP for Windows Store in 5.3 ;).
     
  3. Arkade

    Arkade

    Joined:
    Oct 11, 2012
    Posts:
    655
    Hello again, @Tautvydas-Zilys ! (and thanks again for all your help!)

    Brilliant news that you made it faster. Merged in you changes. Sadly still not working for me.

    Yes you're absolutely right Process Explorer is not the best tool for this job -- it came easily to hand. Great tip about the Windows Performance Analyzer / Event Tracing. I'll definitely give that a whirl -- sadly not tonight -- maybe tomorrow evening? I'll also try moving to beta 10 -- who knows!

    Extra question: What impact might .Net "Api Compatibility Level" on the built app make? I just remembered I enabled ".NET 4.6" as part of some "this'll make something or other work [better]" instruction (sorry for vagueness :-D ). I'll also try disabling that tomorrow in case relevant.

    Lastly the simple test-case is what's failing atm -- it's tiny (0.8-1.3 MB -- it's just UGUI for allocating stuff but it fails without doing any new allocations). For the actual program, the old profiler shows ManagedHeap.UsedSize as 0.75 GB. Yeah, wonder why I need to do some investigating? ;-)

    Thanks again, Rupert.
     
  4. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    API compatibility level basically lets you set which .NET version you target. With IL2CPP for Windows Store in Unity 5.6 you have three options:

    * .NET 2.0 - this matches the 2.0 profile Unity uses for all other platforms and the editor
    * .NET 2.0 Subset - this is a profile that's a little bit smaller than the former one, and was made so people who didn't need all features could get their game smaller on mobile platforms. It's available everywhere too.
    * .NET 4.6 - this matches the .NET API surface as it was if you were to create a new C# desktop application from Visual Studio targeting .NET 4.6. In Unity 5.6, this option is only available to use for Windows Store and Xbox One when using IL2CPP. No other platforms support it yet.

    So it mostly affects which .NET APIs are available for you to use.

    As for it failing - give beta 10 a shot. I tested it pretty thoroughly yesterday and did not encounter major problems (apart from it being slow). If that still doesn't work, we can investigate further.
     
  5. Arkade

    Arkade

    Joined:
    Oct 11, 2012
    Posts:
    655
    *Good* evening,
    Beta 10 works! (insert choice wasted time swears here ;-) ... followed by a generally relieved and happy sigh.)

    Well, actually only partially :-(

    I'm perfectly happy to supply the whole testcase project but simplistically, here's the allocation:
    Code (CSharp):
    1.         heightmap = new float[w, h]; // it's a private member of a MonoBehaviour btw
    2.         for (int y = 0; y < h; y++) {
    3.             for (int x = 0; x < w; x++) {
    4.                 heightmap[x, y] = UnityEngine.Random.Range(-1f, 1f);
    5.             }
    6.         }
    (I'm doing lots of voxel work so large arrays (e.g. heightmap data) isn't unusual for me.)
    • No allocation worked fine.
    • 1000x1000 = 3.8 MB : fine (though I haven't found the array in the image yet)
    • 10000x10000 = 380 MB : crashed the app (not the Editor) with errors below.
    • 10000x1000 = 39 MB : same symptom as beta 9 (it's still running though so time for Windows Performance Analyzer next)
    Errors from 380 MB:

    RK-DEBUG: CaptureManagedHeap() Starting
    RK-DEBUG: CaptureManagedHeap() Captured. Sending.
    DynamicHeapAllocator allocation probe 1 failed - Could not get memory for large allocation 268435456.
    DynamicHeapAllocator allocation probe 2 failed - Could not get memory for large allocation 268435456.
    DynamicHeapAllocator allocation probe 3 failed - Could not get memory for large allocation 268435456.
    DynamicHeapAllocator allocation probe 4 failed - Could not get memory for large allocation 268435456.
    DynamicHeapAllocator out of memory - Could not get memory for large allocation 268435456!

    Those first two are printf() I added at the top and bottom of CaptureManagedHeap() respectively. (Obvious implication is that the problem is after that method exited.) Nothing came through on the Unity side (including MemoryProfilerWindow.IncomingSnapshot() not getting called).

    I'll write again once I have more info. Thanks again!
     
  6. Arkade

    Arkade

    Joined:
    Oct 11, 2012
    Posts:
    655
    Update: I'd say Windows Performance analysis suggests same thing Process Explorer did -- specifically that Unity is busy-looping select() waiting for data. (see screenshot showing beta 10 doing the 39 MB reproduction of the polling behaviour.)

    In case it's not obvious, Unity is the blackish line that rises around 29 seconds and runs to the end.
    I hit the "Take snapshot" button around the 29 second mark.
    The purple line with 7, 10, 19 and 29 seconds is the UWP app.
    Btw that extra spike at about 33 seconds is a different colour! (it's some Asus thing, not the UWP app.)

    I can upload the .etl file privately if you'd like (but it's 902 MB). If there're other things I can do to help get this resolved, let me know (though I'm only really able to do things in the evenings on week days).

    Separately, hopefully you can reproduce the 380 MB problem without difficulty. (I'm using a 32 GB machine so I'm pretty confident those 256 MB allocation failures are *not* due to my machine running out of memory.)

    Thanks, Rupert.
     

    Attached Files:

  7. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    Can you send me that .etl file in a PM?

    As for those failed allocations - did you by any chance run 32-bit player? It seems that it couldn't allocate enough memory for capturing the heap (which requires as much memory as the heap size itself), and that usually happens in 32-bit processes.
     
  8. Arkade

    Arkade

    Joined:
    Oct 11, 2012
    Posts:
    655
    Sure, I'll send the .etl when I'm at the PC (perhaps 1pm GMT; else this evening).

    Regarding 32 bit, yes it would make sense.
    I'm 95% sure it's built for 64 bit but I'll double check when I'm at the PC.
     
  9. Arkade

    Arkade

    Joined:
    Oct 11, 2012
    Posts:
    655
    doh! Yup 32 bit binary (when I switched to beta 10)!
    Retried. No crash. Glad/hope you didn't waste any time on that red herring. (and sorry for my being dumb there!)
    That said, I'm getting the same behaviour now for 380 MB that I did for 38 MB. I can't leave it for long -- lunch is over -- but a cursory check shows that same stack chewing the CPU.

    I've PM'd the .etl file to you so hopefully that'll give some clues.

    Thanks, Rupert.
     
  10. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    Okay, so it seems there are three separate problems:

    First of all, I looked at the trace and found something very strange: for some reason, sockets functions are super slow on your machine. It takes 20 times as long to send the snapshot than to capture it. On my machine, they both take around the same amount of time. Furthermore, every time unity sends data to the editor, this process seems to call into networking drivers: "AsusGameFirstService.exe". Do you know what it is and what's it's supposed to be doing? It's almost as if it's throttling the connection between the editor and the player. Now I'm not saying it's at fault for causing the connectivity issues, but it does look suspicious. Could you try turning it off and seeing if connection issues get resolved?

    Secondly, there is a known bug in the editor that causes it to continuously poll sockets burning CPU on a secondary thread. I just found this out - people working on connection stuff told me they are aware of it and were planning to fix it soon when I showed them the trace. It shouldn't cause any other issues other than increased CPU load, though, as it's happening on an independent thread.

    Thirdly, there seems to be a bug in IL2CPP which makes it not send back type data for multidimensional arrays. That means that even though it sends back memory with the array, the editor cannot display it as it fails to identify that it is an object. That means multidimensional arrays don't appear in the memory profiler graph. This is why you probably couldn't see them. Fortunately, the bug is in public part of il2cpp source code, so you could patch it up locally if you wish to work around it.

    in libil2cpp\metadata\ArrayMetadata.cpp, line 555. Change from this:

    Code (csharp):
    1. s_ArrayClassMap.insert(std::make_pair(ArrayClassMap::key_type(std::make_pair(klass->element_class, arrayClass->rank)), klass));
    to this:

    Code (csharp):
    1. s_ArrayClassMap.insert(std::make_pair(ArrayClassMap::key_type(std::make_pair(klass->element_class, klass->rank)), klass));
    After doing that change, the 380 MB array appears just fine:

    upload_2017-3-8_18-24-40.png

    I'll be fixing the bug in IL2CPP runtime - it will get into future Unity builds.
     
    Arkade likes this.
  11. Arkade

    Arkade

    Joined:
    Oct 11, 2012
    Posts:
    655
    YES! It works!

    I would leave it at that but I also want to say I am really impressed by your mad investigative skills. (I could do these sorts of things in Solaris -- I was a UNIX guru -- but hadn't realised it was possible in Windows! I'll have to investigate this tool more!)

    Thanks again. I've got a few more steps before I can try my *real* game in the profiler but I feel it's at least problem now... albeit probably taking a few years ;-)

    Related: regarding pull-request you merged to improve speed, would something like Protocol Buffers be a better (faster) protocol than JSON?
     
    Tautvydas-Zilys likes this.
  12. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,674
    Perhaps? It's not used for sending data - just for saving it to disk.
     
  13. Arkade

    Arkade

    Joined:
    Oct 11, 2012
    Posts:
    655
    Ha yeah I realised that as I was lying in bed after sending, doh :)

    JSON's probably miles better at that point for people to do their own processing/diffs (if they don't feel like improving the open-source in-editor facilities for some reason!?)