DropConnectionRequest service

Discussion in 'Multiplayer' started by KelsoMRK, Aug 8, 2017.

  1. KelsoMRK

    Joined: Jul 18, 2010 | Posts: 5,539

    Under what circumstances does the engine make a call to this service?

    I'm trying to debug a scenario where clients consistently disconnect while in our loading screen. While in this screen, players send Commands to the host, which turns around and sends ClientRpcs to everyone; these inform each client about the others' loading progress and fill up a UI bar for each player.

    What I found initially was that a NullReferenceException was being thrown but never reported in the console. Basically, when this happened, the ClientRpc didn't finish correctly and then (I'm guessing based on console output) the engine called the DropConnectionRequest service to leave the match. Ultimately, both the client and the host report that the client has timed out (which I also don't think is an accurate message - it seems that the client disconnects and then simply waits for the socket to time out).
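
    For readers following along, here is a rough sketch of how an exception inside a message handler can vanish when the dispatcher catches it silently, and how a logging wrapper surfaces it. It is plain Python, not Unity code, and `dispatch`, `logged`, and `rpc_update_map_progress` are made-up names; it only models the symptom described above and makes no claim about what the engine does internally.

```python
import logging
import traceback

captured = []  # collects tracebacks so they can be inspected later


def dispatch(handler, *args):
    """Hypothetical dispatcher that swallows handler errors silently."""
    try:
        handler(*args)
        return True
    except Exception:
        return False  # the caller only learns that the handler did not finish


def logged(handler):
    """Wrap a handler so any exception is recorded before being re-raised."""
    def wrapper(*args):
        try:
            return handler(*args)
        except Exception:
            captured.append(traceback.format_exc())
            logging.exception("handler %s failed", handler.__name__)
            raise
    return wrapper


def rpc_update_map_progress(bars, player_id, progress):
    # Raises AttributeError when no UI bar exists for this player (lookup gives None).
    bars.get(player_id).set_fill(progress)


# No bar is registered for player 2, so the handler fails in both cases.
ok_silent = dispatch(rpc_update_map_progress, {}, 2, 0.5)
ok_logged = dispatch(logged(rpc_update_map_progress), {}, 2, 0.5)
```

    In the silent case the only visible effect is a failed call; with the wrapper, the traceback is available in `captured` and in the log output.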

    Fixing that null ref exception fixed the issue when I run two instances of the game on the same machine; however, the issue is still present with two instances running on two different machines. Increasing the disconnect timeout just makes it take longer for each instance to notice that a client has disconnected. Running those commands through a different channel with a different QoS just makes the problem worse (in some cases I actually managed to configure it to disconnect just by sitting idle in our pregame lobby for a few seconds).

    Really feels like a black box here. I get no reported errors and it works locally (also works if the host plays a game alone).
     
  2. KelsoMRK

    Joined: Jul 18, 2010 | Posts: 5,539

    Seriously!? Why does this time out? Can anyone explain to me what is happening in this scenario? The host always works. Two clients on the same machine always work. Two clients on different machines never work, and the non-host never receives any network messages.

    I thought maybe loading lots of objects was slowing the frame rate down to the point that messages weren't propagating, but no frame ever takes more than 112ms and the timeout is set to 500ms.

    This used to work on 5.3, but since 5.6 and now 2017.1 it doesn't. Something changed and I certainly can't tell what, because the engine isn't giving me any information about what's happening. I just see network messages stopping, then a call to the DropConnectionRequest service, and then a timeout; meanwhile the host continues on his merry way and wins the match before it even starts because his competition DC'd.

    I pay for a license to an engine so that, at the very least, I don't have to debug ridiculous networking issues. I'm sure one of you can point me in the right direction @aabramychev @JeremyUnity @superpig @willgoldstone

    Attachments: network.png, console.png
     
  3. TwoTen

    Joined: May 25, 2016 | Posts: 1,168

    Try to run the same scenario without the Relay server. It might be a bandwidth issue. If that doesn't work, and if you are using the NetworkManager, set the Log Level to Developer to get much more insight into what's going on. Hope you get it solved.
     
  4. KelsoMRK

    Joined: Jul 18, 2010 | Posts: 5,539

    Increasing the log level is useless. All I see is a DisconnectEvent with an error code of 6. Means nothing to me.

    And like I said - works solo and if you play with two instances on the same machine. If I understand correctly, if it were a bandwidth issue then the entire match would disconnect, which doesn't happen. And I can see the host processing some of the Rpcs that come from the client. The client gets nothing and disconnects. The host continues on normally. And I can't imagine it would be a bandwidth issue anyway - I'm literally sending a single float value. And decreasing the number of Commands that are called does nothing.
     
  5. TwoTen

    Joined: May 25, 2016 | Posts: 1,168

    Well, try to move it off the relay? It won't solve your problem, but it might get you closer to understanding what's going on.
     
  6. KelsoMRK

    Joined: Jul 18, 2010 | Posts: 5,539

    Yeah, I really don't see how it would be that. Look at the profiler. There are 6 outbound calls to CmdUpdateMapProgress, which is the client telling the host to update his loading bar. Each one sends a single float argument. That's 24 bytes of data in total (plus whatever headers etc). You'll also notice that there are no inbound calls to RpcUpdateMapProgress, which means that none of those 6 commands (or any from the host) are getting to the client. This is literally a straight pass-through - the only thing the command does is call the RPC in order to relay the information to all clients. We do it all over the place and it has always worked.
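
    A quick sanity check of that arithmetic, as a sketch in plain Python. It counts argument payload only; the per-message header overhead is not known from this thread and is deliberately left out.

```python
import struct

FLOAT_SIZE = struct.calcsize("<f")  # size of one 32-bit float, in bytes
CALLS = 6                            # outbound CmdUpdateMapProgress calls seen in the profiler

# Total argument payload across all calls (headers excluded).
total_payload = CALLS * FLOAT_SIZE

# Packing a single progress value gives the same per-call payload size.
packed = struct.pack("<f", 0.75)
```

    Six calls at four bytes each gives the 24 bytes quoted above, which is tiny by any bandwidth measure.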

    And if you look in the log you'll see an explicit call to the DropConnectionRequest service. The client is telling the server that it wants to disconnect. And, according to the log, that call is coming from inside NetworkIdentity.

    Frankly, I'm not going to take the time to do the necessary infrastructure changes to work around the relay server when it's not a viable fix to the problem anyway.
     
  7. KelsoMRK

    Joined: Jul 18, 2010 | Posts: 5,539

    Can anyone from Unity explain how - if at all - frame latency affects perceived packet loss? If I'm instantiating a bunch of objects across multiple frames in a coroutine, and that increases the processing time of each frame, does that increase the chance that Reliable packets won't get confirmation from the host in a timely enough manner? Also - does starting such a coroutine from inside an Rpc call have any effect on it?
     
  8. KelsoMRK

    Joined: Jul 18, 2010 | Posts: 5,539

    Further testing today revealed that this can be recreated with two instances running on the same machine if that machine is connected to public WiFi where presumably the signal is weaker.
     
  9. KelsoMRK

    Joined: Jul 18, 2010 | Posts: 5,539

    So FWIW (and for anyone who finds this thread later) we moved to Photon Thunder and a difference in functionality between the two libraries may have revealed what the issue was.

    When the host launches the game we stop the match maker. We do this so that the match doesn't show up in the lobby and make other players think that they can join it while it's in progress. Somewhere around 5.6 it appears that this functionality was changed so that stopping the match maker disconnects all the clients. Photon actually throws an exception because stopping the match maker does more to un-initialize the transport layer.
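
    To spell out the distinction for later readers, here is a toy model in plain Python. It is not Unity or Photon code, and the `Match` class and its methods are invented purely for illustration. It contrasts merely hiding a match from the lobby with stopping the match maker when stopping also drops every client, which is the behaviour described above.

```python
class Match:
    """Invented stand-in for a hosted match; not a real Unity or Photon type."""

    def __init__(self, clients):
        self.listed = True            # visible in the lobby
        self.clients = set(clients)   # currently connected clients

    def stop_matchmaker(self):
        # Models the behaviour described above: the match leaves the lobby
        # AND every connected client is dropped.
        self.listed = False
        self.clients.clear()

    def unlist(self):
        # Models the intended effect: hide the match, keep the connections.
        self.listed = False


def lobby(matches):
    """Return only the matches that other players could still try to join."""
    return [m for m in matches if m.listed]


stopped = Match({"client1"})
hidden = Match({"client1"})
stopped.stop_matchmaker()
hidden.unlist()
```

    Both matches disappear from `lobby(...)`, but only `hidden` still has its client connected.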