OmmProvider.submit seems deadlocked with more than 2000 RICs

Jianping Newcomer
edited August 20 in EMA

The server in question registers 2000 RICs and updates all of them at an interval of 0.5 s. The 2000 RefreshMsg sent on registration and the 4000 UpdateMsg per second are queued and then passed to OmmProvider.submit from a single thread. It all works fine.
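
The queue-and-submit arrangement described above can be pictured roughly as follows. This is only a sketch under assumptions, since the original provider code was not posted: `SingleThreadSubmitter`, `enqueue` and the `submit` callback are hypothetical names, not EMA API. In a real provider the callback would wrap the call to OmmProvider.submit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper (not part of EMA): any thread may enqueue messages,
// but only one worker thread ever invokes the submit callback.
class SingleThreadSubmitter<M> implements AutoCloseable {
    private final BlockingQueue<M> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean running = true;

    SingleThreadSubmitter(Consumer<M> submit) {
        worker = new Thread(() -> {
            try {
                // Keep draining until close() has been called and the queue is empty.
                while (running || !queue.isEmpty()) {
                    M msg = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (msg != null) {
                        submit.accept(msg); // assumption: wraps OmmProvider.submit in a real app
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "submit-worker");
        worker.start();
    }

    // Called by producers; never blocks because the queue is unbounded.
    void enqueue(M msg) {
        queue.add(msg);
    }

    // Stops accepting the "running" state and waits for the worker to drain the queue.
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the queue in this sketch is unbounded, a slow submit call shows up as a growing backlog rather than as back-pressure on the producers, which is worth keeping in mind when the RIC count is raised.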

However, when the number of RICs is raised to 5000, it hangs. The thread dump looks like this:

image.png

Any help would be greatly appreciated.

Jianping

Best Answer

  • Gurpreet admin
    Answer ✓

    Hello Jianping,

    This seems to be a case of a resource crunch, most likely on the network. I recommend trying the following to identify and remedy the problem:

    1. Try using an infrastructure tool such as TestClient to subscribe to your list of instruments. You can specify the snapshot option on the command line.
    2. Enable OMM logging in the SDK.
    3. Try to pace out the subscriptions.

    See this previous discussion for more information on how to get the infrastructure tool and how to use it.

Answers

  • Hello @Jianping

    I would recommend that you enable OMM logging for the EMA SDK and upload the log file when this deadlock happens. There might be a status message that your app has missed. If the application is connecting to your local RTMDS, check the ADS logs for server-side messages as well.

    You can also try to pace out the subscriptions in batches of 1000 each. The batch size is physically limited by the array size in the SDK, as described in this previous discussion.
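
As a rough illustration of the batching suggestion above, the RIC list can be split into chunks of at most 1000 before each chunk is sent as one batch request. The class and method names below (`RicBatches`, `partition`) are hypothetical and only cover the splitting step. Building and sending the actual batch request through EMA is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of EMA): splits a RIC list into batches.
class RicBatches {
    // Returns consecutive sublists of at most batchSize RICs, in the original order.
    static List<List<String>> partition(List<String> rics, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < rics.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rics.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(rics.subList(start, end)));
        }
        return batches;
    }
}
```

With 5000 RICs and a batch size of 1000 this yields five batches, each of which would go out as a single request instead of 1000 individual ones.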

  • Jirapongse ✭✭✭✭✭

    @Jianping

    Deadlock situations typically involve two threads.

    Could you please provide the full thread dump for analysis?

  • Jianping Newcomer

    Hi Gurpreet,

    I simplified the use case: I register 5000 RICs without updating them and expect to receive 5000 RefreshMsg. To register, I loop through the 5000 RICs one by one, calling registerClient back to back without any pause.

    On the client side, I receive only 133 RefreshMsg. The remaining 4867 (5000 - 133) come back with a StatusMsg of Timeout, followed by 5000 StatusMsg of Channel down.

    On the server side, the log records 1062 "Incoming reactor message" entries for ReqMsg, fewer than the 5000 expected. Furthermore, out of these 1062 incoming reactor messages, OmmProviderClient receives only 133 onReqMsg callbacks, to each of which it replies with a RefreshMsg. The number of RefreshMsg does match the number received by the client.

    I hope I made it clear.

    Jianping

  • Jianping Newcomer

    Hi Gurpreet,

    Yes, it looks like a case of resource crunch. I now sleep for 100 ms after every 100 RIC registrations, and all 5000 RICs are registered successfully. I also want to try batch mode.

    Many thanks

    Jianping
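
For readers who want to reproduce the workaround, the pacing described in the last post (a short sleep after every group of registrations) can be sketched as below. The poster's actual code was not shared, so `PacedRegistration`, `registerPaced` and the `register` callback are hypothetical names. In a real consumer the callback would wrap the registerClient call for one RIC.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of EMA): registers RICs in groups with a pause in between.
class PacedRegistration {
    // Calls register for each RIC and sleeps pauseMillis after every full group,
    // except after the very last RIC. Returns the number of pauses taken.
    static int registerPaced(List<String> rics, int groupSize, long pauseMillis,
                             Consumer<String> register) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        int pauses = 0;
        for (int i = 0; i < rics.size(); i++) {
            register.accept(rics.get(i)); // assumption: wraps registerClient in a real app
            boolean groupDone = (i + 1) % groupSize == 0;
            boolean moreToCome = i + 1 < rics.size();
            if (groupDone && moreToCome) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop registering early.
                    Thread.currentThread().interrupt();
                    break;
                }
                pauses++;
            }
        }
        return pauses;
    }
}
```

With 5000 RICs, a group size of 100 and a 100 ms pause, this sketch takes 49 pauses, so the pacing adds roughly 4.9 s to the total registration time.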