Elektron SDK Provider fails to accept client logins with dispatch timeOut = NoWaitEnum

To serve market data across my company network, we start an EMA Interactive OmmProvider with OperationalModel set to UserDispatchEnum. When OmmProvider::dispatch() is called with its default timeOut = NoWaitEnum, clients cannot log in to the server.

Specification for OmmProvider::dispatch:
https://docs-developers.refinitiv.com/1564020531520/4725/Docs/refman/ema/a00059.html#aacc00f0651fc18fe95e5c90bfbe1e31d

How to reproduce the bug:
1. Use the 130_MarketPrice_UserDispatch example as the server:
https://github.com/Refinitiv/Elektron-SDK/blob/master/Cpp-C/Ema/Examples/Training/IProvider/100_Series/130__MarketPrice__UserDispatch/IProvider.cpp

2. Remove the timeouts from the dispatch() calls, for example change line 76 of IProvider.cpp:
while ( itemHandle == 0 ) provider.dispatch( 1000 ); // change to provider.dispatch()

3. Use the 130_MarketPrice_UserDisp example as the client:
https://github.com/Refinitiv/Elektron-SDK/tree/master/Cpp-C/Ema/Examples/Training/Consumer/100_Series/130__MarketPrice__UserDisp

Impact:
A delay of almost 1 millisecond between the time market data is received by Ark I.G. and the time it is processed by Ark I.G. algorithms.

Questions: Why is provider dispatch not working, and how can it be fixed?


The client is not a TRDC named user. The client has submitted this issue on GitHub: https://github.com/Refinitiv/Elektron-SDK/issues/110.

Hello @sergey_at_ark

Thank you for your participation in the forum. Is the reply below satisfactory in resolving your query? If yes, please click the 'Accept' text next to the reply. This will guide all community members who have a similar question. Otherwise please post again offering further insight into your question.

Thanks,

AHS

Hello AHS,

I've learned about the workarounds, but I cannot accept them as an answer because they don't allow an architecture with the desired qualities: no CPU time wasted on handling timeouts and no CPU time wasted on switching thread contexts.

Yesterday I received confirmation that the development team is looking into this issue and will post a reply "soon".

Thanks

Yesterday the fix was posted in:

https://github.com/Refinitiv/Elektron-SDK/commit/1fa8b75367836c21689c36b6d11b37020c7f76bf

I'm going to take the new version of the SDK for a test drive.

The latest SDK (26 Sep 2019) includes the above fix, and our C++ tests show no problems with OmmProvider::dispatch(0). Logins are now accepted as expected.

Thank you everyone.

Hi @sergey_at_ark

Thanks to you too for testing and updating this thread - much appreciated!

@sergey_at_ark

I did a quick test with the example and see the same behavior when setting the dispatch timeout to NoWait in UserDispatch mode. I also see the same result when setting ApiDispatchTimeout to 0 in ApiDispatch mode. I suspect that calling dispatch with no timeout inside a polling loop may block an EMA internal thread.

I did some additional tests with a different value, such as 10 (10 microseconds), and found that the consumer app receives the login refresh as usual. In general, the client's provider application needs to find the timeout value that achieves its desired throughput and latency.

- I'm not sure why you need to set it to NoWait in this case. What update rate are you publishing at? Can you set the dispatch timeout to a higher value, such as 10 or 100 microseconds?

- Can you elaborate on the delay issue? Does it mean that if you do not set it to NoWait, you see a delay of around 1 millisecond on the consumer side?

Hi @sergey_at_ark

I tested as per your instructions and was able to recreate the situation you described.

I will raise a ticket with our developer support team.

As a workaround, I changed the dispatch timeout to 10 microseconds and it worked fine.

while ( itemHandle == 0 ) provider.dispatch( 10 );

This means that the dispatch method waits a maximum of 10 microseconds before returning control to the application thread. If an event becomes available before the timeout, it dispatches the event and returns control immediately afterwards, i.e. it does not wait the full 10 microseconds.

I am not sure what you mean about the 1 ms delay for data. Can you please expand on this? When do you encounter the 1 millisecond delay?
Or are you concerned that using the 1000 timeout will delay your data? As mentioned above, the timeout is a maximum value: if data arrives in the dispatch queue during the timeout, it is dispatched immediately.
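
To make the "timeout is a maximum" point concrete, here is a minimal, self-contained model of a dispatch call with a timeout. It is not EMA code: EventQueue, post() and timedDispatchUs() are hypothetical names, the timeout is in microseconds as in this thread, and a timeout of 0 stands in for NoWaitEnum. It only sketches the intended semantics (return at once if an event is queued, otherwise wait at most the timeout); it says nothing about why EMA's NoWaitEnum path failed to accept logins.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical model of a dispatch queue; NOT the EMA API.
class EventQueue {
public:
    using Event = std::function<void()>;

    // Producer side: queue an event and wake a waiting dispatcher.
    void post(Event e) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push_back(std::move(e));
        }
        cv_.notify_one();
    }

    // Dispatches at most one event. If none is queued, waits at most timeoutUs
    // microseconds for one; timeoutUs == 0 models NoWaitEnum (never blocks).
    // Returns the number of events dispatched (0 or 1).
    int dispatch(long long timeoutUs) {
        assert(timeoutUs >= 0);
        Event e;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            if (events_.empty() && timeoutUs > 0) {
                // Wakes as soon as an event arrives, not after the full timeout.
                cv_.wait_for(lock, std::chrono::microseconds(timeoutUs),
                             [this] { return !events_.empty(); });
            }
            if (events_.empty()) return 0;  // timed out, or NoWait on an empty queue
            e = std::move(events_.front());
            events_.pop_front();
        }
        e();  // run the callback outside the lock
        return 1;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Event> events_;
};

// Wall-clock duration of a single dispatch() call, in microseconds.
long long timedDispatchUs(EventQueue& q, long long timeoutUs) {
    const auto start = std::chrono::steady_clock::now();
    q.dispatch(timeoutUs);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

In this model, a call with a one-second timeout returns almost immediately when an event is already queued; only an empty queue makes dispatch() wait out its timeout.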

Thank you moragodrit and Umer for investigating the bug.

Calling dispatch with no timeout inside a polling loop cannot block any other thread, because the main thread will be preempted and other threads will get to run, even on a single-CPU machine.
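
The scheduling argument can be illustrated with plain std::thread, independent of EMA: one thread spins on an atomic flag with no sleep or timeout (roughly what dispatch(0) in a tight loop does), while a second thread performs a fixed amount of work. The function name workDoneWhileSpinning is hypothetical. This sketch only shows that busy polling by itself does not starve other threads; it does not show what caused the EMA login issue.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Generic illustration, not EMA code: the calling thread busy-polls with no
// timeout while a worker thread does workUnits increments. The worker still
// completes, because the OS scheduler runs it alongside (or by preempting) the
// spinning thread. Returns the number of work units the worker completed.
int workDoneWhileSpinning(int workUnits) {
    assert(workUnits > 0);
    std::atomic<int> done{0};
    std::atomic<bool> workerFinished{false};

    std::thread worker([&] {
        for (int i = 0; i < workUnits; ++i) {
            done.fetch_add(1, std::memory_order_relaxed);
        }
        workerFinished.store(true, std::memory_order_release);
    });

    // Tight polling loop with no timeout: it only checks a flag, never sleeps.
    while (!workerFinished.load(std::memory_order_acquire)) {
    }

    worker.join();
    return done.load();
}
```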

We are after the absolute minimum achievable latency (short of bypassing EMA), and we dedicate cores to processes, so every avoidable delay should be removed. Here is some background that I hope will give you better context:

We have a hybrid application that receives data from Reuters, does very basic processing locally and publishes the data to other machines. There is no thread switching, and dispatching is done in a tight loop like this:

while (server_is_running()) { consumer.dispatch(0); provider.dispatch(1000); }

The 1000 comes from the SDK examples. The server runs all the time. When no clients are connected, all local processing is delayed by 1000 microseconds (1 millisecond), and we can see this because we can compare it with a Java-based server (our legacy app).
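
The ~1 ms figure follows from the shape of the loop. Below is a minimal arithmetic sketch under stated, hypothetical assumptions: dispatch calls themselves cost nothing, and with no clients connected provider.dispatch(T) always waits its full T microseconds. Under those assumptions consumer.dispatch(0) only runs at t = 0, T, 2T, ..., so a consumer event can wait almost T before it is processed. The function name consumerWaitUs is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the loop
//   while (server_is_running()) { consumer.dispatch(0); provider.dispatch(T); }
// Assumptions: dispatch calls take no time, and with no clients connected
// provider.dispatch(T) always blocks for the full T microseconds.
// Returns how long a consumer event arriving at arrivalUs waits until the next
// consumer.dispatch(0), which runs at t = 0, T, 2T, ...
std::int64_t consumerWaitUs(std::int64_t arrivalUs, std::int64_t providerTimeoutUs) {
    assert(arrivalUs >= 0 && providerTimeoutUs > 0);
    // Index of the first consumer dispatch at or after the arrival time (ceiling division).
    const std::int64_t k = (arrivalUs + providerTimeoutUs - 1) / providerTimeoutUs;
    return k * providerTimeoutUs - arrivalUs;
}
```

With T = 1000, an event arriving 1 µs after a consumer dispatch waits 999 µs; with the 30 µs workaround the same event waits 29 µs.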

As a temporary solution, we will reduce the timeout to 30 microseconds to be on the safe side until we know what is causing the problem.

Maybe you can suggest how to generate a dummy event for the provider so that it does not wait? (I'm thinking of a workaround that would get us to a 0 timeout.)

I am looking forward to hearing news from the development team. Umer, is there a way for me to contact the development team?

Best regards

Hi @sergey_at_ark

Thanks for the additional information. It was not clear from the original post that you had a hybrid app with an OmmProvider and an OmmConsumer in the same application.

The NoWaitEnum issue has been reported to development, and in the meantime we can help you progress with your code.

It is not a good idea to dispatch both Consumer and Provider events from the same loop, for various reasons, e.g.:

  1. There may well be more Consumer-related events to dispatch from the queue than Provider-related events, or vice versa. As you may be aware, each time dispatch is called - one event is dispatched.
  2. Since we seem to need a small timeout in order for the dispatch to work, having both provider and consumer dispatch in the same loop will cause unnecessary holdups for the consumer if the provider has nothing to dispatch (or vice versa).

In view of the above, the OmmProvider and OmmConsumer events should be dispatched independently of each other.

If you look at the Consumer example 410_MarketPrice_HorizontalScaling, you will note that it creates two OmmConsumer instances and each one dispatches its events independently of the other, as each instance uses its own thread for dispatching.

I recommend you use the same technique for your hybrid app, i.e. individual threads for the OmmProvider dispatch and for the OmmConsumer dispatch.

This will ensure that consumer and provider events are processed independently. Also, bear in mind that when an event is already queued in either queue, it is dispatched as soon as the dispatch method is called, without any timeout period.
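
A minimal sketch of the one-thread-per-dispatcher layout, written against a hypothetical queue model rather than the real OmmConsumer/OmmProvider classes (EventQueue and DispatchThread are illustrative names; timeouts are in microseconds). Because each queue is serviced by its own thread, an idle provider queue waiting out its timeout no longer holds up consumer events. Whether the extra thread pays off for a provider driven purely by consumer updates is the trade-off raised in the next reply.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical dispatch queue model; NOT the EMA API.
class EventQueue {
public:
    void post(std::function<void()> e) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push_back(std::move(e));
        }
        cv_.notify_one();
    }

    // Dispatches at most one event, waiting up to timeoutUs microseconds for it.
    int dispatch(long long timeoutUs) {
        assert(timeoutUs >= 0);
        std::function<void()> e;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait_for(lock, std::chrono::microseconds(timeoutUs),
                         [this] { return !events_.empty(); });
            if (events_.empty()) return 0;
            e = std::move(events_.front());
            events_.pop_front();
        }
        e();
        return 1;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> events_;
};

// Runs q.dispatch(timeoutUs) in a loop on its own thread until destroyed,
// in the spirit of one dispatch thread per consumer or provider instance.
class DispatchThread {
public:
    DispatchThread(EventQueue& q, long long timeoutUs)
        : thread_([this, &q, timeoutUs] {
              while (running_.load()) q.dispatch(timeoutUs);
          }) {}

    ~DispatchThread() {
        running_.store(false);  // the loop exits after its current dispatch() call
        thread_.join();
    }

    DispatchThread(const DispatchThread&) = delete;
    DispatchThread& operator=(const DispatchThread&) = delete;

private:
    std::atomic<bool> running_{true};  // declared before thread_, so initialised first
    std::thread thread_;
};
```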

It is good to know that developers are looking into the issue, thank you Umer.

Regarding the reasons for introducing threading:

1. "each time dispatch is called - one event is dispatched". Your documentation says that by default 100 events are handled in one dispatch call, and one can control that number.

2. "we seem to need a small timeout". Yes, we need it to work around the bug. Once the bug is fixed the timeout will not be needed.
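
The first point can be modelled in a few lines: a dispatch call that drains up to a configurable number of queued events (100 by default, per the documentation quoted above) rather than exactly one. BatchingQueue is a hypothetical single-threaded model, not the EMA implementation, and maxEvents is an illustrative parameter name, not an EMA configuration name.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical single-threaded model (NOT the EMA API) of a dispatch call that
// handles up to maxEvents queued events per call instead of exactly one.
class BatchingQueue {
public:
    void post(std::function<void()> e) { events_.push_back(std::move(e)); }

    // Runs at most maxEvents queued callbacks in FIFO order; returns how many ran.
    int dispatch(int maxEvents = 100) {
        assert(maxEvents > 0);
        int ran = 0;
        while (ran < maxEvents && !events_.empty()) {
            auto e = std::move(events_.front());
            events_.pop_front();
            e();
            ++ran;
        }
        return ran;
    }

    std::size_t pending() const { return events_.size(); }

private:
    std::deque<std::function<void()>> events_;
};
```

With 250 events queued, three default calls dispatch 100, 100 and 50 events respectively.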

Thank you for pointing to the HorizontalScaling example. Having multiple consumer threads would not help in our case, because the consumers would do very little processing and would be competing for the same single data feed connection.

About having the OmmProvider in a separate thread: we considered it from the beginning, but in our case the provider and consumer are not independent. The provider does nothing on its own (except for the login handshake); it is driven by consumer update events. If sending an update in the provider is asynchronous (I hope so), then having another thread would introduce extra overhead. If sending an update in the provider blocks somewhere (which would be a problem), then we need to check whether another thread is the optimal solution.

Our plan was to switch to "reliable multicast" once all other things are working as expected.

Thank you for sharing your knowledge, we appreciate it.
