What do these ADS errors mean, and how do I detect them on a subscribing client?

<rmds.1.ads: Warning: Wed May 06 14:55:34 2020> Output threshold breached for ****** at position *****/****** on host ******** using application 256 on channel 97.<END>


<rmds.1.ads: Info: Wed May 06 14:57:46 2020> User ****** at position ******/******.com on host ******** using application 256 on channel 97 has been disconnected due to an overflow condition.<END>

elektron, refinitiv-realtime, trep, infrastructure, ADS

Hi @duncan_kerr

I don't believe it is currently possible in EMA to check whether the event queue is getting too big. I have asked the dev team whether this is something they can add to a future version - I will let you know.

In terms of detecting a slow consumer scenario, you could check a timestamp field in the received data. If you detect considerable latency, you can deduce that the data is being read too slowly and that events are being queued up.
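
A minimal sketch of the timestamp check described above, in plain Java. `LatencyMonitor` is a hypothetical helper, not part of EMA. It assumes you have already extracted a timestamp from the message and converted it to an `Instant`, and that the publisher's clock and your local clock are reasonably synchronised. Which field supplies the timestamp is left open. Requiring several late messages in a row is an illustrative choice, so that one delayed tick does not raise an alarm.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not an EMA class): flags sustained lag between message time and receive time. */
final class LatencyMonitor {
    private final Duration threshold;       // lag above this counts as "late"
    private final int requiredConsecutive;  // late messages in a row before we report a slow consumer
    private int consecutiveLate = 0;

    LatencyMonitor(Duration threshold, int requiredConsecutive) {
        this.threshold = threshold;
        this.requiredConsecutive = requiredConsecutive;
    }

    /**
     * Records one message and returns true once the last `requiredConsecutive`
     * messages all lagged by more than the threshold.
     */
    boolean record(Instant messageTime, Instant receivedAt) {
        Duration lag = Duration.between(messageTime, receivedAt);
        if (lag.compareTo(threshold) > 0) {
            consecutiveLate++;
        } else {
            consecutiveLate = 0; // a timely message resets the run
        }
        return consecutiveLate >= requiredConsecutive;
    }
}
```

In a callback you would call `record(timestampFromMessage, Instant.now())` and raise an alert, or start discarding stale updates, when it returns `true`.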

In terms of checking if the application is still logged into the server, you can access the Login stream as demonstrated in examples

example330__Login__Streaming and example333__Login__Streaming__DomainRepresentation

With the above examples the OmmConsumerClient receives messages for the Login stream - so when logged out you can expect to receive a Closed StatusMsg for the Login.

You can also extract the ChannelInformation from the OmmConsumerEvents as mentioned in

https://community.developers.refinitiv.com/questions/56634/need-to-understand-few-extra-details-on-disconnect.html

https://community.developers.refinitiv.com/questions/35516/ema-java-how-to-receive-connection-level-events-su.html

https://community.developers.refinitiv.com/questions/55369/how-to-determine-which-channel-is-down-and-reason.html


In addition to processing the callback faster, you can also use Horizontal Scaling - i.e. multiple OmmConsumer instances on a multi-core CPU - so that the processing workload is split across the cores, as illustrated in example410__MarketPrice__HorizontalScaling
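
As a rough illustration of splitting a watchlist for horizontal scaling, the sketch below partitions a list of RICs into one group per consumer. `RicPartitioner` is a hypothetical helper, not taken from example410 or from EMA. Hashing the RIC name is just one possible assignment strategy. Each resulting group would then be registered with its own OmmConsumer instance, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an EMA class): assigns each RIC to one of N consumer groups. */
final class RicPartitioner {

    /** Returns `consumerCount` lists; a given RIC always lands in the same list. */
    static List<List<String>> partition(List<String> rics, int consumerCount) {
        if (consumerCount <= 0) {
            throw new IllegalArgumentException("consumerCount must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < consumerCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (String ric : rics) {
            // floorMod keeps the index non-negative even for negative hash codes
            int index = Math.floorMod(ric.hashCode(), consumerCount);
            groups.get(index).add(ric);
        }
        return groups;
    }
}
```

Hash-based assignment is stable across restarts but not perfectly balanced. If a few very active instruments dominate the load, an explicit mapping may work better.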


Umer - if the event queue becomes too big & the ADS drops ticks, which status messages would you expect to get? Presumably the ADS doesn't drop the link & send a LoggedOut message on the Login stream?

If we go for the timestamp-checking method, would we have to choose which timestamps to compare on the message, or can you recommend the relevant FIDs? Does the ADS timestamp the message when it's queued/dequeued?

Hi @duncan_kerr

Please see this thread which talks about Suspect Login status after disconnection due to buffer overflow.

I can't advise on which fields to use for timestamps - you would have to look at the data from your particular set of RICs and identify some you think will fit the bill. Just to be clear, when I have seen this technique used, the client more or less discarded the events (i.e. did not process the other fields) until the timeliness of the data improved, because data that was e.g. over a minute old was worthless for their requirements. So this won't work if you need to process every single update.

The approach that has worked best for other customers is to improve throughput by using worker threads to offload the processing from the API thread, and by using Horizontal Scaling.
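
A minimal sketch of the worker-thread offloading idea, using only the Java standard library. `UpdateOffloader` and its methods are hypothetical names, not EMA classes. The callback thread hands each update to a bounded queue and returns immediately, and a single worker thread does the slow processing. Note that `depth()` reports the depth of this application-side queue only. It says nothing about EMA's internal event queue or the ADS output buffer. Dropping updates when the queue is full is an assumption that only suits applications that can tolerate lost updates.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical helper (not an EMA class): moves update processing off the API callback thread. */
final class UpdateOffloader<T> implements AutoCloseable {
    private final BlockingQueue<T> queue;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;
    private volatile boolean running = true;

    UpdateOffloader(int capacity, Consumer<T> handler) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> drain(handler), "update-worker");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Worker loop: keeps processing until closed and the queue is empty. */
    private void drain(Consumer<T> handler) {
        try {
            while (running || !queue.isEmpty()) {
                T update = queue.poll(100, TimeUnit.MILLISECONDS);
                if (update != null) {
                    handler.accept(update);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Call from the API callback; never blocks. Returns false if the update was dropped. */
    boolean submit(T update) {
        boolean accepted = queue.offer(update);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Current backlog in the application-side queue; a rising value is an early warning. */
    int depth() {
        return queue.size();
    }

    long droppedCount() {
        return dropped.get();
    }

    /** Stops accepting work logically and waits for the backlog to be processed. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Logging `depth()` and `droppedCount()` periodically gives a simple view of whether the application is keeping up, well before the server-side buffer overflows.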



Hi @duncan_kerr

A few existing posts on the same issue which should help

https://community.developers.refinitiv.com/questions/27666/elektron-emaconfigxml-consumer-buffer-tuning.html

https://community.developers.refinitiv.com/questions/14322/ema-issue-with-update-items.html

https://community.developers.refinitiv.com/questions/6590/rfa-api-connect-fail-in-murex.html

Essentially an application is not consuming the data quickly enough and the ADS can only buffer so much for each application - eventually it will disconnect the problem application.


Hi Umer - thanks for the quick answer. I understand that we need to process data faster & how to do that. What I don't understand is:

  1. how do I monitor the queuing in the API? Can I get an idea of when we are running into trouble?
  2. how do I get notified when the queue fills up & we start to lose data?

Hello @duncan_kerr

Please see the Solution section in this article on how to consume data quickly in an RFA application.


Hi Pimchaya - I understand that we need to process data faster & how to do that. What I don't understand is:

  1. how do I monitor the queuing in the API? Can I get an idea of when we are running into trouble?
  2. how do I get notified when the queue fills up & we start to lose data?

Hi @duncan_kerr

Please confirm which API and language you are using so we can provide the appropriate advice.



We are using EMA Java. I understand that we need to process the onMarketData callback faster. What I want to know is how we detect the error condition at the RIC level and at the channel level.


Thanks all - I'm working through your various suggestions as well as increasing performance. Can you confirm, though: when we have a slow consumer and the ADS runs out of buffer space, what is the sequence of events? Does the whole channel go down, or do we just get transient errors on some RICs?


Hi @duncan_kerr

The ADS places its need to service all other consumers in a timely manner above the needs of a single consumer.

I notice in your output log that there is a time lag between the threshold breach and disconnect. TBH when helping developers, I have never noticed / explored this time lag. All I am aware of and have advised on is that once the buffer overflows the app is disconnected - because that is when a developer usually notices something is wrong!

However, reading the ADS manual - it seems a bit more complicated.

ADS Installation Manual - see section 7.5.5, Overflow Handling

A quick read suggests that when the threshold is breached, the ADS will queue any further requests but continue to process updates etc. If the level drops back below the OK level, it will resume processing requests.

However, if the buffer size continues to grow and hits maxOutputBuffers then the ADS will disconnect the application and drop the connection - requiring the application to reconnect and login again.

The reality is that if a consumer is slow - unless you were just going through a short burst of volatility - the buffer will continue to increase, hit the max value and disconnect.

As well as the programmatic suggestions, sometimes experimenting with the buffer sizes can get an application through some short burst periods - but this is not a solution for a generally slow consumer - as you are just delaying the inevitable.
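
To make the sequence described above easier to follow, here is a toy model in plain Java. It is only a sketch of the behaviour as summarised in this answer, not the actual ADS implementation. `OutputBufferModel`, `okLevel`, and `highWaterMark` are hypothetical names, and only `maxOutputBuffers` comes from the manual reference above. The model treats the buffer as a simple counter with three levels.

```java
/** Toy model (not ADS code): output buffer usage with a breach threshold, an OK level and a hard maximum. */
final class OutputBufferModel {
    enum State { NORMAL, THRESHOLD_BREACHED, DISCONNECTED }

    private final int okLevel;          // hypothetical: below this, requests resume
    private final int highWaterMark;    // hypothetical: at or above this, new requests are queued
    private final int maxOutputBuffers; // at or above this, the consumer is disconnected
    private int used = 0;
    private State state = State.NORMAL;

    OutputBufferModel(int okLevel, int highWaterMark, int maxOutputBuffers) {
        if (!(okLevel < highWaterMark && highWaterMark < maxOutputBuffers)) {
            throw new IllegalArgumentException("expected okLevel < highWaterMark < maxOutputBuffers");
        }
        this.okLevel = okLevel;
        this.highWaterMark = highWaterMark;
        this.maxOutputBuffers = maxOutputBuffers;
    }

    /** One step: the server queues `produced` buffers, the client drains `consumed` buffers. */
    State tick(int produced, int consumed) {
        if (state == State.DISCONNECTED) {
            return state; // a disconnected consumer must reconnect and log in again
        }
        used = Math.max(0, used + produced - consumed);
        if (used >= maxOutputBuffers) {
            state = State.DISCONNECTED;
        } else if (used >= highWaterMark) {
            state = State.THRESHOLD_BREACHED;
        } else if (used < okLevel) {
            state = State.NORMAL;
        }
        // between okLevel and highWaterMark the previous state is kept
        return state;
    }
}
```

Feeding the model a steady `produced > consumed` always ends in `DISCONNECTED`, whatever the limits are. That is the point made above: larger buffers only help with short bursts.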
