EMA Java - Streams terminated abruptly - Reconnect

We have two channels configured, with the following retry settings in our EmaConfig.xml:


<!-- Specifies the maximum number of times the consumer and non-interactive provider attempt -->
<!-- to reconnect to a channel when it fails. If set to -1, the consumer and non-interactive provider continually -->
<!-- attempt to reconnect -->
<ReconnectAttemptLimit value="-1"/>
<!-- Sets the maximum amount of time the consumer and non-interactive provider wait (in milliseconds) -->
<!-- before attempting to reconnect a failed channel. Refer also to the ReconnectMinDelay parameter -->
<ReconnectMaxDelay value="10000"/>
<!-- Specifies the minimum amount of time the consumer and non-interactive provider wait (in milliseconds) -->
<!-- before attempting to reconnect a failed channel. -->
<!-- This wait time increases with each connection attempt, from ReconnectMinDelay to ReconnectMaxDelay -->
<ReconnectMinDelay value="5000"/>
<!-- Specifies the amount of time (in milliseconds) the provider has to respond with a login refresh message -->
<!-- before OmmConsumer throws an exception. If set to 0, EMA will wait for a response indefinitely -->
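
To make the effect of these two delay settings concrete, here is a minimal sketch of how a wait time could grow from ReconnectMinDelay to ReconnectMaxDelay across attempts. The doubling rule and the names `ReconnectDelaySketch` and `delays` are assumptions made for illustration only. The config comments say only that the wait "increases with each connection attempt", so the growth rule EMA actually uses may differ.

```java
import java.util.ArrayList;
import java.util.List;

class ReconnectDelaySketch {
    // Hypothetical helper, not part of EMA: returns the wait (in ms) before each attempt.
    // Assumption: the delay doubles after every attempt and is capped at maxDelayMs.
    static List<Long> delays(long minDelayMs, long maxDelayMs, int attempts) {
        List<Long> result = new ArrayList<>();
        long delay = minDelayMs;
        for (int i = 0; i < attempts; i++) {
            result.add(delay);
            delay = Math.min(maxDelayMs, delay * 2);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values taken from the config above: min 5000 ms, max 10000 ms.
        System.out.println(delays(5000, 10000, 4)); // prints [5000, 10000, 10000, 10000]
    }
}
```

Under this assumed rule, the configured values reach the 10-second cap on the second attempt, so with ReconnectAttemptLimit set to -1 the retries would continue at roughly 10-second intervals.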

The subscription threads failed to retry after receiving "SocketChannel.read returned -1 (end-of-stream)". The complete reconnect logs are attached.


2020-12-09 07:31:42,021 [WARN ] pool-4-thread-1 com.thomsonreuters.ema.access.OmmConsumerImpl - loggerMsg
    ClientName: ChannelCallbackClient
    Severity: Warning
    Text:    Received ChannelDownReconnecting event on channel Channel_1
        RsslReactor @29841d8f
        RsslChannel @5631984c
        Error Id 0
        Internal sysError 0
        Error Location null
        Error text SocketChannel.read returned -1 (end-of-stream)
loggerMsgEnd

thread-4-reconnect.txt


Hi @nnagarajan4

As explained previously, the most likely reason some of your consumer connections recover and some don't is that 'the DACS server is facing intermittent issues: sometimes the ADS is able to connect to it and other times it is not, e.g. the DACS server is overloaded and not able to respond to further connections in a timely manner. I am not an ADS / DACS expert so cannot explain why this may be happening. As mentioned, your market data team needs to investigate further.'

Please explain what you mean by 'average of 25 RICS per subscription with 5000 RICS approx'. How many OmmConsumer instances do you have, and how many RICs on average do you register with each OmmConsumer instance? Each OmmConsumer instance should easily be able to handle several thousand RICs. Most developers I have worked with would use a single OmmConsumer instance for 5000 RICs. If required, they spread the RICs across 2-4 instances, but certainly not 25. You ask if there are any concerns about using more than 6 threads, by which I assume you mean more than 6 OmmConsumer instances. Each OmmConsumer instance introduces associated overheads which, in my experience, are unnecessary for such a small number of RICs.

As a test, I recommend you take the EMA Java example 410_MP_HorizontalScaling (found in the Java\Ema\Examples\src\main\java\com\refinitiv\ema\examples\training\consumer\series400 folder). Amend the code to request, for example, 2500 RICs per OmmConsumer instance and see if your DACS issue continues to occur. If it does, then your Market Data team will need to investigate, with the help of Refinitiv support if required. If the issue does not occur, then you can amend your code to use fewer OmmConsumer instances, as per the example.
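
As a rough illustration of spreading a watchlist across a small number of consumers, the sketch below splits a list of RICs into near-equal chunks, one chunk per OmmConsumer instance. It uses only the JDK. The class `RicPartitionSketch`, the method `partition`, and the placeholder instrument names are hypothetical. Registering each chunk with its own OmmConsumer is left out, and the 410_MP_HorizontalScaling example remains the reference for that part.

```java
import java.util.ArrayList;
import java.util.List;

class RicPartitionSketch {
    // Hypothetical helper: splits the watchlist into at most `consumers`
    // contiguous chunks whose sizes differ by at most one chunk's remainder.
    static List<List<String>> partition(List<String> rics, int consumers) {
        if (consumers < 1) {
            throw new IllegalArgumentException("consumers must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        int chunkSize = (rics.size() + consumers - 1) / consumers; // ceiling division
        for (int start = 0; start < rics.size(); start += chunkSize) {
            int end = Math.min(rics.size(), start + chunkSize);
            chunks.add(new ArrayList<>(rics.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rics = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            rics.add("RIC_" + i); // placeholder names, not real instruments
        }
        List<List<String>> chunks = partition(rics, 2);
        for (int i = 0; i < chunks.size(); i++) {
            // In a real application, each chunk would be registered with one OmmConsumer instance.
            System.out.println("consumer " + i + " -> " + chunks.get(i).size() + " items");
        }
    }
}
```

With 5000 items and 2 consumers, each chunk holds 2500 items, which matches the test size suggested above.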

I would also recommend you work through the EMA Java tutorials so that you can obtain a better understanding of how the API works.

I can't think of anything the EMA API itself can do to fix this issue - as mentioned previously, the API connects to the ADS server - not the DACS server. You appear to have a problem with your ADS<-->DACS server connectivity - you cannot ignore this - otherwise, you risk issues in a production scenario.


Hi @nnagarajan4

Looking at the attached log file, the problem does not appear to be with your ADS connection: I can see several "Received ChannelUp event on channel Channel_x" messages.

The log file goes on to report "A21: Connection to DACS down, users not allowed to connect".

This means that the ADS connection to the Data Access Control (authentication and entitlement) server is down. Therefore the ADS is unable to validate your credentials and has no choice but to drop the connection.

The ADS is the one that interacts with the DACS server - not your application. Your application only talks to the ADS.

Therefore, you need to speak to your Market Data team and/or DACS administrators to resolve the DACS issue. If they are unsure how to resolve it, they should raise a My.Refinitiv ticket for technical support on the ADS / DACS issue.


If you look at the log at 5:08, there was a ChannelUp event. Doesn't that mean the authentication with DACS was successful and the threads were able to reconnect? Why is there a "SocketChannel.read returned -1 (end-of-stream)" after a successful ChannelUp event?

Also, out of 25 subscription threads, only 10 tried to reconnect, and the remaining 15 never had any down event. How can this happen to only a subset of subscribers?


Hi @nnagarajan4

My comments were based on the file attachment above.

There are multiple ChannelUp events in the attachment (e.g. 03:56:30, 03:56:56, 03:57:21), and most of them are then followed by a status message reporting the RDM Login state 'A21: Connection to DACS down, users not allowed to connect.'.

ChannelUp indicates that you were able to establish a channel between your application and the ADS.

Once the channel is established, the next thing the API (or application) normally does is send a Login request with your username. However, as explained above, if the ADS cannot connect to the DACS, you are disconnected: "Error text Received login response with Closed/Recover stream state. Disconnecting."
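
To check this sequence against a per-thread log file, a small scanner along the following lines could tally channel-level events separately from the DACS login rejection. This is only a sketch: the class `LogEventSketch` and its methods are hypothetical, the matched substrings are the message fragments quoted in this thread, and the sample lines in `main` are illustrative rather than copied from the attachments.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LogEventSketch {
    enum Event { CHANNEL_UP, CHANNEL_DOWN_RECONNECTING, DACS_DOWN, OTHER }

    // Classifies one log line by the message fragments quoted in this thread.
    static Event classify(String line) {
        if (line.contains("Received ChannelUp event")) return Event.CHANNEL_UP;
        if (line.contains("Received ChannelDownReconnecting event")) return Event.CHANNEL_DOWN_RECONNECTING;
        if (line.contains("A21: Connection to DACS down")) return Event.DACS_DOWN;
        return Event.OTHER;
    }

    // Counts the recognised events, in order of first appearance; other lines are ignored.
    static Map<Event, Integer> count(List<String> lines) {
        Map<Event, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Event event = classify(line);
            if (event != Event.OTHER) {
                counts.merge(event, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative sample lines; in practice, read them from the log file with Files.readAllLines.
        List<String> sample = List.of(
            "Text:    Received ChannelUp event on channel Channel_1",
            "A21: Connection to DACS down, users not allowed to connect.",
            "Text:    Received ChannelDownReconnecting event on channel Channel_1");
        System.out.println(count(sample));
    }
}
```

A thread whose log shows ChannelUp followed by the A21 status and then ChannelDownReconnecting reached the ADS but was rejected at login. That is the pattern described above.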

The fact that some of your subscriber threads are working and some are not could be because the DACS server is facing intermittent issues: sometimes the ADS is able to connect to it and other times it is not, e.g. the DACS server is overloaded and not able to respond to further connections in a timely manner. I am not an ADS / DACS expert so cannot explain why this may be happening. As mentioned, your market data team needs to investigate further.

Another point of concern: why do you have 25 subscriber threads? Is each thread creating its own connection to the server, i.e. 25 mounts and all the associated overhead for each connection?

Do you need to have 25 subscriber threads? Can your implementation be achieved with fewer consumers? I have seen customers create between 3 and 6 OmmConsumer instances when dealing with tens of thousands of instrument requests, so that they spread the watchlist across the cores and threads, but I have not come across such a large number of consumers.



We have an average of 25 RICs per subscription, with approximately 5000 RICs. Are there any other concerns with respect to the API that would cause issues with more than 6 threads? We have configured the cores accordingly.


Attached is a different thread that was able to recover after the "A21: Connection to DACS down, users not allowed to connect." warning. Why would this thread be able to connect but not the thread from the earlier logs?

pool-11-thread-1-reconnect.txt


Is there anything in the EMA API that could handle the disconnect from DACS and retry with exponential backoff?
