Elektron Transport API Java Application Got Disconnected Randomly

Hi guys,


Our team is developing a Java application that uses the ETA API to obtain information from, and contribute information to, TREP. I have been following the online guidance to develop the consumer application, and so far there is one concerning bug. Our application occasionally loses its channel connection with TREP, and we have had to set up an automatic recovery method to keep the market information live. The disconnects happen randomly, ranging from once a week to once a day. I followed the example at the link below to program a ping handler that periodically checks the channel status. I assume the channel connection is bad if my ping call returns a code that is less than TransportReturnCodes.SUCCESS.

https://github.com/Refinitiv/Elektron-SDK/blob/master/Java/Eta/Applications/Shared/src/main/java/com/thomsonreuters/upa/shared/PingHandler.java


How can I trace the root cause of these failing ping attempts? Is there something wrong with the client application / example code, or with the TREP ADS service?


Please help, and thank you for your time.

Tags: elektron, refinitiv-realtime, elektron-sdk, rrt, java, eta-api, elektron-transport-api


@ccai

First, we need to determine which side (TREP or client application) has cut the connection.

Typically, when TREP cuts the connection, the reason for disconnection will show in the ADS log file. There are two reasons for TREP to cut the connection.

1. Buffer overflow condition. This indicates that the application is a slow consumer that is unable to keep up with the number of messages sent by ADS.

User user at position 10.42.68.175/net on host host1 using application256 on channel 257 has been disconnected due to an overflow condition.

2. Ping timeout. This indicates that the application didn't send ping messages to ADS.

RSSL disconnect from "user" at position "10.42.68.175/U8009686-TPL-A" on host "host1" using application "256" on channel 19.
Reason: Client application did not ping.

Please verify the ADS log for the reason of disconnection if TREP has cut the connection.
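
If the ADS log is large, the two reasons above can be picked out mechanically. The following is only a minimal sketch: the class name AdsDisconnectClassifier and its Reason enum are hypothetical, and the marker substrings are simply copied from the two sample messages above, so they may need adjusting for a different ADS version.

```java
import java.util.List;

/** Classifies ADS log lines by the disconnect reason they report (hypothetical helper). */
final class AdsDisconnectClassifier {

    enum Reason { OVERFLOW, PING_TIMEOUT, UNKNOWN }

    // Substrings copied from the two sample ADS log messages quoted in this thread.
    private static final String OVERFLOW_MARKER = "disconnected due to an overflow condition";
    private static final String PING_MARKER = "Client application did not ping";

    static Reason classify(String logLine) {
        if (logLine.contains(OVERFLOW_MARKER)) {
            return Reason.OVERFLOW;
        }
        if (logLine.contains(PING_MARKER)) {
            return Reason.PING_TIMEOUT;
        }
        return Reason.UNKNOWN;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "User user at position 10.42.68.175/net on host host1 using application256"
                + " on channel 257 has been disconnected due to an overflow condition.",
            "Reason: Client application did not ping.",
            "some unrelated line");
        for (String line : lines) {
            System.out.println(classify(line) + " <- " + line);
        }
    }
}
```

Lines classified as UNKNOWN are not necessarily harmless; they are just not one of the two reasons listed here.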


I have checked with our engineers, and they did not find any ADS log entries around the time of the disconnect. That's very strange. On the consumer client side, here is my client application log:

============================

PolarisMarketDataServer.log:2019-10-02 16:14:54,704 ERROR j.l.Class [Thread-3] channelInactive portno=java.nio.channels.SocketChannel[connected local=/172.18.228.106:35266 remote=njads1-1/192.168.5.207:14002]<SocketChannel.read returned -1 (end-of-stream)>.

PolarisMarketDataServer.log:2019-10-02 16:14:54,704 ERROR j.l.Class [Thread-3] Uninitializing Channel State is closed! SocketChannel.read returned -1 (end-of-stream)

PolarisMarketDataServer.log:2019-10-02 16:14:54,704 ERROR j.l.Class [Thread-3] ChannelSession Error: FAILURE in ETA Adapter receiving thread.SocketChannel.read returned -1 (end-of-stream)

PolarisMarketDataServer.log:2019-10-02 16:14:54,793 ERROR c.s.c.c.m.ETAPingHandler [SMBCScheduler_Worker-2] ETA send ping failed!socket channel is not in the active state for ping


============


Seems to me the disconnect is from the server side, not my client side. Can you please suggest how to diagnose this issue? Thank you!
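
For context on the "SocketChannel.read returned -1 (end-of-stream)" message in the log above: in Java NIO, a read returns -1 once the other end of the TCP connection has closed it in an orderly way. The sketch below reproduces that with a throwaway local server; the class and method names are made up for illustration and nothing here touches ETA or ADS. Note that -1 only says the remote end of the TCP connection closed it. That could be the ADS itself or something in between, such as a firewall or load balancer, and it does not say why.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Shows what SocketChannel.read reports once the peer has closed the connection. */
final class EndOfStreamDemo {

    /** Connects to a local throwaway server, lets it close, and returns the result of read(). */
    static int readAfterPeerClose() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port on the loopback interface.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                // The "server" accepts and closes right away, like a remote peer dropping us.
                server.accept().close();
                // Blocking read: returns once the peer's orderly close has arrived.
                return client.read(ByteBuffer.allocate(16));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read returned " + readAfterPeerClose());
    }
}
```

An abrupt reset would normally surface as an IOException instead of -1, so the log above suggests the connection was closed deliberately by whatever sits at the other end of the socket.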

My ping handler mainly does one thing, which is to run these lines of code from the example above.


/* Send a ping to the remote side of the connection. If it fails, alert the clients. */
int ret = chnl.ping(error);
if (ret < TransportReturnCodes.SUCCESS)
{
    return ret;
}
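
A ping handler of this kind usually tracks two timers: when to send the next ping, and how long it is acceptable to hear nothing from the remote side. The following is a stand-alone sketch of that timing logic only, not a copy of the linked PingHandler. The class name PingTimerSketch is hypothetical, the ETA Channel is replaced by an IntSupplier, the return codes are local stand-ins, and sending at one third of the ping timeout is an assumption of this sketch.

```java
import java.util.function.IntSupplier;

/** Stand-alone sketch of ping timing; the ETA Channel is replaced by an IntSupplier. */
final class PingTimerSketch {
    static final int SUCCESS = 0;   // stand-in for TransportReturnCodes.SUCCESS
    static final int FAILURE = -1;  // stand-in for a negative ETA return code

    private final long sendIntervalMs;
    private final long receiveTimeoutMs;
    private long nextSendMs;
    private long nextReceiveMs;
    private boolean receivedRemoteMsg;

    PingTimerSketch(int pingTimeoutSeconds, long nowMs) {
        // Assumption: send three times per timeout window so one late ping is tolerated.
        this.sendIntervalMs = pingTimeoutSeconds * 1000L / 3;
        this.receiveTimeoutMs = pingTimeoutSeconds * 1000L;
        this.nextSendMs = nowMs + sendIntervalMs;
        this.nextReceiveMs = nowMs + receiveTimeoutMs;
    }

    /** Call whenever any message or ping arrives from the remote side. */
    void onRemoteTraffic() {
        receivedRemoteMsg = true;
    }

    /** Returns SUCCESS, or a negative code if sending failed or the remote went silent. */
    int handlePings(long nowMs, IntSupplier sendPing) {
        if (nowMs >= nextSendMs) {
            int ret = sendPing.getAsInt();   // in a real handler: chnl.ping(error)
            if (ret < SUCCESS) {
                return ret;                  // local send failure, e.g. channel no longer active
            }
            nextSendMs = nowMs + sendIntervalMs;
        }
        if (nowMs >= nextReceiveMs) {
            if (!receivedRemoteMsg) {
                return FAILURE;              // nothing heard for a full timeout window
            }
            receivedRemoteMsg = false;
            nextReceiveMs = nowMs + receiveTimeoutMs;
        }
        return SUCCESS;
    }
}
```

Logging which of the two branches produced the negative code tells you whether the send itself failed or the remote side simply went quiet, which can help narrow down where the disconnect starts.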

@ccai

To see these logs, you need to set logMountRequests to True and logger*selector to *.info in the TREP configuration file.

*ads*logMountRequests: True 
*ads*logger*selector : *.info

Update from @ccai via email.

Hi,

I believe our developers have enabled logging on the server end, and I can see the overflow and timeout information in the log. However, I don’t think the error log captures the disconnects related to our application; rather, it captures other people’s mistakes. I need to go through the latest log again once it’s shared with me. So far I haven’t found the log that helpful.


Can you please suggest other ways to debug this, or possible solutions? I am happy to have a call with your team anytime to resolve this issue. This really makes us concerned about the production environment and the stability of the API.
