Java RFA callback on disconnect

We are using Java RFA 8.1.0.L1 and I am wondering if there is a way to get a callback in our code when we have an unexpected disconnect from the Refinitiv host. We have a Client that gets callbacks on connection events, so we see the reconnect, but I do not see any callbacks for the disconnect that would precede the reconnect. The disconnect gets logged in the mountTrace log that the RFA library writes, but it is not reported to our code.

rfa java disconnection

Hello @daniel.lipofsky ,

In addition to registering interest for item events, you may wish to register interest in connection and error events, for example:

 // Interest specs for error events and for connection events
 OMMErrorIntSpec ommErrorIntSpec = new OMMErrorIntSpec();
 OMMConnectionIntSpec ommConnectionIntSpec = new OMMConnectionIntSpec();

 // Register this Client for each spec on the application's event queue
 Handle errHandle = _mainApp.getOMMConsumer().registerClient(_mainApp.getEventQueue(),
                ommErrorIntSpec, this, null);
 Handle connHandle = _mainApp.getOMMConsumer().registerClient(_mainApp.getEventQueue(),
                ommConnectionIntSpec, this, null);

Could you trigger a test disconnect to verify whether this is what you are looking for?

This doesn't appear to get me what I want. While I am now getting OMMConnectionEventMsg events that I never saw before, the timestamps in the log match the reconnects, not the disconnects. I am still not getting anything at all for the disconnects. The OMMConnectionEventMsg events I am getting look like:

OMM_CONNECTION_EVENT, connection name = SSLNamespace::pageReutersSSLConn, ConnectionType = RSSL, ConnectedHostName = 159.220.230.146, ConnectedHostPort = 14002, receive address = , receive port = , send address = , send port = , unicast port = , ConnectedComponentVersion = ads3.5.0.L1.linux.rrg 64-bit

Hi @daniel.lipofsky ,

It would help to know how you test this failure on your side.

The way I test this on my side is to break my connection to the infra (ADS) by forcing a disconnect (disconnecting VPN or WiFi). I then see the status reported on the disconnect.

Thinking further about this, are you connecting to your infra directly, or via LPC? If via LPC, how do you trigger the issue for testing?

We are not using LPC, this is a direct RFA connection, and I am testing in AWS so there is no way to force a disconnect with WiFi or VPN. I test by just waiting for the disconnect to happen. However, I can tell by looking at the file generated by RFA (SSLNamespace.Connections.pageReutersSSLConn.logFileName) that a disconnect did happen, and I can see from my app logs that no callback was sent to my code.

Hello @daniel.lipofsky ,

Thanks for this information. Something in this behavior in AWS is not exactly right, from what I understand.

Is there another way to force a disconnect in AWS during your test, so that we can both be sure that what you see is a disconnect, as well as fully verify the handling of it?

If what you detect is a loss of connectivity, I would expect the behavior of a custom app to be the same as that of a simple example StarterConsumer that comes with RFA Java SDK and that I use to do some quick testing. On connectivity loss:

...
ItemManager.processEvent: Received Item Event...
MESSAGE
    Msg Type: MsgType.STATUS_RESP
    Msg Model Type: MARKET_PRICE
    Indication Flags:
    Hint Flags: HAS_ATTRIB_INFO | HAS_STATE
    State: OPEN, SUSPECT, NO_RESOURCES,  "An existing connection was forcibly closed by the remote host"
    AttribInfo
        ServiceName: ELEKTRON_DD
        ServiceId: 356
        Name: MSFT.O
        NameType: 1 (RIC)
    Payload: None
ItemManager.processEvent: Received Item Event...
MESSAGE
    Msg Type: MsgType.STATUS_RESP
    Msg Model Type: MARKET_PRICE
    Indication Flags:
    Hint Flags: HAS_ATTRIB_INFO | HAS_STATE
    State: OPEN, SUSPECT, NONE,  "Waiting for service ELEKTRON_DD UP. Item recovery in progress..."
    AttribInfo
        ServiceName: ELEKTRON_DD
        Name: MSFT.O
        NameType: 1 (RIC)
    Payload: None
LoginClient: Receive an OMM_CONNECTION_EVENT
Name: myNS::connection28
Status: { state: DOWN, code: NONE, text: "An existing connection was forcibly closed by the remote host"}
...

Note the status callbacks triggered on connection loss, both on the open item stream and on the open login stream.
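
Since the output above shows two independent signals on connection loss (a connection event with state DOWN, and item status messages with data state SUSPECT), an application could fold both into one "link state" and act only on transitions. The following is a minimal sketch under stated assumptions: LinkStateTracker and its methods are hypothetical names, not part of RFA, and the state values are passed in as plain strings that the application would extract from the RFA events in its own callback. Treating SUSPECT as a sign of link loss is also an assumption, since items can become suspect for other reasons, such as a service going down.

```java
import java.util.function.Consumer;

// Hypothetical helper, not an RFA class: remembers the last known link
// state and notifies a listener only when that state changes.
public class LinkStateTracker {
    public enum LinkState { UNKNOWN, UP, DOWN }

    private LinkState current = LinkState.UNKNOWN;
    private final Consumer<LinkState> onChange;

    public LinkStateTracker(Consumer<LinkState> onChange) {
        this.onChange = onChange;
    }

    // Call from the connection-event branch of the callback,
    // passing the reported state text ("UP" or "DOWN").
    public synchronized void onConnectionState(String state) {
        if ("DOWN".equalsIgnoreCase(state)) {
            update(LinkState.DOWN);
        } else if ("UP".equalsIgnoreCase(state)) {
            update(LinkState.UP);
        }
        // Any other text is ignored and the current state is kept.
    }

    // Call from the item-status branch of the callback.
    // Assumption: a SUSPECT data state is treated as link loss.
    public synchronized void onItemDataState(String dataState) {
        if ("SUSPECT".equalsIgnoreCase(dataState)) {
            update(LinkState.DOWN);
        }
    }

    // Fires the listener only on an actual transition.
    private void update(LinkState next) {
        if (next != current) {
            current = next;
            onChange.accept(next);
        }
    }
}
```

With this arrangement, a disconnect that only surfaces as item status messages would still produce a single DOWN notification, and a later connection event reporting UP would produce a single UP notification.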

I would try to verify the behavior with StarterConsumer running next to your custom app in the same AWS instance. For a quick sanity check like this, you would need to be able to force a disconnect.

The other approach is to sort out failure conditions by testing your application on a testbed where you can simulate connectivity loss. If the loss is not detected, test with StarterConsumer, make sure the loss is detected and move to the custom app.

Additionally, it may be helpful to enable RFA tracing to learn more. See the previous discussion thread (answer by Pimchaya) on how to enable RFA tracing.

Hope this information helps.

I don't know any way to force disconnects in the AWS environment, and I only get disconnects with a large watchlist. Turning on full tracing gives a rather huge file. I've done it and I have a 54 GB file to show for it, but really no idea what to do with that file. In any case, the goal is to be able to detect the disconnect within the app, so an external log file is not useful.

But I do have a minimal log (attached: trace.txt) that shows the disconnects. You can see "Connection reset by peer" on lines 49 and 97 of this file, e.g.

<record>
  <date>2021-10-27T19:34:07.235645Z</date>
  <millis>1635363247235</millis>
  <nanos>645000</nanos>
  <sequence>2</sequence>
  <logger>com.reuters.rsslc.0</logger>
  <level>FINER</level>
  <class>com.reuters.ipc.TraceLogger</class>
  <method>traceData</method>
  <thread>27</thread>
  <message>
Thread: SSLNamespace::pageReutersSSLSession Session EventQueueGroup
Connection 0
RSSL Connection failed for 159.220.230.146: Connection reset by peer
</message>
</record>

trace.txt (3.7 KiB)
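
For a trace of that size, one option is to stream through the file and keep only the timestamps of records that mention a connection failure. The following is a minimal sketch, assuming the trace uses the XML record layout shown above (one tag per line, with a date element inside each record) and that failures contain the text "Connection failed". TraceScan and failureTimes are illustrative names, not part of RFA.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Illustrative helper: scans trace lines one at a time, so the whole
// file never has to fit in memory.
public class TraceScan {
    // Returns the <date> value of every <record> whose text contains marker.
    public static List<String> failureTimes(Stream<String> lines, String marker) {
        List<String> hits = new ArrayList<>();
        String date = null;       // date of the record being read
        boolean matched = false;  // whether the marker was seen in this record
        for (Iterator<String> it = lines.iterator(); it.hasNext();) {
            String t = it.next().trim();
            if (t.startsWith("<record>")) {
                date = null;
                matched = false;
            } else if (t.startsWith("<date>") && t.endsWith("</date>")) {
                date = t.substring("<date>".length(), t.length() - "</date>".length());
            } else if (t.startsWith("</record>")) {
                if (matched && date != null) {
                    hits.add(date);
                }
            }
            if (t.contains(marker)) {
                matched = true;
            }
        }
        return hits;
    }
}
```

A call such as TraceScan.failureTimes(Files.lines(Paths.get("trace.txt")), "Connection failed") reads the file lazily. The resulting timestamps can then be compared against the application log to confirm which disconnects produced no callback.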

Hello @daniel.lipofsky ,

Please be assured that I am trying to help you to investigate the suspected issue, and to come up with the troubleshooting approach that will lead to the solution, within the framework of the discussion forums.

---

Let us start from the premise that an RFA consumer should detect disconnects and report them as status, via callback. From your report, this is not happening with a custom consumer on a large watchlist.

Without a reliable way to reproduce and fully understand the cause of the issue, there is no way to reach a solution, as I am sure you know. Further, to support the production app, you will need a way to test going forward, if and when needed, so this is a continuing requirement rather than a one-off.

---

Please confirm: did the issue only start to manifest as the watchlist grew (this is what I am guessing)?

Please confirm: was the application fully tested for failures and error conditions on your dev testbed prior to the deployment into AWS?

* If this is the case, going back to the dev testbed, retesting, and understanding when and why the status on disconnect does not manifest properly may be the best and most efficient course of action.

* An alternative approach would be to consider migrating to EMA API, as a strategic option, and investing time into designing a solution in EMA. A new article: Migrating from Refinitiv Legacy to Strategic APIs can be of help. To deliver a consumer application ready for production, we need a development environment where functionality, performance and failure conditions can be fully tested, where we design for robustness from the start.

If you think this could be helpful, please contact your account manager, and let's set up a time to discuss this offline.



@zoya faberov , to be clear, we very much want to go EMA but we have been waiting weeks for Refinitiv to provide us with IDs to access EMA, and it may be many more weeks (months?) before we get them. We can't even get an ETA on when we'll get them. In the meantime, we are having multiple production outages per day and are trying to find a solution.

We only see this with larger watch lists. We have dev and uat clusters we can test on. We are seeing the disconnects there too. What I need is suggestions for things to try.

Hello @daniel.lipofsky

I totally understand, and I am trying to help.

Are you in control of the connectivity in the dev/uat environment for testing purposes?

Would it make sense to test with the StarterConsumer from your SDK version/flavor, trigger a disconnect, and make sure it is received and processed by the Starter? I feel you should start with the working basics (a working Starter that detects disconnects) and build up to your requirements.

Otherwise, I would suggest the same test with StarterConsumer running on your local developer machine and pointing to the same infra. Is there connectivity from your local developer machine to your infra? If not, you could request a route from your local network group/admin. Would it make sense to use that as a starting environment to work on the connectivity issue from the ground up, since you control the connectivity there?

I am looking to start with something that is proven to work and is easily verifiable at each step as we build up, with respect to this specific suspected issue.

I was able to use a mock server, then kill that server, to simulate that type of disconnect. There was no callback in that case either.

Hi @daniel.lipofsky

Thanks. This is important. Just to confirm: you are testing with the Perf Provider example, in the same environment where you see the issue with the custom consumer, is this right?

From this information, I think the next step is to test the simple StarterConsumer example connecting to Perf Provider and break the connection, to obtain information that should allow you to narrow down the issue.

Or if I am misunderstanding, please explain more.

We have a mock server that was built from some code you provided a long time ago. I do not know the details or if this is what you mean by Perf Provider, but it works to provide quotes. Asking to build and deploy a totally new app is a lot of work, especially since I am very skeptical that it would actually help us learn anything.

Yes, that would probably be helpful. I am here for a few more hours today, then out on vacation Fri-Tue.

Hello @daniel.lipofsky

Could you please confirm whether you see the "Contact premium support" button on the RFA Java API page (https://developers.refinitiv.com/en/api-catalog/refinitiv-real-time/robust-foundation-api-rfa-java)? If so, you can click that button to submit a ticket to the RFA team directly.

rfa-java-support.png

Additionally, please give us the following details that can help us understand your environment better:

  • The Java SDK/JRE version
  • The OS/Platform version
  • The code snippet that registers and handles the OMMConnectionIntSpec

Hello @daniel.lipofsky

How many EventQueues is the application using? If you are using only one EventQueue, some messages, including the OMM connection event messages, might not be dispatched to the application. You may separate the item events and the OMM connection events into different EventQueues and then use separate threads to dispatch them.
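
The idea of separate queues with separate dispatch threads can be illustrated without RFA. The following is a minimal sketch in which plain java.util.concurrent queues stand in for RFA EventQueue instances and events are plain strings. SplitDispatch and its methods are hypothetical names. In a real application, the event queues passed to registerClient for item interest and for connection interest would take the place of these stand-ins, each with its own dispatch loop.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Stand-in demo, not RFA code: one queue and one thread per event category.
public class SplitDispatch {
    // Starts a daemon thread that drains one queue into one handler.
    public static Thread startDispatcher(String name, BlockingQueue<String> queue,
                                         Consumer<String> handler) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    handler.accept(queue.take()); // blocks until an event arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop dispatching on interrupt
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Returns true if the connection event is handled promptly even though
    // the item queue holds a long backlog behind a slow handler.
    public static boolean connectionEventSeenDespiteBacklog() {
        BlockingQueue<String> itemQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> connQueue = new LinkedBlockingQueue<>();
        CountDownLatch sawDown = new CountDownLatch(1);

        // Slow item handler simulates the load of a large watchlist.
        startDispatcher("item-dispatch", itemQueue, e -> sleepQuietly(50));
        startDispatcher("conn-dispatch", connQueue, e -> {
            if (e.contains("DOWN")) {
                sawDown.countDown();
            }
        });

        for (int i = 0; i < 1000; i++) {
            itemQueue.add("update " + i);
        }
        connQueue.add("connection DOWN");

        try {
            return sawDown.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch the connection event does not wait behind the item backlog because it never shares a queue or a thread with item traffic. Whether the same separation resolves the missing callback in RFA would still need to be verified in the actual environment.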

Hello @daniel.lipofsky

We have not received an update from you for a while. Could you please let us know whether the problem still persists in your environment?
