A/B side failover using websockets API for data contributions

fjemiolo · February 2024

Hi,

We are trying to implement an integration using the websockets API to contribute our data to Refinitiv.

We have not been successful in identifying in your documentation all the scenarios in which a failover should occur between A (contrib-ws1-amers1.platform.refinitiv.com) and B (contrib-ws2-amers1.platform.refinitiv.com) sides. Mainly, we would like to get some answers to the following questions:

1. A document shared by one of your relationship managers says that we should only fail over if we receive a "service degraded message"; however, we have not been able to find the exact specification for this message anywhere in your websocket API docs (https://developers.lseg.com/content/dam/devportal/refinitivrealtimeapi_pdfs/websocket_api_protocol_specification.pdf). Could you please send an example, so that we can implement it on our end?

2. Are there any other cases in which we should failover from A-side to B-side apart from the one mentioned above? For example, if due to some reasons all the messages are no longer accepted by your backend, or issues with login occur, should we failover to the B-side?

3. The document shared by one of your relationship managers does not explain what to do when the A-side status is no longer degraded. Are we supposed to receive a message? Will the connection be still available to us to listen for new messages? Or will B-side send a failover request saying that it now is degraded?

4. Should we consider A-side as primary always? Or is there no difference between these two endpoints? Would the message about service being degraded be retransmitted to us if we reconnect/attempt to reconnect to the degraded A-side again? How should we think of restarts throughout the day? Should we continuously attempt to retry to log in every N seconds to one of the sides if the other one is working correctly and we are contributing data there successfully?

5. Are there any instances in which the data would be delivered correctly to Refinitiv, but we would not receive a proper Ack back?

6. Are there any NaK messages which we should treat as errors indicating that we should try publishing the same message to the other side?

7. Are there any NaK messages which indicate the need to retransmit the data?

8. Can we safely assume that every update message that has received an Ack has been successfully delivered?

9. After how many seconds/minutes of not receiving any Acks for a message should we consider failover or raise an alert?

10. When first trying to publish messages over the wire, we noticed on numerous occasions that we were not getting any Acks/NaKs back from the system, even though the connection was still held. Could you explain to us how to treat such problems? We later narrowed it down to either incorrect FIDs being published or an incompatible schema, but that is very difficult to debug while integrating, so we would prefer to get a NaK indicating that the data is plainly incompatible with your schema/system.

11. Can we use the same login for both A and B side connections, or should we authenticate separately for each one of them?

We would like to finish our integration as soon as possible, so we would appreciate quick and precise answers to the questions posted above.

Thank you!

Gurpreet · February 2024

Hi @fjemiolo,

I have received the response from the product team -

1. There is no specific "service degraded" message. The conditions for failover would include any connection error or failure.

2. Same as above

3. They should continue to try and make connections to both endpoints, with a structured back off approach to not overload the authentication services. We recommend that connection recovery should start at 10 seconds and on each login denial, double the connection recovery interval (10, 20, 40, 80, 160 etc) until a maximum back off time of 600 seconds is reached, where it will continue at that value. Once the reconnection login succeeds, the recovery interval should be reset back to the 10 second default value. We don’t send specific failover requests but they should always try to maintain a separate connection to both endpoints.

4. There is no functional difference to each endpoint. As above, try to maintain separate connections to both endpoints so that you can failover quickly yourself without having to form a new connection first.

5. No, we only provide an ACK once the data has been successfully received on the head-end.

6. Yes, connection errors relating to authentication could be endpoint specific

7. Yes, NAK messages contain response text indicating the problem with the update. In this case the update should be corrected and resent.

8. Yes

9. 60 seconds

10. NAK messages contain response text indicating the problem with the update. However, if you continue to send an invalid update to a record, we will disable further updates to that record for 5 minutes and respond that the record has been disabled. Under these circumstances you may not receive another response until the 5 minutes have elapsed.

11. Each account has single sign in properties, so can only have 1 concurrent connection. We provide 2 accounts and endpoints to provide resiliency of the end to end service. When we perform service maintenance we do this on one endpoint at a time, so that there is always one endpoint available. We recommend connecting one account to one endpoint so that you always have a connection available.
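The back-off schedule described in answer 3 can be sketched as below. This is only an illustration of the arithmetic, not code from the product team; the class name `ReconnectBackoff` and its methods are hypothetical, and the actual login attempt is left to the application.

```python
class ReconnectBackoff:
    """Doubling reconnect delay: 10, 20, 40, ... capped at 600 seconds."""

    def __init__(self, initial=10.0, maximum=600.0):
        self.initial = initial    # first recovery interval, in seconds
        self.maximum = maximum    # ceiling for the recovery interval
        self._next = initial

    def next_delay(self):
        """Return the delay to wait before the next login attempt.

        Call once per login denial; the following delay is doubled,
        up to the maximum.
        """
        delay = self._next
        self._next = min(self._next * 2, self.maximum)
        return delay

    def reset(self):
        """Call after a successful login to go back to the initial delay."""
        self._next = self.initial
```

One instance per endpoint keeps the two sides independent, so a long back-off on one side does not slow down recovery on the other.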

I see the reference to the "service degraded message" in the document, and will ask the product team to remove it in the next revision.

Hope this helps.

Gurpreet · February 2024

Hello @fjemiolo,

I am not familiar with the document that you are referring to. I have not encountered a "service degraded" message in the system. Can you please provide a reference to this document?

Normally, I would consider failing over from A to B when the application loses the connection to the server, gets a login denied, or hits any other network issue such as a ping timeout. Once failed over to B, keep contributing to that channel until there is an issue. There are no primary or secondary channels; both endpoints are equal in every respect.
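A minimal sketch of that rule, assuming the application already maps its own connection problems to event names. The event strings and the `SideSelector` class are hypothetical and are not part of the websocket API.

```python
# Hypothetical event names the application assigns to its own connection problems.
FAILOVER_EVENTS = {"connection_lost", "login_denied", "ping_timeout"}


class SideSelector:
    """Tracks which of two equal endpoints is currently used for contributions."""

    def __init__(self, sides=("A", "B")):
        if len(sides) != 2:
            raise ValueError("exactly two sides are expected")
        self._sides = tuple(sides)
        self.active = self._sides[0]  # starting side is arbitrary; neither is primary

    def report(self, side, event):
        """Record an event seen on `side` and return the side to contribute on."""
        if event in FAILOVER_EVENTS and side == self.active:
            # Switch only when the side in use has a problem; stay there afterwards.
            first, second = self._sides
            self.active = second if side == first else first
        return self.active
```

A problem on the side that is not in use leaves the active side unchanged, which matches "keep contributing to that channel until there is an issue".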

The ACK indicates a successful contribution, while a NACK indicates that the contribution was rejected and will contain the reason, such as an invalid symbol or a rate error.
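One way to notice a post that got neither response is to track outstanding posts against the 60 second figure given in the product team's answer to question 9. The sketch below is an assumption about how an application might do this: `AckTracker` is a hypothetical helper, and `post_id` stands for whatever identifier the application uses to match a response to its post.

```python
import time


class AckTracker:
    """Tracks posts that have not yet been answered by an ACK or NACK."""

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self._timeout = timeout   # seconds to wait before treating a post as overdue
        self._clock = clock       # injectable clock, useful for testing
        self._pending = {}        # post_id -> time the post was sent

    def sent(self, post_id):
        """Record that a post was sent and is awaiting a response."""
        self._pending[post_id] = self._clock()

    def resolved(self, post_id):
        """Record that an ACK or NACK arrived for this post."""
        self._pending.pop(post_id, None)

    def overdue(self):
        """Return the posts that have waited at least `timeout` seconds."""
        now = self._clock()
        return [pid for pid, sent_at in self._pending.items()
                if now - sent_at >= self._timeout]
```

A non-empty `overdue()` result is a reasonable point to raise an alert or consider failing over. A NACK, by contrast, calls for correcting the update and resending it.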

These are a lot of questions and I am not able to answer all of these. I will raise it up with the Contributions product team.

fjemiolo · February 2024

Hi @Gurpreet ,

Thank you for the preliminary answers.

When can we expect the Contributions product team to respond to our questions?

Please find the document I have referenced here - Refinitiv Contributions Channel - Development Quick Start Guide 3.1.pdf

Thank you!
