A mix of Request timeout/channel down/Service not up
Our feed handler uses the C# Real-Time SDK and subscribes to ~5000 names. On Dec 18, 2024 we noticed some issues from ~14:30 to 16:30 EST. Below are some samples of the status messages we received:
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: CASY.O, ServiceName: hEDD, State: Open / Suspect / None / 'channel down.'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: VIRC.OQ, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: CMPS.O, ServiceName: hEDD, State: Open / Suspect / None / 'channel down.'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: CMRE.K, ServiceName: hEDD, State: Open / Suspect / None / 'channel down.'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: VIRT.OQ, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: CZR.O, ServiceName: hEDD, State: Open / Suspect / None / 'channel down.'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: VIS.P, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: CZWI.O, ServiceName: hEDD, State: Open / Suspect / None / 'channel down.'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: VIST.N, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: EFAV.K, ServiceName: hEDD, State: Open / Suspect / None / 'channel down.'
... a lot more similar rows…
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: QFIN.OQ, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: WAY.O, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: QGEN.N, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: WB.O, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: MKZR.OQ, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: MYRG.OQ, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: WBA.O, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: GRVY.OQ, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: WBD.O, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: GS.N, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: WBTN.O, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: IMTX.OQ, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
... a lot more similar rows…
2024-12-18 16:15:23.0855 | Refinitiv StatusMsg: Name: PAYC.N, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
2024-12-18 16:15:23.0855 | Refinitiv StatusMsg: Name: SONY.K, ServiceName: hEDD, State: Open / Suspect / None / 'channel down.'
2024-12-18 16:15:23.0855 | Refinitiv StatusMsg: Name: SOUN.O, ServiceName: hEDD, State: Open / Suspect / None / 'channel down.'
2024-12-18 16:15:23.0855 | Refinitiv StatusMsg: Name: PB.N, ServiceName: hEDD, State: Open / Suspect / None / 'Service not up'
2024-12-18 16:18:34.3745 | Refinitiv StatusMsg: Name: QTEC.O, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 16:18:34.3745 | Refinitiv StatusMsg: Name: CXM.N, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 16:18:34.3745 | Refinitiv StatusMsg: Name: QTRX.O, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
2024-12-18 16:18:34.3745 | Refinitiv StatusMsg: Name: CXW.N, ServiceName: hEDD, State: Open / Suspect / None / 'Request timeout'
We have already opened a ticket for it (not sure if you have access to the ticketing system, but FYI: 14220838) and asked the support team to log in to the server to see if anything is wrong.
But we are also interested in the technical side: what do these three message types mean? We heard that channel down means we have slow consumers or a slow network; can you give us a bit more insight?
Thanks,
Hello @Y_Intercept
Please be informed that you can set the RequestTimeout parameter in the EmaConfig.xml file to tweak the amount of time (in milliseconds) the OmmConsumer waits for a response message.
You can initially check if this is a network issue by running the testclient tool on the same network to determine the update rate of all subscribed items.
The tool is in the ADS package and you can also download this tool (Infrastructure Tools) from the Software Downloads. It is in the MDS - Infra/Infrastructure Tools category.
First, create a rics.txt file that contains all subscribed RICs. For example:
CASY.O
GRVY.OQ
Then, run the test client with the following parameters.
./testclient -h <server ip> -p <server port> -S <service name> -f rics.txt -I 1 -u <user>
The output will look like this:
You may need to run it between ~14:30 and 16:30 EST. If the testclient can handle all updates without any disconnections, it is probably not a network issue.
With the result from the testclient, you will get the average update rate during the peak period. After that, you can check if the application can properly handle that update rate.
Hi @Jirapongse.
Thanks for this answer. Can I summarize it as below:
- If we see only channel down in the log, we cannot rule out that a slow consumer is to blame; we need to open a ticket to check the server log to be sure.
- If we see request timeout only, or request timeout + channel down, in the log, it is a strong signal that the problem is on the network. My reasoning: request timeout means the client-side SDK code is still waiting for a service response and fails to receive one within 15 sec. "Slow consumer" typically means our callback function implementation (i.e., OnUpdateMsg()) is too slow, but in this "request timeout" scenario OnUpdateMsg() is not invoked at all, so request timeout is most likely not a slow consumer issue but a connectivity issue.
- If we see Service not up in the log, it becomes a bit more complicated; it could be the case that the server-side program is not running.
We are especially interested in the 2nd point, as we are currently trying to identify the issue: is a slow consumer or a slow network to blame?
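To apply a rule of thumb like the summary above, it helps to know which status texts actually appear together in each time window. The following is only a minimal Python sketch, assuming the log rows keep the exact layout of the samples posted in the question (that layout comes from the poster's own logger, not from the SDK). The names `LINE_RE` and `tally_by_minute` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed row layout, copied from the samples in the question:
# <timestamp> | Refinitiv StatusMsg: Name: <RIC>, ServiceName: <svc>, State: ... / '<text>'
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \| Refinitiv StatusMsg: "
    r"Name: (?P<ric>[^,]+), ServiceName: (?P<service>[^,]+), "
    r"State: [^']*'(?P<text>[^']*)'"
)

def tally_by_minute(lines):
    """Count status texts per minute: {"YYYY-MM-DD HH:MM": Counter(text -> count)}."""
    buckets = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip rows that are not StatusMsg lines
        minute = m.group("ts")[:16]  # keep "YYYY-MM-DD HH:MM"
        text = m.group("text").rstrip(".").lower()  # 'channel down.' -> 'channel down'
        buckets[minute][text] += 1
    return dict(buckets)

sample = [
    "2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: CASY.O, ServiceName: hEDD, "
    "State: Open / Suspect / None / 'channel down.'",
    "2024-12-18 14:51:58.9900 | Refinitiv StatusMsg: Name: VIRC.OQ, ServiceName: hEDD, "
    "State: Open / Suspect / None / 'Request timeout'",
    "2024-12-18 15:47:17.0643 | Refinitiv StatusMsg: Name: QFIN.OQ, ServiceName: hEDD, "
    "State: Open / Suspect / None / 'Service not up'",
]

result = tally_by_minute(sample)
for minute in sorted(result):
    print(minute, dict(result[minute]))
```

A minute that contains only 'channel down' rows and a minute that mixes 'request timeout' with 'channel down' would then fall under different bullets of the summary. The tally only organizes the evidence; the server log is still needed to confirm the actual cause.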
@Y_Intercept
Thank you for reaching out to us.
The 'channel down' message indicates that the connection between the application and the server has been disconnected. Typically, we need to check the server log for a reason of disconnection.
The 'Request timeout' message indicates that the application didn't receive a response message of a request item from the server within a timeout period (15 seconds by default).
The 'Service not up' message indicates that the subscribed service (such as ELEKTRON_DD) is down when requesting items.
All items' states are Open/Suspect which means that the API will recover a connection and item subscriptions on behalf of the application when it can reconnect to the server and the subscribed service is up.
A slow consumer issue happens when an application is unable to process all the messages on time by taking too long to process each message it receives. Therefore, it is building up a backlog on the server. At some point, the server will cut connections of slow consumer applications. Typically, we will see this log on the server side.
For example, suppose the subscribed items together produce 10,000 update messages per second, but the application can handle only 5,000 updates per second. The application is then a slow consumer, and the server can cut its connection due to a buffer overflow condition on the server side.
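The arithmetic behind that example can be sketched as follows. This is a rough illustration only: the function name `seconds_until_overflow` and the buffer capacity of 50,000 messages are hypothetical, and the real server-side buffer limits depend on the server configuration, which is not given here.

```python
def seconds_until_overflow(incoming_per_sec, handled_per_sec, buffer_capacity_msgs):
    """Estimate how long until a server-side queue of the given size fills up.

    Returns None when the consumer keeps up, because the backlog never grows.
    """
    backlog_growth = incoming_per_sec - handled_per_sec  # messages queued per second
    if backlog_growth <= 0:
        return None
    return buffer_capacity_msgs / backlog_growth

# 10,000 updates/s arriving, 5,000 updates/s processed, hypothetical 50,000-message buffer
print(seconds_until_overflow(10_000, 5_000, 50_000))  # prints 10.0
```

With these assumed numbers the backlog grows by 5,000 messages per second, so the hypothetical buffer fills in about 10 seconds, after which the server would be expected to drop the connection.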