Stream timeouts on websocket under python lseg.data API

davidk Contributor

We're streaming LSEG data using the LSEG Data Library for Python, using lseg.data.content.pricing.

Sometimes, seemingly at random (at different times of day, sometimes after streams have been up for hours, other times after only minutes), we receive Stream.on_status events with the following example payload:

status: {'ID': 8, 'State': {'Code': 'ConnectionError', 'Data': 'Suspect', 'Stream': 'Closed', 'Text': 'Timeout'}, 'Type': 'Status'}
name: AAPL.ITC (or, e.g., BEI_u.TO or 7203.T at various points)

Now, when we have a session disconnect (we receive a Session.on_event with an EventCode such as SessionDisconnected, SessionConnected, StreamDisconnected, or StreamConnected), the system typically auto-reconnects. We've even tested this by inducing failures (for example, unplugging the network cable, letting the error occur, and plugging it back in). This is obviously the ideal behavior.

But for these stream disconnects, the problem seems to be at the head-end; we're not seeing any other problems on the PCs. And the connections never restore, so it appears we're responsible for implementing reconnection logic in this case.

So some questions:

1) The message says this is a "timeout". What exactly is being timed out? Did the headend just stop sending us data? Is there any way to detect such a timeout? Is this problem always permanent, or is there a way to "increase the timeout" to avoid the issue?

2) Even though this message is presented to us at the stream level, it seems to affect all streams running on the machine simultaneously. Is there a reason this is delivered as a stream event rather than a (recoverable) Session event?

3) Are there any best practices for knowing whether a Stream event is recoverable or requires discarding the old stream and requesting it again?

Thanks very much,

David

Answers

  • Jirapongse ✭✭✭✭✭

    @davidk

    Thank you for reaching out to us.

    Based on the error, the issue may originate from the backend service or the headend. Typically, when the stream state is Closed, the item cannot be recovered. To continue receiving data, the application must resubscribe.

    status: {'ID': 8, 'State': {'Code': 'ConnectionError', 'Data': 'Suspect', 'Stream': 'Closed', 'Text': 'Timeout'}, 'Type': 'Status'}
    name: AAPL.ITC (or, e.g., BEI_u.TO or 7203.T at various points)

    1) The message says this is a "timeout". What exactly is being timed out? Did the headend just stop sending us data? Is there any way to detect such a timeout? Is this problem always permanent, or is there a way to "increase the timeout" to avoid the issue?

    This could be a connection timeout between the service (API Proxy) and the headend, which lies outside the scope of the LSEG Data Library.

    2) Even though this message is presented to us at the stream level, it seems to affect all streams running on the machine simultaneously. Is there a reason this is delivered as a stream event rather than a (recoverable) Session event?

    The error code is ConnectionError. If the issue is related to connectivity, it is likely to affect all streams.

    3) Are there any best practices for knowing whether a Stream event is recoverable or requires discarding the old stream and requesting it again?

    Typically, we begin by checking the stream state. If the stream state is Closed, it indicates that the stream will not be automatically recovered by the system or the library. In such cases, the application must initiate its own recovery process.
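    The check described above can be sketched as a small helper. This is a minimal sketch, assuming only the status payload shape shown in the question; `needs_resubscribe` is an illustrative name, not part of the library API:

    ```python
    def needs_resubscribe(status: dict) -> bool:
        """Return True when a Stream.on_status payload indicates the
        stream is Closed and will not be recovered automatically."""
        state = status.get("State", {})
        # A Closed stream state means neither the platform nor the
        # library will recover the item; the application must
        # resubscribe on its own.
        return state.get("Stream") == "Closed"

    # Example payload from the question:
    status = {'ID': 8,
              'State': {'Code': 'ConnectionError', 'Data': 'Suspect',
                        'Stream': 'Closed', 'Text': 'Timeout'},
              'Type': 'Status'}
    print(needs_resubscribe(status))  # → True
    ```

    An Open stream with a Suspect data state, by contrast, is usually recovered for you, so a handler can limit its own recovery logic to the Closed case.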

    Is it possible to enable the debug log file in the library? You can then send the log file to the product team so we can verify what the problem is. To enable the debug log file, run the following code before opening a session:

    import lseg.data as ld

    # Enable file logging at debug level before opening the session.
    config = ld.get_config()
    config.set_param("logs.transports.file.enabled", True)
    config.set_param("logs.transports.file.name", "lseg-data-lib.log")
    config.set_param("logs.level", "debug")

    ld.open_session()
  • davidk Contributor

    Thanks Jirapongse. I'll try setting the API logs to get at root cause. I'm a little nervous about the throughput though if the logs are enabled. My machine has an HDD and already Workspace pins my HDD at about 40-50% utilization at all times.. In the meantime, it sounds like it's just on us to process this event, throw away the old connection stream object and re-establish the connection. I'll take a look at wrapping the pricing stream in some way to make it more robust.