Inquiry About `get_data()` Performance and Comparison Between `lseg.data` and `refinitiv.data`

Posting this query on behalf of a client.

Dear Support Team,

We recently migrated our application from the Eikon .NET APIs to the LSEG Data Library for Python. However, we have observed that data retrieval, particularly via `get_data()`, is noticeably slower than it was with the Eikon .NET implementation.

Our use case involves retrieving data for up to about 850 instruments with 3 columns. While we understand that the Python library may have different performance characteristics, we would like to know:

1. Performance of `get_data()`:

   - Is `get_data()` the most optimized function for retrieving real-time and historical data in bulk?

   - Are there any best practices or configurations (e.g., batching, filtering) to optimize the performance of `get_data()` for our use case?

For your reference, our current implementation is as follows.

```python
import lseg.data as d  # alias assumed from the surrounding code

ret = d.get_data(
    universe=instruments,
    fields=cols
)
```
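For additional context, the kind of batched variant we are asking about is sketched below. The chunk size of 100 is an arbitrary guess on our part, not a documented recommendation, and the library and pandas imports are deferred so the helper itself is plain Python:

```python
def chunked(seq, size):
    """Yield successive slices of `seq` with at most `size` items each."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def get_data_batched(instruments, cols, chunk_size=100):
    """Request the universe in smaller chunks and concatenate the results."""
    import pandas as pd
    import lseg.data as ld  # deferred: requires a Workspace/desktop session
    frames = [ld.get_data(universe=list(batch), fields=cols)
              for batch in chunked(instruments, chunk_size)]
    return pd.concat(frames, ignore_index=True)
```

We do not know whether chunking like this actually helps or hurts overall latency, which is part of our question.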

2. Alternative Functions:

   - Does the `lseg.data` library provide any alternative functions that are faster or more efficient than `get_data()` for our use cases?

3. Comparison with `refinitiv.data`:

   - I noticed that Refinitiv also provides the `refinitiv.data` library. Is this library faster or more optimized than `lseg.data` for similar use cases?

   - Are there specific scenarios where `refinitiv.data` is recommended over `lseg.data`?

4. Documentation or Examples:

   - Could you provide any documentation, examples, or recommendations for optimizing data retrieval using `lseg.data` or `refinitiv.data`?

We would appreciate your guidance on these questions to ensure that we are using the most efficient approach for our application.

Thank you for your support.

Answers

  • Jirapongse
    Jirapongse ✭✭✭✭✭

    @tserenbat.uyanga

    Thank you for reaching out to us.

    The LSEG Data Library is essentially a rebranded version of the Refinitiv Data Library (with the Refinitiv naming changed to LSEG).

    Most functionalities are the same and both libraries can be used to retrieve data from the desktop or platform session.

    We highly recommend using the LSEG Data Library instead of the Refinitiv Data Library.

    You can enable the debug log in both libraries to identify where the problem lies by running the following code before opening a session.

    ```python
    import lseg.data as ld

    config = ld.get_config()
    config.set_param("logs.transports.file.enabled", True)
    config.set_param("logs.transports.file.name", "lseg-data-lib.log")
    config.set_param("logs.level", "debug")
    ```

    With this code, a log file ("lseg-data-lib.log") will be created containing all requests sent and the corresponding responses received.
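Putting it together, the ordering looks like this. This is a sketch only: the instrument and fields below are placeholders, and the import is deferred to the call so the snippet stands alone.

```python
def run_with_debug_log():
    """Configure file logging, then open the session; the order matters."""
    import lseg.data as ld  # requires an LSEG Workspace / desktop installation

    config = ld.get_config()
    config.set_param("logs.transports.file.enabled", True)
    config.set_param("logs.transports.file.name", "lseg-data-lib.log")
    config.set_param("logs.level", "debug")

    ld.open_session()  # logging must be configured before this call
    try:
        # Placeholder request so the log captures one request/response pair.
        return ld.get_data(universe=["VOD.L"], fields=["BID", "ASK"])
    finally:
        ld.close_session()
```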

    The get_data method is not designed to retrieve bulk data, as mentioned in the LSEG Data Library for Python - Reference Guide.

  • *Posted on behalf of a client.

    Hi @Jirapongse,

    Thank you for your reply.

    We appreciate the clarification regarding the LSEG Data Library and its similarities to the Refinitiv Data Library. However, we believe there may have been a misunderstanding regarding our inquiry.

    Our question is not related to any errors or latency issues. We are aware that the get_data() method is functioning as intended and that no errors are occurring. Instead, our concern is about the inherent speed limitations of the get_data() method when retrieving multiple data points (e.g., 800 rows with 3 columns each) at once. We understand that the term "bulk" might have been misleading, as we are not referring to extremely large datasets (e.g., mega/gigabytes of data), but rather to retrieving multiple data points simultaneously.

    Additionally, while I mentioned "historical data" in the inquiry, we were aware that the get_data() method is not designed for historical data retrieval. Our question is whether there are alternative methods or approaches within the LSEG Data Library that are better suited for retrieving multiple data points efficiently, given the limitations of get_data().
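For what it is worth, the per-call timings we refer to are collected with a generic helper along the following lines. This is our own measurement code, not part of the library:

```python
import time
from functools import wraps

def timed(fn):
    """Wrap `fn` so each call returns (result, elapsed_seconds)."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        return result, time.perf_counter() - t0
    return wrapper

# Usage against the library (session assumed already open):
#   import lseg.data as ld
#   df, seconds = timed(ld.get_data)(universe=instruments, fields=cols)
```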

    Furthermore, the log provided below is not strictly generated using the method you suggested but is included here for reference and potential assistance:

    2025-06-10 13:20:30,700 - DEBUG - connect_tcp.started host='localhost' port=9000 local_address=None timeout=20 socket_options=None

    2025-06-10 13:20:32,737 - DEBUG - connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x000001522644B3E0>

    2025-06-10 13:20:32,739 - DEBUG - send_request_headers.started request=<Request [b'POST']>

    2025-06-10 13:20:32,740 - DEBUG - send_request_headers.complete

    2025-06-10 13:20:32,741 - DEBUG - send_request_body.started request=<Request [b'POST']>

    2025-06-10 13:20:32,742 - DEBUG - send_request_body.complete

    2025-06-10 13:20:32,742 - DEBUG - receive_response_headers.started request=<Request [b'POST']>

    2025-06-10 13:20:33,755 - DEBUG - receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Access-Control-Allow-Origin', b'*'), (b'X-ID', b'XXXXX'), (b'X-Request-Id', b'XXXXX'), (b'Content-Type', b'application/json; charset=utf-8'), (b'RateLimit-Remaining', b'XXXXX'), (b'VolumeLimit-Remaining', b'XXXXX'), (b'QueueLimit-Remaining', b'XXXXX'), (b'RateLimit-Policy', b'XXXXX'), (b'VolumeLimit-Policy', b'XXXXX'), (b'RateLimit-Resource', b'*'), (b'VolumeLimit-Resource', b'*'), (b'Content-Length', b'XXXXX'), (b'ETag', b'XXXXX'), (b'Date', b'Tue, 10 Jun 2025 04:20:33 GMT'), (b'Connection', b'keep-alive'), (b'Keep-Alive', b'timeout=5')])

    2025-06-10 13:20:33,756 - INFO - HTTP Request: POST http://localhost:9000/api/udf "HTTP/1.1 200 OK"

    2025-06-10 13:20:33,757 - DEBUG - receive_response_body.started request=<Request [b'POST']>

    2025-06-10 13:20:33,758 - DEBUG - receive_response_body.complete

    2025-06-10 13:20:33,758 - DEBUG - response_closed.started

    2025-06-10 13:20:33,759 - DEBUG - response_closed.complete

  • Jirapongse
    Jirapongse ✭✭✭✭✭

    @tserenbat.uyanga

    Both libraries retrieve data from the Data API Proxy running on localhost.

    Typically, latency occurs either in the Data API Proxy or in the endpoints that process the request messages.

    The libraries are primarily responsible for constructing and sending request messages, and for processing the corresponding responses.

    To compare latency, we would need logs from both the Refinitiv Data Library and the LSEG Data Library.

    Another method for retrieving historical data is get_history. However, you need to verify whether the client would like to retrieve historical data for the TR.xxx fields or for the real-time fields (such as BID and ASK).

    The get_data method can retrieve historical data of the TR fields while the get_history method can retrieve historical data of both the TR fields and real-time fields.
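As a sketch only (the field list, interval, and count below are illustrative assumptions, and the import is deferred so the snippet stands alone):

```python
def fetch_bid_ask_history(instruments):
    """Retrieve recent daily BID/ASK history for the given instruments."""
    import lseg.data as ld  # session assumed to be open already
    return ld.get_history(
        universe=instruments,
        fields=["BID", "ASK"],  # real-time fields; TR.xxx fields also work
        interval="daily",
        count=30,               # last 30 daily bars
    )
```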

    Please refer to the examples on GitHub.

  • *Posted on behalf of a client.

    @Jirapongse

    LSEG Workspace has become a web API application, which means that every data retrieval involves the overhead of HTTP/HTTPS communication. 

    On the other hand, Refinitiv Eikon seems to use some proprietary method to maintain a constant connection with the backend system, eliminating the need to establish connections or prepare communication for each request. 

    As a result, the slower performance of LSEG Workspace compared to the older Refinitiv Eikon is likely due to fundamental differences in their architecture, and bridging this performance gap would be difficult (if not impossible). 

    This is not an issue that can be resolved by making changes on the client side. Instead, the speed difference stems from the structural performance differences between the Refinitiv Eikon desktop and the LSEG Workspace desktop, and perhaps this is not something that should be discussed further in this thread.

  • Jirapongse
    Jirapongse ✭✭✭✭✭

    This forum is dedicated to software developers who have questions about using LSEG APIs.

    Currently, the client has raised feedback about the product design, which falls outside the scope of this forum. Please raise this feedback with the product support team or product manager directly.