Skipped values in LSEG Data Platform responses

Gurpreet

Hello @inzilya

Please refer to the usage limits and guidelines document. Given the number of instruments and fields you are requesting, you appear to be exceeding the allowed limit.

I ran the synchronous code five times, sending requests of 1,000 RICs every 2 seconds to stay within the request time limit.
The volume of data received was about 3 MB.
I received about 7,000 data points per request, although in my version there is no limit on the number of data points.

As for partial responses (for example, a request for 5,000 instruments that returns only 3,000): all 60,000 RICs I requested came back with data, but the number of missing values still differs between runs.

The code handles timeouts and server errors.
Sometimes a completely empty column is returned in the response; I already have a workaround for that case.
But when a single cell has no data, it is very difficult to detect.
Even re-requesting specific RICs may not return data for them. In that case, how can I tell why a cell is empty? Is the value genuinely absent from Refinitiv, or did it fail to reach me for some reason?
Can you suggest a way to identify missing values in an individual cell (data point)?

Code:

FIELDS = ['TR.TotRevenue3YrCAGR', 'TR.F.IncAftTaxMargPct3YrIntmCAGR',
          'TR.EBITDA3YrInterimCAGR', 'TR.F.EBITDA5YrIntmCAGR',
          'TR.F.NetCFOp3YrIntmCAGR', 'TR.F.EPSDilExclExOrdComTot3YrIntmCAGR']

response = fundamental_and_reference.Definition(
    universe=rics_batch,
    fields=FIELDS,
    parameters=PARAMS
).get_data()

if response.is_success:
    ld_df = response.data.df

Percentage of missing values per field:

missing_values_df = pd.DataFrame(data=df.replace("", np.nan).isna().mean() * 100).T.round(3)
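The formula above gives the share of empty cells per field. A minimal sketch extending it so it also lists which RICs have gaps, so only those need to be re-requested. It assumes the DataFrame has an `Instrument` column (an assumption; adjust `id_col` to whatever `response.data.df` actually contains), and `missing_report` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame, id_col: str = "Instrument") -> tuple[pd.Series, pd.Series]:
    """Return (% missing per field, number of missing fields per instrument)."""
    # Treat empty strings as missing, as in the formula above.
    data = df.replace("", np.nan).set_index(id_col)
    pct_missing_by_field = (data.isna().mean() * 100).round(3)
    missing_by_ric = data.isna().sum(axis=1)
    # Keep only instruments that have at least one empty cell.
    return pct_missing_by_field, missing_by_ric[missing_by_ric > 0]
```

The second series can be fed back as the universe of a follow-up request, though, as noted in this thread, a re-request does not always fill the gap.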

Gurpreet

When you say synchronous, does the code wait for a response before proceeding to the next batch? Does any response take more than 300 seconds (the server timeout limit)?

Yes. The code runs synchronously in the main thread, so it waits for the server's response and does not move to the next batch until it receives one.
A response takes about 1 to 5 seconds, depending on the number of fields in the request and the server load.
In all the time I have been using this library, I have never hit the 300-second timeout.

Gurpreet

Can you provide the complete code and the instrument list? I will try to run it on my end.

rics.txt

import pandas as pd
from time import sleep

import lseg.data as ld
from lseg.data.content import fundamental_and_reference
from lseg.data.errors import LDError


def fetch_data(rics, fields, params):
    ld.open_session()
    df = pd.DataFrame()
    step = 1000  # instruments per request

    try:
        # rics_list is the start offset of the current batch
        for rics_list in range(0, len(rics), step):
            attempt = 0
            gotData = False
            batch = rics[rics_list: rics_list + step]

            # Up to 3 attempts per batch. The sleep runs after every attempt,
            # including a successful one, so it also paces the requests.
            while not gotData and (attempt < 3):
                try:
                    response = fundamental_and_reference.Definition(
                        universe=batch,
                        fields=fields,
                        parameters=params
                    ).get_data()
                    ld_df = response.data.df
                    df = pd.concat([df, ld_df], ignore_index=True)
                    print(f'got - {rics_list + len(batch)}')
                    gotData = True
                    # sleep(2)
                except LDError as lde:
                    print(lde)
                except Exception as e:
                    print(e)
                sleep(5 * attempt + 10)
                attempt += 1

            if not gotData:
                # Report a batch that failed all attempts instead of dropping it silently
                print(f'failed - batch starting at {rics_list}')
    finally:
        # Close the session even if an unexpected error escapes the loop
        ld.close_session()

    return df


def main():
    with open('rics.txt') as file:
        # Skip blank lines so they are not sent as empty RICs
        rics = [line.strip() for line in file if line.strip()]

    FIELDS = ['TR.TotRevenue3YrCAGR', 'TR.F.IncAftTaxMargPct3YrIntmCAGR',
              'TR.EBITDA3YrInterimCAGR', 'TR.F.EBITDA5YrIntmCAGR',
              'TR.F.NetCFOp3YrIntmCAGR', 'TR.F.EPSDilExclExOrdComTot3YrIntmCAGR']
    PARAMS = {'Curn': 'USD', 'CH': 'Fd'}

    result = fetch_data(rics, FIELDS, PARAMS)
    result.to_csv('result.csv', header=True)


if __name__ == '__main__':
    main()

I checked the log files, and this appears to be a problem on the backend.

I can replicate the issue by sending requests to the RDP underlying platform:

config = ld.get_config()
config.set_param("apis.data.datagrid.underlying-platform", "rdp")
ld.open_session()

I invoked the get_data method twice using identical parameters, but the returned results differ in the number of NA values.

I will raise this issue with the backend team on your behalf so they can investigate. The case number is 15200357.
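The reproduction described above (same parameters, repeated `get_data` calls, differing NA counts) can be scripted so each run's total is recorded. This is a sketch, not an LSEG tool: `na_counts_across_runs` and `is_consistent` are hypothetical helpers, and `fetch` is any zero-argument callable you supply that returns a DataFrame.

```python
from typing import Callable

import numpy as np
import pandas as pd


def na_counts_across_runs(fetch: Callable[[], pd.DataFrame], runs: int = 2) -> list[int]:
    """Call `fetch` `runs` times and return the number of empty cells in each result."""
    counts = []
    for _ in range(runs):
        df = fetch()
        # Empty strings and NaN/None both count as missing.
        counts.append(int(df.replace("", np.nan).isna().sum().sum()))
    return counts


def is_consistent(counts: list[int]) -> bool:
    """True when every run returned the same number of empty cells."""
    return len(set(counts)) <= 1
```

With an open session (for example after the `config.set_param(...)` and `ld.open_session()` lines above), one might call `na_counts_across_runs(lambda: ld.get_data(universe=universe[0:500], fields=fields), runs=5)`. Equal totals do not prove that the same cells were filled in each run, only that the counts match.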

Thanks for the feedback. I hope the backend team can solve this problem.
Where can I find case number 15200357?

I checked and found that the problem has been resolved.

However, each request should stay within the 10,000 data point limit. For example, if you request 20 fields, the number of instruments should be 500 or fewer:

df1 = ld.get_data(
    universe=universe[0:500],  # 500 instruments x 20 fields = 10,000 data points
    fields=fields)
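The batch size follows directly from the limit: instruments per request = floor(10,000 / number of fields), so 20 fields gives 500 instruments, 6 fields gives 1,666 and 8 fields gives 1,250. A sketch of that arithmetic plus the batching step; `batch_size` and `batches` are illustrative names, and the sketch assumes only the requested fields count toward the limit.

```python
DATA_POINT_LIMIT = 10_000  # data points per request, as stated in the answer above


def batch_size(n_fields: int, limit: int = DATA_POINT_LIMIT) -> int:
    """Instruments per request so that instruments * fields stays within `limit`."""
    if n_fields <= 0:
        raise ValueError("n_fields must be positive")
    # Floor division; never below 1, even if n_fields alone exceeds the limit.
    return max(1, limit // n_fields)


def batches(universe: list[str], n_fields: int) -> list[list[str]]:
    """Split `universe` into consecutive batches of at most batch_size(n_fields) RICs."""
    step = batch_size(n_fields)
    return [universe[i:i + step] for i in range(0, len(universe), step)]
```

For positive integers, `limit // n_fields` gives the same result as `int(10000 / len(FIELDS))`.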

Thanks for the reply. I've done a few data pulls, and overall I'm pleased with the results.
I sized each batch with the formula step = int(10000 / len(FIELDS)).
However, in about 1% of cases the response is missing an entire column, so I have to check for it manually, and that check doesn't always work.

FIELDS = ['TR.TotRevenue3YrCAGR', 'TR.F.IncAftTaxMargPct3YrIntmCAGR',
          'TR.EBITDA3YrInterimCAGR', 'TR.F.EBITDA5YrIntmCAGR',
          'TR.F.NetCFOp3YrIntmCAGR', 'TR.F.EPSDilExclExOrdComTot3YrIntmCAGR',
          'TR.EBITDAMarginPercent(Period=LTM)', 'TR.GrossMargin(Period=LTM)']

if not (ld_df.isna().mean() * 100 == 100).any():
    print('good')
    return ld_df
else:
    col_attempt += 1
    print('col')
    if col_attempt > 3:
        return ld_df
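The column check above can be wrapped in a reusable retry helper. A sketch under stated assumptions: `fetch_until_columns_filled` is a hypothetical name, `fetch` is a caller-supplied function that re-issues the same request, and, unlike the snippet above, empty strings are also treated as missing (as in the earlier missing-value formula). A column that is genuinely empty for every instrument in the batch will never pass, so the helper returns an `ok` flag instead of retrying forever.

```python
from typing import Callable

import numpy as np
import pandas as pd


def fetch_until_columns_filled(fetch: Callable[[], pd.DataFrame],
                               max_attempts: int = 4) -> tuple[pd.DataFrame, bool]:
    """Re-run `fetch` until no column is entirely empty; return (last result, ok flag)."""
    df = pd.DataFrame()
    for attempt in range(1, max_attempts + 1):
        df = fetch()
        cleaned = df.replace("", np.nan)
        # Columns where every cell is empty (NaN, None or "").
        empty_cols = cleaned.columns[cleaned.isna().all().to_numpy()]
        if len(empty_cols) == 0:
            return df, True
        print(f"attempt {attempt}: empty columns {list(empty_cols)}")
    return df, False
```

The default of four attempts mirrors `col_attempt > 3` in the snippet above.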

In the screenshot I have marked these cases in red.

It would be good if this problem could also be fixed eventually.

If you need logs, I can provide them.

Yes, please share the log files.

20251007-0101-1812-lseg-data-lib.log

Search the log for "HTTP Response id 52".

For example, one response returned this row: ["VISW.NS","","","","","","","",""]
while another request returned: ["VISW.NS",16.3991425851467,"","","","","",13.7384307839,25.22533]
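The two rows above suggest one practical way to separate "no data on the platform" from "data did not arrive": pull the same universe more than once and flag cells that are empty in one run but filled in another. A value seen in any run most likely exists on the platform, so its absence elsewhere points to a delivery problem; a cell empty in every run remains ambiguous. A minimal sketch, assuming an `Instrument` column and unique instruments per run (`cells_missing_in_one_run` is a hypothetical helper):

```python
import numpy as np
import pandas as pd


def cells_missing_in_one_run(run_a: pd.DataFrame, run_b: pd.DataFrame,
                             id_col: str = "Instrument") -> pd.DataFrame:
    """Return (instrument, field) cells that are empty in one run but filled in the other."""
    # Treat "" and NaN/None alike, then index both runs by instrument.
    a = run_a.replace("", np.nan).set_index(id_col)
    b = run_b.replace("", np.nan).set_index(id_col)
    # Compare only instruments and fields present in both runs (assumes unique RICs).
    a, b = a.align(b, join="inner")
    a_missing = a.isna().to_numpy()
    b_missing = b.isna().to_numpy()
    # Positions where exactly one of the two runs has an empty cell.
    rows, cols = np.nonzero(a_missing != b_missing)
    return pd.DataFrame({
        id_col: a.index[rows],
        "field": a.columns[cols],
        "empty_in": np.where(a_missing[rows, cols], "run_a", "run_b"),
    })
```

Applied to the two VISW.NS rows, this would flag the three fields that were filled only in the second response.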

I could not replicate this issue.

Could you share the runnable Python code that can be used to replicate this issue?

https://community.developers.refinitiv.com/discussion/comment/127761#Comment_127761

As I mentioned earlier, this issue isn't easy to reproduce: you need to run several data dumps and look for missing values. You may catch it within about 10 dumps. I suspect it is related to server load. The real problem is that when it occurs, the server gives no indication.

Here is the comment with the code I used to catch the problem; it also occurs with asynchronous downloads: