Missing values in LSEG Data Platform responses

inzilya Newcomer
edited September 4 in Refinitiv Data Platform

Hello, I am using the LSEG Data Library to download data asynchronously for 60,000 RICs, and I have run into a problem: sometimes the response contains no data for some RICs.
I checked in Workspace that the requested fields are available for those specific RICs. The values are present there, but when I request them through the API, they are not returned.
I ran several identical dumps and got a different number of missing values in the columns each time.

df.isna().mean() * 100
missing_prc.png


So on each run, data was missing for a different set of RICs. Here are the differences in the data between dumps 1 and 3.

Pasted image 20250903034412.png
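
One way to pinpoint which cells differ between two dumps is to align both frames on the RIC and compare their missing-value masks. The following is a minimal sketch, assuming each dump is a pandas DataFrame with unique RICs in an `Instrument` column. The column name, the helper `missing_only_in_one`, and the sample RICs are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missing_only_in_one(dump_a, dump_b, key="Instrument"):
    """Return (RIC, field) pairs that are missing in exactly one of two dumps."""
    a = dump_a.set_index(key).replace("", np.nan)
    b = dump_b.set_index(key).replace("", np.nan)
    # Keep only the RICs and fields present in both dumps so the masks line up.
    a, b = a.align(b, join="inner")
    # True where one dump has a value and the other does not.
    differs = a.isna() ^ b.isna()
    # One entry per (RIC, field) cell; keep only the cells that differ.
    cells = differs.stack()
    return cells[cells].index.tolist()


# Made-up sample data, not real results.
dump_1 = pd.DataFrame({"Instrument": ["AAA.N", "BBB.N"], "TR.EV": [1.0, np.nan]})
dump_3 = pd.DataFrame({"Instrument": ["AAA.N", "BBB.N"], "TR.EV": [1.0, 2.0]})
print(missing_only_in_one(dump_1, dump_3))
```

Cells listed here were delivered in one run but not the other, so they are candidates for a targeted re-request.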

I download the data in batches of 2,500 RICs using an async function.

Code:

FIELDS = ['TR.TotRevenue3YrCAGR',  
          'TR.F.IncAftTaxMargPct3YrIntmCAGR',  
          'TR.EBITDA3YrInterimCAGR',  
          'TR.F.EBITDA5YrIntmCAGR',  
          'TR.F.NetCFOp3YrIntmCAGR',  
          'TR.F.EPSDilExclExOrdComTot3YrIntmCAGR',  
          'TR.EBITDAMarginPercent(Period=LTM)',  
          'TR.GrossMargin(Period=LTM)',  
          'TR.NetProfitMargin(Period=LTM)',  
          'TR.TotalAssetsReported(Period=FQ0)',  
          'TR.TotalEquity(Period=FQ0)',  
          'TR.AssetTurnover(Period=LTM)',  
          'TR.DilutedEpsExclExtra(Period=LTM)',  
          'TR.EV',  
          'TR.F.CAPEXTot(Period=LTM)',  
          'TR.Revenue(Period=LTM)',  
          'TR.EarningsRetentionRatePeriodToPeriodDiff',  
          'TR.F.EmpFTEEquivPrdEnd',  
          'TR.F.NetIncAfterTax(Period=LTM)']

PARAMS = {'Curn': 'USD', 'CH': 'Fd'}

response = await fundamental_and_reference.Definition(
                    universe=rics_batch,
                    fields=FIELDS,
                    parameters=PARAMS
                ).get_data_async()
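
If many such awaits run at once, the number of requests in flight can be capped with a semaphore. The following is a rough sketch of that idea, not the library's own mechanism. `fetch_batch` is a hypothetical coroutine standing in for the `get_data_async()` call above, and the values of `batch_size` and `max_in_flight` are placeholders, not documented limits.

```python
import asyncio


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def fetch_all(rics, fetch_batch, batch_size=2500, max_in_flight=2):
    """Run `fetch_batch` on every batch with at most `max_in_flight` calls at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(batch):
        async with semaphore:
            return await fetch_batch(batch)

    # gather keeps the batch order; with return_exceptions=True a failed batch
    # comes back as an exception object instead of cancelling the others.
    return await asyncio.gather(
        *(guarded(batch) for batch in chunked(rics, batch_size)),
        return_exceptions=True,
    )


async def fake_fetch(batch):
    """Hypothetical stand-in for the real request; returns the batch size."""
    await asyncio.sleep(0)
    return len(batch)


results = asyncio.run(fetch_all([f"RIC{i}" for i in range(6000)], fake_fetch))
print(results)
```

Because failures are returned rather than raised, each entry of `results` can be checked with `isinstance(entry, Exception)` and the matching batch re-queued.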

I have attached the log files for dumps 1 and 3.

Can you recommend a solution to this problem?

Answers

  • Hello @inzilya

    Please refer to this usage limits and guidelines document. Given the number of instruments and fields, you seem to be exceeding the allowed limit.

  • inzilya Newcomer
    edited September 4

    I ran the synchronous code 5 times, sending requests of 1,000 RICs every 2 seconds to explicitly stay clear of the request limits.
    The volume of data received is 3 MB.
    I received about 7,000 data points per request, although in my version there is no limit on the number of data points.

    (for example, if the request asked for 5000 instruments, but only 3000 were returned)

    All 60,000 RICs I requested came back with data, but the problem remains: the number of missing values still differs from run to run.

    image.png
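
Whether every requested RIC came back with a row can be checked separately from whether its cells are filled. The following is a small sketch, assuming the response frame carries the RIC in an `Instrument` column. The column name, the helper `missing_instruments`, and the sample RICs are assumptions for illustration.

```python
import pandas as pd


def missing_instruments(requested, df, key="Instrument"):
    """Return the requested RICs that have no row at all in the response frame."""
    returned = set(df[key]) if key in df.columns else set()
    # Preserve the request order so the result can be re-sent as a batch.
    return [ric for ric in requested if ric not in returned]


# Made-up sample data, not real results.
requested = ["AAA.N", "BBB.N", "CCC.N"]
response_df = pd.DataFrame({"Instrument": ["AAA.N", "CCC.N"], "TR.EV": [1.0, None]})
print(missing_instruments(requested, response_df))
```

An empty list here means the gaps are at the cell level rather than whole instruments being dropped, which matches the situation described above.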


    The code handles timeouts and server errors.
    Sometimes a completely empty column comes back in the response; I have a way to handle that case.
    But when data is missing from a single cell, it is very hard to catch.
    Even if I re-request specific RICs, the data for a given RIC may still not come back. In that case, how can I tell why a cell is empty? Is it because the value does not exist in Refinitiv, or because for some reason it did not reach me?
    Can you suggest methods for identifying missing values in a cell or "data point"?

    Code:

    FIELDS = ['TR.TotRevenue3YrCAGR', 'TR.F.IncAftTaxMargPct3YrIntmCAGR',
              'TR.EBITDA3YrInterimCAGR', 'TR.F.EBITDA5YrIntmCAGR',
              'TR.F.NetCFOp3YrIntmCAGR', 'TR.F.EPSDilExclExOrdComTot3YrIntmCAGR']

    response = fundamental_and_reference.Definition(
        universe=rics_batch,
        fields=FIELDS,
        parameters=PARAMS
    ).get_data()

    if response.is_success:
        ld_df = response.data.df

    Percentage of missing values per column:

    missing_values_df = pd.DataFrame(data=df.replace("", np.nan).isna().mean() * 100).T.round(3)
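
A column-level percentage does not say which cells are empty or whether they stay empty across runs. One possible approach is to merge repeated dumps and look at what is still missing afterwards. The following is a sketch under the assumption that each pass is a DataFrame with unique RICs in an `Instrument` column. The helper `merge_passes` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def merge_passes(passes, key="Instrument"):
    """Combine repeated dumps, keeping the first non-missing value seen per cell."""
    merged = None
    for df in passes:
        clean = df.set_index(key).replace("", np.nan)
        # combine_first fills the gaps in `merged` with values from `clean`.
        merged = clean if merged is None else merged.combine_first(clean)
    return merged


# Made-up sample data, not real results.
pass_1 = pd.DataFrame({"Instrument": ["AAA.N", "BBB.N"],
                       "TR.EV": [1.0, np.nan],
                       "TR.Revenue": [np.nan, np.nan]})
pass_2 = pd.DataFrame({"Instrument": ["AAA.N", "BBB.N"],
                       "TR.EV": [1.0, 2.0],
                       "TR.Revenue": [5.0, np.nan]})

merged = merge_passes([pass_1, pass_2])
still_missing = merged.isna().stack()
persistent = still_missing[still_missing].index.tolist()
print(persistent)
```

A cell that is filled in any pass was evidently a delivery gap in the others. A cell that is empty in every pass is more likely absent at the source, although this check alone cannot prove it.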
    
  • When you say synchronous, does the code wait for a response before proceeding with the next batch? Is the response taking more than 300 seconds (the server timeout limit)?

  • inzilya Newcomer

    Yes, it does. The code runs synchronously in the main thread, so it waits for the server's response and does not move on to the next batch until it receives one.
    The server responds in about 1-5 seconds, depending on the number of fields in the request and the server load.
    In all the time I have been using this library, I have never hit the 300-second timeout.

  • Can you provide the complete code and the instrument list? I will try to run it at my end.

  • inzilya Newcomer

    import pandas as pd
    from time import sleep

    import lseg.data as ld
    from lseg.data.content import fundamental_and_reference
    from lseg.data.errors import LDError


    def fetch_data(rics, fields, params):
        ld.open_session()
        df = pd.DataFrame()
        step = 1000

        # Request the RICs in batches of `step`, one batch at a time.
        for rics_list in range(0, len(rics), step):
            attempt = 0
            gotData = False
            batch = rics[rics_list: rics_list + step]
            # Try each batch up to 3 times.
            while not gotData and (attempt < 3):
                try:
                    response = fundamental_and_reference.Definition(
                        universe=batch,
                        fields=fields,
                        parameters=params
                    ).get_data()
                    ld_df = response.data.df
                    df = pd.concat([df, ld_df], ignore_index=True)
                    print(f'got - {rics_list + step}')
                    gotData = True
                    # sleep(2)
                except LDError as lde:
                    print(lde)
                except Exception as e:
                    print(e)
                sleep(5 * attempt + 10)
                attempt += 1
        ld.close_session()
        return df


    def main():
        with open('rics.txt') as file:
            rics = [line.strip() for line in file]

        FIELDS = ['TR.TotRevenue3YrCAGR', 'TR.F.IncAftTaxMargPct3YrIntmCAGR',
                  'TR.EBITDA3YrInterimCAGR', 'TR.F.EBITDA5YrIntmCAGR',
                  'TR.F.NetCFOp3YrIntmCAGR', 'TR.F.EPSDilExclExOrdComTot3YrIntmCAGR']
        PARAMS = {'Curn': 'USD', 'CH': 'Fd'}
        result = fetch_data(rics, FIELDS, PARAMS)
        result.to_csv('result.csv', header=True)


    if __name__ == '__main__':
        main()