

Inconsistent results across queries in Codebook

Hello, I was referred to the forum via Codebook support (Case 13704675, if helpful).

I am using Python 3 in Codebook within the Workspace app (1.24.159).

I have been obtaining different results in Codebook for the same query, even when the query is re-run on the same day. For example, I ran the query twice (the second ~30 minutes after the first) and obtained inconsistent results (i.e., a value that was null in one run was populated in the other). I also ran the query a third time on the same day and curiously received only 3 columns back (vs 26 columns in the other two runs), with almost all values null.

The explanation provided for the inconsistent results across the first two queries, as well as for why not all columns were returned in the third query, was API usage and limits, and I was referred to this page: https://developers.lseg.com/en/api-catalog/eikon/eikon-data-api/documentation#eikon-data-api-usage-and-limits-guideline

From my reading, given my query, any potential issue would be around "requests," not datapoints or response volume. I'm including my query below, along with a walk-through of each of the limit restrictions listed on the page above and my evaluation of each. My most basic question is what constitutes a request, and which portions of my code count as a single request. I've highlighted the areas of potential applicability with clarifying questions.

Call-based limit

  • Request per second – 5 requests per second
    • What constitutes a request? What portion of my code below constitutes a single request?
  • Response volume per minute – 50 MB per minute
    • My resulting output is a txt file with a size of 1.2 MB. It takes ~5 minutes to produce, so I don’t believe this is an issue.
  • Datapoints returned per request:
    • I am using get_data
      • No enforced limit for version 1.1.0+
      • Server timeout around 300 seconds – what does this mean?

Daily limits

  • Requests per day – 10k requests per day
    • What constitutes a request? If I run the code below 10x (each with an input list of 10k instruments), does that count as 10 requests or 100,000 requests?
  • Response volume per day – 5GB per day/50 MB per minute
    • Resulting output is 1.2MB – not an issue

A screenshot of the code being run is below. It completes without error.

sample.png

Where field_list = ['TR.RIC',
    'TR.CUSIP',
    'TR.tickersymbol',
    'TR.legalentityidentifier',
    'TR.CinCUSIPCode',
    'TR.MIC',
    'TR.exchangecountrycode',
    'TR.CommonName',
    'TR.ExchangeMarketIdCode',
    'TR.exchangename',
    'TR.exchangecountry',
    'TR.ISIN',
    'TR.ISINCode',
    'TR.CUSIPCode',
    'TR.CUSIPExtended',
    'CF_Name',
    'TR.FiIssuerName',
    'TR.GICSSectorCode',
    'TR.GICSIndustryGroupCode',
    'TR.GICSIndustryCode',
    'TR.GICSSubindustryCode',
    'TR.GICSSector',
    'TR.GICSIndustryGroup',
    'TR.GICSIndustry',
    'TR.GICSSubindustry']


And input_list1 is a list with 10k elements.


For investigation, note the following code:

import pandas as pd
import refinitiv.data as rd
rd.open_session()
def split_list(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
field_list = [
    'TR.RIC',
    'TR.CUSIP',
    'TR.tickersymbol',
    'TR.legalentityidentifier',
    'TR.CinCUSIPCode',
    'TR.MIC',
    'TR.exchangecountrycode',
    'TR.CommonName',
    'TR.ExchangeMarketIdCode',
    'TR.exchangename',
    'TR.exchangecountry',
    'TR.ISIN',
    'TR.ISINCode',
    'TR.CUSIPCode',
    'TR.CUSIPExtended',
    'CF_Name',
    'TR.FiIssuerName',
    'TR.GICSSectorCode',
    'TR.GICSIndustryGroupCode',
    'TR.GICSIndustryCode',
    'TR.GICSSubindustryCode',
    'TR.GICSSector',
    'TR.GICSIndustryGroup',
    'TR.GICSIndustry',
    'TR.GICSSubindustry']
spx = rd.discovery.Chain("0#.SPX")
spx_constituents = spx.constituents
# %%time  (Jupyter cell magic in the original notebook)
input_list1 = spx_constituents
output = list(split_list(input_list1, n=50))  # chunks of 50 RICs
full_set1 = pd.DataFrame(columns=['Instrument', 'RIC'])
count = 1
for i in output:
    pull = rd.get_data(
        universe=i,
        fields=field_list)
    print(count)
    full_set1 = pd.concat([full_set1, pull])
    count += 1
print(len(full_set1))
full_set1.to_csv('./data/full_set1d.txt', index=False)

Hi @changta, to clarify the limits:


  • Call-based limit: Request per second – 5 requests per second
    • Q: What constitutes a request? What portion of the above code constitutes a single request?
      A: Each individual call to the LSEG servers counts as one request, e.g.:


rd.get_data(
        universe=input_list1[0],  # RIC 'A.N'
        fields=field_list
    )


counts as only one request, and so does


rd.get_data(
        universe=input_list1[0:50],
        fields=field_list
    )


I believe you might be hitting this limit of 5 requests per second. To make sure that's not the case, please add a pause between calls, using something like


import time
time.sleep(1)

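The sleep above can be folded into the chunked request loop from the question. A minimal sketch, where `fetch` is any callable wrapping `rd.get_data` (the helper name and the `max_per_second` default are illustrative, not part of the library API):


import time

def fetch_throttled(fetch, chunks, max_per_second=4):
    """Call `fetch` once per chunk, pausing between calls so the rate
    stays safely under the 5-requests-per-second limit."""
    results = []
    for chunk in chunks:
        results.append(fetch(chunk))
        time.sleep(1.0 / max_per_second)  # at most ~4 calls per second
    return results


Usage here would be something like `fetch_throttled(lambda c: rd.get_data(universe=c, fields=field_list), output)`.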

  • Call-based limit: Response volume per minute – 50 MB per minute
    • My resulting output is a txt file with a size of 1.2MB. It takes ~5mins to produce this, I don’t believe this is an issue.


  • Datapoints returned per request:
    • I am using get_data
      • No enforced limit for version 1.1.0+
      • Server timeout around 300 seconds.
        Q: What does this mean?
        A: If a request requires the LSEG servers to 'work' for more than 300 seconds, it will fail. Some services supplied by LSEG APIs (e.g., Instrument Pricing Analytics) run calculations that can take time; anything over 300 seconds is considered 'too much', so we ask users to request calculations for fewer instruments per call and make repeated requests instead, limiting server throughput to ensure server health.
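If the timeout does bite, one mitigation is to retry a failing span with smaller chunks. A minimal sketch under that assumption; `fetch` is any callable taking a list of instruments (e.g. a lambda wrapping `rd.get_data`), and the helper name and parameters are illustrative, not library API:


def fetch_with_fallback(fetch, items, chunk_size=50, min_chunk=5):
    """Fetch `items` in chunks; if a chunk fails (e.g. a server timeout),
    halve the chunk size and retry that span with smaller requests.
    `fetch` could be lambda c: rd.get_data(universe=c, fields=field_list)."""
    results = []
    i = 0
    while i < len(items):
        chunk = items[i:i + chunk_size]
        try:
            results.append(fetch(chunk))
            i += chunk_size
        except Exception:
            if chunk_size <= min_chunk:
                raise  # even small chunks fail; surface the error
            chunk_size //= 2  # retry this span with smaller chunks
    return results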


  • Daily limits: Requests per day – 10k requests per day
    • Q: What constitutes a request? If I run the above code 10x (with input list each of 10k), does that count as 10 requests or 100,000 requests?
      A: In your code, one request is sent per item of the `output` object, each returning "a*b" data items, where "a" = number of fields in the request and "b" = number of instruments in that chunk. E.g.: if a chunk in `output` holds 50 instruments and `field_list` holds 25 fields, that request returns 25*50 = 1,250 data items, yet still counts as just one request. So a full run sends as many requests as there are items in `output`.
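The arithmetic above, using the numbers from the thread (10,000 instruments split into chunks of 50, against a 25-field list):


import math

instruments = 10_000
chunk_size = 50
fields = 25

requests_sent = math.ceil(instruments / chunk_size)  # one rd.get_data call per chunk
datapoints_per_request = fields * chunk_size         # "a*b" data items per call

print(requests_sent)           # 200 requests for one full run
print(datapoints_per_request)  # 1250 data items per request


So on this counting, a full pass over input_list1 is about 200 requests, and 20 passes a day would be about 4,000 requests, still under the 10k daily cap.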


  • Daily limits: Response volume per day – 5GB per day/50 MB per minute
    • Resulting output is 1.2MB – not an issue
    • I believe that, due to the large number of calls, and possibly when debugging too, you might hit this limit... How many times a day do you call this Python code?

Thank you for the thorough response, @jonathan.legrand. Additional clarifying questions below.

  • Call-based limit: Request per second – 5 requests per second
    • Follow-up: Given your two examples of what constitutes one call, a run of the full input_list takes over 6 minutes, so unless each data item being requested counts as a request, I'm not sure how I would hit this limit. I can implement time.sleep(1) if "requests per second" does not mean calls per second.

  • Datapoints returned per request:
    • Follow-up: Confirming the 300-second timeout is likely not an issue.


  • Daily limits: Requests per day – 10k requests per day
    • Follow-up: I have 25 fields in the request and 10k instruments, for 250k data items, but this is only 1 request. I run this a total of 20 times (each with 10k instruments); this would count as only 20 requests, so this limit should not apply?
  • Daily limits: Response volume per day – 5GB per day/50 MB per minute
    • I believe that, due to the large number of calls, and possibly when debugging too, you might hit this limit... How many times a day do you call this Python code?
    • As explained above, I have 20 input lists. Ideally I would run this once (from my understanding, 20 requests), but with debugging, I would estimate the maximum number of times I’ve run the full list over a day to be no more than 5 times (for 100 requests). If my understanding on what constitutes a request is correct, this should not be an issue?

Hi @changta, Thank you for your response. In this case, I believe that the issue could come from the endpoints themselves. It is rare for issues to come from this deep, but due to the large amount of data requested, it is possible. For me to investigate, would you mind sending me your logs? You can create logs as per the article "Troubleshooting IPA & RD Python Library", under the part "Switching `Logs` On".


Thanks @jonathan.legrand. I see how to display log output for each cell within the notebook; how do I create a log file? I see in the article you noted using the configuration file but I am not familiar with how to make the appropriate modifications. Can you provide detailed steps to enable this so I can produce a log file to share?



Hi @changta ,

The example configuration file can be found here.

You have the option to set initialization parameters of the Refinitiv Data Library in the refinitiv-data.config.json configuration file. This file must be located beside your notebook, in your user folder or in a folder defined by the RD_LIB_CONFIG_PATH environment variable. The RD_LIB_CONFIG_PATH environment variable is the option used by this series of examples. The following code sets this environment variable.

import os
os.environ["RD_LIB_CONFIG_PATH"] = "../../Configuration"
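To produce a log file specifically, one approach is to write a minimal refinitiv-data.config.json beside the notebook. This is a sketch: the key names below follow the library's example configuration file, so verify them against that example before relying on them.


import json

# Key names follow the library's example refinitiv-data.config.json;
# double-check against the example file linked above.
config = {
    "logs": {
        "level": "debug",
        "transports": {
            "console": {"enabled": False},
            "file": {"enabled": True, "name": "refinitiv-data-lib.log"},
        },
    }
}

# Place the file beside the notebook (or under RD_LIB_CONFIG_PATH) so the
# library reads it when the session opens.
with open("refinitiv-data.config.json", "w") as f:
    json.dump(config, f, indent=2)


After writing the file, restart the kernel and re-open the session so the configuration is picked up.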


Thank you @raksina.samasiri. @jonathan.legrand is there a way I can send you logs securely? I see the logs include my input, which I would like to remain nonpublic.



Hello @jonathan.legrand - following up to see how I can share logs (securely) for further investigation? Thanks in advance.
