question

Upvotes
Accepted
1 0 1 2

How to efficiently and briskly retrieve datapoints for all SNP 500 RICs

Hi everyone,

I am trying to retrieve some Fundamental and Reference data for all of the SNP 500 stocks. These datapoints include values such as Income before Extraordinary Items, EBIT, Capital, etc. Now, I am trying to asynchronously retrieve all of this data. At first, I tried to loop ethrough the list of the RICs of the SNP, and from there, retrieve the data for each RIC one at a time (by looping with a for loop). However, this was rather inefficient, and it took about 9 minutes to run.

Now, I tried to separate each call into multiple tasks and I used asyncio.gather() to run all of these requests concurrently.

import asyncio
import pandas as pd
from datetime import datetime

async def fetch_ebit_data(ric, start_year, current_year):
    try:
        ebit_datapoints = await content.fundamental_and_reference.Definition(
            universe=[f'{ric}'],
            fields=["TR.F.EBIT.date", "TR.F.EBIT"],
            parameters={"SDate": f"{start_year}-01-01", "EDate": f"{current_year}-12-31", "Frq": "Y"}
        ).get_data_async(closure=f'')
        
        # Check if the response contains data
        if ebit_datapoints and not ebit_datapoints.data.df.empty:
            return ebit_datapoints.data.df
        else:
            print(f"No data returned for {ric}")
            return None
        
    except Exception as e:
        print(f"Error retrieving data for {ric}: {str(e)}")
        return None

async def retrieve_ebit_data(list_of_snp_rics, start_year, current_year):
    tasks = [fetch_ebit_data(ric, start_year, current_year) for ric in list_of_snp_rics]
    ebit_values_list = await asyncio.gather(*tasks)
    
    # Filter out None values (i.e., failed or empty responses)
    ebit_values_list = [df for df in ebit_values_list if df is not None]
    
    combined_ebit_df = pd.concat(ebit_values_list, ignore_index=True)
    display(combined_ebit_df)
    return combined_ebit_df

# Setup
current_year = datetime.today().year
start_year = current_year - 8
list_of_snp_rics = await retrieve_ric_snp_stocks()

# Retrieve and combine EBIT data
combined_ebit_df = await retrieve_ebit_data(list_of_snp_rics, start_year, current_year)

The problem with this method is that, I couldn't get any values for MAJORITY of the tickers / RICs. So currently, I am faced with a problem as I am unsure of a workaround in this situation. One thing I can think of is to use the ThreadPoolExecutor for the number of RICs there are (deploying about 503 threads), but this might be a really CPU-intensive process. I need some guidance on how I can navigate through this, as I can then apply it to retrieving multiple datapoints (EBIT and Capital for example).


Thanks!

python#technologyrdp-apirefinitiv-data-platformdatardpjupyter-notebook
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Upvotes
Accepted
7.1k 18 2 8

Hi @vishal.nanwani ,


I don't think you need to use asynchronous calls for the task you are describing above. Using get_data or content.fundamental_and_reference.Definition function and passing all constituent rics (or Index chain) to the function will produce the result you are after in about a second. See below an example:

1. with index chain:

spx = rd.get_data("0#.SPX", fields  =["TR.F.EBIT.date", "TR.F.EBIT"])
spx

2. by passing all rics:

spx_constituents = rd.get_data("0#.SPX", fields  =["TR.CompanyName"])
df = rd.get_data(universe=spx_constituents['Instrument'].to_list(), fields = ["TR.F.EBIT.date", "TR.F.EBIT"])
df


screenshot-2024-08-29-at-120712.png

If you are using other fields for which you don't get values, best would be to reach the content team via HelpDesk in Workspace or via my.refinitiv.com. They will help you identify the correct fields as not getting values for the fields I don't think is related the API calls.


Best regards,

Haykaz


icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Upvotes
1 0 1 2

Hi @aramyan.h ,


1. Is it possible to do this for the past 8 years? Something like this:

response = await content.fundamental_and_reference.Definition(
                universe=[f'{ric}'],
                fields=specific_fields,
                parameters={"SDate": f"{start_year}-01-01", "EDate": f"{current_year}-12-31", "Frq": "Y"}
            ).get_data_async(closure='');

2. With the method described above, I get usually 2-3/10 RICs that have empty values for some of the data I am trying to pull , for specific years. I am wondering,

a. why does this happen in the first place? and is there a workaround if I want to get all valid values of annual data for different datapoints for the last eight years?

b. can this be prevented with your solution?


Thanks!

icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Upvotes
7.1k 18 2 8

Hi @vishal.nanwani ,


Yes, you can request the history, see below:

spx_constituents = rd.get_data("0#.SPX", fields  =["TR.CompanyName"])
df = rd.get_data(universe=spx_constituents['Instrument'].to_list(), fields = ["TR.F.EBIT.date", "TR.F.EBIT"], parameters={"SDate": "2018-01-01", "EDate": "2024-12-31", "Frq": "Y"})
df

screenshot-2024-09-02-at-133137.png

The content should be same under async and non-async, however, for empty values you can raise a content question via my.refinitiv.com.


Best regards,

Haykaz


icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.