Hi everyone,
I am trying to retrieve some Fundamental and Reference data for all of the S&P 500 stocks, including datapoints such as Income before Extraordinary Items, EBIT, and Capital, and I would like to do this asynchronously. At first, I looped through the list of S&P 500 RICs with a for loop and retrieved the data for one RIC at a time. This was rather inefficient: it took about 9 minutes to run.
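A per-RIC loop sends one request per instrument, roughly 503 in total. The `universe` argument of `content.fundamental_and_reference.Definition` is already a list, so one possible alternative is to send several RICs per request. The sketch below only shows the batching step. The helper name `chunked` and the batch size of 100 are assumptions for illustration, not documented limits, so check the API's per-request limits before picking a size.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each.

    Hypothetical helper: each yielded batch could be passed as
    `universe=batch` to a single Definition(...) request instead of
    issuing one request per RIC.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With 503 RICs and a batch size of 100, this produces six requests (five of 100 RICs and one of 3) instead of 503.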
Next, I wrapped each request in its own task and used asyncio.gather() to run all of the requests concurrently:
import asyncio
from datetime import datetime

import pandas as pd
from refinitiv.data import content  # assumes a session was opened earlier, e.g. rd.open_session()


async def fetch_ebit_data(ric, start_year, current_year):
    try:
        response = await content.fundamental_and_reference.Definition(
            universe=[ric],
            fields=["TR.F.EBIT.date", "TR.F.EBIT"],
            parameters={"SDate": f"{start_year}-01-01", "EDate": f"{current_year}-12-31", "Frq": "Y"},
        ).get_data_async()
        # Check that the response actually contains data
        df = response.data.df if response else None
        if df is not None and not df.empty:
            return df
        print(f"No data returned for {ric}")
        return None
    except Exception as e:
        print(f"Error retrieving data for {ric}: {e}")
        return None


async def retrieve_ebit_data(list_of_snp_rics, start_year, current_year):
    tasks = [fetch_ebit_data(ric, start_year, current_year) for ric in list_of_snp_rics]
    ebit_values_list = await asyncio.gather(*tasks)
    # Filter out None values (i.e., failed or empty responses)
    ebit_values_list = [df for df in ebit_values_list if df is not None]
    if not ebit_values_list:
        # pd.concat raises ValueError when given an empty list
        return pd.DataFrame()
    combined_ebit_df = pd.concat(ebit_values_list, ignore_index=True)
    display(combined_ebit_df)  # IPython/Jupyter helper
    return combined_ebit_df


# Setup (top-level await works in Jupyter; in a plain script, wrap this in asyncio.run())
current_year = datetime.today().year
start_year = current_year - 8
list_of_snp_rics = await retrieve_ric_snp_stocks()  # helper defined elsewhere that returns the S&P 500 RICs

# Retrieve and combine EBIT data
combined_ebit_df = await retrieve_ebit_data(list_of_snp_rics, start_year, current_year)
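Gathering all ~503 coroutines at once puts ~503 requests in flight simultaneously. If the service throttles that (an assumption worth checking against its documented rate limits), many responses could come back empty or as errors. One hedged way to test this is to cap the number of in-flight requests with asyncio.Semaphore. In the sketch below, `bounded_gather`, `make_fake_fetch`, and the default limit of 10 are hypothetical; the fake fetch only sleeps, so the sketch runs without a session.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, TypeVar

T = TypeVar("T")


async def bounded_gather(
    rics: List[str], fetch: Callable[[str], Awaitable[T]], limit: int = 10
) -> List[T]:
    """Run fetch(ric) for every RIC, with at most `limit` calls in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(ric: str) -> T:
        # Only `limit` coroutines can hold the semaphore at the same time
        async with semaphore:
            return await fetch(ric)

    # gather() returns results in the same order as `rics`
    return await asyncio.gather(*(guarded(r) for r in rics))


def make_fake_fetch(stats: Dict[str, int]) -> Callable[[str], Awaitable[str]]:
    """Hypothetical stand-in for fetch_ebit_data that records peak concurrency."""

    async def fake_fetch(ric: str) -> str:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # mimic network latency
        stats["in_flight"] -= 1
        return ric

    return fake_fetch
```

With the real function, the fetch argument could be something like `lambda ric: fetch_ebit_data(ric, start_year, current_year)`. In a notebook, use `await bounded_gather(...)` directly, because asyncio.run() cannot be called while Jupyter's event loop is already running.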
The problem with this approach is that no values come back for the majority of the RICs, and I am unsure how to work around it. One option I can think of is a ThreadPoolExecutor with one thread per RIC (about 503 threads), but I am worried that would be very CPU-intensive. I would appreciate some guidance on how to approach this, since I plan to apply the same pattern to retrieving several datapoints (for example, EBIT and Capital).
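Regarding the thread idea: threads blocked on network I/O spend most of their time waiting rather than computing, so 503 of them would mostly cost memory and would still send 503 concurrent requests to the same service. A small, fixed-size pool avoids both. This is a sketch under stated assumptions: `fetch_one` is a hypothetical blocking stand-in (in practice it might wrap a synchronous `.get_data()` call on the same Definition object), and `max_workers=8` is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def fetch_one(ric: str) -> Optional[Dict[str, str]]:
    """Hypothetical blocking stand-in for a synchronous per-RIC request."""
    time.sleep(0.01)  # mimic network latency; the thread waits here, it is not computing
    return {"ric": ric}


def fetch_all(rics: List[str], max_workers: int = 8) -> List[Dict[str, str]]:
    # A small pool reuses the same few threads instead of creating one per RIC
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_one, rics))  # map() preserves input order
    # Drop failed or empty responses, mirroring the None filter in the async version
    return [r for r in results if r is not None]
```

The pool size bounds concurrency in the same way the semaphore does for asyncio, so either approach can be combined with sending batches of RICs per request.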
Thanks!