Hi everyone,
I am trying to retrieve Fundamental and Reference data for all of the S&P 500 stocks. The datapoints include values such as Income before Extraordinary Items, EBIT, Capital, etc., and I would like to retrieve all of them asynchronously. At first, I looped through the list of S&P 500 RICs with a for loop and retrieved the data for one RIC at a time. However, this was rather inefficient: it took about 9 minutes to run.
Next, I turned each call into its own task and used asyncio.gather() to run all of the requests concurrently:
import asyncio
import pandas as pd
from datetime import datetime

async def fetch_ebit_data(ric, start_year, current_year):
    try:
        ebit_datapoints = await content.fundamental_and_reference.Definition(
            universe=[f'{ric}'],
            fields=["TR.F.EBIT.date", "TR.F.EBIT"],
            parameters={"SDate": f"{start_year}-01-01", "EDate": f"{current_year}-12-31", "Frq": "Y"}
        ).get_data_async(closure=f'')

        # Check if the response contains data
        if ebit_datapoints and not ebit_datapoints.data.df.empty:
            return ebit_datapoints.data.df
        else:
            print(f"No data returned for {ric}")
            return None
    except Exception as e:
        print(f"Error retrieving data for {ric}: {str(e)}")
        return None

async def retrieve_ebit_data(list_of_snp_rics, start_year, current_year):
    tasks = [fetch_ebit_data(ric, start_year, current_year) for ric in list_of_snp_rics]
    ebit_values_list = await asyncio.gather(*tasks)

    # Filter out None values (i.e., failed or empty responses)
    ebit_values_list = [df for df in ebit_values_list if df is not None]

    combined_ebit_df = pd.concat(ebit_values_list, ignore_index=True)
    display(combined_ebit_df)
    return combined_ebit_df

# Setup
current_year = datetime.today().year
start_year = current_year - 8
list_of_snp_rics = await retrieve_ric_snp_stocks()

# Retrieve and combine EBIT data
combined_ebit_df = await retrieve_ebit_data(list_of_snp_rics, start_year, current_year)
The problem with this method is that I couldn't get any values for the MAJORITY of the tickers / RICs, and I am unsure of a workaround. One thing I can think of is to use a ThreadPoolExecutor with one thread per RIC (about 503 threads), but this might be a really CPU-intensive process. I need some guidance on how to navigate this, so that I can then apply it to retrieving multiple datapoints (EBIT and Capital, for example).
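Another option I am considering, instead of 503 threads, is to keep asyncio but cap how many requests are in flight at once. This is only a minimal sketch, and it assumes (I have not confirmed this) that the missing values come from sending too many requests simultaneously. The names gather_with_limit, max_concurrent, and fake_fetch are my own placeholders, not part of the data library; fake_fetch just stands in for fetch_ebit_data so the snippet runs without a session.

```python
import asyncio

async def gather_with_limit(items, fetch, max_concurrent=10):
    """Run fetch(item) for every item, with at most max_concurrent running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(item):
        # Wait for a free slot before starting the request
        async with semaphore:
            return await fetch(item)

    # gather() returns the results in the same order as items
    return await asyncio.gather(*(bounded(item) for item in items))

# Hypothetical stand-in for fetch_ebit_data (no real data request is made)
async def fake_fetch(ric):
    await asyncio.sleep(0.01)
    return f"data for {ric}"

results = asyncio.run(
    gather_with_limit(["AAPL.O", "MSFT.O", "IBM.N"], fake_fetch, max_concurrent=2)
)
print(results)
```

In a notebook, where an event loop is already running, I would presumably replace asyncio.run(...) with something like: await gather_with_limit(list_of_snp_rics, lambda ric: fetch_ebit_data(ric, start_year, current_year)). I do not know what a sensible value for max_concurrent would be.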
Thanks!