Hey everyone,
I'm working on a project in a CODEBOOK workspace where I need to pull a large amount of fundamental data (322 fields) for various companies, identified by their RICs, via the Refinitiv Eikon API. The problem is that the process takes far too long, and I frequently run into gateway timeouts (error code 2504).
- Long processing time: the entire run is very slow.
- Gateway timeouts: frequent 2504 errors, especially with larger batches.
EXPORT_FIELDS = [
"TR.F.TotRevPerShr(Curn=USD)",
"TR.F.SalesOfGoodsSrvcNet(Curn=USD)",
"TR.Revenue(Curn=USD)",
"TR.F.COGS(Curn=USD)",
"TR.F.COGSUNCLASSIF(Curn=USD)",
"TR.F.OpExpnTot(Curn=USD)",
"TR.F.SGATot(Curn=USD)",
"TR.F.EBITDANorm(Curn=USD)",
"TR.F.DeprDeplAmortTot(Curn=USD)",
"TR.F.RnD(Curn=USD)",
# 311 more fields...
]
import time
import traceback as tb
from datetime import datetime

import eikon as ek

ek.set_app_key('DEFAULT_CODE_BOOK_APP_KEY')

def process_single_company(company):
    max_attempts = 4
    attempts = 0
    while attempts < max_attempts:
        try:
            print(f"Fetching data for {company} (Attempt {attempts + 1})")
            fundamentals_data, e = ek.get_data(
                instruments=[company],
                fields=["TR.RevenueActReportDate", "TR.F.StatementDate", *EXPORT_FIELDS],
                parameters={
                    'SDate': '1940-01-01',
                    'EDate': datetime.now().strftime("%Y-%m-%d"),
                    'Frq': 'Y'
                }
            )
            if e:
                print(f"Error fetching data for {company}: {e}")
                time.sleep(2)  # Pause before retry
                attempts += 1
                continue
            print(f"Saving data for {company}")
            fundamentals_data.to_csv(file_path)  # file_path is built per company elsewhere
            print(f"Finished processing {company}. Moving to the next RIC.\n")
            return
        except Exception:
            print(f"Failed to fetch data for {company} due to: {tb.format_exc()}")
            attempts += 1
            time.sleep(2)
    print(f"Giving up on {company} after {max_attempts} attempts.\n")

for company in COMPANY:
    process_single_company(company)
What I have tried so far:
- Batching the requests to process multiple companies or multiple fields at once.
- Processing fields one by one to handle local errors.
- Implementing rate limiting.
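For reference, the field-batching attempt was roughly along these lines. The chunk size of 50 is arbitrary, and the actual ek.get_data call per chunk (plus merging the chunks on the statement date) is omitted; chunk_fields is just a helper I wrote, not part of the Eikon API:

```python
# Split the 322 fields into smaller chunks so each ek.get_data call
# requests fewer columns, hoping to stay under the gateway's limits.
def chunk_fields(fields, chunk_size=50):
    """Yield successive chunks of the field list."""
    for i in range(0, len(fields), chunk_size):
        yield fields[i:i + chunk_size]

# Illustration with placeholder field names:
fields = [f"TR.Field{i}" for i in range(322)]
chunks = list(chunk_fields(fields))
print(len(chunks))      # 7 requests instead of 1 huge one
print(len(chunks[-1]))  # 22 fields in the final chunk
```

Each chunk was fetched in a separate request with a short pause between calls, which reduced but did not eliminate the 2504 errors.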
How can I speed up the process? Am I missing a better way to retrieve this volume of data, or are there any tips for handling large requests without hitting gateway timeouts?
Thank you!