I am trying to pull the following fields for a set of RICs across about two decades of annual data, using one pair of column suffixes per merge step:

```python
fields = ['TR.InvestorFullName', 'TR.InvestorFullName.investorid', 'TR.InvestorFullName.investorpermid',
          'TR.CategoryOwnershipPct', 'TR.InvestorType', 'TR.InvParentType',
          'TR.PctOfSharesOutHeld', 'TR.PctofSharesOutHeld.Date', 'TR.InvAddrCountry',
          'TR.OwnTrnverRating', 'TR.OwnTurnover', 'TR.OwnTurnover.date', 'TR.InvInvestmentStyleCode']

suffixes_list = [('_1999', '_2000'), ('_2001', '_2001'), ('_2001', '_2002'), ('_2003', '_2003'),
                 ('_2003', '_2004'), ('_2005', '_2005'), ('_2005', '_2006'), ('_2007', '_2007'),
                 ('_2007', '_2008'), ('_2009', '_2009'), ('_2009', '_2010'), ('_2011', '_2011'),
                 ('_2011', '_2012'), ('_2013', '_2013'), ('_2013', '_2014'), ('_2015', '_2015'),
                 ('_2015', '_2016'), ('_2017', '_2017'), ('_2017', '_2018'), ('_2019', '_2019'),
                 ('_2019', '_2020'), ('_2021', '_2021'), ('_2021', '_2022')]
```

I run the chunk below once per RIC, changing the RIC index in my list each time.
```python
import eikon as ek
import pandas as pd

# Assumes ek.set_app_key(...) has already been called, `sortedrics` is my sorted
# list of RICs, `fields` holds the field list above and `suffixes_list` is defined above.

# Chunk for the RIC at index 238
instruments = sortedrics[238]  # adjust the index for each RIC

data_frames = []
for i in range(-24, -1):  # one annual snapshot per relative period
    s_date = str(i)
    e_date = str(i)
    df, err = ek.get_data(instruments, fields,
                          {'SDate': s_date, 'EDate': e_date, 'Frq': 'Y'})
    df = df.astype(str)  # convert every column to string
    print(s_date, e_date)
    data_frames.append(df)

# Outer-merge all yearly frames on the investor name
merged_df = data_frames[0]
for i in range(1, len(data_frames)):
    merged_df = pd.merge(merged_df, data_frames[i], on='Investor Full Name',
                         how='outer', suffixes=suffixes_list[i - 1])

merged_df.to_csv(f'{instruments}.csv')
```
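A possible contributor, offered as an assumption rather than a confirmed diagnosis: an outer merge on `'Investor Full Name'` multiplies rows whenever a name appears more than once in both frames. Rows with a missing name are a likely source, because pandas matches missing keys against each other (and after `astype(str)` they usually all become the string `'nan'`). The toy data below is hypothetical and only illustrates the effect; `duplicate_key_report` is an illustrative helper, not part of the Eikon API.

```python
import numpy as np
import pandas as pd

KEY = "Investor Full Name"


def duplicate_key_report(df: pd.DataFrame, key: str = KEY) -> pd.Series:
    """Return the key values that occur more than once, with their counts."""
    counts = df[key].value_counts(dropna=False)  # keep missing keys in the count
    return counts[counts > 1]


# Hypothetical yearly snapshots: one named investor plus three rows whose
# name was missing before astype(str) was applied.
year_a = pd.DataFrame({KEY: ["Fund A", np.nan, np.nan, np.nan],
                       "Pct": [1.0, 0.1, 0.2, 0.3]}).astype(str)
year_b = pd.DataFrame({KEY: ["Fund A", np.nan, np.nan, np.nan],
                       "Pct": [1.5, 0.4, 0.5, 0.6]}).astype(str)

print(duplicate_key_report(year_a))

# The three unnamed rows on each side match each other: 3 x 3 = 9 rows,
# plus 1 row for "Fund A".
merged = pd.merge(year_a, year_b, on=KEY, how="outer", suffixes=("_a", "_b"))
print(len(year_a), len(year_b), len(merged))  # 4 4 10
```

With 23 yearly frames this growth compounds over 22 successive merges, so a RIC with many unnamed or repeated investors could plausibly exhaust memory even though each individual request is small.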
I get memory errors and am wondering whether there are any suggestions for mitigating them. I tried running this request in a loop, but the API allows only 5 minutes per request, so that did not work well for me. Thanks.
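One way to reduce memory use, sketched here as an alternative layout rather than a fix for the original code: instead of 22 successive wide outer merges, tag each yearly frame with its period and stack them once with `pd.concat`. The result has one row per investor per period, so its size grows linearly with the number of periods. `fetch_year` is a hypothetical callable you would write around the existing `ek.get_data` call, and the `Period` column name is likewise an assumption.

```python
from typing import Callable, Iterable, Optional

import pandas as pd


def collect_long(
    fetch_year: Callable[[int], Optional[pd.DataFrame]],
    offsets: Iterable[int],
) -> pd.DataFrame:
    """Fetch one snapshot per relative period and stack them (long format)."""
    frames = []
    for offset in offsets:
        df = fetch_year(offset)
        if df is None or df.empty:
            continue  # skip periods where the request returned nothing
        frames.append(df.assign(Period=offset))  # remember which period each row is from
    if not frames:
        return pd.DataFrame()
    # A single concat instead of repeated merges: the row count is the sum of
    # the per-period row counts, with no cross-matching between periods.
    return pd.concat(frames, ignore_index=True)


# Hypothetical wrapper around the request from the question (not executed here):
# def fetch_year(offset):
#     df, err = ek.get_data(ric, fields,
#                           {'SDate': str(offset), 'EDate': str(offset), 'Frq': 'Y'})
#     return df
#
# long_df = collect_long(fetch_year, range(-24, -1))
# long_df.to_csv(f'{ric}_long.csv', index=False)
```

If the wide layout is still needed afterwards, the long frame can be pivoted on a de-duplicated key (for example with `pivot_table(..., aggfunc='first')`). Writing each RIC's frame to CSV inside the RIC loop and then discarding it also keeps only one RIC in memory at a time.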
To clarify: the API call itself returns the data; the problem occurs at the merge step, and it is not consistent. After a memory error, rerunning the same chunk sometimes works fine, and the behavior varies from one RIC to another.
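Since the failure shows up at the merge step and varies by RIC, it may help to make the merge fail fast when keys are not unique, rather than letting it grow until memory runs out. The sketch below (helper names are hypothetical) drops blank or missing names and duplicate names before merging, and passes pandas' `validate='one_to_one'`, which raises `pandas.errors.MergeError` if duplicates remain. Dropping duplicates with `keep='first'` discards rows, so whether that is acceptable depends on the data; merging on the investor ID already requested (`TR.InvestorFullName.investorid` or `.investorpermid`) instead of the name may be a more reliable key, although its column header in the returned frame is not shown here.

```python
import pandas as pd

KEY = "Investor Full Name"


def clean_keys(df: pd.DataFrame, key: str = KEY) -> pd.DataFrame:
    """Drop rows with a blank or missing key, then keep one row per key."""
    keys = df[key].astype(str).str.strip()
    # Catch real missing values as well as their string forms after astype(str).
    mask = df[key].notna() & ~keys.isin(["", "nan", "None", "<NA>"])
    return df[mask].drop_duplicates(subset=key, keep="first")


def safe_merge(left: pd.DataFrame, right: pd.DataFrame, suffixes) -> pd.DataFrame:
    # validate="one_to_one" raises MergeError if either side still has duplicate
    # keys, instead of silently building a many-to-many product.
    return pd.merge(left, right, on=KEY, how="outer",
                    suffixes=suffixes, validate="one_to_one")
```

In the original loop this would mean calling `clean_keys` on each frame before appending it to `data_frames` and using `safe_merge` in place of `pd.merge`.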