Given the exmaple dataframe below (real dataframe is more than few thousand rows), what is the most efficient way to obtain the maximum high price and minimum low price given the ricCode and startDate and endDate of each row of the dataframe ?
{'ricCode': {0: '0.0700.HK', 2: '0700.HK', 3: '9988.HK, 6: '2388.HK', 7: 'NVDA.O, 8: 'AMD.O'}, 'startDate': {0: '2023-03-13', 2: '2023-03-16', 3: '2023-05-03', 6: '2024-01-29', 7: '2025-01-06', 8: '2024-11-19'}, 'endDate': {0: '2023-03-13', 2: '2023-03-16', 3: '2023-05-03', 6: '2024-01-29', 7: '2025-01-06', 8: '2024-11-19'}}
Currently, I am using a for loop to iterate through each row of data frame. This is not only slow and it sometimes doesn't even return any output and just got stuck at times, Would appreciate a more efficient and error-prone way to do this, given the the start date and end date are different each row.
def get_rd_data(row):
try:
rd_out = rd.get_data(universe=row['ricCode'],fields=['MAX(TR.HIGHPRICE)','MIN(TR.LOWPRICE)'], parameters={'SDate':row[‘startDate'], 'EDate':row['endDate'], 'Frq':'D','Adjusted':1}) # adjusted already
except:
out = get_rd_data(row)
return out
output = pd.DataFrame()
for idx, row in data.iterrows():
output = pd.concat([output,get_rd_data(row)], axis=0)