Hey there,
I am using the Datastream API in Python to retrieve historical end-of-month prices for all constituents of the Euro Stoxx 50 index over a selected period. The problem is that the constituents have changed over the years, so my initial thought was to first build a dictionary that holds the constituents for every month in the period, and then make a second Datastream request based on the constituents listed in each month. Unfortunately, I've already reached my data limit for this month because I ran my code multiple times, and it seems my code is quite inefficient, retrieving the names and prices of 50 constituents for every single month.
My code is the following:
import pandas as pd
import PyDSWS as dsws
from datetime import date
#%%
ds = dsws.Datastream(username="xxxx",password="yyyy")
#%%
start_date = date(2018, 1, 31)
end_date = date(2023, 1, 31)
#%%
# Month-end dates for every month in the selected period
date_range = pd.date_range(start=start_date, end=end_date, freq='M').strftime('%Y-%m-%d')
#%%
# Request the index constituents (as ISINs) once per month-end date
data_dict = {}
for dt in date_range:
    data = ds.get_data('LDJES50I0118', fields=['ISIN'], date=dt)
    data_dict[dt] = data
#%%
# Flatten each returned one-column DataFrame into a plain list of ISINs
data_dict_list = {k: v.values.ravel().tolist() for k, v in data_dict.items()}
#%%
# Request the month-end price for each month's constituents
prices_dict = {}
for dt, securities in data_dict_list.items():
    prices = ds.get_data(securities, fields=['P'], start=dt, end=dt)
    prices_dict[dt] = prices
#%%
# Stack all the monthly price frames into one DataFrame
df = pd.concat(prices_dict.values(), axis=0)
I know there have been similar questions on this platform, but I couldn't find a case where someone kept the changing constituents in mind. Does anyone have an idea of how to make this code more efficient so that I don't hit my data limit after just a couple of runs? Maybe by storing the requested results, or anything similar?
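For example, I imagined something along these lines, where the constituent lists are saved to disk after they have been fetched once, so that re-running the script doesn't repeat the same requests (the cache file name and the pickle format are just placeholders I picked for illustration):
import os
import pickle

CACHE_FILE = 'constituents_cache.pkl'  # placeholder file name

# Load previously fetched constituent lists if a cache exists,
# otherwise start fresh
if os.path.exists(CACHE_FILE):
    with open(CACHE_FILE, 'rb') as f:
        data_dict = pickle.load(f)
else:
    data_dict = {}

for dt in date_range:
    if dt in data_dict:  # already fetched on an earlier run -> no new request
        continue
    data_dict[dt] = ds.get_data('LDJES50I0118', fields=['ISIN'], date=dt)

# Persist the (possibly extended) cache for the next run
with open(CACHE_FILE, 'wb') as f:
    pickle.dump(data_dict, f)
Would that be a sensible direction, or is there a better way?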
Thank you in advance for your help!