Hi want download price from list of code. But the datframe Price is empty
CodiceDatastream
MSACWFELBEURSELBGOBL$LGBDGCEIT0004782758LU1694212348FR0010149120LU0452734238LU0764816798import numpy as np
import pandas as pd
import DatastreamDSWS as DSWS
ds = DSWS.Datastream(username="xxxxxxxxxxxx", password= "xxxxxxxxxxxx")
Codici = pd.read_excel('AnagraficaCodici.xlsx')
Codici['DatastreamRICs'] = Codici['CodiceDataStream']
ds_rics_str = Codici['DatastreamRICs'].str.cat(sep=',')
Price = ds.get_data(ds_rics_str, fields=['X(P)~EUR'], start='-5Y', end='0D', freq='D')
Hi @andrea.negri,
Hope Jirapongse answer was helpful, I'm adding code snippets for few of the RICs below assuming start and end day by my own using which you can read the RICs name from the file, and request for the data from DSWS API:
# Installing Packages import DatastreamDSWS as DSWS import pandas as pd import numpy as np import datetime from IPython.display import display pd.set_option('display.max_columns', None) pd.set_option('display.max_rows', None) # Loading RICs name from file stock_list = list(pd.read_excel("Sample.xlsx")['CodiceDataStream']) # Declaring Variables ds = DSWS.Datastream(username ="******", password='******') fields = ['(X)~EUR'] final_df = pd.DataFrame() frames = [] year = 2018 start_day = 1 end_day = 20 month = 10 start_date = datetime.datetime(year, month, start_day).strftime('%Y-%m-%d') end_date = datetime.datetime(year, month, end_day).strftime('%Y-%m-%d') for stock in stock_list: df = ds.get_data(tickers=stock,fields=fields,start=start_date, end=end_date)[stock].reset_index().droplevel('Currency', axis=1) df['Instrument'] = stock frames.append(df) temp_df = pd.concat(frames) final_df = temp_df.pivot_table(index='Dates', columns='Instrument', values=fields) final_df.columns = final_df.columns.swaplevel(0, 1) final_df.sort_index(axis=1, level=0, inplace=True) final_df.head()
Below is the output I got:
Please change the parameters like fields, and start/end day based on the data you need.
Adding complete code for @Jirapongse answers as well, in case you need it:
# Installing Packages import DatastreamDSWS as DSWS import pandas as pd import numpy as np import datetime from IPython.display import display pd.set_option('display.max_columns', None) pd.set_option('display.max_rows', None) # Loading RICs name from file stock_list = ",".join(pd.read_excel("Sample.xlsx")['CodiceDataStream']) # Declaring Variables and Session Initialization ds = DSWS.Datastream(username ="******", password='******') fields = ['(X)~EUR'] # Load Data df = ds.get_data(",".join(stock_list), fields=['(X)~EUR'], start='-1Y', end='0D', freq='D').reset_index().droplevel('Currency', axis=1) # Top 5 rows data df.head()
The second code show this error:
'NoneType' object has no attribute 'reset_index'
I have the problem in first code when code don't exist (example LU2348335964). The code in for cycle generate error
Hi @andrea.negri,
What is the in the excel sheet (post it here). Does the get_data API work, when you manually input the RICS from the excel sheet?
I can't attached the file there is an error in upload. In the file there are only two columns:
I assume that you retrieved data for the following instruments.
MSACWFE,LBEURSE,LBGOBL$,LGBDGCE,IT0004782758,LU1694212348,FR0010149120,LU0452734238,LU0764816798
MSACWFE, LBEURSE, LBGOBL$, and LGBDGCE don't have data for the P data type.
LU1694212348 has data from 10/24/2017.
You can use the following code to get data from the default data type.
Price = ds.get_data(ds_rics_str, fields=['(X)~EUR'], start='-5Y', end='0D', freq='D')
The output is:
Hi @andrea.negri,
I loaded the excel file and found for a few ISIN, there might be the possibility that data is not available for it, & it was also pointed out by @Jirapongse in his answer, so in that case, it will through an error if you will try to get data for all ISIN at once which is in the case of my previous answer.
To handle the exception here I have tried loading data for each ISIN individually, and if it throws an error that means there is no data available for it, so for those ISIN, I have created a separate dataframe.
Please find the code below for it:
# Installing Packages import DatastreamDSWS as DSWS import pandas as pd import numpy as np import datetime from IPython.display import display pd.set_option('display.max_columns', None) pd.set_option('display.max_rows', None) # Loading RICs name from file stock_list = list(pd.read_excel("Dati.xlsx")['Codice_ISIN']) # Declaring Variables ds = DSWS.Datastream(username ="*****", password='*****') fields = ['(X)~EUR'] final_df = pd.DataFrame() frames = [] err_frames = [] year = 2018 start_day = 1 end_day = 20 month = 10 start_date = datetime.datetime(year, month, start_day).strftime('%Y-%m-%d') end_date = datetime.datetime(year, month, end_day).strftime('%Y-%m-%d') # Loading Data for stock in stock_list: try: df = ds.get_data(tickers=stock, fields=fields, start=start_date, end=end_date)[stock].reset_index().droplevel('Currency', axis=1) df['Instrument'] = stock frames.append(df) # Dataframes of those ISIN for which data is available except: df = ds.get_data(tickers=stock,fields=fields,start=start_date, end=end_date) df['Instrument'] = stock err_frames.append(df) # Dataframes of those ISIN for which data is not available
# Processing ISIN with data temp_df = pd.concat(frames) final_df = temp_df.pivot_table(index='Dates', columns='Instrument', values=fields) final_df.columns = final_df.columns.swaplevel(0, 1) final_df.sort_index(axis=1, level=0, inplace=True) final_df.head()
# Processing ISIN without data err_temp_df = pd.concat(err_frames) err_temp_df.head()
When I ran this code for your file, I found for 2844 ISIN, data was available for the period which I have used in the code, and for 371 ISIN, data was not available.
Also please note, that loading data individually for each ISIN will be time taking process, so you may use threading functionality in python to speed up the execution time.
Nice!! The code work.
One question I don't know this threading functionality in python. You could explain me if is possible for you.
Hi @andrea.negri,
The error that you were getting with the second code was because of the DSWS limit. You can read more about DSWS User Stats and Limits here.
So I have written an alternative code to split the request which can load data for 50 ISIN (instruments) at a time, and will be a little faster than the previous code:
# Installing Packages import DatastreamDSWS as DSWS import pandas as pd from IPython.display import display pd.set_option('display.max_columns', None) pd.set_option('display.max_rows', None) # Loading RICs name from file stock_list = list(pd.read_excel("Dati.xlsx")['Codice_ISIN']) # Declaring Variables ds = DSWS.Datastream(username ="*****", password='*****') fields = ['(X)~EUR'] final_df = pd.DataFrame() frames = [] # Load Data for i in range(0, len(stock_list), 50): stock = stock_list[i:i+50] df = ds.get_data(",".join(stock), fields=['(X)~EUR'], start='-1Y', end='0D', freq='D') frames.append(df) # Concatenate DataFrames final_df = pd.concat(frames)
Let us know if any further queries you have with DSWS API.
Hi @andrea.negri,
Please find the below code using multiprocessing to fasten your data pull request:
# Installing Packages import DatastreamDSWS as DSWS import pandas as pd from multiprocessing.pool import ThreadPool as Pool # Loading RICs name from file stock_list = list(pd.read_excel("../DataStream/Dati.xlsx")['Codice_ISIN']) # Declaring Variables ds = DSWS.Datastream(username ="*****", password='*****') fields = ['(X)~EUR'] final_df = pd.DataFrame() frames = [] # Multiprocessing Implementation pool_size = 20 # your "parallelness" # define worker function before a Pool is instantiated def get_data(stock): df = ds.get_data(",".join(stock), fields=['(X)~EUR'], start='-1Y', end='0D', freq='D') frames.append(df) pool = Pool(pool_size) for i in range(0, len(stock_list), 50): stock = stock_list[i:i+50] pool.apply_async(get_data, (stock,)) pool.close() pool.join() # Concat Dataframe final_df = pd.concat(frames) # Top 5 rows final_df.head()