Requesting large data amounts in batches via RDP API

yannick.schneller · February 2022

I am trying to download roughly 10 years of daily price data for 3000 firms with rdp.get_get_historical_price When trying it at once, the API times out. What are best practices for requesting data in batches and the combining the outputs to a single pandas dataframe?

Currently I am trying to do it in batches by appending lists, but I'm not satisfied with the output as i can't format it properly into a dataframe

dict_list = ric_lists #ric_lists is list of ALL RICs
batch_list = []
return_list = []

for i in dict_list:
    batch_list.append(i)
    if len(batch_list)  == 5:
        return_list.append(Pricegetter(batch_list))
        batch_list.clear()

if batch_list:
    return_list.append(Pricegetter(batch_list))

[              KAER.VI
 2011-01-04  18.056605
 2011-01-05  18.056605
 2011-01-11  18.253407
 2011-01-12  18.253407
 2011-01-18  18.253407
 ...               ...
 2021-12-23  14.900000
 2021-12-27  15.100000
 2021-12-28  15.100000
 2021-12-29  15.200000
 2021-12-30  15.300000

 [1794 rows x 1 columns],
             BHAV.VI
 2011-01-03    43.05
 2011-01-07    43.01
 2011-01-11    43.00
 2011-01-19    43.05
 2011-01-20    43.05
 ...             ...
 2021-12-17    95.00
 2021-12-20    98.00
 2021-12-21    98.00
 2021-12-23    98.50
 2021-12-27    98.50

 [918 rows x 1 columns],
             SMPV.VI
 2011-01-03    39.60
 2011-01-04    39.50
 2011-01-05    39.45
 2011-01-07    39.10
 2011-01-10    39.30
 ...             ...
 2021-12-23    27.85
 2021-12-27    28.40
 2021-12-28    28.65
 2021-12-29    28.75
 2021-12-30    29.00

 [2740 rows x 1 columns]]

Should I change how I use the batches or is there way to tranfrom the output into a dataframe with datetime?

raksina.samasiri · February 2022

hi @yannick.schneller

Is this what you're looking for? Please see the code below, I create a dataframe with the last 10 year dates as an index first, then call the RDP function to get historical price summaries and join the output together with the date dataframe, then change the column name from field name to be RIC, I hope this helps.

import pandas as pd
from datetime import datetime, timedelta

start_date = '2011-01-01'
end_date = '2021-12-31'
ric_lists = ['KAER.VI','BHAV.VI','SMPV.VI']
field = 'TRDPRC_1'

def get_historical_price(df, ric):
    df2 = rdp.get_historical_price_summaries(
        universe = ric, 
        start = start_date, 
        end = end_date,
        interval = rdp.Intervals.DAILY,
        fields = [field]
    )

    df = df.join(df2)
    df = df.rename(columns={field: ric})
    return df

# generate a dataframe with last 10 year dates as an index
days = pd.date_range(start_date, end_date, freq='D')
df = pd.DataFrame({'Date': days})
df = df.set_index('Date')

for ric in ric_lists:
    df = get_historical_price(df, ric)
display(df)

Requesting large data amounts in batches via RDP API

Best Answer

Categories

EXPLORE OUR SITES

Requesting large data amounts in batches via RDP API

Best Answer

Categories