question

Upvotes
Accepted
1 0 0 0

Requesting large data amounts in batches via RDP API

I am trying to download roughly 10 years of daily price data for 3000 firms with rdp.get_get_historical_price When trying it at once, the API times out. What are best practices for requesting data in batches and the combining the outputs to a single pandas dataframe?

Currently I am trying to do it in batches by appending lists, but I'm not satisfied with the output as i can't format it properly into a dataframe

dict_list = ric_lists #ric_lists is list of ALL RICs
batch_list = []
return_list = []

for i in dict_list:
    batch_list.append(i)
    if len(batch_list)  == 5:
        return_list.append(Pricegetter(batch_list))
        batch_list.clear()

if batch_list:
    return_list.append(Pricegetter(batch_list))
[              KAER.VI
 2011-01-04  18.056605
 2011-01-05  18.056605
 2011-01-11  18.253407
 2011-01-12  18.253407
 2011-01-18  18.253407
 ...               ...
 2021-12-23  14.900000
 2021-12-27  15.100000
 2021-12-28  15.100000
 2021-12-29  15.200000
 2021-12-30  15.300000
 
 [1794 rows x 1 columns],
             BHAV.VI
 2011-01-03    43.05
 2011-01-07    43.01
 2011-01-11    43.00
 2011-01-19    43.05
 2011-01-20    43.05
 ...             ...
 2021-12-17    95.00
 2021-12-20    98.00
 2021-12-21    98.00
 2021-12-23    98.50
 2021-12-27    98.50
 
 [918 rows x 1 columns],
             SMPV.VI
 2011-01-03    39.60
 2011-01-04    39.50
 2011-01-05    39.45
 2011-01-07    39.10
 2011-01-10    39.30
 ...             ...
 2021-12-23    27.85
 2021-12-27    28.40
 2021-12-28    28.65
 2021-12-29    28.75
 2021-12-30    29.00
 
 [2740 rows x 1 columns]]

Should I change how I use the batches or is there way to tranfrom the output into a dataframe with datetime?

rdp-apipandas
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

· Write an Answer
Upvote
Accepted
14k 30 5 10

hi @yannick.schneller

Is this what you're looking for? Please see the code below, I create a dataframe with the last 10 year dates as an index first, then call the RDP function to get historical price summaries and join the output together with the date dataframe, then change the column name from field name to be RIC, I hope this helps.

import pandas as pd
from datetime import datetime, timedelta

start_date = '2011-01-01'
end_date = '2021-12-31'
ric_lists = ['KAER.VI','BHAV.VI','SMPV.VI']
field = 'TRDPRC_1'

def get_historical_price(df, ric):
    df2 = rdp.get_historical_price_summaries(
        universe = ric, 
        start = start_date, 
        end = end_date,
        interval = rdp.Intervals.DAILY,
        fields = [field]
    )

    df = df.join(df2)
    df = df.rename(columns={field: ric})
    return df
    
# generate a dataframe with last 10 year dates as an index
days = pd.date_range(start_date, end_date, freq='D')
df = pd.DataFrame({'Date': days})
df = df.set_index('Date')

for ric in ric_lists:
    df = get_historical_price(df, ric)
display(df)

1645767380450.png



1645767380450.png (57.4 KiB)
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.