Hello,
I'm working on a Python project that uses the Refinitiv Data API to retrieve large sets of equity quote data. Due to API limitations I need to fetch the data in chunks of 10,000 records, but I'm struggling to implement pagination that avoids overlapping records without missing any. The API doesn't seem to support direct offset management.
Here's what I'm trying to do:
- Retrieve 10,000 records at a time.
- Ensure the next batch of 10,000 starts right after the last record of the previous batch.
- I'm using the rd.discovery.search method but can't find a way to paginate properly.
Here is code using an offset (I know that search() doesn't have such an argument; it's just an example of what I'm after):
import refinitiv.data as rd
import pandas as pd

def retrieve_data():
    rd.open_session()
    offset = 0
    all_data = pd.DataFrame()
    while True:
        results = rd.discovery.search(
            view=rd.discovery.Views.EQUITY_QUOTES,
            top=10000,
            filter="(RCSAssetCategoryLeaf eq 'Ordinary Share')",
            select="RIC, DTSubjectName, RCSAssetCategoryLeaf, ExchangeName, ExchangeCountry",
            offset=offset  # Adjust the offset for each iteration
        )
        df = pd.DataFrame(results)
        if df.empty:
            break
        all_data = pd.concat([all_data, df], ignore_index=True)
        offset += 10000  # Increase the offset to get the next batch of records
        df.to_csv(f'./Retrieved_RICs_{offset}.csv')
    all_data.to_csv('./Retrieved_RICs_All.csv')
    print("Data retrieval complete.")
    rd.close_session()

retrieve_data()
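For context, one workaround I've been considering is keyset-style pagination: sort each batch by a unique field (e.g. RIC), then filter the next request to values greater than the last RIC of the previous batch, instead of using an offset. This is only a sketch of the looping logic, with a generic fetch_page callable standing in for the real search() call; whether order_by plus a "RIC gt ..." filter behaves correctly for this on the Refinitiv side is exactly what I'm unsure about.

import pandas as pd

def fetch_all(fetch_page, key="RIC", page_size=10000):
    """Keyset pagination sketch: use the last key value of each batch as
    the exclusive lower bound of the next, so batches never overlap."""
    frames = []
    last_key = None
    while True:
        df = fetch_page(last_key)
        if df.empty:
            break
        frames.append(df)
        last_key = df[key].iloc[-1]  # next batch starts strictly after this value
        if len(df) < page_size:
            break  # a short page means we have reached the end
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Hypothetical fetch_page against the real API (untested assumption on my part
# that search() accepts order_by and a comparison filter like this):
#
# def fetch_page(last_ric):
#     flt = "(RCSAssetCategoryLeaf eq 'Ordinary Share')"
#     if last_ric is not None:
#         flt += f" and RIC gt '{last_ric}'"
#     return rd.discovery.search(
#         view=rd.discovery.Views.EQUITY_QUOTES,
#         top=10000,
#         filter=flt,
#         order_by="RIC asc",
#         select="RIC, DTSubjectName, ExchangeName",
#     )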
Any advice on managing continuation tokens or other effective methods would be greatly appreciated!
Thank you in advance!