How to Optimize Large-Scale Daily Time Series Extraction

I have a script that extracts daily time series data and static data for the individual constituents of the ETF IEAC.L over a 10-year period. Currently, I’m fetching the data day by day for each of roughly 4000 constituents, which makes the process very slow — a test for a single day already took about 1 hour.
My current approach:
# Configuration
import time

import pandas as pd
import refinitiv.data as rd

rd.open_session()

etf_ric = "IEAC.L"
snapshot_date = "2025-04-30"
start_ts = "2015-04-30"
end_ts = "2025-04-30"

# Holdings and static fields
fields_holdings = ["TR.FundHoldingRIC", "TR.FundHoldingName", "TR.FundLatestFilingDate"]
static_fields = [
    "TR.CommonName", "TR.ISIN", "TR.FIIssuerName", "TR.TRBCEconomicSector",
    "TR.TRBCBusinessSector", "TR.TRBCIndustryGroup", "TR.FiCountryName",
    "TR.FIMaturityDate", "TR.FiFaceIssuedTotal", "TR.FiIssueDate",
    "TR.FiIssuerCountry", "TR.FILiquidationRank", "TR.FI.Group", "TR.FI.Rank",
    "TR.CouponRate", "TR.FiCouponFrequency", "TR.FiCouponFrequencyDescription",
    "TR.FiFirstCouponDate", "TR.FiLastCouponDate", "TR.ADF_COUPON",
    "TR.FiMaturityStandardYield", "TR.MACAULAYDURATION"
]

# --- 1. Extract constituents for a single snapshot date ---
parameters = {
    "SDate": snapshot_date,
    "EDate": snapshot_date,
    "Frq": "D",
    "EndNum": 4000
}
holdings = rd.get_data([etf_ric], fields=fields_holdings, parameters=parameters)

# Keep only rows with a non-empty constituent RIC
df_holdings = holdings.dropna(subset=["Holding RIC"])
df_holdings = df_holdings[df_holdings["Holding RIC"].str.strip() != ""]
print(f"Total unique holdings found: {df_holdings['Holding RIC'].nunique()}")

# --- 2. Static data, requested in chunks ---
unique_rics = df_holdings["Holding RIC"].astype(str).unique().tolist()

def fetch_static_data(rics, fields, chunk_size=500):
    dfs = []
    for i in range(0, len(rics), chunk_size):
        chunk = rics[i:i + chunk_size]
        try:
            df_chunk = rd.get_data(chunk, fields)
            dfs.append(df_chunk)
        except Exception as e:
            print(f"Error static data: {e}")
        time.sleep(1)  # pause between chunks to stay under the rate limits
    return pd.concat(dfs, ignore_index=True)

df_static = fetch_static_data(unique_rics, static_fields)

# --- 3. Daily time series, one RIC at a time (this is the slow part) ---
ts_data_list = []
for ric in unique_rics:
    try:
        df_ts = rd.get_history(universe=ric, start=start_ts, end=end_ts)
        if df_ts is not None and not df_ts.empty:
            df_ts = df_ts.reset_index()
            df_ts["RIC"] = ric
            ts_data_list.append(df_ts)
            print(f"OK: {ric}")
    except Exception as e:
        print(f"Error time series {ric}: {e}")
    time.sleep(1)

df_ts_all = pd.concat(ts_data_list, ignore_index=True)

# --- 4. Merge and export ---
df_final = df_ts_all.merge(df_static, how="left", left_on="RIC", right_on="Instrument")
df_final_sorted = df_final.sort_values(by=["Date", "RIC"]).reset_index(drop=True)
df_final_sorted.to_csv("IEAC_static_and_timeseries_data.csv", index=False)
print("File saved: IEAC_static_and_timeseries_data.csv")
In summary, the workflow is:
- Get the ETF holdings snapshot for a given date (~4000 instruments)
- Retrieve static data for all constituents in chunks
- Retrieve daily historical data for each constituent individually over 10 years
- Merge and export the data
I’m wondering whether there are more efficient ways to run this extraction, for example batch requests, multithreading or multiprocessing, or any built-in API features I might be missing. A rough sketch of what I had in mind is below.
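This is only an untested sketch reusing the variables from the script above: the chunk size of 50 RICs and the 4 worker threads are arbitrary guesses, and fetch_history_chunk is just a helper name I made up, not part of the library. I am also assuming get_history accepts a list of instruments (in which case I believe the returned columns become a MultiIndex of instrument and field, so the reshaping step would need to change).

from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_history_chunk(rics_chunk):
    # One request for a whole chunk of RICs instead of one request per RIC
    return rd.get_history(universe=rics_chunk, start=start_ts, end=end_ts)

# Arbitrary chunk size: fewer, larger requests instead of ~4000 single-RIC calls
chunks = [unique_rics[i:i + 50] for i in range(0, len(unique_rics), 50)]

history_frames = []
with ThreadPoolExecutor(max_workers=4) as executor:  # worker count is a guess
    futures = {executor.submit(fetch_history_chunk, chunk): chunk for chunk in chunks}
    for future in as_completed(futures):
        try:
            history_frames.append(future.result())
        except Exception as e:
            print(f"Chunk failed: {e}")

Would something like this be reasonable within the desktop session limits, or is it likely to hit throttling?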
Would appreciate any advice on optimizing this workflow or examples of best practices with the Refinitiv Data API for this kind of large-scale data extraction.
Thanks!
Answers
Hello @BeatrizRogerio
You didn't specify which session is being used here. The scope of this data is beyond what should be extracted from the Workspace desktop, so you will probably need an enterprise-grade product such as Tick History or RDP bulk.
If you are using a Workspace desktop session, I would advise you to stay within the data extraction limits and guidelines described in this document.
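For reference, the session type is chosen when the library session is opened. A minimal sketch, assuming the default session names defined in the refinitiv-data.config.json configuration file:

import refinitiv.data as rd

# Desktop (Workspace) session: subject to the desktop usage limits
rd.open_session("desktop.workspace")

# Platform (RDP) session: requires platform credentials in the configuration file
# rd.open_session("platform.rdp")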
Hi Gurpreet, thank you for your reply.
To clarify, I am currently using a Workspace desktop session. It is opened in my code with:
import refinitiv.data as rd
rd.open_session()
Given that, I understand I’m working within the limits of the Workspace-based LSEG Data Library (LDL). I also reviewed the limits document you referenced.
About the workload, I'm trying to retrieve:
- ~4000 ETF constituents
- Static data for each constituent (batched in chunks of 500)
- 10 years of daily time series per constituent via rd.get_history() (about 4000 instruments x roughly 2,500 trading days, so potentially more than 10 million rows in total)
My questions:
Is there any way to make this workflow more efficient within the Workspace context, such as:
- Batching time series requests for multiple RICs in a single get_history() call?
- Using ThreadPoolExecutor or parallel requests to better utilize the 5 req/sec / 50 MB/minute allowance?
- Any built-in throttling strategy or retry handler you would recommend? (A rough sketch of the kind of wrapper I mean is below.)
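For context, this is roughly the retry/throttling wrapper I had in mind. It is only a sketch: get_data_with_retry is a name I invented, the retry count and backoff values are arbitrary, and I would rather use whatever the library itself provides if such a mechanism exists.

import time

def get_data_with_retry(rics, fields, max_retries=3, pause=2.0):
    # Naive wrapper: retry on any exception, with a linearly growing pause
    for attempt in range(1, max_retries + 1):
        try:
            return rd.get_data(rics, fields)
        except Exception as e:
            print(f"Attempt {attempt} failed: {e}")
            if attempt == max_retries:
                raise
            time.sleep(pause * attempt)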
You may refer to the answer in this discussion.