I have a script that extracts daily time series data and static data for the individual constituents of the ETF IEAC.L over a 10-year period. Currently, I'm fetching the data one constituent at a time for roughly 4,000 constituents, which makes the process very slow: a test for a single day already took about an hour (the one-second sleep after each per-instrument request alone adds up to more than an hour across 4,000 instruments).
My current approach:
import time

import pandas as pd
import refinitiv.data as rd

rd.open_session()  # open the default session before making any requests

# Configuration
etf_ric = "IEAC.L"
snapshot_date = "2025-04-30"
start_ts = "2015-04-30"
end_ts = "2025-04-30"
# Holding and static fields
fields_holdings = ["TR.FundHoldingRIC", "TR.FundHoldingName", "TR.FundLatestFilingDate"]
static_fields = [
    "TR.CommonName", "TR.ISIN", "TR.FIIssuerName", "TR.TRBCEconomicSector",
    "TR.TRBCBusinessSector", "TR.TRBCIndustryGroup", "TR.FiCountryName",
    "TR.FIMaturityDate", "TR.FiFaceIssuedTotal", "TR.FiIssueDate",
    "TR.FiIssuerCountry", "TR.FILiquidationRank", "TR.FI.Group", "TR.FI.Rank",
    "TR.CouponRate", "TR.FiCouponFrequency", "TR.FiCouponFrequencyDescription",
    "TR.FiFirstCouponDate", "TR.FiLastCouponDate", "TR.ADF_COUPON",
    "TR.FiMaturityStandardYield", "TR.MACAULAYDURATION"
]
# --- 1. Extract constituents from a single date ---
parameters = {
    "SDate": snapshot_date,
    "EDate": snapshot_date,
    "Frq": "D",
    "EndNum": 4000
}
holdings = rd.get_data([etf_ric], fields=fields_holdings, parameters=parameters)
df_holdings = holdings.dropna(subset=["Holding RIC"])
df_holdings = df_holdings[df_holdings["Holding RIC"].str.strip() != ""]
print(f"Total unique holdings found: {df_holdings['Holding RIC'].nunique()}")
# --- 2. Static data ---
unique_rics = df_holdings["Holding RIC"].astype(str).unique().tolist()
def fetch_static_data(rics, fields, chunk_size=500):
    """Fetch static fields in chunks to stay within per-request limits."""
    dfs = []
    for i in range(0, len(rics), chunk_size):
        chunk = rics[i:i + chunk_size]
        try:
            df_chunk = rd.get_data(chunk, fields)
            dfs.append(df_chunk)
        except Exception as e:
            print(f"Error static data (chunk starting at {i}): {e}")
        time.sleep(1)  # simple rate limiting between chunks
    if not dfs:
        return pd.DataFrame()
    return pd.concat(dfs, ignore_index=True)
df_static = fetch_static_data(unique_rics, static_fields)
# --- 3. Daily time series ---
ts_data_list = []
# One request per instrument over the full 10-year window -- this loop
# (~4,000 requests plus a one-second sleep each) is the main bottleneck.
for ric in unique_rics:
    try:
        df_ts = rd.get_history(universe=ric, start=start_ts, end=end_ts)
        if df_ts is not None and not df_ts.empty:
            df_ts = df_ts.reset_index()
            df_ts["RIC"] = ric
            ts_data_list.append(df_ts)
            print(f"OK: {ric}")
    except Exception as e:
        print(f"Error time series {ric}: {e}")
    time.sleep(1)
df_ts_all = pd.concat(ts_data_list, ignore_index=True)
# --- 4. Merge and export ---
df_final = df_ts_all.merge(df_static, how="left", left_on="RIC", right_on="Instrument")
df_final_sorted = df_final.sort_values(by=["Date", "RIC"]).reset_index(drop=True)
df_final_sorted.to_csv("IEAC_static_and_timeseries_data.csv", index=False)
print("File saved: IEAC_static_and_timeseries_data.csv")
In short, the steps are:
- Get the ETF holdings snapshot for a given date (~4,000 instruments)
- Retrieve static data for all constituents in chunks
- Retrieve daily historical data for each constituent individually over 10 years
- Merge and export the data
I'm wondering how to speed up the extraction, for example with batch requests, multithreading or multiprocessing, or any built-in API features I might be missing. Two ideas I've been considering are sketched below.
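First, is batching several RICs into a single get_history call the right approach? Here's a rough, untested sketch of what I mean. The chunk size of 50 is a guess on my part, and I'm unsure how best to reshape the multi-instrument result (the instruments end up in the columns) back to long format before the merge:

# Sketch A (untested): batch several RICs into one get_history call.
# Assumes the rd/pd imports from the script above; chunk_size=50 is arbitrary.
def fetch_history_batched(rics, start, end, chunk_size=50):
    frames = []
    for i in range(0, len(rics), chunk_size):
        chunk = rics[i:i + chunk_size]
        df = rd.get_history(universe=chunk, start=start, end=end)
        if df is not None and not df.empty:
            # With a list universe the instruments appear in the columns,
            # so this still needs reshaping before merging with df_static.
            frames.append(df)
    return frames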
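Alternatively, is it safe to parallelize the existing per-RIC calls with a thread pool? Another untested sketch: max_workers=8 is an arbitrary choice, and I don't know whether the rd session tolerates concurrent requests:

from concurrent.futures import ThreadPoolExecutor, as_completed

# Sketch B (untested): run the per-RIC get_history calls concurrently.
def fetch_one(ric, start, end):
    df = rd.get_history(universe=ric, start=start, end=end)
    if df is None or df.empty:
        return None
    df = df.reset_index()
    df["RIC"] = ric
    return df

def fetch_history_threaded(rics, start, end, max_workers=8):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, ric, start, end): ric for ric in rics}
        for fut in as_completed(futures):
            ric = futures[fut]
            try:
                df = fut.result()
                if df is not None:
                    results.append(df)
            except Exception as e:
                print(f"Error time series {ric}: {e}")
    return pd.concat(results, ignore_index=True) if results else pd.DataFrame()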
Would appreciate any advice on optimizing this workflow or examples of best practices with the Refinitiv Data API for this kind of large-scale data extraction.
Thanks!