Inconsistent retrieval

Hi all, I am trying to retrieve a list of fields for a set of bond CUSIP codes, but I am getting inconsistent output. First, instead of running from 2010-01-31 to 2024-12-31, the dates/indexes start in 1968 and go until 2024. Additionally, the DataFrame has lots of missing values that should be there. Can you please help me with this query? Here is the code:
import os
import time
import pandas as pd
import lseg.data as ld

# Chunking parameters
chunk_size = 10  # Number of instruments to process in each chunk
sleep_time = 2   # Time (in seconds) to wait between API calls

# Backup file paths
backup_data_file = 'backup_all_data.csv'
retrieved_cusips_file = 'retrieved_cusips.csv'

# Load progress if backup files exist
if os.path.exists(backup_data_file):
    all_data = pd.read_csv(backup_data_file)
else:
    all_data = pd.DataFrame()

if os.path.exists(retrieved_cusips_file):
    retrieved_cusips = pd.read_csv(retrieved_cusips_file)['cusip'].tolist()
else:
    retrieved_cusips = []

# Compute remaining cusips to process
cusips_remaining = [cusip for cusip in cusip_bonds if cusip not in retrieved_cusips]

# Process in chunks
for i in range(0, len(cusips_remaining), chunk_size):
    chunk = cusips_remaining[i:i + chunk_size]
    try:
        # Retrieve data for the current chunk
        df = ld.get_history(
            universe=chunk,
            fields=[
                'TR.FIIssueDate',
                'TR.FiMaturityDate',
                'TR.NICouponRate',
                'EFF_DURTN',
                'TR.IssueSizeUSD',
                'TR.CA.AmtOutstanding',
                'TR.Volatility200D',
                'TR.OASAnalytics',
            ],
            interval="monthly",
            start='2010-01-31',
            end='2024-12-31'
        )
        # Append the results to the main DataFrame
        all_data = pd.concat([all_data, df], axis=1)
        # Update retrieved_cusips list
        retrieved_cusips.extend(chunk)
        # Save current progress
        all_data.to_csv(backup_data_file, index=True)
        pd.DataFrame({'cusip': retrieved_cusips}).to_csv(retrieved_cusips_file, index=True)
        # Display progress
        print(f"Processed chunk {i // chunk_size + 1} of {len(cusips_remaining) // chunk_size + 1}")
        # Sleep between chunks
        time.sleep(sleep_time)
    except Exception as e:
        print(f"Error occurred while processing chunk {i // chunk_size + 1}: {e}")
        print("Progress saved. You can resume from the last successful chunk.")
        break

# Transform the combined DataFrame into 2D panel data format if all chunks are processed
if len(retrieved_cusips) == len(cusip_bonds):
    panel_data = all_data.stack(level=0).reset_index()
    # Dynamically extract column names from the original DataFrame
    extracted_columns = ['Date', 'cusip_bonds'] + list(all_data.columns.get_level_values(1).unique())
    # Assign the extracted column names to panel_data
    panel_data.columns = extracted_columns
    # Ensure the Date column is sorted from oldest to newest
    panel_data['Date'] = pd.to_datetime(panel_data['Date'])  # Convert to datetime if not already
    panel_data = panel_data.sort_values(by='Date').reset_index(drop=True)
    # Save the final DataFrame to a CSV file
    panel_data.to_csv('final_bond_chars.csv', index=True)
    print("Data processing complete. File saved as 'final_bond_chars.csv'.")
else:
    print("Processing incomplete. Backup saved. Resume later.")
Thanks
Answers
-
Hi @Robinrob
It would be easier if you were to narrow this down to a specific API call that includes a single item (RIC) in the request. We can't debug your entire application for you, but we can help determine whether a specific API call is failing to retrieve the expected data and go from there. For example, provide an example that makes a get_history() call with a single item and demonstrates where you perceive it is not providing the expected data.
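For instance, a minimal reproduction along these lines (the CUSIP and field here are purely illustrative) would make it easy to check the returned date index directly:

import lseg.data as ld

ld.open_session()
# One instrument, one field, fixed window: easy to verify the returned index
df = ld.get_history(
    universe=['48130CQH1'],      # illustrative single CUSIP
    fields=['TR.NICouponRate'],  # illustrative single field
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)
print(df.index.min(), df.index.max())  # do any dates fall outside the requested window?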
-
I will narrow down my question: how can I retrieve the fields for that timespan in chunks, and then transform the result from a MultiIndex DataFrame into 2D panel data, like this one:
I tried previous answers given in this forum and didn't manage to solve it.
Thanks
-
Unfortunately, this is not related to our APIs. This forum is dedicated to the usage of our APIs, as opposed to general-purpose Python and DataFrame transformation questions.
If you can demonstrate a single call with a RIC using our APIs and show where the issue is, we can take it from there. I hope this makes sense.
-
Ok, anyway. I tried a similar procedure with this code:
# Define chunk size and sleep time
chunk_size = 10  # Number of instruments to process at a time
sleep_time = 2   # Time (in seconds) to sleep between processing chunks

# Initialize an empty DataFrame to store the results
all_data = pd.DataFrame()

# Process in chunks
stock_codes = stock_codes[:20]
for i in range(0, len(stock_codes), chunk_size):
    chunk = stock_codes[i:i + chunk_size]
    # Fetch history for the current chunk
    df = ld.get_history(
        universe=chunk,
        fields=[
            'TR.FiSPRating',
            'TR.TotalEquity',
            'TR.F.MinIntrNonCtrlIntrTot',
            'TR.MARKETCAPITALISATION',
            'TR.F.CashCashEquivTot',
            'TR.TotalDebtOutstanding',
            'TR.F.InvntTot',
            'TR.F.PrefStockRedeemTot',
            'TR.MnATargetNetSalesMRQ',
            'TR.F.TotRevenue',
            'TR.F.EBITDA',
            'TR.DepreciationAmort',
            'TR.F.CostOfOpRev',
            'TR.F.TotAssets'
        ],
        interval='monthly',
        start='2010-01-31',
        end='2024-12-31'
    )
    # Append the results to the main DataFrame
    all_data = pd.concat([all_data, df])
    # Display progress
    print(f"Processed chunk {i // chunk_size + 1} of {len(stock_codes) // chunk_size + 1}")
    # Sleep between chunks
    time.sleep(sleep_time)

# Save the final result to a CSV file
all_data.to_csv('stock_chars.csv', index=True)
And I got this result (the code on the first row is the PermID of JPM); this output is inconsistent in terms of dates and there are a lot of missing values:
-
Hi @Robinrob
Thanks for the sample.
When using the get_history() call you will have to use a different set of fields. In addition, the call does not support PermIDs as input, only RICs, so you will have to convert.
For example:
import lseg.data as ld
from lseg.data.content import symbol_conversion
…
# Convert PermID to RIC
permids = ['5000021791']
response = symbol_conversion.Definition(permids, from_symbol_type=symbol_conversion.SymbolTypes.OA_PERM_ID).get_data()

Then pull out the RIC and perform the following get_history() call without specifying any fields; this will default to bringing back everything:
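A sketch of that follow-up, assuming the conversion response exposes its result as a DataFrame with a 'RIC' column (the exact column name may vary in your environment):

# Hypothetical extraction of the converted RIC from the response
ric = response.data.df['RIC'].iloc[0]
# No fields specified: defaults to the full set available for the instrument
df = ld.get_history(
    universe=[ric],
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)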
The other way users pull down history is by utilizing the get_data() function, which pulls historical content from a different database. With this, you can use the fields you specified.
For example:
ld.get_data(
    universe=['5000021791'],
    fields=[
        'TR.FiSPRating',  # No ratings returned - you may need an additional license
        'TR.F.TotRevenue.calcdate',
        'TR.TotalEquity',
        'TR.F.MinIntrNonCtrlIntrTot',
        'TR.MARKETCAPITALISATION',
        'TR.F.CashCashEquivTot',
        'TR.TotalDebtOutstanding',
        'TR.F.InvntTot',
        'TR.F.PrefStockRedeemTot',
        # 'TR.MnATargetNetSalesMRQ',  # Commented out - causes duplicates in the result set
        'TR.F.TotRevenue',
        'TR.F.EBITDA',
        'TR.DepreciationAmort',
        'TR.F.CostOfOpRev',
        'TR.F.TotAssets'
    ],
    parameters={'SDate': "2010-01-31", 'EDate': "2024-12-31", 'frq': "CM"}
)

This should give you a place to start experimenting.
-
So the get_history method only supports RIC identifiers for both stocks and bonds?
-
The get_history method can retrieve historical data for both TR fields and real-time fields. The TR fields are the ones with the "TR." prefix, while the real-time fields, such as BID and ASK, don't have it.
To retrieve historical data for TR fields, you can use PermIDs. However, to retrieve historical data for real-time fields, you can use only RICs.
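For example, a sketch of the distinction (JPM.N is used here purely as an illustrative RIC):

# TR. fields: PermIDs (or RICs) work as the universe
df_tr = ld.get_history(
    universe=['5000021791'],
    fields=['TR.TotalEquity'],
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)
# Real-time fields such as BID and ASK: RICs only
df_rt = ld.get_history(
    universe=['JPM.N'],
    fields=['BID', 'ASK'],
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)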
I tested the code and found that the 'TR.MnATargetNetSalesMRQ' field may cause this issue. If I remove it, I can retrieve the data properly.
df = ld.get_history(
    universe=['5000021791'],
    fields=[
        'TR.FiSPRating',
        'TR.TotalEquity',
        'TR.F.MinIntrNonCtrlIntrTot',
        'TR.MARKETCAPITALISATION',
        'TR.F.CashCashEquivTot',
        'TR.TotalDebtOutstanding',
        'TR.F.InvntTot',
        'TR.F.PrefStockRedeemTot',
        'TR.F.TotRevenue',
        'TR.F.EBITDA',
        'TR.DepreciationAmort',
        'TR.F.CostOfOpRev',
        'TR.F.TotAssets'
    ],
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)
df
-
I tried to retrieve these fields with stock RICs and I still get data outside the time parameters I defined. Is it more robust to use get_data?
# Load the stocks
file = pd.read_csv('filtered_bonds_final.csv')
stocks = file['RIC'].unique().tolist()  # Convert to a list

# Define chunk size for processing
all_data = pd.DataFrame()
chunk_size = 20  # Adjust as needed for API rate limits or memory constraints
sleep_time = 2   # Time (in seconds) to wait between API calls

# File to store ESG data
esg_data_file = 'ESG_data.csv'

# Define the fields to retrieve
fields = [
'TR.AnalyticCO2DirectScope1ToEVIC',
'TR.AnalyticCO2DirectScope1YoY',
'TR.AnalyticCO2EVIC',
'TR.AnalyticCO2EquivalentEmissionsTotalYoY',
'TR.AnalyticCO2IndirectScope2ToEVIC',
'TR.AnalyticCO2IndirectScope2YoY',
'TR.AnalyticDischargeWaterSystemScore',
'TR.AnalyticEnergyUse',
'TR.AnalyticEnergyUseEVIC',
'TR.AnalyticEnvControvScore',
'TR.AnalyticEnvExpendituresScore',
'TR.AnalyticGHGEmissionsDirectScope1ToEVIC',
'TR.AnalyticGHGEmissionsIndirectScope2ToEVIC',
'TR.AnalyticGHGEmissionsIntensityIndirectScope2YoYPercentChg',
'TR.AnalyticGHGEmissionsIntensityIndirectScope3YoYPercentChg',
'TR.AnalyticGHGEmissionsIntensityScope1and2YoYChg',
'TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR3Year',
'TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR5Year',
'TR.AnalyticGHGEmissionsIntensityScope1and2and3YoYChg',
'TR.AnalyticGHGEmissionsIntensityScope3CAGR5Year',
'TR.AnalyticGHGEmissionsIntensitydirectScope1YoYPercentChg',
'TR.AnalyticGHGEmissionsScope1and2and3',
'TR.AnalyticGHGEmissionsScope1and2and3CAGR3Year',
'TR.AnalyticGHGEmissionsScope1and2and3CAGR5Year',
'TR.AnalyticGHGEmissionsScope1and2and3ToEVIC',
'TR.AnalyticGHGEmissionsScope1and2and3YoYChgPercent',
'TR.AnalyticHazardousWasteEVIC',
'TR.AnalyticRenewEnergyUse',
'TR.AnalyticResourceRedPolicyScore',
'TR.AnalyticResourceRedTargetsScore',
'TR.AnalyticTotalWasteEVIC',
'TR.CDPEmissionReductionTargetPctage',
'TR.CDPFutureCapexAlignmentCommitment',
'TR.CDPPhasingoutInvestmentsinCarbonIntensiveAssetsorProducts',
'TR.CompanyGreenRevenuePercentage',
'TR.ControvEnv',
'TR.EnvExpenditures',
'TR.EnvMaterialsSourcingScore',
'TR.EnvRD',
'TR.GovernancePillarScore',
'TR.GovernancePillarScoreGrade',
'TR.PolicyEnergyEfficiencyScore',
'TR.PolicyEnvSupplyChainScore',
'TR.PolicySustainablePackagingScore',
'TR.Scope1EstTotalToEVIC',
'TR.Scope1EstTotalToRevenues',
'TR.Scope1andScope2EstTotalsToRevenues',
'TR.Scope1andScope2andScope3EstTotal',
'TR.Scope2EstTotalToEVIC',
'TR.Scope2EstTotalToRevenues',
'TR.Scope3DownstreamEstTotalToEVIC',
'TR.Scope3DownstreamEstTotalToRevenues',
'TR.Scope3DownstreamTotalToEVIC',
'TR.Scope3DownstreamTotalToRevenues',
'TR.Scope3EstTotalToRevenues',
'TR.Scope3UpstreamEstTotalToEVIC',
'TR.Scope3UpstreamEstTotalToRevenues',
'TR.Scope3UpstreamTotalToEVIC',
'TR.Scope3UpstreamTotalToRevenues',
'TR.SocialPillarScore',
'TR.SocialPillarScoreGrade',
'TR.TargetEmissionsScore',
'TR.TargetPollution',
'TR.TargetsWaterEfficiencyScore',
'TR.ToxicChemicalsReductionScore',
'TR.TRESGCControversiesScore',
'TR.TRESGCControversiesScoreGrade',
'TR.TRESGCControversiesScoreGrowth',
'TR.TRESGCScore',
'TR.TRESGCScoreGrade',
'TR.TRESGProductResponsibilityScore',
'TR.TRESGProductResponsibilityScoreGrade',
'TR.TRESGScore',
'TR.TRESGScoreGrade',
'TR.WaterRecycledScore',
]for i in range(0, len(stocks), chunk_size):
Fetch ESG data for the current chunk
chunk = stocks[i:i + chunk_size] # Select a chunk of stocks
print(f"Processing chunk {i // chunk_size + 1}: {len(chunk)} instruments")df = ld.get_history( universe=chunk, fields=fields, interval="monthly", start='2010-01-31', end='2024-12-31' ) # Append the results to the main DataFrame df=df.stack(level=0).reset_index() # Extract column names dynamically extracted_columns =list(panel_data.columns.get_level_values(0).unique()) panel_data.columns = extracted_columns all_data = pd.concat([all_data, df]) # Display progress print(f"Processed chunk {i // chunk_size + 1} of {len(stocks) // chunk_size + 1}") # Sleep between chunks time.sleep(sleep_time)
Ensure the Date column is ordered from the oldest to the newestpanel_data['Date'] = pd.to_datetime(panel_data['Date']) # Convert to datetime if not already
Save the final result to a CSV file
panel_data = panel_data.sort_values(by='Date').reset_index(drop=True)panel_data.to_csv('esg_chars.csv', index=False)
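For comparison, a get_data() form of the same per-chunk request, following the pattern shown earlier in this thread (a sketch; 'CM' requests calendar-month frequency, as in the earlier example):

df = ld.get_data(
    universe=chunk,
    fields=fields,
    parameters={'SDate': '2010-01-31', 'EDate': '2024-12-31', 'frq': 'CM'}
)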
-
Please scope down the issue by providing sample instruments used in the universe parameter that cause this issue. Then we can test it with this code and focus on the response returned from the get_history method.
df = ld.get_history(
    universe=chunk,
    fields=fields,
    interval="monthly",
    start='2010-01-31',
    end='2024-12-31'
)

-
Here is a sample of the codes 020002BA8, 48130CQH1, 871829AF4, 559222BA1, and 20030NBU4 that make the retrieval start in 2007 and not in 2010.

-
Thank you for the information.
I requested one field at a time to check the returned start date (a sketch of this check appears after the results list) and found that some fields are not available. The results are:
- TR.AnalyticCO2DirectScope1ToEVIC : An error occurred Unable to resolve all requested fields in ['TR.ANALYTICCO2DIRECTSCOPE1TOEVIC']. The formula must contain at least one field or function.
- TR.AnalyticCO2DirectScope1YoY : 2009-01-31T00:00:00.000000000
- TR.AnalyticCO2EVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticCO2EquivalentEmissionsTotalYoY : 2009-01-31T00:00:00.000000000
- TR.AnalyticCO2IndirectScope2ToEVIC : An error occurred Unable to resolve all requested fields in ['TR.ANALYTICCO2INDIRECTSCOPE2TOEVIC']. The formula must contain at least one field or function.
- TR.AnalyticCO2IndirectScope2YoY : 2009-01-31T00:00:00.000000000
- TR.AnalyticDischargeWaterSystemScore : No Data
- TR.AnalyticEnergyUse : 2008-06-28T00:00:00.000000000
- TR.AnalyticEnergyUseEVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticEnvControvScore : 2008-06-28T00:00:00.000000000
- TR.AnalyticEnvExpendituresScore : 2008-06-28T00:00:00.000000000
- TR.AnalyticGHGEmissionsDirectScope1ToEVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIndirectScope2ToEVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityIndirectScope2YoYPercentChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityIndirectScope3YoYPercentChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope1and2YoYChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR3Year : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR5Year : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope1and2and3YoYChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope3CAGR5Year : 2008-12-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensitydirectScope1YoYPercentChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3 : 2008-12-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3CAGR3Year : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3CAGR5Year : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3ToEVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3YoYChgPercent : 2009-01-31T00:00:00.000000000
- TR.AnalyticHazardousWasteEVIC : 2010-01-31T00:00:00.000000000
- TR.AnalyticRenewEnergyUse : 2009-02-28T00:00:00.000000000
- TR.AnalyticResourceRedPolicyScore : 2008-06-28T00:00:00.000000000
- TR.AnalyticResourceRedTargetsScore : 2008-06-28T00:00:00.000000000
- TR.AnalyticTotalWasteEVIC : 2010-01-31T00:00:00.000000000
- TR.CDPEmissionReductionTargetPctage : 2009-01-31T00:00:00.000000000
- TR.CDPFutureCapexAlignmentCommitment : No Data
- TR.CDPPhasingoutInvestmentsinCarbonIntensiveAssetsorProducts : 2009-01-31T00:00:00.000000000
- TR.CompanyGreenRevenuePercentage : An error occurred Output parameter 'date' is unrecognized. This could possibly happen because of mismatch between top level function expression and its usage in output. Make sure that you used exactly 1:1 identical expressions. Requested universes: ['020002BA8', '48130CQH1', '871829AF4', '559222BA1']. Requested fields: ['TR.COMPANYGREENREVENUEPERCENTAGE']
- TR.ControvEnv : 2010-01-31T00:00:00.000000000
- TR.EnvExpenditures : 2009-01-31T00:00:00.000000000
- TR.EnvMaterialsSourcingScore : 2008-06-28T00:00:00.000000000
- TR.EnvRD : No Data
- TR.GovernancePillarScore : 2008-06-28T00:00:00.000000000
- TR.GovernancePillarScoreGrade : 2008-06-28T00:00:00.000000000
- TR.PolicyEnergyEfficiencyScore : 2008-06-28T00:00:00.000000000
- TR.PolicyEnvSupplyChainScore : 2008-06-28T00:00:00.000000000
- TR.PolicySustainablePackagingScore : 2008-06-28T00:00:00.000000000
- TR.Scope1EstTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope1EstTotalToRevenues : No Data
- TR.Scope1andScope2EstTotalsToRevenues : 2008-12-31T00:00:00.000000000
- TR.Scope1andScope2andScope3EstTotal : 2008-12-31T00:00:00.000000000
- TR.Scope2EstTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope2EstTotalToRevenues : No Data
- TR.Scope3DownstreamEstTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope3DownstreamEstTotalToRevenues : 2008-12-31T00:00:00.000000000
- TR.Scope3DownstreamTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope3DownstreamTotalToRevenues : 2009-01-31T00:00:00.000000000
- TR.Scope3EstTotalToRevenues : 2008-12-31T00:00:00.000000000
- TR.Scope3UpstreamEstTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope3UpstreamEstTotalToRevenues : 2008-12-31T00:00:00.000000000
- TR.Scope3UpstreamTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope3UpstreamTotalToRevenues : 2009-01-31T00:00:00.000000000
- TR.SocialPillarScore : 2008-06-28T00:00:00.000000000
- TR.SocialPillarScoreGrade : 2008-06-28T00:00:00.000000000
- TR.TargetEmissionsScore : 2008-12-31T00:00:00.000000000
- TR.TargetPollution : An error occurred Unable to resolve all requested fields in ['TR.TARGETPOLLUTION']. The formula must contain at least one field or function.
- TR.TargetsWaterEfficiencyScore : 2009-01-31T00:00:00.000000000
- TR.ToxicChemicalsReductionScore : 2008-06-28T00:00:00.000000000
- TR.TRESGCControversiesScore : 2008-06-28T00:00:00.000000000
- TR.TRESGCControversiesScoreGrade : 2008-06-28T00:00:00.000000000
- TR.TRESGCControversiesScoreGrowth : 2008-06-28T00:00:00.000000000
- TR.TRESGCScore : 2008-06-28T00:00:00.000000000
- TR.TRESGCScoreGrade : 2008-06-28T00:00:00.000000000
- TR.TRESGProductResponsibilityScore : 2008-06-28T00:00:00.000000000
- TR.TRESGProductResponsibilityScoreGrade : 2008-06-28T00:00:00.000000000
- TR.TRESGScore : 2008-06-28T00:00:00.000000000
- TR.TRESGScoreGrade : 2008-06-28T00:00:00.000000000
- TR.WaterRecycledScore : No Data

According to this information, it looks to be a content issue. I would suggest contacting the helpdesk team directly via MyAccount to verify the fields, parameters, and data.
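The per-field check described above can be reproduced with a loop along these lines (a sketch, assuming an open session and the fields list from the earlier post):

universe = ['020002BA8', '48130CQH1', '871829AF4', '559222BA1', '20030NBU4']
for field in fields:
    try:
        # Request one field at a time to see where its history actually starts
        df = ld.get_history(
            universe=universe,
            fields=[field],
            interval='monthly',
            start='2010-01-31',
            end='2024-12-31'
        )
        df = df.dropna(how='all')
        print(f"- {field} :", df.index.min() if not df.empty else "No Data")
    except Exception as e:
        print(f"- {field} : An error occurred {e}")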