Inconsistent retrieval

Hi all, I am trying to retrieve a list of fields for a set of bond CUSIP codes, but I am getting inconsistent output. First, instead of running from 2010-01-31 to 2024-12-31, the dates/indexes start in 1968 and go until 2024. Additionally, the DataFrame has lots of missing values that should be there. Can you please help me with this query? Here is the code:
import os
import time
import pandas as pd
import lseg.data as ld

# Chunking parameters
chunk_size = 10  # Number of instruments to process in each chunk
sleep_time = 2   # Time (in seconds) to wait between API calls

# Backup file paths
backup_data_file = 'backup_all_data.csv'
retrieved_cusips_file = 'retrieved_cusips.csv'

# Load progress if backup files exist
if os.path.exists(backup_data_file):
    all_data = pd.read_csv(backup_data_file)
else:
    all_data = pd.DataFrame()

if os.path.exists(retrieved_cusips_file):
    retrieved_cusips = pd.read_csv(retrieved_cusips_file)['cusip'].tolist()
else:
    retrieved_cusips = []

# Compute remaining cusips to process
cusips_remaining = [cusip for cusip in cusip_bonds if cusip not in retrieved_cusips]

# Process in chunks
for i in range(0, len(cusips_remaining), chunk_size):
    chunk = cusips_remaining[i:i + chunk_size]
    try:
        # Retrieve data for the current chunk
        df = ld.get_history(
            universe=chunk,
            fields=[
                'TR.FIIssueDate',
                'TR.FiMaturityDate',
                'TR.NICouponRate',
                'EFF_DURTN',
                'TR.IssueSizeUSD',
                'TR.CA.AmtOutstanding',
                'TR.Volatility200D',
                'TR.OASAnalytics',
            ],
            interval="monthly",
            start='2010-01-31',
            end='2024-12-31'
        )
        # Append the results to the main DataFrame
        all_data = pd.concat([all_data, df], axis=1)
        # Update retrieved_cusips list
        retrieved_cusips.extend(chunk)
        # Save current progress
        all_data.to_csv(backup_data_file, index=True)
        pd.DataFrame({'cusip': retrieved_cusips}).to_csv(retrieved_cusips_file, index=True)
        # Display progress
        print(f"Processed chunk {i // chunk_size + 1} of {len(cusips_remaining) // chunk_size + 1}")
        # Sleep between chunks
        time.sleep(sleep_time)
    except Exception as e:
        print(f"Error occurred while processing chunk {i // chunk_size + 1}: {e}")
        print("Progress saved. You can resume from the last successful chunk.")
        break

# Transform the combined DataFrame into 2D panel data format if all chunks are processed
if len(retrieved_cusips) == len(cusip_bonds):
    panel_data = all_data.stack(level=0).reset_index()
    # Dynamically extract column names from the original DataFrame
    extracted_columns = ['Date', 'cusip_bonds'] + list(all_data.columns.get_level_values(1).unique())
    # Assign the extracted column names to panel_data
    panel_data.columns = extracted_columns
    # Ensure the Date column is sorted from oldest to newest
    panel_data['Date'] = pd.to_datetime(panel_data['Date'])  # Convert to datetime if not already
    panel_data = panel_data.sort_values(by='Date').reset_index(drop=True)
    # Save the final DataFrame to a CSV file
    panel_data.to_csv('final_bond_chars.csv', index=True)
    print("Data processing complete. File saved as 'final_bond_chars.csv'.")
else:
    print("Processing incomplete. Backup saved. Resume later.")
Thanks
Answers
-
Hi @Robinrob
It would be easier if you were to narrow this down to a specific API call that includes a single item (RIC) in the request. We can't debug your entire application for you, but we can help determine whether a specific API call is failing to retrieve the expected data and go from there. For example, provide an example that makes a get_history() call with a single item and demonstrates where you perceive it is not providing the expected data.
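For instance, a minimal reproduction along these lines (the CUSIP and field here are purely illustrative) would make it easy to check the returned date index directly:

import lseg.data as ld

ld.open_session()
# One instrument, one field, fixed window: easy to verify the returned index
df = ld.get_history(
    universe=['48130CQH1'],      # illustrative single CUSIP
    fields=['TR.NICouponRate'],  # illustrative single field
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)
print(df.index.min(), df.index.max())  # do any dates fall outside the requested window?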
-
I will narrow down my question: how can I retrieve the fields for that timespan in chunks, and then transform the result from a MultiIndex DataFrame into 2D panel data, like this one:
I tried previous answers given in this forum and didn't manage to solve it.
Thanks
-
Unfortunately, this is not related to our APIs. This forum is dedicated to the usage of our APIs, as opposed to general-purpose Python and DataFrame transformation questions.
If you can demonstrate a single call with a RIC using our APIs and show where the issue is, we can take it from there. I hope this makes sense.
-
Ok, anyway. I tried a similar procedure with this code:
# Define chunk size and sleep time
chunk_size = 10  # Number of instruments to process at a time
sleep_time = 2   # Time (in seconds) to sleep between processing chunks

# Initialize an empty DataFrame to store the results
all_data = pd.DataFrame()

# Process in chunks
stock_codes = stock_codes[:20]
for i in range(0, len(stock_codes), chunk_size):
    chunk = stock_codes[i:i + chunk_size]
    # Fetch history for the current chunk
    df = ld.get_history(
        universe=chunk,
        fields=[
            'TR.FiSPRating',
            'TR.TotalEquity',
            'TR.F.MinIntrNonCtrlIntrTot',
            'TR.MARKETCAPITALISATION',
            'TR.F.CashCashEquivTot',
            'TR.TotalDebtOutstanding',
            'TR.F.InvntTot',
            'TR.F.PrefStockRedeemTot',
            'TR.MnATargetNetSalesMRQ',
            'TR.F.TotRevenue',
            'TR.F.EBITDA',
            'TR.DepreciationAmort',
            'TR.F.CostOfOpRev',
            'TR.F.TotAssets'
        ],
        interval='monthly',
        start='2010-01-31',
        end='2024-12-31'
    )
    # Append the results to the main DataFrame
    all_data = pd.concat([all_data, df])
    # Display progress
    print(f"Processed chunk {i // chunk_size + 1} of {len(stock_codes) // chunk_size + 1}")
    # Sleep between chunks
    time.sleep(sleep_time)

# Save the final result to a CSV file
all_data.to_csv('stock_chars.csv', index=True)
And I got this result (the code on the first row is the PermID of JPM); this output is inconsistent in terms of dates and there are a lot of missing values:
-
Hi @Robinrob
Thanks for the sample.
When using the get_history() call you will have to use a different set of fields. In addition, the call does not support PermIDs as input, only RICs, so you will have to convert.
For example:
import lseg.data as ld
from lseg.data.content import symbol_conversion
…
# Convert PermID to RIC
permids = ['5000021791']
response = symbol_conversion.Definition(permids, from_symbol_type=symbol_conversion.SymbolTypes.OA_PERM_ID).get_data()

Then pull out the RIC and perform the following get_history() call without specifying any fields; this will default to bringing back everything:
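A sketch of that follow-up, assuming the conversion response exposes its result as a DataFrame with a 'RIC' column (the exact column name may vary in your environment):

# Hypothetical extraction of the converted RIC from the response
ric = response.data.df['RIC'].iloc[0]
# No fields specified: defaults to the full set available for the instrument
df = ld.get_history(
    universe=[ric],
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)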
The other way users pull down history is by utilizing the get_data() function, which pulls historical content from a different database. With this, you can use the fields you specified.
For example:
ld.get_data(
    universe=['5000021791'],
    fields=[
        'TR.FiSPRating',  # No ratings returned - you may need an additional license
        'TR.F.TotRevenue.calcdate',
        'TR.TotalEquity',
        'TR.F.MinIntrNonCtrlIntrTot',
        'TR.MARKETCAPITALISATION',
        'TR.F.CashCashEquivTot',
        'TR.TotalDebtOutstanding',
        'TR.F.InvntTot',
        'TR.F.PrefStockRedeemTot',
        # 'TR.MnATargetNetSalesMRQ',  # Commented out - causes duplicates in the result set
        'TR.F.TotRevenue',
        'TR.F.EBITDA',
        'TR.DepreciationAmort',
        'TR.F.CostOfOpRev',
        'TR.F.TotAssets'
    ],
    parameters={'SDate': "2010-01-31", 'EDate': "2024-12-31", 'frq': "CM"}
)

This should give you a place to start experimenting.
-
So the get_history method only supports RIC identifiers for both stocks and bonds?
-
The get_history method can retrieve historical data for both TR fields and real-time fields. The TR fields are the ones with the "TR." prefix, while the real-time fields, such as BID and ASK, don't have it.
To retrieve historical data for TR fields, you can use PermIDs. However, to retrieve historical data for real-time fields, you can use only RICs.
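For example, a sketch of the distinction (JPM.N is used here purely as an illustrative RIC):

# TR. fields: PermIDs (or RICs) work as the universe
df_tr = ld.get_history(
    universe=['5000021791'],
    fields=['TR.TotalEquity'],
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)
# Real-time fields such as BID and ASK: RICs only
df_rt = ld.get_history(
    universe=['JPM.N'],
    fields=['BID', 'ASK'],
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)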
I tested the code and found that the 'TR.MnATargetNetSalesMRQ' field may cause this issue. If I remove it, I can retrieve the data properly.
df = ld.get_history(
    universe=['5000021791'],
    fields=[
        'TR.FiSPRating',
        'TR.TotalEquity',
        'TR.F.MinIntrNonCtrlIntrTot',
        'TR.MARKETCAPITALISATION',
        'TR.F.CashCashEquivTot',
        'TR.TotalDebtOutstanding',
        'TR.F.InvntTot',
        'TR.F.PrefStockRedeemTot',
        'TR.F.TotRevenue',
        'TR.F.EBITDA',
        'TR.DepreciationAmort',
        'TR.F.CostOfOpRev',
        'TR.F.TotAssets'
    ],
    interval='monthly',
    start='2010-01-31',
    end='2024-12-31'
)
df
-
I tried to retrieve these fields with stock RICs and I still get data outside the time parameters I defined. Is it more robust to use get_data?
# Load the stocks
file = pd.read_csv('filtered_bonds_final.csv')
stocks = file['RIC'].unique().tolist()  # Convert to a list

# Define chunk size for processing
all_data = pd.DataFrame()
chunk_size = 20  # Adjust as needed for API rate limits or memory constraints
sleep_time = 2   # Time (in seconds) to wait between API calls

# File to store ESG data
esg_data_file = 'ESG_data.csv'

# Define the fields to retrieve
fields = [
'TR.AnalyticCO2DirectScope1ToEVIC',
'TR.AnalyticCO2DirectScope1YoY',
'TR.AnalyticCO2EVIC',
'TR.AnalyticCO2EquivalentEmissionsTotalYoY',
'TR.AnalyticCO2IndirectScope2ToEVIC',
'TR.AnalyticCO2IndirectScope2YoY',
'TR.AnalyticDischargeWaterSystemScore',
'TR.AnalyticEnergyUse',
'TR.AnalyticEnergyUseEVIC',
'TR.AnalyticEnvControvScore',
'TR.AnalyticEnvExpendituresScore',
'TR.AnalyticGHGEmissionsDirectScope1ToEVIC',
'TR.AnalyticGHGEmissionsIndirectScope2ToEVIC',
'TR.AnalyticGHGEmissionsIntensityIndirectScope2YoYPercentChg',
'TR.AnalyticGHGEmissionsIntensityIndirectScope3YoYPercentChg',
'TR.AnalyticGHGEmissionsIntensityScope1and2YoYChg',
'TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR3Year',
'TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR5Year',
'TR.AnalyticGHGEmissionsIntensityScope1and2and3YoYChg',
'TR.AnalyticGHGEmissionsIntensityScope3CAGR5Year',
'TR.AnalyticGHGEmissionsIntensitydirectScope1YoYPercentChg',
'TR.AnalyticGHGEmissionsScope1and2and3',
'TR.AnalyticGHGEmissionsScope1and2and3CAGR3Year',
'TR.AnalyticGHGEmissionsScope1and2and3CAGR5Year',
'TR.AnalyticGHGEmissionsScope1and2and3ToEVIC',
'TR.AnalyticGHGEmissionsScope1and2and3YoYChgPercent',
'TR.AnalyticHazardousWasteEVIC',
'TR.AnalyticRenewEnergyUse',
'TR.AnalyticResourceRedPolicyScore',
'TR.AnalyticResourceRedTargetsScore',
'TR.AnalyticTotalWasteEVIC',
'TR.CDPEmissionReductionTargetPctage',
'TR.CDPFutureCapexAlignmentCommitment',
'TR.CDPPhasingoutInvestmentsinCarbonIntensiveAssetsorProducts',
'TR.CompanyGreenRevenuePercentage',
'TR.ControvEnv',
'TR.EnvExpenditures',
'TR.EnvMaterialsSourcingScore',
'TR.EnvRD',
'TR.GovernancePillarScore',
'TR.GovernancePillarScoreGrade',
'TR.PolicyEnergyEfficiencyScore',
'TR.PolicyEnvSupplyChainScore',
'TR.PolicySustainablePackagingScore',
'TR.Scope1EstTotalToEVIC',
'TR.Scope1EstTotalToRevenues',
'TR.Scope1andScope2EstTotalsToRevenues',
'TR.Scope1andScope2andScope3EstTotal',
'TR.Scope2EstTotalToEVIC',
'TR.Scope2EstTotalToRevenues',
'TR.Scope3DownstreamEstTotalToEVIC',
'TR.Scope3DownstreamEstTotalToRevenues',
'TR.Scope3DownstreamTotalToEVIC',
'TR.Scope3DownstreamTotalToRevenues',
'TR.Scope3EstTotalToRevenues',
'TR.Scope3UpstreamEstTotalToEVIC',
'TR.Scope3UpstreamEstTotalToRevenues',
'TR.Scope3UpstreamTotalToEVIC',
'TR.Scope3UpstreamTotalToRevenues',
'TR.SocialPillarScore',
'TR.SocialPillarScoreGrade',
'TR.TargetEmissionsScore',
'TR.TargetPollution',
'TR.TargetsWaterEfficiencyScore',
'TR.ToxicChemicalsReductionScore',
'TR.TRESGCControversiesScore',
'TR.TRESGCControversiesScoreGrade',
'TR.TRESGCControversiesScoreGrowth',
'TR.TRESGCScore',
'TR.TRESGCScoreGrade',
'TR.TRESGProductResponsibilityScore',
'TR.TRESGProductResponsibilityScoreGrade',
'TR.TRESGScore',
'TR.TRESGScoreGrade',
'TR.WaterRecycledScore',
]for i in range(0, len(stocks), chunk_size):
Fetch ESG data for the current chunk
chunk = stocks[i:i + chunk_size] # Select a chunk of stocks
print(f"Processing chunk {i // chunk_size + 1}: {len(chunk)} instruments")df = ld.get_history( universe=chunk, fields=fields, interval="monthly", start='2010-01-31', end='2024-12-31' ) # Append the results to the main DataFrame df=df.stack(level=0).reset_index() # Extract column names dynamically extracted_columns =list(panel_data.columns.get_level_values(0).unique()) panel_data.columns = extracted_columns all_data = pd.concat([all_data, df]) # Display progress print(f"Processed chunk {i // chunk_size + 1} of {len(stocks) // chunk_size + 1}") # Sleep between chunks time.sleep(sleep_time)
Ensure the Date column is ordered from the oldest to the newestpanel_data['Date'] = pd.to_datetime(panel_data['Date']) # Convert to datetime if not already
Save the final result to a CSV file
panel_data = panel_data.sort_values(by='Date').reset_index(drop=True)panel_data.to_csv('esg_chars.csv', index=False)
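For comparison, a get_data() form of the same per-chunk request, following the pattern shown earlier in this thread (a sketch; 'CM' requests calendar-month frequency, as in the earlier example):

df = ld.get_data(
    universe=chunk,
    fields=fields,
    parameters={'SDate': '2010-01-31', 'EDate': '2024-12-31', 'frq': 'CM'}
)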
-
Please scope down the issue by providing sample instruments used in the universe parameter that cause this issue. Then we can test it with this code and focus on the response returned from the get_history method.
df = ld.get_history(
    universe=chunk,
    fields=fields,
    interval="monthly",
    start='2010-01-31',
    end='2024-12-31'
)

-
Here is a sample of the codes 020002BA8, 48130CQH1, 871829AF4, 559222BA1, and 20030NBU4 that make the retrieval start in 2007 and not in 2010.

-
Thank you for the information.
I requested one field at a time to check the returned start date (a sketch of this check appears after the results list) and found that some fields are not available. The results are:
- TR.AnalyticCO2DirectScope1ToEVIC : An error occurred Unable to resolve all requested fields in ['TR.ANALYTICCO2DIRECTSCOPE1TOEVIC']. The formula must contain at least one field or function.
- TR.AnalyticCO2DirectScope1YoY : 2009-01-31T00:00:00.000000000
- TR.AnalyticCO2EVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticCO2EquivalentEmissionsTotalYoY : 2009-01-31T00:00:00.000000000
- TR.AnalyticCO2IndirectScope2ToEVIC : An error occurred Unable to resolve all requested fields in ['TR.ANALYTICCO2INDIRECTSCOPE2TOEVIC']. The formula must contain at least one field or function.
- TR.AnalyticCO2IndirectScope2YoY : 2009-01-31T00:00:00.000000000
- TR.AnalyticDischargeWaterSystemScore : No Data
- TR.AnalyticEnergyUse : 2008-06-28T00:00:00.000000000
- TR.AnalyticEnergyUseEVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticEnvControvScore : 2008-06-28T00:00:00.000000000
- TR.AnalyticEnvExpendituresScore : 2008-06-28T00:00:00.000000000
- TR.AnalyticGHGEmissionsDirectScope1ToEVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIndirectScope2ToEVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityIndirectScope2YoYPercentChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityIndirectScope3YoYPercentChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope1and2YoYChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR3Year : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR5Year : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope1and2and3YoYChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensityScope3CAGR5Year : 2008-12-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsIntensitydirectScope1YoYPercentChg : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3 : 2008-12-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3CAGR3Year : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3CAGR5Year : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3ToEVIC : 2009-01-31T00:00:00.000000000
- TR.AnalyticGHGEmissionsScope1and2and3YoYChgPercent : 2009-01-31T00:00:00.000000000
- TR.AnalyticHazardousWasteEVIC : 2010-01-31T00:00:00.000000000
- TR.AnalyticRenewEnergyUse : 2009-02-28T00:00:00.000000000
- TR.AnalyticResourceRedPolicyScore : 2008-06-28T00:00:00.000000000
- TR.AnalyticResourceRedTargetsScore : 2008-06-28T00:00:00.000000000
- TR.AnalyticTotalWasteEVIC : 2010-01-31T00:00:00.000000000
- TR.CDPEmissionReductionTargetPctage : 2009-01-31T00:00:00.000000000
- TR.CDPFutureCapexAlignmentCommitment : No Data
- TR.CDPPhasingoutInvestmentsinCarbonIntensiveAssetsorProducts : 2009-01-31T00:00:00.000000000
- TR.CompanyGreenRevenuePercentage : An error occurred Output parameter 'date' is unrecognized. This could possibly happen because of mismatch between top level function expression and its usage in output. Make sure that you used exactly 1:1 identical expressions. Requested universes: ['020002BA8', '48130CQH1', '871829AF4', '559222BA1']. Requested fields: ['TR.COMPANYGREENREVENUEPERCENTAGE']
- TR.ControvEnv : 2010-01-31T00:00:00.000000000
- TR.EnvExpenditures : 2009-01-31T00:00:00.000000000
- TR.EnvMaterialsSourcingScore : 2008-06-28T00:00:00.000000000
- TR.EnvRD : No Data
- TR.GovernancePillarScore : 2008-06-28T00:00:00.000000000
- TR.GovernancePillarScoreGrade : 2008-06-28T00:00:00.000000000
- TR.PolicyEnergyEfficiencyScore : 2008-06-28T00:00:00.000000000
- TR.PolicyEnvSupplyChainScore : 2008-06-28T00:00:00.000000000
- TR.PolicySustainablePackagingScore : 2008-06-28T00:00:00.000000000
- TR.Scope1EstTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope1EstTotalToRevenues : No Data
- TR.Scope1andScope2EstTotalsToRevenues : 2008-12-31T00:00:00.000000000
- TR.Scope1andScope2andScope3EstTotal : 2008-12-31T00:00:00.000000000
- TR.Scope2EstTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope2EstTotalToRevenues : No Data
- TR.Scope3DownstreamEstTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope3DownstreamEstTotalToRevenues : 2008-12-31T00:00:00.000000000
- TR.Scope3DownstreamTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope3DownstreamTotalToRevenues : 2009-01-31T00:00:00.000000000
- TR.Scope3EstTotalToRevenues : 2008-12-31T00:00:00.000000000
- TR.Scope3UpstreamEstTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope3UpstreamEstTotalToRevenues : 2008-12-31T00:00:00.000000000
- TR.Scope3UpstreamTotalToEVIC : 2009-01-31T00:00:00.000000000
- TR.Scope3UpstreamTotalToRevenues : 2009-01-31T00:00:00.000000000
- TR.SocialPillarScore : 2008-06-28T00:00:00.000000000
- TR.SocialPillarScoreGrade : 2008-06-28T00:00:00.000000000
- TR.TargetEmissionsScore : 2008-12-31T00:00:00.000000000
- TR.TargetPollution : An error occurred Unable to resolve all requested fields in ['TR.TARGETPOLLUTION']. The formula must contain at least one field or function.
- TR.TargetsWaterEfficiencyScore : 2009-01-31T00:00:00.000000000
- TR.ToxicChemicalsReductionScore : 2008-06-28T00:00:00.000000000
- TR.TRESGCControversiesScore : 2008-06-28T00:00:00.000000000
- TR.TRESGCControversiesScoreGrade : 2008-06-28T00:00:00.000000000
- TR.TRESGCControversiesScoreGrowth : 2008-06-28T00:00:00.000000000
- TR.TRESGCScore : 2008-06-28T00:00:00.000000000
- TR.TRESGCScoreGrade : 2008-06-28T00:00:00.000000000
- TR.TRESGProductResponsibilityScore : 2008-06-28T00:00:00.000000000
- TR.TRESGProductResponsibilityScoreGrade : 2008-06-28T00:00:00.000000000
- TR.TRESGScore : 2008-06-28T00:00:00.000000000
- TR.TRESGScoreGrade : 2008-06-28T00:00:00.000000000
- TR.WaterRecycledScore : No Data

According to this information, it looks to be a content issue. I would suggest contacting the helpdesk team directly via MyAccount to verify the fields, parameters, and data.
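The per-field check described above can be reproduced with a loop along these lines (a sketch, assuming an open session and the fields list from the earlier post):

universe = ['020002BA8', '48130CQH1', '871829AF4', '559222BA1', '20030NBU4']
for field in fields:
    try:
        # Request one field at a time to see where its history actually starts
        df = ld.get_history(
            universe=universe,
            fields=[field],
            interval='monthly',
            start='2010-01-31',
            end='2024-12-31'
        )
        df = df.dropna(how='all')
        print(f"- {field} :", df.index.min() if not df.empty else "No Data")
    except Exception as e:
        print(f"- {field} : An error occurred {e}")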