Inconsitent retrieval

Robinrob
Robinrob Newcomer

Hi all, i am trying to retrieve a list of fields for a set of bond cusip codes, however i am getting an inconsistent output. First when i retrieve i see that the dates/indexes instead of going from 2010-01-31 to 2024-12-31 it starts in 1968 and goes until 2024. Additionally i have lots of missing values that should be there in the dataframe. Can you please help me to with this query. Here is the code;

import os

# Chunking parameters

chunk_size = 10 # Number of instruments to process in each chunk

sleep_time = 2 # Time (in seconds) to wait between API calls

# Backup file paths

backup_data_file = 'backup_all_data.csv'

retrieved_cusips_file = 'retrieved_cusips.csv'

# Load progress if backup files exist

if os.path.exists(backup_data_file):

all_data = pd.read_csv(backup_data_file)

else:

all_data = pd.DataFrame()

if os.path.exists(retrieved_cusips_file):

retrieved_cusips = pd.read_csv(retrieved_cusips_file)['cusip'].tolist()

else:

retrieved_cusips = []

# Compute remaining cusips to process

cusips_remaining = [cusip for cusip in cusip_bonds if cusip not in retrieved_cusips]

# Process in chunks

for i in range(0, len(cusips_remaining), chunk_size):

chunk = cusips_remaining[i:i + chunk_size]

try:

# Retrieve data for the current chunk

df = ld.get_history(

universe=chunk,

fields=[

'TR.FIIssueDate',

'TR.FiMaturityDate',

'TR.NICouponRate',

'EFF_DURTN',

'TR.IssueSizeUSD',

'TR.CA.AmtOutstanding',

'TR.Volatility200D',

'TR.OASAnalytics',

],

interval="monthly",

start='2010-01-31',

end='2024-12-31'

)

# Append the results to the main DataFrame

all_data = pd.concat([all_data, df],axis=1)

# Update retrieved_cusips list

retrieved_cusips.extend(chunk)

# Save current progress

all_data.to_csv(backup_data_file, index=True)

pd.DataFrame({'cusip': retrieved_cusips}).to_csv(retrieved_cusips_file, index=True)

# Display progress

print(f"Processed chunk {i // chunk_size + 1} of {len(cusips_remaining) // chunk_size + 1}")

# Sleep between chunks

time.sleep(sleep_time)

except Exception as e:

print(f"Error occurred while processing chunk {i // chunk_size + 1}: {e}")

print("Progress saved. You can resume from the last successful chunk.")

break

# Transform the combined DataFrame into 2D panel data format if all chunks are processed

if len(retrieved_cusips) == len(cusip_bonds):

panel_data = all_data.stack(level=0).reset_index()

# Dynamically extract column names from the original DataFrame

extracted_columns = ['Date', 'cusip_bonds'] + list(all_data.columns.get_level_values(1).unique())

# Assign the extracted column names to the panel_data

panel_data.columns = extracted_columns

# Ensure the Date column is sorted from oldest to newest

panel_data['Date'] = pd.to_datetime(panel_data['Date']) # Convert to datetime if not already

panel_data = panel_data.sort_values(by='Date').reset_index(drop=True)

# Save the final DataFrame to a CSV file

panel_data.to_csv('final_bond_chars.csv', index=True)

print("Data processing complete. File saved as 'final_bond_chars.csv'.")

else:

print("Processing incomplete. Backup saved. Resume later.")

Thanks

Tagged:

Answers

  • Hi @Robinrob

    It would be easier if you were to narrow down the specific API call that includes a single item (RIC) in the request. We can't debug your entire application for you but can help better understand whether a specific API call is not retrieving expected data and go from there. For example, provide an example that makes a get_history() call with a single item and demonstrate where you perceive it is not providing the expected data.

  • Robinrob
    Robinrob Newcomer

    I will narrow my question. How can i retrieve the fields, for that timespan in chunks, and then transform it from a multiindex dataframe into a 2d panel data, like this one:

    image.png

    I tried previous answers given in this forum and i didnt manage to solve it

    Thanks

  • @Robinrob

    Unfortunately, this is not related to our APIs. This forum is dedicated to the usage of our APIs as opposed to general purpose Python and dataframe transform-related questions.

    If you can demonstrate a single call with a RIC using our APIs and show where the issue is, we can take it from there. I hope this makes sense.

  • Robinrob
    Robinrob Newcomer

    Ok, anyway. I tried a similar procedure with this code:

    Define chunk size and sleep time

    chunk_size = 10 # Number of instruments to process at a time
    sleep_time = 2 # Time (in seconds) to sleep between processing chunks

    Initialize an empty DataFrame to store the results

    all_data = pd.DataFrame()
    stock_codes=stock_codes[:20]

    Process in chunks

    for i in range(0, len(stock_codes), chunk_size):
    chunk = stock_codes[i:i + chunk_size]

    # Fetch history for the current chunk
    df = ld.get_history(
        universe=chunk,
        fields=[
            'TR.FiSPRating',
            'TR.TotalEquity',
            'TR.F.MinIntrNonCtrlIntrTot',
            'TR.MARKETCAPITALISATION',
            'TR.F.CashCashEquivTot',
            'TR.TotalDebtOutstanding',
            'TR.F.InvntTot',
            'TR.F.PrefStockRedeemTot',
            'TR.MnATargetNetSalesMRQ',
            'TR.F.TotRevenue',
            'TR.F.EBITDA',
            'TR.DepreciationAmort',                        
            'TR.F.CostOfOpRev',
            'TR.F.TotAssets'
        ],
        interval='monthly',
        start='2010-01-31',
        end='2024-12-31'
    )
    
    # Append the results to the main DataFrame
    all_data = pd.concat([all_data, df])
    
    # Display progress
    print(f"Processed chunk {i // chunk_size + 1} of {len(stock_codes) // chunk_size + 1}")
    
    # Sleep between chunks
    time.sleep(sleep_time)
    
    Save the final result to a CSV file

    all_data.to_csv('stock_chars.csv', index=True)

    And i got this result, the code on the firt row is PermID of JPM, and this output is inconsistent in terms of dates and there is a lot of missing values:

    image.png
  • Hi @Robinrob

    Thanks for the sample.

    When using the 'get_history()' call you will have to use a different set of fields. In addition, the call does not support permids as input, only RICs - so you will have to convert.

    For example:

    import lseg.data as ld
    from lseg.data.content import symbol_conversion … # Convert permid to ric
    permids = ['5000021791']
    response = symbol_conversion.Definition(permids, from_symbol_type=symbol_conversion.SymbolTypes.OA_PERM_ID).get_data()
    image.png

    Then pull out the RIC and perform the following get_history() call, without specifying any fields - this will default to bring back everyone:

    image.png

    The other way users pull down history is by utilizing the get_data() function which pulls out historical content from a different database. With this, you can use the fields you specified.

    For example:

    ld.get_data(
    universe=['5000021791'],
    fields=['TR.FiSPRating', # No ratings returned - you may need additional license
    'TR.F.TotRevenue.calcdate',
    'TR.TotalEquity',
    'TR.F.MinIntrNonCtrlIntrTot',
    'TR.MARKETCAPITALISATION',
    'TR.F.CashCashEquivTot',
    'TR.TotalDebtOutstanding',
    'TR.F.InvntTot',
    'TR.F.PrefStockRedeemTot',
    #'TR.MnATargetNetSalesMRQ', # Commented out - causes duplicates in result set
    'TR.F.TotRevenue',
    'TR.F.EBITDA',
    'TR.DepreciationAmort',
    'TR.F.CostOfOpRev',
    'TR.F.TotAssets'],
    parameters={'SDate': "2010-01-31", 'EDate': "2024-12-31", 'frq': "CM"}
    )
    image.png

    This should provide you a place to start to experiment.

  • Robinrob
    Robinrob Newcomer

    So the get_history method only supports RIC identifiers for both stocks and bonds?

  • Jirapongse
    Jirapongse ✭✭✭✭✭

    @Robinrob

    The get_history method can retreive historical data of both TR fields and real-time fields. The TR fields are the fields that have the "TR." prefix while the real-time fields don't have the "TR." prefix, such as BID and ASK.

    To retrieve historical data from the TR fields, you can use PermIDs. However, to retrieve historical data of real-time fields, you can use only RICs.

    I tested the code and found that the 'TR.MnATargetNetSalesMRQ' field may cause this issue. If I removed it, I can retrieve the data properly.

    df = ld.get_history(
        universe=['5000021791'],
        fields=[
                   'TR.FiSPRating',
            'TR.TotalEquity',
            'TR.F.MinIntrNonCtrlIntrTot',
            'TR.MARKETCAPITALISATION',
            'TR.F.CashCashEquivTot',
            'TR.TotalDebtOutstanding',
            'TR.F.InvntTot',
            'TR.F.PrefStockRedeemTot'
            'TR.F.TotRevenue',
            'TR.F.EBITDA',
            'TR.DepreciationAmort',                        
            'TR.F.CostOfOpRev',
            'TR.F.TotAssets'
        ],
        interval='monthly',
        start='2010-01-31',
        end='2024-12-31'
    )
    
    df
    
  • Robinrob
    Robinrob Newcomer

    I tried to retrieve these fields with stock RICS and i still get data out of the time parameters i defined, is it more robust to use get_data?:

    Load the stocks

    file = pd.read_csv('filtered_bonds_final.csv')
    stocks = file['RIC'].unique().tolist() # Convert to a list
    all_data=pd.DataFrame()

    Define chunk size for processing

    chunk_size = 20 # Adjust as needed for API rate limits or memory constraints
    sleep_time = 2 # Time (in seconds) to wait between API calls

    File to store ESG data

    esg_data_file = 'ESG_data.csv'

    Define the fields to retrieve

    fields = [
    'TR.AnalyticCO2DirectScope1ToEVIC',
    'TR.AnalyticCO2DirectScope1YoY',
    'TR.AnalyticCO2EVIC',
    'TR.AnalyticCO2EquivalentEmissionsTotalYoY',
    'TR.AnalyticCO2IndirectScope2ToEVIC',
    'TR.AnalyticCO2IndirectScope2YoY',
    'TR.AnalyticDischargeWaterSystemScore',
    'TR.AnalyticEnergyUse',
    'TR.AnalyticEnergyUseEVIC',
    'TR.AnalyticEnvControvScore',
    'TR.AnalyticEnvExpendituresScore',
    'TR.AnalyticGHGEmissionsDirectScope1ToEVIC',
    'TR.AnalyticGHGEmissionsIndirectScope2ToEVIC',
    'TR.AnalyticGHGEmissionsIntensityIndirectScope2YoYPercentChg',
    'TR.AnalyticGHGEmissionsIntensityIndirectScope3YoYPercentChg',
    'TR.AnalyticGHGEmissionsIntensityScope1and2YoYChg',
    'TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR3Year',
    'TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR5Year',
    'TR.AnalyticGHGEmissionsIntensityScope1and2and3YoYChg',
    'TR.AnalyticGHGEmissionsIntensityScope3CAGR5Year',
    'TR.AnalyticGHGEmissionsIntensitydirectScope1YoYPercentChg',
    'TR.AnalyticGHGEmissionsScope1and2and3',
    'TR.AnalyticGHGEmissionsScope1and2and3CAGR3Year',
    'TR.AnalyticGHGEmissionsScope1and2and3CAGR5Year',
    'TR.AnalyticGHGEmissionsScope1and2and3ToEVIC',
    'TR.AnalyticGHGEmissionsScope1and2and3YoYChgPercent',
    'TR.AnalyticHazardousWasteEVIC',
    'TR.AnalyticRenewEnergyUse',
    'TR.AnalyticResourceRedPolicyScore',
    'TR.AnalyticResourceRedTargetsScore',
    'TR.AnalyticTotalWasteEVIC',
    'TR.CDPEmissionReductionTargetPctage',
    'TR.CDPFutureCapexAlignmentCommitment',
    'TR.CDPPhasingoutInvestmentsinCarbonIntensiveAssetsorProducts',
    'TR.CompanyGreenRevenuePercentage',
    'TR.ControvEnv',
    'TR.EnvExpenditures',
    'TR.EnvMaterialsSourcingScore',
    'TR.EnvRD',
    'TR.GovernancePillarScore',
    'TR.GovernancePillarScoreGrade',
    'TR.PolicyEnergyEfficiencyScore',
    'TR.PolicyEnvSupplyChainScore',
    'TR.PolicySustainablePackagingScore',
    'TR.Scope1EstTotalToEVIC',
    'TR.Scope1EstTotalToRevenues',
    'TR.Scope1andScope2EstTotalsToRevenues',
    'TR.Scope1andScope2andScope3EstTotal',
    'TR.Scope2EstTotalToEVIC',
    'TR.Scope2EstTotalToRevenues',
    'TR.Scope3DownstreamEstTotalToEVIC',
    'TR.Scope3DownstreamEstTotalToRevenues',
    'TR.Scope3DownstreamTotalToEVIC',
    'TR.Scope3DownstreamTotalToRevenues',
    'TR.Scope3EstTotalToRevenues',
    'TR.Scope3UpstreamEstTotalToEVIC',
    'TR.Scope3UpstreamEstTotalToRevenues',
    'TR.Scope3UpstreamTotalToEVIC',
    'TR.Scope3UpstreamTotalToRevenues',
    'TR.SocialPillarScore',
    'TR.SocialPillarScoreGrade',
    'TR.TargetEmissionsScore',
    'TR.TargetPollution',
    'TR.TargetsWaterEfficiencyScore',
    'TR.ToxicChemicalsReductionScore',
    'TR.TRESGCControversiesScore',
    'TR.TRESGCControversiesScoreGrade',
    'TR.TRESGCControversiesScoreGrowth',
    'TR.TRESGCScore',
    'TR.TRESGCScoreGrade',
    'TR.TRESGProductResponsibilityScore',
    'TR.TRESGProductResponsibilityScoreGrade',
    'TR.TRESGScore',
    'TR.TRESGScoreGrade',
    'TR.WaterRecycledScore',
    ]

    for i in range(0, len(stocks), chunk_size):
    chunk = stocks[i:i + chunk_size] # Select a chunk of stocks
    print(f"Processing chunk {i // chunk_size + 1}: {len(chunk)} instruments")

    Fetch ESG data for the current chunk
    df = ld.get_history(
        universe=chunk,
        fields=fields,
        interval="monthly",
        start='2010-01-31',
        end='2024-12-31'
    )
    
    
    # Append the results to the main DataFrame
    df=df.stack(level=0).reset_index()
    
    # Extract column names dynamically
    extracted_columns =list(panel_data.columns.get_level_values(0).unique())
    panel_data.columns = extracted_columns
    
    all_data = pd.concat([all_data, df])
    
    # Display progress
    print(f"Processed chunk {i // chunk_size + 1} of {len(stocks) // chunk_size + 1}")
    
    # Sleep between chunks
    time.sleep(sleep_time)
    
    Ensure the Date column is ordered from the oldest to the newest

    panel_data['Date'] = pd.to_datetime(panel_data['Date']) # Convert to datetime if not already
    panel_data = panel_data.sort_values(by='Date').reset_index(drop=True)

    Save the final result to a CSV file

    panel_data.to_csv('esg_chars.csv', index=False)

  • Jirapongse
    Jirapongse ✭✭✭✭✭

    @Robinrob

    Please scope down the issue by providing sample instruments used in the universe parameter what causes this issue. Therefore, we can test it with this code and focus on the resposne returned from the get_history method.

    df = ld.get_history(
    universe=chunk,
    fields=fields,
    interval="monthly",
    start='2010-01-31',
    end='2024-12-31'
    )
  • Robinrob
    Robinrob Newcomer

    Here is a sample of the codes 020002BA8, 48130CQH1
    , 871829AF4
    , 559222BA1
    and 20030NBU4 that make the retireval start in 2007 and not in 2010.

  • Jirapongse
    Jirapongse ✭✭✭✭✭

    @Robinrob

    Thank you for the information.

    I requested one field at a time to check the returned start date and found that some fields are not availble. The results are:

    -  TR.AnalyticCO2DirectScope1ToEVIC  :  An error occured Unable to resolve all requested fields in ['TR.ANALYTICCO2DIRECTSCOPE1TOEVIC']. The formula must contain at least one field or function.
    - TR.AnalyticCO2DirectScope1YoY : 2009-01-31T00:00:00.000000000
    - TR.AnalyticCO2EVIC : 2009-01-31T00:00:00.000000000
    - TR.AnalyticCO2EquivalentEmissionsTotalYoY : 2009-01-31T00:00:00.000000000
    - TR.AnalyticCO2IndirectScope2ToEVIC : An error occured Unable to resolve all requested fields in ['TR.ANALYTICCO2INDIRECTSCOPE2TOEVIC']. The formula must contain at least one field or function.
    - TR.AnalyticCO2IndirectScope2YoY : 2009-01-31T00:00:00.000000000
    - TR.AnalyticDischargeWaterSystemScore : No Data
    - TR.AnalyticEnergyUse : 2008-06-28T00:00:00.000000000
    - TR.AnalyticEnergyUseEVIC : 2009-01-31T00:00:00.000000000
    - TR.AnalyticEnvControvScore : 2008-06-28T00:00:00.000000000
    - TR.AnalyticEnvExpendituresScore : 2008-06-28T00:00:00.000000000
    - TR.AnalyticGHGEmissionsDirectScope1ToEVIC : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIndirectScope2ToEVIC : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIntensityIndirectScope2YoYPercentChg : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIntensityIndirectScope3YoYPercentChg : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIntensityScope1and2YoYChg : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR3Year : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIntensityScope1and2and3CAGR5Year : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIntensityScope1and2and3YoYChg : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIntensityScope3CAGR5Year : 2008-12-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsIntensitydirectScope1YoYPercentChg : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsScope1and2and3 : 2008-12-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsScope1and2and3CAGR3Year : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsScope1and2and3CAGR5Year : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsScope1and2and3ToEVIC : 2009-01-31T00:00:00.000000000
    - TR.AnalyticGHGEmissionsScope1and2and3YoYChgPercent : 2009-01-31T00:00:00.000000000
    - TR.AnalyticHazardousWasteEVIC : 2010-01-31T00:00:00.000000000
    - TR.AnalyticRenewEnergyUse : 2009-02-28T00:00:00.000000000
    - TR.AnalyticResourceRedPolicyScore : 2008-06-28T00:00:00.000000000
    - TR.AnalyticResourceRedTargetsScore : 2008-06-28T00:00:00.000000000
    - TR.AnalyticTotalWasteEVIC : 2010-01-31T00:00:00.000000000
    - TR.CDPEmissionReductionTargetPctage : 2009-01-31T00:00:00.000000000
    - TR.CDPFutureCapexAlignmentCommitment : No Data
    - TR.CDPPhasingoutInvestmentsinCarbonIntensiveAssetsorProducts : 2009-01-31T00:00:00.000000000
    - TR.CompanyGreenRevenuePercentage : An error occured Output parameter 'date' is unrecognized. This could possibly happen because of mismatch between top level function expression and its usage in output. Make sure that you used exactly 1:1 identical expressions. Requested universes: ['020002BA8', '48130CQH1', '871829AF4', '559222BA1']. Requested fields: ['TR.COMPANYGREENREVENUEPERCENTAGE']
    - TR.ControvEnv : 2010-01-31T00:00:00.000000000
    - TR.EnvExpenditures : 2009-01-31T00:00:00.000000000
    - TR.EnvMaterialsSourcingScore : 2008-06-28T00:00:00.000000000
    - TR.EnvRD : No Data
    - TR.GovernancePillarScore : 2008-06-28T00:00:00.000000000
    - TR.GovernancePillarScoreGrade : 2008-06-28T00:00:00.000000000
    - TR.PolicyEnergyEfficiencyScore : 2008-06-28T00:00:00.000000000
    - TR.PolicyEnvSupplyChainScore : 2008-06-28T00:00:00.000000000
    - TR.PolicySustainablePackagingScore : 2008-06-28T00:00:00.000000000
    - TR.Scope1EstTotalToEVIC : 2009-01-31T00:00:00.000000000
    - TR.Scope1EstTotalToRevenues : No Data
    - TR.Scope1andScope2EstTotalsToRevenues : 2008-12-31T00:00:00.000000000
    - TR.Scope1andScope2andScope3EstTotal : 2008-12-31T00:00:00.000000000
    - TR.Scope2EstTotalToEVIC : 2009-01-31T00:00:00.000000000
    - TR.Scope2EstTotalToRevenues : No Data
    - TR.Scope3DownstreamEstTotalToEVIC : 2009-01-31T00:00:00.000000000
    - TR.Scope3DownstreamEstTotalToRevenues : 2008-12-31T00:00:00.000000000
    - TR.Scope3DownstreamTotalToEVIC : 2009-01-31T00:00:00.000000000
    - TR.Scope3DownstreamTotalToRevenues : 2009-01-31T00:00:00.000000000
    - TR.Scope3EstTotalToRevenues : 2008-12-31T00:00:00.000000000
    - TR.Scope3UpstreamEstTotalToEVIC : 2009-01-31T00:00:00.000000000
    - TR.Scope3UpstreamEstTotalToRevenues : 2008-12-31T00:00:00.000000000
    - TR.Scope3UpstreamTotalToEVIC : 2009-01-31T00:00:00.000000000
    - TR.Scope3UpstreamTotalToRevenues : 2009-01-31T00:00:00.000000000
    - TR.SocialPillarScore : 2008-06-28T00:00:00.000000000
    - TR.SocialPillarScoreGrade : 2008-06-28T00:00:00.000000000
    - TR.TargetEmissionsScore : 2008-12-31T00:00:00.000000000
    - TR.TargetPollution : An error occured Unable to resolve all requested fields in ['TR.TARGETPOLLUTION']. The formula must contain at least one field or function.
    - TR.TargetsWaterEfficiencyScore : 2009-01-31T00:00:00.000000000
    - TR.ToxicChemicalsReductionScore : 2008-06-28T00:00:00.000000000
    - TR.TRESGCControversiesScore : 2008-06-28T00:00:00.000000000
    - TR.TRESGCControversiesScoreGrade : 2008-06-28T00:00:00.000000000
    - TR.TRESGCControversiesScoreGrowth : 2008-06-28T00:00:00.000000000
    - TR.TRESGCScore : 2008-06-28T00:00:00.000000000
    - TR.TRESGCScoreGrade : 2008-06-28T00:00:00.000000000
    - TR.TRESGProductResponsibilityScore : 2008-06-28T00:00:00.000000000
    - TR.TRESGProductResponsibilityScoreGrade : 2008-06-28T00:00:00.000000000
    - TR.TRESGScore : 2008-06-28T00:00:00.000000000
    - TR.TRESGScoreGrade : 2008-06-28T00:00:00.000000000
    - TR.WaterRecycledScore : No Data

    According to this information, it looks like to be a content issue. I would like to suggest to contact the helpdesk team directly via MyAccount to verify the fields, parameters, and data.