For a deeper look into our Eikon Data API, look into:

Overview | Quickstart | Documentation | Downloads | Tutorials | Articles


Extracting data for 100,000 RICs from Refinitiv through Eikon/RDP

Hello,

I have a list of 100,000+ RICs in a .csv file and I want to create a code that opens the file and retrieves bonds' monthly midyield, midprice, ask price, bid price, ask yield, bid yield, ratings, and monthly transaction volume and saves the output in a new .csv file. Ideally, the code should read RICs by chunks and move on to another chunk automatically (this is where my code fails, it only retrieves data for 1 chunk).

I have read every single thread on this topic but since I have 0 technical background, I am struggling to optimise my code.

I would appreciate if you could edit my Eikon code and provide a sample code for RDP.

#Importing
import eikon as ek
import pandas as pd
import refinitiv.dataplatform as rdp

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# Open the desktop session first; the endpoint below needs it
session = rdp.open_desktop_session('XX')
symbologylookup_endpoint = rdp.Endpoint(
    session, 'https://api.refinitiv.com/discovery/symbology/v1/lookup')


#create a list of RICs
import csv
instruments = []
with open("Data.csv") as f:
    for row in csv.reader(f):
        instruments.append(row[0])


#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE", "TR.BIDPRICE", "TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist=2000
for i in range(0, len(instruments), chunklist):
   stock = instruments[i:i+chunklist]
   df, err = ek.get_data(stock, Bond_fields,{'CALCMETHOD':'SUM','Frq': 'M', 'SDate': '2016-01-01','EDate': '2021-12-31'})
   content_df.append(df)
content_df = pd.concat(content_df)
content_df 


@zoya faberov @jason.ramchandani01 @j.dessain

python · refinitiv-dataplatform-eikon · rdp-api · data · bonds

Accepted

Hello @georgecambridge001 ,

As I see that you have been provided with the solution on the other discussion thread currently active, I will avoid duplicating the same answer and refer to that discussion instead.

Please note that the RD library is strategic and will continue to actively evolve and improve going forward, while RDP is legacy and remains available as is at the time of this writing. It is recommended to select RD Library Python for any new requirement implementation, as suggested and illustrated with examples by @umer.nalla and @nick.zincone.



Hi @georgecambridge001,

I have updated your code; please test it and let us know if it works for you:


#Importing
import eikon as ek
import pandas as pd
import refinitiv.dataplatform as rdp

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# Open the desktop session first; the endpoint below needs it
session = rdp.open_desktop_session('XX')
symbologylookup_endpoint = rdp.Endpoint(
    session, 'https://api.refinitiv.com/discovery/symbology/v1/lookup')


#create a list of RICs
import csv
instruments = []
with open("Data.csv") as f:
    for row in csv.reader(f):
        instruments.append(row[0])


#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE", "TR.BIDPRICE", "TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist=2000
for i in range(0, len(instruments), chunklist):
   stock = instruments[i:i+chunklist]
   df, err = ek.get_data(stock, Bond_fields,{'CALCMETHOD':'SUM','Frq': 'M', 'SDate': '2016-01-01','EDate': '2021-12-31'})
   content_df.append(df)
final_df = pd.concat(content_df)
final_df 
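Since the full run spans 100,000+ RICs, it can also help to append each chunk to the output .csv as soon as it is retrieved, so partial progress survives an interruption. A minimal sketch of that pattern, with a hypothetical fetch_chunk function standing in for the ek.get_data call:

```python
import pandas as pd

def fetch_chunk(rics):
    # Hypothetical stand-in for ek.get_data; returns one row per RIC.
    return pd.DataFrame({"Instrument": rics, "Mid Yield": [1.0] * len(rics)})

instruments = [f"RIC{n}" for n in range(10)]
chunksize = 4
out_path = "output.csv"

for i in range(0, len(instruments), chunksize):
    df = fetch_chunk(instruments[i:i + chunksize])
    # Write the header only for the first chunk, then append.
    df.to_csv(out_path, mode="w" if i == 0 else "a",
              header=(i == 0), index=False)

print(pd.read_csv(out_path).shape)  # (10, 2)
```

With this shape, a failed run can simply be restarted from the last chunk already written, instead of holding 100,000 RICs' worth of results in memory before the final concat.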

ezyzip.zip

Hello,

No it does not work. I get this error for concat:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Please note that some parts of the code turn red once I paste them into the cell (see attached photo). When I moved them to the right and ran the code, I got N/A for all the columns. I am also attaching the original .csv file for your reference. I have been struggling to complete this project for the last 3 weeks and would appreciate it if you could test the code in your environment. Please find attached a sample of 10,000 RICs.

screenshot-2022-06-06-at-100522.png

Finally, I would also like to request a sample code that performs the identical operation but with RDP.

@Shadab Hussain

@zoya faberov @jason.ramchandani01 @j.dessain


Hello @georgecambridge001 ,

In addition to the solution from @Shadab Hussain, you may wish to watch out for duplicates. If the concatenated DataFrames contain either columns that are named the same (for example "DATE") or rows that contain identical values, this may result in an "InvalidIndexError" on concatenation.

So an approach similar to:

#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE", "TR.BIDPRICE", "TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist=1000
for i in range(0, len(instruments), chunklist):
   stock = instruments[i:i+chunklist]
   df, err = ek.get_data(stock, Bond_fields,{'CALCMETHOD':'SUM','Frq': 'M', 'SDate': '2016-01-01','EDate': '2021-12-31'})
   print('<<<iter=',str(i),'>>>\nRows=',str(len(df.index)),',Rows duplicates dropped=',str(len(df.drop_duplicates().index)),',Columns=',str(len(df.columns)),'df=\n')#, df,'\n')
   df.columns = ['INSTRUMENT','MIDY','MIDP','ASKP_DATE','ASKP','BIDP','ASKY','BIDY','RATINGDESCR','RATING','RATING_DATE']
   df.drop_duplicates(inplace=True)
   content_df.append(df)
final_df = pd.concat(content_df,ignore_index=True)
print(final_df) 

Renaming all columns with unique names and dropping duplicate rows could help you avoid the error. Any approach that avoids duplicates should work.
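The effect of the fix can be seen in isolation with a small pandas sketch (the column names here are illustrative, not actual API output):

```python
import pandas as pd

# Two chunks whose raw column labels may collide (e.g. several "Date"
# columns), which is what can trigger InvalidIndexError on pd.concat.
chunk1 = pd.DataFrame([["RIC1", 1.0, "2021-01-31"]])
chunk2 = pd.DataFrame([["RIC2", 2.0, "2021-01-31"]])

# Assign unique names per chunk, drop duplicate rows, then concatenate
# with ignore_index=True so the row index is rebuilt cleanly.
for df in (chunk1, chunk2):
    df.columns = ["INSTRUMENT", "MIDY", "DATE"]
    df.drop_duplicates(inplace=True)

final_df = pd.concat([chunk1, chunk2], ignore_index=True)
print(final_df.shape)  # (2, 3)
```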

The RD library is the strategic solution, the RDP library being its predecessor, and you can easily incorporate it just by replacing:

import eikon as ek

with

import refinitiv.data.eikon as ek

Hope this helps


Hello @zoya faberov and thanks for your reply.

I have tested your code, but unfortunately, like my previous code, it does not move on to the next chunk. When I specify chunklist=1000, it simply returns bond data for only 1000 RICs and does not move to the next 1000 RICs automatically. What can be the solution to this issue? I have a list of 100,000 RICs and want to automate the process.

Additionally, I get the following error

AttributeError: 'NoneType' object has no attribute 'logger'

when I replace

  
                
import eikon as ek

with

import refinitiv.data.eikon as ek

@zoya faberov @Shadab Hussain @jason.ramchandani01 @j.dessain


Hello @georgecambridge001,

As the chunking code is working on my side, let's try to understand better what is happening on your side.

1. Are you running outside of CodeBook, as a bare .py script rather than in Jupyter Lab/Notebook (.ipynb)?

2. Try breaking after the first iteration and printing the first result DataFrame and any errors that come up on your first iteration. For example:

#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE", "TR.BIDPRICE", "TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist=1000
for i in range(0, len(instruments), chunklist):
   stock = instruments[i:i+chunklist]
   df, err = ek.get_data(stock, Bond_fields,{'CALCMETHOD':'SUM','Frq': 'M', 'SDate': '2016-01-01','EDate': '2021-12-31'})
   print('<<<iter=',str(i),'>>>\nRows=',str(len(df.index)),',Rows duplicates dropped=',str(len(df.drop_duplicates().index)),',Columns=',str(len(df.columns)),'df=\n')#, df,'\n')
   df.columns = ['INSTRUMENT','MIDY','MIDP','ASKP_DATE','ASKP','BIDP','ASKY','BIDY','RATINGDESCR','RATING','RATING_DATE']
   df.drop_duplicates(inplace=True)
   print(df,'\n')
   print('***iter=',str(i),'***\nerr=',err)
   content_df.append(df)
   break
final_df = pd.concat(content_df,ignore_index=True)
print('Len=',str(len(final_df.index)))
print(final_df) 

My DataFrame output setups prior to this step look like:

pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 16)

Therefore, my output looks like:

<<<iter= 0 >>>
Rows= 46428 ,Rows duplicates dropped= 22809 ,Columns= 11 df=

           INSTRUMENT       MIDY        MIDP        ASKP_DATE       ASKP       BIDP       ASKY       BIDY RATINGDESCR RATING RATING_DATE
0      00003#AB9=RRPS  35.659348  1001.72665  2019-09-30T0...  1003.0972  1000.3561  35.159632  36.159632
1      00003#AB9=RRPS   80.06577   2309.1513  2019-10-31T0...  2312.2468  2306.0558  78.916549  81.216549
2      00003#AB9=RRPS  74.969156  2103.19835  2019-11-30T0...  2105.9374  2100.4593   73.91978   76.01978
3      00003#AB9=RRPS  78.872257  2202.41415  2019-12-31T0...     2205.2  2199.6283  77.772943  79.972943
4      00003#AB9=RRPS  79.957356  2308.58905  2020-01-31T0...  2311.4224  2305.7557  78.807996  81.107996
...               ...        ...         ...              ...        ...        ...        ...        ...         ...    ...         ...
46423  004421QQ9=FINR       <NA>        <NA>                        <NA>       <NA>       <NA>       <NA>
46424  004421QR7=FINR       <NA>        <NA>                        <NA>       <NA>       <NA>       <NA>
46425  004421RH8=FINR       <NA>        <NA>                        <NA>       <NA>       <NA>       <NA>
46426  004421TQ6=FINR       <NA>        <NA>                        <NA>       <NA>       <NA>       <NA>
46427  004421UC5=FINR       <NA>        <NA>                        <NA>       <NA>       <NA>       <NA>

[22809 rows x 11 columns] 

and my errors (which are not empty, but not fatal) look like:

***iter= 0 ***
err= [{'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 115}, {'code': 416, 'col': 9,
...

3. My eikon library installed is version 1.1.14

4. I cannot reproduce the logger error; I am able to replace the library and run the same code successfully. I have refinitiv-data library version 1.0.0b9 installed, rather than the latest 1.0.0b13. This is because some issues were introduced in b13 that library development is looking into, so I am holding off and staying on version b9 for now.

Let us know how (2) works on your side and what you find.




Hello @zoya faberov and @Shadab Hussain , thank you for your support. It works better now.

However, as I stated in this thread before, I want to retrieve the identical information with historical_pricing.summaries.Definition (not RD, only RDP historical summaries, because for some RICs I found that RDP retrieves information while RD does not). Would you be so kind as to look at my sample code and optimise it so that it can once again open the attached (updated, de-duplicated) .csv file of RICs, go through each, and return the identical bond-related information?

Thank you so much for your help. You are making people's lives so much easier.



rics.zip (44.9 KiB)
new-rdp.txt (893 B)