Very bizarre behavior: data that comes back as NaN when requested for multiple securities comes back as valid when requested one security at a time. Stranger still, if the multi-security request is rerun after a single-security request, any securities that were requested singly now have non-NaN values.
Reproduction instructions:
Suppose I set rics to a list of rics:
rics = [u'GLT', u'GLW', u'GATX.K', u'NEO.O', u'GNTX.O', u'CORT.O', u'COLM.O', u'GPC', u'GPS', u'MRC', u'CASH.O', u'GRC', u'GT.O', u'ZBH', u'LBAI.O', u'ROCK.O', u'IAC.O', u'GWW', u'AJRD.K', u'KAI', u'UBSH.O', u'HAL', u'BECN.O', u'HAS.O', u'HRC', u'RDWR.O', u'HBAN.O', u'ARNA.O', u'PTEN.O', u'ALLT.O', u'DECK.K', u'WELL.K', u'HCP', u'HCSG.O', u'BEL', u'HD', u'HOG', u'HE', u'UFPI.O', u'HEI', u'SCHN.O', u'HELE.O', u'MOV', u'LXP', u'CHK', u'REG.O', u'CSEAF.PK', u'HL']
Consider the following base request:
In [19]: data, err = ek.get_data(rics, ['TR.CUSIPCode', 'TR.CinCUSIPCode'])
In the response to this first call, I get all NaN for every RIC:
In [20]: print data
NaN TR.CUSIPCODE TR.CINCUSIPCODE
0 GLT NaN NaN
1 GLW NaN NaN
2 GATX.K NaN NaN
...
46 CSEAF.PK NaN NaN
47 HL NaN NaN
In the second call, I request just the first RIC, and find a valid CUSIP for it:
In [21]: data, err = ek.get_data(rics[0], data_field)
In [22]: print data
Instrument CUSIP Code CIN
0 GLT 377316104 NaN
Note that requesting RIC GLT alone makes it non-NaN, even though in the multi-RIC request it was NaN.
If I then immediately re-request all the RICs, GLT now has a valid CUSIP code but none of the other RICs do:
In [23]: data, err = ek.get_data(rics, data_field)
In [24]: print data
Instrument CUSIP Code CIN
0 GLT 377316104.0 NaN
1 GLW NaN NaN
2 GATX.K NaN NaN
...
46 CSEAF.PK NaN NaN
47 HL NaN NaN
This seems really strange, and I have confirmed that it occurs on both pyeikon 0.1.13 and 1.0.0.
Any ideas what is going on here?
@davidk that's a backend oddity that looks like a bug. Here is a workaround, though it is not ideal: request the RICs in small batches and use the TR.CUSIPExtended field instead:
df, err = ek.get_data(rics[:10], ['TR.CUSIPExtended'])
In the meantime, I will report it to the dev.
Thanks, we will use the CUSIPExtended field.
As for batching: is there a general rule for what limit to use? We're quants who often request data fields for 1000-2000 securities. Limiting requests, even for a single field, to batches of 10 (or 1, the only size we've found to work consistently of late) really hampers the performance of the Python API. I know there are data limits associated with each API request, but IIRC they were orders of magnitude higher than this.
@davidk the causes are being investigated. I suggested batching as a workaround for this specific request only, so you should be fine in other cases; the API will tell you when the per-request limit is hit.
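For the batching workaround itself, a minimal sketch might look like the following. The helper names chunked and get_data_in_batches are hypothetical and not part of the eikon package. The default batch size of 10 just mirrors the rics[:10] slice suggested above and is not a documented limit. The fetch function is passed in so that the helper stays independent of the client; it is assumed to return a (data, errors) pair, as ek.get_data does in the calls shown in this thread.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_data_in_batches(fetch, instruments, fields, batch_size=10):
    """Call fetch(batch, fields) once per batch and collect the results.

    `fetch` is assumed to return a (data, errors) pair, where `errors`
    is None or a list. Returns (list_of_data_objects, list_of_errors).
    """
    frames, errors = [], []
    for batch in chunked(list(instruments), batch_size):
        data, err = fetch(batch, fields)
        if data is not None:
            frames.append(data)
        if err:
            errors.extend(err)
    return frames, errors
```

With the real client this could be called as get_data_in_batches(ek.get_data, rics, ['TR.CUSIPExtended']), and the returned frames stitched together with pandas.concat(frames, ignore_index=True).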
Another optimization technique for you, until we finish investigating it, would be to use the symbology API to convert RIC to CUSIP, and in a separate call look for the CIN for the tickers that do not have a CUSIP:
df = ek.get_symbology(rics, from_symbol_type='RIC', to_symbol_type='CUSIP')
cin_rics = df[df['CUSIP'].isnull()].index.tolist()
cin_df, err = ek.get_data(cin_rics, ['TR.CUSIPExtended'])
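The two-step lookup above can be generalized into a small merge routine. This is only a sketch: is_missing and resolve_with_fallback are hypothetical names, and the two lookup callables are assumed to be thin wrappers (not shown) that turn the get_symbology and get_data results into plain dicts mapping each RIC to an identifier, None, or NaN.

```python
def is_missing(value):
    """True for None, an empty string, or NaN (the only value unequal to itself)."""
    return value is None or value == "" or value != value


def resolve_with_fallback(instruments, primary_lookup, fallback_lookup):
    """Resolve identifiers with a primary source, then retry the gaps.

    Both lookups take a list of instruments and return a dict
    {instrument: identifier_or_missing}. Returns (resolved, unresolved).
    """
    resolved = dict(primary_lookup(instruments))
    missing = [ric for ric in instruments if is_missing(resolved.get(ric))]
    if missing:
        # Only the instruments the primary source could not resolve are retried.
        for ric, code in fallback_lookup(missing).items():
            if not is_missing(code):
                resolved[ric] = code
    unresolved = [ric for ric in instruments if is_missing(resolved.get(ric))]
    return resolved, unresolved
```

Keeping the second call restricted to the unresolved instruments is what keeps the fallback request small, which matters while the multi-RIC request is unreliable.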