Retrieving bulk data
I made a series of requests for 1500 RICs. My Composite, FundAllocation, and TermsAndConditions requests all came back, but my two PriceHistory requests (adjusted and unadjusted prices) and my CorporateActions requests have not come back yet.
These requests are all available under X-Client-Session-Id:
6e8cdf96-0508-11e9-8014-525400a87d41_6933
6e8cdf96-0508-11e9-8014-525400a87d41_6938
6e8cdf96-0508-11e9-8014-525400a87d41_6932
For the two PriceHistory requests I am using
https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw
For the CorporateActions I am using
https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotes
I have set a 4-minute (!) timeout on my requests, but I am still occasionally timing out when hitting the Location URL received with the 202 response.
The requests were all created at 08:09, yet half an hour later they are still not resolved.
These 1500 RICs are only part of the overall data we need to retrieve, so this is a bit worrying.
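For context, my polling loop looks roughly like this (a minimal sketch; `poll_until_done`, the interval values, and the `session` object are my own names, assuming a requests-style HTTP session, not anything prescribed by the DSS API):

```python
import time

def poll_until_done(session, location_url, interval=30, max_wait=3600):
    """Poll the Location URL from a 202 response until the extraction
    completes (HTTP 200) or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        # Keep the per-request timeout modest; completion is signalled
        # by the status code, not by holding the connection open.
        resp = session.get(location_url, timeout=240)
        if resp.status_code == 200:
            return resp.json()       # extraction result is ready
        if resp.status_code != 202:
            resp.raise_for_status()  # unexpected error
        time.sleep(interval)         # still running; wait and retry
    raise TimeoutError(f"Extraction not ready after {max_wait}s")
```

Even with this loop, individual GETs against the Location URL sometimes hit the 4-minute timeout.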
Best Answer
-
Those are quite big requests (>1200 RICs, 20 years of data). For such large requests (many RICs, a long time range, and/or many data fields), I recommend segmenting them into more manageable ones.
In the DSS best practices we recommend making fewer queries for more RICs, rather than many queries for fewer RICs, to minimize overhead. This works well, but there is a point where very large requests become a disadvantage. This is not documented, but on reflection it seems logical: every system has its limits, and in your case it seems your queries have hit one.
The difficulty is evaluating when a request becomes too big. This is more of an art than a science, and it will take some experimenting with your queries to determine what works best for you. As a very rough rule of thumb, if a query takes more than an hour or two to complete, it is probably too big.
For your PriceHistory and Corporate Actions requests I suggest you try one of the following approaches:
- Request 1 instrument over 20 years. If it takes less than 5 minutes, try 10 instruments; if that also takes less than 5 minutes, keep increasing the instrument count. Stop increasing when the wait time grows too long.
- Alternatively, request more instruments per request but over a shorter time period. I'd try ranges of 1 year as a start and apply a similar methodology as above.
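To illustrate the segmentation, here is a trivial sketch of both splitting strategies (the helper names and batch sizes are placeholders; the right values are exactly what the experimentation above is meant to find):

```python
from datetime import date

def chunk_rics(rics, batch_size):
    """Split the instrument list into batches of at most batch_size RICs."""
    return [rics[i:i + batch_size] for i in range(0, len(rics), batch_size)]

def yearly_ranges(start, end):
    """Split [start, end) into consecutive ranges of at most one calendar year."""
    ranges = []
    while start < end:
        # Cut at the next 1 January, or at `end` if that comes first.
        cut = min(date(start.year + 1, 1, 1), end)
        ranges.append((start, cut))
        start = cut
    return ranges
```

You would then submit one extraction per batch (or per batch-and-range pair) and merge the results on your side.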
I agree this is not ideal, but there is no silver bullet I'm afraid.
I hope this helps.
Answers
-
Can you please share the bodies of the two PriceHistory requests and the CorporateActions request? That will help us analyze.
Please note that for 1500 instruments, depending on the number of fields and the total amount of data, the extraction time could be significant.
-
They're a bit big. Can you find them from the X-Client-Session-Ids?
I expect the CorporateActions problem is data size, so I am changing that to use the Csv/RawExtract flow.
-
You can share them as attachments. Only the DataScope product support team can access requests from session IDs; we moderators cannot.
For very big requests (many RICs and/or a long time range), I'd suggest segmenting them into more manageable ones, maybe 500 RICs at a time and time ranges of 1 year (these are ballpark numbers, in the absence of details). We recommend making fewer queries for more RICs rather than many queries for few RICs, but there is a point where very large requests become a disadvantage.
Yes, requesting compressed data instead of plain CSV will speed up the download.
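As a sketch of the compressed raw download (assuming the standard `RawExtractionResults(...)/$value` endpoint and a requests-style session; the function name, chunk size, and timeout are illustrative, not prescribed):

```python
DSS_BASE = "https://hosted.datascopeapi.reuters.com/RestApi/v1"

def download_raw_result(session, job_id, out_path):
    """Stream a completed raw extraction to disk, requesting gzip
    compression to cut transfer time."""
    url = f"{DSS_BASE}/Extractions/RawExtractionResults('{job_id}')/$value"
    headers = {"Accept-Encoding": "gzip"}
    with session.get(url, headers=headers, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            # Write in 1 MiB chunks to avoid holding the whole file in memory.
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
```

Streaming to disk rather than buffering the whole response matters at these data sizes.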
-
> For very big requests (many RICs and/or long time range), I'd suggest segmenting them into more manageable ones, maybe 500 RICs at a time, and time ranges of 1 year (these are ballpark numbers, in absence of details).
This is quite a big suggestion, and I didn't see anything in the documentation about it. This kind of restriction should really be in bold text somewhere around page 1, I would suggest.
-
I see your point. The thing is, this is not an exact science; it is quite difficult to give recommendations because there are many parameters to consider. What I wrote is based on my experience: I usually try to keep my requests in a "reasonable" range, i.e. neither too small (to avoid overhead) nor too big (to avoid huge extraction times). I agree it is very subjective, and it sometimes takes tweaking and experimenting to find what "reasonable" means.
Please also note that, in the absence of your request details, the numbers I gave are just rough ballpark figures.
-
Note that one must be careful when comparing execution times, as extraction time also depends on server load, which can fluctuate strongly.