Extracting TRTH intraday data for S&P 500 companies with X-Direct-Download: true
When I request 500 companies' bid/ask/price/volume data for the past year, the server returns 202 and never goes any further than that.
I thought there was no quota (with x-direct-download: true), but maybe there is? How can I retrieve this data?
Best Answer
-
Hi @Xiao.Xiao
Just want to add that you can check the progress of the request with the following endpoint.
https://hosted.datascopeapi.reuters.com/RestApi/v1/Jobs/Jobs('JobId')
It will give you the progress percentage of the request, which should give you a rough idea of how long you have to wait.
Here is a sample response:
{
"@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#Jobs/$entity",
"JobId": "0x0625548817cb####",
"UserId": #######,
"Status": "Completed",
"StatusMessage": "Completed in 1st Request, Results Returned",
"Description": "TickHistoryRawReportTemplate Extraction",
"ProgressPercentage": 100,
"CreateDate": "2018-04-20T04:00:10.106Z",
"StartedDate": "2018-04-20T04:00:10.106Z",
"CompletionDate": "2018-04-20T04:00:14.430Z",
"MonitorUrl": "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x0625548817cb####')"
}
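A minimal sketch of how you could query this endpoint from Python (assuming token holds a valid session token from your authentication step; requests is the only dependency):

import requests

# job_id is the JobId returned by your extraction request (masked here as in the sample)
job_id = "0x0625548817cb####"
url = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Jobs/Jobs('" + job_id + "')"
headers = {
    "Prefer": "respond-async",
    "Authorization": "Token " + token,  # token from your authentication request
}
job = requests.get(url, headers=headers).json()
print(job["Status"], job["ProgressPercentage"])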
Answers
-
@Xiao.Xiao, what you observe is normal, and has nothing to do with quotas.
When you make an On Demand extraction, you will get a response within 30 seconds (the default wait time) or less. The HTTP status of the response can take one of several values; here are the two most common:
- 200 OK happens if the request processing completed in less than 30 seconds. This can only occur for very small requests (and even then it is not guaranteed). Your request is for 500 instruments, so there is no chance you will get a 200 OK right after sending it.
- 202 Accepted is the one you are most likely to receive. It means the request was accepted, but processing has not yet completed. The next step is to check the request status by polling it regularly until it returns a 200 OK.
I suggest you look at REST API Tutorial 3, which details the entire workflow; the section on HTTP status 202 is here.
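For reference, the polling part of that workflow looks roughly like this in Python (a sketch, assuming extraction_response is the initial 202 response to your On Demand request and token is a valid session token; the monitor URL comes from the Location header, as described in the tutorial):

import time
import requests

headers = {"Prefer": "respond-async", "Authorization": "Token " + token}

resp = extraction_response                 # initial response, HTTP status 202
monitor_url = resp.headers["Location"]     # URL to poll for completion

while resp.status_code == 202:
    time.sleep(30)                         # poll at a modest interval
    resp = requests.get(monitor_url, headers=headers)

# resp.status_code is now 200; its body contains the JobId used to retrieve the data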
-
Thanks Christiaan. However, I am not able to retrieve any data. I got 202 for 1.5 hours and then the program simply ended without any warning or error message...
I was wondering if there is another way to get a large amount of data? I am requesting 500 companies for the past year right now...
-
Considering your request is for 500 instruments and 1 year of data, it is possible that it could take longer than 1.5 hours. I suggest you wait longer.
"The program ends": what program are you using ? If you post the source code I could have a look at it.
x-direct-download: true has no influence on the data extraction performance. This parameter means that you will be able to download the resulting data (once it was extracted) from the AWS cloud, which will deliver better download performance.
-
I have waited the whole afternoon (~3 hours), and I am still getting 202 all the time.
I am considering breaking my request into smaller ones, but is there a more strategic way to do this?
I suppose this is only a few GB of data...
-
... and the program is almost identical to the `.py` example provided by TRTH.
-
Hi Alex, thanks for asking. I appreciate the replies here; they give me confidence that there is nothing wrong with my request to TRTH. However, I was waiting forever (hours and hours...) trying to fetch 500 companies' data with status code 202. So I ended up looping over batches of 50 instruments per request and got all the data I needed. I am not sure this can be considered "resolved" in this case.
-
@Xiao.Xiao, how long do your requests for 50 instruments take to deliver data? If I were you I'd let a request for 500 run to the end, to see how long it takes and what volume of data it delivers. If you post your code (remove your account and password first, but ensure it includes the 500 instruments) I can also test it here and investigate.
-
For 50 instruments, it takes 3-4 hours...
Well, I didn't get any data for 500 instruments in more than half a day, and I need to move on with my project...
My `.py` is identical to the `.py` code you provided under the example here. The only difference is that I stack 500 instruments according to the user guide (the example shown here only has one instrument)... and I kept getting 202, so I assume my request is valid (?)
-
@Xiao.Xiao, if the request takes 3-4 hours for 50 instruments, then you will not get 500 in half a day...
And yes, if you get a 202 your request is valid. If it were not valid you'd get an HTTP status in the 400 range.
Could you post your entire Python code? I could test it to see if I get the same results as you.
-
@Xiao.Xiao, could you post your entire Python code?
-
trth-ondemand-intradaybars-allpy.zip
Please see the attachment above. I have two local files with all S&P 500 RICs and all S&P 400 RICs respectively.
-
@Xiao.Xiao, please also post the files LIST_SP500.csv and LIST_SP400.csv; I'd like to test with the exact same instrument lists you are using.
0 -
@Xiao.Xiao, let me make a few comments on the code you sent (re-attached for easy reference). It has some important differences compared to our sample available under the downloads tab.
Amount of downloaded data
Your code makes a request for more than a year (~275 business days) of 1-minute bars (1440 records/day) for 900 instruments. That is a large amount of data: 275 × 1440 × 900 ≈ 356 million records. It is not surprising that it takes a long time.
X-Direct-Download
One sets this header to request the data download from AWS, which is faster. This header is only useful when put in the request that downloads the data (i.e. the request that uses the JobId in the endpoint URL). Your code sets this header in getId (line 58 of your code), where it has no effect; it should be set in getRaw (line 132). Changing this should improve download performance (it will not decrease the extraction time).
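To illustrate (a sketch; raw_url stands for the raw-result endpoint built from your JobId, and token for your session token), the header belongs on the data download request:

import requests

headers = {
    "Prefer": "respond-async",
    "Authorization": "Token " + token,
    "X-Direct-Download": "true",  # redirect the download to AWS
}
# stream=True so the response body is not buffered entirely in RAM
r5 = requests.get(raw_url, headers=headers, stream=True)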
Saving the data to file
In writeToFile (line 146) your code is:
rr = r5.raw
connector.write_to_s3('', fileName, rr.read())

This is not efficient: it first pulls all the data into RAM, and only then writes it to disk. RAM usage is therefore quite high, and performance poor.
A better and faster solution would be:
import shutil

chunk_size = 1024
rr = r5.raw
with open(fileName, 'wb') as fd:
    shutil.copyfileobj(rr, fd, chunk_size)

This is taken from step 5 of our sample code; streaming in fixed-size chunks keeps memory usage constant regardless of the file size.
For more info on download tuning with Python
See this article: How to Optimize TRTH (Tick History) file downloads for Python (and other languages).
Hope this helps.
-
Thanks for your response. I was more concerned about your first and second points, as I have already tried different approaches for saving the streaming response to file.
But if I understand you correctly ("It is not surprising that it takes a long time" and "Changing this should improve download performance (it will not decrease the extraction time)"), there is no way to improve the waiting time for the response - which seems to never come back in my case...
-
The server performance depends on its load; you cannot influence the server to deliver a faster extraction time.
You can, however, optimize your code to improve the download time.
You are making a huge request. You say it takes 3-4 hours for 50 RICs; do the math, for 500 RICs it will take more than 1 day...
You need to reduce the number of instruments, and/or shorten the date range, and/or wait as long as it takes.
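If you do split the request, a simple approach is to batch the instrument list (a sketch; all_rics, submit_extraction and download_results are placeholders for your RIC list and your existing request and download code):

def chunks(rics, size):
    """Yield successive batches of at most `size` RICs."""
    for i in range(0, len(rics), size):
        yield rics[i:i + size]

for batch in chunks(all_rics, 50):
    job_id = submit_extraction(batch)   # your existing On Demand request, per batch
    download_results(job_id)            # poll and save as before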