Extracting TRTH Intraday data for S&P 500 companies with X-Direct-Download: true

When I request 500 companies' bid/ask/price/volume data for the past year, the server returns 202 and never gets any further than that.

I thought there was no quota (with x-direct-download: true), but maybe there is? How can I retrieve this data?

Hi @Xiao.Xiao

Just want to add that you can check the progress of the request with the following endpoint:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Jobs/Jobs('JobId')

It will give you the progress percentage of the request, which should give you a rough idea of how long you have to wait.

Here is a sample response:

{
    "@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#Jobs/$entity",
    "JobId": "0x0625548817cb####",
    "UserId": #######,
    "Status": "Completed",
    "StatusMessage": "Completed in 1st Request, Results Returned",
    "Description": "TickHistoryRawReportTemplate Extraction",
    "ProgressPercentage": 100,
    "CreateDate": "2018-04-20T04:00:10.106Z",
    "StartedDate": "2018-04-20T04:00:10.106Z",
    "CompletionDate": "2018-04-20T04:00:14.430Z",
    "MonitorUrl": "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x0625548817cb####')"
}
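A minimal Python sketch of checking job progress with only the standard library. The URL template is the Jobs endpoint quoted above; the `Authorization: Token <token>` header format is an assumption about how the session token is passed, and `job_status_url`, `summarize_job` and `fetch_job` are hypothetical helper names, not part of any official client.

```python
import json
import urllib.request

# Jobs endpoint from the answer above; {job_id} is the JobId of your extraction.
JOBS_ENDPOINT = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Jobs/Jobs('{job_id}')"


def job_status_url(job_id: str) -> str:
    """Build the Jobs status URL for a given JobId."""
    return JOBS_ENDPOINT.format(job_id=job_id)


def summarize_job(job: dict) -> str:
    """Turn a parsed Jobs response into a one-line progress summary."""
    return f"{job['Status']}: {job['ProgressPercentage']}% ({job['StatusMessage']})"


def fetch_job(job_id: str, token: str) -> dict:
    """GET the job status and parse the JSON body.

    The 'Token <token>' authorization format is an assumption.
    """
    request = urllib.request.Request(
        job_status_url(job_id),
        headers={"Authorization": "Token " + token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `summarize_job(fetch_job(job_id, token))` every few minutes gives a rough idea of how far a long extraction has progressed.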

@Xiao.Xiao, what you observe is normal, and has nothing to do with quotas.

When you make an On Demand extraction, you will get a response in 30 seconds (default wait time) or less. The HTTP status of the response can have one of several values. Here are the 2 most common ones:

  • 200 OK happens if the request processing completed in less than 30 seconds. This can only occur for very small requests (and even then it is not guaranteed). Your request is for 500 instruments, so there is no chance you will get a 200 OK right after sending the request.
  • 202 Accepted is the one you are most likely to receive. It means the request was accepted, but processing has not yet completed. The next step is to check the request status by polling it regularly until it returns a 200 OK.
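The 202/200 workflow above can be sketched as a polling loop. This is a minimal sketch, not the tutorial's code: `wait_for_extraction` is a hypothetical helper, the URL to poll is assumed to be the monitor URL returned with the 202 response, and the HTTP call is injected as a `fetch` function so the loop can be exercised without network access. An explicit timeout and an error on unexpected statuses make a long wait end with a clear message instead of silently.

```python
import time
from typing import Callable, Mapping, Tuple

# fetch(url) -> (http_status, headers, body). In real use this would wrap an
# authenticated HTTP GET on the monitor URL.
Fetch = Callable[[str], Tuple[int, Mapping[str, str], bytes]]


def wait_for_extraction(
    monitor_url: str,
    fetch: Fetch,
    poll_interval: float = 30.0,
    max_wait: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Poll monitor_url until it returns 200 OK, then return the body.

    Raises TimeoutError once roughly max_wait seconds of polling have passed,
    and RuntimeError on any status other than 200 or 202.
    """
    waited = 0.0
    while True:
        status, _headers, body = fetch(monitor_url)
        if status == 200:
            return body  # extraction finished
        if status != 202:
            raise RuntimeError(f"Unexpected HTTP status {status} from {monitor_url}")
        if waited >= max_wait:
            raise TimeoutError(f"Still 202 after {waited:.0f} s; giving up")
        sleep(poll_interval)  # still processing: wait, then ask again
        waited += poll_interval
```

Injecting `sleep` keeps the loop easy to test; in production the defaults (`time.sleep`, 30 s interval) apply.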

I suggest you look at REST API Tutorial 3, which details the entire workflow, including a section on handling HTTP status 202.

Thanks Christiaan. However, I am not able to retrieve any data. I got 202 for 1.5 hours, and then the program simply ended without any warning or error message.

I was wondering if there is another way to get a large amount of data. I am requesting 500 companies for the past year right now.

@Xiao.Xiao,

Considering your request is for 500 instruments and 1 year of data, it is possible that it could take longer than 1.5 hours. I suggest you wait longer.

"The program ends": what program are you using? If you post the source code, I can have a look at it.

x-direct-download: true has no influence on the data extraction performance. This header means that you will be able to download the resulting data (once it has been extracted) from the AWS cloud, which delivers better download performance.

I have waited the whole afternoon (~3 hours), and I am still getting 202 every time.

I am considering breaking my request into smaller ones, but is there a better strategy for this?

I suppose this is only a few GB of data...


@Xiao.Xiao, let me make a few comments on the code you sent (re-attached for easy reference). It has some important differences compared to our sample available under the downloads tab.

Amount of downloaded data

Your code requests more than a year (~275 business days) of 1-minute bars (1440 records per day) for 900 instruments. That is a large amount of data (~356 million records). It is not surprising that it takes a long time.
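The record count above is simple multiplication. As a quick check, assuming one bar per instrument per minute (so treat the result as a rough upper bound); `estimate_records` is a hypothetical helper:

```python
def estimate_records(business_days: int, records_per_day: int, instruments: int) -> int:
    """Rough upper bound on records: days x bars per day x instruments."""
    return business_days * records_per_day * instruments


# Figures from the paragraph above: ~275 business days, 1440 one-minute bars/day, 900 RICs.
total = estimate_records(275, 24 * 60, 900)
print(f"{total:,} records")  # 356,400,000 records
```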

X-Direct-Download

This header requests that the data be downloaded from AWS, which is faster. It is only useful on the request that downloads the data (i.e. the request that uses the JobId in the endpoint URL). Your code sets this header in getId (line 58 of your code), where it has no effect; it should be set in getRaw (line 132). Changing this should enhance download performance (it will not decrease the extraction time).
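A sketch of where the header belongs, assuming the download call is a GET on `Extractions/RawExtractionResults('<JobId>')/$value` (path assumed from the Tick History REST tutorials; verify it against the API reference) and a `Token` authorization header. `build_raw_download_request` is a hypothetical name; the point is only that `X-Direct-Download: true` sits on the download request, not on the extraction request. The token is added with `add_unredirected_header`, assuming the direct download may be served via a redirect to AWS, where the token should not be forwarded.

```python
import urllib.request

# Assumed download endpoint (see lead-in); job_id is the JobId of the extraction.
RAW_RESULT_URL = (
    "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/"
    "RawExtractionResults('{job_id}')/$value"
)


def build_raw_download_request(job_id: str, token: str) -> urllib.request.Request:
    """Build the GET request that downloads the extracted data.

    X-Direct-Download is set here, on the download call. Setting it on the
    extraction request (getId in the code discussed above) has no effect.
    """
    request = urllib.request.Request(
        RAW_RESULT_URL.format(job_id=job_id),
        headers={"X-Direct-Download": "true"},
    )
    # Unredirected: not copied onto a redirected request (e.g. to AWS storage).
    request.add_unredirected_header("Authorization", "Token " + token)  # assumed format
    return request


# Usage (needs a valid token and JobId, so not executed here):
# with urllib.request.urlopen(build_raw_download_request(job_id, token)) as resp:
#     ...  # stream resp to a file, e.g. with shutil.copyfileobj
```

`urllib` normalizes stored header names (e.g. to `X-direct-download`); HTTP header names are case-insensitive, so the server sees the same header.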

Saving the data to file

In writeToFile (line 146) your code is:

rr = r5.raw
connector.write_to_s3('', fileName, rr.read())

This is not efficient: it first pulls all data into RAM, and only then writes it to disk. RAM usage is therefore quite high, and performance poor.

A better and faster solution would be:

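import shutil

# Stream the raw response to disk in chunks instead of reading it all into RAM.
# Assumes r5 is a requests response obtained with stream=True.
chunk_size = 1024
rr = r5.raw
with open(fileName, 'wb') as fd:
    shutil.copyfileobj(rr, fd, chunk_size)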

This is taken from step 5 of our sample code.

For more info on download tuning with Python

See this article: How to Optimize TRTH (Tick History) file downloads for Python (and other languages).

Hope this helps.

Thanks for your response. I was more concerned about your first and second points, as I have already tried different ways of saving the streamed response to a file.

But if I understand you correctly ("It is not surprising that it takes a long time" and "Changing this should enhance download performance (it will not decrease the extraction time)"), there is no way to reduce the waiting time for the response, which in my case never seems to come back...

@Xiao.Xiao,

The server performance depends on its load. You cannot influence the server to deliver a faster extraction time.

You can, however, optimize your code to reduce the download time.

You are making a huge request. You say it takes 3-4 hours for 50 RICs. Do the math: for 500 RICs it will take more than a day...
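The "do the math" estimate, assuming extraction time scales roughly linearly with the number of RICs (server load varies, so this is only a ballpark); `estimate_hours` is a hypothetical helper:

```python
def estimate_hours(observed_hours: float, observed_rics: int, target_rics: int) -> float:
    """Scale an observed extraction time linearly with the number of RICs."""
    return observed_hours * target_rics / observed_rics


low = estimate_hours(3, 50, 500)   # 30.0
high = estimate_hours(4, 50, 500)  # 40.0
print(f"Expected: {low:.0f} to {high:.0f} hours")  # Expected: 30 to 40 hours
```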

You need to reduce the number of instruments and/or shorten the date range, and/or wait as long as it takes.
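One way to act on this is to split the request into several smaller extractions. A minimal sketch with hypothetical helper names: `plan_requests` pairs batches of RICs with date windows, and each pair would then be submitted and polled as its own extraction. Splitting does not make the server faster overall, but each job may complete sooner, a failed job is cheaper to retry, and results arrive incrementally. Whether the API treats the end date as inclusive is not covered here, so adjust the window boundaries accordingly.

```python
from datetime import date, timedelta
from itertools import product
from typing import Iterator, List, Sequence, Tuple


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` instruments."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield half-open [window_start, window_end) ranges of at most `days` days covering [start, end)."""
    current = start
    while current < end:
        window_end = min(current + timedelta(days=days), end)
        yield current, window_end
        current = window_end


def plan_requests(
    rics: Sequence[str],
    start: date,
    end: date,
    rics_per_request: int,
    days_per_request: int,
) -> List[Tuple[List[str], Tuple[date, date]]]:
    """Every (RIC batch, date window) pair: one small extraction each."""
    return list(product(batches(rics, rics_per_request),
                        date_windows(start, end, days_per_request)))
```

For example, 500 RICs over 2017 in batches of 50 with 31-day windows yields 10 × 12 = 120 extractions.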