TRTH: Retrieving range data using Time and Sales Data

Hi there,

I have a problem extracting data from Tick History.

I specified a date range in the report request, but I couldn't retrieve all of the data in that range. How can I retrieve all of the data requested by the code below? Any help would be appreciated.

Thank you,

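import json

import requests

# header2 is assumed to be defined earlier: a dict holding the
# authentication headers obtained from the token request.
body_data = json.dumps({
    "ExtractionRequest": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
        "ContentFieldNames":[
            "Quote - Bid Price",
            "Quote - Bid Size",
            "Quote - Ask Price",
            "Quote - Ask Size"
        ],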
        "IdentifierList": {
            "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
            "InstrumentIdentifiers": [ { "Identifier": "JNIc1", "IdentifierType": "Ric" } ],
            "ValidationOptions": None,
            "UseUserPreferencesForValidationOptions": False
        },
        "Condition": {
            "MessageTimeStampIn": "",
            "ReportDateRangeType": "Range",
            "QueryStartDate":"2017-01-03T23:45:00.000Z",
            "QueryEndDate": "2017-01-06T20:30:00.000Z",
            "DisplaySourceRIC": True
        }
    }
})

responseGet = requests.post( "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw",
                              data = body_data,
                              headers = header2)

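responseGet.raise_for_status()
res_json = responseGet.json()
# Assumes the extraction completed within the request, so JobId is already in the response body
job_id = res_json['JobId']
response_obj = requests.get( "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('{0}')/$value".format(job_id),
                            headers = header2, stream=True)

# With decode_content=True the saved file is plain (decompressed) CSV
gzip_file = "jnic1.csv"

with open(gzip_file, 'wb') as f:
    # Stream the body to disk, decompressing the gzip content on the fly
    for data in response_obj.raw.stream(decode_content=True):
        f.write(data)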


Tags: python, tick-history-rest-api, pricing

@YK

With your code, the extracted result starts at 2017-01-04T08:45:00.077934619+09 and ends at 2017-01-04T08:45:00.077934619+09. Do you receive the same result?

Also, how did you verify that not all of the data was received? Could you please elaborate?

@veerapath.rungruengrayubkul

Thanks for your help.

I receive the same result with your code.

I can tell I did not get all of the data because the output does not cover the range I specified in the condition.
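A quick way to see how much of the requested range a result file actually covers is to read its first and last timestamps. The sketch below is a minimal illustration, not part of the original exchange; it assumes the output CSV has a `Date-Time` column (an assumption about the Time and Sales layout, so adjust `column` to match your header), and `time_span` is a hypothetical helper name.

```python
import csv
import gzip


def time_span(path, column="Date-Time"):
    """Return (first timestamp, last timestamp, row count) for a saved result file.

    Works for both plain CSV and gzip-compressed (.gz) files. The column name is an
    assumption about the report layout; pass a different one if your header differs.
    """
    opener = gzip.open if path.endswith(".gz") else open
    first = last = None
    rows = 0
    with opener(path, "rt", newline="") as f:
        for row in csv.DictReader(f):
            value = row.get(column)
            if not value:
                continue  # skip rows without a timestamp in that column
            if first is None:
                first = value
            last = value
            rows += 1
    return first, last, rows
```

For this request, QueryStartDate 2017-01-03T23:45:00Z and QueryEndDate 2017-01-06T20:30:00Z correspond to 2017-01-04T08:45+09 and 2017-01-07T05:30+09 in the +09 offset shown above. If `time_span("jnic1.csv")` reports a last timestamp far earlier than the end of the range, the file is likely incomplete.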

@YK, am I right in guessing that you are only receiving the first part of the expected data? If so, when you run the query several times (try at least 10 times), is the number of lines of received data always the same, or does it vary? If it varies, this might be related to a similar issue we saw in Java with libraries that were not robust enough and dropped the stream when decoding data on the fly.
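To run the repeat test suggested above, a small sketch that counts the lines in each saved result file (plain CSV or .gz) and reports whether the counts agree across runs. The helper names `count_lines` and `summarize_runs` are hypothetical, not part of any Tick History API.

```python
import gzip


def count_lines(path):
    """Count lines (header included) in a saved result file, plain or .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return sum(1 for _ in f)


def summarize_runs(line_counts):
    """Summarize line counts from repeated runs of the same extraction."""
    return {
        "runs": len(line_counts),
        "min": min(line_counts),
        "max": max(line_counts),
        # True only if every run produced exactly the same number of lines
        "consistent": len(set(line_counts)) == 1,
    }
```

For example, `summarize_runs([count_lines(p) for p in saved_paths])`, where `saved_paths` is a hypothetical list of the files saved by each run. A result with `"consistent": False` would point toward the dropped-stream behavior described above.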

I see you set decode_content=True. If I am not mistaken, that means the data is decompressed on the fly before it is saved to disk. Can you try setting it to False?
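The suggestion above can be sketched as two separate steps: save the body exactly as received (still gzip-compressed), then decompress the finished file locally. This is a minimal sketch, assuming `raw` is the urllib3 response exposed as `requests.Response.raw` when the request is made with `stream=True`; `save_compressed` and `gunzip_file` are hypothetical helper names.

```python
import gzip
import shutil


def save_compressed(raw, gz_path, chunk_size=1024 * 1024):
    """Write the response body to disk exactly as received, still gzip-compressed.

    `raw` is expected to be a urllib3 response such as `requests.Response.raw`
    obtained with stream=True.
    """
    with open(gz_path, "wb") as f:
        # decode_content=False: no on-the-fly decompression while downloading
        for chunk in raw.stream(chunk_size, decode_content=False):
            f.write(chunk)


def gunzip_file(gz_path, csv_path):
    """Decompress the saved file locally once the download is complete."""
    with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

With the question's variables this would be `save_compressed(response_obj.raw, "jnic1.csv.gz")` followed by `gunzip_file("jnic1.csv.gz", "jnic1.csv")`. Separating download from decompression means a decoding problem during the transfer cannot truncate the bytes written to disk.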

@Christiaan Meihsl I found an example that downloads the latest schedule files or venue files with decode_content=True. I am wondering why it works there, and why decode_content should be treated differently for on-demand requests.

@Christiaan Meihsl

After setting decode_content to False, I get all of the data in a gzip file.

I still can't figure out why I cannot get all of the data with decode_content=True. Does it simply exceed some capacity limit of the API, or is there another reason?

But it's OK, the issue is resolved.

Thank you!

@YK, in the similar issue I mentioned with the Java libraries, we observed that decoding worked fine when the data set was small, but started failing when the data set was larger. I guess many such libraries were tested on fairly small data sets, which correspond to the common use cases. With TRTH we often handle large data sets, which is somewhat atypical, and it seems some libraries were just not built for that.
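As an illustration of how a decoder that is "not robust enough" can silently return only the first part of a file (one possible mechanism, not confirmed as the cause in this thread): a gzip file may consist of several concatenated members, and a decoder that stops at the end of the first member drops everything after it. The sketch below uses only Python's standard `zlib` module; the function names are hypothetical.

```python
import zlib


def decode_first_member_only(data):
    """Naive decode: a single zlib object stops at the end of the first gzip member."""
    d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # | 16 selects the gzip wrapper
    return d.decompress(data) + d.flush()


def decode_all_members(data):
    """Robust decode: start a fresh decompressor for every gzip member."""
    out = []
    while data:
        d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        out.append(d.decompress(data))
        out.append(d.flush())
        if not d.eof:
            raise ValueError("truncated gzip data")
        # Bytes after the end of this member belong to the next member
        data = d.unused_data
    return b"".join(out)
```

On a two-member input the naive version returns only the first member's content without raising any error, which matches the "only the first part of the data" symptom; downloading with decode_content=False and decompressing afterwards with Python's `gzip` module, which reads all members, sidesteps the issue.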

Glad I helped solve the issue.