
TRTH download issues

We are having some issues with the TRTH v2 API. Several of our jobs are stuck: I've cancelled them using the API, but they still appear as stuck (screenshot below).

The real issue I have is with the download. It's an on-demand extraction of a roughly 700 MB CSV file. The extraction has completed successfully and the file is ready for download. I use the URL

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05c9d026dafb3026')/$value

to download the stream to a local file.

The problem I have with a job like this one is that the download always stops at 177 MB.

Is there a download limit?

Can we also run 10-20 parallel downloads?

P.S.: since yesterday night the username/password has stopped working (invalid username/password). I'm calling Reuters.

Tags: tick-history-rest-api, download
Accepted

The way I made the compression and the download work properly was to avoid the "requests" library and use the urllib3 Python library instead.

With it, it is possible to ask not to "decode" the content (if the content is gzipped, it would otherwise be uncompressed):

import urllib3

headers = {
    'content-type': 'text/plain',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}

# preload_content=False streams the body; decode_content=False keeps it gzipped.
http = urllib3.PoolManager(timeout=urllib3.util.Timeout(connect=60.0, read=60.0))
r = http.request("GET", url, headers=headers, preload_content=False, decode_content=False)

# r is a stream
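
To then save the stream, a minimal sketch (assuming local_filename is the target path; shutil.copyfileobj is one convenient way to drain a urllib3 response):

import shutil

# The body arrives still gzipped (decode_content=False), so it can be
# copied straight into a .gz file without re-compressing anything.
with open(local_filename + '.gz', 'wb') as f:
    shutil.copyfileobj(r, f)
r.release_conn()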


@alvise.susmel

The "requests" library will automatically decode data, according to this page.

You can use the command in this question to get encoded gzip data. Also, please see this related question.
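
For illustration, if you prefer to keep using requests, the undecoded bytes are also reachable through response.raw; a minimal sketch, assuming url and headers as in the question:

import shutil
import requests

# stream=True defers the body; response.raw is the underlying urllib3
# stream and does not gunzip unless decode_content is set to True.
r = requests.get(url, headers=headers, stream=True)
with open('output.csv.gz', 'wb') as f:  # illustrative file name
    shutil.copyfileobj(r.raw, f)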

alvise.susmel: @veerapath.rungruengrayubkul really helpful, thanks!

@alvise.susmel, to clarify: did using urllib3 and this code solve the entire problem (i.e. the data is saved compressed, and nothing is lost)? Or are there any remaining issues?

That solved the issue, thanks! :)

Great, thanks for letting us know!


The login problem is solved, but I still have download issues.

These are the notes for one of the jobs that should be ready but whose download gets stuck. This is what I do.

Request

I'm requesting the full range for ESU8 from 2007-06-18 to 2008-09-20, which is the tradable date range for the contract (ticker ESU08 Index). Is the range too big?

payload = {
    "ExtractionRequest": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
        "ContentFieldNames": CONTENT_FIELD_NAMES,
        "IdentifierList": {
            "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
            "InstrumentIdentifiers": identifiers,
            "UseUserPreferencesForValidationOptions": False,
            "ValidationOptions": {
                "AllowHistoricalInstruments": True
            }
        },
        "Condition": {
            "MessageTimeStampIn": "GmtUtc",
            "ApplyCorrectionsAndCancellations": False,
            "ReportDateRangeType": "Range",
            "QueryStartDate": start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "QueryEndDate": end_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "DisplaySourceRIC": True
        }
    }
}
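
For completeness, I submit this payload roughly as follows (a sketch; the token comes from the usual authentication step):

import json
import requests

EXTRACT_RAW_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw"

headers = {
    'Content-Type': 'application/json; charset=UTF-8',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}

# With 'respond-async' the server answers 202 Accepted while the job runs;
# the Location response header carries the monitor URL used below.
resp = requests.post(EXTRACT_RAW_URL, data=json.dumps(payload), headers=headers)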

I monitor with the URL

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x05ca060558db2f96')

and get these notes:

{
    "@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#RawExtractionResults/$entity",
    "JobId": "0x05ca060558db2f96",
    "Notes": [
        "'Extraction Services Version 11.1.36930 (3ce781d3dd80), Built Jun 15 2017 06:42:50\nUser ID: 9012542\nExtraction ID: 2000000001154665\nSchedule: 0x05ca060558db2f96 (ID = 0x0000000000000000)\nInput List (1 items):  (ID = 0x05ca060558db2f96) Created: 10/07/2017 11:47:11 Last Modified: 10/07/2017 11:47:11\nReport Template (8 fields): _OnD_0x05ca060558db2f96 (ID = 0x05ca06057ceb2f96) Created: 10/07/2017 11:40:59 Last Modified: 10/07/2017 11:40:59\nSchedule dispatched via message queue (0x05ca060558db2f96), Data source identifier (CDA7151F657E4D8289B7D6277AE4702A)\nSchedule Time: 10/07/2017 11:41:02\nProcessing started at 10/07/2017 11:41:02\nProcessing completed successfully at 10/07/2017 11:47:12\nExtraction finished at 10/07/2017 11:47:12 UTC, with servers: tm01n01, TRTH (363.118 secs)\nInstrument <RIC,ESU8> expanded to 1 RIC: ESU8.\nQuota Message: INFO: Tick History Futures Quota Count Before Extraction: 1; Instruments Extracted: 1; Tick History Futures Quota Count After Extraction: 1, 1.96078431372549% of Limit; Tick History Futures Quota Limit: 51\nManifest: #RIC,Domain,Start,End,Status,Count\nManifest: ESU8,Market Price,2008-01-28T15:44:34.350858000Z,2008-09-19T18:34:38.290958000Z,Active,67504661\n'"
    ]
}

So the result should be ready to download.
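
The monitoring itself is just repeated GETs against that URL until the server answers 200 instead of 202; roughly (a sketch, with an arbitrary polling interval and the same headers as the submission):

import time
import requests

monitor_url = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x05ca060558db2f96')"

# 202 means the extraction is still running; 200 means it completed and
# the body carries the JobId and Notes shown above.
while True:
    resp = requests.get(monitor_url, headers=headers)
    if resp.status_code != 202:
        break
    time.sleep(30)

job_id = resp.json()["JobId"]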

To download the full CSV file (is there a way to download a gzipped version of it?), I use

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05ca060558db2f96')/$value

This starts downloading the uncompressed CSV and blocks at around 177 MB (the full file should be around 4 GB uncompressed). It doesn't close the connection; it just times out.

If I try to do the same with the web interface, it works and I can get the (compressed) file.

Am I doing something wrong in one of the steps above?

Thanks


@alvise.susmel, the steps you describe seem OK.

That said, there is no reason to download the uncompressed file; you should be able to download it in compressed format. This depends on how you code the download request. Can you share your code?

The code is the one above, where I create the payload of the request. Maybe I'm missing some fields to enable compression.

How do I ask for gzip compression?

@alvise.susmel, can you confirm the file you download using the GUI is compressed? It should be, and its name should end in .gz.

Sorry, I was not clear; I meant the full code used to generate the HTTPS download request (the one that downloads data using the JobId). I'd like to see how the HTTP connection is created, how the GET query is generated and submitted, what the headers are, and how the response is handled.

You can post your code as an attachment. If you do not want to post it in the forum, you can also send it directly to me at christiaan.meihsl@thomsonreuters.com.

I've posted the code below. Yes, I confirm that the file I download from the GUI is compressed, but there I create a template, whereas via the API I don't use any template; I just send the request with details and fields. Is it possible to ask for the file to be compressed using the request I initially posted?

Thanks

Thanks, I've started looking at the code. The file should be compressed by default; nothing special should be done to request that.

@christiaan.meihsl

Sure, no problem.

def _download_file(self, job_id, local_filename):
    DOWNLOAD_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('{0}')/$value"

    url = DOWNLOAD_URL.format(job_id)
    token = self._auth.ensure_token()
    headers = {
        'content-type': 'application/json; charset=UTF-8',
        'Authorization': "Token " + token,
        'Prefer': 'respond-async'
    }

    LOGGER.info("_download_file start {0} {1}".format(url, local_filename))

    # Stream the response and gzip-compress the chunks while writing to disk.
    r = requests.get(url, stream=True, headers=headers)
    with gzip.open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

    LOGGER.info("_download_file finished {0} {1}".format(local_filename, url))


Ok, so you are using Python. I'm not very familiar with Python, but as far as I understand, gzip.open automatically decompresses the data, which would explain why you get uncompressed data. I must investigate how to save the file without decompressing it.

No, gzip.open opens a write file descriptor that compresses the data I write into the file.

All the chunks I receive are uncompressed. I'll check the content-length now; maybe this could help.
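
For instance, a quick check on the response object from the code above (sketch):

# If Content-Encoding says gzip, requests has been transparently
# gunzipping the chunks yielded by iter_content().
print(r.headers.get('Content-Encoding'), r.headers.get('Content-Length'))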

If I am not mistaken, if you change the headers of the download request to the following, the server should send compressed data:

url = DOWNLOAD_URL.format(job_id)
token = self._auth.ensure_token()
headers = {
    'content-type': 'text/plain',
    'Accept-Encoding': 'gzip',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}
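
Combined with a raw stream write, the corrected download could look like this sketch (still using requests; assuming the server honours Accept-Encoding, as the GUI download suggests, and that url, token, and local_filename are as in the posted code):

import shutil
import requests

# headers as above, including 'Accept-Encoding': 'gzip'
r = requests.get(url, headers=headers, stream=True)
r.raw.decode_content = False  # keep the payload exactly as delivered
with open(local_filename + '.gz', 'wb') as f:
    shutil.copyfileobj(r.raw, f)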
