
TRTH download issues

We are having some issues with the TRTH v2 API. Several of our jobs are stuck: I've cancelled them using the API, but they still appear as stuck (screenshot below).

The real issue I have is with the download. It's an on-demand extraction of a roughly 700 MB CSV file. The extraction has completed successfully and the file is ready for download. I use the URL

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05c9d026dafb3026')/$value

to download the stream to a local file.

The problem I have with a job like this one is that the download always stops at 177 MB.

Is there a download limit?

Can we also run 10-20 parallel downloads?

P.S.: since yesterday night the username/password has stopped working (invalid username/password). I'm calling Reuters.

Tags: tick-history-rest-api, download
Accepted

The way I made the compression and the download work properly was to avoid the "requests" library and use the urllib3 Python library instead.

With it, it is possible to ask not to "decode" the content (if the content is gzipped, it would otherwise be uncompressed):

import urllib3

headers = {
    'content-type': 'text/plain',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}

# preload_content=False streams the body; decode_content=False keeps it gzipped.
http = urllib3.PoolManager(timeout=urllib3.util.Timeout(connect=60.0, read=60.0))
r = http.request("GET", url, headers=headers, preload_content=False, decode_content=False)

# r is a stream
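
To then save the stream, a minimal sketch (assuming local_filename is the target path; shutil.copyfileobj is one convenient way to drain a urllib3 response):

import shutil

# The body arrives still gzipped (decode_content=False), so it can be
# copied straight into a .gz file without re-compressing anything.
with open(local_filename + '.gz', 'wb') as f:
    shutil.copyfileobj(r, f)
r.release_conn()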


@alvise.susmel

The "requests" library will automatically decode data, according to this page.

You can use the command in this question to get encoded gzip data. Also, please see this related question.
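
For illustration, if you prefer to keep using requests, the undecoded bytes are also reachable through response.raw; a minimal sketch, assuming url and headers as in the question:

import shutil
import requests

# stream=True defers the body; response.raw is the underlying urllib3
# stream and does not gunzip unless decode_content is set to True.
r = requests.get(url, headers=headers, stream=True)
with open('output.csv.gz', 'wb') as f:  # illustrative file name
    shutil.copyfileobj(r.raw, f)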

alvise.susmel: @veerapath.rungruengrayubkul really helpful, thanks!

@alvise.susmel, to clarify: did using urllib3 and this code solve the entire problem (i.e. the data is saved compressed, and nothing is lost)? Or are there any remaining issues?

That solved the issue, thanks! :)

Great, thanks for letting us know!


The login problem is solved, but I still have download issues.

These are the notes for one of the jobs that should be ready but whose download gets stuck. This is what I do.

Request

I'm requesting the full range for ESU8 from 2007-06-18 to 2008-09-20, which is the tradable date range for the contract (ticker ESU08 Index). Is the range too big?

payload = {
    "ExtractionRequest": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
        "ContentFieldNames": CONTENT_FIELD_NAMES,
        "IdentifierList": {
            "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
            "InstrumentIdentifiers": identifiers,
            "UseUserPreferencesForValidationOptions": False,
            "ValidationOptions": {
                "AllowHistoricalInstruments": True
            }
        },
        "Condition": {
            "MessageTimeStampIn": "GmtUtc",
            "ApplyCorrectionsAndCancellations": False,
            "ReportDateRangeType": "Range",
            "QueryStartDate": start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "QueryEndDate": end_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "DisplaySourceRIC": True
        }
    }
}
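
For completeness, I submit this payload roughly as follows (a sketch; the token comes from the usual authentication step):

import json
import requests

EXTRACT_RAW_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw"

headers = {
    'Content-Type': 'application/json; charset=UTF-8',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}

# With 'respond-async' the server answers 202 Accepted while the job runs;
# the Location response header carries the monitor URL used below.
resp = requests.post(EXTRACT_RAW_URL, data=json.dumps(payload), headers=headers)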

I monitor with the URL

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x05ca060558db2f96')

and get these notes:

{
    "@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#RawExtractionResults/$entity",
    "JobId": "0x05ca060558db2f96",
    "Notes": [
        "'Extraction Services Version 11.1.36930 (3ce781d3dd80), Built Jun 15 2017 06:42:50\nUser ID: 9012542\nExtraction ID: 2000000001154665\nSchedule: 0x05ca060558db2f96 (ID = 0x0000000000000000)\nInput List (1 items):  (ID = 0x05ca060558db2f96) Created: 10/07/2017 11:47:11 Last Modified: 10/07/2017 11:47:11\nReport Template (8 fields): _OnD_0x05ca060558db2f96 (ID = 0x05ca06057ceb2f96) Created: 10/07/2017 11:40:59 Last Modified: 10/07/2017 11:40:59\nSchedule dispatched via message queue (0x05ca060558db2f96), Data source identifier (CDA7151F657E4D8289B7D6277AE4702A)\nSchedule Time: 10/07/2017 11:41:02\nProcessing started at 10/07/2017 11:41:02\nProcessing completed successfully at 10/07/2017 11:47:12\nExtraction finished at 10/07/2017 11:47:12 UTC, with servers: tm01n01, TRTH (363.118 secs)\nInstrument <RIC,ESU8> expanded to 1 RIC: ESU8.\nQuota Message: INFO: Tick History Futures Quota Count Before Extraction: 1; Instruments Extracted: 1; Tick History Futures Quota Count After Extraction: 1, 1.96078431372549% of Limit; Tick History Futures Quota Limit: 51\nManifest: #RIC,Domain,Start,End,Status,Count\nManifest: ESU8,Market Price,2008-01-28T15:44:34.350858000Z,2008-09-19T18:34:38.290958000Z,Active,67504661\n'"
    ]
}

So the result should be ready to download.
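
The monitoring itself is just repeated GETs against that URL until the server answers 200 instead of 202; roughly (a sketch, with an arbitrary polling interval and the same headers as the submission):

import time
import requests

monitor_url = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x05ca060558db2f96')"

# 202 means the extraction is still running; 200 means it completed and
# the body carries the JobId and Notes shown above.
while True:
    resp = requests.get(monitor_url, headers=headers)
    if resp.status_code != 202:
        break
    time.sleep(30)

job_id = resp.json()["JobId"]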

To download the full CSV file (is there a way to download a gzipped version of it?), I use

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05ca060558db2f96')/$value

This starts downloading the uncompressed CSV and blocks at around 177 MB (the full file should be around 4 GB uncompressed). It doesn't close the connection; it just times out.

If I try to do the same with the web interface, it works and I can get the (compressed) file.

Am I doing something wrong in one of the steps above?

Thanks


@alvise.susmel, the steps you describe seem OK.

That said, there is no reason to download the uncompressed file; you should be able to download it in compressed format. This depends on how you code the download request. Can you share your code?

The code is the one above, where I create the payload of the request. Maybe I'm missing some fields to enable compression.

How do I ask for gzip compression?

@alvise.susmel, can you confirm the file you download using the GUI is compressed? It should be, and its name should end in .gz.

Sorry, I was not clear; I meant the full code used to generate the HTTPS download request (the one that downloads data using the JobId). I'd like to see how the HTTP connection is created, how the GET query is generated and submitted, what the headers are, and how the response is handled.

You can post your code as an attachment. If you do not want to post it in the forum, you can also send it directly to me at christiaan.meihsl@thomsonreuters.com.

I've posted the code below. Yes, I confirm that the file I download from the GUI is compressed, but there I create a template, whereas via the API I don't use any template; I just send the request with details and fields. Is it possible to ask for the file to be compressed using the request I initially posted?

Thanks

Thanks, I've started looking at the code. The file should be compressed by default; nothing special should be done to request that.

@christiaan.meihsl

Sure, no problem.

def _download_file(self, job_id, local_filename):
    DOWNLOAD_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('{0}')/$value"

    url = DOWNLOAD_URL.format(job_id)
    token = self._auth.ensure_token()
    headers = {
        'content-type': 'application/json; charset=UTF-8',
        'Authorization': "Token " + token,
        'Prefer': 'respond-async'
    }

    LOGGER.info("_download_file start {0} {1}".format(url, local_filename))

    # Stream the response and gzip-compress the chunks while writing to disk.
    r = requests.get(url, stream=True, headers=headers)
    with gzip.open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

    LOGGER.info("_download_file finished {0} {1}".format(local_filename, url))


Ok, so you are using Python. I'm not very familiar with Python, but as far as I understand, gzip.open automatically decompresses the data, which would explain why you get uncompressed data. I must investigate how to save the file without decompressing it.

No, gzip.open opens a write file descriptor that compresses the data I write into the file.

All the chunks I receive are uncompressed. I'll check the content-length now; maybe this could help.
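
For instance, a quick check on the response object from the code above (sketch):

# If Content-Encoding says gzip, requests has been transparently
# gunzipping the chunks yielded by iter_content().
print(r.headers.get('Content-Encoding'), r.headers.get('Content-Length'))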

If I am not mistaken, if you change the headers of the download request to the following, the server should send compressed data:

url = DOWNLOAD_URL.format(job_id)
token = self._auth.ensure_token()
headers = {
    'content-type': 'text/plain',
    'Accept-Encoding': 'gzip',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}
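
Combined with a raw stream write, the corrected download could look like this sketch (still using requests; assuming the server honours Accept-Encoding, as the GUI download suggests, and that url, token, and local_filename are as in the posted code):

import shutil
import requests

# headers as above, including 'Accept-Encoding': 'gzip'
r = requests.get(url, headers=headers, stream=True)
r.raw.decode_content = False  # keep the payload exactly as delivered
with open(local_filename + '.gz', 'wb') as f:
    shutil.copyfileobj(r.raw, f)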
