TRTH download issues

We are having some issues with the TRTH v2 API. Several of our jobs are stuck. I've cancelled them through the API, but they are still stuck there (screenshot below).

The real issue I have is with the download. It's an on-demand extraction of a CSV file of about 700 MB. The extraction has completed successfully and the file is ready for download. I use the URL

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05c9d026dafb3026')/$value

to download the stream to a local file.

The problem I have with a job like this one is that the download always stops at 177 MB.

Is there a download limit?

Can we also run 10-20 parallel downloads?

P.S.: since last night the username/password has stopped working (invalid username/password). I'm calling Reuters.


Accepted answer

The way I made the compression and the download work properly was to avoid the "requests" library and use the urllib3 Python library instead.

With urllib3 it is possible to ask it not to "decode" the content (if the content is gzipped, decoding would decompress it):

import urllib3

headers = {
    'content-type': 'text/plain',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}

http = urllib3.PoolManager(timeout=urllib3.util.Timeout(connect=60.0, read=60.0))
r = http.request("GET", url, headers=headers, preload_content=False, decode_content=False)

# r is a stream
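To complete the picture, here is a minimal sketch of writing such a stream to disk without decoding it. The helper name, file path, and chunk size are illustrative; the urllib3 response object is file-like when opened with `preload_content=False`, so the standard library's `shutil.copyfileobj` can copy the still-compressed bytes straight to a file:

```python
import shutil

def save_stream_to_file(resp, path, chunk_size=64 * 1024):
    """Copy a file-like response body to disk without decoding it.

    With a urllib3 response opened using preload_content=False and
    decode_content=False, the gzip bytes are written to disk as-is.
    """
    with open(path, "wb") as out:
        shutil.copyfileobj(resp, out, chunk_size)
```

Usage with the stream above would be along the lines of `save_stream_to_file(r, "extraction.csv.gz")`, followed by `r.release_conn()` to return the connection to the pool.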


@alvise.susmel

The "requests" library will automatically decode data, according to this page.

You can use the command in this question to get encoded gzip data. Also, please see this related question.

alvise.susmel (replying to veerapath.rungruengrayubkul):

really helpful. thanks!

@alvise.susmel, to clarify: did using urllib3 and this code solve the entire problem (i.e. the data is saved compressed and nothing is lost), or are there any remaining issues?

that solved the issue, thanks! :)

Great, thanks for letting us know!


The login problem is solved, but I still have download issues.

Below are the notes related to one of the jobs that should be ready but whose download gets stuck. This is what I do.

Request

I'm requesting the full range for ESU8, from 2007-06-18 to 2008-09-20, which is the tradable date range for the contract (ticker ESU08 Index). Is the range too big?

payload = {
    "ExtractionRequest": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
        "ContentFieldNames": CONTENT_FIELD_NAMES,
        "IdentifierList": {
            "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
            "InstrumentIdentifiers": identifiers,
            "UseUserPreferencesForValidationOptions": False,
            "ValidationOptions": {
                "AllowHistoricalInstruments": True
            }
        },
        "Condition": {
            "MessageTimeStampIn": "GmtUtc",
            "ApplyCorrectionsAndCancellations": False,
            "ReportDateRangeType": "Range",
            "QueryStartDate": start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "QueryEndDate": end_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "DisplaySourceRIC": True
        }
    }
}
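For context, a hedged sketch of how a payload like this is typically submitted. The `ExtractRaw` endpoint name and the 202/Location polling convention are my assumptions based on the monitor URL used in this thread; `payload` and `token` are taken to come from the surrounding code:

```python
import json
import requests

EXTRACT_URL = ("https://hosted.datascopeapi.reuters.com"
               "/RestApi/v1/Extractions/ExtractRaw")

def submit_extraction(payload, token, session=None):
    """POST an on-demand raw extraction request.

    An HTTP 202 means the extraction is still running and the Location
    header is the monitor URL to poll; a 200 means it completed and the
    body already contains the JobId and Notes.
    """
    headers = {
        "Content-Type": "application/json; charset=UTF-8",
        "Authorization": "Token " + token,
        "Prefer": "respond-async",
    }
    session = session or requests.Session()
    resp = session.post(EXTRACT_URL, data=json.dumps(payload), headers=headers)
    if resp.status_code == 202:
        return resp.headers["Location"]  # poll this URL until HTTP 200
    resp.raise_for_status()
    return resp.json()                   # completed synchronously
```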

I monitor the job with the URL

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x05ca060558db2f96')

and get these notes:

{
    "@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#RawExtractionResults/$entity",
    "JobId": "0x05ca060558db2f96",
    "Notes": [
        "'Extraction Services Version 11.1.36930 (3ce781d3dd80), Built Jun 15 2017 06:42:50\nUser ID: 9012542\nExtraction ID: 2000000001154665\nSchedule: 0x05ca060558db2f96 (ID = 0x0000000000000000)\nInput List (1 items):  (ID = 0x05ca060558db2f96) Created: 10/07/2017 11:47:11 Last Modified: 10/07/2017 11:47:11\nReport Template (8 fields): _OnD_0x05ca060558db2f96 (ID = 0x05ca06057ceb2f96) Created: 10/07/2017 11:40:59 Last Modified: 10/07/2017 11:40:59\nSchedule dispatched via message queue (0x05ca060558db2f96), Data source identifier (CDA7151F657E4D8289B7D6277AE4702A)\nSchedule Time: 10/07/2017 11:41:02\nProcessing started at 10/07/2017 11:41:02\nProcessing completed successfully at 10/07/2017 11:47:12\nExtraction finished at 10/07/2017 11:47:12 UTC, with servers: tm01n01, TRTH (363.118 secs)\nInstrument <RIC,ESU8> expanded to 1 RIC: ESU8.\nQuota Message: INFO: Tick History Futures Quota Count Before Extraction: 1; Instruments Extracted: 1; Tick History Futures Quota Count After Extraction: 1, 1.96078431372549% of Limit; Tick History Futures Quota Limit: 51\nManifest: #RIC,Domain,Start,End,Status,Count\nManifest: ESU8,Market Price,2008-01-28T15:44:34.350858000Z,2008-09-19T18:34:38.290958000Z,Active,67504661\n'"
    ]
}

So the result should be ready to be downloaded.
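The monitoring step can be sketched as a small polling loop. This is a sketch under my assumptions (the server answers 202 while the job is still running and 200 with the JobId and Notes once done; `wait_for_result` is a hypothetical helper name):

```python
import time
import requests

def wait_for_result(monitor_url, token, poll_seconds=30):
    """Poll the extraction monitor URL until the job completes.

    Assumed convention: HTTP 202 while the job is running, HTTP 200
    with a JSON body containing "JobId" and "Notes" once finished.
    """
    headers = {
        "Authorization": "Token " + token,
        "Prefer": "respond-async",
    }
    while True:
        resp = requests.get(monitor_url, headers=headers)
        if resp.status_code == 200:
            return resp.json()  # contains "JobId" and "Notes"
        if resp.status_code != 202:
            resp.raise_for_status()
        time.sleep(poll_seconds)
```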

To download the full CSV file (is there a way to download a gzipped version of it?), I use

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05ca060558db2f96')/$value

This starts downloading the uncompressed CSV and blocks at around 177 MB (the full file should be around 4 GB uncompressed). It doesn't close the connection; it just times out.

If I try to do the same with the web interface, it works and I can get the (compressed) file.

Am I doing something wrong in one of the steps above?

Thanks


@alvise.susmel, the steps you describe seem ok.

That said, there is no reason to download the uncompressed file; you should be able to download it in compressed format. This depends on how you code the download request. Can you share your code?

The code is the one above, where I create the payload of the request. Maybe I'm missing some fields needed to enable compression.

How do I request gzip compression?

@alvise.susmel, can you confirm the file you download using the GUI is compressed? It should be, and its name should end in .gz.

Sorry, I was not clear; I meant the full code used to generate the HTTPS download request (the one that downloads data using the JobId). I'd like to see how the HTTP connection is created, how the GET query is generated and submitted, what the headers are, and how the response is handled.

You can post your code as an attachment. If you do not want to post it in the forum you can also send it to me directly to christiaan.meihsl@thomsonreuters.com.

I've posted the code below. Yes, I confirm that the file I download from the GUI is compressed, but there I create a template, whereas via the API I don't use any template; I just send the request with details and fields. Is it possible to request that the file be compressed using the request I initially posted?

Thanks

Thanks, I've started looking at the code. The file should be compressed by default; nothing special should be done to request that.

@christiaan.meihsl

sure no problem

import gzip
import requests

def _download_file(self, job_id, local_filename):

    DOWNLOAD_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('{0}')/$value"

    url = DOWNLOAD_URL.format(job_id)
    token = self._auth.ensure_token()
    headers = {
        'content-type': 'application/json; charset=UTF-8',
        'Authorization': "Token " + token,
        'Prefer': 'respond-async'
    }

    LOGGER.info("_download_file start {0} {1}".format(url, local_filename))

    r = requests.get(url, stream=True, headers=headers)
    with gzip.open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

    LOGGER.info("_download_file finished {0} {1}".format(local_filename, url))


Ok, so you are using Python. I'm not very familiar with Python, but as far as I understand, gzip.open automatically decompresses the data, which would explain why you get uncompressed data. I must investigate how to save the file without decompressing it.

No, gzip.open opens a write file descriptor that compresses the data I write into the file.

All the chunks I receive are already uncompressed. I'll check the Content-Length now; maybe that could help.

If I am not mistaken, if you change the headers of the download request to the following, the server should send compressed data:

url = DOWNLOAD_URL.format(job_id)
token = self._auth.ensure_token()
headers = {
    'content-type': 'text/plain',
    'Accept-Encoding': 'gzip',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}