TRTH download issues

We have some issues with the TRTH v2 API. Several jobs are stuck: I've cancelled them using the API, but they are still stuck there (screenshot below).
The real issue I have is with the download:
- It's an on-demand extraction of a CSV file of about 700 MB. The extraction has completed successfully and the file is ready for download. I use the URL
26dafb3026')/$value
to download the stream to a local file.
The problem I have with a job like this one is that the download always stops at 177 MB!
Is there a download limit?
Can we also run 10-20 parallel downloads?
P.S.: since yesterday night the username/password stopped working (invalid username/password). I'm calling Reuters.
Best Answer
-
The way I made the compression and the download work properly was to avoid the "requests" library and use the urllib3 Python library instead.
With urllib3 it is possible to ask not to decode the content (if the content is gzipped, decoding would decompress it):
import urllib3

headers = {
    'content-type': 'text/plain',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}
http = urllib3.PoolManager(timeout=urllib3.util.Timeout(connect=60.0, read=60.0))
# preload_content=False keeps the body as a stream; decode_content=False
# stops urllib3 from decompressing the gzipped payload on the fly.
r = http.request("GET", url, headers=headers, preload_content=False, decode_content=False)
# r is a stream
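A minimal sketch of the follow-on step, writing that stream to disk unchanged (assuming url, token and a local_filename variable as in the snippets below; the chunk size is an arbitrary choice):

# Stream the still-compressed payload straight to a .gz file on disk.
with open(local_filename, 'wb') as f:
    for chunk in r.stream(64 * 1024):  # read in 64 KiB chunks
        f.write(chunk)
r.release_conn()  # return the connection to the pool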
Answers
-
Login problem solved, but I still have download issues.
These are the notes for one of the jobs that should be ready but whose download is stuck. This is what I do.
Request
I'm requesting the full range for ESU8 from 2007-06-18 to 2008-09-20, which is the tradable date range for the contract (ticker ESU08 Index). (Is the range too big?)
payload = {
    "ExtractionRequest": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
        "ContentFieldNames": CONTENT_FIELD_NAMES,
        "IdentifierList": {
            "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
            "InstrumentIdentifiers": identifiers,
            "UseUserPreferencesForValidationOptions": False,
            "ValidationOptions": {
                "AllowHistoricalInstruments": True
            }
        },
        "Condition": {
            "MessageTimeStampIn": "GmtUtc",
            "ApplyCorrectionsAndCancellations": False,
            "ReportDateRangeType": "Range",
            "QueryStartDate": start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "QueryEndDate": end_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "DisplaySourceRIC": True
        }
    }
}
Monitoring with the URL, I get these notes:
{
"@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#RawExtractionResults/$entity",
"JobId": "0x05ca060558db2f96",
"Notes": [
"'Extraction Services Version 11.1.36930 (3ce781d3dd80), Built Jun 15 2017 06:42:50\nUser ID: 9012542\nExtraction ID: 2000000001154665\nSchedule: 0x05ca060558db2f96 (ID = 0x0000000000000000)\nInput List (1 items): (ID = 0x05ca060558db2f96) Created: 10/07/2017 11:47:11 Last Modified: 10/07/2017 11:47:11\nReport Template (8 fields): _OnD_0x05ca060558db2f96 (ID = 0x05ca06057ceb2f96) Created: 10/07/2017 11:40:59 Last Modified: 10/07/2017 11:40:59\nSchedule dispatched via message queue (0x05ca060558db2f96), Data source identifier (CDA7151F657E4D8289B7D6277AE4702A)\nSchedule Time: 10/07/2017 11:41:02\nProcessing started at 10/07/2017 11:41:02\nProcessing completed successfully at 10/07/2017 11:47:12\nExtraction finished at 10/07/2017 11:47:12 UTC, with servers: tm01n01, TRTH (363.118 secs)\nInstrument <RIC,ESU8> expanded to 1 RIC: ESU8.\nQuota Message: INFO: Tick History Futures Quota Count Before Extraction: 1; Instruments Extracted: 1; Tick History Futures Quota Count After Extraction: 1, 1.96078431372549% of Limit; Tick History Futures Quota Limit: 51\nManifest: #RIC,Domain,Start,End,Status,Count\nManifest: ESU8,Market Price,2008-01-28T15:44:34.350858000Z,2008-09-19T18:34:38.290958000Z,Active,67504661\n'"
]
}
So the result should be ready to be downloaded.
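For reference, a minimal sketch of the submit-and-poll step that produces notes like the above, assuming the standard DSS async pattern (POST the payload, then poll the URL from the Location header until HTTP 200); the ExtractRaw endpoint name and polling interval are assumptions, not taken from this thread:

import time
import requests

EXTRACT_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw"
headers = {
    'content-type': 'application/json; charset=UTF-8',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}

r = requests.post(EXTRACT_URL, json=payload, headers=headers)
while r.status_code == 202:
    # 202 Accepted: the extraction is still running, poll the monitor URL.
    monitor_url = r.headers['Location']
    time.sleep(30)  # arbitrary polling interval
    r = requests.get(monitor_url, headers=headers)

result = r.json()         # contains JobId and Notes once complete
job_id = result['JobId']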
To download the full CSV file (is there a way to download a gzipped version of it?) I use the URL
https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05ca060558db2f96')/$value
This starts downloading the uncompressed CSV and blocks at around 177 MB (it should be around 4 GB uncompressed). It doesn't close the connection, it just times out.
If I try to do the same with the web interface, it works and I can get the (compressed) file.
Am I doing something wrong in one of the steps above?
Thanks
-
@alvise.susmel, the steps you describe seem OK.
That said, there is no reason to download the uncompressed file; you should be able to download it in compressed format. This depends on how you code the download request. Can you share your code?
-
The code is the one above, where I create the payload of the request. Maybe I'm missing some fields to enable the compression.
-
How do I ask for gzip compression?
-
@alvise.susmel, can you confirm the file you download using the GUI is compressed? It should be, and its name should end in .gz.
Sorry, I was not clear, I meant the full code used to generate the HTTPS download request (the one that downloads data using the JobId). I'd like to see how the HTTP connection is created, how the GET query is generated and submitted, what the headers are, and how the response is handled.
You can post your code as an attachment. If you do not want to post it in the forum you can also send it to me directly at christiaan.meihsl@thomsonreuters.com.
-
@christiaan.meihsl
Sure, no problem:
def _download_file(self, job_id, local_filename):
    DOWNLOAD_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('{0}')/$value"
    url = DOWNLOAD_URL.format(job_id)
    token = self._auth.ensure_token()
    headers = {
        'content-type': 'application/json; charset=UTF-8',
        'Authorization': "Token " + token,
        'Prefer': 'respond-async'
    }
    LOGGER.info("_download_file start {0} {1}".format(url, local_filename))
    r = requests.get(url, stream=True, headers=headers)
    # gzip.open compresses the received chunks as they are written to disk
    with gzip.open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
    LOGGER.info("_download_file finished {0} {1}".format(local_filename, url))
-
I've posted the code below. Yes, I confirm that the file I download from the GUI is compressed, but there I create a template, while via the API I don't use any template, I just send the request with details and fields. Is it possible to ask for the file to be compressed using the request I initially posted?
Thanks
-
OK, so you are using Python. I'm not very familiar with Python, but as far as I understand, gzip.open automatically decompresses the data, which would explain why you get uncompressed data. I must investigate how to save the file without decompressing it.
-
Thanks, I've started looking at the code. The file should be compressed by default, nothing special should be done to request that.
-
No, gzip.open opens a write file descriptor that compresses the data I write into the file.
All the chunks I receive are uncompressed. I'll now check the Content-Length; maybe this could help.
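A quick sketch illustrating that point (the file name is arbitrary):

import gzip

# gzip.open in 'wb' mode compresses on write, it does not decompress:
with gzip.open('/tmp/example.gz', 'wb') as f:
    f.write(b'plain bytes in, gzip-compressed bytes on disk')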
-
If I am not mistaken, if you change the headers of the download request to the following, the server should send compressed data:
url = DOWNLOAD_URL.format(job_id)
token = self._auth.ensure_token()
headers = {
    'content-type': 'text/plain',
    'Accept-Encoding': 'gzip',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}
-
Thanks a lot. I'll try it straight away and let you know if it blocks.
-
OK, there is good news and bad news.
The bad news is that I still receive an uncompressed stream. I've also disabled the library's automatic decoding (which would decompress the content if it were compressed). I've checked the Content-Length in the response headers, which is '415567324', and the downloaded (uncompressed) file is
415567324 Jul 10 14:38 /tmp/trth_requests/59635f61a5ebcc3b3e5cd7a7.csv.gz
so the download stream is uncompressed: the file is a plain CSV, not a gzipped one.
The good news, as you can imagine, is that now it doesn't block (I don't know exactly why..).
-
Can you check the headers of the response to the last request (the one using the JobId for the data download)? What is the value of Content-Encoding? I believe it should be gzip.
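For example, with requests (assuming r is the streaming response from the download call above):

# Inspect the response headers without consuming the body:
print(r.headers.get('Content-Encoding'))  # should be gzip
print(r.headers.get('Content-Length'))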
-
You might also want to check out this page and scroll down to the Compression section; there is some additional info you might find useful.
-
Yes, the value is "gzip", but the data is not gzipped.
-
Could you send your latest code over? The one that generated the good and the bad news?
-
@alvise.susmel, to clarify: did using urllib3 and this code solve the entire problem (i.e. the data is saved compressed, and nothing is lost)? Or are there any remaining issues?
-
The "requests" library will automatically decode data, according to this page.
You can use the command in this question to get encoded gzip data. Also, please see this related question.
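In other words, a requests-based sketch along these lines should keep the payload compressed (r.raw is the underlying urllib3 response; url, headers and local_filename are assumed to be defined as in the earlier snippets):

import shutil
import requests

r = requests.get(url, stream=True, headers=headers)
# r.raw bypasses requests' automatic decoding, so the gzip payload
# is copied to disk as-is.
with open(local_filename, 'wb') as f:
    shutil.copyfileobj(r.raw, f)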
-
Really helpful, thanks!
-
That solved the issue, thanks!
-
Great, thanks for letting us know!