TRTH download issues

We have some issues with the TRTH v2 API. Several jobs are stuck: I've cancelled them using the API, but they are still stuck there (screenshot below).
The real issue I have is with the download:
- It's an on-demand extraction of a CSV file of about 700 MB. The extraction has completed successfully and the file is ready for download. I use the URL
26dafb3026')/$value
to download the stream to a local file.
The problem I have with a job like this one is that the download always stops at 177 MB!
Is there a download limit?
Can we also run 10-20 parallel downloads?
P.S.: since yesterday night the username/password stopped working (invalid username/password). I'm calling Reuters.
Best Answer
-
The way I made the compression and the download work properly was to avoid the "requests" library and use the urllib3 Python library instead.
With urllib3 it is possible to ask not to decode the content (if the content is gzipped, decoding would decompress it):
import urllib3

headers = {
    'content-type': 'text/plain',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}
http = urllib3.PoolManager(timeout=urllib3.util.Timeout(connect=60.0, read=60.0))
# preload_content=False keeps the body as a stream; decode_content=False
# stops urllib3 from decompressing the gzipped payload on the fly.
r = http.request("GET", url, headers=headers, preload_content=False, decode_content=False)
# r is a stream
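A minimal sketch of the follow-on step, writing that stream to disk unchanged (assuming url, token and a local_filename variable as in the snippets below; the chunk size is an arbitrary choice):

# Stream the still-compressed payload straight to a .gz file on disk.
with open(local_filename, 'wb') as f:
    for chunk in r.stream(64 * 1024):  # read in 64 KiB chunks
        f.write(chunk)
r.release_conn()  # return the connection to the pool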
Answers
-
Login problem solved, but I still have download issues.
These are the notes for one of the jobs that should be ready but whose download is stuck. This is what I do.
Request
I'm requesting the full range for ESU8 from 2007-06-18 to 2008-09-20, which is the tradable date range for the contract (ticker ESU08 Index). (Is the range too big?)
payload = {
    "ExtractionRequest": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
        "ContentFieldNames": CONTENT_FIELD_NAMES,
        "IdentifierList": {
            "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
            "InstrumentIdentifiers": identifiers,
            "UseUserPreferencesForValidationOptions": False,
            "ValidationOptions": {
                "AllowHistoricalInstruments": True
            }
        },
        "Condition": {
            "MessageTimeStampIn": "GmtUtc",
            "ApplyCorrectionsAndCancellations": False,
            "ReportDateRangeType": "Range",
            "QueryStartDate": start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "QueryEndDate": end_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
            "DisplaySourceRIC": True
        }
    }
}
Monitoring with the URL, I get these notes:
{
"@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#RawExtractionResults/$entity",
"JobId": "0x05ca060558db2f96",
"Notes": [
"'Extraction Services Version 11.1.36930 (3ce781d3dd80), Built Jun 15 2017 06:42:50\nUser ID: 9012542\nExtraction ID: 2000000001154665\nSchedule: 0x05ca060558db2f96 (ID = 0x0000000000000000)\nInput List (1 items): (ID = 0x05ca060558db2f96) Created: 10/07/2017 11:47:11 Last Modified: 10/07/2017 11:47:11\nReport Template (8 fields): _OnD_0x05ca060558db2f96 (ID = 0x05ca06057ceb2f96) Created: 10/07/2017 11:40:59 Last Modified: 10/07/2017 11:40:59\nSchedule dispatched via message queue (0x05ca060558db2f96), Data source identifier (CDA7151F657E4D8289B7D6277AE4702A)\nSchedule Time: 10/07/2017 11:41:02\nProcessing started at 10/07/2017 11:41:02\nProcessing completed successfully at 10/07/2017 11:47:12\nExtraction finished at 10/07/2017 11:47:12 UTC, with servers: tm01n01, TRTH (363.118 secs)\nInstrument <RIC,ESU8> expanded to 1 RIC: ESU8.\nQuota Message: INFO: Tick History Futures Quota Count Before Extraction: 1; Instruments Extracted: 1; Tick History Futures Quota Count After Extraction: 1, 1.96078431372549% of Limit; Tick History Futures Quota Limit: 51\nManifest: #RIC,Domain,Start,End,Status,Count\nManifest: ESU8,Market Price,2008-01-28T15:44:34.350858000Z,2008-09-19T18:34:38.290958000Z,Active,67504661\n'"
]
}
So the result should be ready to be downloaded.
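For reference, a minimal sketch of the submit-and-poll step that produces notes like the above, assuming the standard DSS async pattern (POST the payload, then poll the URL from the Location header until HTTP 200); the ExtractRaw endpoint name and polling interval are assumptions, not taken from this thread:

import time
import requests

EXTRACT_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw"
headers = {
    'content-type': 'application/json; charset=UTF-8',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}

r = requests.post(EXTRACT_URL, json=payload, headers=headers)
while r.status_code == 202:
    # 202 Accepted: the extraction is still running, poll the monitor URL.
    monitor_url = r.headers['Location']
    time.sleep(30)  # arbitrary polling interval
    r = requests.get(monitor_url, headers=headers)

result = r.json()         # contains JobId and Notes once complete
job_id = result['JobId']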
To download the full CSV file (is there a way to download a gzipped version of it?) I use the URL
https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('0x05ca060558db2f96')/$value
This starts downloading the uncompressed CSV and blocks at around 177 MB (it should be around 4 GB uncompressed). It doesn't close the connection, it just times out.
If I try to do the same with the web interface, it works and I can get the (compressed) file.
Am I doing something wrong in one of the steps above?
Thanks
-
@alvise.susmel, the steps you describe seem OK.
That said, there is no reason to download the uncompressed file; you should be able to download it in compressed format. This depends on how you code the download request. Can you share your code?
-
The code is the one above, where I create the payload of the request. Maybe I'm missing some fields to enable the compression.
-
How do I ask for gzip compression?
-
@alvise.susmel, can you confirm the file you download using the GUI is compressed? It should be, and its name should end in .gz.
Sorry, I was not clear, I meant the full code used to generate the HTTPS download request (the one that downloads data using the JobId). I'd like to see how the HTTP connection is created, how the GET query is generated and submitted, what the headers are, and how the response is handled.
You can post your code as an attachment. If you do not want to post it in the forum you can also send it to me directly at christiaan.meihsl@thomsonreuters.com.
-
@christiaan.meihsl
Sure, no problem:
def _download_file(self, job_id, local_filename):
    DOWNLOAD_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('{0}')/$value"
    url = DOWNLOAD_URL.format(job_id)
    token = self._auth.ensure_token()
    headers = {
        'content-type': 'application/json; charset=UTF-8',
        'Authorization': "Token " + token,
        'Prefer': 'respond-async'
    }
    LOGGER.info("_download_file start {0} {1}".format(url, local_filename))
    r = requests.get(url, stream=True, headers=headers)
    # gzip.open compresses the received chunks as they are written to disk
    with gzip.open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
    LOGGER.info("_download_file finished {0} {1}".format(local_filename, url))
-
I've posted the code below. Yes, I confirm that the file I download from the GUI is compressed, but there I create a template, while via the API I don't use any template, I just send the request with details and fields. Is it possible to ask for the file to be compressed using the request I initially posted?
Thanks
-
OK, so you are using Python. I'm not very familiar with Python, but as far as I understand, gzip.open automatically decompresses the data, which would explain why you get uncompressed data. I must investigate how to save the file without decompressing it.
-
Thanks, I've started looking at the code. The file should be compressed by default, nothing special should be done to request that.
-
No, gzip.open opens a write file descriptor that compresses the data I write into the file.
All the chunks I receive are uncompressed. I'll now check the Content-Length; maybe this could help.
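A quick sketch illustrating that point (the file name is arbitrary):

import gzip

# gzip.open in 'wb' mode compresses on write, it does not decompress:
with gzip.open('/tmp/example.gz', 'wb') as f:
    f.write(b'plain bytes in, gzip-compressed bytes on disk')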
-
If I am not mistaken, if you change the headers of the download request to the following, the server should send compressed data:
url = DOWNLOAD_URL.format(job_id)
token = self._auth.ensure_token()
headers = {
    'content-type': 'text/plain',
    'Accept-Encoding': 'gzip',
    'Authorization': "Token " + token,
    'Prefer': 'respond-async'
}
-
Thanks a lot. I'll try it straight away and let you know if it blocks.
-
OK, there is good news and bad news.
The bad news is that I still receive an uncompressed stream. I've also disabled the library's automatic decoding (which would decompress the content if it were compressed). I've checked the Content-Length in the response headers, which is '415567324', and the downloaded (uncompressed) file is
415567324 Jul 10 14:38 /tmp/trth_requests/59635f61a5ebcc3b3e5cd7a7.csv.gz
so the download stream is uncompressed: the file is a plain CSV, not a gzipped one.
The good news, as you can imagine, is that now it doesn't block (I don't know exactly why..).
-
Can you check the headers of the response to the last request (the one using the JobId for the data download)? What is the value of Content-Encoding? I believe it should be gzip.
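For example, with requests (assuming r is the streaming response from the download call above):

# Inspect the response headers without consuming the body:
print(r.headers.get('Content-Encoding'))  # should be gzip
print(r.headers.get('Content-Length'))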
-
You might also want to check out this page and scroll down to the Compression section; there is some additional info you might find useful.
-
Yes, the value is "gzip", but the data is not gzipped.
-
Could you send your latest code over? The one that generated the good and the bad news?
-
@alvise.susmel, to clarify: did using urllib3 and this code solve the entire problem (i.e. the data is saved compressed, and nothing is lost)? Or are there any remaining issues?
-
The "requests" library will automatically decode data, according to this page.
You can use the command in this question to get encoded gzip data. Also, please see this related question.
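In other words, a requests-based sketch along these lines should keep the payload compressed (r.raw is the underlying urllib3 response; url, headers and local_filename are assumed to be defined as in the earlier snippets):

import shutil
import requests

r = requests.get(url, stream=True, headers=headers)
# r.raw bypasses requests' automatic decoding, so the gzip payload
# is copied to disk as-is.
with open(local_filename, 'wb') as f:
    shutil.copyfileobj(r.raw, f)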
-
Really helpful, thanks!
-
That solved the issue, thanks!
-
Great, thanks for letting us know!