TRTH V2 REST API Download Extracted Files
After I have scheduled my report template and the extracted files are in place, how do I download them? The API tutorial uses ExtractedFiles('id')/$value. I tried it, but somehow it downloads a partial file. I was also concerned about this approach because the actual file size is 500MB, zipped. What is the point of pushing the data through an unzipping process again? The DataScope interface seems to provide a link for direct download, which makes me believe there should be a better way to achieve this in the REST API.
Best Answer
-
Yes, the code I posted works just fine.
Further research shows that the zlib module used by Python Requests (v2.8.1) has a compatibility issue: at some point zlib starts emitting zero bytes of decompressed data, which is what gave me the partial file. However, without decoding on the fly (i.e. taking zlib out of the process), the gzip/zcat etc. utilities work fine on the gzipped file after it is downloaded.
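For reference, a minimal sketch of that approach with the Requests library (the URL, token, and output name are placeholders): it streams the undecoded bytes from the underlying urllib3 response via raw.stream(decode_content=False), so zlib never touches the data.

import requests

url = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles('<ExtractedFileId>')/$value"
headers = {'Authorization': 'Token <sessionToken>'}
r = requests.get(url, headers=headers, stream=True)
r.raise_for_status()
# decode_content=False keeps zlib out of the loop: the bytes are
# written to disk exactly as they arrive, still gzip-compressed.
with open('extracted.csv.gz', 'wb') as f:
    for chunk in r.raw.stream(decode_content=False):
        f.write(chunk)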
Answers
-
@jsi-data, you do not mention what programming language you are using. I shall assume you are using Java. If that is not the case, then please tell us what you are using.
You can save the data file to hard disk without decompressing it, using the following code (input parameters are the output file path and name, and the file ID):
import java.io.DataInputStream;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public void extractAndSaveFile(String filename, String extractedFileId) {
    // urlHost and sessionToken are fields of the enclosing class.
    String urlGet = urlHost + "/Extractions/ExtractedFiles('" + extractedFileId + "')/$value";
    try {
        URL myURL = new URL(urlGet);
        HttpURLConnection myURLConnection = (HttpURLConnection) myURL.openConnection();
        myURLConnection.setRequestProperty("Authorization", "Token " + sessionToken);
        myURLConnection.setRequestProperty("Accept-Charset", "UTF-8");
        myURLConnection.setRequestMethod("GET");
        // Copy the response body straight to disk, without decompressing it.
        try (DataInputStream readerIS = new DataInputStream(myURLConnection.getInputStream())) {
            Files.copy(readerIS, Paths.get(filename));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

This is an extract from our sample DSS2ImmediateScheduleTicksTRTHClient2, which is part of the Java code sample set available for download.
-
Sorry, I was using Python, with direct HTTP requests.
-
Here is my HTTP response listing the extracted files. I can see the file names for the Note file and the actual data file in gzipped format. In the previous version of the SOAP API, I could just use wget on an FTP download link to retrieve the zipped data file directly. I don't understand why I have to go through the $value link and save the data, which as far as I can tell is unzipped plain text.
-----------DUMP OF RESPONSE-----------
{'Content-Length': '804', 'X-Request-Execution-Correlation-Id': '5ccb6ba9-8084-4e32-ab85-4e718e65a02b', 'X-App-Id': 'Custom.RestApi', 'Set-Cookie': 'DSSAPI-COOKIE=R3148268809; path=/', 'Expires': '-1', 'Server': 'Microsoft-IIS/7.5', 'X-App-Version': '11.0.487.64', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'Date': 'Fri, 12 May 2017 13:25:13 GMT', 'Content-Type': 'application/json; charset=utf-8'} 200 OK
{"@odata.context":"https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#ExtractedFiles","value":[{"ExtractedFileId":"VjF8MHgwNWI3MGU4OGNiMmIyZjg2fA","ReportExtractionId":"2000000000407358","ScheduleId":"0x05b70e670e9b2f86","FileType":"Note","ExtractedFileName":"9012092.DailyExtractionByTemplate_20170511_0922.20170512.092254.2000000000407358.tm01n01.csv.gz.notes.txt","LastWriteTimeUtc":"0001-01-01T00:00:00.000Z","ContentsExists":false,"Size":0},{"ExtractedFileId":"VjF8MHgwNWI3MGU4ODc3NmIyZjc2fA","ReportExtractionId":"2000000000407358","ScheduleId":"0x05b70e670e9b2f86","FileType":"Full","ExtractedFileName":"9012092.DailyExtractionByTemplate_20170511_0922.20170512.092254.2000000000407358.tm01n01.csv.gz","LastWriteTimeUtc":"2017-05-12T13:25:11.000Z","ContentsExists":true,"Size":87590966}]}
-
Do you use the Python Requests module for the HTTP request?
The API generally provides the extracted files in the format stated by the server. However, according to the documentation, the Requests library automatically decompresses gzip- and deflate-encoded data, so you will always receive unzipped plain text. To retrieve the actual data format, you can use the raw response content to get the actual gzip file.
Below is sample code that writes the extracted file in its actual format. You have to add stream=True in the get() method.
import requests

urlFile = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles('VjF8MHgwNWI3MGMyOTI5YmIyZjc2fA')/$value"
headers = {'Authorization': myToken, 'Accept-Encoding': 'compress'}
resp = requests.get(urlFile, headers=headers, stream=True)
...
with open(extractFileName, 'wb') as fd:
    fd.write(resp.raw.read())
-
I do use Python Requests. I was using a Session object and a prepared request to get the raw content. However, I also tried what you suggested above and still got a partial file. I downloaded the file twice; the actual file size is ~4MB zipped, but I only got ~94KB of data. No error was reported. You can try this:

import gzip
import requests

url = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles('VjF8MHgwNWI4MDk4ZjZiOWIyZjg2fA')/$value"
r = requests.get(url, headers=headers, stream=True, proxies=PROXIES, auth=("9012092", "xxxxxxx"))
with gzip.open("t1", 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024 * 1024):
        f.write(chunk)

And this is the file I got, whereas the actual file should be ~4MB, as shown in the HTTP response.
-rw-r--r-- 1 jsi_trth jsi_data_group 94914 May 15 10:38 t1
@{"@odata.context":"https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#ExtractedFiles","value":[{"ExtractedFileId":"VjF8MHgwNWI4MDk4ZjZi
OWIyZjg2fA","ReportExtractionId":"2000000000413472","ScheduleId":"0x05b8096826bb3016","FileType":"Full","ExtractedFileName":"9012092.DailyExtract
ionByTemplate_20170511_1029.20170515.102932.2000000000413472.tm03n01.csv.gz","LastWriteTimeUtc":"2017-05-15T14:32:12.000Z","ContentsExists":true,
"Size":4317179}]}0 -
What you describe reminds me of a recent case where a customer's download of a large file resulted in a very small file. It was due to a proxy, and I see you are using one.
Could you test your code on a different machine, or maybe even from home? The point is to check whether the proxy is the issue or not.
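If testing from another machine is difficult, one way to rule the proxy in or out (a sketch; it assumes the server is reachable without the proxy, and reuses the url and headers from your snippet) is to drop the proxies argument and disable Requests' environment-based proxy lookup:

import requests

# trust_env=False stops Requests from picking up HTTP_PROXY/HTTPS_PROXY
# environment variables; together with omitting proxies=, the request
# goes to the server directly.
session = requests.Session()
session.trust_env = False
r = session.get(url, headers=headers, stream=True)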
-
@jsi-data
I have tried your code and saw the same behavior. I understand that your code gets the data that has already been decompressed by Python Requests, and then compresses it again to create a gzip file. It seems there is an issue when Requests tries to decompress the downloaded data.
I have tried the code I provided again, with an extracted file of 14 MB. The code creates a file of the same size as the one shown in the HTTP response.
Can you dump the response headers to verify the actual response size?
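For example (a minimal sketch, reusing the resp object from the snippet above):

# Print the status line plus the headers that govern the download size.
print(resp.status_code, resp.reason)
print(resp.headers.get('Content-Length'))
print(resp.headers.get('Content-Encoding'))
print(resp.headers.get('Transfer-Encoding'))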
-
A little more research shows some interesting findings.
1. Of the two files I got from the scheduled extraction, one Note file and one data file, both gzipped, the $value responses are different. The Note file response headers specify 'chunked' transfer encoding and no content size, whereas the data file response has a content size and no 'chunked' encoding. I suspect the content size header confused the Python Requests lib, which results in the partial download. However, I think Reuters should find out why the responses are different: I can download the first file without problems, and from the Requests lib docs, the chunked method seems to be the right header information here.
2. If you download the file as the Requests lib suggests, using iter_content, then the data is decoded (unzipped in my case) even when I use decode_unicode=False. In this case, the size returned by the lib is actually the unzipped byte count, and this may have confused the lib, as the content header is actually the gzipped file size. The working method is to go directly to the underlying lib and do the following. It opens a generator on the data stream which you can iterate through to save the raw bytes of the gzipped file.

for data in r.raw.stream(decode_content=False):
    f.write(data)
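A complete version of that loop (a sketch; the output name is arbitrary), with a quick integrity check of the saved file using the gzip module:

import gzip

# Write the undecoded bytes straight to disk, still gzip-compressed.
with open('t1.csv.gz', 'wb') as f:
    for data in r.raw.stream(decode_content=False):
        f.write(data)

# Sanity check: reading a truncated gzip file to the end raises an error.
with gzip.open('t1.csv.gz', 'rb') as gz:
    for _ in gz:
        pass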
-
The Note file and the data file are generally created in different formats; the Note file is plain text, while the data file is gzip, so the way the data is delivered may be different.
Anyway, could you please confirm whether you are able to retrieve the correct gzip file with the code you provided in 2)?
If not, was the Content-Length of the response header the same as the Size returned from the /Extractions/ExtractedFiles({id}) endpoint?