
TRTH v2: VBD gzip download issue using the Unirest Java HTTP client

We’re having issues while trying to programmatically retrieve a DSS StandardExtractions UserPackageDelivery using the Unirest Java HTTP client.

When we stream the file contents with the ‘StandardExtractions/UserPackageDeliveries({id})/$value’ request, we do not get the file in compressed ‘gzip’ format as described in the API documentation.

We send the header ‘Accept-Encoding: gzip’ with the HTTP GET request, as advised. The response header correctly contains ‘Content-Encoding: gzip’, but the response body appears to be plain text. How can we receive the streamed file contents in gzip format for StandardExtractions?
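
For anyone debugging the same symptom, one quick way to confirm what the client actually handed back is to look at the first two bytes of the body: a gzip stream always starts with 0x1f 0x8b. The following is only a minimal sketch using the Java standard library. The class and method names are made up for illustration and are not part of the DSS API or Unirest.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any DSS or Unirest API.
final class GzipSniffer {

    // Returns true if the buffer starts with the gzip magic number 0x1f 0x8b.
    static boolean looksLikeGzip(byte[] body) {
        return body != null
                && body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    // Compresses a string in memory, only to have sample data for the check.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "sample,csv,row".getBytes(StandardCharsets.UTF_8);
        System.out.println("plain text looks like gzip: " + looksLikeGzip(plain));
        System.out.println("compressed looks like gzip: " + looksLikeGzip(gzip("sample,csv,row")));
    }
}
```

If the first bytes of the downloaded body fail this check while the response header says gzip, the HTTP client has most likely decompressed the stream before handing it over.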

tick-history-rest-api streaming-prices venue-by-day compression

Contacted DSS SWAT to verify the problem.

@Arvind Kaushik, considering your query is related to TRTH, we moved it from the DSS to the TRTH forum, it seems more appropriate, and will help people discover it. AHS

@Arvind Kaushik, thank you for your participation in the forum. Are any of the replies below satisfactory in resolving your query? If yes please click the 'Accept' text next to the most appropriate reply. This will guide all community members who have a similar question. Otherwise please post again offering further insight into your question. Thanks, AHS

Hello again!

Thank you for your participation in the forum. Is the reply below satisfactory in resolving your query?

If yes please click the 'Accept' text next to the reply. This will guide all community members who have a similar question. Otherwise please post again offering further insight into your question.

Thanks,

AHS

Please be informed that a reply has been verified as correct in answering the question, and has been marked as such.

Thanks,

AHS

Upvotes
Accepted
78.8k 250 52 74
@Arvind Kaushik

I think you are correct. In my tests, the response always contains "Content-Encoding: gzip", whether or not "Accept-Encoding: gzip" is present in the request. This conflicts with the statement in the documentation. I will contact the development team to verify it.

For now, to get the raw gz file, you can use this kind of Java code.

		// Requires Apache HttpClient 4.x (org.apache.http.*), java.io.* and java.nio.file.*
		// Disable automatic decompression so the gzip bytes reach us untouched.
		CloseableHttpClient client = HttpClients.custom().disableContentCompression().build();

		HttpGet httpget = new HttpGet("https://hosted.datascopeapi.reuters.com/RestApi/v1/StandardExtractions/UserPackageDeliveries('0x05d0f50a992b2f96')/$value");
		httpget.addHeader("Authorization", "Token <token>");
		// try-with-resources closes the response and the stream even if the copy fails
		try (CloseableHttpResponse response = client.execute(httpget);
		     InputStream is = response.getEntity().getContent()) {
			// Copy the raw stream to disk in buffered chunks, overwriting any existing file
			Files.copy(is, Paths.get("c:\\output.csv.gz"), StandardCopyOption.REPLACE_EXISTING);
		} catch (ClientProtocolException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}

Upvotes
78.8k 250 52 74

@Arvind Kaushik

Refer to the Unirest code in HttpResponse.java at line 84: it decompresses the data if the data is gzipped.

To get the gzip file, you need to use the solution mentioned in this question.
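
To illustrate why the body shows up as plain text, here is a rough sketch of the general pattern that HTTP clients follow when they decode gzip transparently. This is not Unirest's actual source. The class, the method names and the autoDecompress flag are made up for illustration, and the "wire" bytes are generated in memory rather than fetched from DSS.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration only: this is NOT Unirest's source, just the general pattern.
final class TransparentDecoding {

    // Returns the body as the caller would see it. With autoDecompress on,
    // a "Content-Encoding: gzip" response is unzipped before it is returned.
    static byte[] bodySeenByCaller(byte[] wireBytes, String contentEncoding, boolean autoDecompress) {
        try {
            InputStream in = new ByteArrayInputStream(wireBytes);
            if (autoDecompress && "gzip".equalsIgnoreCase(contentEncoding)) {
                in = new GZIPInputStream(in);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds gzip bytes in memory to stand in for the server's response body.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = gzip("sample,csv,row\n");
        byte[] decoded = bodySeenByCaller(wire, "gzip", true);
        byte[] raw = bodySeenByCaller(wire, "gzip", false);
        System.out.println("auto-decompress on:  " + new String(decoded, StandardCharsets.UTF_8).trim());
        System.out.println("auto-decompress off: " + raw.length + " raw gzip bytes");
    }
}
```

With automatic decoding switched off, the caller receives exactly the bytes that were on the wire, which is what is wanted when the goal is to save the .gz file as delivered.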


I have tried replacing Unirest's underlying client with a closeable HTTP client that has content compression disabled, like this:

CloseableHttpClient client = HttpClients.custom().disableContentCompression().build();
Unirest.setHttpClient(client);

I have also tried using the Java standard library directly, with the same result:

	            HttpURLConnection conn= (HttpURLConnection)myURL.openConnection();
	            conn.setRequestProperty("Authorization", "Token " + getAuthToken());
	            conn.setRequestProperty("Accept-Encoding", "gzip");
	            conn.setRequestMethod("GET");
	            conn.getInputStream();



I do see the Unirest code where it checks for content-encoding:gzip. However, I am not expecting 'content-encoding:gzip' to be set on the response. According to https://hosted.datascopeapi.reuters.com/RestApi.Help/Home/KeyMechanisms?ctx=Extractions&tab=0&uid=Streaming, if the request header does not have 'Accept-Encoding:gzip' set, then the response header will not have 'content-encoding:gzip' set.

Upvotes
13.7k 26 8 12

@Arvind Kaushik, thanks to some out-of-band information received separately by email, I shall answer this one.

If I have been correctly informed, you retrieved the list of user package deliveries with a GET to this URL:

https://hosted.datascopeapi.reuters.com/RestApi/v1/StandardExtractions/UserPackageDeliveryGetUserPackageDeliveriesByPackageId(PackageId='0x0460dc1d24a62cb1')

The result is:

{
    "@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#UserPackageDeliveries",
    "value": [
        {
            "PackageDeliveryId": "MHgwNDYwZGMxZDI0YTYyY2IxfEZGRS1MZWdhY3l8SW5zaWRlcnNcVVNcTW9kZWx2M3xXZWVrbHl8SVRNdjNXZWVrbHkuMjAxNzAyMDI",
            "UserPackageId": "0x0460dc1d24a62cb1",
            "SubscriptionId": "0x0400dc1d24a00cb3",
            "Name": "ITMv3Weekly.20170202",
            "ReleaseDateTime": "2017-02-02T11:31:35.000Z",
            "FileSizeBytes": 314925,
            "Frequency": "Weekly"
        },
etc.

The issue is with the UserPackageId 0x0460dc1d24a62cb1, which is for Insider data, not for TRTH VBD. That is why you get something different from what is shown in step 4 of TRTH REST API Tutorial 2 (get list of user package deliveries).

See the value of UserPackageId in this extract from the first step of TRTH REST API Tutorial 2 (list all user packages):

… 
{ 
 "UserPackageId": "0x04f21a8d20c59cb1", 
 "PackageId": "0x04f21a8d20c59cb1", 
 "PackageName": "KAR - Karachi Stock Exchange", 
 "SubscriptionId": "0x0400dc1d24a00cb4", 
 "SubscriptionName": "TRTH Venue by Day" 
}, 
{ 
 "UserPackageId": "0x0460dc1d24a62cb1", 
 "PackageId": "0x0460dc1d24a62cb1", 
 "PackageName": "US Insider Trading Model v3", 
 "SubscriptionId": "0x0400dc1d24a00cb3", 
 "SubscriptionName": "Insider" 
},
…

Use a PackageId that is related to "SubscriptionName": "TRTH Venue by Day" and you will receive TRTH data as described in the tutorial, in gzip format.


Upvotes
16 1 2 4

@Christiaan Meihsl

I'm not sure what your source of information is, but I have tried a number of different requests with similar results, e.g.:

https://hosted.datascopeapi.reuters.com/RestApi/v1/StandardExtractions/UserPackageDeliveryGetUserPackageDeliveriesByDateRange(SubscriptionId='0x0400dc1d24a00cb4',FromDate=2017-07-30,ToDate=2017-07-31)
	
	{
    "@odata.context": "https://hosted.datascopeapi.reuters.com/RestApi/v1/$metadata#UserPackageDeliveries",
    "value": [
        {
            "PackageDeliveryId": "0x05d0a0119a2b3016",
            "UserPackageId": "0x04f21a8d1f759cb1",
            "SubscriptionId": "0x0400dc1d24a00cb4",
            "Name": "IOM-2017-07-30-NORMALIZEDMP-Report-4-of-6.csv.gz",
            "ReleaseDateTime": "2017-07-31T03:00:00.000Z",
            "FileSizeBytes": 841968,
            "Frequency": "Daily",
            "ContentMd5": ""
        },
	
	https://hosted.datascopeapi.reuters.com/RestApi/v1/StandardExtractions/UserPackageDeliveries('0x05d0a0119a2b3016')/$value

Well I received a request from Steven Peng, where I saw that the UserPackageId was for Insider instead of TRTH VBD, hence my answer.

Upvotes
13.7k 26 8 12

@Arvind Kaushik, I'm assuming you want to save the data, not decompress it on the fly. If that is the case, can you try this:

// Requires java.io.*, java.net.HttpURLConnection, java.net.URL, java.nio.file.Files, java.nio.file.Paths
String urlGet = urlHost + "/StandardExtractions/UserPackageDeliveries('" + FileId + "')/$value";
try {
    URL myURL = new URL(urlGet);
    HttpURLConnection myURLConnection = (HttpURLConnection) myURL.openConnection();
    myURLConnection.setRequestProperty("Authorization", "Token " + sessionToken);
    myURLConnection.setRequestProperty("Accept-Encoding", "gzip");
    myURLConnection.setRequestProperty("Accept-Charset", "UTF-8");
    myURLConnection.setRequestMethod("GET");

    // Copy the stream to disk as received, without wrapping it in a GZIPInputStream
    try (InputStream readerIS = myURLConnection.getInputStream()) {
        Files.copy(readerIS, Paths.get(filename));
    }
} catch (IOException e) {
    e.printStackTrace();
}

I have not had the occasion to try this with VBD, but I think it should work.
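
Once the file has been saved in its compressed form, it can still be read later without unpacking it on disk. The sketch below assumes the saved file is a gzip-compressed text file in UTF-8. The class and method names are hypothetical, and the sample file is generated locally so the code runs without a DSS connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for inspecting a saved .gz file; not part of the DSS API.
final class SavedGzipReader {

    // Reads up to maxLines lines of text from a gzip file on disk.
    static List<String> firstLines(Path gzFile, int maxLines) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            String line;
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Writes a small gzip file to a temp location so the sketch can run offline.
    static Path writeSample(List<String> lines) {
        try {
            Path file = Files.createTempFile("sample", ".csv.gz");
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
                for (String line : lines) {
                    out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSample(Arrays.asList("col1,col2", "a,1", "b,2"));
        for (String line : firstLines(file, 2)) {
            System.out.println(line);
        }
    }
}
```

Reading only the first few lines this way is also a cheap sanity check that a large download really is a valid gzip file.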


Upvotes
16 1 2 4

@Christiaan Meihsl

You mean without specifying 'Accept-Encoding:gzip', so that 'Content-Encoding:gzip' is not returned in the response?

I shall give it a try.


@Arvind Kaushik, oops, sorry, I forgot that header, please include header 'Accept-Encoding:gzip'.

I'll edit my response above to add this line.

Upvotes
16 1 2 4

@Christiaan Meihsl

I did try this with Postman, however I still see the content-encoding:gzip being returned in the response header.

Request Headers:
cache-control:"no-cache"
Postman-Token:"ea9146cd-c631-4161-b248-06b476adcb68"
Prefer:"respond-async"
Authorization:"Basic OTAxMjYyMzppaHNtQHJraXQ="
User-Agent:"PostmanRuntime/6.1.6"
Accept:"*/*"
Host:"hosted.datascopeapi.reuters.com"
cookie:"DSSAPI-COOKIE=R3148268809"
accept-encoding:"gzip, deflate"
Response Headers:
set-cookie:"DSSAPI-COOKIE=R3148268809; path=/"
cache-control:"no-cache"
pragma:"no-cache"
content-length:"841968"
content-type:"text/plain"
content-encoding:"gzip"

@Arvind Kaushik, my suggestion was actually to try the piece of Java code I posted.

Upvotes
426 2 4 4
@Arvind Kaushik

Sorry, I think I gave Christiaan the wrong information. Anyway, I did a quick test with Python using the URL you provided and was able to retrieve the file as a gz file. Have you tested the Java solution suggested by Jiraponse?


Jiraponse?? do you have a link?

Upvotes
16 1 2 4

Under https://hosted.datascopeapi.reuters.com/RestApi.Help/Home/KeyMechanisms?ctx=Extractions&tab=0&uid=Streaming it is mentioned that if the request header does not have 'Accept-Encoding:gzip' set, then the response header will not have 'content-encoding:gzip' set:

Standard Extractions feeds content can deliver very large files that would increase the burdens on our servers and bandwidth were the files not compressed. In this situation we always deliver the file in a compressed format, although if the Accept-Encoding: gzip is not present, the Content-Encoding: gzip header will not be included in the response, but the file will always be compressed as gzip if that subscription delivers gzipped content.

I do not see this happening in my testing, though. I always get content-encoding:gzip, which causes the stream to be unzipped, contrary to what I want.
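
If the client library cannot be stopped from unzipping the stream, one possible fallback is to compress the content again while writing it to disk. This is only a sketch of a workaround and is not something the DSS documentation describes. The class and method names are hypothetical, and the recompressed file will generally not be byte-identical to the one on the server, so any checksum published for the delivery should not be expected to match.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical fallback, not something the DSS documentation describes.
final class RecompressFallback {

    // Writes already-decompressed content to target as a new gzip file.
    static void saveAsGzip(InputStream decoded, Path target) {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = decoded.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper for the demo: compresses text into a temp file.
    static Path recompressToTemp(String text) {
        try {
            Path target = Files.createTempFile("recompressed", ".csv.gz");
            saveAsGzip(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)), target);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a gzip file back into a string, to show the round trip works.
    static String readBack(Path gzFile) {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = recompressToTemp("col1,col2\na,1\n");
        System.out.println("wrote " + file + ", content reads back as:");
        System.out.print(readBack(file));
    }
}
```

Disabling decompression in the client remains the simpler route when it is available, since it avoids the extra CPU work and keeps the file exactly as delivered.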
