AWS File Extraction not working properly

kai fang


I am trying to retrieve the output file of my scheduled task as a stream and read it into Python through the REST API, using the following headers:

headers = {'Authorization': 'Token %s' % auth, 'Prefer': 'respond-async', 'Accept-Charset': 'UTF-8', 'Content-Type': 'application/json',"X-Direct-Download":"true"}

When I do not include X-Direct-Download, I only get part of the data compared with the CSV file I download directly from the web GUI. However, when I add it, I get a text result like the following:

'\x1f�\x08\x00\x00\x00\x00\x00\x00\x00�][s\x1c�n~?��u�\x1aw��\x17��\x12m1�(\x15I%�_T�M�RŖO,�R��\x01fv�;3��Pܥ\x1d\x17�ds��h4\x1a@��/��V�\x7f��\x7f�yr�\u19db�\u05ef�'

which is clearly not correct. Is there any way to get the correct, full file as a stream in the AWS environment?

full code:

result = simple_get("/Extractions/ExtractedFiles('%s')/$value" % file_id, auth)

where:

import requests

def simple_get(endpoint, auth):
    # url_base (the REST API base URL) is assumed to be defined elsewhere in the script
    headers = {'Authorization': 'Token %s' % auth, 'Prefer': 'respond-async', 'Accept-Charset': 'UTF-8', 'Content-Type': 'application/json', 'X-Direct-Download': 'true'}
    r = requests.get(url_base + endpoint, headers=headers)
    return r

dss-rest-api tick-history-rest-api extraction

Oct 29, 2018 at 04:21 PM


Christiaan Meihsl Oct 30, 2018 at 04:49 AM

@kf2449 - This is a private post

Given your query, we moved this question from the DSS forum to the TRTH forum, which seems more appropriate and will help people discover it. AHS


zoya faberov Nov 21, 2018 at 02:36 PM

Hello @kf2449,

Please be informed that a reply has been verified as correct in answering the question, and has been marked as such.

Thanks,

-AHS

Answer 1 · 2018-10-30T05:08:35Z

Christiaan Meihsl


@kf2449, in addition to the response by Jirapongse:

What you observe is actually gzipped data:

'\x1f�\x08\x00\x00\x00\x00\x00\x00\x00�][s\x1c�n~?��u�\x1aw��\x17��\x12m1�(\x15I%�_T�M�RŖO,�R��\x01fv�;3��Pܥ\x1d\x17�ds���h4\x1a@��/���V�\x7f��������\x7f�yr�\u19db�\u05ef�'

When handling the data, you are getting different behaviors due to the response headers, which differ between TRTH and AWS downloads.

TRTH returns the following content-related headers:

Content-Encoding:gzip
Content-Type:text/plain

AWS returns different headers:

Content-Disposition:attachment; filename=_OnD_0x05de99857e5b3036.csv.gz
Content-Type:application/gzip

Your application reacts differently to these headers, because many HTTP libraries decide automatically whether to decompress the data based on the response headers. That is why you observe different data formats. It is best to set the appropriate headers and parameters to avoid such behavior. If you follow what is explained in the links sent by Jirapongse, you should be able to download the entire file in the correct format.

You can also look at the Python code samples set available under the downloads tab, and explained in this document. One of the samples (TRTH_Get_Latest_Schedule_Files) should be of help.
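
As a rough illustration of the header difference described above, the following sketch classifies a response by its headers. The function name needs_manual_gunzip is hypothetical and the rules are an assumption based only on the two header sets quoted in this answer; they are not taken from the TRTH documentation.

```python
def needs_manual_gunzip(headers):
    """Return True when the body looks like a gzip file the HTTP client will not decode."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    if "gzip" in h.get("content-encoding", "").lower():
        # Transport-level compression: clients such as Requests decode this themselves.
        return False
    content_type = h.get("content-type", "").lower()
    disposition = h.get("content-disposition", "").lower()
    # Assumption: a gzip media type or a .gz file name means the payload is a gzip file.
    return content_type.startswith("application/gzip") or disposition.rstrip('"').endswith(".gz")


# The two header sets quoted above.
trth_headers = {"Content-Encoding": "gzip", "Content-Type": "text/plain"}
aws_headers = {
    "Content-Disposition": "attachment; filename=_OnD_0x05de99857e5b3036.csv.gz",
    "Content-Type": "application/gzip",
}

print(needs_manual_gunzip(trth_headers))  # False
print(needs_manual_gunzip(aws_headers))   # True
```

With a Requests response r, the call would be needs_manual_gunzip(r.headers).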

Oct 30, 2018 at 05:08 AM


Answer 2 · 2018-10-30T04:46:07Z

Jirapongse


@kf2449

To avoid incomplete output, you need to download the gzip file and then decompress it. This is described in the documentation under ADVISORY: AVOID INCOMPLETE OUTPUT - DOWNLOAD THEN DECOMPRESS. You can also refer to this article: How to Optimize TRTH (Tick History) file downloads for Python (and other languages).

To download from AWS, please refer to Boost TRTH downloads with AWS.
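
A minimal sketch of the "download then decompress" approach, assuming the Requests library is used as in the question. The helper names and the gz_path and out_path parameters are hypothetical, and this is not the code from the linked article. The idea is to save the raw bytes to disk untouched and decompress them only afterwards.

```python
import gzip
import shutil


def save_raw_stream(raw, path, chunk_size=1024 * 1024):
    """Copy a binary file-like object to disk in chunks, without decoding it."""
    with open(path, "wb") as out:
        shutil.copyfileobj(raw, out, chunk_size)


def gunzip_file(gz_path, out_path):
    """Decompress a gzip file on disk into out_path."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def download_then_decompress(url, headers, gz_path, out_path):
    """Stream the response body to gz_path, then decompress it to out_path."""
    import requests  # third-party; imported here so the helpers above stay stdlib-only

    with requests.get(url, headers=headers, stream=True) as r:
        r.raise_for_status()
        # Keep the bytes exactly as sent, even if Content-Encoding: gzip is present.
        r.raw.decode_content = False
        save_raw_stream(r.raw, gz_path)
    gunzip_file(gz_path, out_path)
```

For the request in the question, url would be url_base + "/Extractions/ExtractedFiles('%s')/$value" % file_id, with the same headers dictionary.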

Oct 30, 2018 at 04:46 AM


Answer 3 · 2018-10-30T04:56:52Z

veerapath.rungruengrayubkul


@kf2449

The response text is most likely gzip-compressed. Normally, a TRTH extraction provides data in gzip format. The Python Requests library automatically decompresses the data if the response contains the header Content-Encoding: gzip. However, AWS does not set this header, so Requests does not decompress the body, and the application needs to decompress it in a separate step. Please see the code example in the "TRTH Python Code" package downloaded from this link.
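
For a file small enough to hold in memory, the separate decompression step might look like the sketch below. It is not the code from the "TRTH Python Code" package. The function names and the sample rows are made up for illustration, and checking the first two bytes for the gzip signature (0x1f 0x8b, which is how the output in the question begins) is an assumption about how to tell compressed bodies from already-decoded ones.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def body_to_text(body, encoding="utf-8"):
    """Decompress body if it starts with the gzip signature, then decode it to text."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(encoding)


def body_to_rows(body):
    """Parse the (possibly gzip-compressed) CSV bytes into a list of rows."""
    return list(csv.reader(io.StringIO(body_to_text(body))))


# Hypothetical sample standing in for r.content from the AWS download.
sample = gzip.compress(b"RIC,Price\nABC.N,1.5\n")
print(body_to_rows(sample))
```

With the simple_get function from the question, the call would be body_to_rows(result.content). Using result.content instead of result.text matters here, because .text would try to decode the compressed bytes as characters.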

Oct 30, 2018 at 04:56 AM

