
x-direct-download aws access key error

Hi,

I'm trying to download the latest extracted file for a daily schedule. I was using the X-Direct-Download header to download the files directly from the AWS server. However, I get an AWS access key error. I assume the key is supposed to be provided by your API when it redirects. If we are supposed to provide it ourselves, how do we get it? I get the following error message when trying the direct download:

<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><AWSAccessKeyId>AKIAJVAI4XORJURKYMEA</AWSAccessKeyId><StringToSign>GET

text/plain
1508866448
x-amz-request-payer:requester
/tickhistory.query.production.hdc-results/569288DA08B4447C911246BAFDFF15E2/data/merged/merged.csv.gz?response-content-disposition=attachment; filename=latestdownload.csv.gz</StringToSign><SignatureProvided>jSLffd/FgvE9+OJxV4MJesQmru8=</SignatureProvided><StringToSignBytes>47 45 54 0a 0a 74 65 78 74 2f 70 6c 61 69 6e 0a 31 35 30 38 38 36 36 34 34 38 0a 78 2d 61 6d 7a 2d 72 65 71 75 65 73 74 2d 70 61 79 65 72 3a 72 65 71 75 65 73 74 65 72 0a 2f 74 69 63 6b 68 69 73 74 6f 72 79 2e 71 75 65 72 79 2e 70 72 6f 64 75 63 74 69 6f 6e 2e 68 64 63 2d 72 65 73 75 6c 74 73 2f 35 36 39 32 38 38 44 41 30 38 42 34 34 34 37 43 39 31 31 32 34 36 42 41 46 44 46 46 31 35 45 32 2f 64 61 74 61 2f 6d 65 72 67 65 64 2f 6d 65 72 67 65 64 2e 63 73 76 2e 67 7a 3f 72 65 73 70 6f 6e 73 65 2d 63 6f 6e 74 65 6e 74 2d 64 69 73 70 6f 73 69 74 69 6f 6e 3d 61 74 74 61 63 68 6d 65 6e 74 3b 20 66 69 6c 65 6e 61 6d 65 3d 6c 61 74 65 73 74 64 6f 77 6e 6c 6f 61 64 2e 63 73 76 2e 67 7a</StringToSignBytes><RequestId>6D0CF27BCFA8F1B8</RequestId><HostId>SsWFsDplls+pBOt4kIJ6oaHmPd2u5pajDLqEAgUzxOJYC3V8ihWBw14IyJAUV7wQLlEjvUTeDQg=</HostId></Error>

tick-history-rest-api, aws

Figured out the problem was with the requests library. Upgrading it made everything work perfectly.
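
Since the fix turned out to be a library upgrade, comparing the installed library versions on a working and a failing machine is a quick first check. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is made up for illustration, and the thread does not say which versions were involved.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this on both the working and the failing machine and compare the output.
for name in ("requests", "urllib3"):
    print(name, installed_version(name))
```

If the versions differ, upgrading with pip install --upgrade requests on the failing machine is the obvious next step.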


@sgan208, thank you for letting us know :-)


@sgan208,

You do not have to manage the AWS key; it should be transparent to you. Here is the mechanism:

When you send the call to retrieve your data with the X-Direct-Download: true header, you receive a response with HTTP status 302 (redirect). The response contains a redirection URI in its Location header. It has this format:

https://s3.amazonaws.com/tickhistory.query.production.hdc-results/C15C8A6BBD824C5FB5BE39AD36BD9B16/data/merged/merged.csv.gz?AWSAccessKeyId=AKIAJVAI4XORJURKYMEA&Expires=1508181359&response-content-disposition=attachment%3B%20filename%3D_OnD_0x05e996379ebb3036.csv.gz&Signature=6TRD6XTbeesQ7165A3KF%2BDofgPs%3D&x-amz-request-payer=requester

This is a pre-signed URI: the AWS Access Key Id and the signature are included directly in the URI, so the client does not need any AWS credentials of its own.
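
To make the parts of such a URI visible, here is a small sketch that splits it into its object path and signing parameters using only the standard library. It assumes Python 3 (the code posted in this thread is Python 2), and the function name describe_signed_url is hypothetical. The Expires value is a Unix timestamp in seconds, which means the link only works for a limited time.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def describe_signed_url(url):
    """Split a signed S3 URL into its object path and signing parameters."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # parse_qs also percent-decodes the values
    return {
        "host": parts.netloc,
        "path": parts.path,
        "access_key_id": query["AWSAccessKeyId"][0],
        # Expires is seconds since the Unix epoch; convert it to a UTC datetime.
        "expires": datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc),
        "signature": query["Signature"][0],
    }


# URL copied from the example above.
example = (
    "https://s3.amazonaws.com/tickhistory.query.production.hdc-results/"
    "C15C8A6BBD824C5FB5BE39AD36BD9B16/data/merged/merged.csv.gz"
    "?AWSAccessKeyId=AKIAJVAI4XORJURKYMEA&Expires=1508181359"
    "&response-content-disposition=attachment%3B%20filename%3D_OnD_0x05e996379ebb3036.csv.gz"
    "&Signature=6TRD6XTbeesQ7165A3KF%2BDofgPs%3D&x-amz-request-payer=requester"
)
info = describe_signed_url(example)
print(info["access_key_id"], info["expires"].isoformat())
```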

Most HTTP clients automatically follow the redirection, which means you have nothing to do. A call is made in the background to this URI, and the data is retrieved. You can actually check if the redirection works by sniffing the traffic, with a tool like Fiddler or equivalent.
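
If a traffic sniffer is not at hand, the redirect can also be checked from inside Python. The sketch below assumes the requests library, whose Response objects keep the intermediate redirect responses in their history attribute; the helper name describe_redirects is made up for illustration, and the code is written for Python 3.

```python
def describe_redirects(response):
    """Return (status_code, url) pairs for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


# Usage with requests (not executed here, since it needs valid credentials):
#   r = requests.get(url, headers=requestHeaders, stream=True)
#   for status, hop_url in describe_redirects(r):
#       print(status, hop_url)
```

A healthy direct download would be expected to show a 302 from the API host followed by a final response from the AWS host.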

Some HTTP clients support and follow the redirection, but fail to connect to AWS, because they include the Authorization header in the request message redirected to AWS, which then returns a BadRequest status (400) with the error:

"Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified".

That is described in an advisory.

But the error you encountered is different.

If this generic description of the mechanism is not sufficient to solve your issue:

  • You might want to look at some of our samples, which were recently enhanced to use AWS. Under the downloads tab there are the .NET SDK Tutorials code (2, 4 and 5 use AWS), the Java samples (2 for immediate schedules and 2 for On Demand extractions) and a Python sample.
  • Alternatively, if you'd like us to help you further, we'd need to know exactly what workflow you followed, from the call to retrieve the data until the occurrence of the error. The code you use would also help.

Hi,

Here's the full code:

#!/bin/python
# coding: utf-8

import requests
import json
import shutil
import time
import sys


base_url = "https://hosted.datascopeapi.reuters.com/RestApi/v1/"
auth_req_url = base_url+"Authentication/RequestToken"
instrument_url = base_url + "Extractions/InstrumentLists"
get_scheduled_url = base_url + "Extractions/Schedules"
get_report_status_url = base_url + "Extractions/ReportExtractions"
download_extracted_url = base_url + "Extractions/ExtractedFiles"

requestHeaders={ "Prefer":"respond-async", "Content-Type":"application/json" }
credData = { 'Credentials' : { 'Username' : CENSORED, 'Password' : CENSORED} }

def getAuthToken( data = credData):
    r = requests.post(url = auth_req_url, json = data, headers = requestHeaders)
    if(r.status_code == 200):
        jsonResponse = json.loads(r.text.encode('ascii', 'ignore'))
        token = jsonResponse["value"]
        return token

def addToInstrumentList( token, listName, identifierList ):
    requestHeaders["Authorization"] = "token " + token
    url = instrument_url+"('" +listName + "')/ThomsonReuters.Dss.Api.Extractions.InstrumentListAppendIdentifiers"
    data = {"Identifiers": identifierList, "KeepDuplicates":False}
    print data
    r =requests.post(url, json=data, headers=requestHeaders )
    # if(r.status_code == 200):
    print r.status_code, r.text

def getLatestData( token, scheduleId ):
    requestHeaders["Authorization"] = "token " + token
    url = get_scheduled_url+ "('" + scheduleId + "')/LastExtraction"
    # print url
    r = requests.get(url, headers=requestHeaders)
    if(r.status_code==200):
        jsonResponse = json.loads(r.text.encode('ascii', 'ignore'))
        return jsonResponse["ReportExtractionId"]
    else:
        print "Can't get report id. Here's some debug info: ",r.status_code, r.text

def getReportFiles( token, extractionId ):
    requestHeaders["Authorization"] = "token " + token
    url = get_report_status_url + "('" + extractionId + "')/Files"
    r = requests.get( url, headers=requestHeaders )
    if(r.status_code == 200):
        jsonResponse = json.loads(r.text.encode('ascii', 'ignore'))["value"]
        fileTuple = {}
        fileTuple["notes"] = jsonResponse[0]["ExtractedFileId"]
        fileTuple["file"] = jsonResponse[1]["ExtractedFileId"]
        return fileTuple
    else:
        print "Can't get extracted files. Here's some debug info: ",r.status_code, r.text


def downloadReportFiles( token, fileId, outfile ):
    requestHeaders["Authorization"] = "token " + token
    requestHeaders["Accept-Encoding"] = "gzip"
    requestHeaders["Content-Type"] = "text/plain"
    requestHeaders["X-Direct-Download"] = "true"
    url = download_extracted_url + "('" + fileId + "')/$value"
    # print url, requestHeaders
    r = requests.get( url, headers=requestHeaders, stream=True )
    if(r.status_code == 302):
        print r
    r.raw.decode_content = False
    print r.status_code, r.headers["Content-Type"]#, r.headers["Content-Encoding"], r.headers["Content-Length"]
    fileName = outfile
    chunk_size = 1024
    rr = r.raw
    with open(fileName, 'wb') as fd:
        shutil.copyfileobj(rr, fd, chunk_size)

if __name__ == "__main__":
    token = getAuthToken()
    print "Token is: ",token
    scheduleId = CENSORED
    reportId = getLatestData(token, scheduleId)
    extractedFiles = getReportFiles(token, reportId)
    timestr = time.strftime("%Y%m%d-%H%M%S")
    outFileName = "download_"+timestr+".csv.gz"
    downloadReportFiles(token, extractedFiles["file"], outFileName)
    # notesFileName = "notes_"+timestr+".csv.gz"
    # downloadReportFiles(token, extractedFiles["notes"], notesFileName)

Also, the problem seems to be machine-specific: it works on some machines, but on others I get this 403 Forbidden error. Any idea why?


@sgan208, your code is OK; I have just tested it successfully.

Your last sentence makes me wonder: are you by any chance behind a firewall or proxy?


No, there's no firewall. The Python libraries may differ between the machines. I'll check and get back to you.


You can try to disable redirection in requests. For example:

requests.get(url, headers=requestHeaders, allow_redirects=False)

After that, the application will get the HTTP response with status code 302.

HTTP/1.1 302 Found
Cache-Control: no-cache
Date: Sun, 03 Sep 2017 10:34:17 GMT
Expires: -1
Location: https://s3.amazonaws.com/tickhistory.query.production.hdc-results/xxx/data/merged/merged.csv.gz?AWSAccessKeyId=xxx&Expires=1504456458&response-content-disposition=attachment%3B%20filename%3D_OnD_0x05dbb5f5a62b3016.csv.gz&Signature=xxx&x-amz-request-payer=requester

The Location header in the response contains the AWS URL for the download. The application then needs to send a second GET request to this AWS URL, without any of the headers used for the first call, to download the file.
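
The two requests can be put together as follows. This is only a sketch, written for Python 3 and assuming the requests library: the function name download_via_signed_url is hypothetical, and the session argument is assumed to be a requests.Session (the requests module itself also works, since requests.get takes the same arguments).

```python
import shutil


def download_via_signed_url(session, file_url, token, out_path):
    """Fetch the 302 from the API, then download the signed AWS URL without API headers."""
    api_headers = {
        "Authorization": "token " + token,
        "X-Direct-Download": "true",
    }
    # Step 1: call the API but do not follow the redirect automatically.
    first = session.get(file_url, headers=api_headers, allow_redirects=False)
    if first.status_code != 302:
        raise RuntimeError("Expected a 302 redirect, got %s" % first.status_code)
    signed_url = first.headers["Location"]

    # Step 2: request the signed URL with no custom headers at all,
    # so no Authorization or Content-Type header reaches AWS.
    second = session.get(signed_url, stream=True)
    second.raise_for_status()
    with open(out_path, "wb") as fd:
        shutil.copyfileobj(second.raw, fd)  # copy the raw response stream to disk
    return signed_url
```

A call would look like download_via_signed_url(requests.Session(), url, token, "download.csv.gz"), where url is the ExtractedFiles('...')/$value address built as in the code posted earlier in this thread.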
