Is there some header we need to set, as we might have triggered some kind of recompression during download?

Hi all.


We are consuming the https://selectapi.datascope.refinitiv.com RestApi/v1.

Methods I am calling are:

- [1] StandardExtractions/UserPackages

- [2] StandardExtractions/UserPackageDeliveries

- [3] StandardExtractions/UserPackageDeliveryGetUserPackageDeliveriesByPackageId


The process works: we get the file list and are able to download files. The issue I am facing is that the MD5 checksum and file size received from the API [1] do not match the MD5 checksum or file size of the file I actually download.

For example, this is what we receive from API:

file: AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz

package: 0x08e342ecb3b9c483

md5: f805ce4d58892c7c097a6486c002452a

file_size: 1104824











But the actual downloaded file has md5sum equal to b31566ff09092d14480389c276a2729b and its filesize is 1028387.










If I decompress the archive, the md5sum is 1c4bf6cfe0207081e84082761ef68f09


My only suspicion is that there is some header we need to set, as we might have triggered some kind of recompression during download?
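To illustrate why recompression would change the checksum, here is a minimal standard-library sketch. The payload and compression settings are made up for the demonstration, and it does not establish what actually happened to the file in this thread. It only shows that two gzip archives of the same data can have different MD5 values while their decompressed content is identical.

```python
import gzip
import hashlib

payload = b"RIC,SEDOL\n" * 1000  # made-up sample data, not the real file

# Two gzip archives of the same payload, produced with different settings
# (compression level and the timestamp stored in the gzip header).
archive_a = gzip.compress(payload, compresslevel=9, mtime=0)
archive_b = gzip.compress(payload, compresslevel=1, mtime=1)

md5_a = hashlib.md5(archive_a).hexdigest()
md5_b = hashlib.md5(archive_b).hexdigest()

# The archives differ byte for byte, but they decompress to the same content.
same_archive = md5_a == md5_b
same_content = gzip.decompress(archive_a) == gzip.decompress(archive_b)
print(same_archive, same_content)
```

A checksum published for a .gz file therefore only matches if the bytes are stored exactly as the server sent them.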

If it matters, I am using the Python programming language and the module I use to query / download is requests.


I checked https://developers.lseg.com/content/dam/devportal/api-families/thomson-reuters-tick-history-trth/thomson-reuters-tick-history-trth-rest-api/documentation/tick_hist_rest_api-guide_november2019.pdf and the answer was not there :)


Here is the minimum code example (username and password omitted):


import requests
import os
import json
import hashlib

URL = "https://selectapi.datascope.refinitiv.com"
API = "RestApi/v1"

def uri(path):
    return os.path.join(URL, API, path)

def download_file(path, output_stream):
    payload = {"Credentials": {"Username": username, "Password": password}}
    headers = {"Content-type": "application/json"}
    token = requests.post(uri("Authentication/RequestToken"), headers=headers, data=json.dumps(payload)).json()["value"]
    auth_header = {"Authorization": f"Token {token}"}

    md5 = hashlib.md5()
    file_size = 0
    for chunk in requests.get(uri(path), stream=True, headers=auth_header).iter_content(chunk_size=8192):
        output_stream.write(chunk)
        md5.update(chunk)
        file_size += len(chunk)

    return (md5.hexdigest(), file_size)

with open("/tmp/foo", "wb") as f:
    path = "StandardExtractions/UserPackageDeliveries('0x08e342ecb3b9c483')/$value?fn=AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz"
    ret = download_file(path, f)
    print(ret)



@piotr.wroblewski

Hi,

Thank you for your participation in the forum.

Are any of the replies below satisfactory in resolving your query?

If yes please click the 'Accept' text next to the most appropriate reply, and then close the question. This will guide all community members who have a similar question.

Otherwise please post again offering further insight into your question.

Thanks,

AHS

Please be informed that a reply has been verified as correct in answering the question, and has been marked as such.

Thanks,


AHS



Hi @piotr.wroblewski,

The Python requests module will try to decompress the response on the fly, and this can fail for large files. The advice in this case is to download the raw file first and then perform actions such as computing the MD5 or unzipping. Here is the code that I used, along with the results for your file:

dResp = requests.get(url, headers=hdrs, stream=True)
# do not auto decompress the data
dResp.raw.decode_content = False

chunkSize = 1024*1024

with open(fileName, 'wb') as f:
  # iter_content() always asks urllib3 to decode the body, so read the raw stream directly
  for chunk in dResp.raw.stream(chunkSize, decode_content=False):
    if chunk:
      f.write(chunk)

The result:

> ls -la *.gz
-rwx------+ 1 xxxxx None 1104824 Apr 10 10:02 AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz

> certutil -hashfile AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz MD5
MD5 hash of AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz:
f805ce4d58892c7c097a6486c002452a

The file size and the MD5 match the originally published values.
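The same check done above with ls and certutil can also be done in Python. The following is a minimal sketch using only the standard library; the function names md5_and_size and matches_published are hypothetical helpers, not part of the DataScope Select API.

```python
import hashlib
import os


def md5_and_size(path, chunk_size=1024 * 1024):
    """Return (md5 hex digest, size in bytes) of the file at `path`."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest(), os.path.getsize(path)


def matches_published(path, published_md5, published_size):
    """Compare a local file against the MD5 and size reported by the API."""
    actual_md5, actual_size = md5_and_size(path)
    return actual_md5 == published_md5.lower() and actual_size == published_size
```

For the file in this thread, the call would be matches_published("AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz", "f805ce4d58892c7c097a6486c002452a", 1104824).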


In addition, you can download the file directly from AWS for faster downloads. It should not affect the file size or MD5 hash. Add this to your request headers:

'X-Direct-Download': 'true'
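As a small sketch of how that header might be combined with the token header used in the question: build_download_headers is a hypothetical helper name, and only the "Authorization: Token ..." format and the X-Direct-Download header come from this thread.

```python
def build_download_headers(token, direct_download=True):
    """Build request headers for a file download (hypothetical helper)."""
    # Token format as used in the question's code.
    headers = {"Authorization": f"Token {token}"}
    if direct_download:
        # Header suggested above to download directly from AWS.
        headers["X-Direct-Download"] = "true"
    return headers
```

The resulting dictionary would then be passed as the headers argument of requests.get, together with stream=True.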

Hi @Gurpreet Many thanks for your guidance. Appreciate it :)


The client managed to make it work by using:

resp.raw.stream(1024*1024, decode_content=False):

instead of:

resp.iter_content(chunk_size=1024*1024):
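Put together with the MD5 and size bookkeeping from the question, the change might look like the sketch below. The helper name save_raw_response is hypothetical, and resp is assumed to be a requests.Response obtained with stream=True. Reading through resp.raw.stream with decode_content=False keeps the bytes exactly as the server sent them.

```python
import hashlib


def save_raw_response(resp, output_stream, chunk_size=1024 * 1024):
    """Write the undecoded response body to output_stream.

    `resp` is assumed to be a requests.Response obtained with stream=True.
    Returns (md5 hex digest, number of bytes written).
    """
    md5 = hashlib.md5()
    size = 0
    # decode_content=False disables on-the-fly decompression of the body.
    for chunk in resp.raw.stream(chunk_size, decode_content=False):
        output_stream.write(chunk)
        md5.update(chunk)
        size += len(chunk)
    return md5.hexdigest(), size
```

The returned pair can then be compared directly with the md5 and file_size values reported by the API.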
