...ing download?
Hi all.
We are consuming the REST API at https://selectapi.datascope.refinitiv.com (RestApi/v1).
The methods I am calling are:
- [1] StandardExtractions/UserPackages
- [2] StandardExtractions/UserPackageDeliveries
- [3] StandardExtractions/UserPackageDeliveryGetUserPackageDeliveriesByPackageId
The process itself works: we get the file list and can download the files. The issue is that the MD5 checksum and file size received from the API [1] do not match the MD5 checksum or file size of the file I actually download.
For example, this is what we receive from the API:
file: AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz
package: 0x08e342ecb3b9c483
md5: f805ce4d58892c7c097a6486c002452a
file_size: 1104824
But the downloaded file has an md5sum of b31566ff09092d14480389c276a2729b and a file size of 1028387.
If I decompress the archive, the md5sum of the decompressed content is 1c4bf6cfe0207081e84082761ef68f09.
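For anyone who wants to reproduce the three numbers above on a local copy, here is a minimal sketch using only the standard library. The helper name describe_gzip_file and the dictionary keys are made up for illustration; they are not part of the API.

```python
import gzip
import hashlib
import os


def md5_of_chunks(chunks):
    """Return the hex MD5 of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def describe_gzip_file(path, chunk_size=8192):
    """Report the size and MD5 of the archive as stored on disk,
    plus the MD5 of its decompressed content."""

    def read_chunks(handle):
        # Yield fixed-size chunks until read() returns empty bytes.
        return iter(lambda: handle.read(chunk_size), b"")

    with open(path, "rb") as archive:
        archive_md5 = md5_of_chunks(read_chunks(archive))
    with gzip.open(path, "rb") as content:
        content_md5 = md5_of_chunks(read_chunks(content))
    return {
        "size": os.path.getsize(path),
        "archive_md5": archive_md5,
        "content_md5": content_md5,
    }
```

Calling describe_gzip_file("/tmp/foo") on the downloaded file should give the same archive checksum and size as md5sum and ls, and the same content checksum as md5sum on the decompressed file.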
My only suspicion is that we are missing a request header and that the file is being recompressed somewhere during the download. Could that be the case?
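To illustrate why recompression would explain a mismatch like this one, the sketch below compresses the same bytes twice at different gzip levels. The payload is a made-up stand-in, not data from the API, and nothing here shows that the server or the client actually recompresses anything.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


# Made-up, compressible stand-in payload (not real API data).
payload = b"col_a,col_b,col_c\n" * 1000

# Two gzip encodings of the same payload at different compression levels.
# mtime=0 pins the header timestamp so only the level differs.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)

print("archive md5 (level 1):", md5_hex(fast), "size:", len(fast))
print("archive md5 (level 9):", md5_hex(best), "size:", len(best))
print("same decompressed content:", gzip.decompress(fast) == gzip.decompress(best))
```

If recompression were the cause, the archive checksum and size would change while the decompressed content stayed identical. That would be consistent with the reported values describing a different encoding of the same file.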
If it matters, I am using the Python programming language and the module I use to query / download is requests.
I checked https://developers.lseg.com/content/dam/devportal/api-families/thomson-reuters-tick-history-trth/thomson-reuters-tick-history-trth-rest-api/documentation/tick_hist_rest_api-guide_november2019.pdf but did not find the answer there.
Here is a minimal code example (username and password omitted):
import hashlib
import json

import requests

URL = "https://selectapi.datascope.refinitiv.com"
API = "RestApi/v1"

# username and password are omitted here; they must be defined before download_file() is called.

def uri(path):
    # Join with forward slashes; os.path.join would use backslashes on Windows.
    return f"{URL}/{API}/{path}"

def download_file(path, output_stream):
    # Request an authentication token.
    payload = {"Credentials": {"Username": username, "Password": password}}
    headers = {"Content-type": "application/json"}
    token = requests.post(uri("Authentication/RequestToken"), headers=headers, data=json.dumps(payload)).json()["value"]
    auth_header = {"Authorization": f"Token {token}"}
    # Stream the file to output_stream, tracking its MD5 and size as received.
    md5 = hashlib.md5()
    file_size = 0
    for chunk in requests.get(uri(path), stream=True, headers=auth_header).iter_content(chunk_size=8192):
        output_stream.write(chunk)
        md5.update(chunk)
        file_size += len(chunk)
    return (md5.hexdigest(), file_size)

with open("/tmp/foo", "wb") as f:
    path = "StandardExtractions/UserPackageDeliveries('0x08e342ecb3b9c483')/$value?fn=AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz"
    ret = download_file(path, f)
print(ret)
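One diagnostic that might narrow this down is to hash the response body exactly as it arrives, before any client-side decoding. To my understanding, iter_content in requests decodes a gzip or deflate Content-Encoding transparently, while reading from Response.raw (with stream=True) returns the undecoded bytes. Treat that as an assumption to verify against the requests and urllib3 documentation. The helpers below are hypothetical names and work on any binary file-like object, so the sketch runs offline with io.BytesIO.

```python
import hashlib
import io


def hash_raw_stream(raw, chunk_size=8192):
    """Read a binary file-like object to the end and return (md5_hex, size)."""
    digest = hashlib.md5()
    size = 0
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


def matches_reported(actual, reported_md5, reported_size):
    """Compare a (md5_hex, size) pair with the values reported by the API."""
    md5_hex, size = actual
    return md5_hex == reported_md5.lower() and size == reported_size


# Offline demonstration with stand-in bytes instead of a real HTTP response.
sample = b"\x1f\x8b" + b"\x00" * 100
result = hash_raw_stream(io.BytesIO(sample))
print(result, matches_reported(result, hashlib.md5(sample).hexdigest(), len(sample)))

# Hypothetical use with requests (not executed here):
#   resp = requests.get(uri(path), stream=True, headers=auth_header)
#   print(resp.headers.get("Content-Encoding"))
#   print(hash_raw_stream(resp.raw))
```

If the checksum of the raw bytes matched the reported one while the iter_content checksum did not, that would point at client-side decoding rather than at the server.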