How to make sure content length matches the downloaded files from Tick History API?

I am looking at an extraction request based upon:

{
  "ExtractionRequest": {
    "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.TickHistoryRawExtractionRequest",
    "IdentifierList": {
      "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
      "InstrumentIdentifiers": [
        { "Identifier": "ESH25", "IdentifierType": "Ric" },
        { "Identifier": "ESM25", "IdentifierType": "Ric" },
        { "Identifier": "ESU25", "IdentifierType": "Ric" }
      ],
      "ValidationOptions": { "AllowHistoricalInstruments": true },
      "UseUserPreferencesForValidationOptions": false
    },
    "Condition": {
      "MessageTimeStampIn": "LocalExchangeTime",
      "ReportDateRangeType": "Range",
      "QueryStartDate": "2025-01-16T17:00:00-06:00",
      "QueryEndDate": "2025-01-17T16:00:00-06:00",
      "DisplaySourceRIC": true
    }
  }
}

I then download the extracted file using the job ID. The headers for the download request are:

{
"Content-Type": "application/json",
"Accept-Encoding": "gzip"
}

Is this the correct header? Or should Accept-Encoding go in the original extraction request?

So I am expecting a gzip file, is that correct?

The value of:

int(response.headers['Content-Length'])

does not match the actual file size. Do you know why?

I am using "Content-Length" to validate what has been downloaded. If the file has been downloaded correctly, its size should in theory match the content length. Can you confirm whether my understanding is correct?
An alternative is to use "Content-MD5", but this appears to be missing from the response headers. Do you know how I can obtain it, so that I can validate against the checksum?

Gurpreet

Hello @prasad.reddy01

The Content-Length in the response header is the number of bytes a receiver should expect to receive on the stream. If you count the size of the data as it is read off the stream in your code, you will see that it matches the Content-Length header exactly.

Once the data is written to the file, the file size would be different. See this answer for an explanation.

The response for the final extraction result is a gzip encoded stream, so your accept header is ok - but you can also leave it to accept all.

Accept: */*

There is no MD5 checksum provided for raw extraction results. MD5 is only available in the Venue-By-Day (VBD) file downloads.
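As a possible substitute for the missing MD5, and this is a suggestion rather than something the API documents, the gzip format itself carries a CRC-32 and length trailer for each member. If the download was saved as the raw compressed bytes, decompressing it end to end will fail when the file is truncated or corrupted. A minimal sketch, in which the function name `gzip_is_intact` and the demo file names are hypothetical:

```python
import gzip
import os
import tempfile
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if `path` decompresses fully without a gzip error."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read to the end so the CRC-32 trailer gets checked.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Bad header or CRC (OSError), truncation (EOFError),
        # or a damaged deflate stream (zlib.error).
        return False
    return True


def demo():
    """Offline demo on made-up data, not a real extraction."""
    data = gzip.compress(b"ESH25,2025-01-16,trade\n" * 1000)
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.csv.gz")
        bad = os.path.join(tmp, "truncated.csv.gz")
        with open(good, "wb") as fh:
            fh.write(data)
        with open(bad, "wb") as fh:
            fh.write(data[:-10])  # simulate an interrupted download
        return gzip_is_intact(good), gzip_is_intact(bad)
```

This complements the Content-Length comparison instead of replacing it: the length check confirms that every byte arrived, and the decompression pass confirms that those bytes form a valid gzip stream.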

marsman

Hello Gurpreet

I can see:

'Content-Length': '244279206', 'Content-Type': 'text/plain', 'Content-Encoding': 'gzip',

But summing the bytes of the response gives a total much larger than the content length:

total_bytes = 0
for chunk in response.iter_content(chunk_size=chunk_size):
    if chunk:  # skip empty chunks
        total_bytes += len(chunk)  # accumulate the size of each chunk

i.e. total_bytes is much larger than Content-Length.

Even though the Content-Encoding says gzip, the content I receive is uncompressed. Does the Content-Length refer to the compressed bytes?

marsman

Actually, iter_content decompresses the content. Looking at the raw byte stream instead, the byte count matches the Content-Length.
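For anyone landing here later, the raw-stream approach can be sketched roughly as below. It assumes the `requests` library, where `response.raw` (with `stream=True`) exposes the body without undoing the gzip Content-Encoding. The helper name `save_raw_stream` and the `url`, `headers` and file names are placeholders, not part of the API. The demo runs offline against an in-memory stream.

```python
import io
import os
import tempfile


def save_raw_stream(raw, path, chunk_size=1024 * 1024):
    """Copy bytes from the file-like `raw` to `path` unchanged.

    Returns the number of bytes written, to compare with Content-Length.
    """
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = raw.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Hypothetical usage with requests (url and headers are placeholders):
#   response = requests.get(url, headers=headers, stream=True)
#   expected = int(response.headers["Content-Length"])
#   written = save_raw_stream(response.raw, "extract.csv.gz")
#   assert written == expected == os.path.getsize("extract.csv.gz")


def demo():
    """Offline stand-in: an in-memory stream plays the role of response.raw."""
    payload = bytes(1000)  # made-up bytes, not a real extraction
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "extract.csv.gz")
        written = save_raw_stream(io.BytesIO(payload), path, chunk_size=64)
        return written, os.path.getsize(path), len(payload)
```

Saved this way, the file on disk is the compressed stream, so its size should equal Content-Length and it can be opened later with any gzip reader.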
