How to make sure content length matches the downloaded files from Tick History API?

prasad.reddy01
edited February 17 in TREP APIs

I am looking at an extraction request based upon:

{
  "ExtractionRequest": {
    "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.TickHistoryRawExtractionRequest",
    "IdentifierList": {
      "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
      "InstrumentIdentifiers": [
        {
          "Identifier": "ESH25",
          "IdentifierType": "Ric"
        },
        {
          "Identifier": "ESM25",
          "IdentifierType": "Ric"
        },
        {
          "Identifier": "ESU25",
          "IdentifierType": "Ric"
        }
      ],
      "ValidationOptions": {
        "AllowHistoricalInstruments": true
      },
      "UseUserPreferencesForValidationOptions": false
    },
    "Condition": {
      "MessageTimeStampIn": "LocalExchangeTime",
      "ReportDateRangeType": "Range",
      "QueryStartDate": "2025-01-16T17:00:00-06:00",
      "QueryEndDate": "2025-01-17T16:00:00-06:00",
      "DisplaySourceRIC": true
    }
  }
}

I then download the extracted file using the job ID. The headers for that download request are:

{
  "Content-Type": "application/json",
  "Accept-Encoding": "gzip"
}

Is this the correct header? Or should Accept-Encoding go in the original extraction request?

So I am expecting a gzip file, is that correct?

The value of:

int(response.headers['Content-Length'])

does not match the actual file size. Do you know why?

I am using the "Content-Length" header to validate what has been downloaded. If the file has been downloaded correctly, its size should in theory match the content length. Can you confirm whether my understanding is correct?
An alternative is to use "Content-MD5", but this appears to be missing from the response headers. Do you know how I can obtain it, so that I can validate against the checksum?

Answers

  • Hello @prasad.reddy01

    The Content-Length response header is the number of bytes that a receiver should expect to receive on the stream. If you count the bytes as they are read in your code, the total should match the Content-Length header exactly.

    Once the data is written to the file, the file size would be different. See this answer for an explanation.

    The response for the final extraction result is a gzip-encoded stream, so your Accept-Encoding header is fine, but you can also leave the request open to accept everything:

    Accept: */*

    There is no MD5 checksum provided for raw extraction results. MD5 is only available in the Venue-By-Day (VBD) file downloads.
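
To illustrate why a size measured after decoding can differ from Content-Length, here is a minimal sketch that uses only the Python standard library and no network call. The payload row is made up for illustration and is not real Tick History output. The point is only that the compressed size (what a server would report as Content-Length for a gzip-encoded body) and the decoded size are two different numbers.

```python
import gzip

# Made-up, repetitive CSV-like row standing in for an extraction result.
line = b"ESH25,2025-01-16T17:00:00-06:00,Trade,1\n"
payload = line * 10_000

wire_bytes = gzip.compress(payload)          # bytes that would travel over HTTP
decoded_bytes = gzip.decompress(wire_bytes)  # bytes a decoding client hands back

# For a gzip-encoded body, Content-Length counts the compressed bytes.
content_length = len(wire_bytes)

print("Content-Length (compressed):", content_length)
print("Decoded size:", len(decoded_bytes))
```

A check against Content-Length is therefore only meaningful if it counts the bytes before any decompression takes place.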

  • marsman (Newcomer)

    Hello Gurpreet

    I can see:

    'Content-Length': '244279206', 'Content-Type': 'text/plain', 'Content-Encoding': 'gzip',

    But the sum of the bytes read from the response is much larger than the Content-Length:

    total_bytes = 0
    # chunk_size is set earlier; keep it separate from the per-chunk length
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:
            total_bytes += len(chunk)

    i.e. total_bytes is much larger than Content-Length.

    Even though Content-Encoding says gzip, the content I read is uncompressed. Does Content-Length refer to the compressed bytes?

  • marsman (Newcomer)

    Actually, iter_content decompresses the content. Looking at the raw byte stream instead, its size matches the Content-Length.
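
The conclusion of the thread can be sketched roughly as follows. This is an illustration under assumptions, not the API's documented method: `save_raw` and `gzip_is_intact` are hypothetical helper names, and `response` is assumed to be a requests response obtained with `stream=True`, so that `response.raw` is the underlying urllib3 stream. Reading that stream with `decode_content=False` keeps the bytes compressed, which makes the count comparable to Content-Length. Since no MD5 is provided for raw extractions, the second helper uses the CRC and length that the gzip format itself carries as a substitute integrity check.

```python
import gzip
import io
import zlib


def save_raw(response, out, chunk_size=1024 * 1024):
    """Copy the undecoded body to `out`; return (bytes_written, expected)."""
    expected = int(response.headers["Content-Length"])
    written = 0
    # decode_content=False stops urllib3 from gunzipping the stream,
    # so the bytes counted here are the bytes that were sent on the wire.
    for chunk in response.raw.stream(chunk_size, decode_content=False):
        out.write(chunk)
        written += len(chunk)
    return written, expected


def gzip_is_intact(data):
    """Return True if `data` decompresses cleanly (gzip CRC/length check)."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
            # Read to the end so the trailer of every gzip member is verified.
            while gz.read(1024 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header or CRC, truncated stream, or corrupt deflate data.
        return False
```

With requests, `save_raw` would be called on the result of a `requests.get(..., stream=True)` call, with `out` being a file opened in binary write mode. If `written` equals `expected`, the saved file is the gzip stream as sent, and it can be decompressed later.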
