How do I extract tick history data for a set of non-consecutive dates, rather than a range of dates?

I am not using any API as of now. I am here to understand if an API can solve my issue.

I am using Windows.

I need to extract tick history data for a specific time window and for specific non-consecutive dates. E.g. I want to extract some fields of CLc1 between 10:00 and 11:00 ET for 11/12/2022, 01/03/2023, and 23/06/2023.

I have 800 of such dates, and the only solutions I have found so far is to create a report template for each one of the 800 dates and manually download the data.

Is there a way I can use an API to automatise the extraction?

How do I get started?

Downloading the data for all days and then discarding the ones I do not need is not a solution, as I have memory constraints.

My desired output is a csv file for each date.

Thanks!

Find more posts tagged with

tick-history-rest-api

tick-data

#product

Accepted answers

Gurpreet

Hi @riccardo.degasperi,

There is no means to specify multiple date ranges in a single request. Your workflow can definitely be simplified by using the REST API - this will save you the hassle of creating hundreds of report templates.

Your application code will create a JSON request message and invoke it multiple times, with a different date range. You might also be able to send multiple concurrent requests (within limit) to speed up the retrieval process.

If you use Python for example, something along the lines of this should work:

def requestData(accessToken, instrument, startDate, endDate):
  hdrs = {
    "Authorization": "Token " + accessToken,
    "Prefer": "respond-async, wait=10",
    "Content-Type": "application/json; odata=minimalmetadata"
  }

  body = {
    "ExtractionRequest": {
      "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.TickHistoryRawExtractionRequest",
      "ContentFieldNames": [
      ],
      "IdentifierList": {
        "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
        "InstrumentIdentifiers": [
          {
            "Identifier": instrument,
            "IdentifierType": "Ric"
          }
        ],
        "ValidationOptions": None,
        "UseUserPreferencesForValidationOptions": False
      },
      "Condition": {
        "MessageTimeStampIn": "GmtUtc",
        "ReportDateRangeType": "Range",
        "QueryStartDate": startDate,
        "QueryEndDate": endDate,
        "DisplaySourceRIC": True
      }
    }
  }

  return sess.post(url, data = json.dumps(body), headers = hdrs)


for <<your range of dates>>:
  requestData(<<token>>, "IBM.N", sDate, eDate)

All comments

Gurpreet

Hi @riccardo.degasperi,

If you use Python for example, something along the lines of this should work:

def requestData(accessToken, instrument, startDate, endDate):
  hdrs = {
    "Authorization": "Token " + accessToken,
    "Prefer": "respond-async, wait=10",
    "Content-Type": "application/json; odata=minimalmetadata"
  }

  body = {
    "ExtractionRequest": {
      "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.TickHistoryRawExtractionRequest",
      "ContentFieldNames": [
      ],
      "IdentifierList": {
        "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
        "InstrumentIdentifiers": [
          {
            "Identifier": instrument,
            "IdentifierType": "Ric"
          }
        ],
        "ValidationOptions": None,
        "UseUserPreferencesForValidationOptions": False
      },
      "Condition": {
        "MessageTimeStampIn": "GmtUtc",
        "ReportDateRangeType": "Range",
        "QueryStartDate": startDate,
        "QueryEndDate": endDate,
        "DisplaySourceRIC": True
      }
    }
  }

  return sess.post(url, data = json.dumps(body), headers = hdrs)


for <<your range of dates>>:
  requestData(<<token>>, "IBM.N", sDate, eDate)

Lukasz Ossowski

On top of Gurpreet's answer, please find a snapshot from the Graphical User Interface of RTH solution, where you can apply specific time window for your extractions. Data will be retrieved within that window for a string of dates only:

Jirapongse

@riccardo.degasperi

You need to group dates together to create a set of consecutive dates to reduce the number of requests. Some Tick History report templates support the TimeRangeMode property. You can set it to Window to create a report on the specified time window on each day within the date range, as mentioned by @Lukasz Ossowski.

riccardo.degasperi

Hi @Gurpreet,

Thanks! I like the idea of extracting the data in a for loop.

I'll try it out.

I'm not sure I understand @Jirapongse's point to group dates together. Do you mean there could be an issue if I run a for loop over too many dates?

Thanks!

Gurpreet

No, the window idea is that if you are only interested in extracting data for say 1 hour every day, then you specify the window. I doubt it will be applicable in your use case, since you are requesting non-contingent date range.

Jirapongse

For example, if you would like to request data on 1 Jun 2023, 6 Jun 2023, and 8 Jun 2023. You need to send three requests for that. However, you can send the request from 1 Jun 2023 to 8 Jun 2023, as one request. Then, filter out the unwanted data. This can reduce the number of requests.

riccardo.degasperi

I managed to do this in R. Below an example.

# Install packages
#install.packages('jsonlite','httr','readr')

# Set working directory
#setwd()

# Load libraries
library(jsonlite)
library(httr)
library(read)

# Load functions
source("Rfunctions.R")
source("utilFunctions.R")

cacheEnv <- new.env()

# Get token
un <- "username"
pw <- "password"
token <- RTHLogin(un,pw)

# Select target dates
startDates <- c("2017-09-29T00:00:00.000Z","2017-09-30T00:00:00.000Z")
endDates <- c("2017-09-29T12:00:00.000Z","2017-09-30T12:00:00.000Z")

# Define instrument list and report template settings
n <- length(startDates)
for (i in 1:n) {

   b <- paste('{
      "ExtractionRequest": {
        "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
        "ContentFieldNames": [
          "Trade - Price",
          "Trade - Volume",
          "Trade - Exchange Time"
        ],
        "IdentifierList": {
          "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
          "InstrumentIdentifiers": [
            {
              "Identifier": "CLc1",
              "IdentifierType": "Ric"
            },
            {
              "Identifier": "FFc1",
              "IdentifierType": "Ric"
            }]
        },
        "Condition": {
          "MessageTimeStampIn": "GmtUtc",
          "ReportDateRangeType": "Range",
          "QueryStartDate": "',startDates[i],'",
          "QueryEndDate": "',endDates[i],'"
        }
      }
    }', sep = "")
   
   # Define filename
   tmp <- substr(startDates[i], 1, 10)
   filename <- paste('./',tmp,'.csv.gz',sep = "")
   
   # Submit request
   murl <- RTHExtractRaw(b,filename,TRUE)
   
   # Check request status & get extraction results
   check <- 202
   while (check == 202){
     wait(30)
     out <- RTHCheckRequestStatus(murl,filename,TRUE)
     check <- out$status_code
   }
}

The functions are:

#' Pauses the execution for x seconds
wait <- function(x)
{
  p1 <- proc.time()
  Sys.sleep(x)
  proc.time() - p1 # The cpu usage should be negligible
}

#' Request authentication token
#' @param uname DSS username
#' @param pword DSS password
#' @return An authentication token that must be applied to all requests
RTHLogin <- function(uname,pword) {
url <- "https://selectapi.datascope.refinitiv.com/RestApi/v1/Authentication/RequestToken"
  b <- list(Credentials=list(Username=jsonlite::unbox(uname),Password=jsonlite::unbox(pword)))
  r <- httr::POST(url,add_headers(prefer = "respond-async"),content_type_json(),body = b,encode = "json")
  stop_for_status(r)
  a <- httr::content(r, "parsed", "application/json", encoding="UTF-8")
  token <- paste('Token',a[[2]],sep=" ")
  assign("token",token,envir = cacheEnv)
  return(token)
}

#' Performs an on demand extraction returning the raw results as a stream if the response is available in a short amount of time,
#' otherwise the server accepted the extracting and response with a monitor URL.
#' In the later case, You must poll the extraction status with RTHCheckRequestStatus.
#'
#' The result format is the native/raw result from the underlying extractor (usually csv).
#' @param b JSON request body. See REST API Reference Tree for format.
#' @param path Path to content to.
#' @param overwrite Will only overwrite existing path if TRUE.
RTHExtractRaw <- function(b,path,overwrite = FALSE) {
url <- "https://selectapi.datascope.refinitiv.com/RestApi/v1/Extractions/ExtractRaw"
  token <- get("token",envir = cacheEnv)
  r <- httr::POST(url,add_headers(prefer = "respond-async",Authorization = token),content_type_json(),body = b,encode = "json")
  if (status_code(r) == 202) {
    message("The request has been accepted but has not yet completed executing asynchronously.\r\nReturn monitor URL\r\n",r$headers$location)
    return(invisible(r$headers$location))
  } else if(status_code(r) == 200) {
    a<-content(r, "parsed", "application/json", encoding="UTF-8")
    message(a$Notes)
    return(RTHRawExtractionResults(a$JobId,path,overwrite))
  } else {
    warn_for_status(r)
    a<-content(r, "parsed", "application/json", encoding="UTF-8")
    return(a)
  }
}

#' Polling the extraction status.
#' On Demand extraction requests are executed as soon as possible.
#' However, There is no guarantee on the delivery time.
#' If the previous request returned a monitor URL, RTHCheckRequestStatus must be executed until it returns the result.
#' @param location The monitor URL.
#' @param path Path to content to.
#' @param overwrite Will only overwrite existing path if TRUE.
RTHCheckRequestStatus <- function(location,path,overwrite = FALSE) {
  token <- get("token",envir = cacheEnv)
  r <- GET(location,add_headers(prefer = "respond-async",Authorization = token))
  if (status_code(r) == 202) {
    message("The request has not yet completed executing asynchronously.\r\nPlease wait a bit and check the request status again.\r\n")
    return(invisible(r))
  } else if(status_code(r) == 200) {
    a<-content(r, "parsed", "application/json", encoding="UTF-8")
    message(a$Notes)
    return(RTHRawExtractionResults(a$JobId,path,overwrite))
  } else {
    warn_for_status(r)
    a<-content(r, "parsed", "application/json", encoding="UTF-8")
    return(a)
  }
}

RTHRawExtractionResults <- function(jobId,path,overwrite = TRUE) {
url <- paste0("https://selectapi.datascope.refinitiv.com/RestApi/v1/Extractions/RawExtractionResults('",jobId,"')/$value")
  token <- get("token",envir = cacheEnv)
  r <- GET(url,add_headers(prefer = "respond-async",Authorization = token),config(http_content_decoding=0),write_disk(path,overwrite),progress())
  stop_for_status(r)
  return(r)
}

Hope this helps.

EXPLORE OUR SITES