Question by duckman · Nov 19, 2018 at 07:12 AM · Tags: tick history, Download, tickhistoryv2, intraday extraction, depth

Download market depth day by day

I am trying to download depth data from Tick History Market Depth / Legacy Market Depth using the REST API, following this post.

As you already know, depth data is quite large, so I would like to save the files by date, i.e. instead of having one giant file, I would have one file for each day that contains depth data for all stocks in the list. I would also love to separate by stock and then by date as well, but by date is fine for now.

How would I go about this? The following is my code. I use Python 3.6 with PyCharm, by the way, and I am not really good with Python; I normally use SAS.

1. The JSON file that specifies the fields to download

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryMarketDepthExtractionRequest",
    "ContentFieldNames": [
      "Ask Price",
      "Ask Size",
      "Bid Price",
      "Bid Size"
    ],
    "IdentifierList": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentListIdentifierList",
        "InstrumentListId":"0x06698c5d00301db4"
     },
    "Condition": {
      "View": "NormalizedLL2",
      "NumberOfLevels": 5,
      "MessageTimeStampIn": "GmtUtc",
      "ReportDateRangeType": "Range",
      "QueryStartDate": "1996-01-01T00:00:00.000Z",
      "QueryEndDate": "2018-06-06T23:59:59.999Z",
      "DisplaySourceRIC": "True"
    }
  }
}

2. The code that sends the request and retrieves the data:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
from json import dumps, loads, load
from requests import post
from requests import get
from time import sleep
from getpass import _raw_input as input
from getpass import getpass
from getpass import GetPassWarning
from collections import OrderedDict
import os
import gzip
import pandas as pd




_outputFilePath="./"
_outputFileName="TestOutput"
_retryInterval=int(30) # value in seconds used by the polling loop to check the request status on the server
_jsonFileName="TickHistoricalRequest.json"


def RequestNewToken(username="",password=""):
    _AuthenURL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Authentication/RequestToken"
    _header= {}
    _header['Prefer']='respond-async'
    _header['Content-Type']='application/json; odata.metadata=minimal'
    _data={'Credentials':{
        'Password':password,
        'Username':username
        }
    }

    print("Send Login request")
    resp=post(_AuthenURL,json=_data,headers=_header)


    if resp.status_code!=200:
        message="Authentication Error Status Code: "+ str(resp.status_code) +" Message:"+dumps(loads(resp.text),indent=4)
        raise Exception(str(message))


    return loads(resp.text)['value']


def ExtractRaw(token,json_payload):
    try:
        _extractRawURL="https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw"
        #Setup Request Header
        _header={}
        _header['Prefer']='respond-async'
        _header['Content-Type']='application/json; odata.metadata=minimal'
        _header['Accept-Charset']='UTF-8'
        _header['Authorization']='Token '+token  # note the space: the API expects "Token <token>"

        #Post Http Request to DSS server using extract raw URL
        resp=post(_extractRawURL,data=None,json=json_payload,headers=_header)


        #Print Status Code return from HTTP Response
        print("Status Code="+str(resp.status_code) )


        #Raise exception with error message if the returned status is not 202 (Accepted) or 200 (Ok)
        if resp.status_code!=200:
            if resp.status_code!=202:
                message="Error: Status Code:"+str(resp.status_code)+" Message:"+resp.text
                raise Exception(message)


            #Get location from header, URL must be https so we need to change it using string replace function
            _location=str.replace(resp.headers['Location'],"http://","https://")


            print("Get Status from "+str(_location))
            _jobID=""


            # polling loop to check the request status every _retryInterval seconds
            while True:
                resp=get(_location,headers=_header)
                _pollstatus = int(resp.status_code)


                if _pollstatus==200:
                    break
                else:
                    print("Status:"+str(resp.headers['Status']))
                sleep(_retryInterval) # wait for _retryInterval seconds and re-request the status to check if it has completed


        # Get the jobID from HTTP response
        json_resp = loads(resp.text)
        _jobID = json_resp.get('JobId')
        print("Status is completed the JobID is "+ str(_jobID)+ "\n")


        # Check if the response contains Notes. If notes exist, print them to the console.
        if json_resp.get('Notes'):
            print("Notes:\n======================================")
            for var in json_resp.get('Notes'):
                print(var)
            print("======================================\n")


        # Request should be completed then Get the result by passing jobID to RAWExtractionResults URL
        _getResultURL = str("https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults(\'" + _jobID + "\')/$value")
        print("Retrieve result from " + _getResultURL)
        resp=get(_getResultURL,headers=_header,stream=True)


        #Write Output to file.
        outputfilepath = str(_outputFilePath + _outputFileName + str(os.getpid()) + '.csv.gz')
        if resp.status_code==200:
            with open(outputfilepath, 'wb') as f:
                f.write(resp.raw.read())


        print("Write output to "+outputfilepath+" completed\n\n")
        print("Below is sample data from "+ outputfilepath)
        # Read data from the csv.gz file and show output from the dataframe's head() and tail()
        df=pd.read_csv(outputfilepath,compression='gzip')
        print(df.head())
        print("....")
        print(df.tail())


    except Exception as ex:
        print("Exception occrus:", ex)


    return






def main():
    try:
        #Request a new Token
        print("Login to DSS Server")
        _DSSUsername=input('Enter DSS Username:')
        try:
            _DSSPassword=getpass(prompt='Enter DSS Password:')
            _token=RequestNewToken(_DSSUsername,_DSSPassword)
        except GetPassWarning as e:
             print(e)
        print("Token="+_token+"\n")


        # Read the HTTP request body from the JSON file, so you can change the request in the JSON file instead of in the code.
        queryString = {}
        with open(_jsonFileName, "r") as filehandle:
            queryString=load(filehandle,object_pairs_hook=OrderedDict)


        #print(queryString)
        ExtractRaw(_token,queryString)


    except Exception as e:
        print(e)


print(__name__)


if __name__=="__main__":
    main()

Sorry for the long post!

3 Replies

Best Answer
Answer by Christiaan Meihsl · Nov 21, 2018 at 05:05 AM

@ducman.nguyen,

TRTH delivers data for the instrument list and date range specified in the request, as a compressed CSV file.

To save this in separate files by date (and/or instrument), there are several possibilities, most of them based on decompressing the data, identifying when the date or instrument changes, and saving to files accordingly. How to do that?

  1. Decompress data on the fly, and save to different files? Avoid this: as per this advisory, it is strongly recommended not to decompress data on the fly; you will probably run into issues.
  2. Make multiple requests, one per day of data? This requires building a simple loop in the request code, which is fairly easy. But your request covers all data from 1996 until now, which would make for several thousand requests. This is not recommended, as making many requests is inefficient (each request has some overhead). For tips on efficiency, see the best practices document.
  3. Save the giant file to disk, then split it into several files? This requires reading it from disk, decompressing, analyzing, and then saving the content to separate files (compressed or not). This is a possibility that requires some coding, but it should not generate any technical issues and should be fairly efficient (a sketch follows after the conclusion below).
  4. Make a compromise? You could download the data in more manageable chunks. Instead of requesting everything from 1996 to now, you could make one request per year, or per month (a sketch follows right after this list).
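For illustration, here is a minimal sketch of option 4, reusing the ExtractRaw function, the _token and _jsonFileName variables, and the JSON template from your question. The month_ranges helper and the monthly chunking are my own additions, not part of the TRTH API; adapt the output file naming to your needs.

from copy import deepcopy
from datetime import date
from json import load
from collections import OrderedDict

def month_ranges(start_year, start_month, end_year, end_month):
    # Yield (QueryStartDate, QueryEndDate) ISO strings covering each calendar month.
    y, m = start_year, start_month
    while (y, m) <= (end_year, end_month):
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        days_in_month = (date(ny, nm, 1) - date(y, m, 1)).days
        yield ("%04d-%02d-01T00:00:00.000Z" % (y, m),
               "%04d-%02d-%02dT23:59:59.999Z" % (y, m, days_in_month))
        y, m = ny, nm

with open(_jsonFileName, "r") as fh:
    template = load(fh, object_pairs_hook=OrderedDict)

for start, end in month_ranges(1996, 1, 2018, 6):
    payload = deepcopy(template)
    payload["ExtractionRequest"]["Condition"]["QueryStartDate"] = start
    payload["ExtractionRequest"]["Condition"]["QueryEndDate"] = end
    # ExtractRaw writes one fixed output name per process, so you would also
    # want to include the month in _outputFileName to avoid overwriting.
    ExtractRaw(_token, payload)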

Conclusion: if you really want to split the data in smaller chunks, I'd recommend the 4th (or 3rd) option.
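And if you go with option 3, the splitting step could look roughly like the sketch below, once the extraction has been saved as a csv.gz file by your existing code. The 'Date-Time' column name and the chunk size are assumptions on my side; check the column name against the header of your actual output file.

import pandas as pd

source = "TestOutput12345.csv.gz"   # illustrative name; use the file your script wrote
started = set()                     # dates whose output file has already been created

for chunk in pd.read_csv(source, compression="gzip", chunksize=500000):
    # Date-Time values look like "2018-06-06T11:00:00.123456789Z"; keep only the date part.
    chunk["__date"] = chunk["Date-Time"].str[:10]
    for day, rows in chunk.groupby("__date"):
        out = "depth_" + day + ".csv"
        first = day not in started
        rows.drop(columns="__date").to_csv(out, mode="w" if first else "a",
                                           header=first, index=False)
        started.add(day)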

Final comment: all that said, I do not understand why having a large file is a concern. Market depth is voluminous data; that's the way it is. Storing it in one big file is not more complex than using multiple files. Could you tell us why you want to do this?

duckman · Nov 21, 2018 at 09:41 PM

@Christiaan Meihsl

Thank you very much for the reply. I have built code that loops over each day and requests data, i.e. it changes the date in the JSON file and runs the request again. I will try option 4, which I think is the best option.

The reason I want to separate them is that I will do some manipulation/calculation using SAS, and a big file takes a long time to run. If the computer freezes or is disrupted, I then have to rerun everything, which is not efficient.

Christiaan Meihsl · Nov 22, 2018 at 04:49 AM

@ducman.nguyen, ok, makes sense; thank you for your feedback. And I hope your computer does not freeze too often :-)

Answer by duckman · Nov 19, 2018 at 07:17 AM

I don't know why the code displays strangely. I got the code from here:

https://github.com/TR-API-Samples/Article.TRTH.Python.REST.OndemandRequestTimeAndSales/blob/master/TickHistoryTimesAndSalesRequest.py

Answer by duckman · Nov 25, 2018 at 07:39 PM

@Christiaan Meihsl

While we are on this topic, is it possible to request hourly snapshots of the depth data instead of getting all the data? One potential way is to change the query time fields so that only data between, say, 11:00 and 11:01 is downloaded.

"QueryStartDate": "2018-06-06T11:00:00.000Z",
"QueryEndDate": "2018-06-06T11:01:00.000Z",

However, this only gives me one snapshot, at 11:00, whereas I would like hourly snapshots at 10:00, 11:00, 12:00, etc.

What would be the best way?
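For illustration, this is roughly what I have in mind (a rough sketch only; the hourly_windows helper and the hour range are mine), generating one one-minute window per hour that could be dropped into QueryStartDate/QueryEndDate of the JSON request:

def hourly_windows(day, first_hour=10, last_hour=16):
    # Yield (QueryStartDate, QueryEndDate) strings, one one-minute window per hour of the given day.
    for hour in range(first_hour, last_hour + 1):
        yield ("%sT%02d:00:00.000Z" % (day, hour),
               "%sT%02d:01:00.000Z" % (day, hour))

for start, end in hourly_windows("2018-06-06"):
    print(start, end)   # each pair would go into the request's Condition and be submitted separately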

Christiaan Meihsl · Nov 26, 2018 at 04:52 AM

@ducman.nguyen,

I'm sorry to say that is not possible for market depth. You can request Intraday Summaries to get hourly data, but that API call does not include market depth fields.
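For reference, an Intraday Summaries request would look roughly like the sketch below (it returns hourly bars such as Open/High/Low/Last, not depth levels). Please double-check the content field names and the SummaryInterval value against the REST API Reference tree before using this; they are written here from memory.

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryIntradaySummariesExtractionRequest",
    "ContentFieldNames": [
      "Open",
      "High",
      "Low",
      "Last",
      "Volume"
    ],
    "IdentifierList": {
      "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentListIdentifierList",
      "InstrumentListId": "0x06698c5d00301db4"
    },
    "Condition": {
      "MessageTimeStampIn": "GmtUtc",
      "ReportDateRangeType": "Range",
      "QueryStartDate": "2018-06-06T00:00:00.000Z",
      "QueryEndDate": "2018-06-06T23:59:59.999Z",
      "SummaryInterval": "OneHour",
      "DisplaySourceRIC": "True"
    }
  }
}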
