

Request timeouts

I'm encountering problems with request timeouts; more of my requests now fail than succeed.

I'm submitting a request of type

#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.PriceHistoryExtractionRequest

and receiving a 202 with a Location header; my application then polls the TR system, waiting for the result. I make my initial request at around 23:00:00, and at first my polls return 202 responses, but they soon turn into timeouts, even with a 30-second timeout setting on my RestTemplate.

For example:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2')

This is requesting 20 years of data, so if the problem is simply that the system cannot start responding within 30 seconds, please let me know what you would consider a reasonable timeout setting.

I also get a fair number of outright connection failures:

ResourceAccessException: I/O error on GET request for "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2')": Connect to hosted.datascopeapi.reuters.com:443 [hosted.datascopeapi.reuters.com/192.165.219.152] failed: Connection refused (Connection refused); nested exception is org.apache.http.conn.HttpHostConnectException: Connect to hosted.datascopeapi.reuters.com:443 [hosted.datascopeapi.reuters.com/192.165.219.152] failed: Connection refused (Connection refused)


@davet1,

How many RICs is that 20 years of data for?

As a rule of thumb, for large requests, a polling interval of a few minutes should be fine.

Could you please add a Client-Session-Id to your requests (it must be unique for each request), and log it along with the returned Request-Execution-Correlation-Id, as described in the help page.
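Here is a minimal Java sketch of that suggestion. It uses Java 11+ java.net.http only to stay dependency-free; the poster's client is RestTemplate on Java 8. The request header name X-Client-Session-Id is the one used later in this thread. The response header name X-Request-Execution-Correlation-Id and the "Token" authorization scheme (taken from the curl command further down) are assumptions to check against the help page, and the class and method names are hypothetical. The session ID is logged before the request is sent, so it is still available when the call times out and no correlation ID ever comes back.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;
import java.util.logging.Logger;

// Hypothetical helper: tags each DSS request with a unique session ID and logs the IDs support needs.
final class DssTracing {
    private static final Logger LOG = Logger.getLogger("dss");

    // A fresh ID for every request, as the support reply asks.
    static String newSessionId() {
        return UUID.randomUUID().toString();
    }

    // Builds a GET carrying the session ID (header name as used in this thread).
    static HttpRequest tracedGet(URI uri, String token, String sessionId) {
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Token " + token)
                .header("X-Client-Session-Id", sessionId)
                .GET()
                .build();
    }

    // Logs the session ID before sending (so it survives a timeout), then the
    // correlation ID (assumed header name) if a response actually arrives.
    static HttpResponse<String> sendTraced(HttpClient client, URI uri, String token)
            throws IOException, InterruptedException {
        String sessionId = newSessionId();
        LOG.info("DSS request " + uri + " X-Client-Session-Id=" + sessionId);
        try {
            HttpResponse<String> resp = client.send(tracedGet(uri, token, sessionId),
                    HttpResponse.BodyHandlers.ofString());
            String corr = resp.headers()
                    .firstValue("X-Request-Execution-Correlation-Id").orElse("<none>");
            LOG.info("DSS response " + resp.statusCode() + " session=" + sessionId
                    + " correlation=" + corr);
            return resp;
        } catch (IOException e) {
            // Timeouts and refused connections both land here; the session ID is what support needs.
            LOG.warning("DSS request failed, session=" + sessionId + ": " + e);
            throw e;
        }
    }
}
```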

Then send us those 2 Ids for a request that times out, and for a request that ends in a refused connection.

That will allow us to investigate what happened.

45 RICs at present, but we would eventually be doing this for hundreds.

At the moment I poll fairly frequently, although I would be happy to reduce that eventually.

I will look at Client-Session-Id and Request-Execution-Correlation-Id now.

@davet1, what is "fairly frequently"? The interval should not be less than 30 seconds.


Two requests should have just been received:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2')

X-Client-Session-Id: 6e8cdf96-0508-11e9-8014-525400a87d41_6294

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52b31027d2')

X-Client-Session-Id: 6e8cdf96-0508-11e9-8014-525400a87d41_6293

They both timed out, though, so I do not have a Correlation Id.

@davet1, thank you, I will send this now to the team who can investigate.

@davet1,

Just checking: are you following this exact workflow:

1) Initial extraction request. This results in a 202 and returns the monitor URL in the response headers.

2) Poll the monitor URL using a GET (at intervals of more than 30 seconds) until it returns a 200.

3) Retrieve the data from the body of the 200.
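The three steps above can be sketched in Java as follows. This assumes Java 11+ java.net.http, used only to keep the example dependency-free. The Fetcher and Sleeper interfaces, the maxPolls cap and the class name are hypothetical; they exist so the loop can be exercised without a network. The 30-second floor comes from this thread. The Authorization and Prefer values mirror the curl command later on, with wait deliberately left unset. A 202 is treated as "still running", a 200 as "done, data in the body", and anything else as an error.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical sketch: after the initial request returns 202 plus a monitor URL,
// GET that URL until it answers 200, then return the body.
final class MonitorPoller {
    static final Duration MIN_INTERVAL = Duration.ofSeconds(30); // floor suggested in this thread

    static final class PollResult {
        final int status;
        final String body;
        PollResult(int status, String body) { this.status = status; this.body = body; }
    }

    interface Fetcher { PollResult fetch(URI monitorUrl) throws IOException, InterruptedException; }
    interface Sleeper { void sleep(Duration d) throws InterruptedException; }

    // Steps 2 and 3: keep polling while the server says 202, return the body of the first 200.
    static String pollUntilDone(URI monitorUrl, Fetcher fetcher, Sleeper sleeper,
                                Duration interval, int maxPolls)
            throws IOException, InterruptedException {
        Duration wait = interval.compareTo(MIN_INTERVAL) < 0 ? MIN_INTERVAL : interval;
        for (int i = 0; i < maxPolls; i++) {
            PollResult r = fetcher.fetch(monitorUrl);
            if (r.status == 200) {
                return r.body;                 // extraction finished: data is in the body
            }
            if (r.status != 202) {
                throw new IOException("Unexpected HTTP status " + r.status);
            }
            sleeper.sleep(wait);               // still running: wait before the next poll
        }
        throw new IOException("Extraction still running after " + maxPolls + " polls");
    }

    // A real Fetcher over java.net.http. requestTimeout must at least cover the silent
    // period before a large result starts arriving, not just a quick 202 round trip.
    static Fetcher httpFetcher(HttpClient client, String token, Duration requestTimeout) {
        return uri -> {
            HttpRequest req = HttpRequest.newBuilder(uri)
                    .header("Authorization", "Token " + token)
                    .header("Prefer", "respond-async")
                    .timeout(requestTimeout)
                    .GET()
                    .build();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            return new PollResult(resp.statusCode(), resp.body());
        };
    }
}
```

In production the Sleeper can simply be d -> Thread.sleep(d.toMillis()). Injecting it keeps the 30-second floor testable without actually waiting.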

Yup, that's what I'm doing.


I agree that the long delay before the bytes actually start flowing is undesirable and, while there is a reason behind it, I am not sure it is a very good reason. Extraction 0x0673fe52bc7027d2 produced a fairly large result file, and currently the entire raw result must be converted to JSON text before the ExtractWithNotesResult call can start sending bytes. Our development plan includes adding support for streaming JSON results, but I cannot tell you where that is on the development timeline.

For extractions with large result sets, you will find the ExtractRaw method more responsive, although you will receive the data as a CSV file stream rather than JSON, and you will need your own CSV processing once the bytes have been streamed down.


Ah-ha, ok, thank you. That sounds like it won't scale properly to hundreds or a few thousands of instruments.

Would you generally advise doing something completely different to achieve a cache of data in our system? Or is "use CSV" The Answer?

My team is not the best to address your business case needs. Your Account Manager should be your best resource to connect you with the best Refinitiv resource to analyze your specific use case.

@davet1,

Without knowing the details of your use case, and as a generic answer:

For large requests (many data fields for hundreds or thousands of instruments), I would consider using ExtractRaw instead of ExtractWithNotes.

Caveat: this requires code changes, as the workflow and data format are different; for details see the DSS extract raw tutorial. Via the stream you will receive a compressed CSV instead of uncompressed JSON. I'd recommend saving the compressed file, and then reading and decompressing from file, instead of decompressing on the fly which can cause issues.
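Here is a minimal Java sketch of that save-then-decompress approach. It assumes the ExtractRaw result stream is gzip-compressed CSV, as described above, and that you already hold it as an InputStream with no transparent decompression applied by your HTTP client. The class and method names are hypothetical. For very large results you would process lines one at a time rather than collect them in a list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for a raw (compressed CSV) result stream.
final class RawResultFile {
    // Step 1: copy the still-compressed response bytes to disk unchanged.
    static long saveCompressed(InputStream responseBody, Path target) throws IOException {
        try (InputStream in = responseBody) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Step 2: decompress from the saved file and read the CSV text line by line.
    static List<String> readCsvLines(Path gzFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```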

It seems to add a whole other request/response step to the workflow, rather than being a simple switch to retrieve the data in a different format. How extremely clunky. I'll work on it today.

@Rick Weyrauch, thank you too; this is very interesting. I was not aware of this.


@davet1,

I have some feedback from the development team:

It might be the speed of your network relative to the local timeout you have set. We see a connection open for 2m17s, sending 69MB:

2018-12-21 13:36:27.123 2018-12-21 13:34:10.575 GET "9019523" 172.25.182.9 "31.193.172.61" 200 136548 "CiD/9019523/PhQNBQ.0x06744ee083d0280f/RA" 1736183808 12.82 31.82 80 69390327 882 "Apache-HttpClient/4.5.1 (Java/1.8.0_181)" /RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2') -

That’s about 500kB/sec.
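The arithmetic behind that figure is shown below. It assumes the log fields 136548 and 69390327 are the request duration in milliseconds and the bytes sent, which matches the 2m17s and 69MB quoted:

```latex
\frac{69\,390\,327\ \text{bytes}}{136.548\ \text{s}} \approx 5.08 \times 10^{5}\ \text{bytes/s} \approx 508\ \text{kB/s} \approx 4.07\ \text{Mbit/s}
```

This is an average over the whole time the connection was open. On its own it cannot distinguish a slow link from a long silent period followed by a fast transfer.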

How long is your local Apache-HttpClient timeout setting?


speedtest-cli from the server reports:

Download: 598.06 Mbit/s

Upload: 334.07 Mbit/s

I am setting a timeout of 30 seconds. From the JDK documentation:

"If the timeout expires before there is data available for read, a java.net.SocketTimeoutException is raised. "

I realise that's not entirely nailed-down, but to me that sounds like "if the data is already flowing, then it won't timeout".

I am trying this from the command line first.

$ time curl -m 600 -H "Accept-Charset: UTF-8" -H "Prefer: respond-async, wait=1" -H "Content-Type: application/json" -H "Authorization: Token $TR_TOKEN" -X GET "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2')" > x

Took 2 minutes 21 seconds to retrieve 67MB of data.

But it didn't slowly receive that data. The file was empty for at least the first 2 minutes.

That seems extremely slow. If we had 2000 tickers, would the request have to sit and wait for 45 minutes?

@davet1,

  • Are you setting Prefer: respond-async, wait=1 on all your requests, or was that just for this particular request? Changing the wait parameter when using DSS/TRTH is not recommended; for more information, see the relevant help page.
  • 2 min 21 s for 67MB of data is ~3.8 Mbit/s (about 475 kB/s), which could be normal depending on your location and internet bandwidth.
  • Re "the file was empty for 2 min": you may not see the file size increase immediately. A buffering mechanism could delay the write to disk, and with it the moment you see the size grow.
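One way to settle the buffering question is to time the bytes as the client reads them, rather than watching the file size on disk. The Java sketch below (class and method names hypothetical) copies a response stream to an output stream. It records when the first byte arrived, relative to a start time the caller takes just before sending the request. A long time-to-first-byte followed by a short transfer would point to server-side preparation rather than network bandwidth.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: separates "waiting for the first byte" from "transferring the bytes".
final class TransferTimer {
    static final class Timing {
        final long firstByteNanos; // from startNanos until the first byte was read
        final long totalNanos;     // from startNanos until end of stream
        final long bytes;          // total bytes copied
        Timing(long firstByteNanos, long totalNanos, long bytes) {
            this.firstByteNanos = firstByteNanos;
            this.totalNanos = totalNanos;
            this.bytes = bytes;
        }
        // Average rate over the transfer phase only (excludes the silent wait).
        double transferBytesPerSecond() {
            long transfer = totalNanos - firstByteNanos;
            return transfer > 0 ? bytes * 1e9 / transfer : Double.NaN;
        }
    }

    // startNanos should be System.nanoTime() taken just before the request is sent.
    static Timing copyTimed(long startNanos, InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long firstByte = -1;
        long count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            if (firstByte < 0 && n > 0) {
                firstByte = System.nanoTime() - startNanos; // first bytes actually in hand
            }
            out.write(buf, 0, n);
            count += n;
        }
        long total = System.nanoTime() - startNanos;
        return new Timing(firstByte < 0 ? total : firstByte, total, count);
    }
}
```

With HttpURLConnection, take the start time before calling getInputStream(), because that call itself blocks until the response headers arrive.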

I have written a couple of tests and verified that the RestTemplate/HttpClient read timeout does NOT fire while it is receiving data (even if that data arrives very slowly).

However, TR-DSS seems to sit mute for the first two minutes before beginning to return data, which triggers the timeout.
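The behaviour described above can be reproduced locally without DSS. The Java sketch below (class names hypothetical) starts a one-shot HTTP server on localhost that stays silent for a configurable period before replying, and fetches from it with HttpURLConnection. If the read timeout is shorter than the silent period, getInputStream() fails with SocketTimeoutException even though the reply would eventually arrive. This is what a 30-second timeout runs into when DSS takes about two minutes to start sending. The read timeout bounds each wait for data, not the whole transfer, so one workaround consistent with this is a read timeout comfortably longer than the longest silent period you expect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: a server that "sits mute" before answering, and a client with a read timeout.
final class SilentServerDemo {
    // Starts a one-shot local HTTP server that waits silenceMs before sending its reply.
    static ServerSocket startSilentServer(long silenceMs) throws IOException {
        ServerSocket server = new ServerSocket(0); // any free port
        Thread t = new Thread(() -> {
            try (Socket s = server.accept()) {
                // Drain the request headers: read until CR LF CR LF.
                InputStream in = s.getInputStream();
                int state = 0, b;
                while (state < 4 && (b = in.read()) != -1) {
                    state = (b == '\r' || b == '\n') ? state + 1 : 0;
                }
                Thread.sleep(silenceMs); // no bytes at all during this period
                byte[] body = "ok".getBytes(StandardCharsets.US_ASCII);
                OutputStream out = s.getOutputStream();
                out.write(("HTTP/1.1 200 OK\r\nContent-Length: " + body.length
                        + "\r\nConnection: close\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
                out.write(body);
                out.flush();
            } catch (IOException | InterruptedException ignored) {
                // The client may already have given up; nothing more to do in a demo.
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    // Returns the body, or "TIMEOUT" if no data arrived within readTimeoutMs.
    static String fetch(int port, int readTimeoutMs) throws IOException {
        HttpURLConnection c = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + "/").toURL().openConnection();
        c.setConnectTimeout(5_000);
        c.setReadTimeout(readTimeoutMs); // limit on each wait for data, not on total time
        try (InputStream in = c.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        } catch (SocketTimeoutException e) {
            return "TIMEOUT";
        } finally {
            c.disconnect();
        }
    }
}
```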

How would you advise I scale this out to hundreds or (a few) thousands of Rics?

Any update on this please?

@davet1, the development team came back to me, suggesting you open a service ticket, that way the on call 2nd Level people could help investigate.