How can you get a list of available scheduled extracted files and download them via API by file name?

I've got a few requests scheduled to run daily and generate files which I want to download. I created the requests through the GUI.

Ideally I'd like to be able to download the extracted files using the file name (because it will remain constant). Otherwise I'd need to obtain the list of all available extracted files with their job IDs and download those files by job ID (I prefer names because they stay the same).

I'm using Python with the requests library.

NOTE: I'm able to obtain some info about my extracted files at "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles", but I'm not sure if it's the best way, and I can't download using the file name.


Accepted answers

Christiaan Meihsl

@alvaro.canencia, let me attempt to answer your queries:

Q1. How can I get those 'schedule ids' for my scheduled reports?

A1. There is a call to list all schedules (it returns them all, not only the active ones):

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/Schedules

The response includes (among other fields) the ScheduleID. You can run this example in the C# example app (its installation and usage are described in the Quick Start); it is the first one in the Schedule Examples category. As far as I know, this should only return "Stored and Scheduled" schedules, not "On demand" ones.
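For a Python client, a minimal sketch of this call could turn the schedule list into a name-to-ID lookup. It assumes `session` is a `requests.Session` (or any object with a compatible `get` method) that already sends the required authentication header, and that the JSON response wraps the schedules in a `value` array whose entries carry `Name` and `ScheduleId` fields. Those field names are assumptions, so check them against an actual response.

```python
BASE_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions"


def list_schedule_ids(session):
    """Return a {schedule name: schedule ID} dict for all schedules."""
    response = session.get(BASE_URL + "/Schedules")
    response.raise_for_status()
    # Assumption: schedules are listed under "value", and each entry
    # has "Name" and "ScheduleId" fields.
    schedules = response.json().get("value", [])
    return {entry["Name"]: entry["ScheduleId"] for entry in schedules}
```

Because schedule names are set in the GUI and stay constant, a lookup like this lets the rest of a script refer to schedules by name rather than by ID.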

Q2. With the schedule id I got from the GUI I've tried to use the urls you provide, substituting the '123' with the 'schedule id' from the GUI. None of the urls work. I get the message: "Resource not found for the segment 'Schedule'"

A2. Using Schedule IDs, the workflow is:

a) Check each scheduled extraction status using its ScheduleID (retrieved as explained under A1 above):

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/Schedules('0x05a2b98d233b3036')/LastExtraction

Repeat until the returned status is "Completed", then save the returned ReportExtractionId.

As your schedules are daily, do this each day after the schedules should have triggered, at a time when you expect them to have completed.

b) Retrieve the corresponding extraction report, using the ReportExtractionId:

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ReportExtractions('2000000000197524')/Files

Save the returned ExtractedFileId for each file you want. I fully agree with Troy: you should download not only the data file but also the Note file, because it contains useful information on the extraction, such as any errors or warnings.

c) Retrieve the files, using their ExtractedFileId:

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles('VjF8MHgwNWEyYjk5ODRiNmIyZjk2fA')/$value

These steps are all described in detail in REST API Tutorial 12. As you created your instrument lists, templates and schedules manually, you can skip the first steps of that tutorial, and start directly at step Check the extraction status, which corresponds to step a) in this answer.
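Steps a) to c) can be chained in Python roughly as follows. This is only a sketch under stated assumptions: `session` is a `requests.Session` (or compatible object) that already sends the required authentication header, the status is exposed in a `Status` field, the file list sits under `value`, and each file entry has an `ExtractedFileName` field next to `ExtractedFileId`. Only `ReportExtractionId` and `ExtractedFileId` are named in this answer; the other field names should be verified against real responses.

```python
import os

BASE_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions"


def get_json(session, url):
    """GET a URL and return the decoded JSON body."""
    response = session.get(url)
    response.raise_for_status()
    return response.json()


def download_last_extraction(session, schedule_id, out_dir="."):
    """Save every file of a schedule's last extraction; return the paths.

    Returns an empty list if the last extraction is not yet completed.
    """
    # Step a: check the status of the last extraction for this schedule.
    last = get_json(session, f"{BASE_URL}/Schedules('{schedule_id}')/LastExtraction")
    if last.get("Status") != "Completed":  # "Status" is an assumed field name
        return []
    report_id = last["ReportExtractionId"]

    # Step b: list the files (data file, notes file, ...) of that extraction.
    files = get_json(session, f"{BASE_URL}/ReportExtractions('{report_id}')/Files")

    saved = []
    for entry in files.get("value", []):
        # Step c: download each file by its ExtractedFileId.
        file_id = entry["ExtractedFileId"]
        response = session.get(f"{BASE_URL}/ExtractedFiles('{file_id}')/$value")
        response.raise_for_status()
        # "ExtractedFileName" is an assumed field name.
        path = os.path.join(out_dir, entry["ExtractedFileName"])
        with open(path, "wb") as handle:
            handle.write(response.content)
        saved.append(path)
    return saved
```

Saving every listed file, rather than only the data file, also picks up the Note file recommended above.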

Alternative using file names

You could choose to ignore the schedules completely, and just look at the extracted files.

You can simply list the extracted files across all schedules. The call delivers the file ID, report extraction ID, schedule ID, file type, file name, timestamp, size, etc.

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles

After that you can get the changes to that list (using the delta token returned by the previous call).

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles?$deltatoken='MjAxNy0wOS0wMVQxNTo0MjoxNi4yNzAwMDAwfDB4MDAwMDAwMDAwMDAwMDAwMCwzMTUyMjUyNjg'

Finally, you can proceed to get the files, using the returned file ID(s):

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles('VjF8fDMxNTIyNzYzOQ')/$value

Note you can run the calls I list here under Scheduled Extractions in the C# example app (described in the Quick Start).
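Since the file name is the constant you want to key on, one possible approach is to index the ExtractedFiles listing by name and keep only the newest entry for each name. The sketch below works on the already-decoded list of file records and does no network calls. The field names `ExtractedFileName`, `ExtractedFileId` and `LastWriteTimeUtc` are assumptions about the response layout, and the comparison assumes all timestamps are ISO 8601 UTC strings in the same format.

```python
def latest_file_by_name(entries):
    """Map each file name to its most recent extracted-file record."""
    latest = {}
    for entry in entries:
        name = entry["ExtractedFileName"]
        current = latest.get(name)
        # ISO 8601 UTC timestamps in one format sort correctly as strings.
        if current is None or entry["LastWriteTimeUtc"] > current["LastWriteTimeUtc"]:
            latest[name] = entry
    return latest


def download_url(entry):
    """Build the $value download URL for one extracted-file record."""
    return (
        "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/"
        f"ExtractedFiles('{entry['ExtractedFileId']}')/$value"
    )
```

With this, `download_url(latest_file_by_name(entries)[file_name])` gives the URL for the newest file with a given name, where `file_name` is whatever output name was configured in the GUI.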

Conclusion

There is no "best way"; it really depends on your workflow and your own preferences.

Hope this helps.

All comments

steven.peng

Yes, you can use "ExtractedFiles" function but it will return all extracted files in the history of the user account and the list may grow very large.

You can also use "Jobs" functions such as

https://hosted.datascopeapi.reuters.com/RestApi/v1/Jobs/Jobs

https://hosted.datascopeapi.reuters.com/RestApi/v1/Jobs/JobGetCompleted

https://hosted.datascopeapi.reuters.com/RestApi/v1/Jobs/JobGetActive

TRTH REST API Reference Guide has detailed explanations:

https://hosted.datascopeapi.reuters.com/RestApi.Help/Context/Entity?ctx=Jobs&ent=Job

alvaro.canencia

Thanks @steven.peng

The problem with your "Jobs" requests is that they don't retrieve any of the scheduled reports (only on-demand jobs, which I don't need).

However, with "ExtractedFiles" I get both scheduled and on-demand files, but as you said the list can grow too large. Any alternative?

Additionally, is there a way to use the output file name defined in the GUI (which would remain constant)? Or is the only way to request the file with the job id/extraction id/schedule id?

alvaro.canencia

Also, when I build the download URL with my ID from "Extracted" (https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='2000000001888079')) and call it, after waiting 5 minutes I get the message:

"Job of id '2000000001888079' not found". So it needs 5 minutes to reply that something doesn't exist? And the response is still incorrect after 5 minutes?

Troy Dalldorf

You can download the last extraction's file given a schedule id using the following URL:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/Schedule('123')/LastExtraction/FullFile/$value

or:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/Schedule('123')/LastExtraction/$value

The notes file must be downloaded separately, and we do recommend you download it for support purposes.

If you happen to miss a day, you will need to follow a different process to catch up.

You need to be able to handle duplicates in the event that we do not create a file (the unlikely event of a DSS error).

You should allow adequate time to pass before checking for the file (so as to avoid checking too early).
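One way to avoid checking too early in a Python script is a bounded retry loop with a pause between attempts. The helper below is a generic sketch, not part of the API: `check` is a hypothetical caller-supplied function that returns something truthy once the file is available, and the default attempt count and delay are placeholders to tune for your schedules.

```python
import time


def wait_for(check, attempts=10, delay_seconds=60, sleep=time.sleep):
    """Call check() until it returns a truthy value or attempts run out.

    Returns the truthy result, or None if every attempt came back empty.
    """
    for attempt in range(attempts):
        result = check()
        if result:
            return result
        # Pause before the next attempt, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None
```

Capping the number of attempts means a missing file ends in a `None` result that the script can report, instead of a loop that never finishes.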

alvaro.canencia

Hi Troy, thanks for your answer, but I'm afraid it doesn't work.

1. How can I get those 'schedule ids' for my scheduled reports? I could get one of those ids from the GUI (in Extracted Files => Notes). However for hundreds of ids how can I get them via the API? And I always mean "Stored and Scheduled" reports, not "On demand".

2. With the schedule id I got from the GUI I've tried to use the urls you provide, substituting the '123' with the 'schedule id' from the GUI. None of the urls work. I get the message: "Resource not found for the segment 'Schedule'"

NOTE: even with the 'scheduled ids' I get from https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles and pasting them into your urls I get the same "Resource not found for the segment 'Schedule'" error message


alvaro.canencia

Thanks Christiaan, this is what I needed. It works except for getting the status with the ScheduleID (A2a), I always get the following error:

{"error":{"code":"b4e06a6b-ed37-4d84-9082-f61b1d6742c6","message":"Resource not found for the segment 'Schedule'."}}

However, maybe I can work with your last suggestion using file names and by studying the REST API Tutorial 12.

Christiaan Meihsl

@alvaro.canencia, glad this helped.

On getting the status of a schedule (A2a) I just re-tested it. I saw I made a typo in my response above, which I have now corrected. It was:

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/Schedule('0x05a2b98d233b3036')/LastExtraction

But the endpoint should be Schedules with an s at the end:

GET https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/Schedules('0x05a2b98d233b3036')/LastExtraction

Apologies for that. Hopefully it should now also work for you.
