question


how to correctly paginate through search results?

Hi,

I have a search that would return roughly 450,000 results, so I thought I could just paginate through them using the top and skip parameters, but that does not work.

The issue I get is:

Result is maxed at 10000 while the total is 453761 rows.
Requested - 100, skipped - 9900 rows.


The code I use:

search = rd.discovery.search(
    view = rd.discovery.Views.INDICATOR_QUOTES,
    top = 100,
    skip = 9900,
    filter = "( SearchAllCategoryv2 eq 'Economic Indicators')",
    select = "RIC, RCSCountryOfIndicatorLeaf,CommonName, Periodicity, StartDate, EndDate, ObservationDate, PreviousReleaseDate, NextRelease")


When I try to skip 10,000, I get this error:

RDError: Error code 400 | Invalid result window: (top + skip) must not exceed 10,000
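
For reference, the constraint in that error message can be written as a small check. This is only a sketch of the arithmetic; the helper names `window_is_valid` and `reachable_rows` are made up for illustration and are not part of the library.

```python
MAX_WINDOW = 10_000  # limit quoted in the error message: top + skip <= 10,000


def window_is_valid(top: int, skip: int, max_window: int = MAX_WINDOW) -> bool:
    """Return True if a top/skip pair stays inside the result window."""
    return top >= 0 and skip >= 0 and top + skip <= max_window


def reachable_rows(total: int, max_window: int = MAX_WINDOW) -> int:
    """Rows that plain top/skip paging can reach for a single query."""
    return min(total, max_window)


print(window_is_valid(100, 9900))        # the last page of 100 that still fits
print(window_is_valid(100, 10000))       # the request that was rejected
print(453761 - reachable_rows(453761))   # rows out of reach for one query
```

So paging alone covers only the first 10,000 rows of any one query, whatever page size is used.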


So, how do I get the other 443,761 rows?

thanks
Andreas

#technologysearchrefinitiv-data-libraries

Hi @andreas01

If you need to retrieve this many RIC codes, you can use navigators to split your search request into reasonable buckets, each with fewer results than the maximum allowed window of 10k.

The original request returns around 450k results, but if you exclude China (which you can later split into similar buckets) it gives around 250k. The idea is to break down the response by country, "RCSCountryOfIndicatorLeaf", and then use each country to build a filter that narrows down the search results.

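import refinitiv.data as rd
from refinitiv.data.content import search

# assumes a session is already open, e.g. via rd.open_session()
# top = 0 asks for no hits, only the navigator buckets (one per country)
response = search.Definition(
    view = rd.discovery.Views.INDICATOR_QUOTES,
    filter = "SearchAllCategoryv2 eq 'Economic Indicators' and RCSCountryOfIndicatorLeaf ne 'China'",
    top = 0,
    navigators = "RCSCountryOfIndicatorLeaf"
).get_data()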

# the navigator buckets are found here
response.data.raw['Navigators']['RCSCountryOfIndicatorLeaf']['Buckets']

Looking at the output, you can see that the largest result count for a single country is 8902 (United States), which is below the 10k limit.
Now you can loop through the list items, put each one into the search filter, and merge the hits into a final results list.

results = []
for i in response.data.raw['Navigators']['RCSCountryOfIndicatorLeaf']['Buckets']:
    # each bucket is a dict, so the country name is i['Label'] (originally posted as '{i}')
    response = search.Definition(
        view = rd.discovery.Views.INDICATOR_QUOTES,
        filter = f"SearchAllCategoryv2 eq 'Economic Indicators' and RCSCountryOfIndicatorLeaf xeq '{i['Label']}'",
        select = "RIC, RCSCountryOfIndicatorLeaf,CommonName, Periodicity, StartDate, EndDate, ObservationDate, PreviousReleaseDate, NextRelease",
        top = 10000,
    ).get_data()
    # collect the hits of every country into one list
    results.extend(response.data.raw['Hits'])
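
Before running the loop, it may be worth checking that every bucket really fits under the limit. Below is a minimal sketch, assuming each bucket dict carries a 'Count' key next to 'Label'. The 'Count' key name, the helper names and the second sample bucket are assumptions for illustration, not taken from the output above.

```python
MAX_WINDOW = 10_000  # top + skip limit from the error message


def split_buckets(buckets, max_window=MAX_WINDOW):
    """Separate buckets that fit into one request from those that do not."""
    fits, too_big = [], []
    for bucket in buckets:
        target = fits if bucket["Count"] <= max_window else too_big
        target.append(bucket)
    return fits, too_big


def country_filter(label):
    """Build the per-country filter string used in the loop above."""
    return ("SearchAllCategoryv2 eq 'Economic Indicators' "
            f"and RCSCountryOfIndicatorLeaf xeq '{label}'")


# Illustrative data: only the United States count comes from the thread.
sample_buckets = [
    {"Label": "United States", "Count": 8902},
    {"Label": "Exampleland", "Count": 12500},  # invented, to show the split
]

fits, too_big = split_buckets(sample_buckets)
for bucket in fits:
    print(country_filter(bucket["Label"]))
print("needs further splitting:", [b["Label"] for b in too_big])
```

Any bucket that lands in the second list would need another navigator or filter to break it down further.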



That's brilliant, thank you! I had to adjust it a little bit:

{i}

had to become

{i['Label']}

but other than that it works nicely.


Do you happen to have a suggestion for the China "problem"? What could be a good set of buckets for this one?

thanks a lot, @m.bunkowski

Hi @andreas01

You can do it in a similar way but with a different filter:

response = search.Definition(
    view = rd.discovery.Views.INDICATOR_QUOTES,
    filter = "SearchAllCategoryv2 eq 'Economic Indicators' and RCSCountryOfIndicatorLeaf eq 'China'",
    top = 0,
    navigators = "ObservationValue(buckets:50)"
).get_data()

and then:

results = []
for i in response.data.raw['Navigators']['ObservationValue']['Buckets']:
    # i['Filter'] is the filter expression returned with each bucket
    response = search.Definition(
        view = rd.discovery.Views.INDICATOR_QUOTES,
        filter = f"SearchAllCategoryv2 eq 'Economic Indicators' and RCSCountryOfIndicatorLeaf eq 'China' and {i['Filter']}",
        select = "RIC, RCSCountryOfIndicatorLeaf,CommonName, Periodicity, StartDate, EndDate, ObservationDate, PreviousReleaseDate, NextRelease",
        top = 10000,
    ).get_data()
    results.extend(response.data.raw['Hits'])
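
When merging hits from many buckets, the same instrument could in principle appear more than once, for example if two range filters share a boundary value. Here is a small sketch that drops repeats by RIC, assuming each hit is a dict with a 'RIC' key as requested in select. The helper name `dedupe_hits` and the sample hits are made up for illustration.

```python
def dedupe_hits(hits, key="RIC"):
    """Keep the first occurrence of each key value, preserving order.

    Hits without the key are kept as they are, since they cannot be compared.
    """
    seen = set()
    unique = []
    for hit in hits:
        value = hit.get(key)
        if value is None:
            unique.append(hit)
        elif value not in seen:
            seen.add(value)
            unique.append(hit)
    return unique


# Invented sample hits standing in for the merged `results` list.
sample_hits = [
    {"RIC": "aEXAMPLE1", "Periodicity": "Annual"},
    {"RIC": "aEXAMPLE2", "Periodicity": "Monthly"},
    {"RIC": "aEXAMPLE1", "Periodicity": "Annual"},  # repeat from another bucket
]

unique_hits = dedupe_hits(sample_hits)
print(len(sample_hits), "->", len(unique_hits))
```

Comparing the number of unique hits with the total reported for the China filter is a quick way to see whether any rows were missed or counted twice.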


that is awesome. learning some great new tricks here. thank you @m.bunkowski


Hi @andreas01 ,


As far as I know, unfortunately, there is no way to introduce pagination as such in Search. What you can do is add filters that bring the size of each request below 10,000 and then extract everything by iterating over the filter criteria. One option is to work with the StartDate and EndDate parameters and/or RCSCountryOfIndicatorLeaf and Periodicity. See an example filter below that combines these:

    filter = "( SearchAllCategoryv2 eq 'Economic Indicators') and StartDate ge 1996-01-01 and StartDate le 1997-01-01 and Periodicity eq 'Annual' and RCSCountryOfIndicatorLeaf eq 'China (Mainland)' "
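
Iterating over the filter criteria could look roughly like the sketch below, which only builds the filter strings, one per StartDate year. The helper name `yearly_filters` is made up, and the upper bound uses `lt` instead of `le` so that consecutive windows do not overlap; that operator choice is an assumption on my side and differs from the example above.

```python
from datetime import date


def yearly_filters(first_year, last_year, country, periodicity):
    """Yield one filter string per StartDate year, from first_year to last_year."""
    base = "( SearchAllCategoryv2 eq 'Economic Indicators')"
    for year in range(first_year, last_year + 1):
        start = date(year, 1, 1).isoformat()
        end = date(year + 1, 1, 1).isoformat()
        yield (f"{base} and StartDate ge {start} and StartDate lt {end} "
               f"and Periodicity eq '{periodicity}' "
               f"and RCSCountryOfIndicatorLeaf eq '{country}'")


# Each string would be passed as the `filter` argument of one search request.
for flt in yearly_filters(1996, 1998, "China (Mainland)", "Annual"):
    print(flt)
```

If a single year still returns more than 10,000 rows, the window can be narrowed further or combined with another criterion.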

I hope this helps.


Best regards,

Haykaz
