How do I correctly paginate through search results?

Hi,

I have a search that returns roughly 450,000 results, so I thought I could simply page through them using the top and skip parameters. But that does not work.

The message I get is:

Result is maxed at 10000 while the total is 453761 rows.
Requested - 100, skipped - 9900 rows.

The code I use:

import refinitiv.data as rd

search = rd.discovery.search(
    view = rd.discovery.Views.INDICATOR_QUOTES,
    top = 100,
    skip = 9900,
    filter = "( SearchAllCategoryv2 eq 'Economic Indicators')",
    select = "RIC, RCSCountryOfIndicatorLeaf, CommonName, Periodicity, StartDate, EndDate, ObservationDate, PreviousReleaseDate, NextRelease")

When I try to skip 10,000, I get this error:

RDError: Error code 400 | Invalid result window: (top + skip) must not exceed 10,000

So, how do I get the other 443,761 rows?

Thanks
Andreas

Accepted answers

m.bunkowski

Hi @andreas01

If you need to retrieve this many RIC codes, you can use navigators to split your search request into buckets, each holding fewer results than the accepted maximum of 10k.

The original request returns around 450k results, but if you exclude China (which you can later split into similar buckets) it returns around 250k. The idea is to break the response down by country, using "RCSCountryOfIndicatorLeaf", and then use each country to build a filter that narrows down the search results.

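import refinitiv.data as rd
from refinitiv.data.content import search

# top = 0 returns no hits, only the navigator buckets (one per country)
response = search.Definition(
    view = rd.discovery.Views.INDICATOR_QUOTES,
    filter = "SearchAllCategoryv2 eq 'Economic Indicators' and RCSCountryOfIndicatorLeaf ne 'China'",
    top = 0,
    navigators = "RCSCountryOfIndicatorLeaf"
).get_data()

# The format of the output: a list of buckets, one per country
response.data.raw['Navigators']['RCSCountryOfIndicatorLeaf']['Buckets']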

Looking at the output, you can see that the largest result count for a single country is 8902 (United States).
Now you can loop through the list items, put each one into the search filter, and merge the hits into a final results list.

results = []
# Each bucket is a dict; its 'Label' entry holds the country name used in the filter
for i in response.data.raw['Navigators']['RCSCountryOfIndicatorLeaf']['Buckets']:
    country_response = search.Definition(
        view = rd.discovery.Views.INDICATOR_QUOTES,
        filter = f"SearchAllCategoryv2 eq 'Economic Indicators' and RCSCountryOfIndicatorLeaf xeq '{i['Label']}'",
        select = "RIC, RCSCountryOfIndicatorLeaf, CommonName, Periodicity, StartDate, EndDate, ObservationDate, PreviousReleaseDate, NextRelease",
        top = 10000,
    ).get_data()
    results.extend(country_response.data.raw['Hits'])
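
Before running the loop, it can help to check which buckets actually fit under the 10k limit and which (like China) need further splitting. Below is a minimal sketch of that check. It assumes each bucket is a dict with a 'Label' and a 'Count' key; the 'Count' key name and the helper `split_buckets` are assumptions for illustration, and the China count in the sample data is made up.

```python
MAX_WINDOW = 10_000  # top + skip limit quoted in the error message


def split_buckets(buckets, limit=MAX_WINDOW):
    """Separate navigator buckets into those that fit in one request and those that do not.

    Assumes every bucket is a dict with 'Label' and 'Count' keys (an assumption
    about the raw response layout).
    """
    fits, too_large = [], []
    for bucket in buckets:
        if bucket["Count"] <= limit:
            fits.append(bucket)
        else:
            too_large.append(bucket)
    return fits, too_large


# Sample data: 8902 is the figure mentioned above, the second count is invented.
sample_buckets = [
    {"Label": "United States", "Count": 8902},
    {"Label": "China (Mainland)", "Count": 200000},
]
fits, too_large = split_buckets(sample_buckets)
print([b["Label"] for b in fits])       # buckets that can be fetched with top = 10000
print([b["Label"] for b in too_large])  # buckets that need a finer split
```

Only the buckets in `fits` can safely go through the loop with top = 10000; anything in `too_large` would be silently cut off at 10,000 hits.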

All comments

aramyan.h

Hi @andreas01 ,

As far as I know, unfortunately, there is no way to paginate beyond this limit in Search. What you can do is add filters that reduce the size of each request below 10,000 results, and then extract everything by iterating over the filter criteria. One option is to combine the StartDate and EndDate parameters with RCSCountryOfIndicatorLeaf and/or Periodicity. See an example filter using these below:

    filter = "( SearchAllCategoryv2 eq 'Economic Indicators') and StartDate ge 1996-01-01 and StartDate le 1997-01-01 and Periodicity eq 'Annual' and RCSCountryOfIndicatorLeaf eq 'China (Mainland)' "
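
To iterate over such a filter, one option is to generate one StartDate window per year. The sketch below is only an illustration: the helper name `yearly_start_date_filters` is hypothetical, and it assumes the filter syntax accepts `lt` so that consecutive windows do not overlap (with `ge` and `le` an item dated exactly 1 January would fall into two windows).

```python
from datetime import date

BASE_FILTER = "SearchAllCategoryv2 eq 'Economic Indicators'"


def yearly_start_date_filters(first_year, last_year, base=BASE_FILTER):
    """Yield one filter string per calendar year.

    Each window is half-open: StartDate >= 1 Jan of the year and
    StartDate < 1 Jan of the following year.
    """
    for year in range(first_year, last_year + 1):
        lower = date(year, 1, 1).isoformat()
        upper = date(year + 1, 1, 1).isoformat()
        yield f"{base} and StartDate ge {lower} and StartDate lt {upper}"


for flt in yearly_start_date_filters(1996, 1997):
    print(flt)
```

Each generated string can be passed as the filter of a separate request. Items without a StartDate would not match any window, so they would need a request of their own.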

I hope this helps.

Best regards,

Haykaz

andreas01

That's brilliant, thank you! I had to adjust it a little bit:

{i}

had to become

{i['Label']}

but other than that it works nicely.

Do you happen to have a suggestion for the China "problem"? What could be a good basket for this one?

Thanks a lot, @m.bunkowski
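
One possible approach to the China case, following the same navigator idea, is to split an oversized bucket again by a second field such as Periodicity, and to keep narrowing until every filter is under the limit. The sketch below is not from the thread: `collect_filters` and `get_buckets` are hypothetical names, the 'Count' key is an assumption about the bucket layout, and using `xeq` on every field is also an assumption. `get_buckets(filter, field)` stands for a function that runs a top = 0 request with `navigators = field` and returns the bucket list; here it is replaced by a stub with invented counts so the logic can be run offline.

```python
def collect_filters(base_filter, fields, get_buckets, limit=10_000):
    """Split a search into filters that each stay within `limit` results.

    Returns (ok, oversized): filters small enough to fetch in one request,
    and filters still too large after all `fields` have been used.
    """
    if not fields:
        return [], [base_filter]
    field, remaining = fields[0], fields[1:]
    ok, oversized = [], []
    for bucket in get_buckets(base_filter, field):
        narrowed = f"{base_filter} and {field} xeq '{bucket['Label']}'"
        if bucket["Count"] <= limit:
            ok.append(narrowed)
        else:
            # Too many hits: split this bucket again by the next field.
            sub_ok, sub_oversized = collect_filters(narrowed, remaining, get_buckets, limit)
            ok.extend(sub_ok)
            oversized.extend(sub_oversized)
    return ok, oversized


def stub_get_buckets(flt, field):
    """Offline stand-in for the navigator request; all counts are invented."""
    if field == "RCSCountryOfIndicatorLeaf":
        return [{"Label": "China (Mainland)", "Count": 200000}]
    return [{"Label": "Annual", "Count": 9000}, {"Label": "Monthly", "Count": 50000}]


base = "SearchAllCategoryv2 eq 'Economic Indicators'"
ok, oversized = collect_filters(base, ["RCSCountryOfIndicatorLeaf", "Periodicity"], stub_get_buckets)
print(len(ok), len(oversized))
```

Anything left in `oversized` needs one more field in the list (for example a StartDate range). Items that have no value for a navigator field do not appear in any bucket, so the totals are worth comparing against the overall count.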
