
Getting "Max retries exceeded" or "NoneType" errors when fetching bundle data in parallel

截圖-2023-07-12-141631.png, 截圖-2023-07-12-141808.png

As the title and the screenshots show, I'm retrieving the latest values of several fields (related to price, dividends, and splits) for a large number of tickers. The symbol list is a DataFrame with datastream_id and tickers, containing about 6000 symbols.

I'm using get_bundle_data in DatastreamPy, together with ProcessPoolExecutor to fetch the data in parallel. Unfortunately, I get "Max retries exceeded with url" and "NoneType" errors.


Tags: python, #technology, #product, error, refinitiv-data-platform-libraries

@hsheng

Thank you for reaching out to us.

This looks like a connection or SSL (Secure Sockets Layer) issue.

1689145276827.png

Please also check the limitations of the GetDataBundle request in the DSWS user stats and limits document.

1689147172397.png

Did the problem happen when using simple requests or non-parallel requests?
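If the failures turn out to be transient connection errors, one common mitigation is to retry a failed request with an increasing delay. The following is only a minimal sketch under assumptions: `call_with_retry` is a hypothetical helper, not part of DatastreamPy, and `fetch` stands in for whatever call performs the request (for example a function that wraps `ds.get_bundle_data`). Treating a `None` result as a failure is also an assumption about where the "NoneType" error comes from.

```python
import time


def call_with_retry(fetch, request, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(request), retrying with exponential backoff on failure.

    Hypothetical helper: a result of None is treated as a failed attempt
    (an assumption), and any exception triggers a retry. In real use,
    narrow the except clause to the connection errors you actually see.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            result = fetch(request)
            if result is not None:
                return result
            last_error = RuntimeError("fetch returned None")
        except Exception as exc:
            last_error = exc
        # Wait 1x, 2x, 4x ... base_delay between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"request failed after {attempts} attempts") from last_error
```

With the defaults, a request that keeps failing is attempted three times, with pauses of 1 and 2 seconds, before the error is raised to the caller.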





Thank you for replying.

The problem didn't happen with non-parallel requests, but it did happen with simple requests (with parallel parsing) at the beginning.


What puzzles me is that the problem happens only sometimes, not every time.

Moreover, I think the way I fetch bundle data in parallel is similar to the attached sample, which I received from your team in the past.

DSWS Python Sample.html.zip


@hsheng

I can run the code properly with 3000 instruments.

import numpy
import pandas as pd
import tqdm
import DatastreamPy

from concurrent.futures import ProcessPoolExecutor


# Load the four symbol lists; read_html returns a list of tables, so take the first from each.
df1 = pd.read_html('list.xls')
df2 = pd.read_html('list1.xls')
df3 = pd.read_html('list2.xls')
df4 = pd.read_html('list3.xls')
df = pd.concat([df1[0], df2[0], df3[0], df4[0]], ignore_index=True)


ds = DatastreamPy.Datastream(username="username", password="password")


# Cap on instruments x fields per request, used to size the batches.
max_data_points = 100


request_fields = ["UPO", "UPH", "UPL", "UP", "X(UVO)*1000",
                  "AF", "PO", "PH", "PL", "P", "UDD", "DD", "DPS",
                  "DY", "AND", "PYD", "XDD", "SPLDTE", "SPLFCT",
                  "DT", "IBPDTE"]
run_symbols = df["Symbol"].tolist()

# Instruments per request = floor(max_data_points / number of fields).
instruments_per_request = int(numpy.floor(max_data_points / len(request_fields)))
batches = numpy.array_split(run_symbols, int(numpy.ceil(len(run_symbols) / instruments_per_request)))


# Build one request per batch, all for the same single date.
total_requests = [ds.post_user_request(','.join(c), list(request_fields), kind=0, start='2023-07-11', end='2023-07-11') for c in batches]


def get_ds_data_pivot(per_request):
    # Fetch one request as a bundle, then pivot to one row per instrument and one column per datatype.
    data = ds.get_bundle_data([per_request])[0]
    data = data.drop_duplicates(subset=['Instrument', 'Datatype'], keep='first')
    return data.pivot(index='Instrument', columns='Datatype', values='Value')


def get_symbol_list_and_check_time(request):
    return get_ds_data_pivot(request)


def main():
    # Send the requests from five worker processes and show a progress bar.
    with ProcessPoolExecutor(max_workers=5) as executor:
        result = list(tqdm.tqdm(executor.map(get_symbol_list_and_check_time, total_requests), desc='getting latest data', total=len(total_requests)))
    return result


if __name__ == '__main__':
    main()

I couldn't run the code in Jupyter Notebook, so I ran it from the console.

1689238047725.png
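The batch sizing used in the script can be checked on its own. The sketch below is a plain-Python restatement of that arithmetic, not part of DatastreamPy; `instruments_per_request` and `split_into_batches` are hypothetical helper names, and the value 100 is simply the `max_data_points` used in the script. Note that it slices fixed-size chunks, whereas `numpy.array_split` produces near-equal chunks; the number of batches is the same.

```python
def instruments_per_request(max_data_points, n_fields):
    """How many instruments fit in one request for a given number of fields."""
    return max_data_points // n_fields


def split_into_batches(symbols, per_request):
    """Split symbols into consecutive chunks of at most per_request items."""
    if per_request < 1:
        raise ValueError("too many fields for the data-point cap")
    return [symbols[i:i + per_request] for i in range(0, len(symbols), per_request)]


if __name__ == "__main__":
    per_request = instruments_per_request(100, 21)   # 21 fields in the script
    batches = split_into_batches([f"S{i}" for i in range(3000)], per_request)
    print(per_request, len(batches))
```

With 21 fields and a cap of 100, each request carries 4 instruments, so 3000 instruments become 750 requests.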

Please check the version of DatastreamPy that you are using. I am using 1.0.12.
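One way to check the installed version from Python is through the standard library's `importlib.metadata`. This is a small sketch; `installed_version` is a hypothetical helper, and it assumes the package was installed under the distribution name `DatastreamPy`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; adjust if the package was installed under another name.
    print(installed_version("DatastreamPy"))
```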


