How to Get Data from AWS instead of TRTH server using REST API


Answers

  • @pj4, glad to hear you solved it :-)

    I agree fully with your comment on the polling wait time, and appreciate the fact that you are considerate about the server load. It is true that every single request is a load on the servers.

    There is a trade-off to make between placing an unnecessary burden on the server and optimizing run times. I ran your sample with the parameter files you sent, for 5 FIDs and 10 base RICs. In all it took more than 15 hours to run. The average extraction time was 11.6 minutes, the fastest was 5 minutes, the slowest was 30 minutes, and 50% of all extractions took between 6 and 8 minutes. In these circumstances a polling interval of 5 minutes seems more appropriate than one of 15 minutes: it allows your program to terminate significantly faster without adding undue burden on our servers (see the sketch below).
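
    Purely as an illustration, a minimal polling loop along those lines could look like the sketch below. The names monitorUrl and requestHeaders are placeholders for the corresponding values in your script, and the 202/200 status handling is an assumption about how the in-progress extraction responds, so adjust it to your actual workflow.

        import time
        import requests

        POLL_INTERVAL = 300  # 5 minutes, per the timing analysis above

        def poll_until_complete(monitorUrl, requestHeaders):
            # Keep checking the extraction status; assume the server answers
            # 202 while the job is still running and 200 once it has finished.
            while True:
                resp = requests.get(monitorUrl, headers=requestHeaders)
                if resp.status_code == 200:
                    return resp                  # extraction finished
                if resp.status_code != 202:
                    resp.raise_for_status()      # unexpected error, stop polling
                time.sleep(POLL_INTERVAL)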

  • pj4 (Explorer)

    Thanks for analyzing the code, requests, and time taken in such detail, and for providing insights from it.

    One more issue that I have faced many times is below:

    Error Message: File "THR_TimeZoneMapped_Integrated_Modules_AWS1.py", line 458, in requestCode
    shutil.copyfileobj(rr, fd, chunk_size)
    IOError: [Errno 22] Invalid argument

    The line 458 is: shutil.copyfileobj(rr, fd, chunk_size)

    It seems the issue occurs while writing the data to the file. Any idea why we get it?

    While trying to extract the downloaded gzip file, it did extract but with an error, and the last element in the extracted file appears to be an incorrect timestamp (snapshot attached): incorrect-timestamp.png

  • @pj4, I just looked at 10 (out of 50) of the files I downloaded using your code. All zips opened correctly, and all CSVs inside ended correctly, no issues. I also did not see that error when running the code.

    The code I have to retrieve and save the gzipped files is this (and useAws is true):

    r5 = requests.get(requestUrl, headers=requestHeaders, stream=True)
    r5.raw.decode_content = False
    ...
    fileName = filePath0 + dateforfilename + ".csv.gz"
    chunk_size = 1024
    rr = r5.raw
    with open(fileName, 'wb') as fd:
        shutil.copyfileobj(rr, fd, chunk_size)
    # the with statement closes the file, so no explicit fd.close() is needed

    I have fewer lines of code than you, so there must be a difference between our code. Can you please post your latest THR_TimeZoneMapped_Integrated_Modules_AWS1.py code, with the associated params.xlsx?

    Please also post a GZIP file that exhibits the issue. Attachments are limited to 500KB; if your file is larger (but smaller than 10MB) send it to me (christiaan.meihsl@tr.com) directly.
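
    If the IOError persists, an alternative sketch that writes the response in chunks via iter_content instead of copying the raw stream is shown below; requestUrl, requestHeaders, filePath0 and dateforfilename are the same variables as in the snippet above. Note that iter_content honours the Content-Encoding header, so if the server declares gzip transfer encoding the chunks arrive decompressed, which is why the raw-stream approach above keeps decode_content set to False.

        import requests

        # requestUrl and requestHeaders are the same as in the snippet above.
        r5 = requests.get(requestUrl, headers=requestHeaders, stream=True)
        fileName = filePath0 + dateforfilename + ".csv.gz"
        with open(fileName, 'wb') as fd:
            # Write the body in 1024-byte chunks as they arrive.
            for chunk in r5.iter_content(chunk_size=1024):
                if chunk:
                    fd.write(chunk)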

  • pj4 (Explorer)

    I tried to email you the gzip file since it was larger than 500KB, but the mail was not delivered due to a "file size violation" issue.

    However, I have attached the latest code and params files here for your reference: 2415-code-and-params.zip

  • @pj4, looking at the latest code you sent, I see the final request r5 for data retrieval is still made in all cases.
    This requires correcting: the request should only be made if the preceding HTTP status was 200 and the JobId variable was assigned, as in the sketch below.
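
    A minimal sketch of that guard follows; r4, jobId, requestUrl and requestHeaders are assumed names standing in for the corresponding variables in your script, so adapt them to your code.

        # Only retrieve the data if the previous step succeeded and returned a JobId.
        r5 = None
        if r4.status_code == 200 and jobId is not None:
            r5 = requests.get(requestUrl, headers=requestHeaders, stream=True)
            r5.raw.decode_content = False
        else:
            print('Previous request failed (HTTP {0}), skipping data retrieval'.format(r4.status_code))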

  • @pj4, I have been running your latest code with the latest params file on my PC since midday, with a few debugging traces added. Up to now 4 data files were produced (LC fids 70, 64, 54, 15). There were no errors, the gzip files are healthy, the CSVs extract fine, and their last timestamps are correct. I will be out of office tomorrow, so I'll come back to you with the other results on Monday.

  • @pj4, I ran your latest code with your params file. My PC rebooted for an update in my absence, so I could not see the console output, but it seems to have worked fine: all gzips were created and can be opened, the CSV files can be extracted, and their content looks fine. I opened the gzip files using WinRAR; the sketch below shows an equivalent check in Python.
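
    For completeness, an illustrative way to check the downloaded archives programmatically, using Python's standard gzip module instead of WinRAR (file names are placeholders), would be:

        import gzip
        import shutil

        def extract_gzip(gzName, csvName):
            # Decompress a downloaded .csv.gz file; an OSError here would
            # indicate a truncated or corrupted archive.
            with gzip.open(gzName, 'rb') as src, open(csvName, 'wb') as dst:
                shutil.copyfileobj(src, dst)

        # Example usage with placeholder file names:
        # extract_gzip('output.csv.gz', 'output.csv')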