Query the refinitiv.data API in Databricks

Hi everyone,


I'm currently building an application on Databricks, and as part of this I'm migrating some scripts that previously ran on my local PC so that they run in a Databricks notebook.


I'm using refinitiv-data-1.6.2 to query the data.


This is my auth flow:

rd.open_session(config_name='credentials.json')

with credentials.json:

{
    "logs": {
        "level": "debug",
        "transports": {
            "console": {
                "enabled": false
            },
            "file": {
                "enabled": false,
                "name": "refinitiv-data-lib.log"
            }
        }
    },
    "sessions": {
        "default": "platform.rdp",
        "platform": {
            "rdp": {
                "app-key": APPKEY,
                "username": USERNAME,
                "password": PASSWORD,
                "signon_contro":"true"
                
            },
            "deployed": {
                "app-key": APPKEY,
                "realtime-distribution-system": {
                    "url" : "ws://ADS1:15000",
                    "dacs" : {  
                        "username" : USERNAME,
                        "application-id" : ID,
                        "position" : ""
                    }
                }
            }
        },
        "desktop": {
            "workspace": {
                "app-key": APPKEY
            }
        }
    }
}

When I try to test the connection with a simple query:

rd.get_history("AAPL.O")

I get the following error:

An error occurred while requesting URL('https://api.refinitiv.com/data/historical-pricing/v1/views/interday-summaries/AAPL.O').
ConnectError('[Errno 104] Connection reset by peer')
RDError: Error code -1 | [Errno 104] Connection reset by peer
---------------------------------------------------------------------------
RDError                                   Traceback (most recent call last)
File <command-3082401349704677>, line 1
----> 1 rd.get_history("AAPL.O")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0ef252e0-3959-4315-a0ad-c721bed19027/lib/python3.11/site-packages/refinitiv/data/_access_layer/get_history_func.py:206, in get_history(universe, fields, interval, start, end, adjustments, count, use_field_names_in_headers, parameters)
    204 if exceptions and all(exceptions):
    205     except_msg = "\n\n".join(exceptions)
--> 206     raise RDError(-1, except_msg)
    208 if not any({adc_data, hp_data, cust_inst_data}):
    209     return DataFrame()

RDError: Error code -1 | [Errno 104] Connection reset by peer


My understanding is that the API connection goes through a proxy that is probably not allowed on the Databricks virtual machine. I'm trying to find a way around this issue, or another connection method that is better suited to this environment.
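For anyone checking the proxy theory, here is a minimal sketch that reports which proxy settings Python's standard library picks up inside the notebook. The function name `describe_proxy_settings` is hypothetical, and this only covers proxies configured through environment variables such as HTTPS_PROXY and NO_PROXY (an assumption about how a proxy might be set up on the cluster). A block enforced at the network or firewall level would not show up here.

```python
import urllib.request


def describe_proxy_settings(target_host):
    """Report the proxy configuration the standard library sees for a host.

    `describe_proxy_settings` is a hypothetical helper, not part of any
    Refinitiv or Databricks API.
    """
    # getproxies() returns a dict such as {"https": "http://proxy:8080"},
    # built from environment variables (or system settings on some platforms).
    proxies = urllib.request.getproxies()
    # proxy_bypass() is truthy when the host matches a no-proxy rule.
    bypassed = bool(urllib.request.proxy_bypass(target_host))
    return {"proxies": proxies, "bypassed": bypassed}


if __name__ == "__main__":
    # Host taken from the URL in the error message above.
    print(describe_proxy_settings("api.refinitiv.com"))
```

An empty "proxies" dict would suggest that no proxy is configured at the Python level, in which case the reset is more likely coming from the network configuration of the workspace.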


Let me know if you need more information on my databricks setup, or if you have the contact info of someone I could get in contact with to solve this.


Thank you


Tags: technology, refinitiv-data-platform, authentication, connection-error, virtualization

Hi @adam.leroux
I have done a couple of Databricks projects in the past that connected to some of our RDP API endpoints, and I did not have any issues getting this to work.


Note that I was not using the RD Library - I was connecting to the endpoints directly using the raw RDP API in Python - as described here https://developers.lseg.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/tutorials (in your case the RD Library is trying to use https://api.refinitiv.com/data/historical-pricing/v1/views/interday-summaries/)

I checked with my cloud engineering colleague, who advised that they had to open up internet access in our Azure Databricks environment in order to connect to the RDP endpoints.

I am not aware of the inner workings of the RD Library, but I expect that opening the session also has to connect to our RDP authentication endpoints, so if you are able to open a session, this would suggest you don't have an issue connecting to RDP?

I recommend some testing with the raw RDP API calls (see the Python tutorials - https://developers.lseg.com/en/api-catalog/refinitiv-data-platform/refinitiv-data-platform-apis/tutorials#authorization-in-python) from Databricks (i.e. not using the RD Library) - to see if there is an issue accessing RDP from within your Databricks env - OR if it is an issue with the RD Library and/or your particular usage scenario?

If you cannot access the raw RDP API, then I recommend you speak to your Databricks engineering/admin team to see if they can open up access. If you can access RDP using the raw API but not using the RD Library, then you could raise a ticket for the RD Library team to investigate further.


Just adding details for anyone having the same issue:

After testing with the raw RDP API, the same issue comes up when requesting the token:


response = requests.post(
    TOKEN_ENDPOINT,
    headers = {
        "Accept": "application/json"
    },
    data = tData,
    auth = (
        "61ef6e3c917f4fab92ec2a8f72ddfbf9f4a1a839",
        ""
    )
)


ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))


So it seems this is not specific to the RD library.
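Since both the library and the raw request fail the same way, a lower-level check may help show the admin team which stage is being blocked. The following is a rough sketch using only the standard library. `probe_endpoint` is a hypothetical helper name, and the DNS / TCP / TLS split is my own way of narrowing things down, not something taken from the RDP documentation.

```python
import socket
import ssl


def probe_endpoint(host, port=443, timeout=5.0):
    """Try DNS, TCP and TLS in turn.

    Returns (stage, detail), where stage is "dns", "tcp" or "tls" for the
    first step that failed, or "ok" if the TLS handshake completed.
    `probe_endpoint` is a hypothetical helper for illustration only.
    """
    # Step 1: can the hostname be resolved at all?
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns", str(exc))

    # Step 2: can a plain TCP connection be opened?
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", str(exc))

    # Step 3: does the TLS handshake complete?
    try:
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return ("ok", tls.version())
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return ("tls", str(exc))
    finally:
        sock.close()


if __name__ == "__main__":
    # Host taken from the failing URL earlier in this thread.
    print(probe_endpoint("api.refinitiv.com"))
```

A failure at the "tcp" or "tls" stage would be consistent with outbound traffic being filtered by the workspace network rather than with anything on the RDP side, though that is an interpretation and would need to be confirmed with whoever manages the Databricks environment.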


I will update this post once access to the endpoint has been opened from Databricks.
