We get an error when trying to download files via the Filings API: https://api.refinitiv.com/data/filings/v1/retrieval/search/docId/{doc_id}

The goal is to download the original file. Calling the endpoint above returns a valid signed download URL (see the code below). However, requesting that URL from Databricks fails with 403 Forbidden, while the same code works as expected from our laptops. Code example:
```python
import requests


def get_pdf_content(pdf_url):
    """Download the file behind a signed filings URL and return its raw bytes."""
    headers = {
        "Accept": "application/pdf",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    }
    response = requests.get(pdf_url, headers=headers, timeout=60)
    if response.status_code != 200:
        # From Databricks this branch is hit with a 403 served by CloudFront.
        raise Exception(f"Error: {response.status_code} while downloading PDF: {response.text}")
    return response.content


get_pdf_content("https://cdn-filings.filings.refinitiv.com/retrieval/filings/20231229_4295877697_97681355318_1_26_ESG_raw.pdf?ClientID=API_Playground&Expires=1744813219&Signature=Le-nXXD7OaxbgoBjUKaFzX7FGT69B9TuSt22fxE4QRQCdDvxxLdK~XXdmzy~NMFWcF2WbBiCXLHFJYsQuQQ7OSjW302qjIOkRXSeZgmUB9IlviP~K2z1PGYrsNn5mtdcI9SCojr6JLAT59kKMvvEh2kacII5O3PNUoufKh2z2qCbirKZuyu3EVAD8muM-vS9Dk7iH0Hzji9c7vxftQZiyASk7uEGyN94buUPeqY6z3T4b6~0RMCU0iEmhVIvl9jXlGn-qTQSztCalzK0zj4Lc7kvOLNqRL~CgYuAABg-Vexnwbd2dSlU-QJrZUukoMBkXzcMQqporoqvpUrlqntFcw__&Key-Pair-Id=APKAIDW27KNAZ6YUBN7A")
```
Output on Databricks:

```
Error: 403 - Forbidden
Error: 403 while downloading PDF: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>403 ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
<BR clear="all">
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: tzoAj2QX9YTyHZXTfj4GfAgzckGfKFsP5Xf5rbw3mDI5maS5LaKjtA==
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>
```
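One cause worth ruling out first: the download link is a CloudFront signed URL (note the `Expires`, `Signature` and `Key-Pair-Id` query parameters), so it stops working once the `Expires` time has passed. The sketch below assumes a canned-policy URL whose `Expires` value is a Unix timestamp in seconds, and reads it with the standard library so you can confirm the link is still live at the moment Databricks requests it. The function names are illustrative only, not part of any Refinitiv SDK.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url: str) -> Optional[datetime]:
    """Return the expiry time (UTC) encoded in a CloudFront signed URL.

    Assumption: canned-policy URL whose ``Expires`` query parameter holds a
    Unix timestamp in seconds. Returns None when the parameter is missing.
    """
    values = parse_qs(urlparse(url).query).get("Expires")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the URL's Expires time is at or before ``now`` (defaults to current UTC time)."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return False  # no Expires parameter, nothing to compare against
    if now is None:
        now = datetime.now(timezone.utc)
    return expiry <= now


# Hypothetical usage in the notebook, right before calling get_pdf_content:
# print(signed_url_expiry(pdf_url), is_expired(pdf_url))
```

For the URL in the post, `Expires=1744813219` corresponds to 2025-04-16 14:20:19 UTC. If the link is still within that window and Databricks gets a 403 while a laptop does not, the difference more likely lies in the network path the cluster uses (for example its outbound IP address) than in the URL itself.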
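The error page is generated by CloudFront and suggests contacting the website owner, so when raising this with the API provider it helps to quote the CloudFront request ID and the edge location that answered. A minimal sketch, assuming you pass in the status code, headers and body of the failed response (for example `response.status_code`, `response.headers` and `response.text` from `requests`). `X-Amz-Cf-Id`, `X-Amz-Cf-Pop` and `X-Cache` are standard CloudFront response headers; the helper name `cloudfront_diagnostics` is hypothetical.

```python
import re
from typing import Mapping

# CloudFront error pages print "Request ID: <id>" inside a <PRE> block.
_REQUEST_ID_RE = re.compile(r"Request ID:\s*([A-Za-z0-9_+/=-]+)")


def cloudfront_diagnostics(status_code: int, headers: Mapping[str, str], body: str) -> dict:
    """Collect CloudFront details worth quoting in a support request.

    Prefers the X-Amz-Cf-Id header for the request ID and falls back to the
    ID printed in the HTML error page when the header is absent.
    """
    # Lower-case header names so lookups work for plain dicts too.
    lowered = {name.lower(): value for name, value in headers.items()}
    match = _REQUEST_ID_RE.search(body)
    return {
        "status": status_code,
        "request_id": lowered.get("x-amz-cf-id") or (match.group(1) if match else None),
        "edge_pop": lowered.get("x-amz-cf-pop"),  # edge location that served the request
        "x_cache": lowered.get("x-cache"),
    }


# Hypothetical usage in the failing branch of get_pdf_content:
# print(cloudfront_diagnostics(response.status_code, response.headers, response.text))
```

Running it from both Databricks and a laptop and comparing `edge_pop` shows whether the two environments reach different CloudFront edge locations, which is useful context for the provider when they look up the request ID.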