How to retrieve a news story via the Eikon Data API if the underlying news is a PDF file

Hi team,

How can I retrieve a news story via the Eikon Data API if the underlying news is a PDF file? Many thanks.

Regards,

Sunny

Below is my Python code:

---------------------------------------------------------------------------------------------

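import eikon as ek

ek.set_app_key('YOUR_APP_KEY')  # placeholder: replace with your own app key

ric = '3333.HK'
start_date = '10/5/2020'
end_date = '10/5/2021'

# Search headlines for the RIC within the date range
news_headline = ek.get_news_headlines(f'{ric} HIIS UNAUDITED OPERATING',
                                      date_from=start_date,
                                      date_to=end_date,
                                      count=100)

# Fetch the story body of the third headline (positional index 2)
news_body = ek.get_news_story(news_headline['storyId'].iloc[2])
news_body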



Best Answer

  • Jirapongse

    @sunny.to@refinitiv.com

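    According to the response on a related thread, retrieving the content of PDF-based news stories is still not available in the Eikon Data API.

    However, I can get the PDF file with this URL: https://newsfile.refinitiv.com/getnewsfile/v1/story?guid=urn:newsml:reuters.com:20210701:nHKS6BgsFL. The same link is embedded in the story body returned by get_news_story, shown below: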

    '<div class="storyContent" lang="en"><p>UNAUDITED OPERATING STATISTICS OF PROPERTIES OF THE GROUP FOR JUNE 2021(with URL)</p><p class="line-break"><br/></p><p class="line-break"><br/></p><p class="line-break"><br/></p><p class="line-break"><br/></p><p>Exchange T1 category code 10000:"Announcements and Notices"</p><p class="line-break"><br/></p><p class="line-break"><br/></p><p>Exchange T2 category code 19800:"Other – Trading Update"</p><p class="line-break"><br/></p><p class="line-break"><br/></p><p><a href="reuters://screen/verb=Open/url=cpurl%3A%2F%2Fviews.cp.%2Fnewsfile%2Fgetnewsfile%2Fv1%2Fstory%3Fguid%3Durn%3Anewsml%3Areuters.com%3A20210701%3AnHKS6BgsFL" data-type="cpurl" data-cpurl="cpurl://views.cp./newsfile/getnewsfile/v1/story?guid=urn:newsml:reuters.com:20210701:nHKS6BgsFL" translate="no">http://newsfile.refinitiv.com/getnewsfile/v1/story...</a></p><p class="line-break"><br/></p><p class="line-break"><br/></p><p class="line-break"><br/></p><p>Double click on the URL above to view the article. Please note that internet access is required. If you experience problem accessing the internet, please consult your network administrator or technical support</p><p class="line-break"><br/></p><p class="line-break"><br/></p><p class="line-break"><br/></p><p>Latest version of Adobe Acrobat reader is recommended to view PDF files.  The latest version of the reader can be obtained from <a href="http://www.adobe.com/products/acrobat/readstep2.html" data-type="url" class="tr-link" translate="no">http://www.adobe.com/products/acrobat/readstep2.html</a></p></div>'
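For anyone who wants to fetch that URL from a script rather than a browser, here is a minimal sketch using only the Python standard library. The endpoint is copied from the URL in the answer above. The function names (`build_newsfile_url`, `looks_like_pdf`, `download_pdf`) are hypothetical helpers. The thread does not confirm whether the server accepts requests made outside an Eikon session, so treat the download step as an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the URL quoted in the answer above.
NEWSFILE_ENDPOINT = "https://newsfile.refinitiv.com/getnewsfile/v1/story"


def build_newsfile_url(guid):
    """Return the news file URL for a story GUID (hypothetical helper)."""
    # safe=":" keeps the colons in the GUID readable instead of %3A.
    query = urlencode({"guid": guid}, safe=":")
    return NEWSFILE_ENDPOINT + "?" + query


def looks_like_pdf(data):
    """PDF files start with the magic bytes '%PDF-'."""
    return data.startswith(b"%PDF-")


def download_pdf(guid, path, timeout=30):
    """Download the PDF to `path`.

    Assumption: the endpoint is reachable from a plain HTTP client.
    It may instead require an authenticated session.
    """
    with urlopen(build_newsfile_url(guid), timeout=timeout) as response:
        data = response.read()
    if not looks_like_pdf(data):
        raise ValueError("Response is not a PDF; the link may need authentication")
    with open(path, "wb") as handle:
        handle.write(data)
    return path
```

A call would look like `download_pdf("urn:newsml:reuters.com:20210701:nHKS6BgsFL", "story.pdf")`. The magic-byte check is there so that an HTML login or error page is not silently saved as a PDF.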

Answers

  • Hi @Jirapongse,

    Thanks for your reply! How can I retrieve the complete link using Python code? I can see that it is split across several attributes in the news body.

    Thanks,

    Danni

  • Jirapongse

    @Danni Qiu

    You need to parse the news body to get the link.

    For example, you can use the BeautifulSoup library to parse the news body.

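    from bs4 import BeautifulSoup

    # Parse the HTML news body returned by ek.get_news_story
    parsed_html = BeautifulSoup(news_body, 'html.parser')
    # The PDF link is the anchor tagged with data-type="cpurl"
    link = parsed_html.find('a', attrs={'data-type': 'cpurl'})
    link_cpurl = link['data-cpurl']
    # The visible link text is truncated with '...', so take its base URL
    # and append the query string from the data-cpurl attribute
    link_text = link.text.split('...')[0] + "?" + link_cpurl.split('?')[1]
    link_text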

    This is just sample code and may not work in all cases.
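As a variation on the sample above, the sketch below builds the link from the `data-cpurl` attribute alone, so it does not depend on the truncated link text. It uses only the standard library. The mapping from `cpurl://views.cp./newsfile/...` to `https://newsfile.refinitiv.com/...` is an assumption inferred from the single example in this thread, and `CpurlCollector` and `extract_pdf_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: host taken from the working URL quoted earlier in this thread.
NEWSFILE_HOST = "https://newsfile.refinitiv.com"


class CpurlCollector(HTMLParser):
    """Collect the data-cpurl value of every <a data-type="cpurl"> tag."""

    def __init__(self):
        super().__init__()
        self.cpurls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("data-type") == "cpurl":
            cpurl = attr_map.get("data-cpurl")
            if cpurl:
                self.cpurls.append(cpurl)


def extract_pdf_links(news_body):
    """Return news file URLs rebuilt from the cpurl links in a story body."""
    collector = CpurlCollector()
    collector.feed(news_body)
    links = []
    for cpurl in collector.cpurls:
        parts = urlsplit(cpurl)
        # Only handle the pattern seen in this thread:
        # cpurl://views.cp./newsfile/getnewsfile/v1/story?guid=...
        if parts.path.startswith("/newsfile/"):
            path = parts.path[len("/newsfile"):]
            links.append(f"{NEWSFILE_HOST}{path}?{parts.query}")
    return links
```

Calling `extract_pdf_links(news_body)` returns a list, which is empty when the story has no matching link, so the caller can check for that case instead of hitting an exception on a missing tag.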