A 'get_news_story' request into a dataframe

Hi, I am using ek.get_news_headlines to display a dataframe of 5 news articles for a particular company, i.e.:

df = ek.get_news_headlines('GOOG.O AND Language:LEN', date_from='2021-01-01T09:00:00', date_to='2023-06-30T23:59:59', count = 5)


The above works fine for displaying the last 5 storyIds, but I'd like to use ek.get_news_story to loop through the rows in the above df and pull the article for each storyId into another dataframe. When I try the snippet below, which I found on another post, I just get an HTML dump from the first storyId only.

for idx, storyId in enumerate(df['storyId'].values):  # for each row in our df dataframe
    newsText = ek.get_news_story(storyId)  # get the news story
    time.sleep(5)  # sleep for 5 seconds
    print(newsText)


I'd ideally like to see 1 new dataframe containing 5 rows (one row for each news article), one column with the news article's title, another column containing just the text from each article (no HTML tags!), and then another column of the URL.

Any help would be greatly appreciated.

Thank you!

Tags: eikon-data-api, #technology, rdp-api, news, pandas
@di.ti

Thank you for reaching out to us.

To get the story text (without HTML tags), you need to use the Refinitiv Data Library for Python. The example code is available on GitHub.

The code looks like this:

import time
import pandas as pd
import refinitiv.data as rd

rd.open_session()  # assumes a Workspace/RDP session is already configured

headlines = rd.news.get_headlines('GOOG.O AND Language:LEN',
                                  start='2021-01-01T09:00:00',
                                  end='2023-06-30T23:59:59',
                                  count=5)
rows = []  # DataFrame.append was removed in pandas 2.0, so collect rows first
for index, row in headlines.iterrows():
    newsText = rd.news.get_story(row['storyId'], format=rd.news.Format.TEXT)  # get the news story as plain text
    rows.append({'headline': row['headline'], 'story': newsText, 'storyid': row['storyId']})
    time.sleep(5)  # pause between requests

df = pd.DataFrame(rows, columns=['headline', 'story', 'storyid'])
df

The output is:

1694063737808.png
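If you need to stay on the Eikon Data API, where ek.get_news_story returns HTML, one option is to strip the markup yourself. The following is only a minimal sketch using the Python standard library; the helper name html_to_text is hypothetical (it is not part of any Refinitiv library), and real story HTML may need extra handling that is not shown here.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; ignores tags and anything inside <script>/<style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div"):
            self._chunks.append("\n")  # treat block-level tags as line breaks

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace on each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Hypothetical helper: return the visible text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

With this in place, something like html_to_text(ek.get_news_story(storyId)) could replace the raw HTML in the loop from the question.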




Thank you, this worked. Any idea of how I can include a column for the timestamp of each article too?

Please try this one:

import time
import pandas as pd
import refinitiv.data as rd

rd.open_session()  # assumes a Workspace/RDP session is already configured

headlines = rd.news.get_headlines('GOOG.O AND Language:LEN',
                                  start='2021-01-01T09:00:00',
                                  end='2023-06-30T23:59:59',
                                  count=5)
headlines = headlines.reset_index()  # turn the versionCreated index into a regular column
rows = []  # DataFrame.append was removed in pandas 2.0, so collect rows first
for index, row in headlines.iterrows():
    newsText = rd.news.get_story(row['storyId'], format=rd.news.Format.TEXT)  # get the news story as plain text
    rows.append({'timestamp': row['versionCreated'], 'headline': row['headline'],
                 'story': newsText, 'storyid': row['storyId']})
    time.sleep(5)  # pause between requests

df = pd.DataFrame(rows, columns=['timestamp', 'headline', 'story', 'storyid'])
df
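The same loop can also be wrapped in a small function so that one failing story does not abort the whole run. This is just a sketch under a few assumptions: collect_stories and its parameters are hypothetical names, the input is a list of dicts with the storyId, headline and versionCreated keys used above, and the story fetcher is passed in as a callable so the function itself does not depend on any session.

```python
import time


def collect_stories(headline_records, fetch_story, pause_seconds=0.0):
    """Hypothetical helper: build one row per headline, fetching each story.

    headline_records: iterable of dicts with at least a 'storyId' key.
    fetch_story: callable taking a story id and returning the story text.
    """
    rows = []
    for record in headline_records:
        story_id = record["storyId"]
        try:
            story = fetch_story(story_id)
            error = None
        except Exception as exc:  # keep going if one story cannot be retrieved
            story = None
            error = str(exc)
        rows.append({
            "timestamp": record.get("versionCreated"),
            "headline": record.get("headline"),
            "story": story,
            "storyid": story_id,
            "error": error,
        })
        if pause_seconds:
            time.sleep(pause_seconds)  # throttle between requests
    return rows
```

Assuming the headlines dataframe from the code above, it could be used roughly as pd.DataFrame(collect_stories(headlines.reset_index().to_dict('records'), lambda sid: rd.news.get_story(sid, format=rd.news.Format.TEXT), pause_seconds=5)).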

Thank you @Jirapongse, this was exactly what I was looking for!

One last question please re: this topic :)

Is it possible to do a freeform search as part of this news query? i.e. if I wanted to pull news articles into a data frame where "Elon Musk SpaceX" was my search term?

Thank you!


It's OK @Jirapongse, I worked it out:

get_headlines('4297089638 AND SIG AND Language:LEN',


:)
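For anyone assembling such query strings in code, here is a small sketch of a helper that joins terms with AND and appends the language filter, in the same shape as the query above. build_news_query is a hypothetical name, and quoting multi-word terms so they are treated as a phrase is an assumption about the news query syntax that you should verify against your own results.

```python
def build_news_query(*terms, language="LEN"):
    """Hypothetical helper: join search terms with AND and add a language filter.

    Multi-word terms are wrapped in double quotes (assumed phrase syntax).
    """
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # ignore empty terms
        parts.append(f'"{term}"' if " " in term else term)
    if language:
        parts.append(f"Language:{language}")
    return " AND ".join(parts)
```

For example, build_news_query('4297089638', 'SIG') reproduces the string '4297089638 AND SIG AND Language:LEN', and build_news_query('Elon Musk', 'SpaceX') gives '"Elon Musk" AND SpaceX AND Language:LEN', which can then be passed as the first argument to get_headlines.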

