I'm currently working on a market sentiment report and use Python's get_news_headlines function to collect the data. Because the function is run once per currency pair, a news item that references more than one pair can appear under several filters, producing duplicates. The dashboard shows each item's sentiment and how it relates to the currency pair in question. The problem is that a news article is sometimes really about a single currency pair, with its sentiment directed at that pair, yet it also mentions other pairs in passing. When the filters for those other pairs are applied, the story is caught as well, so the sentiment attached to it does not actually correspond to those pairs, and the aggregated statistics become misleading. Is there any way to resolve this problem, please?
Examples of filters used currently: EURGBP=, EUR=, EURJPY=
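For reference, since the same storyId is returned once per matching pair query, the duplicate side of the problem can be mitigated by deduplicating on storyId before computing sentiment. A minimal sketch with hypothetical records (in practice each record would come from a separate get_news_headlines call per filter):

```python
# Hypothetical headline records; in practice each would come from a
# separate ek.get_news_headlines(<pair filter>) call.
records = [
    {'storyId': 'urn:newsml:1', 'pair': 'EUR=',    'headline': 'ECB decision moves euro'},
    {'storyId': 'urn:newsml:2', 'pair': 'EURGBP=', 'headline': 'Euro gains on sterling'},
    {'storyId': 'urn:newsml:1', 'pair': 'EURJPY=', 'headline': 'ECB decision moves euro'},
]

# Keep only the first occurrence of each storyId across all pair queries.
seen = set()
deduped = []
for rec in records:
    if rec['storyId'] not in seen:
        seen.add(rec['storyId'])
        deduped.append(rec)

print(len(deduped))  # 2
```

This removes repeated stories, though it does not by itself decide which pair a shared story is really about.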
Example of this issue:
Thanks in advance.
@azzopardic So this is essentially an NLP (Natural Language Processing) and classification problem, not really an API issue. If I understand correctly, you want to keep articles that talk about only one currency pair and ignore articles that mention several. In that case you would need to download the full news story and scan it for mentions of any other pairs (so you should keep a list of all pairs), then only pass the item to the sentiment engine if the pair you are interested in is the only one mentioned. Alternatively, you could weight the article's sentiment by the number of mentions of each currency pair. Please see the following example:
```python
import eikon as ek
from IPython.display import HTML
from bs4 import BeautifulSoup

df = ek.get_news_headlines('GBP=', count=100)
df
```
This returns us a list of headlines and storyIds. We can then download an individual news story, e.g. `story = ek.get_news_story(df['storyId'].iloc[0])`.
We could then count the number of mentions of particular phrases, in our case currency pairs or even RICs:
```python
from bs4 import BeautifulSoup

fx_list = ['GBP', 'JPY', 'USD', 'EUR', 'EUR/GBP', 'GBP/EUR']

# Parse the story HTML once, then count exact word matches for each pair.
soup = BeautifulSoup(story, 'html.parser')
words = soup.get_text().split()
for fx in fx_list:
    wordcount = sum(word == fx for word in words)
    print(fx, wordcount)
```
So once you have this data you could then decide whether the article is mostly about one pair or about several. I don't think this would be the holy grail, but it's a simple way for you to experiment a bit and see how things work in real life. I hope this simple sample can help.
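One simple way to turn such raw counts into a decision (an illustration of the weighting idea above, not part of any API) is to compute each pair's share of total mentions and attribute sentiment only to pairs whose share clears a threshold. The story text below is hypothetical; in practice it would be the text extracted from the downloaded story:

```python
import re

# Hypothetical story text; in practice this would come from the
# downloaded and parsed news story.
text = "EUR/GBP rose after the ECB meeting; EUR/GBP gains outpaced a brief EUR/JPY move."

fx_list = ['EUR/GBP', 'EUR/JPY', 'GBP/USD']

# Count whole-token mentions of each pair (re.escape handles the '/').
counts = {fx: len(re.findall(r'\b' + re.escape(fx) + r'\b', text))
          for fx in fx_list}
total = sum(counts.values()) or 1  # avoid division by zero

# Share of mentions per pair; attribute sentiment only above a threshold.
weights = {fx: n / total for fx, n in counts.items()}
relevant = [fx for fx, w in weights.items() if w >= 0.5]

print(counts)
print(relevant)
```

The 0.5 threshold is arbitrary; you would tune it against your own data, or use the weights directly to scale the sentiment contribution per pair.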
I'm not sure I'm following you. Please correct me if I'm wrong, but from your description it seems the issue lies squarely within your application's own logic. The get_news_headlines method only returns the headlines, and the sentiment analysis you perform on them does not utilize any Refinitiv capabilities, right? If that's correct, I don't see how this forum could help you.
Hello @azzopardic ,
Fully agree with @jason.ramchandani and @Alex Putkov's answers. I would also like to mention that if you require the relevance of a news item to a specific RIC as a score (e.g. 0.9 is very relevant and 0.15 is just a passing mention), along with other news analytics, this information is available for realtime news as part of the Refinitiv Machine Readable News Analytics product, with history available via the Refinitiv Machine Readable News History product. These are separate enterprise products, not part of Eikon/Refinitiv Workspace.
If this may be of interest to your organization, please discuss your requirements in detail with your Refinitiv account team so they are able to help you best address them.