How to clear special characters in news extracted from the Eikon API

Hi team, I have a question about retrieving news with the Eikon API. The news body contains many special characters, hyperlinks, and delimiters. Is there any way to clean them up and keep only the raw text? I've attached my code below, along with the original news from Workspace. Thanks for the help.

1700535282789.png

1700535364526.png

Best Answer

Answers

  • Hi Jira, thanks for the reply. The new command did help clean up the special characters, but it truncated quite a lot of text.

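    import refinitiv.data as rd

    rd.open_session()  # assumes a default session (e.g. a running Workspace desktop) is configured
    text = rd.news.get_story("urn:newsml:reuters.com:20231113:nL4S3CE14O:1", format=rd.news.Format.TEXT)
    print(text)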

    Original news:

    1700563399600.png

    News extracted:

    1700563414385.png

    Every line was truncated right before a hyperlink or RIC. Is this a bug, or is there an adjustment I need to make? Thank you.

  • Jirapongse ✭✭✭✭✭

    @Julian.Bai

    This is what I get from the API.

    1700568715153.png


  • Thanks Jira, I'll try again later on my side.
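
If the TEXT format keeps dropping the text around hyperlinks and RICs, one possible workaround is to request the story as HTML and strip the markup locally. The sketch below is only an illustration using the Python standard library: the helper name `html_to_text` and the class `_TextExtractor` are hypothetical, and it assumes the story body is available as an HTML string (for example from `rd.news.get_story(..., format=rd.news.Format.HTML)`, which is an assumption here and not confirmed in this thread). Link text and RICs are kept as plain text; only tags, scripts, and styles are dropped.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text; drops tags, scripts, and styles (hypothetical helper)."""

    _SKIP = {"script", "style"}
    _BLOCK = {"p", "div", "br", "li", "tr", "pre", "h1", "h2", "h3"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; and &amp; into characters
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self._parts.append("\n")  # block-level tags become line breaks

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self._BLOCK:
            self._parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)  # anchor text (links, RICs) is kept as-is

    def text(self):
        return "".join(self._parts)


def html_to_text(html):
    """Return the visible text of an HTML fragment with tidy whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = parser.text().replace("\xa0", " ")  # non-breaking spaces to plain spaces
    text = re.sub(r"[ \t]+", " ", text)        # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)       # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)     # allow at most one blank line
    return text.strip()


# Made-up sample input, not an actual news story
sample = (
    '<div><p>Shares of <a href="https://example.com">Acme Corp</a> '
    "&lt;ACME.N&gt; rose 5%.</p><p>More&nbsp;details&amp;figures.</p></div>"
)
print(html_to_text(sample))
```

With the story HTML in a variable, the call would simply be `html_to_text(story_html)`. Whether this fully matches the layout shown in Workspace depends on the markup the API returns, so the tag lists above may need adjusting.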
