[Python API] How to fuzzy search for companies by name?

Hi, thanks for taking a look. See the subject line.

Ideally, the search would retrieve the top N entries.

Even better, it would let me constrain the search to companies that are registered in a given country (exact match by code), operate in a given industry sector (exact match by classification code, fuzzy match on keywords, or a "contains" match against a list of options), and/or have turnover (or other financials) in a given range, while still returning the top N of those satisfying the constraints.

How can that be achieved using the Python API?

Thanks in advance!


Fuzzy search is not available in Eikon. You can, however, combine an exact string match on the company name with other criteria in the Eikon Screener. Here's an example returning public companies that contain the word "resources" in the company name, are headquartered in the US, and fall under the Financials sector according to the Thomson Reuters Business Classification (TRBC) scheme.

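import eikon as ek
ek.set_app_key('YOUR_APP_KEY')  # placeholder: replace with your own app key

# Keyword to match; the inner double quotes are part of the Screener expression
name = '"resources"'
exp = ('SCREEN(U(IN(Equity(active,public,primary))),'
       ' Contains(TR.CommonName,%s),'
       'IN(TR.HQCountryCode,"US"),'
       'IN(TR.TRBCEconSectorCode,"55"))' % name)
fields = ['TR.CompanyName']
# get_data returns a (DataFrame, error) tuple
df, err = ek.get_data(exp, fields)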
To construct a Screener expression, follow the wizard behind the Screener button under the Thomson Reuters tab in the Excel ribbon. You may also want to watch a series of short video tutorials titled "Working with the Screener", available from the main Eikon menu under Help - Tutorials and Training. Type "Screener" in the search bar and click on "Working with the Screener" in the results.
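Since fuzzy matching is not available on the Eikon side, one possible workaround is to pull a broader candidate list with the Screener and rank it locally. The sketch below is not an Eikon feature: it uses Python's standard `difflib` module, the helper name `top_n_fuzzy` is hypothetical, and the candidate names are made-up sample data standing in for the name column of the DataFrame returned by `ek.get_data`.

```python
import difflib


def top_n_fuzzy(query, names, n=5, cutoff=0.0):
    """Return up to n (score, name) pairs, best match first.

    Scores are difflib similarity ratios in [0, 1], computed
    case-insensitively; names scoring below cutoff are dropped.
    """
    query_norm = query.casefold()
    scored = []
    for name in names:
        score = difflib.SequenceMatcher(None, query_norm, name.casefold()).ratio()
        if score >= cutoff:
            scored.append((score, name))
    # Sort by score only, highest first; ties keep their input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Made-up sample names, not real Screener output
candidates = [
    "Northfield Resources Inc",
    "Acme Resources Corp",
    "Resources Holdings Ltd",
]
for score, name in top_n_fuzzy("Acme Resource", candidates, n=2):
    print("%.2f  %s" % (score, name))
```

The Screener does the filtering by country, sector and keyword; the local step only orders what comes back, so the top N are the top N among the screened candidates, not among all companies.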

Thanks, Alex. Are there boolean operators that can be used in conjunction with Contains?

Imagine a complex name made up of multiple words, which may appear in different forms, including abbreviations. One way to search for such an entry would be to split it into words or n-grams, potentially at character level, and use AND and OR to build up a query expression.

For example:

OR(
  AND(
    Contains(..., "that-co"),
    Contains(..., "inc")
  ),
  AND(
    Contains(..., "that"),
    Contains(..., "company"),
    Contains(..., "incorporated")
  )
)

Follow the Screener wizard, launched through the Screener button in the Excel ribbon, to learn what is and isn't possible through the Screener. Once you have set up your criteria in the wizard, click the Insert Screen button to paste the formula into the Excel worksheet. The Screener expression in the first argument of the resulting =TR worksheet function can be copied and pasted into a Python script, with minor modifications to account for the syntactic differences in string handling between Excel formulas and Python.
It is possible to use logical expressions that combine multiple keywords with the Contains statement and the TR.CommonName field:

# Assumes eikon is imported as ek and an app key has been set
exp = ('SCREEN(U(IN(Equity(active,public,primary))),' +
       ' (Contains(TR.CommonName,"resources") OR' +
       ' Contains(TR.CommonName,"energy") AND' +
       ' Contains(TR.CommonName,"Corp")),' +
       ' IN(TR.HQCountryCode,"US"))')
fields = ['TR.CompanyName']
df, err = ek.get_data(exp, fields)  # returns a (DataFrame, error) tuple
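For names split into several words or abbreviations, the OR-of-ANDs structure asked about earlier in the thread can be assembled programmatically instead of by hand. This is only a sketch of the string building: the helper name `contains_any_of_all` is hypothetical, and it assumes the Screener accepts nested parentheses around each AND group, which the thread does not confirm. Check the generated expression against the Screener wizard before relying on it.

```python
def contains_any_of_all(field, keyword_groups):
    """Build '((A AND B) OR (C AND D ...))' from groups of keywords.

    Each inner list is AND-ed together; the groups are OR-ed.
    Assumption: the Screener accepts this nested grouping.
    """
    and_clauses = []
    for group in keyword_groups:
        terms = ['Contains(%s,"%s")' % (field, keyword) for keyword in group]
        and_clauses.append('(' + ' AND '.join(terms) + ')')
    return '(' + ' OR '.join(and_clauses) + ')'


# Keyword groups taken from the example in the question above
clause = contains_any_of_all(
    'TR.CommonName',
    [['that-co', 'inc'], ['that', 'company', 'incorporated']],
)
exp = ('SCREEN(U(IN(Equity(active,public,primary))),'
       ' %s,'
       ' IN(TR.HQCountryCode,"US"))' % clause)
print(exp)
```

The resulting `exp` string can then be passed to `ek.get_data` in the same way as the hand-written expressions in this thread.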

This question is similar, but its answers do not cover fuzzy matching or additional constraints:

https://community.developers.refinitiv.com/questions/37096/screening-using-name-contains.html?redirectedFrom=37019
