
/entity/search scores

Using the DDS instance of Data Fusion, I call /entity/search with these inputs:

searchstring = 'Toyota Motor Corp'

entitytype = 12 (organization)

In the results, one entity's name is an exact match for the input, yet that entity has a lower score than the top match, which is:

Toyota Motor Philippines Corp

This doesn't make sense to me. What kind of scoring algorithm produces this result?

Tags: api, data-fusion

1 Answer


The score is computed by Solr and largely based on the tf-idf statistic.
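For reference, the tf-idf statistic can be written as the "practical scoring function" documented for Lucene's classic `TFIDFSimilarity` (Lucene 6 and earlier). Treat this as background only: the exact similarity, analyzers and boosts configured for the Data Fusion Solr index are not stated in this thread.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t \in q}\Big(\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)

\mathrm{tf}(t,d)=\sqrt{\mathrm{freq}(t,d)},\qquad
\mathrm{idf}(t)=1+\ln\frac{N}{\mathrm{df}(t)+1},\qquad
\mathrm{norm}(t,d)=\frac{\text{index-time boost}}{\sqrt{\text{number of terms in the field}}}
```

Here N is the number of documents in the index, df(t) is the number of documents containing term t, and coord(q,d) is the fraction of query terms that match document d.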

How are documents scored?

In Lucene's classic TF-IDF scoring model (the default in Solr before version 6.0, which switched the default to BM25), the basic scoring factors are:

  • tf (term frequency): the more times a search term appears in a document, the higher the score.
  • idf (inverse document frequency): matches on rarer terms count more than matches on common terms.
  • coord (coordination factor): if a query has multiple terms, the more of them that match, the higher the score.
  • lengthNorm: matches on a shorter field score higher than matches on a longer field.
  • index-time boost: if a boost was specified for a document at index time, scores for searches that match that document are boosted.
  • query clause boost: a user may explicitly boost the contribution of one part of a query over another.
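To see how these factors interact, here is a minimal sketch of the classic TF-IDF score for a single text field. It is a simplified model, not Data Fusion's or Solr's actual code: the whitespace tokenizer, the omission of query-clause boosts (all treated as 1.0) and of Lucene's lossy one-byte norm encoding, the hypothetical `doc_boost` parameter standing in for an index-time boost, and the helper names `tokenize` and `classic_score` are all assumptions for illustration. The toy corpus is likewise made up.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real Solr analyzers may also
    # strip punctuation, stem, remove stopwords, etc.
    return text.lower().split()


def classic_score(query, doc, corpus, doc_boost=1.0):
    """Simplified classic TF-IDF score of `doc` for `query`.

    `corpus` is the list of all indexed documents (used for idf).
    `doc_boost` is a hypothetical stand-in for an index-time boost.
    """
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    if not q_terms or not d_terms:
        return 0.0
    d_freq = Counter(d_terms)
    num_docs = len(corpus)

    def idf(term):
        # idf(t) = 1 + ln(N / (df(t) + 1))
        doc_freq = sum(1 for d in corpus if term in tokenize(d))
        return 1.0 + math.log(num_docs / (doc_freq + 1))

    # queryNorm: 1 / sqrt(sum of squared query-term weights)
    query_norm = 1.0 / math.sqrt(sum(idf(t) ** 2 for t in q_terms))

    # norm(t, d): lengthNorm (1 / sqrt(number of terms in the field)) times the boost
    norm = doc_boost / math.sqrt(len(d_terms))

    total = 0.0
    matched = 0
    for t in q_terms:
        if d_freq[t] > 0:
            matched += 1
            tf = math.sqrt(d_freq[t])  # tf(t, d) = sqrt(frequency)
            total += tf * idf(t) ** 2 * norm

    # coord: fraction of query terms found in the document
    coord = matched / len(q_terms)
    return coord * query_norm * total


if __name__ == "__main__":
    # Hypothetical toy index of organization names
    corpus = ["Toyota Motor Corp", "Toyota Motor Philippines Corp",
              "Honda Motor Co", "Ford Motor Co"]
    query = "Toyota Motor Corp"
    for name in corpus[:2]:
        print(name, classic_score(query, name, corpus))
    # Hypothetical index-time boost on the longer name
    print("boosted:", classic_score(query, corpus[1], corpus, doc_boost=1.5))
```

In this toy model both names match all three query terms, so tf, idf and coord are identical and only lengthNorm differs: the exact match wins by a factor of 2/√3. Nothing rewards an exact match as such, though, so a modest boost (1.5 here) is enough to flip the order. That suggests the ranking in the question might come from factors this sketch leaves out, such as boosts, additional matched fields, or analyzer behaviour.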

See also http://docs.datafusion.thomsonreuters.com/user-interface-searching-advanced-search
