
/entity/search scores

Using the DDS instance of Data Fusion, I call /entity/search with the following inputs:

searchstring = 'Toyota Motor Corp'

entitytype = 12 (organization)

In the results, one entity has a name that is an exact match for the input. However, that entity has a lower score than the top match, which is:

Toyota Motor Philippines Corp

This doesn't make sense to me. What kind of scoring algorithm are we using that produces this result?

Tags: api, data-fusion


Please be informed that a reply has been verified as correct in answering the question, and has been marked as such.
Thanks,
AHS

1 Answer


The score is computed by Solr and is largely based on the TF-IDF statistic.

How are documents scored?

By default, a TF-IDF-based scoring model is used. The basic scoring factors are:

  • tf stands for term frequency: the more times a search term appears in a document, the higher the score.
  • idf stands for inverse document frequency: matches on rarer terms count more than matches on common terms.
  • coord is the coordination factor: if a query has multiple terms, the more of them that match, the higher the score.
  • lengthNorm: matches in a shorter field score higher than matches in a longer field.
  • index-time boost: if a boost was specified for a document at index time, scores for searches that match that document are boosted.
  • query clause boost: a user may explicitly boost the contribution of one part of a query over another.

See also http://docs.datafusion.thomsonreuters.com/user-interface-searching-advanced-search
