Sitecore Solr eDisMax: Boosting, Tuning, Debugging

Welcome to part 2 of my Sitecore Solr eDisMax series. In part one I introduced Solr eDisMax and how to get started with it in your Sitecore solution. In this second post I will explain relevancy, boosting, tuning and debugging.

What is Relevancy?

From the Solr Relevancy FAQ:

“Relevancy is the quality of results returned from a query, encompassing both what documents are found, and their relative ranking.”

You can easily see why Solr returned the documents that it did by using the “explain” feature. See the “Explain” section below for more details.

Query Field (qf)

In the qf parameter you list the fields (with optional boosts) you want Solr to query when evaluating your keyword.

qf=headline_t^2 subheadline_t^1.1 maincontent_t
  • headline_t is boosted by 2
  • subheadline_t is boosted by 1.1
  • maincontent_t has no boost applied

We boost subheadline_t lower than headline_t because a keyword match in headline_t is more important than a match in subheadline_t (and, by extension, maincontent_t).
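As a minimal sketch of how such a qf value could be assembled in code, the Python below formats a mapping of field names to boosts. The helper name build_qf is made up for illustration; the field names and boosts are the ones from the example above.

```python
def build_qf(boosts):
    """Format a {field: boost} mapping as an eDisMax qf value.

    A boost of None means "no explicit boost", so the field is listed bare.
    """
    parts = []
    for field, boost in boosts.items():
        parts.append(field if boost is None else f"{field}^{boost}")
    return " ".join(parts)


qf = build_qf({"headline_t": 2, "subheadline_t": 1.1, "maincontent_t": None})
print(qf)  # headline_t^2 subheadline_t^1.1 maincontent_t
```

Keeping the boosts in one mapping makes it easier to adjust them together when tuning.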

Phrase Fields (pf)

The optional pf parameter is used to boost the score of documents when all of the search terms (i.e. the q parameter) appear close together.

For example, if the user searches for “intellectual property” (i.e. that is the value of the q parameter), then Solr documents where “intellectual” and “property” are close or directly adjacent to each other will receive a boost to their score. Conversely, if “intellectual” and “property” are a few words apart, the boost added to the document score will be lower or even 0.
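To make this concrete, here is a rough sketch of the request parameters for that search, URL-encoded with Python's standard library. The choice of fields in pf is an assumption for illustration only; pick the fields where phrase matches matter in your own index.

```python
from urllib.parse import urlencode

# Hypothetical parameter set: the pf fields are an assumption, not a recommendation.
params = {
    "defType": "edismax",
    "q": "intellectual property",
    "qf": "headline_t^2 subheadline_t^1.1 maincontent_t",
    "pf": "headline_t maincontent_t",
}

query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to the select handler URL of your index.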

Boost Query (bq)

The optional bq parameter lets you add an extra query clause that boosts relevancy, such as pushing more recently created or updated documents towards the top of the result set.

A great example from the official Solr documentation is the ability to boost recent documents:

bq=date:[NOW/DAY-1YEAR TO NOW/DAY]

Boost Function (bf)

The optional bf parameter is nearly identical in functionality to the bq parameter, but provides a more function-like (and shorthand) syntax for writing boost functions.

The Solr bf documentation states: “Specifying functions with the bf parameter is essentially just shorthand for using the bq param combined with the {!func} parser.”

For example, if you want to show the most recent documents first, you could use either of the following:

bf=recip(rord(creationDate),1,1000,1000)
  ...or...
bq={!func}recip(rord(creationDate),1,1000,1000)

As for Sitecore documents, you can use the item created date field “__smallcreateddate_tdt”, found on every item, to boost the most recently created items:

bq={!func}recip(rord(__smallcreateddate_tdt),1,1000,1000)
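To get a feel for what that function contributes, recall that Solr's recip(x,m,a,b) evaluates to a / (m*x + b), and that rord gives the reverse ordinal of the field value (so the newest date should have the lowest number). The sketch below simply evaluates the formula in Python for a few hypothetical ordinals; it does not talk to Solr.

```python
def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


# Hypothetical reverse ordinals: 1 = most recently created item.
for reverse_ordinal in (1, 100, 1000, 10000):
    boost = recip(reverse_ordinal, 1, 1000, 1000)
    print(reverse_ordinal, round(boost, 3))
```

With these constants the value stays close to 1 for the newest items, halves at ordinal 1000, and keeps shrinking for older items.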

Minimum Match (mm)

Minimum match is a very powerful tool to quickly adjust the relevancy of all documents returned in a result set.

The mm parameter specifies a minimum number of clauses or phrases that must match in a query.

“By default, all words or phrases specified in the q parameter are treated as “optional” clauses unless they are preceded by a “+” or a “-”. When dealing with these “optional” clauses, the mm parameter makes it possible to say that a certain minimum number of those clauses must match” (without having to precede each clause with “+” or “-”).

There are many other acceptable values for mm; I recommend reviewing the documentation.

I recommend starting with the value of “100%” and perhaps lowering it to 75%. In my opinion, using percentages makes more sense as they are dynamic; hard integers can be problematic since the user’s search query could contain any number of search terms.

On my project, this provided a drastic improvement in relevant results when searching for multiple terms. For example, we saw a dramatic decrease in “random” results (i.e. “why does this page show up”), and in general the entire result set simply had a better feel to it.
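To illustrate why percentages adapt to the query length, here is a small sketch that computes how many optional clauses would have to match for a given mm value. It assumes only the two simplest forms (a positive integer and a positive percentage), rounds percentages down, and clamps the result between 1 and the number of clauses, as the Solr documentation describes. The function name is made up, and negative or conditional mm expressions are deliberately not handled.

```python
def required_clauses(mm, optional_clauses):
    """Return how many optional clauses must match for a simple mm value.

    Supports only "N" and "N%" with positive N; other mm forms are out of scope.
    """
    if mm.endswith("%"):
        # Integer arithmetic rounds the percentage down.
        needed = optional_clauses * int(mm[:-1]) // 100
    else:
        needed = int(mm)
    # Never fewer than 1, never more than the number of clauses.
    return max(1, min(needed, optional_clauses))


for terms in (2, 3, 4, 8):
    print(terms, required_clauses("75%", terms), required_clauses("2", terms))
```

With “75%”, a three-term query needs two matching terms and an eight-term query needs six, whereas a fixed “2” stays at two regardless of how long the query is.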

Tuning

  • When adjusting boost values, avoid increasing them; try reducing the boost values of the other fields instead.
  • Use decimals for boosting, e.g. headline_t^1.2, subheadline_t^1.1. This keeps the boost multiplier in the relevance calculation low. Remember all these values are just relative to one another, so keep them small.
  • Try to keep boost values small, ideally no more than 2 to 10. Large numbers (e.g. 50, 100) will disproportionately inflate relevancy scores and ultimately deliver more confusing, if not less relevant, results.
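The point about boosts being relative can be sketched with a deliberately simplified model of a dismax-style score: each field's raw score is multiplied by its boost and the best field wins (plus an optional tie fraction of the rest). The raw field scores and document names below are invented for illustration, and real Solr scoring has more moving parts, so treat this as a toy.

```python
def dismax_score(field_scores, boosts, tie=0.0):
    """Toy dismax: best boosted field score plus tie * the remaining ones."""
    boosted = [field_scores.get(field, 0.0) * boost for field, boost in boosts.items()]
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


# Invented raw per-field scores for three hypothetical pages.
docs = {
    "page-a": {"headline_t": 1.0, "maincontent_t": 0.4},
    "page-b": {"maincontent_t": 1.5},
    "page-c": {"subheadline_t": 1.2},
}


def rank(boosts):
    """Order the pages from highest to lowest toy score."""
    return sorted(docs, key=lambda name: dismax_score(docs[name], boosts), reverse=True)


small = {"headline_t": 1.2, "subheadline_t": 1.1, "maincontent_t": 1.0}
scaled = {field: boost * 10 for field, boost in small.items()}
skewed = {"headline_t": 50, "subheadline_t": 1.1, "maincontent_t": 1.0}

print(rank(small))   # ['page-b', 'page-c', 'page-a']
print(rank(scaled))  # same order: multiplying every boost by 10 changes nothing
print(rank(skewed))  # ['page-a', 'page-b', 'page-c']
```

In this model, multiplying every boost by the same factor leaves the ranking untouched, while one oversized boost lets a weak headline match outrank much stronger matches in other fields.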

Debugging

Explain

Explain will show how Solr calculated the relevancy score, which determines the order of the results returned.

There are (at least) two ways to show the explain score in the Solr dashboard:

  1. In the fl field, enter [explain].
  2. Check the “debugQuery” checkbox.
    1. (Consider setting the “wt” field to XML for more readable output.)
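The same options can also be passed as plain request parameters outside the dashboard. The sketch below builds such a URL in Python; the host and port are assumptions (a default local install), and the core name is taken from the index name that appears in the explain output in this post. Both techniques are combined here, although either one alone is enough.

```python
from urllib.parse import urlencode

# Assumed host/port; adjust the core name to your own index.
base_url = "http://localhost:8983/solr/sitecore_web_index/select"

params = {
    "defType": "edismax",
    "q": "sitecore",
    "fl": "*,score,[explain]",  # technique 1: explain per document in fl
    "debugQuery": "true",       # technique 2: full debug section
    "wt": "xml",                # XML tends to be easier to read here
}

url = f"{base_url}?{urlencode(params)}"
print(url)
```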

For example, in my demo site Solr instance, if I search for “sitecore” (i.e. q=sitecore), this is the explain output for the top result (using the debugQuery/XML technique), which has a total document score of 2.9960673:

<str name="sitecore://web/{4bcaf44e-5dbf-4c48-9175-4caa40596efd}?lang=en&ver=1&ndx=sitecore_web_index">
2.9960673 = weight(title_t:sitecore in 55) [SchemaSimilarity], result of:
  2.9960673 = score(doc=55,freq=1.0 = termFreq=1.0
), product of:
    3.1309862 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
      43.0 = docFreq
      995.0 = docCount
    0.9569085 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
      1.0 = termFreq=1.0
      1.2 = parameter k1
      0.75 = parameter b
      2.7025125 = avgFieldLength
      3.0 = fieldLength
</str>
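The explain output spells out its own formulas, so the numbers can be re-checked by hand. As a quick sanity check, this Python sketch plugs the values from the output above into the idf and tfNorm expressions exactly as printed there (variable names are mine):

```python
import math

# Values copied from the explain output above.
doc_freq = 43.0
doc_count = 995.0
freq = 1.0
k1 = 1.2
b = 0.75
avg_field_length = 2.7025125
field_length = 3.0

# idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
tf_norm = (freq * (k1 + 1)) / (
    freq + k1 * (1 - b + b * field_length / avg_field_length)
)

score = idf * tf_norm
print(idf, tf_norm, score)
```

The three printed values should agree with the idf, tfNorm and total score in the explain output to within rounding (Solr reports single-precision floats). Reading the formula this way also shows why the match scored well: the term is rare (43 of 995 documents) and the title field is short.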
