Suggester
The SuggestComponent in Solr provides users with automatic suggestions for query terms.
You can use this to implement a powerful auto-suggest feature in your search application.
Although it is possible to use the Spell Checking functionality to power autosuggest behavior, Solr has a dedicated SuggestComponent designed for this functionality.
This approach utilizes Lucene’s Suggester implementation and supports all of the lookup implementations available in Lucene.
The main features of this Suggester are:
-
Lookup implementation pluggability
-
Term dictionary pluggability, giving you the flexibility to choose the dictionary implementation
-
Distributed support
The solrconfig.xml
found in Solr’s "techproducts" example has a Suggester implementation configured already.
For more on search components, see the section Request Handlers and Search Components.
The "techproducts" example solrconfig.xml
has a suggest
search component and a /suggest
request handler already configured.
You can use that as the basis for your configuration, or create it from scratch, as detailed below.
Adding the Suggest Search Component
The first step is to add a search component to solrconfig.xml
and tell it to use the SuggestComponent.
Here is some sample code that could be used.
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">cat</str>
<str name="weightField">price</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>
</lst>
</searchComponent>
Suggester Search Component Parameters
The Suggester search component takes several configuration parameters.
The choice of the lookup implementation (lookupImpl
, how terms are found in the suggestion dictionary) and the dictionary implementation (dictionaryImpl
, how terms are stored in the suggestion dictionary) will dictate some of the parameters required.
Below are the main parameters that can be used no matter what lookup or dictionary implementation is used. In the following sections additional parameters are provided for each implementation.
searchComponent name
-
Required
Default: none
Arbitrary name for the overall search component definition.
searchComponent class
-
Required
Default: none
The Suggester class. This should be defined as
solr.SuggestComponent
. name
-
Required
Default: none
A symbolic name for this suggester. You can refer to this name in the URL parameters and in the SearchHandler configuration. It is possible to have multiples of these in one
solrconfig.xml
file. In the example configuration above, this is the name referenced in the line:<str name="name">mySuggester</str>
. lookupImpl
-
Optional
Default:
JaspellLookupFactory
Lookup implementation. There are several possible implementations, described in the section Lookup Implementations.
dictionaryImpl
-
Optional
Default: see description
The dictionary implementation to use. There are several possible implementations, described in the section Dictionary Implementations.
If not set, the default dictionary implementation is
HighFrequencyDictionaryFactory
. However, if asourceLocation
is used, the dictionary implementation will beFileDictionaryFactory
. field
-
Optional
Default: none
A field from the index to use as the basis of suggestion terms. If
sourceLocation
is empty (meaning any dictionary implementation other thanFileDictionaryFactory
), then terms from this field in the index will be used.To be used as the basis for a suggestion, the field must be stored. You may want to use copyField rules to create a special 'suggest' field comprised of terms from other fields in documents.
You usually want a minimal amount of analysis on the field (no stemming, no synonyms, etc.), so an option is to create a field type in your schema that only uses basic tokenizers or filters. An example of such a field type is shown here:
<fieldType class="solr.TextField" name="textSuggest" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType>
However, this minimal analysis is not required if you want more analysis to occur on terms.
If using the
AnalyzingLookupFactory
as yourlookupImpl
, however, you have the option of defining the field type rules to use for index and query time analysis. sourceLocation
-
Optional
Default: see description
The path to the dictionary file if using the
FileDictionaryFactory
. If this value is empty, the main index will be used as a source of terms and weights. storeDir
-
Optional
Default: none
The location to store the dictionary file.
buildOnCommit
andbuildOnOptimize
-
Optional
Default:
false
If
true
, the lookup data structure will be rebuilt after soft-commit. Iffalse
,then the lookup data will be built only when requested by URL parametersuggest.build=true
. UsebuildOnCommit
to rebuild the dictionary with every soft commit, orbuildOnOptimize
to build the dictionary only when the index is optimized.Some lookup implementations may take a long time to build, especially with large indexes. In such cases, using
buildOnCommit
orbuildOnOptimize
, particularly with a high frequency of soft commits is not recommended. Instead, build the suggester at a lower frequency by manually issuing requests withsuggest.build=true
. buildOnStartup
-
Optional
Default:
false
If
true,
then the lookup data structure will be built when Solr starts or when the core is reloaded. If this parameter is not specified, the suggester will check if the lookup data structure is present on disk and build it if not found.Enabling this to
true
could lead to Solr taking longer to load (or reload) cores as the suggester data structure is built, which can sometimes take a long time. It’s usually preferred to leave this set tofalse
and build suggesters manually withsuggest.build=true
.
Lookup Implementations
The lookupImpl
parameter defines the algorithms used to look up terms in the suggest index.
There are several possible implementations to choose from, and some require additional parameters to be configured.
AnalyzingLookupFactory
A lookup that first analyzes the incoming text and adds the analyzed form to a weighted FST, and then does the same thing at lookup time.
This implementation uses the following additional properties:
suggestAnalyzerFieldType
-
Required
Default: none
The field type to use for the query-time and build-time term suggestion analysis.
exactMatchFirst
-
Optional
Default:
true
If
true
, exact suggestions are returned first, even if they are prefixes or other strings in the FST have larger weights. preserveSep
-
Optional
Default:
true
If
true
, then a separator between tokens is preserved. This means that suggestions are sensitive to tokenization (e.g., baseball is different from base ball). preservePositionIncrements
-
Optional
Default:
false
If
true
, the suggester will preserve position increments. This means that token filters which leave gaps (for example, when StopFilter matches a stopword) the position would be respected when building the suggester.
FuzzyLookupFactory
This is a suggester which is an extension of the AnalyzingSuggester but is fuzzy in nature. The similarity is measured by the Levenshtein algorithm.
This implementation uses the following additional properties:
exactMatchFirst
-
Optional
Default:
true
If
true
, exact suggestions are returned first, even if they are prefixes or other strings in the FST have larger weights. preserveSep
-
Optional
Default:
true
If
true
, then a separator between tokens is preserved. This means that suggestions are sensitive to tokenization (e.g., "baseball" is different from "base ball"). maxSurfaceFormsPerAnalyzedForm
-
Optional
Default:
256
The maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms we discard the lowest weighted ones.
maxGraphExpansions
-
Optional
Default:
-1
When building the FST ("index-time"), we add each path through the tokenstream graph as an individual entry. This places an upper-bound on how many expansions will be added for a single suggestion.
preservePositionIncrements
-
Optional
Default:
false
If
true
, the suggester will preserve position increments. This means that token filters which leave gaps (for example, when StopFilter matches a stopword) the position would be respected when building the suggester. maxEdits
-
Optional
Default:
1
The maximum number of string edits allowed. Solr’s hard limit is
2
. transpositions
-
Optional
Default:
true
If
true
, transpositions should be treated as a primitive edit operation. nonFuzzyPrefix
-
Optional
Default:
1
The length of the common non fuzzy prefix match which must match a suggestion.
minFuzzyLength
-
Optional
Default:
3
The minimum length of query before which any string edits will be allowed.
unicodeAware
-
Optional
Default:
false
If
true
, themaxEdits
,minFuzzyLength
,transpositions
andnonFuzzyPrefix
parameters will be measured in unicode code points (actual letters) instead of bytes.
AnalyzingInfixLookupFactory
Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This uses a Lucene index for its dictionary.
This implementation uses the following additional properties.
indexPath
-
Optional
Default: see description
When using
AnalyzingInfixSuggester
you can provide your own path where the index will get built. The default isanalyzingInfixSuggesterIndexDir
and will be created in your collection’sdata/
directory. minPrefixChars
-
Optional
Default:
4
Minimum number of leading characters before PrefixQuery is used. Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
allTermsRequired
-
Optional
Default:
true
If
true
, all terms will be required. highlight
-
Optional
Default:
true
Highlight suggest terms.
This implementation supports Context Filtering.
BlendedInfixLookupFactory
An extension of the AnalyzingInfixSuggester
which provides additional functionality to weight prefix matches across the matched documents.
It scores higher if a hit is closer to the start of the suggestion.
This implementation uses the following additional properties:
blenderType
-
Optional
Default:
position_linear
Used to calculate weight coefficient using the position of the first matching word. Available options are:
-
position_linear
: Matches to the start will be given a higher score.weightFieldValue * (1 - 0.10*position)
-
position_reciprocal
: Matches to the start will be given a higher score. The score of matches positioned far from the start of the suggestion decays faster than linear.weightFieldValue / (1 + position)
-
position_exponential_reciprocal
: Matches to the start will be given a higher score. The score of matches positioned far from the start of the suggestion decays faster than reciprocal.weightFieldValue / pow(1 + position,exponent)
When using this blender type, an additional parameter is available:
-
exponent
: Controls how fast the score will decrease. The default2.0
.
-
-
numFactor
-
Optional
Default:
10
The factor to multiply the number of searched elements from which results will be pruned.
indexPath
-
Optional
Default: see description
When using
BlendedInfixSuggester
you can provide your own path where the index will get built. The default directory name isblendedInfixSuggesterIndexDir
and will be created in your collection’s data directory. minPrefixChars
-
Optional
Default:
4
Minimum number of leading characters before PrefixQuery is used. Prefixes shorter than this are indexed as character ngrams, which increases index size but makes lookups faster.
This implementation supports Context Filtering.
FreeTextLookupFactory
It looks at the last tokens plus the prefix of whatever final token the user is typing, if present, to predict the most likely next token. The number of previous tokens that need to be considered can also be specified. This suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.
This implementation uses the following additional properties:
suggestFreeTextAnalyzerFieldType
-
Required
Default: none
The field type used at "query-time" and "build-time" to analyze suggestions.
ngrams
-
Optional
Default:
2
The max number of tokens out of which singles will be made the dictionary. Increasing this would mean you want more than the previous 2 tokens to be taken into consideration when making the suggestions.
FSTLookupFactory
An automaton-based lookup. This implementation is slower to build, but provides the lowest memory cost. We recommend using this implementation unless you need more sophisticated matching results, in which case you should use the Jaspell implementation.
This implementation uses the following additional properties:
exactMatchFirst
-
Optional
Default:
true
If
true
, the default, exact suggestions are returned first, even if they are prefixes or other strings in the FST have larger weights. weightBuckets
-
Optional
Default: none
The number of separate buckets for weights which the suggester will use while building its dictionary.
WFSTLookupFactory
A weighted automaton representation which is an alternative to FSTLookup
for more fine-grained ranking.
WFSTLookup
does not use buckets, but instead a shortest path algorithm.
Note that it expects weights to be whole numbers.
If weight is missing it’s assumed to be 1.0
.
Weights affect the sorting of matching suggestions when spellcheck.onlyMorePopular=true
is selected: weights are treated as "popularity" score, with higher weights preferred over suggestions with lower weights.
JaspellLookupFactory
A more complex lookup based on a ternary trie from the JaSpell project. Use this implementation if you need more sophisticated matching results.
Dictionary Implementations
The dictionary implementations define how terms are stored. There are several options, and multiple dictionaries can be used in a single request if necessary.
DocumentDictionaryFactory
A dictionary with terms, weights, and an optional payload taken from the index.
This dictionary implementation takes the following parameters in addition to parameters described for the Suggester generally and for the lookup implementation:
weightField
-
Optional
Default: none
A field that is stored or a numeric DocValue field.
payloadField
-
Optional
Default: none
The
payloadField
should be a field that is stored. contextField
-
Optional
Default: none
Field to be used for Context Filtering. Note that only some lookup implementations support filtering.
DocumentExpressionDictionaryFactory
This dictionary implementation is the same as the DocumentDictionaryFactory
but allows users to specify an arbitrary expression into the weightExpression
tag.
This dictionary implementation takes the following parameters in addition to parameters described for the Suggester generally and for the lookup implementation:
payloadField
-
Optional
Default: none
The
payloadField
should be a field that is stored. weightExpression
-
Required
Default: none
An arbitrary expression used for scoring the suggestions. The fields used must be numeric fields.
contextField
-
Optional
Default: none
Field to be used for Context Filtering. Note that only some lookup implementations support filtering.
HighFrequencyDictionaryFactory
This dictionary implementation allows adding a threshold to prune out less frequent terms in cases where very common terms may overwhelm other terms.
This dictionary implementation takes one parameter in addition to parameters described for the Suggester generally and for the lookup implementation:
threshold
-
Optional
Default:
0
A value between
0
and1
representing the minimum fraction of the total documents where a term should appear in order to be added to the lookup dictionary.
FileDictionaryFactory
This dictionary implementation allows using an external file that contains suggest entries. Weights and payloads can also be used.
If using a dictionary file, it should be a plain text file in UTF-8 encoding.
You can use both single terms and phrases in the dictionary file.
If adding weights or payloads, those should be separated from terms using the delimiter defined with the fieldDelimiter
property (the default is \t
, the tab representation).
If using payloads, the first line in the file must specify a payload.
This dictionary implementation takes one parameter in addition to parameters described for the Suggester generally and for the lookup implementation:
fieldDelimiter
-
Optional
Default:
\t
Specifies the delimiter to be used separating the entries, weights and payloads. The default is tab (
\t
).Example Fileacquire accidentally 2.0 accommodate 3.0
Multiple Dictionaries
It is possible to include multiple dictionaryImpl
definitions in a single SuggestComponent definition.
To do this, simply define separate suggesters, as in this example:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">cat</str>
<str name="weightField">price</str>
<str name="suggestAnalyzerFieldType">string</str>
</lst>
<lst name="suggester">
<str name="name">altSuggester</str>
<str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="field">product_name</str>
<str name="weightExpression">((price * 2) + ln(popularity))</str>
<str name="sortField">weight</str>
<str name="sortField">price</str>
<str name="storeDir">suggest_fuzzy_doc_expr_dict</str>
<str name="suggestAnalyzerFieldType">text_en</str>
</lst>
</searchComponent>
When using these Suggesters in a query, you would define multiple suggest.dictionary
parameters in the request, referring to the names given for each Suggester in the search component definition.
The response will include the terms in sections for each Suggester.
See the Example Usages section below for an example request and response.
Adding the Suggest Request Handler
After adding the search component, a request handler must be added to solrconfig.xml
.
This request handler works the same as any other request handler, and allows you to configure default parameters for serving suggestion requests.
The request handler definition must incorporate the "suggest" search component defined previously.
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Suggest Request Handler Parameters
The following parameters allow you to set defaults for the Suggest request handler:
suggest
-
Optional
Default:
false
This parameter should always be
true
, because we always want to run the Suggester for queries submitted to this handler. suggest.dictionary
-
Required
Default: none
The name of the dictionary component configured in the search component. It can be set in the request handler, or sent as a parameter at query time.
suggest.q
-
Optional
Default: none
The query to use for suggestion lookups. If not provided, the
q
parameter is used. suggest.count
-
Optional
Default:
1
Specifies the number of suggestions for Solr to return.
suggest.cfq
-
Optional
Default: none
A context filter query used to filter suggestions based on the context field, if supported by the suggester.
Context filtering is currently only supported by
AnalyzingInfixLookupFactory
andBlendedInfixLookupFactory
, and only when backed by aDocument*Dictionary
. All other implementations will return unfiltered matches as if filtering was not requested. suggest.build
-
Optional
Default:
false
If
true
, it will build the suggester index. This is likely useful only for initial requests; you would probably not want to build the dictionary on every request, particularly in a production system. If you would like to keep your dictionary up to date, you should use thebuildOnCommit
orbuildOnOptimize
parameter for the search component. suggest.reload
-
Optional
Default:
false
If
true
, it will reload the suggester index. suggest.buildAll
-
Optional
Default:
false
If
true
, it will build all suggester indexes. suggest.reloadAll
-
Optional
Default:
false
If
true
, it will reload all suggester indexes.
These properties can also be overridden at query time, or not set in the request handler at all and always sent at query time.
Example Usages
Get Suggestions with Weights
This is a basic suggestion using a single dictionary and a single Solr core.
Example query:
http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=elec
In this example, we’ve simply requested the string 'elec' with the suggest.q
parameter and requested that the suggestion dictionary be built with suggest.build
(note, however, that you would likely not want to build the index on every query - instead you should use buildOnCommit
or buildOnOptimize
if you have regularly changing documents).
Example response:
{
"responseHeader": {
"status": 0,
"QTime": 35
},
"command": "build",
"suggest": {
"mySuggester": {
"elec": {
"numFound": 3,
"suggestions": [
{
"term": "electronics and computer1",
"weight": 2199,
"payload": ""
},
{
"term": "electronics",
"weight": 649,
"payload": ""
},
{
"term": "electronics and stuff2",
"weight": 279,
"payload": ""
}
]
}
}
}
}
Using Multiple Dictionaries
If you have defined multiple dictionaries, you can use them in queries.
Example query:
http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.dictionary=altSuggester&suggest.q=elec
In this example we have sent the string 'elec' as the suggest.q
parameter and named two suggest.dictionary
definitions to be used.
Example response:
{
"responseHeader": {
"status": 0,
"QTime": 3
},
"suggest": {
"mySuggester": {
"elec": {
"numFound": 1,
"suggestions": [
{
"term": "electronics and computer1",
"weight": 100,
"payload": ""
}
]
}
},
"altSuggester": {
"elec": {
"numFound": 1,
"suggestions": [
{
"term": "electronics and computer1",
"weight": 10,
"payload": ""
}
]
}
}
}
}
Context Filtering
Context filtering lets you filter suggestions by a separate context field, such as category, department or any other token.
The AnalyzingInfixLookupFactory
and BlendedInfixLookupFactory
currently support this feature, when backed by DocumentDictionaryFactory
.
Add contextField
to your suggester configuration.
This example will suggest names and allow to filter by category:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">name</str>
<str name="weightField">price</str>
<str name="contextField">cat</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>
</lst>
</searchComponent>
Example context filtering suggest query:
http://localhost:8983/solr/techproducts/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=c&suggest.cfq=memory
The suggester will only bring back suggestions for products tagged with 'cat=memory'.