Search with synonyms
Learn about adding custom synonym bundles to your Elastic Cloud Enterprise deployment.
Synonyms are words or phrases that share the same or similar meaning. Searching using synonyms allows you to:
- Improve search relevance by finding relevant documents that use different terms to express the same concept.
- Make domain-specific vocabulary more user-friendly.
- Define misspellings and typos to transparently handle common mistakes.
To use synonyms in Elasticsearch, follow this workflow:
- Create synonym sets and rules: Define which terms are equivalent and where to store your synonym sets.
- Configure analyzers: Configure your token filters and analyzers to use them.
- Test and apply: Verify your configuration works correctly.
Synonym rules define which terms should be treated as equivalent during search and indexing.
There are two main formats for synonym rules: explicit mappings and equivalent mappings.
Explicit mappings use => to specify exact replacements:
i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit
With explicit mappings, the relationship is one-way. In the previous examples:
- i-pod and i pod will be replaced with ipod, but ipod will not be replaced with i-pod or i pod
- sea biscuit and sea biscit will be replaced with seabiscuit, but seabiscuit will not be replaced with sea biscuit or sea biscit
This is different from equivalent synonyms, which can create bidirectional relationships when expand=true.
Equivalent synonyms use commas to group interchangeable terms:
ipod, i-pod, i pod
foozball, foosball
universe, cosmos
lol, laughing out loud
The behavior of equivalent synonyms depends on the expand parameter in your token filter configuration:
- If expand=true: ipod, i-pod, i pod creates bidirectional mappings:
  - ipod ↔ i-pod
  - ipod ↔ i pod
  - i-pod ↔ i pod
- If expand=false: ipod, i-pod, i pod maps all terms to the first term as canonical:
  - ipod → ipod
  - i-pod → ipod
  - i pod → ipod
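The effect of expand on an equivalent rule can be modelled in a few lines of Python. This is only an illustrative sketch of the mapping described above, not how the token filter is implemented (the real filter works on analyzed tokens); the function name expand_equivalent_rule is hypothetical.

```python
def expand_equivalent_rule(rule: str, expand: bool) -> dict[str, list[str]]:
    """Model which terms each term of an equivalent rule is rewritten to."""
    terms = [t.strip() for t in rule.split(",") if t.strip()]
    if expand:
        # expand=true: every term maps to the whole group, itself included.
        return {term: list(terms) for term in terms}
    # expand=false: every term maps to the first term only.
    return {term: [terms[0]] for term in terms}


rule = "ipod, i-pod, i pod"
print(expand_equivalent_rule(rule, expand=True))
print(expand_equivalent_rule(rule, expand=False))
```

With expand=True each of the three terms maps to all three; with expand=False each maps to ipod alone.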
You have multiple options for creating synonym sets and rules.
Serverless Elasticsearch
You can create and manage synonym sets and synonym rules using the Kibana user interface.
To create a synonym set using the UI:
- Navigate to Elasticsearch > Synonyms or use the global search field.
- Click Get started.
- Enter a name for your synonym set.
- Add your synonym rules in the editor by adding terms to match against:
  - Add Equivalent rules by adding multiple equivalent terms. For example: ipod, i-pod, i pod
  - Add Explicit rules by adding multiple terms that map to a single term. For example: i-pod, i pod => ipod
- Click Save to save your rules.
The UI supports the same synonym rule formats as the file-based approach. Changes made through the UI will automatically reload the associated analyzers.
You can use the synonyms APIs to manage synonyms sets. This is the most flexible approach, as it lets you define and modify synonyms sets dynamically. For examples of how to create or update a synonym set with APIs, refer to the Create or update synonyms set API examples page.
Changes in your synonyms sets will automatically reload the associated analyzers.
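As a rough sketch of what a create-or-update request looks like, the Python below builds the method, path, and JSON body for a synonyms set without sending anything. The set name my-synonyms-set, the rule-N identifiers, and the helper name are assumptions made for illustration; check the API reference for the authoritative request format.

```python
import json


def build_synonyms_set_request(set_id: str, rules: list[str]) -> tuple[str, str, str]:
    """Return (method, path, body) for a create-or-update synonyms set call."""
    body = {
        "synonyms_set": [
            # Each rule is a string in the explicit or equivalent format.
            # The "rule-N" ids are hypothetical, chosen for this example.
            {"id": f"rule-{i}", "synonyms": rule}
            for i, rule in enumerate(rules, start=1)
        ]
    }
    return "PUT", f"/_synonyms/{set_id}", json.dumps(body, indent=2)


method, path, body = build_synonyms_set_request(
    "my-synonyms-set",
    ["i-pod, i pod => ipod", "universe, cosmos"],
)
print(method, path)
print(body)
```

The returned values can be passed to whichever HTTP client or Elasticsearch client you already use.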
You can store your synonyms set in a file.
Make sure you upload the synonyms set file to every node in your cluster, in the configuration directory for your Elasticsearch distribution. If you're using Elastic Cloud Hosted, you can upload synonyms files using custom bundles.
An example synonyms file:
# Blank lines and lines starting with pound are comments.
# Explicit mappings match any token sequence on the left hand side of "=>"
# and replace with all alternatives on the right hand side.
# These types of mappings ignore the expand parameter in the schema.
# Examples:
i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit
# Equivalent synonyms may be separated with commas and give
# no explicit mapping. In this case the mapping behavior will
# be taken from the expand parameter in the token filter configuration.
# This allows the same synonym file to be used in different synonym handling strategies.
# Examples:
ipod, i-pod, i pod
foozball, foosball
universe, cosmos
lol, laughing out loud
# If expand==true in the synonym token filter configuration,
# "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod
# Multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
# is equivalent to
foo => foo bar, baz
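To check a synonyms file before uploading it, it can help to see how its rules combine. The following Python sketch reads the format shown above (comments, explicit mappings, equivalent rules) and merges repeated entries. It is a simplified model under the assumptions stated in the comments, not the parser Elasticsearch uses, and parse_synonym_rules is a hypothetical name.

```python
from collections import defaultdict


def parse_synonym_rules(text: str, expand: bool = True) -> dict[str, list[str]]:
    """Parse synonym rules into a {source term: [replacement terms]} mapping."""
    mappings: dict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Blank lines and comment lines are skipped.
        if "=>" in line:
            # Explicit mapping: the expand parameter is ignored.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalent rule: behavior depends on expand.
            sources = [t.strip() for t in line.split(",") if t.strip()]
            targets = sources if expand else sources[:1]
        for source in sources:
            for target in targets:
                # Entries for the same source term are merged, without duplicates.
                if target not in mappings[source]:
                    mappings[source].append(target)
    return dict(mappings)


sample = """
# Multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
"""
print(parse_synonym_rules(sample))
```

For the sample, both replacements for foo end up under a single entry, matching the merged rule foo => foo bar, baz.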
To update an existing synonyms set, upload new files to your cluster. Synonyms set files must be kept in sync on every cluster node.
When a synonyms set is updated, search analyzers that use it need to be refreshed using the reload search analyzers API.
This manual syncing and reloading makes this approach less flexible than using the synonyms API.
You can test your synonyms by adding them directly inline in your token filter definition.
Inline synonyms are not recommended for production usage. A large number of inline synonyms increases cluster size unnecessarily and can lead to performance issues.
Once your synonyms sets are created, you can start configuring your token filters and analyzers to use them.
Synonyms sets must exist before they can be added to indices. If an index is created referencing a nonexistent synonyms set, the index will remain in a partially created and inoperable state. The only way to recover from this scenario is to ensure the synonyms set exists, then either delete and re-create the index or close and re-open it.
Elasticsearch uses synonyms as part of the analysis process. You can use two types of token filter to include synonyms:
- Synonym graph: Recommended as it can correctly handle multi-word synonyms.
- Synonym: Not recommended if you need to use multi-word synonyms.
Check each synonym token filter documentation for configuration details and instructions on adding it to an analyzer.
Invalid synonym rules can cause errors when applying analyzer changes. For reloadable analyzers, this prevents reloading and applying changes. You must correct errors in the synonym rules and reload the analyzer.
An index with invalid synonym rules cannot be reopened, making it inoperable when:
- A node containing the index starts
- The index is opened from a closed state
- A node restart occurs (which reopens the shards assigned to that node)
You can test an analyzer configuration without modifying your index settings. Use the analyze API to test your analyzer chain:
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["pc => personal computer", "computer, pc, laptop"]
    }
  ],
  "text": "Check how PC synonyms work"
}
Analyzers can be applied at index time or search time.
You need to decide when to apply your synonyms:
- Index time: Synonyms are applied when the documents are indexed into Elasticsearch. This is a less flexible alternative, as changes to your synonyms require reindexing.
- Search time: Synonyms are applied when a search is executed. This is a more flexible approach, which doesn't require reindexing. If token filters are configured with "updateable": true, search analyzers can be reloaded when you make changes to your synonyms.
Note: Synonyms sets created using the synonyms API or the UI can only be used at search time.
You can specify the analyzer that contains your synonyms set as a search time analyzer or as an index time analyzer.
The following example adds my_analyzer as a search analyzer to the title field in an index mapping:
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "search_analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "synonyms_filter"
          ]
        }
      },
      "filter": {
        "synonyms_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym-set.txt",
          "updateable": true
        }
      }
    }
  }
}
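The example above reads synonyms from a file. For a synonyms set created through the API or the UI, a similar index body could reference the set by name instead. The Python sketch below builds such a body as a plain dictionary. It assumes the synonym_graph filter accepts a synonyms_set parameter as described in the token filter documentation; the names my-synonyms-set, my_search_analyzer, and build_index_body are hypothetical.

```python
import json


def build_index_body(field: str, synonyms_set: str) -> dict:
    """Build a create-index body that applies a synonyms set at search time."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "synonyms_filter": {
                        "type": "synonym_graph",  # handles multi-word synonyms
                        "synonyms_set": synonyms_set,  # assumed parameter name
                        "updateable": True,  # allows reloading search analyzers
                    }
                },
                "analyzer": {
                    "my_search_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "synonyms_filter"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # search_analyzer only: the set is applied at search time.
                field: {"type": "text", "search_analyzer": "my_search_analyzer"}
            }
        },
    }


print(json.dumps(build_index_body("title", "my-synonyms-set"), indent=2))
```

The analyzer is attached only as search_analyzer because synonyms sets created with the API or the UI can only be used at search time.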