05 Implement Advanced Search Features in Azure AI Search
• Azure AI Search is a powerful search service that can index a wide range of data
from various sources. A core part of search is returning relevant results from search
queries.
• This module builds on Create a custom skill for Azure AI Search to explore more
advanced search features supported by Azure AI Search.
Improve the ranking of a document with term boosting
• Search works best when the most relevant results are shown first.
• All search engines try to return the most relevant results to search queries.
• Azure AI Search implements an enhanced version of Apache Lucene for full text
search.
• You'll then improve the relevance of results by boosting specific terms in your
search query.
Search an index
• Azure AI Search lets you query an index using a REST endpoint or inside the Azure
portal with the search explorer tool.
• You'll use the search explorer to see how switching between the simple and
full query types changes your search results.
Write a simple query
• The hotel sample data contains 50 hotels with descriptions, room details, and their
locations. Imagine you run a hotel booking business and have an app that users
can book hotels with. Users can search and the most relevant hotels must be shown
first.
• You have a use case where a customer is trying to find a luxury hotel. Start by
looking at the search results from this simple query:
search=luxury&$select=HotelId, HotelName, Category, Tags,
Description&$count=true
• The query parser will search for the term luxury across all the fields of a document in the index.
• The query string also limits the returned fields from documents by adding the select option.
• $count=true adds the total number of matching documents to the response.
{
"@search.score": 1.966516,
"HotelId": "40",
"HotelName": "Trails End Motel",
"Description": "Only 8 miles from Downtown. On-site bar/restaurant, Free hot breakfast
buffet, Free wireless internet, All non-smoking hotel. Only 15 miles from airport.",
"Category": "Luxury",
"Tags": [
"continental breakfast",
"view",
"view"
]
},
...
]
}
• The customer might be surprised that the top hotel you have that's supposed to be
luxury is in the budget category and doesn't have any air conditioning.
• If the customer enters multiple words in their search, your app assumes all terms
should be in the results, so it adds + between terms to the query.
• The search service now returns five documents but still the top results are in the
budget category.
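• A minimal sketch of how an app might assemble the simple query described above. The function name build_simple_query and the field list are illustrative assumptions, and the expression is percent-encoded so that a literal + survives in a URL:

```python
from urllib.parse import quote

# Fields to return for each hit, copied from the sample query in this unit.
SELECT_FIELDS = ["HotelId", "HotelName", "Category", "Tags", "Description"]


def build_simple_query(user_input: str) -> str:
    """Build a simple-syntax query string that requires every word the user typed."""
    terms = user_input.split()
    # Join the words with + so the simple query parser treats each one as required.
    search_expression = " + ".join(terms)
    # Percent-encode the expression: a literal + becomes %2B and a space becomes %20.
    encoded = quote(search_expression, safe="")
    return f"search={encoded}&$select={','.join(SELECT_FIELDS)}&$count=true"


print(build_simple_query("luxury"))
print(build_simple_query("luxury air con"))
```

• The returned string is only the query part of the request. Sending it to a search service is left out of this sketch.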
Enable the Lucene Query Parser
• You can tell the search explorer to use the Lucene Query parser by
adding &queryType=full to the query string.
• With the Lucene syntax, you can write more precise queries. Here is a summary of
available features:
• Boolean operators: AND, OR, NOT, for example luxury AND 'air con'
• Adding the other query string parameters you get this query string:
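• As an illustration only, the following sketch builds a full (Lucene) query string that combines fielded search, a Boolean operator, and a boosted term. The expression, the boost value of 3, and the helper name build_full_query are assumptions made for this example, not the query from the original slide:

```python
from urllib.parse import quote


def build_full_query(expression: str, select: list[str]) -> str:
    """Build a query string that asks the service to use the Lucene query parser."""
    return (
        f"search={quote(expression, safe='')}"
        f"&$select={','.join(select)}"
        "&$count=true"
        "&queryType=full"  # switches from the simple parser to the Lucene parser
    )


# Hypothetical expression: match luxury in either field, weight a match in
# Category three times higher with ^3, and require the phrase "air con".
expression = '(Description:luxury OR Category:luxury^3) AND Description:"air con"'

query = build_full_query(
    expression, ["HotelId", "HotelName", "Category", "Tags", "Description"]
)
print(query)
```

• The ^ character followed by a number is what boosts a term. A larger number raises the score contribution of that term relative to the others.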
Improve the relevance of results by adding scoring profiles
• Here, you'll see how to add scoring profiles to alter the scores for documents
based on your own criteria.
• The score is a function of the number of times identified search terms appear in a
document, the document's size, and the rarity of each of the terms.
• By default, the search results are ordered by their search score, highest first.
• If two documents have an identical search score, you can break the tie by adding
an $orderby clause.
Improve the score for more relevant documents
• As the default scoring works on the frequency of terms and rarity, the final
calculated score might not return the highest score for the most relevant
document.
• Each dataset is different, so AI Search lets you influence a document score using
scoring profiles.
• The most straightforward scoring profile defines different weights for fields in an
index.
• For example, the Hotel index can have a scoring profile that makes the
Description field five times more relevant than data in the Location or Rooms
fields.
• The scoring profile can also include functions, for example, distance or freshness.
• Functions provide more control than simple weighting, for example, you can
define the boosting duration applied to newer documents before they score the
same as older documents.
• The power of scoring profiles means that instead of boosting a specific term in a
search request, you can apply a scoring profile to an index so that fields are
boosted automatically for all queries.
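• A minimal sketch of what a weighted scoring profile could look like inside an index definition, written as a Python dictionary that serializes to JSON. The profile name boost-description is hypothetical, and the weight of 5 mirrors the Description example above:

```python
import json

# Hypothetical profile: matches in Description count five times as much.
# Fields that are not listed are not boosted.
scoring_profile = {
    "name": "boost-description",
    "text": {
        "weights": {
            "Description": 5,
        }
    },
}

# Fragment of an index definition. Setting defaultScoringProfile applies the
# profile to every query that does not name a different one.
index_fragment = {
    "scoringProfiles": [scoring_profile],
    "defaultScoringProfile": "boost-description",
}

print(json.dumps(index_fragment, indent=2))
```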
Add a weighted scoring profile
• You can add up to 100 scoring profiles to a search index. The simplest way to
create a scoring profile is in the Azure portal.
• The profile has also been set as the default profile. You can then use this search
query:
• The results now match the same query with a term boosted:
• You can control which scoring profile is applied to a search query by appending
the &scoringProfile=PROFILE NAME parameter.
• Scoring profiles can also be added programmatically using the Update Index REST API.
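• A small sketch of selecting a scoring profile per query. The helper name with_scoring_profile and the profile name are assumptions for illustration:

```python
from urllib.parse import quote


def with_scoring_profile(query_string: str, profile_name: str) -> str:
    """Append the scoringProfile parameter to an existing query string."""
    return f"{query_string}&scoringProfile={quote(profile_name, safe='')}"


base_query = "search=luxury&$count=true"
print(with_scoring_profile(base_query, "boost-description"))
```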
Use functions in a scoring profile
• The functions available to add to a scoring profile are:
Function    Description
Magnitude   Alter scores based on a range of values for a numeric field
Freshness   Alter scores based on the freshness of documents as given by a DateTimeOffset field
Distance    Alter scores based on the distance between a reference location and a GeographyPoint field
Tag         Alter scores based on common tag values in documents and queries
• For example, using the hotel index the magnitude function can be applied to the
Rating field.
• The Azure portal will guide you through completing the parameters for each
function.
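• The following sketch shows how a magnitude function on Rating and a freshness function might be expressed in a scoring profile definition. The profile name, the boost values, the rating range of 1 to 5, the LastRenovationDate field, and the one-year boosting duration are all assumptions chosen for illustration:

```python
import json

profile_with_functions = {
    "name": "boost-rating-and-recent",  # hypothetical profile name
    "functions": [
        {
            # Raise the score of hotels with a higher Rating.
            "type": "magnitude",
            "fieldName": "Rating",
            "boost": 2,
            "interpolation": "linear",
            "magnitude": {
                "boostingRangeStart": 1,
                "boostingRangeEnd": 5,
                "constantBoostBeyondRange": False,
            },
        },
        {
            # Raise the score of recently renovated hotels. After the boosting
            # duration (ISO 8601, here 365 days) older documents score the same.
            "type": "freshness",
            "fieldName": "LastRenovationDate",  # assumed DateTimeOffset field
            "boost": 2,
            "interpolation": "quadratic",
            "freshness": {"boostingDuration": "P365D"},
        },
    ],
    # How the individual function results are combined.
    "functionAggregation": "sum",
}

print(json.dumps(profile_with_functions, indent=2))
```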
Improve an index with analyzers and tokenized terms
• Azure AI Search is configured by default to analyze text and identify tokens
that will be helpful in your index.
• The right tokens ensure that users can find the documents they need quickly.
• However, when you have unusual or unique fields, you might want to
configure exactly how text is analyzed.
• Here, you'll learn how to define a custom analyzer to control how the content
of a field is split into tokens for inclusion in the index.
Analyzers in AI Search
• When AI Search indexes your content, it retrieves text. To build a useful
index, with terms that help users locate documents, that text needs
processing. For example:
• The text should be broken into words, often by using whitespace and
punctuation characters as delimiters.
• Words should be reduced to their root form. For example, past tense
words, such as "ran", should be replaced with present tense words, such
as "run".
• If you don't specify an analyzer for a field, the default Lucene analyzer is
used.
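• To make the two processing steps above concrete, here is a toy analyzer in plain Python. It is not the Lucene analyzer that AI Search uses. The lowercasing step and the tiny ROOT_FORMS lookup table are assumptions made so the example stays short:

```python
import re

# Tiny hypothetical lookup table that maps a few words to their root form.
ROOT_FORMS = {"ran": "run", "running": "run", "hotels": "hotel"}


def analyze(text: str) -> list[str]:
    """Split text into lowercase tokens and reduce known words to a root form."""
    # Break the text on whitespace and punctuation (apostrophes are kept).
    words = re.split(r"[^\w']+", text.lower())
    # Drop empty strings left by the split, then apply the root-form lookup.
    return [ROOT_FORMS.get(word, word) for word in words if word]


print(analyze("She ran to the hotels, quickly!"))
# ['she', 'run', 'to', 'the', 'hotel', 'quickly']
```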
• Alternatively, you can specify one of the analyzers that are built into AI
Search. Built-in analyzers are of two types: language analyzers and specialized
analyzers.
What is a custom analyzer?
• The built-in analyzers provide you with many options but sometimes you
need an analyzer with unusual behavior for a field.
• A custom analyzer consists of character filters, a tokenizer, and token filters.
Character filters
• mapping. This filter enables you to specify mappings that replace one
string with another. For example, you could specify a mapping that
replaces TX with Texas.
Tokenizers
• Tokenizers divide the text into tokens. Often, a token is a single word, but
you might want to create unusual tokens.
• keyword. This tokenizer emits the entire input as a single token. Use this
tokenizer for fields that should always be indexed as one value.
Token filters
• These filters remove or modify the tokens emitted by the tokenizer.
• apostrophe. This filter removes any apostrophe from a token and any
characters after the apostrophe.
• classic. This filter removes English possessives and dots from acronyms.
• keep. This filter removes any token that doesn't include one or more words
from a list you specify.
• length. This filter removes any token that is shorter than your specified
minimum or longer than your specified maximum.
• trim. This filter removes any leading and trailing white space from tokens.
Create a custom analyzer
• You create a custom analyzer by specifying it when you define the index.
• You must do this with JSON code - there's no way to specify a custom analyzer in
the Azure portal.
• You can include only one tokenizer but one or more character filters and one
or more token filters.
• Use a unique name for your analyzer and set the @odata.type property to
#Microsoft.Azure.Search.CustomAnalyzer.
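• A minimal sketch of the analyzers section of an index definition, written as a Python dictionary. The analyzer name contoso_analyzer is hypothetical, and the choice of the predefined html_strip character filter, standard_v2 tokenizer, and lowercase and apostrophe token filters is an assumption for illustration:

```python
import json

custom_analyzer = {
    "name": "contoso_analyzer",  # hypothetical, must be unique in the index
    # In the JSON payload the type is written with a leading #.
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "charFilters": ["html_strip"],                # run before tokenization
    "tokenizer": "standard_v2",                   # exactly one tokenizer
    "tokenFilters": ["lowercase", "apostrophe"],  # applied in this order
}

# The analyzers list sits alongside the fields list in the index definition.
index_fragment = {"analyzers": [custom_analyzer]}

print(json.dumps(index_fragment, indent=2))
```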
Test a custom analyzer
• In this request:
• Replace <search service name> with the name of your AI Search resource.
• Replace <index name> with the name of the index that includes the custom
analyzer.
• Replace <api-version> with the version number of the REST API.
• Replace <api-key> with the access key for your AI Search resource. You can
obtain this key from the Azure portal.
• Your request must also include a JSON body like this:
• Replace <analyzer name> with the name you specified when you defined
the custom analyzer.
• Be sure to test with lots of different text values until you're sure that the
custom analyzer behaves as you expect.
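• A sketch of how the test request could be prepared with the Python standard library. It builds the request but does not send it. The service name, index name, API version, key, and analyzer name used below are placeholders, and the helper name build_analyze_request is an assumption:

```python
import json
from urllib.request import Request


def build_analyze_request(
    service: str, index: str, api_version: str, api_key: str,
    text: str, analyzer: str,
) -> Request:
    """Prepare a POST request that asks the service to analyze sample text."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/analyze?api-version={api_version}"
    )
    # The JSON body carries the text to analyze and the analyzer to test.
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder values only; substitute your own before sending the request.
request = build_analyze_request(
    "my-search-service", "hotels-index", "2024-07-01", "my-api-key",
    "Trails End Motel's free breakfast", "contoso_analyzer",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

• Passing the prepared request to urllib.request.urlopen would send it. The response lists the tokens that the analyzer produced for the text.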
Use a custom analyzer for a field
• Once you've defined and tested a custom analyzer, you can configure your
index to use it.
• You can use the analyzer field when you want to use the same analyzer for
both indexing and searching:
• It's also possible to use a different analyzer when indexing the field and when
searching the field.
• Use this configuration if you need a different set of processing steps when
you index a field to when you analyze a query:
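• The two field configurations described in this section can be sketched as follows. The field and analyzer names are hypothetical, and the small check reflects the rule that the analyzer property cannot be combined with indexAnalyzer or searchAnalyzer on the same field:

```python
# Same analyzer for indexing and searching.
single_analyzer_field = {
    "name": "Description",
    "type": "Edm.String",
    "searchable": True,
    "analyzer": "contoso_analyzer",  # hypothetical custom analyzer
}

# Different analyzers for indexing and for query text.
split_analyzer_field = {
    "name": "Description",
    "type": "Edm.String",
    "searchable": True,
    "indexAnalyzer": "contoso_index_analyzer",    # hypothetical
    "searchAnalyzer": "contoso_search_analyzer",  # hypothetical
}


def check_analyzer_settings(field: dict) -> None:
    """Raise ValueError if a field mixes analyzer with index/search analyzers."""
    has_split = "indexAnalyzer" in field or "searchAnalyzer" in field
    if "analyzer" in field and has_split:
        raise ValueError(
            f"Field {field['name']!r}: use either analyzer or "
            "indexAnalyzer/searchAnalyzer, not both"
        )


check_analyzer_settings(single_analyzer_field)
check_analyzer_settings(split_analyzer_field)
print("Both field definitions are consistent")
```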
Enhance an index to include multiple languages
• Support for multiple languages can be added to a search index.
• You can add language support manually by providing all the translated text
fields in all the different languages you want to support.
• You could also choose to use Azure AI Services to provide translated text
through an enrichment pipeline.
• Here, you'll see how to add fields with different languages to an index.
• Finally, create a scoring profile to boost the native language of your end
users.
Add language specific fields
• To add multiple languages to an index, first, identify all the fields that need a
translation.
• Then duplicate those fields for each language you want to support.
• For each field, add to its definition the corresponding language analyzer.
• Your language-specific search solution can combine two query parameters,
searchFields and select, to focus on fields with specific languages in them.
• Using the searchFields and select properties in a query limits both the fields
that are searched and the fields that are returned from the real estate sample
database.
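• A minimal sketch of the steps above: duplicate a field per language, attach a language analyzer to each copy, and then query only the fields for one language. The helper names and the LANGUAGE_ANALYZERS table are assumptions; fr.microsoft and de.microsoft are names of built-in language analyzers:

```python
from urllib.parse import quote

# Built-in language analyzer to use for each language code in this example.
LANGUAGE_ANALYZERS = {"fr": "fr.microsoft", "de": "de.microsoft"}


def language_fields(base_name: str, languages: list[str]) -> list[dict]:
    """Create one searchable copy of a field per language."""
    return [
        {
            "name": f"{base_name}_{code}",
            "type": "Edm.String",
            "searchable": True,
            "analyzer": LANGUAGE_ANALYZERS[code],
        }
        for code in languages
    ]


def language_query(text: str, code: str) -> str:
    """Search and return only the description field for one language."""
    field = f"description_{code}"
    return (
        f"search={quote(text, safe='')}"
        f"&searchFields={field}"
        f"&$select=listingId,{field}"
        "&$count=true"
    )


for field in language_fields("description", ["fr", "de"]):
    print(field)
print(language_query("parfait", "fr"))
```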
Enrich an index with multiple languages using Azure AI Services
• If you don't have access to translations, you can enrich your index and add
translated fields using Azure AI Services.
• The steps are to add fields for each language, add a skill for each language,
and then map the translated text to the correct fields.
Map the translated output into the index
• The documents now all have two new translated description fields.
"value": [
  {
    "@search.score": 1,
    "listingId": "OTM4MjI2NQ2",
    "beds": 5,
    "baths": 4,
    "description": "This is an apartment residence and is perfect for entertaining. This home provides lakefront property located close to parks and features a detached garage, beautiful bedroom floors, and lots of storage.",
    "description_de": "Dies ist eine Wohnanlage und ist perfekt für Unterhaltung. Dieses Haus bietet Seeliegenschaft Parks in der Nähe und verfügt über eine freistehende Garage schöne Zimmer-Etagen and viel Stauraum.",
    "description_fr": "Il s'agit d'un appartement de la résidence et est parfait pour se divertir. Cette maison offre propriété au bord du lac Situé à proximité de Parcs et dispose d'un garage détaché, planchers de belle chambre and beaucoup de rangement.",
    "description_it": "Si tratta di un appartamento residence ed è perfetto per intrattenere. Questa casa fornisce proprietà lungolago Situato vicino ai parchi e dispone di un garage indipendente, piani di bella camera da letto and sacco di stoccaggio.",
    "description_es": "Se trata de una residencia Apartamento y es perfecto para el entretenimiento. Esta casa ofrece propiedad de lago situado cerca de parques y cuenta con un garaje independiente, pisos de dormitorio hermoso and montón de almacenamiento.",
    "description_pl": "Jest to apartament residence i jest idealny do zabawy. Ten dom zapewnia lakefront Wlasciwosc usytuowany w poblizu parków i oferuje garaz wolnostojacy, piekna sypialnia podlogi and mnóstwo miejsca do przechowywania.",
    "description_nl": "Dit is een appartement Residentie en is perfect voor entertaining. Dit huis biedt lakefront eigenschap vlakbij parken en beschikt over een vrijstaande garage, mooie slaapkamer vloeren and veel opslag.",
    "description_jp": "これはアパートの住居であり、娯楽に最適です。 この家は公園の近くに位置する湖畔のプロパティを提供し、独立したガレージ、美しいベッドルームの床とストレージの多くを備えています。",
    "description_uk": "Це багатоквартирна резиденція і прекрасно підходить для розваг. Цей будинок забезпечує нерухомість на березі озера, розташовану недалеко від парків, і має окремий гараж, красиві підлоги спальні та багато місць для зберігання речей.",
    ...
  },
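• The enrichment steps described in this section (a translation skill per language, then a mapping of its output into the index) can be sketched as Python dictionaries. The helper names are assumptions, and the sketch assumes the source text lives in a description field and that the target fields follow the description_jp naming seen in the results:

```python
import json


def translation_skill(language_code: str, target_field: str) -> dict:
    """Describe a skill that translates the description into one language."""
    return {
        "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
        "context": "/document/description",
        "defaultToLanguageCode": language_code,
        "inputs": [{"name": "text", "source": "/document/description"}],
        "outputs": [{"name": "translatedText", "targetName": target_field}],
    }


def output_field_mapping(target_field: str) -> dict:
    """Map the enriched value produced by the skill to an index field."""
    return {
        "sourceFieldName": f"/document/description/{target_field}",
        "targetFieldName": target_field,
    }


# Japanese uses the language code ja; the index field here is description_jp.
targets = [("ja", "description_jp"), ("uk", "description_uk")]
skills = [translation_skill(code, field) for code, field in targets]
mappings = [output_field_mapping(field) for _, field in targets]

print(json.dumps({"skills": skills, "outputFieldMappings": mappings}, indent=2))
```

• The skills list belongs in the skillset definition and the outputFieldMappings list belongs in the indexer definition. They are printed together here only to keep the sketch short.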
Improve search experience by ordering results by distance from a given reference point
• Often, users want to search for items associated with a geographical location.
• For example, they might want to find the nearest coffee shop to their location.
• To help you compare locations on the Earth's surface, AI Search includes
geo-spatial functions that you can call in queries.
• Here, you'll learn how to search for things that are near a physical point or
within a bounded area.
What are geo-spatial functions?
• In previous units in this module, you saw how users might locate a hotel by specifying fields
to search, such as Description and Category:
• An important consideration when you're booking a hotel is its geographical location. For
example, if you're booking a trip to see the Eiffel Tower, you'll want a hotel located near it.
• To ask AI Search to return results based on their location information, you can use two
functions in your query:
• geo.distance. This function returns the distance in a straight line across the Earth's surface
from the point you specify to the location of the search result.
• geo.intersects. This function returns true if the location of a search result is inside a polygon
that you specify.
• To use these functions, make sure that your index includes the location for results. Location
fields should have the datatype Edm.GeographyPoint and store the latitude and longitude.
Use the geo.distance function
• geo.distance is a function that takes two points as parameters and returns the distance
between them in kilometers.
• Suppose you're looking for a hotel near the Eiffel Tower. You can modify the above query,
adding a new filter:
• search=(Description:luxury OR Category:luxury)&$filter=geo.distance(Location,
geography'POINT(2.294481 48.858370)') le 5&$select=HotelId, HotelName, Category,
Tags, Description&$count=true
• This query returns all the luxury hotels in the index within five kilometers of the Eiffel Tower.
In the query:
• Location is the name of the field that stores the hotel's location.
• geography'POINT(2.294481 48.858370)' is the location of the Eiffel Tower as a longitude and
latitude.
• le 5 specifies that hotels should be included in the results if the geo.distance function returns
a number less than or equal to five kilometers.
• Because geo.distance returns a number of kilometers, you can also use it in an orderby clause. For
example, this query returns all luxury hotels in the index, but those closest to the Eiffel Tower
are listed first:
• search=(Description:luxury OR Category:luxury)&$orderby=geo.distance(Location,
geography'POINT(2.294481 48.858370)') asc&$select=HotelId, HotelName, Category, Tags,
Description&$count=true
• In this query, asc specifies that the luxury hotels are returned in the ascending order of their
distance from the Eiffel Tower.
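• A small sketch that builds the two geo.distance expressions used in this section, one for $filter and one for $orderby. The helper names are assumptions. Note that the point is written as longitude first, then latitude:

```python
def geo_point(longitude: float, latitude: float) -> str:
    """Format a point literal: longitude first, then latitude."""
    return f"geography'POINT({longitude} {latitude})'"


def distance_filter(field: str, longitude: float, latitude: float, max_km: float) -> str:
    """Keep results whose location is at most max_km kilometers from the point."""
    return f"geo.distance({field}, {geo_point(longitude, latitude)}) le {max_km}"


def distance_orderby(field: str, longitude: float, latitude: float) -> str:
    """Sort results so that the nearest location comes first."""
    return f"geo.distance({field}, {geo_point(longitude, latitude)}) asc"


# Eiffel Tower coordinates as given in this section.
EIFFEL_TOWER = (2.294481, 48.85837)

print("$filter=" + distance_filter("Location", *EIFFEL_TOWER, 5))
print("$orderby=" + distance_orderby("Location", *EIFFEL_TOWER))
```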
Use the geo.intersects function
• Suppose you've decided that you want to stay within the seventh arrondissement of Paris for
your trip to the Eiffel Tower. When you search for a hotel, you'd like to specify this area as
precisely as possible. You can formulate such a query by using the geo.intersects function.
• The geo.intersects function compares a location with a polygon on the Earth's surface, which
you specify with three or more points. By using a polygon, you can create a shape that
closely matches an area, such as an arrondissement. Use this polygon to add a geographical
filter to your query:
• This query returns all luxury hotels within a square around the Eiffel Tower. You can use more
than four points to create a more precise area.
• geo.intersects returns a boolean value, so it's not possible to use it in an orderby clause.
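• As a hedged illustration of the polygon filter described above, this sketch builds a geo.intersects expression for a box around a point. The helper names and the half-side of 0.02 degrees are assumptions. The box is measured in degrees, so it is only roughly square on the ground. The points are listed counterclockwise and the first point is repeated at the end to close the polygon:

```python
def box_polygon(longitude: float, latitude: float, half_side_deg: float) -> str:
    """Build a closed, counterclockwise polygon literal around a point."""
    west, east = longitude - half_side_deg, longitude + half_side_deg
    south, north = latitude - half_side_deg, latitude + half_side_deg
    # Counterclockwise: south-west, south-east, north-east, north-west, and
    # back to south-west so that the ring is closed.
    corners = [
        (west, south), (east, south), (east, north), (west, north), (west, south),
    ]
    ring = ", ".join(f"{lon:.6f} {lat:.6f}" for lon, lat in corners)
    return f"geography'POLYGON(({ring}))'"


def intersects_filter(field: str, polygon: str) -> str:
    """Keep results whose location lies inside the polygon."""
    return f"geo.intersects({field}, {polygon})"


# Box around the Eiffel Tower coordinates given in this section.
polygon = box_polygon(2.294481, 48.858370, 0.02)
print("$filter=" + intersects_filter("Location", polygon))
```

• Adding more corner points to the list, still in counterclockwise order, would let the polygon follow an area such as an arrondissement more closely.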
Exercise - Implement enhancements to search results
Knowledge check
1. What character do you add after a search term to boost the term?
a) +.
b) ^.
c) !.
2. Which of the following options is a function you can use in a scoring profile?
a) Tag.
b) Volume.
c) Staleness.
3. What Azure product can you use to enrich an index with different language
translations?
a) Azure AI Search.
b) Azure Speech Service.
c) Azure AI Services.