Module 3

This module introduces natural language processing (NLP), a key area of AI that enables computer systems to understand and respond to human language. It covers the capabilities of Azure AI Language, a cloud-based service offering features such as sentiment analysis, text summarization, and named entity recognition, and it outlines core text analysis techniques, including tokenization, frequency analysis, and machine learning for text classification.

Introduction

To interpret the subject of a text in a way similar to humans, computer systems use natural language processing (NLP), an area within AI that deals with understanding written or spoken language, and responding in kind. Text analysis describes NLP processes that extract information from unstructured text.

Natural language processing might be used to create:


A social media feed analyzer that detects sentiment for a product marketing campaign.
A document search application that summarizes documents in a catalog.
An application that extracts brands and company names from text.

Azure AI Language is a cloud-based service that includes features for understanding and
analyzing text. Azure AI Language includes various features that support sentiment analysis,
key phrase identification, text summarization, and conversational language understanding.

In this module, you'll explore the capabilities of text analytics, and how you might use them.

Understand Text Analytics

Before exploring the text analytics capabilities of the Azure AI Language service, let's examine
some general principles and common techniques used to perform text analysis and other
natural language processing (NLP) tasks.

Some of the earliest techniques used to analyze text with computers involve statistical analysis
of a body of text (a corpus) to infer some kind of semantic meaning. Put simply, if you can
determine the most commonly used words in a given document, you can often get a good
idea of what the document is about.

Tokenization
The first step in analyzing a corpus is to break it down into tokens. For the sake of simplicity,
you can think of each distinct word in the training text as a token, though in reality, tokens can
be generated for partial words, or combinations of words and punctuation.

For example, consider this phrase from a famous US presidential speech: "we choose to go to the moon". The phrase can be broken down into the following tokens, with numeric identifiers:

1. we
2. choose
3. to
4. go
5. the
6. moon

Notice that "to" (token number 3) is used twice in the phrase. The phrase "we choose to go to the moon" can therefore be represented by the tokens {1,2,3,4,3,5,6}.

Note

We've used a simple example in which tokens are identified for each distinct word in the text. However, consider the following concepts that may apply to tokenization depending on the specific kind of NLP problem you're trying to solve:

- Text normalization: Before generating tokens, you may choose to normalize the text by removing punctuation and changing all words to lower case. For analysis that relies purely on word frequency, this approach improves overall performance. However, some semantic meaning may be lost - for example, consider the sentence "Mr Banks has worked in many banks." You may want your analysis to differentiate between the person "Mr Banks" and the "banks" in which he has worked. You may also want to treat "banks." as a separate token from "banks", because the inclusion of a period provides the information that the word comes at the end of a sentence.
- Stop word removal: Stop words are words that should be excluded from the analysis. For example, "the", "a", or "it" make text easier for people to read but add little semantic meaning. By excluding these words, a text analysis solution may be better able to identify the important words.
- n-grams: Multi-term phrases such as "I have" or "he walked". A single-word phrase is a unigram, a two-word phrase is a bi-gram, a three-word phrase is a tri-gram, and so on. By considering words as groups, a machine learning model can make better sense of the text.
- Stemming: A technique in which algorithms are applied to consolidate words before counting them, so that words with the same root, like "power", "powered", and "powerful", are interpreted as the same token.

Frequency analysis

After tokenizing the words, you can perform some analysis to count the number of occurrences of each token. The most commonly used words (other than stop words such as "a", "the", and so on) can often provide a clue as to the main subject of a text corpus. For example, the most common words in the entire text of the "go to the moon" speech we considered previously include "new", "go", "space", and "moon". If we were to tokenize the text as bi-grams (word pairs), the most common bi-gram in the speech is "the moon". From this information, we can easily surmise that the text is primarily concerned with space travel and going to the moon.
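As a simple illustration (again, not from the module itself), you might count unigram and bi-gram frequencies like this:

```python
# A minimal frequency-analysis sketch: count unigrams (excluding stop words)
# and bi-grams over a tokenized text.
from collections import Counter

text = "we choose to go to the moon"
stop_words = {"a", "the", "to", "and", "we"}

words = text.lower().split()
unigram_counts = Counter(w for w in words if w not in stop_words)
bigram_counts = Counter(zip(words, words[1:]))

print(unigram_counts.most_common(3))  # [('choose', 1), ('go', 1), ('moon', 1)]
print(bigram_counts.most_common(3))   # every bi-gram occurs once in this tiny phrase
```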

Tip

Simple frequency analysis, in which you count the number of occurrences of each token, can be an effective way to analyze a single document. But when you need to differentiate across multiple documents within the same corpus, you need a way to determine which tokens are most relevant in each document. Term frequency - inverse document frequency (TF-IDF) is a common technique in which a score is calculated based on how often a word or term appears in one document compared to its more general frequency across the entire collection of documents. Using this technique, a high degree of relevance is assumed for words that appear frequently in a particular document, but relatively infrequently across a wide range of other documents.
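One way to compute TF-IDF scores is with scikit-learn's TfidfVectorizer. This sketch is illustrative (the module doesn't prescribe a library), and the example corpus is invented:

```python
# A hedged TF-IDF sketch using scikit-learn's TfidfVectorizer.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "we choose to go to the moon",
    "the moon landing was broadcast around the world",
    "the restaurant serves great food and great coffee",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(corpus)

# For each document, show its highest-scoring (most distinctive) term.
terms = vectorizer.get_feature_names_out()
for i, row in enumerate(tfidf_matrix.toarray()):
    print(f"document {i}: {terms[row.argmax()]}")
```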

Machine learning for text classification

Another useful text analysis technique is to use a classification algorithm, such as logistic regression, to train a machine learning model that classifies text based on a known set of categorizations. A common application of this technique is to train a model that classifies text as positive or negative in order to perform sentiment analysis or opinion mining.

For example, consider the following restaurant reviews, which are already labeled as 0 (negative) or 1 (positive):

- *The food and service were both great*: 1
- *A really terrible experience*: 0
- *Mmm! tasty food and a fun vibe*: 1
- *Slow service and substandard food*: 0

With enough labeled reviews, you can train a classification model using the tokenized text as features and the sentiment (0 or 1) as a label. The model will encapsulate a relationship between tokens and sentiment - for example, reviews with tokens for words like "great", "tasty", or "fun" are more likely to return a sentiment of 1 (positive), while reviews with words like "terrible", "slow", and "substandard" are more likely to return 0 (negative).

Semantic language models


As the state of the art for NLP has advanced, the ability to train models that encapsulate the
semantic relationship between tokens has led to the emergence of powerful language models.
At the heart of these models is the encoding of language tokens as vectors (multi-valued
arrays of numbers) known as embeddings.
It can be useful to think of the elements in a token embedding vector as coordinates in
multidimensional space, so that each token occupies a specific "location." The closer tokens
are to one another along a particular dimension, the more semantically related they are. In
other words, related words are grouped closer together. As a simple example, suppose the
embeddings for our tokens consist of vectors with three elements, for example:

- 4 ("dog"): [10.3.2]
- 5 ("bark"): [10,2,2]
- 8 ("cat"): [10,3,1]
- 9 ("meow"): [10,2,1]
- 10 ("skateboard"): [3,3,1]

We can plot the locations of the tokens based on these vectors in three-dimensional space.

The locations of the tokens in the embeddings space include some information about how closely the tokens are related to one another. For example, the token for "dog" is close to "cat" and also to "bark". The tokens for "cat" and "bark" are close to "meow". The token for "skateboard" is further away from the other tokens.

The language models we use in industry are based on these principles but have greater
complexity. For example, the vectors used generally have many more dimensions. There are
also multiple ways you can calculate appropriate embeddings for a given set of tokens.
Different methods result in different predictions from natural language processing models.

A generalized view of most modern natural language processing solutions is as follows: a large corpus of raw text is tokenized and used to train language models, which can support many different types of natural language processing tasks.

Common NLP tasks supported by language models include:

- Text analysis, such as extracting key terms or identifying named entities in text.
- Sentiment analysis and opinion mining to categorize text as positive or negative.
- Machine translation, in which text is automatically translated from one language to another.
- Summarization, in which the main points of a large body of text are summarized.
- Conversational AI solutions such as bots or digital assistants, in which the language model can interpret natural language input and return an appropriate response.
These capabilities and more are supported by the models in the Azure AI Language service,
which we'll explore next.

Get started with text analysis

Azure AI Language is a part of the Azure AI services offerings that can perform advanced natural language processing over unstructured text. Azure AI Language's text analysis features include:

- Named entity recognition identifies people, places, events, and more. This feature can also be customized to extract custom categories.
- Entity linking identifies known entities together with a link to Wikipedia.
- Personal identifying information (PII) detection identifies personally sensitive information, including personal health information (PHI).
- Language detection identifies the language of the text and returns a language code such as "en" for English.
- Sentiment analysis and opinion mining identifies whether text is positive or negative.
- Summarization summarizes text by identifying the most important information.
- Key phrase extraction lists the main concepts from unstructured text.

Entity recognition and linking

You can provide Azure AI Language with unstructured text and it will return a list of entities that it recognizes in the text. An entity is an item of a particular type or category - and in some cases, subtype - such as those shown in the following table.

| Type | SubType | Example |
|---|---|---|
| Person | | "Bill Gates", "John" |
| Location | | "Paris", "New York" |
| Organization | | "Microsoft" |
| Quantity | Number | "6" or "six" |
| Quantity | Percentage | "25%" or "fifty percent" |
| Quantity | Ordinal | "1st" or "first" |
| Quantity | Age | "90 day old" or "30 years old" |
| Quantity | Currency | "10.99" |
| Quantity | Dimension | "10 miles", "40 cm" |
| Quantity | Temperature | "45 degrees" |
| DateTime | | "6:30PM February 4, 2012" |
| DateTime | Date | "May 2nd, 2017" or "05/02/2017" |
| DateTime | Time | "8am" or "8:00" |
| DateTime | DateRange | "May 2nd to May 5th" |
| DateTime | TimeRange | "6pm to 7pm" |
| DateTime | Duration | "1 minute and 45 seconds" |
| DateTime | Set | "every Tuesday" |
| URL | | "https://www.bing.com" |
| Email | | "[email protected]" |
| US-based Phone Number | | "(312) 555-0176" |
| IP Address | | "10.0.1.125" |

Azure AI Language also supports entity linking to help disambiguate entities by linking to a
specific reference. For recognized entities, the service returns a URL for a relevant Wikipedia
article.

For example, suppose you use Azure AI Language to detect entities in the following restaurant
review extract:

"I ate at the restaurant in Seattle last week."

| Entity | Type | SubType | Wikipedia URL |
|---|---|---|---|
| Seattle | Location | | https://en.wikipedia.org/wiki/Seattle |
| last week | DateTime | DateRange | |
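As a hedged sketch of calling this feature from code (the module itself shows no client code; the endpoint and key are placeholders for your own Language resource), using the azure-ai-textanalytics package for Python:

```python
# Recognize entities in a restaurant review with Azure AI Language.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

result = client.recognize_entities(["I ate at the restaurant in Seattle last week."])[0]
for entity in result.entities:
    print(entity.text, entity.category, entity.subcategory)
```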

Language detection

Use the language detection capability of Azure AI Language to identify the language in which text is written. You can submit multiple documents at a time for analysis. For each document submitted, the service will detect:

- The language name (for example "English").
- The ISO 639-1 language code (for example, "en").
- A score indicating a level of confidence in the language detection.

For example, consider a scenario where you own and operate a restaurant where customers can complete surveys and provide feedback on the food, the service, staff, and so on. Suppose you have received the following reviews from customers:

Review 1: "A fantastic place for lunch. The soup was delicious."

Review 2: "Comida maravillosa y gran servicio."


Review 3: "The croque monsieur avec frites was terrific. Bon appetit!"

You can use the text analytics capabilities in Azure AI Language to detect the language for
each of these reviews; and it might respond with the following results:

| Document | Language Name | ISO 639-1 Code | Score |
|---|---|---|---|
| Review 1 | English | en | 1.0 |
| Review 2 | Spanish | es | 1.0 |
| Review 3 | English | en | 0.9 |

Notice that the language detected for review 3 is English, despite the text containing a mix of
English and French. The language detection service will focus on the predominant language in
the text. The service uses an algorithm to determine the predominant language, such as length
of phrases or total amount of text for the language compared to other languages in the text.
The predominant language will be the value returned, along with the language code. The
confidence score might be less than 1 as a result of the mixed language text.

There might be text that is ambiguous in nature, or that has mixed language content. These situations can present a challenge. An example of ambiguous content is a document that contains limited text, or only punctuation. For example, using Azure AI Language to analyze the text ":-)" results in a value of unknown for the language name and the language identifier, and a score of NaN (which is used to indicate not a number).
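A hedged sketch of language detection with the same azure-ai-textanalytics client (endpoint and key are again placeholders):

```python
# Detect the predominant language of each review.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

reviews = [
    "A fantastic place for lunch. The soup was delicious.",
    "Comida maravillosa y gran servicio.",
    "The croque monsieur avec frites was terrific. Bon appetit!",
]

for doc in client.detect_language(reviews):
    lang = doc.primary_language
    print(lang.name, lang.iso6391_name, lang.confidence_score)
```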

Sentiment analysis and opinion mining


The text analytics capabilities in Azure AI Language can evaluate text and return sentiment
scores and labels for each sentence. This capability is useful for detecting positive and
negative sentiment in social media, customer reviews, discussion forums and more.

Azure AI Language uses a prebuilt machine learning classification model to evaluate the text.
The service returns sentiment scores in three categories: positive, neutral, and negative. In
each of the categories, a score between 0 and 1 is provided. Scores indicate how likely the
provided text is a particular sentiment. One document sentiment is also provided.
For example, the following two restaurant reviews could be analyzed for sentiment:

Review 1: "We had dinner at this restaurant last night and the first thing I noticed was how
courteous the staff was. We were greeted in a friendly manner and taken to our table right
away. The table was clean, the chairs were comfortable, and the food was amazing."

and
Review 2: "Our dining experience at this restaurant was one of the worst I've ever had. The
service was slow, and the food was awful. I'll never eat at this establishment again."

The sentiment score for the first review might be:

Document sentiment: positive
Positive score: .90
Neutral score: .10
Negative score: .00

The second review might return a response:

Document sentiment: negative
Positive score: .00
Neutral score: .00
Negative score: .99
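A hedged sketch of sentiment analysis with the azure-ai-textanalytics package (endpoint and key are placeholders):

```python
# Analyze sentiment for two example reviews.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

reviews = [
    "We had dinner at this restaurant last night and the food was amazing.",
    "The service was slow, and the food was awful.",
]

for doc in client.analyze_sentiment(reviews):
    scores = doc.confidence_scores
    print(doc.sentiment, scores.positive, scores.neutral, scores.negative)
```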

Key phrase extraction


Key phrase extraction identifies the main points from text. Consider the restaurant scenario
discussed previously. If you have a large number of surveys, it can take a long time to read
through the reviews. Instead, you can use the key phrase extraction capabilities of the
Language service to summarize the main points.
You might receive a review such as:

"We had dinner here for a birthday celebration and had a fantastic experience. We were
greeted by a friendly hostess and taken to our table right away. The ambiance was relaxed,
the food was amazing, and service was terrific. If you like great food and attentive service,
you should try this place."

Key phrase extraction can provide some context to this review by extracting the following phrases:

- birthday celebration
- fantastic experience
- friendly hostess
- great food
- attentive service
- dinner
- table
- ambiance
- place
As well as using sentiment analysis to determine that this is a positive review, you can also use
the key phrase service to identify important elements of the review.
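A hedged sketch of key phrase extraction with the azure-ai-textanalytics package (endpoint and key are placeholders):

```python
# Extract key phrases from the example review.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

review = ("We had dinner here for a birthday celebration and had a fantastic "
          "experience. The food was amazing, and service was terrific.")

result = client.extract_key_phrases([review])[0]
print(result.key_phrases)  # e.g. ['birthday celebration', 'fantastic experience', ...]
```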

Create a resource for Azure AI Language


To use Azure AI Language in an application, you must provision an appropriate resource in
your Azure subscription. You can choose either of the following types of resource:
- A Language resource - choose this resource type if you only plan to use Azure AI Language services, or if you want to manage access and billing for the resource separately from other services.
- An Azure AI services resource - choose this resource type if you plan to use Azure AI Language in combination with other Azure AI services, and you want to manage access and billing for these services together.

Knowledge check

1. You want to use Azure AI Language to determine the key talking points in a text document. Which feature of the service should you use?

- Sentiment analysis
- Key phrase extraction (Correct. Key phrases can be used to identify the main talking points in a text document.)
- Entity detection

2. You use Azure AI Language to perform sentiment analysis on a sentence. The confidence scores .04 positive, .36 neutral, and .60 negative are returned. What do these confidence scores indicate about the sentence sentiment?

- The document is positive.
- The document is neutral.
- The document is negative. (Correct. The sentiment is most likely the type with the highest confidence score, in this case .60 negative.)

3. When might you see NaN returned for a score in language detection?

- When the score calculated by the service is outside the range of 0 to 1
- When the predominant language in the text is mixed with other languages
- When the language is ambiguous (Correct. The service will return NaN when it can't determine the language in the provided text.)
Introduction

We are used to being able to communicate at any time of the day or night, anywhere in the world, putting organizations under pressure to react fast enough to their customers. We want personal responses to our queries, without having to read in-depth documentation to find answers. This often means that support staff get overloaded with requests for help through multiple channels, and that people are left waiting for a response.

Conversational AI describes solutions that enable a dialog between an AI agent and a human. Generically, conversational AI agents are known as bots. People can engage with bots through channels such as web chat interfaces, email, social media platforms, and more.

Azure AI Language's question answering feature provides you with the ability to create
conversational AI solutions. Next you'll learn about question answering.

Understand question answering

Question answering supports natural language AI workloads that require an automated conversational element. Typically, question answering is used to build bot applications that respond to customer queries. Question answering capabilities can respond immediately, answer concerns accurately, and interact with users in a natural, multi-turn way. Bots can be implemented on a range of platforms, such as a web site or a social media platform. Question answering applications provide a friendly way for people to get answers to their questions and allow people to deal with queries at a time that suits them, rather than during office hours.

In the following example, a chat bot uses natural language and provides options to a customer to best handle their query. The user gets an answer to their question quickly, and only gets passed to a person if their query is more complicated.

Next, learn how Azure AI services can be used to create a question answering project.

Get started with custom question answering

You can easily create a question answering solution on Microsoft Azure using the Azure AI Language service. Azure AI Language includes a custom question answering feature that enables you to create a knowledge base of question-and-answer pairs that can be queried using natural language input.

Creating a custom question answering knowledge base

You can use Azure AI Language Studio to create, train, publish, and manage question answering projects.

Note

You can write code to create and manage projects using the Azure AI Language REST API or SDK. However, in most scenarios it is easier to use the Language Studio.

To create a project, you must first provision a Language resource in your Azure subscription.

Define questions and answers

After provisioning a Language resource, you can use the Language Studio's custom question answering feature to create a project that consists of question-and-answer pairs. These questions and answers can be:

- Generated from an existing FAQ document or web page.
- Entered and edited manually.

In many cases, a project is created using a combination of both of these techniques: starting with a base dataset of questions and answers from an existing FAQ document, then extending the knowledge base with additional manual entries.
Questions in the project can be assigned alternative phrasing to help consolidate questions
with the same meaning. For example, you might include a question like:

What is your head office location?

You can anticipate different ways this question could be asked by adding an alternative
phrasing such as:

Where is your head office located?

Test the project
After creating a set of question-and-answer pairs, you must save it. This process analyzes your
literal questions and answers and applies a built-in natural language processing model to
match appropriate answers to questions, even when they are not phrased exactly as specified
in your question definitions. Then you can use the built-in test interface in the Language
Studio to test your knowledge base by submitting questions and reviewing the answers that
are returned.
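After deployment, a client application can query the project. A hedged sketch using the azure-ai-language-questionanswering package (endpoint, key, and the project and deployment names are placeholders):

```python
# Query a deployed custom question answering project.
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.questionanswering import QuestionAnsweringClient

client = QuestionAnsweringClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

response = client.get_answers(
    question="Where is your head office located?",
    project_name="<your-project>",   # placeholder
    deployment_name="production",
)

for answer in response.answers:
    print(answer.answer, answer.confidence)
```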

Knowledge check

1. Your organization has an existing frequently asked questions (FAQ) document. You need to create a knowledge base that includes the questions and answers from the FAQ with the least possible effort. What should you do?

- Create an empty knowledge base, and then manually copy and paste the FAQ entries into it.
- Import the existing FAQ document into a new knowledge base. (Correct. You can import question and answer pairs from an existing FAQ document into a question answering knowledge base.)
- Import a pre-defined chit-chat data source.

2. You want to create a knowledge base for your organization's bot service. Which Azure AI service is best suited to creating a knowledge base?

- Conversational Language Understanding
- Question Answering (Correct. Question Answering is part of the Azure AI Language service and enables you to create a knowledge base of question and answer pairs.)
- Optical Character Recognition

Introduction

In 1950, the British mathematician Alan Turing devised the Imitation Game, which has become known as the Turing Test and hypothesizes that if a dialog is natural enough, you might not know whether you're conversing with a human or a computer. As artificial intelligence (AI) grows ever more sophisticated, this kind of conversational interaction with applications and digital assistants is becoming more and more common, and in specific scenarios can result in human-like interactions with AI agents. Common scenarios for this kind of solution include customer support applications, reservation systems, and home automation, among others.

To realize the aspiration of the Imitation Game, computers need not only to be able to accept language as input (either in text or audio format), but also to be able to interpret the semantic meaning of the input - in other words, understand what is being said.

The Azure AI Language service supports conversational language understanding (CLU). You can use CLU to build language models that interpret the meaning of phrases in a conversational setting. One example of a CLU application is one that's able to turn devices on and off based on speech. The application is able to take in audio input such as "Turn the light off", and understand an action it needs to take, such as turning a light off. Many types of tasks involving command and control, end-to-end conversation, and enterprise support can be completed with Azure AI Language's CLU feature.

Describe conversational language understanding

To work with conversational language understanding (CLU), you need to take into account
three core concepts: utterances, entities, and intents.

Utterances
An utterance is an example of something a user might say, and which your application must
interpret. For example, when using a home automation system, a user might use the following
utterances:

"Switch the fan on."

"Turn on the light."

Entities
An entity is an item to which an utterance refers. For example, fan and light in the following
utterances:

"Switch the fan on."

"Turn on the light."

You can think of the fan and light entities as being specific instances of a general device
entity.

Intents
An intent represents the purpose, or goal, expressed in a user's utterance. For example, for
both of the previously considered utterances, the intent is to turn a device on; so in your CLU
application, you might define a TurnOn intent that is related to these utterances.
A CLU application defines a model consisting of intents and entities. Utterances are used to
train the model to identify the most likely intent and the entities to which it should be applied
based on a given input. The home assistant application we've been considering might include
multiple intents, like the following examples:

| Intent | Related Utterances | Entities |
|---|---|---|
| Greeting | "Hello" / "Hi" / "Hey" / "Good morning" | |
| TurnOn | "Switch the fan on" | fan (device) |
| | "Turn the light on" | light (device) |
| | "Turn on the light" | light (device) |
| TurnOff | "Switch the fan off" | fan (device) |
| | "Turn the light off" | light (device) |
| | "Turn off the light" | light (device) |
| CheckWeather | "What is the weather for today?" | today (datetime) |
| | "Give me the weather forecast" | |
| | "What is the forecast for Paris?" | Paris (location) |
| | "What will the weather be like in Seattle tomorrow?" | Seattle (location), tomorrow (datetime) |
| None | "What is the meaning of life?" | |
| | "Is this thing on?" | |

In the table there are numerous utterances used for each of the intents. The intent should be a concise way of grouping the utterance tasks. Of special interest is the None intent. You should consider always using the None intent to help handle utterances that do not map to any of the utterances you have entered. The None intent is considered a fallback, and is typically used to provide a generic response to users when their requests don't match any other intent.

After defining the entities and intents with sample utterances in your CLU application, you can
train a language model to predict intents and entities from user input - even if it doesn't
match the sample utterances exactly. You can then use the model from a client application to
retrieve predictions and respond appropriately.

Get started with conversational language understanding in Azure

Azure AI Language's conversational language understanding (CLU) feature enables you to author a language model and use it for predictions. Authoring a model involves defining entities, intents, and utterances. Generating predictions involves publishing a model so that client applications can take user input and return responses.

Azure resources for conversational language understanding
To use CLU capabilities in Azure, you need a resource in your Azure subscription. You can use the following types of resource:

- Azure AI Language: A resource that enables you to build apps with industry-leading natural language understanding capabilities without machine learning expertise. You can use a language resource for authoring and prediction.
- Azure AI services: A general resource that includes CLU along with many other Azure AI services. You can only use this type of resource for prediction.

The separation of resources is useful when you want to track resource utilization for Azure AI Language separately from client applications that use all Azure AI services.

Authoring
After you've created an authoring resource, you can use it to train a CLU model. To train a
model, start by defining the entities and intents that your application will predict as well as
utterances for each intent that can be used to train the predictive model.
CLU provides a comprehensive collection of prebuilt domains that include pre-defined intents
and entities for common scenarios; which you can use as a starting point for your model. You
can also create your own entities and intents.

When you create entities and intents, you can do so in any order. You can create an intent, and
select words in the sample utterances you define for it to create entities for them; or you can
create the entities ahead of time and then map them to words in utterances as you're creating
the intents.
You can write code to define the elements of your model, but in most cases it's easiest to author your model using the Language Studio - a web-based interface for creating and managing CLU applications.

Training the model
After you have defined the intents and entities in your model, and included a suitable set of
sample utterances; the next step is to train the model. Training is the process of using your
sample utterances to teach your model to match natural language expressions that a user
might say to probable intents and entities.
After training the model, you can test it by submitting text and reviewing the predicted intents.
Training and testing is an iterative process. After you train your model, you test it with sample
utterances to see if the intents and entities are recognized correctly. If they're not, make
updates, retrain, and test again.

Predicting
When you are satisfied with the results from the training and testing, you can publish your
Conversational Language Understanding application to a prediction resource for consumption.
Client applications can use the model by connecting to the endpoint for the prediction
resource, specifying the appropriate authentication key; and submit user input to get
predicted intents and entities. The predictions are returned to the client application, which can
then take appropriate action based on the predicted intent.
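As a hedged sketch of consuming a published CLU model (the module shows no client code; the endpoint, key, and project and deployment names are placeholders), using the azure-ai-language-conversations package:

```python
# Get intent and entity predictions from a published CLU model.
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

client = ConversationAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),                     # placeholder
)

result = client.analyze_conversation(task={
    "kind": "Conversation",
    "analysisInput": {
        "conversationItem": {"id": "1", "participantId": "user", "text": "Turn the light off"},
    },
    "parameters": {
        "projectName": "<your-project>",   # placeholder
        "deploymentName": "production",
    },
})

prediction = result["result"]["prediction"]
print(prediction["topIntent"])                 # e.g. TurnOff
for entity in prediction["entities"]:
    print(entity["category"], entity["text"])  # e.g. device light
```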

Knowledge check

1. You need to provision an Azure resource that will be used to author a new conversational language understanding application. What kind of resource should you create?

- Azure AI Speech
- Azure AI Language (Correct. To author a conversational language understanding model, you need an Azure AI Language resource.)
- Azure AI services

2. You are authoring a conversational language understanding application to support an international clock. You want users to be able to ask for the current time in a specified city, for example "What is the time in London?". What should you do?

- Define a "city" entity and a "GetTime" intent with utterances that indicate the city entity. (Correct. The intent encapsulates the task (getting the time) and the entity specifies the item to which the intent is applied (the city).)
- Create an intent for each city, each with an utterance that asks for the time in that city.
- Add the utterance "What time is it in city" to the "None" intent.

3. You have published your conversational language understanding application. What information does a client application developer need to get predictions from it?

- The endpoint and key for the application's prediction resource (Correct. Client applications must connect to the endpoint of the prediction resource, specifying an associated authentication key.)
- The endpoint and key for the application's authoring resource
Introduction

AI speech capabilities enable us to manage home and auto systems with voice instructions,
get answers from computers for spoken questions, generate captions from audio, and much
more.

To enable this kind of interaction, the AI system must support at least two capabilities:

- Speech recognition - the ability to detect and interpret spoken input
- Speech synthesis - the ability to generate spoken output

Azure AI Speech provides speech to text, text to speech, and speech translation capabilities
through speech recognition and synthesis. You can use prebuilt and custom Speech service
models for a variety of tasks, from transcribing audio to text with high accuracy, to identifying
speakers in conversations, creating custom voices, and more. Next you'll learn how AI speech
capabilities work.

Understand speech recognition and synthesis

Speech recognition takes the spoken word and converts it into data that can be processed -
often by transcribing it into text. The spoken words can be in the form of a recorded voice in
an audio file, or live audio from a microphone. Speech patterns are analyzed in the audio to
determine recognizable patterns that are mapped to words. To accomplish this, the software
typically uses multiple models, including:
- An acoustic model that converts the audio signal into phonemes (representations of specific sounds).
- A language model that maps phonemes to words, usually using a statistical algorithm that predicts the most probable sequence of words based on the phonemes.

The recognized words are typically converted to text, which you can use for various purposes, such as:

- Providing closed captions for recorded or live videos
- Creating a transcript of a phone call or meeting
- Automated note dictation
- Determining intended user input for further processing

Speech synthesis is concerned with vocalizing data, usually by converting text to speech. A speech synthesis solution typically requires the following information:

- The text to be spoken
- The voice to be used to vocalize the speech

To synthesize speech, the system typically tokenizes the text to break it down into individual
words, and assigns phonetic sounds to each word. It then breaks the phonetic transcription
into prosodic units (such as phrases, clauses, or sentences) to create phonemes that will be
converted to audio format. These phonemes are then synthesized as audio and can be
assigned a particular voice, speaking rate, pitch, and volume.

You can use the output of speech synthesis for many purposes, including:

- Generating spoken responses to user input
- Creating voice menus for phone systems
- Reading email or text messages aloud in hands-free scenarios
- Broadcasting announcements in public locations, such as railway stations or airports

Get started with speech on Azure

Microsoft Azure offers speech recognition and synthesis capabilities through the Azure AI Speech service, which supports many capabilities, including:

- Speech to text
- Text to speech

Note

This module covers speech to text and text to speech capabilities. A separate module covers speech translation in Azure AI services.

Speech to text

You can use the Azure AI Speech to text API to perform real-time or batch transcription of audio into a text format. The audio source for transcription can be a real-time audio stream from a microphone or an audio file.

The model used by the Speech to text API is based on the Universal Language Model that was trained by Microsoft. The data for the model is Microsoft-owned and deployed to Microsoft Azure. The model is optimized for two scenarios, conversational and dictation. You can also create and train your own custom models, including acoustics, language, and pronunciation, if the pre-built models from Microsoft don't provide what you need.

Real-time transcription: Real-time speech to text allows you to transcribe text in audio
streams. You can use real-time transcription for presentations, demos, or any other scenario
where a person is speaking.
In order for real-time transcription to work, your application needs to be listening for
incoming audio from a microphone, or other audio input source such as an audio file. Your
application code streams the audio to the service, which returns the transcribed text.
Batch transcription: Not all speech to text scenarios are real time. You might have audio
recordings stored on a file share, a remote server, or even on Azure storage. You can point to
audio files with a shared access signature (SAS) URI and asynchronously receive transcription
results.
Batch transcription should be run in an asynchronous manner because the batch jobs are
scheduled on a best-effort basis. Normally a job starts executing within minutes of the request
but there's no estimate for when a job changes into the running state.
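A hedged sketch of real-time transcription using the azure-cognitiveservices-speech Python package (the key and region are placeholders for your own Speech resource):

```python
# Transcribe one utterance from the default microphone in real time.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="<your-key>",   # placeholder
    region="<your-region>",      # placeholder
)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
```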

Text to speech
The text to speech API enables you to convert text input to audible speech, which can either
be played directly through a computer speaker or written to an audio file.
Speech synthesis voices: When you use the text to speech API, you can specify the voice to be
used to vocalize the text. This capability offers you the flexibility to personalize your speech
synthesis solution and give it a specific character.
The service includes multiple pre-defined voices, with support for multiple languages and regional pronunciation, including neural voices that leverage neural networks to overcome common limitations in speech synthesis with regard to intonation, resulting in a more natural sounding voice. You can also develop custom voices and use them with the text to speech API.
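A hedged sketch of speech synthesis with the same package; the voice name shown is one of the service's prebuilt neural voices, chosen here only as an example:

```python
# Synthesize a short message using a prebuilt neural voice.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="<your-key>",   # placeholder
    region="<your-region>",      # placeholder
)
speech_config.speech_synthesis_voice_name = "en-US-AriaNeural"  # example voice

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Your meeting starts in five minutes.").get()
```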

Supported Languages

Both the speech to text and text to speech APIs support a variety of languages. Use the links below to find details about the supported languages:

- Speech to text languages
- Text to speech languages

Use Azure AI Speech

Azure AI Speech is available for use through several tools and programming languages, including:

- Studio interfaces
- Command Line Interface (CLI)
- REST APIs and Software Development Kits (SDKs)

Using studio interfaces

You can create Azure AI Speech projects using user interfaces with Speech Studio or Azure AI Studio.


Azure resources for Azure AI Speech

To use Azure AI Speech in an application, you must create an appropriate resource in your Azure subscription. You can choose to create either of the following types of resource:

- A Speech resource - choose this resource type if you only plan to use Azure AI Speech, or if you want to manage access and billing for the resource separately from other services.
- An Azure AI services resource - choose this resource type if you plan to use Azure AI Speech in combination with other Azure AI services, and you want to manage access and billing for these services together.

Knowledge check

1. You plan to build an application that uses Azure AI Speech to transcribe audio recordings of phone calls into text, and then submit the transcribed text to Azure AI Language to extract key phrases. You want to manage access and billing for the application services with a single Azure resource. Which type of Azure resource should you create?

- Speech
- Language
- Azure AI services (Correct. This resource would support both the Azure AI Speech and Azure AI Language services.)

2. You want to use Azure AI Speech service to build an application that reads incoming email message subjects aloud. Which API should you use?

- Speech to text
- Text to speech (Correct. The Text to speech API converts text to audible speech.)
- Translator

3. What is the main function of the Azure AI Speech to text API?

- It converts text into audible speech.
- It translates speech from one language to another.
- It performs real-time or batch transcription of audio into a text format. (Correct. The Azure AI Speech to text API transcribes audio into text, either in real-time or in batches.)
Introduction

As organizations and individuals collaborate with people in other cultures and geographic locations, they continue to need ways to remove language barriers.

One solution is to hire multilingual people to translate between languages. However, the scarcity of such skills, and the number of possible language combinations, can make this approach difficult to scale. Increasingly, automated translation, sometimes known as machine translation, is being employed to solve this problem.

In this module, we explore Azure AI Translator and Azure AI Speech's cloud-based neural machine translation capabilities.

Understand translation concepts

One of the many challenges of translation between languages is that words don't have a one-to-one replacement between languages. Machine translation advancements are needed to improve the communication of meaning and tone between languages.

Literal and semantic translation

Early attempts at machine translation applied literal translations. A literal translation is where each word is translated to the corresponding word in the target language. This approach presents some issues. In one case, there may not be an equivalent word in the target language. In another case, literal translation can change the meaning of the phrase or fail to capture the context correctly.

Artificial intelligence systems must be able to understand not only the words, but also the semantic context in which they're used. In this way, the service can return a more accurate translation of the input phrase or phrases. The grammar rules, formal versus informal registers, and colloquialisms all need to be considered.

Text and speech translation

Text translation can be used to translate documents from one language to another, translate email communications that come from foreign governments, and even provide the ability to translate web pages on the Internet. You will often see a Translate option for posts on social media sites, and the Bing search engine can offer to translate entire web pages that are returned in search results.

Speech translation is used to translate between spoken languages, sometimes directly (speech-to-speech translation) and sometimes by translating to an intermediary text format (speech-to-text translation).

Understand translation in Azure

Microsoft provides Azure AI services that support translation. Specifically, you can use the following services:

- The Azure AI Translator service, which supports text-to-text translation.
- The Azure AI Speech service, which enables speech to text and speech-to-speech translation.

Azure AI Translator
Azure AI Translator is easy to integrate in your applications, websites, tools, and solutions. The
service uses a Neural Machine Translation (NMT) model for translation, which analyzes the
semantic context of the text and renders a more accurate and complete translation as a result.
Language support: Azure AI Translator supports text-to-text translation between more than 130 languages. When using the service, you must specify the language you are translating from and the language you are translating to using ISO 639-1 language codes, such as en for English, fr for French, and zh for Chinese. Alternatively, you can specify cultural variants of languages by extending the language code with the appropriate ISO 3166-1 cultural code - for example, en-US for US English, en-GB for British English, or fr-CA for Canadian French. When using Azure AI Translator, you can specify one from language with multiple to languages, enabling you to simultaneously translate a source document into multiple languages.
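As a hedged sketch (the module doesn't show request code), calling the Translator text translation REST API with Python's requests library; the key and region are placeholders:

```python
# Translate an English sentence into French and Spanish in one request.
import requests

endpoint = "https://api.cognitive.microsofttranslator.com/translate"
params = {"api-version": "3.0", "from": "en", "to": ["fr", "es"]}
headers = {
    "Ocp-Apim-Subscription-Key": "<your-key>",        # placeholder
    "Ocp-Apim-Subscription-Region": "<your-region>",  # placeholder
    "Content-Type": "application/json",
}
body = [{"text": "Hello, how are you?"}]

response = requests.post(endpoint, params=params, headers=headers, json=body)
for translation in response.json()[0]["translations"]:
    print(translation["to"], ":", translation["text"])
```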

Azure AI Speech
You can use Azure AI Speech to translate spoken audio from a streaming source, such as a
microphone or audio file, and return the translation as text or an audio stream. This enables
scenarios such as real-time closed captioning for a speech or simultaneous two-way
translation of a spoken conversation.

Language support: As with Azure AI Translator, you can specify one source language and one
or more target languages to which the source should be translated with Azure AI Speech. You
can translate speech into over 90 languages. The source language must be specified using the
extended language and culture code format, such as es-US for American Spanish. This
requirement helps ensure that the source is understood properly, allowing for localized
pronunciation and linguistic idioms. The target languages must be specified using a two-
character language code, such as en for English or de for German.
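A hedged sketch of speech translation with the azure-cognitiveservices-speech package, following the language-code rules above (key and region are placeholders):

```python
# Translate spoken English (en-US) from the microphone into Hindi text.
import azure.cognitiveservices.speech as speechsdk

config = speechsdk.translation.SpeechTranslationConfig(
    subscription="<your-key>",   # placeholder
    region="<your-region>",      # placeholder
)
config.speech_recognition_language = "en-US"  # source: extended language-culture code
config.add_target_language("hi")              # target: two-character language code

recognizer = speechsdk.translation.TranslationRecognizer(translation_config=config)
result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print(result.translations["hi"])
```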

Get started with translation in Azure

You can use Azure AI Translator with a programming language of your choice or via the REST API, and some of its features are available in Language Studio. You can get started with Azure AI Speech through Speech Studio, a programming language of your choice, or the REST API.

Azure resources for Azure AI Translator and Azure AI Speech

Before you can use Azure AI Translator or Azure AI Speech, you must provision appropriate resources in your Azure subscription.

There are dedicated Translator and Speech resource types for these services, which you can
use if you want to manage access and billing for each service individually.

Alternatively, you can create an Azure AI services resource that provides access to both
services through a single Azure resource, consolidating billing and enabling applications to
access both services through a single endpoint and authentication key.

Using Azure AI Translator

Azure AI Translator includes the following capabilities:

- Text translation - used for quick and accurate text translation in real time across all supported languages.
- Document translation - used to translate multiple documents across all supported languages while preserving original document structure.
- Custom translation - used to enable enterprises, app developers, and language service providers to build customized neural machine translation (NMT) systems.

Azure AI Translator's application programming interface (API) offers some optional configuration to help you fine-tune the results that are returned, including:

- Profanity filtering. Without any configuration, the service will translate the input text without filtering out profanity. Profanity levels are typically culture-specific, but you can control profanity translation by either marking the translated text as profane or by omitting it in the results.
- Selective translation. You can tag content so that it isn't translated. For example, you may want to tag code, a brand name, or a word/phrase that doesn't make sense when localized.

Speech translation with Azure AI Speech

Azure AI Speech includes the following capabilities:

- Speech to text - used to transcribe speech from an audio source to text format.
- Text to speech - used to generate spoken audio from a text source.
- Speech Translation - used to translate speech in one language to text or speech in another.

Note

You can learn more about Azure AI Speech and Speech Studio with the Learn module Fundamentals of Azure AI Speech.

Knowledge check

1. What is the main function of the Azure AI Translator service?

- To translate spoken audio from a streaming source into text or an audio stream.
- To support text-to-text translation between more than 130 languages using a Neural Machine Translation model. (Correct. Azure AI Translator uses a Neural Machine Translation model to analyze the semantic context of the text and render a more accurate and complete translation.)
- To support multiple AI capabilities including text analysis, translation, and speech.

2. Your team would like to build an application that translates digital copies of books. Which Azure AI Translator capability would you use?

- Text translation
- Document translation (Correct. Document translation supports the processing of multiple documents and large files.)
- Custom translation

3. You're developing an application that must take English input from a microphone and generate a real-time audio output in Hindi. Which capability of Azure AI Speech would you use?

- Text-to-speech
- Speech translation (Correct. Azure AI Speech can translate audio from one language into audio in another language.)
