HCIP-AI-EI Developer V2.0 Training Material
• Advantage: SELU avoids the zero-gradient region of ReLU and achieves an effect similar to batch normalization.
• Gradient explosion: This problem occurs when the initial weights are too large in a deep network.
• The momentum optimizer accelerates training, reduces oscillation, and helps the parameters escape local extrema.
• The Adagrad optimizer is like playing golf. At the beginning, the ball is far from the target, so a relatively large force is used to reduce the number of strokes. When the ball is near the target, a relatively small force is used to sink the ball at the target point.
• Adam computes momentum and an adaptive learning rate for each parameter, which is useful in complex network structures because different parts of the network have different sensitivity to weight adjustment: very sensitive parts generally require lower learning rates. Manually identifying the sensitive parts and setting special learning rates for them would be difficult and cumbersome. Adam is probably the best optimizer so far.
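As an illustration, a minimal sketch of a single Adam update step in NumPy (the hyperparameter values shown are the commonly used defaults, not prescribed by this material):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: per-parameter momentum (m) and adaptive scale (v)."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (adaptive lr)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```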
• The graph shows that with only two hidden layers, the performance improvement of the network is limited regardless of the number of neurons. Adding one more hidden layer improves the model's performance significantly.
• Similar to cutting a paper window flower: folding the paper a few more times means fewer cuts are needed.
• ResNet observes that when the network is too deep, performance deteriorates because of the vanishing gradient problem, and proposes a residual structure to alleviate it.
• Answers:
▫ BCD
• Image Source: stanford.edu CS231N
• 8-bit images are most commonly used in computers.
• YUV: Human eyes are more sensitive to luminance (brightness) than to chrominance, so the UV data can be compressed in a way that is difficult for human eyes to detect. Therefore, the first step of the compression algorithm is to convert RGB data into YUV data, then compress Y less and UV more to balance image quality and compression ratio.
• Principle of the weighted mean value method: The human eye is most sensitive
to green, followed by red and blue (least sensitive).
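A minimal sketch of the weighted mean value method, assuming the widely used BT.601 weights (0.299, 0.587, 0.114), which give green the largest weight:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted mean grayscale: green weighted highest, blue lowest."""
    weights = np.array([0.299, 0.587, 0.114])   # R, G, B sensitivity weights
    return (rgb[..., :3] @ weights).astype(np.uint8)
```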
• The intensity histogram reflects the frequencies of pixels with different gray
levels in an image. The intensity histogram is a relationship graph, which uses the
gray level as the horizontal coordinate and the frequency as the vertical
coordinate. The intensity histogram is an important feature of an image and
reflects the intensity distribution of the image.
• The upper right figure shows the specification result, and the lower right figure
shows the result histogram. Compared with histogram equalization, the details of
the sky in the image after histogram specification are clearer.
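For reference, computing an intensity histogram and performing histogram equalization with OpenCV (a sketch; the file name is a placeholder):

```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)      # placeholder file name
hist = cv2.calcHist([img], [0], None, [256], [0, 256])   # gray level vs. frequency
equalized = cv2.equalizeHist(img)                        # flatten the histogram
```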
• The grayscale transformations mentioned earlier operate on single pixels: each output value depends only on the corresponding input pixel. Histogram transformation applies to an image globally, and its output values depend on the entire image.
• Different from the preceding two methods, spatial filtering is a local processing
method. The output value of pixel 𝑃 is determined by the pixel values of 𝑃 and its
neighborhood 𝑁 in the input image.
• The calculation method is to perform a template operation on pixel values of
neighborhood 𝑁 and a subimage with the same size as the neighborhood. The
subimage is referred to as a template or filter.
• Common template operations include template convolution and template sorting.
• The two processing methods are commonly used and suitable for different
algorithms. In complex image processing tasks, the two methods are frequently
switched and combined to complete a task.
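A sketch of the two common template operations with OpenCV: template convolution (a 3x3 mean filter) and template sorting (a median filter):

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
kernel = np.ones((3, 3), np.float32) / 9             # 3x3 averaging template
smoothed = cv2.filter2D(img, -1, kernel)             # template convolution
median = cv2.medianBlur(img, 3)                      # template sorting (median)
```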
• Husky, Golden retriever, and Border collie
• Information can be divided into two categories. One type can be represented by data or a unified structure, which is called structured data, such as numbers and symbols. The other type, such as text, images, sound, and web pages, cannot be represented by numbers or uniform structures; we call it unstructured data.
• Binarization of an image reduces the amount of data and highlights the contour of the object. The set properties of a binarized image depend only on the positions of points whose pixel value is 0 or 255, not on multi-level pixel values, which simplifies processing and reduces the amount of data to be processed.
• In order to adapt to the uneven illumination image, the adaptive threshold
algorithm also derives local adaptive threshold segmentation, which can
calculate the threshold for the local area in different regions of the image to
achieve the best segmentation effect.
• According to the actual problem, selecting or designing an appropriate adaptive
threshold algorithm can greatly improve the segmentation effect and subsequent
processing.
• bimodal method
▫ The OTSU algorithm divides the image into foreground and background according to its grayscale features. The greater the inter-class variance between foreground and background, the greater the difference between the two parts that make up the image. If the threshold is chosen incorrectly, the inter-class variance between foreground and background decreases. Therefore, OTSU traverses all possible thresholds and selects the one that maximizes the inter-class variance as the optimal threshold.
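A sketch of both global OTSU thresholding and local adaptive thresholding with OpenCV (the file name and block-size values are illustrative):

```python
import cv2

img = cv2.imread("doc.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
# OTSU: search for the threshold that maximizes inter-class variance
t, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Local adaptive threshold for unevenly lit images
local = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                              cv2.THRESH_BINARY, 31, 5)
```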
• First, we divide the image into small connected areas called cell units; then we collect the gradients and edge directions of the pixels in each cell unit and accumulate them into a one-dimensional gradient histogram per cell.
▫ Crop the corresponding area of the proposal on the original image, align
the area to the same scale, and extract features of each area of the original
image after alignment using the neural network.
• Post-processing
Foreword
⚫ Speech technology has gradually changed the way we live and work in
recent years. Voice has become a major way for people to communicate
with some devices, including voice assistants, smart speakers, and robots.
So how does the machine understand what we say and answer our
questions? This chapter will take you to unveil speech processing
technology.
Objectives
Contents
1. Speech Processing
◼ Overview of Speech Processing
Speech Processing
Speech Signal Analysis and Feature Extraction
2. Speech Recognition
3. Text-to-Speech Synthesis
Overview of Speech Processing (1)
⚫ Speech signal processing is a general term for the study of various processing technologies, such as the speech production process, statistical properties of speech signals, automatic speech recognition, machine synthesis, and speech perception.
⚫ Since modern speech processing technologies are based on digital computing and
implemented by microprocessors, signal processors or general-purpose computers, it is
also called digital speech signal processing.
Overview of Speech Processing (2)
⚫ The study of speech signal processing originated from the simulation of the vocal
organs.
⚫ Linguistic information is mainly contained in the parameters of the speech signal, so extracting these parameters accurately and quickly is the key.
Main Application Scenarios of Speech Processing
⚫ Technologies: speech preprocessing, speech recognition, speaker recognition, speech translation, speech synthesis, voiceprint recognition, speech coding.
⚫ Scenarios: man-machine interaction, security protection, smart home, smart city, elderly care, education, customer service.
Linguistics
⚫ There are certain rules followed when communicating with each other to ensure
smooth communication and correct information transmission. For example, we need to
comply with:
Unified lexical meaning
Uniform pronunciation
Unified grammar specifications
Unified writing mode
……
⚫ All these are the linguistic research contents to explore the essential laws of language.
Linguistics
⚫ Linguistics is the science of language as the object of study. Its research object is
human language, its task is to study and describe the structure, function and
historical development of language, find out the essence of language and
explore the law of language.
⚫ Phonology, grammar, vocabulary, and writing all focus on the structure of language itself; they form the core of linguistics, called microlinguistics.
⚫ Texts are images or symbols that record language, used to store and communicate ideas.
Phonetics (1)
⚫ Phonetics is a branch of linguistics that studies human speech sounds, mainly the pronunciation mechanism, phonetic characteristics, and sound changes in speech.
⚫ Phonetics in the narrow sense focuses on the specific nature of speech sounds and the methods of producing them. In contrast, phonology studies the abstract system of rules for sounds and their distinguishing features within a language.
⚫ Phonetics in the broad sense is the sum of phonetics and phonology.
(Figure: within linguistics, phonetics covers the essence of speech and how it is generated; phonology covers speech features and their operational rules.)
Phonetics (2)
⚫ Phonetics can be roughly divided into four main types:
Pronunciation phonetics: the study of how the sound of speech is produced through the vocal
organs in the mouth (such as the lips, teeth, tongue, vocal cords, etc.).
Acoustic phonetics: the study of how to perform acoustic analysis of speech sounds, such as the
frequency, duration, and amplitude of sound waves.
Auditory phonetics: the study of how the human ear receives sound, i.e. the auditory perception of
the human ear to the speech.
Language phonetics: how to combine sound, social environment, personal habits, and language laws.
◼ People in the same area pronounce the same word or sentence differently.
◼ People in different areas may pronounce the same word or sentence differently.
◼ The same person pronounces the same word or sentence differently in different situations and emotional states.
Contents
1. Speech Processing
Overview of Speech Processing
◼ Speech Processing
Speech Signal Analysis and Feature Extraction
2. Speech Recognition
3. Text-to-Speech Synthesis
Voice Source
⚫ The vocal organs are divided into three parts: sublarynx, larynx and upper larynx.
The sublarynx runs from the trachea to the lungs. The air flowing out of the lungs becomes the sound
source of speech.
The laryngeal part mainly comprises the glottis and vocal cords. The vocal cords are two ligaments that act as valves for the throat, opening and closing the glottis. When the glottis is open, air flows smoothly; otherwise, the air bursts through, making the vocal cords vibrate periodically to produce sound.
The upper larynx includes three areas (the pharynx, mouth, and nasal cavity), which are mainly used to modulate speech.
Voice data
⚫ Speech: a voice that carries language information. It is a combination of acoustics and language.
⚫ Features of speech signals:
Speech signals vary with time and are non-stationary. However, constrained by the oral muscles, they are relatively stable over a short time range and can be regarded as a quasi-stationary process; speech signals therefore have short-term stationarity.
The short-term analysis technology is used throughout the whole process of speech signal
analysis.
Speech is produced by the movement of oral muscles, which move very slowly compared
to the frequency of speech.
Generally, the short time range is 10 ms to 30 ms.
Speech Signal Preprocessing
⚫ Generally, voice files are in .wav format. In the following figure, the horizontal coordinate
indicates the number of sampling points, and the vertical coordinate indicates the amplitude.
⚫ The speech signals need to be preprocessed. The main problems of the speech signals are as
follows:
The distribution of waveform data is very uneven.
There are silent segments at the beginning and end.
The waveform contains noise.
Speech Signal Preprocessing Procedure
⚫ Speech signal preprocessing includes:
Digitalization: Discretizes analog speech signals collected from sensors into digital signals.
Pre-emphasis: The purpose of pre-emphasis is to emphasize the high-frequency part of speech, eliminate
the impact of lip radiation, and increase the high-frequency resolution of speech.
Endpoint detection: Identify and eliminate long-time silence segments from speech signals to reduce
interference from the environment to signals.
Frame division: Short-term analysis is required because the voice is stable in a short time. That is, signals
are segmented. Each segment is called a frame (generally 10 ms to 30 ms).
Windowing: The speech signal is divided into frames by weighting it with a movable window of limited length. The purpose of windowing is to reduce the truncation effect of speech frames. Common windows are the rectangular window, the Hanning window, and the Hamming window.
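A sketch of frame division plus Hamming windowing in NumPy (frame_len=400 and frame_shift=160 correspond to 25 ms frames with a 10 ms shift at 16 kHz sampling; the values are illustrative):

```python
import numpy as np

def frame_and_window(signal, frame_len=400, frame_shift=160):
    """Split a signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    frames = [signal[i * frame_shift : i * frame_shift + frame_len] * window
              for i in range(n_frames)]
    return np.array(frames)
```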
Speech Signal Preprocessing - Pre-emphasis
⚫ Lip radiation: Lip radiation causes energy loss, which is obvious in high frequency bands
and has little impact on low frequency bands. Therefore, we use a high-pass filter to
pre-emphasize the signal to enhance the high-frequency part resolution.
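A common first-order high-pass pre-emphasis filter, sketched in NumPy (the coefficient 0.97 is a typical choice, not mandated by this material):

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """High-pass pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```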
Speech Signal Preprocessing – Frame blocking
⚫ Frame blocking: Partitioning the speech signals into fixed-length frames.
Although a continuous segmentation method may be used for frame division, an overlapping
segmentation method is usually used to ensure smooth transition between frames and maintain
continuity. The overlap between the previous frame and the next frame is called a frame shift.
(Figure: window size, frame shift, and the overlap between consecutive frames.)
Speech Signal Preprocessing - Windowing
⚫ Windowing: The framed speech signal is then windowed to minimize discontinuities at the start and end of each frame. The more frames the signal is split into, the greater the error between the frames and the original signal. Windowing increases the sharpness of harmonics and removes signal discontinuities by tapering the beginning and end of each frame toward zero; it also reduces the spectral distortion caused by the overlap.
⚫ Different window functions affect the results of speech signal analysis. Rectangular window has
good smoothness, but the details of the waveform are lost and leakage occurs. Hamming window
can effectively overcome the leakage phenomenon and has the widest application scope.
Contents
1. Speech Processing
Overview of Speech Processing
Speech Processing
◼ Speech Signal Analysis and Feature Extraction
2. Speech Recognition
3. Text-to-Speech Synthesis
Speech Features (1)
⚫ When analyzing an audio file, speech features are required that reflect the essence of the voice and facilitate subsequent processing. Therefore, the design of speech features is an important part of speech processing.
⚫ Speech feature is the core information of speech description and plays an important role in
speech model construction.
⚫ Speech features:
Contains valid information to distinguish phonemes: time domain resolution and frequency domain
resolution.
Separate the fundamental frequency F0 and its harmonic components.
Robustness for different speakers.
Robustness to noise or channel distortion.
Good pattern recognition characteristics: low-dimensional features, independent features.
Speech Features (2)
⚫ The feature extraction methods are as follows:
Linear Prediction Coefficient(LPC)
Linear Prediction Cepstral Coefficients (LPCC)
Discrete Wavelet Transform (DWT)
Line Spectral Frequencies (LSF)
Mel Frequency Cepstral Coefficients (MFCC)
Perceptual Linear Prediction (PLP)
⚫ The most commonly used speech feature in speech recognition and speaker
recognition is Mel-Frequency Cepstral Coefficients(MFCC).
Speech Analysis (1)
⚫ Speech signal analysis does not refer to speech signal preprocessing; rather, processes such as noise reduction and smoothing are called speech signal analysis.
⚫ Importance of speech signal analysis:
The quality of speech synthesis and the recognition rate depend on the accuracy and precision of speech
signal analysis.
Speech signal analysis is the basis and prerequisite of speech synthesis, speech recognition, speech
enhancement, and target speech extraction, which can be better used in different service scenarios.
Speech Analysis(2)
⚫ There are many methods for analyzing speech signals according to specific
requirements. Speech analysis can be classified into the following types:
Time domain analysis
Frequency domain analysis
Cepstral (inverted frequency) analysis
Wavelet domain analysis
……
⚫ The analysis methods are classified into the following two types:
Model analysis method
Non-model analysis method
Time Domain Speech Analysis
⚫ Time domain analysis is to analyze and extract time domain parameters of speech
signals, which is the earliest and most widely used analysis method (speech signals are
time domain signals). It is usually used for parameter analysis and application such as
speech segmentation, preprocessing, and classification.
⚫ List of Short-Time Features:
Short-time energy
Short-time zero-cross rate(ZCR)
Short-time auto-correlation
Short-time amplitude difference
Frequency Domain Speech Analysis
⚫ Frequency domain analysis of speech signals is to analyze and extract frequency
domain parameters of speech signals.
⚫ The most common frequency domain analysis method is Fourier analysis.
Speech signal is a non-stationary process, so short-time Fourier transform is
needed to analyze the spectrum of speech signal. The resonance peak
characteristics, pitch frequency and harmonic frequency of speech signals can
be observed through the spectrum of speech signals.
Speech Features
⚫ The most commonly used speech feature in speech recognition and speaker recognition
is Mel-Frequency Cepstral Coefficients(MFCC).
⚫ MFCC processor:
Pre-emphasis
Frame Blocking
Windowing
Fast Fourier Transform(FFT)
Mel-Scale filter bank
Log magnitude: $\log|\cdot|$
Discrete Cosine Transform(DCT)
http://speech.ee.ntu.edu.tw/~tlkagk/courses/DLHLP20/ASR%20%28v12%29.pdf
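For reference, the whole pipeline above is available in libraries such as librosa (a sketch; the file name and the choice of 13 coefficients are illustrative):

```python
import librosa

y, sr = librosa.load("speech.wav", sr=16000)        # placeholder file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
```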
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
Speech - Text
Speech Recognition
⚫ Speech recognition is the technology that enables machines to recognize and understand speech signals, converting them into text or commands.
⚫ Speech recognition technologies include signal processing, pattern recognition,
probability theory and information theory, sound generation mechanism, auditory
mechanism, artificial intelligence, and so on.
(Figure: input speech, output text "I love you.")
History of Speech Recognition
Current Situation of Speech Recognition
⚫ Speech recognition is a sensory intelligence in artificial intelligence, and has been
applied in various fields, such as home appliances, communications, automobiles,
healthcare, and home services.
⚫ Currently, the recognition rate of some companies in standard data sets or quiet near-
field environments has reached 97%, but the recognition rate in real scenarios is far
from the expected level.
Difficulties in Speech Recognition
⚫ Difficulties in speech recognition tasks:
Regional;
Scenario-specific;
Physiological.
⚫ To sum up, the difficulty of speech recognition lies in variability: the same word or sentence may be pronounced differently because of different factors.
Speech Recognition - Isolated Word Recognition
⚫ Isolated word recognition: At the early stage of speech processing, a small
number of isolated words are identified. The input voice file contains only one
word. Then, the model is used to identify the word to which the file belongs.
The common model is GMM-HMM.
(Figure: an input voice file containing one word is mapped to an output label such as "0", "2", …, "9".)
Speech Recognition - Continuous Speech Recognition
⚫ Continuous Speech Recognition: In practice, a few isolated words cannot meet actual application
requirements. Most of the requirements need to recognize consecutive sentences. Therefore, if a
few isolated words are still used, the following problems may occur:
The entire file needs to be split into isolated words, which requires a lot of manual work and cannot guarantee accuracy, because the pronunciations of many words run together.
Even if the vocabulary is perfectly split, the number of words in actual use is so large that the matching strategy used for isolated word recognition has no advantage here, and may even become a disadvantage due to the large vocabulary.
“I love you”
“Nice weather!”
……
Traditional Speech Recognition Task Process
(Figure: an uncompressed voice file is preprocessed and then decoded using an acoustic model and a language model.)
Speech Recognition Algorithm
⚫ Traditional speech recognition algorithm: GMM-HMM.
⚫ There are many kinds of speech recognition algorithms based on deep learning,
and new models and algorithms are proposed. These models can be divided into
two directions.
Hybrid Model
End2end Model
Speech Recognition Application
⚫ Many apps use speech recognition for voice interaction and voice-operated functions in daily use. In smart homes, appliances can be controlled by voice without a remote control.
Voice typewriter
Voice search
Voice assistant
Smart Speaker
Customer service robot
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
Text-To-Speech Synthesis
⚫ Speech synthesis, also known as Text-To-Speech (TTS), can convert any text information into corresponding speech.
⚫ Speech synthesis, which involves acoustics, linguistics, digital signal processing and
computer science, is an advanced technology in information processing.
⚫ Synthesizing high-quality speech depends not only on various rules, including semantic, lexical, and phonetic rules, but also on a good understanding of the text content, which involves natural language understanding.
Application Scenarios of TTS
⚫ Service robot
⚫ Customer service system
⚫ Smart furniture
⚫ Trip navigation
⚫ Reading software
TTS System
⚫ A complete TTS system process is to first convert the text sequence into a phonological sequence
and then generate speech waveforms based on the phonological sequence. Where:
Step 1 involves linguistic processing, for example, particle and pronunciation conversion, and a whole set
of valid rhythm control rules.
Step 2 requires an advanced speech synthesis technology, which can synthesize high-quality speech
streams in real time based on requirements.
⚫ Research on speech synthesis technologies has a history of more than 200 years. Truly practical modern speech synthesis developed together with computer technology and digital signal processing, enabling computers to generate highly clear and natural continuous speech.
Speech Synthesis Flow
(Figure: the speech synthesis flow, from text analysis to waveform synthesis.)
Text Analysis
⚫ The main task of text analysis in speech synthesis is to convert text data into a phonemic internal representation. Specific content includes:
Text normalization: Natural text data of all types is preprocessed or normalized,
including word example restoration of sentences and disambiguation of non-
standard words and homonyms.
Speech analysis: The next step of text normalization is speech analysis. Specific
methods include using a large pronunciation dictionary and word-to-phoneme
conversion rules.
Rhythmic analysis: Analyzes tone patterns and rhyming rules of text, including the
rhythmic mechanism, rhythmic prominence, and tone.
Speech Synthesis Method
⚫ During the development of speech synthesis technologies, early research primarily adopted the parameter synthesis method. Later, with the development of computer technology, the waveform concatenation method appeared.
⚫ Parameter synthesis
Early research primarily adopted the parameter synthesis method. Worth mentioning are Holmes' parallel formant synthesizer (1973) and Klatt's serial/parallel formant synthesizer (1980). Through fine parameter adjustment, both synthesizers can produce natural speech. However, formant parameters are difficult to extract accurately, so the quality of the synthesized speech could not meet practical requirements.
⚫ Waveform concatenation
From the late 1980s to nowadays, new progress has been made in speech synthesis technologies. In particular, the pitch
synchronous overlap add (PSOLA) method that was proposed in 1990 greatly improves the timbre and degree of
naturalness of speech synthesized based on the method of time domain waveform concatenation. The degree of
naturalness is higher than that based on the LPC method or formant synthesizer. In addition, the structure of the
synthesizer based on the PSOLA method is simple and the synthesizer is easy to implement and has a great commercial
prospect.
Speech Synthesis Algorithms
⚫ HMM-based parameter synthesis
⚫ WaveNet
⚫ Tacotron
⚫ Deep Voice 3
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
GMM Introduction
⚫ GMM is also abbreviated as Mixture of Gaussians (MOG). A GMM quantifies a phenomenon with Gaussian density functions (normal distribution curves), decomposing it into several models based on the Gaussian probability density function (PDF).
Gaussian Distribution
⚫ Gaussian distribution, also called normal distribution, was obtained the earliest by Abraham De
Moivre from the asymptotic formula for binomial distribution. Gauss derived Gaussian distribution
from another angle when he studied measurement error. Gaussian distribution is an important
probability distribution in the fields such as mathematics, physics, and engineering and has a
great influence in many aspects of statistics.
⚫ If random variable $X$ follows a normal distribution with mathematical expectation $\mu$ and variance $\sigma^2$, it is recorded as $N(\mu, \sigma^2)$. The PDF of a normally distributed random variable is determined by the expectation $\mu$, which decides its location, and the standard deviation $\sigma$, which decides the spread of the distribution. When $\mu = 0$ and $\sigma = 1$, the normal distribution is the standard normal distribution. The formula is as follows:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Gaussian Distribution Curve
⚫ The normal curve is in a bell shape. The two sides are low and the middle is high. The
left and the right are symmetrical and take on a bell shape.
⚫ The larger the standard deviation, the flatter the curve; the smaller the standard deviation, the narrower and sharper the curve.
SGM
⚫ When sample data 𝑋 is one-dimensional data, Gaussian distribution follows the PDF
below:
$$P(x|\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
(𝜇: average value of the data, 𝜎: standard deviation of the data.)
⚫ When sample data 𝑋 is multi-dimensional data, Gaussian distribution follows the PDF
below:
$$P(x|\theta) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$$
($\mu$: mean of the data, $\Sigma$: covariance matrix, $D$: data dimension)
Maximum Likelihood
⚫ Maximum likelihood (ML), also called maximum likelihood estimation, is a theoretical
point estimation method. The maximum likelihood estimation is a statistical method. It
is used to solve the parameter of the related PDF for a sample set.
⚫ The basic idea of this method is: After 𝑛 groups of observed sample values are
randomly drawn from a model, the most reasonable parameter estimation should make
the probability of the n groups of observed sample values drawn from the model the
maximum instead of making the model best fit the parameter estimation of sample
data, as provided by the least square estimation.
(Figure: Known: the distribution model followed by the sample, and a randomly drawn sample. Unknown: the model parameters, obtained by maximum likelihood estimation.)
Maximum Likelihood
⚫ Assume N independent data points follow a distribution 𝑃𝑟(𝑥;𝜃). We want to find a group of parameters 𝜃 to
make the maximum probability of generating the data points. The probability is:
$$\prod_{i=1}^{N} Pr(x_i; \theta)$$
⚫ This is called the likelihood function. Generally, the probability of a single point is small; after continued multiplication the value becomes tiny, which may cause floating-point underflow. Therefore, we take the logarithm, and the quantity becomes:
$$\sum_{i=1}^{N} \log Pr(x_i; \theta)$$
⚫ This is called the log-likelihood function. We can then differentiate it to find the parameter $\theta$ that maximizes it: the $\theta$ under which the observed values are most probable is the maximum likelihood estimate.
Parameter Learning of SGM
⚫ For an SGM, we can use maximum likelihood to estimate the value of the parameter 𝜃: 𝜃 =
𝑎𝑟𝑔𝑚𝑎𝑥𝜃 𝐿(𝜃). Here, assume each data point is independent. The likelihood parameter is provided
by the PDF:
$$L(\theta) = \prod_{j=1}^{N} P(x_j|\theta)$$
⚫ Because the probability of each individual point is small, the product becomes very small, which is inconvenient for calculation and observation, so we usually maximize the log-likelihood instead. Since logarithmic functions are monotonic, they do not change the location of the extremum; in addition, for inputs in the range 0 to 1, small changes of the input cause large changes of the output.
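For a single Gaussian, maximizing the (log-)likelihood has a closed-form solution, the sample mean and standard deviation; a minimal sketch:

```python
import numpy as np

x = np.random.normal(loc=2.0, scale=1.5, size=1000)  # synthetic sample
mu_hat = x.mean()      # ML estimate of mu
sigma_hat = x.std()    # ML estimate of sigma (1/N convention)
```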
Gaussian Mixture Model(GMM)
⚫ GMM is an extension of the single Gaussian probability density function and can smoothly approximate a density distribution of any shape. Gaussian models come in two types: the single Gaussian model (SGM) and the Gaussian mixture model (GMM).
⚫ Similar to clustering, each Gaussian model can be considered as a category based on
the Probability Density Function (PDF) parameter. Input a sample x to calculate its
value using the PDF, then a threshold is used to determine whether the sample belongs
to the Gaussian model. Obviously, SGM is suitable for binary classification, while GMM
is more refined because it has multiple models and is suitable for multi-classification
and can be applied to complex object modeling.
GMM
⚫ Probability distribution of the GMM:
$$P(x|\theta) = \sum_{k=1}^{K} \alpha_k\, \phi(x|\theta_k)$$
⚫ For the GMM, the parameter $\theta = (\tilde{\alpha}_k, \tilde{\mu}_k, \tilde{\sigma}_k)$ comprises the occurrence probability, expectation, and variance (or covariance) of each sub-model in the GMM.
⚫ Parameter description:
$x_j$ denotes the $j$-th observation, $j = 1, 2, 3, \ldots, N$. $K$ is the number of Gaussian models in the GMM. $\alpha_k$ is the probability that an observation belongs to the $k$-th sub-model, with $\alpha_k \geq 0$ and $\sum_{k=1}^{K} \alpha_k = 1$. $\phi(x|\theta_k)$ is the Gaussian PDF of the $k$-th sub-model, with $\theta_k = (\mu_k, \sigma_k^2)$.
Parameter Learning of GMM
⚫ For a GMM, the log-likelihood function is:
$$\log L(\theta) = \sum_{j=1}^{N} \log \left( \sum_{k=1}^{K} \alpha_k\, \phi(x_j|\theta_k) \right)$$
⚫ How do we calculate the parameters of a GMM? Unlike with a single Gaussian model, we cannot use maximum likelihood to derive the parameters directly, because we do not know which sub-distribution (the hidden variable) each observed data point belongs to; a summation therefore appears inside the logarithm, and the sum of $K$ Gaussian models is not a Gaussian model. For each sub-model, $\alpha_k$, $\mu_k$, and $\sigma_k$ are unknown and cannot be obtained by direct differentiation; the parameters must be solved iteratively.
EM Algorithm
⚫ The expectation maximization (EM) algorithm is an iterative algorithm. It is used for the
maximum likelihood estimation(MLE) or maximum a posteriori estimation(MAP) of
probability parameter models that contain hidden variables.
⚫ The EM algorithm is a method proposed by Dempster, Laind, and Rubin in 1977 to solve
maximum likelihood estimation parameters. It can perform maximum likelihood
estimation (MLE) for incomplete data sets. This method can be widely used to process
missing data, truncated data, and incomplete data such as data with noise.
(Figure: Known: a randomly drawn sample. Unknown: the distribution each sample comes from and the model parameters, obtained by the EM algorithm.)
EM Algorithm Step (1)
⚫ Initialize parameters:
⚫ E-step: According to the current parameters, calculate the probability that each data point $j$ comes from sub-model $k$:
$$\gamma_{jk} = \frac{\alpha_k\, \phi(x_j|\theta_k)}{\sum_{k=1}^{K} \alpha_k\, \phi(x_j|\theta_k)}$$
$$\mu_k = \frac{\sum_{j=1}^{N} \gamma_{jk}\, x_j}{\sum_{j=1}^{N} \gamma_{jk}}, \quad k = 1, 2, 3, \ldots, K$$
EM Algorithm Step(2)
⚫ M-step: Calculate the model parameter in a new round of iteration.
$$\Sigma_k = \frac{\sum_{j=1}^{N} \gamma_{jk}\, (x_j - \mu_k)(x_j - \mu_k)^T}{\sum_{j=1}^{N} \gamma_{jk}} \quad \text{(using the } \mu_k \text{ updated in this round of iteration)}$$
$$\alpha_k = \frac{\sum_{j=1}^{N} \gamma_{jk}}{N}, \quad k = 1, 2, 3, \ldots, K$$
⚫ Iteration: Repeatedly calculate E-step and M-step until convergence occurs.
⚫ At this point, parameter learning of the GMM is complete. Note that the EM algorithm is guaranteed to converge, but not necessarily to the global maximum; it may find a local maximum. The usual remedy is to run the iteration from several different initializations and keep the best result.
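A minimal 1-D sketch of this EM loop in NumPy/SciPy (random initialization and a fixed iteration count; a practical implementation would add convergence checks and the multiple restarts mentioned above):

```python
import numpy as np
from scipy.stats import norm

def gmm_em_1d(x, k=2, n_iter=100):
    """Minimal EM for a 1-D GMM: E-step responsibilities, M-step updates."""
    mu = np.random.choice(x, k)        # initialize means from the data
    sigma = np.full(k, x.std())
    alpha = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: gamma[j, c] = responsibility of component c for point j
        dens = np.stack([a * norm.pdf(x, m, s)
                         for a, m, s in zip(alpha, mu, sigma)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mu, sigma, alpha from the responsibilities
        nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        alpha = nk / len(x)
    return alpha, mu, sigma
```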
EM Algorithm Step (3)
⚫ Specific iteration steps:
Initialize parameters.
E-step: Find the expectation.
M-step: Find the maximum value and calculate the model parameter in a new round
of iteration.
Perform iteration until convergence occurs.
Advantages and Disadvantages of GMM
⚫ Advantages:
Strong fitting capability
Maximum probability of speech feature matching
⚫ Disadvantages:
The sequence factor cannot be processed.
Linear or approximate linear data cannot be processed.
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
Cases of Markov Chain
⚫ The publicity for commodities A, B, and C under a category is different. The
probabilities for customers to attempt to select and buy commodities A, B, and
C under the advertising effect are respectively 0.2, 0.4, and 0.4. The following
table describes the purchase predisposition of customers. Find the probabilities that
customers buy each commodity on their fourth purchase.
(Table: purchase-to-purchase transition probabilities among commodities A, B, and C.)
Cases of Markov Chain
⚫ Three elements:
Initial probability: 𝜋 = (0.2,0.4,0.4)
Transition probability: 𝑝𝐴𝐴 = 0.8, 𝑝𝐴𝐵 = 0.1, 𝑝𝐴𝐶 = 0.1, 𝑝𝐵𝐴 = 0.5, …
Markov Chain
⚫ Markov chain refers to a random discrete event process with Markov properties in
mathematics. In the process, when current knowledge or information is provided, future
prediction is unrelated to the past and is related to only the current state.
⚫ In each step of Markov chain, the system can change from one state to another
according to probability distribution or maintain the current state. The change of states
is called transition. The probability related to change of different states is called
transition probability.
(Figure: a state transition diagram with example transition probabilities.)
Principles of Markov Chain
⚫ Principles:
Markov chain describes a state sequence, in which each state value depends on the finite number of
previous states. Markov chain is a sequence of random variables with Markov properties. The range of the
variables, namely the set of possible values of the variables, is called state space.
⚫ Properties:
Positive definiteness: Each element of the state transition matrix is a state transition probability. From probability theory, each state transition probability is a positive number: $p_{ij}(k) \geq 0$.
Finiteness: From probability theory, each row of the state transition matrix sums to 1: $\sum_j p_{ij} = 1$.
Observable Markov Model
⚫ For one question, we have initial distribution 𝜋 and transition probability matrix 𝐴. In
any given time 𝑡, we have a state 𝑄𝑡 . When one state transits to another with the
change of time, an observation sequence is obtained, that is, state sequence 𝑂 =
[𝑞1 , 𝑞2 , 𝑞3 , 𝑞4 , … , 𝑞𝑛 ]. In the whole question, there are 𝑛 observation states in total.
⚫ The probability of such a sequence is $P(O) = \pi_{q_1} \prod_{t=2}^{n} p_{q_{t-1} q_t}$.
⚫ Therefore, an observable Markov model has a triplet description (𝐴, 𝜋, 𝑛), which can be
abbreviated as 𝐴, 𝜋 .
Markov Chain Learning
⚫ Markov Process
Markov model learning problem refers to learning parameters of Markov model after a series
of observation data is given.
The learning content includes the initial probability 𝜋 and the transition probability matrix A.
The state set is determined during the problem study and does not require an additional
learning process.
(Figure: from observed sequences such as ABACCACBCBBAC… and BCBCACCBACACB…, learn the initial probability $\pi$, the transition probabilities $p_{AA}, p_{AB}, p_{AC}, p_{BA}, \ldots$, and the state set $S$.)
Markov Chain Learning Algorithm-Exhaustion
⚫ Exhaustion method: In Markov model learning, exhaustion is a probability approximation method: the more sample sequence data is available, the more accurate the estimated parameters.
Initial probability: $\pi_i = \frac{\text{number of sequences starting with state } i}{\text{total number of sequences}}$
⚫ For example, assuming that there is data [red, red, red] [red, red, blue] [red, blue, red]
[blue, red, red], the initial probability and the transition probability may be obtained by
using the exhaustion method:
Initial probability:𝜋={0.75,0.25};
Transition probability: 𝑝(𝑟𝑒𝑑, 𝑏𝑙𝑢𝑒) = 1/3, 𝑝(𝑟𝑒𝑑, 𝑟𝑒𝑑) = 2/3, 𝑝(𝑏𝑙𝑢𝑒, 𝑏𝑙𝑢𝑒) = 0, 𝑝(𝑏𝑙𝑢𝑒, 𝑟𝑒𝑑) = 1
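Using these estimates, the probability of a new sequence can be computed by chaining the initial and transition probabilities (the prediction problem described next); a minimal sketch:

```python
pi = {"red": 0.75, "blue": 0.25}
p = {("red", "red"): 2/3, ("red", "blue"): 1/3,
     ("blue", "red"): 1.0, ("blue", "blue"): 0.0}

def seq_prob(seq):
    """P(sequence) = initial prob * product of transition probs."""
    prob = pi[seq[0]]
    for a, b in zip(seq, seq[1:]):
        prob *= p[(a, b)]
    return prob

print(seq_prob(["red", "red", "blue"]))  # 0.75 * 2/3 * 1/3 = 1/6
```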
Markov Chain Prediction
⚫ Markov Model Prediction
The Markov prediction problem is: given the Markov model parameters (initial probability $\pi$; transition probabilities $p_{AA}, p_{AB}, p_{AC}, p_{BA}, \ldots$; and state set $S$), calculate the probability that a given observation sequence occurs.
⚫ The probability of occurrence of a sequence can be predicted; the Markov model can also be used to determine which model most likely generated a given sequence.
By the Markov property, the state at time $n$ depends only on the state at time $n-1$, so the probability of the sequence can be decomposed using the chain rule.
The factors of the decomposition are initial and transition probabilities. These values are known, so the probability of the sequence is easy to calculate.
Case Study-Hidden Markov Model
⚫ The key difference between Hidden Markov Model and Markov Model lies in this "hidden". The
following uses a simple case to describe what the Hidden Markov Model is.
(Figure: mood affects activity.)
⚫ Assumption:
As shown in the figure, Jimmy has three mood states: "happy", "unhappy", and "just so-so".
Jimmy's daily activities are "playing football", "listening to music", and "watching television".
Jimmy's activities are affected by his mood of the day. For example, when he is "happy" he tends to watch TV; when "unhappy" he is most likely to play football.
Jimmy's mood changes every day, and the mood of the next day is affected only by the mood of the previous day.
Case Study-Hidden Markov Model
⚫ In this case, Jimmy’s mood and activity sequence in a period of time may be represented by using
the following schematic diagram, and it can be seen that there are two sequences:
The mood states are a sequence, and the states of the mood sequence are generated under certain rules.
Activity is also a sequence, and the sequence of activity is influenced by the mood sequence.
⚫ In this case, Jimmy's mood is hidden (the mood is not observed directly), while his activities can be directly observed. So, by convention:
The states that cannot be observed directly are called hidden states, such as Jimmy's moods.
The states that can be observed directly are called observation states, such as Jimmy's activities.
Case Study-Hidden Markov Model
⚫ In this case, the mathematical model for the problem is the hidden Markov model.
Case Study-Hidden Markov Model
⚫ In fact, the Hidden Markov Model is an extension of the Standard Markov Model by
adding the set of observation states and hidden states.
⚫ Similar to the Markov model with its three model elements, the hidden Markov model has five elements; once these five elements are determined, the hidden Markov model is uniquely determined.
(Figure: the five elements: hidden states $S$, observation states $R$, initial probability $\pi$, transition probability $A$, and observation probability $B$.)
Case Study-Hidden Markov Model
⚫ The HMM can be described in the following five elements:
Observation states: $R = R_1, R_2, R_3, \ldots, R_m$; like Jimmy's activities "playing football", "listening to music", and "watching television".
Three Issues of the HMM
⚫ Evaluation:
Forward algorithm
Backward algorithm
⚫ Decoding
Dynamic programming algorithm
Viterbi algorithm
⚫ Learning
Supervised algorithm
Unsupervised Baum-Welch algorithm
Hidden Markov Model Evaluation
⚫ Hidden Markov model evaluation
Given the hidden Markov model, which includes:
◼ Initial probability:𝜋
◼ Transition probability:𝐴
◼ Observation probability:𝐵
◼ Hidden states: S= 𝑆1 , 𝑆2 , 𝑆3,…, 𝑆𝑛
Hidden Markov Model Evaluation
⚫ Example: Calculate the probability that Jimmy's activities in a week are in the
following sequence:
TV → Football → Football → Music → TV → Football → TV
(Figure: given the known model parameters (initial probability $\pi$, transition probability $A$, and observation probability $B$), predict the probability of the observation sequence.)
HMM Algorithm - Forward and Backward Algorithm (1)
⚫ Assume the hidden state sequence that generates the observation sequence is $Q = q_1, q_2, q_3, q_4, \ldots, q_T$, where $T$ is the sequence length and $q_i$ is the hidden state at the $i$-th position, $q_i \in S$.
⚫ The forward and backward algorithms are similar. The core improvement of the forward algorithm is to reuse the repeated parts of the calculation.
Let $O_t = o_1, o_2, o_3, o_4, \ldots, o_t$.
$P(O_t, q_t = S_i)$ denotes the probability that the $t$-th hidden state is $S_i$ and $O_t$ is generated.
Characteristic of $P(O_t, q_t = S_i)$:
◼ $P(O) = P(O_T) = \sum_{i=1}^{n} P(O_T, q_T = S_i)$
82 Huawei Confidential
HMM Algorithm - Forward and Backward Algorithm (2)
⚫ Forward algorithm calculation process:
Traverse the sequence positions $t \in [1, T]$:
◼ $t = 1$: $P(O_1, q_1 = S_i) = P_{ini}(S_i) \cdot P_{obv}(o_1|S_i)$
◼ $t > 1$: for each hidden state $S_i$, compute $P(O_t, q_t = S_i) = \sum_{j=1}^{n} P(O_{t-1}, q_{t-1} = S_j) \cdot P_{trans}(S_i|S_j) \cdot P_{obv}(o_t|S_i)$
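A minimal sketch of this recursion in plain Python (p_ini, p_trans, and p_obv are dictionaries holding $\pi$, $A$, and $B$; the names are illustrative):

```python
def forward(obs, states, p_ini, p_trans, p_obv):
    """Forward algorithm: alpha[s] = P(o_1..o_t, q_t = s) at step t."""
    alpha = {s: p_ini[s] * p_obv[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[j] * p_trans[j][s] for j in states) * p_obv[s][o]
                 for s in states}
    return sum(alpha.values())   # P(O): sum over all final hidden states
```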
Hidden Markov Model Learning
⚫ The learning problem is the basis of the other two problems of the Hidden Markov
model, which is given in both evaluation and decoding problems. But in practice, like
Jimmy's mood and activities, no one knows what the model parameters of this hidden
Markov model are, thus it needs to learn these model parameters from the observation
data.
⚫ The learning problem of Hidden Markov Model is
Known observation state sequences and corresponding hidden state sequences.
Objective: to solve the parameters of Hidden Markov model.
◼ Initial probability :𝜋
◼ Transition probability : 𝐴
◼ Observation probability:𝐵
Hidden Markov Model Learning
⚫ For example, learning a hidden Markov model for both Jimmy's mood and activities
given the sequence of Jimmy's activities in a week.
(Figure: from the observed activity sequence, learn the model parameters: initial probability $\pi$, transition probability $A$, and observation probability $B$.)
HMM Learning Algorithm - Supervision Algorithm
⚫ Application scenario: A large number of observation state sequences and corresponding hidden states sequence are known.
⚫ Idea: Use the law of large numbers (the limit of frequency is probability) to directly obtain the parameter estimates of the HMM.
⚫ Initial probability $\pi$: $P_{ini}(S_i) = \frac{|Seq\_begin\_with\_S_i|}{|Seq|}$
$Seq\_begin\_with\_S_i$ denotes the hidden state sequences in the sample that start with $S_i$.
⚫ Transition probability $A$: $P_{trans}(S_i|S_j) = \frac{|trans(S_j, S_i)|}{|trans(S_j)|}$
$trans(S_j)$ denotes all state transition pairs initiated by $S_j$ in the hidden state sequences; $trans(S_j, S_i)$ denotes all transition pairs from $S_j$ to $S_i$.
⚫ Observation probability $B$: $P_{obv}(R_k|S_j) = \frac{|emiss(S_j, R_k)|}{|emiss(S_j)|}$
$emiss(S_j, R_k)$ denotes the emission pairs in which hidden state $S_j$ generates observation $R_k$; $emiss(S_j)$ denotes the emission pairs of all observation states generated by $S_j$.
(Figure: a hidden state sequence aligned with its observation state sequence.)
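A sketch of these counting formulas in Python (pairs is assumed to be a list of (hidden_sequence, observation_sequence) samples):

```python
from collections import Counter

def estimate_hmm(pairs):
    """Estimate pi, A, B by counting over (hidden_seq, obs_seq) samples."""
    starts, trans, trans_from = Counter(), Counter(), Counter()
    emits, emits_from = Counter(), Counter()
    for hidden, obs in pairs:
        starts[hidden[0]] += 1
        for a, b in zip(hidden, hidden[1:]):
            trans[(a, b)] += 1
            trans_from[a] += 1
        for s, r in zip(hidden, obs):
            emits[(s, r)] += 1
            emits_from[s] += 1
    n = sum(starts.values())
    pi = {s: c / n for s, c in starts.items()}
    A = {(a, b): c / trans_from[a] for (a, b), c in trans.items()}
    B = {(s, r): c / emits_from[s] for (s, r), c in emits.items()}
    return pi, A, B
```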
HMM Learning Algorithm - Baum-Welch
⚫ Given only the observation sequence (the hidden state sequence is not provided), estimate the HMM parameters.
⚫ The essence of this algorithm is the EM algorithm. When the observed value 𝑋
is available and the observed value has a hidden variable 𝑍, the joint probability
𝑃(𝑋, 𝑍|𝜆) under the HMM parameter λ can be calculated.
⚫ Solution steps:
Write the log-likelihood function of the data.
EM E-step: compute the Q function $Q(\lambda, \hat{\lambda})$.
EM M-step: maximize the Q function to obtain the new parameters.
Hidden Markov Model Decoding
⚫ Hidden Markov Model Decoding
Given the hidden Markov model, which includes:
◼ Initial probability:𝜋
◼ Transition probability:𝐴
◼ Observation probability:𝐵
◼ Hidden states: S= 𝑆1 , 𝑆2 , 𝑆3,…, 𝑆𝑛
HMM Decoding Algorithm - Exhaustion
⚫ Assume the possible hidden state sequence that generates the observation sequence is $Q = q_1, q_2, q_3, q_4, \ldots, q_T$, where $T$ is the sequence length and $q_i$ is the hidden state at the $i$-th position, $q_i \in S$.
HMM Decoding Algorithm - Viterbi
⚫ Observation sequence 𝑂 = 𝑜1 ,𝑜2 ,𝑜3 ,𝑜4 , … … , 𝑜𝑇 , 𝑄𝑚𝑎𝑥 = 𝑞1 ,𝑞2 ,𝑞3 ,𝑞4 , … … , 𝑞𝑇 ;
⚫ Observation sequence 𝑂𝑡 = [𝑜1 ,𝑜2 ,𝑜3 ,𝑜4 , … … , 𝑜𝑡 ], 𝑄max_𝑡 = 𝑞1 ,𝑞2 ,𝑞3 ,𝑞4 , … … , 𝑞𝑡 .
𝑂𝑡 is a subsequence of the first 𝑡 elements of 𝑂;𝑄max_𝑡 is a subsequence of the first 𝑡 elements of 𝑄𝑚𝑎𝑥 .
⚫ Idea:
Forward traversal: for each hidden state at each time step, two pieces of information are recorded:
◼ the hidden state at the previous time step on the highest-probability path;
◼ the maximum probability of the observation sequence ending in this state.
Backtracking:
◼ starting from the final moment, the hidden state sequence with the maximum probability is extracted.
HMM Decoding Algorithm - Viterbi
⚫ Assume the possible hidden state sequence that generates the observation sequence is $Q = q_1, q_2, q_3, q_4, \ldots, q_T$, where $T$ is the sequence length and $q_i$ is the hidden state at the $i$-th position, $q_i \in S$.
⚫ Definitions:
Let $O_t = o_1, o_2, o_3, o_4, \ldots, o_t$.
$P(O_t, q_t = S_i)$ denotes the maximum probability that the $t$-th hidden state is $S_i$ and $O_t$ is generated.
Recursion, recording the best predecessor $Pre\_S(t, S_i)$:
◼ $P(O_t, q_t = S_i) = \max_j \left[ P(O_{t-1}, q_{t-1} = S_j) \cdot P_{trans}(S_i|S_j) \right] \cdot P_{obv}(o_t|S_i)$
◼ $Pre\_S(t, S_i) = \arg\max_j \left[ P(O_{t-1}, q_{t-1} = S_j) \cdot P_{trans}(S_i|S_j) \right]$
HMM Decoding Algorithm - Viterbi
⚫ The Viterbi algorithm consists of forward calculation and backward backtracking.
Forward calculation, traversing $t \in [1, T]$:
◼ $t = 1$: $P(O_1, q_1 = S_i) = P_{ini}(S_i) \cdot P_{obv}(o_1|S_i)$
◼ $t > 1$: for each hidden state $S_i$, compute $P(O_t, q_t = S_i) = \max_j \left[ P(O_{t-1}, q_{t-1} = S_j) \cdot P_{trans}(S_i|S_j) \right] \cdot P_{obv}(o_t|S_i)$ and record $Pre\_S(t, S_i)$
◼ $t = T$: $q_T = \arg\max_{S_j} P(O_T, q_T = S_j)$, then backtrack through $Pre\_S$
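A minimal sketch of the Viterbi algorithm in plain Python, following the forward calculation and backtracking above (parameter names are illustrative):

```python
def viterbi(obs, states, p_ini, p_trans, p_obv):
    """Forward max-product pass with backpointers, then backtracking."""
    V = [{s: p_ini[s] * p_obv[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        prev, col, ptr = V[-1], {}, {}
        for s in states:
            best = max(states, key=lambda j: prev[j] * p_trans[j][s])
            ptr[s] = best
            col[s] = prev[best] * p_trans[best][s] * p_obv[s][o]
        V.append(col)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])   # best final hidden state
    path = [last]
    for ptr in reversed(back):                   # follow backpointers
        path.append(ptr[path[-1]])
    return list(reversed(path))
```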
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
Role of GMM - HMM (1)
⚫ Role of GMM:
A GMM is used to obtain the probability of a phoneme.
A GMM consists of three to five superimposed Gaussian models.
A Gaussian model is a normal distribution and represents the probability density of signals.
In speech recognition, one word consists of multiple phonemes; one phoneme corresponds to one state, and each state is associated with a GMM.
Each GMM has $K$ model parameters (for each state: $K$ Gaussian weights, mean vectors, and covariance matrices).
Role of GMM - HMM (2)
⚫ An HMM performs speech modeling:
An HMM is created for each word. Training samples of the word are used. The
training samples are labeled in advance, that is, each sample corresponds to a
section of audio and the audio contains only the pronunciation of the word.
After multiple training samples of the word are available, the samples, together with the Baum-Welch (EM) algorithm, are used to train all parameters of the GMM-HMM, including the initial state probability vector, the inter-state transition matrix, and the observation model for each state.
Role of GMM – HMM (3)
⚫ HMM in the recognition phase:
If a section of audio that includes multiple words is input, the audio can be manually
separated (the simplest method is considered). Then the audio MFCC feature
sequence of each word is extracted and is input in each HMM (that is trained in
advance). The forward algorithm is used to obtain the probability of generating the
sequence by each HMM and finally the model with the highest probability is used.
The word indicated by the model is the recognition result.
GMM - HMM Speech Recognition
⚫ A wav file is obtained. The following shows the process of recognizing a word:
Cut a waveform into several equal frames and extract the MFCC feature of each frame.
Run the GMM on each frame's features to obtain the probability that the frame belongs to each state.
According to the HMM state transition probabilities of each word, calculate the probability of generating the frame sequence under each word's state sequence. If the HMM of a word yields the highest probability, the speech belongs to that word.
𝑏𝑠𝑖𝑙 𝑜1 ∙ 0.6 ∙ 𝑏𝑠𝑖𝑙 𝑜2 ∙ 0.6 ∙ 𝑏𝑠𝑖𝑙 𝑜3 ∙ 0.6 ∙ 𝑏𝑠𝑖𝑙 𝑜4 ∙ 0.4 ∙ 𝑏𝑦 𝑜5 ∙ 0.3 ∙ 𝑏𝑦 𝑜6 ∙ 0.3 ∙ 𝑏𝑦 𝑜7 ∙ 0.7 ∙∙∙
Example: Single Word Recognition (1)
⚫ Task: recognize the speech of a single word
Like: /l ai k/
One: /w ʌ n/
⚫ Like:
Hidden states: l, ai, k, sil
Observation states: MFCC features
⚫ One:
Hidden states: w, ʌ, n, sil
Observation states: MFCC features
Example: Single Word Recognition (2)
⚫ Like:
Sample 1 hidden state sequence: l l l ai ai ai ai ai k k k sil
Sample 2 hidden state sequence: l l l l ai ai ai ai ai k k sil
Example: Single Word Recognition (3)
⚫ One:
Sample 1 hidden state sequence: w w w w ʌ ʌ ʌ n n sil sil sil
Sample n hidden state sequence: w w ʌ ʌ ʌ ʌ ʌ n n n sil sil
(Figure: for recognition, preprocessed input speech is scored against each word's HMM-GMM, e.g. LIKE and ONE.)
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
(Figure: DNN-HMM hybrid acoustic model: a window of feature frames is passed through DNN layers $V^1, V^2, \ldots, V^L$ to produce observation probabilities.)
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
(Figures: a basic RNN with input layer $x$, hidden layer $s$, and output layer $o$, sharing weights $U$, $W$, $V$ across time steps; the network unrolled over time; and the backpropagation-through-time gradient chain $\frac{\partial E_3}{\partial s_3} \frac{\partial s_3}{\partial s_2} \frac{\partial s_2}{\partial s_1} \frac{\partial s_1}{\partial s_0}$.)
(Figures: the LSTM cell repeated across time steps, with the cell state $C_t$ flowing along the top and the hidden state $h_t$ along the bottom; $\sigma$ and tanh mark the gate activations.)
The LSTM gate equations:
Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
Input gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
Candidate state: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
Cell state update: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
Output gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
Hidden state: $h_t = o_t * \tanh(C_t)$
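A minimal NumPy sketch of one LSTM step implementing the gate equations above (here W is assumed to stack the four gate weight matrices; names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev; x_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
    c = f * c_prev + i * np.tanh(g)                # new cell state
    h = o * np.tanh(c)                             # new hidden state
    return h, c
```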
(Figure: the GRU cell: update gate, reset gate, state candidate, current state, and output.)
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
Example of the CTC collapsing function $\mathcal{B}$: $\mathcal{B}(a\ a\ \epsilon\ b\ b\ \epsilon\ c) = abc$ and $\mathcal{B}(a\ \epsilon\ b\ \epsilon\ c\ c) = abc$.
Voice data alignment (1)
⚫ In a voice dataset, audio files are difficult to align with the output text.
⚫ In the traditional speech recognition model, before training the speech model, the text and speech are often
strictly aligned. There are two disadvantages:
Strict alignment takes manpower and time.
After strict alignment, the predicted label is only the result of partial classification, but cannot provide the
output result of the entire sequence. The expected result can be obtained only after post-processing is
performed on the predicted label.
⚫ The CTC (Connectionist Temporal Classification) loss function handles many-to-many sequence mapping without requiring alignment information.
https://distill.pub/2017/ctc/
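A sketch of the collapsing function $\mathcal{B}$ used above (merge repeated symbols, then remove blanks):

```python
def ctc_collapse(path, blank="ε"):
    """CTC collapsing function B: merge repeats, then drop blanks."""
    out, prev = [], None
    for c in path:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return "".join(out)

print(ctc_collapse(list("aaεbbεc")))  # -> "abc"
print(ctc_collapse(list("aεbεcc")))   # -> "abc"
```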
Contents
1. Speech Processing
2. Speech Recognition
3. Text-to-Speech Synthesis
B. Noise reduction
C. Standardization
D. Deduplication
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
4. Applications
What Is NLP ?
⚫ “Natural” languages
English, Mandarin, French, Swahili, Arabic, Nahuatl, ….
NOT Java, C++, Perl, …
Real-world NLP
Language Technology
Mostly solved:
▫ Spam detection: "Let's go to Agra!" ✓ vs. "Buy V1AGRA …" ✗
▫ Part-of-speech (POS) tagging: "Colorless green ideas sleep furiously." → ADJ ADJ NOUN VERB ADV
▫ Named entity recognition (NER): "Einstein met with UN officials in Princeton" → PERSON, ORG, LOC
Making good progress:
▫ Sentiment analysis: "Best roast chicken in San Francisco!" vs. "The waiter ignored us for 20 minutes."
▫ Coreference resolution: "Carter told Mubarak he shouldn't run again."
▫ Word sense disambiguation (WSD): "I need new batteries for my mouse."
▫ Parsing: "I can see Alcatraz from the window!"
▫ Machine translation (MT): "第13届上海国际电影节开幕…" → "The 13th Shanghai International Film Festival…"
▫ Information extraction (IE): "You're invited to our dinner party, Friday May 27 at 8:30" → add calendar entry: Party, May 27
Still really hard:
▫ Question answering (QA): "Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?"
▫ Paraphrase: "XYZ acquired ABC yesterday" ≈ "ABC has been taken over by XYZ"
▫ Summarization: "The Dow Jones is up", "The S&P500 jumped", "Housing prices rose" → "Economy is good"
▫ Dialog: "Where is Citizen Kane playing in SF?" → "Castro Theatre at 7:30. Do you want a ticket?"
Why NLP is Hard? (2)
⚫ Ambiguity at multiple levels :
Word senses: bank (finance or river ?)
Part of speech: chair (noun or verb ?)
Syntactic structure: I can see a man with a telescope.
Multiple: I made her duck.
Why NLP Is Hard? (3)
(Timeline: logic-based/rule-based NLP, then statistical NLP from around the 1990s, then end-to-end NLP from around the 2010s.)
Symbolic and Probabilistic NLP
Deep Learning and NLP
⚫ Representation Learning (e.g. word embeddings)
⚫ End-to-end Optimization (e.g. NMT)
⚫ Transfer Learning (e.g. BERT)
⚫ Structure Learning (e.g. Transformer)
(Figure: deep learning learns the mapping from input to output directly.)
Contents
1. Introduction to NLP
2. Knowledge Required
◼ Word Embeddings
▫ Language Models
3. Key Tasks
4. Applications
Why Do We Need Word Representation?
https://github.com/yandexdataschool/nlp_course/tree/2020/week01_embeddings
Representing words as discrete symbols
⚫ In traditional NLP, we regard words as discrete symbols:
hotel = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 ]
motel = [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ]
Problem with one-hot vectors
⚫ These two vectors are orthogonal.
⚫ There is no natural notion of similarity for one-hot vectors!
⚫ These vectors do not contain information about the meaning of a word.
hotel = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 ]
motel = [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ]
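The orthogonality is easy to verify; a minimal sketch using the 15-dimensional vectors above:

```python
import numpy as np

hotel = np.eye(15)[10]   # one-hot vector with the 1 at index 10
motel = np.eye(15)[4]    # one-hot vector with the 1 at index 4
print(hotel @ motel)     # 0.0: orthogonal, so no notion of similarity
```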
Representing Words By Their Context
⚫ Distributional semantics: A word’s meaning is given by the words that
frequently appear close-by.
Dense Word Vector By Contexts
⚫ We will build a dense vector for each word, so that it is similar to the vectors of words that appear in similar contexts.
(Figure: example of a dense, low-dimensional vector for the word "banking".)
Visualization
Count-based: Co-occurrence Counts + SVD
Example corpus:
I like deep learning
I like nlp
I enjoy flying
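A sketch of this count-based method on the example corpus (window size 1 and rank 2 are illustrative choices):

```python
import numpy as np

corpus = ["I like deep learning", "I like nlp", "I enjoy flying"]
words = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(words)}
X = np.zeros((len(words), len(words)))
for s in corpus:                      # window size 1: adjacent words co-occur
    toks = s.split()
    for a, b in zip(toks, toks[1:]):
        X[idx[a], idx[b]] += 1
        X[idx[b], idx[a]] += 1
U, S, Vt = np.linalg.svd(X)           # factorize the co-occurrence matrix
vectors = U[:, :2] * S[:2]            # rank-2 word vectors
```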
Word2Vec: A Prediction-Based Method
⚫ take a huge text corpus;
⚫ go over the text with a sliding window, moving one word at a time. At each step, there
is a central word and context words (other words in this window);
⚫ for the central word, compute probabilities of context words (or vice versa);
⚫ adjust the vectors to increase these probabilities.
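A minimal sketch of the sliding-window step described above (the window size and toy corpus are illustrative, and no vector updates are shown):

```python
# Generate (central word, context word) training pairs with a window of size m.
def training_pairs(tokens, m=2):
    for t, center in enumerate(tokens):
        for j in range(max(0, t - m), min(len(tokens), t + m + 1)):
            if j != t:
                yield center, tokens[j]

corpus = "i like deep learning and i like nlp".split()
print(list(training_pairs(corpus, m=2))[:6])
```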
21 Huawei Confidential
Objective Function: Negative Log-Likelihood
⚫ For each position 𝑡 = 1, … , 𝑇 in a text corpus, Word2Vec predicts the context words within an m-sized window given the central word 𝑤𝑡 (a standard statement of the objective follows):
https://github.com/yandexdataschool/nlp_course/tree/2020/week01_embeddings
22 Huawei Confidential
Word2Vec: prediction function
p(o | c) = exp(u_o^T v_c) / Σ_{w∈V} exp(u_w^T v_c)
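Here o is the context ("outside") word, c is the central word, u and v are the context and central word vectors, and V is the vocabulary. A minimal numerical sketch (the vectors below are random placeholders, not trained Word2Vec parameters):

```python
import numpy as np

def context_prob(U, v_c):
    """Probability of every word being a context word of the central word c."""
    scores = U @ v_c                                # u_w^T v_c for every word w
    exp_scores = np.exp(scores - scores.max())      # subtract max for numerical stability
    return exp_scores / exp_scores.sum()

U = np.random.randn(1000, 100)    # context ("outside") vectors, one row per word
v_c = np.random.randn(100)        # vector of the central word
p = context_prob(U, v_c)
print(p.shape, p.sum())           # (1000,) 1.0
```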
23 Huawei Confidential
Two Vectors for Each Word
https://github.com/yandexdataschool/nlp_course/tree/2020/week01_embeddings
24 Huawei Confidential
Why Two Vectors?
⚫ In Word2Vec we train two vectors for each word: one when it is a central word
and another when it is a context word. After training, context vectors are
thrown away.
⚫ When central and context words have separate vectors, the dot products inside the exponents (and the first term of the objective) are linear with respect to the parameters. Therefore, the gradients are easy to compute.
25 Huawei Confidential
How to Train Word2Vec
https://github.com/yandexdataschool/nlp_course/tree/2020/week01_embeddings
26 Huawei Confidential
Faster Training: Negative Sampling
https://github.com/yandexdataschool/nlp_course/tree/2020/week01_embeddings
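The slide content is an image in the original; for reference, the standard negative-sampling objective (Mikolov et al., 2013), which replaces the full softmax for one (central, context) pair with k sampled negative words, is:

```latex
J_{\text{neg}}(u_o, v_c) = \log \sigma\left(u_o^{\top} v_c\right)
  + \sum_{i=1}^{k} \log \sigma\left(-u_{w_i}^{\top} v_c\right),
  \qquad w_i \sim P_n(w)
```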
27 Huawei Confidential
Word2Vec Variants: Skip-Gram and CBOW
https://github.com/yandexdataschool/nlp_course/tree/2020/week01_embeddings
28 Huawei Confidential
GloVe: Combine Count-based and Direct Prediction methods
J(θ) = (1/2) Σ_{i,j=1}^{W} f(X_ij) (u_i^T v_j − log X_ij)²
X_final = U + V
29 Huawei Confidential
FastText: Word2Vec with Character N-grams
30 Huawei Confidential
Contents
1. Introduction to NLP
2. Knowledge Required
▫ Word Embeddings
◼ Language Models
3. Key Tasks
4. Applications
31 Huawei Confidential
What Is a Language Model? (1)
⚫ Language modeling is the task of predicting what word comes next.
⚫ More formally: given a sequence of words x^(1), x^(2), …, x^(t), compute the probability distribution of the next word x^(t+1):
P(x^(t+1) | x^(t), …, x^(1))
33 Huawei Confidential
Applications of Language Models
34 Huawei Confidential
N-gram Language Models
⚫ Definition: An n-gram is a chunk of n consecutive words.
Example: "I really really like ______"
unigrams: "I", "really", "really", "like"
bigrams: "I really", "really really", "really like"
trigrams: "I really really", "really really like"
4-grams: "I really really like"
35 Huawei Confidential
N-gram Language Models
⚫ First we make a simplifying assumption: x^(t+1) depends only on the preceding n−1 words:
P(x^(t+1) | x^(t), …, x^(1)) = P(x^(t+1) | x^(t), …, x^(t−n+2))        (probability conditioned on the preceding n−1 words)
= P(x^(t+1), x^(t), …, x^(t−n+2)) / P(x^(t), …, x^(t−n+2))        (ratio of n-gram to (n−1)-gram probabilities)
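A minimal count-based sketch of these probabilities for n = 3 (the toy corpus and lack of smoothing are simplifications):

```python
from collections import Counter, defaultdict

def train_trigram_lm(tokens):
    """P(next | two preceding words) = count(w1 w2 next) / count(w1 w2 *)."""
    counts = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        counts[(a, b)][c] += 1
    return counts

tokens = "i really really like deep learning and i really like nlp".split()
lm = train_trigram_lm(tokens)
context = ("i", "really")
total = sum(lm[context].values())
for word, n in lm[context].items():
    print(word, n / total)        # estimated P(word | "i really")
```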
36 Huawei Confidential
Example: N-gram Language Models
Suppose we are learning a 4-gram language model: earlier words in the context are discarded, and the next word is conditioned only on the preceding three words.
37 Huawei Confidential
Problems with n-gram Language Models
Storage: need to store counts for all n-grams observed in the corpus.
38 Huawei Confidential
Fixed-window Neural Language Model
output distribution: ŷ = softmax(U h + b_2)
hidden layer: h = f(W e + b_1)
e = concatenation of the window's word embeddings
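A minimal PyTorch sketch of this architecture (vocabulary size, window size, and dimensions are illustrative assumptions, not values from the slide):

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, window=4, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.W = nn.Linear(window * emb_dim, hidden)   # h = f(W e + b_1)
        self.U = nn.Linear(hidden, vocab_size)         # y = softmax(U h + b_2)

    def forward(self, window_ids):                     # (batch, window) word indices
        e = self.emb(window_ids).flatten(1)            # concatenated word embeddings
        h = torch.tanh(self.W(e))
        return torch.log_softmax(self.U(h), dim=-1)    # log-probabilities over the next word

log_probs = FixedWindowLM()(torch.randint(0, 10000, (2, 4)))
print(log_probs.shape)   # torch.Size([2, 10000])
```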
39 Huawei Confidential
Recurrent Neural Networks (RNN)
⚫ Core idea: Apply the same weights W repeatedly.
(Figure: an input sequence of any length feeds a chain of hidden states, with optional outputs at each step.)
40 Huawei Confidential
RNN Language Model
⚫ RNN advantages:
Can process input of any length.
Computation at step t can (in theory) use information from many steps back.
Model size does not increase for longer input.
⚫ RNN disadvantages:
Recurrent computation is slow.
In practice, it is difficult to access information from many steps back.
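A minimal RNN language-model sketch illustrating these properties (sizes are illustrative; the same recurrent weights are reused at every step, so the parameter count does not depend on the input length):

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, ids):                    # (batch, seq_len), any sequence length
        h, _ = self.rnn(self.emb(ids))         # hidden state at every position
        return self.out(h)                     # next-word scores at every position

scores = RNNLM()(torch.randint(0, 10000, (2, 7)))
print(scores.shape)    # torch.Size([2, 7, 10000])
```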
41 Huawei Confidential
Generating Text with a RNN Language Model
42 Huawei Confidential
Language Model for Word Embedding
⚫ ELMo, BERT, and GPT use a language model to generate word embeddings dynamically: the same word in different contexts has different representation vectors.
43 Huawei Confidential
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
▫ Keywords Extraction
▫ Text Classification
▫ Text Generation
▫ Sequence Labeling
▫ Sequence to Sequence
4. Application Systems
44 Huawei Confidential
Keywords Extraction
⚫ Keywords are a group of words that represent the important content of an
article.
⚫ Automatic keyword extraction enables people to browse and retrieve information conveniently, and it plays an important role in text clustering, classification, and automatic summarization.
45 Huawei Confidential
TF - IDF Algorithm
⚫ Term Frequency–Inverse Document Frequency (TF-IDF): a statistical method commonly used to assess how important a word is to a document in a collection of documents (a small sketch follows):
idf_i = log( |D| / (1 + |D_i|) )
tf-idf(i, j) = tf_ij × idf_i = ( n_ij / Σ_k n_kj ) × log( |D| / (1 + |D_i|) )
where n_ij is the number of occurrences of word i in document j, |D| is the total number of documents, and |D_i| is the number of documents containing word i.
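A minimal sketch that follows the formula above directly (the toy documents are illustrative; library implementations such as scikit-learn use slightly different smoothing):

```python
import math
from collections import Counter

def tf_idf(docs):
    """tf_ij = n_ij / sum_k n_kj, idf_i = log(|D| / (1 + |D_i|))."""
    df = Counter(word for doc in docs for word in set(doc))   # |D_i| per word
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = sum(tf.values())
        scores.append({w: (n / total) * math.log(len(docs) / (1 + df[w]))
                       for w, n in tf.items()})
    return scores

docs = [d.split() for d in ["i love shanghai", "i love hangzhou", "i love beijing tiananmen"]]
print(tf_idf(docs)[0])
```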
46 Huawei Confidential
TextRank: PageRank on Text
⚫ The basic idea of the TextRank algorithm comes from Google's PageRank algorithm. PageRank is a link analysis algorithm used to evaluate the importance of a web page in a search system. It rests on two basic ideas (a small sketch follows):
Link quantity: a web page is more important if more other web pages link to it.
Link quality: a web page is more important if it is linked to by pages that themselves have higher weight.
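A minimal PageRank-style iteration on a toy, unweighted word graph (the graph, damping factor, and iteration count are illustrative; TextRank additionally weights edges by word co-occurrence):

```python
def pagerank(graph, d=0.85, iters=30):
    """Iteratively redistribute each node's score to its neighbours."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iters):
        scores = {node: (1 - d) + d * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
                  for node in graph}
    return scores

graph = {"deep": ["learning", "nlp"], "learning": ["deep"],
         "nlp": ["deep", "keyword"], "keyword": ["nlp"]}
print(pagerank(graph))   # higher score = more important node
```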
47 Huawei Confidential
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
▫ Keywords Extraction
▫ Text Classification
▫ Text Generation
▫ Sequence Labeling
▫ Sequence to Sequence
4. Application Systems
48 Huawei Confidential
Text Classification: Definition
⚫ Input:
A document d
A fixed set of classes C = {c_1, c_2, …, c_j}
⚫ Output: a predicted class c ∈ C (e.g., positive)
49 Huawei Confidential
Text Classification: Application
⚫ Spam detection
⚫ Authorship identification
⚫ Age/gender identification
⚫ Language Identification
⚫ Sentiment analysis
⚫ ……
50 Huawei Confidential
Text Classification: Method
51 Huawei Confidential
Text Representation – Bag of Words
Vocabulary columns (one consistent ordering): Shanghai, Beijing, TianAnMen, I, Hangzhou, Love
d1 = "I Love Shanghai" → [1 0 0 1 0 1]
d2 = "I Love Hangzhou" → [0 0 0 1 1 1]
d3 = "I Love Beijing TianAnMen" → [0 1 1 1 0 1]
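The same bag-of-words table can be reproduced with scikit-learn (a sketch; the column order follows the vectorizer's own alphabetical vocabulary rather than the ordering above):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I Love Shanghai", "I Love Hangzhou", "I Love Beijing TianAnMen"]
# The default token pattern drops 1-letter tokens, so override it to keep "I".
vec = CountVectorizer(lowercase=True, token_pattern=r"(?u)\b\w+\b")
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())
print(X.toarray())
```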
52 Huawei Confidential
Text Representation – TF-IDF
53 Huawei Confidential
Text Representation – LSA
document vector
54 Huawei Confidential
Classifier – Naïve Bayes (1)
⚫ Given a document d with words x = (x_1, x_2, …, x_n) and a fixed set of classes C = {c_1, c_2}, compute the most likely class of d (a small sketch follows):
c_MAP = argmax_{c∈C} P(c | d) = argmax_{c∈C} P(x_1, x_2, …, x_n | c) P(c)
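Naïve Bayes additionally assumes conditional independence of the words given the class: P(x_1, …, x_n | c) ≈ ∏_i P(x_i | c). A minimal scikit-learn sketch with made-up documents and labels:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (illustrative only).
docs = ["best roast chicken ever", "the waiter ignored us",
        "great food and service", "terrible experience"]
labels = ["pos", "neg", "pos", "neg"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["the chicken was great"]))   # likely ['pos']
```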
55 Huawei Confidential
Classifier – Naïve Bayes (2)
57 Huawei Confidential
CNN for Text Classification
58 Huawei Confidential
RNN for Text Classification
59 Huawei Confidential
BERT for Text Classification
60 Huawei Confidential
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
▫ Keywords Extraction
▫ Text Classification
▫ Text Generation
▫ Sequence Labeling
▫ Sequence to Sequence
4. Application Systems
61 Huawei Confidential
Text Generation: Language Model
62 Huawei Confidential
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
▫ Keywords Extraction
▫ Text Classification
▫ Text Generation
▫ Sequence Labeling
▫ Sequence to Sequence
4. Application Systems
63 Huawei Confidential
Sequence Labeling
⚫ Each input token x_i has a corresponding label y_i, e.g. x = (x_1, x_2, x_3) → y = (y_1, y_2, y_3).
64 Huawei Confidential
Part-of-Speech Tagging
Source nlpforhackers
65 Huawei Confidential
Named Entity Recognition (NER)
⚫ Identify token spans of entity mentions in text, and classify them into entity types.
66 Huawei Confidential
Sequence Labeling: Method
Traditional:
HMM
MEMM
CRF
Deep Learning:
RNN/LSTM
BiLSTM + CRF
BERT
67 Huawei Confidential
BiLSTM + CRF
68 Huawei Confidential
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
▫ Keywords Extraction
▫ Text Classification
▫ Text Generation
▫ Sequence Labeling
▫ Sequence to Sequence
4. Application Systems
69 Huawei Confidential
Sequence to Sequence (Seq2Seq)
70 Huawei Confidential
Sequence to Sequence: Applications
Machine Translation
Caption Generation
Speech Recognition
71 Huawei Confidential
Machine Translation Using RNN
Encoder Decoder
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2017/Lecture/Attain%20(v5).pdf
72 Huawei Confidential
Problem – Information Bottleneck
73 Huawei Confidential
Attention (1)
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2018/Lecture/Seq%20(v2).pdf
74 Huawei Confidential
Attention (2)
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2018/Lecture/Seq%20(v2).pdf
75 Huawei Confidential
Seq2seq with Attention
76 Huawei Confidential
Transformer: Attention is All You Need (1)
(Figure: Transformer encoder and decoder stacks.)
77 Huawei Confidential
Transformer: Attention is All You Need (2)
78 Huawei Confidential
Self-Attention (1)
79 Huawei Confidential
Self-Attention (2)
Attention(Q, K, V) = softmax( Q K^T / √d_k ) V
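A minimal NumPy sketch of this scaled dot-product attention (matrix shapes are illustrative):

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V with a row-wise softmax."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

Q = np.random.randn(5, 64); K = np.random.randn(5, 64); V = np.random.randn(5, 64)
print(attention(Q, K, V).shape)   # (5, 64)
```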
80 Huawei Confidential
Self-Attention (3)
http://jalammar.github.io/illustrated-transformer/
81 Huawei Confidential
Multi-Head Self-Attention (1)
82 Huawei Confidential
Multi-Head Self-Attention (2)
83 Huawei Confidential
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
4. Application Systems
84 Huawei Confidential
Dialogue Systems
85 Huawei Confidential
Quiz
B. False
86 Huawei Confidential
Recommendations
87 Huawei Confidential
Thank you. 把数字世界带入每个人、每个家庭、
每个组织,构建万物互联的智能世界。
Bring digital to every person, home, and
organization for a fully connected,
intelligent world.
2 Huawei Confidential
Contents
3. Huawei's AI Portfolio
3 Huawei Confidential
AI: Overall outcome of 60 years of development in ICT
(Timeline: roughly 60 years of AI development, interrupted by two downturns, AI Winter I and AI Winter II.)
4 Huawei Confidential
AI is a new general purpose technology (GPT)
General purpose technologies have appeared throughout history: 9000 BC–1000 AD, the 15th–18th centuries, the 19th century, the 20th century, and the 21st century.
Two hallmarks: multiple uses across the economy, and many technological complementarities and spillovers.
https://www.researchgate.net/publication/227468040_Economic_Transformations_General_Purpose_Technologies_and_Long-Term_Economic_Growth
5 Huawei Confidential
AI Will Reshape Industries
Speech recognition Machine vision Decision and inference Natural language processing
6 Huawei Confidential
AI will change every organization
(Figure: the organizational pyramid before and after AI — Leaders; Managers / Experts, joined by Data Scientists; Junior Managers / Senior Professionals, joined by Data Science Engineers; Junior Employees.)
7 Huawei Confidential
AI-triggered change has just begun
Reactions to AI: excitement, urge to act, anxiety, confusion.
(Figure: AI adoption / productivity over time — small-scale exploration (now), then new tech and society collide, then tech and society reinforce each other.)
8 Huawei Confidential
Continuous Breakthroughs in AI Algorithms Unlock Boundless Possibilities
In specific fields, AI is approaching or exceeding human capabilities.
10 Huawei Confidential
Contents
3. Huawei's AI Portfolio
11 Huawei Confidential
10 changes that will shape the future
As Is → To Be:
⚫ Training in days or even months → Training in minutes or even seconds
⚫ Scarce & costly computing power → Abundant & affordable computing power
⚫ AI mostly in the cloud, some at the edge → Pervasive AI for all scenarios that respects and protects user privacy
⚫ Today's basic algorithms invented before the 1980s → Data- and energy-efficient, secure, and explainable algorithms
⚫ Inadequate integration with other technologies → Synergy between AI and cloud, IoT, edge computing, blockchain, big data, databases, etc.
⚫ Only highly skilled experts can work with AI → AI as a basic skill, supported by one-stop platforms
⚫ Scarcity of data scientists → Data scientists + subject matter experts + data science engineers
12 Huawei Confidential
Contents
3. Huawei's AI Portfolio
13 Huawei Confidential
Huawei’s Full-Stack, All-Scenario AI Solution
14 Huawei Confidential
Atlas AI Computing Portfolio
15 Huawei Confidential
Atlas Accelerates AI Training
Ascend 910 AI processor:
⚫ World's most powerful training card
⚫ World's most powerful training server
⚫ World's fastest AI training cluster
16 Huawei Confidential
Atlas Accelerates AI Inference
Ascend 310 AI processor:
⚫ Atlas 200 AI accelerator module: intelligent devices with 7x higher performance
⚫ Atlas 300 AI accelerator card: highest density, 64 video inference channels
⚫ Atlas 500 AI edge station: edge intelligence and cloud-edge synergy
⚫ Atlas 800 AI server: AI inference platform with ultimate computing power
17 Huawei Confidential
CANN: High-Performance Chip Operator Library and Automated Operator Development Tool
➢ CANN (Compute Architecture for Neural Networks): includes the chip operator library and a highly automated operator development tool for optimal development efficiency.
18 Huawei Confidential
MindSpore: All-Scenario AI Computing Framework
MindSpore
All-scenario unified APIs
19 Huawei Confidential
1 Platform + 3 Plans Support Ascend Industry Partners and Developers
Business partners, developers, and universities
20 Huawei Confidential
Atlas Products: Built on Ascend 310 and Serving Many Industries
Industries served: finance, electric power, transportation, Internet, carriers, and more.
21 Huawei Confidential
Quiz
22 Huawei Confidential
Summary
⚫ This course describes AI, a new general purpose technology, and introduces
the 10 changes that will shape the future. It also elaborates on Huawei's AI
development strategy and AI portfolio.
23 Huawei Confidential
Thank you. 把数字世界带入每个人、每个家庭、
每个组织,构建万物互联的智能世界。
Bring digital to every person, home, and
organization for a fully connected,
intelligent world.
2 Huawei Confidential
Contents
1. Overview of ModelArts
3 Huawei Confidential
Huawei's Full-Stack, All-Scenario AI Portfolio
⚫ Application enablement: ModelArts provides end-to-end services, layered APIs, and pre-integrated solutions; HiAI Engine serves AI applications.
⚫ Framework: MindSpore, TensorFlow, PyTorch, and PaddlePaddle; MindSpore supports a unified training and inference framework that is independent of the device, edge, and cloud.
⚫ Chip enablement: CANN, a chip operator library and highly automated operator development tool.
⚫ IP and chip: the Ascend series (Ascend-Nano, Ascend-Tiny, Ascend-Lite, Ascend, Ascend-Mini, Ascend-Max) provides NPU IPs and chips based on a unified, scalable architecture.
Huawei's "all AI scenarios" indicate different deployment scenarios for AI, including public clouds,
private clouds, edge computing in all forms, industrial IoT devices, and consumer devices.
4 Huawei Confidential
ModelArts 2.0, an Open Platform for Inclusive AI
For AI users: application developers, data scientists, AI specialists, and AI operations staff.
Supports online learning and model update.
5 Huawei Confidential
Innovative Data Processing Reduces Data Preparation
• Savings on workforce: 50% to 80%
6 Huawei Confidential
Data Feature Mining - Data Enhancement Suggestions
⚫ 20+ features extracted automatically:
Quality features: saturation, brightness, clarity
Image properties: resolution, complexity, colorfulness
7 Huawei Confidential
ExeML Engine: Automated AI Pipelines
⚫ Zero coding and zero AI experience required:
Step 1: upload and label data
Step 2: train the model
Step 3: validate and deploy the model
8 Huawei Confidential
Interactive Notebook Coding
• Multiple languages: Python 2.7, Python 3.6
• Multiple resources: CPU / GPU / Ascend
• Configurable auto-stop time
• Built-in environments: TensorFlow, PyTorch, Spark MLlib, Scikit-learn, XGBoost; customization supported
9 Huawei Confidential
Guided Training Process
Training guidance:
1. Algorithm source: built-in algorithms; frameworks with customized code; custom images
2. Dataset source: built-in datasets; user dataset on OBS
3. Hyper-parameter configuration: data/code input and output directories; running hyper-parameters
4. Computing resources: CPU / GPU / NPU; number of nodes
5. Start training
10 Huawei Confidential
Real-Time Observation During Training
⚫ Logs, configurations, resource utilization metrics, visualization, versions, parameters, model templates, and traceback.
11 Huawei Confidential
Model Deployment: Cloud, Edge, Terminal
⚫ Batch inference: large-scale data batch inference with high-efficiency distributed computing.
⚫ Model compression (for hardware, frequency, and precision limits): distillation, pruning, and estimation.
⚫ Edge inference: integrated with IEF.
12 Huawei Confidential
AI Market: Trade, Share, Learn
⚫ Model Hub: models and algorithms
⚫ HiLens / HoloSens skills
⚫ Applications
13 Huawei Confidential
ModelArts: Development Platform Used by HUAWEI CLOUD AI Services
14 Huawei Confidential
Autonomous Driving - Data Processing and Model Training
⚫ Unified management platform: dataset management and training management
⚫ Analysis: data quality analysis, model quality analysis, and responsibility analysis
15 Huawei Confidential
Quiz
1. (Multiple) Which of the following modules are covered by ModelArts? ( )
A. Data management
B. DevEnviron
C. AI Market
D. Model Management
E. Service Deployment
16 Huawei Confidential
Thank you. 把数字世界带入每个人、每个家庭、
每个组织,构建万物互联的智能世界。
Bring digital to every person, home, and
organization for a fully connected,
intelligent world.