Theory and Applications of NLP
Copyright © 2019 Huawei Technologies Co, Ltd. All rights reserved.
Objectives
« Upon completion of this course, you will be able to:
o Understand the basic knowledge of Natural Language Processing (NLP)
o Master the algorithms of the Recurrent Neural Network (RNN)

o Master the key tasks of NLP


o Understand the applications of NLP

Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
4. Applications

What Is a Natural Language
« A natural language is a symbolic system that is embodied externally as
voice and consists of vocabulary and grammar. Text and sound are two
attributes of a language.
« A language is a tool for human communication and a carrier of human
thinking. In human history, the knowledge recorded and spread in the form
of language accounts for more than 80% of the total human knowledge.
« A natural language is established by usage, different from artificial
languages such as Java, C++, and other programming languages.


o A natural language is a symbolic system that is embodied externally as voice and consists of vocabulary and grammar. — Xinhua Dictionary

o The biggest difference between natural and artificial languages lies in ambiguity. For example: "We give bananas to monkeys because they are hungry." "We give bananas to monkeys because they are ripe." The problem here is pronoun reference: "they" may refer to the monkeys or to the bananas.
What Is NLP
« Natural Language Processing (NLP) is a technology that uses computers as tools to
perform various processing on human-specific written and verbal natural language
information.
— Feng Zhiwei

« NLP is a discipline that studies language problems in human-human and human-computer interaction. NLP needs to develop models that express language capability and language application, establish a computing framework to implement such language models, propose corresponding methods to continuously improve the models, design various practical systems based on the models, and explore evaluation technologies of these systems.
— Bill Manaris

o NLP is a branch discipline of artificial intelligence (AI) and linguistics. It studies various theories and methods for effective communication between human beings and computers using natural languages.

Basic Methods of NLP (1)
« Capability model
o This is a model established based on linguistic rules and the hypothesis that there is a
general grammar rule in the human brain. This model is compliant with the idea that
languages are derived from the language capability of the human brain, and the
establishment of a language model is to simulate this innate language capability by
establishing a manually edited set of language rules.
o It is also known as the "rationalistic" language model, whose representatives include Chomsky and Minsky.
= Modeling steps:
. Formalize linguistic knowledge
« Convert formalized rules into algorithms
+ Implement the algorithms

« Rationalistic:

o Set up grammar rules (general): <noun> <verb> <adj> <noun>


o Set up semantic rules (scenario-based):
<Person> <Action><FoodType> <Food>

o Use the rules for verification.


Basic Methods of NLP (2)
« Application model
= This is a specific language model established for different language processing applications. It establishes specific mathematical models to learn complex and extensive language structures, and uses methods such as statistics, pattern recognition, and machine learning to train the parameters of the model to expand the scale of language use.
o It is also known as the "empirical" language model, whose representatives include Shannon and Skinner.
= Modeling steps:
« Obtain statistics at different levels of language units from large-scale real corpuses.
+ Use related statistical inference techniques based on statistics of lower-level language units to
calculate statistics of higher-level language units

o Empirical:
o Collect the frequency of word combinations from a large number of English texts.
o Establish a statistical model to calculate and compare the probabilities of sentences.

Basic Methods of NLP (3)
« NLP methods can be classified into the following categories:
o Rule-based method
o Statistics-based method

NLP Research Direction
« NLP is an important research direction in the fields of computer science
and Al. It is a cross-disciplinary subject, covering linguistics, computer
science, mathematics, psychology, information theory, acoustics ...
[Figure: levels of natural language understanding and generation — phonemics, morphology, lexicology, syntax, and pragmatics, connecting natural language text with language understanding and generation]


o Phonemics: describes the combination rules of phonemes and how phonemes form morphemes.
  Example: delete file x -> dilet#fail#eks

o Morphology: describes the combination rules of morphemes and how morphemes form words.
  Example: unusually -> un+usual+ly

o Lexicology: describes the laws of the lexical system and explains the inherent semantic and grammatical characteristics of words.
  Example: delete file x -> delete(VERB) file(NOUN) x(ID)

Three Levels of NLP
- Lexical analysis: It contains word segmentation, part-of-speech tagging, and named entity recognition.
+ Syntax analysis: It is the process of analyzing input text by sentence to obtain the syntactic structure of the sentence.
« Semantic analysis: The ultimate goal is to understand the true semantics of a sentence.
[Figure: the example sentence "[In the room], he broke a window <with a hammer>." annotated with parts of speech (preposition, article, noun, pronoun, verb) and syntactic roles (adverbial modifier, subject, predicate, object, complement), alongside a source-sentence to target-sentence mapping]

« Syntax analysis: At present, there are three mainstream syntax analysis methods in
the industry: phrase structure syntax system, dependency structure syntax system,
and deep syntax analysis.
Challenges in NLP (1)
« Lexical ambiguity:
o Word segmentation: English is easier to segment than other languages.
o Part-of-speech tagging: The part of speech of the same word varies with the context.
= I plan/v to take the postgraduate entrance exam
= I have completed the plan/n

o Named entity recognition: Unregistered words, such as person names, company names, and organization names, are difficult to recognize.
« Apple

= First National Bank Donates 2 Vans To Future School Of Fort Smith


« Difficulties of NLP include:

o Natural languages are complex to express and learn.


o Language translation relies on the real environment and context.

o Human languages are ambiguous, and rules are difficult to obtain. For
example, for anaphora resolution, computer languages use x and y, and
natural languages use this, that, he ...

o People usually use omissions for efficient communication.


Challenges in NLP (2)
« Syntax ambiguity: The syntactic dependency is affected by the context.
« John likes Jane more than Adam.
o Compared with Adam and Jane, John likes Jane more.

o Compared with Adam's liking of Jane, John likes Jane more.

« Alex saw a man on the hill with a telescope.


« The government asks us to save soap and waste paper.

Challenges in NLP (3)
« Semantic ambiguity
o At last, a computer that understands you like your mother.
= Meaning 1: A computer understands you as your mother does.
» Meaning 2: A computer understands that you like your mother.
= Meaning 3: A computer understands you as it understands your mother.

« Meredith is in a terrible state.


o "state”: condition of something.
o "state”: a country or part of a country.

Challenges in NLP (4)
« Pragmatic ambiguity
o "You are so bad"
= When this sentence is said to an adult who has done bad things, it is severe blame.
» When a mother says it to her naughty son, what she actually expresses is a kind of
love for her son.
= When a girl in love says it to her boyfriend, it is a playful, coquettish complaint rather than real blame.


o The essence of ambiguity is the lack of semantic knowledge:

o Syntax rules do not drive understanding.


o The collected contexts do not cover the semantic knowledge required for complete understanding.

« Countermeasures in practical application:

o Ambiguity-eliminating technical methods (considering the context in


statistics collection and modeling)

o Establishing "known knowledge" (adding field dictionaries and rules)

o Reducing "unknown knowledge” (vertical field development)


Development Status of NLP
« A number of influential language databases have been developed, and some
technologies have reached or basically reached a practical level, and have played a
huge role in practical applications.
o Corpus of Peking University and HowNet

« Many new research directions emerge.


= Reading comprehension, image (video) understanding, and simultaneous interpretation
of speech

« Many theoretical issues have not yet been fundamentally resolved.


o Problems of unregistered-word recognition, ambiguity elimination, and semantic
understanding
o Lack of a complete and systematic theoretical framework

Contents
1. Introduction to NLP
2. Knowledge Required
= Language Models
o Text Vectorization
o Common Algorithms

3. Key Tasks
4. Applications

What Is a Language Model
« A language model is an abstract correspondence established based on objective
language facts.
« In practical applications, we often need to resolve the following problems:
o Spelling correction: P(about fifteen minutes from) > P(about fifteen minuets from)

= Question answering system, ...


o The above questions can be expressed as follows according to the chain rule:
P(w_1, w_2, ..., w_m) = P(w_1)P(w_2|w_1) ... P(w_i|w_1, w_2, ..., w_{i-1}) ... P(w_m|w_1, w_2, ..., w_{m-1})
w_1, w_2, ..., w_m are the words in the text. P(w_1, w_2, ..., w_m) is the probability distribution of a character string of length m, and it is the language model, that is, a model for calculating the probability of a sentence.


« "A language model assumes that all possible sentences of a language obey a probability distribution, and the occurrence probabilities of all sentences add up to 1. The task of a language model is to predict the probability of each sentence appearing in the language. For a common sentence in the language, a good language model gives a relatively high probability; for a sentence that is not grammatical, the calculated probability approaches zero. If a sentence is regarded as a sequence of words, the language model can be represented as a calculation model. A language model only models the occurrence probability of a sentence and does not attempt to understand the meaning of the sentence."

« Core of a language model: using scores to make the machine know how to speak
Neural Network Language Model (1)

[Figure: feed-forward neural network language model — the indices of the context words w_{t-n+1}, ..., w_{t-2}, w_{t-1} are mapped through a shared matrix C to vectors C(w), combined in a hidden layer (where most computation happens), and fed to a softmax output layer whose i-th output is P(w_t = i | context)]


o The Bag-of-Words model is the earliest text vectorization method that uses words as the basic processing unit. However, this model has the following shortcomings:
o Curse of dimensionality.
o The word order information cannot be kept.
o There is a semantic gap. (Words are the basic units of semantic expression. The Bag-of-Words model only symbolizes words and does not contain any semantic information.)

« Word2vec and other word vector models are based on the distribution hypothesis, that is, words with similar contexts have similar semantics. This is based on the linguistic principle of "iconicity of distance": a word and its context constitute an image, and when similar images are learned from the corpus, their semantics are always similar. The neural network language model (NNLM) enables modeling based on the relationship between the context and the target word. Some researchers proposed a method that uses the distribution of the context to represent the meaning of words, namely the Word Space Model.
Neural Network Language Model (2)

[Figure: RNN-based language model — at each time step the network takes the current word and the previous hidden state, and a softmax layer converts the RNN output state into a probability distribution over the vocabulary]

« Softmax layer: converts the RNN output state into the probability of each word.

« Source of this model: https://www.zhihu.com/question/29456588


N-gram Language Model
« When an N-gram model is used to estimate the conditional probability, preceding words at a distance greater than or equal to n are ignored. Therefore, the conditional probability can be calculated from frequency counts:
P(w_i | w_1, w_2, ..., w_{i-1}) ≈ P(w_i | w_{i-(n-1)}, ..., w_{i-1}) = count(w_{i-(n-1)}, ..., w_{i-1}, w_i) / count(w_{i-(n-1)}, ..., w_{i-1})
= When n = 1, this is a unigram model: P(w_1, w_2, ..., w_m) = P(w_1)P(w_2) ... P(w_m)
= When n = 2, this is a bigram model: P(w_1, w_2, ..., w_m) = P(w_1)P(w_2|w_1) ... P(w_m|w_{m-1})

« For example:
<s> I am Lily </s>                              P(I | <s>) = 2/3 ≈ 0.667
<s> Lily I am </s>                              P(am | I) = 2/3 ≈ 0.667
<s> I do not like green eggs and ham </s>       P(Lily | am) = 1/2 = 0.5


o A larger value of n indicates richer order information in the model and a larger calculation workload. At the same time, the number of long text sequences decreases, and the numerator or denominator can be zero. Therefore, a corresponding smoothing algorithm, such as Laplacian smoothing, needs to be used together with the n-gram model to solve this problem. A unigram model completely loses the order information in the sentence and is therefore not appropriate.
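
As a concrete illustration of the bigram case above, the following minimal Python sketch (the corpus and helper names are our own, for demonstration only) estimates bigram probabilities from the three example sentences, with optional add-one (Laplacian) smoothing as mentioned in the note:

from collections import Counter

# Toy corpus from the slide, with sentence boundary markers.
corpus = [
    "<s> I am Lily </s>",
    "<s> Lily I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

tokens = [sentence.split() for sentence in corpus]
unigram_counts = Counter(w for sent in tokens for w in sent)
bigram_counts = Counter((sent[i], sent[i + 1]) for sent in tokens for i in range(len(sent) - 1))
vocab_size = len(unigram_counts)

def bigram_prob(prev, word, smooth=False):
    """P(word | prev); add-one smoothing avoids zero probabilities for unseen pairs."""
    if smooth:
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def sentence_prob(sentence, smooth=True):
    """Sentence probability under the bigram model (chain rule with n = 2)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word, smooth)
    return p

print(bigram_prob("<s>", "I"))      # 2/3, matching the slide
print(sentence_prob("I am Lily"))   # probability of a full sentence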
Relationship Between NNLM and Statistical Language Models
« Similarity: These two models take a sentence as a word sequence to calculate the
sentence probability.
« Differences:
= Manner of probability calculation: Based on the Markov assumption, the N-gram model considers only the preceding n-1 words, while NNLM considers the context of the whole sentence.
= Manner of model training: N-gram calculates parameters based on maximum likelihood
estimation, and such calculation is word-based; NNLM trains the model based on the
RNN optimization method.
Recurrent neural networks (RNNs) can store context information of any length in a
hidden state, not subject to the window limit in the N-gram model.

Contents
1. Introduction to NLP
2. Knowledge Required
o Language Models
= Text Vectorization
o Common Algorithms

3. Key Tasks
4. Applications

Text Vectorization (1)
» Text vectorization: represents text by using a series of vectors that express
the semantics of the text. Common vectorization algorithms are as follows:
o one-hot
o TF-IDF

o word2vec
« CBOW model
« Skip-gram model

o doc2vec/str2vec
« Distributed Memory (DM)
« Distributed Bag of Words (DBOW)


o Text representation is fundamental in NLP. The quality of text representation directly affects the performance of the entire NLP system. Text vectorization is an important way of text representation. Words are the basic unit of semantic expression, so most current research on text vectorization focuses on word vectorization (word2vec). Some research also takes articles or sentences as the basic unit of text processing, and therefore doc2vec and str2vec emerged.

« Word embedding is a general term for language model and representation learning technologies in NLP. Conceptually, it refers to "embedding" a high-dimensional space (where each word is represented in one-hot form and the number of dimensions equals the vocabulary size) into a continuous vector space with far fewer dimensions, mapping each word or phrase to a vector over the real numbers.
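
As a hedged illustration of word-level vectorization, the sketch below trains a small CBOW word2vec model with gensim (this assumes gensim 4.x is installed; the tiny corpus is invented and far too small to produce meaningful vectors):

from gensim.models import Word2Vec

# Toy tokenized corpus; real training needs a large corpus.
sentences = [
    ["natural", "language", "processing", "is", "fun"],
    ["language", "models", "assign", "probabilities", "to", "sentences"],
    ["word", "vectors", "capture", "the", "semantics", "of", "words"],
]

# sg=0 selects CBOW (predict the center word from its context); sg=1 would select Skip-gram.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

vector = model.wv["language"]                       # 50-dimensional dense vector for one word
print(vector.shape)                                 # (50,)
print(model.wv.most_similar("language", topn=3))    # nearest words by cosine similarity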
Text Vectorization (2)

            Man      Woman    King     Queen    Apple    Orange
            (5391)   (9853)   (4914)   (7157)   (456)    (6257)
Gender      -1       1        -0.95    0.97     0.00     0.01
Royal       0.01     0.02     0.93     0.95     -0.01    0.00
Age         0.03     0.02     0.70     0.69     0.03     -0.02
Food        0.09     0.01     0.02     0.01     0.95     0.97

[Figure: 2-D visualization of word vectors — semantically similar words cluster together (e.g., man/woman/king/queen, apple/orange/grape, cat/fish, and number words)]
word2vec - CBOW Model

[Figure: CBOW architecture — C context words, each a V-dimensional one-hot vector at the input layer, are projected through a shared weight matrix W (V x N) and combined into an N-dimensional hidden layer, which is mapped by a second matrix (N x V) to a V-dimensional output layer predicting the center word]
word2vec - Skip-gram Model

[Figure: Skip-gram architecture — the center word, a V-dimensional one-hot vector at the input layer, is projected to an N-dimensional hidden layer and then to C output layers of V dimensions each, one per context word to be predicted]
doc2vec - DM Model

[Figure: Distributed Memory (DM) model — a paragraph id vector from the paragraph matrix is averaged or concatenated with the word vectors of the context words (e.g., "the", "cat", "sat") and fed to a classifier that predicts the next word ("on")]
doc2vec - DBOW Model

[Figure: Distributed Bag of Words (DBOW) model — the paragraph id vector alone is fed to a classifier that predicts words sampled from the paragraph ("the", "cat", "sat", "on")]
Contents
1. Introduction to NLP
2. Knowledge Required
o Language Models
o Text Vectorization
= Common Algorithms

3. Key Tasks
4. Applications

HMM (1)

[Figure]
HMM (2)

[Figure: schematic diagram of the Hidden Markov Model (HMM) — circles denote hidden states and observed states; arrows denote transitions from one hidden state to the next and outputs from a hidden state to an observed state]
HMM (3)
« HMM is a statistical model used to describe Markov processes with hidden parameters. Mathematically, it is expressed by the following formula:
max P(h|w) = max P(h_1 h_2 ... h_n | w_1 w_2 ... w_n)
« w = w_1 w_2 ... w_n is the input sentence, n is the sentence length, w_i is each character in the sentence (including non-Chinese characters such as punctuation), and h = h_1 h_2 ... h_n is the output label sequence.


+ HMM application: Proposed in the 1970s, HMM is a kind of Markov chain and a doubly stochastic process. It spread and developed in the 1980s and has become an important direction of signal processing. It has been successfully applied in fields such as speech recognition, behavior recognition, text recognition, and fault diagnosis.

o Introduction:

o Entropy: indicates the degree of uncertainty in the state of the system.


Shannon uses the concept of entropy to describe an information system,
where entropy represents the average amount of information (average
uncertainty) of the information system (such as the movement of a drop of
ink).

o Maximum entropy model: It is often said that one should not put all eggs in one basket. (There may be dozens or even hundreds of factors that affect stock fluctuation, and the maximum entropy method can find a model that meets thousands of different conditions at the same time.) To minimize risks, keep all uncertainties.

« Corresponding algorithms are proposed for the following problems:

o Evaluation problem: forward algorithm

o Decoding problem: Viterbi algorithm

o Learning problem: Baum-Welch algorithm (forward and backward algorithm)


HMM (4)

max P(w): P(w_1, w_2, ..., w_m) = P(w_1)P(w_2|w_1) ... P(w_m|w_1, w_2, ..., w_{m-1})

Add the hidden variable h.

Bayesian formula: P(h|w) = P(w|h)P(h) / P(w)

P(w) is a constant, so: max P(h|w) is equivalent to max P(w|h)P(h)

Observation independence hypothesis and chain rule:
max P(w_1|h_1)P(w_2|h_2) ... P(w_n|h_n) P(h_1)P(h_2|h_1)P(h_3|h_1 h_2) ... P(h_n|h_1 ... h_{n-1})

Homogeneous Markov hypothesis:
max P(w_1|h_1)P(w_2|h_2) ... P(w_n|h_n) P(h_1)P(h_2|h_1)P(h_3|h_2) ... P(h_n|h_{n-1})


o Observation independence hypothesis: every observed character depends only on its current hidden state.

« Transition probability: the probability of moving from one hidden state to the next hidden state.

« Emission (transmit) probability: the probability of outputting an observed state from a hidden state.

o When the transition probability P(h_k|h_{k-1}) of an impossible label pair is set to 0, improper combinations such as BBB and EM can be eliminated.

o In HMM, the Viterbi algorithm is used to find the label sequence that maximizes P(w|h)P(h) above.
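
To make the decoding step concrete, here is a minimal Viterbi sketch for the B/M/E/S labeling formulation (the probability tables are invented toy numbers; a real segmenter estimates them from a tagged corpus):

import math

states = ["B", "M", "E", "S"]

# Toy log-probabilities; impossible transitions (e.g. B->B, E->M) get a very low score.
start_p = {"B": math.log(0.6), "M": -1e9, "E": -1e9, "S": math.log(0.4)}
trans_p = {
    "B": {"B": -1e9, "M": math.log(0.4), "E": math.log(0.6), "S": -1e9},
    "M": {"B": -1e9, "M": math.log(0.3), "E": math.log(0.7), "S": -1e9},
    "E": {"B": math.log(0.5), "M": -1e9, "E": -1e9, "S": math.log(0.5)},
    "S": {"B": math.log(0.5), "M": -1e9, "E": -1e9, "S": math.log(0.5)},
}

def emit_p(state, char):
    """Toy emission log-probability P(char | state); a real model looks this up per character."""
    return math.log(0.25)

def viterbi(chars):
    """Return the most probable hidden label sequence for the observed characters."""
    scores = [{s: start_p[s] + emit_p(s, chars[0]) for s in states}]
    paths = {s: [s] for s in states}
    for ch in chars[1:]:
        scores.append({})
        new_paths = {}
        for s in states:
            best, prev = max((scores[-2][p] + trans_p[p][s] + emit_p(s, ch), p) for p in states)
            scores[-1][s] = best
            new_paths[s] = paths[prev] + [s]
        paths = new_paths
    final = max(states, key=lambda s: scores[-1][s])
    return paths[final]

print(viterbi("中华民族"))   # ['B', 'E', 'B', 'E'] under these toy parameters: two two-character words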
Conditional Random Field
» Assuming that X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_n) form a joint random variable, if Y constitutes a Markov random field represented by an undirected graph G = (V, E), the conditional probability distribution P(Y|X) is called a Conditional Random Field (CRF).

[Figure: linear-chain CRF — the output variables Y_1, Y_2, ..., Y_n form a chain, each conditioned on the input X = (X_1, X_2, ..., X_n)]

o For several positions in the whole, if a position is randomly assigned a value


according to a certain distribution, the whole is called a random field.
RNN
« Different from conventional machine translation models that consider only limited prefix information as the conditioning context, a recurrent neural network (RNN) can incorporate all preceding words into the model.

[Figure: recurrent neural network (Colah, 2015, Understanding LSTM Networks)]

« Long-term dependencies: A key property of RNNs is that they can associate previous information with the current task, for example using earlier video segments to help understand the current segment. Whether they really can do so depends on the distance involved. Sometimes we only need recent information to perform the current task. For example, consider a language model predicting the next word from the previous words: to predict the last word of "the clouds are in the sky", we do not need any further context — the next word is obviously "sky". In such a scenario, the distance between the relevant information and the position where it is needed is small, and an RNN can learn to use the previous information. However, there are also more complicated scenarios. To predict the last word of "I grew up in France ... I speak fluent French", the nearby information suggests that the next word is probably the name of a language, but to figure out which language we need the context of "France", which is far from the current position. The distance between the relevant information and the position to be predicted can therefore become very large, and as this distance grows, a plain RNN loses the ability to learn to connect the information. In theory, RNNs can handle such long-term dependencies, and one can carefully select parameters to solve toy instances of the problem, but in practice plain RNNs usually fail to learn them: an excessively long sequence causes vanishing gradients during optimization. Fortunately, LSTM does not have this problem.
LSTM
» Long short-term memory (LSTM): a special type of RNN that can learn long-term dependencies.

Colah, 2015, Understanding LSTM Networks


o LSTM is deliberately designed to handle long-term dependencies. In practice, remembering long-term information is the default behavior of LSTM, not an ability that comes at a high cost. All RNNs have a chained form of repeated neural network modules. In a standard RNN, this repeated module has a very simple structure, such as a single tanh layer. LSTM has a similar chained form, but the repeated module has a different structure: instead of a single neural network layer, there are four layers interacting in a very special way. LSTM is a special network structure with three "gates".

o Through these gates, LSTM allows information to selectively influence the state at each moment of the RNN. A gate is an operation that uses a sigmoid neural network layer followed by element-wise multiplication. The structure is called a gate because the fully connected layer with a sigmoid activation outputs a value between 0 and 1 that describes how much of the current input can pass through: when the gate is open (sigmoid output is 1), all information passes; when the gate is closed (sigmoid output is 0), no information passes. (Source: https://blog.csdn.net/mpk_no1/article/details/72875185)
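
As a minimal sketch of how an LSTM-based language model processes a sequence (this assumes PyTorch is installed; all layer sizes, names, and the random batch are illustrative only):

import torch
import torch.nn as nn

class TinyLSTMLanguageModel(nn.Module):
    """Illustrative language model: embed token ids -> LSTM -> linear projection to vocabulary logits."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)           # (batch, seq_len, embed_dim)
        output, (h_n, c_n) = self.lstm(x)   # the gates decide what enters / leaves the cell state
        return self.proj(output)            # (batch, seq_len, vocab_size) next-word logits

model = TinyLSTMLanguageModel()
batch = torch.randint(0, 1000, (2, 7))      # two sequences of 7 token ids
logits = model(batch)
print(logits.shape)                         # torch.Size([2, 7, 1000])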
GRU
« The Gated Recurrent Unit (GRU) can be considered a variant of LSTM. It replaces the forget gate and input gate of the LSTM with a single update gate, combines the cell state and the hidden state h_t, and calculates the new information at the current moment in a way different from LSTM.

[Figure: GRU cell (Colah, 2015, Understanding LSTM Networks)]
BiRNN
+ In a typical RNN, the state is transmitted in one direction only. However, in some problems the output at the current moment is related not only to the previous states but also to the subsequent states. In this case, a bi-directional recurrent neural network (BiRNN) is needed. For example, predicting a missing word in a sentence requires not only the previous text but also the following content, which is exactly where a BiRNN helps. A BiRNN is composed of two RNNs, one on top of the other, and the output is determined by the states of both.

[Figure: bi-directional RNN structure (Hanbingtao, 2017)]
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Technologies
= Word Segmentation
a Part-of-Speech Tagging
o Named Entity Recognition
a Keyword Extraction
o Syntax Analysis

o Semantic Analysis

4. Application System

Chinese Word Segmentation
« Chinese word segmentation is the task of splitting Chinese text (a
sequence of Chinese characters) into words. Word segmentation is the
process of dividing a string of written language into its component words.
= For example: —JL1/\ &/ E/SCH/EHH O/ RME/E/—FTFN+/\m=12/5mT


o In English, spaces serve as natural delimiters between words. In Chinese, characters, sentences, and paragraphs can be delimited by obvious separators, but words have no formal delimiters. Although English also has the problem of dividing phrases, at the word level Chinese is much more complex and difficult than English.
Rule-based Word Segmentation (1)
« Rule-based (dictionary-based) word segmentation: a mechanical word segmentation method that maintains a dictionary and matches each character string in a sentence against the words in the dictionary one by one; only matched strings are segmented as words. Depending on how matching and segmentation are performed, there are the following methods:
o Maximum match (MM) method

o Reverse maximum match (RMM) method


o Bi-directional match method
« Features: simple and efficient, but difficult to maintain the dictionary. New words
emerge continuously on the Internet, and therefore the dictionary cannot cover all
the words.

Rule-based Word Segmentation (2)
« MM method:

[Figure: flowchart of the MM method — initialize the character string S1 to be segmented, the output string S2, and the maximum word length MaxLen; take from the left of S1 a candidate string W whose length is not greater than MaxLen; look W up in the dictionary; if W is not found and W is not a single character, remove the rightmost character of W and try again; otherwise output W (S2 = S2 + W, S1 = S1 - W) and repeat until S1 is empty]


o MM method:
o Basic idea: Assume that the longest word in the segmentation dictionary has i Chinese characters. Use the first i characters of the current string of the processed document as the matching field and look it up in the dictionary. If such an i-character word exists in the dictionary, the match succeeds and the matched field is segmented as a word. If it is not found, the match fails, the last character of the field to be matched is removed, and the remaining string is matched again. This process is repeated until a word is segmented or the length of the remaining string is zero.
o For example: "南京市长江大桥" (Nanjing Yangtze River Bridge) may be segmented as "南京市长\江\大桥" if the dictionary contains "南京市长" (mayor of Nanjing).

« RMM method: The basic principle is the same as that of the MM method, but the direction of matching is the opposite.
o Comparison with MM: Chinese has many attributive (modifier-before-head) structures, so matching from the end of the sentence backwards can improve accuracy. Statistical results show that the error rate of the pure MM method is 1/169, while that of the pure RMM method is 1/245.
o For example: "南京市长江大桥" is segmented as "南京市\长江大桥". Such segmentation may still not be completely correct, because it is also possible that a 市长 (mayor of Nanjing) is named 江大桥 (Jiang Daqiao).
Statistical Word Segmentation
« Main idea: Word segmentation is implemented as a task of tagging the sequence of
characters in the string. Each character occupies a certain position in the construction of a
particular word. If successive characters often appear in different texts, the successive
characters are likely a word.

Four tags: B = Beginning of a word, M = Middle of a word, E = End of a word, S = Single-character word
Two tags: B = Beginning of a word, I = Not the beginning of a word
Input: The Chinese nation is brave (each character of the Chinese sentence receives one tag)
Tags: a B/M/E/S sequence under the four-tag set, or a B/I sequence under the two-tag set
Output: The Chinese nation/is/brave
« Steps:
o Establish a statistical language model.
o Segment words in a sentence, and calculate the probabilities of the results to obtain the word
segmentation method with the highest probability, such as HMM or CRF.

¢ Main idea: Each word is composed of characters, the smallest units of a word. If
successive characters often appear in different texts, the successive characters are
likely a word. Therefore, the occurrence frequency of successive characters is used
to reflect the reliability of words, and the frequency of a combination of successive
characters in the corpus can be collected. When the combination frequency is
higher than a certain threshold, it is considered that this character combination
may constitute a word.

o Selection of a statistical model:

o Hidden state: Choose HMM when the hidden state sequence needs to be
calculated based on an observation sequence (character sequence).

o Category label: When the context of each character constitutes a feature,


choose the maximum entropy model to determine the word category of the
current character.
Deep Learning Word Segmentation
« This word segmentation algorithm uses word2vec to embed corpus words, inputs
word embedding features to the bi-directional LSTM, adds a linear layer to the
output hidden layer, and adds a CRF to obtain the final implemented model.


S ! F‘\IF e T): r

SRR R
Word embeddmgS[ @ @ @ @

Page47 Copyright © 2019 Huawei Technologies Co. Ltd. All rights reserved. W2 HuAWE!
Mixed Word Segmentation
» In most practical engineering applications, a word segmentation algorithm
is used with the assistance of other word segmentation algorithms. The
most common practice is to use dictionary-based word segmentation, with
the assistance of statistical word segmentation algorithms.


o In fact, the word segmentation effects of rule-based algorithms, HMM, CRF, and
deep learning algorithms in specific tasks have little difference.
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
o Word Segmentation
= Part-of-Speech Tagging
o Named Entity Recognition
a Keyword Extraction
o Syntax Analysis

o Semantic Analysis

4. Applications

Part-of-Speech Tagging
« Part-of-speech tagging: process of tagging a correct part of speech for
each word in a sentence after word segmentation, that is, process of
determining each word as a noun, a verb, an adjective, or any other part of
speech. For example: march toward/v, be filled with/v, hope/n, of/uj,
new/a, and century/n.
o Part of speech: a basic syntax attribute of a word
o Purpose: This is a pre-processing step for many NLP tasks, such as syntactic analysis and information extraction. Part-of-speech tags bring great convenience to subsequent processing, although they are not indispensable for every task.
o Methods: rule-based, statistics-based, and deep learning-based methods


o Statistics-based methods:

o Maximum entropy-based part-of-speech tagging


o Part-of-speech output based on the statistical maximum probability

o HMM-based part-of-speech tagging


Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
o Word Segmentation
a Part-of-Speech Tagging
= Named Entity Recognition
a Keyword Extraction
o Syntax Analysis

o Semantic Analysis

4. Applications

Named Entity Recognition (1)
. Named Entity Recognition (NER): also known as "proper name recognition". It refers to the recognition of entities with specific meanings in the text, mainly including person names, place names, institution names, and proper nouns. For example: metallurgy/n, ministry of industry/n, Hong Kong/n, fireproofing material/l, and research institute/n.
o Classification: Named entities studied by NER are divided into three categories (entity, time, and number) and seven subcategories (person name, place name, institution name, time, date, currency, and percent).
o Function: Like automatic word segmentation and part-of-speech tagging, named entity recognition is a basic task for natural language processing. It is essential to technologies such as information extraction, information retrieval, machine translation, and question answering systems.
o Steps:
+ Recognize the entity boundary.
+ Determine the entity category (such as person name, place name, or institution name).

o See https://blog.csdn.net/ZJL0105/article/details/82194610.
Named Entity Recognition (2)
« Difficulties:
o There are a large number of various named entities.
o The composition of named entities is complex.
o Entities are embedded and complex.
o The entity length is uncertain.

Deep Learning NER
[Figure: Bi-LSTM + CRF NER architecture — the sentence "Billy goes to the training center to study" is mapped to character/word vectors and word embeddings, passed through a Bi-LSTM, and a CRF layer outputs the entity tags]

« Source: https://blog.csdn.net/DataGrand/article/details/83312169
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
o Word Segmentation
a Part-of-Speech Tagging
o Named Entity Recognition

= Keyword Extraction
o Syntax Analysis

o Semantic Analysis

4. Applications

Keyword Extraction
« Keywords are a group of words that represent the important content of an article. In actual
scenarios, a large amount of text does not contain keywords. Therefore, the technology of
automatic keyword extraction enables people to browse and retrieve information
conveniently, and plays an important role in text clustering, classification, and automatic
summarization.
« Keyword extraction algorithms can be divided into supervised and unsupervised types:
o Supervised: Supervised keyword extraction is carried out via classification. This type of algorithm builds a comprehensive word list and determines the degree of matching between each word of each document and the word list, extracting keywords in a manner similar to tagging.
o Unsupervised: This type of algorithm requires neither a manually generated and maintained word list nor a manually annotated corpus to assist training. These algorithms include TF-IDF, TextRank, and topic model algorithms (such as LSA, LSI, and LDA).


o Supervised algorithms can achieve higher accuracy, but require a large amount of
tagged data, featuring high labor costs.
TF-IDF Algorithm (1)
« Term Frequency-Inverse Document Frequency (TF-IDF): a statistical calculation
method commonly used to assess the importance of a word to a document in a
fileset.
- For example:

On the World Blood Donor Day, school groups and blood donation service
volunteers can go to the blood center to visit the inspection process. We will
publicize the test results, and the price of blood will also be publicized.
In the sentence, the occurrence frequencies of the words "blood", "the", "donor", and "visit" are 4. Under the TF algorithm, the importance of these words to this document is the same. However, "blood" and "donor" are obviously more important to this document.

TF-IDF Algorithm (2)
. The TF algorithm counts the occurrence frequency of a word in a document. The basic idea is that the more often a word appears in a document, the better it expresses that document.
tf_ij = n_ij / Σ_k n_kj = (number of times word i appears in document j) / (total number of words in document j)
« The IDF algorithm counts the number of documents in a fileset that contain a given word. The basic idea is that the fewer documents a word appears in, the better it distinguishes documents.
idf_i = log( |D| / (1 + |D_i|) )
+ |D| is the total number of documents in the fileset, and |D_i| is the number of documents containing word i.
« TF-IDF algorithm:
tf x idf(i, j) = tf_ij x idf_i = ( n_ij / Σ_k n_kj ) x log( |D| / (1 + |D_i|) )

o IDF algorithm: The denominator plus 1 means that Laplacian Smoothing is used.
This avoids the situation in which the denominator is zero because some new
words do not appear in the corpus, enhancing the robustness of the algorithm.

o TF-IDF algorithm: combination of TF and IDF algorithms. Scholars have done a lot
of researches on how to combine these two algorithms: whether to add or multiply
them and whether to take logarithm for IDF calculation. After a lot of theoretical
derivation and experimental researches, multiplication is found to be one of the
more effective calculation methods.
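
A minimal pure-Python sketch of these formulas, including the +1 Laplacian smoothing in the IDF denominator (the three toy documents are invented for illustration):

import math

documents = [
    "blood donor volunteers visit the blood center".split(),
    "the school groups visit the training center".split(),
    "the price of the test results will be publicized".split(),
]

def tf(word, doc):
    """Term frequency: occurrences of the word divided by the document length."""
    return doc.count(word) / len(doc)

def idf(word, docs):
    """Inverse document frequency with +1 smoothing in the denominator."""
    containing = sum(1 for d in docs if word in d)
    return math.log(len(docs) / (1 + containing))

def tf_idf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)

for word in ("blood", "center", "the"):
    print(word, round(tf_idf(word, documents[0], documents), 4))
# "blood" (only in the first document) gets a clearly positive score,
# "center" (in two of the three documents) scores 0 after smoothing,
# and "the" (in every document) goes slightly negative.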
TextRank Algorithm (1)
« The basic idea of the TextRank algorithm comes from Google's PageRank algorithm.
PageRank is a link analysis algorithm proposed by Google founder Larry Page and Sergey
Brin when constructing an early search system prototype in 1997. The algorithm is used to
evaluate the importance of a web page in the search system. There are two basic ideas:
o Link quantity. A web page is more important if it is linked by more other web pages.
o Link quality. A web page is more important if it is linked by another web page with higher weight.


[Figure: web pages linking to one another, illustrating the PageRank idea]

« Difference between TextRank and other algorithms: TextRank can extract the keywords of an individual document without a corpus.

o TF-IDF needs to count, for each word, how many documents of the corpus it appears in, that is, the inverse document frequency.

o A topic model discovers the hidden topics of a document by learning from a large-scale document collection.

o The TextRank algorithm was first used for automatic summarization of documents. Based on analysis at the sentence level, TextRank scores each sentence and selects the n sentences with the highest scores as the key sentences of the document to realize automatic summarization.
TextRank Algorithm (2)
« In(V_i) is the set of incoming links of V_i, Out(V_j) is the set of outgoing links of V_j, and |Out(V_j)| is the number of outgoing links. The score of V_i is:

S(V_i) = Σ_{j ∈ In(V_i)} (1 / |Out(V_j)|) × S(V_j)

« To avoid the score of an isolated web page being 0, add a damping factor d:

S(V_i) = (1 - d) + d × Σ_{j ∈ In(V_i)} (1 / |Out(V_j)|) × S(V_j)

. PageRank uses a directed, unweighted graph, while TextRank automatic summarization uses a weighted graph, because in addition to the importance of linked sentences, the similarity between two sentences also needs to be considered. Therefore, the complete expression of TextRank is:

WS(V_i) = (1 - d) + d × Σ_{V_j ∈ In(V_i)} ( w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ) × WS(V_j)
TextRank Algorithm (3)
» When TextRank is applied to a keyword extraction task, there are two main
differences compared with a case in which it is applied to an automatic
summarization task:
o The association between words has no weight.
o Not every word is linked to all the words in the document.

« Because of the first difference, the TextRank score formula reduces to the same form as in PageRank; because of the second difference, the link relationship between words is defined by a co-occurrence window.
LSA, LSI, and LDA Algorithms
« A topic model considers that there is no direct connection between words and
documents, and they are connected by another dimension, which is called a topic.
Each document should correspond to one or more topics, and each topic should
have a corresponding word distribution. The word distribution of each document
can be obtained through the topic.

[Figure: topic model — documents are connected to words through latent topics (source of the picture: http://www.bubuko.com/infodetail-2796156.html)]

o In general, TF-IDF and TextRank can satisfy most keyword extraction tasks.
However, in some scenarios, keyword extraction based on the document itself is
not sufficient, and some keywords may not appear in the document. For example,
a science article about animal living environments introduces a variety of animals,
such as lions, tigers, and crocodiles, but does not contain the word "animal". In
this case, the two algorithms are inapplicable, and a topic model is required.
o The previous two models extract keywords based on the relationship between
words and documents. These two algorithms use only the statistical information in
the text, and do not fully utilize the rich information, especially the semantic
information, which is obviously very useful for text keyword extraction.
LSA/LSI
« Both Latent Semantic Analysis (LSA) and Latent Semantic Indexing (LSI) are used to analyze the latent semantics of documents; LSI additionally builds an index based on the analysis result. LSA and LSI are usually treated as the same algorithm. The analysis steps are as follows:
o Represent each document as a vector using the BOW model.
o Combine all document word vectors into a word-document matrix (m x n).
o Perform singular value decomposition (SVD) on the word-document matrix: [m x r] [r x r] [r x n].
o According to the SVD result, each word and each document can be represented as a point in a space formed by r topics. The similarity between each word and each document is then computed, and the words with the highest similarity are selected as the keywords of the document.

« Compared with the conventional vector space model (VSM), which makes little use of semantic information, LSA maps words and documents to a low-dimensional semantic space by using SVD, mining shallow semantic information of the words and documents and thus expressing documents more essentially.

o Advantages: Words and text are mapped to a low-dimensional space, the semantic information of the text is used (to a limited extent), and the computing cost is greatly reduced, improving the analysis quality.

o Disadvantages: (1) SVD has high computational complexity; higher dimensions of the feature space result in low computing efficiency. (2) The distribution information obtained by LSA is based on the existing dataset; when a new document enters the existing feature space, the entire space needs to be retrained to obtain the distribution after the document is added. (3) LSA is insensitive to the frequency distribution of words and weak in physical interpretability.
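
A minimal scikit-learn sketch of these steps (assuming scikit-learn is installed; the four documents and the choice of r = 2 topics are toy values): build a bag-of-words document-word matrix and apply truncated SVD so that documents and words live in the same r-topic space.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

documents = [
    "lions and tigers live in the savanna",
    "crocodiles live in rivers and swamps",
    "language models assign probabilities to sentences",
    "word vectors represent the semantics of words",
]

# Steps 1-2: bag-of-words matrix (documents x vocabulary).
vectorizer = CountVectorizer()
doc_word = vectorizer.fit_transform(documents)

# Step 3: truncated SVD keeps r latent "topics".
r = 2
svd = TruncatedSVD(n_components=r, random_state=0)
doc_topics = svd.fit_transform(doc_word)   # each document as a point in the r-topic space
word_topics = svd.components_.T            # each word as a point in the same space

print(doc_topics.shape)                    # (4, 2)
print(word_topics.shape)                   # (vocabulary size, 2)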
LDA Algorithm (1)
« LDA assumes that the prior distribution of topics in a document and the prior distribution of words in each topic follow the Dirichlet distribution. Then, by counting the existing dataset, the multinomial distribution of topics in each document and the multinomial distribution of words for each topic can be obtained. The training procedure is as follows:
o Perform random initialization, and randomly assign a topic number z to each word w in
every document in the corpus.
= Rescan the corpus, sample its topics again for each word w according to the Gibbs
sampling formula, and update them in the corpus.
o Repeat the sampling process in the corpus until the Gibbs sampling converges.
o In the corpus, collect the topic-word co-occurrence frequency matrix, which is the LDA
model.


o According to the Bayesian school: prior distribution + data (likelihood) = posterior distribution.
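
A minimal sketch of training an LDA model and estimating the topics of a new document with gensim (this assumes gensim is installed; note that gensim's default LdaModel trains with variational Bayes rather than the Gibbs sampling described above, and the corpus is invented toy data):

from gensim import corpora
from gensim.models import LdaModel

texts = [
    ["lion", "tiger", "savanna", "predator"],
    ["crocodile", "river", "swamp", "predator"],
    ["language", "model", "sentence", "probability"],
    ["word", "vector", "semantics", "probability"],
]

dictionary = corpora.Dictionary(texts)                  # word <-> id mapping
corpus = [dictionary.doc2bow(text) for text in texts]   # bag-of-words representation per document

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=20, random_state=0)
print(lda.print_topics(num_words=4))                    # word distribution of each topic

# Estimate the topic distribution of a new document.
new_doc = dictionary.doc2bow(["crocodile", "predator", "river"])
print(lda.get_document_topics(new_doc))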
LDA Algorithm (2)
« After the above steps, a trained LDA model is obtained, and then topics of a new document can be
estimated as follows:
Perform random initialization, and randomly assign a topic number z to each word w in the current document.
s Rescan the current document, and sample its topics again according to the Gibbs sampling formula.
= Repeat the above process until the Gibbs sampling converges.
o Take the topic distribution in the statistical document as the estimated result.

Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
o Word Segmentation
o Part-of-Speech Tagging
o Named Entity Recognition
a Keyword Extraction
= Syntax Analysis

o Semantic Analysis

4. Applications

Syntax Analysis
» The main task of syntax analysis is to identify the syntactic components
contained in a sentence and the dependencies between these components.
It can be divided into syntactic structure analysis and dependency
analysis. The result of syntax analysis is represented by a syntax tree.

[Figure: syntax tree for the sentence "I want to watch a horror movie."]

« Importance of syntax analysis:

o Machine translation is a main field of NLP, and syntax analysis is the core
data structure of machine translation. Syntax analysis is the core technology
of NLP and the basis of deep understanding of languages.

o For complex sentences, no correct sentence component relationship can be


obtained only by part-of-speech analysis.

o With the use of deep learning in NLP, especially the application of the LSTM
model with its own syntactic relationship, syntax analysis has become less
important. However, syntax analysis can still play a great role in long
sentences with very complex syntactic structures and few tagged samples.
Therefore, the study of syntax analysis is still necessary.

« Difficulties of syntax analysis:

o Ambiguity: An important difference between natural languages and artificial


languages is the existence of ambiguity. Human beings can rely on a large
amount of prior knowledge to eliminate ambiguities, while machines have
serious deficiencies in knowledge representation and acquisition, and cannot
eliminate ambiguities like human beings.

o Search space: Syntax analysis is a complex task. The number of candidate trees increases exponentially with sentence length, and the search space is huge. Therefore, a proper decoder must be designed to ensure that the optimal solution under the model can be found within a tolerable time.
o Syntax analysis can be divided into complete syntax analysis and partial syntax
analysis by the focused goals. The methods used in syntax analysis can be
divided into rule-based methods and statistics-based methods.
Importance of Syntax Analysis
» Machine translation is a main field of NLP, and syntax analysis is the core
data structure of machine translation and the basis of deep understanding
of languages.
« For complex sentences with few tagged samples, no correct sentence
component relationship can be obtained only by part-of-speech analysis.


o With the use of deep learning in NLP, especially the application of the LSTM model
with its own syntactic relationship, syntax analysis has become less important.
However, syntax analysis can still play a great role in long sentences with very
complex syntactic structures and few tagged samples.

o Reference: https://blog.csdn.net/yu5064/article/details/82151578
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
o Word Segmentation
o Part-of-Speech Tagging
o Named Entity Recognition
a Keyword Extraction
o Syntax Analysis

= Semantic Analysis

4. Applications

Semantic Analysis
« Semantic analysis is a logical phase of the compilation process. The task of
semantic computing is to explain the meaning of parts (words, phrases,
sentences, paragraphs, or chapters) of an article in natural languages.

[Figure: semantic analysis process — natural language sentences undergo grammatical processing, sentence pattern recognition, sentence stem and modifier extraction, partial semantic extraction, and semantic information generation]
Importance of Semantic Analysis
« Is it enough to know just the structure of a sentence?
« For example, a syllogism: All men are mortal.
Socrates is a man.
Therefore, Socrates is mortal.

« Invalid inference: All plants die.
All men die.
Therefore, men are plants.
« The above case is grammatical in structure but semantically invalid. Pure analysis of sentence structure therefore cannot properly solve problems such as machine understanding and translation, and semantic analysis is necessary.

« Without semantic analysis of words, languages cannot be translated.

o Sentences with the same syntactic structure often vary greatly in semantics. In this
case, the entire analysis cannot go on without semantic analysis.
Contents
1. Introduction to NLP
2. Knowledge Required
3. Key Tasks
4. Applications

Applications (1)
Text classification: process of associating given text with one or more categories based on the
characteristics of the text under a predefined classification system. Example: spam detection and
sentiment analysis
Text clustering: clusters text based on the clustering hypothesis that documents in the same
category have a high similarity and documents in different categories have a small similarity.
Machine translation: uses computers to translate between different languages. Related
researches and exploration have been started since the birth of the first computer: memory-
based -> instance-based -> statistical machine translation -> neural network translation.
Question answering system: an information retrieval system that receives questions raised by
users in natural languages and finds or infers answers to the questions from a large amount of
heterogeneous data.


o Text classification process (a minimal pipeline is sketched after this list):
o Definition phase: Define the data and the classification system, including the specific category division and the data required.
o Data preprocessing: Perform word segmentation, stop-word removal, and other preparation.
o Feature extraction: Reduce the dimensionality of the document matrix and extract the most useful features from the training set. Algorithms include BOW, TF-IDF, and N-gram.
o Model training phase: Select specific classification models and algorithms and train the text classifiers.
o Evaluation phase: Test and evaluate the performance of the classifiers using the test set.
o Application phase: Use the classification model with the best performance to classify new documents.
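
A minimal scikit-learn sketch of such a pipeline (assuming scikit-learn is installed; the tiny labeled corpus is invented): TF-IDF features feed a linear classifier, and a held-out split is used for evaluation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

texts = [
    "win a free prize now", "limited offer click here", "cheap loans guaranteed",
    "meeting rescheduled to monday", "please review the attached report", "lunch at noon tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=2, random_state=0, stratify=labels)

# Feature extraction + classifier in one pipeline.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(X_train, y_train)

predictions = classifier.predict(X_test)
print(accuracy_score(y_test, predictions))
print(classifier.predict(["you won a free prize"]))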
Applications (2)
» Automatic abstract: uses computers to automatically extract abstracts
from the original document. An abstract is a simple and coherent essay
that comprehensively and accurately reflects the central content of a
certain document.
« Information Extraction (IE): extracts the information contained in the text
and then integrates it in a unified form.
« Public opinion analysis: the process of performing in-depth processing, analysis, and research on public opinions about a specific issue, as required, to reach a conclusion.
« Machine writing: Al robots write articles.

o The information extraction technology does not attempt to fully understand the
entire document but only analyzes the parts of the document that contain relevant
information. As for which information is relevant, it will be determined by the
predefined scope of field.
Quiz
1. Which of the following options is NOT one of the three levels of NLP? ()

A. Lexical analysis
B. Syntax analysis
C. Speech analysis
D. Semantic analysis

2. The TF algorithm counts the number of documents in a fileset that contain a given word; the basic idea is that the fewer documents a word appears in, the better it distinguishes the documents. (True or False)


o Answers:

o C
o False
Summary
« This document introduces the basic knowledge of NLP, describes language
models and commonly used algorithms, and illustrates key technologies
and applications of NLP.

Thank You
www.huawei.com