Lec 1.1.2

This document provides an overview of a course on natural language processing (NLP). The course aims to provide students with knowledge of fundamental NLP concepts and techniques. On completing the course, students will be able to demonstrate understanding of topics such as ambiguity, NLP models and algorithms, and speech and language processing. The content covered includes NLP components and terminology, challenges in natural language understanding, and recent advances in areas like dialogue systems, machine translation and question answering.

Apex Institute of Technology

Department of Computer Science & Engineering

NATURAL LANGUAGE PROCESSING


(20CST354)

Dr Satinderjit Kaur Gill (E15282)
Associate Professor, CSE (AIT), CU

DISCOVER . LEARN . EMPOWER
1
NATURAL LANGUAGE PROCESSING : Course Objectives

COURSE OBJECTIVES

The course aims to:

• Introduce students to the fundamental concepts and techniques of natural language processing (NLP).

2
COURSE OUTCOMES

On completion of this course, the students shall be able to:

CO1: Demonstrate understanding of fundamental NLP concepts, including ambiguity, models and algorithms, and knowledge in speech and language processing.

3
Contents to be Covered
• Knowledge in speech and language processing
• Ambiguity
• Models and Algorithms
Natural Language Processing
Natural Language Processing (NLP) refers to AI methods for communicating with intelligent systems using a natural language such as English.
• Processing of natural language is required when you want an intelligent system, such as a robot, to act on your instructions, or when you want to hear a decision from a dialogue-based clinical expert system, etc.
• The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be:
• Speech
• Written text

5
Components of NLP
NLP has two components:
• Natural Language Understanding (NLU)
• Understanding involves the following tasks:
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.
• Natural Language Generation (NLG)
• The process of producing meaningful phrases and sentences in natural language from some internal representation.
• It involves:
• Text planning − retrieving the relevant content from the knowledge base.
• Sentence planning − choosing the required words, forming meaningful phrases, and setting the tone of the sentence.
• Text realization − mapping the sentence plan into sentence structure.
6
Difficulties in NLU
• Natural language has an extremely rich form and structure.
• It is very ambiguous. There can be different levels of ambiguity:
• Lexical ambiguity − at a very primitive level, such as the word level.
• For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − a sentence can be parsed in different ways.
• For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the beetle, or did he lift a beetle that had a red cap?
• Referential ambiguity − referring to something using pronouns. For example, Rima went to Gauri. She said, “I am tired.” − Exactly who is tired?
• One input can have several meanings.
• Many inputs can mean the same thing.

7
NLP Terminology
• Phonology − the study of organizing sounds systematically.
• Morphology − the study of the construction of words from primitive meaningful units.
• Morpheme − a primitive unit of meaning in a language.
• Syntax − arranging words to make a sentence; it also involves determining the structural role of words in the sentence and in phrases.
• Semantics − concerned with the meaning of words and how to combine words into meaningful phrases and sentences.
• Pragmatics − deals with using and understanding sentences in different situations and how the interpretation of the sentence is affected.
• Discourse − deals with how the immediately preceding sentence can affect the interpretation of the next sentence.
• World knowledge − general knowledge about the world.
8
Steps in NLP
There are, in general, five steps:
• Lexical Analysis − identifying and analyzing the structure of words. The lexicon of a language is the collection of words and phrases in that language. Lexical analysis divides the whole chunk of text into paragraphs, sentences, and words.
• Syntactic Analysis (Parsing) − analyzing the words in a sentence for grammar and arranging them in a manner that shows the relationships among the words. A sentence such as “The school goes to boy” is rejected by an English syntactic analyzer.
• Semantic Analysis − drawing the exact, or dictionary, meaning from the text. The text is checked for meaningfulness by mapping syntactic structures to objects in the task domain. The semantic analyzer disregards phrases such as “hot ice-cream”.
• Discourse Integration − the meaning of any sentence depends upon the meaning of the sentence just before it; it can also influence the meaning of the immediately succeeding sentence.
• Pragmatic Analysis − what was said is re-interpreted in terms of what was actually meant. It involves deriving those aspects of language which require real-world knowledge.
9
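To make the lexical-analysis step concrete, here is a minimal Python sketch that splits raw text into paragraphs, sentences, and words. The regular expressions and the sample sentence are illustrative assumptions, not part of the course material.

import re

def lexical_analysis(text):
    # Split raw text into paragraphs, sentences, and words (a rough sketch).
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        # Words are lower-cased alphabetic tokens.
        result.append([re.findall(r"[a-zA-Z']+", s.lower()) for s in sentences])
    return result

print(lexical_analysis("The boy goes to school. He likes it."))
# [[['the', 'boy', 'goes', 'to', 'school'], ['he', 'likes', 'it']]]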
Why is Language Hard?

● Ambiguity on many levels

● Sparse data — many words are rare

● No clear understanding of how humans process language

10
Words

11
Morphology

12
Parts of Speech

13
Syntax

14
Semantics

15
Discourse

16
Recent Advances
• Spoken dialogue devices (Siri, Google Now, Echo, ...)

• IBM Watson wins Jeopardy!

• Google machine translation

• Web-scale question answering

17
Language models 29

● Language models answer the question:

How likely is it that a string of English words is good English?

● Help with ordering

pLM(the house is small) > pLM(small the is house)

● Help with word choice

pLM(I am going home) > pLM(I am going house)

Philipp Koehn
Artificial Intelligence: Natural Language Processing 23 April 2020
N-Gram Language Models 30

● Given: a string of English words W = w1, w2, w3, ..., wn

● Question: what is p(W)?

● Sparse data: many good English sentences will not have been seen before

→ Decomposing p(W) using the chain rule:

p(w1, w2, w3, ..., wn) = p(w1) p(w2|w1) p(w3|w1, w2) ... p(wn|w1, w2, ..., wn−1)

(not much gained yet, p(wn|w1, w2, ..., wn−1) is equally sparse)
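As a concrete instance of the chain-rule decomposition, using the example sentence from the language-model slide above:

p(the, house, is, small) = p(the) p(house|the) p(is|the, house) p(small|the, house, is)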

Markov Chain 31

● Markov assumption:
– only previous history matters
– limited memory: only the last k words are included in the history (older words are less relevant)
→ kth-order Markov model

● For instance, a 2-gram (bigram) language model:

p(w1, w2, w3, ..., wn) ≈ p(w1) p(w2|w1) p(w3|w2) ... p(wn|wn−1)

● What is conditioned on, here wi−1, is called the history
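A minimal Python sketch of how a bigram model scores a sentence under this decomposition; the probability table is a made-up toy example, not estimated from any real corpus.

import math

# Toy bigram probabilities p(w2|w1); "<s>" marks the sentence start.
# All values are illustrative only.
bigram_prob = {
    ("<s>", "the"): 0.3, ("the", "house"): 0.1,
    ("house", "is"): 0.4, ("is", "small"): 0.2,
}

def score_bigram(words, probs, unseen=1e-6):
    # p(w1, ..., wn) ≈ p(w1) p(w2|w1) ... p(wn|wn-1), computed in log space.
    log_p = 0.0
    for prev, word in zip(["<s>"] + words, words):
        log_p += math.log(probs.get((prev, word), unseen))
    return math.exp(log_p)

print(score_bigram("the house is small".split(), bigram_prob))   # relatively high
print(score_bigram("small the is house".split(), bigram_prob))   # much lower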

Estimating N-Gram Probabilities 32

● Maximum likelihood estimation

p(w2|w1) = count(w1, w2) / count(w1)

● Collect counts over a large text corpus

● Millions to billions of words are easy to get (trillions of English words available on the web)
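A small sketch of maximum likelihood estimation for bigrams: count adjacent word pairs in a corpus and divide by the count of the history word. The tiny corpus is only an illustration.

from collections import Counter

corpus = "the house is small . the house is big .".split()  # toy corpus

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def mle_bigram(w1, w2):
    # p(w2|w1) = count(w1, w2) / count(w1)
    if unigram_counts[w1] == 0:
        return 0.0
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(mle_bigram("house", "is"))   # 2/2 = 1.0
print(mle_bigram("is", "small"))   # 1/2 = 0.5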

Example: 3-Gram 33

● Counts for trigrams and estimated word probabilities

the green (total: 1748)        the red (total: 225)         the blue (total: 54)
word    count   prob.          word    count   prob.        word    count   prob.
paper   801     0.458          cross   123     0.547        box     16      0.296
group   640     0.367          tape    31      0.138        .       6       0.111
light   110     0.063          army    9       0.040        flag    6       0.111
party   27      0.015          card    7       0.031        ,       3       0.056
ecu     21      0.012          ,       5       0.022        angel   3       0.056

– 225 trigrams in the Europarl corpus start with “the red”

– 123 of them end with “cross”

→ maximum likelihood probability is 123/225 = 0.547

How good is the LM? 34

● A good model assigns a text of real English W a high probability

● This can also be measured with cross-entropy:

H(W) = -(1/n) log2 p(w1, w2, ..., wn)

● Or, perplexity:

perplexity(W) = 2^H(W)

Example: 3-Gram 35

prediction                      pLM       -log2 pLM
pLM(i | </s> <s>)               0.109     3.197
pLM(would | <s> i)              0.144     2.791
pLM(like | i would)             0.489     1.031
pLM(to | would like)            0.905     0.144
pLM(commend | like to)          0.002     8.794
pLM(the | to commend)           0.472     1.084
pLM(rapporteur | commend the)   0.147     2.763
pLM(on | the rapporteur)        0.056     4.150
pLM(his | rapporteur on)        0.194     2.367
pLM(work | on his)              0.089     3.498
pLM(. | his work)               0.290     1.785
pLM(</s> | work .)              0.99999   0.000014
average                                   2.634

Comparison 1–4-Gram 36

word unigram bigram trigram 4-gram


i 6.684 3.197 3.197 3.197
would 8.342 2.884 2.791 2.791
like 9.129 2.026 1.031 1.290
to 5.081 0.402 0.144 0.113
commend 15.487 12.335 8.794 8.633
the 3.885 1.402 1.084 0.880
rapporteur 10.840 7.319 2.763 2.350
on 6.765 4.140 4.150 1.862
his 10.678 7.316 2.367 1.978
work 9.993 4.816 3.498 2.394
. 4.896 3.020 1.785 1.510
</s> 4.828 0.005 0.000 0.000
average 8.051 4.072 2.634 2.251
perplexity 265.136 16.817 6.206 4.758
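The perplexity row follows directly from the average row: perplexity = 2 raised to the average -log2 probability per word. A quick check of the trigram column:

# Per-word -log2 probabilities of the trigram model, taken from the table above.
neg_log2_probs = [3.197, 2.791, 1.031, 0.144, 8.794, 1.084,
                  2.763, 4.150, 2.367, 3.498, 1.785, 0.000]

cross_entropy = sum(neg_log2_probs) / len(neg_log2_probs)
perplexity = 2 ** cross_entropy

print(round(cross_entropy, 3), round(perplexity, 3))  # ≈ 2.634 and ≈ 6.2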

Core Challenge 37

● How to handle low counts and unknown n-grams?

● Smoothing
– adjust counts for seen n-grams
– use probability mass for unseen n-grams
– many discount schemes developed

● Backoff
– if 5-gram unseen → use 4-gram instead

● Neural network models promise to handle this better

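As one concrete example of a discount scheme, the sketch below applies add-one (Laplace) smoothing to bigram counts so that unseen bigrams receive a small non-zero probability. Real systems usually use more refined discounting (e.g. Kneser-Ney); the toy corpus and vocabulary are illustrative assumptions.

from collections import Counter

corpus = "the house is small . the house is big .".split()  # toy corpus
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
V = len(unigram_counts)  # vocabulary size of the toy corpus

def add_one_bigram(w1, w2):
    # Add-one smoothed p(w2|w1): unseen bigrams get a small non-zero probability.
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(add_one_bigram("house", "is"))    # seen bigram, count discounted: 3/8
print(add_one_bigram("house", "flag"))  # unseen bigram, still non-zero: 1/8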
38

parts of speech

Parts of Speech 39

● Open class words (or content words)


– nouns, verbs, adjectives, adverbs
– refer to objects, actions, and features in the world
– open class, new ones are added all the time (email, website).

● Closed class words (or function words)


– pronouns, determiners, prepositions, connectives, ...
– there is a limited number of these
– mostly functional: to tie the concepts of a sentence together

Parts of Speech 40

● There are about 30-100 parts of speech


– distinguish between names and abstract nouns?
– distinguish between plural nouns and singular nouns?
– distinguish between past tense verbs and present tense verbs?

● Identifying the parts of speech is a first step towards syntactic analysis

Ambiguous Words 41

● For instance: like


– verb: I like the class.
– preposition: He is like me.

● Another famous example: Time flies like an arrow

● Most of the time, the local context disambiguates the part of speech

Part-of-Speech Tagging 42

● Task: Given a text of English, identify the parts of speech of each word

● Example
– Input: word sequence: Time flies like an arrow
– Output: tag sequence: Time/NN flies/VB like/P an/DET arrow/NN

● What will help us to tag words with their parts of speech?

Relevant Knowledge for POS Tagging 43

● The word itself


– Some words may only be nouns, e.g. arrow
– Some words are ambiguous, e.g. like, flies
– Probabilities may help, if one tag is more likely than another

● Local context
– two determiners rarely follow each other
– two base form verbs rarely follow each other
– determiner is almost always followed by adjective or noun

Bayes Rule 44

● We want to find the best part-of-speech tag sequence T for a sentence S:

argmaxT p(T|S)

● Bayes rule gives us:

p(T|S) = p(S|T) p(T) / p(S)

● We can drop p(S) if we are only interested in argmaxT:

argmaxT p(T|S) = argmaxT p(S|T) p(T)

Decomposing the Model 45

● The mapping p(S|T) can be decomposed into

p(S|T) = ∏i p(wi|ti)

● p(T) could be called a part-of-speech language model, for which we can use an n-gram model (bigram):

p(T) = p(t1) p(t2|t1) p(t3|t2) ... p(tn|tn−1)

● We can estimate p(S|T) and p(T) with maximum likelihood estimation (and maybe some smoothing)

Hidden Markov Model (HMM) 46

● The model we just developed is a Hidden Markov Model

● Elements of an HMM model:
– a set of states (here: the tags)
– an output alphabet (here: words)
– initial state (here: beginning of sentence)
– state transition probabilities (here: p(tn|tn−1))
– symbol emission probabilities (here: p(wi|ti))

Graphical Representation 47

● When tagging a sentence, we are walking through the state graph:

[State transition graph over the tags VB, NN, IN, DET, plus special START and END states.]

● State transition probabilities: p(tn|tn−1)

Graphical Representation 48

● At each state we emit a word:

[Diagram: the VB state emitting words such as “flies” and “like”.]

● Symbol emission probabilities: p(wi|ti)

Search for the Best Tag Sequence 49

● We have defined a model, but how do we use it?

– given: word sequence
– wanted: tag sequence

● If we consider a specific tag sequence, it is straightforward to compute its probability (see the sketch after this slide):

p(S|T) p(T) = ∏i p(wi|ti) p(ti|ti−1)

● Problem: if we have on average c choices for each of the n words, there are c^n possible tag sequences, maybe too many to efficiently evaluate

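A minimal Python sketch of scoring one specific tag sequence under this model, as mentioned above; the transition and emission probabilities are made-up toy numbers, not estimated from a real tagged corpus.

import math

# Toy HMM parameters (illustrative values only).
transition = {("<s>", "NN"): 0.4, ("NN", "VB"): 0.3, ("VB", "IN"): 0.2}          # p(t_i | t_{i-1})
emission = {("NN", "time"): 0.01, ("VB", "flies"): 0.005, ("IN", "like"): 0.1}   # p(w_i | t_i)

def score_tag_sequence(words, tags, unseen=1e-8):
    # p(S|T) p(T) = prod_i p(w_i|t_i) p(t_i|t_{i-1}), computed in log space.
    log_p = 0.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        log_p += math.log(transition.get((prev, tag), unseen))
        log_p += math.log(emission.get((tag, word), unseen))
        prev = tag
    return math.exp(log_p)

print(score_tag_sequence(["time", "flies", "like"], ["NN", "VB", "IN"]))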
Walking Through the States 50

● First, we go to state NN to emit time:

[Diagram: from START, we move to state NN, which emits “time”; the alternative states VB, DET, IN are not taken.]

Walking Through the States 51

● Then, we go to state VB to emit flies:

[Diagram: after NN, we move to state VB, which emits “flies”; at each position the other states (NN, DET, IN) remain as alternatives.]

Walking Through the States 52

● Of course, there are many possible paths:

[Diagram: the full lattice of states (VB, NN, DET, IN) at each of the positions “time flies like an”, showing the many possible paths through the state graph.]

Viterbi Algorithm 53

● Intuition: since state transitions out of a state depend only on the current state (and not on previous states), we can record for each state the optimal path to it

● We record:
– cheapest cost to state j at step s in δj(s)
– backtrace from that state to its best predecessor in ψj(s)

● Stepping through all states at each time step allows us to compute

– δj(s + 1) = max1≤i≤N δi(s) p(tj|ti) p(ws+1|tj)
– ψj(s + 1) = argmax1≤i≤N δi(s) p(tj|ti) p(ws+1|tj)

● The best final state is argmax1≤i≤N δi(|S|); we can backtrack from there

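A compact Python sketch of the Viterbi recursion described above, using toy transition and emission tables in the same style as the earlier scoring example; all probability values are illustrative assumptions.

def viterbi(words, tags, transition, emission, unseen=1e-8):
    # delta[s][t]: best probability of any path that ends in tag t after word s
    # psi[s][t]:   best predecessor tag for that path (used for backtracking)
    delta = [{t: transition.get(("<s>", t), unseen) * emission.get((t, words[0]), unseen)
              for t in tags}]
    psi = [{}]
    for s in range(1, len(words)):
        delta.append({})
        psi.append({})
        for t in tags:
            best_prev, best_p = max(
                ((prev, delta[s - 1][prev] * transition.get((prev, t), unseen)
                  * emission.get((t, words[s]), unseen)) for prev in tags),
                key=lambda x: x[1])
            delta[s][t] = best_p
            psi[s][t] = best_prev
    # Backtrack from the best final state.
    last = max(delta[-1], key=delta[-1].get)
    path = [last]
    for s in range(len(words) - 1, 0, -1):
        path.append(psi[s][path[-1]])
    return list(reversed(path))

# Toy parameters (illustrative values only).
transition = {("<s>", "NN"): 0.4, ("NN", "VB"): 0.3, ("VB", "IN"): 0.2,
              ("IN", "DET"): 0.5, ("DET", "NN"): 0.6}
emission = {("NN", "time"): 0.01, ("VB", "flies"): 0.005, ("IN", "like"): 0.1,
            ("DET", "an"): 0.3, ("NN", "arrow"): 0.02}

print(viterbi("time flies like an arrow".split(),
              ["NN", "VB", "IN", "DET"], transition, emission))
# With these toy numbers: ['NN', 'VB', 'IN', 'DET', 'NN']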
Key points of the lecture

• Knowledge in speech and language processing

• Ambiguity

• Models

• Algorithms

43
Thank you

Please send your queries to:

e-Mail: [email protected]

44
