Lec 1.1.2

This document provides an overview of a course on natural language processing (NLP). The course aims to provide students with knowledge of fundamental NLP concepts and techniques. On completing the course, students will be able to demonstrate understanding of topics such as ambiguity, NLP models and algorithms, and speech and language processing. The content covered includes NLP components and terminology, challenges in natural language understanding, and recent advances in areas like dialogue systems, machine translation and question answering.

Apex Institute of Technology

Department of Computer Science & Engineering

NATURAL LANGUAGE PROCESSING


(20CST354)

Dr Satinderjit Kaur Gill (E15282)
Associate Professor, CSE (AIT), CU

DISCOVER . LEARN . EMPOWER
1
NATURAL LANGUAGE PROCESSING : Course Objectives

COURSE OBJECTIVES

The course aims to:

• Introduce students to the fundamental concepts and techniques of natural language processing (NLP).

2
COURSE OUTCOMES

On completion of this course, the students shall be able to:

CO1: Demonstrate understanding of fundamental NLP concepts, including ambiguity, models and algorithms, and knowledge in speech and language processing.

3
Contents to be Covered
• Knowledge in speech and language processing
• Ambiguity
• Models and Algorithms
Natural Language Processing
Natural Language Processing (NLP) refers to AI methods for communicating with intelligent systems using a natural language such as English.
• Processing of natural language is required when you want an intelligent system, such as a robot, to act on your instructions, or when you want to hear a decision from a dialogue-based clinical expert system, etc.
• The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be:
• Speech
• Written text

5
Components of NLP
NLP has two components:
• Natural Language Understanding (NLU)
• Understanding involves the following tasks:
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.
• Natural Language Generation (NLG)
• The process of producing meaningful phrases and sentences in natural language from some internal representation.
• It involves:
• Text planning − retrieving the relevant content from the knowledge base.
• Sentence planning − choosing the required words, forming meaningful phrases, and setting the tone of the sentence.
• Text realization − mapping the sentence plan into sentence structure.
6
Difficulties in NLU
• Natural language has an extremely rich form and structure.
• It is very ambiguous. There can be different levels of ambiguity:
• Lexical ambiguity − at a very primitive level, such as the word level.
• For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − a sentence can be parsed in different ways.
• For example, “He lifted the beetle with red cap.” − Did he use a cap to lift the beetle, or did he lift a beetle that had a red cap?
• Referential ambiguity − referring to something using pronouns. For example, Rima went to Gauri. She said, “I am tired.” − Exactly who is tired?
• One input can have several meanings.
• Many inputs can mean the same thing.

7
NLP Terminology
• Phonology − the study of organizing sounds systematically.
• Morphology − the study of the construction of words from primitive meaningful units.
• Morpheme − a primitive unit of meaning in a language.
• Syntax − arranging words to make a sentence; it also involves determining the structural role of words in the sentence and in phrases.
• Semantics − concerned with the meaning of words and how to combine words into meaningful phrases and sentences.
• Pragmatics − deals with using and understanding sentences in different situations and how the interpretation of the sentence is affected.
• Discourse − deals with how the immediately preceding sentence can affect the interpretation of the next sentence.
• World knowledge − general knowledge about the world.
8
Steps in NLP
There are, in general, five steps:
• Lexical Analysis − identifying and analyzing the structure of words. The lexicon of a language is the collection of words and phrases in that language. Lexical analysis divides the whole chunk of text into paragraphs, sentences, and words.
• Syntactic Analysis (Parsing) − analyzing the words in a sentence for grammar and arranging them in a manner that shows the relationships among the words. A sentence such as “The school goes to boy” is rejected by an English syntactic analyzer.
• Semantic Analysis − drawing the exact, or dictionary, meaning from the text. The text is checked for meaningfulness by mapping syntactic structures to objects in the task domain. The semantic analyzer disregards phrases such as “hot ice-cream”.
• Discourse Integration − the meaning of any sentence depends upon the meaning of the sentence just before it; it can also influence the meaning of the immediately succeeding sentence.
• Pragmatic Analysis − what was said is re-interpreted in terms of what was actually meant. It involves deriving those aspects of language which require real-world knowledge.
9
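To make the lexical-analysis step concrete, here is a minimal Python sketch that splits raw text into paragraphs, sentences, and words. The regular expressions and the sample sentence are illustrative assumptions, not part of the course material.

import re

def lexical_analysis(text):
    # Split raw text into paragraphs, sentences, and words (a rough sketch).
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        # Words are lower-cased alphabetic tokens.
        result.append([re.findall(r"[a-zA-Z']+", s.lower()) for s in sentences])
    return result

print(lexical_analysis("The boy goes to school. He likes it."))
# [[['the', 'boy', 'goes', 'to', 'school'], ['he', 'likes', 'it']]]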
Why is Language Hard?

● Ambiguity on many levels

● Sparse data — many words are rare

● No clear understanding of how humans process language

10
Words

11
Morphology

12
Parts of Speech

13
Syntax

14
Semantics

15
Discourse

16
Recent Advances
• Spoken dialogue devices (Siri, Google Now, Echo, ...)

• IBM Watson wins Jeopardy!

• Google machine translation

• Web-scale question answering

17
Language models 29

● Language models answer the question:

How likely is it that a string of English words is good English?

● Help with ordering

pLM(the house is small) > pLM(small the is house)

● Help with word choice

pLM(I am going home) > pLM(I am going house)

Philipp Koehn
Artificial Intelligence: Natural Language Processing 23 April 2020
N-Gram Language Models 30

● Given: a string of English words W = w1, w2, w3, ..., wn

● Question: what is p(W)?

● Sparse data: many good English sentences will not have been seen before

→ Decomposing p(W) using the chain rule:

p(w1, w2, w3, ..., wn) = p(w1) p(w2|w1) p(w3|w1, w2) ... p(wn|w1, w2, ..., wn−1)

(not much gained yet, p(wn|w1, w2, ..., wn−1) is equally sparse)
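As a concrete instance of the chain-rule decomposition, using the example sentence from the language-model slide above:

p(the, house, is, small) = p(the) p(house|the) p(is|the, house) p(small|the, house, is)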

Markov Chain 31

● Markov assumption:
– only previous history matters
– limited memory: only the last k words are included in the history (older words are less relevant)
→ kth-order Markov model

● For instance, a 2-gram (bigram) language model:

p(w1, w2, w3, ..., wn) ≈ p(w1) p(w2|w1) p(w3|w2) ... p(wn|wn−1)

● What is conditioned on, here wi−1, is called the history
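A minimal Python sketch of how a bigram model scores a sentence under this decomposition; the probability table is a made-up toy example, not estimated from any real corpus.

import math

# Toy bigram probabilities p(w2|w1); "<s>" marks the sentence start.
# All values are illustrative only.
bigram_prob = {
    ("<s>", "the"): 0.3, ("the", "house"): 0.1,
    ("house", "is"): 0.4, ("is", "small"): 0.2,
}

def score_bigram(words, probs, unseen=1e-6):
    # p(w1, ..., wn) ≈ p(w1) p(w2|w1) ... p(wn|wn-1), computed in log space.
    log_p = 0.0
    for prev, word in zip(["<s>"] + words, words):
        log_p += math.log(probs.get((prev, word), unseen))
    return math.exp(log_p)

print(score_bigram("the house is small".split(), bigram_prob))   # relatively high
print(score_bigram("small the is house".split(), bigram_prob))   # much lower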

Estimating N-Gram Probabilities 32

● Maximum likelihood estimation

p(w2|w1) = count(w1, w2) / count(w1)

● Collect counts over a large text corpus

● Millions to billions of words are easy to get (trillions of English words available on the web)
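A small sketch of maximum likelihood estimation for bigrams: count adjacent word pairs in a corpus and divide by the count of the history word. The tiny corpus is only an illustration.

from collections import Counter

corpus = "the house is small . the house is big .".split()  # toy corpus

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def mle_bigram(w1, w2):
    # p(w2|w1) = count(w1, w2) / count(w1)
    if unigram_counts[w1] == 0:
        return 0.0
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(mle_bigram("house", "is"))   # 2/2 = 1.0
print(mle_bigram("is", "small"))   # 1/2 = 0.5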

Example: 3-Gram 33

● Counts for trigrams and estimated word probabilities

the green (total: 1748)        the red (total: 225)         the blue (total: 54)
word    count   prob.          word    count   prob.        word    count   prob.
paper   801     0.458          cross   123     0.547        box     16      0.296
group   640     0.367          tape    31      0.138        .       6       0.111
light   110     0.063          army    9       0.040        flag    6       0.111
party   27      0.015          card    7       0.031        ,       3       0.056
ecu     21      0.012          ,       5       0.022        angel   3       0.056

– 225 trigrams in the Europarl corpus start with “the red”

– 123 of them end with “cross”

→ maximum likelihood probability is 123/225 = 0.547

How good is the LM? 34

● A good model assigns a text of real English W a high probability

● This can also be measured with cross-entropy:

H(W) = -(1/n) log2 p(w1, w2, ..., wn)

● Or, perplexity:

perplexity(W) = 2^H(W)

Example: 3-Gram 35

prediction                      pLM       -log2 pLM
pLM(i | </s> <s>)               0.109     3.197
pLM(would | <s> i)              0.144     2.791
pLM(like | i would)             0.489     1.031
pLM(to | would like)            0.905     0.144
pLM(commend | like to)          0.002     8.794
pLM(the | to commend)           0.472     1.084
pLM(rapporteur | commend the)   0.147     2.763
pLM(on | the rapporteur)        0.056     4.150
pLM(his | rapporteur on)        0.194     2.367
pLM(work | on his)              0.089     3.498
pLM(. | his work)               0.290     1.785
pLM(</s> | work .)              0.99999   0.000014
average                                   2.634

Comparison 1–4-Gram 36

word unigram bigram trigram 4-gram


i 6.684 3.197 3.197 3.197
would 8.342 2.884 2.791 2.791
like 9.129 2.026 1.031 1.290
to 5.081 0.402 0.144 0.113
commend 15.487 12.335 8.794 8.633
the 3.885 1.402 1.084 0.880
rapporteur 10.840 7.319 2.763 2.350
on 6.765 4.140 4.150 1.862
his 10.678 7.316 2.367 1.978
work 9.993 4.816 3.498 2.394
. 4.896 3.020 1.785 1.510
</s> 4.828 0.005 0.000 0.000
average 8.051 4.072 2.634 2.251
perplexity 265.136 16.817 6.206 4.758
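The perplexity row follows directly from the average row: perplexity = 2 raised to the average -log2 probability per word. A quick check of the trigram column:

# Per-word -log2 probabilities of the trigram model, taken from the table above.
neg_log2_probs = [3.197, 2.791, 1.031, 0.144, 8.794, 1.084,
                  2.763, 4.150, 2.367, 3.498, 1.785, 0.000]

cross_entropy = sum(neg_log2_probs) / len(neg_log2_probs)
perplexity = 2 ** cross_entropy

print(round(cross_entropy, 3), round(perplexity, 3))  # ≈ 2.634 and ≈ 6.2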

Core Challenge 37

● How to handle low counts and unknown n-grams?

● Smoothing
– adjust counts for seen n-grams
– use probability mass for unseen n-grams
– many discount schemes developed

● Backoff
– if 5-gram unseen → use 4-gram instead

● Neural network models promise to handle this better

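As one concrete example of a discount scheme, the sketch below applies add-one (Laplace) smoothing to bigram counts so that unseen bigrams receive a small non-zero probability. Real systems usually use more refined discounting (e.g. Kneser-Ney); the toy corpus and vocabulary are illustrative assumptions.

from collections import Counter

corpus = "the house is small . the house is big .".split()  # toy corpus
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
V = len(unigram_counts)  # vocabulary size of the toy corpus

def add_one_bigram(w1, w2):
    # Add-one smoothed p(w2|w1): unseen bigrams get a small non-zero probability.
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(add_one_bigram("house", "is"))    # seen bigram, count discounted: 3/8
print(add_one_bigram("house", "flag"))  # unseen bigram, still non-zero: 1/8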
38

parts of speech

Parts of Speech 39

● Open class words (or content words)


– nouns, verbs, adjectives, adverbs
– refer to objects, actions, and features in the world
– open class, new ones are added all the time (email, website).

● Closed class words (or function words)


– pronouns, determiners, prepositions, connectives, ...
– there is a limited number of these
– mostly functional: to tie the concepts of a sentence together

Parts of Speech 40

● There are about 30-100 parts of speech


– distinguish between names and abstract nouns?
– distinguish between plural nouns and singular nouns?
– distinguish between past tense verbs and present tense verbs?

● Identifying the parts of speech is a first step towards syntactic analysis

Ambiguous Words 41

● For instance: like


– verb: I like the class.
– preposition: He is like me.

● Another famous example: Time flies like an arrow

● Most of the time, the local context disambiguates the part of speech

Part-of-Speech Tagging 42

● Task: Given a text of English, identify the parts of speech of each word

● Example
– Input: word sequence: Time flies like an arrow
– Output: tag sequence: Time/NN flies/VB like/P an/DET arrow/NN

● What will help us to tag words with their parts of speech?

Relevant Knowledge for POS Tagging 43

● The word itself


– Some words may only be nouns, e.g. arrow
– Some words are ambiguous, e.g. like, flies
– Probabilities may help, if one tag is more likely than another

● Local context
– two determiners rarely follow each other
– two base form verbs rarely follow each other
– determiner is almost always followed by adjective or noun

Bayes Rule 44

● We want to find the best part-of-speech tag sequence T for a sentence S:

argmaxT p(T|S)

● Bayes rule gives us:

p(T|S) = p(S|T) p(T) / p(S)

● We can drop p(S) if we are only interested in argmaxT:

argmaxT p(T|S) = argmaxT p(S|T) p(T)

Decomposing the Model 45

● The mapping p(S|T) can be decomposed into

p(S|T) = ∏i p(wi|ti)

● p(T) could be called a part-of-speech language model, for which we can use an n-gram model (bigram):

p(T) = p(t1) p(t2|t1) p(t3|t2) ... p(tn|tn−1)

● We can estimate p(S|T) and p(T) with maximum likelihood estimation (and maybe some smoothing)

Hidden Markov Model (HMM) 46

● The model we just developed is a Hidden Markov Model

● Elements of an HMM model:
– a set of states (here: the tags)
– an output alphabet (here: words)
– initial state (here: beginning of sentence)
– state transition probabilities (here: p(tn|tn−1))
– symbol emission probabilities (here: p(wi|ti))

Graphical Representation 47

● When tagging a sentence, we are walking through the state graph:

[State transition graph over the tags VB, NN, IN, DET, plus special START and END states.]

● State transition probabilities: p(tn|tn−1)

Graphical Representation 48

● At each state we emit a word:

[Diagram: the VB state emitting words such as “flies” and “like”.]

● Symbol emission probabilities: p(wi|ti)

Search for the Best Tag Sequence 49

● We have defined a model, but how do we use it?

– given: word sequence
– wanted: tag sequence

● If we consider a specific tag sequence, it is straightforward to compute its probability (see the sketch after this slide):

p(S|T) p(T) = ∏i p(wi|ti) p(ti|ti−1)

● Problem: if we have on average c choices for each of the n words, there are c^n possible tag sequences, maybe too many to efficiently evaluate

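A minimal Python sketch of scoring one specific tag sequence under this model, as mentioned above; the transition and emission probabilities are made-up toy numbers, not estimated from a real tagged corpus.

import math

# Toy HMM parameters (illustrative values only).
transition = {("<s>", "NN"): 0.4, ("NN", "VB"): 0.3, ("VB", "IN"): 0.2}          # p(t_i | t_{i-1})
emission = {("NN", "time"): 0.01, ("VB", "flies"): 0.005, ("IN", "like"): 0.1}   # p(w_i | t_i)

def score_tag_sequence(words, tags, unseen=1e-8):
    # p(S|T) p(T) = prod_i p(w_i|t_i) p(t_i|t_{i-1}), computed in log space.
    log_p = 0.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        log_p += math.log(transition.get((prev, tag), unseen))
        log_p += math.log(emission.get((tag, word), unseen))
        prev = tag
    return math.exp(log_p)

print(score_tag_sequence(["time", "flies", "like"], ["NN", "VB", "IN"]))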
Walking Through the States 50

● First, we go to state NN to emit time:

[Diagram: from START, we move to state NN, which emits “time”; the alternative states VB, DET, IN are not taken.]

Walking Through the States 51

● Then, we go to state VB to emit flies:

[Diagram: after NN, we move to state VB, which emits “flies”; at each position the other states (NN, DET, IN) remain as alternatives.]

Walking Through the States 52

● Of course, there are many possible paths:

[Diagram: the full lattice of states (VB, NN, DET, IN) at each of the positions “time flies like an”, showing the many possible paths through the state graph.]

Viterbi Algorithm 53

● Intuition: since state transitions out of a state depend only on the current state (and not on previous states), we can record for each state the optimal path to it

● We record:
– cheapest cost to state j at step s in δj(s)
– backtrace from that state to its best predecessor in ψj(s)

● Stepping through all states at each time step allows us to compute

– δj(s + 1) = max1≤i≤N δi(s) p(tj|ti) p(ws+1|tj)
– ψj(s + 1) = argmax1≤i≤N δi(s) p(tj|ti) p(ws+1|tj)

● The best final state is argmax1≤i≤N δi(|S|); we can backtrack from there

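A compact Python sketch of the Viterbi recursion described above, using toy transition and emission tables in the same style as the earlier scoring example; all probability values are illustrative assumptions.

def viterbi(words, tags, transition, emission, unseen=1e-8):
    # delta[s][t]: best probability of any path that ends in tag t after word s
    # psi[s][t]:   best predecessor tag for that path (used for backtracking)
    delta = [{t: transition.get(("<s>", t), unseen) * emission.get((t, words[0]), unseen)
              for t in tags}]
    psi = [{}]
    for s in range(1, len(words)):
        delta.append({})
        psi.append({})
        for t in tags:
            best_prev, best_p = max(
                ((prev, delta[s - 1][prev] * transition.get((prev, t), unseen)
                  * emission.get((t, words[s]), unseen)) for prev in tags),
                key=lambda x: x[1])
            delta[s][t] = best_p
            psi[s][t] = best_prev
    # Backtrack from the best final state.
    last = max(delta[-1], key=delta[-1].get)
    path = [last]
    for s in range(len(words) - 1, 0, -1):
        path.append(psi[s][path[-1]])
    return list(reversed(path))

# Toy parameters (illustrative values only).
transition = {("<s>", "NN"): 0.4, ("NN", "VB"): 0.3, ("VB", "IN"): 0.2,
              ("IN", "DET"): 0.5, ("DET", "NN"): 0.6}
emission = {("NN", "time"): 0.01, ("VB", "flies"): 0.005, ("IN", "like"): 0.1,
            ("DET", "an"): 0.3, ("NN", "arrow"): 0.02}

print(viterbi("time flies like an arrow".split(),
              ["NN", "VB", "IN", "DET"], transition, emission))
# With these toy numbers: ['NN', 'VB', 'IN', 'DET', 'NN']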
Key points of the lecture

• Knowledge in speech and language processing

• Ambiguity

• Models

• Algorithms

43
Thank you

Please send your queries to:

e-Mail: [email protected]

44
