Lecture Three (Chapter Two: N-gram Language Models)

This document discusses n-gram language models. It begins with an introduction that defines language models and their uses in applications like speech recognition and machine translation. It then discusses the role of n-gram models, describing them as (n-1)-order Markov models that use the previous n-1 words to predict the next word. The document provides examples of estimating probabilities from text using n-gram counts and discusses issues like handling unknown words. Finally, it discusses parameter estimation and smoothing techniques for n-gram models.


Chapter 2: N-gram Language Models
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2022)
Outline

- Introduction
- The role of language models
- Simple N-gram models
- Estimating parameters and smoothing
- Evaluating language models

Introduction

- Language models assign a probability that a sentence is a legal string in a language.

- Language models are a useful component of many NLP systems, such as:
  - Automatic Speech Recognition (ASR),
  - Optical Character Recognition (OCR), and
  - Machine Translation (MT).

Introduction …

- Language model definition:
  - It is impossible to recover the intended word sequence W from the observed signal Y successfully in all cases, because of ambiguity.
  - Instead, minimize the probability of error by choosing an estimate of W out of a number of options.
  - Choose the estimate Ŵ for which the probability given the signal Y is greatest:

    Ŵ = argmax_i p(Wi | Y)

  - A language model is a computational mechanism for obtaining these conditional probabilities.

Introduction …

- Language models answer the question:
  - How likely is it that a string of English words is good English?

- They help with reordering:
  - PLM(the house is small) > PLM(small the is house)

- They help with word choice:
  - PLM(I am going home) > PLM(I am going house)

Introduction …

- What is a statistical language model?
  - A stochastic process model for word sequences: a mechanism for computing the probability
    p(w1, . . . , wT)

- Statistical language modeling
  - Goal: create a statistical model so that one can calculate the probability of a sequence of tokens s = w1, w2, …, wn in a language.
  - General approach: estimate the model from a training corpus; given a sequence s, it outputs the probability of the observed elements, P(s).
Role of Language Models

- Why are language models interesting?
  - They are an important component of a speech recognition system:
    - They help discriminate between similar-sounding words.
    - They help reduce search costs.
  - In statistical machine translation, a language model characterizes the target language and captures fluency.
  - They are used for selecting among alternatives in summarization and generation.
  - Text classification (style, reading level, language, topic, ...)

- Language models can be used for more than just words:
  - Letter sequences (language identification)
  - Speech act sequence modeling
  - Case and punctuation restoration

Role of Language Models…

- Uses of language models:
  - Speech recognition
    - "I ate a cherry" is a more likely sentence than "Eye eight uh Jerry".
  - OCR and handwriting recognition
    - More probable sentences are more likely to be correct readings.
  - Machine translation
    - More likely sentences are probably better translations.
  - Generation
    - More likely sentences are probably better NL generations.
  - Context-sensitive spelling correction
    - "Their are problems wit this sentence."

Role of Language Models…

- Completion prediction:
  - A language model also supports predicting the completion of a sentence.
    - Please turn off your cell _____
    - Your program does not ______
  - Predictive text input systems can guess what you are typing and give choices on how to complete it (see the sketch below).
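A minimal Python sketch of this idea: rank candidate completions with a tiny bigram table. The counts and the pick_completion helper are purely illustrative assumptions, not material from the lecture.

# Completion-prediction sketch with made-up bigram counts.
from collections import Counter

bigram_counts = {
    "cell": Counter({"phone": 120, "wall": 3, "door": 1}),
    "not": Counter({"work": 80, "compile": 40, "run": 35}),
}

def pick_completion(prev_word, k=3):
    """Return the k most likely completions after prev_word under the bigram table."""
    candidates = bigram_counts.get(prev_word, Counter())
    total = sum(candidates.values())
    return [(w, c / total) for w, c in candidates.most_common(k)] if total else []

print(pick_completion("cell"))   # [('phone', 0.967...), ('wall', ...), ('door', ...)]
print(pick_completion("not"))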

Simple N-Gram Models

- An n-gram model is a type of probabilistic language model for predicting the next item in a sequence, in the form of an (n-1)-order Markov model.

- Two benefits of n-gram models (and algorithms that use them) are simplicity and scalability: with larger n, a model can store more context with a well-understood space-time tradeoff, enabling small experiments to scale up efficiently.

- Simple n-gram models are easy to train in an unsupervised way on raw corpora and can provide useful estimates of sentence likelihood.

Simple N-Gram Models…

- Estimate the probability of each word given the prior context.
  - P(phone | Please turn off your cell)

- The number of parameters required grows exponentially with the number of words of prior context.

- An N-gram model uses only the previous N-1 words of context (see the sketch after this list).
  - Unigram: P(phone)
  - Bigram: P(phone | cell)
  - Trigram: P(phone | your cell)
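To make the unigram/bigram/trigram distinction concrete, a small sketch (the ngram_context helper is a hypothetical name, not the lecture's code) that truncates the history to the last N-1 words before a conditional probability lookup:

# An N-gram model conditions only on the last N-1 words of the history.
def ngram_context(history, n):
    """Return the part of the history an n-gram model actually uses."""
    words = history.split()
    return tuple(words[-(n - 1):]) if n > 1 else ()

history = "please turn off your cell"
print(ngram_context(history, 1))  # ()                 -> P(phone)
print(ngram_context(history, 2))  # ('cell',)          -> P(phone | cell)
print(ngram_context(history, 3))  # ('your', 'cell')   -> P(phone | your cell)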

Simple N-Gram Models…

- The Markov assumption is the presumption that the future behavior of a dynamical system depends only on its recent history.
  - In particular, in a kth-order Markov model, the next state depends only on the k most recent states; therefore an N-gram model is an (N-1)-order Markov model.
  - Use the previous N-1 words in a sequence to predict the next word.

- Language Model (LM):
  - unigrams, bigrams, trigrams, 4-grams, 5-grams, ...
  - How do we train these models?
    - Using very large corpora
Simple N-Gram Models…

- N-Gram Model Formulas:
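The equations on this slide did not survive conversion; the following is a reconstruction, under the assumption that the slide showed the standard chain rule, the N-gram (Markov) approximation, and the relative-frequency estimate used later in the lecture:

P(w_1^T) = \prod_{t=1}^{T} P(w_t \mid w_1^{t-1})                      (chain rule)

P(w_t \mid w_1^{t-1}) \approx P(w_t \mid w_{t-N+1}^{t-1})             (N-gram / Markov approximation)

P(w_t \mid w_{t-N+1}^{t-1}) = \frac{C(w_{t-N+1}^{t-1}\, w_t)}{C(w_{t-N+1}^{t-1})}   (relative-frequency estimate)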

Estimating Probabilities

- N-gram conditional probabilities can be estimated from raw text based on the relative frequency of word sequences.

- To have a consistent probabilistic model, append a unique start symbol (<s>) and end symbol (</s>) to every sentence and treat these as additional words (see the sketch below).
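A minimal Python sketch of relative-frequency bigram estimation with <s> and </s> padding; the three-sentence toy corpus and the helper names are illustrative assumptions, not the lecture's data:

# Estimate bigram probabilities by relative frequency: P(w | prev) = C(prev w) / C(prev).
from collections import Counter

corpus = ["i want english food", "i want chinese food", "tell me about chez panisse"]

unigram_counts, bigram_counts = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    unigram_counts.update(words[:-1])                  # counts of bigram contexts
    bigram_counts.update(zip(words[:-1], words[1:]))   # counts of (prev, w) pairs

def p_bigram(w, prev):
    """Relative-frequency estimate of P(w | prev)."""
    return bigram_counts[(prev, w)] / unigram_counts[prev] if unigram_counts[prev] else 0.0

print(p_bigram("want", "i"))        # 2/2 = 1.0 in this toy corpus
print(p_bigram("english", "want"))  # 1/2 = 0.5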

Example

- Here are some text-normalized sample user queries (a sample of 9332 sentences is on the website):
  - Berkeley Restaurant Project sentences:
    - can you tell me about any good cantonese restaurants close by
    - mid priced thai food is what i'm looking for
    - tell me about chez panisse
    - can you give me a listing of the kinds of food that are available
    - i'm looking for a good place to eat breakfast
    - when is caffe venezia open during the day

Example

- (Table of bigram counts and bigram probability estimates from the Berkeley Restaurant Project corpus, used in the calculations on the next slide.)
Example

- Bigram estimates of sentence probabilities:

  - P(<s> i want English food </s>)
    = P(i | <s>) P(want | i) P(English | want) P(food | English) P(</s> | food)
    = 0.25 x 0.33 x 0.0011 x 0.5 x 0.68 = 0.000031

  - P(<s> i want Chinese food </s>)
    = P(i | <s>) P(want | i) P(Chinese | want) P(food | Chinese) P(</s> | food)
    = 0.25 x 0.33 x 0.0065 x 0.52 x 0.68 = 0.00019
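The two products can be checked directly; a quick sketch, taking the bigram estimates quoted on the slide as given:

# Reproduce the two bigram sentence probabilities from the slide.
from math import prod

p_english = prod([0.25, 0.33, 0.0011, 0.5, 0.68])   # ~0.000031
p_chinese = prod([0.25, 0.33, 0.0065, 0.52, 0.68])  # ~0.00019
print(f"{p_english:.6f}  {p_chinese:.5f}")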
Example

- What kinds of knowledge do these probabilities encode?
  - P(English | want) = 0.0011
  - P(Chinese | want) = 0.0065 (more about the world)
  - P(to | want) = 0.66 (more about the grammar)
  - P(eat | to) = 0.28
  - P(food | to) = 0.0 (contingent zero)
  - P(want | spend) = 0.0 (more about the grammar)
  - P(i | <s>) = 0.25

Example

- Practical issues: we do everything in log space,
  - to avoid underflow (arithmetic underflow), and
  - to make computation easier (adding is faster than multiplying).

  log(P1 x P2 x P3 x P4) = log P1 + log P2 + log P3 + log P4
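A small Python sketch showing that summing log probabilities matches multiplying the probabilities while avoiding underflow for long products; the probability values are the ones from the earlier example:

# Log-space computation of a sentence probability.
import math

probs = [0.25, 0.33, 0.0011, 0.5, 0.68]           # bigram probabilities from the example
log_prob = sum(math.log(p) for p in probs)         # add logs instead of multiplying probs
print(log_prob)                                    # ~ -10.4
print(math.exp(log_prob))                          # ~ 0.000031, same as the direct product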

Simple N-Gram Models …

- Train and test corpora:
  - A language model must be trained on a large corpus of text to estimate good parameter values.
  - Ideally, the training (and test) corpus should be representative of the actual application data.
  - We may need to adapt a general model to a small amount of new (in-domain) data by adding a highly weighted small corpus to the original training data.

Simple N-Gram Models …

- Train and test corpora (continued):
  - Unknown words:
    - How do we handle words in the test corpus that did not occur in the training data, i.e. out-of-vocabulary (OOV) words?
    - Train a model that includes an explicit symbol for an unknown word (<UNK>), either:
      - choose a vocabulary in advance and replace all other words in the training corpus with <UNK>, or
      - replace the first occurrence of each word in the training data with <UNK> (see the sketch below).
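A minimal Python sketch of the second strategy; the helper name mark_first_occurrences is hypothetical, not from the lecture:

# Replace the first occurrence of every word type with <UNK> so the model
# learns a probability for unknown words.
def mark_first_occurrences(tokens):
    seen = set()
    out = []
    for w in tokens:
        if w in seen:
            out.append(w)
        else:
            seen.add(w)
            out.append("<UNK>")
    return out

tokens = "i want thai food i want chinese food".split()
print(mark_first_occurrences(tokens))
# ['<UNK>', '<UNK>', '<UNK>', '<UNK>', 'i', 'want', '<UNK>', 'food']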

Estimating Parameters and Smoothing

- Estimating parameters:
  - Parameter estimation is fundamental to many statistical approaches to NLP.
  - Because of the high-dimensional nature of natural language, it is often easy to generate an extremely large number of features.
  - The challenge of parameter estimation is to find a combination of the typically noisy, redundant features that accurately predicts the target output variable and avoids overfitting.
  - List of potential parameter estimators:
    - Maximum Entropy (ME) estimation with L2 regularization, the Averaged Perceptron (AP), Boosting, ME estimation with L1 regularization using a novel optimization algorithm, and BLasso, which is a version of Boosting with Lasso (L1) regularization, etc.
Estimating Parameters and Smoothing…

- Estimating parameters (continued):
  - Intuitively, this can be achieved either:
    - by selecting a small number of highly effective features and ignoring the others, or
    - by averaging over a large number of weakly informative features.
  - The first intuition motivates feature selection methods such as Boosting and BLasso, which usually work best when many features are completely irrelevant.
  - L1 (Lasso) regularization of linear models embeds feature selection into regularization, so that both the assessment of a feature's reliability and the decision about whether to remove it are done in the same framework; it has recently generated a large amount of interest in the NLP community.
Estimating Parameters and Smoothing…

- Estimating parameters (continued):
  - If, on the other hand, most features are noisy but at least weakly correlated with the target, it may be reasonable to attempt to reduce noise by averaging over all of the features.
  - ME estimators with L2 regularization, which have been widely used in NLP tasks, tend to produce models that have this property.
  - In addition, the perceptron algorithm and its variants, e.g., the voted or averaged perceptron, are becoming increasingly popular due to their competitive performance, simplicity of implementation, and low computational cost in training.
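To make the L1-versus-L2 contrast concrete, a minimal sketch using scikit-learn's LogisticRegression as an ME-style estimator; the toolkit choice and the synthetic data are assumptions, since the lecture does not name an implementation:

# Contrast L1- and L2-regularized logistic regression on noisy features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                     # 200 examples, 50 mostly noisy features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

l2_model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)                       # averages over many weak features
l1_model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)   # drives irrelevant weights to zero

print("non-zero weights (L2):", np.count_nonzero(l2_model.coef_))
print("non-zero weights (L1):", np.count_nonzero(l1_model.coef_))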
Evaluating Language Models

- Ideally, evaluate the use of the model in an end application (extrinsic evaluation):
  - Realistic approach
  - Expensive (time-consuming)

- Evaluate the ability of the model on a test corpus using metrics (intrinsic evaluation: independent of any application):
  - Less realistic
  - Cheaper

- Verify at least once that the intrinsic evaluation correlates with an extrinsic one.
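The slide does not name a specific intrinsic metric; perplexity is the standard one for language models, so the sketch below (an assumption, with made-up per-word probabilities) shows how it would be computed on a held-out test corpus:

# Perplexity from the probabilities a trained model assigns to each test word.
import math

test_word_probs = [0.25, 0.33, 0.0065, 0.52, 0.68]   # hypothetical P(w_i | context) values
n = len(test_word_probs)
perplexity = math.exp(-sum(math.log(p) for p in test_word_probs) / n)
print(perplexity)   # lower is better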
Summary of Language Models

- Limitations of the n-gram LMs so far:
  - P(word | full history) is too expensive.
  - P(word | previous few words) is feasible.
  - This approach gives us only the local context; it lacks the global context.

- Other approaches:
  - Neural networks
  - Recurrent Neural Network (RNN – most recent words)
  - Long Short-Term Memory (LSTM – limited to a few hundred words due to the inherently sequential path from the previous unit to the current unit)
  - Transformer (new model, introduced in a 2017 Google paper)
Question & Answer

Thank You !!!
