NLP Lec 11

Sequence Models are machine learning models designed to handle sequences of data, particularly in natural language processing, speech recognition, and time-series forecasting. They can process inputs and outputs of varying lengths and are categorized into Statistical Sequence Models and Neural Network Based Sequence Models, with N-Gram Models and Hidden Markov Models being common types of the former. N-Gram Models predict the next item in a sequence based on previous items, utilizing the Markov assumption to simplify probability calculations.


7.2 Types of Sequence Models


Sequence Models are a subset of machine learning models that work with sequences of data. These models leverage the
temporal or sequential order of data to perform tasks such as natural language processing, speech recognition, time-series
forecasting, and more.
A key feature of sequence models is their ability to process inputs and outputs of varying lengths. They
are not restricted to the fixed one-to-one mapping seen in traditional ML models.
Sequence models in NLP are generally categorized into:
1. Statistical Sequence Models
2. Neural Network Based Sequence Models

7.2.1 Statistical Sequence Models

These are based on probabilistic methods and operate with hand-crafted features or word-based statistics. They were
dominant before deep learning became mainstream.
Common Types of Statistical Sequence Models:
 N-Gram Models
 Hidden Markov Models (HMM)

7.2.1.1 N-Gram Models

An N-Gram model is a probabilistic language model used to predict the next item (word, character, etc.) in
a sequence, based on the previous N−1 items. It assumes the Markov property, which means the probability
of a word depends only on the previous N−1 words.

Types of N-Grams

 Unigram (N = 1): Considers each word independently, with no context
 Bigram (N = 2): Considers one word of context
 Trigram (N = 3): Considers two words of context
 4-gram and above: More context but leads to data sparsity
Unigram Model:
A unigram model assumes that each word in a sequence is independent of the previous words. So, the model predicts the
next word based solely on its individual frequency in a training corpus.
The probability of a word w is calculated as:

P(w) = Count(w) / Total number of words in the corpus

The model predicts the most probable next word, regardless of context.
Example:
Training Corpus:
"I love NLP. I love AI. AI loves me."
Tokenized Words:
[I, love, NLP, I, love, AI, AI, loves, me]
Word Counts:
I→2
love → 2
NLP → 1
AI → 2
loves → 1
me → 1
Total Words: 9
Unigram Probabilities:
P(I) = 2/9 ≈ 0.222
P(love) = 2/9 ≈ 0.222
P(NLP) = 1/9 ≈ 0.111
P(AI) = 2/9 ≈ 0.222
P(loves) = 1/9 ≈ 0.111
P(me) = 1/9 ≈ 0.111

Predicting the Next Word Using Unigram Model


Let’s say a user typed:
“I love”
In a unigram model, context is ignored. So, the model will predict the most frequent word from the list of probabilities.
Most likely next word = “I”, “love”, or “AI” (all tied at 0.222)
The model might randomly choose among the top-scoring words, or choose the one with highest prior probability.
 The unigram model is very simple and fast.
 It does not consider context, so its predictions are often inaccurate or unnatural.
 Still useful for baseline comparisons and some low-resource scenarios.
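
A minimal sketch of this unigram counting and prediction in Python (illustrative only; the token list is hard-coded, and any real tokenization is assumed rather than shown):

from collections import Counter

tokens = ["I", "love", "NLP", "I", "love", "AI", "AI", "loves", "me"]

counts = Counter(tokens)
total = len(tokens)

# Unigram probability: P(w) = Count(w) / total number of tokens
probs = {w: c / total for w, c in counts.items()}

# Context is ignored: the predicted "next word" is simply the most probable word overall
prediction = max(probs, key=probs.get)

print(probs)       # P(I) = P(love) = P(AI) = 2/9 ≈ 0.222; the rest are 1/9 ≈ 0.111
print(prediction)  # one of the tied top words (here "I", the first encountered)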

The Markov assumption


The intuition of the n-gram model is that instead of computing the probability of a word given its entire history, we can
approximate the history by just the last few words. The bigram model, for example, approximates the probability of a
word given all the previous words, P(wn | w1:n-1), by using only the conditional probability given the preceding word, P(wn |
wn-1). In other words, instead of computing the probability
P(blue | The water of Walden Pond is so beautifully)
we approximate it with the probability
P(blue | beautifully)
When we use a bigram model to predict the conditional probability of the next word, we are thus making the following
approximation:
P(wn | w1:n-1) ≈ P(wn | wn-1)
The assumption that the probability of a word depends only on the previous word is called a Markov assumption.
Markov models are the class of probabilistic models that assume we can predict the probability of some future unit
without looking too far into the past. We can generalize the bigram (which looks one word into the past) to the trigram
(which looks two words into the past) and thus to the n-gram (which looks n-1 words into the past).

How to estimate probabilities


An intuitive way to estimate probabilities is called maximum likelihood estimation, or MLE. We get the MLE estimate
for the parameters of an n-gram model by getting counts from a corpus and normalizing the
counts so that they lie between 0 and 1. For probabilistic models, normalizing means dividing by some total count so that
the resulting probabilities fall between 0 and 1 and sum to 1. For example, to compute a particular bigram probability of a
word wn given a previous word wn-1, we’ll compute the count of the bigram C(wn-1 wn ) and normalize by the sum of all
the bigrams that share the same first word wn-1:

P(wn | wn-1) = C(wn-1 wn) / Σ_w C(wn-1 w)

We can simplify this equation, since the sum of all bigram counts that start with a given word wn-1 must be equal to the
unigram count for that word wn-1:

P(wn | wn-1) = C(wn-1 wn) / C(wn-1)

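As a quick, illustrative check of this simplification (a sketch assuming the toy corpus used in the example below, not code from the lecture), the bigram counts sharing a given first word can be summed and compared with that word's unigram count:

from collections import Counter

sentences = [["<s>", "I", "love", "NLP", "</s>"],
             ["<s>", "I", "love", "AI", "</s>"],
             ["<s>", "AI", "loves", "me", "</s>"]]

bigrams, unigrams = Counter(), Counter()
for sent in sentences:
    bigrams.update(zip(sent, sent[1:]))   # count adjacent word pairs
    unigrams.update(sent)                 # count single words

prev = "love"
row_sum = sum(c for (w1, _), c in bigrams.items() if w1 == prev)
print(row_sum, unigrams[prev])  # 2 2: bigrams starting with "love" sum to C(love)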
Let’s work through an example using a mini-corpus of three sentences.


"I love NLP. I love AI. AI loves me."

We'll represent each sentence like this:

1. <s> I love NLP </s>


2. <s> I love AI </s>
3. <s> AI loves me </s>

Flattened Token Sequence:

[<s>, I, love, NLP, </s>, <s>, I, love, AI, </s>, <s>, AI, loves, me, </s>]

Step 1: Count Bigrams

Bigram : Count
<s> → I : 2
<s> → AI : 1
I → love : 2
love → NLP : 1
love → AI : 1
NLP → </s> : 1
AI → </s> : 1
AI → loves : 1
loves → me : 1
me → </s> : 1

Step 2: Count Unigrams (for denominator)

Word : Count
<s> : 3
I : 2
love : 2
NLP : 1
AI : 2
loves : 1
me : 1
</s> : 3
Step 3: Bigram Probabilities

Let’s calculate a few sample bigram probabilities:

Example 1: Predict word after “love”
P(NLP | love) = C(love NLP) / C(love) = 1/2 = 0.5
P(AI | love) = C(love AI) / C(love) = 1/2 = 0.5

Example 2: Probability of ending a sentence after “me”
P(</s> | me) = C(me </s>) / C(me) = 1/1 = 1.0

Example 3: Probability that a sentence starts with “AI”
P(AI | <s>) = C(<s> AI) / C(<s>) = 1/3 ≈ 0.33

Suppose your current word is: “love”


You want to predict the next word.

From the above:

 P(NLP | love) = 0.5


 P(AI | love) = 0.5

Prediction: Either “NLP” or “AI” with equal probability (50%)
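
The bigram counts and MLE probabilities above can be sketched in Python as follows (an illustrative sketch, not the lecture's own code; the helper names bigram_prob and predict_next are assumptions):

from collections import Counter

sentences = [["<s>", "I", "love", "NLP", "</s>"],
             ["<s>", "I", "love", "AI", "</s>"],
             ["<s>", "AI", "loves", "me", "</s>"]]

bigram_counts = Counter()
unigram_counts = Counter()
for sent in sentences:
    bigram_counts.update(zip(sent, sent[1:]))
    unigram_counts.update(sent)

def bigram_prob(prev, word):
    # MLE estimate: P(word | prev) = C(prev word) / C(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def predict_next(prev):
    # All words seen after `prev`, with their conditional probabilities
    candidates = {w2: c / unigram_counts[prev]
                  for (w1, w2), c in bigram_counts.items() if w1 == prev}
    return max(candidates, key=candidates.get), candidates

print(bigram_prob("love", "NLP"))  # 1/2 = 0.5
print(bigram_prob("me", "</s>"))   # 1/1 = 1.0
print(predict_next("love"))        # "NLP" and "AI" tie at 0.5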

Trigram Model Basics


A trigram model assumes that the probability of a word depends only on the two previous words:

P(wn | wn-2 wn-1) = C(wn-2 wn-1 wn) / C(wn-2 wn-1)

To estimate this, we need to count how often each triple of consecutive words occurs in the corpus.

Sentences:

1. "I love NLP."


2. "I love AI."
3. "AI loves me."

We'll add <s> <s> at the beginning of each sentence (to account for trigram context), and </s> at the end:

Tokenized Sentences with Markers:

 <s> <s> I love NLP </s>


 <s> <s> I love AI </s>
 <s> <s> AI loves me </s>

Step 1: Trigram Counts

Trigram : Count
<s> <s> I : 2
<s> <s> AI : 1
<s> I love : 2
I love NLP : 1
I love AI : 1
love NLP </s> : 1
love AI </s> : 1
<s> AI loves : 1
AI loves me : 1
loves me </s> : 1

Step 2: Bigram Counts (for denominator)

Bigram : Count
<s> <s> : 3
<s> I : 2
<s> AI : 1
I love : 2
love NLP : 1
love AI : 1
AI loves : 1
loves me : 1

Step 3: Predict Next Word Using Trigram

Suppose the previous two words are: "I love"


To predict the next word, we calculate:

P(w | I love) = C(I love w) / C(I love)

From the trigram table:

 Count(I love NLP) = 1
 Count(I love AI) = 1
 Total: Count(I love) = 2

Probabilities:
P(NLP | I love) = 1/2 = 0.5
P(AI | I love) = 1/2 = 0.5
So, if your context is “I love”, the trigram model says the next word is “NLP” or “AI” with equal probability.

Another Example:

Previous words: "AI loves"

Count(AI loves me) = 1

Count(AI loves) = 1

P(me | AI loves) = 1/1 = 1.0

The model is certain the next word after “AI loves” is “me”.
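
A matching trigram sketch, assuming the double <s> padding used above (the helper name trigram_prob is an illustrative assumption):

from collections import Counter

sentences = [["<s>", "<s>", "I", "love", "NLP", "</s>"],
             ["<s>", "<s>", "I", "love", "AI", "</s>"],
             ["<s>", "<s>", "AI", "loves", "me", "</s>"]]

trigram_counts = Counter()
bigram_counts = Counter()
for sent in sentences:
    trigram_counts.update(zip(sent, sent[1:], sent[2:]))  # word triples
    bigram_counts.update(zip(sent, sent[1:]))             # word pairs (denominators)

def trigram_prob(w1, w2, w3):
    # MLE estimate: P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2)
    return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]

print(trigram_prob("I", "love", "NLP"))   # 1/2 = 0.5
print(trigram_prob("AI", "loves", "me"))  # 1/1 = 1.0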
