
Language model: N-gram

Lecture-6, 7, 8 (n-gram model)

Types of Language Models in NLP

Language modeling is a fundamental concept in Natural Language Processing (NLP). It involves building statistical or machine learning models that can predict the probability of a sequence of words. In essence, these models learn the patterns and structures of language, enabling machines to understand and generate human-like text.

Here's a breakdown of the key types:

1. Statistical Language Models:
These models use statistical techniques to determine the probability of word
sequences. They rely on counting word occurrences in large text corpora.
○ N-gram Models:
Traditional and simpler types of language models. They calculate the
probability of a word based on the preceding n-1 words.
Examples:
■​ Unigrams: Consider each word independently.
■​ Bigrams: Consider the previous word.
■​ Trigrams: Consider the previous two words.​
Advantages: Easy to implement, efficient for small datasets.​
Limitations: Fail to capture long-term dependencies (cannot
model meaning beyond a few words).
○​ Probabilistic & Bayesian Language Models:​
These models use probability theory to generate sequences.
■​ Markov Models: Predict sequences based on visible state
transitions (e.g., word-to-word). Simpler than HMMs, used in
early text generation.
■ Hidden Markov Model (HMM): Models a sequence of hidden states in which
each state depends only on the previous state (Markov assumption), and each
word is emitted from the current hidden state. Used in speech recognition and
part-of-speech tagging. Simple and interpretable, but not effective for
long-range dependencies.
2.​ Neural Language Models:​
These models utilize neural networks to learn representations of words and
their relationships, capturing complex patterns and semantic information.
Examples include Recurrent Neural Networks (RNNs), with subtypes like
LSTM (Long Short-Term Memory) for better memory, and Transformer
networks.
3.​ Large Language Models (LLMs):​
These are a subset of neural language models, usually treated separately
because of their massive scale. Characterized by a large number of parameters and
trained on enormous datasets, they are based on the Transformer architecture. They
excel in NLP tasks like text generation, translation, and question-answering.
Examples include GPT models and Google’s PaLM models.
○​ Applications: HMMs powered traditional POS tagging; LLMs now
handle it with deeper context.

Applications of language models-

Language models are used in a wide range of NLP applications, including:

a. Machine translation
b.​ Speech recognition
c.​ Text generation
d.​ Question answering
e.​ Text summarization

Spell correction-

Language models help in spell correction by predicting the most likely correct
word based on context and probability.

●​ A user types a word with a spelling mistake (e.g., "teh" instead of "the")
●​ The system needs to correct it based on context and word probability.

How Does a Language Model Help?

1. Uses a probability-based model to determine the most likely correct word.
2. Considers the surrounding words (context) to predict the best correction.
3. Ranks possible corrections based on how often they appear in real-world text.

Let's say a user types:

"I am goign to the market"

Possible corrections for "goign" are:

●​ "going"
●​ "gogin"
●​ "gonig"

A bigram language model estimates the probability of each candidate given the
previous word ("am"):

● P(going | am) = 0.9
● P(gogin | am) = 0.02
● P(gonig | am) = 0.08

The model chooses "going" because it has the highest probability in natural text.
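A minimal Python sketch of this ranking idea, using a hand-filled bigram table with the illustrative probabilities above (in a real system these would be estimated from a large corpus):

```python
# Rank candidate corrections by P(candidate | previous word).
# The probabilities below are the illustrative values from the example,
# not estimates learned from real data.
bigram_prob = {
    ("am", "going"): 0.9,
    ("am", "gogin"): 0.02,
    ("am", "gonig"): 0.08,
}

def best_correction(prev_word, candidates):
    # Unseen (prev_word, candidate) pairs get probability 0.
    return max(candidates, key=lambda w: bigram_prob.get((prev_word, w), 0.0))

print(best_correction("am", ["going", "gogin", "gonig"]))  # -> going
```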

N-Gram Models

A language model (LM) is a statistical model that predicts the next word in a
sequence given the previous words. It is essential in applications like speech
recognition, text generation, machine translation, and spell correction.

An N-Gram model is a type of probabilistic language model that predicts the next word based on the previous (N-1) words in a sequence.

● Unigram (1-Gram) → Predicts a word independently (no context).
● Bigram (2-Gram) → Predicts a word based on 1 previous word.
● Trigram (3-Gram) → Predicts a word based on 2 previous words.
● N-Gram (N ≥ 4) → Predicts a word based on N-1 previous words.
Unigram: “The”, “dog”, “runs” → No dependency
Bigram: P(dog | The)
Trigram: P(runs | The dog)
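As a small illustration (not part of the original notes), n-grams of any order can be extracted from a tokenized sentence with a few lines of Python:

```python
# Return the list of n-grams (as tuples) contained in a token sequence.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["The", "dog", "runs"]
print(ngrams(tokens, 1))  # unigrams: [('The',), ('dog',), ('runs',)]
print(ngrams(tokens, 2))  # bigrams:  [('The', 'dog'), ('dog', 'runs')]
print(ngrams(tokens, 3))  # trigrams: [('The', 'dog', 'runs')]
```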
Question
Given the training corpus:
<s> John drinks tea </s>
<s> She prefers tea with sugar </s>

Compute the bigram probability of the sentence <s> John drinks tea </s>.
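A worked sketch of one possible answer, assuming simple MLE (count-based) estimates and treating <s> and </s> as ordinary tokens: P(John | <s>) = 1/2, P(drinks | John) = 1/1, P(tea | drinks) = 1/1, P(</s> | tea) = 1/2, so P(<s> John drinks tea </s>) = 1/2 × 1 × 1 × 1/2 = 0.25. The same computation in Python:

```python
from collections import Counter

corpus = [
    "<s> John drinks tea </s>",
    "<s> She prefers tea with sugar </s>",
]

# Count unigrams and bigrams over the training corpus.
unigrams, bigrams = Counter(), Counter()
for line in corpus:
    tokens = line.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(w_prev, w):
    # MLE estimate: count(w_prev, w) / count(w_prev)
    return bigrams[(w_prev, w)] / unigrams[w_prev]

sentence = "<s> John drinks tea </s>".split()
prob = 1.0
for w_prev, w in zip(sentence, sentence[1:]):
    prob *= bigram_prob(w_prev, w)

print(prob)  # 0.25 = (1/2) * 1 * 1 * (1/2)
```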
Why use Log Prob?

We use log probabilities in n-gram calculations primarily to prevent numerical underflow and to simplify probability computations.

1. Avoiding Numerical Underflow:
○​ N-gram models compute the probability of a sentence by multiplying
the probabilities of individual words.
○​ Since all probabilities lie between 0 and 1, multiplying many small
probabilities results in extremely tiny numbers, which can cause
numerical underflow (i.e., values becoming too small for the
computer to represent accurately).
○​ Using logarithms transforms the product into a sum, which avoids
this issue.
2.​ Mathematical Simplification:
○ A logarithmic identity states: log(a × b × c) = log a + log b + log c
○​ Instead of multiplying many small numbers, we can add their log
values, making computations simpler and more stable.
3.​ Efficient Computation in Machine Learning & NLP:
○​ Log probabilities allow for more efficient storage and processing in
large corpora.
○​ Many machine learning models work better with log values rather
than raw probabilities.
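A tiny illustration (not from the notes) of the underflow problem and the log-space fix:

```python
import math

# 500 word probabilities, each fairly small.
probs = [0.01] * 500

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- the true value 1e-1000 underflows in 64-bit floats

log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # about -2302.6, easily representable
```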

The log probability in an n-gram model represents the likelihood of a sequence of words occurring in a corpus, but expressed in the logarithmic domain instead of the standard probability domain.

What Does Log Probability Represent?


Interpreting Log Probability:

● Higher (closer to 0) log probability → more likely sequence.
● Lower (more negative) log probability → less likely sequence.

Example Interpretation

If we have two sentences:

1.​ Sentence A: "The dog chased the cat."​

○​ Log probability: -5.2​

2.​ Sentence B: "Cat the chased dog the."​

○​ Log probability: -12.8

The log probability of Sentence A is higher (less negative), meaning it is more likely based on the n-gram model.
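To connect these illustrative numbers back to ordinary probabilities (assuming natural logarithms), a short check:

```python
import math

log_prob_a = -5.2   # "The dog chased the cat."
log_prob_b = -12.8  # "Cat the chased dog the."

# Convert back to probabilities (assuming natural logs).
print(math.exp(log_prob_a))  # ~0.0055
print(math.exp(log_prob_b))  # ~0.0000028

# Sentence A is exp(-5.2 - (-12.8)) ~= 2000 times more probable under the model.
print(math.exp(log_prob_a - log_prob_b))  # ~1998
```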
Perplexity?

Perplexity is a measure of how well a language model predicts a sequence of words. Think of it as the model's "confusion level": lower perplexity means the model is less confused and better at guessing the next word. It is widely used to evaluate N-gram models (and others) by quantifying their predictive power on test data.

● Intuition: Imagine you're guessing the next word in "The cat ___." If your model strongly predicts "runs" (high probability), it is less perplexed. If it is unsure, with probability spread thinly across many words, perplexity shoots up.
● Goal: Lower perplexity = better model. It's like a score: aim for the lowest you can get!

Why Perplexity in N-gram Models?

N-gram models predict the probability of a word based on the previous N-1 words (e.g., bigrams
use 1 prior word, trigrams use 2). Perplexity tests how well these probabilities hold up on
unseen text:

● Unigram: Guesses each word independently—high perplexity (lots of uncertainty).
●​ Bigram: Uses one prior word—better, but still limited.
●​ Trigram: Uses two prior words—lower perplexity if trained well.
What Does Perplexity Mean?

● Perplexity = 2.75: The model's "effective vocabulary" is ~2.75 words, as if it were choosing between ~3 options per word on average. Lower is better; it means higher confidence.
● High Perplexity (e.g., 100): The model is effectively guessing from ~100 possibilities—terrible predictions.
●​ Perfect Model: Perplexity = 1 (probability = 1 for every word—impossible
in practice).
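For reference, the standard definition (not written out in the notes) is PP(W) = P(w1, ..., wN)^(-1/N), the inverse probability of the test sequence normalized by its length. A minimal Python sketch, assuming we already have the per-word conditional probabilities a model assigned to a short test sentence (hypothetical numbers):

```python
import math

# Hypothetical conditional probabilities P(w_i | context) for a 4-word test sentence.
word_probs = [0.5, 0.25, 0.4, 0.3]

# Perplexity = exp(-(1/N) * sum of log probabilities).
N = len(word_probs)
perplexity = math.exp(-sum(math.log(p) for p in word_probs) / N)
print(perplexity)  # about 2.86: roughly 3 effective choices per word
```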

Why Use Perplexity?

● Evaluation: Compare models—bigram (e.g., 2.75) vs. trigram (maybe 2.5)—lower wins.
●​ N-gram Limits: High perplexity on long sentences shows N-grams miss
distant context (e.g., “The cat... [20 words]... runs”).
Need for Smoothing

Smoothing Techniques-

Smoothing techniques address the issue of zero probabilities in MLE-based n-gram models by redistributing probability mass across observed and unseen word sequences. Without smoothing, any n-gram that never appears in the training corpus gets probability 0, which forces the probability of every sentence containing it to 0 as well.
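One common technique (add-one / Laplace smoothing; the lecture may cover others) replaces the MLE bigram estimate count(w_prev, w) / count(w_prev) with (count(w_prev, w) + 1) / (count(w_prev) + V), where V is the vocabulary size. A self-contained sketch on the two-sentence corpus used earlier:

```python
from collections import Counter

corpus = [
    "<s> John drinks tea </s>",
    "<s> She prefers tea with sugar </s>",
]

unigrams, bigrams = Counter(), Counter()
for line in corpus:
    tokens = line.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

V = len(unigrams)  # vocabulary size: 9 distinct tokens in this toy corpus

def laplace_bigram_prob(w_prev, w):
    # Add-one smoothing: every possible bigram keeps a small non-zero probability.
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

print(laplace_bigram_prob("John", "drinks"))  # seen:   (1 + 1) / (1 + 9) = 0.2
print(laplace_bigram_prob("John", "sugar"))   # unseen: (0 + 1) / (1 + 9) = 0.1, no longer zero
```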
Advantages & Limitations of N-Gram Models

✅ Advantages:
✔ Simple & Efficient: Easy to train and use for text generation.​
✔ Works Well for Small Datasets: Performs decently for moderate text corpora.

❌ Limitations:
❌ Data Sparsity: Large N-Grams require huge amounts of training data. As N
increases, the number of possible N-grams grows exponentially, leading to sparse
data and increased computational demands.
❌ Lack of Long-Range Context: Cannot capture dependencies beyond the previous
N-1 words.
❌ High Computational Cost: Higher-order N-Gram models require more memory.
Applications of N-Gram Models in NLP

🔹 Text Prediction (Smartphones, Keyboards - T9, SwiftKey)
🔹 Speech Recognition (Google Speech, Siri, Alexa)​
🔹 Machine Translation (Statistical MT before Deep Learning)​
🔹 Spell Checking & Auto-Correction (Grammarly, MS Word)​
🔹 Plagiarism Detection & Text Summarization
Code

https://colab.research.google.com/drive/1g5hVdk8hd6WF1LA-suTN1nOC7KWCaFPE
