Lecture 6 To 8 N-Gram
Spell Correction
Language models help in spell correction by predicting the most likely correct
word based on context and probability.
● A user types a word with a spelling mistake (e.g., "teh" instead of "the").
● The system needs to correct it based on context and word probability.
● For example, consider the candidate strings "going", "gogin", and "gonig".
The model chooses "going" because it has the highest probability in natural text, as sketched below.
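A minimal sketch of this selection step, assuming a tiny hand-made frequency table (the words and counts below are illustrative assumptions, not data from the lecture): each candidate is scored by its unigram probability and the most probable string wins.

```python
# Toy spell-correction ranking by unigram probability.
# The frequency table is an illustrative assumption, not real corpus data.
word_counts = {"the": 5000, "going": 300, "to": 4000, "school": 150}
total = sum(word_counts.values())
vocab_size = len(word_counts)

def unigram_prob(word, smoothing=1):
    # Add-one smoothing so unseen strings like "gogin" still get a tiny probability.
    return (word_counts.get(word, 0) + smoothing) / (total + smoothing * (vocab_size + 1))

candidates = ["going", "gogin", "gonig"]
best = max(candidates, key=unigram_prob)
print(best)  # -> "going", the candidate with the highest probability in the table
```

A full spell corrector would also weigh an error model (how likely each typo is for each candidate) and the surrounding context, but the ranking idea is the same.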
N-Gram Models
A language model (LM) is a statistical model that predicts the next word in a
sequence given the previous words. It is essential in applications like speech
recognition, text generation, machine translation, and spell correction.
Compute the Bigram Probability of sentence <s> John drinks tea </s>
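A sketch of the computation, assuming a tiny made-up training corpus (the three sentences below are illustrative, not from the lecture): a bigram model factors the sentence probability as P(John | &lt;s&gt;) · P(drinks | John) · P(tea | drinks) · P(&lt;/s&gt; | tea), and each factor is estimated by maximum likelihood as C(w_{i-1} w_i) / C(w_{i-1}).

```python
from collections import Counter

# Illustrative toy corpus (an assumption for this sketch).
corpus = [
    ["<s>", "John", "drinks", "tea", "</s>"],
    ["<s>", "John", "drinks", "coffee", "</s>"],
    ["<s>", "Mary", "drinks", "tea", "</s>"],
]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))

def bigram_prob(prev, word):
    # Maximum-likelihood estimate: C(prev word) / C(prev).
    return bigram_counts[(prev, word)] / unigram_counts[prev]

sentence = ["<s>", "John", "drinks", "tea", "</s>"]
prob = 1.0
for prev, word in zip(sentence, sentence[1:]):
    prob *= bigram_prob(prev, word)

# P(John|<s>) * P(drinks|John) * P(tea|drinks) * P(</s>|tea) = 2/3 * 1 * 2/3 * 1
print(prob)  # ~0.444 on this toy corpus
```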
Why use Log Prob?
Multiplying many probabilities, each less than 1, quickly underflows to zero in floating-point arithmetic. Taking logarithms turns the product into a sum, which avoids underflow and keeps sentence scores easy to compare.
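A small illustration of the numerical issue (the probability values are arbitrary assumptions):

```python
import math

# Pretend per-word probabilities for a long sentence (arbitrary values).
probs = [0.01] * 200

product = 1.0
for p in probs:
    product *= p
print(product)   # 0.0 -- the true value 1e-400 underflows in double precision

log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # about -921.03, still representable and easy to compare
```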
Perplexity
Perplexity is a measure of how well a language model predicts a sequence of words. Think of it
as the model's "confusion level": lower perplexity means the model is less confused and better at
guessing the next word. It's widely used to evaluate N-gram models (and others) by quantifying
their predictive power on test data.
● Intuition: Imagine you're guessing the next word in "The cat ___." If your model strongly
predicts "runs" (high probability), it's less perplexed. If it's unsure and spreads probability
thinly across many words, perplexity shoots up.
● Goal: Lower perplexity = better model. It's like a score; aim for the lowest you can get!
N-gram models predict the probability of a word based on the previous N-1 words (e.g., bigrams
use 1 prior word, trigrams use 2). Perplexity tests how well these probabilities hold up on
unseen text:
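A minimal sketch of that computation, assuming we already have the per-word probabilities a model assigns to a test sequence (the values below are arbitrary): perplexity is the inverse probability of the test set normalised by the number of words, i.e. the exponential of the average negative log probability.

```python
import math

# Assumed per-word probabilities from some model on a short test sequence.
word_probs = [0.2, 0.1, 0.05, 0.3]

N = len(word_probs)
avg_neg_log = -sum(math.log(p) for p in word_probs) / N
perplexity = math.exp(avg_neg_log)  # equivalent to (product of probs) ** (-1 / N)
print(perplexity)  # about 7.6 -- lower values mean the model is less "confused"
```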
✅ Advantages:
✔ Simple & Efficient: Easy to train and use for text generation.
✔ Works Well for Small Datasets: Performs decently for moderate text corpora.
❌ Limitations:
❌ Data Sparsity: Large N-grams require huge amounts of training data. As N increases, the number of possible N-grams grows exponentially, leading to sparse data and increased computational demands.
❌ Lack of Long-Range Context: Cannot capture dependencies beyond the previous N-1 words.
❌ High Computational Cost: Higher-order N-gram models require more memory.
Applications of N-Gram Models in NLP
https://colab.research.google.com/drive/1g5hVdk8hd6WF1LA-suTN1nOC7KWCaFPE