NLP Lec 11
Statistical sequence models are based on probabilistic methods and operate on hand-crafted features or word-based statistics. They were
dominant before deep learning became mainstream.
Common Types of Statistical Sequence Models:
N-Gram Models
Hidden Markov Models (HMM)
An N-Gram model is a probabilistic language model used to predict the next item (word, character, etc.) in
a sequence, based on the previous N−1 items. It assumes the Markov property, which means the probability
of a word depends only on the previous N−1 words.
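Written out (standard n-gram notation, not from the original notes; w_i is the i-th word in the sequence), the approximation and its count-based estimate are:

```latex
% Markov (N-gram) approximation: condition only on the previous N-1 words
P(w_n \mid w_1, \ldots, w_{n-1}) \approx P(w_n \mid w_{n-N+1}, \ldots, w_{n-1})

% Maximum-likelihood estimate from corpus counts
P(w_n \mid w_{n-N+1}, \ldots, w_{n-1}) =
  \frac{\mathrm{Count}(w_{n-N+1} \ldots w_{n-1}\, w_n)}{\mathrm{Count}(w_{n-N+1} \ldots w_{n-1})}
```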
Types of N-Grams
Unigram Model (N = 1)
The model predicts the most probable next word based only on individual word frequencies, regardless of context.
Example:
Training Corpus:
"I love NLP. I love AI. AI loves me."
Tokenized Words:
[I, love, NLP, I, love, AI, AI, loves, me]
Word Counts:
I → 2
love → 2
NLP → 1
AI → 2
loves → 1
me → 1
Total Words: 9
Unigram Probabilities:
P(I) = 2/9
P(love) = 2/9
P(NLP) = 1/9
P(AI) = 2/9
P(loves) = 1/9
P(me) = 1/9
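A minimal Python sketch of the unigram computation above (the tokenization and variable names are illustrative assumptions, not from the lecture):

```python
from collections import Counter

corpus = "I love NLP. I love AI. AI loves me."

# Whitespace tokenization, stripping sentence-final periods
tokens = [w.strip(".") for w in corpus.split()]

counts = Counter(tokens)        # I: 2, love: 2, NLP: 1, AI: 2, loves: 1, me: 1
total = sum(counts.values())    # 9 words in total

unigram_prob = {w: c / total for w, c in counts.items()}
print(unigram_prob["I"])        # 2/9 ≈ 0.222
```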
Bigram Model
A bigram model estimates the probability of a word given the previous word:
P(w_n | w_{n-1}) = Count(w_{n-1} w_n) / sum over w of Count(w_{n-1} w)
We can simplify this equation, since the sum of all bigram counts that start with a given word w_{n-1} must be equal to the
unigram count for that word w_{n-1}:
P(w_n | w_{n-1}) = Count(w_{n-1} w_n) / Count(w_{n-1})
Step 1: Add Sentence Markers
We add <s> at the beginning and </s> at the end of each sentence:
[<s>, I, love, NLP, </s>, <s>, I, love, AI, </s>, <s>, AI, loves, me, </s>]
Step 2: Bigram Counts
<s> → I  2
I → love  2
love → NLP  1
NLP → </s>  1
love → AI  1
AI → </s>  1
<s> → AI  1
AI → loves  1
loves → me  1
me → </s>  1
Unigram Counts (including sentence markers, used as denominators):
<s> → 3
I → 2
love → 2
NLP → 1
AI → 2
loves → 1
me → 1
</s> → 3
Step 3: Bigram Probabilities
P(I | <s>) = 2/3
P(AI | <s>) = 1/3
P(love | I) = 2/2 = 1
P(NLP | love) = 1/2
P(AI | love) = 1/2
P(</s> | NLP) = 1/1 = 1
P(loves | AI) = 1/2
P(</s> | AI) = 1/2
P(me | loves) = 1/1 = 1
P(</s> | me) = 1/1 = 1
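A small Python sketch of the bigram counts and probabilities worked out above (function and variable names are illustrative assumptions):

```python
from collections import Counter

sentences = [["I", "love", "NLP"], ["I", "love", "AI"], ["AI", "loves", "me"]]

# Add sentence-boundary markers <s> and </s>
padded = [["<s>"] + s + ["</s>"] for s in sentences]

unigram_counts = Counter(w for s in padded for w in s)
bigram_counts = Counter((s[i], s[i + 1]) for s in padded for i in range(len(s) - 1))

# P(w_n | w_{n-1}) = Count(w_{n-1} w_n) / Count(w_{n-1})
def bigram_prob(prev, word):
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("<s>", "I"))     # 2/3
print(bigram_prob("love", "NLP"))  # 1/2
```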
Trigram Model
A trigram model conditions on the previous two words:
P(w_n | w_{n-2} w_{n-1}) = Count(w_{n-2} w_{n-1} w_n) / Count(w_{n-2} w_{n-1})
To estimate this, we need to count how often each triple of consecutive words occurs in the corpus.
We'll add <s> <s> at the beginning of each sentence (to account for trigram context), and </s> at the end:
Sentences:
<s> <s> I love NLP </s>
<s> <s> I love AI </s>
<s> <s> AI loves me </s>
Bigram counts (used as trigram denominators):
<s> I → 2
<s> AI → 1
I love → 2
love AI → 1
AI loves → 1
loves me → 1
Trigram counts with context "I love":
Count(I love NLP) = 1
Count(I love AI) = 1
Total = 2
Probabilities:
P(NLP | I love) = 1/2
P(AI | I love) = 1/2
So, if your context is “I love”, the trigram model says the next word is “NLP” or “AI” with equal probability.
Another Example:
Count(AI loves me) = 1
Count(AI loves) = 1
P(me | AI loves) = 1/1 = 1
The model is confident the next word is "me" after "AI loves".
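A matching Python sketch for the trigram case, using the double <s> padding described above (names are illustrative assumptions):

```python
from collections import Counter

sentences = [["I", "love", "NLP"], ["I", "love", "AI"], ["AI", "loves", "me"]]

# Double <s> padding so every word has a two-word context
padded = [["<s>", "<s>"] + s + ["</s>"] for s in sentences]

bigram_counts = Counter((s[i], s[i + 1]) for s in padded for i in range(len(s) - 1))
trigram_counts = Counter((s[i], s[i + 1], s[i + 2]) for s in padded for i in range(len(s) - 2))

# P(w_n | w_{n-2} w_{n-1}) = Count(w_{n-2} w_{n-1} w_n) / Count(w_{n-2} w_{n-1})
def trigram_prob(w1, w2, w3):
    return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]

print(trigram_prob("I", "love", "NLP"))   # 1/2
print(trigram_prob("AI", "loves", "me"))  # 1/1 = 1.0
```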