PoSTagging-HMM
Part of Speech Tagging
● Given a PoS tag set T = [t1, t2, ⋯, tm] and a sentence S = [w1, w2, ⋯, wn]
○ Assign each word wi a tag tj that defines its grammatical class.
He_PRP ate_VBD an_DT apple_NN ._.
Part of Speech Tagging
● A word can have more than one PoS tag.
He_PRP went_VBD to_TO park_VB the_DT car_NN in_IN the_DT shed_NN ._.
I bank1 on the bank2 by the river bank3.
(bank1: Verb, bank2: Noun, bank3: Noun)
Part of Speech
● Grouping of words into equivalence classes w.r.t. the role they play in the syntactic structure.
○ Contains information about the word and its neighbours, e.g., an article always precedes a noun/noun phrase.
● Traditionally,
○ Content words (Open class): Noun, Verb, Adjective, Adverb
■ Semantically rich words
■ Not fixed
○ Functional words (Closed class): Pronoun, Preposition, Conjunction, Interjection
■ Important to bind the sentence
■ Usually carry less information
■ Fixed and limited
Part of Speech
Penn Treebank tagset: 45 tags
Ambiguity in PoS tags
● Most words in English are unambiguous
○ About 88.5% (Brown corpus)
○ If there were no ambiguity, PoS tagging would be simple:
■ learn a table of words and their corresponding tags (see the lookup sketch below).
● Disambiguation
○ Look at the contextual information, i.e., look-back or look-ahead.
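In the unambiguous case, tagging reduces to a table lookup learned from a tagged corpus. A minimal sketch of such a lookup tagger (the toy corpus and the NN fallback for unseen words are illustrative):

```python
from collections import Counter, defaultdict

def train_lookup_tagger(tagged_sentences):
    """Count (word, tag) pairs and keep the most frequent tag per word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

# Hypothetical toy corpus; a real table would be learned from, e.g., the Brown corpus.
corpus = [[("He", "PRP"), ("ate", "VBD"), ("an", "DT"), ("apple", "NN"), (".", ".")]]
table = train_lookup_tagger(corpus)
print([table.get(w, "NN") for w in ["He", "ate", "an", "apple", "."]])
# ['PRP', 'VBD', 'DT', 'NN', '.']
```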
Illustration of PoS tagging
An example
● S = Brown foxes jumped over the fence.
● T = ??
● Tag set = [NN, NNS, VB, VBS, VBD, JJ, RB, DT, IN, . ]
● Exhaustive search
○ For each token, we have to estimate the probability w.r.t. each tag.
○ O(|T|^|S|)
○ Very expensive!!
● If we know that a word can take only a handful of PoS tags, we can reduce this search.
○ How do we get this information?
■ From a corpus (see the tag-dictionary sketch below).
○ Still, it can have lots of possibilities.
■ Retain only the most probable path so far, and discard others.
■ Viterbi Algorithm
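A corpus-derived word → possible-tags dictionary shrinks the search space dramatically. A minimal sketch of this idea (the TAG_DICT contents below are illustrative, following the running example, not counts from a real corpus):

```python
# Hypothetical tag dictionary harvested from a corpus: word -> tags seen for it.
TAG_DICT = {
    "Brown": ["JJ", "NN"], "foxes": ["NNS", "VBS"], "jumped": ["VBD"],
    "over": ["NN", "IN", "JJ", "RB"], "the": ["DT"], "fence": ["NN", "VB"], ".": ["."],
}
FULL_TAGSET = ["NN", "NNS", "VB", "VBS", "VBD", "JJ", "RB", "DT", "IN", "."]

sentence = ["Brown", "foxes", "jumped", "over", "the", "fence", "."]
candidates = [TAG_DICT.get(w, FULL_TAGSET) for w in sentence]

full_paths = len(FULL_TAGSET) ** len(sentence)   # |T|^|S| = 10^7 paths
pruned_paths = 1
for tags in candidates:
    pruned_paths *= len(tags)                    # 2*2*1*4*1*2*1 = 32 paths
print(full_paths, pruned_paths)                  # 10000000 32
```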
Position: 0 1 2 3 4 5 6 7
W: ^ Brown foxes jumped over the fence .
T: each word admits several candidate tags, e.g., Brown → JJ/NN, foxes → NNS/VBS, jumped → VBD, over → NN/IN/JJ/RB, the → DT, fence → NN/VB, . → .
[Figure: the candidate tags expanded as a search tree rooted at ^; each root-to-leaf path is one possible tag sequence.]
● Pick the path to a leaf that maximizes the likelihood of the tag sequence.
○ E.g., the topmost path:
■ ^ JJ NNS VBD NN DT NN .
Hidden Markov Model (HMM)
Motivation
Bag 1: R → 30, G → 50, B → 20
Bag 2: R → 10, G → 40, B → 50
Bag 3: R → 60, G → 10, B → 30
● Question
○ Given a sequence of withdrawn balls, give the most likely sequence of bags from which they were drawn.
■ Not an easily computable problem (see the brute-force sketch below).
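To see why it is expensive: with 3 bags and n draws there are 3^n candidate bag sequences to score. A brute-force sketch (assuming, hypothetically, that each bag is equally likely at every step; the ball proportions come from the table above, counts normalized to probabilities):

```python
from itertools import product

# Ball proportions per bag, read off the table above (counts normalized).
EMISSION = {
    "B1": {"R": 0.30, "G": 0.50, "B": 0.20},
    "B2": {"R": 0.10, "G": 0.40, "B": 0.50},
    "B3": {"R": 0.60, "G": 0.10, "B": 0.30},
}
BAGS = list(EMISSION)

def best_bag_sequence(balls):
    """Score all 3^n bag sequences, assuming (hypothetically) each bag is
    equally likely at every step, and return the most probable one."""
    best, best_p = None, -1.0
    for bags in product(BAGS, repeat=len(balls)):
        p = 1.0
        for bag, ball in zip(bags, balls):
            p *= (1.0 / len(BAGS)) * EMISSION[bag][ball]
        if p > best_p:
            best, best_p = bags, p
    return best, best_p

print(best_bag_sequence(["R", "R", "G"]))   # (('B3', 'B3', 'B1'), ...)
```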
Given two probability matrices: transition probabilities (over bags B1, B2, B3) and emission probabilities (over ball colours R, G, B).
[Figure: state-transition diagram over B1, B2, B3; each edge carries a (colour, probability) label, e.g., R, 0.03 or G, 0.25, giving the joint probability of moving to the next bag and drawing that colour.]
Given two probability matrices: transition probabilities (B1, B2, B3) and emission probabilities (R, G, B).
Observation: o1 o2 o3 o4 o5 o6 o7 o8 = R R G G B R G R
State: s1 s2 s3 s4 s5 s6 s7 s8
where,
si ∈ [B1, B2, B3]
S*: best possible state sequence
Goal: maximize P(S | O) by choosing the best state sequence S
S* = argmaxS P(S | O)
Formulation of HMM
S* = argmaxS P(S | O)
P(S | O) = P({s1, s2, s3, s4, s5, s6, s7, s8} | {o1, o2, o3, o4, o5, o6, o7, o8})
1. Apply the chain rule:
= P(s1 | O) P(s2 | s1, O) P(s3 | s1s2, O) P(s4 | s1s2s3, O) ⋯ P(s8 | s1s2⋯s7, O)
2. Apply the Markov assumption: each state depends only on its immediate predecessor, i.e., P(si | s1s2⋯si−1) ≈ P(si | si−1).
Bayes’ Theorem
P(A | B) = P(B | A) P(A) / P(B)
where,
P(A | B) = Posterior
P(B | A) = Likelihood
P(A) = Prior
P(B) = Normalizing constant
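As a quick numeric check in the bag setting: Bayes' theorem turns the emission probabilities P(ball | bag) into posteriors P(bag | ball). A minimal sketch assuming a hypothetical uniform prior over bags:

```python
# P(bag | R) ∝ P(R | bag) * P(bag), with a hypothetical uniform prior P(bag) = 1/3.
p_r_given_bag = {"B1": 0.3, "B2": 0.1, "B3": 0.6}   # from the bag table above
prior = {bag: 1 / 3 for bag in p_r_given_bag}
joint = {bag: p_r_given_bag[bag] * prior[bag] for bag in prior}
p_r = sum(joint.values())                            # normalizing constant P(R)
posterior = {bag: p / p_r for bag, p in joint.items()}
print(posterior)   # {'B1': 0.3, 'B2': 0.1, 'B3': 0.6}
```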
Formulation of HMM
3. Apply Bayes' theorem (the normalizing constant P(O) does not depend on S, so it can be dropped):
S* = argmaxS P(S | O) = argmaxS P(O | S) ⋅ P(S)
Prior (with the Markov assumption):
P(S) = P(s1) ⋅ P(s2 | s1) ⋅ P(s3 | s2) ⋯ P(s8 | s7)
Likelihood (chain rule):
P(O | S) = P(o1 | S) P(o2 | o1, S) P(o3 | o1o2, S) ⋯ P(o8 | o1⋯o7, S)
4. Ball withdrawal depends on the current bag/state only:
P(O | S) = P(o1 | s1) P(o2 | s2) P(o3 | s3) P(o4 | s4) ⋯ P(o8 | s8)
(see the scoring sketch below)
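Combining the two factors, each candidate state sequence can be scored as P(S) ⋅ P(O | S). A minimal sketch (the transition values below are hypothetical; the emission values follow the bag table above):

```python
# Hypothetical HMM parameters; "^" is the initial state s0.
TRANSITION = {                      # P(s_i | s_{i-1})
    "^":  {"B1": 0.4, "B2": 0.3, "B3": 0.3},
    "B1": {"B1": 0.2, "B2": 0.5, "B3": 0.3},
    "B2": {"B1": 0.4, "B2": 0.2, "B3": 0.4},
    "B3": {"B1": 0.3, "B2": 0.4, "B3": 0.3},
}
EMISSION = {                        # P(o_i | s_i), from the bag table
    "B1": {"R": 0.3, "G": 0.5, "B": 0.2},
    "B2": {"R": 0.1, "G": 0.4, "B": 0.5},
    "B3": {"R": 0.6, "G": 0.1, "B": 0.3},
}

def score(states, observations):
    """P(S) * P(O | S) under the Markov and emission-independence assumptions."""
    p, prev = 1.0, "^"
    for s, o in zip(states, observations):
        p *= TRANSITION[prev][s] * EMISSION[s][o]
        prev = s
    return p

print(score(["B3", "B3", "B1"], ["R", "R", "G"]))   # (0.3*0.6)*(0.3*0.6)*(0.3*0.5)
```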
State: s0 s1 s2 s3 s4 s5 s6 s7 s8 s9
We introduce two new states, s0 and s9, to represent the initial and final states; the ε symbol represents the start of the observation (s0 →ε s1).
[Figure: Viterbi trellis over two states S1 and S2 for the observation sequence a b a b. From the start state, S1 and S2 are reached with probabilities 1 × 0.1 = 0.1 and 1 × 0.3 = 0.3; the next step's candidate extensions score 0.3 × 0.3 = 0.09, 0.3 × 0.2 = 0.06, 0.1 × 0.2 = 0.02, and 0.1 × 0.4 = 0.04, and only the best path into each state is retained, yielding path scores such as 0.012, 0.009, 0.0027, and 0.0018 at later steps.]
Edge labels have the form P(si →ak sj) ∀ i, k, j, i.e., the probability of moving from state si to state sj while emitting symbol ak.
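The retained-best-path computation in the trellis is the Viterbi algorithm. A minimal sketch, assuming a hypothetical two-state automaton (the start, transition, and emission values below are illustrative, not read off the figure):

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for o in observations[1:]:
        best = {
            s: max(
                ((p * trans_p[prev][s] * emit_p[s][o], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda t: t[0])

# Hypothetical two-state automaton over the symbols a/b.
states = ["S1", "S2"]
start_p = {"S1": 0.5, "S2": 0.5}
trans_p = {"S1": {"S1": 0.6, "S2": 0.4}, "S2": {"S1": 0.3, "S2": 0.7}}
emit_p = {"S1": {"a": 0.7, "b": 0.3}, "S2": {"a": 0.2, "b": 0.8}}
prob, path = viterbi(list("abab"), states, start_p, trans_p, emit_p)
print(path, prob)   # ['S1', 'S2', 'S2', 'S2'] 0.0087808
```

With |T| states and a sentence of length n, this runs in O(n·|T|²) time instead of the O(|T|^n) exhaustive search.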
Computing P(.) values for PoS tagging
● States (sk) are the tags (e.g., NN, JJ, VB), and
● Observations (ok) are the words in a sentence (see the training sketch below).
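Both tables can be estimated by maximum-likelihood counts over a tagged corpus. A minimal sketch (the toy corpus and the "^" start state are illustrative):

```python
from collections import Counter, defaultdict

def normalize(counts):
    """Turn a Counter of raw counts into a probability distribution."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

def train_hmm(tagged_sentences):
    """MLE estimates: transition P(t_i | t_{i-1}) and emission P(w_i | t_i)."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "^"                                  # start-of-sentence state
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return ({t: normalize(c) for t, c in trans.items()},
            {t: normalize(c) for t, c in emit.items()})

# Hypothetical toy corpus.
corpus = [[("He", "PRP"), ("ate", "VBD"), ("an", "DT"), ("apple", "NN"), (".", ".")]]
trans_p, emit_p = train_hmm(corpus)
print(trans_p["^"])   # {'PRP': 1.0}
print(emit_p["NN"])   # {'apple': 1.0}
```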
Associated tasks with HMM
● Given an automaton and an observation sequence, find the best possible state sequence.
○ S* = ?
○ Viterbi algorithm (as sketched above)