
Part-of-Speech Tagging and Hidden Markov Model


Md Shad Akhtar
[email protected]

shadakhtar:nlp:iiitd:2025:POS:HMM
Part of Speech Tagging
● Given a PoS tag set T = [t1, t2, ⋯, tm] and a sentence S = [w1, w2, ⋯, wn]
○ Assign each word wi a tag tj that defines its grammatical class.

He ate an apple .

PRP VBD DT NN .
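As a quick sanity check of the running example, an off-the-shelf tagger can be asked for Penn Treebank tags. A minimal sketch, assuming NLTK is installed and its tokenizer and tagger resources have been downloaded:

# Minimal sketch: tag the example sentence with NLTK's default tagger.
# Assumes: pip install nltk, plus the tokenizer and tagger resources
# fetched once via nltk.download(...).
import nltk

tokens = nltk.word_tokenize("He ate an apple .")
print(nltk.pos_tag(tokens))
# Typically prints (Penn Treebank tags):
# [('He', 'PRP'), ('ate', 'VBD'), ('an', 'DT'), ('apple', 'NN'), ('.', '.')]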

Part of Speech Tagging
● A word can have more than one PoS tag.

He_PRP went_VBD to_TO the_DT park_NN in_IN a_DT car_NN ._.

He_PRP went_VBD to_TO park_VB the_DT car_NN in_IN the_DT shed_NN ._.

● Context can help in disambiguation!

I bank1 on the bank2 by the river bank3.
(bank1 → Verb, bank2 → Noun, bank3 → Noun)

Part of Speech
● Grouping of words into equivalence classes w.r.t. the role they play in the syntactic structure.
○ Contains information about the word and its neighbours, e.g., an article always precedes a noun/noun phrase.

● Traditionally,
○ Content words (Open class): Noun, Verb, Adjective, Adverb
■ Semantically rich words
■ Not a fixed set
○ Functional words (Closed class): Pronoun, Preposition, Conjunction, Interjection
■ Important to bind the sentence
■ Usually carry less information
■ Fixed and limited set
Part of Speech
Penn Treebank tagset: 45 tags.
[Figure: table of the 45 Penn Treebank tags.]

Ambiguity in PoS tags
● Most words in English are unambiguous
○ Roughly 88.5% of word types (Brown corpus)
○ If there were no ambiguity, PoS tagging would be simple:
■ learn a table of words and their corresponding tags.

● However, most common words are ambiguous.
○ Roughly 40% of tokens (Brown corpus)

● Disambiguation
○ Use the contextual information, i.e., look back or look ahead.
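These proportions are easy to recompute. A rough sketch, assuming NLTK and its Brown corpus are available locally; it counts word types, whereas the token-level figure weights each type by its frequency:

# Rough sketch: how many Brown-corpus word types carry more than one tag.
# Assumes nltk is installed and the 'brown' corpus has been downloaded.
from collections import defaultdict
from nltk.corpus import brown

tags_per_word = defaultdict(set)
for word, tag in brown.tagged_words():
    tags_per_word[word.lower()].add(tag)

ambiguous = sum(1 for tags in tags_per_word.values() if len(tags) > 1)
print(f"ambiguous word types: {ambiguous} / {len(tags_per_word)}")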

Illustration of PoS tagging

An example
● S = Brown foxes jumped over the fence.
● T = ??

● Tag set = [NN, NNS, VB, VBS, VBD, JJ, RB, DT, IN, . ]

● Exhaustive search
○ For each token, we have to estimate the probability w.r.t. each tag.
○ O(|T|^|S|) possible tag sequences
○ Very expensive!!
● If we know that a word can take only a handful of PoS tags, we can reduce this search.
○ How to get this information?
■ Corpus (see the sketch after this list)
○ Still, there can be lots of possibilities.
■ Retain only the most probable path so far, and discard the others.
■ Viterbi algorithm (cost drops to O(|S| · |T|²))
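A rough sketch of the corpus idea, assuming NLTK's Brown corpus is available; the tag dictionary and the fallback tag-set size of 45 are illustrative choices, not part of the slides:

# Sketch: how much a corpus-derived tag dictionary prunes the search space.
# Assumes nltk and the 'brown' corpus; unseen words fall back to the full tag set.
from collections import defaultdict
from nltk.corpus import brown

tag_dict = defaultdict(set)
for word, tag in brown.tagged_words():
    tag_dict[word.lower()].add(tag)

sentence = ["Brown", "foxes", "jumped", "over", "the", "fence", "."]
FULL_TAGSET = 45                      # size of a Penn Treebank-style tag set

exhaustive = FULL_TAGSET ** len(sentence)
pruned = 1
for w in sentence:
    pruned *= len(tag_dict[w.lower()]) or FULL_TAGSET
print(f"exhaustive paths: {exhaustive}, with a tag dictionary: {pruned}")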
Position:  0   1      2      3       4     5    6      7
W:         ^   Brown  foxes  jumped  over  the  fence  .
T:         ^   JJ     NNS    VBD     NN    DT   NN     .
               NN     VBS    JJ      IN         VB
                                     JJ
                                     RB

[Figure: the candidate tags above, expanded into a tree of possible tag sequences rooted at ^.]
[Figure: the same tree of candidate tag sequences, repeated.]
^ Brown foxes jumped over the fence .

● Pick a path to the leaf that maximizes the likelihood of the tag sequence.
○ E.g., the topmost path:
■ ^ JJ NNS VBD NN DT NN .

Hidden Markov Model (HMM)

Motivation
      Bag 1   Bag 2   Bag 3
R      30      10      60
G      50      40      10
B      20      50      30

● Assume we have an observation of N balls withdrawn from these bags:


Red Red Green Green Blue Red Green Red
B1, B2, or B3? B1, B2, or B3? B1, B2, or B3? … B1, B2, or B3?

● Question
○ Give the most likely sequence of bags for the withdrawal of the above sequence of balls.
■ Not an easily computable problem.
Given two probability matrices:

Transition probabilities P(next bag | current bag)
      B1    B2    B3
B1    0.1   0.4   0.5
B2    0.6   0.2   0.2
B3    0.3   0.4   0.3

Emission probabilities P(ball | bag)
      R     G     B
B1    0.3   0.5   0.2
B2    0.1   0.4   0.5
B3    0.6   0.1   0.3

[Figure: state-transition diagram over B1, B2, B3. Each arc from Bi to Bj is labelled with the products P(ball | Bi) × P(Bj | Bi); e.g., the B1 → B3 arc carries R: 0.3 × 0.5 = 0.15, G: 0.5 × 0.5 = 0.25, B: 0.2 × 0.5 = 0.10.]
With the same transition and emission matrices, expand the trellis for the observation sequence ^ R R G, starting from the initial state B0 (the slide assumes P(B1 | B0) = 1). Each arc multiplies the probability so far by the transition probability into the next bag and the emission probability of the observed ball from that bag:

Step 1 (R):
  B0 → B1 : 1 × 0.3 = 0.3

Step 2 (R), from B1 (0.3):
  B1 → B1 : 0.3 × 0.1 × 0.3 = 0.009
  B1 → B2 : 0.3 × 0.4 × 0.1 = 0.012
  B1 → B3 : 0.3 × 0.5 × 0.6 = 0.09

Step 3 (G), from B1 (0.009):
  B1 → B1 : 0.009 × 0.1 × 0.5 = 0.00045
  B1 → B2 : 0.009 × 0.4 × 0.4 = 0.00144
  B1 → B3 : 0.009 × 0.5 × 0.1 = 0.00045
Step 3 (G), from B2 (0.012):
  B2 → B1 : 0.012 × 0.6 × 0.5 = 0.0036
  B2 → B2 : 0.012 × 0.2 × 0.4 = 0.00096
  B2 → B3 : 0.012 × 0.2 × 0.1 = 0.00024
Step 3 (G), from B3 (0.09):
  B3 → B1 : 0.09 × 0.3 × 0.5 = 0.0135
  B3 → B2 : 0.09 × 0.4 × 0.4 = 0.0144
  B3 → B3 : 0.09 × 0.3 × 0.1 = 0.0027

The most probable path after three observations is B1 → B3 → B2, with probability 0.0144.
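The same expansion can be reproduced programmatically. A small sketch (exhaustive enumeration, not yet Viterbi), with the matrices copied from the slide and the assumption above that the first ball is drawn from B1:

# Sketch: enumerate all bag sequences for the observations R R G and score them
# exactly as in the trellis above (emission taken at the bag being entered).
trans = {"B1": {"B1": 0.1, "B2": 0.4, "B3": 0.5},
         "B2": {"B1": 0.6, "B2": 0.2, "B3": 0.2},
         "B3": {"B1": 0.3, "B2": 0.4, "B3": 0.3}}
emit = {"B1": {"R": 0.3, "G": 0.5, "B": 0.2},
        "B2": {"R": 0.1, "G": 0.4, "B": 0.5},
        "B3": {"R": 0.6, "G": 0.1, "B": 0.3}}

# Start as on the slide: the first R is drawn from B1 with probability 1 * P(R | B1).
paths = {("B1",): 1.0 * emit["B1"]["R"]}

for ball in ["R", "G"]:                       # the remaining observations
    new_paths = {}
    for path, prob in paths.items():
        cur = path[-1]
        for nxt in trans[cur]:
            new_paths[path + (nxt,)] = prob * trans[cur][nxt] * emit[nxt][ball]
    paths = new_paths

for path, prob in sorted(paths.items(), key=lambda kv: -kv[1]):
    print(" -> ".join(path), round(prob, 5))   # best: B1 -> B3 -> B2, 0.0144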
Formulation of HMM-based PoS tagging

Observation:  o1=R  o2=R  o3=G  o4=G  o5=B  o6=R  o7=G  o8=R
State:        s1    s2    s3    s4    s5    s6    s7    s8

where,
si ∈ [B1, B2, B3]
S*: best possible state sequence
Goal: maximize P(S | O) by choosing the best state sequence S

S* = argmaxS P(S | O)

Formulation of HMM
S* = argmaxS P(S | O)

P(S | O) = P({s1, s2, s3, s4, s5, s6, s7, s8} | {o1, o2, o3, o4, o5, o6, o7, o8})

1. Apply chain-rule
= P(s1 | O)P(s2 | s1, O)P(s3 | s1s2, O)P(s4 | s1s2s3, O)⋯P(s8 | s1s2⋯s7, O)

2. Apply Markov assumption


= P(s1 | O)P(s2 | s1, O)P(s3 | s2, O)P(s4 | s3, O)⋯P(s8 | s7, O)

Bayes’ Theorem

P(A | B) = P(B | A) P(A) / P(B)

where,
P(A | B) = Posterior
P(B | A) = Likelihood
P(A) = Prior
P(B) = Normalizing constant

Formulation of HMM
3. Apply Bayes’ theorem
S* = argmaxS P(S | O) = argmaxS P(O | S) . P(S)

Prior
P(S) = P(s1) . P(s2|s1) . P(s3|s2) ... P(s8|s7)

Likelihood
P(O | S) = P(o1 | S) P(o2 | o1, S) P(o3 | o2, S) P(o4 | o3, S) … P(o8 | o7 , S)
4. Ball withdrawal depends on the current bag/state only.

P(O | S) = P(o1 | s1) P(o2 | s2) P(o3 | s3) P(o4 | s4) … P(o8 | s8)

Putting prior and likelihood together

P(S | O) ∝ P(O | S) . P(S)
         = P(o1 | s1) P(o2 | s2) P(o3 | s3) P(o4 | s4) … P(o8 | s8) . P(s1) . P(s2|s1) . P(s3|s2) ... P(s8|s7)
Formulation of HMM

Observation:  o0=ε  o1=R  o2=R  o3=G  o4=G  o5=B  o6=R  o7=G  o8=R
State:        s0    s1    s2    s3    s4    s5    s6    s7    s8    s9

We introduce two new states, s0 and s9, to represent the initial and final states; the ε symbol represents the start of the observation.

P(S | O) = [P(o0 | s0) P(s1|s0)] .
           [P(o1 | s1) P(s2|s1)] .
           [P(o2 | s2) P(s3|s2)] .
           [P(o3 | s3) P(s4|s3)] .
           [P(o4 | s4) P(s5|s4)] .
           [P(o5 | s5) P(s6|s5)] .
           [P(o6 | s6) P(s7|s6)] .
           [P(o7 | s7) P(s8|s7)] .
           [P(o8 | s8) P(s9|s8)]

Compactly,
P(S | O) = ∏k=0..8 P(ok | sk) P(sk+1|sk) = ∏k=0..8 P(sk →ok sk+1)

where,
P(s9|s8) = 1 is the transition probability from the state of the last observation s8 to the final state s9,
P(s1|s0) is the initial transition probability, and
P(o0 | s0) is the initial emission probability.
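A tiny sketch of this factorised score on the bag example. The '^' symbol stands in for s0 with the earlier assumption P(B1 | ^) = 1, and the final transition to s9 is taken as 1, so it is dropped:

# Sketch: score one candidate state sequence S against an observation sequence O
# as a product of transition and emission probabilities, P(sk | sk-1) * P(ok | sk).
trans = {"^":  {"B1": 1.0, "B2": 0.0, "B3": 0.0},
         "B1": {"B1": 0.1, "B2": 0.4, "B3": 0.5},
         "B2": {"B1": 0.6, "B2": 0.2, "B3": 0.2},
         "B3": {"B1": 0.3, "B2": 0.4, "B3": 0.3}}
emit = {"B1": {"R": 0.3, "G": 0.5, "B": 0.2},
        "B2": {"R": 0.1, "G": 0.4, "B": 0.5},
        "B3": {"R": 0.6, "G": 0.1, "B": 0.3}}

def score(states, obs):
    p, prev = 1.0, "^"                 # '^' plays the role of the initial state s0
    for s, o in zip(states, obs):
        p *= trans[prev][s] * emit[s][o]
        prev = s
    return p

print(score(["B1", "B3", "B2"], ["R", "R", "G"]))   # 0.3 * (0.5*0.6) * (0.4*0.4) = 0.0144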
Decoding a state sequence

Automaton: states S1 and S2, with a start state S0 and an ε-transition S0 → S1 of probability 1.0 (S0 → S2 has probability 0.0). Each arc carries a symbol and a combined transition-emission probability:

from S1:   S1 → S1: a 0.1, b 0.2     S1 → S2: a 0.3, b 0.4
from S2:   S2 → S1: a 0.2, b 0.3     S2 → S2: a 0.3, b 0.2

Observation sequence: a b a b

Viterbi trellis (at every step, keep only the best score for each state):

→ a:  S1: 1 × 0.1 = 0.1                            S2: 1 × 0.3 = 0.3
→ b:  S1: max(0.1 × 0.2, 0.3 × 0.3) = 0.09         S2: max(0.1 × 0.4, 0.3 × 0.2) = 0.06
→ a:  S1: max(0.09 × 0.1, 0.06 × 0.2) = 0.012      S2: max(0.09 × 0.3, 0.06 × 0.3) = 0.027
→ b:  S1: max(0.012 × 0.2, 0.027 × 0.3) = 0.0081   S2: max(0.012 × 0.4, 0.027 × 0.2) = 0.0054

The best final score is 0.0081, ending in S1; backtracking the winning arcs gives the decoded state sequence S1 → S2 → S1 → S2 → S1.
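A minimal Viterbi sketch for this automaton (probabilities as above; the dictionary layout and function name are illustrative, not from the slides):

# Minimal Viterbi sketch: arc[s][s_next][symbol] = combined transition-emission probability.
arc = {
    "S1": {"S1": {"a": 0.1, "b": 0.2}, "S2": {"a": 0.3, "b": 0.4}},
    "S2": {"S1": {"a": 0.2, "b": 0.3}, "S2": {"a": 0.3, "b": 0.2}},
}

def viterbi(obs, start="S1"):
    # best[s] = (score, path) of the best path that ends in state s
    best = {start: (1.0, [start])}
    for sym in obs:
        new_best = {}
        for s, (score, path) in best.items():
            for nxt, probs in arc[s].items():
                cand = score * probs[sym]
                if nxt not in new_best or cand > new_best[nxt][0]:
                    new_best[nxt] = (cand, path + [nxt])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])   # highest-scoring final state

print(viterbi(["a", "b", "a", "b"]))
# -> (0.0081, ['S1', 'S2', 'S1', 'S2', 'S1'])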
Summary of HMM
● Problem definition:
○ S* = argmaxS P(S | O, M) where,
■ S → State sequence,
■ O → Observation sequence,
■ M → Model
● M = [Q, Q0, A, T] where,
○ Q → Set of states,
○ Q0 → Start state,
○ A → Alphabet, and
○ T → Transition function, which gives P(si →ak sj) ∀ i, j, k
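One possible way to package M in code; this is only a sketch, and the class and field names are illustrative rather than anything defined on the slides:

# Sketch of the model M = [Q, Q0, A, T] as a plain data structure.
from dataclasses import dataclass

@dataclass
class HMM:
    Q: list            # set of states, e.g. ["B1", "B2", "B3"] or PoS tags
    Q0: str            # start state
    A: list            # alphabet of observation symbols (ball colours, words)
    T: dict            # T[si][ak][sj] = P(si --ak--> sj), transition with emission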

Computing P(.) values for PoS tagging
• States (sk) are the tags (e.g., NN, JJ, VB), and
• Observations (ok) are the words in a sentence.

P(S | O) = ∏k P(ok | sk) P(sk+1|sk)
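The two probability tables can be estimated from a tagged corpus by simple counting. A rough sketch, assuming NLTK's Brown corpus is available; the pseudo-tags <s> and </s> stand in for the initial and final states, and smoothing of unseen events is omitted:

# Sketch: maximum-likelihood estimates of the transition and emission tables.
from collections import Counter, defaultdict
from nltk.corpus import brown

trans_counts = defaultdict(Counter)    # tag -> Counter of next tags
emit_counts = defaultdict(Counter)     # tag -> Counter of words

for sent in brown.tagged_sents(tagset="universal"):
    prev = "<s>"                       # start-of-sentence pseudo-tag (plays the role of s0)
    for word, tag in sent:
        trans_counts[prev][tag] += 1
        emit_counts[tag][word.lower()] += 1
        prev = tag
    trans_counts[prev]["</s>"] += 1    # end-of-sentence pseudo-tag (final state)

def p_trans(t_next, t_prev):
    return trans_counts[t_prev][t_next] / sum(trans_counts[t_prev].values())

def p_emit(word, tag):
    return emit_counts[tag][word.lower()] / sum(emit_counts[tag].values())

print(p_trans("NOUN", "DET"), p_emit("apple", "NOUN"))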

Associated tasks with HMM
● Given an automaton and an observation sequence, find the best possible state sequence
○ S* = ?
○ Viterbi algorithm

● Given an automaton, find the probability of an observation sequence
○ P(O) = ?
○ Forward algorithm (see the sketch after this list): F(k, i) = P(o1 o2 o3 ⋯ ok, si) = ∑j=0..N F(k − 1, j) . P(sj →ok si)
○ Probability of being in state si having seen o1 o2 o3 ⋯ ok
○ Backward algorithm: B(k, i) = P(ok ok+1 ok+2 ⋯ om, si) = ∑j=0..N B(k + 1, j) . P(si →ok sj)
○ Probability of seeing ok ok+1 ok+2 ⋯ om given that the state was si

● Given the observation sequence, find the HMM parameters
○ M(T) = ?
○ Baum-Welch algorithm or Forward-Backward algorithm: EM
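A rough sketch of the forward algorithm on the bag example. It follows the trellis convention used earlier (the emission is scored at the state being entered), reuses the trans/emit tables from the slide, and assumes the process starts in B1; it is illustrative rather than the slides' exact formulation:

# Sketch: forward algorithm, summing (rather than maximising) over paths.
trans = {"B1": {"B1": 0.1, "B2": 0.4, "B3": 0.5},
         "B2": {"B1": 0.6, "B2": 0.2, "B3": 0.2},
         "B3": {"B1": 0.3, "B2": 0.4, "B3": 0.3}}
emit = {"B1": {"R": 0.3, "G": 0.5, "B": 0.2},
        "B2": {"R": 0.1, "G": 0.4, "B": 0.5},
        "B3": {"R": 0.6, "G": 0.1, "B": 0.3}}

def forward(obs, start="B1"):
    states = list(trans)
    # F[si] = P(o1 ... ok, si): probability of the prefix seen so far, ending in si
    F = {s: 0.0 for s in states}
    F[start] = 1.0
    for k, o in enumerate(obs):
        if k == 0:
            F = {s: F[s] * emit[s][o] for s in states}       # first ball drawn in the start bag
        else:
            F = {si: sum(F[sj] * trans[sj][si] * emit[si][o] for sj in states)
                 for si in states}
    return sum(F.values())                                   # P(O)

print(forward(["R", "R", "G"]))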
Thanks
