Lecture 5: Sequence Labeling with Hidden Markov Models. Part-of-Speech Tagging.
9/20/2024
COMS W4705
Daniel Bauer
Garden-Path Sentences
• The horse raced past the barn.
  • "raced" is ambiguous: past tense (VBD) or past participle (VBN)?
  • [The horse raced past the barn]NP fell. (the VBN reading: a reduced relative clause)
• The old dog the footsteps of the young.
  • Misleading parse: [The old dog]NP [the footsteps of the young]NP, with "old" as adjective (JJ) and "dog" as noun (NN).
  • Intended parse: [The old]NP dog [the footsteps of the young]NP, with "old" as noun (NNS) and "dog" as verb (VB).
Parts of Speech
• ~9 traditional parts-of-speech:
  • noun, pronoun, determiner, adjective, verb, adverb, preposition, conjunction, interjection
Syntactic Ambiguities and Parts-of-Speech
• Time flies like an arrow.
  • Each word is ambiguous: Time (N or V?), flies (N or V?), like (V or preposition?).
  • One reading tags the sentence N N V: [Time flies]NP like an arrow, i.e. insects called "time flies" are fond of arrows.
Why do we need P.O.S.?
• Interacts with most levels of linguistic representation.
• Speech processing: pronunciation can depend on the tag:
  • lead (V) vs. lead (N)
  • insult (V) vs. insult (N)
  • object (V) vs. object (N)
  • content (ADJ) vs. content (N)
• Syntactic parsing
• …
Hidden Markov Models
• Example tag/word sequence:
  DT NN VBD DT NNS IN DT NN .
  the koala put the keys on the table .
• The model decomposes the joint probability into P(tags) · P(words | tags): the tag sequence (e.g. START NN VBZ IN DT NN) is a Markov chain scored by transition probabilities, and each word is generated from its tag with an emission probability.
[HMM diagram: states start, t1, t2, t3 linked by transition probabilities; each tag ti emits the word wi with an emission probability.]
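As a concrete illustration (not from the slides), here is a minimal Python sketch that scores a tagged sentence as this product of transition and emission probabilities. The toy tables and the name joint_probability are assumptions made for the example:

# Minimal sketch of scoring a tagged sentence under a bigram HMM.
# The probability tables below are toy values, not estimates from data.
transition = {
    "START": {"DT": 0.8, "NN": 0.2},
    "DT": {"NN": 0.9, "STOP": 0.1},
    "NN": {"VBZ": 0.5, "STOP": 0.5},
    "VBZ": {"STOP": 1.0},
}
emission = {
    "DT": {"the": 1.0},
    "NN": {"koala": 0.5, "table": 0.5},
    "VBZ": {"sleeps": 1.0},
}

def joint_probability(words, tags):
    """P(words, tags) = prod_i P(t_i | t_{i-1}) * P(w_i | t_i),
    times the final transition into STOP."""
    p = 1.0
    prev = "START"
    for w, t in zip(words, tags):
        p *= transition[prev].get(t, 0.0) * emission[t].get(w, 0.0)
        prev = t
    return p * transition[prev].get("STOP", 0.0)

print(joint_probability("the koala sleeps".split(), ["DT", "NN", "VBZ"]))  # 0.18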
Important Tasks on HMMs
• Decoding: Given a sequence of words, find the most likely tag sequence (Bayesian inference, computed with the Viterbi algorithm).
[Viterbi trellis walkthrough: rows are candidate tags (DT, NN, IN, …), columns are word positions; each cell holds the best score of any tag sequence ending there, built up column by column, e.g. .2 × .3 = .06, later .0024, and finally .00216 × .7 × 1 = .001512.]
Viterbi Algorithm
• Input: Sequence of observed words w1, …, wn
• Create a table π, such that each entry π[k,t] contains the score of the highest-probability sequence ending in tag t at time k. Initialize π[0, START] = 1.
• for k=1 to n:
  • for t ∈ T:
    • π[k,t] = max over t' ∈ T of π[k-1,t'] · P(t | t') · P(wk | t),
      where P(wk | t) is the emission probability and P(t | t') is the transition probability
• return max over t ∈ T of π[n,t] · P(STOP | t), following backpointers to recover the tag sequence
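A compact Python sketch of this procedure, with backpointers for recovering the sequence. It reuses the dict-based transition/emission tables from the earlier sketch and works in log-space to avoid underflow; all names are illustrative, and it assumes at least one tag sequence has nonzero probability:

import math

def viterbi(words, tags, transition, emission):
    """Most likely tag sequence for `words` under a bigram HMM.
    pi[k][t]: best log-score of a sequence ending in tag t at time k.
    bp[k][t]: the tag at time k-1 on that best sequence (backpointer)."""
    n = len(words)
    pi = [{"START": 0.0}]  # log-probabilities; log 1 = 0
    bp = [{}]
    for k in range(1, n + 1):
        pi.append({})
        bp.append({})
        for t in tags:
            e = emission.get(t, {}).get(words[k - 1], 0.0)
            if e == 0.0:
                continue  # tag t cannot emit this word
            best_prev, best_score = None, float("-inf")
            for t_prev, prev_score in pi[k - 1].items():
                tr = transition.get(t_prev, {}).get(t, 0.0)
                if tr == 0.0:
                    continue
                score = prev_score + math.log(tr) + math.log(e)
                if score > best_score:
                    best_prev, best_score = t_prev, score
            if best_prev is not None:
                pi[k][t] = best_score
                bp[k][t] = best_prev
    # Final transition into STOP (tiny floor avoids log(0)),
    # then follow backpointers right to left.
    last = max(pi[n], key=lambda t: pi[n][t] +
               math.log(transition.get(t, {}).get("STOP", 1e-300)))
    seq = [last]
    for k in range(n, 1, -1):
        seq.append(bp[k][seq[-1]])
    return list(reversed(seq))

With the toy tables from the earlier sketch, viterbi("the koala sleeps".split(), ["DT", "NN", "VBZ"], transition, emission) returns ["DT", "NN", "VBZ"].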
Trigram Language Model
• Instead of using a unigram context P(ti | ti-1), use a bigram context P(ti | ti-2, ti-1).
• So the HMM probability for a given tag and word sequence is:
  P(t1, …, tn+1, w1, …, wn) = ∏ i=1..n+1 P(ti | ti-2, ti-1) · ∏ i=1..n P(wi | ti),
  where t0 = t-1 = START and tn+1 = STOP.
• Need to compute: the transition probabilities P(ti | ti-2, ti-1) and the emission probabilities P(wi | ti).
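As a rough sketch of the estimation step, the following counts tag trigrams in a tagged corpus and returns maximum-likelihood transition probabilities. The corpus format (lists of (word, tag) pairs) and the function name are assumptions for illustration:

from collections import defaultdict

def trigram_transitions(tagged_sentences):
    """MLE estimate of P(t_i | t_{i-2}, t_{i-1}).
    Each sentence is a list of (word, tag) pairs; two START symbols
    are padded on the left and STOP on the right, matching the
    bigram-context factorization above."""
    trigram_counts = defaultdict(int)
    bigram_counts = defaultdict(int)
    for sent in tagged_sentences:
        tags = ["START", "START"] + [t for _, t in sent] + ["STOP"]
        for i in range(2, len(tags)):
            trigram_counts[(tags[i - 2], tags[i - 1], tags[i])] += 1
            bigram_counts[(tags[i - 2], tags[i - 1])] += 1
    return {trigram: count / bigram_counts[trigram[:2]]
            for trigram, count in trigram_counts.items()}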
Forward Algorithm
• Input: Sequence of observed words w1, …, wn
• Create a table π, such that each entry π[k,t] contains the sum of the
probabilities of all tag/word sequences ending in tag t at time k.
  Initialize π[0, START] = 1.
• for k=1 to n:
  • for t ∈ T:
    • π[k,t] = Σ over t' ∈ T of π[k-1,t'] · P(t | t') · P(wk | t)
• return Σ over t ∈ T of π[n,t] · P(STOP | t)
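The forward algorithm has the same loop structure as Viterbi with max replaced by sum. A minimal sketch under the same dict-based table assumptions as before; it works in raw probability space for clarity, though a real implementation would use log-space or rescaling to avoid underflow:

def forward(words, tags, transition, emission):
    """Total probability P(w1..wn), summing over all tag sequences.
    pi[k][t]: sum of probabilities of all tag/word sequences ending
    in tag t at time k."""
    n = len(words)
    pi = [{"START": 1.0}]
    for k in range(1, n + 1):
        pi.append({})
        for t in tags:
            e = emission.get(t, {}).get(words[k - 1], 0.0)
            score = e * sum(prev * transition.get(t_prev, {}).get(t, 0.0)
                            for t_prev, prev in pi[k - 1].items())
            if score > 0.0:
                pi[k][t] = score
    # Final transition into STOP.
    return sum(score * transition.get(t, {}).get("STOP", 0.0)
               for t, score in pi[n].items())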
Named Entity Recognition
as Sequence Labeling
• Use 3 tags:
• O - outside of named entity
• I - inside named entity
• B - first word (beginning) of named entity
• Example (tags aligned above words):
     O             O  B        I    O
  … identification of tetronic acid in …
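A small illustrative helper (not from the lecture) that turns a BIO tag sequence back into entity spans; it assumes well-formed BIO tags, where every entity opens with B:

def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end) entity spans.
    B starts a new entity, I continues it, O closes it; the end
    index is exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        # tag == "I": the current entity continues
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = "identification of tetronic acid in".split()
tags = ["O", "O", "B", "I", "O"]
print([tokens[s:e] for s, e in bio_to_spans(tags)])
# -> [['tetronic', 'acid']]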