
Hidden Markov Models and POS Tagging
Natalie Parde
UIC CS 421
Sequence Modeling and Sequence Labeling
• In general: assigning labels to individual tokens or spans of tokens given a longer string of input
• Example: The students were excited about the lecture.
  (article noun verb adjective preposition article noun)

Sequence Labeling
• Objective: Find the label for the next item, based on the labels of other items in the sequence.
  Give me a break!  (verb pronoun determiner noun)
  Did the window break?  (verb determiner noun verb)


Why perform sequence labeling?
• In document-level text classification, models assume that the individual datapoints being classified are disconnected and independent
• Many NLP problems do not satisfy this assumption! Instead, they involve interconnected decisions:
  • each of which is mutually dependent
  • each of which resolves different ambiguities
Example Sequence Labeling Applications
• Named entity recognition
  Natalie Parde [person] works at the University of Illinois at Chicago [organization] and lives in Chicago, Illinois [location].
• Semantic role labeling
  Natalie [agent] drove for 15 hours from Dallas [source] to Chicago [destination] in her hail-damaged Honda Accord [instrument].


This Week's Topics
• Tuesday: Hidden Markov Models, Forward Algorithm, Viterbi Algorithm, Forward-Backward Algorithm
• Thursday: Parts of Speech, POS Tagsets, POS Tagging


Probabilistic Sequence Models
• We can perform multiple, interdependent classifications to address a greater problem
using probabilistic sequence models
• These models can be neural networks, but they can also be lighter-weight alternatives
closer to finite state automata known as hidden Markov models
• Hidden Markov models are probabilistic generative models for sequences that make
predictions based on an underlying set of hidden states

Natalie Parde - UIC CS 421 8


What are Markov Models?
• Finite state automata with probabilistic state transitions
• Markov Property: The future is independent of the past, given the present.
  • In other words, the next state only depends on the current state …it is independent of previous history.
• Also referred to as Markov Chains
Sample Markov Model
[Figure: a Markov chain over states q0-q4, with transition probabilities labeling the arcs; the same values appear in the transition matrix shown later (e.g., q0→q1 = .7, q0→q2 = .1, q0→q3 = .2).]


Sample Markov Model
Using the transition probabilities in the diagram, the probability of the state sequence q3 q2 q1 q4 (starting from q0) is:
P(q3 q2 q1 q4) = P(q3|q0) * P(q2|q3) * P(q1|q2) * P(q4|q1) = .2 * .1 * .2 * .3 = .0012


Hidden Markov Models
• Markov models that assume an underlying set of hidden (unobserved) states in which the model can be at any given time
• Assume probabilistic transitions between states over time
• Assume probabilistic generation of items (e.g., tokens) from states
Formal Definition

• A Hidden Markov Model can be specified by enumerating the following properties:


• The set of states, Q
• A sequence of observation likelihoods, B, also called emission probabilities,
each expressing the probability of an observation being generated from a state i
• A start state, q0, and final state, qF, that are not associated with observations
Natalie Parde - UIC CS 421

13
Sample Hidden Markov Model
[Figure: the Markov chain over q0-q4 from before, now with emission distributions attached to the hidden states:]
  q1: P(x|q1) = .2, P(y|q1) = .4, P(z|q1) = .4
  q2: P(x|q2) = .1, P(y|q2) = .4, P(z|q2) = .5
  q3: P(x|q3) = .7, P(y|q3) = .1, P(z|q3) = .2
Formal Definition
• A Hidden Markov Model can be specified by enumerating the following properties:
  • The set of states, Q
  • A transition probability matrix, A, where each a_ij represents the probability of moving from state i to state j, such that Σ_{j=1}^{N} a_ij = 1 for all i
  • A sequence of T observations, O, each drawn from a vocabulary V = v_1, v_2, …, v_V
  • A sequence of observation likelihoods, B, also called emission probabilities, each expressing the probability of an observation o_t being generated from a state i
  • A start state, q0, and final state, qF, that are not associated with observations, together with transition probabilities out of q0 and into qF
Sample Hidden Markov Model
O = x, y, z
Transition probabilities:
  a01 = .7, a02 = .1, a03 = .2
  a11 = .1, a12 = .4, a13 = .2, a14 = .3
  a21 = .2, a23 = .7, a24 = .1
  a31 = .2, a32 = .1, a33 = .3, a34 = .4
Emission probabilities:
  B1: P(x|q1) = .2, P(y|q1) = .4, P(z|q1) = .4
  B2: P(x|q2) = .1, P(y|q2) = .4, P(z|q2) = .5
  B3: P(x|q3) = .7, P(y|q3) = .1, P(z|q3) = .2
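To make this concrete, here is a minimal sketch (not from the slides) of how the sample HMM above could be stored as plain Python dictionaries; the state names and probability values simply restate the figure.

# A minimal sketch of the sample HMM as Python data structures.
# q0 (start) and q4 (final) emit nothing; q1-q3 have emission distributions.
states = ["q1", "q2", "q3"]

# Transition probabilities: A[i][j] = P(next state j | current state i)
A = {
    "q0": {"q1": 0.7, "q2": 0.1, "q3": 0.2},
    "q1": {"q1": 0.1, "q2": 0.4, "q3": 0.2, "q4": 0.3},
    "q2": {"q1": 0.2, "q3": 0.7, "q4": 0.1},
    "q3": {"q1": 0.2, "q2": 0.1, "q3": 0.3, "q4": 0.4},
}

# Emission probabilities: B[state][symbol] = P(symbol | state)
B = {
    "q1": {"x": 0.2, "y": 0.4, "z": 0.4},
    "q2": {"x": 0.1, "y": 0.4, "z": 0.5},
    "q3": {"x": 0.7, "y": 0.1, "z": 0.2},
}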
Corresponding Transition Matrix

        q0     q1     q2     q3     q4
q0     N/A    .7     .1     .2     N/A
q1     N/A    .1     .4     .2     .3
q2     N/A    .2     N/A    .7     .1
q3     N/A    .2     .1     .3     .4
q4     N/A    N/A    N/A    N/A    N/A


HMMs can also be used for probabilistic text generation!
• More generally, you can use an HMM to generate a sequence of T observations: O = o1, o2, …, oT

Begin in the start state
For t in [1, …, T]:
    Randomly select a new state based on the transition distribution for the current state
    Randomly select an observation from the new state based on the observation distribution for that state
Natalie Parde - UIC CS 421
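A minimal Python sketch of this generation procedure, assuming an HMM stored as transition and emission dictionaries like the ones shown earlier (the names A, B, and the start/final state labels are illustrative, not from the slides):

import random

def generate(A, B, start="q0", final="q4"):
    """Randomly walk the HMM, emitting one observation per visited state."""
    observations = []
    state = start
    while True:
        # Randomly select a new state based on the transition distribution
        next_states = list(A[state].keys())
        weights = list(A[state].values())
        state = random.choices(next_states, weights=weights)[0]
        if state == final:
            break
        # Randomly select an observation from the new state's distribution
        symbols = list(B[state].keys())
        probs = list(B[state].values())
        observations.append(random.choices(symbols, weights=probs)[0])
    return observations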
Sample Text Generation
Using the sample HMM above, with per-state vocabularies:
  q1: the = .3, her = .1, my = .3, Devika's = .3
  q2: dog = .2, cat = .3, lizard = .1, unicorn = .4
  q3: laughed = .5, ate = .2, slept = .3
Starting in q0, repeatedly sampling a next state and then sampling a word from that state's distribution can yield, for example: "my unicorn laughed"
Three Fundamental HMM Problems

• Observation Likelihood: How likely is a particular observation


sequence to occur?
• Decoding: What is the best sequence of hidden states for an
observed sequence?
• What is the best sequence of labels for our test data?
• Learning: What are the transition probabilities and observation
likelihoods that best fit the observation sequence and HMM states?
• How do we empirically fit our training data?

Natalie Parde - UIC CS 421


31
This Week's Topics
• Tuesday: Hidden Markov Models, Forward Algorithm, Viterbi Algorithm, Forward-Backward Algorithm
• Thursday: Parts of Speech, POS Tagsets, POS Tagging


Observation Likelihood
• Given a sequence of observations and an HMM, what is the probability that this sequence was generated by the model?
• Useful for two tasks:
  • Sequence classification
  • Selecting the most likely sequence
Sequence Classification

• Assuming an HMM is available for every possible class,


what is the most likely class for a given observation
sequence?
• Which HMM is most likely to have generated the
sequence?

Natalie Parde - UIC CS 421


34
Most Likely Sequence
• Of two or more possible sequences, which one was most likely generated by a given HMM?
• Example (a "Sarcasm" HMM): "I love long and confusing homework assignments." vs. "Oh, yay, I just looooove long and confusing homework assignments."
How can we compute the observation
likelihood?
• Naïve Solution:
• Consider all possible state sequences, Q, of length T that the model, 𝜆, could
have traversed in generating the given observation sequence, O
• Compute the probability of a given state sequence from A, and multiply it by
the probability of generating the given observation sequence for that state
sequence
• P(O,Q | 𝜆) = P(O | Q, 𝜆) * P(Q | 𝜆)
• Repeat for all possible state sequences, and sum over all to get P(O | 𝜆)
• But, this is computationally complex!
• O(TN^T)

Natalie Parde - UIC CS 421


36
How can we compute the
observation likelihood?

• Efficient Solution:
• Forward Algorithm: Dynamic programming
algorithm that computes the observation
probability by summing over the probabilities
of all possible hidden state paths that could
generate the observation sequence.
• Implicitly folds each of these paths into a
single forward trellis
• Why does this work?
• Markov assumption (the probability of being in
any state at a given time t only relies on the
probability of being in each possible state at
time t-1)
• Works in O(TN^2) time!

Natalie Parde - UIC CS 421 37


How does the forward algorithm work?
• Let α_t(j) be the probability of being in state j after seeing the first t observations, given your HMM λ
• α_t(j) is computed by summing over the probabilities of every path that could lead you to this cell
• α_t(j) = P(o_1, o_2, …, o_t, q_t = j | λ) = Σ_{i=1}^{N} α_{t-1}(i) a_ij b_j(o_t)
  • α_{t-1}(i): the forward path probability from the previous time step
  • a_ij: the transition probability from previous state q_i to current state q_j
  • b_j(o_t): the state observation likelihood of the observed item o_t given the current state j

Natalie Parde - UIC CS 421


38
Formal Algorithm
create a probability matrix forward[N+2, T]
for each state q in [1, …, N] do:
    forward[q, 1] ← a_{0,q} * b_q(o_1)
for each time step t from 2 to T do:
    for each state q in [1, …, N] do:
        forward[q, t] ← Σ_{q'=1}^{N} forward[q', t-1] * a_{q',q} * b_q(o_t)
forwardprob ← Σ_{q=1}^{N} forward[q, T]

Natalie Parde - UIC CS 421 39
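A minimal Python sketch of the forward algorithm above, assuming the HMM is given as dictionaries A (transitions, including a start state "q0") and B (emissions); this is an illustration of the pseudocode, not code from the course:

def forward(observations, states, A, B, start="q0"):
    """Return P(O | lambda) by summing over all hidden state paths."""
    T = len(observations)
    # alpha[t][q] = probability of seeing o_1..o_t and being in state q at time t
    alpha = [{} for _ in range(T)]
    for q in states:                           # initialization (t = 1)
        alpha[0][q] = A[start].get(q, 0.0) * B[q].get(observations[0], 0.0)
    for t in range(1, T):                      # recursion
        for q in states:
            alpha[t][q] = sum(alpha[t - 1][qp] * A[qp].get(q, 0.0) for qp in states) \
                          * B[q].get(observations[t], 0.0)
    return sum(alpha[T - 1][q] for q in states)  # termination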


Sample Problem
• You’re trying to solve a problem that relies on you knowing which
days it was hot and cold in Chicago during the summer of 1923
• Unfortunately, you have no official records of the weather in Chicago
for that summer, although you’re trying to model some key weather
patterns from that year using an HMM
• You do have one promising lead: You find a detailed diary tracking
how many ice cream cones the author of that diary ate on each day
• You decide to focus on a three-day sequence:
• Day 1: 3 ice cream cones
• Day 2: 1 ice cream cone
• Day 3: 3 ice cream cones
• Your first task is to determine whether this HMM does a good job at
modeling your sequence

Natalie Parde - UIC CS 421 40


Your HMM
States: hot, cold (plus a start state q0)
Transition probabilities:
  P(hot|start) = .8, P(cold|start) = .2
  P(hot|hot) = .7, P(cold|hot) = .3
  P(hot|cold) = .4, P(cold|cold) = .6
Emission probabilities (number of ice cream cones eaten per day):
  B1 (hot):  P(1|hot) = .2, P(2|hot) = .4, P(3|hot) = .4
  B2 (cold): P(1|cold) = .5, P(2|cold) = .4, P(3|cold) = .1
Natalie Parde - UIC CS 421 41


Forward Trellis

• Incorporates all the information you’ll need


to implement the forward algorithm
• Observations
• Transition probabilities
• State observation likelihoods
• Forward probabilities from earlier
observations

Natalie Parde - UIC CS 421 42


Forward Step
α_t(j) = Σ_i α_{t-1}(i) a_ij b_j(o_t)
[Figure: one column of the forward trellis. Each state q_j at time t collects the forward probabilities α_{t-1}(i) of every state at time t-1, weighted by the transition probabilities a_ij, and multiplies the sum by the observation likelihood b_j(o_t).]

Natalie Parde - UIC CS 421 43


Forward Trellis (worked example)
Observation sequence: 3, 1, 3

Time 1 (o1 = 3):
  α1(h) = P(h|start) * P(3|h) = .8 * .4 = .32
  α1(c) = P(c|start) * P(3|c) = .2 * .1 = .02

Time 2 (o2 = 1):
  α2(h) = α1(h) * P(h|h) * P(1|h) + α1(c) * P(h|c) * P(1|h) = .32 * .14 + .02 * .08 = .0464
  α2(c) = α1(h) * P(c|h) * P(1|c) + α1(c) * P(c|c) * P(1|c) = .32 * .15 + .02 * .30 = .054

Time 3 (o3 = 3):
  α3(h) = α2(h) * P(h|h) * P(3|h) + α2(c) * P(h|c) * P(3|h) = .0464 * .28 + .054 * .16 = .021632
  α3(c) = α2(h) * P(c|h) * P(3|c) + α2(c) * P(c|c) * P(3|c) = .0464 * .03 + .054 * .06 = .004632

Total observation likelihood:
  P(O|λ) = α3(h) + α3(c) = .021632 + .004632 = .026264
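Using the forward sketch from earlier, the ice cream example can be checked numerically (the state names "hot"/"cold" and the dictionaries below simply restate the HMM from the slides):

states = ["hot", "cold"]
A = {"q0":  {"hot": 0.8, "cold": 0.2},
     "hot": {"hot": 0.7, "cold": 0.3},
     "cold": {"hot": 0.4, "cold": 0.6}}
B = {"hot":  {1: 0.2, 2: 0.4, 3: 0.4},
     "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

print(forward([3, 1, 3], states, A, B))  # 0.026264, matching the trellis above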
We’ve so far • What is the probability that a sequence
tackled one of observations fits a given HMM?
of the • Calculate using forward probabilities!
fundamental • However, there are still two remaining
HMM tasks. tasks to explore….

Natalie Parde - UIC CS 421 53


This Week's Topics
• Tuesday: Hidden Markov Models, Forward Algorithm, Viterbi Algorithm, Forward-Backward Algorithm
• Thursday: Parts of Speech, POS Tagsets, POS Tagging


Decoding
• Given an observation sequence
and an HMM, what is the best
hidden state sequence?
• How do we choose a state
sequence that is optimal in some
sense (e.g., best explains the
observations)?
• Very useful for sequence
labeling!

Natalie Parde - UIC CS 421


55
Decoding

• Naïve Approach:
• For each hidden state sequence Q, compute P(O|Q)
• Pick the sequence with the highest probability
• However, this is computationally inefficient!
• O(N^T)

Natalie Parde - UIC CS 421


56
How can we decode sequences more efficiently?
• Viterbi Algorithm
  • Another dynamic programming algorithm
  • Uses a similar trellis to the Forward algorithm
  • Viterbi time complexity: O(N^2 T)

Natalie Parde - UIC CS 421


57
Viterbi Intuition
• Goal: Compute the joint probability of the observation sequence together with the best state sequence
  • So, recursively compute the probability of the most likely subsequence of states that accounts for the first t observations and ends in state q_j:
    v_t(j) = max_{q_0, q_1, …, q_{t-1}} P(q_0, q_1, …, q_{t-1}, o_1, …, o_t, q_t = q_j | λ)
• Also record backpointers that subsequently allow you to backtrace the most probable state sequence
  • bt_t(j) stores the state at time t-1 that maximizes the probability that the system was in state q_j at time t, given the observed sequence

Natalie Parde - UIC CS 421


58
Formal Algorithm
create a path probability matrix viterbi[N+2, T]
for each state q in [1, …, N] do:
    viterbi[q, 1] ← a_{0,q} * b_q(o_1)
    backpointer[q, 1] ← 0
for each time step t in [2, …, T] do:
    for each state q in [1, …, N] do:
        viterbi[q, t] ← max_{q' ∈ [1, …, N]} viterbi[q', t-1] * a_{q',q} * b_q(o_t)
        backpointer[q, t] ← argmax_{q' ∈ [1, …, N]} viterbi[q', t-1] * a_{q',q} * b_q(o_t)
bestpathprob ← max_{q' ∈ [1, …, N]} viterbi[q', T]
bestpathpointer ← argmax_{q' ∈ [1, …, N]} viterbi[q', T]

Natalie Parde - UIC CS 421 59
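A minimal Python sketch of the Viterbi recursion above, using the same dictionary-based HMM representation as the forward sketch (again an illustration, not official course code):

def viterbi(observations, states, A, B, start="q0"):
    """Return (best path probability, best hidden state sequence)."""
    T = len(observations)
    v = [{} for _ in range(T)]    # v[t][q]: best path probability ending in q at time t
    bp = [{} for _ in range(T)]   # bp[t][q]: backpointer to the best previous state
    for q in states:              # initialization (t = 1)
        v[0][q] = A[start].get(q, 0.0) * B[q].get(observations[0], 0.0)
        bp[0][q] = None
    for t in range(1, T):         # recursion: max instead of sum
        for q in states:
            best_prev = max(states, key=lambda qp: v[t - 1][qp] * A[qp].get(q, 0.0))
            v[t][q] = v[t - 1][best_prev] * A[best_prev].get(q, 0.0) * B[q].get(observations[t], 0.0)
            bp[t][q] = best_prev
    last = max(states, key=lambda q: v[T - 1][q])
    path = [last]
    for t in range(T - 1, 0, -1):  # follow backpointers to recover the state sequence
        path.append(bp[t][path[-1]])
    return v[T - 1][last], list(reversed(path))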


Seem familiar?
• Viterbi is basically the forward algorithm + backpointers!
• Instead of summing across prior forward probabilities, we use a max function
Viterbi Trellis (worked example)
Observation sequence: 3, 1, 3

Time 1 (o1 = 3):
  v1(h) = P(h|start) * P(3|h) = .8 * .4 = .32
  v1(c) = P(c|start) * P(3|c) = .2 * .1 = .02

Time 2 (o2 = 1):
  v2(h) = max(v1(h) * P(h|h) * P(1|h), v1(c) * P(h|c) * P(1|h)) = max(.32 * .14, .02 * .08) = .0448
  v2(c) = max(v1(h) * P(c|h) * P(1|c), v1(c) * P(c|c) * P(1|c)) = max(.32 * .15, .02 * .30) = .048

Time 3 (o3 = 3):
  v3(h) = max(v2(h) * P(h|h) * P(3|h), v2(c) * P(h|c) * P(3|h)) = max(.0448 * .28, .048 * .16) ≈ .01254
  v3(c) = max(v2(h) * P(c|h) * P(3|c), v2(c) * P(c|c) * P(3|c)) = max(.0448 * .03, .048 * .06) = .00288

bestpathprob = max(.01254, .00288) = .01254
Viterbi Backtrace
bestpathprob = max(.01254, .00288) = .01254
Starting from the highest-probability cell at the final time step (v3(h)) and following the backpointers back through v2(h) and v1(h), the best hidden state sequence for observations 3, 1, 3 is: hot, hot, hot.
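Plugging the ice cream HMM into the Viterbi sketch from earlier reproduces this result (states, A, and B as defined in the forward usage example):

prob, path = viterbi([3, 1, 3], states, A, B)
print(prob)   # 0.012544 (≈ .01254 in the trellis above)
print(path)   # ['hot', 'hot', 'hot']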
The Viterbi algorithm is used in many
domains, even beyond text processing!
• Speech recognition
• Given an input acoustic signal, find the most likely sequence of words or
phonemes
• Digital error correction
• Given a received, potentially noisy signal, determine the most likely
transmitted message
• Computer vision
• Given noisy measurements in video sequences, estimate the most likely
trajectory of an object over time
• Economics
• Given historical data, predict financial market states at certain timepoints

Natalie Parde - UIC CS 421 74


This Week's Topics
• Tuesday: Hidden Markov Models, Forward Algorithm, Viterbi Algorithm, Forward-Backward Algorithm
• Thursday: Parts of Speech, POS Tagsets, POS Tagging


Finally …how do we train HMMs?
• If we have a set of observations, can we learn the parameters (transition probabilities and observation likelihoods) directly?
[Figure: several observed ice cream sequences (3 1 3, 2 1 3, 3 3 3, 3 2 2, 1 1 2) feeding into the hot/cold HMM whose transition and emission probabilities we want to estimate.]

Natalie Parde - UIC CS 421 76


Forward-Backward Algorithm

• Special case of expectation-maximization (EM) algorithm


• Input:
• Unlabeled sequence of observations, O
• Vocabulary of hidden states, Q
• Output: Transition probabilities and observation likelihoods

Natalie Parde - UIC CS 421 77


How does the algorithm compute these outputs?
• Iteratively estimate the counts for transitions from one state to another
  • Start with base estimates for a_ij and b_j, and iteratively improve those estimates
• Get estimated probabilities by:
  • Computing the forward probability for an observation
  • Dividing that probability mass among all the different paths that contributed to this forward probability (backward probability)
Backward Algorithm
• We define the backward probability as follows:
  • β_t(i) = P(o_{t+1}, o_{t+2}, …, o_T | q_t = i, λ)
  • Probability of generating partial observations from time t+1 until the end of the sequence, given that the HMM λ is in state i at time t
• Also computed using a trellis, but one that moves backwards instead
Natalie Parde - UIC CS 421
Backward Step
β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),   with β_T(i) = 1
[Figure: one column of the backward trellis. Each state q_i at time t sums over every state q_j at time t+1, weighting β_{t+1}(j) by the transition probability a_ij and the observation likelihood b_j(o_{t+1}).]

Natalie Parde - UIC CS 421 80
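A minimal Python sketch of the backward pass, mirroring the forward sketch (same assumed dictionary representation; the transition into the dedicated final state is omitted for brevity, so this is an illustrative variant rather than the exact slide formulation):

def backward(observations, states, A, B):
    """beta[t][q] = P(o_{t+1}, ..., o_T | q_t = q)."""
    T = len(observations)
    beta = [{} for _ in range(T)]
    for q in states:                 # initialization: beta_T(i) = 1
        beta[T - 1][q] = 1.0
    for t in range(T - 2, -1, -1):   # move backwards through the trellis
        for q in states:
            beta[t][q] = sum(A[q].get(r, 0.0) * B[r].get(observations[t + 1], 0.0) * beta[t + 1][r]
                             for r in states)
    return beta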


For the expectation step of the forward-backward algorithm, we re-estimate transition probabilities and observation likelihoods.
• We re-estimate transition probabilities, a_ij, as follows:
  • Let ξ_t(i, j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / α_T(q_F)
  • Then,
    â_ij = (expected # transitions from state i to state j) / (expected # transitions from state i)
         = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} Σ_{k=1}^{N} ξ_t(i, k)
• Check out the course textbook (Appendix A) for an in-depth discussion of how the numerator and denominator above are derived!

Natalie Parde - UIC CS 421 81


Re-Estimating Observation Likelihood
• We re-estimate b_j as follows:
  • Let γ_t(j) = α_t(j) β_t(j) / α_T(q_F)
  • Then,
    b̂_j(v_k) = (expected # of times in state j and observing symbol v_k) / (expected # of times in state j)
             = Σ_{t=1, s.t. o_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)

Natalie Parde - UIC CS 421 82


Putting it all together, we have the forward-backward algorithm!
initialize A and B
iterate until convergence:

    # Expectation Step
    compute γ_t(j) for all t and j
    compute ξ_t(i, j) for all t, i, and j

    # Maximization Step
    a_ij = â_ij for all i and j
    b_j(v_k) = b̂_j(v_k) for all j, and all v_k in the output vocab V

Natalie Parde - UIC CS 421 83
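A minimal Python sketch of a single expectation-maximization iteration over one observation sequence, using the same dictionary-based HMM representation as before. It is only an illustration: re-estimation of the start-state transitions and handling of the final state are skipped, only observed symbols get emission estimates, and no smoothing is applied.

def forward_backward_step(observations, states, A, B, start="q0"):
    """One EM (Baum-Welch) iteration: return re-estimated copies of A and B."""
    T = len(observations)
    # E-step, part 1: forward and backward lattices
    alpha = [{q: 0.0 for q in states} for _ in range(T)]
    beta = [{q: 1.0 for q in states} for _ in range(T)]
    for q in states:
        alpha[0][q] = A[start].get(q, 0.0) * B[q].get(observations[0], 0.0)
    for t in range(1, T):
        for q in states:
            alpha[t][q] = sum(alpha[t - 1][p] * A[p].get(q, 0.0) for p in states) \
                          * B[q].get(observations[t], 0.0)
    for t in range(T - 2, -1, -1):
        for q in states:
            beta[t][q] = sum(A[q].get(r, 0.0) * B[r].get(observations[t + 1], 0.0) * beta[t + 1][r]
                             for r in states)
    likelihood = sum(alpha[T - 1][q] for q in states)
    # E-step, part 2: expected state occupancies (gamma) and transitions (xi)
    gamma = [{q: alpha[t][q] * beta[t][q] / likelihood for q in states} for t in range(T)]
    xi = [{(i, j): alpha[t][i] * A[i].get(j, 0.0) * B[j].get(observations[t + 1], 0.0)
                   * beta[t + 1][j] / likelihood
           for i in states for j in states}
          for t in range(T - 1)]
    # M-step: normalize expected counts into new probabilities
    new_A = {i: {j: sum(x[(i, j)] for x in xi) / sum(g[i] for g in gamma[:-1])
                 for j in states}
             for i in states}
    new_B = {j: {v: sum(gamma[t][j] for t in range(T) if observations[t] == v)
                    / sum(g[j] for g in gamma)
                 for v in set(observations)}
             for j in states}
    return new_A, new_B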


Summary: Hidden Markov Models
• HMMs are probabilistic generative models for sequences
  • They make predictions based on underlying hidden states
• Three fundamental HMM problems include:
  • Computing the likelihood of a sequence of observations
  • Determining the best sequence of hidden states for an observed sequence
  • Learning HMM parameters given an observation sequence and a set of hidden states
• Observation likelihood can be computed using the forward algorithm
• Sequences of hidden states can be decoded using the Viterbi algorithm
• HMM parameters can be learned using the forward-backward algorithm

Natalie Parde - UIC CS 421 84


This Week's Topics
• Tuesday: Hidden Markov Models, Forward Algorithm, Viterbi Algorithm, Forward-Backward Algorithm
• Thursday: Parts of Speech, POS Tagsets, POS Tagging


What are parts
of speech?
• Traditional (broad) categories:
• noun
• verb
• adjective
• adverb
• preposition
• article
• interjection
• pronoun
• conjunction
• Sometimes also referred to as lexical
categories, word classes, or morphological
classes

Natalie Parde - UIC CS 421 86


Parts of Speech
• Noun: people, places, or things (doctor, mountain, cellphone….)
• Verb: actions or states (eat, sleep, be….)
• Adjective: descriptive attributes (purple, triangular, windy….)
• Adverb: modifies other words by answering how, in what way, when, where, and to what extent questions (gently, quite, quickly….)
Parts of Speech
• Pronoun: refers to nouns mentioned elsewhere (he, she, you….)
• Preposition: describes a relationship between a noun/pronoun and another word in the clause (on, above, to….)
• Article: indicates specificity (a, an, the….)
• Interjection: exclamations (oh, yikes, ah….)
• Conjunction: coordinates words in the same clause or connects multiple clauses/sentences (and, but, if….)
What is part-of-speech (POS) tagging?
• The process of automatically assigning grammatical word classes to individual tokens in text.
Why is POS tagging useful?
• Even when using end-to-end approaches or pretrained LLMs, POS tagging is useful.
• Offers an avenue for interpretable linguistic analysis!


POS Tag Categories
Each POS type falls into one of two larger classes:

• Open
• Closed

Open class:

• New members can be created at any time


• In English:
• Nouns, verbs, adjectives, and adverbs
• Many (but not all!) languages have these four classes

Closed class:

• A small, fixed membership …new members cannot be created spontaneously


• Usually function words
• In English:
• Prepositions and auxiliaries (may, can, been, etc.)

Natalie Parde - UIC CS 421 92


Finer-Grained POS Classes
• Broader POS classes often have smaller subclasses
  • Noun:
    • Proper (Illinois)
    • Common (state)
  • Verb:
    • Main (tweet)
    • Modal (had)
• Some subclasses of a part of speech might be open, while others are closed
Open Class
• Nouns: Proper (IBM, Italy), Common (cat / cats, snow)
• Verbs: Main (see, registered)
• Adjectives: old, older, oldest
• Adverbs: slowly

Closed Class
• Determiners: the, some
• Modal verbs: can, had
• Prepositions: to, with
• Conjunctions: and, or


POS Tagging
• Can be very challenging!
• Words often have more than one valid part of speech tag
  • Today's faculty meeting went really well! = adverb
  • Do you think the undergrads are well? = adjective
  • Well, did you see the latest response to your email? = interjection
  • Jurafsky and Martin's book is a well of information. = noun
  • Laughter began to well up inside her at, as always, a highly inconvenient time. = verb


POS Tagging
• Goal: Determine the best POS tag for a particular instance of a word.
  Give me a break!  (verb pronoun determiner noun)
  Did the window break?  (verb determiner noun verb)


This Week's Topics
• Tuesday: Hidden Markov Models, Forward Algorithm, Viterbi Algorithm, Forward-Backward Algorithm
• Thursday: Parts of Speech, POS Tagsets, POS Tagging


POS Tagsets
In order to determine which POS tag to assign to a
word, we first need to decide which tagset we will use

Tagset: A finite set of POS tags, where each tag


defines a distinct grammatical role

Can range from very coarse to very fine

Natalie Parde - UIC CS 421 98


Penn Treebank Tagset
• Most common POS tagset
• 36 POS tags + 12 other tags (punctuation and currency)
• Used when developing the Penn Treebank, a corpus created at the University of
Pennsylvania containing more than 4.5 million words of American English
• Link to documentation: https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html

Natalie Parde - UIC CS 421 99


Penn Treebank Tagset

CC    Coordinating conjunction                    NNS   Noun, plural               TO    to
CD    Cardinal number                             NNP   Proper noun, singular      UH    Interjection
DT    Determiner                                  NNPS  Proper noun, plural        VB    Verb, base form
EX    Existential there                           PDT   Predeterminer              VBD   Verb, past tense
FW    Foreign word                                POS   Possessive ending          VBG   Verb, gerund or present participle
IN    Preposition or subordinating conjunction    PRP   Personal pronoun           VBN   Verb, past participle
JJ    Adjective                                   PRP$  Possessive pronoun         VBP   Verb, non-3rd person singular present
JJR   Adjective, comparative                      RB    Adverb                     VBZ   Verb, 3rd person singular present
JJS   Adjective, superlative                      RBR   Adverb, comparative        WDT   Wh-determiner
LS    List item marker                            RBS   Adverb, superlative        WP    Wh-pronoun
MD    Modal                                       RP    Particle                   WP$   Possessive wh-pronoun
NN    Noun, singular or mass                      SYM   Symbol                     WRB   Wh-adverb


What do some of these distinctions mean?
• city = NN (noun, singular or mass)
• cities = NNS (noun, plural)
• Chicago = NNP (proper noun, singular)
• Chicagos = NNPS (proper noun, plural)


What do some of these distinctions mean?
• eat = VB (verb, base form) or VBP (verb, non-3rd person singular present)
• eats = VBZ (verb, 3rd person singular present)
• ate = VBD (verb, past tense)
• eating = VBG (verb, gerund or present participle)
• eaten = VBN (verb, past participle)
• should = MD (modal)


What do some of these distinctions mean?
• weird = JJ (adjective)
• weirder = JJR (adjective, comparative)
• weirdest = JJS (adjective, superlative)


What do some of these distinctions mean?
• calmly = RB (adverb)
• calmer = RBR (adverb, comparative)
• calmest = RBS (adverb, superlative)


As a general (but not perfect!) rule….
• The function-word tags in the Penn Treebank tagset (determiners, prepositions, pronouns, modals, conjunctions, particles, and similar) correspond to the closed class.
• The content-word tags (the noun, verb, adjective, and adverb tags) correspond to the open class.


Other Popular POS Tagsets
• Brown Corpus tagset: ~1 million words of American English text; 82 (!) POS tags
• C5 tagset: text from the British National Corpus; 61 POS tags
• C7 tagset: text from the British National Corpus; 146 (!!) POS tags


This Week's Topics
• Tuesday: Hidden Markov Models, Forward Algorithm, Viterbi Algorithm, Forward-Backward Algorithm
• Thursday: Parts of Speech, POS Tagsets, POS Tagging


So …how can we assign POS tags?

Natalie Parde - UIC CS 421 109


So …how can we assign POS tags?
Time flies like an arrow; fruit flies like a banana
[The slides step through this sentence word by word, consulting the Penn Treebank tagset (shown above) for each token's candidate tags.]
Ambiguity is a big issue for
POS taggers!

• Many words have multiple senses


• time = noun, verb
• flies = noun, verb
• like = verb, preposition

Natalie Parde - UIC CS 421 120


Just how ambiguous is natural language?
• Brown Corpus: Approximately 11% of word types have multiple valid part of speech labels
• These tend to be very common words!
  • We think that the meeting will only last two more hours. = IN
  • Was that the 32nd Piazza post today? = DT
  • You can't eat that many donuts every time the clock strikes midnight! = RB
• Overall, ~40% of word tokens are instances of ambiguous word types
Despite this, modern POS taggers still work quite well.
• Accuracy > 97%
• Even a simple baseline can achieve ~90% accuracy
  • Tag every word with its most frequent tag
  • Tag unknown words as nouns
Natalie Parde - UIC CS 421 122
How do POS taggers work?
Rule-Based POS Tagging

Start with a dictionary, and assign all relevant tags to the


words in that dictionary

Manually design rules to selectively remove invalid tags for


test instances in context

Keep the remaining correct tag for each word

Natalie Parde - UIC CS 421 124


Example Rule-Based Approach
• Start with a dictionary that specifies permissible tags for our small vocabulary:
  • she: PRP
  • promised: VBN, VBD
  • to: TO
  • back: VB, JJ, RB, NN
  • the: DT
  • bill: NN, VB


Example Rule-Based Approach
Assign every possible tag to each word in the sequence:

  she    promised    to    back    the    bill
  PRP    VBN         TO    VB      DT     NN
         VBD               JJ             VB
                           RB
                           NN


Example Rule-Based Approach
Apply rules to eliminate invalid tags:
Eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP"

  she    promised    to    back    the    bill
  PRP    VBD         TO    VB      DT     NN
                           JJ             VB
                           RB
                           NN


Example Rule-Based Approach
Keep the remaining correct tag for each word: once all rules have been applied, each word is left with a single tag (e.g., she/PRP promised/VBD to/TO back/VB the/DT bill/NN).


Rule-based POS taggers are an adequate baseline, but….
• Like all rule-based methods, they carry important disadvantages:
  • Time-consuming to build
  • Difficult to update or generalize to new domains
  • Might miss important patterns latent in the specified text domain

Natalie Parde - UIC CS 421 129


Nice alternative to rule-
based POS tagging?
• Statistical POS Tagging: POS taggers that make
decisions based on learned knowledge of POS tag
distribution in a training corpus
• the is usually tagged as DT
• Words with uppercase letters are more likely to be
tagged NNP or NNPS
• Words starting with the prefix un- may be tagged JJ
• Words ending with the suffix –ly may be tagged RB

Natalie Parde - UIC CS 421 130


Simple Statistical POS Tagger

• Using a training corpus, determine the most frequent tag for each
word
• Assign POS tags to new words based on those frequencies
• Assign NN to new words for which there is no information from the
training corpus

I saw a wampimuk at the zoo yesterday!

Natalie Parde - UIC CS 421 131


Simple Statistical POS Tagger
• Using a training corpus, determine the most frequent tag for each word
• Assign POS tags to new words based on those frequencies
• Assign NN to new words for which there is no information from the training corpus

  I (95% PRP)  saw (75% VBD)  a (95% DT)  wampimuk (???)  at (90% IN)  the (95% DT)  zoo (90% NN)  yesterday (85% NN)

  → I/PRP saw/VBD a/DT wampimuk/NN at/IN the/DT zoo/NN yesterday/NN

Natalie Parde - UIC CS 421 133
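A minimal sketch of the most-frequent-tag baseline, assuming the training corpus is available as (word, tag) pairs; the tiny corpus in the usage example is illustrative only.

from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_corpus):
    """tagged_corpus: iterable of (word, tag) pairs from a hand-tagged corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    # Keep only the single most frequent tag for each word type
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}

def tag(sentence, word_to_tag, unknown_tag="NN"):
    # Unknown words (e.g., "wampimuk") fall back to NN
    return [(w, word_to_tag.get(w, unknown_tag)) for w in sentence]

word_to_tag = train_most_frequent_tag([("saw", "VBD"), ("saw", "NN"), ("saw", "VBD"),
                                       ("the", "DT"), ("zoo", "NN")])
print(tag("I saw a wampimuk at the zoo yesterday !".split(), word_to_tag))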


Simple Statistical POS Tagger

• This approach works reasonably well
  • Approximately 90% accuracy
• However, we can do much better!
  • One way to improve upon our results is to use HMMs
Bigram HMM POS Tagger

• To determine the tag ti for a single word wi:
  • ti = argmax over all tags tj in the tagset of P(tj | ti-1) × P(wi | tj)
• This means we need to be able to compute two probabilities:
  • The probability that the tag is tj given that the previous tag is ti-1: P(tj | ti-1)
  • The probability that the word is wi given that the tag is tj: P(wi | tj)
• We can compute both of these from corpora like the Penn Treebank or the Brown Corpus
• Then, we can find the optimal sequence of tags using the Viterbi algorithm!

(A small code sketch of this per-word decision rule appears below.)
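The per-word decision rule can be written directly as a small function. This sketch assumes the transition and emission probabilities have already been estimated and stored in dictionaries keyed by tag pairs and (word, tag) pairs; the probability values in the usage example are the Brown Corpus estimates used later in these slides.

def best_tag(word, prev_tag, tagset, trans, emit):
    """ti = argmax over tj of P(tj | ti-1) * P(wi | tj).

    trans[(t_prev, t)] approximates P(t | t_prev);
    emit[(word, t)] approximates P(word | t).
    """
    return max(tagset, key=lambda t: trans.get((prev_tag, t), 0.0) * emit.get((word, t), 0.0))

trans = {("TO", "VB"): 0.83, ("TO", "NN"): 0.00047}
emit = {("fly", "VB"): 0.00012, ("fly", "NN"): 0.00057}
print(best_tag("fly", "TO", ["VB", "NN"], trans, emit))  # -> "VB"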


Example: Bigram HMM Tagger

• Given two possible sequences of tags from the Brown Corpus tagset for the following sentence, what is the best way to tag the word “fly”?

Superman/NNP is/VBZ expected/VBN to/TO fly/VB tomorrow/NR
Superman/NNP is/VBZ expected/VBN to/TO fly/NN tomorrow/NR

(NNP = Proper noun, singular; VBZ = Verb, 3rd person singular present; VBN = Verb, past participle; TO = Infinitive to; NR = Adverbial noun)
Example: Bigram HMM Tagger

• Since we’re creating a bigram HMM tagger and focusing on the word “fly,” we only need to be concerned with the subsequence “to fly tomorrow”
• For simplicity when decoding, we’ll assume that:
  • The first word in the subsequence for sure has label TO (v0(TO) = 1.0)
  • The word “tomorrow” for sure has label NR (P(“tomorrow”|NR) = 1.0)
Example: Bigram HMM Tagger

We have the following HMM sample:

[HMM state diagram: states start0, VB1, TO2, NN3, and NR4, connected by transition probabilities aij (a01, a02, a03, a11, a12, a13, a14, a21, a22, a23, a24, a31, a32, a33, a34)]

The specific transition probabilities we are interested in are a21 (TO → VB), a23 (TO → NN), a14 (VB → NR), and a34 (NN → NR).
Example: Bigram HMM Tagger

• We can estimate the transition probabilities for a21, a23, a34, and a14 using frequency counts from the Brown Corpus:
  • P(ti | ti-1) = C(ti-1 ti) / C(ti-1)
• So, P(NN|TO) = C(TO NN) / C(TO) = 0.00047
• Likewise, P(VB|TO) = C(TO VB) / C(TO) = 0.83
• P(NR|VB) = C(VB NR) / C(VB) = 0.0027
• Finally, P(NR|NN) = C(NN NR) / C(NN) = 0.0012

(A minimal sketch of this count-based estimation appears below.)
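A minimal sketch of estimating transition probabilities by relative frequency, assuming the training data is available as one tag sequence per sentence; the two-sentence corpus in the usage example is illustrative only.

from collections import Counter

def estimate_transitions(tag_sequences):
    """P(ti | ti-1) = C(ti-1 ti) / C(ti-1), estimated from tagged sentences."""
    bigram_counts, history_counts = Counter(), Counter()
    for tags in tag_sequences:
        for prev_tag, tag in zip(tags, tags[1:]):
            bigram_counts[(prev_tag, tag)] += 1
            history_counts[prev_tag] += 1
    return {bigram: count / history_counts[bigram[0]]
            for bigram, count in bigram_counts.items()}

trans = estimate_transitions([["PRP", "TO", "VB", "NR"], ["NNP", "TO", "VB"]])
print(trans[("TO", "VB")])  # 1.0 in this tiny corpus; ~0.83 in the Brown Corpus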
Example: Bigram HMM Tagger

• We have our transition probabilities …what now?
  • Observation likelihoods!
• We can also estimate these using frequency counts from the Brown Corpus:
  • P(wi | ti) = C(wi, ti) / C(ti)
• Since we’re trying to decide the best tag for “fly,” we need to compute both P(fly|VB) and P(fly|NN)
  • P(fly|VB) = C(fly, VB) / C(VB) = 0.00012
  • P(fly|NN) = C(fly, NN) / C(NN) = 0.00057

(A matching sketch for these emission estimates appears below.)
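The observation (emission) likelihoods can be estimated the same way; a minimal sketch assuming the training data is available as (word, tag) pairs, with an illustrative four-pair corpus.

from collections import Counter

def estimate_emissions(tagged_corpus):
    """P(wi | ti) = C(wi, ti) / C(ti), estimated from (word, tag) pairs."""
    pair_counts, tag_counts = Counter(), Counter()
    for word, tag in tagged_corpus:
        pair_counts[(word, tag)] += 1
        tag_counts[tag] += 1
    return {pair: count / tag_counts[pair[1]] for pair, count in pair_counts.items()}

emit = estimate_emissions([("fly", "VB"), ("run", "VB"), ("fly", "NN"), ("time", "NN")])
print(emit[("fly", "VB")], emit[("fly", "NN")])  # 0.5 0.5 here; 0.00012 and 0.00057 in Brown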
Example: Bigram HMM Tagger

• Now, to decide how to tag “fly,” we can consider our two possible sequences:
  • to (TO) fly (VB) tomorrow (NR)
  • to (TO) fly (NN) tomorrow (NR)
• We will select the tag that maximizes the probability P(ti|TO) × P(NR|ti) × P(fly|ti)
• We determine that:
  • P(VB|TO)P(NR|VB)P(fly|VB) = 0.83 * 0.0027 * 0.00012 = 0.00000027 ← optimal sequence!
  • P(NN|TO)P(NR|NN)P(fly|NN) = 0.00047 * 0.0012 * 0.00057 = 0.00000000032
Example: Bigram HMM Tagger

• Visualized in a Viterbi trellis (one column per word in “to fly tomorrow,” one row per state TO, VB, NN, NR), this would look like:
  • v0(TO) = 1.0
  • v1(VB) = 1.0 * 0.83 * 0.00012 = 9.96×10^-5
  • v1(NN) = 1.0 * 0.00047 * 0.00057 = 2.68×10^-7
  • v2(NR) = max(2.68×10^-7 * 0.0012 * 1.0, 9.96×10^-5 * 0.0027 * 1.0) = 0.00000027

(The sketch below reproduces these trellis values.)
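These trellis values can be reproduced in a few lines. The sketch below hard-codes the Brown Corpus estimates from the example and the two simplifying assumptions (v0(TO) = 1.0 and P(“tomorrow”|NR) = 1.0); it is a worked illustration of the trellis, not a general Viterbi implementation.

# Probabilities from the worked example (Brown Corpus estimates).
trans = {("TO", "VB"): 0.83, ("TO", "NN"): 0.00047,
         ("VB", "NR"): 0.0027, ("NN", "NR"): 0.0012}
emit = {("fly", "VB"): 0.00012, ("fly", "NN"): 0.00057, ("tomorrow", "NR"): 1.0}

v0 = {"TO": 1.0}  # "to" is assumed to be TO with certainty
# Score each candidate tag for "fly"
v1 = {t: v0["TO"] * trans[("TO", t)] * emit[("fly", t)] for t in ("VB", "NN")}
# Best path into NR for "tomorrow"
v2_NR = max(v1[t] * trans[(t, "NR")] * emit[("tomorrow", "NR")] for t in ("VB", "NN"))
best_fly_tag = max(("VB", "NN"), key=lambda t: v1[t] * trans[(t, "NR")])

print(v1)           # {'VB': 9.96e-05, 'NN': 2.679e-07}
print(v2_NR)        # ~2.7e-07
print(best_fly_tag) # VB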
Neural Sequence Modeling

• Use a sequential or pretrained neural network architecture
  • Recurrent neural networks
  • Transformers
• Predict a label for each item in the input sequence
  • If using a subword vocabulary, you will need to merge the labels predicted for all subwords in a word

[Diagram: a recurrent network unrolled over the input “a delicious latte,” with hidden states h0 through h3 and one predicted tag per token (determiner, adjective, noun)]

(A rough architecture sketch follows below.)
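As a rough illustration only (assuming PyTorch; a sketch of the architecture, not a complete training script), a recurrent tagger embeds each token, runs an LSTM over the sequence, and predicts one tag per position. All names, dimensions, and token ids here are illustrative.

import torch
import torch.nn as nn

class RNNTagger(nn.Module):
    """Embed each token, run a bidirectional LSTM, and score tags per position."""
    def __init__(self, vocab_size, tagset_size, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, token_ids):       # token_ids: (batch, seq_len)
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden)         # (batch, seq_len, tagset_size): one score vector per token

# Hypothetical usage: "a delicious latte" -> ids [4, 17, 9]; predict a tag per token.
model = RNNTagger(vocab_size=5000, tagset_size=45)
logits = model(torch.tensor([[4, 17, 9]]))
predicted_tags = logits.argmax(dim=-1)  # e.g., indices for determiner, adjective, noun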


How can POS taggers handle unknown words?

• New words are continually added to language, so it is likely that a POS tagger will encounter words not found in its training corpus
• Easy baseline approach: Assume that unknown words are nouns
• More sophisticated approach: Assume that unknown words have a probability distribution similar to other words occurring only once in the training corpus, and make an (informed) random choice
• Even more sophisticated approach: Use morphological information to choose the POS tag (for example, words ending with “ed” tend to be tagged VBN), as in the sketch below
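A minimal sketch of the morphological fallback described above; the specific surface cues and tags are illustrative, not an exhaustive or authoritative rule set.

def guess_unknown_tag(word):
    """Guess a tag for an out-of-vocabulary word from simple surface cues."""
    if word[0].isupper():
        return "NNP"   # capitalized words are often proper nouns
    if word.endswith("ed"):
        return "VBN"   # past participle guess
    if word.endswith("ly"):
        return "RB"    # adverb guess
    if word.endswith("s"):
        return "NNS"   # plural noun guess
    return "NN"        # default: noun

print(guess_unknown_tag("wampimuk"))  # NN
print(guess_unknown_tag("glorped"))   # VBN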


Evaluation Metrics for POS Taggers

• Common metrics for POS taggers are:
  • Accuracy
  • Precision
  • Recall
  • F1

(A small computation sketch follows below.)
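Token-level accuracy can be computed directly from aligned gold and predicted tag sequences; per-tag precision, recall, and F1 follow the standard definitions. The sketch below is illustrative, with scikit-learn shown only as an optional convenience.

def accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)

gold = ["PRP", "VBD", "DT", "NN", "IN", "DT", "NN", "NN"]
pred = ["PRP", "VBD", "DT", "NN", "IN", "DT", "NN", "RB"]
print(accuracy(gold, pred))  # 0.875

# Per-tag precision, recall, and F1 (optional; requires scikit-learn):
# from sklearn.metrics import classification_report
# print(classification_report(gold, pred))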
Comparison

• The scores computed for these metrics should be compared to alternative POS tagging methods, to place the values in context
  • Is this a good accuracy, or just okay?
• It’s good to compare to both a lower-bound baseline and an upper-bound ceiling
  • Baseline: What should your POS tagger definitely perform better than?
    • Most Frequent Class
  • Ceiling: What is the highest possible value for this task?
    • Human Agreement


What factors can impact performance?

• Many factors can lead to your results being higher or lower than expected!
• Some common factors:
  • The size of the training dataset
  • The specific characteristics of your tag set
  • The difference between your training and test corpora
  • The number of unknown words in your test corpus
Summary: Part-of-Speech Tagging

• POS tagging is the process of automatically assigning grammatical word classes (parts of speech) to individual tokens
• The most common POS tagset is the Penn Treebank tagset
• Ambiguity is common in natural language, and is a major issue that POS taggers must address
• Although POS taggers can be designed using many approaches, statistical (and neural) models are most common
