
Lecture 7: Part of Speech Tagging

Instructor: Jackie CK Cheung & David Adelani


COMP-550
J&M Ch. 8.1–8.3 (1st ed); J&M Ch. 5.1–5.3
(2nd ed); J&M Ch. 8.1–8.4 (3rd ed)
Lecture cancellation
October 2 is cancelled due to NLP Workshop at MILA
• Register to attend online

https://mila.quebec/en/event/workshop-nlp-in-the-era-of-generative-ai-cognitive-sciences-and-societal-transformation
So Far In the Course
Making a single prediction from a sequence
→ text classification
Predicting the sequence itself
→ language modelling

Today:
Making a series of predictions from a sequence, one per token in the sequence
→ sequence labelling
• particular application: part-of-speech tagging
Outline
Parts of speech in English
POS tagging as a sequence labelling problem
Markov chains revisited
Hidden Markov models

Parts of Speech in English
Nouns: restaurant, me, dinner
Verbs: find, eat, is
Adjectives: good, vegetarian
Prepositions: in, of, up, above
Adverbs: quickly, well, very
Determiners: the, a, an
What is a Part of Speech?
A kind of syntactic category that tells you some of the
grammatical properties of a word.

The __________ was delicious.
• Only a noun fits here.

This hamburger is ___________ than that one.
• Only a comparative adjective fits.

The cat ate. (OK – grammatical)
*The cat enjoyed. (Ungrammatical. Note the *.)
Important Note
You may have learned in grade school that nouns =
things, verbs = actions. This is wrong!

Nouns that can be actions or events:
• Examination, wedding, construction, opening

Verbs that are not necessarily actions:
• Be, have, want, enjoy, remember, realize
Penn Treebank Tagset
CC    Coordinating conjunction
CD    Cardinal number
DT    Determiner
EX    Existential there
FW    Foreign word
IN    Preposition; subordinating conjunction
JJ    Adjective
JJR   Adjective, comparative
JJS   Adjective, superlative
LS    List item marker
MD    Modal
NN    Noun, singular or mass
NNS   Noun, plural
NNP   Proper noun, singular
NNPS  Proper noun, plural
PDT   Predeterminer
POS   Possessive ending
PRP   Personal pronoun
PRP$  Possessive pronoun
RB    Adverb
RBR   Adverb, comparative
RBS   Adverb, superlative
RP    Particle
SYM   Symbol
TO    to
UH    Interjection
VB    Verb, base form
VBD   Verb, past tense
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBP   Verb, non-3rd person singular present
VBZ   Verb, 3rd person singular present
WDT   Wh-determiner
WP    Wh-pronoun
WP$   Possessive wh-pronoun
WRB   Wh-adverb
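To see these tags in practice, here is a minimal sketch using NLTK's off-the-shelf tagger, which outputs Penn Treebank tags. It assumes NLTK is installed and its tokenizer and tagger models have been downloaded (exact model names vary across NLTK versions):

# Sketch: tagging a sentence with Penn Treebank tags via NLTK.
# Assumes prior downloads, e.g.:
#   nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
import nltk

tokens = nltk.word_tokenize("The cat ate the sad hamburger quickly.")
print(nltk.pos_tag(tokens))
# Roughly: [('The', 'DT'), ('cat', 'NN'), ('ate', 'VBD'), ('the', 'DT'),
#           ('sad', 'JJ'), ('hamburger', 'NN'), ('quickly', 'RB'), ('.', '.')]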
Other Parts of Speech
Modals and auxiliary verbs
• The police can and will catch the fugitives.
• Did the chicken cross the road?
In English, these play an important role in question
formation, and in specifying tense, aspect and mood.
Conjunctions
• and, or, but, yet
They connect and relate elements.
Particles
• look up, turn down
Can be parts of particle verbs. May have other functions
(depending on what you consider a particle).
Classifying Parts of Speech: Open Class
Open classes are parts of speech for which new words
are readily added to the language (neologisms).
• Nouns Twitter, Kleenex, turducken
• Verbs google, photoshop
• Adjectives Pastafarian, sick
• Adverbs automagically
• Interjections D’oh!
• More at https://neologisms.rice.edu/word/browse
Open class words usually convey most of the content.
They tend to be content words.

Closed Class
Closed classes are parts of speech for which new words
tend not to be added.
• Pronouns I, he, she, them, their
• Determiners a, the
• Quantifiers some, all, every
• Conjunctions and, or, but
• Modals and auxiliaries might, should, ought
• Prepositions to, of, from
Closed classes tend to convey grammatical information.
They tend to be function words.

Universal Dependencies Tagset
Open classes:
ADJ    Adjective
ADV    Adverb
INTJ   Interjection
NOUN   Noun
PROPN  Proper noun
VERB   Verb

Closed classes:
ADP    Adposition
AUX    Auxiliary
CCONJ  Coordinating conjunction
DET    Determiner
NUM    Numeral
PART   Particle
PRON   Pronoun
SCONJ  Subordinating conjunction

Other:
PUNCT  Punctuation
SYM    Symbol
X      Other

https://universaldependencies.org/u/pos/index.html
Corpus Differences
How fine-grained do you want your tags to be?
e.g., PTB tagset distinguishes singular from plural nouns
• NN cat, water
• NNS cats

e.g., PTB doesn’t distinguish between intransitive and transitive verbs
• VBD listened (intransitive)
• VBD heard (transitive)

Brown corpus (87 tags) vs. PTB (45)
Language Differences
Languages differ widely in which parts of speech they
have, and in their specific functions and behaviours.
• In Japanese, there is no great distinction between nouns and
pronouns. Pronouns are open class. On the other hand, true verbs
are a closed class.
• I in Japanese: watashi, watakushi, ore, boku, atashi, …
• In Wolof (Niger-Congo language spoken in West Africa),
verbs are not conjugated for person and tense. Instead,
pronouns are.
• maa ngi (1st person, singular, present continuous perfect)
• naa (1st person, singular, past perfect)
• In Salishan languages (in the Pacific Northwest), the
distinction between nouns and verbs is subtle or possibly
non-existent (disputed) (Kinkade, 1983).

Exercise
Give coarse POS tag labels to the following passage:

A Canadian geography nerd has become a bit of a TikTok sensation in
Iceland after he wowed a social media influencer with his detailed
knowledge of the country.
POS Tagging
Assume we have a tagset and a corpus with words
labelled with POS tags. What kind of problem is this?
Supervised or unsupervised?
Classification or regression?

Difference from classification that we saw last class: context matters!
• I saw the …
• The team won the match …
• Several cats …
Sequence Labelling
Predict labels for an entire sequence of inputs:
?      ?      ? ?  ?     ?   ? ?    ?    ?   ?
Pierre Vinken , 61 years old , will join the board …

NNP    NNP    , CD NNS   JJ  , MD   VB   DT  NN
Pierre Vinken , 61 years old , will join the board …

Must consider:
• Current word
• Previous context
Markov Chains
Our model will assume an underlying Markov process
that generates the POS tags and words.
You’ve already seen Markov processes:
• Morphology: transitions between morphemes that make
up a word
• N-gram models: transitions between words that make up
a sentence
In other words, they are highly related to finite state
automata

Observable Markov Model
• N states that represent unique observations about the world.
• Transitions between states are weighted; the weights of all
outgoing edges from a state sum to 1.
• e.g., this is a bigram model of the text "the car of ants ran":
[Diagram: a chain of word states (the, car, of, ants, ran) with
weighted transition edges]
• What would a trigram model look like?
Unrolling the Timesteps
A walk along the states in the Markov chain generates
the text that is observed:

the car of ants ran

The probability of the observation is the product of all the edge
weights (i.e., transition probabilities).
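As a small sketch of this computation (the transition probabilities below are invented for illustration only):

# Probability of "the car of ants ran" under a bigram Markov chain,
# with hypothetical transition probabilities.
transition = {
    ("<s>", "the"): 0.1, ("the", "car"): 0.02, ("car", "of"): 0.05,
    ("of", "ants"): 0.001, ("ants", "ran"): 0.01,
}
words = ["<s>", "the", "car", "of", "ants", "ran"]
prob = 1.0
for prev, curr in zip(words, words[1:]):
    prob *= transition[(prev, curr)]  # multiply edge weights along the walk
print(prob)  # ≈ 1e-09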

Hidden Variables
The POS tags to be predicted are hidden variables. We
don’t see them during test time (and sometimes not
during training either).
It is very common to have hidden phenomena:
• Encrypted symbols are outputs of hidden messages
• Genes are outputs of functional relationships
• Weather is the output of hidden climate conditions
• Stock prices are the output of market conditions
• …

Markov Process w/ Hidden Variables
Model transitions between POS tags, and outputs
(“emits”) a word which is observed at each timestep.

[Diagram: HMM fragment with hidden tag states VB, NN, DT, JJ,
weighted transitions between them (e.g., 0.7, 0.27, 0.04), and
per-state emission probabilities:
VB → be 0.15, have 0.07, do 0.04, …
NN → thing 0.03, stuff 0.015, market 0.006, …
DT → the 0.55, a 0.35, an 0.05, …
JJ → good 0.06, bad 0.35, …]
Unrolling the Timesteps
Now, the sample looks something like this:

DT  NN  IN NNS  VBD
the car of ants ran
Probability of a Sequence
Suppose we know both the sequence of POS tags and
words generated by them:
P(The/DT car/NN of/IN ants/NNS ran/VBD)
= P(DT)                               initial
× P(DT → The)                         emission
× P(DT → NN) × P(NN → car)            transition, emission
× P(NN → IN) × P(IN → of)             transition, emission
× P(IN → NNS) × P(NNS → ants)         transition, emission
× P(NNS → VBD) × P(VBD → ran)         transition, emission

• Product of hidden state transitions and observation emissions
• Note independence assumptions
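As a quick numeric sketch of this product (every probability below is made up for illustration, not estimated from any corpus):

# Hypothetical values for P(The/DT car/NN of/IN ants/NNS ran/VBD)
p = (0.4            # P(DT)          initial probability
     * 0.6          # P(DT -> The)   emission
     * 0.5 * 0.01   # P(DT -> NN)    transition; P(NN -> car)   emission
     * 0.2 * 0.3    # P(NN -> IN)    transition; P(IN -> of)    emission
     * 0.3 * 0.001  # P(IN -> NNS)   transition; P(NNS -> ants) emission
     * 0.4 * 0.02)  # P(NNS -> VBD)  transition; P(VBD -> ran)  emission
print(p)  # ≈ 1.7e-10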

Graphical Models
Since we now have many random variables, it helps to
visualize them graphically. Graphical models precisely
tell us:
• Latent or hidden random variables (clear):
Q_t, with P(Q_t = VB): probability that the t-th tag is VB
• Observed random variables (filled):
O_t, with P(O_t = ants): probability that the t-th word is ants
• Conditional independence assumptions (the edges)
Hidden Markov Models
Graphical representation

Q_1 → Q_2 → Q_3 → Q_4 → Q_5
 ↓     ↓     ↓     ↓     ↓
O_1   O_2   O_3   O_4   O_5

Denote the entire sequence of tags as Q, and the entire sequence of
words as O.
Decomposing the Joint Probability
Graph specifies how the joint probability decomposes:

Q_1 → Q_2 → Q_3 → Q_4 → Q_5
 ↓     ↓     ↓     ↓     ↓
O_1   O_2   O_3   O_4   O_5

P(O, Q) = P(Q_1) × ∏_{t=1}^{T−1} P(Q_{t+1} | Q_t) × ∏_{t=1}^{T} P(O_t | Q_t)

• P(Q_1): initial state probability
• P(Q_{t+1} | Q_t): state transition probabilities
• P(O_t | Q_t): emission probabilities
Model Parameters
Let there be N possible tags and W possible words.
The parameters θ have three components:
1. Initial probabilities for Q_1:
Π = {π_1, π_2, …, π_N} (categorical)
2. Transition probabilities for Q_t to Q_{t+1}:
A = {a_ij}, i, j ∈ [1, N] (categorical)
3. Emission probabilities for Q_t to O_t:
B = {b_i(w_k)}, i ∈ [1, N], k ∈ [1, W] (categorical)

How many distributions and values of each type are there?
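A minimal sketch of the parameter shapes, with hypothetical sizes, which also answers the question: one initial distribution with N values, N transition distributions with N values each, and N emission distributions with W values each.

import numpy as np

N, W = 45, 10000      # hypothetical: PTB-sized tagset, 10k-word vocabulary
pi = np.zeros(N)      # Π: 1 initial distribution over tags (N values)
A = np.zeros((N, N))  # A: N transition distributions, N values each (N*N)
B = np.zeros((N, W))  # B: N emission distributions, W values each (N*W)
# pi and each row of A and B must sum to 1 to be valid categoricals.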

Training an HMM POS Tagger
Suppose that we have a labelled corpus of words with
their POS tags.
Supervised training possible using techniques that we
learned for N-gram language models!
• Initial probability distribution: look at the POS tags in the
first word of each sentence
• Transition probability distributions: look at transitions of
POS tags that are seen in the training corpus
• Emission probability distributions: look at emissions of
words from each POS tag in the training corpus

Supervised Estimation of Parameters
Recall the MLE for categorical distributions:

P(outcome i) = #(outcome i) / #(all events)

For our parameters:

π_i = P(Q_1 = i) = #(Q_1 = i) / #(sentences)

a_ij = P(Q_{t+1} = j | Q_t = i) = #(i, j) / #(i)

b_ik = P(O_t = k | Q_t = i) = #(word k, tag i) / #(i)

Previous discussions about smoothing and OOV items also apply here.
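A minimal sketch of this supervised estimator (count-based MLE with no smoothing or OOV handling, so not what you would deploy as-is):

from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    # MLE estimates from sentences of (word, tag) pairs; no smoothing.
    init = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in tagged_sentences:
        tags = [tag for _, tag in sent]
        init[tags[0]] += 1                      # POS tag of the first word
        for prev, curr in zip(tags, tags[1:]):  # observed tag bigrams
            trans[prev][curr] += 1
        for word, tag in sent:                  # observed emissions
            emit[tag][word] += 1
    pi = {t: c / len(tagged_sentences) for t, c in init.items()}
    A = {i: {j: c / sum(row.values()) for j, c in row.items()}
         for i, row in trans.items()}
    B = {i: {w: c / sum(row.values()) for w, c in row.items()}
         for i, row in emit.items()}
    return pi, A, B

# Toy usage (a made-up one-sentence corpus):
pi, A, B = train_hmm([[("the", "DT"), ("cat", "NN"), ("sat", "VBD")]])
print(pi["DT"], A["DT"]["NN"], B["NN"]["cat"])  # 1.0 1.0 1.0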
Exercise in Supervised Training
What are the MLE for the following training corpus?
• Give the initial probability distribution, and the transition
and emission distributions from the DT and VBD tags.
DT  NN  VBD IN DT  NN
the cat sat on the mat

DT  NN  VBD JJ
the cat was sad

RB VBD DT  NN
so was the mat

DT  JJ  NN  VBD IN DT  JJ  NN
the sad cat was on the sad mat
Inference with HMMs
Now that we have a model, how do we actually tag a
new sentence?
• Suppose that for each word, we just found the most likely
POS tag that emitted it. What is the problem with this?
• Need a way to find the best POS tag sequence (and we
need to define what best means).

Other questions: What about unsupervised and semi-supervised
learning?
Questions for an HMM
1. Compute the likelihood of a sequence of observations, P(O | θ)
→ forward algorithm, backward algorithm

2. What state sequence best explains a sequence of observations?
argmax_Q P(Q, O | θ)
→ Viterbi algorithm

3. Given an observation sequence (without labels), what is the best
model for it?
→ forward-backward algorithm,
a.k.a. Baum-Welch algorithm,
a.k.a. Expectation Maximization
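For question 2, here is a minimal Viterbi sketch over dense numpy parameters. It assumes words and tags have already been mapped to integer indices (an assumption, not part of the lecture); a practical tagger would add smoothing and OOV handling. Working in log space avoids underflow from multiplying many small probabilities.

import numpy as np

def viterbi(obs, pi, A, B):
    # Most likely tag sequence: argmax_Q P(Q, O | theta).
    # obs: word indices; pi: (N,); A: (N, N) transitions; B: (N, W) emissions.
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))            # best log-prob of a path ending in state j at time t
    back = np.zeros((T, N), dtype=int)  # backpointers for path recovery
    with np.errstate(divide="ignore"):  # log(0) -> -inf is acceptable here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)         # best predecessor for each state j
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]            # best final state
    for t in range(T - 1, 0, -1):               # follow backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]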
