Lecture 20-23 Part of Speech Tagging

The document discusses part of speech (POS) tagging, distinguishing between open class (nouns, verbs, adjectives, adverbs) and closed class (determiners, pronouns, prepositions) words. It highlights the challenges of POS tagging in English, including ambiguity and the performance of current tagging methods, which achieve about 97% accuracy. Various algorithms for POS tagging, such as Hidden Markov Models and neural sequence models, are mentioned, along with the importance of training data and morphological analysis for unknown words.


Part Of Speech Tagging

Open class (lexical) words
• Nouns
  • Proper: IBM, Italy
  • Common: cat / cats, snow
• Verbs
  • Main: see, registered
• Adjectives: old, older, oldest
• Adverbs: slowly
• Numbers: 122,312, one
• … more

Closed class (functional) words
• Modals: can, had
• Determiners: the, some
• Prepositions: to, with
• Conjunctions: and, or
• Particles: off, up
• Pronouns: he, its
• Interjections: Ow, Eh
• … more


Open vs. Closed classes
• Closed:
• determiners: a, an, the
• pronouns: she, he, I
• prepositions: on, under, over, near, by, …
• Open:
• Nouns, Verbs, Adjectives, Adverbs.
Why Part of Speech Tagging?
• Parsing
• Machine Translation
• Sentiment or affective tasks
• Text To Speech
• Meaning
How difficult is POS tagging in English?
• Roughly 15% of word types are ambiguous
• But those 15% tend to be very common words, so approximately 60% of word tokens are ambiguous
• Words often have more than one POS: back
• The back door = JJ
• On my back = NN
• Win the voters back = RB
• Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for a particular
instance of a word.
POS tagging performance
• How many tags are correct? (Tag accuracy)
• About 97% currently
• But baseline is already 90%
• Baseline is performance of stupidest possible method
• Tag every word with its most frequent tag
• Tag unknown words as nouns
• Partly easy because
• Many words are unambiguous
• You get points for them (the, a, etc.) and for punctuation marks!
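The baseline above can be sketched in a few lines; the tiny tagged corpus here is invented for illustration, and the NN fallback for unknown words follows the slide:

```python
from collections import Counter, defaultdict

# Hypothetical tiny tagged corpus; a real baseline is trained on a treebank.
train = [("the", "DT"), ("back", "JJ"), ("door", "NN"),
         ("on", "IN"), ("my", "PRP$"), ("back", "NN"),
         ("win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "NN")]

# Count (word, tag) pairs, then keep each word's most frequent tag.
counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1
most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(word):
    # Unknown words default to NN (noun), as the slide suggests.
    return most_frequent.get(word, "NN")

print(baseline_tag("the"))     # DT
print(baseline_tag("back"))    # NN (2 of its 3 training occurrences)
print(baseline_tag("unseen"))  # NN fallback for unknown words
```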
Sources of information
• What are the main sources of information for POS tagging?
• Knowledge of neighboring words
• Bill saw that man yesterday
  NNP NN DT NN NN
  VB VB(D) IN VB NN
  (each column lists candidate tags for the word above it)
• Knowledge of word probabilities
• man is rarely used as a verb….
• The latter proves the most useful, but the former also helps
More and Better Features ➔ Feature-based tagger
• Can do surprisingly well just looking at a word by itself:
  • Word: the → DT
  • Lowercased word: importantly → RB
  • Prefixes: unfathomable: un- → JJ
  • Suffixes: Importantly: -ly → RB
  • Capitalization: Meridian: CAP → NNP
  • Word shapes: 35-year: d-x → JJ
• Then build a supervised machine learning model to predict the tag: P(t|w)
Hidden Markov Model
Zero Order Markov Model (Unigram Model)

p(x_1 … x_n, y_1 … y_{n+1}) = ∏_{i=1}^{n+1} q(y_i) · ∏_{i=1}^{n} e(x_i | y_i)

First Order Markov Model (Bigram Model)

p(x_1 … x_n, y_1 … y_{n+1}) = ∏_{i=1}^{n+1} q(y_i | y_{i-1}) · ∏_{i=1}^{n} e(x_i | y_i)
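Under the bigram model, the joint probability of a tagged sentence is just a product of transition (q) and emission (e) terms. A minimal sketch with invented toy tables (the start symbol `<s>` and stop symbol `</s>` stand for y_0 and y_{n+1}):

```python
# Hypothetical transition q(tag | prev) and emission e(word | tag) tables.
q = {("PRON", "<s>"): 0.6, ("VERB", "PRON"): 0.9,
     ("NOUN", "VERB"): 0.8, ("</s>", "NOUN"): 0.7}
e = {("he", "PRON"): 0.3, ("eats", "VERB"): 0.2, ("rice", "NOUN"): 0.25}

def joint_prob(words, tags):
    """p(x_1..x_n, y_1..y_{n+1}) under the bigram HMM above."""
    p = 1.0
    prev = "<s>"                       # y_0 is the start symbol
    for w, t in zip(words, tags):
        p *= q[(t, prev)] * e[(w, t)]  # q(y_i | y_{i-1}) * e(x_i | y_i)
        prev = t
    return p * q[("</s>", prev)]       # y_{n+1} is the stop symbol

print(joint_prob(["he", "eats", "rice"], ["PRON", "VERB", "NOUN"]))
```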
Unknown Words
• The strongest source of information for guessing the part of speech of unknown words is morphology:
  • words that end in -s are likely to be plural nouns (NNS),
  • words ending with -ed tend to be past participles (VBN),
  • words ending with -able tend to be adjectives (JJ).
Unknown Words
• Store, for each final letter sequence (word suffix) of up to 10 letters, the statistics of the tags it was associated with in training.
• We are thus computing, for each suffix of length i, the probability of the tag t_i given the suffix letters: P(t_i | l_{n-i+1} … l_n)
• Back-off is used to smooth these probabilities with successively shorter suffixes.
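A minimal sketch of the suffix-statistics idea; the tiny training list is invented, and the hard "longest seen suffix" back-off here is a simplification of the smoothed back-off the slide describes:

```python
from collections import Counter, defaultdict

# Tag counts per suffix of up to 10 letters (hypothetical training words).
suffix_tags = defaultdict(Counter)
for word, tag in [("registered", "VBN"), ("talked", "VBN"),
                  ("readable", "JJ"), ("cats", "NNS"), ("doors", "NNS")]:
    for i in range(1, min(10, len(word)) + 1):
        suffix_tags[word[-i:]][tag] += 1

def p_tag_given_suffix(word, tag):
    """P(tag | suffix), backing off from the longest seen suffix."""
    for i in range(min(10, len(word)), 0, -1):  # longest suffix first
        counts = suffix_tags.get(word[-i:])
        if counts:
            return counts[tag] / sum(counts.values())
    return 0.0

# The unseen word "unfolded" backs off to the suffix "ed", seen with VBN.
print(p_tag_given_suffix("unfolded", "VBN"))
```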
Unknown Words
• We can compute the likelihood p(w_i | t_i) (Prob(word | tag)) that HMMs require by Bayesian inversion (i.e., using Bayes' rule):

p(w_i | t_i) = P(t_i | w_i) · p(w_i) / p(t_i)
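A toy numeric illustration of the inversion; all three probabilities are made up for the example:

```python
# Hypothetical probabilities for one word/tag pair.
p_tag_given_word = 0.9   # P(NN | "snow"), e.g. from suffix statistics
p_word = 0.001           # p("snow")
p_tag = 0.15             # p(NN)

# Bayes' rule: p(word | tag) = P(tag | word) * p(word) / p(tag)
p_word_given_tag = p_tag_given_word * p_word / p_tag
print(p_word_given_tag)  # ≈ 0.006
```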
Tagging Problem
• Let |S| = 50 tags and sequence length n = 15.
• Brute-force enumeration must score |S|^n = 50^15 candidate tag sequences, which is infeasible, so we need an efficient decoding algorithm.
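The size of the search space can be checked directly; the n·|S|² step count for Viterbi assumes the standard dynamic-programming formulation:

```python
# Brute force must score every tag sequence: |S|^n candidates.
S, n = 50, 15
print(S ** n)       # 30517578125000000000000000 (about 3.05e25)

# Viterbi dynamic programming needs only about n * |S|^2 steps.
print(n * S ** 2)   # 37500
```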
Standard Algorithms for POS Tagging
• HMM (with Viterbi algorithm)
• Neural Sequence Models (RNN, Transformers)
• Large Language Models (like BERT), fine-tuned

• All require hand-labelled training data, and all achieve about equal performance (97% on English)
Training set
1. She/PRON eats/VERB fish/NOUN
2. He/PRON eats/VERB rice/NOUN
3. She/PRON likes/VERB rice/NOUN
4. Rice/NOUN is/VERB tasty/ADJ
5. Fish/NOUN is/VERB tasty/ADJ

• Test sentence: He likes fish rice → tag sequence: PRON VERB NOUN NOUN
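The transition and emission counts from these five sentences can be estimated and used to tag the test sentence. A sketch using greedy bigram decoding with a small smoothing constant (a simplification of full Viterbi; words are lowercased):

```python
from collections import Counter, defaultdict

# The five training sentences from the slide, lowercased.
train = [[("she", "PRON"), ("eats", "VERB"), ("fish", "NOUN")],
         [("he", "PRON"), ("eats", "VERB"), ("rice", "NOUN")],
         [("she", "PRON"), ("likes", "VERB"), ("rice", "NOUN")],
         [("rice", "NOUN"), ("is", "VERB"), ("tasty", "ADJ")],
         [("fish", "NOUN"), ("is", "VERB"), ("tasty", "ADJ")]]

# Count transitions q(tag | prev) and emissions e(word | tag).
trans, emit = defaultdict(Counter), defaultdict(Counter)
for sent in train:
    prev = "<s>"
    for w, t in sent:
        trans[prev][t] += 1
        emit[t][w] += 1
        prev = t

tags_set = list(emit.keys())

def greedy_tag(words, alpha=0.1):
    """Greedy left-to-right decoding; alpha smooths unseen transitions."""
    tags, prev = [], "<s>"
    for w in words:
        best = max(tags_set,
                   key=lambda t: (trans[prev][t] + alpha) * emit[t][w])
        tags.append(best)
        prev = best
    return tags

print(greedy_tag(["he", "likes", "fish", "rice"]))
```

The smoothing constant matters here: NOUN → NOUN never occurs in training, so without it the unseen transition would zero out the correct tag for the final "rice".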
Reading
• Chapter 8, Speech and Language Processing, Third Edition (Jurafsky & Martin)
