Part-Of-Speech Tagging: A Simple But Useful Form of Linguistic Analysis

Part-of-speech tagging assigns a lexical category such as noun or verb to each word in a text. While traditional grammar divides words into 8 categories, modern tagsets contain over 40 tags. Tagging accuracy has improved from a baseline that assigns each word its most frequent tag (about 90% accuracy) to current systems that use machine learning and features from neighboring words to reach about 97% accuracy. However, unknown and ambiguous words remain challenging.

Part-of-speech tagging

A simple but useful form of linguistic analysis

Christopher Manning

Parts of Speech
• Perhaps starting with Aristotle in the West (384–322 BCE), there was the idea of having parts of speech
  • a.k.a. lexical categories, word classes, “tags”, POS
• The idea, still with us, that there are 8 parts of speech comes from Dionysius Thrax of Alexandria (c. 100 BCE)
  • But actually his 8 aren’t exactly the ones we are taught today
    • Thrax: noun, verb, article, adverb, preposition, conjunction, participle, pronoun
    • School grammar: noun, verb, adjective, adverb, preposition, conjunction, pronoun, interjection
Open class (lexical) words
• Nouns
  • Proper: IBM, Italy
  • Common: cat / cats, snow
• Verbs
  • Main: see, registered
• Adjectives: old, older, oldest
• Adverbs: slowly
• Numbers: 122,312, one
• … more

Closed class (functional)
• Determiners: the, some
• Conjunctions: and, or
• Pronouns: he, its
• Modals: can, had
• Prepositions: to, with
• Particles: off, up
• Interjections: Ow, Eh
• … more

Open vs. Closed classes

• Closed:
  • determiners: a, an, the
  • pronouns: she, he, I
  • prepositions: on, under, over, near, by, …
  • Why “closed”?
• Open:
  • Nouns, Verbs, Adjectives, Adverbs.

POS Tagging
• Words often have more than one POS: back
  • The back door = JJ
  • On my back = NN
  • Win the voters back = RB
  • Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for a particular instance of a word.

POS Tagging
• Input: Plays well with others
• Ambiguity: NNS/VBZ UH/JJ/NN/RB IN NNS (Penn Treebank POS tags)
• Output: Plays/VBZ well/RB with/IN others/NNS
• Uses:
  • Text-to-speech (how do we pronounce “lead”?)
  • Can write regexps like (Det) Adj* N+ over the output for phrases, etc.
  • As input to or to speed up a full parser
  • If you know the tag, you can back off to it in other tasks
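
The regexp use listed above can be sketched roughly as follows. This is a minimal illustration, not the lecture's code: the helper names `coarse_tag` and `noun_phrases` are hypothetical, and the mapping of Det, Adj and N onto the Penn Treebank tags DT, JJ* and NN* is an assumption made for the example.

```python
import re

def coarse_tag(tag):
    """Map a Penn Treebank tag to one letter: D (Det), A (Adj), N (noun), O (other)."""
    if tag == "DT":
        return "D"
    if tag.startswith("JJ"):   # JJ, JJR, JJS
        return "A"
    if tag.startswith("NN"):   # NN, NNS, NNP, NNPS
        return "N"
    return "O"

def noun_phrases(tagged):
    """Return the word spans whose tags match the pattern (Det) Adj* N+.

    `tagged` is a list of (word, tag) pairs such as a tagger's output.
    """
    # One letter per token, so regex offsets are token indices.
    codes = "".join(coarse_tag(tag) for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in re.finditer(r"D?A*N+", codes)
    ]

tagged = [("The", "DT"), ("back", "JJ"), ("door", "NN"), ("opened", "VBD")]
print(noun_phrases(tagged))
```

Encoding each tag as a single letter keeps the regular expression close to the notation on the slide and avoids matching inside longer tag names.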

POS tagging performance

• How many tags are correct? (Tag accuracy)
  • About 97% currently
  • But baseline is already 90%
    • Baseline is performance of stupidest possible method
      • Tag every word with its most frequent tag
      • Tag unknown words as nouns
  • Partly easy because
    • Many words are unambiguous
    • You get points for them (the, a, etc.) and for punctuation marks!
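
The baseline described above is small enough to sketch in a few lines. The function names `train_baseline` and `tag_baseline` and the toy training sentences are hypothetical; only the two rules (most frequent tag per word, unknown words tagged as nouns) come from the slide.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map each training word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, most_frequent, unknown_tag="NN"):
    """Tag known words with their most frequent tag and unknown words as nouns."""
    return [(word, most_frequent.get(word, unknown_tag)) for word in words]

# Toy training data (made up for illustration).
train = [
    [("the", "DT"), ("back", "NN")],
    [("the", "DT"), ("back", "NN"), ("door", "NN")],
    [("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
model = train_baseline(train)
print(tag_baseline(["the", "back", "gate"], model))
```

Because the model is just a lookup table, it ignores context entirely, which is why it serves as a floor for the other taggers.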

Deciding on the correct part of speech can be difficult even for people

• Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG
• All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN
• Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

How difficult is POS tagging?

• About 11% of the word types in the Brown corpus are ambiguous with regard to part of speech
• But they tend to be very common words. E.g., that
  • I know that he is honest = IN
  • Yes, that play was nice = DT
  • You can’t go that far = RB
• 40% of the word tokens are ambiguous
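
The gap between the type figure and the token figure is easier to see with a small computation. The sketch below assumes a non-empty corpus given as a flat list of (word, tag) pairs; the function name `ambiguity_rates` and the toy corpus are hypothetical, and the toy numbers are not the Brown corpus figures.

```python
from collections import defaultdict

def ambiguity_rates(tagged_tokens):
    """Return (type_rate, token_rate) of POS ambiguity for a non-empty corpus."""
    tags_seen = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_seen[word].add(tag)
    # A word type is ambiguous if it occurs with more than one tag.
    ambiguous = {word for word, tags in tags_seen.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_seen)
    # A token is ambiguous if its word type is ambiguous.
    token_rate = sum(1 for word, _ in tagged_tokens if word in ambiguous) / len(tagged_tokens)
    return type_rate, token_rate

# Toy corpus: "that" carries the three tags from the slide, the rest are unambiguous.
toy = [("that", "IN"), ("that", "DT"), ("that", "RB"), ("he", "PRP"), ("is", "VBZ")]
type_rate, token_rate = ambiguity_rates(toy)
print(f"ambiguous types: {type_rate:.0%}, ambiguous tokens: {token_rate:.0%}")
```

A single frequent ambiguous word raises the token rate far more than the type rate, which is the effect the slide describes.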

Part-of-speech tagging revisited

A simple but useful form of linguistic analysis

Sources of information
• What are the main sources of information for POS tagging?
  • Knowledge of neighboring words
    • Bill  saw    that  man  yesterday
    • NNP   NN     DT    NN   NN
    • VB    VB(D)  IN    VB   NN
  • Knowledge of word probabilities
    • man is rarely used as a verb….
  • The latter proves the most useful, but the former also helps

More and Better Features → Feature-based tagger
• Can do surprisingly well just looking at a word by itself:
  • Word, e.g. the: the → DT
  • Lowercased word, e.g. Importantly: importantly → RB
  • Prefixes, e.g. unfathomable: un- → JJ
  • Suffixes, e.g. Importantly: -ly → RB
  • Capitalization, e.g. Meridian: CAP → NNP
  • Word shapes, e.g. 35-year: d-x → JJ
• Then build a maxent (or whatever) model to predict tag
  • Maxent P(t|w): 93.7% overall / 82.6% unknown
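
The word-internal features listed above could be extracted along these lines. This is a sketch under assumptions: the function names, the feature keys, the fixed affix length of 2, and the exact word-shape rules (digits to d, lowercase to x, uppercase to X, runs collapsed) are illustrative choices, not the feature set of the tagger in the slides. The maxent model itself is not shown.

```python
def word_shape(word):
    """Collapse runs of digits to 'd', lowercase to 'x', uppercase to 'X'."""
    shape = []
    for ch in word:
        if ch.isdigit():
            code = "d"
        elif ch.islower():
            code = "x"
        elif ch.isupper():
            code = "X"
        else:
            code = ch  # keep punctuation such as '-' as is
        # Append only when the class changes, so runs are collapsed.
        if not shape or shape[-1] != code:
            shape.append(code)
    return "".join(shape)

def word_features(word, affix_len=2):
    """Features computed from the word alone, with no context."""
    return {
        "word": word,
        "lower": word.lower(),
        "prefix": word[:affix_len].lower(),
        "suffix": word[-affix_len:].lower(),
        "capitalized": word[:1].isupper(),
        "shape": word_shape(word),
    }

print(word_features("Importantly"))
print(word_shape("35-year"))
```

A feature dictionary like this can be passed to any classifier that accepts sparse categorical features; real taggers typically use several prefix and suffix lengths at once.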

Overview: POS Tagging Accuracies

• Rough accuracies (overall / unknown words):
  • Most freq tag: ~90% / ~50%
  • Trigram HMM: ~95% / ~55%
  • Maxent P(t|w): 93.7% / 82.6%
  • TnT (HMM++): 96.2% / 86.0%
  • MEMM tagger: 96.9% / 86.9%
  • Bidirectional dependencies: 97.2% / 90.0%
  • Upper bound: ~98% (human agreement)
• Most errors are on unknown words
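
Each pair of figures above is an overall token accuracy and an accuracy on unknown words. A minimal sketch of how such a pair could be computed, assuming "unknown" means a word absent from the training vocabulary; the function name `accuracies` and the toy data are hypothetical.

```python
def accuracies(gold, predicted, known_words):
    """Return (overall, unknown) token accuracy.

    `gold` and `predicted` are aligned lists of (word, tag) pairs;
    `known_words` is the set of words seen in training.
    The unknown-word accuracy is None when there are no unknown words.
    """
    total = correct = unk_total = unk_correct = 0
    for (word, gold_tag), (_, pred_tag) in zip(gold, predicted):
        hit = gold_tag == pred_tag
        total += 1
        correct += hit
        if word not in known_words:
            unk_total += 1
            unk_correct += hit
    overall = correct / total if total else 0.0
    unknown = unk_correct / unk_total if unk_total else None
    return overall, unknown

# Toy example with made-up words and tags.
gold = [("the", "DT"), ("blorf", "NN"), ("runs", "VBZ"), ("zorp", "JJ")]
pred = [("the", "DT"), ("blorf", "NN"), ("runs", "NNS"), ("zorp", "NN")]
print(accuracies(gold, pred, known_words={"the", "runs"}))
```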

How to improve supervised results?

• Build better features!
  • They/PRP left/VBD as/IN soon/RB as/IN he/PRP arrived/VBD ./.
    • The first “as” is tagged IN; the correct tag is RB
    • We could fix this with a feature that looked at the next word
  • Intrinsic/NNP flaws/NNS remained/VBD undetected/VBN ./.
    • “Intrinsic” is tagged NNP; the correct tag is JJ
    • We could fix this by linking capitalized words to their lowercase versions

Tagging Without Sequence Information

• Baseline: predict tag t0 from the current word w0 alone
• Three Words: predict tag t0 from the words w-1, w0, w1

Model      Features   Token acc.   Unknown acc.   Sentence acc.
Baseline   56,805     93.69%       82.61%         26.74%
3Words     239,767    96.57%       86.78%         48.27%

Using words only in a straight classifier works as well as a basic (HMM or discriminative) sequence model!!
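
The three-word setup can be sketched as a feature function over a window of words, with each tag predicted independently and no sequence decoding. The function name `three_word_features`, the feature keys, and the boundary symbols `<s>` and `</s>` are assumptions for illustration; the classifier that consumes the features is not shown.

```python
def three_word_features(words, i):
    """Features for position i: previous word, current word, next word."""
    prev_word = words[i - 1] if i > 0 else "<s>"                # assumed start marker
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"  # assumed end marker
    return {"w-1": prev_word, "w0": words[i], "w+1": next_word}

words = "They left as soon as he arrived .".split()
for i in range(len(words)):
    print(three_word_features(words, i))
```

For the first "as" in this sentence, the next-word feature is "soon", which is the kind of evidence a word-by-itself model lacks.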

Summary of POS Tagging

• For tagging, the change from generative to discriminative model does not by itself result in great improvement
• One profits from models for specifying dependence on overlapping features of the observation such as spelling, suffix analysis, etc.
• An MEMM allows integration of rich features of the observations, but can suffer strongly from assuming independence from following observations; this effect can be relieved by adding dependence on following words
• This additional power (of the MEMM, CRF, Perceptron models) has been shown to result in improvements in accuracy
• The higher accuracy of discriminative models comes at the price of much slower training