0% found this document useful (0 votes)
76 views18 pages

Part-Of-Speech Tagging: A Simple But Useful Form of Linguistic Analysis

Part-of-speech tagging involves assigning lexical categories such as noun or verb to each word in a text. While early systems divided words into 8 categories, modern tagsets contain over 40 tags. Tagging accuracy has improved from baseline methods that assign each word its most frequent tag (90% accuracy) to current systems that use machine learning and features from neighboring words to achieve over 97% accuracy. However, tagging unknown and ambiguous words remains challenging.

Uploaded by

Derryza
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
0% found this document useful (0 votes)
76 views18 pages

Part-Of-Speech Tagging: A Simple But Useful Form of Linguistic Analysis

Part-of-speech tagging involves assigning lexical categories such as noun or verb to each word in a text. While early systems divided words into 8 categories, modern tagsets contain over 40 tags. Tagging accuracy has improved from baseline methods that assign each word its most frequent tag (90% accuracy) to current systems that use machine learning and features from neighboring words to achieve over 97% accuracy. However, tagging unknown and ambiguous words remains challenging.

Uploaded by

Derryza
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
You are on page 1/ 18

Part-of-speech

tagging

A simple but useful form of


linguistic analysis

Christopher Manning
Christopher Manning

Parts of Speech
• Perhaps starting with Aristotle in the West (384–322 BCE), there
was the idea of having parts of speech
• a.k.a lexical categories, word classes, “tags”, POS
• It comes from Dionysius Thrax of Alexandria (c. 100 BCE) the
idea that is still with us that there are 8 parts of speech
• But actually his 8 aren’t exactly the ones we are taught today
• Thrax: noun, verb, article, adverb, preposition, conjunction, participle,
pronoun
• School grammar: noun, verb, adjective, adverb, preposition,
conjunction, pronoun, interjection
Open class (lexical) words
Nouns Verbs Adjectives old older oldest

Proper Common Main Adverbs slowly


IBM cat / cats see
Italy snow registered Numbers … more
122,312
one
Closed class (functional)
Modals
Determiners the some can Prepositions to with
had
Conjunctions and or Particles off up … more

Pronouns he its Interjections Ow Eh


Christopher Manning

Open vs. Closed classes


• Open vs. Closed classes
• Closed:
• determiners: a, an, the
• pronouns: she, he, I
• prepositions: on, under, over, near, by, …
• Why “closed”?
• Open:
• Nouns, Verbs, Adjectives, Adverbs.
Christopher Manning

POS Tagging
• Words often have more than one POS: back
• The back door = JJ
• On my back = NN
• Win the voters back = RB
• Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for a
particular instance of a word.
Christopher Manning

POS Tagging
• Input: Plays well with others Penn
• Ambiguity: NNS/VBZ UH/JJ/NN/RB IN NNS Treebank
POS tags
• Output: Plays/VBZ well/RB with/IN others/NNS
• Uses:
• Text-to-speech (how do we pronounce “lead”?)
• Can write regexps like (Det) Adj* N+ over the output for phrases, etc.
• As input to or to speed up a full parser
• If you know the tag, you can back off to it in other tasks
Christopher Manning

POS tagging performance


• How many tags are correct? (Tag accuracy)
• About 97% currently
• But baseline is already 90%
• Baseline is performance of stupidest possible method
• Tag every word with its most frequent tag
• Tag unknown words as nouns
• Partly easy because
• Many words are unambiguous
• You get points for them (the, a, etc.) and for punctuation marks!
Christopher Manning

Deciding on the correct part of speech can


be difficult even for people

• Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO


joining/VBG

• All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT


corner/NN

• Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD


Christopher Manning

How difficult is POS tagging?


• About 11% of the word types in the Brown corpus are
ambiguous with regard to part of speech
• But they tend to be very common words. E.g., that
• I know that he is honest = IN
• Yes, that play was nice = DT
• You can’t go that far = RB
• 40% of the word tokens are ambiguous
Part-of-speech
tagging

A simple but useful form


of linguistic analysis

Christopher Manning
Part-of-speech
tagging revisited

A simple but useful form


of linguistic analysis

Christopher Manning
Christopher Manning

Sources of information
• What are the main sources of information for POS tagging?
• Knowledge of neighboring words
• Bill saw that man yesterday
• NNP NN DT NN NN
• VB VB(D) IN VB NN
• Knowledge of word probabilities
• man is rarely used as a verb….
• The latter proves the most useful, but the former also helps
Christopher Manning

More and Better Features  Feature-


based tagger
• Can do surprisingly well just looking at a word by itself:
• Word the: the  DT
• Lowercased word Importantly: importantly  RB
• Prefixes unfathomable: un-  JJ
• Suffixes Importantly: -ly  RB
• Capitalization Meridian: CAP  NNP
• Word shapes 35-year: d-x  JJ
• Then build a maxent (or whatever) model to predict tag
• Maxent P(t|w): 93.7% overall / 82.6% unknown
Christopher Manning

Overview: POS Tagging Accuracies


• Rough accuracies:
• Most freq tag: ~90% / ~50%

Most errors
• Trigram HMM: ~95% / ~55%
on unknown
• Maxent P(t|w): 93.7% / 82.6% words
• TnT (HMM++): 96.2% / 86.0%
• MEMM tagger: 96.9% / 86.9%
• Bidirectional dependencies: 97.2% / 90.0%
• Upper bound: ~98% (human agreement)
Christopher Manning

How to improve supervised results?


• Build better features!
RB
PRP VBD IN RB IN PRP VBD .
They left as soon as he arrived .

• We could fix this with a feature that looked at the next word
JJ
NNP NNS VBD VBN .
Intrinsic flaws remained undetected .

• We could fix this by linking capitalized words to their lowercase versions


Christopher Manning

Tagging Without Sequence Information


Baseline Three Words
t0 t0

w0 w-1 w0 w1

Model Features Token Unknown Sentence


Baseline 56,805 93.69% 82.61% 26.74%
3Words 239,767 96.57% 86.78% 48.27%
Using words only in a straight classifier works as well as a
basic (HMM or discriminative) sequence model!!
Christopher Manning

Summary of POS Tagging


For tagging, the change from generative to discriminative model does not
by itself result in great improvement
One profits from models for specifying dependence on overlapping
features of the observation such as spelling, suffix analysis, etc.
An MEMM allows integration of rich features of the observations, but can
suffer strongly from assuming independence from following
observations; this effect can be relieved by adding dependence on
following words
This additional power (of the MEMM ,CRF, Perceptron models) has been
shown to result in improvements in accuracy
The higher accuracy of discriminative models comes at the price of much
slower training
Part-of-speech
tagging revisited

A simple but useful form


of linguistic analysis

Christopher Manning

You might also like