Part-of-Speech (POS) Tagging
Applications for POS Tagging
Speech synthesis
• Lead – leading a procession (verb, rhymes with "deed")
• Lead – the element (noun, rhymes with "dead")
Parsing: e.g., "Time flies like an arrow"
• Is "flies" an N or a V?
Word prediction in speech recognition / typing
• Possessive pronouns (my, your, her) are likely to be followed by nouns
• Personal pronouns (I, you, he) are likely to be followed by verbs
Machine Translation
Deriving the internal structure of a sentence
• Finds application in IR, IE, and word sense disambiguation
Closed Classes in English (e.g., determiners, pronouns, prepositions, conjunctions, auxiliaries)
Open Classes (nouns, verbs, adjectives, adverbs)
Choosing a POS Tagset
• Brown Corpus: 1M words, 87 tags – more informative but more difficult to tag
• Penn Treebank: hand-annotated corpus of Wall Street Journal text, 1M words, 45-tag subset
• The C5 tagset used for the British National Corpus (BNC) has 61 tags.
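As a quick illustration of how the tagset choice changes the output, the sketch below tags the running example sentence with NLTK's default Penn Treebank tagger, and again mapped down to the coarser 12-tag universal tagset. It assumes NLTK is installed along with its averaged_perceptron_tagger and universal_tagset resources.

```python
# Sketch: the same tokens under two tagsets, using NLTK.
# Assumes nltk plus the 'averaged_perceptron_tagger' and
# 'universal_tagset' resources have been downloaded.
import nltk

tokens = ["She", "promised", "to", "back", "the", "bill"]

print(nltk.pos_tag(tokens))                      # Penn Treebank tags (45-tag set)
print(nltk.pos_tag(tokens, tagset="universal"))  # mapped to the 12-tag universal set
```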
Penn Treebank Tagset
Using Penn Treebank Tags
• Many words have only one POS tag (e.g. is, Mary, very, smallest)
• Others have a single most likely tag
• Tags also tend to co-occur regularly with other tags (e.g., a Det is typically followed by an N)
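These ambiguity classes can be checked directly in a corpus. The sketch below (assuming the NLTK sample of the Penn Treebank has been downloaded) counts the tags observed for a few words: "is" is effectively unambiguous, while "back" takes several tags with one clear favorite.

```python
# Sketch: measuring tag ambiguity in the Penn Treebank sample
# that ships with NLTK (assumes nltk.download('treebank')).
import nltk
from collections import Counter

tagged_words = nltk.corpus.treebank.tagged_words()

for w in ("is", "very", "back"):
    tags = Counter(t for word, t in tagged_words if word.lower() == w)
    print(w, tags.most_common())
```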
POS Tagging Approaches
• Rule-Based: human-crafted rules based on lexical and other linguistic knowledge.
• Learning-Based: trained on human-annotated corpora like the Penn Treebank.
  • Statistical models: Hidden Markov Model (HMM), Maximum Entropy Markov Model (MEMM), Conditional Random Field (CRF)
  • Rule learning: Transformation-Based Learning (TBL)
  • Neural networks: recurrent networks like Long Short-Term Memory (LSTM)
• Learning-based approaches have been found to be more effective.
Some Ways to do POS Tagging
Rule-based tagging
• E.g., the EngCG-based ENGTWOL tagger (English Two-Level tagger)
Transformation-based tagging
• Learned rules (statistical and linguistic)
• E.g., Brill tagger
Stochastic (probabilistic) tagging
• HMM (Hidden Markov Model) tagging (see the Viterbi sketch below)
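The following is a minimal Viterbi decoding sketch for HMM tagging on the running example. The tag set, transition, and emission probabilities are toy, hand-set numbers for illustration, not estimates from any corpus, and the transition rows are deliberately left unnormalized for brevity.

```python
# Minimal Viterbi decoding sketch for an HMM tagger.
# All probabilities below are illustrative toy values.

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `words`."""
    # v[i][t] = probability of the best tag path ending in tag t at word i
    v = [{t: start_p[t] * emit_p[t].get(words[0], 1e-8) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: v[i - 1][p] * trans_p[p][t])
            v[i][t] = (v[i - 1][best_prev] * trans_p[best_prev][t]
                       * emit_p[t].get(words[i], 1e-8))
            back[i][t] = best_prev
    # Trace back from the best final tag
    last = max(tags, key=lambda t: v[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy parameters (hypothetical numbers, unnormalized, for illustration)
tags = ["PRP", "VBD", "TO", "VB", "DT", "NN"]
start_p = {"PRP": 0.5, "VBD": 0.1, "TO": 0.05, "VB": 0.05, "DT": 0.25, "NN": 0.05}
trans_p = {s: {t: 1 / len(tags) for t in tags} for s in tags}  # uniform base
trans_p["PRP"]["VBD"] = 0.6; trans_p["VBD"]["TO"] = 0.6
trans_p["TO"]["VB"] = 0.8;   trans_p["VB"]["DT"] = 0.6
trans_p["DT"]["NN"] = 0.8
emit_p = {
    "PRP": {"she": 1.0}, "VBD": {"promised": 1.0}, "TO": {"to": 1.0},
    "VB": {"back": 0.7}, "NN": {"back": 0.1, "bill": 0.9}, "DT": {"the": 1.0},
}

print(viterbi(["she", "promised", "to", "back", "the", "bill"],
              tags, start_p, trans_p, emit_p))
# -> ['PRP', 'VBD', 'TO', 'VB', 'DT', 'NN']
```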
Rule-Based Tagging
1. Start with a dictionary of words and possible tags
2. Assign all possible tags to words using the dictionary
3. Write rules by hand to selectively remove tags
4. Stop when each word has exactly one (probably correct) tag
Start with a POS Dictionary
• she: PRP
• promised: VBN, VBD
• to: TO
• back: VB, JJ, RB, NN
• the: DT
• bill: NN, VB
Assign All Possible POS to Each Word
She/PRP promised/{VBN, VBD} to/TO back/{VB, JJ, RB, NN} the/DT bill/{NN, VB}
Apply Rules Eliminating Some POS
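A minimal sketch of steps 1–3 on the running example: the dictionary supplies every candidate tag, then a few hand-written rules (illustrative only, not the actual ENGTWOL constraints) strip tags until one remains per word, leaving She/PRP promised/VBD to/TO back/VB the/DT bill/NN.

```python
# Sketch of rule-based tagging: assign all dictionary tags, then
# apply hand-written elimination rules. Dictionary and rules are
# illustrative, not a real constraint-grammar rule set.

pos_dict = {
    "she": {"PRP"}, "promised": {"VBN", "VBD"}, "to": {"TO"},
    "back": {"VB", "JJ", "RB", "NN"}, "the": {"DT"}, "bill": {"NN", "VB"},
}

def tag(words):
    # Step 2: assign every possible tag from the dictionary
    candidates = [set(pos_dict[w]) for w in words]
    # Step 3: hand rules strip tags until one remains
    for i in range(len(words)):
        prev = candidates[i - 1] if i > 0 else set()
        # Rule: after infinitival TO, keep only the verb reading
        if prev == {"TO"} and "VB" in candidates[i]:
            candidates[i] = {"VB"}
        # Rule: directly after a determiner, keep only nominal tags
        if prev == {"DT"}:
            candidates[i] &= {"NN", "NNS", "JJ"}
        # Rule: a past form right after a subject pronoun is VBD, not VBN
        if prev == {"PRP"} and {"VBD", "VBN"} <= candidates[i]:
            candidates[i] = {"VBD"}
    return list(zip(words, candidates))

print(tag(["she", "promised", "to", "back", "the", "bill"]))
```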
N-Gram Taggers
• Unigram tagger: predicts the most frequent tag for every given token.
• Bigram tagger
  • Conditions on the previous word's tag as well as the word itself: assigns the tag most often seen for that (context, word) pair in training.
• Trigram tagger
  • Same process, but conditions on the previous two tags.
These taggers are typically chained with backoff, as sketched below.
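A sketch of the backoff chain in NLTK (assuming the treebank corpus sample has been downloaded): each tagger defers to the next simpler one whenever its own context was never seen in training.

```python
# Sketch: unigram/bigram/trigram taggers with backoff in NLTK.
# Assumes nltk.download('treebank') has been run.
import nltk
from nltk.corpus import treebank

train = treebank.tagged_sents()[:3000]
test = treebank.tagged_sents()[3000:]

t0 = nltk.DefaultTagger("NN")               # last resort: the most common tag
t1 = nltk.UnigramTagger(train, backoff=t0)  # most frequent tag per word
t2 = nltk.BigramTagger(train, backoff=t1)   # conditions on previous tag
t3 = nltk.TrigramTagger(train, backoff=t2)  # conditions on previous two tags

print(t3.accuracy(test))  # NLTK >= 3.6; older versions use .evaluate()
print(t3.tag(["She", "promised", "to", "back", "the", "bill"]))
```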
ML Classification
• Decision Trees and Rule Learning
• Naïve Bayes and Bayesian Networks
• Logistic Regression / Maximum Entropy (MaxEnt)
• Perceptron and Neural Networks
• Support Vector Machines (SVMs)
• Nearest-Neighbor / Instance-Based
A per-token classifier in this style is sketched below.
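The sketch below treats POS tagging as plain per-token classification, here with scikit-learn logistic regression over a few simple hand-picked features (our choice, for illustration). Note that each token is classified independently of the others, which is exactly the limitation discussed next.

```python
# Sketch: POS tagging as independent per-token classification
# with logistic regression (assumes nltk with the 'treebank'
# sample, and scikit-learn, are installed).
import nltk
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(words, i):
    w = words[i]
    return {
        "word": w.lower(),
        "suffix2": w[-2:],                                  # crude morphology
        "is_cap": w[0].isupper(),
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
    }

X, y = [], []
for sent in nltk.corpus.treebank.tagged_sents()[:2000]:
    words = [w for w, _ in sent]
    for i, (_, t) in enumerate(sent):
        X.append(features(words, i))
        y.append(t)

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=300))
clf.fit(X, y)

sent = ["She", "promised", "to", "back", "the", "bill"]
print(list(zip(sent, clf.predict([features(sent, i) for i in range(len(sent))]))))
```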
Beyond ML-Classification
Problems with Sequence Labeling as Classification
• Difficult to propagate uncertainty between decisions.
• Difficult to "collectively" determine the most likely joint assignment of categories.
Probabilistic sequence models allow uncertainty to be integrated over multiple, interdependent classifications, and the most likely joint assignment of tags to be determined collectively.
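For contrast with the per-token classifier above, the sketch below trains a supervised HMM with NLTK and tags the whole sentence jointly via Viterbi decoding (assuming the treebank sample is available; words unseen in training remain a weakness of this plain model).

```python
# Sketch: a supervised HMM tagger in NLTK that decodes the whole
# sequence jointly (assumes nltk.download('treebank')).
from nltk.corpus import treebank
from nltk.tag import HiddenMarkovModelTrainer

train = treebank.tagged_sents()[:3000]
hmm = HiddenMarkovModelTrainer().train_supervised(train)

# Viterbi decoding picks the best *joint* tag assignment
print(hmm.tag(["She", "promised", "to", "back", "the", "bill"]))
```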