Part-of-Speech (POS) Tagging
Applications for POS Tagging
Speech synthesis
• Lead – leading a procession (verb, rhymes with "deed")
• Lead – the element (noun, rhymes with "dead")
Parsing: e.g., "Time flies like an arrow"
• Is "flies" an N or a V?
Word prediction in speech recognition / typing
• Possessive pronouns (my, your, her) are likely to be followed by nouns
• Personal pronouns (I, you, he) are likely to be followed by verbs
Machine Translation
Deriving the internal structure of a sentence
• Finds application in IR, IE, and word sense disambiguation
Closed Classes in English (e.g., determiners, pronouns, prepositions, conjunctions, auxiliaries)
Open Classes (nouns, verbs, adjectives, adverbs)
Choosing a POS Tagset
• Brown Corpus: 1M words, 87 tags – more informative but more difficult to tag
• Penn Treebank: hand-annotated corpus of Wall Street Journal text, 1M words, 45-tag subset
• The C5 tagset used for the British National Corpus (BNC) has 61 tags.
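As a quick illustration of how the tagset choice changes the output, the sketch below tags the running example sentence with NLTK's default Penn Treebank tagger, and again mapped down to the coarser 12-tag universal tagset. It assumes NLTK is installed along with its averaged_perceptron_tagger and universal_tagset resources.

```python
# Sketch: the same tokens under two tagsets, using NLTK.
# Assumes nltk plus the 'averaged_perceptron_tagger' and
# 'universal_tagset' resources have been downloaded.
import nltk

tokens = ["She", "promised", "to", "back", "the", "bill"]

print(nltk.pos_tag(tokens))                      # Penn Treebank tags (45-tag set)
print(nltk.pos_tag(tokens, tagset="universal"))  # mapped to the 12-tag universal set
```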
Penn Treebank Tagset
Using Penn Treebank Tags
• Many words have only one POS tag (e.g. is, Mary, very, smallest)
• Others have a single most likely tag
• Tags also tend to co-occur regularly with other tags (e.g., a Det is typically followed by an N)
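These ambiguity classes can be checked directly in a corpus. The sketch below (assuming the NLTK sample of the Penn Treebank has been downloaded) counts the tags observed for a few words: "is" is effectively unambiguous, while "back" takes several tags with one clear favorite.

```python
# Sketch: measuring tag ambiguity in the Penn Treebank sample
# that ships with NLTK (assumes nltk.download('treebank')).
import nltk
from collections import Counter

tagged_words = nltk.corpus.treebank.tagged_words()

for w in ("is", "very", "back"):
    tags = Counter(t for word, t in tagged_words if word.lower() == w)
    print(w, tags.most_common())
```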
POS Tagging Approaches
• Rule-Based: human-crafted rules based on lexical and other linguistic knowledge.
• Learning-Based: trained on human-annotated corpora like the Penn Treebank.
  • Statistical models: Hidden Markov Model (HMM), Maximum Entropy Markov Model (MEMM), Conditional Random Field (CRF)
  • Rule learning: Transformation-Based Learning (TBL)
  • Neural networks: recurrent networks like Long Short-Term Memory (LSTM)
• Learning-based approaches have been found to be more effective.
Some Ways to do POS Tagging
Rule-based tagging
• E.g., the EngCG-based ENGTWOL tagger (English Two-Level tagger)
Transformation-based tagging
• Learned rules (statistical and linguistic)
• E.g., Brill tagger
Stochastic (probabilistic) tagging
• HMM (Hidden Markov Model) tagging (see the Viterbi sketch below)
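The following is a minimal Viterbi decoding sketch for HMM tagging on the running example. The tag set, transition, and emission probabilities are toy, hand-set numbers for illustration, not estimates from any corpus, and the transition rows are deliberately left unnormalized for brevity.

```python
# Minimal Viterbi decoding sketch for an HMM tagger.
# All probabilities below are illustrative toy values.

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `words`."""
    # v[i][t] = probability of the best tag path ending in tag t at word i
    v = [{t: start_p[t] * emit_p[t].get(words[0], 1e-8) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: v[i - 1][p] * trans_p[p][t])
            v[i][t] = (v[i - 1][best_prev] * trans_p[best_prev][t]
                       * emit_p[t].get(words[i], 1e-8))
            back[i][t] = best_prev
    # Trace back from the best final tag
    last = max(tags, key=lambda t: v[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy parameters (hypothetical numbers, unnormalized, for illustration)
tags = ["PRP", "VBD", "TO", "VB", "DT", "NN"]
start_p = {"PRP": 0.5, "VBD": 0.1, "TO": 0.05, "VB": 0.05, "DT": 0.25, "NN": 0.05}
trans_p = {s: {t: 1 / len(tags) for t in tags} for s in tags}  # uniform base
trans_p["PRP"]["VBD"] = 0.6; trans_p["VBD"]["TO"] = 0.6
trans_p["TO"]["VB"] = 0.8;   trans_p["VB"]["DT"] = 0.6
trans_p["DT"]["NN"] = 0.8
emit_p = {
    "PRP": {"she": 1.0}, "VBD": {"promised": 1.0}, "TO": {"to": 1.0},
    "VB": {"back": 0.7}, "NN": {"back": 0.1, "bill": 0.9}, "DT": {"the": 1.0},
}

print(viterbi(["she", "promised", "to", "back", "the", "bill"],
              tags, start_p, trans_p, emit_p))
# -> ['PRP', 'VBD', 'TO', 'VB', 'DT', 'NN']
```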
Rule-Based Tagging
1. Start with a dictionary of words and possible tags
2. Assign all possible tags to words using the dictionary
3. Write rules by hand to selectively remove tags
4. Stop when each word has exactly one (probably correct) tag
Start with a POS Dictionary
• she: PRP
• promised: VBN, VBD
• to: TO
• back: VB, JJ, RB, NN
• the: DT
• bill: NN, VB
Assign All Possible POS to Each Word
She/PRP promised/{VBN, VBD} to/TO back/{VB, JJ, RB, NN} the/DT bill/{NN, VB}
Apply Rules Eliminating Some POS
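A minimal sketch of steps 1–3 on the running example: the dictionary supplies every candidate tag, then a few hand-written rules (illustrative only, not the actual ENGTWOL constraints) strip tags until one remains per word, leaving She/PRP promised/VBD to/TO back/VB the/DT bill/NN.

```python
# Sketch of rule-based tagging: assign all dictionary tags, then
# apply hand-written elimination rules. Dictionary and rules are
# illustrative, not a real constraint-grammar rule set.

pos_dict = {
    "she": {"PRP"}, "promised": {"VBN", "VBD"}, "to": {"TO"},
    "back": {"VB", "JJ", "RB", "NN"}, "the": {"DT"}, "bill": {"NN", "VB"},
}

def tag(words):
    # Step 2: assign every possible tag from the dictionary
    candidates = [set(pos_dict[w]) for w in words]
    # Step 3: hand rules strip tags until one remains
    for i in range(len(words)):
        prev = candidates[i - 1] if i > 0 else set()
        # Rule: after infinitival TO, keep only the verb reading
        if prev == {"TO"} and "VB" in candidates[i]:
            candidates[i] = {"VB"}
        # Rule: directly after a determiner, keep only nominal tags
        if prev == {"DT"}:
            candidates[i] &= {"NN", "NNS", "JJ"}
        # Rule: a past form right after a subject pronoun is VBD, not VBN
        if prev == {"PRP"} and {"VBD", "VBN"} <= candidates[i]:
            candidates[i] = {"VBD"}
    return list(zip(words, candidates))

print(tag(["she", "promised", "to", "back", "the", "bill"]))
```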
N-Gram Taggers
• Unigram tagger: predicts the most frequent tag for every given token.
• Bigram tagger
  • Conditions on the previous word's tag as well as the word itself: assigns the tag most often seen for that (context, word) pair in training.
• Trigram tagger
  • Same process, but conditions on the previous two tags.
These taggers are typically chained with backoff, as sketched below.
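A sketch of the backoff chain in NLTK (assuming the treebank corpus sample has been downloaded): each tagger defers to the next simpler one whenever its own context was never seen in training.

```python
# Sketch: unigram/bigram/trigram taggers with backoff in NLTK.
# Assumes nltk.download('treebank') has been run.
import nltk
from nltk.corpus import treebank

train = treebank.tagged_sents()[:3000]
test = treebank.tagged_sents()[3000:]

t0 = nltk.DefaultTagger("NN")               # last resort: the most common tag
t1 = nltk.UnigramTagger(train, backoff=t0)  # most frequent tag per word
t2 = nltk.BigramTagger(train, backoff=t1)   # conditions on previous tag
t3 = nltk.TrigramTagger(train, backoff=t2)  # conditions on previous two tags

print(t3.accuracy(test))  # NLTK >= 3.6; older versions use .evaluate()
print(t3.tag(["She", "promised", "to", "back", "the", "bill"]))
```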
ML Classification
• Decision Trees and Rule Learning
• Naïve Bayes and Bayesian Networks
• Logistic Regression / Maximum Entropy (MaxEnt)
• Perceptron and Neural Networks
• Support Vector Machines (SVMs)
• Nearest-Neighbor / Instance-Based
A per-token classifier in this style is sketched below.
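The sketch below treats POS tagging as plain per-token classification, here with scikit-learn logistic regression over a few simple hand-picked features (our choice, for illustration). Note that each token is classified independently of the others, which is exactly the limitation discussed next.

```python
# Sketch: POS tagging as independent per-token classification
# with logistic regression (assumes nltk with the 'treebank'
# sample, and scikit-learn, are installed).
import nltk
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(words, i):
    w = words[i]
    return {
        "word": w.lower(),
        "suffix2": w[-2:],                                  # crude morphology
        "is_cap": w[0].isupper(),
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
    }

X, y = [], []
for sent in nltk.corpus.treebank.tagged_sents()[:2000]:
    words = [w for w, _ in sent]
    for i, (_, t) in enumerate(sent):
        X.append(features(words, i))
        y.append(t)

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=300))
clf.fit(X, y)

sent = ["She", "promised", "to", "back", "the", "bill"]
print(list(zip(sent, clf.predict([features(sent, i) for i in range(len(sent))]))))
```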
Beyond ML-Classification
Problems with Sequence Labeling as Classification
• Difficult to propagate uncertainty between decisions.
• Difficult to "collectively" determine the most likely joint assignment of categories.
Probabilistic sequence models allow uncertainty to be integrated over multiple, interdependent classifications, and the most likely joint assignment of tags to be determined collectively.
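For contrast with the per-token classifier above, the sketch below trains a supervised HMM with NLTK and tags the whole sentence jointly via Viterbi decoding (assuming the treebank sample is available; words unseen in training remain a weakness of this plain model).

```python
# Sketch: a supervised HMM tagger in NLTK that decodes the whole
# sequence jointly (assumes nltk.download('treebank')).
from nltk.corpus import treebank
from nltk.tag import HiddenMarkovModelTrainer

train = treebank.tagged_sents()[:3000]
hmm = HiddenMarkovModelTrainer().train_supervised(train)

# Viterbi decoding picks the best *joint* tag assignment
print(hmm.tag(["She", "promised", "to", "back", "the", "bill"]))
```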