Lecture 16-17-18-19
Contents to be Covered
• Part-of-speech tagging
• Rule-based part-of-speech tagging
• Transformation-based tagging
PART OF SPEECH TAGGING
• Introduction
• Part-of-speech (POS) tagging
• Rule-based taggers
• Statistical taggers
• Hybrid approaches
INTRODUCTION 1
Content
1. Introduction to Human Language Technology
2. Applications
3. Resources
4. Language models
5. Morphology and lexicons
6. Syntactic processing
7. Semantic processing
8. Generation
INTRODUCTION 2
- Parts of speech (POS), also called word classes, morphological classes, or lexical tags, give information about a word and its neighbors.

Open class:
- Nouns: people, places, and things; proper nouns, common nouns, count nouns, and mass nouns
- Verbs: actions and processes; main verbs, not auxiliaries
- Adjectives: properties
- Adverbs
PART OF SPEECH CATEGORIES
[Slides 2-5 presented the part-of-speech categories as figures in the original deck; not recoverable as text.]
PART OF SPEECH TAGGING 2
Example: ENGTWOL-style analysis of "Pavlov had shown that salivation ...", listing every reading of each word before disambiguation:

Pavlov      PAVLOV N SG PROPER
had         HAVE V PAST VFIN SVO   (verb with subject and object)
            HAVE PCP2 SVO          (PCP2 = past participle)
shown       SHOW PCP2 SVO SV SVOO  (verb with subject and two complements)
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS                     (subordinating conjunction)
salivation  SALIVATION N SG
PART OF SPEECH TAGGING 3
POS taggers:
• Rule-based
• Statistical
• Hybrid
PART OF SPEECH TAGGING 6
W = w1 w2 … wn : a sequence of words
T = t1 t2 … tn : a sequence of POS tags
Tagging is a function f : W → T, that is, T = f(W)
RULE-BASED TAGGERS 1
Brill’s set of templates: “Change tag a to tag b when:”
• The preceding (following) word is tagged z.
• The word two before (after) is tagged z.
• One of the two preceding (following) words is tagged z.
• One of the three preceding (following) words is tagged z.
• The preceding word is tagged z and the following word is tagged w.
• The preceding (following) word is tagged z and the word two before (after) is tagged w.
Here a, b, z and w are part-of-speech tags.
Rules are automatically induced from a tagged corpus (a minimal sketch of rule application follows this list).
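To make the template mechanics concrete, here is a minimal sketch of applying one induced transformation; the function name and toy data are illustrative assumptions, not Brill's original implementation:

```python
# Toy illustration of one Brill-style transformation:
# "Change tag a to tag b when the preceding word is tagged z."

def apply_transformation(tags, a, b, z):
    """Rewrite tag a as tag b wherever the previous tag is z."""
    new_tags = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == z:
            new_tags[i] = b
    return new_tags

# Baseline tagging from a most-frequent-tag annotator:
# "race" is mistagged NN after TO.
words = ["to", "race", "tomorrow"]
tags = ["TO", "NN", "NR"]

# Induced rule: change NN to VB when the preceding tag is TO.
print(apply_transformation(tags, a="NN", b="VB", z="TO"))
# -> ['TO', 'VB', 'NR']
```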
RULE-BASED TAGGERS 2
ADVERBIAL-THAT RULE
Given input: “that”
if
  (+1 A/ADV/QUANT)   /* the next word is an adjective, adverb, or quantifier */
  (+2 SENT-LIM)      /* and the one after that is a sentence boundary */
  (NOT -1 SVOC/A)    /* and the previous word is not a verb like ‘consider’,
                        which allows adjectives as object complements */
then eliminate non-ADV tags
else eliminate ADV tag

Example: in the sentence “I consider that odd”, that will not be tagged as an adverb (ADV); a rough Python rendering follows.
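A rough rendering of this constraint, assuming each token carries a set of candidate tags; the representation and names here are illustrative, not the actual CG engine:

```python
# Reductionist constraint: the rule removes readings, never adds them.

ADJ_ADV_QUANT = {"A", "ADV", "QUANT"}

def adverbial_that_rule(tokens, i, candidates):
    """Disambiguate 'that' at position i; candidates[i] is its tag set."""
    next_is_mod = i + 1 < len(tokens) and candidates[i + 1] & ADJ_ADV_QUANT
    then_boundary = i + 2 >= len(tokens)        # (+2 SENT-LIM)
    prev_is_svoc = i > 0 and "SVOC/A" in candidates[i - 1]
    if next_is_mod and then_boundary and not prev_is_svoc:
        return candidates[i] & {"ADV"}          # eliminate non-ADV tags
    return candidates[i] - {"ADV"}              # eliminate the ADV tag

tokens = ["I", "consider", "that", "odd"]
candidates = [{"PRON"}, {"V", "SVOC/A"}, {"ADV", "DET", "CS"}, {"A"}]
print(adverbial_that_rule(tokens, 2, candidates))   # {'DET', 'CS'}: no ADV
```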
RULE-BASED TAGGERS 3
+ Linguistically motivated rules
+ High precision (e.g. EngCG reaches 99.5%)
– High development cost
– Not portable to other languages or tagsets
– High time cost of tagging

Systems:
• TAGGIT (Greene & Rubin, 1971)
• TOSCA (Oostdijk, 1991)
• Constraint Grammars, EngCG (Voutilainen, 1994; Karlsson et al., 1995)
• AMBILIC (de Yzaguirre et al., 2000)
RULE-BASED TAGGERS 4
Constraint Grammars (CG)
RULE-BASED TAGGERS 5
Constraint Grammars (CG)
• ENGCG / ENGTWOL: reductionist POS tagging with some 1,100 constraints
• 93-97% of the words are fully disambiguated, with 99.7% accuracy
• Heuristic rules can then be applied to the remaining 2-3% residual ambiguity, with 99.6% precision
• CG also includes a syntactic component
STATISTICAL POS TAGGING 1
The goal is to find the most probable tag sequence t1..tn given the observed sequence of n words w1..wn, that is, the sequence for which P(t1..tn | w1..wn) is highest.

But P(t1..tn | w1..wn) is difficult to compute directly, so the Bayesian classification rule is used:

P(x | y) = P(x) P(y | x) / P(y)

Applied to the sequence of words, the most probable tag sequence becomes

argmax over t1..tn of P(t1..tn) P(w1..wn | t1..tn) / P(w1..wn)

where P(w1..wn) is the same for every candidate tag sequence and thus does not need to be calculated.

Thus, the most probable tag sequence maximizes the product of two probabilities for each possible sequence:
- the prior probability of the tag sequence (context): P(t1..tn)
- the likelihood of the sequence of words given the sequence of (hidden) tags: P(w1..wn | t1..tn)
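A brute-force sketch makes this argmax concrete: enumerate every candidate tag sequence and score it as P(tags) × P(words | tags). All probability values below are made-up toy numbers for illustration only; the exponential cost of this enumeration is what motivates the Viterbi algorithm introduced later:

```python
from itertools import product

# Toy tag set with made-up probabilities (illustrative only).
TAGS = ["NN", "VB"]
prior = {("NN", "NN"): 0.4, ("NN", "VB"): 0.2,   # P(t1..tn) per sequence
         ("VB", "NN"): 0.3, ("VB", "VB"): 0.1}
likelihood = {("race", "NN"): 0.6, ("race", "VB"): 0.4,   # P(w | t)
              ("runs", "NN"): 0.1, ("runs", "VB"): 0.9}

def best_tag_sequence(words):
    """argmax over all tag sequences of P(tags) * P(words | tags)."""
    best, best_p = None, 0.0
    for tags in product(TAGS, repeat=len(words)):
        p = prior.get(tags, 0.0)
        for w, t in zip(words, tags):
            p *= likelihood.get((w, t), 0.0)
        if p > best_p:
            best, best_p = tags, p
    return best, best_p

print(best_tag_sequence(["race", "runs"]))   # (('NN', 'VB'), 0.108)
```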
STATISTICAL POS TAGGING 2
Two simplifications make computing the most probable sequence of tags tractable:
- The prior probability of a word's tag depends only on the tag of the previous word (bigrams: the context is reduced to the previous tag). This facilitates the computation of P(t1..tn).
- The probability of a word depends only on its own tag, not on surrounding words or tags. This facilitates the computation of P(w1..wn | t1..tn).
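In standard notation (as in the Jurafsky & Martin textbook cited below), the two assumptions and the resulting objective are:

$$P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}), \qquad P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$$

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$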
STATISTICAL POS TAGGING 3
• Secretariat/NNP is/BEZ expected/VBN to/TO race/VB tomorrow/NR
• People/NNS continue/VB to/TO inquire/VB the/AT reason/NN for/IN the/AT race/NN for/IN outer/JJ space/NN
STATISTICAL POS TAGGING 5
Formalization of a Hidden Markov Model:
Q = q1 q2 … qN : a set of N states
A = a11 a12 … an1 … ann : a transition probability matrix, each aij representing the probability of moving from state i to state j, with Σj aij = 1 for all i
O = o1 o2 … oT : a sequence of T observations, each drawn from a vocabulary V = v1, v2, …, vV
B = bi(ot) : a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation ot being generated from a state i
q0, qF : a special start state and final state, not associated with observations
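A compact Viterbi decoder over this (Q, A, B) formalization; a minimal sketch assuming dictionary-based parameters (the function and variable names are illustrative):

```python
def viterbi(words, tags, A, B, pi):
    """Most probable tag sequence under a bigram HMM.

    A[s][t] : transition probability of tag t after tag s
    B[t][w] : emission probability of word w from tag t
    pi[t]   : probability of tag t at the start of the sentence
    """
    # Initialization: start distribution times first emission.
    V = [{t: pi.get(t, 0.0) * B.get(t, {}).get(words[0], 0.0) for t in tags}]
    back = []
    # Recursion: extend the best path into each state.
    for w in words[1:]:
        col, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: V[-1][s] * A.get(s, {}).get(t, 0.0))
            col[t] = (V[-1][prev] * A.get(prev, {}).get(t, 0.0)
                      * B.get(t, {}).get(w, 0.0))
            ptr[t] = prev
        V.append(col)
        back.append(ptr)
    # Termination: pick the best final state and trace pointers backwards.
    path = [max(tags, key=lambda t: V[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```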
STATISTICAL POS TAGGING 8
[Table omitted: tag transition probabilities P(ti | ti-1) for VB, TO, NN, PPSS, computed from the 87-tag Brown corpus without smoothing. The rows are labeled with the conditioning event; thus P(PPSS|VB) is .0070. The symbol <s> is the start-of-sentence symbol.]
STATISTICAL POS TAGGING 9
Observation likelihoods P(wi | ti), from the same Brown corpus counts:

        I      want      to     race
VB      0      .0093     0      .00012
TO      0      0         .99    0
NN      0      .000054   0      .00057
PPSS    .37    0         0      0
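Combining these emissions with the transition estimates quoted for this example in Jurafsky & Martin (P(VB|TO) = .83, P(NN|TO) = .00047), the verb reading of race wins:

P(VB|TO) × P(race|VB) = .83 × .00012 = .00010
P(NN|TO) × P(race|NN) = .00047 × .00057 = .00000027

so the HMM correctly prefers to/TO race/VB over to/TO race/NN.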
STATISTICAL POS TAGGING 12
Data-driven approaches:
• The language model and smoothing are learned automatically from tagged corpora (supervised learning)
• N-grams and Hidden Markov Models (Charniak, 1993; Jelinek, 1998)
• Supervised machine learning (Manning & Schütze, 1999)
– Semi-supervised learning: the Forward-Backward (Baum-Welch) algorithm
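Since the course's Python textbook (T2 below) is NLTK-based, a brief supervised-training example with NLTK's HMM tagger may help; this assumes NLTK 3.x and the Penn Treebank sample corpus are available:

```python
import nltk
from nltk.corpus import treebank
from nltk.tag.hmm import HiddenMarkovModelTrainer

nltk.download("treebank")  # one-time download of the corpus sample

# Supervised learning: estimate transition and emission probabilities
# directly from tagged sentences.
train_sents = treebank.tagged_sents()[:3000]
tagger = HiddenMarkovModelTrainer().train_supervised(train_sents)

print(tagger.tag("the race is on".split()))
```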
HYBRID SYSTEMS 1
• Maximum Entropy (Ratnaparkhi, 1998; Rosenfeld, 1994; Ristad, 1997)
• Combines several knowledge sources
• No independence between features is assumed
• A high number of parameters is allowed (e.g. lexical features)
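A minimal sketch of a maximum-entropy-style tagger using scikit-learn's LogisticRegression (multinomial logistic regression is the same model family as MaxEnt); the feature set is a small illustrative subset of the overlapping features such models permit, and the toy training data is assumed:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(sent, i):
    """Overlapping, non-independent features: exactly what MaxEnt allows."""
    w = sent[i]
    return {
        "word": w.lower(),
        "suffix3": w[-3:].lower(),
        "is_capitalized": w[0].isupper(),
        "prev_word": sent[i - 1].lower() if i > 0 else "<s>",
        "next_word": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
    }

# Tiny toy training set (gold tags per word).
train = [(["I", "want", "to", "race"], ["PPSS", "VB", "TO", "VB"]),
         (["the", "race", "is", "on"], ["AT", "NN", "BEZ", "IN"])]
X = [features(s, i) for s, _ in train for i in range(len(s))]
y = [t for _, ts in train for t in ts]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict([features(["to", "race"], 1)]))   # expected: ['VB']
```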
HYBRID SYSTEMS 2
Brill’s system
OTHER COMPLEX SYSTEMS 1
• Decision trees (Black & Magerman, 1992; Magerman, 1996; Màrquez, 1999; Màrquez & Rodríguez, 1997)
  – supervised learning, e.g. TreeTagger
• Case-based / memory-based learning: TiMBL (Daelemans et al., 1996)
• Relaxation labelling: statistical and linguistic constraints, e.g. RELAX (Padró, 1997)
OTHER COMPLEX SYSTEMS 2
Combining taggers:
• Combination of language models in a single tagger (Màrquez & Rodríguez, 1998; Màrquez, 1999; Padró, 1997)
  – STT+
  – RELAX
• Combination of taggers through voting (Màrquez et al., 1998)
• Combination of classifiers (Brill & Wu, 1998)
  – bootstrapping
  – bagging (Breiman, 1996; Màrquez et al., 1999)
  – boosting (Freund & Schapire, 1996; Abney et al., 1999)
Reference:
Books:
TEXTBOOKS:
T1: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin
T2: Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
REFERENCE BOOKS:
R1: Handbook of Natural Language Processing, Second Edition, edited by Nitin Indurkhya and Fred J. Damerau
Course Link:
https://in.coursera.org/specializations/natural-language-processing
Video Link:
https://youtu.be/YVQcE5tV26s
Web Link:
https://www.tutorialspoint.com/natural_language_processing/natural_language_processing_tutorial.pdf
Thank you