NLP04 PartOfSpeechTagging
Summer Semester 2017 (SoSe 2017)
Part-of-speech tagging
Examples of POS tags
● Noun: book/books, nature, Germany, Sony
● Verb: eat, wrote
● Auxiliary: can, should, have
● Adjective: new, newer, newest
● Adverb: well, urgently
● Number: 872, two, first
● Article/Determiner: the, some
● Conjunction: and, or
● Pronoun: he, my
● Preposition: to, in
● Particle: off, up
● Interjection: Ow, Eh
Motivation: Speech Synthesis
● The word "content"
– "Eggs have a high protein content."
– "She was content to step down after four years as chief executive."
(http://www.thefreedictionary.com/content)
Motivation: Machine Translation
Motivation: Syntactic parsing
(http://nlp.stanford.edu:8080/parser/index.jsp)
Motivation: Information extraction
(http://www.nactem.ac.uk/tsujii/GENIA/tagger/)
Open vs. Closed Classes
● Closed
– Limited number of words; usually does not grow
– e.g., Auxiliary, Article, Determiner, Conjunction, Pronoun, Preposition, Particle, Interjection
● Open
– Unlimited number of words
– e.g., Noun, Verb, Adverb, Adjective
POS Tagsets
● There are many part-of-speech tagsets
● Tag types
– Coarse-grained
● Noun, verb, adjective, ...
– Fine-grained
● noun-proper-singular, noun-proper-plural, noun-common-mass, ...
● verb-past, verb-present-3rd, verb-base, ...
● adjective-simple, adjective-comparative, ...
POS Tagsets
● Brown tagset (87 tags)
– Brown corpus
● C5 tagset (61 tags)
● C7 tagset (146 tags!)
● Penn Treebank (45 tags) – most used
– The tagset of a large annotated corpus of English
POS Tagging
● The process of assigning a part of speech to each word in a text
● Challenge: words often have more than one POS
– On my back[NN] (noun)
– The back[JJ] door (adjective)
– Win the voters back[RB] (adverb)
– Promised to back[VB] the bill (verb)
Ambiguity in POS tags
● Brown corpus, 45-tag tagset (word types)
– Unambiguous (1 tag): 38,857
– Ambiguous: 8,844
● 2 tags: 6,731
● 3 tags: 1,621
● 4 tags: 357
● 5 tags: 90
● 6 tags: 32
● 7 tags: 6 (well, set, round, open, fit, down)
● 8 tags: 4 ('s, half, back, a)
● 9 tags: 3 (that, more, in)
Baseline method
● Assign each word the tag it occurs with most frequently in the training data
● This method achieves around 90% precision
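A minimal sketch of this baseline, assuming the usual most-frequent-tag heuristic; `tagged_corpus` (a list of (word, tag) pairs) and the NN default for unknown words are illustrative assumptions:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_corpus):
    """Learn each word's most frequent tag from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, most_frequent_tag, default="NN"):
    """Tag each word with its most frequent training tag; unknown words get the default."""
    return [(w, most_frequent_tag.get(w, default)) for w in words]
```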
POS Tagging
● The process of assigning a POS tag to each word in a text, i.e., choosing the best candidate tag for each word
– Plays (NNS/VBZ)
– well (UH/JJ/NN/RB)
– with (IN)
– others (NNS)
Rule-Based Tagging
● Standard approach (two steps; see the sketch after this list):
1. Dictionaries to assign a list of potential tags
● Plays (NNS/VBZ)
● well (UH/JJ/NN/RB)
● with (IN)
● others (NNS)
2. Hand-written rules to restrict each word to a single POS tag
● Plays (VBZ)
● well (RB)
● with (IN)
● others (NNS)
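A toy sketch of this two-step scheme; the lexicon entries and the two disambiguation rules are illustrative inventions, not taken from any real tagger:

```python
# Step 1: a dictionary assigns each word its list of potential tags.
LEXICON = {
    "plays":  ["NNS", "VBZ"],
    "well":   ["UH", "JJ", "NN", "RB"],
    "with":   ["IN"],
    "others": ["NNS"],
}

def tag_rule_based(words):
    tags = []
    for i, word in enumerate(words):
        candidates = LEXICON.get(word.lower(), ["NN"])
        # Step 2: hand-written rules restrict the candidates to one tag.
        # Toy rule: sentence-initial "plays" acts as the verb of the sentence.
        if word.lower() == "plays" and i == 0:
            choice = "VBZ"
        # Toy rule: "well" directly after a verb is an adverb.
        elif "RB" in candidates and i > 0 and tags[i - 1].startswith("VB"):
            choice = "RB"
        else:
            choice = candidates[0]
        tags.append(choice)
    return list(zip(words, tags))

print(tag_rule_based(["Plays", "well", "with", "others"]))
# -> [('Plays', 'VBZ'), ('well', 'RB'), ('with', 'IN'), ('others', 'NNS')]
```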
Rule-Based Tagging
● Some approaches rely on morphological parsing
– e.g., the EngCG tagger
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.57.972&rep=rep1&type=pdf)
Sequential modeling
Sequential modeling
● Making a decision based on (see the feature-extractor sketch after this list):
– Current observation:
● Word (W0): "35-years-old"
● Prefix, suffix: "computation" → "comp", "ation"
● Lowercased word: "New" → "new"
● Word shape: "35-years-old" → "d-a-a"
– Surrounding observations
● Words (W+1, W−1)
– Previous decisions
● POS tags (T−1, T−2)
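A sketch of such a feature extractor; the feature names and the sentence-boundary symbols `<s>`/`</s>` are illustrative choices:

```python
import re

def word_shape(word):
    """Collapse character classes: digit runs -> 'd', lowercase runs -> 'a',
    uppercase runs -> 'A'.  e.g. '35-years-old' -> 'd-a-a'."""
    shape = re.sub(r"\d+", "d", word)
    shape = re.sub(r"[a-z]+", "a", shape)
    shape = re.sub(r"[A-Z]+", "A", shape)
    return shape

def features(words, i, prev_tags):
    """Feature dict for position i: current word, affixes, shape,
    surrounding words (W+1, W-1) and previous decisions (T-1, T-2)."""
    w = words[i]
    return {
        "word": w,                                                    # W0
        "lower": w.lower(),                                           # lowercased word
        "prefix": w[:4],                                              # e.g. "comp"
        "suffix": w[-5:],                                             # e.g. "ation"
        "shape": word_shape(w),                                       # e.g. "d-a-a"
        "prev_word": words[i - 1] if i > 0 else "<s>",                # W-1
        "next_word": words[i + 1] if i + 1 < len(words) else "</s>",  # W+1
        "prev_tag": prev_tags[-1] if prev_tags else "<s>",            # T-1
        "prev2_tag": prev_tags[-2] if len(prev_tags) > 1 else "<s>",  # T-2
    }
```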
Sequential modeling
● Greedy inference (minimal decoder sketch below)
– Start at the beginning of the sequence
– Assign a label to each item using the classifier
– Use previous decisions as well as the observed data
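A minimal greedy decoder, assuming a hypothetical `classify` function that maps a feature dict (e.g. from the sketch above) to a tag:

```python
def greedy_tag(words, classify):
    """Left-to-right greedy decoding: commit to a single tag per position,
    feeding earlier decisions back in as features."""
    tags = []
    for i in range(len(words)):
        tags.append(classify(features(words, i, tags)))
    return tags
```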
Sequential modeling
● Beam inference (sketched below)
– Keep the top k label sequences at each position
– Extend each sequence in every possible local way
– Find the best k labels for the next position
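A beam-decoder sketch; `score(words, i, prev_tags, tag)` is a hypothetical classifier interface returning a log score for one local extension:

```python
def beam_tag(words, tagset, score, k=3):
    """Beam decoding: keep the k best partial tag sequences at each position."""
    beam = [([], 0.0)]  # (partial tag sequence, cumulative log score)
    for i in range(len(words)):
        candidates = []
        for tags, s in beam:
            for tag in tagset:  # extend each sequence in every local way
                candidates.append((tags + [tag], s + score(words, i, tags, tag)))
        # keep only the top k extensions for the next position
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beam[0][0]  # highest-scoring complete sequence
```

With k = 1 this reduces to greedy inference; larger k trades speed for a lower risk of committing to an early mistake.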
Hidden Markov Model (HMM)
● Probabilistic view
– Consider all possible sequences of tags
– Choose the tag sequence from this universe of sequences that is most probable given the observation sequence
Using the Bayes Rule
\hat{t}_1^n = \operatorname{argmax}_{t_1^n} P(t_1^n \mid w_1^n)

P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}

P(t_1^n \mid w_1^n) = \frac{P(w_1^n \mid t_1^n) \cdot P(t_1^n)}{P(w_1^n)}
Using the Markov Assumption
P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)   (each word depends only on its own POS tag, independent of the other words)

P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})   (each tag depends only on the previous POS tag, hence a bigram model)

\hat{t}_1^n = \operatorname{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i) \cdot P(t_i \mid t_{i-1})
Two Probabilities
P(t_i \mid t_{i-1}) = \frac{C(t_{i-1}, t_i)}{C(t_{i-1})}   (transition probability)
Two Probabilities
P(w_i \mid t_i) = \frac{C(t_i, w_i)}{C(t_i)}   (emission probability)
Two Probabilities
P([NN] \mid [DT]) = \frac{C([DT], [NN])}{C([DT])}

P(man \mid [NN]) = \frac{C([NN], man)}{C([NN])}
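A sketch of estimating both probabilities by maximum likelihood from a tagged corpus; `tagged_sents` (a list of sentences, each a list of (word, tag) pairs) and the `<s>` start marker are assumptions for illustration:

```python
from collections import Counter

def estimate_hmm(tagged_sents):
    """MLE counts for the two probabilities:
    P(t_i|t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
    P(w_i|t_i)     = C(t_i, w_i)     / C(t_i)"""
    tag_count, bigram_count, emit_count = Counter(), Counter(), Counter()
    for sent in tagged_sents:
        prev = "<s>"              # sentence-start marker
        tag_count[prev] += 1
        for word, tag in sent:
            bigram_count[(prev, tag)] += 1
            emit_count[(tag, word)] += 1
            tag_count[tag] += 1
            prev = tag
    trans = {bg: c / tag_count[bg[0]] for bg, c in bigram_count.items()}
    emit = {tw: c / tag_count[tw[0]] for tw, c in emit_count.items()}
    return trans, emit

# e.g. trans[("DT", "NN")] estimates P([NN]|[DT]);
#      emit[("NN", "man")] estimates P(man|[NN])
```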
Ambiguity in POS tagging
Ambiguity
Ambiguity
P(VB|TO) = 0.83
P(race|VB) = 0.00012
P(NR|VB) = 0.0027
P(VB|TO) · P(NR|VB) · P(race|VB) = 0.00000027
Ambiguity
P(NN|TO) = 0.00047
P(race|NN) = 0.00057
P(NR|NN) = 0.0012
P(NN|TO) · P(NR|NN) · P(race|NN) = 0.00000000032
The VB reading is several hundred times more probable, so "race" is tagged as a verb here.
Viterbi algorithm
● Probability matrix
– Columns correspond to inputs (words)
– Rows correspond to possible states (POS tags)
Viterbi algorithm
1. Move through the matrix in one pass, filling the columns left to right using the transition probabilities and observation probabilities
2. Store the maximum-probability path to each cell (not all paths) using dynamic programming
A worked example follows; an implementation sketch appears after it.
Viterbi example: "i want to race"

[Trellis: rows are the states q0 = start, q1 = PPSS, q2 = VB, q3 = TO, q4 = NN, qend = end; columns Q1–Q4 are the observations "i", "want", "to", "race".]

Initialization:
v_0(start) = 1.0

Recurrence:
v_t(j) = \max_{i=1}^{N} v_{t-1}(i) \cdot a_{ij} \cdot b_j(o_t)

First column (observation "i"):
v_1(PPSS) = P(PPSS|start) · v_0(start) · P(i|PPSS) = .067 · 1.0 · .37 = .025
v_1(VB) = P(VB|start) · v_0(start) · P(i|VB) = .019 · 1.0 · 0 = 0
v_1(TO) = P(TO|start) · v_0(start) · P(i|TO) = .043 · 1.0 · 0 = 0
v_1(NN) = P(NN|start) · v_0(start) · P(i|NN) = .041 · 1.0 · 0 = 0

Second column (observation "want"), state VB: the four incoming paths are
v_1(NN) · P(VB|NN) = 0 · .0040 = 0
v_1(TO) · P(VB|TO) = 0 · .83 = 0
v_1(VB) · P(VB|VB) = 0 · .0038 = 0
v_1(PPSS) · P(VB|PPSS) = .025 · .23 = .0055
so
v_2(VB) = max(0, 0, 0, .0055) · P(want|VB) = .0055 · .0093 = .000051

[The remaining columns for "to" and "race" are filled in the same way; the best tag sequence is then read off by backtracing the stored maximum-probability paths.]
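A compact implementation sketch matching the two steps above; the `trans`/`emit` dictionaries follow the hypothetical estimator sketched earlier, missing entries default to probability 0, and the backtrace assumes at least one nonzero path:

```python
def viterbi(words, tagset, trans, emit):
    """One left-to-right pass over the trellis; each cell stores the best
    path probability into that state plus a backpointer."""
    # First column: transition from the start state times the emission.
    V = [{t: (trans.get(("<s>", t), 0.0) * emit.get((t, words[0]), 0.0), None)
          for t in tagset}]
    # Remaining columns, filled left to right (step 1).
    for i in range(1, len(words)):
        col = {}
        for t in tagset:
            # Best incoming path: max over previous states of v * transition (step 2).
            prev = max(tagset, key=lambda p: V[i - 1][p][0] * trans.get((p, t), 0.0))
            best = V[i - 1][prev][0] * trans.get((prev, t), 0.0)
            col[t] = (best * emit.get((t, words[i]), 0.0), prev)
        V.append(col)
    # Backtrace from the most probable final state.
    last = max(tagset, key=lambda t: V[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(V[i][path[-1]][1])
    return list(reversed(path))
```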
POS tagging using machine learning
(https://link.springer.com/chapter/10.1007/11573036_36)
POS tagging using neural networks
(https://www.aclweb.org/anthology/P/P16/P16-2067.pdf)
Evaluation
● Corpus
– Training and test sets, and optionally also a development set
– Training (with cross-validation) and test set
● Evaluation
– Comparison of gold standard (GS) and predicted tags
– Evaluation in terms of Precision, Recall and F-Measure
Precision and Recall
● Precision:
– The proportion of labeled items that are correct
Precision = \frac{tp}{tp + fp}
● Recall:
– The proportion of correct items that have been labeled
Recall = \frac{tp}{tp + fn}
F-Measure
● There is a strong anti-correlation between precision and recall
● There is a trade-off between these two metrics
● The F-measure considers both metrics together (computed in the sketch below)
● The F-measure is a weighted harmonic mean of precision and recall
F = \frac{(\beta^2 + 1) \cdot P \cdot R}{\beta^2 \cdot P + R}
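A sketch of these three metrics; evaluating one tag at a time against parallel gold and predicted sequences is an illustrative choice:

```python
def precision_recall_f(gold, pred, tag, beta=1.0):
    """Precision, recall and F-beta for one tag, from parallel
    gold-standard and predicted tag sequences."""
    tp = sum(g == tag and p == tag for g, p in zip(gold, pred))
    fp = sum(g != tag and p == tag for g, p in zip(gold, pred))
    fn = sum(g == tag and p != tag for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = ((beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# e.g. precision_recall_f(["NN", "VB", "NN"], ["NN", "NN", "NN"], tag="NN")
# -> (0.666..., 1.0, 0.8)
```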
Error Analysis
● Confusion matrix or contingency table
– Shows what percentage of the overall tagging error each tag confusion contributes
Summary
Tools for POS tagging
● spaCy: https://spacy.io/
● OpenNLP: https://opennlp.apache.org/
● Stanford CoreNLP: https://stanfordnlp.github.io/CoreNLP/
● NLTK (Python): http://www.nltk.org/
● and others...
Further reading
● Jurafsky & Martin, Speech and Language Processing
– Chapter 5
Exercise
● Project: choose a POS tagger and use it in your project.
– Can POS tags support your task?