NLP04 PartOfSpeechTagging

The document discusses part-of-speech tagging, including motivation for its use in applications like speech synthesis, machine translation, syntactic parsing and information extraction. It covers topics like POS tagsets, ambiguity, baseline methods, rule-based and statistical tagging using hidden Markov models.

Natural Language Processing

SoSe 2017

Part-of-speech tagging

Dr. Mariana Neves May 22nd, 2017


Part-of-Speech (POS) Tags

● Also known as:

– Part-of-speech tags, lexical categories, word classes, morphological classes, lexical tags

Plays[VERB] well[ADVERB] with[PREPOSITION] others[NOUN]

Plays[VBZ] well[RB] with[IN] others[NNS]

2
Examples of POS tags


● Noun: book/books, nature, Germany, Sony
● Verb: eat, wrote
● Auxiliary: can, should, have
● Adjective: new, newer, newest
● Adverb: well, urgently
● Number: 872, two, first
● Article/Determiner: the, some
● Conjunction: and, or
● Pronoun: he, my
● Preposition: to, in
● Particle: off, up
● Interjection: Ow, Eh

3
Motivation: Speech Synthesis


● Word „content“
– „Eggs have a high protein content.“
– „She was content to step down after four years as chief executive.“

4 (http://www.thefreedictionary.com/content)
Motivation: Machine Translation

● e.g., translation from English to German:


– „I like ...“
● „Ich mag ….“ (verb)
● „Ich wie ...“ (preposition)

5
Motivation: Syntactic parsing

6
(http://nlp.stanford.edu:8080/parser/index.jsp)
Motivation: Information extraction

● Named-entity recognition (usually nouns)

7 (http://www.nactem.ac.uk/tsujii/GENIA/tagger/)
Motivation: Information extraction

● Relation extraction (triggers are usually verbs)

8 (http://www.nactem.ac.uk/tsujii/GENIA/tagger/)
Open vs. Closed Classes

● Closed
– Limited number of words; the class does not usually grow
– e.g., Auxiliary, Article, Determiner, Conjunction, Pronoun, Preposition, Particle, Interjection

● Open
– unlimited number of words
– e.g., Noun, Verb, Adverb, Adjective

9
POS Tagsets


● There are many part-of-speech tagsets

● Tag types
– Coarse-grained
● noun, verb, adjective, ...
– Fine-grained
● noun-proper-singular, noun-proper-plural, noun-common-mass, ...
● verb-past, verb-present-3rd, verb-base, ...
● adjective-simple, adjective-comparative, ...

10
POS Tagsets


● Brown tagset (87 tags)
– Brown corpus

● C5 tagset (61 tags)

● C7 tagset (146 tags!)

● Penn Treebank (45 tags) – the most widely used
– the tagset of the Penn Treebank, a large annotated corpus of English

11
POS Tagging


● The process of assigning a part of speech to each word in a text


Challenge: words often have more than one POS
– On my back[NN] (noun)
– The back[JJ] door (adjective)
– Win the voters back[RB] (adverb)
– Promised to back[VB] the bill (verb)

12
Ambiguity in POS tags


● Brown corpus with the 45-tag tagset (word types)
– Unambiguous (1 tag): 38,857
– Ambiguous: 8,844
● 2 tags: 6,731
● 3 tags: 1,621
● 4 tags: 357
● 5 tags: 90
● 6 tags: 32
● 7 tags: 6 (well, set, round, open, fit, down)
● 8 tags: 4 ('s, half, back, a)
● 9 tags: 3 (that, more, in)

13
Baseline method

1. Tag unambiguous words with their correct label
2. Tag ambiguous words with their most frequent label
3. Tag unknown words as nouns

● This method achieves around 90% accuracy

14
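The three-step baseline above can be sketched in a few lines of Python; the lexicon here is a small made-up stand-in for the word/tag counts one would collect from a tagged corpus:

```python
from collections import Counter

# Hypothetical per-word tag counts, standing in for counts from a tagged corpus
lexicon = {
    "the": Counter({"DT": 1000}),                                # unambiguous
    "back": Counter({"NN": 120, "VB": 40, "RB": 30, "JJ": 10}),  # ambiguous
}

def baseline_tag(word):
    counts = lexicon.get(word.lower())
    if counts is None:
        return "NN"                         # step 3: unknown words become nouns
    return counts.most_common(1)[0][0]      # steps 1-2: most frequent known tag

print([(w, baseline_tag(w)) for w in ["the", "back", "plutark"]])
```

With real corpus counts behind the lexicon, this simple strategy already gets most tokens right because so many word types are unambiguous.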
POS Tagging


The process of assigning a POS tag to each word in a text.
Choosing the best candidate tag for each word.

– Plays (NNS/VBZ)
– well (UH/JJ/NN/RB)
– with (IN)
– others (NNS)

– Plays[VBZ] well[RB] with[IN] others[NNS]

15
Rule-Based Tagging


Standard approach (two steps):
1. Dictionaries to assign a list of potential tags

Plays (NNS/VBZ)

well (UH/JJ/NN/RB)

with (IN)

others (NNS)
2. Hand-written rules to restrict to a POS tag

Plays (VBZ)

well (RB)

with (IN)

others (NNS)

16
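A minimal sketch of the two-step scheme, with a hypothetical candidate dictionary and two illustrative hand-written rules (these are toy rules for this one sentence, not the rules of any real tagger):

```python
# Step 1: a dictionary assigns each word its list of potential tags
candidates = {
    "plays": ["NNS", "VBZ"],
    "well": ["UH", "JJ", "NN", "RB"],
    "with": ["IN"],
    "others": ["NNS"],
}

def rule_tag(tokens):
    tags = []
    for i, tok in enumerate(tokens):
        cands = candidates.get(tok.lower(), ["NN"])
        if len(cands) == 1:
            tags.append(cands[0])        # unambiguous in the dictionary
        elif tok.lower() == "plays" and i == 0:
            tags.append("VBZ")           # rule: sentence-initial "Plays" acts as a verb here
        elif tok.lower() == "well" and tags and tags[-1].startswith("VB"):
            tags.append("RB")            # rule: "well" right after a verb is an adverb
        else:
            tags.append(cands[0])        # fallback: first candidate
    return tags

print(rule_tag(["Plays", "well", "with", "others"]))  # ['VBZ', 'RB', 'IN', 'NNS']
```

Real rule-based taggers such as EngCG use hundreds of hand-written constraints instead of the two toy rules shown.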
Rule-Based Tagging


Some approaches rely on morphological parsing
– e.g., EngCG Tagger below

….

17
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.57.972&rep=rep1&type=pdf)
Sequential modeling

● Many NLP techniques deal with data represented as sequences of items
– Characters, words, phrases, lines, …

● e.g., for part-of-speech tagging


– I[PRP] saw[VBD] the[DT] man[NN] on[IN] the[DT] roof[NN] .

● e.g., for named-entity recognition


– Steven[PER] Paul[PER] Jobs[PER] ,[O] co-founder[O] of[O]
Apple[ORG] Inc[ORG] ,[O] was[O] born[O] in[O] California[LOC].

18
Sequential modeling


Making a decision based on:
– Current observation:
● Word (W0): „35-years-old“
● Prefix, suffix: „computation“ → „comp“, „ation“
● Lowercased word: „New“ → „new“
● Word shape: „35-years-old“ → „d-a-a“
– Surrounding observations:
● Words (W+1, W−1)
– Previous decisions:
● POS tags (T−1, T−2)

19
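The observations listed above can be collected into a feature function for a sequential classifier; `word_shape` is a hypothetical helper that collapses digit, lowercase, and uppercase runs to d/a/A, matching the „35-years-old“ → „d-a-a“ example:

```python
def word_shape(s):
    # Collapse runs: digits -> 'd', lowercase -> 'a', uppercase -> 'A'
    # e.g. "35-years-old" -> "d-a-a", "New" -> "Aa"
    out = []
    for ch in s:
        c = "d" if ch.isdigit() else "a" if ch.islower() else "A" if ch.isupper() else ch
        if not (out and out[-1] == c and c in "daA"):
            out.append(c)
    return "".join(out)

def features(tokens, i, prev_tags):
    w = tokens[i]
    return {
        "w0": w,                                                  # current word
        "lower": w.lower(),                                       # lowercased word
        "prefix": w[:4],                                          # "computation" -> "comp"
        "suffix": w[-5:],                                         # "computation" -> "ation"
        "shape": word_shape(w),                                   # word shape
        "w-1": tokens[i - 1] if i > 0 else "<s>",                 # surrounding words
        "w+1": tokens[i + 1] if i + 1 < len(tokens) else "</s>",
        "t-1": prev_tags[-1] if prev_tags else "<s>",             # previous decisions
        "t-2": prev_tags[-2] if len(prev_tags) > 1 else "<s>",
    }

print(features(["He", "is", "35-years-old"], 2, ["PRP", "VBZ"]))
```

Each token thus yields one feature dictionary, which a classifier consumes position by position.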
Sequential modeling


Greedy inference
– Start at the beginning of the sequence
– Assign a label to each item using a classifier
– Use previous decisions as well as the observed data

20
Sequential modeling


Beam inference
– Keep the top k label sequences at each position
– Extend each sequence in every possible local way
– Keep the best k sequences for the next position

21
Hidden Markov Model (HMM)

● Finding the best sequence of tags (t₁…tₙ) that corresponds to the sequence of observations (w₁…wₙ)


Probabilistic view
– Considering all possible sequences of tags
– Choosing the tag sequence that is most probable given the observation sequence

t̂₁ⁿ = argmax_{t₁ⁿ} P(t₁ⁿ | w₁ⁿ)

22
Using the Bayes Rule

t̂₁ⁿ = argmax_{t₁ⁿ} P(t₁ⁿ | w₁ⁿ)

Bayes' rule:  P(A|B) = P(B|A) · P(A) / P(B)

P(t₁ⁿ | w₁ⁿ) = P(w₁ⁿ | t₁ⁿ) · P(t₁ⁿ) / P(w₁ⁿ)

Since P(w₁ⁿ) is the same for every tag sequence:

t̂₁ⁿ = argmax_{t₁ⁿ} P(w₁ⁿ | t₁ⁿ) · P(t₁ⁿ)
                    (likelihood)   (prior)

23
Using Markov Assumption

t̂₁ⁿ = argmax_{t₁ⁿ} P(w₁ⁿ | t₁ⁿ) · P(t₁ⁿ)

P(w₁ⁿ | t₁ⁿ) ≈ ∏ᵢ₌₁ⁿ P(wᵢ | tᵢ)   (a word depends only on its own POS tag, independent of the other words)

P(t₁ⁿ) ≈ ∏ᵢ₌₁ⁿ P(tᵢ | tᵢ₋₁)   (a tag depends only on the previous POS tag, thus a bigram)

t̂₁ⁿ = argmax_{t₁ⁿ} ∏ᵢ₌₁ⁿ P(wᵢ | tᵢ) · P(tᵢ | tᵢ₋₁)

24
Two Probabilities

● The tag transition probabilities: P(tᵢ|tᵢ₋₁)
– The likelihood of a tag given the preceding tag
– Similar to the normal bigram model

P(tᵢ | tᵢ₋₁) = C(tᵢ₋₁, tᵢ) / C(tᵢ₋₁)

25
Two Probabilities

● The word likelihood probabilities: P(wᵢ|tᵢ)
– The likelihood of a word appearing given its tag

P(wᵢ | tᵢ) = C(tᵢ, wᵢ) / C(tᵢ)

26
Two Probabilities

I[PRP] saw[VBD] the[DT] man[NN?] on[] the[] roof[] .

P([NN] | [DT]) = C([DT], [NN]) / C([DT])

P(man | [NN]) = C([NN], man) / C([NN])

27
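Both maximum-likelihood estimates can be computed directly from counts over a tagged corpus; the five-token corpus here is made up purely for illustration:

```python
from collections import Counter

# A tiny made-up tagged corpus: (word, tag) pairs
tagged = [("the", "DT"), ("man", "NN"), ("saw", "VBD"),
          ("the", "DT"), ("roof", "NN")]

tags = [t for _, t in tagged]
tag_counts = Counter(tags)                    # C(t)
bigram_counts = Counter(zip(tags, tags[1:]))  # C(t_prev, t)
word_tag_counts = Counter(tagged)             # C(t, w)

def p_transition(t, t_prev):
    # P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
    return bigram_counts[(t_prev, t)] / tag_counts[t_prev]

def p_emission(w, t):
    # P(w_i | t_i) = C(t_i, w_i) / C(t_i)
    return word_tag_counts[(w, t)] / tag_counts[t]

print(p_transition("NN", "DT"))   # both DT tokens are followed by NN
print(p_emission("man", "NN"))    # one of the two NN tokens is "man"
```

On this toy corpus P(NN|DT) = 2/2 = 1.0 and P(man|NN) = 1/2 = 0.5; a real tagger would also smooth these estimates.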
Ambiguity in POS tagging

Secretariat[NNP] is[VBZ] expected[VBN] to[TO] race[VB] tomorrow[NR] .

People[NNS] inquire[VB] the[DT] reason[NN] for[IN] the[DT] race[NN] .

28
Ambiguity

Secretariat[NNP] is[VBZ] expected[VBN] to[TO] race[?] tomorrow[NR] .

NNP VBZ VBN TO VB NR

Secretariat is expected to race tomorrow

NNP VBZ VBN TO NN NR

Secretariat is expected to race tomorrow

29
Ambiguity

Secretariat[NNP] is[VBZ] expected[VBN] to[TO] race[VB] tomorrow[NR] .

NNP VBZ VBN TO VB NR

Secretariat is expected to race tomorrow

P(VB|TO) = .83
P(race|VB) = .00012
P(NR|VB) = .0027
P(VB|TO) · P(NR|VB) · P(race|VB) = .00000027

30
Ambiguity

Secretariat[NNP] is[VBZ] expected[VBN] to[TO] race[VB] tomorrow[NR] .

NNP VBZ VBN TO NN NR

Secretariat is expected to race tomorrow

P(NN|TO) = .00047
P(race|NN) = .00057
P(NR|NN) = .0012
P(NN|TO) · P(NR|NN) · P(race|NN) = .00000000032

31
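The two products above are easy to verify directly; the constants are the corpus-derived probabilities quoted on the slides:

```python
# Corpus-derived probabilities quoted on the slides
p_vb = 0.83 * 0.00012 * 0.0027     # P(VB|TO) * P(race|VB) * P(NR|VB)
p_nn = 0.00047 * 0.00057 * 0.0012  # P(NN|TO) * P(race|NN) * P(NR|NN)

print(f"VB reading: {p_vb:.2e}")   # about 2.7e-07
print(f"NN reading: {p_nn:.2e}")   # about 3.2e-10
print(p_vb > p_nn)                 # the HMM prefers race/VB here
```

The verb reading wins by roughly three orders of magnitude, which is why the HMM tags "race" as VB in this context.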
Viterbi algorithm

● Decoding algorithm for HMMs
– Determines the best sequence of POS tags

● Probability matrix
– Columns corresponding to inputs (words)
– Rows corresponding to possible states (POS tags)

32
Viterbi algorithm

1. Move through the matrix in one pass filling the columns left to
right using the transition probabilities and observation
probabilities
2. Store the max probability path to each cell (not all paths) using
dynamic programming

33
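The two steps above can be sketched compactly in log space, with plain dictionaries standing in for the transition and emission tables ("<s>" as a start state and a tiny floor probability for unseen entries are assumptions of this sketch; the toy numbers reuse the "to race" probabilities from the earlier slides):

```python
import math

def viterbi(words, tags, p_trans, p_emit, floor=1e-12):
    """Most probable tag sequence for `words` under a bigram HMM.

    p_trans[(t_prev, t)] = P(t | t_prev), with "<s>" as the start state;
    p_emit[(t, w)] = P(w | t). Unseen entries fall back to `floor`.
    """
    def lt(tp, t):              # log transition probability
        return math.log(p_trans.get((tp, t), floor))

    def le(t, w):               # log emission probability
        return math.log(p_emit.get((t, w), floor))

    # First column: transitions out of the start state
    V = [{t: lt("<s>", t) + le(t, words[0]) for t in tags}]
    back = [{}]
    # Fill the columns left to right, storing one backpointer per cell
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda tp: V[i - 1][tp] + lt(tp, t))
            V[i][t] = V[i - 1][prev] + lt(prev, t) + le(t, words[i])
            back[i][t] = prev
    # Backtrace from the best final cell
    best = max(tags, key=lambda t: V[-1][t])
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy tables reusing the "to race" numbers from the slides
p_trans = {("<s>", "TO"): 0.1, ("TO", "VB"): 0.83, ("TO", "NN"): 0.00047}
p_emit = {("TO", "to"): 1.0, ("VB", "race"): 0.00012, ("NN", "race"): 0.00057}
print(viterbi(["to", "race"], ["TO", "VB", "NN"], p_trans, p_emit))  # ['TO', 'VB']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.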
[Viterbi lattice: rows are the states q0=start, q1=PPSS, q2=VB, q3=TO, q4=NN, qend=end; columns Q1–Q4 are the observations "i want to race"]

34
[Same lattice] vt−1: previous Viterbi path probability (from the previous time step); the start cell is initialized with v0(0) = 1.0

35
aij: transition probability (from previous state qi to current state qj)
bj(ot): state observation likelihood (observation ot given the current state j)

First column — transitions out of the start state:
P(NN|start)·P(start) = .041·1.0 = .041
P(TO|start)·P(start) = .0043·1.0 = .0043
P(VB|start)·P(start) = .019·1.0 = .019
P(PPSS|start)·P(start) = .067·1.0 = .067

36
vt(j) = max_{i=1..N} vt−1(i) · aij · bj(ot)

First column, observation "i":
v1(4) = P(NN|start)·P(start)·P(I|NN) = .041·0 = 0
v1(3) = P(TO|start)·P(start)·P(I|TO) = .0043·0 = 0
v1(2) = P(VB|start)·P(start)·P(I|VB) = .019·0 = 0
v1(1) = P(PPSS|start)·P(start)·P(I|PPSS) = .067·.37 = .025

37
Second column, observation "want" — extending each path into state VB:
v1(4)·P(VB|NN) = 0·.0040 = 0
v1(3)·P(VB|TO) = 0·.83 = 0
v1(2)·P(VB|VB) = 0·.0038 = 0
v1(1)·P(VB|PPSS) = .025·.23 = .0055

38
v2(2) = max(0, 0, 0, .0055) · P(want|VB) = .0055·.0093 = .000051

39
[The same procedure fills the remaining columns of the lattice]

40
[Lattice shown completed over all four observations "i want to race"]

41
[The best tag sequence is recovered by following the stored max-probability paths back from the end state]

42
POS tagging using machine learning

● Classification problem (token by token) using a rich set of features

(https://link.springer.com/chapter/10.1007/11573036_36)

43
POS tagging using neural networks

● e.g., using a bidirectional Long Short-Term Memory recurrent neural network (bi-LSTM)
● Input based on tokens, characters and bytes

44 (https://www.aclweb.org/anthology/P/P16/P16-2067.pdf)
Evaluation

● Corpus
– Training and test, and optionally also development set
– Training (cross-validation) and test set

● Evaluation
– Comparison of gold standard (GS) and predicted tags
– Evaluation in terms of Precision, Recall and F-Measure

45
Precision and Recall


Precision:
– Proportion of labeled items that are correct

Precision = tp / (tp + fp)

Recall:
– Proportion of correct items that have been labeled

Recall = tp / (tp + fn)

46
F-Measure


There is a strong anti-correlation between precision and recall

There is a trade-off between these two metrics

F-measure considers both metrics together

F-measure is a weighted harmonic mean of precision and recall:

F = (β² + 1) · P · R / (β² · P + R)

47
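The three formulas can be wrapped in a small helper (β = 1 gives the usual balanced F1); the counts below are hypothetical:

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    precision = tp / (tp + fp)          # labeled items that are correct
    recall = tp / (tp + fn)             # correct items that were labeled
    # Weighted harmonic mean: F = (beta^2 + 1) * P * R / (beta^2 * P + R)
    f = (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f

# Hypothetical tagger output: 90 correct labels, 10 spurious, 30 missed
p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.9 0.75 0.82
```

Setting β > 1 weights recall more heavily, β < 1 weights precision more heavily.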
Error Analysis


Confusion matrix or contingency table
– Cells: percentage of the overall tagging error (rows: correct tag, columns: predicted tag)

      IN    JJ    NN    NNP   RB    VBD   VBN
IN    -     .2                .7
JJ    .2    -     3.3   2.1   1.7   .2    2.7
NN          8.7   -                       .2
NNP   .2    3.3   4.1   -           .2
RB    2.2   2.0   .5          -
VBD         .3    .5                -     4.4
VBN         2.8                     2.6   -

48
Summary

● POS tagging and tagsets


● Rule-based algorithms
● Sequential algorithms
● Neural networks
● Evaluation (P,R,FM)

49
Tools for POS tagging


spaCy: https://spacy.io/

OpenNLP: https://opennlp.apache.org/

Stanford CoreNLP: https://stanfordnlp.github.io/CoreNLP/

NLTK (Python): http://www.nltk.org/

and others...

50
Further reading


Book: Jurafsky & Martin, Speech and Language Processing
– Chapter 5

51
Exercise


Project: choose a POS tagger and use it in your project.
– Can POS tags support your task?

52
