Unit 3
1
Objective
●
Types of PoS Tagger
– Rule-Based PoS Tagging
– Stochastic PoS Tagging
– Transformation-Based Tagging
●
Hidden Markov Model
2
Reference / Reading
●
Chapter 8
Speech and Language Processing. Daniel Jurafsky
& James H. Martin
– https://web.stanford.edu/~jurafsky/slp3/old_oct19/8.pdf
3
What is PoS?
●
A category to which a word is assigned in
accordance with its syntactic functions
●
The role a word plays in a sentence denotes
what part of speech it belongs to
●
In English the main parts of speech are
– noun, pronoun, adjective, determiner, verb, adverb,
preposition, conjunction, and interjection
4
Part of Speech
●
Noun
●
Adjective
●
Adverb
●
Verb
●
Preposition
●
Pronoun
●
Conjunctions
●
Interjections
5
PoS Tagsets
●
There are many parts of speech tagsets
●
Tag types
– Coarse-grained
●
Noun, verb, adjective, …
– Fine-grained
●
noun-proper-singular, noun-proper-plural, noun-common-mass, ..
●
verb-past, verb-present-3rd, verb-base, …
●
adjective-simple, adjective-comparative, ...
6
PoS Tagsets
●
Brown tagset (87 tags)
– Brown corpus
– https://en.wikipedia.org/wiki/Brown_Corpus
●
C5 tagset (61 tags)
●
C7 tagset (146 tags!)
●
Penn TreeBank (45 tags) – most used
– A large annotated corpus of English
– https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
●
UPenn TreeBank II - 36 tags
7
PoS Tag: Challenge
●
Words often have more than one POS
●
Ambiguity in POS tags
●
Out-of-vocabulary (OOV) words
●
Complex grammatical structure of the language
●
Lack of annotated datasets
●
Inconsistencies in annotated datasets
8
Types of PoS Taggers
●
There are different algorithms for tagging.
– Rule Based Tagging
– Transformation Based Tagging
– Statistical Tagging (HMM Part-of-Speech Tagging)
9
Rule-Based POS tagging
●
The rule-based approach uses handcrafted sets
of rules to tag the input sentence
●
There are two stages in rule-based taggers:
– First Stage: Uses a dictionary to assign each word a
list of potential parts-of-speech
– Second Stage: Uses a large list of handcrafted rules
to winnow down this list to a single part-of-speech
for each word
10
Rule-Based POS tagging
●
ENGTWOL is a rule-based tagger
– In the first stage, uses a two-level lexicon transducer
– In the second stage, uses hand-crafted rules (about
1100 rules).
●
Rule-1: if (the previous tag is an article)
then eliminate all verb tags
●
Rule-2: if (the next tag is verb)
then eliminate all verb tags
11
Rule-Based POS tagging
●
Example: He had a fly.
●
The first stage:
– He → he/pronoun
– had → have/verbpast have/auxiliarypast
– a → a/article
– fly → fly/verb fly/noun
●
The second stage:
– apply rule: if (the previous tag is an article) then eliminate all verb tags
●
he → he/pronoun
●
had → have/verbpast have/auxiliarypast
●
a → a/article
●
fly → fly/noun (the verb reading is eliminated; see the sketch below)
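A minimal Python sketch of the two-stage idea above; the toy lexicon and the single constraint rule stand in for ENGTWOL's two-level lexicon and its roughly 1100 hand-crafted rules.

LEXICON = {
    "he":  ["pronoun"],
    "had": ["verb-past", "auxiliary-past"],
    "a":   ["article"],
    "fly": ["verb", "noun"],
}

def rule_based_tag(sentence):
    # Stage 1: look up every word's candidate tags in the lexicon.
    candidates = [list(LEXICON[w.lower()]) for w in sentence]
    # Stage 2: apply hand-crafted constraint rules to prune the candidates.
    for i in range(1, len(candidates)):
        # Rule 1: if the previous word is an article, eliminate all verb tags.
        if "article" in candidates[i - 1]:
            pruned = [t for t in candidates[i] if not t.startswith("verb")]
            if pruned:                      # never prune away every reading
                candidates[i] = pruned
    return list(zip(sentence, candidates))

print(rule_based_tag(["He", "had", "a", "fly"]))
# [('He', ['pronoun']), ('had', ['verb-past', 'auxiliary-past']),
#  ('a', ['article']), ('fly', ['noun'])]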
12
Transformation-based tagging
●
Transformation-based tagging is also known as
Brill Tagging.
– Brill Tagging uses transformation rules, and the rules are
learned from a tagged corpus.
– These learned rules are then used in tagging.
●
Before the rules are applied, the tagger labels
every word with its most likely tag.
– We get these most likely tags from a tagged corpus.
13
Transformation-based tagging
●
Example:
– He is expected to race tomorrow
He/PRP is/VBZ expected/VBN to/TO race/NN tomorrow/NN
●
After selecting most-likely tags, we apply
transformation rules.
– Change NN to VB when the previous tag is TO
●
This rule converts race/NN into race/VB
●
This may not work for every case, e.g.,
"..... according to race", where race should stay a noun (see the sketch below)
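A minimal sketch of applying one learned transformation to the initial most-likely tagging; the sentence, initial tags, and rule are the ones shown on this slide.

words = ["He", "is", "expected", "to", "race", "tomorrow"]
tags  = ["PRP", "VBZ", "VBN", "TO", "NN", "NN"]   # initial most-likely tags

def apply_transformation(tags, from_tag, to_tag, prev_tag):
    # "Change from_tag to to_tag when the previous tag is prev_tag."
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out

print(apply_transformation(tags, "NN", "VB", "TO"))
# ['PRP', 'VBZ', 'VBN', 'TO', 'VB', 'NN']  -> race/NN becomes race/VB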
14
Brill Tagger – How rules are learned?
●
We assume that we have a tagged corpus. The Brill Tagger
algorithm has three major steps.
– Tag the corpus with the most likely tag for each word (unigram model)
– Choose a transformation that deterministically replaces an existing tag
with a new tag such that the resulting tagged training corpus has the
lowest error rate out of all transformations.
– Apply the transformation to the training corpus.
●
These steps are repeated until a stopping criterion is reached
●
The result (which will be our tagger) will be:
– First, tag each word with its most-likely tag
– Then apply the learned transformations in the order they were learned (see the sketch below)
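A high-level sketch of the greedy learning loop described above. Here the training corpus is simplified to a flat list of words with gold tags, and each candidate rule is assumed to be a function that maps a tag sequence to a new tag sequence (instantiated from templates such as those on the next slide).

def tagging_error(tagged, gold):
    # Number of positions where the current tags disagree with the gold tags.
    return sum(t != g for t, g in zip(tagged, gold))

def learn_brill_rules(words, gold_tags, most_likely_tag, candidate_rules, max_rules=20):
    # Step 1: tag the corpus with the most likely (unigram) tag for each word.
    current = [most_likely_tag[w] for w in words]
    learned = []
    for _ in range(max_rules):
        # Step 2: choose the transformation that yields the lowest error rate.
        best = min(candidate_rules,
                   key=lambda rule: tagging_error(rule(current), gold_tags))
        if tagging_error(best(current), gold_tags) >= tagging_error(current, gold_tags):
            break                              # stopping criterion: no improvement
        # Step 3: apply it to the training corpus and record it, in order.
        current = best(current)
        learned.append(best)
    return learned                             # apply these, in order, at tagging time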
15
Brill Tagger – Transformation Rules?
●
Change tag a to tag b when
– The preceding (following) word is tagged z.
– The word two before (after) is tagged z.
– One of two preceding (following) words is tagged z.
– One of three preceding (following) words is tagged z.
– The preceding word is tagged z and the following word is
tagged w.
– The preceding (following) word is tagged z and the word two
before (after) is tagged w.
16
Methods of PoS Tagging
●
Stochastic (Probabilistic) tagging
– e.g., TnT [Brants, 2000]
●
Trigrams'n'Tags
●
Based on a Markov model
– Original paper: https://dl.acm.org/doi/pdf/10.3115/974147.974178
17
Hidden Markov Model based PoS
Tagging
18
Markov Chains
●
A Markov chain is a model that tells us
something about the probabilities of sequences
of states (random variables)
– A Markov chain makes a very strong assumption
that if we want to predict the future in the sequence,
all that matters is the current state (Markov
assumption)
– All states before the current state have no impact on
the future except via the current state
19
Markov Chains
●
Markov Assumption:
– Consider a sequence of state variables q1, q2, ..., qi.
A Markov model embodies the Markov assumption on the
probabilities of this sequence: when predicting the future,
the past doesn't matter, only the present:
– P(qi = a | q1, ..., qi−1) = P(qi = a | qi−1)
20
Markov Chains
●
A Markov chain is specified by the following
components:
– Q = q1, q2, ..., qN : a set of N states
– A = a11, a12, ..., aNN : a transition probability matrix, each aij representing the
probability of moving from state i to state j, with Σj aij = 1 for all i
– π = π1, π2, ..., πN : an initial probability distribution over states, with Σi πi = 1
21
Markov Chains
●
Markov chain for weather events
– Vocabulary : HOT, COLD, and WARM
●
States are represented as nodes
●
Transitions, with their probabilities, as edges
●
A start distribution π is required.
– setting π = [0.1, 0.7, 0.2] would mean a probability 0.7 of starting in state 2
(cold), probability 0.1 of starting in state 1 (hot), etc.
●
Probability of the sequence: cold - hot - hot - warm
– P(cold hot hot warm) = π2 * P(hot|cold) * P(hot|hot) * P(warm|hot)
= 0.7 * 0.1 * 0.6 * 0.3 = 0.0126 (see the sketch below)
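A short sketch that computes such sequence probabilities from the start distribution and transition matrix. Only π, P(hot|cold), P(hot|hot), and P(warm|hot) appear on this slide; the remaining transition values below are assumed from the weather figure in the cited chapter and are marked as such.

# Start distribution and transitions for the weather chain.
# Entries marked (*) are assumed from the figure in the cited chapter.
pi = {"hot": 0.1, "cold": 0.7, "warm": 0.2}
A = {
    "hot":  {"hot": 0.6, "cold": 0.1, "warm": 0.3},   # cold entry (*)
    "cold": {"hot": 0.1, "cold": 0.8, "warm": 0.1},   # cold, warm entries (*)
    "warm": {"hot": 0.3, "cold": 0.1, "warm": 0.6},   # all entries (*)
}

def sequence_probability(states):
    # P(s1, s2, ..., sn) = pi(s1) * product of A[s_{i-1}][s_i]
    p = pi[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= A[prev][curr]
    return p

print(sequence_probability(["cold", "hot", "hot", "warm"]))  # 0.7*0.1*0.6*0.3 = 0.0126
print(sequence_probability(["hot", "hot", "hot", "hot"]))    # exercise on the next slide
print(sequence_probability(["cold", "hot", "cold", "hot"]))  # exercise on the next slide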
22
Markov Chains
●
Compute the probability of each of the following
sequences:
– hot hot hot hot
– cold hot cold hot
●
What does the difference in these probabilities tell
you about a real-world weather fact encoded in the
figure?
23
Markov Chains
●
A Markov chain is useful for computing the probability of a sequence of
observable events.
– In many cases, the events we are interested in are hidden events:
●
We don’t observe hidden events directly.
●
For example we don’t normally observe part-of-speech tags in a text.
Rather, we see words, and must infer the tags from the word sequence.
●
We call the tags hidden because they are not observed.
●
A Hidden Markov model (HMM) allows us to talk about both
observed events (like words that we see in the input) and hidden
events (like part-of-speech tags) that we think of as causal factors
in our probabilistic model.
24
Hidden Markov Model
●
An HMM is specified by the following
components:
– Q = q1, q2, ..., qN : a set of N states
– A : a transition probability matrix, each aij representing the probability of
moving from state i to state j
– O = o1, o2, ..., oT : a sequence of T observations
– B = bi(ot) : observation likelihoods (emission probabilities), the probability of
observation ot being generated from state i
– π : an initial probability distribution over states
25
First-Order Hidden Markov Model
●
A first-order hidden Markov model uses two simplifying
assumptions:
1) As with a first-order Markov chain, the probability of a particular state
depends only on the previous state:
P(qi | q1, ..., qi−1) = P(qi | qi−1)
2) Output independence: the probability of an output observation oi depends
only on the state that produced it, qi:
P(oi | q1, ..., qT, o1, ..., oT) = P(oi | qi)
27
The components of an HMM tagger
●
The A transition probabilities P(ti|ti−1) represent the probability of a tag
occurring given the previous tag.
– The MLE of the transition probability is
P(ti|ti−1) = C(ti−1, ti) / C(ti−1)
– In the WSJ corpus, for example, MD occurs 13124 times, of which it is followed by
VB 10471 times, for an MLE estimate of
P(VB|MD) = C(MD, VB) / C(MD) = 10471 / 13124 = 0.80
28
The components of an HMM tagger
●
The B emission probabilities P(wi|ti) represent the probability, given a tag
(say MD), that it will be associated with a given word (say will).
– The MLE of the emission probability is
P(wi|ti) = C(ti, wi) / C(ti)  (estimated in the sketch below)
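A minimal sketch that estimates both the A transition table and the B emission table by MLE from a tagged corpus; the corpus format used here, a list of sentences of (word, tag) pairs, is an assumption for illustration.

from collections import Counter

def estimate_hmm(tagged_sentences):
    # tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    tag_count, trans_count, emit_count = Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        prev = "<s>"                        # sentence-start pseudo-tag
        tag_count[prev] += 1
        for word, tag in sentence:
            trans_count[(prev, tag)] += 1   # C(t_{i-1}, t_i)
            emit_count[(tag, word)] += 1    # C(t_i, w_i)
            tag_count[tag] += 1             # C(t_i)
            prev = tag
    # MLE:  P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
    #       P(w_i | t_i)     = C(t_i, w_i)     / C(t_i)
    A = {bigram: c / tag_count[bigram[0]] for bigram, c in trans_count.items()}
    B = {pair:   c / tag_count[pair[0]]   for pair,   c in emit_count.items()}
    return A, B

# With WSJ-scale counts, e.g. C(MD, VB) = 10471 and C(MD) = 13124, this would
# give A[("MD", "VB")] = 10471 / 13124 ≈ 0.80, matching the previous slide.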
29
HMM tagger
●
The A transition probabilities and B observation likelihoods (emission
probabilities) of the HMM are illustrated for three states of an HMM
part-of-speech tagger; the full tagger would have one state for each tag
30
HMM tagger
●
States: Set of part-of-speech tags.
●
Transition Probabilities: Tag transition probabilities
– A tag transition probability P(tagb | taga) represents the probability of a tag tagb occurring given the
previous tag taga.
●
Observations: Words (Vocabulary)
– Observation Likelihoods: Emission Probabilities P(word|tag)
– An emission probability P(word | tag) represents the probability of the tag producing the word.
●
Initial Probability Distribution: First Tag Probabilities P(tag |<s>) in sentences.
31
HMM Tagging as Decoding
●
For an HMM that contains hidden variables, the task of
determining the sequence of hidden variables corresponding to
the sequence of observations is called decoding.
●
Decoding:
– Given as input an HMM λ = (A, B) (transition probabilities and observation
likelihoods) and a sequence of observations O = o1, ..., oT, find the most probable
sequence of states Q = q1, ..., qT.
●
For part-of-speech tagging, we will find the most probable
sequence of tags t1,…,tn (hidden variables) for a given
sequence of words w1,…,wn (observations).
32
HMM - Decoding
●
For PoS tagging, decoding means choosing the tag sequence t1, ..., tn that is
most probable given the observed word sequence w1, ..., wn:
– t̂1:n = argmax over t1...tn of P(t1, ..., tn | w1, ..., wn)
●
Using Bayes' rule, and dropping the denominator P(w1, ..., wn), which is the
same for every tag sequence:
– t̂1:n = argmax over t1...tn of P(w1, ..., wn | t1, ..., tn) P(t1, ..., tn)
33
HMM - Decoding
●
HMM taggers make two further simplifying
assumptions
– The first is that the probability of a word appearing depends only on its
own tag and is independent of neighboring words and tags:
P(w1, ..., wn | t1, ..., tn) ≈ ∏i P(wi | ti)
– The second (the bigram assumption) is that the probability of a tag depends
only on the previous tag:
P(t1, ..., tn) ≈ ∏i P(ti | ti−1)
34
HMM - Decoding
– Plugging in the two simplifying assumptions results in the following
equation for the most probable tag sequence from a bigram tagger:
t̂1:n = argmax over t1...tn of ∏i P(wi | ti) P(ti | ti−1)
36
Working of Viterbi Algorithm
[Figure: Viterbi trellis: the word sequence O1, O2, ... runs along one axis, the set of tags along the other, and the most probable tag sequence is traced through the trellis. See the sketch below.]
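A compact Viterbi decoder for the bigram tagger equation above; representing π, A, and B as plain dictionaries is an illustrative choice, and a real tagger would also work in log space and smooth or back off for unknown words.

def viterbi(words, tags, pi, A, B):
    # pi[t] = P(t | <s>);  A[(t_prev, t)] = P(t | t_prev);  B[(t, w)] = P(w | t)
    V = [{t: pi.get(t, 0.0) * B.get((t, words[0]), 0.0) for t in tags}]
    back = [{}]
    for i, w in enumerate(words[1:], start=1):
        V.append({})
        back.append({})
        for t in tags:
            # best previous tag for reaching tag t at position i
            prev, score = max(
                ((p, V[i - 1][p] * A.get((p, t), 0.0) * B.get((t, w), 0.0)) for p in tags),
                key=lambda x: x[1])
            V[i][t], back[i][t] = score, prev
    # termination: pick the best final tag, then follow the backpointers
    best = max(tags, key=lambda t: V[-1][t])
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

Calling viterbi(["Janet", "will", "back", "the", "bill"], tagset, pi, A, B) with WSJ-trained tables would fill in the matrix worked through on the following slides.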
37
Working of Viterbi Algorithm
38
Working of Viterbi Algorithm
39
Working of Viterbi Algorithm
40
Viterbi Algorithm - Example
●
Let’s tag the sentence Janet will back the bill
41
Viterbi Algorithm - Example
●
Viterbi[NNP,Janet]
= P(NNP|<s>)*P(Janet|NNP)
= 0.2767 * 0.000032 = 0.00000885
= 8.85 × 10^-6 (checked numerically below)
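The same initialization step, checked numerically (both values are copied from this slide):

p_start_NNP = 0.2767           # pi(NNP) = P(NNP | <s>)
p_janet_given_NNP = 0.000032   # B(NNP, "Janet") = P(Janet | NNP)
print(p_start_NNP * p_janet_given_NNP)   # 8.8544e-06, i.e. about 8.85 × 10^-6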
42
Viterbi Algorithm - Example
43
Viterbi Algorithm - Example
44
Viterbi Algorithm - Example
45
Viterbi Algorithm - Example
46
Viterbi Algorithm - Example
●
Viterbi Matrix for
●
Janet will back the bill
– Janet/NNP
– will/MD
– back/VB
– the/DT
– bill/NN
47
Self Study
●
Beam search is a variant of Viterbi decoding that
maintains only a fraction of the high-scoring states, rather than
all states, during decoding.
●
Maximum Entropy Markov Model (MEMM) taggers are
another type of tagger: they train logistic regression
models to pick the best tag given a word, its context, and
its previous tags, using feature templates.
48
Reference
●
WSJ Corpus
– https://www.spsc.tugraz.at/databases-and-tools/wall-street-journal-corpus.html
– https://aclanthology.org/H92-1073.pdf
49
Thank you
50