Assignment 3
AIM :
Survey various techniques for POS tagging and implement any one of
them.
THEORY :
Part-of-speech (POS) tagging is a common Natural Language Processing
task in which each word in a text (corpus) is assigned a particular part
of speech, based on both the word's definition and its context.
In the figure, each word has its lexical term written underneath it.
However, constantly writing out these full terms during text analysis
quickly becomes cumbersome, especially as the corpus grows. We therefore
use short representations, referred to as “tags”, to denote the categories.
As mentioned earlier, the process of assigning a specific tag to each word
in our corpus is referred to as part-of-speech tagging (POS tagging for
short), since the POS tags describe the lexical terms within our text.
Most POS tagging techniques fall into three categories: rule-based POS
tagging, stochastic POS tagging, and transformation-based tagging.
Markov Model :
Take the example sentence used earlier, “Why not tell someone?”, and
imagine it is truncated to “Why not tell …”. We want to determine whether
the next word in the sentence is a noun, verb, adverb, or some other
part of speech.
If you are familiar with English, you will instantly identify “tell” as a
verb and expect that it is more likely to be followed by a noun than by
another verb. The idea illustrated by this example is that the POS tag
assigned to the next word depends on the POS tag of the previous word.
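This is the Markov assumption: the tag of the current word is taken to
depend only on the tag of the word immediately before it,
P(t_i | t_1, ..., t_(i-1)) ≈ P(t_i | t_(i-1)).
These conditional probabilities between consecutive tags are what we
store in the transition matrix.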
How do we populate the transition matrix? Let’s use three sentences for
our corpus: “<s> in a station of the metro”, “<s> the apparition of these
faces in the crowd”, and “<s> petals on a wet, black bough.” (Note that
these are the same sentences used in the course.) Next, we break the
process of populating the matrix into steps:
1. Count occurrences of tag pairs in the training dataset
At the end of step one, our table of counts would look something like this…
2. Calculate the probabilities from the counts
Each count is divided by the total count of its row, i.e.
P(t_i | t_(i-1)) = C(t_(i-1), t_i) / C(t_(i-1)). Applying this formula to
the counts in the previous table, our new table would look as follows…
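As a rough illustration of both steps, here is a short Python sketch. The
tags assigned to the three example sentences are assumptions made for this
sketch, not tags taken from the course material.

from collections import defaultdict

# Illustrative tagged corpus: the (word, tag) pairs are assumed for this
# sketch; each sentence begins with the <s> start-of-sentence marker.
tagged_sentences = [
    [("<s>", "<s>"), ("in", "IN"), ("a", "DT"), ("station", "NN"),
     ("of", "IN"), ("the", "DT"), ("metro", "NN")],
    [("<s>", "<s>"), ("the", "DT"), ("apparition", "NN"), ("of", "IN"),
     ("these", "DT"), ("faces", "NNS"), ("in", "IN"), ("the", "DT"),
     ("crowd", "NN")],
    [("<s>", "<s>"), ("petals", "NNS"), ("on", "IN"), ("a", "DT"),
     ("wet", "JJ"), ("black", "JJ"), ("bough", "NN")],
]

# Step 1: count how often each (previous tag, next tag) pair occurs
transition_counts = defaultdict(int)
prev_totals = defaultdict(int)
for sentence in tagged_sentences:
    tags = [tag for _, tag in sentence]
    for prev_tag, next_tag in zip(tags, tags[1:]):
        transition_counts[(prev_tag, next_tag)] += 1
        prev_totals[prev_tag] += 1

# Step 2: divide each count by its row total to get P(next tag | previous tag)
transition_probs = {
    pair: count / prev_totals[pair[0]]
    for pair, count in transition_counts.items()
}

print(transition_probs[("DT", "NN")])  # probability that a noun follows a determiner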
You may notice that there are many 0s in our transition matrix, which
would make our model incapable of generalizing to other text that may
contain verbs. To overcome this problem, we add smoothing.
Smoothing slightly adjusts the formula above by adding a small value,
epsilon, to each count in the numerator and N * epsilon to the
denominator (where N is the number of tags), so that each row still sums
to 1.
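Continuing the sketch above, the smoothed probability is
P(t_i | t_(i-1)) = (C(t_(i-1), t_i) + epsilon) / (C(t_(i-1)) + N * epsilon),
where N is the number of tags. The epsilon value below is an assumed
illustrative choice.

# Add-epsilon smoothing: every possible transition gets a small pseudo-count
# so that no entry of the transition matrix is exactly 0.
epsilon = 0.001
all_tags = sorted({tag for pair in transition_counts for tag in pair})
N = len(all_tags)

smoothed_probs = {}
for prev_tag in all_tags:
    row_total = prev_totals[prev_tag] + N * epsilon
    for next_tag in all_tags:
        count = transition_counts.get((prev_tag, next_tag), 0)
        smoothed_probs[(prev_tag, next_tag)] = (count + epsilon) / row_total

# Each row still sums to 1, but transitions never seen in the corpus
# now receive a small non-zero probability.
print(smoothed_probs[("NN", "DT")])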
Hidden Markov Model
Hidden Markov Model (HMM) is a statistical Markov model in which
the system being modeled is assumed to be a Markov process with
unobservable (“hidden”) states. In our case, the unobservable states are
the POS tags of the words.
If we go back to our Markov Model, we see that the model has states for
parts of speech, such as VB for a verb and NN for a noun. We may now
think of these as hidden states, since they are not directly observable
from the corpus. Though a human may be capable of deciphering which POS
applies to a specific word, a machine only sees the text (which is
therefore observable) and is unaware of whether a word’s POS tag is noun,
verb, or something else, which in turn makes the tags unobservable.
Both the Markov Model and the Hidden Markov Model have transition
probabilities that describe the transition from one hidden state to the
next; however, the Hidden Markov Model also has something known as
emission probabilities.
The emission probabilities describe the transitions from the hidden states
in the model — remember the hidden states are the POS tags — to the
observable states — remember the observable states are the words.
In the figure, we see that the hidden VB state has several observable
states. The emission probability from the hidden state VB to the
observable word “eat” is 0.5, hence there is a 50% chance that the model
would output this word when the current hidden state is VB.
We can also represent the emission probabilities as a table…
Similar to the transition probability matrix, the row values must sum to
1. Also, all of our emission probabilities are greater than 0, since a
word can take a different POS tag depending on its context.
To populate the emission matrix, we follow a procedure very similar to
the one used for the transition matrix: first count how often each word
is tagged with a specific tag, then divide each count by the total count
for that tag.
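A sketch of that procedure, reusing the illustrative tagged corpus from
the transition-matrix example above:

from collections import defaultdict

# Count how often each tag emits (is assigned to) each word, then normalize
# per tag so that each row of the emission matrix sums to 1.
emission_counts = defaultdict(int)
tag_totals = defaultdict(int)
for sentence in tagged_sentences:
    for word, tag in sentence:
        emission_counts[(tag, word)] += 1
        tag_totals[tag] += 1

emission_probs = {
    (tag, word): count / tag_totals[tag]
    for (tag, word), count in emission_counts.items()
}

print(emission_probs[("DT", "the")])  # P(word "the" | tag DT)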
CODE :
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('treebank')
nltk.download('maxent_ne_chunker')
nltk.download('words')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

stop_words = set(stopwords.words('english'))

# Input text; here we reuse the example sentences from the theory section
txt = "Why not tell someone? The apparition of these faces in the crowd."

# Split the text into sentences, then tag each sentence separately
tokenized = sent_tokenize(txt)
for i in tokenized:
    # Tokenize the sentence into words and drop English stopwords
    wordsList = nltk.word_tokenize(i)
    wordsList = [w for w in wordsList if w not in stop_words]
    # Assign a POS tag to each remaining word
    tagged = nltk.pos_tag(wordsList)
    print(tagged)
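Running the code prints, for each sentence, a list of (word, tag) tuples
in which the tags are Penn Treebank labels such as NN for a noun or VB
for a verb.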