Parts of Speech Tagging using Hidden Markov Model

Prasun Kumar

In this article, we are going to learn about one of the most important parts of a natural language processing pipeline: Parts of Speech tagging. Due to the complexity of the English language, it is very important that computers learn the context of each word. Parts of speech tagging is used to tag the parts of speech of the words in a sentence based on that context. For example, consider two sentences:

1. The computer is not able to understand languages because it is too dumb.

2. The computer is not able to understand languages because it is too complex.

What does it refer to in the above two sentences? In the first sentence, it refers to the computer, while in the second, it refers to the languages. In this example, the part of speech of it is the same, but due to the different context, the meaning changes. Take, for instance, the following examples:

1. Snorlax is sleeping

2. Sleeping is a boon

In these examples, the same word sleeping has been used as a verb (in the first sentence) as well as a noun (in the second sentence). So, it is important for a computer to understand the parts of speech (PoS) of a word. There are various methods of PoS tagging, such as lookup tables, n-grams, the Hidden Markov Model, and the Viterbi algorithm.

Before understanding these methods, let's look at some of the terminology. To start with the PoS tagging problem, we need a training set of many sentences in which we already know the part of speech of each word. The collection of these sentences is known as a text corpus.

Lookup tables and n-grams have some serious limitations. In a lookup table, each word gets tagged with one and only one part of speech every time, irrespective of the context. In the case of n-grams, it is possible that our test set contains some new combination of words, and the n-gram model will then not be able to tag the PoS for these cases. These limitations make lookup tables and n-grams relatively less popular. The Hidden Markov Model and the Viterbi algorithm take care of this problem, and we will understand how they work in this article.

Hidden Markov Model

We will take an example to understand the working principle of these methods. Let us consider the sentence 'John may see Rob'. In this sentence, let us say that one way of tagging the PoS is as follows: John is a noun (N), may is a modal verb (M), see is a verb (V), and Rob is a noun (N).

We need our training corpus to have all the PoS tags so that we can find the emission and transition probabilities. Let us consider four sentences, along with the part of speech associated with each word. The example has been taken from the NLP course at Udacity. <s> and <e> denote the start and end tags.

Figure 1: Example of a sentence with parts of speech tags

Our aim is to calculate the probability associated with the above tagging. To find this out, we need two sets of probabilities: transition probabilities and emission probabilities. Transition probabilities tell us about the chance of occurrence of a part of speech after another part of speech, while emission probabilities tell us about the chance of occurrence of a particular word corresponding to a part of speech.

In the above example, the transition probabilities include the probability that a Modal comes after a Noun, that a Verb comes after a Modal, and that a Noun comes after a Verb. The emission probabilities include the chance that a Noun will be the word John, that a Verb will be the word see, and so on. For the correct tagging, we want the overall probability (the product of all these probabilities) to be the highest.

Emission Probabilities

We will find the probability of each word being a particular part of speech using the above information. For example, Mary occurs 4 times as a Noun in the above corpus, and there are 9 occurrences of words which are Nouns, so the probability that a Noun will be the word Mary is 4/9.
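To make this concrete, here is a minimal Python sketch of the emission-probability calculation. The four tagged training sentences are an assumption (Figure 1 is not reproduced in this text); they are the sentences from the Udacity example referred to above, and they reproduce the counts quoted in this article, such as Mary being tagged as a Noun 4 times out of 9 Noun tokens.

from collections import Counter, defaultdict

# Assumed training corpus: the four tagged sentences from the Udacity example the
# article refers to. They reproduce the counts quoted above (9 Noun tokens in total,
# with Mary tagged as a Noun 4 times).
corpus = [
    [("Mary", "N"), ("Jane", "N"), ("can", "M"), ("see", "V"), ("Will", "N")],
    [("Spot", "N"), ("will", "M"), ("see", "V"), ("Mary", "N")],
    [("Will", "M"), ("Jane", "N"), ("spot", "V"), ("Mary", "N")],
    [("Mary", "N"), ("will", "M"), ("pat", "V"), ("Spot", "N")],
]

tag_counts = Counter()    # how many times each tag occurs
pair_counts = Counter()   # how many times each (tag, word) pair occurs
for sentence in corpus:
    for word, tag in sentence:
        word = word.lower()            # count 'Will' and 'will' together
        tag_counts[tag] += 1
        pair_counts[(tag, word)] += 1

# Emission probability: P(word | tag) = count(tag, word) / count(tag)
emission = defaultdict(dict)
for (tag, word), n in pair_counts.items():
    emission[tag][word] = n / tag_counts[tag]

print(emission["N"]["mary"])   # 4/9, the value derived above
print(emission["M"]["will"])   # 3/4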

In the same way, we find all the probabilities and represent them as follows, which is called the emission probabilities.

Figure 2: Emission Probabilities

Transition Probabilities

This is the set of probabilities of one part of speech following another. For example, in the above corpus, 'Noun followed by Modal' occurs three times (in the first, second, and fourth sentences). In total, Noun is followed by Noun once, by Modal thrice, by Verb once and by end-of-sentence four times. Thus, the probability that a Modal occurs after a Noun is 3/9 = 1/3. In the same way, we calculate the probabilities for all combinations and summarize them in the following diagram, which represents the transition probabilities:

Figure 3: Transition Probability
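Continuing the sketch above, the transition probabilities can be estimated by counting consecutive tag pairs after padding each sentence with <s> and <e>; corpus is the assumed tagged training data from the previous sketch.

from collections import Counter, defaultdict

# `corpus` is the assumed tagged training data from the emission-probability sketch.
transition_counts = Counter()   # how many times tag `nxt` follows tag `prev`
from_counts = Counter()         # how many times each tag appears as `prev`
for sentence in corpus:
    tags = ["<s>"] + [tag for _, tag in sentence] + ["<e>"]
    for prev, nxt in zip(tags, tags[1:]):
        transition_counts[(prev, nxt)] += 1
        from_counts[prev] += 1

# Transition probability: P(next tag | previous tag)
transition = defaultdict(dict)
for (prev, nxt), n in transition_counts.items():
    transition[prev][nxt] = n / from_counts[prev]

print(transition["N"]["M"])     # 3/9 = 1/3, as computed above
print(transition["<s>"]["N"])   # 3/4: three of the four sentences start with a Noun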
Now we have both the emission and transition probabilities, and we will proceed to see them in action. The words which are in the corpus are called observations because these are the things that we can observe. But the parts of speech of each word are hidden to us and not directly observable, so we call them hidden states. There are 9 observations and 3 hidden states in our corpus. Each hidden state (PoS) is connected to every other hidden state by a transition probability, and each hidden state is also connected to every observation (word) by an emission probability. The following diagram describes this relation:

Figure 4: Hidden Markov Model

In the above diagram, the values on the solid arrows show the probability of a part of speech coming after another part of speech (transition probability), while the numbers on the dashed arrows show the probability that a noun is the given word (emission probability).

The Hidden Markov Model can generate all sentences based on the sequence in which we travel from one state to another. Here, state refers to the parts of speech N (noun), M (modal verb) and V (verb), plus start-of-sentence (<s>) and end-of-sentence (<e>). Let's consider an example where we want to generate the sentence 'Jane will spot Will.' Let us see in how many ways we can generate this sentence using the above model.

We will start from <s>. One of the ways is to reach Noun (N) with probability 3/4. We can pick the word Jane with probability 2/9, then move to Modal (M) with probability 1/3 and pick will with probability 3/4. We can then move to Verb (V) with probability 3/4 and pick spot with probability 1/4, then move to Noun (N) with probability 1 and choose Will with probability 1/9. Finally, we can reach the end-of-sentence tag with probability 4/9. The above few sentences might be complex to follow, so let's have a look at the following diagram to understand the flow. This is one of the many ways to generate the sentence:

Figure 5: First possibility of occurrence of the above sentence

Moving from one state to another is independent of the other states, so we can multiply all these probabilities to calculate the probability of the above combination of words and parts of speech. For the above case, we obtain 0.0003858 after multiplying all the probabilities. There are other ways in which we can generate the same sentence. Let's have a look at one of them:

Figure 6: Second possibility of occurrence of the above sentence

The above possibility occurs when all of the words are tagged as Noun. This sentence makes no sense in the real world, and thus we also obtain a very low probability. We can check all the possibilities in which the above sentence can be generated from our Hidden Markov Model and calculate the likelihood of each one of them. We will ignore those paths in which there is at least one edge with probability 0, because these paths are not possible. Apart from the two paths already discussed, there are two more paths, whose likelihoods are also shown below:

Figure 7: Third and fourth possibility of occurrence of the above sentence

Out of all 4 possibilities, we find that the likelihood of the first possibility is the highest. Thus, the PoS tags in that possibility will be reported as the correct PoS tags. Choosing the combination that has the highest likelihood is called the maximum likelihood principle, which is widely used in many machine learning algorithms. Based on the above values, we will report that, in the given sentence, Jane is a Noun, will is a Modal verb, spot is a Verb, and Will is a Noun. Thus, we are able to find the correct parts of speech of each word in a sentence.
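The enumeration we just did by hand can be written as a short brute-force search. The sketch below reuses the transition and emission tables from the earlier sketches (path_probability is only an illustrative helper, not something from the article); tag sequences containing an impossible edge receive probability 0 and drop out automatically, and the winner should come out as N-M-V-N with a probability of roughly 0.0003858, matching the value obtained above.

from itertools import product

# Score every possible tag sequence for the sentence and keep the most likely one.
# `transition` and `emission` are the tables built in the earlier sketches; missing
# entries contribute probability 0, which removes the impossible paths automatically.
def path_probability(words, tags):
    prob = transition["<s>"].get(tags[0], 0.0)
    for i, (word, tag) in enumerate(zip(words, tags)):
        prob *= emission[tag].get(word.lower(), 0.0)
        if i + 1 < len(tags):
            prob *= transition[tag].get(tags[i + 1], 0.0)
    prob *= transition[tags[-1]].get("<e>", 0.0)
    return prob

words = ["Jane", "will", "spot", "Will"]
states = ["N", "M", "V"]

best_tags, best_prob = None, 0.0
for tags in product(states, repeat=len(words)):
    p = path_probability(words, tags)
    if p > best_prob:
        best_tags, best_prob = tags, p

print(best_tags, best_prob)   # ('N', 'M', 'V', 'N') with probability ~0.0003858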

In this article, we learned about the application of Hidden Markov Models to Parts of Speech tagging in simple terms. To further improve on this method, we use the Viterbi algorithm, which uses dynamic programming to reduce the calculations required in the approach above.
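For completeness, here is one way the Viterbi idea mentioned above can look in code. It is only a rough sketch under the same assumptions as the earlier snippets, not the algorithm as presented in the Udacity course: instead of scoring all 3^4 = 81 tag sequences separately, it keeps, for each word and each tag, only the best-scoring path ending in that tag.

# A minimal Viterbi sketch: keep only the best score (and a back-pointer) for each
# tag at each word. `transition` and `emission` are the tables from the earlier
# sketches; a real implementation would work with log probabilities to avoid
# numerical underflow on long sentences.
def viterbi(words, states=("N", "M", "V")):
    words = [w.lower() for w in words]
    # best[i][tag]: highest probability of any path tagging words[:i+1] ending in `tag`
    best = [{tag: transition["<s>"].get(tag, 0.0) * emission[tag].get(words[0], 0.0)
             for tag in states}]
    back = [{tag: None for tag in states}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in states:
            score, prev = max((best[i - 1][p] * transition[p].get(tag, 0.0), p)
                              for p in states)
            best[i][tag] = score * emission[tag].get(words[i], 0.0)
            back[i][tag] = prev
    # Close with the end-of-sentence transition, then follow the back-pointers.
    final_prob, last = max((best[-1][tag] * transition[tag].get("<e>", 0.0), tag)
                           for tag in states)
    tags = [last]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return list(reversed(tags)), final_prob

print(viterbi(["Jane", "will", "spot", "Will"]))   # (['N', 'M', 'V', 'N'], ~0.0003858)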

