Hidden Markov Model
In the early days, POS annotation was done manually by human annotators.
Because this is such a laborious task, today we have automatic tools that
can tag each word with an appropriate POS tag within its context.
In this section, you will learn how to do POS tagging with the Hidden Markov
Model (HMM).
There are various techniques that can be used for POS tagging; the one
covered here is the Hidden Markov Model.
Let us consider an example proposed by Dr. Luis Serrano and find out how an
HMM selects an appropriate tag sequence for a sentence.
In this example, we consider only three POS tags: noun, modal and verb. Let
the sentence “Ted will spot Will” be tagged as noun, modal, verb, noun. To
calculate the probability associated with this particular sequence of tags,
we need their transition probabilities and emission probabilities.
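The transition probabilities describe how likely one tag is to follow
another, for example a noun followed by a modal, a modal by a verb and a
verb by a noun. The emission probabilities describe how likely a word is
given its tag: the probability that Ted is a noun, will is a modal, spot is
a verb and Will is a noun. Both sets of probabilities should be high for our
tagging to be likely.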
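Written as a formula, this is a sketch in notation that the example itself does not use: w_1..w_n are the words, t_1..t_n the tags, and t_0 = <S> and t_{n+1} = <E> are assumed start and end markers. The probability of a tagging is the product of one transition and one emission probability per word, times the final transition to the end marker:

```latex
\[
P(w_1,\dots,w_n,\; t_1,\dots,t_n)
  = \left[ \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i) \right]
    P(t_{n+1} \mid t_n),
\qquad t_0 = \langle S \rangle,\quad t_{n+1} = \langle E \rangle .
\]
```

For a four-word sentence this gives nine factors: four transitions between tags, four emissions and one transition to the end marker.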
Let us calculate these two probabilities from a small set of four tagged
training sentences.
In these sentences, the word Mary appears four times as a noun. To calculate
the emission probabilities, let us count in the same way how often each word
appears with each tag.
Word   Noun  Modal  Verb
Mary   4     0      0
Jane   2     0      0
Will   1     3      0
Spot   2     0      1
Can    0     1      0
See    0     0      2
Pat    0     0      1
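The counting step can be sketched in Python. The four tagged sentences below are an assumption: they were chosen to be consistent with the counts in this section and are not taken from the text. `CORPUS` and `emission_counts` are hypothetical names used only for illustration.

```python
from collections import Counter

# ASSUMPTION: a four-sentence tagged corpus chosen so that its counts match
# the counting table (N = noun, M = modal, V = verb).
CORPUS = [
    [("Mary", "N"), ("Jane", "N"), ("can", "M"), ("see", "V"), ("Will", "N")],
    [("Spot", "N"), ("will", "M"), ("see", "V"), ("Mary", "N")],
    [("Will", "M"), ("Jane", "N"), ("spot", "V"), ("Mary", "N")],
    [("Mary", "N"), ("will", "M"), ("pat", "V"), ("Spot", "N")],
]


def emission_counts(corpus):
    """Count how often each (lowercased word, tag) pair occurs."""
    counts = Counter()
    for sentence in corpus:
        for word, tag in sentence:
            counts[(word.lower(), tag)] += 1
    return counts


counts = emission_counts(CORPUS)
# Print one row per word in the column order noun, modal, verb.
for word in ["mary", "jane", "will", "spot", "can", "see", "pat"]:
    print(word, [counts[(word, tag)] for tag in ("N", "M", "V")])
```

Words are lowercased so that "Will" the name and "will" the modal share one row, as they do in the table.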
Now let us divide each column by the total number of appearances of its tag.
For example, ‘noun’ appears nine times in the above sentences, so we divide
each term in the noun column by 9. We get the following table after this
operation.
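Word   Noun  Modal  Verb
Mary   4/9   0      0
Jane   2/9   0      0
Will   1/9   3/4    0
Spot   2/9   0      1/4
Can    0     1/4    0
See    0     0      2/4
Pat    0     0      1/4
Each entry is the probability of a word given a tag; for example, the
probability that a modal is the word Will is 3/4. These are the emission
probabilities.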
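The column-wise division can be sketched as follows, using exact fractions so the results can be compared with the table by eye. The counts are copied from the counting table; `EMISSION_COUNTS` and `emission_probabilities` are illustrative names, not part of any library.

```python
from fractions import Fraction

TAGS = ["N", "M", "V"]  # noun, modal, verb

# Counts copied from the counting table: word -> [noun, modal, verb].
EMISSION_COUNTS = {
    "mary": [4, 0, 0],
    "jane": [2, 0, 0],
    "will": [1, 3, 0],
    "spot": [2, 0, 1],
    "can": [0, 1, 0],
    "see": [0, 0, 2],
    "pat": [0, 0, 1],
}


def emission_probabilities(counts, tags):
    """Divide each column by its total to get P(word | tag)."""
    totals = [sum(row[i] for row in counts.values()) for i in range(len(tags))]
    return {
        (word, tag): Fraction(row[i], totals[i])
        for word, row in counts.items()
        for i, tag in enumerate(tags)
    }


emission = emission_probabilities(EMISSION_COUNTS, TAGS)
print(emission[("mary", "N")])  # 4/9
print(emission[("will", "M")])  # 3/4
print(emission[("pat", "V")])   # 1/4
```

Note that `Fraction` reduces to lowest terms, so the entry for See as a verb is stored as 1/2 rather than 2/4.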
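To calculate the transition probabilities, we add a start tag <S> and an end
tag <E> to each sentence and count how often each tag (row) is followed by
each other tag (column).
       N    M    V    <E>
<S>    3    1    0    0
N      1    3    1    4
M      1    0    3    0
V      4    0    0    0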
In the above table, the <S> tag is followed by the N tag three times, so the
first entry is 3. The modal tag follows <S> just once, so the second entry
is 1. The rest of the table is filled in the same manner.
Next, we divide each term in a row of the table by the total number of
co-occurrences of the tag in consideration. For example, the modal tag is
followed by another tag four times in total, so we divide each element in
the third row by four.
       N    M    V    <E>
<S>    3/4  1/4  0    0
N      1/9  3/9  1/9  4/9
M      1/4  0    3/4  0
V      4/4  0    0    0
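The row-wise division can be sketched in the same way. The counts are copied from the transition count table; `TRANSITION_COUNTS` and `transition_probabilities` are illustrative names.

```python
from fractions import Fraction

NEXT_TAGS = ["N", "M", "V", "<E>"]  # column order of the count table

# Counts copied from the transition count table: previous tag -> row.
TRANSITION_COUNTS = {
    "<S>": [3, 1, 0, 0],
    "N": [1, 3, 1, 4],
    "M": [1, 0, 3, 0],
    "V": [4, 0, 0, 0],
}


def transition_probabilities(counts, next_tags):
    """Divide each row by its total to get P(next tag | previous tag)."""
    probs = {}
    for prev_tag, row in counts.items():
        total = sum(row)
        for next_tag, count in zip(next_tags, row):
            probs[(prev_tag, next_tag)] = Fraction(count, total)
    return probs


transition = transition_probabilities(TRANSITION_COUNTS, NEXT_TAGS)
print(transition[("<S>", "N")])  # 3/4
print(transition[("N", "M")])    # 1/3, i.e. 3/9 in lowest terms
print(transition[("V", "N")])    # 1
```

Each row sums to 1 because every tag in the training sentences is followed by exactly one tag or by the end marker.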
The normalized table gives the transition probabilities for the four
training sentences. Now, how does the HMM determine the appropriate sequence
of tags for a particular sentence from these tables? Let us find out.
Take a new sentence and tag it with wrong tags. Let the sentence ‘Will can
spot Mary’ be tagged as follows:
Will as a modal
Can as a verb
Spot as a noun
Mary as a noun
Now calculate the probability of this sequence being correct in the
following manner.
The probability that the modal tag (M) comes after the tag <S> is 1/4, as
seen in the transition table. Also, the probability that a modal is the word
Will is 3/4. In the same manner, we read off every other transition and
emission probability along the sequence. The product of these probabilities
is the likelihood that this sequence is right. Since the tags are not
correct, the product is zero: the word can never appears as a verb in the
training sentences, so that emission probability is 0.
1/4*3/4*3/4*0*1*2/9*1/9*4/9*4/9=0
When the words are correctly tagged (Will as a noun, can as a modal, spot as
a verb and Mary as a noun), we get a probability greater than zero.
Calculating the product of these terms, we get:
3/4*1/9*3/9*1/4*3/4*1/4*1*4/9*4/9=0.00025720164
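Both products can be checked with a short Python sketch. The dictionaries hold the nonzero entries obtained by normalizing the two count tables in this section; `sequence_probability` is a hypothetical helper name, and treating any missing entry as probability 0 is an assumption of this sketch.

```python
from fractions import Fraction as F

# Nonzero emission probabilities: (word, tag) -> P(word | tag).
EMISSION = {
    ("mary", "N"): F(4, 9), ("jane", "N"): F(2, 9), ("will", "N"): F(1, 9),
    ("spot", "N"): F(2, 9), ("will", "M"): F(3, 4), ("can", "M"): F(1, 4),
    ("spot", "V"): F(1, 4), ("see", "V"): F(2, 4), ("pat", "V"): F(1, 4),
}
# Nonzero transition probabilities: (previous, next) -> P(next | previous).
TRANSITION = {
    ("<S>", "N"): F(3, 4), ("<S>", "M"): F(1, 4),
    ("N", "N"): F(1, 9), ("N", "M"): F(3, 9),
    ("N", "V"): F(1, 9), ("N", "<E>"): F(4, 9),
    ("M", "N"): F(1, 4), ("M", "V"): F(3, 4),
    ("V", "N"): F(1, 1),
}


def sequence_probability(words, tags):
    """Multiply transition and emission probabilities along one tagging."""
    prob = F(1)
    prev_tag = "<S>"
    for word, tag in zip(words, tags):
        prob *= TRANSITION.get((prev_tag, tag), F(0))    # tag follows prev_tag
        prob *= EMISSION.get((word.lower(), tag), F(0))  # word given tag
        prev_tag = tag
    return prob * TRANSITION.get((prev_tag, "<E>"), F(0))  # sentence ends


words = ["Will", "can", "spot", "Mary"]
print(sequence_probability(words, ["M", "V", "N", "N"]))  # 0
print(sequence_probability(words, ["N", "M", "V", "N"]))  # 1/3888
```

The exact value 1/3888 is approximately 0.00025720164, which agrees with the product above.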
For our example, taking into consideration just the three POS tags we have
mentioned, 81 different combinations of tags can be formed (three choices
for each of the four words). In this case, calculating the probabilities of
all 81 combinations is achievable. But when the task is to tag a longer
sentence and all the POS tags in the Penn Treebank project are taken into
consideration, the number of possible combinations grows exponentially with
the sentence length and checking them all becomes impractical. Now let us
visualize these 81 combinations as paths and, using the transition and
emission probabilities, mark each vertex and edge.
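The count of 81 can be reproduced by enumerating every tag sequence, as in this minimal sketch. The sentence lengths in the loop are arbitrary illustrative values.

```python
from itertools import product

TAGS = ["N", "M", "V"]
sentence = ["Will", "can", "spot", "Mary"]

# Every way of assigning one of the three tags to each of the four words.
candidates = list(product(TAGS, repeat=len(sentence)))
print(len(candidates))  # 81

# The number of candidates is len(TAGS) ** sentence_length, so it grows
# exponentially with the length of the sentence.
for length in (4, 10, 20):
    print(length, len(TAGS) ** length)
```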
The next step is to delete all the vertices and edges with probability zero.
The vertices that do not lead to the endpoint are removed as well.
Now there are only two paths that lead to the end. Let us calculate the
probability associated with each path.
<S>→N→M→N→N→<E> = 3/4*1/9*3/9*1/4*1/4*2/9*1/9*4/9*4/9 = 0.00000846754
<S>→N→M→V→N→<E> = 3/4*1/9*3/9*1/4*3/4*1/4*1*4/9*4/9 = 0.00025720164
Clearly, the probability of the second sequence is much higher and hence
the HMM is going to tag each word in the sentence according to this
sequence.
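The whole selection can be sketched as a brute-force search: score all 81 tag sequences, discard those with probability zero and keep the most probable one. This is only a minimal illustration for a four-word sentence and is not presented as how an HMM tagger is implemented in practice. The probabilities are the nonzero entries from normalizing the two count tables, and `sequence_probability` and `nonzero_paths` are hypothetical helper names.

```python
from fractions import Fraction as F
from itertools import product

TAGS = ["N", "M", "V"]
# Nonzero emission probabilities: (word, tag) -> P(word | tag).
EMISSION = {
    ("mary", "N"): F(4, 9), ("jane", "N"): F(2, 9), ("will", "N"): F(1, 9),
    ("spot", "N"): F(2, 9), ("will", "M"): F(3, 4), ("can", "M"): F(1, 4),
    ("spot", "V"): F(1, 4), ("see", "V"): F(2, 4), ("pat", "V"): F(1, 4),
}
# Nonzero transition probabilities: (previous, next) -> P(next | previous).
TRANSITION = {
    ("<S>", "N"): F(3, 4), ("<S>", "M"): F(1, 4),
    ("N", "N"): F(1, 9), ("N", "M"): F(3, 9),
    ("N", "V"): F(1, 9), ("N", "<E>"): F(4, 9),
    ("M", "N"): F(1, 4), ("M", "V"): F(3, 4),
    ("V", "N"): F(1, 1),
}


def sequence_probability(words, tags):
    """Multiply transition and emission probabilities along one tagging."""
    prob = F(1)
    prev_tag = "<S>"
    for word, tag in zip(words, tags):
        prob *= TRANSITION.get((prev_tag, tag), F(0))
        prob *= EMISSION.get((word.lower(), tag), F(0))
        prev_tag = tag
    return prob * TRANSITION.get((prev_tag, "<E>"), F(0))


def nonzero_paths(words):
    """Score every tag sequence and keep those with nonzero probability."""
    paths = {}
    for tags in product(TAGS, repeat=len(words)):
        prob = sequence_probability(words, tags)
        if prob > 0:
            paths[tags] = prob
    return paths


paths = nonzero_paths(["Will", "can", "spot", "Mary"])
for tags, prob in paths.items():
    print(tags, prob, float(prob))

best = max(paths, key=paths.get)
print(best)  # ('N', 'M', 'V', 'N')
```

Only two of the 81 sequences survive, with exact probabilities 1/118098 and 1/3888, matching the two paths computed above.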