NLP Assignment 5
POORVAJA R
AIML
III YEAR
The forward algorithm computes the forward probabilities recursively. The forward variable is defined as

α_i(t) = P(o_1, o_2, ..., o_t, q_t = s_i | λ)

where α_i(t) is the probability of observing the first t observations and being in state i at time t, given the HMM λ. The forward algorithm starts by computing the base case, which is the probability of starting in state i and observing the first observation:

α_i(1) = π_i · b_i(o_1)

where π_i is the initial probability of being in state i, and b_i(o_1) is the probability of observing the first observation o_1, given that the system is in state i.
The forward algorithm then computes the forward probabilities for each time step t > 1, using the following recursive formula:

α_j(t) = [ Σ_i α_i(t-1) · a_ij ] · b_j(o_t)

where a_ij is the probability of transitioning from state i to state j, and the sum runs over all possible states i.
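A minimal sketch of the forward recursion in Python (assuming NumPy is available, that π, A, and B are stored as arrays, and that the observations are given as integer indices into the vocabulary):

import numpy as np

def forward(pi, A, B, obs):
    """Forward probabilities: alpha[t, i] = P(o_1..o_t, q_t = s_i | lambda).
    pi: (N,) initial distribution; A: (N, N) transitions A[i, j] = P(s_j | s_i);
    B: (N, M) emissions B[i, k] = P(o_k | s_i); obs: list of observation indices."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                   # base case: alpha_i(1) = pi_i * b_i(o_1)
    for t in range(1, T):
        # recursion: alpha_j(t) = (sum_i alpha_i(t-1) * a_ij) * b_j(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha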
The backward algorithm computes the backward probabilities recursively. The backward variable is defined as

β_i(t) = P(o_t+1, o_t+2, ..., o_T | q_t = s_i, λ)

where β_i(t) is the probability of observing the future observations, given that the system is in state i at time t and the HMM λ. The backward algorithm starts by computing the base case at the last time step, where there are no future observations left to account for:

β_i(T) = 1

The backward algorithm then computes the backward probabilities for each time step t < T, using the following recursive formula:

β_i(t) = Σ_j a_ij · b_j(o_t+1) · β_j(t+1)

where a_ij is the probability of transitioning from state i to state j, b_j(o_t+1) is the probability of observing the observation o_t+1, given that the system is in state j, and the sum is over all possible states j.
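Under the same array conventions as the forward sketch, the backward recursion could be written as follows (again only a sketch):

import numpy as np

def backward(A, B, obs):
    """Backward probabilities: beta[t, i] = P(o_{t+1}..o_T | q_t = s_i, lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                              # base case: beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # recursion: beta_i(t) = sum_j a_ij * b_j(o_{t+1}) * beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta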
Once the forward and backward probabilities have been computed, the probability of being in state i at time t, given the entire observation sequence, can be computed using the following formula:

γ_i(t) = α_i(t) · β_i(t) / Σ_j α_j(t) · β_j(t)

where the denominator equals the total probability of the observation sequence, P(O | λ).
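Given the alpha and beta arrays produced by the two sketches above, the state posteriors can be obtained by combining them and normalising at each time step:

def state_posteriors(alpha, beta):
    """gamma[t, i] = P(q_t = s_i | O, lambda) from forward and backward probabilities."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)   # each row sums to P(O | lambda) before normalising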
The HMM-based POS tagging model can be represented using the following
components:
A set of states S = {s1, s2, ..., sn} representing the different parts of speech.
A set of observations O = {o1, o2, ..., om} representing the different words in the text.
An initial probability distribution π, which gives the probability of starting in each state.
A transition probability matrix A, which gives the probability of moving from one state to
another.
An emission probability matrix B, which gives the probability of observing each
observation given the current state.
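As a small illustration (the tag set and vocabulary here are placeholders matching the example later in this assignment), these components map naturally onto a handful of NumPy arrays:

import numpy as np

states = ["Noun", "Verb", "Article", "Preposition"]   # S: part-of-speech tags
vocab  = ["the", "cat", "sat", "on", "mat"]           # O: words observed in the text
pi = np.full(len(states), 1 / len(states))            # initial distribution (uniform here)
A  = np.zeros((len(states), len(states)))             # A[i, j] = P(next tag s_j | current tag s_i)
B  = np.zeros((len(states), len(vocab)))              # B[i, k] = P(word o_k | tag s_i)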
The transition probability matrix A and emission probability matrix B can be estimated
from a corpus of labeled data using the maximum likelihood estimation method. The
initial probability distribution π can be set to be uniform or estimated from the corpus.
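A rough sketch of that estimation step, assuming the labeled corpus is given as a list of sentences, each a list of (word, tag) pairs (the function name and data layout are assumptions for illustration):

from collections import Counter

def estimate_hmm(tagged_sentences):
    """Maximum likelihood estimates of pi, A, B from sentences of (word, tag) pairs."""
    start, trans, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        start[tags[0]] += 1
        for word, tag in sentence:
            tag_count[tag] += 1
            emit[(tag, word)] += 1
        for prev, nxt in zip(tags, tags[1:]):
            trans[(prev, nxt)] += 1
    pi = {t: start[t] / len(tagged_sentences) for t in tag_count}
    # relative frequencies; tag_count is used as the denominator (a common approximation)
    A = {(i, j): c / tag_count[i] for (i, j), c in trans.items()}
    B = {(t, w): c / tag_count[t] for (t, w), c in emit.items()}
    return pi, A, B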
Given a sequence of words, the HMM-based POS tagging algorithm computes the most
likely sequence of part-of-speech tags that would produce the observed sequence of
words. This is done using the Viterbi algorithm, which is a dynamic programming
algorithm that computes the maximum likelihood sequence of hidden states.
The Viterbi algorithm starts by computing the probability of being in each state at the first time step, given the first observation. This is done using the following formula:

δ_i(1) = π_i · b_i(o_1)

where δ_i(1) is the probability of starting in state i and observing the first observation o_1.
The algorithm then computes the maximum likelihood sequence of hidden states recursively, using the following formula:

δ_i(t) = max_j [ δ_j(t-1) · a_ji ] · b_i(o_t)

where δ_i(t) is the probability of the most likely state sequence that ends in state i at time step t and accounts for the observations up to time t, and the maximum is taken over all possible previous states j. At each step the algorithm also stores a backpointer ψ_i(t) = argmax_j [ δ_j(t-1) · a_ji ], so that the best sequence of states can be recovered by backtracking from the final time step.
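A minimal Viterbi sketch, under the same array conventions used in the forward and backward sketches above:

import numpy as np

def viterbi(pi, A, B, obs):
    """Return the most likely state sequence (as state indices) for the observations in obs."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))                       # delta[t, i]: best score ending in state i at time t
    psi = np.zeros((T, N), dtype=int)              # psi[t, i]: backpointer to the best previous state
    delta[0] = pi * B[:, obs[0]]                   # base case: delta_i(1) = pi_i * b_i(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A         # scores[j, i] = delta_j(t-1) * a_ji
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]   # recursion: delta_i(t) = max_j(...) * b_i(o_t)
    path = [int(delta[T - 1].argmax())]            # most probable final state
    for t in range(T - 1, 0, -1):                  # follow backpointers to recover the sequence
        path.append(int(psi[t][path[-1]]))
    return list(reversed(path))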
Once the most likely sequence of hidden states has been computed using the Viterbi
algorithm, the corresponding part-of-speech tags can be assigned to the words in the
observed sequence.
HMM-based POS tagging has been used in many natural language processing
applications, such as information extraction, text summarization, and machine
translation. It is a simple and effective approach for POS tagging that can be easily
extended to handle more complex models, such as hidden semi-Markov models or
neural network models.
The Viterbi algorithm is a dynamic programming algorithm used to find the most likely
sequence of hidden states in a Hidden Markov Model (HMM). It is commonly used in
natural language processing tasks such as part-of-speech tagging, where the goal is to
assign a part of speech tag to each word in a sentence.
The Viterbi algorithm works by recursively computing the maximum likelihood probability
of each state at each time step, given the observations up to that time step. At each
time step, the algorithm keeps track of the most likely sequence of states that would
produce the observed sequence of observations up to that time step. Once the
algorithm has computed the most likely sequence of states, the corresponding output
sequence can be determined.
Let's take an example of using the Viterbi algorithm for part-of-speech tagging. Consider
the following sentence: "The cat sat on the mat". We want to assign a part of speech tag
to each word in the sentence. We can represent this problem as an HMM, where the
hidden states represent the part of speech tags and the observations represent the
words in the sentence.
We can estimate the transition probability matrix A and emission probability matrix B
from a training corpus. For simplicity, let's assume that we have estimated the matrices
as follows:
A = (rows: current tag, columns: next tag)

              Noun    Verb    Article    Preposition
Noun          0.2     0.4     0.1        0.3
Verb          0.1     0.3     0.2        0.4
Article       0.6     0.1     0.2        0.1
Preposition   0.4     0.2     0.3        0.1

B = (rows: tag, columns: word)

              the     cat     sat     on      mat
Noun          0.2     0.3     0.1     0.2     0.2
Verb          0.1     0.2     0.3     0.2     0.2
Article       0.6     0.1     0.1     0.1     0.1
Preposition   0.1     0.1     0.2     0.4     0.2
Now, we can apply the Viterbi algorithm to find the most likely sequence of
part-of-speech tags for the sentence "The cat sat on the mat".
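As a concrete run of the viterbi sketch from earlier (assuming a uniform initial distribution and the tag order Noun, Verb, Article, Preposition; both occurrences of "the" share the same emission column):

import numpy as np

tags = ["Noun", "Verb", "Article", "Preposition"]
words = ["the", "cat", "sat", "on", "mat"]
pi = np.full(4, 0.25)                              # uniform initial distribution
A = np.array([[0.2, 0.4, 0.1, 0.3],                # transition matrix from the example above
              [0.1, 0.3, 0.2, 0.4],
              [0.6, 0.1, 0.2, 0.1],
              [0.4, 0.2, 0.3, 0.1]])
B = np.array([[0.2, 0.3, 0.1, 0.2, 0.2],           # emission matrix from the example above
              [0.1, 0.2, 0.3, 0.2, 0.2],
              [0.6, 0.1, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.2, 0.4, 0.2]])

obs = [words.index(w) for w in "the cat sat on the mat".split()]
path = viterbi(pi, A, B, obs)                      # viterbi() as sketched earlier
print([tags[i] for i in path])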
At time step t=1, we start with the initial probability distribution π, which can be set to be uniform (π_i = 0.25 for each of the four tags). The probability of being in each state at the first time step, given the first word "the", is:

δ_Noun(1) = 0.25 × 0.2 = 0.05
δ_Verb(1) = 0.25 × 0.1 = 0.025
δ_Article(1) = 0.25 × 0.6 = 0.15
δ_Preposition(1) = 0.25 × 0.1 = 0.025
At time step t=2, we compute the probability of being in each state at time step t=2,
given the observations up to time step t=2, and the most likely sequence of states