
NLP ASSIGNMENT 5

POORVAJA R
AIML
III YEAR

1. Explain Forward – Backward algorithm


The forward-backward algorithm is a dynamic programming algorithm used to compute
the probability of being in each hidden state at each time step of a hidden Markov model
(HMM), given an observation sequence. HMMs are statistical models commonly used for
time series data, where the observed data is a sequence of observations and the
underlying state of the system is not directly observable.

The forward-backward algorithm computes two sets of probabilities: the forward
probabilities and the backward probabilities. The forward probability is the joint
probability of the observations seen so far and of the system being in a particular state
at a particular time. The backward probability is the probability of observing all the future
observations, given that the system is in a particular state at a particular time.

The forward algorithm computes the forward probabilities recursively, using the following
formula:

α_i(t) = P(o_1, o_2, ..., o_t, q_t = i | λ)

where α_i(t) is the joint probability of observing o_1, o_2, ..., o_t and being in state i at
time t, given the HMM λ. The forward algorithm starts by computing the base case, which
is the probability of starting in state i and observing the first observation:

α_i(1) = π_i * b_i(o_1)

where π_i is the initial probability of being in state i, and b_i(o_1) is the probability of
observing the first observation o_1, given that the system is in state i.

The forward algorithm then computes the forward probabilities for each time step t > 1,
using the following recursive formula:

α_i(t) = b_i(o_t) * ∑_j α_j(t-1) * a_ji


where b_i(o_t) is the probability of observing o_t given that the system is in state i, a_ji
is the transition probability from state j to state i, and the sum is over all possible states
j.
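As a concrete illustration, the forward recursion maps directly onto a few lines of NumPy. This is a minimal sketch, assuming the model is supplied as an initial distribution pi, a transition matrix A with A[j, i] = a_ji, and an emission matrix B with B[i, o] = b_i(o); these array names and layouts are assumptions made for the example, not something fixed by the assignment.

import numpy as np

def forward(pi, A, B, obs):
    """Forward pass: alpha[t, i] = P(o_1 ... o_t, q_t = i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]              # base case: alpha_i(1) = pi_i * b_i(o_1)
    for t in range(1, T):
        # alpha_i(t) = b_i(o_t) * sum_j alpha_j(t-1) * a_ji
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
    return alpha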

The backward algorithm computes the backward probabilities recursively, using the
following formula:

β_i(t) = P(o_t+1, o_t+2, ..., o_T | q_t = i, λ)

where β_i(t) is the probability of observing the future observations o_t+1, ..., o_T, given
that the system is in state i at time t and the HMM λ. The backward algorithm starts from
the base case at the last time step T, where the probability is defined to be 1 because
there are no future observations left to account for:

β_i(T) = 1

The backward algorithm then computes the backward probabilities for each time step t <
T, using the following recursive formula:

β_i(t) = ∑_j a_ij * b_j(o_t+1) * β_j(t+1)

where b_j(o_t+1) is the probability of observing the observation o_t+1, given that the
system is in state j, and the sum is over all possible states j.
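The backward recursion can be sketched in the same style, under the same assumed array layout (A[i, j] = a_ij, B[j, o] = b_j(o)); again, this is only an illustrative sketch, not a definitive implementation.

import numpy as np

def backward(A, B, obs):
    """Backward pass: beta[t, i] = P(o_t+1 ... o_T | q_t = i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                          # base case: beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij * b_j(o_t+1) * beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta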

Once the forward and backward probabilities have been computed, the probability of
being in a particular state at a particular time can be computed using the following
formula:

γ_i(t) = α_i(t) * β_i(t) / P(O | λ)

where P(O | λ) is the probability of observing the entire sequence of observations O,
given the HMM λ; it can be obtained by summing the forward probabilities over all states
at the final time step, P(O | λ) = ∑_i α_i(T).
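Given the two arrays of forward and backward probabilities, the state posteriors γ_i(t) follow in a single line. A minimal sketch, assuming alpha and beta are the (T, N) arrays produced by the forward and backward passes sketched above:

import numpy as np

def state_posteriors(alpha, beta):
    """gamma[t, i] = alpha_i(t) * beta_i(t) / P(O | lambda)."""
    likelihood = alpha[-1].sum()               # P(O | lambda) = sum_i alpha_i(T)
    gamma = alpha * beta / likelihood
    return gamma                               # each row sums to 1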

The forward-backward algorithm is used in many applications, such as speech
recognition, natural language processing, and bioinformatics. It is particularly useful in
applications where the underlying state of the system is not directly observable, and the
observed data is noisy or incomplete. The algorithm can be extended to handle more
complex models, such as hidden semi-Markov models, and can be used in combination
with other algorithms, such as the Viterbi algorithm, to perform more complex inference
tasks.
2. Explain HMM model for POS Tagging
Part-of-speech (POS) tagging is the process of assigning a part of speech, such as
noun, verb, adjective, or adverb, to each word in a text. A Hidden Markov Model (HMM) is
a statistical model that can be used for POS tagging. In HMM-based POS tagging, the
observed data is the sequence of words in a text, and the hidden states are the parts of
speech.

In an HMM-based POS tagging model, each word in the text is treated as an observation,
and each part of speech is treated as a state. The model assumes that the probability of
observing a word depends only on the part of speech of that word, and that the part of
speech of a word depends only on the part of speech of the previous word. The second
assumption is known as the Markov assumption.

The HMM-based POS tagging model can be represented using the following
components:

A set of states S = {s1, s2, ..., sn} representing the different parts of speech.
A set of observations O = {o1, o2, ..., om} representing the different words in the text.
An initial probability distribution π, which gives the probability of starting in each state.
A transition probability matrix A, which gives the probability of moving from one state to
another.
An emission probability matrix B, which gives the probability of observing each
observation given the current state.
The transition probability matrix A and emission probability matrix B can be estimated
from a corpus of labeled data using the maximum likelihood estimation method. The
initial probability distribution π can be set to be uniform or estimated from the corpus.
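As an illustration of that estimation step, here is a minimal counting sketch in Python; the corpus format (a list of sentences, each a list of (word, tag) pairs) and all variable names are assumptions made for the example, and no smoothing is applied.

from collections import Counter, defaultdict

def estimate_hmm(tagged_sentences):
    """Maximum likelihood estimates of pi, A and B from (word, tag) sentences."""
    pi = Counter()                       # counts of sentence-initial tags
    trans = defaultdict(Counter)         # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)          # emit[tag][word] -> count
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            emit[tag][word] += 1
            if prev is None:
                pi[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    # normalise the counts into probability distributions
    start_total = sum(pi.values())
    pi = {t: c / start_total for t, c in pi.items()}
    A = {p: {t: c / sum(row.values()) for t, c in row.items()} for p, row in trans.items()}
    B = {t: {w: c / sum(row.values()) for w, c in row.items()} for t, row in emit.items()}
    return pi, A, B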

Given a sequence of words, the HMM-based POS tagging algorithm computes the most
likely sequence of part-of-speech tags that would produce the observed sequence of
words. This is done using the Viterbi algorithm, which is a dynamic programming
algorithm that computes the maximum likelihood sequence of hidden states.

The Viterbi algorithm starts by computing the probability of being in each state at the
first time step, given the first observation. This is done using the following formula:

δ_i(1) = π_i * b_i(o_1)

where δ_i(1) is the joint probability of starting in state i and observing the first
observation o_1.

The algorithm then fills in the remaining probabilities recursively, using the following
formula:

δ_i(t) = b_i(o_t) * max_j (δ_j(t-1) * a_ji)

where δ_i(t) is the probability of the most likely state sequence that ends in state i at
time step t and accounts for the observations up to time t, and the maximum is taken
over all possible previous states j.

To recover the best sequence, the algorithm also records, for each state and time step,
the predecessor state j that achieved the maximum. Once the final time step is reached,
the most likely sequence of hidden states is obtained by backtracking from the
highest-scoring final state, and the corresponding part-of-speech tags are assigned to
the words in the observed sequence.

HMM-based POS tagging has been used in many natural language processing
applications, such as information extraction, text summarization, and machine
translation. It is a simple and effective approach for POS tagging that can be easily
extended to handle more complex models, such as hidden semi-Markov models or
neural network models.

3. Explain the Viterbi Algorithm with the help of a suitable example.

The Viterbi algorithm is a dynamic programming algorithm used to find the most likely
sequence of hidden states in a Hidden Markov Model (HMM). It is commonly used in
natural language processing tasks such as part-of-speech tagging, where the goal is to
assign a part of speech tag to each word in a sentence.

The Viterbi algorithm works by recursively computing the maximum likelihood probability
of each state at each time step, given the observations up to that time step. At each
time step, the algorithm keeps track of the most likely sequence of states that would
produce the observed sequence of observations up to that time step. Once the
algorithm has computed the most likely sequence of states, the corresponding output
sequence can be determined.

Let's take an example of using the Viterbi algorithm for part-of-speech tagging. Consider
the following sentence: "The cat sat on the mat". We want to assign a part of speech tag
to each word in the sentence. We can represent this problem as an HMM, where the
hidden states represent the part of speech tags and the observations represent the
words in the sentence.

We can define the hidden states as follows:


S = {Noun, Verb, Article, Preposition}

And the observation vocabulary as follows (treating "The" and "the" as the same word):

O = {The, cat, sat, on, mat}

We can estimate the transition probability matrix A and emission probability matrix B
from a training corpus. For simplicity, let's assume that we have estimated the matrices
as follows:

A = (rows: current tag, columns: next tag)

          Noun   Verb   Article   Prep
Noun      0.2    0.4    0.1       0.3
Verb      0.1    0.3    0.2       0.4
Article   0.6    0.1    0.2       0.1
Prep      0.4    0.2    0.3       0.1

B = (rows: tag, columns: word)

          The    cat    sat    on     mat
Noun      0.2    0.3    0.1    0.2    0.2
Verb      0.1    0.2    0.3    0.2    0.2
Article   0.6    0.1    0.1    0.1    0.1
Prep      0.1    0.1    0.2    0.4    0.2

Now, we can apply the Viterbi algorithm to find the most likely sequence of
part-of-speech tags for the sentence "The cat sat on the mat".

At time step t=1, we start with the initial probability distribution π, which can be set to be
uniform. The probability of being in each state at the first time step is:

δ_Noun(1) = π_Noun * B_Noun(The) = 0.25 * 0.2 = 0.05
δ_Verb(1) = π_Verb * B_Verb(The) = 0.25 * 0.1 = 0.025
δ_Article(1) = π_Article * B_Article(The) = 0.25 * 0.6 = 0.15
δ_Prep(1) = π_Prep * B_Prep(The) = 0.25 * 0.1 = 0.025

At time step t=2, we compute, for each state, the probability of the best path that ends in
that state after observing "The cat", by extending the best time step t=1 path into it. For
example, for the Noun state:

δ_Noun(2) = B_Noun(cat) * max_j (δ_j(1) * a_j,Noun)
          = 0.3 * max(0.05 * 0.2, 0.025 * 0.1, 0.15 * 0.6, 0.025 * 0.4)
          = 0.3 * 0.09 = 0.027 (best predecessor: Article)

The same computation is repeated for the remaining states and time steps, and the most
likely tag sequence is recovered by backtracking from the highest-scoring state at the
final time step.
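To tie the example together, here is a minimal Python sketch of the Viterbi algorithm run on the matrices above, using the uniform initial distribution from the example; the variable names and the lower-casing of words are assumptions made for the illustration.

import numpy as np

tags = ["Noun", "Verb", "Article", "Prep"]
words = ["the", "cat", "sat", "on", "mat"]

# Transition matrix A[j, i] = P(tag i | previous tag j), rows and columns in `tags` order.
A = np.array([[0.2, 0.4, 0.1, 0.3],
              [0.1, 0.3, 0.2, 0.4],
              [0.6, 0.1, 0.2, 0.1],
              [0.4, 0.2, 0.3, 0.1]])

# Emission matrix B[i, w] = P(word w | tag i), columns in `words` order.
B = np.array([[0.2, 0.3, 0.1, 0.2, 0.2],
              [0.1, 0.2, 0.3, 0.2, 0.2],
              [0.6, 0.1, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.2, 0.4, 0.2]])

pi = np.full(len(tags), 0.25)            # uniform initial distribution, as in the example

def viterbi(obs):
    T, N = len(obs), len(tags)
    delta = np.zeros((T, N))             # delta[t, i]: best path score ending in tag i at time t
    back = np.zeros((T, N), dtype=int)   # back[t, i]: best predecessor tag index
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[j, i] = delta_j(t-1) * a_ji
        back[t] = scores.argmax(axis=0)
        delta[t] = B[:, obs[t]] * scores.max(axis=0)
    path = [int(delta[-1].argmax())]             # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[i] for i in reversed(path)]

sentence = ["The", "cat", "sat", "on", "the", "mat"]
obs = [words.index(w.lower()) for w in sentence]
print(viterbi(obs))                      # prints the most likely tag sequence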
