Memoria 4
2023-11-12
1 Objectives
Project Objective: The objective of this project is to analyze and implement the Viterbi algorithm for Hidden Markov Models (HMMs). The project aims to study the efficiency of the Viterbi algorithm on HMMs with different characteristics.
Context: Hidden Markov Models (HMMs) are widely used in various fields, including natural
language processing, speech recognition, bioinformatics, and more. HMMs model sequences of
observations by representing underlying states and their transitions. The Viterbi algorithm is
a dynamic programming algorithm used to find the most likely sequence of hidden states given
a sequence of observations.
Algorithm to Be Analyzed: The primary focus of this project is the Viterbi algorithm. The
Viterbi algorithm is used to compute the maximum-likelihood sequence of hidden states in an
HMM. It is a crucial algorithm in HMM-based applications as it helps in determining the most
probable sequence of states based on observed data.
Specific Algorithmic Issue: The main algorithmic issue to be addressed in this project is the
efficiency of the Viterbi algorithm. This involves analyzing the time complexity of the algorithm
and understanding how it scales with different sizes and characteristics of HMMs. The goal is
to determine how the algorithm’s performance is affected by the number of states, observations,
and other factors.
Approach: To accomplish the objectives of the project, the following approach will be taken:
Implementation of the Viterbi Algorithm: The project will involve implementing the Viterbi algorithm in Java to compute the maximum-likelihood sequence of hidden states for a given HMM and observation sequence.
The project will also cover an analysis of the results and any observations related to the algorithm's behavior.
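A minimal sketch of such an implementation, using log-probabilities and the L (likelihood) and D (backpointer) matrices referred to in the discussion; the class and method names here are illustrative, not the project's actual code:

```java
/** Minimal Viterbi sketch (illustrative, not the project's actual code). */
public class Viterbi {
    /**
     * Most likely state sequence for the given observations.
     * start[i]   : initial probability of state i
     * trans[i][j]: probability of a transition from state i to state j
     * emit[i][o] : probability that state i emits observation o
     */
    static int[] viterbi(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = start.length, T = obs.length;
        double[][] L = new double[T][n]; // best log-likelihood ending in state j at time t
        int[][] D = new int[T][n];       // backpointers
        for (int j = 0; j < n; j++)      // base case: first observation, O(n)
            L[0][j] = Math.log(start[j]) + Math.log(emit[j][obs[0]]);
        for (int t = 1; t < T; t++) {    // dynamic-programming step: O(T * n^2)
            for (int j = 0; j < n; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int i = 0; i < n; i++) {
                    double v = L[t - 1][i] + Math.log(trans[i][j]);
                    if (v > best) { best = v; arg = i; }
                }
                L[t][j] = best + Math.log(emit[j][obs[t]]);
                D[t][j] = arg;
            }
        }
        int last = 0;                    // backtracking: follow backpointers
        for (int j = 1; j < n; j++)
            if (L[T - 1][j] > L[T - 1][last]) last = j;
        int[] path = new int[T];
        path[T - 1] = last;
        for (int t = T - 1; t > 0; t--) path[t - 1] = D[t][path[t]];
        return path;
    }
}
```

Working in log-space avoids the numerical underflow that multiplying many small probabilities would otherwise cause on long observation sequences.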
Expected Outcome: The expected outcome of this project is a deeper understanding of the Viterbi algorithm's efficiency in different HMM scenarios. By analyzing the algorithm's performance, it will be possible to identify the factors that affect its efficiency and determine under what conditions it is most effective.
This project aims to provide insights into the practical application of the Viterbi algorithm and
its performance considerations, which can be valuable in various fields where HMMs are used.
2 Experimental Setup
Performance vs. Number of States: In this experiment, the Viterbi algorithm
will be run on HMMs with different numbers of states, ranging from small to
large. The objective is to measure the algorithm's runtime as the number of
states increases.
Each configuration will be run several times to reduce measurement noise in the results. The average runtime and any other relevant statistics will be calculated based on these runs.
The experiments will be designed to provide insights into the algorithm’s performance under
different conditions, helping to understand how it scales with varying numbers of states and
observations and how it behaves with different HMM characteristics.
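The measurement loop can be sketched as below; the inner loop is a stand-in with the same O(T · n²) structure as the Viterbi dynamic-programming step, and the class name and state counts are illustrative:

```java
import java.util.Random;

/** Sketch of the timing harness: measures wall-clock time of an O(T * n^2)
 *  dynamic-programming pass (the dominant step of Viterbi) for varying n.
 *  The inner loop has the same structure as the DP step, not the full algorithm. */
public class Timing {
    static double timeDP(int n, int T) {
        Random rng = new Random(42);
        double[][] logTrans = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                logTrans[i][j] = Math.log(rng.nextDouble());
        double[] prev = new double[n], cur = new double[n];
        long t0 = System.nanoTime();
        for (int t = 1; t < T; t++) {
            for (int j = 0; j < n; j++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < n; i++)
                    best = Math.max(best, prev[i] + logTrans[i][j]);
                cur[j] = best;
            }
            double[] tmp = prev; prev = cur; cur = tmp; // reuse the two rows
        }
        return (System.nanoTime() - t0) / 1e9; // elapsed seconds
    }

    public static void main(String[] args) {
        for (int n : new int[]{40, 60, 90}) // state counts used in the experiments
            System.out.printf("n=%d  t=%.4f s%n", n, timeDP(n, 1000));
    }
}
```

In practice each measurement would be repeated and averaged, as described above, and a warm-up run helps exclude JIT compilation from the timings.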
3 Empirical Results
A summary of the experimental results is provided in Table 2 in the Appendix.
Figure 1 shows the measured runtimes; the coefficients of the statistical fit are provided in the Appendix. Time (t) is the dependent variable, representing the time taken by the algorithm to complete. The goal is to understand how this time depends on the other two factors.
Observations: the first independent variable, representing the number of observations in the input sequence.
States: the second independent variable, representing the number of states in the system.
Power Model: The form t = a · observations^b · states^c indicates that the relationship between time, observations, and states is not linear but follows a power law. This implies that the effect of changes in observations and states is not proportional but follows a power relationship.
Dotted Lines: The dotted lines indicate a fit to the power model, meaning that the parameters a, b, and c have been estimated from the observed data and the model has been plotted as the curve that best fits the observed relationship.
Interpretation: The power model suggests that the time required for the process is influenced by both the number of observations and the number of states, with the nature of that influence determined by the exponents b and c. A positive exponent indicates that increasing the corresponding variable increases the time; a negative exponent would imply the opposite.
Practical Use:
This kind of model might be used in fields where the time complexity of
a process depends on both the number of observations and the number of
states. Understanding this relationship can help in optimizing processes
or predicting the time required for similar tasks in the future.
In summary, the function represents a power-law relationship between time, observations, and
states, and the dotted lines show a fitted curve based on observed data, providing insights into
the nature of the relationship and the influence of the variables involved.
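Since the fit itself is part of the analysis, it may help to see how such coefficients can be estimated: taking logs turns t = a · observations^b · states^c into the linear model log t = log a + b · log observations + c · log states, which ordinary least squares can solve. A sketch (the class and method names are illustrative; the report's actual fit was produced by a statistics package):

```java
/** Sketch: estimate a, b, c in t = a * obs^b * states^c by ordinary least
 *  squares on the log-transformed model. */
public class PowerFit {
    /** rows: each entry is {observations, states, time}; returns {a, b, c}. */
    static double[] fit(double[][] rows) {
        // Accumulate the normal equations X^T X beta = X^T y
        // for X = [1, log obs, log states] and y = log t, as an augmented matrix.
        double[][] A = new double[3][4];
        for (double[] r : rows) {
            double[] x = {1.0, Math.log(r[0]), Math.log(r[1])};
            double y = Math.log(r[2]);
            for (int i = 0; i < 3; i++) {
                for (int j = 0; j < 3; j++) A[i][j] += x[i] * x[j];
                A[i][3] += x[i] * y;
            }
        }
        // Gaussian elimination with partial pivoting on the 3x3 system.
        for (int p = 0; p < 3; p++) {
            int best = p;
            for (int i = p + 1; i < 3; i++)
                if (Math.abs(A[i][p]) > Math.abs(A[best][p])) best = i;
            double[] tmp = A[p]; A[p] = A[best]; A[best] = tmp;
            for (int i = p + 1; i < 3; i++) {
                double f = A[i][p] / A[p][p];
                for (int j = p; j < 4; j++) A[i][j] -= f * A[p][j];
            }
        }
        double[] beta = new double[3];
        for (int i = 2; i >= 0; i--) {
            double s = A[i][3];
            for (int j = i + 1; j < 3; j++) s -= A[i][j] * beta[j];
            beta[i] = s / A[i][i];
        }
        return new double[]{Math.exp(beta[0]), beta[1], beta[2]}; // a = exp(intercept)
    }
}
```

Note that least squares on log t minimizes relative rather than absolute error, which is usually what is wanted when runtimes span two orders of magnitude, as in Table 2.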
[Figure 1 appears here: log-log plot of time (s) against number of states, one series per number of observations (2000–6000).]
Figure 1: Time as a function of the number of states for different numbers of observations. The dotted lines show a fit to a power model t = a · observations^b · states^c.
4 Discussion
Initialization (Base Case): Initializing the L and D matrices for the first observation involves O(n) operations, where n is the number of states.
Dynamic Programming: Each of the remaining observations requires, for each of the n states, a maximization over the n possible predecessor states, for O(T · n²) operations in total.
Backtracking: The backtracking step involves traversing the D matrix for each observation, which takes O(T · n) time.
The overall time complexity is dominated by the dynamic programming step, resulting in O(T · n²).
The results closely match the theoretical predictions: the fitted exponents (b ≈ 0.87 for the number of observations and c ≈ 1.96 for the number of states) are close to the values 1 and 2 expected from the O(T · n²) bound, which suggests that the implementation is correct.
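As a sanity check on that claim, the fitted coefficients from the Appendix can be plugged back into the model; for the largest configuration measured (202 states, 6561 observations) the prediction lands close to the measured 0.6435 s:

```java
public class Predict {
    public static void main(String[] args) {
        // Coefficients of the fitted power model t = a * observations^b * states^c
        // (point estimates taken from the Appendix).
        double a = 9.626e-9, b = 0.868, c = 1.962;
        double predicted = a * Math.pow(6561, b) * Math.pow(202, c);
        System.out.printf("predicted = %.4f s (measured: 0.6435 s)%n", predicted);
    }
}
```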
A Appendix
A.1 Data Summary
Table 2: Measured runtime for each combination of number of states and number of observations.

#states #observations time (s)
40 384 0.0020067
40 576 0.0030241
40 864 0.0055117
40 1296 0.0071505
40 1944 0.0090067
40 2916 0.0138917
40 4374 0.0195512
40 6561 0.0336450
60 256 0.0021409
60 384 0.0032958
60 576 0.0052498
60 864 0.0077286
60 1296 0.0116632
60 1944 0.0209514
60 2916 0.0287044
60 4374 0.0382013
60 6561 0.0613277
90 256 0.0043476
90 384 0.0067279
90 576 0.0141432
90 864 0.0190311
90 1296 0.0281548
90 1944 0.0369649
90 2916 0.0662574
90 4374 0.0797176
90 6561 0.1388702
135 256 0.0090389
135 384 0.0166886
135 576 0.0273701
135 864 0.0335366
135 1296 0.0474821
135 1944 0.0801279
135 2916 0.1161059
135 4374 0.2071814
135 6561 0.3471609
202 256 0.0286490
202 384 0.0449808
202 576 0.0975789
202 864 0.1187365
202 1296 0.1987051
202 1944 0.2449659
202 2916 0.3350550
202 4374 0.4537273
202 6561 0.6434945
A.2 Fitted Model Coefficients

##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 9.626e-09 2.638e-09 3.649 0.000474 ***
## b 8.680e-01 1.839e-02 47.195 < 2e-16 ***
## c 1.962e+00 4.346e-02 45.143 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01038 on 78 degrees of freedom
##
## Number of iterations to convergence: 17
## Achieved convergence tolerance: 1.449e-06