
Properties of the Expectation-Maximization (EM) Algorithm

Bayesian estimation vs MLE

Definitions:

Training Set and Testing Set in Pattern Recognition:


• Training Set: A subset of the data used to train a model. It contains input-output pairs where
the output is known, allowing the model to learn the mapping from input features to the
output labels.
• Testing Set: A separate subset of the data used to evaluate the performance of the trained
model. It contains input-output pairs that were not used during training, providing an
unbiased assessment of how well the model generalizes to new data.

Training Dataset and Test Dataset in Pattern Recognition:

• Training Dataset: The entire collection of input-output pairs used for training the model.
It may include various features and labels that the model uses to learn patterns and
relationships.
• Test Dataset: The complete set of input-output pairs used to test the model’s performance
after training. It helps in assessing the accuracy, precision, recall, and other performance
metrics of the model on unseen data.

Validation dataset:

• A subset of the data set aside during training to tune the model’s hyperparameters and
to perform cross-validation.
• It is used to prevent overfitting and to ensure that the model generalizes well to new data.
• The validation dataset is not used for training the model but rather for validating the
model during the training process.
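
To make the training, validation, and test subsets described above concrete, below is a minimal sketch of a 60/20/20 split. The ratio, the random data, and the use of scikit-learn's train_test_split are illustrative assumptions, not part of the original notes.

```python
# Minimal sketch: splitting data into training, validation, and test sets.
# The 60/20/20 ratio, the random data, and scikit-learn are assumptions
# made for illustration.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)              # 100 samples, 5 features
y = np.random.randint(0, 2, size=100)   # binary labels

# Hold out 20% of the data as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the remainder into training (75% of the rest = 60% overall)
# and validation (25% of the rest = 20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(X_train.shape, X_val.shape, X_test.shape)   # (60, 5) (20, 5) (20, 5)
```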

Autocorrelation in Pattern Recognition:

• Autocorrelation refers to the correlation of a signal with a delayed copy of itself as a function of delay.
• It measures how the values of the signal or time series are related to its past values,
helping to identify repeating patterns, trends, or periodicity within the data.
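
As a rough illustration of this definition, the sketch below computes the sample autocorrelation of a noisy periodic signal with NumPy; the signal and the normalization choice are assumptions made for the example.

```python
# Minimal sketch: sample autocorrelation of a 1-D signal as a function of lag.
# A noisy sine wave is used as illustrative data (not from the text).
import numpy as np

t = np.arange(200)
x = np.sin(2 * np.pi * t / 25) + 0.3 * np.random.randn(200)

xc = x - x.mean()                        # remove the mean before correlating
acf = np.correlate(xc, xc, mode="full")  # correlation at every lag
acf = acf[acf.size // 2:]                # keep non-negative lags only
acf /= acf[0]                            # normalize so lag 0 equals 1

print(acf[:5])   # autocorrelation at lags 0..4; peaks recur near the period (25)
```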

Stationary Process in Pattern Recognition:

• A stationary process is a stochastic process whose statistical properties, such as mean and
variance, do not change over time.
• In pattern recognition, this implies that the data's statistical characteristics are constant,
making it easier to model and predict future values.
• Stationarity is an important assumption in many time series analysis techniques.
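
A quick, informal way to see (non-)stationarity is to compare the mean and variance of different segments of a series; the two synthetic series below are made-up examples, not data from the text.

```python
# Informal check: compare the mean and variance of two halves of a series.
import numpy as np

rng = np.random.default_rng(0)
stationary = rng.normal(0.0, 1.0, 1000)                              # constant mean/variance
drifting = rng.normal(0.0, 1.0, 1000) + np.linspace(0.0, 5.0, 1000)  # mean changes over time

for name, series in [("stationary", stationary), ("non-stationary", drifting)]:
    first, second = series[:500], series[500:]
    print(name,
          "means:", round(first.mean(), 2), round(second.mean(), 2),
          "variances:", round(first.var(), 2), round(second.var(), 2))
```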

Hidden Markov Model

• A Hidden Markov Model (HMM) is a statistical model used to describe the probabilistic relationship between a sequence of observations and a sequence of hidden states.
• It is often used in situations where the underlying system or process that generates the observations is unknown or hidden, hence the name “Hidden Markov Model.”
• It is used to predict future observations or classify sequences, based on the underlying
hidden process that generates the data.

Components of HMM
• An HMM consists of two types of variables: hidden states and observations.
➢ The hidden states are the underlying variables that generate the observed
data, but they are not directly observable.
➢ The observations are the variables that are measured and observed.
• The HMM models the relationship between the hidden states and the observations using two sets of probabilities: the transition probabilities and the emission probabilities.
➢ The transition probabilities describe the probability of transitioning from one
hidden state to another.
➢ The emission probabilities describe the probability of observing an output
given a hidden state.
• There is also a third component: the initial probability distribution, which gives the probability of each hidden state at the first time step (a toy example of all three components follows).
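
The sketch below writes these three components out as NumPy arrays for a toy two-state model; the state names, observation symbols, and probability values are illustrative assumptions.

```python
# Sketch of the three HMM parameter sets for a toy weather/activity model.
# State names, observation symbols, and numbers are illustrative only.
import numpy as np

states = ["Rainy", "Sunny"]               # hidden states
observations = ["Walk", "Shop", "Clean"]  # observable symbols

pi = np.array([0.6, 0.4])            # initial state distribution P(state at t=1)

A = np.array([[0.7, 0.3],            # transition probabilities P(next state | current state)
              [0.4, 0.6]])

B = np.array([[0.1, 0.4, 0.5],       # emission probabilities P(observation | state)
              [0.6, 0.3, 0.1]])

# Each row of A and B is a probability distribution, so rows sum to 1.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```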

Forward-Backward algorithm used in HMM

• The Forward-Backward algorithm is a fundamental method used in Hidden Markov Models (HMMs) to compute the probabilities of hidden states given a sequence of observed events.
• The steps are:

Initialization:

➢ Forward Pass: Start by initializing the probabilities for each state at the first time step,
based on the initial state distribution and the likelihood of the first observation.
➢ Backward Pass: Initialize the probabilities for the last time step to 1 for every state, since no further observations remain to be accounted for.

Forward Pass:

➢ Move forward through the sequence, updating the probability of being in each state at
each time step.
➢ This is done by considering the probability of transitioning from every possible state at
the previous step, combined with the likelihood of the current observation.

Backward Pass:

➢ Move backward through the sequence, updating the probability of having been in each
state at each previous time step.
➢ This is done by considering the probability of transitioning to every possible state at the
next step, combined with the likelihood of the next observation.

Combining Results:

➢ Combine the results from the forward and backward passes to calculate the overall probability of being in each state at each time step.
➢ This involves multiplying the forward probability (probability of reaching that state) by the backward probability (probability of observing the rest of the sequence from that state onward) and then normalizing over all states at that time step. A sketch of the full procedure follows.
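
A minimal NumPy sketch of these stages is given below; the toy parameters and the observation sequence are assumptions made for illustration.

```python
# Sketch of the Forward-Backward algorithm with NumPy; parameters and the
# observation sequence are illustrative assumptions, not from the text.
import numpy as np

pi = np.array([0.6, 0.4])                    # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])       # transition probabilities
B = np.array([[0.1, 0.4, 0.5],               # emission probabilities
              [0.6, 0.3, 0.1]])
obs = [0, 1, 2, 0]                           # observed symbol indices
T, N = len(obs), len(pi)

# Forward pass: alpha[t, i] = P(o_1..o_t, state_t = i)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: beta[t, i] = P(o_{t+1}..o_T | state_t = i)
beta = np.zeros((T, N))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

# Combine: posterior probability of each state at each time step.
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma)   # each row sums to 1
```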

Viterbi algorithm and its application in HMM

• The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states (known as the Viterbi path) given a sequence of observed events in a Hidden Markov Model (HMM).
• Steps:
Initialization
➢ Start by initializing the probabilities of being in each state at the first time step
based on the initial state distribution and the likelihood of the first observation.
Recursion
➢ For each subsequent time step, compute the highest probability of arriving at
each state from any of the previous states, taking into account the transition
probabilities and the likelihood of the current observation.
➢ Track the path (i.e., the sequence of states) that led to these maximum
probabilities.
Termination
➢ Identify the final state with the highest probability after processing the entire
sequence of observations.
Path Backtracking:
➢ Backtrack from the final state to the initial state using the tracked paths to
reconstruct the most likely sequence of hidden states.

• Applications:
➢ Speech Recognition: Finding the most likely sequence of phonemes or words.
➢ Bioinformatics: Identifying the most likely sequence of genes or proteins.
➢ Robotics: Determining the most probable sequence of states in a robot’s navigation
system.
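
The initialization, recursion, termination, and backtracking steps above can be sketched in NumPy as follows; the parameters and observation sequence are illustrative assumptions.

```python
# Sketch of the Viterbi algorithm with NumPy; parameters and the observation
# sequence are illustrative assumptions, not from the text.
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.3, 0.1]])
obs = [0, 1, 2, 0]
T, N = len(obs), len(pi)

# Initialization: probability of starting in each state and emitting obs[0].
delta = np.zeros((T, N))
psi = np.zeros((T, N), dtype=int)            # back-pointers
delta[0] = pi * B[:, obs[0]]

# Recursion: keep only the best predecessor for each state.
for t in range(1, T):
    scores = delta[t - 1][:, None] * A       # scores[i, j]: come from state i into state j
    psi[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0) * B[:, obs[t]]

# Termination and path backtracking.
path = [int(delta[-1].argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(psi[t][path[-1]]))
path.reverse()
print(path)   # most likely sequence of hidden-state indices
```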

Baum-Welch Algorithm for training HMMs

• The Baum-Welch algorithm is specifically designed for training Hidden Markov Models (HMMs) when the model parameters (transition probabilities, emission probabilities, initial state probabilities) are unknown or need to be refined based on observed data.
• Steps:
Expectation step (E-step):
➢ In the Baum-Welch algorithm, this involves computing the forward and backward probabilities (α_t(i) and β_t(i)) for every state i at every time step t, given the observed data.
➢ This step estimates how much each state and transition contributes to the observed
data.
Maximization step (M-step):
➢ In this step, the estimated probabilities from the E-step are used to update the
model parameters (transition probabilities, emission probabilities, initial state
probabilities).
➢ The goal is to maximize the likelihood of observing the data under the current model
parameters.
Iteration:
➢ Repeat the E-step and M-step iteratively until convergence criteria are met.
Convergence is typically assessed by the change in log-likelihood or parameter
values between iterations.
Convergence
➢ The Baum-Welch algorithm is guaranteed to converge to a local maximum of the
likelihood function, improving the model's fit to the observed data with each
iteration.
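
A hedged sketch of this EM procedure is shown below using the hmmlearn library, assuming a recent release in which CategoricalHMM is the discrete-observation model; the observation sequence is made up for illustration.

```python
# Hedged sketch of Baum-Welch (EM) training via hmmlearn (assumed dependency).
import numpy as np
from hmmlearn import hmm

# A single observation sequence of symbol indices, shaped (n_samples, 1).
obs = np.array([[0], [1], [2], [0], [2], [1], [0], [0], [2], [1]])

model = hmm.CategoricalHMM(n_components=2, n_iter=100, tol=1e-4, random_state=0)
model.fit(obs)   # runs Baum-Welch iterations until convergence or n_iter

print(model.startprob_)      # learned initial state probabilities
print(model.transmat_)       # learned transition probabilities
print(model.emissionprob_)   # learned emission probabilities
```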

Role of HMM in classifier design

• Captures sequential and temporal patterns in data, which is crucial for tasks where order
and timing matter.
• Estimates the probability of observed sequences given different model parameters,
allowing for probabilistic classification.
• Utilizes algorithms like Forward-Backward and Baum-Welch to learn model parameters
from training data, ensuring the classifier adapts to the underlying patterns.
• Employs algorithms like Viterbi to find the most likely sequence of hidden states, aiding
in the classification decision.
• Robust to noise and variations in the data, improving classification performance in real-
world scenarios.

Variants of HMM

• Basic HMM: Standard model with discrete hidden states and discrete or continuous
observable states.
• Continuous HMM (CHMM): Extends basic HMM by assuming observable states are
continuous, modeled typically as Gaussian distributions.
• Left-Right HMM: Restricts state transitions to occur only from left to right or to stay in
the same state, used in speech recognition where sequences are linear.
• Mixture HMM (MHMM): Mixes multiple HMMs, allowing more complex behavior
modeling by combining simpler HMMs.
• Hidden Semi-Markov Model (HSMM): Allows for variable-length durations in states,
extending HMMs by modeling the time spent in each state explicitly.
• Coupled HMM (CHMM): Models interactions between multiple sequences of data, often
used in applications where multiple time series are interdependent.
• Factorial HMM (FHMM): Represents dependencies between multiple observed
sequences using a shared set of hidden states, useful for modeling complex interactions
among multiple data streams.
• Switching Linear Dynamical System (SLDS): A generalization of HMMs where the
dynamics of the hidden states are linear, suitable for modeling continuous time series
with multiple interacting states.

Advantages of Hidden Markov Models (HMMs):

• Flexibility: Can model sequences with complex temporal dependencies and non-linear
patterns.
• Probabilistic Framework: Provides probabilistic interpretations of sequence generation
and state transitions.
• Effective for Unsupervised Learning: Suitable for tasks where labeled data is scarce or
unavailable.
• Versatility: Applicable across various domains such as speech recognition,
bioinformatics, and finance.

Disadvantages of Hidden Markov Models (HMMs):


• Sensitivity to Model Complexity: Performance can degrade if the number of hidden
states or parameters is not properly chosen.
• Assumption of Stationarity: Assumes that the underlying processes generating data are
stationary, which may not always hold true.
• Difficulty in Parameter Estimation: Estimating parameters (e.g., transition probabilities,
emission probabilities) accurately can be computationally intensive.
• Limited Representational Power: May struggle to capture long-range dependencies or
complex interactions present in some data sequences.

Significance of HMM

• Modeling Sequential Data: HMMs are crucial for capturing patterns in sequential data
where the order and timing of events matter.
• Pattern Recognition: They are widely used in tasks like speech recognition, handwriting
recognition, and gesture recognition.
• Probabilistic Inference: HMMs provide a probabilistic framework for analyzing and
predicting sequences, offering insights into uncertainty and confidence levels.
• Versatility: They find application across diverse fields including biology, finance, natural
language processing, and robotics.

Discrete HMM vs Continuous density HMM

Algorithms such as Viterbi, Baum-Welch, and Forward-Backward are used in both discrete and continuous Hidden Markov Models (HMMs). These algorithms are fundamental to the framework of HMMs in general, regardless of whether the observable states are discrete (represented as symbols) or continuous (modeled as probability distributions).
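
To illustrate that the same machinery applies in both cases, the sketch below fits one discrete-observation model and one Gaussian-observation model with the hmmlearn library (an assumed dependency) and decodes both with the same Viterbi-based predict call; all data are synthetic.

```python
# Hedged sketch contrasting a discrete-observation HMM with a continuous
# (Gaussian) one, assuming the hmmlearn library; all data are illustrative.
import numpy as np
from hmmlearn import hmm

# Discrete HMM: observations are symbol indices.
discrete_obs = np.array([[0], [2], [1], [0], [2], [2], [1]])
discrete_model = hmm.CategoricalHMM(n_components=2, n_iter=50, random_state=0)
discrete_model.fit(discrete_obs)

# Continuous-density HMM: observations are real-valued vectors,
# modeled here with one Gaussian per state.
continuous_obs = np.random.default_rng(0).normal(size=(100, 2))
continuous_model = hmm.GaussianHMM(n_components=2, covariance_type="diag",
                                   n_iter=50, random_state=0)
continuous_model.fit(continuous_obs)

# The same Viterbi-style decoding interface applies to both models.
print(discrete_model.predict(discrete_obs))
print(continuous_model.predict(continuous_obs)[:10])
```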

Need of dimension reduction in pattern recognition

• Curse of Dimensionality: High-dimensional data increases computational complexity and requires more data to achieve reliable estimates.
• Improved Computational Efficiency: Reducing dimensions speeds up learning algorithms
and reduces storage requirements.
• Noise and Redundancy Reduction: Eliminates irrelevant features and noise, focusing on
meaningful patterns.
• Visualization: Simplifies data visualization and interpretation, aiding in understanding
complex relationships.
• Enhanced Generalization: Reduces overfitting by focusing on essential features, improving
model generalization.
• Feature Extraction: Identifies informative features and reduces the impact of irrelevant or
redundant data.

Fisher’s Linear discriminant analysis


Linear Discriminant Function

• A linear discriminant function is a linear combination of features used to classify or discriminate between different classes in a supervised learning context.
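
A minimal sketch of a two-class linear discriminant function g(x) = wᵀx + b is given below; the weight vector, bias, and test points are arbitrary values chosen for illustration.

```python
# Sketch of a two-class linear discriminant function g(x) = w.x + b;
# the weights, bias, and test points are arbitrary illustrative values.
import numpy as np

w = np.array([0.8, -0.5])   # weight vector (one weight per feature)
b = 0.1                     # bias / threshold term

def classify(x):
    """Assign class 1 if g(x) >= 0, otherwise class 2."""
    g = np.dot(w, x) + b
    return 1 if g >= 0 else 2

print(classify(np.array([1.0, 0.5])))   # class 1 (g = 0.65)
print(classify(np.array([0.2, 1.5])))   # class 2 (g = -0.49)
```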
Computing Linear Discriminant Function
Principal Component Analysis (PCA)
Applications of PCA

PCA vs LDA
Performing PCA on a given dataset and determining principal components

Let's consider a dataset with three samples (data points) and two features (dimensions):

X = [[1, 2],
     [2, 3],
     [3, 4]]

Step 1: Mean Centering

Calculate the mean of each column:

Mean of x1 = (1 + 2 + 3) / 3 = 2
Mean of x2 = (2 + 3 + 4) / 3 = 3
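
The remaining PCA steps (covariance, eigen-decomposition, and projection) for this dataset can be sketched in NumPy as follows; the code reproduces the hand computation above, and the eigenvector signs may differ by ±1 depending on the library.

```python
# Sketch completing the PCA example above: mean-centering, covariance,
# eigen-decomposition, and projection onto the first principal component.
import numpy as np

X = np.array([[1, 2],
              [2, 3],
              [3, 4]], dtype=float)

mean = X.mean(axis=0)            # [2., 3.]  (matches the hand computation)
Xc = X - mean                    # mean-centered data

cov = np.cov(Xc, rowvar=False)   # sample covariance matrix: [[1, 1], [1, 1]]

eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # re-sort in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)          # [2., 0.]  -> the first component explains all the variance
print(eigvecs[:, 0])    # first principal component, ~[0.707, 0.707] (up to sign)

scores = Xc @ eigvecs[:, 0]   # project the centered data onto the first component
print(scores)                 # ~[-1.414, 0., 1.414] (up to sign)
```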
