ML 5

The document discusses hidden Markov models (HMMs). It describes the key elements of an HMM including states, observations, transition probabilities, and emission probabilities. It also outlines the three basic problems of HMMs: evaluation, finding the most likely state sequence, and learning model parameters from data. The evaluation and state sequence problems can be solved using dynamic programming algorithms like the forward-backward and Viterbi algorithms.

Day & Time: Monday (10am-11am & 3pm-4pm)

Tuesday (10am-11am)
Wednesday (10am-11am & 3pm-4pm)
Friday (9am-10am, 11am-12pm, 2pm-3pm)
Dr. Srinivasa L. Chakravarthy
&
Smt. Jyotsna Rani Thota
Department of CSE
GITAM Institute of Technology (GIT)
Visakhapatnam – 530045
Email: [email protected] & [email protected]
Department of CSE, GIT
2 Nov 2020
EID 403: Machine Learning
Course objectives

● Explore the various disciplines connected with ML.
● Explore the efficiency of learning with inductive bias.
● Explore the identification of ML algorithms such as decision tree learning.
● Explore algorithms such as artificial neural networks, genetic programming, Bayesian algorithms, the nearest neighbor algorithm, and hidden Markov models.



Learning Outcomes

● Identify the various applications connected with ML.
● Classify the efficiency of ML algorithms with the inductive bias technique.
● Discriminate the purpose of each ML algorithm.
● Analyze any application and correlate it with the available ML algorithms.
● Choose an ML algorithm to develop a project.



Syllabus

20 August 2020



Reference book 1: Machine Learning, by Tom M. Mitchell



Reference book 2: Introduction to Machine Learning, by Ethem Alpaydin



Module 5
(Chapter 15 from the prescribed book by Ethem Alpaydin)
It includes:

● Discrete Markov processes
● Hidden Markov models
● Three problems of HMMs
● The evaluation problem
● Finding the state sequence
● Learning model parameters & continuous observations
● HMM with output & model selection in HMMs


Introduction
So far, we have assumed that the instances that form a sample are independent and identically distributed, i.e., each random variable has the same probability distribution as the others and all are mutually independent.

This assumption is not valid for applications where successive instances are dependent.

For example, processes whose sequences of observations cannot be modeled as independent draws from a probability distribution include:

1. In a word, successive letters are dependent.
2. Base pairs in a DNA sequence are dependent.
3. In speech recognition, phonemes in a word (dictionary) and words in a sentence (syntax, semantics of the language) are dependent.
Introduction

Such a sequence is characterized as being generated by a parametric random process.

This chapter covers:

● How such sequences can be modeled.
● How the parameters of such a model can be learned from a training sample of example sequences.
Discrete Markov Processes
Consider a system that, at any time, is in one of a set of N distinct states: S1, S2, ..., SN.

The state at time t is denoted qt, t = 1, 2, ...

For example, qt = Si means that at time t, the system is in state Si.

At regularly spaced discrete times, the system moves to a state with a probability that depends on the previous states:

P(qt+1=Sj | qt=Si, qt-1=Sk, ...)

Discrete Markov Processes(cont.)

In a first-order Markov model, the state at time t+1 depends only on the state at time t:

P(qt+1=Sj | qt=Si, qt-1=Sk, ...) = P(qt+1=Sj | qt=Si)

This corresponds to saying that, given the present state, the future is independent of the past.

We further assume that these probabilities are independent of time; they are called transition probabilities:

aij ≡ P(qt+1=Sj | qt=Si), with aij ≥ 0 and Σj=1..N aij = 1

So going from Si to Sj has the same probability aij at any time. The only special case is the first state, which has an initial probability πi:

πi ≡ P(q1=Si), with Σi=1..N πi = 1
Discrete Markov Processes(cont.)

[Figure: example of a Markov model with three states.] This is a stochastic automaton.

In an observable Markov model, the states are observable. At any time, as the system moves from one state to another, we get an observation sequence, i.e., a sequence of states.

The output of the process is the set of states at each instant of time, where each state corresponds to a physical, observable event.
Discrete Markov Processes(cont.)
In an observable model, the observation sequence is the state sequence O = Q = {q1, q2, ..., qT}, whose probability is

P(O = Q | Π, A) = P(q1) ∏t=2..T P(qt | qt-1) = πq1 · aq1q2 · ... · aqT-1qT

where πq1 is the probability of starting in q1 and aq1q2 is the probability of going from q1 to q2. We multiply these probabilities to get the probability of the whole sequence.
Discrete Markov Processes(cont.)
Let us assume we have N urns/baskets, where each urn contains balls of only one color. So there is an urn of red balls, another of blue balls, and so on.

Say we have three states, S1: red, S2: blue, S3: green, with initial probabilities πi = P(q1 = Si).

Let A = [aij] be the N × N transition matrix, whose rows sum to 1; aij is the probability of drawing from urn j (a ball of color j) after drawing a ball of color i from urn i.
Discrete Markov Processes(cont.)
Given Π and A, it is easy to generate K random sequences, each of length T.

Let us see how to calculate the probability of a sequence.

Assume that the first four balls drawn are "red, red, green, green". This corresponds to the observation sequence O = {S1, S1, S3, S3}, whose probability is

P(O | Π, A) = P(S1) · P(S1|S1) · P(S3|S1) · P(S3|S3) = π1 · a11 · a13 · a33
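As a sketch, this product of initial and transition probabilities is easy to compute directly. The Π and A values below are hypothetical, since the slide's numeric matrices are not reproduced here:

```python
import numpy as np

# Hypothetical parameters for the 3-state (red/blue/green) urn example;
# the actual numbers from the slides are not reproduced here.
pi = np.array([0.5, 0.2, 0.3])          # initial probabilities pi_i
A = np.array([[0.4, 0.3, 0.3],          # a_ij = P(next state j | current state i)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sequence_probability(seq, pi, A):
    """P(O | Pi, A) for an observed state sequence (0-indexed states)."""
    p = pi[seq[0]]                       # pi_{q1}
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev, cur]                # a_{q_t q_{t+1}}
    return p

# O = {S1, S1, S3, S3} -> indices [0, 0, 2, 2]
# P = pi_1 * a_11 * a_13 * a_33 = 0.5 * 0.4 * 0.3 * 0.8 = 0.048
print(sequence_probability([0, 0, 2, 2], pi, A))
```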
Discrete Markov Processes(cont.)
Now let us see how we can learn the parameters Π and A.

Given K example sequences of length T, where qtk is the state at time t of sequence k, the initial probability estimate is

π̂i = (1/K) Σk 1(q1k = Si)

where 1(b) is 1 if b is true and 0 otherwise, i.e., π̂i is the fraction of sequences that start in state Si.

The transition probability estimate âij is

âij = Σk Σt=1..T-1 1(qtk = Si and qt+1k = Sj) / Σk Σt=1..T-1 1(qtk = Si)

i.e., the number of transitions from Si to Sj divided by the total number of transitions out of Si.
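The counting estimates above can be sketched in Python with NumPy; the example sequences are made up for illustration:

```python
import numpy as np

def estimate_parameters(sequences, N):
    """Estimate pi_i and a_ij by counting over K state sequences."""
    pi = np.zeros(N)
    counts = np.zeros((N, N))
    for seq in sequences:
        pi[seq[0]] += 1.0                     # count starting states
        for prev, cur in zip(seq, seq[1:]):
            counts[prev, cur] += 1.0          # count S_i -> S_j transitions
    pi /= len(sequences)                      # fraction starting in each state
    row_sums = counts.sum(axis=1, keepdims=True)
    A = np.divide(counts, row_sums,           # normalize each row; leave 0 rows as 0
                  out=np.zeros_like(counts), where=row_sums > 0)
    return pi, A

# Three made-up sequences over N = 3 states
sequences = [[0, 0, 2, 2], [0, 1, 1, 2], [1, 1, 0, 0]]
pi_hat, A_hat = estimate_parameters(sequences, N=3)
print(pi_hat)   # fraction of sequences starting in each state
```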
Hidden Markov Models
In an HMM:

1. The states are not observable, but when we visit a state, an observation is recorded that is a probabilistic function of the state.

2. There are M discrete observation symbols {v1, v2, ..., vM} possible in each state.

3. The observation or emission probability bj(m) is the probability of observing vm, m = 1...M, in state Sj:

bj(m) ≡ P(Ot = vm | qt = Sj)

We assume that these probabilities do not depend on t. The values observed form the observation sequence O.

4. The state sequence Q is not observed (that is what makes the model "hidden"), but it can be inferred from the observation sequence O.
Hidden Markov Models(cont.)
Elements of an HMM:

N: number of states
M: number of observation symbols
A = [aij]: N × N state transition probability matrix
B = [bj(m)]: N × M observation probability matrix
Π = [πi]: N × 1 initial state probability vector

λ = (A, B, Π) is the parameter set of the HMM. Given λ, the model can be used to generate an arbitrary number of observation sequences of arbitrary length.
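As a sketch of the last point, given a hypothetical λ = (A, B, Π) we can sample observation sequences by alternating emissions and transitions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lambda = (A, B, Pi) with N = 2 states and M = 3 symbols
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # b_j(m)
pi = np.array([0.6, 0.4])

def generate(T):
    """Draw one observation sequence of length T from the HMM."""
    q = rng.choice(2, p=pi)              # initial state ~ Pi
    O = []
    for _ in range(T):
        O.append(rng.choice(3, p=B[q]))  # emit v_m ~ b_q(.)
        q = rng.choice(2, p=A[q])        # next state ~ a_q.
    return O

obs = generate(10)
print(obs)   # a length-10 list of symbol indices in {0, 1, 2}
```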
Three Basic Problems of HMMs

Given a number of sequences of observations, we are interested in three problems:

1. Evaluation: given a model λ and an observation sequence O, evaluate the probability P(O | λ).

2. State sequence: given λ and O, find the state sequence Q* = {q1, q2, ..., qT} with the highest probability, i.e., P(Q* | O, λ) = maxQ P(Q | O, λ).

3. Learning: given a training set of observation sequences X = {Ok}k, learn the model that maximizes the probability of X, i.e., find λ* = arg maxλ P(X | λ).
Hidden Markov Models(cont.)
1. Evaluation Problem

Given an observation sequence O = {O1, O2, ..., OT} and the HMM parameter set λ, we want to calculate P(O | λ). There is an efficient procedure for this, called the forward-backward procedure.

It is based on the idea of dividing the observation sequence into two parts:
1. from time 1 until time t, and
2. from time t+1 until time T.
Hidden Markov Models(cont.)
1. Evaluation Problem(cont.)
We define the forward variable αt(i) as the probability of observing the partial sequence {O1, ..., Ot} until time t and being in Si at time t, given the model λ:

αt(i) ≡ P(O1 ⋯ Ot, qt = Si | λ)

The nice thing about it is that it can be calculated recursively, accumulating results along the way:

Initialization: α1(i) = πi bi(O1)
Recursion: αt+1(j) = [Σi=1..N αt(i) aij] bj(Ot+1)

Then P(O | λ) = Σi=1..N αT(i).
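The forward recursion can be sketched as follows; the λ used here is hypothetical, since the slides give no numeric example:

```python
import numpy as np

# Hypothetical lambda with N = 2 states and M = 2 symbols
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def forward(O, pi, A, B):
    """alpha_t(i) = P(O_1..O_t, q_t = S_i | lambda), filled in recursively."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # [sum_i alpha_t(i) a_ij] b_j(O_{t+1})
    return alpha

alpha = forward([0, 1], pi, A, B)
print(alpha[-1].sum())   # P(O | lambda) = sum_i alpha_T(i), ~0.1915 here
```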
Hidden Markov Models(cont.)
1. Evaluation Problem(cont.)
We define the backward variable βt(i) as the probability of observing the partial sequence Ot+1, ..., OT, given that we are in Si at time t:

βt(i) ≡ P(Ot+1 ⋯ OT | qt = Si, λ)

It can be calculated recursively, with time going in the backward direction:

Initialization: βT(i) = 1
Recursion: βt(i) = Σj=1..N aij bj(Ot+1) βt+1(j)


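A matching sketch of the backward recursion, on the same hypothetical λ as above; as a consistency check, combining β1 with the initial probabilities reproduces P(O | λ):

```python
import numpy as np

# Same hypothetical lambda as in the forward sketch
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def backward(O, A, B):
    """beta_t(i) = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))                         # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])   # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    return beta

O = [0, 1]
beta = backward(O, A, B)
# Consistency check: sum_i pi_i b_i(O_1) beta_1(i) also equals P(O | lambda)
print((pi * B[:, O[0]] * beta[0]).sum())   # ~0.1915, matching the forward pass
```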
Hidden Markov Models(cont.)
2. Finding the State Sequence-
Let us define γt(i) as the probability of being in state Si at time t, given O and λ, which can be computed from the forward and backward variables:

γt(i) ≡ P(qt = Si | O, λ) = αt(i) βt(i) / Σj=1..N αt(j) βt(j)

To find the state sequence, we can choose the state that has the highest probability for each time step t:

qt* = arg maxi γt(i)
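Putting the two variables together, γt(i) and the per-step state choice can be sketched as follows, again on a hypothetical λ:

```python
import numpy as np

# Hypothetical lambda, as in the earlier sketches
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def state_posteriors(O, pi, A, B):
    """gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]                     # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):                 # backward pass
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

gamma = state_posteriors([0, 1, 1], pi, A, B)
print(gamma.argmax(axis=1))   # q_t* = argmax_i gamma_t(i) for each t -> [0 1 1]
```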
Hidden Markov Models(cont.)
2. Finding the State Sequence-(cont.)

Choosing the most likely state at each step individually may yield an unlikely (or even impossible) sequence, because it ignores the transition probabilities. To find the single best state sequence, we use the Viterbi algorithm, based on dynamic programming, which takes the transition probabilities into account.

Given the state sequence Q and observation sequence O, we define

δt(i) ≡ maxq1q2∙∙∙qt-1 P(q1q2∙∙∙qt-1, qt = Si, O1∙∙∙Ot | λ)

where δt(i) is the probability of the highest-probability path at time t that accounts for the first t observations and ends in Si.
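A minimal sketch of the Viterbi recursion with back-pointers, once more on a hypothetical λ (the recursion is δt+1(j) = [maxi δt(i) aij] bj(Ot+1)):

```python
import numpy as np

# Hypothetical lambda, as in the earlier sketches
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def viterbi(O, pi, A, B):
    """Most likely state sequence Q* via the delta_t(i) recursion."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                 # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t-1][:, None] * A       # delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)         # best predecessor of each j
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = [int(delta[-1].argmax())]              # end in the best final state
    for t in range(T - 1, 0, -1):
        q.append(int(psi[t][q[-1]]))           # follow the back-pointers
    return q[::-1]

print(viterbi([0, 1, 1], pi, A, B))   # -> [0, 1, 1]
```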
Hidden Markov Models(cont.)
3. Learning Model Parameters
We want to calculate the λ* that maximizes the likelihood of the sample X, i.e., P(X | λ).

For this, we define ξt(i, j) as the probability of being in Si at time t and in Sj at time t+1, given the whole observation sequence O and λ:

ξt(i, j) ≡ P(qt = Si, qt+1 = Sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / P(O | λ)
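The ξt(i, j) computation reuses the forward and backward variables; here is a sketch on a hypothetical two-state λ:

```python
import numpy as np

# Hypothetical lambda, as in the earlier sketches
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def xi(O, pi, A, B):
    """xi_t(i,j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O | lambda)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]                     # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):                 # backward pass
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    P_O = alpha[-1].sum()                          # P(O | lambda)
    x = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        x[t] = alpha[t][:, None] * A * (B[:, O[t+1]] * beta[t+1])[None, :] / P_O
    return x

x = xi([0, 1], pi, A, B)
print(x[0].sum())   # each xi_t sums to 1 over all (i, j) pairs
```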
Hidden Markov Models(cont.)
Continuous Observations-
So far we have assumed discrete observations, modeled as a multinomial:

P(Ot | qt = Sj, λ) = ∏m=1..M bj(m)^(rtm), where rtm = 1 if Ot = vm and 0 otherwise

For continuous observations, one option is to discretize by vector quantization; the k-means used for vector quantization is the hard version of a Gaussian mixture model:

p(Ot | qt = Sj, λ) = Σl=1..L P(Gl | qt = Sj) p(Ot | qt = Sj, Gl, λ)

For a scalar continuous observation, the easiest is to assume a normal distribution:

p(Ot | qt = Sj, λ) ~ N(μj, σj²)
Hidden Markov Models(cont.)
Model Selection in HMM-

[Figure: example of a left-right HMM.]

In classification, we estimate P(O | λi) with a separate HMM for each class and use Bayes' rule to obtain the posterior:

P(λi | O) = P(O | λi) P(λi) / Σj P(O | λj) P(λj)
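The classification rule can be sketched with hypothetical per-class likelihoods and priors (the likelihoods would come from running the forward algorithm on each class's HMM):

```python
import numpy as np

# Hypothetical per-class likelihoods P(O | lambda_i) and priors P(lambda_i)
likelihoods = np.array([2.1e-5, 7.4e-6, 1.3e-5])
priors = np.array([0.5, 0.3, 0.2])

# Bayes' rule: P(lambda_i | O) = P(O | lambda_i) P(lambda_i)
#                                / sum_j P(O | lambda_j) P(lambda_j)
posterior = likelihoods * priors
posterior /= posterior.sum()
print(posterior.argmax())   # classify O with the most probable model
```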
END OF MODULE-5 (Chapter 15)
