HMM Bioinformatics
CSCE 771 Spring 2011
Markov Chain
▪ A Markov chain is a model that tells us something about the probabilities of sequences of random variables (states), each of which can take on values from some set
▪ A Markov chain makes a very strong assumption: to predict the future of the sequence, all that matters is the current state
▪ The states before the current state have no impact on the future except via the current state
▪ It’s as if to predict tomorrow’s weather you could examine today’s weather, but you weren’t allowed to look at yesterday’s weather
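As a tiny illustration of the Markov assumption, the sampler below predicts tomorrow's weather from today's state alone. This is only a sketch: the hot-row probabilities match the weather chain used later in these notes, while the cold row is made up for the example.

import random

# Hypothetical transition probabilities P(next | current).
# hot->hot = 0.5 matches the chain used later; the cold row is illustrative only.
TRANSITIONS = {
    "hot":  {"hot": 0.5, "cold": 0.5},
    "cold": {"hot": 0.4, "cold": 0.6},
}

def next_state(current):
    """Sample tomorrow's weather using only today's state (Markov assumption)."""
    states, probs = zip(*TRANSITIONS[current].items())
    return random.choices(states, weights=probs)[0]

state = "hot"
for _ in range(5):
    state = next_state(state)
    print(state)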
Markov Chain: Weather Example
Markov Chain: “First-order observable Markov Model”
Hidden Markov Models
States Q = q1, q2, …, qN
Observations O = o1, o2, …, oT
■ Each observation is a symbol from a vocabulary V = {v1, v2, …, vV}
Transition probabilities
■ Transition probability matrix A = {aij}
Observation likelihoods
■ Output probability matrix B = {bi(k)}
Special initial probability vector π
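A minimal sketch of these components in Python, instantiated with the fair/biased coin example that appears later in these notes:

states = ["F", "B"]                  # Q: hidden states (fair / biased coin)
vocab  = ["H", "T"]                  # V: observation symbols

A  = {"F": {"F": 0.8, "B": 0.2},     # transition probability matrix {a_ij}
      "B": {"F": 0.2, "B": 0.8}}
B  = {"F": {"H": 0.5, "T": 0.5},     # observation likelihoods b_i(k)
      "B": {"H": 0.9, "T": 0.1}}
pi = {"F": 0.5, "B": 0.5}            # special initial probability vector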
A Markov model
The weather model
Practice Example
What is the probability of 4 consecutive hot days?
Sequence is: hot-hot-hot-hot
Markov Chain for Weather
What is the probability of 4 consecutive hot days?
Sequence is hot-hot-hot-hot
I.e., state sequence is 1-1-1-1
P(1,1,1,1) =
■ π1 a11 a11 a11 = 0.5 x (0.5)^3 = 0.0625
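The computation generalizes to any state sequence: multiply the initial probability by one transition probability per step. A small sketch, where the hot row uses the slide's values (π1 = a11 = 0.5) and the cold row is assumed only for completeness:

def chain_probability(states_seq, pi, A):
    """P(q1..qn) = pi[q1] * product of A[q_{t-1}][q_t] for a Markov chain."""
    p = pi[states_seq[0]]
    for prev, cur in zip(states_seq, states_seq[1:]):
        p *= A[prev][cur]
    return p

pi = {"hot": 0.5, "cold": 0.5}
A  = {"hot":  {"hot": 0.5, "cold": 0.5},
      "cold": {"hot": 0.5, "cold": 0.5}}   # cold row assumed, not from the slide
print(chain_probability(["hot"] * 4, pi, A))   # 0.5 * 0.5**3 = 0.0625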
Markov Model
A Markov model consists of three things:
A set of states A, C, G, T
Transition probabilities between states
Probabilities for the starting states
[Figure: two-state diagram with states X and Y; transition probabilities 0.2, 0.4, 0.7, and 0.1; Prob(starting in X) = 1/4, Prob(starting in Y) = 1/5]
Markov Model
A Markov model consists of three things:
A set of states A, C, G, T
Transition probabilities between states
Probabilities for the starting states
Prob(starting in A)=1/4
Prob(starting in T)=1/4
Prob(starting in G)=0
Prob(starting in C)=1/2
Markov Model
A Markov model consists of three things:
A set of states A, C, G, T
Transition probabilities between states
Probabilities for the starting states
Prob(starting in A) = 0.25
Prob(starting in T) = 0.25
Prob(starting in G) = 0
Prob(starting in C) = 0.5

Transition probabilities (row = current base, column = next base; the G and T rows did not survive in these notes):

       A      C      G      T
A     0.4    0.2    0.2    0.2
C     0.25   0.25   0.25   0.25
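A short sampling sketch in Python for this DNA model. The starting probabilities and the A and C rows come from the slide; the G and T rows are uniform placeholders, assumed only so the example runs.

import random

pi = {"A": 0.25, "C": 0.5, "G": 0.0, "T": 0.25}   # starting probabilities from the slide

A = {
    "A": {"A": 0.4,  "C": 0.2,  "G": 0.2,  "T": 0.2},   # from the slide
    "C": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # from the slide
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # placeholder row (not in slide)
    "T": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # placeholder row (not in slide)
}

def sample_sequence(n):
    """Pick a start base from pi, then walk the transition matrix."""
    seq = random.choices(list(pi), weights=pi.values())
    while len(seq) < n:
        row = A[seq[-1]]
        seq += random.choices(list(row), weights=row.values())
    return "".join(seq)

print(sample_sequence(10))   # e.g. 'CACGTAGCTT'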
Hidden Markov Model
An HMM is a Markov model that emits symbols at each state, with different emission probabilities per state
You are at a casino and the dealer has two coins
one coin is fair: 50% Heads and 50% Tails
one coin is biased: 90% Heads and 10% Tails
Goal: for each coin toss, guess which coin was used
[Figure: two-state HMM. State F (fair): Emit H 0.5, T 0.5. State B (biased): Emit H 0.9, T 0.1. Self-transitions P(F→F) = P(B→B) = 0.8; switches P(F→B) = P(B→F) = 0.2]
State: F F B B B B F F
Emissions: T H H H T H T H
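To make the setup concrete, here is a small simulation sketch in Python, with parameters taken from the figure above (the 0.5/0.5 start probabilities are the ones used in the computations later):

import random

pi = {"F": 0.5, "B": 0.5}
A  = {"F": {"F": 0.8, "B": 0.2}, "B": {"F": 0.2, "B": 0.8}}
B  = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.9, "T": 0.1}}

def deal(n):
    """Simulate n tosses; return the hidden states and the visible emissions."""
    states, emissions = [], []
    s = random.choices(list(pi), weights=pi.values())[0]
    for _ in range(n):
        states.append(s)
        emissions.append(random.choices(list(B[s]), weights=B[s].values())[0])
        s = random.choices(list(A[s]), weights=A[s].values())[0]
    return states, emissions

states, emissions = deal(8)
print("hidden: ", " ".join(states))     # e.g. F F B B B B F F
print("visible:", " ".join(emissions))  # e.g. T H H H T H T H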
Hidden Markov Model
The player only sees the emissions
States are hidden
State: F F B B B B F F
Emissions: T H H H T H T H
Hidden Markov Model
What would be your guess for the states, given the following emissions?
Emissions: T H H H T H T H H H H T T H H H H
States:
Hidden Markov Model: Forward Algorithm
It is easy for the dealer to compute the probability because they know both the
states and emissions
State: F F B B B
Emissions: T H H H H
P(start in F) · P(emit T in F) · P(F→F) · P(emit H in F) · P(F→B) · P(emit H in B) · P(B→B) · P(emit H in B) · P(B→B) · P(emit H in B)
= 0.5 x 0.5 x 0.8 x 0.5 x 0.2 x 0.9 x 0.8 x 0.9 x 0.8 x 0.9 ≈ 0.00933
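The same arithmetic as a small Python function: given the state path, the joint probability is a running product of start, emission, and transition terms (parameters from the coin HMM above):

def joint_probability(states, emissions, pi, A, B):
    """P(states, emissions) when the state path is known (the dealer's view)."""
    p = pi[states[0]] * B[states[0]][emissions[0]]
    for t in range(1, len(states)):
        p *= A[states[t-1]][states[t]] * B[states[t]][emissions[t]]
    return p

pi = {"F": 0.5, "B": 0.5}
A  = {"F": {"F": 0.8, "B": 0.2}, "B": {"F": 0.2, "B": 0.8}}
B  = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.9, "T": 0.1}}
print(joint_probability("FFBBB", "THHHH", pi, A, B))   # ≈ 0.00933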
Hidden Markov Model: Forward Algorithm
But it is not easy for the player, who does not know the states
With five observations, there are 2^5 = 32 possible state combinations
Case 1: F F F F F
P(start in F) · P(emit T in F) · P(F→F) · P(emit H in F) · P(F→F) · P(emit H in F) · P(F→F) · P(emit H in F) · P(F→F) · P(emit H in F)
= 0.5 x 0.5 x 0.8 x 0.5 x 0.8 x 0.5 x 0.8 x 0.5 x 0.8 x 0.5 = 0.0064
Hidden Markov Model: Forward Algorithm
But it is not easy for the player, who does not know the states
With five observations, there are 2^5 = 32 possible state combinations
Case 2: F F F F B
P(start in F) · P(emit T in F) · P(F→F) · P(emit H in F) · P(F→F) · P(emit H in F) · P(F→F) · P(emit H in F) · P(F→B) · P(emit H in B)
= 0.5 x 0.5 x 0.8 x 0.5 x 0.8 x 0.5 x 0.8 x 0.5 x 0.2 x 0.9 ≈ 0.0029
Hidden Markov Model: Forward Algorithm
But it is not easy for the player, who does not know the states
With five observations, there are 2^5 = 32 possible state combinations
Case 3: F F F B F
P(start in F) · P(emit T in F) · P(F→F) · P(emit H in F) · P(F→F) · P(emit H in F) · P(F→B) · P(emit H in B) · P(B→F) · P(emit H in F)
= 0.5 x 0.5 x 0.8 x 0.5 x 0.8 x 0.5 x 0.2 x 0.9 x 0.2 x 0.5 ≈ 0.0007
Hidden Markov Model: Forward Algorithm
But it is not easy for the player, who does not know the states
With five observations, there are 2^5 = 32 possible state combinations
Case 32 (the last): B B B B B
P(start in B) · P(emit T in B) · P(B→B) · P(emit H in B) · P(B→B) · P(emit H in B) · P(B→B) · P(emit H in B) · P(B→B) · P(emit H in B)
= 0.5 x 0.1 x 0.8 x 0.9 x 0.8 x 0.9 x 0.8 x 0.9 x 0.8 x 0.9 ≈ 0.0134
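Summing the joint probability over all 2^5 = 32 cases gives P(emissions); the forward algorithm computes the same sum without enumerating every path. A minimal sketch, using the coin parameters from above:

from itertools import product

pi = {"F": 0.5, "B": 0.5}
A  = {"F": {"F": 0.8, "B": 0.2}, "B": {"F": 0.2, "B": 0.8}}
B  = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.9, "T": 0.1}}
obs = "THHHH"

def joint(path):
    """P(path, obs): the case-by-case product from the slides above."""
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t-1]][path[t]] * B[path[t]][obs[t]]
    return p

# Brute force: enumerate all 32 state sequences and sum their joint probabilities.
brute = sum(joint(path) for path in product("FB", repeat=len(obs)))

# Forward algorithm: the same sum computed left to right, one observation at a time.
alpha = {s: pi[s] * B[s][obs[0]] for s in "FB"}
for o in obs[1:]:
    alpha = {s: sum(alpha[r] * A[r][s] for r in "FB") * B[s][o] for s in "FB"}

print(brute, sum(alpha.values()))   # identical values: P(THHHH)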
Viterbi Algorithm
Find the most likely state sequence
Scenario: The Occasionally
Dishonest Casino Problem
• A casino uses a fair die most of the time, but
occasionally switches to a loaded one
– Fair die: Prob(1) = Prob(2) = . . . = Prob(6) = 1/6
– Loaded die: Prob(1) = Prob(2) = . . . = Prob(5) = 1/10, Prob(6) = 1/2
– These are the emission probabilities
• Transition probabilities
– Prob(Fair → Loaded) = 0.01
– Prob(Loaded → Fair) = 0.2
– Transitions between states obey a Markov process
An HMM for the Occasionally Dishonest Casino

[Figure: two-state HMM with transition probabilities a_kl]

           Fair    Loaded
emit 1:    1/6     1/10
emit 2:    1/6     1/10
emit 3:    1/6     1/10
emit 4:    1/6     1/10
emit 5:    1/6     1/10
emit 6:    1/6     1/2

Fair → Fair: 0.99      Fair → Loaded: 0.01
Loaded → Loaded: 0.8   Loaded → Fair: 0.2
The Viterbi Algorithm
Outcome = 6, 2, 6
The start probability is 0.5 for each state

Viterbi recursion: v_k(i) = e_k(x_i) · max_r [ v_r(i−1) · a_rk ]

          6                 2                                6
Fair      (1/6) x (1/2)     (1/6) x max{(1/12) x 0.99,       (1/6) x max{0.01375 x 0.99,
          = 1/12            (1/4) x 0.2}                     0.02 x 0.2}
                            = 0.01375                        = 0.00226875
Loaded    (1/2) x (1/2)     (1/10) x max{(1/12) x 0.01,      (1/2) x max{0.01375 x 0.01,
          = 1/4             (1/4) x 0.8}                     0.02 x 0.8}
                            = 0.02                           = 0.008

Tracing the maxima back from the larger final value (0.008), the most likely state path is Loaded-Loaded-Loaded.
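The same computation in code: a minimal dict-based Viterbi sketch in Python for the dice HMM above. It follows the recursion v_k(i) = e_k(x_i) · max_r v_r(i−1) · a_rk, reproduces the table entries for the outcome 6, 2, 6, and traces back the most likely path.

pi = {"Fair": 0.5, "Loaded": 0.5}
A  = {"Fair":   {"Fair": 0.99, "Loaded": 0.01},
      "Loaded": {"Fair": 0.2,  "Loaded": 0.8}}
E  = {"Fair":   {r: 1/6 for r in range(1, 7)},               # fair die
      "Loaded": {**{r: 1/10 for r in range(1, 6)}, 6: 1/2}}  # loaded die

def viterbi(obs):
    """Return the most likely state path and the final Viterbi values."""
    v = {k: pi[k] * E[k][obs[0]] for k in A}      # initialization
    back = []                                     # back-pointers, one dict per step
    for x in obs[1:]:
        prev = {k: max(A, key=lambda r: v[r] * A[r][k]) for k in A}
        v = {k: E[k][x] * v[prev[k]] * A[prev[k]][k] for k in A}
        back.append(prev)
    path = [max(v, key=v.get)]                    # best final state
    for prev in reversed(back):                   # trace back through the pointers
        path.append(prev[path[-1]])
    return list(reversed(path)), v

path, v = viterbi([6, 2, 6])
print(path)  # ['Loaded', 'Loaded', 'Loaded']
print(v)     # Fair ≈ 0.00226875, Loaded ≈ 0.008, matching the table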