
HMMs and applications

Notes from Dr. Takis Benos and the DEKM book (Durbin, Eddy, Krogh & Mitchison)
Markov chains
•  What is a Markov chain?

A Markov chain of order n is a stochastic process generating a series of
outcomes, in which the probability of each outcome depends on the previous n
outcomes.


Markov chains (cntd)

•  Markov chain (of first order) and the chain rule

   P(x) = P(X_L, X_{L−1}, …, X_1)
        = P(X_L | X_{L−1}, …, X_1) · P(X_{L−1}, X_{L−2}, …, X_1)
        = P(X_L | X_{L−1}, …, X_1) · P(X_{L−1} | X_{L−2}, …, X_1) · … · P(X_1)
        = P(X_L | X_{L−1}) · P(X_{L−1} | X_{L−2}) · … · P(X_2 | X_1) · P(X_1)
        = P(X_1) · ∏_{i=2}^{L} P(X_i | X_{i−1})

   Chain rule: P(A,B,C) = P(C|A,B) · P(B|A) · P(A)
Application of Markov chains:
CpG islands

l  CG is relatively rare in the genome due to high mutation of methyl-


CG to methyl-TG (or CA)
l  Methylated CpG residues are often associated with house-keeping
genes in the promoter and exon regions.
l  Methyl-CpG binding proteins recruit histone deacetylases and are
thus responsible for transcriptional repression.
l  They have roles in gene silencing, genomic imprinting, and X-
chromosome inactivation.

Benos 02-710/MSCBIO2070 1-3.Apr.2013 7


CpG islands and DNA Methylation
•  DNA methylation is largely confined to CpG dinucleotides
•  CpG islands: regions of more than 500 bp with CG content > 55%
•  Written "CpG" to avoid confusion with a C·G base pair
•  Methylation is often suppressed around genes and promoters
•  Can we predict CpG islands? A good way of identifying potential gene
   regions as well! But not so fast!!
Application of Markov chains:
CpG islands
•  Problem:
Given two sets of sequences from the human genome,
one with CpG islands and one without, can we calculate a
model that can predict the CpG islands?

Sequence: s = t→t→a→c→g→g→t

0th-order:  P0(s) = p(t)·p(t)·p(a)·p(c)·p(g)·… = ∏_{i=1}^{N} p(s_i)

1st-order:  P1(s) = p(t)·p(t|t)·p(a|t)·p(c|a)·… = p(s_1) · ∏_{i=2}^{N} p(s_i | s_{i−1})

2nd-order:  P2(s) = p(tt)·p(a|tt)·p(c|ta)·p(g|ac)·… = p(s_1 s_2) · ∏_{i=3}^{N} p(s_i | s_{i−2} s_{i−1})
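As a concrete illustration of the formulas above, here is a minimal Python sketch that scores a short DNA sequence under the 0th- and 1st-order chains, working in log space. The probability tables p0 and p1 are uniform placeholders, an assumption for illustration rather than values taken from these slides.

```python
# Sketch: scoring a DNA sequence under 0th- and 1st-order Markov chains.
# The tables below are uniform placeholders, not estimates from real data.
import math

p0 = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}        # 0th order: p(b)
p1 = {s: {t: 0.25 for t in "acgt"} for s in "acgt"}       # 1st order: p(t | s)

def log_p0(seq):
    """log P0(s) = sum_i log p(s_i)"""
    return sum(math.log(p0[b]) for b in seq)

def log_p1(seq):
    """log P1(s) = log p(s_1) + sum_{i>=2} log p(s_i | s_{i-1})"""
    return math.log(p0[seq[0]]) + sum(
        math.log(p1[s][t]) for s, t in zip(seq, seq[1:]))

print(log_p0("ttacggt"), log_p1("ttacggt"))
```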
Application of Markov chains:
CpG islands (cntd)

Training set:
•  a set of DNA sequences with known CpG islands

Derive two Markov chain models:
•  '+' model: from the CpG islands
•  '−' model: from the remainder of the sequence

[State diagram: four states A, C, G, T with a transition a_st for every ordered pair of letters, e.g. a_AT, a_AC, a_GT, a_GC, …]

Transition probabilities for each model:
•  a state for each of the four letters A, C, G, and T in the DNA alphabet
•  a+_st: probability that residue t follows residue s inside CpG islands,
   estimated from counts:

      a+_st = c+_st / Σ_t' c+_st'

   where c+_st is the number of times letter t followed letter s in the CpG islands

To use these models for discrimination, calculate the log-odds ratio:

      S(x) = log [ P(x | model +) / P(x | model −) ] = Σ_{i=1}^{L} log ( a+_{x_{i−1} x_i} / a−_{x_{i−1} x_i} )

Transition probabilities of the '+' model:

   +     A      C      G      T
   A   .180   .274   .426   .120
   C   .171   .368   .274   .188
   G   .161   .339   .375   .125
   T   .079   .355   .384   .182
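A small sketch of the log-odds discrimination S(x), using the '+' transition table above together with the '−' table that appears later in these slides; the test sequences at the bottom are made up for illustration.

```python
# Sketch: log-odds score S(x) of a sequence under the '+' vs. '-' chains.
import math

BASES = "ACGT"
plus = {                     # a+_st, rows s, columns in the order A, C, G, T
    "A": [0.180, 0.274, 0.426, 0.120],
    "C": [0.171, 0.368, 0.274, 0.188],
    "G": [0.161, 0.339, 0.375, 0.125],
    "T": [0.079, 0.355, 0.384, 0.182],
}
minus = {                    # a-_st
    "A": [0.300, 0.205, 0.285, 0.210],
    "C": [0.322, 0.298, 0.078, 0.302],
    "G": [0.248, 0.246, 0.298, 0.208],
    "T": [0.177, 0.239, 0.292, 0.292],
}

def log_odds(x):
    """S(x) = sum_i log2( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )."""
    return sum(math.log2(plus[s][BASES.index(t)] / minus[s][BASES.index(t)])
               for s, t in zip(x, x[1:]))

print(log_odds("CGCGCGCG"))   # positive: looks like a CpG island
print(log_odds("TATATATA"))   # negative: looks like background sequence
```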
Application of Markov chains:
CpG islands (cntd)

[Table of log-odds ratios log2( P(t | s, +) / P(t | s, −) ) for each pair of residues s, t]

   log2 [ P(x | +) / P(x | −) ] = Σ_{i=1}^{L} log2 [ P(x_{i+1} | x_i, +) / P(x_{i+1} | x_i, −) ]


Histogram of log-odds scores

[Histogram of the log-odds scores S(x) for CpG-island sequences vs. other sequences]
Q1: Given a short sequence x, does it come from a CpG island? (a Yes/No question)
•  Evaluate S(x)
Q2: Given a long sequence x, how do we find the CpG islands in it? (a Where question)
•  Calculate the log-odds score for a window of, say, 100 nucleotides around
   every nucleotide, plot it, and predict CpG islands as regions with positive values
•  Drawback: how do we choose the window size?
HMM: A parse of a sequence
Given a sequence x = x1……xL and an HMM with K states,
a parse of x is a sequence of states π = π1, ……, πL

[Trellis diagram: one column of the K states for each position x1, x2, x3, …, xL]


Hidden Markov Models (HMMs)
•  What is an HMM?
   A Markov process in which the probability of an outcome also depends on a
   (hidden) random variable (the state).
•  Memory-less: future states are affected only by the current state
•  We need:
   ✓  Ω : alphabet of symbols (outcomes)
   ✓  S : set of (hidden) states, each of which emits symbols
   ✓  A = (a_kl) : matrix of state transition probabilities
   ✓  E = (e_k(b)) = (P(x_i = b | π_i = k)) : matrix of emission probabilities


Example: the dishonest casino

[Diagram: Fair and Loaded die states, with self-transitions 0.95 and 0.9, cross-transitions 0.05 and 0.1, and the emission probabilities listed below]

✓  Ω = {1, 2, 3, 4, 5, 6}
✓  S = {F, L}  (Fair, Loaded)
✓  A : a_FF = 0.95, a_LL = 0.9, a_FL = 0.05, a_LF = 0.1
✓  E : e_F(b) = 1/6 (∀ b ∈ Ω),
       e_L(6) = 1/2, e_L(b) = 1/10 (if b ≠ 6)
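The casino HMM written out as plain Python data, plus a tiny sampler. Only the parameters come from this slide; the sampler itself (including its fixed Fair starting state) is an illustrative assumption.

```python
# Sketch: the dishonest-casino HMM as data, with a small roll generator.
import random

states = ["F", "L"]                                  # Fair, Loaded
trans = {"F": {"F": 0.95, "L": 0.05},                # a_kl
         "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {o: 1 / 6 for o in "123456"},           # e_k(b)
        "L": {**{o: 1 / 10 for o in "12345"}, "6": 1 / 2}}

def sample(n, start="F"):
    """Generate n rolls and the hidden state path (assumed start state)."""
    k, rolls, path = start, [], []
    for _ in range(n):
        rolls.append(random.choices(list(emit[k]),
                                    weights=list(emit[k].values()))[0])
        path.append(k)
        k = random.choices(states, weights=[trans[k][s] for s in states])[0]
    return "".join(rolls), "".join(path)

print(sample(30))
```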


Three main questions on HMMs
1.  Evaluation problem
    GIVEN  HMM M, sequence x
    FIND   P(x | M)
    ALGOR. Forward, O(TN²)

2.  Decoding problem
    GIVEN  HMM M, sequence x
    FIND   the sequence π of states that maximizes P(π | x, M)
    ALGOR. Viterbi, Forward-Backward, O(TN²)

3.  Learning problem
    GIVEN  HMM M with unknown probability parameters, sequence x
    FIND   parameters θ = (π, e_ij, a_kl) that maximize P(x | θ, M)
    ALGOR. Maximum likelihood (ML), Baum-Welch (EM), O(TN²)


Problem 1: Evaluation
Find the likelihood that a given sequence is
generated by a particular model

E.g. Given the following sequence, is it more likely that it comes from
a Loaded or a Fair die?

123412316261636461623411221341



Problem 1: Evaluation (cntd)
123412316261636461623411221341

P(Data | F1…F30) = ∏_{i=1}^{30} a_F,F · e_F(b_i)
                 = 0.95^29 · (1/6)^30 = 0.226 · 4.52×10^−24
                 = 1.02×10^−24

P(Data | L1…L30) = ∏_{i=1}^{30} a_L,L · e_L(b_i)
                 = (1/2)^6 · (1/10)^24 · 0.90^29 = 1.56×10^−26 · 0.047
                 = 7.36×10^−28

What happens in a sliding window?
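A quick numeric check of the two single-path likelihoods above, keeping the slide's simplification of 29 transition factors and no explicit initial-state term.

```python
# Sketch: likelihood of the 30 rolls under an all-Fair vs. an all-Loaded path.
rolls = "123412316261636461623411221341"
n_sixes = rolls.count("6")                      # 6 sixes, 24 other faces

p_fair = 0.95 ** 29 * (1 / 6) ** 30
p_loaded = 0.9 ** 29 * (1 / 2) ** n_sixes * (1 / 10) ** (30 - n_sixes)

print(f"{p_fair:.3g}")      # ~1.02e-24
print(f"{p_loaded:.3g}")    # ~7.36e-28
```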
Three main questions on HMMs
✓  Evaluation problem
    GIVEN  HMM M, sequence x
    FIND   P(x | M)
    ALGOR. Forward

2.  Decoding problem
    GIVEN  HMM M, sequence x
    FIND   the sequence π of states that maximizes P(π | x, M)
    ALGOR. Viterbi, Forward-Backward, O(TN²)

3.  Learning problem
    GIVEN  HMM M with unknown probability parameters, sequence x
    FIND   parameters θ = (π, e_ij, a_kl) that maximize P(x | θ, M)
    ALGOR. Maximum likelihood (ML), Baum-Welch (EM), O(TN²)


Problem 2: Decoding
Given a point xi in a sequence, find its most
probable state

E.g. Given the following sequence, is it more likely that the 3rd
observed 6 comes from a Loaded or a Fair die?

123412316261636461623411221341


The Forward Algorithm - derivation
•  In order to calculate P(x), the probability of the sequence x given the
   HMM, we need to sum over all possible state paths π that could have
   generated x:

      P(x) = Σ_π P(x, π) = Σ_π P(x | π) · P(π)

•  To avoid summing over an exponential number of paths π,
   we first define the forward probability:

      f_k(i) = P(x_1 … x_i, π_i = k)
The Forward Algorithm – derivation
(cntd)
•  Then, we need to write f_k(i) as a function of the
   previous state, f_l(i−1):

   f_k(i) = P(x_1, …, x_{i−1}, x_i, π_i = k)

          = Σ_{π_1,…,π_{i−1}} P(x_1, …, x_{i−1}, π_1, …, π_{i−1}, π_i = k) · e_k(x_i)

          = Σ_l ( Σ_{π_1,…,π_{i−2}} P(x_1, …, x_{i−1}, π_1, …, π_{i−2}, π_{i−1} = l) · a_{l,k} ) · e_k(x_i)

          = Σ_l P(x_1, …, x_{i−1}, π_{i−1} = l) · a_{l,k} · e_k(x_i)

          = e_k(x_i) · Σ_l f_l(i−1) · a_{l,k}

   Chain rule: P(A,B,C) = P(C|A,B) · P(B|A) · P(A)
The Forward Algorithm
We can compute f_k(i) for all k, i, using dynamic programming

Initialization:   f_0(0) = 1
                  f_k(0) = 0, ∀ k > 0

Iteration:        f_k(i) = e_k(x_i) · Σ_l f_l(i−1) · a_{l,k}

Termination:      P(x) = Σ_k f_k(N) · a_{k,0}
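A compact sketch of the forward recursion for the casino HMM defined earlier. One simplifying assumption: the silent begin/end state of the slides (a_{0,k}, a_{k,0}) is replaced by a 0.5/0.5 initial distribution and a termination factor of 1.

```python
# Sketch: forward algorithm, f_k(i) = e_k(x_i) * sum_l f_l(i-1) * a_{l,k}.
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {o: 1 / 6 for o in "123456"},
        "L": {**{o: 1 / 10 for o in "12345"}, "6": 1 / 2}}
init = {"F": 0.5, "L": 0.5}                    # assumed in place of a_{0,k}

def forward(x):
    """Return the forward table f[i][k] and P(x)."""
    f = [{k: init[k] * emit[k][x[0]] for k in trans}]          # f_k(1)
    for xi in x[1:]:
        prev = f[-1]
        f.append({k: emit[k][xi] * sum(prev[l] * trans[l][k] for l in trans)
                  for k in trans})
    px = sum(f[-1].values())                   # termination, a_{k,0} taken as 1
    return f, px

f, px = forward("123412316261636461623411221341")
print(px)
```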
The Backward Algorithm
l  Forward algorithm determines the most likely state k
at position i, using the previous observations.
123412316261636461623411221341

l  What if we started from the end?

Benos 02-710/MSCBIO2070 1-3.Apr.2013 24


The Backward Algorithm – derivation
•  We define the backward probability:

   b_k(i) = P(x_{i+1}, …, x_N | π_i = k)

          = Σ_{π_{i+1},…,π_N} P(x_{i+1}, …, x_N, π_{i+1}, …, π_N | π_i = k)

          = Σ_l Σ_{π_{i+1},…,π_N} P(x_{i+1}, …, x_N, π_{i+1} = l, π_{i+2}, …, π_N | π_i = k)

          = Σ_l e_l(x_{i+1}) · a_{k,l} · Σ_{π_{i+2},…,π_N} P(x_{i+2}, …, x_N, π_{i+2}, …, π_N | π_{i+1} = l)

          = Σ_l b_l(i+1) · a_{k,l} · e_l(x_{i+1})

   Chain rule: P(A,B,C) = P(C|A,B) · P(B|A) · P(A)
The Backward Algorithm
We can compute b_k(i) for all k, i, using dynamic programming

Initialization:   b_k(N) = a_{k,0}, ∀ k

Iteration:        b_k(i) = Σ_l e_l(x_{i+1}) · b_l(i+1) · a_{k,l}

Termination:      P(x) = Σ_k b_k(1) · a_{0,k} · e_k(x_1)
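A matching sketch of the backward recursion, under the same simplified begin/end assumptions as the forward sketch above; the two should return the same P(x).

```python
# Sketch: backward algorithm, b_k(i) = sum_l a_{k,l} * e_l(x_{i+1}) * b_l(i+1).
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {o: 1 / 6 for o in "123456"},
        "L": {**{o: 1 / 10 for o in "12345"}, "6": 1 / 2}}
init = {"F": 0.5, "L": 0.5}

def backward(x):
    """Return the backward table b[i][k] and P(x)."""
    n = len(x)
    b = [dict() for _ in range(n)]
    b[n - 1] = {k: 1.0 for k in trans}         # b_k(N) = a_{k,0}, taken as 1
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l]
                       for l in trans) for k in trans}
    px = sum(init[k] * emit[k][x[0]] * b[0][k] for k in trans)   # termination
    return b, px

b, px = backward("123412316261636461623411221341")
print(px)                                      # should equal the forward P(x)
```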


Posterior probabilities of the
dishonest casino data



Posterior Decoding

[Trellis diagram: states × positions x1 … xN, with the posterior probability attached to each cell]

•  P(π_i = k | x) = f_k(i) · b_k(i) / P(x)

•  Posterior decoding picks, at each position, the state that best explains the
   data at that position.

•  For each emitted symbol x_i, it finds the most likely state that could have
   produced it, based on the forward and backward probabilities.
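A short sketch of posterior decoding that combines the tables produced by the forward and backward sketches above into P(π_i = k | x) = f_k(i)·b_k(i)/P(x) and picks the best state at each position.

```python
# Sketch: posterior decoding from precomputed forward/backward tables.
def posterior_decode(f, b, px):
    """Return the posterior table and the per-position most probable states."""
    post = [{k: f[i][k] * b[i][k] / px for k in f[i]} for i in range(len(f))]
    path = [max(p, key=p.get) for p in post]
    return post, path

# Usage, assuming forward() and backward() from the earlier sketches:
# f, px = forward(rolls)
# b, _ = backward(rolls)
# post, path = posterior_decode(f, b, px)
```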
The Viterbi Algorithm – derivation
•  We define:

   V_k(i) = max_{π_1,…,π_{i−1}} P(x_1, …, x_i, π_1, …, π_{i−1}, π_i = k)

•  Then, we need to write V_k(i) as a function of the previous state, V_l(i−1):

   V_k(i) = … = e_k(x_i) · max_l { a_{l,k} · V_l(i−1) }


The Viterbi Algorithm

[Trellis diagram: V_j(i) is filled in for every state j and every position x1, x2, x3, …, xN]

Similar to aligning a set of states to a sequence

Dynamic programming!

Viterbi decoding: traceback
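A sketch of Viterbi decoding with traceback for the casino HMM, worked in log space to avoid underflow; the same simplified initial distribution as in the forward sketch is assumed.

```python
# Sketch: Viterbi decoding, V_k(i) = e_k(x_i) * max_l { a_{l,k} * V_l(i-1) }.
import math

trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {o: 1 / 6 for o in "123456"},
        "L": {**{o: 1 / 10 for o in "12345"}, "6": 1 / 2}}
init = {"F": 0.5, "L": 0.5}

def viterbi(x):
    """Return the most probable state path for the observed rolls x."""
    V = [{k: math.log(init[k]) + math.log(emit[k][x[0]]) for k in trans}]
    ptr = []                                   # back-pointers for the traceback
    for xi in x[1:]:
        row, back = {}, {}
        for k in trans:
            best = max(trans, key=lambda l: V[-1][l] + math.log(trans[l][k]))
            back[k] = best
            row[k] = math.log(emit[k][xi]) + V[-1][best] + math.log(trans[best][k])
        V.append(row)
        ptr.append(back)
    path = [max(V[-1], key=V[-1].get)]         # best final state
    for back in reversed(ptr):                 # follow the pointers backwards
        path.append(back[path[-1]])
    return "".join(reversed(path))

print(viterbi("123412316261636461623411221341"))
```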
Viterbi results



Viterbi, Forward, Backward

VITERBI
  Initialization:  V_0(0) = 1;  V_k(0) = 0, for all k > 0
  Iteration:       V_l(i) = e_l(x_i) · max_k { V_k(i−1) · a_kl }
  Termination:     P(x, π*) = max_k V_k(N)

FORWARD
  Initialization:  f_0(0) = 1;  f_k(0) = 0, for all k > 0
  Iteration:       f_l(i) = e_l(x_i) · Σ_k f_k(i−1) · a_kl
  Termination:     P(x) = Σ_k f_k(N) · a_k0

BACKWARD
  Initialization:  b_k(N) = a_k0, for all k
  Iteration:       b_l(i) = Σ_k a_lk · e_k(x_{i+1}) · b_k(i+1)
  Termination:     P(x) = Σ_k a_0k · e_k(x_1) · b_k(1)

Time: O(K²N)   Space: O(KN)   (for each of the three algorithms)


Three main questions on HMMs
✓  Evaluation problem
    GIVEN  HMM M, sequence x
    FIND   P(x | M)
    ALGOR. Forward

✓  Decoding problem
    GIVEN  HMM M, sequence x
    FIND   the sequence π of states that maximizes P(π | x, M)
    ALGOR. Viterbi, Forward-Backward

3.  Learning problem
    GIVEN  HMM M with unknown probability parameters, sequence x
    FIND   parameters θ = (π, e_ij, a_kl) that maximize P(x | θ, M)
    ALGOR. Maximum likelihood (ML), Baum-Welch (EM)


Problem 3: Learning
Given a model (structure) and data, calculate
the model's parameters

Two scenarios:
•  Labeled data - supervised learning

   12341231 62616364616 23411221341
   Fair     Loaded      Fair

•  Unlabeled data - unsupervised learning

   123412316261636461623411221341


Two learning scenarios - examples
1.  Supervised learning
Examples:
GIVEN: the casino player allows us to observe him one evening, as he
changes dice and produces 10,000 rolls

GIVEN: a genomic region x = x1…x1,000,000 where we have good
(experimental) annotations of the CpG islands

2.  Unsupervised learning
Examples:
GIVEN: 10,000 rolls of the casino player, but we don't see when he
changes dice

GIVEN: a newly sequenced genome; we don't know how frequent the
CpG islands are there, nor do we know their composition

TARGET: update the parameters θ of the model to maximize P(x | θ)


Supervised learning
•  Given x = x1…xN for which the true state path π = π1…πN is known
•  Define:
   A_kl   = # times the state transition k→l occurs in π
   E_k(b) = # times state k in π emits b in x

•  The maximum likelihood parameters θ are:

      a^ML_kl = A_kl / Σ_i A_ki          e^ML_k(b) = E_k(b) / Σ_c E_k(c)

•  Problem: overfitting (when the training set is small for the model)
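A minimal sketch of this supervised ML estimation, with an optional pseudocount (see the overfitting discussion that follows). One counting convention is assumed: only the N−1 transitions inside the labelled path are counted, whereas the slide's overfitting example appears to also include one boundary transition; the emission estimates match the example.

```python
# Sketch: supervised ML estimation of HMM parameters from a labelled path.
# A positive pseudocount also protects against division by zero for states
# that never appear in the path.
def ml_estimate(x, pi, states, alphabet, pseudo=0.0):
    A = {k: {l: pseudo for l in states} for k in states}     # A_kl counts
    E = {k: {b: pseudo for b in alphabet} for k in states}   # E_k(b) counts
    for k, l in zip(pi, pi[1:]):
        A[k][l] += 1
    for k, b in zip(pi, x):
        E[k][b] += 1
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e

# The ten all-Fair rolls of the overfitting example, with a pseudocount of 1:
a, e = ml_estimate("2156123623", "F" * 10, "FL", "123456", pseudo=1)
print(a["F"], e["F"])     # e.g. e_F(2) = 4/16 = 0.25
```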


Overfitting
•  Example
   •  Given 10 casino rolls, we observe
      x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
      π = F, F, F, F, F, F, F, F, F, F
   •  Then:
      aFF = 10/10 = 1.00; aFL = 0/10 = 0
      eF(1) = eF(3) = eF(6) = 2/10 = 0.2;
      eF(2) = 3/10 = 0.3; eF(4) = 0/10 = 0; eF(5) = 1/10 = 0.1

•  Solution: add pseudocounts
   •  Larger pseudocounts ⇒ strong prior belief (a lot of data is needed to change it)
   •  Smaller pseudocounts ⇒ just smoothing (to avoid zero probabilities)


Overfitting
•  Example (with a pseudocount of 1 added to every count)
   •  Given 10 casino rolls, we observe
      x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
      π = F, F, F, F, F, F, F, F, F, F
   •  Then:
      aFF = 11/12 = 0.92; aFL = 1/12 = 0.08
      eF(1) = eF(3) = eF(6) = 3/16 = 0.1875;
      eF(2) = 4/16 = 0.25; eF(4) = 1/16 = 0.0625; eF(5) = 2/16 = 0.125

•  Solution: add pseudocounts
   •  Larger pseudocounts ⇒ strong prior belief (a lot of data is needed to change it)
   •  Smaller pseudocounts ⇒ just smoothing (to avoid zero probabilities)


Unsupervised learning - ML
•  Given x = x1…xN for which the true state path π = π1…πN is unknown

•  EXPECTATION MAXIMIZATION (EM) in a nutshell:
   0.  Initialize the parameters θ of the model M
   1.  Calculate the expected values of A_kl and E_k(b) based on the training
       data and the current parameters
   2.  Update θ according to A_kl and E_k(b), as in supervised learning
   3.  Repeat #1 & #2 until convergence

•  In HMM training, we usually apply a special case of EM, called the
   Baum-Welch algorithm


The Baum-Welch (EM) algorithm
simply put
•  Recurrence:
   1.  Estimate A_kl and E_k(b) from a_kl and e_k(b) over all training
       sequences (E-step)
   2.  Update a_kl and e_k(b) using ML (M-step)
   3.  Repeat steps #1, #2 with the new parameters a_kl and e_k(b)

•  Initialization:
   •  Set A and E to pseudocounts (or priors)

•  Termination: if Δlog-likelihood < threshold or N_times > max_times


The Baum-Welch algorithm
•  Recurrence:
   1.  Calculate the forward/backward probabilities, f_k(i) and b_k(i), for each
       training sequence

   2.  E-step: estimate the expected number of k→l transitions, A_kl:

          A_kl = Σ_i f_k(i) · a_kl · e_l(x_{i+1}) · b_l(i+1) / P(x | θ)

       and the expected number of appearances of symbol b in state k, E_k(b):

          E_k(b) = Σ_{i : x_i = b} f_k(i) · b_k(i) / P(x | θ)

   3.  M-step: estimate the new model parameters a_kl and e_k(b) using ML
       across all training sequences

   4.  Estimate the new model's (log-)likelihood to assess convergence
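A compact, self-contained Baum-Welch sketch for the two-state casino HMM, using the slide parameters as the starting guess. It is illustrative only: a single training sequence, unscaled probabilities (no log or scaling tricks), a fixed initial distribution, and a fixed number of rounds instead of a convergence test.

```python
# Sketch: Baum-Welch (EM) training of a two-state HMM on one sequence.
def forward(x, trans, emit, init):
    f = [{k: init[k] * emit[k][x[0]] for k in trans}]
    for xi in x[1:]:
        f.append({k: emit[k][xi] * sum(f[-1][l] * trans[l][k] for l in trans)
                  for k in trans})
    return f, sum(f[-1].values())

def backward(x, trans, emit):
    b = [{k: 1.0 for k in trans}]
    for xi in reversed(x[1:]):
        b.insert(0, {k: sum(trans[k][l] * emit[l][xi] * b[0][l] for l in trans)
                     for k in trans})
    return b

def baum_welch(x, trans, emit, init, rounds=20):
    for _ in range(rounds):
        f, px = forward(x, trans, emit, init)
        b = backward(x, trans, emit)
        # E-step: expected transition counts A_kl and emission counts E_k(s)
        A = {k: {l: sum(f[i][k] * trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l]
                        for i in range(len(x) - 1)) / px
                 for l in trans} for k in trans}
        E = {k: {s: sum(f[i][k] * b[i][k]
                        for i in range(len(x)) if x[i] == s) / px
                 for s in emit[k]} for k in trans}
        # M-step: renormalize the expected counts into new parameters
        trans = {k: {l: A[k][l] / sum(A[k].values()) for l in trans}
                 for k in trans}
        emit = {k: {s: E[k][s] / sum(E[k].values()) for s in emit[k]}
                for k in trans}
    return trans, emit

trans0 = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit0 = {"F": {o: 1 / 6 for o in "123456"},
         "L": {**{o: 1 / 10 for o in "12345"}, "6": 1 / 2}}
trans, emit = baum_welch("123412316261636461623411221341",
                         trans0, emit0, {"F": 0.5, "L": 0.5})
print(trans)
print(emit)
```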
The Baum-Welch algorithm (cntd)
•  Initialization: pick arbitrary model parameters
•  Set A and E to pseudocounts (or priors)

•  Termination: if Δlog-likelihood < threshold or Ntimes>max_times

The Baum-Welch algorithm:


-  is monotone

-  guarantees convergence

-  is a special case of EM

-  has many local optima



An example of Baum-Welch
(thanks to Sarah Wheelan, JHU)

•  I observe the dog across the street. Sometimes he is inside, sometimes
   outside.

•  I assume that, since he cannot open the door himself, there is another
   factor, hidden from me, that determines his behavior.

•  Since I am lazy, I will guess there are only two hidden states, S1 and S2.


An example of Baum-Welch (cntd)
•  One set of observations:
   •  I-I-I-I-I-O-O-I-I-I
•  Guessing two hidden states. I need to invent a
   transition and emission matrix.
•  Note: since Baum-Welch is an EM algorithm, the better my initial guesses
   are, the better the job I will do in estimating the true parameters

   Transitions (Day k → Day k+1):        Emissions:
             S1    S2                            IN    OUT
       S1   0.5   0.5                      S1   0.2    0.8
       S2   0.4   0.6                      S2   0.9    0.1


An example of Baum-Welch (cntd)
•  Let's assume initial values of:
   •  P(S1) = 0.3, P(S2) = 0.7

•  Example guess: if the initial I-I came from S1-S2, then the probability is:
   0.3 × 0.2 × 0.5 × 0.9 = 0.027

   Transitions (Day k → Day k+1):        Emissions:
             S1    S2                            IN    OUT
       S1   0.5   0.5                      S1   0.2    0.8
       S2   0.4   0.6                      S2   0.9    0.1


An example of Baum-Welch (cntd)
•  Now, let's estimate the transition matrix. The sequence I-I-I-I-I-O-O-I-I-I
   has the following events:
   •  II, II, II, II, IO, OO, OI, II, II
•  So, our estimate for the S1→S2 transition probability is:
   •  0.285/2.4474 = 0.116
•  Similarly, calculate the other three transition probabilities and
   normalize so they sum up to 1
•  Update the transition matrix

   Pair    P(pair via S1S2)   Best P(pair)   Best path
   II      0.027              0.3403         S2S2
   II      0.027              0.3403         S2S2
   II      0.027              0.3403         S2S2
   II      0.027              0.3403         S2S2
   IO      0.003              0.2016         S2S1
   OO      0.012              0.0960         S1S1
   OI      0.108              0.1080         S1S2
   II      0.027              0.3403         S2S2
   II      0.027              0.3403         S2S2
   Total   0.285              2.4474
An example of Baum-Welch (cntd)
•  Estimating initial probabilities:
   •  Assume all sequences start with hidden state S1, calculate the best
      probability
   •  Assume all sequences start with hidden state S2, calculate the best
      probability
   •  Normalize to 1

•  Now we have generated the updated transition, emission and initial
   probabilities. Repeat this method until those probabilities converge.
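A few lines that reproduce the numbers of this toy example directly from its transition, emission and initial probabilities: the column of pair probabilities via S1→S2, the column of best-path probabilities, and the crude S1→S2 estimate 0.285/2.447 ≈ 0.116.

```python
# Sketch: recomputing the toy Baum-Welch table for the dog observations.
from itertools import product

init = {"S1": 0.3, "S2": 0.7}
trans = {"S1": {"S1": 0.5, "S2": 0.5}, "S2": {"S1": 0.4, "S2": 0.6}}
emit = {"S1": {"I": 0.2, "O": 0.8}, "S2": {"I": 0.9, "O": 0.1}}

obs = "IIIIIOOIII"
pairs = list(zip(obs, obs[1:]))      # II, II, II, II, IO, OO, OI, II, II

def p(o1, o2, s1, s2):
    """Probability of one observed pair via the hidden pair s1 -> s2."""
    return init[s1] * emit[s1][o1] * trans[s1][s2] * emit[s2][o2]

via_s1s2 = sum(p(o1, o2, "S1", "S2") for o1, o2 in pairs)
best = sum(max(p(o1, o2, s1, s2) for s1, s2 in product(init, repeat=2))
           for o1, o2 in pairs)
print(via_s1s2, best, via_s1s2 / best)   # ~0.285, ~2.447, ~0.116
```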


The Baum-Welch algorithm
•  Time complexity:
   •  # iterations × O(K²N)

•  Guaranteed to increase the likelihood P(x | θ)

•  Not guaranteed to find globally optimal parameters
   •  Converges to a local optimum, depending on the initial conditions

•  Too many parameters / too large a model ⇒ overtraining
Back to: HMM for CpG islands
How do we find CpG islands in a sequence?

Build a single model that combines both Markov chains:
•  '+' states: A+, C+, G+, T+
   •  emit the symbols A, C, G, T in CpG islands
•  '−' states: A−, C−, G−, T−
   •  emit the symbols A, C, G, T in non-CpG islands

[Diagram: eight states A+, C+, G+, T+ and A−, C−, G−, T−; each state emits its own letter with probability 1 and all other letters with probability 0]

If the sequence CGCG is emitted by the state path (C+, G−, C−, G+), then:

   P(CGCG) = a_{0,C+} · 1 · a_{C+,G−} · 1 · a_{G−,C−} · 1 · a_{C−,G+} · 1 · a_{G+,0}

In general, we DO NOT know the path. How do we estimate the path?

Note: each set ('+' or '−') also has the within-set transitions of the previous
Markov chain.
What we have..

          A+     C+     G+     T+     A−     C−     G−     T−
   A+   .180   .274   .426   .120
   C+   .171   .368   .274   .188
   G+   .161   .339   .375   .125
   T+   .079   .355   .384   .182
   A−                               .300   .205   .285   .210
   C−                               .322   .298   .078   .302
   G−                               .248   .246   .298   .208
   T−                               .177   .239   .292   .292

Note: these transitions out of each state add up to one, so there is no room
left for transitions between the (+) and (−) states.

Not a valid transition probability matrix, nor a complete one!
A model of CpG Islands – Transitions
•  What about transitions between the (+) and (−) states?
•  They affect the length distribution of the regions:
   •  avg. length of a CpG island
   •  avg. separation between two CpG islands

[Diagram: two super-states '+' and '−' with self-transition probabilities p++ and p−− and cross-transition probabilities 1−p++ and 1−p−−]

Length distribution of a '+' region:
   P(L = 1) = 1 − p++
   P(L = 2) = p++ (1 − p++)
   …
   P(L = l) = p++^(l−1) (1 − p++)

This is a geometric distribution with mean 1 / (1 − p++),
i.e. the expected number of steps the model continues in that state.
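A quick numeric sanity check of this geometric length distribution, using a hypothetical value p++ = 0.99 that is not taken from the slides: the probabilities sum to (essentially) 1 and the mean length comes out near 1/(1−p++) = 100.

```python
# Sketch: truncated check of P(L = l) = p++^(l-1) * (1 - p++) and its mean.
p_plus = 0.99                                 # hypothetical value
lengths = range(1, 2000)
probs = [(1 - p_plus) * p_plus ** (l - 1) for l in lengths]
mean_len = sum(l * pl for l, pl in zip(lengths, probs))
print(sum(probs), mean_len, 1 / (1 - p_plus))   # ~1, ~100, 100.0
```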
What we have..

          A+     C+     G+     T+     A−     C−     G−     T−
   A+   .180   .274   .426   .120
   C+   .171   .368   .274   .188
   G+   .161   .339   .375   .125
   T+   .079   .355   .384   .182
   A−                               .300   .205   .285   .210
   C−                               .322   .298   .078   .302
   G−                               .248   .246   .298   .208
   T−                               .177   .239   .292   .292

Each '+' row above is scaled by λ+ (the probability of staying within the '+'
set), and the remaining probability 1−λ+ is spread over the '−' states in
proportion to the background frequencies freq(b_i); the '−' rows are treated
the same way with λ−.

Now a valid transition probability matrix, and a complete one!
Another application: Profile HMMs

Profile HMMs (Haussler, 1993)

•  Ungapped alignment of a sequence X against a profile M

•  e_i(a): probability of observing a at position i

•  P(X | M) = ∏_{i=1,…,L} e_i(x_i)

•  Score(X | M) = Σ_{i=1,…,L} log( e_i(x_i) / q_{x_i} )

•  What about indels?


Profile HMMs: “match” states
LEVK
LEIR
LEIK
LDVE
We make an HMM that represents the above profile using match states only
(one per alignment column):

   Begin → M1 → M2 → M3 → M4 → End

Deriving the emission probabilities for the match states:
   M1: Pr(L) = 1
   M2: Pr(E) = 3/4, Pr(D) = 1/4
   M3: Pr(V) = 1/2, Pr(I) = 1/2
   M4: Pr(K) = 1/2, Pr(R) = 1/4, Pr(E) = 1/4
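A short sketch that derives these match-state emission probabilities directly from the alignment columns; the pseudocount argument is an illustrative extra, not something specified on the slide.

```python
# Sketch: per-column ("match state") emission probabilities from an
# ungapped alignment, with an optional pseudocount.
from collections import Counter

alignment = ["LEVK", "LEIR", "LEIK", "LDVE"]

def match_emissions(aln, pseudo=0.0, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    probs = []
    for column in zip(*aln):                  # one column per match state
        counts = Counter(column)
        total = len(column) + pseudo * len(alphabet)
        probs.append({a: (counts[a] + pseudo) / total for a in alphabet})
    return probs

for i, e in enumerate(match_emissions(alignment), start=1):
    print(f"M{i}:", {a: round(p, 2) for a, p in e.items() if p > 0})
# M1: {'L': 1.0}   M2: {'D': 0.25, 'E': 0.75}   M3: {'I': 0.5, 'V': 0.5} ...
```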
Introducing “insert” states to the
previous HMM
We want to know whether (for instance) the sequence
LEKKVK is a good match to the HMM

LE--VK
LE--IR We know it should look like this in the end
LE--IK
LD--VE
LEKKVK

[Diagram: Begin → M1 → M2 → M3 → M4 → End, with an insert state added between M2 and M3 to absorb the extra K K]
Introducing “delete” states to the
previous HMM
We want to know whether (for instance) the sequence
LEK is a good match to the HMM

LEVK
LEIR We know it should look like this in the end
LEIK
LDVE
LE-K

[Diagram: Begin → M1 → M2 → M3 → M4 → End, with silent delete states that allow a match state (here M3) to be skipped]
Three main applications for
profile HMMs
1. Find sequence homologs
•  i.e., we represent a sequence family by an HMM and use it
   to identify (“evaluate”) other related sequences

   [Diagram: alignment (LEVK, LEIR, LEIK, LDVE) → convert → profile HMM →
    search a sequence database (KKKKKK, IKNGTTT, LEAK, GGIAAEEIK, IIGGGAVVS, ……)]

Evaluation: so use the Forward algorithm. Viterbi is OK too, giving P(x, SP* | λ).

   P(x | λ) = Σ_{all possible parses SP^p} P(x, SP^p | λ)
            = Σ_{all possible parses} ∏_{i=1}^{L} a(π_{i−1}, π_i) · e(π_i, x_i)

   (there are K^L possible parses)
Three main applications for profile
HMMs
2. Align a new sequence to the profile
•  i.e., we expand our multiple sequence alignment

   [Diagram: alignment (LEVK, LEIR, LEIK, LDVE) → convert → profile HMM →
    align the new sequence LEK → extended alignment (LEVK, LEIR, LEIK, LDVE, LE-K)]

This is decoding: use Viterbi


Three main applications for profile
HMMs
3. Align a set of sequences from scratch
•  i.e., we want to build a multiple sequence alignment of a set of
   “unaligned sequences”

   [Diagram: unaligned sequences LEVK, LEK, LEIR, LEIK, LDVE → align →
    multiple alignment (LEVK, LEIR, LEIK, LDVE, LE-K)]

This needs parameter estimation: use Baum-Welch
Making a multiple sequence alignment
from unaligned sequences
•  Baum-Welch expectation-maximization method:
   •  Start with a model whose length matches the average length of
      the sequences and has random output and transition probabilities.
   •  Align all the sequences to the model.
   •  Use the alignment to alter the output and transition probabilities.
   •  Repeat. Continue until the model stops changing.

•  By-product: this produces a multiple alignment
Acknowledgements
Some of the slides used in this lecture are adapted or modified
slides from lectures of:
•  Serafim Batzoglou, Stanford University
•  Bino John, Dow Agrosciences
•  Nagiza F. Samatova, Oak Ridge National Lab
•  Sarah Wheelan, Johns Hopkins University
•  Eric Xing, Carnegie Mellon University

Theory and examples from the following books:
•  T. Koski, Hidden Markov Models for Bioinformatics, 2001,
   Kluwer Academic Publishers
•  R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence
   Analysis, 1998, Cambridge University Press
