HMMs
CpG islands: regions of more than 500 bp with G+C content > 55%
Sequence: s = t t a c g g t …  (length N)

0th-order: P0(s) = p(t)·p(t)·p(a)·p(c)·p(g)·… = ∏i=1..N p(si)
1st-order: P1(s) = p(t)·p(t|t)·p(a|t)·p(c|a)·… = p(s1) · ∏i=2..N p(si | si−1)
2nd-order: P2(s) = p(tt)·p(a|tt)·p(c|ta)·p(g|ac)·… = p(s1 s2) · ∏i=3..N p(si | si−2 si−1)
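To make the model orders concrete, here is a minimal Python sketch (not from the slides; the probability tables p0 and p1 are illustrative values only) that evaluates a DNA sequence under a 0th-order and a 1st-order chain:

# Minimal sketch: probability of a DNA sequence under 0th- and 1st-order
# Markov chains. The tables p0 and p1 below are illustrative values only.
p0 = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}          # p(s_i)
p1 = {s: {t: 0.25 for t in "acgt"} for s in "acgt"}        # p(s_i | s_{i-1})
p1["c"]["g"] = 0.05                                        # e.g. CpG depleted outside islands
p1["c"]["a"] = p1["c"]["c"] = p1["c"]["t"] = (1 - 0.05) / 3

def prob_order0(s):
    """P0(s) = prod_i p(s_i)"""
    prob = 1.0
    for base in s:
        prob *= p0[base]
    return prob

def prob_order1(s):
    """P1(s) = p(s_1) * prod_{i>=2} p(s_i | s_{i-1})"""
    prob = p0[s[0]]
    for prev, cur in zip(s, s[1:]):
        prob *= p1[prev][cur]
    return prob

print(prob_order0("ttacggt"), prob_order1("ttacggt"))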
Application of Markov chains:
CpG islands (cntd)
[Diagram: a four-state Markov chain over A, C, G, T, with a transition probability on each edge (e.g. aAT, aAC, aGC, aGT)]

Training set: a set of DNA sequences w/ known CpG islands
Derive two Markov chain models:
• '+' model: from the CpG islands
• '-' model: from the remainder of the sequence
Transition probabilities for each model:
• A state for each of the four letters A, C, G, and T in the DNA alphabet
• a+st: probability of residue t following residue s

a+st = c+st / Σt' c+st'

where c+st is the number of times letter t followed letter s in the CpG islands.
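A minimal counting sketch of this estimate (the two training sequences below are made up; a real '+' model would be trained on annotated CpG islands):

# Sketch of the count-based estimate a+_st = c+_st / sum_t' c+_st',
# using a couple of made-up "CpG island" training sequences.
from collections import defaultdict

def estimate_transitions(seqs):
    counts = defaultdict(lambda: defaultdict(int))   # counts[s][t] = c_st
    for seq in seqs:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    probs = {}
    for s in "ACGT":
        total = sum(counts[s][t] for t in "ACGT")
        probs[s] = {t: counts[s][t] / total if total else 0.25 for t in "ACGT"}
    return probs

plus_model = estimate_transitions(["CGCGCGTACG", "GCGCGCGCAT"])  # toy '+' training set
print(plus_model["C"]["G"])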
Transition probabilities of the '+' (CpG island) model (the '-' model is estimated in the same way from the non-island sequence):

       A     C     G     T
 A   .180  .274  .426  .120
 C   .171  .368  .274  .188
 G   .161  .339  .375  .125
 T   .079  .355  .384  .182

Log-odds score of a sequence x:
S(x) = log [ P(x | model +) / P(x | model −) ] = Σi=1..L log ( a+xi−1,xi / a−xi−1,xi )
Application of Markov chains:
CpG islands (cntd)
Per-transition log-odds ratios: log2( P(t | s, +) / P(t | s, −) ) for every pair of residues s, t.

Over a whole sequence x:
log2 [ P(x | +) / P(x | −) ] = Σi=1..L−1 log2 [ P(xi+1 | xi, +) / P(xi+1 | xi, −) ]
[Figure: distribution of log-odds scores for CpG-island sequences vs. other sequences]
Q1: Given a short sequence x, does it come from a CpG island? (yes/no question)
• Evaluate S(x)
Q2: Given a long sequence x, how do we find the CpG islands in it? (where question)
• Calculate the log-odds score for a window of, say, 100 nucleotides around every nucleotide, plot it, and predict CpG islands as the windows with positive score (a sketch of both follows below)
• Drawback: how do we choose the window size?
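A sketch of both computations (the '+' table is the one shown above; the '-' table here is a uniform placeholder, since the full '-' values are not in this excerpt; the window size w is the 100 nt suggested above):

# Sketch of the log-odds score S(x) and the sliding-window scan described above.
from math import log2

bases = "ACGT"
a_plus = {"A": [.180, .274, .426, .120],
          "C": [.171, .368, .274, .188],
          "G": [.161, .339, .375, .125],
          "T": [.079, .355, .384, .182]}
a_minus = {s: [0.25, 0.25, 0.25, 0.25] for s in bases}   # placeholder '-' values

def score(x):
    """S(x) = sum_i log2( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )"""
    return sum(log2(a_plus[s][bases.index(t)] / a_minus[s][bases.index(t)])
               for s, t in zip(x, x[1:]))

def window_scores(x, w=100):
    """Q2: log-odds score of a w-nucleotide window around every position."""
    return [score(x[max(0, i - w // 2): i + w // 2]) for i in range(len(x))]

print(score("CGCGCGCG"))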
HMM: A parse of a sequence
Given a sequence x = x1…xL and an HMM with K states,
a parse of x is a sequence of states π = π1, …, πL

[Trellis diagram: the K states (rows 1…K) at each of the L positions, with the observations x1, x2, x3, …, xL emitted below]
The Fair/Loaded die HMM:
• States: Fair (F) and Loaded (L)
• Transitions: aFL = 0.05, aLF = 0.1 (hence aFF = 0.95, aLL = 0.9)
• Emissions: eF(b) = 1/6 (∀ b ∈ Ω); eL(6) = 1/2, eL(b) = 1/10 (if b ≠ 6)
E.g.: Given the following sequence, is it more likely that it comes from a Loaded or a Fair die?
123412316261636461623411221341
E.g.: Given the following sequence, is it more likely that the 3rd observed 6 comes from a Loaded or a Fair die?
123412316261636461623411221341
(The model is written out as data structures in the sketch below.)
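Written out in Python (the uniform start probabilities are an assumption, not given on the slide), the model looks like this; the decoding algorithms that follow answer the two questions:

# The Fair/Loaded die HMM from the slide, written out as plain data structures.
# States: F (fair), L (loaded). a_FL = 0.05, a_LF = 0.1 as given above.
states = ["F", "L"]
trans  = {"F": {"F": 0.95, "L": 0.05},
          "L": {"F": 0.10, "L": 0.90}}
emit   = {"F": {b: 1/6 for b in "123456"},
          "L": {**{b: 1/10 for b in "12345"}, "6": 1/2}}
start  = {"F": 0.5, "L": 0.5}          # assumed uniform start (not given on the slide)

rolls = "123412316261636461623411221341"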
The Forward Algorithm – derivation
(cntd)
• Then we need to write fk(i) as a function of the forward values at the previous position, fl(i−1):
fk(i) = P(x1,…,xi, πi = k)
      = Σl [ Σπ1,…,πi−2 P(x1,…,xi−1, π1,…,πi−2, πi−1 = l) · al,k ] · ek(xi)
      = Σl P(x1,…,xi−1, πi−1 = l) · al,k · ek(xi)
      = ek(xi) · Σl fl(i−1) · al,k

Chain rule: P(A,B,C) = P(C|A,B) P(B|A) P(A)
The Forward Algorithm
We can compute fk(i) for all k, i, using dynamic programming
Initialization: f0(0) = 1; fk(0) = 0 for all k > 0
Iteration: fk(i) = ek(xi) · Σl fl(i−1) · al,k
Termination: P(x) = Σk fk(N) · ak,0
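A minimal forward implementation for the Fair/Loaded model (uniform start assumed; the end-state term ak,0 is dropped, so P(x) is just the sum of the final forward values):

# Minimal forward algorithm, shown on the Fair/Loaded die HMM.
states = ["F", "L"]
start  = {"F": 0.5, "L": 0.5}
trans  = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit   = {"F": {b: 1/6 for b in "123456"},
          "L": {**{b: 0.1 for b in "12345"}, "6": 0.5}}

def forward(x):
    f = [{k: start[k] * emit[k][x[0]] for k in states}]               # f_k(1)
    for sym in x[1:]:
        prev = f[-1]
        f.append({k: emit[k][sym] * sum(prev[l] * trans[l][k] for l in states)
                  for k in states})                                    # f_k(i) = e_k(x_i) Σ_l f_l(i-1) a_l,k
    return f, sum(f[-1].values())                                      # P(x), no explicit end state

fwd, px = forward("123412316261636461623411221341")
print(px)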
The Backward Algorithm
• The forward algorithm gives the probability of being in state k at position i using only the preceding observations x1,…,xi; the backward algorithm accounts for the observations that come after position i.
123412316261636461623411221341
bk(i) = P(xi+1,…,xN | πi = k)
      = Σl Σπi+2,…,πN el(xi+1) · ak,l · P(xi+2,…,xN, πi+2,…,πN | πi+1 = l)
      = Σl el(xi+1) · ak,l · bl(i+1)

Chain rule: P(A,B,C) = P(C|A,B) P(B|A) P(A)
The Backward Algorithm
We can compute bk(i) for all k, i, using dynamic programming
Initialization: bk(N) = ak,0 for all k
Iteration: bk(i) = Σl el(xi+1) · ak,l · bl(i+1)
Termination: P(x) = Σk a0,k · ek(x1) · bk(1)
• P(πi = k | x) = fk(i) · bk(i) / P(x)
• Posterior decoding builds a path that explains the data: for each emitted symbol xi it picks the most likely state to have produced it, based on the forward and backward probabilities (a sketch follows below).
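A sketch of the backward pass and posterior decoding for the same model (again with a uniform start and no end state; both are simplifying assumptions, not part of the slides):

# Backward algorithm and posterior decoding P(pi_i = k | x) = f_k(i) b_k(i) / P(x).
states = ["F", "L"]
start  = {"F": 0.5, "L": 0.5}
trans  = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit   = {"F": {b: 1/6 for b in "123456"},
          "L": {**{b: 0.1 for b in "12345"}, "6": 0.5}}

def forward(x):
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for sym in x[1:]:
        f.append({k: emit[k][sym] * sum(f[-1][l] * trans[l][k] for l in states)
                  for k in states})
    return f

def backward(x):
    b = [{k: 1.0 for k in states}]                                    # b_k(N) = 1 (no end state)
    for sym in reversed(x[1:]):
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * b[0][l] for l in states)
                     for k in states})                                # b_k(i) = Σ_l a_k,l e_l(x_i+1) b_l(i+1)
    return b

def posterior(x):
    f, b = forward(x), backward(x)
    px = sum(f[-1].values())
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]

post = posterior("123412316261636461623411221341")
most_likely = [max(p, key=p.get) for p in post]    # most likely state at each position
print("".join(most_likely))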
The Viterbi Algorithm – derivation
• We define:
Vj(i) = probability of the most likely path for the prefix x1,…,xi that ends in state j, i.e. Vj(i) = maxπ1,…,πi−1 P(x1,…,xi, π1,…,πi−1, πi = j)
• The same argument as for the forward algorithm gives the recurrence Vk(i) = ek(xi) · maxl { Vl(i−1) · al,k }.
Termination:
• Viterbi: P(x, π*) = maxk Vk(N)
• Forward: P(x) = Σk fk(N) ak,0
• Backward: P(x) = Σk a0,k ek(x1) bk(1)
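A matching Viterbi sketch for the Fair/Loaded model (same simplifying assumptions: uniform start, no end state); it returns the most probable path π* and its probability:

# Minimal Viterbi decoding for the Fair/Loaded die HMM.
states = ["F", "L"]
start  = {"F": 0.5, "L": 0.5}
trans  = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit   = {"F": {b: 1/6 for b in "123456"},
          "L": {**{b: 0.1 for b in "12345"}, "6": 0.5}}

def viterbi(x):
    V = [{k: start[k] * emit[k][x[0]] for k in states}]              # V_k(1)
    ptr = []
    for sym in x[1:]:
        row, back = {}, {}
        for k in states:
            best_l = max(states, key=lambda l: V[-1][l] * trans[l][k])
            back[k] = best_l
            row[k] = emit[k][sym] * V[-1][best_l] * trans[best_l][k]  # V_k(i) = e_k(x_i) max_l V_l(i-1) a_l,k
        V.append(row)
        ptr.append(back)
    last = max(states, key=lambda k: V[-1][k])                        # termination: max_k V_k(N)
    path = [last]
    for back in reversed(ptr):
        path.insert(0, back[path[0]])
    return "".join(path), V[-1][last]

path, p_best = viterbi("123412316261636461623411221341")
print(path)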
3. Learning
GIVEN: an HMM M with unknown probability parameters, and a sequence x
FIND: parameters θ (initial-state, emission ek(b), and transition ak,l probabilities) that maximize P(x | θ, M)
ALGORITHMS: maximum likelihood (ML), Baum-Welch (EM)
Two scenarios:
• Supervised learning — labeled data: the state path of the training sequences is known (e.g., sequences with annotated CpG islands)
• Unsupervised learning — unlabeled data: e.g., a newly sequenced genome; we don't know how frequent the CpG islands are there, nor do we know their composition
ML estimates from the labeled data:

aMLk,l = Ak,l / Σi Ak,i        eMLk(b) = Ek(b) / Σc Ek(c)

where Ak,l is the number of k→l transitions and Ek(b) the number of times symbol b is emitted from state k in the training data.

• Problem: overfitting (when the training set is small for the model)
• Then (ML estimates from the observed counts):
aFF = 10/10 = 1.00; aFL = 0/10 = 0
eF(1) = eF(3) = 2/10 = 0.2; eF(2) = 3/10 = 0.3; eF(4) = 0/10 = 0; eF(5) = eF(6) = 1/10 = 0.1
• Then (adding a pseudocount of 1 to every count):
aFF = 11/12 ≈ 0.92; aFL = 1/12 ≈ 0.08
eF(1) = eF(3) = 3/16 = 0.1875; eF(2) = 4/16 = 0.25; eF(4) = 1/16 = 0.0625; eF(5) = eF(6) = 2/16 = 0.125
(A counting sketch follows below.)
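The sketch below illustrates this supervised estimation (the labeled rolls/path pair is made up; pseudo=1 is the pseudocount used in the second set of estimates above):

# Sketch of supervised ML estimation with pseudocounts.
from collections import Counter

rolls = "13234651423"          # hypothetical observed rolls
path  = "FFFFFFFFFFF"          # hypothetical known state path (all Fair here)

def estimate(rolls, path, pseudo=1):
    A = Counter()                                   # A_kl: transition counts
    E = Counter()                                   # E_k(b): emission counts
    for k, l in zip(path, path[1:]):
        A[(k, l)] += 1
    for k, b in zip(path, rolls):
        E[(k, b)] += 1
    a = {(k, l): (A[(k, l)] + pseudo) / sum(A[(k, m)] + pseudo for m in "FL")
         for k in "FL" for l in "FL"}
    e = {(k, b): (E[(k, b)] + pseudo) / sum(E[(k, c)] + pseudo for c in "123456")
         for k in "FL" for b in "123456"}
    return a, e

a, e = estimate(rolls, path)
print(a[("F", "F")], e[("F", "1")])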
3. Repeat steps #1 and #2 with the new parameters ak,l and ek(b), until convergence.

The Baum-Welch algorithm
• Initialization: set A and E to pseudocounts (or priors)
• Iteration (for each training sequence):
  1.–2. E-step: compute the forward and backward values and accumulate the expected counts Ak,l and Ek(b)
  3. M-step: estimate new model parameters ak,l and ek(b) using ML across all training sequences
  4. Estimate the new model's (log)likelihood to assess convergence
The Baum-Welch algorithm (cntd)
• Initialization: pick arbitrary model parameters; set A and E to pseudocounts (or priors)
• Baum-Welch:
  - guarantees convergence (to a local maximum of the likelihood)
  - is a special case of EM
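A compact one-sequence Baum-Welch sketch for a two-state model (uniform start and no end state are simplifying assumptions; pseudocounts keep the expected counts away from zero, as suggested above):

# One Baum-Welch iteration: E-step accumulates expected counts A and E from
# the forward/backward values, M-step re-normalizes them.
states, alphabet = ["F", "L"], "123456"

def forward(x, start, trans, emit):
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for sym in x[1:]:
        f.append({k: emit[k][sym] * sum(f[-1][l] * trans[l][k] for l in states)
                  for k in states})
    return f

def backward(x, trans, emit):
    b = [{k: 1.0 for k in states}]
    for sym in reversed(x[1:]):
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * b[0][l] for l in states)
                     for k in states})
    return b

def baum_welch_step(x, start, trans, emit, pseudo=1e-3):
    f, b = forward(x, start, trans, emit), backward(x, trans, emit)
    px = sum(f[-1].values())
    A = {k: {l: pseudo for l in states} for k in states}       # expected transition counts
    E = {k: {s: pseudo for s in alphabet} for k in states}     # expected emission counts
    for i, sym in enumerate(x):
        for k in states:
            E[k][sym] += f[i][k] * b[i][k] / px                # gamma_i(k)
    for i in range(len(x) - 1):
        for k in states:
            for l in states:
                A[k][l] += f[i][k] * trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l] / px
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_emit  = {k: {s: E[k][s] / sum(E[k].values()) for s in alphabet} for k in states}
    return new_trans, new_emit, px

# Usage: start from arbitrary parameters and iterate until P(x) stops improving.
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
emit  = {"F": {s: 1/6 for s in alphabet},
         "L": {**{s: 0.13 for s in "12345"}, "6": 0.35}}
x = "123412316261636461623411221341"
for _ in range(10):
    trans, emit, px = baum_welch_step(x, start, trans, emit)
print(px)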
• Example guess: if the initial I-I came from S1-S2, then the probability is:
0.3 × 0.2 × 0.5 × 0.9 = 0.027
[Table residue from the example above: states S1, S2, IN, OUT at day k and day k+1]
A model of CpG islands: eight states A+, C+, G+, T+, A-, C-, G-, T-; each state emits only its own letter (emission probability 1 for that letter, 0 for the rest).

If the path is known, e.g. C+, G-, C-, G+ for the sequence CGCG:
P(CGCG) = a0,C+ × 1 × aC+,G− × 1 × aG−,C− × 1 × aC−,G+ × 1 × aG+,0

In general, we DO NOT know the path. How do we estimate the path?

Note: each set ('+' or '-') has an additional set of transitions as in the previous Markov chain.
What we have..

Transition probabilities within the '+' set (the '-' set has its own table; the full matrix covers all eight states A+ … T-):

        A+    C+    G+    T+
 A+   .180  .274  .426  .120
 C+   .171  .368  .274  .188
 G+   .161  .339  .375  .125
 T+   .079  .355  .384  .182

Note: these transitions out of each state add up to one — no room for transitions between (+) and (-) states.
A model of CpG Islands –
Transitions
• What about transitions between (+) and (-) states?
• They affect:
  - the avg. length of a CpG island
  - the avg. separation between two CpG islands

[Diagram: two states '+' and '-', with self-transition probabilities p++ and p--, and switching probabilities 1−p++ and 1−p--]

Length distribution of region '+':
P(L=1) = P(+−) = 1 − p++
P(L=2) = P(++−) = p++ (1 − p++)
…
P(L= l) = p++^(l−1) (1 − p++)

This is a geometric distribution with mean 1 / (1 − p++): the expected length of a run that continues in that state.
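A quick numerical check of that claim (p = 0.98 is an illustrative value, not from the slides):

# Geometric length distribution: P(L = l) = p**(l-1) * (1-p) has mean 1/(1-p).
p = 0.98                                   # illustrative within-island self-transition
mean = sum(l * p**(l - 1) * (1 - p) for l in range(1, 100000))
print(mean, 1 / (1 - p))                   # both ≈ 50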
What we have..

Completed transition matrix over all eight states A+ … T- (e.g., row A+: .180 .274 .426 .120 within the '+' block; row C-: .233 .298 .078 .302 within the '-' block). The within-'+' transitions are weighted by λ+, and transitions from each '+' state into the '-' states get probability (1−λ+) · freq(bi); likewise, within-'-' transitions are weighted by λ-, and transitions from '-' into the '+' states get (1−λ-) · freq(bi).
Another application: Profile HMMs
[Profile HMM diagram: Begin → M1 → M2 → M3 → M4 → End]

We know it should look like this in the end:
LE--VK
LE--IR
LE--IK
LD--VE
LEKKVK
[Profile HMM diagram: Begin → M1 → M2 → M3 → M4 → End]
Introducing “delete” states to the
previous HMM
We want to know whether (for instance) the sequence
LEK is a good match to the HMM
We know it should look like this in the end:
LEVK
LEIR
LEIK
LDVE
LE-K

[Profile HMM diagram: Begin → M1 → M2 → M3 → M4 → End]
Three main applications for
profile HMMs
1. Find sequence homologs
• i.e., we represent a sequence family by an HMM and use it to identify ("evaluate") other related sequences
[Diagram: Convert the family LEVK, LEIR, LEIK, LDVE into a profile HMM, then Search a sequence database (KKKKKK, IKNGTTT, LEAK, ……, GGIAAEEIK, IIGGGAVVS) for other members]

[Diagram: Convert the family LEVK, LEIR, LEIK, LDVE into a profile HMM, then Align sequences to it, yielding the multiple alignment LEVK, LEIR, LEIK, LDVE, LE-K]

[Diagram: Align the sequences LEVK, LEK, LEIR, LEIK, LDVE to the profile HMM, yielding the multiple alignment LEVK, LEIR, LEIK, LDVE, LE-K]
Acknowledgements
Some of the slides used in this lecture are adapted or modified from lectures by:
• Serafim Batzoglou, Stanford University
• Bino John, Dow Agrosciences