Bioinformatics-Lesson 07 - Hidden Markov Model

This document provides an introduction to Markov chain models and hidden Markov models. It discusses using Markov chains to model nucleotide frequencies in DNA and discriminate between CpG islands and other genomic regions. It then introduces hidden Markov models and the key questions they aim to answer: the likelihood of a sequence, the most probable path that generated a sequence, and learning model parameters from example sequences. An example of an occasionally dishonest casino is used to illustrate hidden Markov models and the forward algorithm for calculating sequence likelihoods.


Introduction to Bioinformatics

Lecture 9
A Markov Chain Model
• Nucleotide frequencies in the human genome (%):

    A      C      T      G
    29.5   20.4   20.5   29.6
A Markov Chain Model
• Traditionally the end of a sequence is not modelled
• We can also have an explicit end state; this allows the model to represent
  – a distribution over sequences of different lengths
  – preferences for ending sequences with certain symbols
Markov Chain Model: Definition
• a Markov chain model is defined by
  – a set of states
    • some states emit symbols
    • other states are silent (e.g. the begin and end states)
  – a set of transitions with associated probabilities
    • the transitions emanating from a given state define a distribution over the possible next states
Markov Chain Model: Property
• given some sequence x of length L, we can ask how probable the sequence is given our model
• for any probabilistic model of sequences, we can write this probability as

  $\Pr(x) = \Pr(x_L, x_{L-1}, \dots, x_1) = \Pr(x_L \mid x_{L-1}, \dots, x_1)\,\Pr(x_{L-1} \mid x_{L-2}, \dots, x_1) \cdots \Pr(x_1)$

• key property of a (1st-order) Markov chain: the probability of each $x_i$ depends only on the value of $x_{i-1}$

  $\Pr(x) = \Pr(x_L \mid x_{L-1})\,\Pr(x_{L-1} \mid x_{L-2}) \cdots \Pr(x_2 \mid x_1)\,\Pr(x_1) = \Pr(x_1) \prod_{i=2}^{L} \Pr(x_i \mid x_{i-1})$
The Probability of a Sequence for a Given Markov Chain Model

  $\Pr(\mathrm{cggt}) = \Pr(c)\,\Pr(g \mid c)\,\Pr(g \mid g)\,\Pr(t \mid g)\,\Pr(\mathrm{end} \mid t)$
Markov Chain Model: Notation
• the transition parameters can be denoted by $a_{x_{i-1} x_i}$, where

  $a_{x_{i-1} x_i} = \Pr(x_i \mid x_{i-1})$

• similarly, we can denote the probability of a sequence x as

  $\Pr(x) = a_{B x_1} \prod_{i=2}^{L} a_{x_{i-1} x_i} = \Pr(x_1) \prod_{i=2}^{L} \Pr(x_i \mid x_{i-1})$

  where $a_{B x_1}$ represents the transition from the begin state to $x_1$ (a small code sketch of this calculation follows)
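To make the notation concrete, here is a minimal Python sketch of this calculation; the transition and initial probabilities below are illustrative placeholder values, not estimates from real sequence data.

```python
# Minimal sketch: Pr(x) under a first-order Markov chain,
# Pr(x) = Pr(x1) * prod_{i=2..L} Pr(x_i | x_{i-1}).
# The probabilities below are illustrative placeholders only.

initial = {'a': 0.25, 'c': 0.25, 'g': 0.25, 't': 0.25}      # Pr(x1), i.e. a_{B x1}
transition = {('c', 'g'): 0.25, ('g', 'g'): 0.30, ('g', 't'): 0.20}
# ... a full model would list all 16 dinucleotide transitions

def markov_chain_prob(x, initial, transition):
    """Probability of sequence x under the chain defined by the two tables."""
    p = initial[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= transition[(prev, cur)]
    return p

print(markov_chain_prob("cggt", initial, transition))        # 0.25 * 0.25 * 0.30 * 0.20
```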
CpG Islands
(Written "CpG" to distinguish the dinucleotide from a C≡G base pair)

• CpG dinucleotides are rarer than would be expected from the independent probabilities of C and G.
– Reason: When CpG occurs, C is typically chemically
modified by methylation and there is a relatively high
chance of methyl-C mutating into T
• A CpG island is a region where CpG dinucleotides
are much more abundant than elsewhere.
• High CpG frequency may be biologically significant;
e.g., may signal promoter region (“start” of a gene).
Markov Chain for Discrimination
• suppose we want to distinguish CpG islands from
other sequence regions
• given sequences from CpG islands, and sequences
from other regions, we can construct
– a model to represent CpG islands (model +)
– a null model to represent other regions (model -)
• can then score a test sequence by:

  $\mathrm{score}(x) = \log \frac{\Pr(x \mid \mathrm{model}+)}{\Pr(x \mid \mathrm{model}-)}$
Markov Chain for Discrimination
• parameters estimated for + and - models
– human sequences containing 48 CpG islands
– 60,000 nucleotides

• Transition probabilities were calculated for both models


Markov Chain for Discrimination
• Calculated the log-odds ratio:

  $\mathrm{score}(x) = \log \frac{\Pr(x \mid \mathrm{model}+)}{\Pr(x \mid \mathrm{model}-)} = \sum_{i=1}^{L} \log \frac{a^{+}_{x_{i-1} x_i}}{a^{-}_{x_{i-1} x_i}} = \sum_{i=1}^{L} \beta_{x_{i-1} x_i}$

• the $\beta_{x_{i-1} x_i}$ are the log-likelihood ratios of the corresponding transition probabilities:

  β      A        C        G        T
  A    -0.740    0.419    0.580   -0.803
  C    -0.913    0.302    1.812   -0.685
  G    -0.624    0.461    0.331   -0.730
  T    -1.169    0.573    0.393   -0.679
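As an illustration, a short Python sketch of this scoring scheme using the β values from the table above; begin-state transitions are ignored here for simplicity.

```python
# Sketch: log-odds score of a sequence as the sum of beta values over
# adjacent nucleotide pairs (values copied from the table above).
# Positive scores favour the CpG-island (+) model, negative the background (-) model.

BETA = {
    'A': {'A': -0.740, 'C': 0.419, 'G': 0.580, 'T': -0.803},
    'C': {'A': -0.913, 'C': 0.302, 'G': 1.812, 'T': -0.685},
    'G': {'A': -0.624, 'C': 0.461, 'G': 0.331, 'T': -0.730},
    'T': {'A': -1.169, 'C': 0.573, 'G': 0.393, 'T': -0.679},
}

def log_odds_score(x):
    """score(x) = sum_i beta[x_{i-1}][x_i] over consecutive pairs of x."""
    return sum(BETA[prev][cur] for prev, cur in zip(x, x[1:]))

print(log_odds_score("CGCG"))   # positive: CpG-island-like
print(log_odds_score("TATA"))   # negative: background-like
```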


Markov Chain for Discrimination
• Solid bars represent non-CpG regions
• Dotted bars represent CpG islands
• Errors could be due to inadequate modelling or mislabelling
A simple Hidden Markov Model (HMM)
• Given, say, a T in our input sequence, which state emitted it?
Why Hidden?
• we'll distinguish between the observed parts of a problem and the hidden parts
• in the Markov chain models it is clear which state accounts for each part of the observed sequence
• in the model above, there are multiple states that could account for each part of the observed sequence
  – this is the hidden part of the problem
  – this is the essential difference between a Markov chain and a hidden Markov model
Hidden Markov Models
• Components:
  – Observed variables
    • Emitted symbols
  – Hidden variables
  – Relationships between them
    • Represented by a graph with transition probabilities
• Goal: Find the most likely explanation for the observed variables
Notations in HMM
• States are decoupled from symbols
• x is the sequence of symbols emitted by the model
  – $x_i$ is the symbol emitted at time i
• A path, $\pi$, is a sequence of states
  – The i-th state in $\pi$ is $\pi_i$
• $a_{kr}$ is the probability of making a transition from state k to state r:

  $a_{kr} = \Pr(\pi_i = r \mid \pi_{i-1} = k)$

• $e_k(b)$ is the probability that symbol b is emitted when in state k:

  $e_k(b) = \Pr(x_i = b \mid \pi_i = k)$
The occasionally dishonest casino
• A casino uses a fair die most of the time, but occasionally switches to a loaded one
  – Fair die: Prob(1) = Prob(2) = . . . = Prob(6) = 1/6
  – Loaded die: Prob(1) = Prob(2) = . . . = Prob(5) = 1/10, Prob(6) = 1/2
  – These are the emission probabilities
• Transition probabilities
  – Prob(Fair → Loaded) = 0.01
  – Prob(Loaded → Fair) = 0.2
  – Transitions between states obey a Markov process
An HMM for the occasionally dishonest casino

  Emission probabilities $e_k(b)$:

    b     Fair    Loaded
    1     1/6     1/10
    2     1/6     1/10
    3     1/6     1/10
    4     1/6     1/10
    5     1/6     1/10
    6     1/6     1/2

  Transition probabilities $a_{kl}$:  $a_{FF} = 0.99$, $a_{FL} = 0.01$, $a_{LF} = 0.2$, $a_{LL} = 0.8$

The occasionally dishonest casino
• Known:
  – The structure of the model
  – The transition probabilities
• Hidden: What the casino did
  – FFFFFLLLLLLLFFFF...
• Observable: The series of die tosses
  – 3415256664666153...
• What we must infer:
  – When was a fair die used?
  – When was a loaded one used?
  – The answer is a sequence: FFFFFFFLLLLLLFFF...
Three Important Questions
• How likely is a given sequence?
– the Forward algorithm
• What is the most probable “path” for
generating a given sequence?
– the Viterbi algorithm
• How can we learn the HMM parameters
given a set of sequences?
– the Baum-Welch (Forward-Backward)
algorithm
How Likely is a Given Sequence?
The probability that the path $\pi_1, \pi_2, \dots, \pi_L$ is taken and the sequence $x_1, x_2, \dots, x_L$ is generated:

  $\Pr(x_1, \dots, x_L, \pi_1, \dots, \pi_L) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$

(assuming begin/end are the only silent states on the path)
The occasionally dishonest casino

  $x = x_1, x_2, x_3 = 6, 2, 6$

  $\pi^{(1)} = FFF$:
  $\Pr(x, \pi^{(1)}) = a_{0F}\, e_F(6)\, a_{FF}\, e_F(2)\, a_{FF}\, e_F(6) = 0.5 \times \tfrac{1}{6} \times 0.99 \times \tfrac{1}{6} \times 0.99 \times \tfrac{1}{6} \approx 0.00227$

  $\pi^{(2)} = LLL$:
  $\Pr(x, \pi^{(2)}) = a_{0L}\, e_L(6)\, a_{LL}\, e_L(2)\, a_{LL}\, e_L(6) = 0.5 \times 0.5 \times 0.8 \times 0.1 \times 0.8 \times 0.5 = 0.008$

  $\pi^{(3)} = LFL$:
  $\Pr(x, \pi^{(3)}) = a_{0L}\, e_L(6)\, a_{LF}\, e_F(2)\, a_{FL}\, e_L(6) = 0.5 \times 0.5 \times 0.2 \times \tfrac{1}{6} \times 0.01 \times 0.5 \approx 0.0000417$
Making the inference
• The model assigns a probability to each explanation of the observation:
  e.g. P(326 | FFL)
• Maximum Likelihood: Determine which explanation is most likely
  – Find the path most likely to have produced the observed sequence
• Total probability: Determine the probability that the observed sequence was produced by the HMM
  – Consider all paths that could have produced the observed sequence
How Likely is a Given Sequence?
• for a single path $\pi$, the probability that the sequence x is generated:

  $\Pr(x_1, \dots, x_L, \pi_1, \dots, \pi_L) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$

• the probability over all such paths is:

  $\Pr(x_1, \dots, x_L) = \sum_{\pi} \Pr(x_1, \dots, x_L, \pi)$

• but the number of paths can be exponential in the length of the sequence...
• the Forward algorithm enables us to compute this efficiently (see the sketch below)
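As a sketch (not part of the original slides), the Forward algorithm for the casino model might look like this in Python; since the casino HMM has no explicit end state, termination is simply a sum over the final column.

```python
# Sketch of the Forward algorithm for the casino HMM.
# f_k(i) = Pr(x_1..x_i, pi_i = k); Pr(x) is obtained by summing the last column.

a0 = {'F': 0.5, 'L': 0.5}                              # begin-state transitions
a  = {'F': {'F': 0.99, 'L': 0.01},
      'L': {'F': 0.2,  'L': 0.8}}
e  = {'F': {r: 1 / 6 for r in range(1, 7)},
      'L': {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}
states = ('F', 'L')

def forward(x):
    # initialisation: emit the first symbol straight out of the begin state
    f = [{k: a0[k] * e[k][x[0]] for k in states}]
    # recursion: f_k(i) = e_k(x_i) * sum_r f_r(i-1) * a_{r k}
    for xi in x[1:]:
        prev = f[-1]
        f.append({k: e[k][xi] * sum(prev[r] * a[r][k] for r in states)
                  for k in states})
    # termination: Pr(x) = sum_k f_k(L)   (no explicit end state here)
    return sum(f[-1].values())

print(forward([6, 2, 6]))   # total probability of the rolls, summed over all paths
```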


The most probable path
The most likely path $\pi^*$ satisfies

  $\pi^* = \arg\max_{\pi} \Pr(x, \pi)$

To find $\pi^*$, consider all possible ways the last symbol of x could have been emitted.

Let $v_k(i)$ = probability of the path $\pi_1, \dots, \pi_i$ most likely to emit $x_1, \dots, x_i$ such that $\pi_i = k$

Then

  $v_k(i) = e_k(x_i) \max_{r} \{ v_r(i-1)\, a_{rk} \}$
The Viterbi Algorithm
• Initialization (i = 0):

  $v_0(0) = 1$, $v_k(0) = 0$ for $k > 0$

• Recursion (i = 1, . . . , L): for each state k

  $v_k(i) = e_k(x_i) \max_{r} \{ v_r(i-1)\, a_{rk} \}$

• Termination:

  $\Pr(x, \pi^*) = \max_{k} \{ v_k(L)\, a_{k0} \}$

To find $\pi^*$, use trace-back, as in dynamic programming (a sketch follows below)
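A minimal Python sketch of this recursion with trace-back for the casino HMM (again omitting an explicit end state, so termination just takes the best score in the final column).

```python
# Sketch of the Viterbi algorithm for the casino HMM, with trace-back.
# v_k(i) = prob. of the most likely path emitting x_1..x_i and ending in state k.

a0 = {'F': 0.5, 'L': 0.5}
a  = {'F': {'F': 0.99, 'L': 0.01},
      'L': {'F': 0.2,  'L': 0.8}}
e  = {'F': {r: 1 / 6 for r in range(1, 7)},
      'L': {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}
states = ('F', 'L')

def viterbi(x):
    v = [{k: a0[k] * e[k][x[0]] for k in states}]      # initialisation
    back = []
    for xi in x[1:]:                                    # recursion
        prev, col, ptr = v[-1], {}, {}
        for k in states:
            best_r = max(states, key=lambda r: prev[r] * a[r][k])
            col[k] = e[k][xi] * prev[best_r] * a[best_r][k]
            ptr[k] = best_r
        v.append(col)
        back.append(ptr)
    last = max(states, key=lambda k: v[-1][k])          # termination (no end state)
    path = [last]                                       # trace-back
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return ''.join(reversed(path)), v[-1][last]

print(viterbi([6, 2, 6]))   # ('LLL', 0.008), matching the worked example below
```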


Viterbi: Example

x = 6, 2, 6

        i = 0   x1 = 6               x2 = 2                                        x3 = 6
  B     1       0                    0                                             0
  F     0       (1/6)(1/2) = 1/12    (1/6)·max{(1/12)·0.99, (1/4)·0.2} = 0.01375   (1/6)·max{0.01375·0.99, 0.02·0.2} = 0.00226875
  L     0       (1/2)(1/2) = 1/4     (1/10)·max{(1/12)·0.01, (1/4)·0.8} = 0.02     (1/2)·max{0.01375·0.01, 0.02·0.8} = 0.008

using the recursion $v_k(i) = e_k(x_i) \max_{r} \{ v_r(i-1)\, a_{rk} \}$ with the casino HMM's emission and transition probabilities given earlier (Fair: all 1/6; Loaded: 1/10 for faces 1–5 and 1/2 for 6; $a_{FF} = 0.99$, $a_{FL} = 0.01$, $a_{LF} = 0.2$, $a_{LL} = 0.8$).

Trace-back from the largest final entry ($v_L(3) = 0.008$) gives the most probable path $\pi^* = LLL$.
Viterbi gets it right more often than not

The numbers in the first rows show 300 rolls of a die. The second rows show which die was actually used for each roll. The third rows show the Viterbi algorithm's prediction.
