Design Presentation
Gene Recognition
Our Objective:
To find the coding and non-coding regions of an unlabeled string of DNA nucleotides
Our Motivation:
Assist in the annotation of genomic data produced by genome sequencing methods
Outline
Problems Faced
Hidden Markov Models
Viterbi Algorithm
Forward-Backward Algorithm
System Architecture
User Interface
HMM Assumptions
Observations are ordered
The random process can be represented by a stochastic finite state machine with emitting states
HMM Parameters
Using the weather example: modeling daily weather for a year
Two types of parameters, represented in two tables:
One for emissions
One for transitions
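A minimal sketch of the two tables, assuming the usual weather setup with hidden states Sunny/Rainy and observed symbols Dry/Wet (the slides give no actual numbers, so these probabilities are purely illustrative):

import numpy as np

states = ["Sunny", "Rainy"]        # hidden states
symbols = ["Dry", "Wet"]           # observable emissions

# Transition table: P(next state | current state); each row sums to 1.
A = np.array([[0.8, 0.2],          # Sunny -> Sunny, Sunny -> Rainy
              [0.4, 0.6]])         # Rainy -> Sunny, Rainy -> Rainy

# Emission table: P(observation | state); each row sums to 1.
B = np.array([[0.9, 0.1],          # Sunny emits Dry, Wet
              [0.3, 0.7]])         # Rainy emits Dry, Wet

# Initial distribution over hidden states on day 1.
pi = np.array([0.6, 0.4])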
HMM Estimation
Called training, this falls under machine learning
Feed an architecture (given in advance) a set of observation sequences
The training process iteratively alters the model's parameters to fit the training set
The trained model assigns the training sequences high probability
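A minimal training sketch, assuming the hmmlearn library (not named in the slides; CategoricalHMM requires a recent version) and illustrative integer-encoded sequences; fit() runs the Baum-Welch EM procedure that iteratively re-estimates the parameter tables:

import numpy as np
from hmmlearn import hmm  # assumption: hmmlearn, not named in the slides

# Training set: observation sequences encoded as integers (0 = Dry, 1 = Wet),
# concatenated into one array with each sequence's length recorded.
seqs = [[0, 0, 1, 1, 0], [1, 1, 1, 0, 1], [0, 1, 0, 0, 0]]
X = np.concatenate(seqs).reshape(-1, 1)
lengths = [len(s) for s in seqs]

# Architecture given in advance: two hidden states. fit() iteratively
# alters the transition/emission parameters to fit the training set.
model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(X, lengths)

print(model.transmat_)          # learned transition table
print(model.emissionprob_)      # learned emission table
print(model.score(X, lengths))  # log-likelihood; high for the training data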
HMM Usage
Two major tasks:
Evaluate the probability of an observation sequence given the model (Forward)
Find the most likely path through the model for a given observation sequence (Viterbi)
Viterbi Algorithm
A dynamic programming algorithm, based on an HMM, for finding the most likely sequence of hidden states, called the Viterbi path
The algorithm assumes that the most likely hidden sequence up to a point t depends only on the observed event at point t and the most likely sequence at point t-1
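A minimal NumPy sketch of that recursion, reusing the illustrative weather tables from above (not values from the slides); delta[t] depends only on delta[t-1] and the observation at t, exactly as stated:

import numpy as np

A = np.array([[0.8, 0.2], [0.4, 0.6]])  # transition table (illustrative)
B = np.array([[0.9, 0.1], [0.3, 0.7]])  # emission table (illustrative)
pi = np.array([0.6, 0.4])               # initial state distribution

def viterbi(obs, A, B, pi):
    """Return the most likely hidden-state path for integer observations obs."""
    T, n = len(obs), A.shape[0]
    delta = np.zeros((T, n))             # best path probability ending in each state
    psi = np.zeros((T, n), dtype=int)    # back-pointer to the best previous state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(n):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]   # trace back from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1, 0], A, B, pi))  # -> [0, 0, 1, 1, 0] (state indices)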
Forward-Backward Algorithm
An inference algorithm for hidden Markov models, used to find the most likely state at any point in time
It involves two passes:
Pass 1: compute a set of forward probabilities, giving the probability of ending up in any particular state after the first k observations in the sequence
Pass 2: compute a set of backward probabilities, giving the probability of observing the remaining observations from any starting point k
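A minimal NumPy sketch of the two passes, again with the illustrative weather tables; as a side note, the forward pass alone also answers the evaluation task above, since the probability of the whole sequence is the sum of the final forward probabilities, alpha[-1].sum():

import numpy as np

A = np.array([[0.8, 0.2], [0.4, 0.6]])  # transition table (illustrative)
B = np.array([[0.9, 0.1], [0.3, 0.7]])  # emission table (illustrative)
pi = np.array([0.6, 0.4])               # initial state distribution

def forward_backward(obs, A, B, pi):
    """Posterior probability of each hidden state at each time step."""
    T, n = len(obs), A.shape[0]
    alpha = np.zeros((T, n))   # forward:  P(obs[0..t], state_t = i)
    beta = np.zeros((T, n))    # backward: P(obs[t+1..] | state_t = i)
    # Pass 1: forward probabilities over the first k observations.
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # Pass 2: backward probabilities over the remaining observations.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                 # combine the two passes
    return gamma / gamma.sum(axis=1, keepdims=True)

gamma = forward_backward([0, 0, 1, 1, 0], A, B, pi)
print(gamma.argmax(axis=1))  # most likely state at each point in time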
System Architecture