Bioinformatics

Bioinformatics is the application of computer science and information technology to understand and organize, on a large scale, the information associated with biological molecules such as DNA and proteins. It involves describing, analyzing, simulating and predicting biological processes using computational tools. The massive amounts of data from biological experiments make analysis complex. Bioinformatics aims to enable new biological insights and to discern unifying biological principles. It has applications in areas such as sequence analysis, structure prediction, and drug development.

Uploaded by

Kaveesh Dashora

WHAT IS BIOINFORMATICS?

• Conceptualizing biology in terms of molecules and applying informatics techniques (applied mathematics, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale.
• MIS (management information system) for molecular biology

FUNDAMENTAL ISSUES:
How to describe, analyze, simulate and predict the dynamics of various biological processes using IT tools.
COMPLEXITY: due to the massive amount of data obtained through numerous biological experiments.
Managing and interpreting biological data:
EMBL database: 17,807,926,047 nucleotides
Total entries: 15,851,373
BIOINFORMATICS
Biology, Computer Science & IT
Ultimate goal: to enable the discovery of new biological insights and to create a global perspective from which unifying principles of biology can be discerned.
Job prospects:
• $300 billion pharmaceutical industry
• Fast-growing biotech industry to support it
• Drug design is becoming genomics-related
• Shortage of bioinformatics and molecular modeling specialists
• Global market for bioinformatics is expected to cross US $60 billion by the year 2006.
Salary range:
US $80,000 to US $200,000 per annum.
BIOINFORMATICS
• Science of using information to understand biological phenomena
• Part of computational biology
Bioinformatics consists of:

• DNA microarrays: technology to measure the relative number of copies of a genetic message (levels of gene expression) at different stages of development or disease.
• Functional genomics: large-scale ways of identifying gene functions and associations.
• Structural genomics: attempts to crystallize or predict the structures of all proteins.
• Comparative genomics: understanding the differences and similarities between all the genes of multiple species (evolution).
• Medical informatics: management of biomedical experimental data.
OBJECTIVES:
• Organizing data
• Developing tools and resources for the analysis of data
• Analysis and interpretation of results

APPLICATIONS:
• Sequence analysis
• Primer design: short sequences used to make many copies of a piece of DNA
• Predicting the function of actual gene products
• Molecular modeling
• Crystallography (structural biology)
• Genetic engineering
BIOINFORMATICS
(AN OVERVIEW)
THE CURRENT EXCITEMENT IN BIOLOGY
THE CHALLENGES OF BIOINFORMATICS

TODAY, TOMORROW AND THE NEAR FUTURE

THE THREE E's: Extracting, Envisaging and Elucidating THE BIOLOGICAL DATA
CENTRAL DOGMA

OF EARLY MOLECULAR BIOLOGY        OF BIOINFORMATICS
DNA                               SEQUENCE
mRNA                              STRUCTURE
PROTEIN                           FUNCTION
(shift of paradigm, from left to right)
THE AGE OF BIOCOMPUTING

THE DNA COMPUTING

(The gene-protein assessment)
THE FUTURE OF BIOINFORMATICS
Genome                              GENOMICS
Protein                             PROTEOMICS
Biochemical pathways, metabolites   (METABONOMICS ???)
Phenotype (function)                FUNCTOMICS
                                    TRANSCRIPTOMICS
~ THE OMICS
THE RELATED DISCIPLINES
OF BIOINFORMATICS
Cheminformatics

Medical Informatics

Health Informatics

Medical computing

Nursing informatics
STRUCTURE PREDICTION: UNLOCKING BIOLOGICAL SECRETS

(GENE IDENTIFICATION TOOLS)

• ExPASy
• SWISS-MODEL
• Geno3D
• CPHmodels, etc.

IN SILICO APPROACHES TO DRUG DEVELOPMENT

ADME PROPERTIES
• ABSORPTION
• DISTRIBUTION
• METABOLISM
• ELIMINATION (EXCRETION)

WHAT IS METABONOMICS ALL ABOUT?

ITS RELATIONSHIP WITH BIOINFORMATICS TOOLS

ANABOLISM + CATABOLISM --> METABOLISM

THE STUDY OF METABOLISM USING BIOINFORMATICS: METABONOMICS

PATTERN RECOGNITION AND DATABASES

• REBASE
• PROSITE
• TRANSCRIPTION FACTOR DATABASE
• TRANSFAC
• EUKARYOTIC PROMOTER DATABASE
BIOINFORMATICIST YAHOOO!!!! I AM
A HYBRID

BIOTECHNOLOGY CONCEPTS

INFORMATION TECHNOLOGY

TOOLS
THE EXTENDING SAGA OF
BIOINFORMATICS

WANT YOUR (SEQUENCE)

MAP.??????
Human Genome Map for
$250 only REGISTER
BUY ONE AND GET
MOUSE GENOME
TRANSCRIPTOME
FREE
LIVING WITH BIOINFORMATICS?????

WAAHH!!!
MOUSEMAN….
WONDER HAVE THEY
JUST SEQUENCED
MOUSE GENOME MAP?
SEQUENCES

Sequences: viewed as strings of characters for convenience of understanding and for performing mathematical operations.

• Proteins and DNA may be similar with respect to their function, structure, or primary sequence of amino acids or nucleotides.
• Sequence determines shape; shape determines function.
• We study sequence similarity to discover similarity in shape and function.
Similarity in sequences

Quantitative: a similarity measure, i.e. a number expressing the degree to which two sequences are similar.
Qualitative: an alignment, i.e. a mutual arrangement of two sequences that shows where the two sequences are similar and where they differ.

Optimal alignment: the alignment that exhibits the most correspondences and the fewest differences.
BIOLOGICAL MOTIVATIONS OF SEQUENCE ANALYSIS
• A large variety of biological problems involve sequences.
• Sequence alignment is useful for discovering information related to function, structure and evolution.
Examples:
• Reconstructing long sequences of DNA from overlapping string fragments.
• Determining physical and genetic maps from probe data under various experimental protocols.
• Storing, retrieving and comparing DNA strings.
• Comparing two or more strings for similarities to find related proteins.
• Exploring frequently occurring patterns of nucleotides.
• Finding informative elements in protein and DNA sequences.
• Identifying an unknown sequence.
• Finding other members of multigene families.
Aim: learn the function and structure of a protein without performing experiments and without physically constructing the protein itself.

Basic idea: similar sequences produce similar proteins.

Predict the characteristics of a protein from its sequence data.

Example: suppose two protein sequences are identical at 25% of their positions. Such an association was found between cancer and uncontrolled cell growth, by comparing the sequence of a cancer-associated gene with the sequence of a protein that influences cell growth.
• The correlation was very high.
• This pointed to a connection between the two.
[Diagram: IDENTICAL; SIMILAR, which divides into ANALOGOUS and HOMOLOGOUS; HOMOLOGOUS divides into ORTHOLOGOUS and PARALOGOUS]
Concepts:

Identical: when a corresponding character is shared between two species, that character is said to be identical.
Similar: the degree to which two species or populations share identities.
Homologous: when characters are similar due to common ancestry.
Analogous: when characters are similar due to convergent evolution.
Orthologous: homologous characters separated by a speciation event; they typically have a conserved function.
Paralogous: homologous characters separated by a gene duplication event; they often have divergent functions.

SIMILARITY & DIFFERENCES
Similarity: the maximal sum of weights, where weights are assigned according to resemblance.

Differences arise from mutations that modify DNA sequences:

• Insertion of one or more letters into a sequence.
• Deletion of one or more letters from a sequence.
• Substitution of one letter by another.

The notion of distance: assign a weight to each mutation.

Distance: the minimal sum of weights over a set of mutations that transforms one sequence into the other.
MODELS FOR SEQUENCE ANALYSIS
• Global alignment
Input: two strings S and T of roughly the same length.
Q: What is the difference (similarity) between the two?
The alignment is done across the entire sequence length, including the sequence ends, to include as many matches as possible.
• Local alignment / similarity (more meaningful)
Input: two strings S and T.
Q: What is the maximum similarity (minimum difference) between a substring of S and a substring of T?
Q: What are these most similar substrings?
Example: S = a b c x d e x
         T = x x c d e
We give each match a value of 2 and each mismatch a value of -1 (a space is assumed here to score -1 as well).
α = c x d e
β = c - d e
This is an optimal local alignment, with score 2 - 1 + 2 + 2 = 5.
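
The local alignment question can be sketched with the Smith-Waterman recurrence. This is a minimal illustration, not a production aligner: the function name `smith_waterman` is hypothetical, and the space score of -1 is an assumption (the slide only gives the match and mismatch values).

```python
def smith_waterman(s, t, match=2, mismatch=-1, space=-1):
    """Return (best_score, aligned_s, aligned_t) for one best local alignment.

    The space score is an assumed value; the slide only states
    match = 2 and mismatch = -1.
    """
    rows, cols = len(s) + 1, len(t) + 1
    # H[i][j] = best score of a local alignment ending at s[i-1] and t[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align s[i-1] with t[j-1]
                          H[i - 1][j] + space,     # s[i-1] against a space
                          H[i][j - 1] + space)     # t[j-1] against a space
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + space:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("abcxdex", "xxcde"))  # (5, 'cxde', 'c-de')
```

With these assumed weights the sketch recovers the substrings α and β from the example. Other alignments can tie for the best score; the traceback returns only one of them.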
Models for sequence analysis (continued)

• End-space-free alignment

Input: two strings S and T of different lengths.
Q: What is the maximum similarity between substrings of S and T?
Given: at least one of these substrings must be a prefix of its original string and one (not necessarily the other) must be a suffix.

Example:
S = - - c a c - d b d v l
T = l t c a b d d b - - -
The two leading spaces at the left end of the alignment are free, as are the three trailing spaces at the right-hand end.
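
One common way to make end spaces free is to start the first row and column of the dynamic programming table at zero and to take the best value from the last row or last column. The sketch below assumes that formulation; the function name `ends_free_score` and the weights (match 2, mismatch -1, interior space -1) are illustrative choices, not values from the slides.

```python
def ends_free_score(s, t, match=2, mismatch=-1, space=-1):
    """Best alignment score when leading and trailing spaces cost nothing.

    Weights are assumed values chosen for illustration.
    """
    rows, cols = len(s) + 1, len(t) + 1
    # Row 0 and column 0 stay at zero: leading spaces are free.
    V = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + space,
                          V[i][j - 1] + space)
    # Trailing spaces are free: take the best cell in the last row or column.
    return max(max(V[-1]), max(row[-1] for row in V))


# A suffix of the first string overlaps a prefix of the second.
print(ends_free_score("xxxACGT", "ACGTyyy"))  # 8
```

This variant is useful when two fragments are expected to overlap only at their ends, as in sequence assembly.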

• Gap penalty
Input: two strings S and T of different lengths.
Q: Define a gap as any maximal consecutive run of spaces, and the length of a gap as its number of indel operations. What is the similarity between the two strings, given a weight function for gaps?
Example:
S = a t t c - - g a - t g g a c c
T = a - - c g t g a t t - - - c c
There are four gaps containing a total of eight spaces.
The alignment would then be described as having:
• Seven matches
• No mismatches
• Eight spaces

• The length of a gap is its number of indel operations.
• The concept of a gap in an alignment is important in many biological applications.
• Mutational events create gaps of varying sizes.
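
The counts in the example can be checked mechanically. The sketch below tallies matches, mismatches, spaces and gaps for a given alignment, then applies one possible gap weight function (a fixed cost per gap plus a cost per space). The function names and all weights are assumptions made for illustration.

```python
import re


def describe_alignment(row_s, row_t):
    """Count matches, mismatches, spaces and gaps in a two-row alignment."""
    if len(row_s) != len(row_t):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = spaces = 0
    for a, b in zip(row_s, row_t):
        if a == "-" or b == "-":
            spaces += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    # A gap is a maximal run of consecutive spaces within one row.
    gaps = len(re.findall(r"-+", row_s)) + len(re.findall(r"-+", row_t))
    return {"matches": matches, "mismatches": mismatches,
            "spaces": spaces, "gaps": gaps}


def gap_weighted_score(stats, match=2, mismatch=-1, per_gap=-2, per_space=-1):
    """Score with an assumed weight function: per-gap cost plus per-space cost."""
    return (match * stats["matches"] + mismatch * stats["mismatches"]
            + per_gap * stats["gaps"] + per_space * stats["spaces"])


stats = describe_alignment("attc--ga-tggacc", "a--cgtgatt---cc")
print(stats)                      # 7 matches, 0 mismatches, 8 spaces, 4 gaps
print(gap_weighted_score(stats))  # 14 - 8 - 8 = -2 with the assumed weights
```

Charging separately for opening a gap and for each space in it lets one long gap cost less than several short ones, which fits the observation that a single mutational event can create a gap of any size.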

METHODS OF ALIGNMENT
• DOT MATRIX: useful for simple alignments; however, it does not show the aligned sequences or produce an optimal alignment.
• BRUTE FORCE: produces alignments without gaps and has N^2 complexity, where N is the length of the sequences.
• DYNAMIC PROGRAMMING: produces an optimal alignment by starting an alignment from one end (as in the dot matrix) and keeping track of all possible best alignments up to that point.
• HEURISTIC METHODS: fast computational methods. They may not be as accurate as dynamic programming.

GRAPHIC SIMILARITY COMPARISONS:

[Figure: sliding comparison of the sequences GGCTTGACCGG and GGATTGACCCG at successive offsets]

SIMILARITY VERSUS DISTANCE
1. Elements of the matrices specify the weight to assign to a given comparison by:
• the cost of replacing one residue with another (distance); or
• a measure of the similarity for the replacement.
2. Similarity is used for database searching.
3. Distance is more applicable to phylogenetic tree reconstruction.
4. Maximizing the similarity is fundamentally the same as minimizing a distance. Hence distance and similarity matrices are interconvertible by a mathematical transformation appropriate for the given application.

SIMILARITY                        DISTANCE
Local alignment                   Evolution & phylogeny
Suited for comparing proteins     Triangle inequality
GLOBAL Vs LOCAL SIMILARITY
Global algorithms are not sensitive for highly diverged sequences; local similarity is a better and faster method. The three most widely used local similarity algorithms are Smith-Waterman, FASTA and BLAST.
Smith-Waterman:
• It is a rigorous dynamic programming approach.
• It does not make use of heuristic shortcuts.
FASTA: (developed by Lipman & Pearson in 1985)
• It considers exact matches between short substrings whose length is set by a parameter.
• It allows speed to be traded off against precision: the larger the parameter, the smaller the number of exact matches.
• A larger parameter makes the program faster but loses precision.
BLAST: (developed by Altschul et al. in 1990)
• It focuses on no-gap alignments of a certain, fixed length.
• It uses a scoring function to measure similarity rather than distance.
• It reports to the user all database entries that have a segment pair scoring higher than the threshold parameter.
[Figure: FASTA algorithm]
BLAST (Basic Local Alignment Search Tool)
is a similarity search program developed by the research staff at NCBI/GenBank. It is available as a free service over the Internet and provides very fast and accurate database searching.

BLAST goes through the following 3 steps:

• It takes each word from the query sequence (3 amino acids or 11 nucleotides) and looks for similar words in the database sequence.
• If similar words are found, BLAST tries to expand the alignment to the adjacent words.
• After all words are tested, a set of HSPs (High-scoring Segment Pairs) is chosen for that database sequence.
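
The three steps can be sketched as a toy seed-and-extend search. This is not the BLAST implementation: it seeds on exact word matches only (real BLAST also accepts similar words above a score threshold), and the function names, the scores and the `drop` cut-off are all assumptions made for illustration.

```python
def _extend(query, subject, q, s, w, match, mismatch, drop):
    """Extend an exact word hit in both directions without gaps."""
    score = best = w * match
    q_end, s_end = q + w, s + w            # exclusive ends of the best segment
    i, j = q_end, s_end
    while i < len(query) and j < len(subject) and best - score <= drop:
        score += match if query[i] == subject[j] else mismatch
        i += 1
        j += 1
        if score > best:
            best, q_end, s_end = score, i, j
    score = best
    q_start, s_start = q, s
    i, j = q - 1, s - 1
    while i >= 0 and j >= 0 and best - score <= drop:
        score += match if query[i] == subject[j] else mismatch
        if score > best:
            best, q_start, s_start = score, i, j
        i -= 1
        j -= 1
    return best, query[q_start:q_end], subject[s_start:s_end]


def seed_and_extend(query, subject, word_size=3, match=1, mismatch=-1, drop=2):
    """Return segment pairs as (score, query_segment, subject_segment), best first."""
    # Step 1: index every word of the query by its start position.
    words = {}
    for q in range(len(query) - word_size + 1):
        words.setdefault(query[q:q + word_size], []).append(q)
    # Step 2: scan the subject for the same words and extend each hit.
    segments = set()
    for s in range(len(subject) - word_size + 1):
        for q in words.get(subject[s:s + word_size], []):
            segments.add(_extend(query, subject, q, s, word_size,
                                 match, mismatch, drop))
    # Step 3: report the segment pairs, highest score first.
    return sorted(segments, key=lambda seg: -seg[0])


print(seed_and_extend("ACGTTGCA", "TTACGTTGCATT")[0])
```

Extension stops once the running score falls more than `drop` below the best score seen so far, which keeps the search from wandering far into unrelated sequence.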
[Figure: BLAST algorithm]
POPULAR SCORING MODELS FOR PROTEIN SEQUENCES
There are two popular scoring models for protein sequences: PAM and BLOSUM. PAM stands for Point Accepted Mutation (also expanded as Percent Accepted Mutation) and BLOSUM stands for BLOcks SUbstitution Matrix.
PAM is
• based on an explicit evolutionary model
• representative of a specific evolutionary distance
• able to range from identical to completely random
BLOSUM is
• based on empirical frequencies
• always a blend of distances, as seen in the database and PROSITE
• narrower in range than the PAM matrices
[Figure: representation of dot plots]
GRAPHIC SIMILARITY COMPARISONS
• Use the power of the computer to present relationships between sequences.
• Similarity between two sequences can be detected as a diagonal on an identity matrix.
• To determine the similarity of sequences, we must compare all parts of one sequence with all parts of the other.
• The alignment with the greatest number of identities would be the optimal alignment.
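
A dot plot is simple to build: mark every cell where the two residues are identical and look for diagonals. The sketch below uses the two sequences from the graphic comparison earlier in the slides; the function names are hypothetical.

```python
def dot_matrix(s, t):
    """Identity matrix: cell (i, j) is True when s[i] == t[j]."""
    return [[a == b for b in t] for a in s]


def render(s, t):
    """Draw the matrix as text, with '*' for an identity and '.' otherwise."""
    lines = ["  " + " ".join(t)]
    for ch, row in zip(s, dot_matrix(s, t)):
        lines.append(ch + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


def diagonal_identities(s, t, offset=0):
    """Count identities on the diagonal where t is shifted by `offset`."""
    return sum(1 for i in range(len(s))
               if 0 <= i + offset < len(t) and s[i] == t[i + offset])


s, t = "GGCTTGACCGG", "GGATTGACCCG"
print(render(s, t))
print(diagonal_identities(s, t))  # 9 identities on the main diagonal
```

Scanning every offset and keeping the diagonal with the most identities gives the best ungapped alignment. Finding alignments with gaps requires dynamic programming.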
[Figures: graphic similarity comparisons; representation of the scoring matrix; representation of sequences s and t]
METHODS FOR OPTIMAL ALIGNMENT
Global sequence alignment:
• Dynamic programming is used here; it is a method that breaks the alignment of sequences down into small parts.
• It is comparable to moving across a dot matrix and keeping track of all the matching pairs.
• This sequence alignment method predates dot-matrix searches and all of the alignment methods in use today.
• Over the course of evolution, some positions undergo base or amino acid substitution, and bases or amino acids can be inserted or deleted.
Local alignment:
• The Smith-Waterman dynamic programming algorithm is used for local alignment.
• The algorithm gives the highest-scoring local match between two sequences.
• The alignment is arrived at by starting at the highest-scoring position in the scoring matrix and following a trace path back to a box that scores zero.
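EXAMPLE:
• Calculate a dynamic programming matrix and alignment for the sequences ATT and TTC. How many optimal alignments are there?
Matrix (edit distances; rows follow ATT, columns follow TTC):
      -  T  T  C
   -  0  1  2  3
   A  1  1  2  3
   T  2  1  1  2
   T  3  2  1  2
Alignment:
ATT
TTC
The other optimal alignment is:
ATT-
-TTC
Both alignments have a distance of 2, so there are two optimal alignments.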
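
The matrix in this example is consistent with unit costs: 0 for a match and 1 for a mismatch, insertion or deletion. Under that assumption, the sketch below rebuilds the matrix and enumerates every optimal alignment by following all tied traceback moves. The function names are hypothetical.

```python
def edit_distance_matrix(s, t):
    """D[i][j] = minimum number of edits turning s[:i] into t[:j]."""
    D = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        D[i][0] = i
    for j in range(len(t) + 1):
        D[0][j] = j
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + cost,  # match or substitution
                          D[i - 1][j] + 1,         # deletion from s
                          D[i][j - 1] + 1)         # insertion into s
    return D


def optimal_alignments(s, t):
    """List every alignment whose cost equals the edit distance."""
    D = edit_distance_matrix(s, t)
    results = []

    def walk(i, j, top, bottom):
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            walk(i - 1, j - 1, top + s[i - 1], bottom + t[j - 1])
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            walk(i - 1, j, top + s[i - 1], bottom + "-")
        if j > 0 and D[i][j] == D[i][j - 1] + 1:
            walk(i, j - 1, top + "-", bottom + t[j - 1])

    walk(len(s), len(t), "", "")
    return results


for row in edit_distance_matrix("ATT", "TTC"):
    print(row)
print(optimal_alignments("ATT", "TTC"))
```

The bottom-right cell holds the distance, 2, and the enumeration returns exactly the two alignments shown in the example.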
[Figure: construction of the optimal alignment]
Hidden Markov Model
• HMMs derive from Markov chains, which concentrate only on the sequence of states.
• Since the early 1970s
– Applied in speech recognition research
• The early 1990s
– The model was introduced to the bioinformatics community
– Sequence modeling, multiple alignment, protein structure prediction and profiling
Markov Chains
• Markov property of order 1
• Formally:
$$P(X_0, X_1, \ldots, X_t) = P(X_0)\,P(X_1 \mid X_0)\,P(X_2 \mid X_0, X_1) \cdots P(X_t \mid X_0, \ldots, X_{t-1})$$
$$= P(X_0)\,P(X_1 \mid X_0)\,P(X_2 \mid X_1) \cdots P(X_t \mid X_{t-1})$$

– State space = list of possible values for X
– Transition matrix = probability of moving from one X to another
– Initial distribution = probability of each initial value of X
• CS intuition
– Stochastic finite automaton
[Figure: state diagram with states S0, S1 and S2]
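Markovian Sequence
• The states through which the chain passes form a sequence
• Example: S0, S1, S1, S1, S0, S1, ...
$$P(\text{seq}) = P(S_0, S_1, S_1, S_1, S_0, S_1, \ldots) = \pi(S_0)\,P(S_1 \mid S_0)\,P(S_1 \mid S_1) \cdots$$
[Figure: state diagram with states S0, S1 and S2; the transition probabilities shown include 0.5, 0.45 and 0.2]

• Markov chain for generating a DNA sequence

S = AGATCG
$$P(AGATCG) = \pi(A)\,P(G \mid A)\,P(A \mid G)\,P(T \mid A)\,P(C \mid T)\,P(G \mid C)$$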
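
The product for a DNA sequence can be evaluated directly once an initial distribution and a transition matrix are chosen. In the sketch below both are uniform placeholders, not estimated parameters, and the function name `sequence_log_prob` is hypothetical. Logarithms are summed to avoid underflow on long sequences.

```python
import math

BASES = "ACGT"
# Hypothetical parameters: every base and every transition equally likely.
INITIAL = {b: 0.25 for b in BASES}
TRANSITION = {a: {b: 0.25 for b in BASES} for a in BASES}


def sequence_log_prob(seq, initial, transition):
    """log P(seq) under a first-order Markov chain."""
    logp = math.log(initial[seq[0]])              # pi(first base)
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(transition[prev][cur])   # P(cur | prev)
    return logp


logp = sequence_log_prob("AGATCG", INITIAL, TRANSITION)
print(math.exp(logp))  # equals 0.25 ** 6 for the uniform placeholder model
```

A chain trained on real sequence would replace the uniform tables with observed base and dinucleotide frequencies.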
Hidden Markov Models (HMMs)
• The observed sequence is a probabilistic function of an underlying Markov chain
– Example: HMM for a (noisy) DNA sequence
• The true state sequence is unknown, but the observation sequence gives us a clue

[Figure: emission probabilities from each state]
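
Because the true states are hidden, the probability of an observed sequence is obtained by summing over all possible state paths. The forward algorithm does this efficiently. The sketch below assumes a noisy-DNA model in which the hidden state is the true base and the observation is a possibly misread base; every probability in it is an invented placeholder.

```python
STATES = "ACGT"
# Hypothetical model: the true base tends to persist and is read correctly
# 91% of the time. None of these numbers come from the slides.
START = {s: 0.25 for s in STATES}
TRANS = {r: {s: (0.7 if r == s else 0.1) for s in STATES} for r in STATES}
EMIT = {s: {o: (0.91 if o == s else 0.03) for o in STATES} for s in STATES}


def forward(observations, states, start, trans, emit):
    """P(observations), summed over every hidden state path."""
    # alpha[s] = P(observations so far, current hidden state is s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())


print(forward("AGATCG", STATES, START, TRANS, EMIT))
```

The cost grows linearly with the sequence length and quadratically with the number of states, instead of exponentially as it would if each path were enumerated.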

MSA (MULTIPLE SEQUENCE ALIGNMENT)

It is a tool to determine levels of homology, and hence relatedness, between the members of a series of globally related sequences.

Tools for MSA:

• Sum-of-pairs method
• Star alignment
• Two-step method (Clustal and Pileup approaches)
• Automated tools (MACAW, MEME, etc.)
[Figure: global & local MSA (multiple sequence alignment)]
Example: SP (sum-of-pairs) method
• The sum-of-pairs function scores each position in the protein alignment, that is, each column, as the sum of the pairwise scores. For k sequences, there are k(k-1)/2 unique pairwise comparisons, excluding self-comparisons. Here, in column three, the score would be

SP-score(I, -, I, V) = p(I, -) + p(I, I) + p(I, V) + p(-, I) + p(-, V) + p(I, V)

where p(a, b) is the pairwise score of two amino acids.
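
The column score is a sum over all unordered pairs of residues in the column. The sketch below uses a deliberately simple stand-in for p(a, b); a real analysis would look scores up in a substitution matrix such as PAM or BLOSUM. The function names and score values are assumptions.

```python
from itertools import combinations


def simple_pair_score(a, b):
    """Hypothetical stand-in for p(a, b); not a real substitution matrix."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 2 if a == b else -1


def sp_column_score(column, pair_score=simple_pair_score):
    """Sum of pairwise scores over all k(k-1)/2 pairs in one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


column = ("I", "-", "I", "V")
print(len(list(combinations(column, 2))))  # 6 pairs for k = 4
print(sp_column_score(column))             # -2 + 2 - 1 - 2 - 2 - 1 = -6
```

The score of a whole multiple alignment is the sum of its column scores.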

[Figure: optimal alignment between k = 3 sequences, where k is the number of sequences]

HMM for Multiple Alignment
• “Match” states are alignment sequence positions
• Position-specific deletion penalties
• Position-specific insertion frequencies
• A path through the states aligns a sequence to the model
[Figure: example of an HMM model, showing transition probabilities (T) and emission probabilities (e)]

Scoring in HMM model
• Score of aCCy along the path:
log_e(0.4) + log_e(0.3) + log_e(0.46) + log_e(0.6) + log_e(0.97) + log_e(0.5) + log_e(0.015) + log_e(0.73) + log_e(0.01) + log_e(1) = -13.25
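
The path score is the sum of the natural logarithms of the probabilities met along the path, which equals the logarithm of their product. The sketch below reproduces the arithmetic. The model figure is not available in this text, so which values are transition probabilities and which are emission probabilities is not known; they are simply listed in the order given.

```python
import math

# Probabilities along the path for aCCy, in the order given in the slide.
PATH_PROBABILITIES = [0.4, 0.3, 0.46, 0.6, 0.97, 0.5, 0.015, 0.73, 0.01, 1.0]


def path_log_score(probabilities):
    """Sum of natural logs of the probabilities along one path."""
    return sum(math.log(p) for p in probabilities)


score = path_log_score(PATH_PROBABILITIES)
print(round(score, 2))  # -13.25
```

Working in log space turns a product of many small numbers into a sum, which avoids numerical underflow for long sequences.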
