SECT 5 SL L1-Rev

Revision Lecture

CSC8312

Prof. A. Wipat
Exam format

• 3 questions – answer 2
• Total 1.5 hours: 45 minutes each

CSC8312 Bioinformatics Theory and Applications 2

Some tips
• RTQ: read the question carefully
• Quickly sketch a rough outline of your answer before writing it; this helps structure it
• Don't panic
• Don't spend too long on one answer: if you don't attempt the others, the examiner will not be able to give you any marks for them
  • Always attempt every question; there is a trade-off of time, and the first 20 minutes spent on an answer are the most productive
• Watch the time: if you run out, at least list the headings/themes you were going to write about, with some bullet points
• Give examples/diagrams, no matter how simple
• Learn as much as you can: revise properly and check that you can write it down from memory
• Targeted revision is a gamble; it is best to learn it all!
• Everything covered in lectures and practicals is examinable
• Evidence of external reading impresses examiners and is essential for a very good mark.

CSC8312 Module Overview

• Section 1: Lect 1: Molecular and Cellular Biology
• Section 1: Lect 2: Genetic Material and Genomes
• Section 1: Lect 3: Gene Expression, Transcriptomes and Proteomes
• Section 2: Lect 1: DNA and genome sequencing
• Section 2: Lect 2: Sequence data and annotation
• Section 3: Lect 1: Evolution
• Section 3: Lect 2: Sequence Similarity and Comparison
• Section 3: Lect 3: Sequence Similarity Algorithms
• Section 3: Lect 4: Multiple Sequence Alignment
• Section 4: Lect 1: Microarray Data Analysis
• Section 4: Lect 2: Data Standards and Ontologies
• Section 4: Lect 3: Proteomics
• Section 4: Lect 4: Protein structure
• Section 4: Lect 5: Protein Structure Prediction
• Section 4: Lect 6 & 7: Biological Networks

2004 paper

Example Questions
Protein sequence divergence

• How far can sequences diverge and still be related?
• As sequences diverge, the numbers of non-conservative mutations and of gaps (insertions/deletions) increase
• In some cases protein sequences can diverge to a great extent and still be functionally related, e.g. some globin sequences share only 8% amino acid identity
  • This makes it hard to detect related proteins, i.e. homologues.
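Percent identity figures such as the 8% quoted for some globins depend on how identity is counted. The following is a minimal sketch, assuming two sequences that are already aligned (gaps written as `-`) and counting identity only over columns where neither sequence has a gap; other conventions divide by the alignment length or by the length of the shorter sequence instead. The function name and the example fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues among aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contain a residue (gap columns are skipped).
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("VLSPADKTNV", "VLSEGEWQLV"))  # 40.0
```

Low identity alone does not rule out homology, which is why the scoring matrices discussed next matter for detecting distant relatives.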

Homology, Orthology and Paralogy

• Different types of homologous sequences:
  • Orthologous genes (orthologues or orthologs):
    • Homologous genes from different genomes ascribed to the same gene family
    • Evolved by speciation
    • Often encode proteins that perform the same function in different organisms

Homology, Orthology and Paralogy

  • Paralogous genes (paralogues or paralogs):
    • Homologous genes within the same genome ascribed to the same gene family
    • Created by gene duplication
    • May perform different functions in the same host, as their function may change during evolution (lack of functional constraint)

Paralogs can possess different functions

From Biochemistry, Stryer

Paralog & Ortholog

• “Two genes are said to be paralogous if they are derived from a duplication event, but orthologous if they are derived from a speciation event.”
  W-H Li
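The W-H Li definition can be made concrete with a small sketch: given a gene tree in which every internal node is labelled with the event that produced it, two genes are orthologous if their last common ancestor is a speciation node and paralogous if it is a duplication node. The tree, the gene labels (`human_A`, `mouse_B`, …) and the helper functions below are hypothetical, chosen only for illustration; real analyses first have to infer the gene tree and reconcile it with a species tree.

```python
# Hypothetical, simplified gene tree given as child -> parent links.
# One duplication (A/B) happened before the human/mouse speciation.
PARENT = {
    "human_A": "speciation_A", "mouse_A": "speciation_A",
    "human_B": "speciation_B", "mouse_B": "speciation_B",
    "speciation_A": "duplication", "speciation_B": "duplication",
}
# Event that produced each internal node's two descendant lineages.
EVENT = {
    "duplication": "duplication",
    "speciation_A": "speciation",
    "speciation_B": "speciation",
}


def ancestors(node: str) -> list[str]:
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def last_common_ancestor(a: str, b: str) -> str:
    """First ancestor of b that is also an ancestor of a."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("genes do not share an ancestor in this tree")


def relationship(a: str, b: str) -> str:
    """Orthologous if the genes split at a speciation event, paralogous if at a duplication."""
    event = EVENT[last_common_ancestor(a, b)]
    return "orthologous" if event == "speciation" else "paralogous"


print(relationship("human_A", "mouse_A"))  # orthologous
print(relationship("human_A", "human_B"))  # paralogous
print(relationship("human_A", "mouse_B"))  # paralogous
```

The last case shows why the definition is about the splitting event rather than about being in different species: genes in two species can still be paralogs if they trace back to a duplication.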

Homology, Orthology and Paralogy

• Orphans
  • Single-copy genes without any homolog
• Strain-specific expansions
  • Gene families of paralogs without any orthologs
  • Thus confined to the same family
• Xenologs
  • Homologous genes where one gene has been acquired by horizontal gene transfer (the transfer of genetic material between organisms)
  • Comparative analysis of bacterial, archaeal and eukaryotic genomes indicates that a significant fraction of the genes in prokaryotic genomes have been subject to horizontal transfer.

The PAM matrices

• The method is based on our knowledge of evolution:
  • As sequences diverge they accumulate mutations
• The probability of a particular change can be derived from aligning homologous sequences
• e.g. we can count the number of S->T changes and calculate the relative probability of this happening
• The relative frequencies of all amino acid changes can be used to derive a scoring matrix: the PAM matrix
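The counting step can be sketched as follows, assuming pairs of closely related, already aligned, gap-free sequences. Each difference is counted in both directions (the direction of change is not known) and the counts are turned into relative frequencies. This is a simplification: Dayhoff's actual procedure counted accepted changes on phylogenetic trees of closely related sequences and also normalised by each amino acid's relative mutability. The sequences and function names are hypothetical.

```python
from collections import Counter


def count_substitutions(aligned_pairs):
    """Count residue changes X->Y across aligned pairs of closely related sequences.

    Input is assumed to be gap-free. The direction of a change is unknown,
    so each difference is counted both ways.
    """
    counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        for x, y in zip(seq_a, seq_b):
            if x != y:
                counts[(x, y)] += 1
                counts[(y, x)] += 1
    return counts


def relative_frequencies(counts):
    """Turn raw change counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {change: n / total for change, n in counts.items()}


# Hypothetical, very similar aligned sequences (illustration only).
pairs = [("ASTRS", "ATTRS"), ("GSTAK", "GTTAK"), ("KSLAV", "KSLGV")]
counts = count_substitutions(pairs)
freqs = relative_frequencies(counts)
print(counts[("S", "T")])           # 2
print(round(freqs[("S", "T")], 3))  # 0.333
```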

The PAM matrices

• Problem: what if there have been multiple substitutions at the same site?
  • These would bias the statistics
  • Samples are therefore restricted to sequences that are sufficiently similar that this has probably not happened
• The PAM is a measure of sequence divergence
• 1 PAM = 1% accepted mutation, i.e. 1 accepted point mutation per 100 residues
• Two sequences that are 1 PAM apart are 99% identical
• For example, a 1 PAM substitution matrix is produced by collecting statistics from sequences 1 PAM apart.

The PAM matrices

• However, we need to assert relatedness between more widely divergent sequences (99% similarity is not much use)
• How can we do this?
• We can extrapolate other matrices from the 1 PAM matrix
• Powers of the matrix can be taken to produce matrices more appropriate for more divergent sequences
  • i.e. the matrix is multiplied by itself: the n PAM matrix is the 1 PAM matrix raised to the nth power
• A range of PAM matrices has been derived in this way.
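The extrapolation step can be sketched with a made-up 3-letter alphabet and a hypothetical 1-PAM-style mutation probability matrix (the real PAM1 is 20 × 20). Each row holds the probabilities that one residue is replaced by each residue after 1 PAM; raising the matrix to the 250th power gives the corresponding probabilities after 250 PAMs. The powers are taken of the probability matrix, before any log-odds conversion. The names `PAM1_TOY`, `mat_mult` and `mat_power` are illustrative only.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(m, power):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mult(result, m)
    return result


# Hypothetical 1-PAM-style mutation probability matrix for a toy 3-letter alphabet:
# row i gives the probability that residue i is replaced by residue j after 1 PAM.
PAM1_TOY = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.004, 0.006, 0.990],
]

pam250_toy = mat_power(PAM1_TOY, 250)
for row in pam250_toy:
    print([round(p, 3) for p in row])
```

With NumPy available, `numpy.linalg.matrix_power(m, 250)` computes the same power more efficiently; plain lists keep the sketch dependency-free. Every row of every power still sums to 1, while the diagonal (the probability of no net change) shrinks as divergence grows.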

PAM Matrices contd.
• The PAM matrices are named with the % divergence after the PAM designation.
• Thus the PAM250 matrix represents a level of 250% change expected (over 2500 million years), i.e. 250 accepted point mutations per 100 residues
  • Because many sites have changed more than once, sequences at this level of divergence still have around 20% similarity
• The PAM250 matrix (corresponding to ~20% overall sequence similarity) represents the lowest similarity we can use for sequence analysis
• When used for protein comparison, the mutation probability (odds) matrix is normalized (each probability is divided by the background frequency of the replacing amino acid) and the logarithm is taken; this lets us add the scores along a protein instead of multiplying the probabilities
• The numbers are multiplied by ten to avoid decimal points
• The resulting matrix is a “log-odds” matrix.
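The log-odds conversion can be sketched for a hypothetical 3-letter alphabet. Each mutation probability M[i][j] is divided by the background frequency of the replacing residue j, the base-10 logarithm is taken, and the result is multiplied by 10 and rounded, as described above. The alphabet, frequencies and probabilities are invented; they were chosen so that f[i]·M[i][j] = f[j]·M[j][i], which makes the resulting scores symmetric. Because the scores are logarithms, an alignment is scored by adding them.

```python
import math

# Hypothetical toy alphabet, background frequencies and mutation probabilities.
# Chosen so that FREQS[i] * M[i][j] == FREQS[j] * M[j][i] (keeps scores symmetric).
ALPHABET = "XYZ"
FREQS = [0.5, 0.3, 0.2]
M = [
    [0.90, 0.06, 0.04],
    [0.10, 0.85, 0.05],
    [0.10, 0.075, 0.825],
]


def log_odds_matrix(m, freqs, scale=10):
    """Divide each probability by the background frequency of the replacing
    residue, take log10, multiply by `scale` and round to an integer."""
    n = len(m)
    return [[round(scale * math.log10(m[i][j] / freqs[j])) for j in range(n)] for i in range(n)]


def alignment_score(a, b, scores, alphabet=ALPHABET):
    """Because the scores are logarithms, an ungapped alignment score is a simple sum."""
    index = {c: k for k, c in enumerate(alphabet)}
    return sum(scores[index[x]][index[y]] for x, y in zip(a, b))


S = log_odds_matrix(M, FREQS)
print(S)                                 # [[3, -7, -7], [-7, 5, -6], [-7, -6, 6]]
print(alignment_score("XYZ", "XZZ", S))  # 3
```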

PAM 250 logarithm of odds matrix (S250 matrix)

C 12
S 0 2
T -2 1 3
P -3 1 0 6
A -2 1 1 1 2
G -3 1 0 -1 1 5
N -4 1 0 -1 0 0 2
D -5 0 0 -1 0 1 2 4
E -5 0 0 -1 0 0 1 3 4
Q -5 -1 -1 0 0 -1 1 2 2 4
H -3 -1 -1 0 -1 -2 2 1 1 3 6
R -4 0 -1 0 -2 -3 0 -1 -1 1 2 6
K -5 0 0 -1 -1 -2 1 0 0 1 0 3 5
M -5 -2 -1 -2 -1 -3 -2 -3 -2 -1 -2 0 0 6
I -2 -1 0 -2 -1 -3 -2 -2 -2 -2 -2 -2 -2 2 5
L -6 -3 -2 -3 -2 -4 -3 -4 -3 -2 -2 -3 -3 4 2 6
V -2 -1 0 -1 0 -1 -2 -2 -2 -2 -2 -2 -2 2 4 2 4
F -4 -3 -3 -5 -4 -5 -4 -6 -5 -5 -2 -4 -5 0 1 2 -1 9
Y 0 -3 -3 -5 -3 -5 -2 -4 -4 -4 0 -4 -4 -2 -1 -1 -2 7 10
W -8 -2 -5 -6 -6 -7 -4 -7 -7 -5 -3 2 -3 -4 -5 -2 -6 0 0 17
C S T P A G N D E Q H R K M I L V F Y W
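To use a lower-triangular table like this one programmatically, it can be expanded into a symmetric lookup. The sketch below assumes the standard PAM250 values, transcribed row by row in the table's residue order; the helper names are hypothetical, and the ungapped scoring function ignores gap penalties entirely.

```python
# Residue order of the PAM250 table (rows and columns).
ORDER = "CSTPAGNDEQHRKMILVFYW"

# Lower triangle of PAM250, one row per residue in ORDER.
PAM250_LOWER = """
12
0 2
-2 1 3
-3 1 0 6
-2 1 1 1 2
-3 1 0 -1 1 5
-4 1 0 -1 0 0 2
-5 0 0 -1 0 1 2 4
-5 0 0 -1 0 0 1 3 4
-5 -1 -1 0 0 -1 1 2 2 4
-3 -1 -1 0 -1 -2 2 1 1 3 6
-4 0 -1 0 -2 -3 0 -1 -1 1 2 6
-5 0 0 -1 -1 -2 1 0 0 1 0 3 5
-5 -2 -1 -2 -1 -3 -2 -3 -2 -1 -2 0 0 6
-2 -1 0 -2 -1 -3 -2 -2 -2 -2 -2 -2 -2 2 5
-6 -3 -2 -3 -2 -4 -3 -4 -3 -2 -2 -3 -3 4 2 6
-2 -1 0 -1 0 -1 -2 -2 -2 -2 -2 -2 -2 2 4 2 4
-4 -3 -3 -5 -4 -5 -4 -6 -5 -5 -2 -4 -5 0 1 2 -1 9
0 -3 -3 -5 -3 -5 -2 -4 -4 -4 0 -4 -4 -2 -1 -1 -2 7 10
-8 -2 -5 -6 -6 -7 -4 -7 -7 -5 -3 2 -3 -4 -5 -2 -6 0 0 17
"""


def parse_lower_triangle(order, text):
    """Build a symmetric {(a, b): score} lookup from a lower-triangular table."""
    rows = text.strip().splitlines()
    if len(rows) != len(order):
        raise ValueError("expected one row per residue")
    scores = {}
    for i, line in enumerate(rows):
        values = [int(v) for v in line.split()]
        if len(values) != i + 1:
            raise ValueError(f"row {order[i]} should have {i + 1} values")
        for j, value in enumerate(values):
            scores[(order[i], order[j])] = value
            scores[(order[j], order[i])] = value
    return scores


def ungapped_score(a, b, scores):
    """Sum the substitution scores of an alignment without gaps."""
    return sum(scores[(x, y)] for x, y in zip(a, b))


PAM250 = parse_lower_triangle(ORDER, PAM250_LOWER)
print(PAM250[("W", "W")], PAM250[("W", "R")])  # 17 2
print(ungapped_score("MIL", "LIV", PAM250))    # 11
```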

The BLOSUM Matrices
• Developed by Henikoff & Henikoff (1992; PNAS 89:10915-10919)
• The Dayhoff (PAM) matrices were superseded by the BLOSUM matrices when more sequence data became available
• The aim was to replace the Dayhoff matrix with one that would perform better in identifying distant relationships
• Made use of the vast increase in data since Dayhoff’s work
• BLOSUM stands for BLOcks SUbstitution Matrix
• The BLOSUM matrices use the BLOCKS database to look for differences among sequences, but only within the very conserved regions of a protein family.

BLOCKS Database

From: http://www.psc.edu/general/software/packages/blocks/blocks.html ...

“Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.

The blocks for the BLOCKS database are made automatically by looking for the most highly conserved regions in groups of proteins represented in the PROSITE database.

These blocks are then calibrated against the SWISS-PROT database to obtain a measure of the chance distribution of matches.

It is these calibrated blocks that make up the BLOCKS database.”

The BLOSUM Matrices
• Multiple alignments of short regions of related sequences (without gaps) were collected.
• For each alignment, the sequences that are similar above some threshold value of percent identity are clustered into groups and averaged.
• Substitution frequencies for all pairs of amino acids were calculated between the groups, and these were used to create the log-odds BLOSUM (BLOcks SUbstitution Matrix).
• In general, BLOSUM62 is less tolerant than PAM160 (its closest equivalent) of substitutions to or from hydrophilic amino acids, but more tolerant of cysteine and tryptophan mismatches
• The number is the clustering threshold: BLOSUM62 means that sequences in a block that are at least 62% identical are clustered together.
• This allows detection of more distantly related sequences, as it downplays the role of the more closely related sequences in the block when building the matrix.
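The core BLOSUM calculation can be sketched on a single toy block under simplifying assumptions: the clustering step (merging sequences above the identity threshold, e.g. 62%) is skipped, only one block is used, and residue pairs that never occur get no score. In each column every pair of sequences contributes one residue pair; observed pair frequencies q are compared with the frequencies e expected from the background residue frequencies p (e = p_a² for identical pairs, 2·p_a·p_b otherwise), and scores are 2·log2(q/e) rounded to integers (half-bit units, as used for BLOSUM62). The block and the function name are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block):
    """Half-bit log-odds scores from one ungapped block (list of equal-length segments).

    Simplified: no clustering of similar sequences is done here.
    """
    pair_counts = Counter()
    for column in zip(*block):
        # Every pair of sequences contributes one residue pair per column.
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Background frequency of each residue: q_aa plus half of every q_ab involving a.
    p = Counter()
    for (x, y), freq in q.items():
        if x == y:
            p[x] += freq
        else:
            p[x] += freq / 2
            p[y] += freq / 2

    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        score = round(2 * math.log2(freq / expected))  # half-bit units
        scores[(x, y)] = scores[(y, x)] = score
    return scores


# Hypothetical toy block: four aligned, ungapped 3-residue segments.
block = ["AAC", "AAC", "ASC", "SAC"]
scores = blosum_like_scores(block)
print(scores[("A", "A")], scores[("A", "S")], scores[("C", "C")])  # 1 2 3
```

Clustering before counting would reduce the weight of near-identical sequences, which is what lets the real matrices emphasise more distant relationships.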

The BLOSUM 62 Matrix
C 9
S -1 4
T -1 1 5
P -3 -1 -1 7
A 0 1 0 -1 4
G -3 0 -2 -2 0 6
N -3 1 0 -2 -2 0 6
D -3 0 -1 -1 -2 -1 1 6
E -4 0 -1 -1 -1 -2 0 2 5
Q -3 0 -1 -1 -1 -2 0 0 2 5
H -3 -1 -2 -2 -2 -2 1 -1 0 0 8
R -3 -1 -1 -2 -1 -2 0 -2 0 1 0 5
K -3 0 -1 -1 -1 -2 0 -1 1 1 -1 2 5
M -1 -1 -1 -2 -1 -3 -2 -3 -2 0 -2 -1 -1 5
I -1 -2 -1 -3 -1 -4 -3 -3 -3 -3 -3 -3 -3 1 4
L -1 -2 -1 -3 -1 -4 -3 -4 -3 -2 -3 -2 -2 2 2 4
V -1 -2 0 -2 0 -3 -3 -3 -2 -2 -3 -3 -2 1 3 1 4
F -2 -2 -2 -4 -2 -3 -3 -3 -3 -3 -1 -3 -3 0 0 0 -1 6
Y -2 -2 -2 -3 -2 -3 -2 -3 -2 -1 2 -2 -2 -1 -1 -1 -1 3 7
W -2 -3 -2 -4 -3 -2 -4 -4 -3 -2 -2 -3 -3 -1 -3 -2 -3 1 2 11
C S T P A G N D E Q H R K M I L V F Y W

Interpreting multiple sequence alignments for protein structure

• It is possible to infer information about protein structures from multiple sequence alignments.
• Protein sequences are aligned and the composition of the conserved regions is examined

Conserved positions may be identified

• A multiple sequence alignment of the Drosophila chromatin proteins, showing a common protein sequence motif
[Figure: the alignment, with the conserved positions marked]
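Conservation can also be quantified column by column. A minimal sketch, assuming Shannon entropy as the conservation measure (one common choice, not one specified in the lecture) and a hypothetical cut-off of 0.5 bits: a column containing a single residue type has entropy 0, and more variable columns score higher. Gaps are ignored, and all-gap columns are treated as not conserved. The example alignment is invented.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column; 0 = fully conserved."""
    residues = [r for r in column if r != gap]
    if not residues:
        return math.inf  # an all-gap column is treated as not conserved
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_positions(alignment, max_entropy=0.5):
    """0-based indices of columns whose entropy is at or below the cut-off."""
    return [i for i, column in enumerate(zip(*alignment))
            if column_entropy(column) <= max_entropy]


# Hypothetical alignment: four sequences (rows), five positions (columns).
alignment = ["KCGHW", "KCAHW", "RCGHF", "KCSHY"]
print(conserved_positions(alignment))  # [1, 3]
```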
Inference of structure from multiple sequence alignments (see Lesk, p. 188)

• The most highly conserved regions may correspond to an active site
• Regions rich in insertions and deletions probably correspond to surface loops
• A position containing a conserved Gly or Pro may correspond to a turn
• A conserved pattern of hydrophobicity with spacing 2 (i.e. every other residue), with the intervening residues more variable and including hydrophilic residues, suggests a β-strand on the surface
• A conserved pattern of hydrophobicity with spacing ~4 suggests a helix (an α-helix has about 3.6 residues per turn, so one face of the helix recurs every 3 to 4 residues).
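These rules of thumb can be turned into a rough column-labelling sketch. It assumes a simple hydrophobic set (A, V, I, L, M, F, W), treats a column as showing conserved hydrophobicity when every residue in it belongs to that set, and only checks the spacing between such columns; it does not test whether the intervening residues are hydrophilic. The function names and sequences are hypothetical, and this is a heuristic illustration, not a structure predictor.

```python
# Assumption: a simple set of hydrophobic residues; published hydrophobicity scales differ.
HYDROPHOBIC = set("AVILMFW")


def column_features(alignment):
    """Label each alignment column using the Lesk-style rules of thumb."""
    features = []
    for column in zip(*alignment):
        residues = set(column)
        if "-" in residues:
            features.append("indel")        # insertion/deletion: possible surface loop
        elif residues == {"G"} or residues == {"P"}:
            features.append("turn?")        # conserved Gly or Pro: possible turn
        elif residues <= HYDROPHOBIC:
            features.append("hydrophobic")  # conserved hydrophobicity
        else:
            features.append("variable")
    return features


def hydrophobic_spacing(features):
    """Classify the spacing between consecutive conserved hydrophobic columns."""
    positions = [i for i, f in enumerate(features) if f == "hydrophobic"]
    spacings = [b - a for a, b in zip(positions, positions[1:])]
    if spacings and all(s == 2 for s in spacings):
        return "spacing 2: possible surface beta-strand"
    if spacings and all(s in (3, 4) for s in spacings):
        return "spacing ~4: possible helix"
    return "no clear pattern"


# Hypothetical aligned segments.
strand_like = ["VKVEVT", "IRVDLS", "VEIKVN"]
helix_like = ["LKELAEKL", "IRDLSEQV", "LKEVAKRL"]
print(hydrophobic_spacing(column_features(strand_like)))
print(hydrophobic_spacing(column_features(helix_like)))
```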

MULTIPLE SEQUENCE ALIGNMENT

• Definition (from Attwood):
  • “A multiple sequence alignment is a 2D table, in which the rows represent individual sequences, and the columns the residue positions”
  • “Sequences are laid onto this grid in such a manner that
    a) the relative positioning of residues within any one sequence is preserved
    b) similar residues in all the sequences are brought into vertical register” (see example)

Multiple sequence alignments (MSAs) contd.

• Visual examination of multiple sequence alignments (MSAs) is very valuable
• MSAs can be displayed in colour, with different colours for amino acids of different physicochemical types
• The alignment table can be summarised in a single line: a pseudo-sequence called the consensus
• MSAs may be produced
  • By hand, e.g. using alignment tools such as CINEMA
  • Automatically, using implementations of algorithms such as ClustalW
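A consensus line can be computed with a simple majority rule. A minimal sketch, assuming the alignment is a list of equal-length strings: each column contributes its most common symbol when that symbol's frequency exceeds a threshold, otherwise a placeholder (`x` here). The threshold, the placeholder and the example alignment are hypothetical; alignment tools use a variety of consensus conventions.

```python
from collections import Counter


def consensus(alignment, threshold=0.5, unknown="x"):
    """Majority-rule consensus: the most common symbol in each column if its
    frequency exceeds `threshold`, otherwise the placeholder `unknown`."""
    line = []
    for column in zip(*alignment):
        symbol, count = Counter(column).most_common(1)[0]
        line.append(symbol if count / len(column) > threshold else unknown)
    return "".join(line)


alignment = ["KCGHW", "KCAHW", "RCGHF", "KCSHY"]
print(consensus(alignment))                 # KCxHx
print(consensus(alignment, threshold=0.4))  # KCGHW
```

Lowering the threshold fills in more positions but makes the consensus less representative of the columns it summarises.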

Conserved positions may be identified
• EXAMPLE: A multiple sequence alignment of the Drosophila chromatin proteins, showing a common protein sequence motif

http://www.russell.embl.de/aas

Ala,A Cys,C Asp,D Glu,E Phe,F

Gly,G His,H Ile,I Lys,K Leu,L

Met,M Asn,N Pro,P Gln,Q Arg,R

Ser,S Thr,T Val,V Trp,W Tyr,Y

Multiple sequence alignments

• MSAs are starting points for phylogenetic analysis
  • They can be used to group sequences or sub-sequences into families
  • Once an MSA is determined, each column in the alignment predicts the mutations that occurred at one site during the evolution of the sequence family
• Starting with an MSA, it can be possible to determine the order of appearance of the sequences during evolution
• Multiple sequence alignments can be global or local
  • Protein sequences may be conserved in their entirety through evolutionary change, or
  • Functional domains in protein sequences may be conserved, whilst the remaining sequence diverges
• Global multiple alignment is used in common alignment programs such as ClustalW
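As a first step towards phylogenetic analysis, an MSA can be reduced to pairwise distances. A sketch using the simple p-distance (the proportion of differing sites over gap-free columns), an assumption chosen for brevity; distance-based tree-building methods take a matrix like this as input, and corrected distances are usually preferred for divergent sequences. The sequence names and sequences are hypothetical.

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing residues over aligned columns without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(msa):
    """All pairwise p-distances for an MSA given as {name: aligned sequence}."""
    names = list(msa)
    return {(m, n): p_distance(msa[m], msa[n]) for m in names for n in names}


msa = {"seq1": "KCGHW", "seq2": "KCAHW", "seq3": "RCGHF"}
d = distance_matrix(msa)
# The closest pair would be joined first by a simple clustering method.
closest = min(((m, n) for (m, n) in d if m < n), key=d.get)
print(closest, d[closest])  # ('seq1', 'seq2') 0.2
```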

Algorithms for calculating multiple sequence alignments

• Aligning multiple sequences is computationally expensive (even more so than pairwise alignment)
• Only a few sequences can be aligned using an exact solution
• To align many sequences it is necessary to rely on heuristic strategies
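Why exact solutions do not scale can be seen from a quick count. A sketch assuming the straightforward dynamic-programming generalisation of pairwise alignment to N sequences of length L: the table has (L + 1)^N cells and each cell considers 2^N - 1 predecessor moves, giving roughly O(2^N L^N) work. With N = 2 this reduces to the familiar pairwise case. The function name is hypothetical.

```python
def exact_msa_operations(num_seqs: int, length: int) -> int:
    """Rough operation count for exact dynamic-programming alignment of
    `num_seqs` sequences of equal `length`.

    There is one cell per combination of prefix positions, (length + 1) ** num_seqs
    cells in total, and each cell considers 2 ** num_seqs - 1 ways of advancing
    (every non-empty subset of sequences contributes a residue, the rest a gap).
    """
    return (length + 1) ** num_seqs * (2 ** num_seqs - 1)


for n in (2, 3, 5, 10):
    print(n, f"{exact_msa_operations(n, 100):.2e}")
```

Going from 2 to 10 sequences of length 100 multiplies the work by many orders of magnitude, which is why programs such as ClustalW rely on heuristics instead.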
