MIT School of Bioengineering Sciences and Research

(A Constituent unit of MIT ADT University)

Basic Concepts In Bioinformatics / BI301

Module04

Understanding sequence alignment and types of sequence alignment

Course Coordinator: Dr. Sanket P. Bapat / Dr. Priyanka Nath


Mail ID: [email protected] | [email protected]
Disclaimer:

The content delivered here is important. However, this material is not stand-alone material for fulfilling the course syllabus. The content in this presentation should only be used as an aid to learning.
Refer to the suggested books and other resources for an exhaustive understanding.

MITBIO/MITADT University
Syllabus:

Module 4:
Sequence Alignments

Detailed method of derivation of the PAM and BLOSUM matrices. Pairwise sequence alignments: Needleman & Wunsch, Smith & Waterman. Multiple sequence alignments (MSA). Use of HMM-based algorithms for MSA (e.g. the SAM method).
Introduction to sequences, alignments and dynamic programming; local alignment and global alignment (algorithm and example); pairwise alignment (BLAST and FASTA algorithms) and multiple sequence alignment (Clustal W algorithm).

Objective/Learning Outcome:

CO1 Understanding the basics of bioinformatics and its Applications

CO2 Difference between databases and various biological databases

CO3 Performing data storage methods and various formats.

CO4 Understanding sequence alignment and types of sequence alignment

CO5 Discuss the basics of gene expression and understand the difference between pattern finding and regular expressions

CO6 Deduce the evolutionary relationships between the sequences by generating a phylogenetic tree.

TERMINOLOGIES

1. Sequence identity -- exactly the same amino acid or nucleotide at the same position.
2. Sequence similarity -- substitutions between residues with similar chemical properties.
3. Sequence homology -- a general term that indicates evolutionary relatedness among sequences. Homology is usually inferred from the percentage identity (or similarity) between the sequences.
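To make the difference between identity and similarity concrete, here is a minimal Python sketch that scores two already-aligned protein sequences. The grouping of amino acids into chemically similar classes (SIMILAR_GROUPS) and the function name are assumptions made for illustration; real tools usually take similarity from a substitution matrix such as BLOSUM.

```python
# Hypothetical grouping of amino acids by chemical property (illustrative assumption).
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("STNQ"),  # polar, uncharged
]

def identity_and_similarity(seq_a, seq_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both sequences must have the same length; '-' marks a gap.
    Columns containing a gap count as neither identical nor similar.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    n = len(seq_a)
    return 100.0 * identical / n, 100.0 * similar / n

pid, psim = identity_and_similarity("MKVLDE", "MRVIDQ")
print(f"identity = {pid:.1f}%, similarity = {psim:.1f}%")
```

In this toy pair, three of six positions are identical, and two more (K/R and L/I) are counted as similar under the assumed grouping.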

SANKET BAPAT BIOINFORMATICS Sanket Bapat
Sequence homology

CLASSIFICATION

• Global/local sequence alignment

• Pairwise/multiple sequence alignment

Pairwise sequence alignment

• A pairwise sequence alignment is an alignment of two sequences in which gaps are inserted where needed, so that the resulting aligned sequences have the same length and each pair of aligned residues represents a putatively homologous position.
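As an illustration of how gaps make the two aligned sequences equal in length, here is a minimal sketch of global alignment by dynamic programming in the style of Needleman & Wunsch (named in the syllabus). The scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for simplicity; practical alignments use a substitution matrix and tuned gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print("score:", total)
```

Whatever the inputs, the two returned strings have the same length, and removing the "-" characters gives back the original sequences.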

• Alignment of Two Sequences

• METHODS
1. Dot matrix analysis
2. The dynamic programming (DP) algorithm
3. Word methods
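The first method in the list, dot matrix analysis, can be sketched in a few lines of Python. This is a simplified illustration: the window and min_matches parameters are assumed names for the usual noise filter, and the plot is printed as text rather than drawn.

```python
def dot_matrix(seq_x, seq_y, window=1, min_matches=1):
    """Return the rows of a text dot plot.

    A '*' at row i, column j means that the window starting at seq_y[i]
    and the window starting at seq_x[j] share at least min_matches residues.
    """
    rows = []
    for i in range(len(seq_y) - window + 1):
        row = []
        for j in range(len(seq_x) - window + 1):
            matches = sum(seq_y[i + k] == seq_x[j + k] for k in range(window))
            row.append("*" if matches >= min_matches else ".")
        rows.append("".join(row))
    return rows

for line in dot_matrix("GATTACA", "GATCACA"):
    print(line)
```

Runs of '*' along a diagonal indicate regions where the two sequences match; increasing window and min_matches suppresses isolated chance matches.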

Scoring matrices

• BLOSUM (BLOcks SUbstitution Matrix)

• PAM (Point Accepted Mutation)
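Entries in both kinds of matrix are log-odds scores: the logarithm of how often a residue pair is observed in alignments of related sequences, relative to how often it would be expected by chance. The sketch below assumes BLOSUM-style scaling (2 x log2, that is, half-bit units); the frequencies in the example are invented for illustration and are not taken from any real matrix.

```python
import math

def log_odds_score(observed, expected, scale=2.0):
    """Log-odds substitution score, scaled and rounded to the nearest integer.

    observed: frequency of the residue pair in aligned related sequences
    expected: frequency of the pair expected by chance
    """
    return round(scale * math.log2(observed / expected))

# Hypothetical frequencies (illustration only).
favoured = log_odds_score(observed=0.04, expected=0.01)     # seen 4x more than chance
disfavoured = log_odds_score(observed=0.005, expected=0.02) # seen 4x less than chance
print(favoured, disfavoured)
```

A positive score marks a substitution seen more often than chance, a negative score one seen less often, and zero means the pair occurs at the chance rate.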

PAM BLOSUM comparison
BLOSUM                  PAM
Identity level          Mutability
Conserved sequences     Evolutionary model
Local alignment         Global alignment
Better accuracy         Not as good as BLOSUM

BLAST

• The BLAST program was developed by Stephen Altschul and colleagues at NCBI in 1990.
• BLAST uses heuristics to align a query sequence with all sequences in a database.
• The objective is to find high-scoring ungapped segments among related sequences.
• The search starts from short words: by default, three residues for protein sequences and eleven residues for DNA sequences.
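The word-and-extend idea in the points above can be sketched as follows. This is a deliberately simplified illustration, not the actual BLAST implementation: it seeds only on exact word matches (BLAST also accepts similar "neighbourhood" words scored with a substitution matrix), extends only to the right, and uses assumed values for the match, mismatch and x_drop parameters. The function names are hypothetical.

```python
def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where an exact word of length word_size matches."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds

def extend_right(query, subject, i, j, word_size=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed to the right without gaps; return (best_length, best_score).

    Extension stops at the end of either sequence, or when the running
    score falls more than x_drop below the best score seen so far.
    """
    score = best = word_size * match
    length = best_len = word_size
    while i + length < len(query) and j + length < len(subject):
        score += match if query[i + length] == subject[j + length] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break
    return best_len, best

query, subject = "MKVLAAGIT", "PPMKVLSAGITEE"
seeds = find_seeds(query, subject)
print(seeds)
print(extend_right(query, subject, *seeds[0]))
```

Here the first seed (MKV) extends across a single mismatch (A versus S) into one ungapped segment covering the whole query.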

Fasta

• FASTA (FAST ALL, www.ebi.ac.uk/fasta33/) was the first database similarity search tool developed.
• FASTA uses a “hashing” strategy to find matches for a short stretch of identical residues of length k.
• Each such string of residues is known as a k-tuple or ktup.
• A ktup is typically composed of two residues for protein sequences and six residues for DNA sequences.
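The hashing step described above can be sketched in Python: every ktup of the query is stored in a lookup table, the subject is scanned against it, and hits are grouped by diagonal (subject position minus query position). This is only the first stage of FASTA, written as an illustration; the function name and the example sequences are assumptions.

```python
from collections import Counter

def ktup_diagonals(query, subject, ktup=2):
    """Count shared ktups on each diagonal (subject position minus query position)."""
    # Hash table: ktup string -> list of positions where it occurs in the query.
    lookup = {}
    for i in range(len(query) - ktup + 1):
        lookup.setdefault(query[i:i + ktup], []).append(i)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in lookup.get(subject[j:j + ktup], []):
            diagonals[j - i] += 1
    return diagonals

hits = ktup_diagonals("HEAGAWGHEE", "PAWHEAE", ktup=2)
print(hits.most_common(1))
```

A diagonal that collects many ktup hits marks a candidate region of similarity, which later stages of the search rescore and join.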

Statistical Models

PSSM (Position Specific Scoring Matrix)

Profile

Hidden Markov Model

Multiple Sequence Alignment
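• In Multiple Sequence Alignment (MSA), three or more sequences are aligned together to determine the phylogenetic / evolutionary relationship between the sequences.

e.g.: AUG CCA GGU GGC UAA…… (1)
      AUG CCC GGU GGC UAA…… (2)
      AUG CUC GGU GGC UAA…… (3)
CCA, CCC -- proline
CUC -- leucine
Sequences (1) and (2) differ in the second codon but both encode proline, whereas sequence (3) encodes leucine at that position.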
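The codon comparison in this example can be checked with a short Python sketch. The codon table below is deliberately partial: it lists only the codons that appear in the three sequences, and the trailing "……" of each sequence is left out.

```python
# Partial RNA codon table covering only the codons used in the example.
CODON_TABLE = {
    "AUG": "Met", "CCA": "Pro", "CCC": "Pro", "CUC": "Leu",
    "GGU": "Gly", "GGC": "Gly", "UAA": "Stop",
}

def translate(codons):
    """Translate a space-separated string of codons into a list of residue names."""
    return [CODON_TABLE[codon] for codon in codons.split()]

sequences = {
    1: "AUG CCA GGU GGC UAA",
    2: "AUG CCC GGU GGC UAA",
    3: "AUG CUC GGU GGC UAA",
}
proteins = {label: translate(rna) for label, rna in sequences.items()}
for label, residues in proteins.items():
    print(label, "-".join(residues))
```

Sequences (1) and (2) translate to the same residues, so their difference is silent at the protein level, while sequence (3) carries a proline-to-leucine substitution.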
Why do we need multiple alignments?
• Characterize protein families.
• Identify shared regions of homology.
• Determine the consensus sequence.
• Help predict the secondary and tertiary structures of new sequences.
• Analyse molecular evolution using phylogenetic methods.
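One of the uses listed above, determining the consensus sequence, is easy to sketch once an alignment exists. The example below takes the most frequent symbol in each column; the input is the three-sequence RNA example from the MSA slide with the spaces removed. Breaking ties in favour of the symbol seen first is an arbitrary choice made for this sketch.

```python
from collections import Counter

def consensus(alignment):
    """Return the most frequent symbol in each column of an alignment.

    All rows must have the same length. Ties go to the symbol that
    appears first in the column (top row first).
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*alignment)
    )

msa = [
    "AUGCCAGGUGGCUAA",
    "AUGCCCGGUGGCUAA",
    "AUGCUCGGUGGCUAA",
]
result = consensus(msa)
print(result)
```

Each variable column is decided by a two-to-one majority here, so the consensus happens to equal sequence (2).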

Multiple alignment CONSTRUCTION

1. Traditional approaches
a) Optimal multiple alignment
b) Progressive multiple alignment
2. Alignment parameters
a) Residue similarity matrices
b) Gap penalties
3. Alternative approaches
a) Iterative alignment methods
b) Combinatorial algorithms
c) PipeAlign: a protein family analysis tool

ClustalW
• A popular multiple alignment tool today

• Thompson et al., 1994

• CLUSTAL = Cluster alignment

• ‘W’ stands for ‘weighted’ (different parts of the alignment are weighted differently).

Interesting Links:

• BLAST Tutorial
• Link: https://youtu.be/jDHrHfx0cpw?si=KvYbP2EzX-HCvRr1

• Clustal Omega Tutorial
• Link: https://youtu.be/3qLdCZBIXME?si=vDhcLU1UYFaS7kG2

References:

Author            Book Name                                   Library
Jin Xiong         Essential Bioinformatics                    Ebook / Present in Library
Mount, David W.   Bioinformatics Sequence & Genome Analysis   Present in Library
Charlie Hodgman   Bioinformatics: Second Edition              Present in Library
Parry Smith       Introduction to Bioinformatics              Present in Library

The content is intended for internal use only, and the ownership belongs to the coordinator. It
should not be uploaded on any platform without proper authorization.
