0% found this document useful (0 votes)
4 views

lecture2_sequence_alignment

This document provides an introduction to sequence alignment in bioinformatics, detailing methods for both pairwise and multiple sequence alignments. It explains the significance of alignment in identifying evolutionary relationships and introduces key algorithms such as Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The document also outlines the steps involved in these algorithms and their applications in database searching.

Uploaded by

Zahra.
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

lecture2_sequence_alignment

This document provides an introduction to sequence alignment in bioinformatics, detailing methods for both pairwise and multiple sequence alignments. It explains the significance of alignment in identifying evolutionary relationships and introduces key algorithms such as Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The document also outlines the steps involved in these algorithms and their applications in database searching.

Uploaded by

Zahra.
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 26

INTRODUCTION TO BIOINFORMATICS

LECTURE NO 2

SEQUENCE ALIGNMENT

By:
Laila Sehar

National Center for Bioinformatics


Quaid-e-Azam university, Islamabad
Introduction
Sequence Alignment
 Definition: “Procedure for comparing two or more
sequences by searching for a series of individual
characters or character patterns that are in the same
order in the sequences”
 Pair-wise alignment: compare two sequences
 Multiple sequence alignment: compare more
than two sequences
Example sequence alignment

 Align “abcdef” with “abdgf”


• Write second sequence below the first
abcdef
abdgf
• Move sequences to give maximum match
between them
• Show characters that match using vertical
bar
Example sequence alignment

abcdef
||
abdgf

 Insert gap between b and d on lower sequence to allow d


and f to align
abcdef
| | | |
ab-dgf
Note e and g don’t match
Why do sequence alignments?

 To find whether two (or more) genes or


proteins are evolutionarily related to each
other
 To find structurally or functionally similar
regions within proteins
Pairwise Sequence Alignment
 Pairwise sequence alignment methods are used
to find the best-matching piecewise (local) or
global alignments of two query sequences.

 Pairwise Sequence Alignment is used to


identify regions of similarity that may indicate
functional, structural and/or evolutionary
relationships between two biological
sequences (protein or nucleic acid).
Methods for Pair wise Alignment

 Dynamic Programming:

 Needlemanwunch algorithm

 Smithwaterman algorithm
Dynamic Programming

Two method for sequence alignment.

 Local sequence alignment

 Global sequence alignment


Global vs. Local Alignment

 We distinguish
Global alignment algorithms which optimize
overall alignment between two sequences .
 We assume that the two proteins are basically
similar over the entire length of one another.
Local alignment algorithms which seek only
relatively conserved pieces of sequence.
 Allign those parts of sequences that appear to have
good similarity.
Global vs. Local Alignment
 Global
LGPSSKQTGKGS-SRIWDN
| | | | | | |
LN-ITKSAGKGAIMRLGDA
 Local
--------GKG--------
| | |
--------GKG--------
Elements of Global Sequence Alignment
 Alignment scores. The score for an alignment
is taken to be the sum of scores for aligned pairs
of letters, and scores for letters aligned with
nulls.

 Substitution scores. Scores for aligned pairs


of letters are called substitution scores, whether
the letter aligned are identical or not.

 Gap scores. The score for a letter aligned with


a null is called a gap score.
Significance
 It is commonly used in bioinformatics to
align protein or nucleotide sequences.
 Alignments try to tell the evolutionary story of the
proteins.
 Sequence Alignment is used to identify regions of
similarity that may indicate functional, structural and/or
evolutionary relationships between two biological
sequences (protein or nucleic acid).
 One motivation for database searching: given a query
sequence q, find the sequences in a database D which make
‘good’ alignments to q.
Pairwise Global Alignment
Algorithm

The Needleman-Wunsch
algorithm
The Needleman-Wunsch algorithm is a dynamic
programming algorithm for optimal sequence
alignment (Needleman and Wunsch, 1970).
The optimal path can be determined by
incremental extension of the optimal sub-paths.
In a Needleman-Wunsch alignment, the optimal
path must stretch from beginning to end in both
sequences (hence the term ‘global alignment’).
The key for understanding this approach is to
observe how the alignment problem can be
divided into subproblems.
Steps:
Initialization
Matrix fill or scoring
Trace back and alignment
Pairwise Global Alignment of sequences

 Place each sequence along one axis


 Place score 0 at the up-left corner
 Fill in 1st row & column with gap penalty multiples
 Fill in the matrix with max value of 3 possible moves:
 Vertical move: Score + gap penalty
 Horizontal move: Score + gap penalty
 Diagonal move: Score + match/mismatch score
 The optimal alignment score is in the lower-right corner
 To reconstruct the optimal alignment, trace back where the max
at each step came from, stop when hit the origin.
Example of Global Alignment
 Let gap = -2 Vertical move: Score + gap penalty
Horizontal move: Score + gap penalty
match = 1 Diagonal move: Score + match/mismatch score
mismatch = -1.

empty A A A C
empty 0 -2 -4 -6 -8
A -2 1 -1 -3 -5
G -4 -1 0 -2 -4
C -6 -3 -2 -1 -1

AAAC AAAC
A-GC -AGC
Finding the alignments that give
the highest score
 The arrows constitutes paths in the matrix, and for
finding the highest-scoring alignments, we can follow
the paths from H(m,n) backwards to H(0,0).

 if the arrow comes from H(i−1,j ), the column is


(qi ,−);
 if the arrow comes from H(i,j−1), the column is
(−,dj );
 if the arrow comes from H(i−1,j−1) , the column is
(qi ,dj ).
Example of Global Alignment
 Let gap = -2
match = 1
mismatch = -1.

empty A A A C
empty 0 -2 -4 -6 -8
A -2 1 -1 -3 -5
G -4 -1 0 -2 -4
C -6 -3 -2 -1 -1

AAAC AAAC
A-GC -AGC
Local Alignment Algorithm

The Smith-Waterman algorithm


In 1981, T. F. Smith and M. S. Waterman
developed a new algorithm capable of
detecting also local similarity
Let us suppose to have a long DNA sequence
and want to isolate each subsequence similar to
each part of the yeast genome
Smith & Waterman
 Place each sequence along one axis
 Place score 0 at the up-left corner
 Fill in 1st row & column with 0s
 Fill in the matrix with max value of 4 possible values:
 0
 Vertical move: Score + gap penalty
 Horizontal move: Score + gap penalty
 Diagonal move: Score + match/mismatch score
 The optimal alignment score is the max in the matrix
 To reconstruct the optimal alignment, trace back where the MAX
at each step came from, stop when a zero is hit
Local Alignment
 Let gap = -2
GATCACCT GATCACCT
match = 1 GATACCC GAT _ ACCC
mismatch = -1.
empty G A T C A C C T
empty
0 0 0 0 0 0 0 0 0
G 0 1 0 0 0 0 0 0 0
A 0 0 2 0 0 1 0 0 0
T 0 0 0 3 1 0 0 0 1
A 0 0 1 1 2 2 0 0 0
C 0 0 0 0 2 1 3 1 0
C 0 0 0 0 1 1 2 4 2
C 0 0 0 0 1 0 2 3 3
Vertical move: Score + gap penalty
Horizontal move: Score + gap penalty
Diagonal move: Score + match/mismatch score
Practice

Match = 3
CTTCAGC Mismatch= -2
Gap = -3
CTCAGC
Smith-Waterman Algorithm
Conclusion
The Needleman-Wunsch algorithm, for realizing global
alignments, and the technique by Smith and Waterman, for
local alignments, constitute the fundamental basis on
which numerous database search algorithms were built
BLAST
FASTA
Clustal
These algorithms use indexing techniques, heuristics, and
fast comparative methods to get a quick comparison
between a query sequence and an entire database

You might also like