
Sequence Alignment: Sequence Alignment Is The Most Important Task in Bioinformatics!

Sequence alignment is an important bioinformatics task used for predicting gene function, database searching, gene finding, and measuring sequence divergence and evolution. It involves arranging DNA, RNA, or protein sequences to identify regions of similarity that may indicate functional or evolutionary relationships. There are two main types of sequence alignment: global alignment, which attempts to match an entire sequence, and local alignment, which finds regions of highest similarity. Algorithms like Needleman-Wunsch and Smith-Waterman are commonly used to perform pairwise sequence alignments.

Uploaded by

Malik Fhaim

Sequence Alignment

Sequence alignment is the most important task in bioinformatics!


Sequence alignment is important for:

* prediction of function
* database searching
* gene finding
* sequence divergence
* sequence assembly
Sequence similarity

Homology: genes that derive from a common ancestral gene are called homologs.

Orthologous genes are homologous genes in different organisms that diverged through speciation.

Paralogous genes are homologous genes in one organism that derive from gene duplication.

Gene duplication: one gene is duplicated into multiple copies, which are therefore free to evolve and assume new functions.
[Figures: homologous and paralogous vs. analogous genes]
Causes for sequence (dis)similarity

mutation (substitution): a nucleotide at a certain location is replaced by another nucleotide (e.g.: ATA → AGA)

insertion: at a certain location one new nucleotide is inserted between two existing nucleotides (e.g.: AA → AGA)

deletion: at a certain location one existing nucleotide is deleted (e.g.: ACTG → AC-G)

indel: an insertion or a deletion
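
The three edit operations above can be sketched on plain Python strings. This is only an illustration: the function names and the 0-based positions are assumptions made for this example, not part of the original slides.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the nucleotide at 0-based index pos with base."""
    return seq[:pos] + base + seq[pos + 1:]


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert base in front of the nucleotide at 0-based index pos."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq: str, pos: int) -> str:
    """Remove the nucleotide at 0-based index pos."""
    return seq[:pos] + seq[pos + 1:]


print(substitute("ATA", 1, "G"))   # AGA
print(insert_base("AA", 1, "G"))   # AGA
print(delete_base("ACTG", 2))      # ACG (written AC-G when shown as an alignment)
```

Note that a substitution and an insertion can produce the same result (AGA), which is one reason the history of a sequence cannot always be read off directly.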


The biological problem of sequence alignment

DNA-sequence-1

tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt

DNA-sequence-2
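
The row of bars between the two sequences can be derived mechanically from an alignment that is already gapped. The sketch below assumes both rows have equal length and use "-" for gaps; the name match_line is made up for this example.

```python
def match_line(top: str, bottom: str) -> str:
    """Return a string with '|' under every column where both rows agree."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(top, bottom)
    )


seq1 = "tcctctgcctctgccatcat---caaccccaaagt"
seq2 = "tcctgtgcatctgcaatcatgggcaaccccaaagt"

print(seq1)
print(match_line(seq1, seq2))
print(seq2)
print(match_line(seq1, seq2).count("|"), "matching columns")  # 29 matching columns
```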
Algorithms

Needleman-Wunsch
Pairwise global alignment only.

Smith-Waterman
Pairwise, local (or global) alignment.

FASTA and BLAST
Pairwise heuristic local alignment.
What is sequence alignment?
* Sequence alignment is a way of arranging DNA, RNA or protein sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences.
* Comparing two sequences (pairwise alignment) or more than two (multiple alignment) means searching for a series of individual characters or patterns that occur in the same order in the sequences.
* There are two types of alignment: local and global.
Global alignment vs Local alignment
* Global alignment attempts to align the sequences over their entire length, matching as much of each sequence as possible. Tools for global alignment are based on the Needleman-Wunsch algorithm.

* Local alignment tries to find the regions with the highest density of matches. Tools for local alignment are based on the Smith-Waterman algorithm.

* Both algorithms are derived from the basic dynamic programming algorithm.
Global alignment
L G P S S K Q T G K G S - S R I W D N
L N - I T K S A G K G A I M R L G D A

Local alignment
- - - - - - - T G K G - - - - - - - -
- - - - - - - A G K G - - - - - - - -
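
A minimal sketch of the shared dynamic programming idea follows. It computes only the optimal score (no traceback), and the scoring scheme (match +1, mismatch -1, gap -1) is an assumption chosen for illustration; real tools typically use substitution matrices and affine gap penalties. The only differences between the global (Needleman-Wunsch style) and local (Smith-Waterman style) variants here are the border initialisation, the floor at zero, and where the final score is read.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal pairwise alignment score by dynamic programming.

    local=False: global score, read from the bottom-right cell.
    local=True:  local score, the maximum over all cells, floored at 0.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)

    return best if local else score[-1][-1]


print(alignment_score("TCATG", "CATTG"))              # 2
print(alignment_score("TCATG", "CATTG", local=True))  # 3
```

With this scoring, the global score of 2 corresponds to four matches and two gap columns, while the local score of 3 comes from the shared run CAT.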
Why do sequence alignment?
* Sequence alignment is useful for discovering structural, functional and evolutionary information in biological sequences.
* Sequences that are very much alike may have similar secondary and 3D structure, similar function and likely a common ancestral sequence. It is extremely unlikely that such sequences acquired their similarity by chance.
-- For DNA molecules with n nucleotides, the probability of a chance match is very low: P = 4^{-n}.
-- For proteins with n amino acids, the probability is even lower: P = 20^{-n}.
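
To get a feel for how small these probabilities are, the formulas can be evaluated directly. The sketch assumes that every position is independent and that all residues are equally likely, which is a simplification of real sequence composition; the function name is made up for this example.

```python
def chance_match_probability(alphabet_size: int, n: int) -> float:
    """Probability that a random sequence of length n equals a given one,
    assuming independent, equally likely residues at every position."""
    return alphabet_size ** -n


for n in (5, 10, 20):
    dna = chance_match_probability(4, n)       # 4 nucleotides
    protein = chance_match_probability(20, n)  # 20 amino acids
    print(f"n={n:2d}  DNA: {dna:.3e}  protein: {protein:.3e}")
```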
Sequence alignment makes the following tasks easier:
1. annotation of new sequences
2. modelling of protein structures
3. design and analysis of gene expression experiments
An example of aligning text strings
Raw data:
T C A T G
C A T T G

2 matches, 0 gaps
T C A T G
      | |
C A T T G

3 matches (2 end gaps)
T C A T G .
  | | |
. C A T T G

4 matches, 1 insertion
T C A - T G
  | |   | |
. C A T T G

4 matches, 1 insertion
T C A T - G
  | | |   |
. C A T T G
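
The two ungapped alignments in this example can be reproduced by sliding one string along the other and counting the positions that agree. This is a sketch, with matches_at_offset as a made-up helper name; it does not consider internal gaps, so it cannot find the 4-match alignments.

```python
def matches_at_offset(top: str, bottom: str, offset: int) -> int:
    """Count identical characters when bottom is shifted right by offset
    positions relative to top (a negative offset shifts it left)."""
    count = 0
    for i, ch in enumerate(bottom):
        j = i + offset
        if 0 <= j < len(top) and top[j] == ch:
            count += 1
    return count


top, bottom = "TCATG", "CATTG"
offsets = range(-(len(bottom) - 1), len(top))
for offset in offsets:
    print(f"offset {offset:+d}: {matches_at_offset(top, bottom, offset)} matches")

best = max(offsets, key=lambda o: matches_at_offset(top, bottom, o))
print("best ungapped offset:", best)  # best ungapped offset: 1
```

Offset 0 gives the "2 matches, 0 gaps" case and offset +1 gives the "3 matches (2 end gaps)" case.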
Terminologies of sequence comparison
* Sequence identity -- exactly the same amino acid or nucleotide in the same position.

* Sequence similarity -- substitutions with similar chemical properties.

* Sequence homology -- general term that indicates evolutionary relatedness among sequences; in practice it is usually assessed by measuring the percentage identity of the sequences.

* Pairwise alignment -- used to find the best-matching piecewise (local) or global alignment of two query sequences. Pairwise alignment can only be applied to two sequences at a time.

* Multiple sequence alignment -- attempts to align all of the sequences in a given query set.
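
The difference between identity and similarity can be made concrete on the global protein alignment shown earlier. In this sketch the groups of "similar" amino acids are an illustrative assumption (real tools use substitution matrices such as BLOSUM or PAM), and the denominator is the full alignment length including gap columns, which is only one of several conventions in use.

```python
# Illustrative (assumed) groups of chemically similar amino acids.
SIMILAR_GROUPS = [set("ILMV"), set("ST"), set("KR"), set("DE"), set("NQ"), set("FWY")]


def is_similar(x: str, y: str) -> bool:
    """True if x and y are identical or fall into the same assumed group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a: str, aligned_b: str):
    """Return (percent identity, percent similarity) over all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    identical = similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if "-" in (x, y):
            continue  # gap columns count as neither identical nor similar
        if x == y:
            identical += 1
        if is_similar(x, y):
            similar += 1
    length = len(aligned_a)
    return 100.0 * identical / length, 100.0 * similar / length


identity, similarity = identity_and_similarity(
    "LGPSSKQTGKGS-SRIWDN",
    "LN-ITKSAGKGAIMRLGDA",
)
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Under these assumptions the alignment has 7 identical columns out of 19, plus two conservative substitutions (S/T and I/L) that raise the similarity figure above the identity figure.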
