Lecture 6 - Sequence Analysis

ISC 211

Introduction to Bioinformatics
Lecture 6 – Sequence Analysis
Dr. Athira B
Asst. Professor, CSE
IIIT Kottayam
 Suppose you are given a set of new DNA sequences and asked to identify their functional, structural, or biological features.
 How can you do this analysis?
 One solution is to compare them with already known sequences: how similar are they?
 How do you check this similarity?
Sequence Analysis
 The process of subjecting a DNA, RNA, or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution.
 Objectives:
 To find similarity between sequences, often to infer whether they are related (homologous)
 To identify intrinsic features of a sequence, such as active sites, post-translational modification sites, gene structures, reading frames, distributions of introns and exons, and regulatory elements
 To identify sequence differences and variations, such as point mutations and single nucleotide polymorphisms (SNPs), which can serve as genetic markers
 To reveal the evolution and genetic diversity of sequences and organisms
 To identify molecular structure from sequence alone
Methods

 Sequence Alignment - pairwise and multiple sequence alignment
 Comparison against large databases
Sequence Alignment

 The procedure of comparing two or more sequences by searching for a series of individual characters or character patterns that occur in the same order in the sequences
 Identical characters are placed in the same column
 An alignment can be local or global
Sequence Alignment
 Biological Problem
 Sequence alignment is a way of arranging protein (or DNA)
sequences to identify regions of similarity that may be a
consequence of evolutionary relationships between the sequences.
 Genome sequencing allows comparison of organisms at DNA and
protein levels
 Comparisons can be used to
 Find evolutionary relationships between organisms
 Identify functionally conserved sequences
 Identify corresponding genes in humans and model organisms, which helps develop models for human diseases
Sequence Homology

 Homology: genes that derive from a common ancestral gene are called homologs
 Orthologous genes are homologous genes in different organisms (separated by speciation)
 Paralogous genes are homologous genes in one organism that derive from gene duplication
 Gene duplication: one gene is duplicated into multiple copies, which are therefore free to evolve and assume new functions
Sequence similarity

 Intuitively, the similarity of two sequences refers to the degree of match between corresponding positions in the sequences
 Sequence similarity is not sequence homology: similarity is a measurable quantity, whereas homology is a conclusion about common ancestry
 Homology is more difficult to detect over greater evolutionary distances
Causes of Gene (dis) similarity

 Mutation (substitution): a nucleotide at a certain location is replaced by another nucleotide (e.g.: ATA → AGA)
 Insertion: at a certain location one new nucleotide is inserted between two existing nucleotides (e.g.: AA → AGA)
 Deletion: at a certain location one existing nucleotide is deleted (e.g.: ACTG → AC-G)
 Indel: an insertion or a deletion
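The three edit operations can be reproduced on plain strings. This is a minimal sketch; the function names and the 0-based `pos` argument are illustrative choices, not part of the lecture.

```python
def substitute(seq, pos, base):
    """Replace the nucleotide at 0-based index pos with base."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq, pos, base):
    """Insert base so that it ends up at 0-based index pos."""
    return seq[:pos] + base + seq[pos:]


def delete(seq, pos):
    """Remove the nucleotide at 0-based index pos."""
    return seq[:pos] + seq[pos + 1:]


# The examples from the list above
print(substitute("ATA", 1, "G"))  # ATA -> AGA
print(insert("AA", 1, "G"))       # AA -> AGA
print(delete("ACTG", 2))          # ACTG -> ACG, written AC-G in an alignment
```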
Sequence Alignment

 Find the similarity between two (or more) DNA sequences by finding a good alignment between them
 An alignment specifies which positions in two sequences match
Sequence Alignment
 A sequence alignment is an arrangement of two or more sequences, highlighting their similarity.
 The sequences are padded with gaps (dashes) so that, wherever possible, columns contain identical characters from the sequences involved
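To make the column view concrete, the sketch below takes two already-aligned (gap-padded) strings and labels each column. The function name `describe_alignment` and the marker characters are assumptions chosen for illustration.

```python
def describe_alignment(a, b):
    """Label each column of a gapped pairwise alignment.

    Returns a marker string ('|' match, '.' mismatch, ' ' gap)
    and a dictionary with the number of columns of each kind.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            counts["gap"] += 1
            marks.append(" ")
        elif x == y:
            counts["match"] += 1
            marks.append("|")
        else:
            counts["mismatch"] += 1
            marks.append(".")
    return "".join(marks), counts


top, bottom = "ACTG", "AC-G"  # the deletion example, written as an alignment
marks, counts = describe_alignment(top, bottom)
print(top)
print(marks)
print(bottom)
print(counts)
```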
Sequence Alignment

 Pairwise Sequence Alignment: methods concerned with finding the best-matching local or global alignment of two protein (amino acid) or DNA (nucleic acid) sequences.
 Global Alignment: an alignment in which all the characters in both sequences participate.
 Local Alignment: an alignment of only those regions of the two sequences that are most similar to each other.
Algorithms

 Needleman-Wunsch: pairwise global alignment only.
 Smith-Waterman: pairwise local alignment.
 BLAST: pairwise heuristic local alignment.
The Needleman-Wunsch algorithm

 The Needleman-Wunsch algorithm (1970, J Mol Biol. 48(3):443-53) performs a global alignment of two sequences (s and t) and is applied to align protein or nucleotide sequences.
 The Needleman-Wunsch algorithm is an example of dynamic programming, and is guaranteed to find the alignment with the maximum score under the chosen scoring scheme.
 E.g., for sequences x and y, s(xi, yj) is the substitution cost and d is the gap penalty
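In its standard textbook form, the recurrence that uses s(xi, yj) and d can be written as below. The matrix name F is an assumption, and d is taken to be a negative number, matching the gap penalty of -5 used in the worked example.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i \cdot d, \qquad F(0,j) = j \cdot d, \\
F(i,j) &= \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) & \text{(align } x_i \text{ with } y_j\text{)} \\
F(i-1,\,j) + d & \text{(align } x_i \text{ with a gap)} \\
F(i,\,j-1) + d & \text{(align } y_j \text{ with a gap)}
\end{cases}
\end{aligned}
```

The score of the best global alignment is the bottom-right entry, F(n, m), where n and m are the lengths of x and y.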
Dynamic Programming-steps

1. Initialization of the score matrix
2. Calculation of scores and filling the traceback matrix
3. Deducing the alignment from the traceback matrix
Let’s work on this simple example

 Input: AAG (sequence #1), AGC (sequence #2)
 Gap penalty = -5
 [Figures: Step 1, Step 2, Final Table]
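The initialization step depends only on the gap penalty, so it can be sketched from the values given here. The code below fills the first row and column for AAG and AGC with a gap penalty of -5 and leaves the interior cells empty (`None`), since filling them would require substitution scores. The function name `init_score_matrix` is a hypothetical helper.

```python
def init_score_matrix(seq1, seq2, gap):
    """Build a (len(seq1)+1) x (len(seq2)+1) matrix with only the
    first row and first column filled in (multiples of the gap penalty)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    F = [[None] * cols for _ in range(rows)]
    for j in range(cols):
        F[0][j] = j * gap  # a prefix of seq2 aligned against gaps only
    for i in range(rows):
        F[i][0] = i * gap  # a prefix of seq1 aligned against gaps only
    return F


for row in init_score_matrix("AAG", "AGC", gap=-5):
    print(row)
```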
Exercise

 Seq 1: GAATTC, Seq 2: GATAC
 Match = 2, mismatch = -1, gap = -2
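One way to check a hand-worked answer to this exercise is a short Needleman-Wunsch sketch following the three dynamic-programming steps. It assumes a simple match/mismatch score instead of a substitution matrix, and it recovers the traceback by re-deriving each move from the score matrix rather than storing a separate traceback matrix. When several alignments share the best score, it returns just one of them.

```python
def needleman_wunsch(s, t, match=2, mismatch=-1, gap=-2):
    """Global alignment of s and t. Returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)

    # Step 1: initialization (first row and column are cumulative gap penalties)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # Step 2: fill each cell with the best of diagonal, up, and left moves
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Step 3: traceback from the bottom-right corner to the top-left corner
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(t[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


score, top, bottom = needleman_wunsch("GAATTC", "GATAC")
print(score)
print(top)
print(bottom)
```

With the exercise's parameters, the optimal global score is 5.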
Smith-Waterman local alignment
Example 1:
Seq 1: GAATTC, Seq 2: GATAC
Match = 2, mismatch = -1, gap = -2
Example 2

 Seq 1: GAATTCAT, Seq 2: CCTCATG
 Starting score: 0, match = 2, mismatch = -1, gap = -2
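Both Smith-Waterman examples can be checked with the sketch below. It differs from the global case in three ways: the first row and column start at 0, no cell may drop below 0, and the traceback starts at the highest-scoring cell and stops at the first 0. A simple match/mismatch score is assumed, and the function name is illustrative. Only one best-scoring local alignment is returned.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-2):
    """Local alignment of s and t. Returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # starting score 0 everywhere
    best, best_pos = 0, (0, 0)

    # Fill: same three moves as the global case, but floored at 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: start at the maximum cell, stop when a 0 is reached
    a, b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


print(smith_waterman("GAATTC", "GATAC"))      # Example 1
print(smith_waterman("GAATTCAT", "CCTCATG"))  # Example 2
```

For Example 2, the best local alignment pairs TCAT with TCAT for a score of 8. For Example 1, the best local score is 5.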
