Lecture 6- Sequence Analysis

Uploaded by

aletimanaswini
Copyright
© © All Rights Reserved
ISC 211

Introduction to Bioinformatics
Lecture 6 – Sequence Analysis
Dr. Athira B
Asst. Professor, CSE
IIIT Kottayam
 Suppose you are given a set of new DNA sequences and asked to identify their functional, structural, or biological features.
 How can you do this analysis?
 One solution is to compare them with already known sequences: how similar are they?
 How is this similarity checked?
Sequence Analysis
 The process of subjecting a DNA, RNA, or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution.
 Objectives:
 To find similarity between sequences, often to infer whether they are related (homologous)
 To identify intrinsic features of the sequence such as active sites, post-translational modification sites, gene structures, reading frames, distributions of introns and exons, and regulatory elements
 To identify sequence differences and variations such as point mutations and single nucleotide polymorphisms (SNPs) in order to find genetic markers
 To reveal the evolution and genetic diversity of sequences and organisms
 To predict molecular structure from sequence alone
Methods

 Sequence alignment: pairwise and multiple sequence alignment
 Comparison against large databases
Sequence Alignment

 The procedure of comparing two or more sequences by searching for a series of individual characters or character patterns
 Identical characters are placed in the same column
 An alignment can be local or global
Sequence Alignment
 Biological Problem
 Sequence alignment is a way of arranging protein (or DNA)
sequences to identify regions of similarity that may be a
consequence of evolutionary relationships between the sequences.
 Genome sequencing allows comparison of organisms at DNA and
protein levels
 Comparisons can be used to
 Find evolutionary relationships between organisms
 Identify functionally conserved sequences
 Identify corresponding genes in humans and model organisms, in order to develop models for human diseases
Sequence Homology

 Homology: genes that derive from a common ancestral gene are called homologs
 Orthologous genes are homologous genes in different organisms that derive from a speciation event
 Paralogous genes are homologous genes in one organism that derive from gene duplication
 Gene duplication: one gene is duplicated into multiple copies, which are therefore free to evolve and assume new functions
Sequence similarity

 Intuitively, the similarity of two sequences refers to the degree of match between corresponding positions in the sequences
 Sequence similarity is not sequence homology: similarity is a measured degree of match, while homology is an inference of common ancestry
 Homology is more difficult to detect over greater evolutionary distances
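One simple way to put a number on "degree of match between corresponding positions" is percent identity over an existing alignment. The sketch below is only an illustration: the function name `percent_identity` is hypothetical, and dividing by the full alignment length (gap columns included) is an assumption, since other denominators are also in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns that hold the same residue.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Assumption: the denominator is the alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    # A gap never counts as a match, even if both rows show '-'.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACTG", "AC-G"))  # 75.0
```

A high percent identity suggests, but does not prove, homology; the number itself only describes similarity.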
Causes of Gene (dis) similarity

 Mutation (substitution): a nucleotide at a certain location is replaced by another nucleotide (e.g.: ATA → AGA)
 Insertion: at a certain location one new nucleotide is inserted between two existing nucleotides (e.g.: AA → AGA)
 Deletion: at a certain location one existing nucleotide is deleted (e.g.: ACTG → AC-G)
 Indel: an insertion or a deletion
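Given two sequences that are already aligned, each column can be labelled with one of the events listed above. This is a minimal sketch; the function name `classify_columns` and the three label strings are hypothetical choices for illustration.

```python
def classify_columns(aligned_a: str, aligned_b: str) -> list:
    """Label each alignment column as 'match', 'substitution' or 'indel'."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            # From the alignment alone we cannot tell an insertion in one
            # sequence from a deletion in the other, hence "indel".
            labels.append("indel")
        elif a == b:
            labels.append("match")
        else:
            labels.append("substitution")
    return labels


print(classify_columns("ATA", "AGA"))    # ['match', 'substitution', 'match']
print(classify_columns("ACTG", "AC-G"))  # ['match', 'match', 'indel', 'match']
```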
Sequence Alignment

 Find the similarity between two (or more) DNA sequences by finding a good alignment between them
 An alignment specifies which positions in the two sequences match
Sequence Alignment
 A sequence alignment is an arrangement of two or more sequences that highlights their similarity.
 The sequences are padded with gaps (dashes) so that, wherever possible, columns contain identical characters from the sequences involved.
Sequence Alignment

 Pairwise sequence alignment: methods concerned with finding the best-matching local or global alignment of two protein (amino acid) or DNA (nucleic acid) sequences.
 Global alignment: an alignment in which all the characters in both sequences participate.
 Local alignment: an alignment of only those regions of the two sequences that are most similar to each other.
Algorithms

 Needleman-Wunsch: pairwise global alignment only.
 Smith-Waterman: pairwise local alignment.
 BLAST: pairwise heuristic local alignment.
The Needleman-Wunsch algorithm

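 The Needleman-Wunsch algorithm (1970, J Mol Biol. 48(3):443-53) performs a global alignment of two sequences (x and y) and can be applied to protein or nucleotide sequences.
 The Needleman-Wunsch algorithm is an example of dynamic programming, and is guaranteed to find the alignment with the maximum score under the chosen scoring scheme.
 E.g., for sequences x and y, each cell of the score matrix F is computed as
F(i, j) = max { F(i-1, j-1) + s(xi, yj), F(i-1, j) + d, F(i, j-1) + d }
where s(xi, yj) is the substitution score and d is the gap penalty (a negative number, as in the examples that follow)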
Dynamic Programming-steps

1. Initialization of the score matrix
2. Calculation of scores and filling the traceback matrix
3. Deducing the alignment from the traceback matrix
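The three steps above can be sketched in Python as follows. This is one possible implementation, not necessarily the one used in the lecture. Assumptions: a simple match/mismatch score stands in for a general substitution score, the default values are borrowed from the exercise later in these slides, the gap penalty is a negative number that is added, and ties are broken in favour of the diagonal move.

```python
def needleman_wunsch(x, y, match=2, mismatch=-1, gap=-2):
    """Global alignment of x and y; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)

    # Step 1: initialise the score matrix and the traceback matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    trace = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], trace[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], trace[0][j] = j * gap, "left"

    # Step 2: fill both matrices cell by cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, "diag"),  # align x[i-1] with y[j-1]
                (score[i - 1][j] + gap, "up"),        # gap in y
                (score[i][j - 1] + gap, "left"),      # gap in x
            ]
            # max() keeps the first best candidate, so ties prefer "diag".
            score[i][j], trace[i][j] = max(candidates, key=lambda c: c[0])

    # Step 3: walk back from the bottom-right corner to the origin.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = trace[i][j]
        if move == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("ACTG", "ACG"))  # (4, 'ACTG', 'AC-G')
```

The usage line reuses the deletion example ACTG → AC-G: the algorithm recovers the single gap that explains the difference.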
Let’s work on this simple example

 Input: AAG (sequence #1), AGC (sequence #2)
 Gap penalty = -5
 Step 1
 Step 2
 Final Table
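The initialization step for this example depends only on the gap penalty, so it can be reproduced without knowing the substitution scores (which are not stated in the text here). A small sketch, with the hypothetical helper name `init_score_matrix`; inner cells are left at 0 as placeholders until they are filled in the calculation step.

```python
def init_score_matrix(x, y, gap):
    """Score matrix with only the first row and column filled in."""
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # x[:i] aligned against gaps only
    for j in range(1, cols):
        F[0][j] = j * gap  # y[:j] aligned against gaps only
    return F


for row in init_score_matrix("AAG", "AGC", gap=-5):
    print(row)
# [0, -5, -10, -15]
# [-5, 0, 0, 0]
# [-10, 0, 0, 0]
# [-15, 0, 0, 0]
```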
Exercise

 Seq 1: GAATTC, Seq 2: GATAC
 Match = 2, mismatch = -1, gap = -2
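To check a hand-filled table for this exercise, the score matrix can be computed with a few lines of Python. This is a sketch under the exercise's scoring scheme; the function name `nw_score_matrix` is hypothetical, and only the scores are computed (no traceback).

```python
def nw_score_matrix(x, y, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch score matrix (rows follow x, columns follow y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,  # diagonal: match or mismatch
                          F[i - 1][j] + gap,      # from above: gap in y
                          F[i][j - 1] + gap)      # from the left: gap in x
    return F


F = nw_score_matrix("GAATTC", "GATAC")
for row in F:
    print(row)
print("optimal global score:", F[-1][-1])  # 5
```

The bottom-right cell holds the optimal global score; one alignment that reaches it is GAATTC over GA-TAC (four matches, one mismatch, one gap).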
Smith-Waterman local alignment
Example 1 :
Seq 1: GAATTC Seq 2 : GATAC
Match = 2 , mismatch = -1, gap = -2
Example 2

 Seq 1: GAATTCAT, Seq 2: CCTCATG
 Starting score: 0, match = 2, mismatch = -1, gap = -2
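Both Smith-Waterman examples can be checked with the sketch below. It differs from Needleman-Wunsch in three places: the first row and column start at 0, no cell is allowed to drop below 0, and the traceback starts at the highest-scoring cell and stops at the first 0. The function name `smith_waterman` and the tie-breaking order (diagonal, then up, then left) are assumptions made for illustration.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Local alignment of x and y; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # starting score 0 everywhere
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("GAATTCAT", "CCTCATG"))  # (8, 'TCAT', 'TCAT')
print(smith_waterman("GAATTC", "GATAC")[0])   # 5
```

For Example 2 the best local alignment is the shared segment TCAT (four matches, score 8). For Example 1 the local score equals the global score of 5, because the two sequences are similar along their whole length.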
