Sequence Alignment

Uploaded by

aimalktk02

Assignment 2

NEXT GENERATION SEQUENCING

WHAT ARE THE MAIN STEPS?
WHAT IS THE BENEFIT?
Sequence Alignment

• Sequence alignment is the process by which two or more biological
sequences are matched to show optimal similarity.
• DNA sequence alignments, RNA sequence alignments and protein
sequence alignments are routinely performed.
• Sequence alignment is useful for inferring function, structure and
evolutionary information.
• As expected, sequence comparison through sequence alignment is central
to most bioinformatics analyses.
• It is the first step towards understanding the evolutionary relationship
and the pattern of divergence between two sequences.
• The relationship between two sequences also helps predict the potential
function of an unknown sequence, thereby indicating protein family
relationships.
Sequence Alignment

• As new biological sequences are being generated at exponential
rates, sequence comparison is becoming increasingly important for
drawing functional and evolutionary inferences about a new protein by
comparing it with proteins already in the database.
• The most fundamental process in this type of comparison is
sequence alignment.
• Sequence alignment is an important first step toward structural and
functional analysis of newly determined sequences.
• Sequence identity means the same residues being present at
corresponding positions in the two sequences being compared. For
proteins, it means the same amino acids; for nucleic acids, it means
the same bases.
• Sequence similarity means similar residues being present at
corresponding positions in the two sequences being compared. For
nucleic acids, sequence similarity and sequence identity are the
same. However, for proteins, sequence similarity involves amino
acids with similar physicochemical and functional properties.
DNA Sequence similarity Vs sequence Identity

• For DNA and RNA nucleotide sequences, both terms have the same
meaning, because a pair of aligned bases is either identical or not.
Calculating seq.ID and seq.Sim in DNA and RNA strands

% Seq. ID = (No. of identities / Total No. of residues in the shorter seq.) x 100

Calculate for AB, BC and AC.
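
The percent-identity formula above can be sketched in a few lines of Python. This is only an illustration: the function name `percent_identity` and the two example sequences are hypothetical and do not come from the slides, and the sketch assumes the two sequences are already aligned column by column, with "-" marking a gap.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences, relative to the shorter one.

    Positions are compared column by column; a gap character ('-') never
    counts as an identity. The denominator is the number of residues
    (non-gap characters) in the shorter sequence.
    """
    identical = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    shorter = min(len(seq_a.replace("-", "")), len(seq_b.replace("-", "")))
    if shorter == 0:
        raise ValueError("both sequences must contain at least one residue")
    return 100.0 * identical / shorter


# Hypothetical example sequences (not taken from the slides):
# 6 of the 8 aligned positions carry the same base.
print(percent_identity("ATGCATGC", "ATGGATGA"))  # 75.0
```

For DNA and RNA the same number also serves as the percent similarity, since the two terms coincide for nucleotides.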

ASSIGNMENT

Calculate the % similarity and % identity between all pairs of taxon sequences:

T1-T2 T2-T3 T3-T4 T4-T5

T1-T3 T2-T4 T3-T5
T1-T4 T2-T5
T1-T5
Protein Sequence similarity Vs sequence Identity
Protein Scoring Systems
• Amino acids have different biochemical and physical properties
that influence their relative replaceability in evolution.
Protein homology
• Similar substitutions are also referred to as conservative
substitutions. A conservative amino acid substitution is not expected
to disrupt the structural/functional attributes of the protein.
• Sequence homology is an evolutionary term. Sequences are called
homologous if they have a common evolutionary origin, that is, if they
are derived from a common ancestral sequence.
• So, sequences are either homologous or not homologous, and homology
cannot be quantified. However, even now, expressions like “high
homology,” “significant homology,” and even a specified “% homology”
are very widely used.
Calculating seq.ID and seq.Sim in Proteins

• In proteins, seq. identity and seq. similarity are different things.
Examples of similar amino acid pairs:

Alanine - Glycine
Aspartic acid - Glutamic acid
Lysine - Arginine
Threonine - Serine
Calculating seq.ID and seq.Sim in Proteins

Seq identity = (number of identical a.a. / number of all a.a.) x 100

Seq similarity = (number of similar a.a. / number of all a.a.) x 100
Find out the % similarity and % identity of the two peptide sequences
Seq.1 M C G T Q K H D L G V Y F H R P Q D Y
Seq.2 M C A T Q H H D I G T Y F H K P Q E W
Pairwise Alignment
• The alignment of two sequences (DNA or protein) is a
relatively straightforward computational problem.

1. Two sequences can always be aligned.
2. Sequence alignments have to be scored.
3. Often there is more than one solution with the same score.
Types of Alignments
• The overall goal of pairwise sequence alignment is to find the best pairing
of two sequences, such that there is maximum correspondence among
residues.
• To achieve this goal, one sequence needs to be shifted relative to the
other to find the position where maximum matches are found.
• There are two different alignment strategies that are often used:
1. Global Alignment
2. Local Alignment
Global Alignment
• A global sequence-alignment method aligns and compares two sequences
along their entire length, and comes up with the best alignment that
displays the maximum number of nucleotides or amino acids aligned.
• The algorithm that drives global alignment is the Needleman-Wunsch
algorithm, which is used in bioinformatics to align protein or nucleotide
sequences. It was one of the first applications of dynamic programming
to the comparison of biological sequences.
• A global alignment algorithm starts at the beginning of the two sequences
and adds gaps to each as needed until the end is reached.
• Global alignment works best when the sequences are similar in
character and length. Because global alignment displays the best alignment
between two sequences using the entire sequence, it may miss a small
region of biological importance.
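
A minimal sketch of a Needleman-Wunsch style global alignment follows. It uses a simple linear gap penalty and illustrative scores (match = 1, mismatch = -1, gap = -2); these values, the function name `needleman_wunsch`, and the tie-breaking order in the traceback are assumptions made for the example, not a reference implementation.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_s1, aligned_s2). When several alignments share
    the best score, the traceback prefers diagonal moves, then gaps in s2.
    """
    n, m = len(s1), len(s2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the matrix one row at a time.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align s1[i-1] with s2[j-1]
                score[i - 1][j] + gap,      # gap in s2
                score[i][j - 1] + gap,      # gap in s1
            )
    # Trace back from the bottom-right corner to the top-left corner.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out1.append(s1[i - 1])
                out2.append(s2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(s1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(s2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))


best, top, bottom = needleman_wunsch("ATTGTAC", "ATGTTAC")
print(best)
print(top)
print(bottom)
```

Because every residue of both sequences must appear in the output, the traceback always runs from the last cell back to the first, which is what makes the alignment global.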
Local Alignment

• In contrast to global alignment, local sequence alignment is intended
to find the most similar regions in two sequences being aligned. The
algorithm that drives local alignment is the Smith-Waterman
algorithm.
• A local alignment algorithm finds the region of highest similarity
between two sequences and builds the alignment outward from this
region.
• Local alignment is useful for sequences that are not similar in
character and length, yet are suspected to contain small regions of
similarity, such as biologically important motifs.
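
The following is a minimal sketch of a Smith-Waterman style local alignment, again with a linear gap penalty and illustrative scores (match = 1, mismatch = -1, gap = -2). The function name `smith_waterman` and the example sequences are hypothetical. The two differences from a global alignment are that no cell is allowed to drop below zero and that the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(s1, s2, match=1, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_region_of_s1, aligned_region_of_s2).
    """
    n, m = len(s1), len(s2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a fresh local alignment
                score[i - 1][j - 1] + sub,  # extend diagonally
                score[i - 1][j] + gap,      # gap in s2
                score[i][j - 1] + gap,      # gap in s1
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero is reached.
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out1.append(s1[i - 1])
            out2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out1.append(s1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))


# Hypothetical sequences that share only the short motif GATTACA.
print(smith_waterman("CCCGATTACACCC", "TTGATTACATT"))
```

The flanking residues of the two example sequences do not match at all, so a global alignment would be dominated by mismatches, while the local alignment reports just the shared motif.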
Local and global Alignment in DNA or RNA
Local and global Alignment in proteins
Methods of Alignment

• By hand - slide sequences on two lines of a word processor
• Dot plot
  • with windows
• Dynamic programming
  • Needleman-Wunsch, Smith-Waterman (slow, optimal)
• Heuristic methods (fast, approximate)
  • BLAST and FASTA
DNA sequence Alignment “By Hand”
Working by hand

SEQ1. TATGTCG
SEQ2. TCAGTGC

match =
mismatch =
gap =
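
Sliding one sequence along the other by hand amounts to counting matches at every possible offset. A small sketch of that idea, assuming no gaps are inserted, is given below; the function name `matches_by_offset` is hypothetical, and the sketch only counts matches rather than applying any particular match, mismatch or gap scores.

```python
def matches_by_offset(seq1: str, seq2: str) -> dict:
    """Slide seq2 along seq1 and count matching positions at each offset.

    Offset 0 places both sequences start to start; a positive offset shifts
    seq2 to the right, a negative offset shifts it to the left.
    """
    counts = {}
    for offset in range(-(len(seq2) - 1), len(seq1)):
        matches = 0
        for j, residue in enumerate(seq2):
            i = offset + j
            if 0 <= i < len(seq1) and seq1[i] == residue:
                matches += 1
        counts[offset] = matches
    return counts


counts = matches_by_offset("TATGTCG", "TCAGTGC")
for offset in sorted(counts):
    print(f"offset {offset:+d}: {counts[offset]} matches")
```

The offsets with the most matches are the natural starting points for deciding where a gap might improve the alignment.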
DNA sequence Alignment “by Dot matrix”

• The basic sequence alignment method is the dot matrix or dot plot
method. In this method, two sequences being compared are written
in the vertical and horizontal axes of the matrix.
• Then each residue is scanned and each match is given a dot;
mismatches are left blank. When enough dots are lined up, they are
connected to form a diagonal line.

Example sequences:
ATTGTAC
ATGTTAC

• A shift down in the diagonal means a gap must be added in sequence 1.
• A shift right in the diagonal means a gap must be added in sequence 2.
• A break in the diagonal means there is a mismatch.
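
A dot matrix is easy to generate programmatically. The sketch below marks every matching pair of residues with "*" and leaves mismatches as "."; it uses a window of one residue and does no filtering, so it is the simplest form of the method. The function names `dot_matrix` and `render` are hypothetical, and which sequence goes on which axis is an arbitrary choice here.

```python
def dot_matrix(seq_vertical: str, seq_horizontal: str):
    """Return a grid of booleans: True where the two residues match."""
    return [[a == b for b in seq_horizontal] for a in seq_vertical]


def render(seq_vertical: str, seq_horizontal: str) -> str:
    """Draw the dot matrix as text, with '*' for a match and '.' otherwise."""
    matrix = dot_matrix(seq_vertical, seq_horizontal)
    lines = ["  " + " ".join(seq_horizontal)]
    for residue, row in zip(seq_vertical, matrix):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


print(render("ATTGTAC", "ATGTTAC"))
```

Runs of "*" along a diagonal correspond to stretches where the two sequences agree; isolated dots off the diagonal are chance matches, which is why larger windows are often used to reduce noise.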
SCORING MATRIX, ALIGNMENT SCORE, AND STATISTICAL SIGNIFICANCE OF SEQUENCE ALIGNMENT
• For both nucleic acids and proteins, the alignment score is calculated
using a scoring matrix.
• A scoring matrix is a set of values representing the likelihood of one
residue being substituted by another during sequence divergence
through evolution.
• This is why the scoring matrix is also known as the substitution
matrix.
[Figure: example scoring matrix with columns A, C, A, G and rows A, C, T, G. Match: 1, Mismatch: -1]
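
A match/mismatch scheme like this one can be written as a small lookup table. The sketch below builds such a table for the four DNA bases and uses it to score two sequences position by position without gaps. The names `make_matrix` and `score_ungapped` are hypothetical, and the example pair ACAG / ACTG is only one reading of the figure's labels, so treat it as an assumption.

```python
BASES = "ACGT"


def make_matrix(match: int = 1, mismatch: int = -1) -> dict:
    """Build a nucleotide scoring matrix as a dict keyed by (base, base)."""
    return {
        (a, b): (match if a == b else mismatch)
        for a in BASES
        for b in BASES
    }


def score_ungapped(seq1: str, seq2: str, matrix: dict) -> int:
    """Sum the matrix scores of two equal-length sequences, column by column."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))


matrix = make_matrix()
# Three matches and one mismatch: 1 + 1 - 1 + 1
print(score_ungapped("ACAG", "ACTG", matrix))  # 2
```

Protein scoring matrices work the same way but assign a separate value to every amino acid pair instead of a single mismatch score.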
Alignment Scoring function
• In the case of sequence alignment, dynamic programming involves
setting up a two-dimensional matrix in which one sequence is listed
vertically and the other sequence is listed horizontally; then
calculating the scores, one row at a time.
• For example, the scores can be set as:
  • match = 1
  • mismatch = -1
  • gap = -2
• A 100% perfect alignment will produce a diagonal straight line (with a
negative slope) spanning from the top left to bottom right.
• If the alignment is not perfect, gaps are introduced in the matrix. For
the sequence represented horizontally, gaps are introduced vertically,
and for the sequence represented vertically, gaps are introduced
horizontally, and the alignment is determined by a trace back step.
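
The row-by-row fill described above can be sketched as follows, using the example values match = 1, mismatch = -1 and gap = -2. The function name `fill_score_matrix` and the two short sequences are hypothetical. The sketch only fills and prints the matrix; the traceback step is left out.

```python
def fill_score_matrix(vertical, horizontal, match=1, mismatch=-1, gap=-2):
    """Fill a global-alignment score matrix one row at a time.

    Row 0 and column 0 hold the cost of aligning a prefix against gaps only.
    Every other cell takes the best of the diagonal, upper and left neighbours.
    """
    rows, cols = len(vertical) + 1, len(horizontal) + 1
    matrix = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        matrix[0][j] = j * gap
    for i in range(1, rows):
        matrix[i][0] = i * gap
        for j in range(1, cols):
            same = vertical[i - 1] == horizontal[j - 1]
            diagonal = matrix[i - 1][j - 1] + (match if same else mismatch)
            from_above = matrix[i - 1][j] + gap
            from_left = matrix[i][j - 1] + gap
            matrix[i][j] = max(diagonal, from_above, from_left)
    return matrix


# Hypothetical example sequences.
vertical, horizontal = "ACTG", "ACAG"
matrix = fill_score_matrix(vertical, horizontal)
print("      " + "".join(f"{base:>4}" for base in horizontal))
for label, row in zip(" " + vertical, matrix):
    print(f"{label:>2}" + "".join(f"{value:>4}" for value in row))
```

The bottom-right cell holds the score of the best global alignment of the two full sequences, and the traceback starts from that cell.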
Draw the scoring matrix of the given
sequences

A3
ALIGNMENT ALGORITHMS, GAPS, AND GAP
PENALTIES
• An algorithm is a step-by-step procedure that utilizes a finite number
of instructions for automated reasoning and the calculation of a
function.
• The algorithm that drives global alignment is the Needleman-Wunsch
algorithm, and the algorithm that drives local alignment is the Smith-
Waterman algorithm.
• Both these algorithms are examples of dynamic programming.
Dynamic programming is a method for solving complex problems by
breaking them down into simpler sub-problems.
Alignment Scoring function
• In both global and local alignment, the final output is given an
alignment score.
• Gaps have to be introduced to improve the alignment. Gaps are
introduced because one of the sequences may have gained or lost
residues (insertion/deletion) during evolution in a way that the
other sequence did not.
• The gap penalty value is subtracted from the gross alignment score
to obtain the final alignment score. Ideally, no more than 1 gap is
inserted per 20 amino acid residues, but that is not possible in
most cases.
• For each gap opened, a gap-opening penalty value is assigned, and
for each position by which a gap is extended, a gap-extension penalty
value is assigned.
• A gap-opening penalty is usually much higher than a gap-extension
penalty. Often, default values of -10 for the gap-opening penalty and -1
for the gap-extension penalty are used.
Alignment Score

• A scheme with separate, adjustable penalties for gap opening and gap
extension is called an affine gap penalty.
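
To make the affine gap penalty concrete, the sketch below scores an alignment that has already been built. It assumes the first position of each gap costs the gap-opening penalty (-10) and every further position of the same gap costs the gap-extension penalty (-1); some programs instead charge opening plus extension for the first position. The match and mismatch values (1 and -1), the function name `affine_alignment_score` and the example alignments are illustrative assumptions.

```python
def affine_alignment_score(aligned1, aligned2, match=1, mismatch=-1,
                           gap_open=-10, gap_extend=-1):
    """Score two aligned strings ('-' marks a gap) with an affine gap penalty."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    gap_in_1 = gap_in_2 = False  # are we inside a gap run in row 1 / row 2?
    for x, y in zip(aligned1, aligned2):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-":
            score += gap_extend if gap_in_1 else gap_open
            gap_in_1, gap_in_2 = True, False
        elif y == "-":
            score += gap_extend if gap_in_2 else gap_open
            gap_in_1, gap_in_2 = False, True
        else:
            score += match if x == y else mismatch
            gap_in_1 = gap_in_2 = False
    return score


# One gap of length 3: 8 matches + (-10 - 1 - 1)
print(affine_alignment_score("ACGT---ACGT", "ACGTTTTACGT"))  # -4
# Two separate gaps of length 1: 3 matches + (-10 - 10)
print(affine_alignment_score("A-C-G", "ATCTG"))  # -17
```

The comparison shows the intended effect: one long gap is penalized less than several short ones, which reflects the idea that a single insertion or deletion event can span several residues.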
