8-5-19-Sequence Alignment in Gpu


Uploaded by

Siba Prasad

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

43 views26 pages

8-5-19-Sequence Alignment in Gpu

Uploaded by

Siba Prasad

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 26

Indian Institute of Engineering Science and Technology, Shibpur, Howrah, India

SEQUENCE ALIGNMENT
IN GPU PLATFORM
PRESENTED BY:
1. KINJAL RAY (510517001)
2. KISHAN SAHU (510517009)
3. SOLANKI KUNDU (510517022)
4. DEBAROTI CHOWDHURY (510517079)
GUIDED BY: Dr. SURAJEET GHOSH

Content:
- Sequence Alignment
- CUDA/GPU Architecture
- Objectives of Different Sequence Alignment Algorithms
- Smith-Waterman Algorithm
- Needleman-Wunsch Algorithm
- Memory Efficient DNA Sequence Alignment Technique Using Pointing Matrix
- Time Efficient Alignment Using Skewing Transformation over Pointing Matrix
- Simulations
- Conclusion
- References

Sequence Alignment
- What is sequence alignment?
- Why align sequences?
- Types:
  - Global alignment (example: Needleman-Wunsch algorithm) aligns the two sequences over their entire length:

    ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA
    |||||||||||    |||||||  |||||||||||||| |||||||
    ACTACTAGATT----ACGGATC--GTACTTTAGAGGCTAGCAACCA

  - Local alignment (example: Smith-Waterman algorithm) aligns only the most similar region:

    ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA
              |||| |||||| |||||||||||||||
              TACTCACGGATGAGGTACTTTAGAGGC

CUDA/GPU ARCHITECTURE [2]:
- What is CUDA?
  - CUDA allows the programmer to program NVIDIA graphics cards with an extension of the C programming language.
  - Philosophy: provide a minimal set of extensions that exposes the power of the GPU.
  - Declaration specifiers:
      __global__ void KernelFunc(); // kernel function, runs on the device
      __device__ int GlobalVar;     // variable in device memory
      __shared__ int SharedVar;     // variable in per-block shared memory
- Why GPU?
  - High performance
  - Running many tasks in parallel
- Applications of CUDA:
  - 3-D image analysis
  - Bioinformatics
  - Biological simulations
Objective:
- Needleman-Wunsch algorithm: to perform global sequence alignment between two nucleotide or amino acid sequences and find out structural or functional similarities using dynamic programming.
  Worst-case time complexity: O(mn)
  Worst-case space complexity: O(mn)
- Smith-Waterman algorithm: to perform local sequence alignment between two nucleotide or amino acid sequences and find out structural or functional similarities using dynamic programming.
  Worst-case time complexity: O(mn)
  Worst-case space complexity: O(mn)
Here m and n are the lengths of the sequences being aligned.

- Memory efficient alignment using a pointing matrix: the proposed DNA sequence alignment technique uses a novel concept, the pointing matrix. Following the directed path in the pointing matrix finds the optimal alignment quickly while retaining the accuracy of the well-known Needleman-Wunsch algorithm.
  Time complexity (pointing matrix formation): O(mn)
  Time complexity (scoring matrix formation): O(mn)
  Worst-case space complexity: O(mn + 2*max(m,n)) = O(mn)
- Time efficient alignment using a skewing transformation over the pointing matrix: we present a new parallel approach to the Needleman-Wunsch algorithm for global sequence alignment. It uses a skewing transformation to traverse and compute the dynamic programming matrix.
  Time complexity (skewing matrix formation): O(m+n)
  Time complexity (foundation matrix formation): O(m+n)
  Worst-case space complexity: O(mn + 3*min(m,n)) = O(mn)
Here m and n are the lengths of the sequences being aligned.
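Where the O(m+n) figure for the skewing transformation comes from can be sketched as follows. This assumes one thread per cell of an anti-diagonal, which the slides do not state explicitly; S (score matrix), σ (match/mismatch score) and g > 0 (gap penalty) are symbols introduced here for illustration.

```latex
\[
S(i,j) = \max\bigl\{\, S(i-1,j-1) + \sigma(a_i,b_j),\; S(i-1,j) - g,\; S(i,j-1) - g \,\bigr\}
\]
% With d = i + j, the three predecessors of cell (i,j) lie on anti-diagonals d-2, d-1 and d-1:
\[
(i-1)+(j-1) = d-2, \qquad (i-1)+j = d-1, \qquad i+(j-1) = d-1
\]
% Cells on the same anti-diagonal are therefore independent. There are m+n+1 anti-diagonals,
% each holding at most min(m,n)+1 cells:
\[
T_{\text{parallel}} = O(m+n)\ \text{steps with } \min(m,n)+1 \text{ threads}, \qquad
W = (m+1)(n+1) = O(mn)
\]
```

The same dependency pattern is why only three anti-diagonals (d, d-1 and d-2) need to be kept in fast storage at any moment.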
Smith-Waterman algorithm [3]
This algorithm was proposed by Temple F. Smith and Michael S. Waterman in 1981.
The algorithm consists of the following steps:
- Determine the substitution matrix and the gap penalty scheme.
- Initialize the scoring matrix.
- Fill the matrix using the chosen scoring scheme.
- Trace back from the highest-scoring cell to obtain an optimal local alignment.

Animation source: Wikipedia
Needleman-Wunsch algorithm [1]

Filling up the dynamic programming matrix dp[m+1][n+1] (m and n are the lengths of the two sequences)
- The first row and first column are filled with index * gap penalty:
  dp[i][0] = GAP * i,  dp[0][j] = GAP * j
- For every other cell (i, j) in the matrix we compute:
  top      = dp[i-1][j] + GAP
  left     = dp[i][j-1] + GAP
  diagonal = dp[i-1][j-1] + MATCH,     if the corresponding characters match
  diagonal = dp[i-1][j-1] + MISMATCH,  if the corresponding characters do not match
  dp[i][j] = max(top, left, diagonal)
- The traceback matrix records, for each cell, which of the three moves produced the maximum.

Tracing arrows back to origin
The alignment is constructed by these rules:
- A diagonal arrow represents a match or mismatch, so the letter of the column and the letter of the row of the origin cell are aligned.
- A horizontal or vertical arrow represents an indel. Horizontal arrows align a gap ("-") to the letter of the row (the "side" sequence); vertical arrows align a gap to the letter of the column (the "top" sequence).
- If there are multiple arrows to choose from, they represent a branching of the alignments. If two or more branches all belong to paths from the bottom-right to the top-left cell, they are equally viable alignments. In this case, note the paths as separate alignment candidates.

Sequences    Best alignments
---------    ------------------------------
GCATGCU      GCATG-CU   GCA-TGCU   GCAT-GCU
GATTACA      G-ATTACA   G-ATTACA   G-ATTACA
A Memory Efficient DNA Sequence Alignment Technique Using Pointing Matrix [5]:
Formation of Pointing Matrix:
- Filling up each cell requires only the information of its three previously filled neighboring cells. Hence, each row i requires information from only its previous row (i − 1), so the current row is kept and its content is then written over the previous row.
- At first, the first row is filled based on the initial gap penalties, and the second row is filled by consulting the first row's content.
- Afterward, the first row is overwritten by the scores of the second row, and the second row's storage is then used to calculate the third row from the content now held in the first row.
- Similarly, the fourth row (which uses information from only the third row) is calculated and overwrites the second row. No information is actually lost.
[Figure: (a) two DNA sequences S1 and S2; (b) time-variant score matrix shown two rows at a time (OR and CR); (c) pointing matrix, where 0: termination point, 1: vertical move, 2: diagonal move, 3: horizontal move.]
[Figure: pseudo code for creation of the pointing and scoring matrices]

Sequence Alignment using Pointing Matrix:
- The alignment begins from the last cell (last row, last column), i.e., from the bottom-right corner cell, (n + 1, m + 1) when rows and columns are numbered from 1, of the pointing matrix.
- As discussed earlier, movement between neighboring cells (horizontal, vertical and diagonal) takes place based on the pointing matrix's cell value.
- A vertical movement happens if the cell value is '1': the corresponding nucleotide of sequence S2 is placed, while a gap "−" is placed in S1.
- If the cell value is '2', a diagonal movement occurs and the corresponding nucleotides of both sequences (S1 and S2) are placed.
- A value of '3' denotes a horizontal move, which places the corresponding nucleotide of S1 and a gap in the alignment of the second DNA strand (S2).
- The same procedure is followed until the traversal reaches the top-left termination cell (value '0').
[Figure: (a) generation of the aligned sequences using the pointing matrix, (b) status of the score matrix after full traversal of the sequences, and (c) final aligned sequences of S1 and S2.]
Time efficient Alignment using skewing transformation over Pointing Matrix [5]:
DNA_1 = A C T,    m = length(DNA_1)
DNA_2 = A C A A,  n = length(DNA_2)
The steps of the algorithm are as follows:
- Initialization of the Foundation Matrix FM[3][min(n+1, m+1)]:
  FM[0][0] = 0
  FM[1][0] = -GAP
  FM[1][1] = -GAP
- Initialization of the Traceback Matrix TM[m+1][n+1]:
  TM[0][0] = 'E'   (E = end point)
  TM[0][i] = 'L'   (L = left),  i = 1 to n
  TM[i][0] = 'U'   (U = up),    i = 1 to m
  so the first row reads E L L L L and the first column reads E U U U.

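Traceback Matrix Fill using Foundation Matrix:
- Fill the Traceback Matrix with the values L, D and U using the Foundation Matrix (FM).

Rotation of the FM row indices over successive anti-diagonals (DNA_1 = ACT along the rows, DNA_2 = ACAA along the columns); the indices LP, MP and FP cycle through the three FM rows from one step to the next:

Step   LP   MP   FP
1      2    1    0
2      0    2    1
3      1    0    2
4      2    1    0
5      0    2    1
6      1    0    2

[Figure: traceback grid with columns A C A A and rows A C T]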
Three different approaches for sequences of different lengths: m = n, m > n and m < n, where m = length(Seq_1) and n = length(Seq_2).
[Figure annotations: end point; initialization; up to m-fold parallelization; the diagonal value is taken from the 1st position of the 1st array, and after that from the 2nd position of the 1st array of the Foundation Matrix, respectively.]

Simulations
- System specifications:
  - GPU: NVIDIA GeForce 940MX
  - CPU: Intel(TM) i5, 2.50 GHz
  - Memory: 8 GB
- Constraints:
  - Maximum sequence length: 300 characters
  - Maximum number of simultaneous queries (maximum number of threads): 1024

CUDA IMPLEMENTATION

Alignment time
Length   nw (μs)   nm (μs)   pm (μs)   seq (μs)
16       190.63    343.97    202.59    2479
32       330.88    814.7     375.81    5477
64       914.38    3015      1112.4    12100
128      2753      10041     3547.3    26735
256      9990      43413     12998     59167

memcpy HtoD (host to device)
Length   nw (μs)   nm (μs)   pm (μs)
16       1.32      1.31      1.31
32       1.4       1.31      1.34
64       1.4       1.32      1.34
128      1.36      1.34      1.47
256      1.55      1.53      1.57

memcpy DtoH (device to host)
Length   nw (μs)   nm (μs)   pm (μs)
16       2.14      2.14      2.11
32       2.23      2.17      2.14
64       2.24      2.3       2.17
128      2.28      2.36      2.17
256      2.38      2.49      2.33

C IMPLEMENTATION

Alignment time
Length   pm (μs)    pm_skew (units of 10 ms)
16       1.994      8.6457
32       7.979      21.1027
64       34.903     63.6043
128      271.642    235.5422
256      805.223    620.4874

Sequence Length vs Alignment Time (Pointing Matrix Implementation)

Length   pm (μs)
16       1.994
32       7.979
64       34.903
128      171.642
256      805.223

Sequence Length vs Alignment Time (Pointing Matrix with Skewing Transformation)

Length   pm_skew (ms)
16       86.457
32       211.027
64       636.043
128      2355.422
256      6204.874

Conclusion and future work:
Conclusion:
- A memory efficient approach and a time efficient approach for DNA sequence alignment have been presented.
- Parallel computations
- Performance optimization
- Minimized CPU-GPU data transfers

Future Work:
- Study of different sequence alignment algorithms
- Improvement of the implemented algorithm to allow longer sequences
- CUDA implementation of the proposed time efficient algorithm

References
[1] S. B. Needleman and C. D. Wunsch, “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins,” Journal of Molecular Biology, vol. 48, no. 3, pp. 443-453, 1970.
[2] “NVIDIA CUDA Programming Guide, Version 4.2.” [Online]. Available: http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
[3] T. F. Smith and M. S. Waterman, “Identification of Common Molecular Subsequences,” Journal of Molecular Biology, vol. 147, no. 1, pp. 195-197, March 1981.
[4] M. Farrar, “Striped Smith-Waterman speeds database searches six times over other SIMD implementations,” Bioinformatics, vol. 23, no. 2, pp. 156-161, 2007.
[5] S. S. Ray, A. Banerjee, A. Datta and S. Ghosh, “A memory efficient DNA sequence alignment technique using pointing matrix,” 2016 IEEE Region 10 Conference (TENCON), Singapore, 2016, pp. 3559-3562, doi: 10.1109/TENCON.2016.7848720.

THANK YOU
