8-5-19-Sequence Alignment in Gpu
SEQUENCE ALIGNMENT
IN GPU PLATFORM
PRESENTED BY:
1. KINJAL RAY (510517001)
2. KISHAN SAHU (510517009)
3. SOLANKI KUNDU (510517022)
4. DEBAROTI CHOWDHURY (510517079)
GUIDED BY:
Dr. SURAJEET GHOSH
Contents:
Sequence Alignment
CUDA/GPU Architecture
Objectives of Different Sequence Alignment Algorithms
Smith-Waterman Algorithm
Needleman-Wunsch Algorithm
Memory-Efficient DNA Sequence Alignment Technique Using a Pointing Matrix
Time-Efficient Alignment Using a Skewing Transformation over the Pointing Matrix
Simulations
Conclusion
References
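Sequence Alignment
What is Sequence Alignment?
Arranging two or more DNA, RNA, or protein sequences, with gaps inserted where needed, so that similar regions line up column by column.
Why align Sequences?
Similar regions can point to structural, functional, or evolutionary relationships between the sequences.
Types:
Global Alignment: aligns the sequences over their entire length.
- Example: Needleman-Wunsch algorithm
ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA
ACTACTAGATT----ACGGATC--GTACTTTAGAGGCTAGCAACCA
Local Alignment: aligns only the most similar subregions of the sequences.
- Example: Smith-Waterman algorithm
ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA
TACTCACGGATGAGGTACTTTAGAGGC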
CUDA/GPU ARCHITECTURE [2]:
What is CUDA?
CUDA lets programmers write code for NVIDIA graphics cards using an extension of the C programming language.
Philosophy: provide a minimal set of extensions that exposes the GPU's computing power.
Declaration specifier: __global__ void KernelFunc(); // kernel function: runs on the device (GPU), called from the host
Why GPU?
High performance
Running many tasks in parallel
Applications of CUDA:
3-D image analysis
Bioinformatics
Biological simulations
Objective:
Needleman-Wunsch algorithm: to perform global sequence alignment between two nucleotide or amino acid sequences and find out structural or functional similarities using dynamic programming.
Memory-efficient alignment using a pointing matrix: The proposed DNA sequence alignment technique uses a novel concept, the pointing matrix. Following the directed path in the pointing matrix finds the optimal alignment faster while retaining the accuracy guaranteed by the well-known Needleman-Wunsch algorithm.
Animation source: Wikipedia
Needleman-Wunsch Algorithm [1]
Filling up the dynamic programming matrix dp[n+1][n+1] (two sequences of length n, plus one extra row and column for leading gaps):
The first row and first column are filled with index × gap penalty:
dp[i][0] = dp[0][i] = GAP * i, for i = 0 to n
Sequence Alignment using Pointing Matrix:
The alignment begins from the last cell (last row, last column) of the pointing matrix, i.e., the bottom-right corner cell (n + 1, m + 1).
As discussed earlier, each move to a neighboring cell (horizontal, vertical, or diagonal) is decided by the pointing matrix's cell value:
A value of '1' denotes a vertical move: the corresponding nucleotide of sequence S2 is placed, while a gap "−" is placed in S1.
A value of '2' denotes a diagonal move: the corresponding nucleotides of both sequences (S1 and S2) are placed.
A value of '3' denotes a horizontal move: the corresponding nucleotide of S1 is placed, and a gap is placed in the alignment of the second DNA strand (S2).
The same procedure is repeated until the traversal reaches cell (0, 0).
Figure: (a) Generation of the aligned sequence using the pointing matrix, (b) status of the score matrix after full traversal of the sequences, and (c) final aligned sequences of S1 and S2.
Time-Efficient Alignment using Skewing Transformation over Pointing Matrix [5]:
DNA_1 = A C T, m = length(DNA_1) = 3
DNA_2 = A C A A, n = length(DNA_2) = 4
The steps of the algorithm are as follows:
Initialization: We use a Foundation Matrix FM[3][min(m+1, n+1)] and fill
FM[0][0] = 0,
FM[1][0] = -GAP,
FM[1][1] = -GAP
Initialization of Traceback Matrix (TM): TM[m+1][n+1]
TM[0][0] = 'E' (E = End point)
TM[0][i] = 'L' (L = Left), i = 1 to n
TM[i][0] = 'U' (U = Up), i = 1 to m
For DNA_1 and DNA_2 above, the first row of TM becomes E L L L L and the first column below E becomes U, U, U.
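Traceback Matrix Fill using Foundation Matrix:
Foundation Matrix row indices (LP, MP, FP) used at each step:
LP MP FP
2 1 0
0 2 1
1 0 2
2 1 0
0 2 1
1 0 2
Figure: Traceback Matrix with DNA_2 = A C A A along the columns and DNA_1 = A C T along the rows.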
Three different approaches for sequences of different lengths, where m = length(Seq_1) and n = length(Seq_2):
For m = n
For m > n
For m < n
Figure: END point, initialization, and traversal of the Traceback Matrix for each case.
Up to m-fold parallelization: the diagonal value is first taken from the 1st position of the 1st array of the Foundation Matrix, and after that from the 2nd position of the 1st array, respectively.
Simulations
System specifications:
GPU : NVIDIA GeForce 940MX
CPU : Intel(TM) i5 2.50GHz
Memory : 8 GB
Constraints:
Maximum sequence length : 300 characters
Maximum number of simultaneous queries (maximum number of threads): 1024
CUDA IMPLEMENTATION
Alignment Time
Length nw(μs) nm(μs) pm(μs) seq(μs)
16 190.63 343.97 202.59 2479
32 330.88 814.7 375.81 5477
64 914.38 3015 1112.4 12100
128 2753 10041 3547.3 26735
256 9990 43413 12998 59167
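One way to read this table is the speedup of the nw column over the seq column, i.e. the ratio of the two times. Assuming seq denotes the sequential baseline (the abbreviation is not expanded on the slide), the shortest and longest lengths give:

```latex
S(L) = \frac{T_{\mathrm{seq}}(L)}{T_{\mathrm{nw}}(L)}, \qquad
S(16) = \frac{2479}{190.63} \approx 13.0, \qquad
S(256) = \frac{59167}{9990} \approx 5.92
```

So within the measured range the relative gain of nw over seq shrinks as the sequence length grows.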
memcpy HtoD
memcpy DtoH
Length nw(μs) nm(μs) pm(μs)
16 2.14 2.14 2.11
32 2.23 2.17 2.14
64 2.24 2.3 2.17
128 2.28 2.36 2.17
256 2.38 2.49 2.33
C IMPLEMENTATION
Alignment Time
Length pm (μs) pm_skew (10 ms)
16 1.994 8.6457
32 7.979 21.1027
64 34.903 63.6043
128 271.642 235.5422
256 805.223 620.4874
Sequence Length vs Alignment Time (Pointing Matrix Implementation)
Length pm (μs)
16 1.994
32 7.979
64 34.903
128 171.642
256 805.223
Sequence Length vs Alignment Time (Pointing Matrix with Skewing Transformation)
Length pm_skew
16 86.457
32 211.027
64 636.043
128 2355.422
256 6204.874
Conclusion and Future Work:
Conclusion:
A memory-efficient approach and a time-efficient approach for DNA sequence alignment have been presented.
Parallel computations
Performance optimization
Minimize CPU-GPU data transfers
Future Work:
Study of different sequence alignment algorithms
Improvement of the implemented algorithm to support longer sequences
CUDA implementation of the proposed time-efficient algorithm
References
[1] S. B. Needleman and C. D. Wunsch, "A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins," Journal of Molecular Biology, vol. 48, no. 3, pp. 443-453, 1970.
[2] "NVIDIA CUDA Programming Guide, Version 4.2." [Online]. Available: http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
[3] T. F. Smith and M. S. Waterman, "Identification of Common Molecular Subsequences," Journal of Molecular Biology, vol. 147, no. 1, pp. 195-197, Mar. 1981.
[4] M. Farrar, "Striped Smith-Waterman speeds database searches six times over other SIMD implementations," Bioinformatics, vol. 23, no. 2, pp. 156-161, 2007.
[5] S. S. Ray, A. Banerjee, A. Datta and S. Ghosh, "A memory efficient DNA sequence alignment technique using pointing matrix," 2016 IEEE Region 10 Conference (TENCON), Singapore, 2016, pp. 3559-3562, doi: 10.1109/TENCON.2016.7848720.
THANK YOU