8-5-19-Sequence Alignment in Gpu

1. The document discusses sequence alignment techniques for biological sequences using GPU platforms. 2. It describes the Smith-Waterman and Needleman-Wunsch algorithms for local and global sequence alignment using dynamic programming. The algorithms have quadratic time and space complexity. 3. Memory efficient and time efficient techniques for sequence alignment using pointing matrices and skewing transformations on GPUs are proposed to reduce the computational complexity to linear or quadratic in the shorter sequence length.

Uploaded by

Siba Prasad
Indian Institute of Engineering Science and Technology, Shibpur, Howrah, India

SEQUENCE ALIGNMENT IN GPU PLATFORM

PRESENTED BY:
1. KINJAL RAY (510517001)
2. KISHAN SAHU (510517009)
3. SOLANKI KUNDU (510517022)
4. DEBAROTI CHOWDHURY (510517079)

GUIDED BY:
Dr. SURAJEET GHOSH

Content:
• Sequence Alignment
• CUDA/GPU architecture
• Objectives of the different sequence alignment algorithms
• Smith-Waterman Algorithm
• Needleman-Wunsch Algorithm
• Memory Efficient DNA Sequence Alignment Technique Using Pointing Matrix
• Time Efficient Alignment Using Skewing Transformation over Pointing Matrix
• Simulations
• Conclusion
• References

Sequence Alignment
• What is Sequence Alignment?
• Why align Sequences?
• Types:
  - Global Alignment (the sequences are aligned end to end):
    Example: Needleman-Wunsch algorithm
    ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA
    |||||||||||    |||||||  |||||||||||||| |||||||
    ACTACTAGATT----ACGGATC--GTACTTTAGAGGCTAGCAACCA

  - Local Alignment (a region of one sequence is aligned to a region of the other):
    Example: Smith-Waterman algorithm
    ACTACTAGATTACTTACGGATCAGGTACTTTAGAGGCTTGCAACCA
              |||| |||||| |||||||||||||||
              TACTCACGGATGAGGTACTTTAGAGGC

CUDA/GPU ARCHITECTURE [2]:
• What is CUDA?
  - CUDA allows the programmer to program NVIDIA graphics cards with an
    extension of the C programming language.
  - Philosophy: provide a minimal set of extensions that expose the power of the hardware.
  - Declaration specifiers:
      __global__ void KernelFunc();   // kernel function, runs on the device
      __device__ int GlobalVar;       // variable in device memory
      __shared__ int SharedVar;       // variable in per-block shared memory
• Why GPU?
  - High performance
  - Running tasks in parallel
• Applications of CUDA:
  - 3-D image analysis
  - Bioinformatics
  - Biological simulations

Objective:
• Needleman–Wunsch algorithm: To perform global sequence alignment
  between two nucleotide or amino acid sequences and find structural or
  functional similarities using dynamic programming.
    Worst-case performance: O(mn)
    Worst-case space complexity: O(mn)

• Smith-Waterman algorithm: To perform local sequence alignment between
  two nucleotide or amino acid sequences and find structural or functional
  similarities using dynamic programming.
    Worst-case performance: O(mn)
    Worst-case space complexity: O(mn)

Here m and n are the lengths of the sequences being aligned.

• Memory efficient Alignment using Pointing Matrix: The proposed DNA sequence
  alignment technique uses the concept of a pointing matrix. The directed path in the
  pointing matrix allows the optimal alignment to be found quickly while retaining
  the accuracy of the well-known Needleman-Wunsch algorithm.
    Time complexity (pointing matrix formation): O(mn)
    Time complexity (scoring matrix formation): O(mn)
    Worst-case space complexity: O(mn + 2*max(m,n)) = O(mn)

• Time efficient Alignment using skewing transformation over Pointing Matrix:
  We present a new parallel approach to the Needleman-Wunsch algorithm for global sequence
  alignment. This approach uses a skewing transformation for traversal and calculation of the
  dynamic programming matrix.
    Time complexity (skewing matrix formation): O(m+n)
    Time complexity (foundation matrix formation): O(m+n)
    Worst-case space complexity: O(mn + 3*min(m,n)) = O(mn)

Here m and n are the lengths of the sequences being aligned.

Smith-Waterman algorithm [3]
This algorithm was proposed by Temple F. Smith and Michael S. Waterman in
1981.
The algorithm proceeds in these steps:

• Determine the substitution matrix and the gap penalty scheme.

• Initialize the scoring matrix.

• Fill the scoring matrix according to the chosen scheme.

• Trace back through the scoring matrix to obtain an optimal alignment.
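
The steps above can be sketched in C as follows. This is a minimal illustration, not the presenters' implementation: the function name `sw_score`, the fixed match/mismatch scores standing in for a substitution matrix, and the linear (signed, negative) gap score are all assumptions. The traceback step is left out, so only initialization and matrix filling are shown.

```c
#include <stdlib.h>
#include <string.h>

/* Largest of three ints. */
static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/*
 * Smith-Waterman local alignment score with a linear gap score.
 * match/mismatch are substitution scores (assumed, in place of a full
 * substitution matrix); gap is the signed score added per gap, e.g. -2.
 * Returns the highest cell value of the scoring matrix, or -1 if the
 * matrix could not be allocated.
 */
int sw_score(const char *a, const char *b, int match, int mismatch, int gap)
{
    size_t m = strlen(a), n = strlen(b);
    size_t cols = n + 1;
    /* calloc zero-fills: this is the initialization of the first row and column. */
    int *h = calloc((m + 1) * cols, sizeof *h);
    if (h == NULL)
        return -1;

    int best = 0;
    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            int diag = h[(i - 1) * cols + (j - 1)]
                     + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up   = h[(i - 1) * cols + j] + gap;
            int left = h[i * cols + (j - 1)] + gap;
            int s = max3(diag, up, left);
            if (s < 0)
                s = 0;              /* local alignment: scores never drop below 0 */
            h[i * cols + j] = s;
            if (s > best)
                best = s;           /* the traceback would start from this cell */
        }
    }
    free(h);
    return best;
}
```

For example, `sw_score("TGTTACGG", "GGTTGACTA", 3, -3, -2)` scores the local alignment GTT-AC / GTTGAC: five matches and one gap.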

Animation source: Wikipedia
Needleman-Wunsch algorithm [1]

Filling up the dynamic programming matrix dp[m+1][n+1]

• The first row and first column are filled with index * gap penalty:
    dp[i][0] = GAP * i,  i = 0 to m
    dp[0][j] = GAP * j,  j = 0 to n

• For every other cell (i, j) in the matrix we compute:
    top      = dp[i-1][j] + GAP
    left     = dp[i][j-1] + GAP
    diagonal = dp[i-1][j-1] + MATCH,    if the corresponding characters match
    diagonal = dp[i-1][j-1] + MISMATCH, if the corresponding characters do not match
    dp[i][j] = max(top, left, diagonal)

• The traceback matrix is filled accordingly.
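
The fill rules above translate almost line for line into C. The sketch below is illustrative only: the name `nw_score` is hypothetical, `gap` is assumed to be a signed score (for example -1), and only the final score is returned, so the traceback matrix is not built here.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Largest of three ints. */
static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/*
 * Needleman-Wunsch global alignment score.
 * dp is stored row-major with (strlen(a) + 1) rows and (strlen(b) + 1) columns.
 * Returns dp[m][n], or INT_MIN if the matrix could not be allocated.
 */
int nw_score(const char *a, const char *b, int match, int mismatch, int gap)
{
    size_t m = strlen(a), n = strlen(b);
    size_t cols = n + 1;
    int *dp = malloc((m + 1) * cols * sizeof *dp);
    if (dp == NULL)
        return INT_MIN;

    /* First column and first row: index * gap score. */
    for (size_t i = 0; i <= m; i++)
        dp[i * cols] = gap * (int)i;
    for (size_t j = 0; j <= n; j++)
        dp[j] = gap * (int)j;

    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            int top      = dp[(i - 1) * cols + j] + gap;
            int left     = dp[i * cols + (j - 1)] + gap;
            int diagonal = dp[(i - 1) * cols + (j - 1)]
                         + (a[i - 1] == b[j - 1] ? match : mismatch);
            dp[i * cols + j] = max3(top, left, diagonal);
        }
    }

    int score = dp[m * cols + n];
    free(dp);
    return score;
}
```

With MATCH = 1, MISMATCH = -1 and GAP = -1, `nw_score("GCATGCU", "GATTACA", 1, -1, -1)` evaluates to 0: the best alignments have four matches, two mismatches and two gaps.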


Tracing arrows back to origin
The alignment is constructed by these rules:
• A diagonal arrow represents a match or mismatch, so the
  letter of the column and the letter of the row of the
  origin cell will align.

• A horizontal or vertical arrow represents an indel.
  Vertical arrows will align a gap ("-") to the letter of the
  row (the "side" sequence); horizontal arrows will align a gap
  to the letter of the column (the "top" sequence).

• If there are multiple arrows to choose from, they
  represent a branching of the alignments. If two or more
  branches all belong to paths from the bottom right to the
  top left cell, they are equally viable alignments. In this
  case, note the paths as separate alignment candidates.

Sequences    Best alignments
---------    ----------------------------
GCATGCU      GCATG-CU  GCA-TGCU  GCAT-GCU
GATTACA      G-ATTACA  G-ATTACA  G-ATTACA
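
A traceback along these lines might look like the C sketch below. It is an assumption-laden illustration rather than the presenters' code: `nw_align` is a hypothetical name, `a` is taken as the "side" (row) sequence and `b` as the "top" (column) sequence, arrows are recomputed from the score matrix instead of being stored, and ties are broken in the fixed order diagonal, vertical, horizontal, so only one of the equally good alignments is produced.

```c
#include <stdlib.h>
#include <string.h>

/* Reverse the first len bytes of s in place. */
static void reverse(char *s, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        char t = s[i];
        s[i] = s[len - 1 - i];
        s[len - 1 - i] = t;
    }
}

/*
 * Fill the Needleman-Wunsch matrix, then walk from the bottom-right cell
 * back to the top-left cell. out_a and out_b must each have room for
 * strlen(a) + strlen(b) + 1 bytes. Returns 0 on success, -1 on allocation failure.
 */
int nw_align(const char *a, const char *b, int match, int mismatch, int gap,
             char *out_a, char *out_b)
{
    size_t m = strlen(a), n = strlen(b), cols = n + 1;
    int *dp = malloc((m + 1) * cols * sizeof *dp);
    if (dp == NULL)
        return -1;

    for (size_t i = 0; i <= m; i++) dp[i * cols] = gap * (int)i;
    for (size_t j = 0; j <= n; j++) dp[j] = gap * (int)j;
    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            int best = dp[(i - 1) * cols + (j - 1)]
                     + (a[i - 1] == b[j - 1] ? match : mismatch);
            int top  = dp[(i - 1) * cols + j] + gap;
            int left = dp[i * cols + (j - 1)] + gap;
            if (top > best)  best = top;
            if (left > best) best = left;
            dp[i * cols + j] = best;
        }
    }

    size_t i = m, j = n, k = 0;
    while (i > 0 || j > 0) {
        int here = dp[i * cols + j];
        if (i > 0 && j > 0 &&
            here == dp[(i - 1) * cols + (j - 1)]
                    + (a[i - 1] == b[j - 1] ? match : mismatch)) {
            out_a[k] = a[i - 1]; out_b[k] = b[j - 1];  /* diagonal: letters align */
            i--; j--;
        } else if (i > 0 && here == dp[(i - 1) * cols + j] + gap) {
            out_a[k] = a[i - 1]; out_b[k] = '-';       /* vertical: gap against row letter */
            i--;
        } else {
            out_a[k] = '-'; out_b[k] = b[j - 1];       /* horizontal: gap against column letter */
            j--;
        }
        k++;
    }

    /* The walk collected the columns back to front. */
    reverse(out_a, k);
    reverse(out_b, k);
    out_a[k] = '\0';
    out_b[k] = '\0';
    free(dp);
    return 0;
}
```

Collecting every branch instead of one would require following each tied arrow separately, for example with a recursive walk.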
A Memory Efficient DNA Sequence Alignment Technique Using Pointing Matrix [5]:
Formation of Pointing Matrix:
• Filling each cell requires information from only its three previously filled
  neighboring cells. Hence, each row i requires information from only its
  previous row, i.e., (i − 1). The score matrix therefore keeps two rows at a
  time: once a row has been used, the content of the current row is written
  over it.
• At first, the first row is filled based on the initial gap penalties, and the
  second row is filled by consulting the first row's content.
• Afterward, the first row is overwritten by the scores of the second row, and
  the second row is used for calculating the third row's content based on the
  content of the first row.
• Similarly, the fourth row (which uses information from only the third row) is
  calculated and overwrites the second row. No loss of information actually
  takes place.

Figure: (a) Two DNA sequences S1 and S2, (b) time variant Score Matrix shown
2 rows at a time (OR and CR), (c) Pointing Matrix, 0: termination point,
1: vertical move, 2: diagonal move, 3: horizontal move.
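
The two-row idea can be sketched in C as below. This is not the pseudocode from the cited paper: the function name `build_pointing_matrix`, the signed `gap` score, the tie-breaking order, and the layout (rows follow S2, columns follow S1) are assumptions, and the two score rows are exchanged by swapping pointers rather than by copying one row over the other.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Pointing-matrix codes as listed in the figure legend. */
enum { PM_END = 0, PM_VERTICAL = 1, PM_DIAGONAL = 2, PM_HORIZONTAL = 3 };

/*
 * Fill pm, a row-major (n + 1) x (m + 1) matrix with m = strlen(s1) and
 * n = strlen(s2), while keeping only two rows of scores in memory.
 * Returns the final alignment score, or INT_MIN on allocation failure.
 */
int build_pointing_matrix(const char *s1, const char *s2,
                          int match, int mismatch, int gap,
                          unsigned char *pm)
{
    size_t m = strlen(s1), n = strlen(s2), cols = m + 1;
    int *rows = malloc(2 * cols * sizeof *rows);
    if (rows == NULL)
        return INT_MIN;
    int *prev = rows, *cur = rows + cols;

    /* First row: initial gap penalties, every cell points left. */
    prev[0] = 0;
    pm[0] = PM_END;
    for (size_t j = 1; j <= m; j++) {
        prev[j] = gap * (int)j;
        pm[j] = PM_HORIZONTAL;
    }

    for (size_t i = 1; i <= n; i++) {
        cur[0] = gap * (int)i;
        pm[i * cols] = PM_VERTICAL;
        for (size_t j = 1; j <= m; j++) {
            int best = prev[j - 1] + (s1[j - 1] == s2[i - 1] ? match : mismatch);
            unsigned char dir = PM_DIAGONAL;
            int vert  = prev[j] + gap;
            int horiz = cur[j - 1] + gap;
            if (vert > best)  { best = vert;  dir = PM_VERTICAL; }
            if (horiz > best) { best = horiz; dir = PM_HORIZONTAL; }
            cur[j] = best;
            pm[i * cols + j] = dir;
        }
        /* The row just computed becomes the "previous" row for the next pass. */
        int *tmp = prev; prev = cur; cur = tmp;
    }

    int score = prev[m];
    free(rows);
    return score;
}
```

Each pointing-matrix entry needs only two bits, so a packed layout could shrink `pm` further; one byte per cell is used here to keep the sketch short.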
[Figure: pseudocode for creation of the pointing and scoring matrices]

Sequence Alignment using Pointing Matrix:
• The alignment begins from the last cell (last row, last column), i.e., from
  the bottom right corner cell (n + 1, m + 1) of the pointing matrix.
• As discussed earlier, the movement between neighboring cells (horizontal,
  vertical or diagonal) is determined by the value of the pointing matrix cell.
• If the cell value is '1', a vertical movement occurs: the corresponding
  nucleotide of S2 is placed, and a gap "−" is placed in S1.
• If the cell value is '2', a diagonal movement occurs and the corresponding
  nucleotides of both sequences (S1 and S2) are placed.
• A value of '3' denotes a horizontal move, which places a gap in the alignment
  of the second DNA strand (S2). The same procedure is followed until the
  cell (0, 0) is reached.

Figure: (a) Generation of aligned sequence using Pointing Matrix, (b) status of
Score Matrix after full traversal of the sequences and (c) final aligned
sequences of S1 and S2.
Time efficient Alignment using skewing transformation over Pointing Matrix [5]:
DNA_1 = A C T        m = length(DNA_1)
DNA_2 = A C A A      n = length(DNA_2)
The steps of the algorithm are as follows:
• Initialization: We use a Foundation Matrix FM[3][min(m+1, n+1)] and fill
    FM[0][0] = 0
    FM[1][0] = -GAP
    FM[1][1] = -GAP

• Initialization of the Traceback Matrix (TM): TM[m+1][n+1]
    Traceback Matrix after initialization (rows: DNA_1, columns: DNA_2):
          A  C  A  A
       E  L  L  L  L
    A  U
    C  U
    T  U

    TM[0][0] = 'E'    E = End point
    TM[0][i] = 'L'    L = Left, i = 1 to n
    TM[i][0] = 'U'    U = Up, i = 1 to m
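Traceback Matrix Fill using Foundation Matrix:

• Fill the Traceback Matrix using the Foundation Matrix (FM), with the values L, D and U.

[Figure: Traceback Matrix for DNA_1 (rows A, C, T) against DNA_2 (columns A, C, A, A), shown next to the following table of LP, MP and FP values]

    LP  MP  FP
     2   1   0
     0   2   1
     1   0   2
     2   1   0
     0   2   1
     1   0   2
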
3 different approaches for sequences of different lengths
• For m = n
• For m > n
• For m < n

Where m = length(Seq_1)
      n = length(Seq_2)

[Figure annotations: END point; initialization; up to m times parallelization; the diagonal value is taken from the 1st position of the 1st array and, after that, from the 2nd position of the 1st array of the Foundation Matrix.]

Simulations
• System specifications:
  - GPU: NVIDIA GeForce 940MX
  - CPU: Intel(TM) i5, 2.50 GHz
  - Memory: 8 GB

• Constraints:
  - Maximum sequence length: 300 characters
  - Maximum number of simultaneous queries (maximum number of threads): 1024

CUDA IMPLEMENTATION

Alignment Time
Length   nw (μs)   nm (μs)   pm (μs)   seq (μs)
16        190.63    343.97    202.59       2479
32        330.88    814.7     375.81       5477
64        914.38   3015      1112.4       12100
128      2753     10041      3547.3       26735
256      9990     43413     12998         59167

memcpy HtoD
Length   nw (μs)   nm (μs)   pm (μs)
16        1.32      1.31      1.31
32        1.4       1.31      1.34
64        1.4       1.32      1.34
128       1.36      1.34      1.47
256       1.55      1.53      1.57

memcpy DtoH
Length   nw (μs)   nm (μs)   pm (μs)
16        2.14      2.14      2.11
32        2.23      2.17      2.14
64        2.24      2.3       2.17
128       2.28      2.36      2.17
256       2.38      2.49      2.33

C IMPLEMENTATION

Alignment Time
Length   pm (μs)    pm_skew (×10 ms)
16         1.994      8.6457
32         7.979     21.1027
64        34.903     63.6043
128      271.642    235.5422
256      805.223    620.4874

Sequence Length vs Alignment Time (Pointing Matrix Implementation)

Length   pm (μs)
16         1.994
32         7.979
64        34.903
128      171.642
256      805.223

Sequence Length vs Alignment Time (Pointing Matrix with Skewing Transformation)

Length   pm_skew (ms)
16         86.457
32        211.027
64        636.043
128      2355.422
256      6204.874

Conclusion and future work:
Conclusion:
• A memory efficient approach and a time efficient approach for DNA sequence
  alignment have been presented.
• Parallel computations
• Performance optimization
• Minimize CPU-GPU data transfers.

Future Work:
• Study of different sequence alignment algorithms.
• Improvement of the implemented algorithm to allow longer sequences.
• CUDA implementation of the proposed time efficient algorithm.

References
[1] S. B. Needleman and C. D. Wunsch, "A General Method Applicable to the
Search for Similarities in the Amino Acid Sequence of Two Proteins," Journal of
Molecular Biology, vol. 48, no. 3, 1970.
[2] "NVIDIA CUDA Programming Guide, Version 4.2." [Online]. Available:
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
[3] T. F. Smith and M. S. Waterman, "Identification of Common Molecular
Subsequences," Journal of Molecular Biology, vol. 147, no. 1, pp. 195–197, March
1981.
[4] M. Farrar, "Striped Smith-Waterman speeds database searches six times over
other SIMD implementations," Bioinformatics, vol. 23, pp. 156–161, 2007.
[5] S. S. Ray, A. Banerjee, A. Datta and S. Ghosh, "A memory efficient DNA
sequence alignment technique using pointing matrix," 2016 IEEE Region 10
Conference (TENCON), Singapore, 2016, pp. 3559-3562.
doi: 10.1109/TENCON.2016.7848720

THANK YOU
