Accelerating DNA Pairwise Sequence Alignment Using FPGA and A Customized Convolutional Neural Network - ScienceDirect

This study presents an optimized digital implementation of DNA pairwise sequence alignment algorithms using FPGA and a customized convolutional neural network, achieving a performance improvement with O(N/4) calculation steps. The proposed method addresses the limitations of traditional dynamic programming approaches by fully parallelizing the algorithms, resulting in a significant increase in speed and accuracy. The implementation demonstrates a global alignment accuracy of 98.3% and is applicable to both DNA and RNA sequences.


3/21/23, 11:55 AM Accelerating DNA pairwise sequence alignment using FPGA and a customized convolutional neural network - ScienceDirect

Computers & Electrical Engineering


Volume 92, June 2021, 107112

Accelerating DNA pairwise sequence alignment using FPGA and a customized convolutional neural network
Amr Ezz El-Din Rashed, Marwa Obaya, Hossam El-Din Moustafa

https://doi.org/10.1016/j.compeleceng.2021.107112

Abstract

An optimized software and hardware digital implementation of two widely used DNA sequence
alignment algorithms, based on a lookup table (LUT), is presented in this study. These algorithms are the
best means for identifying similar regions between sequences. The proposed implementation relies on
the complete parallelization of these foundational algorithms under certain limitations to overcome
most of the problems of dynamic programming and hardware implementation. The proposed method
takes O(N/4) calculation steps, where N is the length of each sequence with a minimum value of four
(i.e., N = 4, 8, 12, …). A performance comparison between the state of the art and our proposed algorithm is
conducted for software and hardware implementations. Combinational circuits are used for the FPGA-based
hardware implementation of the DNA sequence alignment algorithms. Performance and device resource
usage are evaluated for different hardware designs. A customized convolutional neural network model is
used to implement global alignment and achieves 98.3% accuracy.

https://www.sciencedirect.com/science/article/abs/pii/S0045790621001178 1/9
Introduction

Deoxyribonucleic acid (DNA) is a complex molecule and the “hereditary material” present inside every
cell of all living beings. It contains the instructions associated with the organism's development, life,
and reproduction. These instructions direct the cells in their roles in the body. Nearly all
the cells in a human body contain the same DNA, and most of it is present in the cell nucleus. The
information in DNA is stored as a unique genetic code consisting of four chemical nucleotides (NTs),
namely, adenine (A), guanine (G), cytosine (C), and thymine (T), or a four-letter set {A, C, G, T}. The
complete human DNA contains about three billion NTs. The order of these NTs determines the
biological instructions in the genome for building and maintaining an organism. This is similar
to the way letters of the alphabet appear in a particular order to form words or sentences.

Ribonucleic acid (RNA) is “a complex compound of high molecular weight that functions in cellular
protein synthesis and replaces DNA as a carrier of genetic codes in certain viruses. It consists of four
ribose NTs or nitrogenous bases: adenine (A), guanine (G), cytosine (C), and uracil (U).” U replaces the T
present in DNA. Thus, the alphabet for RNA sequences is also a four-letter set {A, C, G, U}, and the
alphabet for protein sequences is a 20-letter set {A, C–I, K–N, P–T, V, W, Y}.

Aligning sequences with a sequence alignment algorithm makes it possible to determine where the
matches and mismatches lie between two or more DNA, RNA, or protein sequences. Sequence alignment is
a broadly used process in bioinformatics for arranging two (pairwise alignment) or more (multiple
sequence alignment) biological sequences (e.g., DNA, RNA, and protein sequences) of characters to
identify regions of similarity. It seeks to identify the optimal alignment with the highest total score, i.e.,
the maximum number of base-to-base matches, without altering the order of bases in either sequence.
In addition, gap-to-gap matches are prohibited. Mismatches and gaps can be considered mutations and
indels, respectively. In this way, differences between sequences with a similar origin can be identified.
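
The scoring idea described above can be illustrated with a minimal sketch. The function name `score_alignment` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; the excerpt does not state the scores used in the study.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score an already-gapped pairwise alignment column by column.

    `top` and `bottom` are the two aligned rows, with '-' marking a gap.
    The scoring values are illustrative assumptions, not the paper's.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            # Gap-to-gap columns are prohibited in a pairwise alignment.
            raise ValueError("gap-to-gap column is not allowed")
        if x == "-" or y == "-":
            total += gap        # indel
        elif x == y:
            total += match      # base-to-base match
        else:
            total += mismatch   # substitution (mutation)
    return total


# Three matches and two gaps: 3 * 1 + 2 * (-1) = 1
print(score_alignment("ACG-T", "A-GTT"))
```

An alignment algorithm searches over all such gapped arrangements for the one with the highest total score.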

Hence, this process is considered a foundational step for detecting the structural or functional
importance of unknown sequences. It also aids in detecting the gene responsible for a
specified disease or disorder, or in determining the gene or genes that encode a specified protein. A
large number of DNA sequencing projects have contributed to the growth of bioinformatics and
computational biology. Sequence alignment has numerous significant real-world applications.


Pairwise sequence alignment (PWSA) methods are used for aligning two sequences simultaneously to
identify regions of similarity.

Fig. 1 shows the three foundational techniques for obtaining pairwise alignments: the dot-matrix
technique introduced by Gibbs and McIntyre; dynamic programming (DP), which was first
developed by Charles DeLisi in the USA for protein–DNA binding and by Georgii Gurskii and Alexander
Zasedatelev in the USSR; and word techniques, which are heuristic methods that cannot guarantee an
optimum alignment result. Word techniques are best known from database search tools such as
FASTA, the BLAST family, and SIM2. In large-scale database
searches or with long sequences, computational efficiency is often achieved by replacing the DP algorithms
with heuristic ones that trade accuracy for computation time, as shown in Table I. The DP
technique can be used to produce global alignments through the Needleman–Wunsch (NW) algorithm
or the Hirschberg algorithm. It can also be used to produce local alignments through the Smith–
Waterman (SW) algorithm, the Gotoh algorithm, or the Miller–Myers algorithm.

Table II notes that the NW and SW algorithms require O(MN) calculation steps and O(MN) run time.
Here, M is the length of the first sequence, and N is the length of the second sequence. These
algorithms support different scores for exact residue matches, similar residues, and gaps. A
substitution matrix such as PAM or BLOSUM can be used to weight residue matching scores, which does
not affect the time and space complexity. Optimized methods such as Miller–Myers and Hirschberg
can reduce the space complexity to O(M+N).
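
To make the O(MN) cost concrete, the following is a minimal sketch of the NW score recurrence, written twice: once with the full table and once keeping only two rows. The function names and the linear scoring values (match +1, mismatch -1, gap -1) are assumptions for illustration. The two-row version returns only the score; recovering the alignment itself in linear space needs the additional divide-and-conquer step of the Hirschberg algorithm, which is not shown.

```python
def nw_score_full(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score using the full (M+1) x (N+1) table: O(MN) time and space."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        table[i][0] = i * gap          # prefix of a aligned against gaps only
    for j in range(1, n + 1):
        table[0][j] = j * gap          # prefix of b aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            table[i][j] = max(diag, table[i - 1][j] + gap, table[i][j - 1] + gap)
    return table[m][n]


def nw_score_two_rows(a, b, match=1, mismatch=-1, gap=-1):
    """Same score with O(MN) time but only two rows of the table in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(nw_score_full("ACGT", "AGT"), nw_score_two_rows("ACGT", "AGT"))
```

Each cell depends on its left, upper, and upper-left neighbours, which is the data dependency that makes these algorithms hard to parallelize directly.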

DP algorithms guarantee an optimal alignment for a specified set of scoring functions from a
mathematical perspective. Although they do not require a gap penalty, gap penalties are essential for
their efficient operation. In addition, they become slow for multiple sequences (more than two
sequences) or very long sequences. When aligning more than two sequences, they are costly, time
consuming, and require substantial amounts of calculation. In general, the outputs of sequence
alignment algorithms that are based on DP are classified as global or local alignments. Table III
compares the SW and NW algorithms. Another important difference concerns the alignment
array of the two algorithms. The text highlighted in green is the actual alignment result that appears
when the SW algorithm is used. Unlike the SW algorithm, the NW algorithm displays the complete set
of input letters in the alignment result or alignment array.
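
The local behaviour of the SW algorithm can be seen in a minimal sketch. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, gap -1) are assumptions for illustration, not the parameters used in the study. The traceback stops as soon as the score falls to zero, so only the best-scoring region is returned rather than the complete input sequences.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned region of a, aligned region of b)."""
    m, n = len(a), len(b)
    h = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets a local alignment restart anywhere.
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Only the shared region "ACG" is reported, not the flanking letters.
print(smith_waterman("TTACGTTT", "GGACGGG"))
```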

The length of a DNA or RNA sequence is variable. Thus, constructing algorithms that produce an
optimal alignment and a high score between sequences consisting of the four letters A, C, G, and T (for
DNA), or A, C, G, and U (for RNA) is challenging. This study aims to analyze two
commonly used sequence alignment algorithms and to realize them effectively on cost-efficient,
high-performance, and high-speed platforms. The alignment array is reshaped as a 1-D array rather than a
2-D array in this study to aid the design process.

The remainder of this study is organized as follows. Related work is covered in Section 2. The main
issues encountered in DNA sequence alignment and hardware implementations are explained in Section 3.
The limitations and restrictions of our proposed technique are described in Section 4. The proposed
algorithm design, statistics, and software implementations are illustrated in Section 5. The hardware
implementation of the SW algorithm and the NW algorithm using the FPGA platform is reported in
Section 6. Before the conclusions are drawn, the design and implementation of the NW algorithm are
demonstrated using a deep learning (DL) convolutional neural network (CNN) in Section 7.
Almost all the sections of this study include a comparison with other studies or other tools. Future
work and potential enhancements or modifications to our optimized design to improve the prediction
results are presented in the final section.

Section snippets

Related work

In [1], Strengholt and Brobbel explained a technique to store the values of the similarity score matrix of
the SW algorithm differentially. They also described a systematic approach to designing an accelerator
that realizes this technique. The realization was on an Intel FPGA platform. The authors stated that
this technique could produce an overall performance of 94 GCU/s, which may be up to
5× that of classic CPUs.

In [2], SW and FASTA exhibited considerably higher performance…

Problem definition

Biological sequence alignment algorithms are time consuming even when implemented on
accelerating hardware platforms such as CPUs, GPUs, or FPGAs for the following reasons: (1) The number
of sequences is large, and each of them can be very long. (2) Table II shows that the algorithms
used to align the sequences require O(MN) calculation steps and consume O(MN) time (M and N are
the lengths of the two input sequences). (3) Basic sequence alignment algorithms are internally
dependent…

Limitations

According to the divide-and-conquer (DC) approach, the alignment problem can be broken down into smaller subproblems.
Then, the smaller subproblems can be solved optimally, and their results can be used to construct the
optimum solution to the main problem. In this study, we propose using equal-length sequences (i.e.,
lengths that are multiples of four: N = 4, 8, 12, …), which can be applied to DNA or RNA sequences because DNA and RNA
sequences consist of four letters of the alphabet, representing four NTs despite the protein sequence …
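
The decomposition into fixed-size subproblems can be sketched as follows. The function names are hypothetical, and the sketch only shows how two equal-length sequences split into N/4 block pairs; how the study combines the per-block results into a final alignment is not described in this excerpt and is not attempted here.

```python
def split_into_blocks(seq, block=4):
    """Split a sequence whose length is a positive multiple of `block` into chunks."""
    if len(seq) == 0 or len(seq) % block != 0:
        raise ValueError("sequence length must be a positive multiple of %d" % block)
    return [seq[i:i + block] for i in range(0, len(seq), block)]


def paired_blocks(seq_a, seq_b, block=4):
    """Pair up corresponding blocks of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return list(zip(split_into_blocks(seq_a, block), split_into_blocks(seq_b, block)))


pairs = paired_blocks("ACGTACGT", "ACGTTTTT")
print(pairs)        # one entry per 4-NT subproblem
print(len(pairs))   # N / 4 subproblems for N = 8
```

With one subproblem handled per step, a length-N input needs N/4 steps, which matches the O(N/4) figure quoted in the abstract.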

Proposed algorithm

Fig. 3 shows that our implementation depends on the development of a truth table, or LUT, of all
possible combinations of the two DNA input sequences after converting the DNA sequences from
letters into a binary representation. The truth table presents each possible combination of DNA input
sequences to the alignment algorithm function, with the resulting alignment or alignment array
(output) depending on that combination.
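
The LUT idea can be sketched in a few lines. Everything named here is an assumption for illustration: the 2-bit code (A=00, C=01, G=10, T=11), the index layout, and in particular `match_line`, which is only a stand-in for the real NW or SW alignment array and simply marks position-wise matches. The excerpt does not give the study's actual encoding or table contents.

```python
from itertools import product

# Assumed 2-bit encoding; the paper's actual encoding is not given in this excerpt.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq):
    """Pack a DNA string into an integer, 2 bits per nucleotide."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def lut_index(seq_a, seq_b):
    """Concatenate two encoded 4-NT sequences into one 16-bit table index."""
    return (encode(seq_a) << 8) | encode(seq_b)


def match_line(seq_a, seq_b):
    """Stand-in for an alignment function: '*' for a match, ':' for a mismatch."""
    return "".join("*" if x == y else ":" for x, y in zip(seq_a, seq_b))


def build_lut(align_fn):
    """Precompute align_fn for all 4^4 * 4^4 = 65,536 input combinations."""
    blocks = ["".join(p) for p in product("ACGT", repeat=4)]
    lut = [None] * (1 << 16)
    for a in blocks:
        for b in blocks:
            lut[lut_index(a, b)] = align_fn(a, b)
    return lut


lut = build_lut(match_line)
print(len(lut), lut[lut_index("ACGT", "ACGA")])
```

Once the table is built, aligning a block pair is a single indexed read, with no dependency between neighbouring cells.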

Table IV presents an LUT that contains a binary…

Hardware implementation of SW algorithm


In previous sections, we established that our technique is faster than the state-of-the-art
implementations for long DNA sequences. Now, we demonstrate the implementation of the SW
algorithm on a Xilinx FPGA.

Fig. 5 shows the basic steps required for our implementation. The DNA input sequences are converted
from letters into a binary representation to construct a truth table of all the possibilities for hardware
implementation. This conversion will be used for local and global hardware…

Hardware implementation of NW algorithm

Fig. 7 shows that to implement the NW algorithm, we first need to construct a truth table that takes
the two DNA input sequences (16 bits) as inputs (after converting letters into a binary representation). The
output is the alignment array of the NW algorithm after its characters are
encoded into a binary representation (54 bits). Then, 54 Boolean functions are derived from the truth
table. Two proposed class reduction techniques are used. The first reduction technique reduces …
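
Deriving one Boolean function per output bit from a truth table can be sketched on a toy example. The function name `bit_functions` is hypothetical, and a half adder (2-bit input, 2-bit output) stands in for the study's table, which has 16 input bits and 54 output bits. The 54 bits are consistent with 18 alignment characters at 3 bits each, but the excerpt does not state the character encoding, so that reading is an assumption.

```python
def bit_functions(truth_table, out_bits):
    """For each output bit, list the input values (minterms) where that bit is 1.

    `truth_table` maps an integer input to an integer output. The list at
    position k describes the Boolean function that drives output bit k.
    """
    functions = [[] for _ in range(out_bits)]
    for inputs in sorted(truth_table):
        output = truth_table[inputs]
        for bit in range(out_bits):
            if (output >> bit) & 1:
                functions[bit].append(inputs)
    return functions


# Toy truth table: half adder, input = (a << 1) | b, output = (carry << 1) | sum.
HALF_ADDER = {0b00: 0b00, 0b01: 0b01, 0b10: 0b01, 0b11: 0b10}

sum_bit, carry_bit = bit_functions(HALF_ADDER, out_bits=2)
print(sum_bit)    # minterms of the sum output (a XOR b)
print(carry_bit)  # minterms of the carry output (a AND b)
```

Each minterm list can then be minimized and mapped to combinational logic, one function per output bit.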

Classical machine learning for global sequence alignment

Ten traditional classifiers, including MLP, support vector machine (SVM), decision tree, SGD, and
random forest, are tested with four datasets as in Table XXVIII, using the Python scikit-learn library with the
classifiers' default hyperparameters. We use an 80/20 split for training and testing data. No reasonable
accuracy is achieved because the input features are dependent. The original dataset is the third dataset.
It has a binary input of 16 bits and 254 classes.

The other datasets are generated from this…

CNN for global sequence alignment

DL has attracted considerable interest in research centers. Compared with traditional neural network
architecture, it exhibits substantial advantages in feature extraction and model fitting. In addition, it is
highly effective at discovering increasingly abstract feature representations whose generalization
capability is strong from the raw input data. It has successfully solved certain issues that were
considered complicated to resolve by AI in the past. The use of big data for training and…

Conclusion

Most previous studies aimed to accelerate the alignment algorithms in different ways without
providing any effective solution to the problem of sequential processing. Our proposed algorithms depend on
the parallelization of common alignment algorithms for DNA sequences under certain limitations to
overcome the main problems of DP and hardware implementation. They can also be applied to RNA. This
technique can be applied to any other local or global alignment method and for short as well as very…

Future work

Using different gap-opening values in the NW design does not substantially affect the HW performance
or the number of characters representing the alignment array (still 18 characters), but reviewing
and standardizing the alignment array can (i.e., using a single pattern for all full-mismatch conditions
['****::::****'] or a single symbol to represent the mismatch condition [colons only] instead of using two
symbols [space and colon]). In addition, this standardization will reduce the number of…

Declaration of Competing Interest


None.…

Contributions

An implementation based on a look-up table (LUT) to accelerate DNA sequence alignment algorithms
under certain limitations is presented. Unlike other studies that rely on systolic cell architecture, this
ROM-based hardware implementation requires only O(N/4) cycles or calculation steps to obtain the
complete result or a maximum delay of 7.5 ns when implemented using combinational circuits. The
derivation of 254 patterns is presented for a global alignment array for all the input combinations.…

Author statement

Amr Ezz El-Din Rashed: Conceptualization, Methodology, Software, Hardware, Writing - Original draft
preparation.

Hossam El-Din Moustafa: Supervision, Reviewing and Editing.

Marwa Obaya: Supervision, Reviewing, …

References (29)

L Ji
One-dimensional pairwise CNN for the global alignment of two DNA sequences
Neurocomputing (2015)

Yi-L Liao
Adaptively Banded Smith-Waterman Algorithm for Long Reads and Its Hardware
Accelerator

Strengholt, B; Brobbel, M. Acceleration of the Smith-Waterman algorithm for DNA sequence alignment
using an FPGA...

W.R. Pearson
Comparison of methods for searching protein sequence databases
Protein Sci (1995)


Fa Zhang et al.
A parallel smith-waterman algorithm based on divide and conquer

P Zhang et al.
Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing
platform

M. Kim
Accelerating Next Generation Genome Reassembly in FPGAs: Alignment Using Dynamic
Programming Algorithms
(2011)

M N Isa
High performance reconfigurable architectures for biological sequence alignment
(2013)

E Rucci
SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences
BMC Syst Biol (2018)

D Zou et al.
Optimization schemes and performance evaluation of Smith–Waterman algorithm on CPU,
GPU and FPGA
Concurr Comput: Pract Exp (2012)

Amr Ezz El-Din Rashed is a PhD student at the Electronics and Communications Engineering Department, Faculty of
Engineering, Mansoura University, Egypt. He is now a lecturer at the Computer Engineering Department, Faculty of
Computers and Information Technology, Taif University, KSA. His main research interests include bioinformatics,
biomedical image processing, speaker recognition, computer vision, machine learning, deep learning applications,
and embedded systems including FPGA and VHDL.

Marwa Ismael Obayya is an Associate Professor at the Electronics and Communications Engineering Department, Faculty of
Engineering, Mansoura University, Egypt. She is now Director of the Communications Engineering Program,
Electrical Engineering Department, Princess Nora Bent Abdurrahman University, Riyadh, KSA. Her research
interests include image processing, signal processing, optimization, and machine learning. She
has several publications in biomedical engineering, optimization, and intelligent machine learning.

Hossam El-Din Moustafa is an Associate Professor at the Department of Electronics and Communications Engineering
and the founder and executive manager of the Biomedical Engineering Program (BME) at the Faculty of Engineering,
Mansoura University. His main research interests include biomedical image and signal processing and deep learning
applications.

Reviews processed and recommended for publication by Guest Editor Feiran Huang.


© 2021 Elsevier Ltd. All rights reserved.
