Batch 17
VIJAYAWADA
Department of Computer Science and Engineering
Abstract
Sequence alignment is a way of arranging two (pairwise alignment) or more
(multiple sequence alignment) biological sequences (e.g., DNA, RNA, or protein
sequences) to identify regions of similarity. Similarities may be a
consequence of functional or evolutionary relationships between these sequences. This
paper focuses on the pairwise alignment of DNA sequences, which identifies
the similarities and differences between two DNA sequences. Existing methods for
pairwise sequence alignment, such as the Needleman-Wunsch algorithm and the
Smith-Waterman algorithm, have limitations in terms of computational efficiency and
accuracy. Sequence alignment over huge databases cannot produce results within a
reasonable amount of time, power, or money. This paper proposes a novel approach for pairwise
sequence alignment in DNA using a multilayer perceptron (MLP) trained with
particle swarm optimization (PSO). The PSO algorithm is used to optimize the
parameters of the MLP for better alignment performance.
Keywords: Multilayer Perceptron, Sequence Alignment, Particle Swarm
Optimization.
Presentation Outline
1. Aim and Motivation
2. Research Questions
3. Title Justification
4. Introduction
5. Basic Concepts
6. Study on Existing Technologies
7. Gap Analysis
8. Objectives
9. Scope
10. Dataset Description
11. Methodology
   11.1. Proposed Model
   11.2. Modules of the Proposed Model
   11.3. Algorithms
12. SDLC Model
13. UML Diagrams
   13.1. Use Case Diagram
   13.2. Activity Diagram
   13.3. Sequence Diagram
14. Functional and Non-Functional Requirements
15. Implementation and Analysis of Results
   15.1. Output Screenshots
16. Conclusion and Future Work
17. Timeline Chart
References
1. Aim and Motivation
Aim: Sequence alignment over large databases cannot yield results within
reasonable time, power, and cost budgets. The aim of the project is to develop a
machine learning model that accelerates sequence alignment.
Motivation:
• Biological sequence databases are growing exponentially.
• Developing faster and more efficient algorithms for sequence alignment can
enable researchers to analyze large amounts of biological data quickly and
accurately.
2. Research Questions
1. Can machine learning algorithms improve the speed and accuracy of sequence alignment
over large databases compared to traditional alignment methods?
2. How can machine learning be used to optimize sequence alignment parameters for
different types of biological data (e.g., DNA vs. protein sequences)?
4. How do different types of machine learning models (e.g., neural networks, decision
trees, support vector machines) compare in terms of their performance on sequence
alignment tasks?
3. Title Justification
Pairwise sequence alignment is a fundamental task in bioinformatics that
involves comparing two DNA sequences. Dynamic programming algorithms
are computationally expensive for long sequences, leading to the use of
artificial neural networks (ANNs) as an alternative.
A multilayer perceptron (MLP) can be trained using Particle Swarm
Optimization (PSO) to improve its accuracy in sequence alignment. This
approach can handle longer sequences, achieve higher accuracy, and be used
for various types of sequence alignment tasks. However, it requires a large
amount of training data and can be computationally intensive.
Overall, the use of MLP trained with PSO for sequence alignment in DNA is
a promising approach that has the potential to improve the accuracy and
efficiency of sequence alignment in bioinformatics.
4. Introduction
Pairwise sequence alignment in bioinformatics compares two DNA sequences to
identify regions of similarity. Dynamic programming algorithms can be
computationally expensive for long sequences, so artificial neural networks such as
the multilayer perceptron (MLP) are used instead. An MLP can be trained with Particle
Swarm Optimization (PSO) to improve accuracy in handling longer sequences. In
this project, we explore using an MLP trained with PSO for pairwise sequence
alignment in DNA and compare its performance against other methods.
5. Basic Concepts
Sequence Alignment:
Alignment is the process of arranging sequences in a way that maximizes
their similarity by matching corresponding positions. It involves inserting
gaps in the sequences to account for insertions or deletions that may have
occurred during evolution, while substitutions appear as mismatched positions.
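To make the idea of gaps and matched positions concrete, the sketch below shows a small global alignment in the style of the Needleman-Wunsch algorithm named in the abstract. The scoring values (match = 1, mismatch = -1, gap = -2) and the function name `needleman_wunsch` are assumptions chosen for illustration, not values taken from this project.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
print(result)
```

With these assumed scores, aligning ACGT with AGT places one gap opposite the C (ACGT over A-GT) for a total score of 1. The table has one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths, which is the cost the project aims to avoid.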
6. Study on Existing Technologies
Title: Accelerating DNA pairwise sequence alignment using FPGA and a customized
convolutional neural network
Journal Details: IEEE Access, date of publication 30 March 2021
Description: The paper "Accelerating DNA pairwise sequence alignment using FPGA
and a customized convolutional neural network" proposes a novel approach to
accelerating pairwise sequence alignment of DNA sequences using field
programmable gate arrays (FPGA) and a customized convolutional neural network
(CNN). The authors demonstrate that their approach can achieve significant speedups
compared to traditional implementations of sequence alignment algorithms, while
maintaining high accuracy. The authors also discuss the potential for their approach
to be integrated into existing bioinformatics pipelines and applications.
Advantages:
High speed: The proposed approach can achieve significant speedups compared to
traditional implementations of sequence alignment algorithms.
Disadvantages:
Hardware dependency: The approach is hardware-dependent and requires access to
FPGA devices, which can limit its accessibility and portability.
Title: Performance-Based Analogising of Needleman Wunsch Algorithm to Align
DNA Sequences Using GPU and FPGA
Journal Details: IEEE Access, date of publication 05 February 2021
Dataset: BAliBASE, SABmark, or OXBench
Description:
The paper "Performance-Based Analogising of Needleman Wunsch Algorithm to
Align DNA Sequences Using GPU and FPGA" proposes an optimized implementation
of the Needleman-Wunsch algorithm for aligning DNA sequences using both GPU
and FPGA architectures. The authors compare the performance of their optimized
implementation to existing CPU-based implementations, demonstrating significant
speedups on both GPU and FPGA platforms. The authors also discuss the potential
for further optimization and improvement in future work.
Advantages:
1. The proposed approach can achieve significant speedups compared to traditional
implementations of sequence alignment algorithms.
2. Fast computation.
Disadvantages:
1. Hardware dependency and limited flexibility.
Title: Local Alignment of DNA Sequence Based on Deep Reinforcement Learning
Journal Details: IEEE Open Journal of Engineering in Medicine and Biology, Date
of Publication: 27 April 2021
Dataset: CRISPR-Cas9, ChIP-seq
Description: The paper proposes a deep reinforcement learning (DRL) based
approach for local alignment of DNA sequences. Local alignment aims to identify the
subsequences that have the highest degree of similarity between two input
sequences. The proposed approach employs a convolutional neural network (CNN)
to learn the sequence features and a DRL model to make optimal alignment
decisions. The DRL model learns to maximize the reward function by aligning the two
sequences to obtain the highest similarity score. The approach is evaluated on
standard benchmark datasets and compared with existing state-of-the-art methods.
Advantages: The proposed DRL-based approach achieves higher accuracy and faster
processing times compared to existing methods.
Disadvantages: The approach requires a large amount of training data to learn the
optimal alignment strategy.
Title: Simple and Efficient Pattern Matching Algorithms for Biological Sequences
Journal Details: IEEE Access, Date of Publication 23 January 2020
Table no.: Summary of existing implementations

| S. No. | Article Title | Journal Details | Algorithms/Models | Dataset | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| 1 | Sequence Alignment Using Machine Learning-Based Needleman–Wunsch Algorithm | IEEE Access, date of publication July 26, 2021 | Needleman Wunsch | BAliBASE, Horstrad | 1. Higher accuracy 2. Highly scalable | 1. Time-consuming |
| 2 | Local Alignment of DNA Sequence Based on Deep Reinforcement Learning | IEEE Open Journal of Engineering in Medicine and Biology, Date of Publication: 27 April 2021 | Deep Q-Network (DQN) algorithm | Synthetic, Non Synthetic | 1. New and promising technique 2. Handles variation of sequences | 1. Significant amount of training data and computation power |
| 3 | Accelerating DNA pairwise sequence alignment using FPGA and a customized convolutional neural network | IEEE Access, date of publication 30 March 2021 | Field programmable gate arrays (FPGA) and a customized convolutional neural network | | Significant speedups compared to traditional implementations of sequence alignment algorithms | The approach is hardware-dependent and requires access to FPGA devices, which can limit its accessibility and portability |
| 4 | Performance-Based Analogising of Needleman Wunsch Algorithm to Align DNA Sequences Using GPU and FPGA | IEEE Access, date of publication 05 February 2021 | GPU, FPGA | BAliBASE, SABmark, or OXBench | Significant speedups compared to traditional implementations of sequence alignment algorithms | Hardware dependency, limited flexibility |
| 5 | Simple and Efficient Pattern Matching Algorithms for Biological Sequences | IEEE Access, Date of Publication 23 January 2020 | Boyer-Moore-Horspool (BMH) algorithm and the Reverse Complement (RC) algorithm | BAliBASE | Two novel algorithms for pattern matching in biological sequences that are simple and efficient | The paper does not compare the proposed algorithms to some of the more recent, state-of-the-art pattern matching algorithms, which could limit their utility in cutting-edge research |
| 7 | Accelerating Edit-Distance Sequence Alignment on GPU Using the Wavefront Algorithm | IEEE Access, date of publication 13 June 2022 | The wavefront algorithm on a GPU | - | This paper includes the problems of pattern mining and its related applications | All the techniques related to pattern mining were not included in this paper |
| 8 | SLPal: Accelerating Long Sequence Alignment on Many-Core and Multi-Core Architectures | IEEE Access, Date of Publication 23 January 2020 | Seed-and-extend paradigm, SIMD vectorization | - | Accelerated alignment: SLPal provides significant acceleration in long sequence alignment compared to traditional methods | Hardware dependency: SLPal is specifically designed for many-core and multi-core architectures. While this allows for efficient parallel processing, it also restricts its applicability to systems that possess such hardware capabilities |
7. Gap Analysis
1. In previous work, sequence alignment over large databases cannot yield
results within reasonable time, power, and cost budgets.
3. In the present work, we use a multilayer perceptron and the particle swarm
optimization algorithm to improve the performance of sequence alignment.
8. Objectives
The objectives of this project are listed as follows:
➢ To study existing technologies and identify their drawbacks and limitations.
➢ To train the model on a large dataset of DNA sequences and optimize it to enhance
its performance.
➢ To evaluate the performance of the model on a test dataset, and compare it with the
existing methods.
9. Scope
10. Dataset Description
Name of the Dataset: DNA SEQUENCE ALIGNMENT DATASETS BASED ON NW ALGORITHM
Link: https://www.kaggle.com/datasets/amrezzeldinrashed/dna-sequence-alignmnet-dataset
Source: Kaggle
Classes: 254
Instances: 65536
Attributes: 16
11. METHODOLOGY
11.1. Proposed Model
Figure 1 represents the proposed system.
Step 1: Prepare the dataset with DNA sequences and known alignments.
Step 1: Initialize the MLP model with hidden layers, activation functions, and
random weights.
Step 2: Implement the PSO algorithm to optimize model weights and biases.
Step 3: Train the MLP model using PSO and update weights based on best
positions found.
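Because PSO optimizes the model weights and biases, one way to connect the two is to store every MLP parameter in a single flat vector that serves as a particle position. The pure-Python sketch below illustrates that mapping. The one-hot encoding, the single tanh hidden layer, the mean-squared-error fitness, and the names `one_hot`, `n_params`, `mlp_forward`, and `fitness` are all illustrative assumptions, not the project's actual implementation.

```python
import math
import random


def one_hot(seq):
    """Encode a DNA string as a flat one-hot vector (4 values per base)."""
    table = {"A": 0, "C": 1, "G": 2, "T": 3}
    vec = []
    for base in seq.upper():
        slot = [0.0] * 4
        slot[table[base]] = 1.0
        vec.extend(slot)
    return vec


def n_params(n_in, n_hidden):
    """Hidden weights + hidden biases + output weights + output bias."""
    return n_in * n_hidden + n_hidden + n_hidden + 1


def mlp_forward(params, x, n_hidden):
    """One hidden tanh layer and a linear output, read from a flat vector."""
    n_in = len(x)
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        weights = params[idx:idx + n_in]
        idx += n_in
        hidden.append(sum(w * xi for w, xi in zip(weights, x)))
    for h in range(n_hidden):
        hidden[h] = math.tanh(hidden[h] + params[idx])
        idx += 1
    out = sum(params[idx + h] * hidden[h] for h in range(n_hidden))
    idx += n_hidden
    return out + params[idx]


def fitness(params, samples, n_hidden):
    """Assumed fitness: mean squared error against known alignment scores."""
    total = 0.0
    for x, target in samples:
        total += (mlp_forward(params, x, n_hidden) - target) ** 2
    return total / len(samples)


# Hypothetical usage: one sequence pair with a made-up target score of 2.0.
random.seed(0)
x = one_hot("ACGT") + one_hot("ACGA")
params = [random.uniform(-1, 1) for _ in range(n_params(len(x), 3))]
print(fitness(params, [(x, 2.0)], 3))
```

Keeping the parameters in one list means a PSO particle can be passed straight to `fitness` without any conversion step.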
Module 3: Pairwise Sequence Alignment
5. Identify the particle with the best fitness (alignment score) as the global
best solution.
7. Update the velocity and position of each particle using the PSO equations.
10. Update the local best solution for each particle if a better alignment
score is achieved.
11. Update the global best solution if a particle achieves a better alignment
score than the current global best.
12. Extract the best weights and biases from the global best solution.
13. Train the MLP model using the best weights and biases obtained from
PSO.
14. Perform pairwise sequence alignment using the trained MLP model on
the test sequences.
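The velocity, position, local-best, and global-best updates listed above can be sketched as a short loop. This is a minimal sketch under stated assumptions: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the function name `pso` are illustrative choices, and the loop minimizes its fitness function, so an alignment score that should be maximized would be negated before being passed in.

```python
import random


def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness(position) with a basic global-best PSO.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Local (personal) best of each particle and the global best of the swarm.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to local best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical usage on a toy objective (sum of squares, minimum at the origin).
best, best_fit = pso(lambda p: sum(v * v for v in p), dim=3)
print(best_fit)
```

In the setting of this project, the position vector would hold the MLP weights and biases, and the returned global best would be the parameter set extracted in step 12.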
13. UML DIAGRAMS
13.1 Use Case Diagram
Figure 3 represents the use case diagram.
13.2. Activity Diagram
Figure 4: Activity Diagram
13.3. Sequence Diagram
Figure 5 represents the sequence diagram.
Figure 5: Sequence Diagram
14. Functional and Non-Functional Requirements
Functional Requirements:
• Input DNA Sequences: The system should allow users to input DNA
sequences for alignment, either by manual entry or by importing from external
files.
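As a rough illustration of this requirement, the sketch below validates a manually entered sequence and reads sequences from file contents. The FASTA file format, the restriction to the four bases A, C, G, and T, and the names `validate_dna` and `parse_fasta` are assumptions made for the example; the requirement itself does not specify a file format.

```python
def validate_dna(seq):
    """Strip whitespace, upper-case, and reject anything other than A, C, G, T."""
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    bad = sorted(set(cleaned) - set("ACGT"))
    if bad:
        raise ValueError("invalid characters: " + "".join(bad))
    return cleaned


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = validate_dna("".join(chunks))
            name = line[1:].strip()
            chunks = []
        elif name is None:
            raise ValueError("sequence data before first FASTA header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = validate_dna("".join(chunks))
    return records


# Hypothetical usage: manual entry and imported file contents.
print(validate_dna(" ac gt "))
print(parse_fasta(">s1\nACGT\nacg\n>s2\nTTGA\n"))
```

Validating at input time keeps malformed sequences from reaching the alignment model.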
References
[1] A. E. E.-D. Rashed, H. M. Amer, M. El-Seddek and H. E.-D. Moustafa, "Sequence
Alignment Using Machine Learning-Based Needleman–Wunsch Algorithm,"
in IEEE Access, vol. 9, pp. 109522-109535, 2021,
doi: 10.1109/ACCESS.2021.3100408.
[2] C. Kyal, R. Kumar and A. Zamal, "Performance-Based Analogising of Needleman
Wunsch Algorithm to Align DNA Sequences Using GPU and FPGA,"
2020 IEEE 17th India Council International Conference (INDICON), New
Delhi, India, 2020, pp. 1-5, doi: 10.1109/INDICON49873.2020.9342078.
[3] A. E. E.-D. Rashed, H. M. Amer, M. El-Seddek and H. E.-D. Moustafa, "Accelerating
DNA pairwise sequence alignment using FPGA and a customized
convolutional neural network," in ScienceDirect, 2021.
[4] Y.-J. Song and D.-H. Cho, "Local Alignment of DNA Sequence Based
on Deep Reinforcement Learning," in IEEE Open Journal of Engineering in
Medicine and Biology, vol. 2, pp. 170-178, 2021, doi: 10.1109/OJEMB.2021.3076156.
[5] P. Neamatollahi, M. Hadi and M. Naghibzadeh, "Simple and Efficient Pattern
Matching Algorithms for Biological Sequences," in IEEE Access, vol. 8,
pp. 23838-23846, 2020, doi: 10.1109/ACCESS.2020.2969038.
[6] A. Mishra, B. K. Tripathi and S. Singh Soam, "A Genetic Algorithm based
Approach for the Optimization of Multiple Sequence Alignment," 2020 International
Conference on Computational Performance Evaluation (ComPE),
Shillong, India, 2020, pp. 415-418, doi: 10.1109/ComPE49325.2020.9200060.
[7] Q. Aguado-Puig et al., "Accelerating Edit-Distance Sequence Alignment on
GPU Using the Wavefront Algorithm," in IEEE Access,
doi: 10.1109/ACCESS.2022.3182714.
[8] X. Xu et al., "SLPal: Accelerating Long Sequence Alignment on Many-
Core and Multi-Core Architectures," 2020 IEEE International Conference
on Bioinformatics and Biomedicine (BIBM), Seoul, Korea (South), 2020,
pp. 2242-2249, doi: 10.1109/BIBM49941.2020.9313429.
[9] https://study.com/cimages/multimages/16/iterativesdlc.png