Journal of Mechanics Engineering and Automation 12 (2022) 57-63

doi: 10.17265/2159-5275/2022.02.004
DAVID PUBLISHING

Optimization of a Classical Algorithm for the Alignment of Genomic Sequences with Artificial Bee Colony

Raul Magdaleno Peñaloza, Andrea Magadan Salazar and Gerardo Reyes Salgado
Department of Computer Science, Centro de Investigacion y Desarrollo Tecnologico, Cuernavaca Morelos 62490, Mexico

Abstract: This article presents genomic alignment methods based on the classic Needleman-Wunsch and Smith-Waterman algorithms, the latter optimized with the ABC (artificial bee colony) algorithm. Because genomic alignment has no goal state, the experiments carried out show the alternative alignments proposed by ABC. Different types of alignment can exist within the classical algorithm, based on a horizontal, vertical, diagonal and inverse search mechanism on a match value table. Our ABC-Smith Waterman algorithm writes the genomic sequences in rows and columns and searches for similarities; these provide the values that ABC processes to produce additional alignment results that scientists can use in their experiments and research.

Key words: Algorithm, genomic alignment, ABC, Needleman, Smith-Waterman.

Corresponding author: Raul Magdaleno Peñaloza, mechatronic engineer, research field: genomic alignment and artificial bee colony.

1. Introduction

Sequence alignment is a basic tool that allows the extraction of functional, structural and evolutionary information contained in biological sequences. Similarities between sequences may indicate functional or evolutionary relationships [1].

The main algorithm used in genomics was proposed by Saul B. Needleman and Christian D. Wunsch [2] in 1970; T. F. Smith and M. S. Waterman [3] adapted and optimized this method in 1981. In 2017, M. A. López and J. V. Medina [4] from the Universidad del Valle, Santiago de Cali, Colombia, proposed an optimization of these algorithms using a parallel architecture.

The objective of this paper is to optimize the alignment methods, which facilitates experimentation in the alignment of two sequences and generates new alignments with different results that can be used by specialists in the genomic area.

The article is organized as follows: Section 2 presents genomic sequences. Section 3 discusses sequence alignment. Section 4 describes the Needleman-Wunsch algorithm. Section 5 presents the Smith-Waterman algorithm. Section 6 presents the ABC (artificial bee colony) algorithm. Section 7 describes the ABC Smith-Waterman optimization. Finally, Section 8 shows the results with ABC optimization.

2. Genomic Sequences

DNA (deoxyribonucleic acid) is a finite chain built from an alphabet N = {A, C, G, T} of nucleotides, and the genome is the set of all the DNA sequences associated with an organism [1].

According to Needleman and Wunsch [2], nucleic acids are the biomolecules that carry genetic information. They are biopolymers of high molecular weight, formed by structural subunits or monomers called nucleotides. From the chemical point of view, nucleic acids are macromolecules formed by linear polymers of nucleotides, linked by phosphate ester bonds, with no apparent periodicity.

DNA sequences contain the genetic information of all living things. The more similar two sequences are, the more similar the functions of the proteins they encode will tend to be. Sequences that descend from a common ancestor are said to be homologous.

DNA undergoes mutations over the years and through its descendants: the more time that has passed since the last common ancestor, the more different the sequences will be.

3. Sequence Alignment

Sequence alignment is a basic tool that allows the extraction of functional, structural and evolutionary information contained in biological sequences.

The main goals of comparing two or more genomic sequences are:
• Determine and quantify the degree of similarity between them.
• Determine whether there is some kind of relationship between them or whether the resemblance is simply the result of chance.
• Detect the presence of conserved structural and functional motifs.
• Build phylogenetic trees that reflect their evolutionary relationships.

To find the degree of similarity between two sequences, the first step is to look for similar characters, which consists of writing one sequence on top of the other so that the number of symbols that coincide in the same position is maximal [5]. These matches are displayed in Fig. 1.

Fig. 1 Example of a sequence [7].

If necessary, gaps can be introduced in the sequences. A gap can be interpreted as the insertion of a residue in one of the sequences or as the disappearance of a residue in the other. In an alignment, when there is no coincidence, the character is left in the closest place so that the whole sequence does not have to be moved, as shown in Fig. 2.

Fig. 2 Match NoMatch Gap/Indel Alignment [7].

Sales et al. [6] show that each position of the alignment holds either two identical characters (MATCH), two different characters (No MATCH), or one character aligned with a gap. Two gaps can never be aligned with each other.

Two sequences can be aligned in many ways. To determine which is the best alignment, a scoring system is used that gives each pair of characters a different value depending on whether they are the same, different, or whether there is a gap. The score of an alignment is calculated by adding the scores of all its positions, which helps to determine whether the sequences are really related or whether their similarity is due to chance.

The alignment that gets the highest score is called the optimal alignment, shown in Fig. 3.

Fig. 3 Score alignment [7].

4. Needleman-Wunsch Algorithm

Needleman and Wunsch [2] introduced an approach in 1970 to calculate the optimal global alignment of two sequences.

The algorithm [2] massively reduces the number of possibilities that must be considered when searching for an alignment.
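The scoring system described in Section 3 (one value per MATCH, No MATCH or gap, summed over all positions) can be sketched in a few lines of Python. The paper does not state which scoring scheme it uses, so the values in `PAIR_SCORES` and `GAP` below are assumptions: they were picked because they reproduce the scores that Section 4 quotes for the SEND and AND example (+1, +3, -3, -8), and they coincide with the corresponding BLOSUM62 entries and a gap penalty of -10. The function names are hypothetical.

```python
# Assumed scoring values (not given in the paper): chosen so that the
# SEND/AND example yields the quoted scores +1, +3, -3 and -8.
GAP = -10
PAIR_SCORES = {
    ("S", "A"): 1, ("E", "A"): -1, ("E", "N"): 0,
    ("N", "N"): 6, ("N", "D"): 1, ("D", "D"): 6,
}


def pair_score(x, y):
    """Score a single alignment column."""
    if x == "-" and y == "-":
        raise ValueError("two gaps cannot be aligned with each other")
    if x == "-" or y == "-":
        return GAP
    # The table is symmetric, so accept the pair in either order.
    key = (x, y) if (x, y) in PAIR_SCORES else (y, x)
    return PAIR_SCORES[key]


def alignment_score(top, bottom):
    """Sum the column scores of two equally long, gapped sequences."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(x, y) for x, y in zip(top, bottom))


# The four ways of placing one gap in AND so that it lines up with SEND.
candidates = ["-AND", "A-ND", "AN-D", "AND-"]
scores = {c: alignment_score("SEND", c) for c in candidates}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Scoring every candidate one by one, as done here, only works for tiny inputs; the dynamic programming matrix described in this section avoids enumerating the candidates explicitly.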



According to Coll [8] and Backofen [9], under the assumption that both input sequences come from the same origin, a global alignment tries to identify the parts that coincide and the changes necessary to transform one sequence into the other.

The dynamic programming approach tabulates the optimal sub-solutions in a 2D matrix [2].

Needleman-Wunsch consists of the following three steps:
(1) Initialize the score matrix.
(2) Calculate the scores and fill in the traceback matrix.
(3) Deduce the alignment from the traceback matrix.

To determine which is the best alignment, a scoring system is used that gives each pair of characters a different value depending on whether there is a MATCH, a No MATCH or a GAP. Taking the example of two words, SEND and AND, we get:
SEND
-AND score: +1
A-ND score: +3 ← This was the best score found.
AN-D score: -3
AND- score: -8

To create a Needleman-Wunsch matrix [2], the following procedure is used:
• Align the first sequence horizontally at the top;
• Align the second sequence vertically, along the left side.

In each cell, a value is added or subtracted depending on whether the characters generate a MATCH, a No MATCH or a GAP [9].

Figs. 4 and 5 show how the matrices should be filled with their respective scores. The alignment is then read off by following a path in the matrix from the lower right corner to the upper left corner along the best scores, as shown in Fig. 6.

Fig. 4 Needleman matrix [10].

Fig. 5 Score alignment [10].

Fig. 6 Score tracking in matrix [10].

5. Smith-Waterman Algorithm

The dynamic programming approach of Temple F. Smith and Michael S. Waterman [3] calculates the optimal local alignments of two sequences, that is, the regions that are best conserved. Manavski and Valle [11] note that this algorithm is designed to find the optimal local alignment between two sequences. As in the Needleman-Wunsch matrix computation, the number of rows and columns is given by the lengths of the two sequences being compared.

Sales et al. [6] propose an example that aligns the words “COELACANTH” and “PELICAN”. The matrix is built in the same way as for Needleman-Wunsch [11]. The difference lies in introducing an additional value, 0, as a lower bound on the similarity score, which excludes alignments that turn out not to be similar.

Since the first row and column are 0 instead of negative numbers (-1, -2, -3…), the MATCH and No MATCH values remain the same. When filling the matrix, any value that would be negative is set to 0. The traceback starts from the highest score obtained and is followed until a 0 is reached. This method can be seen in Fig. 7.
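The two matrix procedures of Sections 4 and 5 differ only in how the borders are initialized, whether negative values are clamped to 0, and where the traceback starts and stops. The following Python sketch makes that difference explicit with a single hypothetical function `align` and a `local` switch. The scoring values (match +2, mismatch -1, gap -1) are assumptions for illustration, not values taken from the paper, and when several predecessors tie the sketch simply prefers the diagonal.

```python
def align(seq_top, seq_left, match=2, mismatch=-1, gap=-1, local=False):
    """Return (score, aligned_top, aligned_left, matrix).

    local=False: global alignment in the style of Needleman-Wunsch.
    local=True:  local alignment in the style of Smith-Waterman.
    """
    rows, cols = len(seq_left) + 1, len(seq_top) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: the first row and column accumulate gap penalties.
        for j in range(cols):
            H[0][j] = j * gap
        for i in range(rows):
            H[i][0] = i * gap

    def pair(i, j):
        return match if seq_left[i - 1] == seq_top[j - 1] else mismatch

    # Fill every cell from its diagonal, upper and left neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            best = max(H[i - 1][j - 1] + pair(i, j),
                       H[i - 1][j] + gap,
                       H[i][j - 1] + gap)
            # Local: negative values are replaced by the lower bound 0.
            H[i][j] = max(0, best) if local else best

    # Global starts in the lower right corner, local at the highest score.
    if local:
        i, j = max(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: H[rc[0]][rc[1]])
    else:
        i, j = rows - 1, cols - 1
    score = H[i][j]

    top, left = [], []
    # Global stops in the upper left corner, local at the first 0.
    while (i > 0 or j > 0) and not (local and H[i][j] == 0):
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + pair(i, j):
            top.append(seq_top[j - 1])
            left.append(seq_left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            top.append("-")
            left.append(seq_left[i - 1])
            i -= 1
        else:
            top.append(seq_top[j - 1])
            left.append("-")
            j -= 1
    return score, "".join(reversed(top)), "".join(reversed(left)), H


# The word pair used in the text, with an assumed +1/-1/-1 scheme.
score, top, left, _ = align("COELACANTH", "PELICAN",
                            match=1, mismatch=-1, gap=-1, local=True)
print(score)
print(top)
print(left)
```

Keeping both variants in one function is a presentation choice for this sketch; production tools usually implement them separately and with more elaborate gap penalties.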

Fig. 7 Matrix Smith-Waterman [1].

6. ABC Algorithm

The ABC algorithm is one of the most recent algorithms in the domain of collective intelligence. It was proposed by Dervis Karaboga in 2005 [12] as a new optimization algorithm based on the intelligent behavior of honey bee swarms. The swarm algorithm is very simple and very flexible compared with existing swarm-based algorithms, and it can be used to solve unimodal and multimodal numerical optimization problems [13].

The objective of these bees is to discover the food sources with the greatest amount of nectar [12].

The behavior of bees was modeled as an optimization heuristic [12] based on a biological model that, according to Ref. [14], consists of the following components:
(1) Food sources: although the value of a food source depends on many factors, it is summarized in a single numerical value that indicates its potential.
(2) Employed collector bees: these bees exploit a food source; they are also in charge of communicating its location and profitability to the observer bees.
(3) Unemployed collector bees: they are continually looking for a food source to exploit. There are two types of unemployed foragers: scouts, which search the environment surrounding the nest for new food sources, and onlookers (the observer bees mentioned above), which wait in the nest and choose a food source from the information shared by the employed bees.

The model also defines the main modes of behavior that are necessary for self-organization and collective intelligence: the recruitment of food collectors to rich food sources, which results in positive feedback, and the abandonment of poor sources by food collectors, which causes negative feedback [15]. As a consequence, ABC's exploration capacity is restricted [16].

7. ABC Smith-Waterman

The Smith-Waterman algorithm is one of the most efficient and easiest to understand methods for finding genomic alignments.

The steps used to optimize the algorithm are as follows:
(1) Generate the table of records.
(2) Generate the first alignment by the Smith-Waterman method.
(3) Generate the second alignment in reverse.
(4) Schedule bees for every possibility.
(5) Send bees for both alignments.
(6) Register possibilities.
(7) Save possible alignments.

The Smith-Waterman algorithm is based on tables that record similarity data, and it seeks

the best alignment possibility. However, the algorithm can also find possibilities where the alignment forks in different ways. Because there is no goal state in genomic alignment, any possibility of generating an alignment is considered valid.

When running the method, it was found that aligning only a section of the sequence gives a more direct result, whereas aligning the entire sequence generates a completely different result.

This experimentation raised the question: can a different alignment possibility be found within an established method?
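One simple way to look for such alternative possibilities is to follow every branch of the traceback whenever two or more neighbouring cells explain the current score equally well. The Python sketch below does this with a plain depth-first search in which each recursive call plays the role of one bee sent down one branch. This is an assumption about how the forks could be explored, not the authors' ABC implementation, and the function names and scoring values (match +2, mismatch -1, gap -1) are hypothetical.

```python
def sw_matrix(seq_top, seq_left, match=2, mismatch=-1, gap=-1):
    """Fill a Smith-Waterman style score table (first row and column are 0)."""
    rows, cols = len(seq_left) + 1, len(seq_top) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_left[i - 1] == seq_top[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
    return H


def all_tracebacks(seq_top, seq_left, match=2, mismatch=-1, gap=-1):
    """Return (best_score, alignments) following every tied branch."""
    H = sw_matrix(seq_top, seq_left, match, mismatch, gap)
    rows, cols = len(H), len(H[0])
    # Start from the first cell that holds the highest score.
    start = max(((r, c) for r in range(rows) for c in range(cols)),
                key=lambda rc: H[rc[0]][rc[1]])
    alignments = []

    def explore(i, j, top, left):
        # A branch ends when it reaches a 0, as in the classical traceback.
        if H[i][j] == 0:
            alignments.append(("".join(reversed(top)),
                               "".join(reversed(left))))
            return
        s = match if seq_left[i - 1] == seq_top[j - 1] else mismatch
        # Each predecessor that explains the score opens its own branch.
        if H[i][j] == H[i - 1][j - 1] + s:          # diagonal
            explore(i - 1, j - 1,
                    top + [seq_top[j - 1]], left + [seq_left[i - 1]])
        if H[i][j] == H[i - 1][j] + gap:            # vertical
            explore(i - 1, j, top + ["-"], left + [seq_left[i - 1]])
        if H[i][j] == H[i][j - 1] + gap:            # horizontal
            explore(i, j - 1, top + [seq_top[j - 1]], left + ["-"])

    explore(start[0], start[1], [], [])
    return H[start[0]][start[1]], alignments


best_score, found = all_tracebacks("ATG", "ATTG")
print(best_score)
for top, left in found:
    print(top)
    print(left)
    print()
```

For this toy input the table forks once, so two alignments with the same score are reported. The reverse pass of step (3) could be approximated by calling the same function on the reversed sequences. Note that the number of branches can grow quickly on long sequences and that the recursion depth is bounded by Python's recursion limit, so a real implementation would need to cap or sample the branches.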
Fig. 8 shows how such a possibility can be found: choosing any of the marked paths gives a valid option for genomic alignment.

Fig. 9 First result with ABC optimization.

8. Results with ABC Optimization


The genomic sequences were obtained from the public database of the NCBI (National Center for Biotechnology Information), which takes part in the INSDC (International Nucleotide Sequence Database Collaboration), an international collaboration between the three largest genomic databases, located in the United States, Europe and Asia [17, 18].

The genomes of the animals used for this experiment were:
• Ceratotherium simum: used for the columns.
• Bubo bubo: used for the rows.

Figs. 9-13 show graphically the paths that the bees found to generate different alignments.

Fig. 8 Matrix possibilities.

Fig. 10 Second result with ABC optimization.

Fig. 11 Third result with ABC optimization.



Fig. 12 Fourth result with ABC optimization.

Fig. 13 Fifth result with ABC optimization.

Fig. 14 Alignment result with ABC optimization.

Fig. 15 Second alignment result with ABC optimization.

The yellow color shows the first alignment, made with the classic Smith-Waterman algorithm. The green color shows an alignment made in the reverse direction. The sections where further colors branch off mark the possibilities found by the bees when a path different from that of the classical method in reverse is taken.

Figs. 14 and 15 show an example of the different alignment results generated with the aforementioned algorithms.

9. Conclusions

The ABC method for optimizing the Smith-Waterman algorithm showed good results during implementation and in the generation of the sequences. The ABC-Smith Waterman algorithm gives up to six different alignment possibilities, whereas the traditional alignment algorithms give only one result.

The algorithm does not take much time: even if the sequences are long, the program takes less than ten minutes.

The genomic sequences did not cause any problem or change in the work plans, because they are handled like any text file, which gives versatility when developing the algorithms and generating alignment results.

Expectations were exceeded when the algorithm gave its first results, extending the possibilities of alignment. Future work can search for new possibilities by generating new artificial employed bees with a new specification for searching food sources, which could give a new kind of alignment.

References

[1] Santamaria, R. 2013. Alineamiento de pares de secuencias. (in Spanish)
[2] Needleman, S. B., and Wunsch, C. D. 1970. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” J Mol Biol. 48 (3): 443-53.
[3] Smith, T. F., and Waterman, M. S. 1981. Identification of Common Molecular Subsequences. London: Academic Press Inc.
[4] López, M. A., and Medina, J. V. 2017. “Implementación hardware del algoritmo de Needleman-Wunsch modificado usando una arquitectura paralela.” Revista Ingeniería Biomédica 12 (23): 53-62. (in Spanish)
[5] Coll, V. B. 2008. “Estructura y Propiedades de Los Ácidos Nucléicos.” M.Sc. thesis, Ingeniería Biomédica (UV-UPV). (in Spanish)
[6] Sales, J. C., Blanca, J., and Ziarsolo, P. 2019. Alineamiento de secuencias (s. f.-b). bioinf.comav.upv.es. (in Spanish)
[7] González Mañas, J. M. 2020. Comparación de secuencias. 7 December 2020, Universidad del País Vasco. (in Spanish)
[8] Backofen, R. 2011. Sequence Alignment Needleman-Wunsch. Retrieved from Uni Freiburg Bioinformatics.
[9] Backofen, R. 2018. Teaching-Smith-Waterman. Retrieved from Uni Freiburg Bioinformatics.
[10] Likic, V. 2016. “The Needleman-Wunsch Algorithm for Sequence Alignment.” Molecular Science and Biotechnology Institute, The University of Melbourne, 46.
[11] Manavski, S. A., and Valle, G. 2008. “CUDA Compatible GPU Cards as Efficient Hardware Accelerators for Smith-Waterman Sequence Alignment.” BMC Bioinformatics 9: S10.
[12] Salto, C. 2017. Optimización mediante el algoritmo de colonia de abejas artificial, edited by General Pico and La Pampa. Argentina: Universidad Nacional de La Pampa. (in Spanish)
[13] Karaboga, D. 2005. An Idea Based on Honey Bee Swarm for Numerical Optimization. Technical Report-TR06, Engineering Faculty Computer Engineering Department, Erciyes University.
[14] Kumar, A., Kumar, D., and Jarial, S. K. 2016. “A Comparative Analysis of Selection Schemes in the Artificial Bee Colony Algorithm.” Información Tecnológica 20 (1): 55-66.
[15] Karaboga, D. 2010. “Artificial Bee Colony Algorithm.” Accessed on 29 October, 2021. Scholarpedia.Org.
[16] Tsai, P. W., Pan, J. S., Liao, B. Y., and Chu, S. C. 2009. “Enhanced Artificial Bee Colony Optimization.” International Journal of Innovative Computing, Information and Control 5 (12): 12.
[17] National Center for Biotechnology Information. Ncbi.nlm.nih.gov. https://www.ncbi.nlm.nih.gov/.
[18] International Nucleotide Sequence Database Collaboration (INSDC). Insdc.org. http://www.insdc.org.
