Overview of Machine Learning in Biology
Abstract—Machine Learning has a long history of application in Biology and has enabled many advancements in the field. These include studies that use different Machine Learning techniques for prediction and discovery on specific types of biological data. The versatility of techniques and frameworks continues to improve Machine Learning, and we hope this will help us work more efficiently with new discoveries and grow our databases of biological data. In this paper, we discuss several applications of Machine Learning in Biology, namely Synthetic Gene Circuits, Convolutional Neural Networks, Recurrent Neural Networks and Generative Adversarial Networks. As the biological and medical fields have quickly grown into data-rich environments, Machine Learning has become a vital tool for sorting through all of this data. The application of Machine Learning helps in two main ways. First, it helps with classification and prediction tasks that a machine can do quickly. Second, it requires little human input, which helps minimize human bias and performance issues. Neural Networks go hand-in-hand with Deep Learning, as they are one of its techniques. By utilizing Deep Learning and Neural Networks, we can expand our learning in Biology further than ever before.

Keywords—Machine Learning (ML), Deep Learning (DL), Artificial Intelligence (AI), Biology, Synthetic Gene Circuits, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), Neural Networks.

I. INTRODUCTION
There are many different types of Deep Learning (DL) techniques that offer various solutions to help tackle problems too complex for the human brain. The DL techniques covered in this paper are Synthetic Gene Circuits, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Generative Adversarial Networks (GANs). This paper goes into more depth on each of these methods to help readers gain a better understanding of the life-changing innovations that biological DL has to offer [5]. Being able to utilize so many techniques comes with its fair share of obstacles. DL is essential to Biology because it helps analyze large-scale, high-dimensional data and can find hidden structures within the data by training complex networks with multiple layers that capture its internal structure, which is very difficult for humans to do manually [3]. With DL, analysis can be done more quickly and accurately than the same work done manually.

II. SYNTHETIC GENE CIRCUITS
The application of Machine Learning (ML) in Biology has made it easier for biologists to compute, analyze, classify and structure data. With the application of ML today, it is possible to get results from the laboratory quickly. Without ML, biological study was limited, and it was not possible to observe in real time what happens in cells, because cells are invisible to the naked eye. The complexity of biological data leads people to use ML in this field of study to obtain accurate results, build predictive models, and understand biological data more efficiently.

The application of ML in Biology allows us to observe cells in both their normal and pathological functioning. That is why it is possible today to detect viruses or diseases in genes through ML algorithms. For example, how would it have been possible without ML to detect the coronavirus in human genes and help people to prevent it?

Take the example of Synthetic Gene Circuits in Biology. A gene circuit is an assembly of biological parts that enables individual cells to respond and interact with each other to perform logical functions, and its development relies on the ability to cut, copy and paste DNA. Gene circuit development is a branch of synthetic Biology used to address societal challenges such as cancer-resistant microbes. Every living cell in the gene circuit contains a code written in four letters, “A, C, T, G” [4]. The DNA carries the code that cells use to produce their proteins, and the amount of a protein can determine whether the cell moves or stays in a tissue.

The application of ML in Biology makes gene circuit behavior more predictable. It is unlikely that software can predict the level of each protein in every single cell [13], but it is possible to predict gene circuit function in a defined cell type and environment [13].

III. CONVOLUTIONAL NEURAL NETWORKS
Advances in computer vision within DL have been refined over time, primarily through CNNs. CNNs are a DL technique created for processing structured arrays of data [7][20]. These Neural Networks are primarily used to classify different images
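
To make the idea of processing a structured array more concrete, the following is a minimal sketch of the sliding-window operation at the core of a convolutional layer, written in plain Python. It is an illustration under stated assumptions rather than code from the works cited: the function name conv2d_valid, the 4x4 image patch and the 2x2 kernel are all hypothetical, and a practical CNN would learn its kernel values from data and stack many such layers.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image window whose top-left corner is (i, j).
            total = 0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale patch with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 kernel that responds to a dark-to-bright step from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

In this sketch the feature map is largest in its middle column, where the window straddles the dark-to-bright edge, which is the kind of local pattern a trained kernel would pick out in an image.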
Authorized licensed use limited to: UCLA Library. Downloaded on February 29,2024 at 05:10:49 UTC from IEEE Xplore. Restrictions apply.
displaying RNNs catch patterns of nonconsecutive data, and new data is produced from the conditions that are learned [6]. They explain that amino acids are a good example of something that can easily be analyzed with these types of Neural Networks [6]. For example, in a study by Müller et al., the researchers trained RNNs with long short-term memory (LSTM) units to recognize helical antimicrobial peptides (AMPs), which can cause harmful interactions between blood proteins and, in turn, clumping [6]. With RNNs, activity is transmitted in cycles, much like what occurs in the brain [10]. This allows a network's finite computational resources to be reused over time, so that a deeper sequence of nonlinear transformations can be carried out. Because of this, RNNs can perform computations that are more complex than a single feedforward sweep through the same number of connections and units [10]. RNNs are similar to feedforward Neural Networks in the sense that they can also be trained using backpropagation, which means that the error signal has to pass backward through the cycles. This is known as “backpropagation through time.” A better way to understand this term and RNNs is to unroll the RNN into an equivalent feedforward network. Each layer of the feedforward network corresponds to one RNN timestep. Each layer is a copy of the RNN's units and weights, so every layer shares the same weights [15].

V. GENERATIVE ADVERSARIAL NETWORKS
Generative Adversarial Networks (GANs) are deep Neural Network frameworks that are able to learn from a set of training data and generate new data with characteristics similar to the training data [14]. GANs consist of two main Neural Networks, known as the generator and the discriminator, which compete against each other. Each network is trained for a different task: the generator is trained to produce fake data, while the discriminator is trained to distinguish real examples from the generator's fake data.

VI. APPLICATIONS OF PYTHON IN BIOLOGY
There are many different uses for Python within the field of Biology. Python gives biologists a way to illustrate the major principles of programming for Biology. For analyzing diseases or protein structure, use of “ssbio” is growing in the field of Biology [21]. As the backbone of this package, Python has recently made a significant contribution to how protein structures are analyzed at the genome scale. In the article titled “ssbio: a Python framework for structural systems Biology”, the authors give a brief introduction to this Python package and state that “working with protein structures at the genome-scale has been challenging in a variety of ways. Here, we present ssbio, a Python package that provides a framework to easily work with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins” [21].

This is very promising and shows how Python, a simple programming language, has become a powerful tool in everyday life, from ATMs to biological research. An article indexed on PubMed puts the meaning of genome-scale plainly: “Genome-scale metabolic models (GEMs) are used extensively for analysis of mechanisms underlying human diseases and metabolic malfunctions” [22].

Many computer scientists and information technology professionals conclude that Python development is one of the fundamentals of ML in many biological discoveries and research. Python programs take data that is input at the command prompt, process it, and give the output needed to understand the information better. ML follows the same pattern of input, processing and output. Thus, it can be concluded that both play a vital role in Biology. Some examples of Python constructs that are often used include For Loop, Range, If Else, Concatenation, String, List, Tuple, and many more. Some examples of code using these constructs can be found below [16]:

For Loop → for second_base in bases:
Range → range(0, number_of_years + 1, 1)
If Else → if (codon1 == 'ATG'):
Concatenation → RNA_seq = RNA_seq + S_codon
String → protein = "GFP"
List → stop_codons = ['TAA', 'TAG']
Tuple → Histidine = ('H', 'CAT', 'CAC')

Another useful tool for biological computations is Biopython [8][17]. Biopython contains practical modules used to engineer applications involving bioinformatics [11]. An example of a Biopython feature is the ‘Seq’ object [18]. ‘Seq’ can be used to declare a sequence, such as a DNA or protein sequence [18]. Once declared, a DNA sequence can be transcribed and translated through Biopython [18]. Chang thought of the idea of Biopython in 1999, and others joined in helping to make it happen [19]. It was released in July of 2000 [19].

VII. CONCLUSION AND FUTURE WORK
We can accomplish much more with the help of biological ML and the various techniques within it. Innovations are being made every day with the help of these techniques. A wide variety of methods have been covered throughout this paper, along with examples of how these applications are used and how they can be restructured to be more accommodating. Through the applications of Synthetic Gene Circuits, CNNs, RNNs and GANs in Biology, ML becomes much more than just writing code in Python. With everything that scientists and innovators have created thus far to better Biology, there is still a lot of room for more advancements to be made in ML. Improvements to CNNs could help doctors better interpret MRIs, X-rays and other images. These improvements would help by better detecting the specific aspects of the images that are relevant to Biology.

REFERENCES
[1] Ching, T., Himmelstein, D. S., Beaulieu-Jones, B. K., Kalinin, A.
A., Do, B. T., Way, G. P., Ferrero, E., Agapow, P.-M., Zietz, M.,
Hoffman, M. M., Xie, W., Rosen, G. L., Lengerich, B. J., Israeli,
J., Lanchantin, J., Woloszynek, S., Carpenter, A. E., Shrikumar,
A., Xu, J., … Greene, C. S. (2018). Opportunities and obstacles
for deep learning in biology and medicine. Journal of The Royal
Society Interface, 15(141).
[2] Webb, S. (2018, February 20). Deep Learning for Biology. Nature
News.
[3] Tarca, A. L., Carey, V. J., Chen, X.-wen, Romero, R., & Drăghici,
S. (n.d.). Machine learning and its applications to biology. PLOS
Computational Biology.
[4] Hazlegreaves, S. (2021, September 8). Synthetic Biology: Past,
present and future. Open Access Government.
[5] Greener, J. G., Kandathil, S. M., Moffat, L., & Jones, D. T. (2021,
September 13). A guide to machine learning for biologists. Nature
News.
[6] Müller, A. T., Hiss, J. A., & Schneider, G. (2018). Recurrent
Neural Network Model for Constructive Peptide Design. Journal
of Chemical Information and Modeling, 58(2), 472–479.
https://doi.org/10.1021/acs.jcim.7b00414
[7] Saha, S. (2018, December 17). A comprehensive guide to
Convolutional Neural Networks - the eli5 way. Medium.
[8] Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A.
Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg,
Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J.
L. de Hoon, Biopython: freely available Python tools for
computational molecular biology and bioinformatics,
Bioinformatics, Volume 25, Issue 11, 1 June 2009, Pages 1422–
1423
[9] A beginner's guide to Convolutional Neural Networks (cnns).
Pathmind. (n.d.).
[10] Kriegeskorte, N. Neural network models and deep learning.
[11] Angermueller, C., Pärnamaa, T., Parts, L., & Stegle, O. (2016,
July 29). Deep Learning for Computational Biology. Molecular
Systems Biology.
[12] Bioengineer. (2020, August 4). Deep learning on cell signaling
networks establishes AI for single-cell biology.
[13] Computational biology: Genomes, networks, evolution MIT.
[14] Generative Adversarial Network. DeepAI. (2020, July 22).
https://deepai.org/machine-learning-glossary-and-terms/generative-adversarial-network
[15] Schmidhuber, J. (2015). Deep learning in neural networks: An
overview. Neural Networks, 61, 85–117.
https://doi.org/10.1016/j.neunet.2014.09.003
[16] Stevens, T. J., & Boucher, W. (n.d.). Machine learning
(Chapter 24) - python programming for biology. Cambridge Core.
[17] Biopython. Biopython · Biopython. (n.d.). Retrieved April 3,
2022, from https://biopython.org/
[18] Huseyin Kocak, B. K. (n.d.). Python for Biologists. Python for
biologists. Retrieved April 3, 2022, from
https://www.pythonforbiologists.org
[19] Bassi, S. (2009, September 23). Python for Bioinformatics:
Sebastian Bassi: Taylor & Francis Group. Taylor & Francis.
[20] O’Shea, K., & Nash, R. (2015). An Introduction to Convolutional
Neural Networks. http://arxiv.org/abs/1511.08458
[21] Mih, N., Brunk, E., Chen, K., Catoiu, E., Sastry, A., Kavvas, E.,
Monk, J. M., Zhang, Z., & Palsson, B. O. (2018). Ssbio: A python
framework for structural systems biology. Bioinformatics,
34(12), 2155–2157.
[22] Wang, H., Robinson, J. L., Kocabas, P., Gustafsson, J., Anton, M.,
Cholley, P. E., Huang, S., Gobom, J., Svensson, T., Uhlen, M.,
Zetterberg, H., & Nielsen, J. (n.d.). Genome-scale metabolic
network reconstruction of model animals as a platform for
Translational Research. Proceedings of the National Academy of
Sciences of the United States of America. Retrieved April 3, 2022,
from https://pubmed.ncbi.nlm.nih.gov/34282017/
[23] Alanazi, S. A., Kamruzzaman, M. M., Islam Sarker, M. N.,
Alruwaili, M., Alhwaiti, Y., Alshammari, N., & Siddiqi, M.
H. (2021, April 5). Boosting breast cancer detection using
convolutional neural network. Journal of Healthcare Engineering.