Fog Computing (Use It in Some Application)

This document discusses fog computing and its potential applications for bioinformatics. It begins by providing background on bioinformatics and challenges related to high computational and storage demands of bioinformatics algorithms. It then discusses how cloud computing has helped address these challenges but still has limitations. The document introduces fog computing as an extension of cloud computing that can bring computational resources closer to where data is generated, a benefit for data-intensive bioinformatics applications. It outlines how the chapter will cover cloud computing models and applications in bioinformatics, as well as the concept of fog computing and its suitability for bioinformatics.

Fog Computing for Bioinformatics Applications

Hafeez Ur Rehman, Asad Khan, and Usman Habib
Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar,
Pakistan

21.1 Introduction
Bioinformatics is an interdisciplinary field having a myriad of applications in the
area of health and life sciences. Bioinformatics algorithms generally have high
computational and storage requirements [1, 2]. Currently, the preferred option
for bioinformatics researchers is to use cloud computing–based services to
meet their high computational requirements within a limited budget. However,
scientists need to port their data and algorithms to the cloud environment in
order to perform analysis. Recently, due to the emerging trend of “on-the-fly
solutions” for bioinformatics problems, computational resources need to be
brought closer to the users [3–6]. Fog computing, as an extension of cloud
computing, brings services to the edge of the network. This essentially brings the
advantages and power of the cloud closer to the place where the data is actually
generated, benefiting resource-demanding bioinformatics algorithms.
In the field of bioinformatics, high-throughput technologies such as
next-generation sequencing result in a diverse deluge of sequencing data [7, 8].
These raw sequences are called omics sequences, which include DNA sequences,
RNA sequences, and protein sequences, among others. The study of different
types of omics sequences has produced different areas of research; for example, the
study of gene sequences established the field of genomics, while the study
of protein sequences resulted in the field of proteomics, and so on [9–11]. In each
subfield, scientists try to explore the role of omics sequences in understanding
a biological organism as a completely engineered system. In addition to
sequencing data, one of the promises of bioinformatics research, in general, is to
develop personalized medicine. The area that focuses on personalized medicine
is called pharmacogenomics, the study of how genetic variations in an organism
affect its response to a certain drug. An emergent area in pharmacogenomics is
the analysis of single nucleotide polymorphisms (SNPs), variations of a single
nucleotide at a specific position in a genome. SNPs in pharmacogenomics are used
to select and optimize drugs based on genetic profiles of different patients.

Fog Computing: Theory and Practice, First Edition.
Edited by Assad Abbas, Samee U. Khan, and Albert Y. Zomaya.
© 2020 John Wiley & Sons, Inc. Published 2020 by John Wiley & Sons, Inc.

Advancement in high-throughput technologies has made it easier to obtain
the genetic profiles of patients [2, 12]. Next-generation sequencing technologies
have reduced the cost of sequencing genomes to record lows [8]. Nowadays
such sequencing tools are extensively used in pharmacogenomics and genomics
studies alike. These technologies, when applied to huge populations in
clinical experiments, result in a deluge of data. This abundance of data, along
with opportunities, presents some challenges for researchers. First, because the
data grows continuously, storing it requires a lot of space, and provisioning
that space is becoming cumbersome [1, 2]. Second, the analysis
of omics data requires specialized bioinformatics software and tools that
need extensive computational resources, and provisioning those resources is a
challenging task.
A primitive solution to these problems emerged in the form of bioinformatics
tools, often delivered in the form of web services, to help manage and analyze data
stored in biological databases in distant geographical locations [4, 13]. This solu-
tion had its disadvantages; for example, the storage of data in distributed databases
made the process slower. To overcome this problem, cloud computing came to the
rescue [14–17].
Cloud computing is a service-oriented computing model specifically designed
for problems requiring large-scale data and high computational resources. Due to
its appropriateness for across-the-board resource availability for resource-hungry
algorithms, the cloud-computing model has gained fame in the research commu-
nity and the concept has spread swiftly in recent years. The main aim of cloud
computing is to supply hardware and software resources to users through network
links.
The resources that are provided by cloud computing may vary from cloud to
cloud. These resources may include memory, CPU, storage, specific applications,
operating systems, etc. The cloud resources have dynamic scalability, virtual-
ization, and accessibility rendered over the Internet [18]. The model offers new
facilities for users that require scalable and massive storage, computing resources,
applications, virtual technologies, etc. on demand [19]. Therefore, cloud comput-
ing can perform a significant role in various stages of a bioinformatics analysis
pipeline, like data storage, preprocessing, sharing, integration, and exploration as
well as visualization.
Despite its numerous merits, cloud computing faces several issues
related to security, technology, management, and ethics. The security of data
(especially privacy), legal responsibilities, and geographical localization of data
are open, challenging problems for cloud-computing platforms. For example, in
the context of bioinformatics, what are the legal responsibilities when sensitive
patient information leaks during data uploading and
processing in the cloud? In addition to these problems, researchers need to port
their data and algorithms to the cloud environment in order to perform any
analysis. Recently, due to the emerging trend of on-the-fly solutions for
bioinformatics problems, computational resources need to be brought closer
to users [5, 6]. Fog computing, as an extension of cloud computing, brings
services to the edge of the network. This essentially brings the advantages and
power of the cloud closer to the place where the data is produced, thus helping
to speed up “on-the-fly solutions” for bioinformatics applications.
In this chapter, we first discuss the suitability of cloud computing for
bioinformatics problems, alongside highlighting its limitations. Furthermore, we
discuss the appropriateness of fog computing as an extension of cloud computing
to solve bioinformatics problems. The overall chapter is organized as follows: In
Section 21.2, we give an overview of cloud computing along with the overall cloud
service model. Section 21.3 presents a complete review of the existing use of cloud
computing for bioinformatics application development. It also highlights the key
bioinformatics projects that utilize cloud computing to make personalized,
preventive, and precision medicines. In Section 21.4, we present the concept
of fog computing with an emphasis on its key properties (such as low jitter, low
latency, improved security, etc.) that distinguish it from the cloud paradigm. In
Section 21.5, we discuss the suitability of the fog computing paradigm and its
potential for data- and resource-intensive bioinformatics applications, while
elaborating it with a real-time microorganism detection example. In the last section,
the chapter is concluded.

21.2 Cloud Computing

Cloud computing is a computing paradigm in which different computers are
configured to provide on-demand services at a higher level, thus freeing the user from
underlying hardware, storage, and other issues. The authors of [20] define the term
“cloud computing” by evaluating the common properties found in around 22
excerpts. They emphasize the significance
of service level agreements (SLAs) in order to make the cloud environment more
reliable. In addition, the virtualization property of the cloud is the key enabler
of service provisioning. Generally, the cloud computing paradigm has two
fundamental models, i.e. the service model and the delivery model. Each of the models,
along with its types, is discussed in the following sections.

21.2.1 Service Models

The cloud service models describe the level of service at which customers interact
with the cloud. There are three types of service models.

1. Infrastructure as a Service, abbreviated as IaaS. As its name suggests, this
model provides an infrastructure to customers as a service that has high
computational power and a large storage space. This is the lowest-level model; i.e. the
customer/organization deploys its own platform and application software on
the infrastructure according to its needs.
These models support virtual machine server facilities that can be managed
and configured according to need. These types of service models can be
acquired by large-scale organizations for achieving required goals. A common
example is Amazon’s Elastic Compute Cloud (EC2), which has a virtual
machine facility that users can manage according to their requirements.
Another example is Amazon’s Simple Storage Service (S3) for storing and
retrieving data through a web interface.
2. Platform as a Service, abbreviated as PaaS. In this service model, users have no
control over the infrastructure, which means that the cloud service provider installs
the platform in advance and users can configure it according to their needs. The
platform provides users the facility of creating, testing, and installing an
application according to requirements. The user program can be written or created
using the libraries, programming languages, and tools already available in the
platform. Google App Engine is a common example of the PaaS model, in which
users can create programs in Java and Python using the available software
development kit (SDK) for both languages.
3. Software as a Service, abbreviated as SaaS. As its name suggests, users have no
control on infrastructure and platform but use already available software in
the cloud. The customers can use these services through a web interface. In
some scenarios, customers can manage and configure the application program
according to their need. Dropbox is a common example of the SaaS model.

21.2.2 Delivery Models

Cloud services can be offered to end-users using one of three delivery models,
i.e. public, private, and hybrid clouds. The following is a short introduction to
the different types of delivery models:

Public Cloud. This type of cloud is publicly available. The customer can use the
hardware and software resources of a data center. These resources can be
acquired from a vendor organization and paid for by credit card. Amazon, Azure,
and Google apps are common examples of public clouds.
Private Cloud. This type of cloud can be used only by employees of a specific
organization. The software and hardware resources can be configured by users
and the cloud administrator according to the requirements of specific users.
Several open-source and commercial software packages are available on the Internet to establish
these types of clouds. OpenStack, OpenNebula, and VMware Cloud are common
examples of software used to create private clouds.
Hybrid Cloud. This type of cloud is a combination of two clouds of different deliv-
ery models that are connected through specific technology in order to handle
data and application portability.

21.3 Cloud Computing Applications in Bioinformatics

Bioinformatics research normally involves large datasets that are usually
downloaded from publicly available repositories and then analyzed
using available in-house infrastructure. The main problem faced by
bioinformatics researchers is the lack of sufficient computing resources to perform
experiments in a limited time. A dedicated high-performance computing system for
every researcher is not feasible. As an alternative, researchers tend to utilize the
services offered by cloud computing platforms. Cloud computing has scalable
computational power that can be ordered on demand to run experiments within
a certain time and in a cost-effective way. Researchers can choose different cloud
models according to their requirements for storing and processing data. There
is no need for cloud users to install an operating system or application software;
they can log in to the cloud with their account and use the available platforms
and applications [21].
The three service models of cloud computing are: (1) Software as a Service
(SaaS), (2) Platform as a Service (PaaS), and (3) Infrastructure as a Service (IaaS).
In the following subsections, we discuss each model, along with the tools that are
built on the basis of these models.

21.3.1 Bioinformatics Tools Deployed as SaaS

More recently, researchers have developed different cloud-based tools to properly
execute bioinformatics tasks [1]. These tools can solve problems related to
sequence alignment, mapping applications, and gene expressions [2]. Some
examples related to SaaS bioinformatics tools are presented in the following
paragraphs, highlighting the pros and cons of each.
In ref. [3], the authors developed the STORMSeq (Scalable Tools for
Open-Source Read Mapping) tool based on cloud computing. The tool has a
graphical user interface (GUI) for read mapping, read cleaning, and variant
calling and annotation to process personal genomic data. Processing a full exome
sequence with STORMSeq costs $2 and takes 5–10 hours, and processing a whole
genome costs $30 and takes 3–8 days. The tool is open access and open source,
provides a GUI, and can be accessed using Amazon’s EC2 cloud.
CloudBurst [4] and CloudAligner [13] are cloud-based tools designed for
mapping next-generation sequences of the human genome and other species. Both of
them use the MapReduce platform of Hadoop for parallelizing execution on
different nodes. CloudAligner is preferred over CloudBurst because it can process long
sequences. Crossbow [22] is another tool for alignment and single nucleotide
polymorphism (SNP) detection; it can align a whole human genome and detect
its SNPs in a day using a short-read aligner.
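To illustrate how a MapReduce-style job can parallelize read mapping, here is a minimal single-process sketch in Python. It is not the algorithm of CloudBurst, CloudAligner, or Crossbow: the exact k-mer seeding scheme, the function names, and the toy sequences are all assumptions chosen for brevity. The map phase emits k-mers with their offsets, the shuffle groups them by k-mer, and the reduce phase turns shared k-mers into candidate alignment positions.

```python
from collections import defaultdict

def map_phase(name, sequence, k):
    """Emit (k-mer, (sequence name, offset)) pairs for one sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], (name, i)

def shuffle(pairs):
    """Group emitted values by key, as a MapReduce shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Turn each k-mer shared by a read and the reference into a candidate
    start position of that read on the reference."""
    hits = set()
    for values in groups.values():
        ref_offsets = [off for name, off in values if name == "ref"]
        read_hits = [(name, off) for name, off in values if name != "ref"]
        for read_name, read_off in read_hits:
            for ref_off in ref_offsets:
                hits.add((read_name, ref_off - read_off))
    return sorted(hits)

# Toy data (hypothetical): one short reference and two reads.
reference = "ACGTACGGTCA"
reads = {"read1": "CGGTC", "read2": "GTACG"}
k = 4

pairs = list(map_phase("ref", reference, k))
for read_name, read_seq in reads.items():
    pairs.extend(map_phase(read_name, read_seq, k))

candidates = reduce_phase(shuffle(pairs))
print(candidates)
```

In a real Hadoop deployment the map and reduce calls would run on different nodes; here they run sequentially only to show the data flow.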
The Variant Annotation Tool (VAT) [14] was developed for transcription-level
annotation of variants from multiple personal genomes and for identifying
summarized sets of traits of individual populations. VAT also provides visualization
of details from many genetic aspects, including comparative analysis and gene and
allele occurrences and frequencies, which are based on the data collected
from a wide range of individuals. These visualizations can further be used to draw
additional conclusions, for example by constructing phylogenetic trees. The tool
provides both web-based and command-line interfaces for researchers to use.
For RNA-Seq analysis, the FX [15] tool has been developed; it uses the cloud
computing infrastructure for gene-level expression estimation and variant calling.
The FX tool uses a web-based interface to take advantage of the cloud computing
infrastructure. Myrna [16] is another elegant tool that calculates gene expression
from large datasets. The tool uses short read alignment and combines it with
interval calculation, normalization and aggregation, and statistical modeling to
reduce execution time. Once reads are aligned, gene exon coverage is calculated
and differential expression is tested using both parametric and nonparametric
permutation tests. Both Amazon’s Elastic MapReduce and Hadoop can be used for
performing these operations.
PeakRanger [17] is software that has been developed for the chromatin immuno-
precipitation sequencing (ChIP-seq) technique. This technique is related to the
next-generation sequencing (NGS) platform and has the capability of studying the
interactions between proteins and DNA. PeakRanger is a peak caller package that
can be executed in a parallel cloud computing environment to get tremendously
high performance on huge datasets.
Cloud4SNP [23] is a novel tool based on cloud computing infrastructure. The
tool can preprocess pharmacogenomics SNP microarray data and can also perform
statistical analysis. Statistical tests are run on partitions of the data
sets distributed across cloud virtual servers. Additionally, different types of
statistical corrections, such as false discovery rate and Bonferroni correction,
can be done on the cloud
in parallel, permitting users to select different statistical models according to
their requirements.
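As a rough illustration of the two statistical corrections named above, the following Python sketch adjusts a list of p-values with the Bonferroni correction and with a false discovery rate procedure. The source does not say which FDR procedure Cloud4SNP uses; the Benjamini-Hochberg step-up procedure is assumed here, and the function names and sample p-values are hypothetical. The sketch is serial, whereas Cloud4SNP is described as running such corrections in parallel.

```python
def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of tests,
    capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (one FDR procedure, assumed here).

    Walks the p-values from largest to smallest, scaling the p-value of
    rank r by m / r and enforcing monotonicity with a running minimum.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four SNP association tests.
p_values = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print(bonf)
print(bh)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; the FDR adjustment is usually preferred when many SNPs are tested at once.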

21.3.2 Bioinformatics Platforms Deployed as PaaS

Nowadays Galaxy Cloud is one of the most used PaaS-based platforms for
bioinformatics applications; it makes the Galaxy platform available on the cloud
to analyze large-scale data. Every user can run a private Galaxy
installation on the cloud, without sharing resources with other users. Using Galaxy
Cloud, users have full control over customizing software resources, deploying
them, and launching instances. Galaxy Cloud has another advantage over
other platforms, which is portability: data as well as results can be moved from
one cloud to another. This portability makes Galaxy Cloud an ideal choice.
Another publicly available Galaxy Cloud platform is CloudMan [24]. It is part of
the Amazon Web Services cloud, yet it is compatible with Eucalyptus and other
clouds [25]. CloudMan provides the facility of deployment and customization, as
well as sharing the entire environment of analysis framework like data, tool, and
configuration.
In [26] the researchers developed Eoulsan, a scalable modular framework based
on the Hadoop cloud infrastructure and the MapReduce algorithm for the analy-
sis of high-throughput sequences. The Eoulsan framework is rapidly scalable and
provides facilities to generate clusters that further help in the analysis of several
samples at once by using various software solutions available. The Eoulsan frame-
work is implemented in Java and is supported on Linux systems. The framework
is distributed under the LGPL license.

21.3.3 Bioinformatics Tools Deployed as IaaS

Cloud computing also provides infrastructure to be used as a service. Bionimbus
was introduced as an IaaS platform to handle the processing of genomic and
phenotypic data [27]. It uses OpenStack, which provisions virtual machines
on demand to acquire computational resources.
Bionimbus uses the GlusterFS file system in its clusters, which makes it
a good candidate for applications with many read/write operations. Its other
key components are Tukey, which works as a portal together with its associated
middleware, and Yates, which automates the installation, configuration, and
maintenance of the software infrastructure. Table 21.1
shows the summary of the bioinformatics applications utilizing different service
models, including IaaS.

Table 21.1 A summary of existing cloud computing based major bioinformatics
applications.

Project name    Service model  Task                                      Reference
STORMSeq        SaaS           Genome sequencing                         [3]
CloudBurst      SaaS           Short read aligner for whole genome       [4]
CloudAligner    SaaS           Short read aligner for whole genome       [13]
Crossbow        SaaS           Genome sequencing                         [22]
VAT             SaaS           Genome variant annotation tool            [14]
FX              SaaS           RNA sequencing tool                       [15]
Myrna           SaaS           RNA sequencing tool                       [16]
PeakRanger      SaaS           ChIP sequencing tool                      [17]
Cloud4SNP       SaaS           SNP analysis tool                         [23]
CloudMan        PaaS           Galaxy-based framework for                [25]
                               bioinformatics
Eoulsan         PaaS           High-throughput sequencing analysis       [26]
                               framework
Bionimbus       IaaS           A cloud-based infrastructure for genome   [27]
                               database sharing, analysis, and
                               management
CloVR           IaaS           Automatic and portable virtual machine    [28]
                               for microbial sequences
CloudBioLinux   IaaS           Provides resources for genome analysis    [29]

Cloud Virtual Resource, named CloVR [28], is a novel desktop-based software
application that utilizes cloud computing resources to handle the analysis of
genomic data automatically. CloVR has portable virtual machines for metagenomics,
microbial genomics, and the whole genome. The CloVR virtual machine can be
run on PCs. It utilizes local resources and has minimal installation requirements,
although it also supports using remote cloud computing resources
for large-scale sequence processing, which improves the performance.
Another IaaS example is Cloud BioLinux [29], which is a virtual machine that
is publicly available for the worldwide research community, aimed at expediting
bioinformatics research. The BioLinux platform has both command line and
graphical user interfaces for a variety of preconfigured software applications,
along with documentation. There are more than 100 state-of-the-art bioinfor-
matics packages to perform sequence operations, including sequence alignment,
clustering of sequences, sequence assembly, sequence feature visualization,
customization, and phylogenetic analysis. In addition, BioLinux is integrated
with Amazon’s EC2 cloud, through the Eucalyptus cloud platform. Users have
the convenience of accessing the desired bioinformatics tools as well as scalable
cloud resources by using the EC2 cloud on a local machine.
The difficulties encountered by bioinformatics researchers in carrying out
their research in a cost-effective and fast manner are resolved, to some extent, with
the help of the cloud computing services described earlier. However, there are
growing privacy concerns, longer delays, and jitter, and on top of that researchers
need to port their data and algorithms to the cloud environment in order to
perform analysis and retrieve results. Recently, the emerging trend of on-the-fly
solutions for bioinformatics problems, as well as the demand for more and more
real-time applications, requires reducing the distance between computing
platforms and users’ data by bringing the computational resources closer to the
users. Fog computing, as an extension of cloud computing, brings the services
to the edge of the network. In the next section we give a thorough introduction
to fog computing and highlight its appropriateness for the bioinformatics
community.

21.4 Fog Computing

Presently, the Internet of Things architecture has the ability to connect physical
devices, i.e. things, to analytics and machine learning (ML) applications. These
applications make decisions without human intervention from the data generated
by these devices [30]. The fundamental parameters on which Internet of
Things (IoT) performance is measured are fast data processing, scalability in
analytics, and quick response.
Currently, centralized cloud-based architectures are facing problems with meeting
these requirements. Thus, placing efficient, logical processing points between
the data source and the cloud has been proposed. This paradigm is known as
fog computing [5, 6]. The main aim of this decentralized model is to bring devices
and software to the edge of the network where the data is being produced. The key
purpose of fog computing is to reduce the volume of data that is transferred to
cloud data centers for processing and analysis. It also improves security, a main
concern in the IoT industry [5].
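One simple way a fog node could reduce the volume of data sent to the cloud is to forward per-window summaries instead of every raw reading. The Python sketch below shows that idea under stated assumptions: the window size, the summary fields, and the sample readings are hypothetical and are not taken from the chapter.

```python
from statistics import mean

def summarize_window(readings):
    """Collapse one window of raw readings into a small summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }

def fog_node(stream, window_size):
    """Yield one summary per window instead of forwarding every reading.

    A trailing partial window is also summarized so that no data is dropped.
    """
    window = []
    for value in stream:
        window.append(value)
        if len(window) == window_size:
            yield summarize_window(window)
            window = []
    if window:
        yield summarize_window(window)

# Eight hypothetical temperature readings arriving at the edge.
raw = [36.5, 36.6, 36.4, 36.7, 38.9, 39.1, 39.0, 38.8]
summaries = list(fog_node(raw, window_size=4))
print(len(raw), "readings reduced to", len(summaries), "messages")
```

With a window of four, eight readings become two messages; the cloud still sees the range and average of each window, while the raw values stay at the edge.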
The fog layer is like a junction point where there are sufficient amounts of net-
working, computing, and storage resources to handle the local ingestion of data,
which can be acquired quickly and produce quick results. In most circumstances,
low-power system-on-chip (SoC) devices are used, for the reason that they are
designed to maintain the trade-off between computing performance and power
consumption. On the other hand, cloud servers have the horsepower to
perform sophisticated analytics and machine learning jobs to integrate time series
formed by a number of heterogeneous or mixed types of things. Table 21.2 shows
the benefits of fog computing over cloud computing.

Table 21.2 Comparison between the cloud computing and fog computing paradigms
based on desirable properties.

Property                       Cloud computing             Fog computing
Latency                        High latency                Low latency
Security                       Usually undefined or        Can be defined as having
                               difficult to define         control of edge device
Delay jitter                   Multiple systems involved,  It has very low delay jitter
                               so high delay jitter
Location of service            Within the Internet         At the edge of the local
                                                           network
Distance between client        Usually multiple hops       One hop
and server
Attack on data en route        High probability            Low probability
Location awareness             No                          Yes
Geo-distribution               Centralized                 Distributed
Number of server nodes         Few                         Large number
Mobility support               Limited                     Supported
Type of last-mile connectivity Leased line                 Wireless

A reference architecture for fog computing can be seen in Figure 21.1. The concept
for the proposed architecture has been discussed in [6]. Generally, fog systems
use two kinds of programming models, i.e. sense-process-actuate and
stream-processing. After sensing some real-life phenomenon such as temperature or
heart rate, sensors send the stream data to IoT networks. The applications
running on fog devices subscribe to and process the received data. The
results obtained give insights that can be further translated into actions and
sent to actuators. The proposed architecture has a layer where fog systems can
dynamically discover and use application programming interfaces (APIs) in order
to build complex functionalities.
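The sense-process-actuate model described above can be sketched as a small loop. The Python code below is an illustration only: the heart-rate thresholds, the sensor identifiers, and the function names are hypothetical, and a list stands in for both the sensor stream and the actuator.

```python
def sense(samples):
    """Stand-in for a sensor stream: yields (sensor_id, heart_rate) readings."""
    yield from samples

def process(reading, low=50, high=120):
    """Turn one reading into an action, or None when nothing needs to be done.

    The thresholds are illustrative assumptions, not clinical guidance.
    """
    sensor_id, bpm = reading
    if bpm < low:
        return (sensor_id, "alert_low")
    if bpm > high:
        return (sensor_id, "alert_high")
    return None

def actuate(action, log):
    """Stand-in for an actuator: records the action instead of driving hardware."""
    log.append(action)

def run(samples):
    """Sense, process, and actuate for every reading in the stream."""
    log = []
    for reading in sense(samples):
        action = process(reading)
        if action is not None:
            actuate(action, log)
    return log

actions = run([("s1", 72), ("s2", 131), ("s1", 45), ("s2", 90)])
print(actions)
```

Because the processing step runs on the fog device, only readings that need a reaction produce any downstream traffic.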
The resource monitoring service provides information at the resource-
management layer in order to track the state of available cloud, fog, and network
resources, and thus can be helpful in identifying the best candidates for processing
the incoming tasks. The resource-management components can prioritize the
tasks by using the multitenant applications. In order to communicate between
the edge and cloud resources, the machine-to-machine (M2M) standards such as
Message Queuing Telemetry Transport (MQTT) and the Constrained Application
Protocol (CoAP) are used. Heterogeneous fog networks can be managed
efficiently by using software-defined networking (SDN).

[Figure 21.1 layers: Programming Models (Sense–Process–Actuate, Stream
Processing); API and Service Management (API discovery, API composition,
Authorization and authentication); Resource Management (Multitenant resource
management, Resource scheduling, Monitoring and profiling); Edge and Cloud
Services (Software-defined networking, Machine-to-machine networking). Other
labels in the figure: Sequencing Data, Raw data, Pathogens-disease relationships.]

Figure 21.1 Fog computing architecture tailored for bioinformatics sequencing data.
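The role of the resource monitoring service, namely identifying the best candidate for an incoming task, can be sketched as a simple selection rule. The Python code below assumes a policy that is not stated in the chapter: among nodes with enough free CPU and acceptable latency, pick the one with the lowest latency. The `Node` fields, node names, and numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A snapshot of one resource, as a monitoring service might report it."""
    name: str
    tier: str          # "fog" or "cloud"
    free_cpu: float    # available CPU cores
    latency_ms: float  # measured round-trip latency to the data source

def pick_node(nodes, cpu_needed, max_latency_ms):
    """Return the lowest-latency node that satisfies the task, or None."""
    eligible = [
        n for n in nodes
        if n.free_cpu >= cpu_needed and n.latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda n: n.latency_ms)

# Hypothetical monitoring snapshot: two fog nodes and one cloud data center.
nodes = [
    Node("fog-1", "fog", free_cpu=2, latency_ms=5),
    Node("fog-2", "fog", free_cpu=8, latency_ms=8),
    Node("cloud-1", "cloud", free_cpu=64, latency_ms=120),
]

small_task = pick_node(nodes, cpu_needed=4, max_latency_ms=50)
large_task = pick_node(nodes, cpu_needed=16, max_latency_ms=500)
print(small_task.name, large_task.name)
```

A latency-sensitive task that fits on a fog node stays at the edge, while a task that exceeds fog capacity falls back to the cloud if its latency budget allows.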

21.5 Fog Computing for Bioinformatics Applications

In comparison to cloud computing, fog computing provides better guarantees of
quality of service (QoS) for latency-sensitive applications along with provisioning
of improved security and privacy. Many compute-intensive areas, in particular
bioinformatics (having high storage and computing requirements), can take
advantage of this new paradigm. For example, one of the objectives of
bioinformatics research (specifically the Human Genome Project) is to make personalized as
well as preventive medicines [9–11]. An important problem for many algorithms
aimed at making personalized and preventive medicines is to search for similar
sequences (also called homolog sequences) when given a set of uncharacterized
query sequences. Homology-based searches require computationally intensive
algorithms (e.g. FASTA or BLAST [31]) as well as large sequence databases
(for example, the NCBI’s database for proteins). To solve these problems
using the fog computing paradigm, the relevant data can be stored locally;
additionally, the algorithms can be run in the vicinity of users, thus providing a
significant boost to achieving the aim of personalized medicines. Once achieved,
prescribing the most fitting medicines to patients, in accordance with their genetic
blueprints, will be possible. Thus, personalized medicines will ultimately improve
the presently unsatisfactory state of drug safety and efficacy, which is the
primary contributor to the soaring burden on society (including loss of life)
from adverse drug reactions (ADRs).
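To make the idea of running a homology search next to locally stored data concrete, here is a minimal Python sketch. It does not implement FASTA or BLAST, which are heuristic tools; it substitutes an exhaustive Smith-Waterman local alignment score, a slower but simpler technique. The scoring parameters, the `local_db` dictionary, and the toy protein sequences are all assumptions for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    Keeps only two rows of the dynamic-programming table at a time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def best_hit(query, database):
    """Name of the database sequence with the highest local alignment score."""
    return max(database, key=lambda name: smith_waterman_score(query, database[name]))

# Hypothetical sequence database held on the fog node.
local_db = {
    "protA": "MKTAYIAKQR",
    "protB": "GGSGGSGGSG",
    "protC": "MKTAYLAKQR",
}
query = "TAYIAK"

hit = best_hit(query, local_db)
print(hit, smith_waterman_score(query, local_db[hit]))
```

Because both the database and the scoring run on the same node, the query never leaves the local network; a production system would use an indexed heuristic search rather than scoring every sequence exhaustively.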
Another example of using the fog computing platform would be to enhance our
understanding of complex biological organisms, with the use of computer simula-
tions, as completely engineered systems. Biological cells have many components
that work together to form a living cell, including genes, mRNAs, tRNAs, miRNAs,
proteins, etc. Computer simulations are often performed to understand the behav-
ior of such systems. However, due to the large number of components and inher-
ent complexity that emerge from the interaction of such components, we require
larger platforms with high computational capabilities to simulate and understand
the working of such organisms. Fog computing, due to provisioning of a scalable
and reliable computing platform close to the users, is an appropriate choice to
simulate such systems. Researchers worldwide working on simulating portions of
cells, e.g. cancer pathways, metabolic pathways, signaling pathways, etc., can use
the service-oriented architecture of fog computing to solve their problems as well
as to share their findings.
In general, the use of fog computing to solve bioinformatics problems will pave
ways for improved medical care, safer pharmacotherapy, and better health, as can
be seen in Figure 21.2. In the following section we describe a practical use case,
i.e. a real-time microorganism detection system, for which fog computing infras-
tructure can potentially be utilized to speed the pace of bioinformatics research
for microorganism detection.

[Figure 21.2 labels: Cloud (Data Centers to Process High-Throughput Genomic
Data); Fog (Reliable and Scalable Network of Edge Devices); BioApps;
Bioinformatics Open Research Problems; Targeted Applications (Functional
Genomics, Personalized Medicine, Better Therapeutic Interventions).]

Figure 21.2 Fog computing architecture for bioinformatics applications.


21.5.1 Real-Time Microorganism Detection System

Microorganisms or microbes are microscopic organisms that surround us every-
where, i.e. in water, in soil, in air, and even in extreme conditions where normal
life cannot prosper. These organisms live together and efficiently interact with
each other for different purposes; these intricate interactions form complex enti-
ties called microbial communities. These tiny communities play a crucial role in
keeping their eukaryotic hosts healthy as well as in the cycling of important life
elements, such as carbon, phosphorus, nitrogen, etc. However, despite their bio-
logical importance, the factors that contribute to the functioning of microbial com-
munities and their relationship with environmental changes are not very well
understood. It is not yet clear how these microorganisms, along with environmen-
tal factors, perform very particular functions that have staggering complexities.
More recently, microorganism research has been sped up by advances in
sequencing technologies. In particular, the genomes of microbes can be directly
sequenced in real time by using the samples taken from different environments.
Undeniably, with the dawn of metagenomics, microbe communities can be
understood in previously unimagined ways, in reference to their genetic, taxo-
nomical, structural, and functional relationships, both within and across microbe
communities.
Nowadays, different types of bacteria in different environments can be
monitored using portable genome sequencing technologies; one such example is
the Oxford Nanopore MinION device [32]. These devices can be utilized in the
pharmaceutical industry to raise alarms when candidate microbial pathogens are
found in the environment. They can also be used as bacterial monitoring devices
for air filtering in different environments, such as the food industry and
hospitals [33, 34]. In addition, such devices can monitor microbe populations
inside canals, lakes, rivers, and seas [35]. There are also applications that
monitor bacteria populations inside cultivated soil and greenhouses [36, 37]. An
interesting example is related to animals, where such devices can be used to
detect susceptibility to different diseases using samples taken from different
animal environments [38–40].
In almost all the applications discussed so far, the sequencing analysis of
microbes is conducted manually, as these are spot analyses. With the emergence
of myriad applications for microorganism detection, detection systems now need
to be deployed in different domains. Because such devices run in a local domain,
where they generate a lot of sequencing data, real-time analysis is a daunting
task. The computationally intensive operations common to almost all detection
systems are mainly the following: (1) base calling, which identifies the genomic
sequence by interpreting the sequencer's signals, and (2) bacteria
identification, which classifies the type of bacteria based on the sequence
information.
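
To make the second operation more concrete, the following is a minimal Python
sketch of bacteria identification, assuming that base calling has already
produced a read as a plain string. It uses a simple shared k-mer score, which is
an illustrative stand-in and not the method used by any particular detection
system; the function names, the reference sequences, and the `min_fraction`
threshold are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Precompute the k-mer set of every reference genome (name -> sequence)."""
    return {name: kmers(seq, k) for name, seq in references.items()}


def identify(read, index, k, min_fraction=0.5):
    """Return the reference sharing the largest fraction of the read's k-mers.

    Returns None when the read is shorter than k or when no reference reaches
    min_fraction (an assumed, tunable threshold).
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best_name, best_score = None, 0.0
    for name, ref_kmers in index.items():
        score = len(read_kmers & ref_kmers) / len(read_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_fraction else None


# Toy reference sequences, invented purely for illustration.
references = {
    "species_a": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "species_b": "TTTTGGGGCCCCAAAATTGGCCAATT",
}
index = build_index(references, k=4)
label = identify("CGTTAGCCGATAG", index, k=4)
```

In a real deployment the reference index would be built once on the fog node,
so that each incoming read only requires set intersections rather than a full
alignment.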

[Figure 21.3 labels: cloud for exhaustive analysis; fog sequence intersection;
microbe colonies in environments.]

Figure 21.3 Fog computing for real-time microorganism detection.

To process the microbe sequencing data in real time without delay, the ideal
choice is a computing paradigm that provisions storage and computing power close
to the sequencing machines. Fog computing has these properties and can manage
such data locally. The overall theme for the use of such a system is presented
in Figure 21.3. The use of the fog paradigm will reduce the amount of data
ported to the cloud's data centers, thus intrinsically reducing network delay
overheads as well as privacy issues. Only significant results will be sent to
the cloud for further processing.
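
The idea of keeping raw data on the fog node and forwarding only significant
results can be sketched as follows. This is a minimal illustration under stated
assumptions: the per-read labels are taken as given, the watchlist and the
`min_reads` threshold are hypothetical, and the actual upload to the cloud is
left out.

```python
from collections import Counter


def summarize_locally(labels):
    """Count identified taxa on the fog node; None marks unclassified reads."""
    return Counter(label for label in labels if label is not None)


def select_for_cloud(counts, watchlist, min_reads=3):
    """Keep only watchlisted taxa seen in at least min_reads reads.

    Everything else stays on the fog node, so only a small summary
    (not the raw sequencing data) would travel over the network.
    """
    return {
        taxon: n
        for taxon, n in counts.items()
        if taxon in watchlist and n >= min_reads
    }


# Invented per-read labels, as a local classifier might emit them.
labels = ["taxon_a", "taxon_a", "taxon_a", "taxon_b", None,
          "taxon_c", "taxon_c", "taxon_c", "taxon_c"]
counts = summarize_locally(labels)
payload = select_for_cloud(counts, watchlist={"taxon_a", "taxon_b"})
```

Here nine reads are reduced to a single dictionary entry before anything leaves
the local domain, which is the source of the reduced network and privacy
overheads described above.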
There is another aspect of fog computing that makes it suitable for real-time
microorganism detection. Oxford Nanopore MinION devices have an interesting
feature from a computational point of view: they stream sequences as soon as
they are produced, rather than sending the data in bulk to the other end for
processing. This type of real-time streaming makes it possible to analyze the
data as soon as the sequencing results become available. This real-time
streaming aspect of Oxford Nanopore MinION technology, when used alongside the
benefits of fog computing, can result in quick identification of bacteria
samples. As reported by some authors, bacteria and their antimicrobial
resistance can be identified within 5–10 minutes [41], compared to much longer
times using other platforms.
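
The benefit of streaming over bulk transfer can be illustrated with a short
Python sketch that consumes reads one at a time and stops as soon as enough
evidence has accumulated. It does not use any real sequencer API: the read
stream is an ordinary iterator, and the `classify` function, the marker motif,
and the `min_hits` threshold are assumptions made for illustration.

```python
def stream_identify(read_stream, classify, target, min_hits=3):
    """Consume reads as they arrive and stop once the target is confirmed.

    Returns the number of reads consumed when min_hits reads have been
    classified as the target, or None if the stream ends first.
    """
    hits = 0
    for n_reads, read in enumerate(read_stream, start=1):
        if classify(read) == target:
            hits += 1
            if hits >= min_hits:
                return n_reads
    return None


def classify(read):
    """Toy classifier: a read is a 'pathogen' if it contains an invented motif."""
    return "pathogen" if "GATTACA" in read else None


# An iterator stands in for the device's live stream of base-called reads.
reads = iter(["AAGATTACA", "CCCC", "GATTACAGG", "TTGATTACA", "GGGG", "GATTACA"])
consumed = stream_identify(reads, classify, target="pathogen")
remaining = list(reads)
```

Because the loop returns at the third hit, the last two reads are never pulled
from the stream. With bulk transfer, the analysis could not begin until the
whole batch had arrived.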
Similar work is reported in [42], where the authors used the fog computing
model to design a real-time system utilizing a network of MinION devices to
generate sequencing data. This system showed the integration of MinION and
SoC devices to produce and intelligently process the raw sequences in the fog
environment. The system, although in a constrained environment, was able to
process the data stream (i.e. sequences) and perform the base calling as well as
the identification of bacteria in real time using the fog computing paradigm.
The system was able to successfully raise alarms when placed in environments
with abnormal microbe populations.
Although a small amount of work has been done in the area of microorganism
detection using fog computing, it remains an open research area. The metage-
nomics approach provides a lot of opportunities, but there are still many questions
and challenges to be answered and solved by the fusion of bioinformatics tools and
technologies alongside the benefits of the fog computing platform.

21.6 Conclusion
Bioinformatics is a data-rich field; many high-throughput technologies in the
area of bioinformatics, such as next-generation sequencing, result in a deluge of
sequencing data. Today, we have full genome datasets of different species readily
available and many more are being sequenced. These genomic sequences are of
the utmost importance in understanding the working of biological organisms and
have myriad applications in our daily life. Processing this huge amount of data
with conventional methods is a time-consuming task. In addition, if the data is
large, complex, and comes from heterogeneous sources, then processing it becomes
even more tedious and daunting. Analysis of such data might take hours or days
to produce results and has caused current cloud computing paradigms to face
numerous challenges (e.g. network overheads and data security, including data
provenance and data privacy).
To overcome the limitations of cloud computing, many researchers have proposed
multiple paradigms with the unified aim of deploying resources close to the
edge of the network. Among the proposed paradigms, fog computing is widely used
by researchers worldwide for scalable resource provisioning. Fog computing uses
the cloud at the back end while extending the cloud-to-things continuum by
bringing resources close to the edge devices, thus overcoming many limitations
of the cloud computing paradigm. Based on the desirable properties of fog
computing, such as low jitter, low latency, and improved security, we argue
that the fog computing paradigm has great potential for data- and
resource-intensive bioinformatics applications.

References

1 Dai, L., Gao, X., Guo, Y. et al. (2012). Bioinformatics clouds for big data
manipulation. Biology Direct 7 (1): 43.
2 Zhang, L., Gu, S., Liu, Y. et al. (2011). Gene set analysis in the cloud.
Bioinformatics 28 (2): 294–295.
3 Karczewski, K.J., Fernald, G.H., Martin, A.R. et al. (2014). STORMSeq: an
open-source, user-friendly pipeline for processing personal genomics data in
the cloud. PLoS One 9 (1): e84860.
4 Schatz, M.C. (2009). CloudBurst: highly sensitive read mapping with MapRe-
duce. Bioinformatics 25 (11): 1363–1369.
5 Bonomi, F., Milito, R., Zhu, J., and Addepalli, S. (2012). Fog computing and its
role in the Internet of Things. In: Proceedings of the First Edition of the MCC
Workshop on Mobile Cloud Computing, 13–16. ACM.
6 Dastjerdi, A.V. and Buyya, R. (2016). Fog computing: helping the Internet of
Things realize its potential. Computer 49 (8): 112–116.
7 Anaparthy, N., Ho, Y.J., Martelotto, L. et al. (2019). Single-cell applications
of next-generation sequencing. Cold Spring Harbor Perspectives in Medicine:
a026898. https://doi.org/10.1101/cshperspect.a026898.
8 Romanel, A. (2019). Allele-specific expression analysis in cancer using
next-generation sequencing data. In: Cancer Bioinformatics. Methods in Molec-
ular Biology, vol. 1878 (ed. A. Krasnitz), 125–137. New York, NY: Humana
Press.
9 Rehman, H.U., Benso, A., Di Carlo, S. et al. (2012). Combining homolog and
motif similarity data with gene ontology relationships for protein function
prediction. In: 2012 IEEE International Conference on Bioinformatics and
Biomedicine, 1–4. IEEE.
10 Benso, A., Di Carlo, S., Rehman, H.U. et al. (2013). Accounting for
post-transcriptional regulation in Boolean Networks based regulatory models.
In: International Work-Conference on Bioinformatics and Biomedical Engineer-
ing, 397–404. IWBBIO.
11 Benso, A., Di Carlo, S., Politano, G., and Savino, A. (2012). Using genome wide
data for protein function prediction by exploiting gene ontology relationships.
In: Proceedings of 2012 IEEE International Conference on Automation, Quality
and Testing, Robotics, 497–502. IEEE.
12 Carter, M.D., Gaston, D., Huang, W.Y. et al. (2018). Genetic profiles of differ-
ent subsets of Merkel cell carcinoma show links between combined and pure
MCPyV-negative tumors. Human Pathology 71: 117–125.
13 Nguyen, T., Shi, W., and Ruden, D. (2011). CloudAligner: a fast and
full-featured MapReduce based tool for sequence mapping. BMC Research
Notes 4 (1): 171.
14 Habegger, L., Balasubramanian, S., Chen, D.Z. et al. (2012). VAT: a computa-
tional framework to functionally annotate variants in personal genomes within
a cloud-computing environment. Bioinformatics 28 (17): 2267–2269.
15 Hong, D., Rhie, A., Park, S.S. et al. (2012). FX: an RNA-Seq analysis tool on
the cloud. Bioinformatics 28 (5): 721–723.
16 Langmead, B., Hansen, K.D., and Leek, J.T. (2010). Cloud-scale
RNA-sequencing differential expression analysis with Myrna. Genome Biology
11 (8): R83.
17 Feng, X., Grossman, R., and Stein, L. (2011). PeakRanger: a cloud-enabled
peak caller for ChIP-seq data. BMC Bioinformatics 12 (1): 139.
18 Mell, P. and Grance, T. (2011). The NIST definition of cloud computing.
Special Publication SP 800-145. National Institute of Standards and
Technology, Computer Security Division, Information Technology Laboratory.
doi: 10.6028/NIST.SP.800-145.
19 Fox, A., Griffith, R., Joseph, A. et al. (2009). Above the clouds: a Berkeley
view of cloud computing. Technical Report No. UCB/EECS-2009-28, Electrical
Engineering and Computer Sciences, University of California at Berkeley.
20 Vaquero, L.M., Rodero-Merino, L., Caceres, J., and Lindner, M. (2008). A
break in the clouds: towards a cloud definition. ACM SIGCOMM Computer
Communication Review 39 (1): 50–55.
21 Shakil, K.A. and Alam, M. (2018). Cloud computing in bioinformatics and
big data analytics: current status and future research. In: Big Data Analytics,
629–640. Singapore: Springer.
22 Langmead, B., Schatz, M.C., Lin, J. et al. (2009). Searching for SNPs with cloud
computing. Genome Biology 10 (11): R134.
23 Agapito, G., Cannataro, M., Guzzi, P.H. et al. (2013). Cloud4SNP: distributed
analysis of SNP microarray data on the cloud. In: Proceedings of the Interna-
tional Conference on Bioinformatics, Computational Biology and Biomedical
Informatics, 468. ACM.
24 Afgan, E., Chapman, B., and Taylor, J. (2012). CloudMan as a platform for
tool, data, and analysis distribution. BMC Bioinformatics 13 (1): 315.
25 Afgan, E., Baker, D., Coraor, N. et al. (2011). Harnessing cloud computing with
Galaxy Cloud. Nature Biotechnology 29 (11): 972.
26 Jourdren, L., Bernard, M., Dillies, M.A., and Le Crom, S. (2012). Eoulsan: a
cloud computing-based framework facilitating high throughput sequencing
analyses. Bioinformatics 28 (11): 1542–1543.
27 Heath, A.P., Greenway, M., Powell, R. et al. (2014). Bionimbus: a cloud for
managing, analyzing and sharing large genomics datasets. Journal of the
American Medical Informatics Association 21 (6): 969–975.
28 Angiuoli, S.V., Matalka, M., Gussman, A. et al. (2011). CloVR: a virtual
machine for automated and portable sequence analysis from the desktop
using cloud computing. BMC Bioinformatics 12 (1): 356.
29 Krampis, K., Booth, T., Chapman, B. et al. (2012). Cloud BioLinux:
pre-configured and on-demand bioinformatics computing for the genomics
community. BMC Bioinformatics 13 (1): 42.
30 Maksimović, M. and Vujović, V. (2017). Internet of Things based e-health
systems: ideas, expectations and concerns. In: Handbook of Large-Scale Dis-
tributed Computing in Smart Healthcare, 241–280. Cham: Springer.
31 Altschul, S.F., Gish, W., Miller, W. et al. (1990). Basic local alignment search
tool. Journal of Molecular Biology 215 (3): 403–410.
32 Jain, M., Olsen, H.E., Paten, B., and Akeson, M. (2016). The Oxford nanopore
MinION: delivery of nanopore sequencing to the genomics community.
Genome Biology 17 (1): 239.
33 Alsan, M. and Klompas, M. (2010). Acinetobacter baumannii: an emerging and
important pathogen. Journal of Clinical Outcomes Management: JCOM 17 (8):
363.
34 Peleg, A.Y. and Hooper, D.C. (2010). Hospital-acquired infections due to
gram-negative bacteria. New England Journal of Medicine 362 (19): 1804–1813.
35 Tan, B., Ng, C.M., Nshimyimana, J.P. et al. (2015). Next-generation sequencing
(NGS) for assessment of microbial water quality: current progress, challenges,
and future opportunities. Frontiers in Microbiology 6: 1027.
36 Daniel, R. (2005). The metagenomics of soil. Nature Reviews Microbiology 3 (6):
470.
37 Castañeda, L.E. and Barbosa, O. (2017). Metagenomic analysis exploring tax-
onomic and functional diversity of soil microbial communities in Chilean
vineyards and surrounding native forests. PeerJ 5: e3098.
38 Edwards, A., Debbonaire, A.R., Sattler, B. et al. (2016). Extreme metagenomics
using nanopore DNA sequencing: a field report from Svalbard, 78 N. Preprint,
bioRxiv. https://doi.org/10.1101/073965.
39 Castro-Wallace, S.L., Chiu, C.Y., John, K.K. et al. (2017). Nanopore DNA
sequencing and genome assembly on the International Space Station. Scientific
Reports 7 (1): 18022.
40 McIntyre, A.B., Rizzardi, L., Angela, M.Y. et al. (2016). Nanopore sequencing
in microgravity. npj Microgravity 2: 16035.
41 Greninger, A.L., Naccache, S.N., Federman, S. et al. (2015). Rapid metage-
nomic identification of viral pathogens in clinical samples by real-time
nanopore sequencing analysis. Genome Medicine 7 (1): 99.
42 Merelli, I., Morganti, L., Corni, E. et al. (2018). Low-power portable devices
for metagenomics analysis: Fog computing makes bioinformatics ready for the
Internet of Things. Future Generation Computer Systems 88: 467–478.