Fog Computing (Use It in Some Application)
21
21.1 Introduction
Bioinformatics is an interdisciplinary field having a myriad of applications in the
area of health and life sciences. Bioinformatics algorithms generally have high
computational and storage requirements [1, 2]. Currently, bioinformatics
researchers tend to prefer cloud computing–based services to meet their high
computational requirements within a limited budget. However, scientists must
port their data and algorithms to the cloud environment in order to perform
analysis. Recently, due to the emerging trend of “on-the-fly solutions” for
bioinformatics problems, computational resources need to be brought closer to
the users [3–6]. Fog computing, as an extension of cloud computing, brings the
services to the edge of the network. This essentially brings the advantages and
power of the cloud closer to the place where the data is actually generated,
benefiting resource-demanding bioinformatics algorithms.
In the field of bioinformatics, high-throughput technologies such as
next-generation sequencing result in a diverse deluge of sequencing data [7, 8].
These raw sequences are called omics sequences, which include DNA sequences,
RNA sequences, and protein sequences, among others. The study of different
types of omics sequences has produced different areas of research; for example,
the study of gene sequences established the field of genomics, the study of
protein sequences resulted in the field of proteomics, and so on [9–11]. In each
subfield, scientists explore the role of omics sequences in understanding a
biological organism as a completely engineered system. In addition to
sequencing data, one of the promises of bioinformatics research, in general, is to
develop personalized medicine. The area that focuses on personalized medicine
Public Cloud. This type of cloud is publicly available. The customer can use the
hardware and software resources of a data center. These resources can be
acquired from a vendor organization and paid for by credit card. Amazon, Azure,
and Google Apps are common examples of public clouds.
21.3 Cloud Computing Applications in Bioinformatics 533
Private Cloud. This type of cloud can be used only by employees of a specific
organization. The software and hardware resources can be configured by users
and the cloud administrator according to the requirements of specific users.
Several open-source and commercial software packages are available on the
Internet to establish this type of cloud. OpenStack, OpenNebula, and VMware
Cloud are common examples of software used to create a private cloud.
Hybrid Cloud. This type of cloud is a combination of two clouds of different deliv-
ery models that are connected through specific technology in order to handle
data and application portability.
with Amazon's EC2 cloud, through the Eucalyptus cloud platform. Users have
the convenience of accessing the desired bioinformatics tools as well as scalable
cloud resources by using the EC2 cloud on a local machine.
The difficulties that bioinformatics researchers encounter in carrying out
their research in a cost-effective and fast manner are resolved, to some extent,
by the cloud computing services described earlier. However, there are growing
privacy concerns, longer delays, and jitter; on top of that, researchers
need to port their data and algorithms to the cloud environment in order to
perform analysis and retrieve results. Recently, the emerging trend of on-the-fly
solutions for bioinformatics problems, as well as the demand for more and more
real-time applications, requires reducing the distance between computing
platforms and users’ data by bringing the computational resources closer to the
users. Fog computing, as an extension of cloud computing, brings the services
to the edge of the network. In the next section we give a thorough introduction
to fog computing and highlight its appropriateness for the bioinformatics
community.
Table 21.2 Comparison between the cloud computing and fog computing paradigms
based on desirable properties.
Figure 21.1 Fog computing architecture tailored for bioinformatics sequencing data.
[Figure labels: Pathogens-disease relationships; Programming Models
(Sense–Process–Actuate, Stream Processing); Sequencing Data.]
[Further figure labels: Cloud for Exhaustive Analysis; Fog Sequence
Intersection; Microbes Colonies in Environments.]
To process the microbe sequencing data in real time, a computing
paradigm that provisions storage and computing power close to the sequencing
machines is the ideal choice. Fog computing has these properties and can
manage such data locally. The overall theme for the use of such
a system is presented in Figure 21.3. The use of the fog paradigm reduces the
amount of data that must be moved to the cloud’s data centers, thus reducing
network delay overheads as well as privacy issues. Only significant results are
sent to the cloud for further processing.
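The idea of keeping raw reads at the fog node and forwarding only significant results can be sketched as follows. This is a minimal illustration, not the system described in the chapter: the function name `summarize_on_fog`, the marker table, and the `min_hits` threshold are all assumptions introduced for the example, and exact substring matching stands in for a real classification method.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_on_fog(
    reads: Iterable[str],
    markers: Dict[str, str],
    min_hits: int = 3,
) -> Dict[str, int]:
    """Count marker hits per organism locally and keep only significant ones.

    `markers` maps a short marker sequence to an organism label (hypothetical
    data). A read contributes at most one hit per marker. Only organisms with
    at least `min_hits` hits are returned, so the raw reads never leave the
    fog node and the payload sent to the cloud stays small.
    """
    hits: Counter = Counter()
    for read in reads:
        for marker, organism in markers.items():
            if marker in read:
                hits[organism] += 1
    return {organism: n for organism, n in hits.items() if n >= min_hits}


if __name__ == "__main__":
    example_reads = ["ACGTACGT", "TTACGTAA", "GGGGCCCC", "ACGTTT"]
    example_markers = {"ACGT": "organism_A", "CCCC": "organism_B"}
    # Only this small summary would be forwarded to the cloud.
    print(summarize_on_fog(example_reads, example_markers, min_hits=2))
```

The design choice here is that the threshold decides what counts as "significant"; in practice that criterion would depend on the application and the sequencing error rate.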
There is another aspect of fog computing that makes it suitable for real-time
microorganism detection. Oxford Nanopore MinION devices have an interesting
feature from a computational point of view: they stream sequences as soon as
they are produced instead of sending the data in bulk to the other end
for processing. This type of real-time streaming makes instantaneous analysis of
data possible as soon as the sequencing results become available. This real-time
streaming aspect of Oxford Nanopore MinION technology, when used alongside
the benefits of fog computing, can result in quick identification of bacteria
samples. As reported by some authors, bacteria and the corresponding
antimicrobial resistance can be identified within 5–10 minutes [41], compared
to much longer times using other platforms.
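Why streaming shortens time to identification can be illustrated with a small sketch that consumes reads one at a time and stops as soon as one candidate dominates. This is an assumption-laden toy, not the method of [41]: the k-mer voting scheme, the function names, and the `min_reads` and `min_share` parameters are hypothetical, and a plain Python iterator stands in for the device's output stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def identify_streaming(
    read_stream: Iterable[str],
    reference_kmers: Dict[str, Set[str]],
    k: int = 4,
    min_reads: int = 3,
    min_share: float = 0.8,
) -> Tuple[Optional[str], int]:
    """Vote per read for the reference sharing the most k-mers with it.

    Returns (label, reads_consumed) as soon as one label holds at least
    `min_share` of all reads seen and at least `min_reads` reads have arrived.
    Returns (None, reads_consumed) if the stream ends without a confident call.
    """
    votes: Counter = Counter()
    seen = 0
    for read in read_stream:
        seen += 1
        read_kmers = set(kmers(read, k))
        best, best_overlap = None, 0
        for label, ref in reference_kmers.items():
            overlap = len(read_kmers & ref)
            if overlap > best_overlap:
                best, best_overlap = label, overlap
        if best is not None:
            votes[best] += 1
        if votes and seen >= min_reads:
            top, count = votes.most_common(1)[0]
            if count / seen >= min_share:
                return top, seen  # early stop: remaining reads are not needed
    return None, seen
```

Because the function returns in the middle of the stream, the answer is available before sequencing finishes, which is the property that bulk transfer to a remote data center cannot offer.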
Similar work is reported in [42], where the authors used the fog computing
model to design a real-time system utilizing a network of MinION devices to
generate sequencing data. This system showed the integration of MinION and
SoC devices to produce and intelligently process the raw sequences in the fog
environment. The system, although in a constrained environment, was able to
process the data stream (i.e. sequences) and perform base calling as well as the
identification of bacteria in real time using the fog computing paradigm. The
system was able to successfully raise alarms when placed in environments with
abnormal microbe populations.
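One simple way to decide that a microbe population is "abnormal" is to compare observed relative abundances with an expected baseline. The sketch below assumes this rule purely for illustration; the source does not state how the system in [42] raised its alarms, and the function name, the baseline values, and the `tolerance` parameter are hypothetical.

```python
from typing import Dict, List


def abnormal_taxa(
    observed: Dict[str, int],
    baseline: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return taxa whose observed share deviates from the baseline share.

    `observed` holds read counts per taxon from the fog node; `baseline`
    holds the expected fraction per taxon (assumed known for the site).
    A taxon is flagged when the absolute difference exceeds `tolerance`.
    """
    total = sum(observed.values())
    if total == 0:
        return []  # nothing sequenced yet, so nothing to flag
    flagged = []
    for taxon in sorted(set(observed) | set(baseline)):
        share = observed.get(taxon, 0) / total
        expected = baseline.get(taxon, 0.0)
        if abs(share - expected) > tolerance:
            flagged.append(taxon)
    return flagged


if __name__ == "__main__":
    counts = {"taxon_a": 70, "taxon_b": 30}
    expected = {"taxon_a": 0.30, "taxon_b": 0.65, "taxon_c": 0.05}
    print(abnormal_taxa(counts, expected))
```

A fog node could run such a check on every updated tally and notify the cloud only when the returned list is non-empty.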
Although a small amount of work has been done in the area of microorganism
detection using fog computing, it remains an open research area. The metage-
nomics approach provides a lot of opportunities, but there are still many questions
and challenges to be answered and solved by the fusion of bioinformatics tools and
technologies alongside the benefits of the fog computing platform.
21.6 Conclusion
Bioinformatics is a data-rich field; many high-throughput technologies in the
area of bioinformatics, such as next-generation sequencing, result in a deluge of
sequencing data. Today, we have full genome datasets of different species readily
available and many more are being sequenced. These genomic sequences are of
the utmost importance in understanding the working of biological organisms and
have myriad applications in our daily life. Processing this huge amount of data
with conventional methods is a time-consuming task. In addition, if the data is
large, complex, and comes from heterogeneous sources, then processing it
becomes more tedious and daunting. Analysis of such data might take hours
or days to produce results, and has caused current cloud computing paradigms to
face numerous challenges (e.g. network overheads and data security, including
data provenance and data privacy).
To overcome the limitations of cloud computing, researchers have proposed
multiple paradigms with the unified aim of deploying resources close
to the edge of the network. Among the proposed paradigms, fog computing is
widely used by researchers for scalable resource provisioning. Fog computing
uses the cloud at the back end while extending the cloud-to-things continuum
by bringing resources close to edge devices, thus overcoming many limitations
of the cloud computing paradigm. Based on the desirable properties of fog
computing, such as low jitter, low latency, and improved security, we argue that
the fog computing paradigm has great potential for data- and resource-intensive
bioinformatics applications.
References
1 Dai, L., Gao, X., Guo, Y. et al. (2012). Bioinformatics clouds for big data
manipulation. Biology Direct 7 (1): 43.
544 21 Fog Computing for Bioinformatics Applications
2 Zhang, L., Gu, S., Liu, Y. et al. (2011). Gene set analysis in the cloud.
Bioinformatics 28 (2): 294–295.
3 Karczewski, K.J., Fernald, G.H., Martin, A.R. et al. (2014). STORMSeq: an
open-source, user-friendly pipeline for processing personal genomics data in
the cloud. PLoS One 9 (1): e84860.
4 Schatz, M.C. (2009). CloudBurst: highly sensitive read mapping with MapRe-
duce. Bioinformatics 25 (11): 1363–1369.
5 Bonomi, F., Milito, R., Zhu, J., and Addepalli, S. (2012). Fog computing and its
role in the Internet of Things. In: Proceedings of the First Edition of the MCC
Workshop on Mobile Cloud Computing, 13–16. ACM.
6 Dastjerdi, A.V. and Buyya, R. (2016). Fog computing: helping the Internet of
Things realize its potential. Computer 49 (8): 112–116.
7 Anaparthy, N., Ho, Y.J., Martelotto, L. et al. (2019). Single-cell applications
of next-generation sequencing. Cold Spring Harbor Perspectives in Medicine:
a026898. https://doi.org/10.1101/cshperspect.a026898.
8 Romanel, A. (2019). Allele-specific expression analysis in cancer using
next-generation sequencing data. In: Cancer Bioinformatics. Methods in Molec-
ular Biology, vol. 1878 (ed. A. Krasnitz), 125–137. New York, NY: Humana
Press.
9 Rehman, H.U., Benso, A., Di Carlo, S. et al. (2012). Combining homolog and
motif similarity data with gene ontology relationships for protein function
prediction. In: 2012 IEEE International Conference on Bioinformatics and
Biomedicine, 1–4. IEEE.
10 Benso, A., Di Carlo, S., Rehman, H.U. et al. (2013). Accounting for
post-transcriptional regulation in Boolean Networks based regulatory models.
In: International Work-Conference on Bioinformatics and Biomedical Engineer-
ing, 397–404. IWBBIO.
11 Benso, A., Di Carlo, S., Politano, G., and Savino, A. (2012). Using genome wide
data for protein function prediction by exploiting gene ontology relationships.
In: Proceedings of 2012 IEEE International Conference on Automation, Quality
and Testing, Robotics, 497–502. IEEE.
12 Carter, M.D., Gaston, D., Huang, W.Y. et al. (2018). Genetic profiles of differ-
ent subsets of Merkel cell carcinoma show links between combined and pure
MCPyV-negative tumors. Human Pathology 71: 117–125.
13 Nguyen, T., Shi, W., and Ruden, D. (2011). CloudAligner: a fast and
full-featured MapReduce based tool for sequence mapping. BMC Research
Notes 4 (1): 171.
14 Habegger, L., Balasubramanian, S., Chen, D.Z. et al. (2012). VAT: a computa-
tional framework to functionally annotate variants in personal genomes within
a cloud-computing environment. Bioinformatics 28 (17): 2267–2269.
15 Hong, D., Rhie, A., Park, S.S. et al. (2012). FX: an RNA-Seq analysis tool on
the cloud. Bioinformatics 28 (5): 721–723.
16 Langmead, B., Hansen, K.D., and Leek, J.T. (2010). Cloud-scale
RNA-sequencing differential expression analysis with Myrna. Genome Biology
11 (8): R83.
17 Feng, X., Grossman, R., and Stein, L. (2011). PeakRanger: a cloud-enabled
peak caller for ChIP-seq data. BMC Bioinformatics 12 (1): 139.
18 Mell, P. and Grance, T. (2011). The NIST definition of cloud computing.
Special Publication SP 800-145. National Institute of Standards and Technology,
Computer Security Division, Information Technology Laboratory.
https://doi.org/10.6028/NIST.SP.800-145.
19 Fox, A., Griffith, R., Joseph, A. et al. (2009). Above the clouds: a Berkeley
view of cloud computing. Technical Report No. UCB/EECS-2009-28. Electrical
Engineering and Computer Sciences, University of California at Berkeley.
20 Vaquero, L.M., Rodero-Merino, L., Caceres, J., and Lindner, M. (2008). A
break in the clouds: towards a cloud definition. ACM SIGCOMM Computer
Communication Review 39 (1): 50–55.
21 Shakil, K.A. and Alam, M. (2018). Cloud computing in bioinformatics and
big data analytics: current status and future research. In: Big Data Analytics,
629–640. Singapore: Springer.
22 Langmead, B., Schatz, M.C., Lin, J. et al. (2009). Searching for SNPs with cloud
computing. Genome Biology 10 (11): R134.
23 Agapito, G., Cannataro, M., Guzzi, P.H. et al. (2013). Cloud4SNP: distributed
analysis of SNP microarray data on the cloud. In: Proceedings of the Interna-
tional Conference on Bioinformatics, Computational Biology and Biomedical
Informatics, 468. ACM.
24 Afgan, E., Chapman, B., and Taylor, J. (2012). CloudMan as a platform for
tool, data, and analysis distribution. BMC Bioinformatics 13 (1): 315.
25 Afgan, E., Baker, D., Coraor, N. et al. (2011). Harnessing cloud computing with
Galaxy Cloud. Nature Biotechnology 29 (11): 972.
26 Jourdren, L., Bernard, M., Dillies, M.A., and Le Crom, S. (2012). Eoulsan: a
cloud computing-based framework facilitating high throughput sequencing
analyses. Bioinformatics 28 (11): 1542–1543.
27 Heath, A.P., Greenway, M., Powell, R. et al. (2014). Bionimbus: a cloud for
managing, analyzing and sharing large genomics datasets. Journal of the
American Medical Informatics Association 21 (6): 969–975.
28 Angiuoli, S.V., Matalka, M., Gussman, A. et al. (2011). CloVR: a virtual
machine for automated and portable sequence analysis from the desktop
using cloud computing. BMC Bioinformatics 12 (1): 356.