Epi Info

This document proposes a new cross-platform architecture for the Epi-Info software suite. The current Epi-Info Desktop only runs on Windows. The new architecture uses Electron as a wrapper for an AngularJS front-end, a Python analytics module, and an embedded NoSQL database called PouchDB. This allows the software to be compiled into executables that run natively on Linux, Mac and Windows from a single codebase. It simplifies development and allows for cross-platform form design, data collection, offline/online functionality and scalable storage. Initial results show performance is not significantly sacrificed and warrant further research.


Camp et al.

BMC Bioinformatics 2018, 19(Suppl 11):359


https://doi.org/10.1186/s12859-018-2334-8

SOFTWARE Open Access

A new cross-platform architecture for epi-info software suite
Blake Camp*, Jaya Krishna Mandivarapu, Nagashayan Ramamurthy, James Wingo,
Anu G. Bourgeois, Xiaojun Cao and Rajshekhar Sunderraman
From the 6th Workshop on Computational Advances in Molecular Epidemiology (CAME 2017)
Boston, MA, USA. 20 August 2017

Abstract
Background: The Epi-Info software suite, built and maintained by the Centers for Disease Control and Prevention
(CDC), is widely used by epidemiologists and public health researchers to collect and analyze public health data,
especially in the event of outbreaks such as Ebola and Zika. As it exists today, Epi-Info Desktop runs only on the
Windows platform, and the larger Epi-Info Suite of products consists of separate codebases for several different
devices and use-cases. Software portability has become increasingly important over the past few years as it offers a
number of obvious benefits. These include reduced development time, reduced cost, and simplified system
architecture. Thus, there is a clear need for continued research. Specifically, it is critical to fully understand any
underlying negative performance issues which arise from platform-agnostic systems. Such understanding should
allow for improved design, and thus result in substantial mitigation of reduced performance. In this paper, we present
a viable cross-platform architecture for Epi-Info which solves many of these problems.
Results: We have successfully generated executables for Linux, Mac, and Windows from a single code-base, and we
have shown that performance need not be completely sacrificed when building a cross-platform application. This has
been accomplished by using Electron as a wrapper for an AngularJS app, a Python analytics module, and a local,
browser-based NoSQL database.
Conclusions: Promising results warrant future research. Specifically, the design allows for cross-platform
form-design, data-collection, offline/online modes, scalable storage, automatic local-to-remote data sync, and fast
analytics which rival more traditional approaches.
Keywords: Cross-platform, Form-design, Analytics, Public-health, NoSQL, Electron, Data-collection

*Correspondence: [email protected]
Department of Computer Science, Georgia State University, 25 Park Place, Atlanta, GA, USA

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background
Developed by the Centers for Disease Control and Prevention (CDC) [1], Epi-Info is a software package
which enables public health workers to assess disease outbreaks, collect data, manage surveillance
data sets, and analyze data [2]. The Epi-Info software is widely used by epidemiologists and health
professionals in governments, public health non-profits, NGOs, universities and health schools (for
example, [3–5]). It is estimated that there are over one million users [2]. The desktop software has
undergone a number of revisions and is currently built upon the Windows operating system. While
popular, it has become clear that there are several areas in which the product needs to be improved
[6]. First, a particular deficiency of the system that needs to be addressed is the software's
inability to run on Linux or Mac. A tool that is truly capable of contributing to the international
community's fight against infectious diseases should support as many operating systems and devices
as possible. An open-source and cross-platform version of such a software package will allow
developers from around the world to access, design and enhance Epi-Info. Second, having been under
development for more than three decades, Epi-Info is now comprised of several
separate applications, codebases and use-cases including desktop, mobile, web, and cloud. This has
resulted in an unfortunate increase in development complexity. Outbreaks can often spread faster than
engineers can keep up. It is not uncommon for new analytics components or data-collection tools to be
requested by public health teams on the ground during highly active outbreak situations. This
on-the-fly requirements specification and engineering can be difficult to manage together with
complicated codebases. Third, the existing interfaces for offline data-collection and maintenance
protocols are not altogether intuitive. The processes for importing or broadcasting between remote
servers and local client machines may present steep learning curves for public health officials.

In this work, we propose and implement a new cross-platform architecture for the Epi-Info software
suite, which can simplify the codebases, expedite the development process and incorporate open-source
techniques for flexible interfaces. The proposed architecture adopts Electron [7] as the
cross-platform framework to achieve a significant reduction in development time and cost. Open-source
techniques in NoSQL and Python are also introduced into Epi-Info. NoSQL, as a viable database option,
can scale extremely well and provide a flexible structure to otherwise unstructured data. Python [8]
has emerged as a very popular language for data analytics and has become the coding language of
choice for many in the science community. Its robust statistical libraries and machine learning
frameworks make it a suitable choice for Epi-Info. In addition, the ease-of-use and platform
universality of Python can greatly reduce the development time of any new modules in the event of an
emergency or outbreak.

A Web-Based Form-Designer is not currently part of the existing Epi-Info suite, but such a product
has been needed for some time [2]. One challenge here is finding a balance between flexibility,
speed, and ease-of-use with respect to the form design process. We propose to use AngularJS [9]. Even
though AngularJS does not natively support drag & drop functionality, we develop a back-to-front
design methodology to ensure a user-friendly, yet effective form-designer.

Implementation
We present a cross-platform system architecture which allows for intuitive form-design,
data-collection, online and offline modes, automatic local-remote data sync, fast analytics, and
scalable storage. The overall system architecture is shown in Fig. 1.

The Epi-Info deployment consists of

• A server-side CouchDB database which stores shared form templates, data, and other user
  information such as dashboards etc. This data is exposed as RESTful Web Services to the clients, and
• Multiple clients, each equipped with an AngularJS application that provides all the functionality
  of Epi-Info including form design, deployment of forms, data collection, user dashboards, and
  on-demand analytics. Each client stores its data in PouchDB, which is automatically synchronized
  with the CouchDB on the server. The client also includes a Python/Flask module that provides access
  to a large set of analytics functionality. All of the client is encapsulated with Electron, a
  platform-independent application framework.

Client Side Architecture
To address the cross-platform system requirement, we use Electron as a wrapper for an AngularJS
front-end, a Python Analytics module, and an embedded NoSQL database called PouchDB.

The database accessibility protocol was an important design consideration. PouchDB is a lightweight,
browser-based NoSQL database which is designed to automatically sync with a remote Couch Database
(Fig. 2). However, the proper access point was not immediately obvious. As shown in Fig. 1, PouchDB
is accessed directly by the Angular front-end. Importantly, this configuration was chosen because the
alternative approach, whereby the database is accessed directly by the Python Analytics module, would
have required the use of a Python-PouchDB wrapper. The documentation for the wrapper is very light,
and it has been much easier to use the original APIs for database interaction. Any data needed by the
Python Analytics module can be requested and sent via a simple HTTP connection.

The Flask framework manages the Python code. When the Electron application is initiated, a child
process is spawned which starts the Flask server, allowing access to the Python analytics module.
This has proved successful and it has allowed us to seamlessly integrate Angular and Python in a
single, local application. When an analytics request is made, for example, the data is simply
re-routed to the appropriate Python function via the HTTP connection.

Python was chosen because of its popularity and platform-agnosticism. It is critical that researchers
from around the world be allowed to contribute to this project in a timely way. This can be
facilitated by offering a platform comprised of tools which are popular and universal.

NoSQL and PouchDB
NoSQL databases have been one of our primary areas of research to date. They are understood to scale
extremely well because they are well-suited to provide a flexible structure to otherwise unstructured
data. That fact has proven helpful when storing Epi-Form schemas. However,

Fig. 1 System Architecture - Local PouchDB clients sync automatically with a central CouchDB cloud server, allowing for seamless online-offline
transition
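
The online-offline transition in Fig. 1 can be illustrated with a toy model. The sketch below is only an assumption-laden illustration in plain Python: the class and method names (`CentralStore`, `LocalStore`, `go_online`) are hypothetical, and it does not reproduce the PouchDB or CouchDB replication protocol, which also handles revisions and conflicts.

```python
"""Toy model of offline-first sync (illustrative; not the CouchDB protocol)."""


class CentralStore:
    """Stands in for the central server-side database."""

    def __init__(self):
        self.docs = {}


class LocalStore:
    """Stands in for one client-side database that may be offline."""

    def __init__(self, central):
        self.central = central
        self.docs = {}
        self.pending = []   # ids changed locally since the last sync
        self.online = False

    def put(self, doc_id, doc):
        """Always write locally; push right away only when online."""
        self.docs[doc_id] = doc
        if doc_id not in self.pending:
            self.pending.append(doc_id)
        if self.online:
            self.sync()

    def sync(self):
        """Push queued changes, then pull documents missing locally.

        Updates made on other clients and conflict handling are out of
        scope for this sketch. Returns the number of documents pushed.
        """
        for doc_id in self.pending:
            self.central.docs[doc_id] = self.docs[doc_id]
        pushed = len(self.pending)
        self.pending.clear()
        for doc_id, doc in self.central.docs.items():
            self.docs.setdefault(doc_id, doc)
        return pushed

    def go_online(self):
        self.online = True
        return self.sync()
```

A record entered while offline stays in the local store and reaches the central store, and from there other clients, once `go_online()` is called.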

Fig. 2 Client Side Architecture - An AngularJS app, Python Analytics Module, and local PouchDB
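
The HTTP hand-off between the front-end and the Python analytics module shown in Fig. 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code: it uses the standard library's `http.server` in place of Flask so that it runs without dependencies, a `urllib` client plays the role of the AngularJS front-end, and the function names, the `/analytics` path and the JSON payload shape are all hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from statistics import mean


def column_mean(records, column):
    """Hypothetical analytics function: mean of one numeric column."""
    return mean(r[column] for r in records)


class AnalyticsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body posted by the front-end.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        result = {"mean": column_mean(payload["records"], payload["column"])}
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_server():
    """Serve on an OS-assigned free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), AnalyticsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_analytics(port, records, column):
    """Front-end side: POST JSON records and read back the JSON result."""
    data = json.dumps({"records": records, "column": column}).encode("utf-8")
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/analytics",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    # Bypass any system proxy settings for this localhost call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        return json.loads(resp.read())
```

In the architecture described in the text, the server would instead be a Flask app started as a child process of the Electron application; the request and response pattern is the part this sketch is meant to show.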

we have identified several other database-related issues which require careful consideration.

It was necessary to choose an appropriate candidate to be embedded with our Electron Application.
This is critical because larger NoSQL databases, like MongoDB, require different installation
protocols for different operating systems. Recall once again, our primary objective is to be a
platform-agnostic application that is extremely user-friendly and requires very little effort to
download and install. Thus, our approach has been to embed a lightweight NoSQL database within our
Electron desktop application.

After research, PouchDB was selected as the NoSQL database, and we consider it to be a viable option
going forward. Its robust documentation, community support, and seamless synchronization with CouchDB
make it very attractive. Furthermore, it is easy to embed, and can be interfaced directly with the
Angular frontend.

Specifically, PouchDB is designed to sync automatically with a remote Couch Database. This allows for
seamless transition between online and offline modes, and guards against the potential for data-loss
during transfer.

As a result of this auto-sync, any underlying changes to data on local client machines can be
automatically broadcast to a centralized remote database, and subsequently on to any additional
client machines. Furthermore, PouchDB provides a detailed change-log which identifies and explains
any alterations in local data or data-structure.

Analytics Module
Epi-Info is essentially data-collection and analytics software. Consequently, the analytics module is
perhaps the most crucial component, and the primary objective was to increase speed and efficiency.
In the following paragraphs, we outline an approach which successfully mitigates the negative effects
often found in cross-platform and NoSQL systems.

It was important to precisely identify each point of data-transfer and manipulation in order to
pinpoint any potential bottlenecks (Fig. 3, Table 1).

In order to expedite the analytics cycle, we demonstrate drastic performance improvement by storing
two copies of any particular dataset. We keep one copy in PouchDB, so that it may be available for
automatic syncing with the central CouchDB. We keep another in HDF5, a compressed binary format that
is well supported by Python libraries. Even on a slow machine, the read and write times for HDF5 are
extremely fast, better even than SQL or CSV. Additionally, the excellent compression ratio means that
even though we store the data twice, we increase the total storage-size requirement by less than 10%.

As shown in Table 1, the most costly processes, with respect to time, involve retrieving the data
from PouchDB, sending the data to the analytics module, and converting the data to a usable
DataFrame. This problem is exacerbated if the database is allowed to accumulate a lot of data prior
to carrying out these steps. Thus, it is possible to mitigate such effects by performing the
operations iteratively, whenever new data is entered into the database. The compressed HDF5 DataFrame
must be continuously maintained, allowing for immediate analytics requests at all times. Fortunately,
PouchDB comes equipped with a change-log which offers a detailed explanation of any changes to the
underlying data. This can be used to subsequently update the compressed DataFrame.

The result is a system that would allow for very fast access to data and analytics which rivals even
traditional approaches. Additional performance metrics are provided in the "Results" section of this
paper.

Form Designer
The challenge associated with building a web-based form designer is derived from a need to balance
flexibility with specificity. The current desktop form-designer provided by Epi-Info offers extreme
precision, allowing

Fig. 3 System Data-Flow - Each step requires time, see Table 1
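
The costly steps in this data flow (retrieval, transfer, conversion) need not be repeated for the whole dataset if changes are applied as they arrive. The following is a minimal sketch of that idea, assuming change entries shaped loosely like a database change feed; the keys `seq`, `id`, `deleted` and `doc` and the function name are assumptions for illustration, and a plain dictionary stands in for the compressed HDF5 DataFrame.

```python
def apply_changes(cache, changes, last_seq=0):
    """Apply change-feed entries newer than last_seq to an in-memory cache.

    cache    -- dict mapping record id to record (stand-in for the DataFrame)
    changes  -- iterable of dicts with assumed keys 'seq', 'id', and either
                'doc' (new or updated record) or 'deleted': True
    last_seq -- highest sequence number already applied

    Returns the highest sequence number applied, to be passed back in on
    the next call so that old entries are skipped.
    """
    for change in sorted(changes, key=lambda c: c["seq"]):
        if change["seq"] <= last_seq:
            continue  # already reflected in the cache
        if change.get("deleted"):
            cache.pop(change["id"], None)
        else:
            cache[change["id"]] = change["doc"]
        last_seq = change["seq"]
    return last_seq
```

Because only entries after `last_seq` are touched, the cost of each update depends on the number of new changes rather than on the size of the accumulated dataset.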



form-creators the ability to define form elements on a pixel-by-pixel basis. The form-schemas are
then stored as XML, and the exact positions of form elements are subsequently recorded. On the one
hand, this is desirable because health form appearance often requires such acute attention to detail.
On the other, this can cause a large increase in design time. With our web-based AngularJS form
designer, we strike a balance between the two characteristics, offering users an acceptable level of
precision while simultaneously expediting the form-design process with a flexible and intuitive
interface (Fig. 4).

Table 1 Approximate time-requirements for critical data-flow processes

     Process                                 Time (Data: 50k x 200, 300MB)
  1  Retrieve data from PouchDB              1 minute
  2  HTTP POST, send JSON data to Python     30 seconds
  3  Convert JSON to Pandas DataFrame        30–45 seconds
  4  Compress/Write JSON to HDF              <1 second (compresses to only 21MB)
  5  Analytics                               varies, but fast
  6  HTTP POST, return results to Angular    varies

Test Data: 50k records, by 200 features

Results
To date, we have successfully generated executables for Windows, Mac, and Linux machines from a
single codebase, and we have shown that performance need not be completely sacrificed when building a
cross-platform application.

Figure 5 shows a typical Epi-Info desktop workflow. Our product can successfully support: Form
Design, Import/Export of Forms to a centralized database, Data-Entry, Advanced Analytics, Savable and
Customizable Advanced Analytics Dashboards (Fig. 6), and Report Export.

By incorporating the use of a compressed HDF5 DataFrame, we have successfully demonstrated that we
can expedite the analytics cycle, thus mitigating many of the negative effects typically associated
with cross-platform or NoSQL applications. For a dataset with 50,000 records and 200 columns, the
software can read the data, perform a user-defined 10-variable multiple logistic regression, and
report the results in under 2 s, even on modest machines.

Additionally, the use of multiple cores can further optimize the analytics module. This allows
multiple analytics requests to be made on-the-fly as needed. Reports are sent back to the
user-interface as those jobs are completed. That is, any single request need not wait for a previous
job to finish as long as there is another core available for use on the machine (Fig. 7).

Additional Considerations
The design presented in this paper should be regarded as one acceptable approach with respect to the
aforementioned requirements. However, numerous other

Fig. 4 Screenshot of Angular-based drag-and-drop Form Designer



frameworks, architectures, and configurations could potentially prove adequate. In this section, our
reasons for favoring this system will be explained more thoroughly.

Fig. 5 Typical Epi-Info workflow

The research conducted during the course of this project resulted in discussions with several
additional research groups. Of particular note was a collaborative multi-day meeting with a team from
the University of Brasilia and representatives from the Itaipu Bi-National Energy Plant. During the
meeting, a consensus was articulated which highlighted the need for greater amounts of international
standardization and collaboration as separate nations and organizations seek to fight the spread of
infectious diseases, particularly with respect to the technology involved. On that front, there was
additional agreement that there are two domains where this is particularly important: data
standardization, and software standardization.

The standardization of data is a challenging task, but progress has been made thanks in part to
Health Level Seven International (HL7). Recently they have published a standard for public-health
data known as FHIR, and it is currently being incorporated into various software tools around the
planet. There seems to be less cohesion, however, on the software standardization front. This can be
attributed to the enormous amounts of specific use-cases, location-specific needs, and a disjoint
international community of engineers. Indeed, the CDC often plays a leadership role in many areas of
the world in the event of outbreaks. Nevertheless, there are countless other organizations, such as
Itaipu, which each have separate teams building unique tools to combat specific regional problems.
Consequently, it appears there is a fair amount of redundancy with respect to functionality and code.
This problem is likely to persist without the oversight of an international standardizing
organization. However, it is possible that the problem could be mitigated, even slightly, by the use
of broadly-adopted and flexible technologies. When appropriate, generality and popularity should be
favored.

Central to the initial conceptualization of our design was the selection of Python as the language of
choice for the Analytics Module. As mentioned, this was due in large part to its popularity among the
scientific computing community and its platform agnostic quality. This ultimately factored greatly
into the choice of a suitable cross-platform framework. Framework candidates which were discussed
included Electron, Kivy, and .NET-Core. Electron and Kivy were selected for closer inspection due to
language familiarity amongst the design team.

Kivy is a cross-platform framework for developing Python apps. It runs on iOS, Linux, Windows,
Android, and OSX, making it very attractive. However, it would have required time to become
acquainted with the front-end framework provided by Kivy, as it does not rely on traditional
web-technologies. Ultimately, this encouraged us to move towards Electron.

Electron, as opposed to Kivy, allows engineers to work with familiar technologies which can be easily
encapsulated in the framework. We feel that this should allow for increased flexibility, a reduction
in development time, and greater ability to share components across applications. It is not clear,
for example, that it would be easy to deploy the majority of a Kivy app to the web. However, the
Electron app we have designed should be fairly easy to migrate. Additionally, there are mobile
frameworks, such as Ionic, which would also facilitate the simple transfer of the majority of code to
a mobile app.

We were greatly encouraged by the technological similarity presented by the group from Itaipu. Like
us, they use a combination of AngularJS and Python. Their current app, however, is entirely
web-based, yet they have a need for offline capability. Because we both are using highly flexible,
similar technologies, there is a real opportunity for collaboration and outright code-sharing. We
feel it would be easy to extend their application, wrap it in an Electron framework with an embedded
NoSQL database, and allow for a robust offline use-case. This would simply not be possible if each
group were not using such widely

Fig. 6 Screenshot of GSU Epi-Info Analytics interface

adopted technologies, and it shows the power and need for additional software standardization.

Importantly, new technologies must first be examined and evaluated based on the quality of community
support. Public-Health is a critical domain, and it would not be wise to imprudently experiment with
untested and weakly-supported tools. This was discovered first hand by our researchers, particularly
when tasked with choosing an acceptable local NoSQL database. Many newer, lightweight NoSQL databases
have very shallow documentation and community support. ForerunnerDB was one such product which
ultimately proved impractical.

Fig. 7 High level view of step-by-step analytics optimization using multiple cores
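
The pattern in Fig. 7, where requests run side by side and each report is returned as soon as its job finishes, can be sketched with Python's standard `concurrent.futures` module. This is an illustrative sketch rather than the project's implementation: `column_sum` is a hypothetical stand-in for a real analytics job, and the `executor_cls` parameter is an assumption added so the same logic can be exercised with threads as well as processes.

```python
from concurrent.futures import (
    ProcessPoolExecutor,
    ThreadPoolExecutor,
    as_completed,
)


def column_sum(job):
    """Hypothetical analytics job: sum the values of one named column."""
    name, values = job
    return name, sum(values)


def run_jobs(jobs, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Submit every job at once and collect results as each one finishes.

    With ProcessPoolExecutor the jobs can run on separate cores; a later
    job does not wait for an earlier one if a worker is free.
    """
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(column_sum, job) for job in jobs]
        for future in as_completed(futures):
            name, total = future.result()
            results[name] = total
    return results
```

When the default process pool is used, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely; passing `executor_cls=ThreadPoolExecutor` runs the same code with threads.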

Development was drastically slowed whenever a small bug was found in the database code because the
required documentation literally did not exist. Eventually, PouchDB was discovered, and we found it
to have adequate support, greatly simplifying and expediting the development process.

Conclusions
Creative system design can alleviate many of the undesirable qualities typically associated with
cross-platform frameworks, such as Electron. This can require a mix of languages, databases, and
design patterns. The power of the resulting system has, in this case, proven to be worth the effort,
successfully addressing many of the necessary system requirements.

With our cross-platform framework design for Epi-Info, the international community will now have the
tools to rapidly respond to an emergency outbreak, even under remote conditions. By designing a
single codebase that is capable of generating executables for multiple platforms, developers can
quickly provide customized components to those deployed. Our work towards optimizing the data
analytics will enable better coordination and a more effective response to any outbreak around the
globe. However, future research is still needed, such as the development and deployment of Epi-Info
on mobile devices, including handheld tablets and smart phones. It is also imperative to develop
methods to automate and adapt data synchronization with centralized servers/cloud in an opportunistic
way when connectivity is available.

Availability and Requirements
Project name: GSU Epi-Info
Project home page: http://epi.cs.gsu.edu & https://bitbucket.org/bbc1183/epi-info/
Operating system(s): Platform independent. However, there are known issues with respect to installing
Electron on Ubuntu 16.04. Ubuntu 14 is recommended for development.
Programming language(s): NodeJS, AngularJS, Python
Other requirements: In the repository located at https://bitbucket.org/bbc1183/epi-info/, the active
branch is called 'dev'. The current software is located in the directory called
'electron-with-python'. There is a readme.txt in the root of that directory. Will not work out of the
box on Ubuntu 16.04.
License: MIT
Any restrictions to use by non-academics: Not applicable.

Abbreviations
CDC: Centers for Disease Control and Prevention; HDF5: Hierarchical Data
Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and
organize large amounts of data; NoSQL: Non-relational database

Funding
Research was partially funded by the CDC under contract number 200-2016-91969. Publication costs are
covered by the Department of Computer Science, Georgia State University.

Availability of data and materials
Detailed installation instructions and a list of development dependencies can be found at:
https://bitbucket.org/bbc1183/epi-info/. Check out the 'dev' branch, and go into the directory called
'electron-with-python'. There is a readme.txt in the root of that directory.

About this supplement
This article has been published as part of BMC Bioinformatics Volume 19 Supplement 11, 2018:
Proceedings from the 6th Workshop on Computational Advances in Molecular Epidemiology (CAME 2017).
The full contents of the supplement are available online at
https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-11.

Authors' contributions
BC was responsible for project management. He was fully involved in most aspects of system
specifications, design, and development. AGB was the project lead and contributed substantially with
respect to analysis of system performance. XC acted in a supervisory role and offered networking
expertise. RS acted in a supervisory role and contributed significantly during the system
specification and design phase. JKM was the primary researcher responsible for data analytics. NR
researched and selected the appropriate databases. JW was responsible for development of the form
designer. All authors read and approved the final manuscript.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Published: 22 October 2018

References
1. Centers for Disease Control and Prevention. https://www.cdc.gov/. Accessed 6 July 2018.
2. CDC Epi Info. https://www.cdc.gov/epiinfo/index.html. Accessed 6 July 2018.
3. Cathy Ann Marshall EM, Unwin N. An epidemiological study of rates of illness in passengers and
   crew at a busy Caribbean cruise port. BMC Public Health. 2016;16:314.
4. Brian A Maponga, Daniel Chirundu NTGMTGS, Takundwa L. Risk factors for contracting watery
   diarrhoea in Kadoma City, Zimbabwe, 2011: a case control study. BMC Infect Dis. 2013;13:567.
5. Matthew Scotch, Bambang Parmanto CSG, Sharma RK. Exploring the role of GIS during community
   health assessment problem solving: experiences of public health professionals. Int J Health
   Geogr. 2006;5:39.
6. Dean AG, Dean JADC, et al. Epi info, version 6: a word processing, database, and statistics
   program for epidemiology on microcomputers. PhD thesis. Atlanta: Centers for Disease Control and
   Prevention; 1995.
7. Electron: Build Cross Platform Desktop Apps with Javascript, HTML, and CSS.
   https://electron.atom.io/. Accessed 6 July 2018.
8. Python. https://www.python.org/. Accessed 6 July 2018.
9. Angular. https://angular.io/. Accessed 6 July 2018.
