Case Study IBM Watson Analytics Cloud Platform
All content following this page was uploaded by Ernesto Iadanza on 16 July 2016.
Article
Case Study: IBM Watson Analytics Cloud Platform as
Analytics-as-a-Service System for Heart Failure
Early Detection
Gabriele Guidi, Roberto Miniati, Matteo Mazzola and Ernesto Iadanza *
Department of Information Engineering, Università degli Studi di Firenze, v. S. Marta, 3-50139 Firenze, Italy;
[email protected] (G.G.); [email protected] (R.M.); [email protected] (M.M.)
* Correspondence: [email protected]; Tel.: +39-347-592-2874
Abstract: In recent years, progress in technology and the increasing availability of fast
connections have produced a migration of functionalities in Information Technology services, from
static servers to distributed technologies. This article describes the main tools available on the
market to perform Analytics as a Service (AaaS) using a cloud platform. It also describes a use
case of IBM Watson Analytics, a cloud system for data analytics, applied to the following research
scope: detecting the presence or absence of Heart Failure disease using nothing more than the
electrocardiographic signal, in particular through the analysis of Heart Rate Variability. The obtained
results are comparable with those coming from the literature, in terms of accuracy and predictive
power. Advantages and drawbacks of cloud versus static approaches are discussed in the last sections.
1. Introduction
In recent years, progress in technology and the increasing availability of fast connections
have produced a migration of functionalities in Information Technology (IT) services, from static
servers to distributed technologies. This phenomenon is commonly known as Cloud Computing;
the most exhaustive and official definition comes from the US National Institute of Standards and
Technology (NIST) [1], which introduces all the fundamental concepts of cloud systems, such as
on-demand access to resources by the end user and the offering of services with minimal infrastructure
and management effort.
The NIST definition points out that cloud computing includes data processing and data storage, both
performed on remote servers.
The arrival of cloud computing is also changing many core concepts in IT, defining new service
models for distribution to final customers. Summarizing the definitions in [1]:
• Software as a Service (SaaS): the consumer can use various client devices to take advantage of
a provider’s application (web application) that is stored on a cloud infrastructure.
• Platform as a Service (PaaS): business users can deploy and distribute their applications onto the
cloud, taking advantage of the tools supported by the provider without having to manage the
underlying infrastructure.
• Infrastructure as a Service (IaaS): in addition to all the functionalities of the PaaS model, the user
can also control the operating system and the storage as well as select some network components
(e.g., host firewalls).
2. Related Studies: State of the Art about Analytics Tools on the Market
The following information has been extracted from the public websites of vendors and from the
above-mentioned reviews by M. Butler [7]. The intent is exclusively to provide an overview of the
state of the art of the currently available cloud-based products for analytics.
2.2. Statistica
Statistica is a suite from StatSoft, recently acquired by Dell Software. The Statistica suite offers
many products, such as Data Visualization, BigData, DataMiner, TextMiner, decision-making and
sentiment analysis tools. Statistica has specific solutions for multiple application fields:
cross-industry, energy, oil and gas, financial, healthcare, insurance, manufacturing, pharmaceutical.
In the healthcare field, Statistica offers a variety of graphical modules that enable analytics for
several tasks such as patient/customer profiling, prediction of hospital readmissions, cost estimation,
and risk management.
is that it is provided by a company whose core business is databases. This is reflected in the product
Oracle Advanced Analytics, which provides data analysis directly on data that are stored in Oracle
Database: customers can run the algorithms directly where the data are located, in the database (no
slow input-output operations). As mentioned, Oracle offers two types of systems: Oracle R Enterprise
that allows users to use their R-language skills and tools to analyze their data, and Oracle Data Mining
that allows users to create data mining functions using SQL language.
2.4. FICO
FICO provides predictive analytics with the peculiarity of being combined with prescriptive
analytics and business rules management. It offers specialized solutions oriented to the market, such as
functions for customer engagement, or oriented to bank scoring, such as mortgage calculation or risk
functions. FICO has an entire section dedicated to analytics on the cloud, which makes these technologies
suitable also for smaller businesses, without the need to maintain a local analytics server.
2.5. KXEN
KXEN was acquired by SAP in 2013. KXEN originally offered solutions primarily geared to risk
minimization, including heavy-duty products mainly suitable for large organizations.
2.9. SAS
SAS (Statistical Analysis System) [9] is a large company that started its activities by offering
analytical services for agriculture; today SAS offers services in all application areas, from the academic
field to the life science field, from medicine to management support. SAS offers a vast set of solutions whose
names recall the area of application, such as: SAS Curriculum Pathways, SAS Data Management, SAS
Visual Analytics, etc.
Future Internet 2016, 8, 32 4 of 16
Starting from SAS 9.4, the solution is deployable also onto the cloud. To be used with its full power,
the system is mainly directed to expert users. The fields of application are several, mainly related to
business intelligence and bank trading. We also found examples of applications in healthcare ([10,11])
and Big Data ([12]).
2.10. IBM Watson Analytics
IBM Watson Analytics [13] is a cloud-based system that allows the final user to run complex
analytics through a simple interface, using nothing but a web browser (no specific clients or plug-ins
need to be installed on local machines). The goal is to allow users, whether or not they are experts
familiar with data analytics techniques, to focus only on their experiment or case study.
Once the database is uploaded on the cloud, the system offers three categories of functions:
Explore, Predict and Assemble (Figure 1). The “Explore” mode provides data clustering to detect
patterns and intrinsic relationships between data (non-supervised training techniques). The “Predict”
mode allows the user to perform predictions on the data, disclosing the predictive strength of the most
significant parameters, compared to a single target parameter set up by the user. The “Assemble” mode is
dedicated to efficiently showing data using infographics.
The fields of application of such general purpose systems are very broad; there is a Watson
Analytics Community where users can share use cases as samples of application in various areas [14].
For example, the system is used by a human resource manager to identify the parameters that
affect workers' resignations [15]. One more example is the analysis of product sales near particular
events, such as fireworks near July 4th in the US [16]. Other use cases include banking, insurance, retail,
telecommunications, government, nonprofit, education, marketing, sales, information technology,
finance and more.
One of the strengths of Watson Analytics is the automation of many steps of the analysis, allowing
also non-expert users to start using it. The main automation functionalities can be summarized as:
- Automatic Data Preparation
  • Data Transformation
  • Data Quality Index, based on empty fields analysis and constant values identification
- Automatic Modeling
  • Auto selection of best models and detection of strongest relationships: Decision Tree (CHAID)
    and Key Driver
  • Auto selection of best predictive statistical method based on data type: Watson Analytics
    automatically chooses the best regression model for the user data between linear, logistic,
    multivariate etc.
In addition, Watson Analytics includes an engine for text cognitive analysis (a field in which IBM is the
world leader); the user can submit a question typed in natural language to extract information from data.
Figure 1. Watson Analytics (WA) home screen showing three modalities: Explore, Predict and Assemble.
3. Use Case: Watson Analytics as AaaS to Identify HF Patients Analyzing Only the ECG Signal
- Echocardiogram: it provides immediate information about the volumes of the atrial and
ventricular chambers, and in particular about the ejection fraction.
- Electrocardiogram: it provides information about the rhythm of the heartbeat and possible faults
in the electrical signal transmission (atrioventricular block etc.).
- Natriuretic Peptides: they are another important marker of HF. The examination consists of
the analysis of the blood concentration of BNP (B-type Natriuretic Peptide) or NT-proBNP
(N-terminal pro-B type), hormones secreted in abnormal amounts when the heart is diseased or
the load on any chamber is increased.
- Chest X-ray: this examination is often more useful to identify lung diseases that cause symptoms
similar to HF. However, this examination may show pulmonary venous congestion or edema in
a patient with HF.
- Other laboratory tests: the guidelines indicate a multitude of laboratory parameters that may
be related to HF, including biochemical (sodium, potassium, creatinine) and hematological tests
(hemoglobin, hematocrit, ferritin, leucocytes, and platelets). Thyroid hormone is also important
because it can have an impact on HF.
ESC Guidelines also report a complex graph showing the algorithm that manages HF diagnostic
decisions (a sequence of given tests related to the symptoms).
In this complex scenario, some of the mentioned useful parameters for HF diagnosis are not
appropriate for a home telemonitoring context. One of the aims of this study is to establish in
which cases the sole ECG-HRV analysis could be appropriate to perform a preliminary and early
diagnosis of HF.
- P wave: the first wave of the cycle, which corresponds to the depolarization of the atria;
the contraction is quite weak and the wave is small.
- QRS complex: set of three waves in rapid succession corresponding to the depolarization of the
ventricles: the Q wave is negative and small, the R wave is a high positive peak, while the S wave is
again a small negative wave.
- T wave: it corresponds to the repolarization of the ventricles.
- U wave: due to the repolarization of the papillary muscles; it is not always identifiable.
- ST segment: period during which the ventricular cells are depolarized, therefore isoelectric, so
electrical changes are not greater than 1 mm on the graph.
- QT interval: interval in which ventricular depolarization and repolarization occur; its duration
varies with the heart rate, but generally remains between 350 and 440 ms.
Heart Rate Variability (HRV) refers to the physiological phenomenon of variation in the time
between two consecutive heartbeats; having defined the peak wave in the cardiac cycle as “R”, we can
also refer to HRV as “RR variation”, where the “RR interval” is the time between two R waves. HRV
analysis can be performed in two ways:
- Long-term analysis: performed on an ECG signal acquired for 24 h in a row, using a device called
Cardiac Holter
- Short-term analysis: performed on an ECG signal acquired for just 5 min or less
HRV analysis can be carried out in both the time and frequency domains. The values obtained from the
ECG signal by performing the time domain analysis are summarized in [28], and are:
• AVNN: Average of all NN intervals
• SDNN: Standard deviation of all NN intervals
• SDANN: Standard deviation of the averages of NN intervals in all 5-min segments of a 24-h recording
• SDNNIDX: Mean of the standard deviations of NN intervals in all 5-min segments of a 24-h recording
• rMSSD: Square root of the mean of the squares of differences between adjacent NN intervals
• pNN50: Percentage of differences between adjacent NN intervals that are greater than 50 ms;
a member of the larger pNNx family
The frequency domain parameters obtained from the ECG signal are:
• TOTPWR: Total spectral power of all NN intervals up to 0.4 Hz
• ULF: Total spectral power of all NN intervals up to 0.003 Hz
• VLF: Total spectral power of all NN intervals between 0.003 and 0.04 Hz
• LF: Total spectral power of all NN intervals between 0.04 and 0.15 Hz
• HF: Total spectral power of all NN intervals between 0.15 and 0.4 Hz
• LF/HF: Ratio of low to high frequency power
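The time domain definitions listed above can be sketched in a few lines of Python. This is only an illustrative sketch and not the PhysioNet implementation used in this study: the function name `time_domain_hrv`, the `segment_ms` parameter, the use of the sample standard deviation, and the choice to drop a trailing incomplete segment are all our assumptions.

```python
import math
import statistics


def time_domain_hrv(nn_ms, segment_ms=5 * 60 * 1000):
    """Time domain HRV statistics from a list of NN intervals in ms.

    Hypothetical helper: sample standard deviations are an assumption.
    """
    if len(nn_ms) < 3:
        raise ValueError("need at least three NN intervals")

    # Differences between adjacent NN intervals.
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]

    avnn = statistics.fmean(nn_ms)
    sdnn = statistics.stdev(nn_ms)
    rmssd = math.sqrt(statistics.fmean([d * d for d in diffs]))
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)

    # Cut the recording into consecutive segments of `segment_ms`
    # (5 min by default), using the cumulative NN time as the clock.
    segments, current, elapsed = [], [], 0.0
    for nn in nn_ms:
        current.append(nn)
        elapsed += nn
        if elapsed >= segment_ms:
            segments.append(current)
            current, elapsed = [], 0.0
    # A trailing incomplete segment is dropped (assumption of this sketch).

    seg_means = [statistics.fmean(s) for s in segments]
    seg_sds = [statistics.stdev(s) for s in segments if len(s) > 1]

    # SDANN and SDNNIDX are undefined when too few segments are available.
    sdann = statistics.stdev(seg_means) if len(seg_means) > 1 else float("nan")
    sdnnidx = statistics.fmean(seg_sds) if seg_sds else float("nan")

    return {
        "AVNN": avnn,
        "SDNN": sdnn,
        "SDANN": sdann,
        "SDNNIDX": sdnnidx,
        "rMSSD": rmssd,
        "pNN50": pnn50,
    }
```

With the default 5-min segments, SDANN and SDNNIDX are meaningful only for long recordings, which is consistent with their definition on 24-h data.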
Figure 2. Diagram showing the workflow of our study.
• CHF2DB: contains 29 subjects aged between 34 and 79 years with medium severity of heart failure;
the subjects include 8 men and 2 women; the sex of the remaining 19 patients is not known.
• NSR2DB: the Normal Sinus Rhythm Database contains 54 healthy subjects including 30 men (age
range: 28–76) and 24 women (age range: 58–73).
Table 1 summarizes the overall dataset analyzed.
Table 1. Dataset distribution.
Number of Healthy Patients    Number of HF Patients
54                            44
3.3.2. Extraction of HRV Parameters
For the extraction of HRV parameters we used the tool set provided by PhysioNet called HRV
Toolkit, run on Ubuntu Linux.
To make the task repeatable, we report some details on the data extraction, which was
performed by creating two scripts that separately invoke the short-term and long-term analyses, both
configured according to the literature: a 5 min time frame for the short-term analysis and the entire
recording duration, 24 h, for the long-term analysis. In both cases, the outliers are filtered and the
results are expressed in milliseconds. The scripts are shown in Figures 3 and 4. Note that, for the
short-term analysis, only the 5 min of recording ranging from the tenth to the fifteenth minute of
acquisition are selected, in order to remove the possible noise due to the first seconds/minutes of recording.
Figure 3. Script for Long-Term Analysis.
Figure 4. Script for Short-Term Analysis.
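The window selection described in Section 3.3.2 can be illustrated with a minimal Python sketch. It is not the script of Figure 4 and does not call the HRV Toolkit: `select_window` and `drop_outliers` are hypothetical helpers, and the outlier rule (discarding intervals that deviate from the median by more than a relative `tolerance`) is a simple stand-in chosen by us, since the filter used by the toolkit is not detailed here.

```python
import statistics


def select_window(nn_ms, start_s=600.0, end_s=900.0):
    """Keep the NN intervals whose closing beat falls in [start_s, end_s).

    The defaults select the tenth to fifteenth minute of the recording.
    """
    selected, t_ms = [], 0.0
    for nn in nn_ms:
        t_ms += nn  # time of the beat that closes this interval
        if start_s * 1000.0 <= t_ms < end_s * 1000.0:
            selected.append(nn)
    return selected


def drop_outliers(nn_ms, tolerance=0.2):
    """Hypothetical filter: drop intervals far from the median interval."""
    if not nn_ms:
        return []
    median = statistics.median(nn_ms)
    return [nn for nn in nn_ms if abs(nn - median) <= tolerance * median]


# Usage on a synthetic 20-min recording with a constant 1000 ms interval.
recording = [1000] * 1200
short_term = drop_outliers(select_window(recording))
```

Skipping the first ten minutes follows the rationale given above: the beginning of a recording is more likely to contain noise.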
3.3.3. Database Setup for Watson Analytics Analysis
Watson Analytics (WA) is a cloud system based on regressive techniques and supervised training.
The analysis dataset has been structured in a format suitable to be analyzed, as shown in Figure 5.
Each data column corresponds to an HRV parameter while each row is assigned to a different
patient. Note that the last column on the right, “HF_State”, represents the target prediction, which is
the presence (1) or absence (0) of HF in the corresponding patient.
Figure 5. Abstract of current dataset in suitable format for analysis.
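A dataset with the layout described in Section 3.3.3 (one row per patient, one column per HRV parameter, “HF_State” as the last column) could be written as a CSV file along the following lines. This is a sketch under assumptions: the column names in `FEATURES`, the helper `write_dataset`, and the numeric values in the usage example are illustrative and are not taken from the actual dataset.

```python
import csv
import io

# Assumed column names; the real file may use different labels.
FEATURES = ["AVNN", "SDNN", "SDANN", "SDNNIDX", "rMSSD", "pNN50"]


def write_dataset(records, out):
    """Write one row per patient, with HF_State as the last column.

    `records` is an iterable of (features, hf_state) pairs, where
    `features` maps each name in FEATURES to a number and `hf_state`
    is 1 (HF present) or 0 (HF absent).
    """
    writer = csv.writer(out)
    writer.writerow(FEATURES + ["HF_State"])
    for features, hf_state in records:
        if hf_state not in (0, 1):
            raise ValueError("HF_State must be 0 or 1")
        writer.writerow([features[name] for name in FEATURES] + [hf_state])


# Usage with one made-up patient, written to an in-memory buffer.
buffer = io.StringIO()
example = {"AVNN": 812.0, "SDNN": 27.7, "SDANN": 14.1,
           "SDNNIDX": 28.3, "rMSSD": 47.4, "pNN50": 50.0}
write_dataset([(example, 1)], buffer)
```

Keeping the target in the last column mirrors the layout of Figure 5 and makes it easy to identify when the file is uploaded.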
3.3.4. Data Analysis with Watson Analytics
When the HRV features dataset is ready, the analysis with WA can start. Up to this point the actions
are performed locally; from this point on, the dataset is ready to be uploaded to the cloud and the
analytics operations will be performed as AaaS. Watson Analytics accepts the most common matrix
formats, such as .CSV, .XLS, .XLSX. After loading the dataset, the system assigns an index value of
data quality, by considering the completeness of the fields, the possible presence of constant values, a low
number of records compared to the columns and other qualitative factors.
Now it is possible to process the dataset using the modalities offered by WA: “Assemble”,
“Explore”, and “Predict” (see Section 2.10 above).
Figure 6 shows an example of use of the Assemble feature, where the distribution of the Target
value (HF_State) is compared to an HRV parameter (pNN20).
Figure 6. Graphic representation of the distribution of a target, based on a parameter in Assemble mode.
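The factors that Section 3.3.4 lists for the data quality index (completeness of the fields, constant values, number of records compared to the columns) can be checked locally before the upload. The sketch below only illustrates these three checks: how WA actually combines them into a single index is not described here, so the function `quality_report`, its output keys, and the sample data are our assumptions.

```python
import csv
import io


def quality_report(rows):
    """Report simple quality indicators for a list of CSV row dicts."""
    if not rows:
        raise ValueError("empty dataset")
    columns = list(rows[0].keys())
    total_cells = len(rows) * len(columns)

    # Completeness: share of cells that are not empty.
    empty = sum(
        1 for row in rows for col in columns
        if (row[col] or "").strip() == ""
    )
    # Constant columns carry no information for a prediction.
    constant = [c for c in columns if len({row[c] for row in rows}) == 1]

    return {
        "completeness": 1.0 - empty / total_cells,
        "constant_columns": constant,
        "records_per_column": len(rows) / len(columns),
    }


# Usage on a tiny made-up table with one empty field and one constant column.
sample = "SDNN,Device,HF_State\n120,A,0\n,A,1\n45,A,1\n95,A,0\n"
report = quality_report(list(csv.DictReader(io.StringIO(sample))))
```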
We can consider the Explore mode as a facilitator for the Predict mode. As seen in Figure 7, the
Explore mode proposes some questions to the user in natural language. These questions are generated
from relationships that WA automatically extracted from the dataset parameters (without setting any
parameter as a target).
Figure 7. Proposals for links automatically detected in “Explore” mode.
In Explore mode, the user can also ask questions in natural language, as shown in Figure 8.
Figure 8. Text questions typed by user to inspect data relations or distributions.
The most interesting mode is “Predict”, which allows supervised analyses by setting a prediction
target. In this mode it is possible to inspect the predictive power of any other parameter.
We created two different instances of the Predict module, one for the Long-Term HRV dataset
parameters and one for the Short-Term HRV dataset, as explained in Section 3.3.2.
An interesting feature offered by WA is that, regardless of the type of data set as the target, it
automatically chooses the most appropriate model to treat that type of data. In our case study,
HF_State being a dichotomous variable, the system automatically selected the logistic regression
model, as shown in Figure 9.
Figure 9. WA has automatically selected logistic regression as the best model to deal with our data.
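To illustrate why logistic regression suits a dichotomous target such as HF_State, the following sketch fits a one-predictor logistic model by plain gradient descent. It is not WA's internal algorithm, which is not exposed to the user: the function `fit_logistic`, its learning rate and number of epochs, and the SDNN values in the usage example are all made up for illustration.

```python
import math


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w * z + b), z being the standardized x.

    Returns a function mapping a raw predictor value to a probability.
    """
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n) or 1.0
    z = [(v - mean) / std for v in x]

    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for zi, yi in zip(z, y):
            error = _sigmoid(w * zi + b) - yi  # gradient of the log-loss
            grad_w += error * zi
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n

    def predict_proba(value):
        return _sigmoid(w * (value - mean) / std + b)

    return predict_proba


# Usage with made-up SDNN values (ms): first five healthy, last five HF.
sdnn = [140, 150, 135, 160, 145, 60, 70, 80, 65, 75]
hf_state = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = fit_logistic(sdnn, hf_state)
```

The model outputs a probability between 0 and 1, which is thresholded (typically at 0.5) to obtain the predicted presence or absence of HF.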
3.4. Results
This section presents the results of the above-described AaaS use case: the search for the presence
of heart failure, starting from the analysis of ECG signals, using IBM Watson Analytics.
The system sets out the results both as graphics and text, in three ways:
- “single predictor”: shows the predictive value of the most influential parameter
- “double predictor”: the first two most predictive parameters are shown
- “combination”: the various parameters are combined for a more accurate prediction.
Switching from “single predictor” to two or more predictors, the overall prediction accuracy can
increase, but at the expense of the intelligibility of the results. In some fields of application this can be
less acceptable than losing some percentage points in accuracy. Figure 10 (left box) shows these concepts.
Figure 10. Screenshot from WA. On the left, the choice of the number of parameters to be used for the
prediction, to balance intelligibility and prediction power.
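The idea behind the “single predictor” view can be sketched by scoring each parameter on its own and ranking the parameters by that score. In the sketch below the score is the best classification accuracy achievable with a single threshold on the parameter; this is our simplification and not necessarily how WA computes its predictive strength. The helpers `best_threshold_accuracy` and `rank_predictors` and the data in the usage example are hypothetical.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of the rule 'predict 1 if value < t' or its mirror."""
    n = len(values)
    ordered = sorted(values)
    # Candidate thresholds: one below the minimum, then all midpoints.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]

    best = 0
    for t in candidates:
        correct = sum((v < t) == bool(l) for v, l in zip(values, labels))
        # n - correct is the accuracy of the mirrored rule (value >= t).
        best = max(best, correct, n - correct)
    return best / n


def rank_predictors(table, target):
    """Rank every column except `target` by its single-threshold accuracy."""
    scores = {
        name: best_threshold_accuracy(column, table[target])
        for name, column in table.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage on a made-up table: SDNN separates the classes, NOISE does not.
table = {
    "SDNN": [140, 150, 135, 160, 60, 70, 80, 65],
    "NOISE": [1, 2, 1, 2, 1, 2, 1, 2],
    "HF_State": [0, 0, 0, 0, 1, 1, 1, 1],
}
ranking = rank_predictors(table, "HF_State")
```

A single-threshold rule is easy to read and explain, which is the intelligibility advantage discussed above; combining several parameters can raise accuracy but yields a rule that is harder to interpret.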
3.4.1. Long‐Term HRV Results
3.4.1. Long‐Term HRV Results
3.4.1. Long-Term HRV Results
For the Long-Term HRV analysis, several parameters were found to have a Predictive Strength (PS)
of up to 90% on the Target HF State, in “single predictor” mode. The most influential predictors are:
- In the Time Domain: SDNN (PS = 90%), SDANN (PS = 90%), SDNNIDX (PS = 88%)
- In the Frequency Domain: TOT_PWR (PS = 90%), ULF_PWR (PS = 90%)
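For readers unfamiliar with the time-domain parameters listed above, the following Python sketch computes them from a series of RR intervals using their standard definitions: SDNN is the standard deviation of all intervals, SDANN is the standard deviation of the mean interval of each 5 min segment, and SDNNIDX is the mean of the per-segment standard deviations. It is an illustration only, not the extraction code used in this study; the function names are hypothetical, and the use of the population standard deviation and the handling of the trailing partial segment are assumptions.

```python
from statistics import mean, pstdev


def segment_rr(rr_ms, segment_s=300.0):
    """Split RR intervals (ms) into consecutive segments of about segment_s seconds."""
    segments, current, elapsed = [], [], 0.0
    for rr in rr_ms:
        current.append(rr)
        elapsed += rr / 1000.0
        if elapsed >= segment_s:
            segments.append(current)
            current, elapsed = [], 0.0
    if current:  # assumption: a trailing partial segment is kept
        segments.append(current)
    return segments


def time_domain_hrv(rr_ms, segment_s=300.0):
    """Return SDNN, SDANN and SDNNIDX (all in ms) for a series of RR intervals."""
    segments = segment_rr(rr_ms, segment_s)
    return {
        "SDNN": pstdev(rr_ms),
        "SDANN": pstdev([mean(s) for s in segments]),
        "SDNNIDX": mean([pstdev(s) for s in segments]),
    }


# Tiny synthetic series with 5 s segments, only to keep the example short.
rr_example = [800, 1000] * 3 + [600, 800] * 3
hrv = time_domain_hrv(rr_example, segment_s=5.0)
print(hrv)
```

With real 24 h recordings the default 300 s segment length would apply; the 5 s value above exists only so that the toy series spans more than one segment.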
Figure 11 shows, as an example, the screenshot for the parameter TOT_PWR. The results are
displayed as numbers, text and graphics.
Future Internet 2016, 8, 32
Figure 11. Results for Long-Term Heart Rate Variability (HRV), using TOT_PWR as single predictor.
When the number of predictors used for the analysis is increased, many combinations can be found
with a maximum overall PS of 92%. Figure 12 shows, as an example, the combination of TOT_PWR
and LF/HF.
Figure 12. Results for Long-Term HRV, using the pair TOT_PWR, LF/HF as multiple predictors.
3.4.2. Short-Term HRV Results
The results for the Short-Term analysis show a lower predictive power (single predictor)
compared to the Long-Term analysis. The most influential parameters on the HF_State target are:
- LF_PWR (PS = 84%)
- LF/HF (PS = 83%)
- TOT_PWR (PS = 80%)
Figure 13 shows the results for the LF_PWR parameter.
Figure 13. Results for Short Term HRV, using LF_PWR as single predictor.
The results are greatly enhanced by combining more predictors, achieving values similar to the
Long-Term analysis:
- LF/HF combined with SDNN: PS = 94%
- LF_PWR combined with LF/HF: PS = 92%
- pNN20 combined with LF/HF: PS = 92%
Figure 14 shows the results using the combination of LF/HF and SDNN.
Figure 14. Results for Short Term HRV, using LF/HF + SDNN as multiple predictors.
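Among the combined predictors listed above, pNN20 is the simplest to compute: it is the percentage of successive RR intervals that differ by more than 20 ms. The Python sketch below follows that standard definition; it is an illustration, not the code used in this study, and the function name `pnn` is hypothetical.

```python
def pnn(rr_ms, threshold_ms=20.0):
    """Percentage of successive RR differences larger than threshold_ms."""
    diffs = [abs(b - a) for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("at least two RR intervals are needed")
    return 100.0 * sum(d > threshold_ms for d in diffs) / len(diffs)


# Synthetic RR intervals in milliseconds (not data from the study).
rr_short = [800, 810, 840, 835, 870]
pnn20 = pnn(rr_short)  # successive differences: 10, 30, 5, 35 ms
print(pnn20)
```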
4. Discussion on Results
The results show that the Long-Term and Short-Term HRV analyses are comparable in terms of
predictive power of the detected parameters, when the target is identifying whether patients are
healthy or diseased (Heart Failure). The Short-Term HRV method is highly preferable, since it is much
less invasive for the patient (five minutes for ECG acquisition, compared to a 24 h Holter ECG
acquisition). It is also very suitable for tele-monitoring scenarios, such as those described in [29].
These results are comparable with the literature. In [23] similar results are obtained, using a static
(non-cloud) Classification And Regression Tree (CART) approach on MATLAB, in terms of overall
accuracy (>90%) and most predictive parameters (SDNN, SDANN and TOT_PWR). A Short-Term
approach is described in [24]; the obtained results are similar to ours, both in terms of overall
accuracy and of most effective predictors (LF/HF).
We can therefore assert that the results obtained using a cloud approach on IBM WA are
comparable to the results obtained on ad hoc custom desktop platforms. The results are shown
in a clear and user-friendly way that is easily understandable even by non-experts.
The main advantage of the proposed approach for the researcher is the possibility of becoming
operational quickly, focusing only on the experiment, without having to deal with hardware
requirements (high computational power is needed for these analyses) or the development of machine
learning algorithms.
From a medical point of view, the results of this study can be interpreted as the possibility to
perform a preliminary and early diagnosis of HF, based solely on the analysis of the ECG signal
(accepting a certain level of uncertainty, as shown by the accuracy values).
These findings are not meant to replace the diagnostic procedures for an exhaustive diagnosis,
explained in the ESC guidelines, but can be very helpful in many scenarios such as home telemonitoring
for the daily monitoring of patient status.
As shown, even short-term analysis has a strong predictive power: this means that the patient will
benefit from the proposed approach, having to stay connected to an electrocardiograph for only 5 min
(instead of 24 h).
It is very important to note that HRV analysis is based only on the evolution of the heart rate over
time, without any further analysis of the ECG waveform. This means that the proposed system only needs
a device for high quality detection of the heartbeat (for example, a 2-lead ECG measured from hands)
instead of a costly and less practical 12-lead electrocardiograph. This aspect is particularly important
for enabling mobile applications.
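The point that only heartbeat timing is required can be made concrete with a short Python sketch: given the timestamps of the detected beats, the RR interval series on which HRV analysis operates is simply the sequence of differences between consecutive timestamps. The function names are hypothetical, and beat detection itself is assumed to be done by the acquisition device.

```python
def rr_intervals_ms(beat_times_s):
    """RR intervals in ms from consecutive heartbeat timestamps given in seconds."""
    return [(t1 - t0) * 1000.0 for t0, t1 in zip(beat_times_s, beat_times_s[1:])]


def mean_heart_rate_bpm(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in ms."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))


# Synthetic beat timestamps in seconds (not data from the study).
beats = [0.0, 0.5, 1.0, 1.5]
rr = rr_intervals_ms(beats)
print(rr, mean_heart_rate_bpm(rr))
```

No sample of the ECG waveform appears anywhere in this computation, which is why a simple beat-detection device can be enough.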
5. Conclusions
In this paper, after a brief introduction of the main AaaS cloud systems, we reported the experience
of using a cloud-based analytics software applied to the following case study: identifying the presence
of HF by analyzing the ECG signal only.
We verified that the results obtained are comparable to those found in the literature, where the
same issue is addressed through custom machine learning systems, purposely developed and set
up for the target case. Hence the AaaS cloud systems could be a valid alternative to local hardware
and software systems for analyzing data. A major obstacle to AaaS could be transferring big datasets
onto the cloud. Typical machine learning projects require the analysis of large images that can easily
reach a size of 2 TB, which cannot be transferred onto the cloud easily. The model used in our case study can
solve this problem by locally performing the data extraction, in order to reduce the dataset size to be
transferred to the cloud.
In this study the HRV analysis has been locally performed starting from the raw ECG signal
(medium size). The analysis gives back a small size vector of numeric parameters that can quickly
and easily be transferred onto the cloud. This model makes it possible to take advantage of the full
power of the AaaS approach, regardless of the size of the initial dataset. From a medical point
of view, performing HF detection by analyzing the ECG signal only opens the possibility of easy
tele-monitoring applications (we only analyze the heart rate, not the ECG waveform, so a very basic
electrocardiograph is sufficient) for an early and preliminary diagnosis. Furthermore, by combining the
HRV analysis with systems for assisted drug delivery [30], it is possible to enable scenarios in which
the patient is technologically aided both in diagnosis and therapy, making him more autonomous in
preserving the state of his health. A more comprehensive diagnosis can then be made by performing
clinical tests and following protocols as described in the ESC guidelines, and the HRV-based home
tele-monitoring can be used as a daily check of patient status. Mobile applications can also benefit
greatly from this approach, given that it requires simple electro-medical hardware and low computational
power on the local device.
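The local-extraction model described above can be sketched in Python as follows: the full RR series stays on the local machine, and only a short row of numeric parameters is serialised for transfer. This is an illustration under stated assumptions, not the pipeline used in this study: the function names, the `PATIENT_ID` and `MEAN_RR` column names, and the choice of CSV as the transfer format are all hypothetical, and no cloud API call is shown.

```python
import csv
import io
from statistics import mean, pstdev


def extract_features(rr_ms):
    """Reduce a full RR series (ms) to a handful of numeric parameters."""
    return {"MEAN_RR": mean(rr_ms), "SDNN": pstdev(rr_ms)}


def to_csv_row(patient_id, features):
    """Serialise one patient's feature vector as CSV text (header plus one row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["PATIENT_ID", *features])
    writer.writeheader()
    writer.writerow({"PATIENT_ID": patient_id, **features})
    return buf.getvalue()


# Synthetic RR series standing in for a long recording (not data from the study).
rr_series = [800, 1000] * 5000
raw_text = ",".join(str(v) for v in rr_series)           # what would otherwise be sent
payload = to_csv_row("P001", extract_features(rr_series))  # what is sent instead
print(len(raw_text), len(payload))
```

The size of the payload depends only on the number of extracted parameters, not on the length of the recording, which is what keeps the transfer small regardless of the size of the initial dataset.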
Acknowledgments: We would like to thank IBM Italy for supporting us in the use of Watson Analytics, as well as
for providing us with free student-accounts for research purposes. We would like to thank also Massimo Milli,
MD, for his crucial clinical support.
Author Contributions: Gabriele Guidi and Ernesto Iadanza contributed equally to this manuscript. Matteo Mazzola
is an engineering student who performed the analyses using IBM Watson Analytics, supervised by Gabriele Guidi.
Roberto Miniati contributed to the review of the state of the art.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Mell, P.; Grance, T. The NIST Definition of Cloud Computing: Recommendations of the National
Institute of Standards and Technology. Available online: http://nvlpubs.nist.gov/nistpubs/Legacy/SP/
nistspecialpublication800-145.pdf (accessed on 30 June 2016).
2. Sun, X.; Gao, B.; Fan, L.; An, W. A Cost-Effective Approach to Delivering Analytics as a Service. In Proceedings
of the 2012 IEEE 19th International Conference on Web Services, Honolulu, HI, USA, 24–29 June 2012;
pp. 512–519.
3. Barga, R.S.; Ekanayake, J.; Lu, W. Project Daytona: Data Analytics as a Cloud Service. In Proceedings of
the 2012 IEEE 28th International Conference on Data Engineering, Arlington, VA, USA, 1–5 April 2012;
pp. 1317–1320.
4. Talia, D. Clouds for Scalable Big Data Analytics. Computer 2013, 46, 98–101. [CrossRef]
5. Demirkan, H.; Delen, D. Leveraging the capabilities of service-oriented decision support systems: Putting
analytics and big data in cloud. Decis. Support. Syst. 2013, 55, 412–421. [CrossRef]
6. Chen, Q.; Zeller, H. Experience in Continuous analytics as a Service (CaaaS). In Proceedings of the 14th
International Conference on Extending Database Technology, Uppsala, Sweden, 21–24 March 2011; Volume 1,
pp. 509–514.
7. 10 Enterprise Predictive Analytics Platforms Compared. Available online: http://www.kdnuggets.com/
2013/08/10-enterprise-predictive-analytics-platforms-compared.html (accessed on 30 June 2016).
8. Enterprise Predictive Analytics Comparisons 2014. Available online: http://www.butleranalytics.com/
enterprise-predictive-analytics-comparisons-2014/ (accessed on 30 June 2016).
9. SAS Analytics Home Page. Available online: http://www.sas.com/en_us/home.html (accessed on 30 June 2016).
10. Gordon, L. Using Classification and Regression Trees (CART) in SAS® Enterprise Miner™ for Applications
in Public Health. In SAS Global Forum, Data Mining and Text Analytics, Proceedings of the SAS Global Forum
2013, San Francisco, CA, USA, 28 April–1 May 2013; pp. 1–8.
11. Klatsky, A.L.; Hasan, A.S.; Armstrong, M.A.; Udaltsova, N.; Morton, C. Coffee, Caffeine, and Risk of
Hospitalization for Arrhythmias. Perm. J. 2011, 15, 19–25. [CrossRef] [PubMed]
12. Abousalh-Neto, N.A.; Kazgan, S. Big data exploration through visual analytics. In Proceedings of the IEEE
Conference on Visual Analytics Science and Technology (VAST), Seattle, WA, USA, 14–19 October 2012;
pp. 285–286.
13. IBM Watson Analytics Home Page. Available online: http://www.ibm.com/analytics/watson-analytics/
(accessed on 30 June 2016).
14. IBM Watson Analytics Community Page. Available online: https://community.watsonanalytics.com/
(accessed on 30 June 2016).
15. Watson Analytics Use Case for HR: Retaining Valuable Employees. Available online: https://www.ibm.
com/blogs/watson-analytics/watson-analytics-use-case-for-hr-retaining-valuable-employees/ (accessed
on 30 June 2016).
16. Watson Analytics Use Case Independence Day Edition: Fireworks and the 4th of July. Available online:
http://www.scoop.it/t/gaming-analytics/p/4046911509/2015/07/02/watson-analytics-use-case-independence-
day-edition-fireworks-and-the-4th-of-july (accessed on 30 June 2016).
17. Panahiazar, M.; Taslimitehrani, V.; Pereira, N.; Pathak, J. Using EHRs and Machine Learning for Heart Failure
Survival Analysis. Stud. Health Technol. Inform. 2015, 216, 40–44. [PubMed]
18. Guidi, G.; Pettenati, M.C.; Miniati, R.; Iadanza, E. Random Forest for Automatic Assessment of Heart Failure
Severity in a Telemonitoring Scenario. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2013, 2013, 3230–3233. [PubMed]
19. Guidi, G.; Melillo, P.; Pettenati, M.; Milli, M.; Iadanza, E. Performance Assessment of a Clinical Decision
Support System for analysis of Heart Failure. IFMBE Proc. 2014, 41, 1354–1357.
20. Chui, K.T.; Tsang, K.F.; Wu, C.K.; Hung, F.H.; Chi, H.R.; Chung, H.S.; Man, K.F.; Ko, K.T. Cardiovascular
diseases identification using electrocardiogram health identifier based on multiple criteria decision making.
Expert Syst. Appl. 2015, 42, 5684–5695. [CrossRef]
21. Boursalie, O.; Samavi, R.; Doyle, T.E. M4CVD: Mobile Machine Learning Model for Monitoring
Cardiovascular Disease. Procedia Comput. Sci. 2015, 63, 384–391. [CrossRef]
22. Guidi, G.; Pettenati, M.C.; Miniati, R.; Iadanza, E. Heart Failure analysis Dashboard for patient’s remote
monitoring combining multiple artificial intelligence technologies. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2012,
2012, 2210–2213. [PubMed]
23. Melillo, P.; Fusco, R.; Sansone, M.; Bracale, M.; Pecchia, L. Discrimination power of long-term heart rate
variability measures for chronic heart failure detection. Med. Biol. Eng. Comput. 2011, 49, 67–74. [CrossRef]
[PubMed]
24. Pecchia, L.; Melillo, P.; Sansone, M.; Bracale, M. Discrimination power of short-term heart rate variability
measures for CHF assessment. IEEE Trans. Inf. Technol. Biomed. 2011, 15, 40–46. [CrossRef] [PubMed]
25. McMurray, J.J.V.; Adamopoulos, S.; Anker, S.D.; Auricchio, A.; Böhm, M.; Dickstein, K.; Falk, V.; Filippatos, G.;
Fonseca, C.; Gomez-Sanchez, M.A.; et al. ESC Guidelines for the diagnosis and treatment of acute and
chronic heart failure 2012: The Task Force for the Diagnosis and Treatment of Acute and Chronic Heart
Failure 2012 of the European Society of Cardiology. Developed in collaboration with the Heart. Eur. Heart J.
2012, 33, 1787–1847. [CrossRef] [PubMed]
26. Inglis, S.C.; Clark, R.A.; McAlister, F.M.; Ball, J.; Lewinter, C.; Cullington, D.; Stewart, S.; Cleland, J.
Structured telephone support or telemonitoring programmes for patients with chronic heart failure.
Cochrane Library 2010, 8. [CrossRef]
27. Takeda, A.; Sjc, T.; Rs, T.; Khan, F.; Krum, H.; Underwood, M. Clinical service organisation for heart failure.
Cochrane Database Syst Rev. 2012. [CrossRef]
28. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.;
Peng, C.-K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research
Resource for Complex Physiologic Signals. Circulation 2000, 101, E215–E220. [CrossRef] [PubMed]
29. Guidi, G.; Pollonini, L.; Dacso, C.C.; Iadanza, E. A multi-layer monitoring system for clinical management of
Congestive Heart Failure. BMC Med. Inform. Decis. Mak. 2015, 15 (Suppl. S3). [CrossRef] [PubMed]
30. Iadanza, E.; Baroncelli, L.; Manetti, A.; Dori, F.; Miniati, R.; Gentili, G.B. An rFId Smart container to perform
drugs administration reducing adverse drug events. IFMBE Proc. 2011, 37, 679–682.
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC-BY) license (http://creativecommons.org/licenses/by/4.0/).