Secure and Robust Machine Learning For Healthcare
Abstract— Recent years have witnessed widespread adoption of machine learning (ML)/deep learning (DL) techniques due to their superior performance for a variety of healthcare applications, ranging from the prediction of cardiac arrest from one-dimensional heart signals to computer-aided diagnosis (CADx) using multi-dimensional medical images. Notwithstanding the impressive performance of ML/DL, there are still lingering doubts regarding the robustness of ML/DL in healthcare settings (which is traditionally considered quite challenging due to the myriad security and privacy issues involved), especially in light of recent results that have shown that ML/DL are vulnerable to adversarial attacks. In this paper, we present an overview of various application areas in healthcare that leverage such techniques from the security and privacy point of view, and we present the associated challenges. In addition, we present potential methods to ensure secure and privacy-preserving ML for healthcare applications. Finally, we provide insight into the current research challenges and promising directions for future research.

I. INTRODUCTION

We are living in the age of algorithms, in which machine learning (ML)/deep learning (DL) systems have transformed multiple industries such as manufacturing, transportation, and governance. Over the past few years, DL has provided state-of-the-art performance in different domains, e.g., computer vision, text analytics, and speech processing. Due to the extensive deployment of ML/DL algorithms in various domains (e.g., social media), such technology has become inseparable from our routine life. ML/DL algorithms are now beginning to influence healthcare as well, a field that has traditionally been impervious to large-scale technological disruptions [1]. ML/DL techniques have recently shown outstanding results in versatile tasks such as recognition of body organs from medical images [2], classification of interstitial lung diseases [3], detection of lung nodules [4], medical image reconstruction [5], [6], and brain tumor segmentation [7], to name a few.

It is highly expected that intelligent software will assist radiologists and physicians in examining patients in the near future [8] and that ML will revolutionize medical research and practice [9]. Clinical medicine has emerged as an exciting application area for ML/DL models, and these models have already achieved human-level performance in clinical pathology [10], radiology [11], ophthalmology [12], and dermatology [13]. Some of these studies have even reported that DL models outperform human physicians on average. The better performance of DL models in comparison with humans has led to the development of computer-aided diagnosis systems; for instance, the U.S. Food and Drug Administration has announced the approval of an intelligent diagnosis system for medical images that will not require any human intervention.

The potential of ML models for healthcare applications is also benefitting from progress in concomitantly advancing technologies like cloud/edge computing, mobile communication, and big data technology [14]. Together with these technologies, ML/DL is capable of producing highly accurate predictive outcomes and can facilitate human-centered intelligent solutions [15]. Along with other benefits like enabling remote healthcare services for rural and low-income zones, these technologies can play a vital role in revitalizing the healthcare industry.

Notwithstanding the impressive performance of DL algorithms, many recent studies have raised concerns about the security and robustness of ML models; for instance, Szegedy et al. demonstrated for the first time that DL models are highly vulnerable to carefully crafted adversarial examples [16]. Similarly, various types of data and model poisoning attacks have been proposed against DL systems [17], and different defenses against such strategies have been proposed in the literature [18]. However, the robustness of these defense methods is also questionable, and different studies have shown that most defense techniques fail against particular attacks. The discovery that DL models are neither secure nor robust significantly hinders their practical deployment in security-critical applications like predictive healthcare, which is essentially life-critical. For instance, researchers have already demonstrated the threat of adversarial attacks on ML-based medical systems [19], [20]. Therefore, ensuring the integrity and security of DL models and health data is paramount to the widespread adoption of ML/DL in the industry.

In this paper, we present a comprehensive survey of the existing literature on the security and robustness of ML/DL models with a specific focus on their applications in healthcare systems. We also highlight various challenges and sources of vulnerabilities that hinder the robust application of ML/DL models in healthcare applications. In addition, potential solutions to address these challenges are presented in this paper. In summary, the following are the specific contributions of this paper.

1) We present an overview of different applications of ML/DL models in healthcare.
2) We formulate the ML pipeline for predictive healthcare and identify various sources of vulnerabilities at each stage.
3) We highlight various conventional security- and privacy-related challenges as well as ones that arise with the adoption of ML/DL models.
4) We present potential solutions for the robust application of ML/DL techniques for healthcare applications.
5) Finally, we highlight various open research issues that require further investigation.

Organization of the Paper: The rest of the paper is organized as follows. In Section II, various applications of ML and DL techniques in healthcare are discussed. Section III presents the ML pipeline in data-driven healthcare and various sources of vulnerabilities, along with different challenges associated with the use of ML. Different potential solutions to ensure secure and privacy-preserving ML are discussed in Section IV, and various open research issues are outlined in Section V. Finally, we conclude the paper in Section VI.

II. ML FOR HEALTHCARE: APPLICATIONS

In this section, various prominent applications of ML in healthcare are discussed; we start by providing the big picture of ML in the context of healthcare.

A. ML in Healthcare: The Big Picture

The major phases of developing an ML-based healthcare system are illustrated in Figure 1, and the major types of ML/DL that can be used in healthcare applications are briefly described next.

Fig. 1: The illustration of major phases for development of machine learning (ML) based healthcare systems.

1) Unsupervised Learning: ML techniques utilizing unlabelled data are known as unsupervised learning methods. Widely used examples of unsupervised learning are the clustering of data points using a similarity metric and dimensionality reduction to project high-dimensional data onto lower-dimensional subspaces (sometimes also referred to as feature selection). In addition, unsupervised learning can be used for anomaly detection, e.g., via clustering [21]. Classical examples of unsupervised learning methods in healthcare
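To make the clustering and dimensionality-reduction ideas above concrete, the following minimal sketch (an illustration only, not taken from the surveyed works; the synthetic two-feature patient records, the cluster count, and the farthest-point initialisation are all assumptions) groups unlabelled records with a small k-means loop and projects them with PCA, using NumPy:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal k-means: returns (centroids, labels) for data X."""
    # Greedy farthest-point initialisation (deterministic).
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - np.array(centroids)[None], axis=2), axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recenter.
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two synthetic, well-separated groups of "patient" records
# (feature 1: resting heart rate, feature 2: systolic blood pressure).
rng = np.random.default_rng(1)
group_a = rng.normal([65.0, 115.0], 4.0, size=(50, 2))
group_b = rng.normal([95.0, 150.0], 4.0, size=(50, 2))
X = np.vstack([group_a, group_b])

centroids, labels = kmeans(X, k=2)

# Dimensionality reduction: project onto the top principal component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X1d = Xc @ Vt[0]
```

Here the two recovered clusters coincide with the two synthetic groups; on real clinical data the structure is rarely this clean, and the choice of similarity metric and number of clusters matters.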
mortality prediction and length-of-stay, and the authors showed that prediction performance degrades when ML models are trained on historical data and tested on unseen (future) data.

b) ML in Medical Image Analysis: In medical image analysis, ML techniques are used for the efficient and effective extraction of information from medical images acquired using different imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), ultrasound, and positron emission tomography (PET). These modalities provide important functional and anatomical information about different body organs and play a crucial role in the detection/localization and diagnosis of abnormalities. A taxonomy of key medical imaging modalities is presented in Figure 3. The key purpose of medical image analysis is to assist clinicians and radiologists in the efficient diagnosis and prognosis of diseases. The prominent tasks in medical image analysis include detection, classification, segmentation, retrieval, reconstruction, and image registration, which are discussed next. Moreover, fully automated intelligent medical image diagnosis systems are expected to be part of next-generation healthcare systems.

Fig. 3: A typology of commonly used medical imaging modalities (adapted from [39]).

• Enhancement: Enhancement of degraded medical images is an important pre-processing step that directly affects the diagnosis process. There are many sources of noise and disturbance in the medical image acquisition process which degrade the quality and significance of the resultant images. For instance, generating MRI images is a quite lengthy process that typically requires several minutes to produce a good-quality image, and to acquire detailed soft-tissue contrast, patients have to remain as still and straight as possible. Because movements can cause false artifacts in image acquisition, the complete process usually has to be repeated multiple times to produce significantly useful images. Also, depending on the body area being scanned and the number of images to be taken, patients might be asked to hold their breath during short scans [40]. Therefore, any movement of the subject can introduce artifacts in the acquired image. Moreover, some mechanical noise is also sometimes introduced into the output image. In the literature, different DL models have been used for denoising medical images, such as convolutional denoising autoencoders [41] and GANs. In addition, GANs have been successfully used for cleaning motion artifacts introduced in multi-shot MRI images [14]. Super-resolution is yet another powerful and impactful enhancement technique for medical images, e.g., MRI denoising [42].

• Detection: The process of identifying specific disease patterns or abnormalities (e.g., a tumor or cancer) in medical images is known as detection. In traditional clinical practice, such abnormalities are identified by expert radiologists or physicians, which often requires a lot of time and effort. DL-based methods, in contrast, have shown their potential for this task, and various studies have been presented in the literature for the detection of diseases. For instance, a locality-sensitive approach utilizing a CNN for the detection and classification of nuclei in colon cancer histopathological images is presented in [43]. A hybrid method utilizing handcrafted features and a CNN model for the detection of mitosis in breast cancer images is presented in [44].

• Classification: DL models, in particular convolutional neural networks (CNNs), have proven to give high performance in medical image classification tasks when compared with other state-of-the-art non-learning-based techniques. Modality classification and the recognition of different body organs and abnormalities from medical images using CNNs have been extensively studied in the literature. In [2], an approach using a CNN for multi-instance recognition of different body organs is presented, and a CNN-based method for the classification of interstitial lung diseases (ILDs) is presented in [3]. In another study, a CNN model is trained for the classification of lung nodules [4].
Transfer learning approaches have also been used for medical image classification [45]. In transfer learning, a pre-trained DL model (typically trained on natural images) is fine-tuned on a comparatively small dataset of medical images. The results obtained by this approach, as reported in the literature, are promising; however, a few studies have reported contradictory results. For instance, the results obtained by transfer learning in [46] and [47] are contradictory.

• Segmentation: The segmentation of tissues and organs in medical images enables the quantitative analysis of abnormalities in terms of clinical parameters, e.g., automatically measuring the volume and shape of a cancer in brain images. In addition, the extraction of such clinically significant features is an important and foremost step in the computer-aided detection and diagnosis systems that we discuss later in this section. The process of segmentation deals with partitioning an image into multiple non-overlapping parts using a pre-defined criterion such as intrinsic color, texture, and contrast. Addressing the problem of segmentation utilizing various DL models (e.g., CNN and recurrent neural network (RNN) [48]) is widely studied in the literature, and a common architecture used for the segmentation of medical images is U-net [49]. Various DL architectures have been proposed for the segmentation of multi-modal images (e.g., brain, skin cancer, and CT images) as well as for the segmentation of volumetric images [50]. An overview of various DL models for the segmentation of medical images is presented in [51].

• Reconstruction: The process of generating interpretable images from raw data acquired from the imaging sensor is known as medical image reconstruction. The fundamental problem in medical image reconstruction is to accelerate the inherently slow data acquisition process, which is an interesting ill-posed inverse problem in which we want to determine the system's input given its output. Many important medical imaging modalities, e.g., MRI and CT, require a lot of time for reconstructing an image from the raw data samples. Thus, in medical image reconstruction, we aim to reduce image acquisition time and storage space.
Research on medical image reconstruction using deep models is increasing rapidly, and various DL models such as CNNs [52] and autoencoders [6] have been extensively used for the reconstruction of MRI and CT images. Recently, generative adversarial networks (GANs) have been widely used for the reconstruction of medical images and have produced outstanding results. For instance, a GAN-based MRI reconstruction method that also cleans motion artifacts is presented in [53].

• Image Registration: Image registration is the process of mapping input images with respect to a reference image, and it is the first step in image fusion. Image registration has many potential applications in medical image analysis, as described in detail by El-Gamal et al. [54]; however, its use in actual clinical applications is very limited [55]. To facilitate surgical spinal screw implantation or tumor removal, image registration is usually applied in spinal surgery or neurosurgery for the localization of spinal bony landmarks or a tumor, respectively. Various similarity metrics and reference points are calculated to align the sensed image with the reference image. In [56], a framework for deformable image registration named Quicksilver is proposed that uses the large deformation diffeomorphic metric mapping (LDDMM) model for a patch-wise prediction strategy. Similarly, an unsupervised learning based method for deformable image registration is presented in . In [57], a CNN-based regression approach for 2D/3D image registration is presented that addresses two fundamental limitations of existing intensity-based image registration methods, i.e., small capture range and slow computation.

• Retrieval: The recent era has witnessed a revolution in digital interventions, from large-scale image and video collections to big data. This trend holds for medical imaging as well: every hospital and clinic with radiology services produces thousands of medical images daily in diverse modalities, resulting in the growth of large-scale multi-modal medical image repositories and making it difficult to manage and query such huge databases; this is particularly challenging for multi-modal medical data. To facilitate the production and management of multi-modal medical data, traditional methods are not sufficient, and various ML/DL techniques have been proposed in the literature [58], [59].
In routine practice, clinicians usually compare current cases with previous ones, mainly to effectively plan the diagnosis and treatment of the patient being examined. In this regard, identifying the modality (i.e., the modality classification discussed above) is of great significance, as it serves as an initial tool to facilitate the process of comparison, and an efficient modality classification system will reduce the search space by looking only for relevant images in the collections of the desired modality.

3) Applications of ML in Treatment:

a) Image Interpretation: As discussed above, medical images are widely used in routine clinical practice, and the analysis and interpretation of these images are performed by expert physicians and radiologists. To narrate the findings regarding the images being studied, they write textual radiology reports about each body organ examined in the conducted study. However, writing such reports is very challenging in some scenarios, e.g., for less experienced radiologists and healthcare service providers in rural areas where the quality of healthcare services is not up to the mark. On the other hand, for experienced radiologists and pathologists, the process of preparing high-quality reports can be tedious and time-consuming, which can be exacerbated by a large number
of patients visiting daily. Therefore, various researchers have attempted to address this problem using natural language processing (NLP) and ML techniques. In [60], a natural language processing based method is proposed for annotating clinical radiology reports. In [61], a multi-task ML based framework is proposed for automatic tagging and description of medical images. In a similar study [62], an end-to-end architecture developed by integrating CNN and RNN is presented for thorax disease classification and reporting in chest X-rays. In [63], a novel multi-modal model utilizing CNN and long short-term memory (LSTM) networks is developed for automatic report generation.

b) ML in Real-time Health Monitoring: Real-time monitoring of critical patients is crucial and is a key component of the treatment process. Continuous health monitoring using wearable devices, IoT sensors, and smartphones is gaining interest among people. In a typical setting of continuous health monitoring, health data is collected using a wearable device and a smartphone and then transmitted to the cloud for analysis using an ML/DL technique. The outcomes are then transmitted back to the device for appropriate action(s). For instance, a framework having a similar system architecture is presented in [64]; the system is developed by integrating mobile and cloud computing for monitoring heart rate using PPG signals. Similarly, a review of different ML techniques for human activity recognition, with application to the remote monitoring of patients using wearable devices, is presented in [65]. The sharing of health data with clouds for further analysis raises many privacy and security challenges that we discuss in the next section.

4) Applications of ML in Clinical Workflows:

a) Disease Prediction and Diagnosis: The early prediction and diagnosis of diseases from medical data is one of the exciting applications of ML. Various studies have highlighted the potential of predictive healthcare for the timely treatment of diseases. For instance, the case of cardiovascular risk prediction using different ML algorithms with clinical data is studied in [66], and the study concluded that ML techniques improved the prediction efficacy. A survey of various ML techniques for the detection and diagnosis of different diseases (such as diabetes, dengue, hepatitis, and heart and liver diseases) is presented in [67]. The potential of ML-based methods for the prediction and prognosis of cancer is highlighted in [68].

b) ML in Computer-Aided Detection or Diagnosis: Computer-aided detection (CADe) or computer-aided diagnosis (CADx) systems are being developed mainly for the automatic interpretation of medical images to assist radiologists in their clinical practice. Such a system works by utilizing different functionalities, including ML/DL and traditional computer vision and image processing techniques, and relies heavily on the performance of these techniques. IBM's Watson is a classical example of a CADx system developed by integrating various techniques including ML. However, any task in medical image and signal analysis automated by the application of ML/DL models can be deemed a CADe or CADx system, e.g., automated detection of fatty liver in ultrasound kurtosis imaging [69].

c) Clinical Reinforcement Learning: In reinforcement learning (RL), the key objective is to learn a policy function for making precise decisions in an uncertain environment so as to maximise the accumulated reward. In clinical medicine, RL can be used to provide optimal diagnosis and treatment for patients with distinct characteristics [70]. A performance evaluation of different RL techniques (i.e., Q-value iteration, tabular Q-learning, fitted Q-iteration (FQI), and deep Q-learning) for the treatment of sepsis in the ICU using a real-world medical dataset is presented in [71]. Sepsis is a severe infection involving organ dysfunction and is a leading cause of mortality due to expensive and suboptimal treatment. The dataset contains trajectories of a patient's physiological state and the treatments provided by clinicians at each time step, along with the outcome (i.e., survival or mortality). The study concluded that simple tabular Q-learning can learn effective policies for sepsis treatment, with performance comparable to a complex continuous state-space method, i.e., deep Q-learning.

d) ML for Clinical Time-Series Data: One of the tasks in clinical workflows is the modeling of clinical time-series data. Applications of clinical time-series modeling include the prediction of clinical interventions in intensive care units (ICUs) using CNN and LSTM [72], mortality prediction in patients with traumatic brain injury (TBI) [73], and the estimation of mean arterial blood pressure (ABP) and intracranial pressure (ICP), which are important indicators of cerebrovascular autoregulation (CA) in TBI patients. In a recent study, attention models are used for ICU forecasting tasks (such as diagnosis, estimation, and prediction) by integrating clinical notes with multivariate time-series measurement data [74]. In a similar study, the problem of unexpected respiratory decompensation is investigated using ML techniques in [75].

e) Clinical Natural Language Processing: Clinical notes are a tool widely used by clinicians to communicate patient state. The use of clinical text is crucial, as it often contains the most important information. Progress in clinical NLP techniques is envisioned to be incorporated into future clinical software for extracting relevant information from unstructured clinical notes to improve clinical practice and research [76]. Clinical NLP poses unique challenges such as the use of acronyms, language disparity, partial structure, and quality variance. The challenges and opportunities of clinical NLP for languages other than English, along with a review of clinical NLP techniques, are presented in [77]. In [78], the authors presented a toolkit named CLAMP that provides different state-of-the-art NLP techniques for clinical text analysis.

f) Clinical Speech and Audio Processing: In the clinical environment, clinicians have to do a lot of documentation, i.e., preparing clinical notes, discharge summaries, radiology reports, etc. According to Dr. Simon Wallace, clinicians spend 50% of their time on clinical documentation and are highly demotivated due to clinical workload, administrative tasks, and lack of leisure time [79]. Typically, they spend more time preparing clinical documentation than interacting directly with patients. To overcome such challenges, clinical speech and audio processing offers new opportunities such as speech interfaces for interaction-less services, automatic transcription of patient conversations, and synthesis of clinical notes. There are many benefits of using speech and audio processing tools in the clinical environment for each stakeholder, i.e., patients (speech is a new modality for determining patient state), clinicians (efficiency and time savings), and the healthcare industry (enhanced productivity and cost reduction). In the literature, speech processing has been used for the identification of disorders related to speech, e.g., vocal hyperfunction [80], as well as disorders that manifest through speech, e.g., dementia [81]. Alzheimer's disease identification using linguistic features is presented in [82]. In clinical speech processing, disfluency and utterance segmentation are two well-known challenges.

III. SECURE, PRIVATE, AND ROBUST ML FOR HEALTHCARE: CHALLENGES

In this section, we analyze the security and robustness of ML/DL models in healthcare settings and present the various associated challenges.

A. Sources of Vulnerabilities in the ML Pipeline

The application of ML in healthcare settings suffers from various privacy and security challenges that we discuss thoroughly in this section. In addition, the three major phases of ML model development, along with the potential sources of vulnerabilities causing such challenges at each step of the ML pipeline, are depicted in Figure 4.

Fig. 4: The pipeline for data-driven predictive clinical care and various sources of vulnerabilities at each stage.

1) Vulnerabilities in Data Collection: Training ML/DL models for clinical decision support requires the collection of a large amount of data (in formats such as EHRs, medical images, and radiology reports), which is in general time-consuming and requires significant human effort. Although in practice medical data is carefully collected to ensure the effectiveness of the diagnosis, there can be many sources of vulnerabilities that can affect the proper (expected) functionality of the underlying ML/DL systems, a few of which are described next.

Instrumental and Environmental Noise: The collected data often contains many artifacts that arise due to instrumental and environmental disturbances. Consider the example of one of the widely used imaging modalities for acquiring high-resolution medical images, multishot MRI. This modality is highly sensitive to motion, and even a slight movement of the subject's head or respiration can cause undesirable artifacts in the resultant image [14], thereby increasing the risk of misdiagnosis [83].

Unqualified Personnel: Healthcare ecosystems are extremely interdisciplinary, comprise technical and non-technical personnel, and often lack qualified workers who can develop and maintain ML/DL systems. For the efficient application of data-driven healthcare, workers with strong statistical and computational backgrounds, e.g., engineers and data scientists, are required. At the same time, the clinical usability of ML/DL-based systems is extremely important. Considering this aspect, hospitals tend to rely solely on physician-researchers who lack the computational expertise to develop such systems [84].

2) Vulnerabilities Due to Data Annotation: Most applications of ML/DL in healthcare systems are supervised ML tasks, which require an abundance of labelled training data. The process of assigning labels to each data sample (e.g., a medical image) is known as data annotation. Ideally, this task should mostly be performed by experienced clinicians (physicians or radiologists) to prepare domain-enriched datasets, which are crucial to the development of useful ML/DL models in healthcare systems. The literature has revealed that training ML/DL models without a sound grip on the domain can be disastrous [85]. However, clinicians such as expert radiologists are rare professionals who are hard to engage in secondary tasks like data annotation. As a result, trainee staff (with little domain expertise) or automated ML/DL algorithms are usually employed for data labelling, which often leads to problems such as coarse-grained labels, class imbalance, label leakage, and misspecification. Some specific annotation-based vulnerabilities are discussed below.

Ambiguous Ground Truth: In medical datasets, the ground truth is often ambiguous, e.g., in medical image classification tasks [19], and even expert clinicians disagree on well-defined diagnostic tasks [86]. This problem becomes more severe in the presence of malicious users who want to perturb the data, making diagnosis difficult and making their influence hard to detect even with human expert review.

Improper Annotation: The process of annotating data samples for life-critical healthcare applications should be informed by proper guidelines and by various privacy and legal considerations [87]. Most widely used healthcare datasets are annotated with coarse-grained labels, whereas the real-life utility of ML/DL lies in highlighting rare, fine-grained, and hidden strata within the clinical environment. This inability to perform labelling appropriately can lead to various efficiency challenges, discussed next.

Efficiency Challenges: The collections of healthcare data on which ML/DL models are built suffer from various issues that give rise to several efficiency challenges. A few major problems impacting the quality of data are described next.

(a) Limited and Imbalanced Datasets: The size of the datasets used for training ML/DL models is often not up to the required scale. In particular, one major limitation on the efficient application of DL approaches in healthcare is the unavailability of large-scale datasets, as health data is often small in size. Notably, most life-threatening health conditions are naturally rare and diagnosed once in many (thousands to millions of) patients. Therefore, most ML/DL algorithms cannot be efficiently trained and optimized for such life-threatening healthcare tasks.

(b) Class Imbalance and Bias: Class imbalance is yet another problem that arises in supervised ML/DL; it refers to the fact that the distribution of samples among classes is not uniform. If a class-imbalanced dataset is used for training a model, this will be reflected in the model's outcomes as bias toward certain categories. Biases in models' predictions in healthcare settings will have profound consequences and should, therefore, be mitigated. Various approaches have been proposed in
the literature to address class imbalance problems; these approaches are discussed in the next section.

(c) Sparsity: Data sparsity, i.e., missing values, is common in real-world data and arises for various reasons (e.g., unmeasured and unreported samples). Missing values and observations significantly affect the performance of ML/DL techniques.

3) Vulnerabilities in Model Training: The vulnerabilities regarding model training include improper or incomplete training, privacy breaches, and model poisoning and stealing. Improper or incomplete training refers to situations where the ML/DL model is trained with improper parameters, e.g., learning rate, number of epochs, or batch size. Moreover, ML/DL models have been found to be highly vulnerable to various security and privacy threats such as adversarial attacks [16], model poisoning [88], and data poisoning attacks [89]. These vulnerabilities hinder the efficient deployment of ML/DL systems for security-critical applications (such as digital forensics and biometrics) as well as life-critical applications (such as self-driving cars and healthcare). Therefore, ensuring the security and integrity of ML/DL systems is of paramount importance for such critical applications. Various security threats associated with ML/DL systems are thoroughly described in the next section.

4) Vulnerabilities in Deployment Phase: The deployment of ML/DL techniques in a clinical environment essentially involves human-centric decisions. Therefore, ensuring the robustness of the system while considering fairness and accountability is necessary in the deployment phase. The following are the major vulnerabilities that can be encountered in the deployment phase.

generating adversarial examples [90].

Incomplete Data: In realistic settings, data collected for providing patient care may contain missing observations or variables, e.g., in EHRs. The simplest way to handle missing values is to ignore them completely during analysis, but this cannot be done without knowing their relationships with the already observed or unobserved data. Using the missing observations for training ML/DL models, on the other hand, leads to two well-known problems, i.e., false positives (a healthy person is diagnosed with a disease) and false negatives (a patient is identified as healthy). Both problems can have severe outcomes in actual healthcare settings; therefore, healthcare data should be complete and compact in all respects to ensure accurate predictions of outcomes.

5) Vulnerabilities in Testing Phase: Vulnerabilities in the testing phase are concerned with the interpretation of the results from the underlying ML/DL systems and include misinterpretation and false-positive and false-negative outcomes. False-positive and false-negative outcomes are due to incomplete or inefficient training of the model or to incomplete data fed for inference, as discussed in the earlier sections. Finally, the true essence of ML-empowered healthcare is not just about turning a crank; it demands the cautious application of analytical methods [91].

B. The Security of ML: An Overview

In this section, we provide an overview of ML security, particularly from the perspective of healthcare, and highlight the various security challenges associated with the use of ML.
ployment phase of ML/DL systems. Whereas, security issues 1) Security Threats: The security threats on ML systems
(e.g., adversarial attacks) are discussed in the next section. can be broadly categorized into three dimensions, i.e., influ-
Distribution Shifts: Distributions shifts are very much ex- ence attacks, security violations, and attack specificity [92]. A
pected in realistic healthcare settings, for example, let’s con- taxonomy of these security threats on ML systems is depicted
sider different imaging centers and DL models trained on in Figure 5.
images of one domain (imaging center) are subsequently (a) Influence: Influence attacks can be of two types: (1)
deployed on different domain images. In such settings, the per- causative: the one that attempts to get control over
formance of the underlying DL model degrades significantly. training data; (2) exploratory: the one that exploits the
Moreover, in predictive healthcare, ML models are developed miss-classification of the ML model without intervening
using historical patient data and are usually tested on the new the model training.
patients which raise questions about the efficacy of the ML (b) Security Violation: It is concerned with the availability
predictions. Moreover, such differences can be exploited for and integrity of the services and can be categorized into
9
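The false-positive and false-negative outcomes discussed under the testing-phase vulnerabilities can be made concrete with a small sketch. This is illustrative only: the labels and predictions below are hypothetical, with 1 denoting "disease" and 0 denoting "healthy".

```python
# Minimal sketch: quantifying false-positive and false-negative outcomes
# of a binary diagnostic classifier (toy data, not from the paper).

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels (1 = disease, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def rates(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    fpr = fp / (fp + tn)  # healthy patients wrongly flagged as diseased
    fnr = fn / (fn + tp)  # diseased patients wrongly cleared as healthy
    return fpr, fnr

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(rates(y_true, y_pred))  # (0.25, 0.25)
```

In a clinical deployment, the relative costs of the two error types differ sharply, which is why a single accuracy number is a poor summary of such a system.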
3) Ethical Challenges: In user-centric applications of ML such as healthcare, it is important to ensure the ethical use of data. Explicit measures should be taken to understand the targeted user population and their sociological aspects before collecting data for building ML models. Moreover, understanding how data collection can harm a patient's well-being and dignity is an important consideration in this regard. If ethical concerns are not taken into account, the application of ML in realistic settings will have adverse results. Furthermore, to ensure fair and ethical operation of automated systems, it is imperative to have a clear understanding of the AI system in uncertain and complex scenarios [101].

4) Causality is Challenging: Understanding causality is important in healthcare because most of the crucial healthcare problems require causal reasoning, i.e., "what if?" questions [102], for example, asking what will happen if a doctor prescribes treatment A instead of treatment B. Such questions cannot be answered by classical learning algorithms; to answer them, we need to analyze the data through the lens of causality [103]. In healthcare, learning is often based solely on observational data, and asking causal questions by learning from observational data is quite challenging and requires building causal models.

DL models are black boxes that lack a fundamental underlying theory; these models essentially work by exploiting patterns and correlations without considering any causal link [104]. In general, this cannot be deemed a limitation, since prediction does not require any causal relation. In predictive healthcare, however, the absence of causal relations can raise questions about the conclusions that can be drawn from the outcomes of DL models. Furthermore, fairness in decision making can be better enforced through the lens of causal reasoning [105], [106]. The estimation of the causal effect of some variable(s) on a target output (e.g., the target class in a multi-class classification problem) is important to ensure fair predictions.

5) Regulatory and Policy Challenges: The full potential of ML/DL systems (which essentially constitute software as a medical device) in actual healthcare settings can only be realized by addressing regulatory and policy challenges. The literature suggests that regulatory guidelines are needed both for medical ML/DL systems and for their integration into actual clinical settings [107]. Therefore, the integration of AI-empowered ML/DL systems into the actual clinical environment should be in compliance with the policies and regulations defined by governments and regulatory agencies. However, existing regulations are not suitable for certifying ever-evolving systems such as ML/DL-empowered systems; yet another key challenge with the use of ML/DL algorithms in clinical practice is to determine how these models should be implemented and regulated, since these models will incorporate learning from new patient data [108]. In addition, objective clinical evaluation of ML/DL systems for particular clinical settings is crucial to ensure safe, effective, and robust operation that does not harm patients in any way. Data scientists and AI engineers should be employed in hospitals to assess AI systems regularly and ensure that they remain safe, relevant, and working as intended.

6) Availability of Good Quality Data: The availability of representative, diverse, and high-quality data is one of the major challenges in healthcare. For instance, the amount of data available to the research community is very small in size and limited in scope compared to the heterogeneous collections of large-scale multi-modal patient data being generated on a daily basis by healthcare institutions small and large. On the other hand, the development of good-quality data that resembles real clinical settings is very challenging and requires resources for management and maintenance. The availability of high-quality data can effectively serve the intended purpose of disease prediction and decision making for planning treatment.

The data collected in practice suffer from different issues such as subjectivity, redundancy, and bias. As ML/DL models perform inference by solely learning the latent factors of the data on which they are trained, the effect of data generated by undesirable past practices of hospitals will be reflected in the outcomes of the algorithm. For example, most people with no health insurance are denied healthcare services, and if AI learns from that data, it will do the same. It has been shown that a model can exhibit racial bias by producing varying outcomes for different subpopulations [115], and the training data can also introduce its own modeling challenges [116], [117].

7) Lack of Data Standardization and Exchange: A medical ML/DL system should facilitate a deep understanding of the underlying healthcare task, which (in most cases) can only be achieved by utilising other forms of patient data. For example, radiology is not all about clinical imaging: other patient EMR data is crucial for radiologists to derive a precise conclusion from an imaging study. This calls for integration and data exchange between all healthcare systems. Despite extensive research on data exchange standards for healthcare, these standards are widely ignored in healthcare IT systems, which broadly affects the quality and efficacy of the healthcare data accumulated through these systems. There are numerous guidelines for performing specific medical interventions like imaging studies (i.e., with defined exposure and positioning) to ensure the clinical significance of the data. However, current healthcare IT systems largely ignore standards, and clinicians barely follow well-established guidelines. As a result, data integration and exchange efforts across different specialities and organisations fail. Data integration to match diverse patients' medical records is crucial to deliver high-value patient care. The lack of appetite for implementing data exchange standards in the wider healthcare industry hinders the efficacy of ML/DL systems, since multi-modal data is vital to ensure a deep understanding by the algorithms and will undoubtedly enhance the performance of physicians in making data-driven clinical decisions.

8) Distribution Shifts: The problem of data distribution shifts is yet another major challenge and perhaps one of the most challenging problems to solve [118]. In clinical practice, training and testing data distributions can diverge for many reasons, e.g., medical data is generated by different institutions using different devices for patients having complicated cases. Due to this issue, ML/DL models developed using available public databases (by the scientific community and academicians) do not give the expected performance when deployed in an actual clinical environment.

TABLE I: Summary of state-of-the-art data security and privacy-preserving methods in healthcare settings.
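The train/deployment divergence described above can be monitored with even very simple statistics. The following sketch is illustrative only (the feature values and the 3-sigma threshold are arbitrary assumptions, not from the paper); it flags features whose deployment-time mean drifts far from the training distribution:

```python
# Crude covariate-shift check: compare per-feature deployment means against
# training statistics (toy data; thresholds are hypothetical).
import math

def column_stats(rows):
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(dim)]
    return means, stds

def shift_alarm(train_rows, deploy_rows, z_threshold=3.0):
    """Return indices of features whose deployment mean drifts by more than
    z_threshold training standard deviations."""
    means, stds = column_stats(train_rows)
    d_means, _ = column_stats(deploy_rows)
    return [j for j, (m, s, dm) in enumerate(zip(means, stds, d_means))
            if s > 0 and abs(dm - m) / s > z_threshold]

# Toy example: feature 1 drifts strongly at deployment time.
train = [[1.0, 10.0], [1.2, 10.5], [0.8, 9.5], [1.0, 10.0]]
deploy = [[1.1, 25.0], [0.9, 26.0]]
print(shift_alarm(train, deploy))  # [1]
```

Real deployments would use proper two-sample tests or density-ratio estimators, but even a mean-drift alarm like this can catch gross scanner- or protocol-level shifts before predictions are trusted.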
Distribution shifts are frequent in the medical domain, in particular in medical imaging, where different protocols and parameter choices can result in images of significantly different distributions. ML models are typically trained under the principle of empirical risk minimization (ERM), which provides good learning bounds and guarantees if its assumptions are satisfied. For instance, one of the foremost and strongest assumptions is that the training and test datasets are derived from the same domain (i.e., data distribution). However, this assumption does not hold in practice, and models trained under it fail to generalize to other domains. In contrast, the life-critical nature of clinical applications demands a smooth and safe operation of ML/DL techniques.

9) Updating Hospital Infrastructure is Hard: Healthcare IT systems are mostly proprietary and operate in silos, which makes revising, fixing, and updating software costly and time-consuming. It has been reported in the literature that in 2013 the majority of hospitals were still using the ninth version of the International Classification of Diseases (ICD) system, even though a revised version (i.e., ICD-10) was released as early as 1990 [19]. The difficulties in updating hospital software infrastructure can raise many vulnerabilities with the use of modern tools like ML/DL systems.

IV. SECURE, PRIVATE, AND ROBUST ML FOR HEALTHCARE: SOLUTIONS

In this section, we present an overview of various proposed methods to ensure secure, private, and robust ML for healthcare applications. A summary of articles focused on the topic of "secure and privacy-preserving ML for healthcare" is presented in Table I, and a taxonomy of commonly used approaches for secure, private, and robust ML is presented in Figure 6; the individual approaches are described next.

A. Privacy-Preserving ML

Preserving the privacy of the user in healthcare is paramount, as it is a user-centric application that involves the collection of personal data, and any breach of privacy can lead to severe consequences. Preserving privacy means that ML model training and inference should not reveal any additional information about the subjects from whom data was collected. In general, ML/DL requires training data to be stored in a central repository (e.g., the cloud) that may include users' private data, which raises various threats; to address such concerns, data anonymization techniques are used. However, it has been reported in the literature that meaningful information can be inferred about individuals' private data even when the data is anonymized [119].

Various efforts in the literature have addressed the privacy issues associated with the use of ML. Three different protocols for the two-server model are presented in [120], where the private data is distributed among two non-colluding servers by the data owners, and those servers then train the ML models on the joint data by following secure two-party computation (2PC). Furthermore, different techniques have been proposed to perform secure arithmetic operations in the secure multi-party computation environment, and alternatives to the nonlinear activation functions used in ML models, such as softmax and sigmoid, have also been proposed. Similarly, various techniques for privacy-preserving ML, such as cryptographic and differential privacy approaches, are discussed in [100]. Here we briefly discuss the widely used methods for preserving privacy.

1) Cryptographic Approaches: Cryptographic approaches are used in scenarios where the ML model requires encrypted data (for training and testing purposes) from multiple parties. The widely used methods include homomorphic encryption, secret sharing, garbled circuits, and secure processors, which are briefly described next.

(a) Homomorphic Encryption: Homomorphic encryption enables computations on encrypted data through operations such as addition and multiplication, which can be used as building blocks for computing complex functions. Typically, the data is encrypted using the public keys of the original data owners.

(b) Garbled Circuits: Garbled circuits are used in cases where two parties (say, Alice and Bob) want a result computed over their private data. Alice sends the function in the form of a garbled circuit along with her garbled input. After obtaining the garbled version of his input from Alice in an oblivious fashion, Bob uses his garbled input with the garbled circuit to obtain the result of the required function, and can share it with Alice if required. The use of homomorphic encryption and garbled circuits to build cryptographic blocks for developing three classification techniques, namely Naïve Bayes, decision trees, and hyperplane decision, is presented in [121], where the goal is to protect ML models and new samples submitted for inference.
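The additive homomorphic property described in item (a) can be illustrated with a toy Paillier-style scheme. This is a sketch only: the primes are far too small for any real security, and production systems should rely on a vetted cryptographic library rather than hand-rolled code.

```python
# Toy Paillier-style additively homomorphic encryption (illustrative only).
import math, random

p, q = 17, 19                      # demo primes; real keys use ~2048-bit moduli
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)       # Carmichael's lambda for n = p*q
mu = pow(lam % n, -1, n)           # valid simplification because g = n + 1

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:     # blinding factor must be invertible mod n
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # L(x) = (x - 1) // n, the standard Paillier decryption helper
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# Multiplying ciphertexts adds the underlying plaintexts (mod n):
c1, c2 = encrypt(20), encrypt(35)
print(decrypt((c1 * c2) % n2))  # 55
```

The point of the demonstration is that a server holding only `c1` and `c2` can compute an encryption of the sum without ever seeing 20 or 35, which is the basic building block behind encrypted aggregation of, e.g., model updates or patient statistics.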
Fig. 6: A taxonomy of commonly used approaches for secure, private, and robust ML.
(c) Secret Sharing: The strategy of distributing a secret among multiple parties, each holding a "share" of the secret, is known as secret sharing. The secret can only be reconstructed when the individual shares are combined; otherwise, they are useless. In some settings, the secret is reconstructed using t shares (where t is a threshold value), which does not require all shares to be combined. A secret sharing paradigm for computing privacy-preserving parallelized principal component analysis (PCA) is presented in [122]. In a similar study [123], a protocol is developed using the secret sharing strategy for aggregating model updates received from multiple input parties; the updates are used for training the ML model. A privacy-preserving emotion recognition framework is presented in [124]. The authors used a multi-secret sharing scheme for transmitting audio-visual data collected from users' edge devices to the cloud, where a CNN and a sparse autoencoder were applied for feature extraction and a support vector machine (SVM) was used for emotion recognition.

(d) Secure Processors: Secure processors were originally developed to ensure the confidentiality and integrity of sensitive code against rogue software running at higher privilege levels. However, these processors are now also being utilized for privacy-preserving computation, e.g., the Intel SGX processor. For instance, Ohrimenko et al. developed an SGX-based data-oblivious system for k-means clustering, decision trees, SVM, and matrix factorization [125]. The key idea was to enable collaboration between multiple data owners running the ML task in an SGX-enabled data center. All communications between the data owners and the enclave were performed by independently establishing a secure channel (i.e., an individual channel for each data owner).

2) Differential Privacy: Differential privacy refers to the mechanism of adding perturbation to datasets to protect private data. The idea of adding adequate noise to a database for preserving privacy was first introduced by C. Dwork in 2006 [126]. Differential privacy constitutes a strong standard for guaranteeing privacy for algorithms performing analysis on aggregate databases, and it is defined in terms of the application-specific concept of neighboring datasets [127]. Differential privacy is particularly useful for applications like healthcare due to several of its properties, such as group privacy, composability, and robustness to auxiliary information. Group privacy implies graceful degradation of privacy guarantees when datasets contain correlated samples; composability enables modularity of algorithmic design, i.e., when the individual components are differentially private; and robustness to auxiliary information means that the privacy of the system will not be affected by any side information available to the adversary. To avoid privacy breaches, researchers can also explore encrypted and noisy datasets for building ML-empowered healthcare applications [128].

Various approaches for differential privacy have been proposed in the literature, e.g., private aggregation of teacher ensembles (PATE) for private ML [129], the differentially private stochastic gradient descent (DP-SGD) algorithm [127], the moments accountant [130], hyperparameter selection [131], and the Laplace [132] and exponential-noise [133], [134] differential privacy mechanisms.
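The Laplace mechanism cited above can be sketched for a simple counting query. This is an illustrative toy, not from the paper: the patient records are fabricated placeholders, and the counting query is assumed to have L1 sensitivity 1, so noise drawn from Laplace(1/ε) yields ε-differential privacy for the released count.

```python
# Sketch of the Laplace mechanism for a sensitivity-1 counting query.
import math, random

def laplace_noise(scale):
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    """Release a noisy count; a count query changes by at most 1 when one
    record is added or removed, so scale = 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

patients = [{"age": 70, "diabetic": True}, {"age": 54, "diabetic": False},
            {"age": 63, "diabetic": True}]
print(private_count(patients, lambda r: r["diabetic"], epsilon=0.5))
```

Smaller values of ε give stronger privacy but noisier answers; repeated queries consume the privacy budget additively, which is exactly the composability property discussed above.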
For instance, privacy-preserving distributed DL for clinical data using differential privacy, incorporating the idea of cyclical weight transfer, is presented in [135].

3) Federated Learning: The idea of federated learning (FL) has recently been proposed by Google Inc. [136]. In FL, a shared ML model is built using distributed data from multiple devices, where each device trains the model on its local data and then shares the model parameters with the central model without sharing its actual data. An FL-based decentralized scheme using the iterative cluster primal-dual splitting (cPDS) algorithm to predict patients requiring hospitalization from large-scale EHRs of heart-related diseases is presented in [137]. In [138], simple vanilla, U-shaped, and vertically-partitioned-data configurations for split learning of DL models are presented; the proposed framework, named SplitNN, does not require sharing patients' critical data with the server. A framework for federated autonomous deep learning (FADL) using distributed EHRs is presented in [139].

B. Countermeasures Against Adversarial Attacks

In the recent literature, countermeasures against adversarial attacks are categorized into three classes: (1) modifying the model; (2) modifying the data; and (3) adding auxiliary model(s) [140]. A taxonomy of such methods is presented in Figure 7; they are discussed next.

1) Modifying Model: This class includes methods that modify the parameters or features of the trained ML model; widely used methods include the following:

• Defensive Distillation: The distillation of neural networks was first introduced by Hinton et al. as a method for transferring knowledge from a larger model to a smaller one [141]. The notion of network distillation was then adopted by Papernot et al. to defend against adversarial attacks, an approach known as defensive distillation [142]. The authors used the predicted labels of the first model as the labels of the input samples for training the original DL model. This strategy increases the robustness of the DL model to considerably small perturbations. However, in a later study, Carlini and Wagner demonstrated that their proposed adversarial attack (named the C&W attack) evades the defensive distillation method [143].

• Network Verification: Techniques that verify certain properties of DL models in response to input samples are known as network verification methods. The key goal is to restrain adversarial examples by checking whether the input satisfies or violates certain properties. In [144], such a method is proposed that uses the ReLU activation and satisfiability modulo theories (SMT) to make deep models resilient against adversarial attacks.

• Gradient Regularization: The idea of using input gradient regularization for defending against adversarial examples was proposed by Ross et al. [145]. They trained differentiable models while regularizing the variation of the output with respect to changes in the input, so that small adversarial perturbations could not affect the output of the DL models. However, this method increases the complexity of the training process by a factor of two.

• Classifier Robustifying: In this method, classification models are developed that are inherently robust to adversarial attacks, rather than building a detection strategy for such attacks. In [146], the authors exploited the uncertainty around adversarial examples and proposed a hybrid model utilizing Gaussian processes (GPs) with RBF kernels on top of DNNs to make them robust against adversarial attacks. In a similar study, a robust model is proposed for MNIST classification that uses analysis by synthesis through a learned class-conditional data distribution.

• Interpretable ML: This includes methods that aim at explaining and interpreting the outcomes of ML/DL models in order to robustify them against adversarial attacks. An approach utilizing the interpretability of deep models for the detection of adversarial examples in a face recognition task is presented in a recent study [147]. The key aspect of this method is that it identifies the critical neurons for the individual task by initiating bi-directional correspondence reasoning between the model's parameters and its attributes. The activation values of the identified neurons are then increased to augment the reasoning part, and the activation values of other neurons are decreased to mask the uninterpretable part. However, Nicholas Carlini demonstrated that this interpretability-based method is not resilient to untargeted adversarial examples generated using the L∞ norm [148].

• Masking ML Model: In a recent study [149], a method for secure learning is presented in which the problem of adversarial ML is formulated as a learning and masking problem. The masking of the deep model was performed by introducing noise into the logit output, which successfully defeated attacks with low distortions.

2) Modifying Data: This class includes methods that aim at modifying either the data or its features; commonly used methods are described next:

• Adversarial (Re-)training: This basic method was originally proposed by Goodfellow et al. for making deep models robust to adversarial examples [93]. In this method, the ML/DL models are trained (or re-trained) using an augmented training set that includes adversarial examples. Various studies have used this method for evaluating the robustness of DL classifiers on different datasets, e.g., MNIST [150] and ImageNet [144]. However, it has been reported in the literature that this method fails to defend against iterative adversarial perturbation generation methods like the basic iterative method (BIM) [151].

• Input Reconstruction: The method of transforming adversarial examples into legitimate ones by cleaning out the adversarial noise is known as input reconstruction; the transformed samples have no harmful effect on the inference of deep models. In [152], a denoising autoencoder is used for cleaning adversarial examples.

• Feature Squeezing: Xu et al. [153] proposed feature squeezing as a defense method against adversarial examples; it squeezes the input feature space that an adversary can exploit to construct adversarial examples.
Fig. 7: Taxonomy of Adversarial Defenses (Source: [140]). Defenses are categorized into three categories: (1) Modifying Data; (2) Modifying Model; and (3) Adding Auxiliary
Model(s).
To reduce the feature space available to an adversary, the authors combined heterogeneous feature vectors of the original feature space into a single space. Feature squeezing was performed at two levels: (1) smoothing the spatial domain using local and non-local operations, and (2) reducing the color bit depth. The performance evaluation of the proposed defense was carried out using eleven state-of-the-art adversarial perturbation generation methods on three benchmark datasets (i.e., CIFAR10, MNIST, and ImageNet). However, in a later study, the aforementioned defense method was found to be less effective [154].

• Feature Masking: The method of feature masking, proposed by Gao et al. [155], aims at masking the most sensitive features of the input, i.e., those susceptible to adversarial perturbations. The authors added a masking layer right before the classification layer (i.e., softmax) that sets the corresponding weights of the sensitive neurons to zero.

• Developing Adversarially Robust Features: To develop adversarially robust features, the connections between the metric of interest and a natural spectral geometric property of the dataset have been leveraged in [156]. Furthermore, the authors provided empirical evidence of the effectiveness of using a spectral approach for developing adversarially robust features.

• Manifold Projection: The method of projecting input samples onto the manifold learned by generative models is known as manifold projection. Song et al. [157] used generative models to clean adversarial noise (perturbations) from adversarial images; the cleaned images are then used as input to the unmodified model. In a similar study [158], generative adversarial networks (GANs) are used for cleaning adversarial noise.

3) Adding Auxiliary Model(s): In these methods, additional auxiliary ML/DL models are integrated to robustify the mainstream model; commonly used methods of this class are described in the following paragraphs:

• Adversarial Detection: In this method, an additional binary classifier, which can be regarded as the detector model, is trained to distinguish between adversarial and original samples [159], [160]. In [161], a simple DNN-based detector model is used for the detection of adversarial examples. Similarly, an outlier class has been introduced during the training of a deep model, which helps the model detect adversarial examples belonging to the outlier class.

• Ensembling Defenses: The literature suggests that adversarial examples can be constructed in a multi-faceted fashion. Therefore, to develop an efficient defense against such adversarial examples, multiple defense strategies can be integrated sequentially or in parallel [162].
The PixelDefend method is an excellent example of an ensemble defense in which the authors used an ensemble of two methods, i.e., adversarial detection and input reconstruction [157]. However, it has been shown that an ensemble of weak defenses does not necessarily increase the robustness of DL models to adversarial attacks [154].

• Using Generative ML Models: The idea of defending against adversarial attacks by utilizing generative models was first presented by Goodfellow et al. [93]; however, in the same study the authors presented an alternative hypothesis of ensemble training and argued that generative training is not sufficient. In [163], adversarial examples are cleaned using a GAN trained on the same dataset. In a similar study [164], a framework named Defense-GAN is presented that is trained on the distribution of legitimate samples; during the testing phase, Defense-GAN finds an output similar to the input but without the adversarial perturbations, which is then given as input to the original DL model.

C. Causal Models for Healthcare

Asking causal questions in healthcare is very challenging yet important, and ideally, causal inference requires experiments. In healthcare, however, this is not always possible: for example, if we want to figure out what will happen if a person takes drug A instead of drug B, we cannot experiment directly on the patient, as doing so is unethical and can have unintended consequences. Alternatively, retrospective observational data is leveraged to train models for making counterfactual predictions of what we would have observed had we run an experiment [165].

Causality can be approached in two foundational ways, i.e., potential outcomes and causal graphical models, both of which require manipulating reality. In predictive healthcare, the potential outcomes can correspond to treatments, actions, and interventions: if the total number of possible treatments is T, then we have T possible outcomes, and the unit of observation is a patient who receives one of the T treatments.

In the literature, different approaches have been presented for providing causal inference and reasoning in healthcare using classical models. For instance, a Gaussian-process-based counterfactual causal model has been presented in [165], and in a similar study the authors introduced the counterfactual Gaussian process (CGP) for predicting counterfactual future progression, arguing that a counterfactual model can provide reliable decision support [102]. The use of probabilistic graphical models to analyze causality in health conditions for identifying sleep apnea, Alzheimer's disease, and heart diseases is presented in [166]. A comprehensive review of graphical causal models can be found in a recent study [167].

D. Solutions to Address Distribution Shifts

To cater for the data distribution shift problem, various techniques have been proposed, which are discussed next.

1) Transfer Learning: The requirement of a large-scale dataset for training high-performing DL models can be partially mitigated using transfer learning. Transfer learning is a technique in which a model trained on a larger dataset is re-trained (fine-tuned) on the application-specific dataset (relatively smaller in size than the first one). The aim is to transfer the knowledge learned by the model from one domain (data distribution) to another [168]. However, transfer learning can be problematic for healthcare applications due to the requirements of sufficiently large data for the initial training and of good-quality data annotated by expert clinicians, such as radiologists, for the domain-specific fine-tuning.

2) Domain Adaptation: Domain adaptation is the method of learning a DL model while accounting for a shift between the training (often called the source domain) and test (often called the target domain) data distributions, i.e., the source domain and target domain distributions are different. Domain adaptation is a special case of transfer learning that can be particularly useful for medical image analysis tasks such as MRI segmentation [118], [169], chest X-ray classification [170], and multi-class Alzheimer's disease classification [171]. Different facets of domain adaptation have been proposed in the literature and can be broadly categorized as supervised, unsupervised, semi-supervised, and self-supervised domain adaptation methods, which are described below. Please note that the definition of domain adaptation is ambiguous, since it may refer to labeled data being available in the source or in the target domain; the definitions provided below for each method are those most commonly used in the literature [172].

(a) Supervised Domain Adaptation: This method is similar to a supervised learning strategy, with the only difference being that the source domain and target domain data have different distributions. Supervised domain adaptation is particularly useful when labeled data is available for the target domain; generally, the source domain also has labeled data.

(b) Unsupervised Domain Adaptation: In unsupervised domain adaptation, the source domain data is labeled and the target domain data is unlabeled. An unsupervised domain adaptation method using reverse flow and adversarial training for generating synthetic medical images is presented in [173]. In addition, the authors used self-regularization to preserve clinically-relevant features.

(c) Semi-supervised Domain Adaptation: In semi-supervised domain adaptation, labeled source domain data and partially labeled target domain data are used.

(d) Self-supervised Domain Adaptation: Self-supervised domain adaptation methods aim at learning visual models without manual labeling by training generic models on relatively simple auxiliary tasks (known as pretext tasks). The supervision is provided by modifying the original visual content (e.g., a set of images) according to known transformations (e.g., rotation), and then the model is
niques have been proposed in the literature (e.g., transfer trained to predict such transformations that serve as labels
learning and domain adaptation), which are described next. for the pretext tasks [174].
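To make the pretext-task idea in (d) concrete, the sketch below builds rotation labels with np.rot90 and trains a model to predict which of four rotations was applied. It is an illustrative toy only, not taken from the surveyed papers: the synthetic "images" and the minimal softmax classifier are our own assumptions, standing in for real medical images and a deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic unlabeled "images": 8x8 patches with a bright top-left corner,
# so orientation is learnable from pixel statistics (a toy stand-in for
# real, unlabeled medical images).
base = rng.random((200, 8, 8))
base[:, :4, :4] += 2.0

# Pretext task: rotate each image by k*90 degrees; k serves as the label,
# so no manual annotation is needed.
ks = rng.integers(0, 4, size=len(base))
X = np.stack([np.rot90(img, k) for img, k in zip(base, ks)])
X = X.reshape(len(X), -1)
X = (X - X.mean()) / X.std()
Y = np.eye(4)[ks]                      # one-hot rotation labels

# Minimal softmax classifier trained by gradient descent (stand-in for a
# deep feature extractor).
W = np.zeros((X.shape[1], 4))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (p - Y) / len(X)  # cross-entropy gradient step

acc = (np.argmax(X @ W, axis=1) == ks).mean()
print(f"rotation-prediction accuracy: {acc:.2f}")  # well above chance (0.25)
```

In practice, the features learned on the pretext task (here, the weights W) would be reused or fine-tuned on the target-domain task, which is the point of the self-supervised setup.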
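The potential-outcomes formulation of Section C (T treatments, but only one observed outcome per patient) can also be sketched in code. The example below uses a simple two-model ("T-learner") estimator on synthetic observational data; the data-generating process, variable names, and linear outcome models are all illustrative assumptions, not a method from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Synthetic observational data: one covariate per patient (e.g., severity),
# a binary treatment in {0 (drug A), 1 (drug B)}, and ONE observed outcome.
severity = rng.random(n)
treat = (rng.random(n) < 0.3 + 0.4 * severity).astype(int)  # confounded assignment
y0 = 1.0 + 2.0 * severity                 # potential outcome under drug A
y1 = 2.0 + 1.0 * severity                 # potential outcome under drug B
y_obs = np.where(treat == 1, y1, y0) + 0.1 * rng.standard_normal(n)

# T-learner: fit one outcome model per treatment arm, then predict BOTH
# potential outcomes for every patient, including the unobserved one.
def fit_linear(x, y):
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda q: coef[0] + coef[1] * q

m0 = fit_linear(severity[treat == 0], y_obs[treat == 0])
m1 = fit_linear(severity[treat == 1], y_obs[treat == 1])

# Counterfactual prediction: what WOULD patient i's outcome have been
# under the treatment they did not receive?
cate = m1(severity) - m0(severity)        # estimated individual effect
print(f"mean estimated effect: {cate.mean():.2f}")   # true average effect is 0.50
```

Here the counterfactual question ("drug A instead of drug B") is answered from retrospective data alone, which is exactly the setting the section describes; real clinical applications would additionally need to address confounding that is not fully captured by the observed covariates.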
data-driven methods with hypothesis-driven or model-based methods (based on subject-matter knowledge) and to bring scientific rigor to these studies. Properly designed experiments are also necessary for deriving causal explanations. Avenues for developing secure and robust ML solutions for healthcare that are scientifically robust and rigorous require further attention from the community.

VI. CONCLUSIONS

The use of machine learning (ML)/deep learning (DL) models for clinical applications has great potential to transform traditional healthcare service delivery. However, to ensure a secure and robust application of these models in clinical settings, different privacy and security challenges should be addressed. In this paper, we provided an overview of such challenges by formulating the ML pipeline in healthcare and by identifying different sources of vulnerabilities in it. We also discussed potential solutions to provide secure and privacy-preserving ML for security-critical applications like healthcare. Finally, we presented different open research problems that require further investigation.

REFERENCES

[1] S. Latif, J. Qadir, S. Farooq, and M. Imran, "How 5G wireless (and concomitant technologies) will revolutionize healthcare?" Future Internet, vol. 9, no. 4, p. 93, 2017.
[2] Z. Yan, Y. Zhan, Z. Peng, S. Liao, Y. Shinagawa, S. Zhang, D. N. Metaxas, and X. S. Zhou, "Multi-instance deep learning: Discover discriminative local anatomies for bodypart recognition," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1332–1343, 2016.
[3] M. Anthimopoulos, S. Christodoulidis, L. Ebner, A. Christe, and S. Mougiakakou, "Lung pattern classification for interstitial lung diseases using a deep convolutional neural network," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1207–1216, 2016.
[4] W. Shen, M. Zhou, F. Yang, C. Yang, and J. Tian, "Multi-scale convolutional neural networks for lung nodule classification," in International Conference on Information Processing in Medical Imaging. Springer, 2015, pp. 588–599.
[5] J. Schlemper, J. Caballero, J. V. Hajnal, A. Price, and D. Rueckert, "A deep cascade of convolutional neural networks for MR image reconstruction," in International Conference on Information Processing in Medical Imaging. Springer, 2017, pp. 647–658.
[6] J. Mehta and A. Majumdar, "RODEO: robust de-aliasing autoencoder for real-time medical image reconstruction," Pattern Recognition, vol. 63, pp. 499–510, 2017.
[7] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle, "Brain tumor segmentation with deep neural networks," Medical Image Analysis, vol. 35, pp. 18–31, 2017.
[8] K. Bourzac, "The computer will see you now," Nature, vol. 502, no. 3, pp. S92–S94, 2013.
[9] L. Xing, E. A. Krupinski, and J. Cai, "Artificial intelligence will soon change the landscape of medical physics research and practice," Medical Physics, vol. 45, no. 5, pp. 1791–1793, 2018.
[10] B. E. Bejnordi, M. Veta, P. J. Van Diest, B. Van Ginneken, N. Karssemeijer, G. Litjens, J. A. Van Der Laak, M. Hermsen, Q. F. Manson, M. Balkenhol et al., "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer," JAMA, vol. 318, no. 22, pp. 2199–2210, 2017.
[11] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya et al., "CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning," arXiv preprint arXiv:1711.05225, 2017.
[12] V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros et al., "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," JAMA, vol. 316, no. 22, pp. 2402–2410, 2016.
[13] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, p. 115, 2017.
[14] S. Latif, M. Asim, M. Usman, J. Qadir, and R. Rana, "Automating motion correction in multishot MRI using generative adversarial networks," published as workshop paper at the 32nd Conference on Neural Information Processing Systems (NIPS 2018), 2018.
[15] X.-W. Chen and X. Lin, "Big data deep learning: challenges and perspectives," IEEE Access, vol. 2, pp. 514–525, 2014.
[16] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.
[17] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, and T. Goldstein, "Poison frogs! targeted clean-label poisoning attacks on neural networks," in Advances in Neural Information Processing Systems, 2018, pp. 6103–6113.
[18] X. Yuan, P. He, Q. Zhu, and X. Li, "Adversarial examples: Attacks and defenses for deep learning," IEEE Transactions on Neural Networks and Learning Systems, 2019.
[19] S. G. Finlayson, J. D. Bowers, J. Ito, J. L. Zittrain, A. L. Beam, and I. S. Kohane, "Adversarial attacks on medical machine learning," Science, vol. 363, no. 6433, pp. 1287–1289, 2019.
[20] K. Papangelou, K. Sechidis, J. Weatherall, and G. Brown, "Toward an understanding of adversarial examples in clinical trials," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2018, pp. 35–51.
[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
[22] A. K. Pandey, P. Pandey, K. Jaiswal, and A. K. Sen, "Datamining clustering techniques in the prediction of heart disease using attribute selection method," Heart Disease, vol. 14, pp. 16–17, 2013.
[23] K. Polat and S. Güneş, "Prediction of hepatitis disease based on principal component analysis and artificial immune recognition system," Applied Mathematics and Computation, vol. 189, no. 2, pp. 1282–1291, 2007.
[24] M. Alloghani, D. Al-Jumeily, J. Mustafina, A. Hussain, and A. J. Aljaaf, "A systematic review on supervised and unsupervised machine learning algorithms for data science," in Supervised and Unsupervised Learning for Data Science. Springer, 2020, pp. 3–21.
[25] M. N. Sohail, J. Ren, and M. Uba Muhammad, "A euclidean group assessment on semi-supervised clustering for healthcare clinical implications based on real-life data," International Journal of Environmental Research and Public Health, vol. 16, no. 9, p. 1581, 2019.
[26] A. Zahin, R. Q. Hu et al., "Sensor-based human activity recognition for smart healthcare: A semi-supervised machine learning," in International Conference on Artificial Intelligence for Communications and Networks. Springer, 2019, pp. 450–472.
[27] D. Mahapatra, "Semi-supervised learning and graph cuts for consensus based medical image segmentation," Pattern Recognition, vol. 63, pp. 700–709, 2017.
[28] W. Bai, O. Oktay, M. Sinclair, H. Suzuki, M. Rajchl, G. Tarroni, B. Glocker, A. King, P. M. Matthews, and D. Rueckert, "Semi-supervised learning for network-based cardiac MR image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 253–260.
[29] R. S. Sutton, A. G. Barto et al., Introduction to Reinforcement Learning. MIT Press, Cambridge, 1998, vol. 2, no. 4.
[30] H.-C. Kao, K.-F. Tang, and E. Y. Chang, "Context-aware symptom checking for disease diagnosis using hierarchical reinforcement learning," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[31] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, p. 484, 2016.
[32] A. Collins and Y. Yao, "Machine learning approaches: Data integration for disease prediction and prognosis," in Applied Computational Genomics. Springer, 2018, pp. 137–141.
[33] P. Afshar, A. Mohammadi, and K. N. Plataniotis, "Brain tumor type classification via capsule networks," in 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 3129–3133.
[34] W. Zhu, C. Liu, W. Fan, and X. Xie, "DeepLung: Deep 3D dual path nets for automated pulmonary nodule detection and classification," in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018, pp. 673–681.
[35] P. B. Jensen, L. J. Jensen, and S. Brunak, "Mining electronic health records: towards better research applications and clinical care," Nature Reviews Genetics, vol. 13, no. 6, p. 395, 2012.
[36] Z. Wang, A. D. Shah, A. R. Tate, S. Denaxas, J. Shawe-Taylor, and H. Hemingway, "Extracting diagnoses and investigation results from unstructured text in electronic health records by semi-supervised machine learning," PLoS One, vol. 7, no. 1, p. e30412, 2012.
[37] T. Zheng, W. Xie, L. Xu, X. He, Y. Zhang, M. You, G. Yang, and Y. Chen, "A machine learning-based framework to identify type 2 diabetes through electronic health records," International Journal of Medical Informatics, vol. 97, pp. 120–127, 2017.
[38] B. Nestor, M. McDermott, W. Boag, G. Berner, T. Naumann, M. C. Hughes, A. Goldenberg, and M. Ghassemi, "Feature robustness in non-stationary health records: caveats to deployable model performance in common clinical machine learning tasks," arXiv preprint arXiv:1908.00690, 2019.
[39] S. M. Anwar, M. Majid, A. Qayyum, M. Awais, M. Alnowami, and M. K. Khan, "Medical image analysis using convolutional neural networks: a review," Journal of Medical Systems, vol. 42, no. 11, p. 226, 2018.
[40] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, "Compressed sensing MRI," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 72–82, 2008.
[41] L. Gondara, "Medical image denoising using convolutional denoising autoencoders," in 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW). IEEE, 2016, pp. 241–246.
[42] Y. Chen, Y. Xie, Z. Zhou, F. Shi, A. G. Christodoulou, and D. Li, "Brain MRI super resolution using 3D deep densely connected neural networks," in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 739–742.
[43] K. Sirinukunwattana, S. e Ahmed Raza, Y.-W. Tsang, D. R. Snead, I. A. Cree, and N. M. Rajpoot, "Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1196–1206, 2016.
[44] H. Wang, A. C. Roa, A. N. Basavanhally, H. L. Gilmore, N. Shih, M. Feldman, J. Tomaszewski, F. Gonzalez, and A. Madabhushi, "Mitosis detection in breast cancer pathology images by combining handcrafted and convolutional neural network features," Journal of Medical Imaging, vol. 1, no. 3, p. 034003, 2014.
[45] Y. Yu, H. Lin, J. Meng, X. Wei, H. Guo, and Z. Zhao, "Deep transfer learning for modality classification of medical images," Information, vol. 8, no. 3, p. 91, 2017.
[46] J. Antony, K. McGuinness, N. E. O'Connor, and K. Moran, "Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks," in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 1195–1200.
[47] E. Kim, M. Corte-Real, and Z. Baloch, "A deep semantic mobile application for thyroid cytopathology," in Medical Imaging 2016: PACS and Imaging Informatics: Next Generation and Innovations, vol. 9789. International Society for Optics and Photonics, 2016, p. 97890A.
[48] M. F. Stollenga, W. Byeon, M. Liwicki, and J. Schmidhuber, "Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation," in Advances in Neural Information Processing Systems, 2015, pp. 2998–3006.
[49] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[50] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 565–571.
[51] M. H. Hesamian, W. Jia, X. He, and P. Kennedy, "Deep learning techniques for medical image segmentation: Achievements and challenges," Journal of Digital Imaging, pp. 1–15, 2019.
[52] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, "Low-dose CT with a residual encoder-decoder convolutional neural network," IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
[53] M. Usman, S. Latif, M. Asim, and J. Qadir, "Motion corrected multishot MRI reconstruction using generative networks with sensitivity encoding," arXiv preprint arXiv:1902.07430, 2019.
[54] F. E.-Z. A. El-Gamal, M. Elmogy, and A. Atwan, "Current trends in medical image registration and fusion," Egyptian Informatics Journal, vol. 17, no. 1, pp. 99–124, 2016.
[55] J. Ker, L. Wang, J. Rao, and T. Lim, "Deep learning applications in medical image analysis," IEEE Access, vol. 6, pp. 9375–9389, 2017.
[56] X. Yang, R. Kwitt, M. Styner, and M. Niethammer, "Quicksilver: Fast predictive image registration–a deep learning approach," NeuroImage, vol. 158, pp. 378–396, 2017.
[57] S. Miao, Z. J. Wang, and R. Liao, "A CNN regression approach for real-time 2D/3D registration," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1352–1363, 2016.
[58] D. Shen, G. Wu, and H.-I. Suk, "Deep learning in medical image analysis," Annual Review of Biomedical Engineering, vol. 19, pp. 221–248, 2017.
[59] A. Qayyum, S. M. Anwar, M. Awais, and M. Majid, "Medical image retrieval using deep convolutional neural network," Neurocomputing, vol. 266, pp. 8–20, 2017.
[60] J. Zech, M. Pain, J. Titano, M. Badgeley, J. Schefflein, A. Su, A. Costa, J. Bederson, J. Lehar, and E. K. Oermann, "Natural language–based machine learning models for the annotation of clinical radiology reports," Radiology, vol. 287, no. 2, pp. 570–580, 2018.
[61] B. Jing, P. Xie, and E. Xing, "On the automatic generation of medical imaging reports," 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[62] X. Wang, Y. Peng, L. Lu, Z. Lu, and R. M. Summers, "TieNet: Text-image embedding network for common thorax disease classification and reporting in chest X-rays," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9049–9058.
[63] Y. Xue, T. Xu, L. R. Long, Z. Xue, S. Antani, G. R. Thoma, and X. Huang, "Multimodal recurrent model with attention for automated radiology report generation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 457–466.
[64] V. Jindal, "Integrating mobile and cloud for PPG signal selection to monitor heart rate during intensive physical exercise," in Proceedings of the International Conference on Mobile Software Engineering and Systems. ACM, 2016, pp. 36–37.
[65] F. Attal, S. Mohammed, M. Dedabrishvili, F. Chamroukhi, L. Oukhellou, and Y. Amirat, "Physical human activity recognition using wearable sensors," Sensors, vol. 15, no. 12, pp. 31314–31338, 2015.
[66] S. F. Weng, J. Reps, J. Kai, J. M. Garibaldi, and N. Qureshi, "Can machine-learning improve cardiovascular risk prediction using routine clinical data?" PLoS One, vol. 12, no. 4, p. e0174944, 2017.
[67] M. Fatima and M. Pasha, "Survey of machine learning algorithms for disease diagnostic," Journal of Intelligent Learning Systems and Applications, vol. 9, no. 01, p. 1, 2017.
[68] J. A. Cruz and D. S. Wishart, "Applications of machine learning in cancer prediction and prognosis," Cancer Informatics, vol. 2, p. 117693510600200030, 2006.
[69] H.-Y. Ma, Z. Zhou, S. Wu, Y.-L. Wan, and P.-H. Tsui, "A computer-aided diagnosis scheme for detection of fatty liver in vivo based on ultrasound kurtosis imaging," Journal of Medical Systems, vol. 40, no. 1, p. 33, 2016.
[70] Z. Zhang et al., "Reinforcement learning in clinical medicine: a method to optimize dynamic treatment regime over time," Annals of Translational Medicine, vol. 7, no. 14, 2019.
[71] A. Raghu, "Reinforcement learning for sepsis treatment: Baselines and analysis," 2019.
[72] H. Suresh, "Clinical event prediction and understanding with deep neural networks," Ph.D. dissertation, Massachusetts Institute of Technology, 2017.
[73] C.-S. Rau, P.-J. Kuo, P.-C. Chien, C.-Y. Huang, H.-Y. Hsieh, and C.-H. Hsieh, "Mortality prediction in patients with isolated moderate and severe traumatic brain injury using machine learning models," PLoS One, vol. 13, no. 11, p. e0207192, 2018.
[74] H. Song, D. Rajan, J. J. Thiagarajan, and A. Spanias, "Attend and diagnose: Clinical time series analysis using attention models," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[75] O. Ren, A. E. Johnson, E. P. Lehman, M. Komorowski, J. Aboab, F. Tang, Z. Shahn, D. Sow, R. Mark, and L.-w. Lehman, "Predicting and understanding unexpected respiratory decompensation in critical care using sparse and heterogeneous clinical data," in 2018 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, 2018, pp. 144–151.
[76] A. K. Jha, "The promise of electronic records: around the corner or down the road?" JAMA, vol. 306, no. 8, pp. 880–881, 2011.
[77] A. Névéol, H. Dalianis, S. Velupillai, G. Savova, and P. Zweigenbaum, "Clinical natural language processing in languages other than English: opportunities and challenges," Journal of Biomedical Semantics, vol. 9, no. 1, p. 12, 2018.
[78] E. Soysal, J. Wang, M. Jiang, Y. Wu, S. Pakhomov, H. Liu, and H. Xu, "CLAMP–a toolkit for efficiently building customized clinical natural
language processing pipelines," Journal of the American Medical Informatics Association, vol. 25, no. 3, pp. 331–336, 2017.
[79] D. S. Wallace, "The role of speech recognition in clinical documentation," Nuance Communications, 2018, accessed on: 14-Dec-2019. [Online]. Available: https://www.hisa.org.au/slides/hic18/wed/SimonWallace.pdf
[80] M. Ghassemi, J. H. Van Stan, D. D. Mehta, M. Zañartu, H. A. Cheyne II, R. E. Hillman, and J. V. Guttag, "Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules," IEEE Transactions on Biomedical Engineering, vol. 61, no. 6, pp. 1668–1675, 2014.
[81] C. Pou-Prom and F. Rudzicz, "Learning multiview embeddings for assessing dementia," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2812–2817.
[82] K. C. Fraser, J. A. Meltzer, and F. Rudzicz, "Linguistic features identify Alzheimer's disease in narrative speech," Journal of Alzheimer's Disease, vol. 49, no. 2, pp. 407–422, 2016.
[83] J. B. Andre, B. W. Bresnahan, M. Mossa-Basha, M. N. Hoff, C. P. Smith, Y. Anzai, and W. A. Cohen, "Toward quantifying the prevalence, severity, and cost associated with patient motion during clinical MR examinations," Journal of the American College of Radiology, vol. 12, no. 7, pp. 689–695, 2015.
[84] A. K. Manrai, G. Bhatia, J. Strymish, I. S. Kohane, and S. H. Jain, "Medicine's uncomfortable relationship with math: calculating positive predictive value," JAMA Internal Medicine, vol. 174, no. 6, pp. 991–993, 2014.
[85] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, "Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 1721–1730.
[86] X. A. Li, A. Tai, D. W. Arthur, T. A. Buchholz, S. Macdonald, L. B. Marks, J. M. Moran, L. J. Pierce, R. Rabinovitch, A. Taghian et al., "Variability of target and normal structure delineation for breast cancer radiotherapy: an RTOG multi-institutional and multiobserver study," International Journal of Radiation Oncology* Biology* Physics, vol. 73, no. 3, pp. 944–951, 2009.
[87] F. Xia and M. Yetisgen-Yildiz, "Clinical corpus annotation: challenges and strategies," in Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM'2012) in conjunction with the International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 2012.
[88] B. Biggio, B. Nelson, and P. Laskov, "Poisoning attacks against support vector machines," in 29th International Conference on Machine Learning, 2012, pp. 1807–1814.
[89] S. Alfeld, X. Zhu, and P. Barford, "Data poisoning attacks against autoregressive models," in Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[90] N. Papernot, P. McDaniel, A. Sinha, and M. Wellman, "Towards the science of security and privacy in machine learning," arXiv preprint arXiv:1611.03814, 2016.
[91] T. J. Pollard, I. Chen, J. Wiens, S. Horng, D. Wong, M. Ghassemi, H. Mattie, E. Lindmeer, and T. Panch, "Turning the crank for machine learning: ease, at what expense?" The Lancet Digital Health, vol. 1, no. 5, pp. e198–e199, 2019.
[92] M. Fredrikson, S. Jha, and T. Ristenpart, "Model inversion attacks that exploit confidence information and basic countermeasures," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 2015, pp. 1322–1333.
[93] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.
[94] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, "The limitations of deep learning in adversarial settings," in 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2016, pp. 372–387.
[95] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 2017, pp. 506–519.
[96] M. Usama, J. Qadir, A. Al-Fuqaha, and M. Hamdi, "The adversarial machine learning conundrum: Can the insecurity of ML become the Achilles' heel of cognitive networks?" arXiv preprint arXiv:1906.00679, 2019.
[97] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli, "Evasion attacks against machine learning at test time," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2013, pp. 387–402.
[98] M. Mozaffari-Kermani, S. Sur-Kolay, A. Raghunathan, and N. K. Jha, "Systematic poisoning attacks on and defenses for machine learning in healthcare," IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 6, pp. 1893–1905, 2014.
[99] S. G. Finlayson, H. W. Chung, I. S. Kohane, and A. L. Beam, "Adversarial attacks against medical deep learning systems," arXiv preprint arXiv:1804.05296, 2018.
[100] M. Al-Rubaie and J. M. Chang, "Privacy-preserving machine learning: Threats and solutions," IEEE Security & Privacy, vol. 17, no. 2, pp. 49–58, 2019.
[101] J. Zhang and E. Bareinboim, "Fairness in decision-making: the causal explanation formula," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[102] P. Schulam and S. Saria, "Reliable decision support using counterfactual models," in Advances in Neural Information Processing Systems, 2017, pp. 1697–1708.
[103] M. Ghassemi, T. Naumann, P. Schulam, A. L. Beam, and R. Ranganath, "Opportunities in machine learning for healthcare," arXiv preprint arXiv:1806.00388, 2018.
[104] E. Begoli, T. Bhattacharya, and D. Kusnezov, "The need for uncertainty quantification in machine-assisted medical decision making," Nature Machine Intelligence, vol. 1, no. 1, p. 20, 2019.
[105] A. Khademi, S. Lee, D. Foley, and V. Honavar, "Fairness in algorithmic decision making: An excursion through the lens of causality," in The World Wide Web Conference. ACM, 2019, pp. 2907–2914.
[106] N. Kilbertus, M. R. Carulla, G. Parascandolo, M. Hardt, D. Janzing, and B. Schölkopf, "Avoiding discrimination through causal reasoning," in Advances in Neural Information Processing Systems, 2017, pp. 656–666.
[107] L. Faes, S. K. Wagner, D. J. Fu, X. Liu, E. Korot, J. R. Ledsam, T. Back, R. Chopra, N. Pontikos, C. Kern et al., "Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study," The Lancet Digital Health, vol. 1, no. 5, pp. e232–e242, 2019.
[108] O'Reilly, "Challenges to AI in healthcare," accessed online: 16 Oct. 2019.
[109] B. David, R. Dowsley, R. Katti, and A. C. Nascimento, "Efficient unconditionally secure comparison and privacy preserving machine learning classification protocols," in International Conference on Provable Security. Springer, 2015, pp. 354–367.
[110] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li, "Manipulating machine learning: Poisoning attacks and countermeasures for regression learning," in 2018 IEEE Symposium on Security and Privacy (SP). IEEE, 2018, pp. 19–35.
[111] M. Liu, H. Jiang, J. Chen, A. Badokhon, X. Wei, and M.-C. Huang, "A collaborative privacy-preserving deep learning system in distributed mobile environment," in 2016 International Conference on Computational Science and Computational Intelligence (CSCI). IEEE, 2016, pp. 192–197.
[112] D. Malathi, R. Logesh, V. Subramaniyaswamy, V. Vijayakumar, and A. K. Sangaiah, "Hybrid reasoning-based privacy-aware disease prediction support system," Computers & Electrical Engineering, vol. 73, pp. 114–127, 2019.
[113] H. Takabi, E. Hesamifard, and M. Ghasemi, "Privacy preserving multi-party machine learning with homomorphic encryption," in 29th Annual Conference on Neural Information Processing Systems (NIPS), 2016.
[114] M. Kim, Y. Song, S. Wang, Y. Xia, and X. Jiang, "Secure logistic regression based on homomorphic encryption: Design and evaluation," JMIR Medical Informatics, vol. 6, no. 2, p. e19, 2018.
[115] I. Chen, F. D. Johansson, and D. Sontag, "Why is my classifier discriminatory?" in Advances in Neural Information Processing Systems, 2018, pp. 3539–3550.
[116] M. Ghassemi, T. Naumann, P. Schulam, A. L. Beam, I. Y. Chen, and R. Ranganath, "Practical guidance on artificial intelligence for healthcare data," The Lancet Digital Health, vol. 1, no. 4, pp. e157–e159, 2019.
[117] T. Panch, H. Mattie, and L. A. Celi, "The inconvenient truth about AI in healthcare," npj Digital Medicine, vol. 2, no. 1, pp. 1–3, 2019.
[118] C. S. Perone, P. Ballester, R. C. Barros, and J. Cohen-Adad, "Unsupervised domain adaptation for medical imaging segmentation with self-ensembling," NeuroImage, vol. 194, pp. 1–11, 2019.
[119] A. Narayanan and V. Shmatikov, "Robust de-anonymization of large datasets (how to break anonymity of the Netflix prize dataset)," University of Texas at Austin, 2008.
[120] P. Mohassel and Y. Zhang, "SecureML: A system for scalable privacy-preserving machine learning," in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 19–38.
[121] R. Bost, R. A. Popa, S. Tu, and S. Goldwasser, "Machine learning classification over encrypted data," in NDSS, vol. 4324, 2015, p. 4325.
[122] D. Bogdanov, L. Kamm, S. Laur, and V. Sokk, "Implementation and evaluation of an algorithm for cryptographically private principal component analysis on genomic data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 15, no. 5, pp. 1427–1432, 2018.
[123] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, "Practical secure aggregation for privacy-preserving machine learning," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 1175–1191.
[124] M. S. Hossain and G. Muhammad, "Emotion recognition using secure edge and cloud computing," Information Sciences, vol. 504, pp. 589–601, 2019.
[125] O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin, K. Vaswani, and M. Costa, "Oblivious multi-party machine learning on trusted processors," in 25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 619–636.
[126] C. Dwork, "Differential privacy," Encyclopedia of Cryptography and Security, pp. 338–340, 2011.
[127] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, "Deep learning with differential privacy," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 308–318.
[128] M. McDermott, S. Wang, N. Marinsek, R. Ranganath, M. Ghassemi, and L. Foschini, "Reproducibility in machine learning for health," presented at the International Conference on Learning Representations (ICLR) 2019 Reproducibility in Machine Learning Workshop, 2019.
[129] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and Ú. Erlingsson, "Scalable private learning with PATE," International Conference on Learning Representations (ICLR), 2018.
[130] Y.-X. Wang, B. Balle, and S. Kasiviswanathan, "Subsampled Rényi differential privacy and analytical moments accountant," arXiv preprint arXiv:1808.00087, 2018.
[131] H. B. McMahan, G. Andrew, U. Erlingsson, S. Chien, I. Mironov, N. Papernot, and P. Kairouz, "A general approach to adding differential privacy to iterative training procedures," NeurIPS 2018 workshop on
[143] N. Carlini and D. Wagner, "Adversarial examples are not easily detected: Bypassing ten detection methods," in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. ACM, 2017, pp. 3–14.
[144] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer, "Reluplex: An efficient SMT solver for verifying deep neural networks," in International Conference on Computer Aided Verification. Springer, 2017, pp. 97–117.
[145] A. S. Ross and F. Doshi-Velez, "Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[146] J. Bradshaw, A. G. d. G. Matthews, and Z. Ghahramani, "Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks," arXiv preprint arXiv:1707.02476, 2017.
[147] G. Tao, S. Ma, Y. Liu, and X. Zhang, "Attacks meet interpretability: Attribute-steered detection of adversarial samples," in Advances in Neural Information Processing Systems (NeurIPS), 2018, pp. 7717–7728.
[148] N. Carlini, "Is AmI (attacks meet interpretability) robust to adversarial examples?" arXiv preprint arXiv:1902.02322, 2019.
[149] L. Nguyen, S. Wang, and A. Sinha, "A learning and masking approach to secure learning," in International Conference on Decision and Game Theory for Security. Springer, 2018, pp. 453–464.
[150] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári, "Learning with a strong adversary," arXiv preprint arXiv:1511.03034, 2015.
[151] A. Kurakin, I. J. Goodfellow, and S. Bengio, "Adversarial examples in the physical world," in Artificial Intelligence Safety and Security. Chapman and Hall/CRC, 2018, pp. 99–112.
[152] S. Gu and L. Rigazio, "Towards deep neural network architectures robust to adversarial examples," published as a workshop paper at the International Conference on Learning Representations (ICLR), 2015.
[153] W. Xu, D. Evans, and Y. Qi, "Feature squeezing: Detecting adversarial examples in deep neural networks," in 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018, 2018. [Online]. Available: http://wp.internetsociety.org/ndss/wp-content/uploads/
sites/25/2018/02/ndss2018 03A-4 Xu paper.pdf
Privacy Preserving Machine Learning, 2018.
[154] W. He, J. Wei, X. Chen, N. Carlini, and D. Song, “Adversarial example
[132] N. Phan, X. Wu, H. Hu, and D. Dou, “Adaptive laplace mechanism:
defense: Ensembles of weak defenses are not strong,” in 11th USENIX
Differential privacy preservation in deep learning,” in 2017 IEEE
Workshop on Offensive Technologies (WOOT)’17), 2017.
International Conference on Data Mining (ICDM). IEEE, 2017, pp.
[155] J. Gao, B. Wang, Z. Lin, W. Xu, and Y. Qi, “Deepcloak: Masking deep
385–394.
neural network models for robustness against adversarial samples,”
[133] F. McSherry and K. Talwar, “Mechanism design via differential pri-
arXiv preprint arXiv:1702.06763, 2017.
vacy.” in FOCS, vol. 7, 2007, pp. 94–103.
[134] C. Dwork and F. D. McSherry, “Exponential noise distribution to [156] S. Garg, V. Sharan, B. Zhang, and G. Valiant, “A spectral view
optimize database privacy and output utility,” Jul. 14 2009, uS Patent of adversarially robust features,” in Advances in Neural Information
7,562,071. Processing Systems (NeurlIPS), 2018, pp. 10 159–10 169.
[135] B. K. Beaulieu-Jones, W. Yuan, S. G. Finlayson, and Z. S. Wu, [157] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman, “Pix-
“Privacy-preserving distributed deep learning for clinical data,” Ma- eldefend: Leveraging generative models to understand and defend
chine Learning for Health (ML4H) Workshop at NeurIPS, 2018. against adversarial examples,” in International Conference on Learning
[136] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., Representations (ICLR), 2018. [Online]. Available: https://openreview.
“Communication-efficient learning of deep networks from decentral- net/forum?id=rJUYGxbCW
ized data,” Proceedings of the 20 th International Conference on [158] G. Jin, S. Shen, D. Zhang, F. Dai, and Y. Zhang, “APE-GAN:
Artificial Intelligence and Statistics (AISTATS) JMLR: W&CP volume adversarial perturbation elimination with GAN,” in ICASSP 2019-
54, 2017. 2019 IEEE International Conference on Acoustics, Speech and Signal
[137] T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis, Processing (ICASSP). IEEE, 2019, pp. 3842–3846.
and W. Shi, “Federated learning of predictive models from federated [159] J. Lu, T. Issaranon, and D. Forsyth, “Safetynet: Detecting and rejecting
electronic health records,” International journal of medical informatics, adversarial examples robustly,” in Proceedings of the IEEE Interna-
vol. 112, pp. 59–67, 2018. tional Conference on Computer Vision, 2017, pp. 446–454.
[138] P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar, “Split learning [160] D. Gopinath, G. Katz, C. S. Pasareanu, and C. Barrett, “Deepsafe:
for health: Distributed deep learning without sharing raw patient A data-driven approach for checking adversarial robustness in neural
data,” Published as Workshop Paper at 32nd Conference on Neural networks,” arXiv preprint arXiv:1710.00486, 2017.
Information Processing Systems (NIPS 2018), 2018. [161] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff, “On detect-
[139] D. Liu, T. Miller, R. Sayeed, and K. Mandl, “Fadl: Federated- ing adversarial perturbations,” International Conference on Learning
autonomous deep learning for distributed electronic health record,” Representations (ICLR), 2017.
Machine Learning for Health (ML4H) Workshop at NeurIPS, 2018. [162] F. Tramer, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and
[140] A. Qayyum, M. Usama, J. Qadir, and A. Al-Fuqaha, “Securing con- P. McDaniel, “Ensemble adversarial training: Attacks and defenses,” in
nected & autonomous vehicles: Challenges posed by adversarial ma- International Conference on Learning Representations (ICLR), 2018.
chine learning and the way forward,” arXiv preprint arXiv:1905.12762, [163] G. K. Santhanam and P. Grnarova, “Defending against adversarial at-
2019. tacks by leveraging an entire GAN,” arXiv preprint arXiv:1805.10652,
[141] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a 2018.
neural network,” Deep Learning Workshop, NIPS, 2014. [164] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-GAN: Protect-
[142] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation ing classifiers against adversarial attacks using generative models,” in
as a defense to adversarial perturbations against deep neural networks,” International Conference on Learning Representations (ICLR), 2018.
in 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016, [165] P. Schulam and S. Saria, “What-if reasoning with counterfactual
pp. 582–597. gaussian processes,” History, vol. 100, p. 120, 2017.