What Do We Need To Build Explainable AI Systems For The Medical Domain?
Andreas Holzinger (1), Chris Biemann (2), Constantinos S. Pattichis (3), Douglas B. Kell (4)
1 Holzinger Group HCI-KDD, Inst. for Med. Informatics/Statistics, Medical University Graz, Austria
2 Language Technology Group LT, Department of Informatics, University of Hamburg, Germany
3 E-Health Laboratory, Department of Computer Science, University of Cyprus, Cyprus
4 School of Chemistry and Manchester Institute of Biotechnology, University of Manchester, UK
Abstract
Artificial intelligence (AI) generally, and machine learning (ML) specifically, demonstrate impressive practical success in many different application domains, e.g. in autonomous driving, speech recognition, or recommender systems. Deep learning approaches, trained on extremely large data sets or using reinforcement learning methods, have even exceeded human performance in visual tasks, particularly in playing games such as Atari or mastering the game of Go. Even in the medical domain there are remarkable results. The central problem of such models, however, is that they are regarded as black-box models: even if we understand the underlying mathematical principles, they lack an explicit declarative knowledge representation and hence have difficulty in generating the underlying explanatory structures. This calls for systems that make decisions transparent, understandable and explainable. A strong motivation for our approach is the rise of legal and privacy requirements. The new European General Data Protection Regulation (GDPR and ISO/IEC 27001), entering into force on May 25th, 2018, will make black-box approaches difficult to use in business. This does not imply a ban on automatic learning approaches or an obligation to explain everything all the time; however, there must be the possibility to make the results re-traceable on demand. This is beneficial, e.g., for general understanding, for teaching, for learning and for research, and it can be helpful in court. In this paper we outline some of our research topics in the context of the relatively new area of explainable-AI, with a focus on the application in medicine, which is a very special domain. This is due to the fact that medical professionals work mostly with distributed, heterogeneous and complex sources of data. In this paper we concentrate on three sources: images, *omics data and text. We argue that research in explainable-AI would generally help to facilitate the implementation of AI/ML in the medical domain, and specifically help to facilitate transparency and trust.
Artificial intelligence (AI) has a long tradition in computer science and experienced many
ups and downs since its formal introduction as an academic discipline six decades ago (Hol-
land, 1992; Russell and Norvig, 1995). The field recently gained enormous interest, mostly
due to the practical successes in Machine Learning (ML). The grand goal of AI is to provide
the theoretical fundamentals for ML to develop software that can learn autonomously from data.
2. Explainability
The problem of explainability is as old as AI itself and may even be a result of AI: whilst AI approaches demonstrate impressive practical success in many different application domains, their effectiveness is still limited by their inability to "explain" their decisions in an understandable way (Core et al., 2006). Even if we understand the underlying mathematical theories, it is complicated and often impossible to get insight into the internal working of the models and to explain how and why a result was achieved. Explainable-AI is a rapidly emerging research area with increasing visibility in the popular press2 and even the daily press3.
In the pioneering days of AI (Newell et al., 1958), the predominant reasoning methods
were logical and symbolic. These early AI systems reasoned by performing some form of
logical inference on human readable symbols. Interestingly, these early systems were able to
provide a trace of their inference steps and became the basis for explanation. There is some
related work available on how to make such systems explainable (Shortliffe and Buchanan,
1975; Swartout et al., 1991; Johnson, 1994; Lacave and Diez, 2002).
In the medical domain there is a growing demand for AI approaches that are not only well-performing, but also trustworthy, transparent, interpretable and explainable. Methods and
models are necessary to reenact the machine decision-making process, to reproduce and to
comprehend both the learning and knowledge extraction process. This is important, because
for decision support it is necessary to understand the causality of learned representations
(Pearl, 2009; Gershman et al., 2015; Peters et al., 2017).
Understanding, interpreting, and explaining are often used synonymously in the context of explainable-AI (Doran et al., 2017), and various techniques of interpretation have been applied in the past. There is a nice discussion of the "myth of model interpretability" by Lipton (2016). In the context of explainable-AI, the term "understanding" usually means a functional understanding of the model, in contrast to a low-level algorithmic understanding of it, i.e. one seeks to characterize the model's black-box behavior without trying to elucidate its inner workings or its internal representations. Montavon et al. (2017) discriminate in their work between interpretation, which they define as a mapping of an abstract concept into a domain that the human expert can perceive and comprehend, and explanation, which they define as a collection of features of the interpretable domain that have contributed, for a given example, to producing a decision.
We argue that in the medical domain, something like explainable medicine would be
urgently needed for many purposes including medical education, research and clinical de-
cision making. If medical professionals are complemented by sophisticated AI systems and
in some cases even overruled, the human experts must still have a chance, on demand, to
understand and to retrace the machine decision process. However, we also point out that
it is often assumed that humans are able to explain their decisions. This is often not the case; sometimes experts are not able, or not willing, to provide an explanation.
Explainable-AI calls for confidence, safety, security, privacy, ethics, fairness and trust
(Kieseberg et al., 2016), and puts usability on the research agenda, too (Miller et al., 2017).
All these aspects together are crucial for applicability in medicine generally, and for future
personalized medicine specifically (Hamburg and Collins, 2010).
2. https://www.computerworld.com.au/article/617359
3. https://www.nytimes.com/2017/11/21/magazine/can-ai-be-taught-to-explain-itself.html
3. Explainable Models
We can distinguish two types of explainability/interpretability, which can be denominated with Latin terms used in law (Fellmeth and Horwitz, 2009): post-hoc explainability ("after this", occurring after the event in question), e.g. explaining what the model predicts in terms of what is readily interpretable; and ante-hoc explainability ("before this", occurring before the event in question), e.g. incorporating explainability directly into the structure of an AI model, i.e. explainability by design.
Post-hoc systems aim to provide local explanations for a specific decision and to make it reproducible on demand (instead of explaining the whole system's behaviour). A representative example is LIME (Local Interpretable Model-agnostic Explanations), developed by Ribeiro et al. (2016b), which is a model-agnostic system, where x ∈ R^d is the original representation of an instance being explained, and x′ ∈ R^{d′} is used to denote a vector for its interpretable representation (e.g. x may be a feature vector containing word embeddings, with x′ being the bag of words). The goal is to identify an interpretable model over the interpretable representation that is locally faithful to the classifier. The explanation model is g : R^{d′} → R, g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees, or rule lists; given a model g ∈ G, it can be visualized as an explanation to the human expert (for details please refer to (Ribeiro et al., 2016a)).
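To make the local-surrogate idea concrete, the following minimal Python sketch fits a weighted linear model around a single instance of an arbitrary black-box classifier. It only illustrates the principle and is not the authors' implementation: the Gaussian perturbation scheme, the kernel width and the function names are illustrative assumptions, and in practice one would use the official lime package.

import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(predict_proba, x, n_samples=5000, kernel_width=0.75, seed=0):
    # Fit a weighted linear surrogate g around instance x to approximate the black box.
    rng = np.random.default_rng(seed)
    # Perturb the instance; here simple Gaussian noise around x (LIME for text or
    # images instead perturbs an interpretable binary representation x').
    Z = x + rng.normal(scale=x.std() + 1e-6, size=(n_samples, x.shape[0]))
    y = predict_proba(Z)                              # black-box predictions f(z)
    # Proximity kernel: samples close to x get a higher weight (local faithfulness).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    g = Ridge(alpha=1.0)                              # interpretable model g in G
    g.fit(Z, y, sample_weight=weights)
    return g.coef_                                    # per-feature contributions around x

# Usage with any classifier exposing predict_proba, e.g. a scikit-learn model:
# contributions = explain_locally(lambda Z: clf.predict_proba(Z)[:, 1], x_instance)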
Another example of a post-hoc system is BETA (Black Box Explanations through Transparent Approximations), a model-agnostic framework for explaining the behavior of any black-box classifier by simultaneously optimizing for fidelity to the original model and interpretability of the explanation, introduced by Lakkaraju et al. (2017). Bach et al. (2015) presented a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers, which allows one to visualize the contributions of single pixels to predictions for kernel-based classifiers over bag-of-words features and for multilayered neural networks.
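As a rough illustration of such pixel-wise (layer-wise) relevance propagation, the sketch below implements the epsilon rule for a small fully connected ReLU network in plain NumPy. It is a didactic simplification, not the reference implementation of Bach et al. (2015); the network structure and the epsilon stabilizer are assumptions of this sketch.

import numpy as np

def lrp_epsilon(weights, biases, x, target, eps=1e-6):
    # weights[i]: array of shape (d_in, d_out); biases[i]: array of shape (d_out,).
    # The last layer is treated as linear class scores; `target` selects the class.
    activations = [x]
    a = x
    for i, (W, b) in enumerate(zip(weights, biases)):      # forward pass, store activations
        z = a @ W + b
        a = z if i == len(weights) - 1 else np.maximum(0.0, z)
        activations.append(a)

    R = np.zeros_like(activations[-1])                     # relevance at the output layer:
    R[target] = activations[-1][target]                    # only the target class carries relevance

    for i in range(len(weights) - 1, -1, -1):              # backward pass
        a_in, W = activations[i], weights[i]
        z = a_in @ W + biases[i]
        z = z + eps * np.where(z >= 0, 1.0, -1.0)          # epsilon stabilizer
        s = R / z                                          # relevance per unit of pre-activation
        R = a_in * (s @ W.T)                               # redistribute proportionally to a_j * w_jk
    return R                                               # one relevance value per input feature (pixel)

# Usage (illustrative shapes): R = lrp_epsilon([W1, W2], [b1, b2], x, target=0)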
Ante-hoc systems are interpretable by design, i.e. they point towards glass-box approaches (Holzinger et al., 2017c); typical examples include linear regression, decision trees and fuzzy inference systems. The latter have a long tradition, can be designed from expert knowledge or from data, and provide, from the viewpoint of HCI, a good framework for the interaction between human expert knowledge and hidden knowledge in the data (Guillaume, 2001). A
further example was presented by Caruana et al. (2015), where high-performance generalized additive models with pairwise interactions (GAMs) were applied to problems from the medical domain, yielding intelligible models that uncovered surprising patterns in the data which had previously prevented complex learned models from being fielded in this domain. Importantly, they demonstrated the scalability of such methods to large data sets containing hundreds of thousands of patients and thousands of attributes, while remaining intelligible and providing accuracy comparable to the best (unintelligible) machine learning methods. A further example of ante-hoc methods can be seen in Poulin et al. (2006), who described a framework for visually explaining the decisions of any classifier that is formulated as an additive model, and showed how to implement this framework in the context of three models: naive Bayes, linear support vector machines and logistic regression, which they implemented successfully in a bioinformatics application (Szafron et al., 2004).
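The core of such additive evidence visualizations can be illustrated in a few lines of Python: for any classifier that is linear (or additive) in its features, the contribution of each feature to the decision score can be listed and plotted directly. The sketch below uses scikit-learn's logistic regression on a public demonstration data set; it shows the evidence-listing idea only and is not the framework of Poulin et al. (2006).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names

clf = LogisticRegression(max_iter=5000).fit(X, y)   # an additive (linear) model

x = X[0]                                            # one instance (patient) to explain
contributions = clf.coef_[0] * x                    # additive evidence w_i * x_i per feature
order = np.argsort(np.abs(contributions))[::-1]

print("decision score:", clf.decision_function([x])[0], "intercept:", clf.intercept_[0])
for i in order[:10]:                                # the ten strongest pieces of evidence
    print(f"{names[i]:>25s}: {contributions[i]:+.3f}")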
Activation maximization (AM) can be used as an analysis framework that searches for an input pattern producing a maximum model response for a specific quantity of interest (Berkes and Wiskott, 2006; Simonyan and Zisserman, 2014). Consider a neural network classifier mapping data points x to a set of classes (ω_c)_c. The output neurons encode the modeled class probabilities p(ω_c|x), and a prototype x* representative of the class ω_c can be found by optimizing

max_x log p(ω_c|x) − λ‖x‖²,

i.e. by maximizing the class probability under a simple ℓ2-norm regularizer. The regularizer can also be replaced by a data density model ("expert") p(x), yielding the objective max_x log p(ω_c|x) + log p(x), so that the prototype both strongly activates the class and is likely under the data.
In case d of Figure 1 we see the other extreme: the expert is overfitted on some data distribution, and thus the optimization problem becomes essentially the maximization of the expert p(x) itself.
[Figure 1 shows the resulting prototypes for four choices of the expert p(x): (a) no expert or an ℓ2-norm regularizer only, yielding an artificial-looking but unlikely prototype; (b) an underfitted expert, yielding a natural-looking prototype; (c) the true density p(x), yielding a natural-looking and likely prototype; (d) an overfitted expert, where the prototype represents p(x) instead of ω_c.]
Figure 1: Four cases illustrating how the "expert" p(x) affects the prototype x* found by activation maximization. The horizontal axis represents the input space and the vertical axis the probability (extreme case a: the expert is absent; extreme case d: the expert is overfitted). Image source: (Montavon et al., 2017).
When using activation maximization for the purpose of model validation, an overfitted expert (case d in Figure 1) must especially be avoided, as it could hide interesting failure modes of the model p(ω_c|x). A slightly underfitted expert (case b), e.g. one that simply favors images with natural colors, can already be sufficient. On the other hand, when using AM to gain knowledge about a correctly predicted concept ω_c, the focus should be on preventing underfitting. Indeed, an underfitted expert would expose optima of p(ω_c|x) potentially distant from the data, and therefore the prototype x* would not be truly representative of ω_c.
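A minimal sketch of activation maximization by gradient ascent is given below, assuming a pretrained PyTorch classifier `model` that outputs class logits; the starting point, step size, number of steps and the weight λ of the ℓ2 "expert" are illustrative placeholders.

import torch

def activation_maximization(model, class_idx, input_shape=(1, 3, 224, 224),
                            steps=200, lr=0.1, lam=0.01):
    # Gradient ascent on log p(class | x) - lam * ||x||^2 (AM with an l2 "expert").
    model.eval()
    x = torch.zeros(input_shape, requires_grad=True)        # start from a neutral input
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        log_probs = torch.log_softmax(model(x), dim=1)       # assumes the model returns logits
        objective = log_probs[0, class_idx] - lam * (x ** 2).sum()
        (-objective).backward()                              # ascend by minimizing the negative
        optimizer.step()
    return x.detach()                                        # the prototype x* for the class

# Usage (hypothetical model): prototype = activation_maximization(pretrained_cnn, class_idx=5)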
In certain applications, data density models p(x) can be hard to learn, or they can be so complex that maximizing them becomes difficult or even intractable. Therefore, a useful alternative class of unsupervised models are generative models (e.g. Boltzmann machines, variational autoencoders), which do not provide the density function directly but are able to sample from it, usually via the following two steps:
1. Sample from a simple distribution q(z) ∼ N(0, I) which is defined in an abstract code space Z;
2. Apply to the sample a decoding function g : Z → X that maps it back to the original input domain.
One suitable model is the generative adversarial network (GAN) introduced by Good-
fellow et al. (2014). It consists of two models: a generative model G that captures the data
distribution, and a discriminative model D that estimates the probability that a sample
came from the training data rather than from G. The training procedure for G is to max-
imize the probability of D making an error - which works like a minimax (minimizing a
possible loss for a worst case maximum loss) two-player game. In the space of arbitrary
functions G and D, a unique solution exists, with G recovering the training data distribu-
tion and D equal to 1/2 everywhere; in the case where G and D are defined by multi-layer
perceptrons, the entire system can be trained with backpropagation.
To learn the generator's distribution p_g over data x, a prior must be defined on the input noise variables p_z(z), and then a mapping to the data space as G(z; θ_g), where G is a differentiable function represented by a multi-layer perceptron with parameters θ_g. The second multi-layer perceptron D(x; θ_d) outputs a single scalar. D(x) represents the probability that x came from the data rather than from p_g. D can be trained to maximize the probability of assigning the correct label to both training examples and samples from G. Simultaneously, G can be trained to minimize log(1 − D(G(z))); in other words, D and G play the following two-player minimax game with value function V(G, D):
min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))].   (5)
Such a generator can then be used for activation maximization in the code space rather than in the input space, by optimizing

max_{z∈Z} log p(ω_c | g(z)) − λ‖z‖²,   (6)

where the first term is a composition of the newly introduced decoder and the original classifier, and where the second term is an ℓ2-norm regularizer in the code space. Once a solution z* to the optimization problem is found, the prototype for ω_c is obtained by decoding the solution, that is, x* = g(z*).
The ℓ2-norm regularizer in the input space can be understood in the context of image data as favoring gray-looking images. The effect of the ℓ2-norm regularizer in the code space can instead be understood as encouraging codes that have high probability. High-probability codes do not necessarily map to high-density regions of the input space; for more details please refer to the excellent tutorial given by Montavon et al. (2017).
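For concreteness, a minimal sketch of the two-player objective of Eq. (5) is shown below, using small multi-layer perceptrons in PyTorch. The network sizes, learning rates and the synthetic batch in the usage line are illustrative placeholders, not a recipe for training a production GAN.

import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64                        # illustrative sizes

G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def gan_step(x_real):
    # One alternating update of D and G following the minimax value function V(D, G).
    z = torch.randn(x_real.size(0), latent_dim)      # sample from the prior p_z(z)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))).
    d_loss = -(torch.log(D(x_real) + 1e-8).mean()
               + torch.log(1.0 - D(G(z).detach()) + 1e-8).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: minimize log(1 - D(G(z))) (in practice often: maximize log D(G(z))).
    g_loss = torch.log(1.0 - D(G(z)) + 1e-8).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Usage with a batch of real samples: losses = gan_step(torch.randn(32, data_dim))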
One technique which is highly interesting for the medical domain, e.g. for images generated by digital pathology (which are orders of magnitude larger than, e.g., radiological images) (Holzinger et al., 2017b), is the use of deconvolutional networks (Zeiler et al., 2010). They provide an excellent framework that permits the unsupervised construction of hierarchical image representations and thus enables visualization of the layers of convolutional networks (Zeiler and Fergus, 2014).
Understanding the operation of a convolutional neural network requires the interpreta-
tion of feature activity in intermediate layers, and these can be mapped back to the input
image space, showing what input pattern originally caused a given activation in the fea-
ture maps. This brings us to enormously important fundamental research opportunities in
causality (Krynski and Tenenbaum, 2007; Pearl, 2009; Bottou, 2014) - which is novel in the
context of personalized (P4) medicine.
From the perspective of fundamental research, the gained insights might contribute
towards building machines that learn and think like people (Lake et al., 2015, 2016).
One possibility to make deep networks (LeCun et al., 2015) more transparent is to generate image captions and use them to train a second network that produces explanations, without explicitly identifying the semantic features of the original network.
Of enormous importance is the possibility to extend the approaches used to generate image captions in order to train a second deep network to generate explanations (Hendricks et al., 2016); this lies at the intersection of images and text and can be tackled with Visual Question Answering (VQA) (Goyal et al., 2016). While this second network is not guaranteed to provide reasons correlated with those used in the original network, it seems promising to use neural attention mechanisms to trace which part of the input contributed most to which part of the output, see e.g. (Pavlopoulos et al., 2017).
We envision alternative machine learning techniques that learn more structured, inter-
pretable, and causal models. These can include Bayesian Rule Lists (Letham et al., 2015),
and in order to learn richer, more conceptual and generative models, techniques such as
Bayesian Program Learning (Lake et al., 2015), learning models of causal relationships
(Maier et al., 2010, 2013; Aalen et al., 2016), and stochastic grammars to learn more inter-
pretable structures (Brendel and Todorovic, 2011; Zhou and De la Torre, 2012; Park et al.,
2016). Very useful for building explainable systems in the medical domain is generally ge-
netic programming (Koza, 1994; Pena-Reyes and Sipper, 1999; Tsakonas et al., 2004), and
specifically evolutionary algorithms (Wang and Palade, 2011; Holzinger et al., 2014, 2016).
Input images can be expressed as a sum of AM-FM components, where the challenge is
to decompose any input image s(x) into a sum of bi-dimensional AM–FM harmonics of the
form
s(x_1, x_2) = Σ_{ℓ=1..L} s_ℓ(x_1, x_2) = Σ_{ℓ=1..L} A_ℓ(x_1, x_2) cos(ϕ_ℓ(x_1, x_2)),   (7)
where A_ℓ > 0 denotes a slowly-varying amplitude function, ϕ_ℓ denotes the phase, and ℓ = 1, ..., L indexes the different AM-FM harmonics. To each phase function one can associate an instantaneous frequency vector field defined as ω_ℓ = ∇ϕ_ℓ. Finding the components s_ℓ from the bidimensional signal s is called the decomposition problem (Clausel et al., 2015).
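As a simple one-dimensional illustration of the demodulation idea (the two-dimensional, multi-scale case relies on filterbanks, as in (Murray et al., 2010)), the following sketch recovers the instantaneous amplitude and frequency of a synthetic AM-FM signal via the analytic signal; the signal and sampling rate are made-up examples.

import numpy as np
from scipy.signal import hilbert

fs = 1000.0                                              # sampling rate in Hz (illustrative)
t = np.arange(0, 1.0, 1.0 / fs)
a_true = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)           # slowly varying amplitude A(t)
phi_true = 2 * np.pi * (50 * t + 20 * t ** 2)            # phase with increasing frequency
s = a_true * np.cos(phi_true)                            # the AM-FM signal A(t) cos(phi(t))

analytic = hilbert(s)                                    # analytic signal s + j * H{s}
a_est = np.abs(analytic)                                 # instantaneous amplitude estimate
phi_est = np.unwrap(np.angle(analytic))                  # instantaneous phase estimate
freq_est = np.gradient(phi_est) * fs / (2 * np.pi)       # instantaneous frequency in Hz

print("mean amplitude error:", np.abs(a_est - a_true).mean())
print("frequency at t = 0.5 s (expected about 70 Hz):", freq_est[len(t) // 2])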
We provide an example in Figure 2 of a symptomatic stroke plaque and an asymptomatic plaque in ultrasound imaging of the carotid. In the top three rows of Figure 2 we have a symptomatic example that can be used to demonstrate several issues associated with high-risk cases. First, we have large dark regions corresponding to lipids or other dangerous components. Second, these dark plaque regions are located very close to the plaque surface. However, in the original images we cannot see any structure over these dark regions. A very rich plaque surface structure is revealed by the FM reconstructions of the second row. Starting from the very-low to the high frequency scales, the instantaneous frequency can
be seen adjusting to the local texture content with some sharp changes around different
structures. In contrast, the asymptomatic plaque image reconstructions of the last two
rows do not include significant low-intensity components. The high-intensity components
of the fourth row (right image) dominate the reconstruction. There is also more regularity
(homogeneity) in the asymptomatic reconstructions of the last row. Far more variability
and heterogeneity are evident in the symptomatic FM reconstructions.
AM-FM decompositions have been enabled by the introduction of new demodulation methods, as summarized in (Murray et al., 2010, 2012); they remain an active topic and, in a combined effort, can be very beneficial for context understanding. A summary of several medical applications of novel multi-scale AM-FM methods can be found in our recently published survey (Murray et al., 2012). Earlier work with AM-FM models had demonstrated their promise for textured images, as in the examples of fingerprint image classification (Pattichis et al., 2001), tree image analysis for growth seasons (Ramachandran et al., 2011), non-stationary wood-grain characterization, and other texture images (Kokkinos et al., 2009).
The introduction of a multiscale approach in (Murray et al., 2010) demonstrated that
the method can be used to reconstruct general images. In particular, a multi-scale AM-
FM representation led to the important application of population screening for diabetic
retinopathy6 as documented in (Agurto et al., 2010), (Rahim et al., 2015), hysteroscopy
image assessment (Constantinou et al., 2012), fMRI and MRI image analysis (Loizou et al.,
2011b), and atherosclerotic plaque ultrasound image and video analysis (Loizou et al.,
2011a). Alternatively, the definition of multidimensional AM-FM transforms over curvilinear coordinate systems has led to the earlier development of very low bitrate video coding, as demonstrated in (Lee et al., 2001, 2002).
6. Early detection of diabetic retinopathy is extremely important in order to prevent premature visual loss and blindness.
Complex wavelets can also be very powerful, e.g. in the analysis of images of electrophoretic gels used to study protein expression levels in living cells, where much of the positional information of a data feature is carried in the phase of a complex transform.
Complex transforms allow explicit specification of the phase, and hence of the position of
features in the image. Registration of a test gel to a reference gel is achieved by using a
multiresolution movement map derived from the phase of a complex wavelet transform (the
Q-shift wavelet transform) to dictate the warping directly via movement of the nodes of a
Delaunay-triangulated mesh of points. This warping map is then applied to the original
untransformed image such that the absolute magnitude of the spots remains unchanged.
The technique is general to any type of image. Results are presented for a simple computer
simulated gel, a simple real gel registration between similar clean gels with local warping
vectors distributed along one main direction, a hard problem between a reference gel and a
dirty test gel with multi-directional warping vectors and many artifacts, and some typical
gels of present interest in post-genomic biology. The method compares favourably with
others, since it is computationally rapid, effective and entirely automatic (Woodward et al.,
2004).
From Figure 3 it is clear that the list of dominant Gabor filters provides a very compact visualization of image content. In Figure 3, each symmetric pair of circles represents a single filter. With just 10 to 30 filters we can describe strong variabilities in image content. The frequency domain is also very easy to explain: we observe strong directional selectivity orthogonal to image lines, a strong concentration of low-frequency components, and a few selected high-frequency components. As described earlier, these decompositions have provided excellent features for a wide range of biomedical applications. Furthermore, in comparison, a ResNet requires 152 layers that cannot be easily visualized.
Figure 3: SEM (Scanning Electron Microscopy) images of diabetes plasma with correspond-
ing dominant Gabor filters. (a) and (c) Original nice spaghetti-like images in
healthy controls. (b) and (d) Dominant Gabor filters for (a) and (c) respectively.
(g) Dense matted deposits (DMDs) in type 2 diabetes that are removed (e) when
we add small amounts of lipopolysaccharide-binding protein (LBP). (f) and (h)
Dominant Gabor filters for (e) and (g) respectively. The dominant Gabor fil-
ters are shown in the frequency domain. SEM images taken from 2017 Nature
Scientific Reports Diabetes and Control Data, Figure 7A.
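The notion of dominant Gabor filters can be sketched in a few lines of Python: apply a small Gabor filterbank and rank the filters by their response energy on the image. The filterbank parameters below are illustrative assumptions and are not those used to produce Figure 3.

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(frequency, theta, sigma=4.0, size=25):
    # Complex 2-D Gabor kernel with spatial frequency (cycles/pixel) and orientation theta.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)           # coordinate rotated to the orientation
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * frequency * xr)

def dominant_gabor_filters(image, n_top=10):
    # Rank a small Gabor filterbank by response energy on the given grayscale image.
    freqs = [0.05, 0.1, 0.2, 0.3]
    thetas = np.arange(0, np.pi, np.pi / 8)              # 8 orientations
    ranking = []
    for f in freqs:
        for th in thetas:
            response = fftconvolve(image, gabor_kernel(f, th), mode="same")
            ranking.append((np.sum(np.abs(response) ** 2), f, th))
    ranking.sort(reverse=True)                           # strongest responses first
    return ranking[:n_top]                               # a compact description of image content

# Usage with any 2-D float array: top_filters = dominant_gabor_filters(img)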
Currently, platforms are missing that support not only the analysis but also the interpretation of the data, information and knowledge obtained from the above-mentioned omics technologies, and that cross-link them to hospital data. Moreover, it is necessary to narrow the gap between genotype and phenotype, as well as to provide additional information for biomarker research and drug discovery, where biobanks (Huppertz and Holzinger, 2014) play an increasingly important role.
One of the grand challenges here is to close the research cycle in such a way that all the data generated by one research study can be consistently associated with the original samples, so that the underlying original research data, and the knowledge gained thereof, can be reused. This can be enabled by a catalogue providing the information hub connecting all relevant information sources (Müller et al., 2015). The key knowledge embedded in such a biobank catalogue is the availability and quality of proper samples to perform a research project. To overview and compare collections from different catalogues, visual analytics techniques are necessary, especially glyph-based visualization techniques (Müller et al., 2014). We cannot emphasize often enough the importance of a combined view on heterogeneous data sources in a unified and meaningful way, consequently enabling the discovery and visualization of data from different sources, which would enable totally new insights. Here, toolsets are urgently needed to support the bidirectional interaction with computational multiscale analysis and modelling, to help move towards the far-off goal of future medicine (Hood and Friend, 2011; Tian et al., 2012).
In such black-box approaches, the input symbols (words) are already replaced by vectors (e.g. via the skip-gram model for learning vector representations of words from large amounts of unstructured text (Mikolov et al., 2013a)), resulting in a few hundred uninterpretable dimensions (a.k.a. embeddings, e.g. (Mikolov et al., 2013b)).
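For illustration, a minimal skip-gram example using the gensim library is shown below (assuming gensim version 4 or later, where the size parameter is called vector_size); the toy corpus is invented, and the point is simply that the resulting dimensions carry no human-interpretable meaning on their own.

from gensim.models import Word2Vec

sentences = [                                       # toy corpus of tokenized sentences
    ["diabetes", "raises", "blood", "glucose"],
    ["insulin", "lowers", "blood", "glucose"],
    ["the", "patient", "received", "insulin"],
    ["the", "patient", "has", "diabetes"],
]

# sg=1 selects the skip-gram objective (Mikolov et al., 2013a).
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1, epochs=200, seed=1)

vec = model.wv["insulin"]                           # a 100-dimensional, uninterpretable vector
print(vec[:5])
print(model.wv.most_similar("insulin", topn=3))     # similarity is only relative to other embeddings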
As opposed to fields such as speech or image processing, the improvements recently
gained with deep learning on text are rather modest, yet its use is very attractive since neural
representations reduce the workload of manually crafting features enormously (Manning,
2015).
In the medical domain, where a large amount of knowledge is represented in textual form, there already exists a large knowledge graph of medical terms (the UMLS8). Here it is crucial to underpin machine output with reasons that are human-verifiable, and high precision is imperative for supporting, not distracting, practitioners. The only way forward
seems to be the integration of both knowledge-based and neural approaches to combine
the interpretability of the former with the high efficiency of the latter. To this end, there
have been attempts to retrofit neural embeddings with information from knowledge bases
(e.g. (Faruqui et al., 2015)) as well as to project embedding dimensions onto interpretable
low-dimensional sub-spaces (Rothe et al., 2016).
More promising, in our opinion, is the use of hybrid distributional models that combine sparse graph-based representations (Biemann and Riedl, 2013) with dense vector representations (Mikolov et al., 2013b) and link them to lexical resources and knowledge bases (Faralli et al., 2016). Here a hybrid human-in-the-loop approach can be beneficial: not only are the machine learning models for knowledge extraction supported and improved over time, but the final entity graph also becomes larger, cleaner, more precise and thus more usable for domain experts (Yimam et al., 2017). Contrary to classical automatic machine learning, human-in-the-loop approaches do not operate on predefined training or test sets, but assume that human expert input regarding system improvement is supplied iteratively. In such an approach, the machine learning model is built continuously on previous annotations and is used to propose labels for subsequent annotation (Yimam et al., 2016).
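The iterative annotate-train-propose loop can be sketched schematically as follows; the ask_expert callback is a hypothetical stand-in for the human domain expert, and the uncertainty-based selection of items is one simple choice among many.

import numpy as np
from sklearn.linear_model import LogisticRegression

def human_in_the_loop(X_pool, ask_expert, n_rounds=10, batch_size=5, seed=0):
    # ask_expert(indices, proposals) returns corrected labels for the selected items;
    # the seed annotations are assumed to contain at least two different classes.
    rng = np.random.default_rng(seed)
    labeled_idx = list(rng.choice(len(X_pool), size=batch_size, replace=False))
    labels = list(ask_expert(labeled_idx, proposals=None))          # initial expert annotations
    model = None
    for _ in range(n_rounds):
        model = LogisticRegression(max_iter=1000).fit(X_pool[labeled_idx], labels)
        remaining = [i for i in range(len(X_pool)) if i not in labeled_idx]
        proba = model.predict_proba(X_pool[remaining])
        uncertainty = 1.0 - proba.max(axis=1)                       # least confident items first
        query = [remaining[j] for j in np.argsort(uncertainty)[-batch_size:]]
        proposals = model.predict(X_pool[query])                    # machine-proposed labels
        corrected = ask_expert(query, proposals=proposals)          # expert confirms or corrects
        labeled_idx.extend(query)
        labels.extend(corrected)
    return model, labeled_idx, labels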
Combined with an interpretable disambiguation system (Panchenko et al., 2017), this
realizes concept-linking in context with high accuracy while providing human-interpretable
reasons for why concepts have been selected. Figure 4 shows machine reading capabilities
of the system described in Panchenko et al. (2017): The system can automatically assign
more general terms in context and can disambiguate terms with several senses to the one
matching the context. Note that while unsupervised machine learning is used for inducing
the sense inventory, the senses are interpretable by providing a rich sense representation as
visible in the figure. This method does not require a manually defined ontology and thus is
applicable across languages and domains.
The quest for the future is to generalize these notions to enable semantic matching
beyond keyword representations (cf. Cer et al. (2017)) in order to transfer these principles
from the concept level to the event level.
8. https://www.nlm.nih.gov/research/umls
Figure 4: Output of an unsupervised interpretable model for text interpretation for the input "diabetes plasma is from blood transfusions with high sugar" (note that it brings up plasma (material), not plasma (display device)!). Image created online via ltbev.informatik.uni-hamburg.de/wsd on 27.12.2017, 19:30.
Deep neural networks can also produce high-confidence predictions for inputs that are unrecognizable to humans, as shown in "Deep Neural Networks are Easily Fooled" by Nguyen et al. (2015). None of this suggests that easy interpretability is even possible for deep neural networks.
In the medical domain a large amount of knowledge is represented in textual form, and the written text of medical reports is legally binding, unlike images or *omics data. Here it is crucial to underpin machine output with reasons that are human-verifiable, and high precision is imperative for supporting, not distracting, the medical experts. The only way forward seems to be the integration of knowledge-based and neural approaches, combining the interpretability of the former with the high efficiency of the latter. Promising for explainable-AI in the medical domain seems to be the use of hybrid distributional models that combine sparse graph-based representations with dense vector representations and link them to lexical resources and knowledge bases.
Last but not least, we emphasize that successful explainable-AI systems need effective user interfaces, fostering new strategies for presenting human-understandable explanations, e.g. explanatory debugging (Kulesza et al., 2015), affective computing (Picard, 1997), and sentiment analysis (Maas et al., 2011; Petz et al., 2015). While the aforementioned methods are inherently more explainable, their performance is often less optimal; hence possibilities to enhance two-way interaction have to be explored, which calls for optical computing for machine learning purposes.
Acknowledgments
We are grateful for valuable comments from our local, national and international colleagues,
including George Spyrou, Cyprus Institute of Neurology and Genetics, Chris Christodoulou,
Ioannis Constantinou and Kyriacos Constantinou, University of Cyprus and Marios S. Pat-
tichis, University of New Mexico.
References
Odd Olai Aalen, Kjetil Røysland, Jon Michael Gran, Roger Kouyos, and Tanja Lange.
Can we believe the dags? a comment on the relationship between causal dags and
mechanisms. Statistical methods in medical research, 25(5):2294–2314, 2016. doi:
10.1177/0962280213520436.
Carla Agurto, Victor Murray, Eduardo Barriga, Sergio Murillo, Marios Pattichis, Herbert
Davis, Stephen Russell, Michael Abràmoff, and Peter Soliz. Multiscale AM-FM methods
for diabetic retinopathy lesion detection. IEEE transactions on medical imaging, 29(2):
502–512, 2010. doi: 10.1109/TMI.2009.2037146.
Yoshua Bengio. Learning deep architectures for ai. Foundations and trends in Machine
Learning, 2(1):1–127, 2009. doi: 10.1561/2200000006.
Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review
and new perspectives. IEEE transactions on pattern analysis and machine intelligence,
35(8):1798–1828, 2013. doi: 10.1109/TPAMI.2013.50.
Pietro Berkes and Laurenz Wiskott. On the analysis and interpretation of inhomogeneous
quadratic forms as receptive fields. Neural computation, 18(8):1868–1895, 2006. doi:
10.1162/neco.2006.18.8.1868.
Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic Web. Scientific American,
284(5):34–43, May 2001.
Chris Biemann and Martin Riedl. Text: Now in 2D! A Framework for Lexical Expansion
with Contextual Similarity. Journal of Language Modelling, 1(1):55–95, 2013. doi: 10.
15398/jlm.v1i1.60.
Guido Bologna and Yoichi Hayashi. Characterization of symbolic rules embedded in deep
dimlp networks: A challenge to transparency of deep learning. Journal of Artificial Intel-
ligence and Soft Computing Research, 7(4):265–286, 2017. doi: 10.1515/jaiscr-2017-0019.
Léon Bottou. From machine learning to machine reasoning. Machine learning, 94(2):133–
149, 2014. doi: 10.1007/s10994-013-5335-x.
William Brendel and Sinisa Todorovic. Learning spatiotemporal graphs of human activi-
ties. In IEEE international conference on Computer vision (ICCV 2011), pages 778–785.
IEEE, 2011. doi: 10.1109/ICCV.2011.6126316.
Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad.
Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day read-
mission. In 21th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD ’15), pages 1721–1730. ACM, 2015. doi: 10.1145/2783258.2788613.
Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez-Gazpio, and Lucia Specia. Semeval-2017
task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017),
pages 1–14, Vancouver, Canada, August 2017. Association for Computational Linguistics.
URL http://www.aclweb.org/anthology/S17-2001.
Philipp Cimiano and Johanna Völker. Text2onto: a framework for ontology learning and
data-driven change discovery. In 10th international conference on Natural Language
Processing and Information Systems (NLDB ’05), pages 227–238. Springer, 2005. doi:
10.1007/11428817 21.
Philipp Cimiano, Christina Unger, and John McCrae. Ontology-based interpretation of
natural language, volume 24. Morgan and Claypool Publishers, 2014. doi: 10.2200/
S00561ED1V01Y201401HLT024.
Marianne Clausel, Thomas Oberlin, and Valérie Perrier. The monogenic synchrosqueezed wavelet transform: a tool for the decomposition/demodulation of AM-FM images. Applied
and Computational Harmonic Analysis, 39(3):450–486, 2015. doi: 10.1016/j.acha.2014.
10.003.
Ioannis Constantinou, Marios S. Pattichis, Vasillis Tanos, Marios Neofytou, and Constanti-
nos S. Pattichis. An adaptive multiscale AM-FM texture analysis system with applica-
tion to hysteroscopy imaging. In 12th International IEEE Conference on Bioinformatics
& Bioengineering (BIBE 2012), pages 744–747. IEEE, 2012. doi: 10.1109/BIBE.2012.
6399760.
Mark G. Core, H. Chad Lane, Michael Van Lent, Dave Gomboc, Steve Solomon, and
Milton Rosenberg. Building explainable artificial intelligence systems. In AAAI, pages
1766–1773. MIT Press, 2006.
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. Imagenet: A
large-scale hierarchical image database. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR 2009), pages 248–255. IEEE, 2009. doi: 10.1109/CVPR.
2009.5206848.
Derek Doran, Sarah Schulz, and Tarek R. Besold. What does explainable ai really mean?
a new conceptualization of perspectives. arXiv:1710.00794, 2017.
Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. Visualizing higher-
layer features of a deep network. University of Montreal Technical Report Nr. 1341,
2009.
Andre Esteva, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M.
Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep
neural networks. Nature, 542(7639):115–118, 2017. doi: 10.1038/nature21056.
Stefano Faralli, Alexander Panchenko, Chris Biemann, and Simone P. Ponzetto. Linked
Disambiguated Distributional Semantic Networks, pages 56–64. Springer International
Publishing, Cham, 2016. doi: 10.1007/978-3-319-46547-0 7.
Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A.
Smith. Retrofitting word vectors to semantic lexicons. In Proceedings of the 2015 Con-
ference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, pages 1606–1615, Denver, Colorado, May–June 2015.
Association for Computational Linguistics. URL http://www.aclweb.org/anthology/
N15-1184.
Aaron X. Fellmeth and Maurice Horwitz. Guide to Latin in international law. Oxford
University Press, 2009.
Dominic Girardi, Josef Küng, Raimund Kleiser, Michael Sonnberger, Doris Csillag, Johannes
Trenkler, and Andreas Holzinger. Interactive knowledge discovery with the doctor-in-
the-loop: a practical example of cerebral aneurysms research. Brain Informatics, 3(3):
133–143, 2016. doi: 10.1007/s40708-016-0038-2.
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies
for accurate object detection and semantic segmentation. In Proceedings of the IEEE
conference on computer vision and pattern recognition (CVPR), pages 580–587. IEEE,
2014. doi: 10.1109/CVPR.2014.81.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Zhoubin
Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger,
editors, Advances in neural information processing systems (NIPS), pages 2672–2680,
2014.
Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. Making the
v in vqa matter: Elevating the role of image understanding in visual question answering.
arXiv:1612.00837, 2016.
David Gunning. Explainable artificial intelligence (XAI): Technical Report Defense Ad-
vanced Research Projects Agency DARPA-BAA-16-53. DARPA, Arlington, USA, 2016.
Margaret A. Hamburg and Francis S. Collins. The path to personalized medicine. New
England Journal of Medicine, 363(4):301–304, 2010. doi: 10.1056/NEJMp1006304.
Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, and
Trevor Darrell. Generating visual explanations. In European Conference on Computer Vi-
sion ECCV 2016, Lecture Notes in Computer Science LNCS 9908, pages 3–19. Springer,
Heidelberg, 2016. doi: 10.1007/978-3-319-46493-0 1.
Patrick Hohenecker and Thomas Lukasiewicz. Deep learning for ontology reasoning.
arXiv:1705.10342, 2017.
John Henry Holland. Adaptation in natural and artificial systems: an introductory analysis
with applications to biology, control, and artificial intelligence. MIT Press, Cambridge
(MA), 1992.
Andreas Holzinger, Markus Plass, Katharina Holzinger, Gloria Cerasela Crisan, Camelia-
M. Pintea, and Vasile Palade. Towards interactive machine learning (iML): Applying ant
colony algorithms to solve the traveling salesman problem with the human-in-the-loop
approach. In Springer Lecture Notes in Computer Science LNCS 9817, pages 81–95.
Springer, Heidelberg, Berlin, New York, 2016. doi: 10.1007/978-3-319-45507-56.
Andreas Holzinger, Randy Goebel, Vasile Palade, and Massimo Ferri. Towards integrative
machine learning and knowledge extraction. In Towards Integrative Machine Learning
and Knowledge Extraction: Springer Lecture Notes in Artificial Intelligence LNAI 10344,
pages 1–12. Springer International, Cham, 2017a. doi: 10.1007/978-3-319-69775-8 1.
Andreas Holzinger, Bernd Malle, Peter Kieseberg, Peter M. Roth, Heimo Müller, Robert
Reihs, and Kurt Zatloukal. Towards the augmented pathologist: Challenges of
explainable-ai in digital pathology. arXiv:1712.06657, 2017b.
Andreas Holzinger, Markus Plass, Katharina Holzinger, Gloria Cerasela Crisan, Camelia-M.
Pintea, and Vasile Palade. A glass-box interactive machine learning approach for solving
np-hard problems with the human-in-the-loop. arXiv:1708.01104, 2017c.
Katharina Holzinger, Vasile Palade, Raul Rabadan, and Andreas Holzinger. Darwin or
lamarck? future challenges in evolutionary algorithms for knowledge discovery and data
mining. In Interactive Knowledge Discovery and Data Mining in Biomedical Informatics:
State-of-the-Art and Future Challenges. Lecture Notes in Computer Science LNCS 8401,
pages 35–56. Springer, Heidelberg, Berlin, 2014. doi: 10.1007/978-3-662-43968-5 3.
Leroy Hood and Stephen H. Friend. Predictive, personalized, preventive, participatory (P4)
cancer medicine. Nature Reviews Clinical Oncology, 8(3):184–187, 2011. doi: 10.1038/
nrclinonc.2010.227.
Xin Huang and Yuxin Peng. Cross-modal deep metric learning with multi-task regulariza-
tion. arXiv:1703.07026, 2017.
Berthold Huppertz and Andreas Holzinger. Biobanks a source of large biological data sets:
Open problems and future challenges. In Andreas Holzinger and Igor Jurisica, editors,
Interactive Knowledge Discovery and Data Mining in Biomedical Informatics, Lecture
Notes in Computer Science LNCS 8401, pages 317–330. Springer, Berlin, Heidelberg,
2014. doi: 10.1007/978-3-662-43968-5 18.
Hervé Jégou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Pérez, and Cordelia
Schmid. Aggregating local image descriptors into compact codes. IEEE transactions on
pattern analysis and machine intelligence, 34(9):1704–1716, 2012. doi: 10.1109/TPAMI.
2011.235.
W. Lewis Johnson. Agents that learn to explain themselves. In Twelfth National Conference
on Artificial Intelligence (AAAI ’94), pages 1257–1263. AAAI, 1994.
Douglas B. Kell and Etheresia Pretorius. Proteins behaving badly. substoichiometric molec-
ular control and amplification of the initiation and nature of amyloid fibril formation:
lessons from and for blood clotting. Progress in biophysics and molecular biology, 123:
16–41, 2017.
Peter Kieseberg, Edgar Weippl, and Andreas Holzinger. Trust for the doctor-in-the-loop.
European Research Consortium for Informatics and Mathematics (ERCIM) News: Tack-
ling Big Data in the Life Sciences, 104(1):32–33, 2016.
Iasonas Kokkinos, Georgios Evangelopoulos, and Petros Maragos. Texture analysis and
segmentation using modulation features, generative models, and weighted curve evolution.
IEEE transactions on pattern analysis and machine intelligence, 31(1):142–157, 2009. doi:
10.1109/TPAMI.2008.33.
Markus Krötzsch, Sebastian Rudolph, and Pascal Hitzler. ELP: Tractable Rules for OWL
2, pages 649–664. Springer, Berlin, Heidelberg, 2008. doi: 10.1007/978-3-540-88564-1 41.
Tevye R. Krynski and Joshua B. Tenenbaum. The role of causality in judgment under
uncertainty. Journal of Experimental Psychology: General, 136(3):430, 2007. doi: 10.
1037/0096-3445.136.3.430.
Todd Kulesza, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. Principles of
explanatory debugging to personalize interactive machine learning. In Proceedings of the
20th International Conference on Intelligent User Interfaces (IUI 2015), pages 126–137.
ACM, 2015. doi: 10.1145/2678025.2701399.
Carmen Lacave and Francisco J. Diez. A review of explanation methods for Bayesian
networks. The Knowledge Engineering Review, 17(2):107–127, 2002. doi: 10.1017/
S026988890200019X.
Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. Interpretable and
explorable approximations of black box models. arXiv:1707.01154, 2017.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):
436–444, 2015. doi: 10.1038/nature14539.
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional deep
belief networks for scalable unsupervised learning of hierarchical representations. In 26th
annual international conference on machine learning (ICML ’09), pages 609–616. ACM,
2009. doi: 10.1145/1553374.1553453.
Sanghoon Lee, Marios S. Pattichis, and Alan C. Bovik. Foveated video compression with
optimal rate control. IEEE Transactions on Image Processing, 10(7):977–992, 2001. doi:
10.1109/83.931092.
Sanghoon Lee, Marios S. Pattichis, and Alan C. Bovik. Foveated video quality assessment.
IEEE Transactions on Multimedia, 4(1):129–132, 2002. doi: 10.1109/6046.985561.
Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan. Interpretable
classifiers using rules and bayesian analysis: Building a better stroke prediction model.
The Annals of Applied Statistics, 9(3):1350–1371, 2015. doi: 10.1214/15-AOAS848.
Christos P. Loizou, Victor Murray, Marios S. Pattichis, Marios Pantziaris, and Constanti-
nos S. Pattichis. Multiscale amplitude-modulation frequency-modulation (amfm) tex-
ture analysis of ultrasound images of the intima and media layers of the carotid artery.
IEEE Transactions on Information Technology in Biomedicine, 15(2):178–188, 2011a.
doi: 10.1109/TITB.2010.2081995.
Christos P. Loizou, Victor Murray, Marios S. Pattichis, Ioannis Seimenis, Marios Pantziaris,
and Constantinos S. Pattichis. Multiscale amplitude-modulation frequency-modulation
(amfm) texture analysis of multiple sclerosis in brain mri images. IEEE Transactions on
Information Technology in Biomedicine, 15(1):119–129, 2011b. doi: 10.1109/TITB.2010.
2091279.
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and
Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the
49th Annual Meeting of the Association for Computational Linguistics, pages 142–150,
2011.
Alexander Maedche and Steffen Staab. Ontology learning for the semantic web. IEEE
Intelligent systems, 16(2):72–79, 2001. doi: 10.1109/5254.920602.
Marc E. Maier, Brian J. Taylor, Huseyin Oktay, and David D. Jensen. Learning causal
models of relational domains. In Proceedings of the Twenty-Fourth AAAI Conference on
Artificial Intelligence (AAAI-10), pages 531–538. AAAI, 2010.
Marc E. Maier, Katerina Marazopoulou, David Arbour, and David D. Jensen. A sound and
complete algorithm for learning causal models from relational data. arXiv:1309.6843,
2013.
Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural
networks. arXiv:1601.06759, 2016.
Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, and Chris
Biemann. Unsupervised does not mean uninterpretable: The case for word sense induction
and disambiguation. In Proceedings of the 15th Conference of the European Chapter of
the Association for Computational Linguistics: Volume 1, Long Papers, pages 86–98,
Valencia, Spain, April 2017. Association for Computational Linguistics. URL http:
//www.aclweb.org/anthology/E17-1009.
Seyoung Park, Bruce Xiaohan Nie, and Song-Chun Zhu. Attribute and-or grammar for
joint parsing of human attributes, part and pose. arXiv:1605.02112, 2016.
Marios S. Pattichis, George Panayi, Alan C. Bovik, and Shun-Pin Hsu. Fingerprint classifi-
cation using an AM-FM model. IEEE Transactions on Image Processing, 10(6):951–954,
2001. doi: 10.1109/83.923291.
John Pavlopoulos, Prodromos Malakasiotis, and Ion Androutsopoulos. Deeper attention to
abusive user content moderation. In Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing, pages 1136–1146. Association for Computa-
tional Linguistics, 2017.
Judea Pearl. Causality: Models, Reasoning, and Inference (2nd Edition). Cambridge Uni-
versity Press, Cambridge, 2009.
Carlos A. Pena-Reyes and Moshe Sipper. A fuzzy-genetic approach to breast cancer diagno-
sis. Artificial intelligence in medicine, 17(2):131–155, 1999. doi: 10.1016/S0933-3657(99)
00019-6.
Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference:
foundations and learning algorithms. MIT Press, Cambridge (MA), 2017.
Gerald Petz, Michal Karpowicz, Harald Fuerschuss, Andreas Auinger, Vaclav Stritesky, and
Andreas Holzinger. Computational approaches for mining users opinions on the web 2.0.
Information Processing & Management, 51(4):510–519, 2015. doi: 10.1016/j.ipm.2014.
07.011.
Rosalind W. Picard. Affective Computing. MIT Press, Cambridge (MA), 1997.
Brett Poulin, Roman Eisner, Duane Szafron, Paul Lu, Russell Greiner, David S. Wishart,
Alona Fyshe, Brandon Pearcy, Cam MacDonell, and John Anvik. Visual explanation
of evidence with additive classifiers. In National Conference On Artificial Intelligence,
pages 1822–1829. AAAI, 2006.
Sarni Suhaila Rahim, Vasile Palade, Chrisina Jayne, Andreas Holzinger, and James Shut-
tleworth. Detection of diabetic retinopathy and maculopathy in eye fundus images using
fuzzy image processing. In Yike Guo, Karl Friston, Faisal Aldo, Sean Hill, and Hanchuan
Peng, editors, Brain Informatics and Health, Lecture Notes in Computer Science, LNCS
9250, pages 379–388. Springer, Cham, Heidelberg, New York, Dordrecht, London, 2015.
doi: 10.1007/978-3-319-23344-4 37.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Model-agnostic interpretability
of machine learning. arXiv:1606.05386, 2016a.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Ex-
plaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
ACM, 2016b.
David Rolnick, Andreas Veit, Serge Belongie, and Nir Shavit. Deep learning is robust to
massive label noise. arXiv:1705.10694, 2017.
Sascha Rothe, Sebastian Ebert, and Hinrich Schütze. Ultradense word embeddings by or-
thogonal transformation. In Proceedings of the 2016 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technolo-
gies, pages 767–777, San Diego, California, June 2016. Association for Computational
Linguistics. URL http://www.aclweb.org/anthology/N16-1091.
Stuart J. Russell and Peter Norvig. Artificial Intelligence: A modern approach. Prentice
Hall, Englewood Cliffs, 1995.
Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking
the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE,
104(1):148–175, 2016. doi: 10.1109/JPROC.2015.2494218.
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition. arXiv:1409.1556, 2014.
Deepika Singh, Erinc Merdivan, Ismini Psychoula, Johannes Kropf, Sten Hanke, Matthieu
Geist, and Andreas Holzinger. Human activity recognition using recurrent neural net-
works. In Andreas Holzinger, Peter Kieseberg, A Min Tjoa, and Edgar Weippl, ed-
itors, Machine Learning and Knowledge Extraction: Lecture Notes in Computer Sci-
ence LNCS 10410, pages 267–274. Springer International Publishing, Cham, 2017. doi:
10.1007/978-3-319-66808-6 18.
Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling
deep neural networks. arXiv:1710.08864, 2017.
William Swartout, Cecile Paris, and Johanna Moore. Explanations in knowledge systems:
Design for explainable expert systems. IEEE Expert, 6(3):58–64, 1991. doi: 10.1109/64.
87686.
Duane Szafron, Paul Lu, Russell Greiner, David S. Wishart, Brett Poulin, Roman Eisner,
Zhiyong Lu, John Anvik, Cam Macdonell, and Alona Fyshe. Proteome analyst: custom
predictions with explanations in a web-based tool for high-throughput proteome annota-
tions. Nucleic acids research, 32(S2):W365–W371, 2004. doi: 10.1093/nar/gkh485.
Masato Taki. Deep residual networks and weight initialization. arXiv:1709.02956, 2017.
Qiang Tian, Nathan D. Price, and Leroy Hood. Systems cancer medicine: towards realiza-
tion of predictive, preventive, personalized and participatory (P4) medicine. Journal of
internal medicine, 271(2):111–121, 2012. doi: 10.1111/j.1365-2796.2011.02498.x.
Athanasios Tsakonas, Georgios Dounias, Jan Jantzen, Hubertus Axer, Beth Bjerregaard,
and Diedrich Graf von Keyserlingk. Evolving rule-based systems in two medical domains
using genetic programming. Artificial Intelligence in Medicine, 32(3):195–216, 2004. doi:
10.1016/j.artmed.2004.02.007.
Zhenyu Wang and Vasile Palade. Building interpretable fuzzy models for high di-
mensional data analysis in cancer diagnosis. BMC genomics, 12(2):S5, 2011. doi:
10.1186/1471-2164-12-S2-S5.
Bernard Widrow and Michael A. Lehr. 30 years of adaptive neural networks: perceptron,
madaline, and backpropagation. Proceedings of the IEEE, 78(9):1415–1442, 1990. doi:
10.1109/5.58323.
Andrew M. Woodward, Jem J. Rowland, and Douglas B. Kell. Fast automatic registration
of images using the phase of a complex wavelet transform: application to proteome gels.
Analyst, 129(6):542–552, 2004. doi: 10.1039/B403134B.
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual
transformations for deep neural networks. arXiv:1611.05431, 2016.
Seid Muhie Yimam, Chris Biemann, Ljiljana Majnarić, Šefket Šabanović, and Andreas
Holzinger. An adaptive annotation approach for biomedical entity and relation recog-
nition. Brain Informatics, 3(3):157–168, 2016. doi: 10.1007/s40708-016-0036-4.
Seid Muhie Yimam, Steffen Remus, Alexander Panchenko, Andreas Holzinger, and Chris
Biemann. Entity-centric information access with the human-in-the-loop for the biomedical
domains. In Svetla Boytcheva, Kevin Bretonnel Cohen, Guergana Savova, and Galia Angelova, editors, Proceedings of the Biomedical NLP Workshop associated with RANLP 2017, 2017.
Lotfi A. Zadeh. Toward human level machine intelligence - is it achievable? the need for
a paradigm shift. IEEE Computational Intelligence Magazine, 3(3):11–22, 2008. doi:
10.1109/MCI.2008.926583.
Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks.
In Fleet D., Pajdla T., Schiele B., and Tuytelaars T., editors, ECCV, Lecture Notes in
Computer Science LNCS 8689, pages 818–833. Springer, Cham, 2014. doi: 10.1007/
978-3-319-10590-1 53.
Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, and Rob Fergus. Deconvolutional
networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR
2010), pages 2528–2535. IEEE, 2010. doi: 10.1109/CVPR.2010.5539957.
Feng Zhou and Fernando De la Torre. Factorized graph matching. In CVPR 2012 IEEE
Conference on Computer Vision and Pattern Recognition, pages 127–134. IEEE, 2012.
Martin Zinkevich. Online convex programming and generalized infinitesimal gradient as-
cent. In 20th International Conference on Machine Learning (ICML ’03), pages 928–936.
AAAI, 2003.