Transparency as design publicity: explaining and justifying inscrutable algorithms
https://doi.org/10.1007/s10676-020-09564-w
Abstract
In this paper we argue that transparency of machine learning algorithms, just as explanation, can be defined at different levels of abstraction. We criticize recent attempts to identify the explanation of black box algorithms with making their decisions (post-hoc) interpretable, focusing our discussion on counterfactual explanations. These approaches to explanation simplify the real nature of the black boxes and risk misleading the public about the normative features of a model. We propose a new form of algorithmic transparency that consists in explaining algorithms as an intentional product that serves a particular goal, or multiple goals (Daniel Dennett's design stance), in a given domain of applicability, and that provides a measure of the extent to which such a goal is achieved, together with evidence about the way that measure has been reached. We call this idea of algorithmic transparency "design publicity." We argue that design publicity can be more easily linked with the justification of the use and of the design of the algorithm, and of each individual decision following from it. In comparison to post-hoc explanations of individual algorithmic decisions, design publicity meets a different demand (the demand for impersonal justification) of the explainee. Finally, we argue that when models that pursue justifiable goals (which may include fairness as avoidance of bias towards specific groups) to a justifiable degree are used consistently, the resulting decisions are all justified even if some of them are (unavoidably) based on incorrect predictions. For this argument, we rely on John Rawls's idea of procedural justice applied to algorithms conceived as institutions.
sense of understanding the mechanism by which the model works" (Lipton 2018). We claim that transparency (at least, the kind of transparency we characterize in this paper) is valuable because, and in so far as, it enables the individuals who are subjected to algorithmic decision-making to assess whether these decisions are morally and politically justifiable. We explain the relation between transparency and justification in "Design Publicity and Justification." Different ideas may be conveyed by demanding that a machine learning model be transparent, each focusing on different aspects of the model, its components and the training algorithm (Lipton 2018). On the other hand, post-hoc explanations focus on the outcome of the (learned) model; they include (Mittelstadt et al. 2019) natural language explanations, visualizations, case-based and counterfactual explanations, and local approximations, and they can be classified as model-specific or model-agnostic. Local approximations allow one, in particular, to explain why a black box model produced a selected prediction by approximating it with an interpretable model (e.g. a linear regression) around the prediction at hand (Ribeiro et al. 2016). We refer to the goal of post-hoc explanations of individual decisions as "model interpretability."

In this paper, we advance a new conceptualization of what explaining an algorithm amounts to. A key feature of our proposal is that we are fully explicit about the purposes that this mode of explanation is intended to achieve. Design publicity is intended to empower the public to debate all the key algorithmic design and testing choices relevant to assessing whether the decisions taken by such systems are justified. This enables the revision of such design choices as the public understanding of these ideas evolves. It is not intended to identify solutions deemed absolutely correct and incorrigible, to be enshrined once and forever in code. Justification will realistically always be imperfect and incomplete in spite of the best efforts put into it.

Design publicity takes the perspective of society (or of a regulator on behalf of society) rather than that of an actual, concrete, particular individual subject to algorithmic decisions. It does not ignore that society is made of individuals, but it assumes that individuals are able to take (or at least aspire to) an impersonal standpoint when judging such systems. We refer to the more abstract idea of impartiality, not to a specific account of it—on which contrasting philosophical proposals have been put forth—to characterize the type of perspective that we have in mind here.1

Footnote 1: This is a much weaker, open-ended, and vague account of public justification than the one offered by Binns (2018, p. 554), according to which a public justification must be grounded in public reason, implying that it "must be able to account for its system in normative and epistemic terms which all reasonable individuals in society could accept." For society, the account of what counts as public argument (i.e. in our view, one involving reasons relevant from an impartial standpoint) is meant to be as revisable and open-ended as every other assumption on which the justification of algorithms rests.

Given this normative standpoint, we address a specific limitation of those post-hoc explanations that identify a feature or limited set of features as the reason(s) why a decision about an individual case was made. Although the post-hoc mode of explanation can be useful to empower individuals in certain settings, we argue that it fails to explain why a decision based on certain features is normatively adequate and justifiable from an impartial standpoint. While post-hoc explanations do not provide useful elements to judge and debate the justification of such systems, we intend transparency as design publicity to enable that type of discussion.

In this paper, after providing some definitions (2), we highlight some limitations of interpretable algorithms by giving a prominent example of post-hoc interpretability methods, i.e. counterfactual explanations, drawing support from the recent literature. We then (3) propose a new concept of algorithmic transparency that overcomes the classical split between model transparency and interpretability, which we label "transparency as design publicity"; subsequently, (4) we argue that it provides a kind of explanation of algorithms' behavior: a teleological explanation, or explanation by design. This form of explanation tries to take into account the domain-specificities of the algorithm as well as the expertise, understanding and interests of its end-users. The special value of this explanation is that it links the behavior of an algorithm to its justification (5) and, when the algorithm is used consistently, to the procedural justice of its decisions.
Machine learning and algorithms: some definitions

In this section, we introduce some definitions that are relevant for the remainder of this contribution. The aim here is to provide the reader with an overview of some commonly used concepts in the most recent literature on philosophy of technology and artificial intelligence, without indulging (too much) in technicalities and jargon. We start with machine learning, a multidisciplinary field "concerned with the question of how to construct computer programs that automatically improve with experience" (Mitchell 1997). Machine learning draws on concepts from artificial intelligence, information theory, algorithmics and philosophy, among others. A machine learning problem "can be precisely defined as the problem of improving some measure of performance P when executing some task T, through some type of training experience E" (Mitchell 1997). Training experience E is represented by (digital) input data, which are preprocessed and formatted for the machine learning problem under consideration. Performance measures P can be off-the-shelf or ad hoc, that is, engineered by those responsible for the solution of the corresponding machine learning problem; they provide an estimation of the error made by the solution to the machine learning problem in executing T, using experience E.

Solving a machine learning problem consists of specifying a class H of mathematical constructs called machine learning models, to be trained on input data D using a set of algorithms implemented in computer-understandable programming languages (Mitchell 1997). Through the algorithms in the training process, the best machine learning model is trained or learned. The result of this process is an object in a programming language, embedded in an IT infrastructure to generate predictions on new data with the goal of assisting or automating decision-making; such a complex, dynamic computer system becomes a "cognitive engine" at the core of the products and services mentioned in (1). In the remainder of these notes, we will call this object an "algorithm;" in fact, this is an algorithm—i.e. a procedure or rule to compute predictions from input (new) data points—stemming from the training of machine learning models to solve a given machine learning problem. We will come back to the teleological nature of algorithms in (4).
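To make these definitions concrete, the following is a minimal sketch, not taken from the paper, that casts a generic binary classification problem in Mitchell's (T, P, E) terms using scikit-learn; the synthetic data, the choice of logistic regression as the model class H, and accuracy as the performance measure P are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Experience E: a (synthetic) preprocessed dataset for the problem at hand.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # labels defining task T

# Hold out test data so that performance P is estimated on unseen inputs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The model class H (here: logistic regression) is trained on the data D;
# the fitted object is the "algorithm" in the authors' sense: a rule that
# computes predictions from new input data points.
model = LogisticRegression().fit(X_train, y_train)

# Performance measure P: classification accuracy on the held-out test data.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```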
Post-hoc explanations of machine learning models

As discussed by Selbst and Barocas, "interpretability has received considerable attention in research and practice due to the widely held belief that there is a tension between how well a model will perform and how well humans will be able to interpret it" (Selbst & Barocas 2018). Following Lipton (2018), we refer to post-hoc interpretability as the provision of understandable explanations of machine learning model outcomes, also called predictions. Despite the proliferation of post-hoc interpretability tools in the literature on explainable artificial intelligence, we now explain more theoretically, with reference to prior work (Selbst & Barocas 2018), what post-hoc interpretability explanations intend to achieve and why they lead to partial understanding of the impersonally salient normative features of algorithmic systems.

We shall focus on a prominent example of explanations for post-hoc interpretability of machine learning models, i.e. counterfactual explanations, which recently drew attention in the artificial intelligence research community (Wachter et al. 2017). But we focus on limits of this approach that are shared with other post-hoc explanations.2 Counterfactual explanations are (1) "a novel type of explanation of automated decisions that overcomes many challenges facing current work on algorithmic interpretability and accountability" (Wachter et al. 2017), (2) "should be used as a means to provide explanations for individual decisions" (Wachter et al. 2017), and (3) "can bridge the gap between the interests of data subjects and data controllers that otherwise acts as a barrier to a legally binding right to explanation" (Wachter et al. 2017). For simplicity, we do not consider the theory of counterfactuals and causality, limiting our considerations to machine learning counterfactuals only.

Footnote 2: Additionally, counterfactual explanations may provide users with scenarios which cannot be realized in practice, as they violate, for example, the causal constraints between the model features used to generate the explanation ("lack of ontological stability"). In a big-data context, i.e. in the presence of hundreds or thousands of variables and synthetic data points, hard-coding causal constraints that reflect a priori criteria of plausibility or possibility into the synthetic data generating algorithm is an unviable strategy, due to the time needed for considering and implementing all possible scenarios.

Counterfactual explanations identify the explanation of a machine learning outcome with the provision of a set of factors, or model features, whose change in value alters the outcome under consideration, keeping all other factors equal (Wachter et al. 2017). By design, they highlight "a limited set of features that are most deserving of a decision subject's attention" (Barocas et al. 2020). Therefore, in the best-case scenario, counterfactual explanations provide users with actionable strategies to change the outcome into a more favorable one (recourse) as a response to a machine-generated decision (Ustun et al. 2019).3

Footnote 3: Moreover, like other post-hoc explanations, counterfactual explanations provide only local explanations of selected machine learning outcomes; moreover, the choice of the features to highlight may reflect subjective and potentially opaque preferences of the person in charge of providing counterfactual explanations to those demanding them ("selection bias"), and they are sensitive to perturbations of input data ("lack of robustness" (Hancox-Li 2020)).
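To fix intuitions about the kind of explanation at issue, here is a schematic sketch of a counterfactual search for a fitted classifier: it looks, one feature at a time, for the smallest change in value that flips the model's decision while keeping all other factors equal. It is only an illustration under assumed names (a fitted `model` and an instance `x`); it is not the optimization-based method of Wachter et al. (2017) or the recourse algorithm of Ustun et al. (2019).

```python
import numpy as np

def naive_counterfactual(model, x, deltas=np.linspace(-3, 3, 61)):
    """Search, one feature at a time, for the smallest change that flips the
    model's prediction for instance x, holding all other features fixed.
    Returns (feature_index, new_value) or None if no single-feature flip is found.
    """
    original = model.predict(x.reshape(1, -1))[0]
    best = None
    for j in range(x.shape[0]):
        for d in sorted(deltas, key=abs):  # try small changes first
            if d == 0:
                continue
            x_cf = x.copy()
            x_cf[j] = x[j] + d
            if model.predict(x_cf.reshape(1, -1))[0] != original:
                if best is None or abs(d) < abs(best[2]):
                    best = (j, x_cf[j], d)
                break  # smallest |d| that flips this feature has been found
    return None if best is None else (best[0], best[1])

# Hypothetical usage, reusing the fitted model from the earlier sketch:
# cf = naive_counterfactual(model, X_test[0])
# if cf is not None:
#     print(f"Changing feature {cf[0]} to {cf[1]:.2f} would alter the decision.")
```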
For the purpose of our discussion, the most salient limitation of counterfactual explanations, shared with other post-hoc explanations, follows from their being "feature-highlighting" (Barocas et al. 2020), i.e. these explanations provide "an explanation that seeks to educate the decision subject by pointing to specific features in the model that matter to the individual decision" (Barocas et al. 2020, p. 81).4 This way of educating the decision subject is silent about the reasons why the model makes the decision based on such (and other) features in the first place, e.g. why individuals in general are judged by such features. But this is highly relevant for one to evaluate whether the decision based on such features is justifiable. The explainee of counterfactual explanations must accept as a presupposition that the decision is (reasonably?) taken based on certain features. The explainee is modelled as having an exclusively self-centered, concrete, pragmatic interest in the specific features that are relevant for the decision about her or him. So the explainee is not provided with essential elements necessary to judge whether taking such decisions based on certain features is socially desirable or acceptable. In this context, the justification of the algorithmic practice is treated as irrelevant to the explainee.

Footnote 4: Where clearly, "each type of feature-highlighting explanation may define 'matter' differently" (Barocas et al. 2020, p. 81).

Counterfactual explanations lack normative informativeness, as it is not possible to infer normative properties of the model from explanations of individual decisions. As will be argued later, the most normatively important properties of model-based decisions emerge from repeated application of the model: they are properties of the kinds of patterns, e.g. the distribution of errors, or of benefits, between groups, or of groups defined by morally salient properties, that emerge when the law of large numbers applies, as well as the model's accuracy and ability to generate benefits (e.g. profit, or other forms of utility). A case in point of a morally salient property is indirect discrimination or disparate impact, which can be considered morally or legally relevant in certain contexts, but cannot be determined by reference to counterfactual explanations, because a protected group may be discriminated against via proxy (e.g. ZIP code can be a proxy of race), so the protected group information will not appear as a feature in the model. Interestingly, the opposite misunderstanding may also occur. A counterfactual explanation may show that a decision, e.g. concerning a loan, would have been different had the individual been of a different race. This explanation may suggest unfairness, even when the intention behind using information about the protected group is to make the prediction fairer, e.g. the information is used to ensure the statistical property of separation (Hardt et al. 2016).

Summing up, counterfactual explanations do represent a practical strategy for explanation in the presence of few variables and scenario choices, where the assumption that the model makes decisions that are normatively appropriate is taken for granted. In what follows, we provide a model of transparency that relies on explanations that are relevant for the justification of algorithmic decisions and, thus, their public acceptability.5 We do not maintain that transparency as design publicity—the approach we propose—fulfills all the desiderata various authors have associated with explainable and interpretable AI. Our transparency idea serves a particular purpose: that of normative justification. It provides the kind of explanation which is useful for the public to assess if the deployment of algorithmic decision-making and the decisions following from it is justifiable.

Footnote 5: By public acceptability we do not mean public in the sense of Rawlsian public reason (Binns 2018; Rawls 1996), which involves standards of justification that can be shared by individuals with different conceptions of the good sharing a commitment to core liberal and democratic values and principles. We assume that different standards of justification will be employed in different contexts and by different publics.

Design explanation of algorithms

As shown by Kroll (2018), the thesis that the understanding and transparency of algorithmic-assisted decision-making is limited by the inscrutability of the machine learning models and their algorithms (i.e. the fact that they are opaque or "black boxes") is criticizable. The debate on algorithmic inscrutability mostly depends on the meaning we attribute to the expression "explaining the model" and, accordingly, "understanding the model."

Explanation—the process and product (Ruben 2012) of making something understandable—has many meanings: definition, interpretation, individuation of necessary conditions or sufficient conditions, of purposes, of functions, and of goals. An explanation is effective when the x that is explained is clear and open to the people who want to understand x. An effective explanation renders an object understandable, and its understandability contributes to the transparency of the object, i.e. the quality of being easy to see through, analyze, and assess.

The explanation of the behavior of an algorithmic system has not only different meanings but also different levels of abstraction to which it can refer (Floridi & Sanders 2004). For example, if we consider a low level of abstraction for the algorithmic system by focusing on its core mechanics, then explainability strategies will focus on its functioning, both from a theoretical perspective (e.g. considering the machine learning model, including the optimization procedures for learning) and from a more engineering-oriented one (e.g. the software running the machine learning training and the algorithm deployment). However, at this level of abstraction, explainability may be difficult to reach even for computer scientists and engineers (Lipton 2018). On the other hand, one could consider a level of abstraction where explanations clarify the purpose of algorithms; these would be understandable to the public, from end users with low expertise, to policymakers in need of justifying the use of algorithmic-assisted decision-making, to corporate executives adopting such models, to the computer scientists and engineers who design them.

We define a design explanation of an algorithmic system to be the explanation of what such a system does, which essentially describes the ability of the system to achieve a given purpose. The design explanation of an algorithmic system is an explanation by intelligent design, namely it explains an x by referring to that for the sake of which x was created. This explanation is more abstract than the mechanistic one and corresponds to Dennett's design stance, namely the intellectual strategy by which we explain the behavior of a system by referring to its purpose and intentional design (Dennett 1987). Design explanations are teleological and focus on the final cause of a system (Aristotle 2016, 2018).6

Footnote 6: The final cause described by Aristotle can be used to explain the behavior of entities with no psychological states (desires, beliefs, conscious purposes, etc.) such as algorithmic systems, as Aristotle applies the teleological model of explanation to natural processes, which have no psychological states ((Broadie 1987), (Gotthelf 1976)).

Design explanation is applicable to algorithms as the latter are goal-directed, human artifacts produced in a specific sociotechnical context (Baker 2004). In the design explanation of a common object such as a chair, we provide the reasons for which the chair was designed as such: being stable and comfortable; these goals directed the design of the chair and explain why, respectively, the chair has four or three legs and has an ergonomic or flat surface in the spot in which we sit down. The design explanation of an algorithm comprises "the understanding of what the algorithm was designed to do, how it was designed to do that, and why it was designed in that particular way instead of some other way" (Kroll 2018). In other words, explaining the purpose of an algorithm requires giving information on various elements: the goal that the algorithm pursues, the mathematical constructs into which the goal is translated in order to be implemented in the algorithm, and the tests and the data with which the performance of the algorithm was verified.

We define the design transparency of an algorithmic system to be the adequate communication of the essential information necessary to provide a satisfactory design explanation of such a system.7 As design explanation is made of different elements, so design transparency can be split into various components: value transparency, translation transparency, and performance transparency, as we will now show.

Footnote 7: What counts as satisfactory in a given context may vary from context to context, also depending on the stakes of public justification, discussed in Sect. 4.

The goal of an algorithm is something valuable that is achieved. Since it is something that is desired by a person or group, we can also call it a good or value for that person or group. Value transparency should also indicate why and for whom the goal is valuable, when this is not obvious from the context. The design goal (e.g. identifying the most profitable clients, minimizing hospital readmissions) is typically also the goal of the person who decides to employ the artifact in practice. Thus, it also figures in the intentional explanation of the action to develop, or purchase, and employ the AI, by the persons accountable for such decisions. Thus, the design explanation should indicate the goal—the reasons or motivations—of the computer scientists and engineers who designed the algorithm and of the persons accountable for its employment in real-world settings. These goals should be one and the same; when this is not the case, the artifact does not respond to the reasons of the persons who are supposed to have meaningful human control (Santoni de Sio & Van den Hoven 2018) over it. This is problematic for accountability. The goal of an algorithm is usually a practical objective, such as profit or the efficient allocation of scarce resources, but it can include moral values such as equity, beneficence, trustworthiness, and the rules that are socially accepted as pertinent for the domain in which the model is employed. In both cases, the goal introduces normativity into the model, as it represents something that there are good reasons to pursue. Hence, normative choices are made both when normative standards are explicitly invoked in the design of a model and when they are ignored. As Binns points out:

[W]hen attempting to modify a model to remove algorithmic discrimination on the basis of race, gender, religion or other protected attributes, the data scientist will inevitably have to embed, in a mathematically explicit way, a set of moral and political constraints (Binns 2018).

The goals or values that guide the design of algorithmic models should therefore be included in an explanation of such models. Value transparency is the result of an explanation that makes the standards, norms, and goals that were implemented in the system accessible. These normative elements should also correspond to the reasons for which the system was deployed.

The goal of an algorithmic system needs to be translated into something that is measured: a set of rules with which the algorithm elaborates inputs and produces outputs. A machine learning algorithm requires the quantification of the goal because, in particular, the algorithm that generates the model needs to quantify the departure of several potential candidate models from the model objectives. There is no straightforward and unique way to translate a goal into a mathematical construct. For this reason, making such a translation a publicly verifiable criterion provides the public and the scientific community with the information to assess how a given goal is operationalized in machine language. Making this piece of information public constitutes translation transparency, which is part of design transparency. In applications, it is possible to have alternative translations of the same goal into machine language. For example, let us consider the problem of designing a predictive model of customer churn8 for an airline company. The goal is to design and implement a predictive model of customer churn in order to assess the future profitability of a given portfolio of customers. However, in the case of an airline company, the business concept of "churn" could be translated into different sets of computer-understandable rules. In one case, one could simply define a customer as churned if no revenue is generated by the customer in a given year of interest. Alternatively, one could define churn as the absence of revenue in a given year of interest combined with the lack of flying activity (i.e. avoiding the case of zero revenue generated by customers flying using promotions). Both choices lead to alternative implementations of the same business goal. Design transparency recommends explaining the definition of churn and its motivations to the public.

Footnote 8: To churn, or to lapse, is the activity of moving out of a given group. In business, it refers to the activity of customers moving out of portfolios. Predictive models of customer churn are important for organizations to predict the volumes of portfolios in (future) timeframes and to assess their (future) profitability.
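The two translations of churn described above can be written down as explicit, publicly inspectable rules. The sketch below is our own illustration; the field names (yearly revenue, flights taken) are assumptions and not part of the original example.

```python
# Two alternative, publicly declarable translations of the business concept
# "the customer has churned" for a given year of interest.

def churned_v1(yearly_revenue: float) -> bool:
    # Translation 1: churn = no revenue generated in the year of interest.
    return yearly_revenue == 0.0

def churned_v2(yearly_revenue: float, flights_taken: int) -> bool:
    # Translation 2: churn = no revenue AND no flying activity, so that
    # zero-revenue customers flying on promotions do not count as churned.
    return yearly_revenue == 0.0 and flights_taken == 0

# The same customer can be classified differently under the two rules:
print(churned_v1(0.0), churned_v2(0.0, flights_taken=3))  # True, False
```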
Another example is the goal of fairness, or avoiding discrimination. Different definitions of fairness for predictive models exist, and it is often impossible to satisfy all of them simultaneously (Berk et al. 2018). Design transparency requires declaring which fairness definition has been adopted and, if possible, providing a justification of that choice.

Once the criteria to measure the degree of goal achievement are specified, a design explanation of an algorithm should provide information on the effective achievement of such objectives in the environment for which the system was built. In fact, for instance, the mere implementation of the most advanced norms of equal treatment in a credit-granting system does not guarantee that the system will be effectively impartial. The impact of the algorithm and its outcomes needs to be considered. Performance transparency consists in indicating the logic with which the algorithm has been tested in order to verify how much it departed from achieving the goal, and in indicating the results of such testing, starting with the choice of performance measures used both in the training phase and during the assessment of the model on test data.9 These latter are data that have not been used during training and whose scope is to assess the adaptability of the model to unseen inputs. The test data are part of performance transparency, as their choice and quality, which can be subject to biases, influence the performance measure and thus the assessment of the algorithm. If performance assessment is not robust in contexts with a different causal regime (e.g. hospital data from Brazil vs. from Japan), transparency about the test data may reveal limitations of performance claims.

Footnote 9: Training and test data are often the result of a random split of an original set of data used for modelling purposes. This implies that the object resulting from training, whose outcomes are the object of the explainability analysis, is in reality a pair consisting of the model and a random seed, which is the integer value chosen by the analyst that governs the randomness in the routines leading to the training of the model itself.

In summary, an algorithmic system has the property of design transparency if and only if it provides the public with the goal of the algorithm (value transparency), how this goal was translated into programming language (translation transparency), how the algorithm's rule achieves that goal, and how the goal achievement has been assessed (performance transparency). The division of design transparency into three components (value, translation, performance) enables the analysis of the algorithm at three different levels of explanation, which, as we will see, also constitute three levels that a design explanation has to address.10

Footnote 10: This three-level approach has a similar structure to that of the incremental model designed by Castelluccia and Le Métayer Inria (2020). This model provides a methodology for assessing the impact of face recognition (FR) systems and is constituted by four levels of analysis: the goal of the system, the means to achieve it, the suitability of the use of FR systems to achieve the means, and the suitability of a specific technology to achieve the means. However, the objective and scope of our approach are different: while Castelluccia and Le Métayer Inria propose a model for evaluating all the potential impacts of FR systems (and more in general AI) on society, we provide a theoretical tool for evaluating the transparency of ML models and whether their use is justified within a society. Furthermore, a comparable three-level distinction was employed in safety testing by DeepMind (Ortega et al., 2018). The DeepMind approach distinguishes three concepts of specification, corresponding to what we call "value", "translation", and "performance", which are respectively the ideal specification (the general description of an ideal AI system that is fully aligned with the desires of the human operator), the design specification (the specification that is actually used to build the AI system), and the revealed specification (the specification that best describes the system's behavior). However, unlike our model, DeepMind applies these three levels specifically to the security tests that an AI company should conduct as an internal practice, and it does not intend these levels to be used to describe the system for the sake of transparency.

We illustrate with an example what the approach of design transparency requires in practice. We shall argue only in the next section that such design transparency is essential for design publicity, i.e. for debating the justifiability of the algorithmic practice in question. The owners of a stadium have to decide whether to adopt a face recognition (FR) system at the entrance of the stadium to prevent terrorist attacks.

For value transparency, the FR decision to block an individual is explained by pointing out the design goals of the system. The primary goal is to prevent a terrorist attack in a place in which many people gather. Moreover, it is likely that the goal of the system includes the properties that the system be reliable, fair, and avoid excessive disruptions to the use of the stadium.

For translation transparency, the owner of the stadium, as well as independent auditors and legal authorities, should have easy access to a lower-level (i.e. more detailed) description of the implementation of the goal and constraints in the FR system. Thus, the vendors of the FR system should make public that the system detects faces in real time and compares the faces of people at the stadium entrance with others stored in a database containing the pictures of terrorists, by extracting facial features. The vendor should also explain the mathematical balance between the goals of security and non-disruption of business, e.g. that the social disutility of allowing an actual terrorist in (a false negative) is regarded as equivalent to that of preventing an actual non-terrorist from seeing the match (a false positive). Thus—the vendor could explain—the algorithm is trained to maximize classification accuracy (i.e. the percentage of correct predictions), ignoring the distinction between type-I and type-II errors.11 Furthermore, let us assume that the fairness of the system is translated with the mathematical notion of equality in the false-positive and false-negative rates across all the legally defined race and gender groups.

Footnote 11: The alternative being a performance measure assigning a different weight to the avoidance of type-I and type-II errors (Kraemer et al. 2011; Corbett-Davies et al. 2017).
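A fairness translation of the kind just assumed (equality of false-positive and false-negative rates across legally defined groups) can itself be published as runnable code that auditors can apply to the system's predictions. The sketch below is an illustration under assumed array names and encodings, not the vendor's actual test suite.

```python
import numpy as np

def rates_by_group(y_true, y_pred, group):
    """Per-group false-positive and false-negative rates.

    Assumed encoding (illustrative): y_true = 1 for an actual terrorist,
    y_pred = 1 when the FR system blocks the person, and `group` holds the
    legally defined group label of each person.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        m = group == g
        neg = y_true[m] == 0
        pos = y_true[m] == 1
        fpr = float((y_pred[m][neg] == 1).mean()) if neg.any() else float("nan")
        fnr = float((y_pred[m][pos] == 0).mean()) if pos.any() else float("nan")
        rates[g] = {"FPR": fpr, "FNR": fnr}
    return rates

# The declared fairness translation is then checked by comparing FPR and FNR
# across groups; the declared primary measure (overall accuracy) is simply
# (y_true == y_pred).mean().
```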
For performance transparency, the vendor should provide actual measures of the relevant performance metrics, which should be coherent with the translation assumptions above (i.e., classification accuracy and a meaningful comparison of group-related false-positive and false-negative rates). To make sense of the robustness of performance measures, the vendor should also provide information about the type of data with which the system was trained, whether the data were tested for possible biases (e.g. if there were only few faces of a given ethnicity), the confidence level of the human coders classifying the pictures with which the system was trained, and the contexts in which the system works better (e.g. with pictures of men instead of women, or only with good lighting and high resolution). Additionally, to achieve performance transparency, one would require information on how the system performance was assessed (e.g. disclosing information on the partition of data into training and test subsets), including the specification of whether the performance was assessed in the same context used for training or in a totally different environment. Notice that in this example, every level of design transparency consists of objectionable claims, exposing accountable parties to criticism by experts and the broader public. E.g. security experts may object that false negatives are far more important than false positives, ethics experts may object to translating fairness, in this context, as equality in the false-positive and false-negative rates, and NGOs may point to racial biases in the way databases of terrorist faces are built.

An important step in addition to design transparency concerns the explanation of the singular decisions by the artifact, which should be distinguished logically from the nature of the artifact itself. The algorithm's performance connects the explanation of the artifact (i.e. an algorithm, or rule) with the application of the rule to particular cases. The simple solution is to view each individual decision as a means through which the artifact achieves the overall goal for which it has been designed. This explanation is however problematic in the light of the fact that, when algorithmic decisions are based on statistical predictions, they will often fail to decide in a way that directly promotes the goal the model is designed to achieve. E.g. a loan is refused to someone willing and able to repay it, an inmate who will not reoffend is denied parole, a patient is prematurely released from the hospital, causing readmission. This is because decisions based on imperfect predictions about stochastic events will typically be often wrong, but sufficiently often right to justify the use of the model in practice. In the next section we are going to show why even the statistical imperfection of a model can be justified by appealing to its design goals and the trade-offs between all the values pertaining to the justification of its use.

There is a further type of transparency—consistency transparency—that contributes to explaining individual decisions by algorithms, given the assumption that the employment of such systems should be minimally fair. Consistency transparency is showing proof that consistency is achieved, i.e. that the algorithm always generates predictions by the same rules even when we cannot observe those rules in operation. Consistency is not a feature of the model but of its deployment. It does not contribute to explaining why the model works in a certain way, but why certain decisions are made (namely, they result from applying the model consistently). Consistency can even be a property of the deployment of an algorithm that applies a discriminatory rule, such as filtering job candidates by their residence address. Nonetheless, as consistency transparency shows that identical cases are treated identically, it represents a first step towards fairness; it is a sort of basic requirement of fairness that, as we shall show, is necessary but not sufficient to justify each decision as fair.

In some cases, models are unidentifiable, by which we mean that in most AI-powered solutions the underlying machine learning models are updated (i.e. retrained) with frequencies that depend on the domain of applicability of the solution itself. This implies that an AI potentially generates different outcomes for the same end user, depending on the moment at which the outcome is generated: any explanation of this outcome (for the purpose of contesting or auditing it, for example) depends on time as well. Consistency transparency requires that changes in a model be declared because, as we shall maintain, this is relevant for their justification. Consistency is a normative goal, and showing that it is achieved by the model contributes to explaining why an individual decision is made—namely, by showing that it is explained by a normative consideration. Conversely, the failure to satisfy consistency implies that the decisions of the model can be challenged on a specific normative ground.
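One very simple way to make consistency auditable, offered here only as an illustration and much weaker than the cryptographic accountability techniques discussed by Kroll et al. (2017) later in the paper, is to record with every decision a fingerprint of the exact model artifact that produced it: identical fingerprints witness that the same rule was applied, and any retraining shows up as a declared change of fingerprint.

```python
import hashlib
import json
import pickle
from datetime import datetime, timezone

def model_fingerprint(model) -> str:
    # Hash of the serialized model object: two decisions made by byte-identical
    # models share the same fingerprint. (Illustrative only; the serialization
    # must itself be deterministic for the fingerprint to be meaningful.)
    return hashlib.sha256(pickle.dumps(model)).hexdigest()

def log_decision(model, case_id, prediction, logfile="decision_log.jsonl"):
    # Append an auditable record linking the decision to the model version.
    record = {
        "case_id": case_id,
        "prediction": int(prediction),
        "model_fingerprint": model_fingerprint(model),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```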
In conclusion, the design explanation of the model shows that an algorithmic model gives a decisional outcome because the model pursues a certain goal (value transparency), which is translated into mathematical constructs implemented in the algorithm (translation transparency), which in turn enables one to verify whether the model achieves the goal (performance transparency). When, as in most cases, consistency is among the reasonable goals of model deployment, the explanation of the decisions by the model includes consistency transparency. We hypothesize that in achieving design transparency, one can take into account domain-specific features of algorithms, as well as the level of expertise, knowledge and interests of their end-users. These factors are currently considered important in the scientific debate around explainable AI, model transparency and interpretability.12 This, together with the use of results from psychology and cognitive science to improve the understanding of the processes behind model interpretation by end-users (Miller 2019), represents a viable strategy to avoid what Miller et al. refer to as "inmates running the asylum" (Miller et al. 2017).

Footnote 12: The multi-faceted nature of transparency of algorithms is discussed in (Pégny & Ibnouhsein, 2018), where the authors describe the distinction between its epistemic (e.g. intelligibility and explicability) and normative (e.g. loyalty and fairness) desiderata.

Design publicity and justification

We define design publicity as the adequate communication of the essential elements needed for determining whether a decision driven or made by an algorithmic system is justified. (The judgments in question are meant to be impartial and to enable an informed public discussion about the use of such systems.)

In what follows, we argue that both design transparency and consistency transparency, as defined in "Design explanation of algorithms," are necessary for design publicity, because they are necessary to assess if and how the decision taken by (or with the help of) an algorithmic system is justified (when it is).

Design publicity provides information about (a) the goal the algorithm is designed to pursue and the moral constraints it is designed to respect (value transparency); (b) the way this goal is translated into a problem that can be solved by machine learning (translation transparency); (c) the performance of the algorithm in addressing that problem (performance transparency); and (d) a proof of the fact that decisions are taken by consistently applying the same algorithm (consistency transparency). Let us now consider how each of these elements contributes to the justification of using an algorithm and of the decisions that follow from its use.
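Read as a documentation requirement, items (a) to (d) could be collected in a single publicly releasable record. The sketch below is one hypothetical way to structure such a record, loosely in the spirit of existing model-documentation proposals; every field name and every concrete value is our own illustrative placeholder, not something prescribed by the paper.

```python
from dataclasses import dataclass

@dataclass
class DesignPublicityRecord:
    # (a) value transparency: the goal pursued and the constraints respected.
    goal: str
    constraints: list[str]
    # (b) translation transparency: how goal and constraints are operationalized.
    goal_metric: str
    constraint_metrics: dict[str, str]
    # (c) performance transparency: measured results and how they were obtained.
    performance: dict[str, float]
    test_data_description: str
    # (d) consistency transparency: evidence that the same rule is applied.
    consistency_evidence: str

# Hypothetical record for the face recognition example; all values are placeholders.
record = DesignPublicityRecord(
    goal="prevent terrorist attacks at the stadium entrance",
    constraints=["fairness across legally defined groups", "limited disruption of access"],
    goal_metric="classification accuracy",
    constraint_metrics={"fairness": "equal false-positive and false-negative rates across groups"},
    performance={"accuracy": 0.97, "max_FPR_gap": 0.02, "max_FNR_gap": 0.03},
    test_data_description="held-out data, audited for group coverage and labelling quality",
    consistency_evidence="per-decision log of the deployed model's fingerprint",
)
```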
Let us begin with the goal or goals the algorithm is designed to pursue. All algorithms are designed to pursue a primary goal (e.g. a business objective); some more advanced algorithms are also designed to take into consideration a plurality of different values, such as fairness or privacy, which can often be conceptualized as moral or legal constraints. Constraints are typically in trade-off with the primary goal and affect the way and the extent to which the primary goal can be achieved. For the sake of simplicity, we will refer to both goals and constraints as goals in what follows.

The first step of the justification of decisions taken by an algorithm, thus, requires evaluating the goals and constraints that the algorithm, respectively, achieves and respects. In a justified algorithm, they reflect those values and constraints a reasonable person may want to see promoted or respected in the context of a service.

The primary goal of the algorithm is essential to show that decisions are not morally arbitrary. Publicly recognized forms of social utility, such as security (in the FR example) or profit, may fill this purpose. Primary goals matter to justification when they are valuable goals, e.g. there are good reasons to pursue such goals, which can be explained by reference to values commonly accepted in society, including moral, political or legal values, as well as the profit generated by free market exchanges, in capitalist societies that rely on profit as the motive stimulating socially efficient economic activity, resource allocation, and risk taking. Other goals (the "constraining" goals) typically reflect value considerations, e.g. privacy or fairness. Different types of justification are possible, for example in terms of common or philosophical morality, of the law, or by virtue of political principles and values that may be universal or characteristic of the society in which the model operates. Take, for example, anti-discrimination as the general name of a value that society expects from a FR service, and that contributes to defining the goal of the algorithm (in this case, by constraining the distribution of errors in the population affected by algorithmically driven decisions). Value transparency requires that these normative goals and the reasons for considering them are clearly specified—i.e. that the choice of such normative goals is not a mere arbitrary decision by the data scientists. It contributes to the ability of the public to understand and assess the validity of a potential justification for accepting decisions taken by a model pursuing such a goal. If the goals and constraints pursued by a model do not reflect values worth pursuing, the decisions following from the model are not justified.

An algorithm pursuing such goals will achieve them to a determined degree, which is expressed by "performance transparency." The performance can only be assessed by translating the goals in question into measurable quantities. This exercise of translation is not trivial. With reference to the FR example above, the translation of a moral constraint (e.g. anti-discrimination) into a quantifiable performance measure (e.g. equality in the false-positive and false-negative rates across racial and sex groups) should be given a normative grounding, and not simply be assumed. If the value translation is not declared and no reason to accept it is given, the decisions of the model are not even prima facie justified.

Performance transparency is especially important when there are trade-offs between different values simultaneously pursued by a model. Performance metrics provide an important indication of the extent to which every value has been achieved, which is especially important for the overall justification of the system when a value can only be achieved at the expense of another value. For example, fairness can only be pursued at the expense of efficiency (Corbett-Davies et al. 2017; Wong 2019). Performance transparency provides an indication of the degree to which both values, efficiency and (quantified) fairness, have been sacrificed.

Notice that design publicity does not require that the individuals who are accountable for algorithmic decisions provide fully persuasive and non-corrigible justifications. It is sufficient that they declare what they take to be the relevant elements, exposing themselves to public scrutiny, as the above-mentioned FR case exemplifies. As Pak-Hang Wong observes, "the idea of […] algorithmic fairness is […] contestable […] there is a great number of definitions of what […] algorithmic fairness amounts to, and it seems unlikely for researchers […] to settle on the definition of fairness any time soon" (Wong 2019). Design publicity is intended to empower the public to debate such choices too, so as to enable their revision. This is compatible with the idea of the perfectibility of the public justification of algorithms over time, which is what we intend to enable through design publicity.

There is still a gap in the justification of individual decisions. As anticipated, the fact that prediction-based decisions will often be wrong can be justified. In the case of machine learning-driven algorithms, individual mistreatment happens because the information necessary to always make perfect predictions does not exist. And even the information required to make a model more accurate may be too costly to collect, or cannot be collected in morally permissible ways. It is known that value-driven design that considers privacy and non-discrimination pays a price in terms of predictive accuracy (Hajian et al. 2015) and efficiency (Corbett-Davies et al. 2017). A further reason why errors are unavoidable is that some outcomes result from human free will, for example, success during parole. The same considerations (of cost, privacy, or fairness) justify statistical decisions that rely on incomplete information, even when it is theoretically possible to collect and analyze all the information that matters, in principle, if one is to treat each individual case "as a distinct individual" (Lippert-Rasmussen 2010).

An individual subjected to an unfavorable decision may accept, in principle, that the algorithm is justified as a whole, yet challenge the necessity of implementing the model when taking a decision about him. The particular individual may argue: "I understand that the algorithm achieves these goals and that it does so in a reasonable way. But why can't you make an exception for me?". This would violate consistency. For example, suppose that a software system is used to randomize access to scarce life-saving resources in a hospital of a dystopian country. This software translates fairness into a basic mathematical condition, namely equal chances of getting the resource in question. This goal can be achieved by an algorithm whose outcome is completely random. Yet consistency would be violated if, when the case of the head physician's son is submitted to it, the randomized model is no longer used by the person in charge, who recognizes the head physician's son and assigns the resource to him. In this case, the software does not satisfy consistency.

The violation of consistency for an arbitrary reason (e.g. the case of the head physician's son) is incompatible with equal respect; on the other hand, if the same exception were made for everyone who had an interest in demanding it, the algorithm wouldn't achieve its design goals, which justify its use. The violation of consistency is incompatible with formal justice, i.e., "the impartial and consistent administration of laws and institutions" (Rawls 1999), applied to the algorithm considered as a law, or as an institution. This is why—we maintain—algorithms that change their identity as they are used are normatively problematic in high-stakes decisions. In such cases, any change due to retraining should at least be publicized, and justified, by pointing out a considerable improvement in performance which overrides consistency concerns.

When the design of an algorithm is justified, then, if the algorithm is also used consistently, we obtain a procedural justification of all the decisions that follow from it. To explain this kind of justification, we draw on Rawls's idea of the justification of individual shares of the goods produced by cooperation (Rawls 1999). Rawls rejects the idea of allocative justice, namely, he rejects describing justice as a property of the end-state of a process of distribution of goods, a property independent from how that distribution came about. For example, an end-state distribution is just, according to a resource egalitarian account, if and only if resources are equally distributed; according to a meritocratic account, if and only if resources are proportional to each person's contribution to society; and according to a utilitarian one, if and only if the distribution maximizes utility. As Nozick (1974) observes, these allocative end-states are undermined by processes, like markets, that are not fully deterministic, because they are perturbed by free human decisions. In Nozick's slogan, liberty upsets patterns. Importantly, this applies to many cases of algorithmic decision-making, where the outcomes that justify the decision are future events that depend on the free will of an individual.
According to non-compatibilist libertarianism, for example, believing that an inmate's success on parole could be predicted with perfect precision is tantamount to denying that the inmate has free will.

This gives us a moral reason to consider Rawls's procedural alternative to end-state conceptions. In this case, distributive shares are just if they result from just institutions. But unlike Nozick, Rawls relates the justification of institutions to the outcomes they tend to bring about, their general statistical tendencies, considered from a suitably general perspective. As in Hume, it is the "general scheme or system of action, which is advantageous" (Hume et al. 2000); not every single decision is considered individually. The outcomes which justify the institutions are characterized by Rawlsian principles of justice. Rawls (1999), for example, requires economic institutions as a whole (including taxation) to maximize the expectations of the worst-off groups in society. If institutions are justified, and if they are consistently and impartially applied, then the outcomes of free human decisions constrained by institutions are just, whatever they are.

For algorithmic decision-making, the principles of justice correspond to its design goals. The design goals of the algorithm are that which justifies an algorithm which amounts to specific rules (including inscrutably complex ones). To assess whether inscrutable algorithms satisfy their "principles of justice" we consider their performance. If they do, the consistent and impartial application of the algorithm to individual cases corresponds to the consistent and impartial administration of just institutions. Summing up, in a word: we are bound by procedural justice to accept as just only consistent decisions that result from the application of an algorithm that is justified by design.

Notice that, in the institutional case, the fact that institutions are administered consistently and impartially is a public fact. This publicity is achieved thanks to special procedures. E.g. the consistent fulfillment of the legal obligations emerging from civil law can be tested by going to court. In the algorithmic case, the consistent application of inscrutably complex rules appears to lack transparency. The solution to this is to provide a technical solution that delivers a proof that the rules are followed—that is, consistency—even when the rules themselves are not transparent to anyone because the algorithm is a black box; it appears that this is indeed technically feasible (Kroll et al. 2017).

Conclusion

In this paper, we discuss what it means to achieve transparency for machine learning algorithms, i.e. the provision of explanations to see through, analyze, and assess artifacts trained on data via machine learning methods and generating predictions to assist or automate decision-making. We propose a form of transparency that consists in publicizing the design of an artifact (including value, translation and performance) as well as its consistent application. We maintain that this kind of transparency provides (1) an explanation of the artifact, namely, an explanation "by design"; (2) an intentional explanation of its deployment; (3) a justification of its use; and (4), when the artifact is used consistently, a procedural justification of the individual decisions it takes.

The proposed approach to algorithmic transparency deviates from the existing body of literature on explainable artificial intelligence (xAI), where the concept of transparency focuses on the explanation of the inner workings of algorithms or the interpretability of their individual outcomes (Lipton 2018; Ribeiro et al. 2016). We do not claim here that transparency as design publicity achieves the goals that these approaches are said to achieve. Rather, we stress that transparency as design publicity achieves a distinct goal, namely, providing the public with the essential elements that are needed in order to assess the justification (and, when consistency is satisfied, the procedural justice) of the decisions that follow from its deployment.

Acknowledgements We wish to thank Maël Pégny, our audience at the 2019 CEPE (Computer Ethics - Philosophical Enquiry) Conference, in particular prof. dr. Philip Brey and dr. Paul B. de Laat, and two anonymous referees of the FAT* conference 2020, for insightful comments on a previous version of this paper.

Funding Open access funding provided by University of Zurich. This work was supported by the European Union's Horizon 2020 Research and Innovation Programme (Grant No. 700540), the Swiss State Secretariat for Education, Research and Innovation (Grant No. 16.0052-1) and the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (National Research Programme 75 "Big Data", Grant No. 407540_167218).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Aristotle (2016). Metaphysics (C. D. C. Reeve, trans.). Indianapolis, USA: Hackett Publishing Company.
Aristotle (2018). Physics (C. D. C. Reeve, trans.). Indianapolis, USA: Hackett Publishing Company.
Baker, L. R. (2004). The ontology of artifacts. Philosophical Explorations, 7(2), 99–111. https://doi.org/10.1080/13869790410001694462.
Barocas, S., Selbst, A. D., & Raghavan, M. (2020). The hidden assumptions behind counterfactual explanations and principal reasons. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3351095.3372830.
Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2018). Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research. https://doi.org/10.1177/0049124118782533.
Binns, R. (2018). Algorithmic accountability and public reason. Philosophy & Technology, 31(4), 543–556. https://doi.org/10.1007/s13347-017-0263-5.
Broadie, S. (1987). Nature, craft and phronesis in Aristotle. Philosophical Topics, 15(2), 35–50.
Castelluccia, C., & Le Métayer Inria, D. (2020). Impact analysis of facial recognition: Towards a rigorous methodology. HAL, hal-02480647.
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., & Huq, A. (2017). Algorithmic decision making and the cost of fairness. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/3097983.3098095.
Dennett, D. C. (1987). The intentional stance. Cambridge, USA: MIT Press.
Floridi, L., & Sanders, J. W. (2004). On the morality of artificial agents. Minds and Machines, 14(3), 349–379. https://doi.org/10.1023/B:MIND.0000035461.63578.9d.
Gotthelf, A. (1976). Aristotle's conception of final causality. Review of Metaphysics, 30(2), 226–254.
Hajian, S., Domingo-Ferrer, J., Monreale, A., Pedreschi, D., & Giannotti, F. (2015). Discrimination- and privacy-aware patterns. Data Mining and Knowledge Discovery, 29(6), 1733–1782. https://doi.org/10.1007/s10618-014-0393-7.
Hancox-Li, L. (2020). Robustness in machine learning explanations: Does it matter? Proceedings of the Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3351095.3372836.
Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 3315–3323.
Hume, D., Norton, D. F., & Norton, M. J. (Eds.). (2000). A Treatise of Human Nature: Being an Attempt to Introduce the Experimental Method of Reasoning into Moral Subjects. Oxford, UK: Oxford University Press.
Kraemer, F., van Overveld, K., & Peterson, M. (2011). Is there an ethics of algorithms? Ethics and Information Technology, 13(3), 251–260. https://doi.org/10.1007/s10676-010-9233-7.
Kroll, J., Huey, J., Barocas, S., Felten, E., Reidenberg, J., Robinson, D., et al. (2017). Accountable algorithms. University of Pennsylvania Law Review, 165(3), 633.
Kroll, J. A. (2018). The fallacy of inscrutability. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(2133), 20180084. https://doi.org/10.1098/rsta.2018.0084.
Lippert-Rasmussen, K. (2010). "We are all different": Statistical discrimination and the right to be treated as an individual. The Journal of Ethics, 15(1), 47–59.
Lipton, Z. C. (2018). The mythos of model interpretability. Queue, 16(3), 31–57.
Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38. https://doi.org/10.1016/j.artint.2018.07.007.
Miller, T., Howe, P., & Sonenberg, L. (2017). Explainable AI: Beware of inmates running the asylum. Or: How I learnt to stop worrying and love the social and behavioural sciences. https://arxiv.org/abs/1712.00547. Accessed 13 Aug 2020.
Mitchell, T. M. (1997). Machine Learning (1st ed.). NY, USA: McGraw-Hill Education.
Mittelstadt, B., Russell, C., & Wachter, S. (2019). Explaining explanations in AI. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 279–288. https://doi.org/10.1145/3287560.3287574.
Nozick, R. (1974). Anarchy, State, and Utopia. Basic Books.
Ortega, P. A., Maini, V., & DeepMind safety team. (2018). Building safe artificial intelligence: Specification, robustness, and assurance. https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1. Accessed 13 Aug 2020.
Pégny, M., & Ibnouhsein, M. I. (2018). Quelle transparence pour les algorithmes d'apprentissage machine? [What transparency for machine learning algorithms?]. https://hal.inria.fr/hal-01791021. Accessed 13 Aug 2020.
Rawls, J. (1996). Political Liberalism (Expanded ed.). NY, USA: Columbia University Press.
Rawls, J. (1999). A Theory of Justice (2nd ed.). Cambridge, USA: Harvard University Press.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2939672.2939778.
Ruben, D.-H. (2012). Explaining Explanation (Updated and expanded 2nd ed.). Paradigm Publishers.
Santoni de Sio, F., & Van den Hoven, J. (2018). Meaningful human control over autonomous systems: A philosophical account. Frontiers in Robotics and AI. https://doi.org/10.3389/frobt.2018.00015.
Selbst, A. D., & Barocas, S. (2018). The intuitive appeal of explainable machines. Fordham Law Review, 87, 1085.
Ustun, B., Spangher, A., & Liu, Y. (2019). Actionable recourse in linear classification. Proceedings of the Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3287560.3287566.
Wachter, S., Mittelstadt, B., & Russell, C. (2017). Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31(2), 841.
Wong, P.-H. (2019). Democratizing algorithmic fairness. Philosophy & Technology. https://doi.org/10.1007/s13347-019-00355-w.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.