ALGORITHMIC DECISION-MAKING AND ACCOUNTABILITY
In our city it is not always clear when and why agencies deploy algorithms, and when they do,
it is often unclear what assumptions they are based upon and what data they even consider….
When government institutions utilize obscure algorithms, our principles of democratic
accountability are undermined. As we advance into the twenty-first century, we must ensure
our government is not “black-boxed,” and I have proposed this legislation not to prevent city
agencies from taking advantage of cutting edge tools, but to ensure that when they do, they
remain accountable to the public.4
The county agreed to give ProPublica the risk scores assigned to 7,000 people who were arrested
there during a two-year period. The reporters analyzed those scores alongside publicly available
criminal records for those people, and jail information over the following years, which let them see
how accurate the predictions turned out to be. In other words, did the people classified as being at a
high risk of committing a new crime actually go on to commit one, and if so, was it a violent crime?
Likewise, did supposedly low-risk people indeed steer clear of future criminal activity?
What they found was disturbing. Broward’s COMPAS scores appeared to be both inaccurate and
racially biased.10 Here is a synopsis of their analysis:
The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the
people predicted to commit violent crimes actually went on to do so.
When a full range of crimes were taken into account — including misdemeanors such as driving
with an expired license — the algorithm was somewhat more accurate than a coin flip. Of those
deemed likely to re-offend, 61 percent were arrested for any subsequent crimes within two years.
We also turned up significant racial disparities.... In forecasting who would re-offend, the
algorithm made mistakes with black and white defendants at roughly the same rate but in very
different ways.
The formula was particularly likely to falsely flag black defendants as future criminals, wrongly
labeling them this way at almost twice the rate as white defendants. White defendants were
mislabeled as low risk more often than black defendants.11
So COMPAS assessments of white defendants and black defendants were both inaccurate at about
the same rate, but the predictions for the two races were inaccurate in different directions. Black
defendants’ scores tended to be false positives, while white defendants got false negatives. (See
Exhibit 2 for excerpts from the ProPublica Machine Bias white paper, with more technical details.)
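To make the two kinds of error concrete, here is a minimal sketch with invented counts (ours, not ProPublica’s data). For each group, the false positive rate is the share of people who did not re-offend but were labeled high risk, and the false negative rate is the share who did re-offend but were labeled low risk; two groups can have the same overall accuracy while their errors point in opposite directions.

```python
# Invented counts for illustration only -- not ProPublica's data.
# Keys: (labeled_high_risk, actually_reoffended) -> number of people
counts = {
    "group_1": {(True, True): 300, (True, False): 200, (False, True): 150, (False, False): 350},
    "group_2": {(True, True): 300, (True, False): 100, (False, True): 250, (False, False): 350},
}

for group, c in counts.items():
    fpr = c[(True, False)] / (c[(True, False)] + c[(False, False)])   # false positive rate
    fnr = c[(False, True)] / (c[(False, True)] + c[(True, True)])     # false negative rate
    accuracy = (c[(True, True)] + c[(False, False)]) / sum(c.values())
    print(f"{group}: accuracy={accuracy:.0%}, FPR={fpr:.0%}, FNR={fnr:.0%}")

# group_1: accuracy=65%, FPR=36%, FNR=33%   <- errors skew toward false positives
# group_2: accuracy=65%, FPR=22%, FNR=45%   <- errors skew toward false negatives
```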
Nowhere in the 137-question surveys were defendants asked what race they were. So how could this
be?
[A]n algorithm is only as good as the data it works with. Data is frequently imperfect in ways that
allow these algorithms to inherit the prejudices of prior decision makers. In other cases, data may
simply reflect the widespread biases that persist in society at large. In still others, [the algorithm]
can discover surprisingly useful regularities that are really just preexisting patterns of exclusion
and inequality.13
In 2016, the White House released a report on the promises and perils of Big Data, from credit
scoring to hiring algorithms to college admissions programs.14 In each case, the report showed,
algorithms present both opportunities and challenges; they can either mitigate or contribute to
discriminatory results. The report stressed that “it is a mistake to assume [algorithms] are objective
simply because they are data-driven.” It identified four ways in which problems with the data an algorithm uses can lead to biased outcomes:
1) Poorly selected data
2) Incomplete, incorrect, or outdated data
3) Selection bias
4) Unintentional perpetuation and promotion of historical biases, “where a feedback loop causes bias in inputs or results of the past to replicate itself in the outputs of an algorithmic system.”15 (This feedback dynamic is sketched in the code below.)
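To make the fourth problem concrete, here is a toy simulation (entirely hypothetical, not drawn from the report): a model allocates patrols in proportion to last year’s recorded incidents, and what gets recorded depends in turn on where patrols are sent, so a skew in the historical inputs reproduces itself year after year even though the true underlying rates are identical.

```python
# Toy sketch of a feedback loop -- hypothetical numbers, purely illustrative.
# Two districts have identical real activity, but the historical records are skewed.

true_rate = {"district_A": 100.0, "district_B": 100.0}   # identical real activity
recorded = {"district_A": 120.0, "district_B": 80.0}     # historically skewed records

for year in range(5):
    total = sum(recorded.values())
    patrol_share = {d: recorded[d] / total for d in recorded}        # the model's output
    # Next year's data: what gets recorded scales with where patrols were sent.
    recorded = {d: true_rate[d] * 2 * patrol_share[d] for d in recorded}
    print(year, {d: round(v, 1) for d, v in recorded.items()})

# The 120/80 split never corrects itself: past bias becomes next year's "evidence."
```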
The inner workings of the algorithm itself can also be a source of bias. An algorithm’s programmer must make decisions during the development process about what will make it run efficiently, what conditions to optimize for, and so on. Depending on the assumptions and decisions that go into the program’s design, choices like these can also lead to bias on the back end:
5) Poorly designed matching systems
6) Personalization and recommendation services that narrow instead of expand user options
7) Decision-making systems that assume correlation necessarily implies causation
8) Data sets that lack information or disproportionately represent certain populations16
These problems can also be exacerbated in “machine learning” algorithms, which are designed
to “learn” and evolve over time, in increasingly complex ways that their programmers cannot
necessarily anticipate, control, or even understand.17 The White House report argues that as
complexity increases, “it may become more difficult to explain or account for the decisions machines
make through this process unless mechanisms are built into their designs to ensure accountability.”18
COMPAS in Wisconsin: What about due process?
COMPAS, the risk assessment software used by the Broward County Sheriff’s Office that ProPublica
analyzed, is not a machine learning program; it’s a simple set of scores, from 1 to 10, calculated from
different so-called “risk factors” and other information.19
The main pre-trial hearing judge in Broward County told ProPublica that he relied on his own
judgment, and not the COMPAS scores, to make decisions in his court. The makers of the software
said that it wasn’t meant to be used for higher-stakes decisions in court -- like, say, how long a
sentence someone should get if they were found guilty of a crime. (The “Alternative Sanctions” at the
end of the COMPAS acronym refers to different options the score might steer a judge to choose for a
guilty defendant other than jail time -- like probation, “Drug Court,” or mental health services.)
But at least at the time of the investigation, COMPAS scores were in fact being given to judges at
the time of sentencing, in Arizona, Colorado, Delaware, Kentucky, Louisiana, Oklahoma, Virginia,
Washington and Wisconsin.
Judges in these jurisdictions might, or might not, make reference to the score in court when announcing the sentencing decision. The defendant and defense attorney might or might not even be aware that the score exists, let alone how COMPAS arrived at that score. For some criminal justice
experts, this scenario raises concerns about due process, especially when the scores are affecting
such vital decisions as how long someone will be sentenced to prison time.
“Risk assessments should be impermissible unless both parties get to see all the data that go into
them,” Christopher Slobogin, director of Vanderbilt Law School’s criminal justice program told
ProPublica. “It should be an open, full-court adversarial proceeding.”20
In fact, this very issue was the crux of an important criminal case that made it all the way to the
Wisconsin Supreme Court.
In 2013, 34-year-old Eric Loomis pleaded guilty to driving a stolen car and evading police after he got caught driving a car that had been used in a shooting in La Crosse County, Wisconsin. The La Crosse circuit judge sentenced him to six years in prison. At Loomis’s sentencing hearing, the judge
mentioned a risk score that had apparently played into this decision. Judge Scott Horne told the court
that Loomis had been “identified, through the COMPAS assessment, as an individual who is at high
risk to the community.” 21 Loomis’s prosecutors had also referred to his COMPAS score in their own
arguments, saying: “the COMPAS report that was completed in this case does show the high risk and
the high needs of the defendant. There’s a high risk of violence, high risk of recidivism, high pre-trial
risk; and so all of these are factors in determining the appropriate sentence.”
Loomis had admitted guilt, but he objected to the way his sentence was being decided. In an appeal,
Loomis argued that his right to due process had been violated, since this score was calculated
through a private company’s proprietary algorithm. He hadn’t been given the right to challenge the
evidence used against him in the sentencing hearing, he argued; he didn’t have access to it at all.
The case made it all the way to the Wisconsin Supreme Court, which ruled on it in the summer of
2016. The justices ruled against Loomis, holding that the use of a risk assessment score at sentencing did not violate a defendant’s due process rights. Loomis’s sentence stood. Loomis then
appealed to the U.S. Supreme Court, which declined to hear his case.22
The problem arose from the very beginning of the calculation. And the researchers couldn’t find a
way around it; it seemed to be an unresolvable Catch-22: “A risk score, they found, could either be
equally predictive or equally wrong for all races — but not both.”
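The arithmetic behind that Catch-22 can be sketched with invented numbers (ours, not the researchers’). If a score is equally predictive for two groups -- the same share of people flagged as high risk go on to re-offend -- but the groups’ underlying re-offense rates differ, then the false positive rates cannot also be equal.

```python
# Invented numbers for illustration only.
# "Equally predictive": in both groups, 60% of those flagged high risk re-offend
# (same predictive value), and 40% of each group is flagged. Only the underlying
# re-offense rate differs between the two groups.

def false_positive_rate(base_rate, ppv=0.6, flag_rate=0.4):
    """Share of non-re-offenders who are wrongly flagged as high risk."""
    false_positives = flag_rate * (1 - ppv)   # flagged but did not re-offend
    non_reoffenders = 1 - base_rate
    return false_positives / non_reoffenders

print(false_positive_rate(base_rate=0.5))   # ~0.32 for the group with the higher re-offense rate
print(false_positive_rate(base_rate=0.3))   # ~0.23 for the group with the lower re-offense rate

# Holding predictiveness equal forces the error rates apart; equalizing the error
# rates instead would force the predictiveness apart. You can have one or the other.
```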
One researcher who seemed to have the beginning of a solution was Alexandra Chouldechova,
Assistant Professor of Statistics & Public Policy at Carnegie Mellon University. She found that,
counter-intuitively, the only way for an equation like this to produce unbiased outcomes would be to
weight factors differently for people of different races on the input side.31 But is that fair?
And there are other Catch-22s, as well: another group of researchers used the same COMPAS dataset
to explore how the risk assessment system could optimize for “improving public safety,” or for
“satisfying prevailing notions of algorithmic fairness,” but not both.32
The puzzle that COMPAS presented continues to intrigue computer scientists and researchers
working on how to improve fairness in machine learning systems.
For ethicists, or for anyone thinking about their expectations of justice and transparency in the
institutions they interact with, “fairness” is an abstract but simple concept. But for computer
scientists, there are many competing ideas about what fairness means, in the technical or
mathematical sense. Every programmatic system is embedded with different assumptions about
what methods are most efficient, and what outcomes are the most desirable. These assumptions and
choices will also necessarily have different trade-offs.
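To see how quickly the competing definitions multiply, consider a minimal sketch on an invented eight-person dataset (purely illustrative): the very same set of risk labels satisfies one common mathematical criterion of fairness while violating two others.

```python
# Invented toy data: (group, flagged_high_risk, actually_reoffended)
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]

def rates(group):
    rows = [r for r in records if r[0] == group]
    flagged = [r for r in rows if r[1] == 1]
    non_reoffenders = [r for r in rows if r[2] == 0]
    flag_rate = len(flagged) / len(rows)                              # "demographic parity" compares this
    fpr = sum(r[1] for r in non_reoffenders) / len(non_reoffenders)   # "equalized odds" compares this
    ppv = sum(r[2] for r in flagged) / len(flagged)                   # "calibration" compares this
    return flag_rate, fpr, ppv

print("A:", rates("A"))   # (0.5, 0.0, 1.0)
print("B:", rates("B"))   # (0.5, 0.33..., 0.5)

# Both groups are flagged at the same rate (fair by one definition), yet group B's
# non-re-offenders are flagged more often and its flags are less reliable (unfair by two others).
```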
One classic “trade-off” example that has long been discussed in philosophy classes is the “trolley car”
thought experiment. (Briefly: is it more ethical to do nothing and let a runaway trolley car kill five
people, or to pull a lever and cause the car to switch tracks and kill only one person?) But in computer
science, there are trolley-car problems at every turn. In fact, Arvind Narayanan, who teaches computer
science at Princeton, gave a talk in February 2018 summarizing 21 definitions of fairness.33
It’s true that in most subfields of [computer science], there are usually a small/manageable
number of objective desiderata. In machine learning, there are standard benchmarks for most
problems. Say for object detection, measure accuracy on ImageNet. It doesn’t that much matter
which standard we pick, just that we have a standard. In other fields, definitions matter more,
but the community can agree on a definition, such as semantic security in cryptography. Well,
this is not going to happen for fairness in [machine learning]. Fairness is an inherently, deeply
social concept. Depending on the context, it will translate into different subsets of mathematical
criteria. And all the math in the world won’t fully capture what we mean by fairness. [...] This calls
for new structures of collaboration, which is what makes the field so exciting.35
This shouldn’t be hard to find out: ideally you would divide judges in a single county in half, and
give one half access to a scoring system, and have the other half carry on as usual. If you don’t
want to A/B test within a county—and there are some questions about whether that’s an ethical
thing to do—then simply compare two counties with similar crime rates, in which one county uses
rating systems and the other doesn’t. In either case, it’s essential to test whether these algorithmic
recidivism scores exacerbate, reduce, or otherwise change existing bias. [….]
As far as I can find, and according to everybody I’ve talked to in the field, nobody has done this
work, or anything like it. These scores are being used by judges to help them sentence defendants
and nobody knows whether the scores exacerbate existing racial bias or not.43
Research like this may be more complicated than she thinks, said Suresh Venkatasubramanian, a
computer scientist who is prominent in the growing field of fairness in machine learning. But he
agreed that it was “bewildering” that no one seemed to be trying to do it at all. “I often wonder if it’s
because part of the rationale for using automated methods is ease of use and speed, and having to do
elaborate studies on their efficacy defeats the purpose,” Venkatasubramanian told Eveleth.44
argued that opening up the software for testing by New Yorkers was a bad idea because it “could
empower users to fabricate answers that will get them the response they want.”
Craig Campbell, Special Adviser to the Mayor’s Office on Data Analytics, was at the hearing to answer
questions about data analytics and open data projects in the city government. Vacca asked Campbell
about the formula that the government used to distribute firefighters throughout the city, and why
even he, as a lawmaker, wasn’t able to know more about how it worked. But Campbell said he was
unfamiliar with it, that he hadn’t worked with the FDNY at all. He said his department provided
data-analytics assistance to the city’s agencies as they needed it, but it didn’t oversee what all of the
agencies did.
Vacca seemed annoyed. “Is there no centralized oversight over when agencies deploy potentially
complex data analytics?” he asked.
“There is not, and I would argue that it’s probably better that way,” Campbell replied. “The city has
been organized with the idea of putting the technology as closely as possible to the actual operational
functionality that the agencies have to deliver.”
The open-source question, and citizens’ privacy
Vacca had another question. Campbell and Sunderland had said that publishing the city’s algorithms’
source code would present security risks, but Vacca said he had heard from computer science
experts that open-source software could actually be more secure. So which one was it?
Campbell explained that software that was developed in an open-source, collaborative way was
one thing, but that the process of making public once-proprietary software was different. The city’s
algorithms hadn’t been designed from the beginning to protect sensitive information from people wanting
to do harm, he said. Reverse-engineering that would be too difficult.
Another broad objection that Sunderland raised in his testimony was that the proposed bill would put
the information of the most vulnerable New Yorkers -- those who are disabled or receive government
benefits, for instance -- at risk of being exposed to people or institutions who might discriminate
against them because of it, or to people who might want to take advantage of that information in
other ways. Taline Sanassarian, the policy director for the technology industry trade group Tech
NYC, agreed with those privacy concerns. “Indeed, one may look no further than the recent breaches
of data, including Equifax, which affected as many as 145 million Americans,” she said, “in which
sensitive personnel information was stolen from current and former government employees and
contractors.”48
City Council Member Greenfield posed that same question to some of the criminal defense advocates
during the hearing: What about those security risks? Should the city have concerns about the
confidentiality of New Yorkers’ data, if the data were to be open to the public?
The Bronx Defenders’ Scott Levy said that no, he didn’t think so. Releasing the training data that had
been used in the past to train predictive policing models, for instance, wouldn’t put any individuals at
risk because it would all be “anonymized, and randomized, and essentially ‘clean.’” Yung Mi Lee from
the Brooklyn Defenders took a stronger stance, arguing, “when we’re talking about constitutional
protections versus possible security risks that aren’t even realized and may never happen, I think our
constitutional protections have to take precedence.”
IV. Food for Thought / Looking to the Future
What does regulation, or legal recourse, look like?
It’s true that New York’s algorithmic accountability bill was the first of its kind in the country,50 but
there are precedents that could serve as guides for future efforts to regulate. In the 1970s, the US
passed the Fair Credit Reporting Act (FCRA) and the Equal Credit Opportunity Act (ECOA). These
were established to help protect consumers from being harmed by a very opaque and very significant
algorithm, the credit score. The FCRA was born out of the idea that people should have the right to
know what kinds of information were going into the score, and further, have the right to challenge
those inputs. It’s a more focused version of what Vacca was after with his bill.
Overseas, the European Union has made headway in expanding this principle to commercial
algorithms writ large. The General Data Protection Regulation (GDPR) will go into effect in May
2018 and will apply to all companies and people in the EU.51 The GDPR stipulates that all algorithmic
decisions be accountable to the individuals they affect -- that citizens have the “right to explanation”
about how their personal data has been used in that decision-making process.52 And because of
the increasingly global nature of digital companies, the impact of this philosophy will have ripples
outside of the EU as well.
Academic thinkers working in the field of algorithmic accountability have other ideas about
how regulation could potentially work in the future. Some have proposed the creation of formal
regulatory processes within government institutions, like the Federal Trade Commission.53
Another intriguing idea is to apply “disparate impact” theory to algorithmic systems within the
context of the US legal system.54
First applied to employment law in a 1971 Supreme Court case,55 disparate impact is the theory that
systems can be held responsible for discriminatory results even if there was no discriminatory intent.
In 2015, another landmark Supreme Court decision applied the theory to housing discrimination
as well.56 Just as the analysis of COMPAS risk assessment scores focused on the outcomes of the
software rather than its proprietary design, perhaps legal challenges to other algorithms could work
the same way.
For instance, a Bloomberg investigation of Amazon found a racial disparity in the company’s delivery
zones; areas of cities that had access to Amazon’s same-day delivery service were predominantly
white, while predominantly black areas were more likely to be excluded.57 Similarly, ProPublica
found that The Princeton Review was charging different prices to customers in different ZIP codes for the same online test-prep service -- and a curious side effect of this was that Asians were nearly twice as likely to be shown a higher price than non-Asians.58
No one thinks that Amazon or The Princeton Review intentionally set out to treat customers differently based on their race. These were likely the unintentional results of automated geographic market
research and variable pricing models. But under disparate impact theory (as well as in journalism),
the results are what matter most.
However, this type of legal remedy for unintentionally discriminatory algorithms remains just an
academic idea for now. Solon Barocas and Andrew Selbst, who posed this thought experiment in
their California Law Review article “Big Data’s Disparate Impact,” wrote that expanding disparate
impact theory to algorithms in court “will be difficult technically, difficult legally, and difficult
politically.”59
The fixation on examining source code reflects a fundamental misunderstanding about how
exactly one might determine whether an algorithm is biased. It is highly dubitable that any
programmer would write ‘unconstitutional code’ by, for example, explicitly using the defendant’s
race to predict recidivism anywhere in the code. Examining the code is unlikely to reveal any
explicit discrimination. ... The answer probably lies in the inputs given to the algorithm. Thus, the
defendant should be asking to see the data used to train the algorithm and the weights assigned to
each input factor, not the source code.65
So aside from raw code, other options for disclosures about decision-making algorithms could
include the original sets of training data and the weighted factors that go into the algorithms’ design.
Or, to make things even more accessible to the non-programming public, governments could provide
visual charts breaking down a scoring system into discrete decision points, or even just written
explanations of the basic logical assumptions behind the programs’ conception.
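To illustrate why the weights and the training data can matter more than the code, consider a hypothetical scoring model of our own construction (not COMPAS, whose internals are proprietary). The source code is a trivial weighted sum; everything a defendant would actually want to challenge lives in the published factor weights and in the historical data used to choose them.

```python
# A hypothetical risk-scoring model, invented for illustration -- not COMPAS.
# The code itself reveals nothing discriminatory; the factor weights (and the
# data they were fitted on) are what deserve scrutiny.

FACTOR_WEIGHTS = {           # the disclosure that matters
    "prior_arrests":   0.9,
    "age_under_25":    0.7,
    "employed":       -0.4,
    "stable_housing": -0.3,
}

def risk_score(person: dict) -> int:
    """Weighted sum of input factors, clipped to a 1-to-10 scale."""
    raw = sum(FACTOR_WEIGHTS[f] * person.get(f, 0) for f in FACTOR_WEIGHTS)
    return max(1, min(10, round(5 + 2 * raw)))

print(risk_score({"prior_arrests": 2, "age_under_25": 1, "employed": 0}))   # 10
```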
Whatever the means of access, put simply, the spirit of algorithmic accountability as a broader ideal
is public transparency. It’s the ability of the people to be aware of, and participate in, the decisions
that automated systems increasingly make about their lives.
That is why algorithms that get developed out in the open from the very start garner such praise --
like Allegheny County, Pennsylvania’s predictive analytic system for child welfare cases.66 Time will
tell whether caseworkers armed with risk scores will do a better or worse job at protecting children
from abuse than they did without, but county officials say they are doing this “the right way”: slowly,
and out in the open. According to a New York Times Magazine piece about the Allegheny Family
Screening Tool:
It is owned by the county. Its workings are public. Its criteria are described in academic
publications and picked apart by local officials. At public meetings held in downtown Pittsburgh
before the system’s adoption, lawyers, child advocates, parents and even former foster children
asked hard questions not only of the academics but also of the county administrators who invited
them.67
Child and family advocates, including the ACLU of Pennsylvania, told the Times reporter that they
approved of what they saw so far.
What does responsibility look like?
The first step in the implementation of any algorithm, public or private, is its design. The White
House’s 2016 report says that we should make sure that Big Data doesn’t exacerbate society’s
systematic disadvantaging of certain groups, and the way to do this is with “‘equal opportunity by
design’—designing data systems that promote fairness and safeguard against discrimination from the
first step of the engineering process and continuing throughout their lifespan.”68
Thoughtful design is even more vital for machine-learning or neural-network programs, which
become more and more complex and less and less scrutable to their own designers (not to mention less
susceptible to external audits) over time.
So what responsibility do the computer scientists, machine-learning developers, coders and
researchers themselves have, to make sure their designs do more good than harm? Do they have any?
If so, what guidelines should they be expected to follow?
This has been an ongoing conversation in the tech community for several years. At the center of the
conversation is a growing group of researchers and practitioners called Fairness, Accountability
and Transparency in Machine Learning, or FAT/ML for short. What began as a small conference
workshop to explore ethical issues in computer science has grown into a large international
community that gathers at standing-room-only annual summits.
Given the “significant social impact” of so many decision-making algorithms, FAT/ML’s leadership
wrote a set of guiding principles that are now published on the group’s website.69 The guidelines
include Responsibility, Explainability, Accuracy, Auditability and Fairness. The site also includes
a long list of questions and suggestions that an algorithm’s designers should consider throughout
the process, such as, “Are there particular groups which may be advantaged or disadvantaged, in
the context in which you are deploying, by the algorithm/system you are building?” and “Talk to
people who are familiar with the subtle social context in which you are deploying”. (See Exhibit 5 for
the full text of FAT/ML’s “Principles for Accountable Algorithms and a Social Impact Statement for
Algorithms.”)
Algorithms and the data that drive them are designed and created by people -- There is always a
human ultimately responsible for decisions made or informed by an algorithm. ‘The algorithm
did it’ is not an acceptable excuse if algorithmic systems make mistakes or have undesired
consequences, including from machine-learning processes.70
The Association for Computing Machinery, a group of over 100,000 computer science practitioners
and educators around the world, has also developed its own set of “Principles for Algorithmic
Transparency and Accountability.”71 They fall under the categories of Awareness, Access and
Redress, Accountability, Explanation, Data Provenance, Auditability, and Validation and Testing.
And a research center called “The AI Now Institute” published its own set of recommendations
for both responsible design and responsible deployment by companies and governments.72 (See Exhibits 6 and 7 for more on these.) The AI Now Institute also put out a step-by-step guide for
creating “Algorithmic Impact Assessments.”73 If the policymakers of a new law, or the builders of a
new construction project, are expected to systematically consider the impact on the people and the
environments that will be affected by their choices, why aren’t the designers and the customers of
complex algorithms expected to do the same?
Finally, when computer scientists and machine-learning developers consider the most ethical way
to build a program, it’s possible that the most ethical choice would be to choose to not build it at
all. When AI Now’s co-founder, Kate Crawford, spoke during the closing panel of FAT/ML’s 2016
conference, she suggested that everyone in the tech community should consider his or her personal
politics and ethical beliefs when deciding what programming jobs to accept.74 “What will we do with
these tools, and what won’t we do?” she asked.
As we know, machine learning is already being used in predictive policing, in the criminal justice
system, and in tracking refugee populations here and in Europe. Might it also be used to track
undocumented workers? Might it be used to help create the ‘Muslim registry’ that we’ve heard
so much about? I would like to say this to this community here, but also to Silicon Valley more
generally, that I think we all have a responsibility right now to decide, where we will put our work,
and where we will say, this type of work is not ethical.75
Despite these ongoing and passionate (if abstract) conversations, it’s clear that many program
developers are still grappling with these questions in their day-to-day work. According to the 2018
Stack Overflow Survey of over 100,000 developers:
Only tiny fractions of developers say that they would write unethical code or that they have no
obligation to consider the ethical implications of code, but beyond that, respondents see a lot of
ethical gray. Developers are not sure how they would report ethical problems, and have differing
ideas about who ultimately is responsible for unethical code.76 (See Exhibit 8 for some examples of
this survey’s results.)
V. Conclusion: The Fate of New York’s Bill
The suspicions James Vacca expressed at the end of the October hearing proved correct. The sweeping bill he had originally proposed -- mandating the publication of the source code of all of the algorithms used by all of the city’s agencies, and allowing New Yorkers to test
them online -- was bound to fail. But in the end, he and his staff took everyone’s objections into
consideration, and presented a new, watered-down version to the City Council in December.
The revised bill wouldn’t require the city agencies to publish anything online themselves. It would
establish a task force that would examine exactly what decision-making algorithms are in use
throughout the city government, and whether any of them appear to discriminate against people
based on age, race, religion, gender, sexual orientation or citizenship status. It would also explore
ways for New Yorkers to better understand these algorithms, and to challenge any decisions that they
didn’t agree with.
The task force would be made up of a wide range of experts and advocates, including “persons with
expertise in the areas of fairness, accountability and transparency relating to automated decision
systems.” And then it would deliver a report and recommendations to the city council within a year
and a half. (See Exhibit 9 for the full text of the final bill, Int. No. 1696.)
This new bill passed unanimously.77
Not everyone was happy about the changes that the bill had to go through in order to get passed, however. A caveat had been added to the task force’s mandate, stating that agencies would not be required to hand over any information to the task force that would “interfere with a law enforcement investigation or operations, compromise public health or safety, or that would result in the disclosure of proprietary information.”
In other words, participation in the task force’s project was voluntary. Some civil rights and criminal
defense advocates who had enthusiastically supported the original bill wondered, what’s to stop law
enforcement from making a blanket denial to the task force on these grounds?
Julia Powles, who had testified at the hearing, called it a “bold” but “flawed” bill in an opinion piece
for The New Yorker.78 “[N]ow the task force will have to rely on voluntary disclosures as it studies how
automated systems are designed, procured, and audited,” she wrote. “For a government body without
real legal powers, this will be a Herculean, or perhaps Sisyphean, undertaking.”
Powles also quoted Ellen Goodman, a Rutgers Law School professor, who was disappointed that the
bill didn’t include anything about how the city government could put pressure on third-party vendors
to make their proprietary programs more transparent to lawmakers and the public. “For many of
these vendors, it’s the biggest customer they’ll get,” said Goodman. “If New York doesn’t use that
power to make systems accountable, who will?”79
Rashida Richardson, Legislative Counsel for the New York Civil Liberties Union, said that in addition
to her concerns about the bill’s carve-outs, she also wondered about what the composition of
the task force would be.80 The bill gave the Mayor the power to appoint whomever he wanted,
and Richardson said that he shouldn’t choose all government workers, or all technologists, or all
academic researchers. More importantly, though, she wondered what the ultimate outcome of this
task force report would be.
Exhibit 1 - Full text of New York City’s “algorithmic accountability” bill, as proposed in
August 2017
Int. No. 1696
By Council Member Vacca
A Local Law to amend the administrative code of the city of New York, in relation to automated
processing of data for the purposes of targeting services, penalties, or policing to persons.
Be it enacted by the Council as follows:
Section 1. Section 23-502 of the administrative code of the city of New York is amended to add a new
subdivision g to read as follows:
g. Each agency that uses, for the purposes of targeting services to persons, imposing penalties upon
persons or policing, an algorithm or any other method of automated processing system of data shall:
1. Publish on such agency’s website, the source code of such system; and
2. Permit a user to (i) submit data into such system for self-testing and (ii) receive the results
of having such data processed by such system.
§ 2. This local law takes effect 120 days after it becomes law.
MAJ
LS# 10948
8/16/17 2:13 PM
Exhibit 2 - An excerpt from the white paper explaining the methodology of ProPublica’s
“Machine Bias” investigation
[W]e investigated whether certain types of errors – false positives and false negatives – were
unevenly distributed among races. We used contingency tables to determine those relative rates
following the analysis outlined in the 2006 paper from the Salvation Army.
We removed people from our data set for whom we had less than two years of recidivism
information. The remaining population was 7,214 – slightly larger than the sample in the logistic
models above, because we don’t need a defendant’s case information for this analysis. As in the
logistic regression analysis, we marked scores other than “low” as higher risk. The following tables
show how the COMPAS recidivism score performed:
These contingency tables reveal that the algorithm is more likely to misclassify a black defendant
as higher risk than a white defendant. Black defendants who do not recidivate were nearly twice as
likely to be classified by COMPAS as higher risk compared to their white counterparts (45 percent vs.
23 percent). However, black defendants who scored higher did recidivate slightly more often than
white defendants (63 percent vs. 59 percent).
The test tended to make the opposite mistake with whites, meaning that it was more likely to
wrongly predict that white people would not commit additional crimes if released compared to black
defendants. COMPAS under-classified white reoffenders as low risk 70.5 percent more often than
black reoffenders (48 percent vs. 28 percent). The likelihood ratio for white defendants was slightly higher, 2.23, than for black defendants, 1.61.
Exhibit 5 - The full text of FAT/ML’s “Principles for Accountable Algorithms and a Social
Impact Statement for Algorithms.”
Responsibility
Make available externally visible avenues of redress for adverse individual or societal effects of an
algorithmic decision system, and designate an internal role for the person who is responsible for
the timely remedy of such issues.
Explainability
Ensure that algorithmic decisions as well as any data driving those decisions can be explained to
end-users and other stakeholders in non-technical terms.
Accuracy
Identify, log, and articulate sources of error and uncertainty throughout the algorithm and its data
sources so that expected and worst case implications can be understood and inform mitigation
procedures.
Auditability
Enable interested third parties to probe, understand, and review the behavior of the algorithm
through disclosure of information that enables monitoring, checking, or criticism, including
through provision of detailed documentation, technically suitable APIs, and permissive terms of
use.
Fairness
Ensure that algorithmic decisions do not create discriminatory or unjust impacts when comparing
across different demographics (e.g. race, sex, etc).
We have left some of the terms above purposefully under-specified to allow these principles
to be broadly applicable. Applying these principles well should include understanding them
within a specific context. We also suggest that these issues be revisited and discussed throughout
Exhibit 6 - The full text of the Association for Computing Machinery’s “Principles for
Algorithmic Transparency and Accountability.”
1. Awareness: Owners, designers, builders, users, and other stakeholders of analytic systems should
be aware of the possible biases involved in their design, implementation, and use and the potential
harm that biases can cause to individuals and society.
2. Access and redress: Regulators should encourage the adoption of mechanisms that enable
questioning and redress for individuals and groups that are adversely affected by algorithmically
informed decisions.
3. Accountability: Institutions should be held responsible for decisions made by the algorithms that
they use, even if it is not feasible to explain in detail how the algorithms produce their results.
4. Explanation: Systems and institutions that use algorithmic decision-making are encouraged to
produce explanations regarding both the procedures followed by the algorithm and the specific
decisions that are made. This is particularly important in public policy contexts.
5. Data Provenance: A description of the way in which the training data was collected should be
maintained by the builders of the algorithms, accompanied by an exploration of the potential biases
induced by the human or algorithmic data-gathering process. Public scrutiny of the data provides
maximum opportunity for corrections. However, concerns over privacy, protecting trade secrets, or
revelation of analytics that might allow malicious actors to game the system can justify restricting
access to qualified and authorized individuals.
6. Auditability: Models, algorithms, data, and decisions should be recorded so that they can be
audited in cases where harm is suspected.
7. Validation and Testing: Institutions should use rigorous methods to validate their models and
document those methods and results. In particular, they should routinely perform tests to assess and
determine whether the model generates discriminatory harm. Institutions are encouraged to make
the results of such tests public.
Exhibit 7 - An excerpt from AI Now’s April 2018 report “Algorithmic Impact Assessments: A
Practical Framework for Public Agency Accountability”
Key Elements of a Public Agency Algorithmic Impact Assessment:
1. Agencies should conduct a self-assessment of existing and proposed automated
decision systems, evaluating potential impacts on fairness, justice, bias, or other
concerns across affected communities;
2. Agencies should develop meaningful external researcher review processes to
discover, measure, or track impacts over time;
3. Agencies should provide notice to the public disclosing their definition of “automated
decision system,” existing and proposed systems, and any related self-assessments
and researcher review processes before the system has been acquired;
4. Agencies should solicit public comments to clarify concerns and answer outstanding
questions; and
5. Governments should provide enhanced due process mechanisms for affected
individuals or communities to challenge inadequate assessments or unfair, biased, or
otherwise harmful system uses that agencies have failed to mitigate or correct.
Exhibit 8 - An excerpt from the overview of Stack Overflow’s 2018 Developer Survey Results
Endnotes
1. The information in this section comes from both testimony by James Vacca to the New York City Council’s Committee on Technology on October 16, 2017, and Lauren Kirchner’s interview with him on December 14, 2017.
2. Julia Powles, “New York City’s Bold, Flawed Attempt to Make Algorithms Accountable,” The New Yorker, December 20, 2017, https://www.newyorker.com/tech/elements/new-york-citys-bold-flawed-attempt-to-make-algorithms-accountable
3. For more details about how this bill developed over time, including related testimony transcripts and briefing papers, see the New York City Council’s web page for it: http://legistar.council.nyc.gov/LegislationDetail.aspx?ID=3137815&GUID=437A6A6D-62E1-47E2-9C42-461253F9C6D0
4. Testimony by James Vacca to the New York City Council’s Committee on Technology, October 16, 2017.
5. Jeffrey Baker, “Briefing Paper of the Infrastructure Division,” New York City Council Committee on Technology, October 16, 2017.
6. Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, “Machine Bias,” ProPublica, May 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
7. (Technically, it predicted whether a person would be arrested for a new crime.)
8. Northpointe, Inc. has since changed its name to Equivant.
9. The COMPAS questionnaire is available on Document Cloud: https://www.documentcloud.org/documents/2702103-Sample-Risk-Assessment-COMPAS-CORE.html
10. Northpointe disputed ProPublica’s analysis, and responded here: http://www.equivant.com/blog/response-to-propublica-demonstrating-accuracy-equity-and-predictive-parity
11. Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, “Machine Bias,” ProPublica, May 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
12. Jeffrey Baker, “Briefing Paper of the Infrastructure Division,” New York City Council Committee on Technology, October 16, 2017.
13. Solon Barocas and Andrew Selbst, “Big Data’s Disparate Impact,” 104 California Law Review 671 (2016). Available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899
14. Executive Office of the President, “Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights,” WhiteHouse.gov, May 2016, https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf
15. Ibid.
16. Ibid.
17. Another concise list of ways that algorithms’ outcomes can seem inaccessible is in the Association for Computing Machinery’s 2017 “Statement on Algorithmic Transparency and Accountability”: “Decisions made by predictive algorithms can be opaque because of many factors, including technical (the algorithm may not lend itself to easy explanation), economic (the cost of providing transparency may be excessive, including the compromise of trade secrets), and social (revealing input may violate privacy expectations).” Available here: https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf
18. Executive Office of the President, “Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights,” WhiteHouse.gov, May 2016, https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf
19. Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, “Machine Bias,” ProPublica, May 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
20. Ibid.
21. Ibid.
22. Michelle Liu, “Supreme Court refuses to hear Wisconsin predictive crime assessment case,” Milwaukee Journal Sentinel, June 26, 2017, https://www.jsonline.com/story/news/crime/2017/06/26/supreme-court-refuses-hear-wisconsin-predictive-crime-assessment-case/428240001/
42. Rose Eveleth, “Does Crime Predicting Software Bias Judges? Unfortunately There’s No Data,” Motherboard, July 18, 2016, https://motherboard.vice.com/en_us/article/wnxzdb/does-crime-predicting-software-bias-judges-unfortunately-theres-no-data
43. Ibid.
44. Ibid.
45. All quotes from this section are taken from the testimony transcript from the October 16, 2017 hearing. Available here: http://legistar.council.nyc.gov/LegislationDetail.aspx?ID=3137815&GUID=437A6A6D-62E1-47E2-9C42-461253F9C6D0
46. Lauren Kirchner, “Thousands of Criminal Cases in New York Relied on Disputed DNA Testing Techniques,” ProPublica, September 4, 2017, https://www.propublica.org/article/thousands-of-criminal-cases-in-new-york-relied-on-disputed-dna-testing-techniques
47. For more on the plan to close Rikers Island, see for instance Reuven Blau and Erin Durkin, “First Rikers Island jail to close in summer as part of city’s 10-year plan to shut down the complex,” The New York Daily News, January 2, 2018, http://www.nydailynews.com/new-york/rikers-jail-close-summer-part-10-year-plan-article-1.3733242 and Lisa W. Foderaro, “New York State May Move to Close Rikers Ahead of City’s 10-Year Timeline,” The New York Times, February 14, 2018, https://www.nytimes.com/2018/02/14/nyregion/rikers-island-jail-closing-timeline.html.
48. For more on the Equifax breach, see for instance Thomas Fox-Brewster, “A Brief History Of Equifax Security Fails,” Forbes, September 8, 2017, https://www.forbes.com/sites/thomasbrewster/2017/09/08/equifax-data-breach-history/#6a7e2ff5677c and Seena Gressin, “The Equifax Data Breach: What to Do,” The Federal Trade Commission, September 8, 2017, https://www.consumer.ftc.gov/blog/2017/09/equifax-data-breach-what-do.
49. Lauren Kirchner interview with James Vacca, October 16, 2017.
50. Some jurisdictions in the US had previously passed laws specifically governing the testing and the use of algorithms in the criminal justice field, but no other jurisdiction had passed a law providing a framework for all algorithms in government.
51. More details about the GDPR can be found on the European Commission website: https://ec.europa.eu/commission/priorities/justice-and-fundamental-rights/data-protection/2018-reform-eu-data-protection-rules_en
52. The 2018 AI Now report “Algorithmic Impact Assessments” has more on how the GDPR can provide a useful starting point for algorithmic-accountability policy discussions, with caveats: “The GDPR language may be a good starting point for some agencies, but will require some shaping to match the appropriate contexts.” See Dillon Reisman, Jason Schultz, Kate Crawford, Meredith Whittaker, “Algorithmic Impact Assessments: A Practical Framework for Public Agency Accountability,” April 2018, https://ainowinstitute.org/aiareport2018.pdf
53. Danielle Keats Citron, “Big Data Should Be Regulated by ‘Technological Due Process,’” The New York Times, July 29, 2016, https://www.nytimes.com/roomfordebate/2014/08/06/is-big-data-spreading-inequality/big-data-should-be-regulated-by-technological-due-process
54. Lauren Kirchner, “When Big Data Becomes Bad Data,” ProPublica, September 2, 2015, https://www.propublica.org/article/when-big-data-becomes-bad-data
55. Griggs v. Duke Power Company, decided March 8, 1971. Court opinion available here: https://caselaw.findlaw.com/us-supreme-court/401/424.html
56. Texas Department of Housing and Community Affairs v. The Inclusive Communities Project, Inc., decided June 25, 2015. Court opinion available here: https://supreme.justia.com/cases/federal/us/576/13-1371/
57. David Ingold and Spencer Soper, “Amazon Doesn’t Consider the Race of Its Customers. Should It?” Bloomberg, April 21, 2016, https://www.bloomberg.com/graphics/2016-amazon-same-day/
58. Julia Angwin and Jeff Larson, “The Tiger Mom Tax: Asians Are Nearly Twice as Likely to Get a Higher Price from Princeton Review,” ProPublica, September 1, 2015, https://www.propublica.org/article/asians-nearly-twice-as-likely-to-get-higher-price-from-princeton-review
59. Solon Barocas and Andrew Selbst, “Big Data’s Disparate Impact,” 104 California Law Review 671 (2016). Available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899
This work is licensed under a Creative Commons Attribution 4.0 International License.