Caliskan et al. - 2017 - Semantics Derived Automatically From Language Corpora Contain Human-Like Biases
We show that standard machine learning can acquire stereotyped biases from textual data that reflect everyday human culture. The general idea that text corpora capture semantics, including cultural stereotypes and empirical associations, has long been known in corpus linguistics (1, 2), but our findings add to this knowledge in three ways. First, we used word embeddings (3), a powerful tool to extract associations captured in text corpora; this method substantially amplifies the signal found in raw statistics. Second, our replication of documented human biases may yield tools and insights for studying prejudicial attitudes and behavior in humans. Third, since we performed our experiments on off-the-shelf machine learning components [primarily the Global Vectors for Word Representation (GloVe) word embedding], we show that cultural stereotypes propagate to artificial intelligence (AI) technologies in widespread use.

Before presenting our results, we discuss key terms and describe the tools we use. Terminology varies by discipline; these definitions are intended for clarity of the present article. In AI and machine learning, bias refers generally to prior information, a necessary prerequisite for intelligent action (4). Yet bias can be problematic where such information is derived from aspects of human culture known to lead to harmful behavior. Here, we will call such biases "stereotyped" and actions taken on their basis "prejudiced."

We used the Implicit Association Test (IAT) as our primary source of documented human biases (5). The IAT demonstrates enormous differences in response times when subjects are asked to pair two concepts they find similar, in contrast to two concepts they find different. We developed our first method, the Word-Embedding Association Test (WEAT), a statistical test analogous to the IAT, and applied it to a widely used semantic representation of words in AI: word embeddings. Word embeddings represent each word as a vector in a vector space of about 300 dimensions, based on the textual context in which the word is found. We used the distance between a pair of vectors (more precisely, their cosine similarity score, a measure of correlation) as analogous to reaction time in the IAT. The WEAT compares these vectors for the same sets of words used by the IAT; we describe the WEAT in more detail below.

Most closely related to this paper is concurrent work by Bolukbasi et al. (6), who propose a method to "debias" word embeddings. Our work is complementary, as we focus instead on rigorously demonstrating human-like biases in word embeddings. Further, our methods do not require an algebraic formulation of bias, which may not be possible for all types of bias. Additionally, we studied the relationship between stereotyped associations and empirical data concerning contemporary society.

Using the measure of semantic association described above, we were able to replicate every stereotype that we tested. We selected IATs that studied general societal attitudes, rather than those of subpopulations, and for which lists of target and attribute words (rather than images) were available. The results are summarized in Table 1.

Greenwald et al. introduced and validated the IAT by studying biases that they consider nearly universal in humans and about which there is no social concern (5). We began by replicating these inoffensive results for the same purposes. Specifically, they demonstrated that flowers are significantly more pleasant than insects, based on the ease with which subjects pair them with pleasant rather than unpleasant terms; the same held for musical instruments versus weapons (Table 1, rows 1 and 2). Moving to stereotypes, a bundle of names associated with being European American was found to be significantly more easily associated with pleasant than unpleasant terms, compared with a bundle of African-American names.

In replicating this result, we were forced to slightly alter the stimuli because some of the original African-American names did not occur in the corpus with sufficient frequency to be included. We therefore also deleted the same number of European-American names, chosen at random, to balance the number of elements in the two sets of target concepts. Omissions and deletions are indicated in our list of keywords (see the supplementary materials).

In another widely publicized study, Bertrand and Mullainathan (7) sent nearly 5000 identical résumés in response to 1300 job advertisements, varying only the names of the candidates. They found that European-American candidates were 50% more likely to be offered an opportunity to be interviewed. In follow-up work, they argued that implicit biases help account for these effects (8). We provide additional evidence for this hypothesis using word embeddings: we tested the names used in their study for pleasantness associations. As before, we had to delete some low-frequency names. We confirmed the association using two different sets of "pleasant/unpleasant" stimuli: those from the original IAT paper and a shorter, revised set published later (9).

Turning to gender biases, we replicated a finding that female names are more associated with family words than career words, compared with male names (9). This IAT was conducted online and thus had a vastly larger subject pool but far fewer keywords; we replicated the IAT results even with these reduced keyword sets. We also replicated an online IAT finding that female words (e.g., "woman" and "girl") are more associated with the arts than with mathematics, compared with male words (9). Finally, we replicated a laboratory study showing that female words are more associated with the arts than with the sciences (10).

1Center for Information Technology Policy, Princeton University, Princeton, NJ, USA. 2Department of Computer Science, University of Bath, Bath BA2 7AY, UK.
*Corresponding author. Email: [email protected] (A.C.); [email protected] (J.J.B.); [email protected] (A.N.)
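To make the WEAT's core computation concrete, here is a minimal Python sketch of the quantities just described. It is an illustration rather than the authors' archived implementation: it assumes word vectors (e.g., from GloVe) have already been loaded into a dict `vectors` mapping each word to a NumPy array, and it uses a Cohen's-d-style effect size over per-word association scores, matching the form of the d values reported in Table 1.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors, the paper's
    analog of reaction time in the IAT."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, vectors):
    """Differential association of word w with attribute word sets
    A (e.g., pleasant terms) and B (e.g., unpleasant terms)."""
    return (np.mean([cosine(vectors[w], vectors[a]) for a in A]) -
            np.mean([cosine(vectors[w], vectors[b]) for b in B]))

def weat_effect_size(X, Y, A, B, vectors):
    """Effect size d: standardized difference in mean association
    between the two target word sets X and Y."""
    x_scores = [association(x, A, B, vectors) for x in X]
    y_scores = [association(y, A, B, vectors) for y in Y]
    return ((np.mean(x_scores) - np.mean(y_scores)) /
            np.std(x_scores + y_scores, ddof=1))
```

For the first row of Table 1, for example, X and Y would be the flower and insect target lists, and A and B the pleasant and unpleasant attribute lists of Greenwald et al. (5).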
[Fig. 1. Occupation-gender association. y axis: strength of association; x axis: percentage of workers in occupation who are women. Pearson's correlation coefficient r = 0.90 with P < 10^−18.]

[Fig. 2. Name-gender association. y axis: strength of association; x axis: percentage of people with name who are women. Pearson's correlation coefficient r = 0.84 with P < 10^−13.]
Table 1. Summary of Word-Embedding Association Tests. We replicated eight well-known IAT findings using word embeddings (rows 1 to 3 and 6 to 10); we also help explain prejudiced human behavior concerning hiring in the same way (rows 4 and 5). Each result compares two sets of words from target concepts about which we are attempting to learn with two sets of attribute words. In each case, the first target is found compatible with the first attribute, and the second target with the second attribute. Throughout, we use word lists from the studies we seek to replicate. N, number of subjects; NT, number of target words; NA, number of attribute words. We report the effect sizes (d) and P values (P, rounded up) to emphasize that the statistical and substantive significance of both sets of results is uniformly high; we do not imply that our numbers are directly comparable with those of human studies. For the online IATs (rows 6, 7, and 10), P values were not reported but are known to be below the significance threshold of 10^−2. Rows 1 to 8 are discussed in the text; for completeness, this table also includes the two other IATs for which we were able to find suitable word lists (rows 9 and 10). We found similar results with word2vec, another algorithm for creating word embeddings, trained on a different corpus, Google News (see the supplementary materials). Columns 4 to 7 give the original finding (Ref., N, d, P); columns 8 to 11 give our finding (NT, NA, d, P).

| Row | Target words | Attribute words | Ref. | N | d | P | NT | NA | d | P |
|-----|--------------|-----------------|------|---|---|---|----|----|---|---|
| 1 | Flowers vs. insects | Pleasant vs. unpleasant | (5) | 32 | 1.35 | 10^−8 | 25 × 2 | 25 × 2 | 1.50 | 10^−7 |
| 2 | Instruments vs. weapons | Pleasant vs. unpleasant | (5) | 32 | 1.66 | 10^−10 | 25 × 2 | 25 × 2 | 1.53 | 10^−7 |
| 3 | European-American vs. African-American names | Pleasant vs. unpleasant | (5) | 26 | 1.17 | 10^−5 | 32 × 2 | 25 × 2 | 1.41 | 10^−8 |
| 4 | European-American vs. African-American names | Pleasant vs. unpleasant from (5) | (7) | Not applicable | | | 16 × 2 | 25 × 2 | 1.50 | 10^−4 |
| 5 | European-American vs. African-American names | Pleasant vs. unpleasant from (9) | (7) | Not applicable | | | 16 × 2 | 8 × 2 | 1.28 | 10^−3 |
| 6 | Male vs. female names | Career vs. family | (9) | 39k | 0.72 | <10^−2 | 8 × 2 | 8 × 2 | 1.81 | 10^−3 |
| 7 | Math vs. arts | Male vs. female terms | (9) | 28k | 0.82 | <10^−2 | 8 × 2 | 8 × 2 | 1.06 | 0.018 |
| 8 | Science vs. arts | Male vs. female terms | (10) | 91 | 1.47 | 10^−24 | 8 × 2 | 8 × 2 | 1.24 | 10^−2 |
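The significance test behind the embedding P values above is specified precisely in the supplementary materials. As an illustration only (not necessarily the paper's exact procedure), a standard way to attach a one-sided P value to this kind of statistic is a permutation test over equal-size re-partitions of the pooled target words. The sketch below reuses the `association` helper from the earlier snippet; the summed-association statistic and the number of samples are assumptions made for the example.

```python
import numpy as np

def test_statistic(X, Y, A, B, vectors):
    """Total differential association of target set X minus that of Y."""
    return (sum(association(x, A, B, vectors) for x in X) -
            sum(association(y, A, B, vectors) for y in Y))

def permutation_p_value(X, Y, A, B, vectors, n_samples=10_000, seed=0):
    """One-sided P value: the fraction of random equal-size
    re-partitions of the pooled target words whose statistic is at
    least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = test_statistic(X, Y, A, B, vectors)
    pooled = list(X) + list(Y)
    hits = 0
    for _ in range(n_samples):
        shuffled = list(rng.permutation(pooled))
        Xi, Yi = shuffled[:len(X)], shuffled[len(X):]
        if test_statistic(Xi, Yi, A, B, vectors) >= observed:
            hits += 1
    return hits / n_samples
```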
The statistic associated with each word vector is a normalized association score of the word with the attribute:

$$ s(w, A, B) = \frac{\text{mean}_{a \in A} \cos(\vec{w}, \vec{a}) - \text{mean}_{b \in B} \cos(\vec{w}, \vec{b})}{\text{std-dev}_{x \in A \cup B} \cos(\vec{w}, \vec{x})} $$

The null hypothesis is that there is no association between s(w, A, B) and p_w, the real-world property paired with word w (such as the percentage of women in an occupation; see Figs. 1 and 2). We tested the null hypothesis using a linear regression analysis to predict the latter from the former.
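Figs. 1 and 2 pair exactly this normalized association score with external, real-world data. The sketch below illustrates the occupation case of Fig. 1 under stated assumptions: the attribute lists, occupation words, and p_w values are placeholders rather than the paper's data, and `cosine` and `vectors` are carried over from the first snippet. The final step mirrors the linear regression analysis described above.

```python
import numpy as np
from scipy import stats

def normalized_association(w, A, B, vectors):
    """s(w, A, B): difference of mean cosines with A and B, divided by
    the standard deviation of cosines over the union of A and B."""
    cos_a = [cosine(vectors[w], vectors[a]) for a in A]
    cos_b = [cosine(vectors[w], vectors[b]) for b in B]
    cos_all = [cosine(vectors[w], vectors[x]) for x in list(A) + list(B)]
    return (np.mean(cos_a) - np.mean(cos_b)) / np.std(cos_all, ddof=1)

# Placeholder inputs, not the paper's word lists or statistics.
female_terms = ["female", "woman", "girl", "she", "her"]
male_terms = ["male", "man", "boy", "he", "him"]
occupations = ["nurse", "librarian", "engineer", "carpenter"]
pct_women = [88.0, 79.0, 13.0, 2.0]  # p_w, e.g., from labor statistics

scores = [normalized_association(w, female_terms, male_terms, vectors)
          for w in occupations]
# Predict p_w from s(w, A, B); r is a Pearson correlation
# coefficient of the kind reported in Fig. 1.
slope, intercept, r, p_value, stderr = stats.linregress(scores, pct_women)
```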
We elaborate on further implications of our results. In psychology, our results add to the credence of the IAT by replicating its results in such a different setting. Further, our methods may yield an efficient way to explore previously unknown implicit associations. Researchers who conjecture implicit associations might first test them using the WEAT on a suitable corpus before testing human subjects. Similarly, our methods could be used to quickly find differences in bias between demographic groups, given large corpora authored by members of the respective groups. If substantiated through testing and replication, the WEAT may also give us access to the implicit associations of groups not available for testing, such as historic populations.

We have demonstrated that word embeddings encode not only stereotyped biases but also other knowledge, such as the visceral pleasantness of flowers or the gender distribution of occupations. These results lend support to the distributional hypothesis in linguistics, namely that the statistical contexts of words capture much of what we mean by meaning (16). Our findings are also sure to contribute to the debate concerning the Sapir-Whorf hypothesis (17), because our work suggests that behavior can be driven by cultural history embedded in a term's historic use. Such histories can evidently vary between languages.

We stress that we replicated every association documented via the IAT that we tested. The number, variety, and substantive importance of our results raise the possibility that all implicit human biases are reflected in the statistical properties of language. Further research is needed to test this hypothesis and to compare language with other modalities, especially the visual, to see if they have similarly strong explanatory power.

Our results also suggest a null hypothesis for explaining the origins of prejudicial behavior in humans, namely, the implicit transmission of ingroup/outgroup identity information through language. That is, before providing an explicit or institutional explanation for why individuals make prejudiced decisions, one must show that the decisions were not a simple outcome of unthinking reproduction of statistical regularities absorbed with language. Similarly, before positing complex models for how stereotyped attitudes perpetuate from one generation to the next or from one group to another, we must check whether simply learning language is sufficient to explain (some of) the observed transmission of prejudice.

Our work has implications for AI and machine learning because of the concern that these technologies may perpetuate cultural stereotypes (18). Our findings suggest that if we build an intelligent system that learns enough about the properties of language to be able to understand and produce it, in the process it will also acquire historical cultural associations, some of which can be objectionable. Already, popular online translation systems incorporate some of the biases we study (see the supplementary materials). Further concerns may arise as AI is given agency in our society. If machine-learning technologies used for, say, résumé screening were to imbibe cultural stereotypes, prejudiced outcomes may result. We recommend addressing this through the explicit characterization of acceptable behavior. One such approach is seen in the nascent field of fairness in machine learning, which specifies and enforces mathematical formulations of nondiscrimination in decision-making (19, 20). Another approach can be found in modular AI architectures, such as cognitive systems, in which the implicit learning of statistical regularities can be compartmentalized and augmented with explicit instruction of rules of appropriate conduct (21, 22). Certainly, caution must be used in incorporating modules constructed via unsupervised machine learning into decision-making systems.

REFERENCES AND NOTES

1. M. Stubbs, Text and Corpus Analysis: Computer-Assisted Studies of Language and Culture (Blackwell, Oxford, 1996).
2. J. A. Bullinaria, J. P. Levy, Behav. Res. Methods 39, 510–526 (2007).
3. T. Mikolov, J. Dean, Adv. Neural Inf. Process. Syst. 2013, 3111–3119 (2013).
4. C. M. Bishop, Pattern Recognition and Machine Learning (Springer, London, 2006).
5. A. G. Greenwald, D. E. McGhee, J. L. Schwartz, J. Pers. Soc. Psychol. 74, 1464–1480 (1998).
6. T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, A. T. Kalai, Adv. Neural Inf. Process. Syst. 2016, 4349–4357 (2016).
7. M. Bertrand, S. Mullainathan, Am. Econ. Rev. 94, 991–1013 (2004).
8. M. Bertrand, D. Chugh, S. Mullainathan, Am. Econ. Rev. 95, 94–98 (2005).
9. B. A. Nosek, M. Banaji, A. G. Greenwald, Group Dyn. 6, 101–115 (2002).
10. B. A. Nosek, M. R. Banaji, A. G. Greenwald, J. Pers. Soc. Psychol. 83, 44–59 (2002).
11. B. A. Nosek et al., Proc. Natl. Acad. Sci. U.S.A. 106, 10593–10597 (2009).
12. P. D. Turney, P. Pantel, J. Artif. Intell. Res. 37, 141 (2010).
13. J. Pennington, R. Socher, C. D. Manning, EMNLP 14, 1532–1543 (2014).
14. T. MacFarlane, Extracting semantics from the Enron corpus, University of Bath, Department of Computer Science Technical Report Series, CSBU-2013-08 (2013); http://opus.bath.ac.uk/37916/
15. W. Lowe, S. McDonald, The direct route: Mediated priming in semantic space, Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society (LEA, 2000), pp. 806–811.
16. M. Sahlgren, Ital. J. Linguist. 20, 33 (2008).
17. G. Lupyan, Lang. Learn. 66, 516–553 (2016).
18. S. Barocas, A. D. Selbst, Calif. Law Rev. 104, 671 (2016).
19. C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. Zemel, Fairness through awareness, Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ACM, 2012), pp. 214–226.
20. M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, S. Venkatasubramanian, Certifying and removing disparate impact, Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2015), pp. 259–268.
21. K. R. Thórisson, Minds Mach. 17, 11–25 (2007).
22. M. Hanheide et al., Artif. Intell., 10.1016/j.artint.2015.08.008 (2015).
23. L. L. Monteith, J. W. Pettit, J. Soc. Clin. Psychol. 30, 484–505 (2011).

ACKNOWLEDGMENTS

We are grateful to W. Lowe for substantial assistance in the design of our significance tests; T. Macfarlane for pilot research as a part of his undergraduate dissertation; and S. Barocas, M. Brundage, K. Crawford, C. Lai, and M. Salganik for extremely useful comments on a draft of this paper. We have archived the code and data on Harvard Dataverse (doi: 10.7910/DVN/DX4VWP).

SUPPLEMENTARY MATERIALS

www.sciencemag.org/content/356/6334/183/suppl/DC1
Materials and Methods
Supplementary Text
Table S1
References

17 November 2016; accepted 9 March 2017
10.1126/science.aal4230