Hehman & Xie (2021) - Doing Better Data Visualization
Tutorial. Advances in Methods and Practices in Psychological Science (Association for Psychological Science). https://doi.org/10.1177/25152459211045334
Abstract
Methods in data visualization have rapidly advanced over the past decade. Although social scientists regularly need to
visualize the results of their analyses, they receive little training in how to best design their visualizations. This tutorial is
for individuals whose goal is to communicate patterns in data as clearly as possible to other consumers of science and
is designed to be accessible to both experienced and relatively new users of R and ggplot2. In this article, we assume
some basic statistical and visualization knowledge and focus on how to visualize rather than what to visualize. We distill
the science and wisdom of data-visualization expertise from books, blogs, and online forum discussion threads into
recommendations for social scientists looking to convey their results to other scientists. Overarching design philosophies
and color decisions are discussed before giving specific examples of code in R for visualizing central tendencies,
proportions, and relationships between variables.
Keywords
graphing/plotting, data visualization, open data, open materials
Creative Commons NonCommercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0
License (https://creativecommons.org/licenses/by-nc/4.0/), which permits noncommercial use, reproduction, and distribution of the work without
further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
[Figure 1: four scatterplot panels plotting y1 against x1, y2 against x2, y3 against x3, and y4 against x4.]
Fig. 1. Anscombe’s quartet. In all four data sets depicted, the mean of x is 9, the variance of x is 11, the mean of y is 7.5, the variance
of y is 4.12, and the correlation between x and y is .82. Important features of the data are hidden unless the individual observations
are visualized.
possible.” Using visualizations to increase information richness speaks to both principles. Anscombe’s quartet (Fig. 1; Anscombe, 1973) is a famous illustration of how descriptive statistics can conceal important features of your data.
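As a minimal sketch of the point made by Figure 1, the summary statistics of the four data sets can be computed directly. This assumes the `anscombe` data set that ships with base R (columns `x1` to `x4` and `y1` to `y4`); the object name `quartet_summary` is ours and purely illustrative.

```r
# Summary statistics for Anscombe's quartet, using the `anscombe` data set
# bundled with base R (an assumption: the article does not name a data source).
data(anscombe)

quartet_summary <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(set    = i,
             mean_x = mean(x), var_x = var(x),
             mean_y = mean(y), var_y = var(y),
             r      = cor(x, y))
}))

# The four rows are nearly identical, although the scatterplots differ sharply.
print(round(quartet_summary, 2))
```

Plotting each pair (for example with `plot(anscombe$x1, anscombe$y1)`) is what reveals the differences that the table hides.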
Every data visualization, like any descriptive statistic, is a simplification of your data. Just as descriptive statistics can mask meaningful underlying variation, basic visualizations that oversimplify your data can do so as well. To the extent that you include more fine-grained information, you can better convey the actual patterns within your data. Consider the classic bar plot: When used to summarize means, bar plots oversimplify because they depict only the means of different conditions, and a great deal of important information is lost (Weissgerber et al., 2015). For example, two conditions might have the exact same mean but very different underlying distributions of observations giving rise to those means (Fig. 2).
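The following sketch builds toy data of the kind described here: two conditions with the same mean but very different distributions. The data are simulated under our own assumptions (the seed, sample size, and condition labels `A` and `B` are arbitrary) and are not the data behind Figure 2.

```r
set.seed(1)  # arbitrary seed, only to make the toy data reproducible
n <- 100

# Condition A: one cluster of scores. Condition B: two well-separated clusters.
unimodal <- rnorm(n, mean = 5, sd = 0.5)
bimodal  <- c(rnorm(n / 2, mean = 3, sd = 0.4),
              rnorm(n / 2, mean = 7, sd = 0.4))

# Shift each condition so that both means are exactly 5.
unimodal <- unimodal - mean(unimodal) + 5
bimodal  <- bimodal  - mean(bimodal)  + 5

toy <- data.frame(condition = rep(c("A", "B"), each = n),
                  score     = c(unimodal, bimodal))

means <- tapply(toy$score, toy$condition, mean)
sds   <- tapply(toy$score, toy$condition, sd)
print(round(means, 2))  # identical means
print(round(sds, 2))    # very different spread

# Base R graphics: means only (left) versus individual observations (right).
op <- par(mfrow = c(1, 2))
barplot(means, ylab = "Mean score", main = "Means only")
stripchart(score ~ condition, data = toy, method = "jitter",
           vertical = TRUE, pch = 16, ylab = "Score",
           main = "Individual observations")
par(op)
```

The bar plot cannot distinguish the two conditions, whereas the jittered points show that condition B is bimodal.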
Fig. 2. An informationally sparse visualization (left) plotted from toy data. This bar plot reveals two conditions that have identical means. Yet from the same data, plotting the individual observations (right) reveals a very different distribution in each condition giving rise to those means.
Including more visualization features can convey more information to the reader in the same space, thereby increasing the information richness of the visualization. A common first step would involve representing the variability around those means (e.g., error bars). A further step would be representing the distribution of
[Figure 3: scatterplot of Intelligence Ratings (y-axis, 3 to 8) against Attractiveness Ratings (x-axis, 1 to 9), with legends for region (Africa, Asia, Australia & New Zealand, Central America & Mexico, Eastern Europe, Middle East, Scandinavia, South America, UK, USA & Canada, Western Europe) and stim_ethnicity (asian, black, latinx, white).]
Fig. 3. An overinformationally rich visualization. This scatterplot depicts the relationship between ratings of attractiveness and
ratings of intelligence made on targets across four ethnicities by perceivers from different world regions.
with the philosophy of minimalism in effective scientific communication, these unnecessary flourishes should be removed.
Color
One of the most important considerations in any modern visualization is that of color. There are a number of concerns to simultaneously navigate when considering your choice of color. The first is inclusivity. Five percent of the human population, 8% to 10% of men, have some sort of color blindness; the most common is red-green color blindness (Neitz & Neitz, 2011). Another concern is that although screen-based reading of articles is now more common, ideally your color choices would still effectively convey information when printed in gray scale because your article will likely be sometimes read in that way. Most importantly, consider the type of information being presented. Are your data categorical? Are there two categories or five? Continuous? Is there a zero point in your continuum? The answers to each of these questions should inform your palette choice.
When your data are categorical, your goal is to choose colors that are maximally differentiable within the color space (while simultaneously being safe for color blindness and gray scale). Exactly what these maximally differentiable colors might be depends on how many categories you need to be equally spaced in color. Excellent tools such as ColorBrewer (Brewer et al., 2003) palettes are valuable and available at https://colorbrewer2.org.
When considering a continuous scale, color gradients can bias a reader’s perception of relative quantitative differences. For instance, certain colors, such as yellow, can create apparent divisions in a scale not actually there because of their high luminosity. Some other color transitions can bias readers into believing there is a bigger value change in a certain part of the scale. It is important that the color gradient consistently changes in value from the top to the bottom of the scale identical to the value change of the numbers the colors represent.
Sometimes researchers may wish to visually represent a zero point along a continuous scale, such as from −3 to 3. In this situation, it is informative to have the positive and negative directions be distinct colors that scale as the values become farther from zero. In addition, the zero point may be best represented as no information, which separates the colors chosen for the positive and negative side of the scales (Fig. 4). Some ideal color palettes can again be found for this situation through ColorBrewer (Brewer et al., 2003).
[Figure 4: three example color palettes, labeled Categorical, Continuous, and Zero-point.]
Fig. 4. Examples of distinct color palettes for different types of data.
Several packages in R currently represent the state of the art. One is viridis (Garnier et al., 2018). It has been carefully developed to have eight palettes that represent continuous change across a spectrum in palettes that are safe for both color blindness and gray scale (Nuñez et al., 2018). Another is the colorspace package (Zeileis et al., 2019), which is based on human color perception; colors vary along hue, chroma, and luminance dimensions. Likewise, scico (Crameri, 2018) offers gradients that are perceptually uniform and universally readable.
Better Visualization of Common Results
As a general philosophy, goal-centered graph design, or choosing a visualization that highlights your specific hypotheses or goals, will make visualizations most effective. There are some common visualizations that are overwhelmingly used to convey certain types of information. Many of these enjoy their level of popularity because of historic precedent in that area of research and perhaps at one time did comprise the cutting edge of visualization. Yet like any technology, other improved methods have been developed that are now objectively superior. Summarizing these advances very generally, the improvements in visualization hinge on providing improved methods of conveying two types of information (that are related): representations of variance around a central tendency and representations of the overall distribution of the data. In this section, we discuss three common types of information to be conveyed by studies in the social sciences and the modern best practices for conveying that information in data visualizations.
R code and example data are provided in each section. All plots were created using the ggplot2 package (Wickham, 2011), which is required for the tutorial code to run, along with data hygiene packages such as dplyr (Wickham et al., 2021). In addition, we used the viridis (Garnier et al., 2018) and colorspace (Zeileis et al., 2019) color palette libraries, ggExtra (Attali & Baker, 2019), to add marginal density plots and histograms, and gghalves (Tiedemann, 2020) to create the raincloud plots presented below. For those interested in a primer to R, the tidyverse, or ggplot2, see the For Further Reading section at the end of the article. More information on each
[Figure 5: raincloud plots of Perceived Relation to Bill 21 (x-axis, 1 to 7) for three categories: Hijab, Kippah, and Crucifix.]
Fig. 5. Raincloud plot combining a probability density function, jittered data points,
a mean represented by the white circle, and a box plot. The advantage of these
additional features is salient here because they reveal several important features of
the data, including nonnormal distributions of observations that would be otherwise
obscured by presenting only a measure of central tendency like the bar plot.
It is our opinion that these methods of data visualization fully subsume the information conveyed by the bar plot and box plot. In fact, because we do not believe there to be any information present in the bar plot not available in its modern descendants, for representing central tendencies in finalized scientific communication, we think the bar plot should be fully retired.
Cluster heat map
Some researchers may wish to show mean change over time across multiple conditions or categories or as a function of some other continuous variable. When additionally incorporating time, visualizing all the observations and distributions at each point is likely too complex and visually overwhelming. It may be more effective to focus on the information you want to convey most effectively: mean change for multiple categories over time. One visualization ideal for this situation is the cluster heat map (alternatively known as a tile map or level plot; Wilkinson & Friendly, 2009). Here, means over time are represented by color, and each rectangle represents a fixed set of time. This plot enables easy comparison both across many categories and within a category.
In the following example, we use a cluster heat map (Fig. 6) to show how explicit antigay bias changed over time across each state in the United States (with data from Ofosu et al., 2019).
In general and for various reasons, we consider the raincloud plot and cluster heat map more consistent with the philosophies laid out above for conveying central tendencies than the bar plot, box plot, violin plot, beeswarm plot, bean plot, pirate plot, lollipop plot, or ridgeline plot, although some of these might still provide some advantages in niche situations.
Proportions or Frequencies
Another common type of information presented is that of proportions or frequencies. Unlike central tendencies, there is no variance to represent around these observed counts. Accordingly, priorities of the data visualization vary. Yet like central tendencies, scientists often wish to visually compare proportions with one another. Because multiple proportions are a percentage of some greater whole, a classic way of representing these data for comparison is a pie chart. We see pie charts (or other circular visualizations) occasionally but
# assumes ggplot2, dplyr, and colorspace are loaded and that the custom
# theme_minimalism() has been defined earlier in the tutorial code
load("HeatmapData.Rda")

# cluster heat map / level plot with change over time in squares

f2 <- HeatmapData %>%                            # define dataframe
  ggplot(aes(x=Year, y=State, z=Explicit)) +     # define x, y, and z variables

  # add observations to the heat map
  geom_tile(aes(fill = Explicit)) +              # fill the map with colors based on
                                                 # values on the z variable (Explicit Bias)

  # Define color palette
  # For this example, we will use the "Inferno" palette from the colorspace package
  scale_fill_continuous_sequential(palette="Inferno",       # define palette
                                   name="Explicit Bias") +  # name of legend
  # optional styling
  scale_x_continuous(breaks=seq(2003,2015,3)) +  # x-axis tick marks
  xlab("Year") +                                 # x-axis label
  ylab("State") +                                # y-axis label
  ylim(rev(levels(HeatmapData$State))) +         # order y-axis alphabetically
  theme_minimalism() +                           # apply our custom minimal theme
  theme(panel.grid.major.y=element_line())       # show major gridline for y axis
f2

# we can also order the y-axis another way. below is the code to sort the States
# by their mean level of prejudice (across all years).
yaxisOrder <- HeatmapData %>%
  group_by(State) %>%
  dplyr::summarize(avgExplicit = mean(Explicit)) %>%
  ungroup() %>%
  arrange(avgExplicit)
levels(yaxisOrder$State) <- yaxisOrder$State     # this creates the order of the states

# then, we add the following to our figure to sort according to States'
# average explicit bias
f2 <- f2 +
  ylim(levels(yaxisOrder$State))   # you may ignore the warning that a scale for 'y' is
                                   # already present. This replaces the existing scale.
f2
# save plot
ggsave(f2,filename="figs/levelplot.png",dpi=300,type="cairo",
       height=23,width=11.5, units="cm")         # adjust dims to change size of cells
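The heat map code above fills its tiles from a sequential palette. As a rough, hedged check of the gray-scale concern raised in the Color section, one can approximate how such a palette would print in gray scale. This sketch uses `hcl.colors()` from base R (grDevices, R 3.6.0 or later) rather than the colorspace function used above, and the luma weights are a common approximation, not a method taken from the article.

```r
# Approximate the gray-scale appearance of a sequential palette.
# "Inferno" is chosen only because the heat map code above uses a palette of
# that name; base R's version may differ slightly from the colorspace one.
pal <- hcl.colors(9, palette = "Inferno")

# Rec. 601 luma weights give a rough gray value (0 = black, 255 = white).
rgb_vals <- col2rgb(pal)                          # 3 x 9 matrix: red, green, blue
gray_val <- as.numeric(c(0.299, 0.587, 0.114) %*% rgb_vals)

print(data.frame(color = pal, gray = round(gray_val)))

# Successive differences: steps that all share one sign suggest the palette
# keeps its ordering when printed in gray scale.
print(round(diff(gray_val), 1))
```

A palette whose gray values rise and then fall would map two different data values onto the same shade of gray in print.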
[Figure 6: cluster heat map with State on the y-axis (visible rows: MS, AL, LA, SC, GA, AR, MO, ND, DE, FL, TX, NC, PA, IL, NJ, OH, MN, WI, IA, KS, SD, TN, NE, WV, OK) and a color legend for Explicit Bias.]
[Figure 7: bar chart of Frequency (%) (y-axis, 0.0 to 0.4) by Response (1-7 Scale) (x-axis).]
Fig. 7. Frequency (%) of responses on a Likert-type item scaled from 1 to 7. Observations were collected at a single time point. Note that even for very small differences, such as Response 1 and Response 2, column length allows for precise comparisons.
that humans are not very good at perceiving circular area and so inaccurately interpret proportions visually
load("BarandLineplotData.Rda")

# bar chart comparing proportions across single category
f3 <- BarAndLineplotData %>%                # define dataframe
  filter(weeks==2) %>%                      # filter data only from week 2
  ggplot(aes(x=response, y=percent,         # define x, y variables
             fill=response)) +              # the fill variable defines the color of bars
  # add bars
  geom_bar(stat = "identity", position="dodge") +  # style of bars. add fill="black" to
                                                   # set the same color across all bars
  # optional styling
  # Define color palette
  # For this example, we will use the "viridis" palette from the viridis package
  scale_fill_viridis(discrete = TRUE, option="viridis") +
  xlab("Response (1-7 Scale)") +            # x-axis label
  ylab("Frequency (%)") +                   # y-axis label
  theme_minimalism() +                      # apply our custom minimal theme
  theme(legend.position="none",             # hide legend
within bars. Figure 8 illustrates the changing proportion of responses on the same Likert-type item scaled from 1 to 7 made by participants across 4 weeks.
Line plot
Like means over time, a common situation is that researchers wish to visualize how proportions change over time or as a function of some other continuous
proportions than their alternatives, including the pie chart, spider chart, radar chart, tree map, doughnut plot, area chart, stacked area plot, or stream graph, although some of these might still provide some advantages in niche situations.
Relationships
Finally, researchers often want to visualize a relationship between two or more variables, such as a correlation or regression slope. In our subjective opinions, it is for this type of visualization that social scientists have already mostly adopted best practices. We see scatterplots regularly in our respective corner of research. Nonetheless, some additions can improve the information communicated. Like means, it is important here to represent both a central tendency of the relationship and the variance around that relationship. Typically, line graphs are used to represent relationships, and like the other types of information we are covering, they can be improved by better conveying the distribution of data.
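To make "a central tendency of the relationship and the variance around that relationship" concrete, here is a minimal base R sketch that estimates a regression slope, its 95% confidence interval, and the confidence band around the fitted line. The data are simulated under our own assumptions (variable names, seed, and effect size are illustrative only).

```r
set.seed(2)  # arbitrary seed for reproducibility
n <- 200
x <- rnorm(n, mean = 1, sd = 0.2)               # hypothetical predictor
y <- 0.3 + 0.1 * x + rnorm(n, sd = 0.03)        # hypothetical outcome

fit <- lm(y ~ x)

# Central tendency of the relationship: the fitted slope.
slope <- coef(fit)[["x"]]

# Uncertainty around it: 95% confidence interval for the slope.
slope_ci <- confint(fit, "x", level = 0.95)
print(slope)
print(slope_ci)

# Confidence band for the fitted line over the observed range of x,
# which is the kind of band drawn around a fitted line in a scatterplot.
grid <- data.frame(x = seq(min(x), max(x), length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
print(head(cbind(grid, band)))
```

The columns `lwr` and `upr` of `band` give the lower and upper edges of the band at each grid value of `x`.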
load("BarandLineplotData.Rda")
# stacked line plot with total proportion as separate line

# for this example, we will also add a line to represent the cumulative frequency
# calculate cumulative frequency across all levels of x, per y
BarAndLineplotData <- BarAndLineplotData %>%
  group_by(weeks) %>%                                         # group by week
  dplyr::mutate(percent_TOTAL = sum(percent, na.rm=TRUE)) %>% # get total % per week
  ungroup()

# create stacked line plot
f5 <- BarAndLineplotData %>%                 # define dataframe
  ggplot(aes(x = weeks,                      # define x variable
             y = percent,                    # define y variable
             fill = response,                # set grouping variable for fill colors
             color = response)) +            # set grouping variable for line colors
  geom_line(size = 0.4) +                    # add lines for each group

  # add cumulative frequency to line plot
  geom_line(aes(x=weeks,y=percent_TOTAL),    # add line for total
            color="black", size = 1) +       # color and size for total line

  # optional styling
  # define color palette using "viridis" palette from viridis package
  scale_color_viridis(discrete=TRUE, option="viridis",  # changes line colors
                      name = "Response") +              # legend title
  xlab("Week") +                             # x-axis label
  ylab("Frequency (%)") +                    # y-axis label
  coord_cartesian(xlim=c(1,769)) +           # set axis limits
  scale_x_continuous(breaks=seq(0,769,100)) + # x-axis tick marks
  theme_minimalism() +                       # apply custom minimal theme
  theme(panel.grid.major.x=element_line(),   # show all major/minor grids
        panel.grid.major.y=element_line(),
        panel.grid.minor.x=element_line(),
        panel.grid.minor.y=element_line())
f5
# save plot
ggsave(f5,filename="figs/barplot3_lineplot.png",dpi=300,type="cairo",
       height=13,width=18, units="cm")
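The `percent_TOTAL` step in the code above sums the proportions within each week. A small base R sketch of the same idea, using made-up counts (the data frame `counts` and its values are hypothetical), shows why that total is always 1 when the plotted values are proportions:

```r
# Hypothetical long-format counts: one row per week x response option.
counts <- data.frame(
  weeks    = rep(1:3, each = 7),
  response = factor(rep(1:7, times = 3)),
  n        = c(5, 8, 12, 20, 25, 18, 12,
               4, 9, 15, 22, 21, 17, 12,
               6, 7, 10, 19, 28, 20, 10)
)

# ave() applies FUN within each week and returns a vector aligned with the rows.
counts$percent       <- counts$n / ave(counts$n, counts$weeks, FUN = sum)
counts$percent_TOTAL <- ave(counts$percent, counts$weeks, FUN = sum)

print(head(counts, 7))
```

With raw counts in place of proportions, the same per-week total would instead show how many observations were collected in each week.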
[Figure 9: line plot of Frequency (%) (y-axis, 0.00 to 1.00) by Week (x-axis, 0 to 700), with one line per Response option (1 to 7).]
Fig. 9. Frequency (%) of responses on a Likert-type item scaled 1 to 7 in which observations collected between 2007 and 2019 are compared. Rather than stacking the values, the lines are plotted over one another so their respective change over time can be compared (in contrast to a stacked area plot, which can impede the accurate perception of values; Few, 2011). We included a black line representing the total per week. The data here are proportions, so this value never deviates from 1. However, when researchers are plotting raw values or frequencies over time, it might be informative to indicate how many total observations occurred per week across all the distinct categories being plotted.
Contour plot
Sometimes researchers may have so many observations that scatterplots are no longer effective. For instance,
visualization as a scatterplot. However, doing so can require some additional programming. Alternatively, researchers might employ a contour plot, essentially turning the scatterplot into a heat or topographical map in which certain colors represent a higher density of observations (i.e., a modern version of sunflowers; Cleveland & McGill, 1984), which enables readers to still ascertain the underlying relationship while simultaneously seeing the distributions of the observed data across two axes.
To illustrate, in Figure 11, we use a contour plot to represent the same data presented above: the relationship between implicit and explicit anti-Black bias. Rather than the histograms we presented above, here, as a variant, we included density distributions in the margins of the x-axis and y-axis. In fact, we prefer density distributions over histograms because we believe they are more consistent with the principle of minimalism. For consistency, we have used these same data to illustrate this type of visualization. Yet it is important to emphasize we consider contour plots more appropriate when there are more observations (e.g., > 5,000) to ensure a visualization is not too information rich.
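The density surface behind a contour plot can be sketched with a two-dimensional kernel density estimate. The example below uses `kde2d()` from the MASS package (a recommended package included with standard R installations) on simulated data; the variable names, seed, and parameter values are our own assumptions and not the article's data.

```r
library(MASS)  # provides kde2d(); assumed to be installed with R

set.seed(3)    # arbitrary seed for reproducibility
n <- 6000      # many observations, where a plain scatterplot would overplot
explicit <- rnorm(n, mean = 0.9, sd = 0.2)
implicit <- 0.3 + 0.1 * explicit + rnorm(n, sd = 0.03)

# Estimate the joint density on a 50 x 50 grid (default bandwidths).
dens <- kde2d(explicit, implicit, n = 50)

# Locate the grid cell with the highest estimated density.
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)
print(c(x = dens$x[peak[1, 1]], y = dens$y[peak[1, 2]]))

# Base R contour plot of the estimated density.
contour(dens, xlab = "Explicit Bias (simulated)",
        ylab = "Implicit Bias (simulated)")
```

Passing a larger or smaller bandwidth through the `h` argument of `kde2d()` makes the contours smoother or more detailed.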
load("ScatterPlotData.Rda")

# scatterplot
f6 <- ScatterplotData %>%                              # defines dataframe
  ggplot(aes(x=ExplicitBias, y=ImplicitBias)) +        # defines x and y axis variables

  # add observations to scatterplot
  geom_point(size=1, alpha=.7, color="darkgray") +     # define size and color
                                                       # alpha adds transparency
  # add fitted slope and 95% CIs
  geom_smooth(size=1,method=lm,color="slateblue") +    # define size and color
                                                       # method=lm indicates linear slope

  # optional styling
  scale_x_continuous(breaks=seq(0.4,1.6,.2)) +         # x-axis tick marks
  scale_y_continuous(breaks=seq(0.3,1.6,.05)) +        # y-axis tick marks
  xlab("Explicit Bias") +                              # x-axis label
  ylab("Implicit Bias") +                              # y-axis label
  theme_minimalism() +                                 # apply custom minimal theme
  theme(panel.grid.major.x=element_line(),             # show all major/minor grids
        panel.grid.major.y=element_line(),
        panel.grid.minor.x=element_line(),
        panel.grid.minor.y=element_line())

# add marginal histograms (requires ggExtra package)
f6 <- ggMarginal(f6, type="histogram",                 # add histograms to marginal plot
                 fill = "lightgray",                   # color of histograms
                 xparams = list(bins=15),              # n of bins for x variable
                 yparams = list(bins=15))              # n of bins for y variable
f6
# save plot
ggsave(f6,filename="figs/scatterplot.png",dpi=300,type="cairo",
       height=14,width=18, units="cm")
[Figure 10: scatterplot of Implicit Bias (y-axis, 0.30 to 0.50) against Explicit Bias (x-axis, 0.4 to 1.4), with marginal histograms.]
Fig. 10. Improved scatterplot visualizing the relationship between implicit and explicit anti-Black bias, including a 95% confidence band of the slope, with histograms of the variable on each axis in the opposite margins.
load("ScatterPlotData.Rda")

# contour plot with density plots in the margins
f9 <- ScatterplotData %>%                              # defines dataframe
  ggplot(aes(x=ExplicitBias, y=ImplicitBias)) +        # defines x and y axis variables
  geom_point(stat="identity",size=0.01,alpha=0) +      # add observations (fully transparent)
  stat_density_2d(aes(fill=after_stat(level)),         # add the main contour plot
                  h = 0.1,geom="polygon") +            # change h (bandwidth) to adjust smoothing

  # optional styling
  scale_fill_viridis(option="viridis") +               # color using viridis palette
  stat_smooth(method = "lm", formula = y~x,            # add regression line
              size=2, color="black", se=FALSE) +       # style of regression line
[Figure 11: contour plot of Implicit Bias (y-axis, 0.30 to 0.50) against Explicit Bias (x-axis, 0.4 to 1.2), with a color legend for density level (5 to 25).]
Fig. 11. Contour plot visualizing the relationship between implicit and explicit anti-Black
bias, with probability density functions in the margins. Areas with higher values on the
legend indicate higher density of observations.
load("SpaghettiPlotData.Rda")

# Spaghetti plot for random slopes
f7 <- SpaghettiplotData %>%                        # define dataframe
  ggplot(aes(x=attractive,                         # define x variable
             y=intelligent)) +                     # define y variable

  # create random slopes, where each line represents a slope for each cluster
  # in this example, each cluster is a Participant
  geom_line(aes(group=ParticipantID, color=ParticipantID),  # set clustering variable
            stat="smooth", method="lm",            # define the line as a linear relationship
            color="gray", size=0.8, alpha = 0.5) + # define style of lines

  # create a grand slope across all clusters
  stat_smooth(method="lm", formula = y~x,          # grand average slope (linear)
              color="coral",size = 1.5,se=FALSE) + # define color, size of
                                                   # average slope line

  # optional styling
  #scale_color_viridis(discrete=TRUE) +            # different colors for each cluster
  coord_cartesian(ylim=c(1,7), xlim=c(1,7)) +      # set axis limits
  xlab("Attractiveness Ratings") +                 # axis labels
  ylab("Intelligence Ratings") +
  theme_minimalism() +                             # apply custom minimal theme
  theme(legend.position="none")                    # hide legend
f7
# save plot
ggsave(f7,filename="figs/spaghettiplot.png",dpi=300,type="cairo",
       height=14,width=18, units="cm")
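The spaghetti plot above draws one fitted line per participant plus one line across all observations. A hedged base R sketch of the numbers behind those lines follows, on simulated data; the column names mirror those in the code above, but the values, seed, and sample sizes are invented for illustration.

```r
set.seed(4)  # arbitrary seed for reproducibility
n_participants <- 20
n_trials       <- 15

# Simulate ratings in which each participant has their own slope.
sim <- do.call(rbind, lapply(seq_len(n_participants), function(id) {
  slope      <- rnorm(1, mean = 0.4, sd = 0.15)
  attractive <- runif(n_trials, min = 1, max = 7)
  data.frame(ParticipantID = id,
             attractive    = attractive,
             intelligent   = 2.5 + slope * attractive +
                             rnorm(n_trials, sd = 0.5))
}))

# One ordinary least squares slope per participant (the thin gray lines).
per_person <- sapply(split(sim, sim$ParticipantID), function(d) {
  coef(lm(intelligent ~ attractive, data = d))[["attractive"]]
})

# One pooled slope across all observations (the thick line).
grand <- coef(lm(intelligent ~ attractive, data = sim))[["attractive"]]

print(round(summary(per_person), 2))
print(round(grand, 2))
```

These are separate per-participant regressions, which is what fitting `lm` within each group does; slopes estimated from a multilevel model would generally be pulled toward the grand slope.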
Recommendations for Further Reading
Although we, the authors, regularly read, think about, and create data visualizations for our research, we are not visualization professionals. Here, we have attempted to distill and present what we consider the information most applicable and useful to other social scientists from people with greater expertise than we. However, we encourage interested readers to seek out the primary sources and modern practitioners and have included a section, For Further Reading, before the Reference section as a starting point.
Summary
Visualizing one’s data effectively to convey information is a science unto itself with research-informed best and worst practices. Yet this is an area in which social scientists receive little training. Here, we aimed to essentially distill advice and information scattered across data-visualization blogs, books, and Internet discussion threads into recommendations viable for individuals communicating their data and results to other consumers of science.
It is not coincidental that our recommendations often hover around the most simple: variants of the bar plot, line plot, or scatterplot. These tried-and-true methods of visualization have persisted across decades because they are effective and clear. Although new visualizations are continually being developed (e.g., beeswarm plot, stream graph), these sometimes have a goal of aestheticism and novelty involved, not clear scientific communication. Although some might envision specific scenarios in which other visualizations are superior, we believe that the recommendations and code we present above will best serve most social scientists in most common situations. We believe it is most important for researchers to keep the guiding philosophies in mind when making their unavoidably subjective decisions about which visualization might be most effective to convey understanding of their data or critical hypothesis test. We hope this tutorial aids in this endeavor.
Transparency
Action Editor: Julia Strand
Editor: Daniel J. Simons
Author Contributions
Conceptualization: E. Hehman. Data curation: all authors.
Recommended Reading
Ismay, C., & Kim, A. Y. (2021). Modern dive: Statistical Inference via Data Science. https://moderndive.com/index.html
  A freely and fully available online introduction to R and the tidyverse
Wickham, H., & Grolemund, G. (2017). R for data science. O’Reilly Media. https://r4ds.had.co.nz/
  A freely and fully available online introduction to programming in R
Tutorials Point. Learn ggplot2. https://www.tutorialspoint.com/ggplot2/ggplot2_introduction.htm
  A freely and fully available online introduction to ggplot2
Wilke, C. O. (2019). Fundamentals of data visualization: A primer on making informative and compelling figures. O’Reilly Media.
  An excellent modern resource, with some portions available online, including some code for R.
Tufte, E. R. (1983). The visual display of quantitative information. Graphics Press.
  The classic text on data visualization by an initial pioneer in the area
https://www.perceptualedge.com/
  A website and blog maintained by data visualization expert Stephen Few, with numerous entries spanning back to 2006
Koponen, J., & Hildén, J. (2019). Data visualization handbook. Aalto korkeakoulusäätiö.
  A practical guide to data visualization. For example, see here for comparisons of differential effectiveness of ways of conveying different types of values (e.g., shapes, color, line length, position, etc): “Visual variables,” https://datavizhandbook.info/.
ORCID iDs
Eric Hehman https://orcid.org/0000-0003-2227-1517
Sally Y. Xie https://orcid.org/0000-0002-1200-9470
Acknowledgments
We thank Neil Hester, Eugene Ofosu, Jennifer Suliteanu, and Chevieve Heri for feedback on an early draft.
Supplemental Material
Additional supporting information can be found at http://journals.sagepub.com/doi/suppl/10.1177/25152459211045334
References
Allen, M., Poggiali, D., Whitaker, K., Marshall, T. R., & Kievit, R. A. (2019). Raincloud plots: A multi-platform tool for robust data visualization. Wellcome Open Research, 4, Article 63. https://doi.org/10.12688/wellcomeopenres.15191.1
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17–21.
Attali, D., & Baker, C. (2019). ggExtra: Add marginal histograms to “ggplot2”, and more “ggplot2” enhancements (Version 0.9). R package version.
Brewer, C. A., Hatchard, G. W., & Harrower, M. A. (2003). ColorBrewer in print: A catalog of color schemes for maps. Cartography and Geographic Information Science, 30(1), 5–32. https://doi.org/10.1559/152304003100010929
Cleveland, W. S., & McGill, R. (1984). The many faces of a scatterplot. Journal of the American Statistical Association, 79(388), 807–822. https://doi.org/10.1080/01621459.1984.10477098
Cleveland, W. S., & McGill, R. (1985). Graphical perception and graphical methods for analyzing scientific data. Science, 229(4716), 828–833. https://doi.org/10.1126/science.229.4716.828
Crameri, F. (2018). Scientific colour maps. Zenodo. https://doi.org/10.5281/zenodo.1243909
Few, S. (2007). Save the pies for dessert. https://www.perceptualedge.com/articles/visual_business_intelligence/save_the_pies_for_dessert.pdf
Few, S. (2011). Quantitative displays for combining time-series and part-to-whole relationships. https://www.perceptualedge.com/articles/visual_business_intelligence/displays_for_combining_time-series_and_part-to-whole.pdf
Garnier, S., Ross, N., Rudis, B., Sciaini, M., & Scherer, C. (2018). viridis: Default color maps from “matplotlib” (Version 0.5.1). R package version.
Heer, J., & Bostock, M. (2010). Crowdsourcing graphical perception: Using Mechanical Turk to assess visualization design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 203–212).
Hehman, E., Calanchini, J., Flake, J. K., & Leitner, J. B. (2019). Establishing construct validity evidence for regional measures of explicit and implicit racial bias. Journal of Experimental Psychology: General, 148(6), 1022–1040. https://doi.org/10.1037/xge0000623
Hehman, E., Flake, J. K., & Calanchini, J. (2018). Disproportionate use of lethal force in policing is associated with regional racial biases of residents. Social Psychological and Personality Science, 9(4), 393–401. https://doi.org/10.1177/1948550617711229
Helske, J., Helske, S., Cooper, M., Ynnerman, A., & Besançon, L. (2021). Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect. ArXiv. https://doi.org/10.1109/TVCG.2021.3073466
Jones, B. C., DeBruine, L. M., Flake, J. K., Liuzza, M. T., Antfolk, J., Arinze, N. C., Ndukaihe, I. L. G., Bloxsom, N. G., Lewis, S. C., Foroni, F., Willis, M. L., Cubillas, C. P., Vadillo, M. A., Turiegano, E., Gilead, M., Simchon, A., Saribay, S. A., Owsley, N. C., Jang, C., . . . Coles, N. A. (2021). To which world regions does the valence–dominance model of social perception apply? Nature Human Behaviour, 5(1), 159–169. https://doi.org/10.1038/s41562-020-01007-2
Neitz, J., & Neitz, M. (2011). The genetics of normal and defective color vision. Vision Research, 51(7), 633–651. https://doi.org/10.1016/j.visres.2010.12.002
Nuñez, J. R., Anderton, C. R., & Renslow, R. S. (2018). Optimizing colormaps with consideration for color vision deficiency to enable accurate interpretation of scientific data. PLOS ONE, 13(7), Article e0199239. https://doi.org/10.1371/journal.pone.0199239
Ofosu, E. K., Chambers, M. K., Chen, J. M., & Hehman, E. (2019). Same-sex marriage legalization associated with reduced implicit and explicit antigay bias. Proceedings of the National Academy of Sciences, USA, 116(18), 8846–8851. https://doi.org/10.1073/pnas.1806000116
Spear, M. E. (1952). Charting statistics. McGraw-Hill.
Stevens, S. S. (1975). Psychophysics: Introduction to its perceptual, neural, and social prospects. John Wiley & Sons.
Tiedemann, F. (2020). gghalves: Compose half-half plots using your favourite geoms (Version 0.1.1). R package version.
Tufte, E. R. (1983). The visual display of quantitative information. Graphics Press.
Tukey, J. W. (1977). Exploratory data analysis (Vol. 2). Addison-Wesley.
Weissgerber, T. L., Milic, N. M., Winham, S. J., & Garovic, V. D. (2015). Beyond bar and line graphs: Time for a new data presentation paradigm. PLOS Biology, 13(4), Article e1002128. https://doi.org/10.1371/journal.pbio.1002128
Wickham, H. (2011). Ggplot2. Wiley Interdisciplinary Reviews: Computational Statistics, 3(2), 180–185.
Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A Grammar of Data Manipulation (Version 1.0.5). R package version.
Wilke, C. O. (2019). Fundamentals of data visualization: A primer on making informative and compelling figures. O’Reilly Media.
Wilkinson, L., & Friendly, M. (2009). The history of the cluster heat map. The American Statistician, 63(2), 179–184. https://doi.org/10.1198/tas.2009.0033
Xie, S. Y., Flake, J. K., & Hehman, E. (2019). Perceiver and target characteristics contribute to impression formation differently across race and gender. Journal of Personality and Social Psychology, 117(2), 364–385. https://doi.org/10.1037/pspi0000160
Zeileis, A., Fisher, J. C., Hornik, K., Ihaka, R., McWhite, C. D., Murrell, P., Stauffer, R., & Wilke, C. O. (2019). Colorspace: A toolbox for manipulating and assessing colors and palettes. ArXiv. http://arxiv.org/abs/1903.06490