Hehman & Xie (2021) - Doing Better Data Visualization

This tutorial focuses on improving data visualization techniques for social scientists, emphasizing the importance of clear communication of data patterns. It covers design philosophies, color choices, and provides R code examples for visualizing central tendencies, proportions, and relationships between variables. The authors aim to bridge the gap between data visualization expertise and social scientists' needs for effective representation of their research findings.


Tutorial

Doing Better Data Visualization

Eric Hehman and Sally Y. Xie
Department of Psychology, McGill University, Montreal, Quebec, Canada

Advances in Methods and Practices in Psychological Science
Association for Psychological Science
October-December 2021, Vol. 4, No. 4, pp. 1–18
© The Author(s) 2021
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/25152459211045334
https://doi.org/10.1177/25152459211045334
www.psychologicalscience.org/AMPPS

Abstract
Methods in data visualization have rapidly advanced over the past decade. Although social scientists regularly need to
visualize the results of their analyses, they receive little training in how to best design their visualizations. This tutorial is
for individuals whose goal is to communicate patterns in data as clearly as possible to other consumers of science and
is designed to be accessible to both experienced and relatively new users of R and ggplot2. In this article, we assume
some basic statistical and visualization knowledge and focus on how to visualize rather than what to visualize. We distill
the science and wisdom of data-visualization expertise from books, blogs, and online forum discussion threads into
recommendations for social scientists looking to convey their results to other scientists. Overarching design philosophies
and color decisions are discussed before giving specific examples of code in R for visualizing central tendencies,
proportions, and relationships between variables.

Keywords
graphing/plotting, data visualization, open data, open materials

Received 4/20/21; Revision accepted 8/11/21

Corresponding Author:
Eric Hehman, Department of Psychology, McGill University
Email: [email protected]

Creative Commons NonCommercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits noncommercial use, reproduction, and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Advances of the past decade in open-source software, computational power, and data-visualization science have given rise to both improved ways of visualizing data and the tools to do so. Rapid changes in development are always accompanied by some uncertainties. How does one communicate results most effectively? What are best practices? In the present article, we aim to serve as an intermediary between people developing new data visualizations and specializing in visualization practices and social scientists wishing to apply these techniques to best visualize the results of their research.

Accordingly, this tutorial will have three sections. First, we discuss important design philosophies; second, we speak to decisions about interior components of any figure; and finally, we provide specific examples of improved visualizations for common types of results across the social sciences. Throughout, we include labeled R code for didactic purposes and provide example data sets so readers can determine how to structure their data for the accompanying visualization. Code and data are available at https://osf.io/kx4us/.

Guiding Philosophies

This tutorial is for scientific communication. Much of what is discussed below may not apply depending on one’s goals (e.g., aesthetics) or one’s audience (e.g., children, laypersons). In this tutorial, we assume your goal is to communicate patterns in your data as clearly as possible to other consumers of science. Furthermore, we also assume some basic statistical and visualization knowledge (e.g., do not truncate your y-axis) and focus on how to visualize rather than what to visualize in a given situation.

Information richness

The first philosophy is that of richness. Edward Tufte (1983), a pioneer in data visualization, advocated as principles “Tell the truth” and “Show as much data as possible.” Using visualizations to increase information richness speaks to both principles. Anscombe’s quartet (Fig. 1; Anscombe, 1973) is a famous illustration of how descriptive statistics can conceal important features of your data.

[Figure 1: four scatterplot panels, y1 vs. x1 through y4 vs. x4.]

Fig. 1. Anscombe’s quartet. In all four data sets depicted, the mean of x is 9, the variance of x is 11, the mean of y is 7.5, the variance of y is 4.12, and the correlation between x and y is .82. Important features of the data are hidden unless the individual observations are visualized.
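The summary statistics quoted for Anscombe's quartet can be checked directly, because base R ships the quartet as the built-in data frame anscombe. The following is a minimal sketch, not part of the authors' supplemental code; the object name quartet_stats is our own choice for illustration.

```r
# Base R ships Anscombe's quartet as the data frame `anscombe`
# (columns x1..x4 and y1..y4), so no extra packages are needed.
data(anscombe)

# Compute the same descriptive statistics for each of the four data sets.
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(set = i,
             mean_x = mean(x), var_x = var(x),
             mean_y = mean(y), var_y = var(y),
             r = cor(x, y))
}))

# The four rows are nearly identical after rounding.
print(round(quartet_stats, 2))
```

Because the four rows agree to about two decimal places, only a plot of the individual observations (for example, one call to plot() per x/y pair) reveals how different the four data sets are.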
Every data visualization, like any descriptive statistic, is a simplification of your data. Just like descriptive statistics can mask meaningful underlying variation, basic visualizations that oversimplify your data can do so as well. To the extent that you include more fine-grained information, you can better convey the actual patterns within your data. Consider the classic bar plot: When used to summarize means, bar plots oversimplify because they depict only the means of different conditions, and a great deal of important information is lost (Weissgerber et al., 2015). For example, two conditions might have the exact same mean but very different underlying distributions of observations giving rise to those means (Fig. 2).

Fig. 2. An informationally sparse visualization (left) plotted from toy data. This bar plot reveals two conditions that have identical means. Yet from the same data, plotting the individual observations (right) reveals a very different distribution in each condition giving rise to those means.

Including more visualization features can convey more information to the reader in the same space, thereby increasing the information richness of the visualization. A common first step would involve representing the variability around those means (e.g., error bars). A further step would be representing the distribution of
the observations. An additional step would be visualizing the observed data points giving rise to those means and distributions. Readers would then have access to both summary statistics and the variability and shape of the entire distribution of observations, which provide greater understanding of the certainty of any estimate (Helske et al., 2021).

Of course, there is a subjective upper ceiling to how much information can be conveyed in any visualization before it instead hinders understanding. Figure 3 depicts the correlation between attractiveness and intelligence for ratings of targets across four ethnicities (represented by shapes) from participants in 11 world regions (represented by color; with data from Jones et al., 2021). This figure is too rich in information; it hinders the viewer’s comprehension of all the data presented.

[Figure 3: scatterplot. x-axis: Attractiveness Ratings; y-axis: Intelligence Ratings. Color legend "region": Africa, Asia, Australia & New Zealand, Central America & Mexico, Eastern Europe, Middle East, Scandinavia, South America, UK, USA & Canada, Western Europe. Shape legend "stim_ethnicity": asian, black, latinx, white.]

Fig. 3. An overinformationally rich visualization. This scatterplot depicts the relationship between ratings of attractiveness and ratings of intelligence made on targets across four ethnicities by perceivers from different world regions.

Overwhelmingly complex figures impede the overarching goal of science communication: to convey information clearly. And deciding when a figure is too rich is unavoidably subjective. Yet as we discuss below, research into the amount of information understood from visualizations can inform exactly where the information richness ceiling might be, depending on the type of visual (Cleveland & McGill, 1985; Heer & Bostock, 2010).

Minimalism

A second important philosophy is that of minimalism. Visualizations can be evaluated in their signal-to-noise ratio, in which signal is the information being conveyed and noise is anything else. The most effective communication maximizes the signal-to-noise ratio by minimizing visual clutter that might interfere with the signal. An extreme version of this argument is that one should justify every single pixel in the visualization. Features not conveying information or allowing readers to assess the patterns more easily should be removed. These might be overlooked features included as default or commonly seen in some software packages (e.g., excessive gridlines in the plot panel). As an extreme example, the serifs in various typefaces are unnecessary pixels because they are not providing additional information. Sans-serif typefaces are more consistent with minimalism. Furthermore, it is rare that any analysis done by most social scientists requires a three-dimensional visualization because it distorts the data and hampers readers’ understanding (Wilke, 2019). Shadows or reflections under text or borders on shapes are all visual noise that is not conveying additional information. To be consistent
with the philosophy of minimalism in effective scientific communication, these unnecessary flourishes should be removed.

Color

One of the most important considerations in any modern visualization is that of color. There are a number of concerns to simultaneously navigate when considering your choice of color. The first is inclusivity. Five percent of the human population, and 8% to 10% of men, have some sort of color blindness; the most common is red-green color blindness (Neitz & Neitz, 2011). Another concern is that although screen-based reading of articles is now more common, ideally your color choices would still effectively convey information when printed in gray scale because your article will likely be sometimes read in that way. Most importantly, consider the type of information being presented. Are your data categorical? Are there two categories or five? Continuous? Is there a zero point in your continuum? The answers to each of these questions should inform your palette choice.

When your data are categorical, your goal is to choose colors that are maximally differentiable within the color space (while simultaneously being safe for color blindness and gray scale). Exactly what these maximally differentiable colors might be depends on how many categories you need to be equally spaced in color. Excellent tools such as the ColorBrewer palettes (Brewer et al., 2003) are valuable and available at https://colorbrewer2.org.

When considering a continuous scale, color gradients can bias a reader’s perception of relative quantitative differences. For instance, certain colors, such as yellow, can create apparent divisions in a scale not actually there because of their high luminosity. Some other color transitions can bias readers into believing there is a bigger value change in a certain part of the scale. It is important that the color gradient changes consistently from the top to the bottom of the scale, in step with the value change of the numbers the colors represent.

Sometimes researchers may wish to visually represent a zero point along a continuous scale, such as from −3 to 3. In this situation, it is informative to have the positive and negative directions be distinct colors that scale as the values become farther from zero. In addition, the zero point may be best represented as no information, which separates the colors chosen for the positive and negative side of the scales (Fig. 4). Some ideal color palettes can again be found for this situation through ColorBrewer (Brewer et al., 2003).

Fig. 4. Examples of distinct color palettes for different types of data (categorical, continuous, and zero-point).

Several packages in R currently represent the state of the art. One is viridis (Garnier et al., 2018). It has been carefully developed to have eight palettes that represent continuous change across a spectrum in palettes that are safe for both color blindness and gray scale (Nuñez et al., 2018). Another is the colorspace package (Zeileis et al., 2019), which is based on human color perception; colors vary along hue, chroma, and luminance dimensions. Likewise, scico (Crameri, 2018) offers gradients that are perceptually uniform and universally readable.

Better Visualization of Common Results

As a general philosophy, goal-centered graph design, or choosing a visualization that highlights your specific hypotheses or goals, will make visualizations most effective. There are some common visualizations that are overwhelmingly used to convey certain types of information. Many of these enjoy their level of popularity because of historic precedent in that area of research and perhaps at one time did comprise the cutting edge of visualization. Yet like any technology, other improved methods have been developed that are now objectively superior. Summarizing these advances very generally, the improvements in visualization hinge on providing improved methods of conveying two types of information (that are related): representations of variance around a central tendency and representations of the overall distribution of the data. In this section, we discuss three common types of information to be conveyed by studies in the social sciences and the modern best practices for conveying that information in data visualizations.

R code and example data are provided in each section. All plots were created using the ggplot2 package (Wickham, 2011), which is required for the tutorial code to run, along with data hygiene packages such as dplyr (Wickham et al., 2021). In addition, we used the viridis (Garnier et al., 2018) and colorspace (Zeileis et al., 2019) color palette libraries, ggExtra (Attali & Baker, 2019) to add marginal density plots and histograms, and gghalves (Tiedemann, 2020) to create the raincloud plots presented below. For those interested in a primer to R, the tidyverse, or ggplot2, see the For Further Reading section at the end of the article. More information on each package is available in the Supplemental Material available online.
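Before installing the palette libraries named above, the three palette types from the Color section (categorical, continuous, zero-point) can be previewed with base R alone. The sketch below assumes R 3.6.0 or later, where hcl.colors() is available; the palette names are base R's built-in HCL palettes, not necessarily the ones the authors used, and the luma() helper is a hypothetical name introduced only for this illustration.

```r
# Assumption: R >= 3.6.0, where grDevices::hcl.colors() is available.
n <- 5

# One palette per data type discussed in the Color section.
pal_categorical <- hcl.colors(n, palette = "Dark 3")    # qualitative
pal_continuous  <- hcl.colors(n, palette = "viridis")   # sequential
pal_zero_point  <- hcl.colors(n, palette = "Blue-Red")  # diverging

# Rough gray-scale check: approximate each color's brightness (luma)
# from its red, green, and blue components (0-255 scale).
luma <- function(cols) {
  rgb <- col2rgb(cols)
  as.numeric(0.299 * rgb["red", ] +
             0.587 * rgb["green", ] +
             0.114 * rgb["blue", ])
}

print(pal_continuous)
print(round(luma(pal_continuous)))
print(round(luma(pal_categorical)))
```

A sequential palette that survives gray-scale printing should show brightness values that change steadily in one direction. If the qualitative palette's values turn out to be close to one another, a redundant cue such as shape or position may be needed when a figure is likely to be printed in gray scale.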
# Required R packages
library("ggplot2")    # required to make plots
library("dplyr")      # for data wrangling/hygiene
library("viridis")    # viridis color palettes
library("colorspace") # colorspace color palettes
library("ggExtra")    # to add marginal density plots & histograms
library("gghalves")   # required to make raincloud plots

In addition to loading these libraries, we set up a custom minimalism theme to reduce the redundancy in R code across our examples in the article. The R code provided in full is available as supplemental material at https://osf.io/kx4us/.

# create ggplot2 theme
# we will use ggplot's minimal theme as a base and modify it to be usable
# across our plots
theme_minimalism <- function(){
  theme_minimal() + # ggplot's minimal theme hides many unnecessary features of plot
    theme( # make modifications to the theme
      panel.grid.major.y=element_blank(), # hide major grid for y axis
      panel.grid.minor.y=element_blank(), # hide minor grid for y axis
      panel.grid.major.x=element_blank(), # hide major grid for x axis
      panel.grid.minor.x=element_blank(), # hide minor grid for x axis
      text=element_text(size=14), # font aesthetics
      axis.text=element_text(size=12),
      axis.title=element_text(size=14, face="bold"))
}

Central Tendency

Perhaps the most common information social scientists wish to convey is central tendency, usually means, in several different conditions. The most common way of representing this information is the bar plot. As alluded to above, certain variants of bar plots present only the mean, a simplification that occludes much information about the underlying data. Improved bar graphs include error bars representing variation around that mean, albeit still in a simplified fashion.

Another common index of central tendency is that of the median. A data visualization based around the median is the box plot, pioneered by Spear (1952) and enhanced into its current form by Tukey (1977). For a dated visualization, the box plot remains extremely effective in conveying a large amount of information about the underlying data. Yet modern improvements have been made.

Adding the two components mentioned above, the actual observed data points and a visualization of the distribution of those points, can increase information richness. These additions far better convey the underlying data giving rise to the central tendencies.

Raincloud plot

Here, we recommend the raincloud plot over alternatives because it best operationalizes the philosophies laid out above (Allen et al., 2019). Essentially, the raincloud plot includes a representation of the overall distribution of observations, the actual observations, and measures of central tendency. If desired, elements of the box plot could be seamlessly integrated in additional layers such that the median and the range of the quartiles of the distribution are included.

In the following example, we use a raincloud plot to illustrate Québec residents’ views on “Bill 21,” a recent law passed by the government of Québec prohibiting some public-sector employees from wearing religious symbols. We measured the extent to which Québecois believed the bill was implemented to address concerns over specific religious symbols (e.g., hijab, crucifix) on items rated on a Likert scale from 1 to 7 (Fig. 5).

Some features of the raincloud code improve the visualization. With large numbers of observations, individual data points overlap. A solution we employed, on Line 15 of the raincloud listing, is to jitter the location of these data points to reduce this overlap. This slightly changes their location along the x-axis, where exact position is irrelevant, so they can be observed. Enhancing this visualization further is the partial transparency of these data points, set on Line 16 (i.e., α, the alpha argument).
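The effect of jitter and transparency can be seen without ggplot2. The sketch below uses base R graphics and simulated toy ratings; all variable names and values are hypothetical, and the manual jitter with runif() only approximates what position_jitter(width=.1) does in the raincloud listing.

```r
set.seed(1)  # reproducible jitter

# Hypothetical toy data: 200 ratings on a 1-7 scale in 4 groups.
group  <- rep(1:4, each = 50)
rating <- sample(1:7, size = 200, replace = TRUE)

# Without jitter, at most 4 x 7 = 28 distinct positions exist,
# so most of the 200 points are drawn on top of one another.
n_positions_raw <- nrow(unique(data.frame(group, rating)))

# Shift each point by up to +/- 0.1 along the group axis only.
group_jittered <- group + runif(length(group), min = -0.1, max = 0.1)
n_positions_jittered <- nrow(unique(data.frame(group_jittered, rating)))

# Semi-transparent black: regions where points overlap look darker.
point_col <- adjustcolor("black", alpha.f = 0.8)

plot(group_jittered, rating, pch = 16, cex = 0.5, col = point_col,
     xlab = "Group", ylab = "Rating (1-7)", xaxt = "n")
axis(1, at = 1:4)

print(c(raw = n_positions_raw, jittered = n_positions_jittered))
```

Comparing the two counts shows how many observations were hidden behind one another before the jitter was applied. The jitter width is a judgment call: it should be small enough that points from neighboring groups never mix.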
 1 # Required packages for raincloud plots
 2 library("readr")
 3 library("gghalves")
 4
 5 load("RaincloudData.Rda")
 6
 7 # Raincloud plot with repeated measurements
 8 f1 <- RaincloudData %>% # define dataframe
 9   ggplot(aes(x = ReligiousSymbol, # define x var
10              y = Relation_to_Bill21)) + # define y var
11
12   # Add individual observations to the plot
13   geom_point(
14     aes(color = ReligiousSymbol), # we want different colors for each level of x
15     position = position_jitter(width=.1), # add jitter to the observations
16     size=.5, alpha=.8) + # set the size of each dot. alpha adds transparency
17
18   # Define color palette
19
20   scale_color_discrete_qualitative(palette="Dark 3") + # add color palette
21   scale_fill_discrete_qualitative(palette="Dark 3") + # add fill palette
22
23   # Add the mean for each level of X
24   stat_summary(fun=mean, # this indicates we want the mean statistic
25                geom="point", # we want the mean to be represented by a geom
26                shape=21, # use shape 21 (a circle with fill) for the mean
27                size=2, col="black", fill="white") + # set size, color, & fill
28
29   # Add boxplot for observations at each level
30   geom_half_boxplot(aes(fill=ReligiousSymbol), # different colors for each level of x
31                     side="r", outlier.shape=NA, center=TRUE, # styling for boxplots
32                     position = position_nudge(x=.15), # position of boxplots
33                     errorbar.draw=FALSE, width=.2) + # hide errorbar
34
35   # Add violin plots for observations at each level
36   geom_half_violin(aes(fill=ReligiousSymbol), # different colors for each level of x
37                    bw=.45, side="r", # styling for the violin plot
38                    position = position_nudge(x=.3)) + # position of violins
39
40   # Optional styling
41   coord_flip() + # flip x & y coordinates
42   xlab("Religious Symbol") + # x-axis label
43   ylab("Perceived Relation to Bill 21") + # y-axis label
44   scale_y_continuous(breaks=seq(1,7,1)) + # y-axis ticks
45   theme_minimalism() + # apply our custom minimal theme
46   theme(legend.position="none", # hide legend
47         panel.grid.major.x=element_line()) # show major grid for x axis
48 f1
49
50 # save plot
51 ggsave(f1,filename="figs/Raincloudplot.png",dpi=300,type="cairo",
52        height=14,width=18, units="cm")
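The raincloud listing maps two columns of RaincloudData: a grouping column (ReligiousSymbol) and a rating column (Relation_to_Bill21), that is, one row per observation. For readers whose data have one column per condition instead, the following sketch shows one way to reach that long layout in base R. The data are simulated for illustration and are not the authors' data; the real data set may contain further columns (for example, a participant identifier) that are not shown here.

```r
set.seed(21)  # reproducible toy data

# Hypothetical wide-format data: one column of 1-7 ratings per symbol.
# These values are simulated for illustration, not the authors' data.
n <- 30
wide <- data.frame(
  Hijab    = sample(1:7, n, replace = TRUE),
  Turban   = sample(1:7, n, replace = TRUE),
  Kippah   = sample(1:7, n, replace = TRUE),
  Crucifix = sample(1:7, n, replace = TRUE)
)

# stack() turns the wide columns into long format:
# `values` holds the ratings and `ind` holds the source column name.
long <- stack(wide)
names(long) <- c("Relation_to_Bill21", "ReligiousSymbol")

str(long)
head(long)
```

The grouping column comes back as a factor whose levels follow the original column order, which is also the order in which ggplot2 would place the groups along the axis.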
[Figure 5: raincloud plot. Vertical axis: Religious Symbol (Hijab, Turban, Kippah, Crucifix); horizontal axis: Perceived Relation to Bill 21 (1 to 7).]

Fig. 5. Raincloud plot combining a probability density function, jittered data points, a mean represented by the white circle, and a box plot. The advantage of these additional features is salient here because they reveal several important features of the data, including nonnormal distributions of observations that would be otherwise obscured by presenting only a measure of central tendency like the bar plot.
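A small numerical example makes the same point as Fig. 5 and Fig. 2: a measure of central tendency alone cannot distinguish very different distributions. The sketch below uses hypothetical toy responses on a 1-7 scale, constructed so that both conditions share the same mean; the names bimodal and unimodal are ours.

```r
# Hypothetical toy data: two conditions built to share the same mean.
bimodal  <- c(rep(1, 50), rep(7, 50))              # responses pile up at both extremes
unimodal <- c(rep(3, 25), rep(4, 50), rep(5, 25))  # responses cluster mid-scale

summary_table <- data.frame(
  condition = c("bimodal", "unimodal"),
  mean   = c(mean(bimodal), mean(unimodal)),
  sd     = c(sd(bimodal), sd(unimodal)),
  median = c(median(bimodal), median(unimodal))
)
print(summary_table)

# Tabulating the raw responses shows the difference that the means hide.
print(table(factor(bimodal, levels = 1:7)))
print(table(factor(unimodal, levels = 1:7)))
```

A bar plot of these two conditions would draw two bars of equal height. Only the spread and the tabulated responses, or a plot of the observations themselves, show that nobody in the first condition actually chose the midpoint.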

It is our opinion that these methods of data visualization fully subsume the information conveyed by the bar plot and box plot. In fact, because we do not believe there to be any information present in the bar plot not available in its modern descendants, for representing central tendencies in finalized scientific communication, we think the bar plot should be fully retired.

Cluster heat map

Some researchers may wish to show mean change over time across multiple conditions or categories or as a function of some other continuous variable. When additionally incorporating time, visualizing all the observations and distributions at each point is likely too complex and visually overwhelming. It may be more effective to focus on the information you want to convey most effectively: mean change for multiple categories over time. One visualization ideal for this situation is the cluster heat map (alternatively known as a tile map or level plot; Wilkinson & Friendly, 2009). Here, means over time are represented by color, and each rectangle represents a fixed set of time. This plot enables easy comparison both across many categories and within a category.

In the following example, we use a cluster heat map (Fig. 6) to show how explicit antigay bias changed over time across each state in the United States (with data from Ofosu et al., 2019).

load("HeatmapData.Rda")

# cluster heat map / level plot with change over time in squares
f2 <- HeatmapData %>% # define dataframe
  ggplot(aes(x=Year, y=State, z=Explicit)) + # define x, y, and z variables

  # add observations to the heat map
  geom_tile(aes(fill = Explicit)) + # we will fill the map with colors based on
                                    # values on the z variable (Explicit Bias)

  # Define color palette
  # For this example, we will use the "Inferno" palette from the colorspace package
  scale_fill_continuous_sequential(palette="Inferno", # define palette
                                   name="Explicit Bias") + # name of legend
  # optional styling
  scale_x_continuous(breaks=seq(2003,2015,3)) + # x-axis tick marks
  xlab("Year") + # x-axis label
  ylab("State") + # y-axis label
  ylim(rev(levels(HeatmapData$State))) + # order y-axis alphabetically
  theme_minimalism() + # apply our custom minimal theme
  theme(panel.grid.major.y=element_line()) # show major gridline for y axis
f2

# we can also order the y-axis another way. below is the code to sort the States
# by their mean level of prejudice (across all years).
yaxisOrder <- HeatmapData %>%
  group_by(State) %>%
  dplyr::summarize(avgExplicit = mean(Explicit)) %>%
  ungroup() %>%
  arrange(avgExplicit)
levels(yaxisOrder$State) <- yaxisOrder$State # this creates the order of the states

# then, we add the following to our figure to sort according to States'
# average explicit bias
f2 <- f2 +
  ylim(levels(yaxisOrder$State)) # you may ignore the warning that a scale for 'y' is
                                 # already present. This replaces the existing scale.
f2
# save plot
ggsave(f2,filename="figs/levelplot.png",dpi=300,type="cairo",
       height=23,width=11.5, units="cm") # adjust dims to change size of cells

In general and for various reasons, we consider the raincloud plot and cluster heat map more consistent with the philosophies laid out above for conveying central tendencies than the bar plot, box plot, violin plot, beeswarm plot, bean plot, pirate plot, lollipop plot, or ridgeline plot, although some of these might still provide some advantages in niche situations.

Proportions or Frequencies

Another common type of information presented is that of proportions or frequencies. Unlike central tendencies, there is no variance to represent around these observed counts. Accordingly, priorities of the data visualization vary. Yet like central tendencies, scientists often wish to visually compare proportions with one another. Because multiple proportions are a percentage of some greater whole, a classic way of representing these data for comparison is a pie chart. We see pie charts (or other circular visualizations) occasionally but
less frequently in scientific articles but very commonly in dashboards or scientific communication to the public. However, the pie chart and other circular visualizations have some strong limitations. Research reveals that humans are not very good at perceiving circular area and so inaccurately interpret proportions visually represented by a pie chart (Few, 2007; Stevens, 1975). This issue is compounded with multiple pie charts, when readers are comparing proportions not only within a chart but also with other charts (Tufte, 1983). Superior alternatives have been developed.

[Figure 6: cluster heat map. x-axis: Year (2003 to 2015); y-axis: State; fill legend: Explicit Bias.]

Fig. 6. Cluster heat map comparing values of multiple categories over time. Here, the mean values of each state and year are conveyed by color. Although color is not always ideal for presenting values (Cleveland & McGill, 1985), it is an effective option when there is a lot of information to be conveyed because it optimizes information richness. We have sorted this plot by mean prejudice, but it could also be sorted in other ways to enable specific comparisons that emphasize the authors’ points.

Bar plot

Superior alternatives to pie charts are variants of a bar plot. Although we have critiqued the bar plot for central tendencies, when comparing proportions with one another, a simple bar plot is superior because humans comprehend values represented by length well (Cleveland & McGill, 1985; Heer & Bostock, 2010). Which type of bar plot to choose depends on one’s goals and what one might wish to emphasize to readers (presumably mirroring your statistical comparisons). For example, if you wish to compare one proportion with another, separate columns aligned next to one another far more effectively convey the size of each proportion relative to one another. In Figure 7, we illustrate the proportion of responses on a Likert-type item scaled from 1 to 7 in which greater values represent greater levels of self-reported anti-Black bias made by participants in a single week (with data from Hehman et al., 2018). Because there is no residual, there is no information lost in a bar plot representing proportions or frequencies.

[Figure 7: bar plot. x-axis: Response (1-7 Scale); y-axis: Frequency (%).]

Fig. 7. Frequency (%) of responses on a Likert-type item scaled from 1 to 7. Observations were collected at a single time point. Note that even for very small differences, such as Response 1 and Response 2, column length allows for precise comparisons.

load("BarandLineplotData.Rda")

# bar chart comparing proportions across single category
f3 <- BarAndLineplotData %>% # define dataframe
  filter(weeks==2) %>% # filter data only from week 2
  ggplot(aes(x=response, y=percent, # define x, y variables
             fill=response)) + # the fill variable defines the color of bars
  # add bars
  geom_bar(stat = "identity", position="dodge") + # style of bars. add fill="black" to
                                                  # set the same color across all bars
  # optional styling
  # Define color palette
  # For this example, we will use the "viridis" palette from the viridis package
  scale_fill_viridis(discrete = TRUE, option="viridis") +
  xlab("Response (1-7 Scale)") + # x-axis label
  ylab("Frequency (%)") + # y-axis label
  theme_minimalism() + # apply our custom minimal theme
  theme(legend.position="none", # hide legend
        panel.grid.major.y=element_line()) # show major grid for y axis
f3
# save plot
ggsave(f3,filename="figs/barplot1.png",dpi=300,type="cairo",
       height=11,width=16, units="cm")

Stacked bar plot

For a situation akin to multiple pie charts, when not only comparisons within a cluster are important but also comparing proportions between clusters, stacked bar plots allow for efficient comparison both between bars and

within bars. Figure 8 illustrates the changing proportion of responses on the same Likert-type item scaled from 1 to 7 made by participants across 4 weeks.

Fig. 8. Frequency (%) of responses on a Likert-type item scaled from 1 to 7 in which observations collected across four time points are compared.

Line plot
Like means over time, a common situation is that researchers wish to visualize how proportions change over time or as a function of some other continuous variable. Also like means over time, this is a large amount of information that can become too complex with too many separate stacked bar plots like above. Instead, line plots are an excellent choice.
In the following example, we expand on the bar-plot examples to compare the same proportions across more than 700 time points. Figure 9 illustrates the changing proportion of responses on a Likert-type item scaled from 1 to 7 made by participants across hundreds of weeks.
Again, we consider the bar, stacked bar, and line plots more consistent with the philosophies laid out above for proportions than their alternatives, including the pie chart, spider chart, radar chart, tree map, doughnut plot, area chart, stacked area plot, or stream graph, although some of these might still provide some advantages in niche situations.

Relationships
Finally, researchers often want to visualize a relationship between two or more variables, such as a correlation or regression slope. In our subjective opinion, it is for this type of visualization that social scientists have already mostly adopted best practices. We see scatterplots regularly in our respective corner of research. Nonetheless, some additions can improve the information communicated. Like means, it is important here to represent both a central tendency of the relationship and the variance around that relationship. Typically, line graphs are used to represent relationships, and like the other types of information we are covering, they can be improved by better conveying the distribution of data.

# bar chart comparing proportions across multiple discrete categories (i.e., weeks)
# requires dplyr, ggplot2, and viridis; theme_minimalism() is the custom theme used in these examples
load("BarandLineplotData.Rda")

# optional: define custom color palette, assigning a color for each value
my.pal <- c("7" = "#403C91",
            "6" = "#8B96D7",
            "5" = "#DCEBF9",
            "4" = "#F5F5F5",
            "3" = "#F2CB89",
            "2" = "#F2B552",
            "1" = "#FFCB25")

# For this example, we want to compare data from weeks 1 to 4
# so we will create an index to define which groups to compare
index <- 1:4                                         # compare data from weeks 1 to 4

f4 <- BarAndLineplotData %>%                         # define dataframe
  filter(weeks %in% index) %>%                       # filter data by weeks variable (weeks 1-4)
  ggplot(aes(x = weeks, y = percent)) +              # define x,y variables
  geom_col(aes(fill = response), width = 0.7) +      # add bars, set width for bars
                                                     # the fill variable sets the colors
  # optional styling
  # Define color palette
  # For this example, we will use the "viridis" palette from the viridis package
  scale_fill_viridis(discrete = TRUE, option = "viridis",   # color of bars
                     name = "Response") +            # change legend title
  #scale_fill_manual("Legend", values = my.pal) +    # uncomment to use pre-defined palette
  xlab("Week") +                                     # x-axis label
  ylab("Frequency (%)") +                            # y-axis label
  theme_minimalism() +                               # apply our custom minimal theme
  theme(panel.grid.major.y = element_line())         # show major grid for y axis
f4
# save plot
ggsave(f4, filename = "figs/barplot2_stacked.png", dpi = 300, type = "cairo",
       height = 11, width = 18, units = "cm")

load("BarandLineplotData.Rda")
# line plot (lines overlaid, not stacked) with total proportion as separate line

# for this example, we will also add a line to represent the cumulative frequency
# calculate the total across all response options, per week
BarAndLineplotData <- BarAndLineplotData %>%
  group_by(weeks) %>%                                          # group by week
  dplyr::mutate(percent_TOTAL = sum(percent, na.rm = TRUE)) %>%  # get total % per week
  ungroup()

# create line plot
f5 <- BarAndLineplotData %>%                         # define dataframe
  ggplot(aes(x = weeks,                              # define x variable
             y = percent,                            # define y variable
             fill = response,                        # set grouping variable
             color = response)) +                    # set grouping variable for line colors
  geom_line(size = 0.4) +                            # add lines for each group

  # add cumulative frequency to line plot
  geom_line(aes(x = weeks, y = percent_TOTAL),       # add line for total
            color = "black", size = 1) +             # color and size for total line

  # optional styling
  # define color palette using "viridis" palette from viridis package
  scale_color_viridis(discrete = TRUE, option = "viridis",  # changes line colors
                      name = "Response") +           # legend title
  xlab("Week") +                                     # x-axis label
  ylab("Frequency (%)") +                            # y-axis label
  coord_cartesian(xlim = c(1, 769)) +                # set axis limits
  scale_x_continuous(breaks = seq(0, 769, 100)) +    # x-axis tick marks
  theme_minimalism() +                               # apply custom minimal theme
  theme(panel.grid.major.x = element_line(),         # show all major/minor grids
        panel.grid.major.y = element_line(),
        panel.grid.minor.x = element_line(),
        panel.grid.minor.y = element_line())
f5
# save plot
ggsave(f5, filename = "figs/barplot3_lineplot.png", dpi = 300, type = "cairo",
       height = 13, width = 18, units = "cm")
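The listing above starts from a data frame that already contains a `percent` column. When the starting point is raw counts instead, the proportions and the per-week total can be derived first. The following is a minimal sketch in base R under stated assumptions: the data frame `counts` and its columns `n` and `n_total` are hypothetical and simulated here, and are not part of the tutorial's data files.

```r
# Hypothetical raw counts: 3 weeks x 7 response options (simulated, for illustration only)
set.seed(1)
counts <- expand.grid(weeks = 1:3, response = factor(1:7))
counts$n <- sample(5:50, nrow(counts), replace = TRUE)

# total number of responses per week, repeated on every row of that week
counts$n_total <- ave(counts$n, counts$weeks, FUN = sum)

# proportion of each response option within its week
counts$percent <- counts$n / counts$n_total

# sanity check: the proportions sum to 1 within every week
week_sums <- tapply(counts$percent, counts$weeks, sum)
print(week_sums)
```

Keeping `n_total` alongside `percent` makes it possible to plot either the constant total of proportions or the raw number of observations per week as the additional line.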

Improved scatterplot
We consider the scatterplot to be superior to a line plot because it demonstrates both the relationship between variables and the underlying observations that drive that relationship. Including additional features, such as histograms or density plots of the distributions of each individual variable along the x- and y-axes, can further improve the scatterplot. Furthermore, 95% confidence intervals around the estimate of the slope might additionally be included to indicate certainty in the slope estimate that can be hard to glean from the data points themselves.
In Figure 10, we use an improved scatterplot to visualize the relationship between implicitly and explicitly measured anti-Black bias across hundreds of White participants (aggregated to geographic regions from Hehman et al., 2019). We include histograms in the margins of the x-axis and y-axis to show the underlying distributions of each variable.

Fig. 9. Frequency (%) of responses on a Likert-type item scaled from 1 to 7 in which observations collected between 2007 and 2019 are compared. Rather than stacking the values, the lines are plotted over one another so their respective change over time can be compared (in contrast to a stacked area plot, which can impede the accurate perception of values; Few, 2011). We included a black line representing the total per week. The data here are proportions, so this value never deviates from 1. However, when researchers are plotting raw values or frequencies over time, it might be informative to indicate how many total observations occurred per week across all the distinct categories being plotted.

Contour plot
Sometimes researchers may have so many observations that scatterplots are no longer effective. For instance, with millions of data points, using a scatterplot results in a smear in which no pattern is discernible because of overlap of the points. There are two solutions we prefer in this situation. The first is to randomly sample a percentage of the observations and represent them in the visualization as a scatterplot. However, doing so can require some additional programming. Alternatively, researchers might employ a contour plot, essentially turning the scatterplot into a heat or topographical map in which certain colors represent a higher density of observations (i.e., a modern version of sunflowers; Cleveland & McGill, 1984), which enables readers to still ascertain the underlying relationship while simultaneously seeing the distributions of the observed data across two axes.
To illustrate, in Figure 11, we use a contour plot to represent the same data presented above: the relationship between implicit and explicit anti-Black bias. Rather than the histograms we presented above, here, as a variant, we included density distributions in the margins of the x-axis and y-axis. In fact, we prefer density distributions over histograms because we believe they are more consistent with the principle of minimalism. For consistency, we have used these same data to illustrate this type of visualization. Yet it is important to emphasize we consider contour plots more appropriate when there are more observations (e.g., > 5,000) to ensure a visualization is not too information rich.
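The first option mentioned above, plotting a random subset of the observations, needs only a few lines. The following is a minimal sketch in base R; the data frame `big`, its columns `x` and `y`, and the 1% sampling fraction are assumptions chosen for illustration and do not come from the tutorial's data.

```r
# Simulated stand-in for a very large data set (hypothetical columns x and y)
set.seed(42)
n_obs <- 200000
big <- data.frame(x = rnorm(n_obs))
big$y <- 0.3 * big$x + rnorm(n_obs)

# draw 1% of the rows at random, without replacement
keep <- sample(nrow(big), size = round(0.01 * nrow(big)))
big_sample <- big[keep, ]

# compare the slope estimated from the full data and from the subsample
coef(lm(y ~ x, data = big))["x"]
coef(lm(y ~ x, data = big_sample))["x"]
```

The subsample `big_sample` could then be passed to the same kind of `ggplot()` call used for the improved scatterplot. Fixing the seed keeps the sampled figure reproducible across runs.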

load("ScatterPlotData.Rda")

# scatterplot
f6 <- ScatterplotData %>%                            # defines dataframe
  ggplot(aes(x = ExplicitBias, y = ImplicitBias)) +  # defines x and y axis variables

  # add observations to scatterplot
  geom_point(size = 1, alpha = .7, color = "darkgray") +   # define size and color
                                                     # alpha adds transparency
  # add fitted slope and 95% CIs
  geom_smooth(size = 1, method = "lm", color = "slateblue") +  # define size and color
                                                     # method = "lm" indicates linear slope

  # optional styling
  scale_x_continuous(breaks = seq(0.4, 1.6, .2)) +   # x-axis tick marks
  scale_y_continuous(breaks = seq(0.3, 1.6, .05)) +  # y-axis tick marks
  xlab("Explicit Bias") +                            # x-axis label
  ylab("Implicit Bias") +                            # y-axis label
  theme_minimalism() +                               # apply custom minimal theme
  theme(panel.grid.major.x = element_line(),         # show all major/minor grids
        panel.grid.major.y = element_line(),
        panel.grid.minor.x = element_line(),
        panel.grid.minor.y = element_line())

# add marginal histograms (requires ggExtra package)
f6 <- ggMarginal(f6, type = "histogram",             # add histograms to marginal plot
                 fill = "lightgray",                 # color of histograms
                 xparams = list(bins = 15),          # n of bins for x variable
                 yparams = list(bins = 15))          # n of bins for y variable
f6
# save plot
ggsave(f6, filename = "figs/scatterplot.png", dpi = 300, type = "cairo",
       height = 14, width = 18, units = "cm")

Fig. 10. Improved scatterplot visualizing the relationship between implicit and explicit anti-Black bias, including a 95% confidence band of the slope, with histograms of the variable on each axis in the opposite margins.
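To make explicit what a 95% confidence band around a fitted line contains, the band can also be computed by hand from a linear model. The following is a minimal sketch in base R using simulated stand-in data; the data frame `d` and its columns are hypothetical and are not the data behind Fig. 10.

```r
# Simulated stand-in data (hypothetical; for illustration only)
set.seed(7)
d <- data.frame(x = runif(150, 0.4, 1.4))
d$y <- 0.30 + 0.10 * d$x + rnorm(150, sd = 0.03)

fit <- lm(y ~ x, data = d)

# fitted values and 95% confidence limits on a grid of x values
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))
band <- cbind(grid, predict(fit, newdata = grid,
                            interval = "confidence", level = 0.95))
head(band)   # columns: x, fit, lwr, upr

# the band is narrowest near the mean of x and widens toward the extremes
width <- band$upr - band$lwr
band$x[which.min(width)]
mean(d$x)
```

The widening of the band toward the ends of the x-axis reflects greater uncertainty about the fitted line where fewer observations anchor it, which is the information readers cannot easily glean from the points alone.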

load("ScatterPlotData.Rda")

# contour plot with density plots in the margins
f9 <- ScatterplotData %>%                            # defines dataframe
  ggplot(aes(x = ExplicitBias, y = ImplicitBias)) +  # defines x and y axis variables
  geom_point(stat = "identity", size = 0.01, alpha = 0) +  # add observations (alpha = 0 keeps them invisible)
  stat_density_2d(aes(fill = after_stat(level)),     # add the main contour plot
                                                     # (ggplot2 >= 3.3.0; older versions use ..level..)
                  h = 0.1, geom = "polygon") +       # change h (bandwidth) to adjust smoothing

  # optional styling
  scale_fill_viridis(option = "viridis") +           # color using viridis palette
  stat_smooth(method = "lm", formula = y ~ x,        # add regression line
              size = 2, color = "black", se = FALSE) +  # style of regression line
  xlab("Explicit Bias") +                            # x-axis label
  ylab("Implicit Bias") +                            # y-axis label
  theme_minimalism() +                               # apply custom minimal theme
  theme(
    legend.position = c(0.87, 0.3),                  # legend position
    panel.grid.major.x = element_line(),             # show all major/minor grids
    panel.grid.major.y = element_line(),
    panel.grid.minor.x = element_line(),
    panel.grid.minor.y = element_line())

# add density plots in the margins (requires ggExtra package)
f9 <- ggMarginal(f9, type = "density")
f9
# save plot
ggsave(f9, filename = "figs/contourplot.png", dpi = 300, type = "cairo",
       height = 14, width = 18, units = "cm")
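The `h` argument in the listing above controls the bandwidth of the two-dimensional kernel density estimate, which `stat_density_2d()` computes with `MASS::kde2d()`. The following is a minimal sketch of how that choice changes the estimate, using simulated data; the vectors `x` and `y` and the two bandwidth values are assumptions chosen for illustration.

```r
library(MASS)  # provides kde2d(); part of the recommended packages shipped with R

# Simulated stand-in data (hypothetical)
set.seed(11)
x <- rnorm(2000)
y <- 0.5 * x + rnorm(2000)

# same 50 x 50 grid, two bandwidths:
# a small h preserves local bumps, a large h smooths them away
dens_narrow <- kde2d(x, y, h = 0.2, n = 50)
dens_wide   <- kde2d(x, y, h = 2.0, n = 50)

# the highest estimated density is lower when the bandwidth is larger
max(dens_narrow$z)
max(dens_wide$z)
```

A bandwidth that is too small produces many small, noisy contours, whereas one that is too large can blur the relationship, so it may be worth trying a few values of `h` before settling on a figure.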

Spaghetti plot
Finally, modeling relationships in clustered data in multilevel frameworks is becoming increasingly commonplace. Although showing the grand averaged slope across all clusters is important, it is valuable to show the relationship within each cluster of the multilevel model varying around the grand slope. A visualization that effectively captures this complexity in a single figure is the spaghetti plot (Fig. 12). We do not recommend including 95% confidence intervals, grid lines, or underlying data points in this plot (as in Fig. 2) because it can become too informationally rich and confusing, depending on the number of clusters. Here, we visualize how attractiveness and intelligence ratings of faces correlate within participants (with data from Xie et al., 2019).

Fig. 11. Contour plot visualizing the relationship between implicit and explicit anti-Black bias, with probability density functions in the margins. Areas with higher values on the legend indicate higher density of observations.

load("SpaghettiPlotData.Rda")

# Spaghetti plot for random slopes
f7 <- SpaghettiplotData %>%                          # define dataframe
  ggplot(aes(x = attractive,                         # define x variable
             y = intelligent)) +                     # define y variable

  # create random slopes, where each line represents a slope for each cluster
  # in this example, each cluster is a Participant
  geom_line(aes(group = ParticipantID, color = ParticipantID),  # set clustering variable
            stat = "smooth", method = "lm",          # define the line as a linear relationship
            color = "gray", size = 0.8, alpha = 0.5) +  # define style of lines
                                                     # (the fixed gray overrides the color mapping)

  # create a grand slope across all clusters
  stat_smooth(method = "lm", formula = y ~ x,        # grand average slope (linear)
              color = "coral", size = 1.5, se = FALSE) +  # define color, size of average slope line

  # optional styling
  #scale_color_viridis(discrete = TRUE) +            # different colors for each cluster
                                                     # (also remove color = "gray" above)
  coord_cartesian(ylim = c(1, 7), xlim = c(1, 7)) +  # set axis limits
  xlab("Attractiveness Ratings") +                   # axis labels
  ylab("Intelligence Ratings") +
  theme_minimalism() +                               # apply custom minimal theme
  theme(legend.position = "none")                    # hide legend
f7
# save plot
ggsave(f7, filename = "figs/spaghettiplot.png", dpi = 300, type = "cairo",
       height = 14, width = 18, units = "cm")
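The thin lines in the listing above come from fitting a separate linear regression within each cluster, and the thick line from one regression fitted to all observations together. The following is a minimal sketch of the same two computations in base R on simulated data; the data frame `d`, the `id` column, and all numeric settings are hypothetical. Note that these per-cluster lines are ordinary least squares fits, which can differ from the cluster-specific estimates produced by a fitted multilevel model.

```r
# Simulated clustered data: 20 hypothetical participants, 30 ratings each
set.seed(3)
n_id <- 20
n_obs <- 30
d <- data.frame(id = rep(seq_len(n_id), each = n_obs),
                x  = runif(n_id * n_obs, 1, 7))
slope_id <- rnorm(n_id, mean = 0.5, sd = 0.2)   # each participant's true slope
d$y <- 2 + slope_id[d$id] * d$x + rnorm(nrow(d), sd = 0.5)

# one least squares fit per participant (one thin line each)
per_id <- sapply(split(d, d$id), function(g) coef(lm(y ~ x, data = g)))

# one pooled fit across all observations (the thick line)
pooled <- coef(lm(y ~ x, data = d))

range(per_id["x", ])   # spread of the within-participant slopes
pooled["x"]            # pooled slope
```

Inspecting the spread of the per-cluster slopes in this way can help decide whether a spaghetti plot will be readable or whether the number of clusters should be reduced for display.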

Recommendations for Further Reading
Although we, the authors, regularly read, think about, and create data visualizations for our research, we are not visualization professionals. Here, we have attempted to distill and present what we consider the information most applicable and useful to other social scientists from people with greater expertise than we. However, we encourage interested readers to seek out the primary sources and modern practitioners and have included a section, For Further Reading, before the Reference section as a starting point.

Summary
Visualizing one’s data effectively to convey information is a science unto itself with research-informed best and worst practices. Yet this is an area in which social scientists receive little training. Here, we aimed to essentially distill advice and information scattered across data-visualization blogs, books, and Internet discussion threads into recommendations viable for individuals communicating their data and results to other consumers of science.
It is not coincidental that our recommendations often hover around the most simple: variants of the bar plot, line plot, or scatterplot. These tried-and-true methods of visualization have persisted across decades because they are effective and clear. Although new visualizations are continually being developed (e.g., beeswarm plot, stream graph), these sometimes have a goal of aestheticism and novelty involved, not clear scientific communication. Although some might envision specific scenarios in which other visualizations are superior, we believe that the recommendations and code we present above will best serve most social scientists in most common situations. We believe it is most important for researchers to keep the guiding philosophies in mind when making their unavoidably subjective decisions about which visualization might be most effective to convey understanding of their data or critical hypothesis test. We hope this tutorial aids in this endeavor.

Fig. 12. Spaghetti plot visualizing the relationship between ratings of attractiveness and ratings of intelligence made by the same observers evaluating various faces. Thicker coral line represents the grand intercept and slope across all observers. Because of the complexity of the figure, we removed features we would normally include, such as gridlines, observations, or confidence intervals of slopes. In addition, because of the multilevel nature of the data, histograms or density plots on the margins are also inappropriate (because they do not accommodate the clustering within the data).

Transparency
Action Editor: Julia Strand
Editor: Daniel J. Simons
Author Contributions
Conceptualization: E. Hehman. Data curation: all authors. Visualization: all authors. Writing–original draft: E. Hehman. Writing–review and editing: all authors. Both authors approved the final manuscript for submission.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Funding
This research was supported by the Fonds de Recherche (FRQ-SC NP-267701) to E. Hehman.
Open Practices
Open Data: https://osf.io/kx4us/
Open Materials: https://osf.io/kx4us/
Preregistration: not applicable
All data and materials have been made publicly available via OSF and can be accessed at https://osf.io/kx4us/. This article has received badges for Open Data and Open Materials. More information about the Open Practices badges can be found at http://www.psychologicalscience.org/publications/badges.

ORCID iDs
Eric Hehman https://orcid.org/0000-0003-2227-1517
Sally Y. Xie https://orcid.org/0000-0002-1200-9470

Acknowledgments
We thank Neil Hester, Eugene Ofosu, Jennifer Suliteanu, and Chevieve Heri for feedback on an early draft.

Supplemental Material
Additional supporting information can be found at http://journals.sagepub.com/doi/suppl/10.1177/25152459211045334

Recommended Reading
Ismay, C., & Kim, A. Y. (2021). Modern dive: Statistical Inference via Data Science. https://moderndive.com/index.html
A freely and fully available online introduction to R and the tidyverse
Wickham, H., & Grolemund, G. (2017). R for data science. O’Reilly Media. https://r4ds.had.co.nz/
A freely and fully available online introduction to programming in R
Tutorials Point. Learn ggplot2. https://www.tutorialspoint.com/ggplot2/ggplot2_introduction.htm
A freely and fully available online introduction to ggplot2
Wilke, C. O. (2019). Fundamentals of data visualization: A primer on making informative and compelling figures. O’Reilly Media.
An excellent modern resource, with some portions available online, including some code for R.
Tufte, E. R. (1983). The visual display of quantitative information. Graphics Press.
The classic text on data visualization by an initial pioneer in the area
https://www.perceptualedge.com/
A website and blog maintained by data visualization expert Stephen Few, with numerous entries spanning back to 2006
Koponen, J., & Hildén, J. (2019). Data visualization handbook. Aalto korkeakoulusäätiö.
A practical guide to data visualization. For example, see here for comparisons of differential effectiveness of ways of conveying different types of values (e.g., shapes, color, line length, position, etc.): “Visual variables,” https://datavizhandbook.info/.

References
Allen, M., Poggiali, D., Whitaker, K., Marshall, T. R., & Kievit, R. A. (2019). Raincloud plots: A multi-platform tool for robust data visualization. Wellcome Open Research, 4, Article 63. https://doi.org/10.12688/wellcomeopenres.15191.1
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17–21.
Attali, D., & Baker, C. (2019). ggExtra: Add marginal histograms to “ggplot2”, and more “ggplot2” enhancements (Version 0.9). R package version.
Brewer, C. A., Hatchard, G. W., & Harrower, M. A. (2003). ColorBrewer in print: A catalog of color schemes for maps. Cartography and Geographic Information Science, 30(1), 5–32. https://doi.org/10.1559/152304003100010929

Cleveland, W. S., & McGill, R. (1984). The many faces of a scatterplot. Journal of the American Statistical Association, 79(388), 807–822. https://doi.org/10.1080/01621459.1984.10477098
Cleveland, W. S., & McGill, R. (1985). Graphical perception and graphical methods for analyzing scientific data. Science, 229(4716), 828–833. https://doi.org/10.1126/science.229.4716.828
Crameri, F. (2018). Scientific colour maps. Zenodo. https://doi.org/10.5281/zenodo.1243909
Few, S. (2007). Save the pies for dessert. https://www.perceptualedge.com/articles/visual_business_intelligence/save_the_pies_for_dessert.pdf
Few, S. (2011). Quantitative displays for combining time-series and part-to-whole relationships. https://www.perceptualedge.com/articles/visual_business_intelligence/displays_for_combining_time-series_and_part-to-whole.pdf
Garnier, S., Ross, N., Rudis, B., Sciaini, M., & Scherer, C. (2018). viridis: Default color maps from “matplotlib” (Version 0.5.1). R package version.
Heer, J., & Bostock, M. (2010). Crowdsourcing graphical perception: Using Mechanical Turk to assess visualization design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 203–212).
Hehman, E., Calanchini, J., Flake, J. K., & Leitner, J. B. (2019). Establishing construct validity evidence for regional measures of explicit and implicit racial bias. Journal of Experimental Psychology: General, 148(6), 1022–1040. https://doi.org/10.1037/xge0000623
Hehman, E., Flake, J. K., & Calanchini, J. (2018). Disproportionate use of lethal force in policing is associated with regional racial biases of residents. Social Psychological and Personality Science, 9(4), 393–401. https://doi.org/10.1177/1948550617711229
Helske, J., Helske, S., Cooper, M., Ynnerman, A., & Besançon, L. (2021). Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect. ArXiv. https://doi.org/10.1109/TVCG.2021.3073466
Jones, B. C., DeBruine, L. M., Flake, J. K., Liuzza, M. T., Antfolk, J., Arinze, N. C., Ndukaihe, I. L. G., Bloxsom, N. G., Lewis, S. C., Foroni, F., Willis, M. L., Cubillas, C. P., Vadillo, M. A., Turiegano, E., Gilead, M., Simchon, A., Saribay, S. A., Owsley, N. C., Jang, C., . . . Coles, N. A. (2021). To which world regions does the valence–dominance model of social perception apply? Nature Human Behaviour, 5(1), 159–169. https://doi.org/10.1038/s41562-020-01007-2
Neitz, J., & Neitz, M. (2011). The genetics of normal and defective color vision. Vision Research, 51(7), 633–651. https://doi.org/10.1016/j.visres.2010.12.002
Nuñez, J. R., Anderton, C. R., & Renslow, R. S. (2018). Optimizing colormaps with consideration for color vision deficiency to enable accurate interpretation of scientific data. PLOS ONE, 13(7), Article e0199239. https://doi.org/10.1371/journal.pone.0199239
Ofosu, E. K., Chambers, M. K., Chen, J. M., & Hehman, E. (2019). Same-sex marriage legalization associated with reduced implicit and explicit antigay bias. Proceedings of the National Academy of Sciences, USA, 116(18), 8846–8851. https://doi.org/10.1073/pnas.1806000116
Spear, M. E. (1952). Charting statistics. McGraw-Hill.
Stevens, S. S. (1975). Psychophysics: Introduction to its perceptual, neural, and social prospects. John Wiley & Sons.
Tiedemann, F. (2020). gghalves: Compose half-half plots using your favourite geoms (Version 0.1.1). R package version.
Tufte, E. R. (1983). The visual display of quantitative information. Graphics Press.
Tukey, J. W. (1977). Exploratory data analysis (Vol. 2). Addison-Wesley.
Weissgerber, T. L., Milic, N. M., Winham, S. J., & Garovic, V. D. (2015). Beyond bar and line graphs: Time for a new data presentation paradigm. PLOS Biology, 13(4), Article e1002128. https://doi.org/10.1371/journal.pbio.1002128
Wickham, H. (2011). Ggplot2. Wiley Interdisciplinary Reviews: Computational Statistics, 3(2), 180–185.
Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A Grammar of Data Manipulation (Version 1.0.5). R package version.
Wilke, C. O. (2019). Fundamentals of data visualization: A primer on making informative and compelling figures. O’Reilly Media.
Wilkinson, L., & Friendly, M. (2009). The history of the cluster heat map. The American Statistician, 63(2), 179–184. https://doi.org/10.1198/tas.2009.0033
Xie, S. Y., Flake, J. K., & Hehman, E. (2019). Perceiver and target characteristics contribute to impression formation differently across race and gender. Journal of Personality and Social Psychology, 117(2), 364–385. https://doi.org/10.1037/pspi0000160
Zeileis, A., Fisher, J. C., Hornik, K., Ihaka, R., McWhite, C. D., Murrell, P., Stauffer, R., & Wilke, C. O. (2019). Colorspace: A toolbox for manipulating and assessing colors and palettes. ArXiv. http://arxiv.org/abs/1903.06490

No ratings yet
c3200 02
8 pages
Share Data Through The Art of Visualization
No ratings yet
Share Data Through The Art of Visualization
63 pages
Unit III Business Analytics
No ratings yet
Unit III Business Analytics
8 pages
CSC 428 - 4
No ratings yet
CSC 428 - 4
12 pages
Ds 1603 - Data Visualization Unit I Introduction
No ratings yet
Ds 1603 - Data Visualization Unit I Introduction
17 pages
Reading and Writing Set 2 Assgn
No ratings yet
Reading and Writing Set 2 Assgn
16 pages
Guide To Data-Viz
No ratings yet
Guide To Data-Viz
16 pages
Data Visualization Notes
No ratings yet
Data Visualization Notes
4 pages
Python Brochure
No ratings yet
Python Brochure
8 pages
Getting Data Science Done: Managing Projects From Ideas to Products
From Everand
Getting Data Science Done: Managing Projects From Ideas to Products
John Hawkins
No ratings yet
Salesforce Certified Tableau Consultant Dumps
No ratings yet
Salesforce Certified Tableau Consultant Dumps
11 pages
Data Visulization FrescoPlay MFDM
No ratings yet
Data Visulization FrescoPlay MFDM
2 pages


Received 4/20/21; Revision accepted 8/11/21


Visualization: all authors. Writing–original draft: E. Hehman.
