PDV 9
9.1 Exploration Visualizations
  Scatterplots
  Rug Charts and Histograms
  Two-Way Tables
9.2 Presentation Visualizations
  Text Blocks
  Tables
  Line Graphs
  Bar Charts
9.3 The Rest of The Landscape
  Maps and Heat Maps
  Bubble Charts
  Small Multiples
  Area Charts and Treemaps
  Text Visualizations
  Parallel Coordinates
  Trees and Networks
  Animated Visualizations
9.4 Misc. & Charts to Avoid
  Chernoff Faces
  Alluvial Diagrams
  Charts to Avoid

There are many types of data visualizations and many variations on each type. These data visualizations can collectively be thought of as tools in the data visualizer's toolbox, and good data visualizers will be as familiar with them as a master woodworker is with the tools of their trade.

The only way to get there is by practicing with each type and variation of visualization, using either teaching datasets or real-world use cases.

It also helps to keep the toolbox well organized, so that it can direct the visualizer to the right general type of visualization.

As with organizing a toolbox, there is no one right way to group data visualizations together, and to some extent it is a matter of personal preference. However, some experts in this field have made efforts on this front, and some commonalities emerge.

As seen in Figure 9.1, one approach to organizing data visualizations is to consider which ones best highlight:

a relationship – show a connection or correlation between two or more variables,1 such as the impact of an aging population on health care;

a comparison – set some variables apart from others, and display how those variables interact, such as the number of fans attending hockey games for different teams in a season;

a composition – collect different types of information that make up a whole and display them together, such as the various search terms that visitors used to land on your site, or how many visitors came from various sources (links, search engines, or direct traffic), and

a distribution – lay out a collection of related or unrelated information to see how it correlates (if at all), and to understand if there's any interaction between the variables, such as the number of bugs reported during each month after a new software release.

1: Also called dimensions, axes, factors, etc.
However, this is not the only way to think about data visualizations. Some
practitioners have broken down these categories further, as shown in Figure
9.2. As yet another alternative, it’s possible to consider which combinations
of the 5W questions (who, what, when, where, and how/why) certain data
visualizations are best suited to display, as was shown in Figure 2.14.
178 9 Visualization Toolbox
Figure 9.1: A classification of chart types, based on visualization objectives [J. Camoes].
Figure 9.2: An alternative way to group chart types [D. Hull, A. Abela].
9.1 Exploration Visualizations 179
The take-home point is that, while data visualization types can be grouped in a number of ways, it is important to hone our own sense of which visualizations are most appropriate for particular situations. That sense may be informed by previously developed schemas, such as those of Figures 9.1 and 9.2.
As we have already discussed in Chapter 2, regardless of our preferred organizing principles, there are some data visualizations that are true workhorses – they are usable in some way with almost any dataset, are familiar to most laypeople, and are particularly useful for exploring or presenting data. Other visualization types have more situation-specific uses, and may be difficult to use or ineffective in other situations – practice and discernment are required to use them skillfully and appropriately.
Different people may draw up different lists of workhorse visualizations. For instance, we could trot out the following “old-faithfuls”:2
2: Another selection has already been reviewed in Chapter 2.
Data Exploration
Scatterplots
Rug charts
Histograms
Two-way tables
Data Presentation
Text blocks
Tables
Line graphs
Bar charts
These categorizations – exploration vs. presentation – are not intended to be hard and fast: at times, we might use a bar chart for exploration and a scatterplot to present a key finding, say. Everyone will develop their own approach for the use of these visualizations, but we provide some comments and pointers as guidelines.3

3: As a rule of thumb, these lists provide a good starting point when deciding which tools to pull from the toolbox first.
Scatterplots
But caution must be exercised when using scatterplots for data presentation. Because they can represent all of the points in a dataset, there is a risk of clutter (and of overwhelming the consumer); the message can easily get lost. Consequently, we suggest using scatterplots for communication only when the pattern is naturally clear and relevant to the broader context of the story being presented.
Bubble charts, which are a variation on scatterplots, can communicate
relationships between multiple dimensions and provide a powerful tool to
render multivariate relationships. We will discuss them further in a coming
section of this chapter.
Scatterplots (and bubble charts) are most commonly used for quantitative
data, but it is possible to have one or both of the axes represent a qualitative
variable, using an approach similar to that taken on the horizontal axis
(x-axis) of bar charts. This can lend itself to misinterpretation, however.
Lastly, it is not uncommon for scatterplots to be overlaid with a trend line or curve, such as in the data storytelling tropes of Section 8.3 (Evolving a Storytelling Chart).5

5: Creating trend lines involves calculations over the data points represented; this can be automated by the software used to render the chart, as we will see in the next few chapters.

TL;DR Summary and Comments:
plots show the relationship between 2 variables (scatterplot) or 3 variables (bubble chart)
we can use average lines (or similar curves) to provide context
consider using groupings to add clarity (e.g., colour gradients)
colour and geometry allow us to plot (at least) 2 extra variables on a 2D scatterplot
the data may need to be re-scaled or binned
a movie could be used to visualize an additional ordinal variable
text can also be added to visualize an additional categorical variable
works best when the chart is not too cluttered
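The calculation behind an overlaid trend line can be sketched as an ordinary least-squares fit. The version below is written in plain Python. The function name `least_squares_line` and the sample data are hypothetical, and in practice the charting software would usually compute and draw the line itself.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: y is roughly 2x + 1, with some noise.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = least_squares_line(xs, ys)

# Endpoints of the trend segment to draw over the scatterplot.
trend = [(x, slope * x + intercept) for x in (min(xs), max(xs))]
```

The `trend` endpoints can be handed to whichever plotting library renders the points. A curve, such as a moving average, would replace the fit while keeping the same overlay idea.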
Examples of scatterplots (and bubble charts) are provided in Figures 9.3 and 2.7.
Figure 9.3: Scatterplots and bubble charts: personal collection (top row); Medium (middle left); Towards Data Science (middle right); Ottawa Senators player usage, 2016-2017 (Hockey Abstract, bottom row).
Two-Way Tables
But this only presents part of the picture. The 1-way tables provide a univariate summary:
while the 3-way table speed × season × size provides the full picture; it is not, however, the only way to represent it:9
9: What would the other combinations (size × speed × season, etc.) look like?
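Here is a minimal sketch of how 1-, 2- and 3-way tables are simply counts over combinations of categorical fields. It uses `collections.Counter` on a small hypothetical set of (speed, season, size) records. The field values and the helper `n_way_table` are illustrative assumptions, not the dataset behind the tables above.

```python
from collections import Counter

# Hypothetical observations, one tuple (speed, season, size) per record.
FIELDS = ("speed", "season", "size")
records = [
    ("fast", "summer", "small"),
    ("fast", "winter", "large"),
    ("slow", "summer", "small"),
    ("slow", "summer", "large"),
    ("fast", "summer", "small"),
    ("slow", "winter", "small"),
]


def n_way_table(rows, *fields):
    """Count the records for each combination of the requested fields."""
    idx = [FIELDS.index(f) for f in fields]
    return Counter(tuple(row[i] for i in idx) for row in rows)


one_way = n_way_table(records, "speed")                      # univariate summary
two_way = n_way_table(records, "speed", "season")            # speed x season
three_way = n_way_table(records, "speed", "season", "size")  # the full picture

# Lay out the two-way table as a grid: rows = speed, columns = season.
seasons = sorted({r[1] for r in records})
print("speed".ljust(8) + "".join(s.ljust(8) for s in seasons))
for speed in sorted({r[0] for r in records}):
    print(speed.ljust(8) + "".join(str(two_way[(speed, s)]).ljust(8) for s in seasons))
```

Reordering the field names, e.g. `n_way_table(records, "size", "speed", "season")`, gives the other combinations the sidenote asks about. The counts are the same; only the layout that the reader scans changes.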
Text Blocks
The simplest of the presentation visualizations, the text block, may not even seem like it belongs in a list of data visualizations, given its lack of visual elements outside of the written word. However, when it is treated as a graphical element, where the focus is on a fact containing one or two numbers at most, it is excellent at “setting the scene”.
This is particularly useful in a dashboard or in a report context, where text
blocks can be used to draw the focus to an area of the report which contains
a more detailed breakdown or analysis of the data in question.
Tables
Tables are another text-heavy visualization, one which interacts with our verbal system: we read them. They are useful for comparing values across variables.
One complicated aspect of tables is that the audience has considerable leeway in how they elect to read them: they may focus on the relationship between numbers across each row, or down each column. Furthermore, if the data relates to specific individuals (or units) that the audience is likely to be most interested in, they will certainly look for and focus on those rows, potentially to the exclusion of everything else.10

10: Designers may need to use the Gestalt guidelines of Section 4.2 to draw the audience's eye to another location in the table.

Importantly, table design should blend into the background.11 It is the data that should stand out, not the borders – if you must display large, dense tables,12 consider alternating the row colour between white and a very light shade to help the eye scan across the rows and separate one row from another.

11: Although it seems to go against easily-accessible MS PowerPoint templates...
12: Must you, really?
The table heat map provides a variant on the table, where cells contain a colour as well as a number, and in which the colour is leveraged to convey magnitude by mapping the colour hue and saturation to the cell value.13 Eventually, the numerical values or cell text labels may be removed without altering the message, leading to a more holistic reading of the data visualization.

13: A single colour saturation with a legend (white = low, blue = high) is preferable to colour differentiation (rainbow scale).
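The single-saturation scale recommended in sidenote 13 amounts to linear interpolation between white and one hue. Below is a sketch in plain Python. The function `sequential_colour`, the pure-blue endpoint `(0, 0, 255)` and the sample table are assumptions chosen for illustration.

```python
def sequential_colour(value, vmin, vmax, high=(0, 0, 255)):
    """Map value in [vmin, vmax] to a hex colour on a white -> `high` gradient."""
    if vmax <= vmin:
        raise ValueError("vmax must exceed vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values to the ends of the scale
    white = (255, 255, 255)
    rgb = [round(w + (h - w) * t) for w, h in zip(white, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical table: each cell gets a fill colour alongside (or instead of) its number.
table = [[2, 5, 9], [0, 7, 10]]
lo = min(min(row) for row in table)
hi = max(max(row) for row in table)
coloured = [[(v, sequential_colour(v, lo, hi)) for v in row] for row in table]
```

Fixing `vmin` and `vmax` to a known range, rather than the observed minimum and maximum, keeps colours comparable across several tables.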
Figure 9.8: From table (left) to heat map table (middle) to holistic heat map table (right).
Line Graphs
Figure 9.9: Various line graphs and sparklines, from personal files.
Bar Charts
One of us16 thinks that the bar chart is the workhorse of all workhorses among data visualizations. It is almost immediately familiar to most people and, if used in an expected fashion, readily interpretable as well. Although some people may hesitate to use a bar chart simply because it is so familiar and frequently used, we believe that this is a strength, not a weakness.17

16: *cough* Jen *cough*
17: If novelty is desired, there are many variations on the bar chart, both with respect to what types of data are incorporated and aesthetic and presentation choices, that can add interest and nuance to the basic bar chart.

The basic bar chart represents a single numeric variable broken down by the values of a categorical variable.18

18: See Section 2.2 for more information.

These charts are quite versatile and useful. Apart from very rare instances, they should always have a zero baseline. When constructing them, we recommend using either the graph axis or the data labels: the axis for broad statements, data labels for detailed information.19

19: Horizontal charts are apparently easier to read, as we have discussed in Section 4 [38].

From a design point of view, the basic bar chart can be transformed in a number of ways which impact how viewers interpret and prioritize different aspects of the visualization.20

20: Variations include: pie charts, gauge charts, funnel charts, lollipop charts, waterfalls, stacked bar charts, cluster bar charts, 100% bar charts, percentage bar charts, etc.

Funnel charts are typically used to represent decreasing proportions amounting to a 100% total.21 These can be very useful to help the audience quickly prioritize items without having to actively filter the data.

21: Although that is not always the case.
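Funnel values are usually expressed relative to the top stage, which plays the role of the 100% total. Stage-to-stage retention is also often shown. The sketch below assumes that convention. The helper `funnel_percentages` and the website-funnel numbers are hypothetical.

```python
def funnel_percentages(stages):
    """Express each stage count as % of the top stage and as % of the previous stage."""
    top = stages[0][1]
    if top <= 0:
        raise ValueError("the top stage must be positive")
    result = []
    prev = top
    for name, count in stages:
        pct_of_top = 100 * count / top
        pct_of_prev = 100 * count / prev if prev else 0.0  # stage-to-stage retention
        result.append((name, round(pct_of_top, 1), round(pct_of_prev, 1)))
        prev = count
    return result


# Hypothetical website funnel.
stages = [("visits", 2000), ("sign-ups", 500), ("trials", 200), ("purchases", 50)]
for name, of_top, of_prev in funnel_percentages(stages):
    print(f"{name:<10} {of_top:>6}% of top  {of_prev:>6}% of previous")
```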
Gauge charts are often used as a dashboard component (with or without a needle) – they typically display single-value measures on the way to some goal or key performance indicator (KPI), in a manner that can quickly be scanned and understood. While gauge charts are particularly useful to show progress, they may ultimately prove to be a management fad.22

22: Not that there is anything wrong with that, of course.
Stacked bar charts are designed for comparing totals, but can quickly become overwhelming. They are hard to sort and order. Filtering is complicated in dashboard applications like Power BI because it is unclear how the chart should respond when a filter is applied.
100% bar charts work well for visualizing portions of a whole on a scale from negative to positive. They have a consistent baseline at each of the extremities (either left/right, or top/bottom), making it easy to compare the bars. The issue, however, is that there is no relative measure of the magnitude of the data. As with other bar charts, research shows that horizontal bars are easier to process than vertical ones.
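To see why 100% bar charts lose magnitude, look at the normalization step. Each bar's segments are rescaled to sum to 100, so the raw total disappears from the chart. The sketch below keeps it alongside for reference. The helper `to_percent_bars` and the survey counts are hypothetical.

```python
def to_percent_bars(bars):
    """Rescale each bar's segments to sum to 100, keeping the raw total separately."""
    out = {}
    for label, segments in bars.items():
        total = sum(segments.values())
        if total == 0:
            raise ValueError(f"bar {label!r} has no data")
        shares = {k: 100 * v / total for k, v in segments.items()}
        out[label] = (shares, total)  # the total is what a 100% chart hides
    return out


# Hypothetical survey responses per department.
bars = {
    "Sales": {"disagree": 10, "neutral": 10, "agree": 30},
    "IT": {"disagree": 2, "neutral": 3, "agree": 5},
}
normalized = to_percent_bars(bars)
```

Both departments show 20% "disagree", even though Sales has five times as many respondents in that category. Only the separately kept totals reveal the difference.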
Waterfall charts show how an initial value increases or decreases through a series of intermediate values; different colours should be used to represent increases and decreases. One drawback is that it is difficult to remove chart elements without removing context.23 Note that large increases or decreases may look odd (as in Figure 9.11).

23: In other words, it is difficult to declutter waterfall charts.
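Each intermediate bar in a waterfall "floats": it starts at the running level before the change and spans the size of the change. The sketch below computes the bottom, height and increase/decrease kind for each bar, which a plotting library could then draw as bars with an offset base. The helper `waterfall_bars` and the budget figures are hypothetical.

```python
def waterfall_bars(start, changes):
    """Return (label, bottom, height, kind) for each bar of a waterfall chart."""
    bars = [("start", 0, start, "total")]
    level = start
    for label, delta in changes:
        # An increase floats on the current level; a decrease hangs down from it.
        bottom = level if delta >= 0 else level + delta
        kind = "increase" if delta >= 0 else "decrease"  # drives the bar colour
        bars.append((label, bottom, abs(delta), kind))
        level += delta
    bars.append(("end", 0, level, "total"))
    return bars


# Hypothetical budget movements.
bars = waterfall_bars(100, [("revenue", 40), ("costs", -25), ("tax", -10)])
```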
9.2 Presentation Visualizations 187
Figure 9.11: Bar charts and variants: basic (top row); stacked bar charts (2nd row); 100% bar charts (3rd row); waterfall charts (bottom row).
Figure 9.12: Maps, maps, maps: a sprinkle of maps and distortions – Canadian airports (top left, personal file); population cartogram (top right,
Paul Breding); global warming culprits, by population and by size (bottom row, New Scientist).
Heat maps are ideal when we want to look at the relationship between 3 or 4 variables. If one of these represents a percentage or a value within a set range, it can be used to fix the colour scale, for comparison purposes. The other variables are then used to locate and size markers on the display.

If the axes variables are continuous, it could still be preferable to bin them: this reduces the number of observations required for the chart to be useful. It is typically easier to read such charts if colours are selected along natural colour gradients, such as White → Blue or Red → Black.24 When the background canvas is a non-distorted geographical map, heat maps are known as choropleths (see Figure 9.13).

24: More sophisticated gradients can be used (Red → Yellow → Green, say), but those are less than ideal from a Gestalt perspective, or if some viewers are colour blind – this is another clear case where “less is more”.
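Binning the continuous axes of a heat map can be sketched as follows. Each (x, y) pair is placed in an equal-width grid cell, and a third variable z is averaged within each cell; those cell means would then be coloured along a natural gradient. The helpers `bin_index` and `heat_grid`, the equal-width scheme and the sample points are all assumptions for illustration.

```python
from collections import defaultdict


def bin_index(value, lo, hi, n_bins):
    """Return the index (0..n_bins-1) of the equal-width bin containing value."""
    if not lo <= value <= hi:
        raise ValueError("value outside the binning range")
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(i, n_bins - 1)  # a value equal to hi goes in the last bin


def heat_grid(points, x_range, y_range, n_bins):
    """Average a third variable z over a grid of binned (x, y) cells."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z in points:
        cell = (bin_index(x, *x_range, n_bins), bin_index(y, *y_range, n_bins))
        sums[cell] += z
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical (x, y, z) observations on [0, 4] x [0, 4], with a 4 x 4 grid.
points = [(0.5, 0.5, 10), (1.5, 0.2, 20), (1.8, 0.9, 40), (3.9, 3.9, 5)]
grid = heat_grid(points, (0, 4), (0, 4), 4)
```

Fewer, wider bins need fewer observations per cell to give a stable mean. That is the trade-off the paragraph above alludes to.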
9.3 The Rest of The Landscape 189
Figure 9.13: Heat maps and choropleths: The Horizon or Pedestrian Risk (J. Nelson, IDV Solutions, top left); basketball shooting charts (NBAsavant.com, top right); Election choropleth (A.E. McCann, bottom left); Canadian population choropleth (Statistics Canada, bottom middle); US elevation choropleth (author unknown, bottom right).
Bubble Charts
Unlike scatterplots, which have already been discussed both in Chapter 2 and
earlier in this chapter, bubble charts can serve to illustrate the interactions
between multiple variables (when used correctly). Importantly, however, they
are usually most useful when the relationships between the variables in
question are strong, resulting in clear patterns in the chart. We must also
be careful when choosing how to represent the many variables involved –
there are likely more bad options than good choices on this front, and so
experimentation is likely to be required; examples can be seen in Figures 1.4
and 9.3.
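A classic bad option when representing a variable by bubble size is to scale the radius linearly with the data. Viewers judge bubbles by area, so a value twice as large then looks four times as big. A minimal sketch of area-proportional sizing follows. The helper `bubble_radius` and the 30-unit maximum radius are illustrative assumptions.

```python
import math


def bubble_radius(value, max_value, max_radius=30.0):
    """Radius such that bubble *area*, not radius, is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("values must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)
```

Some libraries already take an area. In matplotlib, for instance, the `s` argument of `scatter` is a marker size in points squared, so values can be passed in proportionally without taking a square root first.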
Small Multiples
Figure 9.14: Small multiples: US electoral results choropleths, by year (author unknown, top); debt line graphs, by G7 country (Pew Research
Center, bottom).
Text Visualizations
Not to be confused with text blocks, text visualizations use text attributes
(such as size and colour) to represent some other variable associated with the
words. For maximal impact, font size may be a function of frequency. These
visualizations are typically used for univariate categorical data, but small
multiples, cloud shape, word placement, colour, and hue could be used to
integrate more variates.
In many implementations, the word placement and colour choice algorithms
are “hidden” from the users. As an example use case, text visualizations can
be used to answer authorship questions.
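The one mapping that is usually not hidden is font size as a function of frequency. The sketch below covers only that step; placement and colour algorithms are left out. It scales size linearly between a minimum and a maximum point size. The helper `font_sizes`, the tokenizing regular expression and the default sizes are assumptions; real tools may use other scalings, such as square root or logarithmic.

```python
import re
from collections import Counter


def font_sizes(text, min_pt=10, max_pt=48, top_n=20, stopwords=frozenset()):
    """Map each of the top_n most frequent words to a font size, linear in frequency."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    hi = counts[0][1]
    lo = counts[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {w: min_pt + (max_pt - min_pt) * (c - lo) / span for w, c in counts}


sizes = font_sizes("cat dog cat bird cat dog", min_pt=10, max_pt=40)
```

Comparing such frequency profiles between two authors' texts is one simple way to approach the authorship questions mentioned above.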
Figure 9.17: Text visualizations: word cloud (most pirated artists, 2007-2010, top row); Ottawa Senators most frequently named players in AP
articles, 2016-2017 (middle row, first two charts); comparison of word usage in Shakespeare and Marlowe plays (middle row, last two charts);
various text visualizations for Shakespearean plays (bottom row).
Although parallel coordinate charts, which stack and connect multiple rug
charts to show relationships between potentially large numbers of variables,
are a relatively obscure type of visualization, a variation has been increasing
in popularity in recent years.
Radar charts, which arrange the axes radially as spokes coming out of a central point, are often seen in social science or business contexts, where they are used to show survey results. When used in this manner, the overall shape of the connected line on the radar chart gives a gestalt sense of the response profiles.27

27: E.g., are they mostly low or mostly high? and so on.
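The geometry behind a radar chart is a polar layout. There is one spoke per variable, spaced evenly around the circle, and each value is a distance along its spoke. The polygon through those points produces the overall shape. A sketch follows. The helper `radar_vertices`, the choice of a shared maximum for every axis, and the "first spoke up, then clockwise" orientation are assumptions.

```python
import math


def radar_vertices(values, max_value, radius=1.0):
    """Place each value on its own spoke and return the (x, y) polygon vertices."""
    n = len(values)
    if n < 3:
        raise ValueError("a radar chart needs at least three axes")
    vertices = []
    for i, v in enumerate(values):
        angle = math.pi / 2 - 2 * math.pi * i / n  # first spoke points up, then clockwise
        r = radius * v / max_value                  # all axes share the same scale
        vertices.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices


# Hypothetical survey profile: four questions scored on a 1-5 scale.
profile = radar_vertices([4, 2, 5, 3], max_value=5)
```

A shared scale is what makes the shape meaningful. If each axis were rescaled independently, "mostly high" and "mostly low" profiles could not be compared at a glance.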
Using networks to both model and visualize systems can give us insights into the system. Having a solid conceptual understanding of the system through the use of these visualization types can help us draw legitimate and sound conclusions.28 Examples are provided in Figure 9.19.

28: Here are some clues that suggest that using a tree or network visualization could be useful:
are we dealing with flow (of something) along pathways?
are we dealing with a collection of objects that input and output things?
are the inputting/outputting objects homogeneous?
are we dealing with relationships and connections between objects?
are we dealing with a situation where one object influences another object?

Animated and Interactive Visualizations

Animation and interactivity do not always improve a visualization. What insights can they provide? That depends on the data, and on the visualization method.

Even when done well, 85% of users don't bother with interactive visualizations, according to a NY Times analysis of their own visualizations at The Upshot. This very strongly supports the notion that the default visualization (i.e., the one that greets viewers when they first load the website on which it is found) should be coherent and self-consistent as is (see Figures 9.20 and 9.21 for examples).
Figure 9.19: Trees and network diagrams: disease progression [32] (top left); US airport hubs (top right, author unknown); classification [1]
(bottom left); tree of life (P.Z. Meyers, bottom right).
Figure 9.20: Animated and interactive charts: The Clubs That Connect the World Cup, NY Times, 2014 (top left); Who Marries Whom, Bloomberg, 2016 (top right); Hipparcos Star Mapper, European Space Agency, 2016 (middle left); The Internet of Things – a Primer, Information is Beautiful, 2016 (middle right); Visualizing the Riemann 𝜁 Function and Analytic Continuation, 3Blue1Brown, 2016 (bottom row).
Figure 9.21: Animated and interactive charts II: The Genealogy and History of Popular Music Genres, Musicmap, 2016 (top left); Sequences Sunburst, Kerry Rodden, 2015 (top right); Health and Wealth of Nations, Gapminder Foundation (middle left); Small Arms and Ammunition – Imports and Exports, Google, 2012 (middle right); Mobius Transformations Revealed, D.N. Arnold, J. Rogness, 2007 (bottom).
9.4 Misc. & Charts to Avoid
Some data visualizations are sufficiently unique that they cannot easily be grouped or categorized.

Chernoff Faces

Consider, as a singular example, Chernoff faces, which were designed on the premise that people can easily understand facial expressions. The Chernoff visualization can accommodate up to 18 or 36 facial feature variables.29

29: The idea is perhaps intriguing and might even work well in some instances, but in most cases it fails to provide a useful rendering; among other issues, most facial features are not ordinal, faces are more than the sum of their parts, and not all facial features carry emotions.
Figure 9.22: Chernoff faces of MLB managers' characteristics during the 2007 season (SC. Wang, NY Times).
Alluvial and Sankey diagrams (see here and here for examples,
respectively) are similar in appearance to one another, and both allow for the
visualization of proportions; however, in the case of the alluvial diagram,
the focus is on datasets with multiple categorical variables, and the chart
displays the percentages of each variable relative to other variables.
Sankey diagrams, conversely, focus on quantity breakdowns relative to
particular categories and how those quantities change when considering
other categories.
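Both diagram types are typically drawn from the same kind of aggregated input: flows between the categories of one variable and those of another. The sketch below counts each (source, target) link, which maps to Sankey-style quantities. It also computes each link's share of its source category, which is closer to the proportional view of an alluvial diagram. The helper `flow_links` and the customer-plan records are hypothetical, and rendering is left to a charting library.

```python
from collections import Counter


def flow_links(records):
    """Count (source, target) links and each link's percentage of its source total."""
    links = Counter(records)
    source_totals = Counter(src for src, _ in records)
    return {
        (src, tgt): (n, 100 * n / source_totals[src])
        for (src, tgt), n in links.items()
    }


# Hypothetical: each record is (plan in 2023, plan in 2024) for one customer.
records = [("basic", "basic"), ("basic", "premium"), ("basic", "basic"),
           ("premium", "premium"), ("premium", "cancelled")]
links = flow_links(records)
```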
Charts to Avoid
On the one hand, we are agnostic when it comes to tools and methods:
anything that helps convey the data story is on the table. On the other
hand, some of the commonly-used approaches really put a damper on
comprehension.
We strongly suggest avoiding:
ANYTHING with an arc (except for gauge charts), such as pie and doughnut charts.30 Human brains cannot easily compare angles and arcs, so these can become misleading: without labels, how easy is it to compare Steve & Bob below?
30: Sometimes we need to be pragmatic... but there are limits.