Beginners Guide To Data Visualization
HOW TO UNDERSTAND, DESIGN, AND OPTIMIZE OVER 40
DIFFERENT CHARTS
ELIZABETH CLARKE
© Copyright Elizabeth Clarke 2022 - All rights reserved.
The content contained within this book may not be reproduced, duplicated or transmitted without
direct written permission from the author or the publisher.
Under no circumstances will any blame or legal responsibility be held against the publisher, or
author, for any damages, reparation, or monetary loss due to the information contained within this
book. Either directly or indirectly. You are responsible for your own choices, actions, and results.
Legal Notice:
This book is copyright protected. This book is only for personal use. You cannot amend, distribute,
sell, use, quote or paraphrase any part, or the content within this book, without the consent of the
author or publisher.
Disclaimer Notice:
Please note the information contained within this document is for educational and entertainment
purposes only. All effort has been executed to present accurate, up to date, and reliable, complete
information. No warranties of any kind are declared or implied. Readers acknowledge that the author
is not engaging in the rendering of legal, financial, medical or professional advice. The content
within this book has been derived from various sources. Please consult a licensed professional before
attempting any techniques outlined in this book.
By reading this document, the reader agrees that under no circumstances is the author responsible for
any losses, direct or indirect, which are incurred as a result of the use of the information contained
within this document, including, but not limited to, — errors, omissions, or inaccuracies.
TABLE OF CONTENTS
Introduction
Conclusion
Appendix - Tools for Data Visualization
References
INTRODUCTION
Whether you say it d-ay-ta or d-ah-ta, data has been around for a long, long
time. In fact, the history books have records of data collection tools dating
back as far as 19,000 BC. This came in the form of an Ishango bone, which
was used as a tally stick. The Ishango bone was a primitive brown bone tool
consisting of a length of bone with a sharp object like a piece of quartz on
one end. Before what we know as modern mathematics or libraries were
ever invented, human beings were already interacting with data.
Of course, as humans have evolved, so has how we collect and store data.
Naturally, how we display data in visual form has also moved with the
times.
Before the Industrial Revolution in the mid-19th century, the main forms of
data visualization were maps, which we used as displays of resources, land
markers, roads, and cities. But now… Line charts, area charts, histograms,
heat maps, pie charts, bar charts… I really can go on and on. There is no
shortage of ways that data can be presented in a visual format, and, of
course, this variety makes it so that any type of data can be given a visual
skin. We really have come a long way in the last 200 years because such
formats would have been totally foreign to someone of that time viewing
the visualization.
But why was this evolution of data visualization necessary?
The short answer? The sheer amount of data we amass in this modern age
makes it necessary to find mediums to present that information in a visual
context that is easy for us to understand and gain insight from at a glance.
After all, approximately 80 zettabytes (ZB) of data, with 1 ZB totaling 1
billion terabytes, was collected in 2021. Every day, more and more data is
being collected and stored on social media, via retail outlets, in small and
large businesses… You name it and this entity is collecting data. Even on an
individual front, we collect and store data. With the increasing global
population and consumption, data volume is exploding. In fact, data volume
is expected to more than double the 2021 figure by the time we get around
to 2025. Can you imagine the accumulation after that time? It truly boggles
my mind to contemplate. Data visualization allows us to easily identify
trends, patterns, and outliers from such large data sets. It also makes it so
that we can present these data sets to audiences that might not be as
knowledgeable as us in a clear, concise, and easily interpretable way.
But data visualization is not just important because it allows us to digest
visual information quickly. It allows for faster and more efficient
decision-making. It jumpstarts the crucial decisions that improve an
organization, such as those about business productivity, product
performance, better services, and whatever else pushes a business forward.
It increases the probability that people in and out of an organization can
share helpful insights and develop solutions to whatever problem made the
data collection necessary. It allows for acting on solutions quickly so that
success is more likely and fewer mistakes are made. Because of these and more benefits,
data visualization can be used in all sectors and niches such as politics,
healthcare, sciences, sales, marketing, finance, banking, logistics, and more.
However, despite the many benefits of using data visualization, many
people do not understand how to use it to maximum effectiveness. Whether
you are a beginner at data visualization or looking for a way to enhance
your current data visualization toolkit, this book was written to showcase
the many different charts that can be used, what they are good for, and how
you can maximize that usage.
Data visualization can seem quite complex and daunting, but I have
organized this book in a manner that is simple to understand and utilize.
The information has been compiled into four main sections to ensure your
toolkit has all it needs to successfully create effective data visualizations.
They are:
Section 1: Fundamentals of Data Visualization
Section 2: All about Charts and How to Use them to Your Advantage
Section 3: Fundamentals of Design
Section 4: Case Study and Redesigns
Appendix: Tools for Creating Data Visualizations
To summarize, by the time you read the last word of this book, you will
understand the essentials of any data visualization, how to pick the right
charts and how to design them for maximum effectiveness.
Project managers, data scientists, marketers, social media analysts, product
managers… All these professionals and more would have a far easier time
relaying information and insights if they learned the art of telling stories
through data visualization. My passion has led me to create a series of
books to give you the tools to tell captivating stories with data.
I won't waste any more of your time. Let's get right to the good stuff!
1
FUNDAMENTALS OF DATA VISUALIZATION
Even though the visualizations that we see as typical in this day and age,
like line graphs and pie charts, are relatively new constructs of the
late 18th to mid-19th century, data visualization itself is not new. In
fact, it has been around for thousands of years in the form of maps. “X
marks the spot” is a pirate phrase that has been passed down throughout the
ages and is a great example of how we used to visualize information.
What we know as modern data visualization was created by William
Playfair, who was a Scottish political economist. He invented line charts,
area charts, and bar charts and shared his invention with the world in 1786
in his publication called The Commercial and Political Atlas; Representing,
by Means of Stained Copper-Plate Charts, the Exports, Imports, and
General Trade of England, at a Single View. To which are Added, Charts of
the Revenue and Debts of Ireland, Done in the Same Manner by James
Correy.
FIGURE 1.1: One of his many charts. Source: The Commercial and Political Atlas, 1786
(3rd edition, 1801)
VISUAL PROCESSING
There are several ways that the brain processes the world around us. It does
this via our 5 senses (touch, hearing, taste, smell, and sight). The
information it receives via our senses is turned into electrical and chemical
signals that it uses to make interpretations.
No matter which sense delivers this information to the brain, most of this
process happens outside your conscious awareness. The process whereby
the brain subconsciously accumulates information about your environment
is known as pre-attentive processing. The brain then filters and processes
what it deems important from the received information. Can you imagine if
you noticed every single thing, such as how a blade of grass blew in the
wind and the lint that landed on your shoe every single second of the day?
Your brain would constantly be on the verge of overheating. The brain
recognizes this is not a desired state of affairs and therefore, only gives
attentiveness to things it deems essential. But how does it make the
selection of what is important and discard what is not?
Let’s answer this question now. The information received from our senses is
given different levels of priority for processing in the brain. As part of our
cognitive awareness, taste has the lowest priority. This is followed by
hearing and smell and then touch. Finally, the highest priority is given to the
information received from sight. You are far more likely to notice
something you see compared to what you are tasting under normal
circumstances. More than 50% of the information our brains process is
gained from what we see around us. This process is called visual
processing. It describes how the brain perceives and then processes
information gained from what we see. This information may seem boring to
some, but whoever masters it will never leave an audience uninspired again.
Cognitive awareness is handled by the part of the brain called the cerebral
cortex. This part is responsible for our reasoning. While it is pretty nifty at
sifting through the information, it is hindered by the process of evolution.
Or rather, evolution has not caught up to it yet because it is a comparatively
new structure in the brain. Therefore, it is slower to interpret the
information it receives. We would resemble lagging computers at times if
we relied on this part of the brain to process the sheer amount of visual
input received every minute of the day. However, this part of the brain is
quite equipped to handle functions like thinking, understanding language,
and perceptions. It also interprets information received from other senses
like touch and hearing quite well.
On the other hand, the brain processes the raw visualizations it receives in
another part of the brain called the thalamus. It is older than the cerebral
cortex on the human evolutionary scale and can process visual information
in a few hundred milliseconds. This is much less taxing on the brain and
allows pre-attentive processing to occur. As long as your eyes are open, it
receives input, but we are unaware of most of this intake. That is, until
an element of a visual catches the brain's attention and goes through the
cognitive tunnels of the thalamus.
VISUAL VARIABLES
The differences in elements of items received by the human eye for visual
processing and analysis are called visual variables or pre-attentive
attributes. As you see below, many of these elements you've probably seen
in graphs before and they can work wonders when it comes to highlighting
insights.
FIGURE 1.2: Pre-attentive attributes
Position
This variable describes the way that the element has been situated relative
to other elements. It is where the object is placed in its environment. This
also goes by the name location. This can be given in absolute or relative
terms.
Size
This indicates the dimensions of the objects in sight. The size of a visual
variable can affect how other visual variables are seen. For example, a large
element can cause another item to seem smaller than it is by comparison.
Shape
This quality describes the form or outline of the objects in sight. Examples of
shapes include points, lines, and flat and 3-D figures.
Hue
This variable has two dimensions, which are the hue and the lightness.
These can also be counted as individual visual variables. The hue is
often simply referred to as the color. It describes the name of the color, such
as blue, green, pink, purple, or red. Lightness, also called value, describes
how dark or how light a hue is. So, while shades of blue or green might
have the same hue, they can have different values of lightness or darkness.
For example, sky blue and navy blue have different lightnesses even though
they have the same basic hue, so they could be used to represent high
figures and low figures.
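To make the difference between hue and lightness concrete, here is a minimal Python sketch using the standard library's colorsys module. The two RGB triples are illustrative stand-ins that I picked for a light blue and a dark blue. They are assumptions for this example, not official definitions of any named color.

```python
import colorsys

# Illustrative RGB values on a 0-1 scale (assumed for this sketch).
light_blue = (0.4, 0.6, 1.0)
dark_blue = (0.0, 0.1, 0.3)


def hue_and_lightness(rgb):
    """Return (hue in degrees, lightness in percent) for an RGB triple."""
    hue, lightness, _saturation = colorsys.rgb_to_hls(*rgb)
    return round(hue * 360), round(lightness * 100)


for name, rgb in (("light blue", light_blue), ("dark blue", dark_blue)):
    hue, lightness = hue_and_lightness(rgb)
    print(f"{name}: hue {hue} degrees, lightness {lightness}%")
```

Both colors report the same hue angle while their lightness differs, which is exactly the property that lets one hue encode a range from low to high values.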
Orientation
This describes elements relative to each other or specific positions.
Orientation can create perceptions of likeness or groupings.
Curvature
Utilizing curves in your data visualizations can showcase the flow of the
data. Often, this is populated automatically in cases such as a line graph.
When a bar chart is necessary, you can add a supporting trend line to show
the flow of the data. Another case would be to visualize smaller instead of
larger time intervals. Instead of quarters or years, you can show months or
days to see a microscopic view of the data and how it trends over time.
Length
People can easily distinguish differences in length between separate things. This is
why bar charts tend to triumph over pie charts. When a bar is noticeably
longer than the other, it is perceived as a more significant value very
quickly. We will cover this theory more later.
Width
Width can be used to determine the size difference between various
categories. This can be done through a Marimekko chart or stacked area
chart, as each category's value is showcased by its width. We will, of
course, go in-depth on these later as they are great charts to have in your
tool belt.
Added Marks
Added marks are a great way to show separate groups within data, for
example, in data plotted on a scatter plot. This can also be effective when
presenting to someone who is colorblind, for whom separating the groups by
color won't be as effective.
Enclosure
Enclosure allows you to quickly determine groupings based on borders or
enclosed values. Using too many lines or borders can also add clutter, so
keep this in mind.
Color saturation
This relates to the intensity of the color of an element. The more saturated a
color, the richer it appears. At 100% saturation, a color has no gray added to
it. The less intense the color, the paler it appears. At 0% saturation, a color
will appear gray no matter the hue.
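The effect of saturation can be checked with a few lines of Python. This is a small sketch using the standard library's colorsys module and the HSV color model. The helper name and the fixed brightness of 0.8 are my own choices for illustration.

```python
import colorsys


def hsv_to_rounded_rgb(hue_degrees, saturation, value=0.8):
    """Convert an HSV color to an RGB triple rounded to two decimals."""
    r, g, b = colorsys.hsv_to_rgb(hue_degrees / 360, saturation, value)
    return (round(r, 2), round(g, 2), round(b, 2))


# Hue angles for red, green, and blue.
for hue in (0, 120, 240):
    full = hsv_to_rounded_rgb(hue, saturation=1.0)
    none = hsv_to_rounded_rgb(hue, saturation=0.0)
    print(f"hue {hue}: 100% saturation -> {full}, 0% saturation -> {none}")
```

At 0% saturation all three hues collapse to the same gray, because the red, green, and blue components become equal.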
Grouping
This describes the layout of predetermined elements. It helps establish
relationships between elements as well as appearance to achieve an overall
visual flow. This means that a particular eye movement will be prescribed
to these elements.
Contrast
This variable describes how an element stands out from another, for example
through a different tone, lightness, or transparency level than the others.
Another example would be hot and cold, where colors over a
spectrum gradually change to a different value.
Law of closure
This law refers to how the brain simplifies complex arrangements of visual
elements by organizing those elements into recognizable patterns. Based on
this law, the brain will fill in incomplete images to make the visualization
make sense. As the presenter, you need to be mindful of this law and ensure
that your data visualizations are as complete as possible so that your
audience grasps the information in the data as accurately as possible. A
broken line in a line chart can unwittingly confuse your audience because
the image is incomplete. On the other hand, a continuous line displaying the
same information gives the brain a clearer picture. If you look below, you
can see the chart with the missing value and what the value actually looks
like. With the point missing, it is unclear what the data was doing in that
specific time frame, and it could be misconstrued.
FIGURE 1.4: Law of closure
Law of similarity
This law describes the brain's tendency to group like things together.
Therefore, the brain will perceive things with similar shapes, sizes, colors,
orientations, or textures as belonging to the same group. Use this law to
help your audience more readily identify patterns based on your data. You
can also use this law to ensure that your audience more easily perceives
different groups by using dissimilar elements such as color to make that
differentiation.
Law of enclosure
The brain also identifies visual items as grouped together based on them
being enclosed in a particular group. That is the basis of this law, which is
also sometimes referred to as the Common Region Gestalt principle.
Law of continuity
This law is based on the fact that the human eye tends to follow lines and
perceives direction based on the curvature of that line. Use this law in your
data visualization by arranging visual objects in lines to simplify
comparisons and create groupings. For example, your audience will more
readily digest the data being portrayed in a bar chart that moves figures
from highest to lowest or vice versa in a straight line compared to a bar
chart that has scattered figures and varying heights of bars.
FIGURE 1.8:
The second most common mistake is going against conventions like using
larger areas to indicate higher number values. Audiences are typically
familiar with conventions and going against that grain can leave them with
the wrong impression.
Lastly, do not cherry-pick data to place in your charts. Your audience needs
a whole picture of what the data represents to make informed decisions and
showing only a few data points can mislead and, thus, deceive your
audience. For example, you can present a data set from the last 6 months as
an upward trend.
FIGURE 1.9
Although this is true, if you look at the past year, premium subscriptions
have been steadily declining.
FIGURE 1.10
The more significant insight is why it's on the decline and whether it will
continue even after this minor spike is over. Always showcase the full picture.
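The cherry-picking problem described above can be checked numerically before a chart is ever drawn. Below is a minimal Python sketch that compares the trend of the last 6 months with the trend of the full year. The subscription figures are hypothetical numbers invented for this illustration, and the least-squares slope is just one simple way of measuring a trend.

```python
def trend_slope(values):
    """Least-squares slope of the values against their position (per period)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical monthly premium subscriptions, oldest month first.
subscriptions = [980, 940, 900, 850, 800, 760, 700, 705, 715, 720, 735, 750]

last_six_months = trend_slope(subscriptions[-6:])
full_year = trend_slope(subscriptions)

print(f"Last 6 months: {last_six_months:+.1f} subscriptions per month")
print(f"Full year:     {full_year:+.1f} subscriptions per month")
```

A positive slope for the short window and a negative slope for the full year is a signal that the shorter chart alone would mislead the audience.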
Know your audience
We use data visualizations to communicate with others – the audience. Even
though the message might be the same, how we communicate with a group
is more important than what is said in conveying the right message. You
need to create your visualizations in a way that your audience wants to see
and can understand. Therefore, before you picture a type of chart or the
colors that will be used on that chart, you need to study and understand who
you will be presenting to. Based on what your research says about your
audience, you can cater to their method of communication. Items that affect
the audience’s communication style include their job position. Typically,
higher levels of management need more of a helicopter view to see what is
going on, without as many details. If they request the details,
multiple charts will be in order. Department-specific employees, on the
other hand, will need more details for better and more accurate
decision-making. Technical
literacy is also a factor that determines how data should be presented. The
more knowledgeable the audience is about the statistics and niche-specific
items, the more complicated the charts can be without losing insight. The
opposite is true for less knowledgeable audiences.
How much background knowledge the audience has on the topic is also
important. The more they know, the fewer details that need to be given
upfront. On the other hand, if the audience does not have any prior
knowledge of the subject matter, then your charts need to include
information that will get them up to speed. It's always good to have some
extra charts that go into detail on some points just in case a few people need
more insight.
Presenting a detailed story to high-level executives is a challenge on its
own. Check out my book “How to Win With Your Data Visualizations” if
you want a more in-depth view of presenting data. There, I go deeper into
understanding your audience and how to present effectively to different
people. Scan the QR below to check it out! (or once you’ve finished this
book.)
While time is the common denominator that links the types of charts we
will discuss in this chapter, it is important to note that the appropriate time
must be displayed. Otherwise, you risk confusing your audience. They need
to have the proper context of the time movement, so this is one of the first
things you need to nail down before developing data visualizations.
There are many options when visualizing change over time. The chart that
you choose will depend on 2 factors:
With these two factors in mind, we will discuss 5 of the most widely used
change-over-time charts next.
LINE CHARTS
A personal favorite. A line chart is one of the first charts that come to mind
for most people when they hear data visualization, but what is a line chart
exactly? Also called a line graph or a line plot, a line chart is a graphical
representation of data that uses points connected by line segments to
demonstrate the change in volume. This demonstration moves from left to
right. As usual, with charts that illustrate change over time, the horizontal
axis represents a continuous progression of time from left to right and the
vertical axis reports the values that change along that progression.
The most common use of line charts is to emphasize the change in variable
values represented on the vertical axis against the continuous values noted
in time intervals like hours, weeks, months, and years plotted on the
horizontal axis. The plotted line reveals changes in patterns and
trends as it slopes up or down. Single and multiple lines can be plotted on a
line chart. Multiple lines are used to compare trends of different variables
and subgroups within a data set.
Even though the components of a line chart are quite simple, here are a few
practices that can be used to make these charts stand out to your audience
and, of course, be effective at bringing your point across:
Use appropriate measurement intervals on the horizontal axis
Using time intervals that are too short can create a rather busy-looking
line graph, while using time intervals that are too far apart makes it
harder to follow the trend being depicted by the data. You must find a
healthy medium between these two extremes for your line chart to be a
credible source to your audience. Do so by testing out different intervals to
get a feel for which one represents the data to its highest potential. Your
knowledge of the data can make this simpler. If you find that the data cannot
be accurately represented by one line, it is possible to use a second line. The
first can highlight the marginal differences in each time interval and the
second line can serve to highlight the overall trend with a rolling window. A
rolling window summarizes the trend in a moving subset of the data. At each
point, it does not represent the data entirely but rather a sub-series of the
full set.
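To show what a rolling window does to a series, here is a minimal Python sketch of a trailing moving average. The function name, the window size of 3, and the daily figures are all assumptions made for this illustration. A spreadsheet or charting tool may compute its rolling trend differently.

```python
def rolling_mean(values, window):
    """Trailing moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough earlier points yet to fill the window.
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# Hypothetical daily website visits.
daily_visits = [120, 135, 90, 160, 150, 80, 170, 165, 95, 180]
trend = rolling_mean(daily_visits, window=3)

for day, (raw, smooth) in enumerate(zip(daily_visits, trend), start=1):
    print(day, raw, "-" if smooth is None else round(smooth, 1))
```

Plotting the raw values as the first line and the rolling values as the second gives the audience both the marginal differences and the overall direction.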
Refrain from plotting too many lines
Technically speaking, you can place as many lines on a single line chart as
you would like. However, just because you can does not mean you should. Too
many plotted lines can make your line graph harder to interpret. Try to stick to
plotting 5 lines or fewer if multiple trends need to be noted on the
visualization. As you can see, too many lines clutter the visual and hide any
insights.
FIGURE 2.2: Cluttered line chart
As you can see, there is a serious issue with the visualization when setting a
zero baseline. Let's change the baseline for a better result.
FIGURE 2.6
We now have a better view of how the data changes over time. Normally,
changes at the level of decimal points are not significant and shouldn't be
exaggerated. However, when it comes to currency exchange rates, a matter
of cents makes a big difference and it's essential to see the fluctuation. Look
at the data you're presenting before you determine your axis values.
Whether you use a zero baseline or a non-zero baseline in your line
chart depends on the data at hand. As a rule of thumb, zero baselines
are not a strict requirement with statistical summaries for accurate
interpretation by the audience.
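One way to see why the baseline matters is to measure how much of the vertical axis the data actually occupies. The Python sketch below does this for a zero baseline and for a baseline fitted to the data. The exchange rates, the 10% padding, and both helper names are assumptions chosen for this example, not a rule from any charting tool.

```python
def y_axis_limits(values, zero_baseline=False, padding=0.1):
    """Suggest (lower, upper) limits for the vertical axis."""
    low, high = min(values), max(values)
    margin = (high - low) * padding
    lower = 0.0 if zero_baseline else low - margin
    return lower, high + margin


def share_of_axis_used(values, limits):
    """Fraction of the axis height spanned by the data (values must vary)."""
    lower, upper = limits
    return (max(values) - min(values)) / (upper - lower)


# Hypothetical exchange rates over six periods.
rates = [1.08, 1.09, 1.07, 1.10, 1.12, 1.11]

for zero in (True, False):
    limits = y_axis_limits(rates, zero_baseline=zero)
    share = share_of_axis_used(rates, limits)
    print(f"zero baseline={zero}: the data fills {share:.0%} of the axis")
```

With a zero baseline the fluctuation is squeezed into a thin band at the top of the chart, while the fitted baseline lets the same movement fill most of the plot.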
Avoid a smooth line
On a typical line chart, each plot point is connected with a straight line
moving from left to right. Another common mistake to avoid is linking
points with a smooth curve rather than a straight line going through each
point one by one. Using curved lines distorts the perception of the trends
represented in the data. A straight line lets us quickly note drops and spikes
and when exactly they took place. Always remember: insights before
aesthetics!
FIGURE 2.7
Dual-axis line charts lead to skewed information
While using a dual vertical axis on a line chart can serve to highlight
different trends on one chart, this practice can easily lead to
misinterpretation depending on how each axis is scaled. The same
information can look entirely different with a few changes on each axis. To
lessen the chances of this happening, ensure that the lines plotted are
separated, with appropriate scales on each vertical axis, so that the
audience is less likely to make false comparisons between the variables
being noted. As you can see below, this is the same set of data with
different axes. This is just an example. It can be skewed in many ways to
make it appear however you want. People who know nothing about the dataset
will indeed be misled.
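The scaling effect is easy to quantify. This short Python sketch measures how steeply the same series appears to climb under two different axis ranges. The visitor figures and both axis ranges are hypothetical values picked for this illustration.

```python
def apparent_rise(series, axis_min, axis_max):
    """Rise from first to last point, as a fraction of the axis height."""
    span = axis_max - axis_min
    return (series[-1] - series[0]) / span


# Hypothetical data plotted against the secondary (right-hand) axis.
visitors = [5000, 5200, 5400]

# The same data under two different choices of axis range.
wide_axis = apparent_rise(visitors, axis_min=0, axis_max=6000)
tight_axis = apparent_rise(visitors, axis_min=4900, axis_max=5500)

print(f"Axis 0 to 6000:    line climbs {wide_axis:.0%} of the chart height")
print(f"Axis 4900 to 5500: line climbs {tight_axis:.0%} of the chart height")
```

Nothing about the data changed between the two calls. Only the axis range did, which is why a second axis gives the chart maker so much room to exaggerate or flatten a line relative to the other one.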
FIGURE 2.8
FAN CHARTS
A relatively uncommon visual is a fan chart. It joins a line graph that
shows data from the past with a range area chart that displays future
predictions. As forecasts become more uncertain further into the future, the
fan chart grows wider and becomes more transparent. Fan charts are often used
to predict exchange rates or inflation but can essentially be used to
display any data with an uncertain future value.
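To give a feel for where the widening bands come from, here is a rough Python sketch. It rests on simplifying assumptions that I am making purely for illustration: the central forecast stays flat at the last observed value, and the uncertainty grows with the square root of the number of steps ahead, as it would for a simple random walk. A real forecast would come from a proper forecasting model, and the history values below are invented.

```python
import math
import statistics


def fan_bands(history, steps_ahead, z_scores=(0.67, 1.28, 1.96)):
    """Return {step: [(lower, upper), ...]} with one band per z-score.

    The default z-scores roughly match 50%, 80%, and 95% intervals
    under a normal distribution (an assumption of this sketch).
    """
    changes = [b - a for a, b in zip(history, history[1:])]
    step_sd = statistics.stdev(changes)
    last = history[-1]
    bands = {}
    for step in range(1, steps_ahead + 1):
        spread = step_sd * math.sqrt(step)
        bands[step] = [(last - z * spread, last + z * spread) for z in z_scores]
    return bands


# Hypothetical past inflation readings in percent.
history = [2.0, 2.1, 2.3, 2.2, 2.5]
bands = fan_bands(history, steps_ahead=4)

for step, intervals in bands.items():
    lower, upper = intervals[-1]  # the widest band
    print(f"step {step}: {lower:.2f} to {upper:.2f}")
```

Drawing each band as a shaded area, with the widest band the most transparent, produces the characteristic fan shape.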
FIGURE 2.9
SLOPE CHARTS
At the risk of sounding obvious, I state this: slope charts are composed of
slopes. A slope is a mathematical measurement of how steep a straight line
is when plotted against a pair of coordinate axes, such as values against
time intervals. The slope indicates how that line increases or decreases in
value. A line that slopes upward indicates a positive value change for the
variable being displayed, while a line that slopes downward indicates a
negative change. The steeper the line, the larger the change. A slope of
zero means the line runs parallel to the horizontal axis. Slope charts make
use of these conventions to show comparisons between categories, such as
countries, locations, and regions. They also show how trends develop over
time in addition to rankings, transitions, and absolute values. Examples of
when to use a slope chart include displaying the number of customers who
signed up for a monthly subscription between 2010 and 2023, or comparing the
employment rate between men and women. These simple charts can give an
effective picture of before and after scenarios while allowing audiences to
visualize changes in variables such as sales, costs, prices, revenues,
losses, profits, and any crucial aspect over specific amounts of time. They
are often used by upper-level management to gain simple yet powerful
insights.
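The slope behind each line of a slope chart is simply the change in value divided by the change in time. Here is a minimal Python sketch of that calculation. The plan names and subscriber counts are hypothetical figures made up for this example.

```python
def slope(x1, y1, x2, y2):
    """Slope of the straight line through (x1, y1) and (x2, y2)."""
    if x2 == x1:
        raise ValueError("the two points must differ in x")
    return (y2 - y1) / (x2 - x1)


# Hypothetical subscriber counts in 2010 and in 2023.
plans = {"Basic": (1200, 900), "Premium": (300, 1600)}

for name, (start, end) in plans.items():
    change_per_year = slope(2010, start, 2023, end)
    direction = "rising" if change_per_year > 0 else "falling or flat"
    print(f"{name}: {change_per_year:+.1f} subscribers per year ({direction})")
```

A positive result corresponds to a line that climbs from left to right, a negative result to a line that falls, and a larger absolute value to a steeper line.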
FIGURE 2.10
You can improve the insights gained from a slope chart by including a table
alongside it. This is particularly useful in situations where the variable on
the vertical axis portrays ranking order. Adding a table will allow the
audience to note the values in order in addition to seeing the visual change
in the form of the degree of the slope.
Other ways that you can make a slope chart a more effective storytelling
device include using specific attributes that grab attention so that the
audience better understands the values. For example, you may choose to use
blue or green to indicate rising slopes while using orange or red to highlight
falling slopes. Additionally, you can also increase the thickness of lines that
indicate a great degree of change while using thinner lines for those that
show a lesser degree of change. Both uses of visual variables draw the
audience's attention to the important trends indicated by the chart. On the
other hand, you must ensure that your slope chart does not become too busy
by adding several intersecting lines.
A great alternative to a slope chart is the bump chart. This chart shows the
change in ranking over time. In particular, the bump chart helps an audience
visualize how different variables change in rank over time. Bump charts
differ from slope charts in that they can highlight multiple time intervals
along the horizontal axis. With slope charts, the goal is to show the change
in value between two points. With bump charts, the aim is to highlight how
the positions in rank change over time.
FIGURE 2.11
You can also easily highlight specific items for a better view.
FIGURE 2.12
To create a bump chart in Excel, we can use a ranking system based on the
categories' values, let's say between 1 and 4, with the lowest figure being
1 and the highest figure being 4. For the example above, we did this for
shoe sales from every month throughout the year. It is an efficient way to
see the best performers and the best-performing months per category.
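The same ranking step can be sketched outside of Excel. The Python example below ranks four categories within each month, giving 1 to the lowest figure and 4 to the highest, as described above. The category names and sales figures are hypothetical, and ties are broken by the order in which the categories are listed, which is a choice of this sketch.

```python
def rank_ascending(values_by_category):
    """Rank categories so the lowest value gets 1 and the highest gets n."""
    ordered = sorted(values_by_category, key=values_by_category.get)
    return {category: rank for rank, category in enumerate(ordered, start=1)}


# Hypothetical monthly shoe sales for four categories.
monthly_sales = {
    "Jan": {"Sneakers": 320, "Boots": 410, "Sandals": 90, "Loafers": 150},
    "Feb": {"Sneakers": 280, "Boots": 300, "Sandals": 310, "Loafers": 140},
}

ranks_by_month = {
    month: rank_ascending(sales) for month, sales in monthly_sales.items()
}

for month, ranks in ranks_by_month.items():
    print(month, ranks)
```

Each category's rank per month becomes one point on the bump chart, and connecting a category's points across the months produces its line.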
AREA CHARTS
If there ever was a marriage between a bar chart and a line chart, you would
get an area chart. This chart displays how at least one set of numeric values
changes over time. This progression is indicated by a line that moves from
left to right, but the difference between a line chart and an area chart is that the
space between the line and the baseline is shaded. Area charts are most
commonly used to show how multiple groups of values compare to each
other. Therefore, such cases will have multiple lines as part of the makeup
of the chart.
There are different types of area charts and each has a specific purpose.
Stacked area chart
This is the type of area chart that is generally implied when the term ‘area
chart’ is used. Unlike the overlapping area chart, stacked area charts use
lines that are plotted one on top of the other with the most recently plotted
line serving as the baseline for the next line to be plotted.
FIGURE 2.13
The highest line serves as the total of the values when they are summed up.
As such, providing the total value is one of the functions of this type of
chart. It also provides a breakdown of each group of values in addition to
allowing for comparison between these groups via each shaded part. To
summarize, the audience gets a general idea of how each group of values
performs when stacked against each other and how they contribute to the
total.
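The stacking described above boils down to a running total. This minimal Python sketch computes the upper boundary of each layer, with every layer using the previous layer's boundary as its baseline. The product names and unit figures are hypothetical.

```python
def stack_series(series):
    """Return the upper boundary of each layer in a stacked area chart."""
    length = len(next(iter(series.values())))
    baseline = [0] * length
    boundaries = {}
    for name, values in series.items():
        # Each layer sits on top of the layer plotted before it.
        top = [base + value for base, value in zip(baseline, values)]
        boundaries[name] = top
        baseline = top
    return boundaries


# Hypothetical units sold per quarter for three products.
units_sold = {
    "Product A": [10, 12, 15, 14],
    "Product B": [5, 7, 6, 9],
    "Product C": [8, 8, 10, 12],
}

stacked = stack_series(units_sold)
totals = list(stacked.values())[-1]

print(stacked)
print("Totals per quarter:", totals)
```

The boundary of the last layer is the overall total for each period, which is why the highest line on a stacked area chart can be read as the sum of all groups.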
Get the most from your area charts by following the next few guidelines:
Add transparency to better compare
When working with an overlapping area chart, you can add transparency to
see how the figures compare to each other a lot easier.
FIGURE 2.14
FIGURE 2.15
An area chart in this scenario doesn’t add any value. A line chart is
essentially the same thing with less visual noise.
FIGURE 2.16
Let's look at another great area chart. In this case, one showing the
distribution of CO2 emissions since 1860.
FIGURE 2.17
FIGURE 2.18
As an alternative to an area chart, you can use a stream graph. It is most
closely related to the stacked area chart, but while the stacked area chart
features the baseline at the bottom of the stack of values, the stream graph
features the baseline running through the center of the chart. The values are
symmetrically assembled around the baseline. As a result, the stream graph
is not ideal if you would like to note an overall value or even the precise
values of each group, especially in comparison to each other. This chart is,
however, great for inviting interactivity from a wide audience. The
members can play with the chart to note findings and the level of interplay
can serve to make the presentation more memorable and educational to
some audiences.
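The centered baseline described above can be sketched in a few lines of Python. In this simplified version, the stack at each point in time starts at minus half of that point's total, so the layers sit symmetrically around zero. The function name and the sample series are assumptions for this example, and charting tools may use more refined layouts.

```python
def centered_layers(series):
    """Return (lower, upper) boundaries per layer, centered on zero."""
    names = list(series)
    length = len(series[names[0]])
    totals = [sum(series[name][i] for name in names) for i in range(length)]
    # Start each stack at minus half the total so it is symmetric around 0.
    lower = [-total / 2 for total in totals]
    layers = {}
    for name in names:
        upper = [low + value for low, value in zip(lower, series[name])]
        layers[name] = (lower, upper)
        lower = upper
    return layers


# Hypothetical listening hours per genre over three periods.
hours = {"Rock": [4, 6, 2], "Jazz": [2, 2, 4], "Pop": [6, 4, 6]}

for name, (lower, upper) in centered_layers(hours).items():
    print(name, lower, upper)
```

Because the bottom of the stack moves with the total, no layer has a flat baseline, which is the reason precise values are hard to read from this kind of chart.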
FIGURE 2.19
FIGURE 2.20
To use this type of chart with maximum effectiveness, follow the same
advice outlined for line charts. Also, it is not necessary to use a zero
baseline with this type of chart due to the lack of shading or any other
construct that might distort the values.
GANTT CHART
This type of chart is great for project management as it shows a project’s
schedule and the tasks and events that occur in that lifecycle.
FIGURE 2.21
Because of these functionalities, the Gantt chart is great for mapping out
marketing campaigns, outlining the deliverables for a client, planning a
product launch, and similar items.
FIGURE 2.22
We often have to look to the past to formulate a plan to achieve our goals
in the future. The same holds true for businesses and organizations.
Change-over-time charts allow you to streamline that planning process by noting
how past variables have changed as time passes. They educate audiences
about trends and thus, invite investigation into why these trends progress in
the way they do over time. The findings of such investigation allow for
developing solutions to maximize these trends when they are favorable or
turning the situation around when they are not.
These charts also allow for noting comparisons and even totals. We have
outlined several change-over-time charts in this chapter with the aim of
showing you which are best used under particular circumstances.
3
COMPARISON
BAR GRAPHS
You can’t study the topic of data visualization without coming across the
famous bar chart. Also called a column chart or a bar graph, the bar chart
is used to plot numeric values for categories, with each category
represented by a bar. The length of each bar corresponds to its numeric
value on one axis, and all of the bars are plotted on a common baseline.
This commonality allows for easy comparison of the values portrayed by the
lengths of the bars.
FIGURE 3.1
Bar charts are the go-to when the distribution of data points across multiple
categories is being plotted. These categories can be across multiple sets of
data or multiple categories within one group of values. The length of the
bars gives insight into the most common or highest groups or subgroups
and how other groups or subgroups weigh against these.
Get the most out of your bar charts with the following tips:
Avoid 3D
3D effects can make it difficult to line up a bar with the baseline, which
is a major readability issue. I stay away from 3D charts as they do more
harm than good.
FIGURE 3.2
As you can see here, the 3D effect just takes away from the insights and the chart
is immediately harder to read.
FIGURE 3.3
This rendition is a lot easier on the eyes.
Be mindful of how you order the categories
Ensure good visual flow by sorting the bars so that the largest value comes
first, gradually progressing to the smallest value. This makes the bar length
move from longest to shortest. Your audience will appreciate making easy
comparisons between the bars. The only time you should deviate from this
practice is when categories are labeled in a particular order. This order takes
precedence.
Use color effectively
Color can be your friend or your enemy when it comes to designing bar
charts. Use color sparingly and only to draw attention to key insights. When
it comes to color, don't reinvent the wheel: stick to familiar conventions,
like red for loss and green for gain. Keep the overall color usage
neutral to avoid unwanted biases. I like
to start with a bar graph with no color and ask myself what needs to be
highlighted. We will cover color in more detail in our design chapter.
Include value annotations as necessary
Annotations, or data labels, help explain parts of the chart that might not be
immediately clear from a glance. Even though a good bar chart allows the
audience to compare the lengths of the bars and make approximations of
their values, the exact figures might be unclear. The use of annotations
makes values clear when it is important that they be noted. Use these when
necessary by adding them in the middle or at the end of the bar.
There are a few common mistakes that are made when developing bar
charts. These mistakes and how to avoid them include:
Using images to replace bars
The aesthetic of such a practice can be tempting but remember that
understandability comes before visual appeal when it comes to data
visualization. Images can make it hard to derive key insights into the data.
Any visual variable that distracts the audience from the core message must
be avoided. Stick to using rectangular bars to present the data.
Using dark gridlines
Gridlines are the light gray lines that extend from the axes across the chart
so that values line up with the plotted data points. They can be placed vertically or
horizontally and help the audience differentiate between the specific
insights being depicted. The key word here is light. You might be tempted
to darken the lines for visual appeal, but the practice will likely distract
your audience. Always keep visual clutter to a minimum. If gridlines are
not a necessary addition to your charts, bar charts, or otherwise, do not use
them. In cases where they help educate the audience, keep them faint to
maintain a good visual flow of the chart.
Great alternative versions to your standard column chart include:
Horizontal bar graphs
FIGURE 3.4
FIGURE 3.8
When creating a stacked bar, I prefer to use it horizontally like the one
shown above. It makes it a bit easier to compare insights, and the data
labels add key information.
LOLLIPOP CHART
FIGURE 3.9
FIGURE 3.10
DIVERGING BAR
To diverge means to move apart or separate. When the data values between
groups or subgroups indicate an increasing difference, diverging bars are
the appropriate visualization tool to use. As such, they make a great
resource to compare two alternatives and display results from surveys and
questionnaires. They make it easy to visualize opposing responses and
compare them.
FIGURE 3.11
Diverging bars emphasize variations, one positive (+) and one negative (-)
from a fixed reference point. The scale normally starts at 0, but this is not
always the case. It can be a target or long-term average. Two horizontal bars
are aligned on this scale, with one running to the left and the other to the
right, starting at the common vertical baseline. The length of the bars
corresponds to their numerical value.
A similar chart that can be used in place of a diverging bar is the diverging
stacked bar. It features an additional vertical baseline with horizontal
rectangular bars stacked one next to the other. The values these correspond to
can be percentages or absolute values.
FIGURE 3.12
BUBBLE CHART
Also called a bubble plot, the bubble chart is a relative of the scatter plot
that uses dots to show the values corresponding to three numeric
variables. Each dot on the bubble chart represents a single point of data.
The values of that point are indicated by the size of the dot as well as its
position on both the horizontal and vertical axes. This chart makes it
possible for you to combine and compare three groups of data on the same
chart and show their relationship through comparison. Examples of points
of data that can be displayed on a bubble chart include a sports team's
average points per game on the horizontal axis (X axis), the average points
scored by each team on the vertical axis (Y axis), and the number of wins each
team has (the size of each bubble). Larger bubbles would show more wins
and the opposite would be true for smaller bubbles.
FIGURE 3.13
To afford your audience the most value from your bubble charts, use the
following tips to maximize their effectiveness.
Ensure the bubble sizes are proportional to the values of the data.
Therefore, a bubble with a value of 50 should have half the area of a
bubble with a value of 100. Do not skew the sizes of the bubbles,
as this is misleading. As we covered in chapter 1, your brain
subconsciously categorizes items based on size. Accuracy is key.
Limit the number of data points on your bubble charts. The more
points you plot, the more overlapping there will be between the
dots, and the more overlapping there is, the harder it is to
distinguish between values. This is why bubble charts typically use
transparency. If you have lots of points to plot, make the points
transparent while bolding the outlines of the circles.
Include a legend. A legend is a key that describes the parts of a
chart based on visual variables like color and size. Ensure that your
audience understands what the bubble sizes represent by using a
legend.
Make sure you're showcasing a clear trend. The bubble chart is the
tool you are using to help the audience visualize this trend.
If your data contains negative values, use transparent dots or
distinct colors to highlight this characteristic.
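To make the sizing tip concrete, here is a minimal sketch in Python. It assumes that "size" means the area of the bubble, which is how the eye tends to judge it, so the radius has to grow with the square root of the value. The function names and the maximum radius of 40 are illustrative choices, not part of any particular charting tool.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius that makes the bubble's AREA proportional to its value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


def bubble_area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2


r_100 = bubble_radius(100, max_value=100)
r_50 = bubble_radius(50, max_value=100)

# The bubble for 50 covers half the area of the bubble for 100,
# even though its radius is only about 29% smaller.
area_ratio = bubble_area(r_50) / bubble_area(r_100)
```

If the radius were scaled directly with the value instead, the bubble for 50 would cover only a quarter of the area, which understates it.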
If you have a lot of values that need to be displayed, you may consider
using a bubble cloud as an alternative to the bubble chart. However, note
that while it makes it easy to observe relationships, it is hard to determine the
exact figure or difference between metrics with this type of chart.
FIGURE 3.14
WATERFALL CHART
Also called a cascade chart or a bridge chart, this chart displays how initial
values increase and decrease before leading to a final value. Before this
final value is arrived at, the chart will display the intermediate values that
moved the initial number up and down. This data visualization tool is widely
used in the finance sector to show how net values are reached by outlining
the starting values and the positive and negative contributions of the
intermediate values such as expenses and costs.
FIGURE 3.16
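The arithmetic behind a waterfall chart is a running total. The sketch below, written in Python with invented figures, works out where each floating bar starts and ends. The labels "Start" and "End" and the function name are assumptions made for illustration.

```python
def waterfall_steps(start, changes):
    """Return (label, bar_bottom, bar_top) for every bar in the chart.

    `changes` is a list of (label, amount) pairs; negative amounts are
    decreases such as expenses and costs.
    """
    steps = [("Start", 0, start)]
    running = start
    for label, amount in changes:
        new_running = running + amount
        # Each intermediate bar floats between the old and new running totals.
        steps.append((label, min(running, new_running), max(running, new_running)))
        running = new_running
    steps.append(("End", 0, running))
    return steps


# Hypothetical figures: an opening balance and four contributions.
changes = [("Sales", 500), ("Refunds", -50), ("Salaries", -300), ("Rent", -100)]
steps = waterfall_steps(1000, changes)
```

Only the first and last bars rest on the zero baseline; every bar in between begins where the previous one ended.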
SANKEY DIAGRAM
This is a chart with a very specific use. It allows for highlighting the flow of
assets. Therefore, the sources of a company’s resources, the uses of those
resources, and even their costs can be noted by looking at these visualizations. Many
organizations use such charts to make decisions on how to expend resources
like time, money, and energy because the chart simplifies the complex
process that goes into managing that one resource. A bird’s eye view can be
obtained in addition to getting specific details. As a result, the highest
contributors or dominant consumer base stand out, so the areas with the largest
opportunities can be spotted.
FIGURE 3.17
When we put the same data in a bar chart, it's a lot harder to visualize where
our resources are going.
FIGURE 3.18
When developing a Sankey chart there are a few considerations that must
be made. You must ask the tough questions like:
Asking and answering such questions allows you to solidify the purpose of
creating this chart and develop the key takeaways it must deliver.
But what does this chart look like? The chart uses rectangular or connector
lines with proportional widths to represent the significance of values so that
flow quantity can be visualized. These values are highlighted by the flow
lines' width, color, saturation, length, and shape. Some flows will have
smaller widths than others. Consider cutting them and combining their values
in an ‘other’ category to limit clutter.
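One way to combine the thin flows is sketched below in Python. The 5% cutoff, the budget figures, and the function name are assumptions chosen for illustration; pick a threshold that suits your own data.

```python
def group_small_flows(flows, min_share=0.05, other_label="Other"):
    """Fold flows smaller than `min_share` of the total into one category."""
    total = sum(flows.values())
    grouped = {}
    other = 0
    for name, value in flows.items():
        if value / total < min_share:
            other += value  # too thin to read as its own flow
        else:
            grouped[name] = value
    if other > 0:
        grouped[other_label] = other
    return grouped


# Hypothetical monthly budget, in thousands.
budget = {"Salaries": 600, "Rent": 250, "Software": 100, "Travel": 30, "Snacks": 20}
grouped = group_small_flows(budget)
```

The total stays the same after grouping, so the widths of the remaining flows are still proportional to the original values.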
A great place to start creating Sankey diagrams with no code required is
Flourish.
MARIMEKKO CHART
This chart can be considered a stacked bar chart with bars that vary in
height and width. Both of this chart’s axes are completely stacked in a 2-D
effect. The chart is used to showcase two numerical values for each category,
one through the width of each bar and one through its height.
FIGURE 3.19
BULLET CHART
Also known as a bullet graph, this chart allows you to perform three
activities in one go. These functions are:
FIGURE 3.20
The bullet chart is a cousin of the bar graph appearance-wise. The featured
value is represented by a rectangular bar and a vertical line represents the
target value. If the rectangular bar fails to meet the position of the vertical
line, then the featured value is less than the target value. The featured value
has met or is greater than the target value if the horizontal bar touches or
reaches past the vertical line. The chart can be oriented either horizontally
or vertically.
While a bullet graph features a primary value for comparison, it also
includes other measured values to enhance the visual display and to provide
more information for analysis. So, while the primary featured value of a
bullet chart might be the end-of-year revenue of a company, the chart can
also feature the end-of-year revenue for previous years or projected
forecasts for the future. The primary featured value must stand out on the
chart though and this is why it has a stronger color and bold lines to
differentiate from the other values.
A good bullet chart features five main characteristics:
1. Text labels
These are captions that state what the chart is displaying and the unit of
measurement for the values.
2. A quantitative scale
This shows the linear progress of the measure of the metric values. In other
words, it shows the start and end points of the featured and target
measurements.
3. A featured measure
4. A comparative measure
This highlights the target metric that the featured value is being compared
to.
5. A qualitative scale
This is a background fill that shows ranges like good, satisfactory, and bad
to define how the featured value compares to the target value.
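To show how these parts work together, here is a small Python sketch that "reads" a bullet chart the way a viewer would: it checks the featured measure against the target and finds the qualitative band it falls in. The band boundaries and the revenue figures are invented for illustration, and the function name is hypothetical.

```python
def read_bullet(featured, target, bands):
    """Summarize a bullet chart reading.

    `bands` is a list of (upper_bound, label) pairs sorted from the lowest
    range to the highest, representing the qualitative scale.
    """
    band_label = bands[-1][1]  # values past the last bound stay in the top band
    for upper_bound, label in bands:
        if featured <= upper_bound:
            band_label = label
            break
    return {
        "meets_target": featured >= target,  # the bar touches or passes the line
        "gap": featured - target,
        "band": band_label,
    }


# Hypothetical revenue in thousands, on a quantitative scale from 0 to 300.
bands = [(150, "bad"), (225, "satisfactory"), (300, "good")]
reading = read_bullet(featured=270, target=250, bands=bands)
shortfall = read_bullet(featured=200, target=250, bands=bands)
```

Here the first reading meets its target and lands in the "good" band, while the second falls 50 short and sits in the "satisfactory" band.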
When designing bullet charts for maximum effectiveness, use a color
coding system. In addition to using a strong, clear color for the featured
value, you should further highlight its measure by using softer colors for the
comparative values. Also, use a scale from dark to light to show the
qualitative scales. The darker colors will represent the lower, less favorable
values, while lighter hues will showcase the higher, more favorable values.
Limit the number of variables added as well. Because so much is going on
in one chart, it can be confusing at first glance to a new viewer.
Make sure to explain further so they understand the chart and limit the
number of values displayed as much as possible.
Additional tips for maximizing the use of bullet charts include:
Redesign the chart for values that require low numbers, like costs
and expenses.
Ensure the qualitative scale (background fill) is in sync with the values
being displayed.
When reversing the qualitative scale, keep your audience in mind.
Will they quickly glance at the chart and determine the low results?
Consider reversing the direction of the quantitative scale if not.
DUMBBELL PLOTS
Also called connected dot plots, gap plots, range plots or arrow plots,
dumbbell plots showcase 2 or more related series of data on the same axis
through the use of connected dots linked by straight lines. These charts are
great for showing the range between the minimum and maximum values of
categorical data. They are called dumbbell charts because they imitate their
namesake with dots at either end of a straight line.
FIGURE 3.21
All these ‘dumbbells’ are plotted on the same chart, making efficient use
of the graph space when multiple data variables are being highlighted.
One axis of the chart shows the range of values or the categories into which
the data points have been grouped. The second axis highlights the number of
data points in each category. The chart can be plotted either vertically or
horizontally, whichever works best for readability in the particular instance
of the data set.
The values are normally quantitative in nature. They are a great alternative
to using line charts or grouped bar charts. They can be used to represent up
to 1,000 data points, but note that the chart gets more cluttered as more data
points are added.
FIGURE 3.22
While dumbbell plots should not be used to plot large data sets due to this
possible clutter, advantages to using these charts include:
Dumbbell plots are such a great comparison chart because, at a glance, the
audience can note the trend that the data is taking as well as any skewness
suggesting anomalies. They are used often by financial institutions in
instances like expressing interest rate projections and in scientific research
to highlight the similarities and differences between variables.
The name says it all. Distribution charts allow for visualizing how data
values are distributed or spread out on a grid. They let the audience
know how frequently these values occur and how uniformly or
unevenly they are dispersed. Several groups of data can be compared in such charts
as a result.
Let’s get right into noting the different types of charts that fall under this
bracket and when and how you should use them for maximum
effectiveness.
HISTOGRAM
This is a common chart that most people are familiar with. It is most
frequently used to show statistical distributions as it plots the dispersion of
a numeric variable’s values as a series of bars. The numeric values are
grouped into classes that occur in equal-sized intervals. Classes are also
called bins. This feature allows the audience to understand the approximate
probability of the quantity occurring. The bars can adopt either a horizontal
or vertical orientation. Each bar normally covers a class. Thus, a bar’s
height indicates how frequently data points occur within the corresponding
bin.
FIGURE 4.1
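Binning is simple enough to do by hand. The Python sketch below counts how many values fall into each equal-sized bin; the ages are made up, and the choice to place a value that sits exactly on the top edge into the last bin is one common convention, assumed here for illustration.

```python
def histogram_counts(values, start, bin_width, n_bins):
    """Return (lower_edge, upper_edge, count) for each equal-sized bin."""
    counts = [0] * n_bins
    top_edge = start + n_bins * bin_width
    for value in values:
        index = int((value - start) // bin_width)
        if value == top_edge:
            index = n_bins - 1  # assumption: the top edge belongs to the last bin
        if 0 <= index < n_bins:
            counts[index] += 1  # values outside the range are ignored
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, count)
        for i, count in enumerate(counts)
    ]


# Hypothetical ages of 11 customers, grouped into three 10-year bins.
ages = [21, 23, 25, 29, 31, 34, 35, 38, 42, 47, 50]
bins = histogram_counts(ages, start=20, bin_width=10, n_bins=3)
```

Each count becomes the height of one bar. Changing the bin width can change the shape of the histogram noticeably, so it is worth trying a few widths before settling on one.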
DOT PLOT
This chart is sometimes shrouded in confusion because it is often mistaken
for others like icon bars, scatter plots, dumbbell charts, and beeswarms. But
we will list the features and what this chart is used for to ensure you do
not suffer the same fate.
A standard dot plot is a statistical chart that shows at least one quantitative
value for each category by plotting one or more dots on a numerical axis.
This data is represented by filled-in circles. Unlike the histogram that
displays distribution on a range, the dot plot displays the distribution of
individual values along the X axis. The dots show the frequency at which
these individual values occur along the Y axis. As a result, dot plots are
particularly useful for showing specific values and how they compare to
different categories with similar values. Such a comparison may be hard to
note on a bar chart since the ends of the bars are difficult to compare to each
other.
FIGURE 4.3
Color is an instrumental part of this chart. You can use color in various
ways to maximize the chart's effectiveness. Such techniques include:
Highlighting the numeric value of dots with darker and lighter hues
to emphasize that value.
Color coordinating dots according to their category. This should be
a consideration when there are multiple categories with the same
values. Color coordination makes it easy to distinguish these
categories.
Symbolizing time periods to show the past and present through
chronological values.
Changing the opacity of the dots if there are slight overlaps to
make it easier to read or highlight higher or lower values.
RIDGELINE PLOT
Also known as a joy plot, a ridgeline plot is a chart highlighting how numeric
values across several overlapping data groups are distributed. The chart's
name comes from the fact that it resembles an overlapping range of
mountains. As a rule of thumb, use this chart when you need to highlight
the distribution of at least 6 groups of data. You will find many cases with
large volumes of data featuring time displayed in such a chart. An example
would be a measure of the rainfall in a particular region over the last 20
years. The chart would give insight into the values of each year and allow
for noting the trends that have developed in that timeframe. Comparing
temperature averages between separate years is also a common dataset for
this chart.
FIGURE 4.4
Ridgeline plots are great when you want to quickly see data distribution
over a specified period. This allows you to easily see where the spikes and
dips are and compare them to different categories. They are a creative chart
you don't always see used, but they’re worth experimenting with to create a
memorable visual.
FIGURE 4.5
BOX PLOT
This chart utilizes lines and boxes to show the distribution of values of at
least one dataset.
FIGURE 4.6
FIGURE 4.7
The interquartile range (IQR), the distance between the first and third
quartiles, determines how long the lines extend on either end of the box. The
most that either line can extend is 1.5 times the IQR. Any value beyond
that is marked by a dot representing an outlier. Box plots can be
effective when you're looking for outliers in your data, as you can set the
parameters and they will appear as separate dots outside the plot.
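The 1.5 times IQR rule can be worked through in a few lines of Python. This sketch uses the standard library's statistics.quantiles function with its "inclusive" method; charting tools differ slightly in how they calculate quartiles, so treat the exact numbers as one reasonable convention rather than the only one. The data set is invented.

```python
import statistics


def box_plot_stats(values):
    """Compute the quartiles, whisker ends, and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # the furthest the lower line may reach
    high_fence = q3 + 1.5 * iqr  # the furthest the upper line may reach
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),   # lines stop at the last value inside the fence
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical delivery times in days, with one unusually late order.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_stats(data)
```

With these numbers the box runs from 4 to 8, the upper line stops at 9, and the value 30 is drawn as a separate dot.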
Use this type of chart when comparing the distribution of the values
between multiple groups of data as they provide several details at a glance.
Such details include:
Data symmetry
The level of skewness
Any variance
Outliers
FIGURE 4.8
FIGURE 4.9
CANDLESTICK PLOT
Stocks, derivatives, currencies, cryptocurrencies, bonds, commodities... If you
plan to or already invest in or trade any of these assets (and others), you
will use and analyze candlestick plots. Invented in Japan over 100 years ago
by a man called Munehisa Homma, they are used to analyze the price
movements of assets over time. While the invention was made to show the
link between the price of rice and its supply and demand, in this day and
age, their use is being applied to a wider variety of resources and their price
variations.
FIGURE 4.10
Let’s start with what this chart looks like. It gets its name because it features
multiple bars with lines on both ends that look like wicks, so the bars look like
candlesticks. These candlesticks give a visual representation of how the
asset’s price changed over a given amount of time. This time period can be
as little as a few minutes or as long as years.
FIGURE 4.11
Each “candlestick” on the chart provides traders and investors with several
sets of important information. First, the bar's color indicates whether the
price increased or decreased. An increase in an asset’s price is represented by
green or white. This candle is referred to as bullish and shows that the
asset's value closed for that period of time at a higher price than it opened
with. The opening price is at the bottom of the bar while the closing price is
at the top.
A decrease in an asset’s price is represented by red or black. This candle is
deemed a bearish one and shows that the asset's value closed at a lower
price than it opened at during that time frame. The opening price is indicated
at the top of the bar, while the closing price is at the bottom. This is the
opposite of the bullish candlestick.
The difference between the price the asset closed with and opened with is
represented by the length of the bar between the wicks. Longer lengths
indicate a larger change in price. Shorter lengths indicate smaller price
changes.
The wicks also serve a purpose. The upper wick shows the highest price of
the asset traded during the specified time, while the lower wick shows the
lowest price.
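The rules above can be summarized in a short Python sketch that takes the four prices for one period and describes the resulting candlestick. The prices are invented, and the "flat" label for a period that closes exactly where it opened is an assumption added so that every case is covered.

```python
def describe_candle(open_price, high, low, close):
    """Describe one candlestick from its open, high, low, and close prices."""
    body_top = max(open_price, close)
    body_bottom = min(open_price, close)
    if close > open_price:
        direction = "bullish"  # closed higher than it opened (green or white)
    elif close < open_price:
        direction = "bearish"  # closed lower than it opened (red or black)
    else:
        direction = "flat"     # assumption: label for open equal to close
    return {
        "direction": direction,
        "body_length": body_top - body_bottom,  # size of the price change
        "upper_wick": high - body_top,          # reach up to the highest price
        "lower_wick": body_bottom - low,        # reach down to the lowest price
    }


# Hypothetical prices for one trading day.
candle = describe_candle(open_price=100, high=112, low=97, close=108)
```

This candle is bullish with a body of 8, an upper wick of 4, and a lower wick of 3.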
Candlestick plots are useful for quickly determining whether an asset’s
value went up or down and if this movement is peculiar or part of a trend.
Time is money for traders and investors. Waiting too long can mean they
miss out on potential buying opportunities. Therefore, this fast analysis is
key to staying ahead of the game with quick decision-making.
VIOLIN PLOT
Similar to a box plot, the violin plot highlights the distribution of the
numeric data for at least one group using density curves. The box plot and
violin charts are so closely related that they typically accompany each other
when presenting to provide supplementary information. A violin plot
includes all the features found on a box plot, such as the median, outliers,
quartiles, and the spread, but the difference is that the violin plot also shows
the probability of data occurring at different values.
FIGURE 4.12
Each density curve is made up of peaks, valleys, and tails. The density
curve is typically produced with a kernel density estimator (KDE). Each data
point contributes a small area of a fixed shape to the overall curve. This
shape is called the kernel function and can vary from triangular to
bell-shaped. The distribution of the data points determines the shape of the
curve: the final density curve is formed by stacking the contributions of all
the data points together to form a whole. Density curves are developed
around a center line instead of a baseline but follow the same convention of
construction and interpretation.
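For readers curious about how the stacking works, here is a minimal Python sketch of a density curve built with a bell-shaped (Gaussian) kernel. The data points and the bandwidth of 0.5, which controls how wide each point's bump is, are assumptions chosen for illustration; charting software normally picks the bandwidth for you.

```python
import math


def gaussian_kernel(u):
    """Bell-shaped kernel: the bump that each data point contributes."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(data, x, bandwidth):
    """Height of the density curve at x: the average of every point's bump."""
    total = sum(gaussian_kernel((x - point) / bandwidth) for point in data)
    return total / (len(data) * bandwidth)


# Hypothetical measurements: three clustered values and one far away.
data = [1.0, 1.5, 2.0, 5.0]
step = 0.1
grid = [i * step for i in range(-50, 121)]  # x values from -5.0 to 12.0
density = [kde(data, x, bandwidth=0.5) for x in grid]

# The area under the whole curve is approximately 1.
area = sum(density) * step
```

The curve peaks near 1.5, where three points sit close together, has a valley between 2 and 5, and shows a smaller peak at 5. A violin plot mirrors this curve on both sides of the center line.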
Violin plots are especially useful for showing the distribution of values
between multiple data groups so that comparisons can be made by noting
the differences and similarities between each group's peaks, valleys, and
tails.
FIGURE 4.13
STRIP PLOT
This chart is similar to a scatter plot but is used to express the value of 1
variable per column. Strip plots can be effective when raw data is important
because they simply show the data with no added trend lines or design
features. Wherever the data points fall, you can easily analyze them. This
also makes it easy to spot outliers in the data.
FIGURE 4.14
FIGURE 4.15
Due to the high concentration of the dots that is possible with this type of
chart, legibility decreases with increasing dot numbers. Don’t trash your
strip plot if that happens. Instead, use a technique called jittering. All you
have to do is disperse the data points across the X-axis to increase
readability. You can also convert to a beeswarm plot, which is essentially the
same concept.
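Jittering takes only a few lines of code. The Python sketch below assumes the categories sit on the X axis and nudges each point sideways by a small random amount, leaving the measured values untouched. The jitter width of 0.2 and the fixed seed, used so the result is repeatable, are illustrative choices.

```python
import random


def jitter(positions, width=0.2, seed=0):
    """Shift each X position by a random amount between -width and +width."""
    rng = random.Random(seed)  # fixed seed so the chart looks the same each run
    return [p + rng.uniform(-width, width) for p in positions]


# Hypothetical data: four points in category 1 and three in category 2.
category_positions = [1, 1, 1, 1, 2, 2, 2]
values = [5.0, 5.0, 5.1, 5.0, 7.2, 7.2, 7.3]  # plotted on the Y axis, unchanged

jittered_x = jitter(category_positions)
```

Keep the width well below half the distance between categories so that the points from one column never drift into the next.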
BEESWARM PLOT
Think of this chart as a strip plot that's easier to read. A beeswarm plot has
the same functionality as a strip plot, simply showing the data exactly
where it lies. It has a slight advantage in that it improves on the jittering
effect by spreading the points apart, which reduces overlap. Because of this,
beeswarm plots are great for displaying the distribution of dense data sets.
FIGURE 4.16
The name comes from the fact that the slightly spread-out nature of the
dots looks like bees buzzing around a hive. Improve the chart's readability
by adding labels to make outliers clearer to identify. A good data analyst
visualizes data beautifully. A great one does the same thing and also
highlights the key insights.
FIGURE 4.17
Distribution charts allow for noting when trends develop and when
unusual values (outliers) occur over a given period. To select the right
distribution chart for your audience, you need to ask yourself a few
questions:
Answering these questions will give you a clear picture of what you are
trying to accomplish by developing distribution data visualizations. The list
of charts outlined above is by no means all that can be used to show the
spread of values in data sets but they are a good foundation that allows you
to express the dispersion of several types of data.
5
PART-TO-WHOLE
PIE CHART
Pie charts are the data visualizations that typically come to mind when
talking about part-to-whole comparisons. Therefore, it is only fitting that
we start our exploration of these chart types here. Just as the name suggests,
a pie chart takes on the form of a circle and even if you are not hungry, it is
quite reminiscent of a pie. The entire ‘pie’ represents the total value of the
data set (100%). The ‘pie’ is sliced into radial portions that present the
categorical variables of the subsets that make up the total. The sizes (AKA
the arc length and area) of those portions showcase the proportion of the
whole they take up.
FIGURE 5.1
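The link between a value and the size of its slice is plain proportion: each category gets the same share of the 360 degrees as it has of the total. A minimal Python sketch, using invented vote counts and a hypothetical function name:

```python
def pie_slices(values):
    """Return (name, share, start_angle, end_angle) for each slice, in degrees."""
    total = sum(values.values())
    slices = []
    start = 0.0
    for name, value in values.items():
        share = value / total   # proportion of the whole
        sweep = share * 360     # the slice's angle at the center
        slices.append((name, share, start, start + sweep))
        start += sweep
    return slices


# Hypothetical vote counts, in thousands.
votes = {"Party A": 50, "Party B": 30, "Party C": 20}
slices = pie_slices(votes)
```

Party A holds half the votes, so its slice sweeps half the circle, and the last slice ends back at 360 degrees.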
But using a pie chart correctly is crucial; otherwise, you're doing more harm
than good. Would I use a pie chart to showcase a neck-and-neck race
between 5+ political parties? Absolutely not. There will be no significance
or uniqueness to the information. I would use it to compare the top two
parties, so you can easily see who has the majority of the whole. This is a quick
but effective way to visualize this type of data. A pie chart with too many
slices shouldn't be a pie chart to begin with.
FIGURE 5.3
In this case, you could use a bar chart or lollipop chart since the values are
very similar.
FIGURE 5.4
To get the most value from a pie chart, only use it to highlight a few values
that can easily be distinguished. A pie chart filled with lots of figures with
similar values will essentially not show you anything. You need to be able
to easily distinguish the contribution of each value, noting the highest and
lowest, or anything else of significance in the data, at a glance.
A great practice that will maximize the use of pie charts includes using
color appropriately. Do not use highly distracting colors, and keep those
colors consistent throughout the presentation. Ensure that these colors
reflect the theme of the data. You can be sparse with your use of color by
only adding color to the main insights that are being showcased and leaving
the rest of the slices gray. This is a great practice as it allows the audience to
decipher the important parts and can be effective if you have many
categories.
Since exact proportions can be difficult to interpret by looking at a pie
chart, you can consider adding annotations to the chart. In fact, the addition
of annotations to pie charts is standard. They can be in the form of fractions
or percentages with the category name stated.
An alternative to using a pie chart is the use of a donut chart. Also known as
a donut plot, visualize this as a pie chart with the center removed. There is
no significant difference in its readability.
FIGURE 5.5
TREE MAP
Pie charts are the go-to when it comes to comparing the different elements
of a single entity. However, they can be difficult to use effectively. That is
why there are other options such as this one, the tree map, that can be a lot
more effective.
FIGURE 5.6
The tree map, in many ways, is thought to be a better version of a pie chart
by data visualization experts. It represents hierarchical data in a tree-like
structure with sub-branches of the data being represented using rectangles
called nodes. Each node allows for the showcase of 2 quantitative values.
This structure makes it easier to spot trends like the bestselling items
bought by new customers in the current year and the growth rate from the
previous year. Even better is that data can be drilled down into many
levels while still maintaining the distinguishability of the
categories at a glance. They are often populated in a hierarchical order,
showing the highest value first and down the ladder to the lowest. They can
even be made interactive with certain software so the reader can take their
time and go through each point if all of the data is significant.
FIGURE 5.7
Unlike pie charts, tree maps allow a larger amount of data input. More
categories can be highlighted within a smaller space. To be more exact,
treemaps can be used to plot tens of thousands of data points! So, why pick
this hierarchical chart over others like a multi-level pie chart? Treemaps
have the advantage of allowing the plotting of these many, many data points
in limited space. Even a multi-level pie chart is circular, so the space
available is limited by the diameter of the circle. Only so many data points
can be added to the structure.
On the other hand, treemaps are plotted in a linear fashion. This space
offers far more possibilities. Do note that the deeper we delve into the levels
of a tree map, the more readability decreases. Therefore, this advantage can turn
against you if you are not careful with its use. In some cases, removing data
points to create a smaller dataset will remove critical insights. In this case,
it can be an effective practice to make the chart interactive and send it out
to the team, so they can easily zoom in on details and categories no matter
how big or small the chart is.
The structure of tree maps also allows for easy identification of trends and
patterns as the nodes are proportional to the amount of data they represent.
The similarities can be summarized within a category and its components or
between multiple categories. This is possible because the
different datasets are assigned different colors. Anomalies can be spotted as
well, thanks to the dimensions and colors of the nodes, which are derived
from the numerical values of the nodes.
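To see what "proportional" means in practice, here is a deliberately simple Python sketch that splits one rectangle into side-by-side nodes whose areas match their values. Real tree map tools generally use more sophisticated layouts that keep the nodes closer to squares and nest sub-branches inside their parents; this single-level "slice" layout, the sales figures, and the function name are all assumptions made for illustration.

```python
def slice_layout(values, x, y, width, height):
    """Split a rectangle into nodes, largest first, with area proportional to value.

    Returns (name, node_x, node_y, node_width, node_height) for each node.
    """
    total = sum(values.values())
    nodes = []
    cursor = x
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    for name, value in ordered:
        node_width = width * value / total  # same height, so width sets the area
        nodes.append((name, cursor, y, node_width, height))
        cursor += node_width
    return nodes


# Hypothetical sales by product category, drawn in a 100 x 60 canvas.
sales = {"Phones": 30, "Laptops": 50, "Tablets": 20}
nodes = slice_layout(sales, x=0, y=0, width=100, height=60)
areas = {name: w * h for name, _, _, w, h in nodes}
```

Laptops account for half of the sales, so their node covers half of the canvas, and the nodes are ordered from the highest value to the lowest.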
Use a data set with a distinct hierarchy to get the most out of your tree map.
Ensure that the highest level of the hierarchy is obvious. Also, ensure this
data set has distinct numerical values. Tree maps can also be useful when a quick
presentation isn't necessary and the executives want to take their time
reviewing the data in an interactive way, as stated above.
This chart is not appropriate if your values are similar, as the nodes will
be similar in size and hard to distinguish. In such a case, the better
alternative would be a bar chart with the data arranged from the highest to
the lowest value.
SUNBURST CHART
This chart goes by other names, including radial tree map and ring chart. It
is also used to visualize hierarchical data sets. Unlike the tree map, which
uses a linear structure, the sunburst chart uses a series of concentric rings to
highlight hierarchy. Every ring coincides with a level within the hierarchy.
The details of that data set are recorded in the segments of each ring. Each
segment shows the part-to-whole relationship of a subset of data relative to
its parent ring.
FIGURE 5.8
The radial layout of the sunburst chart gives an immersive experience and
is easy for the eye to follow. The center of the chart is the first level of
the hierarchy, where the parent ring is found. From there, rings
representing subcategories are plotted around their parents, ordered from
the category with the highest value to the one with the lowest value. This
ordering must be applied at every level of the hierarchy. Moving away from
the center of the chart means moving down the hierarchy.
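If you are curious how the segments of each ring could be sized, the sketch below works out the start and end angle of every segment in a two-level hierarchy, ordered from highest to lowest value at each level. The function name `sunburst_angles` and the grocery numbers are assumptions made for this example; charting tools handle this for you.

```python
def sunburst_angles(hierarchy):
    """Return {label: (start_deg, end_deg)} for parents and their children.

    `hierarchy` maps each parent to a dict of {child: value}.
    Parents and children are ordered from highest to lowest value.
    """
    totals = {parent: sum(kids.values()) for parent, kids in hierarchy.items()}
    grand_total = sum(totals.values())
    angles = {}
    start = 0.0
    for parent in sorted(totals, key=totals.get, reverse=True):
        span = 360.0 * totals[parent] / grand_total
        angles[parent] = (start, start + span)
        child_start = start  # children share the angular range of their parent
        kids = hierarchy[parent]
        for child in sorted(kids, key=kids.get, reverse=True):
            child_span = 360.0 * kids[child] / grand_total
            angles[f"{parent}/{child}"] = (child_start, child_start + child_span)
            child_start += child_span
        start += span
    return angles


# Hypothetical data for illustration only.
groceries = {
    "Fruit": {"Apples": 30, "Pears": 10},
    "Veg": {"Carrots": 60, "Peas": 20},
}
for label, (a, b) in sunburst_angles(groceries).items():
    print(f"{label}: {a:.0f} to {b:.0f} degrees")
```

Each child sits inside the angular range of its parent, which is what makes the part-to-whole relationship readable as you move outward from the center.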
FIGURE 5.9
This chart is often compared to a tree map, over which it has one advantage.
The categories are laid out in an outwardly expanding circle, so following
the plotted categories as we go down the hierarchy becomes easier because
of that expansion. If you have a lot of space to work with, this chart can
trump a tree map for getting a full picture of hierarchical data.
Another advantage to using a sunburst chart is that because it is visually
similar to a pie chart, most audiences can more readily follow the flow of
information it offers.
The disadvantages of using a sunburst chart include the limit its structure
places on the number of levels that can be plotted. Just like a pie chart,
the number of categories that can be included in the circular space is
smaller than in a linear structure. Also, angles and small proportional
segments might be difficult for the audience to read.
Circumstances where you can play on the advantages while minimizing the
disadvantages include:
Also called the coxcomb chart or the polar area diagram, this chart gave her
this ability by combining components of a column chart and those of a radar
chart and so, this chart was presented to the world in 1858. Plotting occurs
in proportional areas in a polar coordinate grid system. These areas
(categories) are equally divided into segments with the same angle.
The funny thing about the dataset she visualized is that other chart options
probably would've been better for the data. But this rendition has stuck
within the data community until today and will for years to come. A
Nightingale Rose chart is exactly what you would use when you want a
memorable visual to enhance your story.
FIGURE 5.11
The most notable feature of the Nightingale Rose chart is that it can be used
to plot multiple data series. They are represented by rings that radiate from
the center of the chart, which is why this chart looks similar to a pie chart. They
represent the cardinal points North, East, South, West and the points in
between or the degrees of a circle. The data values are recorded on these
circles and divided into proportional slices representing the quantity. The
value is highlighted by how far the segment extends from the center.
FIGURE 5.12
This chart is widely used in the scientific field to highlight statistics. For
example, meteorologists use it to record and analyze quantities and
directions, such as wind direction, strength and frequency. That is why the
chart is referred to as the wind rose in that field. It is used as a
reference to discern the cardinal and ordinal directions of winds.
As great a statistical tool as this chart is, it has a major disadvantage. The
outer chart segments are larger, so their size draws more attention. This
can suggest a disproportionate increase in value when this is not the case,
because the value is represented by the segment's area and not its radius.
This can be unintentionally misleading to audiences.
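The area-versus-radius point is easier to see with a little arithmetic. A wedge with angle θ and radius r has area (θ/2)·r², so if the area is meant to equal the value, the radius only grows with the square root of the value. The sketch below assumes 12 equal-angle segments and made-up values; the function name `wedge_radius` is hypothetical.

```python
import math


def wedge_radius(value, n_segments):
    """Radius of an equal-angle wedge whose AREA equals `value`."""
    theta = 2 * math.pi / n_segments  # every segment gets the same angle
    return math.sqrt(2 * value / theta)


r_small = wedge_radius(100, n_segments=12)
r_large = wedge_radius(200, n_segments=12)

# Doubling the value doubles the area but stretches the radius by only ~1.41x.
print(round(r_large / r_small, 3))
```

A reader who judges segments by how far they reach from the center will therefore misjudge the values, and a designer who scales the radius directly with the value will make large values look far bigger than they are.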
THE FREQUENCY with which we work with different data sets makes it easy
to forget that we also need to understand the interconnectivity and
differences between the subgroups within individual data sets. This chapter
highlighted that importance and gave you a variety of chart options to act
on that understanding.
6
RELATIONSHIP
As seen with the example above, scatter plots are used primarily to note
relationships between 2 quantitative variables. They note not only data
points but the patterns of the data as a whole as well. For example, the chart
explained above may show a concentration of dots representing the
majority of home sales based on size. By looking at how the dots are
concentrated, you can determine how the relationship between the variables
can be described. Is the relationship weak or strong? Is it positive or
negative? Is it linear or nonlinear? You make this determination by
examining how the values on one axis change as the values on the other
axis change.
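One common way to put a number on "weak or strong, positive or negative" is the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below is a plain-Python version; the house sizes and prices are invented for illustration and are not taken from the figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: +1 strong positive, -1 strong negative, 0 no linear link."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical home sizes (square feet) and sale prices (thousands of dollars).
size_sqft = [800, 1200, 1500, 2000, 2600]
price_k = [150, 210, 260, 330, 420]

r = pearson_r(size_sqft, price_k)
print(f"r = {r:.3f}")  # close to +1 for this made-up data: strong, positive
```

Keep in mind that this number only describes linear relationships. A clearly curved pattern can have a low coefficient, which is one more reason to look at the scatter plot itself.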
Additionally, scatter plots can be used to identify unexpected gaps or
outliers in the data. These will stand out quite a bit because they sit
apart from the concentration of the other dots.
While a scatter plot is quite useful in showing the relationships between 2
data variables, that functionality will be lost if they are not used correctly.
The don'ts of scatter plots include:
Overplotting
As with other charts that utilize dots as the visual representation of data, too
many plotted items can lead to overlapping. The high density of data points
occurs when there are many dots in one location. The chart becomes hard to
understand as the points are hard to distinguish. As a result, the
relationships these points signify will also be hard to distinguish.
Luckily, if you find yourself in such a situation with your scatter plot, there
are 2 easy ways to resolve this. They are:
FIGURE 6.2
Although overplotting can be a weakness of scatter plots, it can also be
their strength. Scatter plots can be effective when working with very large
sets of data. They allow you to easily see the trend or trajectory of the
data and look at it as a whole instead of as individual points. Adjust your
scatter plot accordingly based on your needs.
Interpret correlation as causation
Noting the relationship between 2 variables of data does not mean that the
causation is understood. This serves as a lesson in real life and when using
scatter plots. While this point is more geared toward the issue regarding
observation and not the creation of a scatter plot, you as the designer must
understand this. You may need to also include data that supports
highlighting the causation of trends noted in these charts. Always remember
that a change in one variable is not necessarily responsible for or linked to
changes in another. The observed relationship between these 2 variables can
be driven by a third variable affecting both plotted variables. With the
housing price example above, there is no way to know if square footage and
price affect each other unless data is noted outside the chart to support this.
The pattern can be purely coincidental. Looking at other factors like
location and the year the house was built might bring a lot of insights to
light. A newer home closer to downtown might have much less square
footage but be at a much higher price than one further from downtown and
potentially built some time ago.
Properly observing the data is a must when it comes to scatter plots. Some
data sets may be harder to understand than others. It's important to consider
all factors before drastic changes are made based on the insights.
To maximize the use of scatter plots, you can:
Add a trend line
A trend line or regression line is a line added to a chart to indicate the
general trajectory the data takes. This can help your audience better
understand what is going on. Remember, you've been staring at these
spreadsheets and creating this visual for quite some time. Your audience has
been looking at it for 5 minutes. Don't assume they know what you know.
In fact, it is a common practice for this addition to be made to show the
strength of the relationship between the two variables. The presence of this
line also makes it more apparent if there are outliers. They may affect the
trend indicated by the line.
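Most tools add the trend line for you, but it helps to know what is happening underneath. A common choice is an ordinary least-squares line, sketched below in plain Python. The function name `fit_line` and the data points are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points for illustration.
xs = [1, 2, 3, 4, 5]
ys = [12, 15, 21, 24, 30]
slope, intercept = fit_line(xs, ys)

# The trend line is drawn from these two numbers: y = slope * x + intercept.
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Because the line is pulled toward every point, a single extreme outlier can tilt it noticeably, which is why outliers deserve a second look before you present the trend.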
FIGURE 6.3
RADAR CHART
Also called a spider chart, web chart, or radial chart, a radar chart displays
multiple quantitative variables on axes that all start from the same
center point. The general shape of the final layout of data resembles that of
a web.
FIGURE 6.4
Radar charts are used when there are large numbers of variables. These could
be plotted on a bar chart, but the bar chart then tends to look cluttered
and the data becomes hard to understand. Radar charts also make it easier
to review multiple performance metrics of a single subject area.
One niche where radar charts are commonly used is in employee
performance reviews. You may have seen it labeled as an Employee Chart
outlining the employee’s ratings in different skillset areas like punctuality
and technical knowledge. This is useful in such an arena because radar
charts make commonalities and outliers strikingly obvious.
FIGURE 6.5
You can maximize the use of radar charts with a few practices that include:
CHORD DIAGRAM
This type of chart highlights the connections or flow of information
between several data variables called nodes. With chord diagrams, these
connections are represented by fragments of a larger circle. Arcs are used to
connect the nodes. The size of the arcs highlights the strength of the
connections.
FIGURE 6.6
This chart is particularly useful when the visual appeal of the data presented
is important. Therefore, you might find information like immigration flow
from one country to the next represented by a chord diagram.
FIGURE 6.7
NETWORK DIAGRAM
Like a chord diagram, network diagrams show interconnections between
data variables with nodes representing each entity and connections between
these nodes with links.
FIGURE 6.9
Network diagrams are typically used with larger, more complex sets of data,
and most data analysts will rarely see one in their career. They show a
broad overview with the ability to zoom in and view specific relationships.
FIGURE 6.10
There are 4 types of network diagrams to choose from. Which one you
choose to use depends on the method of data input. Let’s highlight the
specifics for each input method:
Undirected and unweighted
This input type shows that the entities are connected but there is no
direction and no weight. An example would be to plot the data points that
show Jim, Linda, and Lauren live in the same house.
Undirected and weighted
With this input type, the data points are connected and give information
based on the weight of the relationship. To illustrate, let’s say that these
people above are connected if they published a blog together. The weight of
the line is the number of blogs they have published together. The more
pieces of work they have published together will determine the strength of
the relationship.
Directed and unweighted
Let’s say Josh reads Amanda’s, Ella’s, and Isabella’s blogs. But only
Isabella reads Josh’s blogs. Ella and Amanda read each other's blogs, and
Amanda reads Isabella's blogs. The connections are unweighted. They are
either connected or not.
Directed and Weighted
People migrate from one country to another. The weight of the line
represents the number of immigrants, and its direction points to the
destination country.
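The four input types above map neatly onto four simple data structures. The sketch below is one possible way to hold them in Python before handing them to a plotting tool. The names follow the examples in the text, but the weights (blog counts and migrant numbers) and the country labels are made up.

```python
# Undirected and unweighted: a set of unordered pairs (who shares a house).
housemates = {
    frozenset(("Jim", "Linda")),
    frozenset(("Jim", "Lauren")),
    frozenset(("Linda", "Lauren")),
}

# Undirected and weighted: unordered pair -> number of blogs published together.
coauthored = {
    frozenset(("Jim", "Linda")): 3,    # hypothetical counts
    frozenset(("Linda", "Lauren")): 1,
}

# Directed and unweighted: reader -> set of people whose blogs they read.
reads = {
    "Josh": {"Amanda", "Ella", "Isabella"},
    "Isabella": {"Josh"},
    "Ella": {"Amanda"},
    "Amanda": {"Ella", "Isabella"},
}

# Directed and weighted: (origin, destination) -> number of migrants.
migration = {
    ("Country A", "Country B"): 1200,  # hypothetical figures
    ("Country B", "Country A"): 300,
}


def is_mutual(graph, a, b):
    """True when a reads b AND b reads a in a directed, unweighted graph."""
    return b in graph.get(a, set()) and a in graph.get(b, set())


print(is_mutual(reads, "Ella", "Amanda"))  # True: they read each other
print(is_mutual(reads, "Josh", "Amanda"))  # False: Amanda does not read Josh
```

Notice that direction doubles the information you need to store: in the directed cases, A to B and B to A are separate entries with their own values.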
Network diagrams can be quite complex to develop, so we use more
advanced platforms and software to create them. For advanced analysts:
depending on the layout algorithm you use, your network diagram will take
a specific form based on the built-in layouts. This form is important, as
the position chosen for each node strongly affects the output your
audience will view. Network diagrams can also be interactive, so you can
select a node and drag it around to better view the relationships. Several
layout algorithms have been developed for different scenarios, including:
Circle
Sphere
Fruchterman-Reingold
Random
Whatever scenario is relevant to your data output, ensure that your links
overlap as little as possible and that edges cross as little as possible.
You can choose to make the lengths of the edges uniform or not. Depending
on your data, you don't always have the freedom to change the overlap. The
data does what it does, but smaller sets showcased in different visuals can
help simplify large data sets.
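To give a feel for what a layout algorithm actually produces, here is a minimal sketch of the simplest one in the list, the circle layout: every node is placed at an equal angle around a circle. The function name `circle_layout` is an assumption for this example; network software ships its own versions of this and of the more involved layouts.

```python
import math


def circle_layout(nodes, radius=1.0):
    """Place nodes evenly around a circle; returns {node: (x, y)}."""
    n = len(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / n  # equal angular spacing between nodes
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


positions = circle_layout(["Josh", "Amanda", "Ella", "Isabella"])
for name, (x, y) in positions.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

A circle layout ignores the links entirely, so edges may cross a lot. Force-based layouts such as Fruchterman-Reingold instead move nodes around based on their connections, which usually reduces crossings at the cost of more computation.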
FIGURE 6.11
These layouts can be customized via the shape and color specifications
of each node to add more insights to the data. You can also customize these
charts by creating a series of network diagrams over a long time span so
values that have changed over time can be compared. We will keep network
diagrams brief in this book, but if they are something you're interested
in, I would recommend expanding your knowledge of them.
TREE DIAGRAM
This chart also goes by the names linkage tree and organizational chart. It
helps audiences note data hierarchy in a tree-like structure. This structure
consists of the following elements:
FIGURE 6.12
It can also effectively showcase sales throughout the year and determine the
item's overall sales volume. Usually, you would use a line chart or a bar
chart, but a parallel coordinates plot adds a refreshing spin to the data. It
also allows you to add an extra variable for categorizing the data, in this
case, determining if the sales volume was average, high, or low.
FIGURE 6.15
The order of the variables determines which trends are visible. Make
these trends clear by ordering variables so that they showcase
certain insights. Try different scaling to note which works best to
suit your data.
Parallel coordinates plots can become cluttered and even illegible
quickly since so many variables can be compared on one chart.
You can avoid this clutter and keep insights clear by using the
technique called brushing. The technique highlights and isolates the
selected line or lines and fades out the others. This makes for easy
interpretation of specific data. This can also be done interactively
as you present, selecting separate lines to showcase their insights.
FIGURE 6.16
Further decrease possible clutter with axis order. Moving just one
variable to a different axis position can minimize the number of
times lines cross. Do not sort variables on the X axis, as this
causes lines to cross.
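The advice above about trying different scaling matters because the variables on a parallel coordinates plot often have very different units. One common option is min-max scaling, which stretches every variable onto the same 0 to 1 range so that each axis uses its full height. The sketch below assumes made-up monthly figures; the function name `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    """Rescale values so the smallest becomes 0.0 and the largest becomes 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: place them in the middle of the axis.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical variables with very different units.
units_sold = [120, 300, 210]
revenue = [2400.0, 9000.0, 5100.0]

print(min_max_scale(units_sold))  # [0.0, 1.0, 0.5]
print(min_max_scale(revenue))
```

After scaling, a line that runs high on one axis and low on the next reflects a real difference in rank between the variables, not just a difference in units.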
DATA SETS OFTEN OVERLAP and mingle with each other. So do subsets in
groups of data. We need to highlight how these variables interact with each
other as they can and do sometimes affect each other. Relationship charts
allow us to dissect those connections.
7
GEOGRAPHICAL
FIGURE 7.2
CHOROPLETH
This type of geographical map displays divided geographical areas or
regions, each shaded or colored based on its numeric value. Choropleths
are great for showing clear regional patterns in data. For example,
unusually high crime rates in a particular neighborhood in contrast to its
adjacent city could be illustrated using a choropleth.
FIGURE 7.3
These types of maps allow you to see the big picture, but they are not great
at allowing your audience to see subtle differences. Using choropleths
as a bird's-eye view and then another chart for a zoomed-in insight can be
an effective strategy to get the visual representation from a map and a
memorable insight from a more detailed view.
Another downside is that equal steps between colors do not necessarily
correspond to equal intervals between your data values. While these charts
are great for highlighting patterns, they are not a great tool for comparing
exact values between regions. If exact values are needed for decision-making,
using a zoomed-in view or another chart together with this chart can be
effective.
With Choropleth maps, color is your best friend and worst enemy. Being
strategic is important for the effective representation of the data. Some
things you can do to maximize your color are:
Use the right color scheme
Use lightness to highlight differences in sequential and diverging color
schemes. In sequential color schemes, gradients run from light to dark to
help the audience spot low, mid and high values, because that is the natural
inclination of the brain. With diverging color schemes, on the other hand,
the extremes need to have the darkest colors and the lightest colors should
be in the center.
Use fewer colors in qualitative color schemes
The more colors on your map, the harder it will be to note their various
meanings. Make it easy for your audience to make that recollection by
limiting it to 3 colors when using qualitative color schemes. This will
ensure that your audience does not have to constantly refer to the key to
familiarize themselves with what these colors represent.
Ensure the audience sees the difference in data
Show the difference in data values with different colors. Use the brightest
and darkest colors to show extremes. Make use of stops. These are equally
sized parts of the color palette. These can indicate low and high values.
Using stops highlights the contrast between these extreme values. Do not
use stops too much as this will cause too much contrast.
Consider using a continuous color scheme over a discrete color scheme
This ensures a smooth visual gradient or eye flow. Continuous color
schemes allow for comparing neighboring regions with one color used in
different shades. On the other hand, discrete color schemes assign distinct
colors to different ranges of values. Subtle differences are not very
noticeable with such a scheme, even though it does allow the audience to
quickly note the range that a value falls in.
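The difference between the two schemes comes down to how a value is turned into a color. The sketch below shows both under simple assumptions: the continuous version blends linearly between a light and a dark shade of one color, and the discrete version just reports which class a value falls into. The RGB values, break points and function names are illustrative choices, not a standard palette.

```python
import bisect


def continuous_shade(value, vmin, vmax, light=(240, 248, 255), dark=(8, 48, 107)):
    """Blend from `light` (at vmin) to `dark` (at vmax); returns an (R, G, B) tuple."""
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values outside the range
    return tuple(round(l + t * (d - l)) for l, d in zip(light, dark))


def discrete_class(value, breaks):
    """Index of the class `value` falls into, given ascending break points."""
    return bisect.bisect_right(breaks, value)


# Two neighboring regions with close values (hypothetical).
print(continuous_shade(48, 0, 100), continuous_shade(52, 0, 100))  # slightly different shades
print(discrete_class(48, [50, 100, 150]), discrete_class(52, [50, 100, 150]))  # 0 and 1
```

With the continuous scheme, 48 and 52 get nearly identical shades. With the discrete scheme, they land in different classes and get visibly different colors, even though the values are almost the same. Neither is wrong, but you should know which effect your map is producing.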
Create an accurate color legend
Ensure that your key is immediately readable to your audience. With a
sequential color scheme, layer the scheme from lowest to highest value with
the 2 to 4 other values in between. These are placed in equally spaced
intervals like 50, 100 and 150. With diverging color schemes, follow the
same logic and also display the center values.
Use labels for relevant insights
Highlight pertinent information with the use of labels.
PROPORTIONAL SYMBOL MAP
Typically making use of circles and squares, this chart scales the size of
simple symbols in proportion to the data so that data volume based on
location can be visualized. The premise supporting the development of such
a chart is simple: the larger the symbol, the larger the volume of data
that exists in that particular location, and the smaller the symbol, the
smaller the data volume recorded for that location. This is because the
symbols are scaled in direct proportion to the data. So, if the data volume
for Florida is twice as large as New York's, then the symbol (usually a
circle) highlighting that volume will be twice as large.
FIGURE 7.4
This data can be grouped into categories or numerical ranges. From this,
graduated symbol maps can be created. This allows you to reduce clutter by
reducing the number of symbol sizes corresponding to different categories.
Proportional symbol maps are useful in a variety of circumstances. First,
they allow for showcasing or comparing the relevance of data value based
on a region. The audience is given a clear insight into the significance of a
region's data. There may be times when smaller regions have more
significant data. They can quickly get lost in the sea of larger regions in
other types of charts but not with this one. The more significant the data
noted in a region, no matter its size compared to others, the bigger the
symbol overtop. Data will not go unnoticed no matter the location.
Proportional symbol maps are also great for highlighting the risks or
chances of something happening in or to a geographical area.
Proportional symbol maps allow great flexibility because they can represent
numerical data like age but also ordered categorical data like low, medium
or high data variables. That flexibility also extends into these charts having
the function to highlight geographical points such as exact locations as well
as geographical areas such as countries over a world map. Large circles will
be easily noticeable and can be immediately understood as a country with a
high value. This makes the chart a great option for a bird's-eye view of
the best-performing regions.
Examples of effective uses could be highlighting the total population of the
10 largest cities in the world or the location and magnitude of earthquakes
in Japan over the last 100 years.
Symbols tend to overlap if values vary widely or if several data locations
are close to each other. Overlapping prevents proper analysis of the data.
But you can still make this chart work for its intended purpose by using
various elements such as size, transparency and color to improve the
audience's ability to interpret different values on the map. These visual
elements help separate the symbols.
Another way to bypass this problem is to move the symbols so that they
have a bit more room to breathe and thus, be clearer to the audience. Be
careful with this practice, though, as you risk removing the symbol from its
factual location. This can lead to misinterpretation of data.
More ways to maximize the use of proportional symbol maps include:
Ensure the size of each symbol is proportional to its data. If one
continent has double the population of another, its circle or square
should have exactly double the area of the other continent's symbol.
Provide context of the scale in a legend so the audience
understands the rough difference between a small and big symbol.
Highlight specific points that are relevant to your presentation.
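A practical detail behind "twice as large": people tend to read a symbol by its area, and the area of a circle grows with the square of its radius. So a circle that should look twice as big needs a radius about 1.41 times larger, not 2 times. The sketch below assumes area-based scaling, which is a common convention but still a design choice; the function names and data volumes are made up.

```python
import math


def circle_radius(value, reference_value, reference_radius):
    """Radius that makes the circle's AREA proportional to `value`."""
    return reference_radius * math.sqrt(value / reference_value)


def circle_area(radius):
    return math.pi * radius ** 2


# Hypothetical data volumes: Florida has twice New York's.
new_york, florida = 100, 200
r_ny = circle_radius(new_york, reference_value=new_york, reference_radius=10.0)
r_fl = circle_radius(florida, reference_value=new_york, reference_radius=10.0)

print(f"radius ratio: {r_fl / r_ny:.2f}")                           # about 1.41
print(f"area ratio:   {circle_area(r_fl) / circle_area(r_ny):.2f}")  # 2.00
```

If you doubled the radius instead, the Florida circle would cover four times the area and the map would overstate the difference.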
FIGURE 7.5
FLOW MAP
This type of geographical chart shows the movement of information or
objects from one location to another. This chart also highlights the value of
these motions. Think of flow maps as a combination of a map and a flow
diagram. The most common use of flow maps is to show the amount and
magnitude of the migrations of items like people, animals, or products in a
single line. Even the flow of money and vehicular traffic can be highlighted
in this way. This relative amount is showcased in the thickness of the lines
in some cases. As a result, flow maps help highlight the distribution of these
data variables geographically. To sum it up, flow maps have four functions.
They show distribution, volume, movement and location.
Flow maps have a unique anatomy. The lines start at the point of origin on
the map and branch out in flow lines. The movement is indicated by an arrow,
and the arrowhead lands on the destination. Together, these parts show how
the illustrated items spread across territories and how those flows differ
from one another.
There are three categories of flow maps:
FIGURE 7.6
These types of flow maps showcase the relationship between one source of
an item and its many destinations and uses. This is highlighted by several
lines coming from the origin and radiating out to show the movement. The
accuracy of the route is not the main focus. Rather, the general direction is.
Radial flow maps are commonly used to show the volume of goods being
traded on a global scale. Getting products delivered to your home from
across the sea is facilitated by charts such as these.
FIGURE 7.7
NO MATTER THE type of flow map you use, there are a few universal
strategies that can be used to maximize their function:
GEOGRAPHICAL CHARTS ARE in a class of their own. There are often times
when we might work with data covering regions or even the whole world. In
many cases, the number of categories that make up this data far exceeds the
comfortable number for your standard chart. Geographical maps are an
excellent way to visualize regional or national data with many categories
and make it easy to interpret.
8
TABLES AND PICTOGRAMS
FIGURE 8.1
Tables are a powerful tool for communication when used correctly and
under the right circumstances. They counter the oversimplified view that
some charts give, as they convey a significant amount of information. They
are best used when highlighting data relating to benefits versus risks to
the audience. This is possible because of the simple yet flexible nature of the
table structure. They can be easily adapted to allow the audience to gain fast
yet efficient readability across rows and columns. Tables can provide
consistency and clarity, both features needed for informed decision-making.
An appropriately designed table allows the audience to quickly extract the
required information, decreasing the cognitive burden placed on the
audience.
When to Use Tables
Instances where using tables makes sense include to:
Allow the audience to look up particular points of information
Often, an audience will not find it pertinent to peruse all the data in a table,
especially since tables tend to be jam-packed with a high volume of data.
The audience will only seek out the data that is relevant to them and the
problem that they are trying to solve. We are naturally attuned to sifting
through data to only focus on what is applicable to the situation.
FIGURE 8.2
Take advantage of this tendency to look up only what is relevant by
structuring your table in such a way that the information of interest to your
audience is not embedded in a block of data. Instead, make the data visible
by laying it out in a way that is natural for the eye to follow and ordered
appropriately. We will cover more of this in the “create better tables”
section.
Highlight precise numbers if they apply to the data presentation
Tables and charts can be used in tandem. It may be hard for the audience to
note figures of interest in charts because they focus on the relationship
between data sets and categories within data variables. But using tables
along with charts allows for more clarity of specific figures such as the best
price to list a product or the best interest rate for the highest rate of return
10 years down the line.
FIGURE 8.3
FIGURE 8.4
Another way of adding a dimension of visualization to your table is
to show development over time. This is facilitated by a tool known
as sparklines. These are mini-line charts at the end of a row that
show the development between the time points. The general trend
is highlighted by each sparkline. It is an easy way to make your
table more visual and insightful.
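Spreadsheet and reporting tools draw sparklines as tiny line charts, but the idea can be imitated with plain text. The sketch below is a rough text approximation that uses Unicode block characters of increasing height. The function name `sparkline` and the monthly figures are assumptions for this example.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block characters, from lowest to highest


def sparkline(values):
    """Return a one-line text sparkline showing the shape of `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)  # flat series: all bars at the lowest height
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)


# Hypothetical table rows: product name and six months of sales.
rows = {
    "Product A": [10, 12, 15, 19, 24, 30],
    "Product B": [30, 26, 27, 21, 18, 12],
}
for name, sales in rows.items():
    print(f"{name}  {sales[-1]:>3}  {sparkline(sales)}")
```

Even in this crude form, a reader scanning the table can see at a glance which row is rising and which is falling, without reading every number.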
FIGURE 8.5
We have addressed how to use tables for the best communication with your
audience. Experiment and try out new techniques. Tables can be more
effective than you think.
Next, let’s discuss the appropriate circumstances for using pictograms and
how you can optimize them.
FIGURE 8.7
Create a key
The key is a tool that highlights the values assigned to each icon or image
used in the pictogram. To illustrate, you may use dog icons to show the
number of stray canines roaming a particular city and in need of a good
home. You may denote a value of 100 for each icon. Another use would be
many small human icons representing 100% of your customer base, with a
percentage of them filled in with color to signify the share who are
returning customers.
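Both uses of the key boil down to simple arithmetic, sketched below. The first function works out how many icons to draw when each icon stands for a fixed amount, and the second fills in a share of a fixed row of icons. The function names, the symbols and the numbers (750 dogs, 30% returning customers) are made up for illustration.

```python
def icon_count(total, per_icon=100):
    """How many whole icons to draw, plus the leftover that needs a partial icon."""
    full_icons, remainder = divmod(total, per_icon)
    return full_icons, remainder


def filled_row(n_icons, percent, filled="●", empty="○"):
    """A row of `n_icons` symbols with `percent` of them filled in."""
    n_filled = round(n_icons * percent / 100)
    return filled * n_filled + empty * (n_icons - n_filled)


# 750 stray dogs at 100 dogs per icon: 7 full icons and 50 left over.
print(icon_count(750))

# 10 customer icons with 30% returning customers.
print(filled_row(10, 30))  # ●●●○○○○○○○
```

The leftover is the part the key has to explain: either draw a partial icon, or pick a value per icon that divides your data more evenly.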
FIGURE 8.8
INFOGRAPHICS
Infographics are information packages that compile a collection of data
visualizations, images and a small amount of text. The aim of infographics is to give
audiences an easy-to-comprehend overview of a subject matter. They serve
as a visual tool to facilitate communication and decision-making.
Pictograms are often used as a feature in infographics. Why? Because they
turn otherwise boring information or data points like statistics into
attractive, eye-catching items. Other items that utilize pictograms to catch
and hold attention include resumes, reports and presentations.
FIGURE 8.10
FIGURE 8.11
Relaying insights doesn't have to be intricate and interactive. Often
something as simple as showing a single figure with a visual to represent it
is all you need to get your point across. The most important thing when
visualizing data is understanding and interpreting the insights. Don't
overcomplicate it if you don't have to. Tables and pictograms are an
excellent way to showcase the data in its purest form.
9
FUNDAMENTALS OF DESIGN
You have chosen the right medium to express the story that you have
developed to relay the information to your audience. But your job is
not done. You need to be able to sell your story to your audience.
Doing so is not based on luck and chance. It is based on you combining the
right elements in the right order to grab their attention and keep them
engaged enough to wonder what’s next. This chapter outlines these
elements and how you can use them most effectively.
Balance of Design
You need to draw your audience's eyes to the key data points by using color
contrast, different colors, negative space, size and shapes. Because we read
from left to right, naturally, viewers’ attention tends to fall on the top left
corner of a plot first. It is a good practice to make use of that space for
important insights or, often, the title. With this, of course, having some
information to back it up is necessary.
“As you can see, we have been steadily declining in our premium
memberships. However, we have recently released the next version of our
software, which has many improvements such as bug fixes and user
experience upgrades. With this release, we have run multiple marketing
campaigns and seem to have brought in many new customers.”
FIGURE 9.1
FIGURE 9.2
Show Clear Movement
A cluttered design will have your audience's eyes darting all over your plot
with no clear spot where they should land or how they should move to
create cohesive absorption of information. Avoid this confusion by creating
a clear flow of information. Another tendency we, as human beings,
exhibit is that we read in an ‘F’ pattern. First, our eyes move from left to
right, then gradually down a page. You can use this tendency to create
movement from key insights to supporting points in your data
visualizations.
If your visualization is static, you can also create this smooth movement by
using colors to direct the audience's eyes across your plot. Movement is
implied when your visualization uses interactive or animated tools, or a
light-to-dark progression of hue.
FIGURE 9.3
A simple trend line visually shows the audience where the data could go
without needing any further explanation. With this, you can add extra
visuals, a table, or insights about why this is happening or how you can
make it happen.
Utilize Patterns to Highlight Insights
Patterns are developed when design elements are repeated. Use this
repetition to your advantage to display similar types of information across
your plots. This repetition can come in the form of colors, types of charts
and the elements used on these charts.
Showing patterns not only highlights similar relationships between different
data groups but also shows anomalies and differentiations when elements
break from the trend of repetition.
FIGURE 9.4
In the case of the ice cream sales, we added a trend line to reveal the pattern
of the data and its trajectory.
Use Proportion
Watching the same old thing over and over again creates boredom. As much
as you want to create coherence in your visualizations, you need to also
spice things up. Use different, interesting, relevant design elements to break
the repetition trend. Instead of falling asleep, your audience becomes more
engaged with you and the visualization presented. In this case, we can view
ice cream sales with a regression line showing the upward trend.
FIGURE 9.5
By having two visuals to look at, you better understand the overall trend
and performance of the data.
The theme of your presentation is the dominant idea that unifies all the
elements of the data visualizations. Make this idea clear to your audiences
with consistency and a clear standard. Developing your theme is not a
difficult task. In fact, this would have been developed while you studied
your audiences and while you developed the key insights to be presented.
How you state your theme to your audience depends on the niche of the
data and on the culture of the audience. Find that core element that links all
your insights to show a prevailing objective and concept. Colors tend to tell
someone how to feel. Keep this in mind when figuring out what the goal of
your presentation is.
FIGURE 9.7
In this case, we based the theme on the parties so people can easily
distinguish which is which.
Removing gridlines creates a smoother visual flow, and adding a data label
with the key insight ensures it won’t be missed. The visuals look a lot
smoother, in my opinion. The same can be done with our accompanying
visual.
FIGURE 9.9
The goal of the visual is to show an overall trend, not each specific value.
The gridlines were not adding any significance and the removal made a
cleaner visual.
Data Labels
Labels give your audience context as to what is being presented visually.
However, when they are used incorrectly, they can cause confusion rather
than aid.
FIGURE 9.10
FIGURE 9.11
This is a lot better and gives the reader a great understanding of the data.
Typography
Don't use typography that is too loud as this will distract from your insights.
Not only does your font matter but so too does the title case. Ensure that
capitalization is used correctly. This depends on the exact nature of your
labels.
Map Labels
When labeling maps, be consistent with abbreviations. Use the USPS
abbreviation preferably (or the relevant labels for countries other than the
US). For example, AZ should be used for Arizona in place of A.Z. or
Arizona. Whatever you decide, keep it consistent throughout.
Also, customize map labels to represent the country. This easily
distinguishes them so that confusion is avoided.
Legends
Legends are visual representations of data series on charts. They are used
when displaying multiple series data or combinations of charts. They use
color to show the correlation between the data points plotted.
FIGURE 9.12
Integrating them with your titles is an excellent way to enhance a legend's
effectiveness.
An example of this would be the title “Texas and California are our
best-performing states.” Instead of a legend, you can color the state names
in the title as they appear in the chart itself. This eliminates the need
for unnecessary glancing and can be a unique creative approach.
FIGURE 9.13
How legend elements are ordered is also important. When working with
sequential data, always have the highest number at the top of the legend in
descending order. A vertical legend with the most extreme values at
opposite ends works best with diverging data.
Placement also plays a significant role. Always place elements below or
beside (parallel to) the visualization so as not to obstruct the audience’s
view of values related to that data. Legends should not add technicality to
the visual, just an easy way to understand the symbols or colors the reader
is looking at.
Titles
A title is a line of text that broadly describes what the visualization
represents without identifying trends. Keep it from becoming long-winded: use
no more than 2 lines of text or 8 words. Always place the title directly at the top of
the chart in the center or to the left.
A title can be accompanied by a subtitle if necessary. Subtitles are a more
detailed explanation of data trends and highlights that will be spotted in the
chart.
FIGURE 9.14
They also serve the purpose of indicating the unit of measurement used. An
effective title paired with a subtitle looks like this: World Population For
2023: World population is expected to cross 8 billion by the end of 2023.
Depending on the main parts of your analysis, curate the title accordingly to
direct the audience to the key outcomes you found.
An excellent format to remember is Title: A general overview of the data in
front of them. Subtitle: Detailed explanation of the critical insight.
A GUIDE TO COLOR
Color theory might seem like a concept that graphic designers and those in
similar posts should know. However, understanding color theory is a
necessary component of developing visually attractive and informative
charts. Color theory combines art and science to explain how human beings
perceive and interpret colors. By understanding color theory, you develop
the mastery of communicating messages effectively by mixing, matching
and contrasting colors on your visuals. How colors are combined is called a
color scheme.
Picking a Color Scheme
Picking the right color scheme for your chart depends on grasping the
anatomy of color. Just like human anatomy describes the different parts that
come together to create a whole being, color anatomy refers to the different
aspects that make up color so that we can perceive it. I would highly
recommend you check out Paletton.com. It is an excellent tool for creating
color schemes that go together naturally.
FIGURE 9.15
Color anatomy is made up of the following parts, which your audience
processes in the following ways:
Hue
Hue is just another name for color. It describes the specific name or shade
of a color. In data visualization, different hues refer to different values or
categories. It shows relationships: whether or not the values or categories
are related.
Saturation
This part of color anatomy refers to a chart element’s brightness relative to
the area it occupies. Highly saturated elements have vibrant colors in
comparison to their environment and other elements, while less saturated
elements produce duller, more washed-out colors. Both ends of the
spectrum need care in chart design: too much saturation can make
elements overwhelm your graph, while too little saturation can make it
difficult to identify visual elements.
Lightness
This feature of color anatomy is closely related to saturation but instead of
the brightness of a color, it refers to the shades and tints (degrees of black
and white) that make up a color. It should be noted that playing with both
lightness and saturation leaves you with striking variations in a color's
intensity. These degrees highlight the differences in chart elements.
For example, changing the lightness of a color can showcase different
values within a given category while still signaling that they belong to the
same category. The reader can then easily compare the metrics across regions.
FIGURE 9.16
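To make hue, saturation and lightness more concrete, here is a small sketch using Python's standard colorsys module. It splits a color into those three parts and then builds lighter and darker versions that keep the same hue. The base color and the helper names (hex_to_rgb, rgb_to_hex, lightness_ramp) are my own choices for illustration.

```python
import colorsys


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of floats in 0..1."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in 0..1 to '#rrggbb'."""
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in rgb)


def lightness_ramp(hex_color, lightness_levels):
    """Keep hue and saturation fixed and vary only the lightness."""
    hue, _, saturation = colorsys.rgb_to_hls(*hex_to_rgb(hex_color))
    return [
        rgb_to_hex(colorsys.hls_to_rgb(hue, level, saturation))
        for level in lightness_levels
    ]


base = "#1f77b4"  # hypothetical category color (a mid blue)

# colorsys returns the parts in the order hue, lightness, saturation.
hue, lightness, saturation = colorsys.rgb_to_hls(*hex_to_rgb(base))

# Three versions of the same blue, from light to dark.
ramp = lightness_ramp(base, [0.8, 0.6, 0.4])
```

Because only the lightness changes, every color in the ramp still reads as the same category while the values within it stay distinguishable.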
Color Harmony
Certain color combinations are easier on the eyes and thus, easier for the
brain to perceive. They create contrast and cohesion so that multiple levels
of perception are derived. Diverging from that causes confusion. To ensure
that you use color combinations to your advantage, you need to become
acquainted with the color wheel. The color wheel is an abstract illustration
of colors organized around a circle. These colors are not randomly situated.
Instead, they are placed to show the relationships between primary colors
(red, blue and yellow), secondary colors, which are mixtures of two primary
colors (orange, purple and green), tertiary colors, which are mixtures of a
primary and a secondary color (for example, blue-green), and
more variations.
Color harmony is achieved when designers pick colors from the color wheel
and arrange them in such a way so that data visualizations gain depth by
virtue of the contrast and cohesiveness those colors allow. Consider color
harmony when creating your theme to represent the data and tell a story in
the most effective way possible.
FIGURE 9.17
Monochromatic
This color arrangement consists of a single color being used in different
shades (the addition of black) and tints (the addition of white). Therefore,
you use light and dark versions of that one color.
FIGURE 9.18
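As a rough sketch of how tints and shades can be generated, the snippet below blends a base color toward white (tint) or black (shade). The base green and the blend amounts are arbitrary values picked for illustration, and the function names are my own.

```python
def hex_to_rgb255(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers in 0..255."""
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def blend(hex_color, target, amount):
    """Move each channel `amount` (0..1) of the way toward `target`."""
    channels = hex_to_rgb255(hex_color)
    mixed = [round(c + (t - c) * amount) for c, t in zip(channels, target)]
    return "#{:02x}{:02x}{:02x}".format(*mixed)


def tint(hex_color, amount):
    """Add white: amount 0 returns the color itself, 1 returns pure white."""
    return blend(hex_color, (255, 255, 255), amount)


def shade(hex_color, amount):
    """Add black: amount 0 returns the color itself, 1 returns pure black."""
    return blend(hex_color, (0, 0, 0), amount)


base = "#2e7d32"  # hypothetical base green

# A five-step monochromatic palette, from lightest tint to darkest shade.
palette = [
    tint(base, 0.6),
    tint(base, 0.3),
    base,
    shade(base, 0.3),
    shade(base, 0.6),
]
```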
Analogous
Three colors situated next to each other on the color wheel are used in this
arrangement. For example, orange-yellow, orange and orange-red can be used
to develop a data visual with such an arrangement.
FIGURE 9.19
Complementary
This arrangement makes use of colors that are on opposite sides of the color
wheel. This can be expanded into shades and tints of these two colors.
Green and red are examples of complementary colors.
FIGURE 9.20
Split complementary
This is a variation of the complementary color scheme. However, instead of
using two colors, three colors are used. One of the complementary colors is
split into the two adjacent colors to create the trio. The use of orange, blue-
purple and blue-green is an example of a split complementary color
scheme. Orange and blue are opposite on the color wheel. Blue is split into
the two adjacent colors blue-purple and blue-green to make the trio.
Triad
Triad means three, so this arrangement of colors is composed of three
colors evenly spaced on the color wheel. The most basic triad color
schemes are composed of:
Tetradic
This arrangement of colors comes with four colors from the color wheel, as
indicated by the prefix ‘tetra’. The colors are picked for a rectangular shape
and are evenly spaced on the color wheel with no color being dominant
over the others. This color arrangement is also called double
complementary.
FIGURE 9.21
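The arrangements above all come down to picking hues at set angles around the wheel, which can be sketched with Python's standard colorsys module. Two assumptions to keep in mind: the exact angles (for example 30 degrees either side for analogous, and a 60/180/240 rectangle for tetradic) are common choices rather than fixed rules, and colorsys works on the digital RGB wheel, where the opposite of red is cyan rather than the green you find opposite red on the painter's wheel described in this chapter.

```python
import colorsys

# Hue offsets in degrees for each arrangement (assumed, commonly used values).
HARMONY_OFFSETS = {
    "analogous": (-30, 0, 30),
    "complementary": (0, 180),
    "split_complementary": (0, 150, 210),
    "triad": (0, 120, 240),
    "tetradic": (0, 60, 180, 240),
}


def rotate_hue(hex_color, degrees):
    """Return the color found `degrees` further around the RGB color wheel."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    # Hue is stored as a fraction of a full turn, so wrap around at 1.0.
    hue = (hue + degrees / 360) % 1.0
    rotated = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in rotated)


def harmony(hex_color, arrangement):
    """Build a color scheme by rotating the base color's hue."""
    return [rotate_hue(hex_color, offset) for offset in HARMONY_OFFSETS[arrangement]]


triad = harmony("#ff0000", "triad")
tetradic = harmony("#ff0000", "tetradic")
```

Lightness and saturation are left untouched, so the colors in each scheme carry equal visual weight and differ only in hue.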
This type of palette is used when the data variables are categorical. These
variables do not have a distinct order. Tips for making the best use of such a
color palette include:
FIGURE 9.22
The sequential color palette is used when the values plotted for one subject
are numerically ordered. The colors assigned must exist in a continuum. For
example, if your data is presented in percentages (0-100), lower numbers
should be lighter while higher values should be a darker shade. The same
color is used. The difference is the changing lightness.
Alternatively, you can use transitions with different hues. Use light or cool
colors like blue for lower values and transition into darker or warm colors
for higher values.
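Here is a minimal sketch of a single-color sequential palette. It maps a value between 0 and 100 to a color somewhere between a light and a dark blue. The two end colors and the function name sequential_color are assumptions chosen for illustration.

```python
def sequential_color(value, low=0, high=100,
                     light=(222, 235, 247), dark=(8, 48, 107)):
    """Map a value in [low, high] to a color between a light and a dark blue."""
    # Clamp the fraction so out-of-range values still get a valid color.
    fraction = min(max((value - low) / (high - low), 0.0), 1.0)
    channels = [
        round(start + (end - start) * fraction)
        for start, end in zip(light, dark)
    ]
    return "#{:02x}{:02x}{:02x}".format(*channels)


percentages = [5, 40, 75, 100]  # hypothetical values to plot
colors = [sequential_color(p) for p in percentages]
```

Lower percentages land near the light end and higher percentages near the dark end, so the order of the data is readable from the color alone.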
Diverging
FIGURE 9.23
FIGURE 9.24
A more effective approach would be to use a horizontal bar graph. It
eliminates the need for color to distinguish the categories.
FIGURE 9.25
FIGURE 9.26
Be consistent
You might be tempted to play around with color palettes as there is such a
vast array to choose from but you must remember to stay consistent with
colors throughout the development of your visuals. If your presentation
contains different charts highlighting different insights, then, of course, you
can use different palettes, but if you use, for example, green and red to
show positive and negative values in your diverging palette, make sure this
stays consistent for future visualizations. Do not change the association of
color to mean something different in different charts.
Don’t always rely on color
Color is not the only tool at your disposal in data visualization. In some
instances, you can even use other visual elements to magnify the emphasis
you are trying to place with color. For example, you can add indicators like
an up arrow to show a positive change and strengthen the insight's meaning. The brain
will naturally associate seeing green and an arrow pointing up as a gain of
some sort. The opposite applies to using a down arrow along with red to
show negativity or loss.
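One way to pair an arrow with a color is sketched below. The function returns both the label text and the color name, so the direction of the change never depends on color alone. The symbols, color names and function name are my own choices.

```python
def change_label(change_pct):
    """Return (label, color) so an arrow reinforces what the color says."""
    if change_pct > 0:
        return f"▲ +{change_pct:.1f}%", "green"
    if change_pct < 0:
        return f"▼ {change_pct:.1f}%", "red"
    # No change: use a neutral marker and a neutral color.
    return f"◆ {change_pct:.1f}%", "gray"


label, color = change_label(4.2)
```

Readers who cannot tell red from green still get the message from the arrow.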
Avoid color clutter
Do not use too much color when creating your visuals. In fact, the first
color you should add is gray. Then consider which insights are worth using
color to highlight. In some cases, you can give each category a distinct
color to signify their difference. The chart might solidify one key point with
supporting points in other cases. In that instance, it might be worth
highlighting the specific insight with color and keeping the rest of the chart
neutral.
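The "gray first" approach can be sketched in a few lines: every category starts out neutral and only the one carrying the key insight gets a color. The region names and the two hex colors are placeholders I picked for illustration.

```python
NEUTRAL_GRAY = "#b0b0b0"  # assumed neutral tone for supporting categories
HIGHLIGHT = "#d62728"     # assumed accent color for the key insight


def highlight_colors(categories, key_category,
                     highlight=HIGHLIGHT, neutral=NEUTRAL_GRAY):
    """Color only the key category and keep every other one gray."""
    return [highlight if name == key_category else neutral
            for name in categories]


regions = ["North", "South", "East", "West"]  # hypothetical categories
colors = highlight_colors(regions, "East")
```

The resulting list can be handed to whichever charting tool you use as the color for each bar or line, in the same order as the categories.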
FIGURE 9.27
FIGURE 9.28
Color is not solely an aesthetic element. It's a tool that tells your
audience how to feel and where to look. I like to think of it as music in a
film. It sets the tone for what's happening. A properly selected color palette
should convey the data accurately and add to the story. Using color simply
to distinguish categories should be the bare minimum. You can get
everything else wrong in a data visualization, but nail the color composition, and
you'll get your point across.
When designing your visuals, many factors come into play. Knowing the
fundamentals and adding your own flair is the key to creating winning data
visualizations. Over time, it is good practice to slowly build a style guide
and document what worked and what didn't. Eventually, you'll arrive at a
place where you can turn any dataset into a beautifully crafted and
presented story with ease.
10
CHART REDESIGNS
I don't like a stacked bar chart because it's difficult to compare the
categories over time or even compare specific metrics within one month.
Readability is essential for your audience and this falls short. You've missed
the mark if you have to tell the reader what they should be seeing before
they see it. Your narrative should support what the chart already shows. I'm
sure there are ways to make it work, but I would rather scrap it altogether. A
line chart is more effective when showcasing trends over time.
FIGURE 10.2
Can we remove anything?
This happens to be one of our best years for organic traffic and sales. Since
it is 2020 and everyone is forced to stay home, many families brought their
summer fun to their houses.
Let's focus our chart more on the specific insights that stand out and lead
our audience to some action steps to benefit from this spike. We can remove
the traffic from social media and advertising and start by comparing the
previous year's organic search sales to this year's.
This is the first chart we will present to our audience, but not before a few
tweaks.
FIGURE 10.3
In terms of labels, we will remove the legend and add the years onto the
chart to avoid unnecessary glancing back and forth. It's common practice to
place data labels at the end of the lines, but given this specific chart and
how close the ends are, putting them at the beginning might be more
effective. This can also be effective because people read left to right, so
they will see the year and its corresponding income in the same color,
resulting in a quick understanding of the key points.
We can also remove the gridlines and chart border for a smooth visual
flow and update the title to be more detailed and clear.
Now the chart looks quite presentable. But we can't forget our most
important task as data professionals: guiding our audience to the proper
insights that lead to better business decisions.
This leads us to our next question.
What action does my audience need to take, and do they have the
tools to see this?
COVID-19 has been an anomaly for our business regarding sales and new
customers. But trying to surpass an anomaly year is not feasible.
With insights from our organic search, we can now allocate a considerable
amount of our marketing budget to retargeting in the following season, so
they continue to return for the things they need, whether another pool,
pump, chemicals, or anything else. This can be done through our
promotional emails, digital advertising campaigns and social posts.
To help the audience reach this conclusion, we can pair our new line graph
with a simple metric showing that in previous years, around 50-64% of
our sales were from returning customers. They continually bought the
supplies they needed year after year. With the influx of new customers,
2020 resulted in only 35% of our sales from returning buyers.
FIGURE 10.5
Although we will have more data after the next peak season, we can use the
most recent insights to focus our budget on retargeting and keeping those
customers, so when lockdowns lift and our sales return to normal, we have
gained, and kept many reliable customers. If we spend our resources trying
to continue to gain more and surpass our anomaly year, we are setting
ourselves up for disappointment.
This is just a taste of telling a compelling story with data. If you want to
dive deeper into the presentation process, check out my first book, How to
Win With Your Data Visualizations. It is all about presenting data effectively
and captivating your audience.
Let's have a look at some more chart redesigns.
FIGURE 10.6
FIGURE 10.7
Comparing categories within categories with different sets of values is
always a challenge. People often fall back on some sort of bar chart variation,
but that isn't the most effective way to view such data.
FIGURE 10.8
It also incorporates the legend into the chart so that you don't find
yourself glancing back and forth, trying to piece together the visual. (Which
is what I found myself doing in the original bar chart).
Although line charts are a fan favorite, they can still be done incorrectly.
FIGURE 10.10
As you can see, there is an awful lot of clutter. Although it shows all the
information, I think it's possible to make it much more exciting and
effective.
FIGURE 10.11
By only highlighting the essential metrics, the audience can see where we
were in 2017, the trend over the years, and finally, where we are now. The
extra values seemed irrelevant. This way, they can see the sales trajectory
and where more resources need to be allocated. The readability goes way up
by adding the locations onto the chart and changing the color scheme. By
adding insight to the subtitle, we can better focus our audience's attention
while still having all the necessary metrics clearly visible. Highlighting one
specific line wouldn't be effective here as the performance of all the regions
is essential to see.
Many people naturally gravitate to bar charts. Although this isn’t
necessarily negative, let's see how we can make them more effective based
on the data.
FIGURE 10.12
This regular bar chart seems to be better. However, when comparing actuals
to targets, there is a better way to do it. You can either go the route of a
bullet chart or, in this case, we can take the elements of a bullet chart and
execute them in a simple yet effective way.
FIGURE 10.14
The horizontal bar works great in this scenario. By adding the target as
lines, the viewer will naturally check to see if the bars are below or above
them, even if they don't know what they mean. This is a great way to compare
sales to targets. We can add the actual values as data labels to get more
insight out of the chart.
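As a small sketch of how those data labels could be prepared, the snippet below compares each bar's actual value with its target and writes the gap into the label. The quarterly numbers are invented for illustration and are not the figures behind the chart.

```python
# Hypothetical quarterly sales and targets (in thousands); not the book's data.
quarters = ["Q1", "Q2", "Q3", "Q4"]
actual = [120, 135, 98, 150]
target = [110, 130, 125, 140]


def target_labels(names, actuals, targets):
    """Build one data label per bar stating the actual value and the gap to target."""
    labels = []
    for name, actual_value, target_value in zip(names, actuals, targets):
        gap = actual_value - target_value
        if gap > 0:
            note = f"{gap} above target"
        elif gap < 0:
            note = f"{-gap} below target"
        else:
            note = "on target"
        labels.append(f"{name}: {actual_value} ({note})")
    return labels


labels = target_labels(quarters, actual, target)

# Bars that end short of their target line deserve a closer look.
shortfalls = [name for name, a, t in zip(quarters, actual, target) if a < t]
```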
FIGURE 10.15
If required, we can further dive into the metrics from Q3 and see where we
fell short.
There are many ways to redesign and visualize the information above to be
effective. Don't take these rules as set in stone; be creative and present the
information most memorably. Data visualization is an art in itself. Create
your style and continue to expand upon it with new insights and knowledge.
With the number of new people in the world of data, standing above the rest
is crucial to making yourself known. Honing your data visualization
skills will allow you to do just that.
CONCLUSION
Data. It seems to pour like raindrops during a storm but unlike those
passing clouds, it never stops. Data is constantly being generated during
every microsecond of the day. Gone is the time when our economy and
many social constructs were driven by manufacturing. We have entered the
information age and there is no end in sight for this time period. Also gone
is the time when the world's most valuable resource was oil. The intangible
components that make up this age, data points, have become the most
valuable thing on this planet. People are willing to pay big bucks for data
but that would not be the case if there was no way of translating data from
its raw state into a medium that is easy to interpret.
Data visualization is the translation of data into a visual context such as a
chart or a map. Data visualization is that bridge that makes data the value
item for which people are willing to write big checks. The human brain is
hardwired to pick up information received through the sense of sight far faster
and more easily than information derived from any other sense. That’s the power
of visual processing. That is the power you can harness to bring your point
across to audiences. You can show patterns, showcase trends and highlight
outliers without saying a word, even when working with large data sets.
You might be an academic. You might be an intern. You might be an entry-
level employee. You might be a manager. You might be a business owner.
No matter your job title, understanding the value of data visualization and
using it effectively goes across the board. It all starts with a plan. This plan
will tell you which is the right chart to pick for your particular presentation
and which design elements will enhance that chart visually and make it the
most informative for the audience. The “right” chart is dependent on what
you want to tell your audience. This is the overall theme of the chart
development. The themes stated in this book, along with a brief outline of
what they are and some of the graphs that fall in this category, include:
Change Over Time
These charts show the changing trends of a data set over both short time
frames like 24 hours and over longer periods of time like years. Charts that
show change over time include:
Line charts
Slope charts
Area charts
Connected Scatter plots
Gantt charts
Comparison
Such charts show the differences or similarities between multiple variables
in data sets or multiple categories within a single variable. Charts that show
comparison include:
Bar graphs
Diverging bars
Bubble charts
Waterfall charts
Sankey diagrams
Marimekko charts
Bullet charts
Dumbbell plots
Distribution
This theme of data visualization expresses the frequency of data to show
uniformity or a lack thereof. In other words, such charts highlight how data
is spread out. Charts in this group include:
Histograms
Dot plots
Ridgeline plots
Box plots
Candlestick plots
Violin plots
Population pyramids
Strip plots
Beeswarm plots
Part-to-the-Whole
These charts highlight how a single entity compares to its elements’
distribution. The common function of these charts is to show how
something is divided up. Such charts include:
Pie charts
Treemaps
Sunburst charts
Nightingale Rose charts
Relationship
This group of charts aims to show the relationships or connections between
two or more data variables. Examples of such charts include:
Scatter plots
Radar charts
Chord diagrams
Network diagrams
Tree diagrams
Parallel coordinates plots
Geographical
These charts are used to highlight data sets when precise locations are
important to note. Such charts include:
Choropleths
Proportional symbol maps
Flow maps
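If you like keeping a cheat sheet, the six themes and their charts can be stored as a simple lookup. This is only a sketch; the dictionary name and the suggest_charts function are my own, while the chart lists are the ones given above.

```python
# Themes and charts as listed in this conclusion.
CHARTS_BY_THEME = {
    "change over time": [
        "Line chart", "Slope chart", "Area chart",
        "Connected scatter plot", "Gantt chart",
    ],
    "comparison": [
        "Bar graph", "Diverging bar", "Bubble chart", "Waterfall chart",
        "Sankey diagram", "Marimekko chart", "Bullet chart", "Dumbbell plot",
    ],
    "distribution": [
        "Histogram", "Dot plot", "Ridgeline plot", "Box plot",
        "Candlestick plot", "Violin plot", "Population pyramid",
        "Strip plot", "Beeswarm plot",
    ],
    "part-to-the-whole": [
        "Pie chart", "Treemap", "Sunburst chart", "Nightingale Rose chart",
    ],
    "relationship": [
        "Scatter plot", "Radar chart", "Chord diagram", "Network diagram",
        "Tree diagram", "Parallel coordinates plot",
    ],
    "geographical": [
        "Choropleth", "Proportional symbol map", "Flow map",
    ],
}


def suggest_charts(theme):
    """Return the charts listed for a theme, or an empty list if it is unknown."""
    return CHARTS_BY_THEME.get(theme.strip().lower(), [])


options = suggest_charts("Change Over Time")
```

Start from what you want to tell your audience, look up the theme, and then narrow the options down based on your data.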
It is not enough to simply pick the right chart. You must also walk your
audience through a story that captivates and engages them. That story needs
to have a beginning, a middle and an end. Using the right visual elements
helps you do just that. The principles of design that make your chart
visually appealing include:
Tables are particularly useful when the audience is used to reading certain
types of information rather than seeing it in a visual format.
You have reached this book's end and are now equipped to use charts like a
pro. Enhancing your ability to communicate data will slingshot you far in
this day and age and lead you to success in your role and a pat on the back
from your superiors. (Hopefully, that comes with a raise.) I hope I have
shined a light on an essential skill and I wish you luck on your journey of
telling better stories, communicating more insights, and driving better
business decisions.
APPENDIX - TOOLS FOR DATA VISUALIZATION
I'm sure you've asked yourself, "How on earth do I create these charts?!"
Considering that could be a topic for an entire book, we won't go through it
step by step this time. But there is no need to panic. There are many
programs to create stunning visualizations regardless of your expertise.
Let's go through some of them so you can start getting your feet wet.
Infogram
Infogram offers interactive charts, infographics, maps, and many free
templates to tell compelling data stories. It supports data uploads from
Google Sheets, Dropbox, MySQL and more. Infogram is known for its
interactive features, which enhance your visualizations with movement
such as zooming, bouncing, fading, rotating, and sliding objects into your work.
Infogram is often used by marketers, media companies, and whoever wants
their visualizations to stand out and be different.
Pro: Easy to create interactive and engaging visualizations
Con: You must upgrade to the paid version to remove the watermark if
embedding your visualizations.
Mid-level options
PowerBI
PowerBI is another great data visualization tool. It has the extra processing
power to work with larger sets of data. Its dashboards can also be more
customizable than Excel. It's essentially a more powerful Excel. It's used
prevalently in business intelligence.
Pro: Can connect with many different file sources, including Excel and
CSV, as well as database sources like Oracle, SQL Server, IBM and much
more.
Con: Although capable, can sometimes have trouble processing large sets of
data
Tableau
Tableau is one of the most used programs in the business world. Tableau
excels at visualizing even the most extensive data sets without limitations
on the number of data points or rows.
Pros: Fast, with many extensive features for creating intricate
visualizations.
Cons: With the free version (Tableau Public), your dashboards can be
viewed publicly, which can cause problems with confidential data.
Advanced Options
Python
Python is a programming language with many add-on libraries that make it
possible to visualize data. Libraries such as Matplotlib, Pandas visualization,
Seaborn, and Plotly make it possible to curate your data into a wide array of
dashboards and visuals. Although effective, it takes experience with Python
programming. Creating these visualizations requires you to write lines of
code for your desired output.
R
R is a statistical programming language used for the analysis and
visualization of data. It offers various libraries that, with some coding, let
you turn your data into visual insights. It can be very effective and
professional, yet it has a higher barrier to entry.
As your career in data evolves, so will the software you use. Ultimately, if
you can effectively visualize data, it doesn't matter which program you use.
Find one you like or that fits your business needs. Whether you need access
to large databases or quick insights on the fly, there should be a tool for
you.
REFERENCES