Data Visualization Best Practices
There are four basic presentation types that you can use to present your data:
Comparison
Composition
Distribution
Relationship
Unless you are a statistician or a data analyst, you are most likely using only the two most common presentation types: Comparison and Composition.
To determine which chart is best suited for each of those presentation types, first you must answer a few questions:
How many variables do you want to show in a single chart? One, two, three, many?
How many items (data points) will you display for each variable? Only a few or many?
Will you display values over a period of time, or among items or groups?
Bar charts are good for comparisons, while line charts work better for trends. Scatter plot charts are good for relationships and distributions, but pie
charts should be used only for simple compositions — never for comparisons or distributions.
Column Charts
The column chart is probably the most used chart type. This chart is best used to compare different values when specific values are important, and it
is expected that users will look up and compare individual values between each column.
With column charts you could compare values for different categories or compare value changes over a period of time for a single category.
Use column charts for comparison if the number of categories is quite small: ideally up to five, and no more than seven.
If one of your data dimensions is time (years, quarters, months, weeks, days, or hours), you should always put the time dimension
on the horizontal axis.
In charts, time should always run from left to right, never from top to bottom.
For column charts, the numerical axis must start at zero. Our eyes are very sensitive to the height of columns, and we can draw inaccurate
conclusions when those bars are truncated.
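To see why a truncated axis misleads, it helps to compare the ratio of the drawn column heights with the ratio of the underlying values. The sketch below is only an illustration: the function name `apparent_ratio` and the two values are made up for this example.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn heights of two columns when the axis starts at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (b - axis_start) / (a - axis_start)

# Hypothetical values: two columns of 95 and 100 units.
true_ratio = apparent_ratio(95, 100)           # axis starts at zero
truncated_ratio = apparent_ratio(95, 100, 90)  # axis starts at 90

print(round(true_ratio, 3))  # about 1.053: the columns look nearly equal
print(truncated_ratio)       # 2.0: the second column looks twice as tall
```

With the axis starting at 90, a difference of about five percent is drawn as a column twice as tall.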
Avoid using pattern lines or fills. Use border only for highlights.
Only use column charts to show trends if there is a reasonably low number of data points (fewer than 20) and if every data point has a clearly visible value.
Histograms
A histogram is a common variation of the column chart, used to present the distribution and relationships of a single variable over a set of categories. A good
example of a histogram would be a distribution of grades on a school exam or the sizes of pumpkins, divided by size group, in a pumpkin festival.
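The grouping step behind a histogram can be sketched in a few lines. This is a minimal illustration, not a full charting routine: the helper `bin_counts`, the bin width of 10, and the grades are all invented for the example.

```python
from collections import Counter

def bin_counts(values, bin_width, start=0):
    """Count how many values fall into each [low, low + bin_width) bin."""
    counts = Counter(start + ((v - start) // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up exam grades, grouped into bins of ten points starting at 50.
grades = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 82, 85, 88, 91, 97]
histogram = bin_counts(grades, 10, start=50)

# Print a rough text histogram, one row per bin.
for low, n in histogram.items():
    print(f"{low}-{low + 9}: {'#' * n}")
```

Each row is one column of the histogram, and the row length shows how many grades landed in that group.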
Use stacked column charts to show a composition. Do not use too many composition items (not more than three or four) and make sure the
composing parts are relatively similar in size. It can get messy very quickly.
Before moving to the next chart type, I wanted to show you a good example of how to improve the effectiveness of your column chart by simplifying
it. Credit: Joey Cherdarchuk
Bar Charts
Bar charts are essentially horizontal column charts.
If you have long category names, it is best to use bar charts because they give more space for long text. You should also use bar charts, instead of
column charts, when the number of categories is greater than seven (but not more than fifteen) or for displaying a set with negative numbers.
A typical use of bar charts would be visitor traffic from top referral websites. There are usually more than five to seven referring sites, and
website names are quite long, so they are better graphed horizontally.
Another example could be sales performance by sales representatives. Again, names can be quite long, and there might be more than seven
sales reps.
Just like column charts, bar charts can be used to present histograms.
Both the Bar and the Column charts display data using rectangular bars where the length of the bar is proportional to the data value. Both are used to
compare two or more values. However, their difference lies in their orientation. A Bar chart is oriented horizontally whereas the Column chart is
oriented vertically.
Line Charts
Who doesn’t know line charts? We used to draw those on blackboards in school.
Line charts are among the most frequently used chart types. Use lines when you have a continuous data set. These are best suited for trend-based
visualizations of data over a period of time, when the number of data points is very high (more than 20).
With line charts, the emphasis is on the continuation or the flow of the values (a trend), but there is still some support for single value comparisons,
using data markers (only with fewer than 20 data points).
A line chart is also a good alternative to column charts when the chart is small.
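The rules of thumb given so far for column, bar, and line charts can be collected into one small helper. Treat this as a rough sketch rather than a definitive rule set: the function `suggest_chart` and its parameters are hypothetical, and the handling of exactly 20 data points is an assumption, because the text only covers "fewer than 20" and "more than 20".

```python
def suggest_chart(n_items: int, over_time: bool = False,
                  long_labels: bool = False, has_negatives: bool = False) -> str:
    """Suggest a chart type from the rules of thumb in this article."""
    if over_time:
        # Assumption: exactly 20 points counts as "many"; the text only says
        # "fewer than 20" for columns and "more than 20" for lines.
        return "line" if n_items >= 20 else "column"
    if n_items > 15:
        # The text gives no recommendation beyond fifteen categories.
        return "too many categories"
    if n_items > 7 or long_labels or has_negatives:
        return "bar"
    return "column"

print(suggest_chart(5))                    # column
print(suggest_chart(12))                   # bar
print(suggest_chart(4, long_labels=True))  # bar
print(suggest_chart(50, over_time=True))   # line
```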
Timeline Charts
The timeline chart is a variation of line charts. Obviously, any line chart that shows values over a period of time is a timeline chart. The only
difference is in functionality — most timeline charts will let you zoom in and out and compress or stretch the time axis to see more details or overall
trends.
Area Charts
An area chart is essentially a line chart, so it is good for trends and some comparisons. Area charts fill the area below the line, so the best use for
this type of chart is presenting cumulative value changes over time, like item stock, number of employees, or a savings account.
Do not use area charts to present fluctuating values, like the stock market or price changes.
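If your raw data records changes per period rather than totals, you first need a running total before an area chart makes sense. A minimal sketch, assuming made-up monthly deposits into a savings account:

```python
from itertools import accumulate

# Made-up monthly deposits into a savings account.
monthly_deposits = [200, 150, 0, 300, 250, 100]

# Running total: this cumulative series is what the area chart would plot.
balance = list(accumulate(monthly_deposits))
print(balance)  # [200, 350, 350, 650, 900, 1000]
```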
Stacked Area
Stacked area charts are best used to show changes in composition over time. A good example would be the changes of market share among top
players or revenue shares by product line over a period of time.
Stacked area charts might be colorful and fun, but you should use them with caution, because they can quickly become a mess. Don’t use them if you
need an exact comparison and don’t stack together more than three to five categories.
Pie Charts
Who doesn’t love pies? These charts are among the most frequently used, and also misused, chart types. A pie chart with too many components and
very similar values is a good example of a terrible, useless chart.
A pie chart typically represents numbers in percentages, used to visualize a part to whole relationship or a composition. Pie charts are not meant to
compare individual sections to each other or to represent exact values (you should use a bar chart for that).
When possible, avoid pie charts and donuts. The human mind thinks linearly, and most of us are poor at judging angles and areas.
The Dos and Don’ts for Pie charts
For those of you who still feel sentimental about the old PowerPoint Pie charts, and want to keep using them, there are some things to keep in mind.
Make sure that the total sum of all segments equals 100 percent.
Use pie charts only if you have fewer than six categories, unless there’s a clear winner you want to focus on.
Ideally, there should be only two categories, like men and women visiting your website, or only one category, like a market share of your
company, compared to the whole market.
Don’t use a pie chart if the category values are almost identical or completely different. You could add labels, but that’s a patch, not an
improvement.
Don’t use 3D or blow-apart effects: they reduce comprehension and show incorrect proportions.
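A couple of these checks are easy to automate before you draw anything. The sketch below is only one possible reading of the rules: the helper names, the five-percentage-point `min_spread` threshold for "almost identical" values, and the visitor numbers are all assumptions, and the "clear winner" exception is not modeled.

```python
def pie_shares(values: dict) -> dict:
    """Convert raw values to percentage shares that sum to 100."""
    total = sum(values.values())
    if total <= 0 or min(values.values()) < 0:
        raise ValueError("pie segments need non-negative values with a positive total")
    return {name: 100 * v / total for name, v in values.items()}

def pie_is_reasonable(values: dict, max_categories: int = 5,
                      min_spread: float = 5.0) -> bool:
    """Rough check: few categories, and shares that are not almost identical."""
    shares = pie_shares(values)
    if len(shares) > max_categories:
        return False
    # Assumption: "almost identical" means the largest and smallest shares
    # differ by less than min_spread percentage points.
    return len(shares) < 2 or max(shares.values()) - min(shares.values()) >= min_spread

# Made-up visitor counts.
visitors = {"women": 540, "men": 460}
print(pie_shares(visitors))         # {'women': 54.0, 'men': 46.0}
print(pie_is_reasonable(visitors))  # True
```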
Scatter Charts
Scatter charts are primarily used for correlation and distribution analysis. Good for showing the relationship between two different variables where
one correlates to another (or doesn’t).
Scatter charts can also show the data distribution or clustering trends and help you spot anomalies or outliers.
A good example of scatter charts would be a chart showing marketing spending vs. revenue.
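A scatter chart shows the relationship visually; if you also want a number for it, the Pearson correlation coefficient is a common choice. This is a minimal sketch with invented figures for marketing spending and revenue, and `pearson_r` is a hand-written helper rather than a library call.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up figures, in thousands.
marketing_spend = [10, 20, 30, 40, 50]
revenue = [25, 41, 62, 79, 101]

r = pearson_r(marketing_spend, revenue)
print(round(r, 3))  # close to 1: a strong positive correlation
```

A value near 1 or -1 suggests the points will line up on the chart; a value near 0 suggests a shapeless cloud.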
Bubble Charts
A bubble chart is a great option if you need to add another dimension to a scatter plot chart. Scatter plots compare two values, but you can add bubble
size as the third variable and thus enable comparison. If the bubbles are very similar in size, use labels.
We could in fact add the fourth variable by color-grading those bubbles or displaying them as pie charts, but that’s probably too much.
A good example of a bubble chart would be a graph showing marketing expenditures vs. revenue vs. profit. A standard scatter plot might show a
positive correlation for marketing costs and revenue (obviously), while a bubble chart could reveal that an increase in marketing costs is chewing on profits.
Use Scatter and Bubble charts to:
Stem and Leaf Plots
A stem and leaf display is a graphical method of displaying data. It is particularly useful when your data set is not too large. A stem and leaf plot:
shows the first digits of the number (thousands, hundreds or tens) as the stem and the last digit (ones) as the leaf.
usually uses whole numbers. Anything that has a decimal point is rounded to the nearest whole number. It suits data such as test results, speeds,
heights, and weights.
looks like a bar graph when it is turned on its side.
shows how the data are spread, that is, the highest number, lowest number, most common number and outliers (numbers that lie outside the
main group of numbers).
Once you have decided that a stem and leaf plot is the best way to show your data, draw it as follows:
On the left-hand side of the page, write down the thousands, hundreds or tens (all digits but the last one). These will be your stems.
Draw a line to the right of these stems.
On the other side of the line, write down the ones (the last digit of a number). These will be your leaves.
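The steps above can be sketched in code. This is a small illustration under a few assumptions: the values are non-negative and the list is not empty, the function name `stem_and_leaf` and the test scores are made up, and Python's `round()` sends exact .5 ties to the nearest even number.

```python
def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted list of leaves."""
    rounded = sorted(round(v) for v in values)
    # Include every stem between the smallest and largest, even empty ones,
    # so that gaps in the data stay visible.
    rows = {stem: [] for stem in range(rounded[0] // 10, rounded[-1] // 10 + 1)}
    for v in rounded:
        rows[v // 10].append(v % 10)
    return rows

# Made-up test scores; 58.4 is rounded to 58 first.
scores = [47, 52, 55, 58.4, 61, 63, 63, 70, 74, 89]
plot = stem_and_leaf(scores)

# Stems on the left, a dividing line, then the leaves.
for stem, leaves in plot.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```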
Box and Whisker Plots
A box and whisker plot (sometimes called a boxplot) is a graph that presents information from a five-number summary.
It does not show a distribution in as much detail as a stem and leaf plot or histogram does, but is especially useful for indicating whether a
distribution is skewed and whether there are potential unusual observations (outliers) in the data set.
Box and whisker plots are also very useful when large numbers of observations are involved and when two or more data sets are being compared.
In a box and whisker plot:
the ends of the box are the upper and lower quartiles, so the box spans the interquartile range
the median is marked by a vertical line inside the box
the whiskers are the two lines outside the box that extend to the highest and lowest observations.
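The five numbers behind the plot can be computed directly. Note that quartile conventions differ between textbooks and tools; the sketch below assumes the "median of each half" method, and both the function name `five_number_summary` and the scores are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    # Assumption: with an odd number of observations, the middle value is
    # left out of both halves (conventions differ on this point).
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Made-up test scores.
scores = [47, 52, 55, 58, 61, 63, 63, 70, 74, 89]
summary = five_number_summary(scores)
print(summary)  # (47, 55, 62.0, 70, 89)
```

Here the box would span 55 to 70, the median line would sit at 62, and the whiskers would reach 47 and 89.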