Limits of Multivariate Data Visualization
Almost all data visualizations are multivariate (i.e., they display more than one variable), but there are practical
limits to the number of variables that a single graph can display. These limits vary depending on the approach
that’s used. Three graphical approaches are currently available for displaying multiple variables:
[Figure: graph of sales in dollars (Y axis, 0 to 3,500) by month (X axis, Jan through Dec)]
One variable—time by month—is encoded as horizontal positions along the X axis and the other variable—
sales in dollars—is encoded as vertical positions along the Y axis. In other words, this example uses two visual
attributes to encode values, one per variable: 2-D horizontal position and 2-D vertical position.
[Figure: bubble plot with patient % of cost on the X axis (0.0 to 2.8) and per patient cost in U.S. dollars on the Y axis (0 to 11K)]
The following four visual attributes have been used to encode the four variables: horizontal position along the
X axis (patient percentage of cost), vertical position along the Y axis (per patient cost in U.S. dollars), bubble
size (number of patients), and bubble color intensity (patient age). Could we include a fifth variable in this graph
in a way that works for our brains? There are certainly several more visual attributes from which to choose, but
would any of them work in this case? Unfortunately, due mostly to perceptual limitations, the answer is, “Not
well.”
If you doubt this, use your imagination to consider the possibilities. Perhaps it occurred to you that, in addition
to variation in color intensity, which in this case encodes patient ages, we could encode a new variable, such as
patient racial group, using various hues of color. If we did this, color intensity would no longer work effectively
because it is difficult to compare the varying intensities of different hues, and the variable that’s encoded using
hue would suffer because it is no longer easy to group objects with the same hue when color intensity varies.
As an alternative, perhaps the bubbles, which are all circular in shape, could vary in shape to encode a fifth
variable (e.g., circles, squares, triangles, etc.). The problem with this approach is that, whereas we can roughly
compare the sizes of circles to one another or squares to one another or triangles to one another, we cannot do
a good job of comparing the sizes of circles, squares, and triangles to one another. Differences in shape make
differences in size difficult to discern.
Copyright © 2019 Stephen Few, Perceptual Edge Page 2 of 15
Even with only four variables, we’re already pushing the limits of effectiveness in this graph. Notice how difficult
it is to determine the color intensities of small bubbles and to compare them to other bubbles. Colors become
difficult to discriminate when objects are tiny. The larger the object, the more color there is, which makes
discrimination easier. As you continue to consider other visual attributes that might be used to encode a fifth
variable, you’ll encounter problems with each.
You might be thinking that I’m ignoring a visual attribute that could easily be added to this bubble plot: positions
along the Z axis. Actually, I’m avoiding the Z axis for a good reason. Turning this into a 3-D graph by adding a Z
axis would make the variable that’s encoded along that axis incredibly difficult to read. This is because, contrary
to the ease with which human perception discerns differences in 2-D position (either horizontal or vertical along
a flat plane), our perception of depth is not very good. Adding a Z axis would force us to constantly rotate and
tilt the graph to reorient the Z axis either horizontally or vertically in an effort to see where bubbles fall along the
axis, which isn’t practical.
Visual perception and cognition impose firm limits on the number of variables that we can encode in a single
graph when we’re using a different visual attribute for each. These limitations are tied to several factors:
1. Only a few visual attributes work well for encoding data in graphs.
2. Using some visual attributes eliminates the possibility of using certain other attributes in the same
graph.
3. Working memory can only attend to three or at most four chunks of information at a time, so limited
value is added by including more than four.
4. Increasing the number of visual attributes in a single graph beyond a certain number creates a
cluttered appearance that undermines perception.
Beginning with the work of Jacques Bertin, author of Sémiologie Graphique (The Semiology of Graphics),
in the 1960s, people have studied visual perception as it applies to data visualization. Bertin explored the
opportunities and limitations that influence the use of various visual attributes for encoding data. Since
Bertin’s seminal work, the best books on this topic have been written by Colin Ware: Information Visualization:
Perception for Design and Visual Thinking for Design. Everyone working in the field of data visualization should
read these books. Vendors developing data visualization products should definitely read these books, but it
seems that, based on the ineffective features that most products exhibit, they rarely do.
All data visualizations have one thing in common: they encode data values graphically, using basic attributes
of visual perception. Whenever we look at an object in the world, the visual representation that appears in
our heads is constructed from a small set of basic visual attributes. These attributes are called preattentive
attributes of visual perception, for they are processed in the visual cortex of the brain preattentively (i.e., prior to
conscious awareness). Each of these attributes is perceived separately, but in parallel rather than serially, more
rapidly than conscious perception. The speed and ease of preattentive perception is a big part of the reason
why data visualization is so powerful when done properly.
Here’s a fairly comprehensive list of the preattentive attributes of visual perception that are potential candidates
for encoding data in graphs, grouped into six categories:
Attributes of Position
Attributes of Form
Attributes of Appearance
• Direction of motion (e.g., the direction in which bubbles move in an animated bubble plot)
• Speed of motion (e.g., varying speeds in the movement of bubbles in an animated bubble plot)
• Speed of flicker (i.e., the speed at which an object flickers on and off or from low to high intensity)
Attributes of Quantity
• Numerosity (i.e., our ability to recognize differences in quantity between one, two, or three objects)
• Added marks (i.e., the varying addition of another component to an object—it is either there or it
isn’t—such as a border around a bubble in a bubble plot)
We can consider all 21 of these preattentive attributes of visual perception as candidates for encoding values in
graphs, but only a few of them work well.
We perceive some preattentive visual attributes quantitatively. By this, I mean that we naturally perceive
different expressions of the attribute as representing either greater or lesser values. For example, we perceive
a long line as greater in value than a short line or a dark circle as greater in value than a light circle. We
perceive each of the following attributes quantitatively:
Of these 16 attributes, only three work well for encoding quantitative data in graphs:
• 2-D horizontal position
• 2-D vertical position
• Line length
When I say that they work well, I mean that they can be perceived and compared to one another quickly, easily,
and with a great deal of precision. Whereas these three attributes work well, all of the others provide only an
approximate sense of value and a rough means of comparison. Of these, the following two tend to be most
useful in graphs:
• Color intensity
• Area
Because color intensity and area only support approximate decoding and rough comparisons, however,
we should only use them when neither 2-D horizontal position, nor 2-D vertical position, nor line length are
available.
It doesn’t usually make sense to even consider numerosity because it’s severely limited. Numerosity refers to
our preattentive ability to see differences between quantities of one, two, or three. We can also discern that
more than three objects are greater than three, but we cannot decode the actual number preattentively. For
example, if several clusters of dots appeared on a screen, we could recognize without conscious effort that
some contained one dot, some two, some three, and some more than three. When clusters contained more
than three dots, however, we could not tell how many there were without taking time to consciously count them.
As such, numerosity is only useful for encoding values in a graph if quantities don’t exceed three. This situation
happens too rarely to routinely consider numerosity as a candidate for encoding values in graphs.
The remaining quantitatively perceived attributes—stereoscopic depth, volume, line orientation, angle,
curvature, blur, speed of motion, and speed of flicker—are rarely used in graphs, either because we perceive
them less well than others or because they aren’t practical.
Some visual attributes can only be used to encode categorical variables, not quantitative variables. These
include the following:
• Simple shape
• Hue
• Texture
• Added marks
So, where does this leave us? Even though all 21 of these preattentive attributes can potentially be used to
encode variables in graphs, only a few work well. As it turns out, however, this is not the only reason why a
single graph can only effectively display a limited number of variables. The number of variables that we encode
in a single graph is also affected by the fact that 1) certain visual attributes cannot be combined effectively in
a single graph, 2) working memory can only handle three or at most four variables at a time, and 3) too many
visual attributes tend to produce visual clutter. We’ll consider those limitations next.
Some visual attributes can be combined in a single graph and some cannot. For example, 2-D horizontal
position, 2-D vertical position, hue, and simple shape can work fairly well together in a scatter plot. On the other
hand, as I’ve already pointed out, we cannot effectively combine hue and color intensity together in a single
graph.
Another ineffective combination is the use of both line length and line width for separate variables. This
is because length and width function as integral attributes. This means that, when they are combined, we
perceive the result as area rather than as independent attributes of length and width. Imagine that we used the
lengths of bars to encode one variable and the widths of bars to encode a second. We would preattentively
perceive this combination as differences in the overall areas of bars, no longer independently as differences
in the bars’ lengths and widths. Although we could not perceive length and width as separate variables
preattentively, we could do so with conscious effort, but it would be much slower.
• Line length and line width, because these attributes are integral
• Any attributes of color (e.g., hue and color intensity, hue and transparency, or color intensity and
transparency)
• Size and color (either hue, intensity, or transparency), when the sizes of objects become tiny
• Shape and size, for we cannot effectively compare the sizes of objects that vary in shape (e.g., circles,
squares, triangles, and stars)
• Shape and curvature, because curvature is an aspect of shape and changing the curve would change
the shape
• Shape and line orientation, because only a few shapes, such as lines and rectangles, would make it
easy to perceive and compare slopes
These attributes can certainly be combined in a single graph, but they cannot be combined effectively.
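To make the integral nature of line length and line width concrete, here is a minimal arithmetic sketch in Python. The two bars and their values are hypothetical, and the code models nothing about perception itself; it only shows that two bars can differ in both encoded variables and still have the same area, which is the quantity we tend to perceive at a glance.

```python
# Hypothetical bars: each tuple is (length, width), encoding two separate variables.
bar_a = (6, 2)
bar_b = (3, 4)

# Area is the product of length and width.
area_a = bar_a[0] * bar_a[1]
area_b = bar_b[0] * bar_b[1]

# The bars differ in both encoded variables...
differ_in_both = bar_a[0] != bar_b[0] and bar_a[1] != bar_b[1]

# ...yet they have the same overall area.
same_area = area_a == area_b

print(differ_in_both, same_area)  # True True
```

If area is what registers preattentively, these two bars would look roughly equivalent even though neither of their encoded values matches.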
In the moment when we’re thinking about things (i.e., while we’re attending to them), information is held in
working memory. This is different from long-term memory, which functions as a form of permanent storage
for later retrieval. When you retrieve information from long-term memory, you pull it into working memory to
think about and manipulate it in the moment. Working memory is volatile in that, once information is released
from working memory to free up space for new information, it is forgotten unless we take time to rehearse
it enough to store it in long-term memory. In addition to being volatile, working memory is extremely limited.
As I’ve already mentioned, we can only hold from three to four chunks of information in working memory at a time.
This limitation in the capacity of working memory plays a significant role in data visualization. When we view
a graph for the purpose of reading and comparing values, the fact that we can only simultaneously hold up
to three or four chunks of information in working memory limits the comparisons that we can make in any
one moment. Fortunately, because a great deal of information is potentially there in front of our eyes, we can
quickly swap information in and out of working memory as needed, but never hold more than four chunks at a
time.
Here’s the clincher. When multiple variables are represented by different visual attributes, we cannot chunk
them together in working memory. For example, if we’re viewing a bubble plot that uses 2-D horizontal position,
2-D vertical position, bubble size, and bubble color intensity to encode four variables in each bubble, each of
those values is held in working memory as a separate chunk. If we applied additional visual attributes to those
bubbles to encode more variables, we would still only be able to hold up to four at a time in working memory.
Now, what if we want to compare one of those bubbles to another? If each bubble represents four values,
totaling eight for two bubbles, we could only hold two values at a time for each bubble in working memory
when making comparisons. This means that we would be forced to swap values in and out of working memory
to compare more than two values per bubble. Consequently, even though we could encode more variables in
a single graph using different visual attributes, it wouldn’t expand our ability to consider them simultaneously.
Even four variables per object exceeds the number that we could consider in any one moment when we’re
comparing objects to one another. As far as I know, no research studies have ever measured the efficiency
gains or losses for various tasks (e.g., decoding the various values that are associated with an object,
comparing objects of various types, etc.) that are associated with the number of variables that are encoded in
a single graph. Given proper study, we might find ways to improve efficiency, but for now we must keep these
limits in mind.
This limitation in the capacity of working memory, combined with the fact that most visual attributes do a
relatively poor job of representing values in graphs, forces us to admit that any gains in efficiency that we’re
hoping to achieve by including more than a few variables in a single graph are wasted. It’s worse than that,
actually, for each additional visual attribute that we include in a graph potentially contributes to the appearance
of clutter, which is our next topic.
By clutter, I’m referring to the characteristics of a graph’s appearance that are potentially messy looking
and distracting when we’re trying to focus on the particular attributes that we care about in the moment. For
example, there is no doubt that having objects blink on and off at various speeds to encode a quantitative
variable would make it almost impossible to attend to anything else. Even overly bright colors result in a
cluttered appearance that is distracting. Every additional variable encoded by introducing another visual
attribute to a graph comes with a perceptual cost. The cleaner and simpler the display, the easier it is to use.
When a chef chooses among the ingredients in her kitchen to cook a soup, her goal is not to combine as many
ingredients as possible but instead to combine only those that are needed and to prepare them in the best way
possible to create a pleasing culinary experience. Similarly, when we choose among the variables in a data set
and display them in a particular way in a graph, our goal is not to squeeze as many variables as possible into it
but to answer the question at hand in the most enlightening way. When visualizing data, we don’t typically start
with a single graph and then ask many questions about it. Instead, we start with questions, one at a time, and
create graphs as needed to answer each in the best possible way.
Data visualization vendors, especially newcomers, occasionally make the erroneous claim that their software
can effectively visualize a large number of variables at once using separate visual attributes for each. I
encountered the latest example of this recently when I read a press release about a new product named
Immersion Analytics by the company Virtual Cove. These folks claim that, using their patents-pending
techniques, they can effectively visualize up to 16 variables simultaneously. The following example includes 12
variables:
One of the arguments that Virtual Cove makes to promote their software is that to visualize a data set
consisting of 16 variables using graphs that display only 4 variables each would require 1,820 graphs in total,
which their software could replace with a single graph. They made this specific claim in an email to me, and
they feature similar claims in their marketing efforts. It’s probably quite persuasive to many people, for it has
the air of mathematical certainty. As it turns out, however, it is neither accurate nor relevant. I’m not sure how
they did the math, but it appears to be based on the invalid assumption that every possible combination of
four variables would need to be examined to compare each of the 16 variables to each of the others. That isn’t
the case. To see each of the 16 variables in relation to each of the other 15 using four-variable bubble plots,
for example, would only require 35 graphs, not 1,820. Their figure is off by a factor of 52. The actual number
of graphs that would be needed is less than 2% of the figure that they claim. Even if we were looking for
correlations among 16 quantitative variables using scatter plots with only two variables each, that would only
require a total of 120 graphs. In fact, a scatter plot matrix could be used to display all of these scatter plots at
once. Even though this might require some scrolling around on the screen to examine every scatter plot, that
wouldn’t matter because we would only need to view one scatter plot at a time. A scatter plot matrix would
provide insights that could never be achieved using a single graph that attempts to encode 16 variables using
Virtual Cove’s approach.
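The arithmetic in this argument is easy to check. The following is a minimal Python sketch that uses the standard library's math.comb; the variable names are illustrative, and the count of 35 bubble plots is taken from the text as given rather than derived here.

```python
from math import comb

variables = 16

# Every possible set of 4 variables chosen from 16 (the figure the vendor cites).
all_four_variable_sets = comb(variables, 4)

# Every possible pair of variables, one two-variable scatter plot per pair.
scatter_plots = comb(variables, 2)

# Figure quoted in the text for four-variable bubble plots (assumed, not derived).
bubble_plots_needed = 35

overstatement_factor = all_four_variable_sets / bubble_plots_needed
share_of_claim = bubble_plots_needed / all_four_variable_sets

print(all_four_variable_sets)   # 1820
print(scatter_plots)            # 120
print(overstatement_factor)     # 52.0
print(f"{share_of_claim:.1%}")  # 1.9%
```

The 1,820 figure matches the number of ways to choose 4 variables from 16, which is consistent with the guess that every possible four-variable combination was counted.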
Given their egregious error, do you suspect that Virtual Cove might be making numbers up when they claim,
as they do on their website, that their software can “increase productivity by up to 400x”? A four-hundred-fold
Parallel Coordinates
A parallel coordinates plot uses 2-D position, most often vertically along Y axes, to encode a series of
variables. The example below displays six quantitative variables, each along its own Y axis.
In case you’re not familiar with parallel coordinates plots, let me briefly explain how they work. Let’s begin
by considering a single variable. In the example below, the prices in dollars for 25 products have been
represented by positioning 25 dots along the Y axis. When each value is represented by a dot along a single
quantitative scale in this manner to show how the values are distributed, the graph is called a strip plot.
[Figure: strip plot of 25 product prices positioned along a Y axis labeled “Price in U.S. $” (10 to 100)]
[Figure: parallel coordinates plot with six Y axes, each with its own quantitative scale]
So far, however, we cannot determine which dot represents which product. That would be useful if we want
to determine how the products compare to one another across the entire set of six variables. To make this
possible, a parallel coordinates plot would connect the dots for each product across each of the Y axes using
a line. In the example below, which displays multivariate data for 50 products, a particular line is highlighted to
feature a single product’s multivariate profile.
(Note: In this example, rather than assigning a separate quantitative scale to each variable, the scales have
been normalized by expressing each as percentages: the item with the lowest value is at the bottom with 0%
and the one with the highest value at the top with 100%. Because the purpose of a parallel coordinates plot is
not to decode individual values but instead to examine and compare multivariate patterns, the scales can be
normalized in this manner without a loss of relevant information.)
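The normalization described in this note can be sketched in a few lines of Python. This is one plausible reading (min-max rescaling), not necessarily how the original graph was produced; the function name and the sample prices are hypothetical.

```python
def normalize_to_percent(values):
    """Rescale values so the lowest maps to 0.0 and the highest to 100.0."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values are identical, so there is no spread to show.
        # Placing them mid-axis is an arbitrary choice made for this sketch.
        return [50.0 for _ in values]
    return [100.0 * (v - lowest) / span for v in values]


# Hypothetical prices for five products.
prices = [10, 25, 40, 70, 100]
normalized = normalize_to_percent(prices)
print(normalized)
```

Applying the same function to each variable separately puts every axis on a common 0% to 100% scale, so the lines connecting a product's values can be compared as patterns without regard to the original units.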
In this example, we have a single graph that displays six variables for 50 products, but a parallel coordinates
Without more thorough instruction in parallel coordinates, an example like the one above might appear
overwhelming, so you might doubt the ability of these graphs to present complex multivariate data in a way
that works for our brains. They do require extensive study and practice, which is one of the reasons why they
are not more familiar, but they can definitely be worth the effort if you need to compare complex multivariate
profiles. For a bit more explanation, I suggest that you read the newsletter article titled “Multivariate Analysis
Using Parallel Coordinates” that I wrote back in 2006.
Table Lenses
A table lens display also uses a series of axes, one per variable, arranged side by side, but the arrangement is
slightly different from parallel coordinates plots. Here’s a simple example of a five-variable table lens display:
In this case, the Y axis hosts a categorical scale that labels the items for which quantitative data is being
displayed (here, U.S. states), and the X axes host independent quantitative scales, one per variable.
When values are represented as bars, the horizontal position of each bar’s end and the length of each bar both
represent the same quantitative value. Unlike parallel coordinates, which are used to compare multivariate
profiles, table lenses are used to look for potential correlations among several quantitative variables at once.
Notice in the example above that the states have been ranked from the highest value at the top to the lowest
value at the bottom based on profit, the leftmost variable. Given this arrangement, we can now look at the
arrangements of bars from top to bottom in each of the other columns to see if any of the other variables exhibit
patterns that are similar to profit or are perhaps its inverse. If the arrangement of bars for one of the other
variables roughly displays a pattern ranging from high values at the top to low values at the bottom, this tells us
that it correlates with profit in a positive way. That is, as profit values per state decrease, values of sales also
tend to decrease. If, on the other hand, sales roughly exhibit a pattern of low values at the top to high values
at the bottom, this would tell us that it is still correlated with profit, but in a negative manner. That is, as profit
values decrease, sales values tend to increase.
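The ranking step, and the kind of pattern we then look for by eye, can be sketched in Python. The data and the third variable (returned_units) are hypothetical, and Pearson's correlation coefficient is used only as a numeric stand-in for the visual judgment; a table lens itself computes nothing of the sort.

```python
from math import sqrt

# Hypothetical rows: (state, profit, sales, returned_units)
rows = [
    ("A", 50, 500, 5),
    ("B", 90, 800, 2),
    ("C", 20, 300, 9),
    ("D", 70, 650, 4),
]

# Rank the rows from highest to lowest profit, as the table lens does.
ranked = sorted(rows, key=lambda r: r[1], reverse=True)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


profit = [r[1] for r in ranked]
sales = [r[2] for r in ranked]
returned_units = [r[3] for r in ranked]

# Positive: sales fall from top to bottom along with profit.
print(pearson(profit, sales) > 0)
# Negative: returned units rise from top to bottom as profit falls.
print(pearson(profit, returned_units) < 0)
```

In the sorted table, a column whose bars shrink from top to bottom corresponds to a positive coefficient, and a column whose bars grow from top to bottom corresponds to a negative one.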
A table lens can provide a useful way to look for correlations among many variables at once. The example
A table lens can display many variables in a single graph in a manner that works for our brains because it
encodes each using the same visual attribute—one that we can perceive with ease.
Many more graphs than the three that appear in the example above can be included in a small multiples
display and they can be arranged on the screen in various ways. The example above arranges the small
multiples horizontally in a single row, side by side, but they could also be arranged vertically, in a single
column. A large series could also be wrapped across multiple columns and rows—an arrangement that William
Cleveland called a trellis display.
Alternatively, a series of small multiples can be used to add two more categorical variables rather than just one.
In the example below, each column of graphs still displays customer segments, but now each row displays
product categories.
When small multiples are arranged in this way, with one variable along the rows and another along the
columns, I call it a visual crosstab.
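The layout of a visual crosstab can be sketched as a grid of panels keyed by two categorical variables. The following Python example is a minimal illustration with hypothetical records and names; it only organizes the data for each panel and leaves the drawing to whatever charting tool is in use.

```python
from collections import defaultdict

# Hypothetical records: (product_category, customer_segment, month, sales)
records = [
    ("Furniture", "Consumer", "Jan", 120),
    ("Furniture", "Corporate", "Jan", 200),
    ("Technology", "Consumer", "Jan", 340),
    ("Technology", "Corporate", "Jan", 410),
    ("Furniture", "Consumer", "Feb", 150),
    ("Technology", "Corporate", "Feb", 390),
]

# Each panel holds the data for one (row, column) cell of the crosstab.
panels = defaultdict(list)
for category, segment, month, sales in records:
    panels[(category, segment)].append((month, sales))

# Rows are product categories and columns are customer segments.
row_labels = sorted({r[0] for r in records})
col_labels = sorted({r[1] for r in records})

for category in row_labels:
    for segment in col_labels:
        # Every panel would be drawn as the same kind of graph on the same scales.
        print(category, segment, panels.get((category, segment), []))
```

Because every panel uses the same graph type and scales, the two categorical variables are carried by the panel's position in the grid rather than by additional visual attributes inside each graph.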
Even though a small multiples display consists of multiple graphs, because all the graphs work the same and
are all visible at once, we can easily and quickly compare them to one another. If we know how to read one
graph, we know how to read them all. This is a powerful way to increase the number of variables that can be
simultaneously displayed beyond the number that you could include in a single graph that encodes variables
using different visual attributes.
Conclusion
Wanting to break through our limitations is natural. We want to be better; we want to do more. We don’t
accomplish this, however, by ignoring our limitations. Ignorance is the path to delusion and dysfunction.
Software vendors don’t get any points for building and selling tools that simultaneously visualize a dozen or
more variables in ways that don’t work. When our limitations get in the way, we overcome them by using our
brains to find real solutions. We always begin by understanding our limitations. Parallel coordinates plots, table
lens displays, and small multiples are all innovations that demonstrate the merits of this approach. On the other
hand, the graph below shows what happens when we simply ignore our limitations.
This graph only displays eight variables, half the number that the vendor, Virtual Cove, claims to support, and
it’s already a virtual cave of worthless effects. We can only see that a few of the spheres (i.e., 3-D bubbles) are
much bigger than the rest and that one is much brighter as well. Imagine how much worse it would be if this
graph attempted to display 16 variables rather than 8.
The potential for understanding that resides in our data should not be wasted by chasing pipe dreams. The
path forward begins by understanding our limitations, not by pretending that they don’t exist.
• Show Me the Numbers: Designing Tables and Graphs to Enlighten, Second Edition
• Information Dashboard Design: Displaying Data for at-a-Glance Monitoring, Second Edition
• Now You See It: Simple Visualization Techniques for Quantitative Analysis
• Signal: Understanding What Matters in a World of Noise
• Big Data, Big Dupe: A Little Book about a Big Bunch of Nonsense
• The Data Loom: Weaving Understanding by Thinking Critically and Scientifically with Data
You can learn more about Stephen’s work and access an entire library of articles at www.PerceptualEdge.com.
Between articles, you can read Stephen’s thoughts on data visualization and data sensemaking in his two
blogs: www.PerceptualEdge.com/blog and www.Stephen-Few.com/blog.