Task-Based Effectiveness of Basic Visualizations
Abstract—Visualizations of tabular data are widely used; understanding their effectiveness in different task and data contexts is
fundamental to scaling their impact. However, little is known about how basic tabular data visualizations perform across varying data
analysis tasks. In this paper, we report results from a crowdsourced experiment to evaluate the effectiveness of five small scale (5-34
data points) two-dimensional visualization types—Table, Line Chart, Bar Chart, Scatterplot, and Pie Chart—across ten common data
analysis tasks using two datasets. We find that the effectiveness of these visualization types varies significantly across tasks, suggesting that
visualization design would benefit from considering context-dependent effectiveness. Based on our findings, we derive recommendations
on which visualizations to choose based on different tasks. We finally train a decision tree on the data we collected to drive a
recommender, showcasing how to effectively engineer experimental user data into practical visualization systems.
arXiv:1709.08546v3 [cs.HC] 24 Apr 2018
1 INTRODUCTION
Eells [13] investigated the effectiveness of a proportional comparison (percentage estimation) task in divided (stacked) bar charts and pie charts. Eells asked participants to estimate the proportions in pie charts and bar charts. He found pie charts to be as fast as and more accurate than bar charts for proportional comparison tasks. He also found that as the number of components increases, divided bar charts become less accurate but pie charts become more accurate (a maximum of five components was considered). In a follow-up study with a different setting, Croxton and Stryker [8] also tested the effectiveness of divided bar charts and pie charts using a proportional comparison task. They also found pie charts to be more accurate than divided bar charts in most cases, but contrary to Eells' study, not all.

Spence et al. [35] studied the effectiveness of bar charts, tables and pie charts. They found that when participants were asked to compare combinations of proportions, pie charts outperformed bar charts. Their results also show that for tasks where participants were asked to retrieve the exact value of proportions, tables outperform pie charts and bar charts. In another study comparing the effectiveness of bar charts and line charts, Zacks and Tversky [43] indicated that when participants were shown these two types of visualizations and asked to describe the data, they consistently used bar charts to reference the compared values (e.g., A is 10% greater than B), whereas with line charts, participants described trends.

A study by Siegrist [32] was one of the first to compare 2D with 3D visualizations. Siegrist found no significant difference between 2D and 3D bar charts in terms of accuracy. However, participants using 3D bar charts took slightly longer to perform tasks. In addition, Siegrist found that the accuracy of perceiving 3D pie charts is significantly lower than for 2D ones, probably because some of the slices in the 3D pie charts are more obscured. Harrison et al. [17] measured the effectiveness of different visualizations for explaining correlation, finding that parallel coordinates and scatterplots are best at showing correlation. They also found that stacked bar charts outperform stacked area and stacked line charts. In a follow-up study, Kay and Heer [20] reanalyzed the data collected by Harrison et al. [17]; the top-ranking visualization remained the same.

While these independent studies provide helpful generic guidelines, they were conducted under different conditions, with varying sample sizes and datasets, and for a disparate set of tasks. In fact, several of these studies used manually created visualizations in their experiments without using actual datasets [8], [13], [35], [43] or created visualizations using artificial datasets [17]. Also, these earlier studies typically conducted experiments using atomic generic tasks such as comparison of data values (e.g., [8], [43]) or estimation of proportions (e.g., [13], [34], [35]). However, many visual analysis tasks (e.g., filtering, finding clusters) require integrating results from multiple atomic tasks, limiting the applicability of earlier findings [1], [2]. The inconsistency in experimental settings and the limited atomic tasks used in previous work encourage studying the effectiveness of visualization types for a larger spectrum of tasks in a more consistent setting.

3 STUDY DESIGN

When deciding which visualization types to include in our experiment, we balanced the familiarity of the visualizations considered with the comprehensiveness of the experiment. On the one hand, we would like to have more generalizable results, which suggested considering a broad set of visualization techniques in our experiment. At the same time, we would like our study to have members of the general public as our participants; this suggested including a set of visualization techniques that are understandable by all participants. Building on previous work [23] and investigations of the visualization techniques supported by different visualization tools (e.g., Microsoft Excel, Tableau, Spotfire, QlikView, Adobe Analytics, IBM Watson Analytics), we decided to include five well-recognized visualization techniques in our study: Bar Chart, Line Chart, Scatterplot, Table, and Pie Chart (see Figure 1).

3.1 Datasets

Selecting Datasets: To create visualizations for our experiment, we selected datasets where the participants were unfamiliar with the content, but familiar with the meaning of the data attributes used in the dataset. This is particularly important since we did not want user performance to be affected by how familiar participants are with the meaning of the data attributes.

We first selected five different datasets: Cereals [28], Cars [18], Movies [10], Summer Olympics Medalists [10], and University Professors [28]. We then printed a part of each dataset on paper and showed them to six pilot participants (4 male, 2 female). We asked participants: "Please look at the data attributes used in each of these datasets. Which datasets do you feel contain data attributes that you are more familiar with?" Cars and Movies were the datasets that five out of the six participants selected. The Cars dataset [18] provides details for 407 new cars and the Movies dataset [10] provides details for 335 movies released from 2007 to 2012.

Data Attribute Types: Both datasets include data attributes of Nominal, Ordinal, and Numerical types. We define the Nominal data attribute type as categorically discrete data, such as types of cars (e.g., Sedan, SUV, Wagon). Ordinal is defined as quantities within a specific range that have a natural ordering, such as ratings of movies (the number of unique data values ranged from 6 to 12). We define Numerical as continuous numerical data, such as profit values of movies. We generated visualizations using pairwise combinations of the three types of data attributes available in our datasets (e.g., Nominal * Numerical or Ordinal * Numerical).

Data Sampling: During our informal pilot study, we generated visualizations representing different numbers of data points, ranging from 50 to 300 in increments of 50. In these visualizations, each visual mark (e.g., a circle or a bar) represented a data point. We noticed our pilot participants faced two challenges using static visualizations containing more than 50 visual marks. First, participants had difficulty performing some of the tasks (e.g., compute derived value and characterize distribution) using static visualizations (the error rate increased and in some cases participants gave up). In addition, in some cases participants had to spend more than two minutes performing the tasks. Due to practical limitations of conducting the study (e.g., length and complexity of the experiment) with a high number of visual marks, we decided not to show more than 50 visual marks at a time. We had two options for not showing all the data points in our datasets.

First, we could pick a subset of data points and create visualizations using only that subset. In that case, each visual mark would represent a data point in our dataset. We could then create a bar chart showing manufacturers on the x-axis and price on the y-axis; each bar represents a data point (a car) and the y-axis is the absolute price value for each car.
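To make the two sampling options concrete, here is a toy sketch in Python (the car records below are made up for illustration; they are not from the study's Cars dataset). The first option keeps one visual mark per data point, while the cardinality-based option keeps one mark per category of the paired attribute:

```python
from collections import defaultdict

# Hypothetical (manufacturer, price) records, NOT the study's actual data.
cars = [("Toyota", 24000), ("Toyota", 27000), ("BMW", 41000),
        ("BMW", 45000), ("BMW", 52000), ("Honda", 22000)]

# Option 1: one visual mark per data point (one bar per car).
marks_per_point = [price for _, price in cars]  # 6 marks for 6 cars

# Option 2: one visual mark per category of the paired attribute;
# the numerical attribute is averaged within each category.
by_manufacturer = defaultdict(list)
for manufacturer, price in cars:
    by_manufacturer[manufacturer].append(price)
avg_price = {m: sum(p) / len(p) for m, p in by_manufacturer.items()}  # 3 marks

print(len(marks_per_point))  # 6
print(avg_price["BMW"])      # 46000.0
```

Under option 2, the number of marks is bounded by the attribute's cardinality rather than by the dataset size, which is what keeps the mark count between 5 and 34 in the study.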
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
Fig. 1. Five visualization types used in this study. In this figure, each visualization shows the average highway miles per gallon (a numerical data attribute) for cars with different numbers of cylinders (an ordinal data attribute).
Second, we could use the cardinality of the data attributes to define how many visual marks (e.g., bars in a bar chart) should be shown on a visualization. For example, imagine a bar chart that has manufacturers on the x-axis and price on the y-axis. In this case, we show 8 bars, each representing a manufacturer (e.g., Toyota, BMW, etc.), with the average price for each car manufacturer on the y-axis. Thus, the glyphs represent not the data points but the cardinality of the paired data attribute. This approach requires an averaged data attribute on one of the axes (e.g., average price for different manufacturers). The cardinalities of data attributes that were less than 50 ranged from 5 (minimum number of visual marks) to 34 (maximum number of visual marks). In our study design, we went with this second approach.

3.2 Tasks

We selected the tasks for our study based on two considerations. First, tasks should be drawn from those commonly encountered while analyzing tabular data. Second, the tasks should be present in existing task taxonomies and often used in other studies to evaluate visualizations.

Previously, Amar et al. [1] proposed a set of ten low-level analysis tasks that describe users' activities while using visualization tools to understand their data. First, these tasks are real-world tasks because users came up with them while exploring five different datasets with different visualization tools. Second, different studies have used these tasks to evaluate the effectiveness of visualizations. With this in mind, we used the low-level taxonomy by Amar et al. [1], described below.

Find Anomalies. We asked participants to identify any anomalies within a given set of data points with respect to a given relationship or expectation. We crafted these anomalies manually so that, once noticed, it would be straightforward to verify that the observed value was inconsistent with what would normally be present in the data (e.g., movies with zero or negative length would be considered abnormal). For example, which genre of movies appears to have abnormal length?

Find Clusters. For a given set of data points, we asked participants to count the number of groups of similar data attribute values. For example, how many different genres are shown in the chart below?

Find Correlation. For a given set of two data attributes, we asked participants to determine if there is a correlation between them. To verify the responses to correlation tasks, we computed Pearson's correlation coefficient (r) to ensure that there was a strong correlation (r ≤ −0.7 or r ≥ 0.7) between the two data attributes. For example, is there a strong correlation between average budget and movie rating?

Compute Derived Value. For a given set of data points, we asked participants to compute an aggregate value of those data points. For example, what is the sum of the budget for the action and the sci-fi movies?

Characterize Distribution. For a given set of data points and an attribute of interest, we asked participants to identify the distribution of that attribute's values over the set. For example, what percentage of the movie genres have an average gross value higher than 10 million?

Find Extremum. For this task, we asked participants to find data points having an extreme value of a data attribute. For example, which car has the highest number of cylinders?

Filter. For given concrete conditions on data attribute values, we asked participants to find data points satisfying those conditions. For example, which car types have city miles per gallon ranging from 25 to 56?

Order. For a given set of data points, we asked participants to rank them according to a specific ordinal metric. For example, which of the following options contains the correct sequence of movie genres, if you were to put them in order from largest average gross value to lowest?

Determine Range. For a given set of data points and an attribute of interest, we asked participants to find the span of values within the set. For example, what is the range of car prices?

Retrieve Value. For this task, we asked participants to identify the values of attributes for given data points. For example, what is the value of horsepower for the cars?

3.3 Visualization Design

To generate visualizations, we used three pairwise combinations of the three different data attribute types available in our datasets. In particular, we used Nominal * Numerical, Ordinal * Numerical, and Numerical * Numerical. We did not include Nominal * Nominal because it is not possible to represent this combination using all five visualizations considered in this study (e.g., line chart).

To create Scatterplots, Bar Charts, and Line Charts, we used the same length, font size, and color to draw their x-y axes. In addition, all the visual elements (e.g., bars in a bar chart) used in the three charts had the same blue color. Unlike the other visualizations, pie charts do not have any axis to read values from. That is, to create Pie Charts we had to make design decisions on how to show the values of the two data attributes used to generate them. The main design decision that we had to make for Pie Charts was whether to include legends. Instead of having legends, we could potentially add labels on top of the slices of Pie Charts. We tried putting the labels on top of the slices, but this caused visual clutter, particularly in cases where the labels were long. Additionally, using legends for Pie Charts is a common practice in the majority of commercial visualization dashboards [36], [39]. We decided not to show any value on top of the slices of Pie Charts, instead showing the values of one data attribute using a legend and the other beside the slices. For Tables, we separated different rows of the table using light gray lines. We used a darker background color to make the labels (the two data attributes used for creating the table) distinguishable. See Figure 1 for more details.
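The |r| ≥ 0.7 criterion used to vet correlation questions can be sketched in a few lines of Python (the helper below is illustrative; the paper does not describe its implementation, and the values are made up):

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def strong_correlation(xs, ys, threshold=0.7):
    """True when |r| >= threshold, the criterion used for correlation tasks."""
    return abs(pearson_r(xs, ys)) >= threshold

# Toy attribute pair: perfectly linearly related values.
budgets = [10, 20, 30, 40]
ratings = [2.0, 4.0, 6.0, 8.0]
print(strong_correlation(budgets, ratings))  # True (r = 1.0)
```

Only attribute pairs passing this check would be used for Find Correlation questions.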
[Fig. 2. Example questions from the experiment interface: (a) a Retrieve Value task over a bar chart of average Highway Miles Per Gallon for five car types ("What is the value of Highway Miles Per Gallon for the type Wagon?"), (b) a Determine Range task over average movie budget by rating ("Movies with what ratings have the budget ranging from 115 to 190?"), and a question over movie ratings by genre asking which genre has the highest rating.]

at least 100 approved HITs as a quality check. We implemented our experiment as a web application hosted on a server external to MTurk. Participants accessed the experiment through a URL link posted on the MTurk site. Each worker could participate in our study only once. The study took about 25 to 40 minutes to complete, and we compensated the workers who participated $4.

In order to determine the minimum number of participants needed for our study, we first conducted a pilot study with 50 participants on Amazon's Mechanical Turk. Based on the data collected from our pilot study, we conducted a statistical power analysis to ensure that our experiment included enough participants to reliably detect meaningful performance differences across the independent variables of the experiment. Our power analysis, based on the results of the pilot study, indicated that at least 160 participants would be required to detect a large effect.

After determining the number of subjects required to participate in our study, we recruited 203 workers. Among the 203 who participated, 180 (105 male, 75 female) completed the study. The ages of our workers ranged from 25 to 40 years. All workers who participated in our experiment were based in the United States and had used visualizations before. 107 of the participants had experience creating visualizations using Microsoft Excel. Five of the participants also had experience creating visualizations using Tableau.

4.2 Procedure

Training. Before starting the main experiment, participants were briefed about the purpose of the study and their rights. At this stage, the participants were also asked to answer some demographic questions (e.g., age, sex, and prior experience in creating visualizations). Participants were then asked to perform five trial questions (one question per visualization) as quickly and accurately as possible; for each participant, these training questions were presented in a random order. During this session, after answering each question participants received feedback that showed
2. https://github.com/gtvalab/ChartsEffectiveness
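The paper does not detail its power analysis, so the following is an illustration only: a textbook normal-approximation sample-size calculation for a two-group comparison at a large standardized effect size (Cohen's d = 0.8). The study's actual analysis used a repeated-measures design and arrived at 160 participants.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate per-group sample size needed for a two-sample comparison
    to detect standardized effect size d (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(n_per_group(0.8))  # 25 per group under these textbook assumptions
```

Smaller target effect sizes require more participants, which is why detecting only "a large effect" keeps the required sample manageable.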
Fig. 3. Pairwise relation between visualization types across tasks and performance metrics. Arrows show that the source is significantly better than
the target.
experiment, we asked participants to "Please enter the criteria you used for ranking the charts along with any other additional comments you have about the experiment in general". This was to allow the participants to convey their feedback and to solicit potentially unexpected insights.

Questions (training questions, main experiment questions, and ranking questions) were pre-generated in an iterative process by all three authors over multiple sessions. After each session, we conducted a pilot study to surface the major problems with the designed questions. We had two criteria while designing questions for our experiment. First, for our study to have a reasonable length and complexity, we had to design questions with a reasonable level of difficulty; questions with a high level of difficulty could frustrate the participants. To calibrate difficulty, we designed the questions so that the average response time for a single question was in the range of 10 to 40 seconds. Second, questions should be balanced across the different datasets and present comparable values. For example, if a categorical attribute in the Movies dataset had five categories, we tried to pick a variable from the Cars dataset that also had five (or around five) categories.

4.3 Data Analysis

To analyze the differences among the various visualizations, for each participant we calculated mean performance values for each task and visualization type. That is, we averaged the time and accuracy of questions for each visualization type and task. Before testing, we checked that the collected data met the assumptions of the appropriate statistical tests. The assumption of normality was not satisfied for performance time; however, normality was satisfied for the log transformation of the time values, so we treated log-transformed values as our time measurements. We conducted a repeated-measures analysis of variance (ANOVA) for each task independently to test for differences among the various visualizations, datasets, and their interactions with one another. While Visualization had significant effects on both accuracy and time, Dataset had no significant effect on accuracy or time.

5 RESULTS

We first give an overview of our analysis of the results and then discuss them in detail for each task. We provide a detailed analysis of the results in Table 1. Throughout the following sections, accuracy refers to values in percentages (%) and time refers to values in seconds.

Results, aggregated over tasks and datasets, show that Bar Chart is the fastest and the most accurate visualization type. This result is in line with prior work on graphical perception showing that people can decode values encoded with length faster than other encodings such as angle or volume [5], [33], [40]. Conversely, Line Chart has the lowest aggregate accuracy and speed. However, Line Chart is significantly more accurate than the other charts for Correlation and Distribution tasks. This finding concurs with earlier research reporting the effectiveness of line charts for trend-finding tasks (e.g., [43]). Nonetheless, the overall low performance of Line Chart is surprising and, for some tasks, can be attributed to the fact that the axis values ("ticks") were drawn at intervals, which makes it difficult to precisely identify the value for a specific data point. While Pie Chart is comparably as accurate and fast as Bar Chart and Table for the Retrieve, Range, Order, Filter, Extremum, Derived and Cluster tasks, it is less accurate for the Correlation, Anomalies and Distribution tasks. Pie Chart is the fastest visualization for performing the Cluster task. The high performance of Pie Chart
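The log-transform step in the data analysis above can be illustrated with synthetic data: right-skewed response times become roughly symmetric after taking logs. (The study would have used formal normality tests rather than this simple skewness statistic; the numbers here are generated, not observed.)

```python
import math
import random

def skewness(values):
    """Sample skewness: the third standardized moment."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

random.seed(42)
# Synthetic response times (seconds): log-normally distributed, right-skewed.
times = [math.exp(random.gauss(2.5, 0.6)) for _ in range(500)]
log_times = [math.log(t) for t in times]

print(round(skewness(times), 2))      # clearly positive (right-skewed)
print(round(skewness(log_times), 2))  # near zero after the log transform
```

Because the logged times are approximately normal, parametric tests such as repeated-measures ANOVA can be applied to them.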
TABLE 1
This table shows performance results for 10 different tasks. Performance results for each task are shown using three sub-charts: mean accuracy results are shown on the left (measured in percentage), mean time results are shown in the middle, and user preferences/rankings are shown on the right (1 = least preferred, 5 = most preferred). Statistical test results are shown below the charts. All tests display 95% confidence intervals and are Bonferroni-corrected.
Find Anomalies:
Accuracy: (F(3.4, 4915.1) = 3.03, p < 0.05, ηp² = 0.15). Bonferroni-corrected post-hoc comparisons showed that Line Chart was significantly less accurate than Scatterplot (p < 0.05).
Time: (F(4, 68) = 0.48, p < 0.05, ηp² = 0.27). Post-hoc comparisons indicate that Bar Chart was significantly faster than Line Chart and Table (p < 0.05). This might be because people can decode values encoded with length faster than other encodings such as angle or distance [5], [33], [40].
Preference: (F(3.1, 45.56) = 5.9, p < 0.05, ηp² = 0.26). Pairwise comparisons show that user preference for performing Anomalies tasks using Bar Chart and Scatterplot was significantly higher than for Pie Chart and Line Chart (p < 0.05).

Find Clusters:
Accuracy: (F(2.6, 45065.1) = 60.7, p < 0.05, ηp² = 0.78). Bonferroni-corrected post-hoc comparisons show that Pie Chart and Bar Chart were significantly more accurate than the other visualizations (p < 0.05).
Time: (F(3.9, 67.9) = 6.9, p < 0.05, ηp² = 0.29). Pie Chart and Bar Chart were significantly faster than Table (p < 0.05) and Line Chart (p < 0.05). We believe that uniquely coloring the different slices of pie charts improved the performance of Pie Chart for this type of task.
Preference: (F(2.9, 188.56) = 30.2, p < 0.05, ηp² = 0.64). User preferences for Bar Chart and Table were significantly higher than for the other visualizations (p < 0.05). While the preference for Bar Chart can be explained by its high accuracy and speed, it is surprising that Table was also highly preferred by users for Cluster tasks.

Characterize Distribution:
Accuracy: No significant main effect was found.
Time: (F(4, 68) = 5.6, p < 0.05, ηp² = 0.25). Our results indicate that Scatterplot and Bar Chart are significantly faster than Pie Chart (p < 0.05) and Table (p < 0.05) for Distribution tasks. Previous work also showed the fast speed of Scatterplot for correlation tasks [17], [20].
Preference: (F(2.5, 20528.2) = 12.1, p < 0.05, ηp² = 0.41). Participants preferred Bar Chart, Scatterplot, and Table significantly more than Pie Chart (p < 0.05) and Line Chart (p < 0.05). It is surprising that even though Table was not faster than the other four visualizations, participants highly preferred using it.

Find Extremum:
Accuracy: No significant main effect was found.
Time: (F(4, 0.4) = 10.4, p < 0.05, ηp² = 0.38). Bar Chart is significantly faster than Table (p < 0.05) and Pie Chart (p < 0.05). Previous work also recommends using Bar Chart in cases where readers are looking for maximum or minimum values [15].
Preference: (F(2.8, 89.4) = 8.2, p < 0.05, ηp² = 0.61). There is a significant main effect of Visualization on user preference. For Extremum tasks, participants' preference for bar charts is significantly higher than for all other visualizations (p < 0.05).
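The partial eta squared values reported above can be recovered from the F statistics and their degrees of freedom via ηp² = F·df1 / (F·df1 + df2). A quick sanity check against one of the reported results, F(4, 68) = 5.6 with ηp² = 0.25 (values taken from the text; the helper itself is illustrative):

```python
def partial_eta_squared(f_stat, df1, df2):
    """Partial eta squared recovered from an F statistic and its
    degrees of freedom: eta_p^2 = F*df1 / (F*df1 + df2)."""
    return f_stat * df1 / (f_stat * df1 + df2)

# One of the reported results: F(4, 68) = 5.6, reported eta_p^2 = 0.25.
print(round(partial_eta_squared(5.6, 4, 68), 2))  # 0.25
```

Results with sphericity-corrected (fractional) degrees of freedom will match only approximately after rounding.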
[Table 1 fragment (Order and Retrieve Value): visualization types listed from best to worst by Accuracy (Bar chart, Pie chart, Scatterplot, Table, Line chart), Time (Bar chart, Line chart, Scatterplot, Table, Pie chart), and Preferences (Bar chart, Table, Scatterplot, Line chart, Pie chart).]
Accuracy: No significant main effect was found for either task.
participants stated: "Just by how accurate I felt my own answer was, and how easy it was to derive the answer from the graphs." Neither accuracy nor speed appears to be the only criterion by which participants decided their individual rankings. Additionally, perceived accuracy does not always match task accuracy. We noticed that for some task types, such as Distribution and Cluster, preference for using tables and bar charts was significantly higher than for other visualizations, even though these two visualizations are not the most effective ones for these types of tasks. Interestingly, some of the participants took their familiarity with visualizations into account as one of the factors for preferring some visualizations over others. For example, one of the participants mentioned: "I just went with the ones I felt were familiar to me." Another participant also stated: "I deal with bars a lot. I know how to read them."

6.3 Which Visualization Type to Use?

Based on our results, when time, accuracy and preference are important factors to consider, we provide the following guidelines:

G1. Use bar charts for finding clusters. Our results show that pie charts and bar charts are significantly faster and more accurate for this type of task. However, user preference for bar charts was significantly higher than for pie charts for finding clusters. Thus, bar charts have the better overall performance in terms of time, accuracy, and user preference for finding clusters.

G2. Use line charts for finding correlations. We found that line charts and scatterplots have significantly higher accuracy and speed for finding correlations. However, user preference for line charts for finding correlations was significantly higher than for scatterplots. Thus, line charts performed better in terms of time, accuracy and user preference.

G3. Use scatterplots for finding anomalies. Results of our study indicate that scatterplots have high accuracy and speed, and are highly preferred by users for this type of task.

G4. Avoid line charts for tasks that require readers to precisely identify the value of a specific data point. The low performance of line charts for some tasks, such as Derived Value and Cluster, might be attributed to the fact that the axis values (i.e., the "ticks") were drawn at uniform intervals. This makes it difficult to precisely identify the value of a specific data point.

G5. Avoid using tables and pie charts for correlation tasks. Findings indicate that tables and pie charts are significantly less accurate, slower, and less preferred by users for this type of task.

6.4 How to Engineer Empirical User Performance Data into Practical Systems?

Graphical perception experiments are the workhorse of our quest to understand and improve the effectiveness of visualizations. The guidelines and heuristics that we use today in data visualization are primarily due to the accumulation of experimental results over decades. It is not, however, always possible to extract guidelines from data collected by user experiments. Even when this is possible, such derived guidelines require visualization practitioners to manually incorporate them into visualization systems. We believe machine learning models provide a practical opportunity to implicitly engineer the insights embodied by empirical performance data into visualization systems in an unbiased and rigorous manner. Kopol is a basic example of how this can be achieved. To drive Kopol, we train a decision tree on the data we collected. Kopol then uses the learned model to recommend visualizations at "test" time for given user tasks and datasets.

7 LIMITATIONS AND FUTURE WORK

Our experimental results should be interpreted in the context of the specified visualizations, tasks, and datasets. That said, we tested the most common visualization techniques incorporated in various visualization dashboards [23], analytical tasks used in different studies [1], [2], and datasets used in various studies [10], [28]. Additional studies are required to test our research questions taking into account different visualization techniques, tasks and datasets.

In this study, participants were required to perform the tasks using static visualizations. While we are aware of the importance of interactivity and the fact that interactivity could impact user experience with a specific visualization, we decided to exclude interactivity for the following reasons. First, adding interactivity increases the complexity of the study design. In fact, it would require us to take into account another set of factors, including users' input devices such as mouse, trackpad, and touch. Moreover, we would have to take interaction design and implementation into account; for example, the implementation of each interaction varies across different input devices. Second, static visualizations are commonly used for presentation and educational purposes (e.g., visualizations used in books, newspapers, and presentations). In many of these cases, visualization consumers still need to perform a variety of tasks using static visualizations. That being said, we encourage additional studies to directly investigate the effectiveness of these visualizations taking interactivity into account.

Due to practical limitations of conducting the study using static visualizations with a large number of visual marks (e.g., the length and complexity of the experiment), the number of visual marks shown in the visualizations used in our study is restricted to between 5 and 34. We used the cardinality of the data attributes to define how many visual marks (e.g., bars in a bar chart, circles in a scatterplot) should be shown in a visualization. However, we would like to emphasize that the performance of these visualization types might change depending on the number of data points encoded by them. Our study results hold for static visualizations with between 5 and 34 visual marks. We defer investigation of how data point cardinality affects the task-based performance of visualizations to future work.

In this study, we investigated the effectiveness of five basic two-dimensional visualization types. However, some of these visualization types can be extended to more than two dimensions (e.g., line chart). The performance of these visualization types might change depending on their dimensionality. One interesting avenue of continued research is to investigate the impact of the number of dimensions represented by a visualization type on its effectiveness.

8 CONCLUSION

In this work, we report the results of a study that gathers user performance and preference data for ten common data analysis tasks performed using five small-scale (5-34 data points) two-dimensional visualization types: Table, Line Chart, Bar Chart, Scatterplot, and Pie Chart. We use two different datasets to further support the ecological validity of the results. We find that the effectiveness of the visualization types considered changes significantly from one task to another. We compile our findings into a set of recommendations to inform data visualization in practice.
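The paper does not include Kopol's implementation, so the following is only a minimal sketch of the idea using scikit-learn: a decision tree trained on rows of (task, attribute-type pair, number of marks) mapped to a best chart type. The training rows here are made up to mirror guidelines G1-G3, not the study's collected data.

```python
# Toy decision-tree visualization recommender in the spirit of Kopol.
# Training rows are illustrative only, NOT the study's experimental data.
from sklearn.tree import DecisionTreeClassifier

TASKS = {"find_clusters": 0, "find_correlation": 1, "find_anomalies": 2}
ATTR_PAIRS = {"nominal*numerical": 0, "ordinal*numerical": 1,
              "numerical*numerical": 2}

# Features: [task id, attribute-pair id, number of visual marks]
X = [
    [TASKS["find_clusters"],    ATTR_PAIRS["nominal*numerical"],    8],
    [TASKS["find_clusters"],    ATTR_PAIRS["ordinal*numerical"],   12],
    [TASKS["find_correlation"], ATTR_PAIRS["numerical*numerical"], 30],
    [TASKS["find_correlation"], ATTR_PAIRS["ordinal*numerical"],   20],
    [TASKS["find_anomalies"],   ATTR_PAIRS["numerical*numerical"], 25],
    [TASKS["find_anomalies"],   ATTR_PAIRS["nominal*numerical"],   15],
]
# Labels follow guidelines G1-G3: bar charts for clusters, line charts
# for correlations, scatterplots for anomalies.
y = ["bar", "bar", "line", "line", "scatter", "scatter"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)
query = [[TASKS["find_clusters"], ATTR_PAIRS["nominal*numerical"], 10]]
print(model.predict(query)[0])  # bar
```

In a real system, the training rows would be the per-condition accuracy, time, and preference measurements from the experiment, and the model could be retrained as new experimental data accumulates.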