Data Visualization Literacy
Data visualization literacy: Definitions, conceptual
frameworks, exercises, and assessments
Katy Börner^a,b,1, Andreas Bueckle^a, and Michael Ginda^a
^a School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47408; and ^b Educational Technology/Media Centre, Dresden University of Technology, 01062 Dresden, Germany
Edited by Ben Shneiderman, University of Maryland, College Park, MD, and accepted by Editorial Board Member Eva Tardos December 6, 2018 (received for
review May 20, 2018)
SOCIAL SCIENCES

In the information age, the ability to read and construct data visualizations becomes as important as the ability to read and write text. However, while standard definitions and theoretical frameworks to teach and assess textual, mathematical, and visual literacy exist, current data visualization literacy (DVL) definitions and frameworks are not comprehensive enough to guide the design of DVL teaching and assessment. This paper introduces a data visualization literacy framework (DVL-FW) that was specifically developed to define, teach, and assess DVL. The holistic DVL-FW promotes both the reading and construction of data visualizations, a pairing analogous to that of both reading and writing in textual literacy and understanding and applying in mathematical literacy. Specifically, the DVL-FW defines a hierarchical typology of core concepts and details the process steps that are required to extract insights from data. Advancing the state of the art, the DVL-FW interlinks theoretical and procedural knowledge and showcases how both can be combined to design curricula and assessment measures for DVL. Earlier versions of the DVL-FW have been used to teach DVL to more than 8,500 residential and online students, and results from this effort have helped revise and validate the DVL-FW presented here.

data visualization | information visualization | literacy | assessment | learning sciences

The invention of the printing press created a mandate for universal textual literacy; the need to manipulate many large numbers created a need for mathematical literacy; and the ubiquity and importance of photography, film, and digital drawing tools posed a need for visual literacy. Analogously, the increasing availability of large datasets, the importance of understanding them, and the utility of data visualizations to inform data-driven decision making pose a need for universal data visualization literacy (DVL). Like other literacies, DVL aims to promote better communication and collaboration, empower users to understand their world, build individual self-efficacy, and improve decision making in businesses and governments.

Pursuit of Universal Literacy
In what follows, we review definitions and assessments of textual, mathematical, and visual literacy and discuss an emerging consensus around the definition and assessment of DVL.

Textual literacy, according to the Organisation for Economic Co-operation and Development’s (OECD’s) Program for International Student Assessment (PISA), is the process of “understanding, using, reflecting on and engaging with written texts, in order to achieve one’s goals, to develop one’s knowledge and potential, and to participate in society” (1). Major tests for textual literacy are issued by PISA (2) and are regularly administered in over 70 countries to measure how effectively they are preparing students to read and write text. Another major international test, the Progress in International Reading Literacy Study (PIRLS), has measured reading aptitude for fourth graders every 5 years since 2001. For advanced students, the Graduate Record Examination Subject Tests are widely used to assess verbal reasoning and analytical writing skills for people applying to graduate schools (3, 4). A review of major international education surveys with varying degrees of global coverage and diverse intended age groups can be found in ref. 5.

Mathematical literacy (also referred to as “numeracy”) has been defined as an “understanding of the real number line, time, measurement, and estimation” as well as an “understanding of ratio concepts, notably fractions, proportions, percentages, and probabilities” (6). PISA defines it as “an individual’s capacity to formulate, employ, and interpret mathematics in a variety of contexts,” including “reasoning mathematically and using mathematical concepts, procedures, facts and tools to describe, explain and predict phenomena.” PISA administers standardized tests for math, problem solving, and financial literacy (7). The PISA 2015 Draft Mathematics Framework (8) explains the theoretical underpinnings of the assessment, the formal definition of mathematical literacy, the mathematical processes that students undertake when using mathematical literacy, and the fundamental mathematical capabilities that underlie those processes.

Visual literacy was initially defined as a person’s ability to “discriminate and interpret the visible actions, objects, and symbols natural or man-made, that he encounters in his environment” (9). In 1978, it was defined “as a group of skills which enable an individual to understand and use visuals for intentionally communicating with others” (10). More recently, the Association of College and Research Libraries defined standards, performance indicators, and learning outcomes for visual literacy (11, 12). In the academic setting, Avgerinou (13) developed and validated a visual literacy index by running focus groups of visual literacy experts, and Taylor (14) reviewed visual, media, and information literacy, arguing for the design of a visual language and coining the term “visual information literacy.”

DVL, also called “visualization literacy,” has been defined as “the ability to confidently use a given data visualization to translate questions specified in the data domain into visual queries in the visual domain, as well as interpreting visual patterns in the visual domain as properties in the data domain” (15); “the ability and skill to read and interpret visually represented data in and to extract information from data visualizations” (16); and “the ability to make meaning from and interpret patterns, trends, and correlations in visual representations of data” (17).

Other works have sought to advance the assessment of DVL. Boy et al. (15) applied item response theory (IRT) to assess

This paper results from the Arthur M. Sackler Colloquium of the National Academy of Sciences, “Creativity and Collaboration: Revisiting Cybernetic Serendipity,” held March 13–14, 2018, at the National Academy of Sciences in Washington, DC. The complete program and video recordings of most presentations are available on the NAS website at www.nasonline.org/Cybernetic_Serendipity.
Author contributions: K.B. designed research; K.B., A.B., and M.G. performed research; and K.B., A.B., and M.G. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. B.S. is a guest editor invited by the Editorial Board.
Published under the PNAS license.
1 To whom correspondence should be addressed. Email: [email protected].
Published online February 4, 2019.
the DVL-FW proposed here. Any alignment of previously proposed needs and tasks will be imperfect, as detailed definitions of terms do not always exist. An extended discussion of additional prior works and their tabular alignment can be found in ref. 33.

Data scales. Data variables may have different scales (e.g., qualitative or quantitative), influencing which analyses and visual encodings can be used. Building on the work of Stevens (39), the DVL-FW distinguishes nominal, ordinal, interval, and ratio data based on the type of logical mathematical operations that are permissible (Fig. 1). The approach subsumes Bertin’s (25) three data-scale types: qualitative, ordered, and quantitative, which roughly correspond to nominal, ordinal, and quantitative (also called “numerical”; includes interval and ratio). Bertin’s terminology was later adopted by MacEachren (29) and many other cartographers and information visualization researchers (27, 30, 38). Atlas of Knowledge: Anyone Can Map (33) has a more detailed discussion of different approaches and their interrelations.

Nominal data (e.g., job type) have no ranking but support equality checks. Ordinal data assume some intrinsic ranking but not at measurable intervals (e.g., chapters in a book). Interval- and ratio-scale data assume that the distance between any two adjacent values is equal. For interval data, the zero point is arbitrary (e.g., Celsius or Fahrenheit temperature scales), while for ratio data, there exists a unique and nonarbitrary zero point (e.g., length or weight). Logical mathematical operations permissible for the different data-scale types are given in Fig. 1.

Note that quantitative data can be converted into qualitative data (e.g., one may use thresholds to convert interval data into ordinal data). Ordinal rankings can be converted to yes/no categorical decisions (e.g., to make funding decisions). The reverse

visualized. While focusing on visualization construction and interpretation, the DVL-FW does cover different types of analyses that are commonly used to preprocess, analyze, or model data before they are visualized (Table 1, column 3). Five general types are distinguished: statistical analysis (e.g., to order, rank, or sort); temporal analysis answering “when” questions (e.g., to discover trends); geospatial analysis answering “where” questions (e.g., to identify distributions over space); topical analysis answering “what” questions (e.g., to examine the composition of text); and relational analysis answering “with whom” questions (e.g., to examine relationships; also called network analysis). Algorithms for the different types of analyses come from statistics, geography, linguistics, network science, and other areas of research. The tools used in the IVMOOC (see below) support more than 100 different temporal, geospatial, topical, and network analyses (42).

Visualizations. Any comprehensive and effective DVL-FW must contend with the many existing proposals for visualization naming and classification (33). For example, Harris (28) details hundreds of visualizations and distinguishes tables, charts (e.g., pie charts), graphs (e.g., scatterplots), maps, and diagrams (e.g., block diagrams, networks, Voronoi diagrams). Bertin (25) distinguishes diagrams, maps, and networks. Based on an extensive literature and tool review and with the goal of providing a universal set of visualization types, the DVL-FW identifies six general types: table, chart, graph, map, tree, and network visualizations (Table 1, column 4) (definitions and examples are in ref. 33).

In addition, the DVL-FW distinguishes between the reference system (or base map) and data overlays. Fig. 2 exemplifies typical reference systems for four visualization types. All four support the placement of data records (data records can be connected via linkages); color coding of table cells, graph areas, geospatial areas (e.g., in choropleth maps), or subnetworks; and the design of animations (e.g., changes in the number of data records over time). Some visualizations use a grid reference system (e.g., tables), while others use a continuous reference system (e.g., scatterplot graph or geospatial map). Some visualizations use lookup tables to position data [e.g., lookup tables for US zip codes to latitude/longitude values or journals to the position of scientific disciplines in science maps (43)]. One visualization can be transformed into another. For instance, changing the quantitative axes of a graph into categorical axes results in a table. Similarly, interpolation applied to discrete area geospatial (or topic) maps results in continuous, smooth surface elevation maps.

Prior research on DVL shows that people have difficulties reading most visualization types but especially networks (17, 18). Controlled laboratory studies examining the recall accuracy of relational data

Fig. 1. Logical mathematical operations permissible, measure of central tendency, and examples for different data scale types.
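
The scale hierarchy described under Data scales can be sketched in code. The following is a minimal Python illustration and not part of the DVL-FW itself. It assumes Stevens's cumulative scheme, in which each scale level inherits the operations permitted at the levels below it; the exact contents of Fig. 1 are not reproduced here. The names Scale, NEW_OPERATIONS, permissible_operations, and to_ordinal, as well as the temperature thresholds and labels, are hypothetical choices made for this example. The second function illustrates the threshold-based conversion of interval data into ordinal data mentioned in the text.

```python
from bisect import bisect_right
from enum import IntEnum


class Scale(IntEnum):
    """Data-scale types, ordered from least to most permissive (assumed ordering)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that first become meaningful at each level (assumption: Stevens's scheme).
NEW_OPERATIONS = {
    Scale.NOMINAL: {"==", "!="},   # equality checks only
    Scale.ORDINAL: {"<", ">"},     # ranking
    Scale.INTERVAL: {"+", "-"},    # differences between values
    Scale.RATIO: {"*", "/"},       # ratios, which need a nonarbitrary zero
}


def permissible_operations(scale):
    """Return all operations allowed at `scale`, including those of lower levels."""
    ops = set()
    for level in Scale:
        if level <= scale:
            ops |= NEW_OPERATIONS[level]
    return ops


def to_ordinal(value, thresholds, labels):
    """Map an interval-scale value to an ordinal label using ascending thresholds."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return labels[bisect_right(thresholds, value)]


if __name__ == "__main__":
    print(sorted(permissible_operations(Scale.ORDINAL)))
    # Hypothetical Celsius thresholds: below 10 is cold, 10 to below 25 is mild, else hot.
    for celsius in (-5.0, 18.0, 31.0):
        print(celsius, to_ordinal(celsius, [10.0, 25.0], ["cold", "mild", "hot"]))
```

The conversion is lossy by design: once temperatures are binned into labels, the original differences between values can no longer be recovered.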
symbol and graphic variable types (position, size, orientation, color, and luminance) together support what visual insight needs (e.g., the identification of outliers, trends, or clustering as shown in Fig. 5). These findings make it possible to order graphic variables by effectiveness and guide the selection and combination of variables when constructing data visualizations.

Interactions. The DVL-FW recognizes that, while some visualizations are static (e.g., printed on paper), many can be manipulated dynamically using diverse types of interaction. Shneiderman (32) identifies overview, zoom, filter, details on demand, relate (viewing relationships among items), history (keeping a log of actions to support undo, replay, and progressive refinement), and extract (access subcollections and query parameters). Keim (54) distinguishes zoom, filter, and link and brush as well as projection and distortion techniques as a means to provide focus and context. The typology proposed by Brehmer and Munzner (55) covers two main abstract visualization tasks. The first is “why,” which includes consume (present, discover, enjoy, produce), search (lookup, browse, locate, explore), and query (identify, compare, summarize). The second is “how,” which consists of encode, manipulate (select, navigate, arrange, change, filter, aggregate), and introduce (annotate, import, derive, record). Heer and Shneiderman (56) focus on the flexible and iterative use of visualizations by naming 12 actions ordered into three high-level categories: data and view specification (visualize, filter, sort, derive), view manipulation (select, manage, coordinate, organize), and process and provenance (record, annotate, share, guide). As before, the DVL-FW covers core interaction types, including zoom, search and locate, filter, details on demand, history, extract, link and brush, projection, and distortion (Table 1, column 7).

insight needs (Table 1, column 1). Just as a verbal math problem needs to be reformulated into a numerical math problem, the verbal or textual description of a real-world problem presented by a stakeholder must be operationalized (i.e., reformulated into a data visualization problem so that appropriate datasets, analysis and visualization workflows, and deployment options can be identified). Math assessment frameworks allocate up to one-half of the overall problem-solving effort for the translation of verbal to numerical problems; analogously, major effort is required to translate real-world problems into well-defined insight needs.

Acquire. Given well-defined insight needs, relevant datasets and other resources can be acquired. Data quality and coverage will strongly impact the quality of results, and much care must be taken to acquire the best dataset with data scales that support subsequent analysis and visualization.

Analyze. Typically, data need to be preprocessed before they can be visualized. This step can include data cleaning (e.g., identify and correct errors, deduplicate data, deal with missing data, anomalies, unusual distributions); data transformations (e.g., aggregations, geocoding, network extraction); and statistical, temporal, geospatial, topical, or relational network analyses (Table 1, column 3).

Visualize. This step can be split into two main activities: pick reference system (or base map) and design data overlay. The first activity is associated with selecting a visualization type, and the second activity is associated with mapping data records and
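
To make the Analyze step described above more concrete, the following minimal Python sketch shows one possible preprocessing pass: dropping records with missing values, removing duplicates, and aggregating counts by year. It is an illustration under stated assumptions, not the workflow used by the authors or by the IVMOOC tools. The record fields (id, year, count), the function names clean_records and aggregate_by_year, and the sample data are all hypothetical, and dropping incomplete records is only one of several ways to deal with missing data.

```python
from collections import defaultdict


def clean_records(records, required=("id", "year", "count")):
    """Drop records with missing required fields and keep one record per id."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Deal with missing data by skipping incomplete records (assumed policy).
        if any(rec.get(key) is None for key in required):
            continue
        # Deduplicate: keep only the first record seen for each id.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        cleaned.append(rec)
    return cleaned


def aggregate_by_year(records):
    """Data transformation: sum the count field per year, sorted by year."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["year"]] += rec["count"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Hypothetical sample data containing one duplicate and one missing value.
    sample = [
        {"id": 1, "year": 2017, "count": 3},
        {"id": 1, "year": 2017, "count": 3},
        {"id": 2, "year": 2018, "count": None},
        {"id": 3, "year": 2018, "count": 5},
        {"id": 4, "year": 2017, "count": 2},
    ]
    print(aggregate_by_year(clean_records(sample)))
```

The aggregated result is a small temporal table (year to total), which could then serve as input to the Visualize step.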
FW. Together, the assessments allow instructors to check the degree to which the students are meeting the learning objectives.

Discussion and Outlook
In this paper, we have presented a typology, process model, and exercises for defining, teaching, and assessing DVL. The DVL-FW combines and extends pioneering works by leading experts to arrive at a comprehensive set of core types and major process steps required for the systematic construction and interpretation of data visualizations. As a key contribution, this paper interlinks the typology and process steps and presents a set of DVL-FW exercises and assessments that can be used by anyone interested in measurably improving DVL. Early versions of the DVL-FW were implemented and tested in the IVMOOC over the last 6 years and have informed the DVL-FW typology, process model, exercises, and assessments presented here.

Controlled User Studies. Going forward, there is a need to run controlled user studies to understand difficulty levels for the diverse DVL-FW types and process steps and their combinations and to provide additional guidance for the construction of effective visualizations based on scientific evidence. Seminal studies by Cleveland and McGill (66) and Heer and Bostock (52) have examined the effectiveness of different visual encodings. A similar study design can be used to examine the effectiveness of a larger range of graphic symbol types, variable types, and their combinations (53). Work by Wainer (67) and Boy et al. (15) used IRT to compute DVL scores for the interpretation of different graph visualizations. IRT was also used in the IVMOOC to assess student DVL when constructing visualizations, but more work is needed to optimally use the DVL-FW for teaching visualization construction.

User Studies in the Wild. In addition to laboratory experiments, there is a need to understand how general audiences can construct and interpret data visualizations in real-world settings using so-called “research in the wild” (68). Building on prior work assessing the DVL of science museum visitors (17), we are developing a museum experience that lets visitors first generate and then visualize their very own data using a so-called “Make-a-Vis” (MaV) setup. MaV is aligned with the DVL-FW and supports the mapping of data to visual variables via the drag and drop of column headers to axis and legend areas in a data visualization. The active learning setup aims to empower learners to become producers and creators across the lifespan, in line with recommendations found in How People Learn II: Learners, Contexts, and Cultures (69).

vant dataset(s), construct an appropriate visualization, and verbally communicate key insights to the client.

Horizontal transfer. The DVL-FW aims to ease the transfer of knowledge across visualization-type reference systems (Fig. 2). Knowledge on how to construct graphs with diverse data overlays should make it easier to read and construct other visualization types. For example, Friel et al. (64) propose a sequence for the introduction of graph-type visualizations to students of different ages. Additional user studies are needed to determine how prior knowledge impacts the reading and construction of visualizations so that the typology and process steps can be taught most effectively.

Reciprocity. Recent work shows that visualization construction (i.e., starting with a reference system and then adding graphic symbols and additional graphic variables) leads to better understanding and interpretation of the visualization than deconstructing a complete visualization (70). Additional user studies are needed to determine the strength of transfer between constructing and reading visualizations of different types and which construction workflows are most effective for increasing DVL.

Outlook. DVL is of increasing importance for making sound decisions in one’s personal and professional life. Existing literacy tests (reviewed in Pursuit of Universal Literacy) include statistical graphs as part of mathematical and financial literacy tests. In the United States, K–12 national standards for math and science cover statistical graphs (71, 72) and geospatial maps (72). However, most exercises ask students to read (not construct) data visualizations; topical or network analyses and visualizations are rarely covered. Adding DVL exercises and assessments to existing tests or establishing separate DVL tests will make it possible to assess how effectively different classes, schools, corporations, countries, etc. are preparing students to read and construct data visualizations; what interventions and exercises work for what age groups and industries/research areas; and how to further improve DVL typology, processes, exercises, and assessments via a close collaboration among academic and industry experts, learning scientists, instructional developers, teachers, and learners.

ACKNOWLEDGMENTS. We thank Anna Keune and the anonymous reviewers for their extensive expert comments on an earlier version of this paper. We appreciate the support of figure design by Tracey Theriault and Leonard E. Cross and copyediting by Todd Theriault. This work was partially supported by a Humboldt Research Award, NIH Awards U01CA198934 and OT2OD026671, and NSF Awards 1713567, 1735095, and 1839167. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.