Guidelines For Pursuing and Revealing Data Abstractions
Fig. 1. A summary of study events over time, their temporal relationship with memos, memo relationships with codes, and code
relationships with themes. The timeline at the top shows the timing of study events, with curved lines indicating when individual memos
were created. The four rows below the timeline indicate the nature of the context in which memos were written, including Meetup
attendance, when data workers discussed their applied datasets, when the authors engaged in theoretical discussions, and when the
authors engaged in open coding. Rows C1-C24 show which memos directly informed the development of codes. Columns T1–T4
show which codes directly inform which themes.
Abstract—Many data abstraction types, such as networks or set relationships, remain unfamiliar to data workers beyond the
visualization research community. We conduct a survey and series of interviews about how people describe their data, either directly or
indirectly. We refer to the latter as latent data abstractions. We conduct a Grounded Theory analysis that (1) interprets the extent to
which latent data abstractions exist, (2) reveals the far-reaching effects that the interventionist pursuit of such abstractions can have on
data workers, (3) describes why and when data workers may resist such explorations, and (4) suggests how to take advantage of
opportunities and mitigate risks through transparency about visualization research perspectives and agendas. We then use the themes
and codes discovered in the Grounded Theory analysis to develop guidelines for data abstraction in visualization projects. To continue
the discussion, we make our dataset open along with a visual interface for further exploration.
Index Terms—Data abstraction, Grounded theory, Survey design, Data wrangling
• Alex Bigelow, Katy Williams, and Katherine E. Isaacs are with the University of Arizona. E-mail: alexrbigelow@email, kawilliams@email, [email protected].
Manuscript received 30 Apr. 2020; revised 31 July 2020; accepted 14 Aug. 2020. Date of publication 30 Oct. 2020; date of current version 15 Jan. 2021. Digital Object Identifier no. 10.1109/TVCG.2020.3030355
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 27, NO. 2, FEBRUARY 2021
1077-2626 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
1 INTRODUCTION
Data abstractions are fundamental to a wide set of visualization activities, from performing and documenting the provenance of data wrangling operations, to understanding the mental models of domain experts in design study research, to justifying design decisions in technique- or systems-focused research, and to reasoning about the role of data abstraction in theoretical visualization research. Difficulties in reasoning about and communicating data abstractions therefore have far-reaching implications: effective communication about data abstractions is critically important to the way researchers justify design decisions in technique- or systems-focused research. A poor understanding of the mental models of domain experts in design study research is a significant threat that risks creating solutions and systems that do not address real needs [30]. Too much focus on a single data abstraction has been observed to limit creativity [3] and to warp scientific analysis [2]. However, the extent to which these effects apply, in terms of specific abstractions, is poorly understood.
We set out to understand how malleable a data abstraction is, and to better understand the process of pursuing latent data abstractions. We define a latent data abstraction to be one that is meaningful and useful, yet undiscovered. It has yet to be fully elucidated, communicated, documented, and formatted. A data abstraction becomes less latent as coherent details are identified, as its details are spoken or written, and as its artifacts in a computer are actualized into relevant forms.
Because there were blind spots in the questions that we should even ask, we chose to conduct a Grounded Theory Method investigation seeking to discover how a diverse range of data workers, from spreadsheet users to programmers, across different disciplines, consider different data abstractions. This investigation analyzes memos, or research field notes taken during conversations, meetings, and interviews, as well as the results of a deployed survey.
The result is an evidence-based set of codes and themes regarding data abstractions with implications for how project teams and individuals discover, wrangle, manage, and report their data abstractions. In particular, we find that introducing a data abstraction typology (a model that describes the space of possible data abstractions and/or data wrangling operations) can elicit rich communication and reflection about data and uncover latent data abstractions, even when such a typology is imperfect. We show how visualization researchers can increase actionable communication with data workers by introducing and critiquing a typology together, as a visualization design activity [24].
The codes and themes in this paper also add to existing literature by explaining some of the reasons why communicating about data abstractions can be so challenging. Reflecting on these themes and our collective interactions with data workers, we provide guidelines for communicating with data workers about data abstractions that also have applications for more crisp communication about data abstractions in design study, technique, systems, and theoretical research papers.
We have made the raw data collected in our survey available through an interactive visual interface.¹ We also include a version-controlled archive² of codes, themes, and an audit trail [6] that summarize memos of observations from a year of interviews and meetings with diverse data workers, as well as observations from the visualized survey responses.
¹ osf.io archive of survey responses: https://osf.io/s2wmp/. Selected response visualizations are included in the supplemental material.
² osf.io archive of codes, themes, and audit: https://osf.io/382fn/
In summary, our contributions are:
1. A set of themes, supported by codes, that describe phenomena associated with data abstractions that arise in the processes of visualization design and data wrangling (Sect. 5),
2. Guidelines for developing data abstractions (Sect. 6.1),
3. The design of an open survey regarding the description of data and the malleability of data abstractions (Sect. 3.3, Sect. 6.4), and
4. An open, visualized corpus of survey responses.¹
We begin by discussing necessary background and a review of related work (Sect. 2), and our methodology (Sect. 3). We present the codes derived from our study (Sect. 4) and how they come together to form themes (Sect. 5). We follow with guidelines and reflections (Sect. 6).
2 BACKGROUND AND RELATED WORK
We discuss the theoretical underpinnings of our work (Sect. 2.1), related background in thinking and communicating about data in analysis and design projects (Sect. 2.2), the importance of documenting real-world wrangling needs (Sect. 2.3), and the context in which this work fits into research into creativity (Sect. 2.4).
2.1 Theoretical Underpinnings
This study employs a team-based [48], interpretivist form [47] of Grounded Theory Methodology, resulting in the development and refinement of four themes; these four themes, with their supporting codes, comprise what is often termed a substantive theory [28].
Grounded Theory is an approach that is uniquely suited for investigating and describing phenomena in which questions evolve rapidly. The general pattern of a Grounded Theory investigation involves identifying and refining codes, or concepts that describe phenomena, while conducting diverse research activities, such as performing interviews or conducting a survey. The choice of research activity is typically informed by the codes as questions evolve. As codes mature and are grouped into categories, they begin to form themes, or evidence-based hypotheses about a phenomenon. Eventually, codes and themes reach saturation, or a point at which researchers are confident that codes and themes are stable and no additional data needs to be collected.
Grounded Theory Methodology was an appropriate fit for beginning this investigation because our initial suspicions, that non-tabular data abstractions may be comprehensible, useful, and under-utilized among the broad population of data workers, were very general and based on a small number of surprising observations [2, 3]. The nature of the questions that we should pursue was prone to rapid revision and refinement as additional, surprising observations arose.
There are many ways to conduct a Grounded Theory investigation. In this study, we identified, discussed, and refined each code and theme as a team [48]. We used surprise as a principled way to guide our choice of research activities [28]; the extent to which we pursued interactions with data workers, and adapted and deployed a survey, were motivated by identifying gaps in our own knowledge and unanticipated findings. In contrast, we also used our lack of surprise as a qualitative indicator to know when codes and themes had reached saturation.
Grounded Theory can also be employed for different epistemological goals. In contrast to the positivist research that we typically see in the visualization research community [25], our Grounded Theory investigation had interpretivist objectives [47]. Interpretivist research aims to describe phenomena and generate hypotheses. This is in contrast to the positivist approach used in the scientific method that aims to test hypotheses. The four interpretivist themes that we identify, and their supporting codes, are transferable, in contrast to the way that formal theories are generalizable. Both intellectual traditions require systematic analysis of evidence, but the nature of supporting data and the ways that data are collected and analyzed are different.
In presenting qualitative research, we are careful of pitfalls [40] in reporting numbers and counts: we include the visualized corpus of survey responses¹ to maximize available context. Our numeric statements and visualizations are meant as interpretivist descriptions of phenomena associated with how data workers think and communicate about data abstractions, not positivist statements of statistical significance.
Although this is not a visualization design study, Meyer and Dykes’ six categories for judging and reporting rigor [25] are relevant for the kind of interpretivist research that we present. This research is informed by our relevant prior research experiences; reflexive in our efforts to constantly compare [7] collected data and gaps in our understanding; abundant through the number of survey participants and diversity of interview and Meetup participants; plausible through documented connections from memos and survey responses, to codes, and to themes; resonant in that the themes have broad implications for how visualization research is conducted and reported; and transparent through the public release of the survey, its responses, and the revision history of the evolution of our codes, themes, and relevant metadata.
2.2 Thinking and Communicating About Data
We build on other efforts to understand how data workers think and communicate about data. From the beginning of our research, our main focus has been to expand understanding of one specific approach identified by Muller et al. [29]: how data workers approach the design of their data, as opposed to discovery, capture, curation, and creation.
Many authors have noted the designed nature of data abstractions [26], such as the handcrafted nature of many cybersecurity datasets [19]. Feinberg observes that the mere use of a dataset makes the user a designer of its abstraction [10], even if users are unaware of their inherent flexibility. Consequently, there is a need to learn to develop a “data vision” to exercise discretion and creativity in designing abstractions [34]. This is especially important in light of ethical responsibilities to structure data effectively [9], as the design of what is measured and how it is stored can be overtly political acts [37].
The responsibility to design effective abstractions does not always fall upon data workers in isolation. In the context of visualization design studies that involve individuals with diverse roles and expertise, effective data abstraction design [30] and communication about abstractions as they evolve [41] are critical to the success of a project.
However, difficulties arise in effectively communicating about data abstractions [38]. There are myriad aspects to data abstractions in design projects, such as adapting to data changes, anticipating edge cases, understanding technical constraints, articulating data-dependent interactions, communicating data mappings, and preserving data mapping integrity across iterations [46]. These difficulties are consistent with reports of there being surprisingly little documentation about the design of abstractions [50]. The lack of documentation makes human decisions invisible and threatens future analysis. In strictly machine-learning contexts, some authors have gone as far as suggesting that “deemphasizing the need to understand algorithms and models” [35]
may be an effective way to increase trust in model predictions. We show that the inverse is also true: that education and transparency can foster healthy skepticism of data models and abstractions, which can be important for fairness and provenance. We argue that transparency about data abstractions can be especially important for data wrangling and visualization, in which data workers need to “interact not only with the interface but with the data” [46].
To facilitate communication about a particular project’s specific data abstraction, the visualization research community often relies extensively upon data abstraction typologies [8, 11, 31]. Currently, the main purposes of such typologies are to guide a researcher in the selection of appropriate visual encodings, and to support transferability across different design studies. However, aside from highly contextual design study research itself, there is little data that reveals the extent to which the visualization research community’s typologies are compatible with data workers’ perspectives and language, and, although the interventionist nature of design study research is known [22], the effects of introducing foreign data concepts have yet to be described in detail.
2.3 Data about Applied Wrangling Needs
Little applied data wrangling work has been published in the visualization community, even though novel algorithms, data structures, and infrastructure need to be implemented in ways that correctly address nuanced worker needs. Such efforts often consume the bulk of the labor involved in applied visualization research [14, 17, 29], and can include rich refinements in terms of task clarity and data location that advance science and constitute important visualization research contributions in their own right [41], yet, without also engineering a polished visualization system, such work has lacked clear publication venues.
The lack of such work leaves a major gap in needed visualization research. Although our work includes a qualitative dataset that only begins to fill this gap, the extent to which data workers use or even consider different data abstractions is still difficult to analyze or test, as data wrangling decisions are rarely documented in research or in practice [50]. When such decisions are documented in research, they typically only exist as justification for a visualization design, resulting in limited information about the data abstraction, its provenance, and important documentation about how and why it was reshaped.
It can consequently be difficult to justify technique-driven or systems-focused research into general-purpose data wrangling software systems [4, 14, 16, 17, 21, 42, 44], as such efforts often lack grounding in real user needs. Instead, they are forced to rely upon past researcher experience, scant hints about real-world data wrangling precedents that exist in design study literature, and speculation about how data workers might think and what operations they might find useful. This study, and future standalone publications that are focused on data transformations, can help to better inform the design of such systems.
2.4 Creativity and Creative Roles
Discovering a latent data abstraction can have powerful creative benefits, such as inspiring radical visual innovations [23, 33]. Although the work that we present has implications for visualization researchers and their interactions with the broader population of data workers, our primary objective is to compare and contrast sets of creative objectives that can be held by any kind of data worker, including visualization researchers themselves. Consequently, we identify the role of an abstraction theorist that seeks to discover useful latent data abstractions, and contrast that objective against the broad set of all other concerns that a data worker may need to consider, such as data wrangling, data ownership, workflow management, the design and implementation of visualizations, evaluation, and reporting on visualization research.
Contrasting these roles is similar in spirit to Von Oech’s popularized “explorer, artist, judge, warrior” creative roles [45]: “theorist” and “worker” may refer to distinct individuals in a collaborative environment, such as a visualization researcher and domain expert, or they could refer to different priorities that a single individual is considering on their own. Therefore, we describe differences through a pragmatic lens, instead of analyzing different populations’ creative styles or cognition [43].
We add to precedents for pragmatic guidance for creativity in visualization design, including creativity workshops [13, 18] and exercises [24]; we propose the pursuit of latent data abstractions as an additional creativity exercise, specific to the design of data itself.
3 METHODOLOGY
The evidence upon which we base our findings comes from two sources: memos and a deployed survey about data abstraction perspectives. It is important to note that, consistent with our interpretivist objectives, many of the following methods are deliberately uncontrolled: rather than testing hypotheses, our goal is to ask better questions. Here we discuss both sources of data, and the way that they both influenced, and were influenced by, our internal data abstraction typology.
3.1 Memos and Timeline
We wrote memos in four contexts: 1) regular attendance at data-centered community Meetups, 2) applied conversations with data workers in diverse contexts about their perspective on their data, 3) theoretical discussions about data abstractions among the authors, and 4) collaborative open coding sessions. A summary of all memos, their relationships with codes, and code relationships with themes is shown in Fig. 1, and an associated audit trail [6] is in the supplemental material.²
This project began with theoretical conversations about the nature of data abstractions between the authors that arose occasionally as part of regular meetings. Early on, we decided to engage with an existing local Meetup group that regularly met to seek or provide help with data: a core group of regular members met twice per week at a coffee shop or bar, and continued to meet remotely beginning in March due to social distancing measures. Members and visitors frequently brought laptops to show data and code that they were working with, to solicit advice or help with debugging in a casual context. The core group and its frequent visitors included a diverse array of researchers, administrators, and data scientists from the local university and surrounding community. As these meetings and interactions were largely ad-hoc, an accurate count of all informants is impossible to report; however, a selected subset of these community members (those that provided specific information that informed the development of a code) are shown in Table 1.
Later, as our survey was developed, it was deployed among this group, as well as at the 2019 IEEE VIS and 2019 Supercomputing conferences. Each of the 219 survey responses is included in the supplemental material.¹ Deployments of the survey often prompted conversations that provided additional valuable insight that we added to our growing set of memos.
As concepts and patterns began to be less surprising, the authors began to identify codes from supporting evidence, in a collaborative open coding environment similar to the one described by Wiener [48]. After writing and agreeing upon a framework for documenting codes in a version-controlled repository,² the authors began to meet 2-3 times per week to discuss, refine, and write codes that we had identified as we reviewed survey responses and our individual field notes. As we discussed different patterns in the data, each author actively cited [27] supporting personal experience, memos from a related interview, or specific survey responses to support or contest the proposed code. Where personal experience was identified as evidence, additional memos were written to document the experience. As we began to observe broader themes across codes, these were also written, discussed, refined, and connected to codes. Refinements to codes included citing additional evidence, rephrasing codes, splitting codes, or combining codes in an ad-hoc process similar to affinity diagramming [1, 15], but in a version-controlled text file instead of using cards or notes. Finally, an audit was conducted to verify the nature of the source data, the relationships between memos and survey responses to codes, and the relationships between codes and themes; the result of the audit is visualized in Fig. 1.
3.2 Data Abstraction Typology Evolution
We began our investigation by adapting the data abstraction typology described by Tamara Munzner [31] to a data wrangling context: our initial objective was to describe a design space of possible data wrangling operations, so we modeled operations as edges in a complete graph,
[Fig. 3 diagram labels: Consent; Contact Settings; About this survey; Your Responses; Domain Characterization; Basic Dataset Characteristics; Initial Data Abstraction; Tabular Details; Network / Hierarchy Details; Spatial / Temporal Details; Grouped Details; Textual Details; Media Details; Initial Debrief; Alternative ___ Details; Reflections; Debrief]
Fig. 3. An overview of the survey that we deployed. The survey is divided into three sections, shown here as a flow diagram. The first section (A)
includes consent forms, contact settings, an introduction to the innovations in the survey, and a summary of responses that redirect to the other two
survey portions. The main “Describe a new dataset” portion of the survey (B) invites participants to describe a real or imagined dataset, and asks
them to reflect upon the extent to which they think about the dataset in terms of the six dataset types that we identified. Where participants reply that
they at least “rarely” think of their data in terms of a given type, they are asked for more details in a specialized Details section of the survey. The final
“Explore alternative” portion of the survey (C) invites participants to imagine their dataset as the type that they initially thought about the least, and fill
in the associated Details portion of the survey with this new perspective. As an example, the Tabular Details interface is shown (D). Participants are
encouraged throughout the survey to look up terminology highlighted in red, where participants can edit the terms and suggest alternative definitions
in the glossary (E). In some Details sections, participants are asked for a small sample of what they imagine the data to look like, to help ground their
thinking (F). At any point in a Details section (G), or at the end of most other sections (H), participants can choose to skip the section to provide
targeted critique on the survey itself if the questions have strayed far enough from the participant’s mental model.
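The branching described in the Fig. 3 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the survey's actual implementation: the five-point frequency scale (only "rarely" appears in the caption), the function names, and the tie-breaking rule are all hypothetical.

```python
# Hypothetical frequency scale, ordered from least to most frequent.
# Only "rarely" is named in the caption; the other labels are assumptions.
SCALE = ["never", "rarely", "sometimes", "often", "always"]

# The six dataset types, named after the Details sections of the survey.
DATASET_TYPES = [
    "Tabular",
    "Network / Hierarchy",
    "Spatial / Temporal",
    "Grouped",
    "Textual",
    "Media",
]


def details_sections(ratings):
    """Return the types rated at least "rarely", in survey order.

    Each returned type corresponds to one specialized Details section
    in the "Describe a new dataset" portion of the survey.
    """
    threshold = SCALE.index("rarely")
    return [t for t in DATASET_TYPES if SCALE.index(ratings[t]) >= threshold]


def alternative_type(ratings):
    """Return the type the participant initially thought about the least.

    Ties go to the earliest type in survey order (an assumption; the
    caption does not say how ties are resolved).
    """
    return min(DATASET_TYPES, key=lambda t: SCALE.index(ratings[t]))


# Made-up ratings for illustration; not taken from the survey corpus.
example = {
    "Tabular": "always",
    "Network / Hierarchy": "sometimes",
    "Spatial / Temporal": "rarely",
    "Grouped": "often",
    "Textual": "never",
    "Media": "never",
}

print(details_sections(example))
print(alternative_type(example))
```

For these made-up ratings, the sketch routes the participant to four Details sections (Tabular, Network / Hierarchy, Spatial / Temporal, and Grouped) and proposes Textual for the "Explore alternative" portion, because ties are broken by survey order.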
it more easily fits into their current workflow and tools, or because they do not know of existing “unconventional,” non-tabular tools.
C2. There was wide variation in reported dataset scales. Taken from the median response for each of the “Basic Dataset Characteristics” questions (e.g. “Approximately how large is this dataset?”), the median dataset was on the order of megabytes (close to gigabytes) in size, with thousands of items in the dataset and tens of attributes.
C3. Participants included broad techniques in their responses for wrangling tool support. When asked to actually transform their initial dataset into the alternative abstraction type, most participants listed software tools or programming languages but some listed techniques. These techniques included natural language processing (“NLP, Python”, “Python, nlp techniques”), machine learning, and mathematical operations (“cluster into connected components”, “Morse Smale Complex”).
C4. Participants sometimes noted that they would need to ask a domain or visualization expert for help in order to change data abstractions. Along with techniques and software solutions appearing as answers to how the participant would actually transform the data abstraction, some participants acknowledged they either needed more information from a data theorist (e.g. “Could be displayed as a tree, I would hire someone”) or from a domain expert (“...need to discuss this in more detail with a domain expert...this data was not provided”).
C5. Participants sometimes noted that more information would need to be collected and added to the data before transitioning to a different abstraction. To transform their data from one abstraction to another, participants stated that they would need to collect additional data, such as images, speech transcripts, recordings, and labels.
C6. There was a wide distribution of the tools and techniques that data workers would use to wrangle data. Survey participants reported 54 different tools by name, with many tools being unique to a single participant. Tools that were mentioned by multiple participants tended to be programming languages.
Codes C7–C12 are based on evidence from multiple sources, and are suggestive of unspoken perspectives, intuitions, and fears that may be common among data workers.
C7. Even before the survey guided participants to alternative abstractions, they discussed how they could see their data in other forms. This manifested both in conversations with participants before they took the survey, as well as in comments in the earliest sections of the survey before the question was asked.
C8. Many data workers did not feel that what they work with “counts as data.” This comment was a common refrain while soliciting survey participation at both technical conferences, as well as through deployment across the university. However, outside of the survey, three informants (I1, I2, I4) independently made this observation while reflecting on their experiences working with people new to Data Science. For example, I2 often runs a data science workshop in the humanities but it tends to get very low attendance, often the same three participants. Seeing information as “data” may take a certain level of creativity and willingness to experiment and fail. One Supercomputing survey participant working on hardware design felt that treating circuit diagrams as “data” would be very strange, and perhaps inappropriate.
C9. Thinking about alternative data abstractions can provoke fears of scope creep. During a discussion with informants I6–I12, there was a consensus that exploring alternative abstractions can be very beneficial for the success of a project; however, it was also cautioned that it would have the potential to cause misalignments in the vision of a collaboration, usually termed “scope creep.” Data workers are often cognizant of the impacts that changes to the design of their abstraction will have, including considerations and costs that they may or may not be able to articulate in detail.
C10. Data abstractions are often personal in nature to a data worker. Based on prior experiences, such as designing a visualization with I6–I12, the authors recognized that abstractions can be personal,
subjective, and contextual. Wrapped in an existing data abstraction are a data worker’s personal preferences, prior data science knowledge, and domain knowledge. Thus, suggestions to change this abstraction are often met with feelings of confusion and resistance. Some of these emotions stem from concerns about additional work overhead, such as those identified by (C9). Other times, these emotions stem from the ecosystem of how the data was created, the people it may impact, and the subjects of the data, all things that a data worker may understand but a theorist may be unaware of.
C11. Data workers often have “gut feelings” or intuition about their data as networks. Data workers, regardless of whether their data is known to be network data or not, tended to have some intuition about the existence of networks within their data, even if specifics such as the meaning of a node or edge were unknown. Special types of networks, such as DAGs and trees, were also mentioned.
C12. Data workers often have “gut feelings” or intuition about their data as clusters, sets, or groups. Similar to (C11), data workers also had intuition about the existence of groups in their data. They sometimes referred to hierarchies existing in and among these groups, and also intuited patterns and clusters in their data.
Codes C13–C18 highlight informative weaknesses of our typology.
C13. There is wide variation in how data workers describe hierarchies. There was some initial difficulty designing the survey when deciding where hierarchies should fall. Even among the authors, we recognized that one could describe hierarchies as spatial, as networks, or as nested sets. We questioned whether a tree and a hierarchy are the same thing, but concluded they have semantic differences. In the final survey, hierarchies were grouped with networks as a “Network/Hierarchy” abstraction type, with “Hierarchy” chosen deliberately to seek feedback. This diversity of perspectives was confirmed; one participant commented that they more closely align hierarchies with groups: “I find the separation of hierarchies and groupings to be a bit problematic for
disconnect between how visualization people talk about data, and how data workers in general talk about data.
C17. Many data workers consider functions to be data. One unexpected finding, after reviewing responses aligning with (C8), was that a subset of participants recognize functions as data. These datasets include continuous models, functions like regression models from housing data, collections of partial differential equations, or constraint data for linear or integer programming, which I5 and one author did not consider to be “spatial” as defined in the survey.
C18. Many data workers consider code to be data. As part of a larger discussion about open science and data sharing, several informants noted that code should be considered data. At a minimum, code acts as “metadata” by providing provenance of where a given dataset came from. As I6 noted, “one person’s metadata is another person’s data.”
Codes C19–C21 describe the different ways that it was difficult to focus conversations with data workers on the design of a data abstraction.
C19. The design of a data abstraction proved difficult to talk about in isolation from specific file formats. Related to (C16), some survey participants misunderstood the connection between an abstraction and its implementation (e.g. a table vs. a spreadsheet). As a result, in response to our request for “Other Generalizations,” they suggested file formats that were clear fits for our existing six abstractions, such as “directed graph represented in a format such as dot” instead of Network/Hierarchy, “CSV file” instead of Tabular, “a collection of free text” instead of Textual.
C20. The design of a data abstraction proved difficult to talk about in isolation from software and programming language abstractions. One author noted difficulties in focusing conversations on how a person thinks about their data; informants frequently pivoted to talking about abstractions imposed by software that were often only loosely associated with the data model itself, such as git’s model of remotes
this domain. Many codes, such as diagnosis codes, exist in a hierarchy and branches, or Jupyter’s statefulness.
(defined by metadata). However it is quite common to refer to areas of C21. The design of a data abstraction proved difficult to talk about
this hierarchy as groupings.” in isolation from discovery, capture, curation, and creation. [28]
C14. Most datasets did not fit in one category, and participants Discussions often detoured from data design to topics such as data
talked about not just the raw data, but derived values, metadata, provenance and other data wrangling concerns. Similarly, when
or even “multiple datasets.” Participants often selected multiple data prompted to transform their data from one abstraction to another, some
abstractions in response to the initial question of categorizing their participants suggested collecting entirely new datasets, rather than
dataset. Heterogeneous datasets are very common, such as when meta- transforming the existing data.
data takes a different form from the main dataset, or when one dataset
is a nested “value” inside another of a different type. Codes C22–C24 describe things that appeared to aid reflection and
communication about data abstractions.
C15. “Media” as a category had a less well-defined mental model,
resulting in a space with too little structure for participants to map C22. Showing real data, such as a spreadsheet, helps data work-
their data crisply when forced to think of their data as “media.” ers and theorists communicate effectively about data abstractions.
When asked to consider media as an alternative abstraction, a common Many different interactions at community meetups, such as with I3 and
response was to imagine screen-capturing to record images and video I19, were enhanced by the culture of bringing laptops to show data and
of a visualization of the data. But thinking of their data in this way inspect it together.
elicited feelings of discomfort from some participants; comments such C23. Data abstraction typologies help data workers discover la-
as: “This is weird. I think of the data not as media but I’m actively tent data abstractions. Asking questions about a data abstraction and
trying to turn it into media” and ”I have displayed this data by mapping how it fit, or did not fit, into a typology helped expand data workers’
some of it [to color channels in a heatmap], but I don’t consider the data view of their dataset. One participant noted: “The questions made me
itself to ‘be’ media or ‘have’ media.” Some data workers understand think more about ‘the nature’ of this dataset. I had always considered it
some sort of inherent visual quality of their data. For example, one to be ‘just tabular’ but I realize that there is a hierarchy and geographic
response was “The data set itself does not include any media, but data (and a geographic hierarchy) which I hadn’t really considered
interpretations of it are visual in nature... The data could be illustrated before. As I type this, we could layer in time and sets when considering
by addition of multidimensional images or 3D meshes when interlinked multiple elections.” Data abstraction typologies can help data work-
with concepts in the graph.” ers discover underlying latent abstractions, like hierarchies, or how
C16. Even very technical data workers find some data abstraction visualizing their data with additional data abstractions may augment
concepts, language foreign. We noticed confusion and misunder- understanding, like adding images to patient records.
standing surrounding our abstraction terminology; notably, terminol- C24. Data abstraction typologies help data workers communicate
ogy surrounding tabular data (e.g. items, attributes) was unknown to at a sufficient level of detail to design a visualization system. We
one Supercomputing participant and needed to be related to the phys- observed this directly with I6–I12. A survey participant also noted
ical spreadsheet (e.g. rows, columns) to clarify. This difference in that the mental exercise of the survey “prodded me into thinking about
theory-based thinking and practice-based thinking shows that there is a my annotations as more of a central player in the overall visualization
as opposed to a secondary thought or supporting contextual element.” Discussing abstraction typologies helps create a common data design language and reinforces the value that both sides (the data worker and visualization designer) bring to the data problem.

5 THEMES

Together, these codes form four overarching themes, including the prevalence of latent data abstractions, interventionist impacts that pursuing latent abstractions can have, why many data workers may express hesitancy to pursue latent abstractions, and benefits that transparency about data typologies can have for the latent abstraction discovery process. Here, we enumerate evidence that supports each theme.

T1: Latent data abstractions are very common. At least initially, raw data formats are not designed in such a way as to anticipate all abstractions that may be needed or useful, yet even though abstractions may not be fully actualized in a computer, data workers are often aware of meaningful, useful abstractions that they can communicate about without specific prompts (C7). Some of these abstractions, particularly networks (C11) and groups (C12), are intuitive to many data workers. This theme validates a known [3,41] phenomenon that data rarely has a “correct” abstraction, even where predominant file formats exist; we observed that discrepancies between raw file formats and the way that a data worker thinks are common (C1). Instead, data abstractions have a complex and evolving form (C14) that must be explicitly designed.

The designed nature of data abstractions makes it important to note that neither data workers nor theorists possess comprehensive knowledge of all possible latent abstractions, and open-minded communication is necessary for meaningful, useful abstractions to be discovered. This is true for both parties: theorists are often aware of abstractions that data workers might not consider to “count” as data (C8). Similarly, data workers may be aware of abstractions that theorists do not consider to “count” as data (C17) (C18). Data workers and theorists may also think about the details of the same abstraction differently (C13).

Introducing a typology of data abstractions can expose abstractions that neither party has considered, in that a typology can contain new abstractions that data workers may not be aware of, or it may lack abstractions that theorists have not considered (C23).

T2: The visualization community identifies data abstractions for its own transferability needs, but the process of identifying an abstraction is an intervention with far-reaching effects. Collaborations with data workers beyond the visualization research community stand to benefit, and can be harmed, by the way that both parties introduce, articulate, and explore data abstractions.

Our data validates that visualization researchers, as theorists, are not operating in a vacuum; some abstractions that are common in the research community are intuitive to many data workers (C11) (C12). However, although these commonalities may be good news for the validity of the work that visualization researchers perform, there are also areas in which the culture of visualization research clashes with data workers at large: there is often a disconnect between what theorists consider to be data and what data workers consider to be data (C8) (C17) (C18). Disconnects also occur between the language that theorists use to describe data, and the language that data workers use (C16). These differences in culture risk miscommunication at best, but also may risk the development of a bad collaboration, where either the theorists’ goals or the data workers’ goals become subordinate.

Consequently, for better or worse, introducing a theoretical perspective is almost always an intervention, and the effects of such interventions can be profound. Because the design of data abstractions is so inextricably linked to the other concerns of data discovery, capture, curation, and creation (C21), changes to the design of a dataset can result in changes to all of its other aspects. Similarly, influencing a data worker’s mental model of their data can have far-reaching practical effects, including disruptions in workflows and changes to the file formats (C19) and software (C20) that data workers use.

Data workers are often cognizant of the impacts that changes to the design of their abstraction will have (C9), even if they may not be able to fully articulate these impacts in detail (C10). This is why we predict that T3: data workers are less willing to pursue latent data abstractions when the design of an existing abstraction is already fundamental to their workflow. When there exists a direct mapping between familiar software and the raw data format, efforts to introduce new abstractions will likely meet resistance. The costs of a changed data abstraction design can include a need to learn new file formats (C19) and new software (C20) that may come with the need to learn new software skills such as programming. The tight coupling between data abstractions, workflows, and software can be seen in the bespoke wrangling software needs that arise from the combinatoric expansion of diverse abstractions, diverse workflows, and diverse dataset scales (C6) (C2). However, the added cost is reduced when software practices have not yet been established and investments in learning new skills have not been made. This cost can also be mitigated when theorists are willing and able to provide expert help (C4), such as wrangling the data to its needed forms.

Similarly, the costs of pursuing latent data abstractions can propagate to other data concerns (C21), such as the need to collect additional data (C5). The fears that data workers often feel (C10) and voice (C9) suggest that data abstraction changes can spill over into task abstraction changes that may begin to depart from data workers’ actual needs. This potential cost can be an opportunity if care is taken to solicit critique whenever theoretical perspectives are introduced. Such introductions often encourage data workers to provide detailed information about their mental models that they might not otherwise articulate.

Theorists need not wait for such impositions, however, to solicit this kind of targeted feedback. T4: Like access to real data, introducing a data abstraction typology helps to focus reflection and communication about data abstractions at a level of detail that includes actionable information.

Our data (C22) validates the known pitfall [41] in which the lack of access to real data can doom a design study collaboration, because visualization researchers are less likely to have enough actionable information to articulate an accurate data abstraction. It also validates that a culture of data review [49] that is careful to emphasize good communication and transparency about the data abstraction can compensate for a lack of access to real data, because the detailed abstraction is a joint objective that all parties have a stake in.

When theorists take the time to be transparent about their agenda, including the typology that they are attempting to fit a worker’s data into, revealing the typology can have similar benefits in that it helps a data worker understand what a theorist is looking for (C24). Introducing typologies can expose data workers to latent abstractions that they may not have considered (C23), and provides an opportunity to provide detailed feedback that might otherwise be left unspoken (C17) (C18) (C13). For example, introducing a typology that is a poor fit in how it subdivides data abstraction categories can serve as an aid to communication, in that it can highlight the detailed ways that a worker considers their data to fit or partially fit more than one abstraction category (C14).

Not all shortcomings of a typology are equally beneficial, however. Data abstraction categories that are too general (C15) or rely too heavily upon jargon (C16) may have limited utility. These limits are highly contextual: for example, a typology that differentiates between partitions of an abstract mathematical space and regions of a physical three-dimensional space might be useful for a data worker with a rich mathematical background to reflect upon; however, for a worker with less mathematical training, the amount of unfamiliar jargon introduced could inhibit detailed feedback. When introducing a data abstraction typology as an explicit design activity [24], care should be taken to choose a typology with an appropriate level of granularity and enough accessible concepts to encourage feedback and critique.

6 DISCUSSION

The codes and themes that we present describe phenomena that are suggestive of guidelines for theorizing about data abstractions. Additionally, they have implications for reporting data abstractions in many kinds of visualization research. We also reflect on our experiences and
their implications for the design of data abstraction typologies, and lessons learned from our innovations in survey design and deployment.

6.1 Guidelines for Pursuing (Latent) Data Abstractions

Reflecting on the presence of latent data abstractions (T1), the interventionist nature of defining data abstractions (T2) and in some cases the resistance to it (T3), and the focusing power of typologies (T4), along with our coded findings in Sect. 4, we proffer the following guidelines:

Data owners and abstraction theorists should collaboratively probe raw data. A typical design workflow may have data owners describe their data synchronously and then give one or more data files to the abstraction theorists for later review. There are several surfaces of loss in this approach, in which latent information remains latent. Data owners may forget to review elements of their data. Abstraction theorists may make assumptions given the data file that are only revisited much later, if at all. Instead, we recommend that initial meetings with data owners involve the presentation and collaborative probing of at least one raw dataset.

Abstraction theorists should introduce the typology and process that they follow. Just as theorists can feel lost without exposure to the raw data, data workers can feel lost when theorists attempt to fit a worker’s project into an opaque typology or framework. For example, if a worker does not understand, at least at a basic level, that a theorist is attempting to identify relevant data abstractions before considering visual encodings, workers are forced to second-guess the theorist’s needs. In such a situation, discussing their data in terms of potential visualization designs may appear to be helpful. As theorists request that workers provide at least one raw dataset, theorists should also reciprocate by preparing and presenting sufficient background about what they are hoping to learn or observe.

Create artifacts that document and convey abstraction details and demonstrate possible permutations. We discovered that even in discussion among the authors, people who had a close working relationship and were operating from the same typology, there were times when we believed we were discussing the same abstraction of the data, only to discover we had completely different assumptions once drawings or classifications were made explicit. Explicitly stating ideas serves not only as a communication aid, but also as a method to explore the creative space of possible abstractions and as documentation for resulting abstractions. Furthermore, writing or drawing such low-level details can be an effective strategy to ground a derailed conversation and refocus it back on the design of the data.

Challenges are an effective means of probing. They require an artifact to be challenged. Throughout our interactions with data workers, we observed that suggesting a concrete abstraction, particularly one that was unlike how the data worker usually conceptualized their data, elicited rich feedback about their data and their thinking on it. Responses beginning with phrases like “That wouldn’t work because...” or “That makes no sense” were precursors to valuable reflections on their data. Setting up such a response requires some form of artifact (verbal, pictorial, textual, or otherwise) to be challenged. We recommend such situations be approached sincerely as an honest, creative exercise towards considering other forms.

Typologies can serve as a guide to elicit latent elements of the data abstraction from data workers. A given typology may not fit all elements of a particular problem and dataset. However, it provides a corpus of possible abstractions with which to consider the data. These possibilities can serve as a jumping-off point to discuss and challenge possible abstractions of the data. Through our survey and interviews, we observed that discussions of fitting the data to various forms evoked more detail about the data itself as well as provided structure to exploring possible alternative abstractions.

Document and share the provenance of datasets. It is appropriate that a visualization and analysis solution operates on a brief period of a dataset’s lifecycle and often only a subset of all possible data available. However, it is beneficial to document the latent elements of the data beyond that directly used by that solution. The source of the data, the transformations it has gone through, and related data all provide context. This context can be used to better understand how the data worker, who is more familiar with all of these elements, conceptualizes the dataset.

Assess opportunities inherent in derailments. The space of (possibly latent) data abstractions is vast in comparison to the minimal data abstraction represented in a visualization project. In following these guidelines, it can be easy for both theorists and workers to feel that the discussion has become derailed: workers may begin to discuss other data concerns such as data discovery, capture, curation, or creation. Workers may also discuss specific software or even prematurely begin to volunteer visualization encodings and techniques. Similarly, theorists may appear to be exploring esoteric concepts that do not have a clear application to a worker’s project, and their explorations may threaten to add unnecessary labor to a worker’s workload.

These derailments can be an opportunity to gain insight: First, discussing the design of a dataset has a tendency to prompt communication of important low-level information (even if seemingly unrelated) that workers would not otherwise bring up. Second, workers may actually be speaking on topic, but using seemingly irrelevant language about formats, software, or visualization as proxies that can be revealing about domain conventions or language, as well as revealing a need for the theorist to be more transparent about what they are looking for. Third, seemingly irrelevant topics may be indicative of a high-level mismatch of objectives, differences in perspective, or other miscommunications that could otherwise go unnoticed.

Actively seeking critique from data workers can help to identify a theorist’s own derailments. Once derailments are identified, ascertaining the extent to which any of these three opportunities exist can guide a theorist as to whether, when, and how to re-center the conversation.

Document objectives and revisit them regularly. Collaborators often have different high-level expectations, ideas, agendas, and sub-goals/tasks. This is complicated by the potential for a latent abstraction (even considering one hypothetically) to change collaborator perspectives and goals in ways that may not be communicated immediately. We recommend documenting the objectives of the project, and revisiting those objectives, especially when derailments are indicative of high-level mismatches.

Schedule interventions to revisit data abstractions. The above guidelines discuss how to make the latent apparent, but require that the latent exist in the minds of people or in the artifacts (e.g., the raw data) available. However, over the course of the project, all people involved may discover new facets of the data or incorrect assumptions previously made. Sometimes these discoveries lead to immediate intervention, but sometimes they expand the latent space. We recommend scheduling time to revisit, challenge, and refine data abstractions, given possible discoveries that are latent.

6.2 Implications for Reporting Data Abstractions

Our data suggests that providing the expert help that many data workers need can make visualization researchers more effective collaborators. Until recently, as we discuss in Sect. 2.3, performing, documenting, and reporting on this kind of work may have been difficult to accomplish by itself, even though there is a great need for published guidance and experience to inform many different kinds of visualization research.

We expect that performing and reporting on detailed, applied data wrangling work will better equip visualization experts to collaborate effectively. Recent acknowledgements of “Data Transformation” [32], “Data Abstraction,” and “Data Structure” [20] as potential standalone contribution areas may aid in these efforts. We also suggest that such reports may be able to help ground technique- and systems-focused research in more evidence-based user needs.

6.3 Implications for Designing Abstraction Typologies

Our experience in attempting to apply the same data abstraction typology to a diverse array of data workers and datasets revealed wide variability in the extent to which typologies are likely to fit a particular context: both the diversity of datasets and the diversity of data worker expertise and perspectives can risk a poor fit.
Our data shows that this is not necessarily problematic. It demonstrates how typologies can be useful in pursuing latent data abstractions despite (and, in some circumstances, because of) their limitations. In the spirit of the observation that “all models are wrong but some are useful” [5], shortcomings of a typology can create opportunities to aid in detailed communication and reflection that might be less likely if the typology were a perfect fit.

This also suggests that typologies may not scale well for purposes beyond the pursuit of latent data abstractions: typologies must generalize in order to be tractable and support comparison; however, generalizations fundamentally censor diverse, individual voices and risk stifling important exceptions and innovative thinking. Our corpus of survey responses demonstrates a way that a conversation about the nature of data abstractions can be conducted at scale, in a way that balances the need for generalizability, while giving priority to individual viewpoints and grounding discussion in the context of real-world applications. In the way that our survey explicitly sought critique on the typology that we presented, it allowed for enough organization to visualize, compare, and contrast hundreds of viewpoints, while giving wider freedom for participants to engage directly with its implicit theoretical questions.

6.4 Reflections on Survey Innovations and Deployment

Unlike typical surveys that primarily collect quantitative information for well-defined questions, our main objective in deploying the survey was to probe for blind spots in our own understanding of what data abstractions exist, and how data workers think about them.

Consequently, we sought to create a survey that was as open-ended as possible. Closed questions are therefore least ideal, as they provide zero opportunities for a participant to signal to researchers when there is a problem; researchers have to anticipate every possible response [39].

Open-ended, free response questions at least make it possible for participants to submit critique, but because they’re expensive to code and analyze, and because they introduce more survey fatigue, they often take the form of a single comment field at the end that is only used as an “outlet” for participants, rather than a prioritized source of data [12].

The extent to which participants freely made use of the ability to skip survey sections suggests that this approach has several benefits. Replacing a whole section of a survey with a single free response field appears to help mitigate survey fatigue. The free response field is at least as open-ended as regular free response questions, and consequently incurs no additional analysis cost. The act of stepping outside the normal flow of the survey appears to have encouraged participants to think about the design of the survey itself, and in some cases, engage at a theoretical level that more closely resembles a forum than a survey.

In contrast, our interactive glossary did not appear to have garnered as much attention; this may have been due to its placement outside the flow of the survey, and/or its position in the corner of the screen.

The survey innovations created opportunities to improve our understanding of what data abstractions exist and what terminology is actually used by diverse data workers, and to refine the evolving themes; we expect they will inform future iterations of the survey.

7 LIMITATIONS AND FUTURE WORK

Here we document the limitations of the survey that we present, describe our intent to deploy it to a broader audience, and suggest future uses for the dataset that we have released.

7.1 Survey Design and Evaluation

The archive of survey responses that we present is not without typical technical difficulties. One major drawback of its design was that the length of the survey varied, depending on the difference between “rarely” thinking about a dataset as a certain type and “never.” This resulted in some participants filling out lengthier surveys, who showed signs of fatigue. Additionally, a question in several of the Details sections had a bug that failed to capture data completely. Finally, as participants almost always took the survey on their own devices, connectivity and browser incompatibility issues arose, especially for specific iOS devices. These challenges, together with a small number of responses in which participants appeared to abuse the ability to skip, etc., resulted in a set of questionable responses that we flagged.

Rather than suppressing these errata, we include them in the archive and document them in context in the visualized summary of each question. The set of questionable responses can also be interactively filtered out.

As the survey design itself is not our primary contribution, we have only evaluated the extent to which our innovations were effective in achieving our qualitative aims. We cannot speak to whether they are effective ways to solicit critique in general, nor engaging enough to encourage theoretical reflection at the levels that we observed.

7.2 Further Survey Deployment

The feedback that we collected may also have been influenced by the groups where we deployed the survey and wrote memos about our observations. The populations we engaged with during this study all had a high interest in computing: domain scientists who come to hacking-oriented meetups and attendees at computing conferences. Although the Supercomputing conference has thousands of attendees who are there for reasons other than the technical program, in some interactions, we had difficulty convincing those people that their data counts as “data.” Thus, our data and subsequent findings are lacking representation among people who do not identify with data.

Effectively engaging people with less overt interest [36], who may not share the goals represented by our “data worker” persona, is an ongoing effort that we hope to pursue in future work. Subsequent survey deployment and memo writing will target more diverse data perspectives and skill sets, by networking with people from non-Computer Science backgrounds. For example, Meetup attendees have already referenced ongoing discussions about data abstractions in a paleontology community. They are considering how to best match and connect competing ontologies from different sources. Similarly, we have been connected with a group of vehicle mechanics who are adapting their tables of diagnostic metrics to changes introduced by increasing numbers of electric vehicles. Other potential domains include linguistics, sociology, bioinformatics, construction equipment, and athletics. We intend to advertise and deploy our survey to more diverse groups of data workers, through academic and professional conferences, at relevant community Meetup events, and through word of mouth.

7.3 Data Reuse

We have released the public portions of the survey data in a visual, searchable format as a standalone research contribution, so that individual voices can be heard and reviewed by researchers studying similar phenomena, beyond our research aims. Such aims might include creating terminology maps across domains, using evidence in our survey responses to motivate and justify the design of general-purpose visualization and data wrangling tools, and other analyses.

8 CONCLUSION

Our grounded theory investigation into the malleability of data abstractions has resulted in themes that describe data abstractions and their implications for visualization design, guidelines for the development of data abstractions, the design and deployment of an open survey, and a corpus of survey responses that represent a discussion about the nature of data abstractions at scale. This work has implications for how data abstractions are reported, how typologies are designed and discussed, and may inform future surveys that seek critique. Ultimately, this work sheds light on why thinking and communicating about data abstractions can be difficult, and shows how to best take advantage of opportunities inherent in that process, as well as mitigate its risks.

ACKNOWLEDGMENTS

The authors wish to thank all participants; informants; Arizona Research Bazaar; Arizona Research Computing; the Humans, Data, and Computers Lab; and Joshua Levine. This work was supported by the United States Department of Defense through DTIC Contract FA8075-14-D-0002-0007, the National Science Foundation under NSF III-1656958 and NSF III-1844573, and UA Health Sciences through the Data Science Fellows program.
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 27, NO. 2, FEBRUARY 2021
R EFERENCES and Errors on Novel Evaluation Methods for Visualization, BELIV ’16,
pp. 10–18. Association for Computing Machinery, Baltimore, MD, USA,
[1] H. Beyer and K. Holtzblatt. Contextual design. Interactions, 6(1):32–42, Oct. 2016. doi: 10.1145/2993901.2993916
Jan. 1999. doi: 10.1145/291224.291229 [23] S. McKenna. The Design Activity Framework: Investigating the Data
[2] A. Bigelow. Driving Genetics with Experimental Visualization. Under- Visualization Design Process. PhD thesis, University of Utah, June 2017.
graduate thesis, University of Utah, Salt Lake City, UT, USA, 2012. [24] S. McKenna, D. Mazur, J. Agutter, and M. Meyer. Design activity frame-
[3] A. Bigelow, S. Drucker, D. Fisher, and M. Meyer. Reflections on how work for visualization design. IEEE Transactions on Visualization and
designers design with data. In Proceedings of the 2014 International Computer Graphics, 20(12):2191–2200, Dec. 2014. doi: 10.1109/TVCG.2014
Working Conference on Advanced Visual Interfaces, AVI ’14, pp. 17–24. .2346331
ACM, 2014. doi: 10.1145/2598153.2598175 [25] M. Meyer and J. Dykes. Criteria for rigor in visualization design study.
[4] A. Bigelow, C. Nobre, M. D. Meyer, and A. Lex. Origraph: Interactive IEEE Transactions on Visualization and Computer Graphics, pp. 1–1,
network wrangling. IEEE VAST, 2019. 2019. doi: 10.1109/TVCG.2019.2934539
[5] G. E. Box. Robustness in the strategy of scientific model building. In [26] M. Meyer, M. Sedlmair, P. S. Quinan, and T. Munzner. The nested blocks
Robustness in Statistics, pp. 201–236. Academic Press, 1979. doi: 10. and guidelines model. Information Visualization, 14(3):234–249, July
1016/B978-0-12-438150-6.50018-2 2015. doi: 10.1177/1473871613510429
[6] M. Carcary. The research audit trial–enhancing trustworthiness in qual- [27] A. Moravcsik. Active citation: A precondition for replicable qualitative
itative inquiry. Electronic Journal of Business Research Methods, 7(1), research. PS: Political Science & Politics, 43(1):29–35, Jan. 2010. doi: 10.
2009. 1017/S1049096510990781
[7] K. Charmaz. Constructing Grounded Theory. sage, 2014. [28] M. Muller. Curiosity, creativity, and surprise as analytic tools: Grounded
[8] E. Chi. A taxonomy of visualization techniques using the data state theory method. In J. S. Olson and W. A. Kellogg, eds., Ways of Knowing
reference model. In IEEE Symposium on Information Visualization 2000. in HCI, pp. 25–48. Springer, New York, NY, 2014. doi: 10.1007/978-1-4939
INFOVIS 2000. Proceedings, pp. 69–75, Oct. 2000. doi: 10.1109/INFVIS. -0378-8 2
2000.885092 [29] M. Muller, I. Lange, D. Wang, D. Piorkowski, J. Tsay, Q. V. Liao,
[9] M. Da Gandra and M. Van Neck. InformForm: Information Design: In C. Dugan, and T. Erickson. How data science workers work with data:
Theory, an Informed Practice. Mwmcreative Limited, July 2012. Discovery, capture, curation, design, creation. In Proceedings of the 2019
[10] M. Feinberg. A design perspective on data. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI ’19, pp.
CHI Conference on Human Factors in Computing Systems, CHI ’17, pp. 126:1–126:15. ACM, 2019. doi: 10.1145/3290605.3300356
2952–2963. Association for Computing Machinery, Denver, Colorado, [30] T. Munzner. A nested model for visualization design and validation. IEEE
USA, May 2017. doi: 10.1145/3025453.3025837 Transactions on Visualization and Computer Graphics, 15(6):921–928,
[11] A. Figueiras. A typology for data visualization on the web. In 2013 17th Nov. 2009. doi: 10.1109/TVCG.2009.111
International Conference on Information Visualisation, pp. 351–358, July [31] T. Munzner. What: Data abstraction. In Visualization Analysis and Design.
2013. doi: 10.1109/IV.2013.45 CRC Press, Dec. 2014.
[12] J. G. Geer. Do open-ended questions measure ”salient” issues? Public [32] T. Munzner, A. Endert, A. Lex, A. Ynnerman, C. Garth,
Opinion Quarterly, 55(3):360–370, Jan. 1991. doi: 10.1086/269268 M. Chen, P. Isenberg, and L. Shixia. Revise committee
[13] S. Goodwin, J. Dykes, S. Jones, I. Dillingham, G. Dove, A. Duffy, town hall. https://drive.google.com/drive/u/0/folders/
A. Kachkaev, A. Slingsby, and J. Wood. Creative user-centered visu- 1dqssldHbXLmAD9zeOqHCbfNTb8gjeHKS, Oct. 2019.
alization design for energy analysts and modelers. IEEE Transactions on [33] C. Nielsen, S. Jackman, I. Birol, and S. Jones. Abyss-explorer: Visualizing
Visualization and Computer Graphics, 19(12):2516–2525, Dec. 2013. doi: genome sequence assemblies. IEEE Transactions on Visualization and
10.1109/TVCG.2013.145 Computer Graphics, 15(6):881–888, Nov. 2009. doi: 10.1109/TVCG.2009.
[14] P. J. Guo, S. Kandel, J. M. Hellerstein, and J. Heer. Proactive wrangling: 116
Mixed-initiative end-user programming of data transformation scripts. [34] S. Passi and S. Jackson. Data vision: Learning to see through algorithmic
In Proceedings of the 24th Annual ACM Symposium on User Interface abstraction. In Proceedings of the 2017 ACM Conference on Computer
Software and Technology, UIST ’11, pp. 65–74. Association for Com- Supported Cooperative Work and Social Computing - CSCW ’17, pp. 2436–
puting Machinery, Santa Barbara, California, USA, Oct. 2011. doi: 10. 2447. ACM Press, Portland, Oregon, USA, 2017. doi: 10.1145/2998181.
1145/2047196.2047205 2998331
[15] B. Hanington and B. Martin. Universal Methods of Design:100 Ways [35] S. Passi and S. J. Jackson. Trust in data science: Collaboration, translation,
to Research Complex Problems, Develop Innovative Ideas, and Design and accountability in corporate data science projects. Proceedings of the
Effective Solutions. Rockport Publishers, Feb. 2012. ACM on Human-Computer Interaction, 2(CSCW):136:1–136:28, Nov.
[16] J. Heer and A. Perer. Orion: A system for modeling, transformation and 2018. doi: 10.1145/3274405
visualization of multidimensional heterogeneous networks. In IEEE Visual [36] E. M. Peck, S. E. Ayuso, and O. El-Etr. Data is personal: Attitudes and
Analytics Science \& Technology (VAST), p. 10, 2011. perceptions of data visualization in rural pennsylvania. In Proceedings of
[17] S. Kandel, A. Paepcke, J. Hellerstein, and J. Heer. Wrangler: Interactive the 2019 CHI Conference on Human Factors in Computing Systems, CHI
visual specification of data transformation scripts. Proceedings of the ’19, pp. 1–12. Association for Computing Machinery, Glasgow, Scotland
SIGCHI Conference on Human Factors in Computing Systems, pp. 3363– Uk, May 2019. doi: 10.1145/3290605.3300474
3372, 2011. doi: 10.1145/1978942.1979444 [37] K. H. Pine and M. Liboiron. The politics of measurement and action. In
[18] E. Kerzner, S. Goodwin, J. Dykes, S. Jones, and M. Meyer. A framework Proceedings of the 33rd Annual ACM Conference on Human Factors in
for creative visualization-opportunities workshops. IEEE Transactions on Computing Systems, CHI ’15, pp. 3147–3156. Association for Computing
Visualization and Computer Graphics, 25(1):748–758, Jan. 2019. doi: 10. Machinery, Seoul, Republic of Korea, Apr. 2015. doi: 10.1145/2702123.
1109/TVCG.2018.2865241 2702298
[19] Á. Kiss and T. Szirányi. Evaluation of manually created ground truth [38] A. J. Pretorius and J. J. Van Wijk. What does the user want to see? what
for multi-view people localization. In Proceedings of the International do the data want to be? Information Visualization, 8(3):153–166, Sept.
Workshop on Video and Image Ground Truth in Computer Vision Appli- 2009. doi: 10.1057/ivs.2009.13
cations, VIGTA ’13, pp. 1–6. Association for Computing Machinery, St. [39] U. Reja, K. L. Manfreda, V. Hlebec, and V. Vehovar. Open-ended vs.
Petersburg, Russia, July 2013. doi: 10.1145/2501105.2501106 close-ended questions in web questionnaires. Developments in Applied
[20] B. Lee, K. Isaacs, D. A. Szafir, G. E. Marai, C. Turkay, M. Tory, S. Carpen- Statistics, 19(1):159–177, 2003.
dale, and A. Endert. Broadening intellectual diversity in visualization [40] M. Sandelowski. Real qualitative researchers do not count: The use of
research papers. IEEE Computer Graphics and Applications, 39(4):78–85, numbers in qualitative research. Research in Nursing & Health, 24(3):230–
July 2019. doi: 10.1109/MCG.2019.2914844 240, June 2001. doi: 10.1002/nur.1025
[21] Z. Liu, S. B. Navathe, and J. T. Stasko. Ploceus: Modeling, visualizing, and [41] M. Sedlmair, M. Meyer, and T. Munzner. Design study methodology:
analyzing tabular data as networks. Information Visualization, 13(1):59– Reflections from the trenches and the statcks. IEEE Transactions on
89, Jan. 2014. doi: 10.1177/1473871613488591 Visualization and Computer Graphics, 18(12):2431–2440, 2012.
[22] N. McCurdy, J. Dykes, and M. Meyer. Action design research and visu- [42] A. Srinivasan, H. Park, A. Endert, and R. C. Basole. Graphiti: Interactive
alization design. In Proceedings of the Sixth Workshop on Beyond Time specification of attribute-based edges for network modeling and visual-