Wrex: A Unified Programming-by-Example Interaction for Synthesizing Readable Code for Data Scientists
Author Keywords
computational notebooks; program synthesis; data science
CCS Concepts
• Human-centered computing → Interactive systems and tools
• Software and its engineering → Development frameworks and environments
INTRODUCTION
Data wrangling—the process of transforming, munging, shaping, and cleaning data to make it suitable for downstream analysis—is a difficult and time-consuming activity [4, 14]. Consequently, data scientists spend a substantial portion of their time preparing data rather than performing data analysis tasks such as modeling and prediction.

Increasingly, data scientists orchestrate all of their data-oriented activities—including wrangling—within a single context: the computational notebook [25, 1, 2, 5, 20, 30, 31]. The notebook user interface, essentially, is an interactive session that contains a collection of input and output "cells." Data scientists use input code cells, for example, to write Python. The result of running an input cell renders an output cell, which can display rich media, such as audio, images, and plots. This interaction paradigm has made notebooks a popular choice for exploratory data analysis.

Figure 1: Wrex is a programming-by-example environment within a computational notebook, which supports a variety of program transformations to accelerate common data wrangling activities. (A) Users create a data frame with their dataset and sample it. (B) Wrex's interactive grid, where users can derive a new column and give data transformation examples. (C) Wrex's code window containing synthesized code generated from grid interactions. (D) Synthesized code inserted into a new input cell. (E) Applying synthesized code to the full data frame and plotting the results.

CHI '20, April 25–30, 2020, Honolulu, HI, USA.
ACM ISBN 978-1-4503-6708-0/20/04.
http://dx.doi.org/10.1145/3313831.3376442

Through formative interviews with professional data scientists at a large, data-driven company, we identified an unaddressed gap between existing data wrangling tools and how
data scientists prefer to work within their notebooks. First, although data scientists were aware of and appreciated the productivity benefits of existing data wrangling tools, having to leave their native notebook environment to perform wrangling limited the usefulness of these tools. Second, although we expected that data scientists would only want to complete their data wrangling tasks, our participants were reluctant to use data wrangling tools that transformed their data through "black boxes." Instead, they wanted to inspect the code that transformed their data. Crucially, data scientists preferred these scripts to be written in their familiar data science languages, like Python or R. This allows them to insert and execute this code directly into their notebooks, modify and extend the code if necessary, and keep the data transformation code alongside their other notebook code for reproducibility.

To address this gap, we introduce a hybrid interaction model that reconciles the productivity benefits of interactivity with the versatility of programmability. We implemented this interaction model as Wrex, a Jupyter notebook extension for Python. Wrex automatically displays an interactive grid when a code cell returns a tabular data structure, such as a data frame. Using programming-by-example, data scientists can provide examples to the system of the data transform they intend to perform. From these examples, Wrex generates readable code in Python, a popular data science language.

Existing programming-by-example systems for data wrangling address some, but not all, of these requirements. FlashFill [15] does not display the transformed code to the data scientist. Although Wrangler [14, 23] can produce Python code, these scripts are not designed to be read or modified directly by data scientists. Trifacta [36] produces readable code, but in a domain-specific language and not a general-purpose one.

The contributions of this paper are as follows:
• We propose a hybrid interaction model that combines programming-by-example with readable code synthesis within the cell-based workflow of computational notebooks. We implement this interaction model as a Jupyter notebook extension for Python, using an interactive grid and a provisional code cell.
• We apply program synthesis to the domain of data science in a scalable way. Up until now, program synthesis has been restricted to Excel-like settings where the user wants to transform a small amount of data. Our approach allows data scientists to synthesize code on subsets of their data and to apply this code to other, larger datasets. The synthesized code can be incorporated into existing data pipelines.
• Through a user study, we find that data scientists are significantly more effective and efficient at performing data wrangling tasks with Wrex than with manual programming. Through qualitative feedback, participants report that Wrex reduced barriers in having to recall or look up the usage of various data transform functions. Data scientists indicated that the availability of readable code allowed them to verify that the data transform would do what they intended and increased their trust and confidence in the wrangling tool. Moreover, inserting synthesized code as cells is useful and fits naturally with the way data scientists already work in notebooks.

EXAMPLE USAGE SCENARIO FOR WREX
Dan is a professional data scientist who uses computational notebooks in Python. He has recently installed Wrex as an extension within his notebook environment. Dan has an open-ended task that requires him to explore an unfamiliar dataset relating to emergency calls (911) for Montgomery County, PA. The dataset contains several columns, including the emergency call's location as a latitude and longitude pair, the time of the incident, the title of the emergency, and an assortment of other columns.

First Steps: As with most of his data explorations, Dan starts with a blank Python notebook. He loads the montcoalert.zip dataset into a data frame using pandas—a common library for working with this rectangular data. He previews a slice of the data frame, the latlng and title columns for the first ten rows (A, 1). Wrex displays an interactive grid representing the returned data frame (B, 2). Through the interactive grid, Dan can view, filter, or search his data. He can also perform data wrangling using "Derive Column by Example."

From Examples to Code: Dan notices that cells in the title column seem to start with EMS, Fire, and Traffic. As a sanity check, he wants to confirm that these are the only types of incidents in his data, and also get a sense of how frequently these types of incidents are happening.

Dan selects the title column by clicking its header (3), then clicks the "Derive column by example" button to activate this feature (4). The result is a new empty column (5) through which Dan can provide an example (or more, if necessary) of the transform he needs.

He types an example of his intent, EMS, into an arbitrary row—the second—of the newly created column in the grid (6). When Dan presses Enter or leaves the cell, Wrex detects a cell change in the derived column. Wrex uses the example provided by Dan (EMS), along with the corresponding input taken from the column it was derived from, title (EMS: DIABETIC EMERGENCY), to automatically fill in the remaining rows (7).

In addition, Wrex presents the actual data transformation to Dan as Python code through a provisional code cell (C, 8). This allows Dan to inspect the Python code before committing to it. In this case, the code seems like what he probably would have written had he done this transformation manually: split the string on a colon, and then return the first split. Dan decides to insert this code into a cell below this one, but defers executing it (9). If Dan had actually intended to uppercase all of the types, he could have provided Wrex with a second example: Fire to FIRE. If desired, Dan could have also changed the target from Python to R for comparison (or even PySpark), since Dan is a bit more familiar with R.

Since the new input cell is just Python code (D), Dan is free to use it however he wants: he can use it as is, modify the function, or even copy the snippet elsewhere. Dan decides to apply the synthesized function to the larger data frame—this results in adding a type column to the data frame (df). He plots a bar chart of the count of the categories and confirms that there are only three types of incidents in the data (E).

From Code to Insights: Having wrangled the title column to type, and given the latlng column already present, Dan thinks it might be interesting to plot the locations on a map. To do so, the latlng column—a string—needs to be separated into lat and lng columns. Once again, he repeats the data wrangling steps as before: Dan returns a subset of the data, and uses the interactive grid in Wrex to wrangle the latitude and longitude transforms out of the latlng column. He applies these functions to his data frame. Having done the tricky part of data wrangling the three columns—lat, lng, and type—he cobbles together some code to add this information onto folium (10), a map visualization tool.

Like the data scientists in our study, Dan finds that data wrangling is a roadblock to doing more impactful data analysis. With Wrex, Dan can accelerate the tedious process of data wrangling to focus on more interesting data explorations—all in Python, and without having to leave his notebook.

RELATED WORK
Wrex extends and coalesces two lines of prior work: data wrangling tools and program synthesis for structured data.

Data Wrangling Tools
One well-known class of tools tackles the need to make data wrangling (e.g., preprocessing, cleaning, transformation [24]) more efficient. OpenRefine provides an interactive grid that allows the data scientist to perform simple text transformations, such as trimming a string, to clean up data columns, and to discover needed transformations through filters [8]. JetBrains' DataLore provides a data science IDE that provides code suggestions given user intentions [18]. Wrangler lowered the time and effort that data scientists spent on data wrangling by suggesting contextually relevant transforms to users, showing a preview window with the transform's effects, and providing an export-to-JavaScript function [23] (but this feature has since been removed in the commercial version [36]). Proactive Wrangler extended it with mixed-initiative suggested steps to transform data into relational formats [14]. Tempe provides interactive and continuous visualization support for live streaming data [6], where the visualization changed not only with new incoming data but also with new user input through live programming support. TrendQuery is a "human-in-the-loop" interactive system that allowed users to iteratively and directly manipulate their data visualizations for the curation and discovery of trends [21]. Northstar describes an interactive system for data analysis aimed at allowing non-data-scientist domain experts and data scientists to collaborate, making data science more accessible [27]. DS.js leverages existing web pages, and the tables and visualizations on them, to create programming environments that help novices learn data science [39].

While these tools all provide increased efficiency for data scientists performing data wrangling tasks, they are missing several key features that Wrex provides. Most notably, they do not aim to generate readable code or to integrate with data scientists' existing workflows. Wrex uses program synthesis to achieve these goals and integrates with Jupyter, a popular computational notebook used by data scientists [26]. Through both our formative and controlled lab studies, we found these features to be critical for data scientists. Participants required saving source code as an artifact of their data wrangling so they could perform similar transformations on future datasets. Further, they wanted to take the wrangling scripts they created with their sample dataset and apply them to the full dataset in cluster or cloud environments.

As for notebook integration, Kandel et al. described a common data science workflow of context switching between raw data, wrangling tools, and visualization tools; Kandel noted that the "ideal" tool would combine these workflows into one tool [22]. We also found that data scientists desired tools such as Wrex that integrated with their current workflow. Using separate applications to perform data wrangling and analysis requires extra time and effort to import data into independent tools to wrangle it, after which the transformed data must be exported back into their preferred tool for data exploration, the computational notebook.
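The transform Dan reviews in the scenario above is described as splitting the string on a colon and returning the first split. A minimal stand-alone sketch of that transform (the example titles are illustrative; this is not Wrex's actual synthesized output):

```python
def extract_type(title):
    # "EMS: DIABETIC EMERGENCY" -> "EMS": split on the colon, keep the first part
    return title.split(":")[0]

# Illustrative rows in the shape of the 911 dataset's title column
titles = ["EMS: DIABETIC EMERGENCY", "Fire: GAS-ODOR/LEAK", "Traffic: VEHICLE ACCIDENT"]
types = [extract_type(t) for t in titles]
# types == ["EMS", "Fire", "Traffic"]
```

Counting the distinct values of `types` is then enough to confirm the three incident categories Dan expects.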
Program Synthesis for Structured Data
Gulwani et al. developed a new language and program synthesis algorithm, implemented in Microsoft Excel, that can perform several tasks that users have difficulty with in spreadsheet environments [9, 11, 12]. This feature, which became known as FlashFill, leveraged input-output examples defined by the user. FlashFill took these examples and created programs to perform string manipulations quickly, and with very few output examples from the user. Harris and Gulwani then applied this research direction to table transformations in spreadsheets [15]. Yessenov et al. used programming-by-example to do text processing [38]. Le and Gulwani then developed FlashExtract, a framework that uses examples to extract data from documents and tabular data [28]. Others have also leveraged synthesis to perform data transformations involving tabular data [19, 7, 17, 37].

This procession of research allows end users to perform the above tasks without knowing how to write wrangling scripts. Further, even when users have the knowledge to create these scripts, these methods can produce results in a fraction of the time it takes to code the scripts by hand. This increases both the accessibility and efficiency of users dealing with data. Wrex leverages these benefits to allow data scientists to forgo writing data wrangling scripts and focus on providing example output of desired data transformations.

These projects found examples to be "the most natural" way to provide a program synthesizer with a specification, but challenges remained in designing programming-by-example interaction models, particularly around user intent [10]. They noted that user examples may be ambiguous, so users need a way to address this ambiguity. Wrex uses the "User Driven Interaction" model described in this line of work, which allows the user to examine the artifact, through reading the synthesized source code, and the behavior of the artifact, through the resultant output in the derived column. If any discrepancies exist with either, the user can provide further input by interacting with their data frame or by directly editing the code. Chasins et al. found that some participants perceived PBE tools to have less flexibility than traditional programming [3]. In Wrex, users have both the speed and ease-of-use benefits of PBE and the freedom to switch to traditional text-based coding whenever they perceive it to be necessary.

FORMATIVE INTERVIEWS AND DESIGN GOALS
We conducted interviews with seven data scientists who frequently use computational notebooks at a large, data-driven software company. In our interviews, we focused on how they perform data wrangling, how data wrangling fits within their notebook workflow, what tools they use or have used for data wrangling, and what difficulties they face as they wrangle data. These data scientists (F1–F7) provided several insights that guided the design goals for Wrex.

Data scientists reported that using standalone tools designed for data wrangling required "excessive roundtrips" (F2, F4) or "shuffling data back and forth" (F1, F6) between their notebooks and the data wrangling tool. As a result, they preferred to write their wrangling code by hand in their notebooks. F2 explained that although these tools have nice capabilities, "[they] shouldn't have to go somewhere else just to transform data." F6 wondered if it was "possible to put some of these capabilities [that are available in standalone tools] within their notebooks." This feedback led to our first design goal:

D1. Data wrangling tools should be available where the data scientist works—within their notebooks.

All of our participants wanted tools that produced code as an inspectable artifact, because "as a black box, you don't have a good intuition about what is happening to your data" (F7) and because "black boxes aren't transparent, the data transforms aren't customizable. If the tool doesn't have your transformation, you have to write it yourself anyway" (F6).

Although some tools allowed data scientists to view their data transformations as scripts, we found that data scientists preferred that these scripts be written in languages they were already comfortable with (F1, F6). For example, F1 "preferred general-purpose languages for doing data science." F6 explained, "there's a learning curve to having to learn new libraries". F7 added that the scripts from these tools were often quite limited: "I'm an expert in Python; these [languages for data wrangling] seem to cater only to novice programmers. They don't compose well with our existing notebook code or the 'crazy formats' we have to deal with."

Data scientists' desire for inspectable code as the output of the data transformation tool, their preference for using familiar programming languages, and the desire to customize or extend data wrangling transforms led to our second design goal:

D2. Data wrangling tools should produce code as an inspectable and modifiable artifact, using programming languages already familiar to the data scientist.

WREX SYSTEM DESIGN AND IMPLEMENTATION
Wrex is implemented as a Jupyter notebook extension. The front-end display component is based on Qgrid [33], an interactive grid view for editing data frames. Several changes were made to this component to support code generation. First, we modified Qgrid to render views of the underlying data frame, rather than the data frame itself. Second, we added the ability to add new columns to the grid. By implementing both of these changes, users are able to give examples through virtual columns without affecting the underlying data. Third, we added a view component to Qgrid to render the code block. Finally, we bound appropriate event handlers to invoke our program synthesis engine on cell changes. To automatically display the interactive grid for data frames, the back-end component injects configuration options into the Python pandas library [32] and overrides its HTML display mechanism.

Readable Synthesis Algorithm
The program synthesis engine that powers Wrex substantially extends the FlashFill toolkit [29], which provides several domain-specific languages (DSLs) with operators that support string transformations [9], number transformations [35], date transformations [35], and table lookup transformations [34]. A technical report by Gulwani et al. [13] formally describes the semantics of these extensions; Wrex uses these extensions, which we summarize in this section.
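The virtual-column mechanism in the system design can be made concrete with a toy sketch (all names here are hypothetical, not Wrex's implementation): user examples live outside the underlying rows so the data is never mutated, and every cell edit re-invokes a synthesizer callback over the accumulated (input, output) pairs.

```python
class GridView:
    """Toy model of an interactive grid with a derived (virtual) column."""
    def __init__(self, rows, synthesize):
        self.rows = rows              # underlying data, never modified
        self.examples = {}            # row index -> example output
        self.synthesize = synthesize  # callback: [(input, output)] -> program

    def edit_cell(self, row_index, value):
        # Record the example and re-run synthesis, mirroring how Wrex
        # reacts to each cell change in the derived column.
        self.examples[row_index] = value
        program = self.synthesize(
            [(self.rows[i], out) for i, out in self.examples.items()])
        return [program(r) for r in self.rows]

def toy_synthesize(examples):
    # Stand-in synthesizer: hard-codes a prefix-extraction "program"
    # instead of searching a DSL. Assumes the example output is a
    # prefix of its input, and splits on the character that follows it.
    inp, out = examples[0]
    sep = inp[len(out)]
    return lambda s: s.split(sep)[0]

grid = GridView(["EMS: DIABETIC EMERGENCY", "Fire: GAS-ODOR/LEAK"], toy_synthesize)
derived = grid.edit_cell(0, "EMS")
# derived == ["EMS", "Fire"], while grid.rows is untouched
```

The key design point this illustrates is the separation of examples from data: discarding the derived column costs nothing, because the underlying rows were never written to.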
Table 1: Wrex synthesizes readable code for transformations commonly used by data scientists during data wrangling activities. After selecting one or more columns, the data scientist can specify examples in an output column to provide their intent. As the data scientist provides examples, Wrex generates a synthesized code block and presents it to them.

String transforms:

Extracting — "12;L MERION;CITY AVE" → "L MERION":
    a = s.index(";") + len(";")
    b = s.rindex(";")
    return s[a:b]

Case manipulation — "NEW HANOVER" → "New Hanover":
    return s.title()

Concatenation — "Claudio", "A", "Chew" → "Claudio-A-Chew":
    return "{}-{}-{}".format(s, t, u)

Generating initials — "Doug Funnie" → "D.F.":
    t = regex.search(r"\p{Lu}+", s).group(0)
    u = list(regex.finditer(r"\p{Lu}+", s))[-1].group(0)
    return "{}.{}.".format(t, u)

Mapping constant values — "Male" → 0, "Female" → 1:
    return { "Male": 0, "Female": 1 }.get(s)

Number transforms:

Round to two decimals with ties going away from zero — -15.319 → -15.32, 17.315 → 17.32:
    return Decimal(s).quantize(
        Decimal(".01"),
        rounding=ROUND_HALF_UP)
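The Table 1 snippets use only standard operations and can be exercised directly; here they are lightly adapted into named functions (the initials row is omitted because it relies on the third-party `regex` module):

```python
from decimal import Decimal, ROUND_HALF_UP

def extract_between_semicolons(s):
    # Table 1, Extracting: "12;L MERION;CITY AVE" -> "L MERION"
    a = s.index(";") + len(";")
    b = s.rindex(";")
    return s[a:b]

def map_gender(s):
    # Table 1, Mapping constant values: "Male" -> 0, "Female" -> 1
    return {"Male": 0, "Female": 1}.get(s)

def round_half_up_2(s):
    # Table 1, Number: ties go away from zero, so "17.315" -> "17.32".
    # Building the Decimal from the string avoids binary-float error
    # (the float 17.315 is actually slightly below 17.315).
    return Decimal(s).quantize(Decimal(".01"), rounding=ROUND_HALF_UP)
```

Note the deliberate use of `Decimal(s)` on the string rather than `Decimal(float(s))`; the latter would reintroduce exactly the floating-point tie-breaking errors the transform is meant to avoid.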
With Wrex, we surfaced this PBE algorithm through an interaction that is accessible to data scientists. The algorithm supports a variety of transformations, and even compositions of those transformations, without requiring the user to explicitly specify any input or output data types. Table 1 lists examples of the resulting synthesized Python code for typical data science use cases; the synthesized code for Extracting is only three lines of code. In the classic FlashFill algorithm, this same program is over 30 lines of code.

Algorithm 1: Program synthesis phases for readable code.
    function ReadableCodeSynthesis(df, examples)
        P1 ← Synthesize(examples, flashfill_ranker)
        examples_all ← {(row, P1(row)) | row ∈ df}
        P2 ← Synthesize(examples_all, readability_ranker)
        P3 ← Rewrite(P2, rules, df)
        code ← translate_to_target(P3)
        return Formatter(code)

The extended FlashFill algorithm (RCS) has four phases:

Phase 1: Standard Program. RCS calls Synthesize with the user-provided examples, using the standard FlashFill ranker. Since the FlashFill ranker is optimized to minimize the number of required examples, data scientists can in many cases obtain a useful program (P1) by giving only a single example. Here, the program is represented in an internal DSL.

Phase 2: Readable Program. We use the size of the program as a proxy for readability, and design a ranker that prefers small programs, which are likely easier to understand. This ranker is also designed to prefer programs that use DSL operators that have a direct translation into the target language (e.g., Python). Since the readability ranker is optimized to prefer small programs, it requires more examples than the FlashFill ranker. The insight is to apply the program P1 to all required input columns in the data frame to obtain these additional examples (examples_all). RCS again calls Synthesize to obtain the program P2, this time using examples_all and the readability ranker. Concretely, consider the transform of "21-07-2012" to "21". FlashFill (intent-based) takes the substring that matches \d+ on the left and "-" on the right—because it handles dates like "4-12-2018" (input.match('^\d+')). However, tuning towards generality makes it less succinct. The objective-based ranker chooses input[0:2], but if and only if the behavior matches on a much larger sample of inputs (maintaining behavioral equivalence to the intent-based ranker). Hence, we pick input[0:2] if there are no inputs of the form "4-12-2018". If there are inputs in such a form, the user would have to provide a second example.

Phase 3: Rewriting. The goal of the rewriting phase is to transform the synthesized program into another program that is simpler to understand. As before, we apply the insight that we can use rewrite rules such that the synthesized program preserves the behavior of examples_all, while allowing the semantics to change on potential inputs that have not been passed to RCS. One such rule rewrites "[0-9]+(\,[0-9]{3})*(\.[0-9]+)?" to "\d+". This replaces a complex pattern that matches numbers with commas and a decimal point with a pattern that matches a sequence of digits. Clearly, replacing the first regular expression by the second can change the semantics of a program. But if all numbers in all the inputs are of the form "\d+", then the replacement will preserve behavioral equivalence.

Phase 4: Translation to the Target Language. The final translation step walks the abstract syntax tree (AST) of the DSL program and translates each node (DSL operator) into the equivalent operator in the target language—which today can be Python, R, or PySpark. For example, the Concat operator in the DSL is simply mapped to + or to a format method on a string, depending on the number of elements concatenated. If the DSL operator does not have a semantically equivalent Python operator, then the translator generates multiple lines of code in the target language to emulate its behavior. Finally, the target code is passed through an off-the-shelf code formatter: for Python, this is autopep8 [16].

Limitations
A limitation of Wrex is that user-friendly error handling is not implemented yet. Errors can arise in two ways: when the user specifies a conflicting set of constraints (for example, transforming "Traffic to T" alongside "Traffic to TR"), or when the synthesis engine fails to learn a program. Program synthesis will also unrelentingly generate incomprehensible programs due to difficult-to-spot typos in user-entered examples (such as a trailing space in an example). In such cases, we asked participants to invoke the grid again and redo the task, although we did not restart their task time. In practice, it is unrealistic to expect that data scientists can perfectly provide examples to the system, so these issues will need to be addressed in future work. When the user introduces these internally conflicting examples, or when rows in a dataset have ambiguous values (e.g., null), it is useful to suggest additional rows to investigate; this significant inputs feature is available but not evaluated in this paper.

Some tasks are not amenable to programming and thus are not performed by Wrex, like certain natural language transformations (e.g., "S.F." to "San Francisco") and tasks that require aggregation, like the sum or average of an entire column. Another limitation comes from how a user samples their data (for example, df.head(n), which may lack sufficient diversity in the range of exposed values). This issue may lead to synthesized code that works perfectly for the sample but runs into issues on the full dataset.

Users may not know when to stop providing examples (the point where further examples have little effect on synthesis). Here users must inspect the data frame and code to determine whether Wrex has narrowed in on an acceptable solution. Wrex outputs only the top-ranked program, and it is possible that the user may prefer a lower-ranked program (e.g., one that avoids regular expressions). Finally, Wrex is aimed at professional data scientists who work mostly within notebooks; users of Excel, Tableau, and other GUI tools may be more accustomed to switching between multiple tools, so an integrated single-app workflow may not be as necessary for them.
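The Phase 2 and Phase 3 decisions both hinge on behavioral equivalence over the sampled inputs, which can be mimicked in a few lines. This is a toy illustration of those checks, not the actual RCS ranker:

```python
import re

# Phase 2 (sketch): prefer the succinct program over the general one
# only when both agree on every input in the sampled data frame.
def general(s):
    return re.match(r"\d+", s).group(0)   # leading digits, any width

def succinct(s):
    return s[0:2]                         # the input[0:2] program

def pick_program(inputs):
    if all(general(s) == succinct(s) for s in inputs):
        return succinct                   # behaviorally equivalent on the sample
    return general

# Phase 3 (sketch): rewrite the complex number pattern to \d+ only
# when every sampled input already matches \d+ in full.
def simplify_pattern(inputs):
    complex_pat = r"[0-9]+(,[0-9]{3})*(\.[0-9]+)?"
    if all(re.fullmatch(r"\d+", s) for s in inputs):
        return r"\d+"
    return complex_pat
```

On the paper's own example, `pick_program(["21-07-2012", "30-01-2011"])` keeps the slice, while adding a single-digit-day input such as "4-12-2018" forces the general regex-based program, exactly as described in Phase 2.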
EVALUATION: IN-LAB COMPARATIVE USER STUDY notebook (manual condition). They had 5 minutes per task
Participants: We recruited 12 data scientists (10 male), ran- to read the requirements of the task and write code to com-
domly selected from a directory of computational notebook plete the task. Participants were provided a verifcation code
users with Python familiarity within a large, data-driven soft- snippet within their notebooks that participants ran to deter-
ware company. They self-reported an average of 4 years of mine if they had completed the task successfully. If partici-
data science experience within the company. They self-rated pants failed to complete the task within the allotted time, we
familiarity with Jupyter notebooks with a median of “Ex- marked the task as incorrect. Participants had access to the
tremely familiar (5),” using a 5-point Likert scale from “Not internet to assist them in completing the task if needed. At
familiar at all (1)” to “Extremely familiar (5),” and their famil- the end of the manual condition, we interviewed the partic-
iarity with Python at a median of “Moderately familiar (4).” ipants about their experience and asked them to complete a
questionnaire to rate aspects of their experience. Next, par-
Tasks: Participants completed six tasks using two different ticipants completed a short tutorial that introduced them to
datasets. These tasks involved transformations commonly W REX. After participants completed the tutorial, they moved
done by data scientists during data wrangling, such as extract- on to the second set of tasks, this time using W REX with con-
ing part of a string and changing its case, formatting dates, ditions similar to the frst set of tasks. After the three tasks are
time-binning, and rounding foating-point numbers. completed, we again interviewed them about their experience
The frst dataset, called A, contains emergency call data con- and asked them to complete the questionnaire.
taining columns with dates, times, latitude, longitude, physi- Questionnaires: After the frst set of tasks, participants rated
cal location with zip code and cross streets, and an incident how often the tasks showed up in their day-to-day work us-
description.1 We designed three tasks using this dataset: ing a 5-point Likert scale from “Never (1)” to “A great deal
A1 Using the Location (19044;HORSHAM;CEDAR AVE & (5)”, and discussed what aspects of the notebook made it diff-
COTTAGE AVE) column, extract the city name and title case it (Horsham).

A2  Using the Date (12/11/2015) and Time (13:34:52) columns, format the date to the day of the week, the time to 12-hour clock format, and combine these values with an "@" symbol (Friday @ 1pm).

A3  Using the Latitude (40.185840) and Longitude (-75.125512) columns, round half up the values to the nearest hundredths place and combine them in a new format ([40.19, -75.13]).

The second dataset, called B, contains New York City noise complaint data, which includes columns containing the date-timestamp of the call, the date-timestamp of when the incident was closed, type of location, zip code of incident, city of incident location, borough of incident location, latitude, and longitude.2 We designed three tasks using this dataset:

B1  Using the Created Date (12/31/2015 0:01) column, extract the time and place it in a 2-hour time bin (00:00-02:00).

B2  Using the Location Type (Store/Commercial), City (NEW YORK), and Borough (MANHATTAN) columns, title case the values and combine them in a new format (Store/Commercial in New York, Manhattan).

B3  Using the Latitude (40.865324) and Longitude (-73.938237) columns, round half down the values to the nearest hundredths place and combine them in a new format ((40.86, -73.94)).

Protocol: Participants were assigned the A and B datasets through a counterbalanced design, such that half the participants received the A dataset first (A-dataset group), and the other half received the B dataset first (B-dataset group). We randomized task order within each dataset to mitigate learning effects. They first completed three tasks with a Jupyter [...] difficult to complete the tasks and what affordances could address these difficulties. After the second set of tasks, this time with WREX, the participants took a second questionnaire that also had them rate task representativeness, and asked free-form questions on difficulties they had and tool improvements. Further, the second questionnaire asked the participant to rate grid and code acceptability using a 5-point Likert-type item scale ranging from "Unacceptable (1)" to "Acceptable (5)", and to rate the likeliness they would use a productionized version of WREX using a 5-point Likert-type item scale ranging from "Extremely unlikely (1)" to "Extremely likely (5)". Finally, participants were interviewed after each set of tasks about their experience with Jupyter notebooks and WREX.

Follow-up: We directly addressed participant feedback to improve the synthesized code: We removed the use of classes entirely and replaced these instances with lightweight functions. We replaced the register-based variable naming scheme (_0 and _1) with a variable-name generation scheme that uses simpler mnemonic names, such as s and t for string arguments. We removed exception handling logic because these constructs made it harder for the data scientists to identify the core part of the transformation. Finally, we returned to the participants after implementing these changes and asked them to reassess the synthesized code for the study tasks.

QUANTITATIVE RESULTS
Table 2 shows completion rates by task and condition. Fisher's exact test identified a significant difference between the WREX and manual conditions, both in the A-dataset first (p < .0001) and in the B-dataset first (p < .0001) subgroups. Participants in the manual condition completed 12/36 tasks, while those in the WREX condition completed all 36/36 tasks. The significant differences between the two conditions can be explained mostly by tasks A2 and B1, which require non-trivial date and time transformations.
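Written by hand, the date and rounding transformations above are short but fiddly. The following is a minimal Python sketch of tasks A2 and A3 (an illustrative hand-written version, not WREX's synthesized output; the input formats are taken from the task descriptions):

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

def day_at_hour(date_str, time_str):
    """Task A2: '12/11/2015' + '13:34:52' -> 'Friday @ 1pm'."""
    dt = datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %H:%M:%S")
    hour = dt.strftime("%I%p").lstrip("0").lower()  # '01PM' -> '1pm'
    return f"{dt:%A} @ {hour}"

def round_coords(lat_str, lon_str):
    """Task A3: round half up to the nearest hundredth, e.g. '[40.19, -75.13]'."""
    q = lambda s: Decimal(s).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"[{q(lat_str)}, {q(lon_str)}]"

print(day_at_hour("12/11/2015", "13:34:52"))    # Friday @ 1pm
print(round_coords("40.185840", "-75.125512"))  # [40.19, -75.13]
```

Note that Python's built-in round() rounds halves to even, which is why the "round half up" requirement needs an explicit decimal rounding mode; this is exactly the kind of language nuance that makes such tasks non-trivial to write from memory.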
1 https://www.kaggle.com/mchirico/montcoalert
2 https://www.kaggle.com/somesnm/partynyc

        Manual        WREX          Frequency
Task    n    %        n    %        n     Median
A1      3    50%      6    100%     12    3
A2      0    0%       6    100%     12    2
A3      2    33%      6    100%     12    2
B1      0    0%       6    100%     12    3
B2      3    50%      6    100%     12    2
B3      4    67%      6    100%     12    2

Table 2: Participant task completion under WREX and manual data wrangling conditions, and participant-reported frequency of such tasks in day-to-day work. Participants were given five minutes to complete each task. Rating scale for task frequency: Never (1), Rarely (2), Occasionally (3), Moderately (4), A great deal (5). Medians of the frequency distributions are shown.

        Manual             WREX
Task    n    Time (min)    n    Time (min)
A1      3    2.5           6    2.4
A2      0    5.0           6    2.9
A3      2    3.6           6    1.8
B1      0    5.0           6    3.1
B2      3    4.4           6    3.2
B3      4    4.2           6    1.7

Table 3: Participant efficiency under WREX and manual data wrangling conditions.

        Acceptability
Task    n    Grid    Code1    Code2
A1      6    5       3        5
A2      6    5       2        5
A3      6    5       2        5
B1      6    4       2        4
B2      6    4       3        5
B3      6    5       3        5

Table 4: How acceptable was the grid experience and the corresponding synthesized code snippet? Rating scale: Unacceptable (1), Slightly unacceptable (2), Neutral (3), Slightly acceptable (4), and Acceptable (5). Code1 are the ratings for the code synthesized in the in-lab study; Code2 are the ratings after incorporating the participants' feedback. Median values are shown.

Participant Efficiency: Table 3 shows the distribution of completion times by task and condition, and the participants' self-reported frequency of how often they do that type of task. A t-test failed to identify a significant difference in the A-dataset (t(5.93) = 1.13, p = 0.30), but did identify a significant difference for the B-dataset (t(22.32) = 5.17, p < 0.001). Using WREX, the average time to completion was μ = 2.4, sd = 1.0 (A) and μ = 2.7, sd = 1.0 (B). Participants using WREX, on average, were about 40 seconds faster (μ = 0.60, sd = 0.53) in A, and about 1.6 minutes faster (μ = 1.61, sd = 0.31) in B. This means that if one has a good understanding of the code required to perform their transformation—and if the code is simple to write—then it may be faster to write code directly than to give an example to WREX.

Grid and Code Acceptability: Table 4 shows the distribution of acceptability for the grid, the code acceptability during the study (Code1), and the code acceptability after post-study improvements (Code2). Participants reported the median acceptability of the grid experience as Acceptable (5). The code acceptability during the study (Code1) had substantial variation in responses, with a median of Neutral (3). After improving the program synthesis engine based on the participant feedback (Code2), the median score improved to Acceptable (5). A Wilcoxon signed-rank test identified these differences as significant (S = 319, p < .0001), with a median rating increase of 2. As a measure of user satisfaction, we asked participants if they would use WREX for data wrangling tasks if a production version of the tool was made available: five participants reported that they would probably use the tool (4), and seven reported that they would definitely use the tool (5).

QUALITATIVE FEEDBACK FROM STUDY PARTICIPANTS

Reducing Barriers to Data Wrangling
After completing the three tasks in the manual Jupyter condition, participants noted these sets of barriers to wrangling that they experienced both during the tasks and also in their daily work, some of which WREX helped overcome.

Recall of Functions and Syntax
The most common barrier reported by participants, both within our lab study and in their daily work, is remembering what functions and syntax are required to perform the necessary data transformations. One reason for failed recall is a lack of recency: "my biggest difficulty was recalling the specific command names and syntax, just because I didn't use them today" (P2). The complexity of modern languages and the number of libraries available are too vast for data scientists to rely on their memory faculties, as "it is just tough to memorize all the nuances of a language" (P5). Participants noted that although computational notebooks have features like inline documentation and autocompletion, these features don't directly help them in understanding which operations they need to use and how they should use them.

WREX reduces this barrier with the synthesis of readable code via programming-by-example. This removes the need for data scientists to remember the specific functions or syntax needed for a transformation. Instead, they need only know what they want to do with the data in order to produce code.
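The core idea of programming-by-example can be sketched in a few lines. The toy below (a deliberately simplified stand-in for WREX's actual synthesizer, which searches a far richer space of string-transformation programs) picks, from a fixed set of candidate transforms, one that is consistent with the user's input-output examples:

```python
# Toy programming-by-example: choose a transform consistent with the
# user's examples. (Illustrative only; WREX's engine synthesizes
# readable programs over a much richer transformation language.)
CANDIDATES = {
    "title-case": str.title,
    "upper-case": str.upper,
    "lower-case": str.lower,
    "strip": str.strip,
}

def synthesize(examples):
    """Return (name, fn) for the first candidate matching every example."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in examples):
            return name, fn
    return None  # no candidate explains the examples

name, fn = synthesize([("NEW YORK", "New York"), ("MANHATTAN", "Manhattan")])
print(name, fn("HORSHAM"))  # title-case Horsham
```

Because only the examples constrain the search, any remaining ambiguity is resolved by supplying more examples, which is precisely the interaction the grid supports.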
Searching for Solutions
To alleviate recall issues, data scientists rely on web searches for solutions on websites like Stack Overflow. These searches occur because "most of the tasks are pretty standard, I expect there to be one function that solves the piece, generally in Stack Overflow, if you are able to break the problem down small enough you can find a teeny code snippet to test" (P3). Participants believed searching for these code snippets is quicker than producing the solutions themselves. This helps them reach their goal of "achieving the final result as fast as possible", so they prefer "to save time and use something existing" (P8). Unfortunately, searching for solutions can fail or increase data wrangling time depending on the domain of the task, since "there are so many [web pages] and you need to pick the right one. So, it takes time to find someone who has the exact same problem that you had. Usually in 70-80% of the cases I've found that someone else has had the same problem, sometimes not, depends on the domain. [...] In audio [data] it's more complicated to find someone who did something similar to what you were looking for" (P8).

WREX reduced participants' reliance on web searches. Instead of hunting online for the right syntax or API calls, they could remain in the context of their wrangling activities and only had to provide the expected output for data transformations. Participants immediately noticed the time it took to complete the three tasks with WREX compared to doing so with a default Jupyter notebook and web search: "I super liked it, it was amazing, really quick, I didn't have to look up or browse anything else" (P9); WREX also "avoided my back and forth from Stack Overflow" (P12). By removing the need to search websites and code repositories, WREX allows data scientists to remain in the context of their analysis.

Fitting into Data Scientists' Workflows
WREX helps address the above barriers by providing familiar interactions that reduce the need for syntax recall and code-related web searches. First, WREX's grid felt familiar, lessening the learning curve required to perform data wrangling tasks with the system. This form of interaction was likened to "the pattern recognition that Excel has when you drag and drop it", and WREX was said to have a "nice free text flow" (P5). Feedback for the grid interaction was overwhelmingly positive (Table 4), with only minor enhancements suggested, such as a right-click context menu and better horizontal scrolling.

Participants agreed that this tool fit into their workflow. They were enthusiastic about not having to leave the notebook when performing their day-to-day data wrangling tasks. By having a tool that generates wrangling code directly in their notebook, participants felt that they could easily iterate between data wrangling and analysis. Some participants reported running subsets of their data on local notebooks for exploratory analysis, but then eventually needing to export their code into production Python scripts to run in the cloud. With existing data wrangling tools, participants indicated that they would have to re-write these transformations in Python. Because WREX already produces code, these data wrangling transformations are easy to incorporate into such scripts.

Data Scientists' Expectations of Synthesized Code

Readability of Synthesized Code
Participants described readability as a critical feature of usable synthesized code. P6 wanted "to read what the code was doing and make sure it was doing what I expect it to do, in case there was an ambiguity I didn't pick up on". It is also important for collaboration, "because the whole purpose for me to use Jupyter notebook is to be able to interpret things. [So], if I leave and pass on my work to someone else, they would be able to use it if they know how it is written" (P12). Participants also cited readability as an enabler for debugging and maintenance, where readable code would allow them to make small changes to the code themselves rather than provide more examples to the interactive grid. Amongst our participants, one example standard for readability was: "I would want it to be very similar to what I would expect searching Stack Overflow" (P10). Interestingly, our participants found short variable names like "String s" or "Float f" to be acceptable, as they could just rename these themselves.

Trust in Synthesized Code
The most salient method to increase trust was reading the synthesized code. Inspecting the resulting wrangled data frame is not enough; without readable code, participants "don't know what is going on there, so I don't know if I can trust it if I want to extend it to other tasks. I saw my examples were correctly transformed, but because the code is hard to read, I would not be able to trust what it is doing" (P10). Several participants noted that the best way to gain the confidence of a user in these types of tools is to "have the code be readable" (P3).

Several participants proposed alternative methods, beyond inspecting the data frame and the data wrangling code, to improve their trust in the system. These proposals ranged from simple summations of the resulting output to code comments and data visualizations. Some participants desired information on any assumptions made for edge cases, or wanted to be asked for examples covering these edge cases. These alternative affordances are important, as they could provide "a way of validating, maybe not the mechanics, the internals of it, but the output of it, would help me be confident that it did what I thought it did" (P2). That said, if WREX did not produce readable code, some participants "would be less trustful of it" (P10).

DISCUSSION

Data Scientists Need In-Situ Tools Within Their Workflow
Computational notebooks are not just for wrangling, but for the entire data analysis workflow. Thus, programming-by-example (PBE) tools that enhance the user experience at each stage of data analysis need to reside where data scientists perform these tasks: within the notebook. These in-situ workflows are an efficiency boost for data scientists in two ways. First, providing PBE within the notebook removes the need for users to leave their notebook and spend valuable time web searching for code solutions, as the solutions are generated based on user examples. Second, users no longer need to export their data, open an external tool, load the data into that tool, perform any data wrangling required, export the wrangled data, and reload that data back into their notebook.

Though our investigation focused on data wrangling, tools like WREX can play a critical role at each step of a data analysis by providing unified PBE interactions. For example, future PBE tools can first ingest data to synthesize code that creates a data frame, which can then be wrangled using PBE, and finally again be used to produce code to create visualizations like histograms or other useful graphs. This provides an accessible, efficient, and powerful interface to data scientists
performing data analysis, allowing them to never leave their notebooks and thus avoid context-switching costs.

In our user studies, we found that data scientists were unlikely to adopt tools that required them to leave their notebook. We also witnessed participants struggle to find code online that was suitable to the task at hand. Without WREX, they had to frequently move back and forth between web searches and their notebook as they copied and modified various code snippets. With WREX, this frustration was removed. When PBE interactions are within the notebook, a streamlined and efficient workflow for data analysis can be realized. In sum, as long as interactions are in-notebook, familiar, and can produce readable code, data scientists are enthusiastic to adopt PBE tools; they are hesitant to do so without these features.

Also, notebooks are an ideal environment for PBE tools, since program synthesis is good at generating small code snippets, which is a similar granularity to existing notebook cell usage. Program synthesis also relies on user interactions to provide examples that remove ambiguity so that PBE tools can produce correct code. Notebooks already provide a platform for interaction between a user and their code that is a good match for the user interaction requirements of PBE.

Data Scientists' Priorities for Readable Synthesized Code
Data scientists need to be able to read and comprehend the code so they can verify it is accomplishing their task. Thus, if a system synthesizes unreadable code, users have much more trouble performing verification of the output. Verifiability increases trust in the system, and gives data scientists confidence that the synthesized code handles edge cases and performs the task without errors. Readability also improves maintainability. If a data scientist wants to reuse the synthesized code elsewhere but make edits based on the context of their data, they need to be able to first understand that code.

Data scientists prioritize certain readability features over others when thinking about acceptable synthesized code. Participants did signal a need for features that increase readability, like better indentation, line breaks, naming conventions, and meaningful comments in the synthesized code. Some participants desired synthesized code that followed language idioms or that "would pass a [GitHub] pull request", but other participants saw their notebooks as exploratory code that would need to be rewritten and productionized later anyway, and instead desired synthesized code that is brief and easy to follow. This means that the goal of synthesized code should not be to appear as if a human had written it, but to focus on having the high-priority features that data scientists require.

Alternative Interactions with Code and Data
Data scientists frequently use applications like Excel to view their data and Python IDEs to manipulate it, which makes them choose between the ease of use afforded by GUIs and the expressive flexibility afforded by programming. WREX merges usability and flexibility by generating code through grid interactions. We found that our grid was familiar to data scientists who had used various grid-like structures before in spreadsheets. By implementing our grid in a programming environment such as Jupyter Notebooks, our system fits into data scientists' existing code-oriented workflow.

Though our original aim was to help data scientists accomplish difficult data wrangling tasks, our participants found that WREX was also useful for performing simpler PBE tasks like adding or dropping a column. While we implemented our interaction with program synthesis as an interactive grid, we believe that other interactions can also synthesize readable code. For example, our study participants mentioned data summaries and visualizations as potential sources for verification of the output of data science tools. One requested feature was histograms of the initial and the updated data frame, so users can take a quick glance and make sure the shape of the data makes sense. Data summaries provide ranges of values that surface potential edge cases for their code to handle, either by feeding them as examples to WREX or by modifying the synthesized code to handle these cases. The insight we gleaned from this feedback is that data scientists want the freedom that comes with multiple workflows so they can choose the best interaction for each task. In future work, it is interesting to explore different surfaces for exposing PBE tools, like the visually-richer interfaces described above, while discovering and minimizing potential trade-offs in user experience.

Synthesized Code Makes Data Science More Accessible
Synthesizing code with PBE has the potential to make data science more accessible to people with varying levels of programming proficiency. For instance, without a tool like WREX, a data scientist in a neuroscience lab must become an expert not only on brain-related data but also in the mechanics of programming. With WREX, they can not only see the final wrangled data, which speeds up their workflow; they can also study the code that performed those transformations.

Our study participants noted that WREX was useful in learning how to perform the transforms they were interested in, and could even assist them in discovering different programming patterns for regular expressions. WREX also alleviates the tedium felt by data scientists having to learn new APIs, and even lessens the burden of having to keep up with API updates. This also benefits polyglot programmers who might be weaker in a new language, as they can quickly get up to speed by leveraging WREX to produce code that they can use and learn from. In the future, we see potential for interactive program synthesis tools as learning instruments if they are able to synthesize readable and pedagogically-suitable code.

CONCLUSION
Our formative study found that professional data scientists are reluctant to use existing wrangling tools that did not fit within their notebook-based workflows. To address this gap, we developed WREX, a notebook-based programming-by-example system that generates readable code for common data transformations. Our user study found that data scientists are significantly more effective and efficient at data wrangling with WREX than with manual programming. In particular, users reported that the synthesis of readable code—and the transparency that code offers—was an essential requirement for supporting their data wrangling workflows.
Acknowledgments
We thank Arjun Radhakrishna, Ashish Tiwari, and Andrew Head for helpful discussions about tool and study design, and the data scientists at Microsoft who participated in the interviews and studies.

REFERENCES
[1] Apache. 2019. Zeppelin. (2019). https://zeppelin.apache.org/

[2] Carbide. 2019. Carbide Alpha. (2019). https://alpha.trycarbide.com/

[3] Sarah E. Chasins, Maria Mueller, and Rastislav Bodik. 2018. Rousillon: Scraping Distributed Hierarchical Web Data. In Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (UIST '18). 963–975. DOI: http://dx.doi.org/10.1145/3242587.3242661

[4] Tamraparni Dasu and Theodore Johnson. 2003. Exploratory Data Mining and Data Cleaning (1 ed.). DOI: http://dx.doi.org/10.1002/0471448354

[5] Databricks. 2019. databricks. (2019). https://databricks.com/

[6] Robert DeLine, Danyel Fisher, Badrish Chandramouli, Jonathan Goldstein, Michael Barnett, James F. Terwilliger, and John Wernsing. 2015. Tempe: Live scripting for live data. In 2015 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC '15). 137–141. DOI: http://dx.doi.org/10.1109/VLHCC.2015.7357208

[7] Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '17). 422–436. DOI: http://dx.doi.org/10.1145/3062341.3062351

[8] Google. 2019. OpenRefine. (2019). https://openrefine.org/

[9] Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-output Examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '11). 317–330. DOI: http://dx.doi.org/10.1145/1926385.1926423

[10] Sumit Gulwani. 2012. Synthesis from Examples: Interaction Models and Algorithms. In Proceedings of the 2012 14th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC '12). 8–14. DOI: http://dx.doi.org/10.1109/SYNASC.2012.69

[11] Sumit Gulwani, William R. Harris, and Rishabh Singh. 2012. Spreadsheet Data Manipulation Using Examples. Commun. ACM 55, 8 (Aug. 2012), 97–105. DOI: http://dx.doi.org/10.1145/2240236.2240260

[12] Sumit Gulwani and Mark Marron. 2014. NLyze: Interactive Programming by Natural Language for Spreadsheet Data Analysis and Manipulation. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). 803–814. DOI: http://dx.doi.org/10.1145/2588555.2612177

[13] Sumit Gulwani, Kunal Pathak, Arjun Radhakrishna, Ashish Tiwari, and Abhishek Udupa. 2019. Quantitative Programming by Examples. arXiv e-prints (Sep. 2019), arXiv:1909.05964.

[14] Philip J. Guo, Sean Kandel, Joseph M. Hellerstein, and Jeffrey Heer. 2011. Proactive Wrangling: Mixed-initiative End-user Programming of Data Transformation Scripts. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST '11). 65–74. DOI: http://dx.doi.org/10.1145/2047196.2047205

[15] William R. Harris and Sumit Gulwani. 2011. Spreadsheet Table Transformations from Examples. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). 317–328. DOI: http://dx.doi.org/10.1145/1993498.1993536

[16] Hideo Hattori. 2019. autopep8. (2019). https://github.com/hhatto/autopep8/

[17] Yeye He, Kris Ganjam, Kukjin Lee, Yue Wang, Vivek Narasayya, Surajit Chaudhuri, Xu Chu, and Yudian Zheng. 2018. Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). 1785–1788. DOI: http://dx.doi.org/10.1145/3183713.3193539

[18] Jetbrains. 2019. Datalore. (2019). https://datalore.io/

[19] Zhongjun Jin, Michael R. Anderson, Michael Cafarella, and H. V. Jagadish. 2017. Foofah: Transforming Data By Example. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). 683–698. DOI: http://dx.doi.org/10.1145/3035918.3064034

[20] Jupyter. 2019. Jupyter Notebook. (2019). https://jupyter.org/

[21] Niranjan Kamat, Eugene Wu, and Arnab Nandi. 2016. TrendQuery: A System for Interactive Exploration of Trends. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics (HILDA '16). Article 12, 4 pages. DOI: http://dx.doi.org/10.1145/2939502.2939514

[22] Sean Kandel, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck, and Paolo Buono. 2011a. Research Directions in Data Wrangling: Visualizations and Transformations for Usable and Credible Data. Information Visualization 10, 4 (Oct. 2011), 271–288. DOI: http://dx.doi.org/10.1177/1473871611415994

[23] Sean Kandel, Andreas Paepcke, Joseph Hellerstein, and Jeffrey Heer. 2011b. Wrangler: Interactive Visual Specification of Data Transformation Scripts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11). 3363–3372. DOI: http://dx.doi.org/10.1145/1978942.1979444

[24] Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2012. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Transactions on Visualization and Computer Graphics 18, 12 (Dec. 2012), 2917–2926. DOI: http://dx.doi.org/10.1109/TVCG.2012.219

[25] Mary Beth Kery, Marissa Radensky, Mahima Arya, Bonnie E. John, and Brad A. Myers. 2018. The Story in the Notebook: Exploratory Data Science Using a Literate Programming Tool. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). Article 174, 11 pages. DOI: http://dx.doi.org/10.1145/3173574.3173748

[26] Thomas Kluyver, Benjamin Ragan-Kelley, Fernando Pérez, Brian E. Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica B. Hamrick, Jason Grout, Sylvain Corlay, Paul Ivanov, Damián Avila, Safia Abdalla, Carol Willing, and et al. 2016. Jupyter Notebooks - a publishing format for reproducible computational workflows. In Proceedings of the 20th International Conference on Electronic Publishing (ELPUB '16). DOI: http://dx.doi.org/10.3233/978-1-61499-649-1-87

[27] Tim Kraska. 2018. Northstar: An Interactive Data Science System. Proceedings of the VLDB Endowment 11, 12 (Aug. 2018), 2150–2164. DOI: http://dx.doi.org/10.14778/3229863.3240493

[28] Vu Le and Sumit Gulwani. 2014. FlashExtract: A Framework for Data Extraction by Examples. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '14). 542–553. DOI: http://dx.doi.org/10.1145/2594291.2594333

[29] Microsoft. 2019. PROSE SDK. (2019). https://microsoft.github.io/prose/

[30] Mozilla. 2019. Iodide. (2019). https://alpha.iodide.io/

[31] Observable. 2019. Observable. (2019). https://observablehq.com/

[32] pandas-dev. 2019. The pandas project. (2019). https://pandas.pydata.org/

[33] Quantopian. 2019. Qgrid. (2019). https://github.com/quantopian/qgrid/

[34] Rishabh Singh and Sumit Gulwani. 2012a. Learning Semantic String Transformations from Examples. Proceedings of the VLDB Endowment 5, 8 (April 2012), 740–751. DOI: http://dx.doi.org/10.14778/2212351.2212356

[35] Rishabh Singh and Sumit Gulwani. 2012b. Synthesizing Number Transformations from Input-Output Examples. In Computer Aided Verification. 634–651. DOI: http://dx.doi.org/10.1007/978-3-642-31424-7_44

[36] Trifacta. 2019. Wrangler. (2019). https://www.trifacta.com/products/wrangler-editions/

[37] Navid Yaghmazadeh, Xinyu Wang, and Isil Dillig. 2018. Automated Migration of Hierarchical Data to Relational Tables Using Programming-by-example. Proceedings of the VLDB Endowment 11, 5 (Jan. 2018), 580–593. DOI: http://dx.doi.org/10.1145/3187009.3177735

[38] Kuat Yessenov, Shubham Tulsiani, Aditya Menon, Robert C. Miller, Sumit Gulwani, Butler Lampson, and Adam Kalai. 2013. A Colorful Approach to Text Processing by Example. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology (UIST '13). 495–504. DOI: http://dx.doi.org/10.1145/2501988.2502040

[39] Xiong Zhang and Philip J. Guo. 2017. DS.Js: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (UIST '17). 691–702. DOI: http://dx.doi.org/10.1145/3126594.3126663