Scratch Refactoring

The document introduces four new automated refactoring techniques for the block-based programming language Scratch: Extract Custom Block, Extract Parent Sprite, Extract Constant, and Reduce Variable Scope. It describes implementing these refactorings in Scratch and evaluating their impact on code quality and programmer experience through analysis of Scratch projects and a user study.


Code Quality Improvement for All:

Automated Refactoring for Scratch

Peeratham Techapalokul and Eli Tilevich
Software Innovations Lab
Dept. of Computer Science, Virginia Tech
{tpeera4, tilevich}@cs.vt.edu

Abstract: Block-based programming has been overwhelmingly successful in revitalizing introductory computing education and in facilitating end-user development. However, poor code quality makes block-based programs hard to understand, modify, and reuse, thus hurting the educational and productivity effectiveness of blocks. There is great potential benefit in empowering programmers in this domain to systematically improve the code quality of their projects. Refactoring (improving code quality while preserving its semantics) has been widely adopted in traditional software development. In this work, we introduce refactoring to Scratch. We define four new Scratch refactorings: Extract Custom Block, Extract Parent Sprite, Extract Constant, and Reduce Variable Scope. To automate the application of these refactorings, we enhance the Scratch programming environment with powerful program analysis and transformation routines. To evaluate the utility of these refactorings, we apply them to remove the code smells detected in a representative dataset of 448 Scratch projects. We also conduct a between-subjects user study with 24 participants to assess how our refactoring tools impact programmers. Our results show that refactoring improves the subjects’ code quality metrics, while our refactoring tools help motivate programmers to improve code quality.

Index Terms: block-based languages, software refactoring, Scratch, code quality, code smells, program analysis, end-user programming, introductory curriculum

978-1-7281-0810-0/19/$31.00 ©2019 IEEE

I. INTRODUCTION

Block-based programming plays an essential role in realizing the vision of CS for All [1], which renders computing accessible to the broadest possible audience of programmers, many of whom are learners and end-users. A highly popular block-based programming language is Scratch, whose general design principles strive for “a low-barrier to entry” for beginners and “a high-ceiling” for maturing programmers to create increasingly sophisticated programs over time [2]. Nevertheless, the Scratch community’s main focus thus far has been on its “low-barrier to entry,” with new command blocks that accommodate a wider audience by rendering programming accessible and attractive. Perhaps as an unintended consequence, the efforts to push the “ceiling” higher have been somewhat deprioritized. Left to their own devices, experienced programmers do get to work on advanced sophisticated projects, but at the cost of their software growing uncontrollably in size and complexity, with the overall code quality becoming degraded over time.

As it turns out, the issues of code quality reduce the pedagogical effectiveness of block-based programming [3], [4] as well as the attractiveness of blocks for serious end-user programming pursuits [5]. Consequently, this programming community can no longer afford to neglect the issues of code quality and its systematic improvement. To that end, support for improving code quality can play an important role in elevating the “ceiling” of block-based programming, while also transforming code quality improvement into a regular practice of novice programmers and end-user developers.

As first steps on the way to improving quality, several recent research efforts have focused on identifying recurring quality problems in block-based software [6], [4]. However, once faced with the identified quality problems, a programmer needs to decide whether to spend the time and effort required to fix them. In fact, evidence suggests novice programmers in this domain refrain from engaging in quality improvement practices [7], despite being sufficiently proficient in programming to improve their code. Across all levels of expertise, novice programmers have been observed introducing a high number of quality problems in their code [8].

Integrating quality improvement practices into the software development process can be prohibitively expensive for professional software developers, let alone novice programmers and end-users. For text-based languages, automated refactoring has become an indispensable quality improvement tool [9]. Refactoring systematically improves code quality, while keeping its functionality intact, ensured by its sophisticated program analyses and behavior-preserving program transformations. Alas, major block-based programming environments only support the most rudimentary Rename refactoring, which removes the Uncommunicative Name code smell [4]. However, other highly recurring quality problems (e.g., code duplication) make block-based projects hard to understand, modify, and reuse. To be able to improve the code quality of their projects, programmers need well-documented refactorings for Scratch and programming support, which both identifies refactoring opportunities and applies them automatically. Systematically supporting Scratch programmers in improving the code quality of their projects can increase the pedagogical and productivity benefits of this programming domain.

To address this problem, we added automated refactoring support to Scratch 3.0, validated by implementing four refactorings that remove code smells, reported as highly recurring
in prior studies [6], [4]. EXTRACT CUSTOM BLOCK puts the repeated sequences of statement blocks into a new procedure, invoked in place of the sequences. EXTRACT PARENT SPRITE removes duplicate sprites (programmable objects) by introducing a special construct that clones a sprite multiple times. EXTRACT CONSTANT replaces recurring constant expressions with a new variable, initialized to that expression. REDUCE VARIABLE SCOPE changes a variable’s scope from accessible to all sprites (the default setting) to accessible to a given sprite only.

With the exception of EXTRACT CUSTOM BLOCK, these refactorings would be novel even without automated application. They provide the Scratch community with a vocabulary to communicate about semantics-preserving transformations that improve code quality. These refactorings are also uniquely applicable to Scratch, as it is this language’s domain-specific features and distinct programming practices that cause the code smells these refactorings remove. EXTRACT CUSTOM BLOCK does have counterparts in text-based languages, but correctly carrying out this refactoring in Scratch requires verifying a different set of correctness preconditions.

To determine the potential usefulness of the introduced refactorings, we apply them to a representative sample of 448 Scratch projects in the public domain. We ran a user study to investigate whether the availability of our refactoring tools impacts how the studied participants improve code quality, as well as their opinions about code quality and our refactoring tools.

The evaluation results show that most of the refactorings (3 out of 4) have a high applicability of 79% or higher when applied to the code smells previously shown to be highly prevalent in the Scratch codebase. Each refactoring positively impacts its respective software metrics, which improve various code quality attributes, including program size, comprehensibility, modifiability, and abstraction. Our user study results suggest that the presence of actionable improvement hints and the associated refactorings motivates programmers to improve code quality. To the best of our knowledge, this work is the first effort to introduce automated refactoring to Scratch. By describing the design, implementation, and evaluation of our approach, this paper makes the following contributions:
1) A catalog of four refactorings for Scratch that remove highly recurring code smells.
2) An intuitive user interface for refactoring, whose actionable and contextualized coding hints encourage programmers to engage in improving code quality.
3) An experimental study that evaluates the applicability and utility of the introduced refactorings.
4) A user study that investigates the impact of our refactoring tools on programmers.
5) A software architecture and reference implementation of a refactoring engine for Scratch 3.0, featuring program analyzers and automatic code transformation.

The rest of the paper is structured as follows. Section II provides the technical background of this research and compares this work to the related state of the art. Section III presents a catalog of Scratch refactorings. Section IV describes our automated software refactoring support. Section V evaluates the introduced refactorings. Section VI discusses the significance of the evaluation results. Section VII presents concluding remarks and future work directions.

II. BACKGROUND AND RELATED WORK

This section provides the background information required to understand our contributions and also reviews the most closely related prior research efforts.

A. Block-Based Programming Languages

By lowering the barrier to entry for beginner programmers, block-based programming has achieved an unprecedented level of success and popularity [10]. The two areas in which block-based programming has been most successful are introductory computing education and end-user programming. In this domain, programming environments typically separate the front-end blocks editor from the back-end execution engine, with this separation enabling different languages to reuse the same blocks editor.

Our analysis and infrastructure target Scratch [2]. To support automated refactoring, we enhance the latest version of Scratch, whose blocks editing interface is built on Blockly [11], a popular blocks editing framework. With minimal built-in semantic analysis capabilities, the Blockly framework leaves it up to language designers to implement the necessary semantic analyses. Refactoring relies on program analyses, whose program representation differs from the blocks editor’s representation, designed for rendering and interactively manipulating blocks.

B. Automated Software Refactoring

Automated software refactoring has received considerable attention from the research community [12]. We review the most closely related examples of prior work, from which we draw lessons and insights required to design and implement automated refactoring support for block-based languages.

a) Analyses and Transformations: Several facets of our refactoring infrastructure build upon the prior advances in automated refactoring despite its text-based context. Refactoring engines commonly operate on the representation known as a program graph [13], an AST augmented with semantic edges that express various relationships (e.g., reference binding, def-use chains, etc.). We adopt this representation for its flexibility in analyses and transformations, in which additional semantic edges are introduced as required by a given analysis. For example, control flow and data flow analyses in Scratch require the information about broadcast-receive relationships in the program. To that end, we leverage JastAdd, a Java-based language processing framework by Hedin and Eva [14]. Our analyses follow the design of the AST-level extensible intraprocedural program analysis by Söderberg et al. [15].

b) Refactoring for Blocks: Despite its ubiquitous availability in the IDEs for text-based languages, refactoring has only been scarcely applied in the domains of end-user programming (e.g., pipe-like mashups [16], spreadsheets [17]). In block-based programming, the Blockly framework provides rudimentary refactoring support: renaming variables and
changing function signatures (i.e., adding parameters and changing their order). These built-in refactorings can be implemented by following a simple match-and-replace program transformation strategy, insufficient to support our advanced refactorings.

Several prior works analyze code quality [6], [4] and create analysis tools (i.e., Hairball [18], Dr.Scratch [19], and Quality Hound [20]). However, these prior works neither focus on how to address quality problems nor on how to apply automated tools. By identifying recurring Scratch code smells, these prior works inspire the refactorings described herein.

We designed our refactoring user interface to favor simplicity over versatility, in line with our target audience. That is, a quality hint is presented as a light bulb, which the programmer can mouse over to see the suggested code improvement context, and then right-click to apply the suggested refactoring. We chose this simple design over more complex interfaces, such as those that embrace the native drag-and-drop idioms of block-based programming. Even though these idioms inspired a novel refactoring user interface for Java [21], they would be poorly suited for refactoring block-based code, as they require a detailed knowledge of the source and target destinations of the intended refactoring transformations. Hence, drag-and-drop refactoring interfaces are better suited for object hierarchies, with clearly delineated boundaries between program constructs. Our user study answers the fundamental question of whether access to automated refactoring support motivates novice programmers and end-users to improve code quality in this domain.

III. REFACTORING CATALOG

Next we present our refactoring catalog. For each refactoring, we list its rationale and preconditions. Due to space limitations, we only illustrate the affected program parts before and after the application of complex refactorings.

A. Extract Custom Block

This refactoring creates a procedure, whose body comprises the repeated code fragment being refactored, and replaces all occurrences of this fragment with the appropriately parameterized invocations of the created procedure.

Precondition: For a behavior-preserving transformation, each argument to be parameterized must be a constant that can be parameterized. Note that some blocks accept a drop-down option value, which cannot be parameterized. Finally, the extracted fragment must not contain the control flow terminating command (i.e., the “stop <this script>” block).

[Figure: before, two loops, “repeat 2 / change ghost effect by -50” and “repeat 10 / change ghost effect by -10”; after, a custom block “define fade speed step” with body “repeat speed / change ghost effect by step”, invoked as “fade 2 -50” and “fade 10 -10”.]
Fig. 1. EXTRACT CUSTOM BLOCK

B. Extract Parent Sprite

This refactoring removes duplicate sprites by extracting the parent sprite, which instantiates its children clones using the “create clone of <target>” block¹. Encapsulated within a sprite, the same code is not only easier to modify, but is also amenable to other localized refactorings (e.g., EXTRACT CUSTOM BLOCK, EXTRACT CONSTANT, etc.). Automated refactoring cannot remove sprite duplicates in all cases. Some of the removals require adding boilerplate code, which would be hard to generate automatically and require advanced programming expertise to understand. Hence, the refactoring presented herein is applicable when sprite duplicates share similar code (usually at the beginning when a duplicate has just been introduced). Note that the “hide” block is immediately executed in the parent sprite, so as to emulate an invisible prototype object, whose only purpose is to clone visible children.

¹ https://en.scratch-wiki.info/wiki/Cloning

Preconditions: Sprite duplicates have an identical set of scripts (exact code duplication, without variation in literals and identifiers (e.g., variable references)). Each sprite duplicate contains no scripts starting with the “when I start as a clone” block. Finally, each sprite duplicate uses a single costume.

[Figure: before, duplicate sprites Star1 and Star2, each with a script “when clicked / go to random position / forever / flickers”; after, a single parent sprite Star whose “when clicked” script hides itself, switches to costume star1 and star2 in turn, and creates a clone of itself after each switch, plus a “when I start as a clone” script that goes to a random position, shows, and runs “forever / flickers”.]
Fig. 2. EXTRACT PARENT SPRITE

C. Extract Constant

This refactoring replaces replicated constant values with a variable. Descriptively named variables improve comprehension and modifiability [22]. The only precondition is that the replicated values must be of type literal.

D. Reduce Variable Scope

This refactoring changes the scope of an existing variable from being accessible to all sprites to only a given sprite. If global scope for a variable is not needed, reducing its scope improves the sprite’s data encapsulation.

Preconditions: Only one sprite modifies the rescoped variable (though it can be read by multiple sprites).
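The REDUCE VARIABLE SCOPE precondition can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not the JastAdd-based implementation described in this paper: the `project` dictionary layout and the names `writers_of_globals` and `reducible_to` are hypothetical, chosen to keep the example self-contained.

```python
from collections import defaultdict


def writers_of_globals(project):
    """Map each global variable to the set of sprites that assign it.

    `project` is a hypothetical, simplified model (not Scratch's file format):
    {"globals": {names}, "sprites": {sprite_name: [(kind, var), ...]}}
    where kind is either "read" or "write".
    """
    writers = defaultdict(set)
    for sprite, accesses in project["sprites"].items():
        for kind, var in accesses:
            if kind == "write" and var in project["globals"]:
                writers[var].add(sprite)
    return writers


def reducible_to(project, var):
    """Return the only sprite that modifies `var`, or None when the
    precondition (exactly one modifying sprite) does not hold."""
    sprites = writers_of_globals(project).get(var, set())
    return next(iter(sprites)) if len(sprites) == 1 else None


project = {
    "globals": {"score", "lives"},
    "sprites": {
        "Cat": [("write", "score"), ("write", "lives")],
        "Dog": [("read", "score"), ("write", "lives")],
    },
}

print(reducible_to(project, "score"))  # Cat (written by Cat only, read by Dog)
print(reducible_to(project, "lives"))  # None (written by both sprites)
```

Reads from other sprites do not block the refactoring in this sketch, which mirrors the parenthetical in the precondition; only the number of modifying sprites matters.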
IV. REFACTORING FOR SCRATCH

Although we introduce automated refactoring to Scratch, our general architecture and design can be applied to any block-based programming environment.

Overall Architecture: Exposed as remote services, the required program analysis and transformation functionalities integrate non-intrusively. Passed a serialized form of the edited program as input, these services analyze and detect code smells, returning the computed refactoring transformations.

Implementation: Based on its input parameters, the refactoring engine analyzes and transforms the edited program. The refactoring parameters can be specified by the programmer or, in our case, automatically extracted from the smells (e.g., DUPLICATE CODE → EXTRACT CUSTOM BLOCK request). Before performing any transformations, the refactoring engine determines whether a given refactoring request satisfies all of its preconditions. In the transformation phase, the refactoring engine modifies the analysis AST, while recording each modification as a transformation action. Having been transferred back to the client-side, this atomic sequence of actions is applied to the program model, maintained by the block-based programming environment. The actions are applied in the specified order, as each of them modifies program state.

Fig. 3 shows some transformation action types used to implement EXTRACT CONSTANT. Each action is persisted, so the client can replay the corresponding transformations on the client-side’s program model. Our design assumes all program elements (both blocks and non-blocks) can be looked up based on their string IDs, so program changes can be mapped across representations. Additionally, the blocks editor can serialize and deserialize its internal program model (e.g., in XML or JSON data format).

Our program analysis and transformation operate on an AST by means of JastAdd [14], a language processing framework previously mentioned in Section II. We express various relationships between program elements in a program graph with its declarative specification language to augment the AST classes.

Fig. 3 illustrates the major phases of refactoring with an example of performing EXTRACT CONSTANT. The first phase starts with a refactoring request, whose parameters for EXTRACT CONSTANT comprise the block IDs of all duplicate literals and the edited program. To determine if all preconditions are met, the server-side refactoring engine executes various analysis routines (e.g., check preconds) on the parsed AST. Then, the engine computes and records a sequence of transformations (i.e., “Actions”) that put the refactoring into effect (“compute transforms”). The resulting transformation actions are serialized and returned to the blocks editor, which presents the discovered smell hints along with the suggested refactoring transformations (“apply transformations”).

[Figure: the AST of the edited program and the refactoring pipeline. Request: Extract Constant, Param: List<Literal> (all "167"), VarName: "xpos". Stages: 1) build AST, 2) check preconds., 3) compute transforms, then apply transformations. Seq. of Actions: 1 VarDeclarAction (name: xpos, id: var1_id); 2 BlockCreateAction (xml: <block type: setvar ...); 3 BlockInsertAction (target: loopstmt_id, with: setvar_id); 4 BlockCreateAction (xml: <block type: data-var...); 5 BlockReplaceAction (target: num_literal_id, with: var1_id).]
Fig. 3. Different stages of EXTRACT CONSTANT refactoring

Refactoring Interface Design: While experienced programmers eagerly refactor their code, novice programmers are unfamiliar with the practice. Hence, the latter’s willingness to refactor needs to be encouraged with a friendly and intuitive user interface. Refactoring starts from identifying code whose quality can be improved, a hard task that is even harder for novice programmers. To render refactoring accessible to our target audience, we follow two key design principles, also demonstrated in the screenshot in Fig. 4, an example of applying the EXTRACT CUSTOM BLOCK refactoring to a real-world Scratch project.
1) Code smells should be presented as improvement opportunities to the programmer. Fig. 4A displays a code hint as a light-bulb icon, indicating an opportunity for improving code quality (EXTRACT CUSTOM BLOCK in this case). Whenever possible, a hint should be visually contextualized. For the EXTRACT CUSTOM BLOCK refactoring, our refactoring interface highlights duplicate code blocks.
2) Refactoring should be immediately actionable. Instead of relying on the programmer to specify the required refactoring parameters, as in traditional refactoring, the infrastructure should present only the actions ready for the programmer to act upon. Fig. 4B shows “Help me create the custom block”, the only available action for this hint, in simple terminology that can be easily understood by novice and end-user programmers.

Note that in this example, an additional refactoring hint, shown after the application of EXTRACT CUSTOM BLOCK, suggests to the programmer that the just-extracted custom block should be meaningfully renamed.

V. EVALUATION

Automated refactoring can become helpful for novice and end-user programmers in improving the quality of their projects, as long as the refactorings are applicable, useful, and accessible for this programming audience. Our evaluation seeks answers to the following questions:
RQ1. How applicable is each introduced refactoring?
RQ2. How do the refactorings impact code quality?
RQ3. Do refactoring tools motivate quality improvement?

To answer RQ1 and RQ2, we experimentally evaluate a representative sample of Scratch projects. We refactor the code smells, automatically detected by our infrastructure’s smell analysis modules. To answer RQ3, we conduct an online between-subjects study, in which participants in the treatment group interact with our refactoring infrastructure and answer survey questions.

Fig. 4. A screenshot of the EXTRACT CUSTOM BLOCK refactoring invocation interface for Scratch

A. Experimental Evaluation

We limit the scope of our evaluation to highly prevalent smells and whether our refactorings can remove them. Note that some refactorings can remove more than one type of code smell (e.g., EXTRACT CUSTOM BLOCK can remove both LONG SCRIPT [4] and DUPLICATE CODE smells). Hence, if we were to apply the available refactorings to remove all automatically detected smells, such an evaluation strategy would distort the applicability results and the refactored code quality, as some of the detected smells may not be indicative of actual quality problems (e.g., what constitutes a LONG SCRIPT is highly subjective). To avoid such distortions, our evaluation considers the following fixed SMELL→REFACTORING pairs: DUPLICATE CODE→EXTRACT CUSTOM BLOCK, DUPLICATE SPRITE→EXTRACT PARENT SPRITE, DUPLICATE CONSTANT→EXTRACT CONSTANT, and BROAD SCOPE VARIABLE→REDUCE VARIABLE SCOPE.

Evaluation Dataset: To assess how viable the refactorings are, we measure the applicability and impact of applying them to third-party Scratch programs. An API request² to MIT’s Scratch service retrieves a list of projects, divided into two categories of approximately equal size: (1) trending and (2) recent. This subject selection strategy ensures that we conduct our evaluation on a diverse sample of projects created by the Scratch community. We collected a total of 448 projects. Among them, 51% were viewed at most once, and the rest were viewed 12,749 times on average. Among the subject projects, 88% were remixed at most once, and the rest were remixed 93 times on average.

² https://scratch.mit.edu/explore/projects/all/<recent>|<trending>

TABLE I
CODE SMELL DEFINITIONS

Code Smell             Definition and Detection Criteria
Duplicate Code         2 or more code fragments, containing more than one statement, are duplicate if they have identical structure except for variations in identifiers and literals (type II in clone classification [23]). If multiple duplicate fragments overlap, the largest is selected.
Duplicate Sprite       2 or more sprites are duplicate if each script within one of the sprites is duplicated in the others.
Duplicate Constant     Exact literals of at least 3 characters that are replicated at least twice (the thresholds identified experimentally to reduce false positive results).
Broad Scope Variable   A variable declared in the global scope (Stage), but assigned only locally in a single sprite.

RQ1. Refactoring Applicability: For each refactoring, we assess its applicability by calculating the percent of its associated smells that are refactorable. Because code smell definitions affect the applicability of refactorings, Table I lists the considered smells and their detection criteria as the bases for interpreting our evaluation results.

TABLE II
APPLICABILITY (N=448)

Smell → Refactoring                        Afflicted Projects   Total Smells   Refactored Smells
Duplicate Code → Extract Custom Block      181 (41%)            290            229 (79%)
Duplicate Sprite → Extract Parent Sprite   142 (32%)            193            22 (11%)
Duplicate Constant → Extract Constant      194 (43%)            453            453 (100%)
Broad Scope Var. → Reduce Var. Scope       94 (21%)             145            118 (81%)

Results: Table II summarizes the results of evaluating refactoring applicability. In our evaluation, as long as a project contains at least one instance of a given code smell, the project is considered afflicted by that smell. Different smells have been found to afflict different projects in the evaluation dataset. Afflicting over 30% of the subject projects, duplication-related smells are the most prevalent; afflicting around 21%, BROAD SCOPE VARIABLE is the least prevalent smell. One project may contain more than one instance of the same code smell (Total.Smells > No.Afflicted.Projects). We use all detected smells to evaluate refactoring applicability.
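As a quick check of how the applicability column is derived, the percentage of refactorable smells can be recomputed from the counts reported in Table II. The sketch below is illustrative only: the names `TABLE_II` and `applicability` are hypothetical, and rounding to the nearest integer is an assumption about how the percentages were reported.

```python
# (total detected smells, refactored smells) per pair, copied from Table II
TABLE_II = {
    "Duplicate Code -> Extract Custom Block": (290, 229),
    "Duplicate Sprite -> Extract Parent Sprite": (193, 22),
    "Duplicate Constant -> Extract Constant": (453, 453),
    "Broad Scope Var. -> Reduce Var. Scope": (145, 118),
}


def applicability(total, refactored):
    """Percent of detected smell instances that could be refactored."""
    return round(100 * refactored / total)


for pair, (total, refactored) in TABLE_II.items():
    print(f"{pair}: {applicability(total, refactored)}%")
```

Under that rounding assumption, the four computed values are 79%, 11%, 100%, and 81%, which agree with the Refactored Smells column.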
The applicability of the introduced refactorings varies widely. With a success rate of over 75%, EXTRACT CONSTANT, REDUCE VARIABLE SCOPE, and EXTRACT CUSTOM BLOCK are the most applicable refactorings. EXTRACT CUSTOM BLOCK’s precondition failures are due to the variations in duplicate fragments failing to satisfy the preconditions (60% of the variations contain global variables and 31% contain non-constant expression blocks; 9% are located at non-parameterizable input slots). As expected, EXTRACT PARENT SPRITE is the least applicable refactoring due to its restrictive preconditions: only 11% of the detected smell instances can be refactored. The reason for failures is that EXTRACT PARENT SPRITE cannot handle certain duplicate sprites, out of which 63% differ slightly in terms of their contained code, 33% are multi-costumes, and 4% contain scripts starting with the “when I start as a clone” block. Overall, we observe the introduced refactorings to be satisfactorily applicable to the highly recurring smell types. Even the least applicable refactoring, EXTRACT PARENT SPRITE, can be applied frequently enough.

RQ2. Refactoring Impact on Quality: To assess how each refactoring impacts program quality, we apply all the evaluated refactorings in sequence on each of the detected smell type instances. We then compute the relevant software metrics of the original and the refactored versions of each subject program, so as to determine the difference or the delta in code quality, which serves as a measure of refactoring quality impact. Table III defines the software metrics used.

TABLE III
METRICS DEFINITIONS

Metric                 Definition
LOC                    # statement blocks within a program
Complex Script Dens.   % of scripts (including procedures) with McCabe’s cyclomatic complexity [24] value > 10 (risk threshold according to [25])
Long Script Dens.      % of scripts (including procedures) with LOC > 11 (threshold empirically determined in previous work [4])
Procedure Dens.        # procedures within a program per 100 LOCs
No. Literals           # literals (numbers and strings) within a program
No. Global Var.        # global variables
No. Create Clone Of.   # CREATE CLONE OF <TARGET> blocks

Results: Table IV summarizes the characteristics of the detected smells that are refactorable. Table V summarizes the percentage changes for different software metrics before and after performing each refactoring. To help the reader interpret the results, the last column translates the mean deltas into percent improvements. Group Size refers to the number of replications of a given program element. Total Uses refers to the number of times a given variable is read in a project. External Uses refers to the number of times a given variable is read from outside the sprite in which it is defined. Next, we describe the results in terms of different code quality attributes.

TABLE IV
CHARACTERISTICS OF REFACTORABLE SMELLS

Metrics                N     Min   p25   Med   Mean    p75    Max
Duplicate Code
  Group Size           229   2     2     3     3.05    3      20
  Fragment Size        229   3     3     4     5.05    6      27
Duplicate Sprite
  Group Size           22    2     2     2.5   22.86   4.75   238
  Sprite Size (LOC)    22    1     1     1.0   1.64    2.00   3
Duplicate Constant
  Group Size           453   5     5     6     8.96    10     113
  Literal Length       453   2     2     3     3.14    3      58
Broad Scope Variable
  Total Uses           118   0     1     1     2.73    2.00   60
  External Uses        90    0     0     0     0.47    0.75   6

Size: We expect duplication-eliminating refactorings to remove redundant code and decrease the code size of the afflicted projects. We observe that EXTRACT CUSTOM BLOCK reduces code size to a varying degree. The improvement is often small because duplicate groups with few repetitions are detected more frequently than larger ones. Thus, most projects see a mean decrease in LOC of 3.38%. Though less applicable, the EXTRACT PARENT SPRITE refactoring removes large duplications at the sprite level. A subset of projects afflicted by DUPLICATE SPRITE sees a greater mean decrease in LOC of 8.49%.

Comprehension: We expect EXTRACT CUSTOM BLOCK to help shorten some long scripts and reduce the number of complex scripts due to the original script being extracted. We observe a 4.4% decrease in the number of long scripts. On the other hand, we only observe a slight improvement in terms of the reduction in the number of complex scripts (5.77%), indicating that most refactorable Duplicate Code smells are not located within complex scripts.

Modifiability: Although the software metrics literature still lacks a definitive metric known to faithfully capture code modifiability, we can still reason about certain code modifiability improvements by measuring the number of repeated functionalities that have become localized in a single reusable program unit (i.e., procedure for DUPLICATE CODE, parent sprite for DUPLICATE SPRITE, and variable for DUPLICATE CONSTANT). In Table IV, the Group Size characteristic of these refactored duplication-related smells reflects the number of locations a programmer needs to navigate to make similar changes in the duplicate parts.

Abstraction: Duplication-eliminating refactorings have an obvious impact on abstraction (i.e., EXTRACT CUSTOM BLOCK increases procedural abstraction, EXTRACT PARENT SPRITE increases object abstraction, and EXTRACT CONSTANT increases the use of variables, a basic data abstraction). Lastly, REDUCE VARIABLE SCOPE improves information hiding or encapsulation, which correlates with the increase in the usage of local variables. The results indicate that the refactored projects (N=41) that had used local variables at least once see the usage of local variables increase by almost 54% on average. Considerable room for improvement in the usage of local variables is expected, as changing the scope of a declared variable in Scratch is an expensive and tedious transformation, requiring the programmer to create a new variable with the intended scope and manually replace each existing variable block with the newly created one.

B. User Study

We conducted an online user study, facilitated by Amazon’s Mechanical Turk, in order to gain access to a diverse pool of
                                         % Change Statistics
Metrics                   N      Min     p25     Med    Mean     p75     Max  % Improve
EXTRACT CUSTOM BLOCK
  LOC                   147   -29.70   -4.16   -1.65   -3.38   -0.67    0.00     +3.38%
  Complex Script Dens.   52  -100.00   -4.00   -2.47   -5.77   -1.44   -0.28     +5.77%
  Long Script Dens.     137  -100.00   -6.45    0.66   -4.40    1.97   42.24      +4.4%
  Procedure Dens.        46     2.72   13.61   35.48   49.98   54.73  222.15    +49.98%
EXTRACT PARENT SPRITE
  LOC                    20   -94.15  -12.71   -0.69   -8.49    4.45   41.18     +8.49%
  No. Sprites            20   -96.39  -58.48  -12.70  -31.07   -8.90   -3.85    +31.07%
EXTRACT CONSTANT
  No. Literals          194   -65     -14.02   -8.33  -11.53   -4.7    -0.89    +11.53%
REDUCE VARIABLE SCOPE
  No. Local Variable     43     2.22    7.28   20      52.42   66.67  200       +52.42%

TABLE V
PERCENTAGE CHANGES OF SOFTWARE METRICS BEFORE AND AFTER REFACTORINGS
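As a reading aid for Table V, the following Python sketch shows one way the per-project percentage changes and the last column could be derived. It assumes that "% Improve" is the mean percentage change with its sign oriented so that a positive value means improvement. That reading is consistent with the rows of Table V but is not given as a formula in the text. The metric directions, function names, and sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence

# Assumed direction of improvement per metric (inferred from Table V):
# True means that a decrease counts as an improvement.
LOWER_IS_BETTER = {
    "LOC": True,
    "No. Literals": True,
    "Procedure Dens.": False,
    "No. Local Variable": False,
}


def percent_change(before: float, after: float) -> float:
    """Percentage change of a metric from the original to the refactored project."""
    if before == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return (after - before) * 100.0 / before


def percent_improvement(metric: str, changes: Sequence[float]) -> float:
    """Mean percentage change, sign-flipped for lower-is-better metrics."""
    avg = mean(changes)
    return -avg if LOWER_IS_BETTER[metric] else avg


# Hypothetical (before, after) LOC values for three refactored projects.
loc_pairs = [(200, 190), (120, 120), (80, 76)]
loc_changes = [percent_change(b, a) for b, a in loc_pairs]

# Check against a reported row: a mean LOC change of -3.38
# corresponds to an improvement of +3.38%.
reported_loc_improvement = percent_improvement("LOC", [-3.38])
```

With these hypothetical inputs, the three LOC changes are -5%, 0%, and -5%, so the sketch reports an improvement of about 3.33%.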

novice and experienced participants. A total of 24 participants took part in the study. Seven of the 13 participants in the treatment group reported having programming experience, as compared to 4 of the 11 in the control group. The participants took 30 minutes on average to complete the study (1-hour hard limit) and were compensated $3 for completing the assignment. This study investigated the impact of the availability of refactoring tools (i.e., code quality hints and automatic refactorings) on the propensity of programmers to improve the quality of their code. To that end, the participants were first primed to use custom blocks to improve program comprehensibility, and then encouraged to improve their code, which was amenable to the EXTRACT CUSTOM BLOCK refactoring.

The participants first answered background questions about their programming experience and familiarity with Scratch. Then they received a short introduction to Scratch programming and custom blocks. To prime the participants to think about code quality, they were asked to rank two program versions on their comprehensibility (both performed the same animation but one was a refactored version of the other). The participants were presented with a programming task that required reusing in two places an existing block sequence in the workspace. The participants were expected to run the block sequence in order to understand what it did. Manually extracting a parameter-less custom block from this code sequence took 5 editing steps. The participants were asked to make sure their code was easy to understand before completing the task. In the remainder of the study, the control and treatment groups diverged. Only the treatment group was exposed to DUPLICATE CODE hints and the associated EXTRACT CUSTOM BLOCK refactoring.

RQ3. Refactoring Tools Motivating Quality Improvement: To investigate how the treatment affected the likelihood of the participants improving code quality, we instrumented our custom Scratch editor to record the following two program versions: 1) the first submission attempt, before participants were asked to make their code easy for others to understand; and 2) the final submission. To understand how the presence of refactoring tools impacted the participants’ attitude toward code quality, we asked them to complete a post-study survey.

Results: The study results for each question are:

RQ3.1 Engagement with improving code quality: We tested whether the participants’ choice to engage in improving code quality depended on receiving our improvement hints and their suggested refactorings. To that end, we performed a Chi-square independence test. The relationship between these variables turned out to be significant (χ²(1, N = 24) = 8.48, p = .004), thus implying that programmers receiving hints were likely to follow them in applying the suggested refactorings.

When asked which of the program versions they found easier to understand, 25% of the participants chose the original version, while 75% of them chose the refactored one. We looked further at whether the participants’ preference affected their engagement in improving code quality. Among the participants who chose the “refactored version” as being easier to understand, only 12.5% in the control group ended up improving the code quality, as compared to 80% of the participants in the treatment group.

RQ3.2 Code quality perception: When asked whether their finished programs would be easy to understand for novice programmers, 85% of the participants in the treatment group agreed, as compared to 91% in the control group.

RQ3.3 Improvement hints and refactoring usefulness: The treatment group was asked how useful they found the improvement hints and the associated refactorings in making their code easy for others to understand. The vast majority of the group members found our refactoring tools useful: 54% very useful and 38% extremely useful.

C. Threats to Validity

Our study had several threats to validity. Our experimental evaluation results only reflected the partial applicability and quality impact of each refactoring to the studied code smells. As mentioned, some of these refactorings were applicable in additional code smells/scenarios, but not all of them could be properly covered in one study. Because performing a refactoring is a subjective decision, the results did not necessarily equate with the actual applicability and quality impact. Nevertheless, one can see from our results the potential usefulness of providing such support for the programmers in this domain. In our user study, the participants with some programming experience, but no familiarity with Scratch represented almost
a half of the total participants. Although we intended to include more results from novice participants, it would be impossible for us to make use of their incomplete task results. Although, as expected, programming experience was positively correlated with completion rates, it showed no influence on our intervention. The programming task used in the study was not representative of the real world programs in this domain. However, it was reasonably complex for a half of the participants that had no programming experience.

VI. DISCUSSION

Overall, the results of our experimental evaluation and user study are quite revealing. The introduced refactorings do improve the code quality metrics by removing recurring code smells from the representative projects. We have also observed that the presence of improvement hints and their associated refactorings positively influences how programmers perceive code quality and its systematic improvement.

Suggesting Quality Improvements: Although automated hints help detect quality problems, some of the detected quality problems may not need to be refactored due to their triviality. A better alternative can be to provide hints in the form of before and after examples. Other detected problems make poor subjects for automated refactoring due to their high complexity. Indeed, fixing some of these problems may require significant programmer involvement (e.g., to come up with a meaningful name). Finally, some problems are simply not amenable to automated refactoring due to the complexity of formalizing the required general transformation strategies. Nevertheless, some non-trivial refactorings would be likely to present a cognitive burden and may interrupt the creative flow, without seamless and effective automated support to encourage their application.

Perceived Code Quality: Judging code quality remains somewhat subjective, as our user study results reveal. The participants, regardless of their programming experience, perceive code quality (program comprehensibility) differently. The participants also rate their code positively on how easy it is to be understood by other programmers new to the language, even though their finished programs exhibit a similar quality to the programs they previously ranked as being harder to understand. Receiving no suggestions, programmers may be unaware of all the different alternatives they have to achieve their goal. The availability of automated refactoring influences programmers to become aware of code quality and how they can improve it.

Educational Benefits: These improvement hints and their associated refactorings can provide a timely intervention to help novice programmers become aware of alternative design and implementation options that can improve code quality. For example, certain quality hints and refactorings may alleviate the low usage of procedures (a well-documented observation in a prior work [26]), thus elevating the role of procedural abstraction, a fundamental concept in CS education and professional software development, in this domain. Our evaluation results focusing on the participants without programming experience are very encouraging. Custom blocks or procedures are considered a somewhat hard concept not introduced until later in the introductory curriculum. However, most of the non-programmer participants in the treatment group were able to take advantage of our refactoring tools and perceive the hints and their suggested code improvement actions positively.

VII. CONCLUSION AND FUTURE WORK

This paper describes our effort to introduce automated refactoring to Scratch, a widely used block-based programming language. To demonstrate the practicality of our analysis and transformation infrastructure, we implement four refactorings that remove highly recurring Scratch code smells, identified in prior works. By providing their rationales, preconditions and transformation strategy, we systematically document these Scratch refactorings for use by both programmers and language designers. To assess the potential usefulness of each introduced refactoring, we experimentally evaluate the applicability and quality impact of each refactoring on a dataset of 448 projects. Our evaluation results show that the introduced refactorings are highly applicable, while their application improves code quality.

Our refactoring infrastructure helps overcome two main hindrances: programmers being unaware of code improvement opportunities and the programming burden of the improvements. Our infrastructure provides coding hints with immediately actionable suggestions to carry out the refactorings. Our user study reveals that the presence of improvement hints and associated automatic refactorings increases the likelihood of programmers deciding to improve code quality.

Although this work focuses on Scratch, our experiences and findings can benefit designers and developers of other block-based programming environments. Future designs of block-based environments can be improved to make program analyses and transformations easily accessible, so as to facilitate the development of semantic editing support, in addition to improving code quality. Our findings serve as a starting point in determining which refactorings are likely to be useful and worthwhile to programmers in this domain. We plan to investigate further how novice programmers interact with the refactoring tools as part of the overall programming process. The following research questions arise: How does the presence of refactoring tools affect how novice and end-user programmers code? How effectively does this presence raise the code quality awareness among programmers? The answers could inform the research community how much providing refactoring support raises the importance of code quality in the minds of programmers in this increasingly important domain.

ACKNOWLEDGMENTS

The authors would like to thank Franklyn Turbak and the anonymous reviewers for their valuable feedback that helped improve this manuscript. This research is supported by the National Science Foundation through the Grant DUE-1712131.
REFERENCES

[1] M. Smith, “Computer science for all,” 2016. [Online]. Available: https://www.whitehouse.gov/blog/2016/01/30/computer-science-all
[2] M. Resnick, J. Maloney, A. Monroy-Hernández, N. Rusk, E. Eastmond, K. Brennan, A. Millner, E. Rosenbaum, J. Silver, B. Silverman et al., “Scratch: programming for all,” Communications of the ACM, vol. 52, no. 11, pp. 60–67, 2009.
[3] F. Hermans and E. Aivaloglou, “Do code smells hamper novice programming? A controlled experiment on Scratch programs,” in 2016 IEEE 24th International Conference on Program Comprehension (ICPC), May 2016, pp. 1–10.
[4] P. Techapalokul and E. Tilevich, “Understanding recurring quality problems and their impact on code sharing in block-based software,” in Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC, 2017.
[5] Y. Ohshima, J. Mönig, and J. Maloney, “A module system for a general-purpose blocks language,” in 2015 IEEE Blocks and Beyond Workshop (Blocks and Beyond), Oct 2015, pp. 39–44.
[6] F. Hermans, K. T. Stolee, and D. Hoepelman, “Smells in block-based programming languages,” in 2016 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Sept 2016, pp. 68–72.
[7] G. Robles, J. Moreno-León, E. Aivaloglou, and F. Hermans, “Software clones in Scratch projects: On the presence of copy-and-paste in computational thinking learning,” in Software Clones (IWSC), 2017 IEEE 11th International Workshop on. IEEE, 2017, pp. 1–7.
[8] P. Techapalokul and E. Tilevich, “Novice programmers and software quality: Trends and implications,” in 2017 IEEE 30th Conference on Software Engineering Education and Training (CSEE&T), Nov 2017, pp. 246–250.
[9] E. Murphy-Hill, C. Parnin, and A. P. Black, “How we refactor, and how we know it,” 2009 IEEE 31st International Conference on Software Engineering, pp. 287–297, 2009.
[10] D. Bau, J. Gray, C. Kelleher, J. Sheldon, and F. Turbak, “Learnable programming: Blocks and beyond,” Commun. ACM, vol. 60, no. 6, pp. 72–80, May 2017. [Online]. Available: http://doi.acm.org/10.1145/3015455
[11] N. Fraser et al., “Blockly: A visual programming editor,” URL: https://developers.google.com/blockly/, 2013.
[12] T. Mens and T. Tourwe, “A survey of software refactoring,” IEEE Transactions on Software Engineering, vol. 30, no. 2, pp. 126–139, Feb 2004.
[13] J. L. Overbey and R. E. Johnson, “Differential precondition checking: A lightweight, reusable analysis for refactoring tools,” in 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), Nov 2011, pp. 303–312.
[14] G. Hedin and E. Magnusson, “JastAdd—an aspect-oriented compiler construction system,” Science of Computer Programming, vol. 47, no. 1, pp. 37–58, 2003.
[15] E. Söderberg, T. Ekman, G. Hedin, and E. Magnusson, “Extensible intraprocedural flow analysis at the abstract syntax tree level,” Science of Computer Programming, vol. 78, no. 10, pp. 1809–1827, 2013, special section on Language Descriptions Tools and Applications (LDTA’08 & ’09) & Special section on Software Engineering Aspects of Ubiquitous Computing and Ambient Intelligence (UCAmI 2011).
[16] K. T. Stolee and S. Elbaum, “Refactoring pipe-like mashups for end-user programmers,” in Proceedings of the 33rd International Conference on Software Engineering. ACM, 2011, pp. 81–90.
[17] S. Badame and D. Dig, “Refactoring meets spreadsheet formulas,” in 2012 28th IEEE International Conference on Software Maintenance (ICSM), Sept 2012, pp. 399–409.
[18] B. Boe, C. Hill, M. Len, G. Dreschler, P. Conrad, and D. Franklin, “Hairball: Lint-inspired static analysis of Scratch projects,” in Proceeding of the 44th ACM technical symposium on Computer science education. ACM, 2013, pp. 215–220.
[19] J. Moreno-León, G. Robles, and M. Román-González, “Dr. Scratch: Automatic analysis of Scratch projects to assess and foster computational thinking,” RED. Revista de Educación a Distancia, no. 46, pp. 1–23, 2015.
[20] P. Techapalokul and E. Tilevich, “Quality Hound — an online code smell analyzer for Scratch programs,” in 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Oct 2017, pp. 337–338.
[21] Y. Y. Lee, N. Chen, and R. E. Johnson, “Drag-and-drop refactoring: Intuitive and efficient program transformation,” in 2013 35th International Conference on Software Engineering (ICSE), May 2013, pp. 23–32.
[22] M. Fowler and K. Beck, Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional, 1999.
[23] C. K. Roy, J. R. Cordy, and R. Koschke, “Comparison and evaluation of code clone detection techniques and tools: A qualitative approach,” Science of Computer Programming, vol. 74, no. 7, pp. 470–495, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167642309000367
[24] T. J. McCabe, “A complexity measure,” IEEE Transactions on Software Engineering, pp. 308–320, Dec 1976.
[25] M. Bray, K. Brune, D. A. Fisher, J. Foreman, and M. Gerken, “C4 software technology reference guide-a prototype.” Carnegie-Mellon Univ Pittsburgh Pa Software Engineering Inst, Tech. Rep., 1997.
[26] I. Li, F. Turbak, and E. Mustafaraj, “Calls of the wild: Exploring procedural abstraction in app inventor,” in 2017 IEEE Blocks and Beyond Workshop (B&B), Oct 2017, pp. 79–86.