Android Code Smells - From Introduction To Refactoring
Article history: Received 16 September 2020; Received in revised form 31 January 2021; Accepted 26 March 2021; Available online 1 April 2021.

Abstract

Object-oriented code smells are well-known concepts in software engineering that refer to bad design and development practices commonly observed in software systems. With the emergence of mobile apps, new classes of code smells have been identified by the research community as mobile-specific code smells. These code smells are presented as symptoms of important performance issues or bottlenecks. Despite the multiple empirical studies about these new code smells, their diffuseness and evolution along change histories remain unclear.

We present in this article a large-scale empirical study that inspects the introduction, evolution, and removal of Android code smells. This study relies on data extracted from 324 apps, a manual analysis of 561 smell-removing commits, and discussions with 25 Android developers. Our findings reveal that the high diffuseness of mobile-specific code smells is not a result of releasing pressure. We also found that the removal of these code smells is generally a side effect of maintenance activities, as developers do not refactor smell instances even when they are aware of them.

© 2021 Elsevier Inc. All rights reserved.
https://doi.org/10.1016/j.jss.2021.110964
is the code smell considering its occurrence chances and its host entity.

Lack of release analysis. Some studies (Habchi et al., 2019a,c) leveraged the change history of mobile apps to better understand code smells. Specifically, our previous work (Habchi et al., 2019c) evaluated the impact of releases on code smell survival. However, this study did not assess the impact of releases on the introduction and removal of code smells. Releases are usually considered as a factor that favours code smells and technical debt in general, since they push developers to code rapidly and meet deadlines regardless of quality constraints (Tom et al., 2013). Besides, mobile apps are known for having more frequent releases and updates (McIlroy et al., 2016), which may contribute to the prevalence of mobile code smells. Considering these potential factors, it is important to analyse the impact of releases on the presence of mobile code smells.

Lack of qualitative analysis. In a previous work (Habchi et al., 2018), we interviewed developers to understand their usage of linters to anticipate performance bottlenecks in mobile apps. This study gave insights about the adequacy of static analysers as a solution for mobile code smells. Nonetheless, other facets of these code smells still require qualitative investigation. In particular, we lack knowledge about how developers remove mobile code smells from the source code. This knowledge is important to:

• Assess developers' awareness of mobile code smells;
• Check whether developers refactor these code smells intentionally or not;
• Learn removal techniques from developers and inform future studies about code smell refactoring.

In this article, we address these gaps by answering the following research questions:

• RQ 1: How frequent and diffuse are mobile code smell introductions?
• RQ 2: How do releases impact introductions and removals of mobile code smells?
• RQ 3: How do developers remove mobile code smells?
• RQ 4: Do developers refactor mobile code smells?

To answer these questions, we build on the artefacts of our previous works (Habchi et al., 2019a,c) to perform an empirical study where we leverage both quantitative and qualitative analyses to inspect introductions and removals of 8 types of Android code smells. Specifically, we analyse the evolution of 180k code smell instances to answer RQ 1 and RQ 2. Then, we manually explore 561 code smell removals to answer RQ 3, and finally we interview 25 smell-removing developers to answer RQ 4. The results of this study show that:

1. Regarding frequency and diffuseness, there is an important discrepancy between code smell types. No Low Memory Resolver and Leaking Inner Class are the most diffuse, affecting more than 80% of the activities and inner classes, respectively.
2. Releases do not have an impact on the introductions and removals of code smells in open-source Android apps.
3. 79% of code smell instances are removed through the change history. However, these removals are mostly caused by large source code removals that do not mention refactoring. Also, only 19% of developers who authored these removals confirmed that their actions were intentional refactoring.
4. Developers who are aware of Android code smells do not necessarily refactor them. The code smell Init OnDraw was recognized by 64% of the participants, but only 12% of them refactored it.
5. Developers who intentionally refactor code smells affirm that their actions were driven and assisted by built-in code analysis tools.
6. Developers who did not refactor Android code smells doubted their performance impact and the usefulness of their refactoring. Some developers also preferred to handle performance issues when they arise instead of anticipating them.

This study provides a comprehensive replication package (Habchi et al., 2019b), which includes the used tools and data analysis scripts, the extracted data, and the results of the qualitative analysis.

The remainder of this article is organized as follows. Section 2 explains the study design, while Section 3 reports on the results. Section 4 interprets and discusses these results, and Section 5 exposes the threats to validity. Finally, Section 6 analyses related works, and Section 7 concludes with our main findings.

2. Study design

To perform this study, we relied on the artefacts that we built in our previous works about mobile code smells. In particular, we leveraged the dataset of code smell history (Habchi et al., 2019a,c) to collect the necessary data for this study. Then, we followed different approaches to analyse this data and answer our research questions.

2.1. Dataset

In previous works, we created a dataset containing the history of mobile-specific code smells. This dataset was built by running Sniffer (Habchi and Veuiller, 2019) on a set of Android apps and tracking 8 mobile code smells. For self-containment purposes, we present in this section (i) Sniffer, (ii) the 8 code smells, and (iii) the contents of this dataset.

2.1.1. Sniffer

Sniffer is an open-source (Habchi and Veuiller, 2019) toolkit that tracks the full history of Android-specific code smells. It tackles many issues raised by the Git mining community by tracking branches and detecting renaming (Kovalenko et al., 2018). Sniffer builds the code smell history by following a three-step process. First, from the repository of the app under study, it extracts the commits and other necessary metadata like branches, releases, and commit authors. In the second step, it analyses the source code of each commit separately to detect code smell instances. Finally, based on the code smell instances and the repository metadata, it tracks the history of each smell and records it in the output database.

The performance of Sniffer was manually validated using 384 commits randomly sampled from open-source Android apps. This validation showed that it can detect code smell introductions with an F1-score of 0.97 and code smell removals with a score of 0.96.

2.1.2. Code smells

The dataset covers all the 8 types of Android-specific code smells that are detectable by Sniffer. These code smells are performance-oriented and they originate from the catalogues of Reimann et al. (2014) and Hecht (2017), Hecht et al. (2015a). Unlike other Android code smells, these 8 smells are objective, i.e., they either exist in the code or not, and cannot be introduced or removed gradually. Hence, their introduction and removal can be attributed to specific commits without confusion. Table 1 presents these code smells with a highlight on the source code entities in which they can appear. We also mention the performance resource impacted by each code smell.
Table 1
Studied code smells.

Leaking Inner Class (LIC): in Android, anonymous and non-static inner classes hold a reference to the containing class. This can prevent the garbage collector from freeing the memory space of the outer class even when it is no longer used, thus causing memory leaks (Android, 2017; Reimann et al., 2014).
Entity: Inner class. Impact: Memory.

Member Ignoring Method (MIM): this smell occurs when a method that is not a constructor and does not access non-static attributes is not static. As the invocation of static methods is 15%-20% faster than dynamic invocations, the framework recommends making these methods static (Hecht et al., 2015a).
Entity: Method. Impact: CPU.
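For illustration (this snippet is not part of the original table), a minimal Java sketch of how the two smells typically appear in an Android class, together with the static variants that remove them; the class and member names are hypothetical:

```java
public class DownloadActivity extends android.app.Activity {

    // LIC: a non-static inner class keeps an implicit reference to the
    // enclosing Activity, which can delay or prevent its garbage collection.
    class SmellyListener {
        void onDone() { /* ... */ }
    }

    // Smell-free variant: a static nested class holds no implicit reference
    // to the enclosing Activity.
    static class CleanListener {
        void onDone() { /* ... */ }
    }

    // MIM: not a constructor, touches no instance state, yet declared as an
    // instance method.
    int smellyDouble(int value) {
        return value * 2;
    }

    // Smell-free variant: declaring it static allows the cheaper static
    // invocation recommended by the framework.
    static int cleanDouble(int value) {
        return value * 2;
    }
}
```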
No Low Memory Resolver (NLMR): this code smell occurs when an Activity does not implement the method onLowMemory(). This method is called by the operating system when running low on memory in order to free allocated and unused memory spaces. If it is not implemented, the operating system may kill the process (Reimann et al., 2014).
Entity: Activity. Impact: Memory.

Hashmap Usage (HMU): the usage of HashMap is inadvisable when managing small sets in Android. Using HashMaps entails the auto-boxing process where primitive types are converted into generic objects. The issue is that generic objects are much larger than primitive types, 16 and 4 bytes, respectively. Therefore, the framework recommends using the SparseArray data structure that is more memory-efficient (Android, 2017; Reimann et al., 2014).
Entity: Method. Impact: Memory.
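As an illustrative sketch (not taken from the studied apps), the following Java fragment contrasts the HashMap Usage smell with the SparseArray alternative recommended by the framework; the field and method names are ours:

```java
import android.util.SparseArray;
import java.util.HashMap;
import java.util.Map;

class IconCache {
    // HMU: integer keys are auto-boxed to Integer objects, and each entry
    // carries the overhead of a generic HashMap node.
    Map<Integer, String> smellyIcons = new HashMap<>();

    // Recommended alternative: SparseArray maps primitive int keys to objects
    // without auto-boxing, which is more memory-efficient for small sets.
    SparseArray<String> leanIcons = new SparseArray<>();

    void store(int id, String iconPath) {
        smellyIcons.put(id, iconPath); // boxes id into an Integer
        leanIcons.put(id, iconPath);   // keeps id as a primitive int
    }
}
```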
UI Overdraw (UIO): a UI Overdraw is a situation where a pixel of the screen is drawn many times in the same frame. This happens when the UI design consists of unneeded overlapping layers, e.g., hidden backgrounds. To avoid such situations, the canvas.quickReject() API should be used to define the view boundaries that are drawable (Android, 2017; Reimann et al., 2014).
Entity: View. Impact: GPU.
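The guard the definition refers to can be sketched as follows; this is only an illustration with hypothetical names, and the exact quickReject() overload varies across Android API levels:

```java
import android.content.Context;
import android.graphics.Canvas;
import android.graphics.Paint;
import android.graphics.RectF;
import android.view.View;

class BadgeView extends View {
    private final RectF badgeBounds = new RectF(0f, 0f, 48f, 48f);
    private final Paint badgePaint = new Paint();

    BadgeView(Context context) {
        super(context);
    }

    @Override
    protected void onDraw(Canvas canvas) {
        // Only draw the layer if its bounds are not rejected by the current
        // clip; this avoids painting pixels that would be overdrawn anyway.
        if (!canvas.quickReject(badgeBounds, Canvas.EdgeType.AA)) {
            canvas.drawRect(badgeBounds, badgePaint);
        }
    }
}
```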
Unsupported Hardware Acceleration (UHA): in Android, most of the drawing operations are executed in the GPU. Rare drawing operations that are executed in the CPU, e.g., the drawPath method in android.graphics.Canvas, should be avoided to reduce CPU load (Hecht, 2017; Ni-Lewis, 2015).
Entity: Method. Impact: CPU.

Init OnDraw (IOD): a.k.a. DrawAllocation, this occurs when allocations are made inside onDraw() routines. The onDraw() methods are responsible for drawing Views and they are invoked 60 times per second. Therefore, allocations (init) should be avoided inside them in order to avoid memory churn (Android, 2017).
Entity: View. Impact: Memory.
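A minimal sketch of the smell and its usual correction, with hypothetical names: the allocation is moved out of onDraw() so that it is not repeated on every frame:

```java
import android.content.Context;
import android.graphics.Canvas;
import android.graphics.Color;
import android.graphics.Paint;
import android.view.View;

class ChartView extends View {
    // Correction: allocate once, outside the drawing path.
    private final Paint linePaint = new Paint();

    ChartView(Context context) {
        super(context);
        linePaint.setColor(Color.BLUE);
    }

    @Override
    protected void onDraw(Canvas canvas) {
        // IOD / DrawAllocation: creating objects here would repeat the
        // allocation up to 60 times per second and cause memory churn.
        // Paint smellyPaint = new Paint();   // <- the smell
        canvas.drawLine(0f, 0f, getWidth(), getHeight(), linePaint);
    }
}
```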
Unsuited LRU Cache Size (UCS): in Android, a cache can be used to store frequently used objects with the Least Recently Used (LRU) API. The code smell occurs when the LRU is initialized without checking the available memory via the getMemoryClass() method. The available memory may vary considerably according to the device, so it is necessary to adapt the cache size to the available memory (Hecht, 2017; McAnlis, 2015).
Entity: Method. Impact: Memory.
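The following sketch, written with hypothetical names, shows the pattern the definition prescribes: the LRU cache is sized from the memory class reported by the system rather than with a hard-coded constant:

```java
import android.app.ActivityManager;
import android.content.Context;
import android.graphics.Bitmap;
import android.util.LruCache;

class ThumbnailCache {
    private final LruCache<String, Bitmap> cache;

    ThumbnailCache(Context context) {
        // UCS is avoided by checking the available memory first.
        ActivityManager am =
                (ActivityManager) context.getSystemService(Context.ACTIVITY_SERVICE);
        int memoryClassKb = am.getMemoryClass() * 1024; // getMemoryClass() is in MB
        // Use roughly one eighth of the app's memory budget for the cache.
        // A full implementation would also override sizeOf() to measure entries.
        cache = new LruCache<>(memoryClassKb / 8);
    }

    Bitmap get(String key) {
        return cache.get(key);
    }
}
```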
Table 2
Content of the dataset.

Apps   Commits   Files     Smell instances   Developers   Releases   Branches
324    255,798   190,745   180,013           5,104        11,118     21,210

2.1.3. Content

Running Sniffer on a set of 324 open-source Android apps resulted in a dataset with the history of all code smell instances that appeared in these apps. Table 2 summarizes the contents of this dataset. It is worth noting that the number of files represents all the files (classes, XML, configuration, etc.) that were analysed through the change history, and not only the last versions of the apps. The first commit in this dataset is from November 2007 and the last one is from October 2018.

2.2. Data analysis

In this subsection, we describe our approach for analysing the collected data to answer our research questions. Table 3 reports on the list of metrics that we defined for this purpose.

As shown in Table 1, every code smell type affects a specific entity of the source code. Therefore, to compute the metric %diffuseness, we only focused on these entities. For instance, the code smell Init OnDraw affects only the entity View, thus we compute the percentage of views affected. This allows us to focus on the relevant parts of the source code and have a precise vision of the code smell diffuseness. For each app a, the diffuseness of a type of code smells t that affects an entity e is defined by:

%diffuseness(a, t) = #affected-entities(a, t) / #available-entities(a, e)

For instance, the diffuseness of the code smell No Low Memory Resolver (NLMR) in an app a is:

%diffuseness(a, NLMR) = #NLMR-instances(a) / #activities(a)

where #NLMR-instances(a) is the number of No Low Memory Resolver instances in the app a and #activities(a) is the number of activities in a.

For the metrics #code-removed and %code-removed, we tracked the source code modifications that led to code smell removals. In particular, we counted all code smell removals where the host entity was also removed. For example, when an instance of the code smell No Low Memory Resolver is removed, the removal can be counted as #code-removed only if the host Activity has also been removed in the same commit.

2.2.1. RQ 1: How frequent and diffuse are mobile code smell introductions?

To inspect the prevalence of code smells, we computed, for each code smell type, the metrics #introductions and %affected-apps. These metrics allow us to compare the prevalence of different code smell types. Then, to obtain a precise assessment of this prevalence, we also used the metric %diffuseness. We computed the diffuseness of each code smell type in every app of our dataset. Finally, we plotted the distribution to show how diffuse code smells are compared to their host entities.

2.2.2. RQ 2: How do releases impact introductions and removals of mobile code smells?

This research question focuses on the impact of releases on code smell evolution. To ensure the relevance of this investigation, we paid careful attention to the suitability of the studied apps for a release inspection. In particular, we manually checked the timeline of each app to verify that it publishes releases through all the change history. We excluded apps that did not use releases at all, and apps that used them only at some stage. For instance, the Chanu app (Nittner, 2016) only started using
releases in the last 100 commits, while the first 1337 commits do not have any releases. Hence, this app is, to a large extent, release-free and thus irrelevant for this research question. Out of the 324 studied apps, we found 156 that used releases during all the change history. The list of these apps can be found in our study artefacts (Habchi et al., 2019b). It is also worth noting that, as Android apps are known for continuous delivery and releasing (Android, 2019; McIlroy et al., 2016), we considered in this analysis both minor and major releases. This allows us to perform a fine-grained study with more releases to analyse.

Table 3
Study metrics.
Metric   Description

We used this set of 156 apps to evaluate the impact of releases on code smell introductions and removals. First, we visualized for each project the evolution of source code and code smells along with releases. We also plotted the evolution of code smell diffuseness for all studied apps. This visualization provides insights into the impact of releases and the evolution patterns of code smells. To accurately measure the impact of releases, we analysed the effect of approaching releases on the numbers of introductions and removals performed in commits. Therefore, we used the metrics distance-to-release and time-to-release.

Distance to release. We aimed to evaluate the relationship between the distance to release and the numbers of code smells introduced and removed per commit. For this purpose, we assessed the correlation between the distance-to-release and both #commit-introductions and #commit-removals using Spearman's rank coefficient. Spearman's rank coefficient is a non-parametric measure that assesses how well the relationship between two variables can be described using a monotonic function. This measure is adequate for our analysis as it does not require the normality of the variables and does not assume linearity.

Time to release. Using the metric time-to-release, we extracted three commit sets:

• Commits authored 1 day before a release,
• Commits authored 1 week before a release,
• Commits authored 1 month before a release.

Then, we compared the #commit-introductions and #commit-removals in the three sets using Mann-Whitney U and Cliff's δ. We used the two-tailed Mann-Whitney U test (Sheskin, 2003) with a 99% confidence level to check if the distributions of introductions and removals are identical in the three sets. To quantify the effect size of the presumed difference between the sets, we used Cliff's δ (Romano et al., 2006). Cliff's δ is a non-parametric effect size measure, which is reported to be more robust and reliable than Cohen's d (Cohen, 1992). Moreover, it is suitable for ordinal data and it makes no assumptions about a particular distribution (Romano et al., 2006). For interpretation, we followed the common guidelines: negligible (N) for |δ| < 0.10, small (S) for 0.10 ≤ |δ| < 0.33, medium (M) for 0.33 ≤ |δ| < 0.474, and large (L) for |δ| ≥ 0.474 (Grissom and Kim, 2005).
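For reference, and not restated in the original text, the standard formulation of Cliff's δ (following the usual definition, e.g., Romano et al., 2006) for two commit sets X = {x_1, ..., x_m} and Y = {y_1, ..., y_n} is:

\[
\delta \;=\; \frac{\#\{(i,j) : x_i > y_j\} \;-\; \#\{(i,j) : x_i < y_j\}}{m\,n}, \qquad \delta \in [-1, 1]
\]

Values close to 0 mean that a commit from one set is about as likely to contain more introductions (or removals) than a commit from the other set as the reverse, which is how the negligible and small effect sizes later reported in Tables 5 and 6 should be read.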
2.2.3. RQ 3: How do developers remove mobile code smells?

Quantitative analysis. First, we computed for each code smell type the metrics #removals and %removals. Then, to gain insights into the actions that lead to code smell removals, we computed #code-removed and %code-removed. The metric %code-removed reports the percentage of code smell instances that were removed with source code removal. This metric gives us a first idea about code smell removal techniques. To push further and identify the fine-grained actions that removed code smells, we opted for a qualitative analysis.

Qualitative analysis. The objective of our analysis is to understand how code smells are removed. To achieve this, we manually analysed a sample of code smell removals. We used a stratified sample to make sure to consider a statistically significant sample for each code smell. In particular, we randomly selected a set of 561 code smell removals from our dataset. This represents a 95% statistically significant stratified sample with a 10% confidence interval of the 143,995 removals detected in our dataset. The strata of the sample are represented by the 8 studied code smells. This sample includes commits from … After sampling, we analysed every smell-removing commit to inspect two aspects:

• Commit action: the source code modification that led to the removal of the code smell instance. In this aspect, every code smell type has different theoretical ways to remove it. We inspect the commits to identify the actions used in practice for concretely removing code smells from the codebase;
• Commit message: we checked the messages looking for any mention of code smell removal. In this regard, we were aware that developers could refer to the smell without explicitly mentioning its name. Therefore, we thoroughly read the commit messages to look for implicit mentions of the code smell removal.

2.2.4. RQ 4: Do developers refactor mobile code smells?

The objective of this question is to verify if the code smell removals detected in the change history are actual refactoring operations. For this purpose, we randomly selected 424 smell-removing developers, i.e., developers who performed code smell removals. This represents 30% of the 1414 smell-removing developers identified in our dataset. Afterwards, we filtered out developers that did not set proper emails in their git commits, and we ended up with 340 developers that we could contact. We sent emails to these developers to ask about the removed code smells.
In particular, we presented the concerned code smell with its definition and a code snippet that illustrates it. Then, we asked them the following questions:

1. Were you aware of this code smell?
2. Did you refactor this code smell intentionally?

The objective of the first question is to capture the developer's knowledge and awareness of the code smell. The second question allows us to check if the code smell removals authored by the developer are intended refactorings. Depending on the outcome of the second question, we asked one of the following questions:

3. Why did you refactor this code smell?
4. Why did you not refactor this code smell?

These open-questions allow developers to express their thoughts about mobile code smells and explain their choices about refactoring.

We received answers from 25 developers, which represents a response rate of 7.35%. This rate is expectedly low, as we ask developers about multiple code smell instances, which requires a deeper investment from them to recall and understand. The participants answered about all studied code smells, except Unsupported Hardware Acceleration and Unsuited LRU Cache Size. None of the responding developers was involved with these two code smells, which were indeed rare in our dataset. While most of the respondents only answered by text, two developers showed an interest in the topic and we were able to perform online interviews with them. The interviews initially followed the same textual questions, but depending on developers' answers, we asked additional questions. Consequently, we were able to get more detailed answers, especially for the two open-questions.

We transcribed the interview recordings into text using a denaturalism approach, which allows us to focus on informational content while still keeping a "full and faithful transcription". Together, the interviews and the answers to our open questions formed material for qualitative inspection. To analyse this material, we followed the analytical strategy of Schmidt (2004), which is well adapted for open questions. In this analysis, we relied on the two semantic categories:

• The reasons why developers refactor code smells;
• The reasons why developers do not refactor code smells.

To encode our material, we read the developers' answers and we tried to identify passages that relate to these categories. Based on these passages, we formulated new sub-categories. In our case, a sub-category represents a new reason for refactoring, or not refactoring, the code smell. To avoid redundancy, these sub-categories will later be presented when we report the results of this research question.

3. Study results

This section reports on the results of our study. It is worth noting that, to facilitate the replication of this study, all the results presented here are included in our companion artefacts (Habchi et al., 2019b).

3.1. RQ 1: How frequent are code smell introductions?

Table 4 reports on the number of code smells introduced and the percentage of apps affected.

The table shows that, in the 324 analysed apps, 180,013 code smell instances were introduced. This number reflects how widespread code smells are in Android apps. Nonetheless, not all code smells are frequently introduced. Indeed, the table shows a significant disparity between the different code smell types. The code smells Leaking Inner Class and Member Ignoring Method were introduced more than 70,000 times, while Unsuited LRU Cache Size and Init OnDraw were introduced fewer than 100 times. These results highlight two interesting observations:

• The most frequently introduced code smells, Leaking Inner Class and Member Ignoring Method, are both about source code entities that should be static for performance optimization;
• The UI-related code smells (UI Overdraw, Unsupported Hardware Acceleration, and Init OnDraw) are among the least frequently introduced code smells.

Regarding affected apps, Table 4 shows that 99% of apps had at least one code smell introduction in their change history, which again highlights how widespread the phenomenon is. The table also shows that the disparity in introduction frequency is reflected in the percentage of affected apps, as frequent code smells tend to affect more apps. However, we observe that having more instances does not always imply affecting more apps. In particular, No Low Memory Resolver is much less present than Leaking Inner Class and Member Ignoring Method, 4198 vs. 98,751 and 72,228, respectively. Yet, it affected more apps, 99% vs. 96% and 85%.

To obtain a clear vision about these disparities, we reported in Fig. 1 the diffuseness of code smells within their host entities in the studied apps. The figure shows that No Low Memory Resolver is the most diffuse code smell. At least 50% of the dataset apps had this code smell in all their activities, median = 100%. Leaking Inner Class is also very diffuse. In most of the apps, it affected more than 80% of inner classes. Code smells that are hosted by views are less diffuse. On average, 15% of the views are affected by UI Overdraw. As for Init OnDraw, generally, it only affected less than 10% of the views. Finally, code smells hosted by methods are the least diffuse. Member Ignoring Method, HashMap Usage, Unsupported Hardware Acceleration, and Unsuited LRU Cache Size are present in less than 3% of the methods. This low diffuseness is not surprising as the number of methods is very high.

These results show that some frequent code smells, like Member Ignoring Method, are not diffuse; they only impact a small proportion of their potential host entities. Yet, code smells that seem less frequent, like UI Overdraw and Init OnDraw, are more diffuse and affect a bigger proportion of entities.

Android code smells are not introduced and diffused equally. No Low Memory Resolver and Leaking Inner Class are the most diffuse; on average they impact more than 80% of the activities and inner classes, respectively.

3.2. RQ 2: How do releases impact introductions and removals of mobile code smells?

In this section, we report on the results of our release analysis on the 156 apps that used releases regularly. For each app, we generated code smell evolution curves that can be found in our artefacts (Habchi et al., 2019b). Fig. 2a shows an example of these curves that depicts the evolution of the number of code smells and classes in the Seafile client app. The figure highlights the releases to show the changes in code smell numbers when approaching releases. From our manual examination of all the evolution curves, we did not observe any tendency of code smell increase or decline immediately before or after releases. Generally, the number of code smells evolves with an important growth at the first stages of feature development. Then, this growth stabilizes as the projects enter the maintenance phase. Naturally, this pattern is not followed by all the analysed
Table 4
Numbers of code smell introductions.
Code smell LIC MIM NLMR HMU UIO UHA IOD UCS All
projects, as in many cases some components or modules are removed, which results in a drop in the project size and the number of code smells. Fig. 2b presents an example of these cases that was observed in the Syncthing app. Regardless of the growth pattern, we observe that the number of code smells follows the project size in terms of number of classes. These observations align with Lehman's laws of continuing growth and declining quality, where the increase in code smells is an indicator of declining quality.

To isolate the impact of project size, we also generated evolution curves for code smell diffuseness. Figs. 2c-2f show examples of curves generated for four Android apps. From our inspection of these curves, we did not observe any impact of releases on code smell diffuseness. Sometimes, we notice abrupt drops or peaks in code smell diffuseness, but these events are not explicitly related to releases. We also notice that the diffuseness evolution did not follow one simple pattern, like the raw number of code smells. However, based on general trends, we observed that three patterns were emerging frequently: consistent rise, consistent decline, and stability.

Fig. 2c shows an example of consistent rise in code smell diffuseness observed in the Subsonic project. We can see how the project started with 0.4 code smells per class and rose consistently to reach 0.8 code smells per class after 3600 commits. The opposite pattern is observed in Fig. 2d, where code smell diffuseness declines over the lifetime of the AndStatus app. At the early stages of this project, the diffuseness was around 0.65 smells per class, and it declined progressively and ended up around 0.4 smells per class. The KISS app, depicted in Fig. 2e, shows an example of stable code smell diffuseness. Despite some abrupt peaks and drops in the initial commits, code smell diffuseness always ranged between 0.35 and 0.45 all along 2800 commits.

Some apps did not fall under any of these patterns and their code smell evolution had random changes along the project lifetime. For instance, the 4pdaClient app had a hill-shaped evolution curve, as shown in Fig. 2f. Indeed, the diffuseness rose steadily in the first 200 commits, then started decreasing to go back to the same initial diffuseness.

Beyond this manual analysis, we assessed the impact of releases using the metrics distance-to-release and time-to-release.

3.2.1. Distance to release

Fig. 3 presents two scatter plots that show the relationship between the distance from releasing and the number of code smell introductions and removals per commit. The first thing that leaps to the eye is the similarity between the two plots. Code smell introductions and removals are similarly distributed regarding the distance from releasing. We do not notice any time window where the code smell introductions and removals are negatively correlated. We also do not visually observe any correlation between the distance from release and code smell introductions and removals. Indeed, the Spearman's rank correlation coefficients confirm the absence of such correlations:

Spearman(distance-to-release, #commit-introductions): ρ = 0.04, p-value < 0.05
Spearman(distance-to-release, #commit-removals): ρ = 0.01, p-value < 0.05

The results show that for both correlations, the p-value is below the threshold. Hence, we can consider the computed coefficients as statistically significant. As these correlation coefficients are negligible, we can conclude that there is no monotonic relationship between the distance from releasing and the numbers of introductions and removals per commit.

3.2.2. Time to release

After analysing the impact of the distance to release, we investigate the impact of the time to release on code smell introductions and removals.

Fig. 4 shows the density function of code smell introductions and removals in different timings. First, we observe that code smell introductions and removals are distributed similarly. For each timing, the density functions of code smell introductions and removals are analogous.
Fig. 3. The number of code smell introductions and removals per commit in the last 100 commits before release.
As for the comparison between code smell introductions performed at different times, we observe that commits performed one day before releasing have a higher probability to only have one code smell introduction. Commits performed one week or one month before release also tend to have around one code smell introduction, but they also have chances to introduce more code
Fig. 4. The density function of code smell introductions and removals one day, one week, and one month before releasing.
smells. This means that commits authored one day before the release do not necessarily have more code smell introductions.

Code smell removals follow the same distribution for every timing. Thus, we can infer that time to release has no visible impact on code smell introductions and removals.

To confirm this observation, we compare in Table 5 the code smell introductions performed one day, one week, and one month before the release. The table results show that for all code smells, there is no significant difference between code smell introductions occurring on different dates before the release (p-value > 0.01). The effect size values confirm the results; all the quantified differences are small or negligible.

Similarly, Table 6 compares code smell removals in commits authored one day, one week, and one month before the release. The results are similar to the ones observed for code smell introductions. The differences between the different commit sets are insignificant (p-value > 0.01) and effect sizes are small or negligible regardless of the code smell type.

These observations suggest that there is no difference between the introduction and removal tendencies in commits authored just before release and those written days or weeks before. It is worth noting that for UI Overdraw, Init OnDraw, Unsupported Hardware Acceleration, and Unsuited LRU Cache Size, the number of instances was in some cases insufficient for performing the statistical tests. Hence, our results are not applicable to these code smells.

Releases do not have an impact on the introductions and removals of Android code smells.

Table 5
Compare #commit-introductions in commits authored one day, one week, and one month before releasing. For each pair of commit sets, the first row gives the Mann-Whitney p-value and the second row gives the Cliff's δ effect size with its interpretation.

                 LIC        MIM        NLMR       HMU        UIO        IOD        UHA        UCS   All
Day vs. Week     p > 0.01   p > 0.01   p > 0.01   p > 0.01   p > 0.01   p > 0.01   p > 0.01   -     p > 0.01
                 0.01 (N)   0.05 (N)   0.08 (N)   0.20 (S)   0.18 (S)   0.10 (S)   -          -     0.00 (N)
Day vs. Month    p > 0.01   p > 0.01   p > 0.01   p > 0.01   -          -          -          -     p > 0.01
                 0.02 (N)   0.04 (N)   0.08 (N)   0.02 (N)   -          -          -          -     0.01 (N)
Week vs. Month   p > 0.01   p > 0.01   p > 0.01   p > 0.01   -          -          -          -     p > 0.01
                 0.04 (N)   0.00 (N)   0.00 (N)   0.04 (N)   -          -          -          -     0.01 (N)

Table 6
Compare #commit-removals in commits authored one day, one week, and one month before releasing. For each pair of commit sets, the first row gives the Mann-Whitney p-value and the second row gives the Cliff's δ effect size with its interpretation.

                 LIC        MIM        NLMR       HMU        UIO        IOD        UHA        UCS   All
Day vs. Week     p > 0.01   p > 0.01   p > 0.01   p > 0.01   p > 0.01   p > 0.01   p > 0.01   -     p > 0.01
                 0.05 (N)   0.03 (N)   0.02 (N)   0.05 (N)   0.29 (S)   -          -          -     0.01 (N)
Day vs. Month    p > 0.01   p > 0.01   p > 0.01   p > 0.01   -          -          -          -     p > 0.01
                 0.02 (N)   0.07 (N)   0.30 (S)   0.01 (N)   -          -          -          -     0.03 (N)
Week vs. Month   p > 0.01   p > 0.01   p > 0.01   p > 0.01   -          -          -          -     p > 0.01
                 0.02 (N)   0.04 (N)   0.30 (S)   0.04 (N)   -          -          -          -     0.01 (N)

3.3. RQ 3: How do developers remove mobile code smells?

Table 7
Number and percentage of code smell removals.
Code smell   LIC   MIM   NLMR   HMU   UIO   IOD   UHA   UCS   All

have a coherent removal percentage; they all had from 60% to 70% of their instances removed.

The table also reports the number and percentage of code smell instances removed within source code removal, i.e., code-removed. The table shows that overall 59% of code smell removals are a result of removing source code. For all code smell types, except Member Ignoring Method, more than 50% of code smell removals are accompanied by the removal of their host entities. Member Ignoring Method is the only code smell that is rarely removed with source code removals (%code-removed = 20%).

Table 8 compares smell-removing commits with commits that did not remove code smell instances, in terms of number and percentage of code lines deleted. The table shows that 50% of smell-removing commits deleted at least 41 code lines and 25% of these commits deleted more than 1.39% of the codebase. This shows that code smell removals occur in commits that perform large code deletions. On average, smell-removing commits delete 10 times more lines than commits that do not remove code smells. A similar discrepancy is observed in the percentage of code deleted from the codebase. 50% of smell-removing commits deleted more than 0.24% of the codebase, whereas the same portion of non-removing commits deleted less than 0.029% of the codebase.

Table 8
Number and percentage of lines deleted in smell-removing commits.

          #Line deletions                    %Line deletions
          Smell-removing   Non-removing      Smell-removing   Non-removing
Q1        10               1                 0.034            0.004
Median    41               3                 0.246            0.029
Q3        133              13                1.394            0.152

Overall, 79% of code smell instances are removed through the change history. Except for Member Ignoring Method, most code smells are removed because of source code removal.

Code smell: Leaking Inner Class (LIC).
Possible removals:
• Make the inner class static;
• Remove the inner class.

Commit actions: We found that in 98% of the cases, LIC instances are removed because inner classes are removed with other parts of the code. For instance, a commit from the Seadroid app that fixes bugs also removes unused code that contained a non-static inner class (GitHub, 2013). Hence, the commit has removed a LIC instance as a side effect of the bug fixing. This finding explains the high percentage of code-removed found for LIC in the quantitative analysis (95%).
We only found one case of LIC removal that was not caused by source code deletion. It was a commit that refactored a feature and made an inner class private and static, thus removing a code smell instance (GitHub, 2010). As this commit made diverse other modifications, we could not affirm that the action was an intended code smell refactoring.

Commit message: We did not find any explicit or implicit mention of LIC in the messages of smell-removing commits. Moreover, the messages did not refer specifically to the removed inner classes. Even the unique commit that removed a LIC instance with a modification did not mention anything about the matter in its message (GitHub, 2010).

Code smell: Member Ignoring Method (MIM).
Possible removals:
• Make the affected method static;
• Add a method body, i.e., introduce code that accesses non-static attributes to the affected method;
• Remove the affected method.

Commit actions: We found that only 15% of MIM removals are due to the deletion of the host methods. In most cases, MIM was rather removed with the introduction of source code. Specifically, when empty methods are developed, with instructions added inside, they do not correspond to the MIM definition anymore, and thus code smell instances are removed. Also, other instances are removed from full methods with the introduction of new instructions that access non-static attributes and methods. Finally, we did not find any case of MIM removal that was performed by only making the method static.

Commit message: We did not find any commit message that referred to the removal of MIM instances.

Code smell: No Low Memory Resolver (NLMR).
Possible removals:
• Add the method onLowMemory() to the activity (a minimal sketch follows below);
• Remove the affected activity.
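To make the first option concrete, the following Java sketch (with a hypothetical class name) shows an Activity that stops being an NLMR instance once the callback is implemented:

```java
import android.app.Activity;

public class GalleryActivity extends Activity {

    private Object thumbnailCache = new Object(); // placeholder for cached data

    // Adding this callback removes the NLMR instance: the system invokes it
    // when memory runs low, giving the app a chance to release what it can.
    @Override
    public void onLowMemory() {
        super.onLowMemory();
        thumbnailCache = null; // drop caches or other reclaimable data
    }
}
```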
Fig. 5. Answers about code smell awareness and refactoring. (For interpretation of the references to colour in this figure legend, the reader is referred to the web
version of this article.)
Overdraw, which were refactored by only 20% of the participants. Interestingly, No Low Memory Resolver and Init OnDraw were rarely refactored, 4% and 12% respectively. This proportion is particularly low considering that Init OnDraw was acknowledged by 64% of the participants.

Another observation is the large proportion of uncertain respondents for the refactoring questions, i.e., developers responding with maybe. Indeed, this proportion ranged between 4% and 8% for the awareness question, whereas it ranged between 8% and 28% for the refactoring question. This is reasonable, as the participants may not remember for sure if they have already refactored this code smell or not, hence the uncertain answer.

On average, 43% of the participants recognized mobile code smells, but only 19% of them confirmed their refactoring. Developers who are aware of Android code smells do not necessarily refactor them. Init OnDraw was recognized by 64% of the participants and only refactored by 12% of them.

For the open-questions, we present the answers following the semantic sub-categories that we identified.

3.4.1. Reasons why developers refactor mobile code smells

Code analysis tools. Three participants claimed that they refactor code smells because they are reported by code analysis tools. One participant affirmed refactoring all critical code smells that are detected by Android Lint: "I trust the default configuration of the linter, so if something is flagged as critical, I will stop the build to fix it". Another participant emphasized the impact of using such tools: "IDEs and their built-in code analysis tools provide good warnings on these problems, which encourage people to fix them, even if they are unaware of them". According to another participant, the help given by these tools goes beyond refactoring code smells: "the tools provide context and explanation that sharpen the programmers' perception in the future to avoid such problems beforehand. This kind of nudging should not be underestimated".

Personal practices. Two developers considered code smell refactoring as a good development practice that they adopt. The first participant said that she refactors code smells regularly: "I always try to improve the code source of projects that I work on. If I notice an issue, I fix it". The second participant described refactoring code smells as a part of the routine that she follows while getting into an existing code base: "for me it is a practice to dig into foreign code and while doing so, fixing smells along the way, as I gain understanding of the code base". This developer explained that these smells are easy to refactor as they do not change from one app to another: "they do not relate to the app specifically, they are common knowledge".

Freedom. One participant affirmed that she was able to perform refactoring operations only because she was the main maintainer of an open-source project with no external stakeholders: "I have full control over releases. I do not have external pressure, so I can invest time in keeping the source code at the quality level that satisfies me". Interestingly, this developer explicitly claimed that this same freedom was not available in industrial projects that she contributed to as part of her job.

Developers who refactored Android code smells are motivated and assisted by built-in code analysis tools and their personal commitment to code quality.

3.4.2. Reasons why developers do not refactor mobile code smells

The impact is not significant. Five participants judged that the impact of mobile code smells is not big enough to care about them. In particular, one developer claimed that "the performance difference is really tiny or non-existent in the end". The same developer also claimed that most of these code smells are automatically mitigated by the runtime: "ART can perform this kind of optimizations". Another developer explained how the architecture of mobile apps makes these code smells less concerning: "well designed apps have most of the logic implemented in the backend and unless it is a game, the UI is not updated very often. Therefore, most of these performance issues are usually no issue at all". A similar argument was constructed by another developer who believed that performance issues arise from the connection to the backend rather than from frontend code smells: "network latency typically dominates in mobile app responsiveness". Other participants gave specific examples of code smells that seemed inconsequential to them. One developer downplayed the impact of HashMap Usage, claiming that "using SparseArray collections is better for memory usage but it is less important on modern devices than it was ten years ago". No Low Memory Resolver was also considered irrelevant by one participant who described different approaches to free memory using the activity life-cycle callbacks, e.g., onPause() and onResume(). This participant also added that this code smell
is not commonly acknowledged by developers: "I have been part of the Android community since the very beginning, 2010, and I do not recall hearing about this code smell at all".

Not a performance problem. Three participants expressed their doubts about the relationship between mobile code smells and performance. For instance, one developer considered that Member Ignoring Method is a "code usability" issue but not a performance one. Another developer estimated that these code smells are "not directly related to performance", giving as examples No Low Memory Resolver and Leaking Inner Class. Another developer believed that these issues should be qualified as "code quality, completeness, correctness, resource usage issues" instead of performance.

Refactoring would not help. Two developers judged that the refactoring of code smells is useless, giving as example No Low Memory Resolver. Specifically, one developer considered that receiving low-memory warnings is a sign of bad memory management by the app, and responding to the system warning would not fix the issue. She explained: "if that point is reached it probably means you have a memory leak or another problem with the way your app manages memory and clearing some cache to prevent an out of memory crash is just a temporary band-aid before the app eventually crashes anyway". Another participant went further by considering that refactoring No Low Memory Resolver could even lead to the introduction of new bugs. She believes that "handling low memory seem more likely to do harm than good as implementations are likely to be buggy and of little benefit compared to letting the app be killed".

Performance issues are better handled reactively. Two participants considered that developers should not worry about these code smells because performance issues should not be handled proactively. The first participant stated: "gut feelings about performance issues are usually wrong". Thus, she advises against trusting these feelings or instincts: "never try to optimize before a profiling has shown where exactly the problem in your application is". The other participant gave similar advice and recommended relying on reactive tools like profilers to deal with performance issues when they arise: "instead of worrying about details I advise Android developers to worry more about UX and if the app performance slows just use the profiler to locate the bottlenecks and fix that".

Prioritization. Two participants mentioned that refactoring mobile code smells is not a high priority. One developer referenced the perpetual trade-off between quality improvement and new features: "I always have a panoply of ideas for improving my codebase but I learned to prioritize features that directly benefit the client". The developer also described different criteria that she considers while initiating a refactoring: "I evaluate considering the source code quality in the long term. If the refactoring does not help in terms of maintenance and performance, I will not perform it".

The practice is justifiable. One participant judged that some practices that were labelled as code smells were justifiable. She claimed that the use of HashMaps is a good and necessary practice because the alternative data structure makes code maintenance worse. She explained this by stating that "using Android framework SparseArray classes in a component prevents it from being tested in JVM unit tests".

Developers who did not refactor Android code smells doubt their performance impact and the usefulness of their refactoring. Some developers also prefer to handle performance issues when they arise instead of anticipating them.

4. Discussion and implications

Releases. The results of RQ 2 show that the pressure of releases does not have an impact on code smell introductions and removals. This suggests that code smells are not introduced as a result of releasing pressure. Moreover, when asked about the reasons for not refactoring code smells, developers did not blame releases. One developer mentioned prioritization, but this did not include releases and rather explained how different quality aspects and outcomes are prioritized. These results may challenge the common beliefs about releases and their relationship with technical debt in general (Tom et al., 2013). However, it is noteworthy that our results are based on open-source projects, which can be different from their industrial counterparts. This point was raised by the participant who praised the freedom and control that she had in her open-source project and who was aware that such circumstances are rare in industrial projects. Hence, we encourage future studies to:

• Evaluate the impact of releases on mobile code smells in industrial projects and ecosystems.

Awareness of code smells. Previous studies suggested that the accrual of mobile code smells and the indifference of developers toward it are signs of unawareness (Habchi et al., 2019a,c). However, the inputs collected from developers in RQ 4 challenge this hypothesis. Our participants claimed to recognize many Android code smells, but they had other reasons to neglect them. In particular, some developers were reluctant to refactor code smells because they assumed that it would lead to further issues. To remove this obstacle, we encourage researchers and toolmakers to:

• Build tools that propose automated refactoring of mobile code smells.

Static analysis tools. We observed that code smells that are detected by Android Lint, i.e., Leaking Inner Class, Init OnDraw, HashMap Usage, and UI Overdraw, are the most recognized by developers (Android, 2017). Developers who performed refactoring also explained that their actions were motivated and assisted by built-in code analysis tools. Also, the only apparent refactorings identified in RQ 3 were for Init OnDraw and UI Overdraw, which are detected by Android Lint. Some of these refactorings explicitly deleted Android Lint suppressions, which shows that developers considered the linter warnings and responded with an intended refactoring. These findings confirm that static analysers can help in raising awareness about code smells and refactoring them. Hence, we encourage researchers and toolmakers to:

• Build and integrate static analysers with more code smell coverage.

Removal and refactoring. The quantitative and qualitative findings of RQ 3 show that code smells are mainly removed with source code deletion. Even for Member Ignoring Method instances, which are removed with source code introduction, we found that they are removed because the empty and primitive methods are developed with new statements. While we cannot judge the intentions of a source code modification, most of the analysed commits did not reveal signs of intended refactoring and did not mention the code smell. On top of that, the answers collected in RQ 4 indicate that only a minority of smell-removing developers did perform a refactoring. Hence, we can suggest that most code smell removals are a side effect of other maintenance activities and are not intentional refactorings. This implies that:

• We cannot rely on code smell removals to learn refactoring techniques. Future studies that intend to learn from the
change history to build automated refactoring tools cannot rely on the removals of these code smells as learning examples.

Controversial code smells. According to RQ 4, No Low Memory Resolver is the least acknowledged and refactored code smell, 36% and 4% respectively. This code smell was disapproved by many developers who explained that the absence of a resolver does not systematically result in memory issues and that its presence is not always useful. This disapproval can explain why this code smell affected 99% of the studied apps and was the most diffuse of our 8 code smells. Questions were also raised about other code smells, like the UI code smells, which were downplayed, and HashMap Usage, which was described as justifiable and as irrelevant on modern devices. Following these questions, we invite future research works to:

• Reassess the relevance of these code smells and check the accuracy of their definitions.

Manage performance reactively. Developers explained that instead of worrying about code smells, they prefer handling performance bottlenecks when they arise. This reactive approach was already observed and discussed in previous studies about mobile apps (Habchi et al., 2018; Linares-Vasquez et al., 2015), yet the research contributions in this area remain rare. Specifically, many static analysers were provided to detect performance issues in mobile apps (Habchi et al., 2017; Hecht et al., 2015b; Palomba et al., 2017) and, to the best of our knowledge, no profiler was provided to help in managing bottlenecks when they appear. For this reason, we encourage future works to:

• Build profilers that can help developers in spotting performance bottlenecks and identifying their root causes.

Relationship between code smells and performance bottlenecks. Previous studies showed that mobile developers look after performance and take bottlenecks seriously (Linares-Vasquez et al., 2015), yet when asked about code smells developers seem less preoccupied. In particular, our participants doubted the impact of code smells and questioned their association with performance. This shows that some developers do not perceive a causal relationship between code smells and performance bottlenecks. Indeed, the existence of such a relationship remains theoretical. Previous studies relied on repeated execution scenarios to demonstrate the impact of Android code smells on performance (Carette et al., 2017; Hecht et al., 2016; Palomba et al., 2019), but they did not associate them with bottlenecks. Therefore, we encourage future studies to:

• Study the relationship between mobile code smells and performance bottlenecks.

5. Threats to validity

General threats. The main threat to our internal validity could be an imprecise detection of code smell introductions and removals. This imprecision is relevant in situations where code smells are introduced and removed gradually, or when the change history is not accurately tracked. However, this study only considered objective code smells that can be introduced or removed in a single commit. As for history tracking, we relied on Sniffer, which tracks branches and renamings and accurately detects code smell introductions and removals (F1-score = {0.97, 0.96}). As for external validity, the main threat is the representativeness of our results. We used a dataset of 324 open-source Android apps with 255k commits and 180k code smell instances. It would have been preferable to also consider closed-source apps to build a more diverse dataset. However, we did not have access to any proprietary software that could serve this study. We encourage future studies to consider other datasets of open-source apps to extend this study (Geiger et al., 2018; Krutz et al., 2015). We also encourage the inclusion of apps of different sizes, as the frequency of mobile code smells follows the codebase size. Another possible threat to external validity is that our study only concerns 8 Android-specific code smells. Without further investigation, these results should not be generalized to other code smells or development frameworks. We therefore encourage future studies to replicate our work on other datasets and with different code smells and mobile platforms.

RQ 2. A possible threat to internal validity is the selection of releases and projects. We avoided this threat by selecting apps that had releases all along their change history. This measure ensures the accuracy of the metrics distance-to-release and time-to-release. Another potential threat for this question could be the presence of app releases that are not marked on GitHub. Our release detection relies on GitHub tags, which are designed for this purpose, but developers can always release their projects without using such tags. Hence, it is possible for our approach to miss some app releases. Furthermore, our results about the impact of releases on code smells are limited to open-source apps. Apps developed as part of industrial projects can be subject to more external requirements and releasing pressure. We encourage future works to extend our work by inspecting the impact of releases in different settings.

RQ 3. One possible threat to the internal validity of our results could be the accuracy of our manual analysis. We tried to alleviate this threat by relying on objective criteria like the actions performed by the commit and the content of its message. We also did not judge the intentions of developers and counted on their answers in RQ 4 to assess the proportion of real refactoring. Another threat could be the generalizability of the results of our qualitative analysis. We used a randomly selected set of 561 smell-removing commits. This represents a 95% statistically significant stratified sample with a 10% confidence interval of the 143,995 removals detected in our dataset. To support the credibility of our study, we also provide this set with our study artefacts.

RQ 4. The results of our user study can be threatened by sampling bias. We sent our questions to a set of 340 smell-removing developers because our objective was to check if their removals were actual refactoring operations. Without further investigation, the observed proportions of awareness and refactoring cannot be generalized to all mobile developers. Furthermore, the answers collected in this study may be subject to acquiescence and desirability biases. Participants may be inclined to answer with "yes" to agree with us or to seem more aware of software quality issues. We minimized these biases by keeping the answers anonymous and avoiding the implication that some answers are "right" or "wrong". We also allowed participants to express their points of view through the open-questions.

6. Related works

In this section, we report on the literature related to code smells in mobile apps and their analysis in the change history.

6.1. Mobile code smells

The first reference to mobile-specific code smells was when Reimann et al. (2014) proposed a catalogue of 30 quality smells dedicated to Android. These code smells originate from the good and bad practices presented online in Android documentation.
They cover various aspects like implementations, user interfaces, or database usage, and they are reported to harm properties such as efficiency, user experience, or security. Many research works built on this catalogue and proposed approaches and tools for detecting code smells in mobile apps (Habchi et al., 2017; Hecht et al., 2015b; Kessentini and Ouni, 2017; Palomba et al., 2017). In particular, Hecht et al. (2015a) proposed Paprika, a tooled approach that detects OO and Android smells in Android apps. Paprika models Android apps as a large architectural graph and queries it to detect code smells. Palomba et al. (2017) proposed another tool, called aDoctor, able to identify 15 Android-specific code smells from the catalogue of Reimann et al. Habchi et al. (2017) proposed an extension of Paprika that detects iOS-specific code smells. Lately, Gupta et al. (2019) used 3 machine learning algorithms to generate rules that detect four Android code smells. In their experiments, the JRip algorithm achieved the best results by generating rules capable of detecting smells with a 90% overall precision.

To cope with mobile code smells, researchers also proposed refactoring solutions. Lin and Dig (2015) proposed Asynchronizer, a tool that extracts long-running operations into AsyncTasks, and AsyncDroid, a tool that transforms improperly-used AsyncTasks into Android IntentService. Morales et al. (2017) proposed EARMO, an energy-aware refactoring approach for mobile apps. They identified the energy cost of 8 OO and mobile antipatterns. Based on this cost, EARMO generates refactoring sequences automatically.

6.2. Empirical studies on mobile code smells

Most empirical studies focused on assessing the impact of mobile code smells on app performance (Carette et al., 2017; Hecht et al., 2016; Morales et al., 2016; Palomba et al., 2019). Hecht et al. (2016) conducted an empirical study about the individual and combined impact of 3 Android smells. They measured the performance of 2 apps with and without smells using the following metrics: frame time, number of delayed frames, memory usage, and number of garbage collection calls. The measurements showed that refactoring the Member Ignoring Method smell improves the frame metrics by 12.4%. Carette et al. (2017) studied the same code smells, but focused on the energy impact. They analysed 5 open-source Android apps and observed that, in one of them, the refactoring of the 3 code smells reduced the global energy consumption by 4.83%. The study of Morales et al. (2017), which analysed 20 open-source apps, also showed that refactoring antipatterns can significantly decrease the energy consumption of mobile apps. Notably, Palomba et al. (2019) showed that methods that represent a co-occurrence of Internal Setter, Leaking Thread, Member Ignoring Method, and Slow Loop consume 87 times more energy than other smelly methods.
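For readers unfamiliar with it, a Member Ignoring Method is an instance method that neither reads nor writes instance state and could therefore be declared static. The sketch below is our own minimal illustration of the pattern and of its usual refactoring; it is not an excerpt from the cited studies:

```java
class GeometryUtils {
    // Member Ignoring Method: the method reads no field and calls no other
    // instance method, yet it is declared as an instance method.
    double squaredDistance(double x1, double y1, double x2, double y2) {
        double dx = x2 - x1;
        double dy = y2 - y1;
        return dx * dx + dy * dy;
    }

    // Usual refactoring: declaring the method static makes its independence
    // from instance state explicit and avoids the virtual dispatch on each call.
    static double squaredDistanceAsStatic(double x1, double y1, double x2, double y2) {
        double dx = x2 - x1;
        double dy = y2 - y1;
        return dx * dx + dy * dy;
    }
}
```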
Beyond the performance impact, empirical studies compared the distribution of code smells in mobile apps and desktop systems. Specifically, Mannan et al. (2016) compared the presence of OO code smells in Android apps and desktop applications. They did not observe major differences between these two types of applications in terms of density of code smells. However, they found that the distribution of OO code smells in Android is more diversified than in desktop applications. Further, Habchi et al. (2017) analysed 279 iOS apps and 1500 Android apps to compare the presence of OO and mobile-specific smells in the two platforms. They observed semantic similarities between the code smells exhibited by the two platforms. On top of that, they found that Android apps tend to have more OO and mobile-specific code smells.

While these studies helped in understanding the distribution and impact of mobile-specific code smells, they did not provide any qualitative insights about the topic. Indeed, the only study that leveraged qualitative analysis is the one from Habchi et al. (2018), which investigated the perception of performance bad practices by Android developers. This study reported that developers may lack interest and awareness about Android code smells. Moreover, the study showed that some developers challenge the relevance and impact of code smells in practice. Our work complements this study as it relies on another information source, namely removal instances, to understand the phenomenon of code smells in practice. Besides, our work also provides quantitative insights into how these code smells are introduced and removed in practice.

6.3. Code smells in the change history

The evolution of code smells through the change history has been addressed by various studies in the OO context. Tufano et al. (2017) addressed questions similar to ours. They analysed the change history of 200 open-source projects to understand when and why code smells are introduced and for how long they survive. They observed that most code smell instances are introduced when files are created, and not as a result of the evolution process. They also found that new features and enhancement activities are responsible for most smell introductions, and that newcomers are not necessarily more prone to introducing new smells. Interestingly, this study also investigated the rationales of code smell removal, showing that only 9% of code smells are removed with specific refactoring operations. Our study yields similar results as it shows that even though 79% of code smell instances are removed through the change history, only 19% of code smell removers described their actions as intentional refactoring.

Peters and Zaidman (2012) conducted a case study on 7 open-source systems to investigate the lifespan of code smells and the refactoring behaviour of developers. They found that, on average, code smell instances have a lifespan of approximately 50% of the examined revisions. Moreover, they noticed that, usually, one or two developers refactor more than the others, although the difference is not large. Finally, they observed that the main refactoring rationales are cleaning up dead or obsolete code, dedicated refactoring, and maintenance activities.

Tufano et al. (2016) analysed the change history of 152 open-source projects to inspect the evolution of test smells and their relationship with code smells. Their results showed that, similarly to OO code smells, test smells are introduced when tests are created and they have a high survivability. Their results also suggest the existence of a relationship between test smells and code smells in the code under test.

In the context of mobile apps, we have already leveraged the change history to study code smells in previous works (Habchi et al., 2019a,c). The first work studied developer contributions, showing that the ownership of code smells is spread across developers regardless of their seniority and experience. As for the second one, it studied code smell survival and showed that, while in terms of time Android code smells can remain in the codebase for years before being removed, it only takes 34 effective commits to remove 75% of them. These results suggested that developers lack interest in code smells and most of their actions toward them are accidental. Result-wise, our study complements these findings as it shows the reasons behind developers' inaction toward code smells. Novelty-wise, our study relies on the artefacts of these works to address new topics:

• Removal fashions: This study inspects the actions that lead to code smell removals;
• Refactoring: In this study, we discuss with developers to (i) check if they intentionally refactor code smells and to (ii) identify the motivations behind their actions;
• Releases and diffuseness: Our previous work evaluated the impact of releases on code smell survival (Habchi et al., 2019c). In this work, we go further and assess the impact of releases on code smell introductions and removals. On top of that, we provide insights about the evolution of code smells and their diffuseness.

Another relevant study for our work was conducted by Mazuera-Rozo et al. (2020), who manually analysed 500 commits that fixed performance bugs in Android and iOS apps. This analysis allowed them to build a taxonomy of performance bugs and confirm that GUI lagging, energy leaks, and memory bloat are the most common performance bugs in mobile apps. The study also analysed the survival of performance bugs, showing that on average they remain for at least 90 days, which surpasses the average lifetime of other bug types.

7. Conclusion

We presented in this article a large-scale empirical study that leverages quantitative and qualitative analyses to improve our understanding of mobile code smells. The main findings of this study are:

• Diffuseness: Android code smells are not introduced and diffused equally. No Low Memory Resolver and Leaking Inner Class are the most diffuse, affecting 90% of activities and inner classes (both are sketched, along with Init OnDraw, after this list);
• Releasing pressure: Releases do not have an impact on the frequency of code smell introductions and removals in open-source Android apps;
• Removal: 79% of code smell instances are removed through the change history. However, these removals are mostly caused by large source code removals that do not mention refactoring. Also, only 19% of developers who authored these removals confirmed that their actions were intentional refactorings;
• Awareness: Developers who are aware of Android code smells do not necessarily refactor them. The code smell Init OnDraw was recognized by 64% of the participants, but only 12% of them refactored it;
• Refactoring: Developers who refactored Android code smells were motivated and assisted by built-in code analysis tools and their commitment to code quality. On the other hand, developers who did not refactor Android code smells doubted their performance impact and the usefulness of their refactoring. Some developers also preferred to handle performance issues when they arise instead of anticipating them.
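As a rough illustration of the smells named in these findings, the sketch below is our own minimal example (the class names are invented, not taken from the analysed apps): a Leaking Inner Class is a non-static inner class whose instances hold an implicit reference to the enclosing class, No Low Memory Resolver denotes an Activity that does not implement onLowMemory(), and Init OnDraw denotes object allocation inside View.onDraw(), which runs on every frame.

```java
import android.app.Activity;
import android.content.Context;
import android.graphics.Canvas;
import android.graphics.Paint;
import android.view.View;

// No Low Memory Resolver: the Activity does not override onLowMemory()
// to release resources under memory pressure.
public class ExampleActivity extends Activity {

    // Leaking Inner Class: a non-static inner class keeps an implicit
    // reference to ExampleActivity; if an instance outlives the Activity
    // (e.g. it is handed to a background task), the Activity cannot be
    // garbage collected.
    class PendingQuery {
        String text;
    }

    // Init OnDraw: the Paint is allocated on every frame; the usual fix is
    // to create it once as a field and reuse it inside onDraw().
    static class BadgeView extends View {
        BadgeView(Context context) {
            super(context);
        }

        @Override
        protected void onDraw(Canvas canvas) {
            super.onDraw(canvas);
            Paint paint = new Paint(Paint.ANTI_ALIAS_FLAG);
            canvas.drawCircle(20f, 20f, 10f, paint);
        }
    }
}
```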
These findings have notable implications for the future research agenda:

• We encourage future works to evaluate the impact of releases on mobile code smells in industrial projects and ecosystems. This need arises from the remarks of developers about the contrast between the freedom that they have while developing open-source apps and the pressure that they undergo in industrial projects;
• Future studies that intend to learn from the change history to build automated refactoring tools cannot rely on code smell removals as learning examples;
• To address the questions and doubts raised by developers, we need to reassess the relevance of Android code smells and check the accuracy of their definitions;
• We intend to study the relationship between mobile code smells and performance bottlenecks to understand and assess their impact on performance.

Besides, based on our findings, we encourage tool makers to:

• Build profilers that can help developers spot performance bottlenecks and identify their root causes rapidly;
• Build and integrate static analysers with broader code smell coverage. This is beneficial as we observed the impact of such tools on developer awareness and actions;
• Build tools that propose automated refactoring of mobile code smells. Such tools are crucial for developers who are reluctant to refactor for fear of introducing further issues.

This study also provides a comprehensive replication package (Habchi et al., 2019b), which includes the tools, datasets, and results.

CRediT authorship contribution statement

Sarra Habchi: Conceptualization, Methodology, Investigation, Writing - original draft. Naouel Moha: Conceptualization, Writing - review & editing, Funding acquisition. Romain Rouvoy: Conceptualization, Writing - review & editing, Supervision, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Android, 2017. Android lint checks. [Online; accessed April-2021], https://sites.google.com/a/android.com/tools/tips/lint-checks.
Android, 2019. Android versioning. [Online; accessed January-2019], https://developer.android.com/studio/publish/versioning.
Carette, A., Younes, M.A.A., Hecht, G., Moha, N., Rouvoy, R., 2017. Investigating the energy impact of android smells. In: Software Analysis, Evolution and Reengineering (SANER), 2017 IEEE 24th International Conference on. IEEE, pp. 115–126.
Cohen, J., 1992. A power primer. Psychol. Bull. 112 (1), 155.
Geiger, F.-X., Malavolta, I., Pascarella, L., Palomba, F., Di Nucci, D., Bacchelli, A., 2018. A graph-based dataset of commit history of real-world android apps. In: Proceedings of the 15th International Conference on Mining Software Repositories. ACM, pp. 30–33.
GitHub, 2009. Mention memory improvement. [Online; accessed January-2019], https://github.com/k9mail/k-9/commit/909f677f912ed1a01b4ef39f2bd7e6b068d1f19e.
GitHub, 2010. Fix LIC. [Online; accessed January-2019], https://github.com/connectbot/connectbot/commit/32bc0edb89e708b873533de94d3e58d5099cc3ba.
GitHub, 2012a. Remove NLMR. [Online; accessed January-2019], https://github.com/SilenceIM/Silence/commit/3d9475676f80a3dbd1b29f83c59e2c132fb135b5.
GitHub, 2012b. Remove NLMR with modifications. [Online; accessed January-2019], https://github.com/k9mail/k-9/commit/bbcc4988ba52ca5e8212a73444913d35c23cebc4.
GitHub, 2013. Remove LIC. [Online; accessed January-2019], https://github.com/haiwen/seadroid/commit/74112f7acba3511a650e113aa3483dcd215af88f.
Grissom, R.J., Kim, J.J., 2005. Effect sizes for research: A broad practical approach. Lawrence Erlbaum Associates Publishers.
Gupta, A., Suri, B., Bhat, V., 2019. Android smells detection using ML algorithms with static code metrics. In: International Conference on Recent Developments in Science, Engineering and Technology. Springer, pp. 64–79.
Habchi, S., Blanc, X., Rouvoy, R., 2018. On adopting linters to deal with performance concerns in android apps. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. In: ASE 2018, ACM, New York, NY, USA, pp. 6–16. http://dx.doi.org/10.1145/3238147.3238197.
Habchi, S., Hecht, G., Rouvoy, R., Moha, N., 2017. Code smells in iOS apps: How do they compare to android? In: Proceedings of the 4th International Conference on Mobile Software Engineering and Systems. IEEE Press, pp. 110–121.
Habchi, S., Moha, N., Rouvoy, R., 2019a. The rise of android code smells: Who is to blame? In: Proceedings of the 16th International Conference on Mining Software Repositories. In: MSR '19, IEEE Press, Piscataway, NJ, USA, pp. 445–456. http://dx.doi.org/10.1109/MSR.2019.00071.
Habchi, S., Moha, N., Rouvoy, R., 2019b. Study artifacts. [Online; accessed June-2019], https://figshare.com/s/790170a87dd81b184b0a.
Habchi, S., Rouvoy, R., Moha, N., 2019c. On the survival of android code smells in the wild. In: Proceedings of the 6th International Conference on Mobile Software Engineering and Systems. In: MOBILESoft '19, IEEE Press, Piscataway, NJ, USA, pp. 87–98, URL: http://dl.acm.org/citation.cfm?id=3340730.3340749.
Habchi, S., Veuiller, A., 2019. Sniffer source code. [Online; accessed March-2019], https://github.com/HabchiSarra/Sniffer/.
Hecht, G., 2017. Détection et analyse de l'impact des défauts de code dans les applications mobiles (Ph.D. thesis). Université du Québec à Montréal, Université de Lille, INRIA.
Hecht, G., Moha, N., Rouvoy, R., 2016. An empirical study of the performance impacts of android code smells. In: Proceedings of the International Workshop on Mobile Software Engineering and Systems. ACM, pp. 59–69.
Hecht, G., Omar, B., Rouvoy, R., Moha, N., Duchien, L., 2015a. Tracking the software quality of android applications along their evolution. In: 30th IEEE/ACM International Conference on Automated Software Engineering. IEEE, p. 12.
Hecht, G., Rouvoy, R., Moha, N., Duchien, L., 2015b. Detecting Antipatterns in Android Apps. Research Report RR-8693, INRIA Lille; INRIA, URL: https://hal.inria.fr/hal-01122754.
Kessentini, M., Ouni, A., 2017. Detecting android smells using multi-objective genetic programming. In: Proceedings of the 4th International Conference on Mobile Software Engineering and Systems. IEEE Press, pp. 122–132.
Kovalenko, V., Palomba, F., Bacchelli, A., 2018. Mining file histories: should we consider branches? In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, pp. 202–213.
Krutz, D.E., Mirakhorli, M., Malachowsky, S.A., Ruiz, A., Peterson, J., Filipski, A., Smith, J., 2015. A dataset of open-source android applications. In: Proceedings of the 12th Working Conference on Mining Software Repositories. IEEE Press, pp. 522–525.
Lin, Y., Dig, D., 2015. Refactorings for android asynchronous programming. In: Automated Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on. IEEE, pp. 836–841.
Linares-Vasquez, M., Vendome, C., Luo, Q., Poshyvanyk, D., 2015. How developers detect and fix performance bottlenecks in android apps. In: 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, pp. 352–361.
Mannan, U.A., Ahmed, I., Almurshed, R.A.M., Dig, D., Jensen, C., 2016. Understanding code smells in android applications. In: Proceedings of the International Workshop on Mobile Software Engineering and Systems. ACM, pp. 225–234.
Mazuera-Rozo, A., Trubiani, C., Linares-Vásquez, M., Bavota, G., 2020. Investigating types and survivability of performance bugs in mobile apps. Empir. Softw. Eng. 1–43.
McAnlis, C., 2015. The magic of LRU cache (100 days of google dev). [Online; accessed January-2019], https://youtu.be/R5ON3iwx78M.
McIlroy, S., Ali, N., Hassan, A.E., 2016. Fresh apps: an empirical study of frequently-updated mobile apps in the google play store. Empir. Softw. Eng. 21 (3), 1346–1370.
Morales, R., Saborido, R., Khomh, F., Chicano, F., Antoniol, G., 2016. Anti-patterns and the energy efficiency of android applications. arXiv preprint arXiv:1610.05711.
Morales, R., Saborido, R., Khomh, F., Chicano, F., Antoniol, G., 2017. EARMO: An energy-aware refactoring approach for mobile apps. IEEE Trans. Softw. Eng.
Ni-Lewis, I., 2015. Custom views and performance (100 days of google dev). [Online; accessed January-2019], https://youtu.be/zK2i7ivzK7M.
Nittner, G., 2016. Chanu - 4chan android app. [Online; accessed January-2019], https://github.com/grzegorznittner/chanu.
Palomba, F., Di Nucci, D., Panichella, A., Zaidman, A., De Lucia, A., 2017. Lightweight detection of android-specific code smells: The adoctor project. In: Software Analysis, Evolution and Reengineering (SANER), 2017 IEEE 24th International Conference on. IEEE, pp. 487–491.
Palomba, F., Di Nucci, D., Panichella, A., Zaidman, A., De Lucia, A., 2019. On the impact of code smells on the energy consumption of mobile applications. Inf. Softw. Technol. 105, 43–55.
Peters, R., Zaidman, A., 2012. Evaluating the lifespan of code smells using software repository mining. In: Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on. IEEE, pp. 411–416.
Reimann, J., Brylski, M., Aßmann, U., 2014. A tool-supported quality smell catalogue for android developers. Softw. Trends 34 (2), URL: http://dblp.uni-trier.de/db/journals/stt/stt34.html#ReimannBA14.
Romano, J., Kromrey, J.D., Coraggio, J., Skowronek, J., 2006. Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen's d for evaluating group differences on the NSSE and other surveys. In: Annual Meeting of the Florida Association of Institutional Research. pp. 1–33.
Schmidt, C., 2004. The analysis of semi-structured interviews. A Compan. Qual. Res. 253–258.
Sheskin, D.J., 2003. Handbook of parametric and nonparametric statistical procedures. CRC Press.
Tom, E., Aurum, A., Vidgen, R., 2013. An exploration of technical debt. J. Syst. Softw. 86 (6), 1498–1516.
Tufano, M., Palomba, F., Bavota, G., Di Penta, M., Oliveto, R., De Lucia, A., Poshyvanyk, D., 2016. An empirical investigation into the nature of test smells. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 4–15.
Tufano, M., Palomba, F., Oliveto, R., Penta, M.D., Lucia, A.D., Poshyvanyk, D., 2017. When and why your code starts to smell bad (and whether the smells go away). IEEE Trans. Softw. Eng. PP, http://dx.doi.org/10.1109/TSE.2017.2653105.