Effectiveness of Kotlin vs. Java in Android App Development Tasks
Repository ISTITUZIONALE
Original
Effectiveness of Kotlin vs. Java in Android App Development Tasks / Ardito, Luca; Coppola, Riccardo; Malnati, Giovanni;
Torchiano, Marco. - In: INFORMATION AND SOFTWARE TECHNOLOGY. - ISSN 0950-5849. - ELETTRONICO. -
127:(2020). [10.1016/j.infsof.2020.106374]
Availability:
This version is available at: 11583/2837799 since: 2020-10-22T16:56:14Z
Publisher:
Elsevier
Published
DOI:10.1016/j.infsof.2020.106374
Terms of use:
This article is made available under terms and conditions as specified in the corresponding bibliographic description in
the repository
Publisher copyright
Elsevier postprint/Author's Accepted Manuscript
© 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
http://creativecommons.org/licenses/by-nc-nd/4.0/. The final authenticated version is available online at:
http://dx.doi.org/10.1016/j.infsof.2020.106374
23 December 2023
Effectiveness of Kotlin vs. Java in Android App
Development Tasks
Luca Ardito, Riccardo Coppola, Giovanni Malnati, Marco Torchiano
Politecnico di Torino
Department of Control and Computer Engineering
Turin, Italy
Abstract
Context: Kotlin is a new programming language representing an alternative to Java; they both target the same JVM and can safely coexist in the same application. Kotlin is advertised as capable of solving several known limitations of Java. Recent surveys show that Kotlin has achieved considerable diffusion among Java developers.
Goal: We planned to empirically assess a few typical promises of Kotlin with respect to known limitations of Java, in terms of development effectiveness, maintainability, and ease of development.
Method: Our experiment involved 27 teams of 4 people each that completed a set of maintenance tasks (both defect correction and feature addition) on Android apps written in either Java or Kotlin. In addition to the number of fixed defects, effort, and code size, we collected, through a questionnaire, the participants’ perceptions about the avoidance of known pitfalls.
Results: We did not observe any significant difference in terms of maintainability between the two languages. We found a significant difference regarding the amount of code written, which constitutes evidence of better conciseness of Kotlin. Concerning ease of development, the frequency of NullPointerExceptions reported by the subjects was significantly lower when developing in Kotlin. On the other hand, no significant difference was found in the occurrence of other common Java pitfalls. Finally, the IDE support was deemed better for Java than Kotlin.
Conclusions: Some of the promises of Kotlin to be a "better Java" have been confirmed by our empirical assessment. Evidence suggests that the
1. Introduction
Kotlin is a modern programming language that first appeared in 2011 and represents an alternative to Java, with which it can seamlessly coexist. There is ample evidence in the literature that Kotlin is gaining traction among Android software developers. In a previous study, we mined all Android apps hosted on the F-Droid platform and updated after October 2017: we found that nearly one-fifth of them featured Kotlin code, with 2/3 of those projects featuring more Kotlin than Java code [1]. Similar trends have been reported by Oliveira et al. regarding the number of StackOverflow questions about Kotlin programming for Android and GitHub repositories with Kotlin [2].
One of the main design guidelines that led to the development of the Kotlin language is a better handling of null values. In the Java language, without the usage of specific checks, the handling of null values can lead to NullPointerExceptions (NPEs). Several studies in the literature report the prominent role of NullPointerExceptions among the reasons for Android applications to crash. Coelho et al. report that nearly 30% of all stack traces collected upon Android app crashes contained NPEs as their root causes [3]. The authors also underline the difficulty of protecting the code against those exceptions, especially when the app does not have access to third-party source code. NPEs can also happen, as Payet and Spoto report, in the link between the XML layouts and explicit application code casts [4]. Such a link is obtained using the very commonly used setContentView and findViewById methods. These method calls are crucial and frequent operations, executed every time the components of the application screen are instantiated. The effects of those issues are amplified by misuses of the exception handling mechanisms provided by Java, which are frequently documented among Android developers [5].
Readability and conciseness are considered key features of the Kotlin language, especially concerning the declaration of objects and classes with numerous attributes [6].
The novelty of the Kotlin language, and the ease of adapting existing (and possibly long-running) Java projects to it, suggest the need for an evaluation of the benefits that such a transition brings to developers. Many advantages are reported in the specialized literature, but to the best of our knowledge, their empirical assessment is still missing. With this work, we aimed at assessing some assumed advantages of Kotlin with respect to Java in the context of Android development and maintenance. To do so, we conducted a controlled study with undergraduate students, a sample that can represent average Kotlin developers due to the limited experience that developers have, as of today, with the language.
In light of an ever-increasing adoption of Kotlin for Android development, this empirical assessment aims to provide practical evidence that could help in a transition from Java to Kotlin. In particular, we focused on possible effects on maintainability, conciseness, and avoidance of a few common pitfalls.
The remainder of the paper is organized as follows: Section 2 provides some background on Kotlin programming, its characteristics, and the recent trends of its diffusion, together with a brief review of related work in the literature; Section 3 describes the goal, procedure, participants and material of the experiment; Section 4 discusses the threats to the validity of this study; Section 5 reports the results of the experiment, which are discussed in Section 6; finally, Section 7 concludes the paper.
2. Background
Kotlin first appeared in 2011, but its first stable release was distributed in February 2016. In May 2017, Kotlin became a first-class language on Android, and support has been provided by the Android Studio IDE since release 3.0 of October 2017. The popularity of Kotlin has increased rapidly since then. The State of Developer Ecosystem in 2018 shows that Kotlin is mainly used for mobile and server applications, working mainly with Oreo and Nougat on Android and with JDK 8 on servers. According to statistics provided by JetBrains, only around 40% of Kotlin developers have used the language for more than one year¹.
¹ https://www.jetbrains.com/research/devecosystem-2019/ Last visited January 2020
Kotlin is a statically typed programming language that runs on the Java Virtual Machine (JVM) and fully interoperates with Java: it is possible to mix Kotlin and Java code in the same application, to call Kotlin code from Java code and vice versa [7]. The two languages share several commonalities [2], and the official documentation of Kotlin itself reports its main characteristics by means of comparisons with Java.
Kotlin takes a pragmatic approach: for instance, it does not re-implement the entire Java collections framework, which keeps it compatible with the JDK collection interfaces without breaking any existing project implementations. As a further example, Kotlin still supports Java 6 bytecode because almost half of the Android devices still run on it. It is possible to start using Kotlin for small parts of a large project, including a few UI components and simple business logic. The possible coexistence between Kotlin and Java can be deemed one of the main factors fueling the transition to Kotlin for Android developers.
As a first example of features that are not supported by Java, Kotlin allows functions, in addition to classes, to be first-level constructs. In Kotlin, everything is an object, even numeric values that in Java are treated as primitive types. Kotlin provides the ability to extend a class with new features without having to inherit from the class or use any design pattern such as Decorator [8], through special declarations called Extension Functions and Extension Properties.
On the other hand, Kotlin does not feature some characteristics of the Java language, like checked exceptions, static members, non-private fields, and the ternary operator.
A complete description of the features of Kotlin is beyond the scope of this paper². The primary objective of our work has been instead to verify some of the peculiarities of Kotlin, mostly regarding the avoidance of common Java development pitfalls [9]:
² A large set of open resources about the Kotlin language is available online at https://kotlinlang.org/docs/reference/
alternative to returning null, but not as a general-purpose solution to the nullability problem [11]. Kotlin provides a way to declare nullable variables explicitly (?) and a safe-call operator (?.) that can be used in conjunction with the elvis operator (?:) to avoid most NPEs.
Figure 1 reports side-by-side examples of equivalent Kotlin and Java code. We can observe how Kotlin allows declaring a nullable variable (by default, variables are non-nullable) and using the safe-call and elvis operators to achieve safer and more compact code.
    Kotlin:
        var bob: Person? = null
        // ...
        return bob?.department?.name    // safe call
    Java:
        Person bob = null;
        // ...
        if (bob != null)
            if (bob.department != null)
                return bob.department.name;
        return null;

    Kotlin:
        var bob: Person? = null
        // ...
        return bob?.name ?: "<?>"       // ?: elvis
    Java:
        Person bob = null;
        // ...
        return bob != null ? bob.name : "<?>";
• Mandatory Casts: Java often requires several explicit casts to let the compiler cope with type conversions, which makes code longer and harder to read; in addition, a wrong cast could be accepted by the compiler and result in a run-time exception. Kotlin introduced smart casts and a safe (nullable) cast operator (as?). Figure 2 reports an example of a safe cast, in Kotlin and Java. Safe casts are capable of eliminating the possibility of triggering a ClassCastException at run-time. As is evident from the comparison, the safe cast in Java requires a more verbose syntax (which we reported with the usage of the ternary operator) than that needed by Kotlin. Such higher verbosity can deter developers from extensively using the practice of safe casting, hence increasing the likelihood of generating ClassCastExceptions.
    Kotlin:
        val p: Person? = x as? Person
    Java:
        Person p = x instanceof Person ? (Person) x : null;
• Long argument lists: the invocation of Java methods uses a strict positional argument mapping. Therefore, methods may require passing many arguments even if they assume default or null values; writing overloaded methods might help in such cases, but it may require significant effort without covering all cases. Kotlin addresses this issue by defining default values for arguments and allowing, in addition to positional arguments, passing arguments by name. Other recent languages have adopted similar solutions, e.g., default values for arguments are provided by Python.
• Data Classes: often, a program requires the creation of classes whose primary purpose is to hold data. The amount of boilerplate code required by Java to implement these classes can be substantial. The additional code can often be mechanically derived from the data: such automatic derivation is done by libraries that are not part of the standard Java library, e.g., Project Lombok³. Kotlin introduced Data Classes, which the compiler processes to generate all the required boilerplate code automatically. In our prior investigations about Kotlin, we found that the LoC savings for a data class with few fields can be up to 90% w.r.t. the Java equivalent. An example of a Kotlin class and its Java equivalent is reported in Figure 3.
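To make the Long argument lists item above more concrete, the following Java sketch shows how overloads are commonly used to emulate default values under strict positional argument mapping. The Notifier class, its describe method, and its parameters are hypothetical names introduced only for illustration; they are not taken from the experimental apps.

```java
/** Hypothetical example: emulating default argument values with overloads in Java. */
final class Notifier {

    /** Full signature: every argument must be supplied, in this exact order. */
    static String describe(String title, String body, int priority, boolean silent) {
        return title + "|" + body + "|" + priority + "|" + silent;
    }

    /** Overload that fills in assumed defaults for priority (0) and silent (false). */
    static String describe(String title, String body) {
        return describe(title, body, 0, false);
    }

    /** Overload that additionally defaults the body to an empty string. */
    static String describe(String title) {
        return describe(title, "", 0, false);
    }

    public static void main(String[] args) {
        System.out.println(describe("Backup finished"));
        System.out.println(describe("Backup finished", "3 files copied"));
        // Changing only 'silent' still requires spelling out body and priority.
        System.out.println(describe("Backup finished", "", 0, true));
    }
}
```

Even with these overloads, a caller that wants to change only the last argument must pass all the preceding ones positionally, unless yet another overload is written; this is the effort that default values and named arguments are meant to remove.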
The main contribution of our work is a comparison between Java and Kotlin in the context of Android Mobile Applications, and specifically when performing maintenance tasks on apps written in either language. We perform this comparison with undergraduate students attending the course of Mobile Application Development; the experimental setup is inspired by the work done by Kosar et al. [12].
³ https://projectlombok.org Last visited March 2019
    Kotlin:
        data class User(
            val name: String,
            val age: Int)
    Java:
        class User {
            private String name;
            private int age;
            public User(String name, int age) {
                this.name = name; this.age = age;
            }
            public String getName() { return name; }
            public int getAge() { return age; }
            public String toString() {
                return "User(name=" + name + ",age=" + age + ")";
            }
            public boolean equals(Object other) { ... }
            public int hashCode() { ... }
        }
icated to the comparison between Kotlin and Java. This section reports a summary of relevant work in these fields.
Their findings support the hypothesis that Kotlin presents fewer code smells than Java. With this paper, however, we did not focus on code smells but on maintenance aspects of code development.
Banerjee et al. [20] performed comparisons between the usage of Java and Kotlin for developing Android applications. They conclude that the usage of Kotlin makes the development of Android applications easier while reducing the number of errors and bugs in the code. The principal limitation of the work by Banerjee et al. lies in the fact that their assumptions are based only on coding tasks executed by the authors (thus, significant researcher biases can be introduced), and no empirical evidence is provided to support them. The results of the present manuscript are in line with those authors’ findings but, to the best of our knowledge, we provide the first empirical assessment of the claimed advantages of the Kotlin language when compared to Java.
Table 1: GQM Template for the study
Object of Study : usage of Java and Kotlin programming languages
Purpose         : comparing
Focus           : effectiveness in avoiding common pitfalls
Context         : maintenance and development tasks performed on Android applications by students
Stakeholders    : developers, researchers
gree, attended by 108 students. During the course, the students usually attend practical labs where they are required to work together in groups to develop code for a course running project. The experiment took place during two such labs and involved working on both a small application and the course running project.
This section follows the reporting guidelines proposed by Jedlitschka et al. [25] and the APA Manual [26] to organize the discussion of the experimental design. More specifically, the following subsections provide details about the high-level goal of the experiment, the participants that were involved, the overall experimental design, and the individual research questions that we formulated. For each research question, we report the materials, the procedure, and the metrics that were used to answer them.
picking student IDs randomly, in order to avoid any bias in the composition of the groups that could be introduced by allowing the students to compose the groups as they desired. All the groups were formed by four students enrolled in the course and attending the Computer Engineering MSc degree at Politecnico di Torino.
The sample of the experiment is clearly a convenience sample that might be representative of small teams of novice developers.
Following recommended good practices [28], the subjects were rewarded with points for participating in the experiment. Based on the correctness of their answers, each subject earned up to a 10% bonus on their assignment grade for the course.
Table 2: Questionnaire Structure
Group | N | Question | Type | Options
Context | 1 | What is your group ID? | String | -
Context | 2 | How many people are in your group | Numerical | -
Context | 3 | How many of you have worked as professional java developers? | Numerical | -
Context | 4 | How many of you have worked as professional developers in other languages? | Numerical | -
Context | 5 | On average what is your experience in Java programming? | Ordinal | (i) Less than one year; (ii) Between one year and three years; (iii) More than three years
Context | 6 | In this lab what language has been assigned to your group? | Categorical | Java / Kotlin
Context | 7 | Have any of you developed programs using Kotlin? | Categorical | Yes / No
Context | 8 | What is the Java knowledge of the most experienced member in your group? | Ordinal | (i) Novice: up to 20 classes projects; (ii) Intermediate: 20 to 50 classes projects; (iii) Advanced: 50+ classes projects
i | 1 | How easy was to understand the overall structure of the code | Likert | Very Easy - Very difficult
i | 2 | What is the purpose of class RecordingItem? | Categorical | (i) Manage audio registration; (ii) Send registration to server; (iii) Store registration in database; (iv) Wrap registration data; (v) Notify the OS when the registration data changes
i | 3 | How many defects did you find in the App? | Numerical | -
i | 4 | How many defects were you able to fix? | Numerical | -
i | 5 | How long did it take to fix the defect(s)? (In minutes) | Numerical | -
i | 6 | Which classes contained defects? | Categorical | (i) Main Activity; (ii) FileViewerAdapter; (iii) DBHelper; (iv) RecordingItem; (v) RecordingService
ii | 1 | Did the IDE (e.g., autocomplete) help in writing code? | Likert | Very Little - Very Much
ii | 2 | Did you often experienced NullPointerExceptions? | Likert | Never - Very Frequently
ii | 3 | Did you often encounter problems with methods having long argument lists? | Likert | Never - Very Frequently
ii | 4 | How much the effort required to write classes containing mainly data compare to the added value? | Ordinal | Much Higher - Much Lower
ii | 5 | How much the effort required to write code to handle class casts compare to the added value? | Ordinal | Much Higher - Much Lower
[Figure: experimental procedure for Type 1 and Type 2 groups. Diagram labels: Researchers, App use cases, startup deliverables, Reception of groups, Corrective Maintenance [BookSearch], Corrective Maintenance [SoundRec], Development [Chat], Development [SoundRec], Questionnaire, Project Delivery.]
RQ1: Does the use of Kotlin vs. Java affect the maintainability of Android projects?
Table 3: Summary of the research questions
RQ | Materials | Experimental Tasks | Analyzed Variables
RQ1: Maintainability | BookSearch; SoundRecorder; Questionnaire (sec. 1) | Corrective maintenance | Understanding level; Defect location accuracy; Fix effort
⁴ Available at: https://github.com/shrikant0013/android-booksearch
⁵ Available at: https://openlibrary.org/dev/docs/api/search
⁶ Available at: https://github.com/dkim0419/SoundRecorder
While the IntelliJ IDEA IDE is capable of automatically translating from Java to Kotlin, we opted for a manual translation performed by one of the authors, who has more than ten years of experience in Android development, in order to have the code as close as possible to an application natively developed in Kotlin. For each app, we identified the main use cases, which were reproduced on the Kotlin versions by another author of the app. The execution of all the use cases allowed us to validate the correctness of the translated apps, and to verify that they provide the same functionalities as their Java counterparts⁷.
The first section of the questionnaire administered to the participating groups concerned the maintainability concepts measured to answer RQ1.
⁷ We report as a digital appendix the use case narratives of the applications: https://figshare.com/articles/Effectiveness_of_Kotlin_vs_Java_in_Android_App_Development_Tasks_-_Use_Cases_of_experimental_subjects/11808018
Hu0 There is no difference between the understanding level achieved when using Java or Kotlin.
Hl0 There is no difference between the capability of locating a defect when using Java or Kotlin.
Ht0 There is no difference between the reported time required to correct a defect when using Java or Kotlin.
The variables considered in our analysis correspond to the answers collected through the questions in group 1 of the questionnaire.
In addition, we defined three derived measures, which were automatically computed based on the answers to the questionnaire:
Purpose understanding is defined starting from item ii.2. The item asks for the purpose of a specific class in the application. One of the five options is correct; the others are wrong. Purpose understanding is a dichotomous variable whose levels can be either correct or wrong. More specifically, the RecordingItem class is used in the SoundRecorder app to manage data about recordings; hence the Purpose understanding measure was correct for all the experimental subjects that selected the fourth answer to question ii.2.
Location accuracy is defined starting from item ii.6. The item asks the respondents to identify the classes where the defects are located. Two out of five classes are expected as a correct answer. We adopt an information retrieval approach and compute the accuracy of the answer.
In particular, Defect Location Accuracy (LA) is a ratio measure defined as:
$$LA = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP are the true positives, TN are the true negatives, FP are the false positives, and FN are the false negatives.
More specifically, the two defects of the SoundRecorder app were injected in the RecordingItem and FileViewerAdapter classes. Hence, the maximum score for the Defect Location Accuracy was obtained if and only if the respondents checked these two classes only in question ii.6.
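The Location accuracy measure defined above can be sketched in Java as follows. This is only an illustrative computation under the stated definition: the LocationAccuracy class and its compute method are hypothetical names, and the class labels are taken from the questionnaire options rather than from the authors' analysis scripts.

```java
import java.util.Set;

/** Illustrative sketch of the Defect Location Accuracy (LA) measure. */
final class LocationAccuracy {

    /**
     * LA = (TP + TN) / (TP + TN + FP + FN), evaluated over the candidate
     * classes offered by the questionnaire item.
     */
    static double compute(Set<String> candidates, Set<String> defective, Set<String> checked) {
        int tp = 0, tn = 0, fp = 0, fn = 0;
        for (String c : candidates) {
            boolean isDefective = defective.contains(c);
            boolean isChecked = checked.contains(c);
            if (isDefective && isChecked) tp++;        // defect correctly located
            else if (!isDefective && !isChecked) tn++; // clean class correctly left out
            else if (!isDefective) fp++;               // clean class wrongly checked
            else fn++;                                 // defective class missed
        }
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    public static void main(String[] args) {
        Set<String> candidates = Set.of("MainActivity", "FileViewerAdapter",
                "DBHelper", "RecordingItem", "RecordingService");
        Set<String> defective = Set.of("RecordingItem", "FileViewerAdapter");

        // A group that checks exactly the two defective classes.
        System.out.println(compute(candidates, defective, defective));
        // A group that checks only RecordingService (one FP, two FN, two TN).
        System.out.println(compute(candidates, defective, Set.of("RecordingService")));
    }
}
```

With five candidate classes, a respondent who checks exactly the two defective classes obtains LA = 1, while checking only RecordingService yields 2 correct decisions out of 5, i.e., LA = 0.4.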
Fix effort is defined starting from item ii.5, which in the questionnaire collects the time employed by each group to fix the defects, and item ii.6, which reports the defects supposedly identified by the groups. Fix Effort is defined as the ratio between the time estimated by the group for fixing the defects and the number of answers checked for question ii.6; hence, it serves as a self-estimate of the average effort (in minutes) to fix one defect.
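As a small worked illustration of the Fix effort measure, the Java sketch below assumes that the ratio is taken as reported minutes divided by the number of classes checked, which is the reading consistent with "average effort (in minutes) to fix one defect". The FixEffort class, the method name, and the handling of groups that checked no class are assumptions made for this example.

```java
import java.util.OptionalDouble;

/** Illustrative sketch of the Fix effort measure (minutes per supposedly located defect). */
final class FixEffort {

    /**
     * Returns reportedMinutes / checkedClasses, or an empty value when no class
     * was checked (assumption: such answers are treated as missing).
     */
    static OptionalDouble minutesPerDefect(double reportedMinutes, int checkedClasses) {
        if (checkedClasses <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(reportedMinutes / checkedClasses);
    }

    public static void main(String[] args) {
        // Hypothetical group: 30 minutes reported, two classes checked.
        System.out.println(minutesPerDefect(30, 2));
        // Hypothetical group that checked no class: no value can be derived.
        System.out.println(minutesPerDefect(30, 0));
    }
}
```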
• Time to fix a defect: we analyze the variable Fix effort to assess Ht0 by using the Mann-Whitney test.
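For readers unfamiliar with the Mann-Whitney test, the following Java sketch computes the U statistic on which the test is based, using midranks for tied values. It is a minimal illustration, not the analysis code used in the study: the MannWhitney class and its methods are hypothetical names, the sample values are made up, and the derivation of the p-value from U is omitted, since it would normally be delegated to a statistics package.

```java
import java.util.Arrays;

/** Illustrative computation of the Mann-Whitney U statistic for two independent samples. */
final class MannWhitney {

    /**
     * Returns U for sample a: U_a = R_a - n_a (n_a + 1) / 2, where R_a is the sum
     * of the ranks of a's values in the pooled sample (midranks for ties).
     */
    static double uStatistic(double[] a, double[] b) {
        double[] pooled = new double[a.length + b.length];
        System.arraycopy(a, 0, pooled, 0, a.length);
        System.arraycopy(b, 0, pooled, a.length, b.length);
        Arrays.sort(pooled);

        double rankSumA = 0.0;
        for (double v : a) {
            rankSumA += midRank(pooled, v);
        }
        return rankSumA - a.length * (a.length + 1) / 2.0;
    }

    /** Average 1-based rank of value v in the sorted pooled sample. */
    static double midRank(double[] sorted, double v) {
        int lo = 0;
        while (lo < sorted.length && sorted[lo] < v) lo++;   // values strictly smaller than v
        int hi = lo;
        while (hi < sorted.length && sorted[hi] == v) hi++;  // values tied with v
        // Tied values occupy ranks lo+1 .. hi; their average is (lo + 1 + hi) / 2.
        return (lo + 1 + hi) / 2.0;
    }

    public static void main(String[] args) {
        // Made-up fix-effort values (minutes per defect), for illustration only.
        double[] javaGroups = {10, 15, 20};
        double[] kotlinGroups = {12, 18, 25};
        System.out.println("U (Java)   = " + uStatistic(javaGroups, kotlinGroups));
        System.out.println("U (Kotlin) = " + uStatistic(kotlinGroups, javaGroups));
    }
}
```

The two U values always sum to the product of the sample sizes; the test then assesses how extreme the smaller one is under the null hypothesis that the two groups come from the same distribution.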
RQ2: Does the use of Kotlin vs. Java make the code more concise?
We consider conciseness at the macroscopic level, which means less code, both in terms of the number of classes and LoCs.
3.5.1. Materials
The students worked on a larger running project, which is developed throughout the whole course. The running project consists of an app to help people share books. The app had to allow users to sign up easily and set up a basic profile; users can then make books available for sharing, providing all the relevant pieces of information by accessing some shared database. The users can search for shared books and get in contact, via the app, with the book owner in order to arrange the withdrawal and subsequent return; as a consequence of each sharing, users’ reputation must be updated.
The average final size of the projects was around 30 to 40 classes per project, with an average total of 6 KLOC, including both Java and Kotlin.
The second section of the questionnaire administered to the participating groups concerned the conciseness concepts measured to answer RQ2.
Hc0 There is no difference between the measured amount of classes written to implement a new feature when using Java or Kotlin.
Hl0 There is no difference between the measured lines of code written to implement a feature when using Java or Kotlin.
3.6.2. Analysis method
Regarding code conciseness, we focused on two measures collected through static analysis of the submitted experimental assignments:
Both of the above measures were used to assess Hc0 by applying a non-parametric Mann-Whitney test.
RQ3: Does the use of Kotlin vs. Java effectively avoid the occurrence of common pitfalls?
3.7.2. Hypotheses and variables
The following null hypotheses were formulated to answer RQ3:
Hp10 There is no perceived difference in terms of the number of NPE occurrences with Java or Kotlin.
Hp20 There is no perceived difference in terms of the number of casts with Java or Kotlin.
Hp30 There is no perceived difference in terms of issues with long argument lists with Java or Kotlin.
et al. [29] [30], and NPEs and layout issues are among the most popular categories of bugs.
A final threat to the generalizability of results may be linked to how well undergraduate students may be considered representative of Android developers in general. The use of students as participants in experiments is, however, widely recognized: Sjoeberg et al. report that 50% of the 2,969 experiments in 12 leading software engineering journals and conferences between 1993 and 2002 used undergraduate students as participants [28]. Carver et al. define a model for conducting a valid empirical study with students (ESWS). They identify research and pedagogical requirements that need to be managed while preparing and executing an experiment in a university course. In short, researchers have to make sure that the study is well-integrated with the course goals and materials, give realistic time estimates for experimental tasks, properly motivate the participants without revealing the goals, measures and analysis prior to the study, allow students to give feedback, convince the participants of the relevance of what they are learning, avoid conflicts with students’ other commitments, and give students feedback on the results of the experiment [32]. All these guidelines were followed in conducting the work documented in this paper. In addition, Carver et al. [32] provide a checklist to explain when the various activities should occur (i.e., before starting the study, as soon as the study begins, during the study, or after the study is completed). The requirements and checklist provide a useful guide for judging how well a study is integrated into the university course and for judging the reliability of the results. We used that checklist to verify the research and pedagogical goals in this study.
Construct validity threats concern the relationship between theory and observation. It is not assured that the Purpose Understanding, Location Accuracy, and Fix Effort metrics defined in this paper are the best possible proxies for providing answers to the Research Questions identified for this study. We measure conciseness in terms of code size, though shorter code could, at least in principle, bear a higher cognitive load, thus reversing the benefits stemming from more concise code. Considering the specific features introduced in Kotlin, we do not believe this is the case, though there is no empirical evidence supporting such belief. As explained in Section 3.7.1, we decided to measure the occurrence of pitfalls by means of proxies. In practice, we inferred the actual occurrence on the basis of the reported pitfall manifestation, recorded through the questionnaire. While this choice was dictated by practical feasibility reasons, we have no reason to believe any misreporting took place. Moreover, we argue that the pitfalls do not represent a problem per se but rather inasmuch as they affect the development activities of the developer; thus, in this specific case, the reported experience of such pitfalls is probably closer to the original construct than a mechanical count of pitfall frequency.
[Figure 5: Groups with members having professional experience with Java or other languages. Panels: Java, Kotlin; axes: Java professionals in group, Other language professionals in group.]
As far as IDE support is concerned, we note that, with participants familiar with Java but not at all with Kotlin, the perceived support could be more related to familiarity than to actual IDE support.
Finally, researcher bias is another possible threat to the validity of this study, since the authors were involved in the creation of the starting Kotlin versions of the two considered apps and in the bug injection phase. However, the authors have no reason to favor any language; neither are they inclined to demonstrate any specific result.
5. Results
In this section, we report the results measured for the three Research Questions of the experiment. We also provide details about the population that participated in the experiment.
[Figure 6: highest Java skill in group (Novice, Intermediate, Advanced) versus average group experience in Java development (< 1 year, 1 to 3 years, > 3 years). Panels: Java, Kotlin.]
experiment involved 108 students aged between 25 years old and 40 years old of different gender and ethnicity.
The professional experience of group members was measured through the answers to questions 3 and 4 of the questionnaire. Figure 5 summarizes the answers to those questions for both types of groups. Three groups included participants that had professional Java development experience, and overall, 6 out of 27 groups included members with experience as professional software developers in any language. No participant had any previous experience in Kotlin.
The skill level of the population was measured through the answers to questions 5 and 8 of the Context section of the questionnaire. Figure 6 summarizes the answers to those questions for both types of groups. The experience with Java development was mostly between one and three years, although three groups had no member with more than one year of experience in Java, and four groups included a member with more than three years of experience.
In the whole population, four groups had members having advanced Java skills (i.e., they developed at least one project of more than 50 classes); eight groups had a most experienced member who considered him/herself a novice (i.e., they developed a few projects featuring up to 20 classes); in the remaining fifteen groups the most experienced member considered him/herself an intermediate (i.e., at least one medium-sized project of 20 to 50 classes). Regarding the years of experience with Java, four groups had an average experience of more than three years, and three groups had an average experience of less than one year; the remaining groups had an average experience
Table 5: Null hypotheses for RQ1

Name  Description                                                p-value  Decision
Hu0   There is no difference between the understanding level     0.68     Don’t Reject
      achieved when using Java or Kotlin
Hl0   There is no difference between the capability of locating  0.81     Don’t Reject
      a defect when using Java or Kotlin
Ht0   There is no difference between the reported time required  0.43     Don’t Reject
      to correct a defect when using Java or Kotlin
[Figure: correct and wrong answers to the Purpose question, by language.
Java: 9 correct (69%), 4 wrong (31%). Kotlin: 11 correct (79%), 3 wrong (21%).
Horizontal axis: Frequency.]
Class             Java  Kotlin
RecordingService  77%   64%
RecordingItem     8%    0%
DBHelper          8%    0%

Figure 8: Frequency of the answers to the defect location question selected by
the respondents
[Figure: average time to fix a defect, by language (Kotlin, Java), in two
panels; horizontal axis from 0 to 60.]
The average time to fix defects is similar between Kotlin and Java, with
no statistically significant difference.
Table 6: Null hypotheses for RQ2

Name  Description                                                p-value  Decision
Hc0   There is no difference between the measured number of      0.2138   Don’t Reject
      classes written to implement a new feature when using
      Java or Kotlin
Hl0   There is no difference between the measured lines of code  0.003    Reject
      written to implement a feature when using Java or Kotlin
measured by a script that leveraged the open-source cloc tool (see footnote 8).
Blank and comment lines were not included in the computation.
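As an illustration of what excluding blank and comment lines means, the
following Kotlin sketch counts code lines in a source string. It is a
deliberate simplification and does not reproduce how cloc works internally:
the function name countCodeLines is hypothetical, only `//` line comments and
`/* ... */` block comments are recognized, and a line is treated as a comment
only when it starts with a comment marker or lies inside a block comment.

```kotlin
// Simplified line counter (an assumption-laden sketch, not cloc's algorithm).
// Skips blank lines, lines starting with `//`, and lines that start with or
// lie inside a /* ... */ block. Code sharing a line with a block comment
// delimiter is not counted.
fun countCodeLines(source: String): Int {
    var inBlockComment = false
    var count = 0
    for (raw in source.lines()) {
        val line = raw.trim()
        if (inBlockComment) {
            // The block comment ends on the line that contains "*/".
            if (line.contains("*/")) inBlockComment = false
            continue
        }
        when {
            line.isEmpty() -> {}
            line.startsWith("//") -> {}
            line.startsWith("/*") -> {
                if (!line.contains("*/")) inBlockComment = true
            }
            else -> count++
        }
    }
    return count
}

fun main() {
    val sample = """
        // header comment

        /* block
           comment */
        fun add(a: Int, b: Int): Int {
            return a + b // a trailing comment does not hide the code
        }
    """.trimIndent()
    println(countCodeLines(sample)) // prints 3
}
```

A per-group LOC figure would then be the sum of such counts over the files
added for the feature.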
From the table, it can be seen that the number of classes and the amount of
code developed varied widely between groups. Groups that worked with Kotlin
to implement the new features produced between 3 and 12 classes (246 to 1568
lines of code). Four Kotlin groups developed data classes. Groups that worked
with Java produced between 0 and 26 classes (583 to 4745 lines of code).
The distribution of the number of new classes is reported in Figure 11.
The groups using Kotlin developed fewer classes: the mean number of classes
developed with Kotlin is 7, while it is 12 for Java; the medians are 8 and 13,
respectively.
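The means and medians above can be re-derived from the per-group class counts
listed in Table 7. The Kotlin sketch below does so; the list and function
names are illustrative, and rounding the mean to the nearest integer is an
assumption about how the reported values were obtained.

```kotlin
import kotlin.math.roundToInt

// Added classes per group, copied from Table 7.
val kotlinClasses = listOf(7, 8, 12, 8, 5, 8, 3, 9, 9, 8, 5, 6, 3, 9)
val javaClasses = listOf(17, 4, 17, 22, 26, 17, 0, 19, 3, 13, 6, 9, 2)

// Median of a non-empty list: the middle value, or the mean of the two
// middle values when the list has an even number of elements.
fun median(values: List<Int>): Double {
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) {
        sorted[mid].toDouble()
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
}

fun main() {
    println(kotlinClasses.average().roundToInt()) // 7
    println(javaClasses.average().roundToInt())   // 12
    println(median(kotlinClasses))                // 8.0
    println(median(javaClasses))                  // 13.0
}
```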
The hypothesis Hc0 was tested using a Mann-Whitney test that returned a
p-value of 0.2138. Therefore, we cannot reject the null hypothesis.
The effect size can be considered small, as Cliff’s Delta is 0.29; the 95%
CI for the effect size is (-0.24; 0.68), which includes 0. Therefore, the
effect cannot be considered statistically significant.
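Cliff’s Delta is the share of (Java, Kotlin) group pairs in which the Java
value is larger, minus the share in which it is smaller. The sketch below
applies that definition to the Table 7 columns; the function name and the
choice of passing Java first (so that positive values mean "more in Java") are
assumptions made for illustration. It does not compute the confidence
interval or the Mann-Whitney p-value.

```kotlin
import java.util.Locale

// Per-group values copied from Table 7.
val kotlinClasses = listOf(7, 8, 12, 8, 5, 8, 3, 9, 9, 8, 5, 6, 3, 9)
val javaClasses = listOf(17, 4, 17, 22, 26, 17, 0, 19, 3, 13, 6, 9, 2)
val kotlinLocs = listOf(483, 337, 1568, 745, 515, 978, 322, 664, 972, 460, 380, 835, 246, 744)
val javaLocs = listOf(4099, 679, 1526, 2208, 4745, 2688, 583, 3810, 775, 742, 1755, 1424, 647)

// Cliff's Delta: (#pairs with x > y - #pairs with x < y) / (|xs| * |ys|).
// Tied pairs contribute to neither count.
fun cliffsDelta(xs: List<Int>, ys: List<Int>): Double {
    var greater = 0
    var less = 0
    for (x in xs) {
        for (y in ys) {
            if (x > y) greater++ else if (x < y) less++
        }
    }
    return (greater - less).toDouble() / (xs.size * ys.size)
}

fun main() {
    println("%.2f".format(Locale.ROOT, cliffsDelta(javaClasses, kotlinClasses))) // 0.29
    println("%.2f".format(Locale.ROOT, cliffsDelta(javaLocs, kotlinLocs)))       // 0.65
}
```

The first value matches the effect size reported for the number of classes;
the second matches the value reported for the lines of code.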
The same kind of analysis can be applied to the number of Lines-Of-Code
(LOCs) written in order to implement the new feature. The distribution of
the LOCs by language is reported in Figure 12. The median LOCs reported
for Java is 1526, while for Kotlin it is 589.5.
The hypothesis Hl0 was tested using a Mann-Whitney test that returned a
p-value of 0.003. Therefore, we can reject the null hypothesis.
The effect size can be considered large: Cliff’s Delta is 0.65, with the
relative 95% CI being (0.24; 0.86); since the CI does not include 0, the
effect can be considered statistically significant.
8 https://github.com/AlDanial/cloc
Table 7: Absolute and relative added classes and LOCs for the development of
the required features

Language  Group  Added classes (%)  Added LOCs (%)  Data classes
Kotlin    1      7 (17.1%)          483 (8.3%)      0
          3      8 (12.3%)          337 (5.2%)      0
          5      12 (24.0%)         1568 (18.9%)    4
          7      8 (12.7%)          745 (8.1%)      0
          9      5 (4.4%)           515 (4.0%)      0
          11     8 (13.6%)          978 (12.9%)     1
          13     3 (8.8%)           322 (3.2%)      1
          15     9 (20.4%)          664 (10.2%)     0
          17     9 (17.0%)          972 (13.0%)     0
          19     8 (7.1%)           460 (4.0%)      1
          21     5 (9.1%)           380 (5.0%)      0
          23     6 (12.8%)          835 (14.4%)     0
          25     3 (5.1%)           246 (4.3%)      0
          27     9 (14.1%)          744 (7.5%)      0
Java      2      17 (24.3%)         4099 (37.3%)    0
          4      4 (11.8%)          679 (15.6%)     0
          6      17 (37.8%)         1526 (31.6%)    0
          8      22 (31.4%)         2208 (24.9%)    0
          10     26 (32.9%)         4745 (45.0%)    0
          12     17 (30.0%)         2688 (32.0%)    0
          14     0 (0.0%)           583 (7.3%)      0
          16     19 (16.4%)         3810 (31.9%)    0
          18     3 (8.6%)           775 (18.1%)     0
          20     13 (17.6%)         742 (14.9%)     0
          22     6 (15.4%)          1755 (29.6%)    0
          24     9 (15.0%)          1424 (14.7%)    0
          26     2 (6.1%)           647 (15.4%)     0
[Figure: number of added classes, by language (Kotlin, Java); horizontal axis
from 0 to 20.]
[Figure: second panel, by language (Kotlin, Java).]
Table 8: Null hypotheses for RQ3

Name  Description                                                  Decision
Hp10  There is no perceived difference in terms of number of NPEs  Reject
      occurrences with Java or Kotlin
Hp20  There is no perceived difference in terms of the number of   Don’t Reject
      casts with Java or Kotlin
Hp30  There is no perceived difference in terms of issues with     Don’t Reject
      long argument lists with Java or Kotlin
Hp40  There is no perceived difference in terms of tool support    Don’t Reject
      to Java or Kotlin
Hp50  There is no perceived difference in terms of effort          Don’t Reject
      required to write data classes
Figure 13: Effect size with confidence interval of language for different aspects.
The distributions of the answers to the perception questions are reported
in Figure 13. For each aspect, the figure reports the Cliff’s Delta effect
size estimate as a diamond and the relative 95% CI as a whiskered segment.
The shades of the background represent the standard quantification ranges
for the practical magnitude of the effect size.
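The quantification ranges are not listed in this section. A commonly used set
of cut-offs for the absolute value of Cliff’s Delta is 0.147, 0.33 and 0.474,
and the Kotlin sketch below assumes those values; the enum, the function names
and the cut-offs themselves are illustrative assumptions. Under that
assumption, the effect sizes reported for RQ2 (0.29 and 0.65) fall in the
small and large bands, in line with the text, and an effect is read as
statistically significant when its confidence interval excludes zero.

```kotlin
import kotlin.math.abs

enum class Magnitude { NEGLIGIBLE, SMALL, MEDIUM, LARGE }

// Maps |delta| to a magnitude band. The cut-offs are commonly cited values
// and are an assumption here; the paper's exact ranges are not shown.
fun magnitudeOf(delta: Double): Magnitude {
    val d = abs(delta)
    return when {
        d < 0.147 -> Magnitude.NEGLIGIBLE
        d < 0.33 -> Magnitude.SMALL
        d < 0.474 -> Magnitude.MEDIUM
        else -> Magnitude.LARGE
    }
}

// True when a confidence interval (low, high) does not contain zero.
fun excludesZero(low: Double, high: Double): Boolean = low > 0.0 || high < 0.0

fun main() {
    println(magnitudeOf(0.29))         // SMALL
    println(excludesZero(-0.24, 0.68)) // false
    println(magnitudeOf(0.65))         // LARGE
    println(excludesZero(0.24, 0.86))  // true
}
```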
With reference to the 95% confidence interval, the following observations
can be drawn from the respondents’ perceptions:
• More NPEs and null-pointer-related issues are likely to happen when
coding with Java; the difference in the respondents’ answers is
statistically significant, and the effect size is of large magnitude;
• Writing casts in Kotlin is perceived as requiring less effort than in
Java; this difference is not statistically significant, though the effect
size magnitude is medium;
• The respondents considered the definition of long argument lists easier
with Kotlin than with Java; this difference is not statistically
significant, though the effect size magnitude is medium;
• Java is perceived as better supported by tooling than Kotlin; this
difference is not statistically significant, though the effect size
magnitude is medium;
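For readers less familiar with Kotlin, the language features behind the
aspects rated above (null safety, casts, long argument lists, data classes)
can be sketched as follows. The Track class and the two functions are
hypothetical and are not taken from the experiment apps; the snippet only
illustrates the constructs, not the tasks given to the participants.

```kotlin
// Data class: equals, hashCode, toString and copy are generated.
// Default values allow callers to omit arguments.
data class Track(
    val title: String,
    val durationSec: Int = 0,
    val artist: String? = null // nullable type: must be handled before use
)

// Null safety: the safe call (?.) and the Elvis operator (?:) replace an
// explicit null check, so a null artist cannot cause an exception here.
fun artistLabel(track: Track): String = track.artist?.uppercase() ?: "UNKNOWN"

// Smart cast: after an `is` check the value is used with the checked type,
// without writing an explicit cast.
fun describe(value: Any): String = when (value) {
    is String -> "text of length ${value.length}"
    is Int -> "number ${value + 1}"
    else -> "other"
}

fun main() {
    // Named arguments keep a long argument list readable.
    val track = Track(title = "Intro", artist = "abc")
    println(artistLabel(track))                   // ABC
    println(artistLabel(Track(title = "Outro")))  // UNKNOWN
    println(describe("hello"))                    // text of length 5
    println(track == track.copy())                // true
}
```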
6. Discussion
The overall goal of our comparative investigation of Java vs. Kotlin was to
assess the consequences of a possible transition from one programming
language to the other. The switch could occur at different levels: from a
single project to a unit, up to a whole company.
We know that, by design, Kotlin is fully compatible with Java at the
bytecode level; therefore, a smooth, progressive transition between the two
languages is technically possible. Our research questions focused on the
development part of the transition, which entails:
terms of the ability to detect defects, to fix them, and the effort required to
perform the change.
There is no evidence that the use of Kotlin, as a substitute for Java,
either enhances or lessens software maintainability (RQ1).
It is important to assess how this finding can be generalized. For this
purpose, we have to consider the background of the participants in our
experiment: they are students in a computer engineering master’s degree, and
only three teams in each language group reported some professional
experience. Overall, we may consider them close to a junior developer
profile. Moreover, we wish to stress that none of our participants had any
previous experience in Kotlin. Nonetheless, they were able to detect and fix
the defects seamlessly. This may represent evidence in favor of limited risks
deriving from a switch to Kotlin in a company, even with little previous
knowledge of the language.
From our understanding, it remains an open question whether the lack
of evidence in terms of maintainability was due to the confounding factors
represented by the characteristics of the participants and the small size of
the applications.
experiment was able to provide evidence limited to the Android environment
and specifically concerning two small applications.
An important aspect that deserves further investigation is the capability
of the modern constructs present in Kotlin (and, we argue, in many modern
programming languages) to actually enable more concise code in many
different settings.
Also related to the previous research question, we wonder whether the
conciseness stemming from more expressive constructs also translates into a
higher understandability of the code.
characteristic of the required new features, which did not require the use of
any data class (see Table 7).
Concerning the IDE support, we have to keep in mind that no student had
any experience with Kotlin development before the experiment. This bias
could have affected the participants’ perception. Therefore, we could have
measured the familiarity with the language rather than the actual support
provided by the IDE.
The limited evidence and partially counter-intuitive results concerning
the coding pitfalls deserve further investigation. Research should be aimed
at understanding whether and under which circumstances the adoption of
Kotlin allows avoiding the pitfalls.
7. Conclusion
Kotlin is a modern programming language that represents a relevant
alternative to Java in several development domains. In particular, it has
been adopted as an official development language for the Android OS. In this
work, we focused on the main promises of this new language. In particular, we
investigated how Kotlin can improve the maintainability of code, make code
more compact, and avoid common pitfalls. For this purpose, we carried out an
experiment in the context of a Mobile Application Development course in an
MSc degree. The experiment compared the Kotlin programming language
to its ancestor, Java.
With our experiment, we found that the usage of Kotlin apparently does
not affect maintainability with respect to Java when working on two
small applications. At the same time, we found evidence that the adoption
of Kotlin leads to more compact code when the subjects of the experiment
were asked to develop new features for an ongoing software project.
The adoption of Kotlin makes a few common Java annoyances less frequent,
thus making development safer. We registered evidence of a reduction in the
frequency of Null Pointer Exceptions. We also observed fewer issues with
long argument lists and reduced effort when dealing with casts, although no
definitive evidence could be found in this respect.
Those findings represent a first empirical assessment of the advantages
of Kotlin with respect to Java, as reported by many works in the related
literature. The findings showed that most of the promises of the Kotlin
language are reflected by the code produced and by the developers’
perception.
The study has a few limitations, mainly due to the academic setting: the
software artifacts were small and the developers were students with limited
experience; therefore, the number of bugs and tasks that were studied was
limited. The study may not be representative of bigger, real-world projects
that require many development tasks and may expose many typologies of
defects and issues. It is important to collect more evidence for different and
possibly larger applications, also outside the Android ecosystem.
As future work, we hence plan to investigate the advantages brought
by Kotlin in other domains, e.g., server-side development. Also, we aim at
finding whether other expected Kotlin benefits hold.
Author Biography
vironments. He is a co-author of seven patents. Since 1999, he has cooperated
with Istituto Superiore "Mario Boella", participating in a shared laboratory
for the development of mobile services and applications. He has supervised
the research activities of several graduate and PhD students at Politecnico di
Torino. He has been the advisor of four PhD students in Computer and
Control Engineering and more than 40 master’s students.
Appendix
7.1. Population details
Professional experience in Java and other languages in the two experimental
groups.
[Figure: number of other-language professionals in group (None, 1, 2) against
number of Java professionals in group (None, 1), in separate panels for the
Java and Kotlin groups.]
7.3. Detailed answers for perceptions
7.3.1. IDE support effectiveness

IDE support effectiveness  Java  Kotlin
Very much                  7     4
Much                       2     1
Enough                     3     5
Little                     0     3
Very little                1     1
[Figure fragment, answer counts by frequency category: Frequently 5;
Occasionally 6 3; Rarely 1 7; Never 3. Horizontal axis: Frequency.]
7.3.3. Frequency of Long arguments list issues

Long arg list occurrence  Java  Kotlin
Very Frequently           0     0
Frequently                0     0
Occasionally              5     4
Rarely                    5     5
Never                     3     5
[Figure fragment, answer counts (Java, Kotlin): Proportional 7, 7; Lower 2, 3;
Much lower 2, 2. Horizontal axis: Frequency.]
7.3.5. Effort to write casts

Cast writing effort  Java  Kotlin
Much higher          1     0
Higher               1     0
Proportional         8     8
Lower                1     2
Much lower           2     4