How Is Open Source Software Development Different in Popular IoT Projects

This paper investigates the differences in Open Source Software (OSS) development practices between popular Internet of Things (IoT) and non-IoT projects on GitHub. Through a quantitative analysis of 60 OSS repositories, the study reveals significant variations in developer contributions, specialization, and code reuse between the two domains. The findings highlight the unique challenges and characteristics of IoT software development, suggesting implications for future research in software engineering within the IoT context.


Received January 13, 2020, accepted February 3, 2020, date of publication February 7, 2020, date of current version February 17, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2972364

How Is Open Source Software Development Different in Popular IoT Projects?
FULVIO CORNO (Member, IEEE), LUIGI DE RUSSIS (Member, IEEE),
AND JUAN PABLO SÁENZ (Student Member, IEEE)
Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
Corresponding author: Luigi De Russis ([email protected])

ABSTRACT From the software point of view, the development of IoT applications differs from other kinds of
applications due to the specific features that the former exhibit. In this paper, we investigate how developers
contribute to IoT applications in the Open Source Software (OSS) context, to gain a deeper understanding
of how their work differs from that of non-IoT applications. To that end, we conducted a quantitative
analysis of a broad set of the 60 most popular publicly available IoT and non-IoT projects on GitHub.
By comparing how developers contribute to these projects, our analysis provides insight into the purpose and
characteristics of the code, the behavior of the contributors, and the maturity of the IoT software development
ecosystem. Results reveal significant differences between IoT and non-IoT application development, in terms
of how applications are realized, in the diversity of developers’ specializations, and in how code is reused.
This work provides evidence about some Open Source IoT software development peculiarities to be
considered by future research efforts aimed at better satisfying software engineering needs in the IoT
scenario.

INDEX TERMS Internet of Things, open source software, software mining, developers.

I. INTRODUCTION
Nowadays, the Internet of Things (IoT) is a well-established paradigm that has gained prominence in several aspects of our everyday lives [1]. Roughly speaking, it is based on embedding computing and communication capabilities into objects of common use [2]. This concept has given rise to the development of various kinds of solutions in several domains such as smart buildings, smart cities, environmental monitoring, healthcare, smart business, smart agriculture, and security and surveillance [3]–[5].

From a technical point of view, several definitions have been proposed for the Internet of Things [6] and various enabling technologies are considered to characterize IoT applications. According to Atzori et al. [7], these technologies may be categorized into identification, sensing and communication technologies; middleware components; end-user software applications; services composition; service management; and object abstraction. While identification, sensing and communication technologies mainly concern hardware components, the other enabling technologies rely on software to address diverse features that IoT applications expose [2].

From the software point of view, in addition, the implementation of IoT applications is particularly complex and differs from the development of mobile and web applications. According to Taivalsaari et al. [8], for instance, IoT development differs from mainstream mobile app and web application development in several ways, summarized by the authors into a set of dimensions that are unfamiliar to most software developers. Multi-device programming, the reactive nature of the application, the distributed nature of the software, and the need to write fault-tolerant software are among these dimensions, which IoT developers must consider.

Against this backdrop, the present work relies upon software mining to gain understanding, from a practical point of view, about how developing IoT applications is different from developing non-IoT applications in the Open Source Software (OSS) context. To this end, this paper reports the comparison and quantitative analysis between the behavior of developers in the most popular IoT and non-IoT OSS projects hosted on GitHub, a world-leading software development platform.

The associate editor coordinating the review of this manuscript and approving it for publication was Pietro Savazzi.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 28337
F. Corno et al.: How Is OSS Development Different in Popular IoT Projects?

In particular, we conducted an empirical study mining 60 OSS repositories publicly available on GitHub. We mined 30 IoT OSS and 30 non-IoT OSS projects to analyze a) the way developers contribute to their projects, b) the files that they tend to modify the most, and c) the specialization and the evolution of these modifications. Finally, we assessed the maturity of the IoT software development ecosystem based on a dependency analysis in the selected projects. Besides leveraging a characterization of IoT OSS projects currently available for IoT developers, this work aims at providing evidence from a practical point of view about the IoT software development peculiarities that should guide future research efforts to better understand and satisfy software engineering needs in the IoT context.

The remainder of the paper is structured as follows. Section II describes the research goal and questions and outlines the selection process. Section III characterizes the selected OSS projects and describes the quantitative analysis conducted over them as well as the outcome of the analysis. Section IV discusses the results and presents further implications, while threats to validity are outlined in Section V. Section VI presents the related work, while Section VII concludes the article.

II. RESEARCH GOAL AND QUESTIONS
The overall goal of this research is to explore the potential differences between the development practices for IoT and non-IoT projects in the OSS context. In particular, we are interested in identifying (a) the behavior of developers and the diversity of resources they manage, and (b) the reuse of features through the adopted dependencies. These two criteria lead us to the research questions set out below.

A. RESEARCH QUESTIONS
We want to investigate whether and how developers adopt different programming languages and cover various specializations in IoT vs. non-IoT OSS projects. In particular, we are interested in:
• how different programming languages are used in the two domains;
• whether IoT developers are more specialized in any programming languages or certain types of files in their project;
• how the usage of such programming languages evolves over time.
Therefore, our first research question is:

RQ1: How developers of IoT vs. non-IoT OSS applications contribute to their projects regarding the programming languages that they adopt?

Our quantitative investigation, furthermore, exploits OSS repositories by focusing on the maturity of the IoT ecosystem from a software development point of view. We investigate this aspect in the repositories we selected by analyzing project dependencies, how many they are, and which are the most popular ones. Additionally, focusing on the IoT OSS projects, we wanted to identify which aspects of IoT application development these dependencies address and how often they are used by IoT developers. This leads to our second research question:

RQ2: How developers exploit dependencies to reuse features in IoT vs. non-IoT OSS projects?

B. SELECTION OF THE ANALYZED REPOSITORIES
To select a prominent, widely known, and widely used set of IoT OSS repositories from GitHub, we first filtered them by topic, choosing the ones that belong to the iot or internet-of-things topics on GitHub. Topics are labels to classify a repository based on its intended purpose, subject area, community, or language. They appear on the main page of a repository and repository administrators can add as many topics as they want to a repository.

Once the repositories belonging to the IoT topic were filtered, 4,696 repositories were retrieved. Therefore, to prioritize the most popular and well-evaluated ones, we sorted them according to the decreasing number of stars. Stars enable GitHub users to keep track of repositories they find interesting and to discover similar repositories [9], as well as to show appreciation to the repository maintainers for their work.¹ Lastly, we took the 30 top-starred repositories, provided they were open source code repositories. In fact, since a large portion of repositories on GitHub are not for software development [10], we inspected them manually to exclude the ones that were not software related (i.e., tutorials, documentation pages, icon-packs, fonts) or without an open source license.

¹ https://help.github.com/articles/about-stars/, last visited on June 6, 2019

The same procedure was followed to select the non-IoT repositories. The only difference was that the filter was modified to include repositories belonging to any topic except iot and internet-of-things.

The data used in the analyses reported in this article was mined from GitHub in August 2018. Tables 1 and 2 list the selected IoT and non-IoT repositories along with their salient characteristics. Most of the information about the repositories was gathered through the GitHub GraphQL API v4.²

² https://developer.github.com/v4/, last visited on June 6, 2019

III. OSS PROJECTS ANALYSIS
A. PROJECTS CHARACTERIZATION
Before diving into the research questions, we report a characterization of the selected projects, to provide a brief but complete overview and to set the stage for the subsequent analysis. Each project was examined individually to understand its purpose and to assign it a genre. The genres aim to describe the nature of the projects. Then, through the GitHub API, several characteristics were gathered, namely: the topics, their size (kB and lines of code), their primary language, and their total number of programming languages. Additionally,


to put into perspective the comparison of the projects’ size, we illustrate through the heatmap graphs in Figures 1 and 2 the growth of the source code along the projects’ lifetime.

TABLE 1. IoT popular Open Source GitHub repositories.

As observed in Table 1, the genre of the IoT OSS projects is heterogeneous, as they are scattered across operating systems, programming frameworks, libraries, network protocols, databases, IoT platforms, and IDEs. At first glance, no clear trend emerged concerning their purpose or application domain. On the contrary, when analyzing non-IoT projects (Table 2), we can notice that most of them are related to the web development area, with just 12 exceptions, such as a machine learning framework, a Zsh framework, an operating system kernel, an IDE, a text editor, and a couple of open source programming languages.

The fifteen most commonly used topics across the IoT projects (mqtt, raspberry-pi, arduino, hardware, esp8266, esp32, embedded, robotics, javascript, java, iot-platform, i2c, home-automation, gpio, docker) did not reveal a prevailing technology or application domain. Instead, the 15 topics across the non-IoT projects (javascript, nodejs, html, framework, electron, css, windows, web, ui, react, python, macos, linux, go, frontend) are mostly about web development. This fact leads us to think that neither in our classification nor in the labels assigned by the owners to their IoT projects is there a strong focus towards a particular domain or technology, thus further motivating our investigation and research questions. Furthermore, our initial observations regarding the genre and the topics of the projects seem to be in line with various authors [6], [8], [11], who point out that the development of IoT applications is more complex and requires programmers with skills and expertise in several domains such as, for instance, mobile and cloud computing, embedded devices, database design, and web development.

Concerning the size of the projects (in kB), the average non-IoT project is almost three times larger (4.56×) than a typical IoT project. However, if we look at LOC (Lines Of Code), this difference decreases significantly: on average, non-IoT projects contain 1.9M LOC, while IoT projects contain 1.0M (1.9×). The largest IoT project, for both kB and LOC, corresponds to rt-thread, a real-time IoT operating system for embedded devices. Similarly, the largest non-IoT project is the Linux kernel, followed far behind by kubernetes. The smallest IoT project, in kB, is BerryNet, a project to turn edge devices such as Raspberry Pi 3 into intelligent gateways with deep learning capabilities running locally, on the edge device itself, without the need for an Internet connection. In terms of LOC, instead, the smallest IoT project is cylon, a JavaScript framework for robots, drones, and the IoT, developed for Arduino and similar boards. As may be observed in these last two projects, achieving a small size is fundamental given the fact that in most cases IoT software components are deployed on constrained devices with low computational

and/or storage resources. This same restriction holds for most of the other IoT projects, especially those to be deployed on the gateway architectural element.

TABLE 2. Non-IoT popular Open Source GitHub repositories.

Finally, Figure 1 (for IoT projects) and Figure 2 (for non-IoT projects) aim at visualizing the growth of the projects’ source code, expressed as the proportion between the initial size of the programming files and their size along the lifetime of the projects. We divided the period between the first commit in the project and the last commit before August 2018 (the date when the repositories were mined for this analysis), into 21 equally spaced date intervals for each project. Then, on each of these dates, we checked out from GitHub the corresponding version of the project and calculated the size of the programming files. To this end, we relied on Linguist, the open-source library that GitHub uses to determine file languages for syntax highlighting and project statistics.³ Specifically, we used the Ruby API provided by this library that, given a directory, returns a dictionary with the detected programming languages along with their size.

³ https://github.com/GitHub/linguist, last visited on November 26, 2019

FIGURE 1. Growth speed of the IoT repositories.

FIGURE 2. Growth speed of the non-IoT repositories.

The growth of the project was calculated by dividing the size of each checked out version of the project by the size of the second checked out version. By taking the second version instead of the first one (initial commit) we could avoid empty projects (without source code) that would have made our calculation impossible or meaningless. In this manner, the first measure is always one, and the following values represent the variation regarding the initial size of the projects’ programming files. Hence, the last measure represents how many times the source code grew with respect to its initial size.

As can be observed in Figure 1a, a subset of four IoT projects grew enormously, namely netdata (350 times, 24.4 MB, and 7.3k commits), home-assistant (101 times, 85.7 MB, and 14.7k commits), gobot (108 times, 9.6 MB, and 2.5k commits), and crate (261 times, 86.5 MB, and 8.7k commits). Indeed, while the average growth is 35.15 times, the standard deviation is 78.83 times. To improve the readability of the graph for projects with less dramatic growth, we generated a second heatmap visualization, restricted to the projects whose final growth is below the mean (Figure 1b).

Concerning non-IoT projects (Figure 2a), five of them grew significantly, although not as dramatically as the subset of IoT projects that grew above the mean. These repositories were: oh-my-zsh (60 times, 4.7 MB, and 4.7k commits), create-react-app (39 times, 5.7 MB, and 1.7k commits), moby (30 times, 137.5 MB, and 35.8k commits), three.js (72 times, 662.9 MB, and 25.2k commits), and meteor (47.9 times, 76.0 MB, and 21.6k commits). The average growth in non-IoT projects is 11.88 times, and the standard deviation 18.81 times. As with the IoT projects, Figure 2b reports a second heatmap visualization with the non-IoT projects whose final growth is below the mean.
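The growth measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the study: the function names are hypothetical, the 21 intervals are treated as 21 evenly spaced sampling dates, and the sizes are assumed to have been measured beforehand (for instance with Linguist) for each checked out version. The initial commit is dropped from the output, so the first reported value is 1.

```python
from datetime import datetime


def snapshot_dates(first_commit, last_commit, n_points=21):
    """Return n_points evenly spaced datetimes between two commits (assumed sampling scheme)."""
    step = (last_commit - first_commit) / (n_points - 1)
    return [first_commit + i * step for i in range(n_points)]


def growth_ratios(sizes):
    """Divide each snapshot size by the size of the second snapshot.

    sizes[0] is the initial commit, which may be empty; it is skipped,
    so the first returned ratio is always 1.0.
    """
    baseline = sizes[1]
    if baseline <= 0:
        raise ValueError("second snapshot must contain source code")
    return [size / baseline for size in sizes[1:]]


# Hypothetical sizes in bytes for four checked out versions.
print(growth_ratios([0, 200, 300, 700]))
```

Using the second snapshot as the baseline mirrors the rationale given in the text: an empty initial commit would otherwise force a division by zero.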


Among the IoT projects, paho.mqtt.android is the one that has remained most stable over time (1.0 times, 2.0 MB, and 194 commits); it consists of an MQTT client library written in Java for developing applications on Android. Nevertheless, the last of its 195 commits was made on October 4, 2017, and it has just two releases. After it, the project that remained most stable was urh (1.2 times, 43.5 MB, and 2.5k commits); it consists of a tool for analyzing unknown wireless protocols by taking samples from Software Defined Radios and transforming them into binary information. For its part, the non-IoT project whose code growth remained most stable over time is socket.io (1.0 times, 12.2 MB, and 1.7k commits), a library that enables real-time, bidirectional and event-based communication between the browser and the server.

B. RQ1: DEVELOPMENT ACTIVITIES
To answer RQ1, we performed an analysis of the commit history for all the OSS projects. In particular, each repository was cloned locally so that its git history could be saved into an external text file, and processed later by a custom-developed text mining tool. This tool extracted from each commit the set of files that were modified, the modification date, and the author name. Several classifications and cross-checking analyses over this information allowed us to determine the most widely-modified file formats, and especially the commit history over time of such resources. In addition, we gathered complementary information from the GitHub API, when appropriate.

1) DISTRIBUTION OF PROGRAMMING LANGUAGES
Among the information that Linguist provides there is the primary language, which is the most used programming language within a project (Figure 3). The most popular primary programming language among non-IoT projects is JavaScript, which is also the lead language, since 18 non-IoT projects use it (60%). It is followed far behind by C++ and C (3 and 1 project, respectively). IoT projects, instead, exhibit a more balanced distribution of primary languages, with the most popular languages being C, C++, Java, Python, and JavaScript. All of them are the primary language on almost the same number of projects (from 4 to 6 projects, each).

FIGURE 3. Top primary programming languages in IoT and non-IoT repositories.

Besides the primary language, there are several other languages on each project: on average, 7.4 different languages for non-IoT projects vs. 8.3 for IoT projects. To gather additional insight on this comparison, since the averages’ difference is not statistically significant due to the small size of the sample, we compared the percentage of files written in a given programming language with the number of projects in which that language is present and reported it in Figure 4. It illustrates, given a programming language, the number of projects in which it is present, and the average percentage of files on those projects. Regarding this graph, it can be observed that no languages were present on a high number of IoT projects with a significant percentage of files (right-upper quadrant). In most of the IoT projects, the chart identifies programming languages that are present in many projects with a marginal percentage (right-lower quadrant), as well as programming languages that have a significant percentage of files but just on a few projects (left-upper quadrant). Java and Erlang, for example, have a significant percentage of files on a few projects, whereas C++, C, and Python are present in around half of the projects, with percentages of files ranging from 26% to 32%. Furthermore, several IoT projects have a small portion of Shell scripts (on average, 0.96% of the files in 23 projects).

FIGURE 4. Presence of programming languages in IoT and non-IoT projects.

For non-IoT projects, JavaScript is still the only programming language with a significant percentage of files on most projects (66.47% on 23 projects). This result gives an initial indication that the programming languages IoT developers deal with are observably different from, and more varied than, those worked on by non-IoT developers, and supports the idea that the development of IoT applications requires programmers with skills and expertise in several domains.
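The two quantities plotted against each other in Figure 4 (number of projects in which a language is present, and its average percentage of files on those projects) can be derived with a simple aggregation. The sketch below is a hedged illustration: the function name, the input layout, and the sample values are assumptions made for the example and do not come from the study's data.

```python
from collections import defaultdict


def language_presence(projects):
    """Aggregate per-project language shares.

    projects: {project_name: {language: percent_of_files}} (assumed layout).
    Returns {language: (number_of_projects, mean_percent_of_files)}.
    """
    shares = defaultdict(list)
    for languages in projects.values():
        for language, percent in languages.items():
            if percent > 0:  # a language counts as present only if it has files
                shares[language].append(percent)
    return {
        language: (len(values), sum(values) / len(values))
        for language, values in shares.items()
    }


# Made-up shares for three hypothetical projects.
sample = {
    "project-a": {"C": 60.0, "Shell": 2.0},
    "project-b": {"C": 20.0, "Python": 80.0},
    "project-c": {"Shell": 1.0},
}
for language, (count, mean) in sorted(language_presence(sample).items()):
    print(language, count, mean)
```

A language such as Shell in this toy input lands in the "many projects, marginal percentage" region, which is the pattern the text reports for IoT projects.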


FIGURE 5. Percentage of contributors by file format.

FIGURE 6. Commit history over time by file format.

2) SPECIALIZATION OF CONTRIBUTORS BY PROGRAMMING LANGUAGE
Figures 5a and 5b illustrate the average percentage of contributors that modify the files developed in a given programming language, among the projects where it is present. Inside the IoT projects, the files modified by a higher proportion of the contributors are Java, C, C++, Python, and JavaScript. As before, this result indicates that programming languages used by IoT developers are more varied and diverse than in other contexts, with a lower specialization towards a few lead languages. On the contrary, shell executable files, batch files, and command files are manipulated by a percentage that reaches, on average, 15% of the contributors. This percentage suggests a higher level of specialization for shell-oriented languages.

As for non-IoT projects, the files modified by a higher proportion of the contributors are by far JavaScript and Go. The rest of the files are modified by a dramatically lower proportion of contributors. Moreover, shell-oriented files (e.g., sh files) in non-IoT projects are modified by a significantly lower proportion of contributors, in comparison with IoT projects. However, we must clarify that Figure 5 does not represent an overall ranking of the most used programming languages among IoT and non-IoT projects. Instead, it corresponds to the programming language whose files are modified by a higher percentage of contributors, among the repositories that we analyzed. For instance, although Go is the second programming language modified by a high percentage of contributors, it is present in just three IoT projects and five non-IoT projects, in both cases with around half of the files.

3) EVOLUTION OF FILES BY PROGRAMMING LANGUAGES
Figures 6a and 6b aim at visualizing the files modified in the commits, grouped by their format. To facilitate the interpretation, the dates of the commits, from all the analyzed projects,


were normalized and placed on a common timeline from the first commit to the data extraction date. Moreover, as the modifications to the files from the analyzed projects sum up to approximately 0.6 million diffs in IoT projects, and 3 million in non-IoT projects, and larger projects have a significantly higher number of commits, we decided to randomly sample 500 modifications, at most, per project. In this manner, we guaranteed that the graph could be readable and balanced concerning the represented number of modifications from each project. Otherwise, there would be so many points that it would not be possible to identify the trends, and most of them would belong to the larger projects.

This visualization of the modifications in the commits by file format allows observable trends concerning the frequency of the changes to be identified. This chart indicates that compiled and interpreted programming languages are continually modified along the IoT projects’ lifetime, while shell-oriented languages are rarely modified. Thus, the commits over time are consistent with the specialization trends by language (Figure 5), the presence of the programming languages and the primary programming languages (Figures 4 and 3). This shows that developers focus more on source code concerning the business logic of the application rather than the execution scripts.

Regarding non-IoT projects, JavaScript files are evidently the most modified over time, no matter in which project they were used (e.g., user interface frameworks, general purpose libraries, MVC frameworks, runtime engines, programming frameworks). Other types of files evolved equally, with no evident differences, across the various development phases.

RQ1: How developers of IoT vs. non-IoT OSS applications contribute to their projects regarding the programming languages that they adopt? IoT projects present contributions in diverse programming languages, without a unique widely used language. In IoT projects, in addition, the files modified by a higher proportion of contributors are Java, C, C++, Python, and JavaScript. Additionally, Shell executable files, Batch files, and Command files are manipulated by a percentage that reaches, on average, 15% of the contributors. The above indicates a more varied usage of programming languages and a higher level of specialization in shell-oriented languages than in non-IoT projects. Concerning files’ evolution over time, compiled and interpreted programming languages are continually modified along the IoT projects’ lifetime, while shell-oriented languages are rarely modified. This is less visible for non-IoT projects.

C. RQ2: MATURITY OF THE IoT SOFTWARE ECOSYSTEM
To investigate the maturity of the IoT software ecosystem for answering RQ2, we explored the dependencies of each project and identified how many they are and which ones are present in the various projects. Initially, we relied on the GitHub API to extract the data about dependencies. However, in this case, the data provided by the API is not completely accurate because GitHub is not able to identify the dependencies of a project if they are not defined in one of the supported manifest file types.⁴ Moreover, these manifests are limited to a reduced set of supported languages, namely Java, JavaScript, .NET, Python, and Ruby. For this reason, we had to manually explore each project looking for the files where dependencies are specified along with their versions.

⁴ https://help.github.com/articles/listing-the-packages-that-a-repository-depends-on/, last visited on June 6, 2019

When manually looking for the dependencies, we first tried to find the equivalent to the manifest file in the project root directory. If such a manifest did not exist, we proceeded to examine the content of the files, through the GitHub search engine, looking for keywords that could help us to identify the files in which dependencies could have been declared. Concretely, the query keywords were: dependencies, deps, dev-deps, import, include, require. Furthermore, to identify the dependency’s corresponding repository on GitHub, we also used as a query keyword the substring "github.com/". In that case, the search could highlight the URL within GitHub of the declared dependencies. Unfortunately, this strategy was not always effective, particularly in the largest projects where the query retrieved thousands of source code files, most of which contained the keywords inside documentation blocks. When we were able to find one or more dependencies, we added them to the data gathered with the GitHub API; otherwise, we assumed that the project under analysis did not have any explicit dependency.

Afterwards, the API data and the data gathered manually were consolidated, and the analysis was performed taking into account two conditions: (i) dependencies had to correspond to open source software projects so that we could explore and analyze them, and (ii) only the dependencies declared directly in the analyzed project were included: dependencies of the dependencies were excluded from the analysis. Consequently, the number reported in the # Dependencies column in Tables 1 and 2 corresponds to the number of dependencies that could be correctly identified either via the API or manually, and that satisfy the conditions just described. For this reason, we must clarify that zero dependencies reported in the table does not necessarily imply that, in practice, the concerned project does not have any dependencies at all.

Regarding the number of dependencies, we observe that developers of non-IoT projects adopt more dependencies than those working on IoT projects. Specifically, IoT projects exhibited 1,084 dependencies, compared to 1,868 dependencies for non-IoT projects (1.7×). In addition, the number of dependencies shared among different repositories is significantly higher in non-IoT projects. Accordingly, Figure 7 shows the percentage of dependencies present in a given number of projects. In both cases, the majority of the dependencies are not shared, but while in the non-IoT projects the percentage of dependencies shared by 2 or more projects is approximately 35%, in IoT projects it is around 5%.
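The restriction to directly declared dependencies described above can be illustrated for one manifest type. The sketch below assumes an npm-style package.json, whose "dependencies" and "devDependencies" sections list only direct dependencies; the function name and the sample manifest are hypothetical, and whether development dependencies were counted in the study is an assumption of this example.

```python
import json


def direct_dependencies(manifest_text):
    """Return the sorted names declared directly in a package.json-style manifest.

    Transitive dependencies never appear in these sections, so they are
    excluded by construction.
    """
    manifest = json.loads(manifest_text)
    names = set()
    for section in ("dependencies", "devDependencies"):  # assumed sections of interest
        names.update(manifest.get(section, {}))
    return sorted(names)


# Hypothetical manifest for illustration only.
example = """
{
  "name": "demo-project",
  "dependencies": {"mqtt": "^2.18.0"},
  "devDependencies": {"mocha": "^5.2.0", "eslint": "^5.0.0"}
}
"""
print(direct_dependencies(example))
```

Projects whose languages have no such manifest would still require the manual keyword search described in the text.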


and algorithm implementations. In the IoT projects, instead, some of the most popular dependencies concern network protocols client libraries, HTTP requests libraries, a serial port access library, and a test framework. A few dependencies were common across IoT and non-IoT projects, and they are utilities mainly concerning source code formatting, linting, and testing.

FIGURE 7. Distribution of dependencies present in one or more projects.

TABLE 3. Most popular dependencies of IoT projects.

RQ2: How developers exploit dependencies to reuse features in IoT vs. non-IoT OSS projects? Non-IoT projects have more dependencies than IoT projects (1.7×). Moreover, the number of shared dependencies is significantly higher for non-IoT projects. Although in both IoT and non-IoT projects most of the dependencies were not shared among different projects, in non-IoT projects the percentage of dependencies shared by 2 or more projects is approximately 35%, while in IoT projects it is around 5%. Finally, the most popular dependencies in the analyzed IoT projects were shared at most by 5 projects, and among these popular dependencies, there were network protocols client libraries, HTTP requests libraries, a serial port access library, and a test framework. Among the most popular non-IoT projects, instead, dependencies mainly concerned utilities aimed at easing code development.

IV. DISCUSSION AND IMPLICATIONS


After presenting the results of our analysis, in this section we
focus on (i) a discussion of the results and on (ii) an analysis
of the implication that our work has both for researchers and
practitioners.
TABLE 4. Most popular dependencies of non-IoT projects.

A. DISCUSSION
Our results showed a number of points to be further high-
lighted and discussed, in particular:
The development of IoT applications is different. While
the knowledge about an inherent complexity in develop-
ing IoT applications was already hinted in the literature
(e.g., [6], [8], [11]), we evaluated this complexity in a more
quantitative way. We observed that developers, involved in
the creation of IoT vs. non-IoT software applications, are
less oriented towards the adoption of a lead programming
language, but they work with different programming lan-
guages, according to the task at hand or to the specific
capability of the infrastructure (e.g., a micro-controller or a
cloud service) where the IoT application should be deployed.
Furthermore, this heterogeneity of languages is also reflected
in the IoT projects’ topics, thus unveiling one of the main
sources of complexity in IoT applications development,
i.e., the co-existence of various kinds of devices, protocols,
Finally, Tables 3 and 4 present the list of the top-15 most and architectures within the same application. The tools and
popular dependencies among IoT and non-IoT projects, methodologies to support IoT developers can not, therefore,
respectively. By analyzing the type of the dependencies, be constrained to a given technological stack but they should
it can be highlighted that most of the dependencies of be language and platform agnostic.
non-IoT projects correspond to utilities aimed at easing code Specialization of a few contributors towards command-
development, such as parsers, test frameworks, beautifiers, line scripting. The percentage of contributors that modified


specific files and the tracking of the commits over the lifetime of IoT projects showed that a strong majority of the developers frequently modify the files written in compiled and interpreted programming languages, where the business logic of the application resides, while a few contributors specialize in shell-oriented languages (e.g., bash), generally related to the configuration and deployment of the software components in a particular execution environment. Indeed, differently from non-IoT projects, shell-oriented languages are present in most of the IoT projects. This result reveals that, in IoT projects, the execution environment is particularly relevant yet problematic, given the different (and often incompatible) target devices.

The way files evolve is different. We observed the evolution of files during the history of software projects. IoT developers focus more on compiled and interpreted programming languages (i.e., Java, C, C++, Python, and JavaScript) able to fulfill the core business logic of the IoT application. All these files evolved equally across the various development phases, while shell-oriented files are scarcely modified. IoT developers seem not to focus on configuration and deployment scripts, which probably remain unchanged once the target platform(s) is chosen. Conversely, non-IoT developers constantly and significantly evolve only the JavaScript files of their applications, whether these are user interface frameworks, general purpose libraries, MVC frameworks, runtime engines, or programming frameworks. Other types of files evolved equally, with no evident stops, across the various development phases.

Dependencies are considered differently. Non-IoT projects have more dependencies than IoT projects, and 35% of those dependencies are shared among 2 or more non-IoT projects. IoT developers not only use fewer dependencies, but such dependencies are also shared among fewer projects, with only 5% of them shared by two or more repositories. However, dependencies in non-IoT projects mainly represent utilities, while dependencies in IoT projects are more varied and oriented towards software integration tasks. The relatively high number of dependencies used by IoT projects may entail a relatively good maturity of the IoT ecosystem, but the analysis also highlights some issues in sharing the knowledge about the existence of a given dependency.

B. IMPLICATIONS
The aforementioned findings have a number of implications for researchers and practitioners. Researchers should acknowledge the specificity of this domain, and explicitly consider IoT-oriented software engineering as a study branch. More specifically:

1) IoT-oriented tools and methodologies. Given the wide heterogeneity of IoT applications and adopted programming languages, stemming from both the results and the literature, tools like Integrated Development Environments (IDEs) and software methodologies to support IoT developers should be language and platform agnostic, and not constrained to any given technological stack. In addition, research could focus on ways to abstract this heterogeneity, to allow developers to more easily share their IoT-related efforts, code, and documentation.

2) Supporting automation for multiple and diverse deployment targets. The specialization towards shell-oriented languages and their relative immutability, generally related to the configuration and deployment of the software components in a particular execution environment or embedded device, may indicate that execution environments are particularly relevant for IoT development. Research efforts should consider approaches to deal with this device heterogeneity and to automate the generation and execution of deployment commands across several, often incompatible, devices.

3) IoT-specific dependencies sharing mechanisms. Our results report that developers exploit some existing dependencies in their projects, but the same projects do not present common dependencies. Likely, this is due both to the heterogeneity of the IoT projects and to the relatively new and not yet consolidated software community behind those projects. This represents an opportunity for researchers to define novel mechanisms that IoT developers can adopt to make their code more extensible, modular, and reusable, given the peculiarities of the deployment platforms.

Practitioners need to find appropriate ways to handle and share dependencies, as well as to create a more focused software community around these topics. Finally, confirming previous insights in the literature, our results suggest that IoT software development requires skills and expertise in several disparate domains, different from those required by the development of traditional software. Developers are indeed called to be more creative and able to adapt to different contexts and programming environments. Thus, it would be beneficial for students to have dedicated courses (e.g., similar to the courses reported in [12]) where they could acquire these skills to approach the development of IoT applications.

V. THREATS TO VALIDITY
A. SAMPLE VALIDITY
The selection criteria of the analyzed projects aimed to be as neutral as possible with respect to our own preferences. For this reason, we only relied on their number of stars, prioritized them accordingly, and took the 60 top starred ones. Additionally, their IoT or non-IoT nature was determined by the topics that the project owners assigned them. Since tags are freely added by project owners, this might have excluded some potentially interesting IoT projects from our analysis. Our only two interventions on these criteria consisted of excluding projects that were not software related or that lacked an open source license. Nevertheless, this selection procedure unintentionally resulted in a strong shift in the non-IoT projects towards web-related frameworks. However, we opted to keep


these selection criteria because, on the one hand, they were replicable and transparent, and on the other hand, they reveal GitHub users' trends about their interests.

On the other hand, the inclusion of the most starred projects spontaneously resulted in a significant number of files, commits, and an active contributor community. According to Kalliamvakou et al. [10], these variables help to avoid perils while performing software engineering research on GitHub. Moreover, we took inspiration from the methodology adopted by Pascarella et al. [13]. The authors included the same number of projects in their comparative analysis of video game and non-video game OSS projects.

B. FILE CLASSIFICATION VALIDITY
We relied on the statistics provided by the GitHub API concerning the percentage of each programming language in each project. As already mentioned, this measure is calculated by GitHub using the open source Linguist library, which, we assume, provides accurate statistics. However, we could assess the accuracy of such statistics later, when computing the percentage of contributors working on a given programming language. We locally cloned each project and, with a text mining tool developed by us, we processed the commits to extract the files modified by each contributor, and observed that the results delivered by our tool were consistent with the percentages retrieved through the API.

C. DEPENDENCIES IDENTIFICATION
The GitHub API retrieves the number and list of dependencies only if they are defined in one of the supported manifest file types. These types are only available for Java, JavaScript, .NET, Python, and Ruby projects. Therefore, to avoid inconsistencies in the analysis of ecosystem maturity, we had to manually explore each project looking for the files where software dependencies and their versions are specified. This manual process, given its complexity, could have led to omissions or mistakes in the identification of the dependencies.

Finally, the higher number of dependencies in the non-IoT projects could depend on the nature of these projects: they are homogeneous in web development and a large number of them have the same primary language (i.e., JavaScript). Given these conditions, it is logical that non-IoT projects share more dependencies among them than IoT projects, which are more heterogeneous.

VI. RELATED WORK
This work lies in the software engineering domain and is intended to provide insights into the peculiarities of IoT development in the OSS context. To the best of our knowledge, no other research has aimed at exploring and analyzing how developers work within several OSS IoT projects. Indeed, various authors have pointed out the need for research on software engineering for IoT systems in view of the several challenges that the development of such systems poses. In the following, we approach the related work from two areas: the needs and challenges of software engineering in the IoT context, and software mining research in fields other than IoT.

According to Morin et al. [14], IoT applications have two main characteristics from a software engineering viewpoint. The first is their distribution over a large range of processing nodes. The second is the high heterogeneity of the processing nodes and the protocols used between them. To deal with these characteristics, the authors introduce a modeling language aligned with UML, an advanced multiplatform code generation framework, and a methodology specifying the development processes and tools used by both IoT service developers and platform experts.

Similarly, Čolaković and Hadžialić [15] hold that IoT software architectures and frameworks are necessary to overcome the inherent complexity of IoT systems and to provide an environment for service composition. In their opinion, IoT software platforms should be created as an Open Application Platform to enable modular design, as well as to provide an open API (Application Programming Interface) that would easily integrate sensors and other devices.

On the basis that IoT applications have been based on fragmented software implementations for specific systems and use cases, Weyrich and Ebert [16] propose the use of reference architectures as a means to facilitate interoperability, simplify development, and ease implementation.

According to Larrucea et al. [11], no consolidated set of software engineering best practices for the IoT has emerged yet. In the authors' words, “IoT landscape resembles the wild west, with programmers putting together IoT systems in ad hoc fashion”. They consider that industry needs guidance to engineer the new generation of scalable, highly reactive, often resource-constrained software systems characteristic of the IoT. Among such guidance, the authors remark the need for a new generation of development environments and the training of the new generation of IoT software developers.

Patel and Cassou [17] draw attention to the lack of a software engineering methodology to support the entire IoT application development life-cycle, which results in designs that are difficult to maintain and reuse, and that are platform-dependent. To deal with such difficulty, the authors introduce a development methodology for IoT application development, based on model-driven development and involving sensor network macroprogramming techniques.

Regarding IoT projects in OSS, Taivalsaari and Mikkonen [18] hold that nowadays nearly all the component areas of a typical IoT cloud back-end architecture can be constructed from open source technologies. In their opinion, given the availability and maturity of open source components, the role of back-end developers today could be characterized more as software composition or orchestration instead of traditional software development.

Concerning software mining, as mentioned before, the methodology followed in this work took inspiration from the work of Pascarella et al. [13], in the video games OSS context. The authors conducted a study on 60 projects,


and their results confirmed the existence of significant differences between game and non-game development, in terms of how project resources are organized and in the diversity of developers' specializations. Another source of inspiration was the work of Ray et al. [19]: they performed a large scale study on GitHub about the effect of programming language type and use on software quality. They examined the interactions of language, domain, and defect type through a combination of regression modeling, text analytics, and visualization. Their results suggested that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. However, the authors point out that effects arising from language design are overwhelmingly dominated by process factors such as project size, team size, and commit size. Additionally, they determined that the defect proneness of languages, in general, is not associated with software domains.

VII. CONCLUSION
IoT software development is known to differ from the development of other kinds of applications. It poses several challenges and requires expertise in various areas due to the diverse features that IoT applications expose. In this article, we provide empirical insights into the peculiarities of IoT software development through the analysis of OSS projects. This analysis was structured around two criteria: the behavior of the contributors, and the maturity of the IoT software development ecosystem. Specifically, we conducted an exploratory study mining 30 popular IoT OSS and 30 popular non-IoT OSS projects available on GitHub. Our results are intended to provide evidence about IoT development characteristics (such as the distribution of programming languages, the specialization of contributors, the evolution of the files, and the adopted dependencies) that should be considered by future research efforts aimed at better satisfying software engineering needs in the IoT scenario.

REFERENCES
[1] J. A. Stankovic, “Research directions for the Internet of Things,” IEEE Internet Things J., vol. 1, no. 1, pp. 3–9, Feb. 2014.
[2] D. Miorandi, S. Sicari, F. De Pellegrini, and I. Chlamtac, “Internet of Things: Vision, applications and research challenges,” Ad Hoc Netw., vol. 10, no. 7, pp. 1497–1516, Sep. 2012.
[3] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A survey on enabling technologies, protocols, and applications,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 2347–2376, 4th Quart., 2015.
[4] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, “Internet of Things for smart cities,” IEEE Internet Things J., vol. 1, no. 1, pp. 22–32, Feb. 2014.
[5] S. M. Riazul Islam, D. Kwak, M. Humaun Kabir, M. Hossain, and K.-S. Kwak, “The Internet of Things for health care: A comprehensive survey,” IEEE Access, vol. 3, pp. 678–708, 2015.
[6] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of Things (IoT): A vision, architectural elements, and future directions,” Future Gener. Comput. Syst., vol. 29, no. 7, pp. 1645–1660, Sep. 2013.
[7] L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Comput. Netw., vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
[8] A. Taivalsaari and T. Mikkonen, “A roadmap to the programmable world: Software challenges in the IoT era,” IEEE Softw., vol. 34, no. 1, pp. 72–80, Jan. 2017.
[9] H. Borges and M. T. Valente, “What's in a GitHub Star? Understanding repository starring practices in a social coding platform,” J. Syst. Softw., vol. 146, pp. 112–129, Dec. 2018.
[10] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, “The promises and perils of mining Github,” in Proc. 11th Work. Conf. Mining Softw. Repositories (MSR). New York, NY, USA: ACM, 2014, pp. 92–101.
[11] X. Larrucea, A. Combelles, J. Favaro, and K. Taneja, “Software engineering for the Internet of Things,” IEEE Softw., vol. 34, no. 1, pp. 24–28, Jan. 2017.
[12] F. Corno and L. De Russis, “Training engineers for the ambient intelligence challenge,” IEEE Trans. Educ., vol. 60, no. 1, pp. 40–49, Feb. 2017.
[13] L. Pascarella, F. Palomba, M. Di Penta, and A. Bacchelli, “How is video game development different from software development in open source?” in Proc. 15th Int. Conf. Mining Softw. Repositories (MSR). New York, NY, USA: ACM, 2018, pp. 392–402.
[14] B. Morin, N. Harrand, and F. Fleurey, “Model-based software engineering to tame the IoT jungle,” IEEE Softw., vol. 34, no. 1, pp. 30–36, Jan. 2017.
[15] A. Čolaković and M. Hadžialić, “Internet of Things (IoT): A review of enabling technologies, challenges, and open research issues,” Comput. Netw., vol. 144, pp. 17–39, Oct. 2018.
[16] M. Weyrich and C. Ebert, “Reference architectures for the Internet of Things,” IEEE Softw., vol. 33, no. 1, pp. 112–116, Jan. 2016.
[17] P. Patel and D. Cassou, “Enabling high-level application development for the Internet of Things,” J. Syst. Softw., vol. 103, pp. 62–84, May 2015.
[18] A. Taivalsaari and T. Mikkonen, “On the development of IoT systems,” in Proc. 3rd Int. Conf. Fog Mobile Edge Comput. (FMEC), Apr. 2018, pp. 13–19.
[19] B. Ray, D. Posnett, V. Filkov, and P. Devanbu, “A large scale study of programming languages and code quality in GitHub,” in Proc. 22nd ACM SIGSOFT Int. Symp. Found. Softw. Eng. (FSE). New York, NY, USA: ACM, 2014, pp. 155–165, doi: 10.1145/2635868.2635922.

FULVIO CORNO (Member, IEEE) has been the Leader of the e-Lite Research Group, since 2002, where he focuses on ambient intelligence systems by integrating novel interaction modalities with the IoT architectures. He is currently a Full Professor with the Department of Control and Computer Engineering, Politecnico di Torino. He is a member of IEEE Computer Society and ACM.

LUIGI DE RUSSIS (Member, IEEE) has been an Assistant Professor with the Department of Computer and Control Engineering, Politecnico di Torino, since 2018. His current research focuses on human–computer interaction, with an interest on how to overcome interaction challenges in complex settings, such as within the IoT systems. He is a member of IEEE-HKN, IEEE Computer Society, and ACM.

JUAN PABLO SÁENZ (Student Member, IEEE) is currently pursuing the Ph.D. degree with the Department of Computer and Control Engineering, Politecnico di Torino. His current research focuses on software engineering, with an interest on development tools and methodologies for the IoT systems. He is a Student Member of ACM.
