How Is Open Source Software Development Different in Popular IoT Projects
17, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2972364
ABSTRACT From the software point of view, the development of IoT applications differs from other kinds of
applications due to the specific features that the former exhibit. In this paper, we investigate how developers
contribute to IoT applications in the Open Source Software (OSS) context, to gain a deeper understanding
of how their work differs from that of non-IoT applications. To that end, we conducted a quantitative
analysis of a broad set of the 60 most popular publicly available IoT and non-IoT projects on GitHub.
By comparing how developers contribute to these projects, our analysis provides insight into the purpose and
characteristics of the code, the behavior of the contributors, and the maturity of the IoT software development
ecosystem. Results reveal significant differences between IoT and non-IoT application development, in terms
of how applications are realized, in the diversity of developers’ specializations, and in how code is reused.
This work provides evidence about some Open Source IoT software development peculiarities to be
considered by future research efforts aimed at better satisfying software engineering needs in the IoT
scenario.
INDEX TERMS Internet of Things, open source software, software mining, developers.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 28337
F. Corno et al.: How Is OSS Development Different in Popular IoT Projects?
In particular, we conducted an empirical study mining 60 OSS repositories publicly available on GitHub. We mined 30 IoT OSS and 30 non-IoT OSS projects to analyze a) the way developers contribute to their projects, b) the files that they tend to modify the most, and c) the specialization and the evolution of these modifications. Finally, we assessed the maturity of the IoT software development ecosystem based on a dependency analysis in the selected projects. Besides leveraging a characterization of IoT OSS projects currently available for IoT developers, this work aims at providing evidence from a practical point of view about the IoT software development peculiarities that should guide future research efforts to better understand and satisfy software engineering needs in the IoT context.
The remainder of the paper is structured as follows. Section II describes the research goal and questions and outlines the selection process. Section III characterizes the selected OSS projects and describes the quantitative analysis conducted over them as well as the outcome of the analysis. Section IV discusses the results and presents further implications, while threats to validity are outlined in Section V. Section VI presents the related work, while Section VII concludes the article.
II. RESEARCH GOAL AND QUESTIONS
The overall goal of this research is to explore the potential differences between the development practices for IoT and non-IoT projects in the OSS context. In particular, we are interested in identifying (a) the behavior of developers and the diversity of resources they manage, and (b) the reuse of features through the adopted dependencies. These two criteria lead us to the research questions set out below.
A. RESEARCH QUESTIONS
We want to investigate whether and how developers adopt different programming languages and cover various specializations in IoT vs. non-IoT OSS projects. In particular, we are interested in:
• how different programming languages are used in the two domains;
• whether IoT developers are more specialized in any programming languages or certain types of files in their project;
• how the usage of such programming languages evolves over time.
Therefore, our first research question is:
RQ1: How developers of IoT vs. non-IoT OSS applications contribute to their projects regarding the programming languages that they adopt?
Our quantitative investigation, furthermore, exploits OSS repositories by focusing on the maturity of the IoT ecosystem from a software development point of view. We investigate this aspect in the repositories we selected by analyzing project dependencies, how many they are, and which are the most popular ones. Additionally, focusing on the IoT OSS projects, we wanted to identify which aspects of IoT application development these dependencies address and how often they are used by IoT developers. This leads to our second research question:
RQ2: How developers exploit dependencies to reuse features in IoT vs. non-IoT OSS projects?
B. SELECTION OF THE ANALYZED REPOSITORIES
To select a prominent, widely known, and widely used set of IoT OSS repositories from GitHub, we first filtered them by topic, choosing the ones that belong to the iot or internet-of-things topics on GitHub. Topics are labels to classify a repository based on its intended purpose, subject area, community, or language. They appear on the main page of a repository, and repository administrators can add as many topics as they want to a repository.
Once the repositories belonging to the IoT topic were filtered, 4,696 repositories were retrieved. Therefore, to prioritize the most popular and well-evaluated ones, we sorted them according to the decreasing number of stars. Stars enable GitHub users to keep track of repositories they find interesting and to discover similar repositories [9], as well as to show appreciation to the repository maintainers for their work.¹ Lastly, we took the 30 top-starred repositories, provided they were open source code repositories. In fact, since a large portion of repositories on GitHub are not for software development [10], we inspected them manually to exclude the ones that were not software related (e.g., tutorials, documentation pages, icon-packs, fonts) or without an open source license.
The same procedure was followed to select the non-IoT repositories. The only difference was that the filter was modified to include repositories belonging to any topic except iot and internet-of-things.
The data used in the analyses reported in this article was mined from GitHub in August 2018. Tables 1 and 2 list the selected IoT and non-IoT repositories along with their salient characteristics. Most of the information about the repositories was gathered through the GitHub GraphQL API v4.²
¹ https://help.github.com/articles/about-stars/, last visited on June 6, 2019
² https://developer.github.com/v4/, last visited on June 6, 2019
III. OSS PROJECTS ANALYSIS
A. PROJECTS CHARACTERIZATION
Before diving into the research questions, we report a characterization of the selected projects, to provide a brief but complete overview and to set the stage for the subsequent analysis. Each project was examined individually to understand its purpose and to assign it a genre. The genres aimed at describing the nature of the projects. Then, through the GitHub API, several characteristics were gathered, namely: the topics, their size (kB and lines of code), their primary language, and their total number of programming languages. Additionally,
to put into perspective the comparison of the projects’ size, we illustrate through the heatmap graphs in Figures 1 and 2 the growth of the source code along the projects’ lifetime.
As observed in Table 1, the genre of the IoT OSS projects is heterogeneous, as they are scattered across operating systems, programming frameworks, libraries, network protocols, databases, IoT platforms, and IDEs. At first glance, no clear trend emerged concerning their purpose or application domain. On the contrary, when analyzing non-IoT projects (Table 2), we can notice that most of them are related to the web development area, with just 12 exceptions, such as a machine learning framework, a Zsh framework, an operating system kernel, an IDE, a text editor, and a couple of open source programming languages.
The fifteen most commonly used topics across the IoT projects (mqtt, raspberry-pi, arduino, hardware, esp8266, esp32, embedded, robotics, javascript, java, iot-platform, i2c, home-automation, gpio, docker) did not reveal a prevailing technology or application domain. Instead, the 15 topics across the non-IoT projects (javascript, nodejs, html, framework, electron, css, windows, web, ui, react, python, macos, linux, go, frontend) are mostly about web development. This fact leads us to think that neither in our classification nor in the labels assigned by the owners to their IoT projects is there a strong focus towards a particular domain or technology, thus further motivating our investigation and research questions. Furthermore, our initial observations regarding the genre and the topics of the projects seem to be in line with various authors [6], [8], [11], who point out that the development of IoT applications is more complex and requires programmers with skills and expertise in several domains, for instance, mobile and cloud computing, embedded devices, database design, and web development.
Concerning the size of the projects (in kB), the average non-IoT project is almost five times larger (4.56×) than a typical IoT project. However, if we look at LOC (Lines Of Code), this difference decreases significantly: on average, non-IoT projects contain 1.9M LOC, while IoT projects contain 1.0M (1.9×). The largest IoT project, for both kB and LOC, corresponds to rt-thread, a real-time IoT operating system for embedded devices. Similarly, the largest non-IoT project is the Linux kernel, followed far behind by kubernetes. The smallest IoT project, in kB, is BerryNet, a project to turn edge devices such as Raspberry Pi 3 into intelligent gateways with deep learning capabilities running locally, on the edge device itself, without the need of an Internet connection. For what concerns LOCs, instead, the smallest IoT project is cylon, a JavaScript framework for robots, drones, and the IoT, developed for Arduino and similar boards. As may be observed in these last two projects, achieving a small size is fundamental given the fact that in most cases IoT software components are deployed on constrained devices with low computational
and/or storage resources. This same restriction holds for most of the other IoT projects, especially those to be deployed on the gateway architectural element.
Finally, Figure 1 (for IoT projects) and Figure 2 (for non-IoT projects) aim at visualizing the growth of the projects’ source code, expressed as the proportion between the initial size of the programming files and their size along the lifetime of the projects. We divided the period between the first commit in the project and the last commit before August 2018 (the date when the repositories were mined for this analysis) into 21 equally spaced date intervals for each project. Then, on each of these dates, we checked out from GitHub the corresponding version of the project and calculated the size of the programming files. To this end, we relied on Linguist, the open-source library that GitHub uses to determine file languages for syntax highlighting and project statistics.³ Specifically, we used the Ruby API provided by this library that, given a directory, returns a dictionary with the detected programming languages along with their size.
³ https://github.com/GitHub/linguist, last visited on November 26, 2019
The growth of the project was calculated by dividing the size of each checked out version of the project by the size of the second checked out version. By taking the second version instead of the first one (initial commit) we could avoid empty projects (without source code) that would have made our calculation impossible or meaningless. In this manner, the first measure is always one, and the following values represent the variation regarding the initial size of the projects’ programming files. Hence, the last measure represents how many times the source code grew with respect to its initial size.
As can be observed in Figure 1a, a subset of four IoT projects grew hugely, namely netdata (350 times, 24.4 MB, and 7.3k commits), home-assistant (101 times, 85.7 MB, and 14.7k commits), gobot (108 times, 9.6 MB, and 2.5k commits), and crate (261 times, 86.5 MB, and 8.7k commits). Indeed, while the average growth is 35.15 times, the standard deviation is 78.83 times. To improve the readability of the graph for projects with less dramatic growth, we generated a second heatmap visualization, restricted to the projects whose final growth is below the mean (Figure 1b).
Concerning non-IoT projects (Figure 2a), five of them grew significantly, although not as dramatically as the subset of IoT projects that grew above the mean. These repositories were: oh-my-zsh (60 times, 4.7 MB, and 4.7k commits), create-react-app (39 times, 5.7 MB, and 1.7k commits), moby (30 times, 137.5 MB, and 35.8k commits), three.js (72 times, 662.9 MB, and 25.2k commits), and meteor (47.9 times, 76.0 MB, and 21.6k commits). The average growth in non-IoT projects is 11.88 times, and the standard deviation is 18.81 times. As with the IoT projects, Figure 2b reports a second heatmap visualization with the non-IoT projects whose final growth is below the mean.
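The growth measure described in this section (snapshots at equally spaced dates, each size divided by the size of the second snapshot) can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function names `snapshot_dates` and `growth_series` are hypothetical, the sizes are assumed to be already measured (for example with Linguist), and the sketch assumes that 21 snapshot dates are taken and that the first, possibly empty, snapshot is dropped so that the series starts at one.

```python
from datetime import datetime
from typing import List


def snapshot_dates(first_commit: datetime, last_commit: datetime, n: int = 21) -> List[datetime]:
    """Return n equally spaced dates from first_commit to last_commit (inclusive).

    Assumption: the paper's "21 equally spaced date intervals" is read here
    as 21 snapshot dates.
    """
    if n < 2:
        raise ValueError("need at least two snapshot dates")
    step = (last_commit - first_commit) / (n - 1)
    return [first_commit + i * step for i in range(n)]


def growth_series(sizes: List[float]) -> List[float]:
    """Normalize snapshot sizes by the size of the second snapshot.

    The first snapshot (initial commit) may contain no source code, so it is
    skipped and the second snapshot is used as the baseline. The returned
    series therefore starts at 1.0, and its last value is the overall growth.
    """
    if len(sizes) < 2 or sizes[1] <= 0:
        raise ValueError("the second snapshot must have a positive size")
    baseline = sizes[1]
    return [size / baseline for size in sizes[1:]]


if __name__ == "__main__":
    # Hypothetical sizes in bytes for four snapshots of one project.
    print(growth_series([0, 100, 250, 400]))
```

Using the second snapshot as the baseline keeps the ratio defined even when the initial commit holds no programming files, which is the motivation given in the text.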
were normalized and placed on a common timeline since the first commit to the data extraction date. Moreover, as the modifications to the files from the analyzed projects sum up to approximately 0.6 million diffs in IoT projects and 3 million in non-IoT projects, and larger projects have a significantly higher number of commits, we decided to randomly sample at most 500 modifications per project. In this manner, we guaranteed that the graph could be readable and balanced concerning the represented number of modifications from each project. Otherwise, there would be so many points that it would not be possible to identify the trends, and most of them would belong to the larger projects.
This visualization of the modifications in the commits by file format allows trends concerning the frequency of the changes to be identified. This chart indicates that compiled and interpreted programming languages are continually modified along the IoT projects’ lifetime, while shell-oriented languages are rarely modified. Thus, the commits over time are consistent with the specialization trends by language (Figure 5), the presence of the programming languages, and the primary programming languages (Figures 4 and 3). This shows that developers focus more on source code concerning the business logic of the application rather than the execution scripts.
Regarding non-IoT projects, JavaScript files are evidently the most modified over time, no matter in which project they were used (e.g., user interface frameworks, general purpose libraries, MVC frameworks, runtime engines, programming frameworks). Other types of files evolved equally, with no evident differences, across the various development phases.
RQ1: How developers of IoT vs. non-IoT OSS applications contribute to their projects regarding the programming languages that they adopt? IoT projects present contributions in diverse programming languages, without a unique widely used language. In IoT projects, in addition, the files modified by a higher proportion of contributors are Java, C, C++, Python, and JavaScript. Additionally, Shell executable files, Batch files, and Command files are manipulated by a percentage that reaches, on average, 15% of the contributors. The above indicates a more varied usage of programming languages and a higher level of specialization in shell-oriented languages than in non-IoT projects. Concerning files’ evolution over time, compiled and interpreted programming languages are continually modified along the IoT projects’ lifetime, while shell-oriented languages are rarely modified. This is less visible for non-IoT projects.
C. RQ2: MATURITY OF THE IoT SOFTWARE ECOSYSTEM
To investigate the maturity of the IoT software ecosystem for answering RQ2, we explored the dependencies of each project and identified how many they are and which ones are present in the various projects. Initially, we relied on the GitHub API to extract the data about dependencies. However, in this case, the data provided by the API is not completely accurate because GitHub is not able to identify the dependencies of a project if they are not defined in one of the supported manifest file types.⁴ Moreover, these manifests are limited to a reduced set of supported languages, namely Java, JavaScript, .NET, Python, and Ruby. For this reason, we had to manually explore each project looking for the files where dependencies are specified along with their versions.
⁴ https://help.github.com/articles/listing-the-packages-that-a-repository-depends-on/, last visited on June 6, 2019
When manually looking for the dependencies, we first tried to find the equivalent to the manifest file in the project root directory. If such a manifest did not exist, we proceeded to examine the content of the files, through the GitHub search engine, looking for keywords that could help us to identify the files in which dependencies could have been declared. Concretely, the query keywords were: dependencies, deps, dev-deps, import, include, require. Furthermore, to identify the dependency’s corresponding repository on GitHub, we also used as a query keyword the substring “github.com/”. In that case, the search could highlight the URL within GitHub of the declared dependencies. Unfortunately, this strategy was not always effective, particularly in the largest projects where the query retrieved thousands of source code files, most of which contained the keywords inside documentation blocks. When we were able to find one or more dependencies, we added them to the data gathered with the GitHub API; otherwise, we assumed that the project under analysis did not have any explicit dependency.
Afterwards, the API data and the data gathered manually were consolidated, and the analysis was performed taking into account two conditions: (i) dependencies had to correspond to open source software projects so that we could explore and analyze them, (ii) only the dependencies declared directly in the analyzed project were included: dependencies of the dependencies were excluded from the analysis. Consequently, the number reported in the # Dependencies column in Tables 1 and 2 corresponds to the number of dependencies that could be correctly identified either via the API or manually, and that satisfy the conditions just described. For this reason, we must clarify that zero dependencies reported in the table does not necessarily imply that, in practice, the concerned project does not have any dependencies at all.
Regarding the number of dependencies, we observe that developers of non-IoT projects adopt more dependencies than those working on IoT projects. Specifically, IoT projects exhibited 1,084 dependencies, compared to 1,868 dependencies for non-IoT projects (1.7×). In addition, the number of dependencies shared among different repositories is significantly higher in non-IoT projects. Accordingly, Figure 7 shows the percentage of dependencies present in a given number of projects. In both cases, the majority of the dependencies are not shared, but while in the non-IoT projects the percentage of dependencies shared by 2 or more projects is approximately 35%, in IoT projects it is around 5%.
Finally, Tables 3 and 4 present the list of the top-15 most popular dependencies among IoT and non-IoT projects, respectively. By analyzing the type of the dependencies, it can be highlighted that most of the dependencies of non-IoT projects correspond to utilities aimed at easing code development, such as parsers, test frameworks, beautifiers,
A. DISCUSSION
Our results showed a number of points to be further highlighted and discussed, in particular:
The development of IoT applications is different. While the knowledge about an inherent complexity in developing IoT applications was already hinted at in the literature (e.g., [6], [8], [11]), we evaluated this complexity in a more quantitative way. We observed that developers involved in the creation of IoT software applications, compared with non-IoT ones, are less oriented towards the adoption of a lead programming language, but they work with different programming languages, according to the task at hand or to the specific capability of the infrastructure (e.g., a micro-controller or a cloud service) where the IoT application should be deployed. Furthermore, this heterogeneity of languages is also reflected in the IoT projects’ topics, thus unveiling one of the main sources of complexity in IoT applications development, i.e., the co-existence of various kinds of devices, protocols, and architectures within the same application. The tools and methodologies to support IoT developers cannot, therefore, be constrained to a given technological stack but should be language and platform agnostic.
Specialization of a few contributors towards command-line scripting. The percentage of contributors that modified
specific files and the tracking of the commits over the lifetime of IoT projects showed that a strong majority of the developers are frequently modifying the files written in compiled and interpreted programming languages, where the business logic of the application resides, while a few contributors specialize in shell-oriented languages (e.g., bash), generally related to the configuration and deployment of the software components in a particular execution environment. Indeed, differently from non-IoT projects, shell-oriented languages are present in most of the IoT projects. This result reveals that, in IoT projects, the execution environment is particularly relevant yet problematic for what concerns the different (and often incompatible) target devices.
The way files evolve is different. We observed the evolution of files during the history of software projects. IoT developers focus more on compiled and interpreted programming languages (i.e., Java, C, C++, Python, and JavaScript) able to fulfill the core business logic of the IoT application. All these files evolved equally across the various development phases, while shell-oriented files are scarcely modified. IoT developers seem not to focus on configuration and deployment scripts, which are probably immutable once the target platform(s) is chosen. Conversely, non-IoT developers constantly and significantly evolve only the JavaScript files of their applications, whether they are user interface frameworks, general purpose libraries, MVC frameworks, runtime engines, or programming frameworks. Other types of files evolved equally, with no evident stops, across the various development phases.
Dependencies are considered differently. Non-IoT projects have more dependencies than IoT projects, and 35% of those dependencies are shared among 2 or more non-IoT projects. IoT developers not only use fewer dependencies, but such dependencies are also shared among fewer projects, with only 5% of them shared by two or more repositories. However, dependencies in non-IoT projects mainly represent utilities, while dependencies in IoT projects are more varied and oriented towards software integration tasks. The relatively high number of dependencies used by IoT projects may entail a relatively good maturity of the IoT ecosystem, but the analysis also highlights some issues in sharing the knowledge about the existence of a given dependency.
B. IMPLICATIONS
The aforementioned findings have a number of implications for researchers and practitioners. Researchers should acknowledge the specificity of this domain, and explicitly consider IoT-oriented software engineering as a study branch. More specifically:
1) IoT-oriented tools and methodologies. Given the wide heterogeneity of IoT applications and adopted programming languages, stemming from both the results and the literature, tools like Integrated Development Environments (IDEs) and software methodologies to support IoT developers should be language and platform agnostic, and not constrained to any given technological stack. In addition, research could focus on ways to abstract this heterogeneity, to allow developers to more easily share their IoT-related efforts, code, and documentation.
2) Supporting automation for multiple and diverse deployment targets. The specialization towards shell-oriented languages and their relative immutability, generally related to the configuration and deployment of the software components in a particular execution environment or embedded device, may indicate that execution environments are particularly relevant for IoT development. Research efforts should consider approaches to deal with this device heterogeneity and to automate the generation and execution of deployment commands across several, often incompatible, devices.
3) IoT-specific dependencies sharing mechanisms. Our results report that developers exploit some existing dependencies in their projects, but the same projects do not present common dependencies. Likely, this is due both to the heterogeneity of the IoT projects and to the relatively new and not yet consolidated software community behind those projects. This represents an opportunity for researchers to define novel mechanisms that IoT developers can adopt to make their code more extensible, modular, and reusable, given the peculiarities of the deployment platforms.
Practitioners need to find appropriate ways to handle and share dependencies, as well as to create a more focused software community around these topics. Finally, confirming previous insights in the literature, our results suggest that IoT software development requires skills and expertise in several disparate domains, differently from those required by the development of traditional software. Developers are indeed called to be more creative and able to adapt to different contexts and programming environments. Thus, it would be beneficial for students to have dedicated courses (e.g., similar to the courses reported in [12]) where they could gather these skills to approach the development of IoT applications.
V. THREATS TO VALIDITY
A. SAMPLE VALIDITY
The selection criteria of the analyzed projects aimed to be as neutral as possible with respect to our own judgment. For this reason, we only relied on their number of stars, prioritized them accordingly, and took the 60 top starred ones. Additionally, their IoT and non-IoT nature was determined by the topics that the project owners assigned them. Since tags are freely added by project owners, this might have excluded some potentially interesting IoT projects from our analysis. The only two interventions of our criteria consisted of excluding projects that were not software related or without an open source license. Nevertheless, this selection procedure unintentionally resulted in a strong shift in the non-IoT projects towards web-related frameworks. However, we opted to keep
these selection criteria because, on the one hand, they were replicable and transparent, and on the other hand, they reveal GitHub users’ trends about their interests.
On the other hand, the inclusion of the most starred projects spontaneously resulted in a significant number of files, commits, and an active contributors community. According to Kalliamvakou et al. [10], these variables help to avoid perils while performing software engineering research on GitHub. Moreover, we took inspiration from the methodology adopted by Pascarella et al. [13]. The authors included the same number of projects in their comparative analysis of video games and non-video games OSS projects.
B. FILE CLASSIFICATION VALIDITY
We relied on the statistics provided by the GitHub API concerning the percentage of each programming language in each project. As already mentioned, this measure is calculated by GitHub using the open source Linguist library, which, we assume, provides accurate statistics. However, we could assess the accuracy of such statistics later when computing the percentage of contributors working on a given programming language. We locally cloned each project and, with a text mining tool developed by us, we processed the commits to extract the files modified by each contributor and observed that the results delivered by our tool were consistent with the percentages retrieved through the API.
C. DEPENDENCIES IDENTIFICATION
The GitHub API retrieves the number and list of dependencies only if they are defined in one of the supported manifest file types. These types are only attached to Java, JavaScript, .NET, Python, and Ruby projects. Therefore, to avoid inconsistencies in the analysis of ecosystem maturity, we had to manually explore each project looking for the files where software dependencies and their versions are specified. This manual process, given its complexity, could have led to omissions or mistakes in the identification of the dependencies.
Finally, the higher number of dependencies in the non-IoT projects could depend on the nature of these projects: they are homogeneous in web development and a large number of them have the same primary language (i.e., JavaScript). Given these conditions, it is logical that non-IoT projects share more dependencies among them than IoT projects, which are more heterogeneous.
VI. RELATED WORK
This work lies in the software engineering domain and is intended to provide insights into the peculiarities of IoT development in the OSS context. To the best of our knowledge, no other research has aimed at exploring and analyzing how developers work within several OSS IoT projects. Indeed, various authors have pointed out the need for research on software engineering for IoT systems in view of the several challenges that the development of such systems poses. In the following, we approach the related work from two areas: the needs and challenges of software engineering in the IoT context, and Software Mining research in fields other than IoT.
According to Morin et al. [14], IoT applications have two main characteristics from a software engineering viewpoint. The first is their distribution over a large range of processing nodes. The second is the high heterogeneity of the processing nodes and the protocols used between them. To deal with these characteristics, the authors introduce a modeling language aligned with UML, an advanced multiplatform code generation framework, and a methodology specifying the development processes and tools used by both IoT service developers and platform experts.
Similarly, Čolaković and Hadžialić [15] hold that IoT software architectures and frameworks are necessary to overcome the inherent complexity of IoT systems and to provide an environment for services composition. In their opinion, IoT software platforms should be created as an Open Application Platform to enable modular design as well as to provide an open API (Application Programming Interface) that would easily integrate sensors and other devices.
On the basis that IoT applications have been based on fragmented software implementations for specific systems and use cases, Weyrich and Ebert [16] propose the use of reference architectures as a means to facilitate interoperability, simplify development, and ease implementation.
According to Larrucea et al. [11], no consolidated set of software engineering best practices for the IoT has emerged yet. In the authors’ words, “IoT landscape resembles the wild west, with programmers putting together IoT systems in ad hoc fashion”. They consider that industry needs guidance to engineer the new generation of scalable, highly reactive, often resource-constrained software systems characteristic of the IoT. Among such guidance, the authors remark the need for a new generation of development environments and the training of the new generation of IoT software developers.
Patel and Cassou [17] draw attention to the lack of a software engineering methodology to support the entire IoT application development life-cycle, which results in designs that are platform-dependent and highly difficult to maintain and reuse. To deal with such difficulty, the authors introduce a development methodology for IoT application development, based on model-driven development and involving sensor network macroprogramming techniques.
Regarding IoT projects in OSS, Taivalsaari and Mikkonen [18] hold that nowadays nearly all the component areas of a typical IoT cloud back-end architecture can be constructed from open source technologies. In their opinion, given the availability and maturity of open source components, the role of back-end developers today could be characterized more as software composition or orchestration instead of traditional software development.
Concerning software mining, as mentioned before, the methodology followed in this work took inspiration from the work of Pascarella et al. [13], in the video games OSS context. The authors conducted a study on 60 projects,
and their results confirmed the existence of significant differences between game and non-game development, in terms of how project resources are organized and in the diversity of developers' specializations. Another source of inspiration was the work of Ray et al. [19]: they performed a large-scale study on GitHub about the effect of programming language type and use on software quality. They examined the interactions of language, domain, and defect type through a combination of regression modeling, text analytics, and visualization. Their results suggested that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. However, the authors point out that effects arising from language design are overwhelmingly dominated by process factors such as project size, team size, and commit size. Additionally, they determined that the defect proneness of languages, in general, is not associated with software domains.

VII. CONCLUSION
IoT software development is known to differ from the development of other kinds of applications. It poses several challenges and requires expertise in various areas due to the diverse features that IoT applications expose. In this article, we provide empirical insights into the peculiarities of IoT software development through the analysis of OSS projects. This analysis was structured around two criteria: the behavior of the contributors, and the maturity of the IoT software development ecosystem. Specifically, we conducted an exploratory study mining 30 popular IoT OSS and 30 popular non-IoT OSS projects available on GitHub. Our results are intended to provide evidence about IoT development characteristics (such as the distribution of programming languages, the specialization of contributors, the evolution of the files, and the adopted dependencies) that should be considered by future research efforts aimed at better satisfying software engineering needs in the IoT scenario.

[9] H. Borges and M. T. Valente, ‘‘What’s in a GitHub Star? Understanding repository starring practices in a social coding platform,’’ J. Syst. Softw., vol. 146, pp. 112–129, Dec. 2018.
[10] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, ‘‘The promises and perils of mining Github,’’ in Proc. 11th Work. Conf. Mining Softw. Repositories (MSR). New York, NY, USA: ACM, 2014, pp. 92–101.
[11] X. Larrucea, A. Combelles, J. Favaro, and K. Taneja, ‘‘Software engineering for the Internet of Things,’’ IEEE Softw., vol. 34, no. 1, pp. 24–28, Jan. 2017.
[12] F. Corno and L. De Russis, ‘‘Training engineers for the ambient intelligence challenge,’’ IEEE Trans. Educ., vol. 60, no. 1, pp. 40–49, Feb. 2017.
[13] L. Pascarella, F. Palomba, M. Di Penta, and A. Bacchelli, ‘‘How is video game development different from software development in open source?’’ in Proc. 15th Int. Conf. Mining Softw. Repositories (MSR). New York, NY, USA: ACM, 2018, pp. 392–402.
[14] B. Morin, N. Harrand, and F. Fleurey, ‘‘Model-based software engineering to tame the IoT jungle,’’ IEEE Softw., vol. 34, no. 1, pp. 30–36, Jan. 2017.
[15] A. Čolaković and M. Hadžialić, ‘‘Internet of Things (IoT): A review of enabling technologies, challenges, and open research issues,’’ Comput. Netw., vol. 144, pp. 17–39, Oct. 2018.
[16] M. Weyrich and C. Ebert, ‘‘Reference architectures for the Internet of Things,’’ IEEE Softw., vol. 33, no. 1, pp. 112–116, Jan. 2016.
[17] P. Patel and D. Cassou, ‘‘Enabling high-level application development for the Internet of Things,’’ J. Syst. Softw., vol. 103, pp. 62–84, May 2015.
[18] A. Taivalsaari and T. Mikkonen, ‘‘On the development of IoT systems,’’ in Proc. 3rd Int. Conf. Fog Mobile Edge Comput. (FMEC), Apr. 2018, pp. 13–19.
[19] B. Ray, D. Posnett, V. Filkov, and P. Devanbu, ‘‘A large scale study of programming languages and code quality in GitHub,’’ in Proc. 22nd ACM SIGSOFT Int. Symp. Found. Softw. Eng. (FSE). New York, NY, USA: ACM, 2014, pp. 155–165, doi: 10.1145/2635868.2635922.

FULVIO CORNO (Member, IEEE) has been the Leader of the e-Lite Research Group, since 2002, where he focuses on ambient intelligence systems by integrating novel interaction modalities with the IoT architectures. He is currently a Full Professor with the Department of Control and Computer Engineering, Politecnico di Torino. He is a member of IEEE Computer Society and ACM.