Cloud Notes 2
2.1 Introduction
Over the last few years, the Internet of Things (IoT) has moved from a
futuristic vision to market reality. It is no longer a question of whether IoT will
outlive the hype: it already has, and the race between IoT industry
stakeholders has begun. The IoT revolution comes with trillions of
connected devices; however, the real value of IoT lies in the advanced processing
of the collected data. By nature, IoT data is more dynamic, heterogeneous and
unstructured than typical business data. It demands more sophisticated,
IoT-specific analytics to make it meaningful. Exploiting in the Cloud the data
obtained in real time from sensors is therefore a necessity. This
data processing enables advanced, proactive and intelligent applications and
services. The connection of IoT and BigData can offer: i) deep understanding
of the context and situation; ii) real-time actionable insight; iii) performance
optimization; and iv) proactive and predictive knowledge. Cloud technologies
offer decentralized and scalable information processing and analytics, and
data management capabilities. This chapter describes a Cloud-based IoT and
BigData platform, together with its requirements. This includes multiple
sensors and devices, BigData analytics, cloud data management, edge-heavy
computing, machine learning and virtualization.
In this chapter, Section 2.2 introduces the characteristics of an online Cloud
IoT platform. Section 2.3 shows the challenge posed by the huge amount of
data to be processed, from the point of view of the quality and quantity of
data. It gives an overview of the technologies able to address those challenges.
12 IoT, Cloud and BigData Integration for IoT Analytics
Section 2.4 presents LoRa, a key enabler for data collection. The
chapter also includes initial results of two EU-funded projects on IoT BigData:
WAZIUP in Section 2.5 and iKaaS in Section 2.6.
Cloud-based IoT platforms are usually based on the SaaS paradigm. They
provide IoT-related services through a web interface on a pay-per-use basis. For
example, a service such as Xively (see footnote 1) provides a web service with a database
able to store sensor data points. This data is then processed and displayed in
various charts.
However, SaaS IoT platforms are limited by the capabilities of their web
interface. They do not allow developers to create complex, custom
applications. Extensibility mechanisms are sometimes offered, which allow
the web services to be extended with user-provided callbacks. However, the
resulting application will not be homogeneous and will be difficult to maintain.
Instead, we present in Section 2.5 a concept of an IoT Cloud platform based on the
PaaS paradigm. Developing an IoT BigData application is a complex task. Many
services need to be installed and configured, such as databases, message
brokers and big data processing engines. With the PaaS paradigm, we abstract
away some of this work. The idea is to let developers specify the requirements of
their application in a specification file called the “manifest”. This specification
is read by the PaaS framework, and the application is compiled and
instantiated in the Cloud environment, together with its required services.
1 https://xively.com
Distributed
The platform includes distributed information processing and computing
capabilities, distributed storage, distributed intelligence, and distributed data
management capabilities. These capabilities should be distributed across smart
devices, gateway/server and multiple cloud environments. The processing
capability needs to be migrated closer to users, to save bandwidth.
Scalable
The platform needs to be scalable in order to address the needs of a variable
number of devices, services and users. The data management, storage and
processing services need to be dimensioned dynamically.
Real-Time
The platform needs to be able to process data in real time, i.e. to provide
fast analysis and responses in urgent situations. A real-time data analysis
platform needs to be able to prioritize urgent traffic and processing over
non-urgent ones.
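The prioritization of urgent over non-urgent processing can be illustrated with a priority queue. This is only a minimal sketch: the two priority levels, the `PriorityDispatcher` class and the message texts are assumptions made for the example, not part of any platform described here.

```python
import heapq
import itertools

URGENT, ROUTINE = 0, 1  # hypothetical priority levels; lower is served first


class PriorityDispatcher:
    """Serve urgent messages before routine ones, FIFO within a level."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that keeps arrival order

    def submit(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def drain(self):
        """Yield all queued messages, most urgent first."""
        while self._heap:
            _, _, message = heapq.heappop(self._heap)
            yield message


dispatcher = PriorityDispatcher()
dispatcher.submit(ROUTINE, "hourly temperature report")
dispatcher.submit(URGENT, "gas leak detected")
dispatcher.submit(ROUTINE, "battery level report")
served = list(dispatcher.drain())
print(served)
```

The urgent message is served first even though it arrived second, while the two routine messages keep their arrival order.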
Programmable
The platform shall support programmable IoT business and
service logic, data warehouse schemas, and data and service model templates.
Interoperable
The platform provides interoperability between the different IoT services and
infrastructure. The APIs need to follow the existing standards. The components
are published and maintained as Open Source software. The target is to deliver
a common data model able to exploit both structured and unstructured data. In
order to create multimodal and cross-domain smart applications, it is necessary
to move from raw data to linked data and adopt unambiguous descriptions of
relevant information.
Secure
The platform shall include security and privacy by design. This includes
features such as data integrity, localization, confidentiality and SLAs. Holistic
approaches are required to address privacy and security issues across value
chains.
2.3 Data Analytics for the IoT
Batch Processing
Batch processing assumes that the data to be processed is already stored in a database.
The most widely used tool for this case is Hadoop MapReduce. MapReduce
is a programming model and Hadoop an implementation of it, which allows processing
large data sets with a parallel, distributed algorithm on a cluster. It can run
on inexpensive hardware, lowering the cost of a computing cluster. In Hadoop 2,
MapReduce runs on top of YARN, the cluster resource manager; this generation
is also called MapReduce 2.0. Pig provides a
higher level of programming on top of MapReduce. It has its own language,
Pig Latin, similar to SQL. The Pig engine parses, optimizes and automatically
executes Pig Latin scripts as a series of MapReduce jobs on a Hadoop cluster.
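The MapReduce programming model itself can be sketched in a few lines of plain Python. The example below runs in a single process and does not use Hadoop; the function names and the sample readings are assumptions chosen for illustration. It computes the maximum reading per sensor through the three classic phases: map, shuffle and reduce.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (sensor_id, reading) pair per input record."""
    sensor_id, reading = record
    yield sensor_id, reading


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, max(values)


# Hypothetical stored readings: (sensor id, temperature in degrees Celsius).
records = [("s1", 21.5), ("s2", 19.0), ("s1", 23.0), ("s2", 18.5)]

pairs = [pair for record in records for pair in map_phase(record)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)
```

On a real cluster the map and reduce calls would run in parallel on different nodes, and the shuffle would move data across the network; the logic of each phase stays the same.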
Apache Spark is a fast and general-purpose cluster computing system. It
provides high-level APIs in Java, Scala, Python and R, and an optimized engine
that supports general execution graphs. It can be up to a hundred times faster
than MapReduce because it can keep large
working datasets in memory between jobs, which considerably reduces latency.
It supports both batch and stream processing.
Stream Processing
Stream processing is a computer programming paradigm, closely related to
dataflow programming and reactive programming, which allows some applications
to more easily exploit a limited form of parallel processing. Flink is a
streaming dataflow engine that provides data distribution, communication and
fault tolerance. Its latency is very low because the data are streamed in real time
(row by row). It can run on YARN and works with its own extended version of
MapReduce.
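Row-by-row processing can be sketched with Python generators. The example below is not Flink code; `sensor_stream` is a hypothetical source, cut to four readings so that the example terminates. Each record is handled as soon as it arrives, and a result is emitted without waiting for the full data set.

```python
from collections import deque


def moving_average(stream, window_size):
    """Yield the mean of the last `window_size` values after each new record."""
    window = deque(maxlen=window_size)
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


def sensor_stream():
    """Hypothetical unbounded source, cut to four readings for the example."""
    yield from [10.0, 20.0, 30.0, 40.0]


averages = list(moving_average(sensor_stream(), window_size=2))
print(averages)
```

Because the window keeps only the most recent values, memory use stays constant however long the stream runs.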
Machine Learning
Machine learning is the field of study that gives computers the ability to learn
without being explicitly programmed. It is especially useful in the context
of IoT when some properties of the collected data need to be discovered
automatically. Apache Spark comes with its own machine learning library,
called MLlib. It consists of common learning algorithms and utilities, including
classification, regression, clustering, collaborative filtering and dimensionality
reduction. Algorithms can be grouped into three domains of action: classification,
association and clustering. To choose an algorithm, several criteria
must be considered: scalability, robustness, transparency and proportionality.
KNIME is an analytics platform that lets the user process data through a
user-friendly graphical interface. It allows rapid training and evaluation
of different machine learning models. If the workflow is already
deployed on Hadoop, the Mahout machine learning library can be used.
H2O is a software package dedicated to machine learning, which can be deployed
on Hadoop and Spark. Its easy-to-use web interface makes it
possible to combine BigData analytics with machine learning algorithms
to train models.
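As an illustration of the clustering domain, the following is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional readings, written in plain Python. It does not use MLlib, Mahout or H2O; the function name, the sample readings and the initial centroids are assumptions made for the example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given initial centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, clusters


# Hypothetical temperature readings from two kinds of rooms.
readings = [18.0, 19.0, 20.0, 30.0, 31.0, 32.0]
centroids, clusters = kmeans_1d(readings, centroids=[18.0, 32.0])
print(centroids, clusters)
```

The two groups of readings are discovered without any labels, which is the kind of automatic property discovery mentioned above.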
Data Visualisation
Freeboard offers simple dashboards, which are ready-to-use sets of widgets
able to display data. A direct connector to the FIWARE Orion context broker
is available. Freeboard
offers a REST API for controlling the displays. Tableau Public is
a free service that lets anyone publish interactive data to the web. Once on
the web, anyone can interact with the data, download it, or create their own
visualizations of it. No programming skills are required. Tableau allows the
upload of analysed data in .csv format, for instance. The visualisation
tool is very powerful and allows a deep exploration of the data. Kibana is
an open source analytics and visualization platform designed to work with
Elasticsearch. Kibana allows searching, viewing, and interacting with data
stored in Elasticsearch indices. It can perform advanced data analysis and
visualize data in a variety of charts, tables, and maps. Elasticsearch is a
highly scalable open-source full-text search and analytics engine. It allows
storing, searching, and analyzing big volumes of data quickly and in near real time.
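The kind of request that a dashboard sends to Elasticsearch can be sketched as a search body in the Elasticsearch Query DSL. The example only builds and prints the JSON body; it does not contact a server. The field names `timestamp` and `temperature` are hypothetical, and the `fixed_interval` parameter assumes a recent Elasticsearch version (older versions use `interval`).

```python
import json


def hourly_average_query(field, since="now-24h"):
    """Build a search body: filter on a time range, then average per hour."""
    return {
        "size": 0,  # return only the aggregation, not the matching documents
        "query": {"range": {"timestamp": {"gte": since}}},
        "aggs": {
            "per_hour": {
                "date_histogram": {"field": "timestamp", "fixed_interval": "1h"},
                "aggs": {"mean_value": {"avg": {"field": field}}},
            }
        },
    }


body = hourly_average_query("temperature")
print(json.dumps(body, indent=2))
```

Such a body would typically be sent to the `_search` endpoint of an index holding the sensor data, and the hourly buckets returned could then be plotted as a time series.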
For many ad hoc applications, however, it is more important to keep the cost
of the gateway low and to target small to medium-sized deployments for
specific use cases, rather than the large-scale, multi-purpose deployment
scenarios defined by LoRaWAN. Note that several gateways
can still be deployed to serve several channel settings if needed. In many cases,
this solution is more cost-effective,
as deployment can be incremental, and it also offers a higher level of
redundancy, which can be an important requirement in developing countries, for
instance.
Our LoRa gateway could be qualified as “single connection” as it is built
around an SX1272/76, much like an end-device would be. The cost argument,
along with the observation that highly integrated components are difficult to repair
and/or replace in the context of developing countries, also made the
“off-the-shelf” design orientation an obvious choice. Our low-cost gateway is based
on a Raspberry Pi (1B/1B+/2B/3B), which is both a low-cost (less than 30 euro)
and a reliable embedded Linux platform. Our long-range communication
library supports a large number of LoRa radio modules (most SPI-based
radio modules). The total cost of the gateway can be as low as 45 euro.
Together with the “off-the-shelf” component approach, the software
stack is completely open-source: (a) the Raspberry Pi runs a regular Raspbian
distribution; (b) our long-range communication library is based on the SX1272
library written initially by Libelium; and (c) the lora gateway program is kept as
2.5 WAZIUP Software Platform
2.5.3 Architecture
Figure 2.7 presents the full WAZIUP architecture. There are four silos (from left to
right): application development, BigData platform, IoT platform, and sensors and
data sources. The first silo involves the development of the application itself.
A rapid application development (RAD) tool, such as Node-RED, can be used.
The user provides the source code of the application, together with
the manifest. As a reminder, the manifest describes the requirements of the
application in terms of:
• Computation needs (i.e. RAM, CPU, disk).
• References to data sources (e.g. sensors, Internet sources . . .).
• BigData engines needed (e.g. Flink, Hadoop . . .).
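The three kinds of requirements listed above can be sketched as a small manifest, together with a check that a platform might run before deployment. Everything here is hypothetical: the key names, the application name, the data source identifier and the `validate` function are assumptions for illustration and do not reproduce the actual WAZIUP manifest format.

```python
# Hypothetical manifest; the key names are illustrative, not the WAZIUP schema.
manifest = {
    "name": "soil-moisture-monitor",
    "resources": {"ram_mb": 512, "cpu_cores": 1, "disk_gb": 5},
    "data_sources": ["sensor:field-12/moisture"],
    "bigdata_engines": ["flink"],
}

REQUIRED_KEYS = ("name", "resources", "data_sources", "bigdata_engines")
KNOWN_ENGINES = {"flink", "hadoop"}  # the engines named in the text


def validate(manifest):
    """Return a list of problems; an empty list means the manifest is usable."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in manifest]
    for engine in manifest.get("bigdata_engines", []):
        if engine not in KNOWN_ENGINES:
            problems.append(f"unknown engine: {engine}")
    for need in ("ram_mb", "cpu_cores", "disk_gb"):
        if manifest.get("resources", {}).get(need, 0) <= 0:
            problems.append(f"resource not set: {need}")
    return problems


errors = validate(manifest)
print(errors or "manifest OK")
```

A declarative description of this kind lets the platform decide which services to instantiate, instead of the developer installing and configuring them by hand.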
2.5.4 Deployment
WAZIUP will be deployed and accessed in an African context, where Internet
access is sometimes scarce. WAZIUP therefore has a very strong constraint
2.6 iKaaS Software Platform
The iKaaS platform consists of two distinct Cloud ecosystems: the Local
Cloud and the Global Cloud. More specifically:
• A Local Cloud provides requested services to users in a limited geographical
area. It offers additional processing and storage capability to
services. It is created on demand, and comprises appropriate computing,
storage and networking capabilities.
• The Global Cloud is seen in the “traditional” sense, as a construct with
on-demand and elastic processing power and storage capability. It is a
“backbone infrastructure”, which increases the business opportunities
for service providers as well as the ubiquity, reliability and scalability of service
provision.
Local Clouds can involve an arbitrarily large number of nodes (sensors, actuators,
smartphones, etc.). The aggregation of their resources provides sufficient
processing power and storage space. The goal is to serve users in a certain
area. In this respect, a Local Cloud is a virtualised processing, storage and
networking environment, which comprises IoT devices in the vicinity of the
users. Users will exploit the various services composed from the capabilities of
the Local Cloud’s devices. An example is a sensor and its gateway equipped with the
iKaaS platform.
The Global Cloud allows IoT service providers to exploit larger scale
services without owning actual IoT infrastructure.
The iKaaS Cloud ecosystem will encompass the following essential
functionality:
• Consolidated service-logic, resource descriptions and registries will be
parts of the Global Cloud. These will enable the reuse of services.
Practically, a set of registries will be developed and pooling of service
logic and resources will be enabled.
• Autonomic service management will be part first of the Global
Cloud and then of the Local Clouds. This functionality will be in
charge of (i) dynamically understanding the requirements and decomposing
the service (finding the components that are needed); (ii) finding the
best service configuration and migration (service component deployment)
pattern; and (iii) during service execution, reconfiguring the
service, i.e., conducting dynamic additions, cessations and substitutions of
components.
• Distributed data storage and processing is anticipated for the structure
of global and local clouds. This means capabilities for efficiently
services like monitoring the blood pressure, monitoring the heart rate, weight
monitoring, location awareness, smart lighting, utility metering, notification
and reminders, etc., across health, well-being, security and home automation
domains. Additionally, as an IoT and BigData application
evolves, more and more services are added to the applications/systems.
Therefore, it is important to design the iKaaS services to be as small and autonomous
as possible, with well-defined APIs to operate them individually.
The iKaaS functional decomposition of an application/complex service (as
defined in the previous sub-section) makes it possible to achieve loose coupling and
high cohesion of multiple services. Conversely, multiple simple services can
be composed into complex services for the purposes of various applications.
Figure 2.10 shows the basic logic of service decomposition and composition.
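Composition of simple services into a complex one can be sketched with plain functions. This is not iKaaS code: the two simple services, their thresholds (50 to 120 bpm, 140 mmHg) and the `compose` helper are hypothetical and chosen only to illustrate the idea.

```python
def heart_rate_service(sample):
    """Simple service: flag a heart rate outside a hypothetical 50-120 bpm range."""
    return {"heart_rate_alert": not 50 <= sample["heart_rate"] <= 120}


def blood_pressure_service(sample):
    """Simple service: flag systolic pressure above a hypothetical 140 mmHg."""
    return {"blood_pressure_alert": sample["systolic"] > 140}


def compose(*services):
    """Build a complex service that runs each simple service and merges results."""
    def complex_service(sample):
        merged = {}
        for service in services:
            merged.update(service(sample))
        return merged
    return complex_service


health_monitoring = compose(heart_rate_service, blood_pressure_service)
report = health_monitoring({"heart_rate": 130, "systolic": 120})
print(report)
```

Each simple service knows nothing about the others, so any of them can be replaced, reused in another composition, or deployed separately.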
Functional decomposition gives individual services the agility, flexibility and
scalability to operate autonomously. Each simple
service runs in its own process and communicates through lightweight
mechanisms. The overall high-level service logic (e.g. a software module) is
decomposed into multiple service logics or software modules, which can be
delivered as independent runtime services. These services are built around
business capabilities and are independently deployable by fully automated
deployment machinery. There is a bare minimum of centralized management
Figure 2.12 iKaaS distributed local and global cloud with service migration.
in such systems is to decide where and when services should be migrated with
respect to users’ mobility, the overall situation and the environment context.
Edge/local computing can provide elastic resources to large-scale data
processing systems without suffering from the main drawback of the cloud, high latency.
In the cloud computing paradigm, events or data are transmitted to a data
centre inside the core network, and the result is sent back to the end user after a
series of processing steps. A federation of fog and cloud can handle BigData
acquisition, aggregation and pre-processing, reducing data transport
and storage and balancing computation power for data processing. For example,
in a large-scale environment monitoring system, local and regional data can be
aggregated and mined at fog nodes, providing timely feedback, especially for
emergency cases such as a toxic pollution alert. Detailed and thorough analysis,
being computationally intensive, can be scheduled on the cloud side.
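The division of work between fog nodes and the cloud in the monitoring example can be sketched as follows. The alert threshold, the function names and the sample readings are assumptions made for illustration. Each fog node raises alerts locally and forwards only a compact summary, and the cloud side combines the summaries.

```python
from statistics import mean

ALERT_THRESHOLD = 100.0  # hypothetical pollutant level that triggers an alert


def fog_node(readings):
    """Aggregate raw local readings; return (summary for cloud, local alerts)."""
    summary = {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
    }
    alerts = [r for r in readings if r > ALERT_THRESHOLD]
    return summary, alerts


def cloud_side(summaries):
    """Heavier analysis placeholder: combine regional summaries into one mean."""
    total = sum(s["count"] for s in summaries)
    return sum(s["mean"] * s["count"] for s in summaries) / total


region_a, alerts_a = fog_node([40.0, 60.0, 140.0])
region_b, alerts_b = fog_node([20.0, 40.0])
global_mean = cloud_side([region_a, region_b])
print(alerts_a, alerts_b, global_mean)
```

Only three numbers per region travel to the cloud instead of every raw reading, while the alert is available at the fog node without a round trip to the data centre.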
Acknowledgement
This work has been produced in the context of the H2020 WAZIUP as well
as H2020 iKaaS projects. The WAZIUP/iKaaS project consortium would
like to acknowledge that the research leading to these results has received
funding from the European Union’s H2020 Research and Innovation Program
(H2020-ICT-2015/H2020-ICT-2015).