
Stream Processing on Clustered Edge Devices


Rustem Dautov∗, Salvatore Distefano†‡
∗ SINTEF Digital, Oslo, Norway

[email protected]
† Università di Messina, Messina, Italy
‡ Kazan Federal University, Kazan, Russian Federation
[email protected], [email protected]

Abstract—The Internet of Things continuously generates avalanches of raw sensor data to be transferred to the Cloud for processing and storage. Due to network latency and limited bandwidth, this vertical offloading model, however, fails to meet the requirements of time-critical data-intensive applications, which must act upon generated data with minimum time delays. To address this limitation, this paper proposes a novel distributed architecture enabling stream data processing at the edge of the network, broadening the principle of processing closer to data sources adopted by Fog and Edge Computing. Specifically, this architecture extends the Apache NiFi stream processing middleware with support for run-time clustering of heterogeneous edge devices, such that computational tasks can be horizontally offloaded to peer devices and executed in parallel. As opposed to vertical offloading to the Cloud, the proposed solution does not suffer from increased network latency and is thus able to offer 5-25 times faster response time, as demonstrated by the experiments on a run-time license plate recognition system.

Index Terms—Internet of Things; Edge Computing; Big Data; Stream Processing; Horizontal and Vertical Offloading; Apache NiFi; License Plate Recognition.

I. Introduction, Motivation, and Contribution

The traditional Cloud-centric data processing model adopted by the Internet of Things (IoT) is only suitable for scenarios with rather relaxed time constraints, as it fails to meet pressing requirements in terms of reaction time and network latency, especially in the presence of considerably big data streams. As time-critical IoT applications and services demand near real-time data processing and reaction, they cannot rely on (potentially outdated) results obtained by sending data over the network to a remote processing location. This becomes particularly challenging in the context of the bandwidth-constrained wireless connections ubiquitously present at the edge of IoT networks, thus limiting the disruptive potential of the IoT. Aiming to address this challenge, the Fog Computing paradigm still remains limited in its capacity to support processing of extreme amounts of continuously flowing data in a ubiquitous manner similar to the Cloud, whereas network latency (albeit much lower) is still present. Supported by the ever-growing processing capabilities of devices at the network edge, the Edge Computing paradigm aims at pushing intelligence to devices that not only provide sensing and actuation resources, but also act as computational nodes in their own right. This way, execution of processing tasks immediately after data are generated, as well as reduction of network traffic and latency, can be enabled by exploiting the edge devices' own processing capabilities.

In this light, this paper further extends the scope of Edge Computing towards a wider range of data-intensive IoT application scenarios, where computational tasks can be offloaded to collocated edge devices. Specifically, it proposes a novel distributed Stream Processing architecture to enable horizontal offloading at the edge, by clustering devices and utilizing a shared pool of their contributed resources to process computational tasks offloaded by peers, i.e. Clustered Edge Computing, as opposed to the traditional vertical offloading to Fog/Cloud nodes. By pushing intelligence to the very edge of the network, as close to the data source as possible, the proposed architecture aims to minimize the amount of data sent to a remote server, reduce network latency, and thus achieve faster processing results. Furthermore, considering the storage limitations of edge devices, the proposed architecture also benefits from in-memory (stream) processing to minimize the number of disk I/O operations.

The proposed approach is implemented as the Edge Cluster Stream Processing (ECStream Processing) middleware, enabling time-constrained data-intensive applications to be entirely deployed and executed at the edge. Accordingly, the contribution of this paper is three-fold: i) a dynamic horizontal offloading pattern for distributed data processing on clustered edge devices, able to deal with node churn at run-time; ii) a decentralized Stream Processing architecture, extending the Apache NiFi middleware with new clusterization services for in-memory processing of data streams on clustered edge devices, distributing modules among clustered nodes. The proposed approach goes beyond the traditional data parallelism model (e.g. MapReduce) towards a task parallelism (pipeline) model, wherein atomic tasks are offloaded to peer edge devices, rather than the full workflow, as in traditional data parallelism; iii) a comparison of stand-alone local processing against Cloud (vertical offloading) and Clustered Edge Computing (horizontal offloading) through a case study, to demonstrate the viability of the approach and its potential exploitation in cluster capacity planning.
The remainder of the paper describes the ECStream Processing approach, by first outlining the research context, resorting to a motivating video processing application domain, and then reviewing the literature to highlight the main limitations of existing solutions (Section II). As a potential way of addressing such limitations, Section III presents the main aspects of ECStream Processing, including the stack architecture and the corresponding clusterization workflow. Section IV describes preliminary clusterization stages, while core activities are detailed in Section V (discovery and selection) and Section VI (deployment and operation). A preliminary implementation of the ECStream Processing Cluster middleware is proposed in Section VII, while Section VIII describes a case study on a run-time license plate recognition system, deployed on clustered edge devices available on-board a vehicle. This way, ECStream Processing is compared to Cloud-based vertical offloading models through experiments with promising results and feedback. Section IX concludes the paper with final remarks and an outlook for future work.

II. Research Context and Related Works

Delayed data analysis and feedback generation often cannot be tolerated by critical systems, which rely on timely (i.e. quasi real-time) operation. These limitations primarily affect application domains involving, e.g., live video analysis, where multiple image sensors, independently or combined, continuously capture video streams for online processing at the intersection of IoT and Fog/Edge Computing [1]. Examples include intelligent surveillance systems (e.g. object/face detection and recognition), smart mobility (e.g. dashcams and infotainment systems), Industry 4.0 (e.g. machine vision for product/equipment surface inspection and staff tracking), emergency management, robotics, etc.

Fig. 1: A generic video processing workflow [2] (image frames feed object detection; detected objects and their features flow to object tracking and object classification, whose trajectories and object types in turn feed alarm generation, behaviour analysis, and video indexing).

A typical video processing workflow can be conceptually split into several steps, as depicted in Fig. 1. Various objects, present in input frames, are first detected and then classified (i.e. recognized) according to a background knowledge base. At the same time, detected objects might be tracked through a sequence of frames to analyze object behavior and actions. Another element of such systems is some kind of notification/alarming, as well as storing of intermediate processing results and video indexing. Depending on the system architecture, different processing steps can take place at various physical and logical locations. For example, the initial object detection can take place immediately at the source, whereas more complex operations are undertaken on a remote server.

Raw video streams, continuously recorded by relatively modern capture devices, constitute a rather big data set from the perspective of an edge device and, consequently, can be treated as a Big Data challenge, hardly manageable by the device on its own (unless specifically conceived for that), thus becoming a perfect motivating scenario for the present work. A common technological trend to address such a limitation is to resort to vertical offloading, also looking for the right balance between the low network latency of the Fog [3], [4], [5] and the high computing capabilities of the Cloud [6], [7], [8] by hierarchical resource allocation and orchestration architectures that transparently provision containerized resources. These are often coupled with optimization techniques and algorithms as in [9], [10], where the resource allocation problem for optimal placement of video analytics queries in such a hierarchy is formulated. However, the performance of such client-server architectures is affected by the network connection, worsening with the number of hops – a limitation hardly addressable by vertical offloading approaches due to the inevitable requirement to send data remotely. In these circumstances, minimization of the amount of data transferred over the network comes as a natural fit and is acknowledged as one of the main concerns for the IoT research community [11].

As a next step towards keeping computation local, Edge Computing enabled data filtering and aggregation, as well as relatively simple processing, to be executed on edge devices themselves as part of a more complex data processing workflow, the rest of which is expected to be accomplished by Fog and Cloud nodes [12]. Until recently, a shortcoming of the existing approaches focusing on data processing at the edge [13], [14] was the lack of support for pooling the computing resources of multiple collocated edge devices, which only became possible with the recent advances in hardware and networking technologies. As a result, there are existing works [15], [16] demonstrating how edge devices can be clustered and managed through middleware at run-time, thereby achieving even lower latency. Similar to the Cloud- and Fog-level coordination, these approaches rely on equipping edge nodes with agent-like virtual containers to enable orchestration and management. This way, edge devices are able to communicate with each other to split, delegate, and share processing tasks.

Existing initial attempts to enable collaborative processing among edge devices by means of horizontal offloading typically rely on a central (Fog) coordinator [8], [17], which is in charge of cluster establishment and management. Clustering and orchestration of edge devices using hybrid, hierarchical Cloud-Fog-Edge offloading techniques are also proposed in [8], [18], [19], still via centralized Fog/Cloud-based coordination. This limitation is partially addressed by recent works focusing on Mobile Cloud Computing [20], Mobile Ad-hoc Clouds [2], [17], Mobile Edge Computing [21] and Cyber Foraging [22], which are able to pool the resources of collocated mobile phones and IoT devices into cloudlets [17] or fog colonies [8], to support distributed processing.
Similar to their centralized predecessor, Mobile Grid Computing, such approaches do not address the heterogeneity of edge IoT devices, restricting their scope only to mobile devices and providing basic infrastructure clustering mechanisms (e.g. pooling), not able to dynamically connect, discover, select, and orchestrate devices. In more generic IoT contexts, Multi-access Edge Computing [21] addresses networking issues by extending Mobile Edge Computing with support for wireless (radio) connectivity at the edge.

As opposed to these existing approaches, the research effort presented in this paper aims at enabling a decentralized architecture, where participating clustered edge devices can act as both cluster initiators/coordinators and worker nodes. This architecture takes into account the mobile and heterogeneous nature of edge devices and enables dynamic discovery, selection and management of suitable nodes at run-time. Similar ideas are proposed and discussed, albeit at a more conceptual level, in [23], [24], where the authors motivate horizontal offloading at the edge and outline a high-level architecture of a future system. As explained below, the proposed approach builds upon an existing Stream Processing middleware for in-memory data analytics, currently designed for static cluster configurations, and extends it with mechanisms for dynamic clustering and task offloading at system run-time. This approach goes beyond the traditional data-parallel processing model (i.e. MapReduce) and is able to 'unpack' Stream Processing workflows into finer-grained atomic tasks, thereby adopting a task-parallel processing model on clustered edge devices.

III. ECStream Processing

Fig. 2: IoT data offloading and processing patterns (a pyramid of billions of edge devices, millions of Fog nodes, and thousands of Data Center/Cloud facilities).

The challenges raised by data-intensive and time-critical IoT scenarios, such as online video processing, call for a solution bridging the infrastructure and software layers. Such a solution is expected to foster the convergence of multiple paradigms, spanning across Edge, Fog, and Cloud Computing in a computing continuum (see Fig. 2), coupled with Big Data (batch/stream) processing techniques. To involve edge devices in this continuum, Edge Computing has to be enhanced with clustering techniques, extending its application domain towards Clustered Edge Computing (CEC). Combined with in-memory Stream Processing, CEC paves the way for the proposed Edge Cluster Stream (ECStream) Processing approach, resulting in a flexible solution for time-constrained IoT data processing on a cluster of collocated edge devices. This way, data processing is no longer Fog/Cloud-centric, but is rather Edge-centric – i.e. the workload is distributed among clustered edge nodes to avoid network latency and improve performance, while the Fog/Cloud servers remain as secondary processing/storage locations. This is highlighted in Fig. 2, where traditional vertical offloading to Fog and Cloud nodes (black dashed lines) is extended with horizontal offloading (white dashed lines) enabled by CEC.

Fig. 3: ECStream Processing stack (four layers, bottom-up: IoT Infrastructure with Processing (CPU+RAM), Storage, Networking, and Sensing & Actuation; OS & Execution Environment; Cluster Middleware with Task Partitioner, Node Discovery, Node Selector, Task Scheduling & Synchronisation, AAA & Incentive Mechanisms, ZooKeeper Placement & Orchestration, Load Balancing, Data Provenance, Backup & Recovery, and Networking & Communication; and Applications & Services; the legend distinguishes default, extended, and new components).

A. Stack Architecture

The envisioned ECStream Processing has to deal with run-time clusterization, task decomposition, distribution, scheduling and orchestration, resource and data management, serialization and synchronization. Furthermore, since edge devices are usually resource-constrained, specifically in terms of storage facilities, a light-weight solution based on in-memory, online processing of continuously streaming raw data has to be adopted. To this purpose, among multiple available open-source options,1 we opt for Apache NiFi2 – a light-weight, customizable, fault-tolerant Stream Processing framework. As opposed to more widely used implementations, such as Apache Storm, Spark, or Flink, the main advantage of NiFi is its low footprint – i.e. the smallest NiFi agent, written in C++ and specifically tailored to IoT devices, consumes as little as 5MB of memory. Based on the concept of Flow-Based Programming, NiFi allows defining control logic as a workflow composed of multiple interconnected processing steps (i.e. processors). The built-in set of NiFi processors ranges from simple mathematical operations, data translation or format conversion, to more complex analytical operations, and can be further extended with user-customized processors. NiFi features also include support for cluster management (i.e. ZooKeeper), scheduling algorithms, data serialization, backup and replication, network communication, monitoring, accounting, authentication and authorization (AAA), security and privacy using TLS encryption, and improved usability (e.g. an IDE for visual workflow design).
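For concreteness, a user-customized processor has roughly the following shape (an illustrative sketch against the NiFi 1.x Java API, not code from the middleware itself; the class name FrameGrayscaleProcessor and its pass-through body are hypothetical stand-ins for one atomic step of a video workflow):

import java.util.Set;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

// Hypothetical custom processor: one atomic step of a video workflow.
public class FrameGrayscaleProcessor extends AbstractProcessor {

    static final Relationship SUCCESS = new Relationship.Builder()
            .name("success").description("Converted frames").build();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session)
            throws ProcessException {
        FlowFile flowFile = session.get();   // take the next frame from the input queue
        if (flowFile == null) {
            return;                          // nothing queued at this trigger
        }
        // Rewrite the flowfile content in place (actual grayscale conversion omitted).
        flowFile = session.write(flowFile, (in, out) -> in.transferTo(out));
        session.transfer(flowFile, SUCCESS); // hand over to the downstream queue
    }
}

In the clustered setting described in this paper, each such processor would correspond to one offloadable atomic task.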
To implement the ECStream Processing stack, we aimed to build upon existing functionality, making use of available NiFi features wherever possible, extending built-in ones or developing new ones specifically conceived for the clustering and management of edge devices. Fig. 3 shows the resulting four-layer ECStream Processing stack architecture, differentiating between i) completely novel (darker boxes with bold italic labels), ii) existing and extended (yellow boxes with italic labels), and iii) existing and taken as-is (white boxes) components (bottom-up):

1 https://github.com/manuzhang/awesome-streaming
2 https://nifi.apache.org/
1) IoT Infrastructure is no longer restricted to traditional servers, but also includes edge devices with their sensing and actuation facilities. In the video processing context taken as a reference, cameras and other smart devices (e.g. smartphones) with processing, networking and sensing capabilities can be part of the infrastructure.

2) OS & Execution Environment serves as a unified platform for deploying and running middleware and software on top of the heterogeneous infrastructure. IoT heterogeneity is still an open issue, since edge devices can differ in both hardware and software capabilities. A well-established solution is containerization [25] – i.e. a light-weight form of virtualization, allowing multiple independent applications to run in 'sandboxed', isolated environments, thus achieving interoperability, while ensuring the security and privacy crucial for the IoT.

3) Cluster Middleware is the core of the ECStream Processing stack. It is deployed on the resulting (homogeneous) execution environment and provides clusterization functionality. It extends the original NiFi architecture with six components, three of which are brand new (Task Partitioner, Node Discovery and Node Selector), while the rest are existing modules enhanced with CEC-oriented features. Specifically, Networking & Communication is extended with support for overlay networking, ZooKeeper Placement & Orchestration is enhanced with functionality for deploying and orchestrating distributed tasks in edge clusters, and AAA & Incentive Mechanisms include new primitives for subscription management (e.g. user contribution profiles) and incentive mechanisms (e.g. reputation systems, gamification, social incentives, financial rewards).

4) Applications & Services benefit from ECStream Processing results according to their business logic, even issuing downstream feedback actuation commands, thus promptly closing the loop to meet application time constraints.

B. ECStream Processing Workflow

The clusterization workflow is depicted in Fig. 4 and is executed by the ECStream Processing stack shown in Fig. 3. It continuously loops to dynamically adapt the edge cluster configuration to potential issues (due to node churn) at run-time, thus requiring all the activity steps to be performed online and demanding a light-weight, low-latency implementation.

Clusterization is initiated by an edge node, the Initiator, willing to offload computational tasks to peers. To this purpose, it may be required to first establish a connection between nodes through a Cluster Enabling operation performed by a bootstrap node, the Network Manager. Once the connection is established, clusterization is triggered by the Initiator, broadcasting offloading requests to edge nodes as part of the ECStream Processing Discovery service. In a video processing application, the camera usually acts as the Initiator, delivering offloading requests to nearby network nodes. Upon receiving a request, eligible devices in the Initiator range can decide whether to support it. If so, they will then go through the Selection stage, driven by various factors such as mobility patterns, potential security issues, physical and network distances, etc. The Initiator then assigns tasks to the selected nodes by sending them configuration parameters at the Placement and Configuration step. If there are at least 3 nodes in the cluster (i.e. a minimum sufficient number to run a leader election protocol), a Coordinator is contextually elected among the nodes. In case there is already an elected Coordinator, only an Ack is replied to the configuration message. For the sake of simplicity, Fig. 4 assumes that the roles of Initiator and Coordinator are undertaken by the same node. This way, the cluster is established and configured, ready to run offloaded tasks in parallel on its nodes (Processing), supervised by the customized ZooKeeper Placement & Orchestration module that performs Orchestration & Lifecycle Management. All these steps are explained below in more detail.

Fig. 4: Sequence diagram of the clusterization process (the Network Manager, the Initiator/Coordinator, and worker nodes 1..n exchange Join Network / Network Setup messages during Cluster Enabling, then loop through Task Partitioning, Discovery (offloading requests answered with Ack/NAck), Selection, Placement and Configuration (Configure (& Elect) answered with Ack (| Coordinator)), and Processing (Run / Return / Reduce) under Orchestration & Lifecycle Management).

IV. Preliminary Stages

A. Cluster Enabling

To support clusterization, networking issues have to be addressed first. Constituted by multiple mobile and portable smart devices that can move across different geophysical and network locations, the IoT ecosystem is very dynamic in its nature. Since the dynamic nature of such topologies is underpinned by wireless connectivity coupled with mobility patterns, possibly inducing the traversal of different network domains, it is important to take into account issues such as the (sudden) introduction of address/port translators or security-oriented appliances (e.g. firewalls) between nodes, which may outright block or significantly modify inter-node communications, hindering the process of node discovery and clusterization. To this end, the built-in Networking & Communication module (see Fig. 3) is extended with support for ad-hoc topologies and overlay networking facilities.
This further improves the cluster discovery range, as well as node reliability, allowing edge nodes traversing different subnets to be enrolled and kept, even in the presence of network barriers that might otherwise impede their direct interaction.

To implement this, we build on existing work [25], [26] that enables (transparent) network communications between nodes in different subnets via overlay networks. As depicted at the top of Fig. 4, an (overlay) Network Manager (NM) gets contacted by other nodes aiming to establish an always-on command-and-control stream of messages, compliant with the WebSocket-based Web Application Messaging Protocol. Network barriers are overcome through WebSocket-based (reverse) tunnelling, piercing 'middle boxes' to implement overlay networks among edge nodes and to transport node-initiated network tunnels. Specifically, transparent Layer-3 (L3) networking is enabled by the NM, which instantiates, manages, and routes tunnels to each node during clusterization, as well as during the actual data processing afterwards.

Referring to the IoT video processing workflow, an example of cluster enabling is shown in Fig. 5(a), where nodes with network restrictions, although Internet-connected, send a join request to the NM, which replies with a setup configuration to establish WebSocket reverse tunnelling and, as a result, enable communication with other nodes.
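As a rough illustration of this join handshake, the sketch below uses the standard java.net.http.WebSocket client available since Java 11; it is not the authors' implementation, and the NM endpoint ws://nm.example:9090/join, the node ID, and the message format are all assumed for illustration (WAMP framing and tunnel bring-up are omitted):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

// Minimal sketch: an edge node contacts the Network Manager to request
// overlay network setup over an always-on WebSocket control stream.
public class JoinNetworkClient {

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        WebSocket ws = client.newWebSocketBuilder()
                .buildAsync(URI.create("ws://nm.example:9090/join"), new Listener())
                .join();
        // Announce this node; the NM is expected to answer with a setup
        // configuration for reverse tunnelling.
        ws.sendText("{\"op\":\"join\",\"nodeId\":\"SBC_1\"}", true);
    }

    static class Listener implements WebSocket.Listener {
        @Override
        public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
            System.out.println("NM setup message: " + data);
            ws.request(1); // ask for the next frame on the control stream
            return null;
        }
    }
}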
B. Partitioning

Our target scenario assumes that a running edge node cannot meet the processing requirements of a data-intensive application due to some resource constraints, and opts for sharing computational tasks, thereby triggering offloading at run-time, not at design/deployment time. It is also assumed that the initial setup and the requirements for the tasks to be offloaded are known and will further drive the partitioning activities by the Initiator. From the horizontal offloading perspective, such tasks can be allocated to nearby devices composing a cluster. To do so, the original (complex) application logic has to be decomposed into simpler tasks tailored to edge device capabilities, thus treating the original application as a sequence of atomic data processing operations. Furthermore, it is also mandatory to identify the requirements of the partitioned tasks, driving the discovery and selection steps to cluster matching devices.

Admittedly, partitioning is challenging to implement in an automated manner, as it cannot be generalized to any class of problems, since each sequential algorithm usually requires a specific partitioning model. Even restricting the scope to our target domain (i.e. data-intensive video processing applications) may not be enough, since it is required to go beyond pure data parallelism to let resource-constrained edge devices run simpler stream processing tasks, rather than the full workflow. It is therefore necessary to identify and split concurrent blocks of the target sequential algorithm, and then apply a partitioning strategy that can possibly combine different task decomposition models. Admittedly, this requires deep knowledge of the target application, and the descriptions of decomposed tasks have to include both semantic (e.g. information to be exchanged or processed, functionality to be performed) and syntactic (e.g. data structure and format) aspects. Taking these self-describing building blocks, the system can then chain complex workflows, and validate information flows, input data and output results automatically. Software composability and task decomposition are partially explored in the literature [27], [28] and deserve further investigation, since partitioning is still an open problem, especially in the context of IoT and Edge Computing scenarios.

On this premise, a convenient solution for partitioning a (NiFi) stream processing workflow proposed in this paper is to first identify atomic tasks in a workflow and then parallelize them. This is quite a trivial approach, since a stream processing workflow mainly consists of task sequences, conditional branches and parallel fork-join constructs without loops [29]. The resulting workflow, decomposed into a sequence of atomic tasks, will then be executed by exploiting a pipeline parallel model [30] on clustered edge devices, thus maximizing the throughput. In the reference video processing scenario, a complex workflow can be partitioned as shown in Fig. 1, and a task to be offloaded could request devices equipped with GPUs, optimized for such kind of processing. Further details on the underlying partitioning algorithm can also be found in [31].
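To make the outcome of partitioning tangible, the following sketch models a video workflow as a task sequence with per-task requirements (illustrative only; the Task and Requirement types and all task names are hypothetical, anticipating the tuple notation introduced in Section VII):

import java.util.List;

// Illustrative model of a partitioned workflow: each atomic task carries
// the requirements that later drive discovery and selection.
record Requirement(String name, String op, Object value) {}
record Task(String id, List<Requirement> reqs) {}

public class PlateRecognitionWorkflow {
    // The generic video pipeline of Fig. 1 reduced to a task sequence:
    // sampling -> detection -> recognition -> notification.
    public static List<Task> partition() {
        return List.of(
            new Task("T1_sample", List.of(new Requirement("Camera", "=", true))),
            new Task("T2_detect", List.of(new Requirement("GPU", "=", true),
                                          new Requirement("RAM", ">=", 1))),
            new Task("T3_ocr",    List.of(new Requirement("CPU", ">=", 1),
                                          new Requirement("OS", "=", "Linux"))),
            new Task("T4_notify", List.of(new Requirement("Network", "=", "wired")))
        );
    }
}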
V. Discovery and Selection

A. Discovery

To integrate edge devices into a common cluster, it is required to discover them on the network first. The discovery process should happen dynamically at run-time, since many edge devices are expected to be mobile (e.g. smartphones, tablets, and other hand-held portable devices), i.e. joining and leaving the wireless network unpredictably. This becomes particularly challenging as far as edge devices with sensing/actuating capabilities are concerned – i.e. as opposed to more traditional nodes, these need to be (semantically) described to be discoverable.

The discovery process can be conceptually split into two sequential steps: network discovery and functional discovery. The network discovery mainly focuses on finding connected nodes reachable by the Initiator. Apart from TCP/IP-based networks (i.e. WiFi and Ethernet), it is also possible to discover peer devices through other wireless channels, such as Bluetooth, possibly implementing discovery in parallel and even hierarchical ways [29].

Once an edge node is discovered, the functional discovery checks whether the node is eligible for running a task among those identified by partitioning. The presence of a wide range of heterogeneous edge devices does not guarantee that all of the available discovered nodes will necessarily be capable of processing the current workload, for a number of reasons (e.g. missing hardware/software components, low computational capabilities, high network latency, etc.).
Fig. 5: An example of the ECStream Processing algorithm applied to the IoT video processing workflow: a) Cluster Enabling – nodes send setup/join requests to the NM, establishing reverse tunnelling over the Internet or ad-hoc/local links; b) Node Discovery – the Initiator broadcasts offloading requests, answered with Ack/Nack; c) Node Selection; d) Placement and Configuration – task allocation, node configuration, and Coordinator election acknowledgments.
In these circumstances, it is important to first check whether a particular node is indeed suitable for processing a given task – that is, to check its functional suitability for a task. To enable such kind of match-making, it is expected that previously discovered network nodes reply with a self-description specifying the provided resources and services (see Listing 1 for an example). Resource properties have to match task requirements for proper allocation, to ensure the node can provide the expected hardware resources and software components, e.g. a specific sensor/actuator, type of power supply (power line vs battery), network connection (wired vs wireless), mobility pattern (static vs mobile), security and privacy mechanisms, etc. For example, nodes may not be equipped with the relevant image recognition software or a camera, or may not have sufficient battery charge, and, therefore, will not acknowledge their suitability to participate in the given scenario.

The example shown in Fig. 5(b) depicts the discovered devices in the IoT image processing example. The network discovery, initiated by a node requiring task offloading (i.e. the Initiator smart camera), only discovers Internet-connected devices, exploiting the NM mediation for edge nodes with network restrictions. Grey nodes include both non-Internet-connected devices and unavailable ones, rejecting the discovery request through a Nack.

B. Selection

The functional compliance check performed by potential cluster nodes during discovery is not yet enough to establish a cluster. Network nodes have a limited view on the arrangement of a cluster – that is, they are only able to evaluate their individual functional capabilities to address task requirements, but not their suitability to be engaged in a cluster. For example, a device might be equipped with sufficient hardware resources, as well as image processing software (i.e. thus meeting task requirements). However, it might turn out that, due to its network location and configuration, the network latency between the cluster Initiator and this node is unacceptably high, which might become a bottleneck in the future. Admittedly, the node itself is not expected to be aware of this context-related information, which becomes known only to the Initiator, once it collects node acknowledgments.

Accordingly, the selection of edge nodes becomes an important duty of the Initiator, which collects replies from all the nodes and, therefore, has a global view on the system, including context-related information. While discovery takes into account single task requirements when identifying a node, selection considers global policies to further filter the previously discovered nodes, aiming to achieve a balanced and robust topology (see Listing 2 below for an example). The Initiator has to evaluate the available nodes, which acknowledged its offloading requests, with respect to their suitability to global selection policies. Upon receiving acknowledgments, the Initiator may follow a policy to select, e.g., only those devices that exhibit sufficient computing capabilities to process a task, whereas less powerful ones are to be excluded. Such a selection procedure also serves to 'homogenize' and balance the future cluster, so that it is composed of nodes relatively equal in their computing capabilities and network latency (i.e. to avoid delayed processing by weaker nodes and further de-synchronization). Selection policies might also include costs, which are strictly related to the incentive mechanisms. In this case, the Node Selector component of the ECStream Processing framework in Fig. 4 also has to interact with AAA & Incentive Mechanisms to enforce a selection policy taking into account credits, rewards and related technologies.

Noteworthy, during the selection process, more than one node could meet the requirements of a task and, vice-versa, the same node could meet the requirements of multiple tasks. However, since we need to implement pipeline parallelism, at most one task can be assigned to a node to maximize the pipeline speedup.
Allocating tasks to nodes is quite challenging – a problem known as mapping in parallel computing, which, even in the presence of constraints, falls into the class of NP-hard generalized assignment problems. To solve the ECStream Processing mapping of tasks to edge devices subject to the pipeline constraint, analytical solutions thus cannot be a valid option, even with a low number of nodes (tens, as in CEC). To this purpose, some heuristics, usually based on greedy approximation algorithms (e.g. first-/best-/worst-fit allocation) with polynomial time complexity, already applied in Edge Computing contexts [32], can be adopted. Further investigation of this problem can be found in [33].

Fig. 5(c) depicts how the Initiator smart camera applies selection policies in the reference image processing scenario. One policy restricts the scope of the cluster to non-battery-powered devices, thus excluding all mobile devices (e.g. smartphones, tablets and smart vehicles) from the cluster. Furthermore, since two other cameras are available, another policy selects the one directly connected to cluster nodes, while the camera reached by tunneling, due to network limitations, is discarded to reduce image processing delays.

VI. Deployment and Operation

A. Deployment

The previously identified tasks have to be deployed on the selected nodes and configured accordingly as part of Placement and Configuration. This is performed by the Coordinator, now identified by election among the selected peers orchestrated by ZooKeeper (usually the Initiator, acting as the driver of the clusterization process). To inject the application logic into clustered edge nodes (i.e. the workflow tasks and corresponding configurations), the following mechanisms were modified or added to the original NiFi to implement the ECStream Processing middleware in Fig. 3:

a) Custom prioritizers: a mechanism for specifying the order of delivering jobs to processors is developed, extending the NiFi default prioritizers (e.g. 'First In – First Out', 'Last In – First Out', etc.) with parametric custom prioritizers. Based on flowfile attributes, a custom prioritizer can order queued flowfiles and thus define the processing order. Such prioritizers only act on the task processing scheduling and do not modify the cluster configuration or the workflow topology.
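For instance, a parametric prioritizer might look as follows (a sketch, not the project's code; NiFi prioritizers implement the org.apache.nifi.flowfile.FlowFilePrioritizer interface, and the 'priority' attribute name is assumed for illustration):

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.flowfile.FlowFilePrioritizer;

// Orders queued flowfiles by a numeric 'priority' attribute (lower = first),
// leaving flowfiles without the attribute at the back of the queue.
public class AttributePriorityPrioritizer implements FlowFilePrioritizer {

    @Override
    public int compare(FlowFile f1, FlowFile f2) {
        return Integer.compare(parse(f1.getAttribute("priority")),
                               parse(f2.getAttribute("priority")));
    }

    private static int parse(String value) {
        try {
            return value == null ? Integer.MAX_VALUE : Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return Integer.MAX_VALUE; // unparseable attribute: lowest priority
        }
    }
}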
b) Parametric flowfiles: in Stream Processing, the individual processors composing a workflow have no direct communications, and the workflow deployment is performed by forwarding a flowfile from one processor to another through a queue. Furthermore, this implies that cluster nodes do not know about downstream processors and nodes, thus preventing any dynamic run-time flowfile routing based on the characteristics of upcoming processors. This does not allow deploying a flowfile with specific requirements, e.g. containing a video frame, on a node with matching resources, e.g. a GPU-based node processor. A flowfile attribute-based compliance check overcomes this issue by comparing its attributes to the node resource properties. If there is a match, the node keeps and processes the flowfile; otherwise it rolls back, placing the flowfile back on the input queue to be forwarded to a different node.3 Such a mechanism allows assigning tasks to nodes according to the mapping schema defined in the previous steps.

3 To prevent a flowfile from being infinitely queued due to the absence of relevant processing nodes, it is possible to implement a custom processor that will remove the flowfile from the queue for later processing, if there are no suitable processing nodes available.

c) RESTful interface: NiFi can be accessed and managed through a RESTful interface (which is also exploited by its workflow design interface) that allows to programmatically query and manage cluster nodes, as well as to selectively connect or disconnect nodes according to task requirements and available node resources. This can be defined as a script or a custom processor triggered before deploying and executing the workflow topology, so as to avoid inconsistent and unstable behavior of the cluster at run-time.
comparing its attributes to the node resource properties. If 4 https://github.com/rdautov/ekstream
8

This means that the participating devices are equipped with the same middleware and are equally suitable to act as both the Initiator/Coordinator of the cluster and usual worker nodes. Once deployed and configured, each NiFi instance is responsible for a range of background routine operations, including the default ones for networking and cluster communication, security, job scheduling, synchronization, backup and recovery, distributed coordination, and data provenance, as well as the novel features discussed above. The higher Applications & Services level comes with an intuitive thin client used to define workflows and transformations, and to monitor the run-time cluster operation. This level deals with the actual flow-based programming, where users are able to design data flow topologies, made of data sources, processors, and connections between them.

Fig. 6: ECStream Processing middleware implementation on top of Apache NiFi (the cluster Initiator/Coordinator hosts the Task Partitioner, Port Scanner, Node Selector, ECSP Web Server, a repository of flowfiles, topology, provenance and cluster settings, and an ECSP ZooKeeper; it exchanges task requirement tuples, WoT TD node descriptions, selected node lists and workflow topologies with worker nodes over HTTP/JSON and the REST API).

Fig. 6 schematically represents the design of the ECStream Processing (ECSP) middleware on Apache NiFi, describing the interactions between the Initiator/Coordinator and worker nodes. As detailed below, in this preliminary implementation, we primarily focus on the basic features to implement a minimal, yet viable middleware able to establish and manage a cluster of edge nodes.

A. Task Partitioner

The preliminary implementation of the Task Partitioner relies on a basic decomposition algorithm, which splits the original workflow into connected sub-workflows, i.e. tasks, as discussed in Section IV-B. This way, each task of the workflow is considered as a NiFi processor represented by its (functional and non-functional) requirements, and connected to others according to the original workflow. These tasks/sub-workflows are described by a list of requirements and expressed as tuples

(Task_ID, Req_1, ..., Req_n)

where Task_ID is the task identifier, and Req_i = (Name_i, Op_i, Value_i) with i = 1, ..., n is a requirement triple representing a constraint applied to a property (Name_i) with a threshold value (Value_i) through a relational operator (Op_i). For example, a simplified computational task expressed by the tuple

(T1, (CPU, >=, 1), (RAM, >=, 1), (Storage, >=, 10), (OS, =, Linux))

requires at least a 1GHz CPU, 1GB of RAM and 10GB of storage on a Linux-running device.
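Evaluating a single (Name, Op, Value) triple against a node's reported property can be sketched as follows (illustrative only, not the Task Partitioner's actual code):

// Illustrative check of one requirement triple against a numeric or
// string property reported by a node's self-description.
public class RequirementMatcher {

    public static boolean matches(String op, Object nodeValue, Object threshold) {
        if (nodeValue instanceof Number n && threshold instanceof Number t) {
            int cmp = Double.compare(n.doubleValue(), t.doubleValue());
            switch (op) {
                case ">=": return cmp >= 0;
                case "<=": return cmp <= 0;
                case ">":  return cmp > 0;
                case "<":  return cmp < 0;
                case "=":  return cmp == 0;
            }
        }
        // Non-numeric properties (e.g. OS) only support equality.
        return "=".equals(op) && String.valueOf(nodeValue).equals(String.valueOf(threshold));
    }

    public static void main(String[] args) {
        // (CPU, >=, 1) against a node reporting a 1.4 GHz CPU clock.
        System.out.println(matches(">=", 1.4, 1));          // true
        // (OS, =, Linux) against a node reporting "Linux".
        System.out.println(matches("=", "Linux", "Linux")); // true
    }
}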
B. Port Scanner and Web Server

Node Discovery can be implemented in several different ways, ranging in complexity based on the network configuration and constraints. As far as the network discovery of online nodes is concerned, this was implemented by means of TCP port scanning facilities and integrated into the NiFi Web Server initialization code as the Port Scanner. As a result, the Initiator is able to scan network hosts on a specific port to detect other nodes running the ECStream Processing middleware and, therefore, potentially ready to join the cluster. To avoid situations when some other software occupies the given port, nodes discovered via the Port Scanner are also expected to report their unique ID, as part of the JSON heartbeat payload. If no node ID is reported, the network device is assumed not to be running the ECStream Processing middleware, and is therefore no longer considered for clustered processing.

It is important to remark that TCP scanning, a simple, effective and standardized solution for network topology discovery, requires that network nodes remain routable and are not subject to address/port translation (or any other kind of filtering). As discussed in Section IV-A, this is achieved via the NM, which provides overlay networking capabilities to establish a virtual network for unhindered communication among nodes.
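A bare-bones sketch of this network discovery step is given below, under the assumption (taken from the paper's setup) that ECSP nodes listen on port 8080; the subnet prefix and timeout are arbitrary example values, and the follow-up heartbeat request is omitted:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Scans a /24 subnet for hosts accepting TCP connections on the ECSP port.
// Hosts that answer would then be asked for their JSON heartbeat (not shown)
// to confirm they actually run the middleware.
public class PortScanner {

    public static List<String> scan(String subnetPrefix, int port, int timeoutMs) {
        List<String> candidates = new ArrayList<>();
        for (int host = 1; host < 255; host++) {
            String address = subnetPrefix + "." + host;
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(address, port), timeoutMs);
                candidates.add(address); // port open: a potential cluster node
            } catch (IOException ignored) {
                // closed, filtered, or unreachable: skip
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println(scan("172.30.127", 8080, 200));
    }
}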
Listing 1: Device description in JSON-TD.

{ "id": "SBC_1",
  "coordinates": [38.26, 15.60],
  "ip": "172.30.127.77",
  "properties": {
    "CPU": { "type": "number",
      "description": "CPU clock in GHz",
      "href": "/node/properties/CPU" },
    "RAM": { "type": "number",
      "description": "RAM capacity in GB",
      "href": "/node/properties/RAM" },
    "Storage": { "type": "number",
      "description": "Storage capacity in GB",
      "href": "/node/properties/storage" },
    "Power": { "type": "number",
      "description": "Power type: -1 Power Line; 0-100 Battery level",
      "href": "/node/properties/power" },
    ... }
}

Listing 1 reports a JSON description of a single-board computer (SBC), adopting the Web of Things (WoT) Thing Description (TD) model,5 provided by the worker node Web Server in response to a request from the Initiator Port Scanner. A node is characterized by a unique ID, geographical and network location, available hardware resources, available software functionality, etc. It is also able to exchange heartbeats – light-weight JSON messages carrying all these relevant fields as payload.

5 https://www.w3.org/TR/wot-thing-description/

C. Node Selector

The next step in the cluster configuration process is the node selection, wherein the Initiator, based on node selection policies and task requirements on the one hand, and available nodes and resources on the other, is able to configure the cluster as required.
Node selection policies are evaluated against the collected node self-descriptions and may range from simple rules specifying rather static threshold values (e.g. a cluster node should have at least 1GB RAM and a 1GHz CPU) to more sophisticated constraints that take into account how well individual nodes can co-operate within a cluster. That is, a selection policy might, for example, exclude nodes with an excessive response time – a potential shortcoming that might eventually affect the timely operation of the cluster in the long run. It is assumed that monitoring of such metrics at run-time is implemented using standard Linux resource and network utilization facilities. With system performance as a priority, the implemented prototype relies on JSON for representing policies – a simple, yet efficient way of capturing the node selection logic.

Listing 2: Task selection policy in JSON.

{ "policyId": "policy-1",
  "rule": {
    "ruleId": "noBatteryDev",
    "node:/properties/power": {
      "op": "=",
      "type": [-1] }
  },
  ...
}

A simple selection policy expressed in JSON is reported in Listing 2. If a node description (similar to Listing 1) satisfies the task requirements of this policy, the node is selected to run the considered task. To exploit the pipeline parallelism, only one task can be allocated to a node. This means that allocation can potentially be unfeasible, if there are not enough nodes meeting the task requirements.

Algorithm 1: Discovery and Selection algorithm involving Port Scanner (PS) and Node Selector (NS).

  Input: Task list tasks and selection policies selPols
  Output: Selected nodes with allocated task list selNodesTasks
  discoveredNodes ← PS.scanNetworkPort(8080);
  foreach task ∈ tasks do
    if discoveredNodes ≠ null then
      selected ← false;
      foreach node ∈ discoveredNodes do
        devDescription ← PS.getDescription(node);
        if NS.taskFiltering(devDescription, task.reqs) then
          nodeRes ← PS.query(devDescription.URI);
          if NS.reqResMatching(task.reqs, nodeRes) then
            if selected ← NS.select(selPols, nodeRes) then
              selNodesTasks.add(node, task);
              discoveredNodes.remove(node);
              tasks.remove(task);
              break;          // First-fit algorithm.
            end
          end
        end
      end
    else
      exit(-1);               // Unfeasible: not enough devices!
    end
    if selected = false then
      exit(-2);               // Unfeasible: unallocated tasks!
    end
  end
  return selNodesTasks;

Algorithm 1 illustrates the overall discovery and selection process triggered by the Initiator, which first scans the network for online nodes running the NiFi Web Server on port 8080 (scanNetworkPort(8080)), and then starts checking whether the task requirements, expressed as tuples, are met by the available nodes (discoveredNodes), which are then contacted for their JSON descriptions (getDescription()). If the resources in devDescription exposed by the node match the requirements of the current task (taskFiltering()), the algorithm queries the node for further details on the exact resource property values (query()). If the requirements do not match, the algorithm proceeds to the next available node in discoveredNodes. Otherwise, the algorithm compares the values of task requirements and node resources (reqResMatching()) and, if the selection policies are also satisfied by the considered node (select()), the node is selected (selNodesTasks.add()) for the current task. The respective node and task will not be further considered by the algorithm (discoveredNodes.remove() and tasks.remove()). Finally, if there are not enough nodes or they are not able to satisfy all task requirements, the algorithm exits with errors. Otherwise, the assignment is deemed accomplished, and the list of paired tasks and nodes is returned. The Port Scanner provides scanNetworkPort(), getDescription(), and query(), while the Node Selector exposes taskFiltering(), reqResMatching(), and select().

As stated above, the selection process can be considered as a mapping problem, which has been demonstrated to be NP-hard. Algorithm 1 implements a first-fit greedy approximation heuristic, which allocates tasks one by one to the first fitting (discovered) node. This way, the above algorithm, which is at the core of the overall clusterization process in Fig. 4, can be solved in polynomial time (O(n²)). For the sake of feasibility, it is assumed that the number of discovered nodes m is of the order of magnitude of the number of tasks to be allocated n (m ∼ n, m ≥ n), since reqResMatching() and select(), required to process a single task-node pair, have constant time complexity (O(1)).

D. ZooKeeper

ZooKeeper6 is a service for maintaining configuration information, naming, and providing distributed synchronization and group services. It also provides basic facilities to implement consensus, group management, leader election, and presence protocols. At its core, ZooKeeper is a distributed file system with so-called ZNodes that store snapshots of the current system state. ZooKeeper can be configured to run either as a centralized stand-alone service, or as multiple embedded instances in a distributed manner. In the latter case, the available built-in synchronization and state management facilities allow all instances to be synchronized with minimum delay in a reliable and fault-tolerant manner. This is especially useful for distributed cluster setups, where individual nodes may become unavailable due to failures or network barriers. With multiple embedded ZooKeeper instances, each node is continuously updated with the current state of peer nodes and jobs in the queue, thus ensuring their eventual execution even in the case of node failures.

6 https://zookeeper.apache.org/
Distributed ZooKeepers are also useful during the election of the Coordinator of the cluster (initially assigned to the Initiator by default), whenever the current Coordinator node goes offline. The leader election is implemented using a reliable and efficient protocol [34], ensuring that the cluster has its Coordinator at all times. All nodes in the cluster will then continuously send heartbeat/status information to the Coordinator, which may also disconnect non-responsive nodes. Additionally, when a new node joins the cluster, it must first connect to the currently-elected Coordinator to obtain the most up-to-date flow. These activities, executed through ZooKeeper, have a relatively small impact on the Coordinator, and are comparable to the overheads of the other cluster nodes.
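The classic ZooKeeper election recipe underlying such protocols can be sketched as follows (a simplified illustration using the standard org.apache.zookeeper client rather than the middleware's actual code; the /ecsp/election path is hypothetical, the parent znode is assumed to exist, and watch/retry handling is omitted):

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Ephemeral-sequential leader election: the node owning the lowest
// sequence number under /ecsp/election acts as the cluster Coordinator.
// If it goes offline, its znode disappears and the next node takes over.
public class CoordinatorElection {

    public static boolean runForCoordinator(ZooKeeper zk, String nodeId) throws Exception {
        String me = zk.create("/ecsp/election/n_", nodeId.getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        List<String> candidates = zk.getChildren("/ecsp/election", false);
        Collections.sort(candidates);

        // Coordinator iff our znode carries the smallest sequence suffix.
        return me.endsWith(candidates.get(0));
    }
}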
Once the Initiator knows all nodes and their tasks within whereas most of the roads are not monitored at all.
the cluster, it is time to deploy the workflow topology. Moreover, such cameras are usually only involved in off-
This functionality is implemented by NiFi RESTful API line image recognition – i.e. they transfer captured images
and ZooKeeper, adapted to ECStream Processing purposes. (together with the violating speed values) to a server,
Among other things, the API provides entry points for responsible for the actual recognition of license plates.
querying and updating the current cluster configuration by, This limitation could be potentially addressed by a
e.g. connecting/disconnecting nodes or specifying stand- pervasive network of personal image capturing devices,
alone processes (i.e. executed on a single node). Accordingly, such as vehicle dashcams and personal smartphones. In-
the Initiator first updates its own settings, which are then deed, there are millions of drivers worldwide who use on-
synchronized across the cluster by embedded ZooKeepers. board cameras to continuously record the surrounding
ZooKeeper has also been extended to implement Orchestration and Lifecycle Management by running a continuous looping routine, during which the configured computational topology is executed on the cluster nodes in parallel. The Coordinator Port Scanner keeps scanning the network for new potential worker nodes. Whenever a new node appears on the network, it needs to go through the same initial steps and, if successful, is added to the cluster in a seamless and transparent way – i.e. there is no need to stop and restart the already running cluster in order for a new node to be integrated. This is also facilitated by ZooKeeper, which handles node churn and synchronizes topology changes across all cluster nodes.
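A minimal sequential sketch of the scanNetworkPort(8080) routine is given below; the /24 subnet sweep and the 20 ms connection timeout are illustrative assumptions (a production implementation would probe addresses concurrently).

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class PortScanner {

    /** Returns the LAN addresses that accept TCP connections on the given port. */
    public static List<String> scanNetworkPort(String subnetPrefix, int port) {
        List<String> online = new ArrayList<>();
        for (int host = 1; host <= 254; host++) {
            String address = subnetPrefix + "." + host;
            try (Socket socket = new Socket()) {
                // A short timeout keeps a full /24 sweep within a few seconds
                socket.connect(new InetSocketAddress(address, port), 20);
                online.add(address); // a NiFi Web Server answered on this host
            } catch (IOException ignored) {
                // Host offline or port closed - not a candidate worker node
            }
        }
        return online;
    }

    public static void main(String[] args) {
        System.out.println(scanNetworkPort("192.168.0", 8080));
    }
}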
VIII. Proof of Concept

To fully demonstrate the viability of the proposed approach, a pilot scenario had to meet the following requirements:

a) The amount of data generated by the pilot and the application logic are large and complex enough to require Stream Processing techniques for their management.

b) Data processing, involved in the target scenario, is computationally intensive and goes beyond the capabilities of a single edge device, thereby requiring offloading.

c) The pilot application logic can be decomposed into simpler, 'parallelizable' tasks.

d) The pilot scenario has strict time constraints.

e) The surrounding urban IoT environment, composed of various edge devices, is dynamic – that is, different connected devices may randomly appear in close proximity to the source of data at unpredictable rates. On the other hand, the environment should not be too dynamic either, as happens, for example, in vehicular ad-hoc networks characterized by very short-lasting connections.

f) Collocated edge devices, albeit resource-constrained and/or mobile, are powerful enough to run Linux OS and have an executable environment, such as JRE. This means that target devices are equipped with a microprocessor and a wireless networking interface.

Based on these requirements, a target scenario can be identified in the domain of relatively complex image/video processing in an urban environment, where mobile/portable nodes can share their idle resources. In this context, among a number of public surveillance and monitoring applications, a particularly novel and challenging topic is run-time license plate recognition. At present, a network of traffic monitoring cameras typically covers only road junctions and intersections to detect speed limit violations, whereas most of the roads are not monitored at all. Moreover, such cameras are usually only involved in off-line image recognition – i.e. they transfer captured images (together with the violating speed values) to a server, responsible for the actual recognition of license plates.

This limitation could potentially be addressed by a pervasive network of personal image capturing devices, such as vehicle dashcams and personal smartphones. Indeed, there are millions of drivers worldwide who use on-board cameras to continuously record the surrounding environment. The current use, however, is limited to offline manual analysis of the recorded video in case of various incidents (accidents, car break-ins, 'hit-and-run', etc.), since run-time automated image analysis is currently beyond the capabilities of a single device. The situation might change with the ubiquitous presence of increasingly powerful edge devices, either personal hand-held gadgets or smart road-side infrastructure. These vast processing capabilities open up opportunities for using on-board dashcams to perform run-time situation assessment by pooling the computing resources of edge devices and distributing the workload in an ECStream Processing fashion.

Limiting the scope of the generic video processing workflow depicted in Fig. 1, the envisaged scenario is therefore the following. The dashcam installed in a vehicle acts as the source of the video stream (and possibly as the WiFi access point in case there is no built-in access point in the vehicle), whereas other WiFi-enabled smart devices available within the car, including personal smartphones, tablets and an on-board infotainment system, can connect to the network and communicate with each other. The dashcam is then able to sample the video stream into individual frames and distribute them among participating nodes for parallel license plate recognition.

A. Testbed Setup

To compare and evaluate the proposed approach against the existing technological baseline, below we present and explain how the challenge of automated run-time license plate recognition can potentially be faced. We thus identify three possible setups, in addition to a fourth setup running on clustered edge devices. In all setups, OpenALPR7 is used as the underlying license plate recognition software.

7 http://www.openalpr.com/
1) Stand-alone OpenALPR on a single device (EC) - This setup represents a situation when an onboard dashcam has to perform license plate recognition against the captured video stream on its own, in a typical Edge Computing (EC) fashion. Admittedly, the dashcam market is saturated with different types of devices, varying in their hardware specs and architectures. A common practice is to use a smartphone with a special app acting as a dashcam. As an average representation of this plethora of dashcam models, this setup employs a Raspberry Pi board running Raspbian OS with OpenALPR, thus sampling the video stream and immediately feeding the resulting images to OpenALPR.

2) Stand-alone OpenALPR on Google Compute Cloud (IaaS) - Cloud-based functionality can be implemented by deploying the free OpenALPR API on a public IaaS Cloud platform and making it available as a RESTful Web service, which receives and processes incoming images and returns the results back to the user. In our experiments we used a Google Compute Engine8 virtual machine deployed in the EU to implement this setup.

3) Stand-alone OpenALPR Cloud API (SaaS) - Apart from a freely available software library to be installed on-premises, OpenALPR also offers a commercial Cloud SaaS service9 (deployed in the US) providing a RESTful API for license plate recognition. Client applications can either stream video or transfer a single image and receive notifications on the recognized license plate.
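For reference, a client-side submission of a single frame to such a hosted service can be sketched as follows. The recognize_bytes endpoint, its query parameters, and the placeholder secret key are taken from the publicly documented v2 Cloud API at the time of writing and should be verified against the vendor documentation.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class CloudRecognitionClient {
    public static void main(String[] args) throws Exception {
        // The Cloud API expects a Base64-encoded image in the request body
        byte[] image = Files.readAllBytes(Path.of("frame.jpg"));
        String payload = Base64.getEncoder().encodeToString(image);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openalpr.com/v2/recognize_bytes"
                        + "?secret_key=YOUR_SECRET_KEY&country=eu"))
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // JSON response listing candidate plates with confidence values
        System.out.println(response.body());
    }
}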
4) OpenALPR on an Edge Cluster (CEC) - This setup implements the proposed ECStream Processing architecture, in which video frames from a dashcam are distributed among a cluster of edge devices (e.g. passenger smartphones) for Stream Processing in a CEC fashion.
TABLE I: Testbed hardware specs and network speed test.

Setup | Hardware                                                               | Uplink, Mbit/s | Downlink, Mbit/s | Round Trip Time, ms
EC    | Raspberry Pi 3 (1.2GHz ARM Cortex-A53, 1GB RAM)                        | n/a            | n/a              | n/a
IaaS  | n1-standard-1 (located in EU, 2.52 GHz vCPU, 3.75GB RAM)               | 1.24           | 1.63             | 482
SaaS  | Amazon EC2 (located in US)                                             | 0.77           | 0.93             | 524
CEC   | Raspberry Pi 3 + 1-6 Samsung Galaxy J5 (1.2GHz Cortex-A53, 1.5GB RAM)  | 16.58          | 26.9             | 144
The configuration of all four testbeds is summarized in Table I, which covers both hardware and network specifications. At its current stage, the prototype implementation relies on a pre-recorded dashcam stream as the input video source,10 captured during a ride through London downtown in HD quality of 1,920 × 1,080 pixels, resulting in 1,350 KB of aggregate payload transferred on average, at a frequency of 30 frames per second. Each device is assumed to be running a Debian-based Linux OS (Linux Deploy11 was used to emulate the Linux environment on top of Android OS on smartphones) and a pre-deployed instance of the ECStream Processing Cluster middleware with customized NiFi processors. Target license plates can be dynamically pushed to vehicles involved in the run-time license plate recognition, which then re-configure their internal cluster nodes to the new plates. Upon detection, a notification with a corresponding screenshot, GPS location, and a timestamp is issued to interested parties (e.g. police).

8 https://cloud.google.com/compute/
9 https://www.openalpr.com/cloud-api.html
10 https://youtu.be/MM3W3FS-W8Q
11 https://github.com/meefik/linuxdeploy

Fig. 7: The streaming workflow in the context of the license plate recognition scenario: a stand-alone Image Sampling node feeding License Plates Recognition and Alerting tasks executed in parallel on the cluster nodes.

To be processed in parallel on the edge cluster, the license plate recognition workflow is partitioned into three custom NiFi processors, as shown in Fig. 7. Image Sampling takes an input video stream, samples it into separate frames, and transfers the resulting images to an output port for recognition. License Plates Recognition is responsible for detecting and recognizing license plates in incoming images by invoking the OpenALPR library. As an output, it provides a list of license plates with a matching confidence value. Alerting notifies interested parties whenever target license plates are recognized. It can be configured to use various channels, such as API, e-mail, or SMS.
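As an illustration of how such a processor can be structured, the sketch below follows the standard NiFi processor API; the recognize() helper is hypothetical and stands for the actual OpenALPR invocation, whose binding (JNI or command line) depends on the target platform.

import java.io.InputStream;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class LicensePlatesRecognition extends AbstractProcessor {

    static final Relationship REL_RECOGNIZED = new Relationship.Builder()
            .name("recognized")
            .description("Frames in which license plates were detected")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_RECOGNIZED);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session)
            throws ProcessException {
        FlowFile frame = session.get(); // one sampled video frame per FlowFile
        if (frame == null) {
            return;
        }
        StringBuilder plates = new StringBuilder();
        session.read(frame, (InputStream in) -> {
            // Hypothetical helper wrapping the OpenALPR library call
            plates.append(recognize(in.readAllBytes()));
        });
        // Expose the result to the downstream Alerting processor
        FlowFile updated = session.putAttribute(frame, "alpr.plates", plates.toString());
        session.transfer(updated, REL_RECOGNIZED);
    }

    private String recognize(byte[] imageBytes) {
        // Placeholder: delegate to OpenALPR (e.g. via its JNI binding)
        return "";
    }
}

The Image Sampling and Alerting processors can follow the same skeleton, differing only in their onTrigger() logic.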

To trigger the ECStream Processing clusterization process shown in Fig. 4, on-board devices have to be located on the same WLAN. This way, the dashcam (i.e. Raspberry Pi), acting as the Initiator, is able to scan the network and discover worker nodes through its Port Scanner. Recipient nodes reply to the incoming request with their JSON-TD self-descriptions, as explained in Section VII. Then, the Initiator, now becoming the Coordinator, is able to select suitable nodes. In the considered scenario, all nodes are suitable for task offloading (both license plate recognition and alerting) and there are no selection policies to be enforced by the Coordinator. ZooKeeper thus allocates the tasks to the available nodes, replicating them to run in parallel. As a result, the dashcam is tasked with the stand-alone operation of sampling the video stream and broadcasting frames, whereas the worker nodes perform license plate detection/recognition and alerting in parallel.

B. Experiments and Benchmarking

The main benchmarking metric for the license plate recognition experiments is the response time – i.e. the time difference between the instant when an image is first sampled by the dashcam and the instant when the system accomplishes the license plate recognition task. This metric is two-fold, and includes i) time delays associated with network latency and data (de-)serialization when transferring images (overhead), and ii) time spent on actual data processing (processing time). Further metrics of interest for our case study are the throughput – i.e. the number of frames each setup is able to process in a second – and the speedup – i.e. the ratio between the response times obtained by the sequential processing of the license plate recognition workflow on the different setups and the ones obtained by parallel execution on the edge cluster, varying the number of nodes. To achieve statistically significant results, the experiments were conducted over several days with more than 1,000 iterations per setup.
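In other words, denoting by T_s the stand-alone response time of setup s and by T_CEC(n) the per-frame response time of an n-node edge cluster, the reported figures are consistent with the simple relations (a restatement of the definitions above, not an additional model):

\[
\mathrm{throughput}(n) = \frac{1}{T_{\mathrm{CEC}}(n)},
\qquad
\mathrm{speedup}_{s}(n) = \frac{T_{s}}{T_{\mathrm{CEC}}(n)},
\qquad s \in \{\mathrm{EC}, \mathrm{IaaS}, \mathrm{SaaS}\},\; n = 2,\dots,6.
\]

For instance, T_SaaS = 18,595 ms and T_CEC(6) = 746 ms give a speedup of 18,595/746 ≈ 24.9 over the SaaS baseline, matching the values reported in Fig. 10.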

Fig. 8: Benchmarking results on traditional setups: (a) response time per frame, ms, split into overhead and processing time (EC: 3,132; IaaS: 15,195; SaaS: 18,595); (b) throughput, frames per second (EC: 0.319; IaaS: 0.066; SaaS: 0.054).

Fig. 9: Benchmarking results on a CEC edge cluster of 2-6 nodes: (a) response time per frame, ms (1,980; 1,182; 933; 815; 746); (b) throughput, frames per second (0.505; 0.846; 1.072; 1.227; 1.34).

Fig. 10: Speedup of the CEC setup parallel processing vs EC, IaaS and SaaS stand-alone processing (for 2-6 cluster nodes: EC from 1.58 to 4.2; IaaS from 7.67 to 20.37; SaaS from 9.39 to 24.93).

The experimental results for all four setups are summarized in Fig. 8 (traditional setups) and Fig. 9 (edge cluster). The 95% confidence interval is negligible due to the high number of experiments (>1,000) and is thus not shown in the figures. Fig. 8a depicts the average time delay for the traditional setups. As follows from the chart, running license plate recognition in a stand-alone mode on a single node (i.e. the dashcam) takes 3,132 ms with almost no overheads. On the contrary, in the vertical Cloud-enabled setups, the main delay is caused by the image transfers, whereas the actual image processing is relatively fast due to the excessive hardware resources of the Cloud. Fig. 8b presents the same results in terms of throughput. Admittedly, the Cloud-enabled setups fail to provide continuous support for run-time license plate recognition, whereas the stand-alone OpenALPR node can process 0.319 frames per second.

Fig. 9a refers to CEC and illustrates how the performance improves as more nodes join the cluster. That is, starting from two nodes in the cluster (the dashcam and one smartphone), where the average response time for a single frame is 1,980 ms, the cluster grows up to 6 nodes (e.g. more passengers get in the car and contribute to the edge cluster with their devices) that are able to process incoming images in parallel with a response time of 746 ms (533 ms for processing and 213 ms of overhead) per frame. Please note the slight increase in the overhead (due to more intensive data serialization, scheduling, and network transfer requirements) as the number of cluster nodes grows. Fig. 9b refers to throughput and suggests that within a fully-loaded car (4 passengers and the driver), the pooled resources of 6 edge devices are enough to process 1.34 frames per second – a sufficiently high rate for run-time license plate recognition. A lower number of devices could still be acceptable (∼1-1.227 frames per second for 4-5 nodes), while Cloud-based solutions, with delays higher than 15-18 seconds, cannot be considered for run-time license plate recognition.

By looking at the histogram charts, it becomes clear that local processing (either in a stand-alone local mode or in a cluster) is already able to outperform the Cloud-enabled architectures by avoiding congested network communication and the related overheads. Admittedly, the results refer to this specific license plate recognition task and the corresponding experimental setup (i.e. image size, sampling frequency, available cluster nodes, network bandwidth, etc.). Nevertheless, it is expected that for similar data-intensive tasks (suitable for the proposed ECStream Processing) there will be a threshold number of nodes to share the workload horizontally, sufficient to substitute the remote vertical offloading. The increase in performance is best depicted by the speedup graph in Fig. 10, which compares the three stand-alone setups against the edge cluster composed of 2-6 nodes.

C. Threats to Validity and Discussion

In the conducted experiments, smartphones fully contribute their available computing resources to the cluster, whereas in reality they are expected to be running some user applications and related background jobs. Potentially, the minimum share of contributed resources can also be defined as a non-functional requirement, such that, for example, devices not able to guarantee at least 50% of their hardware capacities are not selected. This can also apply to battery charge – e.g. devices with insufficient charge levels are not allowed (although in the presented in-vehicle scenario, there is a possibility to charge a device).
Given the finite bandwidth of the wireless network, the increased number of cluster nodes and the associated inter-node data exchange may potentially lead to saturation of the network, as well as quickly drain the device battery. Albeit beyond the scope of this paper, these issues will need to be explored in the future, potentially applying intelligent estimation techniques, as proposed in [35].
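To make such policies concrete, the Coordinator could evaluate a simple eligibility predicate against each candidate's self-description, as sketched below; the attribute names and both thresholds are illustrative only and do not appear in the current prototype.

public class SelectionPolicy {

    private static final double MIN_RESOURCE_SHARE = 0.5; // illustrative threshold
    private static final double MIN_BATTERY_LEVEL = 0.3;  // illustrative threshold

    /** Returns true if a candidate node satisfies the non-functional requirements. */
    static boolean isEligible(double pledgedShare, double batteryLevel, boolean charging) {
        boolean enoughResources = pledgedShare >= MIN_RESOURCE_SHARE;
        boolean enoughEnergy = charging || batteryLevel >= MIN_BATTERY_LEVEL;
        return enoughResources && enoughEnergy;
    }

    public static void main(String[] args) {
        System.out.println(isEligible(0.6, 0.2, true));  // true: low battery but charging
        System.out.println(isEligible(0.4, 0.9, false)); // false: pledges under 50%
    }
}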
It is also worth benchmarking the clusterization process itself, to provide a fair overview of the viability of the presented solution. As explained, the current implementation of node discovery and selection is based on broadcast network scanning, which makes this process relatively fast (i.e. up to 3 seconds to scan up to 256 LAN addresses, collect acknowledgments, and reconfigure device settings accordingly). The performance drops, however, when restarting the devices – that is, after each node has overwritten its cluster settings, it is required to reboot in order for the new configuration to take effect. This process might take up to 1 minute (depending on the number of cluster nodes and deployed NiFi processors). The same applies to a situation when a node joins an already running cluster – i.e. having received new cluster settings, it needs to update its configuration, taking up to 1 minute. This lack of support for 'hot deployment' is seen as a limitation of the current version of Apache NiFi, albeit this feature has already been proposed for inclusion in one of the future releases. The clusterization process is anyway a one-off activity that is not expected to affect the system performance in the long run. Furthermore, the clusterization overheads are comparable to or even lower than those of the Cloud setups, where the time required to launch a virtual machine alone, depending on multiple criteria, typically amounts to at least 50 seconds [36].

At last, speaking of the scalability of the proposed solution, the experiments do not yet demonstrate significant results, and this aspect needs to be further investigated. However, the main goal of the proposed edge clustering is to improve performance by reducing network latency. As shown in Fig. 9a, the overhead increases proportionally to the number of cluster nodes. By interpolating these values, it is assumed that an edge cluster of about 100 nodes would have an overhead similar to the Cloud (∼15 sec). It is also necessary to consider the overhead of managing the cluster for a resource-constrained edge device acting as the Coordinator, as well as security issues, both increasing linearly with the number of nodes. For these reasons, the proposed approach is suitable for clustering and managing a few nodes in the neighborhood, in the order of tens, thus minimizing overhead and security issues. In the case of more complex computational tasks to offload, it is recommended to resort to Fog and Cloud computing. In this light, the scale of the above experiments can be considered appropriate to show the feasibility and effectiveness of ECStream Processing.

IX. Conclusion and Future Work

This paper presented a novel approach to performing data processing at the very edge of IoT. As opposed to the established practice of offloading computation to a Cloud in a vertical manner, the proposed approach relies on enabling local clusters of edge devices on top of the NiFi Stream Processing middleware. This way, edge devices belonging to the cluster are able to spread workload among themselves – that is, implement a horizontal offloading pattern – and minimize the amount of data sent over a potentially congested network. As demonstrated by the proof-of-concept implementation and benchmarking experiments, the proposed approach outperforms Cloud-centric setups. This way, traditional on-board video recording systems can be turned into online video analytics platforms to support a wide range of situation assessment scenarios in urban environments. These might range from simple object detection of vehicles and people to more sophisticated tracking of subjects and reaction to critical situations.

Along with the generally positive results demonstrated by the prototype implementation, there are some potential enhancements to be taken into account and addressed as part of future work. The histogram charts in Figs. 8 and 9 suggest an interesting and challenging problem of generalizing the experimental observations across a wider scope of processing tasks to identify an optimal configuration for a specific task at hand. That is, there are expected to be a minimum and a maximum number of cluster nodes that will underpin a balanced clustered architecture. The minimum number of nodes justifies establishing a cluster that will outperform the remote offloading, whereas the maximum number ensures that no unnecessary/redundant nodes are added to the cluster. This allows planning and adjusting the edge cluster to the problem at hand – e.g. as shown in Fig. 9b, it is possible to tune the cluster throughput and achieve the required video frame rate by adding new nodes accordingly (e.g. 5 nodes for processing 1.2 frames per second).

M2M operation via Bluetooth, LoRa, Zigbee, or other Personal Area Network (PAN)/Ad-Hoc (MANET) protocols also deserves to be investigated. Such protocols are mainly based on Master-Slave role profiles (i.e. one-to-one links), and are thus not immediately ready to support the zero-master, many-to-many cluster topologies (including NiFi). Indeed, any such solution should aim to establish a (full) mesh to enable the cluster middleware to work as intended, albeit with unavoidable configuration overheads. Moreover, the constrained bandwidth of existing wireless technologies also limits their potential utilization for data-intensive scenarios. As discussed above, network discovery is also worth investigating in this respect.

Another relevant aspect is security and privacy. Edge clustering could be a way of securing computational task offloading by enforcing security properties during node selection – for example, by filtering out remote or untrusted nodes. Such policies should be properly designed and evaluated, thus calling for specific techniques and tools.

References

[1] R. Dautov, S. Distefano, D. Bruneo, F. Longo, G. Merlino, A. Puliafito, and R. Buyya, "Metropolitan intelligent surveillance systems for urban areas by harnessing IoT and edge computing paradigms," Software: Practice and Experience, vol. 48, no. 8, pp. 1475–1492, 2018.
[2] M. Jang, M.-S. Park, and S. C. Shah, "A mobile ad hoc cloud for automated video surveillance system," in 2017 International Conference on Computing, Networking and Communications (ICNC). IEEE, 2017, pp. 1001–1005.
[3] S. Yang, "IoT Stream Processing and Analytics in the Fog," IEEE Communications Magazine, vol. 55, no. 8, pp. 21–27, 2017.
[4] Z. Wen, R. Yang, P. Garraghan, T. Lin, J. Xu, and M. Rovatsos, "Fog orchestration for internet of things services," IEEE Internet Computing, vol. 21, no. 2, pp. 16–24, 2017.
[5] F. Haider, D. Zhang, M. St-Hilaire, and C. Makaya, "On the planning and design problem of fog computing networks," IEEE Transactions on Cloud Computing, pp. 1–1, 2018.
[6] M. Satyanarayanan, P. Simoens, Y. Xiao, P. Pillai, Z. Chen, K. Ha, W. Hu, and B. Amos, "Edge analytics in the internet of things," IEEE Pervasive Comp., vol. 14, no. 2, pp. 24–31, 2015.
[7] R. Vilalta, A. Mayoral, D. Pubill, R. Casellas, R. Martínez, J. Serra, C. Verikoukis, and R. Muñoz, "End-to-End SDN orchestration of IoT services using an SDN/NFV-enabled edge node," in Optical Fiber Comm. Conf. IEEE, 2016, pp. 1–3.
[8] O. Skarlat, S. Schulte, M. Borkowski, and P. Leitner, "Resource provisioning for IoT services in the fog," in 2016 IEEE 9th International Conference on Service-Oriented Computing and Applications (SOCA). IEEE, 2016, pp. 32–39.
[9] C.-C. Hung, G. Ananthanarayanan, P. Bodik, L. Golubchik, M. Yu, P. Bahl, and M. Philipose, "VideoEdge: Processing camera streams using hierarchical clusters," in 2018 IEEE/ACM Symposium on Edge Computing. IEEE, 2018, pp. 115–131.
[10] V. Cardellini, V. Grassi, F. Lo Presti, and M. Nardelli, "Optimal operator placement for distributed stream processing applications," in ACM Int. Conf. on Distributed and Event-based Systems, 2016, pp. 69–80.
[11] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, "Internet of things: A vision, architectural elements, and future directions," Future Gen. Comp. Sys., vol. 29, no. 7, pp. 1645–1660, 2013.
[12] R. Dautov, S. Distefano, D. Bruneo, F. Longo, G. Merlino, and A. Puliafito, "Pushing Intelligence to the Edge with a Stream Processing Architecture," in The 2017 IEEE International Conference on Internet of Things (iThings 2017). IEEE, 2017.
[13] R. Roman, J. Lopez, and M. Mambo, "Mobile edge computing, fog et al.: A survey and analysis of security threats and challenges," Future Generation Computer Systems, 2016.
[14] M. R. Rahimi, J. Ren, C. H. Liu, A. V. Vasilakos, and N. Venkatasubramanian, "Mobile cloud computing: A survey, state of art and future directions," Mobile Networks and Applications, vol. 19, no. 2, pp. 133–143, 2014.
[15] A. Manzalini and N. Crespi, "An edge operating system enabling anything-as-a-service," IEEE Comm. Mag., vol. 54, no. 3, pp. 62–67, 2016.
[16] N. Fernando, S. W. Loke, and W. Rahayu, "Computing with nearby mobile devices: A work sharing algorithm for mobile edge-clouds," IEEE Transactions on Cloud Computing, vol. 7, no. 2, pp. 329–343, 2019.
[17] M. Chen, Y. Hao, Y. Li, C.-F. Lai, and D. Wu, "On the computation offloading at ad hoc cloudlet: architecture and service modes," IEEE Communications Magazine, vol. 53, no. 6, pp. 18–24, 2015.
[18] H. Guo and J. Liu, "Collaborative Computation Offloading for Multiaccess Edge Computing Over Fiber–Wireless Networks," IEEE Transactions on Vehicular Technology, vol. 67, no. 5, pp. 4514–4526, 2018.
[19] R. Dautov, S. Distefano, and R. Buyya, "Hierarchical data fusion for smart healthcare," Journal of Big Data, vol. 6, no. 19, 2019.
[20] C. Zhu, H. Wang, X. Liu, L. Shu, L. T. Yang, and V. C. M. Leung, "A novel sensory data processing framework to integrate sensor networks with mobile cloud," IEEE Systems Journal, vol. 10, no. 3, pp. 1125–1136, 2016.
[21] T. Taleb, K. Samdanis, B. Mada, H. Flinck, S. Dutta, and D. Sabella, "On multi-access edge computing: A survey of the emerging 5G network edge cloud architecture and orchestration," IEEE Communications Surveys & Tutorials, vol. 19, no. 3, pp. 1657–1681, 2017.
[22] M. Satyanarayanan, "Pervasive computing: Vision and challenges," IEEE Personal Comm., vol. 8, no. 4, pp. 10–17, 2001.
[23] M. Gusev, B. Koteska, M. Kostoska, B. Jakimovski, S. Dustdar, O. Scekic, T. Rausch, S. Nastic, S. Ristov, and T. Fahringer, "A deviceless edge computing approach for streaming IoT applications," IEEE Int. Comp., vol. 23, no. 1, pp. 37–45, 2019.
[24] S. Nastic, T. Rausch, O. Scekic, S. Dustdar, M. Gusev, B. Koteska, M. Kostoska, B. Jakimovski, S. Ristov, and R. Prodan, "A serverless real-time data analytics platform for edge computing," IEEE Int. Comp., vol. 21, no. 4, pp. 64–71, 2017.
[25] G. Merlino, D. Bruneo, F. Longo, S. Distefano, and A. Puliafito, "Cloud-Based Network Virtualization: An IoT Use Case," in Int. Conf. on Ad Hoc Net. Springer, 2015, pp. 199–210.
[26] F. Longo, D. Bruneo, S. Distefano, G. Merlino, and A. Puliafito, "Stack4Things: a sensing-and-actuation-as-a-service framework for IoT and cloud integration," Ann. Telecomm., vol. 72, pp. 53–70, 2017.
[27] D. J. Lilja, "Experiments with a Task Partitioning Model for Heterogeneous Computing," in Proceedings of the Workshop on Heterogeneous Processing. IEEE, 1993, pp. 29–35.
[28] U. Catalyurek and C. Aykanat, "A hypergraph-partitioning approach for coarse-grain decomposition," in The 2001 ACM/IEEE Conference on Supercomputing. ACM, 2001, pp. 28–28.
[29] R. Dautov, S. Distefano, D. Bruneo, F. Longo, G. Merlino, and A. Puliafito, "Data processing in cyber-physical-social systems through edge computing," IEEE Access, vol. 6, pp. 29822–29835, 2018.
[30] M. D. de Assuncao, A. da Silva Veith, and R. Buyya, "Distributed data stream processing and edge computing: A survey on resource elasticity and future directions," Journal of Network and Computer Applications, vol. 103, pp. 1–17, 2018.
[31] R. Dautov and S. Distefano, "Automating IoT Data-Intensive Application Allocation in Clustered Edge Computing," IEEE Transactions on Knowledge and Data Engineering, pp. 1–1, 2019.
[32] F. Lin, Y. Zhou, X. An, I. You, and K. R. Choo, "Fair resource allocation in an intrusion-detection system for edge computing: Ensuring the security of internet of things devices," IEEE Consumer Electronics Magazine, vol. 7, no. 6, pp. 45–50, 2018.
[33] R. Burkard, M. Dell'Amico, and S. Martello, Assignment Problems. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2009.
[34] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, "ZooKeeper: Wait-free Coordination for Internet-scale Systems," in USENIX Annual Technical Conference, vol. 8, no. 9, 2010.
[35] D. Trihinas, G. Pallis, and M. D. Dikaiakos, "ADMin: Adaptive monitoring dissemination for the internet of things," in 2017 IEEE Conference on Computer Communications (INFOCOM). IEEE, 2017, pp. 1–9.
[36] M. Mao and M. Humphrey, "A performance study on the VM startup time in the cloud," in 2012 IEEE Fifth International Conference on Cloud Computing. IEEE, 2012, pp. 423–430.

Rustem Dautov holds a PhD in Computer Science from the University of Sheffield, UK. He is a Research Scientist at SINTEF, Norway, where he is involved in several R&D projects at the European and national levels. He has previously been a Postdoctoral Researcher and a Lecturer in IoT at Kazan Federal University, Russia, and a Marie Curie Fellow at SEERC, Greece. His research focuses on software engineering for IoT, Edge and Cloud Computing.

Salvatore Distefano is an Associate Professor at the University of Messina, Italy and a Fellow Professor at Kazan Federal University, Russia. His research interests include Cloud, Fog, Edge computing, IoT, crowd-sourcing, Big Data, software and service engineering, performance and reliability evaluation and QoS. He is involved in several national and international projects. He is a member of international conference committees and journal editorial boards such as IEEE Trans. on Dependable and Secure Computing. He has also co-founded the SmartMe.io startup.
