Internet-of-Things (IoT) Systems: Architectures, Algorithms, Methodologies
Dimitrios Serpanos • Marilyn Wolf
Dimitrios Serpanos
Electrical & Computer Engineering
University of Patras
Patras, Greece

Marilyn Wolf
School of ECE
Georgia Institute of Technology
Atlanta, GA, USA
The Internet of Things is the evolutionary step of the Internet that creates a worldwide infrastructure interconnecting machines and humans. As the Internet became public in the early 1990s, the first wave of its exploitation and deployment focused mainly on its impact on everyday services and applications, changing the established models for financial transactions, shopping, news delivery, and information sharing. It was a revolution that digitized a wide range of services as we knew them, from banking and retail shopping to face-to-face communication and government services. The first two decades of the Internet revolution focused strongly on consumer services and businesses but remained human-centric. New business models appeared for consumers in banking, online shopping, video communication, and more. Business-to-business models and the cloud have had a significant impact on businesses, wiping out large sectors of industry that did not adjust to the fast pace of the revolution. The impact on the economies has been tremendous. Now, more than two decades later, we witness and experience a new way of life because of the Internet’s reach into our homes and work environments.
The advances in communication technology that enabled the deployment and success of the Internet at home and work had an additional effect: the development of sophisticated interconnections among machines in the operational environment. We contrast the operational technology (OT) environment, which controls physical machines, with the information technology (IT) environment, where humans use computers for work. The already automated industrial environment welcomed the emerging technologies, adopted the suitable ones, and created a mostly private network infrastructure that enables highly productive industrial processes. It was only a natural step to evolve the Internet itself to include these processes. Additionally, the control models of the industrial environment, taking advantage of smart devices (i.e., devices that include processing, memory, and networking resources) deployed in various environments, have been extended and used in a wide variety of application domains. Conventional application domains like transportation, aeronautics, energy production and distribution, manufacturing, and health adopt similar control models, exploiting smart sensors, actuators, and devices that enable control automation for sophisticated applications. Critical infrastructure
Chapter 1
The IoT Landscape
The Internet of Things (IoT) has become a common news item and marketing trend.
Beyond the hype, IoT has emerged as an important technology with applications in
many fields. IoT has roots in several earlier technologies: pervasive information
systems, sensor networks, and embedded computing. The term IoT system more
accurately describes the use of this technology than does Internet of Things. Most
IoT devices are connected together to form purpose-specific systems; they are less
frequently used as general-access devices on a worldwide network.
IoT moves beyond pervasive computing and information systems, which con-
centrated on data. Smart refrigerators are one example of pervasive computing
devices. Several products included built-in PCs and allowed users to enter information about the contents of their refrigerator for menu planning. Concept devices would automatically scan the refrigerator contents to take care of data entry. The use cases envisioned for these refrigerators are not far removed from menu-planning applications for stand-alone personal computers.
Sensor network research spanned a range of configurations. Many of these were
designed for data collection at very low data rates. The collected data would then be
sent to servers for processing. Traditional sensor network research did not empha-
size in-network processing.
Embedded computing concentrated on either stand-alone devices or tightly cou-
pled networks such as those used in vehicles. Consumer electronics and cyber-
physical systems were two major application domains for embedded computing;
both emphasized engineered systems with well-defined goals.
Given the wide range of advocates for IoT technology, no single, clear definition
of the term has emerged. We can identify several possibilities:
• Internet-enabled physical devices, although many devices don’t use the Internet
Protocol
1.2 Applications
• Vehicles use networked sensors to monitor the state of the vehicle and provide
improved dynamics, reduced fuel consumption, and lower emissions.
• Medical systems connect a wide range of patient monitoring sensors that may be located at home, in emergency vehicles, at the doctor’s office, or in the hospital.
Use cases help us understand the requirements on an IoT system.
Sensor network: The system may act strictly as a data gathering system for a set of sensors.
Alert system: Data from sensors may be gathered and analyzed. Alerts are generated when particular criteria are met.
Analysis system: Data from sensors is gathered and analyzed, but in this case, the analysis is ongoing. Reports on analytic results may be generated periodically (hourly, daily, etc.) or may be continuously updated.
Reactive system: Analysis of sensor data may cause actuators to be triggered. We reserve the term reactive for systems that don’t implement typical control laws.
Control system: Sensor data is fed to control algorithms that generate outputs for actuators.
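The difference between the alert, reactive, and control use cases can be illustrated on a single temperature stream. The following is a minimal Python sketch, not taken from the text; the threshold, setpoint, gain, and function names are hypothetical values chosen only for illustration.

```python
# Hypothetical parameters chosen for illustration only.
ALERT_THRESHOLD = 30.0   # deg C; alert criterion
SETPOINT = 25.0          # deg C; control target
KP = 0.5                 # proportional gain


def alert_system(samples):
    """Alert system: return the indices of samples that meet the alert criterion."""
    return [i for i, t in enumerate(samples) if t > ALERT_THRESHOLD]


def reactive_system(sample):
    """Reactive system: trigger an actuator by a simple rule, with no control law."""
    return "fan_on" if sample > ALERT_THRESHOLD else "fan_off"


def control_system(sample):
    """Control system: a proportional control law computes the actuator command."""
    error = SETPOINT - sample
    return KP * error  # negative values request cooling, positive values heating


samples = [24.0, 29.5, 31.2, 26.0]
alerts = alert_system(samples)
commands = [reactive_system(t) for t in samples]
outputs = [control_system(t) for t in samples]
```

The reactive rule changes the actuator only when the threshold is crossed, while the control law produces an output proportional to the error at every sample.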
We can identify a class of nonfunctional requirements that apply to many IoT
systems. Nonfunctional requirements on the system impose nonfunctional require-
ments on the components.
Event latency: Latency from the capture of an event to its destination may not be important for batch-oriented applications but becomes important for online analysis.
Event throughput: The rate at which events can be captured, transported, and processed depends on the throughput of the nodes, network bandwidth, and cloud throughput.
Event loss rate and buffer capacity: In the absence of strict upper bounds on event production rates, the environment may produce more events in an interval than the system can process. Event loss rate captures the desired capability, while buffer capacity is a more pragmatic requirement that can be directly tied to component capabilities.
Service latency and throughput: Ultimately, events will be processed by services. We can also specify the latency and throughput for services.
Reliability and availability: Since IoT systems are distributed, reliability is more likely to be specified for parts of the network than for the complete system. Availability is commonly used to describe distributed systems.
Service lifetime: IoT systems are often expected to have longer lifetimes than PC systems. The lifetime of the system or a subset of the system may be considerably longer than that of a component, particularly if the system uses redundant sensors and other components.
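The trade-off between event loss rate and buffer capacity can be explored with a small discrete-time simulation. This is a minimal sketch under assumptions that are ours rather than the text’s: a fixed per-step service rate, first-in first-out processing, and drop-tail behavior when the buffer is full.

```python
from collections import deque


def simulate_buffer(arrivals, service_rate, capacity):
    """Discrete-time model of a node's bounded event buffer (drop-tail, FIFO).

    arrivals[t]  -- events produced by the environment during step t
    service_rate -- events the node can process per step
    capacity     -- maximum number of events waiting in the buffer
    Returns (events_lost, loss_rate).
    """
    buffer = deque()
    lost = total = 0
    for t, n in enumerate(arrivals):
        for _ in range(n):
            total += 1
            if len(buffer) < capacity:
                buffer.append(t)   # store the event, tagged with its arrival step
            else:
                lost += 1          # buffer full: the event is dropped
        for _ in range(min(service_rate, len(buffer))):
            buffer.popleft()       # process the oldest waiting events first
    return lost, (lost / total if total else 0.0)


burst = [0, 10, 0, 0]              # ten events arrive in a single step
print(simulate_buffer(burst, service_rate=2, capacity=4))   # (6, 0.6)
print(simulate_buffer(burst, service_rate=2, capacity=8))   # (2, 0.2)
```

With the same burst and service rate, doubling the buffer from four to eight entries cuts the loss rate from 60% to 20%, which is the sense in which a buffer capacity requirement can stand in for a loss-rate requirement.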
1.3 Architectures
Wireless networks are integral to IoT systems. Wireless network connections simplify the installation and operation of IoT systems, since no network cabling is required.
However, wireless networks introduce some important problems and restric-
tions. Radio communication requires more power than does wired communication.
Some of the wireless networks used in today’s IoT devices were designed for other
purposes, such as telephony and multimedia. As a result, they are not optimized for
event-driven communication and consume significant amounts of power in the com-
munications protocol.
One of the ironies of IoT is that many edge devices and their wireless networks
don’t operate on the Internet Protocol (IP). IP introduces significant overhead with
an extra level of packetization and associated processing. Many IoT devices avoid
IP and rely on upstream nodes to provide them with an Internet presence.
IoT networks are typically run by people who are not computer experts. IoT wireless networks must therefore be easy to deploy and relatively self-managing.
1.5 Devices
Security has finally been recognized as an essential requirement for all types of
computer systems, including IoT systems. However, many IoT systems are much
less secure than typical Windows/Mac/Linux systems. IoT security problems stem
from a range of causes: inadequate security features in hardware, poorly designed
software with a range of vulnerabilities, default passwords, and other security
design errors.
Insecure IoT nodes create problems for the security of the entire IoT system.
Because nodes typically have lifetimes of several years, the large installed base of
insecure devices will create security problems for some time to come.
Insecure IoT systems also cause security problems for the rest of the Internet. Because IoT devices are plentiful, insecure IoT nodes are ideally suited to distributed denial-of-service attacks. The Dyn attack [Sch16] is one example of an IoT-based attack on traditional Internet infrastructure.
Privacy is related to security but requires specific measures at the application,
network, and device levels. Not only must user data be protected from outright theft,
but the network needs to be designed so that less-private data cannot easily be used
to infer more private data.
We believe that the event is a fundamental data type in IoT systems and that event-
driven systems are an important structuring technique for IoT. Many of the building
block technologies used for IoT today show some holdover from traditional,
transaction-oriented systems. Event processing pushes us to treat time as a first-
class concept and to consider the relationship between events in event sequences.
We use the term event more broadly than do simulation engineers. We consider
events as time-value sets. Event-driven system simulation is widely used for model-
ing a wide range of engineering systems. In that context, an event is generally used
to mean a change in the state of a variable. Given the decentralized nature of IoT
systems, we are willing to consider stuttering – the repetition of an event value – as
part of the event model. We also use events to model sampled data and time-series
data. We believe that all these uses of the term event can be unified to create rich
system structures.
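One way to make the time-value view of events concrete is to represent each event as a timestamped value and to treat stuttering explicitly. The sketch below is a hypothetical representation, not one defined in the text: it collapses repeated values so that a sampled stream can also be read in the simulation sense of state changes.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass(frozen=True)
class Event:
    """An event as a (time, value) pair."""
    time: float
    value: Any


def remove_stuttering(events: Iterable[Event]) -> List[Event]:
    """Keep only events whose value differs from the previous event's value.

    A stream of periodic samples may repeat values (stuttering); dropping
    the repeats recovers the view of an event as a change of state.
    """
    result: List[Event] = []
    for e in sorted(events, key=lambda ev: ev.time):
        if not result or e.value != result[-1].value:
            result.append(e)
    return result


samples = [Event(0.0, 20), Event(1.0, 20), Event(2.0, 21), Event(3.0, 21), Event(4.0, 20)]
changes = remove_stuttering(samples)
print([(e.time, e.value) for e in changes])   # [(0.0, 20), (2.0, 21), (4.0, 20)]
```

Keeping both forms available lets the same event model serve time-series analysis (all samples) and state-change reasoning (stutter-free sequence).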
The rest of this book describes a range of topics in IoT systems in more detail:
• Chapter 2 studies IoT system architectures, including wireless networks.
• Chapter 3 considers VLSI IoT devices. It describes the relationship between cost
of ownership, power consumption, and duty cycle.
• Chapter 4 introduces analysis methods for event-driven IoT systems. These anal-
ysis methods allow us to study the memory requirements implied by event com-
munication and processing.
• Chapter 5 describes the Industrial Internet of Things and applications of IoT
systems in smart energy systems.
• Chapter 6 studies security and safety issues in IoT systems. Computer and cyber-
physical system security is closely tied to safety in sensor and closed-loop con-
trol systems.
• Chapter 7 describes fuzz testing, a technique for testing the security of IoT sys-
tems. Bugs and crashes can provide exploits for attackers; fuzz testing is designed
to help identify such problems.
Reference
[Sch16] Schneier, B. (2016, October 22). DDoS attacks against Dyn. Schneier on Security. https://www.schneier.com/blog/archives/2016/10/ddos_attacks_ag.html
Chapter 2
IoT System Architectures
2.1 Introduction
In this chapter, we study architectures for IoT systems. We will study typical com-
ponents used for networks, databases, etc.
Figure 2.1 shows the organization of an IoT system:
• The plant or environment is the physical system with which the IoT system inter-
acts. We will use these two terms interchangeably.
• A set of devices, also called nodes, forms the leaves of the network. A node may include sensors and/or actuators, processors, and memory. Each node has a network interface. A node may or may not run the Internet Protocol.
• Hubs provide first-level connectivity between the nodes and the rest of the network. Hubs typically run IP.
• Fog processors perform operations on local sets of nodes and hubs. Keeping
some servers nearer the nodes reduces latency. However, fog devices may not
have as much compute power as cloud servers. Fog devices also introduce sys-
tem management issues.
• Cloud servers provide computational services for the IoT system. Databases
store data and computational results. The cloud may provide a variety of services
that mediate between nodes and users.
Given the heterogeneous and long-lived nature of most IoT systems, standards are often used rather than custom protocols. Several different protocols have been proposed and, to varying degrees, used for IoT systems [Duf13]. The user community has not yet converged on a single standard for IoT communication services.
Given the prevalence of event-oriented models in IoT systems, a protocol should
support event-style communication.
The HTTP protocol uses a request/response design pattern. A client issues a
request for a hypertext object; the server then replies with the object in response.
A publish/subscribe protocol [Twi11] requires less coupling between the client
and server as illustrated in Fig. 2.2. The server, known as a publisher, classifies mes-
sages into categories. Clients subscribe to the categories of interest to them. Publish/
subscribe systems are typically mediated by brokers, which receive published messages from publishers and send them to subscribers. Messages may be organized by topic; all messages of a given topic are distributed by the brokers to the subscribers for that topic. The broker knows the identities of subscribers, but the publisher does not. Brokers may interact with each other using a bridge protocol. A bridge allows indirect publication of messages, with a message going from the publisher to a first broker, then to a second broker, and finally to subscribers who are not connected to the first broker.
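The decoupling that brokers provide, and the role of a bridge, can be seen in a small in-process model. This sketch is hypothetical and does not follow any particular protocol: topics are plain strings, delivery is synchronous, and the bridge forwards each message a single hop to avoid loops.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class Broker:
    """A toy topic-based publish/subscribe broker with one-hop bridging."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Only the broker tracks subscriber identities, per topic.
        self.subscribers: Dict[str, List[Callback]] = defaultdict(list)
        self.bridges: List["Broker"] = []

    def subscribe(self, topic: str, callback: Callback) -> None:
        self.subscribers[topic].append(callback)

    def bridge_to(self, other: "Broker") -> None:
        self.bridges.append(other)

    def publish(self, topic: str, message: str, _from_bridge: bool = False) -> None:
        # Publishers call this without knowing who, if anyone, is subscribed.
        for callback in self.subscribers[topic]:
            callback(topic, message)
        if not _from_bridge:  # forward one hop only, so bridged brokers cannot loop
            for other in self.bridges:
                other.publish(topic, message, _from_bridge=True)


received = []
b1, b2 = Broker("b1"), Broker("b2")
b1.bridge_to(b2)
# The subscriber is attached only to b2, not to the broker the publisher uses.
b2.subscribe("plant/temp", lambda t, m: received.append((t, m)))
b1.publish("plant/temp", "21.5")
print(received)   # [('plant/temp', '21.5')]
```

Real brokers add persistence, acknowledgments, and wildcard topic matching; the sketch keeps only the routing structure.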
Data Distribution Service (DDS) (http://portals.omg.org/dds/) [Obj16] is a publish/subscribe software architecture; several implementations of DDS are in use. A
DDS domain maintains a logical global data space; the data is managed over a set
of local stores. Publishers and subscribers are dynamically discovered across the
network. Publishers can specify a number of quality of service parameters that are enforced by the DDS middleware.
Real-Time Publish/Subscribe Protocol (RTPS) [Obj14] is a so-called wire proto-
col that defines a protocol for communication with DDS and other publish/sub-
scribe systems. RTPS provides QoS properties, fault tolerance, and type safety.
Esposito et al. [Esp09] developed an architecture for time-sensitive publish/sub-
scribe systems that would be scalable to Internet-sized systems. They identified
three major design goals: predictable latency, guaranteed delivery in the presence of
multiple faults, and continued performance under scaling. They identified several
types of fault models for publish/subscribe systems: network anomalies (loss, order-
ing, corruption, delay, congestion, partitioning), link crash, node crash, and churn of
nodes unexpectedly joining and leaving the system. Their architecture has three
abstraction layers: the network layer consists of domains composed of nodes; the
nodes layer consists of clusters, with each cluster’s members belonging to the same
stub domain; and a coordinators layer. The coordination layer routes messages
using a tree-based topology built on top of a distributed hash table. The coordinator
is p-redundant to provide fault-tolerant coordination. To provide fault-tolerant over-
lays, they formulate a model for path diversity that can be computed with limited
knowledge of the network connections.
Kang et al. [Kan12] used a semantics-aware communication mechanism to
reduce overhead and improve reliability. They use state-space estimators at both the
publisher and subscriber to maintain continuity of sensor values in the presence of
network variations. Their state estimator is of the form $x_{k+1} = F_{k+1} x_k$. The designer
sets a model precision bound δ for each sensor. The bound is used to manage band-
width requirements. Their system also dynamically adjusts the model precision
bound.
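The idea of suppressing updates with a shared model and a precision bound δ can be sketched for a single scalar sensor. This is our simplified illustration, not the implementation of [Kan12]: it assumes a constant scalar F in place of $F_{k+1}$, a fixed δ, and no adaptation of the bound.

```python
def publish_with_model(readings, F=1.0, delta=0.5):
    """Model-based update suppression (hypothetical simplified sketch).

    Publisher and subscriber share the model x_{k+1} = F * x_k. The publisher
    transmits a reading only when the model's prediction misses it by more
    than delta; otherwise both sides advance the model without a message.
    Returns (sent, view): the transmitted (k, value) updates and the value
    the subscriber holds at each step.
    """
    sent, view = [], []
    estimate = None                    # value held identically on both sides
    for k, x in enumerate(readings):
        predicted = None if estimate is None else F * estimate
        if predicted is None or abs(x - predicted) > delta:
            sent.append((k, x))        # prediction error exceeds the bound: transmit
            estimate = x               # both sides resynchronize on the real value
        else:
            estimate = predicted       # both sides follow the model silently
        view.append(estimate)
    return sent, view


readings = [20.0, 20.2, 20.3, 21.0, 21.1]
sent, view = publish_with_model(readings, F=1.0, delta=0.5)
print(sent)   # [(0, 20.0), (3, 21.0)]: two messages instead of five
```

Every value in `view` stays within δ of the true reading, which is how the precision bound trades accuracy for bandwidth.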
Choi et al. [Choi16] combined DDS with the OpenFlow software-defined net-
working protocol to ensure that DDS can implement the QoS parameters. They
added two QoS parameters that could not be easily deduced from the standard DDS
parameters: MINIMUM_SEPARATION and an E2E_LATENCY specified by
subscribers.
We can divide protocols into two major categories: those that are tied to a specific
physical layer and those that are not. Generally speaking, protocols that rely on a
specific physical layer do not use the Internet Protocol, while protocols that are
physical layer agnostic do use IP.
Zigbee [Zig14, Far08] is a mesh network designed for low-power operation. A
variety of derivative application standards specialize the protocol for applications
such as smart homes and utilities. Zigbee is based on the IEEE 802.15.4 PHY and
MAC standards. 802.15.4 operates in three bands: 868 MHz, 915 MHz, and
2.4 GHz. It delivers bit rates from 20 to 250 kbps, depending on the frequency band.
The Zigbee NWK layer sits on top of the 802.15.4 MAC layer and provides data and
management services. The APL layer includes three sections: the application sup-
port sublayer, the Zigbee Device Objects layer, and the application framework.
Zigbee provides two types of network security models: a centralized security
network can be started only by a Zigbee coordinator/trust center; distributed secu-
rity networks do not have a central trust center. Nodes can join either type of net-
work and adapt to the type of network they have joined. Networks are formed by
either coordinators or routers after scanning to select an available channel.
Coordinators form centralized security networks, while routers form distributed
security networks. Network steering is the name for the process by which a node
joins a network. After identifying an open network, the node associates with that
network and receives a network key. Clusters define interfaces for features and
domains.
Bluetooth Low Energy (BLE) (https://www.bluetooth.com/what-is-bluetooth-technology/how-it-works/low-energy) [Hay13] is a part of the Bluetooth standard
designed for low-power operation such as devices powered from coin cell batteries.
A BLE device can work as a transmitter, receiver, or both. Figure 2.3 illustrates the
Bluetooth Classic protocol stack.
The link layer provides an advertising service; devices can scan to identify nodes
and networks. Devices can act as gateways to the Internet based on network address
translation. The BLE protocol is stateful. BLE includes a number of optimizations
to reduce power consumption.
LoRa (http://lora-alliance.org) [LoR15] is designed for wide-area IoT applications with a base station covering hundreds of square kilometers. It is designed to
support a network topology with gateways for end devices, with gateways organized
into their own star network. Data rates range from 0.3 to 50 kbps.
MQTT (http://www.mqtt.org) [IBM12, Oas14] is an IoT-oriented protocol with publish/subscribe semantics. The protocol is designed for low overhead and is agnostic to the data payload. MQTT provides three levels of quality of service: at most once (QoS 0) provides best-effort service, at least once (QoS 1) assures delivery but may incur duplicates, and exactly once (QoS 2) ensures the message is delivered without duplication.
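The differences among the three QoS levels show up when a transmission or an acknowledgment is lost. The following sketch is a deliberately simplified model of the delivery guarantees, not the MQTT packet exchange itself (for example, it does not model the PUBREC/PUBREL/PUBCOMP handshake MQTT uses for exactly-once delivery); the `lost_sends` and `lost_acks` parameters are hypothetical knobs for injecting failures.

```python
def deliver(messages, qos, lost_sends=frozenset(), lost_acks=frozenset()):
    """Return what a receiving application sees under a simplified QoS model.

    messages   -- list of (packet_id, payload)
    lost_sends -- packet ids whose first transmission is dropped
    lost_acks  -- packet ids whose first acknowledgment is dropped
    QoS 0 sends once; QoS 1 retransmits until acknowledged; QoS 2 also
    retransmits, but the receiver discards duplicates by packet id.
    """
    delivered, seen_ids = [], set()
    for pid, payload in messages:
        attempt = 0
        while True:
            arrived = not (attempt == 0 and pid in lost_sends)
            if arrived and (qos < 2 or pid not in seen_ids):
                delivered.append(payload)
                seen_ids.add(pid)
            if qos == 0:
                break                      # fire and forget: never retry
            acked = arrived and not (attempt == 0 and pid in lost_acks)
            if acked:
                break
            attempt += 1                   # no acknowledgment: retransmit
    return delivered


msgs = [(1, "a"), (2, "b"), (3, "c")]
for qos in (0, 1, 2):
    print(qos, deliver(msgs, qos, lost_sends={2}, lost_acks={3}))
# 0 ['a', 'c']            message b is lost
# 1 ['a', 'b', 'c', 'c']  b is retried; c is duplicated after its ack is lost
# 2 ['a', 'b', 'c']       retries plus duplicate suppression
```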
REST collections. The system is divided into a data plane, which moves messages, and a control plane, which assigns publishers and subscribers to data plane servers. Control plane servers are known as routers; data plane servers are known as forwarders. The routers balance consistency and uniformity of data using a consistent hashing algorithm. A message life cycle includes several steps. When a publisher sends a message, it is written to storage. The subscribers receive the message,
and the publisher receives an acknowledgment. Subscribers acknowledge the mes-
sage to Google Cloud Pub/Sub. The message is deleted from storage once at least
one subscriber for each subscription has acknowledged the message. The system
monitors itself to detect and mitigate service problems.
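Consistent hashing lets a pool of servers grow or shrink while reassigning only a small fraction of keys. The text does not describe the specific variant used by Google Cloud Pub/Sub, so the following is a generic ring-based sketch with virtual nodes; the server and topic names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() of str is salted per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Map keys (e.g., topics) to servers (e.g., forwarders) on a hash ring."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas   # virtual nodes per server, to even out the load
        self._ring = []            # sorted list of (hash, server) points
        for s in servers:
            self.add(s)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def lookup(self, key):
        h = _hash(key)
        i = bisect.bisect(self._ring, (h,))   # first ring point at or after h
        return self._ring[i % len(self._ring)][1]   # wrap around the ring


ring = ConsistentHashRing(["fwd-a", "fwd-b", "fwd-c"])
print(ring.lookup("plant/temp"))
```

When a server is removed, only the keys that mapped to its ring points move; every other key keeps its assignment, which is the consistency property the routers rely on.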
Amazon Web Services (AWS) IoT [Bar15] is a managed cloud service for IoT
devices, which are termed things. A thing shadow is a cloud model of a thing. A rule
engine transforms messages based on rules and routes the results to AWS services.
The message broker is based on MQTT. A Thing Registry assigns a unique identity to each thing.
Microsoft Azure (https://azure.microsoft.com/en-us/services/iot-hub/) provides
IoT-oriented services. Its Service Fabric is a middleware communication system
that supports microservices running on a cluster. A microservice may be either
stateless or stateful. It also provides a container model for applications; a container
provides an isolated environment but relies on the operating system, in contrast to a
virtual machine which runs underneath the operating system. It provides databases
using both structured and unstructured approaches. It also provides APIs for artifi-
cial intelligence services.
2.4 Databases
Databases are used for both short-term and long-term storage. Applications may
rely on databases to retrieve data over a time window for analysis. Some use cases
may require archival storage of values.
Unstructured databases, known as noSQL, are used in many IoT systems. A
noSQL database does not have a schema. Simple noSQL databases represent data
as key-value pairs, but other representations are possible. The lack of a schema
allows quick deployment but may cause maintenance problems.
The Amazon Simple Storage Service (Amazon S3) (https://aws.amazon.com/s3/) is an object store with a Web service interface. Data can be pushed to other, lower-cost storage services for long-term, infrequent use. Notifications can be issued when objects are operated upon.
Google Cloud Storage (https://cloud.google.com/storage) is an object store for unstructured data. It provides three different service models at different latency/price points. Cloud SQL can be used to perform database operations. Streaming transfers are supported using HTTP chunked transfer encoding.
Time-series data possesses structure that may require special handling to provide
proper database performance. Time series are sometimes stored as blobs in rela-
tional databases to allow specialized algorithms.
Dynamic time warping (DTW) [Rat04, Rak12] is widely used to search over
time-series data. DTW was originally used to compare waveforms for speech pro-
cessing. Correlation provides a direct comparison of two waveforms. By warping
one waveform, non-exact matches can be found. Dynamic programming can be
used to find the minimum warp match between two time series; a limit on maximum warping is typically applied to avoid obviously bad matches. Very efficient
algorithms have been developed to provide high-speed search. Among other tech-
niques, these algorithms abandon a warp computation early when partial results
exceed a given bound. Fast DTW algorithms have been used to search very large
databases.
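The dynamic programming recurrence, the warping limit, and early abandoning can be combined in a few lines. This is a minimal sketch, assuming a squared-difference point cost and a Sakoe-Chiba style band for the warping limit; the fast search algorithms of [Rak12] add further techniques, such as lower bounds, that are not shown here.

```python
import math


def dtw(a, b, window=None, bound=math.inf):
    """Dynamic time warping cost between sequences a and b.

    window -- maximum warping (band half-width); None means unconstrained
    bound  -- early-abandon threshold: if every cell of a row exceeds it,
              no warping path can finish below it, so return math.inf
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide so the end cell stays reachable.
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three neighboring partial paths.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        if min(cur) > bound:
            return math.inf   # abandon: costs only grow along any path
        prev = cur
    return prev[m]


print(dtw([0, 1, 2], [1, 2, 3]))             # 2.0: warping absorbs the shift
print(dtw([0, 1, 2], [1, 2, 3], window=0))   # 3.0: no warping allowed
```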
Many IoT systems require a notion of global time. Several algorithms, starting with
Lamport’s algorithm [Lam78], have been developed for the synchronization of
clocks in a distributed system.
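The logical-clock rules from [Lam78] order events without any shared physical clock. A minimal sketch of those rules follows; note that logical clocks capture causal ordering rather than synchronizing physical time, and the class and method names are our own.

```python
class LamportClock:
    """Lamport logical clock: timestamps consistent with causal order."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1               # every local event advances the clock
        return self.time

    def send(self):
        self.time += 1               # sending is an event; the timestamp rides on the message
        return self.time

    def receive(self, msg_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


node_a, node_b = LamportClock(), LamportClock()
node_a.local_event()
t_send = node_a.send()
t_recv = node_b.receive(t_send)
print(t_send, t_recv)   # 2 3
```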
The Network Time Protocol (RFC1305) is used on the Internet for distributed
time synchronization.
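NTP estimates a client’s clock offset from the four timestamps of a single request/response exchange. The sketch below applies the standard offset and delay formulas under NTP’s assumption of symmetric network delay; the function name and the millisecond example values are hypothetical.

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """Clock offset and round-trip delay from one NTP-style exchange.

    t0 -- client transmit time (client clock)
    t1 -- server receive time  (server clock)
    t2 -- server transmit time (server clock)
    t3 -- client receive time  (client clock)
    Assumes the network delay is the same in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # how far the server clock is ahead of the client
    delay = (t3 - t0) - (t2 - t1)          # round trip minus server processing time
    return offset, delay


# Client clock 5000 ms behind the server, 100 ms each way, 20 ms at the server.
print(ntp_offset_delay(100_000, 105_100, 105_120, 100_220))   # (5000.0, 200)
```

If the two directions have different delays, half of the asymmetry appears as error in the offset, which is why NTP implementations filter many samples and favor those with the smallest delay.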
2.6 Security
Security is a system property; the system can be only as secure as its weakest com-
ponent. Security features are provided by components at several layers in the IoT
stack: devices, physical networks, and middleware. A unified view of IoT system
security architectures has not yet emerged.
Some, but not all, processors for low-power operation provide security features such as encryption accelerators and a root of trust. The National Security Agency has
developed families of lightweight block ciphers [Sch13]: SIMON targets hardware
implementations, and SPECK is intended for software implementations. Gulcan
et al. [Gul14] developed a low-power implementation of SIMON.
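To show why SPECK suits software, the following is a minimal Python sketch of its smallest variant, Speck32/64, which uses only modular addition, rotation, and XOR on 16-bit words. It is an illustration only, not a vetted or constant-time implementation, and must not be used to protect real data.

```python
MASK16 = 0xFFFF
ALPHA, BETA, ROUNDS = 7, 2, 22   # Speck32/64: 16-bit words, 4 key words, 22 rounds


def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK16


def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK16


def expand_key(k0, l_words):
    """Key schedule: k0 and l_words = [l0, l1, l2] yield 22 round keys."""
    keys, l = [k0], list(l_words)
    for i in range(ROUNDS - 1):
        # The key schedule reuses the round function, with i as the round key.
        l.append(((keys[i] + ror(l[i], ALPHA)) & MASK16) ^ i)
        keys.append(rol(keys[i], BETA) ^ l[-1])
    return keys


def encrypt(x, y, keys):
    for k in keys:
        x = ((ror(x, ALPHA) + y) & MASK16) ^ k
        y = rol(y, BETA) ^ x
    return x, y


def decrypt(x, y, keys):
    for k in reversed(keys):     # undo each round in reverse order
        y = ror(y ^ x, BETA)
        x = rol(((x ^ k) - y) & MASK16, ALPHA)
    return x, y


keys = expand_key(0x0100, [0x0908, 0x1110, 0x1918])
ct = encrypt(0x6574, 0x694C, keys)
assert decrypt(*ct, keys) == (0x6574, 0x694C)
```

The designers’ published Speck32/64 test vector (key 1918 1110 0908 0100, plaintext 6574 694c, ciphertext a868 42f2) provides a quick check of the word ordering used here.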
Several networks provide security features. Bluetooth Low Energy provides a
Simple Secure Pairing protocol to protect against passive eavesdropping. It also
provides address randomization. As discussed above, Zigbee provides two network
security models: centralized and distributed. LoRa provides unique network keys,
unique application keys, and device-specific keys.
MQTT does not specifically require encryption, but it can be used with several different security standards. The OASIS committee note MQTT and the NIST Cybersecurity Framework [Oas14B] describes the relationship between MQTT and the NIST Framework for Improving Critical Infrastructure Cybersecurity.
We will study IoT system security in more detail in Chap. 6.
References
[Bar15] Barr, J. (2015, October 8). AWS IoT: Cloud services for connected devices. AWS Blog. https://aws.amazon.com/blogs/aws/aws-iot-cloud-services-for-connected-devices/
[Choi16] Choi, H.-Y., King, A. L., & Lee, I. (2016). Making DDS really real-time with OpenFlow.
2016 international conference on embedded software (EMSOFT) (pp. 1–10). Pittsburgh, PA.
[Duf13] Duffy, P. (2013, April 30). Beyond MQTT: A Cisco view on IoT protocols. Cisco Blogs. https://blogs.cisco.com/digital/beyond-mqtt-a-cisco-view-on-iot-protocols
[Esp09] Esposito, C., Cotroneo, D., & Gokhale, A. (2009). Reliable publish/subscribe middleware for time-sensitive internet-scale applications. Proceedings of the third ACM international conference on distributed event-based systems (DEBS’09). ACM, New York, Article 16, 12 pages.
[Far08] Farahani, S. (2008). Zigbee wireless networks and transceivers. Amsterdam: Newnes.
[Goo17A] Google. (2017, April 19). What is Google Cloud Pub/Sub? https://cloud.google.com/pubsub/docs/overview
[Goo17B] Google. (2017, April 3). Google Cloud Pub/Sub: A Google-scale messaging service. https://cloud.google.com/pubsub/architecture
[Gul14] Gulcan, E., Aysu, A., & Schaumont, P. (2015). A flexible and compact hardware archi-
tecture for the SIMON block cipher. In T. Eisenbarth & E. Öztürk (Eds.), Lightweight cryptog-
raphy for security and privacy. LightSec 2014, Lecture Notes in Computer Science (Vol. 8898,
pp. 34–50). Cham: Springer.
[Hay13] Heydon, R. (2013). Bluetooth low energy: The developer’s handbook. Upper Saddle River, NJ: Prentice Hall.
[IBM12] IBM International Technical Support Organization (2012, September). Building smarter
planet solutions with MQTT and IBM WebSphere MQ telemetry, Redbooks.
[IET14] Internet Engineering Task Force (2014, June). The constrained application protocol
(CoAP), RFC 7252, Shelby, Z., Hartke, K., & Bormann, C.
[Kan12] Kang, W., Kapitanova, K., & Son, S. H. (2012). RDDS: A real-time data distribu-
tion service for cyber-physical systems. IEEE Transactions on Industrial Informatics, 8(2),
393–405.
[Lam78] Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system.
Communications of the ACM, 21(7), 558–565.
[LoR15] LoRa Alliance (2015, November). LoRaWAN: What is it? A technical overview of LoRa
and LoRaWAN.
[Oas14] Oasis. (2014, 29). MQTT version 3.1.1. Oasis standard.
[Oas14B] Oasis. (2014, May 28). MQTT and the NIST cybersecurity framework version 1.0. Committee note 01.
[Obj14] Object Management Group. (2014). The real-time publish-subscribe protocol (RTPS)
DDS interoperability wire protocol specification, Version 2.2.
[Obj16] Object Management Group. (2016). What is DDS? http://portals.omg.org/dds/what-is-dds-3/, accessed May 4, 2017.
[Sch13] Schneier, B. SIMON and SPECK: New NSA encryption algorithms. Schneier on Security. https://www.schneier.com/blog/archives/2013/07/simon_and_speck.html, retrieved May 8, 2017.
[Vaq14] Vaqqas, M. (2014, September 23). RESTful web services: A tutorial. Dr. Dobb’s. http://www.drdobbs.com/web-development/restful-web-services-a-tutorial/240169069
[Rak12] Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., & Keogh, E. (2012). Searching and mining trillions of time series subsequences under dynamic time warping. Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’12) (pp. 262–270). ACM, New York.
[Rat04] Ratanamahatana, C. A., & Keogh, E. (2004, August 22–25). Everything you know about dynamic time warping is wrong. Third workshop on mining temporal and sequential data, in conjunction with the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-2004). Seattle, WA.
[Rod15] Rodriguez, A. (2008, November 6). RESTful web services: The basics. IBM developerWorks, updated February 9, 2015. https://www.ibm.com/developerworks/library/ws-restful/index.html
[Twi11] Twin Oaks Computing, Inc. (2011). What can DDS do for you?
[Zig14] Zigbee Alliance (2014, December 2). ZigBee 3.0: The open, global standard for the Internet of Things. http://www.zigbee.org/zigbee-for-developers/zigbee/
Chapter 3
IoT Devices
The design space for IoT devices is very different from that for mobile or cloud
processors. Both mobile and cloud systems require very large chips. IoT devices must operate at extremely low power levels but often do not operate continuously. They must integrate processors, memory and storage, communication, and sensors. They will also be sold in quantities that dwarf even those of mobile processors; such volumes in turn require a very low price. Purchase price is, however, only one component of the IoT device cost model. Total cost of ownership will drive many IoT
markets – these devices will be installed for use over a lifetime of several years.
Installation cost is an important element in the decision to purchase and install these
devices. We will see that cost of ownership is directly tied to power consumption.
The sensors and MEMS communities have long been interested in IoT as an
application for integrated sensors and actuators. Many commentators have called
for a trillion sensor world. This goal is in fact very realistic given current industry
capabilities. According to Semi.org [Die16], worldwide manufacturing capacity for
200 mm wafers is expected to be 5.4 million wafers per month in 2018. If all this
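capacity is used for IoT, it translates to roughly 170 billion chips per month of size 1 mm² (a 200 mm wafer has an area of about π × (100 mm)² ≈ 31,400 mm², before edge loss and yield), or about 17 billion per month of 10 mm² chips. At roughly two trillion 1 mm² chips per year, that capacity puts the industry within range of producing a trillion sensors per year. We could reach the trillion sensors per year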
mark simply by reallocating existing capacity. Even if production does not com-
pletely reach the trillion sensor mark, the industry can clearly manufacture huge
volumes of sensors.
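The capacity argument is simple area arithmetic. The sketch below assumes that 200 mm is the wafer diameter (a 100 mm radius) and ignores edge loss, scribe lines, and yield, so it gives an upper bound on die sites rather than a production forecast. Treating 200 mm as the radius instead would give about 678 billion 1 mm² sites per month.

```python
import math

WAFERS_PER_MONTH = 5.4e6      # 200 mm wafer capacity figure cited from [Die16]
WAFER_DIAMETER_MM = 200.0

def die_sites_per_month(die_area_mm2: float,
                        wafers_per_month: float = WAFERS_PER_MONTH,
                        diameter_mm: float = WAFER_DIAMETER_MM) -> float:
    """Upper bound on dies per month: total wafer area divided by die area.

    Ignores edge loss, scribe lines, and yield, so real counts are lower.
    """
    wafer_area_mm2 = math.pi * (diameter_mm / 2) ** 2
    return wafers_per_month * wafer_area_mm2 / die_area_mm2

for area in (1.0, 10.0):
    monthly = die_sites_per_month(area)
    print(f"{area:>4} mm^2: {monthly / 1e9:6.1f} billion/month, "
          f"{12 * monthly / 1e12:5.2f} trillion/year")
```

Even with these simplifications, 1 mm² dies give about two trillion sites per year, which is why the trillion-sensor target is within reach.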
Lifetime cost of ownership is a key metric for IoT devices [Wol16]. The cost of IoT
silicon includes several components: sensing and actuation, computation, and
networking, as well as packaging. The completed IoT device also includes a power
supply and packaging. However, installation cost is a significant factor in the cost
of ownership. The cost of installing a cable drop in an existing building in the USA is, in the
authors’ experience, around $150. That cost overwhelms the cost of hardware.
Eliminating all wiring – both power and networking – substantially reduces instal-
lation cost. The cost of replacing batteries is significant. Our colleague Rajesh
Gupta reported that the computer science building at University of California, San
Diego, requires a full-time employee to replace batteries on electronic door locks
(Rajesh Gupta, personal communication, February 2014). The ability to power
devices entirely by energy harvesting would eliminate that cost but imposes con-
straints on the devices.
The high cost and effort of wired power have encouraged the development of
energy-scavenging (also known as energy-harvesting) technologies. A range of
physical mechanisms can be used to convert energy from the environment into usable electrical power.
Since most scavenging sources provide varying amounts of power, the harvested
energy is stored for later use. Electric power may be stored in a battery, a capacitor,
or a supercapacitor. On-chip power management circuitry stores harvested energy
and then regulates the power as it is used by the rest of the chip.
Paradiso and Starner [Par05] identified several widely different sources of
energy, including radio frequency, ambient light, thermoelectricity, and heel strikes.
They pointed out that indoor lighting provides much lower ambient light levels than
are available from the sun. Sudevalayam and Kulkarni [Sud11] surveyed energy-
harvesting technologies for sensor nodes. They identified a range of technologies
with different sources, conversion efficiencies, and harvest yield. They reported, for
example, that solar cells converting light typically provided 15 mW/cm², an
anemometer harvesting wind provided 1200 mWh/day, and footfalls provided 5 W.
Romani et al. [Rom17] survey power conversion and management architectures
for ambient-powered IoT devices. Their reference architecture for a no-battery
power management system includes several components. A transducer extracts
power from an external power source with efficiency η. Several sources of internal
power consumption further limit the overall system efficiency: power control
circuitry consumes intrinsic power $P_{int}$; the storage element leaks power at a rate
$P_{leak}$; monitor circuits consume $P_{vmon}$. A bootstrap circuit may be used to initialize the
system from a discharged state. They note that a key challenge of the power management
controller is to match the effective load impedance to the power source’s internal
impedance.
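One way to read this reference architecture is as a power budget: the load can only draw what the transducer delivers after the power manager's own losses are paid. The sketch below assumes those losses simply subtract from η times the source power, which is our simplification rather than Romani et al.'s full model. All numeric values except the 1200 mWh/day anemometer figure from [Sud11] are hypothetical. The helper that converts a daily energy yield into an average power lets figures reported in different units be compared.

```python
from dataclasses import dataclass

@dataclass
class HarvesterBudget:
    """Simplified power budget for a battery-less power manager (hypothetical values)."""
    p_source_w: float   # power available from the ambient source
    eta: float          # transducer/conversion efficiency, 0..1
    p_int_w: float      # intrinsic consumption of the power-control circuitry
    p_leak_w: float     # leakage of the storage element
    p_vmon_w: float     # voltage-monitor consumption

    def load_power_w(self) -> float:
        """Average power left for the load; negative means the node cannot sustain itself."""
        return self.eta * self.p_source_w - self.p_int_w - self.p_leak_w - self.p_vmon_w

    def system_efficiency(self) -> float:
        """Fraction of source power that actually reaches the load."""
        return max(self.load_power_w(), 0.0) / self.p_source_w

def mwh_per_day_to_mw(mwh_per_day: float) -> float:
    """Convert a daily energy yield (mWh/day) to an average power (mW)."""
    return mwh_per_day / 24.0

budget = HarvesterBudget(p_source_w=100e-6, eta=0.8, p_int_w=5e-6,
                         p_leak_w=2e-6, p_vmon_w=1e-6)
print(budget.load_power_w())       # about 7.2e-05 W
print(budget.system_efficiency())  # about 0.72
print(mwh_per_day_to_mw(1200))     # 50.0 mW average for the anemometer figure
```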
3.3 Cost per Transistor and Chip Size
$$C_{tr} = \frac{C_m}{n_{tr}}.$$
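A minimal sketch of this cost model, reading C_m as the cost of processing one wafer and n_tr as the number of transistors per wafer, as the following paragraph suggests. The wafer cost, transistor count, and cost-growth factors are hypothetical.

```python
def cost_per_transistor(wafer_cost: float, transistors_per_wafer: float) -> float:
    """C_tr = C_m / n_tr."""
    return wafer_cost / transistors_per_wafer

def next_node_ratio(wafer_cost_growth: float, density_growth: float = 2.0) -> float:
    """Ratio of next-generation to current cost per transistor.

    density_growth is the transistors-per-wafer multiplier (2x in the classic
    Moore's Law scenario); wafer_cost_growth is the processing-cost multiplier.
    A ratio below 1 means transistors became cheaper.
    """
    return wafer_cost_growth / density_growth

base = cost_per_transistor(wafer_cost=3000.0, transistors_per_wafer=1e12)
for growth in (1.3, 2.0, 2.5):
    print(growth, base * next_node_ratio(growth))
```

A wafer-cost increase smaller than the 2x density gain lowers cost per transistor; an increase at or above it does not.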
In the standard Moore’s Law scenario, we expect the number of transistors per
wafer to double from one generation to the next. If the cost of processing the wafer
increases by less than that factor, cost per transistor goes down; if not, cost per
transistor stays flat or rises.
3.4 Duty Cycle and Power Consumption
The duty cycle model is widely used to analyze IoT devices. As shown in Fig. 3.2,
the model assumes periodic activation of the device. The duty cycle is the
percentage of time for which the device is on:
$$D = \frac{O}{T} \times 100\%.$$
Lower duty cycles mean lower energy consumption. We can change the duty
cycle through a combination of changes to the operating time O and the period T.
Reducing the operating time may reduce the device’s functionality; increasing its
period lowers its data rate.
Let the on-state power consumption of the device be $P_{on}$. If we assume zero
leakage, then the power consumption under duty cycle operation is
$$P_{ideal} = P_{on} \frac{O}{T}.$$
If the device has a leakage power of $P_{off}$, then its average power consumption
over the duty cycle is
$$P_{leak} = \frac{O}{T} P_{on} + \left(1 - \frac{O}{T}\right) P_{off}.$$
We can also solve for fractional duty cycle as a function of on-state and off-state
power and total power consumption:
$$\frac{O}{T} = \frac{P_{leak} - P_{off}}{P_{on} - P_{off}}.$$
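The three duty-cycle formulas translate directly into code. The sketch below uses hypothetical device numbers (10 mW on, 5 µW asleep, 10 ms awake every second) and inverts the leakage-aware formula to find the largest duty fraction that fits a given average-power budget.

```python
def duty_fraction(on_time: float, period: float) -> float:
    """O/T; multiply by 100 for the duty cycle D in percent."""
    if not 0 <= on_time <= period:
        raise ValueError("require 0 <= O <= T")
    return on_time / period

def p_ideal(p_on: float, on_time: float, period: float) -> float:
    """Average power assuming zero off-state leakage."""
    return p_on * duty_fraction(on_time, period)

def p_leak(p_on: float, p_off: float, on_time: float, period: float) -> float:
    """Average power when the device draws p_off while idle."""
    d = duty_fraction(on_time, period)
    return d * p_on + (1 - d) * p_off

def max_duty_fraction(p_budget: float, p_on: float, p_off: float) -> float:
    """Largest O/T whose average power stays within p_budget (inverts p_leak)."""
    return (p_budget - p_off) / (p_on - p_off)

# Hypothetical device: 10 mW on, 5 uW asleep, awake 10 ms out of every 1 s.
print(p_ideal(10e-3, 0.01, 1.0))               # about 1e-4 W
print(p_leak(10e-3, 5e-6, 0.01, 1.0))          # about 1.05e-4 W
print(max_duty_fraction(100e-6, 10e-3, 5e-6))  # duty fraction allowed by a 100 uW budget
```

A larger p_off, for example from state-retention power, leaves less of a fixed budget for the on-state and so shrinks the allowed duty fraction.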
This model carries several implications for the design of IoT devices: the device
must be good at idling at low power, and it should require little energy and time to
shut down and to turn back on.
Communication power is a large fraction of the total power consumption of
many IoT devices. Many IoT devices transmit small amounts of data during the on
portion of their duty cycle. In this scenario, the overhead associated with setting up
a communication is a significant part of the total communication power; many com-
munication systems are designed for connection-oriented service that allows setup
costs to be amortized over a longer communication.
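The amortization argument can be made concrete with a fixed per-connection setup energy plus a per-byte cost. Both energy values below are hypothetical placeholders, not measurements from [Dem13].

```python
def energy_per_payload_byte(setup_j: float, per_byte_j: float, payload_bytes: int) -> float:
    """Energy per useful byte when every transmission pays a fixed setup cost."""
    if payload_bytes <= 0:
        raise ValueError("payload must be positive")
    return (setup_j + per_byte_j * payload_bytes) / payload_bytes

# Hypothetical costs: 50 uJ to wake the radio and connect, 0.5 uJ per byte sent.
for n in (8, 64, 512):
    print(n, energy_per_payload_byte(50e-6, 0.5e-6, n))
```

For an 8-byte sensor reading, setup dominates (6.75 µJ per byte here); the per-byte cost only approaches the 0.5 µJ floor for payloads far larger than a typical IoT report.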
Dementyev et al. [Dem13] measured the power consumption of several wireless
protocols. They used their data to determine the optimal period T for each protocol:
14.3 s for Zigbee and 10.0 s for Bluetooth Low Energy (BLE).
22 3 IoT Devices
Unlike mobile devices, most IoT devices do not operate continuously. Nonetheless,
they need to retain state from activation for a range of purposes: communication
status, DSP filtering, etc. SRAM requires power to retain state, which raises the
off-state power and thereby constrains the allowable duty cycle. Flash memory must be written in blocks. Emerging
technologies offer the promise of bit-level persistent-state devices that can be used
within the processor, not just as memory.
Soeken et al. [Soe17] developed a programmable logic-in-memory (PLiM) computer
using resistive RAM (RRAM) devices. An RRAM device has persistent state – it
can be written and retains its state after the power supply is removed – making it
well suited to the duty cycle characteristics of IoT devices. They designed their
processor to take advantage of the majority-logic characteristics of RRAMs. They
developed a compiler to translate Boolean functions into instruction streams for
their processor.
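Majority logic is attractive because the three-input majority function M(x, y, z), together with constants, can express AND and OR: M(a, b, 0) = a AND b and M(a, b, 1) = a OR b. The sketch below models an in-memory step in which a destination cell z is overwritten with M(x, NOT y, z). We understand this resistive-majority primitive to underlie the PLiM design, but the instruction semantics here are an assumption for illustration, not the exact instruction set or compiler of [Soe17].

```python
from itertools import product
from typing import Dict

def maj(x: int, y: int, z: int) -> int:
    """Three-input majority: 1 when at least two inputs are 1."""
    return 1 if x + y + z >= 2 else 0

def rm3(memory: Dict[str, int], x: str, y: str, z: str) -> None:
    """Assumed PLiM-style in-memory step: cell z <- M(x, NOT y, z)."""
    memory[z] = maj(memory[x], 1 - memory[y], memory[z])

def in_memory_and(a: int, b: int) -> int:
    # Preload z with b and use a constant-1 cell as y: M(a, NOT 1, b) = M(a, 0, b) = a AND b.
    mem = {"a": a, "one": 1, "z": b}
    rm3(mem, "a", "one", "z")
    return mem["z"]

def in_memory_or(a: int, b: int) -> int:
    # Use a constant-0 cell as y: M(a, NOT 0, b) = M(a, 1, b) = a OR b.
    mem = {"a": a, "zero": 0, "z": b}
    rm3(mem, "a", "zero", "z")
    return mem["z"]

for a, b in product((0, 1), repeat=2):
    print(a, b, in_memory_and(a, b), in_memory_or(a, b))
```

Because the result overwrites a memory cell in place, the computed value persists in nonvolatile state, which is what makes this style of logic fit duty-cycled operation.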
3.6 Summary
IoT systems open up a new horizon for VLSI design. IoT systems require ultra-low
power systems that combine disparate elements – computation, communication,
and sensing – at very low price points. IoT systems emphasize small, capable chips
in contrast to the large chips that have driven the industry for many years. We are at
the early stages in the development of this new category of chip.
References
[Dem13] Dementyev, A., Hodges, S., Taylor, S., & Smith, J. (2013). Power consumption analysis
of Bluetooth Low Energy, ZigBee and ANT sensor nodes in a cyclic sleep scenario. Wireless
Symposium (IWS), 2013 IEEE International, Beijing, 2013, pp. 1–4.
[Die16] Dieseldorf, C. G. (2016). Foundries Take Over 200mm Capacity Fab by 2018. www.
semi.org, January 25, 2016.
[Hru12] Hruska, J. (2012). Nvidia deeply unhappy with TSMC, claims 20 nm essentially
worthless. extremetech.com, http://www.extremetech.com/computing/123529-nvidia-deeply-unhappy-with-tsmc-claims-22nm-essentially-worthless, March 23, 2012.
[Mal94] Maly, W. (1994). Cost of Silicon Viewed from VLSI Design Perspective. Design
Automation, 1994. 31st Conference on, San Diego, CA, USA, 1994, pp. 135–142.
[Par05] Paradiso, J. A., & Starner, T. (2005). Energy scavenging for mobile and wireless
electronics. IEEE Pervasive Computing, 4(1), 18–27.
[Ral16] Ralston, P., Fry, D., Suko, S., Winters, B., King, M., & Kober, R. (2016). Defeating
counterfeiters with microscopic dielets embedded in electronic components. Computer, 49(8),
18–26.
References 23
[Rom17] Romani, A., Tartagni, M., & Sangiorgi, E. (2017). Doing a lot with a little: Micropower
conversion and management for ambient-powered electronics. Computer, 50(6), 41–49.
[Soe17] Soeken, M., Gaillardon, P.-E., Shirinzadeh, S., Drechsler, R., & De Micheli, G. (2017).
A PLiM computer for the internet of things. Computer, 50(6), 35–40.
[Sud11] Sudevalayam, S., & Kulkarni, P. (2011, Third Quarter). Energy harvesting sensor nodes:
Survey and implications. IEEE Communications Surveys & Tutorials, 13(3), 443–461.
[Whi15] White, M. (2015). IoT, Cost-per-Transistor Extend Lifetimes of Established
Technology Nodes. Electronic Design, May 15, 2015, http://electronicdesign.com/eda/iot-cost-transistor-extend-lifetimes-established-technology-nodes
[Wik16A] Wikipedia. (2016). IBM Personal Computer. https://en.wikipedia.org/wiki/IBM_Personal_Computer.
Accessed October 19, 2016.
[Wik16B] Wikipedia. (2016). Intel 8088. https://en.wikipedia.org/wiki/Intel_8088. Accessed
October 19, 2016.
[Wik16C] Wikipedia. (2016). Transistor count. https://en.wikipedia.org/wiki/Transistor_count.
Accessed October 19, 2016.
[Wol16] Wolf, M. (2016). Ultralow power and the new era of not-so-VLSI. IEEE Design & Test,
33(4), 109–113.
[Wol17] Wolf, M. (2017). The physics of computing. Cambridge, MA: Elsevier.
Chapter 4
Event-Driven System Analysis
4.1 Introduction
This chapter describes modeling and analysis methods for Internet of Things (IoT)
system design. IoT systems require new types of analysis because events do not
necessarily result in immediate actions or maintain their order relative to other
events.
Traditional methods such as the distributed control-oriented methods of Thiele
and Ernst consider possibly infinite streams of events or samples, but the lifetime of
an event/sample in the system is relatively short. In contrast, IoT systems must deal
with event lifetimes at multiple time scales: some events may schedule activity only
seconds in the future, while other events may schedule activity days, weeks, or
months ahead. IoT systems also do not maintain the temporal order of causality – one event may
cause an event in the near future, while another event may cause an event in the far
future. We need new analytical methods for multiple time scales and complex cau-
sality relationships.
The primary goal of our analysis is to understand the required characteristics
of the IoT platform. We propose a model of the IoT system as a network with
devices as leaf nodes and hubs as non-leaf nodes. Hubs perform routing functions
but for our purposes their key role is to control the timing of event activity through
the use of timewheels. While we assume that events carry key-value pairs, we are
not concerned here with the semantics of events. Instead, we analyze the lifetimes
of event populations. Event populations depend in part on the activity of the envi-
ronment in which the IoT operates. To accommodate a wide range of realistic sce-
narios, we develop models based on both deterministic and stochastic event
timing.
Several lines of work have established event-based models for real-time networked
systems. One of the goals of these projects has been to unify the analysis of network-
oriented events and the computation on the network nodes that transform one stream
of events into another.
Chakraborty et al. [Cha03] developed a real-time calculus that models events and
resources. They model an event stream R[s, t) over the prescribed time interval as a
pair of arrival curves: $\alpha^l(\Delta)$ for the lower bound on the number of events in any
interval of length $\Delta$ and $\alpha^u(\Delta)$ for the upper bound on the number of events in that interval. They show how
to model event streams with jitter. They use β functions to model the service pro-
vided by computational and communication components. They show how to ana-
lyze single streams, multiple interacting streams, and platforms with multiple
computing and communication resources. Maxiaguine et al. [Max04] used work-
load curves to characterize the computational workload of real-time systems. They
showed how to use their methods to analyze both a rate-monotonic system and
streaming architectures.
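Arrival curves can also be estimated empirically from a finite event trace by sliding a window of length Δ across it. The sketch below is a brute-force illustration rather than the analytical machinery of [Cha03]. It assumes integer timestamps and a known observation horizon, and counts events in every half-open window [s, s + Δ) that fits inside the horizon: the minimum count estimates α^l(Δ) and the maximum estimates α^u(Δ).

```python
from bisect import bisect_left
from typing import List, Tuple

def empirical_arrival_curves(timestamps: List[int], horizon: int,
                             delta: int) -> Tuple[int, int]:
    """Return (alpha_l, alpha_u) for window length delta over [0, horizon)."""
    if not 0 < delta <= horizon:
        raise ValueError("require 0 < delta <= horizon")
    ts = sorted(timestamps)
    counts = []
    for s in range(0, horizon - delta + 1):
        # Number of timestamps t with s <= t < s + delta.
        counts.append(bisect_left(ts, s + delta) - bisect_left(ts, s))
    return min(counts), max(counts)

periodic = list(range(0, 100, 10))   # one event every 10 time units
print(empirical_arrival_curves(periodic, horizon=100, delta=15))   # (1, 2)
```

For a strictly periodic trace the two curves differ by at most one event; a jittery trace would widen the gap, which is exactly what the arrival-curve pair is meant to bound.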
Henia et al. [Hen05] give definitions and formulas for events and event streams.
Many of their results apply to our model; we summarize some of their applicable
results here.
Event time applies to both generation and release time. An event time includes a
nominal time $T$ and a jitter $J$, written $(T, J)$.
A periodic event stream has two parameters, a period $P$ and a jitter $J$, written $(P, J)$.
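The upper event function $\eta^u(\Delta t)$ gives the maximum number of events in any
interval of length $\Delta t$. Similarly, the lower event function $\eta^l(\Delta t)$ gives the minimum
number of events in any interval of length $\Delta t$. The upper and lower event functions for a
periodic event stream with jitter are
$$\eta^u_{P+J}(\Delta t) = \left\lceil \frac{\Delta t + J}{P} \right\rceil, \qquad \eta^l_{P+J}(\Delta t) = \max\left(0, \left\lfloor \frac{\Delta t - J}{P} \right\rfloor\right)$$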
They give formulas for the jitter of the output of components that combine event
streams using AND and OR combination methods.
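The periodic-with-jitter event functions are easy to evaluate directly. The sketch below uses the standard SymTA/S forms with ceiling and floor, η^u = ⌈(Δt + J)/P⌉ and η^l = max(0, ⌊(Δt − J)/P⌋), for a hypothetical stream with P = 10 and J = 2 in arbitrary time units.

```python
import math

def eta_upper(dt: float, period: float, jitter: float) -> int:
    """Maximum number of events of a periodic-with-jitter stream in a window of length dt."""
    return math.ceil((dt + jitter) / period)

def eta_lower(dt: float, period: float, jitter: float) -> int:
    """Minimum number of events of a periodic-with-jitter stream in a window of length dt."""
    return max(0, math.floor((dt - jitter) / period))

# Hypothetical stream: P = 10, J = 2.
for dt in (5, 25, 100):
    print(dt, eta_lower(dt, 10, 2), eta_upper(dt, 10, 2))
```

Setting J = 0 collapses the two bounds onto the familiar counts for a strictly periodic stream, which is a quick sanity check on any implementation.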
4.4 IoT Network Model
IoT systems are built for a variety of applications: industrial control, environmental
monitoring, logistics, etc. We will use examples in this chapter derived from our
experiments with IoT systems for long-term care [Wol15]. This application provides
us with use cases typical of smart homes (turning lights on and off, energy
management, etc.) as well as use cases associated with health care (scheduling
medications, checking on the condition of residents, etc.). Our example IoT system
operates in a home with several residents, a rotating set of staffers, and visitors.
A variety of sensors monitor activity in the home: cameras, utility sensors, smart
objects, etc. The IoT system is designed to track the activity of residents and staffers
and to alert staffers of situations that may deserve their attention.
The system architecture consists of several elements:
• A set of sensors
• A local hub that monitors the sensors as I/O devices
• A cloud-based node for some analytical functions
A key feature of the local hub is its internal timewheel (Coelho, D., 2014, August
2, private communication). Timewheels are used in event-driven simulation to man-
age simulator event activity; in this case, we use the timewheel to manage events in
the real world as mediated by the I/O devices. Events are timestamped with a time
at which they should occur, which may be later than the time at which the event was
generated. The timewheel is a time-sorted queue; when the clock time equals the
timestamp of the event at the head of the timewheel, that event is dequeued and
processed.
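A minimal sketch of a hub timewheel as a time-sorted queue. It uses a binary heap from Python's heapq module rather than the bucketed circular array of a classic simulator timing wheel; the dequeue order is the same. Because a real clock advances in ticks, advance() releases every event whose timestamp is at or before the current time rather than requiring exact equality. The event keys and values are hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Tuple

class TimeWheel:
    """Time-sorted event queue for a hub, kept as a binary heap."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._seq = itertools.count()   # tie-breaker: FIFO order for equal timestamps

    def schedule(self, release_time: float, event: Any) -> None:
        """Enqueue an event to be released at release_time."""
        heapq.heappush(self._heap, (release_time, next(self._seq), event))

    def advance(self, now: float) -> List[Any]:
        """Dequeue and return every event whose release time is <= now, in time order."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

    def __len__(self) -> int:
        return len(self._heap)

wheel = TimeWheel()
wheel.schedule(7 * 3600, ("lights", "on"))
wheel.schedule(8 * 3600, ("meds", "resident-1"))
wheel.schedule(6 * 3600, ("alarm", "ring"))
print(wheel.advance(7 * 3600))   # [('alarm', 'ring'), ('lights', 'on')]
print(len(wheel))                # 1
```

The number of entries still in the heap at any moment is exactly the event population that the hub's memory must hold.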
Our IoT network model is oriented toward the analysis of event behavior in the
system. Because events have long lives, memory in the form of timewheel queues
plays a critical role in the model.
4.4.1 Events
The semantics of the event is given by the key-value pair. The destination of the
event is the device that should process the given key-value pair. The modeling
methods described in this chapter are not concerned with the semantics of key-value pairs.
We also need to know the temporal behavior of an event, which is given by two
values. The generation time ϑ is the time at which the event was created. The gen-
eration time is useful in our analysis; an implementation may or may not keep track
of this value. The release time ρ of an event allows the IoT system to perform
delayed actions – one event in the environment may not cause an immediate
response but rather one that happens some time later. We refer to the difference
between release and generation time as the lifetime of an event, λ = ρ − ϑ. Events may
be generated periodically or aperiodically. Activation or release times may be peri-
odic or aperiodic.
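The event attributes map naturally onto a small record type. The sketch below is illustrative: the field names are ours, and it rejects events whose release time precedes their generation time, which we assume the model does not allow.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Event:
    key: str
    value: Any
    destination: str        # device that should process the key-value pair
    generation_time: float  # vartheta: when the event was created
    release_time: float     # rho: when the event should take effect

    def __post_init__(self) -> None:
        if self.release_time < self.generation_time:
            raise ValueError("release time must not precede generation time")

    @property
    def lifetime(self) -> float:
        """lambda = rho - vartheta: how long the event must be held before release."""
        return self.release_time - self.generation_time

HOUR = 3600.0
meds = Event("medication", "evening dose", "dispenser-1",
             generation_time=8 * HOUR, release_time=20 * HOUR)
print(meds.lifetime / HOUR)   # 12.0
```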
4.4.2 Networks
A network consists of nodes and links. We will discuss nodes in more detail below.
We model communication links as unidirectional. Most physical hubs are full
duplex, but modeling links as unidirectional simplifies our analysis.
A single-hub network consists of one hub node and one or more device nodes and
their associated links. The hub manages the exchange of events between its device
nodes.
In a single-hub network, input traffic arrives at the hub from its device nodes,
while output traffic is generated by the timewheel and goes to the devices.
As a simple example, consider scheduling medications for residents of the home.
If a resident receives medicines twice per day, once in the morning and again in the
evening, the device responsible for scheduling the medicines must generate an event
for each administration. The event for the next medicine administration is probably
generated when the current medicine administration is released, giving an event
lifetime of 12 h.
The morning routine of the residents presents a more complex set of events and
more scheduling choices. Each resident’s routine will generate a series of events
(getting up, toileting, eating breakfast, etc.); depending on the activity, all the events
in the routine may be scheduled at once, or some may be scheduled on the comple-
tion of other events. If all residents get up at once, they create both congestion in the
house and congestion in the hubs and their timewheels – the maximum number of
events in the system will be a function of the number of residents as well as the
complexity of their routines. By staggering the timing of their activities, we can
reduce both physical congestion and the number of events that the hubs must deal
with at any given time.
A more general network may contain more than one type of hub. One link or a pair
of links is used to connect the hubs. For the moment, we consider only tree-
structured networks.
In our example system, the in-house system consists of a hub and a set of devices.
The cloud analytics system also uses a hub and timewheel to manage the times at
which events should be processed. For modeling purposes, the analytics engine
itself is a device.
We model event routing as hub-to-hub transfers in which an event is removed
from one timewheel and placed on another. When an event is transmitted to another
hub, we may use additional queue operators to remove the event before it reaches
the head of the queue. We will discuss the effects of event routing in more detail in
the next section.
30 4 Event-Driven System Analysis
The mapping between model nodes/links and nodes/links in the physical network
need not be one-to-one. A single physical device may house several logical nodes.
A single physical network link may be used to transport several logical links.
We can rely on results from parallel computing [Dua02] for techniques for rout-
ing events in multi-hub networks. Physical networks may use separate memories for
queues and buffers on network links.
The network model helps us to understand the behavior of more complex physi-
cal links. We can first separately analyze half-duplex traffic on links and then use
that analysis to understand the characteristics of full-duplex links.
Because events in IoT systems are long-lived, we must consider the lifetimes of
event populations. Because events may be released long after they are generated, the
system may need to accommodate a large number of events even if no events are
currently being generated.
The event population is the number of events that are still alive, given by the dif-
ference between the number of generated and released events. We can evaluate
event population over the entire network or over a set of components in the network.
When events are generated and released with jitter, we can write formulas for the
upper and lower population; here we concentrate on a jitter-free form of the analysis
to emphasize basic principles.
A general form for the population count is to enumerate all events from the system
start time:
$$P(t) = \int_0^t \left[\vartheta(\tau) - \rho(\tau)\right] d\tau.$$
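In discrete form, the population at time t is the number of events generated at or before t minus the number released at or before t. The sketch below assumes each event is a (generation, release) pair and is alive on the half-open interval [ϑ, ρ). It evaluates P(t) at a point and finds the peak population with a sweep over event boundaries. The morning-routine numbers (four residents, three events each, released 10, 20, and 30 minutes after waking) are hypothetical and show how staggering wake-up times lowers the peak that the hubs must hold.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]   # (generation time, release time)

def population(events: Iterable[Interval], t: float) -> int:
    """P(t): events generated at or before t and not yet released."""
    return sum(1 for gen, rel in events if gen <= t < rel)

def peak_population(events: Iterable[Interval]) -> int:
    """Maximum of P(t) over all t, via a sweep over generation/release points."""
    points: List[Tuple[float, int]] = []
    for gen, rel in events:
        points.append((gen, +1))
        points.append((rel, -1))
    points.sort()   # at equal times, releases (-1) sort before generations (+1)
    current = peak = 0
    for _, step in points:
        current += step
        peak = max(peak, current)
    return peak

def routine(wake: float, offsets=(10, 20, 30)) -> List[Interval]:
    """Hypothetical routine: all events generated on waking, released later (minutes)."""
    return [(wake, wake + off) for off in offsets]

together = [e for _ in range(4) for e in routine(0)]
staggered = [e for r in range(4) for e in routine(30 * r)]
print(peak_population(together), peak_population(staggered))   # 12 3
```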
4.5 IoT Event Analysis
A wide variety of assumptions and stochastic models are possible for events. In this
section, we use some basic models to derive important design metrics. Although no
one to our knowledge has gathered large traces of IoT activity, we can use models
of traffic from related domains to help us understand IoT design.
We can gain some intuition by considering the simpler case of the Poisson
distribution. A common model for telephone traffic is that call arrivals and departures are
each modeled as Poisson processes. In our case, we use the Poisson distribution to
model event generation at a rate λ. If successive events have non-overlapping
lifetimes, then their maximum population in that interval is 1; if their lifetimes overlap,
then the maximum population is 2, which must be accommodated by buffering. The
probability that two successive events have overlapping lifetimes, that is, that the
interarrival time t is shorter than the lifetime L, is
$$P[t < L] = 1 - e^{-\lambda L}$$
This simple formulation suggests that λL, the product of event generation rate and
lifetime, is a useful metric for judging maximum event populations.
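The closed form can be cross-checked by simulation: draw exponential interarrival times with rate λ and count how often the next event arrives before the current event's lifetime L expires. The rate and lifetime below are hypothetical; with λL = 1 the exact probability is 1 − e⁻¹, about 0.632.

```python
import math
import random

def overlap_probability(rate: float, lifetime: float) -> float:
    """P[t < L] = 1 - exp(-rate * L) for an exponential interarrival time t."""
    return 1.0 - math.exp(-rate * lifetime)

def simulated_overlap(rate: float, lifetime: float,
                      trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.expovariate(rate) < lifetime)
    return hits / trials

rate, lifetime = 2.0, 0.5   # 2 events per hour, 30-minute lifetime (hypothetical)
print(rate * lifetime, overlap_probability(rate, lifetime), simulated_overlap(rate, lifetime))
```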
The Erlang-B distribution provides a more accurate model for event populations.
In the case of IoT events, the event dwell time corresponds to call duration; the
memory locations of the queues correspond to telephone lines. (The Erlang-C distribution models calls
waiting in queues. In our case, consider the queue as a set of servers consisting of
memory locations. One memory location/server is required for each event to wait
for its release time.)
The offered traffic is in units of erlangs:
$$E = \lambda L$$
In our case, λ is the event arrival rate and L is the event lifetime. The probability of
blocking (i.e., dropping an event due to a full queue) is
$$P_b = B(E, m) = \frac{E^m / m!}{\sum_{i=0}^{m} E^i / i!}$$
where m is the size of the queue.
The offered event traffic in erlangs is a useful rule-of-thumb metric for IoT sys-
tem traffic – both the frequency of events and their dwell times must be considered
to understand their effect on timewheel size.
We can use Pb to design the timewheel capacities of the hubs, either using the
maximum traffic as a guide or evaluating the traffic at different points in time using
the population functions. Given the systemwide offered traffic, we can find Pb for
the entire network. However, in a multi-hub system, we must determine how to
partition the timewheel memory between the hubs.
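Evaluating the Erlang-B formula directly overflows for large queues because of the factorials, so the sketch below uses the standard equivalent recursion B(E, 0) = 1, B(E, k) = E·B(E, k − 1) / (k + E·B(E, k − 1)). It then finds the smallest timewheel size m whose blocking probability meets a target. The arrival rate, lifetime, and target are hypothetical.

```python
def erlang_b(offered: float, m: int) -> float:
    """Blocking probability B(E, m), computed with the numerically stable recursion."""
    if offered < 0 or m < 0:
        raise ValueError("offered traffic and queue size must be non-negative")
    b = 1.0
    for k in range(1, m + 1):
        b = offered * b / (k + offered * b)
    return b

def timewheel_size(rate: float, lifetime: float, target_pb: float) -> int:
    """Smallest queue size m with B(rate * lifetime, m) <= target_pb."""
    if not 0 < target_pb <= 1:
        raise ValueError("target blocking probability must be in (0, 1]")
    offered = rate * lifetime   # E = lambda * L, in erlangs
    m = 0
    while erlang_b(offered, m) > target_pb:
        m += 1
    return m

print(timewheel_size(rate=0.5, lifetime=10.0, target_pb=0.01))   # 11
```

For 0.5 events per minute with a 10-minute lifetime (5 erlangs) and a 1% blocking target, the search returns m = 11 memory locations.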
We can model the traffic hub by hub:
$$\sum_{i=1}^{n} E_i = \sum_{i=1}^{n} \lambda_i L_i$$
From this, we can determine the $P_b$ value for each hub. However, this approach does not minimize
total system memory. If we assume that all the hubs share the same values for arrival
rate and event lifetime, then $E < \sum_{1 \le i \le n} E_i$.
We describe in Sect. 4.5.4 how to transfer events between hubs to balance queue
sizes.
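The memory penalty of sizing each hub separately can be illustrated with the same Erlang-B formula: size each hub for its own offered traffic, then size a single shared pool for the total. The sketch assumes four hubs with 5 erlangs each and a 1% blocking target, all hypothetical, and that events could be stored in any hub's memory.

```python
from typing import List, Tuple

def erlang_b(offered: float, m: int) -> float:
    """Blocking probability B(E, m) via the standard recursion."""
    b = 1.0
    for k in range(1, m + 1):
        b = offered * b / (k + offered * b)
    return b

def size_for(offered: float, target_pb: float) -> int:
    """Smallest number of memory locations meeting the blocking target."""
    m = 0
    while erlang_b(offered, m) > target_pb:
        m += 1
    return m

def partitioned_vs_pooled(per_hub_erlangs: List[float], target_pb: float) -> Tuple[int, int]:
    """Total memory when hubs are sized independently vs. one shared pool."""
    partitioned = sum(size_for(e, target_pb) for e in per_hub_erlangs)
    pooled = size_for(sum(per_hub_erlangs), target_pb)
    return partitioned, pooled

print(partitioned_vs_pooled([5.0, 5.0, 5.0, 5.0], 0.01))
```

The pooled size is smaller because combined traffic fluctuates less, relative to its mean, than each hub's share; this is the motivation for moving events between hubs.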
34 4 Event-Driven System Analysis
We can identify three methods for modeling the interaction of the IoT system with
its environment, each with its own degree of accuracy and detail.
The simplest model treats both the device and the user as timed finite-state
machines. Given a path through the user machine that defines a given use case, we
can form the product of the device machine and the user machine path. The result-
ing FSM, along with a timing regimen that is specified by the use case, tells us when
events are generated by the device. That trace can be used to build the event popula-
tion trace.
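Because a use case fixes one path through the user machine, forming the product reduces to driving the device machine along that timed path and recording what it emits. A minimal sketch, in which the device states, user actions, event names, and times are all hypothetical:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical device FSM: state -> {user action: (next state, emitted event or None)}
Device = Dict[str, Dict[str, Tuple[str, Optional[str]]]]

DEVICE: Device = {
    "night":    {"leave_bed": ("hall_lit", "light.hall=on")},
    "hall_lit": {"enter_bathroom": ("bathroom", "motion.bathroom=1"),
                 "return_to_bed": ("night", "light.hall=off")},
    "bathroom": {"enter_kitchen": ("kitchen", "coffee.maker=start")},
    "kitchen":  {"sit_down": ("kitchen", None)},
}

def event_trace(device: Device, start: str,
                user_path: List[Tuple[float, str]]) -> List[Tuple[float, str]]:
    """Drive the device FSM with a timed user path; return (time, event) pairs."""
    state, trace = start, []
    for time, action in user_path:
        if action not in device[state]:
            raise ValueError(f"{action!r} is not accepted in state {state!r}")
        state, event = device[state][action]
        if event is not None:
            trace.append((time, event))
    return trace

# Times in minutes after midnight for one use case (hypothetical).
path = [(360, "leave_bed"), (362, "enter_bathroom"), (380, "enter_kitchen"), (381, "sit_down")]
print(event_trace(DEVICE, "night", path))
# [(360, 'light.hall=on'), (362, 'motion.bathroom=1'), (380, 'coffee.maker=start')]
```

The resulting (time, event) list is the generation side of an event population trace; pairing each event with a release time yields intervals for population analysis.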
A more sophisticated model treats the user as a Markov decision process (MDP)
with fixed timing. A Markov decision process is a stochastic model used for optimi-
zation. An MDP is defined by a set of states and possible actions out of each state.
Each action is assigned a reward R. Transitions out of the action to the next state are
assigned probabilities. We can use any of several different algorithms (dynamic
programming, linear algebra) to find the path that maximizes the reward. In this
scenario, we solve for the optimal reward path and form its product with the device
model, using a fixed time model. Figure 4.2 shows an example of a simple device
model and user model. The device model combines the actions of all the component
devices related to the routine into a single state machine for simplicity. The actions
in the user model MDP correspond to states in the device model.
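A small value-iteration sketch, one of the dynamic-programming methods mentioned above, applied to a hypothetical morning-routine MDP. The states, actions, rewards, transition probabilities, and discount factor γ = 0.9 are illustrative, not taken from Fig. 4.2. After solving for the policy, the sketch follows the most probable successor from each state to obtain a single path that can be combined with the device model.

```python
from typing import Dict, List, Tuple

# state -> action -> (reward, [(probability, next_state), ...]); {} marks a terminal state
MDP = Dict[str, Dict[str, Tuple[float, List[Tuple[float, str]]]]]

def q_value(mdp: MDP, values: Dict[str, float], state: str, action: str, gamma: float) -> float:
    reward, outcomes = mdp[state][action]
    return reward + gamma * sum(p * values[nxt] for p, nxt in outcomes)

def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-9):
    """Return (state values, greedy policy) for a discounted MDP."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            if not actions:
                continue   # terminal state keeps value 0
            best = max(q_value(mdp, values, s, a, gamma) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(acts, key=lambda a: q_value(mdp, values, s, a, gamma))
              for s, acts in mdp.items() if acts}
    return values, policy

def most_likely_path(mdp: MDP, policy: Dict[str, str], start: str, max_steps: int = 20) -> List[str]:
    """Follow the policy, taking the most probable successor at each step."""
    path, state = [start], start
    while mdp[state] and len(path) <= max_steps:
        _, outcomes = mdp[state][policy[state]]
        state = max(outcomes, key=lambda o: o[0])[1]
        path.append(state)
    return path

MORNING: MDP = {
    "in_bed":   {"get_up": (1.0, [(0.8, "bathroom"), (0.2, "in_bed")]),
                 "sleep_in": (0.0, [(1.0, "in_bed")])},
    "bathroom": {"wash": (2.0, [(1.0, "kitchen")])},
    "kitchen":  {"eat": (3.0, [(1.0, "done")]), "skip": (0.0, [(1.0, "done")])},
    "done":     {},
}
values, policy = value_iteration(MORNING)
print(policy)   # {'in_bed': 'get_up', 'bathroom': 'wash', 'kitchen': 'eat'}
print(most_likely_path(MORNING, policy, "in_bed"))   # ['in_bed', 'bathroom', 'kitchen', 'done']
```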
A yet more complex model uses a continuous time Markov decision process
(CTMDP). The most common mathematical form of this model is as an MDP with
the timing of state transitions modeled as a Poisson process [Buc11]. Standard MDP
approaches can be used to solve for the optimal path with timing given by the
Poisson process.
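To attach Poisson timing to a path such as the one found by an MDP solver, each dwell time can be drawn from an exponential distribution with the state's exit rate. The sketch below only samples timing for a given path and is not a CTMDP solver; the exit rates are hypothetical and the random seed is fixed so the trace is reproducible.

```python
import random
from typing import Dict, List, Tuple

def timed_path(path: List[str], exit_rates: Dict[str, float],
               start_time: float = 0.0, seed: int = 0) -> List[Tuple[float, str]]:
    """Attach an entry time to each state in the path.

    The dwell time in each state is exponentially distributed with that
    state's exit rate, i.e., transitions out of the state are Poisson-timed.
    """
    rng = random.Random(seed)
    t = start_time
    timed = [(t, path[0])]
    for state, nxt in zip(path, path[1:]):
        t += rng.expovariate(exit_rates[state])
        timed.append((t, nxt))
    return timed

# Hypothetical mean dwell times of 10, 15, and 20 minutes (rate = 1 / mean).
RATES = {"in_bed": 1 / 10, "bathroom": 1 / 15, "kitchen": 1 / 20}
for minute, state in timed_path(["in_bed", "bathroom", "kitchen", "done"], RATES, seed=42):
    print(f"{minute:6.1f} min  {state}")
```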
An event does not necessarily have to be stored on the hub that owns either the
event’s source or destination. In a multi-hub system, we can station events at nonlo-
cal hubs to avoid overflowing a hub’s queue capacity or improve its battery life. If
an event is queued nonlocally, we must factor transmission time into its release to
ensure that it reaches its destination device at the proper time.
For simplicity, we consider the case of no energy cost for transporting events
across the network. Let $P_e$ be the power consumption of storing one event in memory
for a unit time. Given a population of events Π, the energy required to store all
events in the population until their release times is
$$E_{pop} = \sum_{e \in \Pi} \left[\rho(e) - \vartheta(e)\right] P_e$$
4.5 IoT Event Analysis 35
Fig. 4.2 Models of the morning routine for devices and residents
We have a set of H hubs each with available battery energy $E_h(i)$. We want to find an
allocation of events to hubs such that
$$\forall i \in H : E_{pop}(i) \le E_h(i)$$
This is a classic bin-packing problem, although we want to solve it as a distributed
problem without a centralized list of events. In practice, transmission energy reduces
the set of plausible event allocations.
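A centralized baseline shows what a distributed scheme has to approximate. The sketch computes each event's storage energy (ρ − ϑ)·P_e, the per-event term of E_pop, and assigns events to hubs with first-fit decreasing, a standard bin-packing heuristic substituted here for illustration. It is not the authors' migration algorithm and, like the text, it ignores transmission energy. Event times, hub budgets, and P_e are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def storage_energy(generation: float, release: float, p_e: float) -> float:
    """Energy to hold one event in memory from generation to release."""
    return (release - generation) * p_e

def first_fit_decreasing(events: List[Tuple[str, float, float]],
                         budgets: Dict[str, float],
                         p_e: float) -> Optional[Dict[str, List[str]]]:
    """Place (name, generation, release) events on hubs without exceeding budgets.

    Returns hub -> event names, or None if some event fits on no hub.
    """
    remaining = dict(budgets)
    placement: Dict[str, List[str]] = {hub: [] for hub in budgets}
    ordered = sorted(events, key=lambda e: storage_energy(e[1], e[2], p_e), reverse=True)
    for name, gen, rel in ordered:
        cost = storage_energy(gen, rel, p_e)
        for hub in budgets:   # fixed hub order: take the first hub that fits
            if cost <= remaining[hub]:
                remaining[hub] -= cost
                placement[hub].append(name)
                break
        else:
            return None
    return placement

EVENTS = [("meds", 8.0, 20.0), ("lights", 6.0, 7.0), ("report", 0.0, 24.0), ("water", 9.0, 15.0)]
print(first_fit_decreasing(EVENTS, {"hub-a": 25.0, "hub-b": 20.0}, p_e=1.0))
# {'hub-a': ['report', 'lights'], 'hub-b': ['meds', 'water']}
```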
We propose a heuristic algorithm for event migration:
• Find a partial ordering of the hubs such that no two adjacent hubs are in the same
set and all hubs are covered.
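Only the first step of the heuristic survives here. For the tree-structured networks considered earlier in this chapter, that step amounts to a two-coloring: a breadth-first traversal always yields two sets of hubs with no two adjacent hubs in the same set that together cover all hubs. The sketch below assumes the hub topology is given as an adjacency list; the hub names are hypothetical.

```python
from collections import deque
from typing import Dict, List

def two_color(adjacency: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign each hub to set 0 or 1 so that adjacent hubs differ."""
    color: Dict[str, int] = {}
    for root in adjacency:   # loop over roots so forests are covered too
        if root in color:
            continue
        color[root] = 0
        queue = deque([root])
        while queue:
            hub = queue.popleft()
            for nbr in adjacency[hub]:
                if nbr not in color:
                    color[nbr] = 1 - color[hub]
                    queue.append(nbr)
                elif color[nbr] == color[hub]:
                    raise ValueError("topology is not bipartite")
    return color

TREE = {"cloud": ["home-1", "home-2"], "home-1": ["cloud", "garage"],
        "home-2": ["cloud"], "garage": ["home-1"]}
print(two_color(TREE))   # {'cloud': 0, 'home-1': 1, 'home-2': 1, 'garage': 0}
```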
36 4 Event-Driven System Analysis
Acknowledgment Thanks to the team at Alya Networks for useful discussions on key-value-
based IoT networks.
References
[Buc11] Buchholz, P. (2011). Continuous time Markov decision processes: Theory, applica-
tions, and computational algorithms. TU Dortmund Informatik IV lecture notes.
[Cha03] Chakraborty, S., Künzli, S., & Thiele, L. (2003). A general framework for analysing
system properties in platform-based embedded system designs. Proceedings sixth design, auto-
mation and test in Europe (DATE) (pp. 190–195). Munich, Germany.
[Dua02] Duato, J., Yalamanchili, S., & Ni, L. (2002). Interconnection networks. San Francisco:
Morgan Kaufmann.
[Hen05] Henia, R., Hamann, A., Jersak, M., Racu, R., Richter, K., & Ernst, R. (2005). System
level performance analysis: The SymTA/S approach. IEE Proceedings: Computer Design
Techniques, 152(2), 148–166.
[Max04] Maxiaguine, A., Künzli, S., & Thiele, L. (2004). Workload characterization model for
tasks with variable execution demand. Proceedings of the conference on seventh design, auto-
mation and test in Europe (DATE) (pp. 1040–1045). Paris, France.
[Wol15] Wolf, M., van der Schaar, M., Kim, H., & Xu, J. (2015). Caring analytics for adults with
special needs. IEEE Design & Test, 32(5), 35–44.
Chapter 5
Industrial Internet of Things
5.1 Introduction
The Internet of Things (IoT) has already brought a revolution to our understanding
of applications in a wide range of human activity. This trend is expected to increase
in the near future, as the potential economic impact of IoT is expected to be between
900 billion USD and 2.3 trillion USD on a yearly basis up to 2025 [Man13]. IoT
applications are spreading to various sectors including smart energy, manufactur-
ing, agriculture, health, security and safety, smart cities, smart buildings, and smart
environment. All these application areas repeat the same basic model: a large num-
ber of smart devices, interconnected over wired or wireless media, interacting and
coordinating to achieve a goal.
In the industrial environment, the effort for smart factories [Zue10], the Industrie
4.0 strategy [Ind14], the Industrial Internet [GE17], and the European initiative for
the Factories of the Future [FoF] have initiated the adoption of IoT in industry with
the goals of increasing flexibility and productivity, while reducing production cost.
The developing concept is the Industrial IoT (IIoT).
The Industrial Internet of Things is part of the general IoT evolution. However, it
faces challenges that are unique and differentiate it from the other systems and ser-
vices of IoT due to the need to integrate programmable logic controllers (PLC) and
supervisory control and data acquisition systems (SCADA). PLC and SCADA sys-
tems, together with the related industrial networks that interconnect them, constitute
the infrastructure of operational technology (OT), which has traditionally evolved
independently from the typical IT technology, because it addresses the needs of
systems in the field – industrial floor, energy production facilities, energy distribu-
tion networks, etc. – with strong requirements such as continuous operation, safety,
real-time operation, etc. The capabilities offered by the emerging IIoT technology
pose challenges for the integration of these OT systems with the traditional enter-
prise IT systems at many levels, from enterprise management to cyber security. For
example, enterprise resource planning systems (ERP) need to be expanded to
Industrie 4.0 is a strategic initiative in Germany that aims to bring IoT technologies
to the manufacturing and production sectors [Ind14]. The goal is to enable
Germany to keep a leading role in manufacturing by achieving efficient and low-cost
production with flexible workflows. The means to achieve this goal is the wide-
spread inclusion of cyber-physical systems in the manufacturing and production
processes, in order to insert intelligence in the systems and processes, to enable their
high connectivity and communication, and to achieve their coordination into more
complex but flexible processes that lead to high-quality, low-cost products.
Industrie 4.0 takes its name from the identification of the new, emerging industry
as the fourth revolution of industrial production. It is widely accepted that industrial
production to date has gone through three revolutions. The first industrial
revolution, between the eighteenth and nineteenth centuries, is the one where mechanized
production facilities were introduced in the production of goods and services, where
the required energy was provided by water and steam. Electrical energy was intro-
duced during the second revolution, which led to mass production, as electricity
boosted productivity. In the post WWII era, the inclusion of electronics and soft-
ware, i.e., industrial information technology, to the mechanical and electrical com-
ponents led to the third revolution that enabled automation at high levels. Currently,
many industrial stakeholders believe that we are on the verge of the next, the fourth,
industrial revolution, through wide adoption and use of cyber-physical systems that
leads not only to even higher levels of automation but also to mass-customized
manufacturing and production of goods and services, due to the flexibility offered
by the easily programmable, configurable, and controllable manufacturing lines.
The effort for Industrie 4.0 is based on the widespread deployment and use of
computational and communication resources. The last two decades have been char-
acterized by significant advances in high performance, low-power processors, mem-
ories, and communication components that enable efficient processing and
networking. These advances have brought significant processing capabilities to a
large number of devices that are deployed to consumers or to the field. Smart
consumer devices have become the norm. Smartphones provide hundreds of applications
and enable services ranging from identifying travel and transportation routes to
mobile banking and health monitoring. Smart televisions combine and provide vari-
ous types of entertainment and network services, from customized TV channel con-
trol and management to Internet gaming and home device management. Smart
home appliances monitor parameters, from environmental temperatures to water
and energy consumption, enabling citizens to manage their homes efficiently and
effectively, achieving the desired quality of living while reducing operational costs
on various fronts.
The large base of computational resources and connectivity is apparent from the
published figures on embedded processors and components currently produced.
According to [Ind14], the vast majority of produced processors,
approximately 98%, are deployed in embedded systems. Deployed semiconductor
40 5 Industrial Internet of Things
memory is also growing and expected to grow at 40% year over year in 2017
[Mic17]. Furthermore, the significant advances of wired and wireless networks in
the last two decades have led to ubiquitous connectivity, approaching 100% in cities
and towns, through different technologies.
The available processing and communication base leads to an evolving hierarchy
of embedded systems and services up to the level of the Internet of Things,
Data, and Services. Examples of this evolution can be identified in several
application areas. In transportation, for example, embedded systems are widespread,
controlling functions ranging from car entertainment systems to car seat control. At this level,
embedded processors are programmed to control specific, individual parameters,
e.g., height and movement in car seats, based on user commands. However, embed-
ded systems in cars are also networked, either within the car system or with the
environment, providing networked embedded services; automatic toll payment is
one of them where embedded systems in the car and the toll booths communicate
with each other, in order to complete the electronic payment transaction of the toll
passage. Such payment systems from several tolls, for example, can be further com-
bined in a distributed system that enables traffic and toll management at a wider
scale, leading to more effective transportation infrastructure that achieves lower
waiting times and fuel costs for travelers as well as lower operational cost and, thus,
higher income for transportation management authorities. One can envision an
even higher level of connectivity, linking such complex transportation systems to smart
cities that combine transportation management with additional services, such as
energy distribution, civil services, emergency services, etc., as required at different
times, locations, and during special events.
The advances of sensor technologies, in addition to the evolution of embedded
systems and communication networks, make all these scenarios realistic.
Importantly, sensors bridge the gap between the physical world and the digital
world, providing increasingly rich information to digital systems and enabling
intelligent control of systems and processes. In that respect, manufacturing and
industrial automation have traditionally employed IT together with sensors and
electromechanical systems, leading the development and deployment of technologies
and concepts for intelligent control, systems, and services. Thus, the Industrie 4.0
strategy and its related initiatives come as a natural evolutionary step of industrial
technologies, which both influence and are influenced by the advancement of
consumer technologies of the Internet of Things.
The smart factory concept embodies the goals of the Industrie 4.0 strategy to a
large degree. The concept is based on the hierarchy of cyber-physical systems men-
tioned above, where smart production systems are interconnected in a multilevel
hierarchy to achieve a high degree of automation, targeting flexibility, efficiency,
autonomy, resilience, safety, and low cost. Smart machines will be interconnected
to establish smart plants, which, in turn, will be combined to form smart factories.
Considering the typical components of a manufacturing process, smart factories
aim to automate all components and stages efficiently. Materials and
resources will be managed and introduced in the process efficiently; production
processing will be managed in real time minimizing the used resources for the
5.3 Industrial Internet of Things (IIoT)
The Industrial Internet of Things (IIoT) has emerged as a general concept of the
application of the Internet of Things to the industrial sector. Effectively, it is a gen-
eralization of Industrie 4.0, which appears to focus more on industrial process effi-
ciency. The IIoT vision includes all aspects of industrial operations, focusing not
only on process efficiency but also on asset management, maintenance, etc.
Considering that IIoT is effectively IoT in the industrial sector and that the
Industrie 4.0 concepts are effectively a subset of IIoT, as shown in Fig. 5.1, one
needs to identify the difference between IoT and IIoT. Although the basic concepts
are the same, i.e., interconnected smart devices that enable remote sensing, data col-
lection, processing, monitoring, and control, the parameters that identify the IIoT
subset of IoT are the strong requirements for continuous operation and safety as
well as the operational technology employed in the industrial sector. As an example,
one can consider the difference between a consumer service such as a health moni-
toring application on a smart watch and an industrial service such as the monitoring
of a steam pump. Although both applications collect real-time data, e.g., steps or
body temperature in the health application case and pressure or steam volume in the
steam pump case, transmit the data, identify events, and provide feedback or com-
mands to operators/consumers and subsystems, clearly, continuous operation and
safety place stricter requirements in the steam pump case, where the potential effect
of a failure is significantly more severe and may lead to costly operational downtime
and even human injury or loss of life.
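To make the contrast concrete, here is a minimal Python sketch, assuming a hypothetical
`MonitoringPolicy` type and made-up thresholds: both services run the same collect,
detect, and respond logic, and only the policy for abnormal or missing data differs.
It is illustrative only and is not taken from any IIoT standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitoringPolicy:
    """Hypothetical per-application policy; all names and numbers are illustrative."""
    name: str
    alarm_threshold: float   # a reading above this value is reported as an event
    max_data_gap_s: float    # longest tolerated silence from the sensor, in seconds
    fail_safe_on_gap: bool   # True if missing data must be treated as a hazard


def evaluate(policy: MonitoringPolicy, reading: Optional[float],
             seconds_since_last: float) -> str:
    """Return the action the monitoring service takes for one sample."""
    if reading is None or seconds_since_last > policy.max_data_gap_s:
        # Same pipeline in both cases; only the consequence of lost data differs.
        return "enter_safe_state" if policy.fail_safe_on_gap else "ignore"
    if reading > policy.alarm_threshold:
        return "alarm"
    return "ok"


watch = MonitoringPolicy("smartwatch body temperature (C)", 38.5, 600.0, False)
pump = MonitoringPolicy("steam pump pressure (bar)", 12.0, 0.5, True)

print(evaluate(watch, None, 900.0))   # ignore
print(evaluate(pump, None, 1.0))      # enter_safe_state
print(evaluate(pump, 13.0, 0.1))      # alarm
```

In this sketch the stricter industrial requirement appears as configuration of the same
pipeline (a short tolerated data gap and a fail-safe reaction) rather than as a
different algorithm.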
These characteristics of the industrial sector – technology and requirements –
lead to specialized, demanding solutions for technology and services, justifying the
focus of the industrial sector on a specialized IoT concept. This has resulted in the
strong interest of the industrial sector in developing specialized concepts,
from strategy to application and technology. Conventional business development
models, which include numerous interdependencies between stakeholders, from
supply chains to service promotion, also lead to a strong need for interoperable
solutions at many levels, from devices to services. Thus, there is a need for
coordinated activities in the evolution to IIoT, which is addressed by consortia, such
as the Industrial Internet Consortium [IIC14] that provides significant leadership in
this emerging field.
The General Electric company, a leader of the Industrial Internet of Things, introduced
the term Industrial Internet in 2012 and identified machine-to-machine communication,
SCADA, HMI, industrial data analytics, and cybersecurity as the main constituents of
the IIoT vision [GE17]. Interestingly, GE also estimates the impact of the Industrial
Internet at 46% of the global economy, while in the energy sector it estimates an impact
on 100% of energy production and 44% of energy consumption globally [GE17].
The development and deployment of IIoT systems and services require the develop-
ment of architectures that enable efficient and effective operations as well as interop-
erability considering the anticipated end-to-end services and the large number of
stakeholders involved for devices, cyber-physical systems, communication systems
and networks, service providers, and business developers. Thus, significant effort is
being spent to develop standards and reference architectures that will be accepted
and adopted by the various stakeholders. The International Telecommunication
Union (ITU) has addressed this issue, publishing in 2012 the ITU-T Y.2060 recom-
mendation, which introduces a reference architecture for IoT, in general, including
explicitly applications that fall in the context of IIoT, such as smart grid, intelligent
transportation systems, e-health, etc. [ITU12]. The Industrial Internet Consortium
(IIC) has also been working on a reference architecture for IIoT and has published
Version 1.8 of the Industrial Internet Reference Architecture [IIC17]. This is an
elaborate reference architecture, significantly more detailed than the ITU one, that
addresses all important aspects for all categories of stakeholders. Taking into account
the details of both reference models, one can consider the IIC model a specialized
evolution of the ITU model, addressing in more detail the important issues of IIoT
relative to the more generic ITU reference model, which encapsulates the requirements
of the general IoT.
The ITU effort has expanded the communications vision by adding communication of
“anything” to the established concepts of “any time” and “any place.” Importantly,
it covers all expected applications, including industrial ones, specifically
mentioning smart grids and intelligent transport systems, among others. As
“things,” ITU considers physical and virtual objects that are identifiable and able to
[Figure: things (T) communicating directly with each other, or over a communication
network (CN), including through a gateway (G)]
[Figure: ITU-T IoT reference model with device layer, network layer, and service support
and application support layer, alongside management and security capabilities]
platform, and application provider, while, in Model 2, one stakeholder has the roles
of device, network, and platform provider and another one has the role of the appli-
cation provider.
The Industrial Internet Consortium (IIC) focuses on similar concepts and devel-
ops a reference IIoT architecture that has several similarities with the ITU approach
and reference model. Clearly, the IIC approach to the architecture development
addresses the interests and concerns of all types of stakeholders in an integrated
way, originating from use cases and focusing on complete business models and
applications at all levels, from devices to IIoT services. IIC follows the approach
that different stakeholders who need to make different decisions have architectural
viewpoints that are at different levels of abstraction. These viewpoints enable stake-
holders to focus on the parameters of interest and develop appropriate architectures
that achieve their goals and address the problems they have identified. For this pur-
pose, IIC has identified four different viewpoints: (a) business, (b) usage, (c) func-
tional, and (d) implementation.
The business viewpoint addresses the concerns of business stakeholders, who
define and specify IIoT systems and services in their organizations or for customers.
These concerns, such as return on investment, cost of maintenance, and similar, are
addressed through a model that enables the definition of visions and values which
are translated to key objectives and then to high-level specifications of business
tasks, named fundamental capabilities. The stakeholders involved include business
developers as well as system engineers and product managers.
The usage viewpoint describes how the system is used, implementing the key
objectives and the capabilities that have been specified through the business
viewpoint. The viewpoint is described with a model that identifies the system and its
activities, the involved parties – humans or machines – and their roles, and, finally,
tasks, i.e., actions that are executed by parties with a specific role. As tasks are the
actions in the system, they are precisely specified and described per role with
so-called functional and implementation maps that specify the exact functions and
implementation subsystems that are necessary for a task’s complete execution. The
stakeholders involved in the usage view include not only the systems engineers and
the product managers of the related employed products but all stakeholders that are
involved in IIoT system and service specification, including the end users.
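The usage-viewpoint model (activities, parties with roles, and tasks with functional
and implementation maps) can be captured as plain data structures. The Python sketch
below is a hypothetical encoding, not the IIC's own notation; every class, task, and
party name is invented for illustration. It also shows one practical use of such a
model: finding tasks that no party is able to execute.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    name: str
    kind: str                                   # "human" or "machine"
    roles: set[str] = field(default_factory=set)


@dataclass
class Task:
    name: str
    role: str                                   # role required to execute the task
    functional_map: list[str]                   # functions the task needs
    implementation_map: list[str]               # subsystems that realize them


@dataclass
class Activity:
    name: str
    tasks: list[Task]


def executors(task: Task, parties: list[Party]) -> list[str]:
    """Parties whose roles allow them to execute the task."""
    return [p.name for p in parties if task.role in p.roles]


def unassigned_tasks(activity: Activity, parties: list[Party]) -> list[str]:
    """Tasks that no party can execute: a gap in the usage model."""
    return [t.name for t in activity.tasks if not executors(t, parties)]


# Hypothetical predictive-maintenance activity.
inspect_task = Task("inspect pump", "maintenance technician",
                    ["asset condition monitoring"], ["edge gateway", "historian"])
reorder_task = Task("order spare part", "procurement agent",
                    ["ERP integration"], ["enterprise ERP"])
activity = Activity("predictive maintenance", [inspect_task, reorder_task])
parties = [
    Party("Alice", "human", {"maintenance technician"}),
    Party("analytics service", "machine", {"condition monitor"}),
]

print(executors(inspect_task, parties))     # ['Alice']
print(unassigned_tasks(activity, parties))  # ['order spare part']
```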
The functional viewpoint presents the functional architecture of the IIoT system,
describing its components, dependencies, and coordination, meeting the require-
ments and specifications that have been developed through the usage viewpoint. The
stakeholders involved in this viewpoint are system and subsystem developers, prod-
uct developers, and managers as well as system integrators.
Considering the focus of IIC on IIoT and the increasing adoption of industrial
control systems (ICS) within the industries of several sectors and in the operation
and management of critical infrastructure, the IIC reference model, in its functional
architecture of IIoT systems, focuses on the integration of ICS with classical
information technology (IT) systems in a unified, effective model that meets the
requirements of all stakeholders – as specified in the business and usage viewpoints –
and enables their effective decisions. The inclusion of ICS and IT in a unified model
presents several challenges. Industrial control systems, the systems of Operational
Technology (OT), have followed a different evolution path from typical IT systems
because of their goals and requirements, which typically include continuous operation,
safety, and real-time constraints. OT systems have mostly been developed and owned
by control and operations engineers; they employ different technologies for processing,
communications, and interfaces because they interact directly with the environment
through sensors and actuators; and they are managed independently by their owners,
since they are typically part of systems and services that are demanding in terms of
dependability, continuous operation, real time, and safety. As a result, their
technologies, practices, and standards have evolved independently from those of IT.
However, the increasing capabilities offered by
advanced sensors and actuators, processors, and memories have enabled ICS to
execute highly complex operations that have been developed for complex IT sys-
tems, such as high-volume data collection and analysis, multivariable modeling and
optimization, etc. Importantly, at the same time, the increased capabilities and the
increasing complexity of ICS have led them to be more vulnerable to failures and
cyber-attacks, leading to additional functional requirements for their correct and
efficient operation.
In order to address the integration of IT and OT in a unified model, the IIC
approach to the reference architecture divides IIoT systems into five domains, each
one grouping the functionality required for a logically distinct high-level operation
of the system. These five domains are (a) control, (b) operations, (c) information,
(d) application, and (e) business. Figure 5.5, from [IIC17], illustrates the decomposition
of the functional representation of an IIoT system into the five domains and shows the
data and control flow among the domains, as specified by IIC.
[Fig. 5.5: business, application, information, operations, and control domains above the
physical system, which the control domain senses and actuates, with control flow and
data flow between the domains]
The control
domain effectively represents the control loop realized by industrial control sys-
tems, i.e., it contains the sensors, the logic, and the actuation that constitute a plant
implemented by one or more industrial control systems. The operations domain
includes the functions that are required for the operation of the industrial control
systems in the control domain; the operation includes system monitoring and man-
agement as well as optimization for the efficient operation of the systems, especially
considering the requirements of several application domains for continuous operation,
real-time behavior, and low power consumption.
The information domain is responsible for collecting data from all domains and
analyzing them to enable high-level decisions for the system, e.g., coordinating and
optimizing the end-to-end operation of several industrial control systems in the con-
trol domain. The application domain includes functionality that is application-
dependent and effectively includes the models and operation rules of the application
at hand; an important part of this domain is the set of APIs and user interfaces so that
other applications or human users can use the application effectively. Finally, the
business domain includes systems and functions that enable management and decision
making at the business level, e.g., enterprise resource planning (ERP) systems,
manufacturing execution systems (MES), etc.
It is important to note that the IIC approach is centered around the concept of a
control plant, i.e., it addresses all viewpoints around a control loop that implements
a plant. Since control loops can be simple, with one system, or complex with mul-
tiple systems typically organized in a hierarchy, the IIC functional domain decom-
position can be applied at all levels of a hierarchy. Thus, the decomposition of an
IIoT system into the domains does not represent a layered approach like the ITU
approach, but rather a logical functional decomposition within a layer or across layers
in a hierarchy. Because of this, the IIC reference architecture identifies “crosscutting
functions” that are effectively hierarchical (or layered) IT infrastructure
functions necessary for the development of a complete IIoT application. These
functions include connectivity, distributed data management, analytics, intelligent
and resilient control, and any other application function that is necessary for the
specific application domain or use case. For example, connectivity has to be imple-
mented in a hierarchical fashion, following standards and practices, interconnecting
components within an industrial control system or across several such systems,
where each system can be viewed as a collection of functions from all five specified
domains. Observing these crosscutting functions, one can see that they effectively
constitute a layered architecture analogous to that of ITU. In that
respect, one can consider the IIC approach and the ITU approach as complementary,
with the IIC reference architecture being a generalization of the ITU one, since it
includes crosscutting functions analogous to the ITU layers, while it enables the
development of more detailed functional models per layer addressing complete con-
trol loops and providing support to all types of stakeholders – from device designers
to business developers – for effective decision making.
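Because the five-domain decomposition applies at every level of a control hierarchy,
a recursive data structure is a natural way to picture it. In this hypothetical Python
sketch, each `ControlSystem` node hosts functions from any of the five domains and may
contain child systems; the names and the two-level water-site example are assumptions
made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator


class Domain(Enum):
    CONTROL = "control"
    OPERATIONS = "operations"
    INFORMATION = "information"
    APPLICATION = "application"
    BUSINESS = "business"


@dataclass
class ControlSystem:
    """A node of the control hierarchy; each node may host functions of any domain."""
    name: str
    functions: dict[Domain, list[str]] = field(default_factory=dict)
    children: list["ControlSystem"] = field(default_factory=list)


def walk(system: ControlSystem, level: int = 0) -> Iterator[tuple[int, ControlSystem]]:
    """Depth-first traversal yielding (hierarchy level, system)."""
    yield level, system
    for child in system.children:
        yield from walk(child, level + 1)


def domains_by_level(root: ControlSystem) -> dict[int, set[Domain]]:
    """Which functional domains appear at each level of the hierarchy."""
    result: dict[int, set[Domain]] = {}
    for level, system in walk(root):
        result.setdefault(level, set()).update(system.functions)
    return result


# Hypothetical two-level example: a site-level SCADA loop over a pump loop.
pump_loop = ControlSystem("pump loop", {
    Domain.CONTROL: ["pressure PID"],
    Domain.OPERATIONS: ["PLC health check"],
})
site = ControlSystem("water site", {
    Domain.CONTROL: ["SCADA setpoint control"],
    Domain.INFORMATION: ["site historian"],
    Domain.BUSINESS: ["energy cost report"],
}, [pump_loop])

for level, domains in domains_by_level(site).items():
    print(level, sorted(d.value for d in domains))
# 0 ['business', 'control', 'information']
# 1 ['control', 'operations']
```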
This analogy and complementarity become more apparent with the implementation
viewpoint, which addresses the implementation details of the functional viewpoint
developed for an IIoT system. The implementation viewpoint includes all the technical
and technological details required to implement a complete IIoT system and its
application, including system functionality, technological requirements,
communication and network protocols, all types
of interfaces, and a mapping of the functional blocks that are specified in the func-
tional viewpoint onto typical implementation architectures, such as the three-tier
architecture (where the three tiers are the edge, platform, and enterprise) and the
layered databus architecture.
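As a rough illustration of mapping functional blocks onto the three-tier architecture,
the Python sketch below assigns each domain to a single tier. The assignment (control
to edge; operations and information to platform; application and business to
enterprise) is a simplifying assumption that approximates the IIC three-tier pattern;
in practice a domain can span tiers, and all block names are hypothetical.

```python
# Hypothetical, simplified domain-to-tier assignment (an approximation of the
# three-tier pattern); real deployments may split a domain across tiers.
TIER_OF_DOMAIN = {
    "control": "edge",
    "operations": "platform",
    "information": "platform",
    "application": "enterprise",
    "business": "enterprise",
}


def map_to_tiers(blocks: dict[str, str]) -> dict[str, list[str]]:
    """Group functional blocks (block name -> domain) by implementation tier."""
    tiers: dict[str, list[str]] = {"edge": [], "platform": [], "enterprise": []}
    for block, domain in blocks.items():
        if domain not in TIER_OF_DOMAIN:
            raise ValueError(f"block {block!r} has unknown domain {domain!r}")
        tiers[TIER_OF_DOMAIN[domain]].append(block)
    return tiers


blocks = {
    "pressure PID": "control",
    "device management": "operations",
    "fleet analytics": "information",
    "operator dashboard": "application",
    "ERP link": "business",
}
for tier, names in map_to_tiers(blocks).items():
    print(f"{tier:10} {names}")
```

Rejecting blocks with an unknown domain keeps the mapping total: every functional block
specified in the functional viewpoint must land on some tier.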
The basic technologies that enable the evolution of IIoT are the sensors, cyber-
physical systems, and the related communications and networking technologies that
enable their connectivity, among themselves or to other systems, including enterprise
networks. We designate as basic technologies those that are common to all
application domains and use cases.
IIoT applications span a wide range of IoT application domains. Operational tech-
nology (OT) systems have become the basic computation platform for the operation
and management of most critical infrastructure. The high processing and storage
capacity of PLCs, their ability to manage real-time applications with high availabil-
ity, and their easy management by available SCADA systems have made them quite
popular as building blocks of large infrastructures beyond the manufacturing floor,
for which they were originally introduced. Today, a large portion of infrastructure is
based on industrial control systems (ICS) and makes this critical infrastructure a
potential provider of IIoT services and user of IIoT technology. The energy sector is
probably the most demanding one in its use of ICS, since the production and processing
of energy is part of a country’s heavy industry and thus, naturally, includes
large ICS platforms. In addition, ICS are used heavily in power distribution net-
works, such as the electricity network. Considering the emerging smart grids that
provide monitoring devices, i.e., PLC-like systems, to customers, it becomes appar-
ent that ICS are the main computing infrastructure in power systems end to end,
from production to consumption.
References
[Amq14] ISO/IEC 19464:2014 (2014). Information technology: Advanced message queuing pro-
tocol (AMQP) v1.0 specification.
[Ant16] Antonopoulos, C., et al. (2016). Integrated toolset for WSN application planning devel-
opment commissioning and maintenance: The WSN-DPCM ARTEMIS-JU Project. MDPI
Sensors.
[Bi14] Bi, Z., Xu, L. D., & Wang, C. (2014). Internet of Things for enterprise systems of mod-
ern manufacturing. IEEE Transactions on Industrial Informatics, 10(2), 1537–1546.
[Blu] Bluetooth specifications. https://www.bluetooth.com/
[CEP10] (2010, March). CERP-IoT vision and challenges for realising the Internet of Things.
CERP-IoT – Cluster of European research projects on the Internet of Things.
[Che13] Cheshire, S., & Krochmal, M. (2013). Multicast DNS. IETF RFC 6762.
[Eno] EnOcean Alliance. https://www.enocean-alliance.org
[FoF] Factories of the Future. European factories of the Future Research Association (EFFRA).
http://ec.europa.eu/research/industrial_technologies/factories-of-thefuture_en.html
[Fuq15] Al-Fuqaha, A., et al. (2015). Internet of things: A survey on enabling technologies
protocols and applications. IEEE Communications Surveys & Tutorials, 17(4), 2347–2376.
[GE17] GE Digital. Industrial Internet insights from GE Digital.
https://www.ge.com/digital/content/industrial-insights-from-ge-digital
[Hat11] Hatziargyriou, N. (2011, September 13–15). Network of the future. Presentation on
behalf of CIGRE TC at the panel session “The electric power system of the future: an interna-
tional overview”. CIGRE international symposium, “The electric power system of the future:
Integrating supergrids and microgrids”. Bologna, Italy.
[Hui11] Hui, J., & Thubert, P. (2011, September). Compression format for IPv6 datagrams over
IEEE 802.15.4-based networks. IETF RFC 6282.
[IEC16] ISO. Information technology – Security techniques – Information security management
systems – Overview and vocabulary. ISO/IEC 27000:2016. http://www.iso.org
[IIC14] Industrial Internet Consortium. http://www.iiconsortium.org/
[IIC17] IIC. (2017). The industrial Internet of Things volume G1: Reference architecture.
IIC:PUB:G1:V1.80:20170131.
[Ind14] Germany Trade and Invest. (2014, July). Industrie 4.0 smart manufacturing for the
future.
[ISA16] ISA. (2016, December). The 62443 series of standards – Industrial automation and
control systems security. ISA.
http://www.isa99.isa.org/Public/Information/The-62443-Series-Overview.pdf
[ITU12] ITU-T. (2012). Overview of the Internet of Things. ITU-T Series Y: Global information
infrastructure, Internet protocol aspects and next-generation networks, Recommendation
Y.2060 (06/2012).
[Kok09] Kok, K., et al. (2009, June 8–11). Smart houses for a smart grid. 20th international
conference and exhibition on electricity distribution: Part 1, CIRED 2009, Prague.
[Kou11] Kourtis, G., Hadjipaschalis, I., & Poullikkas, A. (2011). An overview of load demand
and price forecasting methodologies. International Journal of Energy and Environment, 2,
123–150.
[Kou16] Koulamas, C., Giannoulis, S., & Fournaris, A. (2016). IoT components for secure smart
building environments. Components and services for IoT platforms: Paving the way for IoT
standards. Springer.
[Man13] Manyika, J., et al. (2013, May). Disruptive technologies: Advances that will transform
life business and the global economy. McKinsey Global Institute www.mckinsey.com/mgi
[Mic17] Tanner, P. (2017, June 28). Micron benefits from memory market’s faster growth
rate. Market Realist.
http://marketrealist.com/2017/06/micron-benefits-from-memory-markets-faster-growth-rate/
[Mon07] Montenegro, G., Kushalnagar, N., Hui, J., & Culler, D. (2007, September). Transmission
of IPv6 packets over IEEE 802.15.4 networks. IETF RFC 4944.
[Mqt16] ISO/IEC 20922:2016 Information technology: Message queuing telemetry transport
(MQTT) v3.1.1
[Rig11] Rigatos, G. G. (2011). Modelling and control for intelligent industrial systems: Adaptive
algorithms in robotics and industrial engineering. Springer.
[Rig13] Rigatos, G. (2013). Advanced models of neural networks: Nonlinear dynamics and sto-
chasticity in biological neurons. Springer.
[Rig15] Rigatos, G. (2015). Nonlinear control and filtering using differential flatness approaches:
Applications to electromechanical systems. Springer.
[Rig17] Rigatos, G. (2017). Intelligent renewable energy systems: Modelling and control.
Springer.
[She12] Shelby, Z., Chakrabarti, S., Nordmark, E., & Bormann, C. (2012, February). Neighbor
discovery optimization for IPv6 over low-power wireless personal area networks (6LoWPANs).
IETF RFC 6775.
[She14] Shelby, Z., Hartke, K., & Bormann, C. (2014, June). The constrained application proto-
col (CoAP). IETF RFC 7252.
[Zue10] Zuehlke, D. (2010). SmartFactory—Towards a factory-of-things. Annual Reviews in
Control, 34(1), 129–138.
[Zig] ZigBee specifications. http://www.zigbee.org/
Chapter 6
Security and Safety
6.1 Introduction
The Internet of Things (IoT), including the Industrial Internet (IIoT), refers not only
to the connectivity of systems and devices but to the related applications and ser-
vices that provide monitoring and control of complex systems and services. The
application domain spans a wide range of industries, from health to industrial con-
trol and from transportation to surveillance systems. Its expansion and growth
incorporate several technologies and disciplines, such as electronics, embedded net-
works, hybrid systems, and control. The inclusion of information technology (IT) as
well as operational technologies (OT) creates a challenge for the development of
systems and services that are technologically interdisciplinary. The resulting chal-
lenges to integrate these technologies in new design methodologies for robust and
effective IoT systems and services are significant. Currently, even the terminology
used by different stakeholders presents challenges and inconsistencies to the com-
mon understanding of properties and goals of IoT infrastructure and applications.
Considering the targeted applications and services of IoT, in this chapter, we
address security and safety of IoT systems and services with an approach that spans
from systems to applications (services or processes) in a unified way, using termi-
nology that originates from computing, networking, and control, since these disci-
plines constitute the main pillars of IoT technologies in all IoT application domains.
This approach is consistent with the reference architectural models of both ITU and
the Industrial Internet Consortium, as presented in Chap. 5. For convenience, we
address security in this chapter following the ITU model, which divides security
mechanisms into two parts, one for generic security and one that is application
dependent; we use the terms application dependent and process dependent
interchangeably.
IoT applications, in general, collect data through sensing devices, process this data,
and take actions that range from sending notifications and raising alarms to acting on
physical systems through actuators. A simple generic model for this operation is the
control loop model, which is used across many application domains and is depicted
in Fig. 6.1.
[Fig. 6.1: the control center (C) receives measurements from the sensors of the device (D)
and sends control actions to its actuators]
In this model, a device D is controlled by a
control center C. Measurements of the parameters of interest are collected from D
through sensors and delivered to C which makes the necessary calculations and
takes the necessary decisions and actions for the application; if the application
requires automatic actions, C sends the necessary commands to actuators that control
D. The model is generic and covers applications across domains ranging from
health to transportation and from aerospace to manufacturing. In a health applica-
tion, for example, sensors measure patient parameters, such as temperature and glu-
cose levels, and send them to a monitoring program – analogous to the control
center – and decisions are made depending on the application; a message may be
sent to attract a patient’s or a doctor’s attention, or an insulin pump may be activated
to administer more insulin. On a manufacturing floor, sensors may detect the arrival
of a component and send the data to a control center which, in turn, sends the appro-
priate commands to the machine that will process the component accordingly.
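The control loop of Fig. 6.1 can be simulated in a few lines. In this Python sketch,
a hypothetical tank plays the role of device D (its level is the measurement and a
drain valve is the actuator), and the control center C applies simple on/off control
with hysteresis; the plant model, thresholds, and rates are assumptions chosen only to
make the loop visible.

```python
from dataclasses import dataclass


@dataclass
class Tank:
    """Device D (hypothetical plant): a tank with constant inflow and a drain valve."""
    level: float                 # liters
    inflow: float = 4.0          # liters added per time step
    drain_rate: float = 10.0     # liters removed per step while the valve is open
    valve_open: bool = False

    def sense(self) -> float:                      # sensor: measurement sent to C
        return self.level

    def actuate(self, open_valve: bool) -> None:   # actuator: command from C
        self.valve_open = open_valve

    def step(self) -> None:                        # physical evolution over one step
        drained = self.drain_rate if self.valve_open else 0.0
        self.level = max(self.level + self.inflow - drained, 0.0)


class ControlCenter:
    """Control center C: on/off control with hysteresis between two thresholds."""

    def __init__(self, low: float, high: float) -> None:
        self.low, self.high = low, high
        self.open = False

    def decide(self, measurement: float) -> bool:
        if measurement >= self.high:
            self.open = True
        elif measurement <= self.low:
            self.open = False
        return self.open          # between thresholds, keep the previous command


def run(tank: Tank, center: ControlCenter, steps: int) -> list[float]:
    levels = []
    for _ in range(steps):
        command = center.decide(tank.sense())   # measurements up to C
        tank.actuate(command)                   # control actions down to D
        tank.step()
        levels.append(tank.level)
    return levels


levels = run(Tank(level=50.0), ControlCenter(low=40.0, high=60.0), steps=100)
print(levels[:8])   # [54.0, 58.0, 62.0, 56.0, 50.0, 44.0, 38.0, 42.0]
```

The hysteresis band keeps the valve from toggling on every sample; the level then
oscillates around the band instead of settling on a single value.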
The control loop model shown in Fig. 6.1 is implemented on a computational
platform that has a different structure from the one indicated in the control model.
Figure 6.2 shows a typical hierarchical computational structure for industrial sys-
tems, an important class of IoT systems, showing how the computing systems, net-
works, sensors, and actuators are typically used to implement the operational
computing infrastructure of the control loop. Sensors and actuators are attached to
the controlled device (D in Fig. 6.1), programmable logic controllers (PLCs) imple-
ment simple controls – one per PLC typically – and the supervisory control and data
acquisition (SCADA) system implements the control loop for the complete process,
also denoted as plant. The PLCs in the structure are simple industrial computers,
and their number differs according to the application. In a smart grid, for example,
different PLCs may take actions locally per transformer, while SCADA controls the
complete smart grid; in a water management system, a different PLC may control
each pump, while SCADA controls the water system of an industrial site.
Fig. 6.2 Hierarchical computing structure for control loop [figure: Supervisory Control
and Data Acquisition (SCADA) connected over networks to PLCs and field devices such as
a valve and a pump]
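The division of labor between PLCs and SCADA can be sketched as two levels of control.
In this hypothetical Python example, each `PumpPLC` runs a single local loop that drives
its pump toward a flow setpoint, while `Scada` sees all PLCs and splits the site demand
among the available pumps; the pump dynamics and the proportional dispatch rule are
illustrative assumptions, not a description of any real SCADA product.

```python
from dataclasses import dataclass


@dataclass
class PumpPLC:
    """Hypothetical PLC running one simple local control: track a flow setpoint."""
    name: str
    max_flow: float            # m^3/h the pump can deliver
    setpoint: float = 0.0      # written by SCADA
    flow: float = 0.0          # measured locally by the PLC
    available: bool = True

    def scan(self) -> None:
        # One scan cycle: move the flow halfway toward the (clamped) setpoint,
        # a crude first-order stand-in for the real pump dynamics.
        target = min(self.setpoint, self.max_flow) if self.available else 0.0
        self.flow += 0.5 * (target - self.flow)


class Scada:
    """Supervisory level: sees all PLCs and closes the loop for the whole site."""

    def __init__(self, plcs: list[PumpPLC]) -> None:
        self.plcs = plcs

    def dispatch(self, site_demand: float) -> None:
        # Split the site demand across available pumps in proportion to capacity.
        capacity = sum(p.max_flow for p in self.plcs if p.available)
        for p in self.plcs:
            p.setpoint = (site_demand * p.max_flow / capacity
                          if (p.available and capacity > 0) else 0.0)

    def total_flow(self) -> float:
        return sum(p.flow for p in self.plcs)


plcs = [PumpPLC("P1", max_flow=100.0), PumpPLC("P2", max_flow=50.0)]
scada = Scada(plcs)
scada.dispatch(90.0)                    # setpoints become 60.0 and 30.0
for _ in range(20):                     # let the local loops settle
    for plc in plcs:
        plc.scan()
print(round(scada.total_flow(), 1))     # close to 90.0

plcs[1].available = False               # P2 fails; SCADA re-dispatches the demand
scada.dispatch(90.0)
print([p.setpoint for p in plcs])       # [90.0, 0.0]
```

Because SCADA only writes setpoints, the local loops keep running between supervisory
updates, which mirrors the one-control-per-PLC structure described above.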
In this environment, there are several properties we want to achieve. From the
control point of view, these properties are typically safety properties. For example,
we want to avoid overloading a smart grid, overflowing a fluid tank, or causing an
overdose of a pharmaceutical substance that is automatically administered to a
patient. These properties can be violated for several reasons. A programmer may
have introduced a bug in the program, the requirements of the system may have
missed a condition that should have been taken into consideration, the
middleware of the system may give the wrong priorities to control processes, or,
simply, a malicious party may attack the system and cause it to take the wrong
actions.
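One common way to protect a safety property regardless of why the controller
misbehaves is a runtime safety monitor placed between the controller and the actuator.
The Python sketch below is a hypothetical illustration for the fluid-tank example, not
a technique prescribed in the text: `OverflowGuard` predicts the next level and forces
the drain open when the requested command would cause an overflow. It assumes a correct
level measurement and a drain rate larger than the inflow.

```python
from dataclasses import dataclass


@dataclass
class OverflowGuard:
    """Hypothetical runtime safety monitor between controller and actuator.

    Enforces "level never exceeds capacity" whatever the cause of the faulty
    command (bug, missed requirement, or attack), assuming the level reading
    is correct and drain_rate exceeds inflow.
    """
    capacity: float     # liters that must never be exceeded
    inflow: float       # worst-case inflow per step
    drain_rate: float   # outflow per step with the drain valve open
    overrides: int = 0

    def filter(self, level: float, requested_open: bool) -> bool:
        """Return the valve command actually forwarded to the actuator."""
        predicted = level + self.inflow - (self.drain_rate if requested_open else 0.0)
        if predicted > self.capacity and not requested_open:
            self.overrides += 1
            return True          # override: open the drain
        return requested_open


def faulty_controller(level: float) -> bool:
    return False                 # never drains: a bug, or a compromised controller


guard = OverflowGuard(capacity=100.0, inflow=4.0, drain_rate=10.0)
level, history = 90.0, []
for _ in range(50):
    command = guard.filter(level, faulty_controller(level))
    level += guard.inflow - (guard.drain_rate if command else 0.0)
    history.append(level)
print(max(history), guard.overrides)
```

The guard is only as trustworthy as the measurement it receives: if the level reading
can be falsified, the property can still be violated.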
The safety requirements for applications are typically expressed as requirements on the control loop that implements them. These expressions rest on assumptions about properties of the infrastructure on which the application is implemented. For example, an HVAC control system assumes that the temperature measurements it receives as input are correct within some approximation. This implies that the safety properties depend on assumptions about data integrity that the infrastructure must satisfy. In general, safety requirements include infrastructure security requirements, such as integrity, either implicitly or explicitly. A typical explicit security property is the protection of personal information in a health management system. Thus, security is a requirement for safety as well, since, at a minimum, data integrity is necessary.
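The dependence of safety on data integrity can be illustrated with a minimal sketch, assuming that a sensor and its controller share a secret key provisioned in advance; the key, the message layout, and the function names are hypothetical. Each reading carries an HMAC-SHA256 tag computed with Python's standard `hmac` module, and the controller discards any reading whose tag does not verify, so a modified temperature value never reaches the control law. A sequence number is included so that a receiver could also reject replayed readings, although that check is omitted here.

```python
import hashlib
import hmac
import struct

SHARED_KEY = b"example-sensor-key"  # hypothetical; provisioned securely in practice
PAYLOAD_FORMAT = ">IId"             # sensor id, sequence number, temperature (Celsius)
PAYLOAD_SIZE = struct.calcsize(PAYLOAD_FORMAT)
TAG_SIZE = hashlib.sha256().digest_size

def sign_reading(sensor_id: int, seq: int, celsius: float, key: bytes = SHARED_KEY) -> bytes:
    """Serialize a reading and append an HMAC-SHA256 tag computed over it."""
    payload = struct.pack(PAYLOAD_FORMAT, sensor_id, seq, celsius)
    return payload + hmac.new(key, payload, hashlib.sha256).digest()

def verify_reading(message: bytes, key: bytes = SHARED_KEY):
    """Return (sensor_id, seq, celsius) if the tag verifies, otherwise None."""
    if len(message) != PAYLOAD_SIZE + TAG_SIZE:
        return None
    payload, tag = message[:PAYLOAD_SIZE], message[PAYLOAD_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time tag comparison
        return None
    return struct.unpack(PAYLOAD_FORMAT, payload)

msg = sign_reading(7, 1, 21.5)
print(verify_reading(msg))  # (7, 1, 21.5)
forged = msg[:8] + struct.pack(">d", 30.0) + msg[16:]  # attacker changes the value
print(verify_reading(forged))  # None: the controller rejects the reading
```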
58 6 Security and Safety
To identify the requirements and mechanisms needed to provide the necessary security properties in the IoT and IIoT context, we follow the layering shown in Fig. 6.3, which was introduced in [Ser13]. Figure 6.3 defines our view of the relationship between application and process properties, such as safety and privacy, and the security and dependability mechanisms that are provided at the system level and are used as primitives to provide the application and process properties.
The depicted layering is based on our approach of distinguishing system-level properties, such as secure storage, secure communication, and tamper resistance, from properties that are required and provided at the application level. In this approach, we consider that (embedded) systems and their interconnections are built to operate resiliently, overcoming failures, accidental or malicious, that lead to information loss, information leakage, or loss of availability. Dependability mechanisms focus more on reliability and availability under accidental failures, using probabilistic failure models, while security mechanisms focus on providing properties such as confidentiality, authentication, and availability, based on defined models of malicious attacks. Although some dependability and security properties, such as the availability of information, are common to the two disciplines, others, such as confidentiality or continuous operation, are complementary. In general, dependability is complementary to security, because an attacker can insert faults and failures (analogously to launching attacks on security mechanisms) that dependability mechanisms cannot recover from. Clearly, combining dependability and security mechanisms at the system level provides trusted platforms that remain secure and available under both accidents and attacks.
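The contrast between probabilistic failure models and malicious faults can be illustrated with triple modular redundancy (TMR), a standard dependability mechanism that the text does not name and that is used here purely as an example. With independent accidental failures, a 2-out-of-3 majority voter fails only if at least two replicas fail, so for replica reliability r the system reliability is R = 3r^2 - 2r^3. An attacker who deliberately corrupts two replicas defeats the voter with certainty, whatever the value of r.

```python
from collections import Counter

def tmr_reliability(r: float) -> float:
    """Probability that at least 2 of 3 independent replicas are correct."""
    return r**3 + 3 * r**2 * (1 - r)  # equals 3r^2 - 2r^3

def majority_vote(outputs):
    """Return the value produced by at least 2 of 3 replicas, or None without a majority."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None

r = 0.99
print(f"TMR reliability with r={r}: {tmr_reliability(r):.6f}")  # 0.999702

# Accidental single fault: the voter masks it.
print(majority_vote([42, 42, 17]))  # 42
# Malicious fault injection into two replicas: the voter outputs the attacker's value.
print(majority_vote([42, 99, 99]))  # 99
```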
Safety and privacy are described as security requirements in many application domains, although they differ from typical security considerations in many ways. Typically, privacy protection and safety are requirements for processes, applications, and services, rather than for generic systems. In our approach, privacy and safety depend on security, because their implementation employs security mechanisms such as data integrity and confidentiality. Interestingly, safety and privacy overlap, because privacy is a safety issue in some contexts, such as financial transactions. It is important to note that, as Fig. 6.3 indicates, security and dependability are requirements for privacy and safety. If security mechanisms are lacking, an attacker can easily collect data and violate privacy, or alter processes and applications, leading to unsafe conditions.
The threat model we consider for IoT systems includes both computational attacks and data attacks. Computational attacks include all malicious actions in a computing system that affect the correct execution of a program and/or lead to information leakage. Data attacks comprise all attacks on input or communicated data. We extend the concept of data attacks to include false data injection attacks, which are malicious interventions that input inappropriate (illegal) data to a system. False data injection (FDI) attacks are an emerging class of attacks on IoT systems; they do not attack the IoT systems themselves, but feed wrong data to a control system in order to lead it to a wrong decision. In that respect, they are mostly safety attacks. For example, in an HVAC system, a false data injection attack could report a higher temperature than the actual one, leading the system to lower the temperature further. Clearly, this type of attack can lead to hazardous conditions that may endanger processes, systems, and even human life.
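The HVAC example can be reproduced with a toy closed-loop simulation. The cooling-only thermostat, the rates, and the setpoint are arbitrary assumptions chosen for illustration, not values from the text; temperatures are kept in tenths of a degree as integers so that the arithmetic is exact. The attack adds a constant bias to the temperature the controller sees, while the controller logic itself is left untouched.

```python
def simulate(steps: int, start: int, setpoint: int, injected_offset: int = 0):
    """Simulate a cooling-only thermostat; temperatures are in tenths of a degree.

    injected_offset models a false data injection attack that adds a constant
    bias to the measurement seen by the controller (0 means no attack).
    Returns the true room temperature after each step.
    """
    temp = start
    history = []
    for _ in range(steps):
        measured = temp + injected_offset  # what the controller believes
        if measured > setpoint:
            temp -= 5   # cooling on: -0.5 degrees per step (assumed rate)
        else:
            temp += 1   # cooling off: +0.1 degrees ambient heat gain (assumed)
        history.append(temp)
    return history

honest = simulate(200, start=250, setpoint=220)
attacked = simulate(200, start=250, setpoint=220, injected_offset=40)
print(min(honest[-20:]), max(honest[-20:]))      # 216 221
print(min(attacked[-20:]), max(attacked[-20:]))  # 176 181
```

With a +4.0 degree bias, the room settles about 4 degrees below the setpoint, although no component of the controller was compromised.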
IoT systems are embedded computing systems that employ architectures analogous
to general-purpose ones. A typical structure of an IoT system is shown in Fig. 6.4,
where the system contains four main subsystems: (i) processing, (ii) memory, (iii)
input/output, and (iv) power. In general, a secure system requires protection as a whole in addition to protection of each of its components individually. The specific requirements depend on the operational environment and the expected capabilities of attackers. In a surveillance system, for example, optical sensors (cameras) need to be secured individually, but the network as a whole needs to adapt dynamically and keep operating when individual cameras are compromised or destroyed.
The security of stand-alone systems is achieved with several levels of protection
that include physical and hardware security as well as trusted computing platforms.
Anti-tampering techniques enable different levels of physical protection, ranging from tamper evidence to tamper response and tamper resistance, and are employed according to the security requirements of the system and its operational environment. Techniques for tamper evidence simply indicate whether a device has been tampered with. Tamper-response methods combine tamper detection with tamper reaction, i.e., appropriate actions taken after tampering is detected, such as destroying stored sensitive data. Tamper resistance methods prevent tampering with devices and protect any sensitive data in the device from attacks.
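Tamper response can be sketched as a key store that erases its secret when a tamper signal arrives. This is a toy model under stated assumptions: the tamper sensor is reduced to a method call, and overwriting a Python `bytearray` only illustrates zeroization, since the interpreter may keep other copies of the data; real devices zeroize battery-backed memory in hardware. The class and method names are hypothetical.

```python
class TamperRespondingKeyStore:
    """Toy tamper-responding device holding one secret key (hypothetical API)."""

    def __init__(self, key: bytes):
        self._key = bytearray(key)  # mutable copy so it can be overwritten in place
        self.tampered = False

    def on_tamper_detected(self) -> None:
        """Tamper reaction: zeroize the key material and lock the device."""
        for i in range(len(self._key)):
            self._key[i] = 0
        self.tampered = True

    def use_key(self) -> bytes:
        """Return the key for a cryptographic operation, unless tampering was detected."""
        if self.tampered:
            raise PermissionError("key zeroized after tamper detection")
        return bytes(self._key)

store = TamperRespondingKeyStore(b"\x13\x37\xbe\xef")
print(store.use_key().hex())  # 1337beef
store.on_tamper_detected()    # e.g., signaled by an enclosure switch or mesh sensor
print(store.tampered)         # True; further use_key() calls raise PermissionError
```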
Anti-tamper technologies have been developed to protect systems after their deployment, so they need to address physical and hardware attacks by attackers with varying capabilities in a wide range of hostile environments, especially for critical applications such as surveillance. They need to combine physical and algorithmic mechanisms. Traditional data encryption, for example, is no longer a sufficient solution for data protection, especially in resource-limited systems, where the encryption can be defeated with simple attacks on its implementation. Side-channel attacks have changed how cryptosystems are attacked: rather than attacking the algorithms themselves, they exploit physical parameters of the implementations of cryptographic algorithms, such as timing, power consumption, and electromagnetic emanation [Koc96, Koc99, Qui01], or they introduce faults during cryptographic computations [Bar06, Joy09].
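A minimal illustration of a timing side channel, under simplifying assumptions: instead of measuring wall-clock time, which is noisy, the sketch counts the loop iterations of an early-exit comparison, which is the quantity an attacker's timing measurements approximate. The function names are hypothetical. Because the leaky comparison stops at the first mismatching byte, an attacker can recover a secret one byte at a time with at most 256 guesses per byte, instead of 256^n guesses overall. Python's `hmac.compare_digest` is designed to avoid this content-dependent early exit.

```python
import hmac

def leaky_equals(secret: bytes, guess: bytes):
    """Early-exit comparison; returns (equal, steps), where steps models elapsed time."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for a, b in zip(secret, guess):
        steps += 1
        if a != b:
            return False, steps  # exits early: time depends on the matching prefix
    return True, steps

def recover_secret(oracle, length: int) -> bytes:
    """Recover a secret byte by byte from the step count leaked by the oracle."""
    known = bytearray()
    for pos in range(length):
        best_byte, best_steps = 0, -1
        for candidate in range(256):
            guess = bytes(known) + bytes([candidate]) + bytes(length - pos - 1)
            equal, steps = oracle(guess)
            if equal:
                return guess
            if steps > best_steps:  # the correct byte makes the comparison run longer
                best_byte, best_steps = candidate, steps
        known.append(best_byte)
    return bytes(known)

def constant_time_equals(secret: bytes, guess: bytes) -> bool:
    """Comparison whose running time does not depend on the matching prefix."""
    return hmac.compare_digest(secret, guess)

SECRET = b"k3y!"  # hypothetical secret held by the device
print(recover_secret(lambda g: leaky_equals(SECRET, g), len(SECRET)))  # b'k3y!'
```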
Complex hardware systems such as processors and microcontrollers are susceptible to physical and hardware attacks, just like dedicated circuits such as cryptographic circuits [And96, Bly93]. Defenses against such attacks require dedicated hardware, specialized design techniques, or even new architectural concepts. For example, a sensitive program can be protected by storing it in a special execute-only memory, which allows the stored instructions only to be executed and forbids any other manipulation [Lie00]. Encrypted buses protect data from leakage during transfers between a processor and its memory [Bes81, Kuh97]. Decay caches can protect against side-channel attacks by preventing cache information leakage [Ker08].
Anti-tampering techniques protect against attacks after system deployment. However, new business environments can render embedded systems insecure even before deployment, since hardware Trojans can be planted during the design and manufacturing phases [Jin10].
Embedded and cyber-physical systems are widespread and have attracted a wide range of attacks [Rav04]. Defending against them requires a combination of software and hardware techniques in order to cover all potential attacks. This is especially important in emerging cyber-physical and IoT systems, which include operating systems or specialized middleware. More complex programmable systems require methods such as secure booting [Arb97] to establish system integrity, process isolation and process-level attestation techniques [Mic11] to protect running processes, and secure techniques for context switching, exception handling, inter-process communication, and memory management [Lie03, Gar03].
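The idea behind secure booting, in the spirit of [Arb97], can be sketched as a chain in which each stage is checked against a trusted reference digest before control is transferred to it. The sketch assumes that the expected SHA-256 digests live in tamper-resistant storage (here a plain dictionary); the stage names and images are hypothetical, and real schemes typically verify signatures rather than a fixed digest table.

```python
import hashlib

def digest(image: bytes) -> str:
    """SHA-256 measurement of a boot image."""
    return hashlib.sha256(image).hexdigest()

def secure_boot(stages, trusted_digests):
    """Verify each boot stage before starting it, halting at the first mismatch.

    stages: (name, image) pairs in boot order.
    trusted_digests: name -> expected SHA-256 hex digest; assumed to be held in
    read-only or tamper-resistant storage, anchored in an immutable first stage.
    Returns the names of the stages that were allowed to run.
    """
    started = []
    for name, image in stages:
        if digest(image) != trusted_digests.get(name):
            print(f"integrity check failed for {name}; halting boot")
            break
        started.append(name)  # a real loader would transfer control here
    return started

images = [("bootloader", b"bl v1"), ("kernel", b"kernel v5"), ("app", b"sensor app v2")]
trusted = {name: digest(image) for name, image in images}
print(secure_boot(images, trusted))  # ['bootloader', 'kernel', 'app']
tampered = [images[0], ("kernel", b"kernel v5 + rootkit"), images[2]]
print(secure_boot(tampered, trusted))  # ['bootloader'], after reporting the kernel
```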
Overall, the increasing programmability of these systems requires appropriate soft-
ware security techniques. Software techniques also offer a cost advantage over
hardware techniques. Furthermore, the combination of software techniques with
When the key set sizes are chosen appropriately, all end points of a network can, with high probability, communicate securely [Cha03].
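The role of the key set size can be quantified under the basic random key predistribution model (due to Eschenauer and Gligor, on which the schemes of [Cha03] build): each node stores k keys drawn at random, without replacement, from a pool of P keys, and two neighbors can communicate securely and directly only if their key rings overlap. The probability that two rings share no key is C(P-k, k)/C(P, k), so the probability of sharing at least one key is its complement. The function names and the pool size below are illustrative assumptions.

```python
from math import comb

def share_probability(pool_size: int, ring_size: int) -> float:
    """Probability that two random key rings of ring_size keys, drawn without
    replacement from pool_size keys, have at least one key in common."""
    no_shared = comb(pool_size - ring_size, ring_size) / comb(pool_size, ring_size)
    return 1.0 - no_shared

def min_ring_size(pool_size: int, target: float) -> int:
    """Smallest ring size whose key-sharing probability reaches the target."""
    for k in range(1, pool_size + 1):
        if share_probability(pool_size, k) >= target:
            return k
    raise ValueError("target not reachable")

P = 10_000  # hypothetical key pool size
k = min_ring_size(P, 0.5)
print(k, round(share_probability(P, k), 3))
```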
Networked systems, especially through the Internet, need to ensure that data are
being communicated only among authorized users and processes and that these
exchanged data are “legal.” This is usually achieved through the use of firewalls,
which are typically implemented at the network and application layers, in end point
systems or in the network infrastructure [Bol95]. IoT systems typically have very well-defined communication needs; thus, firewalls can easily be configured to allow only the limited types of legitimate communication. The decision about
where the firewall should be implemented, i.e., at the network or application layer,
at the end system, or in the network, depends on the end point system, the network
and their available resources, as well as on the network topology. For example, ad
hoc networks need protection at the node level, while more centralized systems can
rely more on network level protection [Sli02].
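Because the legitimate traffic of an IoT device can usually be enumerated, its firewall policy can be written as a short default-deny allowlist. The sketch below is a simplified packet-filter check with assumed rule fields; the addresses come from documentation ranges, and the policy is hypothetical, not a real device configuration or the rule syntax of any real firewall.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass(frozen=True)
class Rule:
    direction: str   # "in" or "out"
    protocol: str    # "tcp" or "udp"
    remote_net: str  # CIDR block of the permitted peer(s)
    port: int        # remote port for outbound, local port for inbound

def allowed(rules, direction: str, protocol: str, remote_ip: str, port: int) -> bool:
    """Default deny: a packet passes only if some rule explicitly matches it."""
    addr = ip_address(remote_ip)
    return any(
        r.direction == direction and r.protocol == protocol and r.port == port
        and addr in ip_network(r.remote_net)
        for r in rules
    )

# Hypothetical policy: a sensor that only publishes to one MQTT broker over TLS
# and accepts management connections from one operator subnet.
SENSOR_POLICY = [
    Rule("out", "tcp", "203.0.113.10/32", 8883),  # MQTT over TLS to the broker
    Rule("in", "tcp", "192.168.10.0/24", 22),     # SSH from the operator subnet only
]

print(allowed(SENSOR_POLICY, "out", "tcp", "203.0.113.10", 8883))  # True
print(allowed(SENSOR_POLICY, "in", "tcp", "198.51.100.7", 22))     # False
```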
Denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks are a significant threat, both against IoT systems and launched from compromised IoT systems. DoS attacks overload resources of the targeted system, such as processor, memory, and network, in order to prevent it from performing its intended functionality or serving its users.
In general, there are two basic types of DoS attacks [Hus03]. The first type exploits vulnerabilities, hardware or software, by sending carefully constructed packets to the target system; the typical goal is to crash the target system. Often, such vulnerabilities can be exploited because systems are not patched. This makes IoT systems especially vulnerable to these attacks, because many IoT systems are not configured to update their software automatically, and many users are not sufficiently aware of the risks and of the actions they need to take to protect their systems.
In the second type of attack, distributed denial-of-service (DDoS), a large population of compromised systems creates vast amounts of network traffic toward a victim system, where it is mixed with legitimate traffic. The aggregate arriving traffic overloads the target system's resources and renders it incapable of serving its legitimate users. The 2016 Mirai botnet attack [New16] clearly demonstrated that IoT devices are vulnerable to malware injection and can be used effectively to launch DDoS attacks; in the Mirai case, they attacked an Internet directory (DNS) service, causing significant, costly, and widespread disruptions to Internet connectivity.
DDoS attacks are difficult to stop because they exploit shared network services
that are accessed by all systems connected to a network. The Internet Protocol (IPv4, and IPv6 as well) does not authenticate the source address field, so systems can send IP packets with arbitrary source addresses, making it difficult to identify the sources of offending IP packets in many attacks [Wan07]. Current efforts to defend against DDoS attacks
are usually based on intrusion detection and traceback schemes for detection, filter-
ing, and tracing of an attack [Pen07]. Intrusion detection employs signature- and
anomaly-based detection techniques [Cab01, Wan02], while packet marking [Bel03,
Sav01] and packet logging [Sno02] are used for attack traceback.
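Probabilistic packet marking, one of the traceback techniques cited above [Sav01], can be illustrated with its simplest variant, node sampling: every router on the attack path overwrites a single mark field in the packet with probability p. A mark written by the router d hops from the victim survives only if none of the d-1 routers closer to the victim overwrites it, so it arrives with probability p(1-p)^(d-1), and the victim can order the routers on the path by how often their marks appear. The path, the marking probability, and the packet count are assumptions for the simulation; the more robust edge-sampling variants are not implemented here.

```python
import random
from collections import Counter

def mark_probability(p: float, hops_from_victim: int) -> float:
    """Probability that the victim receives the mark of the router at this distance."""
    return p * (1 - p) ** (hops_from_victim - 1)

def simulate_node_sampling(path, p: float, packets: int, seed: int = 0) -> Counter:
    """Simulate node-sampling marking along a fixed attack path.

    path lists routers in forwarding order, from the attacker side to the victim side.
    Each router overwrites the single mark field with probability p.
    Returns a Counter of the marks seen at the victim (None means unmarked).
    """
    rng = random.Random(seed)
    observed = Counter()
    for _ in range(packets):
        mark = None
        for router in path:
            if rng.random() < p:
                mark = router  # routers closer to the victim overwrite earlier marks
        observed[mark] += 1
    return observed

def reconstruct_path(observed: Counter):
    """Order routers from nearest to farthest from the victim by mark frequency."""
    routers = [r for r in observed if r is not None]
    return sorted(routers, key=lambda r: observed[r], reverse=True)

path = ["R1", "R2", "R3", "R4", "R5"]  # hypothetical; R5 is adjacent to the victim
counts = simulate_node_sampling(path, p=0.2, packets=20_000, seed=1)
print(reconstruct_path(counts))  # routers ordered nearest-first by mark frequency
```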
upgrading of the IoT systems. There are several approaches to this challenge. In highly hostile application environments, one can limit or prevent the ability to upgrade software components that manage critical system resources. Alternatively, in safer environments, strict access control mechanisms can be used to allow different operators to upgrade different software components. Mobile code transmission may be prevented, while wired code transmission may be allowed when connectivity is within a controlled environment. In general, remote management of systems, especially IoT systems with limited resources, requires a secure architecture that addresses the operational environment as well as the profiles of potential attackers.
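One way to combine the options above is a per-component upgrade policy that checks who requests an upgrade, over which channel, and whether the component is locked in the current deployment environment. Everything in the sketch (component names, operator roles, the "wired"/"wireless" channel labels, and the policy table) is a hypothetical illustration; a real design would also authenticate the operator and verify a signature on the new image.

```python
from dataclasses import dataclass, field

@dataclass
class UpgradePolicy:
    # component -> set of operator roles allowed to upgrade it
    allowed_roles: dict
    # components that must never be upgraded remotely (e.g., in hostile environments)
    locked: set = field(default_factory=set)
    # transmission channels accepted for code updates
    accepted_channels: set = field(default_factory=lambda: {"wired"})

    def authorize(self, component: str, role: str, channel: str) -> bool:
        """Grant an upgrade only if the component, channel, and role all permit it."""
        if component in self.locked:
            return False
        if channel not in self.accepted_channels:
            return False
        return role in self.allowed_roles.get(component, set())

policy = UpgradePolicy(
    allowed_roles={"app": {"vendor", "site_operator"}, "kernel": {"vendor"}},
    locked={"bootloader"},
)
print(policy.authorize("app", "site_operator", "wired"))     # True
print(policy.authorize("kernel", "site_operator", "wired"))  # False: role not allowed
print(policy.authorize("app", "vendor", "wireless"))         # False: channel refused
print(policy.authorize("bootloader", "vendor", "wired"))     # False: component locked
```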
of developing the application programs as one where the application designers provide the specifications of the application, including the safety properties, and the software is then developed to meet these specifications and to be free of vulnerabilities. In this fashion, the safety and security problem becomes a verification and monitoring problem: first, verifying that the produced application software meets the specified requirements, and second, monitoring the execution of the verified program to ensure that it is not altered and executes as expected, based on the specification.
This approach is a behavioral approach to safety and security, since it is based on
the specification of the application process. In this context, application behavior is
defined by the executable specification that is the starting point of the approach, and
this is the way the term is used in the remainder of this text.
Run-time monitoring systems for security can be classified based on two parame-
ters: (i) the method that describes the behavior, i.e., profile based or model based,
and (ii) the method that compares the behaviors, i.e., matching to bad behavior or
deviation from good behavior. This leads to a classification with four classes, as
shown in Fig. 6.5.
Profile-based approaches monitor parameters of the observed system and build a profile of system operation. Class 1 monitoring systems, which detect attacks by matching bad behavior, typically use statistical and machine learning methods to build profiles of bad behavior and statistical profiles of attacks [Hod04, Val00]. They are more robust than model-based systems (Class 2 systems), because machine learning typically generalizes from the collected data, but they suffer from high false alarm rates, and they do not provide rich diagnostic information when an alarm is raised. Systems in Class 3, which detect deviations from good behavior, usually build a statistical profile of good behavior and detect deviations from it [Kim04, Lak05]. These systems are more robust than those in Class 1, because they do not depend on any past information about attacks; they raise alarms when new attacks are launched, because all deviations from good behavior are detected. However, they provide limited diagnostic information, i.e., only that something extraordinary has happened, and they also suffer from high false alarm rates, because a deviation need not be malicious or accidental: it may be normal behavior that simply falls outside the statistically accepted profile.
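A Class 3 monitor can be sketched with the simplest possible statistical profile: the mean and standard deviation of one observed quantity during normal operation, with an alarm for values more than a few standard deviations away. The metric (messages per minute), the training data, and the 3-sigma threshold are assumptions for illustration; the cited systems use far richer profiles. The example shows both properties discussed above: a never-seen flood is flagged without any attack signature, and so is a legitimate but unusual burst, which is a false alarm that the monitor cannot explain.

```python
from statistics import mean, stdev

class ProfileMonitor:
    """Class 3 style monitor: statistical profile of good behavior, alarm on deviation."""

    def __init__(self, training, threshold: float = 3.0):
        self.mu = mean(training)
        self.sigma = stdev(training)
        self.threshold = threshold  # alarm if more than this many std devs away

    def check(self, value: float) -> bool:
        """Return True (alarm) if the value deviates from the learned profile."""
        return abs(value - self.mu) > self.threshold * self.sigma

normal_rates = [10, 12, 11, 9, 10, 11, 10, 12, 9, 11]  # messages/minute, assumed
monitor = ProfileMonitor(normal_rates)
print(monitor.check(11))   # False: within the profile
print(monitor.check(200))  # True: flood detected without any attack signature
print(monitor.check(15))   # True: a legitimate burst (e.g., a log upload) is a false alarm
```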
Model-based monitoring systems, Class 2 and Class 4 systems, use a model of the behavior of the monitored system. Such systems are popular in highly secure environments, where successful attacks have a high cost. Because they use a behavioral model of the observed system, these monitors provide rich diagnostic information when alarms are raised, in contrast to profile-based monitors. Despite this rich information, however, Class 2 monitors are limited because they can detect only known attacks: their models of bad behavior can describe only attacks that already exist and are known [Pax99, Roe99]. Signature-based systems are typical examples of this class. Class 4 monitors detect deviations from a model of good behavior [Wat07, Gol07] and thus provide even richer diagnostic information, because they have adequate knowledge of the exact problem, e.g., the exact instruction that led to a detected deviation. However, the execution overhead of the good-behavior models limits run-time system performance.
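The difference between Class 2 and Class 4 monitors can be shown on a trace of controller states. In the sketch below, which uses hypothetical states, transitions, and attack names, the Class 2 monitor knows one attack signature and misses anything else, while the Class 4 monitor encodes the allowed transitions of good behavior and reports the exact step and transition at which a trace leaves that model, including for an attack it has never seen.

```python
# Hypothetical states of a pump controller (not from the original text).
GOOD_MODEL = {  # Class 4: allowed transitions, i.e., a model of good behavior
    "IDLE": {"IDLE", "FILLING", "DRAINING"},
    "FILLING": {"FILLING", "IDLE"},
    "DRAINING": {"DRAINING", "IDLE"},
}
KNOWN_ATTACKS = {  # Class 2: signatures of known bad behavior
    ("DRAINING", "FILLING"): "pump-reversal attack",
}

def signature_monitor(trace):
    """Class 2: alarm only on transitions listed in the signature database."""
    for step, transition in enumerate(zip(trace, trace[1:]), start=1):
        if transition in KNOWN_ATTACKS:
            return f"known attack '{KNOWN_ATTACKS[transition]}' at step {step}"
    return None  # anything not in the database goes unnoticed

def model_monitor(trace):
    """Class 4: alarm on the first transition the good-behavior model does not allow."""
    for step, (prev, cur) in enumerate(zip(trace, trace[1:]), start=1):
        if cur not in GOOD_MODEL.get(prev, set()):
            return f"deviation at step {step}: {prev} -> {cur} is not allowed"
    return None

known = ["IDLE", "DRAINING", "FILLING"]
novel = ["IDLE", "FILLING", "DRAINING"]
print(signature_monitor(known))  # known attack 'pump-reversal attack' at step 2
print(model_monitor(known))      # deviation at step 2: DRAINING -> FILLING is not allowed
print(signature_monitor(novel))  # None
print(model_monitor(novel))      # deviation at step 2: FILLING -> DRAINING is not allowed
```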
A promising approach that addresses safety and security in a unified way in IoT
systems and cyber-physical systems is the ARMET approach [Kha17]. ARMET is
based on three basic concepts: (i) we can build secure-by-design systems, (ii) we
can monitor these systems at run-time for correct operation to detect attacks or fail-
ures, and (iii) when there is a failure or an attack, we can have plans to recover,
depending on the problem and how much information we have about it. ARMET
has been developed focusing on industrial control systems, but it is applicable to
other IoT systems as well, since their software complexity is comparable to that of
industrial control systems.
With the ARMET approach, an IoT application is developed starting from an
executable specification which is provably consistent with the safety properties set
for the application. From this executable specification, the application code is
derived. Given the executable application specification and the application code for
the target system, ARMET monitors the behavior of an application while it exe-
cutes, by comparing its observed behavior to the expected behavior based on the
application’s specification; to achieve this, a middleware executes the executable
specification in parallel with the application execution on the IoT system and calcu-
lates predictions of the application’s behavior. Figure 6.6 shows the structure of the
ARMET middleware system, which is composed of several components: (i) the
run-time security monitor, (ii) the diagnosis module, (iii) the recovery module, (iv)
the trust model, (v) the adaptive method selection module, and (vi) the backup mod-
ule. The run-time security monitor is a critical component of the middleware, which
takes as input the executable specification of the application and the state of the
system that executes the code of the application. The monitor observes the behavior
of the application execution, and, in parallel, it predicts the state of the application
execution by executing its specification; the specification execution defines the
expected “good behavior” of the application and, optionally, known “bad behavior”
of the application that includes known attacks. Comparing the predictions with the
observations, the monitor can detect deviations that indicate a failure of the
Fig. 6.6 Structure of the ARMET middleware (inputs: application specification and application code)
Fig. 6.7 Water tank of height h with in_pump, out_pump, and water height sensor wh
executable specification can produce automatically executable code for the target
system, IoT or industrial.
The ARMET run-time security monitor (RSM) successfully identifies inconsistencies between predictions, produced by executing the specification, and observations of the application code execution, thanks to its executable specification language [Kha15]; importantly, the predictions are generated automatically. RSM is the first run-time security monitor to be formally proven sound and complete [Kha15]; the proof implies that the monitor is also free of false alarms, an important property in practical systems, where investigating false alarms wastes resources. Importantly, ARMET's specification language allows the specification of faulty behaviors as well as attack plans, which the monitoring system can use for threat detection.
The ARMET approach is based on the concept that a system can be specified
with an executable specification. Based on an appropriate functional specification
for a system, one can express the safety and security properties that the system
should meet as conditions of the specification and include them in the specification
as well. As an example, let us consider the case of a water tank which has a height
h, as shown in Fig. 6.7, and two pumps that are controlled, one for filling the tank
with water, denoted in_pump, and one, out_pump, for draining the water out; each
of the two pumps has only two possible states, i.e., open or closed. Furthermore, we
assume that there is a sensor that measures the water height, denoted wh, in the tank.
We want to have a water management system, where a user issues commands to pour water into or drain water from the tank. For simplicity, we consider that a user can perform three actions, FILL, DRAIN, or NOTHING, and that the system operates in cycles, synchronously with a clock. So, during every cycle (clock tick), one action can be performed. A FILL action implies that in_pump opens, out_pump closes, and for this one time unit water is poured into the tank. A DRAIN action means that in_pump closes, out_pump opens, and for this one time unit water drains out of the tank. When the action is NOTHING, both pumps are closed and the state of the tank remains the same. In an environment like this, an obvious safety property is that the tank must not overflow under any conditions.
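The cycle semantics described above can be written down directly. The sketch assumes that each FILL or DRAIN changes the level by one unit per clock tick, that the tank height is 10 units (values that match the constants of the specification in Fig. 6.8), and that the level cannot drop below zero; the function names are hypothetical. It shows that an unguarded sequence of commands can violate the safety property.

```python
from enum import Enum

class Action(Enum):
    FILL = "FILL"
    DRAIN = "DRAIN"
    NOTHING = "NOTHING"

TANK_MAX = 10    # tank height h, in level units
FILL_RATE = 1    # level units added per tick while in_pump is open
DRAIN_RATE = 1   # level units removed per tick while out_pump is open

def step(level: int, action: Action):
    """One clock tick: returns (new_level, in_pump_open, out_pump_open)."""
    if action is Action.FILL:
        return level + FILL_RATE, True, False
    if action is Action.DRAIN:
        return max(0, level - DRAIN_RATE), False, True
    return level, False, False  # NOTHING: both pumps closed

def safe(level: int) -> bool:
    """Safety property: the tank never overflows."""
    return level <= TANK_MAX

def run(commands, level: int = 0):
    """Apply commands tick by tick; return the first tick that violates safety, or None."""
    for tick, action in enumerate(commands, start=1):
        level, _, _ = step(level, action)
        if not safe(level):
            return tick
    return None

print(run([Action.FILL] * 12))  # 11: the eleventh FILL overflows a tank of height 10
print(run([Action.FILL] * 10))  # None: the tank is exactly full but has not overflowed
```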
Fig. 6.8 UML executable specification of the water tank:
«Enumeration» Action: FILL, DRAIN, NOTHING
«StereoType» WaterTankSpec
- water_level : Integer := 0
- SENSOR_ACCURACY : Real := 0.01
- FILL_RATE : Integer := 1
- DRAIN_RATE : Integer := 1
- TANK_MAX : Integer := 10
+ readValue (reading : Integer) : void
+ doAction (water_level : Integer) : Action
Figure 6.8 shows one executable specification, written in UML, which implements the three defined actions, assuming that each FILL or DRAIN action takes as a parameter an integer value for the variable water_level, which specifies the target height of the water that the user wants to obtain; furthermore, the specification ensures that the water tank never overflows. In the specification, the three actions are defined in the enumeration Action; SENSOR_ACCURACY defines the measurement accuracy of the sensor that reads the water level in the tank, FILL_RATE is the incoming water rate through in_pump, and DRAIN_RATE is the rate of the outgoing water when out_pump opens. TANK_MAX is the height h of the tank.
When the user issues an action, the system first takes a reading of the water level with the sensor, as specified in readValue, and determines whether the target water height differs from the measured height by more than the sensor's accuracy. If the target height is different, the corresponding action is performed, pouring water in or draining water out until the target height is reached. The safety property is enforced by the precondition expressed in doAction(), which ensures that a FILL action is performed only when its result leads to a water height that is less than or equal to TANK_MAX.
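A possible rendering of the Fig. 6.8 specification in executable form is sketched below. It is an interpretation, not ARMET's specification language: SENSOR_ACCURACY is treated as an absolute tolerance, readValue() simply records the latest reading, the doAction() parameter is renamed target_level to avoid clashing with the water_level attribute, and doAction() returns one action per cycle, so reaching the target over several cycles is an assumption about the control loop.

```python
from enum import Enum

class Action(Enum):
    FILL = "FILL"
    DRAIN = "DRAIN"
    NOTHING = "NOTHING"

class WaterTankSpec:
    """Sketch of the Fig. 6.8 specification; method names follow the UML diagram."""

    SENSOR_ACCURACY = 0.01  # interpreted here as an absolute tolerance (assumption)
    FILL_RATE = 1
    DRAIN_RATE = 1
    TANK_MAX = 10

    def __init__(self):
        self.water_level = 0  # last measured level

    def readValue(self, reading: int) -> None:
        """Record the sensor's current reading of the water level."""
        self.water_level = reading

    def doAction(self, target_level: int) -> Action:
        """Return this cycle's action toward target_level without ever overflowing."""
        diff = target_level - self.water_level
        if abs(diff) <= self.SENSOR_ACCURACY:
            return Action.NOTHING  # already at the target within sensor accuracy
        if diff > 0:
            # Precondition of FILL: the resulting level must not exceed TANK_MAX.
            if self.water_level + self.FILL_RATE > self.TANK_MAX:
                return Action.NOTHING
            return Action.FILL
        return Action.DRAIN

# Closed-loop usage with a trivially simulated tank.
spec = WaterTankSpec()
level = 0
for _ in range(15):
    spec.readValue(level)
    action = spec.doAction(12)  # the user asks for more than the tank can hold
    if action is Action.FILL:
        level += WaterTankSpec.FILL_RATE
    elif action is Action.DRAIN:
        level -= WaterTankSpec.DRAIN_RATE
print(level)  # 10: the precondition stops filling at TANK_MAX
```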
Since RSM has been proven sound and complete, it detects all computational attacks on the application, i.e., any attack that influences the execution of the application and leads to wrong calculations. This has been confirmed with several computational attacks [Kha17]. Importantly, RSM also captures a wide range of false data injection attacks. For example, if an attacker who wants to overflow the water tank of the example alters the sensor reading to a lower value, so that more water is poured in, RSM will identify the attack, because the execution of the specification will calculate a different water level from the one measured by the sensor. The difference between the expected and the measured water level is detected as a deviation; RSM raises an alarm and, eventually, causes the action to be stopped. Although some complex false data injection attacks are not detected by RSM, its detection of common attacks, combined with the proof that it detects all computational attacks, makes the ARMET behavioral approach a powerful tool for protecting processes and applications in the IoT space.
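The detection of the sensor-spoofing attack can be sketched as a monitor that runs the specification's model of the tank alongside the real system: it predicts the next water level from its own previous prediction and the action actually taken, and raises an alarm when the sensor reading disagrees with the prediction by more than the sensor accuracy. This is a simplified illustration of the prediction-versus-observation comparison, not the ARMET RSM itself; the rates, the tolerance, and the attack trace are assumptions.

```python
FILL_RATE = 1
DRAIN_RATE = 1
SENSOR_ACCURACY = 0.01  # assumed absolute tolerance, in level units

def predict(level: float, action: str) -> float:
    """Expected level after one cycle, according to the specification."""
    if action == "FILL":
        return level + FILL_RATE
    if action == "DRAIN":
        return level - DRAIN_RATE
    return level

def monitor(initial_level: float, actions, readings):
    """Compare predicted levels with sensor readings cycle by cycle.

    Returns the first cycle (1-based) whose reading deviates from the prediction,
    or None if all observations are consistent with the specification.
    """
    predicted = initial_level
    for cycle, (action, reading) in enumerate(zip(actions, readings), start=1):
        predicted = predict(predicted, action)
        if abs(reading - predicted) > SENSOR_ACCURACY:
            return cycle
    return None

actions = ["FILL"] * 4
print(monitor(0, actions, [1, 2, 3, 4]))  # None: observations match the specification
print(monitor(0, actions, [1, 2, 1, 2]))  # 3: spoofed low reading detected at cycle 3
```

An attacker who could also falsify the record of actions consistently with the readings would evade this simple check, which mirrors the caveat above about complex false data injection attacks.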
Privacy protection is one of the most significant challenges in IoT systems because
of the legal requirements in many application domains such as home environments,
smart grids, and health systems. There are increasing restrictions and constraints on
the collection, storage, and processing of personal information involved in all appli-
cations, including IoT. Privacy protection solutions may need to integrate a range of
methods and techniques, such as time-limited storage of sensitive information,
access control systems to enable access only for authorized personnel, accounting
systems to enable auditing, etc. The burden of complying with the required policies and laws is further increased by the growing amount of information considered personal or private, which leads to a need for adaptive and scalable solutions that accommodate new policies as the relevant legal requirements emerge [Mul06]. The ARMET approach provides a powerful solution to the problem of privacy protection when privacy protection is viewed as a safety property. Privacy protection originates from legal requirements that can be expressed as conditions in an information system, i.e., they can be expressed as preconditions, postconditions, or
References
[AES01] NIST. (2001). Advanced Encryption Standard. FIPS Publication 197, November 26,
2001.
[Ana15] Anand, A., & Knepper, R. (2015). ROSCoq: Robots powered by constructive reals.
In Proceedings of the 2015 International Conference on Interactive Theorem Proving (pp.
34–50). Springer LNCS-9236.
[And96] Anderson, R., & Kuhn, M. (1996). Tamper resistance: A cautionary note. In Proceedings
of the 2nd Workshop on Electronic Commerce, USENIX Association, Berkeley, CA, 1996,
pp. 1–11.
[Arb97] Arbaugh, W., Farber, D., & Smith, J. (1997). A secure and reliable bootstrap architec-
ture. In Proceedings of the IEEE Symposium on Security and Privacy, 1997, pp. 65–71.
[ARM05] ARM Security Technology. (2005). Building a Secure System using TrustZone Technology. ARM white paper, Document PRD29-GENC-009492C, 2005. http://infocenter.arm.com/help/topic/com.arm.doc.prd29-genc-009492c/PRD29-GENC-009492C_trustzone_security_whitepaper.pdf
[Bar06] Bar-El, H., Choukri, H., Naccache, D., Tunstall, M., & Whelan, C. (2006). The sor-
cerer’s apprentice guide to fault attacks. Proceedings of the IEEE, 94(2), 370–382.
[Bel03] Belenky, A., & Ansari, N. (2003). IP traceback with deterministic packet marking. IEEE Communications Letters, 7(4), 162–164.
[Ber04] Bertot, Y., & Castéran, P. (2004). Interactive theorem proving and program development: Coq’Art: The calculus of inductive constructions. Berlin Heidelberg: Springer.
[Bes81] Best, R. (1981). Crypto microprocessor for executing enciphered programs. US patent
4,278,837, July 1981.
[Bly93] Blythe, S., Fraboni, B., Lall, S., Ahmed, H., & De Riu, U. (1993). Layout reconstruction
of complex silicon chips. IEEE Journal on Solid-State Circuits, 28(2), 138–145.
[Bol95] Bolding, D. (1995). Network security, filters and firewalls. Crossroads, 2(1), 8–10.
[Cab01] Cabrera, J., Lewis, L., Qin, X., Lee, W., Prasanth, R., Ravichandran, B., & Mehra, R.
(2001). Proactive detection of distributed denial of service attacks using MIB traffic vari-
ables—A feasibility study. In Proceedings of the IEEE/IFIP International Symposium on
Integrated Network Management, pp. 609–622.
[Cha03] Chan, H., Perrig, A., & Song, D. (2003). Random key predistribution schemes for sen-
sor networks. In Proceedings of the IEEE Symposium on Security and Privacy, pp. 197–213.
[Cha16] Chan, M., Ricketts, D., Lerner, S., & Malecha, G. (2016). Formal verification of stabil-
ity properties of cyber-physical systems. In CoqPL’16, Jan 2016.
[Chl96] Chlipala, A. (2016). Ur/web: A simple model for programming the web. Communications
of the ACM, 59(8).
[Cos16] Costan, V., & Devadas, S. (2016). Intel SGX explained. Cryptology ePrint Archive:
Report 2016/086, IACR.
[Del15] Delaware, B., Pit-Claudel, C., Gross, J., & Chlipala, A. (2015). Fiat: Deductive synthesis
of abstract data types in a proof assistant. In Proceedings of the 42nd Annual ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages (POPL’15), Mumbai, India,
Jan. 15–17, 2015, pp. 689–700.
[Dij67] Dijkstra, E. W. (1967). A constructive approach to the problem of program correctness,
August 1967, circulated privately.
[Gar03] Garfinkel, T., Rosenblum, M., & Boneh, D. (2003). Flexible OS support and applica-
tions for trusted computing. In Proceedings of the 9th Conference on Hot Topics in Operating
Systems (Vol. 9, pp. 25–25).
[Gol07] Goldsby, H. J., Cheng, B. H. C., & Zhang, J. (2008). AMOEBA-RT: Run-Time
Verification of Adaptive Software. In Proceedings of Models in Software Engineering
(MODELS 2007), Nashville, TN, USA, September 30–October 5, 2007, LNCS-5002, Springer,
pp. 212–224.
[Hod04] Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial
Intelligence Review, 22(2), 85–126.
[Hus03] Hussain, A., Heidemann, J., & Papadopoulos, C. (2003). A framework for classify-
ing denial of service attacks. In Proceedings of the conference on applications, technologies,
architectures, and protocols for computer communications (pp. 99–110). New York: ACM.
[Jin10] Jin, Y., & Makris, Y. (2010). Hardware Trojans in wireless cryptographic ICs. IEEE
Design and Test, 27(1), 26–35.
[Joy09] Joye, M. (2009). Protecting RSA against fault attacks: The embedding method. In
Proceedings of the Workshop on Fault Diagnosis and Tolerance in Cryptography (FDTC),
pp. 41–45.
[Ker08] Keramidas, G., Antonopoulos, A., Serpanos, D., & Kaxiras, S. (2008). Nondeterministic
caches: A simple and effective defense against side channel attacks. Design Automation of
Embedded Systems, 12(3), 221–230.
[Kha15] Khan, M. T., Serpanos, D., & Shrobe, H. (2015). On the formal semantics of the cogni-
tive middleware AWDRAT. Technical Report MIT-CSAIL-TR-2015-007, Computer Science
and Artificial Intelligence Laboratory, MIT, USA, March 2015.
[Kha17] Khan, M. T., Serpanos, D., & Shrobe, H. ARMET: Behavior-Based Secure and Resilient Industrial Control Systems. In Proceedings of the IEEE, Preprint. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8011473&isnumber=4357935
[Kim04] Kim, S. S., Reddy, A. L. N., & Vannucci, M. (2004). Detecting traffic anomalies through
aggregate analysis of packet header data. In Proceedings of 3rd International IFIP-TC6
Networking Conference (NETWORKING 2004), Athens, Greece, May 9–14, 2004, Springer
LNCS-3042, pp. 1047–1059.
[Koc96] Kocher, P. (1996). Timing attacks on implementations of Diffie-Hellman, RSA, DSS,
and other systems. In Advances in Cryptology – CRYPTO’96. Springer, pp. 104–113.
[Koc99] Kocher, P., Jaffe, J., & Jun, B. (1999). Differential power analysis. In Advances in Cryptology – CRYPTO’99. Springer, pp. 388–397.
[Kuh97] Kuhn, M. (1997). The Trust No1 cryptoprocessor concept. http://www.cl.cam.ac.uk/~mgk25/
[Lak05] Lakhina, A., Crovella, M., & Diot, C. (2005). Mining anomalies using traffic feature distributions. In Proceedings of the 2005 Conference on Applications, Technologies, Architectures and Protocols for Computer Communications (SIGCOMM 2005), Philadelphia, PA, USA, August 22–26, 2005, pp. 217–228.
[Lie03] Lie, D., Thekkath, C., & Horowitz, M. (2003). Implementing an untrusted operating
system on trusted hardware. ACM SIGOPS Operating Systems Review, 37(5), 178–192.
[Lie00] Lie, D., Thekkath, C., Mitchell, M., Lincoln, P., Boneh, D., Mitchell, J., & Horowitz, M.
(2000). Architectural support for copy and tamper resistant software. ACM SIGPLAN Notices,
35(11), 168–177.
[Mal16] Malecha, G., Ricketts, D., Alvarez, M. M., & Lerner, S. (2016). Towards foundational
verification of cyber-physical systems. In Proceedings of 2016 Science of Security for Cyber-
Physical Systems Workshop (SOSCYPS), April 2016, pp. 1–5.
[Mic11] MICROSOFT. (2011). Shared source initiative. http://www.microsoft.com/resources/ngscb/default.mspx
[Mor15] Dworkin, M. J. (2015). SHA-3 Standard: Permutation-based hash and extendable-
output functions. Federal Information Processing Standards (NIST FIPS) – 202, August 04,
2015.
[Mul06] Muller, G. (2006). Special issue: Privacy and security in highly dynamic systems-
introduction. Communications of the ACM, 49(9), 28–31.
[New16] Newman, L. H. (2016). What we know about Friday’s massive east coast internet out-
age. WIRED, October 21, 2016.
[Pax99] Paxson, V. (1999). Bro: A system for detecting network intruders in real-time. Computer
Networks, 31(23–24), 2435–2463.
[Pea02] Pearson, S. (2002). Trusted computing platforms: TCPA technology in context. USA:
Prentice Hall.
[Pen07] Peng, T., Leckie, C., & Ramamohanarao, K. (2007). Survey of network-based defense
mechanisms countering the DoS and DDoS problems. ACM Computing Surveys, 39(1), Article 3.
[Per04] Perrig, A., Stankovic, J., & Wagner, D. (2004). Security in wireless sensor networks.
Communications of the ACM, 47(6), 53–57.
[Qui01] Quisquater, J. J., & Samyde, D. (2001). Electromagnetic analysis (EMA): Measures and
counter-measures for smart cards. In Proceedings of the International Conference on Research
in Smart Cards: Smart Card Programming and Security, Springer LNCS-2140, pp. 200–210.
[Rav04] Ravi, S., Raghunathan, A., Kocher, P., & Hattangady, S. (2004). Security in embed-
ded systems: Design challenges. ACM Transactions on Embedded Computing Systems, 3(3),
461–491.
[Roe99] Roesch, M. (1999). Snort – lightweight intrusion detection for networks. In Proceedings
of the 13th USENIX Conference on System Administration (LISA ‘99), pp. 229–238.
[RSA78] Rivest, R. L., Shamir, A., & Adleman, L. (Feb. 1978). A method for obtaining digital
signatures and public-key cryptosystems. Communications of the ACM, 21(2), 120–126.
[Sav01] Savage, S., Wetherall, D., Karlin, A., & Anderson, T. (2001). Network support for IP
traceback. IEEE/ACM Transactions on Networking, 9(3), 226–237.
[Ser08] Serpanos, D., & Henkel, J. (2008). Dependability and security will change embedded
computing. Computer, 41(1), 103–105.
[Ser13] Serpanos, D. N., & Voyiatzis, A. G. (2013). Security challenges in embedded systems.
ACM Transactions on Embedded Computing Systems, 12(1s), Article 66.
[Sie82] Siewiorek, D., & Swarz, R. (1982). The theory and practice of reliable system design.
Bedford: Digital Press.
[Sli02] Slijepcevic, S., Potkonjak, M., Tsiatsis, V., Zimbeck, S., & Srivastava, M. (2002). On
communication security in wireless ad-hoc sensor networks. In Proceedings of the 11th IEEE
International Workshop on Enabling Technologies, pp. 139–144.
[Sno02] Snoeren, A., Partridge, C., Sanchez, L., Jones, C., Tchakountio, F., Schwartz, B., Kent,
S., & Strayer, W. (2002). Single-packet IP traceback. IEEE/ACM Transactions on Networking,
10(6), 721–734.
[Val00] Valdes, A., & Skinner, K. (2000). Adaptive, model-based monitoring for Cyber Attack
Detection. In Proceedings of the 3rd International Workshop on Recent Advances in Intrusion
Detection (RAID 2000), Toulouse, France, October 2–4, 2000, Springer, pp. 80–93.
[Wan07] Wang, H., Jin, C., & Shin, K. (2007). Defense against spoofed IP traffic using hop-count
filtering. IEEE/ACM Transactions on Networking, 15(1), 40–53.
[Wan02] Wang, H., Zhang, D., & Shin, K. (2002). Detecting SYN flooding attacks. In Proceedings
of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies
(INFOCOM’02), pp. 1530–1539.
[Wat07] Watterson, C., & Heffernan, D. (2007). Runtime verification and monitoring of embedded systems. IET Software, 1(5), 172–179.
[Yan12] Yang, J., Yessenov, K., & Solar-Lezama, A. (2012). A language for automatically
enforcing privacy policies. In Proceedings of the 39th ACM Symposium on Principles of
Programming Languages (POPL 2012), Philadelphia, PA, USA, January 25–27, 2012,
pp. 85–96.
[Zhu03] Zhu, S., Setia, S., & Jajodia, S. (2003). LEAP: Efficient security mechanisms for large-
scale distributed sensor networks. In Proceedings of the 10th ACM Conference on Computer
and Communications Security, pp. 62–72.
Chapter 7
Security Testing IoT Systems
7.1 Introduction
Successful testing of IoT systems is critical, because many of them have strict requirements that are crucial to their operation, such as meeting real-time constraints, satisfying specific safety properties, and continuing to operate under strained conditions. Furthermore, IoT systems include a communication component, which poses a testing challenge: the specifications of communication protocols often leave parameters undefined, which leads to differing implementations by different vendors; this is why interoperability is an important challenge in communication systems. The criticality of IoT testing, especially for security, becomes more apparent when considering industrial IoT systems, which are now used extensively in critical infrastructures, such as energy networks and water management systems. Successful testing not only confirms the expected operation but also deprives attackers of the tools to cause malfunctions and disruptions; in the emerging environment, even crashing an application or an operation may be more catastrophic than hijacking it.
Hardware and software testing have been the subject of significant effort in industry and academia for decades. A large number of methodologies and tools have been developed, but software testing has proven significantly harder than hardware testing because of characteristics specific to software, such as its evolution through added features and functionality, its fault models, and its lack of re-use. Considering that most IoT systems are built using off-the-shelf hardware components and computing subsystems, in this chapter we address software testing for security and, more specifically, we focus on testing communication protocol implementations, since they are the point of entry to systems and a common target of attackers. We present fuzz testing, the most common testing approach for security, which requires no information about the internal structure of the system under test. As industrial IoT systems are an attractive target for attackers who themselves exploit testing techniques, we use the Modbus protocol as an example and describe fuzz testing techniques for its implementations that have produced successful results against existing protocol implementations in the field.
execution faults – examining addresses, for example – without changing the functionality of the original programs [Cow98]. TaintCheck applies taint analysis for automatic vulnerability analysis without requiring the source code [Cla07, New04]. Dynamic analysis methods provide powerful mechanisms for vulnerability detection at the cost of execution-time overhead, caused by the additional code that is inserted into the application program. Simulation has also been proposed for vulnerability testing: a simulation environment is used to inject faults into a program and check its behavior [Du02]. This is a systematic method, but it is limited to the input patterns that may cause errors.
Fuzz testing (fuzzing) is an alternative, reliable approach that has produced successful results and has advantages over the previous methods. Fuzzing is a testing method that applies test inputs (vectors) to a system under test (SUT) and observes its outputs, as shown in Fig. 7.1. The goal of the fuzzer is to identify faults in the SUT, e.g., to detect inputs that lead to a system crash. The effectiveness of a fuzzer depends on its ability to identify as many vulnerabilities as possible by covering the input value space effectively. If the fuzzer cannot determine whether a system or a program has crashed during a test, its effectiveness cannot be evaluated.
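The basic loop of Fig. 7.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a tool from the literature: the SUT is a hypothetical Python function `parse_packet` with a deliberately planted bug, and a "crash" is modelled as an uncaught exception, which stands in for the observable failure the fuzzer must be able to detect.

```python
import random
from typing import Callable, List, Optional


def parse_packet(data: bytes) -> int:
    """Hypothetical SUT: reads a 1-byte length field and sums that many payload bytes.

    Planted bug: it trusts the length field, so a length larger than the
    actual payload raises IndexError (our stand-in for a crash).
    """
    if not data:
        return 0
    length = data[0]
    payload = data[1:]
    return sum(payload[i] for i in range(length))


def fuzz(sut: Callable[[bytes], int], trials: int, max_len: int = 8,
         seed: Optional[int] = 0) -> List[bytes]:
    """Apply random test vectors to the SUT and collect those that crash it."""
    rng = random.Random(seed)
    crashing: List[bytes] = []
    for _ in range(trials):
        vector = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len + 1)))
        try:
            sut(vector)            # a normal return counts as a pass
        except Exception:          # an uncaught exception counts as a crash
            crashing.append(vector)
    return crashing


crashes = fuzz(parse_packet, trials=1000)
print(len(crashes), "crashing inputs")
```

Without the `try`/`except` oracle the loop would still run, but it could not tell a pass from a failure, which is exactly the evaluation problem described above.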
Fuzzing has several advantages over static and dynamic analysis. First, it can be applied to programs whose source code is not available. Second, it is independent of the internal complexity of the tested software, which in practice limits other methods, such as static analysis. Because of this independence, the same fuzzing tool can be used to test similar programs regardless of the programming language in which they are written. Finally, the identified faults and errors can be directly associated with the user input and are therefore easier to evaluate.
Fuzz testing has its limitations. The space of input values is vast, and thus it is impossible to test large systems for all their potential input values within reasonable time frames. A fuzzer that produces random input values can discover faults and vulnerabilities, but, in general, it will miss many important vulnerabilities unless it follows a specific strategy. Its effectiveness depends on its ability to identify representative input values, which may originate from attacks or from common errors with invalid inputs, and to detect vulnerabilities that are useful to attackers.
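To make the scale concrete, consider exhaustively testing a single 32-bit field at an assumed rate of 1,000 test vectors per second (the rate is an illustrative assumption, not a measured figure), and then a complete 12-byte Modbus/TCP read request (a 7-byte header plus a 5-byte request):

$$
\frac{2^{32}\ \text{vectors}}{10^{3}\ \text{vectors/s}} \approx 4.29 \times 10^{6}\ \text{s} \approx 49.7\ \text{days},
\qquad
\frac{2^{96}\ \text{vectors}}{10^{3}\ \text{vectors/s}} \approx 7.9 \times 10^{25}\ \text{s} \approx 2.5 \times 10^{18}\ \text{years}.
$$

Even one field is already impractical to test exhaustively, which is why the selection of representative values matters more than raw test throughput.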
[Fig. 7.1: Tester/Fuzzer, System-Under-Test (SUT), Outputs]
Fuzz testing can be classified into three categories, depending on the information that is available about the system under test (SUT) [Tak08, Sut07], as shown in Fig. 7.1:
• White-box testing: the source code or the specification of the SUT is known.
• Black-box testing: the internal structure of the SUT is unknown; testing is limited to observations of SUT inputs and outputs.
• Grey-box testing: partial information for the SUT internal structure is available,
e.g., through reverse engineering or static analysis results.
Modern white-box fuzz testing tools exploit information about the system's internal structure, using symbolic execution or taint analysis techniques to identify vulnerabilities. Symbolic execution replaces concrete input values with symbolic values and propagates them through the source code or the program flow, in order to evaluate code execution paths [Cad13]. These techniques have been explored widely in efforts such as DART [God05], SAGE [God12], EXE [Cad06], and KLEE [Cad08]. Tools like AEG [Avg11] and CRAX [Hua12] combine symbolic execution with concrete execution, employing concolic testing [Sen05] to identify vulnerabilities that lead to control flow hijacking. Such tools have been very successful in fuzz testing Windows and Linux applications [God12, Cad06]. Since they use the source code, these techniques have the advantage that they can, in principle, explore all possible modes of an application and identify dead code. However, they cannot identify logic errors in programs, and they are unable to explore all execution paths in large programs with complex structures. Tools that use taint analysis identify potential attack points in programs by tracing tainted values and then fuzz the input values that reach these attack points [Sch10]. BuzzFuzz [Gan09] and TaintScope [Wan10] are two representative tools that exploit taint analysis techniques.
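The idea behind taint analysis can be illustrated with a toy dynamic taint tracker. In the sketch below (a hypothetical illustration, not the mechanism of BuzzFuzz or TaintScope), values derived from program input carry a taint flag that propagates through arithmetic; when a tainted value reaches a sensitive operation, here a table index, the site is recorded as a potential attack point on which a fuzzer should concentrate.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Val:
    """An integer annotated with a taint flag."""
    v: int
    tainted: bool = False

    def __add__(self, other: Union["Val", int]) -> "Val":
        o = other if isinstance(other, Val) else Val(other)
        # Taint propagates: the result is tainted if either operand is.
        return Val(self.v + o.v, self.tainted or o.tainted)

    def __mul__(self, other: Union["Val", int]) -> "Val":
        o = other if isinstance(other, Val) else Val(other)
        return Val(self.v * o.v, self.tainted or o.tainted)


@dataclass
class TaintMonitor:
    """Records sensitive operations (sinks) reached by tainted data."""
    attack_points: List[str] = field(default_factory=list)

    def index(self, site: str, buf: List[int], idx: Val) -> int:
        if idx.tainted:
            self.attack_points.append(site)
        return buf[idx.v % len(buf)]  # modulo keeps this toy program from crashing


def program(user_input: List[int], mon: TaintMonitor) -> int:
    """Toy program under analysis: mixes input-derived and constant values."""
    table = list(range(16))
    a = Val(user_input[0], tainted=True)   # source: value comes from program input
    b = Val(3)                             # constant: untainted
    x = mon.index("table[b*2]", table, b * 2)
    y = mon.index("table[a+1]", table, a + 1)
    return x + y


mon = TaintMonitor()
program([5], mon)
print(mon.attack_points)  # ['table[a+1]']
```

Only the index computed from the input is reported, so a fuzzer guided by this result would vary `user_input[0]` and leave everything else alone.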
Black-box fuzzing techniques have no structural information about the system under test. Since testing requires applying inputs to the system and observing its outputs, one of the most popular targets of black-box fuzzing is the implementation of communication protocols, because protocols provide the first point of entry to systems and typically implement some standard; thus, our description focuses on protocols, although the techniques can be applied to application and system software in general.
There are two main approaches to generating fuzz testing inputs for protocols: (i) data generation and (ii) data mutation [Nal12, Tak08, Sut07]. Data generation techniques create input packets for a protocol implementation either randomly or with a systematic method that takes into account the specifications of the specific protocol. The contents of these packets may be completely random, or they may take into account the structure of the packets, i.e., their fields, and insert either random or special values in the fields, depending on various parameters, such as the system interface or a specific targeted operation. In the latter case, the specification of the protocol needs to be integrated into the fuzzer. Clearly, the effectiveness of the fuzzing process depends on the successful integration of the protocol specification into the fuzzer, since any problem in that integration may lead to limited or no coverage of a wide range of tests.
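A generation fuzzer that embeds part of the protocol specification builds each test packet field by field. The sketch below assumes a simplified specification of the Modbus "Read Holding Registers" request PDU (function code 3, a 16-bit starting address, and a 16-bit quantity whose legal range is 1 to 125, all big-endian) and fills the fields either with boundary values or with random values; the `SPEC` table and the choice of boundary values are illustrative assumptions.

```python
import itertools
import random
import struct
from typing import Dict, Iterator, List, Tuple

# Simplified, hand-written specification: (field name, struct format, legal range).
SPEC: List[Tuple[str, str, Tuple[int, int]]] = [
    ("function_code", ">B", (3, 3)),
    ("start_address", ">H", (0, 0xFFFF)),
    ("quantity", ">H", (1, 125)),
]


def boundary_values(fmt: str, lo: int, hi: int) -> List[int]:
    """Legal limits, values just outside them, and the limits of the field type."""
    type_max = 0xFF if fmt == ">B" else 0xFFFF
    candidates = {0, lo, hi, lo - 1, hi + 1, type_max}
    return sorted(v for v in candidates if 0 <= v <= type_max)


def build(values: Dict[str, int]) -> bytes:
    """Serialize field values in specification order."""
    return b"".join(struct.pack(fmt, values[name]) for name, fmt, _ in SPEC)


def generate_boundary() -> Iterator[bytes]:
    """Every combination of the boundary values of all fields."""
    per_field = [boundary_values(fmt, lo, hi) for _, fmt, (lo, hi) in SPEC]
    for combo in itertools.product(*per_field):
        yield build({name: v for (name, _, _), v in zip(SPEC, combo)})


def generate_random(n: int, seed: int = 0) -> Iterator[bytes]:
    """Random value for each field, drawn over the whole field type."""
    rng = random.Random(seed)
    for _ in range(n):
        yield build({name: rng.randrange(0x100 if fmt == ">B" else 0x10000)
                     for name, fmt, _ in SPEC})


cases = list(generate_boundary())
print(len(cases), cases[0].hex())
```

The boundary generator yields 5 × 2 × 5 = 50 packets; any error in the hand-written `SPEC` would silently remove whole classes of tests, which is the integration risk mentioned above.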
Mutation fuzzing creates test inputs from legal protocol packets. It takes legal packets as input and changes (mutates) some of their data, e.g., specific fields, to create the test packets that are applied to the system. This approach is especially useful when the protocol is complex, because the fuzzer does not construct packets from scratch but mutates known legal packets. Thus, the fuzzer does not need to include the protocol specification, and its author does not need to delve into the details of the protocol, which avoids the risk of misinterpretations and the creation of inappropriate packets.
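A mutation fuzzer needs no specification: it starts from a legal packet, for example one captured from live traffic, and alters some of its bytes. The sketch below uses a hypothetical captured Modbus/TCP read request and three common byte-level mutators (bit flip, substitution with a boundary byte, and truncation); these mutators are illustrative choices, not those of any particular tool.

```python
import random
from typing import Callable, List

# Hypothetical legal packet captured on the network: a Modbus/TCP request
# (transaction 1, unit 1) reading 10 holding registers from address 0.
CAPTURED = bytes.fromhex("00010000000601030000000a")


def flip_bit(pkt: bytes, rng: random.Random) -> bytes:
    buf = bytearray(pkt)
    buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    return bytes(buf)


def set_boundary_byte(pkt: bytes, rng: random.Random) -> bytes:
    buf = bytearray(pkt)
    buf[rng.randrange(len(buf))] = rng.choice([0x00, 0x7F, 0x80, 0xFF])
    return bytes(buf)


def truncate(pkt: bytes, rng: random.Random) -> bytes:
    return pkt[: rng.randrange(1, len(pkt))]  # keep at least one byte


MUTATORS: List[Callable[[bytes, random.Random], bytes]] = [
    flip_bit, set_boundary_byte, truncate,
]


def mutate(pkt: bytes, n: int, seed: int = 0) -> List[bytes]:
    """Derive n test packets from one legal packet."""
    rng = random.Random(seed)
    return [rng.choice(MUTATORS)(pkt, rng) for _ in range(n)]


for case in mutate(CAPTURED, 5):
    print(case.hex())
```

Nothing in the fuzzer knows where the Modbus fields are; that knowledge is implicit in the captured packet, which is both the strength and the limitation of the approach.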
These two main approaches are coupled with techniques that choose the values used in the generated or mutated packets. The most common techniques are:
1. Random: generates random values without any consideration of packet structure, legal values, etc. The technique is fast, low-cost, and quite successful [Mil90, Mil95, Mil06], but it is limited by its low test coverage.
2. Block-based: manages data values in blocks, taking into account the specifications of protocols and creating meaningful blocks of values, in contrast to random values. The technique has been used widely in frameworks and tools, such as Spike [Ait02], SNOOZE [Ban06], Sulley [Ami14], Peach [Pea14], Autodafé [Vua06], and AspFuzz [Kit10], and is especially useful in mutation fuzzing. The success of the technique depends on the successful integration of protocol specifications into the fuzzers.
3. Grammar-based: embeds a grammar in the fuzzer, in order to cover part of the specification of legal inputs to the system under test. Fuzzing inputs are created according to the grammar. PROTOS [PRO] is a representative tool using this technique.
4. Heuristic-based: generates new fuzzing inputs taking into account the effectiveness of the inputs applied in the past. The outputs obtained from prior tests can be processed with various methods, such as genetic algorithms [Spa07] or statistical analysis [Zha11].
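Heuristic-based selection can be sketched as a simple feedback loop in the spirit of the genetic approaches cited above; it is not the algorithm of [Spa07] or [Zha11]. The hypothetical SUT below returns a response class for each input, and inputs that provoke a class not seen before are kept in a corpus and mutated further. A purely random 3-byte input reaches the planted fault with probability 130/256³ ≈ 7.7 × 10⁻⁶, whereas the feedback loop reaches it in stages.

```python
import random
from typing import List, Set


def sut_response(data: bytes) -> str:
    """Hypothetical SUT: each nested check that is passed yields a distinct response."""
    if len(data) < 3 or data[0] != 0x03:
        return "illegal_function"
    if data[1] != 0x00:
        return "illegal_address"
    if data[2] > 0x7D:
        return "crash"  # planted fault hidden behind two checks
    return "ok"


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one random byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def feedback_fuzz(seed_input: bytes, budget: int, seed: int = 0) -> Set[str]:
    rng = random.Random(seed)
    corpus: List[bytes] = [seed_input]
    seen: Set[str] = {sut_response(seed_input)}
    for _ in range(budget):
        candidate = mutate(rng.choice(corpus), rng)
        outcome = sut_response(candidate)
        if outcome not in seen:        # fitness: did this input produce new behaviour?
            seen.add(outcome)
            corpus.append(candidate)   # keep it as a parent for later mutations
    return seen


print(sorted(feedback_fuzz(bytes(3), budget=20000)))
```

The "fitness" here is deliberately crude (novelty of the response class); real heuristic fuzzers use richer measures, but the principle of steering new inputs by past outputs is the same.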
There are also approaches that construct protocol descriptions or specifications by observing real protocol traffic. With this information, the related tools can make better decisions about how to mutate observed packets, increasing the effectiveness of mutation fuzzers. General Purpose Fuzzer (GPF) [Vda14] and AutoFuzz [Gor10] are representative tools that employ this approach. Interestingly, mutation fuzzing can also create test cases based on existing attack traffic [Ant12, Tsa12].
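A very small step in this direction is to compare several captured packets byte by byte and label each offset as constant or variable, so that a mutation fuzzer can concentrate on the variable, likely data-carrying, bytes. The sketch below is a hypothetical simplification with made-up captures; GPF and AutoFuzz build far richer models.

```python
from typing import List


def infer_fields(packets: List[bytes]) -> List[str]:
    """Label each byte offset 'const' or 'var' across captured packets of equal length."""
    if not packets or len({len(p) for p in packets}) != 1:
        raise ValueError("need one or more packets of the same length")
    labels = []
    for offset in range(len(packets[0])):
        values = {p[offset] for p in packets}
        labels.append("const" if len(values) == 1 else "var")
    return labels


# Hypothetical Modbus/TCP read requests observed on the network.
captured = [
    bytes.fromhex("00010000000601030000000a"),
    bytes.fromhex("00020000000601030064000a"),
    bytes.fromhex("000300000006010300c80001"),
]
labels = infer_fields(captured)
mutable_offsets = [i for i, lab in enumerate(labels) if lab == "var"]
print(mutable_offsets)  # [1, 9, 11]
```

Offsets 1, 9, and 11 correspond to the transaction identifier, the low byte of the starting address, and the low byte of the quantity; more traffic would expose more variable bytes.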
Fuzz testing for industrial networks has attracted significant interest in industry and academia, considering the increasing adoption of industrial control systems in critical infrastructures. Many commercial and open-source fuzzing tools support industrial protocols. Sulley [Dev07] has provided fuzzing modules for ICCP, Modbus, and DNP3 since 2007. ProFuzz [Koc], a fuzzing tool based on Scapy [Bio], supports fuzzing of PROFINET. The Achilles test platform [Ach17] supports fuzzing of SCADA protocols, such as Modbus/TCP and DNP3.
There is also research work on fuzzing industrial protocols using various techniques. Black-box mutation fuzzing, for example, has been explored for SCADA networks without any knowledge of the networking protocol [Sha11], using the LZ-Fuzz tool [Bra08] to evaluate its effectiveness. OPC-MFuzzer [Wan13, Qi14] is a mutation fuzzer (based on Peach [Pea14]) for fuzzing OPC in SCADA systems. Using three different mechanisms to produce fuzzing inputs, the tool identified and confirmed known vulnerabilities that had previously been included in the National Vulnerability Database (NVD) [Nis].
Modbus fuzzing has attracted significant attention as well. BlackPeer [Byr06] produces inputs and checks outputs using a grammar that is included in the tool; although successful, it has limited flexibility, as it cannot easily adapt to new tests. Sulley [Dev07], a block-based framework, enables methodical and easy mutation fuzzing through its Modbus module; however, its block-based approach is limited when testing devices that deviate from the standard implementation and are customized by their users. A framework for fuzz testing Modbus for security, based on Scapy, has also been proposed [Kob07].
[Figure: protocol layers Modbus Application Protocol / Modbus Messaging (Mapping) on TCP / TCP]
[Fig. 7.3 Modbus application packet: Function code (FC) | Data]
request the reading of a sensor attached to a slave PLC (server), or it may request
the writing of a command to an actuator to turn a switch.
Modbus application packets are simple, composed of two fields, a function code (FC) and data, as shown in Fig. 7.3. A request from a client carries the function code that defines the operation to be performed and the related data, e.g., an address or a command. A response from a server includes the function code that was executed at the server and the resulting related data. Since an operation may not be executed successfully at the server, the protocol defines that the server responds with the original function code if the related operation is executed correctly, or sends an exception code indicating that the operation was not executed.
Modbus has three different classes of function codes: public, user-defined, and reserved codes. Public codes are defined by the standard, which specifies both their numbering and their operation. Reserved codes are also public, but they cannot be used freely, since they have been reserved for interoperability with legacy industrial control systems. User-defined codes are available to developers and users for implementing specialized functions at will. Since the function code field is 8 bits, it can take 256 values, in the range 0–255. Public codes are in the ranges 1–64, 73–99, and 111–127; these ranges include the reserved codes. User-defined codes may have values in the ranges 65–72 and 100–110. Codes 128–255 are used to indicate errors: each function code has a unique related exception code, which differs from the function code only in the most significant bit; in the 8-bit format, all function codes have “0” as their most significant bit and all exception codes have “1.”
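The code ranges above translate directly into a small classifier, which a fuzzer can use to decide whether an observed code is public, user-defined, or an exception. The sketch below is illustrative; the function names are hypothetical, and it treats 0, which the standard does not assign to any function, as "unassigned".

```python
from typing import Optional

PUBLIC = [(1, 64), (73, 99), (111, 127)]   # these ranges include the reserved codes
USER_DEFINED = [(65, 72), (100, 110)]


def classify(code: int) -> str:
    """Class of an 8-bit value found in the function code field."""
    if not 0 <= code <= 0xFF:
        raise ValueError("the function code field is 8 bits")
    if code >= 0x80:
        return "exception"
    if any(lo <= code <= hi for lo, hi in PUBLIC):
        return "public"
    if any(lo <= code <= hi for lo, hi in USER_DEFINED):
        return "user-defined"
    return "unassigned"  # only 0 falls through


def exception_code(function_code: int) -> int:
    """Exception code = function code with the most significant bit set."""
    if not 1 <= function_code <= 0x7F:
        raise ValueError("function codes lie in 1-127")
    return function_code | 0x80


def function_of(code: int) -> Optional[int]:
    """Recover the original function code from an exception code (None otherwise)."""
    return code & 0x7F if code & 0x80 else None


print(classify(3), classify(65), hex(exception_code(3)))  # public user-defined 0x83
```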
Most Modbus function codes perform read and write operations on device data. For this purpose, Modbus considers that devices store data in tables. There are four different table types, based on the data entry size (1 bit or 16 bits) and the access operations allowed (read or read/write). The tables are denoted as (i) discrete inputs, with 1-bit entries and only read operations allowed; (ii) coils, with 1-bit entries and read/write operations allowed; (iii) input registers, with 16-bit entries and only read operations allowed; and (iv) holding registers, with 16-bit entries and read/write operations allowed. All four types of tables can have up to 64 K (65,536) entries. Importantly, these tables are virtual: they can be physically separate in the device's memory, or they can overlay the same physical memory cells. Modbus can also access files, which are sequences of records (up to 10,000), and each record has a length measured in 16-bit units.
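The data model can be sketched as four tables plus a request handler that enforces access rights and bounds. The sketch below is a simplified, hypothetical server: it keeps the tables physically separate, uses a small illustrative table size, supports only function codes 1 to 6, and omits the per-request quantity limits of the standard. It returns the standard exception codes 01 (Illegal Function), 02 (Illegal Data Address), and 03 (Illegal Data Value).

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

ILLEGAL_FUNCTION, ILLEGAL_DATA_ADDRESS, ILLEGAL_DATA_VALUE = 0x01, 0x02, 0x03
READ_TABLES = {1: "coils", 2: "discrete_inputs", 3: "holding_registers", 4: "input_registers"}
WRITE_TABLES = {5: "coils", 6: "holding_registers"}  # single-entry writes only

Response = Tuple[bool, Union[List[int], int]]  # (success, values or exception code)


@dataclass
class DataModel:
    """Four separate tables; a real device may instead overlay them on shared memory."""
    size: int = 100  # illustrative; the standard allows up to 65,536 entries
    discrete_inputs: List[int] = field(init=False, default_factory=list)
    coils: List[int] = field(init=False, default_factory=list)
    input_registers: List[int] = field(init=False, default_factory=list)
    holding_registers: List[int] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        for name in ("discrete_inputs", "coils", "input_registers", "holding_registers"):
            setattr(self, name, [0] * self.size)


def handle(model: DataModel, fc: int, address: int, arg: int) -> Response:
    """arg is the quantity for reads, the value to write for writes."""
    if fc in READ_TABLES:
        table = getattr(model, READ_TABLES[fc])
        if arg < 1:
            return False, ILLEGAL_DATA_VALUE
        if address < 0 or address + arg > len(table):
            return False, ILLEGAL_DATA_ADDRESS
        return True, table[address:address + arg]
    if fc in WRITE_TABLES:  # read-only tables have no write function code
        table = getattr(model, WRITE_TABLES[fc])
        if fc == 5 and arg not in (0x0000, 0xFF00):  # coil OFF / ON encodings
            return False, ILLEGAL_DATA_VALUE
        if not 0 <= address < len(table):
            return False, ILLEGAL_DATA_ADDRESS
        table[address] = (1 if arg == 0xFF00 else 0) if fc == 5 else arg & 0xFFFF
        return True, [table[address]]
    return False, ILLEGAL_FUNCTION
```

Every branch that returns an exception is a place where a fuzzer can check whether a real implementation behaves the same way.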
Modbus application packets (protocol data units, or PDUs) are encapsulated in lower-layer protocol packets for transmission. When serial connections are used, the application packets are encapsulated by the data link control (DLC) protocol, producing DLC PDUs that are then transmitted by the serial protocol. DLC packets add an address field for the slave before the function code field and a checksum after the data field of the application protocol, as Fig. 7.4a shows. For the serial physical layer, there are two formats for DLC packets, denoted RTU and ASCII. The main difference between the two is the size of the slave address and function code fields: in RTU format, they are both one byte, while in ASCII format, each one is 2 bytes long, because every byte is transmitted as two ASCII hexadecimal characters.
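The two serial formats also use different checksums: RTU appends a CRC-16 (polynomial 0xA001 in reflected form, initial value 0xFFFF, low byte first), while ASCII appends a longitudinal redundancy check (the two's complement of the byte sum) and wraps the frame in ":" and CR LF. The sketch below builds both frames from an address and a PDU; the helper names are hypothetical.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS, bit-by-bit."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def rtu_frame(address: int, pdu: bytes) -> bytes:
    body = bytes([address]) + pdu
    return body + crc16_modbus(body).to_bytes(2, "little")  # CRC sent low byte first


def lrc(data: bytes) -> int:
    """Two's complement of the 8-bit sum of the bytes."""
    return (-sum(data)) & 0xFF


def ascii_frame(address: int, pdu: bytes) -> str:
    body = bytes([address]) + pdu
    # Each byte becomes two hex characters, so address and function code take 2 characters each.
    return ":" + (body + bytes([lrc(body)])).hex().upper() + "\r\n"


pdu = bytes.fromhex("030000000A")    # read 10 holding registers from address 0
print(rtu_frame(1, pdu).hex())
print(repr(ascii_frame(1, pdu)))     # ':01030000000AF2\r\n'
```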
Modbus over TCP is implemented by first extending the Modbus application packet with an additional header, named the MBAP (Modbus Application Protocol) header, as shown in Fig. 7.4b, and then encapsulating this extended packet in the TCP/IP protocol stack, which employs Ethernet at the data link control and physical layers, as shown in Fig. 7.2c.
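The MBAP header has four fields: a 16-bit transaction identifier, a 16-bit protocol identifier (0 for Modbus), a 16-bit length counting the bytes that follow it, and an 8-bit unit identifier. The sketch below wraps and unwraps a PDU; the resulting bytes would be sent over a TCP connection, conventionally to port 502. The consistency checks in `mbap_unwrap` are exactly the ones a fuzzer tries to violate. Function names are hypothetical.

```python
import struct
from typing import Tuple


def mbap_wrap(transaction_id: int, unit_id: int, pdu: bytes) -> bytes:
    # The length field covers the unit identifier plus the PDU.
    return struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id) + pdu


def mbap_unwrap(adu: bytes) -> Tuple[int, int, bytes]:
    """Return (transaction id, unit id, PDU), rejecting inconsistent headers."""
    if len(adu) < 8:
        raise ValueError("too short for an MBAP header plus a function code")
    tid, proto, length, unit = struct.unpack(">HHHB", adu[:7])
    if proto != 0:
        raise ValueError("protocol identifier must be 0 for Modbus")
    if length != len(adu) - 6:
        raise ValueError("length field does not match the packet size")
    return tid, unit, adu[7:]


def build_read_holding_registers(tid: int, unit: int, start: int, quantity: int) -> bytes:
    return mbap_wrap(tid, unit, struct.pack(">BHH", 3, start, quantity))


request = build_read_holding_registers(1, 1, 0, 10)
print(request.hex())  # 00010000000601030000000a
```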
Modbus does not include security mechanisms such as authentication, confidentiality, or integrity. This lack of security renders its implementations vulnerable to a wide range of attacks. The lack of confidentiality enables attackers to extract information from captured packets, while the lack of integrity checks prevents the receiver of a packet from detecting whether the packet has been altered. Replay attacks are possible as well, and the lack of non-repudiation mechanisms makes it impossible to analyze and audit systems credibly.
7.4 Fuzzing Modbus
There exist several Modbus fuzzers, as described in Sect. 7.2. In this section, we present the approach and results of MTF (Modbus/TCP fuzzer) [Voy15] as a representative example. We chose MTF because its characteristics reflect current trends in fuzzing technology: it is an automated tool, it provides good coverage of input tests, and it does not require physical access to the system under test, since it operates remotely over the network. These characteristics make MTF an attractive tool for testing the security and compliance of Modbus-connected devices.
MTF incorporates the specification of Modbus/TCP and supports fuzzing both
master and slave devices on the network. As an automated tool for fuzzing, MTF
operates in three main phases: (i) reconnaissance, (ii) attack, and (iii) failure detec-
tion. In the first phase, MTF identifies the operational characteristics and parameters
of the tested system. In the second phase, it applies tests to the system and collects
its responses, while in the third phase it evaluates the collected (observed) responses
to identify security problems and system failures.
Reconnaissance is an important operation in automated black-box or grey-box fuzzers, because it identifies the operations performed by the system under test and its important parameters. In the case of Modbus, in order to generate meaningful tests, one needs to know the function codes used by the system as well as its memory model, i.e., the four memory types (discrete inputs, coils, input registers, and holding registers) that are specified by the standard. MTF discovers the function codes through different methods, in order to accommodate different types of devices that may be fully or partially conformant with the standard. A straightforward method is to ask the device for identification information (the standard specifies function code 43 for this operation) and then, based on this, to find information off-line about the supported function codes, e.g., from a manual. Alternatively, MTF sends legitimate requests and examines the responses, which indicate whether the requests have been executed or not (as described in the standard specification), or it monitors traffic from the device and extracts functional information from it.
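The "send legitimate requests and examine the responses" method can be sketched as follows. This is an assumption-laden illustration, not MTF's code: the transport is abstracted as a callable so that a hypothetical simulated device can stand in for a real one, and only exception 01 (Illegal Function) is taken to mean "unsupported", since other exceptions show that the code is implemented but rejected the parameters. On a live system, probing write codes this way can change device state.

```python
from typing import Callable, List, Set

Transport = Callable[[bytes], bytes]  # sends a request PDU and returns the response PDU


def make_simulated_device(supported: Set[int]) -> Transport:
    """Hypothetical stand-in for a remote device; replace with a real Modbus/TCP client."""
    def respond(pdu: bytes) -> bytes:
        fc = pdu[0]
        if fc not in supported:
            return bytes([fc | 0x80, 0x01])   # exception 01: Illegal Function
        return bytes([fc, 0x00])              # placeholder for a normal response
    return respond


def probe_function_codes(send: Transport, candidates: range = range(1, 128)) -> List[int]:
    """Send one minimal request per candidate code; keep codes not rejected as illegal."""
    supported: List[int] = []
    for fc in candidates:
        reply = send(bytes([fc, 0x00, 0x00, 0x00, 0x01]))  # code, address 0, quantity/value 1
        if not reply:
            continue  # no answer: inconclusive, left to another method
        illegal = len(reply) >= 2 and reply[0] == fc | 0x80 and reply[1] == 0x01
        if not illegal:
            # A normal reply, or a different exception (e.g. 02, 03), means the code is implemented.
            supported.append(fc)
    return supported


device = make_simulated_device({1, 3, 5, 6, 16, 43})
print(probe_function_codes(device))  # [1, 3, 5, 6, 16, 43]
```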
Regarding the memory model of the tested system, MTF identifies the boundary memory addresses for each type of memory. This is done either actively, by sending packets with the appropriate function codes that probe specific address values, or passively, by observing traffic, which eventually indicates memory bounds, although these bounds may be approximate.
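Active probing need not try every address. Assuming that the valid addresses of a table form one contiguous range starting at 0, and that the device answers an out-of-range single-entry read with exception 02 (Illegal Data Address), a binary search finds the upper boundary of a 65,536-entry address space in 17 requests. Both assumptions are ours; devices with gaps in their address map would need a different strategy. The device below is simulated.

```python
from typing import Callable, List, Optional, Tuple


def highest_valid_address(is_valid: Callable[[int], bool], limit: int = 0xFFFF) -> Optional[int]:
    """Binary search for the last address whose read does not raise exception 02."""
    if not is_valid(0):
        return None
    lo, hi = 0, limit  # invariant: lo is valid, every address above hi is invalid
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if is_valid(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def make_device(table_size: int) -> Tuple[Callable[[int], bool], List[int]]:
    """Hypothetical device: single-register reads succeed below table_size."""
    probes: List[int] = []

    def read_one(address: int) -> bool:
        probes.append(address)
        return address < table_size  # False models an exception 02 response
    return read_one, probes


read, probes = make_device(2000)
print(highest_valid_address(read), len(probes))  # 1999 17
```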
Taking into account the list of function codes and the memory map for the four memory types, the fuzzer can construct legitimate packets and fuzz them in order to test the system. Since the supported function codes are known, MTF constructs a set of packet sequences for each supported function code, where each sequence implements a potential attack on the system; such attacks include packet removal, packet injection, and packet field manipulation. Packet field manipulation uses field values that are boundary, random, or illegal.
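The three attack types can be derived mechanically from a legal packet sequence. The sketch below is a hypothetical illustration of that idea, not MTF's implementation: removal drops one packet at a time, injection inserts an extra packet (here a replayed copy) at every position, and field manipulation overwrites one big-endian field with boundary, illegal (just past the legal maximum), and random values. The example uses two Read Holding Registers PDUs whose quantity field (offset 3, 2 bytes) has a legal maximum of 125.

```python
import random
from typing import Iterator, List

Sequence = List[bytes]


def removal(seq: Sequence) -> Iterator[Sequence]:
    for i in range(len(seq)):
        yield seq[:i] + seq[i + 1:]


def injection(seq: Sequence, extra: bytes) -> Iterator[Sequence]:
    for i in range(len(seq) + 1):
        yield seq[:i] + [extra] + seq[i:]


def field_manipulation(seq: Sequence, offset: int, size: int,
                       legal_max: int, seed: int = 0) -> Iterator[Sequence]:
    rng = random.Random(seed)
    type_max = (1 << (8 * size)) - 1
    # Boundary (0, legal_max, type_max), illegal (legal_max + 1), and one random value.
    values = [v for v in (0, legal_max, legal_max + 1, type_max, rng.randrange(type_max + 1))
              if v <= type_max]
    for i, pkt in enumerate(seq):
        for v in values:
            mutated = pkt[:offset] + v.to_bytes(size, "big") + pkt[offset + size:]
            yield seq[:i] + [mutated] + seq[i + 1:]


legal = [bytes.fromhex("030000000a"), bytes.fromhex("030064000a")]
tests = (list(removal(legal)) + list(injection(legal, legal[0]))
         + list(field_manipulation(legal, offset=3, size=2, legal_max=0x7D)))
print(len(tests))  # 2 + 3 + 10 = 15
```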
When tests are applied, the response, or its absence, is recorded. The tool records the sequence of all tests and related responses and produces a list of errors, which fall into three classes: invalid responses (out of specification); valid responses with incorrect parameters (values, size, etc.); and delayed or incomplete responses (no response). Further processing of the records, including both the valid request/response pairs and the errors, leads to the detection of security and dependability problems, i.e., malicious or accidental failures.
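The three error classes can be sketched as a classifier for one request/response exchange. The sketch below is a simplified assumption: it handles only Read Holding Registers (function code 3), works on PDUs with the MBAP header already removed, and uses an arbitrary one-second timeout. The set of exception codes is the one defined by the Modbus application protocol specification.

```python
from enum import Enum
from typing import Optional

# Exception codes defined by the standard (01-06, 08, 0A, 0B).
EXCEPTION_CODES = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x08, 0x0A, 0x0B}


class Verdict(Enum):
    OK = "valid"
    INVALID = "invalid response (out of specification)"
    WRONG_PARAMS = "valid but with incorrect parameters"
    NO_RESPONSE = "delayed or incomplete (no response)"


def classify_response(request: bytes, response: Optional[bytes],
                      elapsed_s: float, timeout_s: float = 1.0) -> Verdict:
    """Classify an FC 3 exchange; response is None when nothing arrived."""
    if response is None or elapsed_s > timeout_s:
        return Verdict.NO_RESPONSE
    if len(response) < 2:
        return Verdict.INVALID
    fc = request[0]
    if response[0] == fc | 0x80:  # exception response: must be exactly code + reason
        ok = len(response) == 2 and response[1] in EXCEPTION_CODES
        return Verdict.OK if ok else Verdict.INVALID
    if response[0] != fc or len(response) != 2 + response[1]:
        return Verdict.INVALID     # wrong function code or inconsistent byte count
    quantity = int.from_bytes(request[3:5], "big")
    if response[1] != 2 * quantity:
        return Verdict.WRONG_PARAMS  # well-formed, but not what was asked for
    return Verdict.OK


req = bytes.fromhex("0300000002")  # read 2 registers from address 0
print(classify_response(req, bytes.fromhex("0302000a"), 0.01))  # Verdict.WRONG_PARAMS
```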
The MTF approach is representative of the trends in fuzzing industrial protocols. It provides a complete approach to fuzzing, starting with reconnaissance, continuing with meaningful tests, and, finally, analyzing the results for security and reliability failures. Its practicality has been demonstrated through the prototype implementation described in the original work [Voy15], which has been used to evaluate several commercial and open-source Modbus subsystems against several attacks. The attacks include packet dropping, packet injection, illegal field values, altered function codes, and even flooding, leading to denial of service. Importantly, many of these attacks have been successful against commercial Modbus implementations, as the original reported results demonstrate. Interestingly, MTF succeeds in attacking these implementations much more efficiently than alternative tools, i.e., with a significantly smaller number of packets. Overall, the results demonstrate that generation fuzzing is an effective and efficient fuzzing method.
References
[Wan13] Wang, T., et al. (2013). Design and implementation of fuzzing technology for OPC pro-
tocol. In Proceedings of 9th International Conference on Intelligent Information Hiding and
Multimedia Signal Processing, Beijing, China, 2013, pp. 424–428.
[Zha11] Zhao, J., Wen, Y., & Zhao, G. (2011). H-fuzzing: A new heuristic method for fuzz-
ing data generation. In Proceedings of Network and Parallel Computing, LNCS, Vol. 6985,
Springer, 2011, pp. 32–43.
Index
A
Advanced Message Queuing Protocol (AMQP), 50
Alert system, 3
Amazon Simple Storage Service (Amazon S3), 12
Amazon Web Services (AWS), 12
Analysis system, 3
Application process security, 65, 66
Architecture
  AWS, 12
  BLE, 10
  CoAP, 11
  databases
    Amazon Simple Storage Service (Amazon S3), 12
    DTW, 13
    Google Cloud Storage, 12
    short-term and long-term storage, 12
    time-series data, 12
    unstructured databases, 12
  DDS, 9
  design goals, 9
  event-style communication, 8
  Google Cloud Pub/Sub, 11
  HTTP protocol, 8
  LoRa, 10
  Microsoft Azure system, 12
  MQTT, 10
  multi-hop, end-to-end communication, 7
  Network Time Protocol (RFC1305), 13
  organization, 7, 8
  publish/subscribe model, 8
  QoS parameters, 9
  REST, 11
  RTPS, 9
  security, 13
  semantics-aware communication mechanism, 9
  subscribers, 11
  tree-based topology, 9
  XMPP protocol, 11
  Zigbee network, 10
ARMET approach, 68–72
AutoFuzz tool, 81
B
Black-box fuzzing
  communication protocols implementation, 80
  data generation, 80
  data mutation, 81
  generated/mutated packets, 81
  GPF and AutoFuzz tools, 81
  grammar-based, 81
  heuristic-based, 81
  random and block-based techniques, 81
Black-box testing, 80
Block-based technique, 81
Bluetooth low energy (BLE), 10
Bluetooth stack, 11
C
Communication protocol, 7
Constrained Application Protocol (CoAP), 11, 50
Continuous time Markov decision process (CTMDP), 34
Control system, 3
Cost of ownership, 17, 18
U
User-defined codes, 83
W
White-box fuzzing
  code execution paths, 80
  symbolic execution, 80
  taint analysis techniques, 80
  techniques, 80
  Windows and Linux applications, 80
White-box testing, 80
Wireless networks, 4
Wireless sensor networks (WSNs), 50
X
XMPP protocol, 11
Z
Zigbee network, 10