Classifying IoT Devices in Smart Environments Using Network Traffic Characteristics
Abstract—The Internet of Things (IoT) is being hailed as the next wave revolutionizing our society, and smart homes, enterprises, and
cities are increasingly being equipped with a plethora of IoT devices. Yet, operators of such smart environments may not even be fully
aware of their IoT assets, let alone whether each IoT device is functioning properly and safe from cyber-attacks. In this paper, we address this
challenge by developing a robust framework for IoT device classification using traffic characteristics obtained at the network level. Our
contributions are fourfold. First, we instrument a smart environment with 28 different IoT devices spanning cameras, lights, plugs, motion
sensors, appliances, and health-monitors. We collect and synthesize traffic traces from this infrastructure for a period of six months, a
subset of which we release as open data for the community to use. Second, we present insights into the underlying network traffic
characteristics using statistical attributes such as activity cycles, port numbers, signalling patterns, and cipher suites. Third, we develop
a multi-stage machine learning based classification algorithm and demonstrate its ability to identify specific IoT devices with over
99 percent accuracy based on their network activity. Finally, we discuss the trade-offs between cost, speed, and performance involved in
deploying the classification framework in real-time. Our study paves the way for operators of smart environments to monitor their IoT
assets for presence, functionality, and cyber-security without requiring any specialized devices or protocols.
1 INTRODUCTION
1536-1233 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Birla Institute of Technology and Science. Downloaded on October 05,2024 at 14:43:49 UTC from IEEE Xplore. Restrictions apply.
1746 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 18, NO. 8, AUGUST 2019
TABLE 1
MAC Address and DHCP Host Name of IoT Devices Used in Our Testbed
and lastly (e) these host names can be changed by the user (e.g., the HP printer can be given an arbitrary host name). For these reasons, relying on DHCP infrastructure is not a viable solution to correctly identify devices at scale.
In this paper, we address the above problem by developing a robust framework that classifies each IoT device separately in addition to one class of non-IoT devices with high accuracy using statistical attributes derived from network traffic characteristics. Qualitatively, most IoT devices are expected to send short bursts of data sporadically. Quantitatively, our preliminary work in [6] was one of the first attempts to study how much traffic IoT devices send in a burst and how long they idle between activities. We also evaluated how much signaling they perform (e.g., domain lookups using DNS or time synchronization using NTP) in comparison to the data traffic they generate. This paper significantly expands on our prior work by employing a more comprehensive set of attributes on trace data captured over a much longer duration (of 6 months) from a test-bed comprising 28 different IoT devices.
There is no doubt that it is becoming increasingly important to understand the nature of IoT traffic. Doing so helps contain unnecessary multicast/broadcast traffic, reducing the impact they have on other applications. It also enables operators of smart cities and enterprises to dimension their networks for appropriate performance levels in terms of reliability, loss, and latency needed by environmental, health, or safety applications. However, the most compelling reason for characterizing IoT traffic is to detect and mitigate cyber-security attacks. It is widely known that IoT devices are by their nature and design easy to infiltrate [7], [8], [9], [10], [11], [12]. New stories are emerging of how IoT devices have been compromised and used to launch large-scale attacks [13]. The large heterogeneity in IoT devices has led researchers to propose network-level security mechanisms that analyze traffic patterns to identify attacks (see [14] and our recent work [15]); success of these approaches relies on a good understanding of what “normal” IoT traffic profile looks like.
Our primary focus in this work is to establish a machine learning framework based on various network traffic characteristics to identify and classify the default (i.e., baseline) behavior of IoT devices on a network. Such a framework can potentially be used in the future to detect anomalous behavior of IoT devices (potentially due to cyber-attacks), and such anomaly detection schemes are beyond the scope of this paper. This paper fills an important gap in the literature relating to classification of IoT devices based on their network traffic characteristics. Our contributions are:
1) We instrument a living lab with 28 IoT devices emulating a smart environment. The devices include cameras, lights, plugs, motion sensors, appliances and health-monitors. We collect and synthesize data from this environment for a period of 6 months. A subset of our data is made available for the research community to use.
2) We identify key statistical attributes such as activity cycles, port numbers, signaling patterns and cipher suites, and use them to give insights into the underlying network traffic characteristics.
3) We develop a multi-stage machine learning based classification algorithm and demonstrate its ability to identify specific IoT devices with over 99 percent accuracy based on their network behavior.
SIVANATHAN ET AL.: CLASSIFYING IOT DEVICES IN SMART ENVIRONMENTS USING NETWORK TRAFFIC CHARACTERISTICS 1747
4) We evaluate the deployment of the classification framework in real-time, by examining the trade-offs between costs, speed, and accuracy of the classifier.
The rest of this paper is organized as follows: Section 2 describes relevant prior work. We present our IoT setup and data traces in Section 3, and in Section 4 characterize traffic attributes of the various IoT devices. In Section 5 we propose a machine learning based multi-stage device classification method and evaluate its performance, followed by a discussion on the real-time operation of the proposed system in Section 6. The paper is concluded in Section 7.
2 RELATED WORK
There is a large body of work characterizing general Internet traffic [16], [17], [18], [19]. These prior works largely focus on application detection (e.g., Web browsing, Gaming, Mail, Skype VoIP, Peer-to-Peer, etc.). However, studies focusing on characterizing IoT traffic (also referred to as machine-to-machine or M2M traffic) are still in their infancy.
Analysis of Empirical Traces. The work in [20] is one of the first large-scale studies to delve into the nature of M2M traffic. It is motivated by the need to understand whether M2M traffic imposes new challenges for the design and management of cellular networks. The work uses a traffic trace spanning one week from a tier-1 cellular network operator and compares M2M traffic with traditional smart-phone traffic from a number of different perspectives: temporal variations, mobility, network performance, and so on. The study informs network operators to be cognizant of these factors when managing their networks.
In [21], the authors note that the amount of traffic generated by a single M2M device is likely to be small, but the total traffic generated by hundreds or thousands of M2M devices would be substantial. These observations are to some extent corroborated by [22], [23], which note that a remote patient monitoring application is expected to generate about 0.35 MB per day and smart meters roughly 0.07 MB per day.
Aggregated Traffic Model. A Coupled Markov Modulated Poisson Processes framework to capture the behavior of a single machine-type communication as well as the collective behavior of tens of thousands of M2M devices is proposed in [24]. The complexity of the CMMPP framework is shown to grow linearly with the number of M2M devices, rendering it effective for large-scale synthesis of M2M traffic.
In [25], the authors show that it is possible to split the (traffic) state of an M2M device into three generic categories, namely periodic update, event driven, and payload exchange, and a number of modelling strategies that use these states are developed. An illustration of model fitting is shown via a use-case in fleet management comprising 1000 trucks run by a transportation company. The fitting is based on measured M2M traffic from a 2G/3G network. A simple model to estimate the volume of M2M traffic generated in a wireless sensor network enabled connected home is constructed in [26]. Since behavior of sensors is very application specific, the work identifies certain common communication patterns that can be attributed to any sensor device. Using these attributes, four generalized equations are proposed to estimate the volume of traffic generated by a sensor network enabled connected apartment/home.
Use of Machine Learning. Various machine-learning-based analytical methods have been proposed in the literature to classify traffic application or identify malwares/botnets for typical computer networks. The work in [27] uses deep learning to classify flow types such as HTTP, SMTP, Telnet, QUIC, Office365, and YouTube by considering six features namely source/destination port number, payload volume, TCP window size, inter-arrival time and direction of traffic that are extracted from the first 20 packets of a flow. The work carried out in [28] suggests that botnets exhibit identifiable traffic patterns that can be classified by considering features such as average time between successive flows, flow duration, inbound/outbound traffic volume, and Fourier transformation over the flow start times. Detection of malicious activity on the network was enhanced in [29] and [30] by combining these flow-level features with packet-level attributes including packet size, byte distribution of payload, inter arrival times of packets and TLS handshake metadata (i.e., cipher suite codes). Further, authors have released an open source libpcap-based tool called Joy [31] to extract these features from the passive capture of network traffic.
In the context of IoT, [32] uses machine learning to classify a single TCP flow from authorized devices on the network. It employs over 300 attributes (packet-level and flow-level), though the most influential ones are minimum, median and average of packets Time-To-Live (TTL), the ratio of total bytes transmitted and received, total number of packets with reset (RST) flag, and the Alexa rank of server.
While all the above works make important contributions, they do not undertake fine-grained characterization and classification of IoT devices in a smart environment such as a home, city, campus or enterprise. Furthermore, statistical models are not developed that enable IoT device classification based on their network traffic characteristics. Most importantly, prior works do not make any data set publicly available for the research community to use and build upon. Our work overcomes these shortcomings.
3 IOT TRAFFIC COLLECTION AND SYNTHESIS
In this section, we describe our smart environment infrastructure for collecting and synthesizing traffic from various IoT devices.
3.1 Experimental Test-Bed
A real-life architecture of a “smart environment” is depicted in Fig. 1 that serves a wide range of IoT and non-IoT devices over its (wired/wireless) network infrastructure and allows them to communicate with the Internet servers via a gateway. Our lab setup is a specialized implementation of this architecture, housed at our campus facility, and comprises one node of TP-Link Archer C7 v2 WiFi access point (representing internal switch) collocated with the Internet gateway. The TP-Link access point, flashed with the OpenWrt firmware release Chaos Calmer (15.05.1, r48532), serves as the gateway to the public Internet. We also installed additional OpenWrt packages on the gateway, namely tcpdump (4.5.1-4) for capturing traffic, bash (4.3.39-1) for scripting, block-mount package for mounting external USB storage on the gateway, kmod-usb-core and kmod-usb-storage (3.18.23-1) for storing the traffic trace data on the USB storage.
In our lab setup, the WAN interface of the TP-Link access point is connected to the public Internet via the university network, while the IoT devices are connected to the LAN and WLAN interfaces respectively. Our smart environment
Fig. 2. Sankey diagram of daily network activity for two representative IoT devices, Amazon Echo, and LiFX lightbulb. A clear distinction is observed
in terms of their communication patterns, i.e., the servers they talk to, and the port numbers and protocols used for data exchange.
Fig. 3. Distribution of IoT activity pattern: (a) Flow volume, (b) flow duration, (c) average flow rate, and (d) device sleep time.
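The four per-flow attributes plotted in Fig. 3 (flow volume, flow duration, average flow rate, and device sleep time, as defined in Section 4.1) can be illustrated with a minimal Python sketch. The `Flow` record, its field names, and the helper functions are hypothetical names introduced only for this illustration. The sketch assumes that each flow is already summarized by its first and last packet timestamps and its byte counts, that the rate is reported in bits per second, and that sleep time is the gap between the end of all active flows and the start of the next one.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical per-flow summary; field names are illustrative only."""
    start: float     # timestamp of the first packet, in seconds
    end: float       # timestamp of the last packet, in seconds
    bytes_up: int    # bytes uploaded by the device
    bytes_down: int  # bytes downloaded by the device


def flow_attributes(flow):
    """Return (volume in bytes, duration in seconds, average rate in bits/s)."""
    volume = flow.bytes_up + flow.bytes_down
    duration = flow.end - flow.start
    # Assumption: a single-packet flow (zero duration) is given a rate of 0.
    rate = 8 * volume / duration if duration > 0 else 0.0
    return volume, duration, rate


def sleep_times(flows):
    """Return the intervals (seconds) during which no flow is active."""
    gaps = []
    busy_until = None
    for flow in sorted(flows, key=lambda f: f.start):
        if busy_until is not None and flow.start > busy_until:
            gaps.append(flow.start - busy_until)
        busy_until = flow.end if busy_until is None else max(busy_until, flow.end)
    return gaps


# Toy data, not taken from the paper's traces.
example = [
    Flow(0.0, 60.0, 100, 35),
    Flow(30.0, 90.0, 10, 10),    # overlaps the first flow, so no sleep gap
    Flow(150.0, 160.0, 50, 50),  # starts 60 s after the last active flow ends
]
print(flow_attributes(example[0]))  # (135, 60.0, 18.0)
print(sleep_times(example))         # [60.0]
```

Overlapping flows are merged before gaps are measured, so a device that keeps a long-lived connection open yields few or no sleep intervals.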
4.1 IoT Activity and Volume Pattern
We start with the activity pattern of IoT devices that is defined by the properties of their traffic flows. We define four key attributes at a per-flow level to characterize IoT devices based on their network activity: flow volume (i.e., sum total of download and upload bytes), flow duration (i.e., time between the first and the last packet in a flow), average flow rate (i.e., flow volume divided by the flow duration), and device sleep time (i.e., time interval over which the IoT device has no active flow).
We plot in Fig. 3 the probability distribution of the above four attributes for a chosen set of IoT devices using the trace data collected over 26 weeks. It can be observed from Fig. 3a that each IoT device tends to exchange a small amount of data per flow. For the case of the LiFX lightbulb (depicted by red bars), 26 percent of flows transfer between [130, 140] bytes and 20 percent between [120, 130] bytes. The flow volume for the Belkin motion sensor (depicted by green bars) is slightly higher; over 35 percent of flows transfer between [2800, 3800] bytes. For the Amazon Echo (depicted by blue bars), over 95 percent of flows transfer less than 1000 bytes. Though we present the flow volume histogram for only a few devices, most of our IoT devices exhibit a similar predictable pattern.
A similar pattern emerges for the flow duration as well. Referring to Fig. 3b, we note that the flow duration of
Fig. 4. Word-cloud of server ports (total count of unique ports is shown in {sub-captions} next to the device name).
53 seconds is seen in more than 40 percent of flows for Amazon Echo, while a duration of 60 seconds is seen for the LiFX lightbulb and Belkin motion sensor with a probability of 50 and 21 percent respectively.
For the average flow rate attribute, Fig. 3c shows that the mean rate is rather small, in the bits-per-second range as one would qualitatively expect. Quantitatively, the figure shows that the LiFX lightbulb has an average flow rate of 18 bits-per-second nearly 60 percent of the time. Nearly 30 percent of Belkin flows have a bit rate in the range 59 to 60 bits-per-second while nearly 40 percent of Amazon Echo flows have a bit rate in the range 70 to 71 bits-per-second.
Lastly, in terms of the sleep time for the devices, Fig. 3d shows that the Belkin motion sensor and the LiFX lightbulb exhibit a distinct sleep pattern. The duration is 1 second and 60 seconds with probability 73 and 48 percent respectively. However, multiple sleep times with small probabilities are observed for the Amazon Echo. This is because Amazon Echo keeps its TCP connections alive and goes to sleep only when it disconnects from the Internet. Other devices in our test-bed also behave like the Echo and do not seem to have a dominant sleep pattern.
4.2 IoT Signaling Pattern
We now focus on the application layer protocols, inferred using the port numbers, that IoT devices mostly use to communicate locally in the LAN and/or externally with servers on the public Internet.
4.2.1 Server Port Numbers
Fig. 4 shows the word cloud of server-side port numbers of all flows initiated from a variety of IoT devices. For each device, if a port is used more frequently then it is shown by a larger font-size in the respective word cloud. Sub-captions (i.e., numbers within {}) report the number of unique server ports for each device. It can be seen that IoT devices each uniquely communicate with a handful of server ports whereas non-IoT devices use a much wider range of services (i.e., 2382 unique ports are shown in Fig. 4h and many of them are very infrequent). We observe that non-standard ports 33434, 56700, 8883, and 25050 are prominently seen in traffic originating from Amazon Echo, LiFX lightbulb, Awair air quality monitor, and Netatmo weather station respectively, as shown in the top row of Fig. 4. Further, we note devices from the same manufacturer share certain ports. For example, port numbers 8443 and 3478 are common between Belkin’s motion sensor, power switch, and camera, as shown in Figs. 4e, 4f, and 4g. We also note that well-known standard port numbers such as 53 (DNS), 123 (NTP), 0 (ICMP) and 1900 (SSDP) are used by many of the IoT devices as well as the non-IoTs with various frequencies, as shown in Fig. 4. Moreover, the server-side port number of 443 (TLS/SSL) is also used by many of the IoT devices.
4.2.2 DNS Queries
DNS is a common application used by almost all networked devices. Since IoT devices are custom-designed for specific purposes, they access a limited number of domains corresponding to their vendor-specific end-point servers. We plot in Fig. 5 the word cloud of domain names accessed by several IoT devices as well as non-IoTs. It is seen that IoT devices are fairly distinguishable by the domain names they communicate with. For example, as depicted in Figs. 5a, 5b, and 5c, domains such as example.com, example.net, and example.org are frequently requested by Amazon Echo; sub-domains of hp.com and hpeprint.com are seen in DNS queries from the HP printer. However, we also see that some prominent domain names are shared between the different devices. For example, belkin.com and d3gjecg2uu2faq.cloudfront.net are commonly used by Belkin devices (i.e., camera, motion sensor and power switch) as shown in Figs. 5d, 5e, and 5f; or pool.ntp.org is prominent in traffic flows generated from Google Dropcam, Awair air quality monitor and LiFX lightbulb, as shown in Figs. 5b, 5c, 5d, 5e, 5f, 5g, and 5h. Again considering non-IoTs in Fig. 5i, we see about 12000 unique domains visited, which is far more diverse compared to IoT devices with only a handful of domains accessed repeatedly.
We also found that IoT devices differ from one another in how often the DNS protocol is used. We have observed from our traffic traces that IoT devices generate DNS queries during different stages of their operation; for example only during the boot-up phase (e.g., Google Dropcam) or when interacting with a user (e.g., Hello Barbie) or periodically (e.g., Amazon Echo). As shown in Fig. 6, certain IoT devices exhibit a characteristic signature in the frequency of their DNS queries. The LiFX lightbulb and Amazon Echo send DNS queries very frequently (i.e., every 5 minutes) but a device like the Belkin motion sensor requests domain names only once every 30 minutes.
4.2.3 NTP Queries
As mentioned earlier, NTP is another popular protocol used by IoT devices because precise and verifiable timing
Fig. 5. Word-cloud of domain names (total count of unique domains is shown in {sub-captions} next to the device name).
is crucial for IoT operations [34]. Many IoT devices tend to use NTP protocol (UDP port 123) in a periodic manner in order to synchronize their time with publicly available NTP servers. For example, Awair air quality monitor, LiFX lightbulb and Google Dropcam obtain the IP address of time servers from pool.ntp.org. We also find that time synchronization occurs repeatedly in our test-bed and many IoT devices exhibit a recognizable pattern in the use of NTP. For example, the Belkin power switch, LiFX lightbulb and SmartThings hub send NTP requests every 60, 300 and 600 seconds respectively, as shown in histogram plot of Fig. 7.
4.2.4 Cipher Suite
A number of IoT devices use TLS/SSL protocol (port number 443) to communicate with their respective servers on the Internet [30]. In order to initiate the TLS connection and negotiate the security algorithms with servers, devices start handshaking by sending a “Client Hello” packet with a list of “cipher suites” that they can support, in the order of their preference. For example, Figs. 8a and 8b depict cipher suites that Amazon Echo offers to two different Amazon servers. Each cipher suite (i.e., 4-digit code) can take one of 380 possible values and represents algorithms for key exchange, bulk encryption and message authentication code (MAC). For example, the cipher 002f negotiated by an Amazon server uses RSA, AES_128_CBC, and SHA protocols for key exchange, bulk encryption and message authentication, respectively.
We find that 17 out of the 28 IoT devices in our setup, including the Amazon Echo, August Doorbell Cam, Awair air quality monitor, Belkin Camera, Canary Camera, Dropcam, Google Chromecast, Hello Barbie, HP ENVY Printer, iHome, Netatmo Welcome camera, Philips Hue lightbulb,
in IoT traffic. Each cell of this matrix is the number of occurrences of such unique words in a given instance.
As shown in Fig. 10, each classifier of Stage-0 generates two outputs, namely a tentative class and a confidence level, which together with other single-valued quantitative attributes (i.e., flow volume, duration, rate, sleep time, DNS, NTP intervals) are fed into a Stage-1 classifier that produces the final output (i.e., the device identification with a confidence level).
5.1.1 Stage-0: Bag-of-Words Classifiers
We employ a Naive Bayes Multinomial classifier to analyze each bag of words in the stage-0 of our machine. It has been shown [35] that this classifier performs well in text classification when dealing with a large number of unique words. During the training phase, the classifier takes the distribution of words, e.g., individual unique domain names, and computes the probability of each word given a class using:
$$\Pr(w_j^{\mathrm{train}} \mid c_i) = \frac{1 + \sum_{l=1}^{D} n_{l,c_i,w_j}^{\mathrm{train}}}{N + \sum_{k=1}^{N} \sum_{l=1}^{D} n_{l,c_i,w_k}^{\mathrm{train}}}, \quad (1)$$
where $w_j$ is a unique word in the training dataset (e.g., port number 56700); $c_i$ is a class label (e.g., LiFX lightbulb); $D$ is the total number of instances; $n_{l,c_i,w_j}^{\mathrm{train}}$ is the number of $w_j$ occurrences in each of instances with class label of $c_i$; $N$ is the total number of unique words (e.g., we have $N = 421$ unique port numbers in our dataset).
During the testing phase, the classifier needs to compute the following probability for all possible classes:
$$\Pr(c_i \mid W^{\mathrm{test}}) = \Pr(c_i^{\mathrm{train}}) \prod_{j=1}^{N} \Pr(w_j^{\mathrm{train}} \mid c_i)^{n_j^{\mathrm{test}}}, \quad (2)$$
where $W^{\mathrm{test}}$ is a set represented by $\{w_1 : n_1^{\mathrm{test}}, w_2 : n_2^{\mathrm{test}}, \ldots, w_N : n_N^{\mathrm{test}}\}$; $n_j^{\mathrm{test}}$ is the occurrence number of individual unique words $w_j$ in a given test instance; $\Pr(c_i^{\mathrm{train}})$ is the presence probability of a class $c_i$ in the whole training dataset (i.e., number of $c_i$ training instances divided by total number of all training instances). The classifier finally chooses the class that gives the maximum probability in (2) for a given set of words along with their occurrences. Note that a Naive Bayes Multinomial classifier performs well if training instances are fairly distributed among various classes [35].
5.1.2 Stage-1 Classifier
We have a stage-1 classifier that takes all quantitative attributes along with the pair of outputs from each stage-0 classifier. Since the stage-1 attributes are not linearly separable and the outputs of stage-0 classifiers are nominal values, we use a Random Forest based stage-1 classifier. Another reason for selecting the Random Forest is its high tolerance to over-fitting compared to other decision tree classifiers.
5.2 Performance Evaluation
We use the Weka [36] tool for our IoT device classification. We have collected a total of 50,378 labeled instances from our traffic traces. As mentioned earlier, we have a number of instances from different devices: those that generate traffic when triggered by user interaction have small number of instances (e.g., 13 for Blipcare BP monitor, 21 for Google Chromecast) and those that autonomously generate traffic have a fairly large number of instances (e.g., 2,868 for Samsung Smart Things or 2,247 for Amazon Echo). We have randomly split instances into two groups, one containing 70 percent of the instances for “training” and another containing 30 percent of the instances for “testing”.
Table 2 shows the performance of our classifier under various scenarios, each captured by a pair of columns. For a given scenario, we measure the true positive rate (i.e., fraction of test instances that are correctly classified) and false positive rate (i.e., fraction of test instances that are incorrectly classified) for every device corresponding to the rows in Table 2. We also obtain the average confidence level (i.e., a number between 0 and 1 depicted within square brackets in each cell) of our classifier for correctly classified and incorrectly classified instances. In addition, we aggregate the performance of individual classes and compute the overall accuracy (i.e., total true positive rate) along with the overall root relative squared error (RRSE) as measures of performance for our classifier. These measures are reported in the top row of each scenario in Table 2. Note that our objective is to achieve a high accuracy (close to 100 percent) with a fairly low error (close to zero).
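The Stage-0 bag-of-words computation in Eqs. (1) and (2) of Section 5.1.1 can be illustrated with a short, self-contained Python sketch. This is not the Weka implementation used in the paper. The function names `train` and `classify` are hypothetical, the toy port-number data is invented, and two details are assumptions rather than statements from the text: words never seen in training are ignored at test time, and the confidence level is taken to be the winning class's score normalized over all classes. Log-probabilities are used only to avoid numerical underflow; the ranking of classes is the same as in Eq. (2).

```python
import math
from collections import Counter, defaultdict


def train(instances):
    """Fit per-class word probabilities from (label, Counter of words) pairs."""
    vocab = set()
    class_counts = Counter()            # number of training instances per class
    word_counts = defaultdict(Counter)  # summed word occurrences per class
    for label, words in instances:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    n_vocab = len(vocab)                # N in Eq. (1)
    total = sum(class_counts.values())
    model = {}
    for label in class_counts:
        denom = n_vocab + sum(word_counts[label].values())
        log_prior = math.log(class_counts[label] / total)  # Pr(c_i^train)
        # Eq. (1): add-one smoothed probability of each word given the class.
        log_like = {w: math.log((1 + word_counts[label][w]) / denom) for w in vocab}
        model[label] = (log_prior, log_like)
    return model


def classify(model, words):
    """Return (tentative class, confidence) for a Counter of test words."""
    scores = {}
    for label, (log_prior, log_like) in model.items():
        score = log_prior
        for word, count in words.items():
            if word in log_like:  # assumption: unseen words are skipped
                score += count * log_like[word]  # exponent n_j^test in Eq. (2)
        scores[label] = score
    best = max(scores, key=scores.get)
    # Assumption: confidence is the normalized score of the winning class.
    top = scores[best]
    norm = sum(math.exp(s - top) for s in scores.values())
    return best, 1.0 / norm


# Toy training set of server-side port numbers (invented for illustration).
toy = [
    ("LiFX", Counter({"56700": 3, "53": 1})),
    ("LiFX", Counter({"56700": 2, "123": 1})),
    ("Echo", Counter({"443": 4, "53": 1})),
    ("Echo", Counter({"443": 2, "33434": 1})),
]
model = train(toy)
print(classify(model, Counter({"56700": 1, "53": 1})))
```

With an empty bag of words, every class score reduces to its prior, so the sketch returns the most frequent training class with a low confidence; this mirrors the behavior the paper reports for instances whose attributes are empty.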
TABLE 2
Performance of the Proposed IoT Device Classifier under Different Sets of Attributes
5.2.1 Performance of Stage-0: Port Numbers Attribute
The first three columns correspond to those cases in which we consider only nominal attributes of stage-0 (i.e., bag of words corresponding to port numbers, domain names and cipher suites). The first column shows that when we only use a list of server-side port numbers for device classification, a reasonable accuracy of 92.13 percent is achieved, but RRSE is poor (at 39.93 percent). Inspecting the individual classes, we observe that certain classes highlighted by yellow or light-green (e.g., Ring door bell, Blipcare BP monitor, Hello Barbie, and Google chromecast) are poorly classified. We explain the reason behind this misclassification next.
Ring Door Bell. Out of 486 instances, 465 contain a single occurrence of the DNS query (i.e., remote port number 53). We see that 95.8 percent of test instances are incorrectly classified as Netatmo weather station. This is because of two reasons: (i) there are 2451 training instances of Netatmo compared to 323 of Ring door bell, which makes $\Pr(c_i^{\mathrm{train}})$ of Netatmo larger than that of Ring door bell, and (ii) many Netatmo instances contain several (on average 4 times) occurrences of port 53 as opposed to only one for Ring Door bell, which also contributes to $\Pr(w_j \mid c_i)$ of Netatmo being greater than that for Ring door bell in (1). Thus, Ring door bell instances get classified as Netatmo weather station, warranting a second stage of classification with additional attributes for improved accuracy.
Blipcare BP Monitor. It uses only two remote port numbers, namely 8777 and 53, in a total of 13 instances; the port numbers appear only once or twice in each instance. Surprisingly, we see that 80 percent of Blipcare test instances are incorrectly classified as Ring Door Bell though the remote port number of 8777 is unique to the Blipcare BP monitor. This is because
there are only a very small number of Blipcare instances in our dataset, which results in a fairly small value of $\Pr(\text{“53”} \mid \mathrm{Blipcare}) = 0.0203$ and $\Pr(\text{“8777”} \mid \mathrm{Blipcare}) = 0.0294$ in (1), and a negligible value of $\Pr(\mathrm{Blipcare}^{\mathrm{train}}) = 0.0003$ in (2). On the other hand, $\Pr(\text{“8777”} \mid \mathrm{Ring})$ becomes very small as the remote port number 8777 is never used by the Ring Door Bell in our dataset. However, the probability of $\Pr(\text{“8777”} \mid \mathrm{Ring}) = 0.0011$ in (1) is sufficient to maximize the classifier probability $\Pr(\mathrm{Ring} \mid \{\text{“53”} : 1, \text{“8777”} : 1\})$ in (2), given $\Pr(\mathrm{Ring}^{\mathrm{train}}) = 0.0097$.
Other Devices. Server-side port numbers are empty in 72 percent of instances for Hello Barbie, since it communicates with local devices instead of Internet-based end-points. Similarly for HP printer (38 percent) and iHome power plug (10 percent). The lack of server-side port number information explains why these devices are classified as Dropcam, which has the highest value of $\Pr(\mathrm{Dropcam}^{\mathrm{train}}) = 0.0828$ in (2). We note that the confidence level of our stage-0 classifier is fairly low (i.e., less than 0.4) in these cases, suggesting that the classifier chooses the most probable class given an empty attribute (i.e., all $n_j^{\mathrm{test}}$ are zero).
5.2.2 Performance of Stage-0: Domain Names Attribute
We now focus on the stage-0 machine that uses only a bag of domain-names, which yields an accuracy of 79.48 percent with a fairly high RRSE value of 57.56 percent, as shown in the second column in Table 2. In this scenario, more classes suffer from misclassification (i.e., those with yellow coloured cells) compared to the previous scenario where only remote port numbers were considered. The reasons behind the misclassification are threefold: (i) since devices from the same manufacturer share a collection of domain names, as discussed in Section 4.2.2, 59.8 percent of Belkin camera test instances are misclassified as Belkin Motion sensor and 100 percent of Belkin Motion sensor instances are misclassified as Belkin switch. Similarly, 56.8 percent of Withings scale instances are incorrectly classified as Withings sleep sensor, and 12 percent of Samsung smart cam are misclassified as Samsung Smartthings. (ii) a significant number of instances from select devices contain no DNS query entries (e.g., 96.2 percent of HP printer, 73.4 percent of Samsung Smart Cam, 71.4 percent of

Table 2. In addition, we find that August doorbell cam is sharing one of its cipher suite strings (out of total 18) with Pixstar photoframe, which has a single cipher suite string. Thus, 21.2 percent of August door bell instances are misclassified as Pixstar photoframe and almost all instances of Pixstar photoframe are classified as August doorbell.
5.2.4 Performance of Stage-0: Combination of Attributes
We expect the combination of the three bags of words (port numbers, domain names, and cipher suites) to significantly enhance the accuracy of our classifier, as indeed shown by the fourth column titled “Combined stage-0” in Table 2. The overall accuracy reaches 97.39 percent with RRSE of 18.24 percent. It can be seen that the majority of test instances are correctly classified, except for Hello Barbie. This is because most of the Hello Barbie attributes are empty in stage-0 and thus it is classified as Dropcam, as mentioned earlier.
Interestingly, we see that all test instances of Blipcare BP monitor are classified correctly though the accuracy of individual stage-0 was fairly poor. This is because our decision-tree-based classifier in stage-1 sees a strong correlation between the outputs of stage-0 classifiers and the actual class of training instance, even though those outputs (tentative class) are incorrect; e.g., having the tentative output from remote port number classifier as Ring door bell, having the tentative output from cipher suite classifier as Dropcam, and having the confidence level from domain name classifier less than 0.66 collectively is a strong indication of a Blipcare instance.
5.2.5 Overall Performance
As the last step, we incorporate the outputs from the stage-0 classifiers into stage-1 (without the latter having any notion of the quantitative attributes from the former), and additionally include quantitative attributes (flow volume, duration, rate, sleep time, DNS and NTP intervals). The last column of Table 2 shows the overall performance of the classification framework. In this case, the accuracy reaches a remarkably high value of 99.88 percent, with almost all classes labeled correctly with a very small value of RRSE at 5.06 percent. Fig. 11 shows the full confusion matrix of our
Hello Barbie, 12.5 percent of iHome power plug, 11 percent of classification when all the attributes are used in conjunction,
Hue bulb) and are thus incorrectly classified as a Dropcam, and corroborates that the diagonal entries (corresponding to
which also rarely generates DNS packets. (iii) the low number correct classification) are all at or very close to 100 percent,
of training instances with domain names leads to poor perfor- with just two exceptions—the Google Chromecast and the
mance (e.g., Blipcare BP meter and Hello Barbie). Hello Barbie. As explained earlier, the Chromecast gets clas-
sified as the Dropcam in some instances, while the Hello
5.2.3 Performance of Stage-0: Cipher Suite Attribute Barbie gets classified as a Hue bulb.
Considering only the cipher suite attribute, this stage-0 clas-
sifier results in a fairly low accuracy of 36.15 percent with a
high RRSE of 86.73 percent, as shown in the third column in
6 REAL-TIME OPERATION IN A NETWORK
Table 2. Again, the main reason for such poor performance Thus far, we have examined the performance of our multi-
is the scarcity of cipher suite attribute in the training instan- stage classifier using off-line analysis on captured traffic
ces, though this attribute carries a very strong signature to traces (i.e., pcap files). In this section, we discuss how one
uniquely identify an IoT device. Note that many of the IoT can realize a real-time implementation of our system taking
devices do not use secure communication at all and are thus into account the various stages involved in the analysis,
devoid of this attribute (i.e., have an empty field for it). namely attribute collection, machine training, and interpret-
Unsurprisingly, instances of devices that exchange cipher ing the classifier’s output.
suite fairly frequently including Amazon Echo, Awiar air
quality monitor, Canary camera, Google Chromecast and 6.1 Computing Attributes
Netatmo camera are correctly classified, as shown by the Extracting the attributes on-the-fly requires infrastructure
dark-green color cells in the corresponding column in that has sufficient visibility into the traffic flowing on the
1756 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 18, NO. 8, AUGUST 2019
Fig. 11. Confusion matrix of our IoT device classification using all attributes (accuracy: 99.88 percent, RRSE: 5.06 percent).
network. Flow-related attributes such as flow volume, flow duration and flow rate can be extracted relatively easily using network switches that are instrumented with special hardware-accelerated flow-level analyzers, e.g., NetFlow-capable devices [37]. We therefore deem the extraction cost of flow-related attributes to be fairly low, and show them via blue color bars in Fig. 12c, which depicts the relative costs and merits of the various attributes.
Attributes including bag of port numbers, sleep-time, and frequency of DNS/NTP requests can be extracted using flow-aware network switches with extra computation and state management. For example, remote port numbers of all flows associated with a given IoT device need to be recorded for the bag of port numbers. However, this specific state is not captured by default in commodity switches. Similarly, time intervals between successive UDP packets of NTP/DNS should be recorded, which requires additional computation. We therefore associate these attributes with medium cost, and show them as yellow color bars in Fig. 12c.
Lastly, two of our attributes, namely bag of domain names and bag of cipher suite strings, can only be extracted by looking inside the payload of the appropriate packets, which imposes considerable cost on processing. Thus, we associate these attributes with high collection cost, and show them via red color bars in Fig. 12c.
Having understood the extraction cost of various attributes, let us now examine the relative importance of the attributes in classifying the IoT devices. We quantify the importance of each attribute by employing the select attributes tool in Weka with the InfoGain attribute evaluator and the Ranker search method. Fig. 12c shows the attributes in decreasing order of merit score. A high merit score translates to superior strength in identifying the class of an instance. We can see that “flow-volume” is the most important attribute, followed by “bag of remote port numbers”, “bag of domain names” and “flow duration”, respectively. The sleep-time and NTP interval are the attributes with the lowest merit.
Knowing the relative cost and merit of each attribute allows us to evaluate the performance of our classifier using: (a) only low-cost attributes, (b) a combination of low- and medium-cost attributes, and (c) all attributes. The classifier accuracy and RRSE are shown in Table 3. It is seen that using only low-cost attributes results in 97.85 percent accuracy with an RRSE value of 18.63 percent; the additional use of medium-cost attributes increases accuracy to 99.68 percent and significantly reduces the RRSE to 7.7 percent; while including all attributes yields an overall accuracy of 99.88 percent and RRSE of 5.06 percent. The method can therefore be tuned to achieve an appropriate balance between attribute collection cost and accuracy/error of classification.

6.2 Training the Machine
The duration of the training data set is another source of cost incurred by our classification. In Fig. 12a, we plot the accuracy of the classifier on the left y-axis and the RRSE on
Fig. 12. Operational insights for real-time implementation of our device classifier: (a) impact of training, (b) confidence-level for correct/incorrect classification, and (c) importance of attributes.
the right y-axis as a function of the number of days involved in collecting the training data set. Note that the x-axis is in log-scale and each day represents 24 instances.
It can be seen that the classifier achieves an overall accuracy of 99.28 percent with only one day of training and saturates at 99.76 percent when trained over 16 days. On the other hand, RRSE drops from 14.43 to 7.5 percent when the training duration is increased from 1 day to 16 days. It further falls to 5.82 percent when we train using 70 percent of all instances from 128 days. As mentioned in Section 5, the RRSE value is sensitive to the accuracy of individual classes. We therefore believe that if there is a balanced number of instances from various classes, our classifier would perform better in terms of RRSE.
[...]
80 percent, otherwise we need to collect more traffic (and richer instances) from that device in order to increase the confidence level.
To demonstrate the ability of our classifier in detecting changes of normal behavior, we have launched UDP reflection and TCP SYN attacks of varying rates on the Samsung camera. When our classifier is fed these attributes during the attack, it incorrectly identifies the device, but its confidence-level drops to less than 50 percent. We note that the confidence level is 100 percent for normal traffic from the Samsung camera, as shown in the last column of Table 2. Such a drop in confidence is taken as a sign of anomalous behavior that warrants further investigation by the network operator.
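The confidence-level rules described above can be sketched as a small decision function. The Python sketch below is illustrative only and is not the implementation used in this study: the names `interpret`, `Verdict`, `ACCEPT_THRESHOLD` and `ANOMALY_THRESHOLD` are hypothetical, the 80 percent value is assumed to act as an acceptance threshold (the surrounding text is incomplete), and the 50 percent value is taken from the attack experiment on the Samsung camera.

```python
from dataclasses import dataclass

# Assumed thresholds; both are illustrative readings of the text, not
# parameters published with the classifier.
ACCEPT_THRESHOLD = 0.80   # assumed acceptance level ("80 percent")
ANOMALY_THRESHOLD = 0.50  # confidence seen to drop below this under attack


@dataclass
class Verdict:
    device: str        # class label returned by the classifier
    confidence: float  # confidence level in [0, 1]
    action: str        # suggested operator action


def interpret(device: str, confidence: float, known_device: bool = False) -> Verdict:
    """Map a classifier output to an operator action (hypothetical policy).

    known_device marks a device that was previously identified with high
    confidence, so that a sharp drop can be flagged as anomalous.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if known_device and confidence < ANOMALY_THRESHOLD:
        action = "investigate"            # possible attack or malfunction
    elif confidence >= ACCEPT_THRESHOLD:
        action = "accept"                 # trust the predicted class
    else:
        action = "collect-more-traffic"   # gather richer instances first
    return Verdict(device, confidence, action)


if __name__ == "__main__":
    print(interpret("Samsung camera", 1.00, known_device=True).action)  # accept
    print(interpret("Samsung camera", 0.45, known_device=True).action)  # investigate
```

Keeping the two thresholds separate reflects the two distinct uses of the confidence level in the text: deciding whether a label can be trusted, and detecting a departure from a device's normal behavior.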
[3] A. Schiffer, How a fish tank helped hack a casino, 2017. [Online]. Available: https://goo.gl/SAHxCX
[4] Ms. Smith, University attacked by its own vending machines, smart light bulbs & 5,000 IoT devices, 2017. [Online]. Available: https://goo.gl/cdNJnE
[5] S. Alexander and R. Droms, “DHCP Options and BOOTP vendor extensions,” Internet Requests for Comments, RFC Editor, RFC 2132, Mar. 1997. [Online]. Available: https://tools.ietf.org/rfc/rfc2132.txt
[6] A. Sivanathan, et al., “Characterizing and classifying IoT traffic in smart cities and campuses,” in Proc. IEEE Infocom Workshop Smart Cities Urban Comput., May 2017, pp. 559–564.
[7] S. Notra, et al., “An experimental study of security and privacy risks with emerging household appliances,” in Proc. M2MSec, Oct. 2014, pp. 79–84.
[8] F. Loi, et al., “Systematically evaluating security and privacy for consumer IoT devices,” in Proc. ACM CCS Workshop IoT Security Privacy, Nov. 2017, pp. 1–6.
[9] I. Andrea, et al., “Internet of Things: Security vulnerabilities and challenges,” in Proc. IEEE Symp. Comput. Commun., Jul. 2015, pp. 180–187.
[10] K. Moskvitch, “Securing IoT: In your smart home and your connected enterprise,” Eng. Technol., vol. 12, no. 3, pp. 40–42, Apr. 2017.
[11] N. Dhanjani, Abusing the Internet of Things: Blackouts, Freakouts, and Stakeouts. Sebastopol, CA, USA: O’Reilly Media, 2015.
[12] E. Fernandes, et al., “Security analysis of emerging smart home applications,” in Proc. IEEE Symp. Security Privacy, May 2016, pp. 636–654.
[13] T. guardian, Why the internet of things is the new magic ingredient for cyber criminals. 2016. [Online]. Available: https://goo.gl/MuH8XS
[14] T. Yu, et al., “Handling a trillion (Unfixable) flaws on a Billion devices: Rethinking network security for the internet-of-things,” in Proc. 14th ACM Workshop Hot Topics Netw., Nov. 2015, Art. no. 5.
[15] A. Sivanathan, et al., “Low-cost flow-based security solutions for smart-home IoT devices,” in Proc. IEEE Int. Conf. Advanced Netw. Telecommun. Syst., Nov. 2016, pp. 1–6.
[16] A. Moore and D. Zuev, “Internet traffic classification using bayesian analysis techniques,” SIGMETRICS Perform. Eval. Rev., vol. 33, no. 1, pp. 50–60, Jun. 2005.
[17] M. Iliofotou, et al., “Exploiting dynamicity in graph-based traffic analysis: Techniques and applications,” in Proc. 5th Int. Conf. Emerging Netw. Experiments Technol., Dec. 2009, pp. 241–252.
[18] D. Bonfiglio, et al., “Revealing skype traffic: When randomness plays with you,” SIGCOMM Comput. Commun. Rev., vol. 37, no. 4, pp. 37–48, Aug. 2007.
[19] R. Ferdous, et al., “On the use of SVMs to detect anomalies in a stream of SIP messages,” in Proc. 11th Int. Conf. Mach. Learn. Appl., Dec. 2012, pp. 592–597.
[20] M. Z. Shafiq, et al., “A first look at cellular machine-to-machine traffic: Large scale measurement and characterization,” in Proc. ACM 12th ACM SIGMETRICS/PERFORMANCE Joint Int. Conf. Meas. Modeling Comput. Syst., Jun. 2012, pp. 65–76.
[21] N. Nikaein, et al., “Simple traffic modeling framework for machine type communication,” in Proc. 10th Int. Symp. Wireless Commun. Syst., Aug. 2013, pp. 1–5.
[22] M. Jadoul, The IoT: The network can make it or break it. 2016. [Online]. Available: https://insight.nokia.com/iot-network-can-make-it-or-break-it
[23] M. Simon and Alcatel-Lucent, Architecting Networks: Supporting IoT, 2014. [Online]. Available: https://www.slideshare.net/usmanusb/mimos-iot-twg-day1-session-ii-2nd-speaker-mathew-al
[24] M. Laner, et al., “Traffic models for machine type communications,” in Proc. 10th Int. Symp. Wireless Commun. Syst., Aug. 2013, pp. 1–5.
[25] L. Markus, N. Nikaein, P. Svoboda, M. Popovic, D. Drajic, and S. Krco, “8 - Traffic models for machine-to-machine (M2M) communications: types and applications,” Machine-to-machine (M2M) Communications, Woodhead Publishing, pp. 133–154, 2015.
[26] A. Orrevad, “M2M Traffic Characteristics: When Machines Participate in Communication,” Inf. Commun. Technol., Stockholm, Sweden, 2009.
[27] M. Lopez-Martin, et al., “Network traffic classifier with convolutional and recurrent neural networks for internet of things,” IEEE Access, vol. 5, pp. 18042–18050, 2017.
[28] D. Tegeler, et al., “BotFinder: Finding bots in network traffic without deep packet inspection,” in Proc. 8th Int. Conf. Emerging Netw. Experiments Technol., Dec. 2012, pp. 349–360.
[29] D. McGrew and B. Anderson, “Enhanced telemetry for encrypted threat analytics,” in Proc. IEEE 24th Int. Conf. Netw. Protocols, Nov. 2016, pp. 1–6.
[30] B. Anderson and D. McGrew, “Identifying encrypted malware traffic with contextual flow data,” in Proc. ACM Workshop Artif. Intell. Security, Oct. 2016, pp. 35–46.
[31] Cisco, 2017. [Online]. Available: https://github.com/cisco/joy
[32] Y. Meidan, et al., “Detection of unauthorized IoT devices using machine learning techniques,” arXiv, 2017. [Online]. Available: http://arxiv.org/abs/1709.04647
[33] OpenWrt. 2016. [Online]. Available: https://openwrt.org/
[34] M. Weiss, et al., Time-aware applications, computers, and communication systems. 2015. [Online]. Available: http://dx.doi.org/10.6028/NIST.TN.1867
[35] A. McCallum and K. Nigam, “A comparison of event models for naive bayes text classification,” in Proc. Workshop Learn. Text Categorization, 1998, pp. 41–48.
[36] E. Frank, et al., The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, 4th ed. San Mateo, CA, USA: Morgan Kaufmann, 2016.
[37] E. Vyncke and C. Paggen, LAN Switch Security: What Hackers Know About Your Switches. Indianapolis, IN, USA: Cisco Press, 2008.

Arunan Sivanathan received the bachelor’s degree from the University of Peradeniya, Sri Lanka, in 2012. He is currently working toward the PhD degree in the School of Electrical and Telecommunication Engineering, University of New South Wales (UNSW Sydney). He later joined the University of Jaffna, Sri Lanka, as a lecturer from 2013 to 2016. His primary research interests include security of Internet of Things and data analytics on machine-to-machine communication.

Hassan Habibi Gharakheili received the BSc and MSc degrees in electrical engineering from the Sharif University of Technology in Tehran, Iran, in 2001 and 2004, respectively, and the PhD degree in electrical engineering and telecommunications from UNSW in Sydney, Australia, in 2015. He is currently a postdoctoral researcher with UNSW Sydney. His research interests include network architectures, software-defined networking, and Internet of Things.

Franco Loi is currently working toward the bachelor’s degree in electrical engineering and computer science at the University of New South Wales in Sydney. His research interest includes the network security of IoT.

Adam Radford received a first class honors degree in science, majoring in computer science from UNSW. He is a distinguished systems engineer at Cisco Systems in Sydney, Australia. His background is software and automation, having spent 10 years building and automating campus networks. He then joined Cisco and has focused on a variety of technologies including voice, wireless, and data center. In recent times, his focus has been enterprise networks, specifically automation and programmability.
Chamith Wijenayake received the BSc degree in electronic and telecom engineering from the University of Moratuwa, Sri Lanka, in 2007, and the PhD degree in electrical engineering from the University of Akron, Ohio, in 2014. He is currently a lecturer with the School of Electrical Engineering and Telecommunications at UNSW. His research interests include multidimensional space-time signal processing for electronically scanned smart antenna arrays, light field signal processing, local signal approximations, and FPGA based system design for DSP applications.

Arun Vishwanath (SM’15-M’11) received the PhD degree in electrical engineering from the University of New South Wales, Sydney, Australia, in 2011. He is a lead research scientist with IBM Research in Melbourne, Australia, working in the area of IoT for energy optimization in smart buildings and IoT security. He was a visiting PhD scholar in the Department of Computer Science, North Carolina State University, in 2008. His research interests include areas of IoT applications, cybersecurity and software defined networking. He has received several awards from IBM for outstanding technical accomplishments. He is the recipient of the Best Paper Award at the ACM e-Energy 2018 conference, an appointed Distinguished Speaker of the ACM, and a senior member of the IEEE.

Vijay Sivaraman received the BTech degree from the Indian Institute of Technology in Delhi, India, in 1994, the MS degree from North Carolina State University, in 1996, and the PhD degree from the University of California at Los Angeles, in 2000. He has worked at Bell-Labs as a student fellow, in a Silicon Valley start-up manufacturing optical switch-routers, and as a senior research engineer at CSIRO in Australia. He is now a professor with the University of New South Wales in Sydney, Australia. His research interests include software defined networking, network architectures, and cyber-security particularly for IoT networks.