A Deep Learning Approach For IoT Traffic Multi-Classification in A Smart-City Scenario
ABSTRACT As the number of Internet of Things (IoT) devices and applications increases, the capacity
of the IoT access networks is considerably stressed. This can create significant performance bottlenecks in
various layers of an end-to-end communication path, including the scheduling of the spectrum, the resource
requirements for processing the IoT data at the Edge and/or Cloud, and the attainable delay for critical
emergency scenarios. Thus, a proper classification or prediction of the time-varying traffic characteristics
of the IoT devices is required. However, this classification largely remains an open challenge. Most of the
existing solutions are based on machine learning techniques, which nonetheless incur a high computational
cost and do not consider the fine-grained flow characteristics of the traffic. To this end, this
paper introduces the following four contributions. Firstly, we provide an extended feature set including
flow, packet and device level features to characterize the IoT devices in the context of a smart environment.
Secondly, we propose a custom weighting based preprocessing algorithm to determine the importance of the
data values. Thirdly, we present insights into traffic characteristics using feature selection and correlation
mechanisms. Finally, we develop a two-stage learning algorithm and we demonstrate its ability to accurately
categorize the IoT devices in two different datasets. The evaluation results show that the proposed learning
framework achieves 99.9% accuracy for the first dataset and 99.8% accuracy for the second. Additionally,
for the first dataset we achieve a precision and recall performance of 99.6% and 99.5%, while for the second
dataset the precision and recall attained are 99.6% and 99.7%, respectively. These results show that our
approach clearly outperforms other well-known machine learning methods. Hence, this work provides a
useful model that can be deployed in a realistic IoT scenario, where IoT traffic and devices’ profiles are predicted and
classified, while facilitating the data processing in the upper layers of an end-to-end communication model.
INDEX TERMS Deep learning, edge computing, Internet of Things, machine learning, neural networks,
traffic classification.
different types of devices and their dynamic cycle activity; and (ii) when there is a large number of IoT devices, the total communication delay may be affected on account of the constrained nature of the IoT access networks.
Hence, the importance of predicting the time-varying characteristics of the IoT devices (such as activity patterns, signaling patterns, etc.) becomes evident. Furthermore, the classification of similar devices facilitates the estimation of the generated workload and can better guarantee a specific level of Quality of Service (QoS). Therefore, by classifying the IoT devices into different categories, the prediction of traffic characteristics can be done more efficiently. Additionally, a more accurate prediction of the resource requirements at the IoT access network (i.e., spectrum) and Edge infrastructures (i.e., computational and communication resources) can be achieved.
However, such an IoT device classification, often called device fingerprinting [6], presents several challenges. In particular, the existing IoT classification techniques do not consider the fine-grained characterization of IoT traffic, while they suffer from a high computational cost for the data extraction and processing, and are often affected by high dimensional data and complexity. Accordingly, in this paper, we propose a two-stage deep learning architecture in order to classify the IoT devices by considering a fine-grained set of network characteristics (features). To do so, firstly, we propose a two-step preprocessing algorithm while employing a feature selection and prioritization technique for the feature set under consideration. Our approach facilitates the distribution of the features across the two stages, avoiding the high dimensionality and overfitting problems of the training data.
The novelty of this paper lies in proposing a very accurate but considerably more lightweight approach than the existing ones. Furthermore, the feature selection and prioritization along with the combination of a deep learning model creates a unique and innovative approach for the problem of the IoT device classification. The novelty of our approach is strengthened by the fact that it can be generalized and applied to different datasets without losing any accuracy. Thus, the reproducibility of the results and the stability of our approach in different IoT contexts fortify the originality introduced.
In particular, the major contributions and novelty of this paper can be summarized as follows:
1) In order to perform a classification of the IoT devices, we have suggested an extended feature set comprising flow, device, and packet level features. This approach provides a fine-grained characterization of the traffic flow with less computational complexity for the classification.
2) A two-step preprocessing algorithm is proposed that assigns relevance weights to the nominal (representing the qualitative data with numeric codes) features and provides scaling of the dataset using a MinMaxScaler method.
3) A statistical feature selection technique is employed to select the features with regard to their contribution to the classification of IoT devices. Furthermore, an investigation of correlated features at each level is provided using the Pearson correlation coefficient.
4) A two-stage learning framework is presented with 99.9% accuracy for the first dataset under consideration and 99.8% for the second one, which proves the generalization of our approach. To determine the IoT device classification, we compute the classes for certain nominal and multivalued attributes at learning stage 0 using logistic regression. Following, we perform the final classification for numerical and single-valued features at stage 1 using a multilayer perceptron (MLP) neural network. The MLP network takes as input a feature subset at each time and classifies IoT devices in the context of a smart environment. Furthermore, to achieve the optimal or near-optimal MLP architecture, a random search based Keras tuner is employed.
The rest of the paper is structured as follows: Section II highlights the related work in traffic classification, covering the most important methods and technologies applied in the IoT traffic classification domain. Section III provides the system model and necessary preliminaries for comprehending the classification problem in the context of the IoT domain. Additionally, this Section covers the description of the feature sets, their statistical characteristics and feature correlation, information that is necessary for the domain of data analysis that our paper touches upon. Section IV presents the proposed two-stage learning framework for the IoT device classification problem. Section V explains the algorithmic form of the proposed preprocessing and learning model along with their asymptotic analysis. Sections IV and V fall under the domains of deep learning, machine learning and problem complexity, presenting all the necessary technical details. Section VI provides the performance evaluation results for both datasets under consideration. The conclusions and the future directions of this work are presented in Section VII. Finally, Table 1 presents the set of abbreviations used in this paper.

II. RELATED WORK
For the IoT device classification, significant emphasis has been given to aggregated traffic models, fingerprinting, and machine learning based solutions. The aggregated traffic models resort to mathematical and statistical distribution-based methods, which involve several probability distributions and mathematical techniques like stochastic processes to model the traffic. Following, the fingerprinting methods are used to identify the IoT devices leveraging information from network traces in order to correlate datasets. In particular, this category of classification identifies a device using information from the network packets during the communication over the network.
FIGURE 1. Overview of our previous work vs. proposed work contributions (shown in the purple boxes).
interarrival time. However, this approach is computationally intensive as all packet level information is utilized without any selection strategy. In [14], the traffic patterns of encrypted network flows are used to reveal the existence of a specific device inside a home network. However, obtaining such a great number of features requires specialized hardware accelerators, thus resulting in high computational cost, longer classification duration and limited scalability due to the need of a deep packet inspection functionality (limitation 2).
Some related works also employed machine learning in order to perform traffic and device classification. Lippmann et al. [15] compared the K-nearest neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT) and Multilayer Perceptron (MLP), using the packet header information, and concluded that KNN and DT provide better results. Kotak and Elovici [16] classified nine different device flows based on the device type using an artificial neural network. Regarding traffic classification, the authors in [17] predicted the QoS behavior of five different IoT applications in a smart building context, using several regression based ML approaches.
The work in [18] shows how to classify traffic and perform device identification using random forest. The list of key features used in the classification included the packet size, volume of packets, inter-arrival time, duration, urgent and push flags. Additionally, the authors in [19] performed a prediction of the IoT network traffic using Long Short-Term Memory (LSTM). The features of the dataset consisted of the timestamp, bytes count, and the packet count. A more comparative approach was introduced in [20], where the authors presented a method to recognize the IoT devices using random forest, decision tree, SVM, k-nearest neighbors, simple neural network and naive Bayes approaches.
Lopez-Martin et al. [21] classified the traffic applications using a multi-class neural network, which is proven to be effective in complex data structures. The authors in [22] proposed an individual binary classification model for each class in order to eliminate the complexity issue of multi-class classification. Sivanathan et al. [23] utilized the statistical attributes, signaling patterns and cipher suites along with machine learning for IoT device classification.
Nonetheless, these ML approaches are affected by the high data dimensionality, they are sensitive to the hyper-parameter tuning and they require a large number of training data. Moreover, the main constraint of the multi-class classification is scalability, as the high number of classes makes the classifier more complex and updating requires full retraining (limitation 3). A summary of the papers reviewed in this section is given in Table 2.
In our preliminary work [24], we tried to address some of these limitations by relying on typical machine learning techniques, such as logistic regression and gradient boosting. In this paper, we extend our preliminary framework to provide a more complete and detailed IoT multi-classification approach based on a deep learning solution. As this research is an extension of our previous study, we used the same IoT dataset [23]. However, in order to prove the generalization of our proposed methodology we also performed our experiments with a second IoT dataset [25]. Additionally, herein, we include a more extended feature set at three different levels, namely: device, flow and packet.
This work also introduces a feature correlation mechanism, whereby specific features are selected for the training models, which was not included in our previous work. Furthermore, for the new two-stage learning framework, we apply an optimally searched neural network architecture at the second stage. Finally, a completely new performance evaluation section is presented. The particular section includes a new set of results for both datasets, new experiments, and additional comparisons with machine learning and deep learning approaches. The differences between our previous and proposed work are given in Fig. 1.
TABLE 3. Summary of the key notation.
The extensions made in this paper are aligned in such a way as to address the above-cited limitations:
• To overcome limitation 1, we incorporate a fine-grained
feature set at different network levels i.e., flow, device
and packet level.
• To address limitation 2 and the high computational costs
of complex features, we employ a statistical feature
selection (i.e., ANOVA score) to select a subset of the
available features at a time instance t.
• To address limitation 3, we propose a two-stage learning framework. Firstly, a relevance weighting-based preprocessing is performed on the available features, and then different subsets of the selected features are utilized across the two stages to avoid the high dimensionality issue. Finally, the tuned hyperparameters are utilized in a neural network that achieves 99.9% accuracy for the first dataset and 99.8% for the second.
TABLE 4. Description of features in both datasets.
TABLE 5. Statistical characteristics of IoT traffic features.
1) Preprocessing the IoT Traffic (Section IV-B): It is the first step executed and it aims at providing the weighted preprocessing of the dataset along with the rescaling, imputation and transformation of the traffic traces.
2) Selecting the most relevant features (Section IV-C): It consists of the selection of the most important features, which are highly correlated to the class labels, using the ANOVA filter based selection method.
3) Two-stage learning model (Section IV-D): Here the classification of the IoT traffic traces is done using the stage 0 and stage 1 classifiers described below.

B. DATA PREPROCESSING
During the data preprocessing, a basic filtering of the dataset is performed in order to remove some of the non-meaningful packets such as ping, DNS requests, etc. The features such as TTL, window size and packet length are already numerical, whereas the interarrival time feature is converted to seconds. Following, we observed that some of the features, such as the ‘‘set of port numbers (f5 and f6)’’, the ‘‘set of IP addresses (f2 and f3)’’ and the ‘‘set of MAC addresses (f10 and f11)’’, are nominal and multi-valued (having more than one value within a single data instance). As machine learning classifiers cannot deal with such data, we converted these features into a numerical form using a two-step procedure.
FIGURE 3. Correlation between IoT traffic features of Dataset 1.
Firstly, we perform the data cleaning by passing the nominal vectors to the Bag-of-Words (BoW) model [26]. Secondly, as the BoW assigns the same importance to each vector word, we have proposed a relevance weighting to assign a prioritized importance to each word within each vector. These relevance weights, attributed to each feature vector, are passed to the stage 0 classifier and are given by Eq. (2):

Relevance Weight = wf_{w,v} × vf_{w,v}    (2)

where wf_{w,v} denotes the word frequency of a word w within a vector v and vf_{w,v} represents the total vector frequency. Herein, the vectors consist of the ‘‘port numbers vector’’, the ‘‘IP addresses vector’’, and the ‘‘MAC addresses vector’’. The word frequency wf_{w,v} is defined as the number of times that w occurs in v and is given by Eq. (3):

wf_{w,v} = (number of occurrences of the word w in a vector v) / (number of words in that vector)    (3)

Because frequent words are less informative than rare words, the vector frequency vf_{w,v} is given by Eq. (4):

vf_{w,v} = log( (number of vectors) / (number of vectors containing the word w) )    (4)

After this step, we impute the missing values of the features using their mean value and re-scale the dataset between 0 and 1 using the MinMaxScaler technique.
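To make this preprocessing step concrete, the following minimal Python sketch illustrates how the relevance weighting of Eqs. (2)-(4) and the subsequent mean imputation and MinMaxScaler rescaling could be implemented with scikit-learn. The column names and the toy port values are hypothetical illustrations, not the actual data or code of the paper.

```python
# Minimal sketch (hypothetical column names and toy values) of the relevance-weighting
# preprocessing: Bag-of-Words per nominal multi-valued feature, wf (Eq. 3) x vf (Eq. 4)
# relevance weights (Eq. 2), mean imputation and MinMax scaling.
import math
from collections import Counter

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Each row is one traffic instance; "ports" plays the role of a nominal, multi-valued feature.
instances = [
    {"ports": ["443", "53", "443"]},
    {"ports": ["80", "443"]},
    {"ports": ["1883"]},
]

def relevance_weights(vectors, word):
    """Return wf x vf of `word` for every vector (Eqs. 2-4)."""
    n_vectors = len(vectors)
    n_containing = sum(1 for v in vectors if word in v)
    vf = math.log(n_vectors / n_containing) if n_containing else 0.0   # Eq. (4)
    weights = []
    for v in vectors:
        wf = Counter(v)[word] / len(v)                                 # Eq. (3)
        weights.append(wf * vf)                                        # Eq. (2)
    return weights

port_vectors = [inst["ports"] for inst in instances]
vocabulary = sorted({w for v in port_vectors for w in v})

# Numerical representation: one relevance-weight column per word of the BoW vocabulary.
X = np.array([relevance_weights(port_vectors, w) for w in vocabulary]).T

# Mean imputation (a no-op on this toy data) followed by rescaling to [0, 1].
X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
X_norm = MinMaxScaler().fit_transform(X)
print(X_norm)
```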
C. FEATURE SELECTION
The supervised feature selection is a way to choose the input features that are believed to be the most useful to a model in order to predict the target variable. For our supervised feature selection method, we can resort to either wrapper methods or filter based methods. A wrapper based method, such as Recursive Feature Elimination (RFE), selects the features that are performing well.
FIGURE 5. Operational flow of the proposed work.
However, for the selection of features from our feature space F, we employed the filter-based feature selection technique [27], which uses statistical methods to score the relationship between the features and the target labels, i.e., the class labels. Specifically, we have selected the ANOVA (Analysis of Variance) F-value feature selection technique because our input features are quantitative or become quantitative after preprocessing and the target class labels are of categorical nature (i.e., c1 indicates a belkin wemo switch, c2 represents a smart cam, and so on).
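As an illustration of the filter-based selection described above, the short sketch below scores features with the ANOVA F-value using scikit-learn's SelectKBest with f_classif. The number of retained features k and the synthetic data are assumptions made for the example, not values prescribed by the paper.

```python
# Minimal sketch of ANOVA F-value filter selection with scikit-learn.
# The value of k and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the preprocessed, rescaled IoT feature matrix and device labels.
X, y = make_classification(n_samples=500, n_features=11, n_informative=6,
                           n_classes=5, random_state=0)

selector = SelectKBest(score_func=f_classif, k=6)   # keep the 6 highest-scoring features
X_selected = selector.fit_transform(X, y)

print("ANOVA F-scores:", np.round(selector.scores_, 2))
print("Selected feature indices:", selector.get_support(indices=True))
```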
D. PROPOSED TWO-STAGE LEARNING MODEL
1) STAGE 0 CLASSIFIER
The Logistic Regression method is employed at stage 0, which takes the selected set of features for the training, as given by the ANOVA F-value. The reason that we have selected this classifier is that it has been proven to perform well for very large data sets [28], as in the case of a smart environment. The logistic regression technique investigates the association among the independent variables and the dependent variables of the problem. In our scenario, the selected features are the independent variables and the device categories (e.g., hubs, cameras, etc.) are the dependent variables. The goal is to estimate the probability p for a combination of independent variables using the following logit function:

logit(p) = ln( p / (1 − p) )    (5)

where ln is the natural logarithm and p denotes the probability of an independent variable. The antilog of (5) allows us to find the estimated regression equation given by Eq. (6):

logit(p) = ln( p / (1 − p) ) = β0 + β1 x1 + β2 x2 + . . . + βn xn ⇒
p = e^(β0 + β1 x1 + β2 x2 + . . . + βn xn) / (1 + e^(β0 + β1 x1 + β2 x2 + . . . + βn xn))    (6)

where β0 is the intercept, β1, β2, . . . , βn are the regression coefficients, x1 is the first independent variable, x2 is the second independent variable, and xn is the nth selected feature. In order to calculate the β coefficients, we employed the Gradient Descent method [29]. The general form of Eq. (6) is given as:

p(yi | x1, x2, . . . , xn) = 1 / (1 + e^(−(β0 + β1 x1 + β2 x2 + . . . + βn xn)))    (7)

where yi represents the dependent variable, i.e., the ith IoT device class, which we predict based on x1, x2, . . . , xn. After calculating the regression coefficients, the testing component comes into effect, where the classifier uses the regression coefficients and computes the estimated regression for each testing instance using Eq. (7). Finally, the stage 0 classifier performs a first tentative prediction.

2) STAGE 1 CLASSIFIER
At stage 1, a multilayer perceptron (MLP) neural network performs the final classification using the remaining features together with the stage 0 prediction. In the MLP, w^(l)_{i,j} denotes the weight of a connection between the ith neuron of layer l and the jth neuron of layer l−1; B^(l)_i represents the bias value applied at the lth layer for the ith neuron; O^(l)_i denotes the output of the ith neuron at the lth layer; and V^(l) represents the nonlinear activation function applied at layer l. This work applied the Rectified Linear Units (ReLU) activation function at the input layer and the softmax activation function at the output layer.
The above process continues till the output layer predicts a label, i.e., the class of an IoT device, which is then compared with the actual label and a loss value is calculated using a loss function based on the categorical cross entropy. Secondly, a back propagation is done in which the weights are updated using the predicted output, the desired output and their difference. The goal is to minimize the loss by finding the optimal weight values. The optimization function that we applied is based on the Adaptive Moment Estimation (Adam) because it is proven to be very robust for large datasets [31].
To model an optimal MLP-ANN, we used the Keras tuner [32] along with the Random Search technique. For the hyper-parameter optimization, we determine the optimal number of hidden layers, the optimal number of neurons in each layer (i.e., a search between 22 and 512 neurons), and the learning rate (i.e., a search between 1e-2 and 1e-4) using a random search tuner. Following, these parameters are passed to the Adam optimizer, since we want to achieve the best performance along with the least computational complexity.
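The following condensed Python sketch shows how such a two-stage pipeline could be wired together with scikit-learn and KerasTuner: a logistic regression produces the tentative stage 0 prediction, which is appended to the remaining features before an MLP, whose depth, width and learning rate are explored with a random search, makes the final decision. The synthetic data, the search ranges and the helper names are illustrative assumptions rather than the exact configuration used in the paper.

```python
# Minimal sketch of the two-stage idea: stage 0 logistic regression on the ANOVA-selected
# nominal-derived features, stage 1 MLP (tuned with KerasTuner random search) on the
# numerical features plus the stage 0 prediction. Data, ranges and names are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from tensorflow import keras
import keras_tuner as kt

# Synthetic stand-ins: X0 plays the role of the weighted nominal features,
# X1 the role of the numerical packet/flow features, y the device classes.
X0, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                            n_classes=4, random_state=0)
X1 = np.random.default_rng(0).normal(size=(2000, 5))
X0_tr, X0_tst, X1_tr, X1_tst, y_tr, y_tst = train_test_split(
    X0, X1, y, test_size=0.2, random_state=0)

# Stage 0: ANOVA selection + logistic regression producing a tentative class prediction.
selector = SelectKBest(f_classif, k=4).fit(X0_tr, y_tr)
stage0 = LogisticRegression(max_iter=3000).fit(selector.transform(X0_tr), y_tr)
p_tr = stage0.predict(selector.transform(X0_tr))
p_tst = stage0.predict(selector.transform(X0_tst))

# Stage 1 input: numerical features concatenated with the stage 0 prediction.
Z_tr = np.column_stack([X1_tr, p_tr])
Z_tst = np.column_stack([X1_tst, p_tst])

def build_model(hp):
    """MLP whose depth, width and learning rate are sampled by the tuner."""
    model = keras.Sequential([keras.Input(shape=(Z_tr.shape[1],))])
    for _ in range(hp.Int("hidden_layers", 1, 3)):
        model.add(keras.layers.Dense(hp.Int("units", 32, 512, step=32), activation="relu"))
    model.add(keras.layers.Dense(4, activation="softmax"))
    model.compile(optimizer=keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=3, overwrite=True, directory="tuning")
tuner.search(Z_tr, y_tr, epochs=10, validation_split=0.2, verbose=0)
best_mlp = tuner.get_best_models(num_models=1)[0]
best_mlp.fit(Z_tr, y_tr, epochs=70, verbose=0)
print("Stage 1 test accuracy:", best_mlp.evaluate(Z_tst, y_tst, verbose=0)[1])
```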
Algorithm 1 Preprocessing Algorithm
PREP(f2, f3, f5, f6, f10, f11, devices)
// f2 and f3 are the source and destination IP addresses; f5 and f6 are the source and destination port numbers; f10 and f11 are the source and destination MAC addresses; devices are the class labels.
1. BOW1 ← generate_BOW(f2, f3)
2. BOW2 ← generate_BOW(f5, f6)
3. BOW3 ← generate_BOW(f10, f11)
4. wf ← word_Freq(BOW1, BOW2, BOW3)
5. vf ← vector_Freq(BOW1, BOW2, BOW3)
6. relweight ← wf × vf
7. set x ← dataset(BOW1, BOW2, BOW3, relweight)
8. set y ← dataset(devices)
9. set xnorm ← MinMaxScaler(x)
Output: xnorm, y

Algorithm 2 Learning Algorithm
LOGREG(xnorm, y)
// xnorm is the set of dataset instances and y is the set of class labels.
1. set xtr, xtst, ytr, ytst ← split(x, y, testsize ← 0.2)
2. set xtr ← selectKBest(Anovascore, xtr)
3. set xtst ← selectKBest(Anovascore, xtst)
4. set model ← LogisticRegression(maxiter ← 3000)
5. set fit ← model.fit(xtr, ytr)
6. set ypred ← model.predict(xtst)
Output: ypred ⊳ Stage 0
MLP(ypred, f1, f4, f7, f8, f9, devices)
// ypred is the output of the Stage 0 classifier; f1 is the interarrival time; f4 is the IP protocol used; f7 is the TTL; f8 and f9 are the window size and packet length; devices are the class labels.
7. set x ← dataset(ypred, f1, f4, f7, f8, f9)
8. set y ← dataset(devices)
9. set xtr, xtst, ytr, ytst ← split(x, y, testsize ← 0.2)
10. set m ← build_model()
11. set tuner ← RandomSearch(m, tuner.obj(valacc), maxtr ← 3, searchtr ← 1)
12. set model ← tuner.getBestModel(nummodels ← 1)
13. set history ← model.fit(xtr, ytr, epochs ← 70)
14. set ypred ← model.predict(xtst)
Output: ypred: FS ← devices ⊳ Stage 1

The getBestModel() call (line 12) returns the model with the highest validation accuracy across all models given by the RandomSearch(). Finally, we fit the model with fit() for 70 epochs and then call the predict() function.

B. ASYMPTOTIC ANALYSIS
Proposition 1: The computational complexity of the PREP procedure is O(n).
Proof: The PREP procedure running time depends on the number of feature vectors, represented as n. Lines 1-3 take a constant time as they split the vectors into words, thus O(1). Lines 4-5 and 7-8 are assignment statements and each requires O(1) operations. For the relweight statement (line 6) the complexity is O(1) ∗ O(n) = O(n). However, line 9 depends on the number of feature vectors n and thus, in the worst-case scenario, needs O(n). Accordingly, the overall time complexity of the PREP procedure is linear, i.e., O(1) + O(1) + O(n) + O(n) = O(n).
Proposition 2: The computational complexity of the LOGREG procedure is O(n).
Proof: Line 1 is a simple assignment statement (i.e., O(1)) and lines 2-3 require O(n) computation time in the worst scenario. Regarding the training time (lines 4-5) of LOGREG, the complexity is O(t ∗ n), where t is the number of training examples and n is the number of selected data features used for the classifier training. Additionally, the testing time taken by line 6 is O(n). Thus, the LOGREG takes O(1) + O(n) + O(t ∗ n) + O(n) = O(n), which can be beneficial for low latency applications that require a fast classification method.
Proposition 3: The computational complexity of the MLP procedure is O(nd).
Proof: In the MLP procedure, lines 7-9 consist of simple assignments, i.e., O(1). Line 10 indicates the build_model() function of the neural network and its complexity is O(n ∗ d ∗ t ∗ e), where, for Proposition 3, n represents the number of layers, d denotes the number of neurons in each layer, t is the number of training examples and e is the number of epochs. Because we are using 80% training examples, i.e., 664796, for 70 epochs, the complexity for this part is O(n ∗ d ∗ 664796 ∗ 70) = O(nd). Following, RandomSearch() (line 11) takes O(n) for the worst scenario and line 12 takes a constant amount of time, i.e., O(1). Line 13 takes O(t) and the testing time taken by line 14 is O(n). Thus, the MLP takes O(1) + O(nd) + O(n) + O(1) + O(t) + O(n) = O(nd) time.
The overall complexity T of the proposed learning framework is represented in terms of n as: T(n) = O(n) + O(n) + O(nd) = O(n). Thus, it is a linear time learning framework.

VI. PERFORMANCE EVALUATION
A. MODEL IMPLEMENTATION AND FRAMEWORKS
1) DATASET DESCRIPTION
In this work, we have used two different datasets provided by [33] and [25], consisting of IoT traffic traces in a smart environment. The description of both datasets is provided as follows:
Dataset 1 [33] consists of network traffic traces from 28 smart devices. As we have considered a subset of the network traffic, which is a total of 12000317 labeled instances of 22 IoT devices, for this dataset we have 22 distinctive classes. The devices are namely: smart phone, belkin wemo switch, belkin wemo motion sensor, dropcam, HP printer, iphone, laptop, nest protect smoke alarm, netatmo welcome, netatmo weather station, PIX star photo frame, samsung tab, samsung smartcam, smart things, TP link camera, TP link
In the following, we delve into the details of the performance of the particular framework.
Accordingly, for the first dataset, Fig. 10 illustrates the performance metrics per device for stage 0. Some devices such as the Belkin sensor, Dropcam and TP link router present the highest performance, i.e., recall=1, precision=1 and F1-score=1, all aggregated to 3. The lowest precision is noticed for the belkin wemo switch, i.e., 0.61, while the lowest recall and F1-score are observed for the Samsung smartcam, i.e., 0.53 and 0.65 respectively. Furthermore, for the SmartCam the aggregated value is 2.04, since the F1 score is 0.65 and the recall is 0.53, whereas the precision is significantly high, i.e., 0.86. For the Netatmo weather station device, the aggregated value is 2.09 as the precision is reasonably good, i.e., 0.88, but the recall and F1 score are relatively low, i.e., 0.54 and 0.67. However, there were some devices such as the withings scale, triby speaker, nest alarm, and iPhone for which precision, recall and F1-score were zero. The reason is that the instances of such devices were misclassified in other categories.
Following, we plot the confusion matrix of dataset 1 to give the overall performance of stage 0, as shown in Fig. 11. The row entries of a confusion matrix depict the actual values and the column entries depict the predicted values for the 22 classes. All the diagonal entries correspond to correct classification, whereas entries above the diagonal are all Type I errors (also called False Positive Rate (FPR)) and entries below are Type II errors (also called False Negative Rate (FNR)). The goal is to minimize the Type I and Type II errors close or equal to zero.
At the main diagonal there are four exception cases: (i) the worst classification is noticed for the iPhone device, since 58% of the instances of the particular device were classified as Samsung galaxy tab, 22% of the instances were misclassified as TP link router, and 20% were misclassified as amazon echo, thus depicting 100% FPR; (ii) for the nest protect smoke alarm the classification value is 0% with 100% FPR because it was misclassified as Samsung tab; (iii) for the triby speaker, we notice a 28% misclassification as laptop (Type II error), and a 72% misclassification as netatmo welcome (Type II error); (iv) for the withings smart scale, we noticed an 87% misclassification as baby monitor (Type II error), a 9.6% misclassification as Samsung smartcam (Type II error), a 1.9% misclassification as Netatmo welcome, and 1.9% of the instances were incorrectly classified as belkin wemo switch.
This behavior is attributed to the following reasons: (a) there were 50 instances of iPhone compared to 3242, 87580 and 6231 instances of galaxy tab, TP link router and amazon echo; (b) 41 nest protect smoke alarm instances compared to 3242 instances of Samsung galaxy tab; (c) 771 triby speaker instances compared to 21815 laptop instances and 3995 instances of netatmo welcome; and (d) 52 withings smart scale instances compared to 5912, 4895, 3995 and 4407 instances of baby monitor, Samsung smartcam, Netatmo welcome and belkin wemo switch respectively. Thus, the prediction value for these devices is much higher as compared to iPhone, nest protect smoke alarm, triby speaker and withings scale.
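For readers who wish to reproduce this kind of per-class analysis, the short sketch below computes a confusion matrix with scikit-learn and derives per-class false positive and false negative rates from it. The toy label vectors are placeholders, not values from the datasets used in the paper.

```python
# Minimal sketch: confusion matrix and per-class FPR/FNR with scikit-learn.
# The label vectors below are toy placeholders, not results from the paper's datasets.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 3])
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 3, 3])

cm = confusion_matrix(y_true, y_pred)   # rows: actual classes, columns: predicted classes
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp                # predicted as class c but actually another class
fn = cm.sum(axis=1) - tp                # actually class c but predicted as another class
tn = cm.sum() - (tp + fp + fn)

fpr = fp / (fp + tn)                    # Type I error rate per class
fnr = fn / (fn + tp)                    # Type II error rate per class
print(cm)
print("FPR per class:", np.round(fpr, 2))
print("FNR per class:", np.round(fnr, 2))
```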
FIGURE 12. Training vs. validation accuracy of architecture V for 100 epochs.
FIGURE 14. Comparison of performance metrics for stage 1 of architecture V over 100 epochs.
a training dataset, and the computational overhead for the model training. In addition to that, a classification task is a supervised learning approach. This means that if new types of IoT devices are connected in the local network, a new cycle of data collection, annotation and training should begin in order to update the model.

VII. CONCLUSION
In this work, we studied the problem of IoT traffic classification. To solve this problem we have proposed a composite learning framework that consists of two stages. After an initial data preprocessing, the network traces are passed to stage 0, where a feature selection mechanism and a Logistic Regression classifier are applied. In particular, an ANOVA filter based selection technique decides on the most important features to be used by the stage 0 classifier. The tentative classification of the stage 0 classifier along with the remaining features is then passed to the stage 1 classifier, which uses an optimal multi-layer perceptron neural network architecture that provides the final classification.
Following, a detailed experimentation and comparison with various composite architectures on two different IoT datasets have been performed. We concluded that the proposed framework can considerably increase the performance of the classification in terms of recall, precision, F1-score, accuracy and confusion matrix metrics. Regarding the accuracy, our proposed model achieved a 99.9% accuracy for the first dataset and a 99.8% accuracy for the second dataset, proving the generalization aspects of our approach.
The particular model is of utmost importance in an IoT to Cloud continuum communication model, where different IoT devices need to be classified and their traffic profiles accurately predicted. This precise classification can positively contribute to the proper estimation of the required resources from the subsequent Edge and Cloud layers where the IoT traffic will be processed and analyzed.
The future direction of this work lies in the combination of our proposed model with a resource allocation mechanism that will be able to leverage this workload estimation and dynamically change the allocation strategy at the access and Edge networks. Finally, we aim to include other machine learning techniques such as K-means clustering along with unsupervised methods to address the limitations of classifying new and unknown types of IoT devices.

REFERENCES
[1] N. Ivanov. (2019). Unleashing the Internet of Things With In-Memory Computing—IoT Now—How to Run an IoT Enabled Business. Accessed: Jul. 7, 2021. [Online]. Available: https://www.iot-now.com/2019/01/17/92200-unleashing-internet-things-memory-computing
[2] S. C. Mukhopadhyay and N. K. Suryadevara, ‘‘Internet of Things: Challenges and opportunities,’’ in Internet of Things. Springer, 2014, pp. 1–17, doi: 10.1007/978-3-319-04223-7_1.
[3] F. Saeik, M. Avgeris, D. Spatharakis, N. Santi, D. Dechouniotis, J. Violos, A. Leivadeas, N. Athanasopoulos, N. Mitton, and S. Papavassiliou, ‘‘Task offloading in edge and cloud computing: A survey on mathematical, artificial intelligence and control theory solutions,’’ Comput. Netw., vol. 195, Aug. 2021, Art. no. 108177, doi: 10.1016/j.comnet.2021.108177.
[4] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, ‘‘A survey on mobile edge computing: The communication perspective,’’ IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2322–2358, 4th Quart., 2017, doi: 10.1109/COMST.2017.2745201.
[5] D. Dechouniotis, N. Athanasopoulos, A. Leivadeas, N. Mitton, R. Jungers, and S. Papavassiliou, ‘‘Edge computing resource allocation for dynamic networks: The DRUID-NET vision and perspective,’’ Sensors, vol. 20, no. 8, p. 2191, Apr. 2020, doi: 10.3390/s20082191.
[6] Q. Xu, R. Zheng, W. Saad, and Z. Han, ‘‘Device fingerprinting in wireless networks: Challenges and opportunities,’’ IEEE Commun. Surveys Tuts., vol. 18, no. 1, pp. 94–104, 1st Quart., 2016, doi: 10.1109/COMST.2015.2476338.
[7] O. N. Osterbo, D. Zucchetto, K. Mahmood, A. Zanella, and O. Grondalen, ‘‘State modulated traffic models for machine type communications,’’ in Proc. 29th Int. Teletraffic Congr. (ITC), Ilmenau, Germany, Sep. 2017, pp. 1–5.
[8] M. Laner, N. Nikaein, P. Svoboda, M. Popovic, D. Drajic, and S. Krco, ‘‘Traffic models for machine-to-machine (M2M) communications: Types and applications,’’ in Machine-to-Machine (M2M) Communications: Architecture, Performance and Applications, C. Antón-Haro and M. Dohler, Eds. Sawston, U.K.: Woodhead Publishing, 2020, pp. 133–154.
[9] A. Orrevad, ‘‘M2M traffic characteristics: When machines participate in communication,’’ Ph.D. dissertation, KTH Inf. Commun. Technol., Stockholm, Sweden, 2009.
[10] M. Miettinen, S. Marchal, I. Hafeez, T. Frassetto, N. Asokan, A.-R. Sadeghi, and S. Tarkoma, ‘‘IoT Sentinel demo: Automated device-type identification for security enforcement in IoT,’’ in Proc. IEEE 37th Int. Conf. Distrib. Comput. Syst. (ICDCS), Atlanta, GA, USA, Jun. 2017, pp. 2511–2514.
[11] B. Bezawada, M. Bachani, J. Peterson, H. Shirazi, I. Ray, and I. Ray, ‘‘Behavioral fingerprinting of IoT devices,’’ in Proc. Workshop Attacks Solutions Hardw. Secur., Jan. 2018, pp. 41–50.
[12] Y. Meidan, M. Bohadana, A. Shabtai, M. Ochoa, N. Ole Tippenhauer, J. Davis Guarnizo, and Y. Elovici, ‘‘Detection of unauthorized IoT devices using machine learning techniques,’’ 2017, arXiv:1709.04647. Accessed: Jul. 27, 2021.
[13] S. Aneja, N. Aneja, and M. S. Islam, ‘‘IoT device fingerprint using deep learning,’’ in Proc. IEEE Int. Conf. Internet Things Intell. Syst. (IOTAIS), Nov. 2018, pp. 174–179.
[14] N. Apthorpe, D. Reisman, and N. Feamster, ‘‘A smart home is no castle: Privacy vulnerabilities of encrypted IoT traffic,’’ 2017, arXiv:1705.06805. Accessed: Jul. 27, 2021.
[15] R. Lippmann, D. Fried, K. Piwowarski, and W. Streilein, ‘‘Passive operating system identification from TCP/IP packet headers,’’ in Proc. ICDM Workshop Data Mining Comput. Secur. (DMSEC), 2003, pp. 1–10.
[16] J. Kotak and Y. Elovici, ‘‘IoT device identification using deep learning,’’ in Proc. 13th Int. Conf. Comput. Intell. Secur. Inf. Syst. (CISIS), 2020, pp. 76–86.
[17] A. Hameed, J. Violos, N. Santi, A. Leivadeas, and N. Mitton, ‘‘A machine learning regression approach for throughput estimation in an IoT environment,’’ in Proc. 14th IEEE Int. Conf. Internet Things, Melbourne, VIC, Australia, Dec. 2021, pp. 29–36.
[18] M. R. P. Santos, R. M. C. Andrade, D. G. Gomes, and A. C. Callado, ‘‘An efficient approach for device identification and traffic classification in IoT ecosystems,’’ in Proc. IEEE Symp. Comput. Commun. (ISCC), Jun. 2018, pp. 304–309.
[19] A. Abdellah, V. Artem, A. Muthanna, D. Gallyamov, and A. Koucheryavy, ‘‘Deep learning for IoT traffic prediction based on edge computing,’’ in Proc. Int. Conf. Distrib. Comput. Commun. Netw., Moscow, Russia, 2020, pp. 18–29.
[20] M. R. Shahid, G. Blanc, Z. Zhang, and H. Debar, ‘‘IoT devices recognition through network traffic analysis,’’ in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2018, pp. 5187–5192, doi: 10.1109/BigData.2018.8622243.
[21] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret, ‘‘Network traffic classifier with convolutional and recurrent neural networks for Internet of Things,’’ IEEE Access, vol. 5, pp. 18042–18050, 2017, doi: 10.1109/ACCESS.2017.2747560.
[22] Y. Meidan, M. Bohadana, A. Shabtai, J. D. Guarnizo, M. Ochoa, N. O. Tippenhauer, and Y. Elovici, ‘‘ProfilIoT: A machine learning approach for IoT device identification based on network traffic analysis,’’ in Proc. Symp. Appl. Comput. (SAC), Marrakech, Morocco, Apr. 2017, pp. 506–509.
[23] A. Sivanathan, H. H. Gharakheili, F. Loi, A. Radford, C. Wijenayake, A. Vishwanath, and V. Sivaraman, ‘‘Classifying IoT devices in smart environments using network traffic characteristics,’’ IEEE Trans. Mobile Comput., vol. 18, no. 8, pp. 1745–1759, Aug. 2019, doi: 10.1109/TMC.2018.2866249.
[24] A. Hameed and A. Leivadeas, ‘‘IoT traffic multi-classification using network and statistical features in a smart environment,’’ in Proc. IEEE 25th Int. Workshop Comput. Aided Modeling Design Commun. Links Netw. (CAMAD), Pisa, Italy, Sep. 2020, pp. 1–7.
[25] J. Ren, D. J. Dubois, D. Choffnes, A. M. Mandalari, R. Kolcun, and H. Haddadi, ‘‘Information exposure from consumer IoT devices: A multidimensional, network-informed measurement approach,’’ in Proc. Internet Meas. Conf., New York, NY, USA, Oct. 2019, pp. 267–279.
[26] C. Zong, R. Xia, and J. Zhang, ‘‘Text representation,’’ in Text Data Mining,
1st ed. Singapore: Springer, 2021.
[27] J. Brownlee, ‘‘How to choose a feature selection method for machine
learning,’’ Mach. Learn. Mastery, 2020. Accessed: Jul. 27, 2021. [Online].
Available: https://machinelearningmastery.com/feature-selection-with-
real-and-categorical-data/
[28] K. Backhaus, B. Erichson, S. Gensler, R. Weiber, and T. Weiber, ‘‘Logis-
tic regression,’’ in Multivariate Analysis, K. Backhaus, B. Erichson,
S. Gensler, R. Weiber, and T. Weiber, Ed. Wiesbaden, Germany: Springer,
2021, pp. 267–354.
[29] M. Henry, ‘‘Review on gradient descent algorithms in deep learning
approaches,’’ J. Innov. Develop. Pharmaceutical Tech. Sci., vol. 4, no. 3,
pp. 91–95, 2021.
[30] M. Okwu and L. Tartibu, ‘‘Artificial neural network,’’ in Metaheuris-
tic Optimization: Nature-Inspired Algorithms Swarm and Computational
Intelligence, Theory and Applications, M. Okwu and L. Tartibu, Eds.
Cham, Switzerland: Springer, 2021, pp. 133–145.
[31] Scikit Learn, Neural Network Models (Supervised).
Accessed: Jul. 27, 2021. [Online]. Available: https://scikit-learn.org/stable/
modules/neural_networks_supervised.html
[32] (2020). Keras Tuner. Accessed: Jul. 27, 2021. [Online]. Available: https://keras-team.github.io/keras-tuner/
[33] University of New South Wales. IoT Traffic Traces. Accessed: Jul. 27, 2021. [Online]. Available: https://iotanalytics.unsw.edu.au/iottraces

AROOSA HAMEED received the master’s degree in computer science from Quaid-i-Azam University, Islamabad, Pakistan, in 2018. She is currently pursuing the Ph.D. degree with the Department of Software and Information Technology Engineering, Ecole de Technologie Superieure (ETS), Montreal. Her main research interests include the Internet of Things (IoT), traffic analytics, the IoT services, the IoT security, and machine learning.

JOHN VIOLOS was a Research Associate at the National Technical University of Athens, a Sessional Lecturer at the Harokopio University of Athens, and a Visiting Lecturer at the National and Kapodistrian University of Athens. He was a member of the European Commission’s Digital Single Market working group on the code of conduct for switching and porting data between cloud service providers. He is currently a Research Associate with the Department of Software Engineering and Information Technology, ETS. His research interests include deep learning, machine learning, and cloud and edge computing.

ARIS LEIVADEAS (Senior Member, IEEE) received the Diploma degree in electrical and computer engineering from the University of Patras, Greece, in 2008, the M.Sc. degree in engineering from King’s College London, U.K., in 2009, and the Ph.D. degree in electrical and computer engineering from the National and Technical University of Athens, Greece, in 2015. From 2015 to 2018, he was a Postdoctoral Researcher with the Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada. In parallel, he worked as an Intern at Ericsson and then at Cisco, Ottawa. He is currently an Associate Professor with the Department of Software and Information Technology Engineering, Ecole de Technologie Superieure (ETS), University of Quebec, Canada. His research interests include cloud computing, the IoT, and network optimization and management. He received the Best Paper Award in ACM ICPE 2018 and IEEE iThings 2021 and the Best Presentation Award in IEEE HPSR 2020.