Applsci 11 03022 v2
Article
A Multi-Stage Classification Approach for IoT Intrusion
Detection Based on Clustering with Oversampling
Raneem Qaddoura 1 , Ala’ M. Al-Zoubi 2,3 , Iman Almomani 3,4 and Hossam Faris 3,5, *
Abstract: Intrusion detection for IoT data has attracted considerable interest from researchers
and practitioners because the security of IoT networks is crucial. Both supervised and
unsupervised learning methods are used for intrusion detection in IoT networks. This paper
proposes a three-stage approach consisting of a clustering with reduction stage, an oversampling
stage, and a classification stage that uses a Single Hidden Layer Feed-Forward Neural Network (SLFN).
The novelty of the paper resides in the technique of data reduction and data oversampling for
generating useful and balanced training data and in the hybrid use of unsupervised and
supervised methods for detecting intrusion activities. The experiments were evaluated in
terms of accuracy, precision, recall, and G-mean and divided into four steps: measuring the effect
of the data reduction with clustering, the evaluation of the framework with basic classifiers, the
effect of the oversampling technique, and a comparison with basic classifiers. The results show that
the SLFN classification technique, combined with Support Vector Machine and Synthetic Minority
Oversampling Technique (SVM-SMOTE) at a ratio of 0.9 and a k value of 3 for k-means++
clustering, gives better results than other values and other classification techniques.
Keywords: intrusion detection; IoT; internet of things; imbalanced; oversampling; IoTID20; clustering
Citation: Qaddoura, R.; Al-Zoubi, A.M.; Almomani, I.; Faris, H. A Multi-Stage Classification Approach
for IoT Intrusion Detection Based on Clustering with Oversampling. Appl. Sci. 2021, 11, 3022.
https://doi.org/10.3390/app11073022
Academic Editor: Eui-Nam Huh
feeds [16]. They did this using the IoT devices search engine (Shodan), in which several
feeds were revealed. One of these feeds showed the daily routine of an Australian family,
a man in Moscow getting ready for sleep, and a cat being fed by its owner in Japan.
Furthermore, according to CNN, none of the cameras had any security checks, so anyone
could access them. In another incident in 2017, the US FDA stated that St. Jude Medical’s
implantable cardiac devices could be hacked [17]. These devices are responsible for
monitoring patients’ heartbeat and checking for heart attacks. Consequently, hackers could
manipulate device functions, for example by draining the battery or delivering incorrect
pacing and shocks.
The huge data (big data) generated from IoT devices made the detection of such attacks
nearly impossible without proper mechanisms. One of these mechanisms is the Intrusion
Detection System (IDS), which is a defense system that is responsible for monitoring the
activities of the network in IoT devices [18].
There are several intrusion detection approaches, such as preemptive blocking, signature-
based, and anomaly detection. In recent years, anomaly-based intrusion detection has shown
superior performance over the other approaches, especially using machine learning techniques
[19]. The main idea of machine learning-based detection is to train a trustworthy activity
model and compare it against the new behavior. Machine learning is known to have better
generalization properties than other conventional methods, given the potential to train hardware
and application configurations.
Due to the big data generated from IoT devices, machine learning algorithms are considered
the optimal approach for dealing with such data, given their ability to deliver meaningful
interpretations and predictions as well as deep analysis of the data patterns [20]. The authors
of [21] stated that developing a computational approach that can detect different types of
cyber-attacks requires an intelligent, data-driven intrusion detection system based on
machine learning techniques. Mishra et al. [22] reported that a signature-based intrusion
detection system can be bypassed simply by modifying the attack slightly, whereas
machine learning-based methods can detect these variations because they learn the
characteristics of the traffic flow activity. Further, such methods can capture the complicated
properties of these attacks by learning their behavior, and they improve both the speed and
the detection performance compared with traditional intrusion detection systems.
Well-known datasets have been presented for intrusion detection, namely CICIDS2017 [23],
UNSWNB15 [24], and ISCX2012 [25]. However, none of these datasets are collected from an
IoT environment. Several works from previous years have started to consider intrusion de-
tection datasets in the IoT environment, such as BoT-IoT [26] and DS20S [27]. Nevertheless,
with the growth of IoT devices and novel attack techniques in recent years, it has become a
necessity to update and upgrade the datasets to reflect such techniques. Besides, available
IoT intrusion datasets lack a large number of features. Thus, recent datasets are introduced
such as LITNET-2020 [28] and IoTID20 [29]. The LITNET-2020 dataset was collected from
the KTU LITNET network to present the normal and attack network traffic, while the data
gathered for IoTID20 were generated from various sources such as smartphones, laptops,
tablets, and smart home devices. The IoTID20 focuses more on the daily home usage
devices, while the LITNET-2020 concentrates on the academic network traffic. Therefore,
in this study, we considered the IoTID20 dataset to investigate the IoT intrusion detection
for in-home environments.
Machine learning methods are generally split into two types: supervised and unsupervised
learning. The first type is the more common one and is also referred to as classification,
where the algorithm learns from the dataset labels; in other words, it has the answer keys
of the attack types to evaluate the detection accuracy on the training data. Several works
adopted the supervised learning method [30,31]. For example, Alharbi et al. [32] proposed
a malware cyberattacks detection system in the IoT environment. Their approach consists
of several components, including a traffic analysis unit using the supervised learning
method. They applied the decision tree classification algorithm to detect suspicious traffic
flow. The proposed approach shows effective detection of malicious attacks with little
Appl. Sci. 2021, 11, 3022 3 of 19
bandwidth consumption and a low response time. Verma and Ranga [33] applied the
supervised learning method to investigate the detection of DoS attacks in IoT devices.
They employed three well-known datasets to evaluate the classification model: NSL-KDD,
CIDDS-001, and UNSWNB15. The outcome of the approach showed better performance
than the other methods. Supervised learning is useful when attacks are known; however,
with the evolution of the attacking techniques alongside emerging new (unknown) ones, it
becomes more difficult to detect them.
Another interesting work investigates webshell detection in the IoT environment using
ensemble methods [34]. The authors applied three types of ensemble techniques to enhance
the machine learning model’s performance: voting, extremely randomized trees (ET),
and random forest (RF). The study found that RF and ET are better for lightweight IoT
scenarios, while the voting method is better for heavyweight scenarios.
Furthermore, previous works lack two main aspects: the use of novel IoT intrusion
detection datasets and the testing of approaches that combine supervised and unsupervised
learning. Our approach fills this gap by studying a novel dataset with a combined learning
approach. Unsupervised learning, which is also referred to as clustering, can work alongside
supervised techniques to handle unknown novel attacks [22]. Unsupervised learning can be
described as a method that extracts and finds hidden patterns in an unlabeled dataset [35].
Consequently, it identifies similar characteristics of current and new IoT attacks and divides
them into groups using the clustering machine learning method.
The main contribution of this paper can be summarized as follows:
• We propose a multi-stage approach for classifying the intrusion and normal activities
by applying clustering, reduction, oversampling, and classification techniques.
• A unique reduction with clustering technique is applied on the IoT training data to
undersample the data while maintaining a representative dataset for training.
• Oversampling is applied to the dataset to resolve the imbalanced distribution of the
classes in the data.
The remainder of this paper is organized as follows. Section 2 shows recent studies
on intrusion detection and recent datasets found in related works. Section 3 discusses the
k-means clustering technique, the SLFN, and the oversampling techniques, which are used in
the proposed approach. The IoTID20 dataset used in the experiments is presented in Section 4.
Section 5 discusses in detail the three main stages of the approach, namely data reduction
with clustering, oversampling, and classification with SLFN stages. The experiments and
results are presented and discussed in Section 6. Finally, Section 7 concludes the work.
2. Related Works
In the past few years, IoT has been utilized in different and essential fields, including
healthcare, industry, education, smart cities, agriculture, and retail. Numerous capabilities
have led IoT to be applied in various applications [36]. Besides, IoT can help users select the
best possible opportunity in any scenario, e.g., cloud resources, management, and decision
making [37], thus gaining more attention in academia and industry. For instance, Iqbal
et al. [38] proposed an IoT wearable sensor-based device to monitor human health. They used
several objects: wearable sensors, activity recognition, and smartphones. Zielonka et al. [39]
presented an IoT convection system for small houses. Their approach, which is a remote
platform control system, is responsible for collecting sensor readings all over the house from
users and optimizing its parameters using computational intelligence to enhance the IoT
convection system to improve family comfort. Further, Kamble et al. adopted IoT to recognize
the barriers in the retail supply chain for food in India [40]. Abdel-Basset et al. investigated
the smart education environment through IoT [41]. Ahmed et al. [42] proposed network
architecture for controlling the agricultural places in rural areas.
This increase in IoT utilization and employment makes it hard to control the safety
and privacy of connected devices. Thus, the intrusion detection in IoT becomes more
crucial. Da Costa et al. [43] suggested that network security is essential due to the high
access to critical information that may cause substantial business losses. Thus, upgrading
the field of intrusion detection in IoT is a real necessity [43].
Fu et al. [44] presented an intrusion detection technique in the IoT environment.
The approach is used to detect and report different types of attacks, namely reply-attack,
jam-attack, and false-attack. Further, they developed experiments to validate the proposed
approach against the RADIUS application. Ioulianou et al. [45] implemented an intrusion
detection system for IoT network protection. The work adopted the signature-based
intrusion detection method as well as distributed and centralized modules. Additionally,
they designed a DoS scenario using the Cooja simulator and proved that these types of
attacks might affect the availability of IoT devices.
On the other hand, applying machine learning for intrusion detection has shown,
in many applications, better performance than other approaches. da Costa et al. [43]
presented a review of machine learning-based intrusion detection in the IoT environment.
They surveyed more than 95 papers in the literature that applied machine learning to deal
with the issue of IoT intrusion detection. Another recent review presents and analyzes
different machine learning and deep learning-based methods for identifying the
intrusion activities of IoT applications [46]. Both reviews emphasize the advantage of
machine learning techniques over other approaches in intrusion detection problems.
Most of the works in the literature that employ machine learning techniques are
divided into two groups, namely supervised and unsupervised learning. For example,
Smys et al. introduced the supervised learning hybrid convolutional neural networks
model for intrusion detection in IoT networks [47]. Their model achieved better results
when compared against deep learning and conventional machine learning models. The
proposed approach proved that it is sensitive to the detection of IoT attacks. Another
supervised learning work developed an attack detection system using the Support Vector
Machine (SVM) as a classification model to detect any injected data in the IoT network [48].
The classification model achieved satisfactory results in terms of accuracy. Additionally,
Almomani and Alenezi [49] applied eight different data mining techniques to detect and
classify different types of DoS (Denial of Service) attacks in the context of sensor-based IoT
networks. The dataset used was created by Almomani et al. [50] and includes five types of
DoS attacks including flooding, TDMA, grayhole, and blackhole attacks. Although the
feature selection algorithm reduced 53% of the overall features, their intrusion detection
system attained high accuracy that reached 98%.
On the other hand, the detection of unknown new attacks is better performed by
unsupervised learning than supervised learning, because of its ability to group and sort
new attacks with similar characteristics. Choudhary and Kesswani [51] presented a cluster-
based intrusion detection algorithm for IoT. Their model consists of hybrid intrusion
detection for detecting sinkhole and forwarding attacks. The model obtained 96.3% true
positive rate and 6.1% false positive rate. Bostani and Sheikhan [52] proposed an un-
supervised optimum-path forest algorithm for intrusion detection in IoT. The proposed
approach contains two intrusion detection methods, which are specification-based and
anomaly-based. The specification-based method analyzes the host nodes and transfers
their results to the root node, whereas the anomaly-based method applies the clustering
models using the transfer data. Further, through a voting mechanism, the hybrid proposed
model identifies the suspicious behavior. The results show that the proposed method
acquired 76.19% and 5.92% true positive and false positive rates, respectively.
Notwithstanding, most of the IoT datasets in intrusion detection suffer from the class-
imbalance issue that causes poor performance of traditional machine learning approaches.
Consequently, some researchers have tried to solve this problem. Telikani and Gandomi
suggested a cost-sensitive stacked auto-encoder (CSSAE) approach to handle the imbalance
problem in IoT intrusion detection systems [53]. CSSAE produced a cost for each class
that depends on the distribution of the classes. Then, an auto-encoder with a two-layer
stack was applied to learn the differences between majority and minority classes. Their
approach can be employed in both the binary and multi-class data. CSSAE achieved
better results when compared with other intrusion detection systems against KDD CUP
99 and NSL-KDD datasets. Ullah and Mahmoud [54] investigated intrusion detection
in IoT networks using the two-level hybrid model to identify the irregular activity. The
model utilizes the Synthetic Minority Oversampling Technique (SMOTE) to apply the
oversampling technique on CICIDS2017 and UNSW-15 datasets. The experiment results
obtained by the model are competitive with 100% for CICIDS2017 and 99% for UNSW-15
in terms of precision, recall, and F-score. Shahriar et al. [55] also addressed the class-imbalance
issue in IoT intrusion detection systems. They used a generative adversarial network
(GAN) as a model to overcome the difficulties of imbalanced classes. They argued that their
approach performs better in detecting attacks than standalone intrusion detection
systems. Moreover, the IoTID20 dataset was tested by Maniriho et al. [56], who classified three
different subsets: normal traffic and DoS attack, normal traffic and MITM, and normal
traffic and Scan attack. They did not test the dataset with the classification of normal
activities and all the categories of intrusion activities (including Mirai, DoS, MITM, and
scan attacks) at once, which is considered in our work.
Therefore, due to the scarcity of research on this matter, we combine both supervised
and unsupervised learning methods on the recently published IoTID20 dataset. We address
the imbalanced data issue with reduction and oversampling techniques, which ensure
expressive and balanced training data by shrinking the training data at one stage using
a reduction technique and enlarging it at another stage using oversampling.
3. Preliminaries
This section discusses the introductory information needed for understanding the
main components of the proposed approach. It includes a discussion of the k-means
clustering algorithm, the SLFN algorithm, and the oversampling technique.
$$\underset{s}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in s_i} \lVert x_j - c_i \rVert^2 \tag{1}$$
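As a minimal sketch of what Equation (1) measures, the snippet below evaluates the within-cluster sum of squared distances for a given assignment of points to clusters. The function name `kmeans_objective` and the toy points are illustrative assumptions, and each centroid is taken to be the mean of its cluster, which is the usual k-means convention rather than something stated in the surrounding text.

```python
def kmeans_objective(points, labels, k):
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for i in range(k):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            continue  # an empty cluster contributes nothing
        # Centroid c_i: coordinate-wise mean of the members (assumed convention).
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        total += sum(
            sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members
        )
    return total


# Hypothetical toy data: two tight groups far apart along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
compact = kmeans_objective(points, [0, 0, 1, 1], k=2)  # groups kept together
mixed = kmeans_objective(points, [0, 1, 0, 1], k=2)    # groups split apart
print(compact, mixed)
```

The assignment that keeps nearby points together yields the smaller objective, which is the quantity the clustering step tries to minimize.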
on the features of the instances, or creating new instances of the minority class from the
instances that are harder to learn.
Figure 1. Oversampling technique of the two classes to enlarge the minority class.
[Figure: SLFN diagram with input nodes F1, F2, F3, ..., Fn, one hidden layer, and two output nodes, Intrusion and Normal.]
The task of training the neural network aims at giving high-quality classification
results by finding the relationships between the inputs and the outputs because the actual
relationship in most cases cannot be recognized by traditional techniques [60]. This is done
by tuning the weights and biases of the neural network to give better mapping, and thus
increasing the accuracy of the predicted labels, which is measured by a cost function.
For each neuron in the hidden layer, a weighted sum is calculated as the sum of the
product between the input values and the corresponding values of the weights, which can
be observed by the following equation:
$$WS_j = \sum_{i=1}^{N} w_{ij} F_i + b_J \tag{2}$$
where $WS_j$ is the weighted sum for a neuron $j$, $N$ is the number of input nodes, $F_i$ is the
value of an input node $i$, $w_{ij}$ is the weight connecting input node $i$ with hidden neuron $j$,
and $b_J$ is the bias of the hidden layer $J$.
The value of the neuron $j$ is the value of activating the weighted sum $WS_j$. This value
is then considered as input value for the output layer. The weighted sum is then applied
on the output layer to generate the predicted value for each neuron in the output layer. The
predicted value is then compared to the expected value to find the error generated by the
network for each label.
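The forward pass described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the helper names `weighted_sums` and `slfn_forward`, the one-bias-per-neuron layout, and all numeric weights are hypothetical (a single shared layer bias is the special case where all entries of a bias list are equal).

```python
import math


def sigmoid(z):
    """Logistic activation; the choice of activation is an assumption."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_sums(inputs, weights, biases):
    """Return WS_j = sum_i weights[i][j] * inputs[i] + biases[j] for each j."""
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        for j in range(len(biases))
    ]


def slfn_forward(features, w_hidden, b_hidden, w_out, b_out):
    """Single hidden layer: activate the hidden sums, then the output sums."""
    hidden = [sigmoid(ws) for ws in weighted_sums(features, w_hidden, b_hidden)]
    return [sigmoid(ws) for ws in weighted_sums(hidden, w_out, b_out)]


# Hypothetical network: 2 input features, 2 hidden neurons, 2 outputs
# (one per label, Intrusion and Normal).
features = [1.0, 2.0]
w_hidden = [[0.5, -1.0], [0.25, 0.5]]  # w_hidden[i][j]: input i -> hidden j
b_hidden = [0.0, 0.0]
w_out = [[1.0, -1.0], [-1.0, 1.0]]
b_out = [0.0, 0.0]

hidden_sums = weighted_sums(features, w_hidden, b_hidden)
scores = slfn_forward(features, w_hidden, b_hidden, w_out, b_out)
print(hidden_sums, scores)
```

Training would then adjust the weights and biases to reduce the error between these scores and the expected labels; that step is omitted here.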
The type of DoS attack considered is the one that usually targets the TCP-based
connections (Transmission Control Protocol-based connections) by flooding synchronized
(SYN) packets. SYN packets are usually used to build TCP connections between the
communicating parties by reserving resources, mainly ports and buffers at both sides.
It can be utilized to attack the availability of the server and/or the victim machines.
Moreover, DDoS attacks in the context of IoT Mirai were implemented through flooding
acknowledgment, HyperText Transfer Protocol (HTTP), and User Datagram Protocol (UDP)
packets. In addition, brute force attack was executed to break the encrypted data and
expose its secrecy.
MITM was also performed to poison the Address Resolution Protocol (ARP) table and
map the Internet Protocol (IP) address of the router with the Media Access Control (MAC)
address of the attacker. This allows the attacker to impersonate the network router and
interfere with the communications among the network entities. The purpose of this attack
is mainly sniffing or manipulating the transmitted data.
In general, before conducting any attack, the scanning phase should take place. This
is part of the active reconnaissance that discovers the running services on the victims’
machines and the type of operating systems they are using. This can be conducted through
port scanning and Time to Live (TTL) analysis. Knowing this information helps the attacker
identify the vulnerabilities of these services to attack them successfully and cause severe
damage to the IoT system and its resources.
In this study, we considered the intrusion identification label for the IoTID20 dataset
indicating identification of the intrusion activities from normal ones. The number of
intrusion activities is approximately 15 times the normal ones, having the value of 585,710
for the intrusion label compared to the value of 40,073 for the normal label. Thus, it is an
imbalanced dataset.
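The imbalance can be checked with simple arithmetic on the two counts quoted above. The variable names are illustrative only.

```python
# Class counts as reported for the IoTID20 intrusion identification label.
intrusion = 585_710
normal = 40_073

ratio = intrusion / normal                    # intrusion instances per normal one
normal_share = normal / (intrusion + normal)  # fraction of normal instances

print(f"intrusion/normal ratio: {ratio:.1f}")
print(f"share of normal instances: {normal_share:.1%}")
```

The ratio comes out between 14 and 15, consistent with the "approximately 15 times" statement, and the normal class accounts for well under a tenth of all instances.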
5. Proposed Approach
In this section, the components of the proposed approach are discussed. Figure 3
illustrates the steps proposed for detecting the intrusion attacks. As observed in the figure,
the IoT dataset is split into 2/3 and 1/3 divisions indicating the training and testing
portions, which is one of the most common split strategies for classification [61]. The
training portion is then considered for the following three main stages of the proposed
approach:
• Data reduction with clustering
• Oversampling
• Classification with SLFN
These stages are further discussed in the following sections. Finally, the generated
model is evaluated using the testing data portion.
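A minimal sketch of the 2/3 and 1/3 split follows. It assumes a simple seeded random shuffle; whether the original split was shuffled or stratified by label is not stated here, and the function name `split_two_thirds` is hypothetical.

```python
import random


def split_two_thirds(rows, seed=0):
    """Shuffle a copy of the rows and cut it into 2/3 train and 1/3 test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    cut = (2 * len(rows)) // 3
    return rows[:cut], rows[cut:]


# Hypothetical usage with nine placeholder instances.
train, test = split_two_thirds(range(9))
print(len(train), len(test))
```

Only the training portion would then go through the reduction, oversampling, and classification stages, so the test portion keeps its original class distribution.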
The proposed approach is represented by Algorithm 1. The algorithm accepts the
dataset and several other values including the number of clusters (k), the oversampling
ratio, and the reduction percentage. The training and testing split is presented by Line 1.
The three stages reduction with clustering, oversampling, and classification are presented
on Lines 2–4, 5, and 6, respectively. Then, the testing portion is predicted on Line 7
by the model generated from the classification stage. Finally, the evaluation process is
performed on Line 8 generating the Accuracy (ACC), Precision (PREC), Recall (REC), and
G-mean (GM).
Algorithm 1: SLFN-SVM-SMOTE
Input: dataset, k, ratio, reduction%
Output: ACC, PREC, REC, GM
1 train, test = split(dataset)
2 clusters = k-means++(train, k)
3 updated-clusters = reduce(clusters, reduction%)
4 reduced-dataset = aggregate(updated-clusters)
5 oversampled-dataset = SVM-SMOTE(reduced-dataset, ratio)
6 model = SLFN(oversampled-dataset)
7 predicted-labels = predict(model, test)
8 ACC, PREC, REC, GM = evaluate(predicted-labels)
[Figure 3 diagram: the IoT dataset (Intrusion and Normal instances) is split into Train (2/3) and Test (1/3). The training portion passes through reduction with clustering (k-means++ clustering, reduction applied on each cluster, aggregation of clusters), then SVM-SMOTE oversampling to produce the oversampled dataset, then SLFN classification. The resulting model is evaluated on the test portion.]
Figure 3. The proposed SLFN-SVM-SMOTE technique including the three stages reduction with
clustering, oversampling, and classification.
[Figure 4 diagram: clustering with k-means++, followed by a reduction of 10%, followed by aggregation.]
Figure 4. Data reduction with clustering stage using k-means++, a reduction by 10%, and the
aggregation step.
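The k-means++ variant named in the caption differs from plain k-means in how the initial centers are picked: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. The snippet below sketches only that seeding step; the function names and toy points are assumptions for illustration, and the subsequent k-means iterations are omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, seed=0):
    """Pick k initial centers using distance-squared weighted sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical toy data: three well-separated pairs of points.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (20.0, 0.0), (20.5, 0.0)]
seeds = kmeanspp_seeds(points, k=3)
print(seeds)
```

Because an already chosen point has weight zero, it cannot be drawn again, so the seeds are distinct whenever the input points are distinct.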
The updated dataset is then passed to the oversampling stage discussed in the follow-
ing section. This provides another benefit of the reduction process as it is later enlarged
at the oversampling stage, which results in minimizing the processing volume of the
computation while maintaining a useful and comprehensive dataset.
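One way to picture the reduction and aggregation steps is the sketch below. It is an assumption-laden illustration, not the authors' procedure: it keeps a random fraction of every (cluster, label) group and concatenates the survivors. The function name `reduce_by_cluster`, the `keep_fraction` parameter, and the random sampling rule are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def reduce_by_cluster(rows, cluster_ids, labels, keep_fraction=0.1, seed=0):
    """Keep a random fraction of each (cluster, label) group, then aggregate."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row, cluster, label in zip(rows, cluster_ids, labels):
        groups[(cluster, label)].append((row, label))
    reduced = []
    for key in sorted(groups):  # fixed order keeps the result reproducible
        members = groups[key]
        n_keep = max(1, round(len(members) * keep_fraction))
        reduced.extend(rng.sample(members, n_keep))
    return reduced


# Hypothetical data: 100 instances in 2 clusters, 80 intrusion and 20 normal.
rows = list(range(100))
cluster_ids = [i % 2 for i in rows]
labels = ["intrusion" if i < 80 else "normal" for i in rows]

reduced = reduce_by_cluster(rows, cluster_ids, labels)
counts = Counter(label for _, label in reduced)
print(len(reduced), counts)
```

Sampling inside each group is what lets the reduced set keep roughly the same class proportions within every cluster as the full training set.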
5.2. Oversampling
Classification algorithms tend to perform poorly on imbalanced datasets. The
classification algorithm generates either an over-fitted model or a model
that is biased toward the majority class. Oversampling techniques are used to solve this
shortcoming, as discussed in Section 3.2. The SVM-SMOTE oversampling technique is
chosen to oversample the reduced dataset into an enlarged one, as observed in Figure 5. The
enlarged dataset is then passed to the classification stage discussed in the following section.
[Figure 5 diagram: a dataset of features and labels (Intrusion, Normal) passes through SVM-SMOTE oversampling.]
Figure 5. The SVM-SMOTE oversampling technique to increase the number of instances of the
normal class.
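The core idea behind SMOTE-family oversampling is to create synthetic minority instances by interpolating between a minority instance and one of its nearest minority neighbours. The sketch below shows that plain SMOTE-style interpolation only. It does not implement SVM-SMOTE, which additionally uses an SVM to focus on minority instances near the class boundary. The function name, the parameter `k`, and the reading of the ratio as minority size divided by majority size after resampling are assumptions.

```python
import random


def smote_like(minority, n_majority, ratio=0.9, k=3, seed=0):
    """Generate synthetic minority points until minority/majority reaches ratio."""
    rng = random.Random(seed)
    target = round(ratio * n_majority)  # desired minority size after resampling
    synthetic = []
    while len(minority) + len(synthetic) < target:
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical toy data: 4 minority (normal) points and 20 majority instances.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_majority=20, ratio=0.9)
print(len(new_points))
```

With a ratio of 0.9 and 20 majority instances, the minority class is grown to 18 instances, so 14 synthetic points are added to the original 4.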
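$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
$$PREC_I = \frac{TP}{TP + FP} \tag{4}$$
$$PREC_N = \frac{TN}{TN + FN} \tag{5}$$
$$REC_I = \frac{TP}{TP + FN} \tag{6}$$
$$REC_N = \frac{TN}{TN + FP} \tag{7}$$
$$GM = \sqrt{REC_I \times REC_N} \tag{8}$$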
where TP is the true positive indicating the number of intrusion instances which are
predicted as intrusion, TN is the true negative indicating the number of normal instances
which are predicted as normal, FP is the false positive indicating the number of normal
instances which are predicted as intrusion, and FN is the false negative indicating the
number of intrusion instances which are predicted as normal.
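The measures in Equations (3) to (8) can be computed directly from the four confusion-matrix counts, as in the following sketch. The function name `evaluate` and the example counts are hypothetical; the counts are chosen to show why GM is informative on imbalanced data.

```python
import math


def evaluate(tp, tn, fp, fn):
    """Compute ACC, per-class precision and recall, and G-mean from counts."""
    rec_i = tp / (tp + fn)  # recall of the intrusion class
    rec_n = tn / (tn + fp)  # recall of the normal class
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PREC_I": tp / (tp + fp),
        "PREC_N": tn / (tn + fn),
        "REC_I": rec_i,
        "REC_N": rec_n,
        "GM": math.sqrt(rec_i * rec_n),
    }


# Hypothetical imbalanced outcome: 1000 intrusion and 50 normal test instances,
# with most normal instances misclassified as intrusion.
scores = evaluate(tp=950, tn=10, fp=40, fn=50)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the accuracy is about 0.91 while GM is only about 0.44, because GM is pulled down by the poor recall on the minority (normal) class.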
[Figure 6 plot (partially recovered): both axes range from 0 to 100; recoverable cluster label: C1 Normal, 18085 instances.]
Figure 6. Distribution of instances before reduction.
[Figure 7 plot: both axes range from 0 to 100; cluster labels with instance counts: C1 Intrusion 33387, C1 Normal 1846, C2 Intrusion 4193, C2 Normal 892, C3 Intrusion 1609.]
Figure 7. Distribution of instances after reduction.
Table 2. Evaluation of the framework with basic classifiers including SVM, SGD, LR, and SLFN with
the values of 2, 3, and 4 for k.
[Figure 8 plot: G-mean (y-axis, approximately 0.86 to 0.94) against oversampling ratio (x-axis, 0.1 to 1.0) for SMOTE, ADASYN, SVM-SMOTE, Borderline1-SMOTE, and Borderline2-SMOTE.]
Figure 8. GM evaluation of different values of oversampling ratios and different oversampling
techniques (SMOTE, ADASYN, SVM-SMOTE, Borderline1-SMOTE, and Borderline2-SMOTE).
[Figure 9 plot: box plots of G-mean (y-axis, approximately 0.86 to 0.96) per oversampling technique (x-axis): SMOTE (r=0.9), ADASYN (r=0.6), SVM-SMOTE (r=0.9), Borderline1-SMOTE (r=0.6), and Borderline2-SMOTE (r=0.5).]
Figure 9. The box plot for the GM value for different oversampling techniques.
6.7. Discussion
In summary, the reduction with clustering, oversampling, and classification stages were
tested on the IoTID20 dataset with selected values of the oversampling ratio and of k for the
k-means clustering technique, using the SLFN classifier and the SVM-SMOTE oversampling technique.
The reduction with clustering stage produced an undersampled but representative
dataset having the same distribution of normal and intrusion activities. The SLFN
classification technique and k-means clustering with a k value of 3 performed better
than other classification techniques and other values of k, achieving the highest value of
GM. In addition, SVM-SMOTE with an oversampling ratio of 0.9 gave better results than
other oversampling techniques and other values of the oversampling ratio. Finally, the
proposed approach outperformed the other basic classifiers in terms of GM.
The experiments were limited to the classification of the intrusion identification label,
which could be extended to include the category label and the subcategory label of the
IoTID20 dataset. They also considered only the specific distribution of activities in the
IoTID20 dataset, so the approach could be tested on other datasets with different
distributions of activities. As further works on the IoTID20 dataset are published, a
comparison and empirical evaluation against them will become possible. In addition, an issue
might arise if there is a significant difference between the sizes of the resulting clusters in
the first stage (i.e., some clusters are very small compared to other clusters). In this case,
the pattern of the instances in the small clusters will be underrepresented for the training
algorithm, which may consequently lead to misclassification of similar instances.
Acknowledgments: The authors would like to acknowledge the support of Prince Sultan University
for paying the Article Processing Charges (APC) of this publication.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Qadri, Y.A.; Nauman, A.; Zikria, Y.B.; Vasilakos, A.V.; Kim, S.W. The Future of Healthcare Internet of Things: A Survey of
Emerging Technologies. IEEE Commun. Surv. Tutor. 2020, 22, 1121–1167. [CrossRef]
2. Ashton, K. That ‘internet of things’ thing. RFID J. 2009, 22, 97–114.
3. Evans, D. The internet of things: How the next evolution of the internet is changing everything. CISCO White Pap. 2011, 1, 1–11.
4. Balogh, Z.; Magdin, M.; Molnár, G. Motion Detection and Face Recognition using Raspberry Pi, as a Part of, the Internet of
Things. Acta Polytech. Hung. 2019, 16, 167–185.
5. AbuNaser, M.; Alkhatib, A.A. Advanced survey of blockchain for the internet of things smart home. In Proceedings of the 2019
IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Amman, Jordan, 9–11
April 2019; pp. 58–62.
6. Ronaghi, M.H.; Forouharfar, A. A contextualized study of the usage of the Internet of things (IoTs) in smart farming in a typical
Middle Eastern country within the context of Unified Theory of Acceptance and Use of Technology model (UTAUT). Technol. Soc.
2020, 63, 101415. [CrossRef]
7. Castañeda-Miranda, A.; Castaño-Meneses, V.M. Internet of things for smart farming and frost intelligent control in greenhouses.
Comput. Electron. Agric. 2020, 176, 105614. [CrossRef]
8. Sadiq, A.S.; Faris, H.; Ala’M, A.Z.; Mirjalili, S.; Ghafoor, K.Z. Fraud detection model based on multi-verse features extraction
approach for smart city applications. In Smart Cities Cybersecurity and Privacy; Elsevier: Amsterdam, The Netherlands, 2019;
pp. 241–251.
9. Vinayakumar, R.; Alazab, M.; Srinivasan, S.; Pham, Q.V.; Padannayil, S.K.; Simran, K. A visualized botnet detection system based
deep learning for the Internet of Things networks of smart cities. IEEE Trans. Ind. Appl. 2020, 56, 4436–4456. [CrossRef]
10. Gupta, M.; Sandhu, R. Authorization framework for secure cloud assisted connected cars and vehicular internet of things. In
Proceedings of the 23nd ACM on Symposium on Access Control Models and Technologies, Indianapolis, IN, USA, 13–15 June
2018; pp. 193–204.
11. Talboom, J.S.; Huentelman, M.J. Big data collision: the internet of things, wearable devices and genomics in the study of
neurological traits and disease. Hum. Mol. Genet. 2018, 27, R35–R39. [CrossRef]
12. Hamidi, H. An approach to develop the smart health using Internet of Things and authentication based on biometric technology.
Future Gener. Comput. Syst. 2019, 91, 434–449. [CrossRef]
13. Laxmi, A.R.; Mishra, A. RFID based logistic management system using internet of things (IoT). In Proceedings of the 2018 Second
International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 29–31 March
2018; pp. 556–559.
14. Williams, R.; McMahon, E.; Samtani, S.; Patton, M.; Chen, H. Identifying vulnerabilities of consumer Internet of Things (IoT)
devices: A scalable approach. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics
(ISI), Beijing, China, 22–24 July 2017; pp. 179–181.
15. Thamilarasu, G.; Chawla, S. Towards deep-learning-driven intrusion detection for the internet of things. Sensors 2019, 19, 1977.
[CrossRef]
16. Griffiths, J. ’Internet of Things’ or ’Vulnerability of Everything’? Japan Will Hack Its Own Citizens to Find Out. 2019. Available
online: http://epicenterla.org/amp/2019/02/03/cnn-internet-of-things-or-vulnerability-of-everything-japan-will-hack-its-
own-citizens-to-find-out/ (accessed on 26 March 2021)
17. Larson, S. FDA Confirms that St. Jude’s Cardiac Devices Can be Hacked. 2017. Available online: https://www.fox61.com/
article/news/local/outreach/awareness-months/fda-confirms-that-st-judes-cardiac-devices-can-be-hacked/520-9a16749b-
751c-4132-b019-b87959c128aa (accessed on 26 March 2021).
18. Kumar, C.S. Correlating Internet of Things. Int. J. Manag. (IJM) 2017, 8, 68–76.
19. Aljawarneh, S.; Aldwairi, M.; Yassein, M.B. Anomaly-based intrusion detection system through feature selection analysis and
building hybrid efficient model. J. Comput. Sci. 2018, 25, 152–160. [CrossRef]
20. Sarker, I.H.; Kayes, A.; Badsha, S.; Alqahtani, H.; Watters, P.; Ng, A. Cybersecurity data science: an overview from machine
learning perspective. J. Big Data 2020, 7, 1–29. [CrossRef]
21. Alqahtani, H.; Sarker, I.H.; Kalim, A.; Hossain, S.M.M.; Ikhlaq, S.; Hossain, S. Cyber Intrusion Detection Using Machine Learning
Classification Techniques. In Proceedings of the International Conference on Computing Science, Communication and Security,
Gujarat, India, 26–27 March 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 121–131.
22. Mishra, P.; Varadharajan, V.; Tupakula, U.; Pilli, E.S. A detailed investigation and analysis of using machine learning techniques
for intrusion detection. IEEE Commun. Surv. Tutor. 2018, 21, 686–728. [CrossRef]
23. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic
characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018),
Funchal, Portugal, 22–24 January 2018; pp. 108–116.
24. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network
data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra,
Australia, 10–12 November 2015; pp. 1–6.
25. Shiravi, A.; Shiravi, H.; Tavallaee, M.; Ghorbani, A.A. Toward developing a systematic approach to generate benchmark datasets
for intrusion detection. Comput. Secur. 2012, 31, 357–374. [CrossRef]
26. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the internet of
things for network forensic analytics: Bot-iot dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [CrossRef]
27. Pahl, M.O.; Aubet, F.X. All eyes on you: Distributed Multi-Dimensional IoT microservice anomaly detection. In Proceedings
of the 2018 14th International Conference on Network and Service Management (CNSM), Rome, Italy, 5–9 November 2018;
pp. 72–80.
28. Damasevicius, R.; Venckauskas, A.; Grigaliunas, S.; Toldinas, J.; Morkevicius, N.; Aleliunas, T.; Smuikys, P. LITNET-2020: An
annotated real-world network flow dataset for network intrusion detection. Electronics 2020, 9, 800. [CrossRef]
29. Ullah, I.; Mahmoud, Q.H. A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Networks. In Proceedings
of the Canadian Conference on Artificial Intelligence, online, 13–15 May 2020; Springer: Berlin/Heidelberg, Germany, 2020;
pp. 508–520.
30. Liu, J.; Kantarci, B.; Adams, C. Machine learning-driven intrusion detection for contiki-NG-based IoT networks exposed to
NSL-KDD dataset. In Proceedings of the 2nd ACM Workshop on Wireless Security and Machine Learning, Miami, FL, USA,
28 June–2 July 2020; pp. 25–30.
31. Hindy, H.; Bayne, E.; Bures, M.; Atkinson, R.; Tachtatzis, C.; Bellekens, X. Machine Learning Based IoT Intrusion Detection
System: An MQTT Case Study. arXiv 2020, arXiv:2006.15340.
32. Alharbi, S.; Rodriguez, P.; Maharaja, R.; Iyer, P.; Bose, N.; Ye, Z. FOCUS: A fog computing-based security system for the Internet of
Things. In Proceedings of the 2018 15th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas,
NV, USA, 12–15 January 2018; pp. 1–5.
33. Verma, A.; Ranga, V. Machine learning based intrusion detection systems for IoT applications. Wirel. Pers. Commun. 2019,
111, 2287–2310. [CrossRef]
34. Yong, B.; Wei, W.; Li, K.C.; Shen, J.; Zhou, Q.; Wozniak, M.; Połap, D.; Damaševičius, R. Ensemble machine learning approaches
for webshell detection in Internet of things environments. Trans. Emerg. Telecommun. Technol. 2020, 2020, e4085. [CrossRef]
35. Qaddoura, R.; Faris, H.; Aljarah, I. An efficient evolutionary algorithm with a nearest neighbor search technique for clustering
analysis. J. Ambient. Intell. Humaniz. Comput. 2020, 1–26.
36. Talavera, J.M.; Tobón, L.E.; Gómez, J.A.; Culman, M.A.; Aranda, J.M.; Parra, D.T.; Quiroz, L.A.; Hoyos, A.; Garreta, L.E. Review
of IoT applications in agro-industrial and environmental fields. Comput. Electron. Agric. 2017, 142, 283–297. [CrossRef]
37. Asghari, P.; Rahmani, A.M.; Javadi, H.H.S. Internet of Things applications: A systematic review. Comput. Netw. 2019, 148, 241–261.
[CrossRef]
38. Iqbal, A.; Ullah, F.; Anwar, H.; Ur Rehman, A.; Shah, K.; Baig, A.; Ali, S.; Yoo, S.; Kwak, K.S. Wearable Internet-of-Things platform
for human activity recognition and health care. Int. J. Distrib. Sens. Netw. 2020, 16, 1550147720911561. [CrossRef]
39. Zielonka, A.; Sikora, A.; Woźniak, M.; Wei, W.; Ke, Q.; Bai, Z. Intelligent Internet-of-Things system for smart home optimal
convection. IEEE Trans. Ind. Inform. 2020, 17, 4308–4317. [CrossRef]
40. Kamble, S.S.; Gunasekaran, A.; Parekh, H.; Joshi, S. Modeling the internet of things adoption barriers in food retail supply chains.
J. Retail. Consum. Serv. 2019, 48, 154–168. [CrossRef]
41. Abdel-Basset, M.; Manogaran, G.; Mohamed, M.; Rushdy, E. Internet of things in smart education environment: Supportive
framework in the decision-making process. Concurr. Comput. Pract. Exp. 2019, 31, e4515. [CrossRef]
42. Ahmed, N.; De, D.; Hussain, I. Internet of Things (IoT) for smart precision agriculture and farming in rural areas. IEEE Internet
Things J. 2018, 5, 4890–4899. [CrossRef]
43. Da Costa, K.A.; Papa, J.P.; Lisboa, C.O.; Munoz, R.; de Albuquerque, V.H.C. Internet of Things: A survey on machine learning-
based intrusion detection approaches. Comput. Netw. 2019, 151, 147–157. [CrossRef]
44. Fu, Y.; Yan, Z.; Cao, J.; Koné, O.; Cao, X. An automata based intrusion detection method for internet of things. Mob. Inf. Syst.
2017, 2017, 1–13. [CrossRef]
45. Ioulianou, P.; Vasilakis, V.; Moscholios, I.; Logothetis, M. A signature-based intrusion detection system for the internet of things.
Inf. Commun. Technol. Form 2018, 1–6. in press.
46. Asharf, J.; Moustafa, N.; Khurshid, H.; Debie, E.; Haider, W.; Wahab, A. A review of intrusion detection systems using machine
and deep learning in internet of things: Challenges, solutions and future directions. Electronics 2020, 9, 1177. [CrossRef]
47. Smys, S.; Abul, B.; Haoxiang, W. Hybrid Intrusion Detection System for Internet of Things (IoT). J. ISMAC 2020, 2, 190–199.
[CrossRef]
48. Jan, S.U.; Ahmed, S.; Shakhov, V.; Koo, I. Toward a lightweight intrusion detection system for the internet of things. IEEE Access
2019, 7, 42450–42471. [CrossRef]
49. Almomani, I.; Alenezi, M. Efficient Denial of Service Attacks Detection in Wireless Sensor Networks. J. Inf. Sci. Eng. 2018,
34, 977–1000.
50. Almomani, I.; Al-Kasasbeh, B.; Al-Akhras, M. WSN-DS: A dataset for intrusion detection systems in wireless sensor networks.
J. Sens. 2016, 2016, 1–16. [CrossRef]
51. Choudhary, S.; Kesswani, N. Cluster-Based Intrusion Detection Method for Internet of Things. In Proceedings of the 2019
IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates,
3–7 November 2019; pp. 1–8.
52. Bostani, H.; Sheikhan, M. Hybrid of anomaly-based and specification-based IDS for Internet of Things using unsupervised OPF
based on MapReduce approach. Comput. Commun. 2017, 98, 52–71. [CrossRef]
53. Telikani, A.; Gandomi, A.H. Cost-sensitive stacked auto-encoders for intrusion detection in the Internet of Things. Internet Things
2019, 100122, in press. [CrossRef]
54. Ullah, I.; Mahmoud, Q.H. A two-level hybrid model for anomalous activity detection in IoT networks. In Proceedings of the 2019
16th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 11–14 January 2019;
pp. 1–6.
55. Shahriar, M.H.; Haque, N.I.; Rahman, M.A.; Alonso, M., Jr. G-IDS: Generative Adversarial Networks Assisted Intrusion Detection
System. arXiv 2020, arXiv:2006.00676.
56. Maniriho, P.; Niyigaba, E.; Bizimana, Z.; Twiringiyimana, V.; Mahoro, L.J.; Ahmad, T. Anomaly-based Intrusion Detection
Approach for IoT Networks Using Machine Learning. In Proceedings of the 2020 International Conference on Computer
Engineering, Network, and Intelligent Multimedia (CENIM), Surabaya, Indonesia, 17–18 November 2020; pp. 303–308.
57. Qaddoura, R.; Faris, H.; Aljarah, I.; Castillo, P.A. Evocluster: an open-source nature-inspired optimization clustering framework
in python. In Proceedings of the International Conference on the Applications of Evolutionary Computation (Part of EvoStar),
Seville, Spain, 15–17 April 2020; pp. 20–36.
58. Qaddoura, R.; Faris, H.; Aljarah, I. An efficient clustering algorithm based on the k-nearest neighbors with an indexing ratio.
Int. J. Mach. Learn. Cybern. 2020, 11, 675–714. [CrossRef]
59. Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for learning from imbalanced data: Progress and challenges, marking
the 15-year anniversary. J. Artif. Intell. Res. 2018, 61, 863–905. [CrossRef]
60. Yang, J.; Ma, J. Feed-forward neural network training using sparse representation. Expert Syst. Appl. 2019, 116, 255–264.
[CrossRef]
61. Dobbin, K.K.; Simon, R.M. Optimally splitting cases for training and testing high dimensional classifiers. BMC Med. Genom. 2011,
4, 1–8. [CrossRef] [PubMed]
62. Lemaître, G.; Nogueira, F.; Aridas, C.K. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in
Machine Learning. J. Mach. Learn. Res. 2017, 18, 1–5.
63. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.;
et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.