Real Time Intrusion Detection System For IoT Networks
2021 6th International Conference for Convergence in Technology (I2CT) | 978-1-7281-8876-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/I2CT51068.2021.9417815
Authorized licensed use limited to: Univ Sannio. Downloaded on December 03,2021 at 16:53:46 UTC from IEEE Xplore. Restrictions apply.

Abstract—The proliferation of IoT devices has piqued the interest of adversaries looking for new ways to gain unauthorized access to systems or to pursue other illicit goals. As a result, protecting these devices is essential. An Intrusion Detection System (IDS) acts as a second line of defense after the firewall and can be beneficial in IoT networks. This paper presents a Real Time Intrusion Detection System based on the Random Forest machine learning model, set up for an IoT node consisting of an Arduino, a NodeMCU and an ultrasonic sensor. Unlike most systems, which train and test the model only on data from a dataset, this system has also been tested with real time network traffic. The dataset is self-made, created by monitoring the traffic of our own IoT network, rather than one of the popular datasets that are not IoT specific.

Keywords—Intrusion detection, Internet of Things, Network security, Machine learning, Real Time IDS

I. INTRODUCTION
With the Internet of Things and wireless networks becoming mainstream and ubiquitous, the major concern in IoT systems is handling the security of IoT devices and protecting data from attacks. Protection from these attacks is a challenge due to the heterogeneity of devices and protocols, the direct exposure of devices to the internet and the resource constraints on devices. Security solutions that can provide real time attack detection and mitigation are in demand. The goal of our system is to address this security gap by implementing an IDS to provide a comprehensive security solution for IoT networks.

An Intrusion Detection System is a proactive tool used to automatically detect and classify intrusions, attacks or violations of security policies at the network level, the host level or in a hybrid infrastructure in a timely manner. In order to defend against cyber attacks in IoT, our paper presents a real time IDS for IoT networks using the random forest algorithm to identify 5 different types of attacks: Wrong Setup, Distributed Denial of Service, Data Type Probing, Scan Attack and Man in the Middle.

II. RELATED WORK

A. Literature Survey
Valerio Morfino and Salvatore Rampone have presented a random forest based solution that focuses on the popular SYN DoS attack [12]. They used Apache Spark, which is efficient in handling big data, to create a real time system. However, the performance is measured based on accuracy and not on other important factors such as false alarm rates, which tend to be higher in systems like these.

Shiver Chawla and Geethapriya Thamilarasu proposed an independent integrated intrusion detection system which provided security as a service and could be integrated into any network [9]. The proposed system was also peer-to-peer and consolidated. Machine learning algorithms were used to detect anomalous network behavior in real time, but detection accuracy needed improvement and the system did not categorize attacks into different types.

Nilam Upasani and Hari Om came up with a modified neuro-fuzzy classifier implemented on a modern GPU [7]. That helped them achieve a considerable speedup for the training, classification and recognition phases. They had a low false positive rate and recall time; however, the dataset used, KDD Cup'99, was not IoT specific, and using GPUs is not suitable for most IoT devices.

Muder Almiani, Alia AbuGhazleh, Amer Al-Rahayfeh and Saleh Atiewi proposed a fog computing based deep recurrent neural network for an IoT intrusion detection system [5]. It is trained and tested using the NSL-KDD dataset. It adopted a recurrent neural network trained by an advanced version of the backpropagation algorithm, where each network was adaptively tuned to different parameters to enhance detection of specific attack types. Although this model showed a high sensitivity to DoS attacks, it proved insufficient to detect other types of attacks.

Felipe de Almeida Florencio, et al. have created an IDS using a Multilayer Perceptron NN and tested it on an Arduino [3]. They did test the model on a low powered device; however, the dataset used was NSL-KDD, which is not specific to IoT devices. Also, the tests were done on the dataset and not on real time network traffic.

Muhammad Ashfaq Khan, Md. Rezaul Karim and Yangwoo Kim proposed a scalable and hybrid intrusion detection system using the ISCX-UNB dataset [6]. It is based on a Convolutional-LSTM network, which acted as the misuse detection model for both local and global latent threat signatures, and Spark ML, which acted as the anomaly detection module. ML classification algorithms such as DT, RF, GBT and SVM were used for detecting attacks, which showed better accuracy and less computational complexity. However, attack detection was not performed on real-time streaming data.

Tariq Ahamad Ahanger, Usman Tariq and Muneer Nusir have presented a real time system using edge computing, where the processing is transferred to the edge of the network, i.e. to one server/computer of the network [11]. They show lower RAM and CPU consumption than Snort and BroIDS, which is expected since the project is being compared to full-blown production systems.

Vinayakumar R, Mamoun Alazab, Soman KP, Prabaharan Poornachandran, Ameer Al-Nemrat, and Sitalakshmi proposed a hybrid, scalable framework employing a deep learning model with a DNN, which was chosen by comparing its performance with classical machine learning classifiers [13]. The DNN model performed well on KDDCup 99 and was applied to other datasets such as NSL-KDD, UNSW-NB15, Kyoto, WSN-DS and CICIDS 2017 to conduct the benchmark. However, this hybrid model is not IoT specific and does not give detailed information on the structure and characteristics of malware. Overall, the performance can be further improved by training more complex DNN architectures.
TABLE I. DATASET - IOT TRAFFIC
Feature        Description                Example
frame.number   serial number of frame     31
frame.time     receiving time of frame    Apr 11, 2020 18:57:12.387597272 IST
frame.len      frame length               241
eth.src        source MAC address         97:21:ea:d4:cc:2a
eth.dst        destination MAC address    61:fe:8b:ac:ef:31
ip.src         source IP address          192.168.0.106
ip.dst         destination IP address     129.146.49.110
ip.proto       IP protocol                6
ip.len         IP length                  227
tcp.len        TCP length                 175
tcp.srcport    TCP port of source         51246
tcp.dstport    TCP port of destination    443
_ws.col.Info   summary of information     Application Data

b) Model Creation: The generated dataset was filtered for erroneous data in dataFilteration.py. Resampling was done in resampling.py to deal with the issue of fewer attack data points than normal data. The dataset was then divided into training and testing sets, and a Random Forest classifier was trained on the training data in model.py. The model was tested using the test data and later optimized. It is used in consumers.py in the Django backend for real time attack classification.

c) Attack Distribution: A total of 5 attacks were performed during the training and testing phase. The following table describes these attacks along with their frequency.

TABLE II. ATTACK DISTRIBUTION
Id   Attack                          Description                                                                    Frequency   Oversampled Frequency
0    Normal                          Normal traffic                                                                 79035       79035
1    Wrong Setup                     IoT node wrongly set up                                                        7691        82285
2    Distributed Denial of Service   Multiple malicious devices blocking the services to deny the legitimate user   16596       79020
3    Data Type Probing               Sending data with the wrong data type (String instead of Integer)              209         79002
4    Scan Attack                     Reconnaissance of the open system ports before the actual attack               21612       79052
5    Man in the Middle               Intercepting traffic between two nodes without their knowledge                 15          79032

5) Notifications: The notification module is responsible for displaying a notification to the user whenever an attack has been detected in the IoT network. A web socket (attackNotif ws) is created between the React.js front end (FE) and the Django backend (BE) to send notification data from BE to FE whenever an attack is detected. The web socket starts the real time read loop in consumers.py, which takes the data from file.csv, processes it and continuously feeds it to the ML model. The model classifies the traffic (normal or attack) in real time and sends a notification if an attack is detected. This notification is displayed in the front end.

6) Network Logs: The network logs module is used to display the traffic in the IoT network in real time. Similar to the notifications module, a web socket is created which initializes the loop in consumers.py that fetches traffic data from file.csv in real time. This data is sent to the FE where it is displayed. It updates in real time so the user can monitor IoT network traffic.

7) Dashboard: The dashboard module displays the distance data received from the IoT node with the help of graphics, thereby simulating the applications of the IoT network. A web socket is created between the FE and the PHP script websockets.php. An initiation message is sent, which triggers the loop in the PHP script; the script reads the data from datastorage.txt and continuously sends it to Dashboard.js, where the node activity is shown.

8) Visualisations: The visualisations module showcases the data exploration, preprocessing and comparison results from the dataset and ML model. It contains the following visuals: ML model comparison, types of attack, types of protocol, correlation matrix, confusion matrix and effects of oversampling. The results were obtained in the Python files dataFilteration.py, resampling.py, modelCompare.py and dataVisual.py and are displayed in the visualisations module.

D. Algorithm used
Random Forest is a supervised machine learning algorithm that can be used for classification as well as regression [1]. As the name suggests, it is a forest made up of many decision trees, in which the choice of the root node and the splitting of the feature nodes are randomized. Compared to other ML classifiers it has several advantages: it is less prone to overfitting, it can handle missing values, it can be modelled for categorical values, it achieves very high accuracy and it is flexible enough to use in real time. Because of these advantages, we used a random forest classification model on the training set with 20 trees in the forest, Gini impurity as the split criterion, a random state of 3 and a maximum tree depth of 3. The test set results are then predicted using the classifier object. This classifier object is saved as a pickle (.pkl) file, which is then imported in consumers.py to classify the data in real time.

E. Performance Metrics
The metrics used for gauging the performance of our system are as follows [10].

1) Confusion Matrix: This can be used to understand the correctness and accuracy of the model. All the following performance measures are calculated on the basis of the confusion matrix. It has the actual values on the X-axis and the predicted values on the Y-axis. The following table shows the confusion matrix that we used with 6 classes (normal + 5 attacks). The labels used are the Id encodings from Table II, Attack Distribution.

TABLE III. CONFUSION MATRIX - FOR 3 (DTP ATTACK)
                 Actual
Predicted   0    1    2    3    4    5
0           TN   TN   TN   FN   TN   TN
1           TN   TN   TN   FN   TN   TN
2           TN   TN   TN   FN   TN   TN
3           FP   FP   FP   TP   FP   FP
4           TN   TN   TN   FN   TN   TN
5           TN   TN   TN   FN   TN   TN
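To make the one-vs-rest reading of Table III concrete, the following minimal Python sketch derives the four counts for a single class from a multiclass confusion matrix. It assumes the orientation described for Table III (rows are predicted labels, columns are actual labels) and uses the values of Table IV as input. The function name one_vs_rest_counts is hypothetical and is not part of the project code.

```python
from typing import Dict, List

# Values of Table IV. Assumed orientation: rows = predicted class,
# columns = actual class, as described for Table III.
TABLE_IV: List[List[int]] = [
    [23309, 105, 0, 34, 53, 62],
    [0, 24815, 0, 0, 0, 0],
    [0, 0, 23576, 0, 0, 0],
    [0, 0, 0, 23675, 0, 0],
    [0, 0, 0, 0, 23530, 0],
    [0, 0, 0, 0, 0, 23947],
]


def one_vs_rest_counts(matrix: List[List[int]], cls: int) -> Dict[str, int]:
    """Collapse a multiclass confusion matrix into TP/FP/FN/TN for one class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[cls][cls]                       # predicted cls, actually cls
    fp = sum(matrix[cls]) - tp                  # predicted cls, actually another class
    fn = sum(row[cls] for row in matrix) - tp   # actually cls, predicted another class
    tn = total - tp - fp - fn                   # every remaining cell
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


if __name__ == "__main__":
    # Class 3 is the Data Type Probing attack used in Table III.
    print(one_vs_rest_counts(TABLE_IV, 3))
```

For class 3 this yields TP = 23675, FP = 0, FN = 34 and TN = 119397. If the matrix were stored with actual labels as rows instead, the FP and FN values would simply swap.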
a) TP: True Positives are the packets that were actually the attack, here the Data Type Probing attack {3}, and were predicted as that attack {3} as well.
b) TN: True Negatives are the packets that were actually not the attack {0,1,2,4,5} and were predicted as any other attack {1,2,4,5} or as normal {0}.
c) FP: False Positives are the packets that were not the attack {0,1,2,4,5} but were predicted as that attack {3}.
d) FN: False Negatives are the packets that were actually the attack {3} but were predicted as not the attack {0,1,2,4,5}.

2) Accuracy: the fraction of correctly predicted values, either attack or normal, out of the total. For multiclass classification, the accuracy of the model is the average per-class accuracy, where k is the number of classes. Accuracy alone can be deceiving, since a highly accurate model can have low recall and would overall be a bad model.

$$\text{Accuracy (model)} = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \qquad (2)$$

3) Precision: the number of packets correctly identified as the attack out of all the packets predicted as the attack. This is calculated on a per class basis.

$$\text{Precision (class)} = \frac{TP}{TP + FP} \qquad (3)$$

4) Recall: the number of packets correctly identified as the attack out of all the actual attacks. This is also calculated on a per class basis. This metric is important here, since a low recall would mean failing to identify potential attacks.

$$\text{Recall (class)} = \frac{TP}{TP + FN} \qquad (4)$$

5) F1-score: the harmonic mean, rather than the arithmetic mean, of precision and recall. It gives a combined score that gives importance to recall as well as precision.

$$\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (5)$$

IV. RESULTS

A. Confusion Matrix

TABLE IV. CONFUSION MATRIX
                 Actual
Predicted   0       1       2       3       4       5
0           23309   105     0       34      53      62
1           0       24815   0       0       0       0
2           0       0       23576   0       0       0
3           0       0       0       23675   0       0
4           0       0       0       0       23530   0
5           0       0       0       0       0       23947

B. Classification Report
The classification report combines the performance metrics precision, recall, F1-score and accuracy in one table. The report shows the performance of the model before and after oversampling. The measures with 'b' after them are the ones that were taken before oversampling.

TABLE V. CLASSIFICATION REPORT - DATASET TESTING
Id   Precision b   Precision   Recall b   Recall   F1-score b   F1-score
0    1.00          1.00        0.99       0.99     0.99         0.99
1    0.85          0.99        1.00       1.00     0.92         1.00
2    1.00          1.00        1.00       1.00     1.00         1.00
3    0.00          1.00        0.00       0.99     0.00         1.00
4    1.00          1.00        1.00       1.00     1.00         1.00
5    0.00          1.00        0.00       1.00     0.00         1.00
Accuracy b   0.9892138063279002
Accuracy     0.9973748149803111

Although the accuracy remains similar, there is a drastic improvement in precision, recall and F1-score for attacks 3 (Data Type Probing) and 5 (Man in the Middle), and to some extent for 1 (Wrong Setup). The classification report in Table V shows the results of the tests conducted on the split dataset, whereas the classification report in Table VI shows the results of tests conducted on real time traffic. The actual and predicted values were collected while performing all the attacks one after the other and were passed to the classification report function.

TABLE VI. CLASSIFICATION REPORT - REAL TIME TESTING
Id   Precision   Recall   F1-score   Support
0    1.00        0.91     0.95       53769
1    0.01        1.00     0.02       27
2    1.00        1.00     1.00       1224
3    0.97        1.00     0.98       700
4    0.00        1.00     0.01       5
5    0.00        0.00     0.00       0
Accuracy   0.9117810677433826

It can be observed that the accuracy remained fairly high at 91.18%, but the real picture of the model's performance is obtained by looking at the other metrics. The F1-scores for types 0 (normal), 2 (DDoS) and 3 (Data Type Probing) are very high, showing good overall performance; however, the other types did poorly on the F1-score. The recall on 1 (Wrong Setup) and 4 (Scan Attack) is very high, which shows that all the actual attacks of these types were correctly classified. Since the precision is low, however, many more packets were predicted as these attacks than actually belonged to them, i.e. the model had a high false positive rate.

V. CONCLUSION
An Intrusion Detection System has been developed for IoT networks that can be used in real time to improve security. Using this system, the user can fetch their IoT network data, monitor network traffic and get notified when an intrusion is detected in the network. This has been achieved using the Random Forest ML model. The low latency that makes the system real time is achieved through web sockets. The model suffers from reduced precision for some attacks such as Scan Attack and MITM, leading to a high rate of false alarms, which is common with these types of models. However, it shows a high accuracy of 91.18% in real time testing and correctly classifies most of the attacks.
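As a quick sanity check of the harmonic-mean definition of the F1-score, the short Python sketch below recomputes the F1-scores of Table VI from the precision and recall values reported there. The helper name f1_score is hypothetical, and because the inputs are already rounded to two decimals, the results are only expected to agree with the table after rounding. Classes 4 and 5 are left out since their reported precision rounds to 0.00.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; taken as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in Table VI, keyed by class Id.
TABLE_VI = {
    0: (1.00, 0.91),  # normal
    1: (0.01, 1.00),  # Wrong Setup
    2: (1.00, 1.00),  # DDoS
    3: (0.97, 1.00),  # Data Type Probing
}

if __name__ == "__main__":
    for cls, (precision, recall) in TABLE_VI.items():
        print(cls, round(f1_score(precision, recall), 2))
```

Class 1 illustrates why accuracy alone is not enough here: a recall of 1.00 cannot compensate for a precision of 0.01, so the F1-score stays at about 0.02.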
VI. FUTURE WORK
More work can be done in the future to reduce the false alarm rates and make the model more robust. The system can be enhanced by adding more protocols and attacks, which will help in covering more types of IoT devices. New types of attacks can be added during training to make the model more extensive. The system can also be tested with a higher number of nodes in the IoT network.

ACKNOWLEDGMENT
We thank Prof. M.R Dhage for her expert guidance and continuous encouragement throughout to see that adequate research had been conducted to approve this project. We would also like to thank our teammates, Sahil Dixit and Sanika Patil, who worked alongside us to finish the project within the required time frame.

REFERENCES
[1] G. Biau, “Analysis of a Random Forests Model,” Journal of Machine Learning Research, pp. 1063-1095, 2012.
[2] C. Callegari, E. Bucchianeri, S. Giordano and M. Pagano, “Real Time Attack Detection with Deep Learning,” 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Boston, MA, USA, pp. 1-5, 2019.
[3] F. de Almeida Florencio, E. D. Moreno, H. Teixeira Macedo, R. J. P. de Britto Salgueiro, F. Barreto do Nascimento and F. A. Oliveira Santos, “Intrusion Detection via MLP Neural Network Using an Arduino Embedded System,” 2018 VIII Brazilian Symposium on Computing Systems Engineering (SBESC), Salvador, Brazil, pp. 190-195, 2018.
[4] K.V.V.N.L Sai Kiran, R.N. Kamakshi Deviesetty, N. Pavan Kalyan, K. Mukundini and R. Karthi, “Building a Intrusion Detection System for IoT Environment using Machine Learning Techniques,” Elsevier B.V., pp. 2372-2379, 2020.
[5] M. Almiani, A. AbuGhazleh, A. Al-Rahayfeh, S. Atiewi and A. Razaque, “Deep recurrent neural network for IoT intrusion detection system,” Elsevier B.V., vol. 101, November, pp. 102031, 2019.
[6] M. Ashfaq Khan, Md. R. Karim and Y. Kim, “A Scalable and Hybrid Intrusion Detection System Based on the Convolutional-LSTM Network,” Symmetry, pp. 583, April 2019.
[7] N. Upasani and H. Om, “A modified neuro-fuzzy classifier and its parallel implementation on modern GPUs for real time intrusion detection,” Applied Soft Computing, vol. 82, Elsevier B.V., June 2019.
[8] R. Hattarki, S. Houji, S. Dixit and S. Patil, “Real Time Intrusion Detection System for IoT Networks using Random Forest,” GitHub Repository, 2020, https://github.com/s3r-be/be-project
[9] S. Chawla and G. Thamilarasu, “Security as a Service: Real-time Intrusion Detection in Internet of Things,” ACM ISB, April, pp. 2-4, 2018.
[10] S. Mohammed, “Performance Metrics for Classification Problems in Machine Learning,” Feb. 2019, medium.com/@MohammedS/performance-metrics-for-classification-problems-in-machine-learning-part-i-b085d432082b.
[11] T. A. Ahanger, U. Tariq and M. Nusir, “Real-Time Methodology for Improving Cyber Security in Internet of Things Using Edge Computing During Attack Threats,” 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, pp. 293-297, 2019.
[12] V. Morfino and S. Rampone, “Towards Near-Real-Time Intrusion Detection for IoT Devices Using Supervised Learning and Apache Spark,” Electronics, vol. 9, pp. 444, March 2020.
[13] Vinayakumar R, M. Alazab, Soman KP, P. Poornachandran, A. Al-Nemrat, et al., “Deep Learning Approach for Intelligent Intrusion Detection System,” IEEE Access, unpublished, pp. 2169-3536, 2018.