Botnet Detection in The Internet of Things Using Deep Learning Approaches
Abstract—The recent growth of the Internet of Things (IoT) has resulted in a rise in IoT based DDoS attacks. This paper presents a solution to the detection of botnet activity within consumer IoT devices and networks. A novel application of Deep Learning is used to develop a detection model based on a Bidirectional Long Short Term Memory based Recurrent Neural Network (BLSTM-RNN). Word Embedding is used for text recognition and conversion of attack packets into tokenised integer format. The developed BLSTM-RNN detection model is compared to an LSTM-RNN for detecting four attack vectors used by the Mirai botnet, and evaluated for accuracy and loss. The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time. A labelled dataset was generated as part of this research, and is available upon request.

Index Terms—Deep Learning, LSTM, Word Embedding, IoT, Botnet, Mirai, DDoS.

I. INTRODUCTION

The Internet of Things (IoT) is expected to usher in an era of increased connectivity, with an estimated 50 billion devices expected to be connected to the Internet by 2020 [1]. At its core, the aim of the IoT is to connect previously unconnected devices to the Internet [2], thus creating smart devices capable of collecting, storing and sharing data, without requiring human interaction [3] [4]. Many of these IoT devices are aimed at consumers, who value low cost and ease of deployment over security. These market forces have resulted in IoT manufacturers omitting critical security features, and producing swathes of insecure Internet connected devices, such as IP cameras and Digital Video Recorder (DVR) boxes. Such vulnerabilities and exploits are often derived from, and epitomised by, inherent computational limitations, use of default credentials and insecure protocols. The rapid proliferation of insecure IoT devices, and the ease with which attackers can locate them using online services such as Shodan, provides an ever expanding pool of attack resources. By compromising and leveraging multitudes of these vulnerable IoT devices, attackers can now perform large scale attacks such as spamming, phishing and Distributed Denial of Service (DDoS), against resources on the Internet [5].

The rise in IoT based DDoS attacks, witnessed in recent years, will likely continue until IoT manufacturers accept responsibility and incorporate security mechanisms into their devices. Until such a time, the IoT has the potential to become the new playground for future cyber attacks and therefore presents a number of challenges. Since an increasing number of DDoS attacks seek to leverage consumer level IoT devices, the issues highlighted previously, coupled with a lack of technical knowledge or awareness of inherent vulnerabilities by owners of these devices, present one such problem. This challenge is further compounded by the lack of a convenient user interface on many consumer IoT devices, making detection and awareness of attacks in home networks practically impossible for consumers.

To substantiate this issue, we undertook preliminary research and created a secure sandboxed botnet environment. An IoT IP camera was successfully infected, and leveraged to perform a sequence of DDoS attacks against a selected target. During the infection process and attacks, the camera did not display any adverse symptoms of infection, and continued to function as expected. Remote access to the device was still possible, and performance did not appear to be degraded. Live video streaming continued to be as responsive as prior to the attacks. Without any clear signs of an infection, it was therefore confirmed that detection or awareness of botnet activity would prove very difficult within consumer networks.

Current methods of botnet detection, such as signature or flow based anomaly intrusion detection, have proved ineffective in preventing the spread of IoT botnets, largely due to simple code mutations rendering attack signatures obsolete, or a lack of protocol support (NetFlow, sFlow) within consumer networks and equipment.

This paper presents a solution to the detection of botnet activity within consumer IoT devices and networks. A novel detection model was developed based on a Deep Bidirectional Long Short Term Memory based Recurrent Neural Network (BLSTM-RNN). Detection was performed at the packet level, and focused on text recognition within features normally discarded by other flow based detection methods. Word Embedding was used for text recognition and conversion, and proved to be an effective method for predicting attack vectors. The BLSTM-RNN detection model was compared with an LSTM-RNN, and evaluated for accuracy and loss.
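The conversion of packet text into tokenised integer format can be pictured with a minimal sketch. It assumes whitespace tokenisation, a vocabulary built from the training text, and reserved ids for padding and unseen tokens; the function names and sample strings are hypothetical and are not taken from the authors' implementation.

```python
def build_vocab(texts):
    """Map each distinct lower-cased token to an integer id, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            # ids 0 and 1 are reserved for padding and unknown tokens (an assumption)
            vocab.setdefault(token, len(vocab) + 2)
    return vocab


def encode(text, vocab, length):
    """Convert text to a fixed-length list of ids: truncate, then right-pad with 0."""
    ids = [vocab.get(token, 1) for token in text.lower().split()]
    ids = ids[:length]
    return ids + [0] * (length - len(ids))


# Made-up packet descriptions, used only to exercise the functions.
training_info = [
    "Standard query A example.com",
    "Standard query response A example.com",
]
vocab = build_vocab(training_info)
encoded_known = encode(training_info[1], vocab, 6)
encoded_unseen = encode("Standard query AAAA example.com", vocab, 4)
```

Integer sequences of this kind are what an embedding layer would typically consume; the fixed length keeps every packet the same shape for the recurrent layers.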
such as network packets and malware designed to compromise specific IoT devices. Attack detection in IoT systems is [...] the special service requirements, such as low latency, resource specificity, distributed nature, mobility, to mention a few [10]. This means that conventional network attack detection has limited application in addressing IoT security problems.

According to Kaspersky Lab, in 2016 the majority of IoT [...] and Hajime malware [11]. A considerable number of zero-day attacks are continuously emerging because of the addition of various IoT protocols. Most of these attacks are small variants of previously known cyber-attacks that present a difficulty in their detection even for advanced computational intelligence mechanisms such as traditional machine learning systems.

Previous literature has suggested the potential of leveraging machine learning to enhance security threat hunting, but it is not practical to simply integrate machine learning in static and dynamic cyber security analysis due to the wide variety and distribution of IoT devices, particularly for (inexpensive) IoT devices with limited processing power [12]. On the other hand, the success of deep learning (DL) in various big data fields has attracted noticeable interest in cybersecurity fields. The application of DL has become practical because of the advances in computer architecture (e.g. NVIDIA DGX platforms) and in the development of new neural network libraries (such as Theano and TensorFlow); the availability of large and diverse training datasets has also contributed to the effectiveness of deep learning algorithms.

Deep learning (DL) has enabled several breakthroughs in conventional AI tasks in the fields of image processing, pattern recognition and computer vision. Deep networks are capable of achieving significant improvement in accuracy of classification and predictions in these complex tasks. The main benefits of deep learning are the absence of manual feature engineering, unsupervised pre-training and compression capabilities, which make the application of deep learning feasible even in resource-constrained networks. It means that the capability of DL for self-learning results in higher accuracy and faster processing, which can be effectively utilised for a novel distributed attack detection in IoT systems [13]. This is very important in the context of IoT security because such systems face a plethora of security problems, including jamming, spoofing, replaying and eavesdropping, but are also prone to issues related to resource constraints, e.g. out-of-memory accesses, unsafe programming languages, etc. [14].

Fig. 2. Botnet Architecture and Deep Learning Detection Model
[Figure labels: Scan, Load Bot, Infection, Attack Command, DNS Query, DDoS Attack, Mirrored Port, Tap0; detection model stages: Data Pre-Processing, Tokenisation, Normalisation, Modelling, Defining, Fitting, Testing and Classification, Anomaly Detection; Output: Alert User]

This research is aimed at adopting a deep learning approach to cybersecurity to enable the detection of botnet attacks. Other machine learning and evolutionary computing techniques have been successfully applied in mitigating against botnet attacks. One example is the use of swarm intelligence for destroying any rigid master-slave relationship between bots and for autonomizing the bot operating roles [15]. The evolving behaviour of botnets often enables them to circumvent the traditional detection approaches. The development of behavioural detection approaches, however, has helped in dealing with the constant change in botnet activities by finding the common patterns that botnets follow across their life cycle. For instance, all the bots need to connect to the C&C server to receive new orders, and this kind of behaviour, observed only after a long period of time, can guide the detection methods.

One implication of observing the network traffic over a long period is the necessity to successfully deal with large data sequences. Recurrent neural networks (RNN) in general, and one of their variants, the Long Short Term Memory (LSTM) network, have been proven effective in recognizing the different sequences of states that change over time, thereby bridging long time lags between relevant input and target output [16]. This type of structure is theoretically well suited and has been proven a powerful model for tagging tasks with applications in natural language processing, machine translation, image recognition, and the like [17]. A bidirectional LSTM (BLSTM), furthermore, introduces two independent layers to accumulate contextual information both from the past and the future [12]. The main contribution of this paper is the application of the variants of LSTM networks for implementing deep learning in network traffic analysis aimed at detecting botnet attacks.
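The difference between accumulating context from the past only and from both past and future can be shown with a small sketch. It deliberately swaps the LSTM cell for a plain scalar tanh recurrence to stay short, so it illustrates the bidirectional wiring only; the weights, function names and input sequences are hypothetical.

```python
import math


def run_rnn(inputs, w_in=0.5, w_rec=0.9):
    """Plain tanh recurrence (a stand-in for an LSTM cell); returns the state at each step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def run_bidirectional(inputs):
    """Pair a forward pass with a backward pass run over the reversed sequence."""
    forward = run_rnn(inputs)
    backward = run_rnn(inputs[::-1])[::-1]  # re-align backward states with time steps
    return list(zip(forward, backward))


# Two sequences that differ only in their final element.
seq_a = [1.0, 0.0, 0.0, 0.0]
seq_b = [1.0, 0.0, 0.0, 5.0]

uni_a, uni_b = run_rnn(seq_a), run_rnn(seq_b)
bi_a, bi_b = run_bidirectional(seq_a), run_bidirectional(seq_b)
```

At the first time step the unidirectional states of the two sequences are identical, because the differing element lies in the future. The backward half of the bidirectional state differs, which is the extra context a BLSTM can draw on, at the cost of a second pass over each sequence.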
and control messages between the C&C server and the infected IoT IP camera (bot) were also captured, as was normal traffic generated by the camera.

To capture packets and generate the necessary dataset, the tcpdump command tcpdump -W 5 -C 500 -w datacapture was issued, where -W stipulates that the capture be split into a maximum of five files and -C stipulates that the maximum capture file size should be 500 MB.

The necessary data was captured in a series of five separate captures, which would later be concatenated into a single dataset. The first capture (normal.pcap) consisted of normal IoT device traffic for a duration of 2 hours, and included normal device communication on the network and also two remote connections to the camera to view the video feed, each of which lasted 5 minutes.

MobaXterm was used to create a secure shell (ssh) session into the C&C server, before executing the command screen ./cnc from within the mirai/release directory, to start the MySQL database. A second remote session was used to telnet and log into the C&C server, ready to issue attack commands to the infected IoT IP camera. A third remote session was used to ssh into the Scan/Loader server, before executing the ./loader command from within the mirai/release directory, to scan the network for available IoT devices to infect.

The initial scanning process and device infection were captured in the second capture (mirai.pcap), which also included the infected camera scanning on ports 23 and 2323 for new devices to infect. The third capture (udp.pcap) consisted of a single (udp) flood attack, whereby the C&C server issued the attack command, and the infected IoT device flooded its target with bursts of (udp) packets for a total period of 60 seconds. The fourth capture (dns.pcap) consisted of a single (dns) flood attack, whereby the C&C server issued the attack command, and the infected IoT device flooded its target with bursts of (dns) packets for a total period of 60 seconds. The fifth capture (ack.pcap) consisted of a single (ack) flood attack, whereby the C&C server issued the attack command, and the infected IoT device flooded its target with bursts of (ack) packets for a total period of 60 seconds.

After capturing all five attack scenarios using the .pcap format, the capture files were converted to .csv files. In order to train and validate our detection model, the ground-truth labels norm, mirai, udp, dns and ack were assigned to the captured data, ready to be ingested into the detection model. The total number of samples captured for each attack type can be seen in Table II. The cleaned column represents the total number of samples once packets with missing data have been removed.

V. MODEL COMPARISON AND DISCUSSION

To compare our deep learning detection models, a series of four experiments was performed for each. Since a unidirectional LSTM-RNN only preserves information from the past, the aim of the comparison was to ascertain if the use of a bidirectional LSTM-RNN, which is able to accumulate contextual information from both past and future, could return better accuracy or loss metrics for our captured dataset. For Experiment 1, each attack type was split between train and validate, presented to each model and trained over a total of 20 iterations. The mean accuracy and loss metrics for each attack were measured, and are presented in Table V. As can be seen from the results, both models returned high accuracy and prediction for mirai, udp, and dns attack types. However, both returned less favourable results for ack attacks, despite this attack having the highest number of samples. This was possibly due to the nature and complexity of information in the info feature, as seen in Table IV, where the sequence numbers in each ack packet changed. Despite this, a pattern can be seen on rows one and two, where sequence numbers (59693-41058, 41058-59693) of contiguous packets were clearly linked, and packet size and Length were consistent. Unfortunately, some packets appeared out of sync, as can be seen in rows three and four, which possibly resulted in the detection model not recognising this pattern, contributing to the lower detection rate and significantly higher loss metric. By contrast, although the mirai captured packets in Table III appear to be equally complex, the information in the info feature remained largely the same, possibly aiding better detection.

Since multi-vector DDoS attacks were highlighted as being a growing issue in Section II, Experiment 2 consisted of norm,
Dataset                        Train    Validate  BLSTM Accuracy (%)  LSTM Accuracy (%)  BLSTM Loss  LSTM Loss
Mirai                          387060   208418    99.998992           99.571605          0.000809    0.027775
UDP                            391002   210540    98.582144           98.521440          0.125630    0.125667
ACK                            411384   221515    93.765198           93.765198          0.858700    0.858773
DNS                            391622   210874    98.488289           98.488289          0.116453    0.116453
Multi-Vector (with ACK)        419887   226094    91.951002           91.951002          0.841303    0.841381
Multi-Vector (without ACK)     395564   212996    97.521033           97.521033          0.115293    0.115293
Multi-Vector (with three ACK)  468534   252289    92.243513           92.243513          0.161890    0.242358
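The counts and metrics in the table above lend themselves to a quick arithmetic check. The sketch below is not part of the original study: it copies the reported values and derives the validation share of each dataset and the gap between the two models. The variable names are illustrative.

```python
# (dataset, train, validate, BLSTM acc %, LSTM acc %, BLSTM loss, LSTM loss), as reported
rows = [
    ("Mirai", 387060, 208418, 99.998992, 99.571605, 0.000809, 0.027775),
    ("UDP", 391002, 210540, 98.582144, 98.521440, 0.125630, 0.125667),
    ("ACK", 411384, 221515, 93.765198, 93.765198, 0.858700, 0.858773),
    ("DNS", 391622, 210874, 98.488289, 98.488289, 0.116453, 0.116453),
    ("Multi-Vector (with ACK)", 419887, 226094, 91.951002, 91.951002, 0.841303, 0.841381),
    ("Multi-Vector (without ACK)", 395564, 212996, 97.521033, 97.521033, 0.115293, 0.115293),
    ("Multi-Vector (with three ACK)", 468534, 252289, 92.243513, 92.243513, 0.161890, 0.242358),
]

# Share of each dataset held out for validation.
validate_fraction = {name: val / (train + val) for name, train, val, *_ in rows}

# Positive values mean the bidirectional model did better.
accuracy_gain = {name: round(b_acc - l_acc, 6) for name, _, _, b_acc, l_acc, _, _ in rows}
loss_reduction = {name: round(l_loss - b_loss, 6) for name, *_, b_loss, l_loss in rows}

largest_loss_reduction = max(loss_reduction, key=loss_reduction.get)
```

On these figures every dataset works out to roughly a 65/35 train/validate split. The accuracy columns are identical for five of the seven rows, so the loss column is where the two models separate most visibly.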
[Fig. 3 to Fig. 6: accuracy and loss curves for the Train Dataset and Test Dataset over iterations 1 to 20]
mirai, udp, dns, and ack captures being concatenated to form a multi-vector attack scenario. Results on row 5 of Table V show the impact of the ack attack on the overall detection accuracy and particularly loss metrics. To validate this observation, Experiment 3 consisted of norm, mirai, udp, and dns captures being concatenated to form a multi-vector attack scenario, minus the ack attack. Results on row 6 of Table V show that once the ack attack is removed, overall detection accuracy and prediction of the model are very good. A final validation of this observation was conducted in Experiment 4, in which three ack attacks were performed during the same time frame, increasing the total sample size of ack attacks, in order to observe the variation in accuracy and prediction. Row 7 of Table V shows that an increase in sample size improves the overall validation accuracy to 92%, with BLSTM-RNN returning the better loss metric, meaning this model was able to better predict attack traffic when presented with a larger sample size.

Fig. 3 through to Fig. 6 show accuracy and loss metrics for the detection models. Although metric results are comparable, and the bidirectional approach adds overhead to each epoch and increases processing time, the trajectory shows a better progressive model over time. A larger dataset with more samples could further demonstrate the benefit of BLSTM-RNN.
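The labelling and concatenation steps behind the multi-vector scenarios can be sketched as follows. This is an illustration under stated assumptions, not the authors' tooling: the column names (Protocol, Length, Info) follow a typical packet-analyser CSV export, the sample rows are made up, and the helper names are hypothetical.

```python
import csv
import io


def load_labelled(csv_text, label):
    """Parse one capture's CSV export, drop rows with missing data, attach the label."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Short rows yield None and empty cells yield ""; both count as missing data.
        if any(not isinstance(v, str) or not v.strip() for v in row.values()):
            continue
        row["label"] = label
        rows.append(row)
    return rows


def build_scenario(captures, labels):
    """Concatenate the labelled captures, in the given order, into one dataset."""
    dataset = []
    for label in labels:
        dataset.extend(load_labelled(captures[label], label))
    return dataset


HEADER = "Protocol,Length,Info\n"
captures = {  # made-up rows standing in for the converted .pcap files
    "norm": HEADER + "TCP,66,video stream segment\n",
    "mirai": HEADER + "TELNET,60,Telnet Data\n",
    "udp": HEADER + "UDP,554,Source port: 4000\n",
    "dns": HEADER + "DNS,75,Standard query A example.com\n",
    "ack": HEADER + "TCP,60,\nTCP,60,[ACK] Seq=1 Ack=1\n",
}

experiment_2 = build_scenario(captures, ["norm", "mirai", "udp", "dns", "ack"])
experiment_3 = build_scenario(captures, ["norm", "mirai", "udp", "dns"])
```

Keeping the label list as a parameter makes the with-ack and without-ack scenarios differ by a single argument, which mirrors how Experiments 2 and 3 isolate the effect of the ack capture.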