Malware detection in IoT devices using Machine Learning

Bram van Dartel


University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands
[email protected]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
35th Twente Student Conference on IT, Jul. 2nd, 2021, Enschede, The Netherlands.
Copyright 2021, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.

ABSTRACT
The Internet of Things (IoT) is growing rapidly all over the world, while its security lags behind. More than 30% of all the infections observed in mobile networks were targeted at IoT. Machine learning is suited for detecting malware on these, often unsupervised, devices and the results are promising. At this point, however, such detection in a single IoT node has not been done yet because IoT nodes often have weak processors. In this project, the possibilities of malware detection in a single IoT device are investigated by trying to scale machine learning algorithms such that a single IoT device can perform near real-time network traffic anomaly detection, marking packets as ‘malware’ or ‘benign’. Using one of the machine learning algorithms, it is possible to implement the proposed program on an ESP32 chip that can classify data points from the IoT-23 dataset. When fully implemented, this could mean that, in the future, IoT devices will be able to check for themselves whether a network connection is part of a malware attack or whether it is a ‘normal’ connection.

Keywords
Machine learning, IoT, IoT Malware, Malware detection IoT, IoT-23

1. INTRODUCTION
In 1999, the term “Internet of Things” was coined by Kevin Ashton, by which he meant that devices should gather data, instead of just people collecting data [1]. The current meaning of the Internet of Things, also known as IoT, is not that different. Nowadays, IoT refers to a network of small devices that can sense the physical environment or act on it. An example of such an IoT device is a self-driving car, as it both senses the physical environment it is in and acts upon it.

The global market size of IoT is growing rapidly [2] and, with approximately 10 billion devices in 2021 [3], there are more of these devices than people in the world. Alongside this increasing number of devices, the share of IoT devices among all infections observed in mobile networks went up from 16.17% in 2019 to 32.72% in 2020 [4].

Since IoT devices are usually not supervised [4], they are attractive as a target for malware. Often there is not even an option to monitor such a device. As the number of observed infections in IoT devices grows, it becomes clearer that steps should be taken to combat these infections. One of the possible steps is to use machine learning to actively detect malicious incoming traffic. Since IoT generates a lot of data, this can be used as a training dataset for such a machine learning algorithm. One example of such a dataset is the IoT-23 dataset [5].

Using machine learning to detect malware in IoT is not something new. In earlier research on this IoT-23 dataset, machine learning was used to try to classify the type of malware [6]. Classifying the type of malware is not relevant for an IoT device; knowing whether a connection is malicious, however, is. Letting each IoT device detect for itself whether a connection is malicious is something that has not been done yet. For this research, the following goal has been set:

Goal: Discover whether it is possible to make machine learning scalable such that a single IoT device can detect malware attacks in real-time.

To achieve this goal, the following Research Questions have to be answered:

• RQ 1: Does a machine learning algorithm for detecting anomalies with only two labels (namely ‘benign’ and ‘malicious’) work better than algorithms where the malware is defined by multiple labels?
• RQ 2: What is the time performance and storage size of the malware classifier with only two labels?
• RQ 3: When comparing the newly proposed solution in RQ 1 to current algorithms, which would better fit a single IoT node?
• RQ 4: Would it be possible to implement the solution found in RQ 3 and, when looking at the time performance on the chip, would it still be a good solution?

By the end of the project, it is expected that this research contributes by checking whether using only two labels influences the accuracy and time performance of malware detection using machine learning. Furthermore, it contributes by actually showing a proof of concept that machine learning for malware detection can be run on IoT devices.

The structure of the rest of the research paper is as follows: An overview of related work on malware detection in IoT using machine learning will be given in Section 2. Then,
in Section 3, the methodologies used to answer the aforementioned research questions will be discussed. Section 4 will then show the results that were obtained by following the methodology. Using these results, a conclusion will be drawn in Section 5, after which Section 6 will contain the discussion and future work of this project.

2. RELATED WORK
In this section, an overview will be given of work related to IoT malware detection using machine learning.

In 2020, Ngo et al. [7] already researched the possibilities of malware detection in IoT and Android devices using machine learning. In this survey, they mainly focused on malicious files that could be executed. In the machine learning algorithms, static features of the files were used, such as the opcodes or the output of running strings on a file. The accuracy of the different methods ranges between 85% and 99.8%. The RIPPER classifier used for the ELF header is interesting for comparison, since the classifying time of this algorithm is only 0.75 seconds, while the accuracy is 99.8%. It should be noted, however, that the time performance was not tested on a simple IoT device.

Diro and Chilamkurti [8] worked on a distributed attack detection scheme for IoT in 2017. For this research, they used deep learning to classify network traffic on an IoT device. They created so-called distributed fog nodes which are used for model training. Here, the deep model had a precision of 99.36% (on benign data) for 2-class detection and 99.52% (on benign data) for 4-class detection. Diro and Chilamkurti used the NSL-KDD [9] dataset for their model.

Hasan et al. [10] have worked on the dataset provided by Pahl et al. [11] to classify network data according to eight different labels, seven of which are malware types and the eighth being ‘normal’. The dataset that was used has 13 features, with which they managed to get an accuracy of 99.4% using the algorithms Decision Tree, Random Forest and Artificial Neural Network. Looking at other metrics, Random Forest performs comparatively better. It should be noted that the data used in the research was based on a virtual environment.

In 2020, Stoian [6] already worked with the IoT-23 dataset [5] to find the best machine learning algorithm to classify the different connections from the dataset. The result of the project was a precision of 99.5% with the Random Forest algorithm. In the comparison with other studies, Stoian showed that the results using the IoT-23 dataset are in line with what could be expected, given the other works. In Stoian's research, ten different labels were used.

3. METHODOLOGIES
In this section, an elaboration is given on how the research was performed.

3.1 Initial dataset
In this research, the IoT-23 dataset [5] is used. This dataset contains 20 captures of malware executed in IoT devices and 3 benign captures. The data was captured in the Stratosphere Laboratory at the CTU University in the Czech Republic. The dataset offers the captures both in pcap format, which is the raw data capture, and in a labelled file with 23 different features. For this research, the labelled files were used, since they contained all the data that was needed. The 20 different labelled files together form the dataset that can be used as input for a classifier. In total, this means that the size of this dataset is approximately 53 GB. The number of data points, or packets, is 325,307,990, of which 294,449,255 are malicious data points.

The dataset contains, next to benign packets, malware packets that were categorized in 14 different sub-labels. Since this research focuses on malware detection and not the classification of the malware category, the sub-labels of the IoT-23 dataset are not deemed relevant. The original dataset has 23 features that can be used to train a classifier. The features, together with a short description, are included in Table 1.

Name | Name in dataset | Type | Description
Timestamp | ts | Float | Timestamp of the packet
Unique Identifier | uid | String | Unique identifier of the packet
Origin host | id.orig_h | String | IP-address of the origin of the packet
Origin port | id.orig_p | Integer | Port-number of the origin of the packet
Response host | id.resp_h | String | IP-address of the responding host of the packet
Response port | id.resp_p | Integer | Port-number of the responding host of the packet
Protocol | proto | Integer | Protocol over which the data was sent
Service | service | String | Service of the data in the packet (e.g. HTTP, DNS)
Duration | duration | Float | Duration of the connection with the other host
Origin # of bytes | orig_bytes | Integer | Amount of bytes sent by origin
Respondent # of bytes | resp_bytes | Integer | Amount of bytes sent by respondent
Connection state | conn_state | String | State of the connection
Local origin | local_orig | Boolean | Did the connection originate locally?
Local response | local_resp | Boolean | Did the response originate locally?
# of missed bytes | missed_bytes | Integer | Number of bytes missing in the packet
History | history | String | History of the connection state
Originating packets | orig_pkts | Integer | Amount of packets originated from host
Originating bytes | orig_ip_bytes | Integer | Amount of bytes originated from host
Responded packets | resp_pkts | Integer | Amount of packets responded by host
Responded bytes | resp_ip_bytes | Integer | Amount of bytes responded by host
Tunneled parent | tunnel_parents | String | Unique ID of parent if packet was tunneled
Label | label | String | Classification of the packet (benign/malware)
Detailed label | detailed_label | String | If the label is malware, the type of malware, otherwise empty

Table 1. The initial features of the IoT-23 labelled dataset.

3.2 Modifying the dataset
To load the plain-text files in Python [12], some modifications had to be made to the dataset. Next to these modifications, some other changes had to be incorporated as well to turn the dataset into one that was useful for this research.

3.2.1 Loading the dataset
The first goal was to load the dataset, which consists of different files, into Pandas data frames [13], since these data frames are easy to work with. To load the different files, some modifications had to be made to the different text files. There were still comments and a short explanation in every single one of the files, so these had to be removed. Furthermore, the different columns were given names, so they could easily be worked with.

One thing that was noticed later on is that the divider of the columns is not always the same, meaning that all occurrences of three consecutive spaces had to be replaced by the divider, which is the tab character.

To train a classifier later on, the different datasets had to be merged. This was done by loading all the different datasets into Pandas data frames and then merging them into one. The new data frame was then stored as a CSV file.

3.2.2 Formatting & Pre-processing the dataset
At this point, the whole dataset can be loaded from a CSV file into one single data frame. There are however some columns that contain mixed types. For example, a ‘-’ symbol is used in integer columns to represent the number zero. This symbol was replaced by modifying the data frame using Pandas’ built-in functionality.

Now, the features are manipulated to serve the goal of this research. First of all, Stoian [6] already discovered using statistical correlation analysis that the features ts, uid, id.orig_h, local_orig, local_resp, missed_bytes and tunnel_parents can be removed. This was due to the weak relationship between the feature and the label. Another reason was that for some columns, namely the last three mentioned, there was not enough data to measure a correlation.

Before removing the ts feature, a new feature was created that was based on the timestamp: the difference between consecutive timestamps, or the inter-arrival time. To retrieve this interval, the whole dataset was sorted by timestamp. The difference could then be calculated and stored in a new column, time_diff.

Last, before going to the machine learning step, the different strings were translated to integers using the scikit-learn LabelEncoder [14]. This means that a table is created with an encoding from a number to a string, so that every unique string corresponds with a unique number. Now that this encoding is done, machine learning can be
performed on the dataset and the research questions can be answered.

3.3 On answering RQ 1
To check whether a machine learning algorithm for detecting anomalies with only two labels works better than algorithms where the malware is defined by multiple labels, the results of the two different methods should be compared. To do this, first of all, the labelled file was adapted so that only the labels ‘benign’ and ‘malware’ would be used. Then, different machine learning algorithms were used to compare the results of using only two labels with the results of having multiple labels, as Stoian [6] did.

The algorithms used are considered among the more ‘simple’ machine learning algorithms, meaning that they might have a higher chance of being able to run on a single IoT node, both storage-wise and speed-wise. Also, these three algorithms have been used in referenced work, meaning it is possible to compare the results of this research with results from other researchers.

To answer which of the two works better, a look was taken at the accuracy, precision, recall, and F1-score. These metrics are calculated using the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) counts. A comparison will be made between this project and the work of Stoian, since that is the only research that has currently been done using the IoT-23 dataset.

3.3.1 Decision Tree
A Decision Tree is a tree with nodes and leaves. The nodes can be interpreted as if-statements, while the leaves contain the categories and thus are the result of the classification. When starting with the classification, one starts at the start node and follows the tree via the edges to a leaf, like a flow chart.

A Decision Tree is the computationally lightest algorithm of the three. This means that a small chip should be able to classify a data point. Decision Trees are not very robust but can handle errors in the dataset [15]. Besides, Decision Trees tend to over-fit on the training data, which can be prevented by using a more complex classifier, such as a Random Forest classifier, which has Decision Trees as a basis.

3.3.2 Random Forest
The Random Forest classifier uses Decision Trees as a basis for its classification. By computing multiple different Decision Trees and then taking an average, the Random Forest often has higher accuracy than a Decision Tree [16]. The problem of over-fitting is overcome since the selection of the trees is randomised in the algorithm. Since a Decision Tree is computationally light, a Random Forest does not become much more computationally demanding. The number of trees used in the Random Forest does however have an impact on the size of the classifier and, next to that, on the time it takes to classify a data point. This means there is a trade-off between accuracy and the complexity of the algorithm.

3.3.3 Naïve Bayes
The Naïve Bayes algorithm is based on Bayes’ Theorem, which is

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Using this formula, one can calculate the probability of a statement A given that a statement B holds. In order to calculate this probability, the probability of statement B given statement A times the probability of statement A should be divided by the probability of B.

Naïve Bayes classifiers are, just like the tree-based algorithms, very fast. Also, these types of classifiers do not need a lot of training data to already make excellent predictions [17]. Last, but certainly not least, Naïve Bayes classifiers are computationally light, which also means they do not take up much storage space when saved.

3.3.4 Metrics
The following metrics were used to determine the performance of the classifier.

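As an illustration of the four metrics defined in this section, the following minimal Python sketch computes them from confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the experiments; ‘malware’ is assumed to be the positive class, and all denominators are assumed to be non-zero.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero (no guarding against empty classes).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only, with 'malware' as the positive class.
example = binary_metrics(tp=90, tn=5, fp=3, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In practice the same values could be obtained from a library such as scikit-learn; the hand-written version is shown only to make the relation to the TP, TN, FP and FN counts explicit.
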
Accuracy
The accuracy is specified as the formula

$$\text{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

and calculates the proportion of correctly classified points over the total number of classified points. Higher accuracy means a better classifier.

Precision
Precision is the ability of the algorithm to not label a negative sample as positive. It is defined as

$$\text{precision} = \frac{TP}{TP + FP}$$

Concretely, this means that precision is the ability of the algorithm to not label a legitimate packet as malware.

Recall
The recall metric is specified as

$$\text{recall} = \frac{TP}{TP + FN}$$

and constitutes the ability of the classifier to find all the positive samples, that is, malware packets, in the test dataset.

F1-score
Since the classification that is done is only on two labels, the true F1 metric can be used. It is the harmonic mean of the precision and recall, defined by

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

3.4 On answering RQ 2
To answer this second research question, RQ 1 must be answered first. The two options, namely the dataset with only two labels and the dataset with a label per type of malware, should be compared on the time they take to execute. A comparison was made between the options found after answering RQ 1. This was done by running the test subset of the dataset on the classifier and measuring the time. When the complete test subset was classified, the total time was divided by the number of points that were classified, to get the average time that the classifier takes per point.

Next to the time, it is also important to see what the size of the classifier is. Although training does not need to be done on the small IoT device, the classifier itself needs to be stored on the device. The ‘pickle’ module in Python [18] can store a Python object, in this case the classifier, in a file so that it can be retrieved at a later point. In such a pickle file the classifier can be stored and uploaded to the IoT device. The size of this file is measured as the classifier size.

3.5 On answering RQ 3
The third research question is used to get to the goal of the research. Now that RQ 1 and RQ 2 are answered, a comparison is made to find out which (if any) algorithm would work better for a single IoT node to detect an anomaly in the network traffic. This comparison is based on the metrics mentioned at RQ 1, the size of the classifier and the time performance of the algorithm, which is measured at RQ 2. A choice should be made here between the different options to determine which algorithm would fit a single IoT node best.

3.6 On answering RQ 4
The fourth and final research question is the moment of truth: would it be possible to implement this on an ESP32 chip [19]? This chip is one of the most common chips used in the IoT since it is low-cost, low-power and has both Wi-Fi and Bluetooth (Low Energy). If it is possible to implement the solution proposed at RQ 3, it would be a proof of concept. For this research question, the LOLIN32 [20] board has been used, since this board has 4MB flash memory, which is sufficient for the classifier. Furthermore, the board has a micro USB port, so it could be easily connected to a computer to upload code.

To implement the proposed solution, it should first be checked whether the size of the proposed classifier is not larger than the storage space on the ESP32. Since this is the case, the implementation can be continued. First, an attempt was made to implement the algorithm using MicroPython [21], which did not seem a viable option. This was due to the several extra modules that had to be installed, which took more storage space on the board than was available. Another method was used instead, namely the Python module ‘micromlgen’ [22], which is made for exporting classifiers from Python to C, such that small devices can also run machine learning algorithms.

By using this library, it was possible to generate C code for the Decision Tree classifier. Of the other classifiers used, only the Naïve Bayes would have fitted. Although the file size of the Decision Tree classifier was now bigger, namely 170KB, it still fitted easily on the board. By using the ‘EloquentTinyML’ library [23] written for Arduino, it was possible to write C code such that data points, stored as an array, could be classified.

4. RESULTS
This section will go more in-depth on the results that were obtained by following the described methodology. First of all, the experimental setup will be discussed, followed by the analysis of the results retrieved from the research.

4.1 Experimental Setup
For the experimental setup, a dockerized virtual machine was used to run the classification. The operating system used is Ubuntu 20.04.2 LTS 64-bit. The processor used was a 2x8-core Intel E5-2640 v3, CPU @ 3.40 GHz. The memory of the machine was 768 GB RAM. Implementation has been done using Python 3.8.5 64-bit and the module ‘Pandas’ for loading the data into data frames and generating correlation matrices. Furthermore, the module ‘scikit-learn’ has been used for the machine learning and metrics.

Furthermore, as mentioned earlier, the LOLIN32 [20] board has been used, which has the ESP32 chip on board with 512kB of RAM and a 240MHz dual-core processor. The flash memory on this board is 4MB.

4.2 Result analysis

4.2.1 Research Question 1
For RQ 1, it was important to first implement the machine learning algorithm with only two labels. To answer the question of whether a machine learning algorithm for detecting anomalies with only two labels works better than algorithms where the malware is defined by multiple labels, the results of Stoian [6] are also included. First of all, in Table 2 the metrics per algorithm are shown.

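The measurement procedure of Section 3.4 (pickled size of the classifier and average classification time per data point) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the hand-written tree `TOY_TREE`, its thresholds and labels, and the helper names are hypothetical stand-ins for a trained classifier, not the model or the code used in this research.

```python
import pickle
import time

# Hypothetical, hand-written decision tree standing in for a trained classifier.
# Internal nodes test one feature against a threshold; leaves hold the label.
# The thresholds and labels are made up and were not learned from IoT-23.
TOY_TREE = {
    "feature": "id.resp_p",
    "threshold": 1024,
    "left": "benign",  # taken when the feature value is <= threshold
    "right": {
        "feature": "duration",
        "threshold": 1.0,
        "left": "malware",
        "right": "benign",
    },
}


def classify(node, point):
    """Follow the tree from the root to a leaf and return the leaf's label."""
    while isinstance(node, dict):
        go_left = point[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def classifier_size_bytes(tree):
    """Size of the pickled classifier, i.e. what would be stored in a file."""
    return len(pickle.dumps(tree))


def average_time_ns(tree, points):
    """Total classification time divided by the number of classified points."""
    start = time.perf_counter_ns()
    for point in points:
        classify(tree, point)
    return (time.perf_counter_ns() - start) / len(points)


# Hypothetical test points; real points would come from the test subset.
test_points = [
    {"id.resp_p": 80, "duration": 0.5},
    {"id.resp_p": 8080, "duration": 0.5},
    {"id.resp_p": 8080, "duration": 2.0},
] * 1000

size_bytes = classifier_size_bytes(TOY_TREE)
avg_ns = average_time_ns(TOY_TREE, test_points)
print(f"size: {size_bytes} bytes, average time: {avg_ns:.1f} ns per point")
```

The absolute numbers printed by such a sketch depend on the machine it runs on, so they are only meaningful for ranking classifiers measured under the same conditions.
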
As can be seen in Table 2, there is a negligible difference between the Decision Tree and Random Forest when looking at the metrics. All the metrics perform almost equally well, which is not surprising when there are only two labels included in the machine learning algorithm. Although the difference is negligible, it is interesting to look at the confusion matrices in Figures 1, 2, 3 and 4. As can be seen, the Random Forest algorithm makes more mistakes in the classification than the Decision Tree does. This is interesting, since a Random Forest is an average of different Decision Trees, meaning that it should perform better than a single Decision Tree. The number of trees used, first 10 and then 100, did not make a significant difference in the confusion matrices. This points to the Decision Tree over-fitting on the dataset. Furthermore, the Naïve Bayes performs worse than the Decision Tree and Random Forest. Probably, this has to do with the fact that both the Decision Tree and Random Forest only do classification, while the Naïve Bayes algorithm gives back a probability of being in a class. The Naïve Bayes algorithm is not made to have classification as its main focus, which is reflected in the metrics.

Figures 1-4. Confusion matrices: Figure 1. DT; Figure 2. RF (n=10); Figure 3. RF (n=100); Figure 4. NB.

Metric | DT | RF (n=10) | RF (n=100) | NB
Accuracy | 0.999 | 0.999 | 0.999 | 0.906
Precision | 0.999 | 0.999 | 0.999 | 0.916
Recall | 0.999 | 0.999 | 0.999 | 0.986
F1-score | 0.999 | 0.999 | 0.999 | 0.946

Table 2. Results of the metrics for the algorithms Decision Tree (DT), Random Forest (RF) and Naïve Bayes (NB).

When comparing the results to those of Stoian, see Table 3, it is clear that the metrics of this project are better than his, although only marginally. The conclusion that can be drawn here is that using only the labels ‘benign’ and ‘malware’ for the classification of the packets gives better metrics than classification into ‘benign’ and the distinct types of malware. It should be noted, however, that this project could be seen more as an anomaly detector than as a real classifier. In the dataset, however, there is more training data on the ‘malware’ label than there is on the ‘benign’ label. In any case, the conclusion can be drawn that the metrics are, for all but one algorithm, better than when using multiple labels.

Research | Best performing classifier | Accuracy | Precision | Recall | F1-score
Stoian [6] | Random Forest | 1.00 | 0.995 | 1.00 | 1.00
This project | Decision Tree, Random Forest | 1.00 | 0.999 | 1.00 | 1.00

Table 3. Comparison in metrics between this research and Stoian

4.2.2 Research Question 2
The second question to answer is what the time performance and storage size of the classifier with only two labels are. First of all, some more explanation of the configuration of the Random Forest classifier is needed. The depth of a tree is automatically chosen by the scikit-learn module. Two different settings were tried for the number of trees: 10 trees and 100 trees. The accuracy when using 100 trees was negligibly higher, namely by less than a tenth of a per cent. On the other hand, the size of the classifier increased from 3.2 MB to 31 MB, meaning that using more trees is not worth it when looking at the size, since there is only a limited amount of storage space on an IoT device.

Classifier | Size (KB) | Time per data point (ns)
Decision Tree | 63.710 | 173.44
Random Forest (n=10) | 3313.0 | 990.36
Random Forest (n=100) | 32090 | 9447.0
Naïve Bayes | 1.098 | 350.68

Table 4. Measurements of the size and time for the different classifiers. Random Forest has been included in both the 10 and the 100 trees variant.

In Table 4, the average time that the classification of one point takes and the size of the classifier are included. As can be seen, the Decision Tree is the fastest classifier and, after Naïve Bayes, the smallest. It should be noted that the classification times are very short, which is due to the experimental setup. When running this program on an IoT device, the time a classification takes will be significantly longer. This is due to the fact that the processor on the small board is not as fast as the processor used in the research environment. It is however good to already have an overview of the ranking in the speed of the classifiers.

4.2.3 Research Question 3
To check for malware in real time on a single IoT device, there are two limiting factors: the time performance and the size of the classifier. When, again, looking at Table 4, it becomes clear that the Random Forest with 100 trees certainly is not a good solution. The size of the classifier is too large for an IoT device and, next to that, almost the same metrics can be obtained using the Random Forest of only 10 trees.

Then, when looking at Table 2, the Naïve Bayes algorithm does not perform that well in comparison with the other two remaining classifiers. This leaves only the Decision Tree and the Random Forest with 10 trees.

Looking at the metrics of the two algorithms, there is almost no difference, meaning that a look should be taken again at the size and time performance of the two classifiers. What is clear is that the size of the Random Forest is more than 50 times as large as the Decision Tree and
that the time it takes for the Random Forest to classify a data point is more than 5 times as long. Since the influence on the performance metrics is not substantial, the Decision Tree classifier seems to be the best to implement on a single IoT device to check for malware in real time.

4.2.4 Research Question 4
As mentioned in the methodology, it was possible to implement the Decision Tree classifier on the ESP32 chip. The exported C code of the classifier has been used, which has a size of 170KB. When running the classification process on 250 data points and then dividing the time it took by 250, it can be concluded that it takes approximately 0.24 ms per data point to be classified. It should be noted that the total time is only measured with a resolution of a hundredth of a second, meaning it is not very accurate.

Given the short time and the small size that the classifier takes up on the ESP32 chip, the Decision Tree classifier still is a good choice to classify the given data points in real time. The program running on the IoT device that has the real IoT functionality for which the device was bought can, storage-wise, run easily next to the malware detection algorithm.

5. CONCLUSION
The goal of this research is to discover whether it is possible to make machine learning scalable such that a single IoT device can detect malware attacks in real-time. First of all, it is discovered that it is possible to do this. Using the Decision Tree classifier and an ESP32, it is possible to run the machine learning algorithm on a single IoT device. Besides being able to run on an ESP32, the classifier itself also has excellent performance metrics, meaning that it is also very accurate. It can be concluded that malware detection on a single IoT device is a promising possibility to reduce the number of infected IoT devices.

6. DISCUSSION
This discussion will be split into two different parts. First of all, the possible limitations of this project are discussed, after which some optimisations and future research will be mentioned.

6.1 Limitations
Some limitations can be found in this research. First of all, the encoding of the strings in the dataset should be mentioned. The origin host for example is an IP address that was interpreted as a string, meaning it was encoded accordingly by the LabelEncoder. The LabelEncoder encoded them by giving every unique string a unique num-

Next to the encoding of the strings, only three different algorithms in machine learning have been used. Although the time performance and metrics are excellent, it is always possible to improve. Now, the Decision Tree classifier has been chosen as the best algorithm to implement, but maybe another algorithm can be better. It should be noted, however, that not all the different types of algorithms are supported by the micromlgen module, meaning that probably a different library should be used.

6.2 Future research
In this research, some optimisations might be possible. The goal is to make the classifier as fast as possible, with a small size, while also maintaining the highest metrics possible. In the correlation matrix in Figure 5, it can be seen that there are a lot of features that do not correlate much with the label. There are only two features that stand out in this matrix, namely id.resp_p, which is the port of the responding host, and duration, which is the duration of the connection. A possible optimisation could be to only use these two features to reduce the classifier’s size and increase the speed. The question remains whether the metrics would be as good as they are right now. The newly added feature of the inter-arrival time had only a negligible correlation with the label, meaning it was not very useful for the classifier.

Figure 5. Correlation matrix of the complete dataset

Furthermore, now only the machine learning part has been implemented on an ESP32 chip. Future research could try to implement a whole program that gathers the internet packets received, extracts the features that are used and classifies them. To really implement the real time malware detection on a chip this size, the whole process should only
ber. When a text occured multiple times, they were en- take a fraction of a second, which is still something that
coded as the same number. IP addresses give away a lot needs to be researched.
of information about the host, for example about the lo- There are still a couple of data points that are misclas-
cation but also about the Internet Service Provider (ISP). sified. Although the count is negligible for the Decision
This means that if a lot of malware comes from a certain Tree and Random Forest, it is interesting to see whether
IP range or location, there is no way to detect that using all these points are from the same type of malware. It
this implementation. In future research, a solution may be might be the case that the algorithm finds it (just a bit)
found that takes the IP address into account in a better harder to classify malware from a certain type, but also
way than just encoding it as a string. to make such conclusions there should be extra research

6
diving into this topic. [18] “pickle — Python object serialization.” [Online].
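
The feature-reduction idea from Section 6.2 could start by ranking features on their correlation with the label. The sketch below is only an illustration under stated assumptions: it uses plain Python rather than the pandas tooling of this research, the helper names pearson and top_features are hypothetical, the column name inter_arrival is made up for the example, and the toy values are not taken from IoT-23.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def top_features(columns, label, k=2):
    """Return the k feature names with the largest absolute correlation to the label."""
    scores = {name: abs(pearson(values, label)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy values for illustration only; they are NOT measurements from IoT-23.
toy_columns = {
    "id.resp_p": [23, 23, 443, 80, 23, 443],
    "duration": [0.1, 0.2, 3.0, 2.5, 0.1, 2.8],
    "inter_arrival": [1.0, 2.0, 1.0, 2.0, 2.0, 1.0],  # hypothetical column name
}
toy_label = [1, 1, 0, 0, 1, 0]  # 1 = malicious, 0 = benign

selected = top_features(toy_columns, toy_label, k=2)
print(selected)
```

A classifier retrained on only the selected columns would then have to be compared against the full model on accuracy, size and classification time, since correlation with the label measures linear dependence only and a Decision Tree may also exploit non-linear splits.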
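
The string-encoding limitation from Section 6.1 can be made concrete with a small sketch. It assumes that unique strings are numbered in sorted order (as scikit-learn's LabelEncoder does) and contrasts this with two possible alternatives based on Python's standard ipaddress module: the numeric value of the address and its enclosing /24 network. The function names and the choice of a /24 prefix are illustrative assumptions, not part of the implementation used in this research, and the addresses come from the documentation ranges rather than from the dataset.

```python
import ipaddress


def label_encode(values):
    """Give every unique string one integer; repeated strings share a number.

    Unique strings are numbered in sorted (lexicographic) order.
    """
    mapping = {value: index for index, value in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values]


def ip_to_int(address):
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))


def ip_prefix(address, prefix_len=24):
    """Return the enclosing network as a string, e.g. '192.0.2.0/24'."""
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return str(network)


# Example addresses from the documentation ranges; not taken from the dataset.
hosts = ["192.0.2.9", "192.0.2.10", "203.0.113.5", "198.51.100.7", "192.0.2.9"]

print(label_encode(hosts))            # arbitrary integers, subnet structure is lost
print([ip_to_int(h) for h in hosts])  # neighbouring hosts get neighbouring integers
print([ip_prefix(h) for h in hosts])  # hosts in the same /24 share one value
```

With the label encoding, the distance between two encoded values says nothing about whether the hosts share a network, whereas the integer or prefix representation keeps that relation available to the classifier. Whether either alternative actually improves the metrics would still have to be evaluated.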
Available:
https://docs.python.org/3/library/pickle.html
7. REFERENCES [19] “The Internet of Things with ESP32.” [Online].
[1] A. Kevin, “That ’ Internet of Things ’ Thing,” RFiD Available: http://esp32.net/
Journal, p. 4986, 2010. [20] “LOLIN D32 Pro — WEMOS documentation.”
[2] Lionel Sujay Vailshery, “Forecast end-user spending [Online]. Available: https:
on IoT solutions worldwide from 2017 to 2025,” //www.wemos.cc/en/latest/d32/d32.htmlhttps:
2021. [Online]. Available: https://www.statista. //docs.wemos.cc/en/latest/d32/d32 pro.html
com/statistics/976313/global-iot-market-size [21] “MicroPython - Python for microcontrollers.”
[3] A. Holst, “Number of Internet of Things (IoT) [Online]. Available: https://micropython.org/
connected devices worldwide from 2019 to 2030,” [22] “GitHub - eloquentarduino/micromlgen: Generate C
2021. [Online]. Available: code for microcontrollers from Python’s sklearn
https://www.statista.com/statistics/1183457/ classifiers.” [Online]. Available:
iot-connected-devices-worldwide/ https://github.com/eloquentarduino/micromlgen
[4] Nokia, “Threat Intelligence Report 2020,” Tech. [23] “GitHub - eloquentarduino/EloquentTinyML:
Rep., 2020. [Online]. Available: Eloquent interface to Tensorflow Lite for
https://onestore.nokia.com/asset/210088 Microcontrollers.” [Online]. Available: https:
[5] A. Parmisano, S. Garcia, and M. J. Erquiaga, “A //github.com/eloquentarduino/EloquentTinyML
labeled dataset with malicious and benign IoT
network traffic.” 2020. [Online]. Available:
https://www.stratosphereips.org/datasets-iot23
[6] N.-A. Stoian, “Machine Learning for Anomaly
Detection in IoT networks: Malware analysis on the
IoT-23 Data set,” Ph.D. dissertation, University of
Twente, 2020. [Online]. Available:
https://essay.utwente.nl/81979/
[7] Q. D. Ngo, H. T. Nguyen, V. H. Le, and D. H.
Nguyen, “A survey of IoT malware and detection
methods based on static features,” ICT Express,
vol. 6, no. 4, pp. 280–286, dec 2020.
[8] A. A. Diro and N. Chilamkurti, “Distributed attack
detection scheme using deep learning approach for
Internet of Things,” Future Generation Computer
Systems, vol. 82, pp. 761–768, may 2018. [Online].
Available: https://linkinghub.elsevier.com/retrieve/
pii/S0167739X17308488
[9] M. Tavallaee, E. Bagheri, W. Lu, and A. A.
Ghorbani, “A detailed analysis of the KDD CUP 99
data set,” in 2009 IEEE Symposium on
Computational Intelligence for Security and Defense
Applications. IEEE, jul 2009, pp. 1–6. [Online].
Available:
http://ieeexplore.ieee.org/document/5356528/
[10] M. Hasan, M. M. Islam, M. I. I. Zarif, and
M. Hashem, “Attack and anomaly detection in IoT
sensors in IoT sites using machine learning
approaches,” Internet of Things, vol. 7, p. 100059,
sep 2019. [Online]. Available: https://linkinghub.
elsevier.com/retrieve/pii/S2542660519300241
[11] M.-O. Pahl and F.-X. Aubet, “All Eyes on You:
Distributed Multi-Dimensional IoT Microservice
Anomaly Detection,” IEEE, 2018. [Online].
Available:
https://ieeexplore.ieee.org/document/8584985
[12] “Python.” [Online]. Available:
https://www.python.org/
[13] “pandas - Python Data Analysis Library.” [Online].
Available: https://pandas.pydata.org/
[14] “scikit-learn: machine learning in Python.” [Online].
Available: https://scikit-learn.org
[15] L. Rokach and O. Maimon, Decision Trees. Boston,
MA: Springer US, 2005, pp. 165–192. [Online].
Available: https://doi.org/10.1007/0-387-25465-X 9
[16] L. Breiman, “Random Forests,” Tech. Rep., 2001.
[17] H. Zhang, “The Optimality of Naive Bayes,” Tech.
Rep. [Online]. Available: www.aaai.org
