Rev3 Paper
traffic, so as to make up for their respective shortcomings. The modified National Institute of Standards and Technology (NIST) test suite is utilized by the authors to simplify the interactions between packet processing and feature extraction. A handshake-skipping algorithm (HST-R) is also proposed to skip the handshake process during feature extraction, so as to avoid bringing the wrong message to statistical-based approaches. It also further increases the classification accuracy of the proposed machine learning based classifier. Zhao et al. [73] proposed an algorithm to identify encrypted traffic from both public and private encryption protocols, named encrypted traffic identification based on weighted cumulative sum test (EIWCT). The method utilizes the EIWCT to process each new incoming packet instead of testing after all packets have arrived, and identifies the traffic based on weighted conflated results. The proposed algorithm reduces computation time and efficiently achieves online identification of encrypted traffic from common encrypted protocols, such as SSL and SSH, and from other private encrypted protocols. Above 90% accuracy is achieved for SSL and private encryption protocol traffic in their experiments. Shahbar et al. [94] conducted comparative experiments based on supervised learning algorithms such as RF and Naïve Bayes for The Onion Router (Tor) and Invisible Internet Project (I2P) encrypted traffic classification.

For the research on encrypted malicious traffic detection, various approaches have been proposed in existing works. Many detection approaches are proposed regardless of the type of encrypted protocols. Stergiopoulos et al. [14] proposed a machine learning based encrypted malicious traffic detection model that uses only a few features. The authors compared 7 different algorithms on their traffic datasets with multiple protocols to show that they use fewer features, less data and shorter time to achieve similar or even higher performance compared with other existing works. Furthermore, there are also many works focusing on certain encryption protocols. Most research aims to detect encrypted malicious traffic under HTTPs protocols, because HTTPs has become the most commonly used encryption mechanism for both legitimate and malicious traffic encryption. Research [11][13][25][28][29][33][92] extracted unique features from the TLS/SSL protocol to train their HTTPs detection models. Shekhawat et al. [29] compared SVM, XGBoost, and RF algorithms with TLS/SSL features for binary detection and multi-classification of encrypted malicious traffic, and finally proposed that XGBoost with selected TLS/SSL features can achieve a 99.15% accuracy under their own training dataset. Anderson and McGrew [28] combined the unique features of the TLS/SSL protocol and statistical numerical features of the encrypted traffic to detect HTTPs malicious traffic. Yang et al. [8] reassembled SSL records from packets and proposed an SSL malicious traffic detection method based on a deep learning combination of LSTM, AE, and CNN. Torroledo et al. [90] proposed a deep learning based identification system to detect legitimate and malicious TLS traffic certificates by utilizing the TLS certificate content. Meghdouri et al. [80] proposed an RF detection model for TLS and Internet Protocol Security (IPSec) protocols with novel cross-layer features, which comprise features at the application level, conversation level and endpoint behaviour level at the same time.

There are many works on the binary classification of encrypted traffic as malicious or legitimate [6]. For example, the SSL malicious traffic detection model [8] classified SSL traffic as malicious or legitimate without considering the malware families. The side-channel features based encrypted malicious traffic detection model [14] focused on binary classification. The HTTPs malicious traffic detection model in [28] classifies traffic into enterprise and malware traffic. Some works also proposed multi-class classification of traffic belonging to different malware families. For example, Liu et al. [12] proposed a framework which comprises detection models for 24 malware types. However, the performance of the models in the framework was not tested separately. If any model in the set of detection models performs badly, it is assumed that the other models with good performance will buffer the overall result. Zeng et al. [9] proposed a framework of encrypted traffic multi-class classification and intrusion detection based on deep learning. The framework is constructed from three different deep learning algorithms: CNN, LSTM, and SAE. However, there is no existing work that covers all types of traffic and malware families. The performance of multi-class classification is also observed to be generally worse than the binary classification approaches.

In all, existing research can be summarised in four general
approaches:
1. Binary classification regardless of the encryption protocols.
2. Binary classification under certain encryption protocols.
3. Multi-class classification based on malware families regardless of the encryption protocols.
4. Multi-class classification based on malware families under certain encryption protocols.

Meanwhile, encrypted traffic classification can also be applied to the IoT device domain. IoT devices are generally resource limited and more susceptible to cyber attacks. Due to the resource constraints and heterogeneity of IoT devices, traditional endpoint and network security solutions are not a good fit for securing IoT devices. Six challenges in securing IoT devices are summarised by [20]. They are endpoint security solutions with limited support, challenges in protecting device-to-device communications, high IoT device diversity, high security deployment and operational costs, the trade-off among privacy, performance and attack risk, and that most edge networks are vulnerable to attacks. Nakahara et al. [19] proposed an anomaly detection system that avoids relying on the IoT device itself to detect and analyse packets, and instead uses other devices such as a low load home gateway. Then, the statistical features of aggregated traffic information are used, so as to reduce the processing load of the home gateway and the model. Wang et al. [89] proposed a deep learning based encrypted traffic classifier named DataNets. MLP, SAE, and CNN are utilized in the classifier for encrypted traffic classification on software defined networks. Aceto et al. [27] proposed a mobile traffic classifier which can deal with encrypted traffic through automatic feature extraction and deep learning.

Finally, in the process of setting our research objectives, we also need to consider the running environment of the detection model, that is, whether a detection model runs online or offline, because different operating environments have different requirements for efficiency, robustness, and accuracy. At times, we need to compromise one requirement to better meet the others.

B. Traffic Dataset Collection

After the research objectives have been established, the next step is to collect the corresponding datasets for model training. Data collection methods can be generally divided into two types: real traffic collection and simulated traffic generation. For real traffic collection, traffic data is collected from an operational network environment through commonly used traffic collection software, such as Wireshark. The collected data is then analysed and labelled for use. In [71], the authors deployed an IoT honeypot to collect a network traffic dataset from 2017 to 2018. Over this one and a half years, the honeypot was presented on 40 public network IP addresses and the traffic was forwarded to 11 real IoT devices. In another project [58], a sample of internet traffic from Japan to the USA over different time periods is collected. The data collection began in 2001. The UNIBS dataset [65] contains 27 GB of traffic, which was captured on the university campus network for three consecutive working days. The real traffic collection method ensures the authenticity of the network traffic in the dataset, but it also has some shortcomings. Due to the low frequency of real network attacks compared to normal network interaction, the collected dataset is often imbalanced and has a lot of redundant data, which requires a lot of time for screening and labelling.

Simulated traffic generation usually produces targeted traffic datasets through the construction of a simulated network or the use of a script simulator. Yu et al. [74] built a network structure which includes a gateway server and several intranet clients. Metasploit is then used to generate different attacks on the gateway of servers, and Wireshark is utilized to collect traffic at the gateway of servers. In [53], part of the traffic data is also generated by simulation. Datasets collected and generated by this method often contain more attacks and malware types. They are also more balanced than the datasets from real traffic collection, and are more suitable for machine learning. However, the datasets generated from simulated traffic generation may differ from real network data, and may not truly represent the real network traffic conditions.

Training an encrypted malicious traffic detection model requires a dataset to have the following features:
1. Sufficient amount of encrypted traffic.
2. High variety of encrypted malicious attacks.
3. Class balanced.
4. Ground truth is confirmed.
5. No redundant data.

Unfortunately, there is currently no well-established dataset for this specific domain problem. Thus, research works often use their own private datasets or select from the few available public datasets. This makes it challenging to perform fair performance comparisons among these existing works. Therefore, we aim to address this issue so as to help lay a good foundation for future research on this topic. We collect the publicly available datasets which are applicable to encrypted traffic analysis and analyse their targeted and applicable fields, characteristics, limitations, and whether they contain encrypted malicious traffic. We show the compiled dataset list in Table I. Through our analysis, we found that most public datasets contain either little encrypted traffic or none at all. Datasets containing encrypted malicious traffic are even rarer.

C. Traffic and Feature Extraction

a) Traffic data pre-processing: The first step of dataset pre-processing is data cleaning and filtering. Most public datasets are in the format of Packet Capture (PCAP) / PCAP Next Generation (PCAPNG), and these PCAP/PCAPNG files often record the original raw traffic data. For this kind of data, it is necessary to clean up irrelevant network packets which are not applicable in the research of encrypted malicious traffic detection, such as Address Resolution Protocol (ARP) or Internet Control Message Protocol (ICMP) packets. Then we need to remove duplicated, damaged, unnecessary, and incompletely captured traffic streams or information which
TABLE I
THE SUMMARY OF PUBLIC TRAFFIC DATASETS

| Dataset Title | Year of Release | Characteristics and Limitations | Contain Encrypted Malicious Traffic |
|---|---|---|---|
| IoT-23 Dataset [51] | 2020 | 20 malwares traffic are captured in IoT devices. It also includes 3 captures of legitimate IoT devices traffic. However, only a very small part of traffic is encrypted. The dataset contains only raw data. | Y |
| IoT Encrypted Traffic [50] | 2020 | The dataset contains traffic from several IoT devices. Most traffics are unencrypted, which are not applicable to encrypted traffic detection model training. The dataset contains only raw data. | N |
| UNSW NS 2019 [70] | 2019 | The dataset contains encrypted legitimate and malicious traffics from more than 28 different IoT devices. Features extracted from captured raw data are not provided. | Y |
| MIRAGE-2019 dataset [31] | 2019 | The dataset is collected by more than 280 experimenters in the ARCLAB laboratory at the University of Napoli "Federico II". It is a human-generated mobile-app dataset from May 2017 to May 2019 for mobile traffic analysis based on the authors proposed MIRAGE architecture [31]. The dataset contains 40 Android apps from 16 different categories. | N |
| CIC-AndMal 2017 [60] | 2017 | The dataset is released on Canadian Institute for Cybersecurity [18]. 5000 of the collected samples (426 malware and 5065 legitimate) were installed on real devices. All samples are categorized into 4 malware types, from 42 malware families. Limited extracted features are provided using CICFlowMeter [78]. | Y |
| CICIDS 2017 [53] | 2017 | Legitimate and common attacks traffic, which resembles the real traffic (PCAPs) are captured in this dataset. Legitimate traffic is generated by simulation. Limited features are extracted from captured raw traffic. | Y |
| VPN-non-VPN traffic dataset [55] | 2016 | A regular traffic session and a session over VPN were captured. There are 14 traffic categories in this dataset. Captured data have been pre-processed and ISCXFlowMeter (old version of CICFlowMeter [78]) is utilized to create features csv files. Selected features file and original PCAP file are provided. | N |
| UNSW NS 2015 [69] | 2015 | The available dataset is generated by the IXIA PerfectStorm tool [69]. It contains 49 features of 6 feature categories and ten traffic classes (Normal, Fuzzers, Analysis, Backdoors, Denial-of-Service, Exploits, Generic, Reconnaissance, Shellcode, and Worms). | Y |
| FIRST 2015 [64] | 2015 | Traffic is composed from Reverse Shell shellcode connections, website defacing attacks, ransomware downloaded attack crypto locker and a command and conquer exploit attack (C2) over SSL that takes over the victim machine. However, since encrypted malicious traffic in the dataset is significantly less than other forms of malicious traffic [14], it is not suitable for encrypted malicious traffic detection. | Y |
| Malware Capture Facility Project [49] | 2013 | The malware in this dataset is executed under bandwidth limit and spam interception. The most important characteristic of this dataset is malware is executed in a long periods of time (up to several months). The dataset contains only raw data. | Y |
| Malware traffic analysis net [52] | 2013 | Datasets are from malware traffic analysis website. The website provides lots of different malware traffic since 2013. However, traffic data is complex and time consuming to process. The dataset contains only raw data. | Y |
| CICIDS 2012 [54] | 2012 | A systematic approach to generate HTTP, SMTP, SSH, IMAP, POP3, and FTP traffic datasets is used. Thus, the datasets cannot fully represent real network traffic. Limited features are extracted from captured raw traffic. | Y |
| CTU-13 [48] | 2011 | This dataset contains a total of 13 botnet capture in different scenarios. Each of these captures are collected for a period of time. Most traffics are unencrypted traffic, but contain encrypted traffic. The dataset contains only raw data. | Y |
| WebIdent 2 Traces [57] | 2006 | Web requests and responses over an encrypted SSH tunnel were collected. Limited features are extracted from captured raw traffic. | Y |
| Kyoto Dataset [63] | 2006 | The dataset only contains the traffic on honeypots, and the captured raw traffic PCAP file is not provided. It contains limited extracted features csv files only, which means we cannot further process features from captured raw traffic. | Y |
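As an illustration of the pre-processing steps described in this subsection (discarding ARP/ICMP packets, removing duplicated records, truncating or padding inputs to a fixed length, and normalizing feature values to the range of zero to one), a minimal Python sketch is given below. The packet representation (a dict with hypothetical `proto`, `ts` and `payload` keys), the function names and the fixed length of 8 bytes are assumptions made for illustration only; they are not taken from any of the surveyed works, and a real pipeline would first parse PCAP/PCAPNG files with a capture library.

```python
from typing import Dict, List

# Protocols treated as irrelevant for encrypted-traffic analysis (assumed list).
IRRELEVANT_PROTOCOLS = {"ARP", "ICMP"}


def clean_packets(packets: List[Dict]) -> List[Dict]:
    """Drop ARP/ICMP packets and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for pkt in packets:
        if pkt["proto"] in IRRELEVANT_PROTOCOLS:
            continue
        key = (pkt["proto"], pkt["ts"], pkt["payload"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(pkt)
    return cleaned


def fix_length(payload: bytes, length: int) -> bytes:
    """Truncate or zero-pad a payload so every input has the same length."""
    return payload[:length].ljust(length, b"\x00")


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy capture: one ARP, one ICMP, and one duplicated TCP packet.
packets = [
    {"proto": "ARP", "ts": 0.0, "payload": b""},
    {"proto": "TCP", "ts": 0.1, "payload": b"\x16\x03\x01"},
    {"proto": "TCP", "ts": 0.1, "payload": b"\x16\x03\x01"},
    {"proto": "ICMP", "ts": 0.2, "payload": b"ping"},
    {"proto": "TCP", "ts": 0.3, "payload": b"\x17\x03\x03" + b"A" * 10},
]
cleaned = clean_packets(packets)
inputs = [fix_length(p["payload"], 8) for p in cleaned]
sizes = min_max([float(len(p["payload"])) for p in cleaned])
```

Filtering out non-encrypted traffic and encoding categorical fields are left out of the sketch, since both depend on how the chosen dataset labels its protocols.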
may interfere with model training. Data truncation and padding are utilized to keep the length of input data consistent [83]. Finally, we filter out the non-encrypted traffic, which results in the final dataset containing only encrypted traffic. There are also some public datasets that are not released as basic PCAP files, but in csv format with prior cleaning, filtering, and feature extraction, such as [53]. However, if there is no label in csv feature datasets to indicate whether these features are extracted from encrypted traffic or unencrypted traffic, and the source of the dataset does not provide the original traffic files (i.e., PCAP files), such datasets are difficult to use for encrypted traffic detection model training. Furthermore, if a feature needed to train the model is not included in this kind of dataset, such as in [63], the dataset cannot be used either. In addition, we also need to convert or encode categorical data in the dataset, such as network protocol information, encryption protocol certificates, server name indications, and so on, to numeric data.

Data imbalance in datasets is also a problem. In existing research, two methods are usually adopted to solve this problem. The first one is the combination of different datasets, or the use of a self-generated dataset to increase the type and quantity of traffic. In the second method, data augmentation, over-sampling, under-sampling or over-sampling followed by under-sampling can be applied to address data imbalance. Yan et al. [26] utilized a mean synthetic minority over-sampling (SMOTE) to balance their own dataset. Wang et al. [89] utilized the under-sampling method to reduce the number of samples of major classes in the dataset. Vu et al. [91] conducted a series of comparative experiments with different existing methods of addressing data imbalance. Data normalization is another important part of data pre-processing, especially for deep learning. Data normalization can normalize the traffic data or feature value to a range of zero to one or negative one to positive one so as to reduce data redundancy, enhancing the integrity and efficiency of model training.

b) Feature extraction: Feature extraction is the next key step after dataset collection. In order to select the best feature set, it is necessary to first understand the types and characteristics of features that can be extracted from traffic data. At present, there is no recognized classification or naming convention for network traffic features. Thus, we performed analysis and categorization of features by studying the relevant works. The study led us to broadly classify the features into two types: protocol-agnostic numerical features and protocol-specific features.

Protocol-agnostic numerical features have two granularities, which are packet based features and session based features. Packet based features are features extracted at the traffic packet level, such as time difference between packets in a session flow, packet size (or packet length) of each packet, payload size (or payload length) of each packet, and value changes of TCP window length per session. Session based features are features extracted at the session flow level, such as session flow duration, total bytes from client/server in each session, total number of packets from client/server in each session, and total length of IP packet header per session.

Feature engineering can be further applied to these packet based and session based features. Statistical features are obtained from conducting statistical calculations on such features. Common calculations include mean, median, maximum, minimum, variance, and standard deviation (STD) of packet based and session based features. The works in [14][19] selected packet based features such as packet size, payload size and the time difference between sessions as their features to train their detection models. In [20][32][76][77], the authors selected session based features and statistical features as their feature set. Liu et al. [12] extracted more than 80 packet based, session based, and statistical features, which are used in their detection models.

Many works selected their unique subsets from protocol-agnostic numerical features as their model training feature set based on the distinctive attributes of features. Celik et al. [17] defined tamper resistant features (e.g., IPratio and goodput), which are not related to port, flag or payload. These features are consistent and difficult for adversaries to mask. Stergiopoulos et al. [14] defined their TCP side channel features (e.g., ratio to previous packets), which reduce the size of the training dataset and the training time while keeping encrypted malicious traffic detection accuracy above 99%. Liu et al. [12] divided their extracted features into 4 different sub-categories based on the attributes of features: TCP/IP header features (e.g., IP header), time based features (e.g., average inter-arrival time of packets), length related features (e.g., packet length and payload length), and packet variation features (e.g., TCP window change times). These features are used for clustering. Then, different feature subsets are selected based on sequential forward selection (SFS) algorithms for their more than 20 supervised learning models. Sarkar et al. [77] selected time related features, such as flow duration, mean of backward packet time difference, STD of forward packet time difference, and STD of time difference for their model.

For protocol-specific features, features from the information of encrypted protocols are extracted instead. Such features frequently appear in the research of HTTPs traffic detection, as HTTPs is currently the dominant encrypted traffic type. Protocol-specific features extracted from HTTPs protocols are mainly referred to as TLS/SSL features. Such features are usually extracted from three different log files: the conn.log file, the ssl.log file, and the x509.log (certificate log) file. One of the most commonly used applications for generating logs from captured raw network traffic is Zeek IDS (Bro IDS). Zeek IDS generates such log files based on traffic and protocols by processing raw network traffic [24]. These log files are interconnected to one another through the uid (unique ids) column and the cert_chain_fuid (certificate identifiers) column. Features extracted from the conn.log file have certain overlaps with the protocol-agnostic numerical features. For example, its features also include the number of flow and payload bytes from client and server. The ssl.log file and the x509.log file are interconnected. Extracting x509 features is reliant on ssl.log information, and the relationship among these three files is
explained in Fig. 3. Examples of ssl log features include TLS version types and the ratio of the same issuer. X509 log features include public certificate key mean, mean of certificate validity, and average number of domains in Subject Alternative Name (SAN). The works in [11][13][25][28][29][33][92] used TLS/SSL features. In [11], 12 connection features, 6 SSL features, and 6 x509 (certificate) features were selected as the TLS/SSL feature set for CNN and RF models. In [28][33], the authors chose to combine protocol-agnostic numerical features and TLS/SSL features as their feature set.

Fig. 3. Relationship among protocol logs

For protocol-specific features, one significant limitation is that such features are only applicable to their related encrypted protocols. Therefore, if the dataset contains other encryption protocols, protocol-specific features will not be applicable anymore. On the other hand, the extraction of protocol-agnostic numerical features does not depend on any specific contents or information of the traffic communication and protocols. Thus, no matter what encryption mechanism is utilized, protocol-agnostic numerical features can always be extracted. However, since there is a large number of protocol-agnostic numerical features with different extraction logic, the extraction process is complex, time-consuming, and requires prior knowledge.

D. Feature Set Selection

It is important to note that more features do not imply better results. Having too many features in the feature set requires a high volume of memory and computation, extends the model training time and may also affect the model accuracy.

Common feature selection methods can be divided into two categories. The first one is domain expert based selection. Domain experts select a set of features that they think is the most appropriate based on their experience and knowledge. After that, such suggested features will be extracted directly from datasets and used as model input. In [28], the authors found that iterating the initial feature set and adding domain experts’ suggested features can greatly enhance the detection system’s performance. The RF ensemble method outperformed competing methods from an algorithmic perspective based on a feature set from the domain experts. They also mentioned that as compared with algorithm selection, feature engineering is more decisive and important. However, because of privacy concerns, the authors did not disclose their datasets or domain experts’ suggested feature set. Many studies [14][16][17][19][28][29][33] have used similar ways to manually select features for their experiments. However, it is not always reliable to manually choose the most important features. Human error may exist and humans may not be able to find non-intuitive features, which may greatly affect the performance of the detection models. Note that for domain expert based selection, the features will first be selected and this set of features will then be extracted from the traffic.

Another category of feature selection methods is based on machine learning self-selection with some algorithms. It either requires humans to extract all possible features, after which the selected algorithms rank and select the most suitable features as the final feature set, or lets the algorithms directly process raw data to learn and extract the required features by themselves. In [11], after the image augmentation and feature extraction, the authors utilized the mean decrease in impurity algorithm to rank features based on their importance. Lastly, they generated a feature set with top ranked features. Liu et al. [12] applied the SFS algorithm to increase the size of the feature set gradually until they found their optimal feature set. However, SFS uses a greedy algorithm, which may fall into a local optimum and is very time consuming. To avoid these limitations, the authors also further enhanced the SFS algorithm by adding random selection in each round. Finally, they combined feature sets with top performance to get their final feature set. Shekhawat et al. [29] extracted TLS/SSL features and applied recursive feature elimination (RFE) to iteratively eliminate features with the lowest ranking score. The authors applied the final feature set to XGBoost, SVM, and RF algorithms. For the RF model, they obtained nearly 99% accuracy while XGBoost achieved 99.15% accuracy. Desai et al. [30] proposed a feature ranking framework based on fundamental statistical tests. It is specifically designed for IoT device traffic classification, and can select important features from IoT network traffic, reducing cost and protecting user privacy. Their experiments indicate that the small number of features can achieve similar accuracy as compared to other existing methods. Many studies [13][20][32] are designed to use machine selection of the optimal features from a large set of extracted features without human intervention. On the other hand, [9][10][37][44][47] directly used raw data as the input of deep learning methods. The model can learn and extract the required features automatically. Zeng et al. [9] applied CNN, LSTM, and SAE to learn features from different aspects. For example, extracting features from the time based aspect was based on LSTM in [9]. Machine learning based feature selection can discover non-intuitive features, and can avoid human errors and bias at the same time. For this method, either the manual feature extraction process is skipped and raw data is fed to automated feature selection and extraction, or feature extraction is performed first to provide a full set of features which is then fed to automated feature selection. However, due to the black box nature of AI, especially when deep learning
algorithms are involved, it is very challenging to interpret the feature selection process and explain why features were selected as optimal. Additional research is required to solve the above issue [62][67].

E. Algorithms Selection

Many mature traditional methods for traffic detection, such as DPI, are no longer applicable to encrypted traffic. At present, traditional machine learning methods and deep learning methods are two mainstream research directions in this area.

a) Traditional Machine Learning Approach: For traditional machine learning, algorithms and feature set optimization are the main focus. For unsupervised learning, there are many works like Chen et al. [13], which proposed a three-stage hierarchical sampling approach by further developing density peaks clustering algorithm (THS-IDPC) based on grid screening, custom centre decision value and mutual neighbour degree (DPC-GS-MND). Experiments have shown that their DPC-GS-MND is better than average-linkage [86], DPC, modified density peak clustering algorithm (MDPCA) [87] and DPC-GS. Furthermore, the performance of their THS-IDPC based model is shown to be better than selective sampling [84], smart sampling [85] and hierarchical clustering based sampling (HCBS) [86]. The algorithm can significantly reduce the cost of computation and enhance the accuracy and efficiency of the encrypted malicious traffic detection model. However, the method in this work cannot handle the class imbalance problem. Furthermore, DPC-GS-MND uses the K nearest neighbour idea, but the K value has to be decided manually.

Hafeez et al. [20] proposed an IoT Keeper model, which is constructed from the fuzzy C-means (FCM) clustering algorithm with the fuzzy interpolation scheme, to perform malicious IoT traffic classification. Such an unsupervised learning algorithm does not need a fully labelled dataset as the model training input. Furthermore, the proposed model is not restricted to certain types of IoT devices. A novel mechanism, Adhoc Overlay networks, is also applied to the proposed model, which can actively strengthen the access control to IoT devices' network activities. However, IoT Keeper was not tested with encrypted malicious traffic and the authors listed that as a future research direction. Unsupervised deep learning like AE is further discussed in the next deep learning approach subsection.

On supervised learning, Meghdouri et al. [80] proposed an RF classification model based on a multi-key based approach, which is a novel cross-layer feature representation of traffic data under TLS and IPSec protocols. They tested the model using three different datasets, CICIDS-2017 [54], UNSW-NB15 [22] and ISCX-bot-2014 [72], which achieved 100%, 92.6% and 99.2% F1 scores, respectively. Comparative experiments were also conducted with other existing methods which used the same datasets. They showed that their model outperformed those methods.

Stergiopoulos et al. [14] defined their side channel features and applied their feature set to 7 different machine learning algorithms to verify the performance of these side channel features. The paper focuses on malicious traffic detection, but also includes encrypted traffic detection. The encrypted dataset was extracted from CTU-13 [48], FIRST [64], and Milicenso [66] and they obtained 99.8% accuracy under the CART model. Shekhawat et al. [29] tested XGBoost, SVM and RF algorithms with a machine selected feature set and achieved 99.8% accuracy with the XGBoost model.

Ma et al. [32] proposed an enhanced KNN algorithm, which is named WKNN-Selfada (feature weight self-adaptive algorithm for weighted feature KNN). The proposed algorithm improved the KNN distance calculation and also included a sub-algorithm that can choose the suitable feature set and feature weights. However, although the paper mentioned that the proposed algorithm can be used for encrypted traffic detection, the authors did not perform any experimental evaluation.

Niu et al. [16] proposed a heuristic statistical testing (HST) approach. The HST approach consists of 3 parts: a Jnetpcap-based Executor (handshake-skipping algorithm), an enhanced NIST Test Suite and an algorithm classifier (such as C4.5). The experiment data used in this paper were based on two proprietary protocols (Freegate and Ultrasurf) and one private custom unknown protocol. The dataset was not publicly released. A 10-fold cross validation was used in the experiment. The experiment shows that after applying the handshake skipping algorithm, the accuracy of the experiments improved significantly and that HST is more robust than the other tested machine learning methods.

Frameworks that combine supervised learning and unsupervised learning also exist in many works. In [12] the authors proposed a distance based encrypted malicious traffic identification framework, which comprises a series of detection models for different malware types. They first applied the Gaussian mixture model (GMM) and ordering points to identify the clustering structure (OPTICS) to classify the new malware class (FClass) based on the distance calculation between malwares. After that, 24 XGBoost encrypted malicious detection models are trained to identify the 24 kinds of malware. Comparative experiments have shown that their proposed method is better than the MalClassifier model in [96] and the CluClas model in [21].

b) Deep Learning Approach: Research on deep learning has grown rapidly in recent years and has achieved remarkable results. Deep learning based encrypted traffic detection has many obvious advantages, such as the ability to automatically extract the required data features through its own feature learning. It can also more easily find non-intuitive connections among traffic features that humans cannot.

A general deep learning based framework is proposed by Aceto et al. [98] for encrypted and mobile traffic classification. The proposed framework provides clear guidelines for designers in deep learning based traffic classification. It also overcomes the design limitations of the existing single-modality or single-task learning methods by jointly using multimodal and
multi-task techniques. Moreover, the authors test two implementations of the framework based on three datasets, which are generated by the activity of mobile users. Aceto et al. [99] also proposed a novel multimodal and multitask deep learning based approach for multipurpose encrypted traffic classification, DISTILLER. The proposed method overcomes the performance limitations of state-of-the-art single-mode deep learning models based on heterogeneous and structured traffic. It can also tackle different traffic categorization problems from different providers. A fair comparative experiment with 8 other existing deep learning based encrypted traffic classification methods is conducted using the public VPN-non-VPN dataset [55]. Experiment results indicate that DISTILLER performs better than the other methods.

Bazuhair et al. [11] proposed a new method which can enhance the generalization of CNN in encrypted malicious traffic detection. That work focused on HTTPS malicious traffic, and the authors developed a binary detection model. A new encoding method which can convert TLS/SSL features into images was proposed. Then, Perlin noise is utilized for data augmentation, which enhanced the generalization of the deep learning model. The CTU-13 [48] dataset is used in the experiment, and the CNN model achieved 97% accuracy, with a 0.4% false negative rate and a 5.6% false positive rate (FPR). De Lucia and Cotton [46] proposed a TLS malicious traffic classification method based on a public dataset, the Malware Capture Facility Project [49]. SVM and CNN were selected to conduct the experiment. The one-dimensional CNN achieved 99.91% for both accuracy and F1 with the Adam optimizer, a batch size of 32, 100 epochs and early stopping. The non-linear SVM with a radial basis function kernel achieved 99.97% for both accuracy and F1. That work was an enhancement of their previous work [82]. Wang et al. [37] proposed an end-to-end encrypted traffic classification method using VPN-non-VPN [55] traffic data. The data was split into segments of the same length and used as the model input. Furthermore, the end-to-end framework merged the feature extraction, selection and classification processes, so as to automatically learn and discover the required features for classification. The experiments showed that the one-dimensional CNN outperforms the two-dimensional CNN. The works in [34][43-45][88] also applied CNN in their experiments and achieved detection rates above 93%.

Yao et al. [15] proposed two encrypted traffic classification methods. The first method is based on LSTM with an attention mechanism and the second one is based on the hierarchical attention network (HAN). The VPN-non-VPN [55] dataset was used for their comparative experiments with attention based LSTM, HAN, Deep Packet [81], the one-dimensional CNN model [37], decision tree [35] and XGBoost. The experiment results indicated that their attention based LSTM model and HAN neural network model both outperformed the machine learning based model from [35] and the one-dimensional CNN model from [37]. Pascanu et al. [47] proposed a series of hybrid models that combined an echo state network (ESN) with a classifier (logistic regression or MLP) or an RNN with a classifier (logistic regression or MLP). ESN plus logistic regression with maximum pooling for non-linear sampling outperformed other models. Prasse et al. [25] proposed an encrypted malware detection model based on LSTM. The work focused on HTTPS traffic and a self-collected dataset built by using cloud web security (CWS) and VirusTotal, which helped the authors obtain enough malicious and legitimate traffic. The proposed detection model can classify different malware families, even for new unknown malware. A comparative experiment showed that the LSTM based model outperformed the RF detection model.

There are also works that combine CNN and RNN in their models. Lopez-Martin et al. [83] combined RNN with CNN to classify IoT traffic regardless of whether the traffic was encrypted. The proposed model achieved a 95.74% F1 score and 96.32% accuracy on an imbalanced dataset. The research conducted model training using 5 different feature sets to analyse the importance of feature set selection in model training. The authors also considered the impact of the length of the traffic session flow and the trade-off between computing time and detection rate. Their comparative experiment using different numbers of packets in each session indicated that keeping between five and fifteen packets in each session flow can achieve above 94.5% on each performance metric (accuracy, F1, precision, and recall). Furthermore, sessions with fewer packets than the predefined number are padded with zeros to ensure each session has the same number of packets. Wang et al. [36] proposed a hierarchical spatial-temporal features-based intrusion detection system (HAST-IDS) by combining CNN with LSTM. Spatial features and temporal features can be learned by CNN and LSTM, respectively. The resultant model reduced the FPR.

The works in [7-10][22][38-42][75][81] applied AE in their detection models with different self-collected or public datasets. A network intrusion detection approach (for both known and unknown attacks) based on a two-stage architecture, named H2ID, is proposed by Bovenzi et al. [75]. The first stage of this framework utilizes a novel multimodal deep auto-encoder (M2-DAE) to perform lightweight anomaly detection. The anomalous traffic is then classified into different types of attack traffic, such as scans and distributed denial-of-service, using soft-output classifiers in the second stage. The BotIoT [23] dataset is selected to validate the performance of the proposed approach. In [81], comparative experiments based on CNN and SAE were conducted using the VPN-non-VPN dataset [55]. CNN and SAE both achieved above 92% accuracy for application and traffic classification. Yang et al. [8] proposed an encrypted malicious traffic classifier that combines CNN and an LSTM auto-encoder, which achieved a 95.8% detection rate. Zeng et al. [9] proposed an encrypted traffic based intrusion detection framework, named Deep-Full-Range (DFR). The framework is constructed from CNN, LSTM, and SAE to self-learn and extract features from raw data input. The authors conducted comparative experiments with KNN and other existing works which used the VPN-non-VPN [55] and CICIDS 2012 [54] datasets to demonstrate that the proposed DFR is more accurate and robust. Xing et al. [10]
proposed an online detection model based on deep dictionary learning, D2LAD, to address noisy data labels, long training time and high variance in the traffic data distribution. The model can learn and extract sequential features from raw traffic data input based on a pre-trained LSTM auto-encoder. The authors showed that their proposed work achieved a 94.5% accuracy, which outperformed existing methods for online encrypted traffic detection.

Research on applying deep learning provides us with many very useful detection methods. Such methods include the algorithms based on feature self-learning, which were proposed to overcome the difficulty of traffic feature extraction and selection. These algorithms do not require human effort to extract traffic features; they extract the required features automatically. On the other hand, research on applying deep learning to encrypted malicious traffic detection and classification is limited.

V. COMPARATIVE EXPERIMENT

In this section, we perform two experiments to evaluate our experiment objectives. Specifically, our first experiment (referred to as Experiment 1) is to conduct a series of comparative experiments with different algorithms and feature sets to find a more reliable, consistent and fair result for each proposed existing work. By screening and analysing public datasets in Table I of Section IV.B.a, we curated a dataset composed entirely from public datasets that are applicable for encrypted malicious traffic detection. To the best of our knowledge, the dataset is more comprehensive and objective for use than those used by existing works. To prove this, a cross-dataset validation is also conducted in Experiment 2. Moreover, this composed dataset also allows a fair comparison among the proposed works for encrypted malicious traffic detection and classification. The second experiment (or Experiment 2) aims to provide some insights on whether protocol-agnostic numerical features or protocol-specific features are better in performance, and how well one fares against the other.

A. Data Collection

Existing research that adopts different public datasets may result in bias and the inability to compare results of proposed works in an objective manner. To ensure that experiments can be conducted as fairly and comprehensively as possible, we aim to collect, analyse and derive data from publicly available datasets on the Internet, to compose a dataset for encrypted malicious traffic detection and classification. At the same time, we attempt to include datasets from as many different sources as possible to expand the variety of encrypted traffic.

Our dataset is composed based on three criteria. The first criterion is to combine widely considered public datasets in existing works which contain both encrypted malicious and legitimate traffic, such as the Malware Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure data balance, i.e., the balance of malicious and legitimate network traffic and a similar size of network traffic contributed by each individual dataset. Thus, approximately equal numbers of malicious and legitimate traffic samples are extracted from each selected public dataset. We also ensured that no selected public dataset contributes a traffic size much larger than the other selected public datasets. The third criterion is that our dataset includes both conventional devices’ and IoT devices’ encrypted malicious and legitimate traffic, as these devices are increasingly being deployed and are working in the same environments such as offices, homes, and other smart city settings.

Based on the criteria, 5 public datasets are selected from Table I in Section III.B.a. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in Table II. Table II summarises the malicious and legitimate traffic size we selected from each public dataset by random sampling, the proportion of traffic selected from each public dataset with respect to the total traffic size of the composed dataset (% w.r.t the composed dataset), the proportion of encrypted traffic selected from each public dataset (% of selected public dataset), and the total traffic size of the composed dataset. From the table, we observe that each public dataset contributes approximately 20% of the composed dataset, except for CICIDS-2012 (due to its limited amount of encrypted malicious traffic). This achieves a balance across individual datasets and minimizes bias towards traffic belonging to any dataset during learning. We can also observe that the sizes of malicious and legitimate traffic are almost the same, thus achieving class balance. To facilitate research in this domain, we released our dataset [100] in Mendeley Data.

B. Feature extraction and selection

Protocol-agnostic numerical features and TLS/SSL features are extracted for experiments. Since there is no recognized optimal feature set, researchers have adopted different methods to select the feature set they think is the best, as discussed in Section IV. Therefore, for protocol-agnostic numerical feature extraction, in order to compare the performance of models with different feature sets reliably, we extracted applicable features mentioned in research papers on encrypted traffic detection. A total of more than 113 unique protocol-agnostic numerical features were obtained. Table III shows a few commonly used protocol-agnostic numerical features among those extracted.

As different works selected different feature sets, we conducted statistical analysis on the protocol-agnostic numerical features that appeared in those works and listed the features that appeared at high frequencies. In comparative experiments, the listed features are constructed as a feature set, named the Further Optimized Statistical (FOS) feature set.

5 different feature sets belonging to protocol-agnostic numerical features are used in experiments:
1. FOS feature set contains 14 features;
2. Top 10 ranked features feature set from [12], which are the top 10 ranked features used in their detection framework by using enhanced SFS algorithm;
TABLE II
THE SELECTED PUBLIC DATASETS IN EXPERIMENTS

| Public dataset | Year of release | Type of traffic | Malicious traffic size | Legitimate traffic size | Total traffic size | % w.r.t composed dataset | % of selected public dataset |
|---|---|---|---|---|---|---|---|
| UNSW NS 2019 Dataset [70] | 2019 | IoT Encrypted Traffic | 12900 sessions, 193500 packets | 13300 sessions, 199500 packets | 26200 sessions, 393000 packets | ~22% | ~60% |
| CICIDS-2017 [53] | 2018 | Conventional Encrypted Traffic | 13000 sessions, 195000 packets | 13500 sessions, 202500 packets | 26500 sessions, 397500 packets | ~23% | ~70% |
| CIC-AndMal 2017 [60] | 2018 | Conventional Encrypted Traffic | 12403 sessions, 132859 packets | 12400 sessions, 186000 packets | 24803 sessions, 318859 packets | ~21% | ~60% |
| Malware Capture Facility Project Dataset [49] | 2013 | Conventional Encrypted Traffic | 13600 sessions, 204000 packets | 12180 sessions, 182700 packets | 25780 sessions, 386700 packets | ~22% | ~50% |
| CICIDS-2012 [54] | 2012 | Conventional Encrypted Traffic | 7613 sessions, 69648 packets | 6731 sessions, 71310 packets | 14344 sessions, 140958 packets | ~12% | ~100% |
| Summary | | | 59516 sessions, 795007 packets | 58111 sessions, 842010 packets | 117627 sessions, 1637017 packets | | |
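The balance described for Table II can be checked with a few lines of Python. The sketch below hard-codes the session counts from Table II and computes each dataset's share of the composed dataset together with the overall class ratio. The dictionary layout and the helper name `composition_shares` are illustrative assumptions and are not part of the released dataset tooling.

```python
# Session counts (malicious, legitimate) copied from Table II.
SESSIONS = {
    "UNSW NS 2019": (12900, 13300),
    "CICIDS-2017": (13000, 13500),
    "CIC-AndMal 2017": (12403, 12400),
    "Malware Capture Facility Project": (13600, 12180),
    "CICIDS-2012": (7613, 6731),
}


def composition_shares(sessions):
    """Return each dataset's share (in %) of all sessions in the composed dataset."""
    total = sum(mal + leg for mal, leg in sessions.values())
    return {name: 100.0 * (mal + leg) / total for name, (mal, leg) in sessions.items()}


total_malicious = sum(mal for mal, _ in SESSIONS.values())
total_legitimate = sum(leg for _, leg in SESSIONS.values())
shares = composition_shares(SESSIONS)

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of the composed dataset")
print(f"malicious/legitimate session ratio: {total_malicious / total_legitimate:.3f}")
```

With these counts, the four larger datasets each contribute roughly 21 to 23% of the sessions and CICIDS-2012 about 12%, which agrees with the rounded percentages in the table.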
Feature Name 1 2 3 4 5
mean TCP windows size value X X
source port X X X
mean length of IP packet header X
maximum interval of arrival time of forward traffic X X X
mean Length of backward IP packet header X
maximum interval of arrival time of backward traffic X X X
STD of backward packet length X
flow duration X X X X
time duration of backward traffic X
total payload per session X
destination Port X
STD of time difference between packets per session X
minimum of time difference between packets per session X
STD of interval of arrival time of backward traffic X X
STD of interval of arrival time of forward traffic X
minimum of interval of arrival time of backward traffic X
mean of interval of arrival time of forward traffic X
mean of interval of arrival time of backward traffic X
minimum of interval of arrival time of forward traffic X
length of IP packets X
length of TCP payload X
payload Ratio X
ratio to previous packets in each session X
time difference between packets per session X
total length of forward payload X X
minimum length of TCP payload X
mean length of TCP payload X
median length of TCP payload X
STD of the length of IP packets X X
IPratio (maximum length of IP packets / minimum length of IP packets) X
goodput (Total length of IP packet per session / flow duration) X
maximum time difference between packets per session X
STD of forward packet length X
maximum length of TCP payload X
mean time to live X
STD of time to live X
time duration of forward traffic X
1 refers to the FOS feature set; 2 refers to the top 10 ranked features feature set; 3 refers to the side-channel feature set; 4 refers to the tamper resistant feature set; 5 refers to the time based feature set; 'X' indicates that the feature in this row is selected for the feature set in this column.
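Several of the features listed above are simple statistics over the packets of one session. As a minimal sketch, they could be computed as shown below. The `Packet` record, its field names, the output key names, and the use of the population standard deviation are assumptions made for illustration; this is not the extraction pipeline used in the experiments.

```python
from statistics import mean, pstdev
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical per-packet record; field names are illustrative."""
    timestamp: float   # capture time in seconds
    ip_length: int     # length of the IP packet in bytes
    forward: bool      # True if sent by the session initiator


def inter_arrival_times(packets):
    """Time differences between consecutive packets, ordered by timestamp."""
    times = sorted(p.timestamp for p in packets)
    return [b - a for a, b in zip(times, times[1:])]


def session_features(packets):
    """Compute a few protocol-agnostic numerical features for one non-empty session."""
    lengths = [p.ip_length for p in packets]
    duration = max(p.timestamp for p in packets) - min(p.timestamp for p in packets)
    gaps = inter_arrival_times(packets)
    fwd_gaps = inter_arrival_times([p for p in packets if p.forward])
    return {
        "flow_duration": duration,
        "max_time_diff": max(gaps) if gaps else 0.0,
        "std_time_diff": pstdev(gaps) if gaps else 0.0,
        "mean_fwd_iat": mean(fwd_gaps) if fwd_gaps else 0.0,
        "std_ip_length": pstdev(lengths),
        # IPratio: maximum length of IP packets / minimum length of IP packets
        "ip_ratio": max(lengths) / min(lengths),
        # goodput: total length of IP packets per session / flow duration
        "goodput": sum(lengths) / duration if duration > 0 else 0.0,
    }


session = [
    Packet(0.0, 60, True),
    Packet(0.5, 1500, False),
    Packet(1.5, 100, True),
    Packet(2.0, 40, False),
]
features = session_features(session)
```

None of these values depend on the payload or on protocol fields above the IP layer, which is what makes this kind of feature applicable to traffic from any encryption protocol.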
A= A=
Train Other Datasets A= A= A=
Malware Capture UNSW NS2019
and Test Dataset A CICIDS-2012 CICIDS-2017 CIC-AndMal2017
Facility Project Dataset Dataset
Accuracy 0.2868 0.8005 0.4318 0.4873 0.5840
roc auc 0.3026 0.8148 0.7451 0.5830 0.6222
FPR 0.6970 0.3799 0.0881 0.0507 0.0002
TPR 0.2702 0.9878 0.0018 0.0110 0.0018
A= A=
Train Dataset A and A= A= A=
Malware Capture UNSW NS2019
Test Other Datasets CICIDS-2012 CICIDS-2017 CIC-AndMal2017
Facility Project Dataset Dataset
Accuracy 0.5422 0.5190 0.6528 0.5165 0.4906
roc auc 0.6450 0.6552 0.7030 0.7783 0.4705
FPR 0.2260 0.0041 0.2500 0.0000 0.5283
TPR 0.2960 0.0107 0.5445 0.0000 0.5093
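The four metrics reported in the cross-dataset tables above can be derived from the true labels and the scores produced by a model. The following is a minimal pure-Python sketch, assuming that label 1 marks malicious traffic and that a 0.5 decision threshold is used. The function names are illustrative, and ROC-AUC is computed through the rank-based (Mann-Whitney) formulation rather than with any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN (1 = malicious, 0 = legitimate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """Probability that a random malicious sample outscores a random legitimate one.

    Ties count as 0.5. The pairwise loop is quadratic, which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Return accuracy, FPR, TPR and ROC-AUC; assumes both classes are present."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "FPR": fp / (fp + tn),   # legitimate traffic flagged as malicious
        "TPR": tp / (tp + fn),   # malicious traffic correctly detected
        "roc_auc": roc_auc(y_true, scores),
    }


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.2, 0.7, 0.3, 0.1]
metrics = evaluate(y_true, scores)
```

On a class-balanced test set a model can reach an accuracy near 0.5 while its TPR is close to zero, simply by labelling almost everything as legitimate. This is why TPR and FPR are worth reading alongside accuracy in the tables.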
TLS/SSL features. Although Naïve Bayes behaves differently, the performance of the Naïve Bayes detection model is lower than that of most other algorithms regardless of whether the FOS feature set or the TLS/SSL features are selected. In summary, among top performing algorithms, such as RF and XGBoost, the FOS feature set can achieve better performance than the TLS/SSL features regardless of whether ROC-AUC or accuracy is used as the performance evaluation measure. Therefore, it is feasible to consider replacing protocol-specific features with protocol-agnostic numerical features. Then, most research on encrypted malicious traffic detection will no longer be limited to HTTPS or any specific protocol, and a more comprehensive encrypted malicious traffic analysis can be carried out. Future research on feature selection and optimization for protocol-agnostic numerical features may also be more meaningful than for protocol-specific features.

VI. CONCLUSION

In this paper, we proposed a framework to study and analyse machine learning based encrypted traffic detection approaches. We reviewed existing research based on the proposed framework, including the research objective construction, traffic dataset collection and pre-processing, feature extraction and selection, algorithm selection and performance evaluation.

While some progress has been made in encrypted malicious traffic detection, challenges still exist. The first and most important issue to be addressed is the lack of a comprehensive, class balanced, realistic and convincing public dataset in the area of encrypted malicious traffic detection. The quality of the dataset used in a study has a direct impact on the final performance of its model. Thus, we analysed, processed and combined 5 publicly available datasets to construct a large and comprehensive dataset to facilitate future research and technology evaluation in this field.

In addition, we trained detection models using 10 machine learning algorithms and 5 feature sets, and conducted comparative experiments to analyse the performance of different protocol-agnostic numerical feature sets and algorithms using the same dataset. We also tested the feasibility of replacing TLS/SSL features with protocol-agnostic numerical features for future encrypted malicious traffic analysis.

Lastly, we also showed that studying only one or two kinds of encrypted traffic should be avoided. With the growth in the types of encryption protocols and proprietary protocols, the analysis of one or two kinds of encryption protocols is bound to play a small role moving forward. Furthermore, current approaches are mostly binary classification, and multi-class classification can be explored to obtain more fine-grained detection results.

REFERENCES

[1] S, G. Cisco ETA – Provides Solution for Detecting Malware in Encrypted Traffic. GBHackers On Security. Retrieved January 14, 2018, from https://gbhackers.com/cisco-eta-encrypted-traffic/
[2] Google transparency report. (n.d.). Retrieved April 28, 2021, from https://transparencyreport.google.com/https/overview?hl=en
[3] Cisco Cybersecurity Report. (n.d.). Retrieved 2018, from https://www.cisco.com/c/m/en_au/products/security/offers/annual-cybersecurity-report-2018.html
[4] Desai, D. 2020: The State of Encrypted Attacks. Zscaler. Retrieved February 24, 2021, from https://www.zscaler.com/blogs/security-research/2020-state-encrypted-attacks
[5] Cao, Zigang & Xiong, Gang & Zhao, Yong & Li, Zhenzhen & Guo, Li. (2014). A Survey on Encrypted Traffic Classification. Communications in Computer and Information Science. 490. 73-81. 10.1007/978-3-662-45670-5_8.
[6] Velan, Petr & Cermak, Milan & Celeda, Pavel & Drašar, Martin. (2015). A survey of methods for encrypted traffic classification and analysis. International Journal of Network Management. 25. 10.1002/nem.1901.
[7] Li, Ding & Zhu, Yuefei & Lin, Wei. (2017). Traffic Identification of Mobile Apps Based on Variational Autoencoder Network. 287-291. 10.1109/CIS.2017.00069.
[8] Yang, Jiwon & Lim, Hyuk. (2021). Deep Learning Approach for Detecting Malicious Activities Over Encrypted Secure Channels. IEEE Access. PP. 1-1. 10.1109/ACCESS.2021.3064561.
[9] Zeng, Yi & Gu, Huaxi & Wenting, Wei & Guo, Yantao. (2019). Deep-Full-Range: A Deep Learning Based Network Encrypted Traffic Classification and Intrusion Detection Framework. IEEE Access. PP. 1-1. 10.1109/ACCESS.2019.2908225.
[10] Xing, Junchi & Wu, Chunming. (2020). Detecting Anomalies in Encrypted Traffic via Deep Dictionary Learning. 734-739. 10.1109/INFOCOMWKSHPS50562.2020.9162940.
[11] Bazuhair, Wajdi & Lee, Wonjun. (2020). Detecting Malign Encrypted Network Traffic Using Perlin Noise and Convolutional Neural Network. 0200-0206. 10.1109/CCWC47524.2020.9031116.
[12] Liu, Jiayong & Tian, Zhiyi & Zheng, RongFeng & Liu, Liang. (2019). A Distance-Based Method for Building an Encrypted Malware Traffic Identification Framework. IEEE Access. PP. 1-1. 10.1109/ACCESS.2019.2930717.
[13] Chen, Liangchen & Gao, Shu & Liu, Baoxu & Lu, Zhigang & Jiang, Zhengwei. (2020). THS-IDPC: A three-stage hierarchical sampling method based on improved density peaks clustering algorithm for encrypted malicious traffic detection. The Journal of Supercomputing. 76. 10.1007/s11227-020-03372-1.
[14] Stergiopoulos, George & Talavari, Alexander & Bitsikas, Evangelos & Gritzalis, Dimitris. (2018). Automatic Detection of Various Malicious Traffic Using Side Channel Features on TCP Packets: 23rd European Symposium on Research in Computer Security, ESORICS 2018, Barcelona, Spain, September 3-7, 2018, Proceedings, Part I. 10.1007/978-3-319-99073-6_17.
[15] Yao, Haipeng & Liu, Chong & Zhang, Peiying & Wu, Sheng & Jiang, Chunxiao & Yu, Shui. (2019). Identification of Encrypted Traffic Through Attention Mechanism Based Long Short Term Memory. IEEE Transactions on Big Data. PP. 1-1. 10.1109/TBDATA.2019.2940675.
[16] Niu, Weina & Zhuo, Zhongliu & Zhang, Xiaosong & Du, Xiaojiang & Yang, Guowu & Guizani, Mohsen. (2019). A Heuristic Statistical Testing Based Approach for Encrypted Network Traffic Identification. IEEE Transactions on Vehicular Technology. PP. 1-1. 10.1109/TVT.2019.2894290.
[17] Celik, Z. Berkay & Walls, Robert & McDaniel, Patrick & Swami, Ananthram. (2015). Malware traffic detection using tamper resistant features. 330-335. 10.1109/MILCOM.2015.7357464.
[18] UNB-Canadian Institute for Cybersecurity (CIC) datasets. University of New Brunswick est.1785. (n.d.). https://www.unb.ca/cic/datasets/index.html.
[19] Nakahara, Masataka & Okui, Norihiro & Kobayashi, Yasuaki & Miyake, Yutaka. (2020). Machine Learning based Malware Traffic Detection on IoT Devices using Summarized Packet Data. 78-87. 10.5220/0009345300780087.
[20] Hafeez, Ibbad & Antikainen, Markku & Ding, Aaron Yi & Tarkoma, Sasu. (2020). IoT-KEEPER: Detecting Malicious IoT Network Activity using Online Traffic Analysis at the Edge. IEEE Transactions on Network and Service Management. PP. 1-1. 10.1109/TNSM.2020.2966951.
[21] Fahad, Adil & Alharthi, Kurayman & Tari, Zahir & Almalawi, Abdulmohsen & Khalil, Ibrahim. (2014). CluClas: Hybrid clustering-classification approach for accurate and efficient network classification. Proceedings - Conference on Local Computer Networks, LCN. 10.1109/LCN.2014.6925769.
[22] Kim, Jin-Young & Bu, Seok-Jun & Cho, Sung-Bae. (2018). Zero-day Malware Detection using Transferred Generative Adversarial Networks based on Deep Autoencoders. Information Sciences. 460-461. 10.1016/j.ins.2018.04.092.
[23] N. Koroniotis, N. Moustafa, E. Sitnikova, and B. Turnbull, “Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset,” FGCS, vol. 100, pp. 779–796, 2019.
[24] ZEEK INTRUSION DETECTION SERIES. Retrieved 2020. from http://ce.sc.edu/cyberinfra/docs/workshop/Zeek_Lab_Series.pdf
[25] Prasse, Paul & Machlica, Lukas & Pevný, Tomás & Havelka, Jiří & Scheffer, Tobias. (2017). Malware Detection by Analysing Encrypted Network Traffic with Neural Networks. 10.1007/978-3-319-71246-8_5.
[26] B. Yan & G. Han & Y. Huang & X. Wang. (2018). New traffic classification method for imbalanced network data. J. Comput. Appl., vol. 38, no. 1, pp. 20–25, 2018. [Online]. Available: http://www.joca.cn/EN/abstract/abstract21447.shtml
[27] Aceto, Giuseppe & Montieri, Antonio & Pescapè, Antonio & Ciuonzo, Domenico. (2019). Mobile Encrypted Traffic Classification Using Deep Learning: Experimental Evaluation, Lessons Learned, and Challenges. IEEE Transactions on Network and Service Management. PP. 10.1109/TNSM.2019.2899085.
[28] Anderson, Blake & McGrew, David. (2017). Machine Learning for Encrypted Malware Traffic Classification: Accounting for Noisy Labels and Non-Stationarity. 1723-1732. 10.1145/3097983.3098163.
[29] Shekhawat, Anish & Di Troia, Fabio & Stamp, Mark. (2019). Feature Analysis of Encrypted Malicious Traffic. Expert Systems with Applications. 125. 10.1016/j.eswa.2019.01.064.
[30] Desai, Bharat & Divakaran, Dinil Mon & Nevat, Ido & Peters, Gareth & Gurusamy, Mohan. (2019). A feature-ranking framework for IoT device classification. 10.1109/COMSNETS.2019.8711210.
[31] Aceto, Giuseppe & Ciuonzo, Domenico & Montieri, Antonio & Persico, Valerio & Pescapè, Antonio. (2019). MIRAGE: Mobile-app Traffic Capture and Ground-truth Creation. 10.1109/CCCS.2019.8888137.
[32] Ma, & Yanhua, Du & Cao,. (2020). Improved KNN Algorithm for Fine-Grained Classification of Encrypted Network Flow. Electronics. 9. 324. 10.3390/electronics9020324.
[33] Anderson, Blake & Paul, Subharthi & McGrew, David. (2018). Deciphering Malware’s use of TLS (without Decryption). Journal of Computer Virology and Hacking Techniques. 14. 10.1007/s11416-017-0306-6.
[34] Zhou, Huiyi & Wang, Yong & Lei, Xiaochun & Liu, Yuming. (2017). A Method of Improved CNN Traffic Classification. 177-181. 10.1109/CIS.2017.00046.
[35] Habibi Lashkari, Arash & Draper Gil, Gerard & Mamun, Mohammad & Ghorbani, Ali. (2016). Characterization of Encrypted and VPN Traffic Using Time-Related Features. 10.5220/0005740704070414.
[36] Wang, Wei & Sheng, Y. & Wang, Jinlin & Zeng, Xuewen & Ye, Xiaozhou & Huang, Yongzhong & Zhu, Ming. (2017). HAST-IDS: Learning Hierarchical Spatial-Temporal Features using Deep Neural Networks to Improve Intrusion Detection. IEEE Access. PP. 1-1. 10.1109/ACCESS.2017.2780250.
[37] Wang, Wei & Zhu, Ming & Wang, Jinlin & Zeng, Xuewen & Yang, Zhongzhen. (2017). End-to-end encrypted traffic classification with one-dimensional convolution neural networks. 43-48. 10.1109/ISI.2017.8004872.
[38] Min, Erxue & Long, Jun & Liu, Qiang & Cui, Jianjing & Cai, Zhiping & Ma, Junbo. (2018). SU-IDS: A Semi-supervised and Unsupervised Framework for Network Intrusion Detection: 4th International Conference, ICCCS 2018, Haikou, China, June 8–10, 2018, Revised Selected Papers, Part III. 10.1007/978-3-030-00012-7_30.
[39] Meidan, Yair & Bohadana, Michael & Mathov, Yael & Mirsky, Yisroel & Breitenbacher, Dominik & Shabtai, Asaf & Elovici, Yuval. (2018). N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders.
[40] Höchst, Jonas & Baumgärtner, Lars & Hollick, Matthias & Freisleben, Bernd. (2017). Unsupervised Traffic Flow Classification Using a Neural Autoencoder. 10.1109/LCN.2017.57.
[41] Yu, Yang & Long, Jun & Cai, Zhiping. (2017). Network Intrusion Detection through Stacking Dilated Convolutional Autoencoders. Security and Communication Networks. 2017. 1-10. 10.1155/2017/4184196.
[42] Li, Yuancheng & Ma, Rong & Jiao, Runhai. (2015). A Hybrid Malicious Code Detection Method based on Deep Learning. International Journal of Software Engineering and Its Applications. 9. 205-216. 10.14257/ijseia.2015.9.5.21.
[43] Min, Erxue & Long, Jun & Liu, Qiang & Cui, Jianjing & Chen, Wei. (2018). TR-IDS: Anomaly-Based Intrusion Detection through Text-Convolutional Neural Network and Random Forest. Security and Communication Networks. 2018. 1-9. 10.1155/2018/4943509.
[44] Wang, Wei & Zhu, Ming & Zeng, Xuewen & Ye, Xiaozhou & Sheng, Y. (2017). Malware traffic classification using convolutional neural network for representation learning. 712-717. 10.1109/ICOIN.2017.7899588.
[45] Chen, Zhitang & He, Ke & Li, Jian & Geng, Yanhui. (2017). Seq2Img: A sequence-to-image based approach towards IP traffic classification using convolutional neural networks. 1271-1276. 10.1109/BigData.2017.8258054.
[46] De Lucia, Michael & Cotton, Chase. (2019). Detection of Encrypted Malicious Network Traffic using Machine Learning. 1-6. 10.1109/MILCOM47813.2019.9020856.
[47] Pascanu, Razvan & Stokes, Jack & Sanossian, Hermineh & Marinescu, Mady & Thomas, Anil. (2015). Malware classification with recurrent networks. 1916-1920. 10.1109/ICASSP.2015.7178304.
[48] CTU-13 dataset, CTU University, Czech Republic, 2011, from https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-1/
[49] M. J. Erquiaga and S. Garcia, Malware capture facility project, CVUT University, 2013, from https://mcfp.weebly.com.
[50] Dong, Shuaike & Li, Zhou & Tang, Di & Chen, Jiongyi & Sun, Menghan & Zhang, Kehuan. (2020). Your Smart Home Can’t Keep a Secret: Towards Automated Fingerprinting of IoT Traffic. 47-59. 10.1145/3320269.3384732.
[51] Sebastian Garcia, Agustin Parmisano, & Maria Jose Erquiaga. (2020). IoT-23: A labeled dataset with malicious and legitimate IoT network traffic (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4743746
[52] B. Duncan, “Malware traffic analysis,” Jul. 2020, [Online]. Available: https://www.malware-traffic-analysis.net/
[53] Sharafaldin, Iman & Habibi Lashkari, Arash & Ghorbani, Ali. (2018). Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. 108-116. 10.5220/0006639801080116.
[54] Shiravi, Ali & Shiravi, Hadi & Tavallaee, Mahbod & Ghorbani, Ali. (2012). Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Computers & Security. 31. 357-374. 10.1016/j.cose.2011.12.012.
[55] Habibi Lashkari, Arash & Draper Gil, Gerard & Mamun, Mohammad & Ghorbani, Ali. (2016). Characterization of Encrypted and VPN Traffic Using Time-Related Features. 10.5220/0005740704070414.
[56] Conti, Mauro & Li, Qianqian & Maragno, Alberto & Spolaor, Riccardo. (2017). The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis. IEEE Communications Surveys & Tutorials. PP. 10.1109/COMST.2018.2843533.
[57] Liberatore, Marc & Levine, Brian. (2006). Inferring the source of encrypted HTTP connections. 255-263. 10.1145/1180405.1180437.
[58] MAWI Working Group Traffic Archive. (n.d.). WIDE Project. from http://mawi.wide.ad.jp/mawi/
[59] Abbasi, Mahmoud & Shahraki, Amin & Taherkordi, Amir. (2021). Deep learning for Network Traffic Monitoring and Analysis (NTMA): A survey. Computer Communications. 170. 10.1016/j.comcom.2021.01.021.
[60] Habibi Lashkari, Arash & Abdul kadir, Andi Fitriah & Taheri, Laya & Ghorbani, Ali. (2018). Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification. 1-7. 10.1109/CCST.2018.8585560.
[61] Zhang, Chaoyun & Patras, Paul & Haddadi, Hamed. (2018). Deep Learning in Mobile and Wireless Networking: A Survey. IEEE Communications Surveys & Tutorials. PP. 10.1109/COMST.2019.2904897.
[62] Nascita, Alfredo & Montieri, Antonio & Aceto, Giuseppe & Ciuonzo, Domenico & Persico, Valerio & Pescapè, Antonio. (2021). XAI meets Mobile Traffic Classification: Understanding and Improving Multimodal Deep Learning Architectures. IEEE Transactions on Network and Service Management. PP. 10.1109/TNSM.2021.3098157.
[63] Traffic Data from Kyoto University’s Honeypots. from http://www.takakura.com/Kyoto_data/
[64] First.org, Hands-on Network Forensics - Training PCAP dataset from FIRST 2015. from www.first.org/_assets/conf2015/networkforensics_virtualbox.zip
[65] UNIBS: Data sharing. (n.d.). UNIBS-2009. from http://netweb.ing.unibs.it/%7Entw/tools/traces/
[66] Milicenso, Ponmocup Malware dataset (Update 2012-10-07, http://security-research.dyndns.org/pub/botnet/ponmocup/analysis_2012-10-05/analysis.txt Accessed 1 Jan 2018)
[67] Rai, Arun. (2019). Explainable AI: from black box to glass box. Journal of the Academy of Marketing Science. 48. 10.1007/s11747-019-00710-5.
[68] Berman, Daniel & Buczak, Anna & Chavis, Jeffrey & Corbett, Cherita. (2019). A Survey of Deep Learning Methods for Cyber Security. Information. 10. 122. 10.3390/info10040122.
[69] Moustafa, Nour & Slay, Jill. (2015). UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). 10.1109/MilCIS.2015.7348942.
[70] Hamza, Ayyoob & Habibi Gharakheili, Hassan & Benson, Theophilus & Sivaraman, Vijay. (2019). Detecting Volumetric Attacks on IoT Devices via SDN-Based Monitoring of MUD Activity. SOSR ’19: Proceedings of the 2019 ACM Symposium on SDN Research. 36-48. 10.1145/3314148.3314352.
[71] iTrust. (2021, March 29). Labs Dataset Info. Available: https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/
[72] Beigi, Elaheh & Jazi, Hossein & Stakhanova, Natalia & Ghorbani, Ali. (2014). Towards effective feature selection in machine learning-based botnet detection approaches. 2014 IEEE Conference on Communications and Network Security, CNS 2014. 247-255. 10.1109/CNS.2014.6997492.
[73] ZHAO, Bo & GUO, Hong & LIU, Qin-Rang & WU, Jiang-Xing. (2013). Protocol Independent Identification of Encrypted Traffic Based on Weighted Cumulative Sum Test. Ruan Jian Xue Bao/Journal of Software. 24. 1334-1345. 10.3724/SP.J.1001.2013.04279.
[74] Yu, Tangda & Zou, Futai & Li, Linsen & Yi, Ping. (2019). An Encrypted Malicious Traffic Detection System Based on Neural Network. 62-70. 10.1109/CyberC.2019.00020.
[75] Bovenzi, Giampaolo & Aceto, Giuseppe & Ciuonzo, Domenico & Persico, Valerio & Pescapè, Antonio. (2020). A Hierarchical Hybrid Intrusion Detection Approach in IoT Scenarios. 10.1109/GLOBECOM42002.2020.9348167.
[76] Dolgikh, Serge & Seddigh, Nabil & Nandy, Bis & Bennett, Dan & Zeidler, Colin & Ren, Yonglin & Knoetze, Juhandre & Muthyala, Naveen. (2019). A Framework & System for Classification of Encrypted Network Traffic using Machine Learning. 10.23919/CNSM46954.2019.9012662.
[77] Sarkar, Debmalya & Vinod, P. & Yerima, Suleiman. (2020). Detection of Tor Traffic using Deep Learning. 10.1109/AICCSA50499.2020.9316533.
[78] CICFlowMeter (2017). Canadian institute for cybersecurity (cic). from https://www.unb.ca/cic/research/applications.html
[79] Khalife, Jawad & Hajjar, Amjad & Díaz-Verdejo, Jesús. (2014). A multilevel taxonomy and requirements for an optimal traffic-classification model. International Journal of Network Management. 24. 10.1002/nem.1855.
[80] Meghdouri, Fares & Iglesias Vázquez, Félix & Zseby, Tanja. (2020). Cross-Layer Profiling of Encrypted Network Data for Anomaly Detection. 469-478. 10.1109/DSAA49011.2020.00061.
[81] Lotfollahi, Mohammad & Shirali hossein zade, Ramin & Jafari Siavoshani, Mahdi & Saberian, Mohammadsadegh. (2020). Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning. Soft Computing. 24. 10.1007/s00500-019-04030-2.
[82] De Lucia, Michael & Cotton, Chase. (2018). Identifying and detecting applications within TLS traffic. 31. 10.1117/12.2305256.
[83] Lopez-Martin, Manuel & Carro, Belén & Sanchez-Esguevillas, Antonio & Lloret, Jaime. (2017). Network Traffic Classifier With Convolutional and Recurrent Neural Networks for Internet of Things. IEEE Access. PP. 1-1. 10.1109/ACCESS.2017.2747560.
[84] Androulidakis, Georgios & Papavassiliou, S. (2008). Improving network anomaly detection via selective flow-based sampling. Communications, IET. 2. 399-409. 10.1049/iet-com:20070231.
[85] Duffield, Nick & Lund, Carsten. (2003). Predicting resource usage and estimation accuracy in an IP flow measurement collection infrastructure. 179-191. 10.1145/948205.948228.
[86] Su, Liya & Yao, Yepeng & Li, Ning & Liu, Junrong & Lu, Zhigang & Liu, Baoxu. (2018). Hierarchical Clustering Based Network Traffic Data Reduction for Improving Suspicious Flow Detection. 744-753. 10.1109/TrustCom/BigDataSE.2018.00108.
[87] Yang, Yanqing & Zheng, Kangfeng & Wu, Chunhua & Niu, Xinxin & Yang, Yixian. (2019). Building an Effective Intrusion Detection System Using the Modified Density Peak Clustering Algorithm and Deep Belief Networks. Applied Sciences. 9. 238. 10.3390/app9020238.
[88] Zhang, Surong & Bu, Youjun & Chen, Bo & Lu, Xiangyu. (2021). Transfer Learning for Encrypted Malicious Traffic Detection Based on Efficientnet. 72-76. 10.1109/CTISC52352.2021.00021.
[89] Wang, Pan & Ye, Feng & Chen, Xuejiao & Qian, Yi. (2018). DataNet: Deep Learning based Encrypted Network Traffic Classification in SDN Home Gateway. IEEE Access. PP. 1-1. 10.1109/ACCESS.2018.2872430.
[90] Torroledo, Ivan & Camacho, Luis & Correa Bahnsen, Alejandro. (2018). Hunting Malicious TLS Certificates with Deep Neural Networks. 64-73. 10.1145/3270101.3270105.
[91] Vu, Ly & Tra, Dong & Nguyen, Uy. (2016). Learning from imbalanced data for encrypted traffic identification problem. 147-152. 10.1145/3011077.3011132.
[92] Anderson, Blake & McGrew, David. (2016). Identifying Encrypted Malware Traffic with Contextual Flow Data. 35-46. 10.1145/2996758.2996768.
[93] Zhang, Meng & Zhang, Hongli & Zhang, Bo & Lu, Gang. (2013). Encrypted Traffic Classification Based on an Improved Clustering Algorithm. Communications in Computer and Information Science. 320. 124-131. 10.1007/978-3-642-35795-4_16.
[94] Shahbar, Khalid & Zincir-Heywood, A. (2018). How far can we push flow analysis to identify encrypted anonymity network traffic? 1-6. 10.1109/NOMS.2018.8406156.
[95] Wang, Pan & Chen, Xuejiao & Ye, Feng & Zhixin, Sun. (2019). A Survey of Techniques for Mobile Service Encrypted Traffic Classification Using Deep Learning. IEEE Access. PP. 1-1. 10.1109/ACCESS.2019.2912896.
[96] AlAhmadi, Bushra & Martinovic, Ivan. (2018). MalClassifier: Malware family classification using network flow sequence behaviour. 1-13. 10.1109/ECRIME.2018.8376209.
[97] ZHAI M F & ZHANG X M & ZHAO B. (2020). Survey of encrypted malicious traffic detection based on deep learning[J]. Chinese Journal of Network and Information Security, 6(3): 59-70.
[98] Aceto, Giuseppe & Ciuonzo, Domenico & Montieri, Antonio & Pescapè, Antonio. (2020). Toward Effective Mobile Encrypted Traffic Classification through Deep Learning. Neurocomputing. 409. 10.1016/j.neucom.2020.05.036.
[99] Aceto, Giuseppe & Ciuonzo, Domenico & Montieri, Antonio & Pescapè, Antonio. (2021). DISTILLER: Encrypted Traffic Classification via Multimodal Multitask Deep Learning. Journal of Network and Computer Applications. 183-184. 10.1016/j.jnca.2021.102985.
[100] Wang, Zihao; Fok, Kar Wai; Thing, Vrizlynn (2021), “Composed Encrypted Malicious Traffic Dataset for machine learning based encrypted malicious traffic analysis.”, Mendeley Data, V1, doi: 10.17632/ztyk4h3v6s.1, https://data.mendeley.com/datasets/ztyk4h3v6s/1