Machine Learning for Encrypted Malicious Traffic Detection: Approaches, Datasets and Comparative Study
Zihao Wang, Kar-Wai Fok, Vrizlynn L. L. Thing
Cybersecurity Strategic Technology Centre
ST Engineering
Singapore
{zihao.wang, fok.karwai}@stengg.com, [email protected]
arXiv:2203.09332v1 [cs.CR] 17 Mar 2022

Abstract—As people’s demand for personal privacy and data security becomes a priority, encrypted traffic has become mainstream in the cyber world. However, traffic encryption is also shielding malicious and illegal traffic introduced by adversaries from being detected. This is especially so in the post-COVID-19 environment, where malicious traffic encryption is growing rapidly. Common security solutions that rely on plain payload content analysis, such as deep packet inspection, are rendered useless. Thus, machine learning based approaches have become an important direction for encrypted malicious traffic detection. In this paper, we formulate a universal framework of machine learning based encrypted malicious traffic detection techniques and provide a systematic review. Furthermore, current research adopts different datasets to train models due to the lack of well-recognized datasets and feature sets. As a result, their model performance cannot be compared and analyzed reliably. Therefore, in this paper, we analyse, process and combine datasets from 5 different sources to generate a comprehensive and fair dataset to aid future research in this field. On this basis, we also implement and compare 10 encrypted malicious traffic detection algorithms. We then discuss challenges and propose future directions of research.

Index Terms—encrypted malicious traffic detection, traffic classification, machine learning, deep learning.

I. INTRODUCTION

In recent years, with the increasing demand for privacy and data security, enterprises are choosing to use encryption mechanisms to protect their application traffic transmission. In view of this trend, encrypted traffic volume has increased dramatically in the global communication network. Gartner has reported that more than 80% of traffic in 2019 was encrypted [1]. The trend of increasing traffic encryption from 2014 to the beginning of 2021, from the Google Transparency Report, is shown in Fig. 1. The percentage of encrypted web traffic on the Internet has increased from around 50% in 2014 to around 95% after 2020, and 97% of the world’s top 100 sites use the secure hypertext transfer protocol (HTTPS) [2].

Fig. 1. Trend of encrypted traffic across Google [2]

While most people and enterprises enjoy the traffic privacy and data protection provided by encrypted traffic, adversaries are also leveraging encryption to evade detection of their malicious activities. Encrypted traffic poses a great challenge to antivirus software and firewalls, which cannot decipher the traffic contents. In a 2018 cybersecurity report, Cisco noted that as of October 2017, about 280,000 of the 400,000 analyzed malware samples had communicated through encryption methods [3]. According to Zscaler’s 2020 Encrypted Attacks Report [4], there was an increase of more than 260% in Secure Sockets Layer (SSL) based threats and of more than 500% in ransomware that utilized encrypted web traffic. Therefore, starting from the COVID-19 period, we are continuing to see accelerated growth in malicious traffic encryption. The report also pointed out that companies are at greater risk now because current cyber security systems cannot inspect 100% of the network traffic. Therefore, it is timely to further investigate the detection of encrypted malicious traffic and the efficacy of existing research in dealing with this challenge.

The goal of this survey is to provide a comprehensive overview of machine learning based methods for encrypted malicious traffic detection. We also propose a framework to aid the systematic discussion and analysis of machine learning based encrypted malicious traffic detection models. We also create a model training dataset composed of public traffic data from various sources, which is more comprehensive than the datasets used in existing research. This dataset is used to conduct fair comparative experiments on the different proposed feature sets and algorithms. Finally,
the current challenges and future directions are discussed.

Therefore, the contributions of this paper are:
• Review and discuss the strengths and limitations of the different techniques, and compare them, based on the machine learning framework we propose.
• Conduct a relatively comprehensive collation and analysis of the traffic datasets currently available, and analyse their characteristics, limitations and applicability. Such analysis enables the datasets to better serve machine learning based approaches.
• Categorise the features of machine learning models, and discuss their strengths, limitations, applicable scenarios, and possible optimization directions.
• Conduct comparative experiments based on the same training dataset, with different algorithms and feature sets, to obtain more reliable and consistent results. To the best of our knowledge, the dataset we built is the most comprehensive training dataset composed entirely from open public data sources.

This paper is organized into six sections. Section 2 presents the related surveys. In Section 3, we provide an introduction to the different detection techniques and describe a general framework of machine learning based encrypted malicious traffic detection. The strengths, limitations and comparison of the existing works based on our proposed framework are discussed in Section 4. In Section 5, we present the setup of our experiments and conduct the performance evaluations. We conclude the paper in Section 6 by discussing the remaining challenges and future directions.

II. RELATED SURVEYS

In this section, we introduce related survey works. Velan et al. [6] provided detailed definitions and information on widely used traffic encryption protocols in their survey paper. It also surveyed works on the classification of encrypted traffic before 2014. In contrast, we intend to conduct an in-depth study of existing research to analyse the various detection methods that have emerged in recent years.

Conti et al. [56] conducted an in-depth survey of the state of the art in the analysis of network traffic generated by mobile devices. They presented three criteria for a systematic classification of existing works: the objective of traffic analysis, the network traffic capture point, and the targeted mobile platforms. Zhang et al. [61] bridged the gap between deep learning and mobile and wireless networks through a comprehensive review of the crossovers between these two areas. The authors reviewed almost 600 research papers and also discussed techniques and platforms which enhance the deployment of deep learning in mobile environments. Wang et al. [95] focused on applying deep learning methods to the encrypted traffic classification of mobile services, covering dataset selection, model input design and model architecture. The authors also proposed a general framework for deep learning based mobile encrypted traffic classification. Furthermore, they pinpointed some noteworthy problems and challenges of deep learning in encrypted traffic classification as well. Although [56][61][95] include works that are applicable to both encrypted and unencrypted traffic, they do not pay much attention to malicious traffic detection and focus on mobile devices. In our review paper, we not only include works related to encrypted malicious traffic detection but also focus on platforms other than mobile devices.

Berman et al. [68] provided a comprehensive review of deep learning based cyber security applications, including network traffic identification, network intrusion detection, malware classification and several others. Furthermore, the authors also provide a short introduction to each deep learning algorithm, such as restricted Boltzmann machines, recurrent neural networks (RNN) and generative adversarial networks (GAN), and cover a wide range of attack types, such as malware, botnets and spam. Abbasi et al. [59] presented a review of deep learning methods for network traffic monitoring and analysis (NTMA). Like [68], the authors also provide detailed definitions and fundamental background of deep learning algorithms, such as the multi-layer perceptron (MLP), convolutional neural networks (CNN), long short-term memory (LSTM), auto-encoder (AE) and GAN models. Aceto et al. [98] presented an overview of the key subjects of traffic analysis where deep learning is expected to be attractive. The authors also provide a systematic taxonomy and categorize the existing deep learning based traffic classifications. Then, they proposed a general deep learning based framework for encrypted and mobile traffic classification with a rigorous definition of milestones (traffic object; type(s) of input data; tasks of classification; and deep learning architecture). Zhai et al. [97] proposed a ‘six-step method’ framework for encrypted malicious traffic detection. The authors also reviewed the existing deep learning based encrypted malicious traffic detection approaches based on the proposed framework. Twenty existing public datasets are also sorted and discussed based on their applicable scenarios, with their pros and cons. In contrast, our study provides a more comprehensive review of not only deep learning based approaches but also traditional machine learning based approaches, together with a detailed feature analysis.

III. PRELIMINARIES

Many mature technologies and methods for the detection and classification of unencrypted traffic, such as payload-based deep packet inspection (DPI) methods and port-based identification methods, already exist. However, with traffic encryption mechanisms, these traditional methods are no longer applicable. Moreover, DPI methods have raised concerns over user data privacy. On the other hand, port-based traffic identification methods assume that applications use the well-known Transmission Control Protocol/User Datagram Protocol (TCP/UDP) port numbers assigned by the Internet Assigned Numbers Authority (IANA). Therefore, once applications do not follow the IANA standards, for example by using dynamic ports or encryption protocols like the Peer to Peer (P2P) protocol, port-based traffic identification methods can no longer identify the traffic and applications [16].
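The port-based assumption described above can be illustrated with a minimal Python sketch. The port table is a small, illustrative subset of IANA assignments, and the `Flow` record and `classify_by_port` function are hypothetical names introduced only for this example; it is not the method of [16] or of any surveyed work. Flows on non-standard or dynamic ports fall through unlabelled, which is the failure mode noted above.

```python
from typing import NamedTuple, Optional

# Illustrative subset of IANA well-known port assignments, keyed by
# (transport protocol, port). The real registry has thousands of entries.
WELL_KNOWN_PORTS = {
    ("tcp", 22): "ssh",
    ("tcp", 53): "dns",
    ("udp", 53): "dns",
    ("tcp", 80): "http",
    ("tcp", 443): "https",
}


class Flow(NamedTuple):
    """Hypothetical minimal flow record used only for this illustration."""
    src_port: int
    dst_port: int
    protocol: str  # "tcp" or "udp"


def classify_by_port(flow: Flow) -> Optional[str]:
    """Label a flow purely from its port numbers, as port-based methods do.

    Returns None when neither port appears in the well-known table.
    """
    # Try the destination port first (client -> server), then the source
    # port (server -> client responses).
    for port in (flow.dst_port, flow.src_port):
        label = WELL_KNOWN_PORTS.get((flow.protocol, port))
        if label is not None:
            return label
    return None


if __name__ == "__main__":
    examples = [
        Flow(51512, 443, "tcp"),   # TLS on its standard port: labelled "https"
        Flow(51513, 8443, "tcp"),  # TLS on a non-standard port: not recognised
        Flow(40000, 6881, "udp"),  # P2P on a dynamic port: not recognised
    ]
    for flow in examples:
        print(flow, "->", classify_by_port(flow))
```

Even when a label is returned, it says nothing about intent: the first flow would receive the same "https" label whether it carries a legitimate web session or an encrypted command-and-control channel, which is part of why the surveyed works turn to flow statistics and learned features instead.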
To perform traffic encryption, there is currently a variety of mechanisms, such as SSL, Transport Layer Security (TLS), Virtual Private Network (VPN), Secure Shell Protocol (SSH), and P2P. These mechanisms work differently: some operate at the transport layer while others operate at the application layer, making encrypted traffic classification a challenging task [5]. Even with the same encryption mechanism, the encrypted traffic presents varying data distribution characteristics due to the different distribution and utilization of the original traffic [15]. Thus, most of the research focuses on the binary classification of encrypted traffic, which is to identify malicious traffic among legitimate traffic.

Flow-based machine learning methods have been observed to be the most common approach for encrypted traffic classification, such as [11-17][31-33][75-77]. However, it is worth noting that the collection of training datasets and the selection of features for encrypted traffic detection remain an area of vigorous research.

In recent years, researchers have proposed many detection methods based on machine learning, which can be generally classified into traditional machine learning and deep learning. Traditional machine learning can be further classified into two sub-groups: supervised learning and unsupervised learning. In the field of supervised learning, Shekhawat et al. [29] applied three supervised learning algorithms (Random Forest (RF), Support Vector Machine (SVM), and XGBoost) to the problem of distinguishing HTTPS malicious traffic from HTTPS legitimate traffic. Stergiopoulos et al. [14] conducted comparative experiments using seven different supervised learning algorithms, such as k-nearest neighbors (KNN), Classification And Regression Trees (CART), and Naïve Bayes, to detect malicious traffic in a dataset with more than one encryption protocol. Ma et al. [32] proposed an enhanced KNN algorithm, which improves the KNN distance calculation, to train an encrypted traffic detection model. For unsupervised learning, Chen et al. [13] proposed an improved density peaks clustering algorithm to enhance the accuracy and efficiency of encrypted malicious traffic detection. Celik et al. [17] compared the performance of K-means, one-class support vector machine (OCSVM), least squares anomaly detection (LSAD), and KNN algorithms using tamper-resistant features, such as goodput and the ratio of the maximum packet to the minimum packet. Zhang et al. [93] proposed a novel clustering algorithm for identifying encrypted traffic by applying the harmonic mean to clustering distance metrics. As for research on deep learning in encrypted traffic detection, most researchers focus on studying the performance of CNN, RNN, and AE. Bazuhair et al. [11] proposed a new encoding method in which TLS/SSL connection features are converted into images. Such an image dataset is used to train the CNN model. Yao et al. [15] proposed an RNN with an attention mechanism for encrypted traffic classification. Prasse et al. [25] applied a combination of neural domain name features and numeric flow features as the feature set for an RNN, and showed that this combination outperformed other feature sets. The authors used their self-generated dataset for this work. Lotfollahi et al. [81] compared the experimental performance of CNN and Stacked Auto-encoder (SAE) for encrypted traffic detection on the same dataset.

Khalife et al. [79] provided a fairly comprehensive framework for network traffic classification. They divided traffic detection and classification into three stages, namely data input, classification technique selection, and final result output. For each stage, they listed the current technologies and methods in a general way. In the classification technique selection stage, there are machine learning methods, statistical methods (i.e., heuristics), and graphical technique methods (e.g., motifs and graphlets). However, the paper lacks an in-depth technical analysis of machine learning based encrypted traffic classification techniques. In this work, we analyse the process of machine learning based encrypted traffic classification from different studies and formulate our framework for the process.

IV. METHODS REVIEW BASED ON OUR FRAMEWORK

Fig. 2 illustrates our proposed framework. It is specifically designed to suit a variety of detection models and is compatible with most existing studies, regardless of the datasets, features or algorithms used. The framework consists of 6 main steps. In the first step, the research targets should be set. This is followed by traffic dataset collection, which can be done via two methods: real traffic collection and simulated traffic generation. The third step depends on the mode of feature selection employed. Feature selection may be performed manually through the advice of domain experts or in an automated manner using machine learning models. If the former is used, feature selection is performed first, and traffic and feature extraction is done next based on the selected features. On the other hand, if the latter is used, all possible features are first extracted from the traffic, followed by automated feature selection to obtain the optimal feature set. After feature extraction and selection, the algorithms are selected as the next step. Finally, experimental evaluation is performed using the selected features and algorithms. We will discuss and review methods proposed in different research based on these steps in this section.

Fig. 2. General framework of a machine learning based encrypted traffic detection model

A. Research Objective

Encrypted traffic classification can be generally divided into two subcategories: (1) encrypted and unencrypted traffic classification, and (2) encrypted malicious traffic detection.

The research on encrypted and unencrypted traffic classification may also be regarded as the initial step of the research on encrypted malicious traffic detection, since it is needed to extract encrypted traffic from a mix of encrypted and unencrypted traffic. Yao et al. [15] proposed an LSTM RNN model with an attention mechanism for the identification of encrypted traffic, which avoids complex and time-consuming feature engineering and learns the relationships among traffic flow features at the same time. Niu et al. [16] proposed a heuristic approach which combines both statistics and machine learning to classify the encrypted network

traffic, so as to make up for their respective shortcomings. The authors use a modified National Institute of Standards and Technology (NIST) test suite to simplify the interactions between packet processing and feature extraction. A handshake-skipping algorithm (HST-R) is also proposed to skip the handshake process during feature extraction, so as to avoid feeding misleading information to statistics-based approaches. It also further increases the classification accuracy of the proposed machine learning based classifier. Zhao et al. [73] proposed an algorithm named encrypted traffic identification based on weighted cumulative sum test (EIWCT) to identify encrypted traffic from both public and private encryption protocols. The method uses EIWCT to process each new incoming packet instead of testing after all packets have arrived, and identifies the traffic based on weighted conflated results. The proposed algorithm reduces computation time and achieves efficient online identification of encrypted traffic from common encrypted protocols, such as SSL and SSH, and from other encrypted private protocols. Above 90% accuracy is achieved for SSL and private encryption protocol traffic in their experiments. Shahbar et al. [94] conducted comparative experiments based on supervised learning algorithms such as RF and Naïve Bayes for The Onion Router (Tor) and Invisible Internet Project (I2P) encrypted traffic classification.

For research on encrypted malicious traffic detection, various approaches have been proposed in existing works. Many detection approaches are proposed regardless of the type of encryption protocol. Stergiopoulos et al. [14] proposed a machine learning based encrypted malicious traffic detection model using only a few features. The authors compared 7 different algorithms on their traffic datasets with multiple protocols to show that they need fewer features, less data and a shorter time to achieve similar or even higher performance compared with other existing works. Furthermore, there are also many works focusing on specific encryption protocols. Most research aims to detect encrypted malicious traffic under the HTTPS protocol, because HTTPS has become the most commonly used encryption mechanism for both legitimate and malicious traffic. Research [11][13][25][28][29][33][92] extracted unique features from the TLS/SSL protocol to train HTTPS detection models. Shekhawat et al. [29] compared SVM, XGBoost, and RF algorithms with TLS/SSL features for binary detection and multi-class classification of encrypted malicious traffic, and finally proposed that XGBoost with selected TLS/SSL features can achieve 99.15% accuracy on their own training dataset. Anderson and McGrew [28] combined the unique features of the TLS/SSL protocol with statistical numerical features of the encrypted traffic to detect HTTPS malicious traffic. Yang et al. [8] reassembled SSL records from packets and proposed an SSL malicious traffic detection method based on a deep learning combination of LSTM, AE, and CNN. Torroledo et al. [90] proposed a deep learning based identification system to detect legitimate and malicious TLS certificates by utilizing the TLS certificate content. Meghdouri et al. [80] proposed an RF detection model for TLS and Internet Protocol Security (IPSec) protocols with novel cross-layer features, which comprise features at the application level, conversation level and endpoint behaviour level at the same time.

There are many works on the binary classification of encrypted traffic as malicious or legitimate [6]. For example, the SSL malicious traffic detection model [8] classified SSL traffic as malicious or legitimate without considering malware families. The side-channel feature based encrypted malicious traffic detection model [14] focused on binary classification. The HTTPS malicious traffic detection model in [28] classifies traffic into enterprise and malware traffic. Some works also proposed multi-class classification of traffic belonging to different malware families. For example, Liu et al. [12] proposed a framework consisting of detection models for 24 malware types. However, the performance of the models in the framework was not tested separately; it is assumed that if any model in the set performs badly, the other models with good performance will buffer the overall result. Zeng et al. [9] proposed a framework for encrypted traffic multi-class classification and intrusion detection based on deep learning. The framework is constructed from three different deep learning algorithms: CNN, LSTM, and SAE. However, there is no existing work that covers all types of traffic and malware families. The performance of multi-class classification is also observed to be generally worse than that of binary classification approaches.

In all, existing research can be summarised in four general
approaches:
1. Binary classification regardless of the encryption protocols.
2. Binary classification under certain encryption protocols.
3. Multi-class classification based on malware families regardless of the encryption protocols.
4. Multi-class classification based on malware families under certain encryption protocols.

Meanwhile, encrypted traffic classification can also be applied to the IoT device domain. IoT devices are generally resource-limited and more susceptible to cyber attacks. Due to the resource constraints and heterogeneity of IoT devices, traditional endpoint and network security solutions are not a good fit for securing them. The authors of [20] summarise six challenges in securing IoT devices: endpoint security solutions with limited support, challenges in protecting device-to-device communications, high IoT device diversity, high security deployment and operational costs, the trade-off between privacy, performance and attack risk, and the vulnerability of most edge networks to attacks. Nakahara et al. [19] proposed an anomaly detection system that avoids relying on the IoT device itself to detect and analyse packets, and instead uses other devices such as a low-load home gateway. The statistical features of aggregated traffic information are then used, so as to reduce the processing load of the home gateway and the model. Wang et al. [89] proposed a deep learning based encrypted traffic classifier named DataNets. MLP, SAE, and CNN are utilized in the classifier for encrypted traffic classification in software-defined networks. Aceto et al. [27] proposed a mobile traffic classifier which can deal with encrypted traffic through automatic feature extraction and deep learning.

Finally, in the process of setting our research objectives, we also need to consider the running environment of the detection model, that is, whether the model runs online or offline, because different operating environments have different requirements for efficiency, robustness, and accuracy. At times, we need to compromise on one requirement to better meet the others.

B. Traffic Dataset Collection

After the research objectives have been established, the next step is to collect the corresponding datasets for model training. Data collection methods can be generally divided into two types: real traffic collection and simulated traffic generation. For real traffic collection, traffic data is collected from an operational network environment through commonly used traffic collection software, such as Wireshark. The collected data is then analysed and labelled for use. In [71], the authors deployed an IoT honeypot to collect a network traffic dataset from 2017 to 2018. Over these one and a half years, the honeypot was exposed on 40 public IP addresses and the traffic was forwarded to 11 real IoT devices. In another project [58], a sample of Internet traffic from Japan to the USA over different time periods is collected; the data collection began in 2001. The UNIBS dataset [65] contains 27 GB of traffic captured on the university campus network over three consecutive working days. The real traffic collection method ensures the authenticity of the network traffic in the dataset, but it also has some shortcomings. Because real network attacks occur at a low frequency compared to normal network interactions, the collected dataset is often imbalanced and contains a lot of redundant data, which requires a lot of time for screening and labelling.

The simulated traffic generation method usually generates targeted traffic datasets through the construction of a simulated network or the use of a script simulator. Yu et al. [74] built a network structure which includes a gateway server and several intranet clients. Metasploit is then used to launch different attacks against the gateway server, and Wireshark is used to collect traffic at the gateway server. In [53], part of the traffic data is also generated by simulation. Datasets collected and generated by this method often contain more attack and malware types. They are also more balanced than datasets from real traffic collection, and are more suitable for machine learning. However, datasets generated from simulated traffic may differ from real network data, and may not truly represent real network traffic conditions.

Training an encrypted malicious traffic detection model requires a dataset with the following properties:
1. A sufficient amount of encrypted traffic.
2. A high variety of encrypted malicious attacks.
3. Class balance.
4. Confirmed ground truth.
5. No redundant data.

Unfortunately, there is currently no well-established dataset for this specific domain problem. Thus, research works often use their own private datasets or select from the few available public datasets. This makes fair performance comparisons among existing works challenging. Therefore, we aim to address this issue so as to help lay a good foundation for future research on this topic. We collect the publicly available datasets which are applicable to encrypted traffic analysis, and analyse their targeted and applicable fields, characteristics, limitations, and whether they contain encrypted malicious traffic. We show the compiled dataset list in Table I. Through our analysis, we found that most public datasets contain little or no encrypted traffic. Datasets containing encrypted malicious traffic are even rarer.

C. Traffic and Feature Extraction

a) Traffic data pre-processing: The first step of dataset pre-processing is data cleaning and filtering. Most public datasets are in the Packet Capture (PCAP)/PCAP Next Generation (PCAPNG) format, and these PCAP/PCAPNG files often record the original raw traffic data. For this kind of data, it is necessary to clean up irrelevant network packets that are not applicable to research on encrypted malicious traffic detection, such as Address Resolution Protocol (ARP) or Internet Control Message Protocol (ICMP) packets. Then we need to remove duplicated, damaged, unnecessary, and incompletely captured traffic streams or information which
TABLE I
THE SUMMARY OF PUBLIC TRAFFIC DATASETS

| Dataset Title | Year of Release | Characteristics and Limitations | Contain Encrypted Malicious Traffic |
|---|---|---|---|
| IoT-23 Dataset [51] | 2020 | Traffic from 20 malware samples is captured on IoT devices. It also includes 3 captures of legitimate IoT device traffic. However, only a very small part of the traffic is encrypted. The dataset contains only raw data. | Y |
| IoT Encrypted Traffic [50] | 2020 | The dataset contains traffic from several IoT devices. Most of the traffic is unencrypted and therefore not applicable to training encrypted traffic detection models. The dataset contains only raw data. | N |
| UNSW NS 2019 [70] | 2019 | The dataset contains encrypted legitimate and malicious traffic from more than 28 different IoT devices. Features extracted from the captured raw data are not provided. | Y |
| MIRAGE-2019 dataset [31] | 2019 | The dataset was collected by more than 280 experimenters in the ARCLAB laboratory at the University of Napoli “Federico II”. It is a human-generated mobile-app dataset, collected from May 2017 to May 2019, for mobile traffic analysis based on the authors’ proposed MIRAGE architecture [31]. The dataset contains 40 Android apps from 16 different categories. | N |
| CIC-AndMal 2017 [60] | 2017 | The dataset is released by the Canadian Institute for Cybersecurity [18]. 5000 of the collected samples (426 malware and 5065 legitimate) were installed on real devices. All samples are categorized into 4 malware types from 42 malware families. Limited extracted features are provided using CICFlowMeter [78]. | Y |
| CICIDS 2017 [53] | 2017 | Legitimate and common attack traffic resembling real traffic is captured in this dataset (PCAPs). Legitimate traffic is generated by simulation. Limited features are extracted from the captured raw traffic. | Y |
| VPN-non-VPN traffic dataset [55] | 2016 | A regular traffic session and a session over VPN were captured. There are 14 traffic categories in this dataset. The captured data have been pre-processed, and ISCXFlowMeter (an old version of CICFlowMeter [78]) is used to create feature CSV files. The selected-features file and the original PCAP file are provided. | N |
| UNSW NS 2015 [69] | 2015 | The dataset is generated by the IXIA PerfectStorm tool [69]. It contains 49 features in 6 feature categories and ten traffic classes (Normal, Fuzzers, Analysis, Backdoors, Denial-of-Service, Exploits, Generic, Reconnaissance, Shellcode, and Worms). | Y |
| FIRST 2015 [64] | 2015 | Traffic is composed of reverse shell shellcode connections, website defacing attacks, a downloaded CryptoLocker ransomware attack, and a command and control (C2) exploit attack over SSL that takes over the victim machine. However, since encrypted malicious traffic in the dataset is significantly less than other forms of malicious traffic [14], it is not suitable for encrypted malicious traffic detection. | Y |
| Malware Capture Facility Project [49] | 2013 | The malware in this dataset is executed under a bandwidth limit and with spam interception. The most important characteristic of this dataset is that the malware is executed over long periods of time (up to several months). The dataset contains only raw data. | Y |
| Malware-traffic-analysis.net [52] | 2013 | The datasets are from the malware traffic analysis website, which has provided traffic from many different malware since 2013. However, the traffic data is complex and time-consuming to process. The dataset contains only raw data. | Y |
| CICIDS 2012 [54] | 2012 | A systematic approach is used to generate HTTP, SMTP, SSH, IMAP, POP3, and FTP traffic. Thus, the datasets cannot fully represent real network traffic. Limited features are extracted from the captured raw traffic. | Y |
| CTU-13 [48] | 2011 | This dataset contains a total of 13 botnet captures in different scenarios, each collected over a period of time. Most of the traffic is unencrypted, but some encrypted traffic is included. The dataset contains only raw data. | Y |
| WebIdent 2 Traces [57] | 2006 | Web requests and responses over an encrypted SSH tunnel were collected. Limited features are extracted from the captured raw traffic. | Y |
Kyoto The dataset only contains the traffic on honeypots, and the captured raw traffic PCAP file
Dataset 2006 is not provided. It contains limited extracted features csv files only, which means we Y
[63] cannot further process features from captured raw traffic.
may interfere with model training. Data truncation and padding are used to keep the length of the input data consistent [83]. Finally, we filter out the non-encrypted traffic, so that the final dataset contains only encrypted traffic. Some public datasets are not released as raw PCAP files but in CSV format after prior cleaning, filtering, and feature extraction, such as [53]. However, if such a CSV feature dataset has no label indicating whether the features were extracted from encrypted or unencrypted traffic, and its source does not provide the original traffic files (i.e., PCAP files), the dataset is difficult to use for training encrypted traffic detection models. Furthermore, if a feature needed to train the model is not included in this kind of dataset, as in [63], the dataset cannot be used either. In addition, categorical data in the dataset, such as network protocol information, encryption protocol certificates, and server name indications (SNI), must be converted or encoded into numeric data.

Data imbalance in datasets is also a problem. Existing research usually adopts one of two methods to address it. The first combines different datasets, or uses a self-generated dataset, to increase the variety and quantity of traffic. The second applies data augmentation, over-sampling, under-sampling, or over-sampling followed by under-sampling. Yan et al. [26] used the synthetic minority over-sampling technique (SMOTE) to balance their own dataset. Wang et al. [89] used under-sampling to reduce the number of samples in the majority classes. Vu et al. [91] conducted a series of comparative experiments with different existing methods for addressing data imbalance. Data normalization is another important part of data pre-processing, especially for deep learning. It rescales the traffic data or feature values to the range [0, 1] or [-1, 1], which reduces data redundancy and improves the integrity and efficiency of model training.

b) Feature extraction: Feature extraction is the next key step after dataset collection. To select the best feature set, it is necessary to first understand the types and characteristics of features that can be extracted from traffic data. At present, there is no recognized classification or naming convention for network traffic features. We therefore analysed and categorized features by studying the relevant works, which led us to broadly classify them into two types: protocol-agnostic numerical features and protocol-specific features.

Protocol-agnostic numerical features have two granularities: packet based features and session based features. Packet based features are extracted at the packet level, such as the time difference between packets in a session flow, the packet size (or packet length) and payload size (or payload length) of each packet, and changes in the TCP window size per session. Session based features are extracted at the session flow level, such as session flow duration, total bytes from the client/server in each session, total number of packets from the client/server in each session, and total length of IP packet headers per session.

Feature engineering can be further applied to these packet based and session based features. Statistical features are obtained by performing statistical calculations on them; common calculations include the mean, median, maximum, minimum, variance, and standard deviation (STD). [19][14] selected packet based features such as packet size, payload size, and the time difference between sessions to train their detection models. In [20][32][76][77], the authors selected session based and statistical features as their feature set. Liu et al. [12] extracted more than 80 packet based, session based, and statistical features for their detection models.

Many works selected their own subsets of protocol-agnostic numerical features as their training feature set, based on the distinctive attributes of the features. Celik et al. [17] defined tamper-resistant features (e.g., IPratio and goodput) that are unrelated to port, flag, or payload; such features are consistent and difficult for adversaries to mask. Stergiopoulos et al. [14] defined TCP side channel features (e.g., ratio to previous packets), which reduce the size of the training dataset and the training time while keeping encrypted malicious traffic detection accuracy above 99%. Liu et al. [12] divided their extracted features into 4 sub-categories based on feature attributes: TCP/IP header features (e.g., IP header), time based features (e.g., average inter-arrival time of packets), length related features (e.g., packet length and payload length), and packet variation features (e.g., number of TCP window changes). These features are used for clustering. Then, different feature subsets are selected with sequential forward selection (SFS) for their more than 20 supervised learning models. Sarkar et al. [77] selected time related features for their model, such as flow duration, mean of backward packet time difference, STD of forward packet time difference, and STD of time difference.

For protocol-specific features, information from the encrypted protocols themselves is extracted instead. Such features frequently appear in research on HTTPS traffic detection, because HTTPS is currently the dominant type of encrypted traffic. Protocol-specific features extracted from HTTPS are mainly referred to as TLS/SSL features. They are usually extracted from three log files: conn.log, ssl.log, and x509.log (the certificate log). One of the most commonly used applications for generating such logs from captured raw network traffic is Zeek IDS (formerly Bro IDS), which produces the log files by processing raw network traffic according to its protocols [24]. These log files are linked to one another through the uid (unique connection ID) column and the cert_chain_fuids (certificate file identifiers) column. Features extracted from conn.log overlap to some extent with protocol-agnostic numerical features; for example, they include the number of packets and payload bytes sent by the client and the server. The ssl.log and x509.log files are also interconnected: extracting x509 features relies on ssl.log information, and the relationship among these three files is explained in Fig. 3.
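The uid and certificate-identifier links can be illustrated with a minimal join that turns conn.log, ssl.log, and x509.log records into one feature row per TLS session. This is a sketch under stated assumptions, not the extraction pipeline of any cited work: the logs are assumed to be already parsed into Python dictionaries, and the field names (`uid`, `cert_chain_fuids`, `id`, `certificate.not_valid_before`, `certificate.not_valid_after`, `san.dns`) follow the older Zeek/Bro log layout; newer Zeek releases may identify certificates differently (for example, by fingerprint), so the schema of the version in use should be checked. The function name `join_zeek_logs`, the derived feature names, and the toy records are hypothetical.

```python
from typing import Dict, List, Optional


def join_zeek_logs(conn_log: List[Dict], ssl_log: List[Dict], x509_log: List[Dict]) -> List[Dict]:
    """Merge Zeek-style conn/ssl/x509 records into one feature dict per TLS session.

    Assumes each log is already parsed into a list of dicts whose field names follow
    the older Zeek/Bro layout (uid, cert_chain_fuids, id, ...). Illustrative only.
    """
    # conn.log <-> ssl.log: both records carry the same connection uid.
    ssl_by_uid = {rec["uid"]: rec for rec in ssl_log}
    # ssl.log <-> x509.log: entries of cert_chain_fuids match the x509 record id.
    x509_by_fuid = {rec["id"]: rec for rec in x509_log}

    sessions = []
    for conn in conn_log:
        ssl_rec = ssl_by_uid.get(conn["uid"])
        if ssl_rec is None:
            continue  # no TLS metadata for this connection, so it is skipped
        chain = ssl_rec.get("cert_chain_fuids") or []
        # Treat the first certificate in the chain as the server (leaf) certificate.
        leaf: Optional[Dict] = x509_by_fuid.get(chain[0]) if chain else None

        row = {
            "uid": conn["uid"],
            # conn.log features (these overlap with protocol-agnostic numerical features)
            "duration": conn["duration"],
            "orig_bytes": conn["orig_bytes"],
            "resp_bytes": conn["resp_bytes"],
            # ssl.log features
            "tls_version": ssl_rec["version"],
            "cert_chain_len": len(chain),
        }
        if leaf is not None:
            # x509.log features: validity period in days, number of SAN DNS names
            validity_s = leaf["certificate.not_valid_after"] - leaf["certificate.not_valid_before"]
            row["cert_validity_days"] = validity_s / 86400
            row["san_dns_count"] = len(leaf.get("san.dns") or [])
        sessions.append(row)
    return sessions


# Hypothetical toy records: C1 is a TLS session, C2 is not.
conn_log = [
    {"uid": "C1", "duration": 1.5, "orig_bytes": 300, "resp_bytes": 4000},
    {"uid": "C2", "duration": 0.2, "orig_bytes": 50, "resp_bytes": 60},
]
ssl_log = [{"uid": "C1", "version": "TLSv12", "cert_chain_fuids": ["F1", "F2"]}]
x509_log = [{"id": "F1", "certificate.not_valid_before": 0,
             "certificate.not_valid_after": 90 * 86400, "san.dns": ["a.example", "b.example"]}]

rows = join_zeek_logs(conn_log, ssl_log, x509_log)
print(rows)
```

Aggregate features such as the mean certificate validity or the average number of SAN domains would then be computed over these per-session rows (for example, per host or per time window).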
Fig. 3. Relationship among protocol logs

Examples of ssl.log features include TLS version types and the same-issuer ratio. Examples of x509.log features include the mean certificate public key length, the mean certificate validity, and the average number of domains in the Subject Alternative Name (SAN). [11][13][25][28][29][33][92] used TLS/SSL features in their works. In [11], 12 connection features, 6 SSL features, and 6 x509 (certificate) features were selected as the TLS/SSL feature set for CNN and RF models. In [28][33], the authors combined protocol-agnostic numerical features and TLS/SSL features as their feature set.

One significant limitation of protocol-specific features is that they are only applicable to their related encrypted protocols. Therefore, if the dataset contains other encryption protocols, protocol-specific features are no longer applicable. In contrast, extracting protocol-agnostic numerical features does not depend on any specific content or information in the traffic or its protocols, so these features can always be extracted regardless of the encryption mechanism. However, because there is a large number of protocol-agnostic numerical features with different extraction logic, the extraction process is complex, time-consuming, and requires prior knowledge.

D. Feature Set Selection

More features do not necessarily yield better results. Too many features require large amounts of memory and computation, extend model training time, and may also reduce model accuracy.

Common feature selection methods can be divided into two categories. The first is domain expert based selection: domain experts select the set of features they consider most appropriate based on their experience and knowledge, and these features are then extracted directly from the datasets and used as model input. In [28], the authors found that iterating on the initial feature set and adding features suggested by domain experts can greatly enhance the detection system's performance. Based on a feature set from the domain experts, the RF ensemble method outperformed competing methods from an algorithmic perspective. They also noted that, compared with algorithm selection, feature engineering is more decisive and important. However, because of privacy concerns, the authors did not disclose their datasets or the experts' suggested feature set. Many studies [14][16][17][19][28][29][33] have selected features manually in similar ways for their experiments. However, manually choosing the most important features is not always reliable: human error may occur, and humans may not be able to find non-intuitive features, which may greatly affect the performance of the detection models. Note that in domain expert based selection, the features are selected first and only this set of features is then extracted from the traffic.

The second category is machine learning based self-selection with algorithms. Here, humans extract all possible features, and selection algorithms rank them and choose the most suitable ones as the final feature set; alternatively, the models directly process raw data to learn and extract the required features by themselves. In [11], after image augmentation and feature extraction, the authors used the mean decrease in impurity algorithm to rank features by importance and then generated a feature set from the top-ranked features. Liu et al. [12] applied SFS to grow the feature set gradually until they found their optimal feature set. However, SFS is a greedy algorithm, which may fall into a local optimum and is very time-consuming. To avoid these limitations, the authors further enhanced SFS by adding random selection in each round, and finally combined the best-performing feature sets to obtain their final feature set. Shekhawat et al. [29] extracted TLS/SSL features and applied recursive feature elimination (RFE) to iteratively eliminate the features with the lowest ranking scores. The authors applied the final feature set to XGBoost, SVM, and RF algorithms, obtaining nearly 99% accuracy with RF and 99.15% accuracy with XGBoost. Desai et al. [30] proposed a feature ranking framework based on fundamental statistical tests. It is specifically designed for IoT device traffic classification and can select important features from IoT network traffic, reducing cost and protecting user privacy. Their experiments indicate that a small number of features can achieve accuracy similar to other existing methods. Many studies [13][20][32] use machine selection to choose the optimal features from a large set of extracted features without human intervention. On the other hand, [9][10][37][44][47] directly used raw data as the input of deep learning methods, so that the model learns and extracts the required features automatically. Zeng et al. [9] applied CNN, LSTM, and SAE to learn features from different aspects; for example, time based features were extracted with LSTM. Machine learning based feature selection can discover non-intuitive features and avoid human error and bias at the same time. With this approach, either the manual feature extraction process is skipped and raw data is fed to automated feature selection and extraction, or feature extraction is performed first to provide a full set of features that is then fed to automated feature selection. However, because of the black-box nature of AI, especially when deep learning algorithms are involved, it is very challenging to interpret the feature selection process and to explain why particular features were selected as optimal. Additional research is required to solve this issue [62][67].
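The greedy procedure behind SFS, and why it can stop at a local optimum, can be seen in a short sketch. It implements plain SFS only, not the randomized enhancement of Liu et al. [12]. The `score` callback is an assumed stand-in for whatever evaluation a study uses (for example, cross-validated accuracy of a classifier trained on the candidate subset), and the toy feature names and gains are hypothetical.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int,
) -> List[str]:
    """Plain (greedy) sequential forward selection.

    Starting from an empty set, repeatedly add the single remaining feature that
    gives the highest `score`, and stop when no addition improves the score or
    `max_features` is reached. Because each step is greedy, the result can be a
    local optimum rather than the best subset overall.
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Evaluate every one-feature extension of the current subset.
        scored = [(score(selected + [f]), f) for f in remaining]
        candidate_score, candidate = max(scored, key=lambda pair: pair[0])
        if candidate_score <= best_score:
            break  # no remaining feature improves the subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected


# Hypothetical toy scorer: each feature adds a fixed amount to the "accuracy".
toy_gain = {"flow_duration": 0.5, "pkt_len_mean": 0.3, "iat_std": 0.05, "ttl": -0.2}


def toy_score(subset: List[str]) -> float:
    return sum(toy_gain[f] for f in subset)


print(sequential_forward_selection(list(toy_gain), toy_score, max_features=4))
```

Each round evaluates every remaining feature once, so a full run costs on the order of n^2 model evaluations for n candidate features; with a real classifier in `score`, this is what makes SFS time-consuming on large feature sets.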
E. Algorithms Selection

Many mature traditional methods for traffic detection, such as DPI, are no longer applicable to encrypted traffic. At present, traditional machine learning methods and deep learning methods are the two mainstream research directions in this area.

a) Traditional Machine Learning Approach: For traditional machine learning, the main focus is on optimizing algorithms and feature sets. For unsupervised learning, there are many works such as that of Chen et al. [13], who proposed a three-stage hierarchical sampling approach (THS-IDPC) that further develops the density peaks clustering algorithm based on grid screening, a custom centre decision value, and mutual neighbour degree (DPC-GS-MND). Experiments showed that DPC-GS-MND performs better than average-linkage [86], DPC, the modified density peak clustering algorithm (MDPCA) [87], and DPC-GS. Furthermore, their THS-IDPC based model was shown to perform better than selective sampling [84], smart sampling [85], and hierarchical clustering based sampling (HCBS) [86]. The algorithm can significantly reduce computation cost and enhance the accuracy and efficiency of the encrypted malicious traffic detection model. However, the method cannot handle the class imbalance problem. Furthermore, DPC-GS-MND uses the K nearest neighbour idea, but the K value has to be chosen manually.

Hafeez et al. [20] proposed the IoT Keeper model, which combines the fuzzy C-means (FCM) clustering algorithm with a fuzzy interpolation scheme to classify malicious IoT traffic. Such an unsupervised learning algorithm does not need a fully labelled dataset as training input, and the proposed model is not restricted to certain types of IoT devices. A novel mechanism, Adhoc Overlay Networks, is also applied in the model to actively strengthen access control over the network activities of IoT devices. However, IoT Keeper was not tested with encrypted malicious traffic, and the authors listed this as a future research direction. Unsupervised deep learning such as AE is discussed further in the deep learning approach subsection.

For supervised learning, Meghdouri et al. [80] proposed an RF classification model based on a multi-key approach, a novel cross-layer feature representation of traffic data under the TLS and IPSec protocols. They tested the model on three datasets, CICIDS-2017 [54], UNSW-NB15 [22], and ISCX-bot-2014 [72], achieving F1 scores of 100%, 92.6%, and 99.2%, respectively. Comparative experiments with other existing methods that used the same datasets showed that their model outperformed those methods.

Stergiopoulos et al. [14] defined side channel features and applied their feature set to 7 different machine learning algorithms to verify the performance of these features. The paper focuses on malicious traffic detection but also covers encrypted traffic detection. The encrypted dataset was extracted from CTU-13 [48], FIRST [64], and Milicenso [66], and they obtained 99.8% accuracy with the CART model. Shekhawat et al. [29] tested XGBoost, SVM, and RF algorithms with a machine-selected feature set and achieved 99.8% accuracy with the XGBoost model.

Ma et al. [32] proposed an enhanced KNN algorithm named WKNN-Selfada (a feature weight self-adaptive algorithm for weighted feature KNN). The algorithm improves the KNN distance calculation and includes a sub-algorithm that chooses a suitable feature set and feature weights. However, although the paper states that the algorithm can be used for encrypted traffic detection, the authors did not perform any experimental evaluation.

Niu et al. [16] proposed a heuristic statistical testing (HST) approach consisting of 3 parts: a Jnetpcap-based executor (handshake-skipping algorithm), an enhanced NIST Test Suite, and a classification algorithm (such as C4.5). The experimental data were based on two proprietary protocols (Freegate and Ultrasurf) and one private custom unknown protocol, and the dataset was not publicly released. 10-fold cross validation was used. The experiments show that applying the handshake-skipping algorithm improved accuracy significantly and that HST is more robust than the other tested machine learning methods.

Frameworks that combine supervised and unsupervised learning also exist in many works. In [12], the authors proposed a distance based encrypted malicious traffic identification framework comprising a series of detection models for different malware types. They first applied the Gaussian mixture model (GMM) and ordering points to identify the clustering structure (OPTICS) to classify new malware classes (FClass) based on the distances between malware samples. After that, 24 XGBoost encrypted malicious traffic detection models were trained to identify the 24 kinds of malware. Comparative experiments showed that their method performs better than the MalClassifier model in [96] and the CluClas model in [21].

b) Deep Learning Approach: Research on deep learning has grown rapidly in recent years and has achieved remarkable results. Deep learning based encrypted traffic detection has many obvious advantages, such as the ability to automatically extract the required data features through its own feature learning. It can also find non-intuitive connections among traffic features that humans cannot easily identify.

A general deep learning based framework for encrypted and mobile traffic classification is proposed by Aceto et al. [98]. The framework provides clear guidelines for designers of deep learning based traffic classification. It also overcomes the design limitations of existing single-modality or single-task learning methods by jointly using multimodal and
multi-task techniques. Moreover, the authors test two implementations of the framework on three datasets generated by the activity of mobile users. Aceto et al. [99] also proposed DISTILLER, a novel multimodal and multitask deep learning based approach for multipurpose encrypted traffic classification. The method overcomes the performance limitations of state-of-the-art single-modality deep learning models on heterogeneous and structured traffic. It can also tackle different traffic categorization problems from different providers. A fair comparative experiment with 8 other existing deep learning based encrypted traffic classification methods was conducted using the public VPN-non-VPN dataset [55], and the results indicate that DISTILLER performs better than the other methods.

Bazuhair et al. [11] proposed a method that enhances the generalization of CNN in encrypted malicious traffic detection. The work focused on HTTPS malicious traffic, and the authors developed a binary detection model. They proposed a new encoding method that converts TLS/SSL features into images and then used Perlin noise for data augmentation, which enhanced the generalization of the deep learning model. The CTU-13 [48] dataset was used in the experiment, and the CNN model achieved 97% accuracy with a 0.4% false negative rate and a 5.6% false positive rate (FPR). Lucia and Cotton [46] proposed TLS malicious traffic classification using the public Malware Capture Facility Project dataset [49], with SVM and CNN selected for the experiments. The one-dimensional CNN achieved 99.91% for both accuracy and F1 with the Adam optimizer, a batch size of 32, 100 epochs, and early stopping. The non-linear SVM with a radial basis function kernel achieved 99.97% for both accuracy and F1. That work was an enhancement of their previous work [82]. Wang et al. [37] proposed an end-to-end encrypted traffic classification method using VPN-non-VPN [55] traffic data. The data was split into segments of the same length, which were used as the model input. The end-to-end framework merges feature extraction, feature selection, and classification, so that the features required for classification are learned and discovered automatically. The experiments showed that the one-dimensional CNN outperforms the two-dimensional CNN. [34][43-45][88] also applied CNN in their experiments, achieving detection rates above 93%.

Yao et al. [15] proposed two encrypted traffic classification methods: the first is based on LSTM with an attention mechanism, and the second on the hierarchical attention network (HAN). The VPN-non-VPN [55] dataset was used for comparative experiments with attention based LSTM, HAN, Deep Packet [81], a one-dimensional CNN model [37], decision tree [35], and XGBoost. The results indicated that both their attention based LSTM model and their HAN model outperformed the machine learning based model from [35] and the one-dimensional CNN model from [37]. Pascanu et al. [47] proposed a series of hybrid models that combine an echo state network (ESN) or an RNN with a classifier (logistic regression or MLP). ESN plus logistic regression with max pooling for non-linear sampling outperformed the other models. Prasse et al. [25] proposed an LSTM based encrypted malware detection model. The work focused on HTTPS traffic and a self-collected dataset built using cloud web security (CWS) and VirusTotal, which helped the authors obtain enough malicious and legitimate traffic. The model can classify different malware families, even new, unknown malware. A comparative experiment also showed that the LSTM based model outperformed an RF detection model.

There are also works that combine CNN and RNN in their models. Lopez-Martin et al. [83] combined RNN with CNN to classify IoT traffic regardless of whether it was encrypted. The model achieved a 95.74% F1 score and 96.32% accuracy on an imbalanced dataset. The authors trained the model with 5 different feature sets to analyse the importance of feature set selection, and also considered the impact of session flow length and the trade-off between computing time and detection rate. Their comparative experiment with different numbers of packets per session indicated that keeping between five and fifteen packets in each session flow achieves above 94.5% on each performance metric (accuracy, F1, precision, and recall). Sessions with fewer than the predetermined number of packets are padded with zeros so that every session has the same number of packets. Wang et al. [36] proposed a hierarchical spatial-temporal features-based intrusion detection system (HAST-IDS) that combines CNN with LSTM: spatial features are learned by the CNN and temporal features by the LSTM. The resulting model reduced the FPR.

[7-10][22][38-42][75][81] applied AE in their detection models with different self-collected or public datasets. Bovenzi et al. [75] proposed H2ID, a network intrusion detection system (for both known and unknown attacks) based on a two-stage architecture. The first stage uses a novel multimodal deep auto-encoder (M2-DAE) to perform lightweight anomaly detection. In the second stage, the anomalous traffic is classified into different types of attack traffic, such as scans and distributed denial-of-service, using soft-output classifiers. The BotIoT [23] dataset was selected to validate the performance of the approach. In [81], comparative experiments based on CNN and SAE were conducted using the VPN-non-VPN dataset [55]; CNN and SAE both achieved above 92% accuracy for application and traffic classification. Yang et al. [8] proposed an encrypted malicious traffic classifier that combines a CNN and an LSTM auto-encoder, achieving a 95.8% detection rate. Zeng et al. [9] proposed an encrypted traffic based intrusion detection framework named Deep-Full-Range (DFR), which uses CNN, LSTM, and SAE to self-learn and extract features from raw data input. The authors conducted comparative experiments with KNN and other existing works that used the VPN-non-VPN [55] and CICIDS 2012 [54] datasets to demonstrate that DFR is more accurate and robust.
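Several of the models above consume raw or per-packet input that must have a fixed shape: Wang et al. [37] split traffic into equal-length inputs, and Lopez-Martin et al. [83] keep a fixed number of packets per session and zero-pad shorter sessions, after which values are typically normalized to [0, 1] or [-1, 1]. The sketch below shows both steps on per-packet feature vectors. It is a generic illustration, not the preprocessing code of any cited work; it assumes each packet has already been reduced to a short list of numeric fields (here, packet length and inter-arrival time), and the function names are hypothetical.

```python
from typing import List, Sequence


def pad_or_truncate_session(
    packets: Sequence[Sequence[float]], n_packets: int, n_fields: int
) -> List[List[float]]:
    """Return exactly n_packets rows of n_fields values for one session.

    Longer sessions are truncated to their first n_packets packets; shorter
    sessions (and shorter per-packet vectors) are padded with zeros.
    """
    rows = []
    for pkt in packets[:n_packets]:
        values = [float(v) for v in pkt[:n_fields]]
        rows.append(values + [0.0] * (n_fields - len(values)))
    # Zero-pad missing packets so every session has the same shape.
    rows.extend([0.0] * n_fields for _ in range(n_packets - len(rows)))
    return rows


def min_max_scale(values: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Linearly rescale values to [lo, hi], e.g. [0, 1] or [-1, 1]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]  # constant feature: map everything to lo
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


# Hypothetical session: (packet length in bytes, inter-arrival time in seconds).
session = [[60, 0.0], [1500, 0.02], [52, 0.31]]
fixed = pad_or_truncate_session(session, n_packets=5, n_fields=2)  # 3 packets + 2 zero rows
scaled_lengths = min_max_scale([row[0] for row in session], lo=-1.0, hi=1.0)
print(fixed)
print(scaled_lengths)
```

Truncation here keeps the first packets of a session, which usually include the handshake; keeping a different window is a design choice. In practice, the minimum and maximum used for scaling should be computed on the training split only and reused for validation and test data, so that no information from evaluation data leaks into training.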
Xing et al. [10] proposed D2LAD, an online detection model based on deep dictionary learning, to address noisy data labels, long training times, and high variance in the traffic data distribution. The model learns and extracts sequential features from raw traffic data input using a pre-trained LSTM auto-encoder. The authors showed that it achieved 94.5% accuracy, outperforming existing methods for online encrypted traffic detection.

Research on applying deep learning has provided many useful detection methods, including algorithms based on feature self-learning that were proposed to overcome the difficulty of traffic feature extraction and selection. These algorithms do not require human effort to extract traffic features; they extract the required features automatically. On the other hand, research on applying deep learning to encrypted malicious traffic detection and classification remains limited.

V. COMPARATIVE EXPERIMENT

In this section, we perform two experiments to address our objectives. Our first experiment (Experiment 1) conducts a series of comparative experiments with different algorithms and feature sets to obtain more reliable, consistent, and fair results for each existing work. By screening and analysing the public datasets in Table I of Section IV.B.a, we curated a dataset composed entirely of public datasets that are applicable to encrypted malicious traffic detection. To the best of our knowledge, this dataset is more comprehensive and objective than those used by existing works; to support this claim, a cross-dataset validation is also conducted in Experiment 2. Moreover, the composed dataset allows a fair comparison among the proposed works for encrypted malicious traffic detection and classification. The second experiment (Experiment 2) aims to provide insights into whether protocol-agnostic numerical features or protocol-specific features perform better, and how well one fares against the other.

A. Data Collection

Existing research that adopts different public datasets may introduce bias and makes it impossible to compare the results of proposed works objectively. To ensure that experiments are conducted as fairly and comprehensively as possible, we collect, analyse, and derive data from publicly available datasets on the Internet to compose a dataset for encrypted malicious traffic detection and classification. At the same time, we include datasets from as many different sources as possible to expand the variety of encrypted traffic.

Our dataset is composed based on three criteria. The first criterion is to combine public datasets widely used in existing works that contain both encrypted malicious and legitimate traffic, such as the Malware Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure data balance, i.e., balance between malicious and legitimate network traffic and a similar amount of network traffic contributed by each individual dataset. Thus, approximately equal amounts of malicious and legitimate traffic are extracted from each selected public dataset, and we ensured that no selected public dataset contributes much more traffic than the others. The third criterion is that our dataset includes encrypted malicious and legitimate traffic from both conventional devices and IoT devices, as these devices are increasingly deployed and work in the same environments, such as offices, homes, and other smart city settings.

Based on these criteria, 5 public datasets are selected from Table I in Section IV.B.a. After data pre-processing, details of each selected public dataset and of the final composed dataset are shown in Table II. For each selected public dataset, Table II summarizes the malicious and legitimate traffic size we selected by random sampling, the proportion of the composed dataset's total traffic contributed by that dataset (% w.r.t. composed dataset), and the proportion of encrypted traffic selected from that public dataset (% of selected public dataset), together with the total traffic size of the composed dataset. The table shows that each public dataset contributes approximately 20% of the composed dataset, except for CICIDS-2012 (because of its limited amount of encrypted malicious traffic). This balances the individual datasets and minimizes bias towards traffic belonging to any one dataset during learning. The amounts of malicious and legitimate traffic are also almost the same, which achieves class balance. To facilitate research in this domain, we have released our dataset [100] on Mendeley Data.

B. Feature extraction and selection

Protocol-agnostic numerical features and TLS/SSL features are extracted for the experiments. Since there is no recognized optimal feature set, researchers have adopted different methods to select the feature set they consider best, as discussed in Section IV. Therefore, to reliably compare the performance of models with different feature sets, we extracted the applicable protocol-agnostic numerical features mentioned in research papers on encrypted traffic detection, obtaining a total of more than 113 unique features. Table III shows a few commonly used protocol-agnostic numerical features among those extracted.

As different works selected different feature sets, we conducted a statistical analysis of the protocol-agnostic numerical features that appeared in those works and listed the features that appeared most frequently. In the comparative experiments, these listed features form a feature set named the Further Optimized Statistical (FOS) feature set.

5 different feature sets of protocol-agnostic numerical features are used in the experiments:
1. FOS feature set, containing 14 features;
2. Top 10 ranked features from [12], i.e., the 10 top-ranked features used in their detection framework, selected with the enhanced SFS algorithm;
TABLE II
THE SELECTED PUBLIC DATASETS IN EXPERIMENTS

Public Dataset | Year of Release | Type of Traffic | Malicious Traffic Size | Legitimate Traffic Size | Total Traffic Size | % w.r.t. composed dataset | % of selected public dataset
UNSW NS 2019 Dataset [70] | 2019 | IoT Encrypted Traffic | 12900 sessions, 193500 packets | 13300 sessions, 199500 packets | 26200 sessions, 393000 packets | ~22% | ~60%
CICIDS-2017 [53] | 2018 | Conventional Encrypted Traffic | 13000 sessions, 195000 packets | 13500 sessions, 202500 packets | 26500 sessions, 397500 packets | ~23% | ~70%
CIC-AndMal2017 [60] | 2018 | Conventional Encrypted Traffic | 12403 sessions, 132859 packets | 12400 sessions, 186000 packets | 24803 sessions, 318859 packets | ~21% | ~60%
Malware Capture Facility Project Dataset [49] | 2013 | Conventional Encrypted Traffic | 13600 sessions, 204000 packets | 12180 sessions, 182700 packets | 25780 sessions, 386700 packets | ~22% | ~50%
CICIDS-2012 [54] | 2012 | Conventional Encrypted Traffic | 7613 sessions, 69648 packets | 6731 sessions, 71310 packets | 14344 sessions, 140958 packets | ~12% | ~100%
Summary | | | 59516 sessions, 795007 packets | 58111 sessions, 842010 packets | 117627 sessions, 1637017 packets | |

TABLE III
EXAMPLE OF STATISTICAL NUMERICAL FEATURES

Statistical numerical features
Length of TCP payload
Length of IP packet header
TCP window size value
Time difference between packets per session
Interval of arrival time of forward traffic
Ratio to previous packets in each session
Total bytes from client in each session
Flow duration of each session
Length of IP packets (Minimum; Maximum; Median; Mean; STD)
Length of TCP payload (Minimum; Maximum; Median; Mean; STD)
.........

3. Side channel feature set from [14], which contains five packet based features and was reported in that work to achieve 99.8% accuracy. In our experiment, to compare this packet based feature set with the session based feature sets, we use only the first 15 packets of each session. The choice of 15 packets per session is based on the analysis in [83], which states that keeping between 5 and 15 packets per session achieves better performance (also discussed in Section IV.E). This keeps as much data variety as possible while making the dataset suitable for both packet based and session based feature sets;
4. Tamper resistant feature set, based on the idea from [17], whose features neither rely on TCP payload information nor contain port or flag information;
5. Time based feature set, based on the idea from [77], consisting of time series related features that appear frequently in the research.
Features in each selected feature set are shown in Table IV. For TLS/SSL features, we use Zeek IDS to generate log files and extract features from these log files. Table V shows some samples of the TLS/SSL features that can be extracted from different log files.
Similar to the numerical feature selection, we also conducted a statistical analysis of TLS/SSL features and created a list of the TLS/SSL features that appear most frequently across many works. The list contains 22 features, such as Server Name Indication (SNI) in SAN Domain Name System (DNS), mean of public certificate key, average domain in SAN, and STD of public certificate key. We name this list of TLS/SSL features the Further Optimized TLS/SSL (FOTS) feature set. Since such protocol-specific features are only applicable to their related encryption protocols, their use in encrypted malicious traffic detection and classification is limited. Therefore, the FOTS feature set is compared with the protocol-agnostic numerical feature set in Experiment 2 to provide insights into which feature set performs better, and how well one fares against the other. We will analyze the potential of replacing TLS/SSL features (protocol-specific features) with protocol-agnostic numerical features.

C. Experiment Setup
Ten commonly used machine learning algorithms are chosen for our experiment: RF, KNN, CART, C4.5, MLP, Naïve Bayes, XGBoost, AdaBoost, Linear Regression (Linear R), and Logistic Regression (Logistic R). To construct our models, the scikit-learn and xgboost libraries for Python are used with default hyperparameters (except RF's n_estimators = 200 and KNN's n_neighbors = 6). Stratified k-fold cross validation with k = 5 is used for model training.
In recent years, many authors have stated in their papers that deep learning is becoming a more efficient technology for encrypted malicious traffic detection, while traditional machine learning approaches have not been considered, or have only been discussed without in-depth analysis. We agree that the proportion of research papers on deep learning based encrypted malicious traffic detection is increasing, and that deep learning is indeed gradually replacing traditional machine learning as a broader technology for detecting encrypted malicious traffic.
TABLE IV
FEATURES IN EACH SELECTED FEATURE SET

Feature Name 1 2 3 4 5
mean TCP window size value X X
source port X X X
mean length of IP packet header X
maximum interval of arrival time of forward traffic X X X
mean length of backward IP packet header X
maximum interval of arrival time of backward traffic X X X
STD of backward packet length X
flow duration X X X X
time duration of backward traffic X
total payload per session X
destination port X
STD of time difference between packets per session X
minimum of time difference between packets per session X
STD of interval of arrival time of backward traffic X X
STD of interval of arrival time of forward traffic X
minimum of interval of arrival time of backward traffic X
mean of interval of arrival time of forward traffic X
mean of interval of arrival time of backward traffic X
minimum of interval of arrival time of forward traffic X
length of IP packets X
length of TCP payload X
payload ratio X
ratio to previous packets in each session X
time difference between packets per session X
total length of forward payload X X
minimum length of TCP payload X
mean length of TCP payload X
median length of TCP payload X
STD of the length of IP packets X X
IPratio (maximum length of IP packets / minimum length of IP packets) X
goodput (Total length of IP packet per session / flow duration) X
maximum time difference between packets per session X
STD of forward packet length X
maximum length of TCP payload X
mean time to live X
STD of time to live X
time duration of forward traffic X
1 refers to the FOS feature set; 2 refers to the Top 10 ranked features feature set; 3 refers to the side channel feature set; 4 refers to the tamper resistant feature set; 5 refers to the time based feature set; 'X' indicates that the feature in this row is selected in the feature set of this column.

TABLE V
EXAMPLE OF TLS/SSL FEATURES

TLS/SSL features
payload bytes from clients
ratio of responder to clients
no. of certificate
TLS version types
SNI in SAN DNS
differ SNI in SSL log
differ Subject in SSL log
mean of certificate validity
Common Name in SAN DNS
no of domain in SAN
.........

However, this does not mean that traditional machine learning has lost its value for research and discussion. Therefore, in this paper, we not only conduct a comprehensive review of the contributions of recent deep learning and machine learning papers, but also conduct comparative experiments on commonly used traditional machine learning algorithms. Furthermore, one purpose of our experiments is to deliver a more optimal feature set, and traditional machine learning algorithms can evaluate candidate feature sets effectively and in a straightforward manner.
The experiments were run on an Intel(R) Core(TM) i7-10700K CPU @ 3.8GHz with 64.0GB of RAM. The experiments evaluated the ROC-AUC, accuracy, FPR, and true positive rate (TPR) of the detection models. The STD of each evaluation measure is also calculated.

D. Experiment 1: Performance Analysis of the different statistical numerical feature sets and algorithms using a mixed dataset
We aim to achieve four objectives in this experiment:
1. Analyze the performance of different algorithms using the same feature set.
2. Analyze the performance of different feature sets using the same algorithms.
3. Identify the optimal algorithm and feature set using this dataset.
4. Conduct a cross dataset validation to highlight the importance of using the composed dataset instead of using the individual datasets separately for model training.

TABLE VI
PERFORMANCE RESULTS OF 10 ALGORITHMS WITH 5 DIFFERENT FEATURE SETS

FOS Feature Set
Algorithm | accuracy | STD(accuracy) | ROC-AUC | STD(ROC-AUC) | FPR | STD(FPR) | TPR | STD(TPR)
RF | 0.9471 | 0.0016 | 0.9909 | 0.0004 | 0.0418 | 0.0023 | 0.9354 | 0.0019
XGBoost | 0.9365 | 0.0011 | 0.9876 | 0.0002 | 0.0547 | 0.0027 | 0.9272 | 0.0033
C4.5 | 0.9345 | 0.0039 | 0.9345 | 0.0039 | 0.0639 | 0.0040 | 0.9328 | 0.0044
AdaBoost | 0.9316 | 0.0029 | 0.9316 | 0.0028 | 0.0673 | 0.0042 | 0.9305 | 0.0026
CART | 0.9322 | 0.0029 | 0.9322 | 0.0028 | 0.0665 | 0.0038 | 0.9308 | 0.0030
KNN | 0.8651 | 0.0014 | 0.8651 | 0.0014 | 0.1315 | 0.0018 | 0.8615 | 0.0027
MLP | 0.7681 | 0.0556 | 0.8383 | 0.0751 | 0.1327 | 0.1183 | 0.6629 | 0.2145
Naïve Bayes | 0.7397 | 0.0013 | 0.7990 | 0.0015 | 0.1620 | 0.0038 | 0.6355 | 0.0038
Logistic R | 0.7210 | 0.0077 | 0.7924 | 0.0016 | 0.2062 | 0.0146 | 0.6440 | 0.0018
Linear R | 0.6986 | 0.0526 | 0.6986 | 0.0507 | 0.3035 | 0.1187 | 0.7008 | 0.0183

Top 10 Ranked Features Feature Set
Algorithm | accuracy | STD(accuracy) | ROC-AUC | STD(ROC-AUC) | FPR | STD(FPR) | TPR | STD(TPR)
RF | 0.9400 | 0.0010 | 0.9879 | 0.0005 | 0.0571 | 0.0024 | 0.9369 | 0.0036
XGBoost | 0.9166 | 0.0020 | 0.9783 | 0.0008 | 0.0853 | 0.0044 | 0.9185 | 0.0029
C4.5 | 0.9164 | 0.0031 | 0.9164 | 0.0031 | 0.0831 | 0.0036 | 0.9159 | 0.0042
AdaBoost | 0.9153 | 0.0027 | 0.9153 | 0.0027 | 0.0836 | 0.0025 | 0.9141 | 0.0033
CART | 0.9151 | 0.0023 | 0.9151 | 0.0023 | 0.0838 | 0.0020 | 0.9140 | 0.0034
KNN | 0.8566 | 0.0035 | 0.8566 | 0.0035 | 0.1404 | 0.0045 | 0.8534 | 0.0030
MLP | 0.7571 | 0.0594 | 0.7924 | 0.0786 | 0.0744 | 0.0743 | 0.5786 | 0.1830
Naïve Bayes | 0.7580 | 0.0018 | 0.8068 | 0.0040 | 0.1391 | 0.0031 | 0.6490 | 0.0035
Logistic R | 0.7430 | 0.0018 | 0.8026 | 0.0016 | 0.2067 | 0.0174 | 0.6897 | 0.0192
Linear R | 0.5543 | 0.0684 | 0.5591 | 0.0688 | 0.6080 | 0.1469 | 0.7262 | 0.1600

Side Channel Feature Set
Algorithm | accuracy | STD(accuracy) | ROC-AUC | STD(ROC-AUC) | FPR | STD(FPR) | TPR | STD(TPR)
RF | 0.8190 | 0.0006 | 0.9147 | 0.0004 | 0.1788 | 0.0017 | 0.8166 | 0.0011
XGBoost | 0.8235 | 0.0018 | 0.9181 | 0.0012 | 0.1631 | 0.0014 | 0.8093 | 0.0038
C4.5 | 0.8128 | 0.0007 | 0.8517 | 0.0005 | 0.1764 | 0.0015 | 0.8013 | 0.0011
AdaBoost | 0.8158 | 0.0005 | 0.8711 | 0.0003 | 0.1759 | 0.0100 | 0.8070 | 0.0109
CART | 0.8126 | 0.0006 | 0.8515 | 0.0005 | 0.1767 | 0.0014 | 0.8013 | 0.0014
KNN | 0.8259 | 0.0032 | 0.9124 | 0.0019 | 0.1279 | 0.0073 | 0.7770 | 0.0132
MLP | 0.6717 | 0.0107 | 0.7477 | 0.0088 | 0.1373 | 0.0825 | 0.4694 | 0.0926
Naïve Bayes | 0.6244 | 0.0006 | 0.6135 | 0.0012 | 0.0223 | 0.0007 | 0.2503 | 0.0009
Logistic R | 0.5973 | 0.0010 | 0.6417 | 0.0008 | 0.2900 | 0.0018 | 0.4779 | 0.0020
Linear R | 0.5649 | 0.0427 | 0.5684 | 0.0360 | 0.5558 | 0.2697 | 0.6926 | 0.1980

Tamper Resistant Feature Set
Algorithm | accuracy | STD(accuracy) | ROC-AUC | STD(ROC-AUC) | FPR | STD(FPR) | TPR | STD(TPR)
RF | 0.8828 | 0.0013 | 0.9616 | 0.0005 | 0.1100 | 0.0018 | 0.8753 | 0.0029
XGBoost | 0.8894 | 0.0010 | 0.9654 | 0.0008 | 0.1186 | 0.0037 | 0.8978 | 0.0045
C4.5 | 0.8790 | 0.0031 | 0.8790 | 0.0031 | 0.1189 | 0.0051 | 0.8768 | 0.0016
AdaBoost | 0.8785 | 0.0033 | 0.9111 | 0.0329 | 0.1189 | 0.0055 | 0.8758 | 0.0027
CART | 0.8762 | 0.0019 | 0.8762 | 0.0019 | 0.1213 | 0.0041 | 0.8737 | 0.0013
KNN | 0.8469 | 0.0022 | 0.8471 | 0.0023 | 0.1529 | 0.0016 | 0.8467 | 0.0039
MLP | 0.7510 | 0.1026 | 0.8547 | 0.0581 | 0.2858 | 0.2939 | 0.7899 | 0.1360
Naïve Bayes | 0.4782 | 0.0015 | 0.7353 | 0.0021 | 0.9825 | 0.0019 | 0.9661 | 0.0018
Logistic R | 0.6591 | 0.0040 | 0.7580 | 0.0043 | 0.4832 | 0.0051 | 0.8097 | 0.0057
Linear R | 0.7482 | 0.0061 | 0.7442 | 0.0063 | 0.1194 | 0.0552 | 0.6079 | 0.0583

Time Based Feature Set
Algorithm | accuracy | STD(accuracy) | ROC-AUC | STD(ROC-AUC) | FPR | STD(FPR) | TPR | STD(TPR)
RF | 0.9207 | 0.0010 | 0.9792 | 0.0006 | 0.0759 | 0.0010 | 0.9171 | 0.0022
XGBoost | 0.9124 | 0.0033 | 0.9746 | 0.0012 | 0.0831 | 0.0047 | 0.9076 | 0.0048
C4.5 | 0.9119 | 0.0030 | 0.9119 | 0.0030 | 0.0869 | 0.0037 | 0.9107 | 0.0035
AdaBoost | 0.9136 | 0.0014 | 0.9135 | 0.0014 | 0.0856 | 0.0014 | 0.9127 | 0.0028
CART | 0.9139 | 0.0023 | 0.9138 | 0.0023 | 0.0849 | 0.0023 | 0.9126 | 0.0037
KNN | 0.8667 | 0.0021 | 0.8678 | 0.0019 | 0.1355 | 0.0024 | 0.8691 | 0.0023
MLP | 0.8056 | 0.0091 | 0.9025 | 0.0049 | 0.2023 | 0.0620 | 0.8140 | 0.0488
Naïve Bayes | 0.6339 | 0.0019 | 0.5832 | 0.0015 | 0.1038 | 0.0012 | 0.3561 | 0.0045
Logistic R | 0.6557 | 0.0010 | 0.6918 | 0.0036 | 0.1204 | 0.0012 | 0.4186 | 0.0025
Linear R | 0.5799 | 0.0750 | 0.5830 | 0.0754 | 0.5288 | 0.3559 | 0.6948 | 0.3713

Fig. 4. Accuracy performance of 10 algorithms with 5 different feature sets
Fig. 5. ROC-AUC performance of 10 algorithms with 5 different feature sets
Fig. 6. TPR performance of 10 algorithms with 5 different feature sets
Fig. 7. FPR performance of 10 algorithms with 5 different feature sets
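Objective 4 pairs every dataset with the remaining four in two directions (train on the other four and test on Dataset A, or train on Dataset A and test on the other four), as detailed in the cross dataset validation later in this section. The sketch below only enumerates those train/test pairings; the dataset names follow Table II, while the function name and task labels are hypothetical and not taken from the authors' code.

```python
# Dataset names as listed in Table II.
DATASETS = [
    "UNSW NS 2019",
    "CICIDS-2017",
    "CIC-AndMal2017",
    "Malware Capture Facility Project",
    "CICIDS-2012",
]


def cross_dataset_splits(datasets):
    """Yield (task, train_sets, test_sets) for both cross dataset validation tasks."""
    for dataset_a in datasets:
        others = [d for d in datasets if d != dataset_a]
        # Task 1: train on the other four datasets, test on Dataset A.
        yield ("train_others_test_A", others, [dataset_a])
        # Task 2: train on Dataset A alone, test on the other four datasets.
        yield ("train_A_test_others", [dataset_a], others)


splits = list(cross_dataset_splits(DATASETS))
for task, train, test in splits[:2]:
    print(task, "| train:", train, "| test:", test)
```

Each of the five datasets yields two runs, so the RF model with the FOS feature set is trained and evaluated ten times in total; no dataset ever appears on both the training and the test side of a run.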
Therefore, in the first experiment, we used the composed dataset to train ten different algorithms with the five protocol-agnostic numerical feature sets described in Section V.B. The experiment results are shown in Fig. 4 to Fig. 7 and Table VI.
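The evaluation protocol behind Table VI (stratified 5-fold cross validation, reporting the mean and STD of accuracy, ROC-AUC, FPR and TPR) can be sketched in plain Python. The authors use scikit-learn (which provides `StratifiedKFold`) and xgboost; this sketch instead reimplements fold assignment and the metrics with the standard library, and replaces the ten real classifiers with a hypothetical one-feature threshold model on made-up data, purely to keep the example self-contained. Whether the paper's STD is a population or a sample standard deviation is not stated, so the use of `pstdev` is an assumption.

```python
import random
import statistics


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k test folds that preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets about 1/k of it.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def binary_metrics(y_true, y_pred, scores):
    """Accuracy, TPR and FPR from hard labels; ROC-AUC from scores (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    # ROC-AUC as the probability that a random positive outscores a random
    # negative, with ties counted as one half (Mann-Whitney formulation).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "roc_auc": wins / (len(pos) * len(neg)),
        "fpr": fp / (fp + tn),
        "tpr": tp / (tp + fn),
    }


def summarize(per_fold):
    """Mean and population STD of each metric across the folds."""
    return {
        m: (statistics.mean(r[m] for r in per_fold),
            statistics.pstdev([r[m] for r in per_fold]))
        for m in per_fold[0]
    }


# Hypothetical data: 60 legitimate (0) and 40 malicious (1) sessions, one noisy feature.
rng = random.Random(42)
labels = [0] * 60 + [1] * 40
feature = [y + rng.gauss(0, 0.6) for y in labels]

folds = stratified_k_fold(labels, k=5)
per_fold = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Toy "training": threshold halfway between the class means of the training folds.
    mu0 = statistics.mean(feature[i] for i in train_idx if labels[i] == 0)
    mu1 = statistics.mean(feature[i] for i in train_idx if labels[i] == 1)
    threshold = (mu0 + mu1) / 2
    scores = [feature[i] for i in test_idx]
    y_true = [labels[i] for i in test_idx]
    y_pred = [int(s > threshold) for s in scores]
    per_fold.append(binary_metrics(y_true, y_pred, scores))

for metric, (mean, std) in summarize(per_fold).items():
    print(f"{metric}: {mean:.4f} (STD {std:.4f})")
```

Stratification matters here because every fold must contain both classes; otherwise TPR or FPR would be undefined for that fold, and the fold-to-fold STD reported in Table VI would be inflated by uneven class ratios.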
Fig. 8. The highest ROC-AUC among five feature sets under each algorithm. Black solid fill bars represent algorithms with FOS feature set and other bars represent algorithms with other feature sets.
Fig. 9. The highest accuracy among five feature sets under each algorithm. Black solid fill bars represent algorithms with FOS feature set and other bars represent algorithms with other feature sets.

For the experiment's first objective, we refer to Fig. 4 to Fig. 7 and Table VI. In Fig. 4, RF, XGBoost, C4.5, AdaBoost, CART, and KNN outperform the remaining algorithms: with every feature set, these six algorithms achieve an accuracy above 80% with an STD of accuracy below 0.5%. In Fig. 5, the ROC-AUC values of both RF and XGBoost exceed those of the other algorithms. Regardless of which feature set is used to train RF and XGBoost, their ROC-AUC is higher than 90% and the STD of ROC-AUC is lower than 0.13%, which no other algorithm achieves in our experiment. Next, we compare Fig. 6 and Fig. 7 (TPR vs. FPR). Based on Fig. 6, Fig. 7, and Table VI, the tamper resistant feature set with Naïve Bayes achieved the highest TPR (96.61%) in this experiment, but its FPR is also very high at 98.25%. The side channel feature set with Naïve Bayes achieved the lowest FPR (2.23%) in this experiment, but its TPR is only 25.03%. Therefore, neither extreme is acceptable: the highest TPR comes at the cost of flagging nearly all legitimate traffic as malicious, and the lowest FPR comes at the cost of missing about three quarters of the malicious traffic. The RF and XGBoost models, which perform consistently well in Fig. 4 and Fig. 5, also achieve low FPR and high TPR: with the FOS feature set, either of them achieves a TPR higher than 92% and an FPR lower than 6% simultaneously. Comparing RF with XGBoost, RF achieves both a better TPR and a better FPR. In particular, the RF model with the FOS feature set achieves 94.71% accuracy and 99.09% ROC-AUC, the highest of all models, and a 4.18% FPR, the lowest apart from the unacceptable side channel Naïve Bayes model. Its 93.54% TPR is the second highest among the acceptable models (the highest is 93.69% for RF with the Top 10 ranked features feature set).
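The TPR/FPR screening in the first-objective analysis can be reproduced directly from Table VI. The sketch below copies the (TPR, FPR) pairs of RF, XGBoost and Naïve Bayes for all five feature sets and applies the "TPR higher than 92% and FPR lower than 6%" bounds quoted in the text; treating those bounds as a formal acceptance rule is an illustrative assumption, not a criterion the paper defines.

```python
# (TPR, FPR) pairs copied from Table VI for three of the ten algorithms.
TABLE_VI = {
    ("RF", "FOS"): (0.9354, 0.0418),
    ("RF", "Top 10"): (0.9369, 0.0571),
    ("RF", "Side Channel"): (0.8166, 0.1788),
    ("RF", "Tamper Resistant"): (0.8753, 0.1100),
    ("RF", "Time Based"): (0.9171, 0.0759),
    ("XGBoost", "FOS"): (0.9272, 0.0547),
    ("XGBoost", "Top 10"): (0.9185, 0.0853),
    ("XGBoost", "Side Channel"): (0.8093, 0.1631),
    ("XGBoost", "Tamper Resistant"): (0.8978, 0.1186),
    ("XGBoost", "Time Based"): (0.9076, 0.0831),
    ("Naïve Bayes", "FOS"): (0.6355, 0.1620),
    ("Naïve Bayes", "Top 10"): (0.6490, 0.1391),
    ("Naïve Bayes", "Side Channel"): (0.2503, 0.0223),
    ("Naïve Bayes", "Tamper Resistant"): (0.9661, 0.9825),
    ("Naïve Bayes", "Time Based"): (0.3561, 0.1038),
}


def acceptable(results, min_tpr=0.92, max_fpr=0.06):
    """Keep (algorithm, feature set) pairs that satisfy both the TPR and FPR bounds."""
    return [key for key, (tpr, fpr) in results.items() if tpr > min_tpr and fpr < max_fpr]


# Optimizing one metric alone picks the degenerate Naïve Bayes models.
print("highest TPR:", max(TABLE_VI, key=lambda k: TABLE_VI[k][0]))
print("lowest FPR:", min(TABLE_VI, key=lambda k: TABLE_VI[k][1]))

passing = acceptable(TABLE_VI)
print("TPR > 92% and FPR < 6%:", passing)
print("lowest FPR among them:", min(passing, key=lambda k: TABLE_VI[k][1]))
```

Only RF with FOS, RF with Top 10 and XGBoost with FOS pass both bounds, and among them RF with FOS has the lowest FPR, which matches the conclusion drawn in the paragraph above.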
Fig. 10. The highest TPR among five feature sets under each algorithm and their corresponding FPR. Black solid fill bars represent algorithms with FOS feature set and other bars represent algorithms with other feature sets.
Fig. 11. The lowest FPR among five feature sets under each algorithm and their corresponding TPR. Black solid fill bars represent algorithms with FOS feature set and other bars represent algorithms with other feature sets.

For the experiment's second objective, the outcomes of the comparison are summarized in Fig. 8 to Fig. 11, which show the best performance of each algorithm among the five feature sets in terms of ROC-AUC, accuracy, TPR, and FPR. In Fig. 8 and Fig. 9, a solid bar indicates that the best performance of an algorithm is achieved by the FOS feature set, while a bar filled with diagonal lines indicates that it is achieved by one of the other four feature sets. Based on Fig. 8 and Fig. 9, the FOS feature set achieves the highest ROC-AUC and accuracy for RF, XGBoost, C4.5, AdaBoost, and CART. For KNN, MLP, Naïve Bayes, Logistic R, and Linear R, the highest ROC-AUC and accuracy are achieved by the other four feature sets, but these values are all lower than those of the first five algorithms with the FOS feature set.
In Fig. 10 and Fig. 11, grey indicates FPR and black indicates TPR. A solid bar indicates that the best FPR or TPR of an algorithm is achieved by the FOS feature set, while a bar filled with diagonal lines indicates that it is achieved by one of the other four feature sets. In Fig. 11, the lowest FPR of five algorithms is achieved by the FOS feature set, and their corresponding TPRs are all greater than 92%. Among them, the FOS feature set with RF achieved the lowest FPR (the lower FPR of Naïve Bayes is not acceptable, as discussed above). In Fig. 10, four algorithms achieve their highest TPR with the FOS feature set: XGBoost, C4.5, AdaBoost, and CART with the FOS feature set achieve both their highest TPR and their lowest FPR, all with an STD below 0.5%. However, the highest TPR of RF is achieved by the Top 10 ranked features feature set rather than the FOS feature set. To choose between these two feature sets for RF, we also need to consider the exact results in Table VI. The FOS feature set with RF achieved 93.54% TPR and 4.18% FPR, while the Top 10 ranked features feature set with RF achieved 93.69% TPR and 5.71% FPR. Although the TPR of the FOS feature set with RF is 0.15 percentage points lower, its FPR is 1.53 percentage points lower. Therefore, the FOS feature set with RF provides a better trade-off between TPR and FPR.
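The per-algorithm comparison plotted in Fig. 9, and the RF trade-off worked out above, can both be recomputed from Table VI. The sketch below copies the mean accuracy of every algorithm for the five feature sets, picks the best feature set per algorithm, and expresses the RF gaps between the FOS and Top 10 feature sets in percentage points; the variable names are illustrative only.

```python
FEATURE_SETS = ["FOS", "Top 10", "Side Channel", "Tamper Resistant", "Time Based"]

# Mean accuracy per algorithm, in FEATURE_SETS order, copied from Table VI.
ACCURACY = {
    "RF": [0.9471, 0.9400, 0.8190, 0.8828, 0.9207],
    "XGBoost": [0.9365, 0.9166, 0.8235, 0.8894, 0.9124],
    "C4.5": [0.9345, 0.9164, 0.8128, 0.8790, 0.9119],
    "AdaBoost": [0.9316, 0.9153, 0.8158, 0.8785, 0.9136],
    "CART": [0.9322, 0.9151, 0.8126, 0.8762, 0.9139],
    "KNN": [0.8651, 0.8566, 0.8259, 0.8469, 0.8667],
    "MLP": [0.7681, 0.7571, 0.6717, 0.7510, 0.8056],
    "Naïve Bayes": [0.7397, 0.7580, 0.6244, 0.4782, 0.6339],
    "Logistic R": [0.7210, 0.7430, 0.5973, 0.6591, 0.6557],
    "Linear R": [0.6986, 0.5543, 0.5649, 0.7482, 0.5799],
}

# Best feature set per algorithm (the quantity Fig. 9 plots).
best = {alg: FEATURE_SETS[accs.index(max(accs))] for alg, accs in ACCURACY.items()}
fos_winners = [alg for alg, fs in best.items() if fs == "FOS"]
print(best)
print("FOS gives the highest accuracy for:", fos_winners)

# RF trade-off between its two strongest feature sets, as (TPR, FPR).
rf_fos, rf_top10 = (0.9354, 0.0418), (0.9369, 0.0571)
tpr_gap_pp = round((rf_top10[0] - rf_fos[0]) * 100, 2)  # TPR given up by using FOS
fpr_gap_pp = round((rf_top10[1] - rf_fos[1]) * 100, 2)  # FPR saved by using FOS
print(f"FOS vs Top 10 with RF: -{tpr_gap_pp} pp TPR, -{fpr_gap_pp} pp FPR")
```

Reporting the differences in percentage points avoids confusing an absolute gap (0.15 points of TPR) with a relative one; since the FPR saving is about ten times the TPR loss, the FOS feature set is the better operating point for RF.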
From the analysis of the experiment's first and second objectives, it is apparent that the combination of the RF algorithm and the FOS feature set achieves the best performance.

Fig. 12. ROC-AUC result of Experiment 2 between statistical numerical and TLS/SSL features
Fig. 13. Accuracy result of Experiment 2 between statistical numerical and TLS/SSL features

Next, we analyze this experiment in more detail. The results show that a feature set combining time based features and traffic numerical features can improve the model detection rate compared with using either type alone. For instance, the FOS feature set and the Top 10 ranked features feature set both contain time based features and traffic numerical features, whereas the time based feature set contains only time based features. Furthermore, the side channel feature set has the worst performance among the five feature sets. We believe this is because all features in the side channel feature set are packet based, such as payload size, packet size, and payload ratio. Such features may not differ clearly between malicious and legitimate traffic flows, which may degrade the performance of the detection models. In contrast, the features of the other four feature sets are session based: they consider a sequence of multiple packets instead of single packets. The experiment indicates that session based features, or a combination of session based and packet based features, are likely to outperform packet based features alone.
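The contrast between packet based and session based features can be made concrete with a small sketch. It assumes each session is available as a list of (timestamp in seconds, IP packet length, TCP payload length, direction) tuples; this input format, the function names and the example session are hypothetical. `packet_features` emits one row per packet, truncated to the first 15 packets as in the side channel setup, while `session_features` aggregates a few of the session based features named in Table IV (IPratio and goodput follow the definitions given there; the payload ratio formula is an assumption).

```python
import statistics

MAX_PACKETS = 15  # the experiment keeps only the first 15 packets per session


def packet_features(packets, limit=MAX_PACKETS):
    """Packet based view: one feature row per packet, first `limit` packets only."""
    rows = []
    for _ts, ip_len, payload_len, _direction in packets[:limit]:
        # Payload ratio taken here as payload bytes / IP packet bytes (an assumption).
        rows.append({"ip_len": ip_len, "payload_len": payload_len,
                     "payload_ratio": payload_len / ip_len})
    return rows


def session_features(packets):
    """Session based view: one feature row summarizing the whole session."""
    times = [p[0] for p in packets]
    ip_lens = [p[1] for p in packets]
    fwd_times = [p[0] for p in packets if p[3] == "fwd"]
    fwd_iat = [b - a for a, b in zip(fwd_times, fwd_times[1:])]
    duration = times[-1] - times[0]
    return {
        "flow_duration": duration,
        "std_ip_len": statistics.pstdev(ip_lens),
        "ip_ratio": max(ip_lens) / min(ip_lens),  # IPratio: max / min IP length
        "goodput": sum(ip_lens) / duration if duration > 0 else 0.0,  # bytes per second
        "max_fwd_iat": max(fwd_iat) if fwd_iat else 0.0,
        "mean_fwd_iat": statistics.mean(fwd_iat) if fwd_iat else 0.0,
        "total_fwd_payload": sum(p[2] for p in packets if p[3] == "fwd"),
    }


# Hypothetical TLS session: (timestamp, IP length, TCP payload length, direction).
session = [
    (0.00, 60, 0, "fwd"),
    (0.05, 60, 0, "bwd"),
    (0.10, 52, 0, "fwd"),
    (0.30, 570, 517, "fwd"),
    (0.45, 1500, 1448, "bwd"),
    (0.50, 52, 0, "fwd"),
]
print(packet_features(session)[:2])
print(session_features(session))
```

A classifier trained on `packet_features` sees each packet in isolation, so a 52-byte ACK looks the same in malicious and legitimate sessions; `session_features` instead summarizes timing and size behaviour across the whole exchange, which is the kind of information the session based feature sets exploit.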
Furthermore, a cross dataset validation is also conducted in this experiment to highlight the importance of using the composed dataset instead of single datasets for model training and validation. We perform two such experiments: train on a selected Dataset A and test on the other four datasets; and train on the other four datasets and test on Dataset A. We select RF with the FOS feature set for both cross dataset validation tasks, because it reported the best performance. The experiment results are shown in Table VII.
The two cross dataset validation results in Table VII show generally poor accuracy, ROC-AUC, FPR, and TPR for both tasks. In many cases both FPR and TPR are very low, which indicates that the model classifies most of the test set as negative. Such results are to be expected, as the datasets differ considerably in data distribution, especially between IoT and conventional device data. When we use the conventional device data to train the RF model with the FOS feature set and use the IoT dataset as the test set, both FPR and TPR are close to 0. This shows that the RF model trained on conventional device data has almost no detection capability on the IoT test set. Therefore, a single public dataset lacks variety in encrypted traffic. Such limitations may lead to unpredictable biases, such as a high risk of over-fitting to the selected dataset. In summary, the above analysis highlights the importance of using a comprehensively composed dataset instead of using datasets for model training separately.

E. Experiment 2: Compare the performance of Protocol-agnostic numerical and TLS/SSL feature sets
In the second experiment, the aim is to evaluate and compare the performance of the protocol-agnostic numerical features and TLS/SSL features. As TLS/SSL features are limited to traffic encrypted by these protocols, we use a dataset containing only TLS/SSL encrypted traffic for this experiment. We applied the FOTS feature set and the FOS feature set to train models with the ten different algorithms. The experiment results are shown in Fig. 12 and Fig. 13.
Based on Fig. 12 and Fig. 13, except for Naïve Bayes, the other nine algorithms with the FOS feature set achieve higher or similar performance in terms of both accuracy and ROC-AUC compared with using the TLS/SSL features under the same conditions. Only Naïve Bayes behaves differently: with the FOS feature set, it achieved a higher ROC-AUC but a lower accuracy than Naïve Bayes using
TABLE VII
CROSS DATASET VALIDATION RESULTS BASED ON RF WITH FOS FEATURE SET

A= A=
Train Other Datasets A= A= A=
Malware Capture UNSW NS2019
and Test Dataset A CICIDS-2012 CICIDS-2017 CIC-AndMal2017
Facility Project Dataset Dataset
Accuracy 0.2868 0.8005 0.4318 0.4873 0.5840
ROC-AUC 0.3026 0.8148 0.7451 0.5830 0.6222
FPR 0.6970 0.3799 0.0881 0.0507 0.0002
TPR 0.2702 0.9878 0.0018 0.0110 0.0018
A= A=
Train Dataset A and A= A= A=
Malware Capture UNSW NS2019
Test Other Datasets CICIDS-2012 CICIDS-2017 CIC-AndMal2017
Facility Project Dataset Dataset
Accuracy 0.5422 0.5190 0.6528 0.5165 0.4906
ROC-AUC 0.6450 0.6552 0.7030 0.7783 0.4705
FPR 0.2260 0.0041 0.2500 0.0000 0.5283
TPR 0.2960 0.0107 0.5445 0.0000 0.5093

TLS/SSL features. Despite this difference, the Naïve Bayes detection model performs worse than most other algorithms regardless of whether the FOS feature set or the TLS/SSL features are selected. In summary, among the top performing algorithms, such as RF and XGBoost, the FOS feature set achieves better performance than the TLS/SSL features, whether ROC-AUC or accuracy is used as the evaluation measure. Therefore, it is feasible to consider replacing protocol-specific features with protocol-agnostic numerical features. Research on encrypted malicious traffic detection would then no longer be limited to HTTPS or any specific protocol, and a more comprehensive analysis of encrypted malicious traffic could be carried out. Future research on feature selection and optimization may also be more meaningful for protocol-agnostic numerical features than for protocol-specific features.

VI. CONCLUSION
In this paper, we proposed a framework to study and analyse machine learning based encrypted traffic detection approaches. We reviewed existing research based on the proposed framework, covering research objective construction, traffic dataset collection and pre-processing, feature extraction and selection, algorithm selection, and performance evaluation. While some progress has been made in encrypted malicious traffic detection, challenges still exist. The first and most important issue to be addressed is the lack of a comprehensive, class balanced, realistic and convincing public dataset for encrypted malicious traffic detection. The quality of the dataset used in research has a direct impact on the final performance of its model. Thus, we analysed, processed and combined 5 publicly available datasets to construct a large and comprehensive dataset to facilitate future research and technology evaluation in this field.
In addition, we trained detection models using 10 machine learning algorithms and 5 feature sets, and conducted comparative experiments to analyse the performance of different protocol-agnostic numerical feature sets and algorithms on the same dataset. We also tested the feasibility of replacing TLS/SSL features with protocol-agnostic numerical features for future encrypted malicious traffic analysis.
Lastly, we also showed that studies on only one or two kinds of encrypted traffic should be avoided. With the growing number of encryption protocols and proprietary protocols, the analysis of only one or two kinds of encryption protocols is bound to play a small role moving forward. Furthermore, current approaches mostly perform binary classification, and multi-class classification can be explored to obtain more fine-grained detection results.

REFERENCES
[1] S, G. Cisco ETA – Provides Solution for Detecting Malware in Encrypted Traffic. GBHackers On Security. Retrieved January 14, 2018, from https://gbhackers.com/cisco-eta-encrypted-traffic/
[2] Google transparency report. (n.d.). Retrieved April 28, 2021, from https://transparencyreport.google.com/https/overview?hl=en
[3] Cisco Cybersecurity Report. (n.d.). Retrieved 2018, from https://www.cisco.com/c/m/en_au/products/security/offers/annual-cybersecurity-report-2018.html
[4] Desai, D. 2020: The State of Encrypted Attacks. Zscaler. Retrieved February 24, 2021, from https://www.zscaler.com/blogs/security-research/2020-state-encrypted-attacks
[5] Cao, Zigang & Xiong, Gang & Zhao, Yong & Li, Zhenzhen & Guo, Li. (2014). A Survey on Encrypted Traffic Classification. Communications in Computer and Information Science. 490. 73-81. 10.1007/978-3-662-45670-5_8.
[6] Velan, Petr & Cermak, Milan & Celeda, Pavel & Drašar, Martin. (2015). A survey of methods for encrypted traffic classification and analysis. International Journal of Network Management. 25. 10.1002/nem.1901.
[7] Li, Ding & Zhu, Yuefei & Lin, Wei. (2017). Traffic Identification of Mobile Apps Based on Variational Autoencoder Network. 287-291. 10.1109/CIS.2017.00069.
[8] Yang, Jiwon & Lim, Hyuk. (2021). Deep Learning Approach for Detecting Malicious Activities Over Encrypted Secure Channels. IEEE Access. PP. 1-1. 10.1109/ACCESS.2021.3064561.
[9] Zeng, Yi & Gu, Huaxi & Wenting, Wei & Guo, Yantao. (2019). Deep-Full-Range: A Deep Learning Based Network Encrypted Traffic Classification and Intrusion Detection Framework. IEEE Access. PP. 1-1. 10.1109/ACCESS.2019.2908225.
[10] Xing, Junchi & Wu, Chunming. (2020). Detecting Anomalies in Encrypted Traffic via Deep Dictionary Learning. 734-739. 10.1109/INFOCOMWKSHPS50562.2020.9162940.
[11] Bazuhair, Wajdi & Lee, Wonjun. (2020). Detecting Malign Encrypted Network Traffic Using Perlin Noise and Convolutional Neural Network. 0200-0206. 10.1109/CCWC47524.2020.9031116.
[12] Liu, Jiayong & Tian, Zhiyi & Zheng, RongFeng & Liu, Liang. (2019). A Distance-Based Method for Building an Encrypted Malware Traffic Identification Framework. IEEE Access. PP. 1-1. 10.1109/ACCESS.2019.2930717.
[13] Chen, Liangchen & Gao, Shu & Liu, Baoxu & Lu, Zhigang & Jiang, Zhengwei. (2020). THS-IDPC: A three-stage hierarchical sampling method based on improved density peaks clustering algorithm for encrypted malicious traffic detection. The Journal of Supercomputing. 76. 10.1007/s11227-020-03372-1.
[14] Stergiopoulos, George & Talavari, Alexander & Bitsikas, Evangelos & Gritzalis, Dimitris. (2018). Automatic Detection of Various Malicious Traffic Using Side Channel Features on TCP Packets: 23rd European Symposium on Research in Computer Security, ESORICS 2018, Barcelona, Spain, September 3-7, 2018, Proceedings, Part I. 10.1007/978-3-319-99073-6_17.
[15] Yao, Haipeng & Liu, Chong & Zhang, Peiying & Wu, Sheng & Jiang, Chunxiao & Yu, Shui. (2019). Identification of Encrypted Traffic Through Attention Mechanism Based Long Short Term Memory. IEEE Transactions on Big Data. PP. 1-1. 10.1109/TBDATA.2019.2940675.
[16] Niu, Weina & Zhuo, Zhongliu & Zhang, Xiaosong & Du, Xiaojiang & Yang, Guowu & Guizani, Mohsen. (2019). A Heuristic Statistical Testing Based Approach for Encrypted Network Traffic Identification. IEEE Transactions on Vehicular Technology. PP. 1-1. 10.1109/TVT.2019.2894290.
[17] Celik, Z. Berkay & Walls, Robert & McDaniel, Patrick & Swami, Ananthram. (2015). Malware traffic detection using tamper resistant features. 330-335. 10.1109/MILCOM.2015.7357464.
[18] UNB-Canadian Institute for Cybersecurity (CIC) datasets. University of New Brunswick est.1785. (n.d.). https://www.unb.ca/cic/datasets/index.html.
[19] Nakahara, Masataka & Okui, Norihiro & Kobayashi, Yasuaki & Miyake, Yutaka. (2020). Machine Learning based Malware Traffic Detection on IoT Devices using Summarized Packet Data. 78-87. 10.5220/0009345300780087.
[20] Hafeez, Ibbad & Antikainen, Markku & Ding, Aaron Yi & Tarkoma, Sasu. (2020). IoT-KEEPER: Detecting Malicious IoT Network Activity using Online Traffic Analysis at the Edge. IEEE Transactions on Network and Service Management. PP. 1-1. 10.1109/TNSM.2020.2966951.
[21] Fahad, Adil & Alharthi, Kurayman & Tari, Zahir & Almalawi, Abdulmohsen & Khalil, Ibrahim. (2014). CluClas: Hybrid clustering-classification approach for accurate and efficient network classification. Proceedings - Conference on Local Computer Networks, LCN.
[32] Ma, & Yanhua, Du & Cao,. (2020). Improved KNN Algorithm for Fine-Grained Classification of Encrypted Network Flow. Electronics. 9. 324. 10.3390/electronics9020324.
[33] Anderson, Blake & Paul, Subharthi & McGrew, David. (2018). Deciphering Malware's use of TLS (without Decryption). Journal of Computer Virology and Hacking Techniques. 14. 10.1007/s11416-017-0306-6.
[34] Zhou, Huiyi & Wang, Yong & Lei, Xiaochun & Liu, Yuming. (2017). A Method of Improved CNN Traffic Classification. 177-181. 10.1109/CIS.2017.00046.
[35] Habibi Lashkari, Arash & Draper Gil, Gerard & Mamun, Mohammad & Ghorbani, Ali. (2016). Characterization of Encrypted and VPN Traffic Using Time-Related Features. 10.5220/0005740704070414.
[36] Wang, Wei & Sheng, Y. & Wang, Jinlin & Zeng, Xuewen & Ye, Xiaozhou & Huang, Yongzhong & Zhu, Ming. (2017). HAST-IDS: Learning Hierarchical Spatial-Temporal Features using Deep Neural Networks to Improve Intrusion Detection. IEEE Access. PP. 1-1. 10.1109/ACCESS.2017.2780250.
[37] Wang, Wei & Zhu, Ming & Wang, Jinlin & Zeng, Xuewen & Yang, Zhongzhen. (2017). End-to-end encrypted traffic classification with one-dimensional convolution neural networks. 43-48. 10.1109/ISI.2017.8004872.
[38] Min, Erxue & Long, Jun & Liu, Qiang & Cui, Jianjing & Cai, Zhiping & Ma, Junbo. (2018). SU-IDS: A Semi-supervised and Unsupervised Framework for Network Intrusion Detection: 4th International Conference, ICCCS 2018, Haikou, China, June 8–10, 2018, Revised Selected Papers, Part III. 10.1007/978-3-030-00012-7_30.
[39] Meidan, Yair & Bohadana, Michael & Mathov, Yael & Mirsky, Yisroel & Breitenbacher, Dominik & Shabtai, Asaf & Elovici, Yuval. (2018). N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders.
[40] Höchst, Jonas & Baumgärtner, Lars & Hollick, Matthias & Freisleben, Bernd. (2017). Unsupervised Traffic Flow Classification Using a Neural Autoencoder. 10.1109/LCN.2017.57.
[41] Yu, Yang & Long, Jun & Cai, Zhiping. (2017). Network Intrusion Detection through Stacking Dilated Convolutional Autoencoders. Security and Communication Networks. 2017. 1-10. 10.1155/2017/4184196.
[42] Li, Yuancheng & Ma, Rong & Jiao, Runhai. (2015). A Hybrid Malicious Code Detection Method based on Deep Learning. International Journal
10.1109/LCN.2014.6925769. of Software Engineering and Its Applications. 9. 205-216. 10.14257/ij-
seia.2015.9.5.21.
[22] Kim, Jin-Young & Bu, Seok-Jun & Cho, Sung-Bae. (2018). Zero-
[43] Min, Erxue & Long, Jun & Liu, Qiang & Cui, Jianjing & Chen,
day Malware Detection using Transferred Generative Adversarial Net-
Wei. (2018). TR-IDS: Anomaly-Based Intrusion Detection through Text-
works based on Deep Autoencoders. Information Sciences. 460-461.
Convolutional Neural Network and Random Forest. Security and Com-
10.1016/j.ins.2018.04.092.
munication Networks. 2018. 1-9. 10.1155/2018/4943509.
[23] N. Koroniotis, N. Moustafa, E. Sitnikova, and B. Turnbull, “Towards
[44] Wang, Wei & Zhu, Ming & Zeng, Xuewen & Ye, Xiaozhou & Sheng, Y..
the development of realistic botnet dataset in the Internet of Things
(2017). Malware traffic classification using convolutional neural network
for network forensic analytics: Bot-IoT dataset,” FGCS, vol. 100, pp.
for representation learning. 712-717. 10.1109/ICOIN.2017.7899588.
779–796, 2019.
[45] Chen, Zhitang & He, Ke & Li, Jian & Geng, Yanhui. (2017).
[24] ZEEK INTRUSION DETECTION SERIES. Retrieved 2020. from Seq2Img: A sequence-to-image based approach towards IP traffic clas-
http://ce.sc.edu/cyberinfra/docs/workshop/Zeek Lab Series.pdf sification using convolutional neural networks. 1271-1276. 10.1109/Big-
[25] Prasse, Paul & Machlica, Lukas & Pevný, Tomás & Havelka, Jiřı́ & Data.2017.8258054.
Scheffer, Tobias. (2017). Malware Detection by Analysing Encrypted [46] De Lucia, Michael & Cotton, Chase. (2019). Detection of Encrypted
Network Traffic with Neural Networks. 10.1007/978-3-319-71246-8 5. Malicious Network Traffic using Machine Learning. 1-6. 10.1109/MIL-
[26] B. Yan& G. Han& Y. Huang & X. Wang. (2018). New traf- COM47813.2019.9020856.
fic classification method for imbalanced network data. J. Com- [47] Pascanu, Razvan & Stokes, Jack & Sanossian, Hermineh & Marinescu,
put. Appl., vol. 38, no. 1, pp. 20–25, 2018. [Online]. Available: Mady & Thomas, Anil. (2015). Malware classification with recurrent
http://www.joca.cn/EN/abstract/abstract21447.shtml networks. 1916-1920. 10.1109/ICASSP.2015.7178304.
[27] Aceto, Giuseppe & Montieri, Antonio & Pescapè, Antonio & Ciuonzo, [48] CTU-13 dataset, CTU University, Czech Republic, 2011, from
Domenico. (2019). Mobile Encrypted Traffic Classification Using https://mcfp.felk.cvut.cz/publicDatasets/CTU-Malware-Capture-Botnet-1/
Deep Learning: Experimental Evaluation, Lessons Learned, and Chal- [49] M. J. Erquiaga and S. Garcia, Malware capture facility project, CVUT
lenges. IEEE Transactions on Network and Service Management. PP. University, 2013, from https://mcfp. weebly.com.
10.1109/TNSM.2019.2899085. [50] Dong, Shuaike & Li, Zhou & Tang, Di & Chen, Jiongyi & Sun,
[28] Anderson, Blake & McGrew, David. (2017). Machine Learning for Menghan & Zhang, Kehuan. (2020). Your Smart Home Can’t Keep
Encrypted Malware Traffic Classification: Accounting for Noisy Labels a Secret: Towards Automated Fingerprinting of IoT Traffic. 47-59.
and Non-Stationarity. 1723-1732. 10.1145/3097983.3098163. 10.1145/3320269.3384732.
[29] Shekhawat, Anish & Di Troia, Fabio & Stamp, Mark. (2019). Feature [51] Sebastian Garcia, Agustin Parmisano, & Maria Jose Erquiaga.
Analysis of Encrypted Malicious Traffic. Expert Systems with Applica- (2020). IoT-23: A labeled dataset with malicious and legiti-
tions. 125. 10.1016/j.eswa.2019.01.064. mate IoT network traffic (Version 1.0.0) [Data set]. Zenodo.
[30] Desai, Bharat & Divakaran, Dinil Mon & Nevat, Ido & Peters, Gareth & http://doi.org/10.5281/zenodo.4743746
Gurusamy, Mohan. (2019). A feature-ranking framework for IoT device [52] B. Duncan, “Malware traffic analysis,” Jul. 2020, [Online].
classification. 10.1109/COMSNETS.2019.8711210. Available:https://www.malware-traffic-analysis.net/
[31] Aceto, Giuseppe & Ciuonzo, Domenico & Montieri, Antonio & Per- [53] Sharafaldin, Iman & Habibi Lashkari, Arash & Ghorbani, Ali. (2018).
sico, Valerio & Pescapè, Antonio. (2019). MIRAGE: Mobile-app Traffic Toward Generating a New Intrusion Detection Dataset and Intrusion
Capture and Ground-truth Creation. 10.1109/CCCS.2019.8888137. Traffic Characterization. 108-116. 10.5220/0006639801080116.
[54] Shiravi, Ali & Shiravi, Hadi & Tavallaee, Mahbod & Ghorbani, Ali. (2012). Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Computers & Security. 31. 357-374. 10.1016/j.cose.2011.12.012.
[55] Habibi Lashkari, Arash & Draper Gil, Gerard & Mamun, Mohammad & Ghorbani, Ali. (2016). Characterization of Encrypted and VPN Traffic Using Time-Related Features. 10.5220/0005740704070414.
[56] Conti, Mauro & Li, Qianqian & Maragno, Alberto & Spolaor, Riccardo. (2017). The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis. IEEE Communications Surveys & Tutorials. PP. 10.1109/COMST.2018.2843533.
[57] Liberatore, Marc & Levine, Brian. (2006). Inferring the source of encrypted HTTP connections. 255-263. 10.1145/1180405.1180437.
[58] MAWI Working Group Traffic Archive. (n.d.). WIDE Project. from http://mawi.wide.ad.jp/mawi/
[59] Abbasi, Mahmoud & Shahraki, Amin & Taherkordi, Amir. (2021). Deep learning for Network Traffic Monitoring and Analysis (NTMA): A survey. Computer Communications. 170. 10.1016/j.comcom.2021.01.021.
[60] Habibi Lashkari, Arash & Abdul kadir, Andi Fitriah & Taheri, Laya & Ghorbani, Ali. (2018). Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification. 1-7. 10.1109/CCST.2018.8585560.
[61] Zhang, Chaoyun & Patras, Paul & Haddadi, Hamed. (2018). Deep Learning in Mobile and Wireless Networking: A Survey. IEEE Communications Surveys & Tutorials. PP. 10.1109/COMST.2019.2904897.
[62] Nascita, Alfredo & Montieri, Antonio & Aceto, Giuseppe & Ciuonzo, Domenico & Persico, Valerio & Pescapè, Antonio. (2021). XAI meets Mobile Traffic Classification: Understanding and Improving Multimodal Deep Learning Architectures. IEEE Transactions on Network and Service Management. PP. 10.1109/TNSM.2021.3098157.
[63] Traffic Data from Kyoto University’s Honeypots. from http://www.takakura.com/Kyoto_data/
[64] First.org, Hands-on Network Forensics - Training PCAP dataset from FIRST 2015. from www.first.org/_assets/conf2015/networkforensics_virtualbox.zip
[65] UNIBS: Data sharing. (n.d.). UNIBS-2009. from http://netweb.ing.unibs.it/%7Entw/tools/traces/
[66] Milicenso, Ponmocup Malware dataset (Update 2012-10-07, http://security-research.dyndns.org/pub/botnet/ponmocup/analysis_2012-10-05/analysis.txt Accessed 1 Jan 2018)
[67] Rai, Arun. (2019). Explainable AI: from black box to glass box. Journal of the Academy of Marketing Science. 48. 10.1007/s11747-019-00710-5.
[68] Berman, Daniel & Buczak, Anna & Chavis, Jeffrey & Corbett, Cherita. (2019). A Survey of Deep Learning Methods for Cyber Security. Information. 10. 122. 10.3390/info10040122.
[69] Moustafa, Nour & Slay, Jill. (2015). UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). 10.1109/MilCIS.2015.7348942.
[70] Hamza, Ayyoob & Habibi Gharakheili, Hassan & Benson, Theophilus & Sivaraman, Vijay. (2019). Detecting Volumetric Attacks on IoT Devices via SDN-Based Monitoring of MUD Activity. SOSR ’19: Proceedings of the 2019 ACM Symposium on SDN Research. 36-48. 10.1145/3314148.3314352.
[71] iTrust. (2021, March 29). Labs Dataset Info. Available: https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/
[72] Beigi, Elaheh & Jazi, Hossein & Stakhanova, Natalia & Ghorbani, Ali. (2014). Towards effective feature selection in machine learning-based botnet detection approaches. 2014 IEEE Conference on Communications and Network Security, CNS 2014. 247-255. 10.1109/CNS.2014.6997492.
[73] Zhao, Bo & Guo, Hong & Liu, Qin-Rang & Wu, Jiang-Xing. (2013). Protocol Independent Identification of Encrypted Traffic Based on Weighted Cumulative Sum Test. Ruan Jian Xue Bao/Journal of Software. 24. 1334-1345. 10.3724/SP.J.1001.2013.04279.
[74] Yu, Tangda & Zou, Futai & Li, Linsen & Yi, Ping. (2019). An Encrypted Malicious Traffic Detection System Based on Neural Network. 62-70. 10.1109/CyberC.2019.00020.
[75] Bovenzi, Giampaolo & Aceto, Giuseppe & Ciuonzo, Domenico & Persico, Valerio & Pescapè, Antonio. (2020). A Hierarchical Hybrid Intrusion Detection Approach in IoT Scenarios. 10.1109/GLOBECOM42002.2020.9348167.
[76] Dolgikh, Serge & Seddigh, Nabil & Nandy, Bis & Bennett, Dan & Zeidler, Colin & Ren, Yonglin & Knoetze, Juhandre & Muthyala, Naveen. (2019). A Framework & System for Classification of Encrypted Network Traffic using Machine Learning. 10.23919/CNSM46954.2019.9012662.
[77] Sarkar, Debmalya & Vinod, P. & Yerima, Suleiman. (2020). Detection of Tor Traffic using Deep Learning. 10.1109/AICCSA50499.2020.9316533.
[78] CICFlowMeter. (2017). Canadian Institute for Cybersecurity (CIC). from https://www.unb.ca/cic/research/applications.html
[79] Khalife, Jawad & Hajjar, Amjad & Díaz-Verdejo, Jesús. (2014). A multilevel taxonomy and requirements for an optimal traffic-classification model. International Journal of Network Management. 24. 10.1002/nem.1855.
[80] Meghdouri, Fares & Iglesias Vázquez, Félix & Zseby, Tanja. (2020). Cross-Layer Profiling of Encrypted Network Data for Anomaly Detection. 469-478. 10.1109/DSAA49011.2020.00061.
[81] Lotfollahi, Mohammad & Shirali hossein zade, Ramin & Jafari Siavoshani, Mahdi & Saberian, Mohammadsadegh. (2020). Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning. Soft Computing. 24. 10.1007/s00500-019-04030-2.
[82] De Lucia, Michael & Cotton, Chase. (2018). Identifying and detecting applications within TLS traffic. 31. 10.1117/12.2305256.
[83] Lopez-Martin, Manuel & Carro, Belén & Sanchez-Esguevillas, Antonio & Lloret, Jaime. (2017). Network Traffic Classifier With Convolutional and Recurrent Neural Networks for Internet of Things. IEEE Access. PP. 1-1. 10.1109/ACCESS.2017.2747560.
[84] Androulidakis, Georgios & Papavassiliou, S. (2008). Improving network anomaly detection via selective flow-based sampling. Communications, IET. 2. 399-409. 10.1049/iet-com:20070231.
[85] Duffield, Nick & Lund, Carsten. (2003). Predicting resource usage and estimation accuracy in an IP flow measurement collection infrastructure. 179-191. 10.1145/948205.948228.
[86] Su, Liya & Yao, Yepeng & Li, Ning & Liu, Junrong & Lu, Zhigang & Liu, Baoxu. (2018). Hierarchical Clustering Based Network Traffic Data Reduction for Improving Suspicious Flow Detection. 744-753. 10.1109/TrustCom/BigDataSE.2018.00108.
[87] Yang, Yanqing & Zheng, Kangfeng & Wu, Chunhua & Niu, Xinxin & Yang, Yixian. (2019). Building an Effective Intrusion Detection System Using the Modified Density Peak Clustering Algorithm and Deep Belief Networks. Applied Sciences. 9. 238. 10.3390/app9020238.
[88] Zhang, Surong & Bu, Youjun & Chen, Bo & Lu, Xiangyu. (2021). Transfer Learning for Encrypted Malicious Traffic Detection Based on Efficientnet. 72-76. 10.1109/CTISC52352.2021.00021.
[89] Wang, Pan & Ye, Feng & Chen, Xuejiao & Qian, Yi. (2018). DataNet: Deep Learning based Encrypted Network Traffic Classification in SDN Home Gateway. IEEE Access. PP. 1-1. 10.1109/ACCESS.2018.2872430.
[90] Torroledo, Ivan & Camacho, Luis & Correa Bahnsen, Alejandro. (2018). Hunting Malicious TLS Certificates with Deep Neural Networks. 64-73. 10.1145/3270101.3270105.
[91] Vu, Ly & Tra, Dong & Nguyen, Uy. (2016). Learning from imbalanced data for encrypted traffic identification problem. 147-152. 10.1145/3011077.3011132.
[92] Anderson, Blake & McGrew, David. (2016). Identifying Encrypted Malware Traffic with Contextual Flow Data. 35-46. 10.1145/2996758.2996768.
[93] Zhang, Meng & Zhang, Hongli & Zhang, Bo & Lu, Gang. (2013). Encrypted Traffic Classification Based on an Improved Clustering Algorithm. Communications in Computer and Information Science. 320. 124-131. 10.1007/978-3-642-35795-4_16.
[94] Shahbar, Khalid & Zincir-Heywood, A. (2018). How far can we push flow analysis to identify encrypted anonymity network traffic? 1-6. 10.1109/NOMS.2018.8406156.
[95] Wang, Pan & Chen, Xuejiao & Ye, Feng & Zhixin, Sun. (2019). A Survey of Techniques for Mobile Service Encrypted Traffic Classification Using Deep Learning. IEEE Access. PP. 1-1. 10.1109/ACCESS.2019.2912896.
[96] AlAhmadi, Bushra & Martinovic, Ivan. (2018). MalClassifier: Malware family classification using network flow sequence behaviour. 1-13. 10.1109/ECRIME.2018.8376209.
[97] Zhai, M. F. & Zhang, X. M. & Zhao, B. (2020). Survey of encrypted malicious traffic detection based on deep learning. Chinese Journal of Network and Information Security. 6(3). 59-70.
[98] Aceto, Giuseppe & Ciuonzo, Domenico & Montieri, Antonio & Pescapè, Antonio. (2020). Toward Effective Mobile Encrypted Traffic Classification through Deep Learning. Neurocomputing. 409. 10.1016/j.neucom.2020.05.036.
[99] Aceto, Giuseppe & Ciuonzo, Domenico & Montieri, Antonio & Pescapè, Antonio. (2021). DISTILLER: Encrypted Traffic Classification via Multimodal Multitask Deep Learning. Journal of Network and Computer Applications. 183-184. 10.1016/j.jnca.2021.102985.
[100] Wang, Zihao & Fok, Kar Wai & Thing, Vrizlynn. (2021). Composed Encrypted Malicious Traffic Dataset for machine learning based encrypted malicious traffic analysis. Mendeley Data, V1. 10.17632/ztyk4h3v6s.1. https://data.mendeley.com/datasets/ztyk4h3v6s/1
