Crafting Adversarial Example To Bypass Flow-&ML - Based Botnet Detector Via RL
ABSTRACT
Machine learning (ML)-based botnet detection methods have become mainstream in corporate practice. However, researchers have found that ML models are vulnerable to adversarial attacks, which can mislead the models by adding subtle perturbations to the sample. Due to the complexity of traffic samples and the special constraint that malicious functions must be preserved, no substantial research on adversarial ML has been conducted in the botnet detection field, where evasion attacks caused by carefully crafted adversarial examples may directly make ML-based detectors unavailable and cause significant property damage. In this paper, we propose a reinforcement learning (RL) method for bypassing ML-based botnet detectors. Specifically, we train an RL agent as a functionality-preserving botnet flow modifier through a series of interactions with the detector in a black-box scenario. This enables the attacker to evade detection without modifying the botnet source code or affecting the botnet utility. Experiments on 14 botnet families show that our method has considerable evasion performance and time performance.

CCS CONCEPTS
• Security and privacy → Intrusion/anomaly detection and malware mitigation; • Computing methodologies → Artificial intelligence.

KEYWORDS
Bypass Botnet Detector, Adversarial Machine Learning, Reinforcement Learning

ACM Reference Format:
Junnan Wang, Qixu Liu, Di Wu, Ying Dong, and Xiang Cui. 2021. Crafting Adversarial Example to Bypass Flow-&ML-based Botnet Detector via RL. In 24th International Symposium on Research in Attacks, Intrusions and Defenses (RAID ’21), October 6–8, 2021, San Sebastian, Spain. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3471621.3471841

RAID ’21, October 6–8, 2021, San Sebastian, Spain
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-9058-3/21/10.

1 INTRODUCTION
Machine learning (ML) has greatly promoted the development of botnet detection technology. Unlike signature-based detection methods, ML-based anomaly detection can efficiently and accurately identify malware-generated traffic once certain behavior patterns are recognized. In particular, the significant spatial-temporal similarity of botnets makes them easy for ML-based models to detect. In addition, ML-based methods can identify unknown botnet families and are better suited to large-scale network traffic data.

Unsurprisingly, attackers have started looking for methods and techniques that would allow them to overcome the progress in detection systems and bypass behavior-based detection. For example, attackers can frequently change the IP address corresponding to the domain name of the command and control (C&C) server to evade IP blacklist detection; use application layer protocols (HTTP, DNS) or external web services (Twitter, Facebook) for C&C communication to bypass the firewall; and encrypt C&C communications to avoid payload-based botnet detectors. However, these evasion methods cannot bypass the widely used flow- and ML-based botnet detectors, which identify malicious traffic by relying on the statistical characteristics of flows. Moreover, these methods require extensive or complicated direct source modifications, thereby placing high demands on attackers [38].

One direct way to bypass an ML-based botnet detector is to exploit the vulnerability of ML itself to attack the detection model. Szegedy et al. [40] first discovered that well-performing ML models are susceptible to
adversarial attacks in the form of tiny perturbations added to inputs that fool a model into producing incorrect outputs. Many recent works have also proposed ingenious methods (L-BFGS [40], FGSM [18], DeepFool [28], etc.) to craft adversarial examples, but these methods are difficult to apply directly to crafting botnet adversarial samples.

In the context of botnet detection, the constraint on the perturbation added to an adversarial example is no longer that it be imperceptible to humans, but that it must not affect the original malicious intent of the botnet traffic. The complexity and specific format of botnet traffic data make it necessary to develop new methods for constructing botnet adversarial samples to bypass ML-based botnet detectors.

In this paper, a reinforcement learning (RL)-based method to bypass an ML-based flow-level botnet detector is presented. We attempt to mislead the ML-based detector by adding perturbations to the botnet flow. Specifically, we model the modification of a botnet flow as a sequential decision problem and let the RL agent learn the optimal modification strategy during a series of interactions with the botnet detector. To ensure the preservation of functionality, we design an action space containing 14 incremental operations, each of which only adds a carefully crafted packet to the original flow in an attempt to change some of the flow-level characteristics. The detector deems these characteristics to be discriminating, but they may not be causal indicators of benign traffic. Moreover, adding packets is an incremental operation at the transport layer, while malicious functions are generally encapsulated in the application layer. Therefore, it can be guaranteed that the original malicious intent will not be destroyed.

The advantages of our attack method are that (1) it is a black box attack, which is more in line with real attack scenarios than other methods; (2) it is general and can be used regardless of whether the detector's loss function is differentiable; and (3) it is plug and play: the RL agent can exist as a proxy, so the attack has a low evasion cost and is suitable for any botnet family.

Through extensive experiments, we show that current ML-based botnet detectors are vulnerable. Attackers can avoid detection by adding only a few packets to the botnet flow, at a relatively small cost and without any prior knowledge.

The contributions of this paper can be summarized as follows:

• We propose a general black box attack framework for ML-based botnet detectors. We assume that the attacker can access the detector and obtain the binary discrimination result (malware/benign) for an input but does not have any prior knowledge of the algorithm and feature space used by the detector. To the best of our knowledge, this work is the first study of black-box adversarial attacks in the botnet evasion field.
• We design a series of universal action spaces and encapsulate them in the RL framework. On the one hand, all actions are incremental operations, thereby ensuring that the transmission of malicious information and functions is not affected. On the other hand, the actions are universal and well encapsulated, enabling attackers to escape detection without complex modifications to the botnet malware.
• We demonstrate how to train and deploy our system to avoid ML detection on a carefully constructed botnet flow dataset and comprehensively evaluate the evasion performance, time cost, and universality of the framework.

This paper is divided into 6 sections as follows: Section 2 introduces related works. Section 3 describes the system framework. Section 4 shows how we set up our experiment. Section 5 discusses the experimental results and conclusions. At the end of the paper, we summarize the work and discuss future directions.

2 RELATED WORK

2.1 Adversarial machine learning
In the literature, there are many works focusing on adversarial attacks. Researchers have proposed various advanced attack methods based on how much detector information is available. Such knowledge of the detector can include [9]:
• the training set or part of it;
• the feature space of the ML algorithm;
• the type of ML algorithm (SVM, LR, DecisionTree, CNN, etc.);
• the trained classifier (details of the model, such as its architecture and hyperparameters);
• the feedback from the detector (score-based feedback consisting of probability- or binary-based feedback for the final label).

In this paper, we group recent adversarial attack methods into three attack scenarios based on different levels of adversary knowledge about the attacked system: perfect knowledge (PK), limited knowledge (LK) and zero knowledge (ZK).

Perfect knowledge. White box attack. In this scenario, the attacker has complete information about the detector. The attacker's goal is to minimize the sample misclassification function.
Szegedy et al. [40] leveraged L-BFGS to solve the optimization problem of finding the minimum amount of required perturbation. The C&W attack [13] designed a new loss function with a small value on the adversarial sample but a larger value on the clean sample, so the adversarial example can be found by minimizing the loss function. FGSM [18] assumes that the loss function in the neighborhood of the clean sample is linear.
DeepFool [28] was the first method to use the L2 norm to limit the disturbance size to obtain the minimum perturbation. Universal perturbation [27] extends DeepFool to craft image-agnostic, universal perturbations.

Limited knowledge. Gray box attack. The attacker only has limited knowledge about the detector and cannot use gradient-based approaches. However, by knowing the type or feature space of the model, attackers can mislead the detector by finding a set of features that are not discriminating enough via the structural features of the model, the feature space or the feedback score sequence.
Apruzzese et al. [6] used a mixed integer linear program solver to construct an optimal adversarial example for evading flow- and random forest-based botnet detectors.
Papernot et al. [31] trained a substitute model after performing Jacobian-based dataset augmentation to accurately simulate the decision boundary of the detector.
Zero knowledge. Black box attack. Assume the attacker has zero knowledge of the model except for the binary decision of the detector. The attacker can only evade the detector through trial and error or by crafting adversarial examples against a joint classifier. The basic idea is that if an adversarial example can bypass every model in the collection, then it may also bypass any single detector.
MalGAN [21] introduced a GAN to generate PE malware that bypasses a black-box detector. The GAN discriminator is a substitute detector model built by the attacker, while the generator is used to generate adversarial malware samples.
Boundary attack [11] starts from a large adversarial perturbation and then seeks to reduce it while remaining adversarial by performing a random walk along the boundary between the adversarial and the non-adversarial regions.

From Table 1, we can see that most of the existing white box attacks can only deal with differentiable loss functions, while gray box attacks can only bypass specific types of detectors. Few of the existing black box attack methods need no prior knowledge at all; instead, they rely to some extent on external information or model-related information. This challenge motivates us to build a general black box adversarial attack method that can bypass any ML-based botnet detector.

2.2 Botnet Evasion
While botnet detection methods are constantly improving, attackers are also exploring techniques to avoid ML-based botnet detection. Traditional botnet evasion techniques make C&C traffic difficult to detect by encrypting traffic, hiding C&C information in redundant fields of TCP/IP protocols, or using online social networks (OSNs) to construct covert channels [30] [44] [5] [1] [2]. However, these methods require complicated modifications to the botnet architecture or source code, which places high capability requirements on botnet controllers and also has a certain impact on the availability and market value of the botnet.

With the widespread application of machine learning in the botnet detection field [23] [10] [16] [20] [34] [42] [41], security researchers have gradually turned to adversarial machine learning (AML) methods to bypass botnet traffic detectors. That is, they construct botnet traffic adversarial samples by adding perturbations to botnet traffic samples, thereby bypassing the ML-based botnet traffic detector. These methods can be realized by applying traffic proxies instead of modifying the botnet source code, which seems to be a more attractive and lower-cost solution.

Furthermore, AML-based evasion methods can be divided into two categories according to their output.

Feature space attack. This refers to methods that can only generate adversarial traffic feature vectors. However, considering that the process of mapping traffic samples to feature vectors is irreversible, such an attack cannot cause actual security threats and can only be used to demonstrate the vulnerability of the ML-based detector [25]. Evade-RF [6] attempts to make botnet traffic different from the malicious flows contained in the detector training dataset by slightly altering flow-level statistical characteristics, such as flow duration, as well as the numbers of exchanged bytes and exchanged packets. The authors trained a random forest as a botnet detector on the CTU dataset and evaluated the performance of the generated adversarial samples. The work in [29] discussed the learning and evasion consequences of the gap between the generated and crafted adversarial samples and achieved a white box adversarial attack against an encrypted C&C malware traffic detector. However, these works made the unrealistic assumption that the attacker has perfect knowledge of the classifier, which is inconsistent with real attack scenarios.

End-to-end attack. This refers to an attack method that can generate real traffic as output. This attack method is more suitable for real network attack scenarios, because the output can be directly used by the attacker. However, some stricter constraints are introduced when applying AML to generate malicious traffic samples:
(1) The malicious functionality must remain intact in the perturbed sample. (2) The file structure and network protocol structure of the pcap files must not be destroyed. As a result, many AML methods from the image field cannot be applied directly, because they perturb image pixels indiscriminately.

The work in [35] proposed using a generative adversarial network (GAN) to mimic the behavior of Facebook chat network traffic in order to bypass a self-adapting Stratospheric IPS. The article attempted to adapt the behavior of the malware's communication channel to mimic Facebook chat network traffic according to the characteristics (total byte size, duration, and time delta between the current flow and the next flow) received from a GAN. However, the article did not clearly explain how to modify the source code to achieve the changes indicated by the GAN, and the IPS mentioned in the article is not a flow detector but rather a 3-tuple (victim IP, server IP, server port) detector.

There are also some works [36] that use the generative ability of GANs to produce network traffic that looks as real as possible. The purpose of these works is to improve the quality of the dataset for training malicious traffic detectors and to deal with the data imbalance problem, yet it cannot be guaranteed that the generated network traffic could actually occur on the data link. Furthermore, these methods do not consider whether the crafted traffic can carry out the malicious function of the botnet. This leads to a completely different scenario from that of our work, in which we aim to bypass botnet traffic detection by constructing botnet adversarial samples.

the threat model of our attack method, then introduces the overall framework of the proposed system, and finally introduces the important system components in detail.

3.1 Threat model
We describe our threat model according to the method proposed in [9].

Adversary's goal. The attacker's goal is to mislead the detector by generating adversarial samples, thereby making the botnet harder to detect. From the perspective of the CIA triad (confidentiality, integrity and availability), attackers try to reduce the availability of network intrusion detection systems by camouflaging botnet flows.

Adversary's knowledge. The attacker understands that the target network may be protected by a flow-level network intrusion detection system based on machine learning. However, the attacker does not need any prior knowledge about the detector, such as the algorithm, parameters, features or training data.

Adversary's capability. In the evasion scenario based on adversarial attacks, the attacker has the ability to modify the test data but not the detector's training data. Even so, because the attacker has full control of the botmaster and partial control of the bots, we believe that the attacker can also update bots to change their communication behaviors by setting up a proxy. At the same time, we assume that the attacker can continuously access the detector to obtain the binary prediction result from the detector.
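The black-box access assumed in this threat model can be illustrated with a minimal sketch. The class name `BlackBoxDetector`, the packet tuple layout, and the toy scoring function are all hypothetical and are not part of the original system; the only point is that the attacker sees a binary label and nothing else.

```python
from typing import Callable, List, Tuple

# Assumed packet layout: (timestamp in seconds, direction, size in bytes),
# where direction is +1 for forward and -1 for backward.
Packet = Tuple[float, int, int]
Flow = List[Packet]


class BlackBoxDetector:
    """Wraps any scoring function and exposes only a binary verdict."""

    def __init__(self, score_fn: Callable[[Flow], float], threshold: float = 0.5):
        self._score_fn = score_fn    # hidden from the attacker
        self._threshold = threshold  # hidden from the attacker
        self.query_count = 0         # number of queries issued so far

    def is_malicious(self, flow: Flow) -> bool:
        """Return True (botnet) or False (benign); no score is leaked."""
        self.query_count += 1
        return self._score_fn(flow) >= self._threshold


def toy_score(flow: Flow) -> float:
    """Hypothetical stand-in model: flows with few packets score higher."""
    return 1.0 / (1.0 + len(flow) / 4.0)


detector = BlackBoxDetector(toy_score)
short_flow = [(0.0, 1, 60), (0.1, -1, 60)]
long_flow = short_flow + [(0.2 + 0.1 * i, 1, 60) for i in range(4)]
verdict_short = detector.is_malicious(short_flow)
verdict_long = detector.is_malicious(long_flow)
```

Keeping the score and threshold private mirrors the zero-knowledge setting: any attack built on this interface can rely only on repeated binary queries.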
Figure 3: Boxplot of the 18 normalized flow features for botnet and normal flow
cannot delete flow packets at will. The only choice is to apply modifications to areas that do not affect the implementation of malicious functions or to add new data packets. Through the analysis of botnet flows, we find that the malicious content of a botnet is encapsulated in the application layer, so incremental operations at the transport layer will not affect the original malicious functions.

To determine which features should be disturbed, we refer to works on ML-based botnet detection, such as [23] [10] [16] [20]. We find that researchers tend to extract discriminative features based on the working mechanism of the botnet, and these features overlap heavily across different works.

Taking the difficulty of action design into account, we choose 18 features from the set of features commonly used in botnet detection, including duration, packets per flow (ppf), packets per second (pps), bytes per flow (bpf), bytes per second (bps), inter-arrival time (iat), down/up ratio and so on (fw: forward, bw: backward). These characteristics can be easily affected by carefully designing the time, size and direction of the added packet. Figure 3 shows the boxplot of the 18 normalized feature values in our training dataset for botnet and benign flows.

We can observe that, due to the unique working mechanism of botnets, the value ranges of some features differ between botnet and normal flows. For example, a botnet sends a large number of short heartbeat packets to confirm whether the connection is maintained, so it has a smaller ppf; downloading malicious applications and transmitting private information on the bot side results in a larger bpp; and bots need to send a large amount of secret information in response to short commands from the botmaster, so botnet traffic tends to have a small down/up ratio.

Based on this discovery, our action space includes 14 actions, which can affect the above-mentioned transport-layer statistical characteristics by simply modifying a data packet timestamp or adding carefully constructed packets. When constructing new packets, we mainly consider three attributes: timestamp, direction, and packet size. These 14 actions are divided into 5 categories according to the objects they intend to affect, as summarized in Figure 4:

• Change the duration
1) Delay the final TCP FIN packet by a random 1-3 s
• Change the time interval
2) Add a forward packet with an interval of 20 s at the end
3) Add a backward packet with an interval of 20 s at the end
• Change the image characteristics (for the DNN model): pay attention to the location and content of the packet
4) Add an empty packet at a random location (0 < loc < 8)
5) Add a random packet at a random location (0 < loc < 8)
6) Add an all-zero packet at a random location (0 < loc < 8)
• Change the statistical characteristics (for the non-DNN model): pay attention to the direction and size of the packet
7) Add a forward large-size all-zero packet
8) Add a backward large-size all-zero packet
9) Add a forward avg-size packet with no content at the head
10) Add a backward avg-size packet with no content at the head
11) Add a forward empty packet
12) Add a backward empty packet
• Change the packet length
13) Append a random-length run of 0s to the end of the packet with a probability of 0.2
14) Select two packets without payloads and pad them with the character '0' up to the avg-size

Figure 4: Action space
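The incremental nature of these actions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the packet tuple layout, the helper names `flow_features` and `add_packet_at_end`, and the choice of an empty payload for the added packet are all assumptions made for the example. It shows how appending one packet (in the spirit of action 2) shifts several flow-level statistics while leaving every original packet untouched.

```python
from typing import Dict, List, Tuple

# Assumed packet layout: (timestamp in seconds, direction, payload bytes),
# with direction +1 for forward and -1 for backward.
Packet = Tuple[float, int, int]
Flow = List[Packet]


def flow_features(flow: Flow) -> Dict[str, float]:
    """Compute a few flow-level statistics from a time-ordered flow."""
    times = [p[0] for p in flow]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "duration": max(times) - min(times),
        "ppf": float(len(flow)),              # packets per flow
        "bpf": float(sum(p[2] for p in flow)),  # bytes per flow
        "max_iat": max(gaps) if gaps else 0.0,  # largest inter-arrival time
    }


def add_packet_at_end(flow: Flow, direction: int, size: int, interval: float) -> Flow:
    """Incremental action: append one packet after the last one.

    A new list is returned, so the original packets are never altered.
    """
    last_time = flow[-1][0]
    return flow + [(last_time + interval, direction, size)]


original = [(0.0, 1, 120), (0.5, -1, 300), (1.0, 1, 80)]
# Forward packet, 20 s after the last one; the empty payload is an assumption.
modified = add_packet_at_end(original, direction=1, size=0, interval=20.0)
before = flow_features(original)
after = flow_features(modified)
```

Because the function only appends, the application-layer payloads of the original packets are preserved by construction, which is the property the action space relies on.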
The purpose of the "Change the duration" category is to affect the duration; "Change the time interval" is to change the IAT; "Change the packet length" is used to disturb bpp; "Change the image characteristics" contains actions that change the input of a deep neural network-based botnet detector, so they focus on the location and payload of the newly inserted data packet; and the actions in "Change the statistical characteristics" comprehensively disturb the above statistical characteristics, so they mainly consider the direction and size of the data packet.

3.5 State space
Considering that the binary feedback of the detector contains too little information for use by the agent, we need a state generator to deliver the state of the botnet flow to the agent. To describe the state of the current botnet flow samples concisely, we use a feature encoder with a deep structure, a stacked autoencoder (SAE), to automatically extract botnet flow features and feed them back to the agent as the state.

An SAE is a neural network consisting of several layers of sparse autoencoders, where the output of each hidden layer is connected to the input of the successive hidden layer. The SAE was first proposed by Bengio et al. [8]. To avoid the potential vanishing gradient problem of the deep network, SAE training is performed using unsupervised pretraining and supervised fine-tuning. To some extent, the pretrained networks facilitate iterative convergence in the supervised phase because they fit the structure of the training data, giving the entire network a suitable initial state. Because each layer is based on the features extracted by the previous layer, an SAE is able to extract highly abstract and complex features. SAEs have achieved noteworthy performance on many feature preprocessing and dimensionality reduction tasks.

Specifically, we take the first 1024 bytes of each botnet flow file (because the first few packets, up to the first 20 packets, have been shown to be sufficient for accurate classification, even for encrypted traffic [42]) as an input for the SAE model. After several epochs of training, the SAE model can automatically learn a 256-dimensional state vector of the botnet flow.

When determining the state dimension, we test 128 and 256 dimensions. Under the trade-off between the training time cost and evasion effects, we finally set the number of feature dimensions to 256.

Figure 5: State Generator Details

4 EXPERIMENTAL SETUP

4.1 Implementation
By referring to the implementation of Tor, we deploy our system as an adversarial proxy in the network environment, as shown in Figure 6. The attacker can easily deploy an adversarial proxy on the botmaster side, while on the bot side, the attacker can achieve this by updating the bot through the original C&C channel. Therefore, our method can be implemented without complex modifications to the original malware.

Figure 6: Implementation of our system

Under this deployment scenario, all communication traffic between the attacker and the bot reaches the proxy first. Therefore, the attacker can monitor the botnet communication traffic, and the adversarial proxy equipped with the trained RL agent can perform incremental actions against the botnet flow until it successfully bypasses the detector. In this way, what the detector obtains is botnet communication traffic that has been processed by the adversarial agent and is very likely to bypass it.

In such an attack and defense architecture, the attacker can achieve adversarial attack-based botnet detector evasion in a completely black box scenario.

With an aim to engage the community, we implement our RL framework with OpenAI gym [12]. Specifically, we implement SARSA and DQN agents using keras-rl [33].

4.2 Dataset
Assessing the performance of any detection approach requires experimentation with data that are heterogeneous enough to simulate real traffic at an acceptable level. We choose two public datasets: CTU [17], captured by the Malware Capture Facility Project, which is a research project in charge of continuously monitoring the threat landscape for new emerging threats, retrieving malicious samples and running them in facilities to capture the traffic; and ISOT [37], created by merging different available datasets. It contains both malicious (traces of the Storm and Zeus botnets) and non-malicious traffic (gaming packets, HTTP traffic and P2P applications). The dataset contains trace data for a variety of network activities spanning from web and email to backup and streaming media. This variety of traffic makes it similar to real-world network traffic.

We select 10 botnet families from these public datasets to form a new dataset with the following considerations:
• Diversity: the dataset covers the most mainstream botnet communication channel types (IRC, HTTP, P2P), and the traffic characteristics are significantly different.
• Typicality: the dataset includes traffic from typical botnet families, each of which either caused major attacks, controlled a large number of hosts, or adopted advanced hiding methods.
• Large time span: for a single family, each traffic file takes a long time to capture. Overall, the dataset covers botnet traffic from 2011 to 2018. This makes the dataset more versatile and novel than other existing datasets.

We summarize the botnet families and their capture times, brief introductions and numbers of session samples used for the experiment in Table 2.

After obtaining the dataset, we perform the data preprocessing process shown in Figure 7.

Step 1: Integration & pruning. We combine the collected traffic belonging to the same botnet family. If the pcap file is too large, it is cropped. The purpose of this step is to balance the sample size of each family.

Step 2: Splitting pcap into sessions. This is done to obtain more complete communication information. A session refers to all packets of a bidirectional flow, that is, the source IP and destination IP are interchangeable in the 5-tuple (SIP, SPort, DIP, DPort, Protocol). Specifically, we use SplitCap [3] to split each pcap file into sessions.

Step 3: Anonymization & cleaning. Our traffic files come from different network environments. To eliminate the effect of IP and MAC addresses on the detector, this unique information in the traffic data needs to be randomized. Specifically, we replace each address with a randomly generated new address.

4.3 Detector
In our system, the function of the detector is to classify the modified botnet flow and feed the binary result back to the agent. For comparison, we choose two state-of-the-art detection models as our botnet detectors in the experiments: a composite DL detection model combining CNN with LSTM, and a non-differentiable ML detection model based on XGBoost.

BotCatcher detection model. The communication mode between the C&C server and the bot is significantly different from the communication mode between normal users, so the abnormal traffic generated by the botnet can be detected through traffic analysis. BotCatcher [43] uses a deep learning algorithm to automatically extract temporal and spatial features from network traffic and trains a softmax classifier accordingly. We show the system framework in Figure 8.

CNNs can extract local features and recognize spatial similarities, while RNNs can process sequence data and recognize temporal similarities. Accordingly, in the feature learning module, BotCatcher uses a CNN to extract spatial features by converting the botnet session data into a gray image and leverages an LSTM to learn the temporal features of packet sequences. After processing these two kinds of features through a multilayer neural network, BotCatcher feeds them into a softmax layer to identify abnormal traffic patterns. The authors used the CTU dataset to verify the effectiveness of the proposed model, while we obtain a 99.6% detection rate on our dataset.
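The conversion of session data into a gray image, which the CNN branch takes as input, can be sketched as follows. This is only an illustration under stated assumptions: the function name `session_to_gray_image` is hypothetical, and the 32 x 32 size (1024 bytes, borrowed from the input length used by the state generator) is an assumption, since the image size used by BotCatcher is not given here. Each byte becomes one pixel intensity in the range 0-255.

```python
from typing import List


def session_to_gray_image(data: bytes, side: int = 32) -> List[List[int]]:
    """Map the leading bytes of a session to a side x side gray image.

    Longer sessions are truncated; shorter ones are zero-padded.
    """
    n = side * side
    clipped = data[:n].ljust(n, b"\x00")  # truncate, then pad with zero bytes
    # Iterating over a bytes slice yields integers in 0..255 (pixel values).
    return [list(clipped[r * side:(r + 1) * side]) for r in range(side)]


short_image = session_to_gray_image(b"\x01\xff")
long_image = session_to_gray_image(bytes([7]) * 5000)
```

Under this layout, an action that inserts a packet near the start of the session shifts the bytes that follow it, which is why the image-oriented actions focus on packet location and content.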
Table 3: XGBoost detector’s feature set.

Feature                Importance    Feature                   Importance
max_fiat               0.188713      max_forward_pkt_len       0.003005
std_fiat               0.112026      std_backward_pkt_len      0.002809
duration               0.110066      total_pkt_len             0.002221
forward_header_len     0.103142      std_pkt_len               0.001829
total_fiat             0.074335      mean_forward_pkt_len      0.001568
mean_fiat              0.071723      mean_pkt_len              0.001502
flowPktsPerSecond      0.066497      min_pkt_len               0.001372
fPktsPerSecond         0.050493      NPEx                      0.000719
bPktsPerSecond         0.03671       max_backward_pkt_len      0.000523
IOPR                   0.028872      mean_backward_pkt_len     0.000457
backward_header_len    0.02652       total_forward_pkt_len     0.000392
SameLenPktRatio        0.021164      flowBytesPerSecond        0.000327
max_biat               0.01633       min_forward_pkt_len       0.000261
total_biat             0.015546      std_forward_pkt_len       0.000261
total_packets          0.013848      total_backward_pkt_len    0.000196
mean_biat              0.012542      fBytesPerSecond           0.000131
std_biat               0.011431      min_backward_pkt_len      6.53E-05
max_pkt_len            0.010255      min_fiat                  0
total_fpackets         0.007773      min_biat                  0
total_bpackets         0.004377      bBytesPerSecond           0

Dhaliwal et al. [15] chose XGBoost to build an IDS. XGBoost performs better than many other models because it can effectively deal with the problem of data surplus and can be processed in parallel. Therefore, XGBoost is very suitable for dealing with real-world networks.
In their work, the authors performed a cross-validation experiment using the NSL-KDD dataset (a csv file containing the 41-dimensional characteristics of malicious traffic), achieving a result (98.7%) better than those of other machine learning algorithms. The model structure and hyperparameter settings used in this paper are completely consistent with those they used (they tuned the hyperparameter settings), and on our dataset we obtain a detection result (98.9%) that is as good as that of the original work. We list the extracted features and their importance in Table 3.

5 RESULTS
To evaluate our system, we divide the dataset into four disjoint subsets: a detector training set, an agent training set, a detector testing set, and an agent testing set, at a ratio of 4:4:1:1. The subsets are kept disjoint to test the generalization ability of our attack model and to better simulate the real attack scenario, where the attacker may not be able to obtain the training data of the target detector.
To compare the performances of different RL algorithms and detectors, we implement the following four system instances: SARSA agent–BotCatcher detector, SARSA agent–XGBoost detector, DQN agent–BotCatcher detector, and DQN agent–XGBoost detector. Each agent is trained for action_num ∗ sample_num rounds.

5.1 Evasion performance
The evasion rate is the probability that a botnet flow adversarial example successfully evades the detector. Table 4 compares the testing evasion rates of the four system instances after training on 10 botnet families. Among them, XGBoost-random and BotCatcher-random represent the situations in which we use a random strategy to select actions from the action space for modifying the botnet flow. On the one hand, they are the baselines for measuring whether the agent is effective. In other words, if the agent’s performance is not as good as the random strategy, the agent has not learned anything useful. From Figure 1, we can see that the results of the RL algorithms are better than those of the random strategy in every case. On the other hand, this also shows that our action space is indeed effective, because even random selection can bypass the detector with a certain probability.

Contrasts between families. From Table 4, we can see that the evasion rates vary among botnet families; Storm even has an evasion rate of 0 with XGBoost-SARSA. With an unchanging action space, the influence of the existing actions varies across family samples. Through a statistical analysis of each family's samples, we find that lower evasion rates may be due to sessions containing very large numbers of packets (such as Storm, which is a P2P-based botnet that has a large session size), so the effect of adding packets or changing the timestamps on the feature values is relatively small. Alternatively, the characteristics of the botnet samples may be very different from those of benign samples, causing the agent to fail to convert a sample into a benign one within the limit of action_num steps. In practical applications, the attacker can trade off between the evasion rate and the size of the perturbation to the traffic sample, and increase action_num appropriately.

Contrasts between agents. By comparing the evasion rates of the system instances with the same detector but different agents in Figure 1, we find that in most cases, the SARSA agent performs better than the DQN agent. We think that this is caused by the intrinsic difference between the two RL algorithms. SARSA is an on-policy algorithm that is more cautious than Q-learning. Q-learning always aims to maximize the Q function, regardless of the other, non-maximal Q values. SARSA is a conservative algorithm that cares about every step of the decision and is sensitive to errors and death. Therefore, when we aim to generate adversarial examples with fewer steps and tiny perturbations, SARSA agents may be more suitable than DQN agents.
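The on-policy versus off-policy distinction drawn above can be made concrete with the one-step update targets of the two algorithms. The following is a minimal tabular sketch, not the authors' implementation: the Q-table layout, the function names, the action names, and the discount factor `gamma` are all assumptions chosen for illustration, and the paper's DQN agent approximates the Q-learning target with a neural network rather than a table.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.5):
    """On-policy target: bootstrap from the action the agent will actually take next."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.5):
    """Off-policy target: bootstrap from the highest-valued next action."""
    return reward + gamma * max(q[next_state].values())


# Hypothetical Q-table: one state with two candidate flow modifications.
q_table = {"s1": {"add_packet": 1.0, "delay_packet": 0.25}}

# Suppose exploration makes the agent pick the lower-valued action next.
sarsa = sarsa_target(q_table, reward=0.0, next_state="s1", next_action="delay_packet")
q_learn = q_learning_target(q_table, reward=0.0, next_state="s1")
print(sarsa, q_learn)  # prints: 0.125 0.5
```

Because the SARSA target follows the action that is actually taken, including exploratory missteps, its value estimates are lowered for states in which a wrong move is costly. This is the sense in which SARSA is described as the more conservative of the two.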
Table 4: Testing evasion rates of the system instances and the random baselines for each botnet family.

                    Menti  Rbot  Murio  Virut  Miuref  Neris  HTBot  Dridex  Trickbot  Storm
XGBoost-SARSA       87%    83%   76%    86%    66%     66%    61%    41%     31%       0%
XGBoost-DQN         85%    77%   75%    85%    60%     56%    49%    41%     30%       1%
XGBoost-Random      75%    72%   68%    71%    54%     42%    45%    34%     24%       0%
BotCatcher-SARSA    21%    26%   24%    40%    42%     34%    54%    38%     59%       73%
BotCatcher-DQN      21%    22%   22%    41%    38%     28%    52%    50%     51%       64%
BotCatcher-Random   17%    19%   20%    37%    37%     27%    48%    37%     42%       59%
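The comparison against the random baseline that Table 4 supports can be reproduced in a few lines of Python. The sketch below is illustrative only: the percentages are copied from Table 4, while the helper names `evasion_rate` and `gain_over_random` are assumptions introduced here, and `evasion_rate` assumes that every test sample yields a single evaded or not-evaded outcome.

```python
FAMILIES = ["Menti", "Rbot", "Murio", "Virut", "Miuref",
            "Neris", "HTBot", "Dridex", "Trickbot", "Storm"]

# Evasion rates in percent, copied from Table 4.
EVASION = {
    "XGBoost-SARSA":     [87, 83, 76, 86, 66, 66, 61, 41, 31, 0],
    "XGBoost-DQN":       [85, 77, 75, 85, 60, 56, 49, 41, 30, 1],
    "XGBoost-Random":    [75, 72, 68, 71, 54, 42, 45, 34, 24, 0],
    "BotCatcher-SARSA":  [21, 26, 24, 40, 42, 34, 54, 38, 59, 73],
    "BotCatcher-DQN":    [21, 22, 22, 41, 38, 28, 52, 50, 51, 64],
    "BotCatcher-Random": [17, 19, 20, 37, 37, 27, 48, 37, 42, 59],
}


def evasion_rate(evaded_flags):
    """Fraction of adversarial flows that the detector labels as benign."""
    flags = list(evaded_flags)
    if not flags:
        raise ValueError("need at least one test sample")
    return sum(1 for flag in flags if flag) / len(flags)


def gain_over_random(instance, baseline):
    """Per-family difference, in percentage points, between an agent and a baseline."""
    pairs = zip(FAMILIES, EVASION[instance], EVASION[baseline])
    return {family: agent - rand for family, agent, rand in pairs}


gain = gain_over_random("XGBoost-SARSA", "XGBoost-Random")
ties = [family for family, diff in gain.items() if diff == 0]
print(gain["Menti"], ties)  # prints: 12 ['Storm']
```

On these numbers, the only family where an RL agent merely ties its random baseline is Storm under XGBoost-SARSA, where both rates are 0%.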
Table 5: Average number of queries that are necessary for the agent to craft effective adversarial samples, used to evaluate the system’s time performance.

                    Menti  Rbot   Murio  Virut  Miuref  Neris  HTBot  Dridex  Trickbot  Storm
XGBoost-SARSA       1.35   3.21   1.14   2.27   1.42    2.93   2.56   1.81    2.33      -
XGBoost-DQN         2.14   5.26   1.87   3.76   2.34    4.45   4.11   3.01    4         1.61
XGBoost-Random      6.42   10.26  5.14   6.05   6.17    8.05   7.12   8.56    7.12      -
BotCatcher-SARSA    6.54   4.24   7.11   3.05   4.73    6.04   3.06   2.19    2.57      4.93
BotCatcher-DQN      8.3    6.36   7.48   3.33   5.82    4.44   3.33   2.38    3.54      4.02
BotCatcher-Random   11.43  11.91  11.14  9.03   9.18    10.11  8.05   7.13    9.06      10.16
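The metric in Table 5 can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes that the average is taken only over samples that successfully evade the detector, which would explain why the entries for XGBoost-SARSA and XGBoost-Random on Storm are "-" where Table 4 reports a 0% evasion rate. The function name `average_queries` and the log format are hypothetical.

```python
def average_queries(episodes):
    """Mean number of detector queries over successful evasions only.

    `episodes` is an iterable of (evaded, queries) pairs, one per test sample.
    Returns None when no sample evaded the detector, so no average exists.
    """
    successful = [queries for evaded, queries in episodes if evaded]
    if not successful:
        return None
    return sum(successful) / len(successful)


# Hypothetical per-sample logs, not data from the paper.
log = [(True, 2), (False, 5), (True, 4)]
print(average_queries(log))           # prints: 3.0
print(average_queries([(False, 5)]))  # prints: None
```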
Contrasts between detectors. As shown in Table 4, the evasion rate with XGBoost as the detector is higher than that with BotCatcher for most of the botnet families, and we believe this is because the actions have a greater impact on statistical features than on image features. By analyzing the statistical features of the XGBoost model shown in Table 3 and the action space described in Section 3, we can see that (i) the actions in the action space all directly or indirectly affect the statistical features, whether they are designed for statistical features or image features; and (ii) many actions in the action space even change the features on which XGBoost depends most (action 1-duration, action 2-forward iat, action add-pps, action 9&10-header_len, etc.). Therefore, we conclude that the targeted modification and the high consistency between the action space and the detector’s feature space are largely responsible for the vulnerability of the detector.

5.2 Time performances
A major point in black-box adversarial attacks is to issue the smallest possible number of queries to the target model: if the proposed approach requires a very high number of queries, then its feasibility in a real-world context would be affected. We use the average number of queries that are necessary for the agent to craft effective adversarial samples to evaluate the system’s time performance. The results are shown in Table 5.

From the perspective of the botnet families, the time performance does not change significantly between families, meaning that as long as our agent is well trained, it will perform better than the baseline regardless of what botnet family it deals with. This result further shows that our system has high availability. If our system had a large gap in its time performance when facing different botnet families, the attacker would have to carefully consider whether our system can adapt to their botnet.

From the perspective of the RL algorithms, the agent equipped with the DQN algorithm often requires more steps to bypass the detector than the agent equipped with the SARSA algorithm. This means that the action selected by DQN according to the greedy policy may not be the optimal action in the current state, so the agent needs to perform additional actions to accumulate enough perturbation to bypass the detector.

From the perspective of the detectors, the number of steps required for the agent to bypass BotCatcher is higher than that for XGBoost, meaning that the agent needs more iterations to mislead BotCatcher than to mislead XGBoost. In other words, XGBoost not only yields a higher evasion rate, but its evasion samples also require relatively fewer steps than those of BotCatcher. Therefore, we can conclude that XGBoost is more vulnerable than BotCatcher according to the experiments of this work.

5.3 Dominant actions
Dominant actions refer to the actions that the agent takes most frequently when successfully evading the detector. Each botnet family has its own unique features, so the flows of different families also differ in terms of certain characteristics. For example, the ppf of Trickbot is 8.66, while that of HTBot is 28.34. Therefore, we guess that the influences produced by different actions should be different for each family, so each botnet family’s dominant actions should be different.
Table 6: Frequency of each action for each botnet family under the BotCatcher-SARSA instance.

Action No.  Menti  Rbot  Murio  Virut  Miuref  Neris  HTBot  Dridex  Trickbot  Geodo  Storm  Waledac
1           0.51   0.02  0.49   0.00   0.03    0.29   0.06   0.00    0.00      0.06   0.01   0.06
2           0.01   0.00  0.01   0.01   0.00    0.03   0.01   0.06    0.42      0.05   0.01   0.06
3           0.01   0.02  0.00   0.32   0.03    0.01   0.02   0.07    0.00      0.01   0.06   0.28
4           0.00   0.14  0.02   0.01   0.09    0.01   0.00   0.06    0.00      0.07   0.01   0.04
5           0.00   0.04  0.02   0.19   0.02    0.03   0.00   0.00    0.00      0.08   0.01   0.02
6           0.09   0.08  0.04   0.01   0.02    0.01   0.31   0.00    0.00      0.03   0.82   0.02
7           0.01   0.01  0.00   0.00   0.26    0.00   0.00   0.21    0.02      0.01   0.01   0.02
8           0.02   0.01  0.01   0.12   0.00    0.13   0.34   0.19    0.32      0.27   0.03   0.01
9           0.00   0.00  0.00   0.00   0.01    0.02   0.00   0.12    0.02      0.06   0.01   0.01
10          0.00   0.55  0.10   0.00   0.35    0.01   0.00   0.21    0.07      0.02   0.01   0.01
11          0.00   0.05  0.20   0.00   0.01    0.04   0.06   0.00    0.15      0.15   0.01   0.24
12          0.20   0.01  0.01   0.01   0.09    0.01   0.00   0.00    0.00      0.01   0.01   0.08
13          0.07   0.08  0.07   0.03   0.03    0.12   0.19   0.08    0.02      0.16   0.01   0.03
14          0.07   0.00  0.02   0.29   0.04    0.29   0.00   0.00    0.00      0.04   0.01   0.12
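The dominant action of each family can be read off Table 6 by taking, for each column, the action with the highest frequency. The sketch below does this in Python. The frequencies are copied from Table 6; the function name `dominant_actions` and the choice to report ties as a list are assumptions made for this illustration.

```python
FAMILIES = ["Menti", "Rbot", "Murio", "Virut", "Miuref", "Neris",
            "HTBot", "Dridex", "Trickbot", "Geodo", "Storm", "Waledac"]

# Action number -> per-family frequency, copied from Table 6.
FREQ = {
    1:  [0.51, 0.02, 0.49, 0.00, 0.03, 0.29, 0.06, 0.00, 0.00, 0.06, 0.01, 0.06],
    2:  [0.01, 0.00, 0.01, 0.01, 0.00, 0.03, 0.01, 0.06, 0.42, 0.05, 0.01, 0.06],
    3:  [0.01, 0.02, 0.00, 0.32, 0.03, 0.01, 0.02, 0.07, 0.00, 0.01, 0.06, 0.28],
    4:  [0.00, 0.14, 0.02, 0.01, 0.09, 0.01, 0.00, 0.06, 0.00, 0.07, 0.01, 0.04],
    5:  [0.00, 0.04, 0.02, 0.19, 0.02, 0.03, 0.00, 0.00, 0.00, 0.08, 0.01, 0.02],
    6:  [0.09, 0.08, 0.04, 0.01, 0.02, 0.01, 0.31, 0.00, 0.00, 0.03, 0.82, 0.02],
    7:  [0.01, 0.01, 0.00, 0.00, 0.26, 0.00, 0.00, 0.21, 0.02, 0.01, 0.01, 0.02],
    8:  [0.02, 0.01, 0.01, 0.12, 0.00, 0.13, 0.34, 0.19, 0.32, 0.27, 0.03, 0.01],
    9:  [0.00, 0.00, 0.00, 0.00, 0.01, 0.02, 0.00, 0.12, 0.02, 0.06, 0.01, 0.01],
    10: [0.00, 0.55, 0.10, 0.00, 0.35, 0.01, 0.00, 0.21, 0.07, 0.02, 0.01, 0.01],
    11: [0.00, 0.05, 0.20, 0.00, 0.01, 0.04, 0.06, 0.00, 0.15, 0.15, 0.01, 0.24],
    12: [0.20, 0.01, 0.01, 0.01, 0.09, 0.01, 0.00, 0.00, 0.00, 0.01, 0.01, 0.08],
    13: [0.07, 0.08, 0.07, 0.03, 0.03, 0.12, 0.19, 0.08, 0.02, 0.16, 0.01, 0.03],
    14: [0.07, 0.00, 0.02, 0.29, 0.04, 0.29, 0.00, 0.00, 0.00, 0.04, 0.01, 0.12],
}


def dominant_actions(family):
    """Return the action number(s) with the highest frequency for one family."""
    col = FAMILIES.index(family)
    best = max(row[col] for row in FREQ.values())
    return [action for action, row in FREQ.items() if row[col] == best]


for family in ("Menti", "Rbot", "Storm", "Neris"):
    print(family, dominant_actions(family))
```

For Menti, Rbot, and Storm this yields actions 1, 10, and 6 respectively, while Neris has a tie between actions 1 and 14 (both 0.29).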
To test this hypothesis, during the test, we record the list of actions taken by the agent and count the frequency of each action. Table 6 shows the result under the BotCatcher-SARSA instance, where the action number corresponds to that described in Section 3. From Table 6, we can see that there are large differences in the distributions and the dominant actions of different families, and these are often related to the characteristics and main functions of the flows of each family.
Taking the Rbot and Menti families as examples, we find statistically that the median duration of the Rbot family is 9.02 s, while that of the Menti family is 2.91 s. At the same time, we can see from Table 6 that the impact of the changetimestamp action on the Rbot samples (0.02) is significantly less than that on the Menti samples (0.51). That is, family flows with short durations may be affected more by the changetimestamp action than family flows with long durations. Therefore, we determine that the influences of actions on different families are closely related to the statistical characteristics or image characteristics of each family.

6 CONCLUSION
In this paper, we propose a general RL-based framework to craft adversarial botnet flow examples so as to launch black-box adversarial attacks against ML-based botnet flow detectors.
To ensure that the original malicious functions of the botnet flow will not be affected when modifying the botnet flow, we design an action space with 14 functionality-preserving actions. These actions can change some important transport layer characteristics but will not affect the application layer information that contains malicious functions. We select 14 botnet families to build a new botnet dataset for evaluating our method. Through experiments, we show that ML-based botnet detectors are indeed susceptible to adversarial attacks, and our system can obtain considerable evasion rates for different botnet detection models with fewer queries.
Although we achieve remarkable performance, our methods can be improved in some ways. First, we can explore additional actions that can mislead the detector by exploring the feature set on which the detector depends, so as to add perturbations to these features in a targeted manner. Second, the current actions have a large impact on the botnet flow, and we could reduce the perturbations through an iterative method.
We believe the framework proposed in this paper can promote the research of adversarial botnet flow examples and have a positive impact on the botnet detection field.

ACKNOWLEDGMENTS
This work is supported by the Youth Innovation Promotion Association CAS (No. 2019163), the National Natural Science Foundation of China (No. 61902396), the Strategic Priority Research Program of Chinese Academy of Sciences (No. XDC02040100), the Key Laboratory of Network Assessment Technology at Chinese Academy of Sciences and Beijing Key Laboratory of Network Security and Protection Technology.

REFERENCES
[1] 2008. https://en.wikipedia.org/wiki/Conficker
[2] 2008. https://en.wikipedia.org/wiki/Gh0st_RAT
[3] 2011. SplitCap. https://www.netresec.com/?page=SplitCap.
[4] Abdullah Al-Dujaili, Alex Huang, Erik Hemberg, and Una-May O’Reilly. 2018. Adversarial deep learning for robust detection of binary encoded malware. In 2018 IEEE Security and Privacy Workshops (SPW). IEEE, 76–82.
[5] Manos Antonakakis, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein, Jaime Cochran, Zakir Durumeric, J. Alex Halderman, Luca Invernizzi, Michalis Kallitsis, Deepak Kumar, Chaz Lever, Zane Ma, Joshua Mason, Damian Menscher, Chad Seaman, Nick Sullivan, Kurt Thomas, and Yi Zhou. 2017. Understanding the Mirai Botnet. In 26th USENIX Security Symposium (USENIX Security 17). USENIX Association, Vancouver, BC, 1093–1110. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/antonakakis
[6] Giovanni Apruzzese and Michele Colajanni. 2018. Evading botnet detectors based on flows and random forest with adversarial samples. In 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA). IEEE, 1–8.
[7] Shumeet Baluja and Ian Fischer. 2018. Learning to Attack: Adversarial Transformation Networks. In AAAI, Vol. 1. 3.
[8] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. 2007. Greedy layer-wise training of deep networks. In Advances in neural information processing