• In order to better understand the evolution of attackers and defenders, we systematically survey the Defense-Attack-Enhanced-Defense process of malware classification under the unified framework, including (i) ML-based malware classification methods, (ii) adversarial attack methods on ML-based malware classifiers, and (iii) defense methods to enhance the adversarial robustness of ML-based malware classifiers.
• We study the main challenges in the Defense-Attack-Enhanced-Defense process and further propose some promising future work directions from the perspective of both attackers and defenders.

E. Structure of the Paper

The roadmap of this paper is shown in Figure 1. We introduce the unified malware classification framework in Section II. Section III surveys several malware classification methods based on either static or dynamic features. Sections IV and V introduce various adversarial attack and defense methods for malware classification. We discuss future work directions in Section VI and draw conclusions in Section VII.
predictions and produce final predictions. Binary-classification methods produce probabilities of malicious or benign examples, while multi-classification methods produce probabilities of different malware families.

The data preprocessing phase manipulates original examples in the input space, which is related to their format (e.g., binary input space for binary executable files). The feature collection phase maps input examples from the input space to the feature space. The feature space consists of selected features that represent the key properties of input examples. The feature extraction phase works in the feature space and produces input features for the classification phase.

We summarize various ML-based malware classification methods under the unified framework. It is worth mentioning that some of the methods do not implement all of these phases. Moreover, both adversarial attack and defense methods can also be presented under the unified framework. For adversarial attacks on ML-based malware classifiers, attackers manipulate original examples in the feature space using knowledge acquired from the classification and decision-making phases. Attackers map the modified examples back to the input space to generate usable adversarial examples. For defense methods to enhance the robustness of ML-based malware classifiers, defenders develop defense techniques in different phases of the unified framework to defend against adversarial malware attacks. Among the five phases, only the feature collection phase cannot be enhanced, since defenders should not change the types of features collected by malware classifiers.

III. ML-BASED MALWARE CLASSIFICATION

Malware classification problems refer to the classification of given software examples based on certain criteria. According to the classification objectives, malware classification problems can be categorized into two different types:
1) Binary-Classification: A given software example is determined as malicious or benign, which is also called malware detection in many studies.
2) Multi-Classification: A given malware example is classified into one malware family.

The input features used for malware classification are generally divided into static features and dynamic features. (i) Malware classification based on static features extracts static information of the software example for classification, without actually executing the program. Information such as binary code, assembly code, and source code is commonly used. (ii) Malware classification methods based on dynamic features execute the software examples in an isolated environment. The dynamic metrics of the running program are extracted as input features for malware classification.

Static methods do not require the actual execution of programs, which makes them more efficient and thus capable of real-time classification. On the other hand, the dynamic features of programs at runtime are intuitively more powerful evidence indicating malicious behaviors, which cannot be captured by static methods. However, dynamic methods need to execute programs and thus inevitably lead to higher computational costs.

With the rapid development and broad application of ML, especially DL, ML-based models are extensively utilized for malware classification. In this section, we systematically survey ML-based malware classification methods. We classify these methods according to the types of features they use for classification, as shown in Figure 3. We describe them under the unified malware classification framework introduced in Section II. We summarize them in Table II according to the techniques used in different phases of the framework.
Besides, we analyze and compare different types of ML-based malware classification methods in Table III. Then, we introduce several challenges and corresponding solutions in the malware classification domain. Some commonly-used malware classification data sources are listed in Table I.

TABLE I
COMMONLY-USED MALWARE CLASSIFICATION DATA SOURCES
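To make the phases of the unified framework concrete, the sketch below wires together a minimal static, binary (malicious vs. benign) classifier in the spirit of the opcode n-gram methods surveyed in this section. The toy opcode corpus, the TF-IDF settings, and the Random Forest hyper-parameters are illustrative assumptions rather than choices taken from any of the surveyed papers.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import RandomForestClassifier

    # Data preprocessing and feature collection (assumed already done): each example
    # is an opcode sequence obtained by disassembling the binary; label 1 = malicious.
    train_opcodes = [
        "push mov call xor jmp call ret",
        "mov mov add cmp jne ret",
        "xor call call jmp push pop ret",
        "mov lea add sub cmp je ret",
    ]
    train_labels = [1, 0, 1, 0]

    # Feature extraction: opcode n-gram TF-IDF vectors (small illustrative vocabulary).
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=1000)
    X_train = vectorizer.fit_transform(train_opcodes)

    # Classification and decision making: a probability score thresholded at 0.5.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, train_labels)
    test_opcodes = ["xor call jmp call push ret"]
    score = clf.predict_proba(vectorizer.transform(test_opcodes))[0, 1]
    print("malicious" if score >= 0.5 else "benign", round(float(score), 3))

Real systems differ mainly in which features are collected and which model is used in the classification phase; the overall pipeline shape stays the same.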
TABLE II
ML-BASED MALWARE CLASSIFICATION METHODS
identify them. The authors use the transfer learning method to train the malware classifier based on the discriminator. DCGAN is evaluated on the Microsoft Malware Classification Challenge (BIG 2015) dataset [63]. Results show that DCGAN achieves 95.74% accuracy and it can deal with zero-day malware.
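As a concrete illustration of the image-based representation used by classifiers such as the DCGAN-based one above (and the HeNet framework discussed later), the snippet below converts a raw binary into a two-dimensional grayscale array; the fixed row width of 256 bytes is an arbitrary assumption.

    import numpy as np

    def binary_to_image(path, width=256):
        data = np.fromfile(path, dtype=np.uint8)          # raw bytes as values in 0-255
        rows = len(data) // width
        return data[: rows * width].reshape(rows, width)  # one grayscale pixel per byte

    # image = binary_to_image("sample.exe")  # hypothetical path; resize/normalize before a CNN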
TABLE III
COMPARISON AMONG ML-BASED MALWARE CLASSIFICATION METHODS
2) Raw Byte-Based Methods: Directly using raw bytes as input features does not require the data preprocessing and feature collection phases, but it is hard to implement due to the limited input length of DL models. To address this issue, Raff et al. [14] propose the MalConv framework, which uses a 1D-CNN network to process raw byte sequences. In the convolutional layer, large kernel sizes and stride values are set to compress input features and reduce the memory overhead. Besides, the DeConv regularization is implemented to solve the overfitting problem. MalConv is the first practical malware classifier directly using raw bytes as input features. Besides, Raff et al. further improve both the memory efficiency and speed of the original MalConv framework [76]. They propose the fixed-memory convolution, which leverages the sparse gradient of max-pooling layers to break the limitation of the input length. The Global Channel Gating (GCG) technique is also introduced to handle long time-steps of inputs. The improved MalConv framework is tested on the EMBER [64] dataset and achieves 93.29% accuracy.

3) Opcode-Based Methods: Opcodes, a.k.a. operation codes or instruction codes, are part of the executed instructions in machine language. Opcodes can be used as important features for malware classification. Shabtai et al. [22] propose to use opcode patterns for malware classification. Binary files are preprocessed by unpacking tools and disassembled by IDA Pro to extract opcode sequences. Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms are utilized to calculate opcode n-gram patterns of each file. The 1,000 most frequent patterns are used as input features. The authors train eight ML models on a dataset containing 26,093 examples collected from VX Heaven and Windows XP and labeled by Kaspersky anti-virus. Results show that the proposed method achieves up to 95% accuracy.

Instead of n-gram patterns, Wang et al. [77] propose to leverage information entropy for feature extraction. They first unpack the original files and use the W32asm tool to decompile them. The collected opcode sequences are further selected according to their information entropy. The authors also propose a density-based clustering algorithm to measure the similarities among examples and identify malicious ones. The proposed method is evaluated on a dataset containing 11,655 malicious and 1,000 benign examples. It achieves 88.45% accuracy and high classification efficiency.

Jeon and Moon [23] propose to use the Convolutional Recurrent Neural Network (CRNN) for malware classification. To solve the problem of input length, in the feature extraction phase, the Opcode-level Convolutional Autoencoder (OCAE) compresses opcode sequences. The compressed sequences are then converted into vectors using one-hot encoding and word embedding techniques. Then, a deep RNN model makes final predictions. The authors run tests on a dataset including 1,000 malicious examples from VirusShare and 1,000 benign examples from Windows 10. Results show that the CRNN-based classifier achieves 96% accuracy and 95% TPR.

4) Function Call-Based Methods: Function calls indicate the behavioral features of software examples without actually executing them. Microsoft Windows systems use the Dynamic-Link Library (DLL) to share code and data across programs. Schultz et al. [78] propose to use DLL function calls for malware classification. In the feature collection phase, the GNU Bin-Util tool is used to analyze Windows PE binary files. Three kinds of features are processed and selected: (i) the list of DLLs, (ii) the list of DLL function calls, and (iii) the number of different function calls in each DLL. The processed features are fed into a rule-based classifier named Ripper. The authors
run tests on a dataset containing 4,266 software examples and Ripper achieves 89% accuracy.

Huda et al. [79] use function call statistics for malware classification. Input examples are first unpacked and processed by IDA Pro to acquire function call lists. The filter and wrapper approaches are proposed, where the filter uses function statistics to extract feature subsets and the wrapper uses an SVM-based model to make predictions on the feature subsets. The classifier runs these two approaches iteratively and finally acquires the ability to automatically select the optimal feature subset. The proposed method is evaluated on a dataset containing 66,703 software examples. Experimental results show that it achieves over 94% accuracy with a feature subset of only 14 function calls.

5) Hybrid Static Methods: Arp et al. [67] propose the Drebin framework for Android malware classification, which leverages multiple static features extracted from Android apps. Drebin collects static features from both Android manifest files and disassembled codes, such as requested permissions, API calls, and network addresses. The collected features are vectorized and fed into an SVM-based classifier. The authors also design sentence templates to automatically generate explanations. Drebin is evaluated on a dataset containing 5,560 malicious and 123,453 benign Android apps. Results show that it achieves 93% accuracy and 1% FPR.

Schultz et al. [78] propose to use strings and bytes extracted from programs as features. They use the GNU strings tool to acquire printable characters and use the HexDump tool to acquire hexadecimal strings. An ensemble of multiple Naive Bayes (NB) models is chosen as the malware classifier. Different from them, Saxe and Berlin [24] select four static features including entropy histograms, the Windows PE import table, Windows PE metadata, and contextual features. A DNN model with multiple layers is used as the classifier. Instead of making binary predictions, the proposed framework also provides malicious probabilities for security analysts.

There are a large number of malware examples and source codes in public archives like GitHub. Rokon et al. [80] propose the SourceFinder framework to identify malware repositories on GitHub. To narrow down the search scope, they select keywords to query the repositories related to malware on GitHub. Multiple string-based features are collected, such as the titles, descriptions, topics, folder names, etc. These features are further vectorized using the Bag of Words (BoW) model and the word embedding model. Seven ML-based models are evaluated on a dataset containing 1,000 repositories collected from GitHub and manually labeled by experts. Results show that it achieves 89% accuracy. Besides, SourceFinder identifies 7,504 malware-related repositories on GitHub.

B. Malware Classification Based on Dynamic Features

Some malware classifiers use dynamic features for classification, including API calls, network features, behavioral graphs, etc.

1) API-Based Methods: Application Programming Interfaces (APIs) indicate the dynamic behaviors of software. Patterns of API calls can be used for malware classification.

Huang and Stokes [25] propose a multi-task malware classification framework named MtNet. They use a lightweight emulation system to run input files and extract API calls, which are further merged into high-level features. These high-level features are ranked based on mutual information analysis and reduced using the Principal Component Analysis (PCA) method. A DNN model with two kinds of softmax output layers is trained to solve the multi-task problem. The dataset is provided by Microsoft and includes 6.5M examples. Results show that MtNet achieves 99% accuracy for binary-classification and 97% accuracy for multi-classification.

Inspired by natural language processing methods, Pascanu et al. [26] propose to learn the language that software speaks for malware classification. Low-level API calls are encoded into high-level event streams by anti-virus engines. Echo State Networks (ESNs) and RNNs are trained to predict the next API using unsupervised learning. The trained models are able to extract key features from event streams. Their hidden states are fed into ML models, such as LR and Multilayer Perceptron (MLP). The proposed method is evaluated on a dataset containing 50,000 Windows PE files. Results show that it achieves 98.3% TPR and 0.1% FPR.

Instead of only using API names, Rabadi and Teo [81] propose to use both names and parameters of API calls as features. The authors first run the input examples in the Cuckoo Sandbox environment and extract their API calls. The API arguments contain key features but lead to high computational costs. To address this issue, two lightweight API information extraction methods are proposed. The extracted API-based features are vectorized by the hashing function. ML-based models are trained as malware classifiers, such as SVM, Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), etc.

Amer et al. [27] propose to use API fusion for malware classification, which makes full use of the behavioral features of API call sequences. To acquire high-level API features, original API calls are vectorized using the Word2Vec model and clustered according to their similarities. Bigrams of API calls are also extracted. The authors use these high-level sequential features to create transition matrices. The Maximum Likelihood Estimation (MLE) method calculates the transition probabilities and makes the final decision. The proposed method is tested on a dataset with 68,698 Windows PE files and 70,693 Android apps. Results show that it achieves 99.7% and 97.7% accuracy, respectively.

2) Network-Based Methods: Malware can use network communications to achieve malicious purposes, such as downloading files and leaking data. Nari and Ghorbani [82] propose a malware classification framework based on network behaviors. It first uses the TShark tool to extract network flows based on the protocols and ports. A network behavioral graph is generated for each example to represent the dependencies between network flows. Then, multiple graph-based features are calculated, such as graph size, root out-degree, average out-degree, etc. The extracted features are vectorized and classified by a DT model. The authors run tests on the CRC
dataset in the AES network sandbox. Experimental results show that the proposed method achieves over 94% accuracy. Different from them, Arivudainambi et al. [92] use PCA to reduce network-based features. The dimensionality reduction framework automatically learns compressed input features. A CNN-based model serves as the malware classifier.

3) Behavioral Graph-Based Methods: Some methods convert dynamic features into behavioral graphs, so that malware classification problems are converted into graph classification problems, which can be solved by DL-based methods.

Wang et al. [28] propose the ProvDetector framework to monitor software examples at the kernel level and use their interactions with the operating system for malware classification. ProvDetector first builds the provenance graphs of processes and uses the Rare Path Selection algorithm to extract key paths. The selected paths are vectorized and fed into the density-based Local Outlier Factor classifier. Then, a threshold-based method makes final decisions. The dataset is collected from VirusShare and VirusSign and contains 15,000 malware examples. Results show that ProvDetector achieves 96.5% accuracy with low computational costs.

Besides run time, malicious behaviors may also appear in the software installation stage. Han et al. [29] propose the SIGL framework to detect malicious installation behaviors. SIGL converts log files into installation graphs and uses the Word2Vec model to vectorize them. To learn the patterns of benign installation graphs, SIGL leverages an autoencoder framework, including a Graph-LSTM encoder and an MLP decoder. Malicious activities lead to high reconstruction loss in the proposed autoencoder network, which is used to detect and prioritize anomalous behaviors. To construct the dataset, the authors combine benign installers with malware examples from VirusTotal, in order to acquire malicious installers. Besides, SIGL can also provide valuable guidance information for security analysts.

Kwon et al. [83] observe that even benign executable files can download other malicious programs online. To this end, the Downloader Graph Abstraction (DGA) method is proposed to describe the download activities of programs. DGA first collects downloading data from IDS and anti-virus engines. Downloader Graphs (DG) and Influence Graphs (IG) are constructed based on the collected data. A DG is built for each host and an IG is built for each downloader, where the IG is a subgraph of the DG. Multiple features of the IG are extracted as input features, such as IG diameter, growth rate, and URL patterns. RF is selected as the malware classifier and trained on a dataset containing 88,214 examples. Results show that DGA achieves 96% TPR and 1% FPR.

4) Hardware-Based Methods: Existing malware classification methods mainly use software-based features. Their high computational overhead is a major issue for real-time classification. To this end, malware classification methods based on hardware features are proposed.

Chen et al. [84] propose the HeNet framework based on control flow traces collected by hardware. HeNet first utilizes the Intel Processor Trace (Intel PT) tool to acquire the control flows with low overhead. The collected information is compressed into data packets and then converted into time series of grayscale images. HeNet uses transfer learning to train multiple classifiers from pre-trained networks such as VGG [72] and Inception [93]. Predictions from different classifiers are combined as an ensemble model.

Some researchers also propose to implement malware classifiers using hardware. Ozsoy et al. [85] propose the Malware-Aware Processors (MAP) method, which first collects multiple hardware features related to hardware events, memory, instructions, and branches. MAP implements LR and NN models with x86-based hardware. The online detection unit combines predictions using the Exponentially Weighted Moving Average (EWMA) method. The dataset contains 1,087 examples from 9 malware families, which are collected online and labeled by VirusTotal. Results show that MAP achieves up to 94% accuracy with high classification efficiency.

IoT devices have become popular targets of attackers in recent years [3], [5]. Pham et al. [86] propose a hardware-based method using side-channel features for IoT malware classification. It first collects ElectroMagnetic (EM) field-based features on IoT devices, which cannot be detected or manipulated by malware. Then, the Short-Time Fourier Transform (STFT) algorithm is leveraged to transfer time-domain EM data to the spectrogram. The authors also use the Linear Discriminant Analysis (LDA) method to reduce input features. Several ML models are evaluated on a dataset containing 4,790 ARM IoT malware examples collected from VirusSign. Results show that the proposed method achieves up to 99% accuracy.

5) Hybrid Dynamic Methods: Some researchers use a mixture of multiple dynamic features for malware classification. Mohaisen et al. [87] propose the AMAL malware classification framework consisting of the AutoMal and MaLabel subsystems. Specifically, AutoMal collects dynamic information associated with files, memory, network, and registry, by running and monitoring programs in virtualized environments. MaLabel generates high-level features from the collected information. Multiple ML models and clustering algorithms are evaluated on two malware datasets containing 4,000 and 115,000 examples. Results show that AMAL achieves 99.5% precision for classification and 98% precision for clustering.

Alsulami and Mancoridis [88] propose to use multiple behavioral features of Windows PE files for classification. The Windows Cache Manager (WCM) monitors software and stores runtime information in prefetch files. The authors collect the lists of loaded files from prefetch files and then vectorize them as input features. The DL-based malware classifier includes a convolutional layer and an LSTM layer. The proposed method is evaluated on a dataset containing 100,000 examples collected from VirusShare and labeled by anti-virus engines. Results show that it performs well on rare malware families and can be extended to new families.

C. Malware Classification Based on Hybrid Features

Some researchers also propose to use a mixture of both static and dynamic features for malware classification. Such methods take advantage of both types of features.
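As a concrete illustration of this hybrid strategy, the sketch below concatenates a static and a dynamic feature vector and lets a soft-voting ensemble (an RF plus an MLP, loosely echoing the ensembles described below) make the final decision. The feature dimensions, models, and synthetic data are assumptions for illustration only.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n = 200
    static_features = rng.random((n, 32))    # stand-ins for entropy, section sizes, header fields
    dynamic_features = rng.random((n, 64))   # stand-ins for API-call counts, network statistics
    X = np.hstack([static_features, dynamic_features])
    y = rng.integers(0, 2, size=n)           # synthetic labels: 1 = malicious, 0 = benign

    ensemble = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
        ],
        voting="soft",                       # average the malicious probabilities of both models
    )
    ensemble.fit(X, y)
    print(ensemble.predict_proba(X[:3])[:, 1])   # malicious probabilities for three samples

In practice the two feature blocks come from separate static-analysis and sandbox pipelines, which is exactly what makes hybrid methods more costly to deploy than purely static ones.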
Attackers apply anti-analysis obfuscation techniques to increase the difficulty of malware classification and analysis. To address this issue, O'Shaughnessy and Sheridan [89] propose a hybrid multi-classification framework. For static features, it directly uses the raw bytes of the input examples. For dynamic features, it implements the Virtual Machine Introspection (VMI) tool to execute malware and collect memory dump files. The processed data is converted into images using space-filling curves. Then, the framework extracts visual features from the images and uses ML-based models as malware classifiers. To evaluate the proposed framework, 13,599 Windows PE malware examples from 23 families are collected from VirusTotal. Results show that it achieves over 97% accuracy.

Different from them, Yoo et al. [90] propose to leverage the structural and operational features of Windows PE files for malware classification. For static features, the proposed method collects and computes sizes, count-based features, entropy features, etc., from input examples. For dynamic features, API calls, location features, network features, etc. are collected. Both static and dynamic features are combined as input features. The authors design an ensemble of RF and MLP-based models as the malware classifier, which uses voting to make final decisions.

Xu et al. [91] propose the Hybrid Analysis for Detection of Malware (HADM) method for Android malware classification. HADM uses hybrid features of APK examples for classification, including requested permissions, API calls, advertising networks, instruction sequences, etc. The collected features are vectorized and fed into DNN-based models to extract high-level features. The outputs of different DNN models are combined using the Multiple Kernel Learning (MKL) method. The HADM framework is evaluated on an Android APK dataset, which is collected from Google Play and VirusShare and contains 5,888 APK examples. Results show that HADM achieves over 94% accuracy.

D. Challenges

Existing ML-based malware classification methods face several challenges, i.e., classifier aging problems, data bias problems, and virtual environment problems, which lead to less reliable experimental results.

1) Classifier Aging Problem: ML-based malware classification methods have been widely used to detect real-world malware. Although these methods perform well on the training and testing datasets, their performance may degrade as the malware evolves. Such a phenomenon is commonly called classifier aging (a.k.a. model degradation or concept drift) [94]. The problem is due to changes in the data distribution of real-world malware with the evolution of attack methods. Malware classifiers trained on old datasets are hard to generalize to new examples.

To overcome this aging problem, one intuitive way is to retrain the malware classifiers periodically [95], [96], or to retrain the classifier when its performance starts to degrade [97], which is online learning. However, retraining the model requires too much human work to collect and label data. And how to identify the performance degradation is still an open issue. Another potential direction is to extract more stable features, such as the internal semantic information used in [98]. But whether these features are stable enough for the aging problem remains to be further investigated.

2) Data Bias Problem: Existing ML-based malware classification methods have achieved high accuracy in experimental settings. Malware classification seems to be a well-solved problem. However, the distribution bias of the training and testing datasets makes it difficult for the trained models to generalize to real-world data distributions.

Some distribution bias is due to careless data acquisition, which introduces additional features to malicious examples. These features are irrelevant to malicious behaviors, but they are extracted and learned by malware classifiers. Such problems cause the classifiers to predict unexpected results. For example, malware developers commonly employ packing techniques to prevent feature extraction. Thus, a model trained on packed malicious and unpacked benign examples produces a high FPR on packed benign examples, which means the model uses 'Packing' as a malicious feature [99].

As proposed in [100], there are generally two kinds of major biases: (i) spatial bias, which means training and testing distributions are different from real-world data distributions, and (ii) temporal bias, which means the division of training and testing datasets ignores the temporal order. However, identifying and eliminating the dataset biases is highly related to data preprocessing, label aggregation, engine independence, engine reputation, experiments, malware coverage, and data sharing [101], which are still open issues.

3) Virtual Environment Problem: Sandboxes are commonly utilized by researchers to run software examples and extract dynamic features for malware classification. However, some malware can use several indicators (e.g., static fingerprints, behavioral patterns, hardware-based features, etc.) to identify the execution environment and hide malicious behaviors in virtual environments [102], [103], [104].

To address this issue, Miramirkhani et al. [104] propose two practical methods: (i) cloning real systems to sandboxes, or (ii) simulating user actions on a newly installed system. However, since the system continuously evolves and environment cloning is not always practicable, it is difficult to build a virtual environment as similar as possible to the real system. A more radical approach is to construct a Parallel Adversarial Network (PAN) that accompanies the operation network, as proposed in [105]. The PAN and the operation network share the same physical network resources using network slicing and virtualization techniques. The PAN evolves with the operation network and can be deployed on different kinds of networks. Therefore, we believe that the PAN is suitable as a malware analysis environment.

IV. ADVERSARIAL ATTACKS ON ML-BASED MALWARE CLASSIFIERS

ML-based models are widely used in malware classification tasks and they have already achieved high performance. However, these ML-based models also introduce additional
TABLE IV
LIST OF NOTATIONS

A. Adversarial Attacks on ML Models
The concept of adversarial attacks is first proposed by
Szegedy et al. [30] in 2014. The authors argue that although
DNN models achieve high performance in multiple fields, their
learning processes are uninterpretable. ML-based models have
blind spots, which can be exploited by attackers to bypass
them. The predictions of models can be significantly changed
by applying tiny perturbations to the original examples. These
perturbed examples are known as adversarial examples, which
are leveraged by attackers to perform adversarial attacks.
1) White-Box Adversarial Attacks: Multiple adversarial
attack methods are proposed in white-box settings.
a) L-BFGS attack: Szegedy et al. [30] first model adver-
sarial example generation as an optimization problem. The
adversarial perturbation can be generated by minimizing both
(i) the scale of perturbation and (ii) the probability that the
victim model makes correct predictions. The proposed method
is known as the box-constrained L-BFGS attack, which uses
line search strategies to solve an optimization problem:
min_η  c · ||η|| + L(θ, x_adv, y_target).
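The optimization view above can be illustrated with a few lines of code. The sketch below minimizes a penalized objective of the same form, c · ||η||^2 + L(θ, x + η, y_target), using plain gradient descent instead of box-constrained L-BFGS with a line search over c, and it attacks a randomly initialized toy model; the model, the constant c, and the step settings are arbitrary assumptions.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = torch.nn.Sequential(torch.nn.Linear(20, 10))   # stand-in victim classifier
    x = torch.rand(1, 20)                                  # original example in [0, 1]
    y_target = torch.tensor([3])                           # attacker-chosen target class
    c = 0.1                                                # weight of the perturbation penalty

    eta = torch.zeros_like(x, requires_grad=True)          # adversarial perturbation
    optimizer = torch.optim.Adam([eta], lr=0.05)
    for _ in range(200):
        optimizer.zero_grad()
        x_adv = (x + eta).clamp(0, 1)                      # keep the example in the valid box
        loss = c * eta.pow(2).sum() + F.cross_entropy(model(x_adv), y_target)
        loss.backward()
        optimizer.step()

    x_adv = (x + eta).detach().clamp(0, 1)
    print("predicted class:", model(x_adv).argmax(dim=1).item())

The squared-norm penalty is used here only for numerical convenience; the attacks surveyed below differ mainly in how the perturbation size is measured and how the constrained problem is solved.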
e) DeepFool attack: Moosavi-Dezfooli et al. [108] propose the DeepFool attack to accurately measure the robustness of the victim model and generate adversarial perturbations. Suppose a classifier F_k predicts the probability of k classes, where F_k = W^T x + b. The classification problem is to find the optimal class (k̂) with the highest probability:

k̂ = arg max_k F_k(x).

DeepFool first computes the distance between the original example and the decision boundary. The original example (x) is located inside a convex polyhedron. Let w_k be the k-th column of W. The closest adversarial hyperplane (l̂) of the decision boundary is computed as

l̂ = arg min_{k≠k̂}  |F_k(x) − F_k̂(x)| / ||w_k − w_k̂||.

The authors propose that the minimum perturbation is generated by projecting the original example onto the hyperplane (l̂), which can be computed as

η = (|F_l̂(x) − F_k̂(x)| / ||w_l̂ − w_k̂||_2^2) · (w_l̂ − w_k̂),
x_adv = x + η.

f) JSMA attack: Papernot et al. [109] propose the Jacobian-based Saliency Map Attack (JSMA) method to craft adversarial examples, which solves a targeted optimization problem:

arg min_η ||η||,  s.t. F(x + η) = y_target.

The JSMA attack first computes the forward derivative with respect to the original example, i.e., the Jacobian of the decision function. JSMA then applies the chain rule to recursively compute the derivative of each layer. Suppose the victim DNN model learns an N-dimensional function and the inputs are M-dimensional. Let H and W be the output vectors and weights of the hidden layers in the victim model. Let b be the bias of neurons and let f be the activation functions of neurons in hidden layers. The forward derivative is computed as

∇F(x) = ∂F(x)/∂x = [∂F_j(x)/∂x_i],  i ∈ 1, ..., M, j ∈ 1, ..., N,
∂F_j(x)/∂x_i = (W_{n+1,j} · ∂H_n/∂x_i) × ∂f_{n+1,j}/∂x_i (W_{n+1,j} · H_n + b_{n+1,j}).

JSMA then uses the computed derivatives to generate the Adversarial Saliency Map to represent the influence of input features on output values. JSMA uses an iterative algorithm to select important features and update adversarial perturbations.

g) C&W attack: Carlini and Wagner [20] propose the C&W attack, which is a targeted attack algorithm. Multiple distance metrics are used to measure the scale of perturbations, including L0, L2, and L∞. C&W generates adversarial perturbations by solving an optimization problem:

min ||η|| + c · L(θ, x + η, y_target),  s.t. x + η ∈ [0, 1]^n.

The C&W attack uses a binary search algorithm to automatically choose the constant c. To solve the box-constraint problem, three different approaches are applied, including projected gradient descent, clipped gradient descent, and change of variables. Results show that the C&W attack can successfully crack the defensive distillation [110] method.

2) Black-Box Adversarial Attacks: Multiple adversarial attack methods are proposed in black-box settings.

a) Substitution attack: Papernot et al. [111] propose to train substitute models for black-box attacks. The proposed method is based on the transferability of adversarial examples, i.e., the crafted adversarial examples are able to attack other similar models. Attackers choose a set of input examples and query the DL-based victim model to label them. Then, they train the substitute model on the labeled dataset. The trained substitute model is a white-box model that fits the black-box victim model. The authors apply two white-box attack algorithms, including FGSM [43] and JSMA [109], to craft adversarial examples on the substitute model. Besides, the substitution attack can also be generalized to ML-based models, such as LR, SVM, K-Nearest Neighbors (KNN), etc.

b) Boundary attack: Brendel et al. [112] propose the boundary attack method. It first crafts a large adversarial example and then tries to reduce the scale of the perturbations. Specifically, the boundary attack first takes a large step to generate a starting example. Then, it iteratively modifies the example along the decision boundary. In each iteration, the boundary attack takes an orthogonal step randomly sampled from a Gaussian distribution and moves towards the original example. The authors show that the boundary attack is competitive with some white-box attack methods and can be applied to real-world black-box models.

c) Optimization attack: Cheng et al. [113] propose a query-efficient black-box attack method based on optimization, which only takes hard-label predictions as inputs. Let δ be the search direction, and let g(δ) represent the distance between the original and adversarial examples. The attack method is defined as

g(δ) = arg min_{λ>0} λ,  s.t. f(x + λ · δ/||δ||) ≠ y_true.

The attacker searches for the optimal direction δ* that minimizes the distance function g(δ). The problem can be solved using zeroth-order optimization algorithms. Then, the optimal adversarial example is generated as

x_adv = x + g(δ*) · δ*/||δ*||.

The crafted adversarial examples have higher quality because smaller perturbations are generated. Besides, the authors show that the optimization attack can extend to ML models, such as the Gradient Boosting Decision Tree (GBDT).

d) SimBA attack: Guo et al. [114] propose the Simple Black-box Attack (SimBA) method. With a limited number of queries, the optimization problem of constrained black-box adversarial attacks can be modified as

arg max_η L(x + η, y_true),  s.t. ||η|| < d_max and queries ≤ q_max,
where d_max is the maximum size of perturbations and q_max is the maximum number of queries. The attack algorithm chooses a step size ε and a direction δ from a set of orthonormal candidate vectors Q. SimBA iteratively chooses a perturbation of either (x + ε · δ) or (x − ε · δ) that can decrease the prediction confidence of the victim model. The SimBA attack requires less than 20 lines of PyTorch code, which is simple to implement.

e) AdvGAN: Xiao et al. [115] propose the AdvGAN attack based on the GAN architecture. The generator generates adversarial perturbations to bypass the discriminator. The discriminator learns to distinguish the perturbed examples from the real ones. The loss function of AdvGAN includes the adversarial attack loss, the target label loss, and the hinge loss. AdvGAN uses dynamic distillation to calculate the adversarial loss of the victim model in black-box settings. Attackers train the generator by minimizing the loss function. Mangla et al. [116] extend AdvGAN using the encoder-decoder architecture and propose the AdvGAN++ framework. AdvGAN++ first maps the original examples into the latent space. The generator takes the extracted features and noise vectors as inputs and generates adversarial examples.

f) ZOO attack: Chen et al. [117] propose the Zeroth-Order Optimization (ZOO) black-box attack. It assumes that attackers have access to the input data and prediction scores of the victim model. The loss function of the ZOO attack is defined as

max{ log[F(x)]_y_true − max_{i≠y_true} log[F(x)]_i,  −k },

where k is a parameter to control the transferability of attacks. Attackers estimate the gradient information of the loss function. Then, ZOO applies multiple optimization algorithms to iteratively generate adversarial examples based on the estimated gradient. It is shown that ZOO achieves comparable performance to white-box attacks.

B. Adversarial Attacks on ML-Based Malware Classifiers

Generating adversarial malware examples is different from adversarial attacks in the ML domain. As described above, in the ML domain, attackers modify original examples in a continuous feature space. It is simple to map the modified features back to the input space because tiny perturbations do not severely affect the original image. However, the feature space is discrete (usually binary) in the malware domain. To craft meaningful adversarial malware examples, modified features need to be mapped back to the input space while preserving the functionalities of original malware examples, which is known as the Inverse-Mapping problem in this paper. Some of the adversarial attack methods derived from the ML domain can be applied to generate adversarial malware examples. Researchers also propose specific methods based on the properties of malware examples.

Figure 4 shows the general process of generating adversarial malware examples. Attackers take original malware examples as input and aim to generate adversarial examples that can bypass the victim classifier. As shown in the figure, the black arrows denote the process of attackers using the victim classifier to make predictions, which is consistent with the unified malware classification framework. Specifically, original examples are mapped from the input space into the feature space through feature collection. After that, attackers use prediction results and internal information of the victim model to modify original examples in the feature space and map them back to the input space. The red arrows denote the process of attackers generating adversarial examples. According to the attack knowledge they use, adversarial malware attacks can be categorized into white-box attacks (left) and black-box attacks (right). As shown in Table V, we summarize these attacks according to their attack knowledge, features, attack algorithms, and inverse-mapping manipulations. Besides, we compare different types of attack methods in Table VI.

1) White-Box Attacks: In white-box settings, attackers have full access to the victim model. They can utilize both internal information and prediction results of the model to generate adversarial malware examples.
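One of the simplest inverse-mapping manipulations used by the attacks surveyed below is appending an adversarial payload to the end of a PE file: bytes placed after the last section (the overlay) are not mapped into memory at load time, so the program's behavior is unchanged. The sketch below only performs the append step; how the payload bytes are chosen is the attack-specific part, and the file names are placeholders.

    def append_payload(src_path: str, dst_path: str, payload: bytes) -> None:
        with open(src_path, "rb") as f:
            original = f.read()
        with open(dst_path, "wb") as f:
            f.write(original + payload)   # payload lands in the overlay, after the last section

    # append_payload("malware.exe", "malware_adv.exe", b"\x90" * 512)  # hypothetical paths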
TABLE V
ADVERSARIAL ATTACK METHODS ON ML-BASED MALWARE CLASSIFIERS

TABLE VI
COMPARISON AMONG ADVERSARIAL ATTACK METHODS ON ML-BASED MALWARE CLASSIFIERS
a) FGSM-based attacks: Kreuk et al. [35] propose a white-box targeted attack method against the MalConv [14] framework. The authors argue that small changes to raw bytes may break the functionalities of the program. To generate adversarial binary files, the FGSM [43] method is first used to acquire gradient information of the victim model and craft
adversarial examples in the feature space. Suppose the input file x is represented as z in the embedding domain. Adversarial examples are generated as

z_adv = z − ε · sign(∇_z L(θ, z, y_target)).

Then z_adv is mapped back to the adversarial binary file x_adv. To preserve the functionalities, the authors propose to generate adversarial payloads and inject them into the original examples. The proposed method is evaluated on a dataset containing 7,150 benign files from Windows and 10,866 malicious files from the BIG 2015 dataset [63]. Results show that it achieves over 99% attack success rates with few payloads inserted.

Al-Dujaili et al. [34] propose a white-box attack method, which uses gradient-based methods to generate adversarial examples in the feature space and maps them back to the binary input space. Let S be a set of binary feature vectors. The attack method is modeled as an optimization problem:

x_adv = arg max_{x∈S} L(θ, x, y_true).

Multiple gradient-based methods are utilized to solve the optimization problem, including FGSM and BGA. Specifically, instead of generating perturbations in the feature space, BGA methods search multiple feasible vertices and choose the optimal one based on gradient information. The selected vertices are used to craft adversarial malware examples. The proposed attack methods are evaluated on DNN-based PE malware classifiers. Results show that they achieve up to 99% attack success rates.

b) JSMA-based attacks: Grosse et al. [16] propose to apply the JSMA method [109] for adversarial malware attacks. The proposed method attacks Android malware classifiers, which use multiple Android manifest and code features as inputs. Attackers first calculate the Jacobian matrix of the victim model to estimate the direction in which perturbations affect the predictions. Then, the optimal perturbations are selected iteratively to craft adversarial examples. To preserve the original functionalities, only positive changes are performed (e.g., adding new features) to original examples. The authors choose the L1 norm to limit the number of modified features. The JSMA-based attack is evaluated on classifiers trained on the Drebin dataset. Results show that it achieves 62% success rates and it breaks feature reduction-based defenses.

c) Optimization-based attacks: Biggio et al. [33] propose the gradient descent attack, which is a white-box method using gradient information and the Kernel Density Estimation (KDE) technique to generate adversarial examples. The gradient descent attack generates adversarial examples by solving an optimization problem, i.e., attackers minimize the accuracy of the victim classifier with limited perturbations (d_max), which is formulated as

arg min_{x_adv} F(x_adv),  s.t. ||η|| ≤ d_max.

The gradient-based algorithms may fail to converge. To this end, the KDE loss is added to make generated examples closer to the benign distribution and thus increase the probabilities of successful attacks. The optimization problem is modified as:

arg min_{x_adv} L(x_adv) = F(x_adv) − (λ/n) Σ_{y_i=ben} k((x_adv − x_i)/h),

where λ is a hyper-parameter to control the weight of the KDE loss, and h is a bandwidth parameter for KDE. The gradient descent attack is evaluated on both image classification and malware classification tasks against ML-based classifiers. Results show that it achieves up to 100% success rates.

Kolosnjaji et al. [118] propose a white-box untargeted attack against the MalConv [14] malware classifier. The authors also leverage gradient information to generate adversarial payloads and then append them to the end of original examples. Let len_max be the maximum length of the adversarial payload. The optimization problem is formulated as

arg min_{x_adv} F(x_adv),  s.t. Dis(x, x_adv) ≤ len_max.

The authors propose a gradient-based method to solve the optimization problem. To deal with the non-differentiable property of the embedding layer, they define a line that has the same direction as the gradient. The generated feature-space vector moves along the line to minimize F(x_adv). Then, the optimal byte is selected when it is closest to the line and reduces the F value. The authors use an iterative algorithm to generate adversarial bytes and insert adversarial payloads. The attack method is evaluated on a dataset containing 9,195 malicious and 4,000 benign examples. Results show that it achieves 60% success rates with less than 1% of bytes inserted.

Li and Li [52] propose the Mixture of Attacks method, which combines multiple attack algorithms (h) and manipulation sets (M) to attack Android malware classifiers. Individual attack algorithms include gradient-based attacks, gradient-free attacks, obfuscation attacks, etc. The proposed attack is based on the loss function of the victim model, where high loss values indicate strong attacks. The Max strategy chooses the optimal attack and generates adversarial examples in the feature space by maximizing the loss function:

ĥ, M̂ = arg max_{h,M} L(F(h(M, x)), y_true).

The Max strategy iteratively updates the adversarial perturbations. To solve the inverse-mapping problem, it selects both incremental manipulations (e.g., inserting strings and junk code) and decremental manipulations (e.g., modifying XML files). The Mixture of Attacks method is evaluated on both the Drebin [67] and AndroZoo [68] datasets. Results show that it achieves 100% attack success rates on DNN-based models.

d) Feature-based attacks: Some researchers develop adversarial malware attacks based on the properties of the input features of the victim classifiers.

Abusnaina et al. [119] propose the white-box Graph Embedding and Augmentation (GEA) method to generate adversarial malware examples against graph-based IoT malware classifiers. The victim classifiers extract graph-based data as input features for malware classification. The GEA attack generates adversarial perturbations based on graph features while preserving the functionalities of original examples. GEA
first leverages the Radare2 tool to construct the Control Flow Graphs (CFG) of original examples x_origin and target examples x_target. Graph embedding combines these two examples and generates an adversarial example x_adv, which maintains the functionalities of x_origin and misleads the victim model to predict label y_target. GEA is evaluated on a deep CNN-based malware classifier. Results show that it achieves 100% success rates and it can successfully craft adversarial examples for each IoT malware sample in the dataset.

Chen et al. [120] propose the EvnAttack framework to generate adversarial examples on API call-based malware classifiers. It manipulates the feature space of original examples to craft adversarial examples. Let c_i be the cost of modifying a specific feature. The evasion cost (C) is calculated as

C(x, x_adv) = Σ_i c_i · |x_adv^i − x^i|.

Adversarial examples are generated with a limited evasion cost. The goal of EvnAttack is defined as

F(x_adv) = ben,  s.t. C(x, x_adv) ≤ man_max,

where man_max is the maximum number of manipulations. To reduce the evasion cost, the contributions of different APIs are measured by calculating their relevance scores. Then, API calls with higher benign scores are injected, while those with higher malicious scores are removed. EvnAttack iteratively manipulates original examples to maximize the loss of the victim model and minimize the evasion cost. EvnAttack is evaluated on a dataset containing 10,000 PE examples with 3,503 API features. Results show that it achieves over 97% attack success rates with man_max = 55.

2) Black-Box Attacks: In black-box settings, attackers have limited knowledge about the victim model. Only prediction results of the victim model are available.

a) Substitution-based black-box attacks: Substitution-based attacks train substitute models to fit the black-box victim models based on the prediction results. Then, adversarial malware examples are generated on the substitute models using white-box attack methods.

i) Substitute Model-Based Attacks: Rosenberg et al. [38] propose a black-box attack method against API call-based malware classifiers. The authors first utilize a Jacobian-based data augmentation technique for selecting input examples and minimizing the number of queries. The Gated Recurrent Unit (GRU) serves as the substitute model, which is trained on the selected examples. Then, the FGSM attack is performed on the substitute model. In order to preserve the functionalities of original examples, the authors propose a mimicry attack, which iteratively chooses adversarial API calls and inserts them at a random position of the original API sequence, until the generated examples can successfully bypass the victim model. The proposed attack is evaluated on several DL-based malware classifiers trained on a dataset containing 500,000 examples. Results show that it achieves up to 100% attack success rates. Besides, the crafted adversarial examples can be transferred to attack other similar classifiers.

Hu and Tan [121] also propose an attack method against API call-based malware classifiers. To preserve the original functionalities, attackers only insert irrelevant API calls. An RNN model, consisting of a Gumbel-Softmax layer and a sequence decoder, is designed to generate adversarial API sequences from original sequences. The generative RNN model produces one-hot encoded adversarial sequences and softmax output vectors. The victim model labels the encoded sequences to train an LSTM-based substitute model. The substitute model is trained to fit the victim model, while the generative model learns to update the adversarial sequences. The proposed attack is evaluated on a dataset containing 180,000 examples collected from the Malwr website. Results show that it can reduce the accuracy of the victim model by over 90%.

Khasawneh et al. [122] propose an attack method against black-box Hardware-based Malware Detectors (HMDs). The authors reverse-engineer HMDs by training a substitute model to fit them. They first query the victim HMDs to label the input examples, which are then used to train the ML-based substitute model. The substitute model provides internal information for attackers. In order to preserve the functionalities, the authors propose block-level and function-level approaches to dynamically insert instructions into original examples. The weights of the substitute model are leveraged to measure the influence of instructions. Then, attackers select and modify significant instructions. Results show that the proposed attack achieves nearly 100% success rates.

ii) GAN-Based Attacks: Hu and Tan [36] propose the MalGAN attack framework based on the GAN architecture. The black-box victim classifier serves as the discriminator. An ML-based generator aims to generate adversarial examples to minimize the accuracy of the victim classifier. A substitute detector is also trained to fit the victim classifier. It provides gradient information for the generator. The loss function of the substitute detector is formulated as

L_D = −E_{x_ben} log(1 − D(x)) − E_{x_mal} log D(x).

Let z be the input noise vector. The loss function of the generator is formulated as

L_G = E_{x_mal} log D(G(x, z)).

MalGAN is evaluated on a dataset containing 180,000 examples. Several ML models are trained as the victim classifiers. Results show that MalGAN achieves nearly 100% attack success rates on all of these classifiers.

Yuan et al. [37] propose the GAPGAN attack framework based on the GAN architecture. The generator generates adversarial payloads (a_adv) at the byte level. To preserve the original functionalities, GAPGAN appends these payloads to the end of the binary files (i.e., x_adv = [x_mal, a_adv]). The loss function of the generator is defined as

L_G = −(1 − β) · E_{x_adv} D(x) − β · E_{a_adv} D(a).

Both x_ben and x_adv are used to query the victim classifier. The discriminator is trained to fit the classifier using a distance metric. The loss function of the discriminator is defined as

L_D = E_{x_ben, x_adv} dist(D(x), F(x)).
GAPGAN is evaluated on MalConv [14] classifiers trained on four datasets. Results show that GAPGAN achieves 100% attack success rates with only 2.5% of bytes appended.
b) Pure black-box attacks: Some attackers directly use prediction results of the victim models to generate adversarial malware examples, without accessing their training data or internal information.
i) Optimization-Based Attacks: Some researchers model black-box adversarial malware attacks as optimization problems and propose several algorithms to solve them.
Rosenberg et al. [40] propose a query-efficient black-box attack method, which uses either scores or decisions as attack knowledge. Let S be the input domain; the proposed attack generates a perturbation η by solving an optimization problem:
arg min_η ||η||  s.t.  F(x + η) ≠ F(x),  x + η ∈ S.
The proposed method attacks API call-based malware classifiers. The original APIs are modified using either (i) randomly generated perturbations or (ii) benign perturbations generated by a trained SeqGAN model [124]. Besides, a backtracking algorithm limits the scale of perturbations. For score-based attacks, the authors propose an adaptive Evolutionary Algorithm (EA) to update the adversarial API sequences. The proposed attack is evaluated on several ML-based models trained on a dataset containing 500,000 PE binary files. Results show that it achieves 98% and 64% success rates for score-based and decision-based attacks, respectively. Moreover, the proposed attack requires fewer queries and less attack knowledge.
Kucuk and Yan [41] propose a black-box targeted attack method against three different PE malware classifiers, which use opcodes, API calls, and system calls for classification. The authors assume that attackers have access to the source code of malware examples. Attackers also need samples from the target class, which are called anchor samples. To bypass opcode-based classifiers, the authors first extract CFGs from original examples at the LLVM-IR level. The bogus control flow method is utilized to add fake basic blocks to the original CFGs. The modification leverages always-true conditions to preserve the functionalities of original examples. Attackers compute the KL divergence to measure the fitness of modified examples. The authors also propose a genetic algorithm to craft adversarial malware examples. Besides, for API call and system call-based classifiers, attackers use the same genetic algorithm with different modification strategies and fitness functions. The proposed method is evaluated on three different victim classifiers. Results show that it achieves 75%, 83.3%, and 91.7% attack success rates, respectively.
Xu et al. [123] propose a black-box attack method against PDF malware classifiers. Attackers stochastically modify PDF malware to generate adversarial variants that can bypass the victim model while preserving the functionalities of original examples. The proposed attack needs prediction probabilities (i.e., soft labels) as attack knowledge. PDF malware examples are first converted into tree-like structures. Attackers modify them by inserting and deleting elements in the constructed PDF tree. The genetic programming algorithm iteratively selects subsets of PDF variants with higher fitness values until successful adversarial examples are crafted. It is shown that the proposed attack achieves 100% success rates on the PDFrate [125] and Hidost [126] classifiers.
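The evolutionary search behind such attacks can be sketched as follows (an illustrative sketch, not the EvadeML code): mutate variants of the PDF tree, keep the variants the classifier scores as least malicious, and stop once a variant falls below the decision threshold. Both mutate and score are hypothetical stand-ins for a tree-mutation operator and a query to the victim classifier.

import random

def evolve_variants(seed, mutate, score, pop_size=48, generations=20, threshold=0.5):
    # score(x) is the classifier's maliciousness probability; lower is better for the attacker
    population = [mutate(seed) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score)
        if score(ranked[0]) < threshold:
            return ranked[0]                      # evasive variant found
        survivors = ranked[: pop_size // 4]       # keep the fittest quarter
        population = [mutate(random.choice(survivors)) for _ in range(pop_size)]
    return None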
Demetrio et al. [17] model adversarial malware attacks as an optimization problem with limited query times and payload length. The Genetic Adversarial Machine learning Malware Attack (GAMMA) framework is proposed to solve the optimization problem. GAMMA modifies binary files by inserting adversarial payloads, including padding content and injecting benign sections, in order to preserve the functionalities of original examples. GAMMA is evaluated against both MalConv [14] and DT-based malware classifiers trained on the EMBER dataset [64]. It can effectively bypass these classifiers with short payloads and few queries. Moreover, it is also shown that the generated adversarial examples can be transferred to attack commercial anti-virus engines.
ii) Action-Based Attacks: Some researchers propose to take actions to manipulate malware examples and generate adversarial examples. Reinforcement Learning algorithms are also introduced to guide the attackers to take these actions.
Anderson et al. [39] propose to use Reinforcement Learning (RL) [127] algorithms to generate adversarial examples in pure black-box settings. The authors propose to model adversarial malware attacks as a Markov Decision Process (MDP). The victim classifier, an ML-based model built on hybrid static features, serves as the environment in the RL framework. The agent utilizes deep Q-learning algorithms [128] to select actions to modify original examples and generate adversarial examples. Rewards from the environment are set to ten for successful attacks and zero for failed attacks. Ten different actions are selected to manipulate malware examples while keeping their functionalities.
Song et al. [42] propose a black-box attack framework against real-world anti-virus engines. The authors first define a series of macro-actions and micro-actions. Macro-actions are large-scale perturbations, while micro-actions are atomic perturbations applied to binary files. A binary rewriter randomly selects macro-actions to modify the original examples, until an adversarial example is successfully crafted. Then, an action minimizer iteratively evaluates the effectiveness of macro-actions and removes unnecessary ones. A feature interpreter finds the precise reasons for adversarial attacks by breaking the selected macro-actions into micro-actions. The proposed method is evaluated on both ML-based classifiers and commercial anti-virus engines. Results show that the attack method achieves 56% success rates on open-source models and up to 28.8% success rates on anti-virus engines. Besides, it is also shown that the generated adversarial examples are transferable among different classifiers.
Song et al. [42] extend their action-based method and propose the RL-based MAB-Malware framework. The authors model adversarial malware attacks as a Multi-Armed Bandit (MAB) problem. MAB-Malware chooses action-content pairs to modify input examples. The MAB-Malware framework also includes a binary rewriter and an action minimizer. The modified binary rewriter uses the Thompson Sampling algorithm to sample a value and use it to choose optimal action-content pairs, modifying examples iteratively until successful adversarial examples are generated. Then, the action minimizer reduces the scale of perturbations by replacing a macro-action with multiple micro-actions. MAB-Malware is evaluated on both ML-based models and anti-virus engines. Experimental results show that it achieves over 97% attack success rates on the MalConv [14] classifier.
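A minimal Thompson Sampling sketch in the spirit of MAB-Malware (an assumption-laden illustration, not the authors' implementation): each action-content pair is an arm with a Beta posterior over its Bernoulli evasion-success rate, and the arm with the largest posterior sample is chosen next.

import random

def thompson_select(successes, failures):
    # draw one Beta(1+s, 1+f) sample per arm and pick the arm with the largest draw
    draws = {arm: random.betavariate(1 + s, 1 + failures[arm])
             for arm, s in successes.items()}
    return max(draws, key=draws.get)

# usage sketch: after querying the classifier with the rewritten sample,
# increment successes[arm] on evasion, otherwise increment failures[arm]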
C. Challenges
1) Query-Efficiency: In most attack methods, attackers need to continuously generate and update adversarial examples based on the results of querying the victim model. However, excessive numbers of queries are easily detected and blocked by defenders, which makes the attack methods ineffective in real-world scenarios. We observe that only a few research works have considered the query-efficiency problem when developing adversarial malware attack methods [40]. We believe that query times should be introduced into attack methods as an additional constraint, especially for black-box attacks.
2) Inverse-Mapping Problem: There are huge differences between adversarial attacks in the ML domain and the cyber security domain. Specifically, in the ML domain, the input space is the same as the feature space. Thus, attackers can generate adversarial perturbations in the feature space and directly add them to original examples. The modified examples are likely to be valid and difficult to recognize by humans. However, in the cyber security domain, the feature space is not directly related to the input space. Since programs follow strict structural specifications, inverse-mapping manipulations are more difficult. Small modifications (even of only one byte) may cause programs to crash or disrupt their behaviors. Therefore, in order to manipulate original malware examples and produce usable adversarial examples, it is essential to preserve their original functionalities.
Some attack methods only try to craft adversarial examples in the feature space, without keeping their functionalities in the input space [121]. To address this issue, four input-space constraints are proposed in [129], including available transformations, preserving semantics, plausibility, and robustness to preprocessing. To satisfy these constraints, the functionalities and semantics of original software examples need to be preserved. The generated adversarial examples should be able to bypass human analysts and be robust to preprocessing-based defense. However, these constraints may introduce additional side-effect features [129].
Moreover, it is much more computationally inefficient to dynamically execute and verify the modified software examples. To this end, several techniques are proposed to manipulate software examples while preserving their functionalities, such as appending (or injecting) bytes to unused sections [17], [118], basic block-level insertion [41], [122], taking limited modifying actions [39], [42], etc. However, some of these manipulations can be easily detected and disabled by preprocessing-based defense [129]. These attack methods are not practical in real-world attacks, which means they have poor robustness to defense methods. We believe that developing better inverse-mapping methods is still an open issue.
V. ENHANCING ADVERSARIAL ROBUSTNESS OF ML-BASED MALWARE CLASSIFIERS
Various methods are proposed to enhance ML-based malware classifiers against adversarial attacks. Some of these methods are derived from the ML domain, where defense methods are widely studied to enhance ML models, such as adversarial training, adversarial example detection, ensemble learning, etc. In the cyber security domain, some researchers also leverage the specific properties of malware examples to develop defense methods.
In this section, we first survey commonly-used methods to improve the robustness of classifiers in the ML domain. Then we summarize defense methods to enhance the adversarial robustness of ML-based malware classifiers under the unified framework introduced in Section II. The notations used in this section are introduced in Table IV.
A. Enhancing ML Models
Multiple methods are proposed to enhance the robustness of ML models. Based on the defense algorithms they use, we categorize them into adversarial training, randomization-based defense, adversarial example detection, etc.
1) Adversarial Training: Szegedy et al. [30] show that training on both adversarial examples and original examples can help to improve the robustness of classifiers.
a) FGSM-based adversarial training: Goodfellow et al. [43] modify the loss function of the original training process of ML-based models by adding an adversarial objective function based on the FGSM attack. The modified loss function L' is formulated as
L'(θ, x, y_true) = α·L(θ, x, y_true) + (1 - α)·L(θ, x + ε·sign(∇_x L(θ, x, y_true)), y_true),
where ε is the perturbation magnitude and α is a hyperparameter to control the weights of the training loss and the adversarial loss. Adversarial training minimizes the classification loss when input examples are perturbed by attackers. The proposed defense is considered a special data augmentation technique that detects and fixes flaws in the classifier. Kurakin et al. [106] also recommend using the batch normalization technique to apply adversarial training to large-scale datasets.
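A minimal PyTorch sketch of this mixed clean/adversarial objective (illustrative hyperparameters, assuming a differentiable classifier and cross-entropy loss; not the exact setup of the surveyed papers):

import torch
import torch.nn.functional as F

def fgsm_training_loss(model, x, y, alpha=0.5, eps=0.1):
    # alpha * L(x) + (1 - alpha) * L(x + eps * sign(grad_x L))
    x = x.clone().detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(clean_loss, x, retain_graph=True)
    x_adv = (x + eps * grad.sign()).detach()      # FGSM perturbation, cut from the graph
    adv_loss = F.cross_entropy(model(x_adv), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss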
b) PGD-based adversarial training: Madry et al. [44] propose a saddle point optimization problem to model the objective of adversarial robustness. Let S be a set of available perturbations. The optimization problem is defined as
arg min_θ E_{x, y_true}[ max_{η ∈ S} L(θ, x + η, y_true) ].
As shown above, the saddle point problem defines a clear goal for robust classifiers. Specifically, it includes an inner maximization problem and an outer minimization problem. The inner maximization problem finds the optimal attack against a given victim model, while the outer minimization problem updates the parameters to minimize the adversarial loss.
It is shown that stronger adversaries and larger models make the defense method more effective. Besides, Madry et al. also propose that adversarial training makes the decision boundary more complicated so that the model is more robust to adversarial examples.
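One outer step of this saddle-point training can be sketched in PyTorch as follows (a sketch under illustrative hyperparameters, not the published recipe): the inner loop approximately solves the maximization over bounded perturbations with projected gradient ascent, and the outer step minimizes the loss on the resulting adversarial examples.

import torch
import torch.nn.functional as F

def pgd_adversarial_step(model, optimizer, x, y, eps=0.1, step=0.02, iters=7):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):                               # inner maximization
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    optimizer.zero_grad()                                # outer minimization
    F.cross_entropy(model(x + delta.detach()), y).backward()
    optimizer.step()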
c) Free adversarial training: Shafahi et al. [130] argue that PGD-based adversarial training has a high computational overhead. They propose the Free Adversarial Training (FreeAT) method, which reuses the calculated gradient information to solve both the inner and outer optimization problems. Specifically, in each iteration, FreeAT first calculates the gradient of the protected model and updates the parameters, in order to solve the outer minimization problem:
θ ← θ - τ · E_{x, y_true}[ ∇_θ L(θ, x + η, y_true) ],
where τ is the learning rate. Then, FreeAT solves the inner maximization problem by updating the adversarial perturbation with the gradient from the same backward pass:
η ← η + ε · sign(∇_x L(θ, x + η, y_true)).
d) Fast adversarial training: Wong et al. [131] propose the Fast Adversarial Training (FastAT) method, which combines FGSM-based adversarial training with random initialization, in order to acquire stronger adversaries and train more robust models. In each iteration, a perturbation is randomly initialized as
η ~ Uniform(-ε, ε).
Then, the FGSM method is used to update the perturbation. The parameters of the protected model are updated based on the crafted adversarial examples using the Stochastic Gradient Descent (SGD) algorithm. Experimental results show that FastAT achieves comparable accuracy to FreeAT and PGD-based defense methods with lower computational costs.
e) YOPO adversarial training: Zhang et al. [132] propose the You Only Propagate Once (YOPO) algorithm to reduce the high computational costs of adversarial training. The authors observe that adversarial perturbations are only coupled with the first layer of the neural network. The YOPO algorithm uses a slack variable to iteratively update adversarial perturbations, without requiring full gradient calculations. After several updates of the adversarial perturbations, one full forward and backward propagation is computed to update the parameters of the model. YOPO achieves comparable accuracy to the PGD-based defense with less time overhead.
f) Rob-GAN: The GAN architecture is also used to enhance the adversarial robustness of ML models. Liu and Hsieh [48] propose the Rob-GAN framework. During the training phase, the generator aims to generate fake examples that can fool the discriminator. An attacker generates adversarial examples using the PGD attack. The discriminator learns to distinguish both fake and adversarial examples from clean ones. Experimental results show that Rob-GAN effectively enhances the adversarial robustness of the discriminator model. Besides, adversarial training makes the GAN network faster to train.
2) Defensive Distillation: Model distillation [133] is proposed to transfer knowledge from a large teacher model to a small student model. The distillation technique compresses the scale of the teacher models while preserving their performance. Some researchers also propose to utilize distillation to enhance the adversarial robustness of DL models.
Papernot et al. [110] propose the defensive distillation framework to defend against adversarial attacks. The defensive distillation framework consists of an initial network and a distilled network. The initial network (F), which is a DNN-based model, is trained on input examples with prediction labels (i.e., hard labels). Let T be the temperature parameter and let k be the number of classes; the initial network predicts probability vectors whose components are formulated as
F(x)_i = e^{y_i/T} / Σ_{j=0}^{k-1} e^{y_j/T},  i ∈ {0, ..., k-1}.
Training examples and the soft labels predicted by the initial network are combined as a new dataset to train the distilled network (F_d). The distilled network has the same architecture as the initial network. Soft labels provide more detailed information to train the distilled model and help smooth the decision boundary, which reduces the gradient information exploited by adversarial malware attacks.
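The temperature-scaled softmax that produces these soft labels can be sketched as follows (T = 20 is only an illustrative value; higher temperatures yield flatter label distributions):

import numpy as np

def soften(logits, T=20.0):
    # temperature-scaled softmax used to generate soft labels for the distilled network
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                 # numerical stability
    p = np.exp(z)
    return p / p.sum()

# soften([4.0, 1.0, 0.5], T=20) is much flatter than the ordinary (T=1) softmax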
3) Randomization-Based Defense: Some researchers use randomization-based techniques to increase the difficulty of adversarial attacks. Xie et al. [134] propose that randomizing input examples before they are fed into the victim model can enhance the model against adversarial attacks. The defense method adds several randomization layers to the original model, including random resizing layers, random padding layers, and random selection layers. Input examples are first sent into these layers to produce randomly transformed examples before they are fed into the original model. The authors argue that the randomization-based defense does not require retraining the model, which is more efficient.
4) Adversarial Example Detection: Xu et al. [135] propose the feature squeezing method to identify adversarial examples and enhance DL-based models. Two different techniques are used to squeeze the feature space of input examples, including squeezing color depth and spatial smoothing. Feature squeezing reduces the searching space that is leveraged by attackers to generate adversarial perturbations. Adversarial examples can be identified by measuring the difference between predictions on original examples and squeezed ones. The authors also propose that combining adversarial training with feature squeezing can further improve the effectiveness of both defense methods.
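The detection rule can be sketched as follows (a minimal illustration of the idea; the squeezer shown here, bit-depth reduction, and the L1 threshold are assumptions rather than the exact published configuration):

import numpy as np

def is_adversarial(predict, x, squeeze, threshold=1.0):
    # compare predictions on the raw input and on a squeezed copy; a large gap is suspicious
    p_raw = np.asarray(predict(x))
    p_sq = np.asarray(predict(squeeze(x)))
    return np.abs(p_raw - p_sq).sum() > threshold

def reduce_depth(x, bits=4):
    # example squeezer: quantize each feature to 4-bit depth
    levels = 2 ** bits - 1
    return np.round(np.asarray(x) * levels) / levels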
Meng and Chen [136] propose the MagNet framework to detect adversarial examples and enhance classifiers. The core component of MagNet is a detection model, which determines whether an input example is adversarial or clean. The authors propose two main causes of adversarial attacks, which are (i) examples that are not related to the task, and (ii) examples that are too close to the decision boundary. Two different detectors are used to identify these two types of adversarial examples. One detector uses the autoencoder architecture to learn the distribution of clean examples. Another detector uses probability-divergence to identify unrelated examples. Examples that pass these two detectors are regarded as clean ones.
MagNet can effectively defend against white-box attacks, such as DeepFool, C&W, and FGSM.
Fig. 5. Different types of defense methods to enhance the adversarial robustness of ML-based malware classifiers.
5) Adversarial Example Purification: Naseer et al. [137] propose a self-supervised defense framework to remove adversarial perturbations from input examples. The Self-Supervised Perturbation (SSP) method automatically generates adversarial perturbations using the FGSM attack. SSP provides a supervision signal to train the Neural Representation Purifier (NRP), which learns to automatically remove adversarial perturbations and recover clean examples. Specifically, NRP consists of three basic components: (i) a generator that recovers adversarial examples, (ii) a fixed feature extractor that generates feature vectors, and (iii) a discriminator that distinguishes adversarial examples from clean ones. The NRP method is easy to deploy and able to deal with unseen attacks.
Samangouei et al. [47] propose the Defense-GAN framework to learn the distribution of clean examples and recover adversarial examples to clean ones. Defense-GAN utilizes the WGAN framework [138] as the generative model. The generator aims to minimize the reconstruction loss and learn the distribution of clean examples. The discriminator aims to distinguish natural input examples from artificially crafted ones. The trained generator can reduce perturbations of adversarial examples, which have different distributions from clean ones. In the deployment phase, input examples are first fed into the generator to produce purified examples and then sent into the protected classifier. It is shown that Defense-GAN has little impact on the performance of the protected classifier.
6) Ensemble Learning: Abbasi and Gagné [45] propose the Specialists+1 defense method, which uses an ensemble of diverse ML-based classifiers to improve the adversarial robustness. The authors propose to use confusion matrices to select feature subsets and train multiple classifiers. The predictions of individual classifiers are combined by voting.
However, it is worth noting that ensemble learning does not necessarily lead to better robustness. He et al. [139] evaluate the effectiveness of ensemble learning-based defense against adversarial attacks. Three kinds of ensemble defense methods are selected, including (i) multiple feature squeezing techniques, (ii) the Specialists+1 ensemble defense, and (iii) an ensemble of three DL-based classifiers. All these defense methods are evaluated on white-box attacks such as DeepFool and FGSM. Experimental results show that these attack methods can successfully generate adversarial examples with low costs, which indicates that an ensemble of weak classifiers cannot provide a more robust classifier.
Tramèr et al. [46] propose the ensemble adversarial learning method to enhance the robustness of DL-based models. It augments training data with adversarial examples crafted on other pre-trained models, in order to decouple the data augmentation process from the victim model. Then, the proposed defense method uses the generated examples to update the protected model by solving the saddle-point optimization problem. Ensemble adversarial learning leverages the transferability of adversarial examples to enhance the robustness of the protected model.
B. Enhancing ML-Based Malware Classifiers
Several defense methods are proposed to improve the adversarial robustness of malware classifiers. These methods aim to enhance different phases of malware classifiers, as summarized in Section II. It is worth mentioning that the feature collection phase is not enhanced since defenders should not change the types of features used by malware classifiers. As shown in Figure 5, we classify these defense methods into four categories, including preprocessing-based, feature-based, classification-based, and decision-based methods. We summarize these methods in Table VII and compare different types of defense methods in Table VIII.
1) Preprocessing-Based Defense: Some defense methods detect adversarial examples in the data preprocessing phase before they are fed to the malware classifiers.
TABLE VII
METHODS TO ENHANCE ROBUSTNESS OF MALWARE CLASSIFIERS
TABLE VIII
COMPARISON AMONG DEFENSE METHODS TO ENHANCE ML-BASED MALWARE CLASSIFIERS
Smutz and Stavrou [49] propose to identify adversarial examples by measuring the performance of ensemble classifiers. Multiple individual classifiers predict classification results, which are then integrated into a final prediction by voting. The mutual agreement analysis method is proposed to measure the diversity of predictions made by individual classifiers.
In addition to the original prediction result, the ensemble model produces a mutual agreement rate (A) to measure the uncertainty of the prediction. Let v ∈ [0, 1] be the fraction of votes for the predicted class; then A = |v - 0.5| * 2. A prediction is considered to be suspicious when A is below a specific threshold. The proposed method is evaluated on both PDFrate and Drebin malware classifiers. Experimental results show that the defense method reduces the effectiveness of gradient descent and KDE-based attacks [33].
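A minimal sketch of this check, under the assumption (made here for illustration) that v is the fraction of ensemble members voting for the predicted class:

def mutual_agreement(votes):
    # votes: list of 0/1 ensemble votes; A near 1 means agreement, near 0 means a split vote
    v = sum(votes) / len(votes)
    return abs(v - 0.5) * 2

# mutual_agreement([1, 1, 1, 1]) -> 1.0 (confident)
# mutual_agreement([1, 0, 1, 0]) -> 0.0 (suspicious)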
Chen et al. [140] propose the two-phase KuafuDet defense framework to detect adversarial malware examples and use them to retrain the model, in order to improve the adversarial robustness. 564 syntax and semantic features are collected from the input Android app examples, and 195 features among them are selected as significant features for malware classification. Input examples are first fed into multiple ML-based classifiers. Then, negative examples are sent into the Camouflage Detector, which is able to identify adversarial malware examples. The detector measures the similarities between input examples based on the Jaccard index, the Jaccard-weight similarity, and the Cosine similarity. The detected adversarial examples are further used to retrain the classifiers, in order to improve their adversarial robustness. The KuafuDet framework is evaluated on a dataset containing 252,900 Android app examples downloaded from the Internet. Experimental results show that it can reduce FNR and improve accuracy by over 15% with high efficiency.
2) Feature-Based Defense: Some defense methods enhance the adversarial robustness of ML-based malware classifiers in the feature extraction phase.
a) Feature selection: Zhang et al. [50] propose an adversarial feature selection method, which uses reduced feature sets for classification, in order to improve both the robustness and generalization capabilities of ML-based malware classifiers. The authors propose that the optimal adversarial example (x*_adv) is defined as
x*_adv = arg min_{x_adv} Dis(x, x_adv).
Let δ be a binary feature selection vector, where '1' means the feature is selected. The optimal feature selection vector (δ*) can be generated as
δ* = arg max_δ Gen(δ) + λ · Rob(δ),
where Gen and Rob measure the generalization and robustness capabilities of the classifier with the feature selection vector δ. Specifically, Gen(δ) is defined as the classification accuracy of the model trained on the selected features. The robustness term Rob(δ) maximizes the evasion cost of the optimal adversarial example (x*_δ), which is formulated as
Rob(δ) = E_{x_mal}[ Dis(x_δ, x*_δ) ].
The proposed method can be implemented on ML-based classifiers using the Wrapper-based Adversarial Feature Selection (WAFS) algorithm. The defense method is evaluated on ML-based spam detectors and PDF malware classifiers. WAFS significantly enhances the adversarial robustness of the models against both white-box and black-box attacks, and the generalization capability of the models is also improved.
b) Feature reduction: Wang et al. [19] propose the Random Feature Nullification (RFN) method to enhance deep models against gradient-based adversarial attacks. An additional layer is added between the input examples and the original network to randomly choose and nullify input features. A binary matrix (I), which has the same dimensions as the input features, serves as the nullification mask matrix. The element-wise multiplication of x and I is calculated to produce nullified features. The classifier uses the nullified features for malware classification. The objective function of training a DL-based model with feature nullification can be modified as
arg min_θ L(θ, x · I, y_true).
The optimization problem can be solved by the SGD algorithm. The nullification mask is fixed in each iteration, which makes it easier to compute the derivatives and update the parameters of ML-based classifiers. The authors argue that RFN makes the model more non-deterministic and thus the effectiveness of adversarial examples is reduced. The proposed defense can be applied to different tasks such as image classification and malware classification. RFN is evaluated on DNN models trained on a malware dataset containing over 30,000 examples. Results show that it achieves over 60% accuracy against adversarial examples. RFN is simple to implement and it can also be combined with other defense methods such as adversarial training.
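The nullification step itself is simple; a minimal sketch (the nullification rate of 0.3 is illustrative, not the value used in [19]):

import numpy as np

def nullify(x, rate=0.3, rng=np.random.default_rng()):
    # sample a binary mask I and zero out a random subset of features: x * I
    mask = (rng.random(x.shape) >= rate).astype(x.dtype)
    return x * mask

# during training, a fresh mask is drawn per batch but held fixed within the
# gradient step, which keeps the objective differentiable in theta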
tures. The defense method revises the process of adversarial
The proposed method can be implemented on ML-based clas- training by keeping the conserved features unmodified when
sifiers using the Wrapper-based Adversarial Feature Selection generating adversarial examples and updating the protected
(WAFS) algorithm. The defense method is evaluated on ML- classifiers. The proposed defense method is evaluated on ML-
based spam detectors and PDF malware classifiers. WAFS based PDF malware classifiers against multiple attacks, such as
significantly enhances the adversarial robustness of the mod- EvadeML [123] and MalGAN [36]. Experimental results show
els against both white-box and black-box attacks, and the that the enhanced classifiers achieve up to 100% accuracy,
generalization capability of the models is also improved. exceeding the original adversarial training.
3) Classification-Based Defense: Some defense methods work in the classification phase. They aim to modify the victim classifiers to acquire more robust models.
a) Adversarial training: As mentioned above, Chen et al. [120] propose the EvnAttack framework, which calculates the contributions of different input features to craft strong adversarial examples. To defend against EvnAttack, the authors also propose the SecDefender framework to enhance the adversarial robustness of malware classifiers. SecDefender iteratively retrains the model with both original examples and adversarial examples, until no successful adversarial example can be generated. The authors also propose to add a security regularization term to the original loss function, which aims to maximize the costs of the strongest adversarial attack. Malware classifiers are retrained using the modified objective function. Experimental results show that SecDefender reduces the effectiveness of white-box EvnAttack methods.
Al-Dujaili et al. [34] propose the SLEIPNIR defense framework against adversarial malware attacks, which leverages adversarial training to solve the saddle-point optimization problem. For the inner-maximization problem, several white-box attack methods based on FGSM and BGA are performed to craft optimal adversarial examples. For the outer-minimization problem, SLEIPNIR uses adversarial training to retrain the classifier to make correct predictions. The modified saddle-point optimization problem is defined as
arg min_θ E_{x_adv, x_ben}[ max L(θ, x_adv, mal) + L(θ, x_ben, ben) ],
where the inner maximization is taken over the admissible adversarial versions of each malware example. The outer-minimization problem can be solved by gradient-based algorithms such as SGD. The SLEIPNIR framework is evaluated on a PE software dataset against several white-box attack methods. Experimental results show that it can significantly reduce attack success rates.
b) Defensive distillation: Grosse et al. [16] evaluate the effectiveness of defensive distillation and adversarial training methods against JSMA-based adversarial malware attacks. Experimental results show that defensive distillation has some positive effects on robustness. However, the defensive distillation defense also leads to a significant decrease in classification accuracy. For the adversarial training method, experimental results show that it significantly reduces the attack success rates. The authors argue that adversarial training is more effective than defensive distillation to enhance the adversarial robustness of ML-based malware classifiers.
c) Attack detection: Based on the insight that adversarial examples have a different distribution from clean ones, Grosse et al. [145] propose to use statistical tests to identify adversarial examples. Statistical hypothesis testing is utilized to determine whether two examples are sampled from the same distribution. Maximum Mean Discrepancy (MMD) and Energy Distance (ED) methods are also leveraged to measure the distance between different distributions. 50 adversarial examples are extracted from each class to run statistical tests to detect differences between the distributions. The classification model is also modified with an additional output value to identify adversarial examples. The proposed method is evaluated on both malware classifiers and image classifiers against white-box FGSM and JSMA attacks. Results show that the enhanced classifier achieves over 80% accuracy on identifying adversarial malware examples. The proposed defense method also significantly increases the costs of attackers. Over 150% additional perturbations need to be generated to craft adversarial examples.
Li et al. [143] propose a robust Android malware detection framework, combining a Variational Autoencoder (VAE) and a Multi-Layer Perceptron (MLP) to classify malware examples. The FD-VAE framework consists of an encoder model (Enc) and a decoder model (Dec). The encoder compresses input features into a Gaussian distribution with mean (μ) and variance (σ). The decoder samples from the distribution and reconstructs feature vectors to minimize the reconstruction loss, which is defined as
L1 = E_{x_ben}[ x log Dec(Enc(x)) + (1 - x) log(1 - Dec(Enc(x))) ].
The KL divergence is used to measure the difference between the compressed Gaussian distribution and the unit Gaussian distribution, which is computed as
L2 = KL(N(μ, σ), N(0, 1)).
The authors also propose the disentangle loss function to distinguish between clean and adversarial examples:
L3 = E_{x_i, x_j}[ Dis(x_i, x_j) ], where
Dis(x_i, x_j) = ||μ_i - μ_j||^2 if y_i = y_j, and Dis(x_i, x_j) = max(k - ||μ_i - μ_j||^2, 0) if y_i ≠ y_j.
The overall loss function of FD-VAE is computed as L_FD-VAE = λ1·L1 + λ2·L2 + λ3·L3. A trained FD-VAE model serves as both the classifier and the feature extractor. It flags an input as adversarial when the reconstruction loss exceeds a threshold value. The encoder of FD-VAE generates compressed features, which are sent to the MLP model for classification. Prediction results from FD-VAE and MLP are combined using the OR function. The proposed defense framework is evaluated on the AndroZoo dataset containing 58,447 Android apps. Results show that it improves the adversarial robustness against both white-box and black-box attacks.
results show that it significantly reduces the attack success robustness against both white-box and black-box attacks.
rates. The authors argue that adversarial training is more d) Verifiable robustness: Chen et al. [144] propose to use
effective than defensive distillation to enhance the adversarial verifiable robustness properties to train ML-based PDF mal-
robustness of ML-based malware classifiers. ware classifiers. Attackers aim to solve a search problem and
c) Attack detection: Based on the insight that adversar- perform modifications on original PDF files to craft adversar-
ial examples have a different distribution from clean ones, ial examples. Such modifications are based on the tree-like
Grosse et al. [145] propose to use statistical tests to identify structure of PDF files, such as deleting or inserting nodes into
adversarial examples. Statistical hypothesis testing is utilized the PDF tree. The robustness properties are used to ensure the
to determine whether two examples are sampled from the same worst-case robustness of the victim classifiers and increase the
distribution. Maximum Mean Discrepancy (MMD) and Energy search costs of adversarial attacks. The subtree distance metric
Distance (ED) methods are also leveraged to measure the dis- measures the difference between PDF trees and thus bounds
tance between different distributions. 50 adversarial examples the modifications performed by attackers. The distance metric
are extracted from each class to run statistical tests to detect of two PDF examples is defined as
the diversity of different distributions. The classification model
is also modified with an additional output value to identify Dis x , x = num of (rootx .subtrees ∪ rootx .subtrees)
adversarial examples. The proposed method is evaluated on − (rootx .subtrees ∩ rootx .subtrees) .
Subtrees of the two PDF examples are extracted with depth = 1. The metric defines a series of properties to measure the robustness of classifiers within a limited distance. Let S be the input set; robust models can be trained using these properties by solving a saddle-point optimization problem:
arg min_θ [ L(θ, x, y_true) + max_{x_adv ∈ S} L(θ, x_adv, y_true) ].
The defense method is evaluated on several ML-based models trained on the Contagio dataset containing 19,000 PDF examples. The authors define the Verified Robust Accuracy (VRA) to measure the accuracy of input examples classified correctly within a limited distance. Results show that the defense method achieves over 92% VRA and over 99% accuracy.
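The subtree distance is the size of the symmetric difference of the two depth-1 subtree sets; a minimal sketch (subtrees are represented as hashable identifiers, an assumption made here for illustration):

def subtree_distance(x_subtrees, y_subtrees):
    # |(A ∪ B) - (A ∩ B)|: how many depth-1 subtrees differ between the two PDF trees
    a, b = set(x_subtrees), set(y_subtrees)
    return len((a | b) - (a & b))

# subtree_distance({"/Root/Pages", "/Root/OpenAction"}, {"/Root/Pages"}) -> 1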
4) Decision-Based Defense: Stokes et al. [51] propose to use ensemble learning to enhance the robustness of ML-based malware classifiers, which aims to increase the cost of adversarial attacks. The key idea is that although it is easy for attackers to generate adversarial examples against a single classifier, it is much more difficult to attack an ensemble of multiple classifiers with different properties. In the proposed defense, predictions of the individual classifiers are aggregated by voting with a threshold value of 50%. The ensemble defense is evaluated on DNN-based models trained on a dataset containing over 2.3M software examples. Experimental results show that it significantly reduces the attack success rates of white-box gradient-based attacks. Moreover, the authors also argue that increasing the number of individual classifiers can increase the difficulty of adversarial attacks.
Li and Li [52] propose the adversarial deep ensemble defense method, which performs the Max strategy to generate optimal adversarial examples and uses them to retrain the ensemble of multiple classifiers. Let F be the function of a deep ensemble; the adversarial deep ensemble method can be modeled as a saddle-point optimization problem, which is defined as
arg min_θ [ L(F(x), y_true) + max_{h, M} L(F(h(M, x)), y_true) ].
The ensemble model has better generalization capabilities when the individual classifiers are effective and independent. The authors also propose to retrain the individual classifiers using adversarial examples generated by different attack methods. The optimization problem can be solved by the Lagrange multiplier with the parameters of individual classifiers frozen. The defense method is evaluated on DL-based classifiers trained on the Android Drebin and AndroZoo datasets. Results show that it is effective against multiple attack methods.
5) Hybrid Defense: Some defense methods work in multiple phases of the unified malware classification framework.
Khasawneh et al. [122] propose the resilient HMD (RHMD) method, which enhances the adversarial robustness of HMDs against substitution-based adversarial attacks. RHMD uses two kinds of randomization-based strategies, including random collection and random features, in order to increase the costs of adversarial attacks. Multiple malware classifiers are trained using different data collection periods and different feature sets. RHMD randomly selects these trained classifiers to make predictions. The authors propose to use the Probably Approximately Correct (PAC) learnability theory to evaluate the effectiveness of RHMD. It is proved that the diversity of classifiers makes it harder to reverse-engineer the malware classifiers, as well as to craft adversarial examples.
Chen et al. [146] propose the SecureDroid framework to improve the robustness of Android malware classifiers. SecureDroid first extracts binary feature vectors from input Android app examples. A feature selection method known as SecCLS is used to analyze the importance of input features based on (i) the cost (c) of modifying them, and (ii) their contributions (weights w) to the classification problem. The importance value (I) of feature i is computed as
I(i) ∝ |w_i| / c_i.
SecCLS aims to increase the modifications required by adversarial attacks. The authors propose that a secure classifier can be trained by reducing the probability that important features are selected. The probability of selecting feature i is inversely proportional to its importance value:
P(i) ∝ 1 / I(i).
Feature subsets are selected to train multiple classifiers using SecCLS. An ensemble learning approach called SecENS is proposed to aggregate predictions of individual classifiers. In SecENS, individual classifiers use unique feature subsets, while the ensemble of them covers the whole feature space. The SecureDroid framework is evaluated on datasets collected from Comodo. Results show that SecureDroid achieves over 80% accuracy with 50% of features modified.
trained on Android Drebin and AndroZoo datasets. Results 1) Evolution of Attackers: The game between attackers and
show that it is effective against multiple attack methods. defenders continues. Some of the defense methods are no
5) Hybrid Defense: Some defense methods work in longer effective with stronger attacks being proposed. For
multiple phases in the unified malware classification frame- example, in the ML domain, the defensive distillation tech-
work. nique is cracked by the C&W attack [20]. Another example is
Khasawneh et al. [122] propose the resilient HMD (RHMD) that six gradient obfuscation-based defenses proposed at ICLR
method, which enhances the adversarial robustness of HMDs 2017 are cracked by [148] in 2018. Carlini and Wagner [149]
against substitution-based adversarial attacks. RHMD uses use the C&W attack to crack ten defense methods.
two kinds of randomization-based strategies including ran- Similar problems also arise in the cyber security domain. To
dom collection and random features, in order to increase the verify the effectiveness of existing defenses, several methods
costs of adversarial attacks. Multiple malware classifiers are including defensive distillation, adversarial training, ensemble
trained using different data collection periods and different learning, and random feature nullification are systematically
evaluated in [150]. Both white-box and black-box attack methods, such as FGSM, BGA, JSMA, and MalGAN, perform adversarial attacks against these defense methods. Evaluation results show that only adversarial training can significantly improve robustness against most adversarial attacks, while some other methods hardly contribute to adversarial robustness.
2) Provability and Interpretability: It is worth noting that most of the surveyed defense methods are experimentally shown to be effective but not mathematically proved to be robust. This leads to potential unknown vulnerabilities in ML-based malware classifiers. Therefore, we believe that currently effective defense methods may be cracked by new attack methods in the future, and thus it is essential to develop provable defense methods. Besides, due to the weak interpretability of ML models (especially DL models), their reasoning processes are not clear. Some irrelevant features may be automatically extracted and learned as significant features. This makes it difficult to apply some of the existing ML-based malware classification methods in practice. Therefore, we propose to develop ML-based malware classifiers with better interpretability.
VI. FUTURE WORK DIRECTIONS
In this section, we discuss some future work directions in the malware classification domain from the perspectives of attackers and defenders, respectively.
A. Attackers
1) Generating Usable Adversarial Examples: As mentioned in Section IV, attackers need to manipulate malware examples in the feature space and map them back to the input space. It is more complex to generate adversarial examples in the malware domain than in the traditional ML domain, because (i) the input space of malware examples is discrete (usually binary), and (ii) the functionalities of original malware examples should be preserved. In the ML domain, adversarial attacks add small perturbations to original image examples to generate adversarial image examples. These modified examples are still valid images because small perturbations are hardly noticed by humans. However, for malware examples, a modification of even one single bit can significantly affect the original functionalities. In order to generate usable adversarial examples, we believe that preserving the functionalities of original examples should be considered as the first principle when developing adversarial malware attacks.
2) Better Inverse-Mapping Manipulations: Various inverse-mapping manipulations are proposed to modify malware examples and generate adversarial examples. Some attacks add adversarial perturbations to unused parts of the executable files at the byte level, such as appending bytes and inserting bytes into unused sections [35], [37]. Some attacks take limited actions to modify original examples [34], [39], [42]. Some attacks leverage the internal information of malware examples to add more stealthy perturbations [17], [41], [122]. In addition, to effectively bypass malware classifiers, attack methods need better robustness, which means the generated adversarial perturbations are hard to detect or remove by defenders. We believe that developing more robust inverse-mapping manipulations is still an open issue.
3) Aiming at Stronger Classifiers: Different types of malware classifiers are used as victim classifiers to evaluate the effectiveness of adversarial attack methods. Researchers select basic ML-based models [33], [36], DL-based models [34], [119], malware classifiers proposed in other research works [37], [123], and even anti-virus engines [42] as victim models. We believe that evaluating attack methods on stronger malware classifiers produces more convincing results. Some of the malware classifiers achieve high performance and are widely studied by other researchers [14], [67]. Besides, commercial anti-virus engines are black-box malware classifiers designed for detecting real-world malware examples. Bypassing such classifiers indicates that the evaluated attack method is able to generate usable adversarial malware examples in real-world settings. Moreover, these classifiers provide a unified standard for evaluating attack methods.
B. Defenders
1) Robustness and Accuracy: Existing malware classification methods have achieved good performance. However, in order to apply these methods to real-world detection tasks, they should be robust to adversarial malware attacks. As proposed in [151], perfect classifiers cannot be bypassed, and adversarial examples are inputs inaccurately classified by the classifiers. So enhancing the adversarial robustness of classifiers is essentially improving their performance. Moreover, in real-world cyber attacks, attackers usually have plenty of computing resources to generate adversarial malware examples. A single successfully created adversarial example can cause severe damage to the target system. Therefore, we recommend considering not only accuracy but also adversarial robustness when designing malware classification methods.
2) Large-Scale Malware Classification Datasets: Some researchers evaluate their malware classification methods on datasets manually collected from online malware repositories such as VirusShare and VirusTotal [23], [28], [29], while others use open-source malware datasets [25], [70], [74], [76]. Open-source datasets provide unified criteria for evaluating different malware classification methods. Some of these datasets extract high-level features of original software examples, such as Malimg [66], Ember [64] and Drebin [67]. Some datasets provide both processed binary files and extracted features, such as the SoReL-20M [65] and BIG 2015 [63] datasets. Both executable malware examples and benign software examples are difficult to collect and release due to legal issues and intellectual property protection. Moreover, as reported in [99] and [100], there are several biases in existing malware datasets, which may lead to unreliable experimental results. We recommend considering data distribution biases when constructing new malware classification datasets.
3) Adapting to Real-World Settings: To deploy malware classification methods in real-world settings, these methods need to process software examples in real time. As introduced in Section III, static-feature-based methods are more efficient,
while dynamic-feature-based methods can detect malicious behaviors. There is a trade-off between security and availability. As a part of the defense system, malware classifiers should not significantly affect the normal operation of the protected system. We recommend using a hybrid classification method, combining both static and dynamic methods. Static methods can be used as a preliminary detector, and only suspicious examples are fed into dynamic classifiers for further analysis.
In addition to availability, it is also worth noting that some systems have limited resources. For example, some IoT devices have limited storage space and computing power. Mobile devices are powered by batteries and their energy consumption should be considered. Some model compression and acceleration methods, such as model distillation [133], network quantization [152] and hardware-based model acceleration [153], can help to apply ML-based malware classification methods to such systems.
4) Virtual Analysis Environments: As mentioned in Section III, for dynamic feature-based malware classification methods, input examples are analyzed in virtual environments and then their behavioral features are collected. Some malware examples are able to identify artificially created environments and hide their malicious behaviors. Some researchers propose cloning real environments or simulating user behaviors to build virtual analysis environments [104]. However, such methods cannot adapt to the evolution of the protected system, and cloning and simulating operations have high costs. We believe that it is still an open issue to build more realistic virtual environments that can (i) dynamically evolve with the real system, and (ii) apply to different types of systems.
One promising solution is constructing a Parallel Adversarial Network (PAN) that accompanies the protected network [105]. PAN shares the same resources with the protected network and evolves with it. Another advantage of PAN is that it can be extended to different types of private networks, such as industrial networks and corporate networks.
5) Immunology-Inspired Defense Framework: The abilities of both attackers and defenders evolve over time. Defenders need to predict the evolution of attackers and deploy defense strategies before new attacks arrive. However, predicting unknown attacks is difficult because defenders usually lack information about attackers.
To address this issue, Yu et al. [105] acquire several inspirations from the immune system and propose that the capabilities of defenders can be enhanced by contesting with attackers in real environments and evolving attack and defense strategies dynamically. To enhance the adversarial robustness of a malware classifier, defenders can deploy an identical classifier in the PAN and continuously perform different adversarial attack methods on it. Successfully generated adversarial examples indicate vulnerabilities in the protected classifier. Defense capabilities further evolve by applying defense algorithms such as adversarial training. The protected classifier is enhanced in the continuous game of adversarial attack and defense.
6) Interpretable Models and Provable Robustness: As proposed in [154], it is hard to defend against adversarial attacks because a perfectly robust model should make correct predictions on every possible input example, while attackers only need to exploit one vulnerability to perform a successful adversarial attack. To prove the adversarial robustness of malware classifiers, one possible method is brute-force traversal. However, it is impossible to examine the prediction of each example in the huge (almost infinite) input space. To address this issue, ML-based malware classifiers should have better interpretability and even give explanations for making specific predictions. Basic ML models such as DT and LR can be used as malware classifiers due to their solid basis in mathematics. Moreover, in the ML domain, researchers propose various methods to improve the interpretability of ML models [155]. In the malware domain, some methods achieve verifiable robustness of malware classifiers based on the properties of specific file types [144]. However, such methods are difficult to extend to other malware classifiers. We believe that improving the interpretability and providing provable robustness of ML-based malware classifiers are still open issues.
VII. CONCLUSION
ML techniques significantly improve the capabilities of malware classifiers but also introduce security risks to the cyber security domain. In this paper, we summarize a unified malware classification framework and use the framework to survey the Defense-Attack-Enhanced-Defense process of ML-based malware classification. Specifically, we first survey ML-based malware classification methods, which are categorized into static and dynamic methods according to the input features. Then, we survey adversarial attack methods on ML-based malware classifiers in both white-box and black-box attack settings. We also survey defense methods to enhance the robustness of malware classifiers against adversarial attacks, which work in different phases of the unified framework. Finally, we summarize the main challenges faced by both attackers and defenders and further discuss some promising future work directions. According to the surveyed literature, the capabilities of attackers and defenders evolve in the battle of ML-based malware classification. We hope this study will lead to further discussions on the evolution of attack and defense of ML-based malware classification and finally realize more robust malware classification methods.
REFERENCES
[1] H. S. Lallie et al., "Cyber security in the age of COVID-19: A timeline and analysis of cyber-crime and cyber-attacks during the pandemic," Comput. Security, vol. 105, Jun. 2021, Art. no. 102248.
[2] V. Hassija, V. Chamola, V. Saxena, D. Jain, P. Goyal, and B. Sikdar, "A survey on IoT security: Application areas, security threats, and solution architectures," IEEE Access, vol. 7, pp. 82721–82743, 2019.
[3] Y. Li, Q. Luo, J. Liu, H. Guo, and N. Kato, "TSP security in intelligent and connected vehicles: Challenges and solutions," IEEE Wireless Commun., vol. 26, no. 3, pp. 125–131, Jun. 2019.
[4] Q. Luo, J. Liu, J. Wang, Y. Tan, Y. Cao, and N. Kato, "Automatic content inspection and forensics for children android apps," IEEE Internet Things J., vol. 7, no. 8, pp. 7123–7134, Aug. 2020.
[5] X-Force Threat Intelligence Index 2022, IBM Corp., Armonk, NY, USA, Feb. 2022. [Online]. Available: https://www.ibm.com/downloads/cas/ADLMYLAZ
REFERENCES
[1] H. S. Lallie et al., “Cyber security in the age of COVID-19: A timeline and analysis of cyber-crime and cyber-attacks during the pandemic,” Comput. Security, vol. 105, Jun. 2021, Art. no. 102248.
[2] V. Hassija, V. Chamola, V. Saxena, D. Jain, P. Goyal, and B. Sikdar, “A survey on IoT security: Application areas, security threats, and solution architectures,” IEEE Access, vol. 7, pp. 82721–82743, 2019.
[3] Y. Li, Q. Luo, J. Liu, H. Guo, and N. Kato, “TSP security in intelligent and connected vehicles: Challenges and solutions,” IEEE Wireless Commun., vol. 26, no. 3, pp. 125–131, Jun. 2019.
[4] Q. Luo, J. Liu, J. Wang, Y. Tan, Y. Cao, and N. Kato, “Automatic content inspection and forensics for children android apps,” IEEE Internet Things J., vol. 7, no. 8, pp. 7123–7134, Aug. 2020.
[5] IBM Corp., Armonk, NY, USA. X-Force Threat Intelligence Index 2022. Feb. 2022. [Online]. Available: https://www.ibm.com/downloads/cas/ADLMYLAZ
[6] N. Provos, D. McNamee, P. Mavrommatis, K. Wang, and N. Modadugu, “The ghost in the browser: Analysis of Web-based malware,” in Proc. USENIX HotBots, Cambridge, MA, USA, Apr. 2007, pp. 1–9.
[7] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner, “A survey of mobile malware in the wild,” in Proc. ACM SPSM, Chicago, IL, USA, Oct. 2011, pp. 3–14.
[8] A. Costin and J. Zaddach, “IoT malware: Comprehensive survey, analysis framework and case studies,” in Proc. BlackHat, Las Vegas, NV, USA, Aug. 2018, pp. 1–9.
[9] S. Basole and M. Stamp, “Cluster analysis of malware family relationships,” 2021, arXiv:2103.05761.
[10] I. H. Sarker, A. S. M. Kayes, S. Badsha, H. AlQahtani, P. A. Watters, and A. Ng, “Cybersecurity data science: An overview from machine learning perspective,” J. Big Data, vol. 7, no. 1, p. 41, 2020.
[11] N. Idika and A. P. Mathur, “A survey of malware detection techniques,” Ph.D. dissertation, Dept. Comput. Sci., Purdue Univ., West Lafayette, IN, USA, 2007.
[12] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[13] D. Gibert, C. Mateu, and J. Planes, “HYDRA: A multimodal deep learning framework for malware classification,” Comput. Security, vol. 95, Aug. 2020, Art. no. 101873.
[14] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. K. Nicholas, “Malware detection by eating a whole EXE,” in Proc. AAAI Workshop, New Orleans, LA, USA, Feb. 2018, pp. 268–276.
[15] O. Suciu, S. E. Coull, and J. Johns, “Exploring adversarial examples in malware detection,” in Proc. IEEE SP Workshops, San Francisco, CA, USA, May 2019, pp. 8–14.
[16] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial perturbations against deep neural networks for malware classification,” 2016, arXiv:1606.04435.
[17] L. Demetrio, B. Biggio, G. Lagorio, F. Roli, and A. Armando, “Functionality-preserving black-box optimization of adversarial windows malware,” IEEE Trans. Inf. Forensics Security, vol. 16, pp. 3469–3478, 2021.
[18] D. Li, Q. Li, Y. Ye, and S. Xu, “A framework for enhancing deep neural networks against adversarial malware,” IEEE Trans. Netw. Sci. Eng., vol. 8, no. 1, pp. 736–750, Jan.–Mar. 2021.
[19] Q. Wang et al., “Adversary resistant deep neural networks with an application to malware detection,” in Proc. ACM SIGKDD, Halifax, NS, Canada, Aug. 2017, pp. 1145–1153.
[20] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Proc. IEEE SP, San Jose, CA, USA, May 2017, pp. 39–57.
[21] Z. Cui, F. Xue, X. Cai, Y. Cao, G.-G. Wang, and J. Chen, “Detection of malicious code variants based on deep learning,” IEEE Trans. Ind. Informat., vol. 14, no. 7, pp. 3187–3196, Jul. 2018.
[22] A. Shabtai, R. Moskovitch, C. Feher, S. Dolev, and Y. Elovici, “Detecting unknown malicious code by applying classification techniques on opcode patterns,” Security Inform., vol. 1, no. 1, pp. 1–22, Feb. 2012.
[23] S. Jeon and J. Moon, “Malware-detection method with a convolutional recurrent neural network using opcode sequences,” Inf. Sci., vol. 535, pp. 1–15, Oct. 2020.
[24] J. Saxe and K. Berlin, “Deep neural network based malware detection using two dimensional binary program features,” in Proc. IEEE MALWARE, Fajardo, PR, USA, Oct. 2015, pp. 11–20.
[25] W. Huang and J. W. Stokes, “MtNet: A multi-task neural network for dynamic malware classification,” in Proc. DIMVA, San Sebastián, Spain, Jul. 2016, pp. 399–418.
[26] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in Proc. IEEE ICASSP, South Brisbane, QLD, Australia, Apr. 2015, pp. 1916–1920.
[27] E. Amer, I. Zelinka, and S. El-Sappagh, “A multi-perspective malware detection approach through behavioral fusion of API call sequence,” Comput. Security, vol. 110, Nov. 2021, Art. no. 102449.
[28] Q. Wang et al., “You are what you do: Hunting stealthy malware via data provenance analysis,” in Proc. NDSS, San Diego, CA, USA, Feb. 2020, pp. 1–17.
[29] X. Han et al., “SIGL: Securing software installations through deep graph learning,” in Proc. USENIX Security, Aug. 2021, pp. 2345–2362.
[30] C. Szegedy et al., “Intriguing properties of neural networks,” in Proc. ICLR, Banff, AB, Canada, Apr. 2014, pp. 1–10.
[31] V. Kuleshov, S. Thakoor, T. Lau, and S. Ermon, “Adversarial examples for natural language classification problems,” in Proc. ICLR, Vancouver, BC, Canada, Apr. 2018, pp. 1–13.
[32] Z. Lin, Y. Shi, and Z. Xue, “IDSGAN: Generative adversarial networks for attack generation against intrusion detection,” 2018, arXiv:1809.02077.
[33] B. Biggio et al., “Evasion attacks against machine learning at test time,” in Proc. ECML PKDD, Prague, Czech Republic, Sep. 2013, pp. 387–402.
[34] A. Al-Dujaili, A. Huang, E. Hemberg, and U. O’Reilly, “Adversarial deep learning for robust detection of binary encoded malware,” in Proc. IEEE SP Workshops, San Francisco, CA, USA, May 2018, pp. 76–82.
[35] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet, “Deceiving end-to-end deep learning malware detectors using adversarial examples,” in Proc. NIPS Workshops, Montreal, QC, Canada, Dec. 2018, pp. 1–6.
[36] W. Hu and Y. Tan, “Generating adversarial malware examples for black-box attacks based on GAN,” 2017, arXiv:1702.05983.
[37] J. Yuan, S. Zhou, L. Lin, F. Wang, and J. Cui, “Black-box adversarial attacks against deep learning based malware binaries detection with GAN,” in Proc. ECAI, Santiago de Compostela, Spain, Aug. 2020, pp. 2536–2542.
[38] I. Rosenberg, A. Shabtai, L. Rokach, and Y. Elovici, “Generic black-box end-to-end attack against state of the art API call based malware classifiers,” in Proc. RAID, Heraklion, Greece, Sep. 2018, pp. 490–510.
[39] H. S. Anderson, A. Kharkar, B. Filar, and P. Roth, “Evading machine learning malware detection,” in Proc. BlackHat, Las Vegas, NV, USA, Jul. 2017, pp. 1–6.
[40] I. Rosenberg, A. Shabtai, Y. Elovici, and L. Rokach, “Query-efficient black-box attack against sequence-based malware classifiers,” in Proc. ACM ACSAC, Austin, TX, USA, Dec. 2020, pp. 611–626.
[41] Y. Kucuk and G. Yan, “Deceiving portable executable malware classifiers into targeted misclassification with practical adversarial examples,” in Proc. ACM CODASPY, New Orleans, LA, USA, Mar. 2020, pp. 341–352.
[42] W. Song, X. Li, S. Afroz, D. Garg, D. Kuznetsov, and H. Yin, “MAB-malware: A reinforcement learning framework for attacking static malware classifiers,” 2020, arXiv:2003.03100.
[43] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. ICLR, San Diego, CA, USA, May 2015, pp. 1–11.
[44] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in Proc. ICLR, Vancouver, BC, Canada, Apr. 2018, pp. 1–28.
[45] M. Abbasi and C. Gagné, “Robustness to adversarial examples through an ensemble of specialists,” in Proc. ICLR, Toulon, France, Aug. 2017, pp. 1–9.
[46] F. Tramèr, A. Kurakin, N. Papernot, I. J. Goodfellow, D. Boneh, and P. D. McDaniel, “Ensemble adversarial training: Attacks and defenses,” in Proc. ICLR, Vancouver, BC, Canada, Apr. 2018, pp. 1–20.
[47] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-GAN: Protecting classifiers against adversarial attacks using generative models,” in Proc. ICLR, Vancouver, BC, Canada, Apr. 2018, pp. 1–17.
[48] X. Liu and C. Hsieh, “Rob-GAN: Generator, discriminator, and adversarial attacker,” in Proc. IEEE CVPR, Long Beach, CA, USA, Jun. 2019, pp. 11234–11243.
[49] C. Smutz and A. Stavrou, “When a tree falls: Using diversity in ensemble classifiers to identify evasion in malware detectors,” in Proc. NDSS, San Diego, CA, USA, Jan. 2016, pp. 1–15.
[50] F. Zhang, P. P. K. Chan, B. Biggio, D. S. Yeung, and F. Roli, “Adversarial feature selection against evasion attacks,” IEEE Trans. Cybern., vol. 46, no. 3, pp. 766–777, Mar. 2016.
[51] J. W. Stokes, D. Wang, M. Marinescu, M. Marino, and B. Bussone, “Attack and defense of dynamic analysis-based, adversarial neural malware detection models,” in Proc. IEEE MILCOM, Los Angeles, CA, USA, Oct. 2018, pp. 1–8.
[52] D. Li and Q. Li, “Adversarial deep ensemble: Evasion attacks and defenses for malware detection,” IEEE Trans. Inf. Forensics Security, vol. 15, pp. 3886–3900, 2020.
[53] N. Akhtar and A. Mian, “Threat of adversarial attacks on deep learning in computer vision: A survey,” IEEE Access, vol. 6, pp. 14410–14430, 2018.
[54] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 9, pp. 2805–2824, Sep. 2019.
[55] E. Raff and C. Nicholas, “A survey of machine learning methods and challenges for windows malware classification,” 2020, arXiv:2006.09271.
[56] M. Odusami, O. Abayomi-Alli, S. Misra, O. Shobayo, R. Damasevicius, and R. Maskeliunas, “Android malware detection: A survey,” in Proc. ICAI, Bogotá, Colombia, Nov. 2018, pp. 255–266.
[57] A. D. Raju, I. Y. Abualhaol, R. S. Giagone, Y. Zhou, and S. Huang, “A survey on cross-architectural IoT malware threat hunting,” IEEE Access, vol. 9, pp. 91686–91709, 2021.
[58] D. Ucci, L. Aniello, and R. Baldoni, “Survey of machine learning techniques for malware analysis,” Comput. Security, vol. 81, pp. 123–147, Mar. 2019.
[59] Y. Ye, T. Li, D. A. Adjeroh, and S. S. Iyengar, “A survey on malware detection using data mining techniques,” ACM Comput. Surv., vol. 50, no. 3, pp. 1–40, 2018.
[60] I. Rosenberg, A. Shabtai, Y. Elovici, and L. Rokach, “Adversarial machine learning attacks and defense methods in the cyber security domain,” ACM Comput. Surv., vol. 54, no. 5, pp. 1–36, 2021.
[61] L. Demetrio, S. E. Coull, B. Biggio, G. Lagorio, A. Armando, and F. Roli, “Adversarial EXEmples: A survey and experimental evaluation of practical attacks on machine learning for windows malware detection,” ACM Trans. Privacy Security, vol. 24, no. 4, pp. 1–31, 2021.
[62] D. Li, Q. Li, Y. Ye, and S. Xu, “Arms race in adversarial malware detection: A survey,” ACM Comput. Surv., vol. 55, no. 1, pp. 1–35, 2021.
[63] R. Ronen, M. Radu, C. Feuerstein, E. Yom-Tov, and M. Ahmadi, “Microsoft malware classification challenge,” 2018, arXiv:1802.10135.
[64] H. S. Anderson and P. Roth, “EMBER: An open dataset for training static PE malware machine learning models,” 2018, arXiv:1804.04637.
[65] R. E. Harang and E. M. Rudd, “SOREL-20M: A large scale benchmark dataset for malicious PE detection,” 2020, arXiv:2012.07634.
[66] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath, “Malware images: Visualization and automatic classification,” in Proc. ACM VizSec, Pittsburgh, PA, USA, Jul. 2011, pp. 1–7.
[67] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, and C. Siemens, “DREBIN: Effective and explainable detection of android malware in your pocket,” in Proc. NDSS, San Diego, CA, USA, Feb. 2014, pp. 23–26.
[68] K. Allix, T. F. Bissyandé, J. Klein, and Y. L. Traon, “AndroZoo: Collecting millions of android apps for the research community,” in Proc. ACM MSR, Austin, TX, USA, May 2016, pp. 468–471.
[69] N. Srndic and P. Laskov, “Hidost: A static machine-learning-based detector of malicious files,” EURASIP J. Inf. Security, vol. 2016, no. 1, 2016. [Online]. Available: https://jis-eurasipjournals.springeropen.com/articles/10.1186/s13635-016-0045-0
[70] D. Vasan, M. Alazab, S. Wassan, B. Safaei, and Q. Zheng, “Image-based malware classification using ensemble of CNN architectures,” Comput. Security, vol. 92, May 2020, Art. no. 101748.
[71] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE CVPR, Miami, FL, USA, Jun. 2009, pp. 248–255.
[72] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. ICLR, San Diego, CA, USA, May 2015, pp. 1–14.
[73] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 770–778.
[74] J.-Y. Kim, S.-J. Bu, and S.-B. Cho, “Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders,” Inf. Sci., vols. 460–461, pp. 83–102, Sep. 2018.
[75] I. Goodfellow et al., “Generative adversarial nets,” in Proc. NIPS, Montreal, QC, Canada, Dec. 2014, pp. 2672–2680.
[76] E. Raff, W. Fleshman, R. Zak, H. S. Anderson, B. Filar, and M. McLean, “Classifying sequences of extreme length with constant memory applied to malware detection,” in Proc. AAAI, Feb. 2021, pp. 9386–9394.
[77] C. Wang, Z. Qin, J. Zhang, and H. Yin, “A malware variants detection methodology with an opcode based feature method and a fast density based clustering algorithm,” in Proc. IEEE ICNC-FSKD, Changsha, China, Aug. 2016, pp. 481–487.
[78] M. G. Schultz, E. Eskin, F. Zadok, and S. J. Stolfo, “Data mining methods for detection of new malicious executables,” in Proc. IEEE SP, Oakland, CA, USA, May 2001, pp. 38–49.
[79] S. Huda, J. Abawajy, M. Alazab, M. Abdollalihian, R. Islam, and J. Yearwood, “Hybrids of support vector machine wrapper and filter based framework for malware detection,” Future Gener. Comput. Syst., vol. 55, pp. 376–390, Feb. 2016.
[80] M. O. F. Rokon, R. Islam, A. Darki, E. E. Papalexakis, and M. Faloutsos, “SourceFinder: Finding malware source-code from publicly available repositories in GitHub,” in Proc. RAID, San Sebastian, Spain, Oct. 2020, pp. 149–163.
[81] D. Rabadi and S. G. Teo, “Advanced windows methods on malware detection and classification,” in Proc. ACM ACSAC, Austin, TX, USA, Dec. 2020, pp. 54–68.
[82] S. Nari and A. A. Ghorbani, “Automated malware classification based on network behavior,” in Proc. IEEE ICNC, San Diego, CA, USA, Jan. 2013, pp. 642–647.
[83] B. J. Kwon, J. Mondal, J. Jang, L. Bilge, and T. Dumitras, “The dropper effect: Insights into malware distribution with downloader graph analytics,” in Proc. ACM CCS, Denver, CO, USA, Oct. 2015, pp. 1118–1129.
[84] L. Chen, S. Sultana, and R. Sahita, “HeNet: A deep learning approach on Intel processor trace for effective exploit detection,” in Proc. IEEE SP Workshops, San Francisco, CA, USA, May 2018, pp. 109–115.
[85] M. Ozsoy, K. N. Khasawneh, C. Donovick, I. Gorelik, N. Abu-Ghazaleh, and D. Ponomarev, “Hardware-based malware detection using low-level architectural features,” IEEE Trans. Comput., vol. 65, no. 11, pp. 3332–3344, Nov. 2016.
[86] D.-P. Pham, D. Marion, M. Mastio, and A. Heuser, “Obfuscation revealed: Leveraging electromagnetic signals for obfuscated malware classification,” in Proc. ACM ACSAC, Dec. 2021, pp. 706–719.
[87] A. Mohaisen, O. Alrawi, and M. Mohaisen, “AMAL: High-fidelity, behavior-based automated malware analysis and classification,” Comput. Security, vol. 52, pp. 251–266, Jul. 2015.
[88] B. Alsulami and S. Mancoridis, “Behavioral malware classification using convolutional recurrent neural networks,” in Proc. IEEE MALWARE, Nantucket, MA, USA, Oct. 2018, pp. 103–111.
[89] S. O’Shaughnessy and S. Sheridan, “Image-based malware classification hybrid framework based on space-filling curves,” Comput. Security, vol. 116, May 2022, Art. no. 102660.
[90] S. Yoo, S. Kim, S. Kim, and B. B. Kang, “AI-HydRa: Advanced hybrid approach using random forest and deep learning for malware classification,” Inf. Sci., vol. 546, pp. 420–435, Feb. 2021.
[91] L. Xu, D. P. Zhang, N. Jayasena, and J. Cavazos, “HADM: Hybrid analysis for detection of malware,” in Proc. SAI IntelliSys, London, U.K., Sep. 2016, pp. 702–724.
[92] D. Arivudainambi, K. A. V. Kumar, S. S. Chakkaravarthy, and P. Visu, “Malware traffic classification using principal component analysis and artificial neural network for extreme surveillance,” Comput. Commun., vol. 147, pp. 50–57, Nov. 2019.
[93] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proc. IEEE CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 2818–2826.
[94] F. Barbero, F. Pendlebury, F. Pierazzi, and L. Cavallaro, “Transcending transcend: Revisiting malware classification in the presence of concept drift,” 2021, arXiv:2010.03856.
[95] A. Narayanan, L. Yang, L. Chen, and L. Jinliang, “Adaptive and scalable android malware detection through online learning,” in Proc. IEEE IJCNN, Vancouver, BC, Canada, Jul. 2016, pp. 2484–2491.
[96] F. Maggi, W. Robertson, C. Kruegel, and G. Vigna, “Protecting a moving target: Addressing Web application concept drift,” in Proc. RAID, Saint-Malo, France, Sep. 2009, pp. 21–40.
[97] R. Jordaney et al., “Transcend: Detecting concept drift in malware classification models,” in Proc. USENIX Security, Vancouver, BC, Canada, Aug. 2017, pp. 625–642.
[98] X. Zhang et al., “Enhancing state-of-the-art classifiers with API semantics to detect evolved android malware,” in Proc. ACM SIGSAC, Nov. 2020, pp. 757–770.
[99] H. Aghakhani et al., “When malware is Packin’ heat; limits of machine learning classifiers based on static analysis features,” in Proc. NDSS, San Diego, CA, USA, Feb. 2020, pp. 1–20.
[100] F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, and L. Cavallaro, “TESSERACT: Eliminating experimental bias in malware classification across space and time,” in Proc. USENIX Security, Santa Clara, CA, USA, Aug. 2019, pp. 729–746.
[101] S. Zhu et al., “Measuring and modeling the label dynamics of online anti-malware engines,” in Proc. USENIX Security, Aug. 2020, pp. 2361–2378.
[102] T. Vidas and N. Christin, “Evading android runtime analysis via sandbox detection,” in Proc. ACM AsiaCCS, Kyoto, Japan, Jun. 2014, pp. 447–458.
[103] A. Yokoyama et al., “SandPrint: Fingerprinting malware sandboxes to provide intelligence for sandbox evasion,” in Proc. RAID, Paris, France, Sep. 2016, pp. 165–187.
[104] N. Miramirkhani, M. P. Appini, N. Nikiforakis, and M. Polychronakis, “Spotless sandboxes: Evading malware analysis systems using wear-and-tear artifacts,” in Proc. IEEE SP, San Jose, CA, USA, May 2017, pp. 1009–1024.
[105] Q. Yu et al., “An immunology-inspired network security architecture,” IEEE Wireless Commun., vol. 27, no. 5, pp. 168–173, Oct. 2020.
[106] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” in Proc. ICLR, Toulon, France, Apr. 2017, pp. 1–17.
[107] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in Proc. ICLR Workshop, Toulon, France, Apr. 2017, pp. 1–14.
[108] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “DeepFool: A simple and accurate method to fool deep neural networks,” in Proc. IEEE CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 2574–2582.
[109] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Proc. IEEE EuroS&P, Saarbrücken, Germany, Mar. 2016, pp. 372–387.
[110] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” in Proc. IEEE SP, San Jose, CA, USA, May 2016, pp. 582–597.
[111] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in Proc. ACM AsiaCCS, Abu Dhabi, UAE, Apr. 2017, pp. 506–519.
[112] W. Brendel, J. Rauber, and M. Bethge, “Decision-based adversarial attacks: Reliable attacks against black-box machine learning models,” in Proc. ICLR, Vancouver, BC, Canada, Apr. 2017, pp. 1–12.
[113] M. Cheng, T. Le, P.-Y. Chen, J. Yi, H. Zhang, and C.-J. Hsieh, “Query-efficient hard-label black-box attack: An optimization-based approach,” in Proc. ICLR, New Orleans, LA, USA, May 2018, pp. 1–14.
[114] C. Guo, J. Gardner, Y. You, A. G. Wilson, and K. Weinberger, “Simple black-box adversarial attacks,” in Proc. ICML, Long Beach, CA, USA, Jun. 2019, pp. 2484–2493.
[115] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song, “Generating adversarial examples with adversarial networks,” in Proc. IJCAI, Stockholm, Sweden, Jul. 2018, pp. 3905–3911.
[116] P. Mangla, S. Jandial, S. Varshney, and V. N. Balasubramanian, “AdvGAN++: Harnessing latent layers for adversary generation,” in Proc. IEEE ICCV Workshops, Seoul, South Korea, Oct. 2019, pp. 2045–2048.
[117] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh, “ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models,” in Proc. ACM AISec, Dallas, TX, USA, Nov. 2017, pp. 15–26.
[118] B. Kolosnjaji et al., “Adversarial malware binaries: Evading deep learning for malware detection in executables,” in Proc. IEEE EUSIPCO, Roma, Italy, Sep. 2018, pp. 533–537.
[119] A. Abusnaina, A. Khormali, H. Alasmary, J. Park, A. Anwar, and A. Mohaisen, “Adversarial learning attacks on graph-based IoT malware detection systems,” in Proc. IEEE ICDCS, Dallas, TX, USA, Jul. 2019, pp. 1296–1305.
[120] L. Chen, Y. Ye, and T. Bourlai, “Adversarial machine learning in malware detection: Arms race between evasion attack and defense,” in Proc. IEEE EISIC, Athens, Greece, Sep. 2017, pp. 99–106.
[121] W. Hu and Y. Tan, “Black-box attacks against RNN based malware detection algorithms,” in Proc. AAAI Workshops, New Orleans, LA, USA, Feb. 2018, pp. 245–251.
[122] K. N. Khasawneh, N. B. Abu-Ghazaleh, D. Ponomarev, and L. Yu, “RHMD: Evasion-resilient hardware malware detectors,” in Proc. IEEE/ACM MICRO, Cambridge, MA, USA, Oct. 2017, pp. 315–327.
[123] W. Xu, Y. Qi, and D. Evans, “Automatically evading classifiers: A case study on PDF malware classifiers,” in Proc. NDSS, San Diego, CA, USA, Feb. 2016, pp. 1–15.
[124] L. Yu, W. Zhang, J. Wang, and Y. Yu, “SeqGAN: Sequence generative adversarial nets with policy gradient,” in Proc. AAAI, San Francisco, CA, USA, Feb. 2017, pp. 2852–2858.
[125] C. Smutz and A. Stavrou, “Malicious PDF detection using metadata and structural features,” in Proc. ACM ACSAC, Orlando, FL, USA, Dec. 2012, pp. 239–248.
[126] N. Srndic and P. Laskov, “Detection of malicious PDF files based on hierarchical document structure,” in Proc. NDSS, San Diego, CA, USA, Feb. 2013, pp. 1–16.
[127] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[128] V. Mnih et al., “Playing Atari with deep reinforcement learning,” 2013, arXiv:1312.5602.
[129] F. Pierazzi, F. Pendlebury, J. Cortellazzi, and L. Cavallaro, “Intriguing properties of adversarial ML attacks in the problem space,” in Proc. IEEE SP, San Francisco, CA, USA, May 2020, pp. 1332–1349.
[130] A. Shafahi et al., “Adversarial training for free!” in Proc. NIPS, Vancouver, BC, Canada, Dec. 2019, pp. 3353–3364.
[131] E. Wong, L. Rice, and J. Z. Kolter, “Fast is better than free: Revisiting adversarial training,” in Proc. ICLR, Addis Ababa, Ethiopia, Apr. 2020, pp. 1–17.
[132] D. Zhang, T. Zhang, Y. Lu, Z. Zhu, and B. Dong, “You only propagate once: Accelerating adversarial training via maximal principle,” in Proc. NIPS, Vancouver, BC, Canada, Dec. 2019, pp. 227–238.
[133] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” 2015, arXiv:1503.02531.
[134] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille, “Mitigating adversarial effects through randomization,” in Proc. ICLR, Vancouver, BC, Canada, Apr. 2018, pp. 1–16.
[135] W. Xu, D. Evans, and Y. Qi, “Feature squeezing: Detecting adversarial examples in deep neural networks,” in Proc. NDSS, San Diego, CA, USA, Feb. 2017, pp. 1–15.
[136] D. Meng and H. Chen, “MagNet: A two-pronged defense against adversarial examples,” in Proc. ACM CCS, Dallas, TX, USA, Oct. 2017, pp. 135–147.
[137] M. Naseer, S. Khan, M. Hayat, F. S. Khan, and F. Porikli, “A self-supervised approach for adversarial robustness,” in Proc. IEEE CVPR, Seattle, WA, USA, Jun. 2020, pp. 262–271.
[138] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. ICML, Sydney, NSW, Australia, Aug. 2017, pp. 214–223.
[139] W. He, J. Wei, X. Chen, N. Carlini, and D. Song, “Adversarial example defense: Ensembles of weak defenses are not strong,” in Proc. USENIX WOOT, Vancouver, BC, Canada, Aug. 2017, pp. 1–11.
[140] S. Chen et al., “Automated poisoning attacks and defenses in malware detection systems: An adversarial machine learning approach,” Comput. Security, vol. 73, pp. 326–344, Mar. 2018.
[141] L. Tong, B. Li, C. Hajaj, C. Xiao, N. Zhang, and Y. Vorobeychik, “Improving robustness of ML classifiers against realizable evasion attacks using conserved features,” in Proc. USENIX Security, Santa Clara, CA, USA, Aug. 2019, pp. 285–302.
[142] A. Azmoodeh, A. Dehghantanha, and K.-K. R. Choo, “Robust malware detection for Internet of (battlefield) Things devices using deep Eigenspace learning,” IEEE Trans. Sustain. Comput., vol. 4, no. 1, pp. 88–95, Jan.–Mar. 2019.
[143] H. Li, S. Zhou, W. Yuan, X. Luo, C. Gao, and S. Chen, “Robust android malware detection against adversarial example attacks,” in Proc. ACM/IW3C2 WWW, Ljubljana, Slovenia, Apr. 2021, pp. 3603–3612.
[144] Y. Chen, S. Wang, D. She, and S. Jana, “On training robust PDF malware classifiers,” in Proc. USENIX Security, Aug. 2020, pp. 2343–2360.
[145] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel, “On the (statistical) detection of adversarial examples,” 2017, arXiv:1702.06280.
[146] L. Chen, S. Hou, and Y. Ye, “SecureDroid: Enhancing security of machine learning-based detection against adversarial android malware attacks,” in Proc. ACM ACSAC, Orlando, FL, USA, Dec. 2017, pp. 362–372.
[147] D. Li and Q. Li, “Enhancing robustness of deep neural networks against adversarial malware samples: Principles, framework, and application to AICS’2019 challenge,” in Proc. AAAI Workshops (AICS), 2019, pp. 1–9.
[148] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,” in Proc. ICML, Stockholm, Sweden, Jul. 2018, pp. 274–283.
[149] N. Carlini and D. Wagner, “Adversarial examples are not easily detected: Bypassing ten detection methods,” in Proc. ACM AISec, Dallas, TX, USA, Nov. 2017, pp. 3–14.
[150] R. Podschwadt and H. Takabi, “Effectiveness of adversarial examples and defenses for malware classification,” in Proc. SecureComm, Orlando, FL, USA, Oct. 2019, pp. 380–393.
[151] N. Papernot, P. McDaniel, A. Sinha, and M. Wellman, “Towards the science of security and privacy in machine learning,” 2016, arXiv:1611.03814.
[152] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, “Quantized neural networks: Training neural networks with low precision weights and activations,” J. Mach. Learn. Res., vol. 18, pp. 6869–6898, Jan. 2017.
[153] S. Han et al., “EIE: Efficient inference engine on compressed deep neural network,” in Proc. IEEE/ACM ISCA, Seoul, South Korea, Jun. 2016, pp. 243–254.
[154] I. Goodfellow and N. Papernot. “Is attacking machine learning easier than defending it?” cleverhans.io. Accessed: Feb. 15, 2017. [Online]. Available: http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html
[155] M. Du, N. Liu, and X. Hu, “Techniques for interpretable machine learning,” Commun. ACM, vol. 63, no. 1, pp. 68–77, 2019.

Limin Sun received the B.S. and Ph.D. degrees from the National University of Defense Technology, Changsha, China, in 1988 and 1998, respectively. He is a Professor with the Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China. He is also with the School of Cyber Security, University of Chinese Academy of Sciences, Beijing. His research interests include mobile vehicle networks, Internet of Things security, and wireless sensor networks.