Adversarial Machine Learning: Attack Surfaces, Defence Mechanisms, Learning Theories in Artificial Intelligence
Aneesh Sreevallabh Chivukula, Department of Computer Science & Information Systems, BITS Pilani Hyderabad Campus, Secunderabad, Hyderabad, Telangana, India
Xinghao Yang, Computer Science, University of Technology Sydney, Sydney, NSW, Australia
Wanlei Zhou, Computer Science, University of Technology Sydney, Sydney, NSW, Australia
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
A significant robustness gap exists between machine intelligence and human percep-
tion despite recent advances in deep learning. Deep learning is not provably secure.
A critical challenge in deep learning is the vulnerability of deep learning networks
to security attacks from malicious adversaries. Even innocuous perturbations to
the training data can be used to manipulate the behavior of the deep network in
unintended ways. For example, autonomous AI agents in unmanned autonomous
systems such as self-driving vehicles can play multistage cyber deception games
with the learning algorithms. Adversarial deep learning algorithms are specifically
designed to exploit such vulnerabilities in deep networks. These vulnerabilities are
simulated by training the learning algorithm under various attack scenarios. The
attack scenarios are assumed to be formulated by an intelligent adversary. The
optimal attack policy is obtained by solving an optimization problem. The attack scenarios have led to the development of adversarial attack technologies in computer vision, natural language processing, and cybersecurity on multidimensional, textual, image, sequence, and spatial data.
In discriminative learning models, adversarial learning problems are formulated
with deep neural networks computing statistical divergence metrics between train-
ing data features and adversarial data features. Latent space on high-dimensional
training data can also be searched by deep networks to construct adversarial
examples. Depending on the goal, knowledge, and capability of an adversary,
adversarial examples can be crafted by prior knowledge, observation, and experi-
mentation on the loss functions in deep learning. Adversarial examples are known
to transfer between data-specific manifolds of deep learning models. Thus predictive
performance of deep learning models under attack is an interesting area for research.
Randomized adversarial algorithms for discrimination can be extended with tradeoffs in efficiency, complexity, reliability, and learnability in the game theoretical
optimization. The resultant convergence properties of game theoretical optima
can be investigated with adaptive dynamic programming to produce numerical
computational methods for adversarial deep learning.
The existing adversarial learning algorithms differ in design assumptions regard-
ing adversary's knowledge, attack strategies, attack influence, and security violation
in the training stage and the testing stage. Such learning problems study feature
manipulations, misclassifications costs, and distributional robustness in adversarial
learning applications. The adversarial loss functions and training procedures in recent research are applicable to the study of the trustworthiness of deep learning in deployment. They can simulate the cyberspace security safeguards, risks, and challenges in cyber-physical systems as computational algorithm design and statistical inference problems.
This book is relevant for adversarial machine learning practitioners and adver-
sarial artificial intelligence researchers working in the design and application of
adversarial deep learning theories in machine learning, deep learning, data mining,
and knowledge discovery algorithms design. Particular emphasis is placed on the
real-world application domains of Adversarial Deep Learning in the development of
data science, big data analytics, and cybersecurity solutions. The adversarial deep
learning theories are summarized with reference to capabilities of computational
algorithms in pattern recognition, game theory, computational mathematics, and
numerical analysis. The resultant analytics algorithmics, deep neural networks,
and adversarial loss functions review the state of the art in the implementation
of adversarial algorithms, their attack surfaces, concepts, and methods from the
perspective of game theoretical machine learning. The book explores the systems
theoretic dependence between randomization in adversarial manipulations and
generalizability in blackbox optimizations of the game theoretical adversarial deep
learning. It aids future research, design, development, and innovations in the game
theoretical adversarial deep learning algorithms applicable to cyberspace security
data mining problems.
The book also serves as a reference on the existing literature that can be
implemented by researchers as baseline models to empirically compare the relevant
attack scenarios and defense mechanisms for adversarial deep learning. The known
invasive techniques and their countermeasures to develop future cybersecurity capa-
bilities are reviewed. The security issues and vulnerabilities in the machine/deep
learning solutions are mainly located within the deep layers of mathematical formu-
lation and mechanism of the learning methods. The game theoretical formulations
of the adversarial learning in the book leverage deep learning and big data to solve
for adversarial samples that effect data manipulation on the learnt discriminative
learning baselines. Several such learning baselines must be built to generate an
adversary’s attack hypothesis and consequent defense mechanisms available for
adjusting the decision boundaries in discriminative learning. Thus the research
questions covered in the book can set the stage for strategies and expectations in
the adversarial deep learning capabilities offered around cyber adversaries’ Tools,
Tactics, Techniques, and Procedures (TTPs) in the cyber kill chain. They can assess,
prioritize, and select the high-risk use case scenarios of cyber threats targeting deep
learning models in security detection/prevention layers.
One significant barrier to the widespread adoption of deep learning methods
is their complexity in both learning and reasoning phases that make it difficult
to understand and test the potential vulnerabilities and also suitable mitigations.
Learning from data for decision making within the cyberspace domain is still a
current and important challenge due to its complexity in design and development.
This challenge is also interwoven with the complexities of adversarial attacks that target manipulated results from machine/deep learning models. The resilience of machine learning models is a critical component of trustworthy systems in cybersecurity and artificial intelligence, but one that is poorly understood and under-investigated by the mainstream security research and industry communities. The book
provides a survey of the security evaluation of machine learning algorithms with the
design-for-security paradigm of adversarial learning to complement the classical
design-for-performance paradigm of machine learning. The security evaluation is useful for alleviating prediction bias in machine learning systems according to the security attributes defined for a given adversarial learning model operating in dynamic learning environments. Formalized adversarial learning assumptions around the attack surface then construct adversarial deep learning designs with reference to signal processing characteristics in the robustness properties of machine learning systems and their TTPs.
This book begins with a review of adversarial machine learning in Chap. 1
along with a comparison of new versus existing approaches to game theoretical
adversarial machine learning. Chapter 2 positions our research contributions in
contrast to related literature on (i) adversarial security mechanisms and generative
adversarial networks, (ii) adversarial examples for misleading deep classifiers and
game theoretical adversarial deep learning models, and (iii) adversarial examples in
transfer learning and domain adaptation for cybersecurity.
The adversarial attack surfaces for the security and privacy tradeoffs in adver-
sarial deep learning are given in Chap. 3. They summarize the cyber, physical,
active, and passive attack surfaces in interdependent, interconnected, and interac-
tive security-critical environments for learning systems. Such attack surfaces are
increasing vertically in number and volume and horizontally in type and functionality over the Internet, social networks, smartphones, and IoT devices. Autonomic security
in self-protecting and self-healing threat mitigation strategies must consider such
attack surfaces in control mechanisms of the networking domains to identify threats
and choose appropriate machine learning and data mining methods for adversarial
learning.
Chapter 4 describes game theoretical adversarial deep learning. The compu-
tational algorithms in our research are contrasted with stochastic optimization
techniques in the game theory literature. Several game formulations are illustrated
with examples to construct cost-sensitive adversaries for adversarial data mining.
Proper quantification of the hypothesis set in decision problems of this research
leads us into various functional problems, oracular problems, sampling tasks, and
optimization problems in the game theoretical adversarial learning. We can then
develop a theory of sample complexity, formal verification, and fuzzy automata in
the adversarial models with reliable guarantees. The resultant sampling dynamics
are applicable to the Adversarial Signal Processing of soft matching patterns and
their feature embeddings in cybersecurity attack scenarios and defense mechanisms.
In terms of information-theoretic efficiency of machine learning, this is a study
of the sample complexity of the function classes in adversarial learning games to
devise each attack scenario as a blackbox attack where the adversaries have no prior
knowledge of the deep learning training process and its best response strategies.
Chapter 5 presents theories and algorithms for adversarial deep learning. These
algorithmics can also be used to check the learning system specifications for
consistency and applicability to merge the attack data and harden the specifications
into a new adversarial learning model with vulnerability assessment metrics,
protocols, and countermeasure fusions. Example applications of the adversarial
attacks due to game theoretical adversarial deep learning proposed in our research
are presented in Chap. 6. We work in the context of statistical spam and autonomous
systems applications with images and videos. But we have found literature in
several cybersecurity analytics applications for the adversarial deep learning in real-
world domains. For instance, it is applicable in cryptanalysis, steganalysis, IoT
malware, synthetic data generators, network security, biometrics recognition, object
detection, virtual assistants, cyber-physical control systems, phishing detection,
computational red teaming, natural language generation, etc. But the data analytics
results from adversarial data mining are not always formulated in terms of game
theoretical modelling and optimization although game theory provides an excellent
abstraction for generative-discriminative modelling in adversarial deep learning that
is intractable in shallow architectures for machine learning.
Chapter 7 develops a discussion on the utilization of adversarial learning in pri-
vacy enhancing technologies. By defining the trust, resilience, and agility ontologies
for each threat agent, the privacy-preserving data mining techniques can extend our
research in game theoretical adversarial deep learning to operate in accordance with
privacy-by-design paradigm for contractual, statutory, and regulatory requirements
regarding the use of computing and internet technologies in machine learning. We
can produce security and dependability metrics ontologies to reflect the quality
of an adversarial system with respect to its privacy functionality, performance,
dependability, coupled with security costs and complexities, transparency and
fairness, interpretability, and explainability in modelling the adversarial AI agents
within multivector, multistage, and hybrid kill-chain strategies for cyberattacks.
Computational difficulties for measuring utility and associated information loss
can be addressed in game theory to provision security service offerings satisfying
lightweightness, heterogeneity, early detection of attacks, high availability, high
accuracy, high reliability, fault tolerance, resilience, robustness, scalability, and
energy efficiency. Such adversarial AI agents can discover new attacks and learn
over time to respond better to threats in cybersecurity as seen in intelligent scanners,
firewalls, anti-malware, intelligent espionage tools, and autonomous weapons.
List of Figures
Fig. 1.1 Reactive and proactive arms race between adversary and learner
Fig. 1.2 A flow chart illustrating the benefits of a game theoretic learner. The two-player game is played by a single adversary and one learner. The game produces a final deep learning network CNNsecure that is better equipped to deal with the adversarial manipulations than the initial deep learning network CNNoriginal
Fig. 2.1 Adversarial loss functions training process
Fig. 2.2 Custom loss functions learning curves
Fig. 2.3 A flowchart illustrating the adversarial autoencoder-based Stackelberg game theoretical modeling
Fig. 4.1 A flowchart illustrating the variational Stackelberg game theoretical adversarial learning
Fig. 6.1 Successful adversarial examples from [589] to mislead AlexNet [332]. The perturbations are almost imperceptible to the human vision system, but AlexNet predicts the adversarial examples as "ostrich, Struthio camelus" from top to bottom
Fig. 7.1 An example of the system architecture for AP-based file-level privacy protection
Fig. 7.2 An example of the result of the file-level privacy protection (the colors of the noises are amplified by normalization, otherwise they would be hard to see)
Fig. 7.3 Framework of AP-based object-level privacy protection algorithm
Fig. 7.4 Illustration of a typical face recognition system and the process of generating adversarial image perturbation
List of Tables
Table 1.1 Adversarial algorithms comparison 1
Table 1.2 Adversarial algorithms comparison 2
Table 2.1 Generative adversarial network comparison 1
Table 2.2 Generative adversarial network comparison 2
Table 2.3 Generative adversarial network comparison 3
Table 2.4 Generative adversarial network comparison 4
Table 6.1 Summary of the properties for different attacking methods. The properties are Targeted attack, Untargeted attack, White-box attack, Blackbox attack, and Gray-box attack
Table 6.2 Three successful adversarial text examples generated by the character-level attack, sentence-level attack, and word-level attack strategies
Table 6.3 Summary of the properties for different text attacking methods. The properties are Targeted attack, Untargeted attack, White-box attack, and Black-box attack
Chapter 1
Adversarial Machine Learning
This chapter investigates the robustness gap between machine intelligence and
human perception in machine learning for cyberspace security with game theoretical
adversarial learning algorithms. In this chapter, we shall conduct a literature
review to provide new insights on the relation between adversarial learning and
cybersecurity. We seek to survey and summarize non-stationary data representations
learnt by machine learning models. The modelling robustness shall be surveyed
to produce a summarization of adversarial examples and adversarial algorithms.
We shall also survey the use of convex optimization, stochastic optimization,
and evolutionary computing in adversarial deep learning formulations. Another
interesting study shall be that of defense mechanisms available for deep learning
models deployed in real-world environments.
Data mining is the study of automatically learning mathematical patterns from
the information in a database. It is a process of knowledge discovery that requires
developing computational algorithms [182] for preprocessing, modelling, and post-
processing data given a database system. The design of those algorithms, however,
must be based on a machine learning paradigm. Machine learning paradigms are
modes of computational learning based on some underlying statistical assumptions,
such as the level of human oversight in the training data or the data’s underlying dis-
tribution. Example paradigms include supervised learning, unsupervised learning,
semi-supervised learning, reinforcement learning, meta-learning, and deep learning.
A standard statistical assumption, called the stationarity assumption, is that the
training data used by a model to learn a mathematical pattern and the testing
data used to evaluate how well it recognizes those patterns are sampled from the
same underlying probability distribution of independent and identically distributed
(i.i.d) random variables. Yet the stationarity assumption does not hold in most
real-world applications; training and testing data seldom share exactly the same
distributions and are not often i.i.d. Therefore, a robust learning paradigm for non-
stationary data analytics has become something of a goal in adversarial learning.
Adversarial learning has applications in areas like spam filtering and virus detection.
Traditional machine learning models assume training data samples, testing data
samples, and validation data samples follow the same, independent, and identically
distributed data distribution. This assumption creates security vulnerabilities in
machine learning models subject to attack from intelligent adversaries with a
malicious intent. Given training data samples, such adversaries design adversarial
examples to increase model error. Securing learning systems from such adversarial
examples is an active area of research in artificial intelligence, security diagnostics,
generative learning, deep learning, information security, autonomous systems,
intelligent systems, and data analytics.
Adversarial examples can mislead learning models as long as the adversary's attack is planned after the learning model has completed training, so that the model cannot react to the new samples. From this observation, adversarial algorithms incorporate the adversary into the training process of learning models. Thus, adversarial algorithms model adversarial machine learning as an interaction between two agents: the learning model and one or more intelligent adversaries.
Game theory provides a framework to study interactions between the learning model (or learner for short) and the intelligent adversary (or adversary for short) in terms of the evolving strategies of the learner and the adversary. Game theoretic interactions were first formulated in the life sciences as non-linear differential equations that study interactions between populations of biological systems.
In machine learning, loss functions quantify the impact of information uncer-
tainty over a distribution of analytics predictions. Adversarial algorithms formulate
machine learning loss functions for a training process that prevents model overfitting to the training data in the presence of rational, adaptive adversaries that simulate evolving changes to the learning environment as adversarial examples.
1.1 Adversarial Learning Frameworks
This section presents a literature review and attack taxonomy of adversarial learning
algorithms. The adversarial algorithms are summarized in Tables 1.1 and 1.2 in
terms of algorithm design and algorithm application. The algorithms are primarily
compared on the adversarial cost function (or cost function for short). It is a
measure of the expected performance of the learning algorithm in the presence of an
adversary. It is formulated differently for different adversarial learning algorithms.
The tables’ columns list the various features for comparing adversarial learning
algorithms. Our algorithm is termed “game theory : deep learning.” The tables’ rows
list the various algorithms under comparison. Across the rows, we list computational
models vulnerable to adversarial data for feature extraction, deep learning, support
vector machines, and classifier ensembles where the input data for simulating
adversarial attacks is taken to be text spam, image spam, and biometric spam. The
algorithms are compared on cost function, search algorithm, convergence condi-
tions, attack strategy, attack influence, security violation, adversary’s knowledge,
algorithm moves, and learning games. The “cost function” is the objective function
to solve for adversarial data. The “search algorithm” is the algorithm used to
find an optimal solution. The "convergence conditions" are the search criteria for
creating adversarial data. The “attack strategy” is the attack scenario under which
the adversary operates. The “attack influence” of a strategy determines the access
that the adversary has to the training data and testing data input to the learning algorithm.
The “security violation” is the purpose of the adversary’s attack. The “adversary’s
knowledge" is the semantic information available to the adversary. The "algorithm moves"
are the actions taken by learning algorithm to adapt to adversarial data manipulation.
From the tables, we see that most of the existing research works do not add
game theory formulations to the cost function. Thus most of the existing learning algorithms do not explicitly model the interaction between the learner and the adversary as a game.
Table 1.1 Adversarial algorithms comparison 1. Each row lists an adversarial algorithm and its cost function, search algorithm, convergence conditions, attack strategy, attack influence, security violation, adversary's knowledge, algorithm moves, and learning games.

Classifier ensembles [59]. Cost function:
$E = 2 - \frac{2}{n}\sum_{k=1}^{n} F(k), \quad F(k) = \sum_{i=1}^{k} \frac{|w_i|}{\sum_{j=1}^{n} |w_j|}.$ (1.1)
Search algorithm: randomized sampling. Convergence conditions: ensemble size, feature subset size. Attack strategy: reorder features by importance for the discriminant function. Attack influence: causative. Security violation: targeted, availability. Adversary's knowledge: training features. Algorithm moves: average discrimination. Learning games: none.

Feature weighting [321]. Cost function:
$\min_w \frac{\lambda}{2} w^T w + \frac{1}{m}\sum_{i=1}^{m} l\big(w^T (S^{-1} x_i), y_i\big).$ (1.2)
Search algorithm: feature bagging. Convergence conditions: number of base models. Attack strategy: addition/deletion of binary features. Attack influence: causative. Security violation: indiscriminate, availability. Adversary's knowledge: training features. Algorithm moves: average estimated weights. Learning games: none.

SVM: inputs [65]. Cost function:
$\max_{x_c} L(x_c) = \sum_{k=1}^{m} \big(1 - y_k f_{x_c}(x_k)\big), \quad \frac{\partial L}{\partial u} = \sum_{k=1}^{m} \frac{\partial Q_{kc}}{\partial u}\,\alpha_c + \frac{\partial Q_{sc}}{\partial u}.$ (1.3)
Search algorithm: gradient ascent. Convergence conditions: change in test error. Attack strategy: training noise injection. Attack influence: causative. Security violation: targeted, integrity. Adversary's knowledge: gradient of the loss. Algorithm moves: incremental learning SVMs. Learning games: none.

SVM: labels [664]. Cost function:
$L(D_{tr}) = \operatorname{argmin}_{f \in F} \big[\Omega(f) + C \hat{R}(f, D_{tr})\big], \quad \hat{R}(f, D_{tr}) = \frac{1}{n}\sum_{i=1}^{n} l\big(f(x_i), y_i\big), \quad C > 0,$
$V_L(D_{tr}, D_{vd}) = \|f_{D_{tr}}\|^2 + C \hat{R}(f_{D_{tr}}, D_{vd}), \quad f_{D_{tr}} = L(D_{tr}), \quad V_L(z, y) = V_L\big((x_i, z_i), (x_i, y_i)\big).$ (1.4)
Search algorithm: gradient ascent. Convergence conditions: SVM margin support vectors. Attack strategy: label noise injection. Attack influence: causative. Security violation: targeted, integrity. Adversary's knowledge: training labels. Algorithm moves: update SVM weights and hyperplanes by LP and QP. Learning games: none.

Deep learning [227]. Cost function:
$\tilde{J}(\theta, x, y) = \alpha J(\theta, x, y) + (1 - \alpha)\, J\big(\theta, x + \epsilon\,\mathrm{sign}(\nabla_x J(\theta, x, y)), y\big).$ (1.5)
Search algorithm: backpropagation with L-BFGS. Convergence conditions: early stopping on adversarial validation set error. Attack strategy: linear perturbation on x. Attack influence: causative. Security violation: targeted, integrity. Adversary's knowledge: training and testing data. Algorithm moves: update decision function parameters. Learning games: none.

Adversarial networks: DNN [480]. Cost function:
$S_{p+1} = \{x + \lambda_{p+1}\,\mathrm{sgn}(J_F[\hat{O}(x)]) : x \in S_p\} \cup S_p, \quad \hat{O}(x) = \operatorname{argmax}_{j=0,\ldots,N-1} O_j(x),$
$\delta_x = \mathrm{sgn}\big(\nabla_x c(F, x, y)\big), \quad c(F, x, y) = p(y = 1 \mid x) = \exp\big((x - \mu)^T \beta (x - \mu)\big),$
$F(x) = f_n\big(\theta_n, f_{n-1}(\theta_{n-1}, \ldots f_2(\theta_2, f_1(\theta_1, x)))\big).$ (1.6)
Search algorithm: Jacobian-based dataset augmentation. Convergence conditions: early stopping on adversarial validation set error. Attack strategy: observe DNN outputs given inputs chosen by the adversary. Attack influence: exploratory. Security violation: targeted, integrity. Adversary's knowledge: testing data. Algorithm moves: Jacobian-based regularization and distillation. Learning games: none.
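To make the objective attributed to deep learning [227] in the table concrete, the following sketch evaluates the mixed clean/adversarial loss of Eq. (1.5) for a toy linear model in NumPy. The data, model, and hyperparameters here are illustrative assumptions rather than the networks studied later in the book.

import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data and a linear (logistic regression) model.
X = rng.normal(size=(100, 5))
theta = rng.normal(size=5)                      # model parameters (assumed given)
y = (X @ rng.normal(size=5) > 0).astype(float)  # synthetic labels

def loss_and_input_grad(theta, X, y):
    """Cross-entropy loss J(theta, x, y) and its gradient with respect to the inputs x."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    J = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = np.outer(p - y, theta) / len(y)    # dJ/dx for a linear model
    return J, grad_x

def adversarial_objective(theta, X, y, alpha=0.5, eps=0.1):
    """Mixed clean/adversarial loss of Eq. (1.5): alpha*J + (1 - alpha)*J on sign-gradient inputs."""
    J_clean, grad_x = loss_and_input_grad(theta, X, y)
    X_adv = X + eps * np.sign(grad_x)           # fast-gradient-sign perturbation of x
    J_adv, _ = loss_and_input_grad(theta, X_adv, y)
    return alpha * J_clean + (1 - alpha) * J_adv

print(adversarial_objective(theta, X, y))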
Table 1.2 Adversarial algorithms comparison 2, with the same columns as Table 1.1.

Adversarial networks: DAE [238]. Cost function:
$J_{DCN}(\theta) = \sum_{i=1}^{m}\Big( L(t^i, y^i) + \sum_{j=1}^{H+1} \lambda_j \Big\|\frac{\partial h_j^i}{\partial h_{j-1}^i}\Big\|^2 \Big).$ (1.7)
Search algorithm: stacking DAEs into a feedforward neural network. Convergence conditions: training error. Attack strategy: Gaussian additive noise smoothing the adversarial data. Attack influence: exploratory. Security violation: indiscriminate, availability. Adversary's knowledge: testing data. Algorithm moves: penalty function. Learning games: none.

Game theory: support vector machines [220]. Cost function:
$\min_w \frac{1}{2}\|w\|^2 + C\sum_i \big[1 - y_i w^T x_i + t_i\big] \quad \text{subject to} \quad t_i \ge K z_i + \sum_j v_{ij}, \quad v_{ij} \ge 0, \quad z_i + v_{ij} \ge y_i x_{ij} w_j.$ (1.10)
Search algorithm: quadratic programming. Convergence conditions: training error subject to regularization terms. Attack strategy: delete different features from different data points. Attack influence: causative. Security violation: targeted, integrity. Adversary's knowledge: training features. Algorithm moves: parameter regularization. Learning games: non-zero sum game.

Game theory: deep learning (our method). Cost function:
$\mathrm{Maxmin}: \; (\alpha^*, w^*) = \operatorname{argmax}_{\alpha \in A} J_L\big(\alpha, \operatorname{argmin}_{w \in W} J_L(\alpha, w)\big).$ (1.11)
Search algorithm: evolutionary algorithm. Convergence conditions: Nash equilibrium. Attack strategy: move positive samples toward negative samples. Attack influence: causative. Security violation: targeted, integrity. Adversary's knowledge: training data. Algorithm moves: update estimated weights for adversarial manipulation. Learning games: constant sum game.
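The max-min objective of Eq. (1.11) can be read as a nested best-response computation: the adversary searches for a manipulation alpha that maximizes the learner's payoff J_L after the learner has minimized that same payoff over its weights w. A minimal, hypothetical instance for a linear learner is sketched below; the random candidate search stands in for the evolutionary algorithm used in our method, and all data and parameters are illustrative.

import numpy as np

rng = np.random.default_rng(1)
X_pos = rng.normal(loc=+1.0, size=(50, 2))   # positive-class samples (manipulated by the adversary)
X_neg = rng.normal(loc=-1.0, size=(50, 2))   # negative-class samples
y = np.hstack([np.ones(50), -np.ones(50)])
lam = 1e-2

def payoff(alpha, w):
    """Learner payoff J_L(alpha, w): regularized squared loss on the manipulated data."""
    X = np.vstack([X_pos + alpha, X_neg])
    return np.sum((X @ w - y) ** 2) + lam * w @ w

def learner_best_response(alpha):
    """Inner argmin_w J_L(alpha, w), available in closed form for this payoff."""
    X = np.vstack([X_pos + alpha, X_neg])
    return np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Outer argmax over candidate manipulations (random search as a stand-in for evolution).
candidates = [0.5 * rng.normal(size=2) for _ in range(200)]
alpha_star = max(candidates, key=lambda a: payoff(a, learner_best_response(a)))
w_star = learner_best_response(alpha_star)
print(alpha_star, payoff(alpha_star, w_star))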
1.2 Adversarial Security Mechanisms
In addition to Tables 1.1 and 1.2, the existing adversarial learning algorithms and
their application domains can also be classified by the learner’s defense mechanisms
and corresponding adversary’s attack scenarios [33, 59, 63, 531]. Learner’s defense
mechanisms have been proposed by designing secure learning algorithms [63],
multiple classifier systems [59], privacy-preserving machine learning [531], and use
of randomization or disinformation to mislead the adversary [33].
Biggio et al. [63] discuss learner’s defense mechanism in terms of an empirical
framework extending the model selection and performance evaluation steps of
pattern classification by Duda et al. [166]. The framework recommends training
the learner for “security by design” rather than “security by obscurity.” The frame-
work recommends the following additional steps to validate the defense mechanisms
proposed in case of both generative learning models and discriminative learning
models under attack.
• Proactively anticipate the most relevant adversarial attacks through a what-if
analysis simulating potential attack scenarios.
• Define attack scenarios in terms of goal, knowledge, and capability of adversary.
• Propose a generative data distribution model on conditional probabilities that
can formally account for a large number of potential attacks and cross-validation
samples on training data and testing data.
The following assumptions are made regarding the learning algorithm's security. The
model performance is then evaluated under an optimal attack strategy simulated
according to the framework proposed by Biggio et al. [63].
• An adversary’s goal is formulated as the optimization of an objective function.
The objective function is designed on the desired security violation (that is
integrity, availability, or privacy) and attack specificity (from targeted to indis-
criminate).
• An adversary’s knowledge is defined as knowledge of the components of the
classifier, viz., training data, feature set, learning algorithm, decision function
and its parameters, and the feedback available.
• An adversary’s capability is defined as the control adversary has on training
data and testing data taking into account application-specific constraints such as
attack influence (either causative or exploratory), effect on class priors, fraction
of samples, and features manipulated by adversary.
Depending on the goal, knowledge, and capability of the adversary, these assump-
tions are also classified in terms of attack influence, security violation, and attack
specificity.
The attack influence can be causative or exploratory. A causative attack affects both training and testing data. An exploratory attack affects only testing data.
The security violation can target either integrity or availability or privacy of
the learner. A machine learning algorithm whose integrity is compromised cannot
detect malicious behavior of the adversary. The integrity of an algorithm with
many false negatives gets compromised. A machine learning algorithm whose
availability is compromised exhibits severely degraded performance for legitimate
users. The availability of an algorithm with many false positives gets compromised.
The privacy of an algorithm whose detailed feedback is made public also gets
compromised.
The attack specificity can be either targeted or indiscriminate for attacks that
influence prediction or action of the algorithm. In targeted attacks the attack is
directed at only a few instances of the training or testing data. In indiscriminate
attacks the attack is directed at an entire class of instances or objects.
Our adversarial algorithms have causative attack influence, integrity security
violation, and targeted attack specificity.
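When implementing security evaluations, the taxonomy above can be recorded as a small structured description of each attack scenario. The sketch below is only an illustrative encoding (the class name and fields are assumptions of this sketch), instantiated with the setting just stated for our adversarial algorithms; the capability string is an example value.

from dataclasses import dataclass
from typing import Literal

@dataclass
class ThreatModel:
    """Attack scenario description following the taxonomy of Biggio et al. [63]."""
    attack_influence: Literal["causative", "exploratory"]      # training+testing vs testing only
    security_violation: Literal["integrity", "availability", "privacy"]
    attack_specificity: Literal["targeted", "indiscriminate"]
    adversary_knowledge: str   # components of the classifier known to the adversary
    adversary_capability: str  # extent of control over training/testing data

# The setting stated above for the adversarial algorithms proposed in this book.
ours = ThreatModel(
    attack_influence="causative",
    security_violation="integrity",
    attack_specificity="targeted",
    adversary_knowledge="training data",
    adversary_capability="manipulate a fraction of the training samples",
)
print(ours)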
The typical adversary’s attack scenarios range across (i) adding noise to fea-
tures/labels, (ii) adding/deleting features/labels, (iii) slight change or manipulation
or perturbation to data distributions, and (iv) slight change to decision boundaries.
The corresponding optimization problems are solved using search algorithms with
sampling and gradients methods. The sampling methods range across incremental
sampling, bagging sampling, stacking sampling, and randomized sampling. The
gradient methods range across linear methods, quadratic methods, convex methods,
and stochastic methods. These optimization problems are solved by finding a local
optimum solution determined by convergence conditions ranging across (i) number
of features, (ii) number of regularization terms, and (iii) changes to estimated errors
over training/testing data.
Our adversarial algorithms cause slight change to data distributions simulated
by stochastic optimization and randomized sampling methods. Our optimization
problems converge onto solutions computed at Nash equilibria in Stackelberg
games. From the adversary’s standpoint, the equilibrium solution is a local optimum
in case of worst-case attack scenarios and a global optimum in case of best-case
attack scenarios. The strength and relevancy of our attack scenarios are determined
by the performance of the deep learning models under attack.
Papernot et al. [482] provide a threat model summarizing various attack scenarios
in adversarial learning algorithms. The adversarial classifier’s defense mechanisms
are then supposed to improve model robustness to its validation data samples.
Here, validation data samples are deployed into the trained model’s runtime data
distribution to be non-i.i.d. with respect to the testing data samples in the trained model's
training data distribution.
Papernot et al. [482] express their machine learning threat model in terms of the adversarial manipulations found during the machine learning training and inference processes. During the training process, the adversary is supposed to manipulate either online or offline data collection processes. Such an adversarial manipulation either injects adversarial examples or modifies training data with the intent of modifying the learning model's decision boundaries. During the inference process, the adversary is supposed to plan either blackbox attacks or whitebox attacks on the learning model's parameters. Such attack settings cause distribution drifts between training and runtime data distributions.
Papernot et al. [482] also view machine learning security through the prism of
confidentiality, integrity, and availability models where the adversary targets the classifier's
parameters, labels, and features, respectively. In contrast to machine learning
security, machine learning privacy is explored in terms of model performance when
(i) training and runtime data distributions differ, (ii) amount of data exposed by
learning model is bound by a differential privacy budget, and (iii) learning model’s
defenses provide fairness, interpretability, and transparency to learning outputs.
Adversarial environments affecting model complexity, model accuracy, and model
resilience are formulated in terms of no free lunch theorem for adversarial learning.
Papernot et al. [482] also motivate game theoretical adversarial learning during
machine learning inference within a probably approximately correct (PAC) learning
framework.
Biggio et al. [68] survey adversarial machine learning for pattern classifiers. The
adversarial examples for pattern classifiers are supposed to be created at either
training time or testing time. Recent research in adversarial examples for deep
network applications in computer vision and cybersecurity is also discussed. Attack
scenarios at training time are called poisoning attacks, while attack scenarios at
testing time are called evasion attacks. To integrate with deep learning terminology,
poisoning attacks are also called adversarial training attacks while evasion attacks
are also called adversarial testing attacks. Then security evaluation and defense
mechanisms of pattern classifiers under attack are discussed. Here, a proactive
security-by-design learning model incorporating adversary designs in learning
process is also presented. It is shown in Fig. 1.1.
Biggio et al. [68] categorize adversary designs as optimization problem-solving
for best attack strategy defined by adversary’s goal in attack scenario, adversary’s
knowledge of targeted learning system and adversary’s capability of manipulating
input data. Under various assumptions on such adversary designs, optimal attack
strategies are then shown to be possible for not only supervised learning algorithms
but also unsupervised learning algorithms such as clustering algorithms and feature
selection algorithms. Adversary’s goal is further categorized into (i) security
violation that compromises one of integrity, availability, and privacy of learning
system and (ii) attack specificity and error specificity that cause misclassification
of specific sets of samples and specific sets of classes, respectively. Here, the adversary's knowledge of the targeted learning system is further categorized into the following:
• Perfect-knowledge whitebox attacks with complete knowledge of learning
parameters. In this case, security evaluation provides upper bound on perfor-
mance degradation in attack scenario.
• Limited-knowledge gray-box attacks with prior knowledge about feature
representation and learning algorithm but not training data and learning param-
eters. Here security evaluation is conducted on a surrogate classifier learning a
surrogate dataset available from similar data source as training data. Adversarial
examples for surrogate classifier are then tested against targeted classifier to
evaluate transferability of attack scenarios between learning algorithms.
• Zero-knowledge blackbox attacks without any knowledge of learning
algorithm but partial knowledge of feature representation and training data
distribution. Here security evaluation checks whether optimal attack strategy
transfers between an optimally trained surrogate model and targeted classifier
model. Reinforced feedback on classifier decisions can be used to refine surrogate
model.
Biggio et al. [68] also characterize the adversary's capability by application-specific
data manipulation constraints on input data distributions, features, and classes.
A high-level formulation of adversary’s optimal attack strategy and classifier’s
security evaluation curves is also provided. Such a security evaluation considers
both differentiable and non-differentiable learning algorithms like neural networks
and decision trees, respectively. Here sensitivity analysis of deep networks is also discussed.
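A minimal sketch of the surrogate-based gray-box/black-box evaluation described in the bullets above: train a surrogate on similar data, craft adversarial examples against the surrogate, and measure how often they transfer to the targeted classifier. The linear models, the sign-gradient perturbation, and the synthetic data are assumptions of this sketch, not the models of the cited evaluations.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Targeted classifier (unknown to the adversary) and a surrogate trained on similar data.
target = LogisticRegression().fit(X[:200], y[:200])
surrogate = LogisticRegression().fit(X[200:300], y[200:300])

# Craft adversarial points against the surrogate: for a linear model, the
# loss-increasing direction for a point of signed label y is -y * sign(w).
X_test, y_test = X[300:], y[300:]
w = surrogate.coef_.ravel()
y_signed = 2 * y_test - 1
X_adv = X_test - 0.5 * y_signed[:, None] * np.sign(w)[None, :]

# Transferability: fraction of surrogate-crafted examples that also fool the target.
print("transfer rate:", np.mean(target.predict(X_adv) != y_test))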
1.3 Stochastic Game Illustration in Adversarial Deep Learning
Figure 1.2 illustrates the learning process in the game formulation of our research
as a flow chart. The CNNoriginal is trained on training data Xtrain and evaluated
on testing data Xtest to give "learner performance" in the experiments. Figure 1.2
illustrates a two-player game. The game has moves executed by each of the
adversaries and the learner during each interaction. In these moves, an adversary
targets the learner by the adversarial sample produced from the evolutionary
operators. The learner then adapts the deep learning operators for the adversarial
data by retraining the CNN on the new cross-validation sample.
A set L of M adversaries L = {L1, L2, L3, ..., LM} targets this performance
by engaging the CNN in multiple two-player sequential games. In each two-player
game, the CNNs trained on the original and generated data samples and tested
on the adversarial data are CNNmanipulated-cnn and CNNmanipulated-gan, respectively. All these CNNs are given under the umbrella term "manipulated learner performance." We find that CNNmanipulated-cnn as well as CNNmanipulated-gan are significantly worse performing than the original CNN CNNoriginal trained on
the original training and testing data (Xtrain , Xtest ). Thus we conclude adversar-
ial manipulation succeeds in attacking the learner. A new convolutional neural
network CNNsecure is then retrained on (Xtrain + A∗S , Xtest + A∗S ) to adapt to
adversarial manipulations. It is given as "secure learner performance." CNNsecure is our proposed model. It is found to be better than the manipulated CNNs CNNmanipulated-cnn and CNNmanipulated-gan.
Therefore, we conclude that the new CNNsecure has successfully adapted to adversarial data generated by multiple adversaries, while the given CNNoriginal
is vulnerable to each adversarial manipulation αi∗ generated by each adversary Li
playing a game i on the given training/testing data distributions. Our algorithm
is able to find a data sample that affects the performance of a CNN. The CNN
that is able to recover from our adversarial attack is better equipped to deal with
unforeseen changes in the underlying data distribution. The game between adversary
and learner allows us to produce adversarial data manipulations for a CNN trained
on the underlying data distribution.
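The flow of Fig. 1.2 can be summarized as the training-loop sketch below. It uses a least-squares classifier and a random-search adversary as lightweight stand-ins for the CNN and the evolutionary adversary of our experiments, so the function names, data, and parameters are assumptions made only for illustration.

import numpy as np

rng = np.random.default_rng(3)
X_train = rng.normal(size=(200, 8)); y_train = (X_train.sum(axis=1) > 0).astype(float)
X_test = rng.normal(size=(200, 8));  y_test = (X_test.sum(axis=1) > 0).astype(float)

def train(X, y, lam=1e-2):
    """Least-squares classifier standing in for training the CNN on (X, y)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ (2 * y - 1))

def accuracy(w, X, y):
    return np.mean((X @ w > 0).astype(float) == y)

def adversary_generate(w, X, y, trials=200, strength=1.5):
    """Random-search stand-in for the evolutionary adversary: find a manipulation of the
    positive samples that most degrades the current learner's accuracy."""
    best, best_acc = X, 1.0
    for _ in range(trials):
        delta = strength * rng.normal(size=X.shape[1])
        X_adv = np.where((y == 1)[:, None], X + delta, X)
        acc = accuracy(w, X_adv, y)
        if acc < best_acc:
            best, best_acc = X_adv, acc
    return best

cnn_original = train(X_train, y_train)
print("learner performance:", accuracy(cnn_original, X_test, y_test))

adv_sample = adversary_generate(cnn_original, X_test, y_test)
print("manipulated learner performance:", accuracy(cnn_original, adv_sample, y_test))

# Retrain on the union of the original and adversarial samples, as in (Xtrain + A*, Xtest + A*).
cnn_secure = train(np.vstack([X_train, adv_sample]), np.hstack([y_train, y_test]))
print("secure learner performance:", accuracy(cnn_secure, adv_sample, y_test))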
Chapter 2
Adversarial Deep Learning
Deep learning is not provably secure. Deep neural networks are vulnerable to
security attacks from malicious adversaries, which is an ongoing and critical
challenge for deep learning researchers. This chapter studies adversarial deep
learning algorithms in exploiting vulnerabilities of deep neural networks. The core
focus is on a series of game theoretical adversarial deep learning algorithms for
improved network robustness especially under zero-knowledge black-box attack
scenarios. Although there are many recent works that study network vulnerabilities,
few are proposed for zero-knowledge black-box attacks, and even fewer take a game theoretical approach. Even innocuous perturbations in training data can change the behavior of a deep network in unintended ways. This means that
imperceptibly and immeasurably small departures from the training data can result
in a completely different label classification when using the model for supervised
deep learning. The algorithmic details proposed in this chapter have been used in
game theoretical adversarial deep learning with evolutionary adversaries, stochastic
adversaries, randomized adversaries, and variational adversaries proposed in our
research. In designing the attack scenarios, the adversarial objective was to make
small, undetectable changes to the test data. The adversary manipulates representa-
tion parameters in the input data to mislead the learning process of the deep neural
network, so it successfully misclassifies the original class labels as the targeted class
labels.
Deng [156] surveys existing literature on deep learning for representation
learning and feature learning where a hierarchy of higher-level features or concepts
are defined from lower-level ones. Deep learning models, architectures, and algo-
rithms are categorized into three classes—generative, discriminative, and hybrid
models:
• Generative models characterize the joint probability distributions of the observed data and their associated classes with high-order correlation properties between the observed variables and the hidden variables.
• Discriminative models directly characterize the posterior distributions of classes conditioned on the observed data for pattern classification.
• Hybrid models pursue discrimination assisted by the outcomes of generative models, typically using generative learning to initialize or regularize discriminative training.
No free lunch (NFL) theorems for supervised learning and optimization [649–651]
state that, averaged over all learning theoretic situations represented in data samples,
machine learning models preferring simple to complex training fail as often as they
succeed. This means that the random process generating training data distribution
may not always be the same as the random process governing the testing data
distribution. There are many alternative models to consider for the analysis of data
mixed with noise. There is no guarantee that the statistical model chosen is the
right one or adequately captures patterns in all the data samples. Smoothing and
regularization techniques are a simple approach to uncover patterns in training data
with a minimum of preconceptions and assumptions as to what those patterns should
be in testing data. In general, we have to contend with a model selection criterion for
the chosen analytics algorithm.
In predictive analytics models built with supervised machine learning algorithms,
the model selection criterion carries out an optimization of the goodness of fit
to a training and a testing data sample. This is called cross-validation which
assumes that a statistical model is as good as its prediction. This model evaluation
scheme is unable to estimate counterfactual predictions when the world changes.
So Fig. 2.1 shows additional validation data samples to compare predicted classes
with actual classes. In adversarial learning, such comparisons are done with the
adversarial cost functions accounting for both class and cost distribution information
in generating the predictions of supervised learning algorithms. Thus, adversarial
data can be considered to be part of the validation data samples in model selection.
An adversarial training process trains the machine learning models on both training
data samples given by the user and validation data samples created by the adversary.
Further, the validation data samples are used to fine-tune the hyperparameters for
training the machine learning models.
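Read as a procedure, the last two sentences amount to hyperparameter selection against an adversary-created validation set. The sketch below tunes the regularization strength of a linear classifier by scoring each candidate on worst-case perturbed validation points; the closed-form perturbation is valid only for linear models, and the library, dataset, and grid are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(600, 20))
y = (X[:, :3].sum(axis=1) > 0).astype(int)
X_tr, y_tr, X_va, y_va = X[:400], y[:400], X[400:], y[400:]

def adversarial_accuracy(model, X, y, eps=0.2):
    """Accuracy under the worst-case L-infinity perturbation of a linear model:
    each validation point is moved by eps against its own class direction."""
    w = model.coef_.ravel()
    y_signed = 2 * y - 1
    X_adv = X - eps * y_signed[:, None] * np.sign(w)[None, :]
    return np.mean(model.predict(X_adv) == y)

# Fine-tune the inverse regularization strength C on the adversarial validation samples.
scores = {C: adversarial_accuracy(LogisticRegression(C=C).fit(X_tr, y_tr), X_va, y_va)
          for C in (0.01, 0.1, 1.0, 10.0)}
print(max(scores, key=scores.get), scores)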
In experimental evaluation of adversarial machine learning, we may run sta-
tistical tests to find counterfactual scenarios in the training data. Causal inference
can also be used to estimate the impact of counterfactual scenarios. For systematic
model selection in machine learning, the counterfactual modeling focus is on
estimating what would happen in the event of a change that may or may not
actually happen in the training data. Such adversarial machine learning models may
sacrifice predictive performance in the current environment for machine learning
to discover new counterfactual features in a changing validation environment for
machine learning. The resultant counterfactual policies comparing training data with
validation data can be used to define new sensitivity analysis, anomaly detection,
and concept drift applications for adversarial learning. Cost-sensitive evaluation
metrics account for severity differences in false alarms versus missed fraud cases.
Feature ranking techniques can then guide the contextual signaling of fraudulent
predictions and feature manipulations. They account for the different degrees of
sensitivity of classification algorithms to spurious features in training data samples.
In the presence of adversarial validations, deep learning exhibits slow rate of
convergence and sensitivity to noise. So we ought to create learning curves on
deep learning as in Fig. 2.2 to discover counterfactual features in color-coded
classification baselines showing performance on y-axis for parameter ranges on x-
axis. According to the bias-variance tradeoffs in machine learning, complex models
that tend to overfit noisy data exhibit high variance, while simplistic models that
lack flexibility to approximate complex processes exhibit high bias.
We wish to arrive at a goodness of fit criteria for model selection that is neither
underfitting with high bias nor overfitting with high variance to the training data
samples. Practically, we want to select the regions of Fig. 2.2 that exhibit low errors
on all of training, validation, and testing data samples. Overfitting occurs when
training error is low but testing error is high. Underfitting occurs when the training error itself is high, so that both training and testing errors remain high. In analyzing the prediction error, bias-variance
decomposition separates the analysis of bias and variance in the machine learning
model evaluation. By bootstrapping samples from the data given in cross-validation
experiments, we create training, validation, and testing data samples to estimate
models from the machine learning algorithms. Bagging, boosting, and stacking are
commonly used data sampling methods to create the cross-validation datasets. Bias-
variance decomposition is applicable to the generalization errors resulting from loss
functions in both classification and regression [160].
Learning curves represent the generalization performance of the models pro-
duced by a learning algorithm. Based on estimated probabilities for class mem-
bership, learning curves compare different classification algorithms to explore the
relationship between training dataset size and the learning/induction algorithm.
They allow us to see patterns that are common across different datasets. Without an
examination of the learning curves, we cannot draw conclusions on one algorithm
being better than another algorithm for a particular application domain. A summary
of learning curve analysis is given by Perlich et al. [491].
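Learning curves of the kind surveyed by Perlich et al. [491] can be produced, for example, with scikit-learn's learning_curve utility, assuming scikit-learn is available; the estimator and synthetic dataset below are placeholders.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Training- and validation-score estimates at increasing training set sizes,
# averaged over cross-validation folds.
sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), valid_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.3f}  validation={va:.3f}")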
Comparison between analytics modeling for robust theoretical evidence is to
be done with performance metrics on the cost imbalance due to misclassification
errors in the predictions. Predictive performance of the model’s ability to distinguish
between adversarial data and training data can be analyzed with accuracy and area
under the receiver operating curve (AUC). Additionally, performance metrics that
reflect the imbalance in class labels can be used to calculate the classification
errors. They include sensitivity, F1-score, and F2-score.
It is important to incorporate such data protection safeguards in the analytics value
chain built with cross-validation or hold out testing to choose the “most accurate”
algorithm for analyzing a given dataset.
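The class-imbalance-aware metrics named above can be computed, for instance, with scikit-learn as in the sketch below; the label and score arrays are made-up placeholders.

from sklearn.metrics import recall_score, f1_score, fbeta_score, roc_auc_score

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]            # ground-truth labels
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]            # hard predictions
y_score = [0.1, 0.6, 0.8, 0.7, 0.4, 0.2, 0.9, 0.3]  # predicted scores for the positive class

print("sensitivity (recall):", recall_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
print("F2-score:", fbeta_score(y_true, y_pred, beta=2))  # recall-weighted F-measure
print("AUC:", roc_auc_score(y_true, y_score))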
A cyber risk analytics for the information leaks in deep learning has become nec-
essary to analyze machine learning models trained on sensitive datasets. Learning
curves can consider the bias-variance decomposition in adversarial loss functions
to derive such regularizations in the learning objectives of supervised machine
learning. During model validation experiments, the information divergence between
the validation data samples and the training data samples may be computed as the
discretized data distributions obtained from sampling schemes within adversarial
deep learning.
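In the simplest case, the information divergence mentioned above can be estimated by discretizing a feature of the training and validation samples into shared bins and computing a smoothed Kullback-Leibler divergence between the two histograms; this NumPy sketch with synthetic data illustrates only that step.

import numpy as np

rng = np.random.default_rng(5)
train_feature = rng.normal(loc=0.0, size=5000)
valid_feature = rng.normal(loc=0.3, size=5000)   # e.g. an adversarially shifted validation sample

# Discretize both samples with shared bin edges, then smooth and normalize.
edges = np.histogram_bin_edges(np.concatenate([train_feature, valid_feature]), bins=30)
p, _ = np.histogram(train_feature, bins=edges)
q, _ = np.histogram(valid_feature, bins=edges)
p = (p + 1e-6) / (p + 1e-6).sum()
q = (q + 1e-6) / (q + 1e-6).sum()

print("KL(train || validation):", np.sum(p * np.log(p / q)))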
The extent to which noise on modeling parameters and their training data can
benefit the overall quality of the data distributions in sampling schemes depends
on the specific adversarial noise processes and the nature of the generated target
distribution in game theoretical adversarial learning. Our proposals study the
interplay of adversarial cost function and classification error functions to design
game theoretical classifiers that deteriorate at a slower rate than regular classifiers
on adversarial data. With deep generative modeling of the adversary’s best response
strategies, we construct the data resampling dynamics from measurement studies on
cost-sensitive adversaries for discriminative learning. We simulate encodings of the
resultant decision boundaries as storing-retrieving problems in data mining.
By studying the adversarial loss functions of deep learning in this way, we can propose defense mechanisms to make robust neural networks.
Specific to supervised learning applications, loss functions evaluate the statistical
error of predictive analytics. Typically, the loss function reduces bias in a predictive
classification model and variance in a predictive regression model [69]. Here,
adversarial loss functions reduce the predictive model’s sensitivity to model noise.
Our proposal is to analyze this type of noise in a game theoretical adversarial
deep learning paradigm. It involves the design of adversarial payoff functions that
generate adversarial data manipulations by optimizing the adversarial cost functions
for different types of adversaries. Such adversaries include evolutionary, stochastic,
randomization, variational, and generative adversaries.
The intuition of our adversarial loss functions is derived from the concept
of actions and moves in game theory. During learning, the attack scenarios are
modeled as moves made by a learning algorithm and countermoves made by an
intelligent adversary. Our game theory studies interactions between independent
self-interested agents or players working toward a goal. Each player has a set
of associated strategies/moves/actions that optimize a payoff function or utility
function for achieving the goal. The game eventually converges to an equilibrium
state from which none of the players have any incentive to deviate.
Through optimizations of the proposed adversarial learning, we can empirically
analyze the discriminative loss functions in deep learning to generate misclassified
data points and hence adversarial manipulations to the training data. Further,
in contrast to traditional deep learning methods, we propose adversarial payoff
functions that are non-differentiable and discontinuous over the search space of
the adversarial manipulations. Within an empirical risk minimization framework
for supervised learning and game theory, we study adversarial loss functions for
discriminative learning involving classification and regression.
Rich data science problems and machine learning features can be engineered
from our algorithms by modeling a wide variety of data analytics application
scenarios involving discriminative learning. For example, we propose adversarial
loss functions to learn moments and cumulants of time-dependent data distributions
in regression modeling. The proposed loss functions can be extended for non-
linear algorithm-oriented approaches to robust regression. The sensitivity of our
loss function could be customized with patterns constructed to improve application-
dependent model selection. Here, deep generative models are helpful for feature
engineering and learning generalizations in specific application domains.
Our adversarial payoff functions can model the discrimination hypotheses around
class labels and their decision boundaries in classification modeling. The proposed
payoff functions optimize the search for data manipulations on an original pixel data
space as well as a latent data space representing pixel distributions by a Gaussian
mixture model. The payoff functions were then optimized by the parameter settings
in simulated annealing, variational learning, and generative learning algorithms. The
deep neural network’s misclassification performance at the time of Nash equilibrium
was measured in terms of t-statistics hypothesized over recall, true positive rate, and
the F1-score of targeted class labels.
2.3 Adversarial Examples in Deep Networks
Carlini et al. [102] evaluate detection schemes proposed for adversarial examples that target deep network classifiers. Experiments are then proposed to explore the data space and
transferability properties of targeted adversarial examples. To formulate the attacks,
three threat models of zero-knowledge adversary, perfect-knowledge adversary,
and limited-knowledge adversary are defined. A zero-knowledge adversary has
no knowledge of detector’s presence while targeting class label predictions of
the classifier. The zero-knowledge adversary thus acts as a baseline for targeting
any proposed detector. In comparison, a perfect-knowledge adversary has full
knowledge of both classifier parameters and detector’s detection scheme. Perfect-
knowledge adversary thus performs a white-box attack. To perform a black-box
attack, Carlini et al. [102] assume that a limited-knowledge adversary knows
detector’s detection scheme but has no access to trained classifier, trained detector,
or their training data.
The detector schemes studied by Carlini et al. [102] include (i) a second neural
network to classify images as natural or adversarial, (ii) principal component
analysis (PCA) to detect statistical properties of images or network parameters,
(iii) statistical hypothesis tests (such as maximum mean discrepancy tests and
Gaussian mixture models comparing adversarial data distribution with original
data distribution), and (iv) input normalization with randomization and blurring.
An ℓ2 distance between adversarial examples and training examples is assumed to
be the adversarial loss function measuring robustness of defense in the detection
mechanisms. In an iterative gradient descent attack, the adversarial loss function has
an additional regularization term that compares the deep network’s log-likelihood
of predicting the target class with that of the next-most-likely class. A user-defined
threshold on the ranked log-likelihoods then assigns either high confidence or low
confidence to the generated adversarial examples. The properties of the adversarial
examples are recommended to be evaluated according to the following criteria:
• Evaluating adversarial examples across multiple datasets (such as MNIST and
CIFAR) with defenses that did not operate directly on pixels
• Evaluating new schemes for strength of an attack which demonstrate an adver-
sary who can generate attacks to evade detection when aware of proposed defense
• Reporting false positive rates in addition to true positive rates in the performance
evaluation.
Here, a neural network is said to be robust if finding adversarial examples that
bypass its detector is a difficult proposition.
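As a concrete illustration, a minimal sketch of this kind of iterative gradient-descent attack is given below, combining a per-example L2 distance with a margin between the target-class logit and the next-most-likely class. The model interface, optimizer, and constants are illustrative assumptions rather than the exact formulation in [102].

```python
import torch
import torch.nn.functional as F

def margin_l2_attack(model, x, target, steps=100, lr=0.01, c=1.0, kappa=0.0):
    """Minimal sketch of an iterative gradient-descent attack that combines an
    L2 distance term with a margin between the target class and the
    next-most-likely class. `model` is assumed to return raw logits and
    `target` holds the target labels; this is an illustration, not the exact
    attack of [102]."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(x + delta)
        onehot = F.one_hot(target, logits.size(1)).bool()
        target_logit = logits[onehot]                                  # shape [B]
        other_logit = logits.masked_fill(onehot, float("-inf")).max(dim=1).values
        # Margin term: push the target logit above every other logit by kappa
        margin = torch.clamp(other_logit - target_logit, min=-kappa)
        l2 = torch.norm(delta.flatten(1), dim=1)                       # per-example L2
        loss = (l2 + c * margin).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach()
```

A threshold on the resulting target-class confidence could then separate high-confidence from low-confidence adversarial examples, as discussed above.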
Baluja et al. [25] proposed a targeted attack where feedforward neural net-
works called adversarial transformation networks (ATNs) are trained to generate
adversarial examples. ATNs generate adversarial examples that minimally modify
classifier’s outputs given the original input. By contrast, Moosavi-Dezfooli et al. [439] constructed
an untargeted attack technique, i.e., DeepFool, which is optimized by distance
metrics between adversarial examples and normal examples.
In our research in [124, 125], we generate adversarial examples to effect a
poisoning attack on the classification training data. The adversarial examples are
generated by the adversarial manipulations learned during our game theoretical
attacks on the training process of the learner. In our black-box attack scenarios
generating testing data distributions, no prior knowledge about the learning model
is assumed. Our adversary knows neither the learning model’s training process
nor the learning model’s best response strategies across the Stackelberg game’s
plays. Our adversary does a targeted attack to manipulate multiple positive labels
into a single negative label. The attack strength of our adversarial manipulations is
defined in terms of the search randomization parameters in ALS and simulated annealing (SA). The scalar
optima in SA are used to generate the vector optima in ALS. The local optima in
ALS converge onto the non-convex stochastic optima solving the Stackelberg game
to produce output of optimal adversarial manipulations. The optimal adversarial
manipulations are able to encode the adversarial data in terms of the multivariate
statistical parameters of a Gaussian mixture model produced in multi-label datasets.
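A minimal sketch of a simulated annealing search over a scalar manipulation parameter is shown below; the payoff function, the Gaussian proposal, and the geometric cooling schedule are illustrative placeholders rather than the specific procedure used in [124, 125].

```python
import math
import random

def simulated_annealing(payoff, alpha0, steps=1000, t0=1.0, cooling=0.995, scale=0.1):
    """Minimal simulated annealing sketch over a scalar manipulation parameter.
    `payoff` is a placeholder adversarial payoff (higher is better for the
    adversary); the proposal and cooling choices are illustrative."""
    alpha, value = alpha0, payoff(alpha0)
    best_alpha, best_value = alpha, value
    temp = t0
    for _ in range(steps):
        candidate = alpha + random.gauss(0.0, scale)        # random local proposal
        cand_value = payoff(candidate)
        gain = cand_value - value
        # Always accept improvements; occasionally accept worse moves early on
        if gain > 0 or random.random() < math.exp(gain / max(temp, 1e-12)):
            alpha, value = candidate, cand_value
            if value > best_value:
                best_alpha, best_value = alpha, value
        temp *= cooling                                      # geometric cooling schedule
    return best_alpha
```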
Goodfellow et al. [226] state that the primary cause of deep learning networks’
vulnerability to adversarial examples is their linear nature in high-dimensional
search spaces. Deep learning networks also perform poorly on testing data
examples that do not have high probability in the training data distribution. Thus,
adversarial examples can be generated by applying a worst-case perturbation to the
training data. The perturbed input results in an incorrect output prediction with high
confidence. Thus, Goodfellow et al. [226] argue for the need of having an adversarial
training procedure whose objective is to minimize the worst-case error when the
training data is perturbed by the adversary. Goodfellow et al. [226] then formulate
the adversarial training as a min-max game between two deep neural networks. The
resulting deep generative model is called generative adversarial networks (GANs).
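For reference, the standard GAN objective of Goodfellow et al. [226] can be written as the two-player min-max game

```latex
\min_{G}\max_{D} \; V(D,G) =
  \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big],
```

where D is the discriminator, G is the generator, p_data is the training data distribution, and p_z is the noise prior from which the generator samples.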
A variety of deep generative methods are available to create the perturbation
between training and testing data distributions [485]. Radford et al. [500] propose a
stable GAN called DCGAN. Gulrajani et al. [400] design IWGAN which undertakes
a theoretical analysis of the generative learning process. Berthelot et al. [46] propose
BEGAN with a new loss function in the training algorithm. Chen et al. [119] propose
InfoGAN which uses generative learning models for unsupervised representation
learning.
Insofar as the learner’s defense mechanisms are concerned, our game formulation
is similar to the GAN game formulation. However, the objective of our research is to
simulate a real adversarial attack scenario on two-label and multi-label classification
models in terms of the cost to the adversary. We seek to increase the classification
performance when the data distribution is changed with malicious intent. By
contrast, the objective of GAN is to generate synthetic data that is indistinguishable
from the original data. Our objective function has cost and error terms defining the
attack scenarios in adversarial data generation settings. By contrast, the objective
function in GAN is defined in terms of the loss functions of the deep neural networks
learning the given training and testing data distributions.
Adversarial examples have been defined for deep generative models [325]. The
distribution of adversarial manipulations in white-box attacks as well as black-
box attacks has been modeled with AdvGAN [659]. A thread of research on
adversarial autoencoders [606] imposes a prior distribution on the output of an
encoder network learning the training data, where the autoencoder discriminatively predicts
whether a sample comes from its latent space or from a prior distribution determined
by the user. By contrast, our game theoretical optimization problem is independent
from a particular training data distribution and classification model.
Larsen et al. [346] propose generative adversarial learning in the reconstruction
loss of a variational autoencoder. Tran et al. [605] propose constraints on the
distance function to train generative adversarial networks in the latent spaces of an
autoencoder. Gregor et al. [233] propose an attention mechanism-based autoencoder
to learn the latent spaces in a sequential variational autoencoder framework. Ha
et al. [246] propose a recurrent neural network for sketch generation in images.
Makhzani et al. [405] propose an adversarial training mechanism for probabilistic
autoencoders.
Causality methods have been applied to deep learning problems such as semi-
supervised learning and transfer learning. In these problems, informed priors
retrieved from other networks are used to center the weights in hybrid deep
learning networks. Such networks are then used to construct statistical hypotheses
on patterns, structure, context, and content in actual data [431].
Backpropagation learning algorithms for deep networks have been improved
by training probabilistic graphical models. Such training is inherently Bayesian
where prior distributions inform and constrain analytics models predicting posterior
distributions [567]. The improved deep learning algorithms result in a predicted
output informed by causal inference. Within a Bayesian framework, causality
methods also enhance the interpretability of deep networks operating in an uncertain
environment [314].
We are interested in the attack scenarios with latent variable models in game
theoretical adversarial learning. Kumari et al. [336] study white-box attacks at the
level of the latent layers of the adversarially trained image classification models.
Higher robustness at the feature layers is achieved by the adversarial training
of latent layers with an iterative variant of FGSM. By contrast, our research
creates deep generative models for the adversarial manipulations that provide game
theoretical regularizers on the targeted classifier’s loss function.
Chattopadhyay et al. [112] propose a structural causal model for causal influence
of an input feature on a neural network’s output. Such causal influences on the
prediction function’s output are called neural network attributions. They are said to
be more interpretable artifacts of the deep network causations rather than regression
features that primarily map correlations between the input and the output of the
neural network. In sequence prediction tasks with such a structural causal model,
the causal dependencies between different input neurons are assumed to be jointly
caused by a latent confounder such as a data-generating mechanism applied to time-
series models.

Table 2.1 Generative adversarial network comparison 1

Defense-GAN [535]
  Attack scenario: Model the distribution of unperturbed images in white-box attacks as well as black-box attacks
  Discriminator loss function: Same as WGAN
  Generator loss function: Same as WGAN
  Information divergence function: MSE, L2 norm
  Game type: Min-max game
  Payoff function: Reformer network, latent codes
  Cost function: Adversarial training augments the training data
  Optimization constraints: Representative GAN to reconstruct adversarial examples

GANGs [474]
  Attack scenario: Resource-bounded best responses on synthetic data
  Discriminator loss function: Classifier score that is a function of both real data and fake data
  Generator loss function: Generator payoff function as a function of the fake data only
  Information divergence function: Deterministic and non-deterministic resource-bounded Nash equilibrium
  Game type: Zero-sum strategic-form game
  Payoff function: Player payoff under a profile of mixed strategies, Definition 2
  Cost function: Measuring function in Definition 6 and Theorem 10
  Optimization constraints: Finite GANGs on discrete data

AdvGAN [659]
  Attack scenario: Semi-white-box adversarial perturbations with target class and ground truth
  Discriminator loss function: Static and dynamic distillation model training with alternative minimization approaches
  Generator loss function: Targeted attack loss for LSGAN given in Equation 4
  Information divergence function: Ensemble adversarial learning
  Game type: Min-max game
  Payoff function: Same as Goodfellow GAN
  Cost function: Hinge loss
  Optimization constraints: Cross-entropy loss

DeLiGAN [245]
  Attack scenario: Limited training data to capture the diversity across the image modality
  Discriminator loss function: Same as DCGAN
  Generator loss function: Same as DCGAN
  Information divergence function: m-IS, a KL divergence measuring intra-class sample diversity along with the sample quality
  Game type: Min-max game
  Payoff function: Reparameterization of the latent space in the prior distribution
  Cost function: L2 regularizer to prevent local maxima in the generator
  Optimization constraints: Uniform mixture weights in gradient descent

EBGAN [702]
  Attack scenario: Handcrafted and regularized contrastive sample generation in supervised, weakly supervised, and unsupervised settings
  Discriminator loss function: Reconstruction loss and energy function in an autoencoder
  Generator loss function: Same as Goodfellow GAN
  Information divergence function: Inception score
  Game type: Min-max game
  Payoff function: Adversarial training
  Cost function: Reconstruction error
  Optimization constraints: Grid search over architectural choices and hyperparameters

Fisher GAN [445]
  Attack scenario: Standardized discrepancies in two-sample hypothesis testing and semi-supervised learning
  Discriminator loss function: Mahalanobis distance between the feature mean embeddings of real and fake distributions
  Generator loss function: Mean embedding distance metrics and inception score parametrized by generative neural networks
  Information divergence function: Integral probability metrics (IPM)
  Game type: Min-max game
  Payoff function: Same as DCGAN
  Cost function: Same as DCGAN
  Optimization constraints: Second-order moments of the critic that discriminates between the two distributions, cross-entropy regularization term in semi-supervised learning

Improved f-GAN [496]
  Attack scenario: Generator divergence ordered from the most mode seeking to the most mode covering
  Discriminator loss function: Same as Goodfellow GAN
  Generator loss function: Estimate of the model density-to-data density ratio from the current discriminator
  Information divergence function: f-Divergence
  Game type: Min-max game
  Payoff function: Expectation maximization
  Cost function: Real-fake data learning intractable likelihoods
  Optimization constraints: Image quality/diversity metrics/factors without dropping modes

Table 2.2 Generative adversarial network comparison 2

f-CLSWGAN [657]
  Attack scenario: No labeled examples of certain classes in multimodal embedding and generalized zero-shot learning (GZSL)
  Discriminator loss function: Softmax classifier for zero-shot learning (ZSL)
  Generator loss function: Same as WGAN
  Information divergence function: Multimodal embedding model with labeled examples of seen classes and deep CNN features
  Game type: Min-max game
  Payoff function: None
  Cost function: Generated data is much lower dimensional than the high-quality images necessary for discrimination conditioned on class-level semantic information
  Optimization constraints: Class embeddings that model the semantic relationship between classes

LSGAN [410]
  Attack scenario: Penalize samples based on their distances to the decision boundary and manifold of real data
  Discriminator loss function: Equation 2
  Generator loss function: Equation 2 to generate samples toward the decision boundary
  Information divergence function: f-Divergence, Pearson chi-square divergence
  Game type: Min-max game
  Payoff function: None
  Cost function: None
  Optimization constraints: Deterministic equations between labels for fake data, real data, and generated data for one-hot encoding and dimensionality reduction

D2GAN [459]
  Attack scenario: Image generation
  Discriminator loss function: Kullback-Leibler (KL) and reverse KL divergences
  Generator loss function: Any multi-mode density function
  Information divergence function: Inception score and MODE score
  Game type: Min-max game
  Payoff function: None
  Cost function: None
  Optimization constraints: Minimal enclosing ball, surrogate objectives, multiple players, etc. in the generator's density function

McGan [446]
  Attack scenario: Mean and covariance feature statistics
  Discriminator loss function: Same as WGAN along with a cross-entropy loss on labeled data
  Generator loss function: Mean covariance Ky-Fan norm (nuclear norm of the truncated covariance difference), feature matching integral probability metrics (IPM); IPMs are bounded linear functions defined in the non-linear feature space induced by the parametric feature map
  Information divergence function: Lq norm, geodesic distances between the covariances and probability measures in a multimodal setting
  Game type: Min-max game
  Payoff function: None
  Cost function: None
  Optimization constraints: Bounded modes of the feature embeddings of the real and fake distributions, sufficient samples from "real" and "fake" data for training both the "generator" and the "critic" feature space

DCGAN [500]
  Attack scenario: Reusable feature representations from large unlabeled datasets, hierarchical clustering of the intermediate representations
  Discriminator loss function: Leaky rectified activation
  Generator loss function: Deconvolutions and filtering the maximal activations of each convolution filter in the network
  Information divergence function: Percentage accuracy on training, testing, and validation data
  Game type: Min-max game
  Payoff function: None
  Cost function: None
  Optimization constraints: Batch normalization

BEGAN [46]
  Attack scenario: Matching the distribution of the errors instead of the distribution of the samples, dynamically weighing regularization terms or other heterogeneous objectives
  Discriminator loss function: Discriminator has two competing goals in closed-loop feedback control: autoencode real images and discriminate real from generated images
  Generator loss function: Negative of the discriminator loss by using the boundary equilibrium concept from proportional control theory
  Information divergence function: Global measure of convergence
  Game type: Min-max game
  Payoff function: None
  Cost function: None
  Optimization constraints: Correct hyperparameter selection to maintain a balance between the generator and discriminator losses

BGAN [274]
  Attack scenario: Same as DCGAN
  Discriminator loss function: Same as DCGAN
  Generator loss function: Boundary-seeking REINFORCE objective with policy gradient training where the reward is the …
  Information divergence function: f-Divergence and Jensen-Shannon divergence with importance weights for the generated …
  Game type: Min-max game
  Payoff function: None
  Cost function: None
  Optimization constraints: Approximated the expectations in normalized importance weights by …
Yang et al. [677] study the pixel-level features for causal reasoning in pixel-wise
masking and adversarial perturbation. Ancona et al. [15] and Lundberg et al. [400]
discuss attribution methods in Shapley values from cooperative game theory.
Our research investigation is in creating such interpretable artifacts of the game
theoretical adversarial manipulations. Toward this end, we have created Granger-
causal features of the regression predictions. In future work, we shall create
predictive baselines in latent variable models of the data-generating mechanisms in
neural network attributions. We expect such baselines shall discover counterfactual
features in application-specific rule-based classifiers.
Bastani et al. [35] propose metrics to evaluate the robustness of deep neural
nets. Narodytska et al. [453] create Boolean representation of a deep neural
network to verify its properties. Tomsett et al. [599] survey connections between
interpretability and adversarial attacks. Liu et al. [381] develop adversary-resistant
detection framework by utilizing the interpretation of machine learning models. Tao
et al. [593] propose an adversarial sample detection technique for face recognition
models, based on interpretability. Fidel et al. [194] propose a method for detecting
adversarial examples with SHapley Additive exPlanations (SHAP) values computed
for the internal layers of a DNN. Ilyas et al. [294] attribute adversarial examples
to the presence of non-robust features. Ignatiev et al. [291] demonstrate that the
explanations (XPs) of machine learning (ML) model predictions and of adversarial
examples (AEs) are related by a first-order logic (FOL) framework called hitting set
duality.
Fig. 2.3 A flowchart illustrating the adversarial autoencoder-based Stackelberg game theoretical
modeling
deep neural networks, logistic regression, support vector machines, decision trees,
nearest neighbors, and classifier ensembles. Even commercial machine learning
classifiers hosted by Amazon and Google are considered in the experimental
evaluation. Substitutes are designed for black-box attacks where adversaries target
remote classifiers without knowledge about model architecture, parameters, and
training datasets. Experiments indicate that knowledge transfer occurs from
many machine learning models to a deep neural network mimicking the decision
boundaries of the original classifier.
To study transfer learning with target labels, Liu et al. [388] distinguish between
non-targeted and targeted adversarial examples. Adversarial examples are generated
to transfer to particular target labels as misclassified by deep learning models. Chin
et al. [122] propose a new transfer learning method for fine-tuning the robustness
of transferred neural network obtained from regularizing the pre-trained models in
deep learning. The adversary has access to the pre-trained model’s weights and
architecture but does not have access to task-specific transferred model and query.
Baluja et al. [26] train a neural network called adversarial transformation
network (ATN) to craft a targeted adversarial attack. Instead of solving per-
sample optimization problems to create the adversarial data, ATN creates minimally
modified adversarial examples for every input training image. ATNs accommodate
various threat models such as training black-box and white-box targets over targeted
and untargeted attack scenarios on the rank orders in the target neural network’s
outputs. Further, an ATN can be trained to generate either an adversarial perturbation
from a variant of residual networks or an adversarial autoencoding of the input
reconstructed with adversarial noise signal.
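A minimal sketch of one ATN training step under these ideas is given below; the mean-squared reconstruction term and the plain cross-entropy attack term are simplified stand-ins for the perturbation and reranking losses of Baluja et al. [26].

```python
import torch
import torch.nn.functional as F

def atn_training_step(atn, classifier, x, target, beta=0.99, opt=None):
    """One training step for an adversarial transformation network (sketch).
    `atn` maps an input image to a perturbed image; `classifier` is the frozen
    target model returning logits. The loss trades off similarity to the input
    against pushing the classifier toward `target`; both terms are simplified
    stand-ins for the original formulation."""
    x_adv = atn(x)
    recon_loss = F.mse_loss(x_adv, x)                          # stay close to the input
    attack_loss = F.cross_entropy(classifier(x_adv), target)   # push toward target class
    loss = beta * recon_loss + (1.0 - beta) * attack_loss
    if opt is not None:
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()
```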
Wu et al. [654] identify transferable adversarial examples due to the skip
connections in supervised deep learning. Gradients from the skip connections are
proposed to craft the adversarial examples. They transfer to the state of the art in
deep neural networks including ResNets, DenseNets, Inceptions, Inception-ResNet,
and Squeeze-and-Excitation Networks. Furthermore, such adversarial examples
can be combined with existing black-box techniques for adversarial attacks to
obtain improvements in the state-of-the-art transferability methods. Such adversarial
examples raise security concerns in the deployment of deep neural networks in
applications such as face recognition, autonomous driving, video analysis, and
medical diagnosis.
Su et al. [575] propose adversarial domain adaptation with active learning. Impor-
tance sampling combined with adversarial training is used to account for distribution
shifts between domains. It acts as a sample selection scheme for active learning
especially when the target domain does not have as many labeled examples as the
source domain. In the importance sampling, a diversity of samples is generated with
the help of adversarial loss. Such a semi-supervised domain adaptation improves
images used in object detection over stop signs, for example. Thys et al. [598]
craft adversarial patches against person detectors. Here, the target classes contain a lot
of intra-class variety, unlike the stop-sign dataset. Such adversarial attacks can be used
as cloaking devices to circumvent surveillance systems where intruders can sneak
around undetected by holding a small cardboard plate in front of their body aimed
toward the surveillance camera. They can augment the human-annotated images
to determine the model performance for person detection. Such test sets account
for adversarial examples designed to steer the model in the wrong way and further
target to fool the model. Such vulnerabilities in person detection models of a security
surveillance camera can be highlighted as risks of such an attack on the detection
system. The bounding box for adversarial patches is predicted according to object
score and class score components in the adversarial losses. Adversarial patches
are then applied to the images after various transformations to fool the detectors
even more. This allows the generation of targeted attacks where data is available
for particular scenes in the footage environment. Some of the factors influencing
adversarial patch generation are lighting changes, viewing-angle differences, patch
rotation, and patch size; the patch can also change with respect to person size, and the
camera can add noise or blur to the patch. Thys et al. [598] optimize a patch image to
minimize the probabilities related to the appearance of a person in the output of a detector. In
experiments, the effects of generated patches are compared with that of random
patches to determine the most effective patches minimizing object loss. Optimizing
the adversarial losses for different detector architectures ensures the transferability
of the adversarial patches.
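The optimization loop sketched below illustrates this kind of patch-training procedure; the detector objectness score, the patch pasting, and the random transformation routines are placeholders for application-specific code, and the total-variation smoothness term is a common printability heuristic rather than a detail taken from the text above.

```python
import torch

def patch_step(patch, detector_obj_score, images, paste_fn, transform_fn, opt,
               tv_weight=1e-4):
    """One optimization step for an adversarial patch against a person detector
    (sketch). `detector_obj_score` returns the objectness score to minimize,
    `paste_fn` renders the transformed patch into each image, and
    `transform_fn` applies random rotation/scaling/noise; all three are
    placeholders."""
    transformed = transform_fn(patch)              # random rotation, scale, noise, blur
    patched = paste_fn(images, transformed)        # render the patch into the scene
    obj_loss = detector_obj_score(patched).mean()  # suppress the person detections
    # Total-variation term keeps the printed patch smooth (common heuristic)
    tv = (patch[:, 1:, :] - patch[:, :-1, :]).abs().mean() + \
         (patch[:, :, 1:] - patch[:, :, :-1]).abs().mean()
    loss = obj_loss + tv_weight * tv
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        patch.clamp_(0.0, 1.0)                     # keep pixel values printable
    return loss.item()
```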
Elsayed et al. [172] create adversarial examples that transfer from computer
vision models to time-limited human observers. The effect of adversarial examples
in machine learning is investigated in contrast to cognitive biases and optical
illusions in human visual perception studied by neuroscience. So it is possible to
craft adversarial examples with human-meaningful features. They can be designed
to cause a mistake not only in visual object recognition but also in human perception.
Elsayed et al. [172] design psychophysics experiments to compare the pattern
of errors made by humans to the misclassification validations in neural network
classifiers.
Brown et al. [87] create targeted adversarial image patches that can attack any
scene in the physical world to cause an image classifier to output any target class
under a wide variety of mathematical transformations. Prior knowledge of lighting
conditions, camera angle, and classifier types being targeted is not required to
create such a physical-world attack. In image classification tasks, the classifier
must detect the most salient feature in an image to determine its class label. The
adversarial patch exploits this behavior to produce adversarial features that are much
more salient than objects in the physical world. So large, clearly perceptible local
perturbations can also mislead machine learning classifiers that operate without
human validation. So adversarial examples can be crafted for the physical world
by modeling adversarial examples from physical transformations where robots are
perceiving the world through cameras, sensors, and phones to deal with image,
sound, and video data representations.
Athalye et al. [19] synthesize 3D adversarial objects that are adversarial over a
chosen distribution of transformations such as viewpoint shifts, camera noise, and
affine transformation. An expectation over transformation algorithm is designed in a
white-box attack scenario where adversary has access to classifier, its gradient, pos-
sible classes, and a space of valid inputs. In the optimization procedure for creating
adversarial examples, the adversarial perturbations are modeled with respect to an
expectation defined on a chosen distribution of transformation functions. Instead
of selecting the log-likelihood of a single example as the optimization objective,
the effective distance between adversarial and original inputs is minimized. This
is the expected or perceived distance as seen by the classifier. The optimal solution
resulting in adversarial data is obtained by a stochastic gradient descent algorithm of
the expected value where the gradient is computed through differentiation through
each of the sampling transformations. Such an adversarial attack scenario treats the
cyber world as a domain whose transformations transfer to the physical world acting
as a codomain. The distribution of transformations acts as a perturbation budget to
produce successful adversarial examples.
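A simplified sketch of one expectation-over-transformation optimization step is shown below; the transformation sampler is a placeholder, and the plain mean-squared distance penalty stands in for the expected perceived distance described above.

```python
import torch
import torch.nn.functional as F

def eot_step(model, x_adv, x_orig, target, sample_transform, opt,
             n_samples=10, dist_weight=0.1):
    """One step of an expectation-over-transformation style attack (sketch).
    `sample_transform` draws and applies a random differentiable transformation
    (viewpoint shift, noise, affine warp, ...); it is a placeholder. The
    objective averages the targeted classification loss over sampled
    transformations and penalizes distance to the original input."""
    cls_losses = []
    for _ in range(n_samples):
        logits = model(sample_transform(x_adv))
        cls_losses.append(F.cross_entropy(logits, target))
    loss = torch.stack(cls_losses).mean() + dist_weight * F.mse_loss(x_adv, x_orig)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```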
Machine learning systems are vulnerable to adversarial attacks especially in
non-stationary adversarial environments within the cybersecurity domains. Beyond
image recognition domains such as deepfakes in detection systems, adversarial
learning applications in the cybersecurity domains include malware identification,
spam detection, risk scoring, SQL injection, ransomware development, biometrics
recognition systems, traffic sign detection, autonomous driving, anomaly detection,
entity classification, dictionary learning, cyber-physical systems, and industrial
control systems. In cybersecurity domains, modifying an API call or an executable’s
content byte might cause the modified executable to perform a different functional-
ity. So adversaries in the cybersecurity domains must implement methods to modify
executable’s features that will not break its functionality due to the perturbed data
samples in feature vectors within URL characters, spam emails, network packets,
phishing detectors, sensor signals, physical processes, etc. Some of the targeted
attacks in neural networks built for cybersecurity domains are dedicated APT
attack, Trojan attack, backdoor attack, and distributed denial-of-service (DDoS)
attack. In cyber-physical systems, adversarial learning applications play a role in
the optimization of critical infrastructure such as electric power grids, transportation
networks, water supply networks, and nuclear plants. In biometrics recognition
systems, adversarial learning has applications in handwritten signature verification,
fingerprint classification, face recognition, sentiment analysis, speaker recognition,
network forensics, and iris code generation. A survey of adversarial attacks in
cybersecurity in contrast to computer vision is given by [656] and [616]. It can
be used to build threat-knowledge databases in sensitive real-time applications for
artificial intelligence and soft computing.
Chapter 3
Adversarial Attack Surfaces
In this chapter, we explore adversarial attack surfaces. We examine how they can
exploit vulnerabilities in machine learning and how to make learning algorithms
robust to attacks on security and privacy of the learning system. To explore the
vulnerabilities, we can simulate various model training processes under a range
of attack scenarios in supervised and unsupervised settings. Each attack
strategy is assumed to be formulated by an intelligent adversary that is capable
of either feature manipulation, label manipulation, or both. The optimal attack
policy of the adversaries is determined by the solution for optimization problems
that output the adversarial data. We can then apply the knowledge that we learned
to improve and reinforce the learning procedure so as to better defend against
attacks. The sensitivity analysis summarized in this chapter can be used to develop
computational algorithms for optimization objectives and statistical inferences
in an adversarial learning algorithm’s capacity for randomization, discrimination,
reliability, and learnability. It creates research pathways into robustness, fairness,
explainability, and transparency of machine learning models.
Evasion Attacks Biggio et al. [54] discuss adversarial security at test time of a
deployed classifier system. Security evaluation is then performed at different risk
levels of non-linear classifier performance in malware detection. A secure classifier
is proposed by using a gradient descent approach on a differentiable discriminant
function. Adversary’s goal is defined in terms of minimizing classifier’s loss
function with positive adversarial samples that cross decision boundary. This model
can also incorporate application-specific adversarial knowledge in the definition of
adversarial attack scenarios. Such adversarial knowledge includes prior knowledge
about training data, feature representation, type of learning algorithm and its
decision function, classification weights, and feedback from classifier.
and maximum entropy models learning Boolean features for spam filtering. The
proposed learning framework (called ACRE) is useful to study both the attacker
or adversary and the defender or classifier. It can be used to determine whether an
adversary can efficiently learn enough about defeating a classifier by minimizing an
adversarial cost function.
to more robust than the performance of any base model because weights of
less important features will be overwhelmed by the weights of more important
features during the training process.
• Partitioned logistic regression is a special case of feature bagging where feature
subsets and class labels are non-overlapping.
• To prevent undertraining, confidence-weighted learning aggressively updates
weights of rare features in the data by maintaining a normal distribution over
the weight vector of a linear classifier. The feature weights are updated such
that the Kullback-Leibler divergence between the training data distribution and
testing data distribution is minimized without reducing model performance.
• Feature noise injection alleviates the problem of model overfitting to training data
by introducing artificial feature noise during model training.
• Sample selection bias correction assigns feature weights such that reweighted
training data resembles the available testing data. The correct weights are inferred
without explicit density estimation. However, sample selection bias correction
assumes the testing data is also available during training in the input domain.
Liu et al. [378] design supervised learning algorithms secure to adversarial poison-
ing attacks that do not make independence assumptions on feature distributions.
Poisoning attacks are assumed on both dimensionality reduction and predictive
regression steps. High-dimensional features are projected into a low-dimensional
subspace with high data density. Then linear regression models best characteristics
of data. A matrix factorization algorithm is proposed to recover low-dimensional
subspace in the presence of training data corrupted by both noise and adversarial
examples. A principal component regression uses trimmed optimization to estimate
regression parameters in low-dimensional subspace.
In the adversarial attack scenario proposed by Liu et al. [378], the regression
model can choose its training process and defense strategy without access to training
data before adversarial manipulation. The adversary has full knowledge of training
algorithm and parameters. The adversarial attack scenario is simulated as a zero-sum
Stackelberg game where adversary’s payoff function minimizes a certain budget of
poisoning training data, while regressor’s payoff function is regression accuracy.
The learning process is formally characterized in terms of a model function relating
the adversarial input and the predicted output. A quadratic loss function and a
threshold function bounding loss function are also analyzed in the regression.
An alternative maximization solves the proposed optimization problem on HTTP
logs. The adversarial data is generated by moving training data samples along a
direction to manipulate the regressor until it cannot predict correctly. Results are
benchmarked against robust regression models like OLS linear regression and ridge
regression predictions in the presence of noise.
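The sketch below illustrates the general idea of principal component regression with trimming, using an iterative drop-the-worst-residuals loop; it is an illustrative stand-in rather than the robust matrix factorization algorithm of Liu et al. [378].

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def trimmed_pcr(X, y, n_components=5, trim_frac=0.1, iters=5):
    """Sketch of principal component regression with a simple trimming loop:
    project onto a low-dimensional subspace, fit a linear model, drop the
    samples with the largest residuals, and refit."""
    keep = np.arange(len(y))
    for _ in range(iters):
        pca = PCA(n_components=n_components).fit(X[keep])
        Z = pca.transform(X[keep])
        reg = LinearRegression().fit(Z, y[keep])
        residuals = np.abs(reg.predict(Z) - y[keep])
        n_keep = int(len(keep) * (1.0 - trim_frac))
        keep = keep[np.argsort(residuals)[:n_keep]]   # trim the worst-fitting samples
    return pca, reg, keep
```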
cost that is independent of the feature values in the sample. The first algorithm
greedily maximizes SVM’s test error through continuous relaxation of the label
values in a gradient ascent procedure. The second algorithm does a breadth-first
search to greedily construct sets of candidate label flips that are correlated to the
SVM’s testing error. Both algorithms can be understood as a search for labels
that achieve maximum difference between empirical risk for classifiers trained on
original data and contaminated data. The algorithms can also be used to simulate
a constant sum game between the attacker and the classifier whose aim is to,
respectively, maximize and minimize testing error on the untainted test dataset.
Different game formulations can be simulated if the players use non-antagonistic
objective functions. Improvements to the algorithms are possible by the study of an
incremental SVM under label perturbations. The problem of label noise injection
creating the attacker manipulation in an SVM is also related to the classification
problems for SVMs in semi-supervised learning, active learning, and structured
prediction.
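A brute-force sketch of the greedy label-flip objective is given below, assuming binary labels in {0, 1}; it retrains the SVM for every candidate flip and is therefore far less efficient than the relaxation-based algorithms described above, but it makes the search objective explicit.

```python
import numpy as np
from sklearn.svm import SVC

def greedy_label_flips(X_train, y_train, X_test, y_test, budget=10):
    """Sketch of a greedy label-flip poisoning attack on a linear SVM: at each
    step, flip the single training label (binary labels in {0, 1} assumed)
    that most increases test error on the untainted test set."""
    y_poisoned = y_train.copy()
    for _ in range(budget):
        best_i, best_err = None, -1.0
        for i in range(len(y_poisoned)):
            y_try = y_poisoned.copy()
            y_try[i] = 1 - y_try[i]                          # candidate flip
            err = 1.0 - SVC(kernel="linear").fit(X_train, y_try).score(X_test, y_test)
            if err > best_err:
                best_i, best_err = i, err
        y_poisoned[best_i] = 1 - y_poisoned[best_i]          # commit the best flip
    return y_poisoned
```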
Biggio et al. [59] propose that an ensemble of linear classifiers can improve
not only accuracy but also robustness of supervised learning. That is because
more than one classifier has to be evaded or poisoned to compromise the whole
ensemble of classifiers. The training strategy evenly distributes the feature weights
between discriminative and non-discriminative features in data. Undermining the
discriminative weights in the classifier can then undermine the accuracy of the
classifier. The objective of robust classifier ensembles is then to find such a correct
tradeoff between robustness and accuracy. Here, an adversary is forced to modify a
large number of feature values to manipulate the classifier.
Biggio et al. [59] design boosting and random subspace method (RSM) to
distribute weights in the adversarial algorithm. The adversarial behavior is modeled
in terms of two scenarios—a worst-case scenario where the adversary has complete
knowledge of the classifier and an average-case scenario where the adversary has
only an approximate knowledge of the classifier. The ensemble discrimination
function is then obtained by averaging different linear classifiers trained on different
randomly selected subsets of the original feature set.
The averaging method by Biggio et al. [59] to find ensemble performance is
an extension of the idea to use average performance of linear classifier to prevent
overfitting or underfitting in imbalanced data. By reducing the variance component
of classification or estimation error, the randomized sampling used in the algorithm
reduces instability in decision or estimation function. Such a stable decision function
is not supposed to undergo large changes in output for small perturbations in input
data due to adversarial data or stochastic noise.
In Biggio et al. [59], the experimental evaluation has two objectives. The
first objective is to understand the conditions under which randomized sampling
For each feature selection algorithm in the experiments, Xiao et al. [663]
optimize an adversarial loss function with a (sub)gradient ascent algorithm solving a
convex optimization problem. The feature spaces are assumed to define continuous
and discrete features and differentiable and non-differentiable features. To evaluate
the feature selection under attack settings, a stability index is proposed to indicate
anti-correlation rankings between feature subsets of the feature selection algorithm.
Experiments show that an adversary can easily compromise feature selection
algorithms that promote sparsity in the feature representation. The poisoning attacks
and evasion attacks are said to mislead model’s decision-making by introducing
model bias and model variance, respectively, into the feature selection algorithm’s
mean squared error decomposition.
Mei et al. [419] poison a latent Dirichlet allocation (LDA) training corpus so that LDA
produces adversarially manipulated topics that mislead LDA users’ decisions. The adversarial attack
is formulated as a bilevel optimization problem for variational inference under
budget constraints. It is solved by a computationally efficient gradient descent
method based on implicit functions. The optimization employs a KL divergence
between LDA learner’s word-topic distribution and fully factorized variational
distribution constrained by Karush-Kuhn-Tucker (KKT) conditions. The adversary
poisons training corpus such that topics learned by LDA are guided toward target
multinomial distributions defined by the adversary. Adversary’s goal is to minimize
an attacker risk function which defines the distance between adversarial multinomial
distribution and training multinomial distribution. Adversary’s risk combined with
learner’s KL divergence gives a bilevel optimization framework for constructing
the adversarial examples. Adversarial examples on words and sentences misleading
LDA topics are created on a corpus sourced from the United States House of
Representatives floor debate transcripts, online new year’s wishes, and TREC AP
newswire articles.
Kloft et al. [318] explore adversarial examples for an (online centroid) anomaly
detection algorithm. The adversarial attack scenario is expressed in terms of the
efficiency and the constraints of formulating an optimal attack on outlier detection.
The outlier detection finds unusual events across finite sliding windows in computer
security applications such as automatic signature generation and intrusion detection
systems. A poisoning attack is assumed to create adversarial examples on training
data where a certain percentage of training data is controlled by the adversary.
An anomalous data point is then measured according to the Euclidean distance
from the empirical mean of the training data. The empirical mean is calculated on
training data by a finite sliding window online algorithm for non-stationary data.
By pushing the empirical mean point toward adversarial examples, the adversary
forces the anomaly detection algorithm to accept anomalous data point as normal
training data.
Kloft et al. [318] express the relative displacement of original empirical mean in
terms of the attack direction vector between the attack point and the mean point.
A greedy optimal attack is then proposed to locate attack points in a Voronoi cell
on data points that maximize relative displacement of the empirical mean. For a
Euclidean norm, the greedy attack is optimized with either a linear program or
a quadratic program. The mixing of normal points and attack points is modeled
by Bernoulli random variables which are iid in a kernel Hilbert space. The attack
progress is measured by projecting the current empirical mean onto an attack
direction vector. Theoretical analysis is provided for bounding the expectation
and the variance of relative displacement by the number of training points and
attack points in the current sliding window. The adversary is assumed to have full
knowledge of the training data and the anomaly detection algorithm. The anomaly
detector’s defense to adversarial attack is proposed in terms of controlling the false
positive rate.
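The sketch below illustrates the greedy poisoning idea against a centroid anomaly detector: the adversary repeatedly submits points on the acceptance boundary in the direction of an attack target so that the empirical mean drifts toward it. The finite sliding window and the Voronoi-cell analysis of Kloft et al. [318] are omitted for brevity.

```python
import numpy as np

def centroid_poisoning(train_points, attack_target, radius, n_attacks=100):
    """Sketch of greedy poisoning of a centroid anomaly detector: submit points
    that lie exactly on the acceptance boundary in the direction of the attack
    target, so the empirical mean of the (growing) window drifts toward it."""
    window = list(train_points)
    for _ in range(n_attacks):
        center = np.mean(window, axis=0)
        direction = attack_target - center
        dist = np.linalg.norm(direction)
        if dist <= radius:                            # target now accepted as normal
            break
        poison = center + radius * direction / dist   # just-accepted boundary point
        window.append(poison)
    return np.mean(window, axis=0)
```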
Rubinstein et al. [532] evaluate poisoning attacks and training defenses for
principal component analysis (PCA)-subspace anomaly detection where principal
components maximize robust measures of training data dispersion. Adversary’s goal
is expressed as increasing false positives and false negatives of model under attack.
A time series of traffic volumes between pairs of points is the dataset representing a
routing matrix. Robust PCA of the routing matrix then identifies volume anomalies
in an abnormal subspace. Adversary’s poisoning strategies consider attacks with
increasing amounts of variance information in the attack scenarios. The weakest
attack strategy knows nothing about the traffic flows and adds random noise as
adversarial examples. In locally informed attack strategy, the adversary intercepts
the information about current traffic volume on network links under attack. In
globally informed attack strategy, the adversary has knowledge of traffic volumes
on all network links and network levels. In short-term attack, the anomaly detector
is retrained for each week of training data during which adversary attacks network.
In long-term attack, anomaly detector’s principal components are slowly poisoned
by the adversary over several weeks. In each attack scenario, the adversary decides
the quantity of data to add to the target traffic flow according to a Bernoulli
random variable. A robust PCA analysis on adversarially altered routing matrix then
produces adversarial examples which are classified as innocuous by the anomaly
detector. A tractable analytic solution to robust PCA is derived by Rubinstein et
al. [532] from an objective function with relaxation approximations maximizing the
attack vectors projected onto normal subspace covariance matrices. A projection
pursuit method then produces feasible solutions for the objective function in
direction of its gradient.
Feng et al. [186] present adversarial outliers for logistic regression. Linear
programming procedure estimates logistic parameters in the presence of adversarial
outliers in the covariance matrix in binary classification problems. Non-robustness
of logistic regression to adversarial outliers is calculated from the maximum
likelihood estimate of log-likelihood’s influence function as well as the loss function
in high-dimensional training data that has been corrupted by the adversarial outliers.
In the attack scenario, adversarial outliers seek to dominate correlations in the
objective function of the logistic regression model. Robustness bounds are then
derived on the population risk and the empirical risk with Lipschitz continuous loss
functions.
Zhao et al. [703] propose data poisoning attacks on relatedness of tasks in multi-
task relationship learning (MTRL). Optimal attacks in MTRL solve a bilevel
optimization problem adaptive to arbitrary target tasks and attacking tasks. Such
attacks are found by a stochastic gradient ascent procedure. The vulnerability of
MTRL to adversarial examples is categorized into feature learning approaches, low-
rank approaches, task clustering approaches, and task relationship approaches where
the learning goal is to jointly learn a prediction function. Then MTRL of linear
prediction functions with arbitrary convex loss functions and positive semi-definite
covariance matrices is studied. Adversary’s goal is defined as degradation of the
performance of a set of target tasks by injecting poisoned data to a set of attacking
tasks. Adversary’s payoff function is defined as empirical loss of training data
on target tasks where the adversary has complete knowledge of the target MTRL
model. In the gradient ascent procedure, poisoning data is iteratively updated in the
direction maximizing the adversarial payoff function. The prediction function is a
least squares loss function for regression tasks and squared hinge loss function for
classification tasks. The prediction performance is evaluated by maximizing area
under curve for classification tasks and minimizing normalized mean squared error
for regression tasks.
adversarially poisoned samples in the training dataset. Toward these ends, the
authors develop a robust matrix factorization algorithm which correctly recovers the
subspace wherever possible and use its features in a trimmed principal component
regression, which uses the recovered basis and trimmed optimization to estimate
linear model parameters. A noise residual is the solution of robust regression. It is
used to study the interference of adversarial data with the regression model design
and the ability of the adversary to significantly skew the estimator. This leads to the
design of bounded loss function for adversarial learning. The adversary can then
be assumed to create poisoning strategies to trigger the worst-case performance in
the dimensionality reduction algorithm and regression models. The most effective
of such attacks move the data samples along the direction to maximally modify the
learned estimator. Experimental results are compared with linear regression models
designed to be robust to the adversarial data poisoning. Such adversarial learning
models tend to focus more on the defense against adversary to produce adversarial
learning algorithms with distributional robustness rather than setting up the attack
scenarios for validating their misclassification error costs. The non-linear regression
modeling predictions can further benefit from security feature engineering based on
adversarial learning theories involving deep representation learning models such as
factorization machines [511]. Factorization machines are a low-rank approximation
of a sparse data tensor when most of its predicted elements are unknown. In
security feature engineering, they can model the interactions between features using
factorized parameters. They are applicable not only to dimensionality reduction
tasks but also to general prediction tasks in high-dimensional settings. As proposed
by Blondel et al. [73], higher-order factorization machines can be estimated with
dynamic programming algorithms tailored for prediction tasks in adversarial learn-
ing. Applications can be demonstrated for link prediction applications in complex
networks. In game theoretical adversarial learning, the dynamic programming
algorithms can be proposed to study the convergence properties of game theoretical
optima. The game theoretical loss functions and training procedures in such a
research are applicable to the study of learning and resampling dynamics within
neural computing mechanisms tailored to adversarial learning. The complex dynam-
ics expressed in adversarial data distributions can then be modeled as randomization
algorithmics in data mining systems and machine learning models.
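For reference, the standard second-order factorization machine prediction can be computed in O(kn) time as sketched below; the parameter shapes are the usual ones (a global bias, a linear weight vector, and an n-by-k factor matrix), shown for illustration rather than as the higher-order estimators of Blondel et al. [73].

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction (standard formulation):
    y = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j.
    The pairwise term uses the usual O(k n) identity; V has shape (n, k)."""
    linear = w0 + np.dot(w, x)
    xv = V.T @ x                                         # shape (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + pairwise
```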
Amin et al. [215] study adversarial regression in cyber-physical systems (CPS).
CPS are the critical infrastructure such as electric power grid, transportation
networks, water networks, and nuclear plants. A supervised regression model is
proposed to detect anomalous sensor readings in such infrastructures. Then a game
theoretical model is built on the interaction between the CPS defender and adver-
sary. In it, the defender chooses detection thresholds, while the attacker deploys a
stealthy attack in response. Such an attack carefully modifies the readings of
compromised sensors so that the modifications go undetected. The Stuxnet attack is given as a well-known
example of targeting physical infrastructure through cyber means. It is defined as
the corruption of sensor readings to ensure that an attack on the controller code
either is undetected or indirectly impacts controller behavior. The learning problem
solves adversarial anomaly detection in the context of integrity attacks on a subset
of sensors in CPS. The supervised regression task in the anomaly detection model
predicts a measurement for each sensor as a function of readings from other sensors.
The robust anomaly detection is modeled as a Stackelberg game between defender
and adversary. An ensemble predictor containing a combination of neural network
regression and linear regression is explored in the regression-based detector. The
adversarial objective is expressed as a mixed-integer linear programming problem.
Thus, an adversarial loss function can be derived for sampling, prediction, and
optimization problems in deep learning regression. Such regression baselines can
be built for multivariate prediction problems in adversarial learning. Stackelberg
games can be used to model the strategic interactions assuming rational agents
in markets on which there is some hierarchical competition. Strategic interactions
between payoff functions for both players reflect the relative ranking of each
player’s application scenario in terms of the final outcome expected in machine
learning. The search space of strategies for each player in a game is normally
assumed to be bounded and convex, and the corresponding payoff function is
assumed to be differentiable. The equilibrium solution for all payoff functions
in the game is determined by the solution to an optimization objective function.
Game theory provides the mathematical tools to model behaviors of the defender
and the adversary behaviors in machine learning in terms of defense and attack
strategies. Game theoretical adversarial learning takes into account tradeoff made
by the attacker between the cost of adapting to the classifier and the benefit from
attack. On the other hand, the tradeoff made by the defender balances between the
benefit of a correct attack detection and the cost of a false alarm.
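One generic way to write such a Stackelberg interaction is the bilevel problem below, with the adversary as leader choosing a manipulation a and the learner best-responding with parameters θ; the payoff notation J_A, J_L is illustrative rather than the book's.

```latex
\max_{a \in \mathcal{A}} \; J_A\big(a, \theta^{*}(a)\big)
\quad \text{subject to} \quad
\theta^{*}(a) \in \arg\min_{\theta} \; J_L(\theta, a)
```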
Zhang et al. [698] consider robust regression models in online and distributed
environments. Here, the data mining applications of adversarial learning theories
have to accommodate new challenges on the data analytics methodologies imposed
by big data, where it is usually impossible to store an entire high-dimensional data
stream or to scan it multiple times due to its tremendous volume and changing
dynamics to the underlying data distribution over time. We would have to consider
the amortized computational cost of pattern mining on high-dimensional and multi-
dimensional data distributions. The pattern validation metrics in adversarial learning
must also consider physical domain knowledge as ground truth for supervised
learning. The modeling will require to learn dense substructures, rare classes, and
condensed patterns over transactional, sequential, and graph datasets where random
process generating training data may not be the same as that governing testing data.
The implemented approach will have to be merged with approaches for adaptive dis-
criminative learning with continuous optimization. Here, the adversarial examples
can be put into sparse, measured, dense attack scenarios for adversarial learning.
They must account for the operational latency-sensitive stream data analytics and
knowledge representation learning over limited resources specified in terms of time,
power, and communication costs. Adversarial manipulations must be defined over
incremental learning and distributed processing of adversarial attacks at big data
speeds and scales. Zhang et al. [698] identify heterogeneously distributed data
corruption due to adversaries. They have proposals for corruption estimation when
the data cannot be entirely loaded into computer memory. The robust regression
is done with a scalable least-squares regression model that learns a reliable set of
regression coefficients. Online and distributed algorithms are proposed for such
robust regression. The true regression coefficients are recovered with a constant
upper bound on the error of the state-of-the-art batch methods under the arbitrary
corruption assumptions that are not uniformly distributed in the training data mini-
batches provided to the online algorithm.
variables aligned to the scenario. These variables would in turn support the creation
of a knowledge base understanding the full range of outcomes given a specific set
of input variables defining the real-world scenario. By testing the machine learning
models across a wide range of scenarios, sensitivity analysis adds credibility to them
by informing data-driven decision-making toward tangible conclusions and optimal
decisions. Particular attention can be paid to algorithmic bias favoring rare features
with strong sensitivity toward probability estimation errors and adversarial noise
processes. Adversarial training refers to the incorporation of adversarial examples
into the training process of machine learning models. Adversarial training is
sensitive to not only the parameters but also the hyperparameters affecting the
training process. Duesterwald et al. [167] present a sensitivity analysis of the
hyperparameter landscape in the adversarial training of deep learning networks.
Hyperparameter optimization techniques are applied to tune the adversarial training
to maximize robustness while keeping the loss in accuracy within a defined budget.
Wexler et al. [646] develop a What-If Tool that allows analyses of machine learn-
ing systems to probe and visualize their inputs and outputs. It can be used to analyze
feature importance, test performance in hypothesis testing, and visualize model
behavior across multiple input datasets. Such a tool is of interest to practitioners of
machine learning to answer questions on the effect of adversarial manipulations to
data points on modeling predictions. It can also be used to analyze the distributional
robustness of machine learning model across data samples acting as training,
testing, and validation datasets. Users of the What-If Tool have a visual interface
to perform counterfactual reasoning, investigate decision boundaries, and explore
changes to predictions with respect to changes in data points. Thus, it supports the
rapid prototyping and exploration over multiple statistical hypotheses in adversarial
learning. Without access to the modeling details, generalizable explanations can be
generated for adversarial manipulations using such hypotheses in a model-agnostic
manner. This flexibility with explanations and their representations improves the
interpretability of adversarial learning. It can be combined with exploratory data
analysis processes to deal with complexity in input data types, modeling tasks,
and optimization strategies. The workflow for testing hypothetical scenarios in the
What-If Tool supports general sense-making around the data in addition to the
evaluation of performance metrics optimized toward fairness constraints on machine
learning. The data points in output predictions can be visualized with confusion
matrices and ROC curves.
In supervised machine learning, sensitivity analysis studies the probability of
misclassification due to weight perturbations in the learning model caused by
machine imprecision and noisy input. To validate the distributional robustness of
supervised learning on various inputs, sensitivity analysis has been extended into
optimization techniques for neural networks such as sample reduction, feature
selection, active learning, and adversarial learning. After discussing the geometrical
and statistical approaches to machine learning sensitivity analysis, Yeung et al. [685]
showcase its application in dimensionality reduction, network optimization, and
selective learning. Provided a neural network contains the optimal number of hidden
units and is able to construct optimal discriminating boundaries between classes, it
can be used for feature extraction and rule induction with sensitivity analysis on
informative patterns expressed in terms of the decision boundaries. Engelbrecht
et al. [174] propose a sensitivity analysis on neural network decision boundaries
and present their visualization algorithms. The dynamic patterns discovered from
the sensitivity analysis are used in a selective learning algorithm. Thus, we can
extract accurate rules from trained neural networks. Patwary et al. [486] conduct a
sensitivity analysis-based investigation of semi-supervised learning. A divide-and-
conquer strategy based on fuzziness in the training dataset is shown to improve
the performance of classifiers. Here, a classifier classifies an instance to a class
with a degree of belief on the extent to which the instance belongs to the specific
class. In an initial training step, a classifier is trained on a small volume of training
data with class labels. In a final training step, a large volume of unlabeled data is
used for assigning each data point to one of several class labels. The classifier’s
generalization ability on unseen validation datasets is interlinked with the fuzziness
of a classifier in arriving at its prediction accuracy. Low fuzziness samples from
testing dataset are added to the original training dataset to retrain the learning model
with improved accuracy. Resampling methods are used to study the generalization
error bounds. Such a theory of learning from noisy data can be used to build semi-
supervised learning classifiers involving learning methods such as self-training,
co-training, multi-view learning, expectation maximization with generative mixture
models, graph pattern mining, and transductive SVM.
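A minimal sketch of the fuzziness-driven self-training step described above is given below, under simplifying assumptions: a logistic regression classifier, prediction entropy as the fuzziness measure, and a synthetic dataset, none of which come from the cited work.

```python
# A minimal sketch of fuzziness-based self-training: train on a small labeled
# set, score unlabeled samples by the entropy ("fuzziness") of the predicted
# class distribution, and add the lowest-fuzziness samples with their
# pseudo-labels before retraining.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_lab, y_lab = X[:100], y[:100]          # small labeled set
X_unl = X[100:]                          # large unlabeled pool

clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
proba = clf.predict_proba(X_unl)
fuzziness = -np.sum(proba * np.log(proba + 1e-12), axis=1)   # prediction entropy

low_fuzzy = np.argsort(fuzziness)[:200]                      # most confident samples
X_aug = np.vstack([X_lab, X_unl[low_fuzzy]])
y_aug = np.concatenate([y_lab, clf.predict(X_unl[low_fuzzy])])  # pseudo-labels

clf_retrained = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
```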
Suresh et al. [585] propose risk-sensitive loss functions for multi-category classification problems that minimize both the approximation error and the estimation error. Here, the approximation error depends on the closeness of the prediction to the actual classifier, and the estimation error depends on the closeness of the estimated input distribution to the underlying input distribution. The error analysis incorporates the risk-sensitive loss functions into a neural network classifier architecture. Performance on imbalanced training samples is compared against other well-known loss functions that approximate the posterior probability. The proposed loss functions improve the overall and per-class classification accuracy. Such risk-sensitive loss
functions on the decision performance are required to extend the results of binary
classification to multi-category classification problems. In neural networks, the
classifier employs loss functions that minimize the expected misclassification for
all classes. Here, the risk-sensitive loss functions measure the confidence level
in the class label prediction and corresponding risk associated with the action
behind every classifier decision. The cost of misclassification is fixed a priori.
In game theoretical modeling, such misclassification costs can be incorporated
into the design of adversarial payoff functions. In multi-category classification
problems, the adversarial payoff functions must be able to deal with strong overlap
between classes in sparse data and high imbalance in samples per class. Neural
network architectures then find joint probability distributions over the observation
data to arrive at an accurate estimation of the desired coded class label. The risk
factors on estimated posterior probability in the loss function designs penalize
the misclassification patterns and their costs. The training process of the classifier
is guided by both a confusion matrix and a risk matrix. Difficulty in obtaining
the ground truth about original classes in the input data distribution increases the
computational complexity in model development.
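As one hedged illustration of how a fixed a priori misclassification cost matrix can enter a classifier's decision rule, the sketch below chooses the label with minimum expected cost under estimated posteriors; this is an illustrative construction, not the exact risk-sensitive loss of Suresh et al. [585].

```python
# A minimal sketch of decision-making with a fixed misclassification cost
# matrix: choose the label with the lowest expected cost under the estimated
# posterior probabilities.
import numpy as np

# cost[i, j] = cost of predicting class j when the true class is i
cost = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 2.0],
                 [8.0, 2.0, 0.0]])

def min_expected_cost_decision(posteriors, cost_matrix):
    """posteriors: (n_samples, n_classes) estimated P(class | x)."""
    expected_cost = posteriors @ cost_matrix      # (n_samples, n_classes)
    return np.argmin(expected_cost, axis=1)

posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.3, 0.6]])
print(min_expected_cost_decision(posteriors, cost))  # risk-averse label choices
```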
Cortez et al. [137] propose visualization for extracting human understandable
knowledge from supervised learning black-box data mining models using sensitivity
analysis. The data mining models considered are neural networks, support vector
machines, random forest, and decision trees. Sensitivity responses are used to
create measures of input importance for both regression and classification tasks
in data mining. Such sensitivity-augmented regression trends and classification
patterns discovered by data mining can be used to improve data-driven decision-
making in the real world. Here, the data-driven analytics model learns an unknown
underlying function that maps several input variables to one output target within
a supervised learning paradigm. Interpretability of the data mining models can
be improved with feature engineering strategies such as extraction of rules and
multidimensional visualization techniques. The proposed sensitivity analysis treats
the machine learning models as black boxes to query them with sensitivity samples
and record the obtained responses. It does not utilize additional information such
as the model fitting criteria and feature importance attributions. The rationale
for sensitivity analysis is that a relevant input should produce substantial output
changes when varying its input levels. Such input relevance is quantified by using a
sensitivity measure. A baseline vector is proposed to capture input interactions with
less computational effort. Sensitivity measures model the target outcome with either
output class labels or the probability of classes. The total area under the receiver
operating characteristic curve (AUC) calculation and the ensemble sampling meth-
ods are used as the sensitivity measure in multi-label classification tasks. Regression
errors such as the mean absolute error (MAE) are used as the sensitivity measure in
multivariate regression tasks. The sensitivity measures are first computed for each
individual class, and then a weighted average is performed to compute a global
sensitivity measure. The effects of these sensitivity measures on tree ranking and
hyperplane separation are then studied in data visualizations for various inputs in
the cross-validation experiments estimating the data mining performance metrics.
Several new sensitivity analysis methods, measures, aggregation functions, and
visualization techniques are proposed. Feature selection methods can be further
designed to improve the relative ranking in the sensitivity measures to guide the
search through the variables relevant for data mining tasks at hand.
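The query-and-record rationale can be illustrated with a one-at-a-time sensitivity sweep around a baseline vector; the model, the data, and the use of the response range as the sensitivity measure below are simplifying assumptions rather than the exact measures of Cortez et al. [137].

```python
# A minimal sketch of black-box, one-at-a-time sensitivity analysis: hold
# inputs at a baseline vector, sweep one input over L levels, and use the
# range of the model response as a simple sensitivity measure.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

baseline = np.median(X, axis=0)          # baseline vector for all inputs
levels = 7                               # number of query levels per input

def sensitivity(model, X, baseline, levels):
    scores = []
    for j in range(X.shape[1]):
        grid = np.linspace(X[:, j].min(), X[:, j].max(), levels)
        queries = np.tile(baseline, (levels, 1))
        queries[:, j] = grid             # vary only input j
        responses = model.predict(queries)
        scores.append(responses.max() - responses.min())   # range as sensitivity
    return np.array(scores)

print(sensitivity(model, X, baseline, levels))  # larger = more relevant input
```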
Engelbrecht [173] performs a sensitivity analysis of the decision boundaries learned
by a neural network output function with respect to input perturbations. A sen-
sitivity analysis of each hidden unit activation function reveals which boundary
is implemented by which hidden unit. Here, a decision boundary is treated as
the region of uncertainty in the classification of input feature space. A unique
discriminating decision boundary can be obtained by pruning an oversized network
that is overfitting the data, by growing an undersized network that is underfitting
the data, or by adding regularization terms to the objective function for machine
learning. Optimal decision boundaries lead to a statistically good generalization
performance of classifiers produced with reference to the equations for decision
boundaries. Data sampling methods on modeling outputs are used to locate decision boundaries on unseen data. Finally, the prediction uncertainties are decomposed and explained
in the input domain to improve the validation and interpretation of deep learning
models. The notion of interpretability in this chapter is restricted to understanding
model predictions in terms of simple actionable constructs on the input features.
The experimental results are compared with conventional statistical modeling that
adopts a Bayesian inferencing pipeline on pre-trained models for deep learning with
point estimates. Such approaches may not be able to deal with out-of-distribution
test samples. Prediction uncertainties are classified into epistemic uncertainty (also known as model uncertainty), which can be explained away given enough training data, and aleatoric uncertainty, which depends on noise or randomness in the input sample.
To address aleatoric uncertainty assuming a prior distribution on the inputs, the
authors include mean and variance estimates for the prediction in the sensitivity
analysis. When such sensitivity measures are used as regularizers for the learning
objectives, they are said to lead to better generalization performance. At the same
time, features that contribute maximally to the model uncertainty are tracked. Then
a model that assigns uncertainties to reliable outputs suggests learning problems in
either the training process or the input data. The output of deep learning is taken
to be the prediction of a continuous regression response variable. A conditional
likelihood-based loss function is selected to train the neural network toward such
a response. Feature sensitivities are computed with a first-order Taylor expansion
of the model’s decision function decomposed into relevance scores for each input
feature. The feature sensitivities then regularize the conditional entropy centered
on the critical parameters in the deep neural network’s loss function and training
process. In experimental evaluation of the deep neural networks, the proposed
approach produces improved validation performance compared to baseline models
that did not take uncertainties into account. The effect of masking insensitive
features is calculated from the R-squared statistic measuring the prediction variance
in the dependent variable for regression from each of the independent variables.
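A minimal sketch of gradient-based feature sensitivities via a first-order Taylor expansion is shown below, assuming PyTorch and a toy regression network; it illustrates the decomposition into per-feature relevance scores, not the full uncertainty-regularized training objective of the cited work.

```python
# A minimal sketch of first-order Taylor feature sensitivities: the gradient
# of the model response with respect to the input decomposes a prediction
# into per-feature relevance scores.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(1, 10, requires_grad=True)    # a single input sample
y = model(x).sum()                            # scalar regression-style response
y.backward()                                  # gradient of the response w.r.t. the input

relevance = (x.grad * x).squeeze()            # first-order Taylor contributions per feature
sensitivity = x.grad.abs().squeeze()          # magnitude-only sensitivity scores
print(relevance, sensitivity)
```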
Zhang et al. [699] conduct a sensitivity analysis of one-layer convolutional
neural networks (CNNs) for sentence classification. It is unknown how CNNs are
dependent on unexpected changes to input word vector representations, filter region
size, activation functions, pooling strategy, regularization parameters, number of
feature maps, hyperparameters, and other free parameters in the model architecture.
The search space of all possible model architectures is huge. SVMs for sentence classification are used as baseline models for improving the CNN results. The
experimental results can be used to guide hyperparameter optimization techniques
such as grid search, random search, and Bayesian optimization. As part of spam
detection in the real world, Pruthi et al. [498] conduct a sensitivity analysis of word
recognition with recurrent neural networks (RNNs) in the presence of adversarial
misspellings. Adversarially crafted spelling mistakes are created in attack scenarios such as dropping, adding, and swapping internal characters within word inputs, so that the text classifier has to deal with adversarial edits. Experiments demonstrate
that an adversary can degrade the classifier’s performance to that achieved by
random guessing. To limit the number of different inputs to the classifier, the
sensitivity analysis reduces the number of distinct word recognition outputs that
an adversary can induce. Thus, the learning objective is to design a low sensitivity
system with a low error rate. Helton et al. [269] review sampling methods for
sensitivity analysis such as random sampling, importance sampling, and Latin
hypercube sampling. They help in the construction of distributions to characterize
stochastic and subjective uncertainty in the adversarial datasets that propagates
through machine learning models to eventually affect the model predictions.
The sensitivity analysis procedures reviewed include examination of scatterplots,
regression analysis, correlation and partial correlation, rank transformations, and
identification of non-monotonic and non-random patterns.
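Latin hypercube sampling, one of the reviewed schemes, can be sketched directly in NumPy so that the per-dimension stratification is explicit; the implementation below is a generic construction, not code from [269].

```python
# A minimal sketch of Latin hypercube sampling: one stratum per sample in
# each dimension, jittered within the stratum, then shuffled per dimension.
import numpy as np

def latin_hypercube(n_samples, n_dims, seed=0):
    rng = np.random.default_rng(seed)
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])       # decorrelate the dimensions
    return u                                     # points in the unit hypercube [0, 1)

samples = latin_hypercube(100, 3)
print(samples.shape, samples.min(), samples.max())
```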
Xu et al. [670] derive generalization bounds for machine learning algorithms
based on their robustness properties. Here, generalization error can be understood
as an estimation of the risk of learning algorithms. It is empirically measured
in terms of performance errors on the training dataset. Complexity measures on
supervised learning bound the gap between the expected risk and the empirical
risk by the complexity of the hypothesis set for machine learning. They include
Vapnik-Chervonenkis (VC) dimension, Kolmogorov complexity, and Rademacher
complexity. Xu et al. [670] define algorithmic robustness with reference to a min-
max optimization objective found in the theory of robust optimization. Informally,
a learning algorithm is robust if it achieves “similar” performance on a testing
sample and a training sample that are “close.” Many machine learning models such
as LASSO regression, support vector machines, and deep neural networks can be
reformulated to have learning objectives in such a robust optimization framework
to target minimizing the empirical performance error under the worst possible
input perturbation in some properly defined uncertainty set for optimization. Here,
the generalization ability of learning algorithms can be investigated in terms of
the expected value of loss function of the learned hypotheses on samples that
statistically deviate from training samples. In adversarial training settings, such an
expected loss is customized for minimizing the feature manipulations in adversarial
learning and misclassification costs in game theoretical modeling. In other analyses
of machine learning models, the expected loss can be tailored for metric learning,
transfer learning, reinforcement learning, and learning with outliers. The general-
ization bounds on expected loss derived from the proposed algorithmic robustness
framework can handle transfer learning setups due to mismatched datasets in
domain adaptation. They can be extended into investigations on robustness of
unsupervised and semi-supervised learning algorithms.
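The min-max structure of robust optimization can be sketched as adversarial training with a single-step inner maximization; the FGSM-style ascent below is an illustrative simplification and an assumption, not the formulation analyzed in [670].

```python
# A minimal sketch of the min-max structure behind robust optimization: an
# inner step approximately maximizes the loss inside an epsilon-ball (a
# single-step, FGSM-style ascent), and the outer step minimizes the loss at
# that worst-case perturbation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
eps = 0.1                                     # radius of the uncertainty set

x = torch.randn(32, 20)
y = torch.randint(0, 2, (32,))

# inner maximization: approximate worst-case perturbation inside the uncertainty set
x_adv = x.clone().requires_grad_(True)
loss_fn(model(x_adv), y).backward()
x_adv = (x + eps * x_adv.grad.sign()).detach()

# outer minimization: update the learner at the worst-case point
opt.zero_grad()
loss_fn(model(x_adv), y).backward()
opt.step()
```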
To correct mistakes in classification settings, Asif et al. [18] create cost-sensitive
classifiers that can be penalized on application-dependent predicted and actual
class labels. A robust min-max game theoretical approach produces classifiers
that minimize the cost of mistakes in classification as a convex optimization
problem. Such an optimization is tractable in comparison to the NP-hard empirical
risk minimization approaches that address the cost of mistakes in cost-sensitive
classification with non-convex loss functions. This is the approach to minimize
expected cost in robust machine learning. It directly minimizes the cost-sensitive
loss on an approximation of the training data. The proposed zero-sum game is
solved using linear programming. The machine learning performance evaluation
is done on a confusion cost matrix. Cost sensitivity can be used for reweighting
available training data, incorporating confusion costs into the formulation of the
classifier, and boosting an ensemble of weak classifiers to produce a cost-sensitive
learner. Further, the loss functions for training processes can directly incorporate
cost sensitivity into multiclass generalizations of binary classifiers. An adversarial
learning perspective on cost sensitivity brings an added dimension of classification
modeling, statistical estimation, and decision-making under uncertainty. Here, the
relevant adversarial learning methods include maximin model of decision-making
as a sequential adversarial game, mini-max optimization of the regret of decisions,
statistical estimates under uncertainty that minimize worst-case risk, and maximum
entropy models using the logarithmic loss on exponential family distributions.
Probability distributions are estimated as the solutions of such min-max games.
Cost-sensitive learning that incorporates such adversarial learning becomes more
robust not only to distributional shifts in the dataset but also to uncertainty due
to conditional distributions over labels in the loss function. Without assuming any
closed-form equation in parametric forms for the given data, this approach allows us
to incorporate training data properties and conditional data distributions as classi-
fication constraints due to the adversary’s conditional label distribution. Viewing
the cost-sensitive classification task as a two-player game between an estimator
and an adversary constrains the adversary to choose data manipulation distributions
that match a vector of moment statistics of the underlying input distribution. The
computational complexity of the estimator implicitly grows with the dimensionality
of such constraints. Thoughtful feature selection and regularization can avoid such
issues.
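The earlier remark that the zero-sum game is solved by linear programming can be made concrete with the generic construction below, using SciPy's linprog on a toy payoff matrix; it is an illustration of the standard minimax LP, not the exact program of Asif et al. [18].

```python
# A minimal sketch of solving a two-player zero-sum matrix game by linear
# programming: find the row player's mixed strategy maximizing the game value.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0, 3.0],
              [0.0,  2.0, -2.0]])      # payoff to the row player
m, n = A.shape

# variables: [p_1, ..., p_m, v]; maximize v  <=>  minimize -v
c = np.concatenate([np.zeros(m), [-1.0]])
# constraints: for every column j,  v - sum_i p_i A[i, j] <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)
A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # probabilities sum to 1
b_eq = [1.0]
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
p, v = res.x[:m], res.x[m]
print("row strategy:", p, "game value:", v)
```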
Lundberg et al. [401] present a framework called SHAP (SHapley Additive
exPlanations) for interpreting the predictions in deep learning. An explanation of
a model’s prediction is taken to be a model itself. It is called explanation model and
defines a class of additive feature attribution methods. Game theoretical modeling
then guarantees a unique solution for the entire class of additive feature attribution
methods. Such explanation models use simplified inputs that map to original inputs
through a mapping function. They are solved as penalized linear regression models.
Shapley regression values from cooperative game theory are used to find the feature
importances for linear models in the presence of multicollinearity. Samek et al. [538]
introduce the need for explainable artificial intelligence in AI domains such as image
classification, sentiment analysis, speech understanding, and strategic game playing.
The latest developments for visualizing, explaining, and interpreting deep learning
models are then surveyed. Two sensitivity analysis methods are presented for
explaining predictions in black-box deep learning classifiers. One method computes
sensitivity of prediction with respect to changes in the input. Another method
decomposes the decision in terms of input variables. Das et al. [144] review the
explainable artificial intelligence (XAI) landscape. A taxonomy of XAI techniques
is provided. Their usage to build trustworthy, interpretable, and self-explanatory
deep learning models is surveyed. In addition to the creation of adversarial examples
in misleading classifier decisions, XAI must have a feature engineering realization
centered around ethical, judicial, and security reasons. “Interpretability” is defined as a
This chapter summarizes the game theoretical strategies for generating adversarial
manipulations. The adversarial learning objective for our adversaries is assumed
to be to inject small changes into the data distributions, defined over positive and
negative class labels, to the extent that deep learning subsequently misclassifies
the data distribution. Thus, the theoretical goal of our adversarial deep learning
process becomes one of determining whether a manipulation of the input data
has reached a learner decision boundary, i.e., where too many positive labels have
become negative labels. The adversarial data is generated by solving for optimal
attack policies in Stackelberg games where adversaries target the misclassification
performance of deep learning. Sequential game theoretical formulations can model
the interaction between an intelligent adversary and a deep learning model to gen-
erate adversarial manipulations by solving a two-player sequential non-cooperative
Stackelberg game where each player’s payoff function increases with interactions
to a local optimum. With a stochastic game theoretical formulation, we can then
extend the two-player Stackelberg game into a multiplayer Stackelberg game with
stochastic payoff functions for the adversaries. Both versions of the game are
resolved through the Nash equilibrium, which refers to a pair of strategies in
which there is no incentive for either the learner or the adversary to deviate from
their optimal strategy. We can then explore adversaries who optimize variational
payoff functions via data randomization strategies on deep learning designed for
multi-label classification tasks. Similarly, the outcome of these investigations is an
algorithm design that solves a variable-sum two-player sequential Stackelberg game
with new Nash equilibria. The adversary manipulates variational parameters in the
input data to mislead the learning process of the deep learning, so it misclassifies
the original class labels as the targeted class labels. The ideal variational adversarial
manipulation is the minimum change needed to the adversarial cost function
of encoded data that will result in the deep learning incorrectly labeling the
decoded data. The optimal manipulations are due to stochastic optima in non-
convex best response strategies. The adversarial data generated by this variant
The ideas of two-player sequential games (or Stackelberg games) and multiplayer cooperative games have been employed as game theoretical frameworks for training adversarial learning algorithms. To search for equilibrium in such games
is equivalent to solving a high-dimensional optimization problem. The eventual
model performance is then estimated by stochastic optimization methods based
on computationally efficient heuristic search algorithms. So long as the objective
function is bounded, global optimization methods such as genetic algorithms,
simulated annealing, and stochastic hill-climbing can be applied to search for the
convergence criteria that lead to subgame perfect equilibria.
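A minimal sketch of simulated annealing as such a global search heuristic over a bounded payoff function is given below; the toy payoff is an assumption standing in for an adversary's objective.

```python
# A minimal sketch of simulated annealing as a global search heuristic over
# a bounded, multimodal payoff function.
import numpy as np

def payoff(x):
    return -np.sum(x ** 2) + np.sum(np.cos(3 * x))   # bounded toy payoff

def simulated_annealing(payoff, dim=5, iters=5000, t0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2, 2, dim)
    best_x, best_p = x, payoff(x)
    for k in range(iters):
        t = t0 / (1 + k)                              # cooling schedule
        cand = x + rng.normal(scale=0.1, size=dim)    # local random move
        delta = payoff(cand) - payoff(x)
        if delta > 0 or rng.random() < np.exp(delta / t):
            x = cand                                  # accept uphill or lucky downhill move
        if payoff(x) > best_p:
            best_x, best_p = x.copy(), payoff(x)
    return best_x, best_p

print(simulated_annealing(payoff))
```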
Globerson et al. [220] discuss a classification algorithm with a game theoretical
formulation. The proposed algorithm is robust to feature deletion according to
a min-max objective function optimized by quadratic programming. In Liu et
al. [385], the interactions between an adversary and data miner are modeled as a
two-player sequential Stackelberg zero-sum game where the payoff for each player
is designed as a regularized loss function. The adversary iteratively attacks the data
miner using the best possible strategy for transforming the original training data.
The data miner independently reacts by rebuilding the classifier based on the data
Game theory provides the mathematical tools to model the behaviors of the defender and the adversary in machine learning in terms of defense and attack
strategies. Game theoretical adversarial learning takes into account the tradeoff
made by the attacker between the cost of adapting to the classifier and the benefit
from the attack. On the other hand, the tradeoff made by the defender balances
between the benefit of a correct attack detection and the cost of a false alarm. The
optima in adversarial learning are able to determine what suitable strategy is needed
to reduce the defender’s loss from adversarial attacks. Strategic interactions between
payoff functions for both players reflect the relative ranking of each player’s
application scenario in terms of the final outcome expected in machine learning.
Stackelberg games are usually used to model the strategic interactions assuming
rational agents in markets on which there is some hierarchical competition. The
search space of strategies for each player in a game is normally assumed to be
bounded and convex, and the corresponding payoff function is assumed to be
differentiable. The equilibrium solution for all payoff functions in the game is
determined by the solution to an optimization objective function. Game theory
has application in economics, political theory, evolutionary science, and military
strategy.
Webb et al. [639] present an introduction to game theory. It covers the decision
models and decision processes for determining a rational agent participating in
static games, dynamic games, and evolutionary games. Various notions of game
theoretical equilibrium lead to either descriptive or prescriptive notions of data
analytics. In formulating a game, we have to define the players, actions and
information available to the players, timing information about interaction between
players that are either simultaneous or sequential, order of play and any repetitions
in the interactions, payoffs to various players as a result of interaction, and
estimations on the costs-benefits of each set of potential choices for all players.
Osborne et al. [476] design a textbook for a graduate course in game theory. It
covers strategic games, extensive games, and coalitional games.
Leyton-Brown et al. [352] define game theory as the mathematical study of
interaction among independent, self-interested agents. The main classes of games,
their representations, and the main concepts used to analyze them are summarized.
A utility theory is developed to model an agent’s interests and preferences across
a set of available alternatives in games such as normal form games, extensive-form
games, imperfect-information games, repeated games, stochastic games, Bayesian
games, and coalitional games. Agents faced with uncertainty in the learning
environment then define the expected value of the utility function with respect to
the appropriate probability distribution over states. In a simple manner, utility can
be interpreted as the amount of happiness an agent (player) gets from a particular
outcome or payoff. Game theoretical outcomes of interest in the machine learning
system can be categorized within subsets of possible outcomes as solution concepts
such as Pareto optimality and Nash equilibrium. So strategic games
in machine learning ought to model the adversarial learning components such
as a set of players, a set of strategies for each player, and a payoff function
indicating desirable outcomes in the game for a player. The game theoretical
context (due to skill or strategy) of decisions for each player is useful for analyzing
the data-driven decision-making made by machine learning systems under risk.
It has been applied to social sciences to create a rational choice theory as an
adaptation of the philosophy of methodological individualism for maximizing the
utility/currency/value of individual actions among collective behaviors. A choice is
considered to be “rational” in economics if it leads to a preference ranking over a set
of items characterizing the alternatives for a decision-maker where all comparisons
are consistent. More advanced consistency notions ought to account for uncertainty
in the learning environments and decision-making over time when a player does not
have precise information and cognitive ability about the outcomes of the choices
and comparisons between them, respectively. So the decision-making processes
in rational choice theory have to be validated on an empirical basis for ideas
such as “reason,” “preferences,” “rationality,” and “learnability” with useful formal
mathematical properties in the machine learning systems.
Depending on the player’s interactions in the game theoretical objectives, the
equilibrium solution is called either a Stackelberg equilibrium or a Nash equilib-
rium. In our research, we apply such a game theoretical modeling to supervised
machine learning problems inferring the decision boundaries and corresponding
data distributions in training data samples and validation data samples. We represent
the costs of participating in the game in terms of the misclassification performances
and retraining costs in deep learning. Evolutionary search and optimization algo-
rithms are used to solve the game to find adversarial manipulations to training data.
Depending on the importance of the costs incurred to generate an attack and to
retrain the classifier, we find different equilibria solutions for the game. Further, we
assume a black-box attack scenario where the adversary is unable to observe the
classifier’s strategies before choosing its strategy. We then experiment with variants
in the attack scenarios where the defender’s utility losses in the game are inferior to
the utility of the adversary in a non-zero sum game.
Our research extends into Bayesian game models in which players have incom-
plete information about other players. This is more likely as the defender might
not know the exact cost of generating adversarial data and the attacker might not
know the exact classification cost for the defender. They only have beliefs about
these costs. This modeling approach transforms games of incomplete information
into games of imperfect information. Thus, adversarial learning techniques that rely
on a game theory-based framework can be relevant as it models behaviors of the
learner and the adversary based on the benefits and costs incurred for retraining the
model and generating an attack. Training a classifier with adversarial examples in
synthetic adversarial data is similar to regularization of the classifier. In this context,
game theory provides useful tools to model the behavior of the adversary and the
learner as it includes, on the one hand, the benefit for the adversary to attack and
the cost to generate the adversarial data and, on the other hand, the costs of the
learner to update the model. Thus, game theory-based approaches cast light on the
tradeoff that adversaries and learners both make and can be used to assess the risks of
implementing a specific cybersecurity technology for data-driven decision-making.
From a computational point of view, decision-making procedures can be encoded
into algorithms and heuristics simulating rationality effectively. One important way
to study rationality is to propose agents based on the assumptions adopted by different
algorithms and heuristics. We can then study the equilibrium of a market of
interactions between such agents as quantification of the impact of algorithms and
heuristics in the data analytics models. Game theory can be used to derive the market
equilibrium mathematically. Such agent-based studies also involve computational
intelligence. Because searching for optimal algorithms and heuristics is usually computationally intractable due to combinatorial explosion, our ability to effectively use the
available computational power to find a good solution is determined by the computa-
tional intelligence algorithms that we implement. The level of optimality that we can
reasonably achieve in an agent-driven paradigm then defines our effective rationality
in a machine learning problem. Decision procedures in computational intelligence
can be evolved with evolutionary learning algorithms. They are able to separate
domain-specific knowledge from the reasoning mechanism. Thus, rationality in
the real world can be studied within the context of decision problems in com-
putational intelligence. By visualization of data transformations in computational
intelligence including analysis of perturbation data, fuzzy membership functions can
be designed to mitigate the effect of outliers and perturbations on the classification
decision boundaries by penalizing training errors differently.
Such game theoretical analysis is applicable to markets that are the outcome of strategic interactions rather than stochastic natural processes, such as
digital markets, cloud computing, energy marketplaces, and crowdsourcing systems.
Here, computing machines create strategic interactions through communication and
commerce. Resultant statistical inference problems in game theoretical optimization
can benefit from econometrics literature on parametric inference from observed
strategic interactions.
Halpern et al. [248] survey the main themes at the intersection of game theory and
computer science. The computational complexity of modeling bounded rationality
is analyzed with reference to algorithmic mechanism design in game theory. Such
mechanism design has application in combinatorial auctions for voting mechanisms,
spectrum auctions, airport time slots, and industrial procurement. Here, a “mecha-
nism” is a protocol for interactions between players to determine the solution for an
underlying optimization problem. A complex dependence exists between elicited
data and specified behavior in a mechanism. In general, algorithmic game theory
differs from microeconomics in terms of focusing on the optimization problems
with optimal solutions, impossibility results, feasible approximation guarantees, etc.
in Internet-like networks. Narahari et al. [448] write about the application of game
theory and mechanism design to problem-solving in engineering, computer science,
microeconomics, and network science. Illustrative examples are provided for the
key ideas of mechanism design such as social choice theory, direct mechanisms,
and indirect mechanisms. Narahari et al. [449] is another research monograph on
mechanism design theory. Optimal mechanisms are described as a research direction
to optimize a performance metric such as the adversarial payoff functions in
game theoretical adversarial deep learning. Cost-sharing mechanisms are proposed
as a protocol to design computationally efficient adversarial cost functions with
incentives and budgets. Iterative mechanisms can be used to reduce the cost of
computing valuations and allocations in game theoretical adversarial deep learning.
Further research on game theoretical learning can be found in proceedings of
conferences such as Decision and Game Theory for Security (GameSec) and Logic
and the Foundations of Game and Decision Theory (LOFT).
for linear ranking, Boltzmann selection, and tournament selection of the features.
Boltzmann selection converges onto a polymorphic Nash equilibrium according
to a point attractor from chaos theory in theoretical physics and dynamical
systems. Polymorphic Nash equilibria are Nash equilibria for mixed strategy
games expressing polymorphic data populations. Co-evolutionary algorithms
are understood as a search method or a problem-solver and a model of a
dynamical system. Ficici et al. [191] examine selection methods such as
fitness-proportional, linear rank, truncation, and ES selection in the context of two-
population co-evolution with game theoretical learning. The selection methods add
regions of phase space that lead to cyclic dynamics in non-Nash attractors.
Herbert et al. [271] apply game theory techniques to assess the optimization
quality of competitive learning clusters with a self-organizing map (SOM). SOMs
offer a flexible robustness model for clustering with several configurable aspects
in many different applications. They can take advantage of dynamic and adaptive data
structures to decide the neuron updates in competitive learning with reference to
several performance measures and selection criteria in machine learning. Here,
game theoretical learning is used to improve the quality of updates not only to a single neuron but also to entire neuron clusters with a training algorithm called
GTSOM. Garg et al. [210] apply game theoretical techniques to feature clustering.
Features are viewed as rational players in a coalitional game where the coalitions
are the clusters. Clusters are then formed to maximize individual payoffs at the
solution concept called Nash stable partition (NSP). NSP is solved by an integer
linear program (ILP). ILP is modified into a hierarchical clustering approach to
find clusters over a large number of features. Thus, game theory is used in feature
selection to distinguish between relevant and irrelevant features and substitutable
and complementary features.
Shah et al. [550] survey game models in privacy preservation, network secu-
rity, intrusion detection, and resource optimization. Game theory is one of the
approaches to privacy-preserving data mining (PPDM). PPDM has utilized asso-
ciation rules to achieve privacy-preserving distributed association rule mining
(PPRADM) algorithms. Game theory can also be used to design the tradeoffs
between data utility and privacy preservation with a sequential game model. Privacy
games can be designed from Cooperative Game Theory to create Cooperative
Privacy in coalitions. Game theory can be used in the analysis of network attacks
such as browser attacks, denial-of-service (DDoS) attacks, worm attacks, and
malware attacks. Bayesian honeypot game models have been proposed for solving
the problems caused by distributed denial-of-service attacks. Stackelberg models
have been proposed for network hardening problems where the defender optimally
adds honeypots in the network to detect the attacker. Dynamic game models have
been proposed for intrusion detection systems (IDS) in ad hoc wireless networks.
IDS optimizations can be categorized into resource allocation optimization, IDS
configuration optimization, and countermeasure optimization. Resource allocation
optimization problems are concerned with optimization of network link sampling,
resource sharing between nodes, cluster defense strategy in sensor networks, etc.
IDS configuration optimization is concerned with optimization of IDS sensitivity,
items while minimizing the total number of prediction errors. A boosting algorithm
is proposed to combine the learning models obtained from multiple runs across
multiple data distributions. It combines the several selected hypotheses into a final
hypothesis with arbitrarily small error rate. The generalization error of the final
hypothesis can be bounded with reference to the VC theory of computational
learning theory.
Nelson et al. [456] study how an adversary can efficiently query a classifier.
Undetected adversarial examples are crafted at minimum cost to the adversary using
polynomial number of queries in the training feature space. Thus, a cost-sensitive
adversary can discover blind spots of a detector by observing the membership
query responses of the detector for negative labels to construct low-cost adversarial
examples that have maximum impact on the detector’s intended performance.
This problem of finding low-cost negative instances with few queries is termed
the problem of near-optimal evasion. The targeted classifiers are called convex-
inducing classifiers. They include linear classifiers and anomaly detectors that learn
hypersphere decision boundaries. There is no need to reverse engineer the decision
boundary of the classifier. The adversarial objective of query-based optimization is
comparable but not identical to the research area of active learning. The adversary’s
notion of utility to craft adversarial examples is represented by an adversarial
cost function. Lanckriet et al. [344] analyze misclassification probabilities of the
correct classification of future data points in a worst-case setting for classifiers. The
resultant minimax problem is interpreted geometrically as minimizing the maximum
of the Mahalanobis distances between two classes in binary classification problems
optimized by quadratic programs. Classifier robustness is defined on the estimation
errors of means and covariances of the classes. It is found to be competitive with
non-linear classifiers such as support vector machines. This is a discriminative
approach to measuring the adversarial robustness of classifiers. It can be contrasted
with generative approach that makes distributional assumptions about the class-
conditional densities in the adversarial data to estimate and control the relevant
probabilities.
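The geometric quantity behind this worst-case view can be illustrated with a Mahalanobis-type distance between two estimated class distributions; the sketch below is purely illustrative and does not implement the full minimax probability machine of Lanckriet et al. [344].

```python
# A minimal sketch of a Mahalanobis-type distance between two class means
# under their estimated covariances.
import numpy as np

rng = np.random.default_rng(0)
X_pos = rng.multivariate_normal([1.0, 1.0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X_neg = rng.multivariate_normal([-1.0, -0.5], [[1.5, -0.2], [-0.2, 0.8]], size=200)

mu_diff = X_pos.mean(axis=0) - X_neg.mean(axis=0)
cov_sum = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)

# squared Mahalanobis distance between the class means
d2 = mu_diff @ np.linalg.solve(cov_sum, mu_diff)
print("Mahalanobis distance between classes:", np.sqrt(d2))
```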
Asif et al. [18] discuss application-dependent penalties for mistakes between
predicted and actual class labels in robust classifiers. The cost of mistakes is
formulated as a convex optimization problem on the non-convex cost-sensitive
loss. This approach to adversarial robustness is contrasted with empirical risk
minimization on a tractable convex surrogate loss. But the statistical difference between the actual loss and its convex surrogate can lead to a statistically significant mismatch between the optimal parameter estimation under the surrogate loss function and the original performance objective. The penalties for mistakes are repre-
sented as a confusion cost matrix for classification tasks. In contrast to reweighting
methods and mistake-specific losses, the goal for supervised machine learning is to
that is contrasted with adversarial risk analysis. The proposed adversarial risk
analysis is a Bayesian decision analysis problem. It does not assume that game
theoretical agents share information about their beliefs and preferences according
to a common knowledge hypothesis in game theoretical frameworks. Adversarial
robustness strategies depend on whether generative or discriminative classifiers are used as base models. Monte Carlo (MC) simulation solves for the optimal attack.
Approximate Bayesian computation (ABC) techniques generate adversarial data
distributions. Binary classification problems are evaluated in evasion attack and
integrity-violation attack settings. Here, game theoretical frameworks can make
real-time inference about the adversary’s decision-making process at operation
time. The adversarial risk analysis is suitable for applications having computational
bottlenecks with possible changes in adversary’s behavior being incorporated
into classifier retraining. The proposed adversarial classification has cybersecurity
applications in automation processes found in spam detection, autonomous driving,
fraud detection, phishing detection, content filtering, cargo screening, predictive
policing, and terrorism.
Fawzi et al. [181] analyze the robustness of classifiers to adversarial perturbations
to derive upper bounds for adversarial robustness measures on the difficulty of the
classification task. The robustness measures depend on distinguishability measure
between classes. Adversarial instability is attributed to low flexibility of classifiers
in comparison to the difficulty of the classification task. A distinction is made
between robustness of a classifier to random noise and its robustness to adversarial
perturbations. In real-world classification tasks, weak concepts of adversarial
robustness correspond to partial information about the classification task, while
strong concepts capture the essence of fixed classification families such as piecewise
linear functions for the classification task. A more flexible family of non-linear
classifiers and a better training algorithm are found to achieve better robustness.
Experimental evaluation suggests that increasing depth of the neural network
helps with increasing its adversarial robustness but adding layers to an already
deep network only moderately changes the robustness. Biggio et al. [57] improve
classifier robustness with information hiding strategies that introduce randomness
in the decision function. It is used in a multiple classifier system architecture.
A game theoretical formulation between classifier and adversary is proposed for
creating better-performing adversary-aware classifiers. Lack of information about
the exact decision boundary leads to the adversary making too conservative or too
risky choices in deciding adversarial manipulations for a malicious pattern. Thus,
the classifier can benefit by increasing the uncertainty of the adversary. However,
excessive randomization can also lead to a drop in the performance of the selected
classifier. This tradeoff between the randomization strategies is analyzed with a
repeated game of strategies to allow the classifier to retrain according to the strate-
gies selected by the adversary. Schmidt et al. [542] analyze the sample complexity
of robust learning in state-of-the-art classifiers subject to adversarial perturbations.
It is important in analyzing the robustness properties of learning systems deployed
in safety- and security-critical environments. The sample complexity of standard
benign generalization of classifiers is compared with the sample complexity of
We may also enforce a prior distribution on the latent factors for coherent data
generation in supervised learning. The trustworthiness of such machine learning
in deployment can be simulated by computational optimization and statistical
inference problems in advanced analytics with game theoretical adversaries and
dynamical system control in black-box optimizations of the deep learning. We
can then analyze the bias-variance decomposition in adversarial payoff functions
to derive utility bounds for deep learning in a mistake bound framework for
cybersecurity. We study the existing adversarial cost functions with respect to
robustness bounds and privacy budgets in the sparse representation learning models
for adversarial learning. To derive reliable guarantees on the security of neural
networks, we conduct a data-driven adversarial machine learning security evalu-
ation at the intersection of software testing, formal verification, robust artificial
intelligence, and interpretable machine learning. Here, model complexity (otherwise
called generalization error) can be defined as the discrepancy between the out-
of-sample error and the in-sample error in formal verification applicable to the
cyber information processing methods in adversarial learning classification and
optimization problems.
Our game theoretical adversarial learning framework can automate the detec-
tion, classification, generation, and optimization of trustworthy machine learning
in the web and mobile apps. The generative representations in our adversarial
manipulations are able to quantify the security threats that exploit the active and
passive measurements in big data application domains. We model the malicious
activities of adversaries in game theoretical optimization objective functions. Then
the deep learning solutions in equilibrium are able to identify the information leaks
of private information available in the AI platforms. By modeling the information
leakage as loss functions in deep learning, we can formulate adversarial settings in a
game theoretical learning framework. We can include sensitive attack scenarios and
defense approaches from application domains such as biometric recognition into
such a learning theory framework. In this context, we can also explore the privacy-
enhancing technologies that impose controls on data sharing and collaborative
analytics in Internet measurements. Here, we can design data analysis, knowledge
discovery, and machine learning algorithms for data sharing frameworks. In them,
domain constraints can be modeled as adversarial cost functions, and design
constraints can be modeled as adversarial payoff functions. Security information can
be represented with complex networks. Then deep learning baselines can perform
in terms of data mining processes and machine learning features. Secure scalable
federated learning can also be implemented into distributed systems and database
systems. Our research in adversarial learning provides the frameworks to analyze
the security and privacy in machine learning. In a high-performance computing
infrastructure, we can implement it in tools and frameworks for serial algorithms,
parallel algorithms, and distributed computations of big data.
The strategy space for algorithmic randomization and data manipulation in our research is determined by the stochastic operators
in evolutionary algorithms and variational networks defining the attack scenarios.
The evolutionary search algorithms of our research can be extended into Markov
decision processes and cellular automata by an extension of the local optimization
procedures maximizing the adversarial payoff functions. Here, mixture density
networks can express conditional data distributions on latent variables and class
labels in the training data and adversarial data. We can also measure information
divergence between minimal representations of training data and adversarial data
feature embeddings with deep metric learning-based adversarial cost functions.
We may also enforce a prior distribution on the latent factors for coherent data
generation in supervised learning.
In Liu et al. [385], the interactions between an adversary and a data miner are modeled as a two-player sequential Stackelberg zero-sum game where the payoff for each player is designed as a regularized loss function. Each player’s move is based on the observation of the opponent’s last play. The adversary iteratively attacks the data miner using the best strategy for transforming the original training data. The data miner reacts by rebuilding the classifier based on its observations of the adversary’s modifications to the training data. The adversary’s strategy of play is determined independently by the adversary. The game is repeated until the adversary’s payoff does not increase or the maximum number of iterations is reached.
The maximin problem for optimization proposed in Liu et al. [385] is solved
without making assumptions on the distribution underlying training and testing
data. The empirical evaluation of the optimization algorithm is conducted on image
spam and text spam data. Different settings of loss functions yield different types of
classifiers such as logistic regression with log linear loss function and support vector
machines with hinge loss function. For the chosen loss functions, the optimization
objective is formulated as an unconstrained convex optimization problem. The
optimization problem is solved by the trust region method, minimizing the objective function on a constrained neighborhood in polar coordinates. At Nash equilibrium,
the solution of the maximin problem achieves the highest false negative rate and
lowest data transformation cost simultaneously. This leads to robust classification
boundaries at the test time. The weight vector computed at Nash equilibrium also
gives features that are more robust to adversarial data manipulations.
Liu et al. [387] propose an extension to Liu et al. [385] where a one-step game is used to reduce the computing time of the minimax algorithm. The one-step method converges to a Nash equilibrium by utilizing singular value decomposition (SVD). SVD gives orthogonal basis vectors, or singular vectors, acting as the “principal components” of the training data. Thus, the singular vectors characterize each type of class present in the training data. The label of a test instance is then taken in Liu et al. [385] to be the training class generating the smaller residue vector.
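The residue idea can be sketched as follows: per-class singular vectors span a class subspace, and a test point is labeled by the class whose subspace leaves the smaller residual. The synthetic data and the top-k truncation below are illustrative assumptions, not the implementation of [387].

```python
# A minimal sketch of SVD residue classification: label a test point by the
# class whose singular-vector subspace leaves the smaller residual.
import numpy as np

def class_basis(X_class, k=5):
    # right singular vectors of the class data act as its "principal components"
    _, _, Vt = np.linalg.svd(X_class - X_class.mean(axis=0), full_matrices=False)
    return Vt[:k]                       # top-k right singular vectors, shape (k, d)

def residue(x, basis, mean):
    xc = x - mean
    return np.linalg.norm(xc - basis.T @ (basis @ xc))   # distance to the class subspace

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, (100, 20))
X1 = rng.normal(2.0, 1.0, (100, 20))
B0, B1 = class_basis(X0), class_basis(X1)

x_test = rng.normal(2.0, 1.0, 20)
label = int(residue(x_test, B1, X1.mean(axis=0)) < residue(x_test, B0, X0.mean(axis=0)))
print("predicted class:", label)
```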
Fig. 4.1 A flowchart illustrating the variational Stackelberg game theoretical adversarial learning
To derive the payoff functions in the game, we assume that the adversary has no
knowledge of either the deep neural network layers or loss functions in the deep
learning model. Our proposed game theoretical optimization problems are solved
without making assumptions on the learner’s training and testing data distributions.
The strategy space for algorithmic randomization and data manipulation in our game
is determined by the stochastic operators in evolutionary algorithms and variational
networks defining the attack scenarios.
Figure 4.1 is a flowchart of our adversarial learning process that accounts for the
presence of a variational adversary in supervised learning [126]. The final outcome
of our adversarial learning is a CNN classification model CNN_secure (henceforth shortened as CNNs) that is robust to the adversarial attacks.
We generate the adversarial data in a two-player Stackelberg game between the
adversary and the classifier. The adversary creates a variational model by searching
for adversarial manipulations on encoded training data. Every statistical parameter
of the encoded training data is searched according to a simulated annealing (SA)
procedure. The aggregation of adversarial manipulations to all statistical parameters
in the encoded training data is optimized according to an alternating least squares
(ALS) procedure. The ALS optimization is invoked each time the adversary
generates adversarial data Xgen in the Stackelberg game. Xgen acts as a validation
data for the classifier under attack. For every Xgen , the classifier re-optimizes its
training weights to update itself.
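A runnable toy of this game loop is sketched below; the logistic regression learner and the crude random perturbation search are hypothetical stand-ins for the CNN and the SA plus ALS procedures of [126], so the sketch only illustrates the structure of the interaction and the payoff-based exit condition.

```python
# A toy sketch of the Stackelberg game loop: the adversary searches for a
# manipulation that raises its payoff (here the error rate on generated data),
# the learner retrains on original plus generated data, and the game stops
# once the payoff no longer increases.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def adversary_move(clf, X, y, trials=50, scale=0.5):
    """Crude search: pick the perturbation of positive-class samples maximizing error."""
    best_delta, best_val = None, -np.inf
    for _ in range(trials):
        delta = rng.normal(scale=scale, size=X.shape[1])
        X_gen = X + (y == 1)[:, None] * delta
        val = 1.0 - clf.score(X_gen, y)          # adversary's payoff: error rate
        if val > best_val:
            best_delta, best_val = delta, val
    return best_delta, best_val

best_payoff = -np.inf
for _ in range(20):
    delta, current_payoff = adversary_move(clf, X, y)
    if current_payoff <= best_payoff:
        break                                    # payoff stopped increasing: exit condition
    best_payoff = current_payoff
    X_gen = X + (y == 1)[:, None] * delta
    # learner's move: retrain on original data augmented with the generated data
    clf = LogisticRegression(max_iter=1000).fit(np.vstack([X, X_gen]),
                                                np.concatenate([y, y]))
print("final adversary payoff:", best_payoff)
```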
The result of such a game theoretical interaction between the adversary’s and the classifier’s best moves is quantified by the adversary’s payoff payoff_best. The adversary engages the classifier in the Stackelberg game as long as payoff_best increases. A decrease of payoff_best indicates that the Nash equilibrium exit condition
has been reached in the Stackelberg game. At the end of the game, the adversary
has optimal adversarial manipulations from the most recent Xgen . Such manipu-
lations are applied on the training data to obtain attacked training data. Then the
classifier’s learning process adds the attacked data into the original training data
so that the CNNs can be optimally retrained by our adversarial attacks. While the
CNN classifier is trained in the original data space, the adversary generates data
Although both GANs and our method [126] are based on the framework of game theory and both of them seek a Nash equilibrium, they differ in several important ways. In this section, we summarize these differences into three aspects,
including the model construction, the model optimization, and the optimization
results.
Model Construction Firstly, our variational Stackelberg game is a variable-sum
problem, while the GANs construct a constant-sum game. Secondly, our method
defines the adversary as the game leader, whereas the GAN is led by a generator.
Lastly, the GANs define attack scenarios to discover generative models underlying
given data distribution, while we optimize adversarial payoff functions with evo-
lutionary attack parameters defining our attack scenarios in randomized strategy
spaces.
The goal of machine learning is stated as producing a model with the smallest possible
loss. The optimal model is obtained by minimizing the expected loss over all the
training examples, validation examples, and adversarial examples. In the case of
zero-one loss, the optimal model is the Bayes classifier with a loss function called
the Bayes rate. Since loss is a function of training dataset, the same adversarial
learning algorithm produces different machine learning models for different training
datasets. This dependency is reduced by averaging the expected loss over several
training datasets that include the adversarial learning datasets. Here, bias-variance
decompositions decompose the expected loss into bias, variance, and noise terms that
are computed with a computational algorithm. Adversarial data distributions are
accounted for in the noise term. The bias term is independent of the training set and
is zero for a learner that always makes the optimal prediction. The variance term is
independent of the true value of the predicted variable. It is zero for a learner that
always makes the same prediction regardless of the training set. The distribution
of margins for correctly classifying the predictions with high confidence can then
be used to derive bounds on the generalization error of adversarial loss functions
proposed in game theoretical adversarial deep learning. The smaller the probability
of a lower margin, the lower the bound on generalization error on training examples
augmented with the adversarial examples. Maximizing classification margins and
minimizing misclassification errors are a combination of reducing the number of
biased examples, decreasing model variance on unbiased examples, and increasing
model variance on biased examples. The related work is on adversarial deep learning
theories in data mining patterns and machine learning theories in computational
learning algorithms. It has application in malware analysis, agent mining, intelligent
control, and cyber risk analysis in trust modeling of the security and privacy of
machine learning.
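An empirical bias-variance decomposition for squared loss can be sketched by resampling training sets and separating squared bias from variance at each test point; the data-generating function and the model below are illustrative assumptions.

```python
# A minimal sketch of an empirical bias-variance decomposition for squared
# loss: average predictions over models trained on resampled training sets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
def true_fn(x): return np.sin(3 * x)

x_test = np.linspace(-1, 1, 100)[:, None]
preds = []
for _ in range(200):                                  # 200 resampled training sets
    x_tr = rng.uniform(-1, 1, (50, 1))
    y_tr = true_fn(x_tr).ravel() + rng.normal(scale=0.3, size=50)   # noisy labels
    model = DecisionTreeRegressor(max_depth=4).fit(x_tr, y_tr)
    preds.append(model.predict(x_test))
preds = np.array(preds)

mean_pred = preds.mean(axis=0)
bias2 = (mean_pred - true_fn(x_test).ravel()) ** 2    # squared bias per test point
variance = preds.var(axis=0)                          # variance per test point
print("avg bias^2:", bias2.mean(), "avg variance:", variance.mean())
```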
Belkin et al. [43] study the model capacity of neural network interpolation with the double descent performance curve, whose training regime subsumes the conventional practice of U-shaped bias-variance risk curves that balance underfitting and overfitting according to empirical risk minimization. The simplicity of the neural network
predictors is defined over function classes that contain interpolating functions with
regularity or smoothness due to less inductive bias as measured by a function
space norm. The interpolating functions with a smaller norm are considered to be
simpler. Adversarial cost functions on the neural network predictors in adversarial
deep learning settings act as measures of regularization on the inductive bias in
game theoretical adversarial deep learning. Here, margin theory is the related
work to discover the function classes in adversarial classifiers. Research on the
optimality of interpolating predictors is required to extend such function classes
in adversarial regressors approximated by multi-label classifiers with vector valued
outputs and sum of squared losses at each output. Incorporating such interpolation
regimes and their empirical data analytics in adversarial deep learning opens new
computational, statistical, and mathematical avenues of research into the optimality
properties and utility bounds of deep learning predictors. Li et al. [358] investigate
the dimensionality of the parameter space to solve computing problems with neural
networks for supervised, reinforcement, and other types of learning. Such results
are useful for finding the structure of the objective landscape in adversarial deep
learning with compressed representations of the deep neural networks in black-
box optimizations. Strumbelj et al. [574] use coalitional game theory to explain
individual predictions of classification models. The proposed explanation method
is designed to work with any type of classifier. In machine learning, it can be
contrasted with model-specific explanation methods such as decision rules and
Bayesian networks as well as methods that give explanations in the form of feature
contributions in classifier ensembles such as random forests. In deep learning,
it can be contrasted with rule extraction methods applied to neural networks to
reduce the dependence between end-user requirements (obtained from marketing,
medicine, etc.) and underlying machine learning methods. The notion of a prediction
difference is proposed between current prediction and expected prediction with
respect to the current feature value contribution to the prediction. No assumptions
are made on the prior relevance of individual feature values. The changes in a
classifier’s prediction are decomposed into contributions of individual features using
concepts in coalitional game theory. Narayanam et al. [450] propose to discover
influential nodes acting as learned features in a social network with Shapley values
and cooperative game theory. Shapley values are the solution concepts giving
the expected payoff allocations in the coalitional game designs. The problem of
information diffusion in social networks is addressed for applications such as viral
marketing, sales promotions, and research trends in co-authorship networks for
abstract ideas and technical information with computationally efficient algorithms.
The target node set selection problem finding influential nodes is formulated as a
coverage pattern discovery problem in data mining that models individual decisions
influenced by behaviors of immediate neighbors in the social network. Shapley
value solution concepts satisfy mathematical properties called linearity, symmetry,
and carrier property to discover a fair way of distributing the gains of cooperation
among the players in the coalitional game. Shapley values take into account
all possible coalitional dynamics and negotiation scenarios among the players.
The nodes in a social network can be considered to be strategically behaving self-
interested individual entities in an organization functioning according to mechanism
design in game theory. Then the probabilities of a node being influenced by its
neighbors depend on not only the social network communities’ structure but also
private information the node has about its neighbors. The target node set problem
can be used for machine learning applications in marketing, politics, economics,
epidemiology, sociology, computer networking, and databases.
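The Shapley value computations underlying both feature contribution explanations and influential node discovery can be approximated by sampling permutations of the players. The following sketch is a generic Monte Carlo estimator under an illustrative coalition value function; it is not the specific approximation scheme of the cited works.

import numpy as np

def shapley_contributions(value_fn, n_players, n_samples=2000, seed=0):
    """Monte Carlo estimate of Shapley values: average marginal contribution of
    each player over randomly sampled permutations (coalition orderings)."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_players)
    for _ in range(n_samples):
        perm = rng.permutation(n_players)
        coalition = set()
        v_prev = value_fn(coalition)
        for p in perm:
            coalition.add(p)
            v_new = value_fn(coalition)
            phi[p] += v_new - v_prev          # marginal contribution of player p
            v_prev = v_new
    return phi / n_samples

# illustrative value function: payoff of a coalition of features/nodes
weights = np.array([0.5, 0.3, 0.2, 0.0])
value = lambda S: float(sum(weights[i] for i in S)) ** 2   # superadditive toy game
print(shapley_contributions(value, n_players=4))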
In future work, we shall explore dependence between randomization in our
adversarial manipulations and optimization in our game formulation. At present,
the game theoretical stochastic optima (solving for adversarial data) are determined
by the convergence of the adversarial cost function rather than a classification cost
function. However, multi-label classification cost functions in a multiplayer strategy
space of pure strategies as well as mixed strategies could be another fruitful avenue
of research. Assuming a white-box attack scenario on a CNN classifier would mean
we could guide the parameter settings in a genetic algorithm and an SA algorithm
into application-dependent adversarial data distributions.
Cai et al. [96] discuss the game theoretical equilibria in two-player non-zero
sum games with multiplayer generalization. Min-max theorem is proven for a
multiplayer polymatrix zero-sum game. Nash equilibrium is found by linear
programming. The polymatrix game is defined by a graph whose vertices are
players with associated strategies and edges are two-player games. A player’s payoff
is the sum of all payoffs in games adjacent to it. Zero-sum polymatrix games
represent a closed system of payoffs. Equilibrium strategies for the game are max-
min strategies representing no-regret play for all players. Oliehoek et al. [473]
present asymmetric games for search in co-evolutionary algorithms that do not require
the specification of an evaluation function. In asymmetric games, the current
player’s strategies are conditioned by the actions taken by previous players. Such
co-evolutionary algorithms are useful in algorithmic problems such as game theo-
retical machine learning, concept learning, sorting networks, density classification
using cellular automata, and function approximation and classification. Complex
evaluation cases can be constructed with a search process. High-quality strategies
are developed in the course of the search. The solution concept in the search
process defines which candidate solutions qualify as optimal solutions and which
do not. A game theoretical learning algorithm designs the convergence criteria on
expected utility determined by the solution concepts optimized against an intelligent
adversary. A Nash equilibrium then specifies mixed (randomized) strategies for the
players such that no player has an incentive to deviate given the strategies of the other players. So a
game theoretical solution to machine learning problems is a recommendation on
optimal plays for all players. It leads to multi-agent systems with Nash equilibrium
as a solution concept. They can also include Pareto co-evolution to accommodate
multiple adversaries with separate objectives. Best response strategies are solved by
a partially observable Markov decision process, corresponding to finite extensive
form asymmetric games called parallel Nash memory. All possible player beliefs
and transitions between them can be generated in the Markov decision process. An
alternating maximization or coordinate ascent optimization method solves for the
best response strategies.
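For the two-player zero-sum building block of such games, the max-min strategy can be computed with a linear program. The sketch below uses scipy.optimize.linprog on an illustrative payoff matrix (matching pennies); extending the idea to the full zero-sum polymatrix construction discussed above is beyond this sketch.

import numpy as np
from scipy.optimize import linprog

def maximin_strategy(A):
    """Max-min (security) strategy of the row player in a zero-sum game with
    payoff matrix A (row player's payoffs), solved as a linear program."""
    n_rows, n_cols = A.shape
    # variables: x (mixed strategy, n_rows entries) and v (game value)
    c = np.zeros(n_rows + 1); c[-1] = -1.0                 # maximize v  ->  minimize -v
    A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])         # v - x^T A e_j <= 0 for all columns j
    b_ub = np.zeros(n_cols)
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                                 # probabilities sum to one
    bounds = [(0, None)] * n_rows + [(None, None)]         # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:-1], res.x[-1]

# matching pennies: the uniform mixed strategy with game value 0
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, v = maximin_strategy(A)
print("strategy:", x, "value:", v)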
Bianchi et al. [52] survey algorithmic frameworks called metaheuristics to solve
complex optimization problems in game theoretical adversarial deep learning with
mathematical formulations for uncertain, stochastic, and dynamic information.
Ant colony optimization, evolutionary computation, simulated annealing, and tabu
search are all metaheuristics applicable to the stochastic combinatorial optimization
problems (SCOPs) in generating adversarial manipulations at every iteration of
the adversarial games. In problem-solving with SCOPs, information about the
problem data is partially unknown such as the information available with a learner
about the specific adversary’s strategies. Moreover, SCOPs assume a probability
distribution about the knowledge of the problem data such as the adversary type’s
characterization. SCOPs are solved by dynamic programming. Such metaheuristics
combine several heuristics that are either local search algorithms starting from a
pre-existent solution/move for the learned features or constructive algorithms that
do feature/component construction of a solution for discovering the adversarial
manipulations in game theoretical adversarial deep learning. In comparison with
heuristics, metaheuristics strike a dynamic stochastic balance between effectively
exploiting the search space representing accumulated experience and efficiently
exploring new regions of the search space with high-quality solutions. Convergence
proofs for metaheuristics are useful to derive analytics insight into the working
principles of a computational algorithm. However, such convergence proofs typically assume infinite computation time.
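A minimal simulated annealing loop for searching a bounded adversarial perturbation against a black-box score is sketched below. The scoring function, perturbation budget, and geometric cooling schedule are illustrative assumptions, not the book's specific attack parameterization.

import numpy as np

def simulated_annealing_attack(score_fn, x0, eps=0.3, steps=500,
                               t0=1.0, cooling=0.99, seed=0):
    """Search for a perturbation delta with ||delta||_inf <= eps that minimizes
    score_fn(x0 + delta), e.g. the classifier's confidence in the true class."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x0)
    best = cur = score_fn(x0 + delta)
    best_delta, t = delta.copy(), t0
    for _ in range(steps):
        cand = np.clip(delta + 0.05 * rng.normal(size=x0.shape), -eps, eps)
        s = score_fn(x0 + cand)
        # accept improvements always, worse moves with Boltzmann probability
        if s < cur or rng.random() < np.exp((cur - s) / max(t, 1e-8)):
            delta, cur = cand, s
            if s < best:
                best, best_delta = s, cand.copy()
        t *= cooling                                   # geometric cooling schedule
    return x0 + best_delta, best

# illustrative "confidence" of a linear model in the true class
w = np.array([1.0, -2.0, 0.5]); x0 = np.array([0.2, 0.1, -0.3])
conf = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
x_adv, c = simulated_annealing_attack(conf, x0)
print("original:", conf(x0), "adversarial:", c)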
The interaction between an adversary and the classifier has been modeled as a
Stackelberg game. Here, adversary’s role is not that of a static data generator but an
intelligent agent making deliberate data manipulations to evade classifiers. Failure
of considering adversarial evasion in classifier design exposes security concerns
in fraud detection, computer intrusion detection, web search, spam detection, and
phishing detection applications. Re-learning classifier weights is a weak solution to
robust classification since evasion attacks are generated at a cheaper and faster rate
than re-learning.
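The Stackelberg structure can be illustrated with a toy leader-follower loop in which the learner commits to a decision threshold while anticipating an adversary that shifts malicious scores within a cost budget. All distributions, the budget, and the threshold grid below are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, 500)           # scores of benign samples
malicious = rng.normal(3.0, 1.0, 500)        # scores of malicious samples
budget = 1.5                                 # maximum shift the adversary can afford

def adversary_best_response(threshold):
    """Follower: shift each malicious score just below the threshold when the
    required shift fits the budget; otherwise leave the score unchanged."""
    needed = np.clip(malicious - (threshold - 1e-3), 0.0, None)
    shift = np.where(needed <= budget, needed, 0.0)
    return malicious - shift

def leader_utility(threshold, shifted):
    fp = np.mean(benign >= threshold)        # false positives
    fn = np.mean(shifted < threshold)        # evasions (false negatives)
    return -(fp + fn)

# leader enumerates thresholds, anticipating the follower's best response
thresholds = np.linspace(-2, 6, 161)
best_t = max(thresholds,
             key=lambda t: leader_utility(t, adversary_best_response(t)))
print("Stackelberg threshold:", round(best_t, 2))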
Li et al. [354] proposed a feature cross-substitution attack to demonstrate
objective-driven adversaries exploiting limitations of feature reduction in adversar-
ial settings. The adversary is able to query the classifier according to a fixed query
budget and a fixed cost budget. An adversarial evasion model with a sparse
regularizer is then presented. Constructing the classifier on feature equivalence
classes rather than the raw feature space is proposed as a solution to improve classifier robustness.
Related work formalizes a communication protocol that tracks the frequency patterns of queries made
by an algorithm to the adversarial/learning optimization objective. Here, zeroth-
order black-box optimization methods respond with the value of the function at the
query point, while first-order gray-box optimization methods respond with gradient
information about the function. A game theory formulation of the optimization
objectives allows non-convex losses that can be formulated
as a game at many different scales in the neural network architectures where
particular layers of the neural network are solving a convex optimization problem.
Structuring rules for the compositionality of adversarial deep learning objectives can
then be formalized with the distributed communication protocols and grammars.
The resulting feedforward computation is captured in a computation graph data
structure that structures queries and responses into query and response graphs,
respectively. The communication protocol specifies how data mining information
flows through the query and response graphs without specifying players’ utilities of
the information. Grammars on the distributed communication protocols guarantee
that the response graph encodes sufficient information for the players to jointly
converge to a game theoretical solution concept for the learning objective function
and associated adversarial payoff functions. A grammar can be specified for the
players’ interactions and error backpropagation in each game to perform a specific
data analytics task. The players then jointly encode data mining knowledge about the
task. The grammars can also include probabilistic and Bayesian formulations along
with methods for unsupervised pre-training. Practicable examples of the grammars
are demonstrated for the learned objectives in supervised deep learning models,
variational autoencoders and generative adversarial networks for unsupervised
learning, and deviator-actor-critic model for deep reinforcement learning.
Hazan et al. [261] discuss regret minimization in repeated games with non-
convex loss functions. Such repeated games can be used to design multiplayer games
in adversarial deep learning. The notion of a regret in adversarial deep learning is
computationally intractable in general. Thus, a formulation for regret is defined for
efficient optimization and convergence to an approximate local optimum. Regret
minimization in games corresponds to repeated play in which a player accumulates
average loss that is proportional to the best response decision in hindsight. Regret
is a global optimization criterion chosen by a player over its entire decision
set. If the loss function computing the player's payoff subject to the other players'
actions is convex, then the regret criteria are computationally tractable, and they
converge to game theoretical solution concepts such as Nash equilibrium, correlated
equilibrium, and coarse correlated equilibrium. A local regret criterion is defined
to predict playing points with small gradients on average. An algorithm incurring
the sublinear local regret in time has a small time-averaged gradient in expectation
for every randomly selected iterate. A notion of time-smoothing captures non-
convex online optimization under limited concept drift. By contrast, non-convex
continuous optimization algorithms on the players’ loss functions focus on finding
a local optimum since finding the global optimum is a NP-hard problem. Stochastic
second-order methods are used for such non-convex optimization. They converge
onto approximately stationary solution concepts in the adversarial deep learning
procedures. The local equilibrium is smoothed with respect to past iterates. The
smoothing procedure corresponds to forming an experience replay buffer in reinforcement
learning. The solution concept captures a state of iterated game play where each
player examines the past actions played and no player can make deviations to
improve the average performance of current play against the opponents’ historical
play. The learning algorithm is assumed to have access to a noisy stochastic gradient
oracle.
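A sketch of a time-smoothed online gradient step in this spirit follows: at each round the player descends the averaged gradient of a sliding window of recent losses, and the time-averaged squared gradient norm serves as an empirical proxy for local regret. The drifting quadratic losses and window size are illustrative stand-ins for the non-convex losses discussed above.

import numpy as np

def time_smoothed_ogd(grad_fns, x0, window=10, lr=0.1):
    """Online play against a sequence of (possibly non-convex) losses: at round t
    the player takes a gradient step on the average of the last `window` losses.
    The time-averaged squared gradient norm is the empirical local regret."""
    x = np.array(x0, dtype=float)
    local_regret = 0.0
    for t, _ in enumerate(grad_fns, start=1):
        recent = grad_fns[max(0, t - window):t]
        g = np.mean([gf(x) for gf in recent], axis=0)   # smoothed gradient
        local_regret += float(g @ g)
        x = x - lr * g
    return x, local_regret / len(grad_fns)

# drifting quadratic losses f_t(x) = ||x - c_t||^2 as a stand-in for concept drift
rng = np.random.default_rng(0)
centers = np.cumsum(0.05 * rng.normal(size=(200, 2)), axis=0)
grads = [(lambda x, c=c: 2.0 * (x - c)) for c in centers]
x_final, avg_local_regret = time_smoothed_ogd(grads, x0=[1.0, -1.0])
print("average local regret:", round(avg_local_regret, 4))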
Xu et al. [671] investigate the problem of adversarial learning from noise-injected
data without assuming a specific adversary type at the learning stages. Information
theoretic limits of adversarial robustness called Le Cam type bounds are derived.
This work is comparable to other theoretical work in computational learning theory
for adversarial learning such as deriving generalization bounds for adversarial
learning at test time, robustness certification for statistical inference in adversarial
learning, robustly PAC learnability of VC classes, and analysis of the noise injection
in neural network training at inference time. The adversary is assumed to have a
budget on how much noise is injected into the data. This budget is related to the
total variation (TV) distance between the original data distribution and the noise-
injected data distribution. TV is a statistical distance that is used in the study of
the upper and lower bounds for adversarial robustness in learning problems such
as mean estimation, binary classification, and Procrustes analysis. Noise injection
methods are restricted to multivariate Gaussian and multivariate uniform noise. An
expected risk is estimated in minimax optimization frameworks to derive Le Cam’s
bound.
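A rough empirical check of such a budget can be made by estimating the total variation distance between histograms of the original and noise-injected samples, as sketched below; the histogram discretization, Gaussian noise level, and budget value are illustrative assumptions rather than the bounds of the cited work.

import numpy as np

def tv_distance_hist(x, y, bins=50):
    """Histogram-based estimate of the total variation distance between the
    empirical distributions of two one-dimensional samples."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    p = p / p.sum(); q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 20000)
noisy = clean + rng.normal(0.0, 0.5, 20000)      # Gaussian noise injection
budget = 0.2                                     # adversary's TV budget (illustrative)
tv = tv_distance_hist(clean, noisy)
print(f"estimated TV = {tv:.3f}, within budget: {tv <= budget}")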
Scutari et al. [545] survey optimization methods in communication systems and
signal processing. Equilibrium models in cooperative and non-cooperative game
theory are used to describe scenarios with interactive decisions in applications such
as communications and networking problems, power control and resource sharing
in wireless/wired and peer-to-peer networks, cognitive radio systems, distributed
routing, flow, and congestion control in communication networks. Variational inequality
(VI) theory is utilized as a general class of problems in the non-linear analysis
of such applications in wireless ad hoc or peer-to-peer wired networks, cognitive
radio (CR) networks, and multihop communication networks. Then the existence
and uniqueness of the game theoretical equilibrium are investigated to devise the
convergence properties of iterative distributed algorithms. Such algorithms can
also be designed for game theoretical adversarial deep learning with variational
adversaries.
Hinrichs et al. [273] discuss transfer learning between game domains so that
structural analogy from one learned game speeds up the learning of another related
game. Minimal ascension and metamapping are the proposed techniques to transfer
analogy matching representations between games with different relational vocab-
ulary. Minimal ascension finds local match hypotheses by exploiting hierarchical
relationships between predicates. Metamapping is a generalization of minimal
ascension to use all available structural information about predicates in a knowledge
base. The game domains range from physics problem-solving to strategy games.
Transfer learning is defined as the problem of finding a good analogy representation
between the source and target domains and using that knowledge representation
to translate symbolic representations of learned knowledge from the source to the
target that have very different surface representations. A cognitive theory called
structure mapping is used to describe the analogies following human analogical
processing and similarity judgments. The mappings include candidate inferences
that represent information projected from source to target domains. Non-identical
matches between analogies are considered when they are part of a larger relational
structure that can be transferred. So structure mapping depends on symbolic,
structured representations of the data that include a vocabulary for representing
a hierarchy of predicates, set relations, and constraints on types of arguments to
predicates. Higher-order predicates such as logical connectives, argument structure,
planning, and discourse relationships are assumed to be the same across source
and target domains. Game theoretical solution concepts then support qualitative
and analogical reasoning on the mappings with compositional strategies in transfer
experiments on a finite state machine. The finite state machine then creates a
declarative understanding of the most efficient transfer learners at the level of actions
and effects, threats and hazards, progress toward learning goals, and dynamic
analysis of game traces. A static domain analysis with the games leads to path
planning and quantity planning in spatial coordinates, ordinal relations, movement
operators, and potential influences on these quantities. A static domain analysis
produces compositional strategies when the source and target domains are not isomorphic.
It empirically constrains the search space for automated learning strategies in
graph reachability heuristics. By contrast, a dynamic domain analysis verifies the
provenance of transferred strategies to replace failed strategies in a new domain with
learning goals. A dynamic domain analysis with the games leads to explicit learning
goals for knowledge acquisition on the effects of an action, applicability conditions
of an action, and decomposition of a goal into subgoals. The dynamic domain
analysis operates at a higher-level search space than the state machine representation
to drive more efficient exploration. The dynamic domain analysis regresses through
the game execution trace to explain an effect and construct a plan to achieve an effect
according to preference heuristics. Learned sequences can accommodate unforeseen
effects of actions due to adversarial responses where the behavior of adversaries
is incompletely known. Initial experiments can then be performed bottom-up to
learn action effects and preconditions of actions. Then they can include action-level
learning goals to decompose the performance goal of a game to develop a winning
strategy and credit assignment with automated reasoning. The improvements made
by transfer are characterized by a normalized regret score. A higher regret score
indicates that the transfer was beneficial.
Learnability of non-convex optimization landscapes is a topic for future work in
game theoretical adversarial deep learning. We can explore the convergence criteria
in generalization errors and sampling complexities of the concept classes in game
theoretical adversarial deep learning. We can analyze the optimal size of the training
data to predict the future behavior of an unknown target function in adversarial
deep learning such that the hypothesis function in game theoretical optimization is
probably approximately correct. Here, our research into deep generative models and game theoretical optimization can contribute to such analysis.
Pawlick et al. [487] survey game theory to model defensive deception for cyberse-
curity and privacy in ubiquitous and wearable computing. A taxonomy of deception
is given as perturbation, moving target defense, obfuscation, mixing, honey-x, and
attacker engagement. It categorizes the information structures, agents, actions, and
duration of deception for its game theory modeling. Deception research is conducted
in military applications, psychology, criminology, cybersecurity, economic markets,
privacy advocacy, and behavioral sciences. Such deception is commonplace in
adversarial or strategic interactions of cybersecurity where one party has informa-
tion unknown to the other. Attack vectors with such deception have the potential
to turn Internet of Things devices into domestic cyber weapons. Cyberattacks can
be devised to physically affect critical infrastructure such as power grids, nuclear
centrifuges, and water dams. Adversaries obtain information about their targets
through reconnaissance where deception counteracts any information asymmetry.
Game theory models the deceptive interactions as strategic confrontations of conflict
and cooperation between rational agents. Each player in the game of cybersecurity
and privacy makes decisions that affect the welfare of the other players. Game
theory is able to model the essential, transferable, and universal aspects of defensive
deception in cyberspace. One-shot and multiple-interaction games lead to static
and dynamic deception, respectively. Deception techniques include impersonation,
delays, fakes, camouflage, false excuses, and social engineering. The stages in
a malicious deception include design of a cover story, planning, execution, and
monitoring. Stackelberg, Nash, and signaling games are the most common game
theoretical models with two-player dynamic interactions. Application domains
include adversarial machine learning, intrusion detection systems, communications
jamming, and airport security.
Nguyen et al. [462] investigate strategic deception from an adversary who has
private information in repeated interactions with a defender. The sensitive private
information includes
preference elicitation about the attacker goals and topological misinformation about
incorrect beliefs in the network topology. They can be incorporated into the task
of adversarial classification with reinforcement in game theoretical adversarial deep
learning with hypergames where the learning algorithm predicts attack preferences.
In game theory, intentional deception and misperception utilities are formalized as
hypergames where deception is a component of the strategies in play. Hypergames
formulate defender goals, observations, subgames, and individual strategies defined
in game contexts consisting of an adversary context and a defender context
that present player-specific perspectives of the game. Through observation of the
attacker, the defender tries to infer the attacker’s beliefs over time and apply them
in future decision-making. The attacker’s beliefs are used to estimate the state of
the adversary types as well as attacker’s perceived payoffs with knowledge of the
game tree and attacker perceptions. The defender then dynamically manipulates the
game board with update rules to change its iterative payoffs associated with next
possible actions. The decisions made by the defender in an online learning solution
alter the actions taken by the attacker, limit the strategies available to the attacker
at the next time step, and manipulate the payoffs received by the attacker. Thus,
hypergame concepts can investigate attack trees according to defender goals on
deception rather than adversarial goals on manipulation where there are resource
allocation costs associated with each play.
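A minimal version of the defender's belief maintenance can be written as a Bayesian update over a discrete set of attacker types after each observed action. The type set, likelihood table, and observation sequence below are hypothetical placeholders for the richer hypergame context described above.

import numpy as np

# attacker types and the defender's likelihood model P(observed action | type)
types = ["opportunistic", "targeted", "insider"]
likelihood = np.array([
    # actions:  scan   phish  exfiltrate
    [0.70, 0.25, 0.05],   # opportunistic
    [0.30, 0.50, 0.20],   # targeted
    [0.10, 0.30, 0.60],   # insider
])
actions = {"scan": 0, "phish": 1, "exfiltrate": 2}

def update_belief(prior, observed_action):
    """One round of Bayes' rule: posterior over attacker types given an action."""
    post = prior * likelihood[:, actions[observed_action]]
    return post / post.sum()

belief = np.array([1 / 3, 1 / 3, 1 / 3])         # uninformative prior
for obs in ["scan", "phish", "exfiltrate", "exfiltrate"]:
    belief = update_belief(belief, obs)
print(dict(zip(types, np.round(belief, 3))))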
Cybenko et al. [139] edit a book about adaptation techniques (AT) such as mov-
ing target defenses (MTD) to engineer adversarial machine learning systems with
randomization for security and resiliency purposes. Adaptive cyber defense (ACD)
is categorized into adaptation techniques (AT) and adversarial reasoning (AR)
for adversarial learning in operational learning systems. AR combines machine
learning, behavioral science, operations research, control theory, and game theory to
compute strategies in dynamic, adversarial environments. ACD techniques force the
adversaries to re-assess, re-engineer, and re-launch cyberattacks. A game theoretical
and control-theoretic analysis for tradeoff analysis of security requirements in ACD
presents the adversaries with optimized and dynamically changing attack surfaces.
Prototypes and demonstrations of ACD technologies are presented in several real-
world scenarios.
Dasgupta et al. [146] conduct a high-quality survey of game theory-based mod-
eling of adversarial learning. A supervised machine learning algorithm’s prediction
mechanism is summarized. But the ideas are applicable to other machine learning
mechanisms in clustering, ranking, or regression. A taxonomy for adversarial
machine learning is characterized in terms of influence, specificity, and security
violation dimensions across adversary types. The influence dimension specifies
causative and exploratory attacks on learner vulnerabilities to create modified
training data and testing data called adversarial data. Adding adversarial data to
the learning process of a classifier leads to an incorrect classifier that outputs
classification errors. Adversarial learning to create secure classifiers is then modeled
as a two-player, non-cooperative game. The utility functions of each player reveal
the player’s preferences over various outcomes of the game expressed in terms of
joint actions of all players in the game. The outcome of a game is the strategy
selected by each player. The most popular optimization criterion to calculate the
outcomes is Nash equilibrium that assumes the outcomes to be best response
strategies of rational players. The Nash equilibrium is solved for as a search and
optimization problem. In two-player, zero-sum games, the utilities of all players
sum to zero at every iteration of the game. In adversarial learning with zero-sum
games, the gain in utility for a learner comes at the cost of loss of adversary’s
utility and vice versa. This observation leads to a minimax theorem for finding
the Nash equilibrium in a zero-sum game. The minimax outcome is represented
as a constrained optimization problem solved by a linear program. The minimax
theorem does not hold for general-sum non-zero sum games. Because the classifier
reacts to adversarial manipulations, the strategy selection in adversarial learning
is most frequently modeled as a sequential move game rather than a simultaneous
move game. In sequential move games, the follower player has information about
the strategies selected by the leader player. Such information is used in the
optimization of the player’s utility functions. However, the leader has to incorporate
uncertainty about the follower’s strategies leading to Bayesian games. In the normal
form Bayesian game, each player has information about utilities of the other
competing players. Based on this information, we can calculate the expected utilities
conditioned on player types for each player. Security games in cybersecurity are
related to Bayesian games for adversarial learning. In security games, the learner
is a defender protecting a set of targets from an adversary called the attacker. The
defender has to do resource allocation within budget and operational constraints.
In general, the learner’s utility is calculated on the payoff/value of a learner for
correctly classifying the input. Similarly, the adversary’s utility is defined as the
payoff/value of misclassification of an adversarial input presented to the learner.
The learning problem is then formulated as a constrained optimization problem. It is
solved as a mixed-integer linear program for the adversary and a robust classification
strategy for the learner. Adversarial approaches that avoid reverse engineering
black-box classifier’s decision boundaries search over an adversarial cost space to
determine a minimum set of adversarial examples. Moving target defense extends
the resultant learners to employ randomization over multiple classifiers instead of
tuning the parameters of a single robust classifier. The adversary can also generate
data by selecting, removing, or corrupting features from the input dataset. The
learner’s objective is to then find an optimal set of features that minimize its loss
function. When the learner does not have access to the entire training data, the
learning objective becomes an online learning problem. We can then analyze the
runtime and sample complexity of online learners pitted against different adversary
types with their own adversarial cost functions and adversarial example generation
functions. Here, the learner can know about the adversarial cost functions but not its
own ground truth or input distribution. Adversarial training of deep neural networks
leads to deep learning games. They can be formulated as a repeated zero-sum game.
Iterations on repeated plays are then used to adjust the weights of a neural network’s
edges to converge onto the Nash equilibrium with exponentiated weight and
regret matching optimization algorithms. Adversarial robustness of deep learning
classifiers can also be improved by adversarial data generators that are used along
with adversarial learning procedures. The most common adversarial data generators
use perturbation techniques on valid examples, transfer adversarial examples across
different learner models, and extend generative adversarial networks. Here, game
theory models can be formulated to be informed by modeling and reasoning costs
such as cost to solve for Nash equilibrium, cost to maintain game play history,
cost to build opponent models from the history of game theoretical interactions,
and expenses incurred by the adversary to access legitimate resources. Transfer
learning can be combined with adversarial learning in real-world applications to
create learning systems on sparse training data that make classification predictions
correctly without requiring information-rich data sources. Domain adaptation can be
applied to adversarial learning to reliably transfer the robust learners mapped out in
the dense source domain to the sparse target domain. The application for combining
transfer learning with adversarial learning includes email spam classifiers, social
network sentiment analysis tools, and image and sensor data recognition systems on
autonomous vehicles.
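The regret matching scheme mentioned above can be sketched as self-play on a small zero-sum game, where the time-averaged strategies approach the Nash equilibrium. The rock-paper-scissors payoff matrix is an illustrative stand-in for the learner-adversary payoff structure, not the deep learning game itself.

import numpy as np

A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)       # rock-paper-scissors, row payoffs

def regret_matching(A, iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    n, m = A.shape
    reg_r, reg_c = np.zeros(n), np.zeros(m)
    sum_r, sum_c = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # strategies proportional to positive regrets (uniform if none)
        p = np.maximum(reg_r, 0); p = p / p.sum() if p.sum() > 0 else np.full(n, 1 / n)
        q = np.maximum(reg_c, 0); q = q / q.sum() if q.sum() > 0 else np.full(m, 1 / m)
        i, j = rng.choice(n, p=p), rng.choice(m, p=q)
        reg_r += A[:, j] - A[i, j]             # row regrets (row player maximizes A)
        reg_c += -A[i, :] - (-A[i, j])         # column regrets (column player minimizes A)
        sum_r += p; sum_c += q
    return sum_r / iters, sum_c / iters        # average strategies approximate Nash

print(np.round(regret_matching(A)[0], 3))      # approaches the uniform [1/3, 1/3, 1/3]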
Hamilton et al. [249] discuss the application in the tactical analysis of informa-
tion warfare. Game theory algorithms can be developed in military applications to
predict future attacks across many possible scenarios and suggest courses of action
(COA) in response to the most dangerous possibilities. COA generation techniques
can benefit from adversarial learning. Game theoretical frameworks allow detailed
analysis of what-if scenarios of chains of events to find exceptions to general rules
in cyber-wargaming systems. Such analysis determines the likelihood, method,
and cost of scenarios such as intelligence-gathering in attack phase, targeting of
the command and control system, data corruption, and denial-of-service attack to
prevent the kinetic warfare planning process. Pruning techniques are required to
reduce the search space in evaluating complex max-max games so that the most
promising node in the game tree is expanded to queue its children for the analysis
of the most promising move that is most likely to be predicted in a given real-world
scenario. Reinforcement learning techniques can be used to iteratively increase the
depth of the game tree and set the corresponding evaluation characteristics to learn
which depth best predicts the opponent behavior. Here, the assumption in the design
of the defender’s evaluation function is that the opponent’s evaluation function uses
a subset of the heuristics of the defender’s evaluation function with changes in
optimality weights.
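A generic depth-limited minimax search with alpha-beta pruning conveys how such game trees are pruned; the move generator and evaluation function below are placeholders, and the cited max-max formulation and reinforcement-learned depth control are not reproduced here.

def alphabeta(state, depth, alpha, beta, maximizing, moves_fn, eval_fn):
    """Depth-limited minimax with alpha-beta pruning; prunes branches that
    cannot influence the chosen course of action."""
    moves = moves_fn(state)
    if depth == 0 or not moves:
        return eval_fn(state), None
    best_move = None
    if maximizing:
        value = float("-inf")
        for move, child in moves:
            score, _ = alphabeta(child, depth - 1, alpha, beta, False, moves_fn, eval_fn)
            if score > value:
                value, best_move = score, move
            alpha = max(alpha, value)
            if alpha >= beta:           # beta cut-off: opponent will avoid this branch
                break
        return value, best_move
    value = float("inf")
    for move, child in moves:
        score, _ = alphabeta(child, depth - 1, alpha, beta, True, moves_fn, eval_fn)
        if score < value:
            value, best_move = score, move
        beta = min(beta, value)
        if beta <= alpha:               # alpha cut-off
            break
    return value, best_move

# toy game: the state is an integer "advantage"; each side picks a move that shifts it
moves = lambda s: [("escalate", s + 1), ("deceive", s + 2), ("hold", s - 1)]
score, move = alphabeta(0, depth=4, alpha=float("-inf"), beta=float("inf"),
                        maximizing=True, moves_fn=moves, eval_fn=lambda s: s)
print(score, move)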
Schlenker et al. [541] introduce a cyber deception game in network security. It
is solved with mixed-integer linear program solution and a fast greedy minimax
search algorithm. This game theoretical adversarial learning framework for service
obfuscation can be used dynamically to create asymmetric information about the
true state of a network in addition to static measures of network security such as
whitelisting applications, locking permissions, and patching vulnerabilities. Related
work is in honeypot selection games, adversary signaling games, and annotated
probabilistic logic models on attacker’s scan queries. The cyber deception game
is a zero-sum Stackelberg game between a network administrator defender and a
hacker adversary. Nguyen et al. [463] allocate limited security countermeasures to
protect network data from cyberattack scenarios modeled as Bayesian attack graphs.
The countermeasure allocation is computed with optimal tradeoffs between multiple
goals that include security requirements
such as confidentiality vs. integrity vs. availability, adversarial cost functions in the
attack graph about the level of skills required to mount an attack, and ordered set
of fixed loss categories that apply to all adversarial and learning goals of interest.
Here, the game structure is assumed to follow a stochastic process so that the
loss distributions constituting the game structure are stationary distributions of the
stochastic process under the chosen convergence criteria.
Huang et al. [286] propose a dynamic game framework for long-term interactions
between a stealthy attacker and a proactive defender formulated as a multi-
stage game of incomplete information where each player has private information
unknown to the other. The players act strategically according to beliefs formed
by multi-stage observation and learning. A perfect Bayesian Nash equilibrium
is the solution concept computed by an iterative optimization algorithm. The
stealthy attacker is an APT attacker who has knowledge of the defender's system
architecture, valuable assets, and defense strategies. So APT attacker strategies are
tailor-made to invalidate cryptography, firewalls, and intrusion detection systems.
APT attackers can disguise themselves as legitimate users in the long term. Multi-stage
APT models that divide an attack sequence into phases are available in the
open-source intelligence communities. They include Lockheed-
Martin’s cyber kill chain, MITRE’s ATT&CK, and NSA/CSS technical cyber
threat framework. During a reconnaissance phase, the attacker (also called threat
actor) collects open-source or internal intelligence to identify valuable targets.
Then the attacker escalates privilege to propagate laterally in the cyber network to
access confidential information or inflict physical damage. A system defender must
incorporate defensive countermeasures across all the phases of APTs with a defense-
in-depth strategy. On identifying the utility and strategies of the attackers, game
theory provides a quantitative and explainable framework to the system defender to
design proactive defense response under uncertainty with better tradeoffs between
security and usability. By contrast, rule-based and machine learning-based defense
methods cannot deal with uncertainty in the multi-stage impact of defense strategies
on both legitimate and adversarial users. Such multi-stage impact is seen in artificial
intelligence, economy, and social science where multi-stage interactions occur
between multiple agents with incomplete information.
Bohrer et al. [75] apply constructive differential game logic to derive structured
proofs in cyber-physical systems (CPS) for safety-critical applications such as
robotics, automotives, aviation, spaceflight, medical devices, and power systems.
Such formal methods of verification ensure the correctness of learning properties
of system models in the implementations of CPS on embedded processors such
as in autonomous driving and ground robotics. Here, game theory is used in the
analysis of differential equations without closed-form solutions. Game proofs in
an adversarial environment then create security warranties for a learning system
against different adversary types who violate the system’s correctness criteria with
manipulation of timing, sensing, control, and physics in hybrid games. Wellman et
al. [643] explore causal dependence structure in private information signal patterns
on underlying agent states that can act as the epistemic types of adversarial agents. A
Bayesian game is then formulated with the private information. Probabilistic graph-
ical models (PGMs) are used to model the private information. Their dependence
structure is able to quantify adversarial agents who reason about their own payoff
function values conditioned on the information available on other agents’ payoff
function values. Bayesian networks are the PGMs used to not only interpret but also
generate signal patterns on the private information of players. Thus, PGMs provide
a qualitative framework for analyzing the probabilistic dependencies between
structural decisions and private signals in game theoretical learning situations. Their
graphical structures allow reasoning about the implications of a game situation.
Applications are evaluated for prediction markets and auctions that augment the
graphical structures with patterns of reasoning available in these domains to suggest
generalization of the results in the applicable adversarial deep learning. Jordan et
al. [305] conduct an empirical analysis of complex games. The value of empiricism
in games lies in the effective exploration of a set of strategies. So generic exploration
policies are proposed for strategy exploration in empirical games. They find a
best response with minimum regret profile among previously explored strategies.
Stochastic best response strategies lead to an effective exploration of the strategy
space. So empirical game theoretical analysis (EGTA) can augment expert modeling
with empirical sources of knowledge such as high-fidelity data obtained from
real-world observations. EGTA games are procedural descriptions of strategic
environments. The simulation and search statistics in EGTA can combine with
game theoretical solution concepts to characterize the strategic properties of an
application domain for adversarial deep learning. An augmented restricted game
is defined as a base game to encapsulate EGTA. Additional strategies are generated
with reinforcement learning.
Prakash et al. [497] examine the interplay between attack and defense strate-
gies in moving target defense (MTD). Multiple game instances are explored by
differences in agent objectives, attack cost, and attack action detectors. Such MTD
techniques incorporate probabilistic attack progressions to develop effective policies
for deploying and operating machine learning systems in specific adversarial
contexts. The behaviors of rational players vary with game theoretical learning
features such as system configurations, environmental conditions, agent objectives,
and technology characteristics. The systematic simulations in game theoretical
learning frameworks can accommodate computational complexities and information
uncertainties in learning dynamics of game formulations that are analytically
intractable. The adversarial payoff functions in MTD allow tradeoffs between
objectives of control and availability. They can incorporate assessments of overall
system state as adversarial cost functions. In addition to learning objectives of the
adversary and defender, security requirements of the learning system are interpreted
as preference patterns of the agents in the MTD game. For example, confidentiality
of the learning system is interpreted as defender’s strong aversion to allow the
attacker to control machine learning servers. Availability is interpreted as the
defender’s control on a fraction of servers that are not down. A weighting scheme
4.4 Stochastic Games in Predictive Modeling 131
games, stochastic general-sum game, stochastic non-zero sum dynamic game, two-
player Stackelberg stochastic game, Bayesian game, coalitional game, variable cost
game, incomplete information game, fictitious game, cheap talk game, etc. The
equilibrium analysis of the games then provides analytics insight into decisions on
issues such as security investment and patch management in complex networking
systems. Roy et al. [530] present a cyberspace taxonomy for classifying the game
types used in the defense mechanisms of next-generation network security and
secure computing. Static games are analyzed with respect to complete imperfect
information and incomplete imperfect information. Dynamic games are analyzed
with respect to complete perfect information, complete imperfect information,
incomplete perfect information, and incomplete imperfect information.
Otrok et al. [477] propose a game theoretical learning model for host-based
intrusion detection systems (HIDS) to offset high computation cost in the generation
and detection of false alarms for resource-limited systems such as wireless mobile
devices. Dynamic, non-cooperative, multi-stage, incomplete information games
are formulated for a mobile ad hoc network (MANET) according to a Bayesian
probabilistic model and a Dempster-Shafer probabilistic model for the mathematical
representation of uncertainty and risk management where the identity of the attacker
is unknown. The game solution concepts determine the posterior belief function
values of a user to determine misbehavior by decreasing false positives, increasing
attacker detection accuracy, and optimizing resource consumption efficiency in
HIDS. Perfect Bayesian equilibrium computes a set of strategies that are optimal
with respect to the estimated beliefs taken to be probabilities. Once the belief mea-
surement reaches a predefined risk threshold, the HIDS gets to decide whether a user
is an attacker or not. The belief measurement is obtained from evidence observed
from a data source. A belief fusion algorithm combines belief measurements to
generate a final belief of the problem domain. At the cost of extra computational
resources, such a final belief measurement is more precise than Bayesian posterior
likelihoods. The HIDS game elements are players and type space, strategy space,
prior beliefs, utility functions, and HIDS detection rate.
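The belief fusion step can be illustrated with Dempster's rule of combination over the frame {attacker, normal}; the mass assignments below are illustrative rather than values from the cited HIDS game.

from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions over frozenset focal elements,
    normalizing away the conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

ATT, NRM = frozenset({"attacker"}), frozenset({"normal"})
THETA = ATT | NRM                                  # total ignorance
m_ids = {ATT: 0.6, NRM: 0.1, THETA: 0.3}           # evidence from the HIDS sensor
m_beh = {ATT: 0.5, NRM: 0.2, THETA: 0.3}           # evidence from a behavioral profile
fused = dempster_combine(m_ids, m_beh)
print({tuple(sorted(k)): round(v, 3) for k, v in fused.items()})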
Nguyen et al. [461] discuss a zero-sum Stackelberg security game optimizing
a double oracle method on exponentially large action spaces to allocate botnet
detection resources in a game theoretical solution for the defense policies. Two
botnet data exfiltration scenarios are proposed to represent single and multiple path
attack vectors for stealing sensitive network data. Mixed-integer linear programs
optimize the defender’s and attacker’s best response oracles. Greedy heuristics
approximate and implement the oracles. L’Huillier et al. [353] utilize dynamic
games of incomplete information in phishing fraud detection such as email scams
to get private information. A weighted margin support vector machine acts as
the adversarial classifier for content-based filtering of phishing. Phishing filtering
over data streams of messages is based on online algorithms, generative learning
algorithms, and discriminative learning algorithms based in game theoretical adver-
sarial deep learning. Here, phishing can be categorized into deceptive phishing and
malware phishing. Deceptive phishing in turn is categorized into social engineering,
mimicry, email spoofing, URL hiding, invisible content, and image content.
Adversarial deep learning has the potential to transform the financial market landscape (such as
foreign exchange and commodity markets) from a human decision ecosystem
to an algorithmic trading technology with game theoretical agent-based market
models. Automated trading platforms can not only improve market efficiency but
also increase market risk and market fluctuations due to manipulative practices
around vulnerabilities driven by algorithms. Spoofing is defined as the submission
of a large number of spurious buy/sell orders with intent to cancel them before
execution, thus corrupting the limit order book’s signal on supply and demand.
Spoof orders are typically placed outside the current best quotes to mislead investors
before any market movement can trigger a trade. Experimental evaluation is around
a continuous double auction market model with a single security traded. The
market mechanism is designed to have key elements of market microstructure
conditions such as fundamental shocks and observation noise. Spoofing the limit
order book is interpreted as decision-time attacks on machine learning models to
generate adversarial examples with domain constraints on the order streams. Market
equilibrium behavior is then specified by game theoretical adversarial deep learning
aspects around the market context for balancing the robustness and efficacy of
machine learning from order information with cloaking mechanisms. Related work
is on adversarial linear regression with multiple learners by Tong et al. [602].
Nisioti et al. [465] present a data-driven decision support framework called
DISCLOSE for optimizing forensic investigations of cybersecurity breaches. DIS-
CLOSE maintains a threat intelligence information repository of tactics, techniques,
and procedures (TTPs) specifications. Adversarial TTPs are obtained for complex
attacks with multiple attack paths from interviewing cybersecurity professionals,
MITRE ATT&CK STIX repository, and Common Vulnerability Scoring System
(CVSS). Here, game theoretical adversarial deep learning acts as a reasoning
hypothesis that increases the efficiency of the forensic investigation by decreasing the
time and resources required for a robust reasoning process about the logical links between
the uncovered evidence in an objective manner. The game theoretical strategic
reasoning can be contrasted with reasoning frameworks in machine learning such
as case-based reasoning, ruled-based reasoning, data-driven reasoning, etc. Here,
rule-based reasoning is framed from a combination of predefined rules, models, and
previous data. The probabilistic relations between available attack actions, the findings of
a forensic investigation, the benefit and cost of each inspection, and the budget available
to the investigator are considered in the DISCLOSE decision support framework.
Liu et al. [383] conduct a systematic survey of security threats in machine learning
from the learning theoretic aspects of the training/reasoning and testing/inferring phases.
Particular emphasis is on the data distributional drifts caused by adversarial samples
and subsequent sensitive information violations in statistical machine learning algo-
rithms. The adversarial capability to create adversarial manipulations according to
adversarial objectives is qualified by the impact of causative or exploratory security
threats, percentage of the training and testing data controlled by the adversary, and
extent of features and parameters known to the adversary. Attack types are then
categorized as causative attacks, exploratory attacks, integrity attacks, availability
attacks, privacy violation attacks, targeted attacks, and indiscriminate attacks.
Current defensive techniques for machine learning are categorized into security
assessment mechanisms, countermeasures in the training phase, countermeasures
in the testing or inferring phase, data security, and data privacy. They are essential
in the design of intelligent systems that learn from massive data with high efficiency,
minimum computational cost, and reasonable predictive or classification accuracy.
Xue et al. [674] conduct a survey of security issues in machine learning systems
to summarize countermeasure defenses, secure learning techniques, and security
evaluation methods. The machine learning security threats and attack models are
categorized into training set poisoning, backdoors in the training set, adversarial
example attacks, model theft, and recovery of sensitive training data. Adversarial
examples are defined to be an intrinsic property of the deep learning models. Model
overfitting is then found to have an important influence on recovering sensitive
training data by an adversary who can carry out membership inference attacks
and model inversion attacks. The threat models, attack approaches, and defense
techniques for machine learning systems are systematically analyzed to produce
cyber threat intelligence across multiple stages of the cyber kill chain. Adversarial
example attacks are found for email spam filtering, Android malware detection,
biometric authentication systems, face recognition systems, road sign recognition,
cellphone camera recognition, voice control systems, and 3D object attacks. They
are contrasted with backdoor attacks and Trojan attacks to create malicious data for
the target models. Future research directions are given as attacks under real physical
conditions, privacy-preserving machine learning techniques, watermarking-based
intellectual property (IP) protection of deep neural networks, remote or lightweight
machine learning security techniques, and systematic machine learning security
evaluation methods to produce underlying reasons for the attacks and defenses
on machine learning. Deep learning attacks can be crafted with deep generative
models such as GANs to break a distributed or federated learning framework.
Game theoretical adversarial deep learning is categorized as a defense technique
to simulate attacks, create robustness strategies, and detect abnormal features in
classifier design. It is contrasted with other defense techniques such as data saniti-
zation, input anomaly detection, input pruning and model fine-tuning, adversarial
retraining, defensive distillation, gradient masking, and input randomization. It
can also be combined with defensive techniques for protecting sensitive data such
as cryptography, steganography, distributed machine learning frameworks, trusted
platforms, and processors. Security evaluation of machine learning algorithms can
also benefit from the training, testing, and validation datasets generated by the game
theoretical adversarial deep learning within a design-for-security rather than the
design-for-performance paradigm for machine learning. Security evaluation curves
can also be created around the performance measures for machine learning and
cost functions for adversarial deep learning to characterize the learning system
performance, robustness, security, and privacy evaluation metrics calculated in the
presence of various adversary types having different attack strengths and knowledge
levels.
4.5 Robust Game Theory in Adversarial Learning Games
In a Stackelberg game, adversarial strategies are modeled and solved to provide the
solution rationale for the decision-making problem that defines the Nash equilibria. The solution
space for Nash equilibria is expressed in terms of the necessary and sufficient
conditions for game players’ convergence criteria [557]. Typical convergence
criteria are (i) zero-sum game vs non-zero sum game; (ii) two-player vs multiplayer
game; (iii) static game vs evolutionary game; (iv) sequential game vs continuous
game; and (v) deterministic game vs stochastic game. Typical players’ strategies
consider cases where a pair of players (i) do not know each other’s performance
criteria; (ii) compute each other’s strategies at different speeds; (iii) have linear
and non-linear payoff functions that may or may not be discontinuous; and (iv)
participate in a game with distributed control vs decentralized control. In such
games, the Stackelberg strategies and Nash equilibria are analyzed in terms of
the structural properties of the coefficient matrices of higher-order matrix-Riccati
differential equations.
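As a point of reference (notation ours rather than the book's), the static forms of these solution concepts can be written as a bilevel problem for the Stackelberg equilibrium and a fixed point of best responses for the Nash equilibrium:

\[
x^{\ast} \in \arg\max_{x \in X} u_L\bigl(x, y^{\ast}(x)\bigr),
\qquad
y^{\ast}(x) \in \arg\max_{y \in Y} u_F(x, y),
\]
\[
u_L(x^{\ast}, y^{\ast}) \ge u_L(x, y^{\ast}) \ \ \forall x \in X,
\qquad
u_F(x^{\ast}, y^{\ast}) \ge u_F(x^{\ast}, y) \ \ \forall y \in Y,
\]

where \(u_L\) and \(u_F\) denote the leader's and follower's payoff functions. The differential game setting replaces these static optimality conditions with the Riccati-type conditions discussed above.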
The optimization of such game theoretical payoff functions presents a complex
problem in optimization theory. Such problems are often modeled as decision
problems in non-cooperative differential games [86]. The solutions to these prob-
lems are presented as Pareto optima, Nash and Stackelberg equilibria, and co-co
(cooperative-competitive) solutions for the payoff function.
The Riccati differential equations are also analyzed as differential games in
optimal control theory. If the game theoretical players cannot observe the state of the
control system, then the Nash equilibrium is computed according to an open-loop
solution for the control system. If the game theoretical players can observe the state
and use feedback strategies, then the Nash equilibrium is computed according to a closed-
loop solution for the control system. Principles of dynamic programming are used
as the computational methods finding the game theoretical optima to the necessary
and sufficient conditions for optimal control system.
Furthermore, partial differential state equations of the control system can
augment the player’s payoff functions to result in stochastic control [9] in game
theoretical interactions. Here, the game theoretical equilibria are determined by the
necessary and sufficient conditions on the coefficients solving for the Stackelberg
Riccati differential, difference, and algebraic equations [206]. The study of such
equilibria and their numerical computational methods is the subject of evolutionary
and differential game theory [34].
Zhou et al. [710] model multiple types of adversaries in a nested Stackelberg
game framework. A single-leader learner has to deal with multiple-follower adver-
saries. The solution to the game is an optimal mixed strategy for the leader to play
in the game. The two-player Stackelberg equilibrium solution is used
as the strategy in a multiplayer Bayesian Stackelberg game. The Stackelberg game
is solved as a mixed-integer quadratic programming (MIQP) problem.
Ratliff et al. [507] characterize Nash equilibria in continuous games over
non-convex strategy spaces. Sufficient conditions are given for differential Nash
equilibria. They require the evaluation of player costs and their derivatives. A
dynamical systems viewpoint is taken to analyze the convergence of best response
strategies to a stable equilibrium. The results in non-linear programming and
optimal control provide first- and second-order necessary and sufficient conditions
for local optima assessed as critical points of real-valued functions on training
data manifolds. Such continuous games arise in building energy management,
pricing of network security, travel-time optimization in transportation networks, and
integration of renewables into energy systems. Coupled oscillator models are chosen
for illustrating the system properties of continuous games. They have application in
power networks, traffic networks, robotics, biological networks, and coordinated
motion control. Dianetti et al. [158] investigate the existence of Nash equilibria in
monotone-follower stochastic differential games where each player has submodular
costs. The monotone-follower problem tracks a stochastic control process to
optimize a performance criterion. It has applications in economics and finance,
operations research, queuing theory, mathematical biology, aerospace engineering,
and insurance mathematics. It can allow explicit feedback strategies to compute
equilibria in open-loop and closed-loop strategies in irreversible investment games.
Schuurmans et al. [544] introduce deep learning games. The optimization of
supervised deep learning models is expressed as the Nash equilibrium in a game.
A bijection is established between the Nash equilibria of a simultaneous move
game and KKT points of a directed acyclic neural network in deep learning. Then
a step-size-free regret matching algorithm is proposed for stochastic training to produce
sparse supervised learning models in deep learning. Thus, supervised learning is
reduced to game playing. A one-shot simultaneous move game is defined for a one-
layer learning problem. Regret minimization can also decompose multiplayer games
into multiple two-player games. Lippi [373] uses statistical relational learning
(SRL) frameworks in the description and the analysis of games. SRL combines
first-order logic with probabilistic graphical models to handle uncertainty in data
and its representation dependencies. SRL can be used in games such as partial
information games, graphical games, and stochastic games. Inference algorithms
in SRL such as belief propagation or Markov chain Monte Carlo can be used
for opponent modeling, finding Nash equilibria, and discovering Pareto-optimal
solutions. SRL produces probabilistic logic clauses to describe the strategies in
a game as a high-level, human-interpretable formalism. Games are described as
domains of interests, strategies, alliances, rules, relationships, and dependencies
among players. Techniques from inductive logic programming can then extract
rules from a knowledge base of logic predicates that aid probabilistic reasoning in
data-driven decision-making. SRL can also be combined with game theory to learn
model structures from data. Some of the SRL methodologies suitable for strategic
reasoning in game theory are Causal Probabilistic Time Logic (CPT-L), Logical
Markov Decision Programs (LOMDPs), DTProbLog, Infinite Hidden Relational
Trust Model (IHRTM), Infinite Relational Models toward trust learning, Relational
Reinforcement Learning, Probabilistic Soft Logic, and Independent Choice Logic
(ICL). SRL can also handle uncertainty in data to support game models with incomplete or unknown information. In machine learning, SRL has applications in decision-making scenarios such as reinforcement learning and adversary modelling.
Nash et al. [455] is the original paper defining equilibrium solution concepts in
multiplayer games with pure strategies. Mixed strategies then become probability
distributions over pure strategies. Nash et al. [454] discuss non-cooperative game
theory without coalitions. Each player acts independently, without communication or collaboration with other players, rather than as part of a coalition. The equilibrium
solution concepts are then a generalization of those in two-player zero-sum games.
Several solution concepts are developed to satisfy the learning hypothesis in game
theoretical adversarial deep learning such as with geometrical form solutions and
contradiction analysis on equilibrium strategies. Transferability and comparability
between adversarial payoff functions are also a line of enquiry to contrast game
theoretical equilibrium solutions in real-world applications. Dynamical systems of
non-cooperative games can also be developed for reducing cooperative games: the pre-play negotiations of the cooperative game become plays in a larger non-cooperative game that describes all the players’ payoffs in an infinite game.
Medanic et al. [418] develop explicit expressions of open-loop multilevel
Stackelberg strategies for control in deterministic sequential decision-making prob-
lems. Continuous linear systems are solved by quadratic optimization criteria to
characterize the Stackelberg controls. Higher-order square-matrix Riccati differ-
ential equations are also formulated to characterize the Stackelberg controls with
coefficient matrices in a dynamical system used for the statistical inference of their
structural properties. The strategies selected in the decision-making sequence for a
player are available to the other players after the current play. The dimensionality of
the associated dynamical system represents a differential constraint for determining
the optimal strategies of the next control in the decision-making sequence produced by
subsystems in an interconnected learning system.
Freiling et al. [201] study the existence and uniqueness of Stackelberg equi-
librium in a two-player differential game with open-loop information structure.
Sufficient existence conditions are derived for open-loop equilibrium to solve
Riccati matrix differential equations. Time-invariant parameters are discussed to
address concept drift in the equilibrium solutions. The equilibrium solutions can be
extended with non-cooperative game theory having different hierarchical structures,
cost functions, and sample data information patterns in Stackelberg differential
games. A linear differential equation describes constraints to the state vector in
the game. The payoff functions are constructed to satisfy necessary and sufficient
conditions on the solution concepts obtained by solving differential equations.
O’Reilly et al. [475] study the dynamics of cyber adversaries to harden cyber
defenses according to adversarial robustness criteria. The dynamics are formu-
lated as a competitive co-evolutionary system that generates many arms races
for harvesting robust solutions. The co-evolutionary process is crafted in the
context of network cybersecurity scenarios where the defender leverages artificial
intelligence (AI) to gain competitive advantage in an asymmetric adversarial environment.
Convex objective functions are used for modeling the machine learning problems.
The constraints are expressed with functions of indicator random variables. The
game theoretical equilibria then define stochastic classifiers. Syrgkanis et al. [587]
investigate correlated equilibria in multiplayer normal form games embedded into
regularized learning algorithms and their black-box reductions. No-regret learning
is utilized to make players’ decisions. The regret bounds are found in adversarial
environments with no-regret algorithms such as multiplicative weights, mirror
descent, and follow the regularized/perturbed leader. The no-regret dynamics lead
to faster convergence rates for regularized learning algorithms. Their black-box
reductions in game theoretical environments preserve the convergence rates while
maintaining the regret bounds on adversarial robustness. Results are compared
against a simultaneous auction game in terms of utilities, regrets, and convergence
to equilibria. The welfare of the game is the result of a variable-sum allocation of
payoffs and resource matching corresponding to an unweighted bipartite matching
problem.
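The multiplicative weights update mentioned above can be sketched in a few lines. The following is a generic Hedge learner facing an adversarially chosen sequence of loss vectors, not the specific black-box reductions of the cited work; the loss_fn callback is a hypothetical interface.

```python
import numpy as np

def hedge(loss_fn, n_actions, n_rounds, eta=0.1):
    # loss_fn(t, p) returns a loss vector in [0, 1]^n_actions for round t,
    # possibly chosen adversarially after seeing the mixed strategy p.
    w = np.ones(n_actions)
    average_play = np.zeros(n_actions)
    for t in range(n_rounds):
        p = w / w.sum()                 # current mixed strategy
        losses = np.asarray(loss_fn(t, p))
        w = w * np.exp(-eta * losses)   # multiplicative (exponential) update
        average_play += p / n_rounds
    return average_play
```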
Many games of interest in adversarial deep learning lie beyond tractable mod-
eling and reasoning. Wellman et al. [642] investigate the gaps between strategic
reasoning and game theory. Computational complexity in automated reasoning is
shown to be due to number of agents, size of strategy sets and policy spaces, degree
of incomplete and imperfect information, and expected payoff computations in a
stochastic environment for adversarial learning. Empirical games are introduced
as game simulators that perform strategic reasoning through interleaved simulation
and game theoretical analysis. The building block for adversarial deep learning is
an interaction scenario where payoff information is obtained from data in observa-
tions and simulations. Constructing and reasoning about empirical games presents
interesting sub-problems in simulation, statistics, search, and game theoretical
adversarial deep learning analysis. Empirical game formulations are decomposed
into strategy space parametrization over continuous or multidimensional action
sets and imperfect information conditioned on observation histories. Strategy space
parametrization is done with candidate strategies in baseline or skeletal structures
for parametric variations on the game search architecture such as truthful revelation
of payoffs, myopic best response strategies, and game tree search in minimax and
max-max optimization. Precise, rigorous, and automated empirical game estimation is recommended to be done with statistical techniques such as
Monte Carlo analysis in active learning of strategy choices by adversary types,
adjusting for cyber-observable factors with known effects on payoffs, applying
control variates to measure demand-adjusted payoffs, hierarchical game reductions
to affect computational savings, information theoretic criteria for selecting strategy
profiles, and regression in game estimation to generalize payoffs across very large
profile spaces given available data. Vorobeychik et al. [623] investigate games
with real-valued strategies where payoff information is learned on a sample of
strategy profiles. The payoff function learning problem is formulated as a standard
regression problem with known structure in a multi-agent environment. Learning
performance is measured with respect to relative utility of prescriptive strategies
rather than the accuracy of the payoff functions.
Huang et al. [288] describe dynamic games for control system design that is
decomposable into cyber, physical, and human layers. Cross-layer design issues
give rise to security and resilience challenges in critical infrastructures. Such
critical infrastructures are seen in industrial control systems in sectors such as
electric power, manufacturing, and transportation. Here, the control system’s view
of design takes the perspective of sensing, control, and plant dynamics integrated in
a feedback loop in the physical layer. The control design techniques such as robust
control, adaptive control, and stochastic control deal with information uncertainties,
physical disturbances, and adversarial noise in the feedback loop. Adversarial noise
is seen in the cyber layer with communication and networking issues between
sensors and actuators as well as among multiple distributed agents. By contrast,
the human layer is concerned with supervision and management issues such as
coordination, operation, planning, and investment. The management issues include
social and economic issues, pricing and incentives, and market regulation and
risk analysis. In cloud-enabled autonomous systems, service contracts for security
services can include incentive-compatible attack-aware cyber insurance policies
that can be designed with game theoretical adversarial learning to maximize social
welfare and alleviate moral hazard. Adversaries probe attack surfaces of the control
systems to exploit zero-day vulnerabilities in autonomous systems such as self-
driving vehicles. Game theory provides frameworks for strategic interaction among
components in a complex system to quantify tradeoffs of robustness, security,
and resilience in system performance within adversarial environments for control
systems. In game theoretical frameworks, secure and resilient control design is
viewed as an extension to robust control design. Application focus areas are
enumerated as heterogeneous autonomous systems, defensive deception games
for industrial control systems, and risk management of cyber-physical networks.
The objective of resilient control systems is to have performance guarantees and
recovery mechanisms when robustness and security fail due to adversarial attacks
and system failures. Here, a robust control system can withstand uncertain param-
eters and disturbances due to a design-for-security machine learning paradigm. The
defense mechanisms in robust control system design span cryptography and detection, among other mechanisms.
The study of game theoretical equilibria and their numerical computational methods
is the subject of evolutionary and differential game theory. Then we shall translate
such stochastic methods into the language of adversarial learning with variational
adversaries. A privacy-centric enhancement of the learning capacity, randomization
strategies, and payoff functions in the game formulations would affect the weighting
regularizations and the decision boundaries of the machine learning algorithms
provided as services. Systems theory motifs from non-linear signal processing and
control theory statistics relevant for the adversarial data mining application may
also be defined from the domain knowledge. In this context, we propose to explore
wavelet decompositions and maximum entropy modeling of the data distributions.
Grunwald et al. [236] develop a decision theory by connecting maximum entropy
inference to minimizing worst-case expected loss in zero-sum restricted games
between the decision-maker and nature. The decision theory is used to
derive loss functions that can be used in adversarial deep learning. The maximum
entropy distribution defines the decision-maker’s minimax strategy. A generalized
relative entropy measure is introduced for the decision-theoretic definition of
discrepancy and loss function. The generalized relative entropy is comparable to
other entropy optimization frameworks such as Renyi entropies and expected Fisher
information. Franci et al. [200] typecast the training of generative adversarial net-
works (GANs) as a variational inequality problem with stochastic Nash equilibrium
solution. A stochastic relaxed forward-backward training algorithm is proposed
for GANs. Cai et al. [98] conduct a survey of the state of the art in GANs from
security and privacy perspective. The game theoretical optimization strategy in
GANs is used to generate high-dimensional multimodal probability distributions
that have important applications in mathematics and engineering domains. In the
GAN-based methods in adversarial deep learning, the generators can be used to
not only craft adversarial examples but also design defense mechanisms. In data
privacy research, GAN-based methods can be used in image steganography, image
anonymization, and image encoding. Variational generative adversarial network
(VGAN) and variational autoencoder (VAE) can be built to strike a balance
between privacy and utility in synthesized images. In model privacy research, GAN-
based methods can be used to protect the learning model privacy through anonymization and obfuscation. In application domains of adversarial deep learning, GANs can
generate adversarial malware examples with data compression and reconstruction,
fake malware generation, and malware detection. They can be used to construct
bio-information systems for authentication; financial fraud detection problems in
credit card fraud, telecom fraud, and insurance fraud; botnet detection; and network
intrusion detection.
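The minimax alternation underlying GAN training, which the surveyed security and privacy applications build on, can be illustrated with a toy PyTorch sketch on a one-dimensional Gaussian. This shows only the standard generator/discriminator game, not the variational inequality or relaxed forward-backward schemes of the cited works; the network sizes and hyperparameters are arbitrary assumptions.

```python
import torch
import torch.nn as nn

def sample_real(n):
    # Toy "real" distribution: a 1-D Gaussian with mean 3 and std 0.5.
    return 3.0 + 0.5 * torch.randn(n, 1)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real, z = sample_real(64), torch.randn(64, 8)
    fake = G(z)
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator step (non-saturating loss): push D(G(z)) toward 1.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```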
In future research on game theoretical adversarial deep learning, the adversarial noise characteristics can be defined with respect to the following notions of noise. Here, game theoretical learning models involve evolutionary adversaries, stochastic adversaries, and variational adversaries targeting the misclassification performance of deep neural networks and convolutional neural networks. The extent to which
noise on model parameters and training data can benefit the overall quality of
the data distributions generated by game theoretical adversarial learning depends
on the specific adversarial noise processes and the nature of the generated target
distribution.
• Adversarial noise is spam, outlier, discontinuity, and costly
• Adversarial noise is not ground truth, not signal, and non-iid
• Adversarial noise is fake data and false discovery
• Adversarial noise is unexpected prediction and information leak
• Adversarial noise is residual error and unknown object
• Adversarial noise is rare class and sparse structure
• Adversarial noise is complex motif and wrong decision
• Adversarial noise is misclassification example and incorrect regression value
• Adversarial noise is randomized sample and latent variable
• Adversarial noise is due to an underlying stochastic process
• Adversarial noise is statistically insignificant
• Adversarial noise cannot be explained
Ge et al. [213] formulate game design methods for robust quantum control played
between the uncertainties (or noises) and the controls in quantum hardware. Lloyd
et al. [392] introduce quantum generative adversarial networks where generator
and discriminator are equipped with quantum information processors. Romero et
al. [523] introduce variational quantum circuit to mimic the target distributions.
Such variational circuits for encoding classical information into quantum states
are very useful in machine learning applications such as adversarial classification.
We can run costly computational learning algorithms or their BLAS subroutines
efficiently on a quantum computer. Quantum generalizations of the adversarial
deep learning on quantum data distributions would involve quantum sampling,
quantum information, and quantum causality modeling to analyze the bias-variance
decomposition in adversarial payoff functions applicable to the computational
optimization of randomized prediction games. We can derive utility bounds for
quantum neural network’s deep learning in an empirical risk minimization frame-
work and a mistake bounds framework. We can also define quantum-enhanced
learning through interactions in an agent-environment paradigm of the quantum
computation to derive separability criteria in the neural computing mechanisms and
their generalization error due to quantum measurements. We can develop a theory
of sample complexity, formal verification, and fuzzy automata in the adversarial
models with reliable guarantees proposed on the quantum generative adversarial
learning in quantum neural network’s training and optimization. We can create
special-purpose quantum information processors such as quantum annealers that are
well matched to adversarial deep learning architectures. Hybrid classical-quantum
learning schemes would quantify the learnability of the quantum neural networks
in non-convex optimization landscapes from the perspective of the generalization
error and the estimation error due to quantum measurements. The implementation
of trainable unitaries over parameterized quantum circuits can then be analyzed with
a no-regret property in the tolerable error of the generated data. Then quantum
information processing tasks may be reformulated as discriminative learning,
generative learning, and adversarial learning problems with separability criteria on structures for static and dynamic data that reduce the communication cost and
increase the load balancing in distributed memory systems. We would need to create
statistical validation criteria for such machine learning with reference to application
domain knowledge. In this context, we can explore evaluation metrics in data mining
applied to adversarial data modeling in cybersecurity applications.
Chapter 5
Adversarial Defense Mechanisms for Supervised Learning
Regularized classifiers such as the support vector machine (SVM) are designed for robustness to noise and overfitting. They minimize a combination of the training error and a
regularization term. The regularization term is typically a tensor norm. It restricts the
complexity of the classifier’s function class to support generalization performance.
It regards the testing data samples as perturbed copies of the training data samples.
Therefore bounding such a perturbation reduces the gap between classification
errors. The structural risk minimization approach is a regularization technique that
minimizes a bound on the generalization error based on the training error and a
complexity term. The proposed robust SVM performs a minmax optimization over
all possible disturbances between training and testing data samples. Stability of the
SVM against a specific perturbation that can be estimated is a related robustness
notion that is also studied. The training loss plus the regularization penalty is the
regularized loss for training the robust SVM.
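As a minimal illustration of the regularized loss described above, the sketch below trains a linear SVM by subgradient descent on the hinge loss plus an L2 penalty; it is a generic regularized SVM under assumed hyperparameters, not the specific minmax robust SVM formulation being reviewed.

```python
import numpy as np

def train_regularized_svm(X, y, lam=0.1, lr=0.01, epochs=200):
    # Minimise (1/n) sum hinge(y_i (w.x_i + b)) + (lam/2) ||w||^2 by
    # subgradient descent; labels y are expected to be +1 or -1.
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # points violating the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```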
Yan et al. [676] propose adversarial margin maximization (AMM) networks with an adversarial perturbation-based regularization for adversarial learning.
A differentiable formulation of the perturbation is backpropagated through the
regularized deep nets. Such maximum margin classifiers tend to have better
generalization performance due to intra-class compactness and inter-class discrim-
inability. The proposed adversarial defense mechanism is able to generalize to
multi-label classifiers so long as a target label is properly chosen for the adversarial
perturbation. Zhong et al. [707] embed a margin-based regularization term into the
classification objectives of deep neural networks. The regularization term has two
steps of optimizations to find potential perturbations in an iterative manner. Large
margins in the adversarial classification guarantee the inter-class distance and the
intra-class smoothness in the embedding space to improve the robustness of deep
nets. A cross-entropy loss function is jointly optimized with a large margin distance
constraint acting as the regularization term. Classifier robustness is tested under
conditions of feature manipulation and label manipulation.
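A perturbation-based regularizer of the kind described above can be sketched as follows. This uses a single gradient-sign perturbation as a simple differentiable proxy and is not the exact AMM or large-margin objective of the cited works; eps and beta are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def perturbation_regularized_loss(model, x, y, eps=0.05, beta=1.0):
    # Cross-entropy on clean inputs plus the loss under a one-step
    # gradient-sign perturbation, a simple proxy for margin regularisation.
    x = x.clone().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(clean_loss, x, retain_graph=True)
    x_adv = (x + eps * grad.sign()).detach()
    adv_loss = F.cross_entropy(model(x_adv), y)
    return clean_loss + beta * adv_loss
```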
Alabdulmohsin et al. [7] discuss reverse engineering attacks against classifiers
with fixed decision boundaries. Then randomization in the classification due to
semidefinite programming in a distribution of classifiers is formulated to mitigate
adversarial risks and provide reliable predictions with a high probability. The
authors investigate the tradeoffs between the predictive accuracy and variance of
the classifier distribution. The reverse engineering attacks proposed are classified
under exploratory attack scenarios where the adversary is manipulating testing
data distribution. The proposed classification system attempts to make reliable
predictions while revealing as little information about the decision boundaries as possible. The problem of learning with a distribution of classifiers is formulated as
a convex optimization problem. The defense of the classification system is com-
pared with adversarial classification, kernel matrix correction, ensemble learning,
multiple-instance learning, and game theoretical adversarial learning mechanisms.
Here exploratory defense strategies are said to cause disinformation about the choice
of training data, features, cost function, and learning algorithm. Another exploratory
defense strategy is to increase the complexity of the hypothesis space for the
adversary without causing overfitting for the classifier. In such a case, randomization
strategies would estimate a probability of selecting a class label instead of predicting a fixed class label.
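A minimal sketch of such randomized prediction is shown below, assuming a pre-trained ensemble and a sampling distribution over its members; the semidefinite programming step that selects the distribution in the cited work is not reproduced here.

```python
import numpy as np

def randomized_predict(classifiers, weights, x, rng=None):
    # classifiers: list of fitted models with a .predict method;
    # weights: sampling probabilities over them. Sampling the decision
    # boundary at prediction time makes reverse engineering harder.
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(classifiers), p=weights)
    return classifiers[idx].predict(x)
```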
Adversarial examples generated from perturbing such non-robust features will transfer across all the
classifiers that rely on features that are weakly correlated with the correct class
label. In finite training data, such brittle features can even arise due to noise.
Therefore the adversarial perturbations can be interpreted as invariance properties
that a robust model satisfies. Robust training that achieves small loss for all
the perturbations can be viewed as a method to embed certain invariances in a
standard classification model. In this context, the authors observe that gradients for
adversarially trained neural networks align with the perceptually relevant features
of the input image. So we can interpret adversarial perturbations as producing
salient characteristics of samples belonging to the interpolated target class. Such
an explanation cannot be given in standard models where adversarial examples
appear as noisy variants of the input image. The interpolated target classes can be
represented with deep generative models such as generative adversarial networks
and variational autoencoders involving adversarial manipulations into the learned
representations. The loss landscape of robust learning models can then be used to
smoothly interpolate between classes. Studying the generative assumptions in the
data allows us to provide upper bounds on classifier robustness that account for the sample complexity of robust learning.
Reinforcement learning is the study of intelligent agents and their actions
in a simulated environment such that a notion of cumulative reward is maximized
in the interactions between the agent and the environment. Instead of input/output
labels required in supervised machine learning, reinforcement learning’s focus is
to find a balance between exploration and exploitation of patterns. Reinforcement
learning can be interpreted as sampling-based methods to solve optimal control
problems. The goal of reinforcement learning is to learn a policy that maximizes
the expected cumulative reward and minimizes long-term regret. An intelligent
agent in reinforcement learning has to randomly select actions without reference
to an estimated probability distribution. Associative reinforcement learning tasks
combine supervised learning with reinforcement learning. In game theoretical
modelling, reinforcement learning can be used to produce error estimates on the
optimization with reference to bounded rationality.
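The policy learning loop described above can be illustrated with a minimal tabular Q-learning sketch, assuming a hypothetical Gym-like environment whose step method returns the next state, the reward, and a termination flag.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, rng=None):
    # env is assumed to expose reset() -> state and
    # step(action) -> (next_state, reward, done).
    rng = rng or np.random.default_rng()
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # Temporal-difference update toward the Bellman target.
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() * (not done) - Q[s, a])
            s = s_next
    return Q
```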
Chen et al. [118] review adversarial attacks taxonomy on reinforcement learning.
The adversarial examples are classified into implicit adversarial examples that
add imperceptible adversarial manipulations to mislead the learner and dominant
adversarial examples which add physical world perturbations to change the local
information available to reinforcement learning. The adversarial attack scenarios
are classified into misclassification attacks to target a neural network performing
reinforcement learning and targeted attacks to target a particular class label in training that is misclassified into the target class label selected by the adversary.
Multi-agent reinforcement learning can be formulated as a stochastic game. The solution concepts for such games are found by learning the best response
strategies for each player that maximize the current payoff with respect to current
strategies of opponents in the game. Nash equilibrium and regret minimization
are the most popular equilibrium concepts for the game theoretical reinforcement
learning. It has applications in multi-agent systems. Regret refers to the difference
between expected and actual payoffs for an agent. The expected payoff is calculated
on various strategies fixed in the game that are either pure or mixed in the search
spaces for machine learning. The actual payoff is computed empirically during the
game’s execution. The accumulated regret is optimized in regret-based learning
approaches. Popular algorithms that combine stochastic games with reinforcement
learning are Minimax-Q [374], Nash-Q [283], Fictitious Self-Play [267, 268], and
counterfactual regret minimization [163].
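At the core of algorithms such as Minimax-Q is the computation of the value and maximin strategy of a zero-sum stage game, which the following sketch solves with a linear program; the payoff matrix and function name are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_game_value(payoff):
    # payoff[i, j] is the row player's reward; returns the maximin mixed
    # strategy p and the game value v: maximise v s.t. payoff^T p >= v.
    m, n = payoff.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                    # minimise -v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])  # v - payoff[:, j] @ p <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m], -res.fun
```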
Song et al. [563] propose imitation learning algorithms in multi-agent actor-critic settings. In imitation learning, the agent learns desired behaviors by imitating an expert. The expert optimizes an underlying reward function approximately.
The imitating agent learns policies through reinforcement learning. In multi-agent
settings, the reward function optima depend on non-stationary environments with
multiple optimum solutions. The imitation learning algorithms of a single agent
can be extended to multi-agent settings within generative adversarial training
frameworks. The authors map imitation learning to a two-player game between
a generator and a discriminator. The generator controls the policies of all the
distributed agents. The discriminator is a classifier for each agent that distinguishes
between agent and expert behavior. The discriminator maps state-action pairs to
scores. Discriminators can also incorporate prior information about cooperating and
competing agents. To maximize its adversarial reward function, the generator tries
to fool the discriminator with synthetic trajectories. Maximum entropy modelling
forms the loss function for the maximum likelihood estimation in the proposed
imitation learning. Adversarial training is used to incorporate prior knowledge
about the multi-agent settings with an indicator function in the augmented reward
regularizer within the minmax game for reinforcement learning. A policy gradient
algorithm called Kronecker-Factored Trust Region is the optimization algorithm
solving for the game theoretical equilibrium concepts. Imitation learning is also
called inverse reinforcement learning (IRL).
Multiarmed bandits [79] are a simplified version of reinforcement learning that
can benefit from adversarial training. Multiarmed bandit algorithms output an action
for the agent without using any information about the state of the environment, called the context. Contextual bandits [364] extend multiarmed bandits by making the output
decision conditional on the state of the environment. This allows us to personalize
each decision to a situation based on previous observations. The contextual bandit
algorithm observes a context, makes a decision, chooses an action from a distri-
bution of alternative actions, and observes an outcome of the decision. A reward
function value is associated with every decision. The machine learning goal is to
maximize average reward. Unlike supervised learning, contextual bandit algorithms
do not have all the reward values for every possible action. In machine learning, con-
textual bandits have applications in hyperparameter optimization, feature selection, and other machine learning tasks.
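A minimal contextual bandit of the kind described above is sketched below with per-arm least-squares reward models and epsilon-greedy exploration; the class name and hyperparameters are illustrative assumptions rather than a specific published algorithm.

```python
import numpy as np

class EpsilonGreedyContextualBandit:
    # One regularised least-squares reward model per arm with
    # epsilon-greedy exploration over the predicted rewards.
    def __init__(self, n_arms, dim, epsilon=0.1, lam=1.0):
        self.epsilon = epsilon
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]   # X^T X + lam*I
        self.b = [np.zeros(dim) for _ in range(n_arms)]       # X^T r

    def choose(self, context, rng):
        if rng.random() < self.epsilon:
            return int(rng.integers(len(self.A)))
        scores = [context @ np.linalg.solve(A, b) for A, b in zip(self.A, self.b)]
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```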
Research into game theoretical adversarial learning can be extended into reinforce-
ment learning since the game theoretical objective functions of adversarial machine
learning can be interpreted as bi-level optimization problems [134] solved by actor-
critic methods [323] of decision theory. The task of adversarial classification with
reinforcement can be separated into the task of learning to predict attack preferences
and the task of optimizing operational policy that explicitly abides by the operational
constraints on the predictor. Then the adversary’s best response strategies are computed
as randomized operational decisions [716]. To study the theories of robust machine
learning, we can develop computational objectives and statistical inference models
in randomized prediction games for the discrimination, learnability, and reliability of adversarial algorithms. We can compare and contrast the blackbox optimizations in
game theoretical adversarial learning with multi-agent deep reinforcement learning
for model generalizability. Here the research into constrained objective functions for
adversarial learning is driven by the adversary’s capability and control on training
data and validation data taking into account application-specific attack scenarios
such as effect on class priors, fraction of samples, and features manipulated by
the adversary. Depending on the goal, knowledge, and capability of the adversary,
these scenarios are also classified in terms of attack influence, security violation,
and attack specificity. Such constrained optimization problems on shallow archi-
tectures tend to produce intractable computational algorithms for class estimation
and inference of the adversarial cost functions. They motivate the need for deep learning architectures in the statistical methods solving the optimization problems in adversarial payoff functions. In addition to operational constraints in the
security policies, distance and budget constraints in the adversarial cost functions
are also a research direction. The model update rules derived from the attack
scenarios could impact the convergence of the training process in terms of tradeoffs
between learnability and robustness of the proposed discriminative learning. We
can characterize the problem of discrimination in the presence of noise in terms
of a set of robust points where data encoding is a type of problem-specific error
mitigation strategy in cybersecurity classifiers. Then we shall incorporate non-
linearities in the classification through representing data with non-linear functions.
Such an arrangement would also allow us to explore multiple choices of variational
encodings of the learnable decision boundaries. Here, game theoretical payoff
functions measure player-driven optimizations that improve training and inference
in machine learning and uncertain environments. They also explain the impact of
uncertain environments with reference to a distribution of outcomes, and, in the
sense of decision-theoretic rationality around decision boundaries, payoff functions
maximize the expected utility for each player participating in the game.
Contextual bandits can be combined with game theoretical adversarial learning to
analyze multimodal, weakly supervised, noisy, sparse, and multi-structured training
datasets found in deep knowledge representation learning over dynamic streams and
complex networks. Bias-variance decomposition in the adversarial payoff functions
can derive regret bounds and utility bounds for such deep learning networks.
Furthermore, user or player feedback can be integrated into machine learning
performance measures as validation metrics for personalized recommendation and
adversarial ranking. Game theoretical adversarial learning can be used to explore
neural network architectures and adversarial cost functions in the training processes
implementing the data analytics for such cyber information processing tasks. Here
mistake bounds framework with no-regret property for online learning provides the
theoretical tools to analyze the tolerable error and update rules for the generated
adversarial data. We can also define the utility bounds of neural networks within
an empirical risk minimization framework for adversarial learning. Then cyber
information processing tasks may be formulated as discriminative learning problems
with separability criteria characterized by computational efficiency on structured
data. Thus adversarial deep learning can create mistake bounds frameworks in
cybersecurity applications. Here randomized prediction games can formulate the
learner of robust rank aggregation. We can express the learning robustness, fairness,
explainability, and transparency with game theoretical adversarial learning. Atten-
tion mechanisms in the deep generative modelling of the variational adversary’s best
response strategies can simulate and validate the learning environment for contex-
tual bandits. Payoff functions can be proposed for the knowledge representations
generated by deep learning networks for the objects in multimodal, multiview, and
multitask predictions.
In transfer learning and stochastic optimization over the adversarial examples,
deep reinforcement learning has objectives in common with game theoretical adversarial
deep learning. The reinforcement learning actions are typically expressed as a
Markov decision process. It uses dynamic programming techniques in implemen-
tation. Sampling problems in adversarial learning can thus focus on resilience
enhancements to Markov chain methods. Game theoretical modelling can focus
on integrating Bayesian Stackelberg games and Markov Stackelberg games with
reinforcement learning. In cybersecurity classifiers, adversarial cost functions can
be investigated for reinforcement learning that provide robustness bounds against adversarial representations. In deep generative learning with Markov decision processes, we can construct resampling dynamics for variational autoencoders used in adversarial
learning as an alternative to derivative-free stochastic optimization methods in game
theoretical modelling of adversaries. The stochastic optimization in game theoretical
learning can benefit from the Markov security games. The game theoretical payoff
functions over complex systems can benefit from the learning models involving a set
of autonomous agents interacting in the shared environment within multi-agent rein-
forcement learning. Multi-agent environments are inherently non-stationary. The
causality and stationarity of the Markov decision processes can be explored within
adversarial settings based on principles for statistical inference such as expectation-
maximization, minimum description length, maximum likelihood estimation, and
empirical risk minimization. They have applications in data mining tasks such as
classification, regression, association rule mining, and clustering.
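The dynamic programming view of Markov decision processes mentioned above can be illustrated by a value iteration sketch over tabular transition and reward arrays; the array shapes are assumptions for the example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    # P[a, s, s_next]: transition probabilities; R[s, a]: expected rewards.
    # Returns optimal state values and a greedy deterministic policy.
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R.T + gamma * np.einsum("ast,t->as", P, V)   # shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```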
Computational algorithms in evolutionary game theory and numerical methods
in differential game theory can augment the game theoretical payoff functions with
partial differential state equations of a dynamical system modelling the complex
interactions in stochastic control as game theoretical objective functions. Then
principles of dynamic programming can be used to study the convergence properties
of game theoretical optima. Reliability guarantees can be developed for the solution
concepts according to theories of sample complexity, formal verification, and
fuzzy automata in the adversarial learning. Variational methods and generative
models can represent the adversarial manipulations in the solution concepts for the
adversarial losses and feature embeddings in cybersecurity. Proper quantification
of the hypothesis set in decision problems of such research leads us into various
functional problems, oracular problems, sampling tasks, and optimization problems
in the game theoretical adversarial learning. Here we can compare the solutions
with machine learning baselines such as the inclusion of noise in the optimization
procedure, the simplification of the function landscape by increase of the model
size, the schemes for derivative-free stochastic optimization, and data resampling
in the context of adversarial learning algorithms. The game theoretical adversarial
learning frameworks of iterative attack scenarios and defense optimizations would
then be able to apply game theory to the detection, characterization, and prediction of dynamics in a dynamical system. The complex dynamics detected would lead us onto adversarial training procedures for the robust optimization of deep neural networks.
In generalized least squares models and generalized linear models for predictive
analytics, classification loss functions optimize the class-conditioned data likelihood
functions [244, 257] of the targeted deep networks. In this book, the adversarial
cost functions regularize such likelihood functions with norms, gradients, and
expectations of game theoretical objective functions inferred on the adversarial
loss functions. The types of such objective functions determine the types of
adversaries participating in prediction games with the classifier. In this book we have
proposed adversaries solving for evolutionary objectives and variational objectives
in the prediction games. The optimal values for the objectives are searched by
evolutionary algorithms such as genetic algorithms, simulated annealing algorithms,
and alternating least squares algorithms.
In this section we review additional computational algorithms, stochastic oper-
ators, and convergence criteria for computational optimization in deep learning
models. Such a study is expected to lead us to better randomization, convergence,
and parallelization in computation of the step magnitude and the step direction in
our stochastic optimization methods [566]. In designing the iterative update rules
of optimization algorithms and fitness functions solving for systems of equations,
we are interested in robust optimization, numerical optimization, and non-linear
optimization. In addition to game theoretical models, deep learning optimizations of
our interest include utility functions found in expectation-maximization algorithms,
maximum entropy models, learning classifier systems, deep factorization machines,
and probabilistic graphical models.
Fogel [199] categorizes the simulated evolution techniques in stochastic opti-
mization of neural networks. Depending on the facet of natural evolution (i.e.,
viewed as an optimizing problem-solving process), the techniques are called genetic
algorithms, evolution strategies, and evolutionary programming. These techniques
do not use higher-order statistics of the fitness function to converge onto opti-
mal solutions. These techniques are not as sensitive as gradient-based methods
to adversarial perturbations in the fitness function. Pirlot [495] describes the
strengths and weaknesses in simulated annealing (SA), Tabu Search (TS), and
genetic algorithms (GAs). Ledesma et al. [351] review the procedure to practically
implement simulated annealing. Bandyopadhyay et al. [29] use simulated annealing
to minimize misclassification rate across decision boundaries in pattern classifica-
tion. A deterministic annealing algorithm is proposed by Rose [525] to optimize
the problems related to clustering, compression, classification, and regression.
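A generic simulated annealing loop of the kind reviewed above is sketched below for minimizing a real-valued objective; the cooling schedule and proposal scale are assumed hyperparameters and not those of any cited implementation.

```python
import numpy as np

def simulated_annealing(objective, x0, n_iters=5000, t0=1.0, cooling=0.999,
                        step=0.1, rng=None):
    # Gaussian proposals; worse moves are accepted with a probability that
    # decays with the temperature, which helps escape local optima.
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    fx, t = objective(x), t0
    best_x, best_f = x.copy(), fx
    for _ in range(n_iters):
        candidate = x + step * rng.standard_normal(x.shape)
        fc = objective(candidate)
        if fc < fx or rng.random() < np.exp(-(fc - fx) / max(t, 1e-12)):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling
    return best_x, best_f
```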
Algorithmic game theory (AGT) [464] is a research area that spans game theory
and computer science. It is concerned with the design and analysis of algorithms
in strategic environments. Typically the input to the algorithm is distributed among
multiple players or agents who have stakes in the algorithm’s output. The analysis
aspect of AGT applies game theoretical tools such as the best response dynamics in
the implementation and analysis of algorithms. The design aspect of AGT, known as mechanism design, is about designing games whose equilibrium outcomes have desirable properties.
Grunwald et al. [237] show the equivalence between maximum entropy modelling
and minimizing worst-case expected loss from an equilibrium theory of zero-sum
games for loss functions and decision problems. A generalized relative entropy
with regularity conditions is proposed to analyze robust classifiers that minimize
divergence between distributions. A minmax theorem is then proposed for Kullback-
Leibler divergence between training data and adversarial data distributions treated
as a generalized exponential family of distributions. This gives a decision-theoretic interpretation of the maximum entropy principle where the adversarial loss function is not restricted to a logarithmic score. A decision-theoretic definition of
discrepancy or relative entropy between probability distributions for training and
adversarial data generalizes Bregman divergences to loss functions in machine
learning. Maximum entropy modelling is considered to be a version of robust Bayes
classifiers.
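The maximum entropy side of this equivalence can be illustrated with a small sketch that recovers the Gibbs form of the maximum entropy distribution on a finite support under a single mean constraint by minimizing the convex dual; the support values and constraint are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def max_entropy_distribution(values, target_mean):
    # Maximum-entropy distribution on a finite support with a fixed mean,
    # obtained from the convex dual: the optimum is p_i ~ exp(lam * x_i).
    # target_mean is assumed to lie strictly inside the range of values.
    values = np.asarray(values, dtype=float)

    def dual(lam):
        logits = lam[0] * values
        logz = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
        return logz - lam[0] * target_mean         # log-partition minus lam*mu

    res = minimize(dual, x0=np.zeros(1), method="BFGS")
    logits = res.x[0] * values
    p = np.exp(logits - logits.max())
    return p / p.sum()
```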
Cost-sensitive classifiers in machine learning have benefitted from zero-sum
game properties in game theory. Adversarial learning algorithms have made
improvements on the minmax game formulations to arrive at robust classifiers.
Rezek et al. [513] point out the equivalence between inferences drawn on previous
observation made in game theory and machine learning. Smooth best responses in
fictitious play of repeated games are contrasted with Bayesian inference methods
of machine learning integrated over adversarial distributions rather than empirical
averages. Then game theory is used in the analysis and design of variational learning
algorithms. For clustering a mixture of distributions, the variational learning
algorithms exhibit strong convergence properties and update rules. Proposed
solutions are closely related to developments in probabilistic graphical models.
So probabilistic graphical models can lead to efficient algorithms for calculating
the Nash equilibrium in large multiplayer games for supervised machine learning.
In general, machine learning algorithm design for stationary environments in an idealized academic setting can benefit from a game theoretical analysis of the non-stationary scenarios often found in dynamic real-world applications of machine
learning techniques.
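The fictitious play dynamics contrasted above can be sketched for a two-player matrix game as follows, with each player best responding to the empirical frequencies of the opponent's past actions; the payoff matrices are illustrative assumptions, and smooth (logit) best responses would replace the argmax in the smooth variant.

```python
import numpy as np

def fictitious_play(payoff_a, payoff_b, n_iters=5000):
    # payoff_a, payoff_b: (m, n) payoff matrices for the row and column
    # players; each player best-responds to the opponent's empirical play.
    m, n = payoff_a.shape
    counts_a, counts_b = np.ones(m), np.ones(n)     # smoothed action counts
    for _ in range(n_iters):
        a = int(np.argmax(payoff_a @ (counts_b / counts_b.sum())))
        b = int(np.argmax((counts_a / counts_a.sum()) @ payoff_b))
        counts_a[a] += 1
        counts_b[b] += 1
    return counts_a / counts_a.sum(), counts_b / counts_b.sum()
```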
A Pareto-optimal equilibrium of a game is a set of strategy profiles for the players where no player can increase their payoff without decreasing the payoff of another player. In
contrast, Nash equilibrium is reached when the profile chosen by every player in
the game is the best response with respect to profiles chosen by the remaining
players. Both Pareto-optimal and Nash equilibrium can be extended to multiplayer
games. Structure learning algorithms from SRL such as Markov logic networks
can produce interpretable probabilistic logic clauses to describe the strategies
of an adversary at a high level to humans. Here graphical games apply game
theoretical models to combinatorial graphs in machine learning and Monte Carlo
tree search can predict the evolution of game theoretical adversarial manipulations.
Graphical games consider players as nodes in a graph and edges represent their
interactions. So the payoff of a player depends on that of its neighbors rather
than all the players in the game. This leads to several local payoff matrices for a
player. Logic formalisms in SRL such as inductive logic programming can address
knowledge representation learning on games to describe domain of interest in game
theory such as strategies, alliances, rules, relationships, and dependencies among
players. They can also discover information about the external environment for
adversarial learning. Probabilistic reasoning is also useful to deal with missing
or incomplete information for decision-making in game theoretical modelling
for machine learning. So SRL has application in decision-making scenarios over
reinforcement learning and adversary modelling. Markov logic in game theoretical
adversarial learning allows us to model adversarial knowledge in terms of logic
predicates about evidence (known facts) or query (facts to be inferred). It can be
extended to a decision-theoretic framework attaching utility functions to first-order
clauses in Markov logic decision networks. Expectation-maximization algorithms
can be constructed to infer the value of logic predicates. Their relational nature can
be exploited to model collective classification algorithms in multiplayer games with
interpretable strategies for real-world applications.
Aghassi et al. [4] propose distribution-free robust optimization to contend with
payoff uncertainty in incomplete-information games. A robust optimization equilib-
rium is analyzed for finite games with a bounded polyhedral payoff uncertainty set.
Such an equilibrium can be contrasted with non-cooperative, simultaneous-move,
one-shot, finite games with complete information leading to a Nash equilibrium.
At Nash equilibrium the game theoretical players maximize expected payoff with
respect to the probability distributions given by mixed strategy spaces. Such
worst-case expected utility models are well-suited for analyzing decision-theoretic
situations characterized by uncertainty modelling around the adversarial risk assess-
ments in distributional information available for machine learning as training,
testing, and validation datasets. Here sources of uncertainty in the modelling are
due to uncertainty in each player payoffs given tuples of actions, uncertainty
in players’ behaviors, and prior probability distributions around multiple player
configurations. To solve incomplete-information games, distribution-free decision
criterion of minimax regret is used for the optimization of online learning. The
robust game proposed by Aghassi et al. [4] is comparable to such online games.
The performance validation criteria for machine learning can be designed with such game theoretical models.
In coalitional game theoretical feature selection, groups of features are interpreted as clusters. Nash stable partition (NSP), a solution concept from coalitional game theory, is used to provide a final clustering configuration of the
features. Desirable properties in the clusters can be chosen with reference to the
various game theoretical payoff functions. NSP is found by solving an integer linear
program (ILP). A hierarchical clustering approach is then proposed to scale the
clustering with graph partitioning. All features selected in a cluster are relevant and
complementary to each other. To perform feature extraction using the clustering
technique, a feature ranking of the feature clusters is also proposed.
Bector et al. [39] review fuzzy mathematical programming in game theoretical
modelling. Fuzzy sets can be applied to research areas such as mathematical
programming and matrix game theory that occur at the interface of game theory
and decision theory. Fuzzy environment can provide generalization to the linear and
quadratic programming solving game theoretical objectives in constrained matrix
games within two-player non-zero sum games having fuzzy goals. Matrix games
with fuzzy payoffs can model multi-objective linear programming problems in
adversarial learning. Several solution concepts for such fuzzy matrix games are
then described. Fuzziness of the decision function for an adversarial classifier
can be modelled with respect to adversarial learning objectives, environments,
and constraints. It leads to fuzzy mathematical programming problems in game
theoretical adversarial learning. For instance, fuzzy preference relations can be used
for knowledge representation learning algorithms over multimodal datasets to even-
tually solve modality constrained mathematical programming problems formulating
the game theoretical models in machine learning. Computational algorithms must
be developed to find the optimal solutions for such fuzzy optimization problems in
game theoretical adversarial learning.
Perc et al. [490] survey cooperation in evolutionary game theory to solve prob-
lems called social dilemmas that represent interaction stochasticity between game
theoretical players. An evolution of strategies, promoters of cooperation, and co-
evolutionary rules are used to express the emergence of cooperation and defection
in evolutionary games. Dynamical interactions between players can be studied with
the co-evolutionary rules over complex networks representing interaction network,
data population growth, mobility of players, and aging of players. Ficici et al. [193]
introduce game theoretical modelling in co-evolutionary memory mechanisms. The
collection of salient traits of memory is represented as a mixed strategy. The memory
embodies the solution of the co-evolutionary process obtained at Nash equilibrium. The
memory can be subject to resource limitations during training. Memory and drift
in co-evolution are interpreted as sampling errors and variational biases in game
theoretical modelling. Sensitivities and contingencies around the fitness function
evaluation of co-evolution processes can cause a machine learning system to learn,
forget, and relearn the memory traits in a cyclic fashion. The solution concept in
game theory then represents a collection of memory traits belonging to a desired
or correct set. The proposed “Nash memory” mechanism accumulates a collection
of the traits as best response strategies. Nash equilibrium strategies provide best
response solutions expressing the security level of the evolutionary game as a
highest expected payoff reached by all the players acting as a collective. This
security level is also known as the value of the game. A search heuristic is designed
on the co-evolving population that is able to relieve the population of the burden
of representing the solution and concentrate on search to improve the solution
represented by the memory. Polynomial time algorithms are used to solve a zero-
sum game with linear programming. The strategy space for finding Nash memory
may be finite or infinite, countable, or uncountable. Herbert et al. [272] propose
game theoretical modelling for competitive learning in self-organizing maps (SOM).
The focus of the training process in SOM-based clustering is to find a neuron that
is most similar to the input vector. The proposed extension GTSOM evaluates the
overall quality of the SOM by arriving at a globally optimal position using game
theory to propose dynamic and adaptive update rules to the neuron weights that
are able to account for density mismatch in clustering problems. The clusters are
described in terms of the actual input data and the neurons associated with the data.
Game theory is able to rank the neurons to determine the neurons providing greatest
increase in SOM quality according to distance from the input vector. Additional
quality measures on the neurons can also be introduced to consider related feature
maps extracted from the data. Game theoretical strategies are proposed to adjust the
learning rate of the SOM such that the input vector will have an increased likelihood
to be closer to a different neuron in the next iteration of the training algorithm. A
set of game theoretical actions details the clustering neighborhood and density to
distinguish or diminish the desired clusters. Training terminates if the SOM has
reached a user-defined threshold for the clustering quality preferences.
Schuurmans et al. [544] investigate connections between supervised deep learn-
ing methods and game theory. No-regret strategies in game theoretical modelling are
found to be effective stochastic training methods for supervised learning problems.
Regret matching is proposed as an alternative to gradient descent to efficiently
optimize the stochastic performance of supervised deep learning. A supervised
learning process over a directed acyclic neural network with differentiable convex
activation functions is expressed as a simultaneous move game with simple player
actions and utilities. Players choose their actions independent of actions taken
by other players. The cumulative regret for each player is defined in terms of
their expected utility function. Domain experts and nature can also be accounted
for in the mapping of strategies and actions for the learner. A close correspon-
dence is found between convex online learning and two-person zero-sum games.
Exponentiated weight algorithm and regret matching are proposed as constrained
training algorithms for supervised learning. Training results about the regret bounds,
convergence criteria, and global optima of the constrained training algorithms
are compared with projected stochastic gradient descent and stochastic gradient
descent. The constrained training algorithms are found to be highly competitive
in high-dimensional sparse feature spaces in supervised learning networks. Nash
equilibrium is guaranteed to be one of the local optima if not the global optima
for the deep neural network training. Regret-matching algorithms in evaluation are
found to achieve lower misclassification errors than standard deep learning methods.
However, the proposed theory does not apply to neural networks with non-smooth
activation functions within several hidden layers.
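A minimal sketch of regret matching in self-play on a two-player zero-sum matrix game is given below to make the regret-based training idea concrete; it is the basic matrix-game form of the algorithm rather than the constrained deep network training procedure of the cited work.

```python
import numpy as np

def regret_matching(payoff, n_iters=10000, seed=0):
    # Self-play regret matching in a two-player zero-sum matrix game:
    # payoff[i, j] is the row player's payoff, the column player gets its
    # negative. The averaged strategies approach an equilibrium.
    rng = np.random.default_rng(seed)
    m, n = payoff.shape
    regrets = [np.zeros(m), np.zeros(n)]
    strategy_sums = [np.zeros(m), np.zeros(n)]

    def strategy(r):
        pos = np.maximum(r, 0.0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(len(r), 1.0 / len(r))

    for _ in range(n_iters):
        p0, p1 = strategy(regrets[0]), strategy(regrets[1])
        a0, a1 = rng.choice(m, p=p0), rng.choice(n, p=p1)
        strategy_sums[0] += p0
        strategy_sums[1] += p1
        # Regret of each alternative action against the action actually played.
        regrets[0] += payoff[:, a1] - payoff[a0, a1]
        regrets[1] += -payoff[a0, :] + payoff[a0, a1]
    return strategy_sums[0] / n_iters, strategy_sums[1] / n_iters
```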
To arrive at the global optima, Oliehoek et al. [472] model generative adversarial
networks (GANs) according to finite games in mixed strategies. The proposed
solution concept monotonically converges to a resource-bounded Nash equilibrium
that is, a saddle point in mixed strategies. The solution concepts can be made more
accurate with additional computational resources for approximate best response
computations. The proposed training algorithm for deep generative modelling is
able to avoid common problems such as mode collapse, mode degeneration, mode
omission, and mode forgetting. The proposed game-theoretic method is called
Parallel Nash Memory. It can be exploited to produce improvements in the robust
training metrics of classifier/generator network performance. The discriminator
and the generator models can then be updated according to the best response strategy
at each iteration. They can explicitly limit the allowed strategies with finite-state
machines. The resulting robust models yield better generative performance at the
same total complexity and are closer to a global Nash equilibrium. They can be
extended with zero-sum polymatrix games and reduced games with adversarial data
guiding the training.
Hsieh et al. [281] also propose training strategies for GANs to discover mixed
Nash equilibria. Further sampling methods are proposed to solve the mixed strategy
games. The proposed mean-approximation sampling scheme can augment the global
optimization frameworks for game theoretical adversarial learning. Specifically,
a mean-approximation sampling scheme for bi-affine games is investigated to
provision practical training algorithms for GANs. The robust training reformulates
the GAN distributions over finite strategies as probability measures over continuous
parameter sets. A sampling method called entropic mirror descent estimates such
probability measures in a tractable manner. Thus the robust training reformulates
the training dynamics of gradient-based algorithms into minmax programs solved
with mathematical programming and algorithmic game theory. In the experiments,
the stationary optima found by gradient-based algorithms such as SGD, Adam, and
RMSProp are found to be not locally or globally minmax optima. This leads to
further development in the intuitions of non-convex optimization applied to machine
learning validations.
Tembine et al. [595] present an interplay between distributionally robust games
and deep generative adversarial networks. A Bregman discrepancy between adver-
sarial and training data distributions is constructed to avoid using a second derivative
of the objective function in the optimization algorithm applied to GANs. GANs are
formulated as distributionally robust games in adversarial multi-agent settings. The
players in the strategic-form game are the neuron units. The plays are the learned
weights. The game theoretical objective functions are loss functions obtained from
the mismatch between the output and real data measurement. The convergence
rate of the proposed deep learning algorithm is derived using a mean estimate.
Mean-field learning is seen as a candidate class of algorithms to be investigated
for high-dimensional deep learning with game theoretical adversaries acting as
decision-makers. f-divergence and Wasserstein metric are used in the experimental
evaluation to find the mismatch between generated data and true data. So the hidden
layers in a neural network are seen as dynamic interactive environments represented
as games. The multimodal output functions in the deep neural networks introduce
further difficulties in the game theoretical optimizations of strategic and interdepen-
dent parameters in the neural network training algorithms. Misalignment between
the updates of the components of neural network training motivates the need
for game-theoretic payoffs in training algorithms such as error backpropagation,
stochastic gradient descent, and mean-field or population-based algorithms (such as
genetic, swarm, and simulated annealing). The estimation of the expected value of
the gradient in the derivative-based methods further requires sampling methods such
as Monte Carlo sampling, reinforcement learning, or population-based sampling
integrated with continuous action multi-agent adversarial games. Starting from
various dimensions of dynamical systems, such a strategic deep learning or deep
game-theoretic learning is also useful in addressing hyper-parametrization, curse
of dimensionality, and error propagation. The relevant game-theoretic solution
concepts for the data-driven model-based strategic deep learning across various
threat models include Nash equilibrium, Stackelberg solution, Pareto optima, Berge
solution, bargaining solution, and correlated equilibrium.
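A Bregman discrepancy only requires the first derivative of its potential function, which is the property the cited formulation exploits to avoid second-order information. Below is a generic sketch of a Bregman divergence, with the negative-entropy potential recovering the KL divergence; the helper names are hypothetical.

```python
import numpy as np

def bregman_divergence(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>.

    Only the first derivative of the potential phi is needed; this is a generic
    sketch, not the exact discrepancy used in the cited work.
    """
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

# The negative-entropy potential recovers the KL divergence between distributions.
neg_entropy = lambda x: np.sum(x * np.log(x))
grad_neg_entropy = lambda x: np.log(x) + 1.0
p, q = np.array([0.7, 0.3]), np.array([0.5, 0.5])
kl = bregman_divergence(neg_entropy, grad_neg_entropy, p, q)
```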
Distributionally robust formulations also find application in areas such as speech recognition and machine translation. The risk mitigation classifiers tend to achieve fairness in supervised learning over protected labels through constraints calibrated into the robust optimization criteria. The distributionally robust risk optimization
can accommodate adversarial and high-noise settings to design fair algorithms on
unknown latent groups. Minimizing expected risk in machine learning may produce
models with very poor performance on worst-case inputs. Uesato et al. [614]
define a tractable surrogate objective for the true adversarial risk, which tends to be computationally intractable. Optimizing the adversarial risk motivates the study of
machine learning model’s performance on worst-case inputs. Such an adversarial
risk has application in high-stakes situations involving machine learning systems
for malware detection, computer vision, robotics, natural language processing, and
reinforcement learning.
Wong et al. [652] propose deep ReLU-based classifiers robust against norm-
bounded adversarial perturbations on the training data. A robust optimization
procedure minimizes the worst-case loss over a convex outer approximation of the
set of final-layer activations achieved by norm-bounded perturbation to the input.
It is solved with a linear program represented as a deep neural network trained
by backpropagation of errors. The class predictions of such robust classifiers are proved not to change within the convex outer bound on the final-layer activations, called the “adversarial polytope.” Such a worst-case loss analysis of neural networks
is valid even for their deep counterparts including representation layers such as
convolutional layers. This research is an attempt at deriving tractable robustness
bounds for adversarial perturbation regions across the layers in deep networks.
It is in contrast to research work on combinatorial solvers that verify properties of neural networks, such as satisfiability modulo theories (SMT) solvers and
integer programming approaches. However, the state of the art in such verification
procedures is too computationally costly to be integrated easily into the current
robust training procedures. In such convex robust optimization problems, the data analytics task is to solve an optimization problem where some of the
problem data is unknown but belongs to a bounded set. Provable robustness bounds
on the adversarial error and loss of a classifier are derived from the dual solutions
of the optimization problem. They can be used in the definition of provable
performance metrics measuring robustness and detection of adversarial attacks in
custom loss functions evaluated on training, testing and validation datasets.
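For intuition about certified bounds over perturbation regions, the sketch below propagates an L-infinity input ball through a plain ReLU network with interval arithmetic. This is a deliberately simpler bounding technique than the linear-programming dual bound of Wong et al., and the layer parameters are assumed inputs.

```python
import numpy as np

def interval_bounds(weights, biases, x, eps):
    """Propagate the L-infinity ball [x - eps, x + eps] through ReLU layers.

    A simple interval-arithmetic sketch of certified output bounds for a plain
    feed-forward ReLU network; weights/biases are lists of per-layer parameters.
    """
    lower, upper = x - eps, x + eps
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        new_lower = W_pos @ lower + W_neg @ upper + b
        new_upper = W_pos @ upper + W_neg @ lower + b
        if i < len(weights) - 1:              # apply ReLU on hidden layers only
            new_lower, new_upper = np.maximum(new_lower, 0.0), np.maximum(new_upper, 0.0)
        lower, upper = new_lower, new_upper
    # A prediction is certified if the true logit's lower bound exceeds every
    # other logit's upper bound.
    return lower, upper
```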
Sinha et al. [560] propose a distributionally robust optimization problem based
on the Wasserstein distance metric. It can be augmented into the adversarial training
procedure of machine learning model’s parameter updates faced with the worst-
case perturbations to the training data. It is able to achieve provable robustness
to smooth loss functions with little adversarial cost relative to the empirical risk
minimization of the learning loss. It can be used to provide certifying guarantees
on computational and statistical performance of adversarial training procedures.
Adversarial examples are formed due to a Lagrangian worst-case perturbation of
smooth loss functions. The proposed approach to distributional robustness is related
to parametric optimization models constrained on moments, support, and directional
deviations in the training data distribution. It is also related to non-parametric measures of divergence between the training and adversarial data distributions.
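The Lagrangian relaxation in this line of work replaces the hard Wasserstein constraint with a penalized inner maximization that stays smooth and amenable to gradient ascent. A minimal sketch of that inner step, assuming a per-example gradient oracle `loss_grad`, is given below.

```python
import numpy as np

def lagrangian_inner_max(loss_grad, x0, gamma=1.0, steps=15, lr=0.1):
    """Approximate argmax_x  loss(x) - gamma * ||x - x0||^2 by gradient ascent.

    A minimal sketch of the smoothed worst-case perturbation used in Wasserstein
    distributionally robust training; loss_grad(x) is an assumed oracle that
    returns the gradient of the per-example loss at x.
    """
    x = x0.copy()
    for _ in range(steps):
        x += lr * (loss_grad(x) - 2.0 * gamma * (x - x0))
    return x

# The robust training step then applies an ordinary gradient update of the model
# parameters on the loss evaluated at the returned perturbed input.
```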
Goodfellow et al. [225] summarize the need for regularization in deep learning.
Regularization in deep learning is discussed with reference to underfitting, over-
fitting, bias, variance, and generalization to control the computational complexity
of machine learning models. Regularized machine learning models perform well
on not only the training data but also on new inputs. Regularization terms in the
training objective functions are penalties and constraints designed to trade off a reduction in test error against a possible increase in training error. Sparse representa-
tions, noise robustness, dataset augmentation, adversarial training, semi-supervised
learning, multitask learning, and manifold learning are listed as some of the novel
regularization techniques to introduce regularization into adversarial learning losses.
Goodfellow et al. [225] also survey the use of analytical optimizations special-
ized to improve the training procedures in deep learning. They take gradient-based
optimization as a comparison baseline in the benchmarking experiments. Their
objective is to find the neural network parameters that reduce a cost function
involving a performance measure evaluated on training dataset, regularization
terms evaluated on training dataset, and adversarial losses evaluated on validation
dataset. Here optimization algorithms must contend with parameter initialization
strategies, adaptive learning rates during training, and information contained in
the second derivatives of the cost function. The goal of an optimized machine learning algorithm is to minimize the expected generalization error by minimizing the average training error, called the empirical risk. The empirical risk
minimization tends to overfit to the training dataset. This must be accounted for in
the convergence criteria for optimization leading to batch, incremental, stochastic,
online, deterministic, and randomized optimization algorithms for deep learning
on dynamic data streams. Ill-conditioned problems, local minima, saddle points,
exploding gradients, and inexact gradients are listed as some of the theoretical
challenges in the optimization algorithms design. Machine learning paradigms such
as curriculum learning, generative learning, metric learning, and transfer learning
are useful to solve such problems with specialized neural network architectures.
Wang et al. [637] discuss the relation between robustness and optimization of
secure deep learning. Adversarial training with projected gradient descent (PGD)
attack is chosen as the minmax optimization problem. The inner maximization
problem generates adversarial examples by maximizing the classification loss. The
outer minimization computes model parameters by minimizing the adversarial loss
on adversarial examples. A first-order stationary condition (FOSC) with a closed-form solution for the constrained optimization is proposed as a criterion for evaluating such an adversarial loss.
It constructs a dynamic training strategy for robust learning with a gradual increase
in the convergence quality of the generated adversarial examples. As a defense
mechanism, the adversarial training technique is comparable to moderately robust
techniques such as input denoising, gradient regularization, Lipschitz regularization,
defensive distillation, model compression, and curriculum adversarial training. The
selected PGD attack scenario is comparable to fast gradient sign method (FGSM),
Jacobian-based saliency map attack (JSMA), C&W attack, and Frank-Wolfe-based
attack. Gradually increasing the computational hardness of adversarial examples is
an idea based on the curriculum learning paradigm for machine learning. It leads to a speed-up in convergence and improved generalization of deep learning networks. A
learning curriculum within a sequential ordering mechanism is designed for adver-
sarial training. The experimental results of the adversarial training are benchmarked
against the state-of-the-art attacks on WideResNet.
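A hedged sketch of the two nested loops follows: projected gradient ascent for the inner maximization inside an L-infinity ball, and an FOSC-style value measuring how well-converged a generated adversarial example is. Here `loss_grad_x` is an assumed gradient oracle and the constants are illustrative.

```python
import numpy as np

def pgd_perturb(x, y, loss_grad_x, eps=0.03, alpha=0.01, steps=10):
    """Inner maximization: projected gradient ascent inside an L-infinity ball."""
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad_x(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the ball
    return x_adv

def fosc(x_adv, x, y, loss_grad_x, eps=0.03):
    """First-order stationarity value; smaller means a better-converged adversarial
    example (a hedged transcription assuming an L-infinity threat model)."""
    g = loss_grad_x(x_adv, y)
    return eps * np.abs(g).sum() - np.dot((x_adv - x).ravel(), g.ravel())

# Outer minimization: update model parameters on the loss at x_adv with SGD,
# gradually tightening the FOSC threshold as training proceeds (the curriculum).
```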
Kunin et al. [337] study the loss landscape in regularized linear autoencoders (LAE)
acting as models for deep representation learning. Autoencoders are trained to
minimize the distance between the data and its reconstruction. They learn a subspace
spanned by the basis vectors learnt from the training data. LAEs are connected with reduced-rank regression models like principal component analysis (PCA).
L2 regularization’s effect on orthogonality patterns in the encoder and decoder is
investigated. LAEs are interpreted as generative processes. Denoising autoencoders and contractive autoencoders are discussed as variants of LAEs.
Vincent et al. [621] stack layers of denoising autoencoders in deep neural networks that are able to demonstrate a lower classification error. Higher-level
representations of the training data are obtained from the denoising criteria acting as
an unsupervised objective for feature detectors. It is able to boost the performance
of support vector machines in multi-label classification. The reconstruction errors
in the proposed denoising autoencoder can be considered to be an improvement
on log-likelihood estimations in stochastic restricted Boltzmann machines utilizing
contrastive divergence updates. The infomax principle from independent component
analysis is exploited as the denoising criteria maximizing the mutual informa-
tion between input random variables and higher-level representations. Empirical
average of mutual information on training samples is taken as the unbiased
estimator for unsupervised learning. The encoder’s loss function is expressed as an
by masking the spurious latent dimensions. Thus deep variational methods can
be considered as probabilistic generative models. The latent space representation
of data is due to a deterministic or stochastic encoder. The generated data is
from a decoder realizing a learnable family of function approximators. The data
distribution in the latent space follows a known probability distribution from which
sampling is feasible. The algorithmic masking procedure minimizes norm-based
reconstruction error and divergence metrics such as JS divergence, KL divergence,
or Wasserstein’s distance between a masked mixture density prior distribution and
the masked encoded latent distribution.
Zhao et al. [701] train deep latent variable models for discrete structures such
as text sequences and discretized images in textual style transfer. They are exten-
sions of the Wasserstein autoencoder framework and formalize the autoencoder
optimization problem as an optimal transport problem. Different fixed and learned
prior distributions from parameterized generators in the adversarially regularized
autoencoder can target generative representations in the output space. A transfer
learning-based parametric generator is trained to ignore targeted attributes of the
input. It can be used for sentiment or style transfer between unaligned source
and target domains. Image and sentence manipulations can be done in the latent
space via interpolation and vector arithmetic to induce change in the output space.
Constructing the style of interpolation requires a combinatorial search. A latent
space attribute classifier is introduced to adversarially train the encoder. Such
autoencoders accommodate smooth transformations in adversarially regularized
continuous latent space to produce complex modifications of generated outputs
within the data manifold. An information divergence measure such as the f-
divergence or Wasserstein distance minimizes the divergence between learned code
distributions of the true and model distributions. The cross-entropy loss in the
autoencoder upper bounds the total variation distance between the model and data
distributions. Discrete decoders such as recurrent neural networks can be incorpo-
rated into the model distributions. Here non-differentiable objective functions are
solved by policy gradient methods in reinforcement learning and Gumbel-Softmax
distributions to approximate the sampling of discrete data. The autoencoder learning
can be interpreted as learning a deep generative model with latent variables so long
as the marginalized encoded space is the same as the prior. Adversarial regulariza-
tion has an impact on discrete encoding, smoothness of encoder, reconstruction in
decoder, and output manipulation through prior. The resulting deep latent variable
models are sensitive to the training setup and performance measures. Improving
their adversarial robustness should lead to models for complex discrete structures such
as documents.
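The Gumbel-Softmax relaxation mentioned above replaces a non-differentiable discrete sample with a temperature-controlled soft sample. A minimal sketch, assuming token logits from a discrete decoder, is shown below.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=0.5, rng=None):
    """Differentiable relaxation of sampling a discrete token from `logits`.

    Lowering the temperature moves the soft sample closer to a one-hot vector.
    """
    rng = rng or np.random.default_rng()
    gumbel_noise = -np.log(-np.log(rng.uniform(size=logits.shape)))
    scores = (logits + gumbel_noise) / temperature
    scores = scores - scores.max()            # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

# Example: a relaxed "sample" over a five-token vocabulary.
soft_token = gumbel_softmax_sample(np.array([2.0, 0.5, 0.1, -1.0, 0.0]))
```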
Mescheder et al. [423] unify variational autoencoders (VAEs) and generative
adversarial networks (GANs). VAEs are expressed as latent variable models to learn
complex probability distributions from training data. An extension called adversarial
variational Bayes (AVB) with an interpretable inference model is proposed. It has an
auxiliary discriminative network formulating maximum likelihood estimation as a
two-player game not unlike the game in GANs. The proposed deep generative model
is better than generative models such as Pixel-RNNs, PixelCNNs, real NVP, and
Plug & Play generative networks. In log-likelihood estimation, it has the advantage
of GANs to yield generative representations of the training data as well as that
of VAEs to yield both a generative model and an inference model. Here a highly
expressive inference model combined with a strong decoder allows the VAE to
make use of the latent space representations in arriving at the reconstruction error.
An adversarial loss in the inference model encourages the aggregated posterior
to be close to the prior over the latent variables. Bayesian parameter estimation
approximates the posterior distribution as a probabilistic model. It can approximate
the variational lower bound for learning a latent variable model minimizing the
KL divergence between the training and latent data distributions. The probabilistic
model is able to learn multimodal posterior distributions and generate samples for
complex datasets. A deep convolutional network is used as the decoder network.
The encoder network architecture consisting of learned basis noise vectors is able
to efficiently compute the moments of latent data distribution that is conditioned
on the input data distribution. The inference model can represent any family of
conditional distributions over the latent variables. The experimental validation is
benchmarked against the annealed importance sampling (AIS) method for decoder-based generative models.
Blei et al. [72] discuss the utilization of variational inference and optimization in
Bayesian statistics for estimating computationally expensive posterior probability
densities. Variational methods are found to be faster than sampling methods such
as Markov chain Monte Carlo. They measure the information divergence between the approximated data distribution and the posited family of target densities. Here
mean-field variational inference is applicable to exponential family models such as
maximum entropy models forming the loss functions in machine learning. They
can be used in the stochastic optimization of game theoretical adversarial learning.
Grunwald et al. [237] show an equivalence theory between maximizing a generalized
relative entropy and minimizing worst-case expected loss that is based on zero-sum
games between decision-maker and Nature. Robust Bayes acts are found to mini-
mize discrepancy or divergence between distributions maximizing entropy. They are
expressed as solutions to minmax theorems on Kullback-Leibler divergence com-
puted for a generalized exponential family of target densities. The minmax theorems
are called redundancy-capacity theorems in information theory. Generalized relative
entropy is an uncertainty function associated with the loss function for training
machine learning models. Additive models for statistical inference that are based
on Bregman divergences are special cases of the generalized exponential families.
They can be used to derive scoring rules such as Brier score and Bregman score
in the decision problems for multi-label classification. A Pythagorean property of
Kullback-Leibler divergence leads to an interpretation of minimum relative entropy
inference as an information projection operation between adversarial and training
data distributions on discrete sample spaces. It can be extended with entropy-related
optimization problems based in information theory about moment inequalities and
generalized entropy families. Such generalized entropies include Renyi entropies
and Fisher information interpreted from a minimax perspective.
stochastic layers to model not only output pixels but also higher-level latent feature
maps. Autoregressive conditional likelihoods are explored in the context of data
analytics applications such as sentence modelling. The output distributions for
the generative and inference networks can be decomposed and factorized over
the latent variables to derive a log-likelihood for the reconstructed data that is
regularized by a KL divergence of the approximate posterior over latents with an
autoregressive prior. The latent representations of the input data are applicable to
deep representation learning in semi-supervised classification.
Hou et al. [278] propose a loss function for VAEs that enforces a deep feature
consistency preserving the spatial correlation characteristics of the input to give
better perceptual quality. The hidden features of a pre-trained deep convolutional
neural network (CNN) define a feature perceptual loss for VAE training. Instead
of reconstructing pixel-by-pixel measurements, the feature perceptual loss defines
a difference between hidden representations of images that have been extracted
from a pre-trained deep CNN such as AlexNet or VGGNet trained on ImageNet. Latent vectors obtained from such a VAE achieve state-of-the-art performance in facial
attribute prediction. The distribution of the latent vectors can be controlled accord-
ing to a KL divergence from Gaussian random variables. It is combined with
a reconstruction loss to train the VAE. Then attribute-conditioned deep VAEs
such as deep recurrent attentive writer (DRAW) [232] can be extended to semi-
supervised learning with class labels, combining an attention mechanism with a sequential variational autoencoding framework. The performance of VAEs can also
be improved with discriminative regularization of the reconstruction loss achieved
by GAN discriminator on the learned feature representation in VAEs. Feature
perceptual loss can be defined by neural style transfer and classification scores on
individual features from pre-trained deep CNNs.
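A feature perceptual loss compares hidden activations rather than pixels. The sketch below assumes a hypothetical `feature_extractor(image, layer)` helper wrapping a fixed pre-trained CNN; the layer names and equal weighting are illustrative assumptions.

```python
import numpy as np

def feature_perceptual_loss(x, x_recon, feature_extractor, layers=("conv2", "conv4")):
    """Sum of squared differences between hidden CNN features of an image and its
    VAE reconstruction.

    `feature_extractor(image, layer)` is a hypothetical helper returning the
    activations of a fixed, pre-trained CNN at the named layer.
    """
    loss = 0.0
    for layer in layers:
        f_x = feature_extractor(x, layer)
        f_r = feature_extractor(x_recon, layer)
        loss += np.mean((f_x - f_r) ** 2)
    return loss

# The total VAE objective adds a KL term, e.g.
# feature_perceptual_loss(...) + beta * KL(q(z|x) || N(0, I)).
```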
Hou et al. [279] extend the deep feature consistent VAE to implement a
deep convolutional generative adversarial training mechanism that learns feature
embeddings in facial attribute manipulation. A multiview feature extraction strategy
is then proposed to extract effective image representations useful in facial attribute
prediction tasks. Such a generative model for an image database is useful for
generating realistic images from random inputs, compressing the database into the
learned parameters of a model, and learning reusable representations of unlabelled
data that are applicable into supervised learning tasks such as image classification.
The proposed discriminator balances outputs between image reconstruction loss
and adversarial loss. The proposed VAE can linearly learn semantic information
of facial attributes in a learned latent space. It can extract discriminative facial
attribute representations. Images can be transformed between classes by a simple
linear combination of their latent vectors. Attribute specific features can be encoded
for annotated images to manipulate related attributes of a given image while fixing
the remaining attributes. Thus the adversarial training proposed in the VAE can be
conditioned on class labels and visual attributes obtained from the data manifold of
natural images.
Larsen et al. [347] present an autoencoder that can measure similarities in data
space based on learned feature representations. The representations are obtained
the adversarial network. The reconstruction phase updates the encoder and the
decoder to minimize the reconstruction error of inputs. The regularization phase
updates the discriminative network to distinguish between true samples generated using the prior and generated samples that are the hidden codes computed by the autoencoder.
Then the generator is updated to confuse the discriminative network. The generator
of the adversarial network is also the encoder of the autoencoder. After training,
the decoder of the autoencoder defines a generative model that maps an imposed
prior to the data distribution. The encoder can be a deterministic function, a stochastic distribution such as the Gaussian posterior, or a universal approximator
of the posterior that combines both training and adversarial data distributions. A re-
parametrization trick is used in the back-propagation of error through the encoder
of a stochastic distribution. In case of a universal approximator of the posterior,
the adversarial training procedure is interpreted as an efficient method of sampling
from the aggregated posterior. The imposed prior can be a complicated distribution
in a high-dimensional space such as the Swiss roll distribution without an explicit
functional form for the distribution. The reconstruction phase of the adversarial
training can also incorporate class label mixtures information to better shape
the distribution of the hidden code. Here, a semi-supervised classifier minimizes
the cross-entropy cost calculated on conditional posteriors estimated for each
labelled mini-batch. Such adversarial autoencoder (AAE) designs demonstrate that deep generative models can be
adversarially trained with not only sampling methods such as restricted Boltzmann
machines but also variational methods such as importance weighted autoencoders.
The proposed AAE is shown to have applications in semi-supervised classification,
unsupervised clustering, dimensionality reduction, and data visualization.
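The two training phases can be summarized as a single step of the form below, assuming hypothetical `encoder`, `decoder`, `discriminator`, and `sample_prior` callables that operate on NumPy arrays; the parameter updates on the returned losses are left to the surrounding training loop.

```python
import numpy as np

def aae_training_step(x_batch, encoder, decoder, discriminator, sample_prior):
    """One adversarial autoencoder step: reconstruction phase, then regularization phase.

    A structural sketch with hypothetical model callables, not a framework API.
    """
    # Reconstruction phase: encoder + decoder minimize the reconstruction error.
    z = encoder(x_batch)
    x_recon = decoder(z)
    reconstruction_loss = ((x_batch - x_recon) ** 2).mean()

    # Regularization phase (a): the discriminator separates prior samples from codes.
    z_prior = sample_prior(len(x_batch))
    disc_loss = -(np.log(discriminator(z_prior)) + np.log(1.0 - discriminator(z))).mean()

    # Regularization phase (b): the encoder (= generator) tries to fool the discriminator.
    gen_loss = -np.log(discriminator(encoder(x_batch))).mean()
    return reconstruction_loss, disc_loss, gen_loss
```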
Scutari et al. [545] analyze game theoretical modelling as a set of coupled
convex optimization problems in applied mathematics. Such convex optimization
problems are widely studied in signal processing for the design of single-user and
multiuser communication systems. Here, cooperative and non-cooperative game
theory approaches can be used to model the equilibria in communications and
networking problems. Such optimizations can also be generalized to variational inequality problems in non-linear analysis. Thus signal processing can be used in the
study of the existence and uniqueness of the Nash equilibrium in game theoretical
adversarial learning. Further, iterative distributed computational algorithms can be
designed to study the convergence properties and equilibrium programming of the
game theoretical modelling. The related adversarial learning applications also have
relevance in signal processing and communication applications such as resource
sharing in multihop communication networks, cognitive radio networks, wireless
ad hoc networks, and peer-to-peer wired networks.
Gidel et al. [217] explore variational inequality framework as a saddle point
optimization method for designing adversarial training. GAN training is extended with averaging and extrapolation techniques from variational inequalities. In mathematical
programming, variational inequality problems generalize the stationary conditions
for two-player games. At stationary points the directional derivative of the cost function is non-negative in any feasible direction for the optimization. They can be
generalized to continuous vector fields. The variational inequality problem finds an
optimal set on the vector fields. The game theoretical modelling in deep generative
modelling can be explored within the variational inequality framework to produce
stochastic variational inequalities with bounded constraints and regret minimization
in online learning. Here the GAN learning objectives are non-zero-sum games. Vari-
ational inequalities can be leveraged in various practical optimization algorithms.
Harker et al. [252] review finite-dimensional variational inequality problems in
game theory especially for non-linear models. Solving for equilibrium models
is a topic called equilibrium programming in non-linear optimization. They can
be used to produce numerical computational methods to study the convergence
properties of game-theoretic equilibria. Sensitivity and stability analysis of the
equilibria to changes in model parameters is an important part of the existence and
uniqueness of the solution. The resultant numerical modelling can be integrated
into a game theoretical adversarial learning to optimize the dynamics modelling and
computation in iterative attack scenarios and defense mechanisms. Daniele [143]
recontextualizes dynamics modelling as evolutionary variational inequalities within
dynamic networks evolving over time. The dynamics modelling has applications in
finance, economics, computer science, and mathematics.
and local feature hierarchies, the proposed variational methods learn interpretable
hierarchical features preserving information on natural image datasets. The learnt
representations can be generalized to adversarially trained models that support
statistical inference. The hidden layers of latent variables are characterized in two
designs. The first design recursively stacks generative models assuming that the
bottom layer alone contains information to reconstruct the data distribution and the
information does not depend on the specific family of distributions used to define
the hierarchy. The second design focuses on single-layer latent variable models in
which high-level features are positioned to certain parts of the latent code and low-
level features to others. This approach is called variational ladder autoencoder. It
maximizes a marginal log-likelihood over the training dataset. The likelihood is
complex and intractable for generative models. The marginalization is due to the
latent variables of the autoencoder. Following a variational inference model, an
evidence lower bound (ELBO) involving Kullback-Leibler divergence is optimized
as a solution for the intractable marginal likelihood optimization. Such an inference
is shown to produce learned structured representations that are better than assuming
a Markov independence structure in the latent variables to factorize the inference
distribution according to an autoregressive hierarchical variational autoencoder.
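For a diagonal Gaussian posterior and a standard normal prior, the ELBO has a closed-form KL term. A minimal single-sample sketch of the quantity being maximized is given below; the inputs are assumed to come from the encoder and decoder of the model being trained.

```python
import numpy as np

def gaussian_elbo(x, x_recon, mu, log_var):
    """Evidence lower bound for a VAE with a diagonal Gaussian posterior and a
    standard normal prior: reconstruction term minus the closed-form KL term.

    A minimal single-sample sketch; the reconstruction term is written up to
    additive constants for a Gaussian likelihood.
    """
    reconstruction = -np.sum((x - x_recon) ** 2)
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return reconstruction - kl
```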
Sønderby et al. [562] propose a ladder variational autoencoder for unsupervised
learning of feature representations. It recursively corrects the generative distribution
with a data-dependent approximate likelihood. A predictive log-likelihood provides
a lower bound for the bottom-up inference in layered variational autoencoders. It can
also be used in the design of a deep distributed hierarchy of latent variables in
inference and generative learning models. The hierarchies of conditional stochastic
variables in such VAEs are interpreted as a computationally efficient representation
of factorized models. They approximate a variational approximate posterior lower
bounding the intractable true posterior. It is estimated by dependency structure mod-
elling between bottom-up likelihood inference and top-down generative information
modelling in deep learning. Such a parameterization of the VAEs allows interaction
between the bottom-up and top-down signals like in the variational ladder autoen-
coder. The generative performance of the variational distributions is compared
with VAE baselines such as variational Gaussian processes, normalizing flows,
importance weighted autoencoders, and auxiliary deep generative models. The KL
divergence bounding the log-likelihood training criterion is approximated using
Monte Carlo sampling. A stochastic backpropagation algorithm is used to optimize
the generative and inference parameters. In the VAE inference, each stochastic layer
is specified as a fully factorized Gaussian distribution. Variational regularization
terms are introduced into the loss function for generative log-likelihood distri-
bution estimators. This model can also accommodate explicit parameter sharing
between inference and generative distributions to produce recursive variational
distributions with attention mechanisms such as in the deep recurrent attentive writer
(DRAW) [232]. In game theoretical adversarial learning, such attention mechanisms
create best response strategies for the adversary as randomized operational decisions
while the cost-sensitive classifier learns representations for multimodal, multiview,
and multitask distributions.
of learning to the adversarial noise with data-driven corrections. The use of learned
dictionaries is also compared with the use of predefined wavelet dictionaries to
recreate the observed sensor signals with separability and factorizability in the data
distributions for discriminative-generative modelling in game theoretical adversarial
deep learning. Applications are found for the resultant multimodal loss functions,
multiview cost functions, and multitask objective functions in biomedical imaging,
geophysical seismic sounding, and multitarget tracking.
Zou et al. [718] propose sparse principal component analysis (SPCA), which uses the lasso to produce modified principal components with sparse loadings. SPCA is formulated
as a regression optimization framework with computationally efficient algorithms
on multivariate data. Regression criteria identifying important variables rather than
simple thresholding on explained variance are used to derive the leading principal
components. Without sparsity constraints, the method reduces to PCA. Sprechmann
et al. [568] create a clustering framework with dictionary learning and sparse
coding. The representative points for clustering are modelled in terms of data
distributions represented in one dictionary for each cluster. Thus the entire clustering
configuration is modelled as a union of learned low dimensional subspaces and
their data points. Learned dictionaries make the unsupervised clustering framework
suitable for processing large datasets in a robust manner. An EM-like iterative
optimization algorithm is designed to separate the clusters into the dictionaries.
The dictionaries are also used in a new measurement of representation quality that
combines sparse coding, dictionary learning, and spectral clustering for both hard
and soft clustering.
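The assignment step of such dictionary-based clustering can be sketched as choosing, for each point, the cluster whose dictionary reconstructs it with the lowest error. The version below substitutes ridge-regularized codes for true sparse coding to stay self-contained; the dictionary shapes and regularizer are assumptions.

```python
import numpy as np

def assign_to_dictionaries(X, dictionaries, reg=0.1):
    """Assign each data point to the cluster whose dictionary reconstructs it best.

    X has shape (n_samples, n_features); `dictionaries` is a list of
    (n_features, n_atoms) arrays, one per cluster.
    """
    errors = []
    for D in dictionaries:
        # Ridge-regularized codes: argmin_a ||x - D a||^2 + reg * ||a||^2
        codes = np.linalg.solve(D.T @ D + reg * np.eye(D.shape[1]), D.T @ X.T)
        errors.append(np.sum((X.T - D @ codes) ** 2, axis=0))
    return np.argmin(np.stack(errors), axis=0)
```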
but also the fitness function and optimization technique as part of the adversarial
machine learning problem.
Michalewicz [426] surveys the evolutionary programming techniques to incor-
porate problem-specific knowledge as specialized operators in genetic algorithms.
They lead to evolution programs that are probabilistic algorithms extending the prin-
ciples of genetic algorithms. The specialized operators can be used for numerical
optimization, model tuning, constrained search, strategy learning, and multimodal
optimization in game theoretical adversarial learning. Here deep learning networks
can go beyond binary encoding of the populations to represent features for machine
learning with fuzzy, numerical, and computational operators for evolution programs. The
players associated with particular strategies in game theoretical modelling can
be represented as the population in evolution programs. The adversarial payoff
functions can then act as the fitness functions evaluating individual solutions to
be selected for the next generation. Better strategies can be constructed by mating
players across generations. Representations of the strategies can be randomized with
genetic operators. A player’s regret minimization is determined by the average of
payoffs it receives over all the games it plays. In this manner evolution programs can
be used to solve multi-label multiplayer games in supervised adversarial learning
with simultaneous optimization of multiple objectives in real-world decision-
making problems. Symbolic empirical learning is a research area in evolutionary
programming that can induce classification rules for supervised learning. In contrast
to such symbolic classifier systems that maintain explicit knowledge in a high-level
descriptive language, statistical models represent knowledge as a set of examples
and statistics associated with them, and connectionist models represent knowledge in the weights assigned to neural network connections. The symbolic empirical
learning applied to a classifier system has to define rule-based systems such as a
detector-effector system to encode-decode training data to a genetic representation
of solutions, a message system on inputs to the genetic algorithm, a rule system
producing a population of classifiers, a credit system on evolving solutions across
generations, and a genetic procedure to generate populations for the various rule-
based systems. Here evolution programs can be used to model the behavior of
a game theoretical attack scenario in supervised adversarial learning. Problem-
specific feature representations and specialized operators for evolution programs can
apply evolutionary algorithms in finite-state machines for numerical optimization,
machine learning, iterated games, optimal control, signal processing, cognitive
modelling, engineering design, system integration, and robotics. Strategic oscil-
lation is a constrained optimization approach that is applicable to combinatorial
and non-linear optimization problems solved with evolution programs. It attaches
a feasibility/infeasibility context to cost-sensitive design of neighborhood search
and stochastic optimization in evolution programs. The configuration of rule-
based systems for selecting a region to be traversed and the direction of traversal
are determined by the ability to approach and cross the feasibility frontier from
different directions. Retracing a prior trajectory is avoided by mechanisms of
memory and probability. A constructive process for reaching the feasibility frontier
is accompanied by a destructive process for dismantling its structure resulting in
a strategic oscillation around the boundary. Such strategic oscillations can be used
to guide the increase in adversarial payoff functions around classifier boundaries with a search procedure probing the depths of the associated regions. Problem
constraints on such search can bound and penalize the search with a constraint
set on vector-valued functions. Tradeoffs between different degrees of violation of
the component constraints can be allowed according to their feature importance
scores. Such problems are called constraint satisfaction problems in evolution
programs and are comparable to constraint programming techniques in mathemat-
ical optimization. Therefore game theoretical adversarial deep learning can benefit
from evolutionary techniques for function optimization with self-adapting systems
incorporating control parameters into solution vectors, co-evolutionary systems
where evolutionary processes are connected across populations, polyploid structures
incorporating memory of non-stationary environments into individual solutions, and
massively parallel programming models embedding evolutionary computation.
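As a concrete illustration of using an adversarial payoff as the fitness function, the sketch below evolves real-valued strategy vectors with selection, uniform crossover, and Gaussian mutation; the `payoff` callable, population size, and encoding are hypothetical.

```python
import numpy as np

def evolve_attack_strategies(payoff, pop_size=30, dim=8, generations=50,
                             mutation_scale=0.1, rng=None):
    """Evolve adversarial strategies whose fitness is an adversarial payoff.

    `payoff(strategy)` is a hypothetical adversarial payoff, e.g. the classifier
    loss induced by the encoded perturbation strategy.
    """
    rng = rng or np.random.default_rng()
    population = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        fitness = np.array([payoff(ind) for ind in population])
        parents = population[np.argsort(fitness)[-pop_size // 2:]]   # keep the fittest half
        mates = parents[rng.permutation(len(parents))]
        mask = rng.random(parents.shape) < 0.5                       # uniform crossover
        children = np.where(mask, parents, mates) + rng.normal(
            scale=mutation_scale, size=parents.shape)                # Gaussian mutation
        population = np.vstack([parents, children])
    return population[np.argmax([payoff(ind) for ind in population])]
```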
McCune et al. [417] present a survey of vertex-centric programming model
in distributed processing frameworks for complex networks. It consists of inter-
dependent components to compute iterative graph algorithms at scale. Thus we
can evaluate the sensitivity of adversarial loss functions with respect to the
connectivity structure discovery, representation, visualization, and evaluation of the
game theoretical modelling on complex networks. In the context of adversarial
machine learning over graph pattern mining dynamics, we can explore functional
programming constructs suitable for distributed processing such as MapReduce
and bulk synchronous parallel. The choice of programming models is between
data parallelism, task parallelism, and graph parallelism. Haller et al. [247] discuss
the challenges of implementing parallel and distributed machine learning with
functional programming abstractions. The implementation details for distributed data analytics ought to consider machine learning assumptions implicit in the data model,
memory model, programming model, communication model, execution model, and
the computing model of the parallel and serial algorithms. The relevant feature learning methods include samples, trees, clusters, wavelets, kernels, splines, nets,
filters, wrappers and factors in data series, sequences, graphs, and networks. The
game theoretical modelling will require learning dense substructures, rare classes,
and condensed patterns over transactional, sequential, and graph datasets where
random process generating training data may not be the same as that governing
testing data. Miller et al. [430] discuss parallel programming models tailor-made
for machine learning implemented in the Scala programming language. They would
have to support distributed graph processing, provide parallel bulk operations on
generic collections, and create a parallel domain-specific language for machine
learning on heterogeneous hardware platforms. Scala language’s features to archi-
tect and distribute parallel run time systems for machine learning are also covered.
For game theoretical modelling, we would have to design unsupervised learning
mechanisms with motif mining models such as biclustering and evolutionary
clustering, multilevel clustering and model-driven clustering. To create supervised
adversarial learning models with such motifs, we can focus on compression methods
and optimization methods within kernel learning and deep learning. The relevant
evolutionary multi-objective optimization algorithms can solve multimodal problems and normalize decision variables with an evolving
population utilizing the minimum and maximum values of objective and constraint
functions. It can incorporate both stochastic and deterministic operators that tend
to converge to desired solutions with high probability. Such operators include
selection, crossover, mutation, and elite preservation. The use of a data population
in the search mechanism of the evolutionary optimization is implicitly amenable to
embarrassingly parallel programming over different regions of the search space. It
can solve real-world optimization problems involving non-differentiable objectives
and discontinuous constraints, non-linear solutions, discreteness, scale, random-
ization in the computation, and uncertainty in the decision. A mathematical
concept called partial ordering defines the non-dominated Pareto-optimal solutions in evolutionary multi-objective optimization. The convergence criteria of
evolutionary multi-objective optimization can be combined with mathematical
optimization techniques to produce dynamic optimizers. Such evolutionary multi-
objective optimization algorithms are explainable with respect to an application
such as spacecraft trajectory design. They are evaluated with performance measures
on the Pareto-optimal front such as error ratio, distance from reference set,
hypervolume, coverage, R-metrics, etc. Evolutionary multi-objective optimization
algorithms can deal with stochasticities in problem parameters, decision variables,
feature dimensions, and convergence properties with a probabilistic scoring of the
objective and constraint function values finding imprecise solutions in uncertain
environments. Such procedures are called stochastic programming methods leading
to a robustness frontier in the optimal solutions. It is practically solved with bi-level
optimization formulations in many areas of science and engineering.
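The partial ordering behind Pareto optimality can be tested directly. A minimal sketch of a non-dominated filter over a set of candidate solutions, with all objectives minimized, is shown below.

```python
import numpy as np

def pareto_front(objectives):
    """Return a boolean mask of non-dominated rows, assuming all objectives are minimized.

    objectives has shape (n_solutions, n_objectives).
    """
    n = len(objectives)
    non_dominated = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            dominates = (np.all(objectives[j] <= objectives[i])
                         and np.any(objectives[j] < objectives[i]))
            if i != j and dominates:
                non_dominated[i] = False
                break
    return non_dominated

# Example: none of (0, 1), (1, 0), (0.5, 0.5) dominates another, so all are kept.
mask = pareto_front(np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]))
```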
Kelley [313] does a mathematical analysis of the necessary and sufficient condi-
tions in iterative optimization. The optimization algorithms for noisy objectives and bounded constraints are summarized. Sra et al. [570] discuss the role of optimization
methods in machine learning. Stochastic gradient descent methods are summarized
for non-smooth convex large-scale optimization. Regret minimization methods are
proposed to select, learn, and combine features to optimize loss functions in machine
learning. The need for approximate optimization and its asymptotic analysis is given
for large-scale machine learning. Finally the relationship of robustness learning and
generalization error and its role in robust optimization with adversarial learning is
presented. Online optimization and bandit optimization are proposed as the methods
to deal with adversarial noise and label noise in supervised learning.
Koziel et al. [328] review the research area called computational optimization.
Computational optimization models and algorithms try to make optimal use of
available resources to maximize the profit, output, performance, and efficiency while
minimizing the cost and energy consumption. Search algorithms are the practical
tools to reach the optimal solutions in computational optimization. They have to
cope with uncertainty in real-world systems with robust designs for the objective
functions in the computational optimization. Convex optimization techniques that
are widely used in machine learning are special cases of computational optimization.
Satisfactory designs for robustness have to create optimization methodologies that
can make do with limited computational resources and analytically intractable objective functions.
An adversary can explore the signal filtering, detection, and estimation in tensors to
express the machine learning robustness, fairness, explainability, and transparency.
Here tensor representations of the training data distributions in deep learning
networks explore the structure and context underlying data with learning and
optimization theories based on the tensor-algebra sensitivities of the loss functions in machine learning. A tensor can be understood as a multidimensional array. Each direction in a tensor is called a mode, and the number of features in a mode is called its dimension. The rank of a tensor is the total number of covariant indices, that is, the minimal number of modes in the tensor; it is independent of the number of dimensions of the feature space underlying the tensor. The rank of a tensor is also called the order or degree of the tensor. In various applications, tensors are decomposed into
lower-order tensors using abstract algebra.
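A short sketch of this mode/dimension vocabulary: a mode-n unfolding arranges the mode-n fibers of a tensor as the columns of a matrix, which is the basic operation behind many of the decompositions discussed here.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: arrange the mode-`mode` fibers of a tensor as matrix columns.

    An order-3 tensor of shape (3, 4, 5) has three modes with dimensions 3, 4 and 5.
    """
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

X = np.arange(60).reshape(3, 4, 5)   # order-3 tensor
X1 = unfold(X, 1)                    # shape (4, 15): mode-1 fibers as columns
```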
From the perspective of supervised machine learning, the tensor algebra can
be based on computational learning theories of machine learning models and data
mining tasks. In game theoretical adversarial learning, the applications of tensor
decompositions are of interest in the study of the bias-variance tradeoffs in the
adversarial payoff functions for mathematical optimization. We can attempt to
explain the tensor decompositions in the adversarial manipulations to learn about
the effects of algorithmic bias in deep learning. Subsequently robust optimization
theories can be proposed for randomization-based adversarial deep learning. Such
deep learning theories would also have application in the data mining tasks such
as novelty detection and feature extraction. Here factorization machines are a low-
rank approximation of the feature engineering in a sparse data tensor when most
of its predicted elements are unknown. Here granular computing is useful to create
data fusion rules on the feature representations of the training data. It can lead to
neuro-fuzzy systems and multi-agent systems in data mining.
We can further investigate the transfer of the statistically significant data
fusion rules between predictive data representations on the spatial resolution and
spectral resolution data distributions of the training manifolds. Complex structure
temporal data in cybersecurity can also be represented as dynamic multidimensional
graphs for positive unlabelled learning. Such graphs can be interpreted as both
a complex network and a complex tensor in data mining. They require the use
of distributed big data processing for graph mining and deep learning. Here we
can do graph data mining in terms of graph sampling, graph partitioning, graph
compression, graph clustering, and graph search. We may scale the machine
learning with data sampling methods that can address the data dimensionality and
data granularity for multiprocessing and embarrassingly parallel batch processing
over tensors and graphs. The related work is in a study of sampling methods such as
undersampling, oversampling, uncertainty sampling, reservoir sampling, structural
sampling, etc. The big data solutions would involve data engineering operations
for caching, sorting, indexing, hashing, encoding, searching, partitioning, sampling
and retrieval in incremental models, sequence models, and ensemble models for
cost-sensitive learning with graphical models. For sensitivity analysis on big
data, we can analyze the prediction validation metrics that tune the deep neural
network parameters according to misclassification trends in structural datasets.
Common validation metrics include confusion matrix, precision-recall curve, ROC
curve, lift curve, and kappa statistic. Here we find literature on sequence learning
and discriminative learning for modelling the feature extractions and regression
residuals. For deep representation learning of such data distributions, we can
decompose the historical data into recent, frequent, and supervised patterns. Here
we can experiment with discretization methods such as sliding windows and dynamic time warping, and time-frequency methods like wavelets and shapelets. We may
then treat data distributions as 1D vectors or 2D tensors in deep learning to extend the
collaborative filtering on end-user feedback into data cubes acting as data structures
for distance metric learning.
We can also define synopsis data structures on tensors and graphs to derive the
machine learning features. The synopsis data structures would aid similarity search
and metric learning in complex network analysis. In this context we can explore
the causality and stationarity of Markov chains with expectation-maximization and
minimum description length principles for statistical inference. The analytics results
are applicable in data mining tasks such as clustering, classification, and association
analysis. We can extend them into feature learning for structured prediction, change
detection, event mining, and pattern mining with deep learning. Here the learnt
features can be one of sampled features, constructed features, extracted features,
inferred features, and predictive features. In terms of the modelling parameter
estimation, regularization parameters would do dimensionality reduction, while
learning parameters do predictive classification and regression. Combining all these
parameters in a data mining model would allow us to do sensitivity analysis of
the model for different data samples. The error minimization over various types
of parameters can be modelled as loss functions in classification models and cost
functions in optimization models for game theoretical adversarial deep learning.
The relevant deep neural networks include feature-based models and memory based
models. The choice between deep neural nets for data mining is determined by
statistical hypothesis testing methods in data analytics methods. Such methods
include maximum likelihood estimation, sequential hypothesis tests, shift invariant
methods, support vector machines, and tensor decomposition methods.
Grasedyck et al. [231] produce a literature review of low-rank tensor approxi-
mation techniques in scientific computing. A particular emphasis is put on tensors
An iterative algorithm can be used to fill in the missing values in the target matrices that are in
turn estimated as a low-rank approximation. The iterative updates to the target and
weight matrix can be based on variational bounds on estimating the log-likelihood.
Therefore we can mix weighted low-rank approximation iterations and variational
bound iterations while still ensuring convergence for both.
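A minimal sketch of alternating imputation and low-rank truncation is given below; it mirrors the structure of mixing missing-value fill-in with low-rank approximation, but omits the weighting and variational bounds discussed above.

```python
import numpy as np

def low_rank_complete(M, mask, rank=2, iters=50):
    """Fill missing entries of M (where mask is False) with an iterative low-rank fit.

    A minimal EM-style sketch: impute, truncate the SVD to `rank`, re-impose the
    observed entries, and repeat; convergence monitoring is omitted.
    """
    X = np.where(mask, M, M[mask].mean())      # initial imputation of missing values
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, M, X_low)           # keep observed entries fixed
    return np.where(mask, M, X_low)
```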
Tsourakakis [610] improves the Tucker decomposition to analyze multi-aspect
data and extract latent factors. A new sampling algorithm computes the decom-
positions in tensor streams where the tensor does not fit in the available memory.
The Tucker decomposition is formulated as a non-linear optimization problem.
It is solved with a computationally expensive alternating least squares (ALS)
optimization algorithm. The ALS procedure is sped up with randomized algorithms
that select columns according to a biased probability distribution for tensor decom-
positions. They can be interpreted as generalizations of low-rank approximation
methods. Further the randomized algorithm is amenable to embarrassingly parallel
processing on tensor streams. Such low-rank approximations represent statistically
significant portions of the training data obtained from real-world processes. They
have application in data mining tasks such as network anomaly detection. Here
outliers are detected relative to the subspace spanned by the principal components
in the training data.
Zou et al. [717] propose sparse principal component analysis (SPCA) where an
elastic net produces modified principal components with sparse loadings. Principal
component analysis is conducted as a regression-type optimization problem. Such
a SPCA has applications in handwritten zip code classification, human face
recognition, gene expression data analysis, and multivariate data analysis. Richtarik
et al. [517] benchmark eight different optimization formulations for SPCA and
their efficient parallel implementations on multicore, GPU, and cluster architectures. The robust
formulations use objective functions that are functions of the covariance matrix.
An alternating maximization method is the optimization algorithm. It measures
data variance using the L1 and L2 norms. Anandkumar et al. [14] propose robust
decomposition of a tensor into low-rank and sparse components. The proposed
method does a gradient ascent on a regularized variational form of the eigenvector
problem. The regularized objective satisfies convexity and smoothness properties
for optimization. Empirical moments in probabilistic models are represented as higher-order moment tensors to be decomposed. Then corruptions on the moments are
assumed to occur due to adversarial manipulations or systematic bias in estimating
the moments. The experimental results are compared with robust matrix PCA on
flattened tensor and matrix slices of the tensor. They have applications in image
and video denoising, multitask learning, and robust learning of latent variable
models. Romano et al. [522] analyze the robustness of a classifier to adversarial
perturbations by using the theory of sparse representations. Bounds are derived on
the performance of the adversarial learner’s properties and structure in regression
and classification. The bounds are shown to be a function of the sparsity of the
signal and the characteristics of the filters/dictionaries/weights on the incoming
signals. They unveil the data model governing the sensitivity to adversarial attacks.
Adversarial regularization mechanism based on sparse solutions and incoherent
dictionaries is proposed to improve the stability of the robust learner dealing with
adversarial noise. The relation of the intrinsic properties of the signal to the success
of the classification task is explored as a generative model. The stability of the
classification model is studied in both binary- and multi-class settings.
Kreutz-Delgado et al. [331] develop data-driven algorithms for domain-specific
dictionary learning. They perform maximum likelihood and maximum a posteriori
estimation. Priors are obtained from sparse representations of environmental signals
matched with a dictionary as concepts, features, and words. In experimental evalua-
tion the proposed dictionary learning has better performance in signal-to-noise ratios
than independent component analysis methods. Images encoded with a dictionary
have higher compression (fewer bits per pixel) and higher accuracy (lower mean
square error). The dictionary provides succinct sparse representation for most
statistically representative signal vectors in the data-generating environment. The
statistical structure in the generated signals spanning a learning environment is
represented with a set of basis vectors spanning a lower-dimensional manifold
of meaningful signals in a dictionary. The dictionary learning maximizes the
mutual information between the basis vectors and the generated signals. Projecting
the signals onto the dictionary results in noise reduction and data compression.
The tensor decomposition problem in dictionary learning is to produce low-rank
approximations completing the dictionary. The signal representation problem as
an entropy minimization elaborates the statistical structure in data distributions. It
can also be viewed as a generalization of vector quantization. Stochastic generative
models can be developed in deep learning to solve such problems. A combination
of expectation-maximization and variational approximation techniques can also be
used in the dictionary learning.
Luedtke et al. [398] propose adversarial Monte Carlo meta-learning to construct
optimal statistical estimation procedures in problems like point estimation and
interval estimation. A two-player game is formulated between Nature and a
statistician. Neural network parameters are repeatedly updated across the game
interactions to arrive at a representation of the finite observed samples in numerical
experiments. Thus adversarial learning can be incorporated into frequentist and
Bayesian approaches of measuring the machine learning performance. In frequentist
approaches adversarial learning can solve for the worst-case performance of
maximum likelihood estimators expressed as a minimaxity optimization criterion.
In Bayesian approaches adversarial learning can approximate posterior probability
distributions where minimax optimization derives Bayes procedures from least
favorable mixtures of priors. Here the maximum empirical risk of a statistical procedure can be determined from its least favorable distribution. Minimax adversarial learning algorithms iteratively update such risks to improve machine learning models.
New statistical procedures can be constructed for data mining tasks in a cost-
effective manner using deep adversarial learning. For instance, Zhou et al. [711]
present a sparse relevance vector machine ensemble for adversarial learning. During
model training, it is able to model adversarial attacks with kernel parameters.
A concept drift in the directions of kernel parameters minimizes the likelihood
of positive (malicious) data points. It is used in the learning of weights in the relevance vector machine ensemble.
Chopra et al. [128] construct a trainable similarity metric for recognition and ver-
ification applications. The similarity metric learns a function to map input patterns
into the target space such that the L1 norm in the target space approximates the semantic distance in the input space. The mapping function is architected as a convolutional
neural network that is robust to geometric distortions. A discriminative loss function
minimizes the similarity metric for a face database with high variability in pose,
lighting, expression, position, and artificial occlusions. The loss function is derived
from energy-based models (EBMs). In comparison to generative models, EBMs
do not need to estimate normalized probability distributions over the input space.
Such approaches to recognition tasks are suitable for datasets where the number of
categories is large and the number of samples per category is small.
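As an illustration of this idea, the sketch below pairs a small convolutional embedding with a margin-based contrastive loss on the L1 energy between two embeddings; it follows the spirit of the energy-based formulation rather than reproducing the exact loss of Chopra et al. [128], and the architecture, margin, and input sizes are placeholder assumptions.

import torch
import torch.nn as nn

class ConvEmbedding(nn.Module):
    # Small convolutional mapping into the target space; the layer sizes are arbitrary.
    def __init__(self, dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.LazyLinear(dim))
    def forward(self, x):
        return self.features(x)

def contrastive_loss(f1, f2, same_identity, margin=2.0):
    # Pull genuine pairs together and push impostor pairs at least `margin` apart
    # under the L1 energy in the target space.
    energy = torch.sum(torch.abs(f1 - f2), dim=1)
    pos = same_identity * energy ** 2
    neg = (1 - same_identity) * torch.clamp(margin - energy, min=0.0) ** 2
    return torch.mean(pos + neg)

net = ConvEmbedding()
x1, x2 = torch.randn(4, 1, 28, 28), torch.randn(4, 1, 28, 28)
y = torch.tensor([1.0, 0.0, 1.0, 0.0])        # 1 = same identity, 0 = different
loss = contrastive_loss(net(x1), net(x2), y)
loss.backward()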
Xing et al. [666] pose a distance metric learning problem over side information given as (dis)similarity relationships between data points. The distance metric learning is framed
as a convex optimization problem with efficient solutions. The learned metric is
trained over the full feature space of the inputs rather than a feature embedding
derived from the training dataset. So it generalizes more easily to previously
unseen data. Experimental evaluation is carried out on variants of K-means such
as constrained K-means, K-means + metric, and constrained K-means + metric.
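A toy numerical sketch of a Mahalanobis metric learned from similarity side information is given below; it uses a simple whitening heuristic on the similar-pair differences as a stand-in for the convex program of Xing et al. [666], and the data points and pair lists are made up for illustration.

import numpy as np

def mahalanobis_from_similar_pairs(X, similar_pairs, eps=1e-3):
    # Whiten the directions along which points declared similar vary;
    # a heuristic stand-in for the convex optimization in the original work.
    diffs = np.array([X[i] - X[j] for i, j in similar_pairs])
    C = diffs.T @ diffs / len(diffs) + eps * np.eye(X.shape[1])
    return np.linalg.inv(C)                    # positive definite matrix M

def mahalanobis_distance(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 3.0], [0.1, 3.1]])
M = mahalanobis_from_similar_pairs(X, similar_pairs=[(0, 1), (2, 3)])
print(mahalanobis_distance(X[0], X[1], M))     # small: the pair was declared similar
print(mahalanobis_distance(X[0], X[2], M))     # large: points from different groups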
Ye et al. [682] propose instance specific distance metric learning in nearest
neighbor methods. It assigns multiple metrics to different localities in the training
data. The proposed Instance Specific METric Subspace (ISMETS) spans the metric
space in a generative manner. It induces a metric subspace for each instance
by inferring the expectation over the metric bases in a Bayesian manner. The
statistical inference is done according to a variational Bayes framework. The
posterior demonstrates advantages of interpretability, effectiveness, and robustness.
In multimodal data analytics, such a distance metric learning is comparable to
constrained convex programming, and information-theoretical approaches such
as maximum entropy modelling. It can predict distance metrics for unseen test
instances inductively as well as transductively. It can incorporate parallelization
techniques and approximation tricks.
Shen et al. [553] propose a boosting-based technique for learning a quadratic
Mahalanobis distance metric. A semidefinite programming formulation is given for the boosting. It expresses positive semidefinite matrices as a linear combination of
trace-one rank-one matrices. They act as weak learners within an efficient and
scalable boosting-based learning process. The proposed semidefinite programming
can incorporate various types of constraints for rank aggregation in classification
and regression loss functions. Such a distance metric learning is closely related to
subspace methods like principal component analysis, linear discriminant analysis,
locality preserving projection, and relevant component analysis. They can be
interpreted as projections of data from input space to a lower-dimensional output
space while preserving the neighborhood structure of the training dataset in an
information-theoretical sense. Here supervised distance metric learning utilizes
side information presented as constraints on the optimization problem. A sparse
greedy approximation algorithm solves the optimization problem in an AdaBoost-
like optimization procedure for semidefinite programming.
Bellet et al. [44] survey the utility of distance metric learning in machine
learning, pattern recognition, and data mining. Metric learning is a research area
that automatically learns the distance metrics from data. Supervised Mahalanobis
distance metric learning is a baseline for comparison with the learnt metrics.
Variants of metric learning algorithms include those for non-linear metric learning,
similarity learning, edit distance learning, local metric learning, multitask metric
learning, and semi-supervised metric learning. In the context of adversarial learning,
metric learning allows us to derive generalization guarantees on the machine learning model's performance. Kulis et al. [334] provide another survey of tuning
a learned distance metric to a particular task in data analytics in a supervised
manner. Supervised metric learning is based on labelling information regarding
the distances of the transformed data. It is of special interest in scaling the data
analytics to high-dimensional feature spaces in computer vision, image retrieval,
face recognition, pose estimation, text analysis, music analysis, program analysis,
and multimedia. Metric learning has extensions in non-linear regression, feature
ranking, dimensionality reduction, database indexing, and domain adaptation. Deep
learning networks have an important role to play in the development of metric
learning methods.
Hoffer et al. [277] propose a triplet network deep learning model to learn
useful representations by distance comparisons. It is applied in the learning of a
ranking in image information retrieval. The similarity function is induced by a norm
metric embedding for a multi-class-labelled dataset. A deep network is the embedding
function. It finds the L2 distance between inputs of two labels and the embedded
representation of a third label input acting as a reference label. The neural network
architecture allows this analytics task to be expressed as a two-class classification
problem where the objective of the loss function is to learn a metric embedding
that measures the proximity to the reference label. A back-propagation algorithm
updates this learning model. The model learns comparative measures rather than
class labels between labelled data distributions. This learning mechanism can be leveraged to classify new data sources with unknown labels.
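The following sketch shows a standard margin-based triplet objective on L2 distances in an embedding space, which conveys the idea of learning by distance comparisons; Hoffer et al. [277] actually train with a softmax over the two distances rather than this hinge form, and the embedding network and batch here are placeholders.

import torch
import torch.nn.functional as F

def triplet_loss(embed, anchor, positive, negative, margin=1.0):
    # The anchor should be closer to the positive (same label) than to the negative.
    fa, fp, fn = embed(anchor), embed(positive), embed(negative)
    d_pos = torch.norm(fa - fp, p=2, dim=1)
    d_neg = torch.norm(fa - fn, p=2, dim=1)
    return torch.mean(F.relu(d_pos - d_neg + margin))

# Hypothetical embedding network and a random mini-batch of triplets
embed = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 64))
a, p, n = (torch.randn(8, 1, 28, 28) for _ in range(3))
loss = triplet_loss(embed, a, p, n)
loss.backward()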
Chen et al. [120] propose a discriminative metric-based generative adversarial network (DMGAN) that uses probability-based methods for generating real-like
samples in image synthesis tasks. A generator is trained to generate realistic samples
by reducing the distance between real and generated samples. The discriminator
acts as a feature extractor that learns a discriminative loss constrained by an
identity-preserving loss. The discriminative loss maximizes the distance between
real and fake samples in the feature space. The identity-preserving loss calculates
distance between samples and their centers. The centers are updated during the
GAN training. It maps the generated samples into a latent feature space used to
label the samples. Thus DMGAN recovers the implicit distribution of the real data.
It learns representative features in a transformed space. The proposed identity-
preserving loss can be contrasted with triplet loss and contrastive loss that learn
intra-class variations by constraining the distance between their samples. Thus
GANs can be improved from the perspective of deep metric learning. Such GANs
have applications in image generation, image super-resolution, and image-to-image translation.
Generative modelling approaches such as principal component analysis (PCA), variational autoencoders (VAE), and GANs can be compared in this context. The mode collapse in GANs is investigated with a reconstruction criterion.
Bauso et al. [36] formulate distributionally robust multiplayer games using f-divergences between training distribution and adversarial distribution
scenarios. Each player has to contend with a worst-case distribution called the
adversarial distribution. Bregman learning algorithms speed up the computation of
robust equilibria. The adversarial learning scenarios are selected by nature assumed
to be a virtual player solving a non-convex non-concave objective function. A triality
theory is proposed for the dimensionality reduction of the robust game. A swarm
algorithm estimates the expected gradient solving for adversarial manipulations.
Kamath et al. [307] study the loss functions in the problem of distribu-
tion approximations in statistical learning where a distribution is approximated
from its samples. In compression applications the Kullback-Leibler divergence
is recommended as the relevant loss function. In classification applications the
L1 and L2 losses are recommended as the relevant loss function. In generative
learning the f-divergences are recommended as the relevant loss function. Here
the minmax cumulative loss for a given loss function and the optimal estimator achieving it have practical importance in training machine learning models. Sugiyama
et al. [576, 577] discuss the approximation of two probability distributions from their
samples. This is a problem with implications for statistics, information theory, and
machine learning. Kullback-Leibler divergence of maximum likelihood estimation
models is compared by the authors with the Pearson divergence and the L2-distance in terms of efficiency, robustness, and stability. Here proper distances must satisfy the triangle
inequality that is an extension of the Pythagorean theorem to various geometric
metric spaces. They must not be sensitive to outliers. They must not be numerically
unstable. They must have a relative density ratio function that is bounded and
computationally efficient. The authors survey several data analytics applications
utilizing the divergence measures such as change-point detection, salient object
detection, and class balance estimation in several data mining tasks such as feature
extraction, clustering, independent component analysis, causal feature learning, and canonical dependency analysis. Direct diver-
gence approximation in combination with dimensionality reduction is said to be a
better strategy in experiments rather than naive density estimation of distributions
from samples. The difference between such statistical distances and information
divergences is their effect on the convergence criteria in the sequences of learned
probability distributions estimated by generative models and variational methods.
The divergences being optimized are typically discontinuous with respect to the
generator’s parameters. So novel ways for practically estimating the infimum and
supremum of the relative density ratio function are to be devised in adversarial deep learning, based on metric geometry, applied probability, and statistics.
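For concreteness, the snippet below evaluates the Kullback-Leibler divergence, the Pearson (chi-squared) f-divergence, and the L2-distance between two histogram estimates built from samples; this is exactly the naive plug-in strategy that direct density-ratio estimation [576, 577] is intended to improve upon, and the Gaussian samples and bin choices are illustrative assumptions.

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # Kullback-Leibler divergence KL(p || q) between two normalized histograms.
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def pearson_divergence(p, q, eps=1e-12):
    # Pearson (chi-squared) divergence, the f-divergence with f(t) = (t - 1)^2.
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(q * (p / q - 1.0) ** 2))

def l2_distance(p, q):
    return float(np.sqrt(np.sum((p - q) ** 2)))

# Empirical distributions estimated from samples of two Gaussians
rng = np.random.default_rng(0)
bins = np.linspace(-6, 6, 50)
p, _ = np.histogram(rng.normal(0.0, 1.0, 10000), bins=bins, density=True)
q, _ = np.histogram(rng.normal(1.0, 1.5, 10000), bins=bins, density=True)
print(kl_divergence(p, q), pearson_divergence(p, q), l2_distance(p, q))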
data points for the learner. Here one-shot games minimize adversarial cost when
each move happens, while iterated games minimize total accumulated cost found
by playing a game that repeats several times. The authors summarize such costs
and their practical considerations in several game theoretical models for causative
availability attacks and exploratory integrity attacks. Their defense mechanisms
in adversarial classifiers change the likelihood function of the learner so that it
can measure each feature at a different known cost. The adversary then plays
optimally against the original cost-sensitive classifier. Here the research area of
robust statistics can compare the candidate procedures to design procedures for
achieving robust learners. It can be used to develop an information theory for secure
learning systems that can measure the information leakage in terms of number of
bits. It can also quantify the empirical risk associated with side channel attacks on
the leaked information.
To create robust machine learning in malware detection, Tong et al. [601] identify
conserved features that cannot be modified without compromising malicious func-
tionality. They are used to construct a successful defense against a realizable evasion
attack. Machine learning robustness is then generalized to multiple realizable
attacks to do model hardening with a feature space accounting for a series of
realizable attacks in robust optimization. A collection of feature extractors is
designed to compute numerical vector values and associated object labels for
features from corresponding input entities. Depending on the assumptions about
the learning algorithm and the adversarial model, evasion defense is classified into
game-theoretic reasoning, robust optimization, and iterative adversarial retraining.
Generalizability of evasion defenses is evaluated over feature space models of
evasion attacks that are realizable. Structure-based PDF classifiers on binary
features of structural properties in PDF files as well as content-based PDF classifiers
on PDF metadata and content are used to distinguish between benign and malicious
instances. Realizable evasion attacks are crafted with EvadeML that has blackbox
access to the classifier, mimicry attack that manipulates a malicious PDF file
using content injection to resemble benign PDF file, MalGAN to generate malware
examples, reverse mimicry attack to inject malicious payloads into target benign
files, and custom attack to replace entries in attack PDF files with hexadecimal
representations that obfuscate tags for code execution in PDF. Iterative adversarial
retraining is selected as the defense mechanism to produce a robust classifier.
Chaowei et al. [111] leverage spatial context information in semantic segmentation
to detect adversarial examples even when dealing with a strong adaptive adversary.
The hypothesis for the defense mechanism is that adversarial examples in different
machine learning tasks contain unique statistical properties that provide in-depth
understanding of the potential defensive mechanisms. In semantic segmentation
tasks, this translates to giving prediction labels to each pixel in an image subject to
exploited by the adversaries. Thus models trained with defensive distillation are
less sensitive to adversarial samples. As a security countermeasure, it leads to
smoother classifier models with improved generalizability properties. Defensive
distillation addresses two folds during training, called direction sensitivity estimation and perturbation selection, where the adversarial goal is assumed to be misclassifying samples from a specific source class into a distinct target class. The transferred
knowledge consists of not only the weight parameters learnt by the deep net
but also the encoded class probability vectors produced by the network during
training. Soft class probabilities are better than hard class labels because they hold
relative information about entropy of classes in addition to each sample’s correct
class. Such information on class-conditional probabilities can be used to guide
the convergence of the deep net to an optimal modelling solution that enhances
classification robustness. To deal with optimization attacks with new objectives and
optimizers, Papernot et al. [478] extend the folds in Papernot et al. [484] to add
an outlier class to mitigate adversarial examples and provide uncertainty estimates
in neural networks through stochastic inference. By transferring both knowledge
and uncertainty, the extended defensive distillation does not need the defender to
generate adversarial examples according to heuristics.
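A minimal sketch of the temperature-based knowledge transfer behind defensive distillation is given below, assuming a teacher network already trained at temperature T; the architectures, the temperature value, and the function names are placeholder assumptions, and the distilled student is deployed at temperature 1.

import torch
import torch.nn.functional as F

def distillation_targets(teacher, x, T=20.0):
    # Soft class probabilities (the encoded class probability vectors) from the teacher.
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=1)

def distillation_loss(student_logits, soft_targets, T=20.0):
    # Cross-entropy of the student's temperature-softened predictions against
    # the teacher's soft labels, i.e., the knowledge being transferred.
    log_p = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_p).sum(dim=1).mean()

# Hypothetical teacher/student pair on 10-class inputs of dimension 32
teacher = torch.nn.Linear(32, 10)
student = torch.nn.Linear(32, 10)
x = torch.randn(16, 32)
loss = distillation_loss(student(x), distillation_targets(teacher, x))
loss.backward()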
Tramer et al. [603] introduce ensemble adversarial training to augment the
training data with adversarial perturbations transferred from other pre-trained
models. Including blackbox attacks in such adversarial perturbations significantly
improves the transferability of the adversarial examples. Such a defense mechanism
is useful in the costlier multistep attacks. Fast gradient sign method (FGSM) and
its variants such as single-step least-likely class method (Step-LL) and iterative
attack (I-FGSM or Iter-LL) are used to create adversarial examples. Both white-
box and blackbox adversaries are used to evaluate the robustness gains in defense
strategies. Thus adversarial training is improved by decoupling adversarial example generation from the model training. At the same time, interactive adversaries
are also proposed to include queries on the target model’s prediction function in
their attack. Wu et al. [655] propose a highly confident near neighbor framework to
combine prediction confidence information and nearest neighbor search to reinforce
adversarial robustness. Meng et al. [422] propose the MagNet framework as a defense mechanism. It includes separate detector networks and a reformer network
to detect the adversarial examples. The detector networks are autoencoders to learn
the data manifold of normal examples without assuming any particular stochastic
process for generating them. They are trained according to a reconstruction loss
criterion that approximates the distance between input and manifold of normal
examples. The reformer network is another autoencoder that moves adversarial
examples toward the data manifold of normal examples to correctly classify them.
Based on cryptography ideas, defense via diversity is advocated to randomly pick
one out of several defenses at run time in a gray-box attack. Carlini et al. [105] are
then able to construct transferable adversarial examples for MagNet and adversarial
perturbation elimination GAN (APE-GAN). Based on distance metrics, Carlini et
al. [106] also succeed at constructing adversarial examples for defensively distilled
networks.
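To make the detector idea concrete, the sketch below scores inputs by their reconstruction error under an autoencoder assumed to have been trained only on normal examples, in the spirit of the MagNet detector networks [422]; the architecture, image size, and detection threshold are illustrative assumptions rather than the published configuration.

import torch
import torch.nn as nn

# Hypothetical detector autoencoder; MagNet additionally uses a reformer
# autoencoder to move inputs back toward the data manifold before classification.
autoencoder = nn.Sequential(
    nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(), nn.Unflatten(1, (1, 28, 28)))

def reconstruction_error(x):
    # Distance between an input and its projection onto the learned manifold.
    with torch.no_grad():
        return torch.mean((autoencoder(x) - x) ** 2, dim=(1, 2, 3))

def detect_adversarial(x, threshold=0.05):
    # Flag inputs whose reconstruction error exceeds a validation-set threshold.
    return reconstruction_error(x) > threshold

x = torch.rand(4, 1, 28, 28)
print(detect_adversarial(x))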
Metzen et al. [424] augment deep neural networks with a detector network.
It does binary classification between genuine data and adversarial data to detect
a specific adversary type. It can act as a method for hardening detectors against
dynamic adversaries. Cisse et al. [131] introduce Parseval networks for empirical
and theoretical analysis of the robustness of predictions made by deep nets subject to
adversarial perturbations. They act as a regularization method with orthonormality
constraints for reducing the effect of adversarial manipulations. Similarly, Gu et
al. [239] propose a Deep Contractive Network (DCN) acting as a smoothness
penalty on adversarial training. DCN is an extension of contractive autoencoder
(CAE) that has the ability to remove adversarial noise. Thus ideas from
denoising autoencoder (DAE), contractive autoencoder (CAE), and marginalized
denoising autoencoder (mDAE) provide a strong framework for adversarially
training deep neural networks with a robustness criterion tuned toward human
perception. By contrast, Kos et al. [326] create adversarial examples in the latent
space for deep generative models such as variational autoencoders (VAEs) and
generative adversarial networks (GANs). Xiao et al. [660] propose AdvGAN to
generate adversarial examples with conditional adversarial networks in semi-white-
box and blackbox attack scenarios. Jin et al. [303] propose APE-GAN to defend
against adversarial examples in white-box attack scenarios. The generator alters
adversarial perturbations with tiny changes to input examples. The discriminator
is optimized to separate clean examples and reconstructed examples without
adversarial perturbations. A loss function is designed to make adversarial examples consistent with the data manifold of the original images. APE-GAN can be combined with
other defense mechanisms such as adversarial retraining.
Assuming the hypothesis that adversarial examples lie in the low probability
regions of the training distribution, Song et al. [565] design PixelDefend to move
maliciously perturbed images back to the distribution seen in the training data. A
generative model computes probabilities of all training images. Such a probability
density is used to rank the adversarial examples created by a variety of attacking
methods. A constrained optimization problem that is intractable is formulated
to purify the adversarial examples. It is approximated with a greedy decoding
procedure. Results are compared with other defense mechanisms in the literature
such as adversarial training, label smoothing, and feature squeezing. Bojanowski
et al. [77] introduce a Generative Latent Optimization (GLO) framework to train
generators using reconstruction losses. It is useful in interpolating between training
samples and adversarial examples. It also permits linear arithmetic between noise
vectors in the latent space to study interpolations of adversarial examples without
the need for an adversarial game between the generator and the discriminator. The
generator then translates the linear interpolations in the noise space into semantic
interpolations in the image space. The learnable noise space is able to disentangle
the non-linear factors of variation of image space into linear statistics. Kyatham
et al. [341] incorporate adversarial perturbations into a regularized and quantized
generative latent space to then map it to the true data manifold. A defense mech-
anism based on generative autoencoders then is able to circumvent disadvantages
of related defense mechanisms such as approximation of derivatives in adversarial
Jha et al. [297] create satisfiability modulo theories (SMT) solvers with a
combination of oracle-guided learning from examples and constraint-based synthe-
sis from components in a machine learning library. Such an automatic synthesis
of programs for program deobfuscation is useful in the formal verification of
adversarial learning. A validation oracle checks whether the machine learning
program is correct or not based on adversarial learning security requirements. It
has connections to optimization procedures in computational learning theory and
bit-manipulating programs. In a non-stationary adversarial environment, Lowe et
al. [395] explore deep reinforcement learning methods for multi-agent domains.
Actor-critic methods can be adapted for adversarial learning to learn policies
around game theoretical interactions in adversarial deep learning over complex
multi-agent coordination. They solve for robust multi-agent policies in cooperative
and competitive attack scenarios from emergent behavior and complexity in co-
evolving agents. Thus reinforcement learning is applicable to adversarial learning
environments with multiple adversaries.
Li et al. [361] create robust malware detectors for adversarial examples on
Android malware. A combination of a variational autoencoder (VAE) and a multi-
layer perceptron (MLP) is used to design a novel loss function that disentangles
the features of different malware classes. The feature space of Android malware is
represented in a discrete fashion. The proposed defense mechanism computes a sim-
ilarity metric between benign and malicious examples while preserving malicious
functionality. The final classification model simultaneously does malware detec-
tion and adversarial example defense. Hassan et al. [256] address trust-boundary
protection to allow user access privilege in Industrial Internet of Things (IIoT) envi-
ronments. Adversaries can use model skewing techniques to generate adversarial
examples on the attack surface in the IIoT network. A downsampler-encoder-
based cooperative data generator is used to create the adversarial examples in IIoT
devices. Such IIoT devices include IoT devices such as sensors, programmable logic
controllers, actuators, intelligent electronic devices, and cyber-physical systems
(CPS) in industrial operations. CPS include subsystems and processes for design,
infrastructure, monitoring and control, scheduling, and maintaining the value chain
of data analytics for precise control of physical processes, autonomous management
of industrial system collaboration, less expensive production data collection, and
intelligent processing in real time. The vulnerabilities and threats of such industrial
protocols, networks, systems, and services are open to exploitation by adversaries.
They are further exacerbated by existing security loopholes in conventional IT
systems. Here defense mechanisms in adversarial deep learning are used to uphold
security objectives of IIoT data such as confidentiality, integrity, and availability.
Further applications of adversarial deep learning are given by Abusnaina et al. [1]
and Martins et al. [414]. Abusnaina et al. [1] analyze IoT malware detection with
control flow graph (CFG)-based features. A graph embedding and augmentation
method is used to generate and embed adversarial examples into training data of
IoT software. CFG features allow the exploration of IoT malware through graph
theory and machine learning. Martins et al. [414] analyze the generation and
detection of adversarial examples in intrusion and malware detection scenarios.
Tan et al. [592] discuss attention maps in computer vision tasks. A geometric prior
on the spatial context for a pixel is modelled as a novel self-attention module.
It does not require the computationally expensive positional encoding of content-
driven attention maps constructed with queries and keys. The self-attention training
concept is applicable to not only computer vision tasks but also natural language
processing tasks. In image recognition tasks, it is categorized into channel attention
and spatial attention. Sen et al. [547] conduct a quantitative assessment of human
versus computational attention mechanisms in text classification tasks. They are
contrasted on overlap in word selections, distribution over lexical categories, and
context-dependency of sentiment polarity. The attention mechanisms are useful
for interpretability about the modelling details such as model debugging, archi-
tecture selection in natural language processing (NLP) tasks such as language
modelling, machine translation, document classification, and question answering.
They create explainable attention scores for the model predictions that can be
linked with the feature importance measures on dimensionality reduction. Human
attention is measured from the perspectives of measures for behavioral similarity,
lexical (grammatical) similarity, and context-dependency of sentiment polarity. Resul-
tant attention maps are compared with attention-based recurrent neural networks
(RNNs). Bidirectional RNNs with attention mechanisms are found to be similar
to human attention according to the human attention measures. An attention
map is defined as a vector over the sequence of words associated with positions in
the text. Neural networks can produce the attention maps by computing either
probability distributions or bitwise operations on the word sequences. The NLP
prediction tasks become more difficult on long text as the accuracy and similarity
scores of the models decrease. Lin et al. [370] create RankGAN to generate
natural language descriptions of human-written and machine-written sentences. The
discriminator does a relative ranking of the text to help create a better generator.
Such relative ranking information can benefit from rank aggregation methods used
in the distributional smoothing of adversarial learning features.
The attention maps in deep learning can be contrasted with the feature maps
as described by Thaller et al. [596] for analyzing the design patterns in recurrent neural networks.
Explainability methods summarize the learnings of a model and explain its individual predictions to advance machine
learning models beyond neural networks. Such methods for explainable artificial
intelligence are required in machine learning systems for the verification of the
system’s decision-making, improvement of the system architectures, knowledge
transfer of the learning of the system to a human user, and compliance of algo-
rithmic decisions to privacy regulation. Thus the relation between generalizability,
compactness, and explainability of the learned representations in adversarial deep
learning is an active area of research. Ancona et al. [15] use Shapley values from
cooperative game theory to assign relevance scores in attribution methods. They
quantify the “relevance” or “contribution” of each input feature in a given input
sample. The target output in a classification task is chosen to be the prediction
with the highest output probability that is associated with the parts of the input
most relevant for the prediction. The relevance scores also contain information
to assess the input for evidence that supports or rejects a predicted class label.
The attribution methods can also be subject to adversarial attacks without reliable
quantitative metrics based on ground truth to evaluate the explanations. Here
Shapley values act as a self-evident property of the explanations designed for
stronger theoretical guarantees on their reliability. Shapley values can be assigned
to attributes such that certain desirable axioms are satisfied on the completeness,
symmetry, linearity, continuity, and implementation invariance of the attribution
methods. Choras et al. [129] discuss the lack of fairness and explainability in
the state-of-the-art algorithms for machine learning and artificial intelligence in
several application domains that use deep learning capabilities to solve detection
or prediction tasks. Here the security frameworks in adversarial machine learning
can introduce disinformation to mislead the deep learning results. Fairness in
artificial intelligence is then concerned with ethical and legal frameworks around
the disinformation that can be maliciously spread in the society at large. The
algorithmic bias resulting from the bias of human operators providing data with
misrepresentations and discriminations leads to unfairness in artificial intelligence.
Here there is a need to create training datasets without skewed samples, tainted
examples, and limited features leading to sensitive biased attributes in the training
algorithm and subsequent sample size disparity in the classification algorithm. So
machine learning fairness is to be defined according to notions of unawareness,
group fairness, and counterfactual fairness in the mathematical formulations of
adversarial deep learning. Here the counterfactuals due to adversarial machine
learning can be modelled as causal graphs explaining the predictions of supervised
deep learning. In this context, game theoretical adversarial deep learning procedures
provide a statistical framework for optimizing the tradeoffs between accuracy and
fairness measures on the machine learning system performance. They can construct
fair classifiers with reference to a sequence of cost-sensitive classification problems
providing randomized classifiers with the lowest empirical error within the desired
optimization constraints. Arrieta et al. [17] survey the literature on explainable AI
(XAI) and provide a taxonomy of recent contributions in deep learning. This leads to
the broader concept of responsible artificial intelligence around methodologies for
the large-scale implementation of artificial intelligence in real-world organizations
with fairness, explainability, and accountability built into the artificial intelligence
for every regulated industry in each economic activity sector. Interpretability
as a design driver in machine learning supports impartiality in decision-making,
facilitates the provision of learnable robustness, and acts as an insurance on the
underlying truthful causality existing in the model reasoning. Samek et al. [537]
provide another review of XAI in deep neural networks.
Ribeiro et al. [515] learn a local interpretable model around a classifier’s
predictions. It explains individual predictions as the solutions to a submodular
optimization problem. The utility of such explanations is validated in the experi-
ments assessing trust in machine learning blackboxes by understanding the reasons
behind the predictions. A local interpretable model-agnostic explanations (LIME) framework is presented for the problems of “trusting a prediction” enough to take action based on it and “trusting the model” to behave in reasonable ways when deployed
in the real world. Interpretable representations for textual and visual artifacts are
produced as explanation tensors for each input instance such that domain- and
task-specific interpretability criteria are accommodated. LIME has applications in
recommendation systems for speech, video, and medical domains to design human-
in-the-loop machine learning systems. Hartl et al. [254] introduce a feature sensitivity measure called the adversarial robustness score (ARS) for sequential network flow data
in intrusion detection systems (IDS). It is useful as a feature importance measure
used in the generation of adversarial samples for recurrent neural networks (RNNs).
ARS can be used alongside accuracy to evaluate security-sensitive machine learning
systems. It improves upon explainability methods such as partial dependence plots
(PDPs) for sequential data. Proposed defense mechanisms use ARS to leave out
the manipulable features, reduce the attack surface, and harden the resulting IDS.
Melis et al. [421] evaluate trust in Android malware detectors as they transition from
performing well on benchmark data to being deployed in an operating environment.
A gradient-based approach identifies the most influential local features to increase
accuracy without losing interpretability of decisions. The explanations can provide
insights into vulnerabilities of any blackbox machine-learning model used for
malware detection. Demetrio et al. [153] provide feature attribution to each decision
made for the classification of malware binaries. The explanations are then used to
generate adversarial malware binaries that are better than the state-of-the-art attack
algorithms against deep learning algorithms that provide highly non-linear decision
functions. Contributions of each feature to the label of a data point are calculated
with respect to baselines that create ground truth for the modelling. Adversarial per-
turbations then increase the contributions computed for the modelling output on the
modified features in the baselines. Sensitivity axioms are created for the baselines to
guide the training algorithm of back-propagating errors through the neural network.
Integrated gradients are used to explain the classification results. However, the
explainable models continue to be vulnerable to adversarial manipulations. Marino
et al. [411] generate explanations for misclassifications in data-driven intrusion
detection systems. The explanations provide reasonings behind the misclassification
and match them with expert knowledge. The explanations are applicable to any
classifier that has gradients. They can be used in digital forensics and vulnerability
assessment of the underlying machine learning system. Such XAI makes use of data
visualizations and natural language descriptions to explain the reasoning for the
decisions made by the machine learning systems. The reasoning can be understood
by a human and be used for simplifying the knowledge discovery process in data.
It can also be used to produce debugging diagnostics on the machine learning
system. Explanations are assumed to be the minimum modifications required to
correctly classify the misclassified samples. The adversarial manipulations are
used to visualize the learning features that are responsible for misclassification.
Misclassifications are frequently found to occur between samples with conflicting
data characteristics.
Liu et al. [382] investigate model interpretation to support an adversarial
detection framework explaining predictions in the target machine learning model.
Adversarial training is then used to improve the robustness of the detectors on
adversarial samples. Feature manipulation costs are estimated to categorize adver-
sary types. Existing detection frameworks are categorized into feature engineering
methods that are vulnerable to adaptive adversaries, game theoretical interactions
between detectors and adversaries where the modelling specifics vary with the
machine learning classifier architectures, and deep neural network defenses such as
adversarial training, defensive distillation, and feature squeezing. Misclassified data
points are seeded from evasion-prone samples likely to shift across the decision
boundary. Adversarial attacks are constructed on perturbation directions based on
model interpretation of input data instances classified as benign or malicious.
Lundberg et al. [399] propose a game theoretical framework called SHAP (SHapley
Additive exPlanations) to study the tradeoffs between accuracy and interpretability
in deep learning outputs. SHAP computes additive feature importance measure for
each prediction as Shapley regression values. Sampling approximation is made
in the computation of the Shapley values. Shapley value estimation methods are
augmented with feature attribution methods satisfying desirable properties on the
explanations such as local accuracy matching explanation model with the original
model, missingness to disallow missing features to have any impact, and consistency
on input’s attribution with respect to changes in the model state due to other
inputs. Then cooperative game theory is used to mathematically prove that the explanations do not violate accuracy and interpretability requirements, where Shapley values act as the
conditional expectation functions of feature importance in the original model. The
conditional expectation functions are approximated with model-specific methods
such as Shapley sampling values, Kernel SHAP, Max SHAP, and Deep SHAP.
Model-agnostic approximations are obtained from a quantitative input influence
method that is a sampling approximation of a permutation version of the classic
Shapley value equations. Joint estimation of SHAP values with regression provides
better sample complexity/efficiency than direct use of classical Shapley equations.
Thus game theoretical explanations of the adversarial deep learning provide avenues
to create new explanation model classes.
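The permutation-sampling estimate of Shapley values that underlies such approximations can be sketched as follows; value_fn stands for a hypothetical set function (e.g., the expected model output given a subset of known features), and the toy additive game at the end only checks that the estimate recovers the known answer.

import numpy as np

def shapley_values(value_fn, n_features, n_permutations=200, rng=None):
    # Monte Carlo Shapley estimate: average each feature's marginal contribution
    # over random orderings of the features.
    rng = rng or np.random.default_rng()
    phi = np.zeros(n_features)
    for _ in range(n_permutations):
        order = rng.permutation(n_features)
        included, prev = [], value_fn(frozenset())
        for j in order:
            included.append(j)
            cur = value_fn(frozenset(included))
            phi[j] += cur - prev
            prev = cur
    return phi / n_permutations

# Toy cooperative game: the value of a coalition is the sum of its feature weights,
# so the Shapley value of feature j should recover weights[j].
weights = np.array([3.0, 0.0, -1.0])
value = lambda S: float(sum(weights[list(S)]))
print(shapley_values(value, n_features=3))     # approximately [3, 0, -1]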
Beyazit et al. [49] propose interpretable representations learned by a deep gener-
ative model by extracting independent marginals as well as causality entanglement
features in the training data. A training regularizer then penalizes disagreement between these learned representations.
Several software tools assess the exposure of machine learning systems to cyberattacks. Such tools have been developed as frameworks and libraries like
CleverHans adversarial examples library, Adversarial Robustness Toolbox, Foolbox
toolbox, SecML library, and MLsploit. Currently they implement algorithms to
generate and discriminate adversarial examples. But they ought to be extended to
quantify the risk of machine learning systems, conduct cybersecurity threat modelling
in specific deployments and target environments for machine learning algorithms,
and quantify different attributes of the attack technique such as attacker model,
attack impact, and attack performance. Here the characteristics of the machine
learning production pipeline can be expressed as elements of data, deployment,
delivery, and orchestration to create threat analysis ontology for asset, vulnerability,
attacker, capability, impact, threat, and attack technique. The MulVAL framework
analyzes logical attack graphs to automatically extract information from formal
vulnerability databases and network scanning tools. It enumerates all possible attack
paths in a polynomial time on new and emerging attack scenarios. MulVAL is useful
in the design of risk assessment and countermeasure planning algorithms validated
with explicit interaction rules and predicates for attack modelling in the Datalog
programming language. Elitzur et al. [171] analyze cyber threat intelligence (CTI) on previous attacks to reconstruct unobserved attack patterns; such reconstructions can augment alert correlations and data visualizations for cybersecurity analysts studying attack hypotheses in the digital forensics of adversarial machine learning with cyber kill chains. An Attack Hypothesis Generator (AHG) constructs
a knowledge graph on the threat intelligence to generate attack hypotheses in a
security information and event management (SIEM) system. Here CTI is categorized into
strategic threat intelligence, operational threat intelligence, tactical threat intelli-
gence, and technical threat intelligence. They are used to construct a knowledge graph
to support adversarial reasoning with graph mining features on topological similar-
ity, correlation, and frequent patterns. Executing Semantic Web Rule Languages
on the knowledge graphs can be used in the data-driven analytics of logic-based
deductive inference rules. Link prediction and collaborative filtering in knowledge
graphs can improve the attack hypothesis generation with attack scenarios that are
likely to occur. Matern et al. [416] create visual artifacts that can be used in statistical
forensics tools to expose adversarial manipulations in Deepfakes. The adversarial
goal in automatic video generation is to create a malicious manipulation to convey
a semantic message within a video that is not originally intended in the training
material. Here image forensics searches for physical or statistical image artifacts to form
statistical fingerprints, validate noise priors, and learn specific manipulation traces
on the residuals of an image to detect adversarial manipulations. The proposed
visual artifacts are categorized into computer vision problems such as global
consistency, illumination estimation, and geometry estimation. Kamath et al. [307]
address a theoretical question in statistical learning on how a distribution can be
approximated with its samples. Smooth loss measures are proposed for distribution
approximations. For compression and investment applications, the relevant loss
is the Kullback-Leibler (KL) divergence. For classification they are the L1, L2, Hellinger, chi-squared, and softmax losses. For adversarial learning, the least worst-case loss for a game theoretical optimal estimator in adversarial deep learning is the minmax cumulative loss.
During the past decades, deep neural networks (DNNs) have shown great success
in a wide range of applications, including image classification in the computer
vision (CV) domain [263, 282, 558] and text recognition in the natural language
processing (NLP) field [157, 276]. However, recent research has shown that
DNNs are immensely brittle toward adversarial examples primarily in the image
domain [228, 589]. For example, Goodfellow et al. [228] demonstrated that adding nearly imperceptible noise to a panda image can mislead GoogLeNet into an incorrect label (gibbon) with high confidence (99.3%). This phenomenon raises great concern
about the security of DNN deployments and has attracted much attention in the CV community since 2014. In the literature, numerous approaches have been proposed
to generate adversarial examples to attack DNNs (aka, the attack branch) and design
corresponding mechanisms to defend against these potential attacks (aka, the defense
branch). In this chapter, we focus on the adversarial attack direction to craft high-
quality adversarial examples in both the CV domain and the NLP domain.
DNNs are highly unstable toward adversarial attacks, and this vulnerability has attracted great interest in the CV community.
Based on [589], a number of approaches for image adversarial attacks have
been proposed, such as gradient-based attack [161, 228, 338, 441, 658], score-
based attack [260, 292, 452], decision-based attack [85, 113, 114, 360, 543], and
transformation-based attack [115, 175, 630, 662]. Most of these attack strategies
compute perturbations for each single image by using existing datasets. Compared
with single image attack, crafting universal perturbation for a group of images
belonging to the same class is more challenging [440]. Additionally, most existing attack mechanisms are evaluated on public datasets rather than the physical
world environment, where the latter setting is more complex. In this chapter, we
introduce a novel image-agnostic attack module to generate natural perturbations
for traffic sign attack. This attack module can generate a universal perturbation for
a group of road signs, which is feasible for real-world implementation. Empirical
results on both public datasets and physical world pictures demonstrate that the
method outperforms baselines in terms of attack success rate and perturbation cost.
By using the soft attention module, it generates more natural perturbations, which
look like tree shadows to human drivers.
In this section, we review four kinds of adversarial attack methods on images,
i.e., gradient-based attack (6.1.1), score-based attack (6.1.2), decision-based
attack (6.1.3), and transformation-based attack (6.1.4).
Gradient-based attack [161, 228, 338, 441, 658] seeks the most sensitive perturbing direction for an input according to the gradient of the loss function. Goodfellow
et al. [228] proposed the well-known fast gradient sign method (FGSM), which
determines the perturbation direction (increase or decrease) for each pixel by
leveraging the gradient of the loss function. They argued that the vulnerability of neural networks is due to their linear nature rather than non-linearity or overfitting. To
achieve efficiency, FGSM is designed to compute perturbations via a single gradient
step. Although this procedure accelerates the adversarial training, it often fails to
find the minimal perturbation and results in high perturbation cost [104].
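A minimal sketch of the single-step FGSM update described above is shown below; the classifier, the epsilon budget, and the [0, 1] pixel range are assumptions made for illustration.

import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8.0 / 255):
    # One gradient step in the sign direction of the loss gradient w.r.t. the pixels.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()   # move to increase the loss
        x_adv = x_adv.clamp(0.0, 1.0)                 # stay in the valid pixel range
    return x_adv.detach()

# Hypothetical classifier and a batch of images in [0, 1]
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)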
Kurakin et al. [338] refined the FGSM by repeating the gradient step many
times with a smaller step size in each iteration. This iterative FGSM (I-FGSM)
misleads the classifier at a higher rate with relatively smaller perturbations. Kurakin et al. also showed that the proposed I-FGSM can mislead the target classifier even in physical world systems. Specifically, they printed the generated adversarial examples on paper and took photos of them with a cell phone camera. The reported results show that a large fraction of these photos are incorrectly classified by an
ImageNet Inception classifier [588]. The DeepFool method [441] further reduces the perturbation strength by iteratively searching for the distance from a clean input to its closest classification hyperplane. However, the greedy optimization strategies in both I-FGSM and DeepFool can easily lead to a local optimum.
Dong et al. [161] designed the momentum I-FGSM (MI-FGSM), which employs
a velocity vector to memorize all the previous gradients during iterations to escape
from poor local maxima. Besides the white-box attack, Dong et al. also explored
the blackbox attack by improving the transferability of adversarial examples. To
improve the transferability, they studied momentum iterative methods for attacking
an ensemble of models instead of only one model. The rationale is that if the generated adversarial example can fool all the ensemble models, it is more likely to succeed against an unknown model, since transferability stems from the fact that different machine learning models learn similar decision boundaries around a data point.
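The momentum update can be sketched as follows: L1-normalized gradients are accumulated in a velocity vector before each sign step, and the iterate is projected back into the epsilon-ball around the clean input; the step size and decay factor mu are illustrative defaults rather than the exact settings of [161].

import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, epsilon=8.0 / 255, steps=10, mu=1.0):
    alpha = epsilon / steps
    x_adv, g = x.clone().detach(), torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Accumulate the L1-normalized gradient in the velocity vector
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = (x_adv + alpha * g.sign()).detach()
        # Project back into the epsilon-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv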
Recently, Xiang et al. [658] embedded the FGSM into the gray-box attack
scheme, where the victim network structure is inaccessible but can be derived by the
side-channel attack (SCA). Specifically, the SCA is a technique that derives internal
knowledge via hardware side-channel information, such as time/power consumption
and electromagnetic radiation. Although SCA cannot exactly reveal the parameter
weights or loss function, it can derive the basic network structure. Therefore, the gray-box setting is more practical than a white-box attack, as the network structure is usually unknown in practice, yet stronger than the blackbox setting where no information is available.
Score-based attack [260, 292, 452] relies on the output scores (e.g., predicted
probability) instead of the gradient information in constructing adversarial pertur-
bations, without access to either the model architecture or the model weights. Narodytska
et al. [452] applied the confidence score to guide a greedy local search method,
which finds a few pixels (even single pixel) that are most helpful in generating
perturbations. It adopts the “top-k misclassification” criterion, which means the search procedure stops once it pushes the correct label out of the top-k scores.
One shortcoming of the single pixel attack is that the perturbed pixel value may fall outside the expected range.
Hayes and Danezis [260] trained an attacker neural network to learn pertur-
bations, which is then used to attack another blackbox target network. The attacker
model is trained to minimize the difference between the original input image and
the output adversary image, where the output image can mislead the target model.
To achieve this, they defined the loss function by combining the output confidence
scores of both networks, i.e., the reconstruction loss and misclassification loss.
The reconstruction loss measures the distance between the input and output of the
attacker model to ensure the adversarial output looks similar to the clean input.
The misclassification loss is defined according to the type of attack (targeted or
untargeted) to achieve a high attack success rate.
Ilyas et al. [292] considered three more realistic scenarios than typical blackbox
settings, including (1) the query-limited setting, (2) partial information setting, and
(3) the label-only setting. Specifically, the query-limited setting means the attacker has a limited number of queries to the classifier, the partial information setting indicates that the adversary only knows the top-k probabilities, and the label-only setting denotes that the attacker only has access to the top-k labels but does not know their probabilities. For the query-limited setting, the authors employed the natural
evolution strategy (NES) to estimate the gradient and generate adversarial examples.
To solve the partial information setting, the authors started from an instance of the target class instead of the original input, so the target class appears in the top-k prediction results. For the label-only setting, they further defined a Monte Carlo
approximation to estimate the proxy score of softmax probability.
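The NES gradient estimate used in the query-limited setting can be sketched as follows with antithetic sampling; score_fn stands for a hypothetical blackbox query returning, e.g., the probability of the target class, and the toy quadratic score at the end only checks that the estimate points in the right direction.

import numpy as np

def nes_gradient(score_fn, x, sigma=0.01, n_samples=50, rng=None):
    # Estimate the gradient of a blackbox score from function evaluations only,
    # averaging over antithetic pairs of Gaussian search directions.
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(x)
    for _ in range(n_samples // 2):
        u = rng.standard_normal(x.shape)
        grad += score_fn(x + sigma * u) * u
        grad -= score_fn(x - sigma * u) * u
    return grad / (n_samples * sigma)

# Toy blackbox score with known gradient 2 * (1 - x)
score = lambda z: float(-np.sum((z - 1.0) ** 2))
x = np.zeros(16)
print(nes_gradient(score, x)[:4])              # each entry is roughly 2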
Based on [292], Zhao et al. [704] proposed a zeroth-order natural gradient
descent (ZO-NGD) algorithm to perform adversarial attacks. Specifically, it incorporates the Fisher information matrix (FIM) of the probabilistic model into the estimated gradient and applies second-order natural gradient descent (NGD) to achieve high query efficiency.
Decision-based attack [85, 113, 114, 360, 543] requires only the model classification
decision (i.e., the top-1 class label) and removes the need for either the model gradient or the output scores. One typical work is the boundary attack [85], which starts with
an adversarial point, i.e., an image selected from the target class. Then it reduces the
noise by implementing the random walk along the decision boundary while staying
adversarial. This method adds minimal perturbations (in terms of the L2-distance) compared with gradient-based methods and requires almost no hyperparameter tuning. However, it needs many more iterations to deliver the final adversarial example due to its slow convergence.
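A stripped-down version of this random walk is sketched below; the adaptive step-size scheduling of the original boundary attack [85] is omitted, and is_adversarial is an assumed oracle that queries only the classifier's top-1 decision.

import numpy as np

def boundary_attack(is_adversarial, x_clean, x_adv_init, steps=1000,
                    delta=0.1, epsilon=0.1, rng=None):
    # Random walk along the decision boundary: an orthogonal step of relative
    # size delta followed by a contraction of relative size epsilon toward the
    # clean image, keeping only proposals that remain adversarial.
    rng = rng or np.random.default_rng()
    x_adv = x_adv_init.copy()
    for _ in range(steps):
        direction = x_clean - x_adv
        dist = np.linalg.norm(direction)
        noise = rng.standard_normal(x_adv.shape)
        noise -= np.dot(noise.ravel(), direction.ravel()) / dist**2 * direction
        candidate = x_adv + delta * dist * noise / np.linalg.norm(noise)
        candidate = x_clean + (1 - epsilon) * (candidate - x_clean)
        if is_adversarial(candidate):
            x_adv = candidate
    return x_adv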
Different from the boundary attack, which minimizes perturbations in terms of the L2-norm, Schott et al. [543] proposed a novel decision-based attack, the pointwise attack, which reduces noise by minimizing the L0-norm. It first initializes the start-
ing point with salt-pepper noise or Gaussian noise until the image is misclassified.
Then it repeatedly resets each perturbed pixel to its clean value while making sure the noisy image remains adversarial. This procedure continues until no pixel can be reset anymore.
Chen and Jordan [113] extended the boundary attack [85] and proposed an unbiased estimation of the gradient direction at the decision boundary using binary search. They analyze the estimation error when the sample is not exactly lying at the boundary and named their method Boundary Attack++. Compared with the boundary attack, Boundary Attack++ not only reduces the number of model queries but is also able to switch between the L2 and L∞ distances by designing two clip operators.
In [114], Chen et al. employed the binary information of the decision boundary to
estimate the gradient direction and presented the decision-based HopSkipJumpAt-
tack (HSJA). This method is designed for both targeted and untargeted attacks by minimizing the L2 or L∞ distance. Specifically, HSJA is an iterative algorithm,
where each iteration contains three steps: the gradient direction estimation, the
geometrical step-size search, and a binary method for boundary search. This method
achieved competitive results by attacking popular defense mechanisms, while its
query efficiency needs improvement.
Li et al. [360] pointed out that the large number of query iterations for boundary-
based attack is due to the high dimensional input (e.g., image). Thereby, three
subspace optimization methods (i.e., spatial subspace, frequency subspace, and principal component subspace) are explored in their Query-Efficient Boundary-
Based Blackbox Attack (QEBA) for perturbation sampling. In particular, the spatial
subspace leverages linear interpolation to reduce the image into a low-dimensional
space. The second frequency subspace is obtained by discrete cosine transformation
(DCT), while the third one selects major components with principal component analysis (PCA).
Finally, transformation-based attack [115, 175, 630, 662] crafts adversarial images
by shifting pixels’ spatial location instead of directly modifying their value. For
example, Xiao et al. [662] proposed the spatially transformed adversarial (stAdv)
method, which measures the magnitude of perturbations via local geometry distor-
tion instead of the Lp -norm. The reason is that the spatial transformation on an
image often leads to a large Lp loss, but such perturbations are visually imperceptible to humans and hard to defend against. For each pixel, its spatial location can be moved
to four-pixel neighbors, i.e., top-left, top-right, bottom-left, and bottom-right. The
stAdv constructs an objective function to minimize the local distortion and solves this minimization problem with the L-BFGS optimizer [379].
Engstrom et al. [175] also found that simply rotating or translating a natural
image is enough to fool a deep vision model. To simultaneously perform the
translation and rotation, the authors define three parameters: two for translation and one angle parameter to control the rotation. Then they designed
three distinct ways to optimize these parameters, including the first-order method,
the grid search, and the worst-of-k selection. The first-order method needs full
knowledge of the classifier to compute the gradient of loss function, while the
second and third strategies can perform under blackbox settings.
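A blackbox grid search over rotations and translations in this spirit can be sketched as follows; predict_label is an assumed query function for the victim model, the image is assumed to be an H x W x C array, and the angle and offset ranges are arbitrary illustrative choices.

import numpy as np
from scipy.ndimage import rotate, shift

def spatial_grid_attack(predict_label, image, true_label,
                        angles=range(-30, 31, 5), offsets=range(-3, 4)):
    # Return the first rotated/translated image that changes the predicted label.
    for angle in angles:
        rotated = rotate(image, angle, axes=(0, 1), reshape=False, order=1)
        for dx in offsets:
            for dy in offsets:
                candidate = shift(rotated, (dy, dx, 0), order=1)
                if predict_label(candidate) != true_label:
                    return candidate, (angle, dx, dy)
    return None, None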
Wang et al. [630] investigated the effect of spatial image transformations on the image-to-image (Im2Im) translation task, which is more sophisticated than a pure classification problem. They revealed that geometrical image transformations (i.e., translation, rotation, and scaling) in the input domain can cause incorrect color maps in the target domain of the Im2Im framework. Different from the previous works that depend only on spatial transformation, Chen et al. [115] integrated linear spatial transformation (i.e., affine transformation) with color transformation and proposed a two-phase combination attack. In addition to the affine transformation, the authors defined the color transformation as a change of illumination, because such adjustments do not change the semantic information of an image. Besides, since the Lp-norm is inappropriate for measuring adversarial quality under transformation attacks, the authors employed the structural similarity index (SSIM) [638] to measure perceptual quality. These adversary models can potentially be applied to protect social users' interactions for influence learning [95, 362].
Based on attacker’s knowledge, these methods can be divided into white-box
attack, blackbox attack and gray-box attack. Specifically, white-box attack assumes
attackers know everything about the victim model (e.g., architecture, parameters,
training method, and data), blackbox attack assumes the adversary only knows the
output of the model (prediction label or probability) given an input, and gray-box
attack means the scenario where the hacker knows part of information (e.g., the
network structure) and the rest (e.g., parameters) is missing. Based on attacker’s
specificity, these methods fall into targeted attack where the model outputs a user-
Table 6.1 Summary of the properties for different attacking methods. The properties are Targeted
attack, Untargeted attack, White-box attack, Blackbox attack and Gray-box attack
Properties
Attacking methods Targeted Untargeted White Black Gray
FGSM [228]
I-FGSM [338]
DeepFool [441]
MI-FGSM [161]
Xiang et al. [658]
Narodytska et al. [452] Top-k misclass
Hayes and Danezis [260]
Ilyas et al. [292]
Zhao et al. [704]
Boundary Attack [85]
Pointwise attack [543]
Boundary Attack ++ [113]
HSJA [114]
QEBA [360]
stAdv [662]
Engstrom et al. [175]
Wang et al. [630]
Chen et al. [115]
Compared with adversarial image attacks, the vulnerability of deep learning models in text recognition is greatly underestimated. There are several difficulties in crafting adversarial text samples. Firstly, the output of a text attack system should satisfy various natural-language properties, such as lexical correctness, syntactic correctness, and semantic similarity. These properties ensure that the human prediction will not change after the adversarial attack. Secondly, the words in text sequences are discrete tokens rather than the continuous pixel values of image space. Therefore, it is infeasible to directly compute the model gradient with respect to every word. A straightforward workaround is to map the sentences into a continuous word embedding space [483], but this cannot ensure that words close in the embedding space appear syntactically coherent to readers [13]. Thirdly, making small perturbations to many pixels may still yield a meaningful image from the viewpoint of human perception, whereas any small change to a text document, even of a single word, can make a sentence meaningless.
The first attempt at text attack can be traced back to 2016, when Papernot et al. [483] investigated the robustness of recurrent neural networks (RNNs) in processing sequential data. In this work [483], Papernot et al. showed that RNNs can be fooled 100% of the time by changing, on average, 9 words of a 71-word movie review in a sentiment analysis task. Since 2016, several lines of work have been proposed to generate adversarial text examples, including character-level attacks [42, 169, 209], word-level attacks [13, 302, 333, 483, 509], and sentence-level attacks [300, 625]. Table 6.2 shows three adversarial examples generated by these different attack strategies. Specifically, character-level attacks generate adversarial texts by deleting, inserting, or swapping characters. However, these character-level modifications lead to misspelled words, which can be easily detected by spell checkers. Sentence-level attacks concatenate an adversarial sentence before or after the original text to confuse deep architecture models, but they usually lead to dramatic semantic changes and generate sentences that are incomprehensible to humans. To address these drawbacks, most recent studies have focused on word-level attacks, which replace original words with other carefully selected ones.
Table 6.2 Three successful adversarial text examples generated by the character-level attack,
sentence-level attack, and word-level attack strategies
Character-level attack modifies input character from “p → B” [169].
Original: Chancellor Gordon Brown has sought to quell speculation over who should run
the Labour Party and turned the attack on the opposition Conservatives.
Adversarial: Chancellor Gordon Brown has sought to quell speculation over who should run
the Labour Party and turned the attack on the oBposition Conservatives.
Sentence-level attack adds one sentence at the end of the input [300].
Original: Peyton Manning became the first quarterback ever to lead two different teams to
multiple Super Bowls. He is also the oldest quarterback ever to play in a Super Bowl at age
39. The past record was held by John Elway, who led the Broncos to victory in Super Bowl
XXXIII at age 38 and is currently Denver’s Executive Vice President of Football Operations
and General Manager.
Adversarial: Peyton Manning became the first quarterback ever to lead two different teams
to multiple Super Bowls. He is also the oldest quarterback ever to play in a Super Bowl at
age 39. The past record was held by John Elway, who led the Broncos to victory in Super
Bowl XXXIII at age 38 and is currently Denver’s Executive Vice President of Football
Operations and General Manager. Quarterback Jeff Dean had jersey number 37 in Champ
Bowl XXXIV.
Word-level attack replaces input word from “f unny → laughable” [509].
Original: Ah man this movie was funny as hell, yet strange. I like how they kept the
shakespearian language in this movie, it just felt ironic because of how idiotic the movie
really was. this movie has got to be one of troma’s best movies. highly recommended for
some senseless fun!
Adversarial: Ah man this movie was laughable as hell, yet strange. I like how they kept the
shakespearian language in this movie, it just felt ironic because of how idiotic the movie
really was. this movie has got to be one of troma’s best movies. highly recommended for
some senseless fun!
However, existing word-substitution strategies are far from perfect in achieving a high attack success rate together with a low substitution rate.
In this section, we review the related text attack methods, including character-
level attack (6.2.1), sentence-level attack (6.2.2), word-level attack (6.2.3), and
multilevel attack (6.2.4).
Firstly, character-level attacks [42, 169, 170, 209, 218] generate adversarial text by deleting, inserting, or swapping characters. Belinkov and Bisk [42] devised four types of synthetic noise: (1) swap two adjacent characters, excluding the first and last letters (e.g., noise → nosie); (2) randomize the order of all letters in a word except the first and last (e.g., noise → nisoe); (3) fully randomize a word, including the first and last characters (e.g., noise → iones); and (4) keyboard typos that randomly replace one letter in each word with an adjacent key (e.g., noise → noide). These strategies can mislead neural machine translation (NMT) models to a large degree. However, they modify every word of the input sentence wherever possible, which leads to a high perturbation loss. For example, the "swap" of two letters is applied to all words of length ≥ 4, since it must not alter the first and last letters.
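The four noise types are easy to reproduce. The sketch below implements them for a single word; the small keyboard-neighbor table is a hypothetical stand-in for a full keyboard layout.

```python
import random

# Toy keyboard-neighbor table; a real implementation would cover the full layout.
KEY_NEIGHBORS = {'s': 'adwxz', 'e': 'wrd', 'i': 'uok', 'o': 'ipl', 'n': 'bmh'}

def swap_adjacent(word):
    """(1) Swap two adjacent inner characters, keeping first and last letters."""
    if len(word) < 4:
        return word
    i = random.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return ''.join(chars)

def shuffle_middle(word):
    """(2) Randomize the inner letters, keeping the first and last."""
    if len(word) < 4:
        return word
    middle = list(word[1:-1])
    random.shuffle(middle)
    return word[0] + ''.join(middle) + word[-1]

def shuffle_all(word):
    """(3) Fully randomize the word, including first and last characters."""
    chars = list(word)
    random.shuffle(chars)
    return ''.join(chars)

def keyboard_typo(word):
    """(4) Replace one letter with an adjacent key, if a neighbor is known."""
    positions = [i for i, ch in enumerate(word) if ch in KEY_NEIGHBORS]
    if not positions:
        return word
    i = random.choice(positions)
    return word[:i] + random.choice(KEY_NEIGHBORS[word[i]]) + word[i + 1:]

print(swap_adjacent("noise"), shuffle_middle("noise"),
      shuffle_all("noise"), keyboard_typo("noise"))
```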
To reduce the distortion, Ebrahimi et al. [169] proposed HotFlip, which represents every character as a one-hot vector and introduces two character operations, i.e., character insertion and character deletion. Specifically, HotFlip estimates the best character change (the atomic flip operation) by computing directional derivatives with respect to the one-hot vector representation. It then employs a beam search to find a sequence of manipulations that work well together to confuse a well-trained classifier. Besides, HotFlip sets an upper bound of 20% character flips per training sample to restrict the manipulations.
To minimize the edit distance and further reduce the distortion, Gao et al. [209] designed the blackbox DeepWordBug, which perturbs only the most important words. Specifically, it evaluates a word importance score by removing words one at a time and comparing the resulting prediction changes. DeepWordBug then modifies words with four character operations: (1) replace a letter in the word with a random letter, (2) delete a random character of the word, (3) insert a random letter into the word, and (4) swap two adjacent letters of the word. The edit distance is defined as the Levenshtein distance, so the edit distance of operations (1), (2), and (3) is 1, while that of (4) is 2.
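A sketch of the leave-one-word-out importance score, the part of DeepWordBug that decides where to apply the character bugs; `predict_proba` is a placeholder for the victim model's probability of the original label.

```python
def word_importance_ranking(words, true_label, predict_proba):
    """Rank words by how much deleting each one lowers the true-label probability.

    words:         list of tokens in the input sentence
    predict_proba: function(list_of_tokens, label) -> probability of that label
    """
    base = predict_proba(words, true_label)
    scores = []
    for i in range(len(words)):
        reduced = words[:i] + words[i + 1:]          # sentence with word i removed
        scores.append(base - predict_proba(reduced, true_label))
    # Highest score first: these are the words to perturb with character bugs.
    return sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
```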
Gil et al. [218] showed that the HotFlip method, originally designed for the white-box setting, can be applied to blackbox attacks via an efficient distillation. This white-to-black procedure contains three steps: firstly, train a source text classification model and a target blackbox model; secondly, craft adversarial examples by attacking the source model with HotFlip under the white-box setting; and, thirdly, train an attacker to generate new adversarial examples against the blackbox target model. The attacker is trained on (input, output) pairs with a carefully designed cross-entropy loss function, where the input denotes the original input word and the output is the modification made in the second step.
Eger et al. [170] proposed the Visual Perturber (VIPER) algorithm to replace characters with visually similar symbols, as commonly seen in Internet slang (e.g., n00b) and toxic comments (e.g., !d10t). The advantages of this visual attack are that it needs no linguistic knowledge beyond the character level and that it causes little damage to human perception and understanding. The visually similar symbol candidates are selected from three character embedding spaces: the image-based character embedding space (ICES), the description-based character embedding space (DCES), and the easy character embedding space (ECES). The ECES achieves the maximal effect on the target model by appending a symbol below or above a character (e.g., c → ĉ), but these perturbations require manual selection.
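A minimal homoglyph-substitution sketch in the spirit of VIPER: each character is replaced with probability p by a visually similar symbol drawn from a small hand-made table. The real ICES/DCES candidate spaces are learned character embeddings and are not reproduced here.

```python
import random

# Tiny illustrative homoglyph table; VIPER selects candidates from learned
# character embedding spaces (ICES/DCES) rather than a fixed dictionary.
HOMOGLYPHS = {'a': ['а', '@', 'ä'], 'e': ['е', 'é', '3'],
              'i': ['і', '1', '!'], 'o': ['о', '0', 'ö'], 's': ['ѕ', '$', '5']}

def viper_like_perturb(text, p=0.3, seed=0):
    """Replace each character with a visually similar symbol with probability p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch.lower() in HOMOGLYPHS and rng.random() < p:
            out.append(rng.choice(HOMOGLYPHS[ch.lower()]))
        else:
            out.append(ch)
    return ''.join(out)

print(viper_like_perturb("adversarial noise is imperceptible"))
```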
However, one common drawback of character-level attacks is that they break the lexical constraint and produce misspelled words, which can be easily detected and removed by a spell-check module installed before the classifier.
Sentence-level attacks [250, 300, 348, 564, 625, 632] concatenate an adversarial sentence before, or more commonly after, the clean input text to confuse deep architecture models. For example, Jia and Liang [300] appended a compatible sentence to the end of a paragraph to fool reading comprehension models (RCMs). The adversarial sentence looks similar to the original question, combining an altered question with a fake answer, and aims to mislead the RCM to a wrong answer location. Nevertheless, this strategy requires substantial human intervention and cannot be fully automated; e.g., it relies on about 50 manually defined rules to ensure the adversarial sentence is in declarative form.
Wallace et al. [625] sought universal adversarial triggers, i.e., input-agnostic sequences that cause a specific target prediction when concatenated to any input from the same dataset. The universal sequence is randomly initialized and iteratively updated to increase the likelihood of the target prediction using the token-replacement gradient as in HotFlip. However, this method fails to guarantee output that is semantically meaningful to human perception and often generates irregular text (e.g., "zoning tapping fiennes").
Recently, Song et al. [564] proposed the Natural Universal Trigger Search (NUTS) to craft fluent triggers that carry semantic meaning. NUTS employs a pre-trained adversarially regularized autoencoder (ARAE) to generate triggers and adopts a gradient-based search to maximize the loss function of the classification system. During optimization, multiple independent noise vectors (256 in their experiments) are first initialized. The optimized candidate triggers are then re-ranked according to both classifier accuracy and naturalness.
Word-level attacks [13, 212, 302, 359, 365, 483, 509, 689] replace original input words with carefully chosen substitutes. The core problems are (1) how to select proper candidate words and (2) how to determine the word-substitution order. Initially, Papernot et al. [483] projected words into a 128-dimensional embedding space and leveraged the Jacobian matrix to evaluate the input-output interaction. However, a small perturbation in the embedding space may lead to totally irrelevant words, since there is no hard guarantee that words close in the embedding space are semantically similar. Therefore, subsequent studies focused on synonym-substitution strategies that search for synonyms in the GloVe embedding space, in existing thesauri (e.g., WordNet and HowNet), or with a BERT masked language model (MLM).
Using GloVe, Alzantot et al. [13] designed a population-based genetic algorithm (GA) to imitate natural selection. The optimization procedure starts from an initial generation produced by a set of distinct word modifications. In every subsequent generation, crossover and mutation are employed for population evolution and candidate optimization. In particular, crossover takes more than one parent solution to produce a child solution, while mutation is designed to increase the diversity of the population members. Jin et al. [302] presented TextFooler, which collects substitution candidates from the GloVe embedding space. Different from the GA, TextFooler determines the word-substitution order by calculating a word importance score (WIS). Specifically, the WIS is defined as the reduction of the true-label probability and the increase of the wrong-label score obtained by deleting each input word in turn. However, the GloVe embedding usually fails to distinguish antonyms from synonyms. For example, the nearest neighbors of expensive in the GloVe space are {pricey, cheaper, costly}, where cheaper is its antonym. Therefore, GloVe-based algorithms have to use a counter-fitting method to post-process the adversary's vectors to enforce the semantic constraint [444].
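A sketch of the candidate-selection step shared by GloVe-based attacks: the nearest neighbors of a word under cosine similarity are taken as substitution candidates, which is exactly where antonyms such as cheaper can slip in unless counter-fitted vectors are used. The embedding dictionary is a placeholder for a real lookup table.

```python
import numpy as np

def nearest_neighbors(word, embeddings, k=8, min_sim=0.5):
    """Return up to k candidate substitutes for `word` by cosine similarity.

    embeddings: dict mapping word -> 1-D numpy vector (e.g., GloVe or
                counter-fitted vectors); assumed to be loaded elsewhere.
    """
    if word not in embeddings:
        return []
    v = embeddings[word]
    v = v / (np.linalg.norm(v) + 1e-12)
    scored = []
    for other, u in embeddings.items():
        if other == word:
            continue
        sim = float(v @ (u / (np.linalg.norm(u) + 1e-12)))
        if sim >= min_sim:
            scored.append((sim, other))
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]
```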
Compared with GloVe, utilizing a well-organized linguistic thesaurus, e.g., the synonym-based WordNet [429] or the sememe-based HowNet [162], is a more straightforward way. Specifically, WordNet [429] is a large lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). HowNet [162] annotates words with their sememes, where a sememe is a minimum unit of semantic meaning in linguistics. Ren et al. [509] sought synonyms for each input word from the WordNet synsets and determined the replacement priority of the input words by calculating the probability-weighted word saliency (PWWS). They then sequentially substitute each word with the best candidate, following the descending PWWS order, until a successful adversarial sample is found. Zang et al. [689] showed that the sememe-based HowNet can provide more substitute words than WordNet and proposed particle swarm optimization (PSO) to determine which group of words should be attacked. In PSO, each sentence is treated as a particle in a search space, and each dimension of the particle corresponds to a word. A successful adversarial example can therefore be found by gradually optimizing the particle's location.
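Collecting WordNet synonym candidates, the first ingredient of PWWS, can be done with NLTK as sketched below; this assumes the WordNet corpus has been downloaded once with nltk.download('wordnet').

```python
from nltk.corpus import wordnet

def wordnet_synonyms(word, pos=None):
    """Collect synonym candidates for `word` from the WordNet synsets."""
    candidates = set()
    for synset in wordnet.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            name = lemma.name().replace('_', ' ')
            if name.lower() != word.lower():
                candidates.add(name)
    return sorted(candidates)

# Example: candidate replacements for "funny" (cf. the word-level example above).
print(wordnet_synonyms('funny', pos=wordnet.ADJ))
```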
Some recent studies utilize a BERT masked language model (MLM) to generate contextual perturbations, such as BERT-Attack [365] and BERT-based adversarial examples (BAE) [212]. The pre-trained BERT MLM can ensure that the predicted token fits the sentence well but is unable to preserve semantic similarity. For example, in the sentence "the food was [MASK]," predicting the [MASK] as good or bad is equally fluent but results in opposite sentiment labels. Besides, both BERT-Attack and BAE adopt a static word-replacement order guided by the word importance score (WIS), leading to redundant word substitutions. The difference lies in that Garg and Ramakrishnan [212] defined the WIS as the probability decrease of the correct label after deleting a word, while Li et al. [365] replaced each original word with a dummy symbol [MASK].
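A minimal sketch of the MLM-substitution step using the Hugging Face transformers fill-mask pipeline; the bert-base-uncased checkpoint is an illustrative choice. As noted above, the proposals are fluent but not guaranteed to preserve sentiment, which is visible on the example sentence.

```python
from transformers import pipeline

# Fill-mask pipeline backed by a pre-trained BERT masked language model.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

def mlm_candidates(masked_sentence, top_k=5):
    """Return the top-k contextual substitutes for the [MASK] token."""
    return [pred["token_str"] for pred in unmasker(masked_sentence, top_k=top_k)]

# Candidates such as "good" and "bad" can both appear, flipping the sentiment.
print(mlm_candidates("the food was [MASK]."))
```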
In addition, Li et al. [359] presented the ContextuaLized AdversaRial Example (CLARE) model to generate fluent adversarial output via a mask-then-infill procedure. Instead of using the BERT MLM, CLARE employs the pre-trained RoBERTa [390] MLM to provide the contextualized infilling words. CLARE adopts three text perturbations, i.e., replace, insert, and merge, which respectively replace an input token, insert a new token, and merge a bigram. For each input word, CLARE tries all three perturbations and selects the one that minimizes the gold label's probability.
Multilevel attacks combine at least two of the above three strategies to create adversarial text [363, 368, 626]. Unlike single-strategy methods, multilevel attack algorithms are relatively more complicated and computationally expensive [633]. For example, Liang et al. [368] proposed to perturb the text input at both the character level and the word level via three strategies, i.e., insertion, modification, and removal. These strategies are applied to hot characters and hot words (i.e., classification-important items) identified by leveraging the cost gradient. Besides, they proposed a natural-language watermarking technique to improve the readability and utility of the adversarial text, e.g., by inserting semantically empty phrases. It is worth mentioning that using a single strategy (e.g., removal) is often insufficient to fool a classifier, and combining the three strategies is essential to crafting subtle adversarial samples. However, there is no clear optimization principle for how to combine these strategies.
Li et al. [363] proposed TextBugger, which modifies the benign text at both the word level and the character level. Specifically, it defines five kinds of bug perturbations: (1) insert a space into the word, (2) delete a random character of the word except the first and last characters, (3) swap two adjacent letters of a word, (4) replace characters with visually similar characters, and (5) replace a word with one of its k-nearest neighbors in the GloVe embedding space. For each input word, it selects the best bug among these five strategies as the one that reduces the ground-truth probability the most. The final adversarial output is crafted by iteratively repeating this procedure on every input word.
Wang et al. [626] presented a tree-based attack framework, T3, that perturbs text at both the word level (T3(WORD)) and the sentence level (T3(SENT)). The core component of T3 is a pre-trained tree-based autoencoder, which converts the discrete text space into a continuous semantic embedding space. This sidesteps the discreteness of text and allows the perturbation to be optimized with gradient-based methods in the continuous space.
Table 6.3 Summary of the properties for different text attacking methods. The properties are
Targeted attack, Untargeted attack, White-box attack, and Black-box attack
Properties
Attacking methods Targeted Untargeted White Black
Belinkov and Bisk [42]
HotFlip [169]
DeepWordBug [209]
Gil et al. [218]
VIPER [170]
Jia and Liang [300]
Wallace et al. [625]
NUTS [564]
CATGen et al. [632]
Han et al. [250]
Malcom [348]
Papernot et al. [483]
Alzantot et al. [13]
TextFooler [302]
PWWS [509]
PSO [689]
BERT-Attack [365]
BAE [212]
CLARE [359]
Liang et al. [368]
TextBugger [363]
T3 [626]
6.3 Spam Filtering

Email spam filtering has been analyzed as a lazy learning problem under concept drift [152]. Kazemian and Ahmed [312] compared machine learning techniques for detecting malicious webpages.

7 Adversarial Perturbation for Privacy Preservation
Due to their unprecedented accuracy, deep learning methods have become the basis of new AI-based services on the Internet in the big data era. Meanwhile, they raise obvious privacy issues. Deep learning-assisted privacy attacks can extract sensitive personal information not only from text but also from unstructured data such as images and videos. This prompts us to revisit the privacy challenges of the big data era in light of the various intelligent technologies that are emerging [375]. In particular, emerging deep learning techniques can "automatically collect and process millions of photos or videos to extract private/sensitive information from social networks." Therefore, thoroughly investigating the privacy problem in the context of deep learning is an urgent need.
Although most existing research work has considered adversarial examples (AEs) or adversarial perturbations (APs) as attack methods that threaten system security, APs can also serve as a privacy protection tool against deep learning-based privacy attacks. The fundamental idea of AP is to generate a small but intentional worst-case disturbance to an original image, which misleads CNN-based recognition models without causing a difference perceptible to humans.
For file-level privacy protection, we aim to mislead the deep learning tool into predicting a wrong image class. We consider the scenario of social networks [376]. In more detail, users post images on social network platforms, and an attacker collects these images through a crawler and uses DNNs to mine sensitive information. Figure 7.1 shows an example of such a system architecture. When a user shares an image on social
Fig. 7.1 An example of the system architecture for AP-based file-level privacy protection
where o is the observation, classp is the class predicted by the adversary, and classX is the true class of the original image X. The output of P1 is a number between 0 and 1, where "0" means completely private and "1" indicates no privacy.
There are many different methods to generate the noise for an adversarial example, among which the most widely used is the fast gradient sign method (FGSM). Let θ be the parameters of a model, X the input to the model, y the target associated with X (we can randomly pick a class toward which we want to mislead the deep learning model), and J(θ; X; y) the cost function used to train the neural network [228]. The cost function can be linearized around the current value of θ, obtaining an optimal max-norm constrained perturbation of

η = ε · sign(∇X J(θ; X; y)),

where ε is a small scalar that keeps the noise imperceptible to human eyes and ∇X J is the gradient of the cost function J with respect to the input image X,

∇X J(θ; X; y) = ∂J/∂X.

The released image is then generated as

X′ = X + η.
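A compact FGSM sketch in PyTorch following the equations above; the model, the input tensor, and the label y are placeholders for whatever classifier and image are being protected, and epsilon is an illustrative value.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """One-step FGSM: X' = X + epsilon * sign(grad_X J(theta, X, y)).

    model: a differentiable classifier returning logits
    x:     input image tensor of shape (1, C, H, W), values in [0, 1]
    y:     label tensor of shape (1,) used in the cost function
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)      # J(theta, X, y)
    loss.backward()                          # gradient with respect to X
    eta = epsilon * x.grad.sign()            # max-norm constrained perturbation
    return (x + eta).clamp(0.0, 1.0).detach()
```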
Figure 7.2 gives an example of the result of file-level privacy protection. The deep learning model classifies the original image as "minibus" with high confidence (92.42%). When we add a small FGSM noise, the image is misclassified as a "washbasin" with even higher confidence (99.37%).
Existing research results show that AP-based methods can achieve good privacy protection against deep learning tools at the cost of adding a small amount of noise that is imperceptible to human eyes [376]. The effectiveness of the proposed method is especially good for images with complex structures and textures.
Fig. 7.2 An example of the result of the file-level privacy protection (the colors of noises are
amplified by normalization otherwise they would be hard to see)
Fig. 7.3 Object-level image privacy protection: the object detector separates private and non-private objects in the input image; adversarial label replacement of the private objects and the resulting loss are used to generate the noise added to the output image (Step 2: image privacy protection using adversarial perturbation)
File-level privacy protection is suitable for simple images that contain only one major object. In practice, there are generally multiple objects in a given image, especially in social network images, and some of the objects are privacy-sensitive while others may be privacy-insensitive. In this case, we can use an object-level privacy protection framework to solve the problem [673]. As shown in Fig. 7.3, the framework consists of two major steps: (i) identifying the private objects in the image and (ii) protecting image privacy using adversarial perturbation.
For the first step, a DNN-based object detector can be used. Given an input image X, the output of the object detection module is represented as

C(X) = ⎛ x1  y1  w1  h1  c1 ⎞
       ⎜ x2  y2  w2  h2  c2 ⎟
       ⎜  ⋮   ⋮   ⋮   ⋮   ⋮ ⎟
       ⎝ xn  yn  wn  hn  cn ⎠ ,

where each row contains the bounding-box location (xj, yj), width wj, height hj, and predicted class cj of the j-th detected object. We denote the set of detected private objects by Cprivate, the set of non-private objects by Cnon-private, the background class by cbg, and the class of the j-th object predicted on the perturbed image by cj^pr; the protection goal is that ∀ cj ∈ Cprivate : cj^pr = cbg.
Based on the above framework, our target is to fool the network into predicting the private objects as background while the non-private objects are still recognized as their original classes. Meanwhile, the added noise δX should be small enough to be imperceptible to humans. Hence, the problem can be formulated as follows:

min_δX ‖δX‖
s.t.  ∀ cj ∈ Cprivate :      cj^pr = cbg,
      ∀ cj ∈ Cnon-private :  cj^pr = cj.
An AP-based image privacy protection algorithm can be used to solve the above problem. As shown in Fig. 7.3, the object detector first finds all objects in the image. We then replace the labels of the private objects with the background class and use the corresponding loss function to calculate the gradient, according to which the noise is updated. Finally, the perturbed image is generated, in which all private objects are treated as background by the object detector.
The key part of the algorithm is to construct the classification loss (Lcls) so as to mislead the object detector into recognizing the private objects as background, as shown in Eq. (7.1):

Lcls = (1/n) Σi En(pi, pi*) + λ ‖X − Xpr‖2 ,    (7.1)

where En(pi, pi*) measures the classification error between the prediction pi for the i-th object and its (possibly replaced) target label pi*, and λ weights the perturbation-magnitude term. The noise is then obtained as

δX = −ε · sign(∇X Lcls) = −ε · sign(∂Lcls/∂X),

where ε is the step parameter that scales the noise. Therefore, the generated image will be

Xpr = X + δX = X − ε · sign(∂Lcls/∂X).
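A sketch of the signed-gradient update loop described above, written in PyTorch; `detector_cls_loss` stands in for the detector's classification loss Lcls evaluated with the private objects relabeled as background, and the step size and iteration count are illustrative choices.

```python
import torch

def object_level_protect(x, detector_cls_loss, epsilon=2 / 255, steps=10):
    """Iteratively add noise so private objects are detected as background.

    x:                 input image tensor (1, C, H, W) with values in [0, 1]
    detector_cls_loss: function(image) -> scalar Lcls computed with the private
                       objects' labels replaced by the background class
    """
    x_pr = x.clone().detach()
    for _ in range(steps):
        x_pr.requires_grad_(True)
        loss = detector_cls_loss(x_pr)               # Lcls on the current image
        loss.backward()
        with torch.no_grad():
            # delta_X = -epsilon * sign(dLcls/dX): descend the relabeled loss.
            x_pr = (x_pr - epsilon * x_pr.grad.sign()).clamp(0.0, 1.0)
        x_pr = x_pr.detach()
    return x_pr
```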
In some other cases, we only need to change certain features of an image or video using human-imperceptible adversarial perturbation. A typical example is to change a person's identity (against a face recognition system) in an image while keeping the appearance visually unchanged. A face recognition system is a technology capable of recognizing or authenticating a person from an image or a video frame. With recent advanced
Fig. 7.4 Illustration of a typical face recognition system and the process of generating adversarial
image perturbation
where IDX is the identity of the original image X and IDX′ is the identity of the image with perturbation.
X0 = X,
Xn = Xn−1 + ε · sign(∇X J(θ; Xn−1; y)) = Xn−1 + ηn−1 ,   1 ≤ n ≤ N.
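A sketch of this iterative procedure in PyTorch: the loss pulls the face embedding of the perturbed image toward the embedding of the selected adversarial identity, and an L∞ projection keeps the total perturbation small. The embedding network, the target embedding, and the step sizes are placeholders, and the cosine-distance objective is one reasonable choice rather than the exact formulation used here.

```python
import torch
import torch.nn.functional as F

def pgd_identity_attack(embed, x, target_embedding,
                        epsilon=8 / 255, step=2 / 255, n_iter=20):
    """Iteratively perturb x so its face embedding moves toward a target identity.

    embed:            network mapping an image (1, C, H, W) to an identity vector
    target_embedding: embedding y of the selected adversarial face, shape (1, D)
    """
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        # Cosine distance between the current embedding and the target identity.
        loss = 1.0 - F.cosine_similarity(embed(x_adv), target_embedding).mean()
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv - step * x_adv.grad.sign()          # move toward target
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # L-infinity projection
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```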
An illustration of the process is shown in Fig. 7.4. First, a different person is specifically or randomly selected. Then the embedding vector of this adversarial face is calculated and used as the value of y. The image with adversarial perturbation is generated by the PGD algorithm and finally tested against the face recognition system.
References
14. A. ANANDKUMAR, P. JAIN, Y. SHI, AND U. N. NIRANJAN, Tensor vs. matrix methods:
Robust tensor decomposition under block sparse perturbations, in Proceedings of the 19th
International Conference on Artificial Intelligence and Statistics, A. Gretton and C. C. Robert,
eds., vol. 51 of Proceedings of Machine Learning Research, Cadiz, Spain, 09–11 May 2016,
PMLR, pp. 268–276.
15. M. ANCONA, C. ÖZTIRELI, AND M. H. GROSS, Explaining deep neural networks with
a polynomial time algorithm for shapley value approximation, in Proceedings of the 36th
International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach,
California, USA, 2019, pp. 272–281.
16. P. ANDERSEN, M. GOODWIN, AND O. GRANMO, Deep RTS: A game environment for
deep reinforcement learning in real-time strategy games, in 2018 IEEE Conference on
Computational Intelligence and Games, CIG 2018, Maastricht, The Netherlands, August 14-
17, 2018, IEEE, 2018, pp. 1–8.
17. A. B. ARRIETA, N. D. RODRÍGUEZ, J. D. SER, A. BENNETOT, S. TABIK, A. BARBADO,
S. GARCÍA, S. GIL-LOPEZ, D. MOLINA, R. BENJAMINS, R. CHATILA, AND F. HERRERA,
Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges
toward responsible AI, Inf. Fusion, 58 (2020), pp. 82–115.
18. K. ASIF, W. XING, S. BEHPOUR, AND B. D. ZIEBART, Adversarial cost-sensitive classifica-
tion, in Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence,
UAI’15, Arlington, Virginia, USA, 2015, AUAI Press, pp. 92–101.
19. A. ATHALYE, L. ENGSTROM, A. ILYAS, AND K. KWOK, Synthesizing robust adversarial
examples, in Proceedings of the 35th International Conference on Machine Learning, J. Dy
and A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research, PMLR, 10–15
Jul 2018, pp. 284–293.
20. A. ATTAR, R. M. RAD, AND R. E. ATANI, A survey of image spamming and filtering
techniques, Artificial Intelligence Review, 40 (2013), pp. 71–105.
21. P. AUER, N. CESA-BIANCHI, Y. FREUND, AND R. SCHAPIRE, Gambling in a rigged
casino: The adversarial multi-armed bandit problem, in Proceedings of IEEE 36th Annual
Foundations of Computer Science, 1995, pp. 322–331.
22. T. BACK, F. HOFFMEISTER, AND H.-P. SCHWEFEL, A survey of evolution strategies,
in Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan
Kaufmann, 1991, pp. 2–9.
23. D. BAEHRENS, T. SCHROETER, S. HARMELING, M. KAWANABE, K. HANSEN, AND K.-R.
MÜLLER, How to explain individual classification decisions, J. Mach. Learn. Res., 11 (2010),
pp. 1803–1831.
24. D. BALDUZZI, Grammars for games: A gradient-based, game-theoretic framework for
optimization in deep learning, Frontiers Robotics AI, 2 (2016), p. 39.
25. S. BALUJA AND I. FISCHER, Learning to attack: Adversarial transformation networks, in
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
26. ——, Learning to attack: Adversarial transformation networks, in Proceedings of AAAI-
2018, 2018.
27. S. BANDARU, A. H. C. NG, AND K. DEB, Data mining methods for knowledge discovery in
multi-objective optimization: Part A - survey, Expert Syst. Appl., 70 (2017), pp. 139–159.
28. ——, Data mining methods for knowledge discovery in multi-objective optimization: Part B
- new developments and applications, Expert Syst. Appl., 70 (2017), pp. 119–138.
29. S. BANDYOPADHYAY, S. K. PAL, AND C. MURTHY, Simulated annealing based pattern
classification, Information Sciences, 109 (1998), pp. 165–184.
30. M. BARRENO, B. NELSON, A. D. JOSEPH, AND J. D. TYGAR, The security of machine
learning, Mach. Learn., 81 (2010), p. 121–148.
31. M. BARRENO, B. NELSON, R. SEARS, A. D. JOSEPH, AND J. D. TYGAR, Can machine
learning be secure?, in Proceedings of the 2006 ACM Symposium on Information, Computer
and Communications Security, ASIACCS ’06, New York, NY, USA, 2006, ACM, pp. 16–25.
32. ——, Can machine learning be secure?, in Proceedings of the 2006 ACM Symposium on
Information, Computer and Communications Security, ASIACCS ’06, New York, NY, USA,
2006, Association for Computing Machinery, p. 16–25.
115. J. CHEN, D. WANG, AND H. CHEN, Explore the transformation space for adversarial images,
in Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy,
2020, pp. 109–120.
116. P.-Y. CHEN, H. ZHANG, Y. SHARMA, J. YI, AND C.-J. HSIEH, Zoo: Zeroth order optimization
based black-box attacks to deep neural networks without training substitute models, in
Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec ’17,
New York, NY, USA, 2017, Association for Computing Machinery.
117. S. CHEN, M. XUE, L. FAN, S. HAO, L. XU, H. ZHU, AND B. LI, Automated poisoning attacks
and defenses in malware detection systems: An adversarial machine learning approach,
Computers and Security, 73 (2018), pp. 326–344.
118. T. CHEN, J. LIU, Y. XIANG, W. NIU, E. TONG, AND Z. HAN, Adversarial attack and defense
in reinforcement learning-from AI security view, Cybersecur., 2 (2019), p. 11.
119. X. CHEN, Y. DUAN, R. HOUTHOOFT, J. SCHULMAN, I. SUTSKEVER, AND P. ABBEEL, Info-
gan: Interpretable representation learning by information maximizing generative adversarial
nets, in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama,
U. V. Luxburg, I. Guyon, and R. Garnett, eds., Curran Associates, Inc., 2016, pp. 2172–2180.
120. Z. CHEN, C. WANG, H. WU, K. SHANG, AND J. WANG, Dmgan: Discriminative metric-based
generative adversarial networks, Knowl. Based Syst., 192 (2020), p. 105370.
121. M. CHENG, J. YI, H. ZHANG, P. CHEN, AND C. HSIEH, Seq2sick: Evaluating the robustness
of sequence-to-sequence models with adversarial examples, CoRR, abs/1803.01128 (2018).
122. T. CHIN, C. ZHANG, AND D. MARCULESCU, Improving the adversarial robustness of
transfer learning via noisy feature distillation, CoRR, abs/2002.02998 (2020).
123. A. CHIVUKULA AND W. LIU, Adversarial deep learning models with multiple adversaries,
IEEE Transactions on Knowledge and Data Engineering, 31 (2019), pp. 1066–1079.
124. A. CHIVUKULA, X. YANG, W. LIU, T. ZHU, AND W. ZHOU, Game theoretical adversarial
deep learning with variational adversaries, IEEE Transactions on Knowledge and Data
Engineering, (2020), pp. 1–1.
125. A. S. CHIVUKULA AND W. LIU, Adversarial deep learning models with multiple adversaries,
IEEE Transactions on Knowledge and Data Engineering, 31 (2019), pp. 1066–1079.
126. A. S. CHIVUKULA, X. YANG, W. LIU, T. ZHU, AND W. ZHOU, Game theoretical adversarial
deep learning with variational adversaries, IEEE Transactions on Knowledge and Data
Engineering, 33 (2021), pp. 3568–3581.
127. J.-H. CHO, P. M. HURLEY, AND S. XU, Metrics and measurement of trustworthy systems, in
MILCOM 2016 - 2016 IEEE Military Communications Conference, 2016, pp. 1237–1242.
128. S. CHOPRA, R. HADSELL, AND Y. LECUN, Learning a similarity metric discriminatively,
with application to face verification, in 2005 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR’05), vol. 1, 2005, pp. 539–546 vol. 1.
129. M. CHORAS, M. PAWLICKI, D. PUCHALSKI, AND R. KOZIK, Machine learning - the results
are not the only thing that matters! what about security, explainability and fairness?, in ICCS
(4), vol. 12140 of Lecture Notes in Computer Science, Springer, 2020, pp. 615–628.
130. H. CHRISTOPHER FREY AND S. R. PATIL, Identification and review of sensitivity analysis
methods, Risk Analysis, 22 (2002), pp. 553–578.
131. M. CISSE, P. BOJANOWSKI, E. GRAVE, Y. DAUPHIN, AND N. USUNIER, Parseval networks:
Improving robustness to adversarial examples, in Proceedings of the 34th International
Conference on Machine Learning, D. Precup and Y. W. Teh, eds., vol. 70 of Proceedings
of Machine Learning Research, PMLR, 06–11 Aug 2017, pp. 854–863.
132. S. COHEN, G. DROR, AND E. RUPPIN, Feature selection via coalitional game theory, Neural
Comput., 19 (2007), p. 1939–1961.
133. ——, Feature selection via coalitional game theory, Neural Comput., 19 (2007).
134. B. COLSON, P. MARCOTTE, AND G. SAVARD, An overview of bilevel optimization, 2007.
135. P. COMON, X. LUCIANI, AND A. L. F. DE ALMEIDA, Tensor decompositions, alternating
least squares and other tales, Journal of Chemometrics, 23 (2009), pp. 393–405.
136. I. CORONA, G. GIACINTO, AND F. ROLI, Adversarial attacks against intrusion detection
systems: Taxonomy, solutions and open issues, Inf. Sci., 239 (2013), pp. 201–225.
137. P. CORTEZ AND M. J. EMBRECHTS, Using sensitivity analysis and visualization techniques
to open black box data mining models, Inf. Sci., 225 (2013), p. 1–17.
138. A. COTTER, H. JIANG, AND K. SRIDHARAN, Two-player games for efficient non-convex
constrained optimization, in ALT, vol. 98 of Proceedings of Machine Learning Research,
PMLR, 2019, pp. 300–332.
139. G. CYBENKO, S. JAJODIA, M. P. WELLMAN, AND P. LIU, Adversarial and uncertain
reasoning for adaptive cyber defense: Building the scientific foundation, in ICISS, vol. 8880
of Lecture Notes in Computer Science, Springer, 2014, pp. 1–8.
140. G. DAI, J. XIE, AND Y. FANG, Metric-based generative adversarial network, in Proceedings
of the 2017 ACM on Multimedia Conference, MM ’17, New York, NY, USA, 2017, ACM,
pp. 672–680.
141. H. DAI, H. LI, T. TIAN, X. HUANG, L. WANG, J. ZHU, AND L. SONG, Adversarial attack
on graph structured data, in Proceedings of the 35th International Conference on Machine
Learning, J. Dy and A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research,
PMLR, 10–15 Jul 2018, pp. 1115–1124.
142. N. DALVI, P. DOMINGOS, MAUSAM, S. SANGHAI, AND D. VERMA, Adversarial classifi-
cation, in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD ’04, New York, NY, USA, 2004, ACM, pp. 99–108.
143. P. DANIELE, Dynamic Networks and Evolutionary Variational Inequalities, New Dimensions in Networks, Edward Elgar Publishing, Cheltenham, UK, 2006.
144. A. DAS AND P. RAD, Opportunities and challenges in explainable artificial intelligence
(XAI): A survey, CoRR, abs/2006.11371 (2020).
145. S. DAS AND P. N. SUGANTHAN, Differential evolution: A survey of the state-of-the-art, IEEE
Transactions on Evolutionary Computation, 15 (2011), pp. 4–31.
146. P. DASGUPTA AND J. B. COLLINS, A survey of game theoretic approaches for adversarial
machine learning in cybersecurity tasks, AI Mag., 40 (2019), pp. 31–43.
147. P. DASGUPTA, J. B. COLLINS, AND A. BUHMAN, Gray-box techniques for adversarial
text generation, in Proceedings of the AAAI Symposium on Adversary-Aware Learning
Techniques and Trends in Cybersecurity (ALEC 2018) co-located with the Association for
the Advancement of Artificial Intelligence 2018 Fall Symposium Series (AAAI-FSS 2018),
Arlington, Virginia, USA, October 18-20, 2018., 2018, pp. 17–23.
148. S. DE SILVA, J. KIM, AND R. RAICH, Cost aware adversarial learning, in ICASSP 2020 -
2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2020, pp. 3587–3591.
149. K. DEB, Multi-Objective Optimization Using Evolutionary Algorithms, John Wiley & Sons,
Inc., USA, 2001.
150. K. DEB AND D. SAXENA, On finding pareto-optimal solutions through dimensionality reduc-
tion for certain large-dimensional multi-objective optimization problems, IEEE Congress on
Evolutionary Computation, (2005).
151. O. DEKEL, O. SHAMIR, AND L. XIAO, Learning to classify with missing and corrupted
features, Machine Learning Journal, (2009).
152. S. J. DELANY, P. CUNNINGHAM, A. TSYMBAL, AND L. COYLE, A case-based technique for
tracking concept drift in spam filtering, Knowl.-Based Syst., 18 (2005), pp. 187–195.
153. L. DEMETRIO, B. BIGGIO, G. LAGORIO, F. ROLI, AND A. ARMANDO, Explaining vulnera-
bilities of deep learning to adversarial malware binaries, in Proceedings of the Third Italian
Conference on Cyber Security, Pisa, Italy, February 13-15, 2019, 2019.
154. A. DEMONTIS, P. RUSSU, B. BIGGIO, G. FUMERA, AND F. ROLI, On security and sparsity
of linear classifiers for adversarial settings, in Joint IAPR Int’l Workshop on Structural,
Syntactic, and Statistical Pattern Recognition, vol. 10029 of LNCS, Merida, Mexico, 2016,
Springer International Publishing, Springer International Publishing, pp. 322–332.
155. A. DEMONTIS, P. RUSSU, B. BIGGIO, G. FUMERA, AND F. ROLI, On security and sparsity
of linear classifiers for adversarial settings, in Structural, Syntactic, and Statistical Pattern
Recognition, A. Robles-Kelly, M. Loog, B. Biggio, F. Escolano, and R. Wilson, eds., Cham,
2016, Springer International Publishing, pp. 322–332.
156. L. DENG, Three classes of deep learning architectures and their applications: A tutorial
survey, APSIPA Transactions on Signal and Information Processing, (2012).
157. J. DEVLIN, M.-W. CHANG, K. LEE, AND K. TOUTANOVA, Bert: Pre-training of deep
bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805,
(2018).
158. J. DIANETTI AND G. FERRARI, Nonzero-sum submodular monotone-follower games:
Existence and approximation of nash equilibria, SIAM J. Control. Optim., 58 (2020),
pp. 1257–1288.
159. D. SUDHOLT, Parallel evolutionary algorithms, Springer Handbooks, 2015.
160. P. DOMINGOS, A unified bias-variance decomposition and its applications, in In Proc. 17th
International Conf. on Machine Learning, Morgan Kaufmann, 2000, pp. 231–238.
161. Y. DONG, F. LIAO, T. PANG, H. SU, J. ZHU, X. HU, AND J. LI, Boosting adversarial attacks
with momentum, in Proceedings of the IEEE conference on computer vision and pattern
recognition, June 2018, pp. 9185–9193.
162. Z. DONG AND Q. DONG, Hownet and the computation of meaning, World Scientific, 2006.
163. R. D’ORAZIO, D. MORRILL, J. R. WRIGHT, AND M. BOWLING, Alternative function
approximation parameterizations for solving games: An analysis of f-regression
counterfactual regret minimization, in Proceedings of the 19th International Conference
on Autonomous Agents and MultiAgent Systems, AAMAS ’20, Richland, SC, 2020,
International Foundation for Autonomous Agents and Multiagent Systems.
164. M. DOTTER, S. XIE, K. MANVILLE, J. HARGUESS, C. BUSHO, AND M. RODRIGUEZ,
Adversarial attack attribution: Discovering attributable signals in adversarial ML attacks,
CoRR, abs/2101.02899 (2021).
165. L. DRITSOULA, P. LOISEAU, AND J. MUSACCHIO, A game-theoretical approach for finding
optimal strategies in an intruder classification game, in CDC, IEEE, 2012, pp. 7744–7751.
166. R. O. DUDA, P. E. HART, AND D. G. STORK, Pattern Classification (2Nd Edition), Wiley-
Interscience, 2000.
167. E. DUESTERWALD, A. MURTHI, G. VENKATARAMAN, M. SINN, AND D. VIJAYKEERTHY,
Exploring the hyperparameter landscape of adversarial robustness, Safe Machine Learning
workshop at ICLR, (2019).
168. J. EBRAHIMI, D. LOWD, AND D. DOU, On adversarial examples for character-level neural
machine translation, in Proceedings of the 27th International Conference on Computational
Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018, 2018,
pp. 653–663.
169. J. EBRAHIMI, A. RAO, D. LOWD, AND D. DOU, Hotflip: White-box adversarial examples
for text classification, in Proceedings of the 56th Annual Meeting of the Association for
Computational Linguistics, 2018, pp. 31–36.
170. S. EGER, G. G. ŞAHIN, A. RÜCKLÉ, J.-U. LEE, C. SCHULZ, M. MESGAR, K. SWARNKAR,
E. SIMPSON, AND I. GUREVYCH, Text processing like humans do: Visually attacking and
shielding NLP systems, in Proceedings of the 2019 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language Technologies, Volume 1
(Long and Short Papers), Minneapolis, Minnesota, June 2019, Association for Computational
Linguistics, pp. 1634–1647.
171. A. ELITZUR, R. PUZIS, AND P. ZILBERMAN, Attack hypothesis generation, in 2019 European
Intelligence and Security Informatics Conference (EISIC), 2019, pp. 40–47.
172. G. ELSAYED, S. SHANKAR, B. CHEUNG, N. PAPERNOT, A. KURAKIN, I. GOODFELLOW,
AND J. SOHL-DICKSTEIN, Adversarial examples that fool both computer vision and
time-limited humans, in Advances in Neural Information Processing Systems, S. Bengio,
H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds., vol. 31,
Curran Associates, Inc., 2018.
173. A. ENGELBRECHT, Sensitivity analysis for decision boundaries, Neural Processing Letters,
10 (2004), pp. 253–266.
174. A. P. ENGELBRECHT, Sensitivity analysis for decision boundaries, Neural Process. Lett., 10
(1999), pp. 253–266.
2011, Proceedings, Part I, J. Z. Huang, L. Cao, and J. Srivastava, eds., vol. 6634 of Lecture
Notes in Computer Science, Springer, 2011, pp. 13–25.
212. S. GARG AND G. RAMAKRISHNAN, Bae: Bert-based adversarial examples for text classifi-
cation, arXiv preprint arXiv:2004.01970, (2020).
213. X. GE, H. DING, H. RABITZ, AND R.-B. WU, Robust quantum control in games: An
adversarial learning approach, Phys. Rev. A, 101 (2020), p. 052317.
214. R. GEMULLA, E. NIJKAMP, P. J. HAAS, AND Y. SISMANIS, Large-scale matrix factorization
with distributed stochastic gradient descent, in Proceedings of the 17th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, KDD ’11, New York,
NY, USA, 2011, Association for Computing Machinery.
215. A. GHAFOURI, Y. VOROBEYCHIK, AND X. KOUTSOUKOS, Adversarial regression for
detecting attacks in cyber-physical systems, in Proceedings of the 27th International Joint
Conference on Artificial Intelligence, IJCAI’18, AAAI Press, 2018, p. 3769–3775.
216. ——, Adversarial regression for detecting attacks in cyber-physical systems, in Proceedings
of the 27th International Joint Conference on Artificial Intelligence, IJCAI’18, AAAI Press,
2018.
217. G. GIDEL, H. BERARD, G. VIGNOUD, P. VINCENT, AND S. LACOSTE-JULIEN, A variational
inequality perspective on generative adversarial networks, in 7th International Conference
on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenRe-
view.net, 2019.
218. Y. GIL, Y. CHAI, O. GORODISSKY, AND J. BERANT, White-to-black: Efficient distillation of
black-box adversarial attacks, arXiv preprint arXiv:1904.02405, (2019).
219. J. GILMER, R. P. ADAMS, I. J. GOODFELLOW, D. ANDERSEN, AND G. E. DAHL, Motivating
the rules of the game for adversarial example research, CoRR, abs/1807.06732 (2018).
220. A. GLOBERSON AND S. ROWEIS, Nightmare at test time: Robust learning by feature deletion,
in Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, New
York, NY, USA, 2006, ACM, pp. 353–360.
221. D. E. GOLDBERG, Genetic Algorithms in Search, Optimization and Machine Learning,
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st ed., 1989.
222. ——, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley
Longman Publishing Co., Inc., USA, 1st ed., 1989.
223. J. GOLDBERGER, S. ROWEIS, G. HINTON, AND R. SALAKHUTDINOV, Neighbourhood com-
ponents analysis, in Proceedings of the 17th International Conference on Neural Information
Processing Systems, NIPS’04, Cambridge, MA, USA, 2004, MIT Press.
224. A. GOLDSTEIN, A. KAPELNER, J. BLEICH, AND E. PITKIN, Peeking inside the black box:
Visualizing statistical learning with plots of individual conditional expectation, Journal of
Computational and Graphical Statistics, 24 (2013), pp. 44–65.
225. I. GOODFELLOW, Y. BENGIO, AND A. COURVILLE, Deep Learning, MIT Press, 2016. http://www.deeplearningbook.org.
226. I. GOODFELLOW, J. POUGET-ABADIE, M. MIRZA, B. XU, D. WARDE-FARLEY, S. OZAIR,
A. COURVILLE, AND Y. BENGIO, Generative adversarial nets, in Advances in neural
information processing systems (NIPS), 2014, pp. 2672–2680.
227. I. GOODFELLOW, J. SHLENS, AND C. SZEGEDY, Explaining and harnessing adversarial
examples, in Proceedings of International Conference on Learning Representations, 2015.
228. I. J. GOODFELLOW, J. SHLENS, AND C. SZEGEDY, Explaining and harnessing adversarial
examples, arXiv preprint arXiv:1412.6572, (2014).
229. S. GORE AND V. GOVINDARAJU, Feature selection using cooperative game theory and
relief algorithm, in Knowledge, Information and Creativity Support Systems: Recent Trends,
Advances and Solutions - Selected Papers from KICSS’2013 - 8th International Conference
on Knowledge, Information, and Creativity Support Systems, November 7-9, 2013, Kraków,
Poland, A. M. J. Skulimowski and J. Kacprzyk, eds., vol. 364 of Advances in Intelligent
Systems and Computing, Springer, 2013, pp. 401–412.
230. A. GOYAL, N. R. KE, A. LAMB, R. D. HJELM, C. PAL, J. PINEAU, AND Y. BENGIO, Actual:
Actor-critic under adversarial learning, CoRR, abs/1711.04755 (2017).
251. T. HARADA AND E. ALBA, Parallel genetic algorithms: A useful survey, ACM Comput. Surv.,
53 (2020).
252. P. T. HARKER AND J.-S. PANG, Finite-dimensional variational inequality and nonlinear
complementarity problems: A survey of theory, algorithms and applications, Math. Program.,
48 (1990).
253. S. HART AND A. MAS-COLELL, A general class of adaptive strategies., J. Econ. Theory, 98
(2001), pp. 26–54.
254. A. HARTL, M. BACHL, J. FABINI, AND T. ZSEBY, Explainability and adversarial robustness
for rnns, in BigDataService, IEEE, 2020, pp. 148–156.
255. T. B. HASHIMOTO, M. SRIVASTAVA, H. NAMKOONG, AND P. LIANG, Fairness without
demographics in repeated loss minimization, in Proceedings of the 35th International
Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July
10-15, 2018, J. G. Dy and A. Krause, eds., vol. 80 of Proceedings of Machine Learning
Research, PMLR, 2018, pp. 1934–1943.
256. M. M. HASSAN, M. R. HASSAN, S. HUDA, AND V. H. C. DE ALBUQUERQUE, A robust deep-
learning-enabled trust-boundary protection for adversarial industrial iot environment, IEEE
Internet of Things Journal, 8 (2021), pp. 9611–9621.
257. T. HASTIE, R. TIBSHIRANI, AND J. FRIEDMAN, The elements of statistical learning – data
mining, inference, and prediction.
258. M. HAUSCHILD AND M. PELIKAN, An introduction and survey of estimation of distribution
algorithms, Swarm and Evolutionary Computation, 1 (2011), pp. 111–128.
259. J. HAYES AND G. DANEZIS, Generating steganographic images via adversarial training, in
NIPS, 2017, pp. 1954–1963.
260. J. HAYES AND G. DANEZIS, Machine learning as an adversarial service: Learning black-box
adversarial examples, arXiv preprint arXiv:1708.05207, 2 (2017).
261. E. HAZAN, K. SINGH, AND C. ZHANG, Efficient regret minimization in non-convex games,
in Proceedings of the 34th International Conference on Machine Learning - Volume 70,
ICML’17, JMLR.org, 2017.
262. D. HE, W. CHEN, L. WANG, AND T.-Y. LIU, A game-theoretic machine learning approach for
revenue maximization in sponsored search, in Proceedings of the Twenty-Third International
Joint Conference on Artificial Intelligence, IJCAI ’13, AAAI Press, 2013.
263. K. HE, X. ZHANG, S. REN, AND J. SUN, Deep residual learning for image recognition,
in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016,
pp. 770–778.
264. W. HE, J. WEI, X. CHEN, N. CARLINI, AND D. SONG, Adversarial example defenses:
Ensembles of weak defenses are not strong, in Proceedings of the 11th USENIX Conference
on Offensive Technologies, WOOT’17, USA, 2017, USENIX Association, p. 15.
265. X. HE AND T.-S. CHUA, Neural factorization machines for sparse predictive analytics,
Proceedings of the 40th International ACM SIGIR Conference on Research and Development
in Information Retrieval, (2017).
266. X. HE, Z. HE, X. DU, AND T.-S. CHUA, Adversarial personalized ranking for recommen-
dation, in The 41st International ACM SIGIR Conference on Research & Development in
Information Retrieval, SIGIR ’18, New York, NY, USA, 2018, Association for Computing
Machinery, p. 355–364.
267. J. HEINRICH, M. LANCTOT, AND D. SILVER, Fictitious self-play in extensive-form games, in
Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei,
eds., vol. 37 of Proceedings of Machine Learning Research, Lille, France, 07–09 Jul 2015,
PMLR, pp. 805–813.
268. J. HEINRICH AND D. SILVER, Deep reinforcement learning from self-play in imperfect-
information games, ArXiv, abs/1603.01121 (2016).
269. J. C. HELTON AND F. J. DAVIS, Sampling-based methods for uncertainty and sensitivity
analysis., 2000.
270. D. HENDERSON, S. JACOBSON, AND A. JOHNSON, The Theory and Practice of Simulated
Annealing, 04 2006, pp. 287–319.
309. M. KANTARCIO ĞLU, B. XI, AND C. CLIFTON, Classifier evaluation and attribute selection
against active adversaries, Data Mining and Knowledge Discovery, 22 (2011), pp. 291–335.
310. M. KANTARCIOGLU, B. XI, AND C. CLIFTON, Classifier evaluation and attribute selection
against active adversaries, Data Min. Knowl. Discov., 22 (2011), pp. 291–335.
311. Z. KATZIR AND Y. ELOVICI, Quantifying the resilience of machine learning classifiers used
for cyber security, Expert Systems with Applications, 92 (2018), pp. 419–429.
312. H. KAZEMIAN AND S. AHMED, Comparisons of machine learning techniques for detecting
malicious webpages, Expert Syst. Appl., 42 (2015), pp. 1166–1177.
313. C. T. KELLEY, Iterative methods for optimization, Frontiers in applied mathematics, SIAM,
1999.
314. R. O. KEOHANE, Counterfactuals and Causal Inference: Methods and Principles for Social
Research By Stephen E. Morgan and Christopher Winship Cambridge University Press. 2007.
319 pages. 83.99 cloth, 28.99 paper, Social Forces, 88 (2009), pp. 466–467.
315. T. KIM, M. CHA, H. KIM, J. K. LEE, AND J. KIM, Learning to discover cross-domain relations
with generative adversarial networks, in Proceedings of the 34th International Conference on
Machine Learning - Volume 70, ICML’17, JMLR.org, 2017.
316. D. P. KINGMA AND M. WELLING, Auto-encoding variational bayes, in 2nd International
Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014,
Conference Track Proceedings, Y. Bengio and Y. LeCun, eds., 2014.
317. J. KLEINBERG, C. PAPADIMITRIOU, AND P. RAGHAVAN, A microeconomic view of data
mining, 1998.
318. M. KLOFT AND P. LASKOV, Online anomaly detection under adversarial impact, in Proceed-
ings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Y. W.
Teh and M. Titterington, eds., vol. 9 of Proceedings of Machine Learning Research, Chia
Laguna Resort, Sardinia, Italy, 13–15 May 2010, PMLR, pp. 405–412.
319. M. KOCAOGLU, C. SNYDER, A. G. DIMAKIS, AND S. VISHWANATH, CausalGAN: Learning
causal implicit generative models with adversarial training, in International Conference on
Learning Representations, 2018.
320. P. W. KOH AND P. LIANG, Understanding black-box predictions via influence functions,
in Proceedings of the 34th International Conference on Machine Learning - Volume 70,
ICML’17, JMLR.org, 2017.
321. A. KOŁCZ AND C. H. TEO, Feature Weighting for Improved Classifier Robustness, in Proc.
6th Conf. on Email and Anti-Spam, July 2009.
322. S. KOMKOV AND A. PETIUSHKO, Advhat: Real-world adversarial attack on arcface face id
system, arXiv preprint arXiv:1908.08705, (2019).
323. V. KONDA AND J. TSITSIKLIS, Actor-critic algorithms, in Advances in Neural Information
Processing Systems, S. Solla, T. Leen, and K. Müller, eds., vol. 12, MIT Press, 2000.
324. V. KÖNÖNEN, Asymmetric multiagent reinforcement learning, in 2003 IEEE/WIC Interna-
tional Conference on Intelligent Agent Technology (IAT 2003), 13-17 October 2003, Halifax,
Canada, IEEE Computer Society, 2003, pp. 336–342.
325. J. KOS, I. FISCHER, AND D. SONG, Adversarial examples for generative models, in
Proceedings of 2018 IEEE Security and Privacy Workshops (SPW), 2018.
326. J. KOS, I. FISCHER, AND D. SONG, Adversarial examples for generative models, 2018 IEEE
Security and Privacy Workshops (SPW), (2018), pp. 36–42.
327. J. KOS AND D. SONG, Delving into adversarial attacks on deep policies, in 5th International
Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017,
Workshop Track Proceedings, 2017.
328. S. KOZIEL, Computational Optimization, Methods and Algorithms, Springer Publishing
Company, Incorporated, 2016.
329. A. KRAUSE, P. PERONA, AND R. GOMES, Discriminative clustering by regularized infor-
mation maximization, in Advances in Neural Information Processing Systems, J. Lafferty,
C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, eds., vol. 23, Curran Associates,
Inc., 2010.
348. T. LE, S. WANG, AND D. LEE, MALCOM: generating malicious comments to attack neural
fake news detection models, CoRR, abs/2009.01048 (2020).
349. Y. LECUN, S. CHOPRA, R. HADSELL, F. J. HUANG, ET AL., A tutorial on energy-based
learning, in Predicting Structured Data, MIT Press, 2006.
350. Y. LECUN AND F. HUANG, Loss functions for discriminative training of energy-based models,
in AISTATS 2005 - Proceedings of the 10th International Workshop on Artificial Intelligence
and Statistics, 2005, pp. 206–213.
351. S. LEDESMA, G. AVINA, AND R. SANCHEZ, Practical considerations for simulated anneal-
ing implementation, in Simulated Annealing, C. M. Tan, ed., IntechOpen, Rijeka, 2008,
ch. 20.
352. K. LEYTON-BROWN AND Y. SHOHAM, Essentials of Game Theory: A Concise, Multidisciplinary
Introduction, vol. 2 of Synthesis Lectures on Artificial Intelligence and Machine Learning,
Morgan & Claypool, 2008.
353. G. L’HUILLIER, R. WEBER, AND N. FIGUEROA, Online phishing classification using adver-
sarial data mining and signaling games, in Proceedings of the ACM SIGKDD Workshop
on CyberSecurity and Intelligence Informatics, CSI-KDD ’09, New York, NY, USA, 2009,
Association for Computing Machinery.
354. B. LI AND Y. VOROBEYCHIK, Feature cross-substitution in adversarial classification,
in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling,
C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds., Curran Associates, Inc., 2014,
pp. 2087–2095.
355. ——, Scalable Optimization of Randomized Operational Decisions in Adversarial Classi-
fication Settings, in Proceedings of the Eighteenth International Conference on Artificial
Intelligence and Statistics, G. Lebanon and S. V. N. Vishwanathan, eds., vol. 38 of Proceed-
ings of Machine Learning Research, San Diego, California, USA, 09–12 May 2015, PMLR,
pp. 599–607.
356. B. LI AND Y. VOROBEYCHIK, Scalable Optimization of Randomized Operational Decisions
in Adversarial Classification Settings, in Proceedings of the Eighteenth International Confer-
ence on Artificial Intelligence and Statistics, G. Lebanon and S. V. N. Vishwanathan, eds.,
vol. 38 of Proceedings of Machine Learning Research, San Diego, California, USA, 09–12
May 2015, PMLR, pp. 599–607.
357. B. LI AND Y. VOROBEYCHIK, Evasion-robust classification on binary domains, ACM Trans.
Knowl. Discov. Data, 12 (2018).
358. C. LI, H. FARKHOOR, R. LIU, AND J. YOSINSKI, Measuring the intrinsic dimension of
objective landscapes, in International Conference on Learning Representations, 2018.
359. D. LI, Y. ZHANG, H. PENG, L. CHEN, C. BROCKETT, M.-T. SUN, AND B. DOLAN,
Contextualized perturbation for textual adversarial attack, arXiv preprint arXiv:2009.07502,
(2020).
360. H. LI, X. XU, X. ZHANG, S. YANG, AND B. LI, Qeba: Query-efficient boundary-based
blackbox attack, in Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), June 2020.
361. H. LI, S. ZHOU, W. YUAN, X. LUO, C. GAO, AND S. CHEN, Robust android malware
detection against adversarial example attacks, in Proceedings of the Web Conference 2021,
WWW ’21, New York, NY, USA, 2021, Association for Computing Machinery, p. 3603–
3612.
362. J. LI, T. CAI, K. DENG, X. WANG, T. SELLIS, AND F. XIA, Community-diversified influence
maximization in social networks, Information Systems, 92 (2020), p. 101522.
363. J. LI, S. JI, T. DU, B. LI, AND T. WANG, Textbugger: Generating adversarial text against
real-world applications, arXiv preprint arXiv:1812.05271, (2018).
364. L. LI, W. CHU, J. LANGFORD, AND R. E. SCHAPIRE, A contextual-bandit approach
to personalized news article recommendation, in Proceedings of the 19th International
Conference on World Wide Web, WWW ’10, New York, NY, USA, 2010, Association for
Computing Machinery.
365. L. LI, R. MA, Q. GUO, X. XUE, AND X. QIU, BERT-ATTACK: Adversarial attack against
BERT using BERT, in Proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing (EMNLP), 2020, pp. 6193–6202.
366. T. LI AND L. LIN, Anonymousnet: Natural face de-identification with measurable privacy,
in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops, 2019, pp. 0–0.
367. H. LIAGHATI, T. MAZZUCHI, AND S. SARKANI, Utilizing a maximin optimization approach
to maximize system resiliency, Systems Engineering, 24 (2021).
368. B. LIANG, H. LI, M. SU, P. BIAN, X. LI, AND W. SHI, Deep text classification can be fooled,
IJCAI, (2018).
369. X. LIANG AND Y. XIAO, Game theory for network security, IEEE Communications Surveys
Tutorials, 15 (2013), pp. 472–486.
370. K. LIN, D. LI, X. HE, M. SUN, AND Z. ZHANG, Adversarial ranking for language generation,
in NIPS, 2017, pp. 3155–3165.
371. S. LIN, Rank aggregation methods, WIREs Computational Statistics, 2 (2010), pp. 555–570.
372. Y.-C. LIN, Z.-W. HONG, Y.-H. LIAO, M.-L. SHIH, M.-Y. LIU, AND M. SUN, Tactics
of adversarial attack on deep reinforcement learning agents, in Proceedings of the 26th
International Joint Conference on Artificial Intelligence, IJCAI’17, AAAI Press, 2017,
pp. 3756–3762.
373. M. LIPPI, Statistical relational learning for game theory, IEEE Transactions on Computa-
tional Intelligence and AI in Games, 8 (2015), pp. 1–1.
374. M. L. LITTMAN, Markov games as a framework for multi-agent reinforcement learning,
in Proceedings of the Eleventh International Conference on Machine Learning, Morgan
Kaufmann, 1994, pp. 157–163.
375. B. LIU, M. DING, S. SHAHAM, W. RAHAYU, F. FAROKHI, AND Z. LIN, When machine
learning meets privacy, ACM Computing Surveys, 54 (2021), pp. 1–36.
376. B. LIU, M. DING, T. ZHU, Y. XIANG, AND W. ZHOU, Adversaries or allies? privacy and deep
learning in big data era, Concurrency and Computation: Practice and Experience, p. e5102.
377. B. LIU, J. XIONG, Y. WU, M. DING, AND C. M. WU, Protecting multimedia privacy from
both humans and AI, in Proc. IEEE International Symposium on Broadband Multimedia
Systems and Broadcasting, 2019.
378. C. LIU, B. LI, Y. VOROBEYCHIK, AND A. OPREA, Robust linear regression against training
data poisoning, in Proceedings of the 10th ACM Workshop on Artificial Intelligence and
Security, AISec ’17, New York, NY, USA, 2017, ACM, pp. 91–102.
379. D. C. LIU AND J. NOCEDAL, On the limited memory bfgs method for large scale optimization,
Mathematical programming, 45 (1989), pp. 503–528.
380. L. LIU, Y. LUO, H. HU, Y. WEN, D. TAO, AND X. YAO, xtml: A unified heterogeneous transfer
metric learning framework for multimedia applications [application notes], IEEE Comput.
Intell. Mag., 15 (2020), pp. 78–88.
381. N. LIU, H. YANG, AND X. HU, Adversarial detection with model interpretation, in Proceed-
ings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, KDD ’18, New York, NY, USA, 2018, Association for Computing Machinery.
382. ——, Adversarial detection with model interpretation, in Proceedings of the 24th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, New
York, NY, USA, 2018, Association for Computing Machinery.
383. Q. LIU, P. LI, W. ZHAO, W. CAI, S. YU, AND V. C. M. LEUNG, A survey on security threats
and defensive techniques of machine learning: A data driven view, IEEE Access, 6 (2018),
pp. 12103–12117.
384. W. LIU AND S. CHAWLA, A game theoretical model for adversarial learning, in 2009 IEEE
International Conference on Data Mining Workshops, 2009, pp. 25–30.
385. ——, Mining adversarial patterns via regularized loss minimization, Machine Learning, 81
(2010), pp. 69–83.
386. ——, Mining adversarial patterns via regularized loss minimization, Mach. Learn., 81
(2010), pp. 69–83.
387. W. LIU, S. CHAWLA, J. BAILEY, C. LECKIE, AND K. RAMAMOHANARAO, AI 2012:
Advances in Artificial Intelligence: 25th Australasian Joint Conference, Sydney, Australia,
December 4-7, 2012. Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012,
406. M. ALAZAB AND M. TANG, eds., Deep Learning Applications for Cyber Security (Advanced
Sciences and Technologies for Security Applications), Springer Nature Switzerland AG,
2019.
407. M. MANCINI, L. PORZI, S. BULO, B. CAPUTO, AND E. RICCI, Boosting domain adaptation
by discovering latent domains, in 2018 IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), Los Alamitos, CA, USA, June 2018, IEEE Computer Society,
pp. 3771–3780.
408. A. MANDLEKAR, Y. ZHU, A. GARG, L. FEI-FEI, AND S. SAVARESE, Adversarially robust
policy learning: Active construction of physically-plausible perturbations, in 2017 IEEE/RSJ
International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, BC,
Canada, September 24-28, 2017, 2017, pp. 3932–3939.
409. M. H. MANSHAEI, Q. ZHU, T. ALPCAN, T. BAŞAR, AND J.-P. HUBAUX, Game theory meets
network security and privacy, ACM Comput. Surv., 45 (2013).
410. X. MAO, Q. LI, H. XIE, R. Y. K. LAU, Z. WANG, AND S. P. SMOLLEY, Least squares
generative adversarial networks, in ICCV, IEEE Computer Society, 2017, pp. 2813–2821.
411. D. L. MARINO, C. S. WICKRAMASINGHE, AND M. MANIC, An adversarial approach for
explainable AI in intrusion detection systems, in IECON 2018 - 44th Annual Conference of
the IEEE Industrial Electronics Society, Washington, DC, USA, October 21-23, 2018, 2018,
pp. 3237–3243.
412. O. MARTIN AND S. OTTO, Combining simulated annealing with local search heuristics,
Annals of Operations Research, 63 (1999).
413. O. C. MARTIN AND S. W. OTTO, Combining simulated annealing with local search
heuristics, tech. rep., 1993.
414. N. MARTINS, J. M. CRUZ, T. CRUZ, AND P. HENRIQUES ABREU, Adversarial machine
learning applied to intrusion and malware scenarios: A systematic review, IEEE Access,
8 (2020), pp. 35403–35419.
415. H. MASNADI-SHIRAZI AND N. VASCONCELOS, On the design of loss functions for classi-
fication: theory, robustness to outliers, and savageboost, in Advances in Neural Information
Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, eds., Curran
Associates, Inc., 2009, pp. 1049–1056.
416. F. MATERN, C. RIESS, AND M. STAMMINGER, Exploiting visual artifacts to expose deepfakes
and face manipulations, in 2019 IEEE Winter Applications of Computer Vision Workshops
(WACVW), Jan 2019, pp. 83–92.
417. R. R. MCCUNE, T. WENINGER, AND G. MADEY, Thinking like a vertex: A survey of vertex-
centric frameworks for large-scale distributed graph processing, ACM Comput. Surv., 48
(2015).
418. J. V. MEDANIC AND D. G. RADOJEVIC, Multilevel stackelberg strategies in linear-quadratic
systems, Journal of Optimization Theory and Applications, 24 (1978), pp. 485–497.
419. S. MEI AND X. ZHU, The Security of Latent Dirichlet Allocation, in Proceedings of the
Eighteenth International Conference on Artificial Intelligence and Statistics, G. Lebanon and
S. V. N. Vishwanathan, eds., vol. 38 of Proceedings of Machine Learning Research, San
Diego, California, USA, 09–12 May 2015, PMLR, pp. 681–689.
420. M. MELIS, A. DEMONTIS, B. BIGGIO, G. BROWN, G. FUMERA, AND F. ROLI, Is deep
learning safe for robot vision? adversarial examples against the icub humanoid, in 2017 IEEE
International Conference on Computer Vision Workshops, ICCV Workshops 2017, Venice,
Italy, October 22–29, 2017, 2017, pp. 751–759.
421. M. MELIS, D. MAIORCA, B. BIGGIO, G. GIACINTO, AND F. ROLI, Explaining black-box
android malware detection, in 26th European Signal Processing Conference, EUSIPCO 2018,
Roma, Italy, September 3-7, 2018, 2018, pp. 524–528.
422. D. MENG AND H. CHEN, Magnet: A two-pronged defense against adversarial examples,
in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications
Security, CCS ’17, New York, NY, USA, 2017, Association for Computing Machinery,
p. 135–147.
481. N. PAPERNOT, P. MCDANIEL, I. GOODFELLOW, S. JHA, Z. B. CELIK, AND A. SWAMI, Practical
black-box attacks against machine learning, in Proceedings of the 2017 ACM on Asia
Conference on Computer and Communications Security, ASIA CCS '17, New York, NY,
USA, 2017, ACM, pp. 506–519.
482. N. PAPERNOT, P. MCDANIEL, A. SINHA, AND M. P. WELLMAN, Sok: Security and privacy
in machine learning, in 2018 IEEE European Symposium on Security and Privacy (EuroS P),
April 2018, pp. 399–414.
483. N. PAPERNOT, P. MCDANIEL, A. SWAMI, AND R. HARANG, Crafting adversarial input
sequences for recurrent neural networks, in IEEE Military Communications Conference,
2016, pp. 49–54.
484. N. PAPERNOT, P. MCDANIEL, X. WU, S. JHA, AND A. SWAMI, Distillation as a defense to
adversarial perturbations against deep neural networks, 2016 IEEE Symposium on Security
and Privacy (SP), (2016), pp. 582–597.
485. N. PAPERNOT, P. D. MCDANIEL, A. SINHA, AND M. P. WELLMAN, Towards the science of
security and privacy in machine learning, CoRR, abs/1611.03814 (2016).
486. M. J. A. PATWARY AND X. WANG, Sensitivity analysis on initial classifier accuracy in
fuzziness based semi-supervised learning, Inf. Sci., 490 (2019), pp. 93–112.
487. J. PAWLICK, E. COLBERT, AND Q. ZHU, A game-theoretic taxonomy and survey of defensive
deception for cybersecurity and privacy, ACM Comput. Surv., 52 (2019).
488. G. PEAKE AND J. WANG, Explanation mining: Post hoc interpretability of latent factor
models for recommendation systems, in Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, KDD ’18, New York, NY, USA,
2018, Association for Computing Machinery.
489. M. PELIKAN, D. E. GOLDBERG, AND E. CANTU-PAZ, Linkage problem, distribution
estimation, and bayesian networks, Evolutionary Computation, 8 (2000), pp. 311–340.
490. M. PERC AND A. SZOLNOKI, Coevolutionary games - A mini review, Biosyst., 99 (2010),
pp. 109–125.
491. C. PERLICH, F. PROVOST, AND J. S. SIMONOFF, Tree induction vs. logistic regression: A
learning-curve analysis, J. Mach. Learn. Res., 4 (2003).
492. D. PFAU AND O. VINYALS, Connecting generative adversarial networks and actor-critic
methods, CoRR, abs/1610.01945 (2016).
493. S. PIDHORSKYI, D. A. ADJEROH, AND G. DORETTO, Adversarial latent autoencoders, in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
2020, pp. 14104–14113.
494. L. PINTO, J. DAVIDSON, R. SUKTHANKAR, AND A. GUPTA, Robust adversarial reinforce-
ment learning, in Proceedings of the 34th International Conference on Machine Learning,
ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, 2017, pp. 2817–2826.
495. M. PIRLOT, General local search methods, European Journal of Operational Research, 92
(1996), pp. 493–511.
496. B. POOLE, A. ALEMI, J. SOHL-DICKSTEIN, AND A. ANGELOVA, Improved generator
objectives for gans, 2016.
497. A. PRAKASH AND M. P. WELLMAN, Empirical game-theoretic analysis for moving target
defense, in MTD@CCS, ACM, 2015, pp. 57–65.
498. D. PRUTHI, B. DHINGRA, AND Z. C. LIPTON, Combating adversarial misspellings with
robust word recognition, in Proceedings of the 57th Annual Meeting of the Association
for Computational Linguistics, Florence, Italy, July 2019, Association for Computational
Linguistics, pp. 5582–5591.
499. S. QIU, Q. LIU, S. ZHOU, AND C. WU, Review of artificial intelligence adversarial attack
and defense technologies, Applied Sciences, 9 (2019), p. 909.
500. A. RADFORD, L. METZ, AND S. CHINTALA, Unsupervised representation learning with deep
convolutional generative adversarial networks, arXiv preprint arXiv:1511.06434, (2015).
501. A. RAGHUNATHAN, J. STEINHARDT, AND P. LIANG, Certified defenses against adversarial
examples, in ICLR (Poster), OpenReview.net, 2018.
502. A. RAJABI, R. B. BOBBA, M. ROSULEK, C. V. WRIGHT, AND W.-C. FENG, On the
(im)practicality of adversarial perturbation for image privacy, Proceedings on Privacy Enhancing
Technologies, 2021 (2021), pp. 85–106.
503. S. RAJASEKARAN, On simulated annealing and nested annealing, Journal of Global Opti-
mization, 16 (2000), pp. 43–56.
504. A. RAKHLIN AND K. SRIDHARAN, Optimization, learning, and games with predictable
sequences, in NIPS, 2013, pp. 3066–3074.
505. D. RAM, T. SREENIVAS, AND K. SUBRAMANIAM, Parallel simulated annealing algorithms,
J. Parallel Distrib. Comput., 37 (1996), p. 207–212.
506. S. RASS, S. KONIG, AND S. SCHAUER, Defending against advanced persistent threats using
game-theory, PLOS ONE, 12 (2017), pp. 1–43.
507. L. RATLIFF, S. BURDEN, AND S. SASTRY, Characterization and computation of local Nash
equilibria in continuous games, in 51st Annual Allerton Conference on Communication,
Control, and Computing (Allerton), Oct. 2013, pp. 917–924.
508. J. RAUBER, W. BRENDEL, AND M. BETHGE, Foolbox: A python toolbox to benchmark the
robustness of machine learning models, in Reliable Machine Learning in the Wild Workshop,
34th International Conference on Machine Learning, 2017.
509. S. REN, Y. DENG, K. HE, AND W. CHE, Generating natural language adversarial examples
through probability weighted word saliency, in Proceedings of the 57th Annual Meeting of
the Association for Computational Linguistics, 2019, pp. 1085–1097.
510. S. REN, K. HE, R. B. GIRSHICK, AND J. SUN, Faster R-CNN: towards real-time object
detection with region proposal networks, CoRR, abs/1506.01497 (2015).
511. S. RENDLE, Factorization machines, in 2010 IEEE International Conference on Data Mining,
2010, pp. 995–1000.
512. I. REZEK, D. S. LESLIE, S. REECE, S. J. ROBERTS, A. ROGERS, R. K. DASH, AND N. R.
JENNINGS, On similarities between inference in game theory and machine learning, J. Artif.
Int. Res., 33 (2008), p. 259–283.
513. I. REZEK, D. S. LESLIE, S. REECE, S. J. ROBERTS, A. ROGERS, R. K. DASH, AND N. R.
JENNINGS, On similarities between inference in game theory and machine learning, J. Artif.
Intell. Res., 33 (2008).
514. M. T. RIBEIRO, S. SINGH, AND C. GUESTRIN, Why should i trust you?: Explaining the
predictions of any classifier, in Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, KDD ’16, New York, NY, USA,
2016, Association for Computing Machinery, p. 1135–1144.
515. ——, Why should i trust you?: Explaining the predictions of any classifier, in Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, KDD ’16, New York, NY, USA, 2016, Association for Computing Machinery,
p. 1135–1144.
516. M. T. RIBEIRO, S. SINGH, AND C. GUESTRIN, Anchors: High-precision model-agnostic
explanations, in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelli-
gence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and
the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18),
New Orleans, Louisiana, USA, February 2-7, 2018, 2018, pp. 1527–1535.
517. P. RICHTARIK, M. JAHANI, S. D. AHIPASAOGLU, AND M. TAKAC, Alternating maximiza-
tion: unifying framework for 8 sparse pca formulations and efficient parallel codes, 2020.
518. D. RIOS INSUA, R. NAVEIRO, AND V. GALLEGO, Perspectives on adversarial classification,
Mathematics, 8 (2020).
519. ——, Perspectives on adversarial classification, Mathematics, 8 (2020).
520. T. ROEDER AND F. B. SCHNEIDER, Proactive obfuscation, ACM Trans. Comput. Syst., 28
(2010), pp. 4:1–4:54.
521. Y. ROMANO, A. ABERDAM, J. SULAM, AND M. ELAD, Adversarial noise attacks of deep
learning architectures: Stability analysis via sparse-modeled signals, Journal of Mathemati-
cal Imaging and Vision, 62 (2020).
522. Y. ROMANO, A. ABERDAM, J. SULAM, AND M. ELAD, Adversarial noise attacks of deep
learning architectures: Stability analysis via sparse-modeled signals, J. Math. Imaging Vis.,
62 (2020), pp. 313–327.
523. J. ROMERO AND A. ASPURU-GUZIK, Variational quantum generators: Generative adversar-
ial quantum machine learning for continuous distributions, Advanced Quantum Technolo-
gies, 4 (2020), p. 2000003.
524. L. ROSASCO, E. DE VITO, A. CAPONNETTO, M. PIANA, AND A. VERRI, Are loss functions
all the same?, Neural Comput., 16 (2004).
525. K. ROSE, Deterministic annealing for clustering, compression, classification, regression, and
related optimization problems, Proceedings of the IEEE, 86 (1998), pp. 2210–2239.
526. I. ROSENBERG, A. SHABTAI, L. ROKACH, AND Y. ELOVICI, Generic black-box end-to-end
attack against state of the art API call based malware classifiers, in Research in Attacks,
Intrusions, and Defenses - 21st International Symposium, RAID 2018, Heraklion, Crete,
Greece, September 10-12, 2018, Proceedings, 2018, pp. 490–510.
527. A. ROSSLER, D. COZZOLINO, L. VERDOLIVA, C. RIESS, J. THIES, AND M. NIESSNER,
Faceforensics++: Learning to detect manipulated facial images, CoRR, abs/1901.08971
(2019).
528. S. ROTA BULO, B. BIGGIO, I. PILLAI, M. PELILLO, AND F. ROLI, Randomized prediction
games for adversarial machine learning, IEEE Transactions on Neural Networks and
Learning Systems, 28 (2017), pp. 2466–2478.
529. B. D. ROUHANI, M. SAMRAGH, M. JAVAHERIPI, T. JAVIDI, AND F. KOUSHANFAR,
Deepfense: Online accelerated defense against adversarial deep learning, in Proceedings of
the International Conference on Computer-Aided Design, ICCAD ’18, New York, NY, USA,
2018, Association for Computing Machinery.
530. S. ROY, C. ELLIS, S. SHIVA, D. DASGUPTA, V. SHANDILYA, AND Q. WU, A survey of game
theory as applied to network security, in 2010 43rd Hawaii International Conference on
System Sciences, 2010, pp. 1–10.
531. B. I. RUBINSTEIN, P. L. BARTLETT, L. HUANG, AND N. TAFT, Learning in a large
function space: Privacy-preserving mechanisms for svm learning, Journal of Privacy and
Confidentiality, 4 (2009), Article 4.
532. B. I. RUBINSTEIN, B. NELSON, L. HUANG, A. D. JOSEPH, S.-H. LAU, S. RAO, N. TAFT,
AND J. D. TYGAR, Antidote: Understanding and defending against poisoning of anomaly
detectors, in Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement,
IMC ’09, New York, NY, USA, 2009, ACM, pp. 1–14.
533. C. RUDIN, Stop explaining black box machine learning models for high stakes decisions and
use interpretable models instead, Nature Machine Intelligence, 1 (2019), pp. 206–215.
534. K. SADEGHI, A. BANERJEE, AND S. K. S. GUPTA, A system-driven taxonomy of attacks
and defenses in adversarial machine learning, IEEE Transactions on Emerging Topics in
Computational Intelligence, 4 (2020), pp. 450–467.
535. P. SAMANGOUEI, M. KABKAB, AND R. CHELLAPPA, Defense-gan: Protecting classifiers
against adversarial attacks using generative models, in International Conference on Learning
Representations, 2018.
536. S. SAMANTA AND S. MEHTA, Generating adversarial text samples, in Advances in Informa-
tion Retrieval - 40th European Conference on IR Research, ECIR 2018, Grenoble, France,
March 26-29, 2018, Proceedings, 2018, pp. 744–749.
537. W. SAMEK, G. MONTAVON, S. LAPUSCHKIN, C. J. ANDERS, AND K. MÜLLER, Explaining
deep neural networks and beyond: A review of methods and applications, Proc. IEEE, 109
(2021), pp. 247–278.
538. W. SAMEK, G. MONTAVON, A. VEDALDI, L. K. HANSEN, AND K. MÜLLER, eds., Explain-
able AI: Interpreting, Explaining and Visualizing Deep Learning, vol. 11700 of Lecture Notes
in Computer Science, Springer, 2019.
539. S. SANKARANARAYANAN, Y. BALAJI, C. D. CASTILLO, AND R. CHELLAPPA, Generate
to adapt: Aligning domains using generative adversarial networks, in 2018 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 2018, pp. 8503–8512.
540. T. SCHLEGL, P. SEEBÖCK, S. M. WALDSTEIN, U. SCHMIDT-ERFURTH, AND G. LANGS,
Unsupervised anomaly detection with generative adversarial networks to guide marker
discovery, (2017), pp. 146–157.
541. A. SCHLENKER, O. THAKOOR, H. XU, F. FANG, M. TAMBE, L. TRAN-THANH, P. VAYANOS,
AND Y. VOROBEYCHIK, Deceiving cyber adversaries: A game theoretic approach, in
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent
Systems, AAMAS ’18, Richland, SC, 2018, International Foundation for Autonomous Agents
and Multiagent Systems, p. 892–900.
558. K. SIMONYAN AND A. ZISSERMAN, Very deep convolutional networks for large-scale image
recognition, arXiv preprint arXiv:1409.1556, (2014).
559. A. SINHA, P. MALO, AND K. DEB, A review on bilevel optimization: From classical to
evolutionary approaches and applications, IEEE Transactions on Evolutionary Computation,
22 (2018), pp. 276–295.
560. A. SINHA, H. NAMKOONG, AND J. C. DUCHI, Certifying some distributional robustness with
principled adversarial training, in 6th International Conference on Learning Representations,
ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings,
OpenReview.net, 2018.
561. C. K. SØNDERBY, T. RAIKO, L. MAALØE, S. K. SØNDERBY, AND O. WINTHER, Ladder
variational autoencoders, in Advances in Neural Information Processing Systems, D. Lee,
M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, eds., vol. 29, Curran Associates, Inc.,
2016.
562. C. K. SØNDERBY, T. RAIKO, L. MAALØE, S. K. SØNDERBY, AND O. WINTHER, Ladder
variational autoencoders, in Proceedings of the 30th International Conference on Neural
Information Processing Systems, NIPS’16, Red Hook, NY, USA, 2016, Curran Associates
Inc.
563. J. SONG, H. REN, D. SADIGH, AND S. ERMON, Multi-agent generative adversarial imitation
learning, in Proceedings of the 32nd International Conference on Neural Information
Processing Systems, NIPS’18, Red Hook, NY, USA, 2018, Curran Associates Inc., p. 7472–
7483.
564. L. SONG, X. YU, H.-T. PENG, AND K. NARASIMHAN, Universal adversarial attacks with
natural triggers for text classification, arXiv preprint arXiv:2005.00174, (2020).
565. Y. SONG, T. KIM, S. NOWOZIN, S. ERMON, AND N. KUSHMAN, Pixeldefend: Leveraging
generative models to understand and defend against adversarial examples, in International
Conference on Learning Representations, 2018.
566. J. C. SPALL, Introduction to Stochastic Search and Optimization, John Wiley & Sons, Inc.,
New York, NY, USA, 1 ed., 2003.
567. P. SPIRTES, C. N. GLYMOUR, AND R. SCHEINES, Causation, prediction, and search, MIT
press, 2000.
568. P. SPRECHMANN AND G. SAPIRO, Dictionary learning and sparse coding for unsupervised
clustering, in 2010 IEEE International Conference on Acoustics, Speech and Signal Process-
ing, 2010, pp. 2042–2045.
569. A. SPURR, E. AKSAN, AND O. HILLIGES, Guiding infogan with semi-supervision, in
ECML/PKDD (1), vol. 10534 of Lecture Notes in Computer Science, Springer, 2017,
pp. 119–134.
570. S. SRA, S. NOWOZIN, AND S. J. WRIGHT, Optimization for Machine Learning, The MIT
Press, 2011.
571. N. SREBRO AND T. JAAKKOLA, Weighted low-rank approximations, in ICML, 2003.
572. B. K. SRIPERUMBUDUR, K. FUKUMIZU, A. GRETTON, B. SCHÖLKOPF, AND G. R. G.
LANCKRIET, On the empirical estimation of integral probability metrics, Electronic Journal
of Statistics, 6 (2012), pp. 1550–1599.
573. E. STRUMBELJ AND I. KONONENKO, An efficient explanation of individual classifications
using game theory, J. Mach. Learn. Res., 11 (2010).
574. ——, An efficient explanation of individual classifications using game theory, J. Mach. Learn.
Res., 11 (2010).
575. J. SU, Y. TSAI, K. SOHN, B. LIU, S. MAJI, AND M. CHANDRAKER, Active adversarial
domain adaptation, in 2020 IEEE Winter Conference on Applications of Computer Vision
(WACV), 2020, pp. 728–737.
576. M. SUGIYAMA, Distance approximation between probability distributions : Recent advances
in machine learning, Transactions of the Japan Society for Industrial and Applied Mathemat-
ics, 23 (2013), pp. 439–452.
577. M. SUGIYAMA, S. LIU, M. PLESSIS, M. YAMANAKA, M. YAMADA, T. SUZUKI, AND
T. KANAMORI, Direct divergence approximation between probability distributions and its
614. J. UESATO, B. O’DONOGHUE, P. KOHLI, AND A. VAN DEN OORD, Adversarial risk and
the dangers of evaluating against weak attacks, in Proceedings of the 35th International
Conference on Machine Learning, J. Dy and A. Krause, eds., vol. 80 of Proceedings of
Machine Learning Research, PMLR, 10–15 Jul 2018, pp. 5025–5034.
615. M. UMMELS, Stochastic multiplayer games: theory and algorithms, PhD thesis, RWTH
Aachen University, 2011.
616. M. USMAN, M. A. JAN, X. HE, AND J. CHEN, A survey on representation learning efforts in
cybersecurity domain, ACM Comput. Surv., 52 (2019).
617. L. VERDOLIVA, Media forensics and deepfakes: An overview, IEEE Journal of Selected
Topics in Signal Processing, 14 (2020), pp. 910–932.
618. A. VERMA, X. LLORA, D. E. GOLDBERG, AND R. H. CAMPBELL, Scaling genetic algorithms
using mapreduce, in 2009 Ninth International Conference on Intelligent Systems Design and
Applications, Nov 2009, pp. 13–18.
619. M. VIDYADHARI, K. KIRANMAI, K. R. KRISHNIAH, AND D. S. BABU, Security evaluation of
pattern classifiers under attack, International Journal of Research, 3 (2016), pp. 1043–1048.
620. P. VINCENT, H. LAROCHELLE, Y. BENGIO, AND P.-A. MANZAGOL, Extracting and com-
posing robust features with denoising autoencoders, in Proceedings of the 25th International
Conference on Machine Learning, ICML ’08, New York, NY, USA, 2008, Association for
Computing Machinery, p. 1096–1103.
621. P. VINCENT, H. LAROCHELLE, I. LAJOIE, Y. BENGIO, AND P.-A. MANZAGOL, Stacked
denoising autoencoders: Learning useful representations in a deep network with a local
denoising criterion, J. Mach. Learn. Res., 11 (2010).
622. Y. VOROBEYCHIK AND B. LI, Optimal randomized classification in adversarial settings, in
Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent
Systems, AAMAS ’14, Richland, SC, 2014, International Foundation for Autonomous Agents
and Multiagent Systems, p. 485–492.
623. Y. VOROBEYCHIK, M. P. WELLMAN, AND S. P. SINGH, Learning payoff functions in infinite
games, in IJCAI, Professional Book Center, 2005, pp. 977–982.
624. T.-H. VU, H. JAIN, M. BUCHER, M. CORD, AND P. PÉREZ, Advent: Adversarial entropy
minimization for domain adaptation in semantic segmentation, in CVPR, 2019.
625. E. WALLACE, S. FENG, N. KANDPAL, M. GARDNER, AND S. SINGH, Universal adversarial
triggers for attacking and analyzing NLP, in Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing, 2019, pp. 2153–2162.
626. B. WANG, H. PEI, B. PAN, Q. CHEN, S. WANG, AND B. LI, T3: Tree-autoencoder constrained
adversarial text generation for targeted attack, arXiv preprint arXiv:1912.10375, (2019).
627. B. WANG, Y. YAO, B. VISWANATH, H. ZHENG, AND B. Y. ZHAO, With great training comes
great vulnerability: Practical attacks against transfer learning, in 27th USENIX Security
Symposium (USENIX Security 18), Baltimore, MD, Aug. 2018, USENIX Association,
pp. 1281–1297.
628. F. WANG, W. LIU, AND S. CHAWLA, On sparse feature attacks in adversarial learning, in
2014 IEEE International Conference on Data Mining, Dec 2014, pp. 1013–1018.
629. K. WANG, C. GOU, Y. DUAN, Y. LIN, X. ZHENG, AND F. WANG, Generative adversarial
networks: introduction and outlook, IEEE/CAA Journal of Automatica Sinica, 4 (2017),
pp. 588–598.
630. L. WANG, W. CHO, AND K.-J. YOON, Deceiving image-to-image translation networks for
autonomous driving with adversarial perturbations, IEEE Robotics and Automation Letters,
5 (2020), pp. 1421–1428.
631. T. WANG AND Q. LIN, Hybrid predictive model: When an interpretable model collaborates
with a black-box model, CoRR, abs/1905.04241 (2019).
632. T. WANG, X. WANG, Y. QIN, B. PACKER, K. LI, J. CHEN, A. BEUTEL, AND E. CHI, Cat-
gen: Improving robustness in NLP models via controlled adversarial text generation, CoRR,
abs/2010.02338 (2020).
633. W. WANG, L. WANG, R. WANG, Z. WANG, AND A. YE, Towards a robust deep neural network
in texts: A survey, arXiv preprint arXiv:1902.07285, (2019).
634. X. WANG, C. HOANG, Y. VOROBEYCHIK, AND M. P. WELLMAN, Spoofing the limit order
book: A strategic agent-based analysis, Games, 12 (2021), p. 46.
635. X. WANG, L. LI, W. YE, M. LONG, AND J. WANG, Transferable attention for domain
adaptation, Proceedings of the AAAI Conference on Artificial Intelligence, 33 (2019),
pp. 5345–5352.
636. Y. WANG, Integration of data mining with game theory, in Knowledge Enterprise: Intelligent
Strategies in Product Design, Manufacturing, and Management, K. Wang, G. L. Kovacs,
M. Wozny, and M. Fang, eds., Boston, MA, 2006, Springer US, pp. 275–280.
637. Y. WANG, X. MA, J. BAILEY, J. YI, B. ZHOU, AND Q. GU, On the convergence and robustness
of adversarial training, in Proceedings of the 36th International Conference on Machine
Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of Proceedings of Machine
Learning Research, PMLR, 09–15 Jun 2019, pp. 6586–6595.
638. Z. WANG, A. BOVIK, H. SHEIKH, AND E. SIMONCELLI, Image quality assessment: from
error visibility to structural similarity, IEEE Transactions on Image Processing, 13 (2004),
pp. 600–612.
639. J. WEBB, Game Theory: Decisions, Interaction and Evolution, Springer, 2007.
640. X. WEI, S. LIANG, N. CHEN, AND X. CAO, Transferable adversarial attacks for image and
video object detection, in Proceedings of the Twenty-Eighth International Joint Conference
on Artificial Intelligence, IJCAI-19, International Joint Conferences on Artificial Intelligence
Organization, 7 2019, pp. 954–960.
641. T. WEISE, Global Optimization Algorithms - Theory and Application, self-published, Ger-
many, 2009.
642. M. P. WELLMAN, Methods for empirical game-theoretic analysis, in AAAI, AAAI Press,
2006, pp. 1552–1556.
643. M. P. WELLMAN, L. HONG, AND S. E. PAGE, The structure of signals: Causal interdepen-
dence models for games of incomplete information, in UAI, AUAI Press, 2011, pp. 727–735.
644. Y. WEN, B. LIU, R. XIE, J. CAO, AND L. SONG, Deep motion flow aided face video de-
identification, in 2021 IEEE International Conference on Visual Communications and Image
Processing, VCIP 2021, Dec 2021, pp. 269–272.
645. Y. WEN, B. LIU, R. XIE, Y. ZHU, J. CAO, AND L. SONG, A hybrid model for natural face
de-identification with adjustable privacy, in 2020 IEEE International Conference on Visual
Communications and Image Processing, VCIP 2020, Dec 2020, pp. 269–272.
646. J. WEXLER, M. PUSHKARNA, T. BOLUKBASI, M. WATTENBERG, F. VIEGAS, AND J. WIL-
SON, eds., The What-If Tool: Interactive Probing of Machine Learning Models, 2019.
647. L. D. WHITLEY, S. B. RANA, J. DZUBERA, AND K. E. MATHIAS, Evaluating evolutionary
algorithms, Artif. Intell., 85 (1996), pp. 245–276.
648. A. WIECZOREK, M. WIESER, D. MUREZZAN, AND V. ROTH, Learning sparse latent
representations with the deep copula information bottleneck, in 6th International Conference
on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018,
Conference Track Proceedings, OpenReview.net, 2018.
649. D. H. WOLPERT, The supervised learning no-free-lunch theorems, in Proc. 6th Online
World Conference on Soft Computing in Industrial Applications, 2001, pp. 25–42.
650. D. H. WOLPERT AND W. G. MACREADY, No free lunch theorems for optimization, IEEE
Transactions on Evolutionary Computation, 1 (1997), pp. 67–82.
651. ——, Coevolutionary free lunches, IEEE Transactions on Evolutionary Computation, 9
(2005), pp. 721–735.
652. E. WONG AND J. Z. KOLTER, Provable defenses against adversarial examples via the convex
outer adversarial polytope, in Proceedings of the 35th International Conference on Machine
Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, J. G. Dy
and A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research, PMLR, 2018,
pp. 5283–5292.
653. ——, Learning perturbation sets for robust machine learning, in 9th International Conference
on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, OpenRe-
view.net, 2021.
654. D. WU, Y. WANG, S.-T. XIA, J. BAILEY, AND X. MA, Skip connections matter: On the
transferability of adversarial examples generated with resnets, in ICLR, 2020.
655. X. WU, U. JANG, J. CHEN, L. CHEN, AND S. JHA, Reinforcing adversarial robustness using
model confidence induced by adversarial training, in Proceedings of the 35th International
Conference on Machine Learning, J. Dy and A. Krause, eds., vol. 80 of Proceedings of
Machine Learning Research, PMLR, 10–15 Jul 2018, pp. 5334–5342.
656. B. XI, Adversarial machine learning for cybersecurity and computer vision: Current devel-
opments and challenges, WIREs Computational Statistics, 12 (2020), p. e1511.
657. Y. XIAN, T. LORENZ, B. SCHIELE, AND Z. AKATA, Feature generating networks for zero-
shot learning, in 31st IEEE Conference on Computer Vision and Pattern Recognition (CVPR
2018), Salt Lake City, UT, USA, 2018.
658. Y. XIANG, Y. XU, Y. LI, W. MA, Q. XUAN, AND Y. LIU, Side-channel gray-box attack for
dnns, IEEE Transactions on Circuits and Systems II: Express Briefs, (2020), pp. 1–1.
659. C. XIAO, B. LI, J.-Y. ZHU, W. HE, M. LIU, AND D. SONG, Generating adversarial examples
with adversarial networks, (2018).
660. C. XIAO, B. LI, J.-Y. ZHU, W. HE, M. LIU, AND D. SONG, Generating adversarial examples
with adversarial networks, in Proceedings of the 27th International Joint Conference on
Artificial Intelligence, IJCAI’18, AAAI Press, 2018.
661. C. XIAO, J. ZHU, B. LI, W. HE, M. LIU, AND D. SONG, Spatially transformed adversarial
examples, in 6th International Conference on Learning Representations, ICLR 2018, Vancou-
ver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.
662. C. XIAO, J.-Y. ZHU, B. LI, W. HE, M. LIU, AND D. SONG, Spatially transformed adversarial
examples, in International Conference on Learning Representations, 2018.
663. H. XIAO, B. BIGGIO, G. BROWN, G. FUMERA, C. ECKERT, AND F. ROLI, Is feature selection
secure against training data poisoning?, in Proceedings of the 32nd International Conference
on Machine Learning, F. Bach and D. Blei, eds., vol. 37 of Proceedings of Machine Learning
Research, Lille, France, 07–09 Jul 2015, PMLR, pp. 1689–1698.
664. H. XIAO, B. BIGGIO, B. NELSON, H. XIAO, C. ECKERT, AND F. ROLI, Support vector
machines under adversarial label contamination, Neurocomput., 160 (2015), pp. 53–62.
665. C. XIE, J. WANG, Z. ZHANG, Y. ZHOU, L. XIE, AND A. L. YUILLE, Adversarial examples for
semantic segmentation and object detection, in IEEE International Conference on Computer
Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, 2017, pp. 1378–1387.
666. E. XING, M. JORDAN, S. J. RUSSELL, AND A. NG, Distance metric learning with application
to clustering with side-information, in Advances in Neural Information Processing Systems,
S. Becker, S. Thrun, and K. Obermayer, eds., vol. 15, MIT Press, 2003.
667. X. MA, B. LI, Y. WANG, S. M. ERFANI, S. WIJEWICKREMA, G. SCHOENEBECK, D. SONG,
M. E. HOULE, AND J. BAILEY, Characterizing adversarial subspaces using local intrinsic
dimensionality, in ICLR, 2018.
668. H. XU, C. CARAMANIS, AND S. MANNOR, Robustness and regularization of support vector
machines, Journal of Machine Learning Research, 10 (2009), pp. 1485–1510.
669. H. XU, C. CARAMANIS, AND S. MANNOR, Robustness and regularization of support vector
machines, J. Mach. Learn. Res., 10 (2009), p. 1485–1510.
670. H. XU AND S. MANNOR, Robustness and generalization, Mach. Learn., 86 (2012), p. 391–
423.
671. Q. XU, K. BELLO, AND J. HONORIO, A Le Cam type bound for adversarial learning and
applications, in 2021 IEEE International Symposium on Information Theory (ISIT), 2021,
pp. 1164–1169.
672. B. XUE, M. ZHANG, W. N. BROWNE, AND X. YAO, A survey on evolutionary computation
approaches to feature selection, IEEE Trans. Evol. Comput., 20 (2016), pp. 606–626.
673. H. XUE, B. LIU, M. DING, L. SONG, AND T. ZHU, Hiding private information in images from
AI, in ICC 2020 - 2020 IEEE International Conference on Communications (ICC), Dublin,
Ireland, IEEE, Jul 2020.
674. M. XUE, C. YUAN, H. WU, Y. ZHANG, AND W. LIU, Machine learning security: Threats,
countermeasures, and evaluations, IEEE Access, 8 (2020), pp. 74720–74742.
675. O. YAIR AND T. MICHAELI, Contrastive divergence learning is a time reversal adversarial
game, in ICLR, OpenReview.net, 2021.
676. Z. YAN, Y. GUO, AND C. ZHANG, Adversarial margin maximization networks, IEEE Trans.
Pattern Anal. Mach. Intell., 43 (2021), pp. 1129–1139.
677. C. H. YANG, Y. LIU, P. CHEN, X. MA, AND Y. J. TSAI, When causal intervention
meets adversarial examples and image masking for deep neural networks, in 2019 IEEE
International Conference on Image Processing (ICIP), Sep. 2019, pp. 3811–3815.
678. J. YANG, R. XU, R. LI, X. QI, X. SHEN, G. LI, AND L. LIN, An adversarial perturbation
oriented domain adaptation approach for semantic segmentation, in The Thirty-Fourth
AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative
Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium
on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA,
February 7-12, 2020, AAAI Press, 2020, pp. 12613–12620.
679. L. YANG, P. LI, Y. ZHANG, X. YANG, Y. XIANG, AND W. ZHOU, Effective repair strategy
against advanced persistent threat: A differential game approach, IEEE Transactions on
Information Forensics and Security, 14 (2019), pp. 1713–1728.
680. P. YANG, J. T. ORMEROD, W. LIU, C. MA, A. Y. ZOMAYA, AND J. Y. H. YANG, Adasampling
for positive-unlabeled and label noise learning with bioinformatics applications, IEEE Trans.
Cybern., 49 (2019), pp. 1932–1943.
681. D. YE, T. ZHU, S. SHEN, AND W. ZHOU, A differentially private game theoretic approach for
deceiving cyber adversaries, IEEE Trans. Inf. Forensics Secur., 16 (2021), pp. 569–584.
682. H.-J. YE, D.-C. ZHAN, AND Y. JIANG, Instance specific metric subspace learning: A bayesian
approach, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence,
AAAI’16, AAAI Press, 2016, p. 2272–2278.
683. ——, Instance specific metric subspace learning: A bayesian approach, Proceedings of the
AAAI Conference on Artificial Intelligence, 30 (2016).
684. S. YE, X. LIN, K. XU, S. LIU, H. CHENG, J.-H. LAMBRECHTS, H. ZHANG, A. ZHOU, K. MA,
AND Y. WANG, Adversarial robustness vs. model compression, or both?, 2019 IEEE/CVF
International Conference on Computer Vision (ICCV), (2019), pp. 111–120.
685. D. S. YEUNG, I. CLOETE, D. SHI, AND W. W. NG, Sensitivity Analysis for Neural Networks,
Springer Publishing Company, Incorporated, 1st ed., 2009.
686. Z. YIN, F. WANG, W. LIU, AND S. CHAWLA, Sparse feature attacks in adversarial learning,
IEEE Transactions on Knowledge and Data Engineering, PP (2018).
687. ——, Sparse feature attacks in adversarial learning, IEEE Transactions on Knowledge and
Data Engineering, 30 (2018), pp. 1164–1177.
688. Z. YIN, F. WANG, W. LIU, AND S. CHAWLA, Sparse feature attacks in adversarial learning,
IEEE Transactions on Knowledge and Data Engineering, 30 (2018), pp. 1164–1177.
689. Y. ZANG, F. QI, C. YANG, Z. LIU, M. ZHANG, Q. LIU, AND M. SUN, Word-level textual
adversarial attacking as combinatorial optimization, in Proceedings of the 58th Annual
Meeting of the Association for Computational Linguistics, 2020, pp. 6066–6080.
690. F. ZHANG, P. CHAN, B. BIGGIO, D. YEUNG, AND F. ROLI, Adversarial feature selection
against evasion attacks, IEEE Transactions on Cybernetics, 46 (2016), pp. 766–777.
691. F. ZHANG, P. P. K. CHAN, B. BIGGIO, D. S. YEUNG, AND F. ROLI, Adversarial feature
selection against evasion attacks, IEEE Trans. Cybernetics, 46 (2016), pp. 766–777.
692. H. ZHANG, Y. YU, J. JIAO, E. XING, L. E. GHAOUI, AND M. JORDAN, Theoretically prin-
cipled trade-off between robustness and accuracy, in Proceedings of the 36th International
Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov, eds., vol. 97 of
Proceedings of Machine Learning Research, PMLR, 09–15 Jun 2019, pp. 7472–7482.
693. J. ZHANG, B. HAN, G. NIU, T. LIU, AND M. SUGIYAMA, Where is the bottleneck of
adversarial learning with unlabeled data?, CoRR, abs/1911.08696 (2019).
694. J. ZHANG, Z.-H. ZHAN, Y. LIN, N. CHEN, Y.-J. GONG, J.-H. ZHONG, H. S.-H. CHUNG,
Y. LI, AND Y.-H. SHI, Evolutionary computation meets machine learning: A survey, IEEE
Comput. Intell. Mag., 6 (2011), pp. 68–75.
695. J. ZHANG, X. XU, B. HAN, G. NIU, L. CUI, M. SUGIYAMA, AND M. KANKANHALLI, Attacks
which do not kill training make adversarial learning stronger, in Proceedings of the 37th
International Conference on Machine Learning, H. D. III and A. Singh, eds., vol. 119 of
Proceedings of Machine Learning Research, PMLR, 13–18 Jul 2020, pp. 11278–11287.
696. J. ZHANG, Z. ZHAN, Y. LIN, N. CHEN, Y. GONG, J. ZHONG, H. S. H. CHUNG, Y. LI, AND
Y. SHI, Evolutionary computation meets machine learning: A survey, IEEE Computational
Intelligence Magazine, 6 (2011), pp. 68–75.
697. L. ZHANG, T. ZHU, P. XIONG, W. ZHOU, AND P. S. YU, More than privacy: Adopting
differential privacy in game-theoretic mechanism design, ACM Comput. Surv., 54 (2021).
698. X. ZHANG, L. ZHAO, A. P. BOEDIHARDJO, AND C.-T. LU, Online and distributed robust
regressions under adversarial data corruption, in 2017 IEEE International Conference on
Data Mining (ICDM), 2017, pp. 625–634.
699. Y. ZHANG AND B. C. WALLACE, A sensitivity analysis of (and practitioners’ guide to)
convolutional neural networks for sentence classification, in Proceedings of the Eighth
International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei,
Taiwan, November 27 - December 1, 2017 - Volume 1: Long Papers, G. Kondrak and
T. Watanabe, eds., Asian Federation of Natural Language Processing, 2017, pp. 253–263.
700. Y. ZHANG AND Z. WANG, Joint adversarial learning for domain adaptation in semantic
segmentation, in AAAI, 2020.
701. J. ZHAO, Y. KIM, K. ZHANG, A. RUSH, AND Y. LECUN, Adversarially regularized autoen-
coders, in Proceedings of the 35th International Conference on Machine Learning, J. Dy and
A. Krause, eds., vol. 80 of Proceedings of Machine Learning Research, PMLR, 10–15 Jul
2018, pp. 5902–5911.
702. J. J. ZHAO, M. MATHIEU, AND Y. LECUN, Energy-based generative adversarial networks,
in 5th International Conference on Learning Representations, ICLR 2017.
703. M. ZHAO, B. AN, Y. YU, S. LIU, AND S. J. PAN, Data poisoning attacks on multi-task
relationship learning, in Proceedings of the Thirty-Second AAAI Conference on Artificial
Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-
18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence
(EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, 2018, pp. 2628–2635.
704. P. ZHAO, P.-Y. CHEN, S. WANG, AND X. LIN, Towards query-efficient black-box adversary
with zeroth-order natural gradient descent., in AAAI, 2020, pp. 6909–6916.
705. S. ZHAO, J. SONG, AND S. ERMON, Learning hierarchical features from deep generative
models, in Proceedings of the 34th International Conference on Machine Learning - Volume
70, ICML’17, JMLR.org, 2017, p. 4091–4099.
706. Y. ZHAO, B. LIU, T. ZHU, M. DING, AND W. ZHOU, Private-encoder: Enforcing privacy in
latent space for human face images, Concurrency and Computation: Practice and Experience,
(2021), p. e6548.
707. Y. ZHONG AND W. DENG, Adversarial learning with margin-based triplet embedding
regularization, in Proceedings of the IEEE/CVF International Conference on Computer
Vision (ICCV), October 2019.
708. C. ZHOU AND R. C. PAFFENROTH, Anomaly detection with robust deep autoencoders, in
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, KDD ’17, New York, NY, USA, 2017, Association for Computing
Machinery.
709. Y. ZHOU AND M. KANTARCIOGLU, Modeling adversarial learning as nested stackelberg
games, in Advances in Knowledge Discovery and Data Mining, J. Bailey, L. Khan, T. Washio,
G. Dobbie, J. Z. Huang, and R. Wang, eds., Cham, 2016, Springer International Publishing,
pp. 350–362.
710. ——, Modeling adversarial learning as nested stackelberg games, in Proceedings, Part II, of
the 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining -
Volume 9652, PAKDD 2016, Berlin, Heidelberg, 2016, Springer-Verlag, p. 350–362.
711. Y. ZHOU, M. KANTARCIOGLU, AND B. THURAISINGHAM, Sparse bayesian adversarial
learning using relevance vector machine ensembles, in 2012 IEEE 12th International
Conference on Data Mining, 2012, pp. 1206–1211.
712. Y. ZHOU, M. KANTARCIOGLU, AND B. XI, A survey of game theoretic approach for
adversarial machine learning, WIREs Data Mining and Knowledge Discovery, 9 (2019),
p. e1259.
713. ——, A survey of game theoretic approach for adversarial machine learning, Wiley
Interdisciplinary Reviews: Data Mining and Knowledge Discovery, (2019).
714. Z. ZHOU, H. CAI, S. RONG, Y. SONG, K. REN, W. ZHANG, J. WANG, AND Y. YU,
Activation maximization generative adversarial nets, in International Conference on Learning
Representations, 2018.
715. C. ZHU, W. R. HUANG, A. SHAFAHI, H. LI, G. TAYLOR, C. STUDER, AND T. GOLD-
STEIN, Transferable clean-label poisoning attacks on deep neural nets, arXiv preprint
arXiv:1905.05897, (2019).
716. B. D. ZIEBART, A. MAAS, J. A. BAGNELL, AND A. K. DEY, Maximum entropy inverse rein-
forcement learning, in Proceedings of the 23rd National Conference on Artificial Intelligence
- Volume 3, AAAI’08, AAAI Press, 2008, p. 1433–1438.
717. H. ZOU, T. HASTIE, AND R. TIBSHIRANI, Sparse principal component analysis, Journal of
Computational and Graphical Statistics, 15 (2006), pp. 265–286.
718. ——, Sparse principal component analysis, Journal of Computational and Graphical Statis-
tics, 15 (2006), pp. 265–286.