ML Project 4 Final
Christian Both, Arun Rawlani, Andi Rayhan Chibrandy
ID: 260750691, ID: 260568533, ID: 260567400
[email protected], [email protected], [email protected]
Abstract—Deep neural networks (DNN) have achieved state-of-the-art performance on image recognition tasks. However, it has been observed that DNNs are vulnerable to small, targeted perturbations performed on the input image, producing what the literature refers to as adversarial examples. In this paper, we present a tutorial on the topic of adversarial examples in the context of deep neural networks. We first outline two attack methods from the literature and give the readers a demonstration of using these methods to attack various convolutional neural network architectures trained on various image datasets. We also discuss defense mechanisms and implement one of them to demonstrate their effectiveness against adversarial examples. We hope to inspire readers to use this knowledge to design more robust machine learning models.

I. INTRODUCTION

Deep neural networks have been gaining popularity in various machine learning tasks, in part due to advances made in research on neural networks and the availability of more capable machines suited to the computationally intensive task of training them. The ability of deep neural networks to represent higher-level concepts from low-level features and generalize them has, amongst other factors, made them suitable for machine learning tasks involving a highly complex input space, such as image classification.

It was first discussed in [1] that deep neural networks are vulnerable to versions of inputs with slight perturbations added, crafted intentionally to cause misclassification. Many explanations have been offered as to why deep neural networks possess this property, which appears to be in contradiction with their ability to generalize higher-level concepts. Further research in [2] suggested and provided some quantitative evidence that this vulnerability to adversarial examples is a result of the linearity of most of the components that build up a neural network.

In this tutorial, we aim to provide a basic overview of the research done so far on the topic of adversarial examples in the context of deep neural networks. We begin our tutorial by demonstrating a naive adversary in a toy example, which will help the readers build an intuition as to what adversarial examples are and why naive methods fail to generate them. We build upon the basic knowledge gained in Section II by giving a richer, more detailed description of the concepts we will be discussing next. We then, in Section IV, discuss the theory behind two of the methods found in the literature employed to generate adversarial examples and present two defense mechanisms against adversarial attacks. A demonstration of these methods can be found in Section V. The tutorial concludes with a discussion of more recent research on this topic and possible future research directions suggested in the literature.

II. MOTIVATION: A TOY EXAMPLE

Let us assume that we have a machine learning model M that classifies images. The way M works is that, given an input image, denoted X, M will output a list of class scores where each entry M_i denotes the likelihood of the input picture containing the object i that the entry specifies, like the following:

Scores(X) = [Chicken = 0.65, Duck = 0.25, ..., Car = 0.07]    (1)

We also assume that we have an adversary A that seeks to produce a version of X that looks like X, denoted X*, but will lead M into misclassifying the image X* by changing the likeliest class. We refer to X* as an adversarial example, which is formally explained in Section III. Unfortunately, A does not know much about the way machine learning classifiers work. He will therefore try to do this the naive way. He then comes up with the following algorithm. He first picks a target class, t, that he wants to output as the new likeliest class. Then, using brute-force search, he finds the combination of pixel changes that will try to maximize M_t and result in M classifying to his chosen target class (we can, for the sake of the example, assume A can query M as many times as he wants). Will A's algorithm work? What can we learn from this example about actual methods of crafting adversarial images?

One can see that there are several problems with A's algorithm. First of all, misclassification is only guaranteed if A does not object to making a lot of distortion overall to the image. However, the algorithm may then produce samples that look nothing like the original image X. The second problem is that A's algorithm is very inefficient: the running time is easily exponential just to produce a single image.

However, in the real world, adversaries like A are a lot smarter. As a result, they have better ways of generating these adversarial examples, which can negatively affect users utilizing these machine learning algorithms. We ask the reader to keep this toy example in mind as we explain more abstract concepts and attempt to answer these two questions: 1) What are smarter ways that an adversary can employ to produce adversarial examples? 2) How can we defend our systems against such adversaries?
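To make the toy example concrete, the sketch below shows what A's naive strategy might look like in code. It is an illustration only, assuming a hypothetical predict function that maps an image to a vector of class scores; because an exhaustive search over all pixel combinations is infeasible, the sketch stands in for it with random pixel changes under a fixed query budget, which already exposes both problems discussed above.

import numpy as np

def naive_attack(predict, image, target_class, max_queries=100_000, step=0.5):
    """A's naive adversary: keep changing random pixels and querying the model
    until the target class becomes the most likely one or the budget runs out.
    `predict` is a stand-in for querying M and returns a vector of class scores."""
    x = image.copy()
    for _ in range(max_queries):
        if np.argmax(predict(x)) == target_class:
            return x  # "success", but x may look nothing like the original image
        i = tuple(np.random.randint(s) for s in x.shape)   # pick a random pixel
        x[i] = np.clip(x[i] + np.random.choice([-step, step]), 0.0, 1.0)
    return None  # budget exhausted: no adversarial image found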
III. BACKGROUND INFORMATION

In order to answer the two questions above, one must possess a deeper understanding of the way our target models work. Below we briefly explain the required concepts to make it easier for the reader to understand the methods discussed in the following sections.

A. Deep Neural Networks

A deep neural network can be described as a very large neural network which is typically organized in multiple layers of individual computing units. Those individual units, called neurons, are connected by links with different weights and biases, which enables neural networks to model highly non-linear relationships [15]. Being equipped with a complex architecture of several hidden layers, deep neural networks are able to learn a higher-level representation of the input data layer by layer and are thus able to solve complex classification problems [4].

Neural networks in general are trained to minimize a loss function. One can think of this as a function that takes in input data along with the true labels and computes the quality of the network's prediction on that input data. Generally, the lower the value of the loss function, the better the predictive performance of the network.
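As a point of reference for this loss-minimization view, the sketch below builds a small Keras convolutional classifier and evaluates its cross-entropy loss on a batch. This is a generic TensorFlow 2 sketch rather than any of the networks used later in the paper; the 28x28 grayscale input shape and the random stand-in batch are assumptions for illustration.

import tensorflow as tf

# A small convolutional classifier; training amounts to minimizing L(w, x, y).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

# The loss takes the true labels and the network's predictions and scores how
# good the predictions are: the lower the value, the better the fit.
x_batch = tf.random.uniform((8, 28, 28, 1))               # stand-in for real images
y_batch = tf.random.uniform((8,), maxval=10, dtype=tf.int32)
print(loss_fn(y_batch, model(x_batch)).numpy())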
B. Adversarial Examples

Adversarial examples can be described as inputs that are specially crafted to cause a machine learning model to produce an incorrect output [14]. More formally, we define X as a legitimate input that is classified by a learner to a class Y:

F(X) = Y    (2)

We construct an adversarial example X* by adding a small perturbation vector δ_X to X, resulting in a misclassification Y*:

F(X + δ_X) = Y*    (3)

However, in order to produce a good adversarial example, we aim to find the smallest perturbation that produces the misclassification. This leads us to the following optimization problem to solve:

argmin_{δ_X} ||δ_X||  s.t.  F(X + δ_X) = Y*    (4)

However, it is non-trivial to solve this optimization problem due to the non-linear and non-convex properties of a DNN [22]. As a result, we employ techniques that can approximate a reduced perturbation to misclassify the input.

It is now understood that adversarial samples generated to attack a specific network architecture will mislead not only models of similar architectures, but also generalize across different representations [14]. For example, adversarial examples generated for a logistic regression model can also successfully cause a decision tree to misclassify. It is thus possible to successfully create adversarial examples without having access to the underlying model, which is termed a "black-box" attack [10].
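The definitions in equations 2-4 can be restated as a simple check: X* is a useful adversarial example if the classifier's prediction changes while the perturbation norm stays small. A minimal NumPy sketch of that check follows, with a hypothetical predict function standing in for F; the attacks discussed next only approximate the minimization in equation 4.

import numpy as np

def is_adversarial(predict, x, x_adv):
    """Return (did the predicted class change?, size of the perturbation delta_X).
    `predict` stands in for F and maps an input to a vector of class scores."""
    delta = x_adv - x
    changed = np.argmax(predict(x_adv)) != np.argmax(predict(x))
    return changed, np.linalg.norm(delta)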
C. Adversarial Machine Learning

It is important to put adversarial examples in the context of the recognized field of adversarial machine learning, which links the crafting of those artificial images to the bigger picture of threats in the real world. Discussing adversarial behaviour is especially relevant when talking about defense mechanisms against adversarial examples. Generally, the term adversarial machine learning deals with adversary attacks on a learning system, e.g. to make a classifier produce a false classification that benefits the adversary, and discusses defenses and countermeasures against such attacks [12]. As described in [11], those attacks will naturally occur whenever machine learning is used to prevent illegal or unsanctioned activity. It follows that whenever there is an economic incentive, the adversary will attempt to circumvent the protection provided. Examples given in [12] name, amongst others, spam filtering, network-intrusion detection and virus detection as applications which are sensitive to adversarial machine learning. As presented in [13], adversarial attacks on learners (e.g. classifiers) can be seen as a game between classifier and adversary to defeat the opponent's strategy. Although several potential scenarios have been pointed out as to how adversarial examples might be used to attack machine learning applications [8], [10], they do not necessarily imply an adversarial attack itself. They might even be intentionally created to improve the learner's ability to generalize, as discussed later in our tutorial.

D. Threats

As DNNs are already implemented in several applications and offer a promising field for more future applications in the real world, it is of importance to explore the potential harmfulness of adversarial examples. For example, it is pointed out in [10] that self-driving cars equipped with a DNN classifier to scan the environment and detect traffic signs could possibly cause accidents when signs requiring a certain action (e.g. a stop sign) are manipulated and thus misclassified. Other potentially harmful scenarios can be imagined in manipulating face-recognition systems with adversarial inputs, as well as in fraud detection systems.

IV. ATTACKING AND DEFENDING A NEURAL NETWORK

We take this opportunity to refer back to A's algorithm described in Section II. We recall that one of the biggest challenges the algorithm encounters was that it was doing a brute force search in order to find the desired combination of pixel-wise perturbations, resulting in an exponential running time. It also risks generating adversarial examples that do not resemble what natural samples from the original class would look like. Intuitively, if a smarter way of choosing how to perturb pixels can be used instead of a brute force search scheme, one can build an algorithm to produce adversarial examples that not only takes less time to run but can also potentially induce a smaller amount of perturbation, preventing human detectability.

With these targets in mind, we introduce two methods that are often employed to generate adversarial examples, followed by two defense mechanisms which can provide resistance against such attacks.
A. Fast Gradient Sign Method

Recall from Section II that we need a smarter way to pick a combination of pixel perturbations that will produce adversarial examples reliably. Recall also from subsection III-A that neural networks are typically trained to minimize a loss function. We can see that if we can produce samples that cause our loss function to increase when these samples are passed in as input, then we might have a way to produce adversarial examples. We can intuitively see the potential of gradient-based approaches here. Indeed, the Fast Gradient Sign Method operates along a similar line of approach.

It was found in [2] that taking the sign of the gradient of the loss function of a deep neural network with respect to its input, and applying perturbations based on this to input images, reliably produces adversarial examples. This approach is thus termed the "Fast Gradient Sign Method". Now we explain the formulation behind the fast gradient sign approach. Let w be the parameters or weights of the network, x the input to the network and y the target classes that we train our network with. We denote the cost function of our neural network as L(w, x, y). To generate an adversarial sample x_adv, the fast gradient sign method uses the following update rule:

x_adv = x + ε · sign(∇_x L(w, x, y))    (5)

In equation 5, ε denotes the magnitude of the perturbation we want applied to the image. The bigger the value of ε, the higher the amount of perturbation applied will be. This will result in more certain misclassification. However, too high a value will result in a sample that does not resemble the original image.

This approach was found to reliably produce adversarial examples using various datasets and classification algorithms while still being relatively cheap to compute: the gradient can be efficiently derived using the backpropagation algorithm. On top of this, the amount of perturbation applied can be controlled by changing the value of ε. This approach thus addresses in part some of the problems that the brute force algorithm in Section II faces.
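The update rule in equation 5 translates almost directly into code. The sketch below is a TensorFlow 2 version using a gradient tape; the paper's own experiments used cleverhans on TensorFlow 1, so treat this as an illustrative re-implementation rather than the exact code used later.

import tensorflow as tf

def fgsm(model, x, y, eps=0.04):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x L(w, x, y)).
    `x` is expected to be a float batch of images with pixel values in [0, 1],
    `y` the corresponding integer labels."""
    x = tf.convert_to_tensor(x)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        tape.watch(x)                      # gradients w.r.t. the input, not the weights
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)        # one step in the direction that increases the loss
    return tf.clip_by_value(x_adv, 0.0, 1.0)   # keep pixels in a valid range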
B. Jacobian Saliency Map Approach

The Fast Gradient Sign Approach is a good start to answering the first question asked in Section II. However, it still does not guarantee a misclassification once the perturbation has been added to the input, meaning that the adversarial examples may not be good enough to trick the system. Fortunately for the adversary A, there is a smarter approach to find effective perturbations that guarantee misclassification, at the expense of a longer running time.

This smarter approach is known as the Jacobian Saliency Map Approach (JSMA) and is another technique for generating adversarial examples for acyclic feed-forward DNNs [6]. The JSMA performs a source-target misclassification, where it specifically distorts the source input image from a particular class, s, to misclassify it to a chosen target class, t.

Unfortunately, the non-convexity and non-linearity properties of DNNs make it difficult to identify the set of optimal perturbations required to misclassify the input to the target class [22]. Consequently, this approach aims to find a suitable heuristic in order to enable efficient exploration of the adversarial-sample search space and find the most effective set of perturbations that will lead to the targeted misclassification. So what heuristic does JSMA employ to overcome the challenge of non-convexity and non-linearity in DNNs?

The JSMA strategically handles this challenge by exploiting the forward derivatives of the learned network. It focuses on the DNN's output with respect to changes in the input, which yields the required forward derivatives. The forward derivatives of the model M, learned by this network with n input features, are defined by the Jacobian matrix, which can be formulated as follows:

∇M(X) = [∂M(X)/∂x_1 ... ∂M(X)/∂x_n]    (6)

Forward derivatives assist the adversary by highlighting input features unlikely to produce adversarial examples. This helps the adversary to focus on features with larger forward derivative values that would yield the misclassification with a smaller degree of overall distortion.

To further develop the idea, the matrix in equation 6 is then utilized to form the adversarial saliency map. The adversarial saliency map assists the adversary A in finding the most efficient way to produce the targeted misclassification by indicating which features to perturb. Figure 1 provides a helpful visualization of an adversarial saliency map.

Fig. 1: Saliency map of a 784-dimensional input to the LeNet model. The 784 input dimensions are arranged to correspond to the 28x28 image (MNIST) pixel alignment. [6]

To understand how this approach is helpful, recall the toy example introduced at the beginning of our tutorial. We know that our model M outputs a vector with the class scores as shown in Equation 1. Then, the generated saliency map aids in increasing the class score of the target class, denoted as M_target(X), while decreasing it for all M_i where i ≠ target, until target = argmax_i M_i(X). This strategy facilitates finding the set of relevant features to perform perturbations on, such that it will lead to maximizing M_target(X) and finally misclassifying the input to the target class.

Additionally, the JSMA has two hyperparameters that govern its performance in different settings. The amount by which selected features are perturbed in each iteration is regulated by θ. Additionally, Υ controls the maximum distortion, i.e. the maximum number of iterations allowed on a sample, and limits the number of features that are perturbed to form the adversarial example. The value of these hyperparameters should change with respect to the data being handled by the DNN, as they keep the total distortion in check to avoid human detectability.

Now that we have seen that the adversary A could be smarter in producing its adversarial examples, the reader might believe that their classifier is always at risk. So that leaves us with this question: Is there a way to defend our machine learning model against such smart adversaries?
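To make the heuristic concrete, the sketch below greedily perturbs one input feature at a time using the forward derivatives of equation 6. It is a simplified, single-feature variant of the saliency heuristic, not the exact pairwise saliency map of [6]; the mapping of Υ (gamma) to an iteration budget and the positive-only perturbation of size θ are assumptions for illustration.

import tensorflow as tf

def jsma_simplified(model, x, target, theta=1.0, gamma=0.1):
    """Greedy, simplified JSMA sketch: perturb one feature at a time based on the
    Jacobian of the class scores w.r.t. the input (equation 6).
    `x` is a float image batch of shape (1, H, W, C) with values in [0, 1]."""
    x_adv = tf.Variable(tf.convert_to_tensor(x))
    n_features = int(tf.size(x_adv))
    max_iters = int(gamma * n_features)                    # Upsilon: cap on distorted features
    for _ in range(max_iters):
        with tf.GradientTape() as tape:
            scores = model(x_adv)                          # forward pass, class scores
        jac = tape.jacobian(scores, x_adv)                 # forward derivatives
        jac = tf.reshape(jac[0], (scores.shape[-1], -1))   # (classes, features)
        # Saliency: prefer features that raise the target score and lower the others.
        saliency = jac[target] - (tf.reduce_sum(jac, axis=0) - jac[target])
        idx = int(tf.argmax(saliency))                     # most salient feature
        flat = tf.reshape(x_adv, (-1,)).numpy()
        flat[idx] = min(flat[idx] + theta, 1.0)            # perturb it by theta
        x_adv.assign(tf.reshape(flat, x_adv.shape))
        if int(tf.argmax(model(x_adv)[0])) == target:      # stop once the target class wins
            break
    return tf.convert_to_tensor(x_adv)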
C. Adversarial Training

In order to give an intuition about the first defense mechanism introduced, recall that adversarial examples do not reflect naturally occurring images but confront the classifier with artificially crafted worst-case perturbations which expose unlearned "blind spots" in the learned model [2]. Thus, a first instinctive approach is to use those examples and confront the model with them again to force correct classification of the corrupted input.

The concept of adversarial training develops this idea further into a systematic training procedure for neural networks where adversarial examples are used for augmentation of the training set. This data augmentation method differs from other known data augmentation schemes, such as translating or rotating legitimate examples, which reflect the reproduction of legitimate inputs in different shapes. However, similar to natural data set augmentation, the aim is to create a regularization effect where the classifier learns to generalize better using the augmented dataset. This regularization can be achieved by adding an additional term when calculating the loss function. The modified loss function has the form:

L_new(w, x, y) = αL(w, x, y) + (1 − α)L(w, x + ε · sign(∇_x L(w, x, y)), y)    (7)

Balanced by the factor α, the loss of the adversarial example is added to the loss of its original example, so that correct classification of both the legitimate and the corrupted input is enforced.

The procedure of adversarial training is visualized in Figure 2 (own visualization, created after [2]) and can be understood as an iterative approach. When training a neural network, adversarial examples are created after each training epoch and added to the data set for future training. Note that the new loss function shown in equation 7 simulates the process of generating adversarial examples using the Fast Gradient Sign Method and injecting them into the training set.

Fig. 2: Principle of Adversarial Training
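Equation 7 can be implemented as a custom training step that mixes the loss on the clean batch with the loss on its FGSM counterpart. The TensorFlow 2 sketch below reuses the fgsm helper sketched in Section IV-A; the Adam optimizer and the default values of α and ε are assumptions, not the settings used in our experiments.

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def adversarial_train_step(model, x, y, alpha=0.5, eps=0.04):
    """One step of adversarial training using the modified loss of equation 7."""
    x_adv = fgsm(model, x, y, eps)                          # craft adversarial versions of the batch
    with tf.GradientTape() as tape:
        loss_clean = loss_fn(y, model(x, training=True))
        loss_adv = loss_fn(y, model(x_adv, training=True))
        loss = alpha * loss_clean + (1.0 - alpha) * loss_adv   # L_new from equation 7
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss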
D. Defensive Distillation

Defensive distillation is another method to mitigate the effect of adversarial examples, developed in [9]. It builds on the concept of distilling knowledge from deep architectures introduced in [8], where a learner consisting of a small architecture was used to mimic the output of a large, computationally expensive model. However, defensive distillation uses the interaction between two model architectures differently. Here, the aim is to train a model to reproduce the probabilistic labels created by the initial model, which makes the second, distilled model more resilient against adversarial examples. The principle of defensive distillation is presented in Figure 3 (own visualization, created after [9]). The steps leading to a distilled network architecture can be summarized as follows:

Fig. 3: Principle of Defensive Distillation

• Train an initial neural network using discrete labels (one single non-zero element in the output vector Y, which corresponds to the correct class)
• Use a softmax layer which outputs probability scores for each training sample belonging to a particular class (e.g. P(car) = 0.95, P(ship) = 0.03, P(tree) = 0.02, ...)
• Train a second neural network of identical architecture using the same training set, with the difference of assigning the previously received probability scores as new labels for each training sample

Using the probabilistic training labels instead of discrete labels has the following beneficial effect: the weights of the distilled model are prevented from fitting too tightly to the training data. Defensive distillation thus contributes to a better generalization, which is resilient against the small worst-case perturbations of adversarial examples, resulting in correctly classifying them with a high accuracy.
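The steps above can be sketched as two Keras training runs in which the second network is fit on the softmax outputs of the first. This is a plain illustration only: the original defense in [9] additionally raises the softmax temperature during training, which is omitted here for brevity, and make_model is a hypothetical factory returning two networks of identical architecture.

import tensorflow as tf

def defensive_distillation(make_model, x_train, y_train, epochs=10):
    """Two-stage sketch: train a teacher on hard labels, then train a student of
    identical architecture on the teacher's probability vectors (soft labels)."""
    teacher = make_model()
    teacher.compile(optimizer="adam",
                    loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    teacher.fit(x_train, y_train, epochs=epochs)            # step 1: discrete labels

    soft_labels = teacher.predict(x_train)                  # step 2: softmax probability scores

    student = make_model()                                  # step 3: identical architecture,
    student.compile(optimizer="adam",                       # trained on the soft labels
                    loss="categorical_crossentropy", metrics=["accuracy"])
    student.fit(x_train, soft_labels, epochs=epochs)
    return student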
V. METHODS IN ACTION

In this section we give readers a simulation of a run-through of the attack methods outlined above on various network architectures. We then follow up with an implementation of the Adversarial Training defense mechanism to demonstrate that it is possible to defend your machine against such attacks. The first subsection below gives an overview of the datasets and neural network architectures we used to simulate our methods. The attack and defense mechanisms above are implemented using the cleverhans library [20] with TensorFlow [21] as the backend.

A. Overview of Datasets and Architectures

In this section we give readers the opportunity to learn more about the datasets and the neural network architectures we used when giving practical examples of the attack and defense methods:

1) Dataset:
• MNIST: This dataset consists of 28x28 pixel black and white images of handwritten digits [16]. There are 10 classes, each representing a digit from 0 to 9. There are 60 000 labeled images in the training set and 10 000 labeled images in the test set.
• CIFAR10: This dataset consists of 32x32 RGB images spread over 10 categories [17]. The training set consists of 50 000 labeled images and the test set consists of 10 000 labeled images.
• STL10: This dataset is similar to the CIFAR10 dataset but it has different classes and it contains images that are bigger in resolution - 96 x 96 pixels [18]. The training set contains 5 000 labeled images and the test set contains 8 000 labeled images. For this tutorial, we combine the original training and testing sets to build a dataset consisting of 13 000 labeled images. We then perform a 90-10 split to obtain a training dataset consisting of 11 700 samples and a testing set with 1 300 samples, with the distribution of classes being very similar in the training and test sets.

2) Neural Network Architectures: We explore two different neural network architectures. The first one is a simple deep convolutional neural network architecture that we built ourselves. The second one is a pre-trained VGG19 model [19]. This is a much more complex neural network architecture with 19 layers. We decided to use more than one architecture in this tutorial because we wanted to illustrate through examples that vulnerability to adversarial examples is not specific to a particular neural network architecture. Along this line of reasoning, we decided to use a simple architecture along with a much more complicated one. A summary of the architecture of the simpler network can be seen in Table III in the appendix.
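For reference, the simple architecture summarized in Table III could be written in Keras roughly as follows. This is a sketch based directly on the table; details the table does not specify, such as the valid padding, the Flatten layer, the ReLU activation on the 512-node layer and the per-dataset input shape, are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def build_custom_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Sketch of the custom network in Table III (MNIST would use (28, 28, 1))."""
    return tf.keras.Sequential([
        layers.Conv2D(64, (3, 3), activation="relu", input_shape=input_shape),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.5),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])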
B. Fast Gradient Sign Method

In order to give an example of an implementation to the readers, we simulated a run of this algorithm to generate adversarial examples from various datasets. This implementation follows the pseudo code outlined in Figure 4. On the CIFAR10 and MNIST datasets we trained our own neural network architecture for 10 epochs. On the STL10 dataset, a more complicated model was required to achieve a reasonable level of accuracy, so we trained the VGG19 network on this dataset for 20 epochs. We outline the statistics of the attacks in Table I.

Fig. 4: Algorithm for Generating Adversarial Examples using FGSM, where M is the model, X the input, Y the set of labels and epsilon the degree of perturbation applied.

In Figure 5, we can see one example of an image and its adversarial version that was generated using this approach. The image is from the STL10 dataset. Here the distortion is clearly visible. However, the adversarial sample still largely contains the characteristics of the original image.

Fig. 5: Example of an image from the STL10 dataset and its adversarial version generated using the fast gradient sign method with ε=0.04. (a) Original Image (b) Adversarial Image
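For completeness, generating these samples with the cleverhans library [20] looked roughly like the following under its v1/v2-era API. Class names and keyword arguments changed between cleverhans releases, so treat this as an approximate sketch rather than the exact code we ran; keras_model, sess and x_test are assumed to be a trained Keras model, a TensorFlow session and a float image array in [0, 1].

import tensorflow as tf
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

# Wrap the trained Keras model so cleverhans can query its logits/probabilities.
wrapped = KerasModelWrapper(keras_model)
fgsm_attack = FastGradientMethod(wrapped, sess=sess)
x_adv = fgsm_attack.generate_np(x_test, eps=0.04, clip_min=0.0, clip_max=1.0)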
C. Jacobian Saliency Map Approach

Now we perform an implementation of the JSMA algorithm discussed previously to generate adversarial examples for the three datasets defined in Section V-A, where the source input image was distorted to misclassify it to the target class.

We used the algorithm shown in Figure 6 for our implementation of JSMA. For each dataset, the algorithm for JSMA was adapted to work with the various resolutions and color channels that were available.
The algorithm then chooses the input features, i.e. pixel intensities in the case of image classification problems, to be perturbed. For example, in the case of the CIFAR10 dataset, the number of input rows, input columns and color channels was set to 64, 64 and 3, respectively. Furthermore, the sizes of the filters in the pooling layer of the network were also adjusted to work well with this dataset, because the default values were set to work with the MNIST dataset.

Additionally, the algorithm used different combinations of hyperparameter values for different datasets. Depending on the resolution of the input image, the Υ parameter adjusts the maximum number of iterations to increase proportionally with the number of pixels in the input image. The reason for this is that Υ monitors the percentage by which the image may be distorted to craft the adversarial example. As the resolution gets better, the algorithm can afford to change a higher percentage while avoiding human detection. This phenomenon is further validated by Figure 7, where the lower resolution of the CIFAR10 images limits the value of Υ, which makes it harder to form a good adversarial example that produces a misclassification while remaining imperceptible to the human eye.

Fig. 7: Example of a CIFAR10 image and its adversarial version generated using the JSMA approach with Υ=0.1 and θ=+1. (a) Original Image (b) Adversarial Image
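An analogous cleverhans-based run for JSMA might look as follows, again as an approximation of the v1/v2-era API rather than the exact code we used. Here θ and Υ map to the theta and gamma parameters, and target_class is a hypothetical integer selecting the target class t for a single input x of shape (1, rows, cols, channels).

import numpy as np
from cleverhans.attacks import SaliencyMapMethod
from cleverhans.utils_keras import KerasModelWrapper

# Assumes a trained Keras model `keras_model`, a TF session `sess` and 10 classes.
jsma_attack = SaliencyMapMethod(KerasModelWrapper(keras_model), sess=sess)
one_hot_target = np.zeros((1, 10))
one_hot_target[0, target_class] = 1
x_adv = jsma_attack.generate_np(x, theta=1.0, gamma=0.1,
                                clip_min=0.0, clip_max=1.0,
                                y_target=one_hot_target)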
B. Effectiveness of Current Defense Mechanisms

Defending against adversarial attacks is a hard problem. The fact that most machine learning models are not designed and trained with the assumption that an adversary is present also does not help alleviate this problem.

The defense mechanisms so far seem to handle the attacks that we described above quite well. We showed that a premise as simple as Adversarial Training can achieve promising results when it comes to warding off adversarial examples.

However, most defense mechanisms so far aim to handle attack methods carried out under specific paradigms. For example, adversarial training handles FGSM attacks well and defensive distillation handles JSMA attacks well. Goodfellow and Papernot refer to this as playing a "game of whack-a-mole" [23]. That is to say, defense mechanisms proposed so far address only specific types of vulnerabilities but not all of them. These defense mechanisms are not adaptive to different types of attacks. It also seems that as researchers discover

Responsible for writing the description of the Jacobian Based Saliency Approach. Implemented and simulated the JSMA on three different datasets. Created some of the graphs and images used in the paper. Also contributed to the Background Information and Discussion sections of this paper.

C. Andi

Chiefly responsible for the simulation of the fast gradient sign method and parts of the paper pertaining to it. Generated results for Adversarial Training. Contributed to the writing of the toy example and the discussion of attack and defense methods.

D. All

Made the paper fit the description of a tutorial better as opposed to a regular report. Proof-read and performed many revisions of the paper.

We hereby state that all the work presented in this report is that of the Authors.
REFERENCES

[1] Szegedy, C., Zaremba, W., Sutskever, I. et al. (2014). Intriguing properties of neural networks. ICLR, abs/1312.6199, 2014b. URL http://arxiv.org/abs/1312.6199.
[2] Goodfellow, I., Shlens, J. and Szegedy, C. (2015). Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations.
[3] Krizhevsky, A., Sutskever, I. and Hinton, G. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pages 1106-1114.
[4] Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press, http://www.deeplearningbook.org.
[5] Deng, J., Dong, W., Socher, R. et al. (2009). Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, CVPR 2009, IEEE Conference on, pages 248-255.
[6] Papernot, N., McDaniel, P., Jha, S. et al. (2016). The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium.
[7] Moosavi-Dezfooli, S., Fawzi, A. and Frossard, P. (2015). DeepFool: a simple and accurate method to fool deep neural networks. CoRR (2015), vol. abs/1511.04599.
[8] Hinton, G., Vinyals, O. and Dean, J. (2014). Distilling the knowledge in a neural network. In Deep Learning and Representation Learning Workshop at NIPS 2014. arXiv preprint arXiv:1503.02531, 2014.
[9] Papernot, N., McDaniel, P., Wu, X. et al. (2016). Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on (pp. 582-597). IEEE.
[10] Papernot, N., McDaniel, P., Goodfellow, I. et al. (2017). Practical Black-Box Attacks against Machine Learning. ASIA CCS '17 Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506-519.
[11] Laskov, P. and Lippmann, R. (2010). Machine learning in adversarial environments. Mach Learn (2010) 81: 115. doi:10.1007/s10994-010-5207-6.
[12] Huang, L., Joseph, A. D., Nelson, B. et al. (2011). Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and artificial intelligence (pp. 43-58). ACM.
[13] Dalvi, N., Domingos, P., Sanghai, S. et al. (2004). Adversarial classification. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 99-108. ACM.
[14] Papernot, N., McDaniel, P. and Goodfellow, I. (2016). Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. ArXiv e-prints, May 2016b. URL http://arxiv.org/abs/1605.07277.
[15] Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning 2.1 (2009): 1-127.
[16] LeCun, Y., Cortes, C. and Burges, C.J. (1998). The MNIST database of handwritten digits.
[17] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images.
[18] Coates, A., Lee, H. and Ng, A.Y. (2010). An analysis of single-layer networks in unsupervised feature learning. Ann Arbor, 1001(48109), p.2.
[19] Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[20] Papernot, N., Goodfellow, I., Sheatsley, R. et al. (2016). cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768.
[21] Abadi, M., Agarwal, A., Barham, P. et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
[22] Larochelle, H., Bengio, Y., Louradour, J. et al. (2009). Exploring strategies for training deep neural networks. Journal of Machine Learning Research 10, no. Jan (2009): 1-40.
[23] Goodfellow, I. and Papernot, N. (2017, February 15). Is attacking machine learning easier than defending it? [Blog post]. Retrieved from http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html
APPENDIX A
ARCHITECTURE OF CUSTOM NEURAL NETWORK

TABLE III: Architecture of the custom neural network used for the MNIST and CIFAR10 datasets
Convolutional, 64 filters, 3x3 filter size, ReLU
Convolutional, 128 filters, 3x3 filter size, ReLU
Convolutional, 128 filters, 3x3 filter size, ReLU
MaxPool, 2x2 filter size
Dropout, 0.25
Convolutional, 64 filters, 3x3 filter size, ReLU
Convolutional, 128 filters, 3x3 filter size, ReLU
Convolutional, 128 filters, 3x3 filter size, ReLU
MaxPool, 2x2 filter size
Dropout, 0.5
Fully-Connected, 512 nodes
Fully-Connected, 10 nodes, softmax activation