ML Project 4 Final
Christian Both, Arun Rawlani, Andi Rayhan Chibrandy
ID: 260750691, ID: 260568533, ID: 260567400
[email protected], [email protected], [email protected]
Abstract—Deep neural networks (DNN) have achieved state-of-the-art performance on image recognition tasks. However, it has been observed that DNNs are vulnerable to small, targeted perturbations performed on the input image, producing what the literature refers to as adversarial examples. In this paper, we present a tutorial on the topic of adversarial examples in the context of deep neural networks. We first outline two attack methods from the literature and give the readers a demonstration of using these methods to attack various convolutional neural network architectures trained on various image datasets. We also discuss defense mechanisms and implement one of them to demonstrate their effectiveness against adversarial examples. We hope to inspire readers to use this knowledge to design more robust machine learning models.

I. INTRODUCTION

Deep neural networks have been gaining popularity in various machine learning tasks, in part due to advances made in research on neural networks and the availability of more capable machines suited to the computationally intensive task of training them. The ability of deep neural networks to represent higher-level concepts from low-level features and generalize them has, amongst other factors, made them suitable for machine learning tasks involving a highly complex input space, such as image classification.

It was first discussed in [1] that deep neural networks are vulnerable to versions of inputs with slight perturbations added, crafted intentionally to cause misclassification. Many explanations have been offered as to why deep neural networks possess this property, which appears to be in contradiction with their ability to generalize higher-level concepts. Further research in [2] suggested and provided some quantitative evidence that this vulnerability to adversarial examples is a result of the linearity of most of the components that build up a neural network.

In this tutorial, we aim to provide a basic overview of the research done so far on the topic of adversarial examples in the context of deep neural networks. We begin our tutorial by demonstrating a naive adversary in a toy example, which will help the readers build an intuition as to what adversarial examples are and why naive methods fail to generate them. We build upon the basic knowledge gained in Section II by giving a richer, more detailed description of the concepts we will be discussing next. We then, in Section IV, discuss the theory behind two of the methods found in the literature employed to generate adversarial examples and present two defense mechanisms against adversarial attacks. A demonstration of these methods can be found in Section V. The tutorial concludes with a discussion of more recent research on this topic and possible future research directions suggested in the literature.

II. MOTIVATION: A TOY EXAMPLE

Let us assume that we have a machine learning model M that classifies images. The way M works is that, given an input image, denoted X, M will output a list of class scores where each entry M_i denotes the likelihood of the input picture containing the object i that the entry specifies, like the following:

Scores(X) = [Chicken = 0.65, Duck = 0.25, ..., Car = 0.07]    (1)

We also assume that we have an adversary A that seeks to produce a version of X that looks like X, denoted X*, but will lead M into misclassifying the image X* by changing the likeliest class. We refer to X* as an adversarial example, which is formally explained in Section III. Unfortunately, A does not know much about the way machine learning classifiers work. He will therefore try to do this the naive way. He then comes up with the following algorithm. He first picks a target class, t, that he wants to output as the new likeliest class. Then, using brute-force search, he finds the combination of pixel changes that will try to maximize M_t and result in M classifying to his chosen target class (we can, for the sake of the example, assume A can query M as many times as he wants). Will A's algorithm work? What can we learn from this example about actual methods of crafting adversarial images?

One can see that there are several problems with A's algorithm. First of all, misclassification is only guaranteed if A does not object to making a lot of distortion overall to the image. However, the algorithm may then produce samples that look nothing like the original image X. The second problem is that A's algorithm is very inefficient: the running time is easily exponential just to produce a single image.

However, in the real world, adversaries like A are a lot smarter. As a result, they have better ways of generating these adversarial examples, which can negatively affect users utilizing these machine learning algorithms. We ask the reader to keep this toy example in mind as we explain more abstract concepts and attempt to answer these two questions: 1) What are smarter ways that an adversary can employ to produce adversarial examples? 2) How can we defend our systems against such adversaries?
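To make the toy example concrete, the sketch below shows what A's naive strategy might look like in code. It is an illustration only, assuming a hypothetical predict function that maps an image to a vector of class scores; because an exhaustive search over all pixel combinations is infeasible, the sketch stands in for it with random pixel changes under a fixed query budget, which already exposes both problems discussed above.

import numpy as np

def naive_attack(predict, image, target_class, max_queries=100_000, step=0.5):
    """A's naive adversary: keep changing random pixels and querying the model
    until the target class becomes the most likely one or the budget runs out.
    `predict` is a stand-in for querying M and returns a vector of class scores."""
    x = image.copy()
    for _ in range(max_queries):
        if np.argmax(predict(x)) == target_class:
            return x  # "success", but x may look nothing like the original image
        i = tuple(np.random.randint(s) for s in x.shape)   # pick a random pixel
        x[i] = np.clip(x[i] + np.random.choice([-step, step]), 0.0, 1.0)
    return None  # budget exhausted: no adversarial image found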
III. BACKGROUND INFORMATION

In order to answer the two questions above, one must possess a deeper understanding of the way our target models work. Below we briefly explain the required concepts to make it easier for the reader to understand the methods discussed in the following sections.

A. Deep Neural Networks

A deep neural network can be described as a very large neural network which is typically organized in multiple layers of individual computing units. Those individual units, called neurons, are connected by links with different weights and biases, which enables neural networks to model highly non-linear relationships [15]. Being equipped with a complex architecture of several hidden layers, deep neural networks are able to learn a higher-level representation of the input data layer by layer and are thus able to solve complex classification problems [4].

Neural networks in general are trained to minimize a loss function. One can think of this as a function that takes in input data along with the true labels and computes the quality of the network's prediction on that input data. Generally, the lower the value of the loss function, the better the predictive performance of the network.
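As a point of reference for this loss-minimization view, the sketch below builds a small Keras convolutional classifier and evaluates its cross-entropy loss on a batch. This is a generic TensorFlow 2 sketch rather than any of the networks used later in the paper; the 28x28 grayscale input shape and the random stand-in batch are assumptions for illustration.

import tensorflow as tf

# A small convolutional classifier; training amounts to minimizing L(w, x, y).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

# The loss takes the true labels and the network's predictions and scores how
# good the predictions are: the lower the value, the better the fit.
x_batch = tf.random.uniform((8, 28, 28, 1))               # stand-in for real images
y_batch = tf.random.uniform((8,), maxval=10, dtype=tf.int32)
print(loss_fn(y_batch, model(x_batch)).numpy())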
B. Adversarial Examples

Adversarial examples can be described as inputs that are specially crafted to cause a machine learning model to produce an incorrect output [14]. More formally, we define X as a legitimate input that is classified by a learner to a class Y:

F(X) = Y    (2)

We construct an adversarial example X* by adding a small perturbation vector δ_X to X, resulting in a misclassification Y*:

F(X + δ_X) = Y*    (3)

However, in order to produce a good adversarial example, we aim to find the smallest perturbation that produces the misclassification. This leads us to the following optimization problem to solve:

argmin_{δ_X} ||δ_X||  s.t.  F(X + δ_X) = Y*    (4)

However, it is non-trivial to solve this optimization problem due to the non-linear and non-convex properties of a DNN [22]. As a result, we employ techniques that can approximate a reduced perturbation to misclassify the input.

It is now understood that adversarial samples generated to attack a specific network architecture will mislead not only models of similar architectures, but also generalize across different representations [14]. For example, adversarial examples generated for a logistic regression model can also successfully cause a decision tree to misclassify. It is thus possible to successfully create adversarial examples without having access to the underlying model, which is termed a "black-box" attack [10].
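The definitions in equations 2-4 can be restated as a simple check: X* is a useful adversarial example if the classifier's prediction changes while the perturbation norm stays small. A minimal NumPy sketch of that check follows, with a hypothetical predict function standing in for F; the attacks discussed next only approximate the minimization in equation 4.

import numpy as np

def is_adversarial(predict, x, x_adv):
    """Return (did the predicted class change?, size of the perturbation delta_X).
    `predict` stands in for F and maps an input to a vector of class scores."""
    delta = x_adv - x
    changed = np.argmax(predict(x_adv)) != np.argmax(predict(x))
    return changed, np.linalg.norm(delta)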
C. Adversarial Machine Learning

It is important to put adversarial examples in the context of the recognized field of adversarial machine learning, which links the crafting of those artificial images to the bigger picture of threats in the real world. Discussing adversarial behaviour is especially relevant when talking about defense mechanisms against adversarial examples. Generally, the term adversarial machine learning deals with adversary attacks on a learning system, e.g. to make a classifier produce a false classification that benefits the adversary, and discusses defenses and countermeasures against such attacks [12]. As described in [11], those attacks will naturally occur whenever machine learning is used to prevent illegal or unsanctioned activity. It follows that whenever there is an economic incentive, the adversary will attempt to circumvent the protection provided. Examples given in [12] name, amongst others, spam filtering, network-intrusion detection and virus detection as applications which are sensitive to adversarial machine learning. As presented in [13], adversarial attacks on learners (e.g. classifiers) can be seen as a game between classifier and adversary to defeat the opponent's strategy. Although several potential scenarios have been pointed out as to how adversarial examples might be used to attack machine learning applications [8], [10], they do not necessarily imply an adversarial attack itself. They might even be intentionally created to improve the learner's ability to generalize, as discussed later in our tutorial.

D. Threats

As DNNs are already implemented in several applications and offer a promising field for more future applications in the real world, it is of importance to explore the potential harmfulness of adversarial examples. For example, it is pointed out in [10] that self-driving cars equipped with a DNN classifier to scan the environment and detect traffic signs could possibly cause accidents when signs requiring a certain action (e.g. a stop sign) are manipulated and thus misclassified. Other potentially harmful scenarios can be imagined in manipulating face-recognition systems with adversarial inputs, as well as in fraud detection systems.

IV. ATTACKING AND DEFENDING A NEURAL NETWORK

We take this opportunity to refer back to A's algorithm described in Section II. We recall that one of the biggest challenges the algorithm encounters was that it was doing a brute force search in order to find the desired combination of pixel-wise perturbations, resulting in an exponential running time. It also risks generating adversarial examples that do not resemble what natural samples from the original class would look like. Intuitively, if a smarter way of choosing how to perturb pixels can be used instead of a brute force search scheme, one can build an algorithm to produce adversarial examples that not only takes less time to run but can also potentially induce a smaller amount of perturbation, preventing human detectability.

With these targets in mind, we introduce two methods that are often employed to generate adversarial examples, followed by two defense mechanisms which can provide resistance against such attacks.
A. Fast Gradient Sign Method

Recall from Section II that we need a smarter way to pick a combination of pixel perturbations that will produce adversarial examples reliably. Recall also from subsection III-A that neural networks are typically trained to minimize a loss function. We can see that if we can produce samples that cause our loss function to increase when these samples are passed in as input, then we might have a way to produce adversarial examples. We can intuitively see the potential of gradient-based approaches here. Indeed, the Fast Gradient Sign Method operates along a similar line of approach.

It was found in [2] that taking the sign of the gradient of the loss function of a deep neural network with respect to its input, and applying perturbations based on this to input images, reliably produces adversarial examples. This approach is thus termed the "Fast Gradient Sign Method". Now we explain the formulation behind the fast gradient sign approach. Let w be the parameters or weights of the network, x the input to the network and y the target classes that we train our network with. We denote the cost function of our neural network as L(w, x, y). To generate an adversarial sample x_adv, the fast gradient sign method uses the following update rule:

x_adv = x + ε · sign(∇_x L(w, x, y))    (5)

In equation 5, ε denotes the magnitude of the perturbation we want applied to the image. The bigger the value of ε, the higher the amount of perturbation applied will be. This will result in more certain misclassification. However, too high a value will result in a sample that does not resemble the original image.

This approach was found to reliably produce adversarial examples using various datasets and classification algorithms while still being relatively cheap to compute: the gradient can be efficiently derived using the backpropagation algorithm. On top of this, the amount of perturbation applied can be controlled by changing the value of ε. This approach thus addresses in part some of the problems that the brute force algorithm in Section II faces.
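The update rule in equation 5 translates almost directly into code. The sketch below is a TensorFlow 2 version using a gradient tape; the paper's own experiments used cleverhans on TensorFlow 1, so treat this as an illustrative re-implementation rather than the exact code used later.

import tensorflow as tf

def fgsm(model, x, y, eps=0.04):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x L(w, x, y)).
    `x` is expected to be a float batch of images with pixel values in [0, 1],
    `y` the corresponding integer labels."""
    x = tf.convert_to_tensor(x)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        tape.watch(x)                      # gradients w.r.t. the input, not the weights
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)        # one step in the direction that increases the loss
    return tf.clip_by_value(x_adv, 0.0, 1.0)   # keep pixels in a valid range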
B. Jacobian Saliency Map Approach

The Fast Gradient Sign Approach is a good start to answering the first question asked in Section II. However, it still does not guarantee a misclassification once the perturbation has been added to the input, meaning that the adversarial examples may not be good enough to trick the system. Fortunately for the adversary A, there is a smarter approach to find effective perturbations that guarantee misclassification, at the expense of a longer running time.

This smarter approach is known as the Jacobian Saliency Map Approach (JSMA) and is another technique for generating adversarial examples for acyclic feed-forward DNNs [6]. The JSMA performs a source-target misclassification, where it specifically distorts the source input image from a particular class, s, to misclassify it to a chosen target class, t.

Unfortunately, the non-convexity and non-linearity properties of DNNs make it difficult to identify the set of optimal perturbations required to misclassify the input to the target class [22]. Consequently, this approach aims to find a suitable heuristic in order to enable efficient exploration of the adversarial-sample search space and find the most effective set of perturbations that will lead to the targeted misclassification. So what heuristic does JSMA employ to overcome the challenge of non-convexity and non-linearity in DNNs?

The JSMA strategically handles this challenge by exploiting the forward derivatives of the learned network. It focuses on the DNN's output with respect to changes in the input, which yields the required forward derivatives. The forward derivatives of the model M, learned by this network with n input features, are defined by the Jacobian matrix, which can be formulated as follows:

∇M(X) = [∂M(X)/∂x_1 ... ∂M(X)/∂x_n]    (6)

Forward derivatives assist the adversary by highlighting input features unlikely to produce adversarial examples. This helps the adversary to focus on features with larger forward derivative values that would yield the misclassification with a smaller degree of overall distortion.

To further develop the idea, the matrix in equation 6 is then utilized to form the adversarial saliency map. The adversarial saliency map assists the adversary A in finding the most efficient way to produce the targeted misclassification by indicating which features to perturb. Figure 1 provides a helpful visualization of an adversarial saliency map.

Fig. 1: Saliency map of a 784-dimensional input to the LeNet model. The 784 input dimensions are arranged to correspond to the 28x28 image (MNIST) pixel alignment. [6]

To understand how this approach is helpful, recall the toy example introduced at the beginning of our tutorial. We know that our model M outputs a vector with the class scores as shown in Equation 1. Then, the generated saliency map aids in increasing the class score of the target class, denoted as M_target(X), while decreasing it for all M_i where i ≠ target, until target = argmax_i M_i(X). This strategy facilitates finding the set of relevant features to perform perturbations on, such that it will lead to maximizing M_target(X) and finally misclassifying the input to the target class.

Additionally, the JSMA has two hyperparameters that govern its performance in different settings. The amount by which selected features are perturbed in each iteration is regulated by θ. Additionally, Υ controls the maximum distortion, i.e. the maximum number of iterations allowed on a sample, and limits the number of features that are perturbed to form the adversarial example. The value of these hyperparameters should change with respect to the data being handled by the DNN, as they keep the total distortion in check to avoid human detectability.

Now that we have seen that the adversary A could be smarter in producing its adversarial examples, the reader might believe that their classifier is always at risk. So that leaves us with this question: Is there a way to defend our machine learning model against such smart adversaries?
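To make the heuristic concrete, the sketch below greedily perturbs one input feature at a time using the forward derivatives of equation 6. It is a simplified, single-feature variant of the saliency heuristic, not the exact pairwise saliency map of [6]; the mapping of Υ (gamma) to an iteration budget and the positive-only perturbation of size θ are assumptions for illustration.

import tensorflow as tf

def jsma_simplified(model, x, target, theta=1.0, gamma=0.1):
    """Greedy, simplified JSMA sketch: perturb one feature at a time based on the
    Jacobian of the class scores w.r.t. the input (equation 6).
    `x` is a float image batch of shape (1, H, W, C) with values in [0, 1]."""
    x_adv = tf.Variable(tf.convert_to_tensor(x))
    n_features = int(tf.size(x_adv))
    max_iters = int(gamma * n_features)                    # Upsilon: cap on distorted features
    for _ in range(max_iters):
        with tf.GradientTape() as tape:
            scores = model(x_adv)                          # forward pass, class scores
        jac = tape.jacobian(scores, x_adv)                 # forward derivatives
        jac = tf.reshape(jac[0], (scores.shape[-1], -1))   # (classes, features)
        # Saliency: prefer features that raise the target score and lower the others.
        saliency = jac[target] - (tf.reduce_sum(jac, axis=0) - jac[target])
        idx = int(tf.argmax(saliency))                     # most salient feature
        flat = tf.reshape(x_adv, (-1,)).numpy()
        flat[idx] = min(flat[idx] + theta, 1.0)            # perturb it by theta
        x_adv.assign(tf.reshape(flat, x_adv.shape))
        if int(tf.argmax(model(x_adv)[0])) == target:      # stop once the target class wins
            break
    return tf.convert_to_tensor(x_adv)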
C. Adversarial Training

In order to give an intuition about the first defense mechanism introduced, recall that adversarial examples do not reflect naturally occurring images but confront the classifier with artificially crafted worst-case perturbations which expose unlearned "blind spots" in the learned model [2]. Thus, a first instinctive approach is to use those examples and confront the model with them again to force correct classification of the corrupted input.

The concept of adversarial training develops this idea further into a systematic training procedure for neural networks where adversarial examples are used for augmentation of the training set. This data augmentation method differs from other known data augmentation schemes, such as translating or rotating legitimate examples, which reflect the reproduction of legitimate inputs in different shapes. However, similar to natural data set augmentation, the aim is to create a regularization effect where the classifier learns to generalize better using the augmented dataset. This regularization can be achieved by adding an additional term when calculating the loss function. The modified loss function has the form:

L_new(w, x, y) = αL(w, x, y) + (1 − α)L(w, x + ε · sign(∇_x L(w, x, y)), y)    (7)

Balanced by the factor α, the loss of the adversarial example is added to the loss of its original example, so that correct classification of both the legitimate and the corrupted input is enforced.

The procedure of adversarial training is visualized in Figure 2 (own visualization, created after [2]) and can be understood as an iterative approach. When training a neural network, adversarial examples are created after each training epoch and added to the data set for future training. Note that the new loss function shown in equation 7 simulates the process of generating adversarial examples using the Fast Gradient Sign Method and injecting them into the training set.

Fig. 2: Principle of Adversarial Training
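Equation 7 can be implemented as a custom training step that mixes the loss on the clean batch with the loss on its FGSM counterpart. The TensorFlow 2 sketch below reuses the fgsm helper sketched in Section IV-A; the Adam optimizer and the default values of α and ε are assumptions, not the settings used in our experiments.

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def adversarial_train_step(model, x, y, alpha=0.5, eps=0.04):
    """One step of adversarial training using the modified loss of equation 7."""
    x_adv = fgsm(model, x, y, eps)                          # craft adversarial versions of the batch
    with tf.GradientTape() as tape:
        loss_clean = loss_fn(y, model(x, training=True))
        loss_adv = loss_fn(y, model(x_adv, training=True))
        loss = alpha * loss_clean + (1.0 - alpha) * loss_adv   # L_new from equation 7
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss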
D. Defensive Distillation

Defensive distillation is another method to mitigate the effect of adversarial examples, developed in [9]. It builds on the concept of distilling knowledge from deep architectures introduced in [8], where a learner consisting of a small architecture was used to mimic the output of a large, computationally expensive model. However, defensive distillation uses the interaction between two model architectures differently. Here, the aim is to train a model to reproduce the probabilistic labels created by the initial model, which makes the second, distilled model more resilient against adversarial examples. The principle of defensive distillation is presented in Figure 3 (own visualization, created after [9]). The steps leading to a distilled network architecture can be summarized as follows:

Fig. 3: Principle of Defensive Distillation

• Train an initial neural network using discrete labels (one single non-zero element in the output vector Y, which corresponds to the correct class)
• Use a softmax layer which outputs probability scores for each training sample belonging to a particular class (e.g. P(car) = 0.95, P(ship) = 0.03, P(tree) = 0.02, ...)
• Train a second neural network of identical architecture using the same training set, with the difference of assigning the previously received probability scores as new labels for each training sample

Using the probabilistic training labels instead of discrete labels has the following beneficial effect: the weights of the distilled model are prevented from fitting too tightly to the training data. Defensive distillation thus contributes to a better generalization, which is resilient against the small worst-case perturbations of adversarial examples, resulting in correctly classifying them with a high accuracy.
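The steps above can be sketched as two Keras training runs in which the second network is fit on the softmax outputs of the first. This is a plain illustration only: the original defense in [9] additionally raises the softmax temperature during training, which is omitted here for brevity, and make_model is a hypothetical factory returning two networks of identical architecture.

import tensorflow as tf

def defensive_distillation(make_model, x_train, y_train, epochs=10):
    """Two-stage sketch: train a teacher on hard labels, then train a student of
    identical architecture on the teacher's probability vectors (soft labels)."""
    teacher = make_model()
    teacher.compile(optimizer="adam",
                    loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    teacher.fit(x_train, y_train, epochs=epochs)            # step 1: discrete labels

    soft_labels = teacher.predict(x_train)                  # step 2: softmax probability scores

    student = make_model()                                  # step 3: identical architecture,
    student.compile(optimizer="adam",                       # trained on the soft labels
                    loss="categorical_crossentropy", metrics=["accuracy"])
    student.fit(x_train, soft_labels, epochs=epochs)
    return student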
V. METHODS IN ACTION

In this section we give readers a simulation of a run-through of the attack methods outlined above on various network architectures. We then follow up with an implementation of the Adversarial Training defense mechanism to demonstrate that it is possible to defend your machine against such attacks. The first subsection below gives an overview of the datasets and neural network architectures we used to simulate our methods. The attack and defense mechanisms above are implemented using the cleverhans library [20] with TensorFlow [21] as the backend.

A. Overview of Datasets and Architectures

In this section we give readers the opportunity to learn more about the datasets and the neural network architectures we used when giving practical examples of the attack and defense methods:

1) Dataset:
• MNIST: This dataset consists of 28x28 pixel black and white images of handwritten digits [16]. There are 10 classes, each representing a digit from 0 to 9. There are 60 000 labeled images in the training set and 10 000 labeled images in the test set.
• CIFAR10: This dataset consists of 32x32 RGB images spread over 10 categories [17]. The training set consists of 50 000 labeled images and the test set consists of 10 000 labeled images.
• STL10: This dataset is similar to the CIFAR10 dataset but it has different classes and it contains images that are bigger in resolution - 96 x 96 pixels [18]. The training set contains 5 000 labeled images and the test set contains 8 000 labeled images. For this tutorial, we combine the original training and testing sets to build a dataset consisting of 13 000 labeled images. We then perform a 90-10 split to obtain a training dataset consisting of 11 700 samples and a testing set with 1 300 samples, with the distribution of classes being very similar in the training and test sets.

2) Neural Network Architectures: We explore two different neural network architectures. The first one is a simple deep convolutional neural network architecture that we built ourselves. The second one is a pre-trained VGG19 model [19]. This is a much more complex neural network architecture with 19 layers. We decided to use more than one architecture in this tutorial because we wanted to illustrate through examples that vulnerability to adversarial examples is not specific to a particular neural network architecture. Along this line of reasoning, we decided to use a simple architecture along with a much more complicated one. A summary of the architecture of the simpler network can be seen in Table III in the appendix.
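For reference, the simple architecture summarized in Table III could be written in Keras roughly as follows. This is a sketch based directly on the table; details the table does not specify, such as the valid padding, the Flatten layer, the ReLU activation on the 512-node layer and the per-dataset input shape, are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def build_custom_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Sketch of the custom network in Table III (MNIST would use (28, 28, 1))."""
    return tf.keras.Sequential([
        layers.Conv2D(64, (3, 3), activation="relu", input_shape=input_shape),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.5),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])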
B. Fast Gradient Sign Method

In order to give an example of an implementation to the readers, we simulated a run of this algorithm to generate adversarial examples from various datasets. This implementation follows the pseudo code outlined in Figure 4. On the CIFAR10 and MNIST datasets we trained our own neural network architecture for 10 epochs. On the STL10 dataset, a more complicated model was required to achieve a reasonable level of accuracy, so we trained the VGG19 network on this dataset for 20 epochs. We outline the statistics of the attacks in Table I.

Fig. 4: Algorithm for Generating Adversarial Examples using FGSM, where M is the model, X the input, Y the set of labels and epsilon the degree of perturbation applied.

In Figure 5, we can see one example of an image and its adversarial version that was generated using this approach. The image is from the STL10 dataset. Here the distortion is clearly visible. However, the adversarial sample still largely contains the characteristics of the original image.

Fig. 5: Example of an image from the STL10 dataset and its adversarial version generated using the fast gradient sign method with ε=0.04. (a) Original Image (b) Adversarial Image
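For completeness, generating these samples with the cleverhans library [20] looked roughly like the following under its v1/v2-era API. Class names and keyword arguments changed between cleverhans releases, so treat this as an approximate sketch rather than the exact code we ran; keras_model, sess and x_test are assumed to be a trained Keras model, a TensorFlow session and a float image array in [0, 1].

import tensorflow as tf
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

# Wrap the trained Keras model so cleverhans can query its logits/probabilities.
wrapped = KerasModelWrapper(keras_model)
fgsm_attack = FastGradientMethod(wrapped, sess=sess)
x_adv = fgsm_attack.generate_np(x_test, eps=0.04, clip_min=0.0, clip_max=1.0)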
C. Jacobian Saliency Map Approach

Now we perform an implementation of the JSMA algorithm discussed previously to generate adversarial examples for the three datasets defined in Section V-A, where the source input image was distorted to misclassify it to the target class.

We used the algorithm shown in Figure 6 for our implementation of JSMA. For each dataset, the algorithm for JSMA was adapted to work with the various resolutions and color channels that were available.
The algorithm then chooses the input features, i.e. pixel intensities in the case of image classification problems, to be perturbed. For example, in the case of the CIFAR10 dataset, the number of input rows, input columns and color channels was set to 64, 64 and 3, respectively. Furthermore, the sizes of the filters in the pooling layer of the network were also adjusted to work well with this dataset, because the default values were set to work with the MNIST dataset.

Additionally, the algorithm used different combinations of hyperparameter values for different datasets. Depending on the resolution of the input image, the Υ parameter adjusts the maximum number of iterations to increase proportionally with the number of pixels in the input image. The reason for this is that Υ monitors the percentage by which the image may be distorted to craft the adversarial example. As the resolution gets better, the algorithm can afford to change a higher percentage while avoiding human detection. This phenomenon is further validated by Figure 7, where the lower resolution of the CIFAR10 images limits the value of Υ, which makes it harder to form a good adversarial example that produces a misclassification while remaining imperceptible to the human eye.

Fig. 7: Example of a CIFAR10 image and its adversarial version generated using the JSMA approach with Υ=0.1 and θ=+1. (a) Original Image (b) Adversarial Image
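An analogous cleverhans-based run for JSMA might look as follows, again as an approximation of the v1/v2-era API rather than the exact code we used. Here θ and Υ map to the theta and gamma parameters, and target_class is a hypothetical integer selecting the target class t for a single input x of shape (1, rows, cols, channels).

import numpy as np
from cleverhans.attacks import SaliencyMapMethod
from cleverhans.utils_keras import KerasModelWrapper

# Assumes a trained Keras model `keras_model`, a TF session `sess` and 10 classes.
jsma_attack = SaliencyMapMethod(KerasModelWrapper(keras_model), sess=sess)
one_hot_target = np.zeros((1, 10))
one_hot_target[0, target_class] = 1
x_adv = jsma_attack.generate_np(x, theta=1.0, gamma=0.1,
                                clip_min=0.0, clip_max=1.0,
                                y_target=one_hot_target)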
B. Effectiveness of Current Defense Mechanisms

Defending against adversarial attacks is a hard problem. The fact that most machine learning models are not designed and trained with the assumption that an adversary is present also does not help alleviate this problem.

The defense mechanisms so far seem to handle the attacks that we described above quite well. We showed that a premise as simple as Adversarial Training can achieve promising results when it comes to warding off adversarial examples.

However, most defense mechanisms so far aim to handle attack methods carried out under specific paradigms. For example, adversarial training handles FGSM attacks well and defensive distillation handles JSMA attacks well. Goodfellow and Papernot refer to this as playing a "game of whack-a-mole" [23]. That is to say, defense mechanisms proposed so far address only specific types of vulnerabilities but not all of them. These defense mechanisms are not adaptive to different types of attacks. It also seems that as researchers discover

Responsible for writing the description of the Jacobian Based Saliency Approach. Implemented and simulated the JSMA on three different datasets. Created some of the graphs and images used in the paper. Also contributed to the Background Information and Discussion sections of this paper.

C. Andi

Chiefly responsible for the simulation of the fast gradient sign method and parts of the paper pertaining to it. Generated results for Adversarial Training. Contributed to the writing of the toy example and the discussion of attack and defense methods.

D. All

Made the paper fit the description of a tutorial better as opposed to a regular report. Proof-read and performed many revisions of the paper.

We hereby state that all the work presented in this report is that of the Authors.
REFERENCES

[1] Szegedy, C., Zaremba, W., Sutskever, I. et al. (2014). Intriguing properties of neural networks. ICLR, abs/1312.6199, 2014b. URL http://arxiv.org/abs/1312.6199.
[2] Goodfellow, I., Shlens, J. and Szegedy, C. (2015). Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations.
[3] Krizhevsky, A., Sutskever, I. and Hinton, G. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pages 1106-1114.
[4] Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press, http://www.deeplearningbook.org.
[5] Deng, J., Dong, W., Socher, R. et al. (2009). Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, CVPR 2009, IEEE Conference on, pages 248-255.
[6] Papernot, N., McDaniel, P., Jha, S. et al. (2016). The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium.
[7] Moosavi-Dezfooli, S., Fawzi, A. and Frossard, P. (2015). DeepFool: a simple and accurate method to fool deep neural networks. CoRR (2015), vol. abs/1511.04599.
[8] Hinton, G., Vinyals, O. and Dean, J. (2014). Distilling the knowledge in a neural network. In Deep Learning and Representation Learning Workshop at NIPS 2014. arXiv preprint arXiv:1503.02531, 2014.
[9] Papernot, N., McDaniel, P., Wu, X. et al. (2016). Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on (pp. 582-597). IEEE.
[10] Papernot, N., McDaniel, P., Goodfellow, I. et al. (2017). Practical Black-Box Attacks against Machine Learning. ASIA CCS '17 Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506-519.
[11] Laskov, P. and Lippmann, R. (2010). Machine learning in adversarial environments. Mach Learn (2010) 81: 115. doi:10.1007/s10994-010-5207-6.
[12] Huang, L., Joseph, A. D., Nelson, B. et al. (2011). Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and artificial intelligence (pp. 43-58). ACM.
[13] Dalvi, N., Domingos, P., Sanghai, S. et al. (2004). Adversarial classification. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 99-108. ACM.
[14] Papernot, N., McDaniel, P. and Goodfellow, I. (2016). Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. ArXiv e-prints, May 2016b. URL http://arxiv.org/abs/1605.07277.
[15] Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning 2.1 (2009): 1-127.
[16] LeCun, Y., Cortes, C. and Burges, C.J. (1998). The MNIST database of handwritten digits.
[17] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images.
[18] Coates, A., Lee, H. and Ng, A.Y. (2010). An analysis of single-layer networks in unsupervised feature learning. Ann Arbor, 1001(48109), p.2.
[19] Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[20] Papernot, N., Goodfellow, I., Sheatsley, R. et al. (2016). cleverhans v1.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768.
[21] Abadi, M., Agarwal, A., Barham, P. et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
[22] Larochelle, H., Bengio, Y., Louradour, J. et al. (2009). Exploring strategies for training deep neural networks. Journal of Machine Learning Research 10, no. Jan (2009): 1-40.
[23] Goodfellow, I. and Papernot, N. (2017, February 15). Is attacking machine learning easier than defending it? [Blog post]. Retrieved from http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html
APPENDIX A
ARCHITECTURE OF CUSTOM NEURAL NETWORK

TABLE III: Architecture of the custom neural network used for the MNIST and CIFAR10 datasets
Convolutional, 64 filters, 3x3 filter size, ReLU
Convolutional, 128 filters, 3x3 filter size, ReLU
Convolutional, 128 filters, 3x3 filter size, ReLU
MaxPool, 2x2 filter size
Dropout, 0.25
Convolutional, 64 filters, 3x3 filter size, ReLU
Convolutional, 128 filters, 3x3 filter size, ReLU
Convolutional, 128 filters, 3x3 filter size, ReLU
MaxPool, 2x2 filter size
Dropout, 0.5
Fully-Connected, 512 nodes
Fully-Connected, 10 nodes, softmax activation