
Technical report, Google, Inc.

ADVERSARIAL EXAMPLES IN THE PHYSICAL WORLD


Alexey Kurakin
Google Brain
[email protected]

Ian J. Goodfellow
OpenAI
[email protected]

Samy Bengio
Google Brain
[email protected]

arXiv:1607.02533v1 [cs.CV] 8 Jul 2016

ABSTRACT

Most existing machine learning classifiers are highly vulnerable to adversarial


examples. An adversarial example is a sample of input data which has been mod-
ified very slightly in a way that is intended to cause a machine learning classifier
to misclassify it. In many cases, these modifications can be so subtle that a human
observer does not even notice the modification at all, yet the classifier still makes
a mistake. Adversarial examples pose security concerns because they could be
used to perform an attack on machine learning systems, even if the adversary has
no access to the underlying model. Up to now, all previous work has assumed a
threat model in which the adversary can feed data directly into the machine learn-
ing classifier. This is not always the case for systems operating in the physical
world, for example those which are using signals from cameras and other sensors
as an input. This paper shows that even in such physical world scenarios, machine
learning systems are vulnerable to adversarial examples. We demonstrate this by
feeding adversarial images obtained from a cell-phone camera to an ImageNet In-
ception classifier and measuring the classification accuracy of the system. We find
that a large fraction of adversarial examples are classified incorrectly even when
perceived through the camera.

1 I NTRODUCTION

Recent advances in machine learning and deep neural networks have enabled researchers to solve multiple
important practical problems such as image, video, and text classification, among others (Krizhevsky et al.,
2012; Hinton et al., 2012; Bahdanau et al., 2015).
However, machine learning models are often vulnerable to adversarial manipulation of their input in-
tended to cause incorrect classification (Dalvi et al., 2004). In particular, neural networks and many
other categories of machine learning models are highly vulnerable to attacks based on small modifi-
cations of the input to the model at test time (Biggio et al., 2013; Szegedy et al., 2014; Goodfellow
et al., 2014; Papernot et al., 2016b).
The problem can be summarized as follows. Let's say there is a machine learning system M and
an input sample C, which we call a clean example. Let's assume that sample C is correctly classified by
the machine learning system, i.e. M(C) = y_true. It is possible to construct an adversarial example
A which is perceptually indistinguishable from C but is classified incorrectly, i.e. M(A) ≠ y_true.
These adversarial examples are misclassified far more often than examples that have been perturbed
by noise, even if the magnitude of the noise is much larger than the magnitude of the adversarial
perturbation (Szegedy et al., 2014).


Adversarial examples pose potential security threats for practical machine learning applications.
In particular, Szegedy et al. (2014) showed that an adversarial example that was designed to be
misclassified by a model M1 is often also misclassified by a model M2 . This adversarial example
transferability property means that it is possible to generate adversarial examples and perform a mis-
classification attack on a machine learning system without access to the underlying model. Papernot
et al. (2016a) and Papernot et al. (2016b) demonstrated such attacks in realistic scenarios.
However, all prior work on adversarial examples for neural networks made use of a threat model in
which the attacker can supply input directly to the machine learning model. Thus, such adversarial attacks
rely on fine-grained modifications of the input data.
Such a threat model can describe some scenarios in which attacks take place entirely within a
computer, such as evading spam filters or malware detectors (Biggio et al., 2013; Nelson et al.).
However, many practical machine learning systems operate in the physical world. Possible examples
include but are not limited to: robots perceiving the world through cameras and other sensors, video
surveillance systems, and mobile applications for image or sound classification. In such scenarios
the adversary cannot rely on the ability to make fine-grained per-pixel modifications of the input data.
The following question thus arises: is it still possible to craft adversarial examples and perform
adversarial attacks on machine learning systems which are operating in the physical world and perceiving
data through various sensors, rather than through a digital representation?
Some prior work has addressed the problem of physical attacks against machine learning systems,
but not in the context of fooling neural networks by making very small perturbations of the input.
For example, Carlini et al. (2016) demonstrate an attack that can create audio inputs that mobile
phones recognize as containing intelligible voice commands, but that humans hear as an unintelli-
gible voice. Face recognition systems based on photos are vulnerable to replay attacks, in which
a previously captured image of an authorized user's face is presented to the camera instead of an
actual face (Smith et al., 2015). Adversarial examples could in principle be applied in either of
these physical domains. An adversarial example for the voice command domain would consist of a
recording that seems to be innocuous to a human observer (such as a song) but contains voice com-
mands recognized by a machine learning algorithm. An adversarial example for the face recognition
domain might consist of very subtle markings applied to a person's face, so that a human observer
would recognize their identity correctly, but a machine learning system would recognize them as
being a different person.
In this paper we explore the possibility of creating adversarial examples in the physical world for
image classification tasks. For this purpose we conducted an experiment with a pre-trained ImageNet
Inception classifier (Szegedy et al., 2015). We generated adversarial examples for this model, then
we fed these examples to the classifier through a cell-phone camera and measured the classification
accuracy. This scenario is a simple physical world system which perceives data through a camera
and then runs image classification. We found that a large fraction of adversarial examples generated
for the original model remain misclassified even when perceived through a camera.
Surprisingly, our attack methodology required no modification to account for the presence of the
camera: the simplest possible attack of using adversarial examples crafted for the Inception model
resulted in adversarial examples that successfully transferred to the union of the camera and Incep-
tion. Our results thus provide a lower bound on the attack success rate that could be achieved with
more specialized attacks that explicitly model the camera while crafting the adversarial example.
One limitation of our results is that we have assumed a threat model under which the attacker has
full knowledge of the model architecture and parameter values. This is primarily so that we can
use a single Inception v3 model in all experiments, without having to devise and train a different
high-performing model. The adversarial example transfer property implies that our results are likely
to extend trivially to the scenario where the attacker does not have access to the model description
(Szegedy et al., 2014; Goodfellow et al., 2014; Papernot et al., 2016b).
To better understand how the non-trivial image transformations caused by the camera affect adver-
sarial example transferability, we conducted a series of additional experiments where we studied
how adversarial examples transfer across several specific kinds of synthetic image transformations.
The rest of the paper is structured as follows: In Section 2, we review different methods which we
used to generate adversarial examples. This is followed in Section 3 by details about our physical
world experimental set-up and results. Finally, Section 4 describes our experiments with various
artificial image transformations (like changing brightness, contrast, etc.) and how they affect ad-
versarial examples.

2 METHODS GENERATING ADVERSARIAL IMAGES

This section describes the different methods we used to generate adversarial examples in our
experiments. It is important to note that none of the described methods guarantees that the generated
image will be misclassified. Nevertheless, we call all of the generated images adversarial images.
In the remainder of the paper we use the following notation:

• X: an image, typically a 3-D tensor (width × height × depth). In this paper, we assume that
  the values of the pixels are integer numbers in the range [0, 255].
• y_true: the true class for the image X.
• J(X, y): the cross-entropy cost function of the neural network, given image X and class y.
  We intentionally omit network weights (and other parameters) in the cost function because
  we assume they are fixed (to the value resulting from training the machine learning model)
  in the context of this paper. For neural networks with a softmax output layer, the
  cross-entropy cost function applied to integer class labels equals the negative
  log-probability of the true class given the image, J(X, y) = -log p(y|X); this relationship
  will be used below.
• Clip_{X,ε}{X'}: a function which performs per-pixel clipping of the image X', so the result
  will be in the L∞ ε-neighbourhood of the source image X. The exact clipping equation is as
  follows:

  Clip_{X,ε}{X'}(x, y, z) = min{ 255, X(x, y, z) + ε, max{ 0, X(x, y, z) - ε, X'(x, y, z) } }

  where X(x, y, z) is the value of channel z of the image X at coordinates (x, y).
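In NumPy, this clipping operation reduces to an element-wise clip with per-pixel bounds. The following is a minimal sketch (not code from the paper), assuming X and X' are arrays of pixel values in [0, 255]:

import numpy as np

def clip_eps(X, X_prime, eps):
    """Per-pixel clip of X_prime into the L-infinity eps-neighbourhood of X,
    while keeping values inside the valid [0, 255] pixel range."""
    lower = np.maximum(X.astype(np.float64) - eps, 0)
    upper = np.minimum(X.astype(np.float64) + eps, 255)
    return np.clip(X_prime, lower, upper)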

2.1 FAST METHOD

One of the simplest methods to generate adversarial images, described in (Goodfellow et al., 2014),
is motivated by linearizing the cost function and solving for the perturbation that maximizes the cost
subject to an L∞ constraint. This may be accomplished in closed form, for the cost of one call to
back-propagation:

X^adv = X + ε · sign(∇_X J(X, y_true))

where ε is a hyper-parameter to be chosen.


In this paper we refer to this method as fast because it does not require an iterative procedure to
compute adversarial examples, and thus is much faster than other considered methods.
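As a concrete illustration (a sketch, not the authors' code), the fast method can be written in a few lines of NumPy; grad_fn(X, y) is an assumed helper that returns ∇_X J(X, y) for the fixed trained model:

import numpy as np

def fast_method(X, y_true, grad_fn, eps):
    """One-step 'fast' attack: move each pixel by eps in the direction of the
    sign of the cost gradient, then clip back to the valid pixel range.
    grad_fn(X, y) is an assumed gradient oracle for the trained classifier."""
    X_adv = X.astype(np.float64) + eps * np.sign(grad_fn(X, y_true))
    return np.clip(X_adv, 0, 255)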

2.2 BASIC ITERATIVE METHOD

We introduce a straightforward way to extend the fast method: we apply it multiple times with a
small step size, and clip pixel values of intermediate results after each step to ensure that they are in
an ε-neighbourhood of the original image:

X_0^adv = X,    X_{N+1}^adv = Clip_{X,ε}{ X_N^adv + α · sign(∇_X J(X_N^adv, y_true)) }

In our experiments we used α = 1, i.e. we changed the value of each pixel by only 1 on each step.
We selected the number of iterations to be min(ε + 4, 1.25ε). This number of iterations was chosen
heuristically; it is sufficient for the adversarial example to reach the edge of the ε max-norm ball but
restricted enough to keep the computational cost of the experiments manageable.
Below we refer to this method as the basic iterative method.
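A minimal sketch of the basic iterative method, reusing the same assumed grad_fn gradient oracle and the iteration-count heuristic described above:

import numpy as np

def basic_iterative_method(X, y_true, grad_fn, eps, alpha=1.0):
    """Apply the fast step repeatedly with step size alpha, clipping the
    intermediate result into the eps-ball around X after every step."""
    n_iter = int(min(eps + 4, 1.25 * eps))      # heuristic from the text
    lower = np.maximum(X.astype(np.float64) - eps, 0)
    upper = np.minimum(X.astype(np.float64) + eps, 255)
    X_adv = X.astype(np.float64).copy()
    for _ in range(n_iter):
        X_adv = X_adv + alpha * np.sign(grad_fn(X_adv, y_true))
        X_adv = np.clip(X_adv, lower, upper)
    return X_adv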


2.3 ITERATIVE LEAST-LIKELY CLASS METHOD

Both methods we have described so far simply try to increase the cost of the correct class, without
specifying which of the incorrect classes the model should select. Such methods are sufficient for
application to datasets such as MNIST and CIFAR-10, where the number of classes is small and all
classes are highly distinct from each other. On ImageNet, with a much larger number of classes and
the varying degrees of significance in the difference between classes, these methods can result in
uninteresting misclassifications, such as mistaking one breed of sled dog for another breed of sled
dog. In order to create more interesting mistakes, we introduce the iterative least-likely class method.
This iterative method tries to make an adversarial image which will be classified as a specific desired
target class. As the desired class we chose the least-likely class according to the prediction of the trained
network on image X:

y_LL = argmin_y { p(y|X) }.

For a well-trained classifier, the least-likely class is usually highly dissimilar from the true class, so
this attack method results in more interesting mistakes, such as mistaking a dog for an airplane.
To make an adversarial image which is classified as y_LL, we maximize log p(y_LL|X) by making
iterative steps in the direction of sign(∇_X log p(y_LL|X)). For neural networks with cross-entropy
loss this last expression equals sign(-∇_X J(X, y_LL)). Thus we have the following procedure:

X_0^adv = X,    X_{N+1}^adv = Clip_{X,ε}{ X_N^adv - α · sign(∇_X J(X_N^adv, y_LL)) }

For this iterative procedure we used the same α and the same number of iterations as for the basic
iterative method.
Below we refer to this method as the least likely class method or, for short, the l.l. class method.
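The least-likely class method can be sketched analogously; here predict_fn(X), returning the vector of class probabilities, and grad_fn(X, y) are assumed stand-ins for the trained Inception model rather than code from the paper:

import numpy as np

def least_likely_class_method(X, predict_fn, grad_fn, eps, alpha=1.0):
    """Targeted iterative attack toward the least-likely class y_LL:
    descend on J(X, y_LL) so that p(y_LL | X) increases at every step."""
    y_ll = int(np.argmin(predict_fn(X)))        # least-likely class for X
    n_iter = int(min(eps + 4, 1.25 * eps))
    lower = np.maximum(X.astype(np.float64) - eps, 0)
    upper = np.minimum(X.astype(np.float64) + eps, 255)
    X_adv = X.astype(np.float64).copy()
    for _ in range(n_iter):
        X_adv = X_adv - alpha * np.sign(grad_fn(X_adv, y_ll))
        X_adv = np.clip(X_adv, lower, upper)
    return X_adv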

2.4 COMPARISON OF METHODS OF GENERATING ADVERSARIAL EXAMPLES

As mentioned above, it is not guaranteed that an adversarial image will actually be misclassified:
sometimes the attacker wins, and sometimes the machine learning model wins. We performed an exper-
imental comparison of the adversarial methods to understand the actual classification accuracy on the
generated images as well as the types of perturbations exploited by each of the methods.
The experiments were performed on all 50,000 validation samples from the ImageNet dataset (Rus-
sakovsky et al., 2014) using a pre-trained Inception v3 classifier (Szegedy et al., 2015). For each
validation image, we generated adversarial examples using different methods and different values
of ε. For each pair of method and ε, we computed the classification accuracy on all 50,000 images.
We also computed the accuracy on all clean images, which we used as a baseline.
Examples of generated adversarial images are provided in Figures 1 and 2. Top-1 and top-5 classi-
fication accuracy on clean and adversarial images are summarized in Figure 3.
As shown in Figure 3, the fast method decreases top-1 accuracy by a factor of two and top-5 accuracy
by about 40% even for the smallest values of ε. As we increase ε, accuracy on adversarial images
generated by the fast method stays at approximately the same level until ε = 32 and then slowly
decreases to almost 0 as ε grows to 128. This can be explained by the fact that the fast method
adds ε-scaled noise to each image, so higher values of ε essentially destroy the content of the
image and make it unrecognisable even to humans; see Figure 1.
Iterative methods exploit much finer perturbations which do not destroy the image even for higher
ε; see Figure 2.
The basic iterative method is able to produce better adversarial images when ε < 48; however, as we
increase ε further, it is unable to improve.
The least likely class method destroys the correct classification of most images even when ε is
relatively small.


[Figure 1 panels: for each of two example images, the clean image followed by adversarial images with ε = 4, 8, 16, 24, 32, 48 and 64.]

Figure 1: Comparison of images resulting from an adversarial perturbation using the fast method.
The top image is a knee pad while the bottom one is a garbage truck. In both cases the clean images
are classified correctly and the adversarial images are misclassified for all considered ε.


[Figure 2 panels: clean image; fast method, L∞ distance to clean image = 32; basic iter., L∞ distance to clean image = 32; l.l. class, L∞ distance to clean image = 28.]

Figure 2: Comparison of different adversarial methods with ε = 32. Perturbations generated by
iterative methods are finer compared to the fast method. Also, iterative methods do not always select
a point on the border of the ε-neighbourhood as an adversarial image.


[Figure 3 plots: top-1 accuracy (top) and top-5 accuracy (bottom) versus ε (0 to 128), with curves for clean images, fast adv., basic iter. adv., and least likely class adv.]

Figure 3: Top-1 and top-5 accuracy of Inception v3 under attack by different adversarial methods
and different ε, compared to clean images (unmodified images from the dataset). Accuracy was
computed on all 50,000 validation images from the ImageNet dataset. In these experiments ε
varies from 2 to 128.


(a) Printout (b) Photo of printout (c) Cropped image

Figure 4: Experimental setup: (a) generated printout which contains pairs of clean and adversar-
ial images, as well as QR codes to help automatic cropping; (b) photo of the printout made by a
cellphone camera; (c) automatically cropped image from the photo.

We limit all further experiments to ε ≤ 16 because such perturbations are perceived only as small
noise (if they are perceived at all), and the adversarial methods are able to produce a significant number of
misclassified examples within this ε-neighbourhood of clean images.

3 PHOTOS OF ADVERSARIAL EXAMPLES


3.1 DESTRUCTION RATE OF ADVERSARIAL IMAGES

To study the influence of arbitrary transformations on adversarial images we introduce the notion
of destruction rate. It can be described as the fraction of adversarial images which are no longer
misclassified after the transformations. The formal definition is the following:

d = [ Σ_{k=1}^{n} C(X^k, y_true^k) · C̄(X_adv^k, y_true^k) · C(T(X_adv^k), y_true^k) ] / [ Σ_{k=1}^{n} C(X^k, y_true^k) · C̄(X_adv^k, y_true^k) ]     (1)
where n is the number of images used to compute the destruction rate, X^k is an image from the
dataset, y_true^k is the true class of this image, and X_adv^k is the corresponding adversarial image. The
function T(·) is an arbitrary image transformation; in this article, we study a variety of transfor-
mations, including printing the image and taking a photo of the result. The function C(X, y) is an
indicator function which returns whether the image was classified correctly:

C(X, y) = { 1, if image X is classified as y; 0, otherwise. }

We denote the binary negation of this indicator value as C̄(X, y), which is computed as
C̄(X, y) = 1 - C(X, y).
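Given the per-image indicator values, Equation (1) is a simple ratio; the following is a minimal sketch assuming three boolean arrays of classification outcomes over the n images:

import numpy as np

def destruction_rate(correct_clean, correct_adv, correct_adv_transformed):
    """Equation (1): among images that were correctly classified when clean and
    misclassified as adversarial, the fraction that T() turned back into
    correctly classified images."""
    c = np.asarray(correct_clean, dtype=bool)                  # C(X^k, y_true^k)
    c_adv = np.asarray(correct_adv, dtype=bool)                # C(X_adv^k, y_true^k)
    c_adv_t = np.asarray(correct_adv_transformed, dtype=bool)  # C(T(X_adv^k), y_true^k)
    numerator = np.sum(c & ~c_adv & c_adv_t)
    denominator = np.sum(c & ~c_adv)
    return numerator / denominator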

3.2 EXPERIMENTAL SETUP

To explore the possibility of physical adversarial examples we ran a series of experiments with
photos of adversarial examples. We printed clean and adversarial images, took photos of the printed
pages, and cropped the printed images from the photos of the full page. We can think of this as a
black box transformation that we refer to as photo transformation.
We computed the accuracy on clean and adversarial images before and after the photo transformation
as well as the destruction rate of adversarial images subjected to photo transformation.


The experimental procedure was as follows:

1. Print the image, see Figure 4a. In order to reduce the amount of manual work, we printed
multiple pairs of clean and adversarial examples on each sheet of paper. Also, QR codes
were put into corners of the printout to facilitate automatic cropping.
(a) All generated pictures of printouts (Figure 4a) were saved in lossless PNG format.
(b) Batches of PNG printouts were converted to multi-page PDF file using the con-
vert tool from the ImageMagick suite with the default settings: convert *.png
output.pdf
(c) Generated PDF files were printed using a Ricoh MP C5503 office printer. Each page
of the PDF file was automatically scaled to fit the entire sheet of paper using the default
printer scaling. The printer resolution was set to 600 dpi.
2. Take a photo of the printed image using a cell phone camera (Nexus 5x), see Figure 4b.
3. Automatically crop and warp validation examples from the photo, so they would become
squares of the same size as source images, see Figure 4c:
(a) Detect values and locations of four QR codes in the corners of the photo. The QR
codes encode which batch of validation examples is shown on the photo. If detection
of any of the corners failed, the entire photo was discarded and images from the photo
were not used to calculate accuracy. We observed that no more than 10% of all images
were discarded in any experiment and typically the number of discarded images was
about 3% to 6%.
(b) Warp the photo using a perspective transform so that the locations of the QR codes
map onto pre-defined coordinates (a minimal sketch of this step is given after this list).
(c) After the photo has been warped, each example has known coordinates and can easily be
cropped from the image.
4. Run classification on transformed and source images. Compute accuracy and destruction
rate of adversarial images.
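The cropping step can be illustrated with a standard perspective warp; the sketch below uses OpenCV only as an example and is not the authors' exact pipeline. QR-code detection is assumed to have already produced the four detected corner locations (qr_corners_px), and ref_corners_px and out_size are hypothetical parameters describing the reference printout layout:

import numpy as np
import cv2  # OpenCV, assumed available for the geometric warp

def warp_to_reference(photo, qr_corners_px, ref_corners_px, out_size):
    """Map the four detected QR-code corners onto their pre-defined reference
    coordinates, so that every printed example ends up at known pixel
    coordinates and can be cropped directly."""
    src = np.float32(qr_corners_px)   # 4 x 2 points detected in the photo
    dst = np.float32(ref_corners_px)  # 4 x 2 target points in the reference layout
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(photo, H, out_size)  # out_size = (width, height)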

This procedure involves manually taking photos of the printed pages, without careful control of
lighting, camera angle, distance to the page, etc. This is intentional; it introduces nuisance variability
that has the potential to destroy adversarial perturbations that depend on subtle, fine co-adaptation
of exact pixel values. That being said, we did not intentionally seek out extreme camera angles
or lighting conditions. All photos were taken in normal indoor lighting with the camera pointed
approximately straight at the page.
For each combination of adversarial example generation method and ε we conducted two sets of
experiments:

• Average case. To measure the average case performance, we randomly selected 102 images
to use in one experiment with a given ε and adversarial method. This experiment estimates
how often an adversary would succeed on randomly chosen photos: the world chooses an
image randomly, and the adversary attempts to cause it to be misclassified.
• Prefiltered case. To study a more aggressive attack, we performed experiments in which
the images are prefiltered. Specifically, we selected 102 images such that all clean images
are classified correctly, and all adversarial images (before photo transformation) are clas-
sified incorrectly (both top-1 and top-5). In addition, we used a confidence threshold for
the top prediction: p(y_predicted|X) ≥ 0.8, where y_predicted is the class predicted by
the network for image X. This experiment measures how often an adversary would succeed
when the adversary can choose the original image to attack. Under our threat model, the
adversary has access to the model parameters and architecture, so the attacker can always
run inference to determine whether an attack will succeed in the absence of the photo
transformation. The attacker might expect to do best by choosing to make attacks that
succeed in this initial condition. The victim then takes a new photo of the physical object
that the attacker chooses to display, and the photo transformation can either preserve the
attack or destroy it.


Table 1: Accuracy on photos of adversarial images in the average case (randomly chosen images).

                              Photos                             Source images
Adversarial          Clean images      Adv. images      Clean images      Adv. images
method               top-1   top-5     top-1   top-5    top-1   top-5     top-1   top-5
fast ε = 16          79.8%   91.9%     36.4%   67.7%    85.3%   94.1%     36.3%   58.8%
fast ε = 8           70.6%   93.1%     49.0%   73.5%    77.5%   97.1%     30.4%   57.8%
iter. basic ε = 16   72.9%   89.6%     49.0%   75.0%    81.4%   95.1%     28.4%   31.4%
iter. basic ε = 8    72.5%   93.1%     51.0%   87.3%    73.5%   93.1%     26.5%   31.4%
l.l. class ε = 16    71.1%   90.0%     60.0%   83.3%    79.4%   96.1%      1.0%    1.0%
l.l. class ε = 8     76.5%   94.1%     69.6%   92.2%    78.4%   98.0%      0.0%    6.9%

Table 2: Accuracy on photos of adversarial images in the prefiltered case (clean image correctly
classified, adversarial image confidently incorrectly classified).

                              Photos                             Source images
Adversarial          Clean images      Adv. images      Clean images      Adv. images
method               top-1   top-5     top-1   top-5    top-1   top-5     top-1   top-5
fast ε = 16          81.8%   97.0%      5.1%   39.4%   100.0%  100.0%      0.0%    0.0%
fast ε = 8           77.1%   95.8%     14.6%   70.8%   100.0%  100.0%      0.0%    0.0%
iter. basic ε = 16   93.3%   97.8%     60.0%   87.8%   100.0%  100.0%      0.0%    0.0%
iter. basic ε = 8    89.2%   98.0%     64.7%   91.2%   100.0%  100.0%      0.0%    0.0%
l.l. class ε = 16    95.8%  100.0%     87.5%   97.9%   100.0%  100.0%      0.0%    0.0%
l.l. class ε = 8     96.0%  100.0%     88.9%   97.0%   100.0%  100.0%      0.0%    0.0%

Table 3: Adversarial image destruction rate with photos.

Adversarial          Average case        Prefiltered case
method               top-1    top-5      top-1    top-5
fast ε = 16          12.5%    40.0%       5.1%    39.4%
fast ε = 8           33.3%    40.0%      14.6%    70.8%
iter. basic ε = 16   40.4%    69.4%      60.0%    87.8%
iter. basic ε = 8    52.1%    90.5%      64.7%    91.2%
l.l. class ε = 16    72.2%    85.1%      87.5%    97.9%
l.l. class ε = 8     86.3%    94.6%      88.9%    97.0%

3.3 EXPERIMENTAL RESULTS ON PHOTOS OF ADVERSARIAL IMAGES

Results of the photo transformation experiment are summarized in Tables 1, 2 and 3.


We found that fast adversarial images are more robust to the photo transformation than those produced
by the iterative methods. This could be explained by the fact that the iterative methods exploit a more
subtle kind of perturbation, and these subtle perturbations are more likely to be destroyed by the photo
transformation.
One unexpected result is that in some cases the adversarial destruction rate in the prefiltered case
was higher than in the average case. In the case of the iterative methods, even the total success rate
was lower for prefiltered images than for randomly selected images. This suggests that, to obtain very
high confidence, iterative methods often make subtle co-adaptations that are not able to survive the
photo transformation.
Overall, the results show that some fraction of adversarial examples stays misclassified even after a
non-trivial transformation: the photo transformation. This demonstrates the possibility of physical
adversarial examples. For example, an adversary using the fast method with ε = 16 could expect
that about 2/3 of the images would be top-1 misclassified and about 1/3 of the images would be
top-5 misclassified. Thus, by generating enough adversarial images, the adversary could expect to
cause far more misclassification than would occur on natural inputs.

4 ARTIFICIAL IMAGE TRANSFORMATIONS

[Figure 5 plot: destruction rate (0% to 18%) versus brightness shift (approximately -30 to +30), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 5: Comparison of adversarial destruction rates for various adversarial methods under the
transformation which changes brightness. All experiments were done with ε = 16.

The photo transformation described in the previous section can be considered as a combination
of much simpler image transformations. Thus, to better understand what is going on, we conducted a
series of experiments to measure the adversarial destruction rate under artificial image transformations.
We explored the following set of transformations: change of contrast and brightness, Gaussian blur,
Gaussian noise, and JPEG encoding.
For this set of experiments we used a subset of 1,000 images randomly selected from the validation
set. This subset of 1,000 images was selected once, and all experiments in this section used the same
subset. We performed experiments for multiple pairs of adversarial method and transformation. For
each pair of transformation and adversarial method we computed adversarial examples, applied the
transformation to the adversarial examples, and then computed the destruction rate according to
Equation (1).
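For illustration, minimal NumPy sketches of three of these transformations are given below. The exact parameterizations (in particular how the contrast scaling was defined) are our assumptions, since the paper does not specify them, and Gaussian blur and JPEG encoding are omitted because they require an image-processing library:

import numpy as np

def change_brightness(X, delta):
    # Shift all pixel values by delta and clip back to the valid range.
    return np.clip(X.astype(np.float64) + delta, 0, 255).astype(np.uint8)

def change_contrast(X, factor):
    # Assumed definition: scale deviations from the per-channel mean by `factor`.
    mean = X.astype(np.float64).mean(axis=(0, 1), keepdims=True)
    return np.clip(mean + factor * (X - mean), 0, 255).astype(np.uint8)

def add_gaussian_noise(X, sigma, rng=None):
    # Add zero-mean Gaussian noise with standard deviation sigma.
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=X.shape)
    return np.clip(X + noise, 0, 255).astype(np.uint8)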
The results¹ for various transformations and adversarial methods with ε = 16 are summarized in
Figures 5, 6, 7, 8 and 9. The following general observations can be drawn:

• Adversarial examples generated by the fast method are the most robust to transformations,
and adversarial examples generated by the iterative least-likely class method are the least
robust. This coincides with our results on the photo transformation.

¹ To save space, we omit several experiments we performed that did not have unique and interesting results.
For example, the composition of several JPEG transforms was similar to a single JPEG transform, and median
blur was similar to Gaussian blur.


[Figure 6 plot: destruction rate (0% to 20%) versus contrast scaling factor (0.7 to 1.3), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 6: Comparison of adversarial destruction rates for various adversarial methods under the
transformation which changes contrast. All experiments were done with ε = 16.

[Figure 7 plot: destruction rate (0% to 100%) versus Gaussian blur parameter (0.4 to 2.0), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 7: Comparison of adversarial destruction rates for various adversarial methods under the
Gaussian blur transformation. All experiments were done with ε = 16.


[Figure 8 plot: destruction rate (0% to 100%) versus Gaussian noise standard deviation (5 to 20), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 8: Comparison of adversarial destruction rates for various adversarial methods under the
Gaussian noise transformation. All experiments were done with ε = 16.

[Figure 9 plot: destruction rate (0% to 100%) versus JPEG quality (10 to 100), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 9: Comparison of adversarial destruction rates for various adversarial methods under the
JPEG encoding transformation. All experiments were done with ε = 16.


• The top-5 destruction rate is typically higher than the top-1 destruction rate. This can be
explained by the fact that in order to destroy a top-5 adversarial example, a transformation
has to push the correct class label into one of the top-5 predictions. However, in order to
destroy a top-1 adversarial example, the correct label has to become the top-1 prediction,
which is a strictly stronger requirement.
• Changing brightness and contrast does not affect adversarial examples much. The destruction
rate for fast and basic iterative adversarial examples is less than 5%, and for the iterative
least-likely class method it is less than 20%.
• Blur, noise and JPEG encoding have a higher destruction rate than changes of brightness
and contrast. In particular, the destruction rate for iterative methods can reach 80%-90%.
However, none of these transformations destroys 100% of adversarial examples, which is
consistent with the photo transformation experiment.

5 CONCLUSION
In this paper we explored the possibility of creating adversarial examples for machine learning sys-
tems which operate in the physical world. We used images taken from a cell-phone camera as an
input to an Inception v3 image classification neural network. We showed that in such a set-up, a sig-
nificant fraction of adversarial images crafted using the original network are misclassified even when
fed to the classifier through the camera. This finding demonstrates the possibility of adversarial ex-
amples for machine learning systems in the physical world. In future work, we expect that it will
be possible to demonstrate attacks using other kinds of physical objects besides images printed on
paper, attacks against different kinds of machine learning systems, such as sophisticated reinforce-
ment learning agents, attacks performed without access to the model's parameters and architecture
(presumably using the transfer property), and physical attacks that achieve a higher success rate by
explicitly modeling the physical transformation during the adversarial example construction process.
We also hope that future work will develop effective methods for defending against such attacks.

REFERENCES
Bahdanau, Dzmitry, Cho, Kyunghyun, and Bengio, Yoshua. Neural machine translation by jointly
learning to align and translate. In ICLR 2015, arXiv:1409.0473, 2015.
Biggio, Battista, Corona, Igino, Maiorca, Davide, Nelson, Blaine, Srndic, Nedim, Laskov, Pavel,
Giacinto, Giorgio, and Roli, Fabio. Evasion attacks against machine learning at test time. In
Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.
387-402. Springer, 2013.
Carlini, Nicholas, Mishra, Pratyush, Vaidya, Tavish, Zhang, Yuankai, Sherr, Micah, Shields,
Clay, Wagner, David, and Zhou, Wenchao. Hidden voice commands. In 25th USENIX
Security Symposium (USENIX Security 16), Austin, TX, August 2016. USENIX As-
sociation. URL https://www.usenix.org/conference/usenixsecurity16/
technical-sessions/presentation/carlini.
Dalvi, Nilesh, Domingos, Pedro, Sanghai, Sumit, Verma, Deepak, et al. Adversarial classification.
In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and
data mining, pp. 99-108. ACM, 2004.
Goodfellow, Ian J., Shlens, Jonathon, and Szegedy, Christian. Explaining and harnessing adversarial
examples. CoRR, abs/1412.6572, 2014. URL http://arxiv.org/abs/1412.6572.
Hinton, Geoffrey, Deng, Li, Yu, Dong, Dahl, George, Mohamed, Abdel-rahman, Jaitly, Navdeep,
Senior, Andrew, Vanhoucke, Vincent, Nguyen, Patrick, Sainath, Tara, and Kingsbury, Brian. Deep
neural networks for acoustic modeling in speech recognition. Signal Processing Magazine, 2012.
Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey. ImageNet classification with deep convo-
lutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS2012).
2012.
Nelson, Blaine, Barreno, Marco, Chi, Fuching Jack, Joseph, Anthony D, Rubinstein, Benjamin IP,
Saini, Udam, Sutton, Charles A, Tygar, J Doug, and Xia, Kai. Exploiting machine learning to
subvert your spam filter.


Papernot, N., McDaniel, P., and Goodfellow, I. Transferability in Machine Learning: from Phe-
nomena to Black-Box Attacks using Adversarial Samples. ArXiv e-prints, May 2016b. URL
http://arxiv.org/abs/1605.07277.
Papernot, Nicolas, McDaniel, Patrick Drew, Goodfellow, Ian J., Jha, Somesh, Celik, Z. Berkay, and
Swami, Ananthram. Practical black-box attacks against deep learning systems using adversarial
examples. CoRR, abs/1602.02697, 2016a. URL http://arxiv.org/abs/1602.02697.
Russakovsky, Olga, Deng, Jia, Su, Hao, Krause, Jonathan, Satheesh, Sanjeev, Ma, Sean, Huang,
Zhiheng, Karpathy, Andrej, Khosla, Aditya, Bernstein, Michael, et al. Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575, 2014.
Smith, Daniel F, Wiliem, Arnold, and Lovell, Brian C. Face recognition on consumer devices:
Reflections on replay attacks. IEEE Transactions on Information Forensics and Security, 10(4):
736-745, 2015.
Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow,
Ian J., and Fergus, Rob. Intriguing properties of neural networks. ICLR, abs/1312.6199, 2014.
URL http://arxiv.org/abs/1312.6199.
Szegedy, Christian, Vanhoucke, Vincent, Ioffe, Sergey, Shlens, Jonathon, and Wojna, Zbigniew.
Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015. URL
http://arxiv.org/abs/1512.00567.
