
Technical report, Google, Inc.

ADVERSARIAL EXAMPLES IN THE PHYSICAL WORLD


Alexey Kurakin
Google Brain
[email protected]

Ian J. Goodfellow
OpenAI
[email protected]

Samy Bengio
Google Brain
[email protected]

arXiv:1607.02533v1 [cs.CV] 8 Jul 2016

ABSTRACT

Most existing machine learning classifiers are highly vulnerable to adversarial


examples. An adversarial example is a sample of input data which has been mod-
ified very slightly in a way that is intended to cause a machine learning classifier
to misclassify it. In many cases, these modifications can be so subtle that a human
observer does not even notice the modification at all, yet the classifier still makes
a mistake. Adversarial examples pose security concerns because they could be
used to perform an attack on machine learning systems, even if the adversary has
no access to the underlying model. Up to now, all previous work has assumed a
threat model in which the adversary can feed data directly into the machine learn-
ing classifier. This is not always the case for systems operating in the physical
world, for example those which are using signals from cameras and other sensors
as an input. This paper shows that even in such physical world scenarios, machine
learning systems are vulnerable to adversarial examples. We demonstrate this by
feeding adversarial images obtained from a cell-phone camera to an ImageNet In-
ception classifier and measuring the classification accuracy of the system. We find
that a large fraction of adversarial examples are classified incorrectly even when
perceived through the camera.

1 I NTRODUCTION

Recent advances in machine learning and deep neural networks have enabled researchers to solve multiple
important practical problems such as image, video, and text classification, among others (Krizhevsky et al.,
2012; Hinton et al., 2012; Bahdanau et al., 2015).
However, machine learning models are often vulnerable to adversarial manipulation of their input in-
tended to cause incorrect classification (Dalvi et al., 2004). In particular, neural networks and many
other categories of machine learning models are highly vulnerable to attacks based on small modifi-
cations of the input to the model at test time (Biggio et al., 2013; Szegedy et al., 2014; Goodfellow
et al., 2014; Papernot et al., 2016b).
The problem can be summarized as follows. Let's say there is a machine learning system M and
an input sample C, which we call a clean example. Let's assume that sample C is correctly classified by
the machine learning system, i.e. M(C) = y_true. It is possible to construct an adversarial example
A which is perceptually indistinguishable from C but is classified incorrectly, i.e. M(A) ≠ y_true.
These adversarial examples are misclassified far more often than examples that have been perturbed
by noise, even if the magnitude of the noise is much larger than the magnitude of the adversarial
perturbation (Szegedy et al., 2014).


Adversarial examples pose potential security threats for practical machine learning applications.
In particular, Szegedy et al. (2014) showed that an adversarial example that was designed to be
misclassified by a model M1 is often also misclassified by a model M2 . This adversarial example
transferability property means that it is possible to generate adversarial examples and perform a mis-
classification attack on a machine learning system without access to the underlying model. Papernot
et al. (2016a) and Papernot et al. (2016b) demonstrated such attacks in realistic scenarios.
However, all prior work on adversarial examples for neural networks made use of a threat model in
which the attacker can supply input directly to the machine learning model. Thus, such adversarial attacks
rely on fine-grained modifications of the input data.
Such a threat model can describe some scenarios in which attacks take place entirely within a
computer, such as evading spam filters or malware detectors (Biggio et al., 2013; Nelson et al.).
However, many practical machine learning systems operate in the physical world. Possible examples
include but are not limited to: robots perceiving the world through cameras and other sensors, video
surveillance systems, and mobile applications for image or sound classification. In such scenarios
the adversary cannot rely on the ability to make fine-grained per-pixel modifications of the input data.
The following question thus arises: is it still possible to craft adversarial examples and perform
adversarial attacks on machine learning systems which are operating in the physical world and perceiving
data through various sensors, rather than through a digital representation?
Some prior work has addressed the problem of physical attacks against machine learning systems,
but not in the context of fooling neural networks by making very small perturbations of the input.
For example, Carlini et al. (2016) demonstrate an attack that can create audio inputs that mobile
phones recognize as containing intelligible voice commands, but that humans hear as an unintelli-
gible voice. Face recognition systems based on photos are vulnerable to replay attacks, in which
a previously captured image of an authorized user's face is presented to the camera instead of an
actual face (Smith et al., 2015). Adversarial examples could in principle be applied in either of
these physical domains. An adversarial example for the voice command domain would consist of a
recording that seems to be innocuous to a human observer (such as a song) but contains voice com-
mands recognized by a machine learning algorithm. An adversarial example for the face recognition
domain might consist of very subtle markings applied to a person's face, so that a human observer
would recognize their identity correctly, but a machine learning system would recognize them as
being a different person.
In this paper we explore the possibility of creating adversarial examples in the physical world for
image classification tasks. For this purpose we conducted an experiment with a pre-trained ImageNet
Inception classifier (Szegedy et al., 2015). We generated adversarial examples for this model, then
we fed these examples to the classifier through a cell-phone camera and measured the classification
accuracy. This scenario is a simple physical world system which perceives data through a camera
and then runs image classification. We found that a large fraction of adversarial examples generated
for the original model remain misclassified even when perceived through a camera.
Surprisingly, our attack methodology required no modification to account for the presence of the
camera: the simplest possible attack of using adversarial examples crafted for the Inception model
resulted in adversarial examples that successfully transferred to the union of the camera and Incep-
tion. Our results thus provide a lower bound on the attack success rate that could be achieved with
more specialized attacks that explicitly model the camera while crafting the adversarial example.
One limitation of our results is that we have assumed a threat model under which the attacker has
full knowledge of the model architecture and parameter values. This is primarily so that we can
use a single Inception v3 model in all experiments, without having to devise and train a different
high-performing model. The adversarial example transfer property implies that our results are likely
to extend trivially to the scenario where the attacker does not have access to the model description
(Szegedy et al., 2014; Goodfellow et al., 2014; Papernot et al., 2016b).
To better understand how the non-trivial image transformations caused by the camera affect adver-
sarial example transferability, we conducted a series of additional experiments where we studied
how adversarial examples transfer across several specific kinds of synthetic image transformations.
The rest of the paper is structured as follows: In Section 2, we review different methods which we
used to generate adversarial examples. This is followed in Section 3 by details about our physical
world experimental set-up and results. Finally, Section 4 describes our experiments with various
artificial image transformations (like changing brightness, contrast, etc.) and how they affect ad-
versarial examples.

2 METHODS GENERATING ADVERSARIAL IMAGES

This section describes the different methods we used to generate adversarial examples in our
experiments. It is important to note that none of the described methods guarantees that the generated
image will be misclassified. Nevertheless, we call all of the generated images adversarial images.
In the remainder of the paper we use the following notation:

• X: an image, typically a 3-D tensor (width × height × depth). In this paper, we assume that
  the values of the pixels are integer numbers in the range [0, 255].
• y_true: the true class for the image X.
• J(X, y): the cross-entropy cost function of the neural network, given image X and class y.
  We intentionally omit network weights (and other parameters) in the cost function because
  we assume they are fixed (to the value resulting from training the machine learning model)
  in the context of this paper. For neural networks with a softmax output layer, the
  cross-entropy cost function applied to integer class labels equals the negative
  log-probability of the true class given the image, J(X, y) = -log p(y|X); this relationship
  will be used below.
• Clip_{X,ε}{X'}: a function which performs per-pixel clipping of the image X', so the result
  will be in the L∞ ε-neighbourhood of the source image X. The exact clipping equation is as
  follows:

  Clip_{X,ε}{X'}(x, y, z) = min{ 255, X(x, y, z) + ε, max{ 0, X(x, y, z) - ε, X'(x, y, z) } }

  where X(x, y, z) is the value of channel z of the image X at coordinates (x, y).
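In NumPy, this clipping operation reduces to an element-wise clip with per-pixel bounds. The following is a minimal sketch (not code from the paper), assuming X and X' are arrays of pixel values in [0, 255]:

import numpy as np

def clip_eps(X, X_prime, eps):
    """Per-pixel clip of X_prime into the L-infinity eps-neighbourhood of X,
    while keeping values inside the valid [0, 255] pixel range."""
    lower = np.maximum(X.astype(np.float64) - eps, 0)
    upper = np.minimum(X.astype(np.float64) + eps, 255)
    return np.clip(X_prime, lower, upper)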

2.1 FAST METHOD

One of the simplest methods to generate adversarial images, described in (Goodfellow et al., 2014),
is motivated by linearizing the cost function and solving for the perturbation that maximizes the cost
subject to an L∞ constraint. This may be accomplished in closed form, for the cost of one call to
back-propagation:

X^adv = X + ε · sign(∇_X J(X, y_true))

where ε is a hyper-parameter to be chosen.


In this paper we refer to this method as fast because it does not require an iterative procedure to
compute adversarial examples, and thus is much faster than other considered methods.
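As a concrete illustration (a sketch, not the authors' code), the fast method can be written in a few lines of NumPy; grad_fn(X, y) is an assumed helper that returns ∇_X J(X, y) for the fixed trained model:

import numpy as np

def fast_method(X, y_true, grad_fn, eps):
    """One-step 'fast' attack: move each pixel by eps in the direction of the
    sign of the cost gradient, then clip back to the valid pixel range.
    grad_fn(X, y) is an assumed gradient oracle for the trained classifier."""
    X_adv = X.astype(np.float64) + eps * np.sign(grad_fn(X, y_true))
    return np.clip(X_adv, 0, 255)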

2.2 BASIC ITERATIVE METHOD

We introduce a straightforward way to extend the fast method: we apply it multiple times with a
small step size, and clip pixel values of intermediate results after each step to ensure that they are in
an ε-neighbourhood of the original image:

X_0^adv = X,    X_{N+1}^adv = Clip_{X,ε}{ X_N^adv + α · sign(∇_X J(X_N^adv, y_true)) }

In our experiments we used α = 1, i.e. we changed the value of each pixel by only 1 on each step.
We selected the number of iterations to be min(ε + 4, 1.25ε). This number of iterations was chosen
heuristically; it is sufficient for the adversarial example to reach the edge of the ε max-norm ball but
restricted enough to keep the computational cost of the experiments manageable.
Below we refer to this method as the basic iterative method.
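A minimal sketch of the basic iterative method, reusing the same assumed grad_fn gradient oracle and the iteration-count heuristic described above:

import numpy as np

def basic_iterative_method(X, y_true, grad_fn, eps, alpha=1.0):
    """Apply the fast step repeatedly with step size alpha, clipping the
    intermediate result into the eps-ball around X after every step."""
    n_iter = int(min(eps + 4, 1.25 * eps))      # heuristic from the text
    lower = np.maximum(X.astype(np.float64) - eps, 0)
    upper = np.minimum(X.astype(np.float64) + eps, 255)
    X_adv = X.astype(np.float64).copy()
    for _ in range(n_iter):
        X_adv = X_adv + alpha * np.sign(grad_fn(X_adv, y_true))
        X_adv = np.clip(X_adv, lower, upper)
    return X_adv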


2.3 ITERATIVE LEAST-LIKELY CLASS METHOD

Both methods we have described so far simply try to increase the cost of the correct class, without
specifying which of the incorrect classes the model should select. Such methods are sufficient for
application to datasets such as MNIST and CIFAR-10, where the number of classes is small and all
classes are highly distinct from each other. On ImageNet, with a much larger number of classes and
the varying degrees of significance in the difference between classes, these methods can result in
uninteresting misclassifications, such as mistaking one breed of sled dog for another breed of sled
dog. In order to create more interesting mistakes, we introduce the iterative least-likely class method.
This iterative method tries to make an adversarial image which will be classified as a specific desired
target class. As the desired class we chose the least-likely class according to the prediction of the trained
network on image X:

y_LL = argmin_y { p(y|X) }.

For a well-trained classifier, the least-likely class is usually highly dissimilar from the true class, so
this attack method results in more interesting mistakes, such as mistaking a dog for an airplane.
To make an adversarial image which is classified as y_LL, we maximize log p(y_LL|X) by making
iterative steps in the direction of sign(∇_X log p(y_LL|X)). For neural networks with cross-entropy
loss this last expression equals sign(-∇_X J(X, y_LL)). Thus we have the following procedure:

X_0^adv = X,    X_{N+1}^adv = Clip_{X,ε}{ X_N^adv - α · sign(∇_X J(X_N^adv, y_LL)) }

For this iterative procedure we used the same α and the same number of iterations as for the basic
iterative method.
Below we refer to this method as the least likely class method or, for short, the l.l. class method.
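The least-likely class method can be sketched analogously; here predict_fn(X), returning the vector of class probabilities, and grad_fn(X, y) are assumed stand-ins for the trained Inception model rather than code from the paper:

import numpy as np

def least_likely_class_method(X, predict_fn, grad_fn, eps, alpha=1.0):
    """Targeted iterative attack toward the least-likely class y_LL:
    descend on J(X, y_LL) so that p(y_LL | X) increases at every step."""
    y_ll = int(np.argmin(predict_fn(X)))        # least-likely class for X
    n_iter = int(min(eps + 4, 1.25 * eps))
    lower = np.maximum(X.astype(np.float64) - eps, 0)
    upper = np.minimum(X.astype(np.float64) + eps, 255)
    X_adv = X.astype(np.float64).copy()
    for _ in range(n_iter):
        X_adv = X_adv - alpha * np.sign(grad_fn(X_adv, y_ll))
        X_adv = np.clip(X_adv, lower, upper)
    return X_adv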

2.4 COMPARISON OF METHODS OF GENERATING ADVERSARIAL EXAMPLES

As mentioned above, it is not guaranteed that an adversarial image will actually be misclassified:
sometimes the attacker wins, and sometimes the machine learning model wins. We performed an exper-
imental comparison of the adversarial methods to understand the actual classification accuracy on the
generated images as well as the types of perturbations exploited by each of the methods.
The experiments were performed on all 50,000 validation samples from the ImageNet dataset (Rus-
sakovsky et al., 2014) using a pre-trained Inception v3 classifier (Szegedy et al., 2015). For each
validation image, we generated adversarial examples using different methods and different values
of ε. For each pair of method and ε, we computed the classification accuracy on all 50,000 images.
We also computed the accuracy on all clean images, which we used as a baseline.
Examples of generated adversarial images are provided in Figures 1 and 2. Top-1 and top-5 classi-
fication accuracy on clean and adversarial images are summarized in Figure 3.
As shown in Figure 3, the fast method decreases top-1 accuracy by a factor of two and top-5 accuracy
by about 40% even for the smallest values of ε. As we increase ε, accuracy on adversarial images
generated by the fast method stays at approximately the same level until ε = 32 and then slowly
decreases to almost 0 as ε grows to 128. This can be explained by the fact that the fast method
adds ε-scaled noise to each image, so higher values of ε essentially destroy the content of the
image and make it unrecognisable even to humans; see Figure 1.
Iterative methods exploit much finer perturbations which do not destroy the image even for higher
ε; see Figure 2.
The basic iterative method is able to produce better adversarial images when ε < 48; however, as we
increase ε further, it is unable to improve.
The least likely class method destroys the correct classification of most images even when ε is
relatively small.


[Figure 1 panels: for each of two example images, the clean image followed by adversarial images with ε = 4, 8, 16, 24, 32, 48 and 64.]

Figure 1: Comparison of images resulting from an adversarial perturbation using the fast method.
The top image is a knee pad while the bottom one is a garbage truck. In both cases the clean images
are classified correctly and the adversarial images are misclassified for all considered ε.


[Figure 2 panels: clean image; fast method, L∞ distance to clean image = 32; basic iter., L∞ distance to clean image = 32; l.l. class, L∞ distance to clean image = 28.]

Figure 2: Comparison of different adversarial methods with ε = 32. Perturbations generated by
iterative methods are finer compared to the fast method. Also, iterative methods do not always select
a point on the border of the ε-neighbourhood as an adversarial image.


[Figure 3 plots: top-1 accuracy (top) and top-5 accuracy (bottom) versus ε (0 to 128), with curves for clean images, fast adv., basic iter. adv., and least likely class adv.]

Figure 3: Top-1 and top-5 accuracy of Inception v3 under attack by different adversarial methods
and different ε, compared to clean images (unmodified images from the dataset). Accuracy was
computed on all 50,000 validation images from the ImageNet dataset. In these experiments ε
varies from 2 to 128.


(a) Printout (b) Photo of printout (c) Cropped image

Figure 4: Experimental setup: (a) generated printout which contains pairs of clean and adversar-
ial images, as well as QR codes to help automatic cropping; (b) photo of the printout made by a
cellphone camera; (c) automatically cropped image from the photo.

We limit all further experiments to ε ≤ 16 because such perturbations are perceived only as small
noise (if they are perceived at all), and the adversarial methods are able to produce a significant number of
misclassified examples within this ε-neighbourhood of clean images.

3 PHOTOS OF ADVERSARIAL EXAMPLES


3.1 DESTRUCTION RATE OF ADVERSARIAL IMAGES

To study the influence of arbitrary transformations on adversarial images we introduce the notion
of destruction rate. It can be described as the fraction of adversarial images which are no longer
misclassified after the transformations. The formal definition is the following:

d = [ Σ_{k=1}^{n} C(X^k, y_true^k) · C̄(X_adv^k, y_true^k) · C(T(X_adv^k), y_true^k) ] / [ Σ_{k=1}^{n} C(X^k, y_true^k) · C̄(X_adv^k, y_true^k) ]     (1)
where n is the number of images used to compute the destruction rate, X^k is an image from the
dataset, y_true^k is the true class of this image, and X_adv^k is the corresponding adversarial image. The
function T(·) is an arbitrary image transformation; in this article, we study a variety of transfor-
mations, including printing the image and taking a photo of the result. The function C(X, y) is an
indicator function which returns whether the image was classified correctly:

C(X, y) = { 1, if image X is classified as y; 0, otherwise. }

We denote the binary negation of this indicator value as C̄(X, y), which is computed as
C̄(X, y) = 1 - C(X, y).
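Given the per-image indicator values, Equation (1) is a simple ratio; the following is a minimal sketch assuming three boolean arrays of classification outcomes over the n images:

import numpy as np

def destruction_rate(correct_clean, correct_adv, correct_adv_transformed):
    """Equation (1): among images that were correctly classified when clean and
    misclassified as adversarial, the fraction that T() turned back into
    correctly classified images."""
    c = np.asarray(correct_clean, dtype=bool)                  # C(X^k, y_true^k)
    c_adv = np.asarray(correct_adv, dtype=bool)                # C(X_adv^k, y_true^k)
    c_adv_t = np.asarray(correct_adv_transformed, dtype=bool)  # C(T(X_adv^k), y_true^k)
    numerator = np.sum(c & ~c_adv & c_adv_t)
    denominator = np.sum(c & ~c_adv)
    return numerator / denominator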

3.2 EXPERIMENTAL SETUP

To explore the possibility of physical adversarial examples we ran a series of experiments with
photos of adversarial examples. We printed clean and adversarial images, took photos of the printed
pages, and cropped the printed images from the photos of the full page. We can think of this as a
black box transformation that we refer to as photo transformation.
We computed the accuracy on clean and adversarial images before and after the photo transformation
as well as the destruction rate of adversarial images subjected to photo transformation.


The experimental procedure was as follows:

1. Print the image, see Figure 4a. In order to reduce the amount of manual work, we printed
multiple pairs of clean and adversarial examples on each sheet of paper. Also, QR codes
were put into corners of the printout to facilitate automatic cropping.
(a) All generated pictures of printouts (Figure 4a) were saved in lossless PNG format.
(b) Batches of PNG printouts were converted to multi-page PDF file using the con-
vert tool from the ImageMagick suite with the default settings: convert *.png
output.pdf
(c) Generated PDF files were printed using a Ricoh MP C5503 office printer. Each page
of the PDF file was automatically scaled to fit the entire sheet of paper using the default
printer scaling. The printer resolution was set to 600 dpi.
2. Take a photo of the printed image using a cell phone camera (Nexus 5x), see Figure 4b.
3. Automatically crop and warp validation examples from the photo, so they would become
squares of the same size as source images, see Figure 4c:
(a) Detect values and locations of four QR codes in the corners of the photo. The QR
codes encode which batch of validation examples is shown on the photo. If detection
of any of the corners failed, the entire photo was discarded and images from the photo
were not used to calculate accuracy. We observed that no more than 10% of all images
were discarded in any experiment and typically the number of discarded images was
about 3% to 6%.
(b) Warp the photo using a perspective transform so that the locations of the QR codes
map onto pre-defined coordinates (a minimal sketch of this step is given after this list).
(c) After the photo has been warped, each example has known coordinates and can easily be
cropped from the image.
4. Run classification on transformed and source images. Compute accuracy and destruction
rate of adversarial images.
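The cropping step can be illustrated with a standard perspective warp; the sketch below uses OpenCV only as an example and is not the authors' exact pipeline. QR-code detection is assumed to have already produced the four detected corner locations (qr_corners_px), and ref_corners_px and out_size are hypothetical parameters describing the reference printout layout:

import numpy as np
import cv2  # OpenCV, assumed available for the geometric warp

def warp_to_reference(photo, qr_corners_px, ref_corners_px, out_size):
    """Map the four detected QR-code corners onto their pre-defined reference
    coordinates, so that every printed example ends up at known pixel
    coordinates and can be cropped directly."""
    src = np.float32(qr_corners_px)   # 4 x 2 points detected in the photo
    dst = np.float32(ref_corners_px)  # 4 x 2 target points in the reference layout
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(photo, H, out_size)  # out_size = (width, height)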

This procedure involves manually taking photos of the printed pages, without careful control of
lighting, camera angle, distance to the page, etc. This is intentional; it introduces nuisance variability
that has the potential to destroy adversarial perturbations that depend on subtle, fine co-adaptation
of exact pixel values. That being said, we did not intentionally seek out extreme camera angles
or lighting conditions. All photos were taken in normal indoor lighting with the camera pointed
approximately straight at the page.
For each combination of adversarial example generation method and ε we conducted two sets of
experiments:

• Average case. To measure the average case performance, we randomly selected 102 images
to use in one experiment with a given ε and adversarial method. This experiment estimates
how often an adversary would succeed on randomly chosen photos: the world chooses an
image randomly, and the adversary attempts to cause it to be misclassified.
• Prefiltered case. To study a more aggressive attack, we performed experiments in which
the images are prefiltered. Specifically, we selected 102 images such that all clean images
are classified correctly, and all adversarial images (before photo transformation) are clas-
sified incorrectly (both top-1 and top-5). In addition, we used a confidence threshold for
the top prediction: p(y_predicted|X) ≥ 0.8, where y_predicted is the class predicted by
the network for image X. This experiment measures how often an adversary would succeed
when the adversary can choose the original image to attack. Under our threat model, the
adversary has access to the model parameters and architecture, so the attacker can always
run inference to determine whether an attack will succeed in the absence of the photo
transformation. The attacker might expect to do best by choosing to make attacks that
succeed in this initial condition. The victim then takes a new photo of the physical object
that the attacker chooses to display, and the photo transformation can either preserve the
attack or destroy it.


Table 1: Accuracy on photos of adversarial images in the average case (randomly chosen images).

                              Photos                             Source images
Adversarial          Clean images      Adv. images      Clean images      Adv. images
method               top-1   top-5     top-1   top-5    top-1   top-5     top-1   top-5
fast ε = 16          79.8%   91.9%     36.4%   67.7%    85.3%   94.1%     36.3%   58.8%
fast ε = 8           70.6%   93.1%     49.0%   73.5%    77.5%   97.1%     30.4%   57.8%
iter. basic ε = 16   72.9%   89.6%     49.0%   75.0%    81.4%   95.1%     28.4%   31.4%
iter. basic ε = 8    72.5%   93.1%     51.0%   87.3%    73.5%   93.1%     26.5%   31.4%
l.l. class ε = 16    71.1%   90.0%     60.0%   83.3%    79.4%   96.1%      1.0%    1.0%
l.l. class ε = 8     76.5%   94.1%     69.6%   92.2%    78.4%   98.0%      0.0%    6.9%

Table 2: Accuracy on photos of adversarial images in the prefiltered case (clean image correctly
classified, adversarial image confidently incorrectly classified).

                              Photos                             Source images
Adversarial          Clean images      Adv. images      Clean images      Adv. images
method               top-1   top-5     top-1   top-5    top-1   top-5     top-1   top-5
fast ε = 16          81.8%   97.0%      5.1%   39.4%   100.0%  100.0%      0.0%    0.0%
fast ε = 8           77.1%   95.8%     14.6%   70.8%   100.0%  100.0%      0.0%    0.0%
iter. basic ε = 16   93.3%   97.8%     60.0%   87.8%   100.0%  100.0%      0.0%    0.0%
iter. basic ε = 8    89.2%   98.0%     64.7%   91.2%   100.0%  100.0%      0.0%    0.0%
l.l. class ε = 16    95.8%  100.0%     87.5%   97.9%   100.0%  100.0%      0.0%    0.0%
l.l. class ε = 8     96.0%  100.0%     88.9%   97.0%   100.0%  100.0%      0.0%    0.0%

Table 3: Adversarial image destruction rate with photos.

Adversarial          Average case        Prefiltered case
method               top-1    top-5      top-1    top-5
fast ε = 16          12.5%    40.0%       5.1%    39.4%
fast ε = 8           33.3%    40.0%      14.6%    70.8%
iter. basic ε = 16   40.4%    69.4%      60.0%    87.8%
iter. basic ε = 8    52.1%    90.5%      64.7%    91.2%
l.l. class ε = 16    72.2%    85.1%      87.5%    97.9%
l.l. class ε = 8     86.3%    94.6%      88.9%    97.0%

3.3 EXPERIMENTAL RESULTS ON PHOTOS OF ADVERSARIAL IMAGES

Results of the photo transformation experiment are summarized in Tables 1, 2 and 3.


We found that fast adversarial images are more robust to the photo transformation than those produced
by the iterative methods. This could be explained by the fact that the iterative methods exploit a more
subtle kind of perturbation, and these subtle perturbations are more likely to be destroyed by the photo
transformation.
One unexpected result is that in some cases the adversarial destruction rate in the prefiltered case
was higher than in the average case. In the case of the iterative methods, even the total success rate
was lower for prefiltered images than for randomly selected images. This suggests that, to obtain very
high confidence, iterative methods often make subtle co-adaptations that are not able to survive the
photo transformation.
Overall, the results show that some fraction of adversarial examples stays misclassified even after a
non-trivial transformation: the photo transformation. This demonstrates the possibility of physical
adversarial examples. For example, an adversary using the fast method with ε = 16 could expect
that about 2/3 of the images would be top-1 misclassified and about 1/3 of the images would be
top-5 misclassified. Thus, by generating enough adversarial images, the adversary could expect to
cause far more misclassification than would occur on natural inputs.

4 ARTIFICIAL IMAGE TRANSFORMATIONS

[Figure 5 plot: destruction rate (0% to 18%) versus brightness shift (approximately -30 to +30), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 5: Comparison of adversarial destruction rates for various adversarial methods under the
transformation which changes brightness. All experiments were done with ε = 16.

The photo transformation described in the previous section can be considered as a combination
of much simpler image transformations. Thus, to better understand what is going on, we conducted a
series of experiments to measure the adversarial destruction rate under artificial image transformations.
We explored the following set of transformations: change of contrast and brightness, Gaussian blur,
Gaussian noise, and JPEG encoding.
For this set of experiments we used a subset of 1,000 images randomly selected from the validation
set. This subset of 1,000 images was selected once, and all experiments in this section used the same
subset. We performed experiments for multiple pairs of adversarial method and transformation. For
each pair of transformation and adversarial method we computed adversarial examples, applied the
transformation to the adversarial examples, and then computed the destruction rate according to
Equation (1).
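For illustration, minimal NumPy sketches of three of these transformations are given below. The exact parameterizations (in particular how the contrast scaling was defined) are our assumptions, since the paper does not specify them, and Gaussian blur and JPEG encoding are omitted because they require an image-processing library:

import numpy as np

def change_brightness(X, delta):
    # Shift all pixel values by delta and clip back to the valid range.
    return np.clip(X.astype(np.float64) + delta, 0, 255).astype(np.uint8)

def change_contrast(X, factor):
    # Assumed definition: scale deviations from the per-channel mean by `factor`.
    mean = X.astype(np.float64).mean(axis=(0, 1), keepdims=True)
    return np.clip(mean + factor * (X - mean), 0, 255).astype(np.uint8)

def add_gaussian_noise(X, sigma, rng=None):
    # Add zero-mean Gaussian noise with standard deviation sigma.
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=X.shape)
    return np.clip(X + noise, 0, 255).astype(np.uint8)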
The results¹ for various transformations and adversarial methods with ε = 16 are summarized in
Figures 5, 6, 7, 8 and 9. The following general observations can be drawn:

• Adversarial examples generated by the fast method are the most robust to transformations,
and adversarial examples generated by the iterative least-likely class method are the least
robust. This coincides with our results on the photo transformation.

¹ To save space, we omit several experiments we performed that did not have unique and interesting results.
For example, the composition of several JPEG transforms was similar to a single JPEG transform, and median
blur was similar to Gaussian blur.


[Figure 6 plot: destruction rate (0% to 20%) versus contrast scaling factor (0.7 to 1.3), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 6: Comparison of adversarial destruction rates for various adversarial methods under the
transformation which changes contrast. All experiments were done with ε = 16.

[Figure 7 plot: destruction rate (0% to 100%) versus Gaussian blur parameter (0.4 to 2.0), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 7: Comparison of adversarial destruction rates for various adversarial methods under the
Gaussian blur transformation. All experiments were done with ε = 16.


[Figure 8 plot: destruction rate (0% to 100%) versus Gaussian noise standard deviation (5 to 20), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 8: Comparison of adversarial destruction rates for various adversarial methods under the
Gaussian noise transformation. All experiments were done with ε = 16.

[Figure 9 plot: destruction rate (0% to 100%) versus JPEG quality (10 to 100), with curves for fast adv., basic iter. adv., and least likely class adv., top-1 and top-5.]

Figure 9: Comparison of adversarial destruction rates for various adversarial methods under the
JPEG encoding transformation. All experiments were done with ε = 16.


• The top-5 destruction rate is typically higher than the top-1 destruction rate. This can be
explained by the fact that in order to destroy a top-5 adversarial example, a transformation
has to push the correct class label into one of the top-5 predictions. However, in order to
destroy a top-1 adversarial example, the correct label has to become the top-1 prediction,
which is a strictly stronger requirement.
• Changing brightness and contrast does not affect adversarial examples much. The destruction
rate for fast and basic iterative adversarial examples is less than 5%, and for the iterative
least-likely class method it is less than 20%.
• Blur, noise and JPEG encoding have a higher destruction rate than changes of brightness
and contrast. In particular, the destruction rate for iterative methods can reach 80%-90%.
However, none of these transformations destroys 100% of adversarial examples, which is
consistent with the photo transformation experiment.

5 CONCLUSION
In this paper we explored the possibility of creating adversarial examples for machine learning sys-
tems which operate in the physical world. We used images taken from a cell-phone camera as an
input to an Inception v3 image classification neural network. We showed that in such a set-up, a sig-
nificant fraction of adversarial images crafted using the original network are misclassified even when
fed to the classifier through the camera. This finding demonstrates the possibility of adversarial ex-
amples for machine learning systems in the physical world. In future work, we expect that it will
be possible to demonstrate attacks using other kinds of physical objects besides images printed on
paper, attacks against different kinds of machine learning systems, such as sophisticated reinforce-
ment learning agents, attacks performed without access to the model's parameters and architecture
(presumably using the transfer property), and physical attacks that achieve a higher success rate by
explicitly modeling the physical transformation during the adversarial example construction process.
We also hope that future work will develop effective methods for defending against such attacks.

REFERENCES
Bahdanau, Dzmitry, Cho, Kyunghyun, and Bengio, Yoshua. Neural machine translation by jointly
learning to align and translate. In ICLR 2015, arXiv:1409.0473, 2015.
Biggio, Battista, Corona, Igino, Maiorca, Davide, Nelson, Blaine, Srndic, Nedim, Laskov, Pavel,
Giacinto, Giorgio, and Roli, Fabio. Evasion attacks against machine learning at test time. In
Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.
387-402. Springer, 2013.
Carlini, Nicholas, Mishra, Pratyush, Vaidya, Tavish, Zhang, Yuankai, Sherr, Micah, Shields,
Clay, Wagner, David, and Zhou, Wenchao. Hidden voice commands. In 25th USENIX
Security Symposium (USENIX Security 16), Austin, TX, August 2016. USENIX As-
sociation. URL https://www.usenix.org/conference/usenixsecurity16/
technical-sessions/presentation/carlini.
Dalvi, Nilesh, Domingos, Pedro, Sanghai, Sumit, Verma, Deepak, et al. Adversarial classification.
In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and
data mining, pp. 99-108. ACM, 2004.
Goodfellow, Ian J., Shlens, Jonathon, and Szegedy, Christian. Explaining and harnessing adversarial
examples. CoRR, abs/1412.6572, 2014. URL http://arxiv.org/abs/1412.6572.
Hinton, Geoffrey, Deng, Li, Yu, Dong, Dahl, George, Mohamed, Abdel-rahman, Jaitly, Navdeep,
Senior, Andrew, Vanhoucke, Vincent, Nguyen, Patrick, Sainath, Tara, and Kingsbury, Brian. Deep
neural networks for acoustic modeling in speech recognition. Signal Processing Magazine, 2012.
Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey. ImageNet classification with deep convo-
lutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS2012).
2012.
Nelson, Blaine, Barreno, Marco, Chi, Fuching Jack, Joseph, Anthony D, Rubinstein, Benjamin IP,
Saini, Udam, Sutton, Charles A, Tygar, J Doug, and Xia, Kai. Exploiting machine learning to
subvert your spam filter.


Papernot, N., McDaniel, P., and Goodfellow, I. Transferability in Machine Learning: from Phe-
nomena to Black-Box Attacks using Adversarial Samples. ArXiv e-prints, May 2016b. URL
http://arxiv.org/abs/1605.07277.
Papernot, Nicolas, McDaniel, Patrick Drew, Goodfellow, Ian J., Jha, Somesh, Celik, Z. Berkay, and
Swami, Ananthram. Practical black-box attacks against deep learning systems using adversarial
examples. CoRR, abs/1602.02697, 2016a. URL http://arxiv.org/abs/1602.02697.
Russakovsky, Olga, Deng, Jia, Su, Hao, Krause, Jonathan, Satheesh, Sanjeev, Ma, Sean, Huang,
Zhiheng, Karpathy, Andrej, Khosla, Aditya, Bernstein, Michael, et al. Imagenet large scale visual
recognition challenge. arXiv preprint arXiv:1409.0575, 2014.
Smith, Daniel F, Wiliem, Arnold, and Lovell, Brian C. Face recognition on consumer devices:
Reflections on replay attacks. IEEE Transactions on Information Forensics and Security, 10(4):
736-745, 2015.
Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow,
Ian J., and Fergus, Rob. Intriguing properties of neural networks. ICLR, abs/1312.6199, 2014.
URL http://arxiv.org/abs/1312.6199.
Szegedy, Christian, Vanhoucke, Vincent, Ioffe, Sergey, Shlens, Jonathon, and Wojna, Zbigniew.
Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015. URL
http://arxiv.org/abs/1512.00567.
