ARGAN: Adversarially Robust Generative Adversarial Networks for Deep Neural Networks Against Adversarial Examples
ABSTRACT An adversarial example, an input instance with small, intentional feature perturbations that mislead machine learning models, is a concrete problem in artificial intelligence safety. Generative adversarial network (GAN)-based defense methods have recently been studied as an emerging way to defend against adversarial examples. However, the performance of the state-of-the-art GAN-based defense methods is limited because the target deep neural network models protected by these methods become robust against adversarial examples but make false decisions for legitimate input data. To solve this accuracy degradation for legitimate input data, we propose a new GAN-based defense method called Adversarially Robust Generative Adversarial Networks (ARGAN). While converting the input data of machine learning models through a two-step transformation architecture, ARGAN trains the generator to reflect the vulnerability of the target deep neural network model to adversarial examples and optimizes the generator parameters with a joint loss function. From experimental results on various datasets collected from diverse applications, we show that the accuracy of ARGAN for legitimate input data remains high while keeping the target deep neural network model robust against adversarial examples, and that ARGAN outperforms the state-of-the-art GAN-based defense methods in accuracy.
INDEX TERMS Adversarial examples, adversarial perturbation, deep neural networks (DNNs), security.
effective defense technology [9], [10]. The existing GAN-based defense methods are grouped into one of two types of architectures according to the design purpose of the generator, i.e., the noise generation architecture and the noise reduction architecture. Note that, given a training dataset, a GAN learns to generate new data with the same statistics as the training dataset using two deep networks, the generator and the discriminator. The generator in the noise generation architecture produces corrupted input data. By training DNN models on both the corrupted input data and the legitimate input data, the noise generation architecture makes DNN models robust against adversarial examples [11], [12]. On the other hand, the generator in the noise reduction architecture produces purified input data whose data distribution is close to the legitimate input data distribution [13], [14]. Thus, the noise reduction architecture reduces the perturbation of adversarial examples before the input data are fed into the target DNN model.

However, the accuracy of such GAN-based defense methods can decrease when predicting or classifying legitimate input data. This is because GAN-based defense methods using the noise generation architecture or the noise reduction architecture work with slightly modified input data, i.e., the corrupted input data in the noise generation architecture and the purified input data in the noise reduction architecture. While the target DNN model is robust against adversarial examples thanks to the modified input data, it can make wrong decisions for legitimate input data and thus generates false positives for legitimate input data. If such GAN-based defense methods, which cause wrong decisions, are used in self-driving systems, bio-medicine systems, or user authentication systems [6]–[8], which are sensitive to small accuracy variations, significant side effects can result.

To solve the accuracy degradation of the existing GAN-based defense methods for legitimate input data while keeping the target DNN model robust against adversarial examples, we propose a new GAN-based defense method called Adversarially Robust GAN (ARGAN). Similar to EEJE [15], the proposed ARGAN architecture follows a two-step transformation of the input data to the target DNN model. That is, in the first transformation step, noise data is added to the input data of the target DNN model, and in the second transformation step, the inverse noise data is added to eliminate the influence of the noise from the first transformation step on the output. However, ARGAN is designed using a black-box transformation method, unlike the white-box transformation method of EEJE. That is, while EEJE requires knowledge of the network architecture, weight values, and other parameters of the target DNN model when generating the noise data in the first transformation step, ARGAN does not require complete knowledge of the model because it uses a pre-trained generator as the transformation method. Specifically, in the first transformation step, the generator produces robust input data against adversarial examples by reflecting the vulnerability of the target DNN model to adversarial examples, and a new joint loss function is presented to optimize the parameter values of the generator. In the second transformation step, the robust input data are transformed into the feeding input data of the target DNN model using the additive inverse of the generator.

Compared to the other state-of-the-art GAN-based defense methods, ARGAN shows good accuracy for both legitimate input data and adversarial examples. From experimental results on various datasets collected from diverse applications, we show that the accuracy of ARGAN for legitimate input data remains good enough while keeping the DNN model robust against adversarial examples. We also show that the accuracy of ARGAN outperforms the accuracy of the state-of-the-art GAN-based defense methods using the noise generation architecture and the noise reduction architecture, e.g., GanDef [12], Defense-GAN [13], and APE-GAN [14].

The rest of the paper is organized as follows. In Section II, we overview the well-known adversarial attacks and the state-of-the-art GAN-based defense methods, and describe the motivation of this paper. In Section III, we describe the threat model, the overall operation, and the details of ARGAN. In Section IV, we verify the effectiveness (accuracy) of ARGAN from various experimental results under different adversarial attacks, different datasets, and so on. Finally, we conclude this paper in Section V.

II. PRELIMINARIES AND RELATED WORKS
In this section, after introducing well-known adversarial attacks, we overview previous GAN-based defense methods. We also describe the motivation for considering a new GAN-based defense method by explaining the limitations of the previous GAN-based defense methods.

A. ADVERSARIAL ATTACKS
In this section, we summarize the characteristics of four adversarial attacks [16]–[19], which are frequently used for performance verification of many defense methods.

Based on a linearization of the cost function, the Fast Gradient Sign Method (FGSM) generates adversarial examples using the sign of the gradient to increase the loss of DNN models [16]. FGSM is a simple and fast adversarial attack, but it often produces sub-optimal perturbations. To resolve this problem of FGSM, Projected Gradient Descent (PGD) generates adversarial examples using several gradient updates for finer optimization [17]. To perform this fine optimization efficiently, PGD performs iterative gradient updates from a randomly selected initial point. To calculate a minimal perturbation, DeepFool uses an iterative linearization of the target DNN model [18]. In each iteration, the adversarial perturbation is updated to reach the decision boundary closest to the input data X. To increase the attack success rate while calculating the minimum perturbation, C&W generates adversarial examples based on various distance metrics such as the L0, L∞, and L2 norms [19]. In this paper, only the L2 type of
the C&W method (CW) is considered as a representative method, because it is the variant most frequently mentioned in other works [20], [21].
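The following minimal PyTorch sketch illustrates the FGSM and PGD attacks summarized above; the classifier `f`, the loss, and the PGD step size `alpha` are placeholders rather than the exact configuration used in this paper.

```python
import torch
import torch.nn.functional as F

def fgsm(f, x, y, eps=0.3):
    """One-step FGSM: perturb x along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(f(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def pgd(f, x, y, eps=0.3, alpha=0.03, n_iter=10):
    """PGD: iterative gradient-sign steps from a random start,
    projected back onto the eps-ball around x after every step."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(n_iter):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(f(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv
```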
B. GAN-BASED DEFENSE METHODS

1) GAN-BASED DEFENSE METHODS USING NOISE GENERATION ARCHITECTURE
To make DNN models more robust against adversarial examples, the noise generation architecture uses GANs to train a robust DNN model or to retrain the currently deployed DNN model.

Lee et al. used a GAN framework to make a DNN model more robust against adversarial examples [22]. They trained the generator to produce data corresponding to the weaknesses of the target DNN model, and retrained the DNN model using the data produced by the generator. Liu and Hsieh proposed a training framework, called Rob-GAN, based on the insight that adversarial training and GANs complement each other [23]. In Rob-GAN, the generator and the discriminator are jointly optimized in the presence of adversarial attacks. Sun et al. used a GAN-based data augmentation approach to improve adversarial training [11]. They designed the generator to produce boundary samples for the target DNN model, and used the produced boundary samples for adversarial training. Liu et al. proposed GanDef, a GAN-based adversarial training method that utilizes a discriminator as a regularizer for feature selection [12]. Even though these defense methods provide good robustness against adversarial examples, they incur large costs for training new robust DNN models or retraining the currently deployed DNN models.

2) GAN-BASED DEFENSE METHODS USING NOISE REDUCTION ARCHITECTURE
To make adversarial examples less threatening, the noise reduction architecture reduces the perturbation of adversarial examples before they are fed into the DNN model.

Samangouei et al. proposed a defense method utilizing GANs, called Defense-GAN, to make the generator distribution closer to the data distribution [13]. To find the input data of the generator, they performed gradient descent iteratively. Defense-GAN worked especially well for black-box attacks as well as white-box attacks. Shen et al. proposed APE-GAN, which eliminates the adversarial perturbation of the input data using GANs [14]. They designed a generator loss that combines a content loss and an adversarial loss to make the adversarial examples highly consistent with the legitimate data distribution. APE-GAN works without knowledge of the architecture and parameters of the target DNN model. Santhanam and Grnarova proposed a GAN-based defense method, called cowboy, for the detection and purification of adversarial examples [24]. They used a discriminator to detect the adversarial examples, and a generator to purify the detected adversarial examples. They also showed that adversarial examples lie outside of the data distribution. Su et al. proposed a new training framework for GANs to remove the adversarial perturbation in the face verification field [25]. To improve the performance of the generator, they used two pre-trained classification networks instead of a discriminator. Hwang et al. proposed a Variational Autoencoder (VAE)-based defense method, called PuVAE, to purify adversarial examples [26]. They used dilated Convolutional Neural Networks (CNNs) as a generator to prevent the information loss caused by feature selection. Given a reasonable time constraint, PuVAE showed outstanding performance compared to Defense-GAN.

3) LIMITATION OF PREVIOUS GAN-BASED DEFENSE METHODS
Some artificial intelligence systems, such as a cloud-based user authentication system, can be exposed to adversarial examples. In other words, the adversary can perform an adversarial attack to change the output of the DNN model f(·). When the adversarial attack h(·) and a legitimate input data X are given, the objective function of the adversary can be expressed as:

  min_{h(·)} Pr(l′ = l)
  s.t. l′ = f ◦ h(X),
       l  = f(X).                                   (1)

To defend against the adversarial example h(X) in Equation (1), the artificial intelligence system can apply a defense method. For example, let us consider an artificial intelligence system that applies a GAN-based defense method using the noise reduction architecture as its defense method. To preserve the output of the DNN model f(·), the generator G in the noise reduction architecture transforms the adversarial example into purified input data. When the adversarial attack h(·) and a legitimate input data X are given, the objective function of the defense method can be expressed as:

  max_{G(·)} Pr(l′ = l)
  s.t. l′ = f ◦ G ◦ h(X),
       l  = f ◦ G(X).                               (2)

However, since such a defense method transforms even the legitimate input data X while removing the perturbations of the adversarial examples, it may decrease the accuracy of the artificial intelligence system for the legitimate input data. The decrease in accuracy for the legitimate input data may matter more than the increase in robustness against the adversarial examples in specific applications, such as self-driving cars and user authentication. In other words, even a small decrease in accuracy for the legitimate input data can determine whether a defense method against adversarial examples is applied at all.
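As a small illustration of Equations (1) and (2), the sketch below (hypothetical names, assuming PyTorch tensors) shows how a noise-reduction defense G is placed in front of the classifier f; because G is applied to every input, legitimate inputs are also modified, which is the source of the clean-accuracy drop discussed above.

```python
def predict_with_purifier(f, G, x):
    """Noise-reduction defense: every input, clean or adversarial,
    is purified by G before it reaches the classifier f."""
    return f(G(x)).argmax(dim=1)

# Equation (2): l' = f(G(h(X))) for an attacked input, l = f(G(X)) for a clean one.
# Since G(X) != X in general, even clean inputs may be misclassified after purification.
```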
For a more specific explanation of why the accuracy for legitimate input data decreases, let us consider an artificial intelligence system that applies APE-GAN, which is a representative method of the noise reduction architecture,
as its defense method. Some examples of APE-GAN on the CIFAR-10 test dataset [27] under no adversarial attack are shown in Fig. 1. Here, the legitimate input data represents the randomly selected samples for each class, the feeding input data represents the input data transformed by APE-GAN, and the difference represents the difference between the legitimate input data and the feeding input data. From the ''difference'' row in Fig. 1, it is observed that most pixels of the legitimate input data are transformed by APE-GAN even though no adversarial attack occurs. Note that the transformed pixels corrupt key features that directly affect classification. In Fig. 2, we show the data distribution of the legitimate input data and the feeding input data. It is observed that the data distribution of the feeding input data, transformed by APE-GAN, differs from that of the legitimate input data. Such differences cause the degradation of classification accuracy. These observations motivate us to seek a new GAN-based defense method that provides good accuracy not only for adversarial examples but also for legitimate input data.

FIGURE 1. Ten legitimate images randomly selected from the CIFAR-10 test dataset, where each image is transformed using APE-GAN.

FIGURE 2. Data distribution of legitimate input data and feeding input data to the target DNN model under no adversarial attacks. Here, 1000 test data randomly selected from the CIFAR-10 test dataset were used as input data on the ResNet-20 model, and the term 'Euclidean similarity' represents the average Euclidean similarity between the legitimate input data and the feeding input data.

III. PROPOSED METHOD
In this section, we overview the operation of ARGAN in detail. First, after introducing the targeted threat model, the overall operation and the objective function of ARGAN are described.

A. THREAT MODEL
Let us consider a cloud-based user authentication system for Machine Learning as a Service (MLaaS). By using the trained DNN model, cloud-based user authentication systems such as SMARTLET [6], FACECUBE [7], and MOCHA [8] predict the identity of the transmitted face image. In the legitimate situation without adversarial attacks, a user face image is taken by an IP camera or smartphone at the client. The user face image is transmitted to the cloud-based face recognition server through either an unencrypted or an encrypted session. As a result, the cloud-based face recognition server returns a normal prediction result. However, under the man-in-the-middle (MITM) threat, an adversary secretly alters the communications between two parties who believe that they are directly communicating with each other [28]. Even though the user face image is transmitted through an encrypted session, an adversary can still easily bypass it using state-of-the-art MITM threats based on Renegotiation, Version Rollback, and so on [29], [30]. After the legitimate session between the client and the cloud-based face recognition server is altered through an MITM attack, the adversary modifies the legitimate face image with slight perturbations. Here, it is assumed that the adversary has complete access to the target DNN model in the cloud server. In other words, the adversary can obtain the architecture and parameters of the target DNN model. As a result, the adversary can relay the perturbed face image to the cloud-based face recognition server instead of the legitimate face image. Since the outputs given by neural networks for the legitimate face image and the perturbed face image are largely different, the cloud-based face recognition server returns an abnormal prediction result.

In this threat model, the goal of the defender is to provide robustness against adversarial attacks. It is assumed that the defender can be the administrator of the cloud-based face recognition server. It is also assumed that the defender
has trained the defense model for their system and has pre-deployed it to the IP camera or smartphone at the client.

B. OVERALL OPERATION OF ARGAN
Different from the previous GAN-based defense methods, ARGAN performs a two-step input transformation before feeding the input data into the DNN model. The first transformation is performed on the client side and the second transformation is performed on the cloud server side. The overall defense operation proceeds as follows:
1) The client performs the first transformation, which transforms the legitimate input data X into the robust input data X_r using the generator G, before transmitting the data to the cloud server.
2) The cloud server performs the second transformation, which restores the key features corrupted by the generator G in the first transformation, before feeding the data to the DNN model.
When the adversarial attack h(·) and a legitimate input data X are given, the objective function of ARGAN can be expressed as:

  min_{X_f} max_{G(·)}  Pr(l′ = l) + ‖X_f − X‖₂
  s.t. l′ = f ◦ G⁻¹ ◦ h ◦ G(X),
       l  = f ◦ G⁻¹ ◦ G(X) = f(X),                  (3)

where G(·) is the first transformation performed by the generator on the client side, G⁻¹(·) is the second transformation performed on the server side, and ‖X_f − X‖₂ is the difference between X_f and the legitimate input data X. Here, X_f represents the feeding input data of the DNN model and is defined as follows, depending on whether the adversarial attack h(·) occurs:

  X_f = { G⁻¹ ◦ h ◦ G(X),   if h(·) occurs;
          G⁻¹ ◦ G(X) = X,   otherwise.              (4)

Note that, different from the previous GAN-based defense methods, ARGAN shows good accuracy for legitimate input data. As shown in Equations (3) and (4), if the adversarial attack h(·) does not occur, the DNN model in ARGAN returns the same result as a DNN model with no defense method. This is because the second transformation, G⁻¹(·), is the additive inverse of the first transformation G(·) for a legitimate input data X.
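A minimal sketch of the two transformation steps implied by Equations (3) and (4) is given below; the function names are hypothetical, the generator G and the classifier f are assumed to be given PyTorch modules, and the form of the transmitted perturbation is inferred from Equations (3), (4), and (9) rather than taken from the authors' implementation.

```python
import torch

def client_transform(G, x):
    """First transformation (client): produce the robust input Xr = G(x) and the
    additive-inverse perturbation Pr = -(G(x) - x) needed to undo it later."""
    with torch.no_grad():
        x_r = G(x)
    p_r = -(x_r - x)          # additive inverse of the first transformation
    return x_r, p_r           # both are sent to the cloud server

def server_transform(f, x_received, p_r):
    """Second transformation (server): add Pr back, then classify.
    If no attack occurred, x_received == G(x) and Xf == x exactly (Equation (4))."""
    x_f = x_received + p_r
    return f(x_f).argmax(dim=1)
```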
C. GAN ARCHITECTURE OF ARGAN
The goal of ARGAN is to provide good accuracy for legitimate input data while keeping the DNN model robust against adversarial examples. To achieve this, the generator of ARGAN is designed to reflect the vulnerability of the target DNN model. That is, the generator of ARGAN learns the distribution of the robust input data X_r and produces the robust input data X_r for a given input X. Here, the robust input data X_r is defined as follows:

Definition 1 (Robust Input Data): Let X be a given input data and f be a target DNN model. Let H be a set of adversarial attacks, say H = {FGSM, PGD, DeepFool, C&W}. X_r is said to be robust input data if it has the following characteristic:

  arg max f ◦ p⁻¹ ◦ h(X_r) = arg max f(X)  for ∀h ∈ H
  s.t. X_r = p(X),                                   (5)

where p(·) is a conversion function that adds random noise to the input data X, and p⁻¹(·) is the inverse function of p(·), which restores the input data X transformed by p(·). Note that it is difficult for random noise to satisfy Definition 1 for all of H. Therefore, we obtained X_r through several experiments using various sizes and types of noise.
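The following sketch illustrates one way such an empirical search for X_r could look; the noise scales, trial counts, and the interface of the attack set H are assumptions, not the authors' procedure.

```python
import torch

def make_robust_input(f, x, attacks, noise_scales=(0.05, 0.1, 0.2), n_trials=20):
    """Search for additive random noise p(x) = x + n such that, for every attack h in H,
    f(p^-1(h(p(x)))) still predicts the same label as f(x) (Definition 1).
    `attacks` is assumed to be a list of callables mapping an input batch to its
    adversarial version."""
    y_ref = f(x).argmax(dim=1)
    for scale in noise_scales:
        for _ in range(n_trials):
            n = scale * torch.randn_like(x)   # p(x) = x + n
            x_r = x + n
            ok = all(
                torch.equal(f(h(x_r) - n).argmax(dim=1), y_ref)  # p^-1 removes n again
                for h in attacks
            )
            if ok:
                return x_r, n
    raise RuntimeError("no noise satisfying Definition 1 was found")
```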
On the other hand, the discriminator of ARGAN is trained to distinguish the transformed input data G(X) produced by the generator from the robust input data X_r. In other words, the discriminator encourages the transformed input data G(X) to better reflect the features of the robust input data X_r. Thus, the generator and discriminator of ARGAN are defined to solve the following adversarial zero-sum problem:

  min_{θ_G} max_{θ_D} [ E_{X_r∼p_data(X_r)} log D_{θ_D}(X_r) − E_{X∼p_data(X)} log(D_{θ_D}(G_{θ_G}(X))) ],   (6)

where θ_G and θ_D represent the parameters of the generator G and the discriminator D, respectively.

In Equation (6), the parameter θ_G can be obtained by optimizing the generator loss function. Here, the generator loss function l_G is defined as a joint loss consisting of an adversarial loss, a distance loss, and a target loss, as shown in Fig. 3.

The adversarial loss l_adv measures the discriminator error for the transformed input data G(X) and is calculated as follows:

  l_adv = 1 − log(D(G(X))),                           (7)

which encourages the generator distribution to match the robust input data distribution over the N training data. The distance loss l_dist measures the difference between the transformed input data G(X) and the robust input data X_r and is calculated as follows:

  l_dist = d(G(X), X_r),                              (8)

where d(·) is a distance function that measures the similarity between the transformed input data G(X) and the robust input data X_r. The L2 distance is used so that G(X) matches X_r even in the key features. That is, the distance loss l_dist encourages the internal representation of the transformed input data G(X) to match that of the robust input data X_r.

The target loss l_target measures the error of the target DNN model and is calculated as follows:

  l_target = log(f(r(G(X)) − (G(X) − X), y)),          (9)

where f(·) is the target DNN model, y is the class of the legitimate input data X, and r(·) is a random noise addition function.
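A compact sketch of the joint generator loss in Equations (7)–(9) is shown below, assuming PyTorch modules for G, D, and f, with D producing a probability; the loss weights, the noise scale of r(·), and the exact realization of the target term (here, cross-entropy on the simulated feeding input) are assumptions.

```python
import torch
import torch.nn.functional as F

def generator_loss(G, D, f, x, x_r, y, weights=(1.0, 1.0, 1.0)):
    """Joint generator loss: adversarial + distance + target terms (Eqs. (7)-(9))."""
    g_x = G(x)
    l_adv = 1.0 - torch.log(D(g_x) + 1e-8).mean()            # Eq. (7), D(.) in (0, 1)
    l_dist = F.mse_loss(g_x, x_r)                             # Eq. (8), (squared) L2 distance
    x_f = (g_x + 0.05 * torch.randn_like(g_x)) - (g_x - x)    # r(G(x)) - (G(x) - x)
    l_target = F.cross_entropy(f(x_f), y)                     # Eq. (9), target-model error
    w_adv, w_dist, w_tgt = weights
    return w_adv * l_adv + w_dist * l_dist + w_tgt * l_target
```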
The operational details of the cloud server are shown in Algorithm 2. After decrypting (or not) the input data X_t transmitted from the client (Lines 1 to 4), the cloud server performs the second transformation using the perturbation P_r (Line 5). P_r is the additive inverse of the first transformation and is used to restore the key features. Here, the cloud server does not know whether an adversarial attack has occurred. If the adversarial attack does not occur, the legitimate input data X is restored due to the inverse relationship between the first transformation and the second transformation. Even if the adversarial attack occurs, most of the key features are restored by the second transformation. This is because the key features which affect classification are already corrupted by the generator, and this causes the magnitude of the perturbation added by the adversary to the robust input data X_r to be smaller than the magnitude of the perturbation that would be added to the legitimate input data X. After analyzing the feeding input data X_f using the DNN model (Line 6), the cloud server transmits the encrypted analysis result R_t to the client (Lines 7 to 11).
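The server-side procedure described above can be condensed into the following sketch; it is not the paper's Algorithm 2, and the request object and the encryption helpers are placeholders.

```python
def cloud_server_handle(request, f, decrypt, encrypt):
    """Server side: optionally decrypt, apply the second transformation with Pr,
    classify the feeding input Xf, and return the encrypted analysis result."""
    x_t, p_r = decrypt(request) if request.encrypted else (request.x_t, request.p_r)
    x_f = x_t + p_r                      # second transformation: additive inverse of G
    label = f(x_f).argmax(dim=1)         # analyze the feeding input with the target DNN
    return encrypt(label)                # encrypted analysis result Rt sent back
```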
F. OPERATIONAL EXAMPLE
In this section, we show some examples of how ARGAN works using the CIFAR-10 test dataset.

In Fig. 4a, we show an example of ARGAN for a truck sample under no adversarial attack. Before transmitting to the cloud server, the client transforms the truck sample and calculates the perturbation. Then, the client transmits the transformed truck sample and the calculated perturbation to the cloud server. The cloud server restores the truck sample in the second transformation and feeds it to the DNN model to produce a prediction result. Here, the DNN model produces the correct prediction result because the truck sample is completely restored by the second transformation. In Fig. 4b, we also show some examples of ARGAN on the CIFAR-10 test dataset under no adversarial attack. Here, the legitimate input data represents the randomly selected samples for each class, and the feeding input data represents the input data restored by ARGAN. From the 'difference' row in Fig. 4b, it is observed that there is no difference between the legitimate input data and the feeding input data because each legitimate input data is completely restored by the second transformation.

In Fig. 5a, we show an example of ARGAN for a truck sample under the DeepFool adversarial attack. The operation of the client is the same as in Fig. 4a. However, in this example, the adversary generates an adversarial example for the truck sample using DeepFool and transmits it to the cloud server. Here, the magnitude of the perturbation added by the adversary is very small because the key features which affect prediction are already corrupted when the client performs the first transformation. Then, the cloud server performs the second transformation on the adversarial example and feeds it to the DNN model. Even though the cloud server performs the second transformation on the adversarial example, the DNN model produces the correct prediction result. This is because most of the key features are restored by the second transformation. In Fig. 5b, we also show some examples of ARGAN on the CIFAR-10 test dataset under the DeepFool adversarial attack. As shown in the 'difference' row in Fig. 5b, it is also observed that there is still no large difference between the legitimate input data and the feeding input data because the legitimate input data are restored by the second transformation.
IV. EVALUATION RESULTS
To show the effectiveness of ARGAN, we measured the performance of ARGAN under various conditions, including various datasets and different adversarial attacks [16]–[19]. Specifically, the performance of ARGAN is evaluated by answering the following questions:
• Does ARGAN show better performance than the other state-of-the-art defense methods?
• Does ARGAN show good performance on various datasets collected from diverse applications?
• How do different combinations of loss functions influence the performance of ARGAN?
• Does ARGAN satisfy the basic sanity tests?
• Does ARGAN show good performance under an adaptive attack scenario?

A. EXPERIMENTAL ENVIRONMENT
When evaluating the performance of ARGAN, the experiments are performed using the CIFAR-10 color image dataset [27]. The CIFAR-10 dataset consists of 50,000 training images and 10,000 testing images corresponding to 10 classes. We used 1000 data instances randomly selected from the CIFAR-10 test dataset.

When measuring the influence of different adversarial attacks on ARGAN and the other defense methods, the parameter values are set as follows: (1) 0.3 for the magnitude of perturbation (ε) in FGSM; (2) 10 and 0.3 for the number of iterations (N) and ε, respectively, in PGD; (3) 50 and 0.02 for the maximum number of iterations and the overshoot that prevents updates from vanishing, respectively, in DeepFool; and (4) 0 for the parameter that controls the confidence value (κ) in the C&W method. These parameter values follow the recommended configurations from the cleverhans library [32] and some representative works [20], [33].

The classification models are implemented using TensorFlow-gpu version 1.15.1 and Python version 2.7.15, and the adversarial attacks are performed using the cleverhans software library, which provides standardized reference implementations of adversarial examples [32]. For efficient experiments, the performance is measured on an Ubuntu 18.04.1 LTS machine with kernel version 4.15.0-36-generic, a 2.40 GHz CPU (Intel Xeon CPU E5-2630 v3), a GeForce GTX 2080 Ti, and 32 GB memory.
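For reference, the attack hyper-parameters listed above can be collected into a single configuration, for example as below; the dictionary keys are illustrative and do not follow the cleverhans API.

```python
# Attack settings used in the experiments (Section IV-A); keys are illustrative.
ATTACK_PARAMS = {
    "FGSM":     {"eps": 0.3},
    "PGD":      {"eps": 0.3, "n_iter": 10},
    "DeepFool": {"max_iter": 50, "overshoot": 0.02},
    "CW_L2":    {"confidence": 0.0},   # kappa = 0
}
```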
TABLE 1. Comparison results with GanDef and APE-GAN using the CIFAR-10 dataset.
FIGURE 6. Data distribution of legitimate input data and feeding input data under various adversarial attacks.
TABLE 2. Comparison results with GanDef and APE-GAN using the MNIST dataset.
TABLE 3. Comparison results with Defense-GAN and APE-GAN using the CelebA dataset.

A similarity of 1 means that the legitimate input data and the feeding input data are perfectly the same; in other words, it means that ARGAN can restore the legitimate input data perfectly. It is also observed that ARGAN shows better similarity of the data distribution than APE-GAN for most adversarial attacks. Especially, as shown in Fig. 6(d) and Fig. 6(e), the data distribution of the feeding input data in ARGAN under the state-of-the-art adversarial example generation methods, such as the DeepFool and C&W methods, is almost identical to that of the legitimate input data.
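A sketch of how such a similarity score could be computed is given below; the paper does not spell out the formula in this excerpt, so the common definition 1/(1 + Euclidean distance), which equals 1 for identical inputs, is assumed here.

```python
import numpy as np

def avg_euclidean_similarity(x_legit, x_feed):
    """Average Euclidean similarity between legitimate and feeding inputs.
    Per sample: similarity = 1 / (1 + ||x - x'||_2), so identical inputs give 1."""
    d = np.linalg.norm((x_legit - x_feed).reshape(len(x_legit), -1), axis=1)
    return float(np.mean(1.0 / (1.0 + d)))
```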
Result 1: ARGAN showed better outputs for legitimate input data as well as adversarial examples than the state-of-the-art defense methods. These observations imply that ARGAN can be used as a stand-alone defense method against various adversarial attacks.
TABLE 4. Experimental results with the combinations of various loss functions on ARGAN using CIFAR-10 dataset.
TABLE 5. Classification accuracy of ARGAN under the various denoising techniques of the adversary.
perturbation was greater than 0.2. This means that ARGAN reached the level of random guessing on the CIFAR-10 dataset, which consists of 10 classes.

Result 4: ARGAN satisfies the basic sanity test while providing better robustness than the non-defense model.

5) DOES ARGAN SHOW GOOD PERFORMANCE UNDER AN ADAPTIVE ATTACK SCENARIO?
To evaluate the performance of ARGAN under the worst-case adversary, the performance of ARGAN is measured on ResNet-20 for 1000 data instances randomly selected from the CIFAR-10 test dataset under the scenario where the adversary is aware of the ARGAN architecture. Note that an adversary may assume that the input data has been processed at the client, but does not know the details of how the input data is processed. In other words, the adversary has no direct access to the first transformation. Instead, the adversary can mitigate the effectiveness of the first transformation by applying denoising techniques. On the other hand, an adversary can control the second transformation only when he or she has complete access to the server. Thus, the classification accuracy is measured under two attack scenarios: (1) a white-box attack and (2) a gray-box attack. In the white-box attack, the adversary has complete access to the server and can bypass the second transformation. In the gray-box attack, the adversary only knows the parameter values used for training the deep neural networks and cannot bypass the second transformation. When measuring the classification accuracy of ARGAN under the two attack scenarios, five denoising techniques are used to mitigate the first transformation.
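As an example of such a denoising mitigation, the sketch below implements bit-depth reduction (a standard feature-squeezing filter [33]); the exact set of five techniques used in Table 5 is not reproduced here.

```python
import numpy as np

def reduce_bit_depth(x, bits=5):
    """Quantize inputs in [0, 1] to the given bit depth (e.g. 5 bits -> 32 levels),
    one of the denoising filters an adaptive adversary could apply against the
    client-side transformation."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels
```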
As shown in Table 5, the classification accuracy of ARGAN decreased under both the white-box and gray-box attacks. For example, while the stand-alone ARGAN showed a classification accuracy of 40.65%, ARGAN under the white-box attack and the gray-box attack showed classification accuracies of 13.14% and 28.48% on average, respectively. For the gray-box attack, it is observed that the key features in the input data are restored by the second transformation even though the first transformation is mitigated by the adversary. For example, even when the adversary mitigates the first transformation with a 5-bit bit-depth filter, ARGAN showed a classification accuracy of as much as 83.9% against DeepFool. For the white-box attack, the classification accuracy of ARGAN significantly decreased, to 13.14%, but it is still higher than that of the non-defense architecture. Also, ARGAN against DeepFool under the white-box attack showed lower accuracy than ARGAN against DeepFool under the gray-box attack. For example, while ARGAN against DeepFool under the gray-box attack showed a classification accuracy of 71.12% on average, ARGAN against DeepFool under the white-box attack showed a classification accuracy of 16.46% on average.

Result 6: ARGAN works effectively even under the adaptive attack scenario. In particular, ARGAN shows good enough performance under the gray-box attack.

V. CONCLUSION
With the evolution of deep learning technology, the adversarial example has been highlighted as one of the most severe problems of deep learning technology. In particular, adversarial examples can cause severe damage in cloud-based deep learning environments where an MITM attack can occur. To defend against such adversarial examples, two types of GAN-based defense methods have been actively studied as emerging defense methods: (1) the noise generation architecture and (2) the noise reduction architecture. However, the accuracy of such GAN-based defense methods decreases when predicting or classifying legitimate input data.

In this paper, we propose ARGAN, a new GAN-based defense method that provides good outputs for adversarial examples as well as legitimate input data. The generator in ARGAN produces robust input data by reflecting the
vulnerability of the target DNN model while being trained jointly with the discriminator. From evaluation results under various experimental conditions, it is observed that ARGAN provides robustness to target DNN models against various state-of-the-art adversarial attacks while maintaining high accuracy even for legitimate input data. It is also observed that ARGAN shows better performance than the state-of-the-art GAN-based defense methods such as GanDef, Defense-GAN, and APE-GAN. From such results, we believe that ARGAN demonstrates the need for further studies on GAN-based defense architectures.

REFERENCES
[1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," in Proc. Int. Conf. Learn. Representations, 2014.
[2] T. Ray. (2020). Deep Learning Godfathers Bengio, Hinton, and LeCun Say the Field Can Fix its Flaws. ZDNet. [Online]. Available: https://www.zdnet.com/article/deep-learning-godfathers-bengio-hinton-and-lecun-say-the-field-can-fix-its-flaws/
[3] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Müller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba, "End to end learning for self-driving cars," 2016, arXiv:1604.07316.
[4] F. Amato, A. López, E. M. Peña-Méndez, P. Vanhara, A. Hampl, and J. Havel, "Artificial neural networks in medical diagnosis," J. Appl. Biomed., vol. 11, no. 2, pp. 47–58, 2013. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1214021X14600570
[5] M. Shirvanian and N. Saxena, "Stethoscope: Crypto phones with transparent & robust fingerprint comparisons using inter text-speech transformations," in Proc. 17th Int. Conf. Privacy, Secur. Trust (PST), Aug. 2019, pp. 1–10.
[6] M. F. R. M. Billah and M. A. Adnan, "SMARTLET: A dynamic architecture for real time face recognition in smartphone using cloudlets and cloud," Big Data Res., vol. 17, pp. 45–55, Sep. 2019.
[7] G. Ofualagba, O. Osas, I. Orobor, I. Oseikhuemen, and O. Etse, "Automated student attendance management system using face recognition," Int. J. Educ. Res. Inf. Sci., vol. 5, pp. 31–37, Sep. 2018.
[8] T. Soyata, R. Muraleedharan, C. Funai, M. Kwon, and W. Heinzelman, "Cloud-vision: Real-time face recognition using a mobile-cloudlet-cloud acceleration architecture," in Proc. IEEE Symp. Comput. Commun. (ISCC), Jul. 2012, pp. 59–66. [Online]. Available: http://dblp.uni-trier.de/db/conf/iscc/iscc2012.html#SoyataMFKH12
[9] I. Rosenberg, A. Shabtai, Y. Elovici, and L. Rokach, "Defense methods against adversarial examples for recurrent neural networks," 2019, arXiv:1901.09963.
[10] W. Hu and Y. Tan, "Generating adversarial malware examples for black-box attacks based on GAN," 2017, arXiv:1702.05983.
[11] K. Sun, Z. Zhu, and Z. Lin, "Enhancing the robustness of deep neural networks by boundary conditional GAN," 2019, arXiv:1902.11029.
[12] G. Liu, I. Khalil, and A. Khreishah, "GanDef: A GAN based adversarial training defense for neural network classifier," 2019, arXiv:1903.02585.
[13] P. Samangouei, M. Kabkab, and R. Chellappa, "Defense-GAN: Protecting classifiers against adversarial attacks using generative models," 2018, arXiv:1805.06605.
[14] S. Shen, G. Jin, K. Gao, and Y. Zhang, "APE-GAN: Adversarial perturbation elimination with GAN," 2017, arXiv:1707.05474.
[15] S.-H. Choi, J. Shin, P. Liu, and Y.-H. Choi, "EEJE: Two-step input transformation for robust DNN against adversarial examples," IEEE Trans. Netw. Sci. Eng., vol. 8, no. 2, pp. 908–920, Apr. 2021.
[16] I. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–11.
[17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," 2017, arXiv:1706.06083.
[18] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, "DeepFool: A simple and accurate method to fool deep neural networks," 2015, arXiv:1511.04599.
[19] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," 2016, arXiv:1608.04644.
[20] C. Guo, M. Rana, M. Cisse, and L. van der Maaten, "Countering adversarial images using input transformations," 2017, arXiv:1711.00117.
[21] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer, "Deflecting adversarial attacks with pixel deflection," 2018, arXiv:1801.08926.
[22] H. Lee, S. Han, and J. Lee, "Generative adversarial trainer: Defense to adversarial perturbations with GAN," 2017, arXiv:1705.03387.
[23] X. Liu and C.-J. Hsieh, "Rob-GAN: Generator, discriminator, and adversarial attacker," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 11234–11243.
[24] G. K. Santhanam and P. Grnarova, "Defending against adversarial attacks by leveraging an entire GAN," 2018, arXiv:1805.10652.
[25] Y. Su, G. Sun, W. Fan, X. Lu, and Z. Liu, "Cleaning adversarial perturbations via residual generative network for face verification," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 2597–2601.
[26] U. Hwang, J. Park, H. Jang, S. Yoon, and N. Ik Cho, "PuVAE: A variational autoencoder to purify adversarial examples," 2019, arXiv:1903.00585.
[27] A. Krizhevsky, V. Nair, and G. Hinton. (2009). CIFAR-10 (Canadian Institute for Advanced Research). [Online]. Available: http://www.cs.toronto.edu/~kriz/cifar.html
[28] Z. Whittaker. (2017). Dozens of Popular iPhone Apps Vulnerable to Man-in-the-Middle Attacks. ZDNet. [Online]. Available: https://www.zdnet.com/article/dozens-of-popular-iphone-apps-vulnerable-to-man-in-the-middle-attacks/
[29] F. Giesen, F. Kohlar, and D. Stebila, "On the security of TLS renegotiation," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur. (CCS), 2013, pp. 387–398.
[30] B. Möller, T. Duong, and K. Kotowicz, "This POODLE bites: Exploiting the SSL 3.0 fallback," Google, Mountain View, CA, USA, Tech. Rep., Sep. 2014. Accessed: Mar. 22, 2022. [Online]. Available: https://www.openssl.org/~bodo/ssl-poodle.pdf
[31] G. Liu, I. Khalil, and A. Khreishah, "ZK-GanDef: A GAN based zero knowledge adversarial training defense for neural networks," in Proc. 49th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw. (DSN), Jun. 2019, pp. 64–75, doi: 10.1109/DSN.2019.00021.
[32] N. Papernot, I. Goodfellow, R. Sheatsley, R. Feinman, and P. McDaniel, "Technical report on the CleverHans v2.1.0 adversarial examples library," 2016, arXiv:1610.00768.
[33] W. Xu, D. Evans, and Y. Qi, "Feature squeezing: Detecting adversarial examples in deep neural networks," 2017, arXiv:1704.01155.
[34] Y. LeCun and C. Cortes. (2010). MNIST Handwritten Digit Database. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[35] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 3730–3738.
[36] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, A. Madry, and A. Kurakin, "On evaluating adversarial robustness," 2019, arXiv:1902.06705.

SEOK-HWAN CHOI received the B.E. degree from Pusan National University, Busan, South Korea, in 2016, where he is currently pursuing the Ph.D. degree in computer science and engineering. His research interests include security for artificial intelligence, adversarial examples, and intrusion detection.

JIN-MYEONG SHIN received the B.E. degree from Pusan National University, Busan, South Korea, in 2017, where he is currently pursuing the Ph.D. degree in computer science and engineering. His research interests include security for artificial intelligence, homomorphic encryption, and intrusion detection.

served on more than 100 program committees and reviewed papers for numerous journals. He received the DOE Early Career Principle Investigator Award. He has co-led the effort to make Penn State an NSA-Certified National Center of Excellence in Information Assurance Education and Research. He has advised or co-advised more than 30 Ph.D. dissertations to completion.