Adversarial Examples On Graph Data: Deep Insights Into Attack and Defense
Abstract

Graph deep learning models, such as graph convolutional networks (GCN), achieve remarkable performance on tasks over graph data. Similar to other types of deep models, graph deep learning models often suffer from adversarial attacks. However, compared with non-graph data, the discrete features, the graph connections and the different definitions of imperceptible perturbations bring unique challenges and opportunities for adversarial attacks and defenses on graph data. In this paper, we propose both attack and defense techniques. For the attack, we show that the discreteness problem can easily be resolved by introducing integrated gradients, which accurately reflect the effect of perturbing certain features or edges while still benefiting from parallel computation. For the defense, we observe that the adversarially manipulated graph produced by a targeted attack differs from normal graphs statistically. Based on this observation, we propose a defense approach which inspects the graph and recovers the potential adversarial perturbations. Our experiments on a number of datasets show the effectiveness of the proposed methods.

1 Introduction

Graphs are commonly used to model many real-world relationships, such as social networks [Newman et al., 2002], citation networks, transactions [Ron and Shamir, 2013] and the control flow of programs [Allen, 1970]. Recent advances in deep learning [Kipf and Welling, 2017; Veličković et al., 2018; Cao et al., 2016; Henaff et al., 2015] have expanded its applications to graph data. One common task on graph data is node classification: given a graph and the labels of a portion of the nodes, the goal is to predict the labels of the unlabelled nodes. This can be used to classify unknown roles in a graph, for example, the topics of papers in a citation network or the customer types in a recommendation system. Compared with classic methods [Bhagat et al., 2011; Xu et al., 2013], deep learning has pushed forward the performance of node classification. Graph convolutional networks [Bruna et al., 2013; Edwards and Xie, 2016] and their recent variants [Kipf and Welling, 2017] perform convolution operations in the graph domain by aggregating and combining the information of neighbor nodes. In these works, both the node features and the graph structure (i.e., the edges) are considered when classifying nodes.

Deep learning methods are often criticized for their lack of robustness [Goodfellow et al., 2015]: it is not difficult to craft adversarial examples, obtained by perturbing only a tiny portion of the input, that fool deep neural networks into giving incorrect predictions. Graph convolutional networks are no exception. These vulnerabilities under adversarial attacks are major obstacles for deploying deep learning in safety-critical scenarios. In a graph neural network, a node can be a user of a social network or an e-commerce website. A malicious user may manipulate his profile or connect to targeted users on purpose to mislead the analytics system. Similarly, adding fake comments to specific products can fool the recommender system of a website.

The key challenge in directly adopting existing adversarial attack techniques from non-graph data to graph convolutional networks is the discrete input problem. Specifically, the features of graph nodes are often discrete, and the edges, especially those in unweighted graphs, are also discrete. To address this, some recent studies have proposed greedy methods [Wang et al., 2018; Zügner et al., 2018] to attack graph-based deep learning systems: a greedy method perturbs either the features or the graph structure iteratively, and graph structure and feature statistics are preserved during the greedy attack. In this paper, we show that despite the discrete input issue, the gradients can still be approximated accurately by integrated gradients. Integrated gradients approximate Shapley values [Hart, 1989; Lundberg and Lee, 2016] by integrating partial gradients with respect to the input features from a reference input to the actual input. Integrated gradients greatly improve the efficiency of node and edge selection in comparison to iterative methods.
Compared with the explorations of attacks, the defense against adversarial examples in graph models is not well studied. In this paper, we show that one key reason for the vulnerability of graph models such as GCN is that these models essentially aggregate features according to the graph structure: they heavily rely on nearest-neighbor information when making predictions on target nodes. We looked into the perturbations made by existing attack techniques and found that adding edges which connect to nodes with different features plays the key role in all of the attack methods. In this paper, we show that simply pre-processing the adjacency matrix of the graph is able to identify the manipulated edges. For nodes with bag-of-words (BOW) features, the Jaccard index is effective for measuring the similarity between connected nodes. By removing edges that connect very dissimilar nodes, we are able to defend against targeted adversarial attacks without decreasing the accuracy of the GCN models. Our results on a number of real-world datasets show the effectiveness and efficiency of the proposed attack and defense.

2 Preliminaries

2.1 Graph Convolutional Network

Given an attributed graph G = (A, X), A ∈ {0, 1}^{N×N} is the adjacency matrix and X ∈ {0, 1}^{N×D} represents the D-dimensional binary node features. Assume the indices for nodes and features are V = {1, 2, ..., N} and F = {1, 2, ..., D}, respectively. We then consider the task of semi-supervised node classification, where a subset of nodes V_L ⊆ V is labelled with labels from the classes C = {1, 2, ..., c_K}. The target of the task is to map each node in the graph to a class label. This is often called transductive learning, given the fact that the test nodes are already known during training.

In this work, we study the Graph Convolutional Network (GCN) [Kipf and Welling, 2017], a well-established method for semi-supervised node classification. For GCN, initially, H^{(0)} = X. The GCN model then follows the rule below to aggregate the neighboring features:

H^{(l+1)} = \sigma(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)})    (1)

where Ã = A + I_N is the adjacency matrix of the graph G with self-connections added, D̃ is a diagonal matrix with D̃_{i,i} = Σ_j Ã_{ij}, and σ is the activation function that introduces non-linearity. Each application of the above equation corresponds to one graph convolution layer. A fully connected layer with softmax loss is usually used after L graph convolution layers for the classification. A two-layer GCN is commonly used for semi-supervised node classification tasks [Kipf and Welling, 2017]. The model can therefore be described as:

Z = f(X, A) = \mathrm{softmax}(\hat{A}\, \sigma(\hat{A} X W^{(0)}) W^{(1)})    (2)

where \hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} is essentially the symmetrically normalized adjacency matrix, and W^{(0)} and W^{(1)} are the input-to-hidden and hidden-to-output weights, respectively.
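To make the propagation rule concrete, the following minimal numpy sketch implements the forward pass of the two-layer model in Eqs. (1)-(2); the dense arrays A, X, W0 and W1 are placeholders and training is omitted, so this is an illustration under those assumptions rather than the reference implementation.

import numpy as np

def normalize_adjacency(A):
    # A_hat = D_tilde^{-1/2} (A + I) D_tilde^{-1/2}, the symmetric normalization of Eq. (1)
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W0, W1):
    # Two-layer GCN of Eq. (2): Z = softmax(A_hat ReLU(A_hat X W0) W1)
    A_hat = normalize_adjacency(A)
    H1 = np.maximum(A_hat @ X @ W0, 0.0)   # first graph convolution + ReLU
    return softmax(A_hat @ H1 @ W1)        # second graph convolution + row-wise softmax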
2.2 Gradients Based Adversarial Attacks

Gradients are commonly exploited to attack deep learning models [Yuan et al., 2019]. One can use either the gradients of the loss function or the gradients of the model output w.r.t. the input data to construct the attack. Two examples are the Fast Gradient Sign Method (FGSM) attack and the Jacobian-based Saliency Map Approach (JSMA) attack. FGSM [Ian J. Goodfellow, 2014] generates adversarial examples by performing a gradient update along the direction of the sign of the gradient of the loss function w.r.t. each pixel of the image data. The perturbation can be expressed as:

\eta = \epsilon\, \mathrm{sign}(\nabla_x J_\theta(x, l))    (3)

where ε is the magnitude of the perturbation. The generated example is x' = x + η.

The JSMA attack was first proposed in [Papernot et al., 2016]. By exploiting the forward derivative of a DNN model, one can find the adversarial perturbations that force the model to misclassify the test point into a specific target class. Given a feed-forward neural network F and a sample X, the Jacobian is computed by:

\nabla F(X) = \frac{\partial F(X)}{\partial X} = \left[\frac{\partial F_j(X)}{\partial x_i}\right]_{i \in 1...M,\; j \in 1...N}    (4)

where M and N are the dimensions of the input and of the model output, respectively. To reach a target class t, one wants F_t(X) to increase while F_j(X) for all other j ≠ t decreases. This is accomplished by exploiting the adversarial saliency map, defined as:

S(X, t)[i] = \begin{cases} 0, & \text{if } \frac{\partial F_t(X)}{\partial X_i} < 0 \text{ or } \sum_{j \neq t} \frac{\partial F_j(X)}{\partial X_i} > 0 \\[4pt] \frac{\partial F_t(X)}{\partial X_i} \left|\sum_{j \neq t} \frac{\partial F_j(X)}{\partial X_i}\right|, & \text{otherwise} \end{cases}    (5)

Starting from a normal example, the attacker follows the saliency map and iteratively perturbs the example by a very tiny amount until the predicted label is flipped. For an untargeted attack, one instead tries to minimize the prediction score of the winning class.
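To make Eqs. (3)-(5) concrete, the sketch below computes the FGSM perturbation and a JSMA-style saliency map with PyTorch autograd; model, x (a single input of shape (1, d)), label (a length-1 class-index tensor) and target are placeholders for a generic differentiable classifier, not the graph models used later in this paper.

import torch
import torch.nn.functional as F

def fgsm_perturbation(model, x, label, epsilon):
    # eta = epsilon * sign(grad_x J_theta(x, l)) as in Eq. (3)
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    grad, = torch.autograd.grad(loss, x)
    return epsilon * grad.sign()

def jsma_saliency(model, x, target):
    # Forward derivative of Eq. (4), combined into the saliency map of Eq. (5)
    x = x.clone().detach().requires_grad_(True)
    out = model(x).squeeze(0)                       # class scores of a single sample
    jac = torch.stack([torch.autograd.grad(out[j], x, retain_graph=True)[0].squeeze(0)
                       for j in range(out.numel())])
    d_target = jac[target]                          # dF_t / dX_i
    d_others = jac.sum(dim=0) - d_target            # sum over j != t of dF_j / dX_i
    zero = torch.zeros_like(d_target)
    return torch.where((d_target < 0) | (d_others > 0), zero, d_target * d_others.abs())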
2.3 Defense for Adversarial Examples

Although adversarial attacks on graphs are a relatively new topic, a number of works have studied defenses against adversarial images for convolutional neural networks (e.g., [Xu et al., 2018; Papernot and McDaniel, 2018]). For images, as the feature space is continuous, adversarial examples are carefully crafted with small perturbations. Therefore, in some cases, adding randomization to the images is able to defend against the attacks [Xie et al., 2018]. Other forms of input pre-processing, such as local smoothing [Xu et al., 2018] and image compression [Shaham et al., 2018], have also been used to defend against the attacks. These pre-processing approaches are based on the observation that neighboring pixels of natural images are normally similar. Adversarial training [Tramèr et al., 2018] adds generated examples to the training data to enhance the robustness of the model.

3 Integrated Gradients Guided Attack

Although FGSM and JSMA are not the most sophisticated attack techniques, they are still not well studied for graph models. For image data, the success of FGSM and JSMA benefits from the continuous features in the pixel color space. However, recent explorations of graph adversarial attack techniques [Zügner et al., 2018; Dai et al., 2018] show that simply applying these methods may not lead to successful attacks. These works address the problem by using either greedy methods or reinforcement learning based methods, which are often expensive.

The node features in a graph are often bag-of-words features which can only be 1 or 0. The unweighted edges in a graph are also frequently used to express the existence of specific relationships, thus having only 1 or 0 in the adjacency matrix. When attacking the model, the adversarial perturbations are limited to either changing 1 to 0 or vice versa. The main issue of applying vanilla FGSM and JSMA to graph models is the inaccurate gradients. Given a target node t, for the FGSM attack, \nabla J_{W^{(1)},W^{(2)}}(t) = \frac{\partial J_{W^{(1)},W^{(2)}}(t)}{\partial X} measures the importance of all node features to the loss function value. Here, X is the feature matrix, each row of which describes the features of a node in the graph. For a specific feature i of node n, a larger value of [\nabla J_{W^{(1)},W^{(2)}}]_{ni} indicates that perturbing feature i to 1 helps to get the target node misclassified. However, following this gradient may not help, for two reasons. First, the feature value might already be 1, so we could not perturb it any further. Second, even if the feature value is 0, since a GCN model may not learn a locally linear function between 0 and 1 for this feature value, the result of this perturbation is unpredictable. The situation is similar for JSMA, as the Jacobian of the model shares all the limitations of the gradients of the loss. In other words, vanilla gradients suffer from local gradient problems. Take a simple ReLU network f(x) = ReLU(x) as an example: when x increases from 0 to 1, the function value also increases by 1; however, computing the gradient at x = 0 gives 0, which does not capture the model behavior accurately. To address this, we propose an integrated gradients based method rather than directly using vanilla derivatives for the attacks. Integrated gradients were initially proposed by [Sundararajan et al., 2017] to provide sensitivity and implementation invariance for feature attribution in deep neural networks, particularly convolutional neural networks for images.

The integrated gradient is defined as follows. For a given model F : R^n → [0, 1], let x ∈ R^n be the input and x' be the baseline input (e.g., the black image for image data). Consider a straight-line path from x' to the input x; the integrated gradients are obtained by accumulating the gradients at all points along this path. Formally, for the i-th feature of x, the integrated gradient (IG) is:

\mathrm{IG}_i(F(x)) := (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial F(x' + \alpha(x - x'))}{\partial x_i}\, d\alpha    (6)

For GCN on graph data, we propose a generic attack framework. Given the adjacency matrix A, the feature matrix X, and the target node t, we compute the integrated gradients of the function F_{W^{(1)},W^{(2)}}(A, X, t) w.r.t. I, where I is the input of the attack: I = A indicates edge attacks while I = X indicates feature attacks. When F is the loss function of the GCN model, we call the attack technique FGSM-like attack with integrated gradients, namely IG-FGSM. Similarly, we call the attack technique IG-JSMA when F is the prediction output of the GCN model. For a targeted IG-JSMA or IG-FGSM attack, the optimization goal is to maximize the value of F. Therefore, for the features or edges having the value of 1, we select the features/edges which have the lowest negative IG scores and perturb them to 0. The untargeted IG-JSMA attack aims to minimize the prediction score of the winning class, so we perturb the input dimensions with high IG scores to 0.

Note that unlike image feature attribution, where the baseline input is the black image, we use the all-zero or all-one feature/adjacency matrices to represent the 1 → 0 or 0 → 1 perturbations. When removing a specific edge or setting a specific feature from 1 to 0, we set the adjacency matrix A or the feature matrix X to all-zero, respectively, since we want to describe the overall change pattern of the target function F while gradually adding edges/features back to the current state of A and X. On the contrary, to add edges/features, we compute the change pattern by gradually removing edges/features from the all-one state to the current state, thus setting either A or X to an all-one matrix. To keep the direction of the gradients consistent and ensure the computation is tractable, the IG (for an edge attack) is computed as follows:

\mathrm{IG}(F(X, A, t))[i, j] \approx \begin{cases} (A_{ij} - 0) \times \sum_{k=1}^{m} \frac{\partial F(\frac{k}{m} \times (A_{ij} - 0))}{\partial A_{ij}} \times \frac{1}{m}, & \text{for removing edges} \\[4pt] (1 - A_{ij}) \times \sum_{k=1}^{m} \frac{\partial F(\frac{k}{m} \times (1 - A_{ij}))}{\partial A_{ij}} \times \frac{1}{m}, & \text{for adding edges} \end{cases}    (7)

Algorithm 1 shows the pseudo-code of the untargeted IG-JSMA attack. We compute the integrated gradients of the prediction score of the winning class c w.r.t. the entries of A and X. The integrated gradients are then used as metrics to measure the priority of perturbing specific features or edges in the graph G. Note that the edge and feature values are taken into account and only the scores of feasible perturbations are computed (see Eq. (7)). For example, we only compute the importance of adding an edge if the edge does not already exist. Therefore, for a feature or an edge with high perturbation priority, we perturb it by simply flipping it to the other binary value.

When setting the number of steps m for computing the integrated gradients, one size does not fit all. Essentially, more steps are required to accurately estimate the discrete gradients when the function learned for certain features/edges is non-linear. Therefore, we enlarge the number of steps while attacking the nodes with low classification margins until stable performance is achieved. Moreover, the calculation can be done incrementally if we increase the number of steps by integer multiples.

To ensure the perturbations are unnoticeable, the graph structure and feature statistics should be preserved for edge attacks and feature attacks, respectively. The specific properties to preserve highly depend on the application requirements. For our IG based attacks, we simply check against these application-level requirements when selecting an edge or a feature for perturbation. In practice, this process can be trivial, as many statistics can be pre-computed or re-computed incrementally [Zügner et al., 2018].
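The following sketch approximates the sum in Eq. (7) for a single adjacency entry with PyTorch; score_fn is a placeholder for the winning-class prediction score of the GCN as a differentiable function of a dense float adjacency tensor A, and m is the number of steps. It is an illustration under these assumptions rather than the exact implementation used in the experiments.

import torch

def ig_edge_score(score_fn, A, i, j, m=20):
    # Riemann-sum approximation of Eq. (7) for adjacency entry (i, j).
    # Existing edges (A_ij = 1) are scored against the all-zero baseline,
    # missing edges (A_ij = 0) against the all-one baseline, as described above.
    removing = A[i, j].item() > 0
    baseline = torch.zeros_like(A) if removing else torch.ones_like(A)
    scale = A[i, j].item() if removing else 1.0 - A[i, j].item()
    total = 0.0
    for k in range(1, m + 1):
        # walk from the baseline matrix towards the current graph A in m steps
        A_k = (baseline + (float(k) / m) * (A - baseline)).detach().requires_grad_(True)
        grad, = torch.autograd.grad(score_fn(A_k), A_k)
        total += grad[i, j].item()
    return scale * total / m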
Algorithm 1: IG-JSMA - Integrated Gradient Guided untargeted JSMA attack on GCN

Input: graph G^(0) = (A^(0), X^(0)), target node v_0;
       F: the GCN model trained on G^(0);
       budget ∆: the maximum number of perturbations.
Output: modified graph G' = (A', X').
 1 Procedure Attack()
 2   // compute the gradients as the perturbation scores for edges and features
 3   s_e ← calculate_edge_importance(A)
 4   s_f ← calculate_feature_importance(X)
 5   // sort features and edges according to their scores
 6   features ← sort_by_importance(s_f)
 7   edges ← sort_by_importance(s_e)
 8   f ← features.first, e ← edges.first
 9   while |A' − A| + |X' − X| < ∆ do
10     // decide whether to perturb the top-ranked edge or the top-ranked feature
11     if s_e[e] > s_f[f] then
12       flip edge e
13       e ← e.next
14     else
15       flip feature f
16       f ← f.next
17     end
18   end
19   return G'

4 Defense for Adversarial Graph

In order to defend against adversarial targeted attacks on GCNs, we first hypothesize that GCNs are easily attacked because the GCN models strongly rely on the graph structure and on local aggregation. The model trained on the attacked graph therefore suffers from the attack surface crafted in the adversarial graph, as it is well known that adversarial attacks on deep learning systems transfer to models with similar architectures trained on the same dataset. Existing attacks on GCN models are successful because the attacked graphs are directly used to train the new model. Given that, one feasible defense is to make the adjacency matrix trainable: if the edge weights are learned during the training process, they may evolve so that the graph becomes different from the graph crafted by the adversary.

We verify this idea by making the edge weights trainable in GCN models. In the CORA-ML dataset, we select a node that is correctly classified and has the highest prediction score for its ground-truth class. The adversarial graph is constructed using nettack [Zügner et al., 2018]. Without any defense, the target node is misclassified with a confidence of 0.998 after the attack. Our defense initializes the edge weights just as in the adversarial graph. We then train the GCN model without making any additional modifications to the loss function or other parameters of the model. Interestingly, with such a simple defense method, the target node is correctly classified with high confidence (0.912) after the attack.

To explain why the defense works, we observe the following characteristics of the attacks. First, perturbing edges is more effective than modifying the features. This is consistent across all of the attacks (i.e., FGSM, JSMA, nettack, and IG-JSMA): feature-only perturbations generally fail to change the predicted class of the target node. Moreover, the attack approaches tend to favour adding edges over removing edges. Second, nodes with more neighbors are more difficult to attack than those with fewer neighbors. This is also consistent with the observations in [Zügner et al., 2018] that nodes with higher degrees have higher classification accuracy in both the clean and the attacked graphs.

Last, the attacks tend to connect the target node to nodes with different features and labels. We find that this is the most powerful way to perform attacks. We verify this observation on the CORA-ML dataset. To measure the similarity of the features, we use the Jaccard similarity score, since the features of the CORA-ML dataset are bag-of-words. Note that our defense mechanism is generic, while the similarity measure may vary among datasets: for graphs with other types of features, such as numeric features, we may use different similarity measures. Given two nodes u and v with n binary features, the Jaccard similarity score measures the overlap of their features. Each feature of u and v can either be 0 or 1, and the number of features in each combination is counted as follows: M_{11} is the number of features with a value of 1 in both u and v; M_{01} is the number of features with a value of 0 in node u but 1 in node v; similarly, M_{10} is the number of features with a value of 1 in node u but 0 in node v; and M_{00} is the number of features which are 0 in both nodes. The Jaccard similarity score is given as

J_{u,v} = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}.    (8)
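For binary bag-of-words features, Eq. (8) can be computed directly; the following numpy sketch is an illustration, and returning 0 when the two nodes have no non-zero features at all is an assumed convention for the degenerate case.

import numpy as np

def jaccard_similarity(u, v):
    # Eq. (8): J_{u,v} = M11 / (M01 + M10 + M11) for two binary feature vectors
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    m11 = np.sum(u & v)            # features that are 1 in both nodes
    m01_m10 = np.sum(u ^ v)        # features that differ between the two nodes
    denom = m11 + m01_m10
    return m11 / denom if denom > 0 else 0.0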
We train a two-layer GCN on the CORA-ML dataset and study the nodes that are classified correctly with high probability (i.e., ≥ 0.8). For these nodes, Figure 1 shows the histograms of the Jaccard similarity scores between connected nodes before and after the FGSM attack. The adversarial attack significantly increases the number of neighbors which have low similarity scores to the target nodes. This also holds for nettack [Zügner et al., 2018]. For example, we enable both feature and edge attacks for nettack and attack node 200 in the GCN model trained on CORA-ML. Given the node degree of 3, the attack removes the edge 200 → 1582 because node 1582 and node 200 are similar (J_{1582,200} = 0.113). Meanwhile, the attack adds the edges 200 → 1762 and 200 → 350, and node 200 shares no feature similarity with these two nodes. No features were perturbed in this experiment.

[Figure 1: Histograms for the Jaccard similarities between connected nodes before and after the FGSM attack. (a) Clean, (b) Attacked; y-axis: percentage (%), x-axis: Jaccard similarity between connected nodes.]

This result explains our observations. Compared with deep convolutional neural networks for image data, which often have more layers and parameters, GNNs such as GCN for node classification are relatively simple: they essentially aggregate the features according to the graph structure. For a target node, an adversarially crafted graph attempts to connect nodes with different features and labels to pollute the representation of the target node and make it less similar to the nodes within its correct class. Correspondingly, when removing edges, the attack tends to remove the edges connecting the nodes that share many similarities with the target node. Edge attacks are more effective due to the fact that adding or removing one edge affects all the feature dimensions during aggregation. In contrast, modifying one feature only affects one dimension of the feature vector, and the perturbation can easily be masked by the other neighbors of nodes with high degrees.

Based on these observations, we make another hypothesis: the above defense approach works because the model assigns lower weights to the edges that connect the target node to nodes sharing little feature similarity with it. To verify this, we plot the learned weights and the Jaccard similarity scores of the end nodes for the edges starting from the target node (see Figure 2). Note that for the target node we choose, the Jaccard similarity scores between every neighbor of the target node and itself are larger than 0 in the clean graph; the edges with zero similarity scores are all added by the attack. As expected, the model learns low weights for most of the edges with low similarity scores.

[Figure 2: The normalized learned edge weights and the Jaccard similarity scores for the end nodes of the edges. Each value of the x-axis represents an edge in the neighborhood of the target node (x-axis: edges connecting the target node to its neighbors).]

To make the defense more efficient, we do not even need to use learnable edge weights as the defense. Learning the edge weights inevitably introduces extra parameters to the model, which may affect its scalability and accuracy. A simpler approach is potentially as effective, based on the following: first, normal nodes generally do not connect to many nodes that share no similarities with them; second, the learning process essentially assigns low weights to the edges connecting two dissimilar nodes. We therefore propose a simple yet effective defense approach based on this insight.

Our defense approach is pre-processing based: we perform a pre-processing step on a given graph before training. We check the adjacency matrix of the graph and inspect the edges. All the edges that connect nodes with a low similarity score (e.g., = 0) are selected as candidates for removal. Although the clean graph may also have a small number of such edges, we find that removing these edges does little harm to the prediction of the target node. On the contrary, the removal of these edges may even improve the prediction in some cases. This is intuitive, as aggregating features from nodes that differ sharply from the target often over-smooths the node representations. In fact, a recent study [Wu et al., 2019] shows that the non-linearity and the multiple weight matrices at different layers do not contribute much to the predictive capabilities of GCN models but introduce unnecessary complexity, and [Zügner et al., 2018] uses a simplified surrogate model to achieve the attacks on GCN models for the same reason. Dai et al. [Dai et al., 2018] briefly introduce a defense method that drops some edges during training and show that it decreases the attack rate slightly. In fact, their method works only when the edges connecting dissimilar nodes are removed; however, this defense fails to differentiate the useful edges from those that need to be removed, thus achieving sub-optimal defense performance. The proposed defense is computationally efficient, as it only makes one pass over the existing edges in the graph, thus having a complexity of O(N) where N is the number of edges. For large graphs, calculating the similarity scores can easily be parallelized in the implementation.
with several baselines including random attacks, FGSM, and In Figure 4, the node color represents the class of the node.
nettack. Note that for the baselines, we conducted direct at- Round nodes indicate positive importance scores while dia-
tacks on the features of the target node or the edges directly mond nodes indicate negative importance score. The node
connected to the target node. Direct attacks achieve much size indicates the value of the positive/negative importance
better attacks so that can act as stronger baselines. score. A larger node means higher importance. Similarly, red
To evaluate how effective is the attack, we use classifica- edges are the edges which have positive importance scores
tion margins as the metric. For a target node v, the classi- while blue ones have negative importance scores. Thicker
fication margin of v is Zv,c − maxc0 6=c Zv,c0 where c is the edges correspond to more important edges in the graph and
ground truth class, Zv,c is the probability of class c given to the pentagram represents the target node in the attack.
the node v by the graph model. A lower classification mar- Figure 4a, 4b and 4c show the node importance results
gin indicates better attack performance. Figure 3 shows the of brute-force, vanilla gradients and integrated gradients ap-
classification margins of nodes after re-training the model on proach respectively (# of steps = 20). The vanilla gradients re-
the modified graph. We found that IG-JSMA outperforms the veal little information about node/edge importance as almost
baselines. More remarkably, IG-JSMA is quite stable as the all the edges are assigned with certain importance scores and
classification margins have much less variance. Just as stated it is difficult to see the actual node/edge influence. However,
in [Zügner et al., 2018], the vanilla gradient-based methods, in the brute-force case, we notice that the majority number of
such as FGSM are not able to capture the actual change of edges are considered not important for the target node. More-
loss for discrete data. Similarly, while used to describe the over, vanilla gradients underestimate the importance of nodes
saliency map, the vanilla gradients are also not accurate. overall. The integrated gradients, as shown in Figure 4c is
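The classification margin itself is straightforward to compute; the sketch below is only an illustration of the metric, assuming Z is the N × K matrix of predicted class probabilities.

import numpy as np

def classification_margin(Z, v, c):
    # Z_{v,c} - max_{c' != c} Z_{v,c'}; a negative margin means node v is misclassified
    probs = np.array(Z[v], dtype=float)
    true_prob = probs[c]
    probs[c] = -np.inf
    return true_prob - probs.max()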
To demonstrate the effectiveness of IG-JSMA, we also compare it with the original JSMA method, where the saliency map is computed from the vanilla gradients. Table 2 compares the ratio of correctly classified nodes after the JSMA and IG-JSMA attacks for 100 randomly sampled nodes. A lower value is better, as more nodes are misclassified. We can see that IG-JSMA outperforms the JSMA attack. This shows that the saliency map given by integrated gradients approximates the change patterns of the discrete features/edges better.

Table 2: The ratio of correctly classified nodes under the JSMA and IG-JSMA attacks.

Dataset   | CORA | Citeseer | Polblogs
JSMA      | 0.04 | 0.06     | 0.04
IG-JSMA   | 0.00 | 0.01     | 0.01

Figure 4 gives an intuitive example of this. For this graph, we conducted an evasion attack, where the parameters of the model are kept fixed to those trained on the clean graph. For a target node in the graph, given a two-layer GCN model, the prediction of the target node only relies on its two-hop ego graph. We define the importance of a feature/an edge as follows. For a target node v, the brute-force method to measure the importance of the nodes and edges is to remove one node or one edge at a time from the graph and check the change of the prediction score of the target node.

Assume the prediction score for the winning class c is p_c. After setting entry A_{ij} of the adjacency matrix from 1 to 0, p_c changes to p'_c. We define the importance of the edge as ∆p_c = p_c − p'_c. To measure the importance of a node, we can simply remove all the edges connected to the node and see how the prediction scores change. These importance values can be regarded as the ground-truth discrete gradients.

Both vanilla gradients and integrated gradients are approximations of the ground-truth importance scores. The node importance can be approximated by the sum of the gradients of the prediction score w.r.t. all the features of the node, as well as the gradients w.r.t. the entries of the adjacency matrix.
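A sketch of this brute-force procedure for edges is given below; score_fn is a placeholder that maps an adjacency matrix to the target node's winning-class probability (e.g., a wrapper around a trained GCN), and the graph is assumed to be a dense undirected binary numpy array.

import numpy as np

def brute_force_edge_importance(score_fn, A):
    # Delta p_c for every existing edge: remove the edge, re-evaluate the winning-class
    # score of the target node, and restore the edge (score_fn(A) returns that score).
    p_c = score_fn(A)
    importance = {}
    rows, cols = np.nonzero(np.triu(A, k=1))
    for i, j in zip(rows, cols):
        A_mod = A.copy()
        A_mod[i, j] = A_mod[j, i] = 0
        importance[(i, j)] = p_c - score_fn(A_mod)   # Delta p_c = p_c - p_c'
    return importance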
In Figure 4, the node color represents the class of the node. Round nodes indicate positive importance scores while diamond nodes indicate negative importance scores. The node size indicates the magnitude of the positive/negative importance score: a larger node means higher importance. Similarly, red edges are the edges which have positive importance scores while blue ones have negative importance scores, thicker edges correspond to more important edges in the graph, and the pentagram represents the target node of the attack.

Figures 4a, 4b and 4c show the node importance results of the brute-force, vanilla gradients and integrated gradients approaches, respectively (# of steps = 20). The vanilla gradients reveal little information about node/edge importance, as almost all the edges are assigned some importance score and it is difficult to see the actual node/edge influence. However, in the brute-force case, we notice that the majority of edges are considered not important for the target node. Moreover, vanilla gradients underestimate the importance of nodes overall. The integrated gradients result, as shown in Figure 4c, is consistent with the ground truth produced by the brute-force approach shown in Figure 4a. With only 20 steps along the path, integrated gradients provide accurate approximations of the importance scores. This shows that the integrated gradients approach is effective when used to guide adversarial attacks on graphs with discrete values.

5.2 Defense

In the following, we study the effectiveness of the proposed defense technique under different settings. We use the CORA-ML and Citeseer datasets, which have features for the nodes. We first evaluate whether the proposed defense affects the performance of the model. Table 3 shows the accuracy of the GCN models with/without the defense.

Table 3: Accuracy (%) of models on clean data with/without the proposed defense. We remove the outliers (i.e., accuracy ≤ 75%/65% for CORA-ML/Citeseer) due to the high variance.

Dataset   | w/o defense | w/ defense
CORA-ML   | 80.9 ± 0.6  | 80.7 ± 0.7
Citeseer  | 69.5 ± 0.7  | 69.6 ± 0.8

We find that the proposed defense is cheap to use, as the pre-processing of our defense method has almost no negative impact on the performance of the GCN models. Moreover, the time overhead is negligible: enabling the defense on the GCN models for the two datasets increases the training run time by only 7.52s and 3.79s, respectively. Note that the run time results are obtained using our non-optimized Python implementation.

For different attacks, we then evaluate how the classification margins and the accuracy of the attacked nodes change with/without the defense. As in the experiments of the transductive attack, we select 40 nodes with different prediction scores. The statistics of the selected nodes are as follows: for the CORA-ML and Citeseer datasets, we train the GCN models on the clean graphs, and the selected nodes have classification margins of 0.693 ± 0.340 and 0.636 ± 0.419, respectively.

Table 4: Classification margins (CM) and accuracy (Accu) of the attacked nodes for the GCN models under different attacks.

Dataset   Attack    CM w/ defense    CM no defense     Accu w/ defense   Accu no defense
CORA      FGSM      0.299 ± 0.741    -0.833 ± 0.210    0.625             0.025
CORA      JSMA      0.419 ± 0.567    -0.828 ± 0.225    0.775             0
CORA      nettack   0.242 ± 0.728    -0.839 ± 0.343    0.600             0.025
CORA      IG-JSMA   0.397 ± 0.553    -0.897 ± 0.114    0.750             0
Citeseer  FGSM      0.451 ± 0.489    -0.777 ± 0.279    0.825             0.025
Citeseer  JSMA      0.501 ± 0.531    -0.806 ± 0.186    0.775             0.05
Citeseer  nettack   0.421 ± 0.468    -0.787 ± 0.332    0.775             0.025
Citeseer  IG-JSMA   0.495 ± 0.507    -0.876 ± 0.186    0.800             0.025

The results are given in Table 4. First of all, without the defense, most of the selected nodes are misclassified, as the accuracy is always under 0.05 for any of the attacks. By enabling the defense approach, the accuracy can be significantly improved regardless of the attack method. This, to some degree, shows that all the attack methods seek similar edges to attack and that the proposed defense approach is attack-independent. Although a few nodes are still misclassified with the defense, the prediction confidence for their winning class is much lower, since the classification margins increase. It therefore becomes harder to fool the users, because manual checks are generally involved for predictions with low confidence. Overall, the proposed defense is effective even though we only remove the edges that connect nodes with a Jaccard similarity score of 0.

6 Conclusions and Discussion

Graph neural networks (GNN) significantly improve the analytic performance on many types of graph data. However, like deep neural networks on other types of data, GNNs suffer from robustness problems. In this paper, we gave insight into the robustness problem of graph convolutional networks (GCN). We proposed an integrated gradients based attack method that outperforms existing iterative and gradient-based techniques in terms of attack performance. We also analyzed attacks on GCN and revealed that the robustness issue is rooted in the local aggregation of GCN. We give an effective defense method to improve the robustness of GCN models. We demonstrated the effectiveness and efficiency of our methods on benchmark data.
References

[Adamic and Glance, 2005] Lada A. Adamic and Natalie Glance. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pages 36-43. ACM, 2005.

[Allen, 1970] Frances E. Allen. Control flow analysis. In ACM Sigplan Notices, volume 5, pages 1-19. ACM, 1970.

[Bhagat et al., 2011] Smriti Bhagat, Graham Cormode, and S. Muthukrishnan. Node classification in social networks. In Social Network Data Analytics, pages 115-148. Springer, 2011.

[Bojchevski and Günnemann, 2018] Aleksandar Bojchevski and Stephan Günnemann. Deep gaussian embedding of attributed graphs: Unsupervised inductive learning via ranking. Proceedings of ICLR'18, 2018.

[Bruna et al., 2013] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

[Cao et al., 2016] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In Proceedings of AAAI'16, 2016.

[Dai et al., 2018] Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, and Le Song. Adversarial attack on graph structured data. Proceedings of ICML'18, 2018.

[Edwards and Xie, 2016] Michael Edwards and Xianghua Xie. Graph based convolutional neural network. Proceedings of BMVC'16, 2016.

[Goodfellow et al., 2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. Proceedings of ICLR'15, 2015.

[Hart, 1989] Sergiu Hart. Shapley value. In Game Theory, pages 210-216. Springer, 1989.

[Henaff et al., 2015] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. Proceedings of NeurIPS'15, 2015.

[Ian J. Goodfellow, 2014] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[Kipf and Welling, 2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. Proceedings of ICLR'17, 2017.

[Lundberg and Lee, 2016] Scott Lundberg and Su-In Lee. An unexpected unity among methods for interpreting model predictions. Proceedings of NeurIPS'16, 2016.

[Newman et al., 2002] Mark E. J. Newman, Duncan J. Watts, and Steven H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1):2566-2572, 2002.

[Papernot and McDaniel, 2018] Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018.

[Papernot et al., 2016] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372-387. IEEE, 2016.

[Ron and Shamir, 2013] Dorit Ron and Adi Shamir. Quantitative analysis of the full bitcoin transaction graph. In International Conference on Financial Cryptography and Data Security, pages 6-24. Springer, 2013.

[Shaham et al., 2018] Uri Shaham, James Garritano, Yutaro Yamada, Ethan Weinberger, Alex Cloninger, Xiuyuan Cheng, Kelly Stanton, and Yuval Kluger. Defending against adversarial images using basis functions transformations. arXiv preprint arXiv:1803.10840, 2018.

[Sundararajan et al., 2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. Proceedings of ICML'17, 2017.

[Tramèr et al., 2018] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. Proceedings of ICLR'18, 2018.

[Veličković et al., 2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. Proceedings of ICLR'18, 2018.

[Wang et al., 2018] Xiaoyun Wang, Joe Eaton, Cho-Jui Hsieh, and Felix Wu. Attack graph convolutional networks by adding fake nodes. arXiv preprint arXiv:1810.10751, 2018.

[Wu et al., 2019] Felix Wu, Tianyi Zhang, Amauri Holanda Souza Jr., Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153, 2019.

[Xie et al., 2018] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. Proceedings of ICLR'18, 2018.

[Xu et al., 2013] Huan Xu, Yujiu Yang, Liangwei Wang, and Wenhuang Liu. Node classification in social network via a factor graph model. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 213-224. Springer, 2013.

[Xu et al., 2018] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. Proceedings of NDSS'18, 2018.

[Yuan et al., 2019] Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 2019.

[Zügner et al., 2018] Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2847-2856. ACM, 2018.