Graph Convolutional Neural Networks Sensitivity under Probabilistic Error Model
Abstract
Graph Neural Networks (GNNs), particularly Graph Convolutional Neural Networks (GCNNs), have emerged as pivotal instruments in machine learning and signal processing for graph-structured data. This paper proposes an analysis framework to investigate the sensitivity of GCNNs to probabilistic graph perturbations, which directly impact the graph shift operator (GSO). Our study establishes tight expected GSO error bounds, which are explicitly linked to the error model parameters, and reveals a linear relationship between GSO perturbations and the resulting output differences at each layer of GCNNs. This linearity demonstrates that a single-layer GCNN maintains stability under graph edge perturbations, provided that the GSO errors remain bounded, regardless of the perturbation scale. For multilayer GCNNs, the dependency of the system’s output difference on GSO perturbations is shown to be a recursion of linearity. Finally, we exemplify the framework with the Graph Isomorphism Network (GIN) and Simple Graph Convolution Network (SGCN). Experiments validate our theoretical derivations and the effectiveness of our approach.
Index Terms:
Sensitivity analysis, graph convolutional neural network, graph shift operator, structural perturbation
I Introduction
Graph neural networks (GNNs) have steadily gained prominence as an innovative tool in machine learning and signal processing, exhibiting unparalleled efficiency in processing data encapsulated within complex graph structures [1, 2, 3]. Uniquely designed, GNNs utilize a system of intricately coupled graph filters (GFs) with nonlinear activation functions, enabling the effective transformation and propagation of information within the graph [4].
Different GNN architectures can be delineated based on the GFs, which are integral to the functioning of GNNs. A notable example of these architectures uses graph-convolutional filters. The GNN employing this design is known as the Graph Convolutional Neural Network (GCNN). Some examples of GCNNs include the vanilla Graph Convolutional Network (GCN) [5], Graph Isomorphism Network (GIN) [6], Simple Graph Convolution Network (SGCN) [7, 8], and Cayley Graph Convolutional Network (CayleyNet) [9]. In contrast to the aforementioned GCNNs, there exist non-convolutional GNNs such as the Graph Attention Network (GAT) [10] and Edge Varying Graph Neural Network (EdgeNet) [11], which utilize edge-varying graph filters [12].
This paper focuses on the GCNN, which blends graph convolutional filters with nonlinear activation functions. Graph convolutional filters couple the data and the graph through an underlying graph matrix, named the graph shift operator (GSO), which can be, for example, the graph adjacency matrix or the graph Laplacian, encoding the interactions between data samples [13]. Based on the GSO, the graph filter captures the structural information by aggregating the data propagated within its hop neighborhoods, and feeds it to the next layer after further processing, such as graph coarsening and pooling [14, 6]. As the key component of GCNNs, the GSO represents the graph structure and is typically assumed to be perfectly known. The precise estimation of the hidden graph structure is essential for successfully performing feature propagation in a convolution layer [15, 16, 17].
GSOs form the foundation of GCNN structures. Any perturbation in the graph structure has a direct bearing on the operations of a GCNN. Previous studies in graph signal processing (GSP) and GNN have examined both deterministic and probabilistic perturbations affecting GSOs. A probabilistic graph perturbation model for a partially correct estimation of the adjacency matrix is proposed in [18], where a perturbed graph is modeled as a combination of the true adjacency matrix and a perturbation term specified by Erdős-Rényi (ER) graph. The work [19] explores perturbations in graphs using random edge sampling, a scheme characterized by randomly deleting existing edges. In [20], a GSO perturbation strategy is formulated leveraging a general first-order optimization method, which concurrently imposes a constraint on the extent of edge perturbation. In [21], the authors propose to perturb eigenvector pairs of the graph Laplacian, considering single and multiple edge perturbations, under small perturbation assumption. Here, small perturbations refer to changes in a small percentage of edges.
The stability of GFs and GCNNs under GSO perturbations is one of the key research areas in signal processing (SP) and computer science (CS). In the SP community, research focuses on the relationship between the system’s output differences and the GSO differences under evasion attacks, emphasizing changes in the learned representation. In [22], the authors provide bounds on the output changes of spectral GFs resulting from double edge rewiring on normalized augmented adjacency matrices. That study also extends the stability results to SGCN and gives theoretical bounds. In [23], the authors present interpretable bounds to verify the stability of spectral GFs against graph edge perturbations. These bounds are derived under the constraint that the degree of any node after perturbation cannot exceed twice its original degree. In [24], the authors apply an additive error model with norm-bounded perturbations on unspecified GSOs to provide stability bounds for multi-layer GCNNs. This model is not generic as it does not explicitly account for the perturbation of graph edges. It primarily considers perturbations resembling a uniform scaling of edge weights, a limitation noted in [25]. Additionally, the bound on the error matrix is defined based on the smallest operator norm achievable via node permutation. However, this permutation assumption may not suit social or citation networks where node identification is label-dependent, as noted in [22]. In [19], the authors consider random edge deletions as the perturbation on GSOs, specifically focusing on adjacency matrices and graph Laplacians. That work concludes that both the GF and GCNN are linearly stable with respect to several factors, including the probability of edge dropping, nonlinearity, and the width and depth of the network architecture. Nevertheless, in the experiments of [19], the maximum edge deletion probability is set to , indicating a limited perturbation scale.
Works in CS [26, 27, 28, 29, 30] focus on the effects of adversarial attacks on GCNN accuracy, considering both evasion and poisoning attacks. The focus is on the impact of such attacks on the downstream task. For instance, under evasion attacks, [27] demonstrates a reduction in GCNN accuracy under small perturbations that preserve the degree distribution, and [30] demonstrates a significant drop in the accuracy of GCN when 5% of edges are altered.
In this paper, we introduce a sensitivity analysis framework for GCNN under the probabilistic edge perturbation model [18]. We understand stability as the characteristic of a system to maintain bounded output under perturbations, while sensitivity analysis is an examination of how variations in the output depend on influencing factors. Our analysis concentrates on studying the effects of evasion attacks. We use statistical analysis to give expected bounds for GSO errors (Theorem 1 and Proposition 1). These error bounds are explicitly dependent on the parameters of the error model. Then, we establish a sensitivity analysis framework for both GF (Theorem 2) and multilayer GCNN (Theorem 3) by giving expected bounds for differences of outputs because of GSO errors. Finally, we exemplify the framework with GIN (Corollary 1) and SGCN (Corollary 2), and empirically show that under large-scale graph perturbations (significant edge modifications), GCNNs maintain stability.
Our detailed contributions are summarized as follows.
1. Probabilistic error model. The probabilistic edge perturbation model considered is general and practically appealing. It is grounded in stochastic block models, supports both deletion and addition of edges, and permits a broader perturbation scale. The corresponding analysis approach contrasts with the constrained perturbations in existing GCNN analyses, which involve such restrictions as permitting only edge deletions in [19], double edge rewiring in [22], and small norm bounded errors in [24].
2. Tight GSO error bound. We give tighter expected bounds on GSO errors compared to our previous conference work [31], in which the bounds are deterministic. We use the norm suggested in [23] to bound the norm and make this bound interpretable by specifically tracking the changed node degrees, which can be directly linked to parameters of the error model (probabilities of deleting and adding edges). Additionally, our bound does not require the eigendecomposition of GSO [24, 19], which is computationally heavy for large graphs.
3. Generic sensitivity analysis framework. Compared to previous works [24, 19, 22], our proposed analysis framework is more generic in the following aspects. (i) We remove the assumption on limited scale perturbation and allow for a large perturbation budget, for instance that 50% of edges are deleted and 70% of edges are added (compared to the original number of edges). Our analysis is shown empirically to be valid even under such perturbation, while the maximum edge perturbation addressed in the current literature is of edges [23]. (ii) We provide expected bounds under a probabilistic perspective, while the deterministic perturbations can be seen as special cases of our analysis. (iii) This framework is applicable to general GCNN models, with specific adjustments for GSO, graph shifts count, network layer count, and activation functions.
Outline. The remainder of this paper is structured as follows. In Sections II and III, we establish the fundamentals of GCNNs and proceed to formulate the problem. Section IV bounds the difference between original and perturbed GSOs, with particular emphasis on two cases: the adjacency matrix and its normalized version. Section V encompasses both GFs and GCNNs like GIN and SGCN, and demonstrates that variations in the output of each GCNN layer in response to graph perturbations are linearly bounded. Empirical validations presented in Section VI use numerical experiments with both synthetic and real-world data to corroborate the proposed theorems, thereby attesting to the reliability of our sensitivity analysis model. Section VII concludes the paper and discusses the future work.
Notation. Boldface lower case letters such as represent column vectors, while boldface capital letters like denote matrices. A vector full of ones is symbolized as , and a matrix full of ones is expressed as . The identity matrix of size is represented as . The -th row or column of the matrix is given as , and the -th element in matrix is denoted as or . Vector norm is defined as follows: . Matrix norms are defined as follows: the norm is represented as , the norm as (largest singular value of ), and the norm as . In addition, the Hadamard product is expressed with the symbol . We use for probability, for expectation, for variance, and for covariance.
II Preliminaries
Graph theory, GSP, and GCNN form the cornerstone of data analysis in irregular domains. The GSO plays a key role in directing information flow across the graph, thereby enabling the creation of GFs and the design of GCNNs.
The sensitivity analysis of the GSO, which essentially involves matrix sensitivity analysis, provides an empirical insight into the system’s resilience to perturbations. The GCNN, with its local architecture, maintains most of the properties of the graph convolutional filter, making it an ideal tool for sensitivity analysis. These preliminary concepts are essential for the implementation of sensitivity analysis in a graph-based context.
Graph Basics. Consider an undirected and unweighted graph , where the node set consists of nodes, the edge set is a subset of , and the edge weighting function assigns binary edges. For an edge , we have due to our focus on undirected and unweighted graphs. We define the -hop neighboring set of a node as , the degree of node as , and the minimum degree of nodes around as .
GSO. The Graph Shift Operator (GSO) symbolizes the structure of a graph and guides the passage and fusion of signals between neighboring nodes. It is often represented by the adjacency matrix $\mathbf{A}$, the Laplacian $\mathbf{L}$, or their normalized counterparts. These representations capture the graph’s connectivity patterns, making them indispensable tools for data analysis in both regular and irregular domains [32]. The adjacency matrix $\mathbf{A}$ incorporates both the weighting function and the graph topology, where $A_{uv} = 1$ if nodes $u$ and $v$ are connected by an edge and $A_{uv} = 0$ otherwise. The Laplacian matrix is defined by the adjacency matrix and a diagonal degree matrix $\mathbf{D}$. Specifically, $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is a diagonal matrix with $D_{uu} = d_u = \sum_{v} A_{uv}$. The value $d_u$ denotes the degree of node $u$. Moreover, normalized versions of the adjacency and Laplacian matrices are defined as $\mathbf{A}_n = \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ and $\mathbf{L}_n = \mathbf{D}^{-1/2}\mathbf{L}\mathbf{D}^{-1/2}$, respectively. These normalized versions help maintain consistency and manage potential variations in the scale of the data.
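The GSO candidates described above can be made concrete with a small NumPy sketch. The 4-node path graph and all variable names are illustrative assumptions, and the normalized forms are taken to be the usual symmetric normalizations, which require every node to have nonzero degree.

```python
import numpy as np

# Hypothetical example: an undirected, unweighted path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = A.sum(axis=1)                      # node degrees
D = np.diag(d)                         # diagonal degree matrix
L = D - A                              # graph Laplacian

D_inv_sqrt = np.diag(d ** -0.5)        # assumes no isolated nodes (d > 0)
A_n = D_inv_sqrt @ A @ D_inv_sqrt      # normalized adjacency matrix
L_n = D_inv_sqrt @ L @ D_inv_sqrt      # normalized Laplacian, equal to I - A_n
```

Any of `A`, `L`, `A_n`, or `L_n` could then play the role of the GSO in the filters discussed next.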
Graph Convolutional Filter. Using GSO, graph signals undergo shifting and averaging across their neighboring nodes. The signal on the graph is denoted by . Its -th entry specifies the data value at the node . The one time shift of graph signal is simply , whose value at node is . After one graph shift, the value at node is given by moving a local linear operator over its neighborhood values . Based on the graph shifting, a graph convolutional filter with taps is defined via polynomials of GSO and the filter weights in the graph convolution
(1) |
where is the filter’s output and is a shift-invariant graph filter with taps, and denotes the weight of local information after -hop data exchanges. The graph filter is then combined with the nonlinear activation function, forming the primary component of GCNN and contributing to its expressivity.
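As a minimal sketch of the graph convolution, the snippet below assumes the common polynomial form in which the output is the sum over $k$ of $h_k \mathbf{S}^k \mathbf{x}$; the exact indexing of the taps in (1) may differ. The function name `graph_filter`, the toy graph, and the weights are hypothetical.

```python
import numpy as np

def graph_filter(S, x, h):
    """Apply y = sum_k h[k] * S^k x using repeated shifts (no explicit matrix powers)."""
    y = np.zeros_like(x, dtype=float)
    z = x.astype(float)          # z holds S^k x, starting with k = 0
    for hk in h:
        y += hk * z              # add the contribution of the k-hop shifted signal
        z = S @ z                # one more graph shift
    return y

# Path graph on 3 nodes (0 - 1 - 2) with the adjacency matrix as GSO.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0])    # unit impulse at node 0
h = [0.5, 0.3, 0.2]              # hypothetical filter weights h_0, h_1, h_2
y = graph_filter(A, x, h)        # -> [0.7, 0.3, 0.2]
```

Computing the shifts recursively keeps each step a sparse-friendly matrix-vector product, which is how the $k$-hop aggregation is usually implemented in practice.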
Graph Perceptron and GCNN. A Graph Perceptron [4] is a simple unit of transformation in the GCNN. The functionality of a graph perceptron can be seamlessly extended to accommodate graph signals with multiple features. Specifically, a multi-feature graph signal can be denoted by , where signifies the number of features. The architecture of an -layer GCNN is built upon cascading multiple graph perceptrons. It operates such that the output of a graph perceptron in a preceding layer serves as the input to the graph perceptron at the subsequent layer , where spans from to . We denote the feature fed to the first layer as . For an -layer GCNN, the graph perceptron at layer can be represented as
(2) |
Here, signifies the intermediate graph filter output, denotes the nonlinear activation function at layer , and graph signals at each layer are and with sizes of and , respectively, where denotes the number of features at the -th layer. The bank of filter coefficients is represented by . By recursively using (2) until , a general GCNN can be formulated as
(3) |
This representation captures the nature of GCNN operations, going through each layer and applying the corresponding transformation defined by the graph signal, filter coefficients, and the non-linearity function. This hierarchical arrangement facilitates the flow of information through successive layers, thus enabling effective learning from graph-structured data.
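The layer-by-layer recursion can be sketched as follows, assuming the widely used multi-feature form in which layer $\ell$ computes an activation of the sum over $k$ of $\mathbf{S}^k \mathbf{X}_{\ell-1} \mathbf{H}_{\ell k}$. The function `gcnn_forward`, the weight layout `(K+1, F_in, F_out)`, and the choice of `tanh` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def gcnn_forward(S, X, weights, activation=np.tanh):
    """Cascade of graph perceptrons; weights[l] has shape (K+1, F_in, F_out)."""
    for H in weights:
        Z = np.zeros((X.shape[0], H.shape[2]))
        SkX = X                              # S^k X, starting with k = 0
        for Hk in H:                         # iterate over filter taps
            Z += SkX @ Hk                    # mix features of the k-hop signal
            SkX = S @ SkX                    # shift once more
        X = activation(Z)                    # output feeds the next layer
    return X

rng = np.random.default_rng(0)
N = 5
A = np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)  # ring graph
X0 = rng.normal(size=(N, 2))                                         # 2 input features
weights = [rng.normal(size=(3, 2, 4)), rng.normal(size=(3, 4, 1))]   # two layers
out = gcnn_forward(A, X0, weights)
```

Running the same function with a perturbed GSO and comparing the two outputs gives the output difference that the sensitivity analysis bounds.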
III Problem Formulation
A pivotal aspect of understanding the sensitivity of a GCNN is the consideration of potential alterations in the underlying graph structure. These alterations can be broadly construed as perturbations to the GSO, intrinsically linked to changes in the graph topology. In the simplest form, any perturbation to the GSO can be written as
(4) $\hat{\mathbf{S}} = \mathbf{S} + \mathbf{E}$
where $\hat{\mathbf{S}}$ signifies the perturbed GSO, $\mathbf{S}$ is the original GSO, and $\mathbf{E}$ represents the error term. The spectral norm of this error term is given by
(5) $\|\mathbf{E}\|_2 = \|\hat{\mathbf{S}} - \mathbf{S}\|_2$
Inspired by a previous work [18], we utilize a probabilistic error model to represent graph perturbations, where each edge of the graph is subject to perturbation independently. In this context, we primarily focus on the alterations occurring within the neighborhood of a particular node . More specifically, the perturbed neighborhood may encompass added nodes (), deleted nodes (), and remaining nodes (), which ultimately leads to changes in node degree and modifications to the adjacency matrix. We aim to quantify the sensitivity of GSO in relation to these perturbations. To this end, we adopt and expand upon the notation used in [22, 23] for clarity and consistency.
When the graph undergoes perturbations, it transforms into , with the node set remaining unaffected. We express degrees of node in original and perturbed graphs as and , respectively. Here, denotes the adjacency matrix of the perturbed graph , and is the degree change at node , with and corresponding to the number of edges added and deleted, respectively. We will further delve into the assumptions for the error model and its effects on the GCNN’s performance in the following discussion.
III-A Probabilistic Graph Error Model
![Refer to caption](x1.png)
![Refer to caption](x2.png)
![Refer to caption](x3.png)
![Refer to caption](x4.png)
In this work, we utilize an Erdös-Rényi (ER) graph-based model for perturbations on a graph adjacency matrix, following the approach proposed in [18]. The adjacency matrix of an ER graph is characterized by a random matrix , where each element of the matrix is generated independently, satisfying and for all . The diagonal elements are zero, i.e., for , eliminating the possibility of self-loops. For the sake of our analysis, we also assume that the perturbed graph does not contain any isolated nodes, meaning that for all , . The model can be adapted by employing the lower triangular matrix , and then defining . Consequently, by specifying the error term in (4), the perturbed adjacency matrix of a graph signal can be expressed as
(6) |
where the first term is responsible for edge deletion with probability , and the second term accounts for edge addition with probability . This error model can be conceptualized as superimposing two ER graphs on top of the original graph. To better illustrate this model, we utilize visual aids based on a random geometric graph [33, 34]. Fig. 1 visually represents the transition from the original graph to perturbed versions, which include the graph with only edge deletions (), the graph with only edge additions (), and the graph with both edge deletions and additions (). Each state depicts the progressive impacts of the perturbations.
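In this context, the impact of the perturbation on the degree of a given node $u$ can be computed as follows. Let $d_u$ be the degree of node $u$, $N$ the number of nodes, and $\epsilon_1$ and $\epsilon_2$ the edge deletion and edge addition probabilities in (6). Edge deletion acts on existing edges: each non-zero element in row $u$ of the adjacency matrix is deleted with probability $\epsilon_1$. Thus, the number of deleted edges $\delta_u^-$ is the sum of $d_u$ independent and identically distributed (i.i.d.) Bernoulli random variables, each with success probability $\epsilon_1$. Similarly, edge addition acts on the non-edges, and the number of added edges $\delta_u^+$ is the sum of $d_u^*$ i.i.d. Bernoulli random variables, each with success probability $\epsilon_2$, where $d_u^* = N - d_u - 1$ is the number of nodes not adjacent to $u$. Hence, the numbers of deleted and added edges follow the binomial distributions
(7) $\delta_u^- \sim \mathrm{Bin}(d_u, \epsilon_1), \qquad \delta_u^+ \sim \mathrm{Bin}(d_u^*, \epsilon_2)$
where $\mathrm{Bin}(n, p)$ represents a binomial distribution with parameters $n$ and $p$.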
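The error model can be simulated with a short sketch. It assumes the usual reading of the model from [18]: existing edges are deleted independently with probability `eps1`, and missing edges (excluding self-loops) are added independently with probability `eps2`. The helper names `er_mask` and `perturb_adjacency` are hypothetical, the masks are built from the upper rather than the lower triangle (which is equivalent for a symmetric matrix), and the sketch does not enforce the no-isolated-node assumption.

```python
import numpy as np

def er_mask(n, p, rng):
    """Symmetric 0/1 matrix with zero diagonal; each off-diagonal pair is 1 w.p. p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

def perturb_adjacency(A, eps1, eps2, rng):
    """Return (A_hat, E) with A_hat = A + E for a binary symmetric A."""
    n = A.shape[0]
    non_edges = 1 - np.eye(n, dtype=int) - A      # candidate pairs for addition
    deleted = er_mask(n, eps1, rng) * A           # edges removed w.p. eps1
    added = er_mask(n, eps2, rng) * non_edges     # edges created w.p. eps2
    E = added - deleted
    return A + E, E

rng = np.random.default_rng(0)
n = 8
A = np.zeros((n, n), dtype=int)
idx = np.arange(n)
A[idx, (idx + 1) % n] = 1                         # ring graph as a toy example
A = A | A.T
A_hat, E = perturb_adjacency(A, eps1=0.3, eps2=0.1, rng=rng)

# Per-node counts of deleted and added edges (the binomial variables in the text).
n_deleted = (E == -1).sum(axis=1)
n_added = (E == 1).sum(axis=1)
```

The row sums of `|E|` equal `n_deleted + n_added`, which is the per-node quantity that the error bounds in the next section track.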
IV Expected Bound for GSO error
IV-A Error Bound for Unnormalized GSO Using Norm
Building on the foundation laid by the discussion of graph structure perturbations and the proposed error model, we now outline the primary theoretical contributions of this study. Our focus here is to detail the probabilistic bounds that help quantify the sensitivity of the GSO to graph structure perturbations. We examine the case where the adjacency matrix serves as the GSO, implying and . The error model derived in (6) can be expressed as
(8) |
We can link the change in degree with the norm of error term in (8) as
(9) |
where
(10) |
Let . Since and are independent random variables, it is not appropriate to give deterministic upper bounds. Instead, we present expected value bounds, which are better suited for analyzing the degree changes of nodes given the probabilistic nature of the model. Our goal is to derive a closed-form expression for the expectation of the maximum node degree error, i.e.,
(11) |
The probability mass function (PMF) of can be found by convolving the PMFs of and , which are independent random variables. Following binomial distributions in (7), we can obtain the following PMFs
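(12) $P(\delta_u^- = k) = \binom{d_u}{k}\,\epsilon_1^{k}\,(1-\epsilon_1)^{d_u-k}, \quad k = 0, \dots, d_u$
(13) $P(\delta_u^+ = k) = \binom{d_u^*}{k}\,\epsilon_2^{k}\,(1-\epsilon_2)^{d_u^*-k}, \quad k = 0, \dots, d_u^*$
where $P(\delta_u^- = k)$ and $P(\delta_u^+ = k)$ represent the probabilities of the number of deleted edges $\delta_u^-$ and the number of added edges $\delta_u^+$ at node $u$ taking the value $k$, respectively. Here $d_u$ is the degree of node $u$, $d_u^* = N - d_u - 1$ is its number of non-neighbors, and $\epsilon_1$ and $\epsilon_2$ are the edge deletion and addition probabilities. Then, the PMF of can be computed as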
(14) |
where . Using (14), the cumulative distribution function (CDF) of is computed as
(15) |
Given that for are i.i.d. and for , the CDFs for and are as follows
(16) |
With the PMF of taking on a specific value being , the expectation of can be represented as
(17) |
which provides a closed-form expression for . The variance of can also be given as
(18) |
where .
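The chain PMF, CDF, CDF of the maximum, expectation can be sketched numerically. The snippet assumes that the per-node error is the sum of a Bin($d_u$, `eps1`) and a Bin($N - 1 - d_u$, `eps2`) variable, and that the per-node errors are treated as independent across nodes, as the derivation above does; in an undirected graph two rows share an edge, so this is a modeling simplification. It uses the identity that the expectation of a nonnegative integer variable equals the sum over $k \geq 0$ of $1 - F(k)$. All function names are hypothetical.

```python
from math import comb

def binom_pmf(n, p):
    """PMF of Bin(n, p) as a list indexed by k = 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """PMF of the sum of two independent integer variables with PMFs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def expected_max_degree_error(degrees, eps1, eps2):
    """E[max_u (deleted_u + added_u)], treating nodes as independent."""
    n = len(degrees)
    cdf_max = [1.0] * n                      # support of each node's error is 0..n-1
    for d in degrees:
        pmf = convolve(binom_pmf(d, eps1), binom_pmf(n - 1 - d, eps2))
        acc = 0.0
        for k in range(n):
            acc += pmf[k]                    # CDF of this node's error at k
            cdf_max[k] *= acc                # CDF of the maximum is the product
    return sum(1.0 - F for F in cdf_max)     # E[Z] = sum_k P(Z > k)

# 4-cycle: every node has degree 2 and one non-neighbor.
value = expected_max_degree_error([2, 2, 2, 2], eps1=0.5, eps2=0.3)
```

The double loop in `convolve` is quadratic in the graph size per node; for large graphs an FFT-based convolution would be a natural replacement.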
IV-B Bridging and Norms in GSO Analysis
In the analysis of graph-structured data, the spectral norm ( norm), is often employed to quantify the graph spectral error. While [31] did furnish a spectral error bound for the GSO, the need for a more refined and interpretable bound persists to enable more comprehensive analyses. Following the approach of [23], this study uses the norm and assumes that the error matrix is fixed. The proposed approach of bounding is based on assumptions of an undirected graph and perturbation . Using inequalities [35, Section 2.3.3] and the fact that in our case , the norm can be bounded by the norm
(19) |
The entries in the error matrix of equation (8) are random variables. As such, it is challenging to derive a deterministic bound for (19) that is both tight and generalizable. In contrast, an expected bound
(20) |
provides a more reasonable estimate of the true behavior of the error matrix, as it takes into account the distribution of the random variables, as well as the structural changes of the perturbed graph. Thus, we have the following theorem.
Theorem 1.
Theorem 1 provides a closed-form expression for the upper bound, which depends explicitly on the parameters of the probabilistic error model in (8). Using a loose upper bound proposed in [36], we can bound (21) as
(22) |
We note that (22) showcases how our bound in Theorem 1 is parameterized by the probabilities of adding and deleting edges. Thus, Theorem 1 precisely captures the resulting structural changes induced by the probabilistic error model, unlike the generic spectral bound in [31], which overlooks specific structural changes on the perturbed GSO.
Remark 1 (Why not use norm?).
The spectral bounds derived using the norm, as presented in [31], cannot fully capture the specific structural changes to the GSO from perturbations, especially in graphs with unique properties like degree distribution or sparsity. Focused on worst-case scenarios, these bounds lead to overestimations, rendering them looser and less applicable to particular graph types. The norm is preferred over the norm for providing an upper bound because it reveals the impact of structural changes denoted by and in (8), whereas the norm absorbs these structural changes into the overall spectral change, making it more challenging to derive a tight bound.
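The relation between the two norms can be checked numerically. The sketch below relies on the standard inequality that the spectral norm is at most the geometric mean of the maximum absolute column sum and the maximum absolute row sum, which coincide for a symmetric matrix. The random error matrix with entries in {-1, 0, 1} is an illustrative stand-in for a sampled error term, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Symmetric error matrix with zero diagonal: -1 marks a deleted edge, +1 an added edge.
U = np.triu(rng.choice([-1, 0, 1], size=(n, n), p=[0.1, 0.8, 0.1]), k=1)
E = U + U.T

l1 = np.linalg.norm(E, 1)            # maximum absolute column sum
linf = np.linalg.norm(E, np.inf)     # maximum absolute row sum
l2 = np.linalg.norm(E, 2)            # largest singular value

# Row sums of |E| count the changed edges at each node.
max_changed_edges = np.abs(E).sum(axis=1).max()
```

Here `l1` equals `max_changed_edges`, which is why the bound can be read directly in terms of per-node degree changes, while `l2` requires a singular value computation and never exceeds `l1`.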
IV-C Error Bound for Normalized GSO
In this context, the GSO is considered as the normalized version of the adjacency matrix, i.e., . The entries of the normalized adjacency matrix are as follows, if , and if . In [23], a closed form for is proposed
(23) |
where and denote the degrees of node and after perturbation. However, the assumption in [23] states that the degree alteration should not exceed twice the initial degree, i.e., . This restriction is not needed in our work. Following the error model in (6), this limitation could easily be breached with an increased probability of edge addition . We start with the following lemma.
Lemma 1.
Let be defined as in (23), then its norm is bounded by a random variable
(24) |
where is defined as the sum of and , is the degree of node , is the minimum degree of neighboring nodes of , and are random variables with binomial distributions as for and , where and .
Proof.
See Appendix A. ∎
Let
(25) |
and note that and are discrete random variables. While the binomial random variables and degrees in the expression for are assumed to be i.i.d., the inherent nonlinearity and high-dimensionality in the function, along with the complexity introduced by the maximization operation over all nodes, pose challenges for deriving an analytical expression for . Furthermore, the expectation of a maximum of random variables often lacks a simple closed form; typically only bounds, rather than the exact value, can be derived. On the other hand, Monte Carlo simulations provide an efficient alternative for estimating , which is given as
(26) |
where represents the outcome from the -th Monte Carlo trial. Thus, for the normalized GSO, we have the following proposition as the counterpart of Theorem 1.
Proposition 1.
The upper bound provided in Proposition 1 focuses specifically on normalized adjacency matrices. This result complements the analysis for the unnormalized case. We note that the bound for the normalized GSO is not an approximation or an empirical estimate; it is a theoretical upper bound. The only difference between the bounds in Proposition 1 and Theorem 1 lies in how they are computed: the bound in Theorem 1 (unnormalized case) has a closed-form expression, whereas the bound in Proposition 1 (normalized case) is evaluated using Monte Carlo simulations.
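A Monte Carlo average of this kind can be sketched as below. Note the assumption: rather than sampling the random variable of Lemma 1, whose expression is not reproduced here, the sketch directly averages the norm of the normalized-adjacency error over sampled perturbations, taking the norm to be the maximum absolute column sum. The function names are hypothetical, and isolated nodes, which the paper excludes by assumption, are handled by giving them a zero row and column.

```python
import numpy as np

def normalized_adjacency(A):
    """D^{-1/2} A D^{-1/2}; isolated nodes get a zero row and column."""
    d = A.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = d[d > 0] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def sample_perturbed(A, eps1, eps2, rng):
    """Delete edges w.p. eps1 and add missing edges w.p. eps2 (no self-loops)."""
    n = A.shape[0]
    def mask(p):
        upper = np.triu(rng.random((n, n)) < p, k=1)
        return (upper | upper.T).astype(int)
    non_edges = 1 - np.eye(n, dtype=int) - A
    return A - mask(eps1) * A + mask(eps2) * non_edges

def mc_expected_error(A, eps1, eps2, trials, seed=0):
    """Average of the error norm over independent Monte Carlo trials."""
    rng = np.random.default_rng(seed)
    S = normalized_adjacency(A)
    total = 0.0
    for _ in range(trials):
        S_hat = normalized_adjacency(sample_perturbed(A, eps1, eps2, rng))
        total += np.linalg.norm(S_hat - S, 1)   # maximum absolute column sum
    return total / trials

n = 10
idx = np.arange(n)
A = np.zeros((n, n), dtype=int)
A[idx, (idx + 1) % n] = 1
A = A | A.T                                      # ring graph as a toy example
estimate = mc_expected_error(A, eps1=0.2, eps2=0.1, trials=200)
```

The estimate converges at the usual one-over-square-root rate in the number of trials, so a few hundred trials are typically enough for a plot-level comparison against a theoretical bound.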
V GCNN Sensitivity
V-A Graph Filter Sensitivity Analysis
The sensitivity of graph filters is a critical aspect that follows logically from the preceding discussion on the expected bounds of GSO errors. Having extensively delved into the properties of GSO perturbations, we now turn our attention to the graph filters. Graph filters, being polynomials of GSOs, inherit the perturbations in the graph structure, manifesting as variations in filter responses.
The sensitivity of a graph filter to perturbations in the GSO is captured by the theorem below, which establishes a bound on the error in the graph filter response due to perturbations in the GSO and the filter coefficients.
Theorem 2 (Graph filter sensitivity).
Let and be the GSO for the true graph and the perturbed graph , respectively. The distance between polynomial graph filters and is defined as
(28) |
The expectation of filter distance (28) is bounded as
(29) |
where , , and denotes the largest of the maximum singular values of two GSOs.
Proof.
See Appendix B. ∎
Theorem 2 reveals that the expected graph filter distance is linearly bounded by the expected GSO distance, , if the sufficient condition is met. This bound is influenced by: the filter degree , the maximum singular value of GSOs, and the filter coefficients . The theorem indicates that higher order graph filters are likely to exhibit greater instability. In Section VI-B, we present a supporting experiment, specifically for low-pass graph filters with the unnormalized GSO, .
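The linear dependence can be illustrated numerically with a standard telescoping argument: the difference of the $k$-th powers of the two GSOs is a sum of $k$ terms, each containing the error matrix once, so its spectral norm is at most $k \lambda^{k-1}$ times the norm of the error. The resulting bound below is an assumption-level stand-in that may differ in constants from (29); the graph, the flipped edges, and the coefficients are arbitrary illustrative choices.

```python
import numpy as np

def filter_matrix(S, h):
    """H(S) = sum_k h[k] S^k."""
    H = np.zeros(S.shape)
    P = np.eye(S.shape[0])                  # current power S^k
    for hk in h:
        H += hk * P
        P = P @ S
    return H

rng = np.random.default_rng(1)
n = 20
U = np.triu(rng.random((n, n)) < 0.2, k=1)
S = (U | U.T).astype(float)                 # random undirected graph as GSO
F = np.triu(rng.random((n, n)) < 0.05, k=1)
flip = (F | F.T).astype(float)
S_hat = np.abs(S - flip)                    # flip the selected node pairs
E = S_hat - S

h = [1.0, 0.5, 0.25, 0.125]                 # hypothetical filter coefficients
lam = max(np.linalg.norm(S, 2), np.linalg.norm(S_hat, 2))
dist = np.linalg.norm(filter_matrix(S_hat, h) - filter_matrix(S, h), 2)
bound = sum(abs(hk) * k * lam ** (k - 1)
            for k, hk in enumerate(h) if k >= 1) * np.linalg.norm(E, 2)
```

The factor multiplying the error norm grows with both the filter order and the largest singular value, which matches the observation that higher-order filters tend to be less stable.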
V-B GCNN Sensitivity Analysis
Building on the sensitivity analysis of graph filters, we extend the study to a general GCNN. Instead of quantifying the specifics of each perturbed graph, we propose a probabilistic bound that captures the potential magnitude of graph perturbations and gives a more insightful assessment of the system’s sensitivity to them. The following theorem states the sensitivity of a general GCNN to GSO perturbations.
Theorem 3 (GCNN Sensitivity).
For a general GCNN under the probabilistic error model (8), the expected difference of outputs at the final layer is given as
(30) |
where represents the Lipschitz constant for the nonlinear activation function used at layer , for , and for and then for are defined as follows
(31) |
with constant , and and in Theorem 2, for .
Proof.
See Appendix C. ∎
In Theorem 3, we use recursive bounds containing inter-layer features to simplify the formulation. Note that these inter-layer features can be explicitly computed by the initial input feature , both original and perturbed GSOs , GCNN parameters (number of layers and graph shift , network’s learned weights , and activation functions ). The derivation process employs induction. For the first layer , we have and ; for the second layer , the features are and ; by induction, for the th layer, we have
(32) |
Theorem 3 forms the bedrock of our analysis, quantifying how GCNNs respond to graph perturbations, which is described by a linear relationship at each layer. The sensitivity of a multilayer GCNN to perturbations can be represented by a recursion of linearity. For a multilayer GCNN, the expected output difference is controlled by: (i) the input feature, (ii) the GSO and the error model parameters, (iii) the Lipschitz constants of the activation functions, and (iv) the GCNN weights. We note that choosing activation functions with more conservative Lipschitz constants can possibly improve the stability of GCNNs by imposing more constraints on the recursion. However, this may suppress the performance of a neural network, as noted in [37]. Our sensitivity analysis framework is generic, allowing for simplifications such as assuming a unit Lipschitz constant and normalized input features, as suggested in [22]. However, these simplifications do not indicate that the GCNN sensitivity is unaffected by the Lipschitz constant or input features. This layered analysis also enables an understanding of how perturbations propagate through GCNN layers, impacting the overall performance. Additionally, Theorem 3 does not restrict the scale of graph perturbations, which is a typical restriction in the existing literature.
Within the evasion attack context, where the focus is on learned representations, we demonstrate the following property: given that the GSO error is bounded as in Theorem 1 and Proposition 1, the linear bound of each layer of GCNN (illustrated in Subsection VI-C1) permits the network’s stability against perturbation as long as the graph error remains within the bound. In Subsection VI-C2, we show that multilayer GCNN is stable by showing its finite responses to large scale perturbations, even under notable declines in accuracy.
V-C Specifications for GCNN variants
Building upon the sensitivity analysis in Theorem 3, we now turn to two specific GCNN variants: GIN [6] and SGCN [7, 8]. They apply different GSOs for feature propagation. In GIN, the GSO for each layer is chosen as a partially augmented unnormalized adjacency matrix; in SGCN, the GSO is chosen as a normalized augmented adjacency matrix. This choice is made to align with the discussions on tight GSO bounds in Section IV. Specializing the framework to GIN and SGCN connects the general theory to architectures used in practice.
V-C1 Specification for GIN
The GIN is designed to capture the node features and the graph structure simultaneously. The primary intuition behind GIN is to learn a function of the feature information from both the target node and its neighbors, which is related to the Weisfeiler-Lehman (WL) graph isomorphism test [38]. The chosen GSO for GIN is , where the learnable parameter preserves the distinction between nodes in the graph that are connected differently, and prevents GIN from reducing to a WL isomorphism test.
Given the GSO above, only the first order term with in (1) is kept, and the intermediate output of such graph filter is . A node Multilayer Perceptron (MLP) is then applied to the filter’s output as . Assuming the inner MLP has two layers in each GIN layer, a single-layer GIN () can be represented as
(33) |
where are the weight matrix, bias matrix, and nonlinearity function in the first layer of the MLP, and are the weight matrix, bias matrix, and nonlinearity function in the second layer of the MLP. We then provide the following corollary.
Corollary 1 (The sensitivity of single-layer GIN).
Proof.
See Appendix D. ∎
Corollary 1 shows a linear dependency between the output difference of a single-layer GIN and GSO perturbations. In GIN, the node vector transformations performed by the MLP contribute significantly to the network's expressivity. Under evasion attacks, Corollary 1 makes the analysis of these transformed node representations straightforward.
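The linear dependency can be checked numerically with a minimal sketch. Several assumptions are made here and are not taken from the text above: the GSO is written as A + (1 + eps) I following the original GIN formulation, both MLP nonlinearities are ReLU (Lipschitz constant 1), and the constant multiplying the GSO error is the simple product ||W1|| ||W2|| ||X||_F rather than the exact constant of Corollary 1. All names and sizes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gin_single_layer(A, X, eps, W1, B1, W2, B2):
    # ASSUMPTION: the GSO is A + (1 + eps) I, as in the original GIN formulation.
    S = A + (1.0 + eps) * np.eye(A.shape[0])
    H = relu(S @ X @ W1 + B1)   # first MLP layer applied to the filter output
    return relu(H @ W2 + B2)    # second MLP layer

def gin_linear_bound(A, A_hat, X, W1, W2):
    # With 1-Lipschitz activations the biases cancel in the difference, leaving
    # ||Y_hat - Y||_F <= ||W1||_2 ||W2||_2 ||X||_F ||A_hat - A||_2.
    gso_err = np.linalg.norm(A_hat - A, 2)
    const = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2) * np.linalg.norm(X, "fro")
    return const * gso_err

rng = np.random.default_rng(1)
n, f_in, f_hid, f_out = 40, 6, 16, 4
A = np.triu((rng.random((n, n)) < 0.1).astype(float), 1)
A = A + A.T

# Delete each existing edge with probability 0.3 (hypothetical budget).
drop = np.triu((A == 1) & (rng.random((n, n)) < 0.3), 1)
drop = drop | drop.T
A_hat = np.where(drop, 0.0, A)

X = rng.standard_normal((n, f_in))
W1, B1 = rng.standard_normal((f_in, f_hid)), rng.standard_normal((1, f_hid))
W2, B2 = rng.standard_normal((f_hid, f_out)), rng.standard_normal((1, f_out))

Y = gin_single_layer(A, X, 0.1, W1, B1, W2, B2)
Y_hat = gin_single_layer(A_hat, X, 0.1, W1, B1, W2, B2)
diff = np.linalg.norm(Y_hat - Y, "fro")
bound = gin_linear_bound(A, A_hat, X, W1, W2)
```

Because the augmentation term is identical for the clean and perturbed graphs, the GSO error reduces to the adjacency error, so the value of eps does not enter this particular bound.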
V-C2 Specification for SGCN
The SGCN is a streamlined model that simplifies a multilayer GCNN by using an affine approximation of the graph convolution filter and eliminating the activation functions of intermediate layers. The GSO chosen for SGCN is , where is the augmented adjacency matrix and is the corresponding degree matrix of the augmented graph.
Given the normalized augmented GSO, the node degrees are redefined based on the augmented GSO; specifically, they are incremented by compared to their values in the non-augmented version. This streamlined model simplifies the structure of a vanilla GCN [5] by retaining a single layer and the th order GSO in (1), so the output of the filter is . Note that for an SGCN, the maximum number of layers is . Consequently, the output of a single-layer SGCN using a linear logistic regression layer is represented as
(36) |
and thus, we can easily give the following corollary.
Corollary 2 (The sensitivity of SGCN).
With Corollary 2, we conclude that the sensitivity analysis for SGCN is a specification for the general form of a multilayer GCNN.
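As an illustration of the SGCN specification, the sketch below builds a normalized augmented GSO and applies a K-th order propagation followed by a linear softmax classifier. It assumes the standard SGC construction, in which a self-loop is added to every node before symmetric degree normalization; the graph, feature sizes, K, and function names are hypothetical. Since this GSO has unit spectral norm, the pre-softmax output difference is bounded by K ||S_hat - S|| ||X||_F ||Theta||, which is again linear in the GSO error.

```python
import numpy as np

def normalized_augmented_gso(A):
    # ASSUMPTION: augmented adjacency A + I with symmetric degree normalization.
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # augmented degrees are >= 1
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def sgcn_logits(S, X, Theta, K):
    # K propagation steps without intermediate nonlinearities, then a linear map.
    H = X
    for _ in range(K):
        H = S @ H
    return H @ Theta

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
n, f, c, K = 50, 10, 3, 2
A = np.triu((rng.random((n, n)) < 0.1).astype(float), 1)
A = A + A.T
flips = np.triu(rng.random((n, n)) < 0.02, 1)
flips = flips | flips.T
A_hat = np.where(flips, 1.0 - A, A)

S, S_hat = normalized_augmented_gso(A), normalized_augmented_gso(A_hat)
X = rng.standard_normal((n, f))
Theta = rng.standard_normal((f, c))

logits, logits_hat = sgcn_logits(S, X, Theta, K), sgcn_logits(S_hat, X, Theta, K)
probs = softmax(logits)
diff = np.linalg.norm(logits_hat - logits, "fro")
bound = (K * np.linalg.norm(S_hat - S, 2)
         * np.linalg.norm(X, "fro") * np.linalg.norm(Theta, 2))
```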
VI Numerical Experiments
![Refer to caption](x5.png)
![Refer to caption](x6.png)
![Refer to caption](x7.png)
![Refer to caption](x8.png)
![Refer to caption](x9.png)
VI-A Theoretical GSO Bound Corroboration
VI-A1 Synthetic graph
We consider a two-group planted partition model (PPM), which is a special case of the stochastic block model. Parameters are set with in-group probability to , and between-group probability to . The GSO is set as the unnormalized adjacency matrix . We perturb the PPM graph using the probabilistic error model (6) with two scales of perturbation budgets:
- Small-scale perturbation (see Fig. 2, left panel): With and , the graph is slightly altered, preserving its fundamental structure.
- Large-scale perturbation (see Fig. 2, right panel): With and , the graph is under significant structural changes.
We carry out 101 Monte Carlo trials for varying graph sizes (ranging from to , in -node increments). These simulations evaluate the expected bound from Theorem 1 and the deterministic bound from [31, Theorem 2] in relation to graph size. Comparisons with empirical GSO distances (5), calculated using the norm, reveal that our expectation bound is consistently tighter than the deterministic counterpart from [31]. This difference arises due to the consideration of degree changes and the probabilistic nature of our bound, as opposed to the worst-case scenario focus of the deterministic bound. Another observation is the increased bound magnitude correlating with higher perturbation budgets, as depicted in Fig. 2. Both bounds remain valid, even in high perturbation scenarios, underscoring the robustness of our theoretical frameworks.
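The empirical side of this experiment can be sketched as follows. The sketch assumes that the probabilistic error model deletes each existing edge independently with one probability and adds each missing edge independently with another, keeping the graph undirected and free of self-loops. The PPM probabilities, perturbation budgets, graph size, and trial count below are hypothetical placeholders, not the values used in the paper.

```python
import numpy as np

def planted_partition(n, p_in, p_out, rng):
    """Two-group planted partition model with a symmetric 0/1 adjacency matrix."""
    labels = np.arange(n) % 2
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, 1)
    return (upper | upper.T).astype(float)

def perturb(A, eps_del, eps_add, rng):
    """ASSUMED error model: delete edges w.p. eps_del, add non-edges w.p. eps_add."""
    u = rng.random(A.shape)
    delete = (A == 1) & (u < eps_del)
    add = (A == 0) & (u < eps_add)
    flip = np.triu(delete | add, 1)   # decide once per node pair, skip the diagonal
    flip = flip | flip.T
    return np.where(flip, 1.0 - A, A)

def mean_gso_distance(A, eps_del, eps_add, trials, rng):
    """Monte Carlo estimate of the expected spectral-norm GSO distance."""
    dists = [np.linalg.norm(perturb(A, eps_del, eps_add, rng) - A, 2)
             for _ in range(trials)]
    return float(np.mean(dists))

rng = np.random.default_rng(3)
A = planted_partition(100, 0.3, 0.05, rng)          # hypothetical PPM parameters
small = mean_gso_distance(A, 0.01, 0.01, 20, rng)   # hypothetical small budget
large = mean_gso_distance(A, 0.30, 0.30, 20, rng)   # hypothetical large budget
```

Sweeping the graph size and plotting these averages against a theoretical bound would reproduce the kind of comparison described above.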
VI-A2 Real-life graph
We utilize the undirected Cora citation graph [39], which comprises nodes, and classes. Assuming the undirected nature of the underlying graph, we modify the original Cora graph from a directed to an undirected one. The undirected Cora graph has edges. We ascertain the evolution of our theoretical bounds against an increase in edge deletion probability and edge addition probability . These alterations are systematically tracked along with using the and norms of the discrepancy between the original and perturbed graphs.
The range of and is set within , increasing in steps of . In each step, we compute the and norms of the difference between the original and perturbed adjacency matrices. We then compare these empirical results with the theoretical bounds provided in Theorem 1 and Proposition 1. In Fig. 3, with the GSO as the unnormalized adjacency matrix , two distinct scenarios are presented: varying with (left panel), and varying with (right panel). Over 101 Monte Carlo trials, the theoretical bound closely aligns with the empirical norm, particularly in scenarios where increased leads to denser graphs. This trend suggests that the bounds become more precise as the graph shifts from sparse to dense.
In Fig. 4, employing the normalized adjacency matrix as the GSO, a similar analysis is conducted. In the left panel, an increase in and norm bounds is observed under rising error, and Proposition 1 gives a stable upper bound. However, the accuracy of the bound is comparatively less satisfactory in the normalized case. The right-hand case illustrates a stable empirical norm with an increasing number of edges, while the norm and our bound present slight increases and decreases, respectively. These observations can be attributed to the following factors: (i) the normalization operation keeps the adjacency matrix operator norm around 1; (ii) an increased number of edges raises the norm; (iii) increases in the denominator in Lemma 1 result in a general decrease in the bound.
VI-B GF Sensitivity Test
In this experiment, we evaluate the sensitivity of GF to the probabilistic error model. We employ an ER graph with nodes and a connection probability of as the baseline graph. The GSO is set as the unnormalized adjacency matrix . Our focus is on the relationship between filter distance and the bound in Theorem 2 for low pass GFs of orders . The findings are presented in Figs. 5 and 6.
In Fig. 5, the edge addition probability is fixed as and the edge deletion probability varies among . Over Monte Carlo trials, we plot the empirical GF distances alongside the corresponding GSO distances as scatter plots. These empirical GF distances demonstrate the linear scaling with the bounds in Theorem 2, depicted as solid lines. It is noted that the tightness of these bounds decreases with an increase in the GF order. The primary aim of this analysis is to confirm the linear relationship in Theorem 2.
In Fig. 6, the expected output differences of GFs with orders are plotted against the expected GSO differences and the bound in Theorem 1. Over Monte Carlo trials with perturbation probabilities and , the left panel shows that output differences increase with the GF order. The right panel confirms that the bound captures trends similar to the empirical expectation of GSO distance, corroborating Theorem 1. This suggests that for small, sparsely connected graphs, the sensitivity of a low pass GF to perturbations intensifies as its order increases.
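The linear scaling between filter distance and GSO distance can be illustrated with a short sketch. It relies on the standard power-difference inequality ||S_hat^k - S^k|| <= k lam^(k-1) ||S_hat - S||, valid when both spectral norms are at most lam, and sums it over the filter taps; this is assumed to have the same form as the bound in Theorem 2 but is not claimed to match its constants. The ER parameters, filter coefficients, and perturbation probability are hypothetical.

```python
import numpy as np

def spec(m):
    return np.linalg.norm(m, 2)

def graph_filter(S, h):
    """Polynomial graph filter H(S) = sum_k h[k] S^k."""
    out = np.zeros_like(S)
    P = np.eye(S.shape[0])
    for hk in h:
        out += hk * P
        P = P @ S
    return out

def filter_distance_bound(S, S_hat, h):
    """sum_k |h_k| k lam^(k-1) ||S_hat - S||, with lam = max(||S||, ||S_hat||)."""
    lam = max(spec(S), spec(S_hat))
    slope = sum(abs(hk) * k * lam ** (k - 1) for k, hk in enumerate(h) if k >= 1)
    return slope * spec(S_hat - S)

rng = np.random.default_rng(4)
n = 50
A = np.triu((rng.random((n, n)) < 0.1).astype(float), 1)   # hypothetical ER graph
A = A + A.T
drop = np.triu((A == 1) & (rng.random((n, n)) < 0.2), 1)   # hypothetical deletions
drop = drop | drop.T
A_hat = np.where(drop, 0.0, A)

results = {}
for order in (2, 3, 4):
    h = [0.5 ** k for k in range(order + 1)]   # hypothetical decaying coefficients
    dist = spec(graph_filter(A_hat, h) - graph_filter(A, h))
    results[order] = (dist, filter_distance_bound(A, A_hat, h))
```

The slope of the bound grows with the filter order through the factor k lam^(k-1), which is consistent with the observation that the bounds loosen for higher-order filters on an unnormalized GSO.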
VI-C GCNN Sensitivity Test
![Refer to caption](x10.png)
VI-C1 Linearity corroboration
The experimental validation of Theorem 3 is conducted using GIN (Corollary 1) and SGCN (Corollary 2). We note that Corollary 1 is only applicable for the single-layer GIN (). For the multi-layer GIN, our experiments show the recursion of linearity indicated in Theorem 3 empirically (see left panel of Fig. 7). These experiments are carried out on the Cora citation dataset, as discussed in Section VI-A, to assess the sensitivity of GIN and SGCN to perturbed GSOs under evasion attacks.
In Fig. 7, for GIN (left panel), each layer comprises hidden features. GIN variants with , , and layers differ only in the number of cascaded graph filters with MLPs. We investigate the correlation between empirical GIN output differences and GSO distances. The edge deletion probability, , is varied within in increments of , while the edge addition probability is fixed as . The results, categorized by edge deletion probability , are obtained from 101 Monte Carlo trials, computing pairs of bounds and GIN output differences. For SGCN (right panel), we examine networks of orders using a similar approach. Empirical observations for and in GIN and SGCN demonstrate a linear correlation between output differences and GSO distances, corroborating the theoretical frameworks in Corollary 1 and Corollary 2.
Notably, the output differences observed in the two cases operate on different scales. For the SGCN with normalized GSO (right panel), the variation in output differences with increasing perturbation probability is more gradual compared to the unnormalized GSO used in GIN (left panel), which shows a steeper change. This discrepancy is likely due to the influence of the estimated GSO spectral norm .
VI-C2 Accuracy drop under perturbation
![Refer to caption](x11.png)
![Refer to caption](x12.png)
![Refer to caption](x13.png)
![Refer to caption](x14.png)
![Refer to caption](x15.png)
![Refer to caption](x16.png)
After affirming the linear sensitivity in Theorem 3, we also examine the stability of GCNNs under significant graph perturbations by observing the accuracy changes of the same GCNN candidates as in Section VI-C1.
These experiments are conducted on three citation datasets: Cora, CiteSeer and PubMed [39]. The objective is to assess the impact of different perturbation budgets on the accuracy of GIN and SGCN models. The perturbation budget parameters are set as follows: edge deletion probability varies within in increments of , and edge addition probability varies within in increments of . Consistent with the experimental settings in Section VI-C1, the same GCNN candidates are utilized. The averaged accuracy results are shown in Fig. 8, where the bar indicates the standard variance of accuracy results. The first, second and third rows correspond to datasets Cora, CiteSeer and PubMed, respectively.
A consistent pattern of accuracy decrease across all datasets and GCNN models is observed in Fig. 8, where the accuracy gradually decreases with increasing perturbation budgets. Notably, larger graphs (e.g., PubMed) exhibit a faster accuracy drop compared to smaller graphs (e.g., Cora and CiteSeer). This can be attributed to the alteration of more edges under the same perturbation budget in larger graphs. When fixing edge deletion probability , accuracy drops by approximately (as in Fig. 8a, 1st row with ), and up to (as in Fig. 8a, 3rd row with ). With a fixed edge addition probability , the accuracy drop is around (as in Fig. 8a, 1st row with ), and approximately (as in Fig. 8a, 3rd row with ). This is likely because, for sparse graphs, the same edge addition probability adds more edges than the same edge deletion probability removes.
The maximum of edge perturbation budget and is set to and , respectively. Consequently, up to of the edges are deleted, and are added relative to the original edge count. In this case, the graph structure is significantly perturbed. This significant graph perturbation makes the accuracy drop by up to . Under such large perturbations, GCNN gives finite responses. Thus, the GCNN is stable in our context even when the downstream task performance is significantly impacted, which is due to large-scale edge perturbations. This also verifies Theorem 3, where it is stated that as long as the GSO perturbation is bounded/finite, the GCNN output difference is also bounded/finite.
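The asymmetry between edge addition and edge deletion on sparse graphs can be seen from a short worked calculation. The sketch assumes that each existing edge is deleted independently and each missing node pair is connected independently, so the expected counts are the respective probabilities times the number of existing edges and of non-edges. The graph size and edge count below are hypothetical round numbers, not those of the citation datasets.

```python
def expected_edge_changes(n_nodes, n_edges, eps_del, eps_add):
    """Expected number of deleted and added edges in an undirected simple graph."""
    n_pairs = n_nodes * (n_nodes - 1) // 2   # all possible undirected edges
    deleted = eps_del * n_edges              # deletions act on existing edges
    added = eps_add * (n_pairs - n_edges)    # additions act on missing edges
    return deleted, added

# Hypothetical sparse graph: 1000 nodes, 2000 edges, equal probabilities of 1%.
deleted, added = expected_edge_changes(1000, 2000, 0.01, 0.01)
ratio = added / deleted
```

In this hypothetical case about 20 edges are deleted on average while roughly 4975 are added, so the same probability changes the graph far more through additions than through deletions.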
VII Conclusion and Discussion
This paper has presented an analytical framework for investigating the sensitivity of GCNNs to GSO perturbations, employing a probabilistic graph perturbation model. We have established tighter error bounds than those previously available. We have theoretically demonstrated that the expected output variation for a single layer of GCNN is linearly bounded by the GSO error, ensuring the stability (bounded output difference) of single-layer GCNN under bounded GSO errors. For multilayer GCNN, our analysis has shown that the dependency of GCNN output difference on GSO error can be described through a recursion of linearity. Specifically, this dependency is explicitly controlled by: the input feature, the GSO, error model parameters, Lipschitz constants of activation functions in GCNN, and GCNN weights. Through numerical experiments, we have validated our theoretical findings and confirmed that GCNNs (exemplified with GIN and SGCN) maintain stability under large-scale graph edge perturbations, despite significant performance reductions.
In this work, our primary focus is on edge perturbations in graphs, while potential modifications to the graph signal and node injections are not considered. Any alterations to the graph signal could be subsumed within the spectral norm when performing sensitivity analysis. However, node injection presents a challenge that cannot be addressed using the current definition of graph distance. This is due to the discrepancy in sizes between the unperturbed and perturbed graphs as the number of nodes increases. A potential solution to this issue could involve redefining the GSO distance using a different metric. In this context, Optimal Transport (OT) and its variants emerge as viable candidates for this task [40, 41, 42]. These methods allow for the augmentation of a smaller graph, facilitating the establishment of a meaningful graph distance metric [43]. Consequently, future research could explore an encompassing approach that considers all of the aforementioned types of graph perturbations. Such an investigation has the potential to yield more comprehensive insights into the stability of GCNNs under perturbations.
Graph regularization methods are commonly used to achieve robust graph learning and estimation [44]. Research on adversarial training of GCNNs typically uses specifically designed loss functions to strengthen GCNNs against structural and feature perturbations, thus improving their performance stability against certain graph disturbances [45, 46, 47, 48, 49]. In graph learning, several techniques have been developed to regulate graphs and signals based on specific graph signal assumptions to perform graph estimation [15, 16, 50, 51]. With the inclusion of effective graph regularization, our sensitivity analysis offers insight that can contribute to the development of a uniform metric, paving the way for a more transferable and robust GCNN.
Appendix A Upper Bound of
Proof of Lemma 1.
We start with the first term in (23), which is bounded by
(38) |
The second and third terms in (23) can be bounded using the triangle inequality as follows
(39) |
For the first term in (39), we have
(40) |
For the second term in (39), we have
(41) |
Thus, we have a new bound, which is more suited to our error model, that is
(42) |
We will adapt the general bound (42) to the probabilistic error model presented in (8). In (42), we let
(43) |
where , , , , and . Finally, we obtain
(44) |
This completes the proof. ∎
Appendix B Graph filter sensitivity
Proof of Theorem 2.
First, we recall the following result.
Lemma 2.
([52, Lemma 3]) Suppose that are Hermitian matrices satisfying , and . Then for every
(45) |
Expand the filter representation in , as
(46) |
By Lemma 2 and repeatedly using the triangle inequality, (46) is bounded by
(47) |
The correlation between and has two cases:
1. If ,
(48)
2. If ,
(49)
The following proof is based on the second case (49) because the covariance term can be set to zero to include the first case. By using (46) and taking the expectation of (47), we obtain
(50) |
In (50), let
(51) | ||||
(52) |
Then, we have
(53) |
This completes the proof. ∎
Appendix C GCNN Sensitivity
Proof of Theorem 3.
First Layer. At the first layer , the graph convolution is performed as follows
(54) |
For a perturbed GSO , the difference between the perturbed and clean graph convolutions is
(55) |
Using Lemma 2, we can bound (55) as follows
(56) |
Similar to giving the upper bound for the expectation of graph filter distance from (47) to (53), given the constants and , we take the expectation of (56) and obtain
(57) |
For simplicity, let , and . Thus, (57) illustrates that the expectation of the graph filter distance at the first layer is bounded by a polynomial of as
(58) |
Consider the nonlinearity function at the first layer, which satisfies the Lipschitz condition
(59) |
Applying this Lipschitz condition to (56), we have
(60) |
Second Layer. At the second layer , the graph convolution is performed as
(61) |
The difference between the perturbed and clean graph convolutions is given by
(62) |
Taking the expectation of (62) and using (49), Lemma 2 as well as the submultiplicativity of the spectral norm, we have
(63) |
Let
(64) |
where , and . Thus, in (63), we have . Then, we can express (63) as a function controlled by
(65) |
where and . Considering the second layer’s nonlinearity function , we have
(66) |
Generalization to Layer . By induction, we can generalize the result to the output difference at any layer
(67) |
where
(68) |
This completes the proof. ∎
Appendix D Single-layer GIN Sensitivity
Proof.
In a single-layer GIN, we assume that the inner MLP has two layers as earlier introduced in the paper. The outputs of a single-layer GIN () with original and perturbed GSOs are given as
(69) | |||
(70) |
Expanding (69) and (70) with full matrix transformations, we have
(71) | ||||
(72) |
We can split (71) as
(73a) | |||
(73b) | |||
(73c) | |||
(73d) |
where denotes the intermediate output of the first layer, and represents the output of the second layer. For simplicity of notation, we use instead of . Similarly, we split (72) as
(74a) | |||
(74b) | |||
(74c) | |||
(74d) |
Then, the norm of difference between the perturbed (74d) and clean outputs (73d) is
(75) |
Using the Lipschitz condition of the nonlinearity function in (75), we have
(76) |
Representing by (74c) and by (73c), we have
(77) |
Representing by (74b) and by (73b), we obtain
(78) |
Using the Lipschitz condition of the nonlinearity function in (78), we have
(79) |
Representing by (74a) and by (73a), we have
(80) |
We can rewrite (80) by deleting and adding as
(81) |
Substituting (81) into (80), and using the triangle inequality, we have
(82) |
For the second term in (82), we have for . Then, with the definition of GSO error (5), (82) becomes
(83) |
By connecting (83), (79), (D), (76) together, we can bound the one-layer GIN output difference as
(84) |
Taking the expectation of (84), we have
(85) |
Finally, let , then, we have
(86) |
This completes the proof. ∎
References
- [1] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: Going beyond Euclidean data,” IEEE Signal Process. Mag., vol. 34, no. 4, pp. 18–42, July 2017.
- [2] X. Dong, D. Thanou, L. Toni, M. Bronstein, and P. Frossard, “Graph signal processing for machine learning: A review and new perspectives,” IEEE Signal Process. Mag., vol. 37, no. 6, pp. 117–127, Oct. 2020.
- [3] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Trans. Neural Netw. Learning Syst., vol. 32, no. 1, pp. 4–24, Mar. 2021.
- [4] E. Isufi, F. Gama, D. I. Shuman, and S. Segarra, “Graph filters for signal processing and machine learning on graphs,” IEEE Trans. Signal Process., pp. 1–32, 2024.
- [5] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. 5th Int. Conf. Learn. Representations, Toulon, France, Apr. 24-26, 2017, pp. 1–14.
- [6] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in Proc. 7th Int. Conf. Learn. Representations, New Orleans, LA, USA, May 6-9, 2019, pp. 1–17.
- [7] Q. Li, X.-M. Wu, H. Liu, X. Zhang, and Z. Guan, “Label efficient semi-supervised learning via graph filtering,” in Proc. 32nd Conf. Comput. Vision and Pattern Recognition, Long Beach, CA, USA, June 16-20, 2019, pp. 9574–9583.
- [8] F. Wu, T. Zhang, A. H. d. Souza, Jr, C. Fifty, T. Yu, and K. Q. Weinberger, “Simplifying graph convolutional networks,” in Proc. 36th Int. Conf. Mach. Learning, Long Beach, California, USA, June 9-15, 2019, pp. 6861–6871.
- [9] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, “CayleyNets: Graph convolutional neural networks with complex rational spectral filters,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 97–109, Nov. 2019.
- [10] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in Proc. 6th Int. Conf. Learn. Representations, Vancouver, BC, Canada, Apr. 30 - May 3, 2018.
- [11] E. Isufi, F. Gama, and A. Ribeiro, “EdgeNets: Edge varying graph neural networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7457–7473, Sept. 2022.
- [12] M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,” IEEE Trans. Signal Process., vol. 67, no. 9, pp. 2320–2333, Mar. 2019.
- [13] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
- [14] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Proc. 30th Conf. Neural Inform. Process. Syst., Barcelona, Spain, Dec. 5-10, 2016, pp. 3844–3858.
- [15] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Learning Laplacian matrix in smooth graph signal representations,” IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6160–6173, Dec. 2016.
- [16] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, “Network topology inference from spectral templates,” IEEE Trans. Signal Inf. Process. Netw., vol. 3, no. 3, pp. 467–483, July 2017.
- [17] A. Buciulea, S. Rey, and A. G. Marques, “Learning graphs from smooth and graph-stationary signals with hidden variables,” IEEE Trans. Signal Inf. Process. Netw., vol. 8, pp. 273–287, Mar. 2022.
- [18] J. Miettinen, S. A. Vorobyov, and E. Ollila, “Modelling and studying the effect of graph errors in graph signal processing,” Signal Process., vol. 189, 108256, pp. 1–8, Dec. 2021.
- [19] Z. Gao, E. Isufi, and A. Ribeiro, “Stability of graph convolutional neural networks to stochastic perturbations,” Signal Process., vol. 188, 108216, pp. 1–15, Nov. 2021.
- [20] K. Xu, H. Chen, S. Liu, P.-Y. Chen, T.-W. Weng, M. Hong, and X. Lin, “Topology attack and defense for graph neural networks: An optimization perspective,” in Proc. 28th Int. Joint Conf. Artif. Intell., Macao, China, Aug. 10-16, 2019, pp. 3961–3967.
- [21] E. Ceci and S. Barbarossa, “Graph signal processing in the presence of topology uncertainties,” IEEE Trans. Signal Process., vol. 68, pp. 1558–1573, Feb. 2020.
- [22] H. Kenlay, D. Thanou, and X. Dong, “On the stability of graph convolutional neural networks under edge rewiring,” in Proc. 46th IEEE Int. Conf. Acoustic, Speech and Signal Process., Toronto, Canada, June 6-11, 2021, pp. 8513–8517.
- [23] ——, “Interpretable stability bounds for spectral graph filters,” in Proc. 38th Int. Conf. Mach. Learning, vol. 139, Virtual, July 18-24, 2021, pp. 5388–5397.
- [24] F. Gama, J. Bruna, and A. Ribeiro, “Stability properties of graph neural networks,” IEEE Trans. Signal Process., vol. 68, pp. 5680–5695, Sept. 2020.
- [25] R. Levie, W. Huang, L. Bucci, M. Bronstein, and G. Kutyniok, “Transferability of spectral graph convolutional neural networks,” J. Mach. Learn. Res., vol. 22, no. 1, pp. 12 462–12 520, Nov. 2021.
- [26] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, “Adversarial attack on graph structured data,” in Proc. 35th Int. Conf. Mach. Learning, vol. 80, Stockholm, Sweden, July 10-15, 2018, pp. 1115–1124.
- [27] D. Zügner, A. Akbarnejad, and S. Günnemann, “Adversarial attacks on neural networks for graph data,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. & Data Mining, London, United Kingdom, Aug. 19-23, 2018, pp. 2847–2856.
- [28] H. Wu, C. Wang, Y. Tyshetskiy, A. Docherty, K. Lu, and L. Zhu, “Adversarial examples for graph data: Deep insights into attack and defense,” in Proc. 28th Int. Joint Conf. Artif. Intell., Macao, China, Aug. 10-16, 2019, pp. 4816–4823.
- [29] B. Wang, J. Jia, X. Cao, and N. Z. Gong, “Certified robustness of graph neural networks against adversarial structural perturbation,” in Proc. 27th ACM SIGKDD Int. Conf. Knowl. Discov. & Data Mining, Virtual, Aug. 14-18, 2021, pp. 1645–1653.
- [30] L. Lin, E. Blaser, and H. Wang, “Graph structural attack by perturbing spectral distance,” in Proc. 28th ACM SIGKDD Int. Conf. Knowl. Discov. & Data Mining, Washington DC, USA, Aug. 14-18, 2022, pp. 989–998.
- [31] X. Wang, E. Ollila, and S. A. Vorobyov, “Graph neural network sensitivity under probabilistic error model,” in Proc. 30th Eur. Signal Process. Conf., Belgrade, Serbia, Aug. 29 - Sept. 2, 2022, pp. 2146–2150.
- [32] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, Apr. 2013.
- [33] M. Penrose, “Random geometric graphs,” Oxford Stud. in Probab., vol. 5, 2003.
- [34] A. Hagberg, P. Swart, and D. A. Schult, “Exploring network structure, dynamics, and function using NetworkX,” Los Alamos National Lab., Los Alamos, NM, USA, Tech. Rep., 2008.
- [35] G. Golub and C. Van Loan, Matrix Computations vol. 3. Baltimore, MD, USA: The Johns Hopkins Univ. Press, 2012.
- [36] T. Aven, “Upper (lower) bounds on the mean of the maximum (minimum) of a number of random variables,” J. Appl. Probab., vol. 22, no. 3, pp. 723–728, Sept. 1985.
- [37] G. Ohayon, T. Michaeli, and M. Elad, “The perception-robustness tradeoff in deterministic image restoration,” arXiv:2311.09253, [eess.IV], 2023.
- [38] B. Weisfeiler and A. Lehman, “A reduction of a graph to a canonical form and an algebra arising during this reduction,” Nauchno-Technicheskaya Informatsia, vol. 2, no. 9, pp. 12–16, 1968.
- [39] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, “Collective classification in network data,” AI Magazine, vol. 29, no. 3, p. 93, Sept. 2008.
- [40] L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard, “Unbalanced optimal transport: Dynamic and Kantorovich formulations,” J. Funct. Anal., vol. 274, no. 11, pp. 3090–3123, June 2018.
- [41] L. Chapel, M. Z. Alaya, and G. Gasso, “Partial optimal tranport with applications on positive-unlabeled learning,” in Proc. 33th Conf. Neural Inform. Process. Syst., H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Virtual, Dec. 7-12, 2020, pp. 2903–2913.
- [42] H. P. Maretic, M. El Gheche, M. Minder, G. Chierchia, and P. Frossard, “Wasserstein-based graph alignment,” IEEE Trans. Signal Inf. Process. Netw., vol. 8, pp. 353–363, Apr. 2022.
- [43] C.-Y. Chuang and S. Jegelka, “Tree mover’s distance: Bridging graph metrics and stability of graph neural networks,” in Proc. 35th Conf. Neural Inform. Process. Syst., vol. 35, New Orleans, USA, Nov. 28 - Dec. 9, 2022, pp. 2944–2957.
- [44] L. Sun, Y. Dou, C. Yang, K. Zhang, J. Wang, P. S. Yu, L. He, and B. Li, “Adversarial attack and defense on graph data: A survey,” IEEE Trans. Knowl. Data Eng., pp. 1–20, Sept. 2022.
- [45] H. Jin and X. Zhang, “Latent adversarial training of graph convolution networks,” in Proc. 36th Int. Conf. Mach. Learning Workshop Learn. Reasoning with Graph-structured Representations, Long Beach, California, USA, June 9-15, 2019, pp. 1–7.
- [46] F. Feng, X. He, J. Tang, and T.-S. Chua, “Graph adversarial training: Dynamically regularizing based on graph structure,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 6, pp. 2493–2504, June 2019.
- [47] Q. Dai, X. Shen, L. Zhang, Q. Li, and D. Wang, “Adversarial training methods for network embedding,” in Proc. 30th The World Wide Web Conf., San Francisco, CA, USA, May 13-17, 2019, pp. 329–339.
- [48] J. Ren, Z. Zhang, J. Jin, X. Zhao, S. Wu, Y. Zhou, Y. Shen, T. Che, R. Jin, and D. Dou, “Integrated defense for resilient graph matching,” in Proc. 38th Int. Conf. Mach. Learning, vol. 139, Virtual, July 18-24, 2021, pp. 8982–8997.
- [49] X. Zhao, Z. Zhang, Z. Zhang, L. Wu, J. Jin, Y. Zhou, R. Jin, D. Dou, and D. Yan, “Expressive 1-Lipschitz neural networks for robust multiple graph learning against adversarial attacks,” in Proc. 38th Int. Conf. Mach. Learning, vol. 139, July 18-24, 2021, pp. 12 719–12 735.
- [50] H. E. Egilmez, E. Pavez, and A. Ortega, “Graph learning from filtered signals: Graph system and diffusion kernel identification,” IEEE Trans. Signal Inf. Process. Netw., vol. 5, no. 2, pp. 360–374, June 2018.
- [51] X. Pu, S. L. Chau, X. Dong, and D. Sejdinovic, “Kernel-based graph learning from smooth signals: A functional viewpoint,” IEEE Trans. Signal Inf. Process. Netw., vol. 7, pp. 192–207, Feb. 2021.
- [52] R. Levie, E. Isufi, and G. Kutyniok, “On the transferability of spectral graph filters,” in Proc. 13th Int. Conf. on Sampling Theory and Appl., Bordeaux, France, July 8-12, 2019, pp. 1–5.