Graph Convolutional Neural Networks Sensitivity under Probabilistic Error Model

Xinjue Wang, Esa Ollila, and Sergiy A. Vorobyov

All the authors are with the Department of Information and Communications Engineering, Aalto University, Finland. This research was partially supported by the Research Council of Finland under Grants 359848 and 357715.

Abstract

Graph Neural Networks (GNNs), particularly Graph Convolutional Neural Networks (GCNNs), have emerged as pivotal instruments in machine learning and signal processing for handling graph-structured data. This paper proposes an analysis framework to investigate the sensitivity of GCNNs to probabilistic graph perturbations, which directly impact the graph shift operator (GSO). Our study establishes tight expected GSO error bounds, which are explicitly linked to the error model parameters, and reveals a linear relationship between GSO perturbations and the resulting output differences at each layer of GCNNs. This linearity demonstrates that a single-layer GCNN maintains stability under graph edge perturbations, provided that the GSO errors remain bounded, regardless of the perturbation scale. For multilayer GCNNs, the dependency of the system’s output difference on GSO perturbations is shown to be a recursion of linearity. Finally, we exemplify the framework with the Graph Isomorphism Network (GIN) and Simple Graph Convolution Network (SGCN). Experiments validate our theoretical derivations and the effectiveness of our approach.

Index Terms:
Sensitivity analysis, graph convolutional neural network, graph shift operator, structural perturbation

I Introduction

Graph neural networks (GNNs) have steadily gained prominence as an innovative tool in machine learning and signal processing, exhibiting unparalleled efficiency in processing data encapsulated within complex graph structures [1, 2, 3]. Uniquely designed, GNNs utilize a system of intricately coupled graph filters (GFs) with nonlinear activation functions, enabling the effective transformation and propagation of information within the graph [4].

Different GNN architectures can be delineated based on the GFs, which are integral to the functioning of GNNs. A notable example of these architectures uses graph-convolutional filters. The GNN employing this design is known as the Graph Convolutional Neural Network (GCNN). Some examples of GCNNs include the vanilla Graph Convolutional Network (GCN) [5], Graph Isomorphism Network (GIN) [6], Simple Graph Convolution Network (SGCN) [7, 8], and Cayley Graph Convolutional Network (CayleyNet) [9]. In contrast to the aforementioned GCNNs, there exist non-convolutional GNNs such as the Graph Attention Network (GAT) [10] and Edge Varying Graph Neural Network (EdgeNet) [11], which utilize edge-varying graph filters [12].

This paper delves into the GCNN, which blends graph convolutional filters with nonlinear activation functions. Graph convolutional filters couple the data and the graph through the underlying graph matrix, named the graph shift operator (GSO), which can be, for example, the graph adjacency matrix or graph Laplacian, encoding the interactions between data samples [13]. Based on the GSO, the graph filter captures the structural information by aggregating the data propagated within its $k$-hop neighborhoods, and feeds it to the next layer after processing, which may include graph coarsening and pooling [14, 6]. As the key component of GCNNs, the GSO represents the graph structure, and is typically assumed to be perfectly known. The precise estimation of the hidden graph structure is essential for successfully performing feature propagation in a convolution layer [15, 16, 17].

GSOs form the foundation of GCNN structures. Any perturbation in the graph structure has a direct bearing on the operations of a GCNN. Previous studies in graph signal processing (GSP) and GNNs have examined both deterministic and probabilistic perturbations affecting GSOs. A probabilistic graph perturbation model for a partially correct estimation of the adjacency matrix is proposed in [18], where a perturbed graph is modeled as a combination of the true adjacency matrix and a perturbation term specified by an Erdős-Rényi (ER) graph. The work [19] explores perturbations in graphs using random edge sampling, a scheme characterized by randomly deleting existing edges. In [20], a GSO perturbation strategy is formulated leveraging a general first-order optimization method, which concurrently imposes a constraint on the extent of edge perturbation. In [21], the authors propose to perturb eigenvector pairs of the graph Laplacian, considering single and multiple edge perturbations, under a small-perturbation assumption. Here, small perturbations refer to changes in a small percentage of edges.

The stability of GFs and GCNNs under GSO perturbations is one of the key research areas in signal processing (SP) and computer science (CS). In the SP community, research focuses on the relationship between the system’s output differences and the GSO differences under evasion attacks, emphasizing changes in the learned representation. In [22], the authors provide bounds on the output changes of spectral GFs resulting from double edge rewiring on normalized augmented adjacency matrices. This study extends the stability results to SGCN and gives theoretical bounds. In [23], the authors present interpretable bounds to verify the stability of spectral GFs against graph edge perturbations. These bounds are derived under the constraint that the degree of any node after perturbation cannot exceed twice its original degree. In [24], the authors apply an additive error model with norm-bounded perturbations on unspecified GSOs to provide stability bounds for multi-layer GCNNs. This model is not generic as it does not explicitly account for the perturbation of graph edges. It primarily considers perturbations resembling a uniform scaling of edge weights, a limitation noted in [25]. Additionally, the bound on the error matrix is defined based on the smallest operator norm achievable via node permutation. However, this permutation assumption may not suit social or citation networks where node identification is label-dependent, as noted in [22]. In [19], the authors consider random edge deletions as the perturbation on GSOs, specifically focusing on adjacency matrices and graph Laplacians. They conclude that both the GF and GCNN are linearly stable with respect to several factors, including the probability of edge dropping, nonlinearity, and the width and depth of the network architecture. Nevertheless, in the experiments of [19], the maximum edge deletion probability is set to 6%, indicating a limited perturbation scale.
Works in CS [26, 27, 28, 29, 30] focus on the effects of adversarial attacks on GCNN accuracy, considering both evasion and poisoning attacks. The focus is on the impact of such attacks on the downstream task. For instance, under evasion attacks, [27] demonstrates the reduction in a GCNN’s accuracy under small perturbations that preserve the degree distribution after the attack, and [30] demonstrates a significant drop in the accuracy of GCN when 5% of edges are altered.

In this paper, we introduce a sensitivity analysis framework for GCNN under the probabilistic edge perturbation model [18]. We understand stability as the characteristic of a system to maintain bounded output under perturbations, while sensitivity analysis is an examination of how variations in the output depend on influencing factors. Our analysis concentrates on studying the effects of evasion attacks. We use statistical analysis to give expected bounds for GSO errors (Theorem 1 and Proposition 1). These error bounds are explicitly dependent on the parameters of the error model. Then, we establish a sensitivity analysis framework for both GF (Theorem 2) and multilayer GCNN (Theorem 3) by giving expected bounds for differences of outputs because of GSO errors. Finally, we exemplify the framework with GIN (Corollary 1) and SGCN (Corollary 2), and empirically show that under large-scale graph perturbations (significant edge modifications), GCNNs maintain stability.

Our detailed contributions are summarized as follows.

1. Probabilistic error model. The probabilistic edge perturbation model considered is general and practically appealing. It is grounded in stochastic block models, supports both deletion and addition of edges, and permits a broader perturbation scale. The corresponding analysis approach contrasts with the constrained perturbations in existing GCNN analyses, which involve such restrictions as permitting only edge deletions in [19], double edge rewiring in [22], and small norm bounded errors in [24].

2. Tight GSO error bound. We give tighter expected bounds on GSO errors compared to our previous conference work [31], in which the bounds are deterministic. We use the $\ell_1$ norm suggested in [23] to bound the $\ell_2$ norm and make this bound interpretable by specifically tracking the changed node degrees, which can be directly linked to parameters of the error model (probabilities of deleting and adding edges). Additionally, our bound does not require the eigendecomposition of GSO [24, 19], which is computationally heavy for large graphs.

3. Generic sensitivity analysis framework. Compared to previous works [24, 19, 22], our proposed analysis framework is more generic in the following aspects. (i) We remove the assumption on limited scale perturbation and allow for a large perturbation budget, for instance that 50% of edges are deleted and 70% of edges are added (compared to the original number of edges). Our analysis is shown empirically to be valid even under such perturbation, while the maximum edge perturbation addressed in the current literature is 10% of edges [23]. (ii) We provide expected bounds under a probabilistic perspective, while the deterministic perturbations can be seen as special cases of our analysis. (iii) This framework is applicable to general GCNN models, with specific adjustments for GSO, graph shifts count, network layer count, and activation functions.

Outline. The remainder of this paper is structured as follows. In Sections II and III, we establish the fundamentals of GCNNs and proceed to formulate the problem. Section IV bounds the difference between original and perturbed GSOs, with particular emphasis on two cases: the adjacency matrix and its normalized version. Section V encompasses both GFs and GCNNs like GIN and SGCN, and demonstrates that variations in the output of each GCNN layer in response to graph perturbations are linearly bounded. Empirical validations presented in Section VI use numerical experiments with both synthetic and real-world data to corroborate the proposed theorems, thereby attesting to the reliability of our sensitivity analysis model. Section VII concludes the paper and discusses the future work.

Notation. Boldface lowercase letters such as $\mathbf{x}$ represent column vectors, while boldface capital letters like $\mathbf{X}$ denote matrices. A vector full of ones is symbolized as $\mathbf{1}_N$, and an $N \times N$ matrix full of ones is expressed as $\mathbf{1}_{N \times N} = \mathbf{1}_N \mathbf{1}_N^\top$. The identity matrix of size $N \times N$ is represented as $\mathbf{I}_{N \times N}$. The $i$-th row or column of the matrix $\mathbf{A}$ is given as $\mathbf{A}_i$, and the $(i,j)$-th element in matrix $\mathbf{A}$ is denoted as $[\mathbf{A}]_{i,j}$ or $\mathbf{A}_{i,j}$. Vector $\ell_1$ norm is defined as follows: $\|\mathbf{a}\|_1 = \sum_j |\mathbf{a}_j|$.
Matrix norms are defined as follows: the $\ell_1$ norm is represented as $\|\mathbf{A}\|_1 = \max_j \sum_i |\mathbf{A}_{i,j}|$, the $\ell_2$ norm as $\|\mathbf{A}\| = \|\mathbf{A}\|_2 = \sqrt{\max(\text{eig}(\mathbf{A}^\top \mathbf{A}))}$ (largest singular value of $\mathbf{A}$), and the $\ell_\infty$ norm as $\|\mathbf{A}\|_\infty = \max_i \sum_j |\mathbf{A}_{i,j}|$. In addition, the Hadamard product is expressed with the symbol $\circ$. We use $\mathrm{Pr}(\cdot)$ for probability, $\mathbb{E}(\cdot)$ for expectation, $\mathrm{Var}(\cdot)$ for variance, and $\mathrm{Cov}(\cdot,\cdot)$ for covariance.
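As a quick illustration of the three matrix norms defined above, the following NumPy sketch computes them directly from their definitions and checks the standard inequality $\|\mathbf{A}\|_2 \le \sqrt{\|\mathbf{A}\|_1 \|\mathbf{A}\|_\infty}$, which is what makes an $\ell_1$-type bound on the spectral norm possible. The helper name `matrix_norms` and the example matrix are illustrative choices, not part of the paper.

```python
import numpy as np

def matrix_norms(A):
    """Return the induced l1, l2 and l-infinity norms of a matrix."""
    A = np.asarray(A, dtype=float)
    norm_1 = np.abs(A).sum(axis=0).max()    # maximum absolute column sum
    norm_inf = np.abs(A).sum(axis=1).max()  # maximum absolute row sum
    # sqrt of the largest eigenvalue of A^T A, i.e. the largest singular value
    norm_2 = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())
    return norm_1, norm_2, norm_inf

# Illustrative (hypothetical) example matrix.
A = np.array([[1.0, -2.0],
              [3.0, 4.0]])
n1, n2, ninf = matrix_norms(A)

# The spectral norm never exceeds the geometric mean of the other two.
bound = np.sqrt(n1 * ninf)
print(n1, n2, ninf, bound)
```

For a symmetric matrix the $\ell_1$ and $\ell_\infty$ norms coincide, so the inequality reduces to $\|\mathbf{A}\|_2 \le \|\mathbf{A}\|_1$.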

II Preliminaries

Graph theory, GSP, and GCNN form the cornerstone of data analysis in irregular domains. The GSO plays a key role in directing information flow across the graph, thereby enabling the creation of GFs and the design of GCNNs.

The sensitivity analysis of the GSO, which essentially involves matrix sensitivity analysis, provides an empirical insight into the system’s resilience to perturbations. The GCNN, with its local architecture, maintains most of the properties of the graph convolutional filter, making it an ideal tool for sensitivity analysis. These preliminary concepts are essential for the implementation of sensitivity analysis in a graph-based context.

Graph Basics. Consider an undirected and unweighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{W})$, where the node set $\mathcal{V} = \{1, \ldots, N\}$ consists of $N$ nodes, the edge set $\mathcal{E}$ is a subset of $\mathcal{V} \times \mathcal{V}$, and the edge weighting function $\mathcal{W}: \mathcal{V} \times \mathcal{V} \to \{0, 1\}$ assigns binary edges. For an edge $(i,j) \in \mathcal{E}$, we have $\mathcal{W}(i,j) = \mathcal{W}(j,i) = 1$ due to our focus on undirected and unweighted graphs. We define the $1$-hop neighboring set of a node $i$ as $\mathcal{N}_i = \{j \in \mathcal{V} : (i,j) \in \mathcal{E}\}$, the degree of node $i$ as $d_i$, and the minimum degree of nodes around $i$ as $\tau_i = \min_{j \in \mathcal{N}_i} d_j$.

GSO. The Graph Shift Operator (GSO) $\mathbf{S} \in \mathbb{R}^{N \times N}$ symbolizes the structure of a graph and guides the passage and fusion of signals between neighboring nodes. It is often represented by the adjacency matrix $\mathbf{A}$, the Laplacian $\mathbf{L}$, or their normalized counterparts. These representations capture the graph’s connectivity patterns, making them indispensable tools for data analysis in both regular and irregular domains [32]. The adjacency matrix, denoted by $\mathbf{A}$, incorporates both the weighting function and the graph topology $\mathcal{G}$, where $[\mathbf{A}]_{ij} = 1$ if $(i,j) \in \mathcal{E}$ and $[\mathbf{A}]_{ij} = 0$ if $(i,j) \notin \mathcal{E}$. The Laplacian matrix $\mathbf{L}$ is defined by the adjacency matrix and a diagonal degree matrix $\mathbf{D}$. Specifically, $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D} = \text{diag}(\mathbf{A}\mathbf{1}_N)$ is a diagonal matrix, and $[\mathbf{D}]_{ii} = d_i$.
The value $d_i = \sum_{j \in \mathcal{N}_i} [\mathbf{A}]_{ij}$ denotes the degree of node $i$. Moreover, normalized versions of the adjacency and Laplacian matrices are defined as $\mathbf{A}_{\mathrm{n}} = \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}$ and $\mathbf{L}_{\mathrm{n}} = \mathbf{D}^{-1/2} \mathbf{L} \mathbf{D}^{-1/2}$, respectively. These normalized versions help maintain consistency and manage potential variations in the scale of the data.
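The GSO candidates defined above can be assembled in a few lines. The sketch below is a minimal NumPy illustration on a three-node path graph; the function name `build_gsos` is hypothetical, and the code assumes that no node is isolated so that $\mathbf{D}^{-1/2}$ exists.

```python
import numpy as np

def build_gsos(A):
    """Build D, L, A_n and L_n from a binary symmetric adjacency matrix.

    Assumes every node has degree >= 1 (otherwise D^{-1/2} is undefined).
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                 # node degrees d_i
    D = np.diag(d)                    # D = diag(A 1_N)
    L = D - A                         # combinatorial Laplacian
    s = 1.0 / np.sqrt(d)              # diagonal of D^{-1/2}
    A_n = s[:, None] * A * s[None, :] # D^{-1/2} A D^{-1/2}
    L_n = s[:, None] * L * s[None, :] # D^{-1/2} L D^{-1/2}
    return D, L, A_n, L_n

# Path graph on three nodes (0-indexed here; the paper indexes nodes from 1).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
D, L, A_n, L_n = build_gsos(A)
print(np.diag(D))   # degrees of the three nodes
print(L_n)
```

Since $\mathbf{L} = \mathbf{D} - \mathbf{A}$, the normalized Laplacian satisfies $\mathbf{L}_{\mathrm{n}} = \mathbf{I}_{N \times N} - \mathbf{A}_{\mathrm{n}}$, which is a convenient sanity check on any implementation.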

Graph Convolutional Filter. Using the GSO, graph signals undergo shifting and averaging across their neighboring nodes. The signal on the graph is denoted by $\mathbf{x} \in \mathbb{R}^N$. Its $i$-th entry $[\mathbf{x}]_i = x_i$ specifies the data value at the node $v_i$. The one-time shift of a graph signal is simply $\mathbf{S}\mathbf{x}$, whose value at node $i$ is $[\mathbf{S}\mathbf{x}]_i = \sum_{j \in \mathcal{N}_i} s_{ij} x_j$. After one graph shift, the value at node $i$ is given by moving a local linear operator over its neighborhood values $\{x_j\}_{j \in \mathcal{N}_i}$. Based on the graph shifting, a graph convolutional filter $\mathbf{h}(\mathbf{S})$ with $K$ taps is defined via polynomials of the GSO and the filter weights $\mathbf{h} = \{h_k\}_{k=0}^{K}$ in the graph convolution

$$\mathbf{y} = h_0 \mathbf{S}^0 \mathbf{x} + \cdots + h_K \mathbf{S}^K \mathbf{x} = \sum_{k=0}^{K} h_k \mathbf{S}^k \mathbf{x} = \mathbf{h}(\mathbf{S})\mathbf{x}, \tag{1}$$

where $\mathbf{y}$ is the filter’s output, $\mathbf{h}(\mathbf{S}) = \sum_{k=0}^{K} h_k \mathbf{S}^k$ is a shift-invariant graph filter with $K$ taps, and $h_k$ denotes the weight of local information after $k$-hop data exchanges. The graph filter is then combined with the nonlinear activation function, forming the primary component of GCNN and contributing to its expressivity.
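The graph convolution in (1) can be evaluated without forming any matrix power explicitly, by shifting the signal repeatedly and accumulating the weighted results. The following is a minimal sketch of that idea; the function name `graph_filter`, the example graph, and the filter weights are illustrative assumptions.

```python
import numpy as np

def graph_filter(S, x, h):
    """Evaluate y = sum_k h[k] * S^k x by repeated graph shifts.

    h[0] weights the unshifted signal, h[k] the signal after k shifts.
    """
    z = np.asarray(x, dtype=float)  # current shifted signal, starts at S^0 x
    y = np.zeros_like(z)
    for h_k in h:
        y = y + h_k * z             # accumulate h_k * S^k x
        z = S @ z                   # one more shift: S^{k+1} x
    return y

# Path graph on three nodes with the adjacency matrix as GSO.
S = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
x = np.array([1.0, 0.0, 0.0])       # unit impulse at the first node
h = [1.0, 0.5, 0.25]                # hypothetical weights h_0, h_1, h_2

y = graph_filter(S, x, h)           # equals [1.25, 0.5, 0.25]
print(y)
```

Each shift only mixes values between neighbors, so after $k$ shifts the impulse has reached nodes at most $k$ hops away, which is what the output for this small example shows.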

Graph Perceptron and GCNN. A Graph Perceptron [4] is a simple unit of transformation in the GCNN. The functionality of a graph perceptron can be seamlessly extended to accommodate graph signals with multiple features. Specifically, a multi-feature graph signal can be denoted by $\mathbf{X} = [\mathbf{x}_1, \cdots, \mathbf{x}_d] \in \mathbb{R}^{N \times d}$, where $d$ signifies the number of features. The architecture of an $L$-layer GCNN is built upon cascading multiple graph perceptrons. It operates such that the output of a graph perceptron in a preceding layer serves as the input to the graph perceptron at the subsequent layer $\ell$, where $\ell$ spans from $1$ to $L$. We denote the feature fed to the first layer as $\mathbf{X}_0 = \mathbf{X}$. For an $L$-layer GCNN, the graph perceptron at layer $\ell$ can be represented as

$$\mathbf{Y}_\ell = \sum_{k=1}^{K} \mathbf{S}^k \mathbf{X}_{\ell-1} \mathbf{H}_{\ell k}, \quad \mathbf{X}_\ell = \sigma_\ell\left(\mathbf{Y}_\ell\right). \tag{2}$$

Here, $\mathbf{Y}_\ell$ signifies the intermediate graph filter output, $\sigma_\ell(\cdot)$ denotes the nonlinear activation function at layer $\ell$, and graph signals at each layer are $\mathbf{X}_\ell$ and $\mathbf{X}_{\ell-1}$ with sizes of $\mathbb{R}^{N \times F_\ell}$ and $\mathbb{R}^{N \times F_{\ell-1}}$, respectively, where $F_\ell$ denotes the number of features at the $\ell$-th layer. The bank of filter coefficients is represented by $\mathbf{H} = \{\mathbf{H}_{\ell k}\}_{\ell = 1, \ldots, L;\, k = 1, \ldots, K}$. By recursively using (2) until $\ell = L$, a general GCNN can be formulated as

$$\boldsymbol{\Phi}(\mathbf{X}; \mathbf{H}, \mathbf{S}) = \mathbf{X}_L = \sigma_L\left(\sum_{k=1}^{K} \mathbf{S}^k \mathbf{X}_{L-1} \mathbf{H}_{Lk}\right). \tag{3}$$

This representation captures the nature of GCNN operations, going through each layer and applying the corresponding transformation defined by the graph signal, filter coefficients, and the non-linearity function. This hierarchical arrangement facilitates the flow of information through successive layers, thus enabling effective learning from graph-structured data.
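The layer recursion in (2) can be sketched as a short forward pass. The code below is an illustration under stated assumptions rather than the authors' implementation: the function name `gcnn_forward`, the use of `tanh` for every layer, the random weights, and the five-node cycle graph are all hypothetical choices, and the filter sum starts at $k = 1$ as in (2).

```python
import numpy as np

def gcnn_forward(S, X, H, sigma=np.tanh):
    """Forward pass of an L-layer GCNN following the recursion in (2).

    H is a list over layers; H[l] is a list of K coefficient matrices,
    each of shape (F_{l-1}, F_l). The same activation sigma is used in
    every layer (an assumption made here for brevity).
    """
    for H_layer in H:
        Z = X
        Y = np.zeros((X.shape[0], H_layer[0].shape[1]))
        for H_k in H_layer:
            Z = S @ Z          # S^k X_{l-1}, built up one shift at a time
            Y = Y + Z @ H_k    # accumulate S^k X_{l-1} H_{lk}
        X = sigma(Y)           # X_l = sigma(Y_l)
    return X

rng = np.random.default_rng(0)
N, F0, F1, F2, K = 5, 3, 4, 2, 2

# Adjacency matrix of a 5-node cycle, used as the GSO.
P = np.roll(np.eye(N), 1, axis=1)
S = P + P.T

X0 = rng.standard_normal((N, F0))
H = [[rng.standard_normal((F0, F1)) for _ in range(K)],
     [rng.standard_normal((F1, F2)) for _ in range(K)]]

out = gcnn_forward(S, X0, H)
print(out.shape)
```

Because the GSO enters every layer through the same shifts, replacing `S` with a perturbed version changes the output of each layer, which is the effect the sensitivity analysis quantifies.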

III Problem Formulation

A pivotal aspect of understanding the sensitivity of a GCNN is the consideration of potential alterations in the underlying graph structure. These alterations can be broadly construed as perturbations to the GSO, intrinsically linked to changes in the graph topology. In the simplest form, any perturbation to the GSO can be depicted as

$$\hat{\mathbf{S}} = \mathbf{S} + \mathbf{E}, \tag{4}$$

where $\hat{\mathbf{S}}$ signifies the perturbed GSO, $\mathbf{S}$ is the original GSO, and $\mathbf{E}$ represents the error term. The spectral norm of this error term is denoted by

$$d(\hat{\mathbf{S}}, \mathbf{S}) = \|\hat{\mathbf{S}} - \mathbf{S}\| = \|\mathbf{E}\|. \tag{5}$$
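To make (4) and (5) concrete, the sketch below removes a single edge from a four-node cycle and measures the resulting distance for two GSO choices, the adjacency matrix and its normalized version. The helper names `normalized_adjacency` and `gso_distance` are illustrative, and the example assumes that no node becomes isolated after the perturbation.

```python
import numpy as np

def normalized_adjacency(A):
    """A_n = D^{-1/2} A D^{-1/2}; assumes every node has degree >= 1."""
    s = 1.0 / np.sqrt(A.sum(axis=1))
    return s[:, None] * A * s[None, :]

def gso_distance(S_hat, S):
    """d(S_hat, S) = ||S_hat - S||, the spectral norm of the error E."""
    return np.linalg.norm(S_hat - S, 2)

# Four-node cycle 0-1-2-3-0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# Perturbed graph: delete the edge between nodes 0 and 3.
A_hat = A.copy()
A_hat[0, 3] = A_hat[3, 0] = 0.0

d_adj = gso_distance(A_hat, A)
d_norm = gso_distance(normalized_adjacency(A_hat), normalized_adjacency(A))
print(d_adj, d_norm)
```

With the adjacency matrix as GSO, the error has only two nonzero entries. With the normalized adjacency matrix, the same edge deletion also rescales the surviving edges at nodes 0 and 3, so the error spreads to additional entries.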

Inspired by a previous work [18], we utilize a probabilistic error model to represent graph perturbations, where each edge of the graph is subject to perturbation independently. In this context, we primarily focus on the alterations occurring within the neighborhood of a particular node $u \in \mathcal{V}$. More specifically, the perturbed neighborhood may encompass added nodes ($\mathcal{A}_u$), deleted nodes ($\mathcal{D}_u$), and remaining nodes ($\mathcal{R}_u$), which ultimately leads to changes in node degree and modifications to the adjacency matrix. We aim to quantify the sensitivity of GSO in relation to these perturbations. To this end, we adopt and expand upon the notation used in [22, 23] for clarity and consistency.

When the graph undergoes perturbations, it transforms into $\hat{\mathcal{G}} = (\mathcal{V}, \hat{\mathcal{E}}, \hat{\mathcal{W}})$, with the node set remaining unaffected. We express degrees of node $u \in \mathcal{V}$ in original and perturbed graphs as $d_u = \sum_j |[\mathbf{A}]_{u,j}|$ and $\hat{d}_u = \sum_j |[\hat{\mathbf{A}}]_{u,j}| = d_u + \delta_u$, respectively.
Here, $\hat{\mathbf{A}}$ denotes the adjacency matrix of the perturbed graph $\hat{\mathcal{G}}$, and $\delta_u = \delta_u^+ - \delta_u^-$ is the degree change at node $u$, with $\delta_u^+ = |\mathcal{A}_u|$ and $\delta_u^- = |\mathcal{D}_u|$ corresponding to the number of edges added and deleted, respectively. We will further delve into the assumptions for the error model and its effects on the GCNN’s performance in the following discussion.
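The neighborhood bookkeeping above can be illustrated on a small hand-built example. The sketch below extracts $\mathcal{A}_u$, $\mathcal{D}_u$, and $\mathcal{R}_u$ for one node and checks that, for an adjacency GSO, the matrix $\ell_1$ norm of the error equals $\max_u (\delta_u^+ + \delta_u^-)$ and upper-bounds the spectral norm. The function name `neighborhood_changes` and the example graphs are hypothetical, and nodes are 0-indexed in the code.

```python
import numpy as np

def neighborhood_changes(A, A_hat, u):
    """Return (added, deleted, remaining, delta) for node u."""
    N_u = set(np.flatnonzero(A[u]).tolist())          # original neighbors
    N_hat_u = set(np.flatnonzero(A_hat[u]).tolist())  # perturbed neighbors
    added = N_hat_u - N_u          # A_u
    deleted = N_u - N_hat_u        # D_u
    remaining = N_u & N_hat_u      # R_u
    delta = len(added) - len(deleted)  # delta_u = delta_u^+ - delta_u^-
    return added, deleted, remaining, delta

# Original graph: triangle on nodes 0, 1, 2 plus an isolated node 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])

# Perturbed graph: delete edge (0, 1), add edge (0, 3).
A_hat = A.copy()
A_hat[0, 1] = A_hat[1, 0] = 0
A_hat[0, 3] = A_hat[3, 0] = 1

added, deleted, remaining, delta = neighborhood_changes(A, A_hat, u=0)

E = A_hat - A
l1_norm = np.abs(E).sum(axis=0).max()   # max_u (delta_u^+ + delta_u^-)
spectral = np.linalg.norm(E, 2)
print(added, deleted, remaining, delta, l1_norm, spectral)
```

Note that node 0 loses one neighbor and gains one, so its degree change $\delta_0$ is zero even though two entries of its row in the adjacency matrix change. The quantity $\delta_u^+ + \delta_u^-$, not $\delta_u$, is what enters the $\ell_1$ norm of $\mathbf{E}$.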

III-A Probabilistic Graph Error Model

Figure 1: Visual representation of the probabilistic graph error model applied to a random geometric graph. From left to right: (a) original graph ($\epsilon_1=0,\ \epsilon_2=0$); (b) graph after edge deletions ($\epsilon_1=0.3,\ \epsilon_2=0$); (c) graph after edge additions ($\epsilon_1=0,\ \epsilon_2=0.1$); (d) graph after both edge deletions and additions ($\epsilon_1=0.3,\ \epsilon_2=0.1$). Deleted edges are marked in red and added edges are marked in blue. The transformations illustrate the impact of the perturbations modeled by (6).

In this work, we utilize an Erdős-Rényi (ER) graph-based model for perturbations on a graph adjacency matrix, following the approach proposed in [18]. The adjacency matrix of an ER graph is characterized by a random $N\times N$ matrix $\boldsymbol{\Delta}_{\epsilon}$, where each element is generated independently, satisfying $\Pr([\boldsymbol{\Delta}_{\epsilon}]_{i,j}=1)=\epsilon$ and $\Pr([\boldsymbol{\Delta}_{\epsilon}]_{i,j}=0)=1-\epsilon$ for all $i\neq j$. The diagonal elements are zero, i.e., $[\boldsymbol{\Delta}_{\epsilon}]_{i,i}=0$ for $i=1,\dots,N$, eliminating the possibility of self-loops. For the sake of our analysis, we also assume that the perturbed graph $\hat{\mathcal{G}}$ does not contain any isolated nodes, meaning that $\hat{d}_u\geq 1$ for all $u\in\mathcal{V}$.
The model can be adapted by employing the lower triangular matrix $\boldsymbol{\Delta}_{\epsilon}^{l}$ and then defining $\boldsymbol{\Delta}_{\epsilon}=\boldsymbol{\Delta}_{\epsilon}^{l}+(\boldsymbol{\Delta}_{\epsilon}^{l})^{\top}$, which makes the perturbation matrix symmetric. Consequently, by specifying the error term in (4), the perturbed adjacency matrix can be expressed as

$$\hat{\mathbf{A}}=\mathbf{A}-\boldsymbol{\Delta}_{\epsilon_1}\circ\mathbf{A}+\boldsymbol{\Delta}_{\epsilon_2}\circ(\mathbf{1}_{N\times N}-\mathbf{A}), \tag{6}$$

where the term $-\boldsymbol{\Delta}_{\epsilon_1}\circ\mathbf{A}$ is responsible for edge deletion with probability $\epsilon_1$, and the term $\boldsymbol{\Delta}_{\epsilon_2}\circ(\mathbf{1}_{N\times N}-\mathbf{A})$ accounts for edge addition with probability $\epsilon_2$. This error model can be conceptualized as superimposing two ER graphs on top of the original graph. To illustrate the model, we use a random geometric graph [33, 34]. Fig. 1 shows the transition from the original graph to its perturbed versions: the graph with only edge deletions ($\epsilon_1=0.3,\ \epsilon_2=0$), the graph with only edge additions ($\epsilon_1=0,\ \epsilon_2=0.1$), and the graph with both edge deletions and additions ($\epsilon_1=0.3,\ \epsilon_2=0.1$). Each panel depicts the impact of the corresponding perturbation.
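The sampling procedure behind (6) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the helper names `sample_er_mask` and `perturb_adjacency`, the 5-node ring graph, and the seed are assumptions made here for the example, and the adjacency matrix is assumed to be binary and symmetric.

```python
import numpy as np

def sample_er_mask(n, eps, rng):
    """Symmetric 0/1 ER mask with zero diagonal (hypothetical helper).

    Each strictly lower-triangular entry is 1 with probability eps,
    and the mask is symmetrized as Delta = Delta_l + Delta_l^T.
    """
    lower = np.tril(rng.random((n, n)) < eps, k=-1).astype(int)
    return lower + lower.T

def perturb_adjacency(A, eps1, eps2, rng):
    """Apply (6): A_hat = A - D1 o A + D2 o (1 - A), with o the Hadamard product."""
    n = A.shape[0]
    d1 = sample_er_mask(n, eps1, rng)  # deletion mask
    d2 = sample_er_mask(n, eps2, rng)  # addition mask
    return A - d1 * A + d2 * (np.ones((n, n), dtype=int) - A)

rng = np.random.default_rng(0)
# Hypothetical 5-node ring graph, used only for illustration.
A = np.zeros((5, 5), dtype=int)
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1

A_hat = perturb_adjacency(A, eps1=0.3, eps2=0.1, rng=rng)
```

Because the addition mask has a zero diagonal, the ones on the diagonal of $\mathbf{1}_{N\times N}-\mathbf{A}$ never create self-loops, and the result stays binary and symmetric.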

In this context, the impact of the perturbation on the degree of a given node $u\in\mathcal{V}$ can be computed as follows. The effect of edge deletion is represented by $(-\boldsymbol{\Delta}_{\epsilon_1}\circ\mathbf{A})_u$, where each non-zero element in $\mathbf{A}_u$ is deleted with probability $\epsilon_1$. Thus, the total number of deleted edges $\delta_u^{-}$ is the sum of $d_u$ independent and identically distributed (i.i.d.) Bernoulli random variables, each with success probability $\epsilon_1$. Similarly, the effect of edge addition is denoted by $\left(\boldsymbol{\Delta}_{\epsilon_2}\circ(\mathbf{1}_{N\times N}-\mathbf{A})\right)_u$, and the total number of added edges $\delta_u^{+}$ is the sum of $d_u^{*}$ i.i.d. Bernoulli random variables, each with success probability $\epsilon_2$, where $d_u^{*}=N-d_u-1$. Hence, the number of deleted edges $\delta_u^{-}$ and the number of added edges $\delta_u^{+}$ follow the binomial distributions

$$\delta_u^{-}\sim\mathrm{Bin}(d_u,\epsilon_1),\quad \delta_u^{+}\sim\mathrm{Bin}(d_u^{*},\epsilon_2), \tag{7}$$

where $\mathrm{Bin}(n,p)$ represents a binomial distribution with parameters $n$ and $p$.
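As a brief worked example of (7), consider a hypothetical node with $d_u=4$ in a graph with $N=10$ nodes, so that $d_u^{*}=N-d_u-1=5$, and take $\epsilon_1=0.3$, $\epsilon_2=0.1$ as in Fig. 1(d). These numbers are chosen here purely for illustration. The standard binomial mean and variance formulas give

$$\begin{aligned}
\mathbb{E}[\delta_u^{-}] &= d_u\epsilon_1 = 1.2, & \mathrm{Var}[\delta_u^{-}] &= d_u\epsilon_1(1-\epsilon_1) = 0.84,\\
\mathbb{E}[\delta_u^{+}] &= d_u^{*}\epsilon_2 = 0.5, & \mathrm{Var}[\delta_u^{+}] &= d_u^{*}\epsilon_2(1-\epsilon_2) = 0.45,\\
\mathbb{E}[\delta_u] &= \mathbb{E}[\delta_u^{+}]-\mathbb{E}[\delta_u^{-}] = -0.7.
\end{aligned}$$

On average, this node therefore loses more edges than it gains, even though many more node pairs are eligible for addition than for deletion.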

IV Expected Bound for GSO error

IV-A Error Bound for Unnormalized GSO Using $\ell_1$ Norm

Building on the foundation laid by the discussion of graph structure perturbations and the proposed error model, we now outline the primary theoretical contributions of this study. Our focus here is to detail the probabilistic bounds that help quantify the sensitivity of the GSO to graph structure perturbations. We examine the case where the adjacency matrix serves as the GSO, implying $\hat{\mathbf{S}}=\hat{\mathbf{A}}$ and $\mathbf{S}=\mathbf{A}$. The error model derived in (6) can be expressed as

$$\mathbf{E}=\hat{\mathbf{A}}-\mathbf{A}=-\boldsymbol{\Delta}_{\epsilon_1}\circ\mathbf{A}+\boldsymbol{\Delta}_{\epsilon_2}\circ(\mathbf{1}_{N\times N}-\mathbf{A}). \tag{8}$$

We can link the change in degree with the $\ell_1$ norm of the error term in (8) as

$$\|\mathbf{E}\|_1=\max_{u\in\mathcal{V}}\|\mathbf{E}_u\|_1, \tag{9}$$

where

$$Y_u\triangleq\|\mathbf{E}_u\|_1=|\mathcal{D}_u|+|\mathcal{A}_u|=\delta_u^{-}+\delta_u^{+}. \tag{10}$$
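The relationship in (9) and (10) can be checked on a small example. The sketch below assumes binary adjacency matrices; the function name `row_errors` and the 3-node graphs are hypothetical and introduced only for illustration.

```python
import numpy as np

def row_errors(A, A_hat):
    """Return Y_u = ||E_u||_1 for every node u, with E = A_hat - A as in (8)."""
    E = A_hat - A
    return np.abs(E).sum(axis=1)

# Hypothetical 3-node path graph 0-1-2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
# Perturbed graph: edge (0, 1) deleted, edge (0, 2) added.
A_hat = np.array([[0, 0, 1],
                  [0, 0, 1],
                  [1, 1, 0]])

E = A_hat - A
deleted = (E == -1).sum(axis=1)  # delta_u^- for each node
added = (E == 1).sum(axis=1)     # delta_u^+ for each node
Y_u = row_errors(A, A_hat)       # equals deleted + added, as in (10)
l1_norm = Y_u.max()              # ||E||_1 as in (9)
```

Since $\mathbf{E}$ is symmetric here, the maximum absolute row sum coincides with the induced $\ell_1$ norm returned by `np.linalg.norm(E, 1)`.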

Let $Y\triangleq\max_{u\in\mathcal{V}}Y_u$. Since $\delta_u^{-}$ and $\delta_u^{+}$ are independent random variables, it is not appropriate to give deterministic upper bounds. Instead, we present expected value bounds, which are better suited for analyzing the degree changes of nodes given the probabilistic nature of the model. Our goal is to derive a closed-form expression for the expectation of the maximum node degree error, i.e.,

$$\mathbb{E}[\|\mathbf{E}\|_1]=\mathbb{E}\Big[\max_{u\in\mathcal{V}}\|\mathbf{E}_u\|_1\Big]. \tag{11}$$

The probability mass function (PMF) of $Y_u$ can be found by convolving the PMFs of $\delta_u^{-}$ and $\delta_u^{+}$, which are independent random variables. From the binomial distributions in (7), we obtain the PMFs

$$\Pr\nolimits_{\delta_u^{-}}(k)=\binom{d_u}{k}\epsilon_1^{k}(1-\epsilon_1)^{d_u-k},\quad k=0,\ldots,d_u, \tag{12}$$

$$\Pr\nolimits_{\delta_u^{+}}(k)=\binom{d_u^{*}}{k}\epsilon_2^{k}(1-\epsilon_2)^{d_u^{*}-k},\quad k=0,\ldots,d_u^{*}, \tag{13}$$

where $d_u^{*}=N-d_u-1$, and $\Pr_{\delta_u^{-}}(k)$ and $\Pr_{\delta_u^{+}}(k)$ represent the probabilities of $\delta_u^{-}$ and $\delta_u^{+}$ taking the value $k$, respectively. Then, the PMF of $Y_u$ can be computed as

$$\begin{aligned}
\Pr\nolimits_{Y_u}(k)&=\sum_{i=\max\{0,\,k-d_u^{*}\}}^{\min\{k,\,d_u\}}\Pr\nolimits_{\delta_u^{-},\delta_u^{+}}(i,k-i)\\
&=\sum_{i=\max\{0,\,k-d_u^{*}\}}^{\min\{k,\,d_u\}}\Pr\nolimits_{\delta_u^{-}}(i)\,\Pr\nolimits_{\delta_u^{+}}(k-i),
\end{aligned} \tag{14}$$

where $k=0,\ldots,N-1$. Using (14), the cumulative distribution function (CDF) of $Y$ is computed as

$$\begin{aligned}
F_Y(k)&=\Pr(Y\leq k)=\Pr(\max(Y_1,\ldots,Y_N)\leq k)\\
&=\Pr(Y_1\leq k,\ldots,Y_N\leq k)=\prod_{u=1}^{N}\Pr(Y_u\leq k).
\end{aligned} \tag{15}$$

Given that the $Y_u$ for $u\in\mathcal{V}$ are i.i.d., the CDFs of $Y$ and $Y_u$ for $k=1,\ldots,N-1$ are as follows:

$$F_Y(k)=\prod_{u=1}^{N}F_{Y_u}(k),\quad F_{Y_u}(k)=\sum_{j=0}^{k}\Pr\nolimits_{Y_u}(j). \tag{16}$$
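The chain from (12) to (16) maps directly onto a discrete convolution followed by cumulative sums. The sketch below is one possible numerical realization, not the authors' code: the function names and the degree sequence are hypothetical, and the product over nodes relies on the independence used in (15).

```python
from math import comb
import numpy as np

def binom_pmf(n, p):
    """PMF of Bin(n, p) on the support 0..n, as in (12) and (13)."""
    return np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

def pmf_Yu(N, d_u, eps1, eps2):
    """PMF of Y_u on the support 0..N-1 via the convolution in (14)."""
    return np.convolve(binom_pmf(d_u, eps1), binom_pmf(N - d_u - 1, eps2))

def cdf_Y(degrees, eps1, eps2):
    """CDF of Y = max_u Y_u as the product of per-node CDFs, as in (16)."""
    N = len(degrees)
    cdf = np.ones(N)
    for d_u in degrees:
        cdf *= np.cumsum(pmf_Yu(N, d_u, eps1, eps2))
    return cdf

# Hypothetical degree sequence of a 5-node path graph.
F = cdf_Y([1, 2, 2, 2, 1], eps1=0.3, eps2=0.1)
```

The convolution of a length-$(d_u+1)$ and a length-$(d_u^{*}+1)$ vector has exactly $N$ entries, which matches the support $k=0,\ldots,N-1$ stated after (14).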

With the PMF of $Y$ at a specific value $k$ being $\Pr_Y(k)=F_Y(k)-F_Y(k-1)$, the expectation of $Y$ can be represented as

$$\mathbb{E}[Y]=\sum_{k=1}^{N-1}k\,\Pr\nolimits_Y(k)=\sum_{k=1}^{N-1}k\left[F_Y(k)-F_Y(k-1)\right], \tag{17}$$

which provides a closed-form expression for $\mathbb{E}[Y]=\mathbb{E}[\|\mathbf{E}\|_1]$. The variance of $Y$ can also be given as

$$\mathrm{Var}[Y]=\mathrm{Var}[\|\mathbf{E}\|_1]=\mathbb{E}[Y^2]-(\mathbb{E}[Y])^2, \tag{18}$$

where $\mathbb{E}[Y^2]=\sum_{k=1}^{N-1}k^2\,\Pr_Y(k)$.
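For a given degree sequence, (17) and (18) can be evaluated numerically. The following self-contained sketch assumes, as in (15), that the per-node errors are independent; the function name `max_error_moments` and the example inputs are hypothetical.

```python
from math import comb
import numpy as np

def binom_pmf(n, p):
    """PMF of Bin(n, p) on the support 0..n."""
    return np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

def max_error_moments(degrees, eps1, eps2):
    """Return (E[Y], Var[Y]) for Y = max_u Y_u, following (14), (16), (17), (18)."""
    N = len(degrees)
    cdf_Y = np.ones(N)
    for d_u in degrees:
        # PMF of Y_u = delta_u^- + delta_u^+ by convolution, then its CDF.
        pmf_u = np.convolve(binom_pmf(d_u, eps1), binom_pmf(N - d_u - 1, eps2))
        cdf_Y *= np.cumsum(pmf_u)
    # Pr_Y(k) = F_Y(k) - F_Y(k - 1), with F_Y(-1) = 0.
    pmf_Y = np.diff(cdf_Y, prepend=0.0)
    k = np.arange(N)
    mean = float((k * pmf_Y).sum())
    var = float((k**2 * pmf_Y).sum() - mean**2)
    return mean, var

# Hypothetical degree sequence of a 5-node path graph.
mean_Y, var_Y = max_error_moments([1, 2, 2, 2, 1], eps1=0.3, eps2=0.1)
```

As a sanity check, for two nodes joined by one edge with $\epsilon_1=0.5$ and $\epsilon_2=0$, each $Y_u$ is Bernoulli with parameter $0.5$, and the routine returns $\mathbb{E}[Y]=0.75$ and $\mathrm{Var}[Y]=0.1875$, the values for the maximum of two independent such variables.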

IV-B Bridging $\ell_1$ and $\ell_2$ Norms in GSO Analysis

In the analysis of graph-structured data, the spectral norm ($\ell_2$ norm) is often employed to quantify the graph spectral error. While [31] did furnish a spectral error bound for the GSO, a more refined and interpretable bound is still needed to enable more comprehensive analyses. Following the approach of [23], this study uses the $\ell_1$ norm and assumes that the error matrix $\mathbf{E}$ is fixed. The proposed approach of bounding $\|\mathbf{E}\|$ is based on the assumptions of an undirected graph and a symmetric perturbation, $\mathbf{E}=\mathbf{E}^{\top}$. Using the inequality $\|\mathbf{E}\|^2\leq\|\mathbf{E}\|_1\|\mathbf{E}\|_{\infty}$ [35, Section 2.3.3] and the fact that in our case $\|\mathbf{E}\|_1=\|\mathbf{E}\|_{\infty}$, the $\ell_2$ norm can be bounded by the $\ell_1$ norm as

$$\|\mathbf{E}\|\leq\|\mathbf{E}\|_1=\max_{u\in\mathcal{V}}\|\mathbf{E}_u\|_1. \tag{19}$$
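The inequality in (19) is easy to verify numerically for any symmetric error matrix. The snippet below is a minimal check using `numpy.linalg.norm`; the helper name `induced_norms`, the matrix size, and the random entries in $\{-1,0,1\}$ are assumptions made for illustration.

```python
import numpy as np

def induced_norms(E):
    """Return the spectral, l1 (max column sum), and l-infinity (max row sum) norms of E."""
    return (np.linalg.norm(E, 2),
            np.linalg.norm(E, 1),
            np.linalg.norm(E, np.inf))

rng = np.random.default_rng(1)
n = 8
# Hypothetical symmetric error matrix with entries in {-1, 0, 1} and zero diagonal.
upper = np.triu(rng.integers(-1, 2, size=(n, n)), k=1)
E = upper + upper.T

spectral, l1, linf = induced_norms(E)
```

For symmetric $\mathbf{E}$ the row and column sums coincide, so `l1 == linf` and `spectral <= l1` both hold, which is exactly the step from $\|\mathbf{E}\|^2\leq\|\mathbf{E}\|_1\|\mathbf{E}\|_{\infty}$ to (19).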

The entries in the error matrix $\mathbf{E}$ of equation (8) are random variables. As such, it is challenging to derive a deterministic bound for (19) that is both tight and generalizable. In contrast, an expected bound

$$\mathbb{E}[\|\mathbf{E}\|]\leq\mathbb{E}[\|\mathbf{E}\|_1]=\mathbb{E}\Big[\max_{u\in\mathcal{V}}\|\mathbf{E}_u\|_1\Big], \tag{20}$$

provides a more reasonable estimate of the true behavior of the error matrix, as it takes into account the distribution of the random variables, as well as the structural changes of the perturbed graph. Thus, we have the following theorem.

Theorem 1.

In the context of the probabilistic error model (8), let the GSO be the adjacency matrix, $\mathbf{S}=\mathbf{A}$, and let the perturbed GSO be $\hat{\mathbf{S}}=\hat{\mathbf{A}}$. Then, a closed-form expression for the upper bound on the expectation of the GSO distance is given by

$$\mathbb{E}\left[d(\hat{\mathbf{S}},\mathbf{S})\right]\leq\mathbb{E}[Y], \tag{21}$$

where $\mathbb{E}[Y]$ is computed using (17), (16), and (14).

Theorem 1 provides a closed-form expression for the upper bound, which depends explicitly on the parameters $(\epsilon_1,\epsilon_2)$ of the probabilistic error model in (8). Using a loose upper bound proposed in [36], we can further bound $\mathbb{E}[Y]$ in (21) as

$$\begin{aligned}
\mathbb{E}[Y]&\leq\max_{1\leq u\leq N}\left(d_u\epsilon_1+d_u^{*}\epsilon_2\right)\\
&\quad+\sqrt{\frac{N-1}{N}\sum_{u=1}^{N}\big(d_u\epsilon_1(1-\epsilon_1)+d_u^{*}\epsilon_2(1-\epsilon_2)\big)}.
\end{aligned} \tag{22}$$

We note that (22) showcases how our bound in Theorem 1 is parameterized by the probabilities of adding and deleting edges. Thus, Theorem 1 precisely captures the resulting structural changes induced by the probabilistic error model, unlike the generic spectral bound in [31], which overlooks specific structural changes on the perturbed GSO.
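The right-hand side of (22) depends only on the degree sequence and on $(\epsilon_1,\epsilon_2)$, so it can be evaluated directly. The sketch below is a minimal illustration; the function name `loose_bound` and the example degree sequence are hypothetical.

```python
import numpy as np

def loose_bound(degrees, eps1, eps2):
    """Evaluate the right-hand side of (22) for a given degree sequence."""
    d = np.asarray(degrees, dtype=float)
    N = d.size
    d_star = N - d - 1  # number of non-neighbors of each node
    means = d * eps1 + d_star * eps2                                 # E[Y_u]
    variances = d * eps1 * (1 - eps1) + d_star * eps2 * (1 - eps2)   # Var[Y_u]
    return float(means.max() + np.sqrt((N - 1) / N * variances.sum()))

# Hypothetical degree sequence of a 5-node path graph.
bound = loose_bound([1, 2, 2, 2, 1], eps1=0.3, eps2=0.1)
```

For two nodes joined by one edge with $\epsilon_1=0.5$ and $\epsilon_2=0$, this returns $0.5+\sqrt{0.5\cdot 0.5}=1$, which is consistent with the exact value $\mathbb{E}[Y]=0.75$ obtained from (17) under independent $Y_u$.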

Remark 1 (Why not use $\ell_2$ norm?).

The spectral bounds derived using the $\ell_2$ norm, as presented in [31], cannot fully capture the specific structural changes to the GSO from perturbations, especially in graphs with unique properties like degree distribution or sparsity. Focused on worst-case scenarios, these bounds lead to overestimations, rendering them looser and less applicable to particular graph types. The $\ell_1$ norm is preferred over the $\ell_2$ norm for providing an upper bound because it reveals the impact of structural changes denoted by $\boldsymbol{\Delta}_{\epsilon_1}$ and $\boldsymbol{\Delta}_{\epsilon_2}$ in (8), whereas the $\ell_2$ norm absorbs these structural changes into the overall spectral change, making it more challenging to derive a tight bound.

IV-C Error Bound for Normalized GSO

In this context, the GSO is considered as the normalized version of the adjacency matrix, i.e., $\mathbf{S}=\mathbf{A}_{\textrm{n}}$. The entries of the normalized adjacency matrix are $[\mathbf{A}_{\textrm{n}}]_{u,v}=\frac{1}{\sqrt{d_u d_v}}$ if $(u,v)\in\mathcal{E}$, and $[\mathbf{A}_{\textrm{n}}]_{u,v}=0$ if $(u,v)\notin\mathcal{E}$. In [23], a closed form for $\|\mathbf{E}_u\|_1$ is proposed

$$\begin{split}\|\mathbf{E}_u\|_1={}&\sum_{v\in\mathcal{D}_u}\frac{1}{\sqrt{d_u d_v}}+\sum_{v\in\mathcal{A}_u}\frac{1}{\sqrt{\hat{d}_u\hat{d}_v}}\\&+\sum_{v\in\mathcal{R}_u}\left|\frac{1}{\sqrt{d_u d_v}}-\frac{1}{\sqrt{\hat{d}_u\hat{d}_v}}\right|,\end{split}\tag{23}$$

where $\hat{d}_u$ and $\hat{d}_v$ denote the degrees of nodes $u$ and $v$ after perturbation. However, [23] assumes that the perturbed degree $\hat{d}_v$ does not exceed twice the initial degree, i.e., $\hat{d}_v\leq 2d_v$ for $v\in\mathcal{N}_u\cup\{u\}$. Under the error model in (6), this limitation could easily be breached with an increased probability of edge addition $\epsilon_2$. Our analysis does not require this restriction. We start with the following lemma.

Lemma 1.

Let $\mathbf{E}_u$ be defined as in (23). Then its $\ell_1$ norm is bounded by a random variable $Z_u$

$$\|\mathbf{E}_u\|_1\leq Z_u=Z_{u,1}+Z_{u,2}, \tag{24}$$

where $Z_u$ is defined as the sum of $Z_{u,1}=\sqrt{d_u/\tau_u}$ and $Z_{u,2}=\sum_{v\in\mathcal{A}_u\cup\mathcal{R}_u}\frac{1}{\sqrt{(d_u+\delta_u^{+}-\delta_u^{-})(d_v+\delta_v^{+}-\delta_v^{-})}}$, $d_u$ is the degree of node $u$, $\tau_u$ is the minimum degree of the neighboring nodes of $u$, and $\delta_u^{-},\delta_u^{+},\delta_v^{-},\delta_v^{+}$ are random variables with binomial distributions $\delta_u^{-}\sim\mathrm{Bin}(d_u,\epsilon_1)$, $\delta_u^{+}\sim\mathrm{Bin}(d_u^{*},\epsilon_2)$, $\delta_v^{-}\sim\mathrm{Bin}(d_v,\epsilon_1)$, $\delta_v^{+}\sim\mathrm{Bin}(d_v^{*},\epsilon_2)$ for $u\in\mathcal{V}$ and $v\in\mathcal{A}_u\cup\mathcal{R}_u$, where $d_u^{*}=N-d_u-1$ and $d_v^{*}=N-d_v-1$.

Proof.

See Appendix A. ∎
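As a rough numerical illustration of why the degree restriction in [23] can be violated under the error model, consider the hypothetical values $N=1000$, $d_v=5$, $\epsilon_1=0$, and $\epsilon_2=0.01$ (chosen only for illustration; they are not taken from the experiments). Using $\hat{d}_v=d_v+\delta_v^{+}-\delta_v^{-}$ and the binomial means of $\delta_v^{-}$ and $\delta_v^{+}$ from Lemma 1, the expected perturbed degree is

```latex
\begin{align*}
\mathbb{E}[\hat{d}_v]
  &= d_v - \mathbb{E}[\delta_v^{-}] + \mathbb{E}[\delta_v^{+}]
   = d_v(1-\epsilon_1) + d_v^{*}\epsilon_2 \\
  &= 5 + (1000-5-1)\times 0.01 = 5 + 9.94 = 14.94 \;>\; 10 = 2d_v .
\end{align*}
```

Even a small edge-addition probability can therefore push the expected degree of a low-degree node in a large sparse graph beyond $2d_v$.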

Let

$$Z\triangleq\max_{u\in\mathcal{V}}Z_u, \tag{25}$$

and note that $Z_u$ and $Z$ are discrete random variables. While the binomial random variables and degrees in the expression for $Z$ are assumed to be i.i.d., the inherent nonlinearity and high dimensionality of the function, along with the complexity introduced by the maximization over all nodes, pose challenges for deriving an analytical expression for $\mathbb{E}[Z]$. Furthermore, the expectation of a maximum of random variables often lacks a simple closed form, and typically only bounds, rather than the exact value, can be derived. On the other hand, Monte Carlo simulations provide an efficient alternative for estimating $\mathbb{E}[Z]$, which is given as

$$\mu_Z\triangleq\mathbb{E}[Z]\approx\frac{1}{N_{\textrm{samp}}}\sum_{i=1}^{N_{\textrm{samp}}}Z_{(i)}=\hat{\mu}_Z, \tag{26}$$

where $Z_{(i)}$ represents the outcome from the $i$-th Monte Carlo trial. Thus, for the normalized GSO, we have the following proposition as the counterpart of Theorem 1.

Proposition 1.

In the context of the probabilistic error model (8), let the GSO be the normalized adjacency matrix $\mathbf{S}=\mathbf{A}_{\textrm{n}}$, and let the perturbed GSO be $\hat{\mathbf{S}}=\hat{\mathbf{A}}_{\textrm{n}}$. Then, an upper bound on the expectation of the GSO distance is given by

$$\mathbb{E}\left[d(\hat{\mathbf{S}},\mathbf{S})\right]\leq\mathbb{E}[Z], \tag{27}$$

where $\mathbb{E}[Z]$ is computed using (26), (25), and Lemma 1.

The upper bound provided in Proposition 1 focuses specifically on normalized adjacency matrices. This result complements the analysis for the unnormalized case. We note that the bound for the normalized GSO is not an approximation or an empirical estimate; it is a theoretical upper bound. The only difference between the bound in Proposition 1 and the bound in Theorem 1 is how it is computed: for the bound in Theorem 1 (unnormalized case), $\mathbb{E}[Y]$ has a closed-form expression, whereas for the bound in Proposition 1 (normalized case), $\mathbb{E}[Z]$ is evaluated using Monte Carlo simulations.
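The Monte Carlo estimate in (26) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an undirected simple graph in which each existing edge is deleted independently with probability `eps1` and each absent edge is added independently with probability `eps2` (one reading of the error model), it takes the GSO distance to be the induced $\ell_1$ norm (maximum absolute column sum), and it lets isolated nodes contribute zero. The function names `perturb`, `sample_Z`, and `monte_carlo_bound` are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    """Return D^{-1/2} A D^{-1/2}; rows/columns of isolated nodes stay zero."""
    d = A.sum(axis=1)
    inv_sqrt = np.zeros(d.shape, dtype=float)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def perturb(A, eps1, eps2, rng):
    """Assumed error model: delete edges w.p. eps1, add non-edges w.p. eps2."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)          # one draw per unordered node pair
    a = A[iu]
    r = rng.random(a.shape[0])
    kept_or_added = np.where(a == 1, r >= eps1, r < eps2).astype(float)
    A_hat = np.zeros((n, n))
    A_hat[iu] = kept_or_added
    return A_hat + A_hat.T


def sample_Z(A, A_hat):
    """One realization of Z = max_u (Z_{u,1} + Z_{u,2}) from Lemma 1 and (25)."""
    d, d_hat = A.sum(axis=1), A_hat.sum(axis=1)
    z_max = 0.0
    for u in range(A.shape[0]):
        nbrs = np.flatnonzero(A[u])          # original neighbors of u
        nbrs_hat = np.flatnonzero(A_hat[u])  # added or remaining neighbors
        z1 = np.sqrt(d[u] / d[nbrs].min()) if nbrs.size else 0.0
        z2 = np.sum(1.0 / np.sqrt(d_hat[u] * d_hat[nbrs_hat])) if nbrs_hat.size else 0.0
        z_max = max(z_max, z1 + z2)
    return z_max


def monte_carlo_bound(A, eps1, eps2, n_samp=100, seed=0):
    """Return per-trial samples of Z and of the l1 GSO distance."""
    rng = np.random.default_rng(seed)
    A_n = normalized_adjacency(A)
    Z = np.empty(n_samp)
    err = np.empty(n_samp)
    for i in range(n_samp):
        A_hat = perturb(A, eps1, eps2, rng)
        Z[i] = sample_Z(A, A_hat)
        err[i] = np.linalg.norm(normalized_adjacency(A_hat) - A_n, 1)
    return Z, err
```

Averaging the returned `Z` samples gives $\hat{\mu}_Z$ in (26), and averaging `err` gives an empirical estimate of the left-hand side of (27) for comparison.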

V GCNN Sensitivity

V-A Graph Filter Sensitivity Analysis

The sensitivity of graph filters is a critical aspect that follows logically from the preceding discussion on the expected bounds of GSO errors. Having extensively delved into the properties of GSO perturbations, we now turn our attention to the graph filters. Graph filters, being polynomials of GSOs, inherit the perturbations in the graph structure, manifesting as variations in filter responses.

The sensitivity of a graph filter to perturbations in the GSO is captured by the theorem below, which establishes a bound on the error in the graph filter response due to perturbations in the GSO and the filter coefficients.

Theorem 2 (Graph filter sensitivity).

Let $\mathbf{S}$ and $\hat{\mathbf{S}}$ be the GSOs for the true graph $\mathcal{G}$ and the perturbed graph $\hat{\mathcal{G}}$, respectively. The distance between the polynomial graph filters $\mathbf{h}(\mathbf{S})=\sum_{k=0}^{K}h_k\mathbf{S}^k$ and $\mathbf{h}(\hat{\mathbf{S}})=\sum_{k=0}^{K}h_k\hat{\mathbf{S}}^k$ is defined as

$$d\big(\mathbf{h}(\hat{\mathbf{S}}),\mathbf{h}(\mathbf{S})\big)=\|\mathbf{h}(\hat{\mathbf{S}})-\mathbf{h}(\mathbf{S})\|. \tag{28}$$

The expectation of filter distance (28) is bounded as

$$\mathbb{E}\left[d\big(\mathbf{h}(\hat{\mathbf{S}}),\mathbf{h}(\mathbf{S})\big)\right]\leq\sum_{k=1}^{K}k|h_k|\left(\lambda_k\mathbb{E}[\|\mathbf{E}\|]+\zeta_k\right), \tag{29}$$

where $\lambda_k\triangleq\mathbb{E}[\lambda^{k-1}]$, $\zeta_k\triangleq\mathrm{Cov}[\|\mathbf{E}\|,\lambda^{k-1}]$, and $\lambda=\max\{\|\hat{\mathbf{S}}\|,\|\mathbf{S}\|\}$ denotes the larger of the maximum singular values of the two GSOs.

Proof.

See Appendix B. ∎

Theorem 2 reveals that the expected graph filter distance is linearly bounded by the expected GSO distance, $\mathbb{E}[\|\mathbf{E}\|]$, if the sufficient condition $\lambda=\|\mathbf{S}\|$ is met. This bound is influenced by the filter degree $K$, the maximum singular value $\lambda$ of the GSOs, and the filter coefficients $\{h_k\}_{k=1}^{K}$. The theorem indicates that higher-order graph filters are likely to exhibit greater instability. In Section VI-B, we present a supporting experiment, specifically for low-pass graph filters with the unnormalized GSO, $\mathbf{S}=\mathbf{A}$.
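To make the structure of (29) concrete, the sketch below checks a per-realization version of the bound. It relies on the standard telescoping inequality $\|\hat{\mathbf{S}}^k-\mathbf{S}^k\|\leq k\lambda^{k-1}\|\mathbf{E}\|$; taking expectations and using $\mathbb{E}[XY]=\mathbb{E}[X]\mathbb{E}[Y]+\mathrm{Cov}[X,Y]$ then yields terms of the form $\lambda_k\mathbb{E}[\|\mathbf{E}\|]+\zeta_k$. This is one plausible reading of how (29) arises (the authors' proof is in Appendix B), and the use of the spectral norm as well as the function names are assumptions made for illustration.

```python
import numpy as np


def poly_filter(S, h):
    """Evaluate h(S) = sum_{k=0}^{K} h[k] * S^k."""
    n = S.shape[0]
    out = np.zeros((n, n))
    power = np.eye(n)                 # S^0
    for k, hk in enumerate(h):
        if k > 0:
            power = power @ S         # S^k
        out += hk * power
    return out


def filter_distance_and_bound(S, S_hat, h):
    """Return (||h(S_hat) - h(S)||, sum_k k |h_k| lam^(k-1) ||E||) in spectral norm."""
    dist = np.linalg.norm(poly_filter(S_hat, h) - poly_filter(S, h), 2)
    err = np.linalg.norm(S_hat - S, 2)                       # ||E||
    lam = max(np.linalg.norm(S_hat, 2), np.linalg.norm(S, 2))
    bound = sum(k * abs(hk) * lam ** (k - 1) * err
                for k, hk in enumerate(h) if k >= 1)
    return dist, bound
```

Because the factor $k\lambda^{k-1}$ grows with $k$, the per-realization bound loosens quickly for higher-order filters when $\lambda>1$, which matches the remark above on filter order.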

V-B GCNN Sensitivity Analysis

Based on the sensitivity analysis of graph filters, we extend this study to the sensitivity analysis of a general GCNN. Instead of meticulously quantifying the specifics of each perturbed graph, we propose a probabilistic bound that captures the potential magnitude of graph perturbations and offers a more insightful assessment of the system’s sensitivity to them. We present the following theorem to exemplify this approach, encapsulating the sensitivity of a general GCNN to GSO perturbations.

Theorem 3 (GCNN Sensitivity).

For a general GCNN under the probabilistic error model (8), the expected difference of outputs at the final layer $L$ is bounded as

$$\mathbb{E}\left[\left\|\hat{\mathbf{X}}_L-\mathbf{X}_L\right\|\right]\leq C_{\sigma_L}B_L\mathbb{E}\left[\|\mathbf{E}\|\right]+C_{\sigma_L}D_L, \tag{30}$$

where $C_{\sigma_\ell}$ represents the Lipschitz constant of the nonlinear activation function used at layer $\ell$, for $\ell=1,\ldots,L$, and $B_\ell$ and $D_\ell$ are defined for $\ell=1$ and then for $\ell=2,\ldots,L$ as follows

$$\begin{split}&B_1=\sum_{k=1}^{K}k\lambda_k\|\mathbf{X}_0\|\|\mathbf{H}_{1k}\|,\quad D_1=\sum_{k=1}^{K}k\zeta_k\|\mathbf{X}_0\|\|\mathbf{H}_{1k}\|,\\&B_\ell=\sum_{k=1}^{K}\left(\lambda_{k+1}C_{\sigma_{\ell-1}}B_{\ell-1}+k\lambda_k\|\mathbf{X}_{\ell-1}\|\right)\|\mathbf{H}_{\ell k}\|,\\&D_\ell=\sum_{k=1}^{K}\left(\mu_{k,\ell-1}+\lambda_k C_{\sigma_{\ell-1}}D_{\ell-1}+k\zeta_k\|\mathbf{X}_{\ell-1}\|\right)\|\mathbf{H}_{\ell k}\|,\end{split}\tag{31}$$

with constant $\mu_{k,\ell-1}\triangleq\sqrt{\mathrm{Var}[\|\hat{\mathbf{X}}_{\ell-1}-\mathbf{X}_{\ell-1}\|]\,\mathrm{Var}[\lambda^{k}]}$, and $\lambda_k$ and $\zeta_k$ as defined in Theorem 2, for $k=1,\ldots,K$.

Proof.

See Appendix C. ∎

In Theorem 3, we use recursive bounds containing inter-layer features to simplify the formulation. Note that these inter-layer features $\{\mathbf{X}_{\ell-1},\hat{\mathbf{X}}_{\ell-1}\}_{\ell=2}^{L}$ can be explicitly computed from the initial input feature $\mathbf{X}_0$, the original and perturbed GSOs $(\mathbf{S},\hat{\mathbf{S}})$, and the GCNN parameters (the number of layers $L$ and graph shifts $K$, the network’s learned weights $\{\mathbf{H}_{\ell k}\}$, and the activation functions $\sigma(\cdot)$). The derivation process employs induction. For the first layer $\ell=1$, we have $\mathbf{X}_1=\sigma_1(\sum_{k=1}^{K}\mathbf{S}^k\mathbf{X}_0\mathbf{H}_{1k})$ and $\hat{\mathbf{X}}_1=\sigma_1(\sum_{k=1}^{K}\hat{\mathbf{S}}^k\mathbf{X}_0\mathbf{H}_{1k})$; for the second layer $\ell=2$, the features are $\mathbf{X}_2=\sigma_2(\sum_{k=1}^{K}\mathbf{S}^k\mathbf{X}_1\mathbf{H}_{2k})$ and $\hat{\mathbf{X}}_2=\sigma_2(\sum_{k=1}^{K}\hat{\mathbf{S}}^k\hat{\mathbf{X}}_1\mathbf{H}_{2k})$; by induction, for the $(\ell-1)$-th layer, we have

$$\begin{split}\mathbf{X}_{\ell-1}&=\sigma_{\ell-1}\left(\sum_{k=1}^{K}\mathbf{S}^k\mathbf{X}_{\ell-2}\mathbf{H}_{\ell-1,k}\right),\\\hat{\mathbf{X}}_{\ell-1}&=\sigma_{\ell-1}\left(\sum_{k=1}^{K}\hat{\mathbf{S}}^k\hat{\mathbf{X}}_{\ell-2}\mathbf{H}_{\ell-1,k}\right).\end{split}\tag{32}$$
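The layer-wise recursion above can be sketched as a short forward pass that propagates the same input through the clean and perturbed GSOs and records the output difference at each layer. This is a minimal sketch under simplifying assumptions: a single ReLU activation is used at every layer (the analysis allows a different $\sigma_\ell$ per layer), the spectral norm is used for the differences, and the names `gcnn_forward` and `layerwise_differences` are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def gcnn_forward(S, X0, weights, activation=relu):
    """Compute [X_1, ..., X_L] with X_l = sigma(sum_{k=1}^K S^k X_{l-1} H_{lk}).

    weights[l][k-1] holds the matrix H_{l+1,k} (0-based layer index l).
    """
    feats, X = [], X0
    for H_layer in weights:
        pre = np.zeros((X.shape[0], H_layer[0].shape[1]))
        shifted = X
        for H in H_layer:              # k = 1, ..., K
            shifted = S @ shifted      # S^k X_{l-1}
            pre += shifted @ H
        X = activation(pre)
        feats.append(X)
    return feats


def layerwise_differences(S, S_hat, X0, weights):
    """Return ||X_hat_l - X_l|| for l = 1, ..., L."""
    clean = gcnn_forward(S, X0, weights)
    perturbed = gcnn_forward(S_hat, X0, weights)
    return [np.linalg.norm(Xh - X, 2) for X, Xh in zip(clean, perturbed)]
```

Averaging the output of `layerwise_differences` over many sampled perturbed GSOs gives an empirical counterpart of the left-hand side of (30) at each layer.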

Theorem 3 forms the bedrock of our analysis, quantifying how GCNNs respond to graph perturbations through a linear relationship at each layer. The sensitivity of a multilayer GCNN to perturbations can be represented by a recursion of linearity. For a multilayer GCNN, the expected output difference is controlled by: (i) the input feature, (ii) the GSO and the error model parameters, (iii) the Lipschitz constants of the activation functions, and (iv) the GCNN weights. We note that choosing activation functions with more conservative Lipschitz constants can possibly improve the stability of GCNNs by imposing more constraints on the recursion. However, this may suppress the performance of a neural network, as noted in [37]. Our sensitivity analysis framework is generic, allowing for simplifications such as assuming a unit Lipschitz constant and normalized input features, as suggested in [22]. However, these simplifications do not indicate that the GCNN sensitivity is unaffected by the Lipschitz constant or input features. This layered analysis also enables an understanding of how perturbations propagate through GCNN layers, impacting the overall performance. Additionally, Theorem 3 does not restrict the scale of graph perturbations, which is a typical restriction in the existing literature.

Within the evasion attack context, where the focus is on learned representations, we demonstrate the following property: given that the GSO error is bounded as in Theorem 1 and Proposition 1, the linear bound at each layer of the GCNN (illustrated in Subsection VI-C1) implies that the network remains stable against perturbations as long as the graph error stays within the bound. In Subsection VI-C2, we show that a multilayer GCNN is stable by showing its finite responses to large-scale perturbations, even under notable declines in accuracy.

V-C Specifications for GCNN variants

Building upon the sensitivity analysis in Theorem 3, our discussion now turns to two specific GCNN variants: GIN [6] and SGCN [7, 8]. They apply different GSOs for feature propagation. In GIN, the GSO for each layer is chosen as a partially augmented unnormalized adjacency matrix; in SGCN, the GSO is chosen as a normalized augmented adjacency matrix. This choice is made to align with the discussions on tight GSO bounds in Section IV. By focusing on GIN and SGCN, we extend our theoretical understanding to practical, real-world applications.

V-C1 Specification for GIN

The GIN is designed to capture the node features and the graph structure simultaneously. The primary intuition behind GIN is to learn a function of the feature information from both the target node and its neighbors, which is related to the Weisfeiler-Lehman (WL) graph isomorphism test [38]. The chosen GSO for GIN is 𝐒=𝐀+(1+ε)𝐈𝐒𝐀1𝜀𝐈{\mathbf{S}}={\mathbf{A}}+(1+\varepsilon){\mathbf{I}}bold_S = bold_A + ( 1 + italic_ε ) bold_I, where the learnable parameter ε𝜀\varepsilonitalic_ε preserves the distinction between nodes in the graph that are connected differently, and prevents GIN from reducing to a WL isomorphism test.

Given the GSO above, only the first order term with $K=1$ in (1) is kept, and the intermediate output of such a graph filter is $\mathbf{y}=\mathbf{S}\mathbf{x}$. A node Multilayer Perceptron (MLP) $\mathbf{h}_{\boldsymbol{\Theta}}$ is then applied to the filter's output as $\mathbf{h}_{\boldsymbol{\Theta}}(\mathbf{y})$. Assuming the inner MLP has two layers in each GIN layer, a single-layer GIN ($L=1$) can be represented as

$$\mathbf{X}_{L}=\sigma_{L2}\left(\sigma_{L1}\left(\mathbf{S}\mathbf{X}_{L-1}\mathbf{W}_{L1}+\mathbf{B}_{L1}\right)\mathbf{W}_{L2}+\mathbf{B}_{L2}\right), \tag{33}$$

where $\left(\mathbf{W}_{L1},\mathbf{B}_{L1},\sigma_{L1}(\cdot)\right)$ are the weight matrix, bias matrix, and nonlinearity function in the first layer of the MLP, and $\left(\mathbf{W}_{L2},\mathbf{B}_{L2},\sigma_{L2}(\cdot)\right)$ are the weight matrix, bias matrix, and nonlinearity function in the second layer of the MLP. Then, we provide the following corollary.

Corollary 1 (The sensitivity of single-layer GIN).

For the single-layer GIN ($L=1$) in (33) under the probabilistic error model (8), the expected difference of outputs because of GSO perturbations is bounded as

$$\mathbb{E}\left[\|\hat{\mathbf{X}}_{L}-\mathbf{X}_{L}\|\right]\leq\xi\,\mathbb{E}\left[\|\mathbf{E}\|\right], \tag{34}$$

with constant

$$\xi=C_{\sigma_{L2}}C_{\sigma_{L1}}\|\mathbf{W}_{L2}\|\|\mathbf{W}_{L1}\|\|\mathbf{X}_{L-1}\|, \tag{35}$$

where $\mathbf{X}_{L-1}=\mathbf{X}_{0}$ is the input feature.

Proof.

See Appendix D. ∎

Corollary 1 shows a linear dependency between the output difference of a single-layer GIN and GSO perturbations. In GIN, node vector transformations by the MLP contribute significantly to the network's expressivity. Under evasion attacks, Corollary 1 makes the analysis of these transformed node representations straightforward.
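The bound in Corollary 1 can be checked numerically on a single realization. The following is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `perturb_edges` and `gin_layer`, the random graph, the fixed value `eps_gin`, and the row-broadcast biases are all hypothetical. Edges are assumed to be deleted with probability `eps1` and added with probability `eps2` independently. ReLU is used for both nonlinearities, so both Lipschitz constants equal one. To make the chain of inequalities hold exactly, the feature matrices are measured in the Frobenius norm and the operators in the spectral norm, which is one possible instantiation of the norm in (34) and (35).

```python
import numpy as np

def perturb_edges(A, eps1, eps2, rng):
    """Delete each existing edge w.p. eps1; add each missing edge w.p. eps2."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)      # one entry per undirected node pair
    a = A[iu]
    u = rng.random(a.shape[0])
    kept = (a == 1) & (u >= eps1)     # existing edge survives deletion
    added = (a == 0) & (u < eps2)     # missing edge is inserted
    A_hat = np.zeros_like(A, dtype=float)
    A_hat[iu] = (kept | added).astype(float)
    return A_hat + A_hat.T            # symmetric, zero diagonal

def gin_layer(S, X, W1, B1, W2, B2):
    """Single-layer GIN with a two-layer MLP, following the form of (33)."""
    relu = lambda Z: np.maximum(Z, 0.0)   # 1-Lipschitz nonlinearity
    return relu(relu(S @ X @ W1 + B1) @ W2 + B2)

rng = np.random.default_rng(0)
n, f_in, f_hid, f_out = 60, 8, 16, 4

# Hypothetical random undirected graph used only for illustration.
A = np.triu((rng.random((n, n)) < 0.1).astype(float), k=1)
A = A + A.T

eps_gin = 0.1                              # GIN parameter, fixed here (assumption)
I = np.eye(n)
S = A + (1.0 + eps_gin) * I                # GIN GSO
S_hat = perturb_edges(A, 0.2, 0.01, rng) + (1.0 + eps_gin) * I
E = S_hat - S                              # GSO error

X0 = rng.standard_normal((n, f_in))
W1 = rng.standard_normal((f_in, f_hid))
B1 = rng.standard_normal((1, f_hid))       # bias broadcast over nodes (assumption)
W2 = rng.standard_normal((f_hid, f_out))
B2 = rng.standard_normal((1, f_out))

out = gin_layer(S, X0, W1, B1, W2, B2)
out_hat = gin_layer(S_hat, X0, W1, B1, W2, B2)
diff = np.linalg.norm(out_hat - out, "fro")

# Constant of (35) with unit Lipschitz constants.
xi = np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2) * np.linalg.norm(X0, "fro")
bound = xi * np.linalg.norm(E, 2)
print(diff <= bound)
```

Because the inequality holds for every realization under these norm choices, averaging `diff` and the spectral norm of `E` over many draws of `perturb_edges` gives an empirical counterpart of (34).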

V-C2 Specification for SGCN

The SGCN is a streamlined model, developed to simplify a multilayered GCNN by using an affine approximation of the graph convolution filter and eliminating the intermediate layer activation functions. The GSO chosen for SGCN is $\mathbf{S}=\tilde{\mathbf{D}}^{-1/2}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-1/2}$, where $\tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}$ is the augmented adjacency matrix and $\tilde{\mathbf{D}}$ is the corresponding degree matrix of the augmented graph.

Given the normalized augmented GSO, the node degrees $d_{u}$, $u=1,\ldots,N$, are redefined based on the augmented GSO; specifically, they are incremented by $1$ compared to their values in the non-augmented version. This streamlined model simplifies the structure of a vanilla GCN [5] by retaining a single layer and the $K$th order GSO in (1), so the output of the filter is $\mathbf{y}=h_{K}\mathbf{S}^{K}\mathbf{x}$. Note that for an SGCN, the maximum number of layers is $L=1$. Consequently, the output of a single-layer SGCN using a linear logistic regression layer is represented as

$$\mathbf{X}_{L}=\sigma_{L}\left(\mathbf{S}^{K}\mathbf{X}\mathbf{H}_{K}\right), \tag{36}$$

and thus, we can easily give the following corollary.

Corollary 2 (The sensitivity of SGCN).

For the SGCN in (36) under the probabilistic error model (8), the expected difference of outputs because of GSO perturbations is bounded as

$$\mathbb{E}\left[\|\hat{\mathbf{X}}_{L}-\mathbf{X}_{L}\|\right]\leq C_{\sigma_{L}}B_{L}\,\mathbb{E}\left[\|\mathbf{E}\|\right]+C_{\sigma_{L}}D_{L}, \tag{37}$$

where $B_{L}=\lambda_{K}\|\mathbf{X}\|\|\mathbf{H}_{K}\|$, $D_{L}=K\zeta_{K}\|\mathbf{X}\|\|\mathbf{H}_{K}\|$, and $\lambda_{K}$ and $\zeta_{K}$ are defined in Theorem 3.

With Corollary 2, we conclude that the sensitivity analysis for SGCN is a specification for the general form of a multilayer GCNN.
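The SGCN construction above can be sketched as follows. This is an illustrative example under assumptions rather than the reference implementation: the function names, the ring graph, the single deleted edge, and the row-wise softmax used for $\sigma_{L}$ are hypothetical choices. The constants $\lambda_{K}$ and $\zeta_{K}$ from Theorem 3 are not computed here; the sketch only builds the normalized augmented GSO, evaluates (36), and reports empirical output differences next to the GSO distance.

```python
import numpy as np

def normalized_augmented_gso(A):
    """S = D~^{-1/2} (A + I) D~^{-1/2}, with D~ the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d_tilde = A_tilde.sum(axis=1)             # augmented degrees, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(d_tilde)
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def sgcn_forward(S, X, H, K):
    """Evaluate sigma(S^K X H) as in (36), with a row-wise softmax as sigma."""
    Z = X
    for _ in range(K):                        # apply the GSO K times
        Z = S @ Z
    logits = Z @ H
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

# Hypothetical ring graph with one deleted edge as the perturbation.
n, f_in, n_cls = 10, 4, 3
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A_hat = A.copy()
A_hat[0, 1] = A_hat[1, 0] = 0.0

S = normalized_augmented_gso(A)
S_hat = normalized_augmented_gso(A_hat)
gso_dist = np.linalg.norm(S_hat - S, 2)       # spectral norm of the GSO error

rng = np.random.default_rng(0)
X = rng.standard_normal((n, f_in))
H = rng.standard_normal((f_in, n_cls))

out_diff = {
    K: np.linalg.norm(sgcn_forward(S_hat, X, H, K) - sgcn_forward(S, X, H, K), "fro")
    for K in (1, 2, 3)
}
print(gso_dist, out_diff)
```

Note that deleting an edge changes the degrees of both endpoints, so the error in the normalized GSO is not confined to the deleted entries; this is the degree-change effect that the normalized bound has to account for.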

VI Numerical Experiments

Figure 2: Comparative analysis of our bound in Theorem 1, the deterministic bound in Theorem 2 of [31], and the empirical GSO distance in $\ell_{2}$ norm.
Figure 3: Theoretical (bound in Thm. 1) and empirical bounds ($\ell_{1}$ and $\ell_{2}$ norms) for the perturbed Cora graph with $\mathbf{S}=\mathbf{A}$. Left panel: varying $\epsilon_{1}$ with fixed $\epsilon_{2}=0$. Right panel: varying $\epsilon_{2}$ with fixed $\epsilon_{1}=0.5$.
Figure 4: Theoretical (bound in Prop. 1) and empirical bounds ($\ell_{1}$ and $\ell_{2}$ norms) for the perturbed Cora graph with $\mathbf{S}=\mathbf{A}_{\textrm{n}}$, under identical $(\epsilon_{1},\epsilon_{2})$ settings as Fig. 3.
Figure 5: Comparison of Theorem 2 bounds (solid lines) and empirical GF distances (scatter points) with fixed $\epsilon_{2}=0.05$ and varying $\epsilon_{1}$ in $[0.1,0.2,0.3]$.
Figure 6: Expected GF output differences under increasing GF order and perturbation budget, illustrating intensified sensitivity along increased GF order and the alignment of Theorem 1 bound with empirical GSO distance trends.

VI-A Theoretical GSO Bound Corroboration

VI-A1 Synthetic graph

We consider a two-group planted partition model (PPM), which is a special case of the stochastic block model. The in-group probability is set to $p_{\rm in}=0.8$ and the between-group probability to $p_{\rm bet}=0.5$. The GSO is set as the unnormalized adjacency matrix $\mathbf{S}=\mathbf{A}$. We perturb the PPM graph using the probabilistic error model (6) with two scales of perturbation budgets:

  • Small-scale perturbation (see Fig. 2, left panel): With $\epsilon_{1}=0.1$ and $\epsilon_{2}=0.01$, the graph is slightly altered, preserving its fundamental structure.

  • Large-scale perturbation (see Fig. 2, right panel): With $\epsilon_{1}=0.5$ and $\epsilon_{2}=0.1$, the graph undergoes significant structural changes.

We carry out 101 Monte Carlo trials for varying graph sizes (ranging from $50$ to $1000$, in $50$-node increments). These simulations evaluate the expected bound from Theorem 1 and the deterministic bound from [31, Theorem 2] in relation to graph size. Comparisons with empirical GSO distances (5), calculated using the $\ell_{2}$ norm, reveal that our expectation bound is consistently tighter than the deterministic counterpart from [31]. This difference arises due to the consideration of degree changes and the probabilistic nature of our bound, as opposed to the worst-case scenario focus of the deterministic bound. Another observation is the increased bound magnitude correlating with higher perturbation budgets, as depicted in Fig. 2. Both bounds remain valid, even in high perturbation scenarios, underscoring the robustness of our theoretical frameworks.
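The empirical side of this experiment can be sketched as below. The code is a simplified illustration under assumptions, not the script used for Fig. 2: the helper names are hypothetical, the two groups are assumed to be of equal size, a fresh PPM graph is drawn in every trial, edges are assumed to be deleted with probability `eps1` and added with probability `eps2` independently, and only 11 trials on a single graph size are run to keep the example fast. The theoretical bounds from Theorem 1 and [31] are not evaluated here.

```python
import numpy as np

def planted_partition(n, p_in, p_bet, rng):
    """Two-group PPM: edge probability p_in within a group, p_bet across groups."""
    labels = np.arange(n) % 2                       # equal-sized groups (assumption)
    same_group = labels[:, None] == labels[None, :]
    P = np.where(same_group, p_in, p_bet)
    A = np.triu((rng.random((n, n)) < P).astype(float), k=1)
    return A + A.T                                  # undirected, no self-loops

def perturb_edges(A, eps1, eps2, rng):
    """Delete each existing edge w.p. eps1; add each missing edge w.p. eps2."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)
    a = A[iu]
    u = rng.random(a.shape[0])
    kept = (a == 1) & (u >= eps1)
    added = (a == 0) & (u < eps2)
    A_hat = np.zeros_like(A, dtype=float)
    A_hat[iu] = (kept | added).astype(float)
    return A_hat + A_hat.T

def mean_gso_distance(n, eps1, eps2, trials, rng):
    """Monte Carlo estimate of E[||S_hat - S||_2] with S = A."""
    dists = []
    for _ in range(trials):
        A = planted_partition(n, 0.8, 0.5, rng)
        E = perturb_edges(A, eps1, eps2, rng) - A
        dists.append(np.linalg.norm(E, 2))          # spectral (l2) norm
    return float(np.mean(dists))

rng = np.random.default_rng(0)
small = mean_gso_distance(100, 0.1, 0.01, trials=11, rng=rng)   # small-scale budget
large = mean_gso_distance(100, 0.5, 0.1, trials=11, rng=rng)    # large-scale budget
print(small, large)
```

Sweeping `n` over the graph sizes used in the paper and raising `trials` to 101 would give the empirical curve against which the two bounds are compared.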

VI-A2 Real-life graph

We utilize the undirected Cora citation graph [39], which comprises $N=2708$ nodes and $C=7$ classes. Assuming the undirected nature of the underlying graph, we modify the original Cora graph from a directed to an undirected one. The undirected Cora graph has $|\mathcal{E}|=5278$ edges. We ascertain the evolution of our theoretical bounds against an increase in edge deletion probability $\epsilon_{1}$ and edge addition probability $\epsilon_{2}$. These alterations are systematically tracked using the $\ell_{1}$ and $\ell_{2}$ norms of the discrepancy between the original and perturbed graphs.

The range of $\epsilon_{1}$ and $\epsilon_{2}$ is set within $[3\times 10^{-2},3\times 10^{-1}]$, increasing in steps of $3\times 10^{-2}$. In each step, we compute the $\ell_{1}$ and $\ell_{2}$ norms of the difference between the original and perturbed adjacency matrices. We then compare these empirical results with the theoretical bounds provided in Theorem 1 and Proposition 1. In Fig. 3, with the GSO as the unnormalized adjacency matrix $\mathbf{S}=\mathbf{A}$, two distinct scenarios are presented: varying $\epsilon_{1}$ with $\epsilon_{2}=0$ (left panel), and varying $\epsilon_{2}$ with $\epsilon_{1}=0.5$ (right panel). Over 101 Monte Carlo trials, the theoretical bound closely aligns with the empirical $\ell_{1}$ norm, particularly in scenarios where increased $\epsilon_{2}$ leads to denser graphs. This trend suggests that the precision of the bounds improves as graph densities shift from sparse to dense.

In Fig. 4, employing the normalized adjacency matrix $\mathbf{S}=\mathbf{A}_{\textrm{n}}$ as the GSO, a similar analysis is conducted. In the left panel, an increase in $\ell_{1}$ and $\ell_{2}$ norm bounds is observed under rising error, and Proposition 1 gives a stable upper bound. However, the accuracy of the bound is comparatively less satisfactory in the normalized case. The right-hand case illustrates a stable empirical $\ell_{2}$ norm with an increasing number of edges, while the $\ell_{1}$ norm and our bound present slight increases and decreases, respectively. These observations can be attributed to the following factors: (i) the normalization operation keeps the adjacency matrix operator norm around 1; (ii) an increased number of edges raises the $\ell_{1}$ norm; (iii) increases in the denominator in Lemma 1 result in a general decrease in the bound.

VI-B GF Sensitivity Test

In this experiment, we evaluate the sensitivity of GF to the probabilistic error model. We employ an ER graph with $N=100$ nodes and a connection probability of $0.1$ as the baseline graph. The GSO is set as the unnormalized adjacency matrix $\mathbf{S}=\mathbf{A}$. Our focus is on the relationship between filter distance and the bound in Theorem 2 for low pass GFs of orders $K=1,2,3$. The findings are presented in Figs. 5 and 6.

In Fig. 5, the edge addition probability is fixed as $\epsilon_{2}=0.05$ and the edge deletion probability $\epsilon_{1}$ varies among $[0.1,0.2,0.3]$. Over $101$ Monte Carlo trials, we plot the empirical GF distances $d(\mathbf{h}(\hat{\mathbf{S}}),\mathbf{h}(\mathbf{S}))$ alongside the corresponding GSO distances $d(\hat{\mathbf{S}},\mathbf{S})=\|\mathbf{E}\|$ as scatter plots. These empirical GF distances scale linearly with the bounds in Theorem 2, depicted as solid lines. It is noted that the tightness of these bounds decreases with an increase in the GF order. The primary aim of this analysis is to confirm the linear relationship in Theorem 2.

In Fig. 6, the expected output differences of GFs $\mathbb{E}[d(\mathbf{h}(\hat{\mathbf{S}}),\mathbf{h}(\mathbf{S}))]$ with orders $K=1,2,3$ are plotted against the expected GSO differences $\mathbb{E}[d(\hat{\mathbf{S}},\mathbf{S})]$ and the bound in Theorem 1. Over $101$ Monte Carlo trials with perturbation probabilities $\epsilon_{1}\in[0,0.3]$ and $\epsilon_{2}\in[0,0.05]$, the left panel shows that output differences increase with the GF order. The right panel confirms that the bound $\mathbb{E}[Y]$ captures trends similar to the empirical expectation of GSO distance, corroborating Theorem 1. This suggests that for small, sparsely connected graphs, the sensitivity of a low pass GF to perturbations intensifies as its order increases.
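The scatter data in this test can be generated along the following lines. This is a minimal sketch under assumptions: the filter is taken to be the polynomial $\mathbf{h}(\mathbf{S})=\sum_{k=0}^{K}h_{k}\mathbf{S}^{k}$, which is our reading of (1), the coefficients $h_{k}=0.5^{k}$ are hypothetical placeholders rather than the low pass coefficients used in the paper, distances are spectral norms, and only 5 trials per setting are run. The Theorem 2 bound itself is not evaluated.

```python
import numpy as np

def perturb_edges(A, eps1, eps2, rng):
    """Delete each existing edge w.p. eps1; add each missing edge w.p. eps2."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)
    a = A[iu]
    u = rng.random(a.shape[0])
    kept = (a == 1) & (u >= eps1)
    added = (a == 0) & (u < eps2)
    A_hat = np.zeros_like(A, dtype=float)
    A_hat[iu] = (kept | added).astype(float)
    return A_hat + A_hat.T

def graph_filter(S, h):
    """Polynomial graph filter h(S) = sum_k h[k] * S^k (assumed form of (1))."""
    n = S.shape[0]
    H = np.zeros((n, n))
    S_pow = np.eye(n)                 # S^0
    for hk in h:
        H += hk * S_pow
        S_pow = S_pow @ S             # next power of the GSO
    return H

rng = np.random.default_rng(0)
n = 100
# ER baseline graph with connection probability 0.1; the GSO is S = A.
A = np.triu((rng.random((n, n)) < 0.1).astype(float), k=1)
A = A + A.T

# Hypothetical coefficients for filter orders K = 1, 2, 3.
coeffs = {K: [0.5 ** k for k in range(K + 1)] for K in (1, 2, 3)}
results = {K: [] for K in coeffs}     # (eps1, GSO distance, GF distance) triples

for eps1 in (0.1, 0.2, 0.3):
    for _ in range(5):
        A_hat = perturb_edges(A, eps1, 0.05, rng)
        gso_dist = np.linalg.norm(A_hat - A, 2)
        for K, h in coeffs.items():
            gf_dist = np.linalg.norm(graph_filter(A_hat, h) - graph_filter(A, h), 2)
            results[K].append((eps1, gso_dist, gf_dist))

print(results[1][0])
```

For $K=1$ the filter difference is exactly $h_{1}\mathbf{E}$, so the GF distance is $|h_{1}|$ times the GSO distance; for higher orders the cross terms between $\mathbf{S}$ and $\mathbf{E}$ appear, which is where the order-dependent looseness of a linear bound comes from.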

VI-C GCNN Sensitivity Test

Figure 7: Correlation between GIN (left panel) and SGCN (right panel) output differences and GSO distances. Analysis is conducted with varying edge deletion probabilities $\epsilon_{1}$, and a fixed edge addition probability $\epsilon_{2}=1\times 10^{-4}$.

VI-C1 Linearity corroboration

The experimental validation of Theorem 3 is conducted using GIN (Corollary 1) and SGCN (Corollary 2). We note that Corollary 1 is only applicable for the single-layer GIN ($L=1$). For the multi-layer GIN, our experiments show the recursion of linearity indicated in Theorem 3 empirically (see left panel of Fig. 7). These experiments are carried out on the Cora citation dataset, as discussed in Section VI-A, to assess the sensitivity of GIN and SGCN to perturbed GSOs under evasion attacks.

In Fig. 7, for GIN (left panel), each layer comprises $16$ hidden features. GIN variants with $1$, $2$, and $3$ layers differ only in the number of cascaded graph filters with MLPs. We investigate the correlation between empirical GIN output differences and GSO distances. The edge deletion probability, $\epsilon_{1}$, is varied within $[5\times 10^{-2},3\times 10^{-1}]$ in increments of $5\times 10^{-2}$, while the edge addition probability is fixed as $\epsilon_{2}=1\times 10^{-4}$. The results, categorized by edge deletion probability $\epsilon_{1}$, are obtained from 101 Monte Carlo trials, computing pairs of bounds and GIN output differences. For SGCN (right panel), we examine networks of orders $K=[1,2,3]$ using a similar approach. Empirical observations for $L=1,2,3$ and $K=1,2,3$ in GIN and SGCN demonstrate a linear correlation between output differences and GSO distances, corroborating the theoretical frameworks in Corollary 1 and Corollary 2.

Notably, the output differences observed in the two cases operate on different scales. For the SGCN with normalized GSO (right panel), the variation in output differences with increasing perturbation probability is more gradual compared to the unnormalized GSO used in GIN (left panel), which shows a steeper change. This discrepancy is likely due to the influence of the estimated GSO spectral norm $\lambda$.

VI-C2 Accuracy drop under perturbation

Figure 8: Accuracy changes for (a) GIN ($L=1,2,3$) and (b) SGCN ($K=1,2,3$) under perturbations, with $\epsilon_{1}\in[0,0.5]$ and $\epsilon_{2}\in[0,1\times 10^{-3}]$. The 1st, 2nd and 3rd rows correspond to Cora, CiteSeer, and PubMed datasets, respectively.

After affirming the linear sensitivity in Theorem 3, we also examine the stability of GCNNs under significant graph perturbations by observing the accuracy changes of the same GCNN candidates as in Section VI-C1.

These experiments are conducted on three citation datasets: Cora, CiteSeer and PubMed [39]. The objective is to assess the impact of different perturbation budgets on the accuracy of GIN and SGCN models. The perturbation budget parameters are set as follows: edge deletion probability $\epsilon_{1}$ varies within $[0,0.5]$ in increments of $0.1$, and edge addition probability $\epsilon_{2}$ varies within $[0,1\times 10^{-3}]$ in increments of $2\times 10^{-4}$. Consistent with the experimental settings in Section VI-C1, the same GCNN candidates are utilized. The averaged accuracy results are shown in Fig. 8, where the bar indicates the standard variance of accuracy results. The first, second and third rows correspond to datasets Cora, CiteSeer and PubMed, respectively.

A consistent pattern of accuracy decrease across all datasets and GCNN models is observed in Fig. 8, where the accuracy gradually decreases with increasing perturbation budgets. Notably, larger graphs (e.g., PubMed) exhibit a faster accuracy drop compared to smaller graphs (e.g., Cora and CiteSeer). This can be attributed to the alteration of more edges under the same perturbation budget in larger graphs. When fixing the edge deletion probability $\epsilon_{1}$, accuracy drops by approximately $10\%$ (as in Fig. 8a, 1st row with $L=1$), and up to $20\%$ (as in Fig. 8a, 3rd row with $L=3$). With a fixed edge addition probability $\epsilon_{2}$, the accuracy drop is around $10\%$ (as in Fig. 8a, 1st row with $L=1$), and approximately $5\%$ (as in Fig. 8a, 3rd row with $L=1$). This is likely because, for sparse graphs, the same edge addition probability results in the addition of more edges than the number influenced by the same edge deletion probability.

The maximum values of the edge perturbation budgets $\epsilon_{1}$ and $\epsilon_{2}$ are set to $0.5$ and $1\times 10^{-3}$, respectively. Consequently, up to $50\%$ of the edges are deleted, and $70\%$ are added relative to the original edge count. In this case, the graph structure is significantly perturbed, and the accuracy drops by up to $20\%$. Under such large perturbations, the GCNN still gives finite responses. Thus, the GCNN is stable in our context even when the downstream task performance is significantly impacted by large-scale edge perturbations. This is also consistent with Theorem 3, which states that as long as the GSO perturbation is bounded/finite, the GCNN output difference is also bounded/finite.
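The quoted edge fractions can be reproduced with a short calculation. The sketch below uses the Cora figures from Section VI-A2 ($N=2708$, $|\mathcal{E}|=5278$) and assumes that every existing edge is deleted independently with probability $\epsilon_{1}$ and every unconnected node pair receives an edge independently with probability $\epsilon_{2}$; the variable names are illustrative.

```python
# Expected numbers of deleted and added edges on Cora at the maximum budgets.
n_nodes = 2708
n_edges = 5278
eps1_max, eps2_max = 0.5, 1e-3

n_pairs = n_nodes * (n_nodes - 1) // 2      # all unordered node pairs
n_non_edges = n_pairs - n_edges             # pairs without an edge

expected_deleted = eps1_max * n_edges       # each edge removed w.p. eps1
expected_added = eps2_max * n_non_edges     # each non-edge inserted w.p. eps2

deleted_ratio = expected_deleted / n_edges
added_ratio = expected_added / n_edges
print(n_non_edges, expected_deleted, expected_added)
print(round(deleted_ratio, 3), round(added_ratio, 3))
```

Under these assumptions the expected number of added edges is about 3660, roughly $0.69$ of the original edge count, in line with the $70\%$ figure. It also shows why a very small $\epsilon_{2}$ can add more edges than a large $\epsilon_{1}$ removes on a sparse graph: the number of non-edges is several orders of magnitude larger than the number of edges.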

VII Conclusion and Discussion

This paper has presented an analytical framework for investigating the sensitivity of GCNNs to GSO perturbations, employing a probabilistic graph perturbation model. We have established tighter error bounds than those previously available. We have theoretically demonstrated that the expected output variation for a single layer of GCNN is linearly bounded by the GSO error, ensuring the stability (bounded output difference) of single-layer GCNN under bounded GSO errors. For multilayer GCNN, our analysis has shown that the dependency of GCNN output difference on GSO error can be described through a recursion of linearity. Specifically, this dependency is explicitly controlled by: the input feature, the GSO, error model parameters, Lipschitz constants of activation functions in GCNN, and GCNN weights. Through numerical experiments, we have validated our theoretical findings and confirmed that GCNNs (exemplified with GIN and SGCN) maintain stability under large-scale graph edge perturbations, despite significant performance reductions.

In this work, our primary focus is on edge perturbations in graphs, while potential modifications to the graph signal and node injections are not considered. Any alterations to the graph signal could be subsumed within the spectral norm when performing sensitivity analysis. However, node injection presents a challenge that cannot be addressed using the current definition of graph distance. This is due to the discrepancy in sizes between the unperturbed and perturbed graphs as the number of nodes increases. A potential solution to this issue could involve redefining the GSO distance using a different metric. In this context, Optimal Transport (OT) and its variants emerge as viable candidates for this task [40, 41, 42]. These methods allow for the augmentation of a smaller graph, facilitating the establishment of a meaningful graph distance metric [43]. Consequently, future research could explore an encompassing approach that considers all of the aforementioned types of graph perturbations. Such an investigation has the potential to yield more comprehensive insights into the stability of GCNNs under perturbations.

Graph regularization methods are commonly used to achieve robust graph learning and estimation [44]. Research on adversarial training of GCNNs typically uses specifically designed loss functions to strengthen GCNNs against structural and feature perturbations, thus improving their performance stability against certain graph disturbances [45, 46, 47, 48, 49]. In graph learning, several techniques have been developed to regulate graphs and signals based on specific graph signal assumptions to perform graph estimation [15, 16, 50, 51]. With the inclusion of effective graph regularization, our sensitivity analysis offers insight that can contribute to the development of a uniform metric, paving the way for a more transferable and robust GCNN.

Appendix A Upper Bound of $\|\mathbf{E}_{u}\|_{1}$

Proof of Lemma 1.

We start with the first term in (23), which, since $\tau_{u}\leq d_{v}$, is bounded as

$$\sum_{v\in\mathcal{D}_{u}}\dfrac{1}{\sqrt{d_{u}d_{v}}}\leq\sum_{v\in\mathcal{D}_{u}}\dfrac{1}{\sqrt{d_{u}\tau_{u}}}=\dfrac{\delta_{u}^{-}}{\sqrt{d_{u}\tau_{u}}}. \tag{38}$$

The second and third terms in (23) can be bounded using the triangle inequality as follows

$$\begin{aligned}
&\sum_{v\in\mathcal{A}_u}\frac{1}{\sqrt{\hat{d}_u\hat{d}_v}}+\sum_{v\in\mathcal{R}_u}\left|\frac{1}{\sqrt{d_u d_v}}-\frac{1}{\sqrt{\hat{d}_u\hat{d}_v}}\right|\\
&\leq\sum_{v\in\mathcal{A}_u}\frac{1}{\sqrt{\hat{d}_u\hat{d}_v}}+\sum_{v\in\mathcal{R}_u}\left(\frac{1}{\sqrt{d_u d_v}}+\frac{1}{\sqrt{\hat{d}_u\hat{d}_v}}\right)\\
&=\sum_{v\in\mathcal{R}_u}\frac{1}{\sqrt{d_u d_v}}+\sum_{v\in\mathcal{A}_u\cup\mathcal{R}_u}\frac{1}{\sqrt{\hat{d}_u\hat{d}_v}}.
\end{aligned}\tag{39}$$

For the first term in (39), we have

$$\sum_{v\in\mathcal{R}_u}\frac{1}{\sqrt{d_u d_v}}\leq\sum_{v\in\mathcal{R}_u}\frac{1}{\sqrt{d_u\tau_u}}\leq\frac{d_u-\delta_u^{-}}{\sqrt{d_u\tau_u}}. \tag{40}$$

For the second term in (39), we have

$$\sum_{v\in\mathcal{A}_u\cup\mathcal{R}_u}\frac{1}{\sqrt{\hat{d}_u\hat{d}_v}}=\sum_{v\in\mathcal{A}_u\cup\mathcal{R}_u}\frac{1}{\sqrt{(d_u+\delta_u^{+}-\delta_u^{-})(d_v+\delta_v^{+}-\delta_v^{-})}}. \tag{41}$$

Combining (38), (40), and (41), we obtain a new bound that is better suited to our error model, namely

$$\begin{split}
\|\mathbf{E}_u\|_1&\leq\frac{\delta_u^{-}}{\sqrt{d_u\tau_u}}+\frac{d_u-\delta_u^{-}}{\sqrt{d_u\tau_u}}\\
&\quad+\sum_{v\in\mathcal{A}_u\cup\mathcal{R}_u}\frac{1}{\sqrt{(d_u+\delta_u^{+}-\delta_u^{-})(d_v+\delta_v^{+}-\delta_v^{-})}}\\
&=\sqrt{d_u/\tau_u}+\sum_{v\in\mathcal{A}_u\cup\mathcal{R}_u}\frac{1}{\sqrt{(d_u+\delta_u^{+}-\delta_u^{-})(d_v+\delta_v^{+}-\delta_v^{-})}}.
\end{split}\tag{42}$$

We will adapt the general bound (42) to the probabilistic error model presented in (8). In (42), we let

$$\begin{split}
&Z_{u,1}=\sqrt{d_u/\tau_u},\\
&Z_{u,2}=\sum_{v\in\mathcal{A}_u\cup\mathcal{R}_u}\frac{1}{\sqrt{(d_u+\delta_u^{+}-\delta_u^{-})(d_v+\delta_v^{+}-\delta_v^{-})}},
\end{split}\tag{43}$$

where $\delta_u^{-}\sim\mathrm{Bin}(d_u,\epsilon_1)$, $\delta_u^{+}\sim\mathrm{Bin}(d_u^{*},\epsilon_2)$, $\delta_v^{-}\sim\mathrm{Bin}(d_v,\epsilon_1)$, $\delta_v^{+}\sim\mathrm{Bin}(d_v^{*},\epsilon_2)$, $d_u^{*}=N-d_u-1$ and $d_v^{*}=N-d_v-1$. Finally, we obtain

$$\|\mathbf{E}_u\|_1\leq Z_{u,1}+Z_{u,2}. \tag{44}$$

This completes the proof. ∎
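The row-wise bound (44) can be checked numerically on a single random perturbation. The following is a minimal sketch under stated assumptions, not the authors' experimental code: the GSO is taken to be the normalized adjacency $\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$, $\tau_u$ is assumed to be the smallest degree among the neighbors of $u$ in the clean graph, and the helper names (`normalized_adjacency`, `random_graph`, `perturb`, `row_error_and_bound`) are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D^{-1/2} A D^{-1/2} and the degree vector (isolated nodes give zero rows)."""
    d = A.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = d[d > 0] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :], d

def random_graph(N, p, rng):
    """Symmetric 0/1 adjacency matrix without self-loops."""
    upper = np.triu((rng.random((N, N)) < p).astype(float), k=1)
    return upper + upper.T

def perturb(A, eps1, eps2, rng):
    """Delete each existing edge w.p. eps1 and add each missing edge w.p. eps2."""
    N = A.shape[0]
    r = np.triu(rng.random((N, N)), k=1)
    r = r + r.T                          # one uniform draw per node pair
    delete = (A == 1) & (r < eps1)
    add = (A == 0) & (r < eps2)
    np.fill_diagonal(add, False)         # never create self-loops
    return A - delete.astype(float) + add.astype(float)

def row_error_and_bound(A, A_hat):
    """Return ||E_u||_1 and Z_{u,1} + Z_{u,2} for every node u."""
    S, d = normalized_adjacency(A)
    S_hat, _ = normalized_adjacency(A_hat)
    err = np.abs(S_hat - S).sum(axis=1)
    Z1 = np.zeros(len(d))
    for u in range(len(d)):
        nbrs = np.flatnonzero(A[u])
        if nbrs.size:
            tau_u = d[nbrs].min()        # assumed definition of tau_u
            Z1[u] = np.sqrt(d[u] / tau_u)
    # Neighbors of u in the perturbed graph are exactly A_u ∪ R_u,
    # so Z_{u,2} is the row sum of the perturbed normalized adjacency.
    Z2 = S_hat.sum(axis=1)
    return err, Z1 + Z2

rng = np.random.default_rng(0)
A = random_graph(30, 0.2, rng)
A_hat = perturb(A, eps1=0.1, eps2=0.05, rng=rng)
err, bound = row_error_and_bound(A, A_hat)
print(bool(np.all(err <= bound + 1e-12)))
```

Because (44) follows from the triangle inequality alone, the check holds for every realization; the random draw only illustrates one of them.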

Appendix B Graph filter sensitivity

Proof of Theorem 2.

First, we recall the following result.

Lemma 2.

(Lemma 3, [52]) Suppose that $\hat{\mathbf{S}},\mathbf{S},\mathbf{E}\in\mathbb{R}^{N\times N}$ are Hermitian matrices satisfying $\hat{\mathbf{S}}=\mathbf{S}+\mathbf{E}$, and $\lambda=\max\{\|\hat{\mathbf{S}}\|,\|\mathbf{S}\|\}$. Then for every $k\geq 0$

$$\|\hat{\mathbf{S}}^{k}-\mathbf{S}^{k}\|=\|(\mathbf{S}+\mathbf{E})^{k}-\mathbf{S}^{k}\|\leq k\lambda^{k-1}\|\mathbf{E}\|. \tag{45}$$
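Lemma 2 can be illustrated numerically. The sketch below assumes that $\|\cdot\|$ is the spectral norm and uses random real symmetric matrices in place of an actual GSO and its perturbation; the helper names `random_symmetric` and `power_gap_and_bound` are hypothetical.

```python
import numpy as np

def random_symmetric(N, scale, rng):
    """Random real symmetric (hence Hermitian) N x N matrix."""
    M = rng.standard_normal((N, N)) * scale
    return (M + M.T) / 2

def power_gap_and_bound(S, E, k):
    """Return ||S_hat^k - S^k|| and k * lambda^(k-1) * ||E|| in the spectral norm."""
    S_hat = S + E
    lam = max(np.linalg.norm(S_hat, 2), np.linalg.norm(S, 2))
    gap = np.linalg.norm(
        np.linalg.matrix_power(S_hat, k) - np.linalg.matrix_power(S, k), 2
    )
    return gap, k * lam ** (k - 1) * np.linalg.norm(E, 2)

rng = np.random.default_rng(1)
S = random_symmetric(20, 1.0, rng)
E = random_symmetric(20, 0.1, rng)
results = [power_gap_and_bound(S, E, k) for k in range(1, 7)]
for k, (gap, bound) in enumerate(results, start=1):
    print(k, gap <= bound * (1 + 1e-9))
```

The small relative tolerance only guards against floating-point rounding, which matters at $k=1$, where both sides equal $\|\mathbf{E}\|$.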

Expand the filter representation in $\|\mathbf{h}(\hat{\mathbf{S}})-\mathbf{h}(\mathbf{S})\|$ as

$$\left\|\mathbf{h}(\hat{\mathbf{S}})-\mathbf{h}(\mathbf{S})\right\|=\left\|\sum_{k=0}^{K}\left(h_k\hat{\mathbf{S}}^{k}-h_k\mathbf{S}^{k}\right)\right\|. \tag{46}$$

By Lemma 2 and repeated use of the triangle inequality, (46) is bounded by

$$\begin{split}
&\left\|\sum_{k=0}^{K}\left(h_k\hat{\mathbf{S}}^{k}-h_k\mathbf{S}^{k}\right)\right\|\leq\sum_{k=0}^{K}|h_k|\,\|\hat{\mathbf{S}}^{k}-\mathbf{S}^{k}\|\\
&\leq\sum_{k=0}^{K}|h_k|\,k\lambda^{k-1}\|\mathbf{E}\|=\sum_{k=1}^{K}|h_k|\,k\lambda^{k-1}\|\mathbf{E}\|.
\end{split}\tag{47}$$

The correlation between $\lambda$ and $\|\mathbf{E}\|$ has two cases:

  1. If $\lambda=\|\mathbf{S}\|$,

    $$\mathbb{E}[\lambda^{k-1}\|\mathbf{E}\|]=\mathbb{E}[\lambda^{k-1}]\,\mathbb{E}[\|\mathbf{E}\|]; \tag{48}$$
  2. If $\lambda=\|\hat{\mathbf{S}}\|$,

    $$\mathbb{E}[\lambda^{k-1}\|\mathbf{E}\|]=\mathbb{E}[\lambda^{k-1}]\,\mathbb{E}[\|\mathbf{E}\|]+\mathrm{Cov}[\|\mathbf{E}\|,\lambda^{k-1}]. \tag{49}$$

The remainder of the proof is based on the second case (49), since setting the covariance term to zero recovers the first case. By using (46) and taking the expectation of (47), we obtain

$$\begin{split}
&\mathbb{E}\left[\left\|\mathbf{h}(\hat{\mathbf{S}})-\mathbf{h}(\mathbf{S})\right\|\right]\leq\mathbb{E}\left[\sum_{k=1}^{K}|h_k|\,k\lambda^{k-1}\|\mathbf{E}\|\right]\\
&\leq\sum_{k=1}^{K}k|h_k|\,\mathbb{E}\left[\lambda^{k-1}\|\mathbf{E}\|\right]\\
&=\sum_{k=1}^{K}k|h_k|\left(\mathbb{E}[\lambda^{k-1}]\,\mathbb{E}[\|\mathbf{E}\|]+\mathrm{Cov}[\|\mathbf{E}\|,\lambda^{k-1}]\right).
\end{split}\tag{50}$$

In (50), let

$$\lambda_k=\mathbb{E}[\lambda^{k-1}], \tag{51}$$
$$\zeta_k=\mathrm{Cov}[\|\mathbf{E}\|,\lambda^{k-1}]. \tag{52}$$

Then, we have

$$\mathbb{E}\left[\left\|\mathbf{h}(\hat{\mathbf{S}})-\mathbf{h}(\mathbf{S})\right\|\right]\leq\sum_{k=1}^{K}k|h_k|\left(\lambda_k\,\mathbb{E}[\|\mathbf{E}\|]+\zeta_k\right). \tag{53}$$

This completes the proof. ∎
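The quantities $\lambda_k$ and $\zeta_k$ in (51)-(53) can be estimated by Monte Carlo sampling. The sketch below is an illustration under stated assumptions rather than the paper's experiment: it uses the spectral norm, a random symmetric matrix as the GSO, and small symmetric Gaussian perturbations instead of the edge-based error model in (8). The filter taps `h` and the helper `poly_filter` are hypothetical.

```python
import numpy as np

def poly_filter(S, h):
    """Polynomial graph filter h(S) = sum_{k=0}^{K} h[k] S^k."""
    out = np.zeros_like(S)
    P = np.eye(S.shape[0])
    for hk in h:
        out += hk * P
        P = P @ S
    return out

rng = np.random.default_rng(2)
N, K, trials = 15, 3, 200
h = rng.standard_normal(K + 1)          # hypothetical filter taps h_0..h_K
M = rng.standard_normal((N, N))
S = (M + M.T) / 2                       # stand-in symmetric GSO

dist, lam, err = [], [], []
for _ in range(trials):
    G = rng.standard_normal((N, N)) * 0.05
    E = (G + G.T) / 2                   # assumed perturbation model
    S_hat = S + E
    dist.append(np.linalg.norm(poly_filter(S_hat, h) - poly_filter(S, h), 2))
    lam.append(max(np.linalg.norm(S_hat, 2), np.linalg.norm(S, 2)))
    err.append(np.linalg.norm(E, 2))
dist, lam, err = map(np.array, (dist, lam, err))

lhs = dist.mean()                       # estimate of E[||h(S_hat) - h(S)||]
rhs = 0.0
for k in range(1, K + 1):
    lam_k = np.mean(lam ** (k - 1))                                # sample version of (51)
    zeta_k = np.mean(lam ** (k - 1) * err) - lam_k * err.mean()    # sample version of (52)
    rhs += k * abs(h[k]) * (lam_k * err.mean() + zeta_k)           # right-hand side of (53)
print(lhs <= rhs)
```

With the $1/n$ sample covariance used here, the empirical analogue of (49) holds exactly, so the sample-mean inequality follows directly from the per-sample bound (47).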

Appendix C GCNN Sensitivity

Proof of Theorem 3.

First Layer. At the first layer $\ell=1$, the graph convolution is performed as follows

$$\mathbf{Y}_1=\sum_{k=1}^{K}\mathbf{S}^{k}\mathbf{X}_0\mathbf{H}_{1k},\quad\mathbf{X}_1=\sigma_1(\mathbf{Y}_1). \tag{54}$$

For a perturbed GSO $\hat{\mathbf{S}}$, the difference between the perturbed and clean graph convolutions is

$$\hat{\mathbf{Y}}_1-\mathbf{Y}_1=\sum_{k=1}^{K}(\hat{\mathbf{S}}^{k}-\mathbf{S}^{k})\mathbf{X}_0\mathbf{H}_{1k}. \tag{55}$$

Using Lemma 2, we can bound (55) as follows

$$\left\|\hat{\mathbf{Y}}_1-\mathbf{Y}_1\right\|\leq\sum_{k=1}^{K}k\lambda^{k-1}\|\mathbf{X}_0\|\,\|\mathbf{H}_{1k}\|\,\|\mathbf{E}\|. \tag{56}$$
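The deterministic first-layer bound (56) can be sanity-checked on random data. The sketch below assumes the spectral norm throughout, a random symmetric stand-in for the GSO, and hypothetical feature dimensions `F_in` and `F_out`; the helper names `spec` and `conv` are hypothetical as well.

```python
import numpy as np

def spec(M):
    """Spectral norm (largest singular value) of a 2-D array."""
    return np.linalg.norm(M, 2)

def conv(S, X, H):
    """Graph convolution sum_{k=1}^{K} S^k X H_k, with H = [H_1, ..., H_K]."""
    return sum(
        np.linalg.matrix_power(S, k) @ X @ H[k - 1] for k in range(1, len(H) + 1)
    )

rng = np.random.default_rng(3)
N, F_in, F_out, K = 12, 4, 3, 3
M = rng.standard_normal((N, N))
S = (M + M.T) / 2                        # stand-in symmetric GSO
G = rng.standard_normal((N, N)) * 0.05
E = (G + G.T) / 2                        # symmetric perturbation
S_hat = S + E
X0 = rng.standard_normal((N, F_in))      # input features
H1 = [rng.standard_normal((F_in, F_out)) for _ in range(K)]   # H_{1k}, k = 1..K

lam = max(spec(S_hat), spec(S))
lhs = spec(conv(S_hat, X0, H1) - conv(S, X0, H1))
rhs = sum(
    k * lam ** (k - 1) * spec(X0) * spec(H1[k - 1]) * spec(E)
    for k in range(1, K + 1)
)
print(lhs <= rhs)
```

The check relies only on Lemma 2 and the submultiplicativity of the spectral norm, which is why it does not depend on how $\mathbf{E}$ is generated.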

Following the same steps that lead from (47) to (53) for the expected graph filter distance, and given the constants $\lambda_k=\mathbb{E}[\lambda^{k-1}]$ and $\zeta_k=\mathrm{Cov}[\|\mathbf{E}\|,\lambda^{k-1}]$, we take the expectation of (56) and obtain

$$\begin{aligned}
&\mathbb{E}\left[\left\|\hat{\mathbf{Y}}_1-\mathbf{Y}_1\right\|\right]\leq\mathbb{E}\left[\sum_{k=1}^{K}k\lambda^{k-1}\|\mathbf{X}_0\|\,\|\mathbf{H}_{1k}\|\,\|\mathbf{E}\|\right]\\
&=\sum_{k=1}^{K}k\|\mathbf{X}_0\|\,\|\mathbf{H}_{1k}\|\,\mathbb{E}\left[\lambda^{k-1}\|\mathbf{E}\|\right]\\
&=\sum_{k=1}^{K}k\|\mathbf{X}_0\|\,\|\mathbf{H}_{1k}\|\left(\mathbb{E}[\lambda^{k-1}]\,\mathbb{E}\left[\|\mathbf{E}\|\right]+\mathrm{Cov}[\|\mathbf{E}\|,\lambda^{k-1}]\right)\\
&\leq\sum_{k=1}^{K}k\|\mathbf{X}_0\|\,\|\mathbf{H}_{1k}\|\left(\lambda_k\,\mathbb{E}\left[\|\mathbf{E}\|\right]+\zeta_k\right).
\end{aligned}\tag{57}$$

For simplicity, let $B_1=\sum_{k=1}^{K}k\lambda_k\|\mathbf{X}_0\|\,\|\mathbf{H}_{1k}\|$ and $D_1=\sum_{k=1}^{K}k\zeta_k\|\mathbf{X}_0\|\,\|\mathbf{H}_{1k}\|$. Thus, (57) shows that the expectation of the graph filter distance at the first layer is bounded by a first-degree polynomial in $\mathbb{E}\left[\|\mathbf{E}\|\right]$ as

$$\mathbb{E}\left[\left\|\hat{\mathbf{Y}}_1-\mathbf{Y}_1\right\|\right]\leq B_1\,\mathbb{E}[\|\mathbf{E}\|]+D_1. \tag{58}$$

Consider the nonlinearity function $\sigma_1(\cdot)$ at the first layer, which satisfies the Lipschitz condition

$$\|\sigma_1(\hat{\mathbf{Y}})-\sigma_1(\mathbf{Y})\|\leq C_{\sigma_1}\|\hat{\mathbf{Y}}-\mathbf{Y}\|. \tag{59}$$

Applying this Lipschitz condition to (56), we have

$$\begin{aligned}
&\mathbb{E}\left[\|\hat{\mathbf{X}}_1-\mathbf{X}_1\|\right]=\mathbb{E}\left[\left\|\sigma_1(\hat{\mathbf{Y}})-\sigma_1(\mathbf{Y})\right\|\right]\\
&\leq C_{\sigma_1}\mathbb{E}\left[\left\|\hat{\mathbf{Y}}-\mathbf{Y}\right\|\right]\leq C_{\sigma_1}B_1\,\mathbb{E}\left[\|\mathbf{E}\|\right]+C_{\sigma_1}D_1.
\end{aligned}\tag{60}$$

Second Layer. At the second layer $\ell=2$, the graph convolution is performed as

$$\mathbf{Y}_{2}=\sum_{k=1}^{K}\mathbf{S}^{k}\mathbf{X}_{1}\mathbf{H}_{2k},\quad \mathbf{X}_{2}=\sigma_{2}(\mathbf{Y}_{2}). \tag{61}$$

The difference between the perturbed and clean graph convolutions is given by

$$\begin{aligned}
\hat{\mathbf{Y}}_{2}-\mathbf{Y}_{2} &= \sum_{k=1}^{K}\hat{\mathbf{S}}^{k}\hat{\mathbf{X}}_{1}\mathbf{H}_{2k}-\sum_{k=1}^{K}\mathbf{S}^{k}\mathbf{X}_{1}\mathbf{H}_{2k} \\
&= \sum_{k=1}^{K}\left(\hat{\mathbf{S}}^{k}\hat{\mathbf{X}}_{1}-\hat{\mathbf{S}}^{k}\mathbf{X}_{1}+\hat{\mathbf{S}}^{k}\mathbf{X}_{1}-\mathbf{S}^{k}\mathbf{X}_{1}\right)\mathbf{H}_{2k} \\
&= \sum_{k=1}^{K}\left(\hat{\mathbf{S}}^{k}(\hat{\mathbf{X}}_{1}-\mathbf{X}_{1})+(\hat{\mathbf{S}}^{k}-\mathbf{S}^{k})\mathbf{X}_{1}\right)\mathbf{H}_{2k}.
\end{aligned} \tag{62}$$
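The add-and-subtract step behind (62) can be checked numerically. The following is a minimal sketch, assuming small random matrices, a symmetric GSO, and the additive error model $\hat{\mathbf{S}}=\mathbf{S}+\mathbf{E}$; all sizes and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, G, K = 6, 4, 3, 3  # nodes, input features, output features, filter order (hypothetical)

S = rng.standard_normal((N, N))
S = (S + S.T) / 2                       # clean GSO, symmetrized for illustration
E = 0.1 * rng.standard_normal((N, N))
E = (E + E.T) / 2                       # GSO error
S_hat = S + E                           # perturbed GSO

X1 = rng.standard_normal((N, F))                    # clean first-layer output
X1_hat = X1 + 0.05 * rng.standard_normal((N, F))    # perturbed first-layer output
H2 = [rng.standard_normal((F, G)) for _ in range(K)]  # filter taps H_{2k}, k = 1..K

mp = np.linalg.matrix_power

# First line of (62): direct difference of the two graph convolutions.
lhs = sum(mp(S_hat, k) @ X1_hat @ H2[k - 1] - mp(S, k) @ X1 @ H2[k - 1]
          for k in range(1, K + 1))

# Last line of (62): feature-error term plus GSO-error term.
rhs = sum((mp(S_hat, k) @ (X1_hat - X1) + (mp(S_hat, k) - mp(S, k)) @ X1) @ H2[k - 1]
          for k in range(1, K + 1))

decomposition_gap = float(np.max(np.abs(lhs - rhs)))
print(decomposition_gap)
```

The two expressions agree up to floating-point rounding, which reflects that (62) is an exact identity rather than a bound.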

Taking the norm and then the expectation of (62), and using (49), Lemma 2, and the submultiplicativity of the spectral norm, we have

$$\begin{aligned}
&\mathbb{E}\left[\left\|\hat{\mathbf{Y}}_{2}-\mathbf{Y}_{2}\right\|\right] \\
&\leq \mathbb{E}\left[\left\|\sum_{k=1}^{K}\left(\hat{\mathbf{S}}^{k}(\hat{\mathbf{X}}_{1}-\mathbf{X}_{1})+(\hat{\mathbf{S}}^{k}-\mathbf{S}^{k})\mathbf{X}_{1}\right)\mathbf{H}_{2k}\right\|\right] \\
&\leq \sum_{k=1}^{K}\|\mathbf{H}_{2k}\|\,\mathbb{E}\left[\left\|\hat{\mathbf{S}}^{k}(\hat{\mathbf{X}}_{1}-\mathbf{X}_{1})\right\|+\left\|(\hat{\mathbf{S}}^{k}-\mathbf{S}^{k})\mathbf{X}_{1}\right\|\right] \\
&\leq \sum_{k=1}^{K}\|\mathbf{H}_{2k}\|\Bigl(\mathbb{E}[\lambda^{k}]\,\mathbb{E}\left[\|\hat{\mathbf{X}}_{1}-\mathbf{X}_{1}\|\right]+\text{Cov}\left[\|\hat{\mathbf{X}}_{1}-\mathbf{X}_{1}\|,\lambda^{k}\right] \\
&\qquad +k\|\mathbf{X}_{1}\|\left(\mathbb{E}[\lambda^{k-1}]\,\mathbb{E}\left[\|\mathbf{E}\|\right]+\text{Cov}\left[\|\mathbf{E}\|,\lambda^{k-1}\right]\right)\Bigr).
\end{aligned} \tag{63}$$
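The product-plus-covariance terms in (63) are consistent with the elementary identity $\mathbb{E}[ab]=\mathbb{E}[a]\mathbb{E}[b]+\text{Cov}[a,b]$ for two dependent scalar random variables. The sketch below checks that identity on synthetic samples; the two distributions are arbitrary stand-ins (assumptions made here, not the paper's error model) for $\lambda^{k}$ and $\|\hat{\mathbf{X}}_{1}-\mathbf{X}_{1}\|$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical stand-ins: a plays the role of lambda^k, b the role of a
# feature-error norm that is positively correlated with a.
a = rng.gamma(2.0, 1.0, size=n)
b = 0.5 * a + rng.exponential(1.0, size=n)

mean_prod = np.mean(a * b)                          # empirical E[a b]
cov_ab = np.mean((a - a.mean()) * (b - b.mean()))   # population covariance (ddof = 0)
split = a.mean() * b.mean() + cov_ab                # E[a] E[b] + Cov[a, b]

identity_gap = abs(mean_prod - split)
print(identity_gap, cov_ab)
```

Because the two factors are dependent, the covariance term cannot be dropped; it is what later gets collected into the constants $\mu_{k,\ell-1}$ and $\zeta_{k}$.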

Let

$$\mu_{k,\ell-1}=\text{Cov}\left[\|\hat{\mathbf{X}}_{\ell-1}-\mathbf{X}_{\ell-1}\|,\lambda^{k}\right], \tag{64}$$

where $k=1,\ldots,K$ and $\ell=2,\ldots,L$. Thus, in (63), we have $\mu_{k,1}=\text{Cov}\left[\|\hat{\mathbf{X}}_{1}-\mathbf{X}_{1}\|,\lambda^{k}\right]$. Then, we can express (63) as a bound controlled by $\mathbb{E}[\|\mathbf{E}\|]$:

$$\begin{aligned}
\mathbb{E}\left[\left\|\hat{\mathbf{Y}}_{2}-\mathbf{Y}_{2}\right\|\right] &\leq \sum_{k=1}^{K}\|\mathbf{H}_{2k}\|\Bigl(\lambda_{k+1}\mathbb{E}\left[\|\hat{\mathbf{X}}_{1}-\mathbf{X}_{1}\|\right] \\
&\qquad +\mu_{k,1}+k\lambda_{k}\|\mathbf{X}_{1}\|\,\mathbb{E}[\|\mathbf{E}\|]+k\zeta_{k}\|\mathbf{X}_{1}\|\Bigr) \\
&\leq \sum_{k=1}^{K}\|\mathbf{H}_{2k}\|\Bigl(\left(\lambda_{k+1}C_{\sigma_{1}}B_{1}+k\lambda_{k}\|\mathbf{X}_{1}\|\right)\mathbb{E}[\|\mathbf{E}\|] \\
&\qquad +\mu_{k,1}+\lambda_{k}C_{\sigma_{1}}D_{1}+k\zeta_{k}\|\mathbf{X}_{1}\|\Bigr) \\
&\leq B_{2}\mathbb{E}[\|\mathbf{E}\|]+D_{2},
\end{aligned} \tag{65}$$

where $B_{2}=\sum_{k=1}^{K}\left(\lambda_{k+1}C_{\sigma_{1}}B_{1}+k\lambda_{k}\|\mathbf{X}_{1}\|\right)\|\mathbf{H}_{2k}\|$ and $D_{2}=\sum_{k=1}^{K}\left(\mu_{k,1}+\lambda_{k}C_{\sigma_{1}}D_{1}+k\zeta_{k}\|\mathbf{X}_{1}\|\right)\|\mathbf{H}_{2k}\|$. Applying the Lipschitz condition of the second layer's nonlinearity function $\sigma_{2}(\cdot)$, we have

$$\mathbb{E}\left[\|\hat{\mathbf{X}}_{2}-\mathbf{X}_{2}\|\right]\leq C_{\sigma_{2}}B_{2}\mathbb{E}[\|\mathbf{E}\|]+C_{\sigma_{2}}D_{2}. \tag{66}$$

Generalization to Layer $\ell\geq 1$. By induction, we can generalize the result to the output difference at any layer $\ell\geq 1$

$$\mathbb{E}\left[\left\|\hat{\mathbf{X}}_{\ell}-\mathbf{X}_{\ell}\right\|\right]\leq C_{\sigma_{\ell}}B_{\ell}\mathbb{E}\left[\|\mathbf{E}\|\right]+C_{\sigma_{\ell}}D_{\ell}, \tag{67}$$

where, for $\ell=2,\ldots,L$,

$$\begin{split}
B_{\ell}&=\sum_{k=1}^{K}\left(\lambda_{k+1}C_{\sigma_{\ell-1}}B_{\ell-1}+k\lambda_{k}\|\mathbf{X}_{\ell-1}\|\right)\|\mathbf{H}_{\ell k}\|,\\
D_{\ell}&=\sum_{k=1}^{K}\left(\mu_{k,\ell-1}+\lambda_{k}C_{\sigma_{\ell-1}}D_{\ell-1}+k\zeta_{k}\|\mathbf{X}_{\ell-1}\|\right)\|\mathbf{H}_{\ell k}\|.
\end{split} \tag{68}$$

This completes the proof. ∎
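The recursion (68) is straightforward to evaluate once the per-layer quantities are known. The helper below is a minimal sketch that transcribes (68) as printed; the function name, the argument layout, and the toy numbers are hypothetical, and $B_{1}$, $D_{1}$, $\lambda_{k}$, $\zeta_{k}$, and $\mu_{k,\ell-1}$ are treated as given inputs rather than computed from the error model.

```python
def next_layer_constants(B_prev, D_prev, C_prev, X_prev_norm, H_norms, lam, zeta, mu):
    """One step of the recursion (68), transcribed as printed.

    B_prev, D_prev : B_{l-1} and D_{l-1}
    C_prev         : Lipschitz constant C_{sigma_{l-1}}
    X_prev_norm    : ||X_{l-1}||
    H_norms[k - 1] : ||H_{lk}|| for k = 1..K
    lam[k]         : lambda_k for k = 1..K+1 (index 0 is unused padding)
    zeta[k]        : zeta_k for k = 1..K (index 0 is unused padding)
    mu[k]          : mu_{k,l-1} for k = 1..K (index 0 is unused padding)
    """
    K = len(H_norms)
    B = sum((lam[k + 1] * C_prev * B_prev + k * lam[k] * X_prev_norm) * H_norms[k - 1]
            for k in range(1, K + 1))
    D = sum((mu[k] + lam[k] * C_prev * D_prev + k * zeta[k] * X_prev_norm) * H_norms[k - 1]
            for k in range(1, K + 1))
    return B, D


# Toy numbers (hypothetical) for a filter of order K = 2.
lam = [0.0, 1.0, 2.0, 4.0]
zeta = [0.0, 0.1, 0.2]
mu = [0.0, 0.05, 0.05]
B2, D2 = next_layer_constants(B_prev=3.0, D_prev=0.5, C_prev=1.0, X_prev_norm=2.0,
                              H_norms=[1.0, 0.5], lam=lam, zeta=zeta, mu=mu)
print(B2, D2)
```

Calling the helper repeatedly, each time feeding in the previous layer's constants and norms, yields $B_{L}$ and $D_{L}$ for the bound (67).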

Appendix D Single-layer GIN Sensitivity

Proof.

In a single-layer GIN, we assume that the inner MLP has two layers, as introduced earlier in the paper. The outputs of a single-layer GIN ($L=1$) with the original and perturbed GSOs are given as

$$\mathbf{X}_{L}=\mathbf{h}_{\boldsymbol{\Theta}_{L}}\bigl(\mathbf{S}\mathbf{X}_{L-1}\bigr), \tag{69}$$
$$\hat{\mathbf{X}}_{L}=\mathbf{h}_{\boldsymbol{\Theta}_{L}}\bigl(\hat{\mathbf{S}}\hat{\mathbf{X}}_{L-1}\bigr). \tag{70}$$

Expanding (69) and (70) with full matrix transformations, we have

$$\mathbf{X}_{L}=\sigma_{L2}\bigl(\sigma_{L1}(\mathbf{S}\mathbf{X}_{L-1}\mathbf{W}_{L1}+\mathbf{B}_{L1})\mathbf{W}_{L2}+\mathbf{B}_{L2}\bigr), \tag{71}$$
$$\hat{\mathbf{X}}_{L}=\sigma_{L2}\bigl(\sigma_{L1}(\hat{\mathbf{S}}\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}+\mathbf{B}_{L1})\mathbf{W}_{L2}+\mathbf{B}_{L2}\bigr). \tag{72}$$

We can split (71) as

$$\mathbf{Y}_{L1}=\mathbf{S}\mathbf{X}_{L-1}\mathbf{W}_{L1}+\mathbf{B}_{L1}, \tag{73a}$$
$$\mathbf{X}_{L1}=\sigma_{L1}(\mathbf{Y}_{L1}), \tag{73b}$$
$$\mathbf{Y}_{L2}=\mathbf{X}_{L1}\mathbf{W}_{L2}+\mathbf{B}_{L2}, \tag{73c}$$
$$\mathbf{X}_{L}=\sigma_{L2}(\mathbf{Y}_{L2}), \tag{73d}$$

where $\mathbf{X}_{L1}$ denotes the intermediate output of the first MLP layer, and $\mathbf{X}_{L}$ represents the output of the second MLP layer. For simplicity of notation, we use $\mathbf{X}_{L}$ instead of $\mathbf{X}_{L2}$. Similarly, we split (72) as

$$\hat{\mathbf{Y}}_{L1}=\hat{\mathbf{S}}\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}+\mathbf{B}_{L1}, \tag{74a}$$
$$\hat{\mathbf{X}}_{L1}=\sigma_{L1}(\hat{\mathbf{Y}}_{L1}), \tag{74b}$$
$$\hat{\mathbf{Y}}_{L2}=\hat{\mathbf{X}}_{L1}\mathbf{W}_{L2}+\mathbf{B}_{L2}, \tag{74c}$$
$$\hat{\mathbf{X}}_{L}=\sigma_{L2}(\hat{\mathbf{Y}}_{L2}). \tag{74d}$$
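The four-step split of the GIN layer can be written as a short forward pass. This is a minimal sketch, assuming ReLU for both $\sigma_{L1}$ and $\sigma_{L2}$, an illustrative GSO built from a random 0/1 adjacency matrix plus the identity, and a single flipped edge as the perturbation; the function name `gin_layer` and all sizes are hypothetical.

```python
import numpy as np


def relu(Z):
    """Elementwise ReLU, used here as a stand-in nonlinearity (an assumption)."""
    return np.maximum(Z, 0.0)


def gin_layer(S, X_prev, W1, B1, W2, B2, sigma1=relu, sigma2=relu):
    """One GIN layer with a two-layer inner MLP, following the four-step split."""
    Y1 = S @ X_prev @ W1 + B1   # aggregation and first linear map, as in (73a)
    X1 = sigma1(Y1)             # first nonlinearity, as in (73b)
    Y2 = X1 @ W2 + B2           # second linear map, as in (73c)
    return sigma2(Y2)           # second nonlinearity, as in (73d)


rng = np.random.default_rng(0)
N, F, Hd, G = 5, 3, 4, 2        # nodes, input, hidden, and output widths (hypothetical)

A = np.triu(rng.integers(0, 2, size=(N, N)).astype(float), 1)
S = A + A.T + np.eye(N)         # illustrative symmetric GSO

E = np.zeros((N, N))
E[0, 1] = E[1, 0] = 1.0 - 2.0 * S[0, 1]   # flip the edge between nodes 0 and 1
S_hat = S + E                   # perturbed GSO

X0 = rng.standard_normal((N, F))
W1, B1 = rng.standard_normal((F, Hd)), rng.standard_normal((N, Hd))
W2, B2 = rng.standard_normal((Hd, G)), rng.standard_normal((N, G))

X_L = gin_layer(S, X0, W1, B1, W2, B2)          # clean output
X_L_hat = gin_layer(S_hat, X0, W1, B1, W2, B2)  # perturbed output, same weights and input
print(np.linalg.norm(X_L_hat - X_L))
```

Both passes share the weights, biases, and input, so any difference between the outputs is caused by the GSO perturbation alone.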

Then, the $\ell_{2}$ norm of the difference between the perturbed output (74d) and the clean output (73d) is

$$\|\hat{\mathbf{X}}_{L}-\mathbf{X}_{L}\|=\|\sigma_{L2}(\hat{\mathbf{Y}}_{L2})-\sigma_{L2}(\mathbf{Y}_{L2})\|. \tag{75}$$

Using the Lipschitz condition of the nonlinearity function $\sigma_{L2}(\cdot)$ in (75), we have

$$\|\hat{\mathbf{X}}_{L}-\mathbf{X}_{L}\|\leq C_{\sigma_{L2}}\|\hat{\mathbf{Y}}_{L2}-\mathbf{Y}_{L2}\|. \tag{76}$$

Representing $\hat{\mathbf{Y}}_{L2}$ by (74c) and $\mathbf{Y}_{L2}$ by (73c), we have

$$\begin{aligned}
\|\hat{\mathbf{Y}}_{L2}-\mathbf{Y}_{L2}\| &= \|\hat{\mathbf{X}}_{L1}\mathbf{W}_{L2}-\mathbf{X}_{L1}\mathbf{W}_{L2}\| \\
&\leq \|\hat{\mathbf{X}}_{L1}-\mathbf{X}_{L1}\|\,\|\mathbf{W}_{L2}\|.
\end{aligned} \tag{77}$$

Representing $\hat{\mathbf{X}}_{L1}$ by (74b) and $\mathbf{X}_{L1}$ by (73b), we obtain

$$\|\hat{\mathbf{X}}_{L1}-\mathbf{X}_{L1}\|=\|\sigma_{L1}(\hat{\mathbf{Y}}_{L1})-\sigma_{L1}(\mathbf{Y}_{L1})\|. \tag{78}$$

Using the Lipschitz condition of the nonlinearity function $\sigma_{L1}(\cdot)$ in (78), we have

$$\|\hat{\mathbf{X}}_{L1}-\mathbf{X}_{L1}\|\leq C_{\sigma_{L1}}\|\hat{\mathbf{Y}}_{L1}-\mathbf{Y}_{L1}\|. \tag{79}$$

Representing $\hat{\mathbf{Y}}_{L1}$ by (74a) and $\mathbf{Y}_{L1}$ by (73a), we have

$$\|\hat{\mathbf{Y}}_{L1}-\mathbf{Y}_{L1}\|=\|\hat{\mathbf{S}}\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}-\mathbf{S}\mathbf{X}_{L-1}\mathbf{W}_{L1}\|. \tag{80}$$

We can rewrite the difference inside the norm in (80) by subtracting and adding $\mathbf{S}\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}$ as

$$\begin{aligned}
&\hat{\mathbf{S}}\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}-\mathbf{S}\mathbf{X}_{L-1}\mathbf{W}_{L1} \\
&= \hat{\mathbf{S}}\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}-\mathbf{S}\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}+\mathbf{S}\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}-\mathbf{S}\mathbf{X}_{L-1}\mathbf{W}_{L1} \\
&= (\hat{\mathbf{S}}-\mathbf{S})\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}+\mathbf{S}(\hat{\mathbf{X}}_{L-1}-\mathbf{X}_{L-1})\mathbf{W}_{L1}.
\end{aligned} \tag{81}$$

Substituting (81) into (80), and using the triangular inequality, we have

$$\begin{aligned}
\|\mathbf{Y}_{L1} - \hat{\mathbf{Y}}_{L1}\| &\leq \|(\hat{\mathbf{S}} - \mathbf{S})\hat{\mathbf{X}}_{L-1}\mathbf{W}_{L1}\| + \|\mathbf{S}(\hat{\mathbf{X}}_{L-1} - \mathbf{X}_{L-1})\mathbf{W}_{L1}\| \\
&\leq \|\hat{\mathbf{S}} - \mathbf{S}\|\,\|\hat{\mathbf{X}}_{L-1}\|\,\|\mathbf{W}_{L1}\| + \|\mathbf{S}\|\,\|\hat{\mathbf{X}}_{L-1} - \mathbf{X}_{L-1}\|\,\|\mathbf{W}_{L1}\|.
\end{aligned} \tag{82}$$
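The add-and-subtract step in (81) and the bound (82) can be sanity-checked numerically. The sketch below is only an illustration: the variable names (`S_hat`, `X_hat`, `W1`), the matrix sizes, and the random data are assumptions, and the Frobenius norm is used simply because it is submultiplicative, which is the property the second inequality in (82) relies on.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 6, 4, 3  # nodes, input features, hidden features (illustrative sizes)

# Random stand-ins for the true and perturbed quantities (assumed, not from the paper).
S = rng.standard_normal((n, n))
S_hat = S + 0.1 * rng.standard_normal((n, n))   # perturbed GSO
X = rng.standard_normal((n, d))
X_hat = X + 0.1 * rng.standard_normal((n, d))   # perturbed layer input
W1 = rng.standard_normal((d, h))

norm = np.linalg.norm  # Frobenius norm for 2-D arrays by default

# Difference of the linear parts, i.e. the quantity decomposed in (81).
lhs = S_hat @ X_hat @ W1 - S @ X @ W1
# Decomposition: GSO-error term plus input-error term.
rhs = (S_hat - S) @ X_hat @ W1 + S @ (X_hat - X) @ W1
identity_holds = bool(np.allclose(lhs, rhs))

# Right-hand side of the final inequality in (82).
bound = (norm(S_hat - S) * norm(X_hat) * norm(W1)
         + norm(S) * norm(X_hat - X) * norm(W1))
bound_holds = bool(norm(lhs) <= bound)

print(identity_holds, bound_holds)
```

The first flag checks the algebraic identity and the second checks the norm bound; any other submultiplicative norm could be substituted for `norm`.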

For $L=1$, we have $\hat{\mathbf{X}}_{L-1} = \mathbf{X}_{L-1} = \mathbf{X}_{0}$, so the second term in (82) vanishes. Then, using the definition of the GSO error (5) to write $\|\hat{\mathbf{S}} - \mathbf{S}\| = \|\mathbf{E}\|$, (82) becomes

$$\|\hat{\mathbf{Y}}_{L1} - \mathbf{Y}_{L1}\| \leq \|\mathbf{E}\|\,\|\mathbf{X}_{L-1}\|\,\|\mathbf{W}_{L1}\|. \tag{83}$$

Combining (83), (79), (D), and (76), we can bound the one-layer GIN output difference as

$$\|\hat{\mathbf{X}}_{L} - \mathbf{X}_{L}\| \leq C_{\sigma_{L2}} C_{\sigma_{L1}} \|\mathbf{W}_{L2}\|\,\|\mathbf{W}_{L1}\|\,\|\mathbf{X}_{L-1}\|\,\|\mathbf{E}\|. \tag{84}$$

Taking the expectation of (84), we have

$$\mathbb{E}\left[\|\hat{\mathbf{X}}_{L} - \mathbf{X}_{L}\|\right] \leq C_{\sigma_{L2}} C_{\sigma_{L1}} \|\mathbf{W}_{L2}\|\,\|\mathbf{W}_{L1}\|\,\|\mathbf{X}_{L-1}\|\,\mathbb{E}\left[\|\mathbf{E}\|\right]. \tag{85}$$

Finally, letting $\xi = C_{\sigma_{L2}} C_{\sigma_{L1}} \|\mathbf{W}_{L2}\|\,\|\mathbf{W}_{L1}\|\,\|\mathbf{X}_{L-1}\|$, we have

$$\mathbb{E}\left[\|\hat{\mathbf{X}}_{L} - \mathbf{X}_{L}\|\right] \leq \xi\,\mathbb{E}\left[\|\mathbf{E}\|\right]. \tag{86}$$

This completes the proof. ∎
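The one-layer bounds (84) and (86) can also be illustrated with a small Monte Carlo experiment. The following is a minimal sketch under several assumptions that are not taken from the text above: the layer is assumed to have the form $\sigma_{2}(\sigma_{1}(\mathbf{S}\mathbf{X}_{0}\mathbf{W}_{1})\mathbf{W}_{2})$, which is consistent with the constants appearing in (84); both nonlinearities are ReLU, so $C_{\sigma_{1}} = C_{\sigma_{2}} = 1$; the GSO is taken as $\mathbf{S} = \mathbf{A} + \mathbf{I}$; the perturbation is a hypothetical independent edge-flip model with probability `p_flip`, used as a simple stand-in for a probabilistic error model; and the Frobenius norm is used so that the elementwise Lipschitz property of ReLU carries over to the matrix norm exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, h = 8, 5, 4      # illustrative sizes (assumed)
p_flip = 0.05          # hypothetical edge-flip probability

def random_symmetric_adjacency(rng, n, p):
    """Sample a symmetric 0/1 adjacency matrix with zero diagonal."""
    upper = np.triu((rng.random((n, n)) < p).astype(float), k=1)
    return upper + upper.T

def perturb_edges(rng, A, p):
    """Flip each undirected edge independently with probability p."""
    flips = random_symmetric_adjacency(rng, A.shape[0], p)
    return np.abs(A - flips)  # XOR for 0/1 entries

def relu(Z):
    return np.maximum(Z, 0.0)  # elementwise, Lipschitz constant 1

def gin_layer(S, X0, W1, W2):
    """Assumed one-layer form: sigma_2(sigma_1(S X0 W1) W2)."""
    return relu(relu(S @ X0 @ W1) @ W2)

norm = np.linalg.norm  # Frobenius norm for 2-D arrays

A = random_symmetric_adjacency(rng, n, 0.3)
S = A + np.eye(n)
X0 = rng.standard_normal((n, d))
W1 = rng.standard_normal((d, h))
W2 = rng.standard_normal((h, h))

# xi as defined before (86), with both Lipschitz constants equal to 1.
xi = norm(W2) * norm(W1) * norm(X0)
X1 = gin_layer(S, X0, W1, W2)

out_diffs, gso_errs = [], []
for _ in range(200):
    S_hat = perturb_edges(rng, A, p_flip) + np.eye(n)
    out_diffs.append(norm(gin_layer(S_hat, X0, W1, W2) - X1))
    gso_errs.append(norm(S_hat - S))
out_diffs = np.array(out_diffs)
gso_errs = np.array(gso_errs)

# (84) should hold for every realization, (86) for the sample means.
per_sample_ok = bool(np.all(out_diffs <= xi * gso_errs + 1e-9))
expected_ok = bool(out_diffs.mean() <= xi * gso_errs.mean() + 1e-9)
print(per_sample_ok, expected_ok)
```

Because (84) holds for each realization of $\mathbf{E}$, the inequality between the sample means follows directly, mirroring the step from (84) to (85). The gap between the two sides in such an experiment gives a rough sense of how conservative the constant $\xi$ is for a given set of weights.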

References

  • [1] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: Going beyond Euclidean data,” IEEE Signal Process. Mag., vol. 34, no. 4, pp. 18–42, July 2017.
  • [2] X. Dong, D. Thanou, L. Toni, M. Bronstein, and P. Frossard, “Graph signal processing for machine learning: A review and new perspectives,” IEEE Signal Process. Mag., vol. 37, no. 6, pp. 117–127, Oct. 2020.
  • [3] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Trans. Neural Netw. Learning Syst., vol. 32, no. 1, pp. 4–24, Mar. 2021.
  • [4] E. Isufi, F. Gama, D. I. Shuman, and S. Segarra, “Graph filters for signal processing and machine learning on graphs,” IEEE Trans. Signal Process., pp. 1–32, 2024.
  • [5] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. 5th Int. Conf. Learn. Representations, Toulon, France, Apr. 24-26, 2017, pp. 1–14.
  • [6] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in Proc. 7th Int. Conf. Learn. Representations, New Orleans, LA, USA, May 6-9, 2019, pp. 1–17.
  • [7] Q. Li, X.-M. Wu, H. Liu, X. Zhang, and Z. Guan, “Label efficient semi-supervised learning via graph filtering,” in Proc. 32nd Conf. Comput. Vision and Pattern Recognition, Long Beach, CA, USA, June 16-20, 2019, pp. 9574–9583.
  • [8] F. Wu, T. Zhang, A. H. d. Souza, Jr, C. Fifty, T. Yu, and K. Q. Weinberger, “Simplifying graph convolutional networks,” in Proc. 36th Int. Conf. Mach. Learning, Long Beach, California, USA, June 9-15, 2019, pp. 6861–6871.
  • [9] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, “CayleyNets: Graph convolutional neural networks with complex rational spectral filters,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 97–109, Nov. 2019.
  • [10] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in Proc. 6th Int. Conf. Learn. Representations, Vancouver, BC, Canada, Apr. 30 - May 3, 2018.
  • [11] E. Isufi, F. Gama, and A. Ribeiro, “EdgeNets: Edge varying graph neural networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7457–7473, Sept. 2022.
  • [12] M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,” IEEE Trans. Signal Process., vol. 67, no. 9, pp. 2320–2333, Mar. 2019.
  • [13] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
  • [14] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Proc. 30th Conf. Neural Inform. Process. Syst., Barcelona, Spain, Dec. 5-10, 2016, pp. 3844–3858.
  • [15] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Learning Laplacian matrix in smooth graph signal representations,” IEEE Trans. Signal Process., vol. 64, no. 23, pp. 6160–6173, Dec. 2016.
  • [16] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, “Network topology inference from spectral templates,” IEEE Trans. Signal Inf. Process. Netw., vol. 3, no. 3, pp. 467–483, July 2017.
  • [17] A. Buciulea, S. Rey, and A. G. Marques, “Learning graphs from smooth and graph-stationary signals with hidden variables,” IEEE Trans. Signal Inf. Process. Netw., vol. 8, pp. 273–287, Mar. 2022.
  • [18] J. Miettinen, S. A. Vorobyov, and E. Ollila, “Modelling and studying the effect of graph errors in graph signal processing,” Signal Process., vol. 189, 108256, pp. 1–8, Dec. 2021.
  • [19] Z. Gao, E. Isufi, and A. Ribeiro, “Stability of graph convolutional neural networks to stochastic perturbations,” Signal Process., vol. 188, 108216, pp. 1–15, Nov. 2021.
  • [20] K. Xu, H. Chen, S. Liu, P.-Y. Chen, T.-W. Weng, M. Hong, and X. Lin, “Topology attack and defense for graph neural networks: An optimization perspective,” in Proc. 28th Int. Joint Conf. Artif. Intell., Macao, China, Aug. 10-16, 2019, pp. 3961–3967.
  • [21] E. Ceci and S. Barbarossa, “Graph signal processing in the presence of topology uncertainties,” IEEE Trans. Signal Process., vol. 68, pp. 1558–1573, Feb. 2020.
  • [22] H. Kenlay, D. Thanou, and X. Dong, “On the stability of graph convolutional neural networks under edge rewiring,” in Proc. 46th IEEE Int. Conf. Acoustic, Speech and Signal Process., Toronto, Canada, June 6-11, 2021, pp. 8513–8517.
  • [23] ——, “Interpretable stability bounds for spectral graph filters,” in Proc. 38th Int. Conf. Mach. Learning, vol. 139, Virtual, July 18-24, 2021, pp. 5388–5397.
  • [24] F. Gama, J. Bruna, and A. Ribeiro, “Stability properties of graph neural networks,” IEEE Trans. Signal Process., vol. 68, pp. 5680–5695, Sept. 2020.
  • [25] R. Levie, W. Huang, L. Bucci, M. Bronstein, and G. Kutyniok, “Transferability of spectral graph convolutional neural networks,” J. Mach. Learn. Res., vol. 22, no. 1, pp. 12 462–12 520, Nov. 2021.
  • [26] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song, “Adversarial attack on graph structured data,” in Proc. 35th Int. Conf. Mach. Learning, vol. 80, Stockholm, Sweden, July 10-15, 2018, pp. 1115–1124.
  • [27] D. Zügner, A. Akbarnejad, and S. Günnemann, “Adversarial attacks on neural networks for graph data,” in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. & Data Mining, London, United Kingdom, Aug. 19-23, 2018, pp. 2847–2856.
  • [28] H. Wu, C. Wang, Y. Tyshetskiy, A. Docherty, K. Lu, and L. Zhu, “Adversarial examples for graph data: Deep insights into attack and defense,” in Proc. 28th Int. Joint Conf. Artif. Intell., Macao, China, Aug. 10-16, 2019, pp. 4816–4823.
  • [29] B. Wang, J. Jia, X. Cao, and N. Z. Gong, “Certified robustness of graph neural networks against adversarial structural perturbation,” in Proc. 27th ACM SIGKDD Int. Conf. Knowl. Discov. & Data Mining, Virtual, Aug. 14-18, 2021, pp. 1645–1653.
  • [30] L. Lin, E. Blaser, and H. Wang, “Graph structural attack by perturbing spectral distance,” in Proc. 28th ACM SIGKDD Int. Conf. Knowl. Discov. & Data Mining, Washington DC, USA, Aug. 14-18, 2022, pp. 989–998.
  • [31] X. Wang, E. Ollila, and S. A. Vorobyov, “Graph neural network sensitivity under probabilistic error model,” in Proc. 30th Eur. Signal Process. Conf., Belgrade, Serbia, Aug. 29 - Sept. 2, 2022, pp. 2146–2150.
  • [32] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, Apr. 2013.
  • [33] M. Penrose, “Random geometric graphs,” Oxford Stud. in Probab., vol. 5, 2003.
  • [34] A. Hagberg, P. Swart, and D. A. Schult, “Exploring network structure, dynamics, and function using NetworkX,” Los Alamos National Lab., Los Alamos, NM, USA, Tech. Rep., 2008.
  • [35] G. Golub and C. Van Loan, Matrix Computations, vol. 3. Baltimore, MD, USA: The Johns Hopkins Univ. Press, 2012.
  • [36] T. Aven, “Upper (lower) bounds on the mean of the maximum (minimum) of a number of random variables,” J. Appl. Probab., vol. 22, no. 3, pp. 723–728, Sept. 1985.
  • [37] G. Ohayon, T. Michaeli, and M. Elad, “The perception-robustness tradeoff in deterministic image restoration,” arXiv:2311.09253, [eess.IV], 2023.
  • [38] B. Weisfeiler and A. Lehman, “A reduction of a graph to a canonical form and an algebra arising during this reduction,” Nauchno-Technicheskaya Informatsia, vol. 2, no. 9, pp. 12–16, 1968.
  • [39] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, “Collective classification in network data,” AI Magazine, vol. 29, no. 3, p. 93, Sept. 2008.
  • [40] L. Chizat, G. Peyré, B. Schmitzer, and F.-X. Vialard, “Unbalanced optimal transport: Dynamic and Kantorovich formulations,” J. Funct. Anal., vol. 274, no. 11, pp. 3090–3123, June 2018.
  • [41] L. Chapel, M. Z. Alaya, and G. Gasso, “Partial optimal tranport with applications on positive-unlabeled learning,” in Proc. 33th Conf. Neural Inform. Process. Syst., H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Virtual, Dec. 7-12, 2020, pp. 2903–2913.
  • [42] H. P. Maretic, M. El Gheche, M. Minder, G. Chierchia, and P. Frossard, “Wasserstein-based graph alignment,” IEEE Trans. Signal Inf. Process. Netw., vol. 8, pp. 353–363, Apr. 2022.
  • [43] C.-Y. Chuang and S. Jegelka, “Tree mover’s distance: Bridging graph metrics and stability of graph neural networks,” in Proc. 35th Conf. Neural Inform. Process. Syst., vol. 35, New Orleans, USA, Nov. 28 - Dec. 9, 2022, pp. 2944–2957.
  • [44] L. Sun, Y. Dou, C. Yang, K. Zhang, J. Wang, P. S. Yu, L. He, and B. Li, “Adversarial attack and defense on graph data: A survey,” IEEE Trans. Knowl. Data Eng., pp. 1–20, Sept. 2022.
  • [45] H. Jin and X. Zhang, “Latent adversarial training of graph convolution networks,” in Proc. 36th Int. Conf. Mach. Learning Workshop Learn. Reasoning with Graph-structured Representations, Long Beach, California, USA, June 9-15, 2019, pp. 1–7.
  • [46] F. Feng, X. He, J. Tang, and T.-S. Chua, “Graph adversarial training: Dynamically regularizing based on graph structure,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 6, pp. 2493–2504, June 2019.
  • [47] Q. Dai, X. Shen, L. Zhang, Q. Li, and D. Wang, “Adversarial training methods for network embedding,” in Proc. 30th The World Wide Web Conf., San Francisco, CA, USA, May 13-17, 2019, pp. 329–339.
  • [48] J. Ren, Z. Zhang, J. Jin, X. Zhao, S. Wu, Y. Zhou, Y. Shen, T. Che, R. Jin, and D. Dou, “Integrated defense for resilient graph matching,” in Proc. 38th Int. Conf. Mach. Learning, vol. 139, Virtual, July 18-24, 2021, pp. 8982–8997.
  • [49] X. Zhao, Z. Zhang, Z. Zhang, L. Wu, J. Jin, Y. Zhou, R. Jin, D. Dou, and D. Yan, “Expressive 1-Lipschitz neural networks for robust multiple graph learning against adversarial attacks,” in Proc. 38th Int. Conf. Mach. Learning, vol. 139, July 18-24, 2021, pp. 12 719–12 735.
  • [50] H. E. Egilmez, E. Pavez, and A. Ortega, “Graph learning from filtered signals: Graph system and diffusion kernel identification,” IEEE Trans. Signal Inf. Process. Netw., vol. 5, no. 2, pp. 360–374, June 2018.
  • [51] X. Pu, S. L. Chau, X. Dong, and D. Sejdinovic, “Kernel-based graph learning from smooth signals: A functional viewpoint,” IEEE Trans. Signal Inf. Process. Netw., vol. 7, pp. 192–207, Feb. 2021.
  • [52] R. Levie, E. Isufi, and G. Kutyniok, “On the transferability of spectral graph filters,” in Proc. 13th Int. Conf. on Sampling Theory and Appl., Bordeaux, France, July 8-12, 2019, pp. 1–5.