On the Graph Fourier Transform for Directed Graphs
Abstract—The analysis of signals defined over a graph is relevant in many applications, such as social and economic networks, big data or biological networks, and so on. A key tool for analyzing these signals is the so-called Graph Fourier Transform (GFT). Alternative definitions of the GFT have been suggested in the literature, based on the eigen-decomposition of either the graph Laplacian or the adjacency matrix. In this paper, we address the general case of directed graphs and we propose an alternative approach that builds the graph Fourier basis as the set of orthonormal vectors that minimize a continuous extension of the graph cut size, known as the Lovász extension. To cope with the non-convexity of the problem, we propose two alternative iterative optimization methods, properly devised for handling orthogonality constraints. Finally, we extend the method to minimize a continuous relaxation of the balanced cut size. The formulated problem is again non-convex, and we propose an efficient solution method based on an explicit-implicit gradient algorithm.

Index Terms—Graph signal processing, Graph Fourier Transform, total variation, clustering.

S. Sardellitti and S. Barbarossa are with Sapienza University of Rome, DIET Dept., Via Eudossiana 18, 00184 Rome, Italy (e-mail: [email protected], [email protected]). P. Di Lorenzo is with the Dept. of Engineering, University of Perugia, Via G. Duranti 93, 06125 Perugia, Italy (e-mail: [email protected]). This work has been supported by the TROPIC Project, Nr. ICT-318784. The work of P. Di Lorenzo was funded by the "Fondazione Cassa di Risparmio di Perugia". Matlab code implementing the algorithms proposed in this paper is available at https://sites.google.com/site/stefaniasardellitti/code-supplement

I. INTRODUCTION

Graph signal processing (GSP) has attracted a lot of interest in recent years because of its many potential applications, from social and economic networks to smart grids, gene regulatory networks, and so on. GSP represents a promising tool for the representation, processing and analysis of complex networks, where discrete signals are defined on the vertices of a (possibly weighted) graph. Many works in the recent literature attempt to extend classical discrete signal processing (DSP) theory from time signals or images to signals defined over the vertices of a graph by introducing the basic concepts of graph-based filtering [1]–[3], graph-based transforms [4]–[7], sampling and uncertainty principles [8]–[12]. A central role in GSP is played by the spectral analysis of graph signals, which is based on the introduction of the so-called Graph Fourier Transform (GFT). Alternative definitions of the GFT have been introduced, see, e.g., [4], [5], [8], [13], [14], each of them coming from different motivations, like building a basis with minimal variation, filtering signals defined over graphs, etc. Two basic approaches have been suggested. The first one is rooted in spectral graph theory and uses the graph Laplacian as the central unit, see e.g. [5] and the references therein. This approach applies to undirected graphs, and the Fourier basis is constituted by the eigenvectors of the graph Laplacian, which represent the basis that minimizes the ℓ2-norm graph total variation. This approach is well motivated on undirected graphs, where the minimization of the ℓ2-norm total variation is equivalent to minimizing the quadratic form built on the Laplacian matrix. Hence, an orthonormal basis minimizing the ℓ2-norm total variation leads to the eigenvectors of the Laplacian matrix. However, these properties do not hold anymore in the directed graph case. An alternative approach, valid for the more general and challenging case of directed graphs, was proposed in [1], [4]. That method builds on the Jordan decomposition of the adjacency matrix and defines the associated generalized eigenvectors as the GFT basis. This second method is rooted in the association of the graph adjacency matrix with the signal shift operator, which is at the basis of all shift-invariant linear filtering methods for graph signals [15], [16]. This approach paved the way to the algebraic signal processing framework. However, the GFT definition proposed in [4] raises some important issues requiring further investigation. First, the basis vectors are linearly independent, but in general they are not orthogonal, so that the resulting transform is not unitary and hence does not preserve scalar products. Second, the total variation introduced in [4] does not respect some desirable properties; for example, it does not guarantee that a constant graph signal has zero total variation [17], [18]. Finally, the numerical computation of the Jordan decomposition often incurs well-known numerical instabilities, even for moderate-size matrices [19], although alternative decomposition methods have been recently suggested to tackle these instability issues [20].

In some applications, one of the major motivations for using the GFT is the analysis of graph signals that exhibit clustering properties, i.e. signals that are smooth within subsets of highly interconnected nodes (clusters), while they can vary arbitrarily across different clusters. In such cases, the GFT of these signals is typically sparse, and its sparsity carries relevant information on the data under analysis. These signals are said to be band-limited, in analogy with what happens to smooth time signals. Within the machine learning context, GSP can play a key role in unsupervised and semi-supervised learning, as suggested in [21], [22]. In these applications, the input is a point cloud and the goal is to detect clusters, either without or with limited supervision. Graph-based methods tackle these problems by associating a graph to the point cloud, where the vertices are the points themselves, whereas edges between pairs of points are established if two points are sufficiently close. The goal of clustering/classification is to associate a different label to each cluster. If we look at these labels as a signal defined over the points (vertices), this signal is band-limited by construction [21], [22].
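To fix ideas, the Laplacian-based GFT recalled above (and formalized in (5) below) can be sketched in a few lines of Python. This is our own minimal illustration based on numpy, not code from the paper's Matlab package; the function name is hypothetical.

    import numpy as np

    def laplacian_gft(A, s):
        # A: symmetric (undirected) adjacency matrix; s: graph signal.
        L = np.diag(A.sum(axis=1)) - A      # graph Laplacian L = D - A
        lam, U = np.linalg.eigh(L)          # eigenvalues in increasing order
        return lam, U, U.T @ s              # graph frequencies, basis, GFT of s

For a signal that is (nearly) constant within clusters, the transformed vector U.T @ s is concentrated on the eigenvectors associated with small eigenvalues, i.e. the signal is band-limited in the sense discussed above.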
In this paper, we propose a novel alternative approach to build the GFT basis for the general case of directed graphs. Rather than starting from the decomposition of one of the graph matrix descriptors, either adjacency or Laplacian, we start by identifying an objective function to be minimized, and then we build an orthogonal matrix that minimizes that objective function. More specifically, we choose as objective function the graph cut size, as its minimization leads to identifying clusters. We consider the general case of directed graphs, which subsumes undirected graphs as a particular case. The cut function is a set function and its minimization is NP-hard; however, exploiting the sub-modularity property of the cut size, it has been shown that there exists a lossless convex relaxation of the cut size, named its Lovász extension [23], [24], whose minimization preserves the optimality of the solution of the original non-convex problem. Interestingly, the Lovász extension of the cut size gives rise to an alternative definition of total variation of a graph signal that captures the edges' directivity. Furthermore, in the case of undirected graphs, the Lovász extension reduces to the ℓ1-norm total variation of a graph signal, which represents the discrete counterpart of the total variation of continuous-time signals, a quantity that plays a fundamental role in the continuous-time Fourier Transform, see, e.g., [13], [17]. We define the GFT basis as the set of orthonormal vectors that minimize the Lovász extension of the cut size. Unfortunately, even though the objective function is convex, the resulting problem is non-convex because of the orthogonality constraint imposed on the basis vectors. Thus, to find a (possibly local) solution of the problem in an efficient manner, we exploit two recently developed methods that are specifically tailored to handle non-convex orthogonality constraints, namely, the splitting orthogonality constraints (SOC) method [25] and the proximal alternating minimized augmented Lagrangian (PAMAL) method [26]. The SOC method is quite simple to implement and, even if no convergence proof has been provided yet, extensive numerical results validate the effectiveness and robustness of such a strategy. Conversely, the PAMAL algorithm, which hybridizes the augmented Lagrangian method and the proximal minimization scheme, is known to guarantee convergence. Furthermore, any limit point of each sequence generated by the PAMAL method satisfies the Karush-Kuhn-Tucker conditions of the original non-convex problem [26]. Finally, to prevent the resulting basis vectors from being excessively sparse, we consider the minimization of a continuous relaxation of the balanced cut size. To solve the corresponding non-convex fractional problem, we adopt an efficient and convergent algorithm based on the explicit-implicit gradient method [27].

The paper is organized as follows. Sec. II introduces the graph signal variations as the continuous Lovász extension of the min-cut size. In Sec. III, we define the GFT as the set of optimal orthonormal vectors minimizing the graph signal variation, and in Sec. IV we illustrate the optimization methods used for solving the resulting non-convex problem. Then, in Sec. V we conceive the GFT as the solution of a balanced min-cut problem, while Sec. VI illustrates some numerical examples validating the effectiveness of the proposed approaches. Finally, Sec. VII draws some conclusions.

II. MIN-CUT SIZE AND ITS LOVÁSZ EXTENSION

In this section, we recall the definitions of cut size and Lovász extension, as they will form the basic tools for our definition of the GFT. We consider a graph G = {V, E} consisting of a set of N vertices (or nodes) V = {1, . . . , N} along with a set of edges E = {a_{ij}}_{i,j∈V}, such that a_{ij} > 0 if there is a direct link from node j to node i, or a_{ij} = 0 otherwise. We denote by |V| the cardinality of V, i.e. the number of elements of V. A signal s on a graph G is defined as a mapping from the vertex set to a real vector of size N = |V|, i.e. s : V → R. Let A denote the N × N adjacency matrix with entries given by the edge weights a_{ij} for i, j = 1, . . . , N. The graph Laplacian is defined as L := D − A, where the in-degree matrix D is a diagonal matrix whose i-th diagonal entry is d_i = \sum_j a_{ij}.

One of the basic operations over graphs is clustering, i.e. the partition of the graph into disjoint subgraphs, such that the vertices within each subgraph (cluster) are highly interconnected, whereas there are only a few links between different clusters. Finding a good partition can be formulated as the minimization of the cut size [28], whose definition is reported here below. Let us consider a subset of vertices S ⊂ V, and its complement set in V, denoted by S̄. The edge boundary of S is defined as the set of edges with one end in S and the other end in S̄. The cut size between S and S̄ is defined as the sum of the weights over the boundary [28], i.e.

cut(S, S̄) := \sum_{i∈S, j∈S̄} a_{ji}.   (1)

Finding the partition that minimizes the cut size in (1) is an NP-hard problem. To overcome this difficulty, we exploit the sub-modularity property of the cut size [24], which ensures that its Lovász extension is a convex function [24]. We briefly recall some of the main definitions and properties here below. Given the set V and its power set 2^V, i.e. the set of all its subsets, let us consider a real-valued set function F : 2^V → R. The cut size in (1) is an example of a set function, with F(S) := cut(S, S̄). Every element of the power set 2^V may be associated to a vertex of the hyper-cube {0, 1}^N. Namely, a set S ⊆ V can be uniquely identified with the indicator vector 1_S, i.e. the vector which is 1 at entry j if j ∈ S, and 0 otherwise. Then, a set function F can be defined on the vertices of the hyper-cube {0, 1}^N. The Lovász extension of a set function F [23], [24] allows the extension of a set function defined on the vertices of the hyper-cube {0, 1}^N to the full hypercube [0, 1]^N, and hence to the entire space R^N. We recall its definition hereafter.

Definition 1: Let F : 2^V → R be a set function with F(∅) = 0. Let x ∈ R^N be ordered w.l.o.g. in increasing order, such that x_1 ≤ x_2 ≤ . . . ≤ x_N. Define C_0 ≜ V and C_i ≜ {j ∈ V : x_j > x_i} for i > 0. Then, the Lovász extension f : R^N → R of F, evaluated at x, is given by:

f(x) = \sum_{i=1}^{N} x_i (F(C_{i−1}) − F(C_i)) = \sum_{i=1}^{N−1} F(C_i)(x_{i+1} − x_i) + x_1 F(V).   (2)
Note that f(x) is piecewise affine w.r.t. x, and F(S) = f(1_S) for all S ⊆ V. An interesting class of set functions is given by the submodular set functions, whose definition follows next.

Definition 2: A set function F : 2^V → R is submodular if and only if, ∀ A, B ⊆ V, it satisfies the following inequality:

F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B).

A fundamental property of a submodular set function is that its Lovász extension is a convex function. This is formally stated in the following proposition [24, p. 23].

Proposition 1: Let F : 2^V → R be a submodular function and f be its Lovász extension. Then, it holds

\min_{S⊆V} F(S) = \min_{x∈{0,1}^N} f(x) = \min_{x∈[0,1]^N} f(x).

Moreover, the set of minimizers of f(x) on [0, 1]^N is the convex hull of the minimizers of f(x) on {0, 1}^N.

The cut size function in (1) is known for being submodular, see, e.g., [24], [29]. More specifically, as shown in [24, p. 54], the cut function is equal to a positive linear combination of the functions G_{ij} : S ↦ (1_S)_i [1 − (1_S)_j], i.e.

cut(S) = \sum_{i,j∈V} a_{ji} G_{ij}.

The function G_{ij} is the extension to V of a function G̃_{ij} defined only on the power set of {i, j}, where G̃_{ij}({i}) = 1 and all other values are zero, so that, from (2), its Lovász extension is G̃_{ij}(x_i, x_j) = [x_i − x_j]_+ with [y]_+ := max{y, 0}. Therefore, the Lovász extension of the cut size function, in the general case of directed graphs, is given by:

f(x) = \sum_{i,j=1}^{N} a_{ji} [x_i − x_j]_+ := GDV(x).   (3)

We term this function the Graph Directed Variation (GDV), as it captures the edges' directivity. For undirected graphs, imposing a_{ij} = a_{ji}, the Lovász extension of the cut size boils down to

f(x) = \sum_{i,j=1, i>j}^{N} a_{ji} |x_i − x_j| := GAV(x).   (4)

Interestingly, this function, which we call the Graph Absolute Variation (GAV), represents the discrete counterpart of the ℓ1-norm total variation, which plays a key role in the classical Fourier Transform of continuous-time signals [13], [17].

It is easy to show that the directed variation GDV satisfies the following properties:
i) GDV(x) ≥ 0, ∀ x ∈ R^N;
ii) GDV(x) = 0, ∀ x = c1 with c ≥ 0;
iii) GDV(α x) = α GDV(x), ∀ α ≥ 0, i.e. it is positively homogeneous;
iv) GDV(x + y) ≤ GDV(x) + GDV(y), ∀ x, y ∈ R^N.

GDV is neither a proper norm nor a semi-norm since, in the latter case, it should be absolutely homogeneous. However, it meets the desired property ii), ensuring that a constant graph signal has zero total variation.

III. GRAPH FOURIER BASIS AND DIRECTED TOTAL VARIATION

Alternative definitions of the GFT have been proposed in the literature, depending on the different perspectives used to emphasize specific signal features. In the case of undirected graphs, the GFT of a vector s was defined as [5]

ŝ = U^T s,   (5)

where the columns of the matrix U are the eigenvectors of the Laplacian matrix L, i.e. L = U Λ U^T. This definition is basically rooted in the clustering properties of these eigenvectors, see, e.g., [30]. In fact, by definition of eigenvector, the Fourier basis used in (5) can be thought of as the solution of the following sequence of optimization problems:

u_k = \arg\min_{u_k ∈ R^N} u_k^T L u_k := \arg\min_{u_k ∈ R^N} GQV(u_k)
s.t. u_k^T u_ℓ = δ_{kℓ}, ℓ = 1, . . . , k,   (6)

for k = 2, . . . , N, where δ_{kℓ} is the Kronecker delta, and we used the property that the quadratic form built on the Laplacian equals the ℓ2-norm total variation, or graph quadratic variation (GQV), i.e.

GQV(x) := \sum_{i,j=1, j>i}^{N} a_{ji} (x_i − x_j)^2.

Thus, the Fourier basis obtained from (6) coincides with the set of orthonormal vectors that minimize the ℓ2-norm total variation. In all applications where the graph signals exhibit a cluster behavior, meaning that the signal is relatively smooth within each cluster, whereas it can vary arbitrarily from cluster to cluster, the GFT defined as in (5) helps emphasizing the presence of clusters [30]. However, the identification of the Laplacian eigenvectors as the orthonormal vectors that minimize the GQV is only valid for undirected graphs, for which the quadratic form built on the Laplacian reduces to the GQV. For directed graphs, the quadratic form in (6) captures only properties associated to the symmetrized Laplacian (i.e., L_s = (L + L^T)/2), and hence it cannot capture the edges' directivity. The generalization to directed graphs was proposed in [4] as

ŝ = V^{−1} s,   (7)

where V comes from the Jordan decomposition of the non-symmetric adjacency matrix A, i.e. A = V J V^{−1}. To estimate variations of the graph Fourier basis and to identify an order among frequencies, the total variation of a vector was defined in [4] as

TV_A(s) = ‖s − A_{norm} s‖_1,   (8)

where A_{norm} := A/|λ_{max}(A)|. The previous definition leads to the elegant theory of algebraic signal processing over graphs [1], [4], [15], [16]. However, there are some critical issues associated to that definition that need to be further explored. First, the definition of total variation as given in (8) does not ensure that a constant graph signal has zero total variation, and this collides with the common meaning of total variation [13], [17], [18]. Second, the columns of V are linearly independent complex generalized eigenvectors, but in general they are not orthogonal. This gives rise to a GFT that does not preserve
inner products when passing from the observation to the transformed domain. Furthermore, the computation of the Jordan decomposition incurs serious and intractable numerical instabilities when the graph size exceeds even moderate values [19], and more stable matrix decomposition methods have to be adopted to tackle its instability issues [20]. To overcome some of these criticalities, very recently the authors of [14] proposed a shift operator based on the directed Laplacian of a graph. Using the Jordan decomposition, the graph Laplacian is decomposed as

L = V_L J_L V_L^{−1}   (9)

and the GFT is defined in [14] as

ŝ = V_L^{−1} s.   (10)

To quantify oscillations in the graph harmonics and to order the frequencies, the total variation was defined in [14] as

TV_L(s) = ‖L s‖_1.   (11)

This definition of total variation ensures a zero value for constant graph signals. Furthermore, the eigenvalues with small absolute value correspond to low frequencies. Nevertheless, the GFT given by F = V_L^{−1} is still a non-unitary transform, and its computation is affected by the numerical instabilities associated to the Jordan decomposition.

In this paper, we propose a novel method to build the graph Fourier basis as the set of N orthonormal vectors x_i, i = 1, . . . , N, that minimize the total variation defined in (3), which represents the continuous convex Lovász extension of the graph cut size in (1). The first vector is certainly the constant vector, i.e. x_1 = b1, with b = 1/√N, as this (unit-norm) vector yields a total variation equal to zero. Let us introduce the matrix X := (x_1, . . . , x_N) ∈ R^{N×N} containing all the basis vectors. Thus, the search for the GFT basis can be formally stated as the search for the orthonormal vectors that minimize the directed total variation in (3), i.e.

\min_{X ∈ R^{N×N}} GDV(X) := \sum_{k=1}^{N} GDV(x_k)   (P)
s.t. X^T X = I, x_1 = b1.

The constraints are used to find an orthonormal basis and to prevent the trivial null solution. Although the objective function is convex, problem P is non-convex due to the orthogonality constraint. In the next section, we present two alternative optimization strategies aimed at solving the non-convex, non-differentiable problem P in an efficient manner.

IV. OPTIMIZATION ALGORITHMS

To avoid handling the non-convex orthogonality constraints directly, several methods have been proposed in the literature based on the solution of a sequence of unconstrained problems approaching the feasibility condition, such as the penalty methods [31], [32] and the augmented Lagrangian based methods [33], [34]. The penalty method is generally simple, but it suffers from slow convergence and ill-conditioning. On the other hand, the standard augmented Lagrangian method solves a sequence of sub-problems that usually have no analytical solutions, and the choice of initial points ensuring a fast convergence rate is usually nontrivial. To cope with these issues, in this section we present two alternative iterative algorithms to solve the non-convex, non-smooth problem P, hinging on some recently developed methods for solving non-differentiable problems with non-convex constraints [25], [26]. The first method, introduced in [25], called the splitting orthogonality constraints (SOC) method, is based on the alternating direction method of multipliers (ADMM) [35], [36] and the split Bregman method [37], [38]. The SOC method leads to some important benefits, as it is simple to implement and the resulting non-convex sub-problem with orthonormality constraint admits a closed-form solution. Although no convergence proof of the SOC method has been provided yet, numerical results validate its value and robustness.

An alternative optimization method that tackles the non-convex minimization problem P and guarantees convergence is the PAMAL algorithm recently developed in [26]. The algorithm combines the augmented Lagrangian method with proximal alternating minimization. A convergence proof was provided in [26]. More specifically, this method has the so-called sub-sequence convergence property, i.e. there exists at least one convergent sub-sequence, and any limit point satisfies the Karush-Kuhn-Tucker (KKT) conditions of the original nonconvex problem. Building on these algorithms, in the sequel we introduce two efficient optimization strategies that build the basis for the Graph Fourier Transform as the solution of problem P.

A. SOC method

The SOC algorithm was developed in [25] and tackles orthogonality constrained problems by iteratively solving a convex problem and a quadratic problem that admits a closed-form solution. More specifically, introducing an auxiliary variable P = X to split the orthogonality constraint, problem P is equivalent to

\min_{X,P ∈ R^{N×N}} GDV(X)
s.t. X = P, x_1 = b1, P^T P = I.   (12)

The first constraint is linear and, as discussed in [25], it can be handled using Bregman iteration. Therefore, by adding the Bregman penalty function [37], problem (12) is equivalent to the following simple two-step procedure:

(X^k, P^k) ≜ \arg\min_{X,P ∈ R^{N×N}} GDV(X) + (β/2) ‖X − P + B^{k−1}‖_F^2
s.t. x_1 = b1, P^T P = I;
B^k = B^{k−1} + X^k − P^k,

where β is a strictly positive constant. Similarly to ADMM and split Bregman iteration [39], the above problem can be
solved by iteratively minimizing with respect to X and P:

1. X^k ≜ \arg\min_{X ∈ R^{N×N}} GDV(X) + (β/2) ‖X − P^{k−1} + B^{k−1}‖_F^2
   s.t. x_1 = b1   (P^k)
2. P^k ≜ \arg\min_{P ∈ R^{N×N}} ‖P − (X^k + B^{k−1})‖_F^2
   s.t. P^T P = I   (Q^k)
3. B^k = B^{k−1} + X^k − P^k.   (13)

The interesting aspect of this formulation is that subproblem P^k is convex and the second constrained quadratic problem Q^k has a closed-form solution, as illustrated in the following proposition.

Proposition 2: Define Y^k = X^k + B^{k−1} and let

Y^k = Q̄ S R̄^T

be its SVD decomposition, where Q̄, R̄ ∈ R^{N×N} are unitary matrices, and S ∈ R^{N×N} is the diagonal matrix whose entries are the singular values of Y^k. Then, the optimal solution of the quadratic non-convex problem Q^k in (13) is P^k = Q̄ R̄^T.

Proof. See the proof of Theorem 2.1 in [25].

Combining (13) and Proposition 2, the main steps of the SOC method are summarized in Algorithm 1.

Algorithm 1: SOC method
Set β > 0, X^0 ∈ R^{N×N}, X^{0T} X^0 = I, x_1^0 = b1, P^0 = X^0, B^0 = 0, k = 1.
Repeat
  Find X^k as the solution of P^k in (13),
  Y^k = X^k + B^{k−1},
  Compute the SVD decomposition Y^k = Q̄ S R̄^T,
  P^k = Q̄ R̄^T,
  B^k = B^{k−1} + X^k − P^k,
  k = k + 1,
until convergence.

It is important to remark that the choice of the coefficient β strongly affects the convergence behavior of the algorithm: a large value of β will force a stronger equality constraint, while a too small β might not be able to guarantee that the solution satisfies the orthogonality constraint. Hence, a proper tuning of the coefficient β is important to ensure fast convergence of the algorithm. Although, as remarked in [25], the convergence analysis of the SOC algorithm is still an open problem, we will show next that the numerical results testify to the validity and robustness of this method when applied to our case.

B. PAMAL method

As an alternative efficient method to tackle the non-convexity of problem P, we propose here an approach based on the PAMAL algorithm [26]. The method solves the orthogonality constrained problem by iteratively updating the primal variables and the multiplier estimates. To this end, let us reformulate the problem as follows. Let us introduce the sets S_1, defined as S_1 ≜ {x = ±b1}, and St ≜ {P ∈ R^{N×N} : P^T P = I}, which represents the Stiefel manifold [40]. For any set S, its indicator function is defined as

δ_S(X) = 0, if X ∈ S; +∞, otherwise.   (14)

Given these symbols, problem (12) is equivalent to the following one:

\min_{X,P ∈ R^{N×N}} f(X, P) ≜ GDV(X) + δ_{S_1}(x_1) + δ_{St}(P)   (P̃)
s.t. H(X, P) ≜ P − X = 0.

The basic idea to solve a problem in the form of P̃ was proposed in [26], and combines the augmented Lagrangian method [33], [41] with the alternating proximal minimization algorithm. The result is known as the PAM method [42], which deals with non-smooth, non-convex optimization. According to the augmented Lagrangian method, we add a penalty term to the objective function in order to associate a high cost to unfeasible points. In particular, the augmented Lagrangian function associated to the non-smooth problem P̃ is

L(X, P, Λ) = f(X, P) + ⟨Λ, H(X, P)⟩ + (ρ/2) ‖H(X, P)‖_F^2,

where ρ is a positive penalty coefficient, Λ ∈ R^{N×N} represents the multipliers matrix, while the matrix inner product is defined as ⟨A, B⟩ ≜ tr(A^T B). The proposed augmented Lagrangian method reduces problem P̃ to a sequence of problems that alternately update, at each iteration k, the following three steps:

1. Compute the critical point (X^k, P^k) of the function L(X, P, Λ^k; ρ^k) by solving
   (X^k, P^k) ≜ \min_{X,P ∈ R^{N×N}} L(X, P, Λ^k; ρ^k);   (15)
2. Update the multiplier estimates Λ^k;
3. Update the penalty parameter ρ^k.

We will show next how to implement the previous steps, which are described in detail in Algorithm 2.

Computation of the critical points (X^k, P^k). The optimal solution (X^k, P^k) of problem (15) is computed using an approximate algorithm, i.e. finding a subgradient point Θ^k ∈ ∂L(X^k, P^k, Λ^k; ρ^k) satisfying, with a prescribed tolerance value ε_k, the following inequality

‖Θ^k‖_∞ ≤ ε_k   (16)

with P^k ∈ St. To evaluate such a point, we exploit a coordinate-descent method with proximal regularization based on the PAM method proposed in [43]. More specifically, at the k-th outer iteration of the algorithm, we compute (X^k, P^k) by iteratively solving, at each inner iteration n, the following proximal regularization of a two-block Gauss-Seidel method:

X^{k,n} = \arg\min_{X ∈ R^{N×N}, x_1 = b1} L(X, P^{k,n−1}, Λ^k; ρ^k) + (c_1^{k,n−1}/2) ‖X − X^{k,n−1}‖_F^2   (P̃^{k,n})
P^{k,n} = \arg\min_{P ∈ R^{N×N}} L(X^{k,n−1}, P, Λ^k; ρ^k) + (c_2^{k,n−1}/2) ‖P − P^{k,n−1}‖_F^2   (Q̃^{k,n})
The inner iterations are stopped as soon as ‖Θ^{k,n}‖_∞ ≤ ε_k, with P^{k,n} ∈ St, where Θ^{k,n} ≜ (Θ_1^{k,n}, Θ_2^{k,n}) and the subgradients are given by

Θ_1^{k,n} = c_1^{k,n−1} (X^{k,n−1} − X^{k,n}) + ρ^k (P^{k,n−1} − P^{k,n})
Θ_2^{k,n} = c_2^{k,n−1} (P^{k,n−1} − P^{k,n}).   (18)

Algorithm 3 (closing steps): Step 3: Set (X^k, P^k) = (X^{k,n}, P^{k,n}), Θ^k = Θ^{k,n}, until ‖Θ^{k,n}‖_∞ ≤ ε_k.

Update of the multipliers and penalty coefficients. The rule for updating the multipliers matrix in Step 2 of Algorithm 2 needs some further discussion. We adopt the classical first-order approximation by imposing that the estimates of the multipliers must be bounded. Then, we explicitly project the multipliers matrix onto the compact box set T ≜ {Λ : Λ_min ≤ Λ ≤ Λ_max}, with −∞ < [Λ_min]_{i,j} ≤ [Λ_max]_{i,j} < ∞, ∀ i, j. The boundedness of the multipliers is a fundamental assumption needed to preserve the property that global minimizers of the original problem are obtained if each outer iteration of the penalty method computes a global minimum of the subproblem. Unfortunately, assumptions that imply boundedness of the multipliers tend to be very strong and often hard to verify. Nevertheless, following [26], [41], [44], we also impose the boundedness of the multipliers. This implies that, in the convergence proofs, we will assume that the true multipliers fall within the bounds imposed by the algorithm, see, e.g., [26]. Regarding the setting of the remaining parameters of the proposed algorithm, we will assume that: i) the sequence of positive tolerance parameters {ε_k}_{k∈N} is chosen such that lim_{k→∞} ε_k = 0; ii) the penalty parameter ρ_k is updated according to the infeasibility degree, following the rule described in step 3 of Algorithm 2 [26], [33].

Convergence Analysis. We now discuss in detail the convergence properties of the proposed PAMAL method. Assume that: i) the proximal parameters {c_i^{k,n}}_{∀k,n} are arbitrarily chosen as long as they satisfy (17); ii) the sequence {ε_k}_{k∈N} is chosen such that lim_{k→∞} ε_k = 0; iii) the penalty parameter ρ_k is updated according to the rule described in Algorithm 2. The PAM method, as given in Algorithm 3, guarantees global convergence to a critical point [43, Th. 6.2], provided that the penalty parameters {ρ_k}_{k∈N} in Algorithm 2 satisfy some mild conditions, as stated in the following theorem.

Theorem 1: Denote by {(X^{k,n}, P^{k,n})}_{n∈N} the sequence generated by Algorithm 3. The function L_k in (15) satisfies the Kurdyka-Łojasiewicz (K-Ł) property¹. Then Θ^{k,n} defined by (18) satisfies

Θ^{k,n} ∈ ∂L(X^{k,n}, P^{k,n}, Λ^k; ρ^k), ∀ n ∈ N.   (19)

Also, if γ > 1, ρ_1 > 0, for each k ∈ N it holds

‖Θ^{k,n}‖_∞ → 0, as n → ∞.   (20)

Proof. See Appendix B.

¹ The reader can refer to Appendix B for a definition of the Kurdyka-Łojasiewicz (K-Ł) property.

The convergence claim for Algorithm 2 to a stationary solution of problem P̃ is stated in the following theorem.

Theorem 2: Let {(X^k, P^k)}_{k∈N} be the sequence generated by Algorithm 2. Suppose ρ_1 > 0 and γ > 1. Then, the set of limit points of {(X^k, P^k)}_{k∈N} is non-empty, and every limit point satisfies the KKT conditions of the original problem P̃.

Proof. The proof follows similar arguments as in [26, Th. 3.1-3.5], and thus is omitted due to space limitations.

Remark 1. Note that both Algorithms 1 and 3 have to compute, at each step of their loops, the SVD of an N × N matrix. Therefore, at each iteration their computational cost is proportional to O(N³). So, clearly, there is a complexity issue that deserves further investigation to enable the application to large-size graphs. In this paper, we have not investigated methods to reduce the complexity of the approach by exploiting, for instance, the sparsity of the graphs under analysis. Also, we have not optimized the selection of the parameters involved in both SOC and PAMAL methods. However, even if complexity is an issue, the proposed approach is more numerically stable than the only method available today for the analysis of directed graphs, based on the Jordan decomposition.

Remark 2. The two alternative methods proposed above to solve the non-convex problem P are robust to random initializations, as testified also by the numerical results presented in the sequel. In terms of implementation complexity, the SOC algorithm is easier to code even though, to the best of our

Algorithm 4: Balanced graph signal variation
For k = 2, . . . , N
  Set n = 0, x_k^n = x^0, a nonzero vector with m(x_k^0) = 0, α > 0, 0 < ε ≪ 1.
  Repeat
    w^n ∈ sign(x_k^n),
    v^n = w^n − mean(w^n) 1,
    h^n = x_k^n + α v^n,
    x̂_k^{n+1} = \arg\min_{x_k ∈ X̂_k} f(x_k) + (E(x_k^n)/(2α)) ‖x_k − h^n‖_2^2,
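Reading the recoverable steps of Algorithm 4 as one explicit-implicit (forward-backward) iteration, a hedged Python sketch follows. The functions gdv_prox and E stand in for the proximal step of the variation f and for the balanced-cut energy defined in Sec. V; both are assumptions of this sketch, not code from the paper.

    import numpy as np

    def balanced_variation_step(x, gdv_prox, E, alpha):
        # One inner iteration of Algorithm 4 (sketch). `E(x)` is the balanced-cut
        # energy (its definition is in Sec. V) and `gdv_prox(h, t)` is assumed to
        # return arg min_x f(x) + (1/(2t)) ||x - h||_2^2 over the feasible set.
        w = np.sign(x)          # an element of sign(x_k^n); np.sign(0) = 0 here,
                                # whereas any value in [-1, 1] would be admissible
        v = w - np.mean(w)      # v^n = w^n - mean(w^n) 1
        h = x + alpha * v       # explicit (gradient) half-step
        return gdv_prox(h, alpha / E(x))   # implicit (proximal) half-step

With the identification 1/(2t) = E(x_k^n)/(2α), i.e. t = α/E(x_k^n), the last line matches the update for x̂_k^{n+1} displayed in the box above.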
Fig. 2: Optimal basis vectors x_k, k = 1, . . . , 15, for Algorithm 2 and the directed graph in Fig. 1a (panel titles: GDV(x_4) = 2.24, GDV(x_5) = 2.46, GDV(x_6) = 3, GDV(x_7) = 3.37, GDV(x_8) = 3.46, GDV(x_9) = 3.53, GDV(x_10) = 3.67, GDV(x_11) = 4.08, GDV(x_12) = 4.25, GDV(x_13) = 4.6, GDV(x_14) = 4.6, GDV(x_15) = 5).

Fig. 3: Optimal basis vectors x_k, k = 1, . . . , 15, for Algorithm 2 and the graph in Fig. 1b (panel titles: GDV(x_4) = 2.7, GDV(x_5) = 2.73, GDV(x_6) = 3, GDV(x_7) = 3.2, GDV(x_8) = 3.5, GDV(x_9) = 3.5, GDV(x_10) = 3.5, GDV(x_11) = 4.1, GDV(x_12) = 4.2, GDV(x_13) = 4.2, GDV(x_14) = 4.5, GDV(x_15) = 5).
unique and an interesting consequence of the edge directivity. In fact, as can be observed from Fig. 5, the optimal bases for the corresponding undirected graph (obtained by simply removing edge directivity) have only one vector with zero variation, the constant vector. Conversely, in the case shown before, we had three, two, and one vectors yielding zero variation.

Fig. 4: Optimal basis vectors x_k, k = 1, . . . , 15, for Algorithm 2 and the graph in Fig. 1c (panel titles: GDV(x_1) = 0, GDV(x_2) = 0.54, GDV(x_3) = 1.3, GDV(x_4) = 2.4, GDV(x_5) = 3, GDV(x_6) = 3.3, GDV(x_7) = 3.3, GDV(x_8) = 3.4, GDV(x_9) = 3.4, GDV(x_10) = 3.5, GDV(x_11) = 4, GDV(x_12) = 4.2, GDV(x_13) = 4.5, GDV(x_14) = 5, GDV(x_15) = 5.2).

Fig. 5: Optimal basis vectors x_k, k = 1, . . . , 15, for Algorithm 2 and the undirected counterpart of the graph in Fig. 1c (panel titles: GAV(x_1) = GQV(x_1) = 0; GAV(x_2) = 2.68, GQV(x_2) = 2.6; GAV(x_3) = 2.8, GQV(x_3) = 1.6; GAV(x_4) = 2.88, GQV(x_4) = 2.39; GAV(x_5) = 3.65, GQV(x_5) = 2.93; GAV(x_6) = 3.88, GQV(x_6) = 3.39; GAV(x_7) = 3.88, GQV(x_7) = 3.39; GAV(x_8) = 3.88, GQV(x_8) = 3.39; GAV(x_9) = 4, GQV(x_9) = 2.93; GAV(x_10) = 4.08, GQV(x_10) = 3.66; GAV(x_11) = 4.11, GQV(x_11) = 3.61; GAV(x_12) = 4.49, GQV(x_12) = 4.16; GAV(x_13) = 4.61, GQV(x_13) = 4.33; GAV(x_14) = 4.94, GQV(x_14) = 4.5; GAV(x_15) = 5.65, GQV(x_15) = 4.9).

Convergence test. Since the optimization problem P is non-convex, there is of course the possibility that the proposed methods fall into a local minimum. Furthermore, while the PAMAL method guarantees convergence, the SOC algorithm might also fail to converge because, theoretically speaking, there is no convergence analysis. To test what happens, we considered several independent initializations of both SOC and PAMAL algorithms in the search for a basis for the graph of Fig. 1a. In Fig. 6, we report the average behavior (± the standard deviation) of the directed variation versus the iteration index m, which counts the overall number of (outer and inner) iterations for Algorithms 1 and 2. The curves refer to 200 independent initializations of the SOC and PAMAL algorithms, using the same initialization for both. We can observe that in all cases the algorithms converge, but indeed there is a spread in the final variation, meaning that both methods can incur local minima. Nonetheless, the spread is quite limited, which suggests that bases associated to different local minima behave similarly in terms of total variation. Additionally, since the PAMAL algorithm solves the orthogonality constrained, non-convex problem by iteratively updating the primal variables and the multipliers, the objective function evaluated at each (inner and outer) iteration does not necessarily follow a monotonic decay, as can be noticed in the lower subplot of Fig. 6.

Comparison with alternative GFT bases. We compare now the GFT basis found with our methods with the bases associated to either the Laplacian or the adjacency matrix, as proposed in [4], [5] and references therein. To compare the results, we applied all algorithms to several independent realizations of random graphs. We chose as family of random graphs the so-called scale-free graphs, as they are known to fit many situations of practical interest [51]. In the generation of random scale-free graphs, it is possible to set the minimum degree d_min of each node. To compare our method with the GFT definition proposed in [1], since the eigenvectors of an asymmetric matrix can be complex and the directed total variation GDV, as defined in (3), does not represent a valid metric for complex vectors, we restricted the comparison to undirected scale-free graphs, in which case the adjacency and Laplacian matrices are real and symmetric, so that their eigenvectors are real. In the sequel, we will use the notations GAV(X) := \sum_{k=1}^{N} GAV(x_k) and GQV(X) := \sum_{k=1}^{N} GQV(x_k) to denote, respectively, the total graph absolute and quadratic variation of a matrix X. In Fig. 7, we compare the following metrics: a) GAV(X*), derived by solving problem P through the SOC and PAMAL methods; b) GAV(V), where V are the eigenvectors of the adjacency matrix according to the GFT defined in (7); c) GAV(U), where U are the eigenvectors of the Laplacian matrix, assuming the GFT as in (5), which for undirected graphs is equivalent to the GFT defined in (10). More specifically, Fig. 7 shows the previous metrics vs. the minimum degree of the graph, averaged over 100 independent realizations of scale-free graphs of N = 20 nodes. As we can notice from Fig. 7, the bases built using the SOC and PAMAL algorithms yield a significantly lower total variation than the conventional bases built with either adjacency or Laplacian eigenvectors. This is primarily due to the fact that our optimization methods tend to assign constant values within each cluster. Finally, in Fig. 8 we compare the alternative basis vectors using the GQV as performance metric. So, in Fig. 8 we report the GQV(X*) metric derived from the SOC and PAMAL methods, together with GQV(V) and GQV(U) obtained, respectively, from the eigenvectors of the adjacency and the Laplacian matrix. Again, the results are averaged over 100 independent realizations of scale-free graphs, vs. the average minimum degree, under the same settings of Fig. 7. Interestingly, even if our basis vectors X* do not coincide with V or U, they provide the same GQV, within negligible numerical inaccuracies. Indeed, the invariance of the metric GQV(X) for any square, orthogonal matrix X can be easily proved from the equality GQV(X) = \sum_{k=1}^{N} x_k^T L x_k = trace(X^T L X), by observing that trace(X^T L X) = trace(L) for any orthogonal matrix X. Interestingly, this implies that, for undirected graphs, our orthogonal matrix X* can be obtained by applying an orthogonal transform to the Laplacian eigenvector basis.

Complexity issues. Clearly, looking at both SOC and PAMAL methods, complexity is a non-trivial issue which deserves further investigation, especially when the size of the graph increases. To get an idea of the computing time, in Fig. 9 we report the execution time of both SOC and PAMAL algorithms as a function of the number of vertices in the graph. The results have been obtained by running a non-compiled Matlab program, with no optimization of the parameters involved, setting ρ_1 = β = 20. The program ran on a laptop with an Intel Core i7-4500 processor (CPU 1.8-2.4 GHz). The graphs under test were generated as geometric random graphs with an equal percentage of directed links as N increases.
Fig. 6: Average directed variation (± the standard deviation) for the SOC and PAMAL methods vs. the iteration index m for the graph of Fig. 1a, averaging over 200 random initializations of the algorithms (two subplots: SOC algorithm and PAMAL algorithm; vertical axis GDV_m ± σ_m, horizontal axis iteration index m).

Fig. 7: Average absolute total variation versus the average minimum degree according to alternative GFT definitions for undirected scale-free graphs with N = 20 nodes (curves: GAV(X*) SOC, GAV(X*) PAMAL, GAV(V), GAV(U)).

Fig. 8: Average GQV versus the average minimum degree according to alternative GFT definitions for undirected scale-free graphs with N = 20 nodes (curves: GQV(X*) SOC, GQV(X*) PAMAL, GQV(V), GQV(U)).

Fig. 9: Execution time vs. the number of nodes for RGGs with 25% of directed links and β = ρ_1 = 20 (curves: SOC algorithm and PAMAL algorithm; execution time in minutes, log scale).
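For reference, a random geometric graph with a prescribed fraction of one-directional links, of the kind referred to in the caption of Fig. 9, can be generated along the following lines. The exact construction used for the experiments is not spelled out in the text, so the radius and the edge-orientation mechanism below are illustrative assumptions.

    import numpy as np

    def random_geometric_digraph(N, radius=0.3, directed_fraction=0.25, seed=0):
        # Nodes uniform in the unit square; undirected edges between nodes closer
        # than `radius`; then a fraction of the edges is made one-directional by
        # dropping one of the two orientations at random (illustrative choice).
        rng = np.random.default_rng(seed)
        pts = rng.random((N, 2))
        D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        A = ((D < radius) & (D > 0)).astype(float)
        iu, ju = np.triu_indices(N, 1)
        edges = np.array([(i, j) for i, j in zip(iu, ju) if A[i, j] > 0])
        k = int(directed_fraction * len(edges))
        for i, j in rng.permutation(edges)[:k]:
            if rng.random() < 0.5:
                A[i, j] = 0.0     # keep only the link from node i to node j
            else:
                A[j, i] = 0.0     # keep only the link from node j to node i
        return A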
Examples with real networks. As an application to real graphs, in Fig. 10 we consider the directed graph obtained from the street map of Rome, incorporating the true directions of the traffic lanes in the area around Mazzini square. The graph is composed of 239 nodes. Even though the scope of this paper is to propose a method to build a GFT basis, so that we do not dig further into applications, this is an example with interesting applications of GSP. The problem in this case is to build a map of vehicular traffic in a city, starting from a subset of measurements collected by road-side units or sent by cars equipped with ad hoc equipment. The problem can be interpreted as the reconstruction of the entire graph signal from a subset of samples, and then it builds on graph sampling theory [10]. In Fig. 11 we report some basis vectors obtained by using Algorithm 2 with ρ_1 = 10. We can observe that the basis vectors highlight clusters, while capturing the edges' directivity.

Fig. 10: Directed graph associated to the street map of Rome (Piazza Mazzini).

Balanced total variation. In some cases, the solution of the total variation problem in (12) can cut the graph into subsets of very different cardinality. As an extreme case, it is not uncommon to have a subset composed of only one node, with the other set containing all the rest of the network. To prevent such a behavior, Algorithm 4 aims at minimizing the balanced total variation. An example of its application to the graph of Fig. 10 is reported in Fig. 12, where we show some basis vectors computed using Algorithm 4. Comparing these vectors with the corresponding ones obtained with the PAMAL algorithm, see, e.g., Fig. 11, we can see how clusters of single nodes are now avoided.

VII. CONCLUSION

In this paper we have proposed an alternative approach to build an orthonormal basis for the Graph Fourier Transform (GFT). The approach considers the general case of a directed graph, and then it includes the undirected case as a particular example. The search method starts from the identification of an objective function and then looks for an orthonormal basis that minimizes that function. More specifically, motivated by the need to detect clustering behaviors in graph signals, we chose the cut size as objective function. We showed that this approach leads, without loss of optimality, to the minimization of a function that represents a directed total variation of graph signals, as it captures the edges' directivity. Interestingly, in the case of undirected graphs, this function converts into an ℓ1-norm total variation, which represents the graph (discrete) counterpart of the ℓ1-norm total variation that plays a key role in the classical Fourier Transform of continuous-time signals [17]. We compared our basis vectors with the eigenvectors of either the Laplacian or the adjacency matrix, assuming as performance metric either our graph absolute variation or the graph quadratic variation. As expected, our method outperforms the other methods when using the absolute variation, as it is built by minimizing that metric. However, what has been interesting to see is that our basis performs as well as the alternative bases when we assume the graph quadratic variation as performance metric. Before concluding, we wish to point out that, as always, our alternative approach to build a GFT basis has its own merits and shortcomings when compared to alternative approaches. For example, having restricted the search to the real domain, differently from available methods, our method fails to find the complex exponentials as the GFT basis in the case of circular graphs. Furthermore, other methods, like the ones in [1] starting from the identification of the adjacency matrix as the shift operator, are more suitable than our approach to devise a filtering theory over graphs.

APPENDIX

A. Closed-form solution for problem Q̃^{k,n}

In this section we provide a closed-form solution for the non-convex problem Q̃^{k,n}. This problem can be equivalently written as

P^{k,n} = \arg\min_{P ∈ R^{N×N}} g_{k,n−1}(P)
s.t. P^T P = I   (31)

where g_{k,n−1}(P) ≜ ⟨Λ^k, P − X^{k,n−1}⟩ + (ρ^k/2) ‖P − X^{k,n−1}‖_F^2 + (c_2^{k,n−1}/2) ‖P − P^{k,n−1}‖_F^2. Our proof consists of two steps: i) first, we find the stationary solutions by solving the KKT necessary conditions; ii) then, we prove that the resulting closed-form solution is a global minimum of the non-convex problem (31). The Lagrangian function L_P associated to (31) can be written as

L_P = ⟨Λ^k, P − X^{k,n−1}⟩ + (ρ^k/2) ‖P − X^{k,n−1}‖_F^2 + (c_2^{k,n−1}/2) ‖P − P^{k,n−1}‖_F^2 + ⟨Λ_1, P^T P − I⟩   (32)

where Λ_1 ∈ R^{N×N} is the multipliers matrix associated to the orthogonality constraint. The KKT conditions then become

a) ∇_P L_P = P[I(ρ^k + c_2^{k,n−1}) + 2Λ_1] − c_2^{k,n−1} P^{k,n−1} − ρ^k X^{k,n−1} + Λ^k = 0,
b) Λ_1 ⊥ P^T P − I = 0   (33)

where we chose Λ_1 = Λ_1^T. Hence, defining B ≜ I + 2Λ_1/(ρ^k + c_2^{k,n−1}), from equation a) one gets:

P B = F   (34)

with F ≜ (c_2^{k,n−1} P^{k,n−1} + ρ^k X^{k,n−1} − Λ^k)/(ρ^k + c_2^{k,n−1}). Let Q Σ T^T be the SVD decomposition of F. From (34), it turns out that

P B = Q Σ T^T   (35)

and, using the orthogonality condition b) in (33), it holds

B^T B = T Σ² T^T ⇒ B = T Σ T^T.   (36)

Therefore, replacing B in (35), we get

P T Σ T^T = Q Σ T^T ⇒ P = Q T^T.   (37)

It remains to prove that P* = P^{k,n} = Q T^T is a global minimum for problem (31). To this end, it is sufficient to show that

g_{k,n−1}(P*) ≤ g_{k,n−1}(P), ∀ P : P^T P = I   (38)

i.e., using the equalities ‖P*‖_F^2 = ‖P‖_F^2 = N, we have to prove that, ∀ P : P^T P = I, it results

trace(P*^T (Λ^k − ρ^k X^{k,n−1} − c_2^{k,n−1} P^{k,n−1})) ≤ trace(P^T (Λ^k − ρ^k X^{k,n−1} − c_2^{k,n−1} P^{k,n−1})).   (39)

Using the above definition of F, (39) reduces to

trace(P*^T F) ≥ trace(P^T F), ∀ P : P^T P = I   (40)

and since P* = Q T^T, the final inequality to hold true is

trace(Σ) ≥ trace(T^T P^T Q Σ), ∀ P : P^T P = I.   (41)
Fig. 11: Optimal basis vectors x_k, k = 3, 5, 17, 27, 29, 63, for Algorithm 2 and the graph in Fig. 10 (among the panel titles: GDV(x_3) = 0, GDV(x_5) = 0).
Fig. 12: Optimal basis vectors x_k, k = 2, . . . , 7, for Algorithm 4 and the graph in Fig. 10 (among the panel titles: GDV(x_2) = 0, GDV(x_3) = 0).
Define Z^T := T^T P^T Q, so that Z^T Z = I. Then, from (41) we get

trace(Σ) ≥ trace(Z^T Σ), ∀ Z : Z^T Z = I.   (42)

This last inequality holds because Σ_ii > 0 and Z_ii ≤ |Z_ii| ≤ 1, ∀ i, where the latter is implied by Z^T Z = I [40]. Additionally, Z_ii = 1, ∀ i, if and only if Z = I, so that the equality in (42) holds if and only if Z = I, or P* = Q T^T.

B. Proof of Theorem 1

For lack of space, we omit here the details of the proof, which proceeds using similar arguments as in the proof of Proposition 2.5 in [26]. However, to invoke this correspondence, we need to prove that the following properties hold true: i) the function L_k in (15) satisfies the Kurdyka-Łojasiewicz (K-Ł) property; ii) L_k is a coercive function. To prove point i), let us first introduce some definitions [52].

Definition 3: A semi-algebraic subset of R^n is a finite union of sets of the form

{x ∈ R^n : P_1(x) = 0, . . . , P_k(x) = 0, Q_1(x) > 0, . . . , Q_l(x) > 0}   (43)

where P_1, . . . , P_k and Q_1, . . . , Q_l are polynomials in n variables.

Definition 4: A function f : R^n → R is said to be semi-algebraic if its graph, defined as gph f := {(x, f(x)) | x ∈ R^n}, is a semi-algebraic set.

It is shown [cf. [42], Th. 3] that semi-algebraic functions satisfy the K-Ł property.

Definition 5: A function φ(x) satisfies the Kurdyka-Łojasiewicz (K-Ł) property at a point x̄ ∈ dom(∂φ) if there exists θ ∈ [0, 1) such that

|φ(x) − φ(x̄)|^θ / dist(0, ∂φ(x))   (44)

is bounded around x̄.

The global convergence of the PAM method established in [43] requires the objective function to satisfy the K-Ł property. Define W := (X, P) and consider the function L_k in (15), i.e.

L_k(W) = L(X, P, Λ^k; ρ^k) = f_1(X) + f_2(P) + g_k(X, P)   (45)

where f_1(X) = GDV(X), f_2(P) = δ_{St}(P) and g_k(X, P) = ⟨Λ^k, P − X⟩ + (ρ^k/2) ‖P − X‖_F^2. Observe that f_1(X) = \sum_{i,j=1}^{N} a_{ji} max(x_i − x_j, 0) is the weighted sum of the functions f_{ij}(x_i, x_j) = max(x_i − x_j, 0). Since a finite sum of semi-algebraic functions is also a semi-algebraic function, it is sufficient to show that f_{ij} is semi-algebraic. Assume, w.l.o.g., y_{ij} = x_i − x_j, so that z = f_{ij}(y_{ij}) = max(y_{ij}, 0). The graph of f_{ij} becomes

gph f_{ij} = {(y_{ij}, z) : z = y_{ij}, y_{ij} ≥ 0} ∪ {(y_{ij}, z) : z = 0, y_{ij} ≤ 0}

and, according to Definition 3, it is a semi-algebraic set. Then f_1(X), as a sum of semi-algebraic functions, is also semi-algebraic. Since f_2(P) and g_k(X, P) are semi-algebraic functions, it follows that L_k(W) is also semi-algebraic. It remains to prove point ii), i.e. to assess that L_k is a coercive function, that is, L_k(W) → ∞ when ‖W‖_∞ → ∞. Clearly, the term f_2(P) is coercive. The remaining terms in (45) can be written as

f_1(X) + g_k(X, P) = GDV(X) + (ρ^k/2) ⟨X, X⟩ − ⟨ρ^k P + Λ^k, X⟩ + ⟨Λ^k, P⟩ + (ρ^k/2) ‖P‖_F^2.

Since P ∈ St, it holds ‖P‖_F^2 = N. Thus, from the inequalities ⟨A, B⟩ ≥ −‖A‖_F ‖B‖_F and ‖B‖_F ≤ ‖B‖_1, it holds ⟨Λ^k, P⟩ ≥ −√N ‖Λ^k‖_1, so that one gets

f_1(X) + g_k(X, P) ≥ GDV(X) + (ρ^k/2) ⟨X, X⟩ − ρ^k ‖X‖_1 − ⟨Λ^k, X⟩ − √N ‖Λ^k‖_1 + (ρ^k N)/2

where we used the inequality ⟨ρ^k P, X⟩ ≤ ρ^k ‖X‖_1. Observe that the sequence {ρ^k}_{k∈N} is non-decreasing when γ > 1, so that ρ^k > ρ^1. Then the function f_1(X) + g_k(X, P) is coercive, GDV(X) + (ρ^k/2) ⟨X, X⟩ being a positive function.

REFERENCES

[1] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
[2] S. K. Narang and A. Ortega, "Perfect reconstruction two-channel wavelet filterbanks for graph structured data," IEEE Trans. Signal Process., vol. 60, no. 6, pp. 2786–2799, 2012.
[3] ——, "Compact support biorthogonal wavelet filter banks for arbitrary undirected graphs," IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4673–4685, 2013.
[4] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs: Frequency analysis," IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3042–3054, Jun. 2014.
[5] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.
[6] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Appl. Comput. Harmon. Anal., vol. 30, pp. 129–150, 2011.
[7] S. K. Narang, G. Shen, and A. Ortega, "Unidirectional graph-based wavelet transforms for efficient data gathering in sensor networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2010, pp. 2902–2905.
[8] I. Pesenson, "Sampling in Paley-Wiener spaces on combinatorial graphs," Trans. of the American Math. Society, vol. 360, no. 10, pp. 5603–5627, Oct. 2008.
[9] A. Agaskar and Y. M. Lu, "A spectral graph uncertainty principle," IEEE Trans. Inform. Theory, vol. 59, no. 7, pp. 4338–4356, Jul. 2013.
[10] M. Tsitsvero, S. Barbarossa, and P. Di Lorenzo, "Signals on graphs: Uncertainty principle and sampling," IEEE Trans. Signal Process., vol. 64, no. 18, pp. 4845–4860, Sep. 2016.
[11] M. Tsitsvero and S. Barbarossa, "On the degree of freedom of signals on graphs," in Proc. European Signal Process. Conf., Nice, Sep. 2015, pp. 1521–1525.
[12] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, "Discrete signal processing on graphs: Sampling theory," IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6510–6523, Dec. 2015.
[13] X. Zhu and M. Rabbat, "Approximating signals supported on graphs," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2012, pp. 3921–3924.
[14] R. Singh, A. Chakraborty, and B. S. Manoj, "Graph Fourier transform based on directed Laplacian," in Proc. Int. Conf. Signal Process. Commun. (SPCOM), Jun. 2016, pp. 1–5.
[15] M. Püschel and J. M. F. Moura, "Algebraic signal processing theory: Foundation and 1-D time," IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3572–3585, Aug. 2008.
[16] ——, "Algebraic signal processing theory: 1-D space," IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3586–3599, Aug. 2008.
[17] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 2009.
[18] F. Lozes, A. Elmoataz, and O. Lézoray, "Partial difference operators on weighted graphs for image processing on surfaces and point clouds," IEEE Trans. Image Process., vol. 23, no. 9, pp. 3896–3909, Sep. 2014.
[19] G. H. Golub and J. H. Wilkinson, "Ill-conditioned eigensystems and the computation of the Jordan canonical form," SIAM Review, vol. 18, no. 4, pp. 578–619, Oct. 1976.
[20] B. Girault, "Signal Processing on Graphs - Contributions to an Emerging Field," Theses, Ecole normale supérieure de Lyon - ENS LYON, Dec. 2015. [Online]. Available: https://tel.archives-ouvertes.fr/tel-01256044
[21] A. Gadde, A. Anis, and A. Ortega, "Active semi-supervised learning using sampling theory for graph signals," in Proc. 20th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, ser. KDD '14. New York, NY, USA: ACM, 2014, pp. 492–501.
[22] A. Anis, A. E. Gamal, S. Avestimehr, and A. Ortega, "Asymptotic justification of bandlimited interpolation of graph signals for semi-supervised learning," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2015, pp. 5461–5465.
[23] L. Lovász, "Submodular functions and convexity," in A. Bachem et al. (eds.), Math. Program. The State of the Art, Springer Berlin Heidelberg, pp. 235–257, 1983.
[24] F. Bach, "Learning with submodular functions: A convex optimization perspective," Foundations and Trends in Machine Learning, vol. 6, no. 2–3, pp. 145–373, 2013.
[25] R. Lai and S. Osher, "A splitting method for orthogonality constrained problems," J. Scientific Computing, vol. 58, no. 2, pp. 431–449, Feb. 2014.
[26] W. Chen, H. Ji, and Y. You, "An augmented Lagrangian method for ℓ1-regularized optimization problems with orthogonality constraints," SIAM J. Scientific Computing, vol. 38, no. 4, pp. B570–B592, 2016.
[27] X. Bresson, T. Laurent, D. Uminsky, and J. H. von Brecht, "Convergence and energy landscape for Cheeger cut clustering," in Advances in Neural Inform. Process. Systems (NIPS), 2012, pp. 1394–1402.
[28] M. Newman, Networks: An Introduction. New York, NY, USA: Oxford Univ. Press, 2010.
[29] L. Jost, S. Setzer, and M. Hein, "Nonlinear eigenproblems in data analysis: Balanced graph cuts and the ratioDCA-Prox," in Extraction of Quantifiable Information from Complex Systems, Springer Intern. Publishing, vol. 102, pp. 263–279, 2014.
[30] F. R. K. Chung, Spectral Graph Theory. American Math. Soc., 1997.
[31] J. Nocedal and S. J. Wright, Numerical Optimization. Springer, 2006.
[32] F. Bethuel, H. Brezis, and F. Hélein, "Asymptotics for the minimization of a Ginzburg-Landau functional," Calculus of Variations and Partial Differential Equations, vol. 1, no. 2, pp. 123–148, 1993.
[33] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Belmont, Massachusetts: Athena Scientific, 1999.
[34] M. Fortin and R. Glowinski, Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North Holland, 2000, vol. 15.
[35] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2010, vol. 3, no. 1.
[36] R. Glowinski and P. Le Tallec, Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics. SIAM, 1989.
[37] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, "Bregman iterative algorithms for ℓ1-minimization with application to compressed sensing," SIAM J. Imag. Sciences, vol. 1, pp. 143–168, 2008.
[38] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, "An iterative regularization method for total variation-based image restoration," Multiscale Model. Simul., vol. 4, no. 2, pp. 460–489, 2005.
[39] T. Goldstein and S. Osher, "The split Bregman method for ℓ1-regularized problems," SIAM J. Imag. Sciences, vol. 2, no. 2, pp. 323–343, 2009.
[40] J. H. Manton, "Optimization algorithms exploiting unitary constraints," IEEE Trans. Signal Process., vol. 50, no. 3, pp. 635–650, Mar. 2002.
[41] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt, "On augmented Lagrangian methods with general lower-level constraints," SIAM J. Optimiz., vol. 18, no. 4, pp. 1286–1309, 2007.
[42] J. Bolte, S. Sabach, and M. Teboulle, "Proximal alternating linearized minimization for nonconvex and nonsmooth problems," Math. Program., vol. 146, no. 1–2, pp. 459–494, Aug. 2014.
[43] H. Attouch, J. Bolte, and B. F. Svaiter, "Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods," Math. Program., vol. 137, no. 1–2, pp. 91–129, Feb. 2013.
[44] E. G. Birgin, D. Fernández, and J. M. Martínez, "On the boundedness of penalty parameters in an augmented Lagrangian method with constrained subproblems," Optimization Methods and Software, vol. 27, pp. 1001–1024, 2012.
[45] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, Aug. 2000.
[46] M. Hein and S. Setzer, "Beyond spectral clustering - tight relaxations of balanced graph cuts," in Advances in Neural Inform. Process. Systems (NIPS), 2011, pp. 2366–2374.
[47] J. Cheeger, "A lower bound for the smallest eigenvalue of the Laplacian," Problems in Analysis, R. C. Gunning, ed., Princeton Univ. Press, pp. 195–199, 1970.
[48] M. Hein and T. Bühler, "An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA," in Advances in Neural Inform. Process. Systems (NIPS), 2010, pp. 847–855.
[49] A. Szlam and X. Bresson, "Total variation and Cheeger cuts," in Proc. 27th Int. Conf. on Machine Learning (ICML), 2010, pp. 1039–1046.
[50] S. Boyd and N. Parikh, Proximal Algorithms. Foundations and Trends in Optimization, 2013, vol. 1, no. 3.
[51] R. Albert and A.-L. Barabási, "Statistical mechanics of complex networks," Rev. Mod. Phys., pp. 47–97, 2002.
[52] J. Bochnak, M. Coste, and M. F. Roy, Real Algebraic Geometry. Springer-Verlag, Berlin, 1998.