Understanding the Basis of Graph Signal Processing via an Intuitive Example-Driven Approach
Ljubiša Stanković, Danilo Mandic, Miloš Daković, Ilya Kisil, Ervin Sejdić, Anthony G. Constantinides

arXiv:1903.11179v2 [eess.SP] 12 May 2019

I. SCOPE

Graphs are irregular structures which naturally account for data integrity; however, traditional approaches have been established outside Signal Processing, and largely focus on analyzing the underlying graphs rather than signals on graphs. Given the rapidly increasing availability of multisensor and multinode measurements, likely recorded on irregular or ad-hoc grids, it would be extremely advantageous to analyze such structured data as graph signals and thus benefit from the ability of graphs to incorporate spatial awareness of the sensing locations, physical intuition and sensor importance, and the ease of local versus global sensor association. The aim of this Lecture Note is therefore to establish a common language between graph signals, defined on irregular signal domains, and some of the most fundamental paradigms in DSP, such as spectral analysis of multichannel signals, system transfer function, digital filter design, parameter estimation, and optimal denoising.

This is achieved through a physically meaningful and intuitive real-world example of geographically distributed multisensor temperature estimation. A similar spatial multisensor arrangement is already widely used in Signal Processing curricula to introduce minimum variance estimators and Kalman filters, and by adopting this framework we facilitate a seamless integration of graph theory into the curriculum of existing DSP courses. By bridging the gap between standard approaches and graph signal processing, we also show that standard methods can be thought of as special cases of their graph counterparts, evaluated on line graphs. It is hoped that our approach will not only help to demystify graph theoretic approaches in education but will also empower practitioners and researchers to explore a whole host of otherwise prohibitive modern applications.

II. RELEVANCE

In classical Signal Processing, the signal domain is determined by equidistant time instants or by a set of spatial sensing points on a uniform grid. However, increasingly the actual data sensing domain may not even be related to the physical dimensions of time and/or space, and it typically does exhibit various forms of irregularity. For example, in social or web-related networks, the sensing points and their connectivity pertain to specific objects/nodes and the topology of their links. It should be noted that even for data acquired in well defined time and space domains, the introduction of new relations between the signal samples, through graphs, may yield new insights into the analysis and provide enhanced data processing (e.g., based on local similarity neighborhoods). The advantage of graphs over classical data domains is that graphs account naturally for irregular data relations in the problem definition, together with the corresponding data connectivity in the analysis.

Indeed, Graph Signal and Information Processing is particularly well suited to making sense of data acquired over irregular data domains, which can be achieved, for example, by leveraging intuitions developed on Euclidean domains, by employing analogies with other irregular domains such as polygon meshes and manifolds, or by learning the mutual connectivity from available sets of data. In many emerging applications, e.g., Big Data, this also introduces a number of new challenges:
• Basic concepts must be revisited in order to accommodate structured but often incomplete information,
• New physically meaningful frameworks, specifically tailored for heterogeneous data sources, are required, and
• Trade-offs between performance and numerical requirements are a prerequisite when operating in real-time.

The common language and enhanced intuition between the graph approaches and their standard counterparts, illuminated in this article through the relationships between the vertex and time domains, may be naturally generalized to address the above challenges and spur further developments in the curricula on Statistical Signal Processing, Graph Signal Processing, and Big Data.

III. PREREQUISITES

This Lecture Note assumes a basic knowledge of Linear Algebra and Digital Signal Processing.

IV. HISTORY OF GRAPH THEORETIC APPLICATION

Graph theory, as a branch of mathematics, has existed for almost three centuries. The beginning of graph theory applications in electrical engineering dates back to the mid-XIX century and the definition of Kirchhoff's laws. Owing to their inherent "spatial awareness", graph models have since become a de facto standard for data analysis across the science and engineering areas, including chemistry, operational research, social networks, and computer sciences.

A systematic account of graph theory as an optimization tool can be attributed to the seminal book by Nicos Christofides of Imperial College London, published in 1975 [1]. Soon after gaining prominence in general optimization, it was very natural to explore the application of graph theory in signal processing and related areas [2]. Indeed, perhaps the first lecture course to teach graph theory to the then emerging cohort of communication networks and channel coding students was introduced by the author Anthony Constantinides in the 1970s. This helped to establish and formalize the connections
between general optimization and the topology of a communication network, and has spurred further applications in image processing [3].

After a relative lull over the next two decades, current developments in graph theory owe their prominence to the emergence of modern data sources, such as large-scale sensor and social networks, which inherently provide rich underlying physical, social, and geographic structures that require new ways to establish statistical inference, leading to data processing on graphs, within a new fast maturing field of Graph Signal Processing [4]–[10].

V. PROBLEM STATEMENT: AN ILLUSTRATIVE EXAMPLE

Graphs and Graph Signal Processing represent quite a general mathematical formalism which, albeit different from classic concepts, does admit the development of graph-domain counterparts of well established DSP paradigms. It would therefore be valuable to introduce such a general concept in an inductive and intuitive way, through a simple, general enough and well-understood example regarding a commonly considered topic in classical DSP.

To this end consider a multi-sensor setup, shown in Fig. 1, for measuring a temperature field in a known geographical region; such a set-up is typically used in the context of minimum-variance estimators and Kalman filters. The temperature sensing locations are chosen according to the significance of a particular geographic area to local users, with N = 64 sensing points in total, as shown in Fig. 1a). The temperature field is denoted by {x(n)} and a snapshot of its values is given in Fig. 1b). Each measured sensor signal can then be mathematically expressed as

$$x(n) = s(n) + \varepsilon(n), \tag{1}$$

where s(n) is the true temperature that would have been obtained in ideal measuring conditions and ε(n) comprises the adverse effects of the local environment on sensor readings or faulty sensor activity, and is referred to as "noise" in the sequel. For illustrative purposes, in our study ε(n) was modeled as a realization of a white, zero-mean, Gaussian process, with standard deviation $\sigma_\varepsilon = 4$. It was added to the signal, s(n), to yield the signal-to-noise ratio in {x(n)} of $SNR_0 = 14.2$ dB.

Fig. 1. Temperature sensing as a classical signal processing problem. a) Sensing locations in a geographic region along the Adriatic area. b) Temperatures measured at N = 64 sensing locations (top). In standard signal processing, the spatial sensor index is used for the horizontal axis and serves as the signal domain. This domain can be interpreted as a directed line graph (bottom). Observe the lack of physical intuition, as for example, sensor 37 (mountains) is followed by sensor 38 (coast), with drastic difference in temperature.

Remark 1: Classical signal processing requires an arrangement of the quintessentially spatial temperature samples in Fig. 1a) into a line structure shown in Fig. 1b). Obviously, such "lexicographic" ordering is not amenable to exploiting the spatial information related to the actual sensor arrangement, dictated by the terrain. For example, this renders classical analyses of this temperature field inapplicable (or at best suboptimal), as the performance critically depends on the chosen sensor ordering scheme. This exemplifies that even a most routine temperature measurement setup requires a more complex estimation structure than the simple line one corresponding to the classical signal processing framework, shown in Fig. 1b).

To introduce a "situation-aware" noise reduction scheme for the temperature field in Fig. 1, we proceed to explore a graph-theoretic framework for this problem, starting from a local signal average operator. In classical Signal Processing this can be achieved through a moving average operator, e.g., through averaging across the neighboring data samples, or equivalently neighboring nodes, as in the line graph in Fig. 1b), and for each sensing point. Physically, such a local neighborhood should indeed include close neighboring sensing points, but only those which also exhibit similar meteorological properties defined by the distance, altitude difference, and other terrain properties. In other words, since the sensor network in Fig. 1 measures a set of related temperatures from irregularly spaced sensors, an effective estimation strategy should include domain knowledge, which is not possible to achieve with standard DSP (line graph).

Consider the local neighborhoods for the sensing points n = 20, 29, 37, and 41, shown in Fig. 2a). The cumulative temperature for each sensing point is then given by

$$y(n) = \sum_{m \text{ at and around } n} x(m),$$

so that the local average temperature for a sensing point n may be easily obtained by dividing the cumulative temperature, y(n), by the number of included sensing points. For example, for the sensing points n = 20 and n = 37, presented in Fig. 2a), the "domain knowledge aware" local estimation
takes the form

$$y(20) = x(20) + x(19) + x(22) + x(23) \tag{2}$$

$$y(37) = x(37) + x(32) + x(33) + x(35) + x(61). \tag{3}$$

For convenience, the full set of relations among the sensing points can now be arranged into the matrix form, to give

$$\mathbf{y} = \mathbf{x} + \mathbf{A}\mathbf{x}, \tag{4}$$

where the matrix A indicates the connectivity structure of the neighboring sensing locations that should be involved in the calculation for each y(n). The matrix A is therefore referred to as the connectivity or adjacency matrix of a graph. Its elements are either 1 (if the corresponding vertices are related) or 0 (if they are not related). Fig. 2b) shows the sensing locations with the corresponding connectivity for the temperature estimation scenario in Fig. 2a). From (2) we can observe, for example, that the 20th row of the adjacency matrix A will have all zero elements, except for $A_{20,19} = 1$, $A_{20,22} = 1$, and $A_{20,23} = 1$ (for more detail see the electronic supplement).
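The relations (2)-(4) can be sketched numerically. The following is a minimal illustration, assuming only the two neighborhoods listed in (2) and (3); the remaining rows of A are left empty because the full graph is not reproduced here, and the signal x(n) = n is a placeholder rather than the measured temperatures.

```python
import numpy as np

N = 64  # number of sensing points, as in the text

# Neighborhoods taken from equations (2) and (3), with 1-based vertex labels.
# Only these two rows of A are filled in (the full graph is not given here).
neighbors = {20: [19, 22, 23], 37: [32, 33, 35, 61]}

A = np.zeros((N, N))
for n, nbrs in neighbors.items():
    for m in nbrs:
        A[n - 1, m - 1] = 1.0  # convert 1-based labels to 0-based indices

# Placeholder signal x(n) = n (illustrative only, not real temperatures).
x = np.arange(1, N + 1, dtype=float)

y = x + A @ x                # cumulative value, equation (4)
count = 1 + A.sum(axis=1)    # the vertex itself plus its neighbors
local_mean = y / count       # local average at each vertex

print(y[20 - 1], local_mean[20 - 1])  # 84.0 21.0
print(y[37 - 1], local_mean[37 - 1])  # 198.0 39.6
```

Vertices without listed neighbors simply return their own value, since their rows of A are zero.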
This simple real-world example can be interpreted within the graph signal processing framework as follows:
• The sensing points where the signal is measured are designated as the graph vertices, see Fig. 1,
• The vertex-to-vertex lines indicating the connectivity among the sensing points are called the graph edges,
• The vertices and edges form a graph, as in Fig. 2b), a new and very structurally rich signal domain,
• The graph, rather than a standard vector of sensing points, is then used for analyzing and processing data, as it is equipped with spatial and physical awareness,
• The measured temperatures are now interpreted as signal samples on a graph, as shown in Fig. 3,
• Similar to traditional signal processing, this new graph signal may have many realizations on the same graph and may include noise,
• Through relation (4), we have therefore introduced a simple graph system for physically and spatially aware signal averaging (a linear first-order graph system).

Fig. 2. Temperature setup as a domain-aware graph signal processing problem. a) Local neighborhood for the sensing points n = 20, 29, 37, and 41. These neighborhoods are chosen using "domain knowledge", dictated by local terrain and by taking into account the distance and altitude of sensors. Neighboring sensors for each of these sensing locations (vertices) are chosen in a physically meaningful way and their relation is indicated by the connectivity lines, called edges. b) Local neighborhoods for all sensing vertices, presented in a graph form.

To emphasize our trust in a particular sensor and to model mutual sensor relevance, a weighting scheme may be imposed on the edges (connectivity) between the sensing points, in the form

$$y(n) = x(n) + \sum_{m \neq n} W_{nm}\, x(m). \tag{5}$$

The weight $W_{nm}$ indicates the strength of the coupling between signal values at the sensing points n and m; its value is zero if the points n and m are not related and for n = m. We have now arrived at a weighted graph, whereby each edge has an associated weight, $W_{nm}$, which adds "mutual sensor relevance" information to the already established "spatial awareness" modeled by the edges. In our example, a matrix form of a weighted cumulative graph signal now becomes

$$\mathbf{y} = \mathbf{x} + \mathbf{W}\mathbf{x}. \tag{6}$$

This equips graph signal models with additional flexibility. In order to produce unbiased estimates, instead of the cumulative sums in (4) and (5), the weighting coefficients within the estimate for each y(n) should sum up to unity. This may be achieved through a normalized form of (6), given by

$$\mathbf{y} = \frac{1}{2}\left(\mathbf{x} + \mathbf{D}^{-1}\mathbf{W}\mathbf{x}\right), \tag{7}$$

where the elements of the diagonal normalization matrix, D, called the degree matrix, are $D_{nn} = \sum_m W_{nm}$, and $\mathbf{D}^{-1}\mathbf{W}$ is referred to as a random walk weight matrix. When this simple normalized first-order system is employed to filter the original noisy signal from Fig. 3, an improvement of 6 dB over the original signal-to-noise ratio, $SNR_0 = 14.2$ dB, is achieved.

Another important operator for graph signal processing is the graph Laplacian, L, which is defined as

$$\mathbf{L} = \mathbf{D} - \mathbf{W}.$$
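The normalized system (7) and the Laplacian can be checked on a toy example. The sketch below assumes a hypothetical 4-vertex undirected graph with made-up weights (not the 64-sensor graph of the text) and verifies two properties stated above: a constant signal passes through (7) unchanged, and the Laplacian annihilates constants.

```python
import numpy as np

# Hypothetical symmetric weight matrix of a small undirected graph
# (zero diagonal, as W_nm = 0 for n = m).
W = np.array([[0.0, 0.8, 0.3, 0.0],
              [0.8, 0.0, 0.5, 0.0],
              [0.3, 0.5, 0.0, 0.6],
              [0.0, 0.0, 0.6, 0.0]])

degrees = W.sum(axis=1)           # D_nn = sum_m W_nm
D = np.diag(degrees)
P = W / degrees[:, None]          # random walk weight matrix D^{-1} W
L = D - W                         # graph Laplacian


def normalized_average(x):
    """First-order normalized system y = (x + D^{-1} W x) / 2, equation (7)."""
    return 0.5 * (x + P @ x)


c = np.full(4, 21.5)              # constant "temperature" signal
print(normalized_average(c))      # stays constant: the estimate is unbiased
print(L @ np.ones(4))             # all zeros (up to rounding)
```

Each row of D^{-1}W sums to one, which is what makes the weights in (7) sum to unity.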
Fig. 3. From a multi-sensor measurement to a graph signal. a) The temperature field is represented on a graph that combines spatially unaware measurements in Fig. 1b) and the physically relevant graph topology in Fig. 2b). b) The graph signal intensity may also be designated by the vertex color, as in the right half of the panel.

Remark 2: A graph is fully specified by the set of its vertices and their connectivity scheme (designated by edges). The edges may be defined by the adjacency matrix, A, with $A_{mn} \in \{0, 1\}$, for unweighted graphs or by the "connectivity strength" weighting matrix, W, with $W_{mn} \in \mathbb{R}^{+}$, for weighted graphs. The degree matrix, D, and the Laplacian matrix, L, with $L_{mn} \in \mathbb{R}$, are defined using the adjacency/weighting matrix. When the relations between all pairs of vertices are mutually symmetric, then all the matrices involved are also symmetric, and such graphs are called undirected. If that is not the case, then the adjacency/weighting matrix is not symmetric and such graphs are called directed graphs.

The above-introduced graph framework is quite general and admits application to many different scenarios. For example, when performing an opinion poll within a social network, the members of that social network are treated as the vertices (data acquisition points). Their friendship relations are represented by the edges which model graph connectivity while the member answers play the role of graph signal values.

The definition of an appropriate graph structure is a prerequisite for physically meaningful and computationally efficient graph signal processing applications. Three important classes of problems, regarding the way the graph topology is defined, are described in Sidebar 1.

In the following, we shall demonstrate how this simple and intuitive concept provides a natural and straightforward platform to introduce the graph-counterparts of several fundamental signal processing algorithms.

VI. SYSTEM ON A GRAPH

The signal shift operator (unit time delay) is the lynchpin in discrete-time signal processing, but it is not so obvious to define on graphs due to the rich underlying connectivity structure. Topologically, the signal shift on a graph can be viewed as the movement of a signal sample from the considered vertex along all edges connected to this vertex. The signal (backward) shift operator can then be compactly defined using the graph adjacency matrix as $\mathbf{x}_{shifted} = \mathbf{A}\mathbf{x}$. To draw a distinction between the standard shift and the graph shift operator, consider the line graph in Fig. 1b) (bottom) and the "spatial aware" graph in Fig. 2a), b), and assume that the input signal is a pulse that occurs only at the sensor n = 29, that is, x(n) = δ(n − 29). The shifted signal in classic signal processing (line graph in the bottom of Fig. 1b)) will be $x_{shifted}(n) = \delta(n - 28)$ and can be considered as a movement of the delta pulse along the line graph from vertex n to vertex (n − 1). The same principle can be applied to the graph domain in Fig. 2a) whereby the delta pulse from vertex n = 29 is moved to all its connected vertices, to obtain the shifted graph signal, $x_{shifted}(n) = \delta(n - 27) + \delta(n - 28) + \delta(n - 51) + \delta(n - 59)$, as shown in Fig. 4.

If the shifted signal values are also scaled by the weighting coefficients of the corresponding edges, then the shifted signal is given by Wx. Since the Laplacian can also be used as a shift operator, we will adopt the symbol S to denote a general shift operator on a graph, which yields a graph shifted signal Sx.

Remark 3: The standard shift operator, x(n) = x(n − 1), is a "one-to-one" mapping, while the graph shift operator, $\mathbf{x}_{shifted} = \mathbf{S}\mathbf{x}$, is a "one-to-many" mapping which accounts for the underlying physics of the sensing process (in our example), not possible to achieve with standard DSP. Moreover, it also allows us to incorporate a contextual relation between the vertices within the irregular grid through the weighting matrix W. Notice that the graph shift operator does not satisfy the isometry property since the energy of the shifted signal is not the same as the energy of the original signal.

In analogy to the pivotal role of time shift in standard system theory, a system on a graph can be implemented as a linear combination of a graph signal and its graph shifted versions. The notion of a system is used in its classical sense, as a set of physical rules (an algorithm) that transforms an input graph signal into another (output) graph signal. The output graph signal from a system on a graph can then be written as

$$\mathbf{y} = h_0 \mathbf{S}^0 \mathbf{x} + h_1 \mathbf{S}^1 \mathbf{x} + \cdots + h_{M-1} \mathbf{S}^{M-1} \mathbf{x} = \sum_{m=0}^{M-1} h_m \mathbf{S}^m \mathbf{x}, \tag{8}$$
Sidebar 1: Graph Topology (Edges and Weights)

While in classic graph theory the graphs are typically given (e.g., in various computer, social, road, transportation, and power networks), oftentimes the first step in graph signal processing is to employ background knowledge of signal generating mechanisms in order to define the graph as a signal domain. This poses a number of challenges, e.g., while the data sensing points (graph vertices) are usually well defined in advance, their connectivity (graph edges) is often not available. In other words, the data domain definition within the graph signal paradigm represents a part of the problem itself, and has to be determined based on the properties of the sensing positions or features of the acquired set of data. All in all, the definition of an appropriate graph structure is a prerequisite for physically meaningful and computationally efficient graph signal processing applications.

Three important classes of problems regarding the definition of graph edges are:
• Geometry of the vertex positions: The distances between vertex positions play a crucial role in establishing relations between the sensed data. In many physical processes, the presence of edges and their associated connecting weights is defined based on the vertex distances. An exponential function of the Euclidean distance between vertices, $r_{mn}$, may be used, where for a given distance threshold, $\tau$, $W_{mn} = e^{-r_{mn}/\alpha}$ or $W_{mn} = e^{-r_{mn}^2/\alpha}$ if $r_{mn} < \tau$ and $W_{mn} = 0$ for $r_{mn} \geq \tau$. This form has been used in the graph in Fig. 2, whereby the altitude difference, $h_{mn}$, was accounted for as $W_{mn} = e^{-r_{mn}/\alpha}\, e^{-h_{mn}/\beta}$.
• Physically well defined relations among the sensing positions: Examples include electric circuits, linear heat transfer systems, spring-mass systems, and various forms of networks like social, computer or power networks. In these cases, the edge weights are given as a part of the problem definition.
• Data similarity dictates the underlying graph topology: This scenario is the most common in image and biomedical signal processing (see Sidebar 5). Various approaches and metrics can be used to define data similarity, including the correlation matrix between the signals at various vertices or the corresponding inverse covariance (precision) matrix, combined with the signal smoothness and the edge sparsity conditions. Learning a graph (its edges) based on the set of the available data is an interesting and currently extensively studied research area.
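The distance-based weighting rule from the first class above can be sketched as follows. This is a minimal illustration under stated assumptions: the vertex coordinates and the values of alpha and tau are hypothetical, the function name `distance_weights` is ours, and the altitude factor used for Fig. 2 is not included.

```python
import numpy as np


def distance_weights(positions, alpha, tau, squared=False):
    """W_mn = exp(-r_mn / alpha) (or exp(-r_mn**2 / alpha)) for r_mn < tau,
    and W_mn = 0 otherwise; the diagonal is set to zero."""
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    W = np.exp(-(r ** 2 if squared else r) / alpha)
    W[r >= tau] = 0.0                          # no edge beyond the threshold
    np.fill_diagonal(W, 0.0)                   # no self-loops
    return W


# Hypothetical 2-D vertex positions; the last vertex is far from the others.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
W = distance_weights(positions, alpha=2.0, tau=3.0)
print(np.round(W, 4))
```

With these values the isolated fourth vertex receives no edges, which shows how the threshold controls edge sparsity.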
the vertex distances. An exponential function of the

Fig. 4. A single pulse graph signal x at the vertex n = 29, that is, x(n) = δ(n − 29), and its graph shifted version $\mathbf{x}_{shifted} = \mathbf{A}\mathbf{x}$. The shift operator is demonstrated on the north-east part of the graph from Fig. 3, around the vertex n = 29.

where, by definition, $\mathbf{S}^0 = \mathbf{I}$, while $h_0, h_1, \ldots, h_{M-1}$ are the system coefficients to be found (see Section IX). Notice that for the directed and unweighted line graph in Fig. 1b) (bottom), the system on a graph in (8) reduces to the well known standard Finite Impulse Response (FIR) filter, given by

$$y(n) = h_0 x(n) + h_1 x(n-1) + \cdots + h_{M-1} x(n-M+1). \tag{9}$$

Remark 4: The above established link between the classical transfer function of a physical system and its graph-theoretic counterpart may serve to promote new algorithmic approaches, which stem from signal processing, into many application scenarios that are directly considered as graphs.

Observe that the Laplacian operator applied on a signal, Lx, can be considered as a combination of the scaled original signal, Dx, and its weighted shifted version, Wx, since Lx = Dx − Wx. A system defined using the graph Laplacian is obtained from (8) by replacing S = L, and has the form

$$\mathbf{y} = \mathbf{L}^0 \mathbf{x} + h_1 \mathbf{L}^1 \mathbf{x} + \cdots + h_{M-1} \mathbf{L}^{M-1} \mathbf{x}, \tag{10}$$

which therefore allows us to always produce an unbiased estimate of a constant c, that is, if x = c then y = c, since Lc = 0.

From (10), a simple first order system based on the graph Laplacian can be written as

$$\mathbf{y} = \mathbf{x} + h_1 \mathbf{L}\mathbf{x} \tag{11}$$

and is amenable, with slight modifications, to being used for efficient low-pass graph filtering, see Sidebar 2.

Remark 5: A system on a graph is conveniently defined by the "graph transfer function", H(S), as

$$\mathbf{y} = H(\mathbf{S})\mathbf{x}. \tag{12}$$

For an unweighted graph, the adjacency matrix, A, is commonly used as a shift matrix, S, while the Laplacian matrix, L = D − W, is used to define a shift on a weighted graph.

Properties of a system on a graph: Following the above discussion, it is now possible to link the properties of linear systems with those of systems on a graph. From equations (8)-(12) the system on a graph is said to be:
• Linear, if $H(\mathbf{S})(a_1 \mathbf{x}_1 + a_2 \mathbf{x}_2) = a_1 \mathbf{y}_1 + a_2 \mathbf{y}_2$.
• Shift invariant, if $H(\mathbf{S})(\mathbf{S}\mathbf{x}) = \mathbf{S}(H(\mathbf{S})\mathbf{x})$.

Remark 6: A system on a graph, defined by

$$H(\mathbf{S}) = h_0 \mathbf{S}^0 + h_1 \mathbf{S}^1 + \cdots + h_{M-1} \mathbf{S}^{M-1}, \tag{13}$$

is linear and shift invariant, since the matrix multiplication of the square weighting matrices is associative, $\mathbf{S}(\mathbf{S}\mathbf{S}) = (\mathbf{S}\mathbf{S})\mathbf{S}$, that is, $\mathbf{S}\mathbf{S}^m = \mathbf{S}^m \mathbf{S}$.
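The polynomial form of a system on a graph, its reduction to an FIR filter on a directed line graph, and its shift invariance can be illustrated with a short sketch. The coefficients h, the signal, and the helper name `graph_system` are assumptions made for illustration; the directed line graph is modeled so that the shift gives (Sx)(n) = x(n − 1), with a zero entering at the first vertex.

```python
import numpy as np


def graph_system(S, h):
    """Return H(S) = sum_m h[m] * S**m as a dense matrix."""
    N = S.shape[0]
    H = np.zeros((N, N))
    S_power = np.eye(N)            # S^0 = I
    for hm in h:
        H += hm * S_power
        S_power = S_power @ S      # next power of the shift
    return H


N = 8
S_line = np.eye(N, k=-1)           # directed line graph: (S x)(n) = x(n - 1)
h = [0.5, 0.3, 0.2]                # hypothetical system coefficients, M = 3
x = np.arange(1.0, N + 1)          # placeholder input signal

H = graph_system(S_line, h)
y_graph = H @ x                    # system on a graph
y_fir = np.convolve(h, x)[:N]      # classical FIR filter output

print(np.allclose(y_graph, y_fir))                          # True
print(np.allclose(H @ (S_line @ x), S_line @ (H @ x)))      # True
```

The second check holds for any square S, because H(S) is a polynomial in S and therefore commutes with it.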
Sidebar 2: Smoothness and Filtering on a Graph

The quadratic form of a graph signal is given by

$$E_x = \mathbf{x}\mathbf{L}\mathbf{x}^T = \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} W_{nm}\big(x(n) - x(m)\big)^2$$

and can be used to define signal smoothness since small values of the squared local deviation, $(x(n) - x(m))^2$, correspond to a smooth, slow-varying, signal. For a constant signal, x = c, we therefore have $E_x = 0$.

Physically, the minimum of $\mathbf{x}\mathbf{L}\mathbf{x}^T$ implies the smoothest possible signal and to arrive at this solution we may employ steepest descent. Then, the signal value at an iteration p is adjusted in the opposite direction of the gradient, toward the minimum of $\mathbf{x}\mathbf{L}\mathbf{x}^T$. The gradient of this quadratic form is $\partial E_x / \partial \mathbf{x}^T = 2\mathbf{L}\mathbf{x}$, and yields the iterative procedure

$$\mathbf{x}_{p+1} = \mathbf{x}_p - \alpha \mathbf{L}\mathbf{x}_p = (\mathbf{I} - \alpha \mathbf{L})\mathbf{x}_p.$$

Notice that the signal $\mathbf{x}_{p+1}$ can be considered as an output of the first order system in (11), with $h_1 = -\alpha$, and this relation can be used for simple and efficient filtering of graph signals.

Since the minimum of the quadratic form $\mathbf{x}\mathbf{L}\mathbf{x}^T$ corresponds to a constant signal, in order to avoid obtaining only a constant steady state (i.e., to account for the slow-varying part of the graph signal as well), the above iteration process can be used in alternation with $\mathbf{x}_{p+2} = (\mathbf{I} + \beta \mathbf{L})\mathbf{x}_{p+1}$. A compact form of these two iterative processes is known as Taubin's α−β algorithm and is given by

$$\mathbf{x}_{p+2} = (\mathbf{I} + \beta \mathbf{L})(\mathbf{I} - \alpha \mathbf{L})\mathbf{x}_p.$$

For appropriate values of α and β, this system can give a good and very simple approximation of a low-pass graph filter with transfer function $H(\lambda_k) = \big(1 + (\beta - \alpha)\lambda_k - \alpha\beta\lambda_k^2\big)^P$, and in P iterations.

In our experiment, the original noisy signal from Fig. 3 was filtered using Taubin's algorithm, with α = 0.2 and β = 0.1. After 50 iterations, the signal-to-noise ratio improved from the original $SNR_0 = 14.2$ dB to 26.8 dB. With these parameters, the transfer function $H(\lambda_k)$ retained 7 out of 64 spectral components in the signal (with an attenuation lower than 3 dB).

[Figure] Low-pass filtering on a graph. Top: The original noisy signal. Bottom: The filtered signal. The graph signal intensity is designated by the vertex color.
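Taubin's α−β iteration can be sketched on a small example. The code below is an illustration under stated assumptions: it uses an unweighted path graph and a synthetic noisy ramp rather than the 64-sensor temperature graph, treats x as a column vector (so the quadratic form is written x^T L x), and counts one iteration as one application of the combined step (I + βL)(I − αL). It also checks the result against the spectral transfer function given above.

```python
import numpy as np


def taubin_filter(L, x, alpha, beta, iterations):
    """Apply (I + beta*L)(I - alpha*L) to x the given number of times."""
    I = np.eye(L.shape[0])
    step = (I + beta * L) @ (I - alpha * L)
    y = x.copy()
    for _ in range(iterations):
        y = step @ y
    return y


# Unweighted path graph with N vertices (illustrative stand-in graph).
N = 16
W = np.eye(N, k=1) + np.eye(N, k=-1)
L = np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
s = np.linspace(10.0, 20.0, N)              # hypothetical smooth signal
x = s + rng.normal(scale=1.0, size=N)       # noisy observation

alpha, beta, P = 0.2, 0.1, 50               # parameter values from the text
y = taubin_filter(L, x, alpha, beta, P)

# Same filter in the spectral domain of the Laplacian.
lam, U = np.linalg.eigh(L)
H = (1 + (beta - alpha) * lam - alpha * beta * lam ** 2) ** P
y_spectral = U @ (H * (U.T @ x))

print(np.allclose(y, y_spectral))           # True
print(x @ L @ x, y @ L @ y)                 # smoothness before and after
```

The filtered signal has a smaller quadratic form than the input, while a constant input would pass through unchanged because L annihilates constants.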

VII. GRAPH FOURIER TRANSFORM

While classic spectral analysis is performed in the Fourier domain, spectral representations of graph signals employ either the adjacency/weighting matrix or the graph Laplacian eigenvalue decomposition. For the latter case we have

$$\mathbf{L} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{-1},$$

where U is an orthonormal matrix of the eigenvectors, $\mathbf{u}_k$, of the graph Laplacian matrix, L, (in its columns), and Λ is a diagonal matrix of the corresponding eigenvalues, $\lambda_k$. These eigenvectors may then be used for the spectral-based clustering of graph vertices, see Sidebar 3.

The graph Fourier transform, X, of a graph signal, x, is then defined as

$$\mathbf{X} = \mathbf{U}^{-1}\mathbf{x}. \tag{14}$$

Physically, since $\mathbf{U}^{-1} = \mathbf{U}^T$, the element X(k) of a graph Fourier transform, X, represents a projection of the graph signal, x, onto the k-th eigenvector, $\mathbf{u}_k \in \mathbf{U}$, that is

$$X(k) = \sum_{n=1}^{N} x(n)\, u_k(n). \tag{15}$$

The inverse graph Fourier transform is then straightforwardly obtained as

$$\mathbf{x} = \mathbf{U}\mathbf{X} \tag{16}$$

or

$$x(n) = \sum_{k=1}^{N} X(k)\, u_k(n). \tag{17}$$

Remark 7: In analogy to the classic Fourier transform where the signal is projected onto a set of harmonic orthogonal bases, $\mathbf{X} = \mathbf{U}^{-1}\mathbf{x}$, where U is the matrix of harmonic bases $\mathbf{u}_k = [1, e^{j2\pi k/N}, \ldots, e^{j2\pi (N-1)k/N}]^T/\sqrt{N}$, the graph Fourier transform can be understood as a signal decomposition onto the set of eigenvectors of the graph Laplacian (or the adjacency matrix) that serve as orthonormal basis functions.
Sidebar 3: Vertex Clustering

Clustering of graph vertices refers to a process of identifying and arranging the vertices of a graph into nonoverlapping vertex subsets, with data in each subset expected to exhibit relative similarity in some sense. One efficient approach to vertex clustering is based on spectral graph analysis. For a graph with N vertices, the orthogonal eigenvectors of its Laplacian build an N-dimensional space, called spectral space. The elements $u_k(n)$ of the eigenvector $\mathbf{u}_k$, k = 1, 2, ..., N, can be assigned to vertices n, n = 1, 2, ..., N to form an N-dimensional spectral vector $\mathbf{q}_n = [u_1(n), u_2(n), \ldots, u_N(n)]$. The elements of the first eigenvector, $\mathbf{u}_1$, are constant and are omitted, since they do not convey any spectral difference to the graph vertices. For the purpose of vertex clustering, the original N-dimensional spectral vector space is reduced to a new L < N-dimensional spectral space, where the spectral vectors,

$$\mathbf{q}_n = [u_2(n), u_3(n), \ldots, u_{L+1}(n)],$$

are used to define the spectral similarity between vertices n and m as $\|\mathbf{q}_n - \mathbf{q}_m\|_2$. Clustering of vertices is then performed by grouping spectrally similar vertices.

The simplest (and most widely used) case occurs when only one eigenvector, $\mathbf{u}_2$, is used for spectral clustering, whereby the order of vertices in the sorted $\mathbf{u}_2$ corresponds to its smoothest representation. This procedure can be used for ordering the vertices in graphs if we desire to perform any form of classical presentation or processing with vertices on a line graph, as in Fig. 1b) (bottom).

The spectral vector, $\mathbf{q}_n$, can be used either as a position of a vertex in a new low L-dimensional space, or it can be used for coloring of the vertices at their original positions. For the graph from Fig. 2, such coloring is performed using the spectral vector elements $\mathbf{q}_n = [u_2(n), u_3(n), u_4(n)]$ as color coordinates for the vertex n. Similar colors indicate high spectral similarity.

[Figure] Vertices colored using the spectral vectors $\mathbf{q}_n = [u_2(n), u_3(n), u_4(n)]$ as color coordinates.

Note that vertex clustering is a signal-independent operation. It roughly indicates the expected relation between sensor data values on the considered graph, and suggests that data processing operations (including processing of the signal from Fig. 3) will be predominantly localized within these clusters.

Formally, the presented reduction in spectral dimensionality, from the original N eigenvectors to L eigenvectors with lowest variations (with the smallest smoothness index $\mathbf{u}_k^T \mathbf{L}\mathbf{u}_k = \lambda_k$), corresponds to low-pass filtering in graph signal processing, whereby a signal with N spectral components is projected onto a reduced spectral space with the L slowest varying spectral components, within a given set of basis functions.
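The simplest case described above, clustering and ordering with the single eigenvector u2, can be sketched as follows. The graph is a hypothetical one chosen for illustration: two triangles with unit edge weights joined by one weak edge of weight 0.1, so the expected two clusters are the two triangles.

```python
import numpy as np

# Two triangles {0, 1, 2} and {3, 4, 5} joined by a weak edge between 2 and 3.
N = 6
W = np.zeros((N, N))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
for n, m, w in edges:
    W[n, m] = W[m, n] = w          # undirected graph: symmetric weights

L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)         # eigenvalues in ascending order

u2 = U[:, 1]                       # eigenvector of the second smallest eigenvalue
labels = u2 > 0                    # two clusters from the sign of u2
order = np.argsort(u2)             # vertex ordering for a line-graph layout

print(np.round(lam, 4))
print(labels)
print(order)
```

The overall sign of u2 is arbitrary, so only the grouping of the labels is meaningful, not which cluster is marked True.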

In the case of a circular graph, the graph Fourier transform reduces to the standard discrete Fourier transform (DFT). For this reason, the transform in (15) is referred to as the Graph Fourier transform (GFT).
Classic spectral analysis can thus be considered as a special case of graph signal spectral analysis, with the adjacency matrix defined on an unweighted circular directed graph (a line graph with the last and first vertex connected), when $\mathbf{u}_k = [1, e^{j2\pi k/N}, \ldots, e^{j2\pi(N-1)k/N}]^T/\sqrt{N}$. This becomes obvious by recognizing that the eigenvalues of a directed unweighted circular graph, $\lambda_k = e^{-j2\pi k/N}$, are easily obtained as a solution of the eigenvalue/eigenvector (EVD) relation $\mathbf{A}\mathbf{u}_k = \lambda_k \mathbf{u}_k$. For a vertex $n$, this relation is of the form $u_k(n-1) = \lambda_k u_k(n)$. The above vector elements $u_k(n)$ and eigenvalues $\lambda_k$ are the solutions of this difference equation. It can be shown that the eigenvectors of the graph Laplacian of a line graph are real-valued harmonic functions, whose combinations can produce the standard complex-valued DFT basis functions, in an indirect way. The standard signal representation in Fig. 1b) therefore corresponds to a signal whose domain is a line graph.
As is common in signal processing, the true temperature was simulated through a linear combination of several graph Laplacian eigenvectors (serving as basis functions) in the form
$$\mathbf{x} = -160\mathbf{u}_1 + 16\mathbf{u}_2 - 8\mathbf{u}_3 - 40\mathbf{u}_4 + 16\mathbf{u}_5 - 24\mathbf{u}_6 + \varepsilon(n),$$
where the random Gaussian noise, $\varepsilon(n)$, had standard deviation $\sigma_\varepsilon = 4$.

VIII. SPECTRAL DOMAIN OF A SYSTEM ON GRAPHS
Consider a system on a graph, as in (10), defined by its Laplacian matrix, given by
$$\mathbf{y} = \sum_{m=0}^{M-1} h_m \mathbf{L}^m \mathbf{x}. \quad (18)$$
Upon employing the eigen-domain (graph spectral) representation of the Laplacian matrix, $\mathbf{L} = \mathbf{U}\mathbf{\Lambda}\mathbf{U}^{-1}$, we have
$$\mathbf{y} = \sum_{m=0}^{M-1} h_m \mathbf{U}\mathbf{\Lambda}^m\mathbf{U}^{-1}\mathbf{x} = \mathbf{U} H(\mathbf{\Lambda}) \mathbf{U}^{-1}\mathbf{x}, \quad (19)$$
where
$$H(\mathbf{\Lambda}) = \sum_{m=0}^{M-1} h_m \mathbf{\Lambda}^m \quad (20)$$
is the transfer function of the graph system.
From (19), $\mathbf{U}^{-1}\mathbf{y} = H(\mathbf{\Lambda})\mathbf{U}^{-1}\mathbf{x}$, or in terms of the graph Fourier transform of the input and output signal
$$\mathbf{Y} = H(\mathbf{\Lambda})\mathbf{X}. \quad (21)$$
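The two statements above, that the GFT on a directed circular graph reduces to the DFT and that a polynomial system in the vertex domain acts as a multiplication by $H(\mathbf{\Lambda})$ in the spectral domain, can be checked numerically. The sketch below is only an illustration under stated assumptions: the graph sizes, the coefficients `h`, the path graph, and the function names are hypothetical choices, not taken from the lecture note.

```python
import numpy as np

def vertex_domain_filter(S, h, x):
    """y = sum_m h[m] S^m x, computed by repeated shifts (no eigendecomposition)."""
    y = np.zeros_like(x, dtype=complex)
    shifted = x.astype(complex)
    for hm in h:
        y += hm * shifted
        shifted = S @ shifted
    return y

def spectral_domain_filter(S, h, x):
    """y = U H(Lambda) U^{-1} x, with H(lambda) = sum_m h[m] lambda^m."""
    eigvals, U = np.linalg.eig(S)
    H = sum(hm * eigvals**m for m, hm in enumerate(h))
    X = np.linalg.solve(U, x.astype(complex))    # GFT: X = U^{-1} x
    return U @ (H * X)                           # inverse GFT of H(Lambda) X

N = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(N)
h = [0.5, 0.3, 0.2]                              # illustrative coefficients (assumed)

# Directed unweighted circular graph: (A x)(n) = x(n - 1), with wrap-around.
A = np.roll(np.eye(N), 1, axis=0)

# DFT basis vectors u_k(n) = exp(j 2 pi n k / N) / sqrt(N) and eigenvalues lambda_k.
n = np.arange(N)
U_dft = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)
lam = np.exp(-2j * np.pi * n / N)

eigen_relation_holds = np.allclose(A @ U_dft, U_dft * lam)       # A u_k = lambda_k u_k
gft_is_dft = np.allclose(U_dft.conj().T @ x, np.fft.fft(x) / np.sqrt(N))

# Undirected path graph Laplacian, used as the operator L in (18)-(21).
W = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
L = np.diag(W.sum(axis=1)) - W

y_vertex = vertex_domain_filter(L, h, x)
y_spectral = spectral_domain_filter(L, h, x)
```

The vertex domain form needs only $M-1$ matrix-vector products, whereas the spectral form requires the full eigendecomposition; both give the same output up to numerical precision.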

The classic spectral transfer function for (9) is then obtained by using the adjacency matrix of an unweighted directed circular graph, whose eigenvalues are $\lambda_k = e^{-j2\pi k/N}$.

IX. SPECTRAL DOMAIN FILTER DESIGN
Consider a desired graph transfer function, $G(\mathbf{\Lambda})$. As in classic signal processing, a system with this transfer function can be implemented either in the spectral domain or in the vertex domain.
The spectral domain implementation is straightforward and can be performed in the following three steps:
1) Calculate the GFT of the input graph signal, $\mathbf{X} = \mathbf{U}^{-1}\mathbf{x}$,
2) Multiply the GFT of the input graph signal with the transfer function $G(\mathbf{\Lambda})$ to obtain $\mathbf{Y} = G(\mathbf{\Lambda})\mathbf{X}$, and
3) Calculate the output graph signal as the inverse graph Fourier transform of $\mathbf{Y}$, to yield $\mathbf{y} = \mathbf{U}\mathbf{Y}$.
Notice that this procedure may be computationally very demanding for large graphs, where it may be easier to implement the desired filter (or its close approximation) in the vertex domain, in analogy to the time domain in the classical approach. This means that we have to find the coefficients, $h_0, h_1, \ldots, h_{M-1}$, in (8), such that its spectral representation, $H(\mathbf{\Lambda})$, is equal (or at least as close as possible) to the desired $G(\mathbf{\Lambda})$.
In other words, the transfer function of the vertex domain system in (20), given by $H(\lambda_k) = h_0 + h_1\lambda_k + \cdots + h_{M-1}\lambda_k^{M-1}$, should be equal to the desired transfer function, $G(\lambda_k)$, for each spectral index, $k$. This condition leads to a system of linear equations
$$\begin{aligned}
h_0 + h_1\lambda_1 + \cdots + h_{M-1}\lambda_1^{M-1} &= G(\lambda_1) \\
h_0 + h_1\lambda_2 + \cdots + h_{M-1}\lambda_2^{M-1} &= G(\lambda_2) \\
&\;\;\vdots \\
h_0 + h_1\lambda_N + \cdots + h_{M-1}\lambda_N^{M-1} &= G(\lambda_N).
\end{aligned} \quad (22)$$
The matrix form of this system is given by
$$\mathbf{V}_\lambda \mathbf{h} = \mathbf{g}, \quad (23)$$
where $\mathbf{V}_\lambda$ is a Vandermonde matrix formed of the eigenvalues, $\lambda_k$, while $\mathbf{h} = [h_0, h_1, \ldots, h_{M-1}]^T$ is the vector of system coefficients that we wish to estimate, and
$$\mathbf{g} = [G(\lambda_1), G(\lambda_2), \ldots, G(\lambda_N)]^T = \mathrm{diag}(G(\mathbf{\Lambda})).$$
The system order $M$ is typically significantly lower than the number of equations, $N$, in (22). For such an overdetermined case, the least-squares approximation of $\mathbf{h}$ is obtained by minimizing the squared error, $e^2 = \|\mathbf{V}_\lambda\mathbf{h} - \mathbf{g}\|_2^2$. As in standard least-squares, the solution is obtained by a direct minimization, $\partial e^2/\partial \mathbf{h}^T = \mathbf{0}$, to yield
$$\hat{\mathbf{h}} = (\mathbf{V}_\lambda^T\mathbf{V}_\lambda)^{-1}\mathbf{V}_\lambda^T\mathbf{g} = \mathrm{pinv}(\mathbf{V}_\lambda)\mathbf{g}. \quad (24)$$
The so obtained solution, $\hat{\mathbf{h}}$, therefore represents the mean square error minimizer for $\mathbf{V}_\lambda\mathbf{h} = \mathbf{g}$. Notice that this solution may not satisfy $\mathbf{V}_\lambda\mathbf{h} = \mathbf{g}$, in which case the coefficients $\hat{\mathbf{g}}$ (its spectrum $\hat{G}(\mathbf{\Lambda})$) may be used, that is
$$\mathbf{V}_\lambda\hat{\mathbf{h}} = \hat{\mathbf{g}}.$$
Such a solution, in general, differs from the desired system coefficients $\mathbf{g}$ (its spectrum $G(\mathbf{\Lambda})$).
Example: Consider the graph signal from Fig. 3. The task is to design a graph filter whose frequency response is $g(\lambda_k) = \exp(-\lambda_k)$ and to then filter the graph signal using this spectral domain graph filter. For $M = 4$, the corresponding system coefficients can be found to be $h_0 = 0.9606$, $h_1 = -0.7453$, $h_2 = 0.1936$, and $h_3 = -0.0162$. Upon signal filtering using the so defined graph transfer function, the output signal-to-noise ratio was $SNR = 21.74$ dB, that is, a 7.54 dB improvement over the original signal-to-noise ratio $SNR_0 = 14.2$ dB.
More detail on the solution of the system in (22) and (23) is provided in Sidebar 4.

X. OPTIMAL DENOISING
Consider a measurement, as in the temperature measurement scenario in Fig. 1, which is composed of a slow-varying desired signal, $\mathbf{s}$, and a superimposed fast changing disturbance, $\boldsymbol{\varepsilon}$, to give
$$\mathbf{x} = \mathbf{s} + \boldsymbol{\varepsilon}.$$
The aim is to design a graph filter for disturbance suppression (denoising), the output of which is denoted by $\mathbf{y}$ [11].
The optimal denoising task can then be defined through a minimization of the cost function
$$J = \frac{1}{2}\|\mathbf{y} - \mathbf{x}\|_2^2 + \alpha\,\mathbf{y}^T\mathbf{L}\mathbf{y}. \quad (25)$$
The minimization of the first term, $\frac{1}{2}\|\mathbf{y} - \mathbf{x}\|_2^2$, enforces the output signal, $\mathbf{y}$, to be as close as possible, in terms of the minimum residual disturbance power, to the available observations, $\mathbf{x}$. As mentioned before, the second term, $\mathbf{y}^T\mathbf{L}\mathbf{y}$, represents a measure of smoothness of the graph filter output, $\mathbf{y}$. For more detail on promoting smoothness of a graph signal, see Sidebar 2. The parameter $\alpha$ models a balance between the closeness of the output, $\mathbf{y}$, to the observed data, $\mathbf{x}$, and the smoothness of the output estimate, $\mathbf{y}$. While the problem in (25) could be expressed through a constrained Lagrangian optimization, we choose to focus more on the graph theoretic issues and hence adopt a simpler option whereby the mixing parameter $\alpha$ is chosen empirically.
The solution to this minimization problem follows from
$$\frac{\partial J}{\partial \mathbf{y}^T} = \mathbf{y} - \mathbf{x} + 2\alpha\mathbf{L}\mathbf{y} = \mathbf{0}$$
and results in a smoothing optimal denoiser in the form
$$\mathbf{y} = (\mathbf{I} + 2\alpha\mathbf{L})^{-1}\mathbf{x}.$$
The Laplacian spectral domain form of this relation is
$$\mathbf{Y} = (\mathbf{I} + 2\alpha\mathbf{\Lambda})^{-1}\mathbf{X},$$
with the corresponding graph filter transfer function
$$H(\lambda_k) = \frac{1}{1 + 2\alpha\lambda_k}.$$
For a small $\alpha$, $H(\lambda_k) \approx 1$ and $\mathbf{y} \approx \mathbf{x}$, while for a large $\alpha$, $H(\lambda_k) \approx \delta(k)$ and $\mathbf{y} \approx \mathrm{const.}$, which enforces $\mathbf{y}$ to be maximally smooth (a constant, without any variation). Using $\alpha = 4$, the obtained output signal-to-noise ratio for the graph signal from Fig. 3 was $SNR = 26$ dB, an 11.8 dB improvement over the original $SNR_0 = 14.2$ dB.
Remark 8: There are many cases when the graph topology is unknown, so that the graph structure, i.e., the Laplacian (graph edges and their weights), is also unknown. To this end, we may employ a class of methods for graph topology learning,

based on the minimization of the cost function in (25) with respect to both the Laplacian, $\mathbf{L}$, and the output signal, $\mathbf{y}$, with additional (commonly sparsity) constraints imposed on the Laplacian values.

XI. CURRENT GRAPH SIGNAL PROCESSING CHALLENGES
Current research is mainly focused on graphs themselves, for example, on reducing the complexity of calculation in very large graphs, including downsampling, multirate analysis, compressive sensing, graph segmentation, non-linear GSP, robust GSP, deep learning architectures for graph signals, multidimensional graph signals, and vertex-varying and vertex-frequency analysis.

XII. WHAT WE HAVE LEARNED
Natural signals (speech, biomedical, video) reside over irregular domains and are, unlike the signals in communications, not adequately processed using, e.g., standard harmonic analyses. While Data Analytics are heavily dependent on advances in DSP, neither the EE graduates worldwide nor practical data analysts are yet best prepared to employ graph algorithms in their future jobs. Our aim has been to fill this void by providing an example-driven platform to introduce graphs and their properties through the well understood notions of transfer functions, Fourier transform, and digital filtering.
While both a graph with $N$ vertices and a classical discrete time signal with $N$ samples can be viewed as $N$-dimensional vectors, structured graphs are much richer irregular domains which convey information about both the signal generation and propagation mechanisms. This allows us to employ intuition and our know-how from Euclidean domains to revisit basic dimensionality reduction operations, such as coarse graining of graphs (cf. standard downsampling). In addition, in the vertex domain a number of different distances (shortest-path, resistance, diffusion) have useful properties which can be employed to maintain data integrity throughout the processing, storage, communication and analysis stages, as the connectivities and edge weights are either dictated by the physics of the problem at hand or are inferred from the data. This particularly facilitates maintaining control and intuition over distributed operations throughout the processing chain.
It is our hope that this lecture note has helped to demystify graph signal processing for students and educators, together with empowering practitioners with enhanced intuition in graph-theoretic design and optimization. This material may also serve as a vehicle to seamlessly merge curricula in Electrical Engineering and Computing. The generic and physically meaningful nature of this example-driven Lecture Note is also likely to promote intellectual curiosity and serve as a platform to explore the numerous opportunities in manifold applications in our ever-growing interconnected world, facilitated by the Internet of Things.

ACKNOWLEDGMENTS
We are privileged to have had the help and advice of one of the pioneers in Graph Theory, Professor Nicos Christofides. We are grateful for his time, his incisive comments and valuable advice. We would also like to express our sincere gratitude to the students in our respective postgraduate courses, for their feedback on the material taught based on this Lecture Note.

AUTHORS
Ljubiša Stanković, FIEEE, is a professor at the University of Montenegro. His research interests include time-frequency analysis, compressive sensing, and graph signal processing. He is a vice-president of the National Academy of Sciences and Arts of Montenegro (CANU) and a member of the European Academy of Sciences and Arts. Prof. Stanković is a recipient of the 2017 EURASIP Best Journal Paper Award.
Danilo P. Mandic, FIEEE, is a professor of signal processing at Imperial College London, United Kingdom. He is a member of the IEEE Signal Processing Society Education Technical Committee, and has received the President's Award for Excellence in Postgraduate Supervision at Imperial College. He is a recipient of the 2018 Best Paper Award in IEEE Signal Processing Magazine.
Miloš Daković is a professor at the University of Montenegro. His research interests include graph signal processing and time-frequency analysis.
Ilya Kisil is a Ph.D. candidate at Imperial College London. His research interests include tensor decompositions, big data, efficient software for large scale problems, and graph signal processing.
Ervin Sejdić, SMIEEE, is an assistant professor at the University of Pittsburgh, USA. His research interests include biomedical signal processing, rehabilitation engineering, and neuroscience. He received the USA Presidential Early Career Award for Scientists and Engineers in 2016.
Anthony G. Constantinides, LFIEEE, (a.constantinides@imperial.ac.uk) is emeritus professor of signal processing in the Department of Electrical and Electronic Engineering at Imperial College London, United Kingdom.

Sidebar 4: Comments on the Graph Filter in (22)
Consider the following cases:
1) All the eigenvalues of $\mathbf{L}$ are distinct:
   a) For $M = N$, the solution is unique.
   b) For $M < N$ (overdetermined system), the mean square sense solution is obtained.
2) Some of the eigenvalues are of a degree (multiplicity) higher than one, so that the system reduces to $N_m < N$ linear equations:
   a) For $N_m < M \le N$ (underdetermined system), $(M - N_m)$ filter coefficients are free variables. An infinite number of equivalent filters is obtained.
   b) For $M = N_m$, the solution is unique.
   c) For $M < N_m$ (overdetermined system), the mean square sense solution is obtained.
3) Any filter of an order $M > N_m$ has a unique equivalent filter whose order is at most $N_m$. Such equivalence can be obtained by setting the free variables to zero, $h_i = 0$ for $i = N_m, N_m + 1, \ldots, N - 1$.
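The least-squares design in (22)-(24), together with case 1 of Sidebar 4, can be sketched as follows. This is a minimal illustration, assuming a hypothetical 8-vertex path graph rather than the graph from Fig. 3, so the resulting coefficients are not expected to match those quoted in the Section IX example; all function names are introduced here for illustration.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian of an unweighted path graph with n vertices (assumed test graph)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def design_graph_filter(eigvals, desired_response, order):
    """Least-squares solution of V h = g, where V[k, m] = eigvals[k]**m."""
    V = np.vander(eigvals, N=order, increasing=True)   # Vandermonde matrix in (23)
    g = desired_response(eigvals)                      # desired G(lambda_k)
    h, *_ = np.linalg.lstsq(V, g, rcond=None)          # same minimizer as pinv(V) @ g
    return h, V, g

def apply_in_vertex_domain(L, h, x):
    """y = sum_m h[m] L^m x, using only matrix-vector products."""
    y = np.zeros_like(x)
    shifted = x.copy()
    for hm in h:
        y += hm * shifted
        shifted = L @ shifted
    return y

N, M = 8, 4
L = path_laplacian(N)
eigvals, U = np.linalg.eigh(L)

def desired(lam):
    return np.exp(-lam)            # desired response g(lambda) = exp(-lambda)

h, V, g = design_graph_filter(eigvals, desired, M)
g_hat = V @ h                      # response actually realized by the order-M filter

rng = np.random.default_rng(0)
x = rng.standard_normal(N)
y_vertex = apply_in_vertex_domain(L, h, x)
y_spectral = U @ (g_hat * (U.T @ x))   # GFT, multiply by H(Lambda), inverse GFT

# Sidebar 4, case 1a): distinct eigenvalues and M = N give an exact, unique fit.
h_full, V_full, _ = design_graph_filter(eigvals, desired, N)
```

Once `h` is known, the filter runs in the vertex domain without any eigendecomposition, which is the motivation given in Section IX for large graphs.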
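The optimal denoiser of Section X, $\mathbf{y} = (\mathbf{I} + 2\alpha\mathbf{L})^{-1}\mathbf{x}$, and its spectral form $H(\lambda_k) = 1/(1 + 2\alpha\lambda_k)$ can be sketched in the same way. The path graph, the synthetic signal, the noise level, and the helper names below are assumptions made for illustration; only the value $\alpha = 4$ is taken from the text.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian of an unweighted path graph with n vertices (assumed test graph)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def optimal_denoiser(L, x, alpha):
    """Solve (I + 2*alpha*L) y = x, the minimizer of the cost J in (25)."""
    return np.linalg.solve(np.eye(L.shape[0]) + 2.0 * alpha * L, x)

def cost(L, x, y, alpha):
    """J = 0.5 * ||y - x||^2 + alpha * y^T L y."""
    return 0.5 * np.sum((y - x) ** 2) + alpha * (y @ L @ y)

N, alpha = 32, 4.0
L = path_laplacian(N)
eigvals, U = np.linalg.eigh(L)

rng = np.random.default_rng(1)
s = 5.0 * U[:, 1] + 3.0 * U[:, 2]          # slow-varying synthetic signal
x = s + 0.3 * rng.standard_normal(N)       # noisy observation x = s + eps

y = optimal_denoiser(L, x, alpha)
gradient = y - x + 2.0 * alpha * (L @ y)   # should vanish at the minimizer

# Spectral domain form: Y_k = X_k / (1 + 2*alpha*lambda_k).
Y_spectral = (U.T @ x) / (1.0 + 2.0 * alpha * eigvals)

# Very large alpha: the output approaches a constant (the mean of x).
y_smooth = optimal_denoiser(L, x, alpha=1e6)
```

Solving the linear system directly avoids forming the matrix inverse; for large sparse graphs an iterative sparse solver would be the natural replacement for `np.linalg.solve`.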

Sidebar 5: Graph Topology Based on Signal Similarity: Image Processing Example
The graph weights in our temperature field example are defined based on the geometric distance of vertices (sensing points). However, in some applications signal values themselves may be used as an indicator of signal similarity, as is the case with image processing, where this is achieved in combination with the pixel/vertex distances. For the image intensity values at pixels indexed by $n$ and $m$, denoted by $x(n)$ and $x(m)$, the difference of intensities may be defined as
$$\text{Intensity distance}(m, n) = s_{nm} = |x(n) - x(m)|.$$
Then, using an exponential kernel, the corresponding weights may be defined as
$$W_{nm} = e^{-(x(n) - x(m))^2/\tau^2}$$
for $r_{nm} \le \kappa$, and $W_{nm} = 0$ for $r_{nm} > \kappa$, where $r_{nm}$ is a geometric distance of the considered pixels/vertices.
We next present an example of this kind of weighting applied to a simple graph image filtering problem.
[Figure panels: Original image, Noisy image, Graph filtered image.] Original, noise corrupted, and filtered image using Taubin's algorithm (see Sidebar 2).
Example: Consider the problem of denoising a 50 × 50 pixel, 8-bit grayscale image, shown above. The vertices of the graph are the pixel locations. The edge weights for the graph representation of this noisy image were calculated with $\kappa = \sqrt{2}$ and $\tau = 20$. This value of $\kappa$ means that each vertex is connected with 8 neighboring vertices (including diagonal ones) with the defined weights, $W_{nm}$. Low-pass filtering was performed on the corresponding image graph using iterative filtering (Taubin's algorithm) over 200 iterations, with $\alpha = 0.1$ and $\beta = 0.15$.
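The weighting rule of Sidebar 5 can be sketched for a small image as follows. This is an illustration under stated assumptions: the 5 × 5 two-region test image and the function name `image_graph_weights` are hypothetical, the dense $N \times N$ weight matrix is only practical for small images, and the Taubin filtering step itself is not reproduced since its definition is given in Sidebar 2.

```python
import numpy as np

def image_graph_weights(image, kappa=np.sqrt(2), tau=20.0):
    """W[n, m] = exp(-(x(n) - x(m))**2 / tau**2) if r_nm <= kappa, else 0."""
    rows, cols = image.shape
    x = image.astype(float).ravel()                      # pixel intensities as a graph signal
    rr, cc = np.divmod(np.arange(rows * cols), cols)     # row and column of each vertex
    # Geometric distance r_nm between all pairs of pixel locations.
    dist = np.hypot(rr[:, None] - rr[None, :], cc[:, None] - cc[None, :])
    W = np.exp(-((x[:, None] - x[None, :]) ** 2) / tau**2)
    # Keep only pairs within distance kappa; no self-loops.
    W[(dist > kappa + 1e-12) | (dist == 0)] = 0.0
    return W

# Hypothetical 5 x 5 test image: a dark region next to a bright region.
image = np.full((5, 5), 200, dtype=np.uint8)
image[:, :2] = 50

W = image_graph_weights(image)
L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian of the image graph

neighbors_per_pixel = (W > 0).sum(axis=1)
center, right, left = 12, 13, 11               # pixel (2, 2) and its horizontal neighbors
weight_same_region = W[center, right]          # equal intensities
weight_across_edge = W[center, left]           # intensities 200 and 50
```

Because the weight across the intensity edge is negligible compared with the weights inside each flat region, low-pass filtering on this graph smooths within regions while largely preserving the edge between them.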

REFERENCES
[1] N. Christofides, “Graph theory: An algorithmic approach”, Academic
Press, 1975.
[2] F. Afrati and A. G. Constantinides, “The use of graph theory in binary
block code construction”, in Proceedings of the International Conference
on Digital Signal Processing, pp. 228-233, Florence, Italy, 310 August
- 2 September, 1978
[3] O. J. Morris, M. de J. Lee, and A. G. Constantinides, “Graph theory for
image analysis: An approach based on the shortest spanning tree”, IEE
Proceedings F-Communications, Radar and Signal Processing, vol. 133,
no. 2, pp. 146-152, 1986.
[4] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, “Discrete signal
processing on graphs: Sampling theory,” IEEE Trans. on Signal Processing,
vol. 63, no. 24, pp. 6510-6523, Dec. 15, 2015.
[5] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on
graphs,” IEEE Transactions on Signal Processing, vol. 61, no. 7, pp.
1644–1656, Apr. 2013.
[6] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on
graphs: Frequency analysis,” IEEE Transactions on Signal Processing,
vol. 62, no. 12, pp. 3042–3054, Jun. 2014.
[7] G. Cheung, E. Magli, Y. Tanaka, and M. K. Ng, “Graph spectral image
processing,” Proceedings of the IEEE, vol. 106(5), pp. 907-930, May
2018.
[8] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, P. Vandergheynst,
“Graph signal processing: Overview, challenges, and applications”,
Proceedings of the IEEE, vol. 106(5), pp. 808–28, May 2018.
[9] S. Saito, H. Suzuki, and D. P. Mandic, “Hypergraph p-Laplacian: A
differential geometry view”, in Proceedings of the Thirty-Second
AAAI Conference on Artificial Intelligence (AAAI-18), pp. 3984–3991,
2018.
[10] L. Stanković and E. Sejdić, “Vertex-frequency analysis of graph
signals,” Springer Nature, 2019.
[11] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graph-filter
design and applications to distributed linear network operators”, IEEE
Transactions on Signal Processing, 65(15), pp. 4117–4131, 2017.
