

Geometric deep learning: going beyond Euclidean data

Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst

(MB is with USI Lugano, Switzerland, Tel Aviv University, and Intel Perceptual Computing, Israel. JB is with Courant Institute, NYU and UC Berkeley, USA. YL is with Facebook AI Research and NYU, USA. AS is with Facebook AI Research, USA. PV is with EPFL, Switzerland.)

Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems from computer vision, natural language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure, and in cases where the invariances of these structures are built into networks used to model them.

Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds. The purpose of this paper is to overview different examples of geometric deep learning problems and present available solutions, key difficulties, applications, and future research directions in this nascent field.

I. INTRODUCTION

"Deep learning" refers to learning complicated concepts by building them from simpler ones in a hierarchical or multi-layer manner. Artificial neural networks are popular realizations of such deep multi-layer hierarchies. In the past few years, the growing computational power of modern GPU-based computers and the availability of large training datasets have allowed successfully training neural networks with many layers and degrees of freedom [1]. This has led to qualitative breakthroughs on a wide variety of tasks, from speech recognition [2], [3] and machine translation [4] to image analysis and computer vision [5], [6], [7], [8], [9], [10], [11] (the reader is referred to [12], [13] for many additional examples of successful applications of deep learning). Nowadays, deep learning has matured into a technology that is widely used in commercial applications, including Siri speech recognition in Apple iPhone, Google text translation, and Mobileye vision-based technology for autonomously driving cars.

One of the key reasons for the success of deep neural networks is their ability to leverage statistical properties of the data such as stationarity and compositionality through local statistics, which are present in natural images, video, and speech [14], [15]. These statistical properties have been related to physics [16] and formalized in specific classes of convolutional neural networks (CNNs) [17], [18], [19]. In image analysis applications, one can consider images as functions on the Euclidean space (plane), sampled on a grid. In this setting, stationarity is owed to shift-invariance, locality is due to the local connectivity, and compositionality stems from the multi-resolution structure of the grid. These properties are exploited by convolutional architectures [20], which are built of alternating convolutional and downsampling (pooling) layers. The use of convolutions has a two-fold effect. First, it allows extracting local features that are shared across the image domain and greatly reduces the number of parameters in the network with respect to generic deep architectures (and thus also the risk of overfitting), without sacrificing the expressive capacity of the network. Second, the convolutional architecture itself imposes some priors about the data, which appear very suitable especially for natural images [21], [18], [17], [19].

While deep learning models have been particularly successful when dealing with signals such as speech, images, or video, in which there is an underlying Euclidean structure, recently there has been a growing interest in trying to apply learning on non-Euclidean geometric data. Such kinds of data arise in numerous applications. For instance, in social networks, the characteristics of users can be modeled as signals on the vertices of the social graph [22]. Sensor networks are graph models of distributed interconnected sensors, whose readings are modelled as time-dependent signals on the vertices. In genetics, gene expression data are modeled as signals defined on the regulatory network [23]. In neuroscience, graph models are used to represent anatomical and functional structures of the brain. In computer graphics and vision, 3D objects are modeled as Riemannian manifolds (surfaces) endowed with properties such as color texture.

The non-Euclidean nature of such data implies that there are no such familiar properties as global parameterization, common system of coordinates, vector space structure, or shift-invariance. Consequently, basic operations like convolution that are taken for granted in the Euclidean case are not even well defined on non-Euclidean domains. The purpose of our paper is to show different methods of translating the key ingredients of successful deep learning methods such as convolutional neural networks to non-Euclidean data.

II. GEOMETRIC LEARNING PROBLEMS

Broadly speaking, we can distinguish between two classes of geometric learning problems. In the first class of problems, the goal is to characterize the structure of the data. The second class of problems deals with analyzing functions defined on a given non-Euclidean domain. These two classes are related, since understanding the properties of functions defined on a domain conveys certain information about the domain, and vice-versa, the structure of the domain imposes certain properties on the functions on it.

Structure of the domain: As an example of the first class of problems, assume to be given a set of data points with some underlying lower dimensional structure embedded into a high-dimensional Euclidean space. Recovering that lower dimensional structure is often referred to as manifold learning¹ or non-linear dimensionality reduction, and is an instance of unsupervised learning. Many methods for non-linear dimensionality reduction consist of two steps: first, they start with constructing a representation of local affinity of the data points (typically, a sparsely connected graph). Second, the data points are embedded into a low-dimensional space trying to preserve some criterion of the original affinity (a minimal code sketch of this two-step recipe appears at the end of this section). For example, spectral embeddings tend to map points with many connections between them to nearby locations, and MDS-type methods try to preserve global information such as graph geodesic distances. Examples of manifold learning include different flavors of multidimensional scaling (MDS) [26], locally linear embedding (LLE) [27], stochastic neighbor embedding (t-SNE) [28], spectral embeddings such as Laplacian eigenmaps [29] and diffusion maps [30], and deep models [31]. Most recent approaches [32], [33], [34] tried to apply the successful word embedding model [35] to graphs. Instead of embedding the vertices, the graph structure can be processed by decomposing it into small sub-graphs called motifs [36] or graphlets [37].

1 Note that the notion of "manifold" in this setting can be considerably more general than a classical smooth manifold; see e.g. [24], [25].

In some cases, the data are presented as a manifold or graph at the outset, and the first step of constructing the affinity structure described above is unnecessary. For instance, in computer graphics and vision applications, one can analyze 3D shapes represented as meshes by constructing local geometric descriptors capturing e.g. curvature-like properties [38], [39]. In network analysis applications such as computational sociology, the topological structure of the social graph representing the social relations between people carries important insights allowing, for example, to classify the vertices and detect communities [40]. In natural language processing, words in a corpus can be represented by the co-occurrence graph, where two words are connected if they often appear near each other [41].

Data on a domain: Our second class of problems deals with analyzing functions defined on a given non-Euclidean domain. We can further break down such problems into two subclasses: problems where the domain is fixed and those where multiple domains are given. For example, assume that we are given the geographic coordinates of the users of a social network, represented as a time-dependent signal on the vertices of the social graph. An important application in location-based social networks is to predict the position of the user given his or her past behavior, as well as that of his or her friends [42]. In this problem, the domain (social graph) is assumed to be fixed; methods of signal processing on graphs, which have previously been reviewed in this Magazine [43], can be applied to this setting, in particular, in order to define an operation similar to convolution in the spectral domain. This, in turn, allows generalizing CNN models to graphs [44], [45].

In computer graphics and vision applications, finding similarity and correspondence between shapes are examples of the second sub-class of problems: each shape is modeled as a manifold, and one has to work with multiple such domains. In this setting, a generalization of convolution in the spatial domain using local charting [46], [47], [48] appears to be more appropriate.

Brief history: The main focus of this review is on the second class of problems, namely learning functions on non-Euclidean structured domains, and in particular, attempts to generalize the popular CNNs to such settings. The first attempts to generalize neural networks to graphs we are aware of are due to Scarselli et al. [49], who proposed a scheme combining recurrent neural networks and random walk models. This approach went almost unnoticed, re-emerging in a modern form in [50], [51] due to the renewed recent interest in deep learning. The first formulation of CNNs on graphs is due to Bruna et al. [52], who used the definition of convolutions in the spectral domain. Their paper, while being of conceptual importance, came with significant computational drawbacks that fell short of a truly useful method. These drawbacks were subsequently addressed in the followup works of Henaff et al. [44] and Defferrard et al. [45]. In the latter paper, graph CNNs allowed achieving some state-of-the-art results.

In a parallel effort in the computer vision and graphics community, Masci et al. [47] showed the first CNN model on meshed surfaces, resorting to a spatial definition of the convolution operation based on local intrinsic patches. Among other applications, such models were shown to achieve state-of-the-art performance in finding correspondence between deformable 3D shapes. Followup works proposed different constructions of intrinsic patches on point clouds [53], [48] and general graphs [54].

The interest in deep learning on graphs or manifolds has exploded in the past year, resulting in numerous attempts to apply these methods in a broad spectrum of problems ranging from biochemistry [55] to recommender systems [56]. Since such applications originate in different fields that usually do not cross-fertilize, publications in this domain tend to use different terminology and notation, making it difficult for a newcomer to grasp the foundations and current state-of-the-art methods. We believe that our paper comes at the right time attempting to systemize and bring some order into the field.
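As promised above, here is a minimal sketch of the two-step recipe in the spirit of Laplacian eigenmaps [29]: build a Gaussian-weighted k-nearest-neighbor affinity graph, then embed using the lowest non-trivial eigenvectors of its Laplacian. The code is ours, for illustration only (names such as `laplacian_eigenmaps` are not from the paper or its companion website), and favors clarity over efficiency:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def laplacian_eigenmaps(X, n_neighbors=10, dim=2, sigma=1.0):
    """Embed the points X (n x d) into `dim` dimensions (illustrative sketch)."""
    # Step 1: local affinity -- Gaussian weights on a symmetrized k-NN graph.
    D = cdist(X, X)                              # pairwise Euclidean distances
    W = np.exp(-D**2 / (2 * sigma**2))           # Gaussian affinities, cf. eq. (11)
    far = np.argsort(D, axis=1)[:, n_neighbors + 1:]  # drop all but k nearest (and self)
    for i, cols in enumerate(far):
        W[i, cols] = 0.0
    W = np.maximum(W, W.T)                       # keep the graph undirected
    # Step 2: spectral embedding from the graph Laplacian L = Deg - W.
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    vals, vecs = eigh(L, np.diag(deg))           # generalized eigenproblem
    return vecs[:, 1:dim + 1]                    # skip the constant eigenvector

# Toy usage: a noisy circle embedded in 10-D maps back to a loop in 2-D.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.c_[np.cos(t), np.sin(t), np.zeros((200, 8))] + 0.01 * np.random.randn(200, 10)
Y = laplacian_eigenmaps(X, n_neighbors=8, dim=2, sigma=0.5)
```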

Structure of the paper: We start with an overview of traditional Euclidean deep learning in Section III, summarizing the important assumptions about the data and how they are realized in convolutional network architectures.²

2 For a more in-depth review of CNNs and their applications, we refer the reader to [12], [1], [13] and references therein.

Going to the non-Euclidean world in Section IV, we then define basic notions in differential geometry and graph theory. These topics are insufficiently known in the signal processing community, and to our knowledge, there is no introductory-level reference treating these very different structures in a common way. One of our goals is to provide an accessible overview of these models resorting as much as possible to the intuition of traditional signal processing.

In Sections V–VIII, we overview the main geometric deep learning paradigms, emphasizing the similarities and the differences between Euclidean and non-Euclidean learning methods. The key difference between these approaches is in the way a convolution-like operation is formulated on graphs and manifolds. One way is to resort to the analogy of the Convolution Theorem, defining the convolution in the spectral domain. An alternative is to think of the convolution as template matching in the spatial domain. Such a distinction is, however, far from clear-cut: as we will see, some approaches, though drawing their formulation from the spectral domain, essentially boil down to applying filters in the spatial domain. It is also possible to combine the two approaches resorting to spatio-frequency analysis techniques, such as wavelets or the windowed Fourier transform.

In Section IX, we show examples of selected problems from the fields of network analysis, particle physics, recommender systems, computer vision, and graphics. In Section X, we draw conclusions and outline current main challenges and potential future research directions in geometric deep learning. To make the paper more readable, we use inserts to illustrate important concepts. Finally, the readers are invited to visit a dedicated website geometricdeeplearning.com for additional materials, data, and examples of code.

Notation:
R^m: m-dimensional Euclidean space
a, a, A: scalar, vector, matrix
ā: complex conjugate of a
Ω, x: arbitrary domain, coordinate on it
f ∈ L²(Ω): square-integrable function on Ω
δ_{x₀}(x), δ_{ij}: delta function at x₀, Kronecker delta
{f_i, y_i}_{i∈I}: training set
T_v: translation operator
τ, L_τ: deformation field, operator
f̂: Fourier transform of f
f ⋆ g: convolution of f and g
X, TX, T_x X: manifold, its tangent bundle, tangent space at x
⟨·, ·⟩_{T_x X}: Riemannian metric
f ∈ L²(X): scalar field on manifold X
F ∈ L²(TX): tangent vector field on manifold X
A*: adjoint of operator A
∇, div, ∆: gradient, divergence, Laplace operators
V, E, F: vertices and edges of a graph, faces of a mesh
w_{ij}, W: weight matrix of a graph
f ∈ L²(V): functions on vertices of a graph
F ∈ L²(E): functions on edges of a graph
φ_i, λ_i: Laplacian eigenfunctions, eigenvalues
h_t(·, ·): heat kernel
Φ_k: matrix of first k Laplacian eigenvectors
Λ_k: diagonal matrix of first k Laplacian eigenvalues
ξ: point-wise nonlinearity (e.g. ReLU)
γ_{l,l′}(x), Γ_{l,l′}: convolutional filter in spatial and spectral domain

III. DEEP LEARNING ON EUCLIDEAN DOMAINS


Geometric priors: Consider a compact d-dimensional Euclidean domain Ω = [0, 1]^d ⊂ R^d on which square-integrable functions f ∈ L²(Ω) are defined (for example, in image analysis applications, images can be thought of as functions on the unit square Ω = [0, 1]²). We consider a generic supervised learning setting, in which an unknown function y : L²(Ω) → Y is observed on a training set

{f_i ∈ L²(Ω), y_i = y(f_i)}_{i∈I}.   (1)

In a supervised classification setting, the target space Y can be thought of as discrete, with |Y| being the number of classes. In a multiple object recognition setting, we can replace Y by the K-dimensional simplex, which represents the posterior class probabilities p(y|x). In regression tasks, we may consider Y = R^m.

In the vast majority of computer vision and speech analysis tasks, there are several crucial prior assumptions on the unknown function y. As we will see in the following, these assumptions are effectively exploited by convolutional neural network architectures.

Stationarity: Let

T_v f(x) = f(x - v),   x, v ∈ Ω,   (2)

be a translation operator³ acting on functions f ∈ L²(Ω). Our first assumption is that the function y is either invariant or equivariant with respect to translations, depending on the task. In the former case, we have y(T_v f) = y(f) for any f ∈ L²(Ω) and v ∈ Ω. This is typically the case in object classification tasks. In the latter, we have y(T_v f) = T_v y(f), which is well-defined when the output of the model is a space in which translations can act (for example, in problems of object localization, semantic segmentation, or motion estimation). Our definition of invariance should not be confused with the traditional notion of translation invariant systems in signal processing, which corresponds to translation equivariance in our language (since the output translates whenever the input translates).

3 We assume periodic boundary conditions to ensure that the operation is well-defined over L²(Ω).
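As a quick numerical illustration of these two notions (our sketch, not from the paper): on a discretized periodic domain (footnote 3), translation is a cyclic shift, convolution is translation-equivariant, and a global statistic such as the signal energy is translation-invariant.

```python
import numpy as np

def translate(f, v):
    """Translation operator T_v f(x) = f(x - v) on a periodic 1-D grid."""
    return np.roll(f, v)

def conv_periodic(f, g):
    """Circular convolution computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

rng = np.random.default_rng(0)
f = rng.standard_normal(64)                  # a signal on a 64-point grid
g = rng.standard_normal(64)                  # a filter

# Equivariance: filtering then shifting equals shifting then filtering.
assert np.allclose(translate(conv_periodic(f, g), 5),
                   conv_periodic(translate(f, 5), g))

# Invariance: a global statistic such as the energy, modeling y(T_v f) = y(f).
assert np.isclose(np.sum(f**2), np.sum(translate(f, 5)**2))
```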

Local deformations and scale separation: Similarly, a deformation L_τ, where τ : Ω → Ω is a smooth vector field, acts on L²(Ω) as L_τ f(x) = f(x - τ(x)). Deformations can model local translations, changes in point of view, rotations, and frequency transpositions [18].

Most tasks studied in computer vision are not only translation invariant/equivariant, but also stable with respect to local deformations [57], [18]. In tasks that are translation invariant we have

|y(L_τ f) - y(f)| ≈ ‖∇τ‖,   (3)

for all f, τ. Here, ‖∇τ‖ measures the smoothness of a given deformation field. In other words, the quantity to be predicted does not change much if the input image is slightly deformed. In tasks that are translation equivariant, we have

|y(L_τ f) - L_τ y(f)| ≈ ‖∇τ‖.   (4)

This property is much stronger than the previous one, since the space of local deformations has a high dimensionality, as opposed to the d-dimensional translation group.

It follows from (3) that we can extract sufficient statistics at a lower spatial resolution by downsampling demodulated localized filter responses without losing approximation power. An important consequence of this is that long-range dependencies can be broken into multi-scale local interaction terms, leading to hierarchical models in which spatial resolution is progressively reduced. To illustrate this principle, denote by

Y(x₁, x₂; v) = Prob(f(u) = x₁ and f(u + v) = x₂)   (5)

the joint distribution of two image pixels at an offset v from each other. In the presence of long-range dependencies, this joint distribution will not be separable for any v. However, the deformation stability prior states that Y(x₁, x₂; v) ≈ Y(x₁, x₂; v(1 + ε)) for small ε. In other words, whereas long-range dependencies indeed exist in natural images and are critical to object recognition, they can be captured and down-sampled at different scales. This principle of stability to local deformations has been exploited in the computer vision community in models other than CNNs, for instance, deformable parts models [58].

In practice, the Euclidean domain Ω is discretized using a regular grid with n points; the translation and deformation operators are still well-defined, so the above properties hold in the discrete setting.

Convolutional neural networks: Stationarity and stability to local translations are both leveraged in convolutional neural networks (see insert IN1). A CNN consists of several convolutional layers of the form g = C_Γ(f), acting on a p-dimensional input f(x) = (f₁(x), ..., f_p(x)) by applying a bank of filters Γ = (γ_{l,l′}), l = 1, ..., q, l′ = 1, ..., p and a point-wise non-linearity ξ,

g_l(x) = ξ( Σ_{l′=1}^{p} (f_{l′} ⋆ γ_{l,l′})(x) ),   (6)

producing a q-dimensional output g(x) = (g₁(x), ..., g_q(x)) often referred to as the feature maps. Here,

(f ⋆ γ)(x) = ∫_Ω f(x - x′) γ(x′) dx′   (7)

denotes the standard convolution. According to the local deformation prior, the filters Γ have compact spatial support.

Additionally, a downsampling or pooling layer g = P(f) may be used, defined as

g_l(x) = P({f_l(x′) : x′ ∈ N(x)}),   l = 1, ..., q,   (8)

where N(x) ⊂ Ω is a neighborhood around x and P is a permutation-invariant function such as an L_p-norm (in the latter case, the choice of p = 1, 2, or ∞ results in average-, energy-, or max-pooling).

A convolutional network is constructed by composing several convolutional and optionally pooling layers, obtaining a generic hierarchical representation

U_Θ(f) = (C_{Γ^{(K)}} ∘ ··· ∘ P ∘ ··· ∘ C_{Γ^{(2)}} ∘ C_{Γ^{(1)}})(f)   (9)

where Θ = {Γ^{(1)}, ..., Γ^{(K)}} is the hyper-vector of the network parameters (all the filter coefficients). The model is said to be deep if it comprises multiple layers, though this notion is rather vague and one can find examples of CNNs with as few as a couple and as many as hundreds of layers [11]. The output features enjoy translation invariance/covariance depending on whether spatial resolution is progressively lost by means of pooling or kept fixed. Moreover, if one specifies the convolutional tensors to be complex wavelet decomposition operators and uses complex modulus as point-wise nonlinearities, one can provably obtain stability to local deformations [17]. Although this stability is not rigorously proved for generic compactly supported convolutional tensors, it underpins the empirical success of CNN architectures across a variety of computer vision applications [1].

In supervised learning tasks, one can obtain the CNN parameters by minimizing a task-specific cost L on the training set {f_i, y_i}_{i∈I},

min_Θ Σ_{i∈I} L(U_Θ(f_i), y_i),   (10)

for instance, L(x, y) = ‖x - y‖. If the model is sufficiently complex and the training set is sufficiently representative, when applying the learned model to previously unseen data, one expects U(f) ≈ y(f). Although (10) is a non-convex optimization problem, stochastic optimization methods offer excellent empirical performance. Understanding the structure of the optimization problem (10) and finding efficient strategies for its solution is an active area of research in deep learning [62], [63], [64], [65], [66].

A key advantage of CNNs explaining their success in numerous tasks is that the geometric priors on which CNNs are based result in a learning complexity that avoids the curse of dimensionality. Thanks to the stationarity and local deformation priors, the linear operators at each layer have a constant number of parameters, independent of the input size n (number of pixels in an image). Moreover, thanks to the multiscale hierarchical property, the number of layers grows at a rate O(log n), resulting in a total learning complexity of O(log n) parameters.
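For concreteness, here is a direct NumPy transcription (ours) of one convolutional layer (6) followed by max-pooling (8) on a one-dimensional periodic grid; a didactic sketch, not an efficient implementation:

```python
import numpy as np

def conv_layer(f, Gamma, xi=lambda z: np.maximum(z, 0.0)):
    """One layer (6): f is (p, n), Gamma is (q, p, m); returns (q, n)."""
    q, p, m = Gamma.shape
    n = f.shape[1]
    g = np.zeros((q, n))
    for l in range(q):
        for lp in range(p):
            # circular convolution of input channel lp with filter gamma_{l,lp}
            g[l] += np.real(np.fft.ifft(np.fft.fft(f[lp]) *
                                        np.fft.fft(Gamma[l, lp], n)))
        g[l] = xi(g[l])                      # point-wise non-linearity after the sum
    return g

def max_pool(g, size=2):
    """Pooling (8) with P = max over neighborhoods of `size` grid points."""
    q, n = g.shape
    return g[:, :n - n % size].reshape(q, -1, size).max(axis=2)

rng = np.random.default_rng(1)
f = rng.standard_normal((3, 32))             # p = 3 input channels on 32 points
Gamma = rng.standard_normal((8, 3, 5))       # q = 8 filters of compact support 5
out = max_pool(conv_layer(f, Gamma))         # feature maps of shape (8, 16)
```

Composing several such stages as in (9) and minimizing (10) over all filter coefficients Θ recovers the usual CNN training setup.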

[IN1] Convolutional neural networks: CNNs are currently among the most successful deep learning architectures in a variety of tasks, in particular, in computer vision. A typical CNN used in computer vision applications (see FIGS1) consists of multiple convolutional layers (6), passing the input image through a set of filters Γ followed by point-wise non-linearity ξ (typically, half-rectifiers ξ(z) = max(0, z) are used, although practitioners have experimented with a diverse range of choices [13]). The model can also include a bias term, which is equivalent to adding a constant coordinate to the input.

A network composed of K convolutional layers put together, U(f) = (C_{Γ^{(K)}} ∘ ... ∘ C_{Γ^{(2)}} ∘ C_{Γ^{(1)}})(f), produces pixel-wise features that are covariant w.r.t. translation and approximately covariant to local deformations. Typical computer vision applications requiring covariance are semantic image segmentation [8] or motion estimation [59].

In applications requiring invariance, such as image classification [7], the convolutional layers are typically interleaved with pooling layers (8) progressively reducing the resolution of the image passing through the network. Alternatively, one can integrate the convolution and downsampling in a single linear operator (convolution with stride). Recently, some authors have also experimented with convolutional layers which increase the spatial resolution using interpolation kernels [60]. These kernels can be learnt efficiently by mimicking the so-called algorithme à trous [61], also referred to as dilated convolution.

[FIGS1] Typical convolutional neural network architecture used in computer vision applications (figure reproduced from [1]). The figure shows an RGB input image passing through alternating 'convolutions and ReLU' and 'max pooling' stages, ending in class scores such as 'Samoyed (16); Papillon (5.7); Pomeranian (2.7); Arctic fox (1.0); Eskimo dog (0.6); White wolf (0.4); Siberian husky (0.4)'.
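To make the stride and dilation options mentioned in IN1 concrete, here is a tiny valid-mode 1-D sketch (ours; production frameworks expose stride and dilation as options of their convolution operators):

```python
import numpy as np

def conv1d(f, g, stride=1, dilation=1):
    """Valid-mode 1-D correlation with stride and dilation (illustrative)."""
    m = len(g)
    taps = np.arange(m) * dilation            # filter taps spread by `dilation`
    span = taps[-1] + 1                       # receptive field of the filter
    starts = np.arange(0, len(f) - span + 1, stride)
    return np.array([np.dot(f[s + taps], g) for s in starts])

f = np.arange(16, dtype=float)
g = np.array([1.0, 0.0, -1.0])
print(conv1d(f, g))                 # plain convolutional layer
print(conv1d(f, g, stride=2))       # convolution and downsampling in one operator
print(conv1d(f, g, dilation=2))     # 'algorithme a trous' / dilated convolution
```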

IV. THE GEOMETRY OF MANIFOLDS AND GRAPHS

Our main goal is to generalize CNN-type constructions to non-Euclidean domains. In this paper, by non-Euclidean domains we refer to two prototypical structures: manifolds and graphs. While arising in very different fields of mathematics (differential geometry and graph theory, respectively), in our context these structures share several common characteristics that we will try to emphasize throughout our review.

Manifolds: Roughly, a manifold is a space that is locally Euclidean. One of the simplest examples is a spherical surface modeling our planet: around a point, it seems to be planar, which has led generations of people to believe in the flatness of the Earth. Formally speaking, a (differentiable) d-dimensional manifold X is a topological space where each point x has a neighborhood that is topologically equivalent (homeomorphic) to a d-dimensional Euclidean space, called the tangent space and denoted by T_x X (see Figure 1, top). The collection of tangent spaces at all points (more formally, their disjoint union) is referred to as the tangent bundle and denoted by TX. On each tangent space, we define an inner product ⟨·, ·⟩_{T_x X} : T_x X × T_x X → R, which is additionally assumed to depend smoothly on the position x. This inner product is called a Riemannian metric in differential geometry and allows performing local measurements of angles, distances, and volumes. A manifold equipped with a metric is called a Riemannian manifold.

It is important to note that the definition of a Riemannian manifold is completely abstract and does not require a geometric realization in any space. However, a Riemannian manifold can be realized as a subset of a Euclidean space (in which case it is said to be embedded in that space) by using the structure of the Euclidean space to induce a Riemannian metric. The celebrated Nash Embedding Theorem guarantees that any sufficiently smooth Riemannian manifold can be realized in a Euclidean space of sufficiently high dimension [67]. An embedding is not necessarily unique; two different realizations of a Riemannian metric are called isometries.
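As a worked example of a metric induced by an embedding (our addition, not in the original text), consider the unit sphere with its standard Euclidean embedding; the inner products of the coordinate tangent vectors give the Riemannian metric in the (θ, φ) chart:

```latex
% Worked example (ours): metric induced on the unit sphere S^2 in R^3
% by the embedding x(\theta, \varphi).
\[
x(\theta,\varphi) = (\sin\theta\cos\varphi,\; \sin\theta\sin\varphi,\; \cos\theta),
\qquad \theta \in (0,\pi),\ \varphi \in [0, 2\pi).
\]
% Inner products of the coordinate tangent vectors x_\theta, x_\varphi give
\[
\big(\langle x_a, x_b \rangle\big)_{a,b \in \{\theta,\varphi\}}
= \begin{pmatrix} 1 & 0 \\ 0 & \sin^2\theta \end{pmatrix},
\]
% so lengths, angles and areas measured through this matrix are intrinsic:
% any deformation preserving these values (an isometry) leaves them
% unchanged, regardless of how the surface sits in R^3.
```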

Two-dimensional manifolds (surfaces) embedded into R³ are used in computer graphics and vision to describe boundary surfaces of 3D objects, colloquially referred to as '3D shapes'. This term is somewhat misleading since '3D' here refers to the dimensionality of the embedding space rather than that of the manifold. Thinking of such a shape as made of infinitely thin material, inelastic deformations that do not stretch or tear it are isometric. Isometries do not affect the metric structure of the manifold and consequently preserve any quantities that can be expressed in terms of the Riemannian metric (called intrinsic). Conversely, properties related to the specific realization of the manifold in the Euclidean space are called extrinsic.

As an intuitive illustration of this difference, imagine an insect that lives on a two-dimensional surface (Figure 1, bottom). Such an insect perceives the surface only through measurements made along the surface itself, i.e. intrinsically, and is oblivious to how the surface is bent in the ambient space. A human observer, on the other hand, sees a surface in 3D space - this is an extrinsic point of view.

[Fig. 1. Top: tangent space and tangent vectors on a two-dimensional manifold (surface); the drawing marks a point x with its tangent space T_x X and tangent vector F(x), and a second point x′ with T_{x′} X and F(x′). Bottom: examples of isometric deformations.]

Calculus on manifolds: Our next step is to consider functions defined on manifolds. We are particularly interested in two types of functions: a scalar field is a smooth real function f : X → R on the manifold; a tangent vector field F : X → TX is a mapping attaching a tangent vector F(x) ∈ T_x X to each point x. As we will see in the following, tangent vector fields are used to formalize the notion of infinitesimal displacements on the manifold. We define the Hilbert spaces of scalar and vector fields on manifolds, denoted by L²(X) and L²(TX), respectively, with the following inner products:

⟨f, g⟩_{L²(X)} = ∫_X f(x) g(x) dx;   (15)
⟨F, G⟩_{L²(TX)} = ∫_X ⟨F(x), G(x)⟩_{T_x X} dx;   (16)

dx denotes here a d-dimensional volume element induced by the Riemannian metric.

In calculus, the notion of derivative describes how the value of a function changes with an infinitesimal change of its argument. One of the big differences distinguishing classical calculus from differential geometry is the lack of vector space structure on the manifold, prohibiting us from naively using expressions like f(x + dx). The conceptual leap required to generalize such notions to manifolds is the need to work locally in the tangent space.

To this end, we define the differential of f as an operator df : TX → R acting on tangent vector fields. At each point x, the differential can be defined as a linear functional (1-form) df(x) = ⟨∇f(x), ·⟩_{T_x X} acting on tangent vectors F(x) ∈ T_x X, which model a small displacement around x. The change of the function value as the result of this displacement is given by applying the form to the tangent vector, df(x)F(x) = ⟨∇f(x), F(x)⟩_{T_x X}, and can be thought of as an extension of the notion of the classical directional derivative.

The operator ∇f : L²(X) → L²(TX) in the definition above is called the intrinsic gradient, and is similar to the classical notion of the gradient defining the direction of the steepest change of the function at a point, with the only difference that the direction is now a tangent vector. Similarly, the intrinsic divergence is an operator div : L²(TX) → L²(X) acting on tangent vector fields and (formally) adjoint to the gradient operator [71],

⟨F, ∇f⟩_{L²(TX)} = ⟨∇*F, f⟩_{L²(X)} = ⟨-div F, f⟩_{L²(X)}.   (17)

Physically, a tangent vector field can be thought of as a flow of material on a manifold. The divergence measures the net flow of a field at a point, allowing to distinguish between field 'sources' and 'sinks'. Finally, the Laplacian (or Laplace-Beltrami operator in differential geometric jargon) ∆ : L²(X) → L²(X) is an operator

∆f = -div(∇f)   (18)

acting on scalar fields. Employing relation (17), it is easy to see that the Laplacian is self-adjoint (symmetric),

⟨∇f, ∇f⟩_{L²(TX)} = ⟨∆f, f⟩_{L²(X)} = ⟨f, ∆f⟩_{L²(X)}.   (19)

The lhs in equation (19) is known as the Dirichlet energy in physics and measures the smoothness of a scalar field on the manifold (see insert IN3). The Laplacian can be interpreted as the difference between the average of a function on an infinitesimal sphere around a point and the value of the function at the point itself. It is one of the most important operators in mathematical physics, used to describe phenomena as diverse as heat diffusion (see insert IN4), quantum mechanics, and wave propagation. As we will see in the following, the Laplacian plays a central role in signal processing and learning on non-Euclidean domains, as its eigenfunctions generalize the classical Fourier bases, allowing to perform spectral analysis on manifolds and graphs.

It is important to note that all the above definitions are coordinate-free. By defining a basis in the tangent space, it is possible to express tangent vectors as d-dimensional vectors and the Riemannian metric as a d × d symmetric positive-definite matrix.
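As a sanity check (ours), on a Euclidean domain Ω ⊂ R^d these coordinate-free definitions reduce to the familiar operators, and the convention ∆ = -div ∇ makes the Laplacian positive semi-definite:

```latex
% Sanity check (ours): Euclidean reduction of the intrinsic operators.
\[
\nabla f = \Big( \frac{\partial f}{\partial x_1}, \dots,
                 \frac{\partial f}{\partial x_d} \Big), \qquad
\mathrm{div}\, F = \sum_{i=1}^{d} \frac{\partial F_i}{\partial x_i}, \qquad
\Delta f = -\mathrm{div}(\nabla f)
         = -\sum_{i=1}^{d} \frac{\partial^2 f}{\partial x_i^2}.
\]
% In 1-D, -\frac{d^2}{dx^2} e^{i\omega x} = \omega^2 e^{i\omega x} \ge 0,
% matching the eigenfunctions quoted in the Fourier analysis discussion below.
```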

Graphs and discrete differential operators: Another type of construction we are interested in are graphs, which are popular models of networks, interactions, and similarities between different objects. For simplicity, we will consider weighted undirected graphs, formally defined as a pair (V, E), where V = {1, ..., n} is the set of n vertices and E ⊆ V × V is the set of edges, the graph being undirected implying that (i, j) ∈ E iff (j, i) ∈ E. Furthermore, we associate a weight a_i > 0 with each vertex i ∈ V, and a weight w_{ij} ≥ 0 with each edge (i, j) ∈ E.

Real functions f : V → R and F : E → R on the vertices and edges of the graph, respectively, are roughly the discrete analogy of continuous scalar and tangent vector fields in differential geometry.⁴ We can define Hilbert spaces L²(V) and L²(E) of such functions by specifying the respective inner products,

⟨f, g⟩_{L²(V)} = Σ_{i∈V} a_i f_i g_i;   (20)
⟨F, G⟩_{L²(E)} = Σ_{(i,j)∈E} w_{ij} F_{ij} G_{ij}.   (21)

4 It is tacitly assumed here that F is alternating, i.e., F_{ij} = -F_{ji}.

Let f ∈ L²(V) and F ∈ L²(E) be functions on the vertices and edges of the graph, respectively. We can define differential operators acting on such functions analogously to differential operators on manifolds [72]. The graph gradient is an operator ∇ : L²(V) → L²(E) mapping functions defined on vertices to functions defined on edges,

(∇f)_{ij} = f_i - f_j,   (22)

automatically satisfying (∇f)_{ij} = -(∇f)_{ji}. The graph divergence is an operator div : L²(E) → L²(V) doing the converse,

(div F)_i = (1/a_i) Σ_{j:(i,j)∈E} w_{ij} F_{ij}.   (23)

It is easy to verify that the two operators are adjoint w.r.t. the inner products (20)–(21),

⟨F, ∇f⟩_{L²(E)} = ⟨∇*F, f⟩_{L²(V)} = ⟨-div F, f⟩_{L²(V)}.   (24)

The graph Laplacian is an operator ∆ : L²(V) → L²(V) defined as ∆ = -div ∇. Combining definitions (22)–(23), it can be expressed in the familiar form

(∆f)_i = (1/a_i) Σ_{(i,j)∈E} w_{ij} (f_i - f_j).   (25)

[IN2] Laplacian on discrete manifolds: In computer graphics and vision applications, two-dimensional manifolds are commonly used to model 3D shapes. There are several common ways of discretizing such manifolds. First, the manifold is assumed to be sampled at n points. Their embedding coordinates x₁, ..., x_n are referred to as a point cloud. Second, a graph is constructed upon these points, acting as its vertices. The edges of the graph represent the local connectivity of the manifold, telling whether two points belong to a neighborhood or not, e.g. with Gaussian edge weights

w_{ij} = e^{-‖x_i - x_j‖² / 2σ²}.   (11)

This simplest discretization, however, does not correctly capture the geometry of the underlying continuous manifold (for example, the graph Laplacian would typically not converge to the continuous Laplacian operator of the manifold with the increase of the sampling density [68]). A geometrically consistent discretization is possible with an additional structure of faces F ∈ V × V × V, where (i, j, k) ∈ F implies (i, j), (i, k), (k, j) ∈ E. The collection of faces represents the underlying continuous manifold as a polyhedral surface consisting of small triangles glued together. The triplet (V, E, F) is referred to as a triangular mesh. To be a correct discretization of a manifold (a manifold mesh), every edge must be shared by exactly two triangular faces; if the manifold has a boundary, any boundary edge must belong to exactly one triangle.

On a triangular mesh, the simplest discretization of the Riemannian metric is given by assigning each edge a length ℓ_{ij} > 0, which must additionally satisfy the triangle inequality in every triangular face. The mesh Laplacian is given by formula (25) with

w_{ij} = (-ℓ²_{ij} + ℓ²_{jk} + ℓ²_{ik}) / (8 a_{ijk}) + (-ℓ²_{ij} + ℓ²_{jh} + ℓ²_{ih}) / (8 a_{ijh});   (12)
a_i = (1/3) Σ_{jk:(i,j,k)∈F} a_{ijk},   (13)

where a_{ijk} = sqrt( s_{ijk}(s_{ijk} - ℓ_{ij})(s_{ijk} - ℓ_{jk})(s_{ijk} - ℓ_{ik}) ) is the area of triangle ijk given by the Heron formula, and s_{ijk} = ½(ℓ_{ij} + ℓ_{jk} + ℓ_{ki}) is the semi-perimeter of triangle ijk. The vertex weight a_i is interpreted as the local area element (shown in red in FIGS2). Note that the weights (12)–(13) are expressed solely in terms of the discrete metric ℓ and are thus intrinsic. When the mesh is infinitely refined under some technical conditions, such a construction can be shown to converge to the continuous Laplacian of the underlying manifold [69].

An embedding of the mesh (amounting to specifying the vertex coordinates x₁, ..., x_n) induces a discrete metric ℓ_{ij} = ‖x_i - x_j‖₂, whereby (12) becomes the cotangent weights

w_{ij} = ½ (cot α_{ij} + cot β_{ij})   (14)

ubiquitously used in computer graphics [70].

[FIGS2] Two commonly used discretizations of a two-dimensional manifold: an undirected graph (left, with edge weights w_{ij}) and a triangular mesh (right, with edge lengths ℓ_{ij}, opposite angles α_{ij}, β_{ij}, triangle areas a_{ijk}, and local area elements a_i).
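The discrete operators (22)–(25) are short enough to verify numerically. In the sketch below (ours, with arbitrary random test weights), we fix the sign of the divergence so that the adjointness (24), the composition ∆ = -div ∇, formula (25), and the matrix form A⁻¹(D - W)f discussed next all hold simultaneously under the orientation convention (22); sign conventions for the discrete divergence vary across the literature.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
W = rng.uniform(0.1, 1.0, (n, n))
W = np.triu(W, 1); W = W + W.T               # symmetric edge weights, zero diagonal
a = rng.uniform(0.5, 2.0, n)                 # vertex weights a_i > 0

grad = lambda f: f[:, None] - f[None, :]     # (grad f)_ij = f_i - f_j, eq. (22)
# Divergence with the sign chosen so that (24) holds together with (22).
div = lambda F: -(W * F).sum(axis=1) / a
lap = lambda f: (W * grad(f)).sum(axis=1) / a   # (Delta f)_i, eq. (25)

f = rng.standard_normal(n)
F = grad(rng.standard_normal(n))             # an alternating edge field (footnote 4)

# Adjointness (24): <F, grad f>_{L2(E)} = <-div F, f>_{L2(V)}
inner_E = 0.5 * (W * F * grad(f)).sum()      # each undirected edge counted once
inner_V = np.dot(a * (-div(F)), f)
assert np.isclose(inner_E, inner_V)

# Delta = -div grad, and the matrix form A^{-1}(D - W) f of eq. (26) below.
assert np.allclose(lap(f), -div(grad(f)))
assert np.allclose(lap(f), np.diag(1 / a) @ (np.diag(W.sum(1)) - W) @ f)
```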

Note that formula (25) captures the intuitive geometric interpretation of the Laplacian as the difference between the local average of a function around a point and the value of the function at the point itself.

Denoting by W = (w_{ij}) the n × n matrix of edge weights (it is assumed that w_{ij} = 0 if (i, j) ∉ E), by A = diag(a₁, ..., a_n) the diagonal matrix of vertex weights, and by D = diag(Σ_{j:j≠i} w_{ij}) the degree matrix, the application of the graph Laplacian to a function f ∈ L²(V) represented as a column vector f = (f₁, ..., f_n)ᵀ can be written in matrix-vector form as

∆f = A⁻¹(D - W)f.   (26)

The choice of A = I in (26) is referred to as the unnormalized graph Laplacian; another popular choice is A = D, producing the random walk Laplacian [73].

Discrete manifolds: As we mentioned, there are many practical situations in which one is given a sampling of points arising from a manifold but not the manifold itself. In computer graphics applications, reconstructing a correct discretization of a manifold from a point cloud is a difficult problem of its own, referred to as meshing (see insert IN2). In manifold learning problems, the manifold is typically approximated as a graph capturing the local affinity structure. We warn the reader that the term "manifold" as used in the context of generic data science is not geometrically rigorous, and can have less structure than the classical smooth manifold we have defined beforehand. For example, a set of points that "looks locally Euclidean" in practice may have self-intersections, infinite curvature, different dimensions depending on the scale and location at which one looks, extreme variations in density, and "noise" with confounding structure.

Fourier analysis on non-Euclidean domains: The Laplacian operator is a self-adjoint positive-semidefinite operator, admitting on a compact domain⁵ an eigendecomposition with a discrete set of orthonormal eigenfunctions φ₀, φ₁, ... (satisfying ⟨φ_i, φ_j⟩_{L²(X)} = δ_{ij}) and non-negative real eigenvalues 0 = λ₀ ≤ λ₁ ≤ ... (referred to as the spectrum of the Laplacian),

∆φ_i = λ_i φ_i,   i = 0, 1, ...   (31)

The eigenfunctions are the smoothest functions in the sense of the Dirichlet energy (see insert IN3) and can be interpreted as a generalization of the standard Fourier basis (given, in fact, by the eigenfunctions of the 1D Euclidean Laplacian, -d²/dx² e^{iωx} = ω² e^{iωx}) to a non-Euclidean domain. It is important to emphasize that the Laplacian eigenbasis is intrinsic due to the intrinsic construction of the Laplacian itself.

5 In the Euclidean case, the Fourier transform of a function defined on a finite interval (which is a compact set) or its periodic extension is discrete. In practical settings, all domains we are dealing with are compact.

A square-integrable function f on X can be decomposed into a Fourier series as

f(x) = Σ_{i≥0} ⟨f, φ_i⟩_{L²(X)} φ_i(x) = Σ_{i≥0} f̂_i φ_i(x),   (32)

where the projection on the basis functions, producing a discrete set of Fourier coefficients f̂_i = ⟨f, φ_i⟩_{L²(X)}, generalizes the analysis (forward transform) stage in classical signal processing, and summing up the basis functions with these coefficients is the synthesis (inverse transform) stage.

A centerpiece of classical Euclidean signal processing is the property of the Fourier transform diagonalizing the convolution operator, colloquially referred to as the Convolution Theorem. This property allows expressing the convolution f ⋆ g of two functions in the spectral domain as the element-wise product of their Fourier transforms,

(f ⋆ g)^(ω) = ( ∫_{-∞}^{+∞} f(x) e^{-iωx} dx ) ( ∫_{-∞}^{+∞} g(x) e^{-iωx} dx ).   (33)

Unfortunately, in the non-Euclidean case we cannot even define the operation x - x′ on the manifold or graph, so the notion of convolution (7) does not directly extend to this case. One possibility to generalize convolution to non-Euclidean domains is by using the Convolution Theorem as a definition,

(f ⋆ g)(x) = Σ_{i≥0} ⟨f, φ_i⟩_{L²(X)} ⟨g, φ_i⟩_{L²(X)} φ_i(x).   (34)

One of the key differences of such a construction from the classical convolution is the lack of shift-invariance. In terms of signal processing, it can be interpreted as a position-dependent filter. While parametrized by a fixed number of coefficients in the frequency domain, the spatial representation of the filter can vary dramatically at different points (see FIGS4).

The discussion above also applies to graphs instead of manifolds, where one only has to replace the inner product in equations (32) and (34) with the discrete one (20). All the sums over i would become finite, as the graph Laplacian ∆ has n eigenvectors. In matrix-vector notation, the generalized convolution f ⋆ g can be expressed as Gf = Φ diag(ĝ)Φᵀf, where ĝ = (ĝ₁, ..., ĝ_n) is the spectral representation of the filter and Φ = (φ₁, ..., φ_n) denotes the matrix of Laplacian eigenvectors (30). The lack of shift-invariance results in the absence of circulant (Toeplitz) structure in the matrix G, which characterizes the Euclidean setting. Furthermore, it is easy to see that the convolution operation commutes with the Laplacian, G∆f = ∆Gf.

Uniqueness and stability: Finally, it is important to note that the Laplacian eigenfunctions are not uniquely defined. To start with, they are defined up to sign, i.e., ∆(±φ) = λ(±φ). Thus, even isometric domains might have different Laplacian eigenfunctions. Furthermore, if a Laplacian eigenvalue has multiplicity, then the associated eigenfunctions can be defined as an orthonormal basis spanning the corresponding eigen-subspace (or, said differently, the eigenfunctions are defined up to an orthogonal transformation in the eigen-subspace). A small perturbation of the domain can lead to very large changes in the Laplacian eigenvectors, especially those associated with high frequencies. At the same time, the definition of heat kernels (36) and diffusion distances (38) does not suffer from these ambiguities – for example, the sign ambiguity disappears as the eigenfunctions are squared. Heat kernels also appear to be robust to domain perturbations.
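The generalized convolution (34) is a one-liner in matrix form. The following sketch (ours, on a random weighted graph with A = I) builds G = Φ diag(ĝ)Φᵀ and checks that it commutes with the Laplacian while, in general, having no circulant structure:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
W = rng.uniform(0, 1, (n, n))
W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(1)) - W                    # unnormalized Laplacian (A = I in (26))

lam, Phi = np.linalg.eigh(L)                 # spectrum (31): L Phi = Phi diag(lam)

g_hat = np.exp(-0.5 * lam)                   # some filter in the spectral domain
G = Phi @ np.diag(g_hat) @ Phi.T             # generalized convolution operator (34)

f = rng.standard_normal(n)
assert np.allclose(G @ (L @ f), L @ (G @ f)) # commutes with the Laplacian
# G is in general not circulant/Toeplitz: the position-dependence (lack of
# shift-invariance) discussed above.
```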

[IN3] Physical interpretation of Laplacian eigenfunctions: Given a function f on the domain X, the Dirichlet energy

E_Dir(f) = ∫_X ‖∇f(x)‖²_{T_x X} dx = ∫_X f(x) ∆f(x) dx,   (27)

measures how smooth it is (the last identity in (27) stems from (19)). We are looking for an orthonormal basis on X containing the k smoothest possible functions (FIGS3), obtained by solving the optimization problem

min_{φ₀} E_Dir(φ₀)  s.t. ‖φ₀‖ = 1;
min_{φ_i} E_Dir(φ_i)  s.t. ‖φ_i‖ = 1, φ_i ⊥ span{φ₀, ..., φ_{i-1}},  i = 1, 2, ..., k-1.   (28)

In the discrete setting, when the domain is sampled at n points, problem (28) can be rewritten as

min_{Φ_k ∈ R^{n×k}} trace(Φ_kᵀ ∆ Φ_k)  s.t.  Φ_kᵀ Φ_k = I,   (29)

where Φ_k = (φ₀, ..., φ_{k-1}). The solution of (29) is given by the first k eigenvectors of ∆, satisfying

∆Φ_k = Φ_k Λ_k,   (30)

where Λ_k = diag(λ₀, ..., λ_{k-1}) is the diagonal matrix of corresponding eigenvalues. The eigenvalues 0 = λ₀ ≤ λ₁ ≤ ... ≤ λ_{k-1} are non-negative due to the positive-semidefiniteness of the Laplacian and can be interpreted as 'frequencies', where φ₀ = const with the corresponding eigenvalue λ₀ = 0 plays the role of the DC component.

The Laplacian eigendecomposition can be carried out in two ways. First, equation (30) can be rewritten as a generalized eigenproblem (D - W)Φ_k = AΦ_k Λ_k, resulting in A-orthogonal eigenvectors, Φ_kᵀ A Φ_k = I. Alternatively, introducing a change of variables Ψ_k = A^{1/2} Φ_k, we can obtain a standard eigendecomposition problem A^{-1/2}(D - W)A^{-1/2} Ψ_k = Ψ_k Λ_k with orthogonal eigenvectors, Ψ_kᵀ Ψ_k = I. When A = D is used, the matrix ∆ = A^{-1/2}(D - W)A^{-1/2} is referred to as the normalized symmetric Laplacian.

[FIGS3] Example of the first four Laplacian eigenfunctions φ₀, ..., φ₃ on a Euclidean domain (1D line, top left) and non-Euclidean domains (human shape modeled as a 2D manifold, top right; and Minnesota road graph, bottom). In the Euclidean case, the result is the standard Fourier basis comprising sinusoids of increasing frequency. In all cases, the eigenfunction φ₀ corresponding to zero eigenvalue is constant ('DC').
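Both eigendecomposition routes described in IN3 are directly available in SciPy; the sketch below (ours, with random test weights) confirms that they produce the same spectrum, with A-orthogonal and orthogonal eigenvectors, respectively:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
n = 10
W = rng.uniform(0, 1, (n, n))
W = np.triu(W, 1); W = W + W.T
D = np.diag(W.sum(1))
A = D                                        # the choice A = D from the text

# Route 1: generalized eigenproblem (D - W) Phi = A Phi Lambda.
lam1, Phi = eigh(D - W, A)
assert np.allclose(Phi.T @ A @ Phi, np.eye(n), atol=1e-8)   # A-orthogonal

# Route 2: normalized symmetric Laplacian A^{-1/2} (D - W) A^{-1/2}.
Ais = np.diag(1.0 / np.sqrt(np.diag(A)))
lam2, Psi = eigh(Ais @ (D - W) @ Ais)
assert np.allclose(Psi.T @ Psi, np.eye(n), atol=1e-8)       # orthogonal

assert np.allclose(lam1, lam2)               # identical spectra
assert np.isclose(lam1[0], 0.0)              # lambda_0 = 0, the 'DC' eigenvector
```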

V. SPECTRAL METHODS

We have now finally got to our main goal, namely, constructing a generalization of the CNN architecture on non-Euclidean domains. We will start with the assumption that the domain on which we are working is fixed, and for the rest of this section will use the problem of classification of a signal on a fixed graph as the prototypical application.

We have seen that convolutions are linear operators that commute with the Laplacian operator. Therefore, given a weighted graph, a first route to generalizing a convolutional architecture is to restrict our interest to linear operators that commute with the graph Laplacian. This property, in turn, implies operating on the spectrum of the graph weights, given by the eigenvectors of the graph Laplacian.

Spectral CNN (SCNN) [52]: Similarly to the convolutional layer (6) of a classical Euclidean CNN, Bruna et al. [52] define

a spectral convolutional layer as

g_l = ξ( Σ_{l′=1}^{p} Φ_k Γ_{l,l′} Φ_kᵀ f_{l′} ),   (39)

where the n × p and n × q matrices F = (f₁, ..., f_p) and G = (g₁, ..., g_q) represent the p- and q-dimensional input and output signals on the vertices of the graph, respectively (we use n = |V| to denote the number of vertices in the graph), Γ_{l,l′} is a k × k diagonal matrix of spectral multipliers representing a filter in the frequency domain, and ξ is a nonlinearity applied on the vertex-wise function values. Using only the first k eigenvectors in (39) sets a cutoff frequency, which depends on the intrinsic regularity of the graph and also the sample size. Typically, k ≪ n, since only the first Laplacian eigenvectors describing the smooth structure of the graph are useful in practice.

If the graph has an underlying group invariance, such a construction can discover it. In particular, standard CNNs can be redefined from the spectral domain (see insert IN5). However, in many cases the graph does not have a group structure, or the group structure does not commute with the Laplacian, and so we cannot think of each filter as passing a template across V and recording the correlation of the template with that location.

[IN4] Heat diffusion on non-Euclidean domains: An important application of spectral analysis, and historically the main motivation for its development by Joseph Fourier, is the solution of partial differential equations (PDEs). In particular, we are interested in heat propagation on non-Euclidean domains. This process is governed by the heat diffusion equation, which in the simplest setting of homogeneous and isotropic diffusion has the form

f_t(x, t) = -c ∆f(x, t),   f(x, 0) = f₀(x)  (initial condition),   (35)

with additional boundary conditions if the domain has a boundary. f(x, t) represents the temperature at point x at time t. Equation (35) encodes Newton's law of cooling, according to which the rate of temperature change of a body (lhs) is proportional to the difference between its own temperature and that of its surroundings (rhs). The proportionality coefficient c is referred to as the thermal diffusivity constant; here, we assume it to be equal to one for the sake of simplicity.

The solution of (35) is given by applying the heat operator H^t = e^{-t∆} to the initial condition and can be expressed in the spectral domain as

f(x, t) = e^{-t∆} f₀(x) = Σ_{i≥0} ⟨f₀, φ_i⟩_{L²(X)} e^{-tλ_i} φ_i(x) = ∫_X f₀(x′) Σ_{i≥0} e^{-tλ_i} φ_i(x) φ_i(x′) dx′,   (36)

where the kernel h_t(x, x′) = Σ_{i≥0} e^{-tλ_i} φ_i(x) φ_i(x′) is known as the heat kernel and represents the solution of the heat equation with an initial condition f₀(x) = δ_{x′}(x), or, in signal processing terms, an 'impulse response'. In physical terms, h_t(x, x′) describes how much heat flows from a point x to a point x′ in time t. In the Euclidean case, the heat kernel is shift-invariant, h_t(x, x′) = h_t(x - x′), allowing to interpret the integral in (36) as a convolution f(x, t) = (f₀ ⋆ h_t)(x). In the spectral domain, convolution with the heat kernel amounts to low-pass filtering with frequency response e^{-tλ}. Larger values of diffusion time t result in a lower effective cutoff frequency and thus smoother solutions in space (corresponding to the intuition that longer diffusion smoothes the initial heat distribution more).

The 'cross-talk' between two heat kernels positioned at points x and x′ allows measuring an intrinsic distance

d²_t(x, x′) = ∫_X (h_t(x, y) - h_t(x′, y))² dy   (37)
            = Σ_{i≥0} e^{-2tλ_i} (φ_i(x) - φ_i(x′))²   (38)

referred to as the diffusion distance [30]. Note that, interpreting (37) and (38) as spatial- and frequency-domain norms ‖·‖_{L²(X)} and ‖·‖_{ℓ²}, respectively, their equivalence is a consequence of the Parseval identity. Unlike the geodesic distance, which measures the length of the shortest path on the manifold or graph, the diffusion distance has an effect of averaging over different paths. It is thus more robust to perturbations of the domain, for example, introduction or removal of edges in a graph, or 'cuts' on a manifold.

[FIGS4] Examples of heat kernels on non-Euclidean domains (manifold, top; and graph, bottom). Observe how moving the heat kernel to a different location changes its shape, which is an indication of the lack of shift-invariance.
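The heat kernel (36) and diffusion distance (37)–(38) of insert IN4 can be computed from the Laplacian eigenpairs in a few lines. The sketch below (ours, on a toy path graph with A = I, where the integral in (37) becomes a sum over vertices) also checks the Parseval equivalence of (37) and (38):

```python
import numpy as np

n = 20
W = np.zeros((n, n))
for i in range(n - 1):                        # a toy path graph
    W[i, i + 1] = W[i + 1, i] = 1.0
lam, Phi = np.linalg.eigh(np.diag(W.sum(1)) - W)

def heat_kernel(t):
    """h_t(x, x') = sum_i exp(-t lam_i) phi_i(x) phi_i(x'), eq. (36)."""
    return Phi @ np.diag(np.exp(-t * lam)) @ Phi.T

def diffusion_dist2(t):
    """Matrix of squared diffusion distances d_t^2(x, x'), eq. (38)."""
    U = Phi * np.exp(-t * lam)                # U[x, i] = exp(-t lam_i) phi_i(x)
    diff = U[:, None, :] - U[None, :, :]
    return (diff ** 2).sum(-1)

t = 1.0
h = heat_kernel(t)
# Parseval: the spatial-domain definition (37) agrees with the spectral (38).
spatial = ((h[:, None, :] - h[None, :, :]) ** 2).sum(-1)
assert np.allclose(spatial, diffusion_dist2(t))
# Squaring removes the sign ambiguity of individual eigenvectors discussed
# under 'Uniqueness and stability': flipping any phi_i leaves h_t unchanged.
```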
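Returning to the spectral convolutional layer (39), a minimal forward pass might look as follows (our sketch of the SCNN construction, not the authors' code; shapes follow the text, with the k × k diagonal matrices Γ_{l,l′} stored as length-k vectors). Note the pqk learnable parameters, a count we return to below:

```python
import numpy as np

def scnn_layer(F, Phi_k, Gamma, xi=lambda z: np.maximum(z, 0.0)):
    """One spectral layer (39). F: (n, p) input; Phi_k: (n, k) eigenvectors;
    Gamma: (q, p, k) spectral multipliers; returns (n, q)."""
    n = F.shape[0]
    q = Gamma.shape[0]
    F_hat = Phi_k.T @ F                      # analysis: project onto the eigenbasis
    G = np.zeros((n, q))
    for l in range(q):
        acc = np.zeros(n)
        for lp in range(F.shape[1]):
            acc += Phi_k @ (Gamma[l, lp] * F_hat[:, lp])   # filter + synthesis
        G[:, l] = xi(acc)                    # non-linearity in the spatial domain
    return G

# Toy usage: n = 30 vertices, p = 2 in / q = 4 out channels, k = 5 eigenvectors,
# i.e. p*q*k = 40 learnable spectral coefficients.
rng = np.random.default_rng(5)
n, p, q, k = 30, 2, 4, 5
W = rng.uniform(0, 1, (n, n)); W = np.triu(W, 1); W = W + W.T
_, Phi = np.linalg.eigh(np.diag(W.sum(1)) - W)
out = scnn_layer(rng.standard_normal((n, p)), Phi[:, :k],
                 rng.standard_normal((q, p, k)))
```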

We should stress that a fundamental limitation of the spectral construction is its restriction to a single domain. The reason is that the spectral filter coefficients (39) are basis dependent. It implies that if we learn a filter w.r.t. basis Φ_k on one domain, and then try to apply it on another domain with another basis Ψ_k, the result could be very different (see Figure 2 and insert IN6). It is possible to construct compatible orthogonal bases across different domains resorting to a joint diagonalization procedure [74], [75]. However, such a construction requires the knowledge of some correspondence between the domains. In applications such as social network analysis, for example, when dealing with two time instances of a social graph in which new vertices and edges have been added, such a correspondence can be easily computed and is therefore a reasonable assumption. Conversely, in computer graphics applications, finding correspondence between shapes is in itself a very hard problem, so assuming known correspondence between the domains is a rather unreasonable assumption.

[Fig. 2. A toy example illustrating the difficulty of generalizing spectral filtering across non-Euclidean domains. Left: a function defined on a manifold (function values are represented by color); middle: result of the application of an edge-detection filter in the frequency domain; right: the same filter applied on the same function but on a different (nearly-isometric) domain produces a completely different result. The reason for this behavior is that the Fourier basis is domain-dependent, and the filter coefficients learnt on one domain cannot be applied to another one in a straightforward manner. The panel annotations read: domain X, X, Y; basis Φ, Φ, Ψ; signal f, ΦWΦᵀf, ΨWΨᵀf.]

Assuming that k = O(n) eigenvectors of the Laplacian are kept, a convolutional layer (39) requires pqk = O(n) parameters to train. We will see next how the global and local regularity of the graph can be combined to produce layers with a constant number of parameters (i.e., such that the number of learnable parameters per layer does not depend upon the size of the input), which is the case in classical Euclidean CNNs.

The non-Euclidean analogy of pooling is graph coarsening, in which only a fraction α < 1 of the graph vertices is retained. The eigenvectors of graph Laplacians at two different resolutions are related by the following multigrid property: let Φ, Φ̃ denote the n × n and αn × αn matrices of Laplacian eigenvectors of the original and the coarsened graph, respectively. Then,

Φ̃ ≈ PΦ [ I_{αn} ; 0 ],   (40)

where P is an αn × n binary matrix whose ith row encodes the position of the ith vertex of the coarse graph on the original graph. It follows that strided convolutions can be generalized using the spectral construction by keeping only the low-frequency components of the spectrum. This property also allows us to interpret (via interpolation) the local filters at deeper layers in the spatial construction to be low frequency. However, since in (39) the non-linearity is applied in the spatial domain, in practice one has to recompute the graph Laplacian eigenvectors at each resolution and apply them directly after each pooling step.

The spectral construction (39) assigns a degree of freedom to each eigenvector of the graph Laplacian. In most graphs, individual high-frequency eigenvectors become highly unstable. However, similarly to the wavelet construction in Euclidean domains, by appropriately grouping high-frequency eigenvectors in each octave one can recover meaningful and stable information. As we shall see next, this principle also entails better learning complexity.

Spectral CNN with smooth spectral multipliers [52], [44]: In order to reduce the risk of overfitting, it is important to adapt the learning complexity to reduce the number of free parameters of the model. On Euclidean domains, this is achieved by learning convolutional kernels with small spatial support, which enables the model to learn a number of parameters independent of the input size. In order to achieve a similar learning complexity in the spectral domain, it is thus necessary to restrict the class of spectral multipliers to those corresponding to localized filters.

For that purpose, we have to express the spatial localization of filters in the frequency domain. In the Euclidean case, smoothness in the frequency domain corresponds to spatial decay, since

∫_{-∞}^{+∞} |x|^{2k} |f(x)|² dx = ∫_{-∞}^{+∞} | ∂^k f̂(ω) / ∂ω^k |² dω,   (42)

by virtue of the Parseval identity. This suggests that, in order to learn a layer in which features will be not only shared across locations but also well localized in the original domain, one can learn spectral multipliers which are smooth. Smoothness can be prescribed by learning only a subsampled set of frequency multipliers and using an interpolation kernel, such as cubic splines, to obtain the rest.

However, the notion of smoothness also requires some geometry in the spectral domain. In the Euclidean setting, such a geometry naturally arises from the notion of frequency; for example, in the plane, the similarity between two Fourier atoms e^{iωᵀx} and e^{iω′ᵀx} can be quantified by the distance ‖ω - ω′‖, where x denotes the two-dimensional planar coordinates and ω is the two-dimensional frequency vector. On graphs, such a relation can be defined by means of a dual graph with weights w̃_{ij} encoding the similarity between two eigenvectors φ_i and φ_j. A particularly simple choice consists in choosing a one-dimensional arrangement, obtained by ordering the eigenvectors according to their eigenvalues.⁶
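To make the spectral filtering described above concrete, here is a small illustrative sketch (ours, not the authors' implementation; the function names, the choice of the unnormalized Laplacian, and the use of a cubic spline as the interpolation kernel are assumptions): it filters a graph signal in the Laplacian eigenbasis using smooth multipliers obtained by interpolating a handful of learned coefficients.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def graph_laplacian(W):
    """Unnormalized Laplacian Δ = D − W of a weighted adjacency matrix W."""
    D = np.diag(W.sum(axis=1))
    return D - W

def spectral_filter(f, W, alpha, n_keep=20):
    """Filter a signal f on a graph with adjacency W.

    alpha: a small vector of learned coefficients, interpolated by a cubic
    spline over the kept eigenvalues to produce smooth spectral multipliers
    (smoothness in frequency <-> spatial localization, cf. Eq. (42))."""
    L = graph_laplacian(W)
    lam, Phi = np.linalg.eigh(L)                    # eigenvalues / Fourier basis
    lam, Phi = lam[:n_keep], Phi[:, :n_keep]
    # place the q learned coefficients at equispaced anchor frequencies
    anchors = np.linspace(lam.min(), lam.max(), len(alpha))
    multipliers = CubicSpline(anchors, alpha)(lam)  # smooth g(λ)
    return Phi @ (multipliers * (Phi.T @ f))        # Φ g(Λ) Φᵀ f

# toy usage: a random graph and signal
rng = np.random.default_rng(0)
W = rng.random((50, 50)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
f = rng.standard_normal(50)
g = spectral_filter(f, W, alpha=np.array([1.0, 0.5, 0.1, 0.0]))
```

Because the multipliers vary smoothly with the eigenvalues, the resulting filter decays spatially, mirroring the Euclidean relation (42).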

A particularly simple choice consists in choosing a one-dimensional arrangement, obtained by ordering the eigenvectors according to their eigenvalues.⁶ In this setting, the spectral multipliers are parametrized as

    diag(Γ_{l,l′}) = Bα_{l,l′},    (43)

where B = (b_{ij}) = (β_j(λ_i)) is a k × q fixed interpolation kernel (e.g., β_j(λ) can be cubic splines) and α is a vector of q interpolation coefficients. In order to obtain filters with constant spatial support (i.e., independent of the input size n), one should choose a sampling step γ ∼ n in the spectral domain, which results in a constant number nγ⁻¹ = O(1) of coefficients α_{l,l′} per filter. Therefore, by combining spectral layers with graph coarsening, this model has O(log n) total trainable parameters for inputs of size n, thus recovering the same learning complexity as CNNs on Euclidean grids.

⁶ In the mentioned 2D example, this would correspond to ordering the Fourier basis functions according to the sum of the corresponding frequencies ω₁ + ω₂. Although numerical results on simple low-dimensional graphs show that the 1D arrangement given by the spectrum of the Laplacian is efficient at creating spatially localized filters [52], an open fundamental question is how to define a dual graph on the eigenvectors of the Laplacian in which smoothness (obtained by applying the diffusion operator) corresponds to localization in the original graph.

Even with such a parametrization of the filters, the spectral CNN (39) entails a high computational complexity of performing forward and backward passes, since they require an expensive step of matrix multiplication by Φ_k and Φ_k⊤. While on Euclidean domains such a multiplication can be efficiently carried out in O(n log n) operations using FFT-type algorithms, for general graphs such algorithms do not exist and the complexity is O(n²). We will see next how to alleviate this cost by avoiding the explicit computation of the Laplacian eigenvectors.

[IN5] Rediscovering standard CNNs using correlation kernels: In situations where the graph is constructed from the data, a straightforward choice of the edge weights (11) of the graph is the covariance of the data. Let F denote the input data distribution and

    Σ = E(F − EF)(F − EF)⊤    (41)

be the data covariance matrix. If each point has the same variance σ_{ii} = σ², then diagonal operators on the Laplacian simply scale the principal components of F. In natural images, since their distribution is approximately stationary, the covariance matrix has a circulant structure σ_{ij} ≈ σ_{i−j} and is thus diagonalized by the standard Discrete Cosine Transform (DCT) basis. It follows that the principal components of F roughly correspond to the DCT basis vectors ordered by frequency. Moreover, natural images exhibit a power spectrum E|f̂(ω)|² ∼ |ω|⁻², since nearby pixels are more correlated than far away pixels [14]. As a result, the principal components of the covariance are essentially ordered from low to high frequencies, which is consistent with the standard group structure of the Fourier basis. When applied to natural images represented as graphs with weights defined by the covariance, the spectral CNN construction recovers the standard CNN, without any prior knowledge [76]. Indeed, the linear operators ΦΓ_{l,l′}Φ⊤ in (39) are by the previous argument diagonal in the Fourier basis, hence translation invariant, hence classical convolutions. Furthermore, Section VI explains how spatial subsampling can also be obtained by dropping the last part of the spectrum of the Laplacian, leading to pooling, and ultimately to standard CNNs.

[FIG5a] Two-dimensional embedding of pixels in 16 × 16 image patches using a Euclidean RBF kernel. The RBF kernel is constructed as in (11), by using the covariance σ_{ij} as Euclidean distance between two features. The pixels are embedded in a 2D space using the first two eigenvectors of the resulting graph Laplacian. The colors in the left and right figure represent the horizontal and vertical coordinates of the pixels, respectively. The spatial arrangement of pixels is roughly recovered from correlation measurements.

VI. SPECTRUM-FREE METHODS

A polynomial of the Laplacian acts as a polynomial on the eigenvalues. Thus, instead of explicitly operating in the frequency domain with spectral multipliers as in equation (43), it is possible to represent the filters via a polynomial expansion:

    g_α(∆) = Φ g_α(Λ) Φ⊤,    (44)

corresponding to

    g_α(λ) = Σ_{j=0}^{r−1} α_j λ^j.    (45)

Here α is the r-dimensional vector of polynomial coefficients, and g_α(Λ) = diag(g_α(λ₁), ..., g_α(λ_n)), resulting in filter matrices Γ_{l,l′} = g_{α_{l,l′}}(Λ) whose entries have an explicit form in terms of the eigenvalues.

An important property of this representation is that it automatically yields localized filters, for the following reason. Since the Laplacian is a local operator (working on 1-hop neighborhoods), the action of its jth power is constrained to j-hops. Since the filter is a linear combination of powers of the Laplacian, overall (45) behaves like a diffusion operator limited to r-hops around each vertex.
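The r-hop locality of (45) also suggests how such filters are applied in practice: r sparse matrix-vector products, with no eigendecomposition. The following sketch (our illustration under the assumption of a dense NumPy Laplacian; names such as `polynomial_filter` are ours) makes this explicit.

```python
import numpy as np

def polynomial_filter(L, f, alpha):
    """Apply g_alpha(Δ) f = sum_j alpha[j] * Δ^j f  (Eq. (45)) without any
    eigendecomposition; only repeated (sparse-friendly) mat-vec products."""
    out = np.zeros_like(f)
    p = f.copy()                          # p = Δ^0 f
    for a in alpha:
        out += a * p
        p = L @ p                         # next power: Δ^{j+1} f
    return out

# Because Δ only couples 1-hop neighbours, a degree-(r-1) polynomial filter
# only mixes values within (r-1) hops of each vertex: the filter is localized
# by construction, with r parameters regardless of the graph size n.

# toy usage on a path graph with 6 vertices
W = np.diag(np.ones(5), 1); W = W + W.T
L = np.diag(W.sum(1)) - W
f = np.eye(6)[2]                          # a delta at vertex 2
print(polynomial_filter(L, f, alpha=[0.5, 0.3, 0.2]))   # support within 2 hops
```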

Graph CNN (GCNN) a.k.a. ChebNet [45]: Defferrard et al. used Chebyshev polynomials generated by the recurrence relation

    T_j(λ) = 2λT_{j−1}(λ) − T_{j−2}(λ);   T₀(λ) = 1;   T₁(λ) = λ.    (46)

A filter can thus be parametrized uniquely via an expansion of order r − 1 such that

    g_α(∆̃) = Σ_{j=0}^{r−1} α_j Φ T_j(Λ̃) Φ⊤ = Σ_{j=0}^{r−1} α_j T_j(∆̃),    (47)

where ∆̃ = 2λ_n⁻¹∆ − I and Λ̃ = 2λ_n⁻¹Λ − I denote a rescaling of the Laplacian mapping its eigenvalues from the interval [0, λ_n] to [−1, 1] (necessary since the Chebyshev polynomials form an orthonormal basis in [−1, 1]).

Denoting f̄⁽ʲ⁾ = T_j(∆̃)f, we can use the recurrence relation (46) to compute f̄⁽ʲ⁾ = 2∆̃f̄⁽ʲ⁻¹⁾ − f̄⁽ʲ⁻²⁾ with f̄⁽⁰⁾ = f and f̄⁽¹⁾ = ∆̃f. The computational complexity of this procedure is therefore O(rn) operations and does not require an explicit computation of the Laplacian eigenvectors.

Graph Convolutional Network (GCN) [77]: Kipf and Welling simplified this construction by further assuming r = 2 and λ_n ≈ 2, resulting in filters of the form

    g_α(f) = α₀f + α₁(∆ − I)f = α₀f − α₁D^{−1/2}WD^{−1/2}f.    (48)

Further constraining α = α₀ = −α₁, one obtains filters represented by a single parameter,

    g_α(f) = α(I + D^{−1/2}WD^{−1/2})f.    (49)

Since the eigenvalues of I + D^{−1/2}WD^{−1/2} are now in the range [0, 2], repeated application of such a filter can result in numerical instability. This can be remedied by a renormalization

    g_α(f) = αD̃^{−1/2}W̃D̃^{−1/2}f,    (50)

where W̃ = W + I and D̃ = diag(Σ_{j≠i} w̃_{ij}).

Note that though we arrived at the constructions of ChebNet and GCN starting in the spectral domain, they boil down to applying simple filters acting on the r- or 1-hop neighborhood of the graph in the spatial domain. We consider these constructions to be examples of the more general Graph Neural Network (GNN) framework:

Graph Neural Network (GNN) [78]: Graph Neural Networks generalize the notion of applying the filtering operations directly on the graph via the graph weights. Similarly to the way Euclidean CNNs learn generic filters as linear combinations of localized, oriented bandpass and lowpass filters, a Graph Neural Network learns at each layer a generic linear combination of graph low-pass and high-pass operators. These are given respectively by f ↦ Wf and f ↦ ∆f, and are thus generated by the degree matrix D and the diffusion matrix W. Given a p-dimensional input signal on the vertices of the graph, represented by the n × p matrix F, the GNN considers a generic nonlinear function η_θ: R^p × R^p → R^q, parametrized by trainable parameters θ, that is applied to all nodes of the graph,

    g_i = η_θ((Wf)_i, (Df)_i).    (51)

In particular, choosing η(a, b) = b − a one recovers the Laplacian operator ∆f, but more general, nonlinear choices for η yield trainable, task-specific diffusion operators. Similarly to a CNN architecture, one can stack the resulting GNN layers g = C_θ(f) and interleave them with graph pooling operators. Chebyshev polynomials T_r(∆) can be obtained with r layers of (51), making it possible, in principle, to consider ChebNet and GCN as particular instances of the GNN framework.

Historically, a version of GNN was the first formulation of deep learning on graphs, proposed in [49], [78]. These works optimized over the parameterized steady state of some diffusion process (or random walk) on the graph. This can be interpreted as in equation (51), but using a large number of layers where each C_θ is identical, as the forward passes through the C_θ approximate the steady state. Recent works [55], [50], [51], [79], [80] relax the requirement of approaching the steady state or of using repeated applications of the same C_θ.

Because the communication at each layer is local to a vertex neighborhood, one may worry that it would take many layers to get information from one part of the graph to another, requiring multiple hops (indeed, this was one of the reasons for the use of the steady state in [78]). However, for many applications, it is not necessary for information to completely traverse the graph. Furthermore, note that the graphs at each layer of the network need not be the same. Thus we can replace the original neighborhood structure with one's favorite multi-scale coarsening of the input graph, and operate on that to obtain the same flow of information as with the convolutional nets above (or rather more like a "locally connected network" [81]). This also allows producing a single output for the whole graph (for "translation-invariant" tasks), rather than a per-vertex output, by connecting each vertex to a special output node. Alternatively, one can allow η to use not only Wf and ∆f at each node, but also W^s f for several diffusion scales s > 1 (as in [45]), giving the GNN the ability to learn algorithms such as the power method, and to more directly access spectral properties of the graph.

The GNN model can be further generalized to replicate other operators on graphs. For instance, the point-wise nonlinearity η can depend on the vertex type, allowing extremely rich architectures [55], [50], [51], [79], [80].
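As a concrete illustration of the single-parameter-per-filter GCN construction (48)-(50), the following minimal layer (our sketch, not the reference code of [77]; the trainable weight matrix `Theta` and the ReLU nonlinearity are conventional assumptions) renormalizes the adjacency once and applies one 1-hop propagation step.

```python
import numpy as np

def gcn_layer(W, F, Theta):
    """One GCN propagation step: relu( D̃^{-1/2} W̃ D̃^{-1/2} F Θ ), cf. Eq. (50).
    W: n x n adjacency, F: n x p input features, Theta: p x q trainable weights.
    Degrees are taken over all entries of W̃ (the common implementation;
    (50) as written sums over j != i)."""
    W_tilde = W + np.eye(W.shape[0])            # add self-loops: W̃ = W + I
    d = W_tilde.sum(axis=1)                     # D̃ degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ W_tilde @ D_inv_sqrt   # renormalized 1-hop operator
    return np.maximum(A_hat @ F @ Theta, 0.0)   # ReLU nonlinearity

# usage: two stacked layers on a toy graph give each vertex a 2-hop receptive field
rng = np.random.default_rng(1)
W = (rng.random((10, 10)) < 0.3).astype(float); W = np.triu(W, 1); W = W + W.T
F = rng.standard_normal((10, 5))
H = gcn_layer(W, F, rng.standard_normal((5, 8)))
Y = gcn_layer(W, H, rng.standard_normal((8, 3)))
```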

VII. CHARTING-BASED METHODS

We will now consider the second sub-class of non-Euclidean learning problems, where we are given multiple domains. A prototypical application the reader should have in mind throughout this section is the problem of finding correspondence between shapes, modeled as manifolds (see insert IN7). As we have seen, defining convolution in the frequency domain has an inherent drawback: the inability to adapt the model across different domains. We will therefore need to resort to an alternative generalization of the convolution in the spatial domain that does not suffer from this drawback.

Furthermore, note that in the setting of multiple domains, there is no immediate way to define a meaningful spatial pooling operation, as the number of points on different domains can vary, and their order may be arbitrary. It is however possible to pool point-wise features produced by a network by aggregating all the local information into a single vector. One possibility for such a pooling is computing the statistics of the point-wise features, e.g. the mean or covariance [47]. Note that after such a pooling all the spatial information is lost.

On a Euclidean domain, due to shift-invariance the convolution can be thought of as passing a template at each point of the domain and recording the correlation of the template with the function at that point. Thinking of image filtering, this amounts to extracting a (typically square) patch of pixels, multiplying it element-wise with a template and summing up the results, then moving to the next position in a sliding window manner. Shift-invariance implies that the very operation of extracting the patch at each position is always the same.

One of the major problems in applying the same paradigm to non-Euclidean domains is the lack of shift-invariance, implying that the 'patch operator' extracting a local 'patch' would be position-dependent. Furthermore, the typical lack of a meaningful global parametrization for a graph or manifold forces one to represent the patch in some local intrinsic system of coordinates. Such a mapping can be obtained by defining a set of weighting functions v₁(x, ·), ..., v_J(x, ·) localized to positions near x (see examples in Figure 3). Extracting a patch amounts to averaging the function f at each point by these weights,

    D_j(x)f = ∫_X f(x′) v_j(x, x′) dx′,   j = 1, ..., J,    (52)

providing for a spatial definition of an intrinsic equivalent of convolution

    (f ⋆ g)(x) = Σ_j g_j D_j(x)f,    (53)

where g denotes the template coefficients applied on the patch extracted at each point. Overall, (52)–(53) act as a kind of non-linear filtering of f, and the patch operator D is specified by defining the weighting functions v₁, ..., v_J. Such filters are localized by construction, and the number of parameters is equal to the number of weighting functions J = O(1). Several frameworks for non-Euclidean CNNs essentially amount to different choices of these weights. The spectrum-free methods (ChebNet and GCN) described in the previous section can also be thought of in terms of local weighting functions, as it is easy to see the analogy between formulae (53) and (47).

Geodesic CNN [47]: Since manifolds naturally come with a low-dimensional tangent space associated with each point, it is natural to work in a local system of coordinates in the tangent space. In particular, on two-dimensional manifolds one can create a polar system of coordinates around x where the radial coordinate is given by some intrinsic distance ρ(x′) = d(x, x′), and the angular coordinate θ(x′) is obtained by ray shooting from a point at equi-spaced angles. The weighting functions in this case can be obtained as a product of Gaussians

    v_{ij}(x, x′) = e^{−(ρ(x′)−ρ_i)²/2σ_ρ²} e^{−(θ(x′)−θ_j)²/2σ_θ²},    (54)

where i = 1, ..., J and j = 1, ..., J′ denote the indices of the radial and angular bins, respectively. The resulting JJ′ weights are bins of width σ_ρ × σ_θ in the polar coordinates (Figure 3, right).

Anisotropic CNN [48]: We have already seen the non-Euclidean heat equation (35), whose heat kernel h_t(x, ·) produces localized blob-like weights around the point x (see FIGS4). Varying the diffusion time t controls the spread of the kernel. However, such kernels are isotropic, meaning that the heat flows equally fast in all the directions. A more general anisotropic diffusion equation on a manifold

    f_t(x, t) = −div(A(x)∇f(x, t)),    (55)

involves the thermal conductivity tensor A(x) (in the case of two-dimensional manifolds, a 2 × 2 matrix applied to the intrinsic gradient in the tangent plane at each point), allowing modeling of heat flow that is position- and direction-dependent [82]. A particular choice of the heat conductivity tensor proposed in [53] is

    A_{αθ}(x) = R_θ(x) diag(α, 1) R_θ⊤(x),    (56)

where the 2 × 2 matrix R_θ(x) performs rotation by θ w.r.t. some reference (e.g. the maximum curvature) direction and α > 0 is a parameter controlling the degree of anisotropy (α = 1 corresponds to the classical isotropic case). The heat kernel of such an anisotropic diffusion equation is given by the spectral expansion

    h_{αθt}(x, x′) = Σ_{i≥0} e^{−tλ_{αθi}} φ_{αθi}(x) φ_{αθi}(x′),    (57)

where φ_{αθ0}(x), φ_{αθ1}(x), ... are the eigenfunctions and λ_{αθ0}, λ_{αθ1}, ... the corresponding eigenvalues of the anisotropic Laplacian

    ∆_{αθ}f(x) = −div(A_{αθ}(x)∇f(x)).    (58)

The discretization of the anisotropic Laplacian is a modification of the cotangent formula (14) on meshes or of the graph Laplacian (11) on point clouds [48].

The anisotropic heat kernels h_{αθt}(x, ·) look like elongated rotated blobs (see Figure 3, center), where the parameters α, θ and t control the elongation, orientation, and scale, respectively. Using such kernels as weighting functions v in the construction of the patch operator (52), it is possible to obtain a charting similar to the geodesic patches (roughly, θ plays the role of the angular coordinate and t of the radial one).

Mixture model network (MoNet) [54]: Finally, as the most general construction of patches, Monti et al. [54] proposed defining at each point a local system of d-dimensional pseudo-coordinates u(x, x′) around x. On these coordinates, a set of parametric kernels v₁(u), ..., v_J(u) is applied, producing the weighting functions in (52).
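The patch operator (52) with geodesic polar weights (54) can be sketched in a few lines. The snippet below is purely illustrative: it assumes the local polar coordinates (ρ, θ) of a point's neighbours are already available (on a mesh they would come from geodesic distances and ray shooting, which we do not model here), and the bin widths are arbitrary choices.

```python
import numpy as np

def polar_patch_weights(rho, theta, rho_bins, theta_bins, s_rho=0.2, s_theta=0.5):
    """Gaussian weights v_ij of Eq. (54) for one point: rho/theta are the local
    polar coordinates of its neighbours."""
    w = (np.exp(-(rho[:, None] - rho_bins[None, :])**2 / (2 * s_rho**2))[:, :, None]
         * np.exp(-(theta[:, None] - theta_bins[None, :])**2 / (2 * s_theta**2))[:, None, :])
    return w.reshape(len(rho), -1)                    # neighbours x (J * J') bins

def intrinsic_conv(f_neigh, weights, g):
    """Eqs. (52)-(53): average f over each bin, then correlate with template g."""
    patch = weights.T @ f_neigh / (weights.sum(axis=0) + 1e-9)   # D_j(x) f
    return patch @ g                                             # (f * g)(x)

# toy usage with made-up local coordinates of 30 neighbours of one point
rng = np.random.default_rng(2)
rho, theta = rng.random(30), rng.uniform(0, 2 * np.pi, 30)
w = polar_patch_weights(rho, theta, np.linspace(0, 1, 5), np.linspace(0, 2 * np.pi, 8))
print(intrinsic_conv(rng.standard_normal(30), w, rng.standard_normal(40)))
```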

Rather than using fixed kernels as in the previous constructions, Monti et al. use Gaussian kernels

    v_j(u) = exp(−½ (u − μ_j)⊤ Σ_j⁻¹ (u − μ_j)),

whose parameters (d × d covariance matrices Σ₁, ..., Σ_J and d × 1 mean vectors μ₁, ..., μ_J) are learned.⁷ Learning not only the filters but also the patch operators in (53) affords additional degrees of freedom to the MoNet architecture, which makes it currently the state-of-the-art approach in several applications. It is also easy to see that this approach generalizes the previous models, and e.g. classical Euclidean CNNs as well as Geodesic and Anisotropic CNNs can be obtained as particular instances thereof [54]. MoNet can also be applied on general graphs using as the pseudo-coordinates u some local graph features such as vertex degree, geodesic distance, etc.

⁷ This choice allows interpreting the intrinsic convolution (53) as a mixture of Gaussians, hence the name of the approach.

Fig. 3. Top: examples of intrinsic weighting functions used to construct a patch operator at the point marked in black (different colors represent different weighting functions). Diffusion distance (left) allows mapping neighbor points according to their distance from the reference point, thus defining a one-dimensional system of local intrinsic coordinates. Anisotropic heat kernels (middle) of different scales and orientations and geodesic polar weights (right) are two-dimensional systems of coordinates. Bottom: representation of the weighting functions in the local polar (ρ, θ) system of coordinates (red curves represent the 0.5 level set). [Panels: Diffusion distance | Anisotropic heat kernel | Geodesic polar coordinates]

VIII. COMBINED SPATIAL/SPECTRAL METHODS

The third alternative for constructing convolution-like operations on non-Euclidean domains is to work jointly in the spatial-frequency domain.

Windowed Fourier transform: One of the notable drawbacks of classical Fourier analysis is its lack of spatial localization. By virtue of the Uncertainty Principle, one of the fundamental properties of Fourier transforms, spatial localization comes at the expense of frequency localization, and vice versa. In classical signal processing, this problem is remedied by localizing frequency analysis in a window g(x), leading to the definition of the Windowed Fourier Transform (WFT, also known as the short-time Fourier transform or spectrogram in signal processing),

    (Sf)(x, ω) = ∫_{−∞}^{+∞} f(x′) g(x′ − x) e^{−iωx′} dx′    (59)
               = ⟨f, g_{x,ω}⟩_{L²(R)},    (60)

where g_{x,ω}(x′) = g(x′ − x) e^{−iωx′} denotes the integrand window. The WFT is a function of two variables: the spatial location of the window x and the modulation frequency ω. The choice of the window function g allows one to control the tradeoff between spatial and frequency localization (wider windows result in better frequency resolution). Note that the WFT can be interpreted as inner products (60) of the function f with translated and modulated windows g_{x,ω}, referred to as the WFT atoms.

The generalization of such a construction to non-Euclidean domains requires the definition of translation and modulation operators [83]. While modulation simply amounts to multiplication by a Laplacian eigenfunction, translation is not well-defined due to the lack of shift-invariance. It is possible to resort again to the spectral definition of a convolution-like operation (34), defining translation as convolution with a delta-function,

    (g ⋆ δ_{x′})(x) = Σ_{i≥0} ⟨g, φ_i⟩_{L²(X)} ⟨δ_{x′}, φ_i⟩_{L²(X)} φ_i(x) = Σ_{i≥0} ĝ_i φ_i(x′) φ_i(x).    (61)

The translated and modulated atoms can be expressed as

    g_{x′,j}(x) = φ_j(x) Σ_{i≥0} ĝ_i φ_i(x) φ_i(x′),    (62)

where the window is specified in the spectral domain by its Fourier coefficients ĝ_i; the WFT on non-Euclidean domains thus takes the form

    (Sf)(x′, j) = ⟨f, g_{x′,j}⟩_{L²(X)} = Σ_{i≥0} ĝ_i φ_i(x′) ⟨f, φ_i φ_j⟩_{L²(X)}.    (63)

Due to the intrinsic nature of all the quantities involved in its definition, the WFT is also intrinsic.

Wavelets: Replacing the notion of frequency in time-frequency representations by that of scale leads to wavelet decompositions. Wavelets have been extensively studied in general graph domains [84]. Their objective is to define stable linear decompositions with atoms well localized both in space and frequency that can efficiently approximate signals with isolated singularities. Similarly to the Euclidean setting, wavelet families can be constructed either from their spectral constraints or from their spatial constraints.

The simplest of such families are Haar wavelets. Several bottom-up wavelet constructions on graphs were studied in [85] and [86]. In [87], the authors developed an unsupervised method that learns wavelet decompositions on graphs by optimizing a sparse reconstruction objective. In [88], ensembles of Haar wavelet decompositions were used to define deep wavelet scattering transforms on general domains, obtaining excellent numerical performance.
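A toy rendering of the non-Euclidean windowed Fourier transform (61)-(63) described above (our sketch; the heat-kernel-like spectral window ĝ_i = e^{−τλ_i} is just one plausible choice, not prescribed by the text):

```python
import numpy as np

def graph_wft(f, W, tau=1.0, n_modes=10):
    """Windowed Fourier transform on a graph, cf. Eqs. (61)-(63):
    (Sf)(x', j) = sum_i g_hat_i * phi_i(x') * <f, phi_i phi_j>."""
    L = np.diag(W.sum(axis=1)) - W                 # unnormalized Laplacian
    lam, phi = np.linalg.eigh(L)
    lam, phi = lam[:n_modes], phi[:, :n_modes]
    g_hat = np.exp(-tau * lam)                     # spectral window coefficients
    C = phi.T @ (f[:, None] * phi)                 # C[i, j] = <f, phi_i phi_j>
    return phi @ (g_hat[:, None] * C)              # vertices x modes

# toy usage on a ring graph
n = 40
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx + 1) % n] = W[(idx + 1) % n, idx] = 1.0
f = np.sin(2 * np.pi * idx / n) * (idx < n // 2)   # a localized oscillation
S = graph_wft(f, W, tau=2.0, n_modes=8)
print(S.shape)                                     # (40, 8) vertex-frequency map
```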

Learning amounts to finding optimal pairings of nodes at each scale, which can be efficiently solved in polynomial time.

Localized Spectral CNN (LSCNN) [89]: Boscaini et al. used the WFT as a way of constructing patch operators (52) on manifolds and point clouds and employed it in an intrinsic convolution-like construction (53). The WFT allows expressing a function around a point in the spectral domain in the form D_j(x)f = (Sf)(x, j). Applying learnable filters to such 'patches' (which in this case can be interpreted as spectral multipliers), it is possible to extract meaningful features that also appear to generalize across different domains. An additional degree of freedom is the definition of the window, which can also be learned [89].

Dichotomy of Geometric deep learning methods

    Method                 Domain        Data
    Spectral CNN [52]      spectral      graph
    GCNN/ChebNet [45]      spec. free    graph
    GCN [77]               spec. free    graph
    GNN [78]               spec. free    graph
    Geodesic CNN [47]      charting      mesh
    Anisotropic CNN [48]   charting      mesh/point cloud
    MoNet [54]             charting      graph/mesh/point cloud
    LSCNN [89]             combined      mesh/point cloud

IX. APPLICATIONS

Network analysis: One of the classical examples used in many works on network analysis are citation networks. A citation network is a graph where vertices represent papers and there is a directed edge (i, j) if paper i cites paper j. Typically, vertex-wise features representing the content of the paper (e.g. a histogram of frequent terms in the paper) are available. A prototypical classification application is to attribute each paper to a field. Traditional approaches work vertex-wise, performing classification of each vertex's feature vector individually. More recently, it was shown that classification can be considerably improved using information from neighbor vertices, e.g. with a CNN on graphs [45], [77]. Insert IN6 shows an example of the application of spectral and spatial graph CNN models on a citation network.

[IN6] Citation network analysis application: The CORA citation network [90] is a graph containing 2708 vertices representing papers and 5429 edges representing citations. Each paper is described by a 1433-dimensional bag-of-words feature vector and belongs to one of seven classes. For simplicity, the network is treated as an undirected graph. Applying the spectral CNN with two spectral convolutional layers parametrized according to (50), the authors of [77] obtained a classification accuracy of 81.6% (compared to the 75.7% previous best result). In [54], this result was slightly improved further, reaching 81.7% accuracy with the use of the MoNet architecture.

[FIGS6a] Classifying research papers in the CORA dataset with MoNet. Shown is the citation graph, where each node is a paper, and an edge represents a citation. Vertex fill and outline colors represent the predicted and groundtruth labels, respectively; ideally, the two colors should coincide. (Figure reproduced from [54].)

Another fundamental problem in network analysis is ranking and community detection. These can be estimated by solving an eigenvalue problem on an appropriately defined operator on the graph: for instance, the Fiedler vector (the eigenvector associated with the smallest non-trivial eigenvalue of the Laplacian) carries information on the graph partition with minimal cut [73], and the popular PageRank algorithm approximates page ranks with the principal eigenvector of a modified Laplacian operator. In some contexts, one may want to develop data-driven versions of such algorithms that can adapt to model mismatch and perhaps provide a faster alternative to diagonalization methods. By unrolling power iterations, one obtains a Graph Neural Network architecture whose parameters can be learnt with backpropagation from labeled examples, similarly to the Learnt Sparse Coding paradigm [91]. We are currently exploring this connection by constructing multiscale versions of graph neural networks.
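For illustration of the spectral quantities mentioned above (this is not an experiment from the article), the Fiedler vector of a graph Laplacian can be computed directly and its sign pattern used as a two-way community assignment:

```python
import numpy as np

def fiedler_partition(W):
    """Two-way graph partition from the Fiedler vector: the eigenvector of the
    Laplacian associated with the smallest non-trivial eigenvalue [73]."""
    L = np.diag(W.sum(axis=1)) - W
    lam, phi = np.linalg.eigh(L)
    fiedler = phi[:, 1]                 # lam[0] ~ 0 corresponds to the trivial eigenvector
    return fiedler >= 0                 # boolean community assignment

# toy usage: two dense blocks joined by a single bridge edge
rng = np.random.default_rng(3)
A = (rng.random((20, 20)) < 0.6).astype(float)
B = (rng.random((20, 20)) < 0.6).astype(float)
W = np.block([[A, np.zeros((20, 20))], [np.zeros((20, 20)), B]])
W[0, 20] = 1.0                          # the bridge
W = np.triu(W, 1); W = W + W.T          # symmetrize, remove self-loops
print(fiedler_partition(W).astype(int)) # ~ first 20 vertices vs last 20
```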

Recommender systems: Recommending movies on Netflix, friends on Facebook, or products on Amazon are a few examples of recommender systems that have recently become ubiquitous in a broad range of applications. Mathematically, a recommendation method can be posed as a matrix completion problem [92], where columns and rows represent users and items, respectively, and matrix values represent a score determining whether a user would like an item or not. Given a small subset of known elements of the matrix, the goal is to fill in the rest. A famous example is the Netflix challenge [93] offered in 2009 and carrying a 1M$ prize for the algorithm that can best predict user ratings for movies based on previous ratings. The size of the Netflix matrix is 480K movies × 18K users (8.5B elements), with only 0.011% known entries.

Several recent works proposed to incorporate geometric structure into matrix completion problems [94], [95], [96], [97] in the form of column and row graphs representing the similarity of users and items, respectively (see Figure 4). Such a geometric matrix completion setting makes meaningful, for example, the notion of smoothness of the matrix values, and was shown beneficial for the performance of recommender systems.

In a recent work, Monti et al. [56] proposed addressing the geometric matrix completion problem by means of a learnable model combining a Multi-Graph CNN (MGCNN) and a recurrent neural network (RNN). Multi-graph convolution can be thought of as a generalization of the standard bi-dimensional image convolution, where the domains of the rows and the columns are now different (in our case, user and item graphs). The features extracted from the score matrix by means of the MGCNN are then passed to an RNN, which produces a sequence of incremental updates of the score values. Overall, the model can be considered as a learnable diffusion of the scores, with the main advantage compared to the traditional approach being a fixed number of variables independent of the matrix size. MGCNN achieved state-of-the-art results on several classical matrix completion challenges and, on a more conceptual level, could be a very interesting practical application of geometric deep learning to a classical signal processing problem.

Fig. 4. Geometric matrix completion exemplified on the famous Netflix movie recommendation problem. The column and row graphs represent the relationships between users and items, respectively. [The figure shows the m items × n users score matrix coupled with a row graph over items i₁, i₂, ... and a column graph over users j₁, j₂, j₃, ...]

Computer vision and graphics: The computer vision community has recently shown an increasing interest in working with 3D geometric data, mainly due to the emergence of affordable range sensing technology such as Microsoft Kinect or Intel RealSense. Many machine learning techniques successfully working on images were tried "as is" on 3D geometric data, represented for this purpose in some way "digestible" by standard frameworks, e.g. as range images [98], [99] or rasterized volumes [100], [101]. The main drawback of such approaches is their treatment of geometric data as Euclidean structures. First, for complex 3D objects, Euclidean representations such as depth images or voxels may lose significant parts of the object or its fine details, or even break its topological structure. Second, Euclidean representations are not intrinsic, and vary when changing pose or deforming the object. Achieving invariance to shape deformations, a common requirement in many vision applications, demands very complex models and huge training sets due to the large number of degrees of freedom involved in describing non-rigid deformations (Figure 5, left).

Fig. 5. Illustration of the difference between a classical CNN (left) applied to a 3D shape (checkered surface) considered as a Euclidean object, and a geometric CNN (right) applied intrinsically on the surface. In the latter case, the convolutional filters (visualized as a colored window marked FILTER) are deformation-invariant by construction. [Panels: Euclidean CNN | Geometric CNN]

In the domain of computer graphics, on the other hand, working intrinsically with geometric shapes is a standard practice. In this field, 3D shapes are typically modeled as Riemannian manifolds and are discretized as meshes. Numerous studies (see, e.g. [102], [103], [104], [105], [106]) have been devoted to designing local and global features, e.g. for establishing similarity or correspondence between deformable shapes with guaranteed invariance to isometries.

Furthermore, different applications in computer vision and graphics may require completely different features: for instance, in order to establish feature-based correspondence between a collection of human shapes, one would desire the descriptors of corresponding anatomical parts (noses, mouths, etc.) to be as similar as possible across the collection. In other words, such descriptors should be invariant to the collection variability. Conversely, for shape classification, one would like descriptors that emphasize the subject-specific characteristics, and, for example, distinguish between two different nose shapes (see Figure 6). Deciding a priori which structures should be used and which should be ignored is often hard or sometimes even impossible. Moreover, axiomatic modeling of geometric noise such as 3D scanning artifacts turns out to be extremely hard.
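To give a flavor of how the row and column graphs of Figure 4 enter the computation, the following simplified sketch (ours; a crude stand-in for the models of [94]-[97] and the MGCNN+RNN architecture of [56], with learning rate and regularization weight chosen arbitrarily) completes a matrix by penalizing its Dirichlet energy on both graphs:

```python
import numpy as np

def geometric_matrix_completion(M, mask, L_rows, L_cols, lam=0.1, lr=0.01, iters=500):
    """Recover X from the known entries of M (where mask == 1) while encouraging
    smoothness on the row and column graphs:
        min_X ||mask*(X - M)||^2 + lam*( tr(X^T L_rows X) + tr(X L_cols X^T) ).
    Plain gradient descent; a didactic sketch only."""
    X = np.where(mask, M, 0.0)
    for _ in range(iters):
        grad = 2 * mask * (X - M) + 2 * lam * (L_rows @ X + X @ L_cols)
        X -= lr * grad
    return X

# toy usage: random low-rank scores with ring graphs over rows and columns
def ring_laplacian(n):
    W = np.zeros((n, n)); i = np.arange(n)
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
    return np.diag(W.sum(1)) - W

rng = np.random.default_rng(4)
M = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
mask = (rng.random(M.shape) < 0.3).astype(float)
X = geometric_matrix_completion(M, mask, ring_laplacian(30), ring_laplacian(20))
```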

Fig. 6. Left: features used for shape correspondence should ideally manifest invariance across the shape class (e.g., the "knee feature" shown here should not depend on the specific person). Right: on the contrary, features used for shape retrieval should be specific to a shape within the class to allow distinguishing between different people. Similar features are marked with the same color. Hand-crafting the right feature for each application is a very challenging task. [Panels: Correspondence | Similarity]

By resorting to intrinsic deep neural networks, the invariance to isometric deformations is automatically built into the model, thus vastly reducing the number of degrees of freedom required to describe the invariance class. Roughly speaking, the intrinsic deep model will try to learn 'residual' deformations that deviate from the isometric model. Geometric deep learning can be applied to several problems in 3D shape analysis, which can be divided into two classes. First, problems such as local descriptor learning [47], [53] or correspondence learning [48] (see example in insert IN7), in which the output of the network is point-wise. The inputs to the network are some point-wise features, for example, color texture or simple geometric features such as normals. Using a CNN architecture with multiple intrinsic convolutional layers, it is possible to produce non-local features that capture the context around each point. The second type of problems, such as shape recognition, requires the network to produce a global shape descriptor, aggregating all the local information into a single vector using e.g. covariance pooling [47].

Particle physics and Chemistry: Many areas of experimental science are interested in studying systems of discrete particles defined over a low-dimensional phase space. For instance, the chemical properties of a molecule are determined by the relative positions of its atoms, and the classification of events in particle accelerators depends upon the position, momentum, and spin of all the particles involved in the collision. The behavior of an N-particle system is ultimately derived from solutions of the Schrödinger equation, but its exact solution involves diagonalizing a linear system of exponential size. In this context, an important question is whether one can approximate the dynamics with a tractable model that incorporates by construction the geometric stability postulated by the Schrödinger equation, and at the same time has enough flexibility to adapt to data-driven scenarios and capture complex interactions. An instance l of an N_l-particle system can be expressed as

    f_l(t) = Σ_{j=1}^{N_l} α_{j,l} δ(t − x_{j,l}),

where (α_{j,l}) model particle-specific information such as the spin, and (x_{j,l}) are the locations of the particles in a given phase space. Such a system can be recast as a signal defined over a graph with |V_l| = N_l vertices and edge weights W_l = (φ(α_{i,l}, α_{j,l}, x_{i,l}, x_{j,l})) expressed through a similarity kernel capturing the appropriate priors. Graph neural networks are currently being applied to perform event classification, energy regression, and anomaly detection in high-energy physics experiments such as the Large Hadron Collider (LHC) and neutrino detection in the IceCube Observatory. Recently, models based on graph neural networks have been applied to predict the dynamics of N-body systems [111], [112], showing excellent prediction performance.

Molecule design: A key problem in material and drug design is predicting the physical, chemical, or biological properties of a novel molecule (such as solubility or toxicity) from its structure. State-of-the-art methods rely on hand-crafted molecule descriptors such as circular fingerprints [113], [114], [115]. A recent work from Harvard University [55] proposed modeling molecules as graphs (where vertices represent atoms and edges represent chemical bonds) and employing graph convolutional neural networks to learn the desired molecule properties. Their approach has significantly outperformed hand-crafted features. This work opens a new avenue in molecule design that might revolutionize the field.

Medical imaging: An application area where signals are naturally collected on non-Euclidean domains and where the methodologies we reviewed could be very useful is brain imaging. A recent trend in neuroscience is to associate functional MRI traces with a pre-computed connectivity rather than inferring it from the traces themselves [116]. In this case, the challenge consists in processing and analyzing an array of signals collected over a complex topology, which results in subtle dependencies. In a recent work from Imperial College [117], graph CNNs were used to detect disruptions of the brain functional networks associated with autism.
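A minimal sketch of the graph construction described above for particle systems and molecules (our illustration; the Gaussian similarity kernel and its bandwidth are assumptions, since the text leaves the kernel φ unspecified):

```python
import numpy as np

def particle_graph(x, alpha, sigma=1.0):
    """Build edge weights W_ij = phi(alpha_i, alpha_j, x_i, x_j) for an
    N-particle instance, here using a Gaussian kernel on phase-space distance
    scaled by the product of particle attributes (one plausible choice of phi).
    x: N x d particle positions, alpha: N particle attributes (e.g. spin, charge)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma**2)) * np.outer(alpha, alpha)
    np.fill_diagonal(W, 0.0)                              # no self-loops
    return W

# toy usage: 5 particles in a 3D phase space; the vertex signal is alpha itself
rng = np.random.default_rng(5)
x, alpha = rng.standard_normal((5, 3)), rng.random(5)
W = particle_graph(x, alpha)      # feed W and alpha to a GNN layer, e.g. Eq. (51)
```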

X. OPEN PROBLEMS AND FUTURE DIRECTIONS

The recent emergence of geometric deep learning methods in various communities and application domains, which we tried to overview in this paper, allows us to proclaim, perhaps with some caution, that we might be witnessing a new field being born. We expect the following years to bring exciting new approaches and results, and conclude our review with a few observations of current key difficulties and potential directions of future research.

Many disciplines dealing with geometric data employ some empirical models or "handcrafted" features. This is a typical situation in geometry processing and computer graphics, where axiomatically-constructed features are used to analyze 3D shapes, or computational sociology, where it is common to first come up with a hypothesis and then test it on the data [22]. Yet, such models assume some prior knowledge (e.g. an isometric shape deformation model), and often fail to correctly capture the full complexity and richness of the data. In computer vision, departing from "handcrafted" features towards generic models learnable from the data in a task-specific manner has brought a breakthrough in performance and led to an overwhelming trend in the community to favor deep learning methods. Such a shift has not occurred yet in the fields dealing with geometric data due to the lack of adequate methods, but there are first indications of a coming paradigm shift.

Generalization: Generalizing deep learning models to geometric data requires not only finding non-Euclidean counterparts of basic building blocks (such as convolutional and pooling layers), but also generalization across different domains. Generalization capability is a key requirement in many applications, including computer graphics, where a model is learned on a training set of non-Euclidean domains (3D shapes) and then applied to previously unseen ones. The spectral formulation of convolution allows designing CNNs on a graph, but the model learned this way on one graph cannot be straightforwardly applied to another one, since the spectral representation of convolution is domain-dependent. A possible remedy to the generalization problem of spectral methods is the recent architecture proposed in [118], applying the idea of spatial transformer networks [119] in the spectral domain. This approach is reminiscent of the construction of compatible orthogonal bases by means of joint Laplacian diagonalization [75], which can be interpreted as an alignment of two Laplacian eigenbases in a k-dimensional space.

[IN7] 3D shape correspondence application: Finding intrinsic correspondence between deformable shapes is a classical tough problem that underlies a broad range of vision and graphics applications, including texture mapping, animation, editing, and scene understanding [107]. From the machine learning standpoint, correspondence can be thought of as a classification problem, where each point on the query shape is assigned to one of the points on a reference shape (serving as a "label space") [108]. It is possible to learn the correspondence with a deep intrinsic network applied to some input feature vector f(x) at each point x of the query shape X, producing an output UΘ(f(x))(y), which is interpreted as the conditional probability p(y|x) of x being mapped to y. Using a training set of points with their ground-truth correspondence {x_i, y_i}_{i∈I}, supervised learning is performed by minimizing the multinomial regression loss

    min_Θ − Σ_{i∈I} log UΘ(f(x_i))(y_i)    (64)

w.r.t. the network parameters Θ. The loss penalizes the deviation of the predicted correspondence from the groundtruth. We note that, while producing impressive results, such an approach essentially learns point-wise correspondence, which then has to be post-processed in order to satisfy certain properties such as smoothness or bijectivity. Correspondence is an example of structured output, where the output of the network at one point depends on the output at other points (in the simplest setting, correspondence should be smooth, i.e., the output at nearby points should be similar). Litany et al. [109] proposed intrinsic structured prediction of shape correspondence by integrating a layer computing functional correspondence [106] into the deep neural network.

[FIGS7a] Learning shape correspondence: an intrinsic deep network UΘ is applied point-wise to some input features defined at each point. The output of the network at each point x of the query shape X is a probability distribution over the reference shape Y that can be thought of as a soft correspondence.

[FIGS7b] Intrinsic correspondence established between human shapes using an intrinsic deep architecture (MoNet [54] with three convolutional layers). SHOT descriptors capturing the local normal vector orientations [110] were used in this example as input features. The correspondence is visualized by transferring texture from the leftmost reference shape. For additional examples, see [54].
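The correspondence-as-classification setup of insert IN7 reduces, per point, to a softmax cross-entropy. The sketch below (ours; a plain two-layer perceptron stands in for the intrinsic network UΘ, which in the cited works would be a geodesic, anisotropic, or MoNet CNN) evaluates the multinomial regression loss (64):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def correspondence_loss(F, Y, params):
    """Eq. (64): minus log-probability of the ground-truth reference point y_i
    under the per-point output distribution U_Theta(f(x_i)).
    F: N x d input point descriptors, Y: N ground-truth indices on the reference
    shape, params: (W1, b1, W2, b2) of a stand-in two-layer network."""
    W1, b1, W2, b2 = params
    H = np.maximum(F @ W1 + b1, 0.0)          # point-wise feature transform
    P = softmax(H @ W2 + b2)                  # soft correspondence p(y | x_i)
    return -np.log(P[np.arange(len(Y)), Y] + 1e-12).mean()

# toy usage: 100 query points with 16-dim descriptors, 50 reference points
rng = np.random.default_rng(6)
F, Y = rng.standard_normal((100, 16)), rng.integers(0, 50, size=100)
params = (rng.standard_normal((16, 32)) * 0.1, np.zeros(32),
          rng.standard_normal((32, 50)) * 0.1, np.zeros(50))
print(correspondence_loss(F, Y, params))
```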

The spatial methods, on the other hand, allow generalization across different domains, but the construction of low-dimensional local spatial coordinates on graphs turns out to be rather challenging. In particular, the construction of anisotropic diffusion on general graphs is an interesting research direction.

The spectrum-free approaches also allow generalization across graphs, at least in terms of their functional form. However, if multiple layers of equation (51) are used with no non-linearity or learned parameters θ, simulating a high power of the diffusion, the model may behave differently on different kinds of graphs. Understanding under what circumstances and to what extent these methods generalize across graphs is currently being studied.

Time-varying domains: An interesting extension of the geometric deep learning problems discussed in this review is coping with signals defined over a dynamically changing structure. In this case, we cannot assume a fixed domain and must track how these changes affect signals. This could prove useful to tackle applications such as abnormal activity detection in social or financial networks. In the domain of computer graphics and vision, potential applications deal with dynamic shapes (e.g. 3D video captured by a range sensor).

Directed graphs: Dealing with directed graphs is also a challenging topic, as such graphs typically have non-symmetric Laplacian matrices that do not have an orthogonal eigendecomposition allowing easily interpretable spectral-domain constructions. Citation networks, which are directed graphs, are often treated as undirected graphs (including in our example in IN6), considering citations between two papers without distinguishing which paper cites which. This obviously may lose important information.

Synthesis problems: Our main focus in this review was primarily on analysis problems on non-Euclidean domains. No less important is the question of data synthesis. There have been several recent attempts to learn a generative model allowing one to synthesize new images [120] and speech waveforms [121]. Extending such methods to the geometric setting seems a promising direction, though the key difficulty is the need to reconstruct the geometric structure (e.g., an embedding of a 2D manifold in the 3D Euclidean space modeling a deformable shape) from some intrinsic representation [122].

Computation: The final consideration is a computational one. All existing deep learning software frameworks are primarily optimized for Euclidean data. One of the main reasons for the computational efficiency of deep learning architectures (and one of the factors that contributed to their renaissance) is the assumption of regularly structured data on a 1D or 2D grid, allowing one to take advantage of modern GPU hardware. Geometric data, on the other hand, in most cases do not have a grid structure, requiring different ways to achieve efficient computations. It seems that computational paradigms developed for large-scale graph processing are more adequate frameworks for such applications.

ACKNOWLEDGEMENT

The authors are grateful to Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Xavier Bresson, Thomas Kipf, and Michaël Defferrard for comments on the manuscript and for providing some of the figures used in this paper. This work was supported in part by the ERC Grants Nos. 307047 (COMET) and 724228 (LEMAN), Google Faculty Research Award, Radcliffe fellowship, Rudolf Diesel fellowship, and Nvidia equipment grants.

REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. Černockỳ, "Strategies for training large scale neural network language models," in Proc. ASRU, 2011, pp. 196–201.
[3] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, and T. N. Sainath, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Sig. Proc. Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[4] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. NIPS, 2014.
[5] Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional networks and applications in vision," in Proc. ISCAS, 2010.
[6] D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber, "A committee of neural networks for traffic sign classification," in Proc. IJCNN, 2011.
[7] A. Krizhevsky, I. Sutskever, and G. Hinton, "Imagenet classification with deep convolutional neural networks," in Proc. NIPS, 2012.
[8] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling," Trans. PAMI, vol. 35, no. 8, pp. 1915–1929, 2013.
[9] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "Deepface: Closing the gap to human-level performance in face verification," in Proc. CVPR, 2014.
[10] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv:1512.03385, 2015.
[12] L. Deng and D. Yu, "Deep learning: methods and applications," Foundations and Trends in Signal Processing, vol. 7, no. 3–4, pp. 197–387, 2014.
[13] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, in preparation.
[14] E. P. Simoncelli and B. A. Olshausen, "Natural image statistics and neural representation," Annual Review of Neuroscience, vol. 24, no. 1, pp. 1193–1216, 2001.
[15] D. J. Field, "What the statistics of natural images tell us about visual coding," in Proc. SPIE, 1989.
[16] P. Mehta and D. J. Schwab, "An exact mapping between the variational renormalization group and deep learning," arXiv:1410.3831, 2014.
[17] S. Mallat, "Group invariant scattering," Communications on Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331–1398, 2012.
[18] J. Bruna and S. Mallat, "Invariant scattering convolution networks," Trans. PAMI, vol. 35, no. 8, pp. 1872–1886, 2013.
[19] M. Tygert, J. Bruna, S. Chintala, Y. LeCun, S. Piantino, and A. Szlam, "A mathematical motivation for complex-valued convolutional networks," Neural Computation, 2016.
[20] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten ZIP code recognition," Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.
[21] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," arXiv:1302.4389, 2013.
[22] D. Lazer et al., "Life in the network: the coming age of computational social science," Science, vol. 323, no. 5915, p. 721, 2009.
[23] E. H. Davidson et al., "A genomic regulatory network for development," Science, vol. 295, no. 5560, pp. 1669–1678, 2002.
[24] M. B. Wakin, D. L. Donoho, H. Choi, and R. G. Baraniuk, "The multiscale structure of non-differentiable image manifolds," in Proc. SPIE, 2005.
[25] N. Verma, S. Kpotufe, and S. Dasgupta, "Which spatial partition trees are adaptive to intrinsic dimension?" in Proc. Uncertainty in Artificial Intelligence, 2009.
[26] J. B. Tenenbaum, V. De Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[27] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, 2000.

[28] L. Maaten and G. Hinton, “Visualizing data using t-SNE,” JMLR, [59] A. Dosovitskiy, P. Fischery, E. Ilg, C. Hazirbas, V. Golkov, P. van der
vol. 9, pp. 2579–2605, 2008. Smagt, D. Cremers, T. Brox et al., “Flownet: Learning optical flow
[29] M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality with convolutional networks,” in Proc. ICCV, 2015.
reduction and data representation,” Neural Computation, vol. 15, no. 6, [60] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller,
pp. 1373–1396, 2003. “Striving for simplicity: The all convolutional net,” arXiv:1412.6806,
[30] R. R. Coifman and S. Lafon, “Diffusion maps,” App. and Comp. 2014.
Harmonic Analysis, vol. 21, no. 1, pp. 5–30, 2006. [61] S. Mallat, A wavelet tour of signal processing. Academic Press, 1999.
[31] R. Hadsell, S. Chopra, and Y. LeCun, “Dimensionality reduction by [62] A. Choromanska, M. Henaff, M. Mathieu, G. B. Arous, and Y. LeCun,
learning an invariant mapping,” in Proc. CVPR, 2006. “The loss surfaces of multilayer networks,” in Proc. AISTATS, 2015.
[32] B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online learning of [63] I. Safran and O. Shamir, “On the quality of the initial basin in
social representations,” in Proc. KDD, 2014. overspecified neural networks,” arXiv:1511.04210, 2015.
[33] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “LINE: [64] K. Kawaguchi, “Deep learning without poor local minima,” in Proc.
Large-scale information network embedding,” in Proc. WWW, 2015. NIPS, 2016.
[34] S. Cao, W. Lu, and Q. Xu, “GraRep: Learning graph representations [65] T. Chen, I. Goodfellow, and J. Shlens, “Net2net: Accelerating learning
with global structural information,” in Proc. IKM, 2015. via knowledge transfer,” arXiv:1511.05641, 2015.
[35] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation [66] C. D. Freeman and J. Bruna, “Topology and geometry of half-rectified
of word representations in vector space,” arXiv:1301.3781, 2013. network optimization,” ICLR, 2017.
[36] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and [67] J. Nash, “The imbedding problem for Riemannian manifolds,” Annals
U. Alon, “Network motifs: simple building blocks of complex net- of Mathematics, vol. 63, no. 1, pp. 20–63, 1956.
works,” Science, vol. 298, no. 5594, pp. 824–827, 2002. [68] M. Wardetzky, S. Mathur, F. Kälberer, and E. Grinspun, “Discrete
[37] N. Pržulj, “Biological network comparison using graphlet degree laplace operators: no free lunch,” in Proc. SGP, 2007.
distribution,” Bioinformatics, vol. 23, no. 2, pp. 177–183, 2007. [69] M. Wardetzky, “Convergence of the cotangent formula: An overview,”
[38] J. Sun, M. Ovsjanikov, and L. J. Guibas, “A concise and provably in Discrete Differential Geometry, 2008, pp. 275–286.
informative multi-scale signature based on heat diffusion,” Computer [70] U. Pinkall and K. Polthier, “Computing discrete minimal surfaces and
Graphics Forum, vol. 28, no. 5, pp. 1383–1392, 2009. their conjugates,” Experimental Mathematics, vol. 2, no. 1, pp. 15–36,
[39] R. Litman and A. M. Bronstein, “Learning spectral descriptors for 1993.
deformable shape correspondence,” Trans. PAMI, vol. 36, no. 1, pp. [71] S. Rosenberg, The Laplacian on a Riemannian manifold: an introduc-
171–180, 2014. tion to analysis on manifolds. Cambridge University Press, 1997.
[40] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. [72] L.-H. Lim, “Hodge Laplacians on graphs,” arXiv:1507.05379, 2015.
486, no. 3, pp. 75–174, 2010. [73] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and
[41] T. Mikolov and J. Dean, “Distributed representations of words and Computing, vol. 17, no. 4, pp. 395–416, 2007.
phrases and their compositionality,” Proc. NIPS, 2013. [74] A. Kovnatsky, M. M. Bronstein, A. M. Bronstein, K. Glashoff, and
R. Kimmel, “Coupled quasi-harmonic bases,” in Computer Graphics
[42] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: user
Forum, vol. 32, no. 2, 2013, pp. 439–448.