

Nested Hyperbolic Spaces for Dimensionality Reduction
and Hyperbolic NN Design
Xiran Fan
Department of Statistics, University of Florida
[email protected]

Chun-Hao Yang
Institute of Applied Mathematical Science, National Taiwan University
[email protected]

Baba C. Vemuri
Department of CISE, University of Florida
[email protected]

arXiv:2112.03402v1 [cs.LG] 3 Dec 2021

December 8, 2021

Abstract
Hyperbolic neural networks have become popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space, namely the hyperbolic space. The hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group and sits inside the Lorentzian space, a semi-Riemannian manifold, i.e., a manifold equipped with an indefinite metric. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings), followed by an intrinsic aggregation and a nonlinearity, all within the hyperbolic space. The novelty here lies in the projection, which is designed to project data onto a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation that is independently useful for dimensionality reduction. The main theoretical contribution is a proof that the proposed embedding is isometric and equivariant under the Lorentz transformations, which are the natural isometric transformations in hyperbolic spaces. This projection is computationally efficient since it can be expressed by simple linear operations and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network; therefore, we first compare this nested hyperbolic space representation, independent of the network, with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating the comparative performance of our network on several publicly available data sets.

Figure 1: Projections of data from a 2-dimensional hyperbolic space to a 1-dimensional
hyperbolic space using different dimensionality reduction methods. The results are visualized
in the Poincaré disk. Original data (blue dots) lie in a 2-dimensional hyperbolic space and
have a zero mean (origin of the Poincaré disk). The HoroPCA direction (red dotted line)
and the principal geodesic obtained by tangent PCA (orange dashed line) and Exact PGA
(purple dash-dotted line) fail to capture the main trend of the data since they are restricted
to learn a geodesic submanifold passing through the mean. In contrast, our nested hyperbolic
(NH) representation (green solid line), captures the data trend more accurately. The diamond
markers on each line represent the reconstructed data from each method. The reconstruction
errors for HoroPCA, tangent PCA, EPGA and the proposed NH scheme in this example are,
0.1708, 0.1202, 0.1638 and 0.0062 respectively.

1 Introduction
Hyperbolic geometry is a centuries-old field of non-Euclidean geometry that has recently found its way into machine learning, in particular into deep learning in the form of hyperbolic neural networks (HNNs) or hyperbolic graph convolutional networks (HGCNs), and, more recently, into dimensionality reduction of data embedded in the hyperbolic space. In this paper, we discuss both problems, namely dimensionality reduction in hyperbolic spaces and HNN architectures, and present novel techniques for each. In the following, we review the literature on the two problems stated above and establish the motivation for our work. A word on terminology: we will use the terms hyperbolic neural network and hyperbolic graph (convolutional) neural network synonymously for the rest of the paper.

1.1 Dimensionality Reduction of Manifold-valued Data


Dimensionality reduction is a fundamental problem in machine learning with applications
in computer vision and many other fields of engineering and sciences. The simplest and
most popular method among these is the principal component analysis (PCA), which was
proposed more than a century ago (see [23] for a review and some recent developments on PCA). PCA however is limited to data in vector spaces. For data that are manifold-
valued, principal geodesic analysis (PGA) was presented in [11], which yields the projection
of data onto principal geodesic submanifolds passing through an intrinsic (Fréchet) mean
[12] of the data. They find the geodesic submanifold of a lower dimension that maximizes
the projected variance and computationally, this was achieved via linear approximation,
i.e., applying PCA on the tangent space anchored at the Fréchet mean. This is sometimes
referred to as the tangent PCA (tPCA). This approximation however requires the data
to be clustered around the Fréchet mean, otherwise the tangent space approximation to
the manifold leads to inaccuracies. Subsequently, [44] presented the Exact PGA (EPGA)
algorithm, which does not use any linear approximation. However, EPGA is computationally
expensive as it requires two non-linear optimization steps per iteration (projection to the
geodesic submanifold and finding the new geodesic direction such that the reconstruction
error is minimized). Later, authors in [5] developed a version of EPGA for constant sectional
curvature manifolds, namely the hypersphere and the hyperbolic space, by deriving closed
form formulae for the projection. There are many variants of PGA and we refer the reader to
[1, 22, 50] for the details. More recently, Barycentric subspace analysis (BSA) was proposed
in [37] which finds a more general parameterization of a nested sequence of submanifolds via
the minimization of unexplained variance. Another useful dimensionality reduction scheme
is principal curves [19] and their generalization to Riemannian manifolds [20], which are more appropriate for certain applications.
A salient feature of PCA is that it yields nested linear subspaces, i.e., the reduced di-
mensional principal subspaces form a nested hierarchy. This idea was exploited in [24] where
authors proposed the principal nested spheres (PNS) by embedding an (n − 1)-sphere into an n-sphere; the embedding, however, is not necessarily isometric. Hence, PNS is more
general than PGA in that PNS does not have to be geodesic. Similarly, for the manifold Pn
of (n × n) symmetric positive definite (SPD) matrices, authors in [18] proposed a geometry-aware dimensionality reduction by projecting data on Pn to Pm for some m ≪ n. More
recently, the idea of constructing a nested sequence of manifolds was presented in [49] where
authors unified and generalized the nesting concept to general Riemannian homogeneous
manifolds, which form a large class of Riemannian manifolds, including the hypersphere, Pn ,
the Grassmannian, Stiefel manifold, Lie groups, and others. Although the general frame-
work in [49] seems straightforward and applicable to hyperbolic spaces, many significantly
important technical aspects need to be addressed and derived in detail. In this paper, we
will present novel derivations suited for the hyperbolic spaces – a projection operator which
is proved to yield an isometric embedding, and a proof of equivariance to isometries of the
projection operator – which will facilitate the construction of nested hyperbolic spaces and
the hyperbolic neural network. Note that there are five models of the hyperbolic space, namely the hyperboloid (Lorentz) model, the Poincaré disk/ball model, the Poincaré half-plane model, the Klein model, and the hemisphere model [3]. All these models are isometrically equivalent, but some are better suited than others depending on the application. We choose the Lorentz model of the hyperbolic space with a Lorentzian metric in our work. The choice of this model and the associated metric over the other models is motivated by the Riemannian optimization efficiency and numerical stability they afford [35, 8].
Most recently, an elegant approach called HoroPCA was proposed in [6], for dimension-
ality reduction in hyperbolic spaces. In particular, the authors represented the hyperbolic
space using the Poincaré model and they proposed to generalize the notion of direction and
the coordinates in a given direction using ideal points (points at infinity) and the Busemann
coordinates (defined using the Busemann function) [2]. The level sets of the Busemann
function, called the horospheres, resemble the hyperplanes (or affine subspaces) in Euclidean
spaces and hence the dimensionality reduction is achieved by a projection that moves points
along a horosphere. The data are then projected onto the geodesic hull of a base point b and
a number of ideal points p1 , . . . , pK , which is also a geodesic submanifold. This is the key
difference between HoroPCA and our proposed method which leads to a significant difference
in performance. This is evident from the toy example in Figure 1 which depicts the reduced
dimensional representations obtained by our method in comparison to those from EPGA,
HoroPCA, and tangent PCA. Note that all of the other methods yield submanifold repre-
sentations that do not capture the data trend accurately, unlike ours. More comprehensive
comparisons will be made in a later section.
To briefly summarize, our first goal in this paper is to present a nested hyperbolic space
representation for dimensionality reduction and we will demonstrate, via synthetic exam-
ples and real datasets, that it achieves a lower reconstruction error in comparison to other
competing methods.

1.2 Hyperbolic Neural Networks


Several researchers have demonstrated that the hyperbolic space is apt for modeling hierar-
chically organized data, for example, graphs and trees [40, 39, 34]. Recently, the formalism
of Gyrovector spaces (an algebraic structure) [46] was applied to the hyperbolic space to
define basic operations paralleling those in vector spaces and were used to build a hyper-
bolic neural network (HNN) [14, 43]. The Gyrovector space formalism facilitates performing
Möbius additions and subtractions in the Poincaré model of the hyperbolic space. HNNs
have been successfully applied to word embeddings [45] as well as image embeddings [25].
Additionally, several existing deep network architectures have been modified to suit hyper-
bolic embeddings of data, e.g., graph networks [30, 7], attention module [16], and variational
auto-encoders [31, 36]. These hyperbolic networks were shown to perform comparably or
even better than their Euclidean counterparts.
Existing HNNs have achieved moderate to great successes in multiple areas and shown
great potential in solving complex problems. However, most of them use tangent space
approximations to facilitate the use of vector space operations prevalent in existing neural
network architectures. There are however some exceptions, for instance, the authors in [9]
developed what they call a Hyperbolic-to-Hyperbolic network and the authors in [8] also
developed a fully Hyperbolic network. They both considered the use of Lorentz transfor-
mations on hyperbolic features since the Lorentz transformation matrix acts transitively on
a hyperbolic space and thus preserves the global hyperbolic structure. Each Lorentz trans-
formation is a composition of a Lorentz rotation and a rotation-free Lorentz transformation
called the Lorentz boost operation. Authors in [9] only use Lorentz rotation for hyperbolic
feature transformations while authors in [8] build a fully-connected layer in hyperbolic space
(called a hyperbolic linear layer) parameterized by an arbitrary weight matrix (not neces-
sarily invertible) which is applied to each data point in the hyperbolic space resulting in a
mapping from a hyperbolic space to itself. This procedure is ad hoc in the sense that it does
not use the intrinsic characterization of the hyperbolic space as a homogeneous space with
the isometry group being the Lorentz group.
Lorentz transformations are however inappropriate for defining projection operations
(required for reducing the dimensionality) as they preserve the Lorentz model only when
there is no change in dimension. In other words, to find a lower-dimensional hyperbolic
space representation for data embedded in a higher-dimensional hyperbolic space, one cannot
use Lorentz transformations directly. Hence, we propose to use an isometric embedding
operation mentioned in the previous subsection as the building block to design a hyperbolic
neural network. We will now briefly summarize our proposed model and the contributions
of our work.

1.3 Proposed Model and Contributions


Inspired by [24] and [49], we construct a nested representation in a hyperbolic space to extract
the hyperbolic features. Such a nested (hierarchical) hyperbolic space representation has the
advantage that the data in reduced dimensions remains in a hyperbolic space. Hereafter, we
refer to these nested hyperbolic spaces as nested hyperboloids (NHs). As a dimensionality
reduction method in Riemannian manifolds, the learned lower dimensional submanifold in
NH is not required to pass through the Fréchet mean, unlike in PGA, and need not be a geodesic
submanifold as in HoroPCA, PGA or EPGA. In the experiments section, we will demonstrate
that this leads to much lower reconstruction error in comparison to the aforementioned
dimensionality reduction methods.
After defining the projection which leads to an embedding within hyperbolic spaces of dif-
ferent dimensions, these projections/embeddings are used to define a feature transformation
layer in the hyperbolic space. This layer is then composed with a hyperbolic neighborhood
aggregation operation/layer and appropriate non-linear operations in between namely, the
tangent-ReLU, to define a novel nested hyperbolic graph convolutional network (NHGCN)
architecture.
The rest of the paper is organized as follows. In Section 2, we briefly review the geometry
of hyperbolic space. In Section 3, we explicitly give the projection and embedding to map
data between hyperbolic spaces of different dimensions. We also present a novel hyperbolic
graph convolutional neural network architecture based on these projections and tangent-
ReLU activation. In Section 4, we first present the performance of NH as a dimensionality
reduction method and compare with other competing methods, including EPGA, tangent-
PCA and HoroPCA. Next, we compare our NHGCN with other hyperbolic networks on the
problems of link prediction and node classification on four graph datasets described and used
in [7]. Finally, we draw conclusions in Section 5.

2 Preliminaries
In this section, we briefly review relevant concepts of hyperbolic geometry. In this paper,
we will regard the hyperbolic space as a homogeneous Riemannian manifold of the Lorentz
group and present a few important geometric concepts, including the geodesic distance and
the exponential map, in the hyperbolic space, which are used in our work. The materials presented in this section can be found in most textbooks on hyperbolic spaces, for example [38, 4].

Figure 2: Illustration of the two kinds of Lorentz transformations, the Lorentz rotation and the Lorentz boost, in the Lorentz model. Both are isometries of the Lorentz model.

2.1 Lorentzian Space and Hyperbolic Space


As mentioned in Section 1, there are several (isometrically) equivalent models of a hyperbolic
space, including the Poincaré model, the Klein model, the upper half-space model, and the hemisphere model [3]. We choose to use the hyperboloid (Lorentz) model of the hyperbolic
space in this paper due to its numerical stability property which is very useful for the
optimization problem involved in the training and test phases. Our technique is however
applicable to all of the models due to the isometric equivalence of the models.
The (n+1)-dimensional Lorentzian space $\mathbb{R}^{1,n}$ is the Euclidean space $\mathbb{R}^{n+1}$ equipped with the bilinear form
$$\langle x, y \rangle_L = -x_0 y_0 + x_1 y_1 + \cdots + x_n y_n$$
where $x = [x_0, x_1, \ldots, x_n]^T, y = [y_0, y_1, \ldots, y_n]^T \in \mathbb{R}^{n+1}$. This bilinear form is sometimes referred to as the Lorentzian inner product, although it is not positive-definite. We denote the norm induced by the Lorentzian inner product, called the Lorentzian norm, by $\|x\|_L = \sqrt{\langle x, x \rangle_L}$. Note that $\|x\|_L$ is either positive, zero, or positive imaginary.
We consider the following submanifold of $\mathbb{R}^{1,n}$:
$$\mathbb{L}^n := \{x = [x_0, \ldots, x_n]^T \in \mathbb{R}^{n+1} : \|x\|_L^2 = -1,\ x_0 > 0\}.$$
This is called the n-dimensional hyperboloid (one-sheet) model of the hyperbolic space defined in $\mathbb{R}^{n+1}$.
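To make these definitions concrete, here is a minimal NumPy sketch (ours, not from the paper) of the Lorentzian inner product, the Lorentzian norm, and a membership check for $\mathbb{L}^n$; the helper `lift_to_hyperboloid`, which maps an arbitrary $u \in \mathbb{R}^n$ to $[\sqrt{1+\|u\|^2}, u]^T \in \mathbb{L}^n$, is our own illustrative construction and is not part of the paper.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + x1*y1 + ... + xn*yn."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_norm(x):
    """Lorentzian norm ||x||_L = sqrt(<x, x>_L); imaginary for timelike vectors."""
    return np.sqrt(complex(lorentz_inner(x, x)))

def on_hyperboloid(x, tol=1e-10):
    """Check membership in L^n: <x, x>_L = -1 and x0 > 0."""
    return abs(lorentz_inner(x, x) + 1.0) < tol and x[0] > 0

def lift_to_hyperboloid(u):
    """Map u in R^n to [sqrt(1 + ||u||^2), u] in L^n (illustrative helper)."""
    return np.concatenate(([np.sqrt(1.0 + np.dot(u, u))], u))

if __name__ == "__main__":
    x = lift_to_hyperboloid(np.array([0.3, -1.2]))
    y = lift_to_hyperboloid(np.array([2.0, 0.5]))
    print(on_hyperboloid(x), on_hyperboloid(y))   # True True
    print(lorentz_inner(x, y))                    # <= -1 for any two points on L^n
```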

2.2 Lorentz Transformations
In the Lorentzian space, the linear isometries are called Lorentz transformations, i.e., a map $\phi : \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$ is a Lorentz transformation if $\langle \phi(x), \phi(y) \rangle_L = \langle x, y \rangle_L$ for any $x, y \in \mathbb{R}^{n+1}$. It is easy to see that all Lorentz transformations form a group under composition; this group is denoted by $O(1, n)$ and called the Lorentz group. The matrix representation of $O(1, n)$ in $\mathbb{R}^{n+1}$ is defined as follows. Let $J_n = \mathrm{diag}(-1, I_n)$, where $I_n$ is the $n \times n$ identity matrix and $\mathrm{diag}(\cdot)$ denotes a diagonal matrix. Then $O(1, n) := \{A \in M_{n+1}(\mathbb{R}) : A J_n A^T = A^T J_n A = J_n\}$. There are a few important subgroups of $O(1, n)$: (i) the subgroup $O^+(1, n) := \{A \in O(1, n) : a_{11} > 0\}$, called the positive Lorentz group; (ii) the subgroup $SO(1, n) := \{A \in O(1, n) : \det(A) = 1\}$, called the special Lorentz group; (iii) the subgroup $SO^+(1, n) := \{A \in SO(1, n) : a_{11} > 0\}$, called the positive special Lorentz group. Briefly speaking, the special Lorentz group preserves the orientation, and the positive Lorentz group preserves the sign of the first entry of $x \in \mathbb{L}^n$.

2.3 Riemannian Geometry of Hyperbolic Space


A commonly used Riemannian metric for Ln ⊂ Rn+1 is the restriction of the Lorentz inner
product to the tangent space of Ln . Note that even though the Lorentz inner product is not
positive-definite, when restricted to the tangent space of Ln , it is positive-definite. Hence, Ln
is a Riemannian manifold with constant negative sectional curvature. Furthermore, the group
of isometries of Ln is precisely O+ (1, n) and the group of orientation-preserving isometries
is SO+ (1, n). We now state a few useful facts about the group of isometries that are used
in this paper and refer the interested reader to [13] for details.
Fact 1. The positive special Lorentz group $SO^+(1, n)$ acts transitively on $\mathbb{L}^n$, where the group action is defined as $x \mapsto Ax$ for $x \in \mathbb{L}^n$ and $A \in SO^+(1, n)$.
Fact 2. Let $x = [1, 0, \ldots, 0]^T \in \mathbb{L}^n$. The isotropy subgroup $G_x$ is given by
$$G_x := \{A \in SO^+(1, n) : Ax = x\} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & R \end{bmatrix} : R \in SO(n) \right\} \cong SO(n)$$
where $SO(n)$ is the group of $n \times n$ orthogonal matrices with determinant 1.
Hence, the hyperbolic space is a homogeneous Riemannian manifold and can be written as the quotient space $\mathbb{L}^n = SO^+(1, n)/SO(n)$.
Fact 3 ([32]). A Lorentz transformation $A \in SO^+(1, n)$ can be decomposed using a polar decomposition and expressed as
$$A = \begin{bmatrix} 1 & 0 \\ 0 & R \end{bmatrix} \begin{bmatrix} c & v^T \\ v & \sqrt{I_n + vv^T} \end{bmatrix}$$
where $R \in SO(n)$, $v \in \mathbb{R}^n$, and $c = \sqrt{\|v\|^2 + 1}$.
The first component is called a Lorentz rotation and the second component is called a Lorentz boost.

Fact 4. Every Lorentz transformation matrix $A \in SO^+(1, n)$ can be decomposed as
$$A = \begin{bmatrix} 1 & 0 \\ 0 & P \end{bmatrix} \begin{bmatrix} \cosh\alpha & \sinh\alpha & 0^T \\ \sinh\alpha & \cosh\alpha & 0^T \\ 0 & 0 & I_{n-1} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & Q^T \end{bmatrix} \qquad (1)$$
where $P, Q \in SO(n)$, $\alpha \in \mathbb{R}$, and $0 \in \mathbb{R}^{n-1}$. See Figure 2 for examples of the Lorentz rotations and the Lorentz boosts.

The matrix in the middle is the Lorentz boost along the first coordinate axis. This
decomposition will be very useful in the optimization problem stated in Section 3.3, equation
(11).
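As a sanity check on Fact 4, the sketch below (our own illustration, not code from the paper) assembles a Lorentz transformation from two rotations and a boost along the first coordinate axis, and verifies numerically that $A J_n A^T = J_n$ and that $A$ maps points of $\mathbb{L}^n$ back onto $\mathbb{L}^n$. The QR-based construction of rotation matrices is just one convenient way of producing elements of $SO(n)$.

```python
import numpy as np

def random_rotation(n, rng):
    """A random element of SO(n) via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q *= np.sign(np.diag(r))      # fix column signs
    if np.linalg.det(q) < 0:      # flip one column if det = -1 to land in SO(n)
        q[:, 0] *= -1
    return q

def lorentz_boost(alpha, n):
    """The (n+1)x(n+1) Lorentz boost along the first coordinate axis (middle factor in Eq. (1))."""
    B = np.eye(n + 1)
    B[0, 0] = B[1, 1] = np.cosh(alpha)
    B[0, 1] = B[1, 0] = np.sinh(alpha)
    return B

def lorentz_transform(alpha, P, Q):
    """Assemble A = diag(1, P) @ Boost(alpha) @ diag(1, Q^T) as in Fact 4."""
    n = P.shape[0]
    RP = np.block([[np.ones((1, 1)), np.zeros((1, n))], [np.zeros((n, 1)), P]])
    RQ = np.block([[np.ones((1, 1)), np.zeros((1, n))], [np.zeros((n, 1)), Q.T]])
    return RP @ lorentz_boost(alpha, n) @ RQ

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    J = np.diag([-1.0] + [1.0] * n)
    A = lorentz_transform(alpha=0.7, P=random_rotation(n, rng), Q=random_rotation(n, rng))
    print(np.allclose(A @ J @ A.T, J))            # True: A is a Lorentz transformation
    x = np.concatenate(([np.sqrt(1 + 4.0)], [2.0, 0.0, 0.0, 0.0]))  # a point on L^4
    y = A @ x
    print(np.isclose(-y[0]**2 + np.dot(y[1:], y[1:]), -1.0), y[0] > 0)  # True True
```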
We now conclude this section by presenting explicit closed-form formulae for the exponential map and the geodesic distance. For any $x \in \mathbb{L}^n$ and $v \in T_x\mathbb{L}^n$ (the tangent space of $\mathbb{L}^n$ at $x$), the exponential map at $x$ is given by
$$\mathrm{Exp}_x(v) = \cosh(\|v\|_L)\,x + \sinh(\|v\|_L)\,\frac{v}{\|v\|_L}. \qquad (2)$$
Since $\mathbb{L}^n$ is a negatively curved Riemannian manifold, its exponential map is invertible, and the inverse of the exponential map, also called the Log map, is given by
$$\mathrm{Log}_x(y) = \frac{\theta}{\sinh(\theta)}\,\big(y - \cosh(\theta)\,x\big) \qquad (3)$$
where $x, y \in \mathbb{L}^n$ and $\theta$ is the geodesic distance between $x$ and $y$, given by
$$\theta = d_L(x, y) = \cosh^{-1}\!\big(-\langle x, y \rangle_L\big). \qquad (4)$$
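The formulae (2)-(4) translate directly into code. Below is a small NumPy sketch (ours, for illustration) of the exponential map, the Log map, and the geodesic distance on $\mathbb{L}^n$, together with a round-trip check $\mathrm{Log}_x(\mathrm{Exp}_x(v)) \approx v$; tangent vectors satisfy $\langle v, v\rangle_L \geq 0$, so the square root below is real.

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_map(x, v):
    """Exp_x(v) = cosh(||v||_L) x + sinh(||v||_L) v / ||v||_L, Eq. (2)."""
    nrm = np.sqrt(lorentz_inner(v, v))   # tangent vectors are spacelike: <v, v>_L >= 0
    if nrm < 1e-12:
        return x
    return np.cosh(nrm) * x + np.sinh(nrm) * v / nrm

def dist(x, y):
    """Geodesic distance d_L(x, y) = arccosh(-<x, y>_L), Eq. (4)."""
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def log_map(x, y):
    """Log_x(y) = theta / sinh(theta) * (y - cosh(theta) x), Eq. (3)."""
    theta = dist(x, y)
    if theta < 1e-12:
        return np.zeros_like(x)
    return theta / np.sinh(theta) * (y - np.cosh(theta) * x)

if __name__ == "__main__":
    x = np.array([1.0, 0.0, 0.0])        # origin of L^2
    v = np.array([0.0, 0.8, -0.3])       # tangent at x: <x, v>_L = 0
    y = exp_map(x, v)
    print(np.isclose(lorentz_inner(y, y), -1.0))                    # y stays on L^2
    print(np.allclose(log_map(x, y), v))                            # round trip recovers v
    print(np.isclose(dist(x, y), np.sqrt(lorentz_inner(v, v))))     # d(x, Exp_x(v)) = ||v||_L
```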

3 Nested Hyperbolic Spaces and Networks


In this section, we first present the construction of nested hyperboloids (NHs); an illustration
of the NHs are given in Figure 3. We also prove that the proposed NHs possess several
nice properties, including the isometry property and the equivariance under the Lorentz
transformations. Then we use the NH representations to design a novel graph convolutional
network architecture, called Nested Hyperbolic Graph Convolutional Network (NHGCN).

3.1 The Nested Hyperboloid Representation


The key steps to the development of the NHs are the embedding of Lm into Ln for m < n and
the projection from Ln to Lm . The principle is to define an embedding of the corresponding
groups of isometries, SO+ (1, m) and SO+ (1, n).
First, we consider the embedding $\tilde{\iota}_m : SO^+(1, m) \to SO^+(1, m+1)$ defined by
$$\tilde{\iota}_m(O) = \text{adapted-GS}\!\left( \Lambda \begin{bmatrix} O & b \\ a^T & c \end{bmatrix} \right) \qquad (5)$$
where $O \in SO^+(1, m)$, $a, b \in \mathbb{R}^{m+1}$, $c \neq a^T O^{-1} b$, and $\Lambda \in SO^+(1, m+1)$. The function adapted-GS(·) is an adaptation of the standard Gram-Schmidt process that orthonormalizes vectors with respect to the Lorentzian inner product defined earlier.

Figure 3: Illustration of the NH model using the embedding $\iota_m$ of $\mathbb{L}^m$ into $\mathbb{L}^{m+1}$ given in Eq. (6). The $m$-dimensional nested hyperboloid in $\mathbb{L}^{m+1}$ is indeed the intersection of $\mathbb{L}^{m+1}$ and an $m$-dimensional hyperplane.
The Riemannian submersion (see [21] for the definition of a Riemannian submersion) $\pi : SO^+(1, m) \to \mathbb{L}^m$ is given by $\pi(O) = O_1$, where $O \in SO^+(1, m)$ and $O_1$ is the first column of $O$. Therefore, the induced embedding $\iota_m : \mathbb{L}^m \to \mathbb{L}^{m+1}$ is
$$\iota_m(x) = \Lambda \begin{bmatrix} \cosh(r)\,x \\ \sinh(r) \end{bmatrix} = \cosh(r)\,\tilde{\Lambda}x + \sinh(r)\,v \qquad (6)$$
where $\Lambda = [\tilde{\Lambda}\ \ v] \in SO^+(1, m+1)$. This class of embeddings is quite general as it includes isometric embeddings as special cases.

Proposition 1. The embedding ιm : Lm → Lm+1 is isometric when r = 0.

Proof. It follows directly from the definitions of the Lorentz transformation and the geodesic
distance on Lm .
Furthermore, the embedding (6) is equivariant under Lorentz transformations.

Theorem 1. The embedding $\iota_m : \mathbb{L}^m \to \mathbb{L}^{m+1}$ is equivariant under the Lorentz transformations of $SO^+(1, m)$, i.e., $\iota_m(Rx) = \Psi_\Lambda(\tilde{\iota}_m(R))\,\iota_m(x)$, where $\Psi_g(h) = ghg^{-1}$.

Proof. For $x \in \mathbb{L}^m$ and $R \in SO^+(1, m)$,
$$
\iota_m(Rx) = \Lambda \begin{bmatrix} \cosh(r)Rx \\ \sinh(r) \end{bmatrix}
= \Lambda \begin{bmatrix} R & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cosh(r)x \\ \sinh(r) \end{bmatrix}
= \Lambda \begin{bmatrix} R & 0 \\ 0 & 1 \end{bmatrix} \Lambda^{-1} \Lambda \begin{bmatrix} \cosh(r)x \\ \sinh(r) \end{bmatrix}
= \Psi_\Lambda(\tilde{\iota}_m(R))\,\iota_m(x).
$$

The projection $\pi_{m+1} : \mathbb{L}^{m+1} \to \mathbb{L}^m$ corresponding to $\iota_m$ is given by
$$\pi_{m+1}(x) = \frac{1}{\cosh r}\, J_m \tilde{\Lambda}^T J_{m+1} x = \frac{J_m \tilde{\Lambda}^T J_{m+1} x}{\|J_m \tilde{\Lambda}^T J_{m+1} x\|_L} \qquad (7)$$
for $x \in \mathbb{L}^{m+1}$. Hence, the reconstructed point $\hat{x} \in \mathbb{L}^{m+1}$ of $x \in \mathbb{L}^{m+1}$ is
$$\hat{x} = \cosh(r)\, \tilde{\Lambda}\, \frac{J_m \tilde{\Lambda}^T J_{m+1} x}{\|J_m \tilde{\Lambda}^T J_{m+1} x\|_L} + \sinh(r)\, v. \qquad (8)$$
The unknowns $\Lambda = [\tilde{\Lambda}\ \ v]$ and $r$ can then be obtained by minimizing the reconstruction error
$$L(\Lambda, r) = \frac{1}{N} \sum_{i=1}^{N} \big(d_L(x_i, \hat{x}_i)\big)^2. \qquad (9)$$
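To illustrate Eqs. (6)-(9), the sketch below (our own, with an illustrative choice of $\Lambda$ as a single boost) embeds $\mathbb{L}^1$ into $\mathbb{L}^2$ via $\iota_1$, projects points of $\mathbb{L}^2$ back with $\pi_2$, forms the reconstruction $\hat{x}$, and checks that the reconstruction error of Eq. (9) vanishes for points on the nested hyperboloid. The normalization uses $\sqrt{-\langle z, z\rangle_L}$, since the projected vector is timelike.

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def J(n):
    return np.diag([-1.0] + [1.0] * n)

def dist(x, y):
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def embed(u, Lam, r):
    """iota_m: L^m -> L^(m+1), Eq. (6): Lam @ [cosh(r) u; sinh(r)]."""
    return Lam @ np.concatenate((np.cosh(r) * u, [np.sinh(r)]))

def project(x, Lam, m):
    """pi_{m+1}: L^(m+1) -> L^m, Eq. (7), normalized form."""
    Lam_tilde = Lam[:, :m + 1]                     # Lam = [Lam_tilde  v]
    z = J(m) @ Lam_tilde.T @ J(m + 1) @ x
    return z / np.sqrt(-lorentz_inner(z, z))

def reconstruct(x, Lam, r, m):
    """x_hat = iota_m(pi_{m+1}(x)), Eq. (8)."""
    return embed(project(x, Lam, m), Lam, r)

if __name__ == "__main__":
    m, r = 1, 0.4
    a = 0.9   # an illustrative Lambda in SO+(1,2): a boost along the first spatial axis
    Lam = np.array([[np.cosh(a), np.sinh(a), 0.0],
                    [np.sinh(a), np.cosh(a), 0.0],
                    [0.0,        0.0,        1.0]])
    u = np.array([np.cosh(1.3), np.sinh(1.3)])     # a point on L^1
    x = embed(u, Lam, r)                           # lies on the nested hyperboloid in L^2
    print(np.allclose(project(x, Lam, m), u))      # True: the projection inverts the embedding
    x_hat = reconstruct(x, Lam, r, m)
    print(np.isclose(dist(x, x_hat), 0.0))         # zero reconstruction error, Eq. (9)
```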

The projection of $x \in \mathbb{L}^n$ onto $\mathbb{L}^m$ for $n > m$ can be obtained via the composition $\pi := \pi_{m+1} \circ \cdots \circ \pi_n$:
$$\pi(x) = J_m \left( \prod_{i=m+1}^{n} \frac{1}{\cosh(r_i)}\, \tilde{\Lambda}_i \right)^{\!T} J_n x = \frac{J_m M^T J_n x}{\|J_m M^T J_n x\|_L} \qquad (10)$$
where $M = \prod_{i=m+1}^{n} \tilde{\Lambda}_i \in \mathbb{R}^{(n+1) \times (m+1)}$.
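A quick numerical check of Eq. (10) (our own sketch, with randomly generated Lorentz transformations as stand-ins for learned parameters): composing the normalized one-step projections equals applying the single matrix $M$ and normalizing once, because the intermediate normalization constants cancel and $J_k^2 = I$. The product $M$ is formed in the order in which the matrix dimensions match, i.e., $\tilde{\Lambda}_n \cdots \tilde{\Lambda}_{m+1}$.

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def J(n):
    return np.diag([-1.0] + [1.0] * n)

def normalize(z):
    return z / np.sqrt(-lorentz_inner(z, z))

def random_lorentz(n, rng):
    """A random element of SO+(1, n) built from Fact 4 (rotation-boost-rotation)."""
    def rot(k):
        q, r = np.linalg.qr(rng.standard_normal((k, k)))
        q *= np.sign(np.diag(r))
        if np.linalg.det(q) < 0:
            q[:, 0] *= -1
        return q
    B = np.eye(n + 1)
    a = rng.normal()
    B[0, 0] = B[1, 1] = np.cosh(a)
    B[0, 1] = B[1, 0] = np.sinh(a)
    RP, RQ = np.eye(n + 1), np.eye(n + 1)
    RP[1:, 1:], RQ[1:, 1:] = rot(n), rot(n).T
    return RP @ B @ RQ

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n, m = 3, 1
    Lt3 = random_lorentz(n, rng)[:, :n]          # Lambda_tilde_3, shape (n+1) x n
    Lt2 = random_lorentz(n - 1, rng)[:, :n - 1]  # Lambda_tilde_2, shape n x (n-1)
    u = rng.standard_normal(n)
    x = np.concatenate(([np.sqrt(1 + u @ u)], u))            # a point on L^3
    step = normalize(J(m) @ Lt2.T @ J(m + 1) @ normalize(J(m + 1) @ Lt3.T @ J(n) @ x))
    M = Lt3 @ Lt2                                            # (n+1) x (m+1)
    single = normalize(J(m) @ M.T @ J(n) @ x)
    print(np.allclose(step, single))                         # True
```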

3.2 Nested Hyperbolic Graph Convolutional Network (NHGCN)


The Hyperbolic Graph Convolutional Network (HGCN) proposed in [7] is a generalization of the Euclidean graph convolutional network to a hyperbolic space. There are three different layers in an HGCN: feature transformation, neighborhood aggregation, and non-linear activation. We use our NH representation to define a hyperbolic feature transformation, the weighted Fréchet mean to define the neighborhood aggregation, and a tangent ReLU activation. This leads to a novel HGCN architecture, depicted in Figure 4. Each of the three distinct layers is described in detail below.

Figure 4: The HGCN architecture.
Hyperbolic Feature Transformation: Given $x \in \mathbb{L}^n$, the hyperbolic feature transformation is defined using (10) as
$$y = \frac{Wx}{\|Wx\|_L} \quad \text{s.t.} \quad W J_n W^T = J_m \qquad (11)$$
where $W \in \mathbb{R}^{(m+1) \times (n+1)}$. It is easy to prove that $y \in \mathbb{L}^m$.
At the $l$-th layer, the inputs are the hyperbolic representations $x_i^{l-1}$ from the previous layer, and the feature transformation matrix is $W^l$. The intermediate hyperbolic representation of the $i$-th node is computed as
$$x_i^l = \frac{W^l x_i^{l-1}}{\|W^l x_i^{l-1}\|_L} \quad \text{s.t.} \quad W^l J_{n_{l-1}} (W^l)^T = J_{n_l}. \qquad (12)$$
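As a concrete illustration of Eq. (11), the sketch below (ours, not the authors' code) applies a constraint-satisfying $W$ to hyperbolic features. Here $W$ is simply the first $m+1$ rows of the $(n+1)\times(n+1)$ identity matrix, which trivially satisfies $W J_n W^T = J_m$; a learnable parameterization of $W$, as described in Section 3.3, is sketched at the end of that subsection.

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def J(n):
    return np.diag([-1.0] + [1.0] * n)

def feature_transform(x, W):
    """Eq. (11): y = W x / ||W x||_L, assuming W J_n W^T = J_m."""
    z = W @ x
    return z / np.sqrt(-lorentz_inner(z, z))   # |  ||z||_L  | for a timelike z

if __name__ == "__main__":
    n, m = 5, 2
    W = np.eye(n + 1)[:m + 1, :]               # illustrative W: first (m+1) rows of the identity
    assert np.allclose(W @ J(n) @ W.T, J(m))
    u = np.array([0.4, -1.1, 0.2, 0.7, 0.0])
    x = np.concatenate(([np.sqrt(1.0 + np.dot(u, u))], u))  # a point on L^5
    y = feature_transform(x, W)
    print(np.isclose(lorentz_inner(y, y), -1.0), y[0] > 0)  # True True: y lies on L^2
```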

Hyperbolic Neighborhood Aggregation: In GCNs, neighborhood aggregation is used to combine neighboring features by computing the weighted centroid of these features. The weighted centroid in hyperbolic space of a point set $\{x_i\}_{i=1}^{p} \subset \mathbb{L}^n$ is obtained using the weighted Fréchet mean. However, the weighted Fréchet mean does not have a closed-form expression in hyperbolic space. We therefore use the hyperbolic neighborhood aggregation proposed in [8, 51], where the aggregated representation for a node $x_i^l$ at the $l$-th layer is the weighted centroid $\mu_i^l$ of its neighboring nodes $\{x_j^l\}_{j=1}^{p} \subset \mathbb{L}^{n_l}$ w.r.t. the squared Lorentzian distance, namely
$$\mu_i^l = \arg\min_{\mu \in \mathbb{L}^{n_l}} \sum_{j=1}^{p} \nu_j^l\, d_L^2(x_j^l, \mu) \qquad (13)$$
where $\nu_j^l$ is the weight for $x_j^l$ and $d_L^2(x, y) = -1 - \langle x, y \rangle_L$ is the squared Lorentzian distance [38]. The authors in [28] proved that this problem has the closed-form solution
$$\mu_i^l = \frac{\sum_{j=1}^{p} \nu_j^l x_j^l}{\left| \left\| \sum_{j=1}^{p} \nu_j^l x_j^l \right\|_L \right|}. \qquad (14)$$
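The closed-form centroid of Eq. (14) is straightforward to implement. The following NumPy sketch (ours, with arbitrary example weights standing in for learned or adjacency-derived weights) aggregates a set of hyperboloid points and checks that the result lies on $\mathbb{L}^n$.

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def weighted_centroid(points, weights):
    """Eq. (14): mu = sum_j nu_j x_j / | || sum_j nu_j x_j ||_L |."""
    s = weights @ points                        # weighted sum, shape (n+1,)
    return s / np.sqrt(-lorentz_inner(s, s))    # a positive combination of points on L^n is timelike

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    U = rng.standard_normal((4, 3))             # spatial parts of 4 neighboring node features
    points = np.column_stack((np.sqrt(1.0 + np.sum(U**2, axis=1)), U))  # lifted onto L^3
    weights = np.array([0.1, 0.4, 0.3, 0.2])    # example aggregation weights nu_j
    mu = weighted_centroid(points, weights)
    print(np.isclose(lorentz_inner(mu, mu), -1.0), mu[0] > 0)   # True True
```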

Hyperbolic Nonlinear Activation: A nonlinear activation is required in our network since the feature transformation is a linear operation. We choose to apply the tangent ReLU, which prevents our multi-layer network from collapsing into a single-layer network. The tangent ReLU in the hyperbolic space is defined as
$$\sigma(x_i^l) = \mathrm{Exp}_{\mathbf{0}}\big(\mathrm{ReLU}(\mathrm{Log}_{\mathbf{0}}(x_i^l))\big). \qquad (15)$$
Here $\mathbf{0} = [1, 0, \ldots, 0]^T \in \mathbb{L}^{n_l}$ (corresponding to the origin of the Poincaré model) is chosen as the base point (anchor point) for the tangent ReLU.
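Eq. (15) only needs the Exp and Log maps of Eqs. (2)-(3) anchored at the origin; at this base point, $\mathrm{Log}_{\mathbf{0}}(x)$ has first coordinate zero, so the ReLU acts on the remaining coordinates exactly as in the Euclidean case. A minimal sketch (ours):

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_map(x, v):
    nrm = np.sqrt(max(lorentz_inner(v, v), 0.0))
    return x if nrm < 1e-12 else np.cosh(nrm) * x + np.sinh(nrm) * v / nrm

def log_map(x, y):
    theta = np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))
    return np.zeros_like(x) if theta < 1e-12 else theta / np.sinh(theta) * (y - np.cosh(theta) * x)

def tangent_relu(x):
    """Eq. (15): sigma(x) = Exp_0(ReLU(Log_0(x))) with base point 0 = [1, 0, ..., 0]."""
    o = np.zeros_like(x)
    o[0] = 1.0
    v = log_map(o, x)
    return exp_map(o, np.maximum(v, 0.0))       # ReLU leaves v[0] = 0 untouched

if __name__ == "__main__":
    x = np.array([np.sqrt(1 + 0.25 + 0.49), -0.5, 0.7])   # a point on L^2
    y = tangent_relu(x)
    print(np.isclose(lorentz_inner(y, y), -1.0), y[0] > 0)  # the output stays on L^2
```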

3.3 Optimization
In this section, we explain how to update the parameters of the network, i.e., the transformation matrix $W$ in (11). Instead of updating $W$ directly, we decompose $W$ into three matrices using (1). More specifically, we write
$$W = \begin{bmatrix} 1 & 0 \\ 0 & \tilde{P} \end{bmatrix} \begin{bmatrix} \cosh\alpha & \sinh\alpha & 0^T \\ \sinh\alpha & \cosh\alpha & 0^T \\ 0 & 0 & I_{n-1} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & Q^T \end{bmatrix}$$
where $Q \in SO(n)$, $\alpha \in \mathbb{R}$, and $\tilde{P}$ consists of the first $m$ rows of some $P \in SO(n)$, i.e., $\tilde{P}$ is a point on a Stiefel manifold [10]. We then regard our feature transformation as a sequence of multiplications by these three matrices and update them one by one.
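This decomposition makes it easy to construct constraint-satisfying matrices $W$: $\alpha$ is an unconstrained scalar, $\tilde{P}$ is a point on a Stiefel manifold (obtained here, purely for illustration, by truncating a random rotation), and $Q \in SO(n)$. The sketch below (ours) assembles $W$ from such a triple and verifies $W J_n W^T = J_m$; in practice these three factors would be updated with a Riemannian optimizer such as those available in Pymanopt or GeoTorch.

```python
import numpy as np

def J(n):
    return np.diag([-1.0] + [1.0] * n)

def random_rotation(n, rng):
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def boost(alpha, n):
    B = np.eye(n + 1)
    B[0, 0] = B[1, 1] = np.cosh(alpha)
    B[0, 1] = B[1, 0] = np.sinh(alpha)
    return B

def build_W(alpha, P_tilde, Q):
    """W = diag(1, P_tilde) @ Boost(alpha) @ diag(1, Q^T), with P_tilde (m x n) on a Stiefel manifold."""
    m, n = P_tilde.shape
    left = np.zeros((m + 1, n + 1))
    left[0, 0] = 1.0
    left[1:, 1:] = P_tilde
    right = np.zeros((n + 1, n + 1))
    right[0, 0] = 1.0
    right[1:, 1:] = Q.T
    return left @ boost(alpha, n) @ right

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, m = 6, 3
    P_tilde = random_rotation(n, rng)[:m, :]   # first m rows of some P in SO(n)
    Q = random_rotation(n, rng)
    W = build_W(alpha=0.5, P_tilde=P_tilde, Q=Q)
    print(np.allclose(W @ J(n) @ W.T, J(m)))   # True: W satisfies the constraint in Eq. (11)
```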

4 Experiments
In this section, we will first evaluate NH as a dimensionality reduction method compared
with HoroPCA, tangent PCA and EPGA. We show that the proposed NH outperforms all of these methods on both synthetic and real data in terms of reconstruction error. Then,
we apply the proposed NHGCN to the problems of link prediction and node classification on
four graph data sets described in [7]. Our method yields results that are better or comparable
to existing hyperbolic graph networks. The implementations are based on Pymanopt [27] and GeoTorch [29] for the dimensionality reduction experiments and the NHGCN, respectively.

Figure 5: Synthetic data in hyperbolic space visualized using a Poincaré disk model along
with principal geodesic obtained using tangent PCA and the NH. NH is better at capturing
the trend of the data since it is not restricted to pass through the Fréchet mean.

4.1 Dimensionality Reduction in Hyperbolic Space


First we present synthetic data experiments followed by experiments on real data.

Synthetic Experiments As a dimensionality reduction method, we compare NH with


three other competing methods: tangent PCA, EPGA, and HoroPCA. Note that the first
two are applicable on any Riemannian manifolds and HoroPCA is proposed specifically
for hyperbolic spaces as is our NH space method. The major difference between NH and
the aforementioned competitors is that NH does not require the fitted submanifold to pass
through the Fréchet mean whereas the others do. This extra requirement can sometimes
lead to failure in capturing the data trend as shown in Figure 5.
Apart from visual inspection, we use the reconstruction error as a measure of the goodness
of fit. To see how NH performs in comparison to others under different levels of noise, we
generate synthetic data from the wrapped normal distribution [31] on L10 with variance
ranging from 0.2 to 2. Then we apply different dimensionality reduction methods to reduce
the dimension down to 2. The result is shown in Figure 6. The results of EPGA and NH are
essentially the same. This is due to the fact that the wrapped normal distribution we chose
is isotropic around the mean and hence in this case the assumption of submanifold passing
through the Fréchet mean is valid. Even in this case, we observe a significant improvement
of NH over tangent PCA and HoroPCA especially in the large variance scenario. The main
reasons are that (i) tangent PCA uses local linearization which would lead to inaccuracies
when the data is not tightly clustered around the Fréchet mean and (ii) the HoroPCA seeks to
maximize the projected variance on the submanifold, which, as is well known, is not equivalent to minimizing the reconstruction error. There is a clear justification for the choice of using
reconstruction error as the objective function since, we want a good approximation of the
original data with the lower-dimensional representation.
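For completeness, the following sketch (ours, with arbitrary parameters) shows one way of drawing samples from an isotropic wrapped normal on $\mathbb{L}^n$ centered at the origin: a Gaussian tangent vector is sampled at $\mathbf{0} = [1, 0, \ldots, 0]^T$ and pushed onto the manifold with the exponential map, which matches the usual wrapped-normal construction; the `std` parameter plays the role of the noise level varied in the experiment above.

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_map(x, v):
    nrm = np.sqrt(max(lorentz_inner(v, v), 0.0))
    return x if nrm < 1e-12 else np.cosh(nrm) * x + np.sinh(nrm) * v / nrm

def sample_wrapped_normal(num, n, std, rng):
    """Isotropic wrapped normal on L^n centered at the origin [1, 0, ..., 0]."""
    origin = np.zeros(n + 1)
    origin[0] = 1.0
    tangents = np.zeros((num, n + 1))
    tangents[:, 1:] = rng.normal(scale=std, size=(num, n))   # tangent vectors at the origin
    return np.array([exp_map(origin, v) for v in tangents])

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = sample_wrapped_normal(num=500, n=10, std=1.0, rng=rng)    # data on L^10
    sq = -X[:, 0]**2 + np.sum(X[:, 1:]**2, axis=1)
    print(np.allclose(sq, -1.0))                                  # all samples lie on L^10
```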

Hyperbolic Embeddings of Trees For real data experiments, we consider reducing the dimensionality of trees that are embedded into a hyperbolic space. We validate our method on the four datasets described in [39], including (i) a fully balanced tree, (ii) a phylogenetic tree, (iii) a biological graph comprising disease relationships, and (iv) a graph of Computer Science (CS) Ph.D. advisor-advisee relationships. We also create another two datasets by removing some edges from the balanced tree dataset. We apply the method in [15] to embed the tree datasets into a Poincaré ball of dimension 10 and then apply our NH along with the other competing dimensionality reduction methods to reduce the dimension down to 2. The results are reported in Table 1, where we give the means and the standard deviations of the reconstruction errors for EPGA, HoroPCA and NH. From the table, we can see that our method performs the best. Notably, HoroPCA is worse than tangent PCA and EPGA in terms of reconstruction error, even though it shows higher explained variance in [6]. The reason might be that HoroPCA seeks projections that maximize the explained variance, which is not equivalent to minimizing the reconstruction error in the Riemannian manifold case.

Datasets    balancedtree  unbalanced1  unbalanced2  phylo tree    diseasome   ca-CSphd
tPCA        5.75          4.98         4.86         121.19        21.53       71.67
HoroPCA     7.80±0.06     6.51±0.28    7.35±0.61    108.62±9.20   26.94±0.99  87.99±4.69
EPGA        4.01±0.76     3.23±0.08    3.33±0.46    25.93±0.99    9.72±0.36   22.98±0.23
Nested      3.35±0.05     3.10±0.01    3.22±0.06    24.11±0.68    9.18±0.10   22.68±0.40

Table 1: Reconstruction errors from L10 to L2. The numbers depicted are mean error ± standard deviation of the error. Bold numbers indicate the method with the smallest errors, while underlined numbers indicate the second best results.

              Disease             Airport             PubMed              Cora
Task          LP        NC        LP        NC        LP        NC        LP        NC
GCN [26]      64.7±0.5  69.7±0.4  89.3±0.4  81.4±0.6  91.1±0.5  78.1±0.2  90.4±0.2  81.3±0.3
GAT [47]      69.8±0.3  70.4±0.4  90.5±0.3  81.5±0.3  91.2±0.1  79.0±0.3  93.7±0.1  83.0±0.7
SAGE [17]     65.9±0.3  69.1±0.6  90.4±0.5  82.1±0.5  86.2±1.0  77.4±2.2  85.5±0.6  77.9±2.4
SGC [48]      65.1±0.2  69.5±0.2  89.8±0.3  80.6±0.1  94.1±0.0  78.9±0.0  91.5±0.1  81.0±0.1
HGCN [7]      90.8±0.3  74.5±0.9  96.4±0.1  90.6±0.2  96.3±0.0  80.3±0.3  92.9±0.1  79.9±0.2
H2H-GCN [9]   97.0±0.3  88.6±1.7  96.4±0.1  89.3±0.5  96.9±0.0  79.9±0.5  95.0±0.0  82.8±0.4
HYBONET [8]   96.3±0.3  94.5±0.8  97.0±0.2  92.5±0.9  96.4±0.1  77.9±1.0  94.3±0.3  81.3±0.9
LGCN [51]     96.6±0.6  84.4±0.8  -         -         96.6±0.1  78.6±0.7  93.6±0.4  83.3±0.7
NHGCN (Ours)  92.8±0.2  91.7±0.7  97.2±0.3  92.4±0.7  96.9±0.1  80.5±0.0  93.6±0.2  80.3±0.8

Table 2: Area under the ROC curve (%) on the test set for link prediction (LP) and F1 scores (%) for node classification (NC). The results of the other networks are taken from the original papers; in [51], the authors did not test their network on the Airport dataset.

Figure 6: Reconstruction errors for L10 to L2. The data are generated from wrapped normal distributions [31] with variances ranging from 0.2 to 2.

4.2 Nested Hyperbolic Graph Networks


To evaluate the power of the proposed NHGCN, we apply it to problems of link prediction
and node classification. We use four public domain datasets: Disease [7], Airport [7], PubMed
[33], and Cora [42]. We compare our NHGCN with many other graph neural networks and
the results are reported in Table 2. For the link prediction (LP), we report the means and the
standard deviation of the area under the receiver operating characterization (ROC) curve
on the test data; for the problem of node classification (NC), we report the mean and the
standard deviation of the F1 scores. As evident from the table, our results are comparable to
the state-of-the-art and in three cases better. Our reported results could be improved further by replacing the Riemannian ADAM optimizer used in this work with a better one, e.g., one with built-in variance reduction [41].

5 Conclusion
In this paper, we presented a novel dimensionality reduction technique in hyperbolic spaces
called the nested hyperbolic (NH) space representation. NH representation was constructed
using a projection operator that was shown to yield isometric embeddings and further was
shown to be equivariant to the isometry group admitted by the hyperbolic space. Further,
we empirically showed that it yields lower reconstruction error compared to the state of the art (HoroPCA, PGA, tPCA). Using the NH representation, we developed a novel fully hyperbolic GCN and tested it on several data sets. Our NHGCN was shown to achieve comparable or superior performance relative to several competing methods.

Acknowledgement: This research was in part funded by the NSF grant IIS-1724174
to Vemuri.

References
[1] Monami Banerjee, Rudrasis Chakraborty, and Baba C Vemuri. Sparse exact pga on riemannian
manifolds. In Proceedings of the IEEE International Conference on Computer Vision, pages
5010–5018, 2017. 3
[2] Herbert Busemann. The geometry of geodesics. Pure and Applied Mathematics, 1955. 4
[3] James W. Cannon, William J. Floyd, Richard Kenyon, and Walter R. Parry. Hyperbolic
Geometry, volume 31. MSRI Publications, 1997. 3, 6
[4] James W Cannon, William J Floyd, Richard Kenyon, and Walter R Parry. Hyperbolic geom-
etry. Flavors of geometry, 31:59–115, 1997. 6
[5] Rudrasis Chakraborty, Dohyung Seo, and Baba C Vemuri. An efficient exact-pga algorithm
for constant curvature manifolds. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pages 3976–3984, 2016. 3
[6] Ines Chami, Albert Gu, Dat P Nguyen, and Christopher Ré. Horopca: Hyperbolic dimen-
sionality reduction via horospherical projections. In International Conference on Machine
Learning, pages 1419–1429. PMLR, 2021. 3, 15
[7] Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. Hyperbolic graph convolutional
neural networks. Advances in Neural Information Processing Systems, 32:4868–4879, 2019. 4,
5, 10, 12, 14, 15
[8] Weize Chen, Xu Han, Yankai Lin, Hexu Zhao, Zhiyuan Liu, Peng Li, Maosong Sun, and Jie
Zhou. Fully hyperbolic neural networks. arXiv preprint arXiv:2105.14686, 2021. 3, 4, 11, 14
[9] Jindou Dai, Yuwei Wu, Zhi Gao, and Yunde Jia. A hyperbolic-to-hyperbolic graph convolu-
tional network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 154–163, 2021. 4, 14
[10] Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with or-
thogonality constraints. SIAM journal on Matrix Analysis and Applications, 20(2):303–353,
1998. 12

[11] P Thomas Fletcher, Conglin Lu, Stephen M Pizer, and Sarang Joshi. Principal geodesic
analysis for the study of nonlinear statistics of shape. IEEE transactions on medical imaging,
23(8):995–1005, 2004. 3
[12] Maurice Fréchet. Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l'Institut Henri Poincaré, 10:215–310, 1948. 3
[13] Jean Gallier and Jocelyn Quaintance. Notes on differential geometry and Lie groups. University of Pennsylvania, 4:3–1, 2012. 7
[14] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks.
Advances in Neural Information Processing Systems 31, pages 5345–5355, 2019. 4
[15] Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning mixed-curvature repre-
sentations in product spaces. In International Conference on Learning Representations, 2018.
15
[16] Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz
Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, et al. Hyperbolic
attention networks. In International Conference on Learning Representations, 2018. 4
[17] William L Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large
graphs. In Proceedings of the 31st International Conference on Neural Information Processing
Systems, pages 1025–1035, 2017. 14
[18] Mehrtash Harandi, Mathieu Salzmann, and Richard Hartley. Dimensionality reduction on
SPD manifolds: The emergence of geometry-aware methods. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 40(1):48–62, 2018. 3
[19] Trevor Hastie and Werner Stuetzle. Principal curves. Journal of the American Statistical
Association, 84(406):502–516, 1989. 3
[20] Søren Hauberg. Principal curves on riemannian manifolds. IEEE transactions on pattern
analysis and machine intelligence, 38(9):1915–1921, 2015. 3
[21] Sigurdur Helgason. Differential geometry, Lie groups, and symmetric spaces. Academic Press,
1979. 9
[22] Stephan Huckemann, Thomas Hotz, and Axel Munk. Intrinsic shape analysis: Geodesic pca
for riemannian manifolds modulo isometric lie group actions. Statistica Sinica, pages 1–58,
2010. 3
[23] Ian T Jolliffe and Jorge Cadima. Principal component analysis: a review and recent de-
velopments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and
Engineering Sciences, 374(2065):20150202, 2016. 2
[24] Sungkyu Jung, Ian L Dryden, and James Stephen Marron. Analysis of principal nested spheres.
Biometrika, 99(3):551–568, 2012. 3, 5
[25] Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lem-
pitsky. Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition, pages 6418–6428, 2020. 4
[26] Thomas N. Kipf and Max Welling. Semi-Supervised Classification with Graph Convolutional
Networks. In International Conference on Learning Representations, 2017. 14
[27] Niklas Koep and Sebastian Weichwald. Pymanopt: A python toolbox for optimization on
manifolds using automatic differentiation. Journal of Machine Learning Research, 17:1–5,
2016. 12

[28] Marc Law, Renjie Liao, Jake Snell, and Richard Zemel. Lorentzian distance learning for
hyperbolic representations. In International Conference on Machine Learning, pages 3672–
3681. PMLR, 2019. 12
[29] Mario Lezcano-Casado. Trivializations for gradient-based optimization on manifolds. In Ad-
vances in Neural Information Processing Systems, NeurIPS, pages 9154–9164, 2019. 12
[30] Qi Liu, Maximilian Nickel, and Douwe Kiela. Hyperbolic graph neural networks. Advances in
Neural Information Processing Systems, 32:8230–8241, 2019. 4
[31] Emile Mathieu, Charline Le Lan, Chris J. Maddison, Ryota Tomioka, and Yee Whye Teh. Continuous hierarchical representations with Poincaré variational auto-encoders. Advances in Neural Information Processing Systems, pages 12544–12555, 2019. 4, 13, 15
[32] Valter Moretti. The interplay of the polar decomposition theorem and the lorentz group. arXiv
preprint math-ph/0211047, 2002. 7
[33] Galileo Namata, Ben London, Lise Getoor, Bert Huang, and UMD EDU. Query-driven active
surveying for collective classification. In 10th International Workshop on Mining and Learning
with Graphs, volume 8, page 1, 2012. 15
[34] Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical represen-
tations. Advances in neural information processing systems, 30:6338–6347, 2017. 4
[35] Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the lorentz model
of hyperbolic geometry. In International Conference on Machine Learning, pages 3779–3788.
PMLR, 2018. 3
[36] Jiwoong Park, Junho Cho, Hyung Jin Chang, and Jin Young Choi. Unsupervised hyperbolic
representation learning via message passing auto-encoders. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 5516–5526, 2021. 4
[37] Xavier Pennec. Barycentric subspace analysis on manifolds. The Annals of Statistics,
46(6A):2711–2746, 2018. 3
[38] John G Ratcliffe. Foundations of Hyperbolic Manifolds, volume 149. Springer, 2 edition, 2006.
6, 12
[39] Frederic Sala, Chris De Sa, Albert Gu, and Christopher Ré. Representation tradeoffs for
hyperbolic embeddings. In International conference on machine learning, pages 4460–4469.
PMLR, 2018. 4, 13
[40] Rik Sarkar. Low distortion delaunay embedding of trees in hyperbolic plane. In International
Symposium on Graph Drawing, pages 355–366. Springer, 2011. 4
[41] Hiroyuki Sato, Hiroyuki Kasai, and Bamdev Mishra. Riemannian stochastic variance reduced
gradient algorithm with retraction and vector transport. SIAM Journal on Optimization,
29(2):1444–1472, 2019. 15
[42] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-
Rad. Collective classification in network data. AI magazine, 29(3):93–93, 2008. 15
[43] Ryohei Shimizu, YUSUKE Mukuta, and Tatsuya Harada. Hyperbolic neural networks++. In
International Conference on Learning Representations, 2020. 4
[44] Stefan Sommer, François Lauze, Søren Hauberg, and Mads Nielsen. Manifold valued statis-
tics, exact principal geodesic analysis and the effect of linear approximations. In European
conference on computer vision, pages 43–56. Springer, 2010. 3
[45] Alexandru Tifrea, Gary Becigneul, and Octavian-Eugen Ganea. Poincare glove: Hyperbolic
word embeddings. In International Conference on Learning Representations, 2018. 4

[46] Abraham A Ungar. Gyrovector spaces and their differential geometry. Nonlinear Funct. Anal.
Appl, 10(5):791–834, 2005. 4
[47] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and
Yoshua Bengio. Graph attention networks. In International Conference on Learning Repre-
sentations, 2018. 14
[48] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger.
Simplifying graph convolutional networks. In International conference on machine learning,
pages 6861–6871. PMLR, 2019. 14
[49] Chun-Hao Yang and Baba C Vemuri. Nested grassmanns for dimensionality reduction with
applications to shape analysis. In International Conference on Information Processing in
Medical Imaging, pages 136–149. Springer, 2021. 3, 5
[50] Miaomiao Zhang and Tom Fletcher. Probabilistic principal geodesic analysis. Advances in
Neural Information Processing Systems, 26:1178–1186, 2013. 3
[51] Yiding Zhang, Xiao Wang, Chuan Shi, Nian Liu, and Guojie Song. Lorentzian graph convo-
lutional networks. In Proceedings of the Web Conference 2021, pages 1249–1261, 2021. 11,
14

