Nested Hyperbolic Spaces For Dimensionality Reduction
Baba C. Vemuri
Department of CISE, University of Florida
[email protected]
December 8, 2021
Abstract
Hyperbolic neural networks have been popular in the recent past due to their ability
to represent hierarchical data sets effectively and efficiently. The challenge in developing
these networks lies in the nonlinearity of the embedding space, namely the hyperbolic
space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group
and is embedded in a semi-Riemannian space, i.e., a space equipped with an
indefinite metric. Most existing methods (with some exceptions) use local lineariza-
tion to define a variety of operations paralleling those used in traditional deep neural
networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural
network which uses the concept of projections (embeddings) followed by an intrinsic
aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in
the projection, which is designed to project data onto a lower-dimensional embedded
hyperbolic space and hence leads to a nested hyperbolic space representation indepen-
dently useful for dimensionality reduction. The main theoretical contribution is that
the proposed embedding is proved to be isometric and equivariant under the Lorentz
transformations, which are the natural isometric transformations in hyperbolic spaces.
This projection is computationally efficient since it can be expressed by simple linear
operations, and, due to the aforementioned equivariance property, it allows for weight
sharing. The nested hyperbolic space representation is the core component of our
network and therefore, we first compare this ensuing nested hyperbolic space represen-
tation – independent of the network – with other dimensionality reduction methods
such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this
equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural
network architecture to learn the parameters of the projection. Finally, we present ex-
periments demonstrating comparative performance of our network on several publicly
available data sets.
Figure 1: Projections of data from a 2-dimensional hyperbolic space to a 1-dimensional
hyperbolic space using different dimensionality reduction methods. The results are visualized
in the Poincaré disk. Original data (blue dots) lie in a 2-dimensional hyperbolic space and
have a zero mean (origin of the Poincaré disk). The HoroPCA direction (red dotted line)
and the principal geodesic obtained by tangent PCA (orange dashed line) and Exact PGA
(purple dash-dotted line) fail to capture the main trend of the data since they are restricted
to learn a geodesic submanifold passing through the mean. In contrast, our nested hyperbolic
(NH) representation (green solid line) captures the data trend more accurately. The diamond
markers on each line represent the reconstructed data from each method. The reconstruction
errors for HoroPCA, tangent PCA, EPGA and the proposed NH scheme in this example are,
0.1708, 0.1202, 0.1638 and 0.0062 respectively.
1 Introduction
Hyperbolic geometry is a centuries-old field of non-Euclidean geometry that has recently
found its way into machine learning, in particular into deep learning in the form of hyperbolic
neural networks (HNNs) and hyperbolic graph convolutional networks (HGCNs), and, more
recently, into dimensionality reduction of data embedded in hyperbolic space. In this paper,
we will discuss both problems, namely dimensionality reduction in hyperbolic spaces and HNN
architectures, and present novel techniques for each. In the following, we review the literature
on the two problems stated above and establish the motivation for our work. A word on
terminology: we will use the terms hyperbolic neural network and hyperbolic graph
(convolutional) neural network synonymously for the rest of the paper.
on PCA). PCA, however, is limited to data in vector spaces. For data that are manifold-
valued, principal geodesic analysis (PGA) was presented in [11], which yields the projection
of data onto principal geodesic submanifolds passing through an intrinsic (Fréchet) mean
[12] of the data. They find the geodesic submanifold of a lower dimension that maximizes
the projected variance and computationally, this was achieved via linear approximation,
i.e., applying PCA on the tangent space anchored at the Fréchet mean. This is sometimes
referred to as the tangent PCA (tPCA). This approximation however requires the data
to be clustered around the Fréchet mean, otherwise the tangent space approximation to
the manifold leads to inaccuracies. Subsequently, [44] presented the Exact PGA (EPGA)
algorithm, which does not use any linear approximation. However, EPGA is computationally
expensive as it requires two non-linear optimization steps per iteration (projection onto the
geodesic submanifold and finding the new geodesic direction such that the reconstruction
error is minimized). Later, authors in [5] developed a version of EPGA for constant sectional
curvature manifolds, namely the hypersphere and the hyperbolic space, by deriving closed
form formulae for the projection. There are many variants of PGA and we refer the reader to
[1, 22, 50] for the details. More recently, Barycentric subspace analysis (BSA) was proposed
in [37] which finds a more general parameterization of a nested sequence of submanifolds via
the minimization of unexplained variance. Another useful dimensionality reduction scheme
is principal curves [19] and their generalization to Riemannian manifolds [20], which are
more appropriate for certain applications.
A salient feature of PCA is that it yields nested linear subspaces, i.e., the reduced di-
mensional principal subspaces form a nested hierarchy. This idea was exploited in [24] where
authors proposed the principal nested spheres (PNS) by embedding an (n − 1)-sphere into
an n-sphere; the embedding, however, is not necessarily isometric. Hence, PNS is more
general than PGA in that PNS does not have to be geodesic. Similarly, for the manifold Pn
of (n × n) symmetric positive definite (SPD) matrices, authors in [18] proposed a geometry-
aware dimensionality reduction by projecting data on Pn to Pm for some m < n. More
recently, the idea of constructing a nested sequence of manifolds was presented in [49] where
authors unified and generalized the nesting concept to general Riemannian homogeneous
manifolds, which form a large class of Riemannian manifolds, including the hypersphere, Pn ,
the Grassmannian, Stiefel manifold, Lie groups, and others. Although the general frame-
work in [49] seems straightforward and applicable to hyperbolic spaces, many important
technical aspects still need to be addressed and derived in detail. In this paper, we
will present novel derivations suited for the hyperbolic spaces – a projection operator which
is proved to yield an isometric embedding, and a proof of equivariance to isometries of the
projection operator – which will facilitate the construction of nested hyperbolic spaces and
the hyperbolic neural network. Note that there are five models of the hyperbolic space
namely, the hyperboloid (Lorentz) model, the Poincaré disk/ball model, the Poincaré half
plane model, the Klein model and the hemisphere model [3]. All these models are isomet-
rically equivalent but some are better suited than others depending on the application. We
choose the Lorentz model of the hyperbolic space with a Lorentzian metric in our work.
The choice of this model and the associated metric over other models is motivated by the
efficiency and numerical stability it affords for Riemannian optimization [35, 8].
Most recently, an elegant approach called HoroPCA was proposed in [6], for dimension-
ality reduction in hyperbolic spaces. In particular, the authors represented the hyperbolic
space using the Poincaré model and they proposed to generalize the notion of direction and
the coordinates in a given direction using ideal points (points at infinity) and the Busemann
coordinates (defined using the Busemann function) [2]. The level sets of the Busemann
function, called the horospheres, resemble the hyperplanes (or affine subspaces) in Euclidean
spaces and hence the dimensionality reduction is achieved by a projection that moves points
along a horosphere. The data is then projected to a geodesic hull of a base point b and
a number of ideal points p1 , . . . , pK , which is also a geodesic submanifold. This is the key
difference between HoroPCA and our proposed method which leads to a significant difference
in performance. This is evident from the toy example in Figure 1 which depicts the reduced
dimensional representations obtained by our method in comparison to those from EPGA,
HoroPCA, and tangent PCA. Note that all of the other methods yield submanifold repre-
sentations that do not capture the data trend accurately, unlike ours. More comprehensive
comparisons will be made in a later section.
To briefly summarize, our first goal in this paper is to present a nested hyperbolic space
representation for dimensionality reduction and we will demonstrate, via synthetic exam-
ples and real datasets, that it achieves a lower reconstruction error in comparison to other
competing methods.
not use the intrinsic characterization of the hyperbolic space as a homogeneous space with
the isometry group being the Lorentz group.
Lorentz transformations are, however, inappropriate for defining projection operations
(required for reducing the dimensionality), as they preserve the Lorentz model only when
there is no change in dimension. In other words, to find a lower-dimensional hyperbolic
space representation for data embedded in a higher-dimensional hyperbolic space, one cannot
use Lorentz transformations directly. Hence, we propose to use an isometric embedding
operation mentioned in the previous subsection as the building block to design a hyperbolic
neural network. We will now briefly summarize our proposed model and the contributions
of our work.
2 Preliminaries
In this section, we briefly review relevant concepts of hyperbolic geometry. We regard the
hyperbolic space as a homogeneous Riemannian manifold of the Lorentz group and present a
few important geometric concepts of the hyperbolic space, including the geodesic distance and
the exponential map, which are used in our work. The materials
presented in this section can be found in most textbooks on hyperbolic spaces, for example
[38, 4].

(a) Lorentz rotation (b) Lorentz boost

Figure 2: Illustration of two kinds of Lorentz transformations, the Lorentz rotation and the
Lorentz boost, in the Lorentz model. Both are isometries of the Lorentz model.
The Lorentzian inner product on $\mathbb{R}^{n+1}$ is $\langle x, y\rangle_L := -x_0 y_0 + \sum_{i=1}^{n} x_i y_i$, and the set
$\mathbb{L}^n := \{x \in \mathbb{R}^{n+1} : \langle x, x\rangle_L = -1,\; x_0 > 0\}$ is called the $n$-dimensional hyperboloid model
of one sheet of a hyperbolic space defined in $\mathbb{R}^{n+1}$.
2.2 Lorentz Transformations
In the Lorentzian space, the linear isometries are called Lorentz transformations, i.e., the map
$\phi : \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$ is a Lorentz transformation if $\langle \phi(x), \phi(y)\rangle_L = \langle x, y\rangle_L$ for any
$x, y \in \mathbb{R}^{n+1}$. It is easy to see that all Lorentz transformations form a group under composition,
and this group is denoted by O(1, n), called the Lorentz group. The matrix representation
of O(1, n) in Rn+1 is defined as follows. Let Jn = diag(−1, In ) where In is the n × n identity
matrix and diag(·) denotes a diagonal matrix. Then, O(1, n) is defined as
$O(1, n) := \{A \in M_{n+1}(\mathbb{R}) : A J_n A^T = A^T J_n A = J_n\}$. There are a few important subgroups of O(1, n): (i) the
subgroup O+ (1, n) := {A ∈ O(1, n) : a11 > 0} is called the positive Lorentz group; (ii) the
subgroup SO(1, n) := {A ∈ O(1, n) : det(A) = 1} is called the special Lorentz group; (iii)
the subgroup SO+ (1, n) := {A ∈ SO(1, n) : a11 > 0} is called the positive special Lorentz
group. Briefly speaking, the special Lorentz group preserves the orientation, and the positive
Lorentz group preserves the sign of the first entry of x ∈ Ln .
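For concreteness, the following minimal sketch (in NumPy; the helper names are ours and not part of any released implementation) checks the defining condition $A J_n A^T = J_n$ together with the determinant and sign conditions that single out $SO^+(1, n)$, using a Lorentz boost as an example:

```python
import numpy as np

def is_lorentz(A, tol=1e-10):
    """A is in O(1, n) iff A J_n A^T = J_n."""
    n = A.shape[0] - 1
    J = np.diag([-1.0] + [1.0] * n)
    return np.allclose(A @ J @ A.T, J, atol=tol)

def is_positive_special_lorentz(A, tol=1e-10):
    """A is in SO+(1, n): Lorentz, det(A) = 1, and a_11 > 0."""
    return (is_lorentz(A, tol)
            and np.isclose(np.linalg.det(A), 1.0, atol=tol)
            and A[0, 0] > 0)

# A Lorentz boost along the first coordinate axis lies in SO+(1, 2).
alpha = 0.7
B = np.eye(3)
B[:2, :2] = [[np.cosh(alpha), np.sinh(alpha)],
             [np.sinh(alpha), np.cosh(alpha)]]
assert is_positive_special_lorentz(B)
```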
Fact 4. Every Lorentz transformation matrix $A \in SO^+(1, n)$ can be decomposed as
$$
A = \begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & P \end{bmatrix}
\begin{bmatrix} \cosh\alpha & \sinh\alpha & \mathbf{0}^T \\ \sinh\alpha & \cosh\alpha & \mathbf{0}^T \\ \mathbf{0} & \mathbf{0} & I_{n-1} \end{bmatrix}
\begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & Q^T \end{bmatrix} \tag{1}
$$
where $P, Q \in SO(n)$, $\alpha \in \mathbb{R}$, and $\mathbf{0}$ denotes a zero vector of the appropriate dimension
(in $\mathbb{R}^{n-1}$ in the middle factor). See Figure 2 for examples of the Lorentz rotations and the
Lorentz boosts.
The matrix in the middle is the Lorentz boost along the first coordinate axis. This
decomposition will be very useful in the optimization problem stated in Section 3.3, equation
(11).
We now conclude this section by presenting the explicit closed form formulae for the
exponential map and the geodesic distance. For any $x \in \mathbb{L}^n$ and $v \in T_x\mathbb{L}^n$ (the tangent
space of $\mathbb{L}^n$ at $x$), the exponential map at $x$ is given by
$$
\mathrm{Exp}_x(v) = \cosh(\|v\|_L)\, x + \sinh(\|v\|_L)\, \frac{v}{\|v\|_L}, \qquad \|v\|_L := \sqrt{\langle v, v\rangle_L}. \tag{2}
$$
Since $\mathbb{L}^n$ is a negatively curved Riemannian manifold, its exponential map is invertible and
the inverse of the exponential map, also called the Log map, is given by
$$
\mathrm{Log}_x(y) = \frac{\theta}{\sinh(\theta)}\,\big(y - \cosh(\theta)\, x\big), \qquad \theta = d_L(x, y) = \cosh^{-1}\!\big(-\langle x, y\rangle_L\big). \tag{3}
$$
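As a concrete illustration of these closed forms, here is a minimal NumPy sketch of the Exp and Log maps on the Lorentz model (the function names are ours); the round trip checks that $\mathrm{Log}_x(\mathrm{Exp}_x(v)) = v$ for a tangent vector $v$ at $x$:

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_map(x, v):
    """Exponential map on L^n at x, Eq. (2); assumes v != 0 is tangent at x."""
    theta = np.sqrt(lorentz_inner(v, v))   # Lorentzian norm of the tangent vector
    return np.cosh(theta) * x + np.sinh(theta) * v / theta

def log_map(x, y):
    """Log map on L^n, Eq. (3), with theta = d_L(x, y) = arccosh(-<x, y>_L)."""
    theta = np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))
    return (theta / np.sinh(theta)) * (y - np.cosh(theta) * x)

x = np.array([1.0, 0.0, 0.0])       # the "origin" of L^2
v = np.array([0.0, 0.3, -0.2])      # tangent vector at x (note <x, v>_L = 0)
y = exp_map(x, v)
assert np.isclose(lorentz_inner(y, y), -1.0)   # y stays on the hyperboloid
assert np.allclose(log_map(x, y), v)           # round trip recovers v
```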
Figure 3: Illustration of NH model using the embedding ιm in Eq. (6) of Lm into Lm+1 .
The m-dimensional nested hyperboloid in Lm+1 is indeed the intersection of Lm+1 and an
m-dimensional hyperplane.
where $O \in SO^+(1, m)$, $a, b \in \mathbb{R}^{m+1}$, $c \neq a^T O^{-1} b$, and $\Lambda \in SO^+(1, m + 1)$. The function
adapted-GS(·) is an adaptation of the standard Gram-Schmidt process to orthonormalize
vectors with respect to the Lorentz inner product defined earlier.
The Riemannian submersion (see [21] for the definition of a Riemannian submersion)
$\pi : SO^+(1, m) \to \mathbb{L}^m$ is given by $\pi(O) = O_1$, where $O \in SO^+(1, m)$ and $O_1$ is the first
column of $O$. Therefore, the induced embedding $\iota_m : \mathbb{L}^m \to \mathbb{L}^{m+1}$ is
$$
\iota_m(x) = \Lambda \begin{bmatrix} \cosh(r)\, x \\ \sinh(r) \end{bmatrix} = \cosh(r)\, \tilde{\Lambda} x + \sinh(r)\, v \tag{6}
$$
where $\Lambda = [\tilde{\Lambda}\;\; v] \in SO^+(1, m + 1)$. This class of embeddings is quite general as it includes
isometric embeddings as special cases.
Proof. It follows directly from the definitions of the Lorentz transformation and the geodesic
distance on Lm .
Furthermore, the embedding (6) is equivariant under Lorentz transformations.
Proof. For $x \in \mathbb{L}^m$ and $R \in SO^+(1, m)$,
$$
\iota_m(Rx) = \Lambda \begin{bmatrix} \cosh(r)\, Rx \\ \sinh(r) \end{bmatrix}
= \Lambda \begin{bmatrix} R & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cosh(r)\, x \\ \sinh(r) \end{bmatrix}
= \Lambda \begin{bmatrix} R & 0 \\ 0 & 1 \end{bmatrix} \Lambda^{-1} \, \Lambda \begin{bmatrix} \cosh(r)\, x \\ \sinh(r) \end{bmatrix}
= \Psi_\Lambda\big(\tilde{\iota}_m(R)\big)\, \iota_m(x).
$$
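Since (6) involves only a matrix-vector product and two hyperbolic functions, it is cheap to compute. Below is a minimal NumPy sketch (the function name and the toy example are ours) that evaluates $\iota_m$ and checks that the image stays on the hyperboloid:

```python
import numpy as np

def nested_embed(x, Lam, r):
    """Embedding iota_m of Eq. (6): L^m -> L^{m+1}, with Lam = [Lam_tilde | v] in SO+(1, m+1)."""
    Lam_tilde, v = Lam[:, :-1], Lam[:, -1]
    return np.cosh(r) * (Lam_tilde @ x) + np.sinh(r) * v

x = np.array([np.cosh(0.5), np.sinh(0.5)])   # a point on L^1
Lam = np.eye(3)                              # the trivial element of SO+(1, 2)
y = nested_embed(x, Lam, r=0.3)
# The image lies on L^2: <y, y>_L = -1.
assert np.isclose(-y[0]**2 + y[1]**2 + y[2]**2, -1.0)
```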
The unknowns $\Lambda = [\tilde{\Lambda}\;\; v]$ and $r$ can then be obtained by minimizing the reconstruction
error
$$
L(\Lambda, r) = \frac{1}{N} \sum_{i=1}^{N} \big(d_L(x_i, \hat{x}_i)\big)^2. \tag{9}
$$
The projection of $x \in \mathbb{L}^n$ into $\mathbb{L}^m$ for $n > m$ can be obtained via the composition
$\pi := \pi_{m+1} \circ \cdots \circ \pi_n$:
$$
\pi(x) = J_m \left( \prod_{i=m+1}^{n} \frac{1}{\cosh(r_i)}\, \tilde{\Lambda}_i \right)^{\!T} J_n\, x
= \frac{J_m M^T J_n\, x}{\| J_m M^T J_n\, x \|_L} \tag{10}
$$
where $M = \prod_{i=m+1}^{n} \tilde{\Lambda}_i \in \mathbb{R}^{(n+1)\times(m+1)}$.
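A minimal NumPy sketch of the projection in (10) is given below (the function name is ours); here we take $\|z\|_L = \sqrt{-\langle z, z\rangle_L}$, which we read as the normalization that places the result back on $\mathbb{L}^m$:

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def nested_project(x, M):
    """Projection of Eq. (10) from L^n to L^m, with M of size (n+1) x (m+1)."""
    n, m = M.shape[0] - 1, M.shape[1] - 1
    Jn = np.diag([-1.0] + [1.0] * n)
    Jm = np.diag([-1.0] + [1.0] * m)
    z = Jm @ M.T @ Jn @ x
    return z / np.sqrt(-lorentz_inner(z, z))   # renormalize onto L^m

# Toy example: project a point of L^2 onto L^1 with a trivial nesting
# (M is the first two columns of the 3 x 3 identity).
x = np.array([np.cosh(0.4) * np.cosh(0.2),
              np.cosh(0.4) * np.sinh(0.2),
              np.sinh(0.4)])                  # a point on L^2
y = nested_project(x, np.eye(3)[:, :2])
assert np.isclose(lorentz_inner(y, y), -1.0)  # y lies on L^1
```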
Figure 4: The HGCN Architecture
$$
y = \frac{W x}{\|W x\|_L} \quad \text{s.t.} \quad W J_n W^T = J_m \tag{11}
$$

$$
x_i^l = \frac{W^l x_i^{l-1}}{\|W^l x_i^{l-1}\|_L} \quad \text{s.t.} \quad W^l J_{n_{l-1}} (W^l)^T = J_{n_l} \tag{12}
$$
$\{x_j^l\}_{j=1}^{p} \subset \mathbb{L}^{n_l}$ w.r.t. the squared Lorentzian distance, namely
$$
\mu_i^l = \operatorname*{arg\,min}_{\mu^l \in \mathbb{L}^{n_l}} \sum_{j=1}^{p} \nu_j^l\, d_L^2(x_j^l, \mu^l) \tag{13}
$$
where $\nu_j^l$ is the weight for $x_j^l$ and $d_L^2(x, y) = -1 - \langle x, y\rangle_L$ is the squared Lorentzian
distance [38]. Authors in [28] proved that this problem has a closed-form solution given by
$$
\mu_i^l = \frac{\sum_{j=1}^{p} \nu_j^l\, x_j^l}{\left|\,\big\| \sum_{j=1}^{p} \nu_j^l\, x_j^l \big\|_L \right|}. \tag{14}
$$
Here $\mathbf{0} = [1, 0, \ldots, 0]^T \in \mathbb{L}^{n_l}$ (corresponding to the origin in the Poincaré model) is chosen
as the base point to define the anchor point in the tangent ReLU.
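For illustration, the closed form (14) amounts to normalizing the weighted sum of the points by the absolute value of its Lorentzian norm. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_centroid(X, nu):
    """Closed-form weighted centroid of Eq. (14); X holds points of L^n as rows,
    nu holds the (nonnegative) weights."""
    s = X.T @ nu                                      # weighted sum of the points
    return s / np.sqrt(np.abs(lorentz_inner(s, s)))   # divide by |  ||s||_L  |

# Equal-weight centroid of three points on a geodesic of L^2.
pts = np.array([[np.cosh(t), np.sinh(t), 0.0] for t in (-0.5, 0.0, 0.5)])
mu = lorentz_centroid(pts, np.ones(3) / 3)
assert np.isclose(lorentz_inner(mu, mu), -1.0)        # mu lies on L^2
```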
3.3 Optimization
In this section, we will explain how to update the parameters of the network, i.e., the
transformation matrix $W$ in (11). Instead of updating $W$ directly, we decompose $W$ into
three matrices using (1). More specifically, we write
$$
W = \begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & \tilde{P} \end{bmatrix}
\begin{bmatrix} \cosh\alpha & \sinh\alpha & \mathbf{0}^T \\ \sinh\alpha & \cosh\alpha & \mathbf{0}^T \\ \mathbf{0} & \mathbf{0} & I_{n-1} \end{bmatrix}
\begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & Q^T \end{bmatrix}
$$
where $Q \in SO(n)$, $\alpha \in \mathbb{R}$, and $\tilde{P}$ consists of the first $m$ rows of some $P \in SO(n)$, i.e.,
$\tilde{P}$ lies on a Stiefel manifold [10]. We then regard our feature transformation as a sequence
of multiplications by these three matrices and update them one by one.
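To make the parameterization concrete, the sketch below (NumPy and SciPy assumed; the helper names and the random test are ours) assembles $W$ from the factors $(\tilde{P}, \alpha, Q)$ and numerically verifies the constraint $W J_n W^T = J_m$ from (11):

```python
import numpy as np
from scipy.stats import special_ortho_group   # random rotation matrices

def boost(alpha, n):
    """(n+1) x (n+1) Lorentz boost along the first coordinate axis, as in Eq. (1)."""
    B = np.eye(n + 1)
    B[:2, :2] = [[np.cosh(alpha), np.sinh(alpha)],
                 [np.sinh(alpha), np.cosh(alpha)]]
    return B

def build_W(P_tilde, alpha, Q):
    """Assemble W = diag(1, P_tilde) * boost(alpha) * diag(1, Q^T); P_tilde is m x n
    with orthonormal rows (a Stiefel point), Q is in SO(n)."""
    m, n = P_tilde.shape
    left = np.zeros((m + 1, n + 1))
    left[0, 0] = 1.0
    left[1:, 1:] = P_tilde
    right = np.eye(n + 1)
    right[1:, 1:] = Q.T
    return left @ boost(alpha, n) @ right

# Sanity check of W J_n W^T = J_m for random factors.
n, m = 5, 2
P = special_ortho_group.rvs(n, random_state=0)
Q = special_ortho_group.rvs(n, random_state=1)
W = build_W(P[:m], 0.4, Q)
Jn, Jm = np.diag([-1.0] + [1.0] * n), np.diag([-1.0] + [1.0] * m)
assert np.allclose(W @ Jn @ W.T, Jm)
```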
4 Experiments
In this section, we will first evaluate NH as a dimensionality reduction method compared
with HoroPCA, tangent PCA and EPGA. We show that the proposed NH outperforms all of
these methods on both synthetic and real data in terms of reconstruction error. Then, we
apply the proposed NHGCN to the problems of link prediction and node classification on
four graph data sets described in [7]. Our method yields results that are better than or
comparable to those of existing hyperbolic graph networks. The implementations are based on
Pymanopt [27] and GeoTorch [29] for dimensionality reduction and the NHGCN, respectively.
Figure 5: Synthetic data in hyperbolic space visualized using the Poincaré disk model, along
with the principal geodesic obtained using tangent PCA and the NH representation. NH is
better at capturing the trend of the data since it is not restricted to pass through the Fréchet
mean.
Datasets balancedtree unbalanced1 unbalanced2 phylo tree diseasome ca-CSphd
tPCA 5.75 4.98 4.86 121.19 21.53 71.67
HoroPCA 7.80±0.06 6.51±0.28 7.35±0.61 108.62±9.20 26.94±0.99 87.99±4.69
EPGA 4.01±0.76 3.23±0.08 3.33±0.46 25.93±0.99 9.72±0.36 22.98±0.23
Nested 3.35±0.05 3.10±0.01 3.22±0.06 24.11±0.68 9.18±0.10 22.68±0.40
Table 1: Reconstruction errors from $\mathbb{L}^{10}$ to $\mathbb{L}^{2}$. The numbers depicted are the mean error ±
the standard deviation of the error. Numbers in bold indicate the method with the smallest
errors, while underlined numbers indicate the second best results.
Table 2: Area under the ROC curve (%) for link prediction (LP), and F1 scores (%) for node
classification (NC). The results of the other networks are taken from the original papers; in
[51], the authors did not test their network on the Airport dataset.
Figure 6: Reconstruction errors for $\mathbb{L}^{10}$ to $\mathbb{L}^{2}$. The data is generated from wrapped normal
distributions [31] with variances ranging from 0.2 to 2.
phylogenetic tree, (iii) a biological graph comprising of diseases’ relationships, and (iv) a
graph of Computer Science (CS) Ph.D. advisor-advisee relationships. We also create two
additional datasets by removing some edges from the balanced tree dataset. We apply the method in
[15] to embed the tree datasets into a Poincaré ball of dimension 10 and then apply our NH
along with other competing dimensionality reduction methods to reduce the dimension down
to 2. The results are reported in Table 1. In Table 1, we report the means and the standard
deviations of the reconstruction errors for EPGA, HoroPCA and NH. From the table, we can
see that our method performs the best among the compared methods. Notably, HoroPCA is
worse than tangent PCA and EPGA in terms of reconstruction error, even though it shows
higher explained variance in [6]. The reason might be that HoroPCA seeks projections that
maximize the explained variance, which is not equivalent to minimizing the reconstruction
error in the Riemannian manifold case.
5 Conclusion
In this paper, we presented a novel dimensionality reduction technique in hyperbolic spaces
called the nested hyperbolic (NH) space representation. The NH representation was constructed
using a projection operator that was shown to yield isometric embeddings and, further, to be
equivariant to the isometry group admitted by the hyperbolic space. We empirically showed
that it yields lower reconstruction error compared to the state of the art (HoroPCA, PGA,
tPCA). Using the NH representation, we developed a novel fully hyperbolic GCN and tested
it on several data sets. Our NHGCN was shown to achieve comparable or superior performance
relative to several competing methods.
Acknowledgement: This research was in part funded by the NSF grant IIS-1724174
to Vemuri.
References
[1] Monami Banerjee, Rudrasis Chakraborty, and Baba C Vemuri. Sparse exact pga on riemannian
manifolds. In Proceedings of the IEEE International Conference on Computer Vision, pages
5010–5018, 2017. 3
[2] Herbert Busemann. The geometry of geodesics. Pure and Applied Mathematics, 1955. 4
[3] James W. Cannon, William J. Floyd, Richard Kenyon, and Walter R. Parry. Hyperbolic
Geometry, volume 31. MSRI Publications, 1997. 3, 6
[4] James W Cannon, William J Floyd, Richard Kenyon, and Walter R Parry. Hyperbolic geom-
etry. Flavors of geometry, 31:59–115, 1997. 6
[5] Rudrasis Chakraborty, Dohyung Seo, and Baba C Vemuri. An efficient exact-pga algorithm
for constant curvature manifolds. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pages 3976–3984, 2016. 3
[6] Ines Chami, Albert Gu, Dat P Nguyen, and Christopher Ré. Horopca: Hyperbolic dimen-
sionality reduction via horospherical projections. In International Conference on Machine
Learning, pages 1419–1429. PMLR, 2021. 3, 15
[7] Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. Hyperbolic graph convolutional
neural networks. Advances in Neural Information Processing Systems, 32:4868–4879, 2019. 4,
5, 10, 12, 14, 15
[8] Weize Chen, Xu Han, Yankai Lin, Hexu Zhao, Zhiyuan Liu, Peng Li, Maosong Sun, and Jie
Zhou. Fully hyperbolic neural networks. arXiv preprint arXiv:2105.14686, 2021. 3, 4, 11, 14
[9] Jindou Dai, Yuwei Wu, Zhi Gao, and Yunde Jia. A hyperbolic-to-hyperbolic graph convolu-
tional network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 154–163, 2021. 4, 14
[10] Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with or-
thogonality constraints. SIAM journal on Matrix Analysis and Applications, 20(2):303–353,
1998. 12
[11] P Thomas Fletcher, Conglin Lu, Stephen M Pizer, and Sarang Joshi. Principal geodesic
analysis for the study of nonlinear statistics of shape. IEEE transactions on medical imaging,
23(8):995–1005, 2004. 3
[12] Maurice Fréchet. Les éléments aléatoires de nature quelconque dans un espace distancié.
Ann. Inst. H. Poincaré, 10:215–310, 1948. 3
[13] Jean Gallier and Jocelyn Quaintance. Notes on differential geometry and Lie groups. University
of Pennsylvania, 4:3–1, 2012. 7
[14] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks.
Advances in Neural Information Processing Systems 31, pages 5345–5355, 2019. 4
[15] Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning mixed-curvature repre-
sentations in product spaces. In International Conference on Learning Representations, 2018.
15
[16] Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz
Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, et al. Hyperbolic
attention networks. In International Conference on Learning Representations, 2018. 4
[17] William L Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large
graphs. In Proceedings of the 31st International Conference on Neural Information Processing
Systems, pages 1025–1035, 2017. 14
[18] Mehrtash Harandi, Mathieu Salzmann, and Richard Hartley. Dimensionality reduction on
SPD manifolds: The emergence of geometry-aware methods. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 40(1):48–62, 2018. 3
[19] Trevor Hastie and Werner Stuetzle. Principal curves. Journal of the American Statistical
Association, 84(406):502–516, 1989. 3
[20] Søren Hauberg. Principal curves on riemannian manifolds. IEEE transactions on pattern
analysis and machine intelligence, 38(9):1915–1921, 2015. 3
[21] Sigurdur Helgason. Differential geometry, Lie groups, and symmetric spaces. Academic Press,
1979. 9
[22] Stephan Huckemann, Thomas Hotz, and Axel Munk. Intrinsic shape analysis: Geodesic pca
for riemannian manifolds modulo isometric lie group actions. Statistica Sinica, pages 1–58,
2010. 3
[23] Ian T Jolliffe and Jorge Cadima. Principal component analysis: a review and recent de-
velopments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and
Engineering Sciences, 374(2065):20150202, 2016. 2
[24] Sungkyu Jung, Ian L Dryden, and James Stephen Marron. Analysis of principal nested spheres.
Biometrika, 99(3):551–568, 2012. 3, 5
[25] Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lem-
pitsky. Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition, pages 6418–6428, 2020. 4
[26] Thomas N. Kipf and Max Welling. Semi-Supervised Classification with Graph Convolutional
Networks. In International Conference on Learning Representations, 2017. 14
[27] Niklas Koep and Sebastian Weichwald. Pymanopt: A python toolbox for optimization on
manifolds using automatic differentiation. Journal of Machine Learning Research, 17:1–5,
2016. 12
[28] Marc Law, Renjie Liao, Jake Snell, and Richard Zemel. Lorentzian distance learning for
hyperbolic representations. In International Conference on Machine Learning, pages 3672–
3681. PMLR, 2019. 12
[29] Mario Lezcano-Casado. Trivializations for gradient-based optimization on manifolds. In Ad-
vances in Neural Information Processing Systems, NeurIPS, pages 9154–9164, 2019. 12
[30] Qi Liu, Maximilian Nickel, and Douwe Kiela. Hyperbolic graph neural networks. Advances in
Neural Information Processing Systems, 32:8230–8241, 2019. 4
[31] Emile Mathieu, Charline Le Lan, Chris J Maddison, Ryota Tomioka, and Yee Whye Teh.
Continuous hierarchical representations with Poincaré variational auto-encoders. Advances in
Neural Information Processing Systems, pages 12544–12555, 2019. 4, 13, 15
[32] Valter Moretti. The interplay of the polar decomposition theorem and the lorentz group. arXiv
preprint math-ph/0211047, 2002. 7
[33] Galileo Namata, Ben London, Lise Getoor, Bert Huang, and UMD EDU. Query-driven active
surveying for collective classification. In 10th International Workshop on Mining and Learning
with Graphs, volume 8, page 1, 2012. 15
[34] Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical represen-
tations. Advances in neural information processing systems, 30:6338–6347, 2017. 4
[35] Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the lorentz model
of hyperbolic geometry. In International Conference on Machine Learning, pages 3779–3788.
PMLR, 2018. 3
[36] Jiwoong Park, Junho Cho, Hyung Jin Chang, and Jin Young Choi. Unsupervised hyperbolic
representation learning via message passing auto-encoders. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 5516–5526, 2021. 4
[37] Xavier Pennec. Barycentric subspace analysis on manifolds. The Annals of Statistics,
46(6A):2711–2746, 2018. 3
[38] John G Ratcliffe. Foundations of Hyperbolic Manifolds, volume 149. Springer, 2 edition, 2006.
6, 12
[39] Frederic Sala, Chris De Sa, Albert Gu, and Christopher Ré. Representation tradeoffs for
hyperbolic embeddings. In International conference on machine learning, pages 4460–4469.
PMLR, 2018. 4, 13
[40] Rik Sarkar. Low distortion delaunay embedding of trees in hyperbolic plane. In International
Symposium on Graph Drawing, pages 355–366. Springer, 2011. 4
[41] Hiroyuki Sato, Hiroyuki Kasai, and Bamdev Mishra. Riemannian stochastic variance reduced
gradient algorithm with retraction and vector transport. SIAM Journal on Optimization,
29(2):1444–1472, 2019. 15
[42] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-
Rad. Collective classification in network data. AI magazine, 29(3):93–93, 2008. 15
[43] Ryohei Shimizu, YUSUKE Mukuta, and Tatsuya Harada. Hyperbolic neural networks++. In
International Conference on Learning Representations, 2020. 4
[44] Stefan Sommer, François Lauze, Søren Hauberg, and Mads Nielsen. Manifold valued statis-
tics, exact principal geodesic analysis and the effect of linear approximations. In European
conference on computer vision, pages 43–56. Springer, 2010. 3
[45] Alexandru Tifrea, Gary Becigneul, and Octavian-Eugen Ganea. Poincare glove: Hyperbolic
word embeddings. In International Conference on Learning Representations, 2018. 4
[46] Abraham A Ungar. Gyrovector spaces and their differential geometry. Nonlinear Funct. Anal.
Appl, 10(5):791–834, 2005. 4
[47] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and
Yoshua Bengio. Graph attention networks. In International Conference on Learning Repre-
sentations, 2018. 14
[48] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger.
Simplifying graph convolutional networks. In International conference on machine learning,
pages 6861–6871. PMLR, 2019. 14
[49] Chun-Hao Yang and Baba C Vemuri. Nested grassmanns for dimensionality reduction with
applications to shape analysis. In International Conference on Information Processing in
Medical Imaging, pages 136–149. Springer, 2021. 3, 5
[50] Miaomiao Zhang and Tom Fletcher. Probabilistic principal geodesic analysis. Advances in
Neural Information Processing Systems, 26:1178–1186, 2013. 3
[51] Yiding Zhang, Xiao Wang, Chuan Shi, Nian Liu, and Guojie Song. Lorentzian graph convo-
lutional networks. In Proceedings of the Web Conference 2021, pages 1249–1261, 2021. 11,
14