Path Signatures On Lie Groups
Abstract
Path signatures are powerful nonparametric tools for time series analysis, shown to form
a universal and characteristic feature map for Euclidean valued time series data. We lift
the theory of path signatures to the setting of Lie group valued time series, adapting these
tools for time series with underlying geometric constraints. We prove that this generalized
path signature is universal and characteristic. To demonstrate universality, we analyze
the human action recognition problem in computer vision, using SO(3) representations for
the time series, providing comparable performance to other shallow learning approaches,
while offering an easily interpretable feature set. We also provide a two-sample hypothesis
test for Lie group-valued random walks to illustrate its characteristic property. Finally, we
provide algorithms and a Julia implementation of these methods.
1. Introduction
Time series data is ubiquitous in modern data science, and may take values in a variety of
forms. Perhaps the most common is a collection of simultaneous multivariate real-valued
time series $\{\gamma^i\}_{i=1}^N$, where $\gamma^i : [0, 1] \to \mathbb{R}$. In this case, we may consider the entire collection
$\gamma = (\gamma^1, \ldots, \gamma^N)$ as a path through Euclidean space, $\gamma : [0, 1] \to \mathbb{R}^N$. The path signature
is a feature set that completely characterizes such paths, and has recently been applied to
several tasks in machine learning (Chevyrev and Kormilitzin, 2016; Lyons, 2014). Recent
work has provided the path signature with strong theoretical properties; namely that it is
a universal and characteristic kernel for time series in Euclidean space $\mathbb{R}^N$ (Chevyrev and
Oberhauser, 2018).
However, in many scenarios, the data may have some geometric constraints, and may
be better represented by elements of a (non-Euclidean) manifold. In this case, the
time-varying data can be modelled as a path on such a manifold, rather than on Euclidean space.
Lie groups are smooth manifolds equipped with a compatible group structure. Paths (or
time series) valued in Lie groups model a number of natural phenomena, including the
following.
• The special Euclidean group SE(n) is the Lie group of all rigid body motions in $\mathbb{R}^n$.
The group SE(3) is often used to model the position and pose of a rigid body, such as
Darrick Lee and Robert Ghrist
• The special orthogonal group SO(n) is the Lie group of all rotations in $\mathbb{R}^n$; this is a Lie
subgroup of SE(n). The Lie group $SO(3)^k$ has recently been used to represent the
pose of a human by recording the relative rotations of k pairs of body parts (Vemulapalli
and Chellappa, 2016). Thus, human movement can be represented as a path in
$SO(3)^k$. This representation has been used in the computer vision problem of human
action recognition, and Lie group methods have achieved state-of-the-art results in
this domain (Huang et al., 2017).
• The state of an oscillator may be described as an element of the circle $S^1$, and the collective
behavior of a network of oscillators can be described by an element of the n-torus,
$T^n = (S^1)^n$. The time evolution of oscillator networks can therefore be modelled as a
path on $T^n$ (Strogatz, 2000).
• The Euclidean space $\mathbb{R}^N$ is the simplest example of a Lie group, where the group
operation is addition. The classical path signature for Euclidean space can be viewed
as a special case of path signatures on Lie groups.
In this paper, we extend path signatures to time series valued in Lie groups, and show
that this extension is also a universal and characteristic kernel.
1.1 Contributions
We lift the theory of path signatures for time series valued in Euclidean space to the setting
of time series valued in Lie groups, restricting ourselves to the class of piecewise regular
paths on Lie groups.
Definition 1 Let G be a Lie group. A path $\gamma : [a, b] \to G$ is regular if $\gamma'_t$ is continuous and
nonvanishing on the entire interval [a, b]. Such a path is piecewise regular if there exists a
partition $a = t_0 < t_1 < \ldots < t_n = b$ such that γ is regular on each open subinterval $(t_i, t_{i+1})$
for all i. The pathspace, that is, the space of all piecewise regular paths on the unit interval
$\gamma : [0, 1] \to G$, will be denoted $PG$.
Let G be a Lie group of dimension N, and let g be its Lie algebra (the tangent space at
the identity). We denote the underlying vector space of g by $\bar{\mathfrak{g}} \cong \mathbb{R}^N$. The path signature
is a function on paths,
$$S : PG \to T((\bar{\mathfrak{g}})),$$
valued in a formal power series of tensors, $T((\bar{\mathfrak{g}}))$, where we may view the coefficients as
descriptors (or features) of the underlying path (or time series). Path signatures for general
manifolds were originally defined by Chen (1958), but not in a manner conducive to data
analysis. This paper gives a computationally clean derivation for path signatures on Lie
groups tuned for use in data analysis.
Our generalization is designed to be analogous to the Euclidean case as much as possible,
for ease of applicability. For example, the definition of the path signature for γ : [0, 1] → G
depends only on the derivative $\gamma' : [0, 1] \to \mathfrak{g}$. We exploit one of the key properties of Lie
groups — that tangent vectors at a point correspond to elements of its Lie algebra g, a
vector space. This will permit a signature construction making use of iterated integrals as
per the Euclidean case.
In the Euclidean case $G = \mathbb{R}^N$, the Lie group is often conflated with its Lie algebra
$\mathfrak{r} = \mathbb{R}^N$, and the fact that the integration is performed in the Lie algebra is often not
made explicit. By clarifying and emphasizing this point, the generalization to Lie groups
illuminates the classical Euclidean case.
From a machine learning perspective, the basic properties of the path signature as a
feature map provide several benefits.
• The signature is a feature set for a path as a whole, and can be used to compare time
series with varying numbers of time points.
• Defined as iterated line integrals, the path signature is invariant under reparametriza-
tion, and thus only depends on the order in which events occur.
• The signature is left translation invariant, meaning the signatures of paths that differ
by a constant element g ∈ G will be the same. This implies that the signature only
depends on the dynamics of the time series and is unconcerned with the initial point.
However, the most crucial property is that the path signature fully characterizes paths
up to tree-like equivalence; that is, the map S is injective, up to quotienting P G out by
an equivalence relation. This fact is originally due to Chen (1958) for the case of piecewise
regular paths on Lie groups, and was later generalized by Hambly and Lyons (2010) to the case
of bounded variation paths in $\mathbb{R}^n$.
Our main contribution is to apply this injectivity result to prove that a normalized
variant of the signature, S : P G → T ((g)), is a universal and characteristic feature map
for time series in G, when we equip T ((g)) with the structure of a Hilbert space. This is
proved in Section 4.2. This was originally shown for the Euclidean case by Chevyrev and
Oberhauser (2018). Such feature maps can be applied to two large classes of machine learning
problems, in the context of kernel methods.
where M(P G) denotes all finite regular Borel measures on P G, and S is appropriately
normalized. This allows us to consider probability measures as elements of a linear
space; furthermore, the norm induced by the Hilbert space structure coincides with
the maximum mean discrepancy (MMD) between measures.
We perform two experiments that demonstrate the efficacy of the path signature for
these two classes of problems. First, we consider the computer vision problem of human
action recognition in Section 5.1. We show that the path signature method is much easier to
use than shallow learning methods previously applied to this problem (Vemulapalli et al.,
2014; Vemulapalli and Chellappa, 2016) while providing comparable results. Second, in
Section 5.2, we consider a hypothesis testing problem for simulated random walks on the
Lie group SO(3). Here, we show that the Lie group valued path signature vastly outperforms
the Euclidean path signature.
Along the way, we will establish extensions of other properties of the path signatures to
Lie groups and discuss several concepts related to path signatures and data analysis on Lie
groups more broadly. A summary of these contributions is given below.
1. We provide a detailed exposition of Lie group valued time series, and discuss a notion
of scaling for such time series in Section 2.2. Scaling of data is sometimes required
when the data needs to be normalized, and we discuss how scaling affects the path
signature in Section 3.1. We also discuss the continuous interpretation of discrete time
series on Lie groups in Section 2.3.
3. It is well known that the Euclidean path signature is equivariant with respect to linear
transformations (Friz and Victoir, 2010). We show that path signatures are equivariant
under Lie group homomorphisms in general. Namely, given a homomorphism of
Lie groups $F : G_1 \to G_2$, where $\mathfrak{g}_1$ and $\mathfrak{g}_2$ are the respective Lie algebras, we define
the action of this homomorphism on the tensor algebra $F_* : T((\mathfrak{g}_1)) \to T((\mathfrak{g}_2))$, and
show in Section 3.4 that
$$S(F\gamma) = F_* S(\gamma)$$
for all $\gamma \in PG_1$.
4. An important feature of the path signature is the interpretability of lower level sig-
nature terms. We discuss the extension of the lead-lag interpretation of second level
signature terms for Euclidean paths, as well as a topological interpretation of the first
level signature terms for abelian Lie groups in Section 3.6.
5. Path transformations, such as appending the time parameter or using a sliding win-
dow, are often used as a preprocessing step for Euclidean path signatures (Chevyrev
and Kormilitzin, 2016). We discuss these transformations in the context of breaking
reparametrization or left-translation invariance in Section 3.8. Empirical studies (Fer-
manian, 2019) have shown that the sliding window transformation (also called the
lead-lag transformation) provides good classification results, despite the lack of a the-
oretical explanation. We propose one explanation, which is that the sliding window
transformation breaks left-translation invariance, and we provide empirical evidence
in the experiments in Section 5.1.
6. We provide both algorithmic details and a Julia package for the computation of
path signatures valued in Lie groups, which can be found at https://github.com/ldarrick/PathSignatures.
For details, see Appendix A.
was shown to be a universal and characteristic feature map in Chevyrev and Oberhauser
(2018). This exploits the recently formalized duality between universal and characteristic
kernels in Simon-Gabriel and Schölkopf (2018).
It is well known that Euclidean path signatures are translation invariant, and we will
show that Lie group path signatures are left translation invariant. Diehl and Reizenstein
(2019) have considered the related problem of determining the Euclidean path signature
terms which are invariant under some matrix Lie group action.
We begin in Section 2 by reviewing basic facts on Lie groups and Lie algebras, and
provide an exposition on continuous and discrete time series on Lie groups. We then define
the path signature for Lie groups in Section 3, and discuss the bijection between P G and
$P\mathbb{R}^N$, the equivariance of the path signature, detecting lead-lag behavior in time series,
and path transformations. In Section 4, we provide a brief overview of kernel methods
and prove our main result, which shows that the path signature kernel is universal and
characteristic. Finally, in Section 5, we apply the path signature on Lie groups to a human
action classification problem and a hypothesis testing problem involving random walks on
SO(3).
1.3 Notation
Throughout this paper, we will denote the time parameter for a path γ : [0, 1] → G using
a subscript t, meaning $\gamma_t := \gamma(t)$. Derivatives are shown using the prime notation, as in
$\gamma'_t := \frac{d\gamma}{dt}(t)$. If we have a path in Euclidean space $\alpha : [0, 1] \to \mathbb{R}^N$, we will use superscripts
to represent the components, such as $\alpha = (\alpha^1, \alpha^2, \ldots, \alpha^N)$. If G is a Lie group, we will use
g to denote its Lie algebra and ḡ to denote the underlying vector space of g (forgetting the
Lie bracket structure).
Continuous paths will often be denoted using the lowercase Greek symbols α, β, γ, and
the space of all piecewise regular paths in G is denoted P G. For T ∈ N, we let [T ] =
{1, . . . , T } denote the finite set of integers up to T . Discrete time series will be distinguished
using the hat notation γ̂ : [T ] → G, and the space of all discrete time series in G will be
denoted P̂ G.
There are also several parameters that will be used consistently throughout the paper.
Unless otherwise specified, we reserve the following symbols for the given meaning.
• N is the dimension of the Lie group G that paths take values in;
• T + 1 is the length of a discrete time series (so that the discrete derivative will be of
length T );
for all $g', g \in G$. This implies that all left-invariant vector fields X are defined by their
value at the identity e ∈ G,
$$X(g) = (L_g)_* X(e),$$
and thus, we obtain a one-to-one correspondence between left-invariant vector fields and
the tangent space at the identity, which we denote by g := Te G. Vector fields act on smooth
functions f : G → R, and we define an operation of left-invariant vector fields X and Y by
$$[X, Y](f) = X(Y(f)) - Y(X(f)),$$
where [X, Y ] is also left-invariant. This provides g with the structure of a Lie algebra, where
the Lie bracket [·, ·] : g × g → g is a bilinear mapping such that for all X, Y, Z ∈ g
[X, Y ] = −[Y, X]
[X, [Y, Z]] + [Z, [X, Y ]] + [Y, [Z, X]] = 0.
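Both bracket axioms can be checked numerically in a concrete case. The sketch below assumes the matrix Lie algebra so(3) of 3×3 skew-symmetric matrices, for which the Lie bracket is the matrix commutator. The helper names (`matmul`, `bracket`, `skew`) are illustrative and are not taken from the Julia package accompanying the paper.

```python
def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def add(a, b):
    """Entrywise sum of two 3x3 matrices."""
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]


def neg(a):
    """Entrywise negation of a 3x3 matrix."""
    return [[-a[i][j] for j in range(3)] for i in range(3)]


def bracket(x, y):
    """Matrix commutator [x, y] = xy - yx (assumed model of the Lie bracket)."""
    return add(matmul(x, y), neg(matmul(y, x)))


def skew(a, b, c):
    """A generic element of so(3): a skew-symmetric 3x3 matrix."""
    return [[0, -a, b], [a, 0, -c], [-b, c, 0]]


# Integer entries keep every comparison below exact.
X, Y, Z = skew(1, 2, 3), skew(-4, 0, 5), skew(2, -1, 7)
ZERO = [[0] * 3 for _ in range(3)]

# Antisymmetry: [X, Y] = -[Y, X].
antisymmetric = bracket(X, Y) == neg(bracket(Y, X))

# Jacobi identity: [X, [Y, Z]] + [Z, [X, Y]] + [Y, [Z, X]] = 0.
jacobi_sum = add(add(bracket(X, bracket(Y, Z)), bracket(Z, bracket(X, Y))),
                 bracket(Y, bracket(Z, X)))
jacobi = jacobi_sum == ZERO

print(antisymmetric, jacobi)
```

The commutator of two skew-symmetric matrices is again skew-symmetric, so the bracket stays inside so(3) in this model.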
Similarly, left translation induces a map $L_{g'}^* : T^*_{g'g} G \to T^*_g G$ on cotangent spaces. A
1-form $\omega \in T^*G$ is called left-invariant if
for all $g, g' \in G$. Again, we obtain a correspondence between left-invariant 1-forms and the
cotangent space at the identity via the property
Thus, we may identify the left-invariant 1-forms by the dual of the Lie algebra, g∗ .
This exponential map provides a way to move between a Lie group and its Lie algebra.
Proposition 4 The exponential map exp : g → G is smooth and $d(\exp)_0 = \mathrm{id}$. Thus,
exp is a diffeomorphism between an open neighborhood of the origin 0 ∈ g and an open
neighborhood of the identity e ∈ G.
Thus, if elements are near the origin, we can define an inverse map.
Definition 5 Suppose U ⊂ g is a neighborhood of the origin such that the exponential map
is a diffeomorphism. Let V = exp(U ). The logarithm map on V is defined to be
We will denote the duals of these basis vectors by $\omega_i = e^i \in \mathfrak{g}^*$. For all matrix Lie
groups, the Lie exponential and logarithm are simply the matrix exponential and logarithm.
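Suppose θ ∈ R. The exponential map in these three basis directions gives us
$$\exp(\theta e_1) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
$$\exp(\theta e_2) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix},$$
$$\exp(\theta e_3) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}.$$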
These are exactly the rotation matrices about the z, y, and x axes respectively. Therefore, we
may think of the basis vectors ei of the Lie algebra as infinitesimal rotations in the respective
directions. In particular, given a path $\gamma \in P(SO(3))$, the value $\omega_i(\gamma'_t)$ corresponds to the
infinitesimal rotation of γ at time t in the direction of $e_i$. If we integrate this over the
domain of the path,
$$\int_0^1 \omega_i(\gamma'_t)\,dt,$$
we obtain the cumulative rotation of γ in the direction of ei over the unit interval. This
interpretation will be important to keep in mind when we define the path signature in Sec-
tion 3.
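The claim that the exponentials of the basis directions are the rotation matrices about the z, y, and x axes can be checked with a truncated power series. In this sketch the basis matrices `E1`, `E2`, `E3` are inferred as the derivatives of the three rotation matrices at θ = 0, and `mat_exp` is a simple series approximation written for illustration, not a library routine.

```python
import math

# Basis of so(3), inferred as d/dtheta at theta = 0 of the rotation matrices
# about the z, y and x axes (an assumption about the basis used in the text).
E1 = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
E2 = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
E3 = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * v for v in row] for row in a]


def mat_exp(a, terms=40):
    """Truncated power series sum_{k < terms} a^k / k!."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(3)]
                  for i in range(3)]
    return result


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def max_abs_diff(a, b):
    """Largest entrywise difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


theta = 0.7
errors = [max_abs_diff(mat_exp(scale(E, theta)), R(theta))
          for E, R in ((E1, rot_z), (E2, rot_y), (E3, rot_x))]
print(errors)  # each entry should be at the level of floating-point round-off
```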
Finally, we briefly discuss the Riemannian structure of Lie groups. Recall that a Riemannian
metric on a smooth manifold M is the assignment of an inner product $\langle \cdot, \cdot \rangle_p$ to
the tangent space $T_pM$ for every point $p \in M$, which varies smoothly. Specifically, this
means that if X, Y are smooth vector fields defined on a neighborhood of p, then the map
$p \mapsto \langle X_p, Y_p \rangle_p$ is smooth. On a Lie group, we often want a Riemannian metric that is
compatible with the algebraic structure of G. A Riemannian metric is left-invariant if
Namely, evaluating the inner product $\langle \cdot, \cdot \rangle_g$ simply corresponds to viewing the tangent
vectors as elements of the tangent space at the identity, and then evaluating the chosen inner product on g. We
will assume that all Riemannian metrics under discussion are left-invariant, and simply call
them Riemannian metrics.
Note that since the Riemannian metric is left invariant, this metric is also left invariant,
for all h ∈ G. The more familiar notion of length in the path signature literature is the
1-variation of a path.
Using the metric induced by the Riemannian metric, we may consider the 1-variation
length of paths in G. Under the piecewise regular hypothesis, these two lengths are equiv-
alent.
At this point, in the case of paths on Euclidean space, we may use the 1-variation to
define a metric on $P\mathbb{R}^N_0$, the space of paths which start at the origin. Given a Lie group
G with a left-invariant Riemannian metric, we could follow the same procedure to obtain
a metric space structure on $PG_e$. However, this is not the metric space structure on P G
that is the most compatible with the path signature. We will defer this discussion until
Section 3.3.
The path space $P\mathbb{R}^N$ is endowed with a vector space structure since $\mathbb{R}^N$ itself is a vector
space. Similarly, we can endow P G with a group structure by pointwise multiplication,
where the identity is the constant path at the identity, and the inverse to a path γ ∈ P G
is the pointwise inverse. However, we are missing a notion of scaling for paths in P G, and
such an operation is important to have in machine learning, since algorithms may require
normalization of data. Such a scaling is obtained by proving a correspondence between
paths in G and paths in g, and then transferring the scaling operation from g to G.
This is done by considering paths on G from the point of view of differential equations.
We have the following existence and uniqueness theorem for first order ordinary differential
equations. Let P g denote the space of piecewise continuous paths γ : [0, 1] → g which are
right continuous, meaning $\lim_{t \downarrow t_0} \gamma_t = \gamma_{t_0}$.
$$PG_g = \{\gamma \in PG : \gamma_0 = g\}.$$
Corollary 10 Suppose G is a Lie group and g its Lie algebra. The map $\Psi_g : P\mathfrak{g} \to PG_g$,
which takes $f \in P\mathfrak{g}$ to the solution of the ODE in Equation 2 with initial condition $\gamma_0 = g$,
is a bijection.
Proof Firstly, the map $\Psi_g$ is well defined by the existence and uniqueness theorem above.
The inverse to $\Psi_g$ can be defined by taking the derivative at every differentiable point.
Suppose $\gamma \in PG_g$, and let $d(\gamma) \subset [0, 1]$ denote the set of points at which γ is differentiable.
Note that $[0, 1] - d(\gamma)$ is a finite set since γ is piecewise regular. Now, define $\Psi_g^{-1}(\gamma)(t) = \gamma'_t$
for all $t \in d(\gamma)$, and at the nondifferentiable points by right continuity,
$$\Psi_g^{-1}(\gamma)(t) = \lim_{s \downarrow t} \gamma'_s.$$
We can view P g as a Lie algebra, with pointwise vector space operations, and pointwise
Lie bracket. Because the group structure of P G and the Lie algebra structure of P g are
defined pointwise, the map Ψg is compatible with Lie algebra morphisms induced by Lie
group morphisms. Namely, if F : G → H is a Lie group morphism, we obtain a group
homomorphism F : P G → P H by applying the map pointwise. Analogously, if F∗ : g → h
is the induced Lie algebra morphism, we obtain a Lie algebra morphism F∗ : P g → P h.
The following lemma is immediate since the group structure on P G and the Lie algebra
structure on P g are defined pointwise.
The map Ψg allows us to view paths on Lie groups as paths in a linear space, while
retaining all first order differential information. We can use the fact that many operations
for paths on $\mathbb{R}^N$ are defined via operations on the Lie algebra, and thus generalize these
operations to Lie groups.
For a path $\alpha \in P\mathbb{R}^N$ and λ ≥ 0, denote the vector space scaling operation as
$$(\lambda\alpha)_t := \lambda\alpha_t.$$
However, another way of viewing the scaling operation for paths that begin at the origin is
by scaling in the Lie algebra. Suppose λ ≥ 0, and denote the vector space scaling in a Lie
algebra g by cλ : g → g.
Lemma 12 Let $\alpha \in P\mathbb{R}^N_0$. Then
$$\lambda\alpha = \Psi_0 \circ c_\lambda \circ \Psi_0^{-1}(\alpha).$$
Definition 13 Suppose G is a Lie group and g its Lie algebra. Let γ ∈ P G and λ ≥ 0.
We define the Lie algebra scaling of γ by λ to be
$$\lambda \cdot \gamma := \Psi_{\gamma_0} \circ c_\lambda \circ \Psi_{\gamma_0}^{-1}(\gamma). \tag{3}$$
Remark 14 We highlight three important differences between vector space scaling for paths
in RN and Lie algebra scaling for paths in an arbitrary Lie group G, and provide a reason
for each.
1. Returning to the setting of paths in $\mathbb{R}^N$, the two notions of scaling differ slightly when
the path does not start at the origin. If we have $\alpha \in P\mathbb{R}^N$ such that $\alpha_0 = x$, then
$(\lambda\alpha)_0 = \lambda x$, while $(\lambda \cdot \alpha)_0 = x$. However, if we align the initial points, the paths
coincide,
$$(\lambda\alpha) - \lambda x = (\lambda \cdot \alpha) - x.$$
This difference is due to the fact that arbitrary Lie groups do not have a natural scaling
operation. However, if our Lie group were equipped with a suitable scaling operation,
as is the case for a Carnot group (Le Donne, 2017), then we would be able to define a scaling
operation that coincides with the vector space scaling in $P\mathbb{R}^N$.
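$$\gamma_t = \begin{cases} e^{2tX} & : t \in [0, \tfrac{1}{2}) \\ e^{X} e^{(2t-1)Y} & : t \in [\tfrac{1}{2}, 1]. \end{cases}$$
Here, we have $\gamma_1 = e^X e^Y$ and $(-1 \cdot \gamma)_1 = e^{-X} e^{-Y}$, which are not inverses in general.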
Thus, we see that the obstruction to this interpretation is the noncommutativity of
arbitrary Lie groups. However, in the setting of abelian Lie groups, such an interpre-
tation would hold.
3. By definition, the vector space scaling in $P\mathbb{R}^N$ obeys the distributive law: $\lambda(\alpha + \beta) = (\lambda\alpha) + (\lambda\beta)$
for $\alpha, \beta \in P\mathbb{R}^N$ and $\lambda \in \mathbb{R}$. In other words, the vector space scaling is a
pointwise Lie group homomorphism for $\mathbb{R}^N$. However, $c_\lambda : \mathfrak{g} \to \mathfrak{g}$ is not a morphism
of Lie algebras in general since $c_\lambda([X, Y]) = \lambda[X, Y] \neq \lambda^2[X, Y] = [c_\lambda X, c_\lambda Y]$. Thus,
it cannot be the induced map of an underlying Lie group homomorphism for G, so
the Lie algebra scaling for G is not distributive, $\lambda \cdot (\alpha\beta) \neq (\lambda \cdot \alpha)(\lambda \cdot \beta)$, in general.
In the case of an abelian Lie group H, the associated Lie algebra h is abelian so that
[X, Y ] = 0 for all X, Y ∈ h, and thus Lie algebra scaling can be viewed as a pointwise
Lie group morphism.
Due to these remarks, we must keep in mind that the scaling operation for paths in Lie
groups is not compatible with the algebraic structure of G.
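The noncommutativity obstruction in the remark above can be seen numerically in SO(3). The sketch below takes the two-piece path with derivative 2X on [0, 1/2) and 2Y on [1/2, 1], started at the identity, so that the Lie algebra scaling by λ has endpoint exp(λX) exp(λY). The particular X and Y, and the helpers `mat_exp` and `endpoint`, are assumptions made for illustration.

```python
def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * v for v in row] for row in a]


def mat_exp(a, terms=40):
    """Truncated power series sum_{k < terms} a^k / k!."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(3)]
                  for i in range(3)]
    return result


def deviation_from_identity(m):
    """Largest entrywise distance between m and the identity matrix."""
    return max(abs(m[i][j] - float(i == j))
               for i in range(3) for j in range(3))


def endpoint(x, y, lam):
    """Endpoint exp(lam x) exp(lam y) of the scaled two-piece path."""
    return matmul(mat_exp(scale(x, lam)), mat_exp(scale(y, lam)))


# Two skew-symmetric matrices that do not commute (illustrative choice).
X = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
Y = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]

# If scaling by -1 produced the group inverse, this product would be the identity.
noncommuting = deviation_from_identity(
    matmul(endpoint(X, Y, 1.0), endpoint(X, Y, -1.0)))

# When the two directions commute, the product is the identity up to round-off.
half_X = scale(X, 0.5)
commuting = deviation_from_identity(
    matmul(endpoint(X, half_X, 1.0), endpoint(X, half_X, -1.0)))

print(noncommuting, commuting)
```

The second case mimics the abelian setting, where scaling by −1 does yield the inverse of the endpoint.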
Remark 15 Here, we will assume that discrete time series are uniformly sampled at integer
times. This does not result in any loss of generality due to the reparametrization invariance
of the path signature, given in Proposition 22.
is addition, so we should interpret all of the products as sums. However, there are two
essential differences between the case of arbitrary Lie groups and Euclidean space.
Firstly, for an arbitrary Lie group G, the logarithm map is only defined in a neighborhood
of the identity. The two reasons the logarithm may not be defined in a larger neighborhood
are the loss of injectivity and the loss of surjectivity of the exponential map. On any
compact Lie group, the exponential map will not be injective at any point. In this case,
we can define the logarithm to be the value closest to the origin, but non-injectivity may
still occur. For example, the point antipodal to the identity in $S^1$ has no unique logarithm
since there are two paths of equal distance to the identity. However, if we perturb the
target point in either direction, there exists a unique shortest path. This implies that by
undersampling the underlying time series, we may infer incorrect information. The case of
$S^1$ is exactly the situation encountered in the Nyquist sampling theorem.
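The effect of undersampling on the circle can be made concrete. The sketch below represents elements of $S^1$ as angles modulo 2π and takes the logarithm closest to the origin by wrapping angle differences into [−π, π). The function names and the choice of a path winding three times are illustrative assumptions.

```python
import math


def log_circle(delta):
    """Logarithm closest to the origin on S^1: wrap an angle into [-pi, pi)."""
    return (delta + math.pi) % (2 * math.pi) - math.pi


def sample(turns, T):
    """T + 1 uniform samples of a path winding `turns` times, as angles mod 2 pi."""
    return [(2 * math.pi * turns * i / T) % (2 * math.pi)
            for i in range(T + 1)]


def estimated_turns(samples):
    """Sum of the discrete derivatives, expressed in units of full turns."""
    total = sum(log_circle(b - a) for a, b in zip(samples, samples[1:]))
    return total / (2 * math.pi)


fine = estimated_turns(sample(3, 20))   # steps of 0.3 pi, below the pi threshold
coarse = estimated_turns(sample(3, 4))  # steps of 1.5 pi, wrapped to -0.5 pi
print(fine, coarse)
```

With 20 steps the three turns are recovered, while with 4 steps each increment of 1.5π is read as −0.5π and the path appears to make one turn in the opposite direction.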
The exponential map is not always surjective, with the simplest examples being non-connected
Lie groups. However, connected Lie groups such as SL(2, R) can still have non-surjective
exponential maps. In these cases, discrete derivatives may not exist, and finer
sampling is required so that the difference between adjacent points $\tilde{\gamma}_i^{-1} \tilde{\gamma}_{i+1}$ is closer to the
identity and has a well-defined logarithm. However, for compact connected Lie groups such as SO(3),
the Lie exponential map is surjective.
Secondly, the interpolation defined here may not be a geodesic connecting the two points.
Suppose h is a Riemannian metric on G. In general, geodesics do not coincide with the
one-parameter subgroups of G. In other words, in these cases, the Riemannian exponential
map and the Lie exponential map are not the same. However, for bi-invariant metrics, they
coincide.
Theorem 16 The Lie exponential map and the Riemannian exponential map at the identity
agree on Lie groups with bi-invariant metrics.
Thus, for all Lie groups equipped with bi-invariant metrics, we may continue to interpret
the interpolation as a geodesic interpolation. In fact, this holds for all compact Lie groups.
From this discussion, we find that for a compact Lie group G, the interpretation of
discrete time series on G is similar to the case of $\mathbb{R}^N$, with the main difference being the
non-injectivity of the exponential map.
extends to the case of Lie groups. This result provides a Euclidean representation of Lie
group valued time series, and can thus be used to apply classical Euclidean data analysis
techniques to Lie group valued time series. To the authors’ knowledge, this result has not
previously appeared in the literature.
We then consider the extension of the equivariance property of path signatures. This
is followed by an interpretation of the second-level signature terms as indicators of lead-lag
behavior between the directions corresponding to our choice of basis vectors for the Lie
algebra g. Finally, we close this section by discussing computational aspects of the path
signature for discrete time series, as well as symmetry breaking path transformations which
can be used as a preprocessing step.
In this section, we use $(e_1, \ldots, e_N)$ to denote an ordered basis of g and use $(\omega_1, \ldots, \omega_N)$
to denote the dual basis of $\mathfrak{g}^*$ such that $\omega_i(e_j) = \delta_{i,j}$, where $\delta_{i,j}$ is the Kronecker delta.
We can also present the definition in a non-inductive way. Let ∆m be the standard
m-simplex
$$\Delta^m = \{(t_1, \ldots, t_m) : 0 < t_1 < \ldots < t_m < 1\}.$$
By collapsing the inductive definition, we can write the path signature of γ with respect to
$I = (i_1, \ldots, i_m)$ as
$$S^I(\gamma) = \int_{\Delta^m} \omega_{i_1}(\gamma'_{t_1}) \cdots \omega_{i_m}(\gamma'_{t_m}) \, dt_1 \cdots dt_m. \tag{6}$$
We can amalgamate the path signatures with respect to every multi-index I into an
element of a tensor algebra.
Definition 19 Suppose V is a real vector space of dimension N. The tensor algebra with
respect to V is defined to be
$$T((V)) = \prod_{m \geq 0} V^{\otimes m}.$$
• $(s + t)^I = s^I + t^I$,
• $(\lambda t)^I = \lambda t^I$,
Let ḡ be the underlying vector space of the Lie algebra g. Let $e_1, \ldots, e_N$ be a basis for
g. We define the path signature of $\gamma \in PG$ to be
$$S(\gamma) := 1 + \sum_{m \geq 1} \sum_{|I| = m} S^I(\gamma) \, e_{i_1} \otimes \cdots \otimes e_{i_m} \in T((\bar{\mathfrak{g}})). \tag{7}$$
Remark 20 For path signatures defined on Euclidean space $\mathbb{R}^N$, we often choose the standard
1-forms $(dx_1, \ldots, dx_N)$ to be the basis of $\mathfrak{r}^*$, the dual of the Lie algebra $\mathfrak{r}$ of $\mathbb{R}^N$. Suppose $\alpha \in P\mathbb{R}^N$.
We can also write our path component-wise as $\alpha = (\alpha^1, \ldots, \alpha^N)$, where each $\alpha^i : [0, 1] \to \mathbb{R}$.
Then, evaluation of the standard 1-forms is simply $dx_i(\alpha'_t) = (\alpha^i)'_t$. Thus, in the Euclidean
case, the definition of the path signature reduces to
$$S^I(\alpha) = \int_{\Delta^m} (\alpha^{i_1})'_{t_1} \cdots (\alpha^{i_m})'_{t_m} \, dt_1 \cdots dt_m. \tag{8}$$
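For a piecewise linear Euclidean path, the first two signature levels of Equation 8 can be computed in closed form from the increments of the path. The sketch below is a minimal illustration, not the algorithm of the accompanying Julia package: it uses the standard fact that a single linear segment with increment Δ contributes Δ at level one and Δ⊗Δ/2 at level two, and accumulates segments one at a time. The name `signature_level2` is hypothetical.

```python
def signature_level2(increments):
    """Levels 1 and 2 of the signature of a piecewise linear path in R^N.

    `increments` is a list of N-tuples, one per linear segment.
    Returns (s1, s2) with s1[i] = S^(i) and s2[i][j] = S^(i, j),
    using 0-based indices for the coordinates.
    """
    n = len(increments[0])
    s1 = [0.0] * n
    s2 = [[0.0] * n for _ in range(n)]
    for d in increments:
        for i in range(n):
            for j in range(n):
                # Earlier segments pair with this one in the order (i, j);
                # the segment itself contributes d_i d_j / 2.
                s2[i][j] += s1[i] * d[j] + 0.5 * d[i] * d[j]
        for i in range(n):
            s1[i] += d[i]
    return s1, s2


# A straight line from (0, 0) to (1, 2), split into two equal segments.
line_s1, line_s2 = signature_level2([(0.5, 1.0), (0.5, 1.0)])

# An L-shaped path: one unit along the first axis, then one unit along the second.
corner_s1, corner_s2 = signature_level2([(1.0, 0.0), (0.0, 1.0)])

print(line_s1, line_s2)
print(corner_s1, corner_s2)
```

For the straight line with total increment v, the second level is v⊗v/2 regardless of how the line is subdivided. For the L-shaped path, S^(1,2) = 1 while S^(2,1) = 0, which records that the first coordinate moves before the second.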
Proof It suffices to show that $S^I(g\gamma) = S^I(\gamma)$ for all multi-indices I. Note that we have
Specifically, this implies that $\gamma'_t$ and $g\gamma'_t$ are represented by the same element in the Lie
algebra g. Therefore for any $\omega \in \mathfrak{g}^*$, we have $\omega(g\gamma'_t) = \omega(\gamma'_t)$. Thus, $S^I(g\gamma) = S^I(\gamma)$ for all
I.
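For a discrete time series, the analogue of this argument is that the group increments $g_i^{-1} g_{i+1}$ are unchanged when every sample is multiplied on the left by a fixed element h. The sketch below checks this in SO(3); the sample series, the element `h`, and the helper names are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    """Transpose, which is the inverse for a rotation matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def increments(series):
    """Group increments g_i^{-1} g_{i+1} of a discrete series in SO(3)."""
    return [matmul(transpose(a), b) for a, b in zip(series, series[1:])]


# An arbitrary discrete time series in SO(3) and a fixed left translation h.
series = [matmul(rot_z(0.3 * i), rot_x(0.2 * i * i)) for i in range(5)]
h = matmul(rot_x(1.1), rot_z(-0.4))
translated = [matmul(h, g) for g in series]

# Largest entrywise difference between the two lists of increments.
gap = max(abs(p[i][j] - q[i][j])
          for p, q in zip(increments(series), increments(translated))
          for i in range(3) for j in range(3))
print(gap)
```

Since the signature is built only from this derivative information, equal increments give equal signatures.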
Proof This is the Change of Variables Theorem. Reparametrization invariance of the first
level of the signature is given as
$$S^i(\gamma \circ \phi) = \int_c^d \omega_i((\gamma \circ \phi)'_t)\,dt = \int_c^d \omega_i(\gamma'_{\phi_t})\,\phi'_t\,dt = \int_a^b \omega_i(\gamma'_\tau)\,d\tau = S^i(\gamma).$$
Invariance for higher order terms is shown by induction using the same argument.
In particular this proposition justifies our choice of only considering paths parametrized by
[0, 1], as any other path can be reparametrized into this domain. Next, we would like to
understand how scaling of paths in G given in Definition 13 affects the path signature. Note
that the vector space scaling in ḡ induces a dilation map in T ((ḡ)). Explicitly, we define
the map $\delta_\lambda : T((\bar{\mathfrak{g}})) \to T((\bar{\mathfrak{g}}))$ as
We have seen that the group structure on G allows us to define a group structure on P G
by pointwise multiplication. The group structure on G allows us to obtain another group
structure on a quotient of P G where the group operation is given by concatenation. Let
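α, β ∈ P G. The concatenation of α and β is defined to be
$$(\alpha * \beta)_t = \begin{cases} \alpha_{2t} & : t \in [0, \tfrac{1}{2}) \\ \alpha_1 (\beta_0)^{-1} \beta_{2t-1} & : t \in [\tfrac{1}{2}, 1]. \end{cases}$$
The inverse of a path γ is defined to be the same path, but in the reverse direction
$$(\gamma^{-1})_t = \gamma_{1-t}.$$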
[Figure: the path $\alpha * \zeta * \zeta^{-1} * \beta$ (left) and its reduction $\alpha * \beta$ (right).]
Theorem 25 (Chen (1958)) Every piecewise regular path γ ∈ P G has a unique irre-
ducible reduction up to reparametrization.
Proof Let $\gamma, \gamma_1, \gamma_2, \gamma_3 \in PG$. Note that the subscript here denotes distinct paths, and
does not denote the time parameter. By definition the reduction of $\gamma * \gamma^{-1}$ is the constant
path, so $\gamma \sim_t \gamma$.
Next, if $\gamma = \alpha * \zeta * \zeta^{-1} * \beta$, for paths $\alpha, \beta, \zeta \in PG$, then $\gamma^{-1} = \beta^{-1} * \zeta * \zeta^{-1} * \alpha^{-1}$.
Thus, a path is reducible if and only if its inverse is reducible. Additionally, the reduction
$\beta^{-1} * \alpha^{-1}$ of $\gamma^{-1}$ is the inverse of the reduction $\alpha * \beta$ of γ. Now, suppose $\gamma_1 \sim_t \gamma_2$ so that
$\gamma_1 * \gamma_2^{-1}$ is tree-like. By the above argument, $\gamma_2 * \gamma_1^{-1}$ is also tree-like, so $\gamma_2 \sim_t \gamma_1$.
Finally, the concatenation $\alpha * \beta$ of two tree-like paths is also tree-like, by performing
all the reductions of α and then performing all the reductions of β. Suppose $\gamma_1 \sim_t \gamma_2$ and
$\gamma_2 \sim_t \gamma_3$. Then, $\gamma_1 * \gamma_3^{-1}$ is a reduction of $(\gamma_1 * \gamma_2^{-1}) * (\gamma_2 * \gamma_3^{-1})$, and the latter path is
tree-like since it is a concatenation of two tree-like paths. By the uniqueness of irreducible
reductions, $\gamma_1 * \gamma_3^{-1}$ is tree-like. Thus, $\gamma_1 \sim_t \gamma_3$.
Theorem 30 (Chen (1958)) Suppose G is a real Lie group. Let α, β ∈ P G. Then S(α) =
S(β) if and only if α and β are tree-like equivalent.
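One direction of this theorem can be illustrated numerically in the Euclidean case: inserting an excursion $\zeta * \zeta^{-1}$ into a path should not change its signature. The sketch below compares only the first two signature levels of piecewise linear paths, so it is a partial check of a necessary condition rather than a test of the full statement. The helper `signature_level2` and the sample paths are illustrative assumptions.

```python
def signature_level2(increments):
    """Levels 1 and 2 of the signature of a piecewise linear path in R^N."""
    n = len(increments[0])
    s1 = [0.0] * n
    s2 = [[0.0] * n for _ in range(n)]
    for d in increments:
        for i in range(n):
            for j in range(n):
                s2[i][j] += s1[i] * d[j] + 0.5 * d[i] * d[j]
        for i in range(n):
            s1[i] += d[i]
    return s1, s2


def concatenate(*paths):
    """Concatenation of paths given by their lists of increments."""
    return [d for path in paths for d in path]


def inverse(path):
    """The same path traversed backwards: reversed, negated increments."""
    return [tuple(-x for x in d) for d in reversed(path)]


alpha = [(1.0, 0.0), (0.0, 1.0)]
zeta = [(2.0, -1.0), (0.5, 3.0)]
beta = [(-1.0, 1.0)]

with_excursion = signature_level2(concatenate(alpha, zeta, inverse(zeta), beta))
reduced = signature_level2(concatenate(alpha, beta))

level1_gap = max(abs(a - b) for a, b in zip(with_excursion[0], reduced[0]))
level2_gap = max(abs(a - b)
                 for row_a, row_b in zip(with_excursion[1], reduced[1])
                 for a, b in zip(row_a, row_b))
print(level1_gap, level2_gap)
```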
Chen also showed that the signature is a group homomorphism. Namely, suppose α, β ∈
P G. Chen’s identity (Chen, 1954) states that
$$S(\alpha * \beta) = S(\alpha) \otimes S(\beta).$$
We will also require an internal multiplicative structure on the path signature coefficients
which is an immediate generalization of the Euclidean path signature.
and
We denote by Sh(k, l) the set of (k, l)-shuffles. Given two finite ordered multi-indices $I = (i_1, \ldots, i_k)$
and $J = (j_1, \ldots, j_l)$, let $R = (r_1, \ldots, r_k, r_{k+1}, \ldots, r_{k+l}) = (i_1, \ldots, i_k, j_1, \ldots, j_l)$
be the concatenated multi-index. The shuffle product of I and J is defined to be the multiset
$$I \sqcup\!\sqcup J = \{ (r_{\sigma(1)}, \ldots, r_{\sigma(k+l)}) : \sigma \in \mathrm{Sh}(k, l) \}.$$
$$I \sqcup\!\sqcup J = \{(1, 2, 2, 3), (1, 2, 2, 3), (2, 1, 2, 3), (1, 2, 3, 2), (2, 1, 3, 2), (2, 3, 1, 2)\}.$$
Theorem 33 Let I and J be multi-indices in [N], of lengths k and l respectively, and
suppose $\gamma \in PG$. Then
$$S^I(\gamma) S^J(\gamma) = \sum_{K \in I \sqcup\!\sqcup J} S^K(\gamma). \tag{11}$$
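The shuffle product and the lowest-order case of this identity can be checked directly. The sketch below enumerates shuffles by choosing which positions of the output word are filled from I, and then verifies $S^{(1)} S^{(2)} = S^{(1,2)} + S^{(2,1)}$ for a piecewise linear Euclidean path. The six-element multiset displayed earlier is reproduced by I = (1, 2) and J = (2, 3); that choice is inferred from the multiset itself, and the helper names are hypothetical.

```python
from itertools import combinations


def shuffle_product(I, J):
    """All interleavings of I and J that preserve the order within each,
    returned as a list so that repeated words are kept (a multiset)."""
    k, l = len(I), len(J)
    words = []
    for positions in combinations(range(k + l), k):
        chosen = set(positions)
        it_i, it_j = iter(I), iter(J)
        words.append(tuple(next(it_i) if p in chosen else next(it_j)
                           for p in range(k + l)))
    return words


def signature_level2(increments):
    """Levels 1 and 2 of the signature of a piecewise linear path in R^N."""
    n = len(increments[0])
    s1 = [0.0] * n
    s2 = [[0.0] * n for _ in range(n)]
    for d in increments:
        for i in range(n):
            for j in range(n):
                s2[i][j] += s1[i] * d[j] + 0.5 * d[i] * d[j]
        for i in range(n):
            s1[i] += d[i]
    return s1, s2


example = shuffle_product((1, 2), (2, 3))

# Shuffle identity for I = (1) and J = (2), with 1-based multi-indices.
s1, s2 = signature_level2([(1.0, 2.0), (-0.5, 0.25), (3.0, -1.0)])
lhs = s1[0] * s1[1]
rhs = sum(s2[a - 1][b - 1] for a, b in shuffle_product((1,), (2,)))
print(sorted(example))
print(lhs, rhs)
```

The number of words produced is the binomial coefficient (k + l choose k), which equals 6 in the example.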
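Proof Let $R = (r_1, \ldots, r_k, r_{k+1}, \ldots, r_{k+l}) = (i_1, \ldots, i_k, j_1, \ldots, j_l)$. Writing out the signature
on the left side of the equation using Equation 6, we get
$$\int_{\Delta^k} \omega_{i_1}(\gamma'_{t_1}) \cdots \omega_{i_k}(\gamma'_{t_k}) \, dt_1 \cdots dt_k \int_{\Delta^l} \omega_{j_1}(\gamma'_{t_1}) \cdots \omega_{j_l}(\gamma'_{t_l}) \, dt_1 \cdots dt_l$$
$$\begin{aligned} \Delta^k \times \Delta^l &= \{(t_1, \ldots, t_{k+l}) : 0 < t_1 < \ldots < t_k < 1, \ 0 < t_{k+1} < \ldots < t_{k+l} < 1\} \\ &= \bigsqcup_{\sigma \in \mathrm{Sh}(k,l)} \{ (t_{\sigma(1)}, \ldots, t_{\sigma(k+l)}) : 0 < t_1 < \ldots < t_{k+l} < 1 \}. \end{aligned}$$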
denote the standard 1-forms of RN , and define ωi = (φ∗ )−1 (dxi ). Let SG : P G → T ((RN ))
be the path signature map for G with respect to the ordered basis (ω1 , . . . , ωN ) of g∗ . Then,
there exists a bijection $\Phi : P\mathbb{R}^N_0 \to PG_e$ such that $S_{\mathbb{R}}(\gamma) = S_G(\Phi(\gamma))$ for all $\gamma \in P\mathbb{R}^N_0$.
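Proof The construction of the map Φ is derived from Corollary 10. Let $\Psi_{\mathbb{R}} : P\mathfrak{r} \to P\mathbb{R}^N_0$
and $\Psi_G : P\mathfrak{g} \to PG_e$ be the bijections from Corollary 10 for RN and G respectively. Now,
define Φ by
\[
\Phi : P\mathbb{R}^N_0 \xrightarrow{\ \Psi_{\mathbb{R}}^{-1}\ } P\mathfrak{r} \xrightarrow{\ \phi\ } P\mathfrak{g} \xrightarrow{\ \Psi_G\ } PG_e .
\]
The idea is that we start with a path $\gamma \in P\mathbb{R}^N_0$, and apply the following maps:
1. $\Psi_{\mathbb{R}}^{-1}$: take the derivative $\gamma'$ to obtain a path in $\mathfrak{r}$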
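Because all three maps are bijective, Φ is also bijective. To show that the signatures
are invariant under this mapping, let $I = (i_1, \ldots, i_k)$ and $\gamma \in P\mathbb{R}^N_0$. The path signature of
γ with respect to I is
\[
S_{\mathbb{R}}^I(\gamma) = \int_{\Delta^k} dx_{i_1}(\gamma'_{t_1}) \cdots dx_{i_k}(\gamma'_{t_k}) \, dt_1 \cdots dt_k .
\]
Note that the derivative of Φ(γ) is given by $\Phi(\gamma)'_t = \phi(\gamma'_t)$ and thus, the path signature of
Φ(γ) with respect to I is
\[
\begin{aligned}
S_G^I(\Phi(\gamma)) &= \int_{\Delta^k} \omega_{i_1}(\phi(\gamma'_{t_1})) \cdots \omega_{i_k}(\phi(\gamma'_{t_k})) \, dt_1 \cdots dt_k \\
&= \int_{\Delta^k} \phi^*(\omega_{i_1})(\gamma'_{t_1}) \cdots \phi^*(\omega_{i_k})(\gamma'_{t_k}) \, dt_1 \cdots dt_k \\
&= S_{\mathbb{R}}^I(\gamma).
\end{aligned}
\]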
The final equality holds because the dual isomorphism φ∗ takes ωi to dxi . Thus,
$S_{\mathbb{R}}^I(\gamma) = S_G^I(\Phi(\gamma))$ for all $\gamma \in P\mathbb{R}^N_0$ and all multi-indices I.
where SM only retains information about the first M levels of the path signature. In
addition, we define the projection map
πm : T ((ḡ)) → ḡ⊗m
to a particular tensor level. Such a map can also be defined on the truncated tensor algebra
πm : T ≤M (ḡ) → ḡ⊗m , and we denote all such maps in the same manner.
By stability of the path signature, we mean to say that the truncated signature map
SM : P G → T ≤M (ḡ) is Lipschitz continuous. In order to discuss such a notion, we must
provide both P G and T ≤M (ḡ) with metrics. We begin with the metric on T ((ḡ)), which is
required in Section 4.2 and is analogous to the metric on T ≤M (ḡ).
Recall that a basis (e1 , . . . , eN ) of ḡ induces a natural inner product on ḡ by defining
the basis to be orthonormal. This extends to an inner product structure on ḡ⊗m , and given
$t_m \in \bar{\mathfrak{g}}^{\otimes m}$, we will denote the norm by $\|t_m\|_m$. In addition, this also extends to an inner
product on T ≤M (ḡ). Let s, t ∈ T ≤M (ḡ). Such an inner product and norm are defined to be
\[
\langle s, t \rangle = \sum_{m=0}^{M} \sum_{|I|=m} s^I t^I, \qquad
\|t\| = \sqrt{\sum_{m=0}^{M} \sum_{|I|=m} (t^I)^2}. \tag{12}
\]
Then, we can use the norm to define a metric on both ḡ⊗m and T ≤M (ḡ). Namely, given
sm , tm ∈ ḡ⊗m and s, t ∈ T ≤M (ḡ), we have
\[
d_m(s_m, t_m) = \|s_m - t_m\|_m, \qquad d(s, t) = \|s - t\|.
\]
Note that this norm on the tensor algebra extends to T ((ḡ)), where the inner product and
norm for s, t ∈ T ((ḡ)) are defined as in (12) with M → ∞. In this case, the inner product
and norm may be infinite. However, the image of the path signature lies in a subalgebra of
T ((ḡ)) where the norm is finite. Namely, we define
Proof Without loss of generality, we suppose that γ is parametrized by length such that
it is defined as γ : [0, L] → G, where L is the length, and $\|\gamma'_t\| = 1$ for all t at which γ is
differentiable; this assumption is valid due to the reparametrization invariance of the
signature. We will inductively bound each signature term. At the first level, we have
\[
|S^i(\gamma)(t)| \le \int_0^t |\omega_i(\gamma'_s)| \, ds \le t,
\]
using the fact that $|\omega_i(\gamma'_t)| \le \|\gamma'_t\| = 1$. Assume that for any multi-index $I = (i_1, \ldots, i_{m-1})$
of length m − 1, we have
\[
|S^I(\gamma)(t)| \le \frac{t^{m-1}}{(m-1)!}.
\]
Now consider the multi-index $I = (i_1, \ldots, i_m)$ of length m. Using the induction hypothesis,
and the recursive definition of the signature, we have
\[
|S^I(\gamma)(t)| \le \int_0^t |S^{(i_1, \ldots, i_{m-1})}(\gamma)(s)| \, |\omega_{i_m}(\gamma'_s)| \, ds
\le \int_0^t \frac{s^{m-1}}{(m-1)!} \, ds = \frac{t^m}{m!}.
\]
where the last inequality uses the fact that there are $N^m$ multi-indices of length m.
The main reason for this is that the computation of |β −1 α|1−var depends fundamentally
on the adjoint action of the Lie group on the Lie algebra, which is governed by the Lie
bracket. Namely, the adjoint action is trivial if and only if the Lie bracket is zero. However,
the path signature ignores the Lie bracket structure, so the prospect of Lipschitz continuity
of the signature with respect to this metric is problematic.
We therefore consider a different metric. Note that our path signature computations
have consistently been performed on the underlying vector space of the Lie algebra ḡ, so it
seems natural to directly define a metric using the derivatives α′, β′ ∈ P̂ g. One such notion
of a distance would be the L1 distance between these derivatives
\[
\|\alpha' - \beta'\|_{L^1} = \int_0^1 \|\alpha'_t - \beta'_t\|_{\mathfrak{g}} \, dt
\]
which in particular does not use the Lie bracket structure. In fact this L1 distance is exactly
the 1-variation of the corresponding paths $\Phi^{-1}(\alpha), \Phi^{-1}(\beta) \in P\mathbb{R}^N_0$, given by the bijection
in Proposition 34. Thus, we can define the metric on P Ge to be
Note that equipped with this metric, the map Φ is trivially an isometry.
is equipped with the 1-variation metric, and P Ge is equipped with the metric dR . Then, Φ
is an isometry.
Using this isometry, stability for Lie group path signatures is a direct corollary of stability
for Euclidean path signatures.
L ≥ max{|α|1−var , |β|1−var }.
between the underlying vector spaces. Because linear transformations induce maps on tensor
products of the space $F_*^{\otimes m} : \bar{\mathfrak{g}}_1^{\otimes m} \to \bar{\mathfrak{g}}_2^{\otimes m}$, we also get an induced map of algebras between
tensor algebras
If (e1 , . . . , eN1 ) is an ordered basis for g1 and (f1 , . . . , fN2 ) is an ordered basis for g2 , then
we can write F∗ : g1 → g2 as an N2 × N1 matrix in terms of these bases, which we call M .
We can describe the action of F∗ in the tensor algebra using this matrix. Let t ∈ T ((ḡ1 )).
In general, the action on the order m elements tm ∈ ḡ⊗m is a tensor-matrix multiplication,
as described in Pfeffer et al. (2019), in which all m sides of the tensor tm are multiplied by
the matrix M . This can be written out as
\[
F_* t = \sum_{m=0}^{\infty} \sum_{|I|=m} t^I \, (M e_{i_1}) \otimes (M e_{i_2}) \otimes \ldots \otimes (M e_{i_m}).
\]
The low order tensors can be written out in usual matrix notation. Consider t1 as a
column vector. The action on first order elements is matrix multiplication,
\[
(F_* t)_1 = M t_1 .
\]
Viewing t2 as an N1 × N1 matrix, the action on second order elements is
\[
(F_* t)_2 = M t_2 M^{\top} .
\]
For higher orders, we can no longer use matrix notation, so we explicitly define the
action for a given index. Let J = (j1 , . . . , jn ) be a multi-index where jk ∈ [N2 ]. Then, the
element of F∗ t corresponding to the multi-index J is
\[
(F_* t)^J = \sum_{i_1=1}^{N_1} \sum_{i_2=1}^{N_1} \cdots \sum_{i_n=1}^{N_1} t^{(i_1, \ldots, i_n)} \, M_{j_1, i_1} M_{j_2, i_2} \cdots M_{j_n, i_n} .
\]
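The index formula above translates directly into code. The following Python sketch is an illustration under our own conventions, not the paper's implementation: an order n tensor is stored as a dictionary from index tuples to floats, indices are 0-based, and the name push_forward_level is hypothetical.

```python
from itertools import product


def push_forward_level(t, M):
    """Apply the matrix M to every slot of an order-n tensor t.

    t maps index tuples (i_1, ..., i_n), with entries in range(N1), to floats
    and is assumed to be nonempty; M is an N2 x N1 matrix given as a list of rows.
    """
    n = len(next(iter(t)))  # order of the tensor, read off from any key
    N2 = len(M)
    result = {}
    for J in product(range(N2), repeat=n):
        total = 0.0
        for I, value in t.items():
            coeff = value
            # Multiply by M[j_1][i_1] * ... * M[j_n][i_n].
            for j, i in zip(J, I):
                coeff *= M[j][i]
            total += coeff
        result[J] = total
    return result


# Order 2 example: the result agrees with the matrix product M t_2 M^T.
M = [[1.0, 2.0], [0.0, 1.0]]
t2 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
print(push_forward_level(t2, M))
```

This direct evaluation costs on the order of (N1 N2)^n operations per level; contracting one slot at a time would be cheaper, but the version above mirrors the formula term by term.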
Proposition 39 Let G1 and G2 be Lie groups, with Lie algebras g1 and g2 respectively.
Suppose F : G1 → G2 is a Lie group morphism and γ ∈ P (G1 ). Then
S(F γ) = F∗ S(γ).
Proof The proof of this claim is simply due to the linearity of integrals and 1-forms.
Consider the multi-index J = (j1 , . . . , jm ). Then,
\[
S^J(F\gamma) = \int_{\Delta^m} \nu_{j_1}(F_* \gamma'_{t_1}) \cdots \nu_{j_m}(F_* \gamma'_{t_m}) \, dt_1 \cdots dt_m .
\]
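Consider a single factor in the integrand. Using the basis (e1 , . . . , eN1 ) for g1 , write the
derivative γ′ as
\[
\gamma'_t = \sum_{i=1}^{N_1} c^i_t e_i
\]
where ci : [0, 1] → R are the component paths. Then, since $\nu_j(F_* \gamma'_t)$ denotes the j-th
component of $F_* \gamma'_t$, we can write this as
\[
\nu_j(F_* \gamma'_t) = \sum_{i=1}^{N_1} M_{j,i} \, \omega_i(\gamma'_t).
\]
Substituting this into each factor of the integrand and using the linearity of the integral,
we obtain
\[
\begin{aligned}
S^J(F\gamma) &= \sum_{i_1=1}^{N_1} \cdots \sum_{i_m=1}^{N_1} (M_{j_1,i_1} \cdots M_{j_m,i_m}) \int_{\Delta^m} \omega_{i_1}(\gamma'_{t_1}) \cdots \omega_{i_m}(\gamma'_{t_m}) \, dt_1 \cdots dt_m \\
&= \sum_{i_1=1}^{N_1} \cdots \sum_{i_m=1}^{N_1} (M_{j_1,i_1} \cdots M_{j_m,i_m}) \, S^{(i_1, \ldots, i_m)}(\gamma) \\
&= (F_* S(\gamma))^J .
\end{aligned}
\]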
with φ monotone and winding around the circle at least twice, the winding condition en-
forcing nontrivial repetition.
Consider the interpretation for Euclidean paths in R2 . Suppose γ = (γ 1 , γ 2 ) ∈ P R2 is
a cyclic time series. We say that the component γ 1 exhibits a cyclic leading behavior with
respect to the component γ 2 if the following two conditions hold:
The first condition can be viewed as a reparametrization invariant definition of a time series
γ 1 leading another time series γ 2 . The second condition is used because we are working
with cyclic time series, so we also consider the negative influence of γ 2 on γ 1 . We may think
of this phenomenon as a feedback loop in which γ 1 positively influences γ 2 and γ 2 negatively
influences γ 1 . The standard example of such behavior is γt = (sin(t), − cos(t)).
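To quantify what we mean by large or small in the two conditions above, we translate
the time series such that γ0 = (0, 0) and interpret large (small) to mean positive (negative).
Then, measures for these two conditions are given by $S^{1,2}(\gamma)$ and $-S^{2,1}(\gamma)$ respectively,
\[
S^{1,2}(\gamma) = \int_0^1 \gamma^1_t \, (\gamma^2)'_t \, dt, \qquad
S^{2,1}(\gamma) = \int_0^1 \gamma^2_t \, (\gamma^1)'_t \, dt.
\]
Averaging the two measures gives
\[
A^{1,2}(\gamma) = \frac{1}{2}\left( S^{1,2}(\gamma) - S^{2,1}(\gamma) \right)
= \frac{1}{2} \int_0^1 \left( \gamma^1_t \, (\gamma^2)'_t - \gamma^2_t \, (\gamma^1)'_t \right) dt.
\]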
Because the signature is translation invariant, the translation to the origin described
above does not affect this measure. Moreover, if we consider a time series γ ∈ P RN , then
we can consider all pairwise cyclic leading behavior between components. We can place all
of this information into a matrix called the lead matrix, A(γ), which has entries
\[
A_{i,j}(\gamma) = \frac{1}{2}\left( S^{i,j}(\gamma) - S^{j,i}(\gamma) \right). \tag{15}
\]
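For a Euclidean time series given by finitely many samples, the lead matrix can be computed exactly for the piecewise linear interpolation of the samples. The Python sketch below is our own illustration (the name lead_matrix is hypothetical, and indices are 0-based): on each linear segment the integral of (γ^i − γ^i_0) d(γ^j) has a closed form, and the level 2 terms are accumulated segment by segment.

```python
import math


def lead_matrix(path):
    """Lead matrix A[i][j] = (S^{i,j} - S^{j,i}) / 2 of a piecewise linear path.

    path is a list of points in R^N (each a sequence of N floats).
    """
    N = len(path[0])
    S = [[0.0] * N for _ in range(N)]  # level 2 signature terms S^{i,j}
    start = path[0]
    for p, q in zip(path, path[1:]):
        step = [q[i] - p[i] for i in range(N)]
        for i in range(N):
            for j in range(N):
                # Exact integral of (gamma^i - gamma^i_0) d(gamma^j) on one segment.
                S[i][j] += (p[i] - start[i]) * step[j] + 0.5 * step[i] * step[j]
    return [[0.5 * (S[i][j] - S[j][i]) for j in range(N)] for i in range(N)]


# Counterclockwise polygon approximating the unit circle: the first component
# leads the second, and the entry A[0][1] is close to the enclosed area pi.
n = 1000
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n + 1)]
A = lead_matrix(circle)
print(A[0][1])
```

Because only differences to the starting point enter, the result does not change when the whole path is translated.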
The entries Ai,j (γ) have a geometric interpretation in terms of the signed area of the
path, as per Baryshnikov and Schlafly (2016).
An example of the second level signatures and the signed area is shown in the figure
below.
Figure 3: Second level signature computations S 1,2 (left), S 2,1 (middle), and the signed area
A1,2 (right). Blue represents positive area, while red represents negative area.
Returning to the setting of Lie groups, we can define the lead matrix of a path γ ∈ P G
in the same manner, but the interpretation must be slightly modified. Writing out the
The inner integral is simply S i (γ)t and represents the cumulative variation of the path in
the direction of ei (the dual of ωi ), which is the analogue of the displacement in Euclidean
space. Thus, for a cyclic time series γ ∈ P G, we say that the ei direction exhibits cyclic
leading behavior with respect to the ej direction if the following holds:
1. (positive influence) when $S^i(\gamma)_t$ is positive (negative), then $\omega_j(\gamma'_t)$ is positive
(negative), and
Thus the lead matrix, as defined in Equation 15, can be interpreted as a measure of this
cyclic leading behavior for Lie group time series. An example of this interpretation is given
in Section 5.1.
However, the geometric interpretation in terms of signed area is no longer available.
This is because any area computation on Lie groups will require second-order differential
information about the paths, but path signatures are only defined using first order differ-
ential information. This suggests that an interpretation in terms of areas on the Lie group
will not be possible. However, by using Proposition 34, the value Ai,j (γ) can still be inter-
preted as the signed area of the corresponding path Φ−1 (γ), where Φ is the bijection given
in Proposition 34.
We use the notation α ≃ β if the paths α and β are homotopic relative to endpoints.
Loosely speaking, two paths are homotopic relative to endpoints if their endpoints co-
incide, and there exists a continuous deformation from one path to the other. Namely,
For left-invariant forms on Lie groups, there is a simple way to determine whether the
form is closed. We begin with the invariant formula for the exterior derivative (Lee, 2003).
Let ω be a 1-form on G, and let X, Y be vector fields on G. Then
\[
d\omega(X, Y) = X(\omega(Y)) - Y(\omega(X)) - \omega([X, Y]).
\]
If ω is left-invariant and X, Y ∈ g, this reduces to
\[
d\omega(X, Y) = -\omega([X, Y])
\]
since ω(X) and ω(Y ) are constant functions. Thus, the left invariant form ω is closed if
and only if ω([X, Y ]) = 0 for all X, Y ∈ g. In particular, this implies that all left-invariant
1-forms are closed on abelian Lie groups such as RN and T N , since [X, Y ] = 0 for all
X, Y ∈ g. In contrast, there are no nonzero closed left-invariant 1-forms on SO(3): a
nontrivial ω ∈ so(3)∗ must be nonzero on some Z ∈ so(3), and for every Z ∈ so(3) there
exist X, Y ∈ so(3) such that Z = [X, Y ]. In fact, this argument extends to all semisimple
Lie groups, and thus there are no nonzero closed left-invariant 1-forms on any semisimple
Lie group.
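The claim that every Z ∈ so(3) is a bracket can be checked concretely. The Python sketch below relies on the standard identification of so(3) with R^3, under which the Lie bracket becomes the cross product; the function names are our own. Given a nonzero z, it picks x orthogonal to z and sets y = (z × x)/|x|^2, so that x × y = z by the vector triple product identity.

```python
def cross(x, y):
    """Cross product on R^3, which corresponds to the Lie bracket on so(3)."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])


def bracket_preimage(z):
    """Return (x, y) with cross(x, y) equal to z, for a nonzero z in R^3."""
    # Pick the coordinate axis least aligned with z, so that z x e is nonzero.
    axis = min(range(3), key=lambda i: abs(z[i]))
    e = tuple(1.0 if i == axis else 0.0 for i in range(3))
    x = cross(z, e)  # orthogonal to z by construction
    norm_sq = sum(c * c for c in x)
    # x x (z x x) = z |x|^2 because x is orthogonal to z.
    y = tuple(c / norm_sq for c in cross(z, x))
    return x, y


z = (1.0, 2.0, 3.0)
x, y = bracket_preimage(z)
print(cross(x, y))
```

Consequently, a linear functional on so(3) that vanishes on all brackets vanishes identically, which is the statement used in the text.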
Definition 41 Let V be a real vector space. The tensor exponential exp⊗ : V → T ((V )) is
defined to be
\[
(\exp_{\otimes}(v))_m = \frac{v^{\otimes m}}{m!}.
\]
Then, we may write the path signature of γ = exp(vt) to be
Therefore, by the above computation of the path signature of an exponential path and
Chen’s identity, we define the continuous path signature of the discrete time series to be
By using tensor operations, this formula provides an effective implementation for the com-
putation of the path signature.
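The construction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' Julia code: the discrete time series is assumed to be given through Lie algebra increments v_1, ..., v_T written in coordinates, the signature of each exponential piece is taken to be the truncated tensor exponential of its increment, and the pieces are combined with Chen's identity. A truncated tensor is stored as a list of dictionaries, one per level, keyed by 0-based index tuples; all function names are hypothetical.

```python
from itertools import product
from math import factorial


def tensor_exp(v, M):
    """Truncated tensor exponential: level m holds the entries of v^{(x)m} / m!."""
    N = len(v)
    levels = []
    for m in range(M + 1):
        level = {}
        for I in product(range(N), repeat=m):
            coeff = 1.0
            for i in I:
                coeff *= v[i]
            level[I] = coeff / factorial(m)
        levels.append(level)
    return levels


def tensor_mul(a, b, M):
    """Truncated tensor product: level m of the result sums a_p (x) b_q over p + q = m."""
    out = [dict() for _ in range(M + 1)]
    for p in range(M + 1):
        for q in range(M + 1 - p):
            for I, x in a[p].items():
                for J, y in b[q].items():
                    out[p + q][I + J] = out[p + q].get(I + J, 0.0) + x * y
    return out


def signature_from_increments(increments, M):
    """Multiply the signatures of the exponential pieces, following Chen's identity."""
    N = len(increments[0])
    sig = tensor_exp([0.0] * N, M)  # signature of a constant path: (1, 0, 0, ...)
    for v in increments:
        sig = tensor_mul(sig, tensor_exp(v, M), M)
    return sig


# Two pieces: first along e_1, then along e_2.
sig = signature_from_increments([(1.0, 0.0), (0.0, 1.0)], 2)
print(sig[2])
```

In this example the level 2 terms with indices (0, 1) and (1, 0) differ, which reflects the order in which the two directions are traversed.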
An alternative approach is to compute an approximation of the path signature for dis-
crete time series.
\[
\hat{\Delta}^m_T = \{(t_1, \ldots, t_m) \in [T]^m : 1 \le t_1 < t_2 < \ldots < t_m \le T\}.
\]
Ŝ : P̂ G → T ((g)).
The discrete path signature can be viewed as an approximation to the continuous path
signature. Let γ ∈ P G be a continuous path. Given a partition π = (0 = t1 < t2 < . . . <
tT +1 = 1), the discretization of γ with respect to π, denoted γ̂ (π) : [T + 1] → G, is defined
to be
\[
\hat{\gamma}^{(\pi)}_i := \gamma_{t_i} .
\]
The following proposition in Kiraly and Oberhauser (2019) shows that the discrete signature
indeed approximates the continuous path signature.
By applying the map Φ : P RN → P G from Proposition 34, and using the fact that it is
an isometry, we immediately get the following corollary for Lie group valued paths.
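\[
T_{\mathrm{Time}} : PG_e \to P_e(G \times \mathbb{R}), \qquad \gamma_t \mapsto (\gamma_t, t). \tag{17}
\]
\[
T_{\mathrm{IdInit}} : PG \to PG_e, \qquad \gamma \mapsto \ell_{\gamma_0} * \gamma. \tag{18}
\]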
For discrete time series, this simply amounts to appending the identity element to the
beginning of the time series. The following lemma is clear by definition.
\[
T_{\mathrm{SWin},m} : PG \to PG^{m+1}, \qquad \gamma_t \mapsto (\gamma_t, \gamma_{t-\tau}, \gamma_{t-2\tau}, \ldots, \gamma_{t-m\tau}). \tag{19}
\]
For discrete time series, we assume that the data is temporally uniformly sampled, and
we choose τ to be the time in between samples. Then, if we consider a discrete time series
to be γ̃ : {0, . . . , n} → G, then the sliding window transformation will be
In the context of Euclidean path signatures, Fermanian (2019) empirically shows that
the sliding window embedding often performs well on classification tasks, though there is no
theoretical explanation. We note that due to the choice of padding the start of the delayed
time series with the identity, this transformation breaks translation invariance. We suggest
that breaking the translation symmetry is one reason the sliding window transformation
performs well in practice. This is discussed in Remark 64 in Section 5.1.
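For discrete time series, the identity start and sliding window transformations reduce to simple list manipulations. The Python sketch below is our own illustration with hypothetical function names; group elements are treated as opaque objects, the identity element is supplied by the caller, and the delayed copies are padded with the identity at the start, as described above.

```python
def identity_start(series, identity):
    """Prepend the identity element to a discrete time series."""
    return [identity] + list(series)


def sliding_window(series, m, identity):
    """Pair each sample with its m predecessors, padding with the identity.

    The entry at time t is (series[t], series[t-1], ..., series[t-m]), where
    samples before the start of the series are replaced by the identity.
    """
    windows = []
    for t in range(len(series)):
        window = tuple(series[t - d] if t - d >= 0 else identity
                       for d in range(m + 1))
        windows.append(window)
    return windows


# Symbols stand in for group elements; "e" plays the role of the identity.
print(sliding_window(["a", "b", "c"], 2, "e"))
```

The padding is what ties the transformed series to a fixed starting element, which is the mechanism by which translation invariance is lost.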
2. Those which involve making inferences about probability measures µ ∈ P(X ), where
P(X ) denotes the space of Borel probability measures on X . For example, in two
sample hypothesis testing, we begin with samples {x1 , . . . , xn } and {y1 , . . . , ym } taken
from probability distributions p and q on X respectively. Testing the null hypothesis
that p = q then corresponds to learning about the underlying measures of p and q.
The general philosophy behind kernel methods is to map the input space X into a
reproducing kernel Hilbert space (RKHS) H using a feature map
Φ : X → Hκ ,
Problems involving learning nonlinear functions $f \in \mathbb{R}^{\mathcal{X}}$ given some input data {xi },
where xi ∈ X , can be reformulated as problems involving learning an element g ∈ H (which
can be thought of as a function $g \in \mathbb{R}^{\mathcal{X}}$) given the data {Φ(xi )}. Additionally, the norm
induced by the Hilbert space provides a distance between points x, y ∈ X given by $\|\Phi(x) - \Phi(y)\|$. In
essence, this translates a nonlinear learning problem into a linear learning problem. This
allows the application of linear methods, which are much simpler and better developed in
many cases.
Measures µ on X can be mapped into the RKHS via the kernel mean embedding (KME),
\[
\Phi : \mathcal{M}(\mathcal{X}) \to \mathcal{H}_\kappa, \qquad \Phi(\mu) := \int_{\mathcal{X}} \Phi(x) \, d\mu(x) = \mathbb{E}_\mu[\Phi]. \tag{20}
\]
A priori, this map is not necessarily well-defined, so we will usually require restrictions on
the feature map or kernel such that the integral exists.
Specifically, the bounded integral condition is satisfied if we know that $\kappa(x, x) = \|\Phi(x)\|^2 < C$
for all x ∈ X for some fixed constant C. In other words, if the image of the feature map
Φ is contained in a bounded subset of Hκ , then the KME is well defined.
Similar to the previous case, we can use the norm on Hκ to define a notion of distance
on P(X ). Although this only provides a pseudometric since Φ may not be injective, it
coincides with a well known measure of discrepancy between probability measures.
When we take the function class F to be the unit ball in the RKHS Hκ , the MMD can
be written as the distance between the mean embeddings, with respect to the norm on H.
Lemma 50 (Borgwardt et al. (2006)) Suppose the KME map Φ is well-defined and
suppose µ, ν ∈ M(X ). Let $\mathcal{F} = \{f \in \mathcal{H} \subset \mathbb{R}^{\mathcal{X}} : \|f\| \le 1\}$. Then,
This simplifies the study of probability measures by considering them as elements of a linear
space, and also provides a straightforward method to compute an unbiased finite sample
estimate of the MMD in terms of the kernel.
Lemma 51 (Gretton et al. (2012)) Suppose the KME map Φ is well-defined and suppose
µ, ν ∈ M(X ). Let $\mathcal{F} = \{f \in \mathcal{H} \subset \mathbb{R}^{\mathcal{X}} : \|f\| \le 1\}$. Let X = (x1 , . . . , xn ) and
Y = (y1 , . . . , ym ) be i.i.d. samples from µ and ν respectively. An unbiased estimate of
$\mathrm{MMD}^2[\mathcal{F}, \mu, \nu]$ is given as the MMD of the empirical distributions of X and Y ,
\[
\mathrm{MMD}^2_u[\mathcal{F}, X, Y] = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \ne i}^{n} \kappa(x_i, x_j)
+ \frac{1}{m(m-1)} \sum_{i=1}^{m} \sum_{j \ne i}^{m} \kappa(y_i, y_j)
- \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \kappa(x_i, y_j). \tag{23}
\]
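The estimator in Equation 23 only requires kernel evaluations between samples. The following Python sketch is a direct transcription for a generic kernel passed in as a function; the name mmd2_unbiased is our own, and both samples are assumed to contain at least two points.

```python
def mmd2_unbiased(X, Y, kernel):
    """Unbiased estimate of the squared MMD between the samples X and Y."""
    n, m = len(X), len(Y)
    # Within-sample terms exclude the diagonal pairs i == j.
    xx = sum(kernel(X[i], X[j]) for i in range(n) for j in range(n) if i != j)
    yy = sum(kernel(Y[i], Y[j]) for i in range(m) for j in range(m) if i != j)
    # The cross term runs over all pairs.
    xy = sum(kernel(x, y) for x in X for y in Y)
    return xx / (n * (n - 1)) + yy / (m * (m - 1)) - 2.0 * xy / (n * m)


# Toy check with the linear kernel on the real line.
value = mmd2_unbiased([1.0, 2.0], [3.0, 4.0], lambda x, y: x * y)
print(value)
```

For path-valued data, the kernel argument would be a signature kernel evaluated on pairs of time series. Because the diagonal terms are excluded, the estimate can be slightly negative when the two samples come from the same distribution.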
From this discussion, kernels provide a unified way to study both nonlinear functions
and probability measures using the linear space H. However, there are deficiencies in both
scenarios.
1. In the case of nonlinear functions, we usually begin by choosing our function class
$\mathcal{F} \subset \mathbb{R}^{\mathcal{X}}$. How do we know that any function f ∈ F can be represented arbitrarily
closely by an element ℓ ∈ H such that f (x) ≈ ⟨ℓ, Φ(x)⟩ for all x ∈ X ?
2. In the case of probability measures, it is often crucial that the MMD is in fact a
metric instead of just a pseudometric. How do we know that the feature map Φ is
rich enough to distinguish all probability measures µ ∈ M(X )?
The answer is given by the definitions of universal and characteristic kernels. This will
require us to extend the definition of the KME to Schwartz distributions rather than just
measures. We provide a quick exposition of the definitions here, and refer the reader to a
more thorough treatment in Simon-Gabriel and Schölkopf (2018). As usual, let $\mathcal{F} \subset \mathbb{R}^{\mathcal{X}}$ be
a function class, and let F′ denote its topological dual of all continuous linear functionals.
The definition of the KME for distributions is analogous to the case of measures
\[
\Phi : \mathcal{F}' \to \mathcal{H}_\kappa, \qquad \Phi(D) := \int_{\mathcal{X}} \Phi(x) \, dD(x), \tag{24}
\]
where the integral here is the weak (or Pettis) integral (Simon-Gabriel and Schölkopf, 2018).
Similar to the case of measures, this map is a priori not well-defined. However, we have a
simple criterion for the existence of these weak integrals.
We can now state the definition of a universal and characteristic feature map.
Φ : X → Hκ
into an RKHS Hκ with respect to a kernel κ. Suppose that ⟨ℓ, Φ(·)⟩ ∈ F for all ℓ ∈ Hκ . We
say that Φ is
ι : Hκ → F, ℓ ↦ ⟨ℓ, Φ(·)⟩
is injective.
Note that we have assumed that the image ι(Hκ ) ⊂ F so by Lemma 52, the KME map
is well defined. The property of universality allows us to approximate any function f ∈ F
using linear functionals ⟨ℓ, Φ(·)⟩ for ℓ ∈ Hκ . The dual of a class of functions F is generally
much larger than the set of probability measures on X . If M(X ) ⊂ F′ , then a characteristic
feature map is able to represent probability measures on X with elements of H. Moreover
the MMD becomes a metric due to the injectivity of the KME.
We have the following equivalence between universality and characteristicness, as shown
in Simon-Gabriel and Schölkopf (2018) and Chevyrev and Oberhauser (2018).
Theorem 54 Suppose that F is a locally convex topological vector space. A feature map Φ
is universal to F if and only if Φ is characteristic to F′ .
Recall that we have defined T1 ((ḡ)) in Equation 13 to be the subspace of T ((ḡ)) with
constant value 1 and finite norm. We will view the path signature
\[
S : \widetilde{P}G \to T_1((\bar{\mathfrak{g}}))
\]
as a feature map, and recall that T1 ((ḡ)) is equipped with an inner product, and is in
particular a Hilbert space. However, as discussed in the previous subsection, we will need
to ensure that the signature map sends paths to a bounded subset of T1 ((ḡ)) in order for
the KME to be defined. This will be done by using a tensor normalization, which was first
discussed in Chevyrev and Oberhauser (2018).
\[
\Phi_S : \widetilde{P}G \to T_1((\bar{\mathfrak{g}})), \qquad \Phi_S = \Lambda \circ S,
\]
which is a continuous injective map from $\widetilde{P}G$ into a bounded subset of T1 ((ḡ)). Note that
due to the scaling property of the path signature from Proposition 23, this is equivalent to
ΦS (γ) = S(λ(S(γ)) · γ),
where we first scale the path in G by λ(S(γ)), using the Lie algebra scaling from Defini-
tion 13.
Following Chevyrev and Oberhauser (2018), we will show universality and then use
the duality in Theorem 54 to show characteristicness with respect to probability measures.
Using the theory discussed in the previous section, the objective is to find a function class
$\mathcal{F} \subset \mathbb{R}^{PG}$ and a topology on P G such that
1. the function class F can be approximated by linear functionals ⟨ℓ, ΦS (·)⟩, and
2. the dual F′ contains probability measures on P G.
The difficulty with such a result is due to the fact that P G is not locally compact. How-
ever, the class of continuous bounded functions Cb (P G, R) has such properties when P G is
endowed with the strict topology, originally defined in Giles (1971).
Definition 56 Let X be a topological space. We say that a function ψ : X → R vanishes at
infinity if for each ε > 0, there exists a compact set K ⊂ X such that $\sup_{x \in X \setminus K} |\psi(x)| < \varepsilon$.
Denote by B0 (X, R) the set of functions that vanish at infinity. The strict topology on
Cb (X, R) is the topology generated by the seminorms
\[
p_\psi(f) = \sup_{x \in X} |f(x)\psi(x)|, \qquad \psi \in B_0(X, \mathbb{R}).
\]
3. The topological dual of Cb (X, R) equipped with the strict topology is the space of finite
regular Borel measures on X.
Specifically, note that the space of finite regular Borel measures on X includes all probability
measures on X. Finally, we are ready to state the universality and characteristicness result.
\[
\Phi : \widetilde{P}G \to T_1((\bar{\mathfrak{g}})), \qquad \Phi = \Lambda \circ S,
\]
2. is universal to $\mathcal{F} := C_b(\widetilde{P}G, \mathbb{R})$ equipped with the strict topology, and
Proof The fact that Φ is an injection follows from the injectivity of the path signature
from Theorem 30 and the definition of the tensor normalization. Continuity follows from
the stability property of Corollary 38. Next, we move on to universality. Define
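\[
L = 1 + \bigoplus_{m=1}^{\infty} (\bar{\mathfrak{g}})^{\otimes m}
\]
to be a dense subspace of T1 ((ḡ)) (note that L only contains finite linear combinations of
tensors, whereas T1 ((ḡ)) contains all power series of tensors) and define
\[
\mathcal{F}_0 = \{ \langle \ell, \Phi(\cdot) \rangle : \widetilde{P}G \to \mathbb{R} \ : \ \ell \in L \}.
\]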
We aim to show that F0 satisfies the hypotheses of the second point of Theorem 57. By the
injectivity of Φ, the class of functions F0 separates points, and because the path signature
is defined with constant term 1, the path signature is nonzero for all paths $\gamma \in \widetilde{P}G$. Finally,
by the shuffle product identity from Theorem 33, the class of functions F0 is closed under
shuffle multiplication and is therefore a subalgebra of $C_b(\widetilde{P}G, \mathbb{R})$. Namely, let I and J be
multi-indices, and eI and eJ be the corresponding basis vectors in L. Then, we may define
multiplication in F0 by
\[
\langle e_I, \Phi(\cdot) \rangle \langle e_J, \Phi(\cdot) \rangle = \Big\langle \sum_{K \in I \sqcup\!\sqcup J} e_K, \ \Phi(\cdot) \Big\rangle,
\]
which is closed. Thus, Φ is universal with respect to F. Finally, by the duality in
Theorem 54 and the third point of Theorem 57, the feature map Φ is characteristic with
respect to finite regular Borel measures on $\widetilde{P}G$.
Remark 59 Although this theorem is stated for tree-like equivalence classes of paths in G,
by precomposing with the time transformation or identity start transformation discussed
in Section 3.8, we can also obtain universal and characteristic feature maps that are not
reparametrization or translation invariant.
SM : P G → T ≤M (ḡ).
Here, the inner product for T ≤M (ḡ) was given in Equation 12. The signature kernel trun-
cated at level M is defined to be
We will begin by simplifying the computation of the kernel for continuous paths.
\[
K_M(\alpha, \beta) = \sum_{m=0}^{M} \int_{(s,t) \in \Delta^m \times \Delta^m} \prod_{i=1}^{m} \langle \alpha'_{s_i}, \beta'_{t_i} \rangle_{\mathfrak{g}} \, ds \, dt, \tag{26}
\]
where we view $\alpha'_t$ and $\beta'_t$ as elements of the Lie algebra g, and the inner product in the
integrand $\langle \cdot, \cdot \rangle_{\mathfrak{g}}$ is computed in the Lie algebra. Also, we denote $s = (s_1, \ldots, s_m)$ and
$t = (t_1, \ldots, t_m)$ as elements of $\Delta^m$, and write $ds := ds_1 \ldots ds_m$ and $dt := dt_1 \ldots dt_m$.
Proof Let’s consider the inner product at a single level m. Recall that πm : T ((ḡ)) → ḡ⊗m
is the projection onto the level m tensors, and $\langle \cdot, \cdot \rangle_m$ refers to the inner product on ḡ⊗m .
Then,
As noted by Kiraly and Oberhauser (2019), the expression in Equation 26 can be efficiently
computed by a method that is similar to Horner’s scheme for computing polynomial
expressions. Suppose we wish to compute the expression $p(x) = \sum_{i=0}^{M} x^i$. By expanding
this polynomial as
\[
p(x) = 1 + x(1 + x(1 + \cdots + x(1 + x))),
\]
where the recursion occurs M times and computing the brackets from the inside to the
outside, we can evaluate the expression using M additions and M multiplications. In contrast,
the naive computation of p(x) would require M additions and $\frac{M^2 + M}{2}$ multiplications. Note
that we may write out this recursion explicitly as follows. Let
\[
q_1 = 1 + x, \qquad q_m = 1 + x \, q_{m-1} .
\]
Then, we may write p(x) = qM . We can significantly reduce the number of operations
required to compute the integrals in Equation 26 by adapting this procedure.
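The recursion q_1 = 1 + x, q_m = 1 + x q_{m−1} can be written as a short loop. The following Python sketch is only an illustration of the scalar case, with a hypothetical function name and the assumption M ≥ 1.

```python
def geometric_sum_horner(x, M):
    """Evaluate p(x) = 1 + x + ... + x^M with the recursion q_m = 1 + x * q_{m-1}."""
    q = 1 + x  # q_1
    for _ in range(2, M + 1):
        q = 1 + x * q  # one addition and one multiplication per step
    return q


print(geometric_sum_horner(2, 3))  # 1 + 2 + 4 + 8
```

The signature kernel recursion has the same shape, with the scalar x replaced by the integrand and the product replaced by an integral over earlier times.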
Corollary 61 Let
\[
Q_1(s, t) = 1 + \int_{\substack{s' \in [0, s] \\ t' \in [0, t]}} \langle \alpha'_{s'}, \beta'_{t'} \rangle_{\mathfrak{g}} \, ds' \, dt' . \tag{27}
\]
The general proof proceeds in the same manner. By each successive unfolding of the defi-
nition of Qm , we recover an additional summand in Equation 26.
Next, we will consider the discrete formulation of this expression. For simplicity, we
will consider discrete time series of the same length, though the following results also hold
when the two discrete time series are of different lengths. Suppose α̂, β̂ : [T + 1] → G, and
let α̂′, β̂′ : [T ] → g be their corresponding discrete derivatives. Following the notation in
Section 3.7 for the discrete signature, we define the discrete signature kernel truncated at
level M to be
\[
\hat{K}_M(\hat{\alpha}, \hat{\beta}) := \langle \hat{S}_M(\hat{\alpha}), \hat{S}_M(\hat{\beta}) \rangle_M . \tag{30}
\]
Notice we are using the discrete signature given in Definition 42. Then, the discrete ana-
logues of Proposition 60 and Corollary 61 are as follows.
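\[
\hat{K}_M(\hat{\alpha}, \hat{\beta}) = \sum_{m=0}^{M} \sum_{(s,t) \in \hat{\Delta}^m_T \times \hat{\Delta}^m_T} \prod_{i=1}^{m} \langle \hat{\alpha}'_{s_i}, \hat{\beta}'_{t_i} \rangle_{\mathfrak{g}} . \tag{31}
\]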
Corollary 63 Let
\[
\hat{Q}_1(s, t) = 1 + \sum_{s' \in [s],\ t' \in [t]} \langle \hat{\alpha}'_{s'}, \hat{\beta}'_{t'} \rangle_{\mathfrak{g}} \tag{32}
\]
The proofs for these discrete formulas proceed in exactly the same manner as their
continuous counterparts. This final recursive formula provides an efficient computation
of the discrete signature kernel. We set some notation before writing down the algorithm.
Suppose A, B are T × T arrays. We will use the notation A[i, j] to denote elements of the
array, and we suppose that the arrays are 1-indexed. The notation for the pseudocode is
explained in Appendix A.
For the algorithm, suppose we have discrete time series α, β : [T + 1] → G, and the
corresponding derivatives α′, β′ : [T ] → g are already computed, as per Appendix A. We
assume that the Lie group G is N dimensional. In the pseudocode, we let a, b be the discrete
derivatives α′, β′ respectively.
Algorithm 1: Discretized Signature Kernel
Input : a, b (T × N arrays): Two paths as discrete derivatives
        M (Int): The truncation level
Output: R (Float): The kernel value KM (α, β).
  Compute the Gram matrix of the derivatives
    K ← a bᵀ ;
  Initialize (T × T ) arrays A and Q ;
  Initialize the first step of the recursion
    A ← K ;
  for m = 2..M do
    Q[i, j] ← 1 + Σ_{i′<i, j′<j} A[i′, j′] for all i, j ;   (cumulative sum over strictly earlier indices, as required by Equation 31)
    A ← K · Q ;   (entrywise product)
  end
  R ← 1 + A[Σ, Σ] ;   (sum over all entries of A)
  Return R
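Algorithm 1 can be sketched in plain Python as follows. This is our own illustrative transcription rather than the authors' Julia implementation: the function names are hypothetical, the truncation level is assumed to satisfy M ≥ 1, and the cumulative sum is taken over strictly earlier indices so that the result matches Equation 31. A brute-force evaluation of Equation 31 is included for comparison on small inputs.

```python
from itertools import combinations


def signature_kernel(a, b, M):
    """Truncated discrete signature kernel from discrete derivatives a and b.

    a and b are lists of vectors in R^N (coordinates in the Lie algebra);
    the two lists may have different lengths. Assumes M >= 1.
    """
    T_a, T_b = len(a), len(b)
    # Gram matrix of the derivatives: K[s][t] = <a_s, b_t>.
    K = [[sum(x * y for x, y in zip(a[s], b[t])) for t in range(T_b)]
         for s in range(T_a)]
    A = [row[:] for row in K]
    for _ in range(2, M + 1):
        # C[s][t] holds the sum of A over all indices s' < s and t' < t.
        C = [[0.0] * (T_b + 1) for _ in range(T_a + 1)]
        for s in range(T_a):
            for t in range(T_b):
                C[s + 1][t + 1] = A[s][t] + C[s][t + 1] + C[s + 1][t] - C[s][t]
        A = [[K[s][t] * (1.0 + C[s][t]) for t in range(T_b)] for s in range(T_a)]
    return 1.0 + sum(sum(row) for row in A)


def signature_kernel_brute_force(a, b, M):
    """Direct evaluation of Equation 31 by enumerating increasing index tuples."""
    total = 0.0
    for m in range(M + 1):
        for s in combinations(range(len(a)), m):
            for t in combinations(range(len(b)), m):
                term = 1.0
                for si, ti in zip(s, t):
                    term *= sum(x * y for x, y in zip(a[si], b[ti]))
                total += term
    return total


a = [[1.0, 0.0], [0.5, -1.0], [2.0, 1.0]]
b = [[0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
print(signature_kernel(a, b, 3), signature_kernel_brute_force(a, b, 3))
```

The recursive version performs one pass over a T × T array per level, whereas the brute-force version enumerates all pairs of increasing index tuples and is only usable for very short series.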
As discussed by Kiraly and Oberhauser (2019), the runtime of this algorithm is $O(T^2 \cdot M)$.
Now, consider the naive computation of the kernel where we first compute the truncated
signatures of α̂ and β̂, and then compute the inner product. From the analysis in
Appendix A, a signature computation requires $O(T N^M)$ operations. The leading-order
term in the computation of the inner product is computing the inner product of the
respective level M tensors. This requires $N^M$ operations. Thus, the complexity of the naive
computation is $O(T N^M)$.
This suggests that if the length T of the time series is large and the truncation level is
small, then a naive computation may be more efficient. However, if we wish to compute the
kernel at a high truncation level, the recursive algorithm provided here scales significantly
better. Furthermore, several variants of this algorithm for Euclidean path signatures are
considered in Kiraly and Oberhauser (2019) such as by incorporating low-rank approxima-
tions. These algorithms can similarly be extended to the setting of Lie groups by applying
them to the discrete derivatives.
5. Experiments
In this section, we provide two detailed experiments to demonstrate the universal and char-
acteristic properties of the path signature. First, we consider the human action recognition
problem from computer vision, using a Lie group representation of the data. We find that
the path signature method is simple to implement, achieves comparable classification per-
formance to shallow learning methods, and provides an interpretable feature set. Second,
we perform a kernel two-sample hypothesis test aiming to distinguish between two different
random walks on SO(3). Here, we find that the path signature for SO(3) significantly
outperforms the same hypothesis test done using the Euclidean representation of SO(3).
\[
\hat{e}_i = \frac{e_i(1) - e_i(0)}{\|e_i(1) - e_i(0)\|} .
\]
We will consider all pairs of body parts (ei , ej ) that share a joint, such that ei (c1 ) = ej (c2 )
for some c1 , c2 ∈ {0, 1}, so k is the number of adjacent pairs of body parts. To obtain
the rotation matrix for a chosen pair (ei , ej ), we rotate the global coordinate system (with
minimum rotation) such that êi is the x-axis. Then, the rotation matrix Ri,j ∈ SO(3) is the
minimum rotation from êi to êj in this coordinate system. By repeating this for all adjacent
pairs, we obtain an element of SO(3)k , and further repeating this for all time steps, we can
represent this motion as a time series in SO(3)k .
Note that in Vemulapalli et al. (2014) and Vemulapalli and Chellappa (2016), all pairs
of body parts are used in the Lie group representation, which results in k = 342 pairs for
this dataset. In contrast, we only use all adjacent pairs of body parts, which we call the
primary pairs, resulting in k = 18 pairs for this dataset. We use significantly less data
because the path signatures take into account the relationships between all input pairs, so
we have information regarding non-primary pairs through the higher order signature terms.
The numbering of the primary pairs is given in the following figure.
[Figure: numbering of the primary pairs, 1 to 18.]
Body part    Primary pairs
Body         1-2
L Arm        3-6
R Arm        7-10
L Leg        11-14
R Leg        15-18
Extensive preprocessing of this data is performed in Vemulapalli et al. (2014) and Vem-
ulapalli and Chellappa (2016) in order to deal with several difficulties. During the training
stage, the following steps were taken.
1. Each time series is resampled via interpolation so that all time series have a fixed
length.
3. To handle issues of rate variation and temporal misalignment, dynamic time warping
(DTW) is used to warp each time series to its corresponding nominal curve.
4. A rolling and unwrapping procedure is computed with respect to its nominal curve to
obtain a curve in the Lie algebra $\mathfrak{so}(3)^k$.
5. (Optional) A Fourier temporal pyramid (FTP) representation of the Lie algebra curve
may be computed to further deal with temporal misalignment.
A classifier such as a support vector machine (SVM) is then trained for every action class
(one vs. rest). In the testing stage, the data is preprocessed with respect to all nominal
curves and the corresponding SVM is used for prediction. This amounts to a large pre-
processing cost, especially for test samples, which must be preprocessed with respect to all
action classes.
In contrast, we perform minimal preprocessing since we can compare time series with
varying numbers of frames using path signatures, and the issues of rate variation and tem-
poral misalignment are handled by reparametrization invariance.
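Reparametrization invariance can be checked numerically in the simplest case. The sketch below is an illustration under stated assumptions rather than the code used here: it treats a Euclidean valued piecewise-linear path (the Lie group $\mathbb{R}^2$), computes the first two signature levels with Chen's identity, and compares them with those of a resampled copy that has twice as many frames and pauses. The names `signature_level2` and `slow_down` are hypothetical.

```python
def signature_level2(path):
    """Levels 1 and 2 of the signature of the piecewise-linear path through `path`."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for a, b in zip(path, path[1:]):
        inc = [y - x for x, y in zip(a, b)]
        for i in range(d):
            for j in range(d):
                # Chen's identity for appending a linear segment:
                # S2 <- S2 + S1 (x) inc + (inc (x) inc) / 2
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        s1 = [x + y for x, y in zip(s1, inc)]
    return s1, s2

def slow_down(path):
    """Traverse the same curve with extra frames: a midpoint, a pause, then the endpoint."""
    out = [path[0]]
    for a, b in zip(path, path[1:]):
        mid = [(x + y) / 2 for x, y in zip(a, b)]
        out += [mid, mid, b]
    return out

path = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [-1.0, 3.0]]
s1, s2 = signature_level2(path)
r1, r2 = signature_level2(slow_down(path))

# Largest discrepancy between the two signatures over all level 1 and 2 entries.
gap = max([abs(x - y) for x, y in zip(s1, r1)] +
          [abs(s2[i][j] - r2[i][j]) for i in range(2) for j in range(2)])
```

The two series have 4 and 10 frames respectively, yet `gap` vanishes up to floating point error, which is why no resampling or time warping step is needed before comparing signatures.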
5.1.3 Results
Following Vemulapalli et al. (2014); Vemulapalli and Chellappa (2016), we use a cross-
subject test setting, where we use half of the subjects for training, and the other half for
testing. All of the reported classification results are averaged over ten different combinations
of the train/test split. We perform the classification using a kernel SVM, as well as a random
forest. For the kernel SVM, we report the results using the signature kernel truncated at
level 6. We use the Julia implementation of scikit-learn, with the SVC implementation for
support vector machines, which uses the one-against-one approach (Knerr et al., 1990) for
multi-class classification. For the random forest, we compute the level 2 signature of the time
series and treat it as a feature set. We use the random forest implementation in the Julia
DecisionTrees package. We follow the suggested default random forest hyperparameters
in Probst et al. (2019), and use 1,000 trees, $\sqrt{n_f}$ features for each tree, where $n_f$ is the total
number of input features, a maximum depth of 100, and use 70% of the data to train each
tree. A tensor renormalization is used for all path signature computations as described in
Proposition 65, and using the function
\[
\psi(x) =
\begin{cases}
x^2 & \text{if } x \le \sqrt{M}, \\
M + M^{1+a}(M^{-a} - x^{-a})/a & \text{if } x > \sqrt{M}.
\end{cases}
\]
Table 2: Classification results using a random forest and SVM for different embeddings.
Based on our results, a random forest trained using level 2 signatures outperforms an
SVM trained using level 6 signatures for all embeddings. This may be due to the fact that
random forests are in general better suited for multi-class classification tasks, since the SVM
approach is to split the multi-class problem into $\binom{20}{2}$ binary classification tasks.
We note that by using the random forest classifier and introducing lags, we are able to
achieve results comparable to the 87.95% accuracy of Vemulapalli and Chellappa
(2016), even though we use significantly less input data, minimal preprocessing,
and default hyperparameters on the random forest.
Remark 64 In Section 3.8.3, we mentioned that one possible explanation for the strong
empirical performance of the sliding window embedding is the breaking of translation invari-
ance. In these results, we can isolate the effect of breaking translation invariance by using
the IdInit embedding. We see that for both the random forest and the SVM, the performance
improves significantly after using the IdInit embedding when compared to the raw time se-
ries. The sliding window embedding with 1 lag also significantly improves the performance
when compared to the raw time series, but we see that the increase is comparable to that
of the IdInit embedding. When we increase the number of lags, the path signature is able
to capture more information by integrating with respect to past values, and thus the perfor-
mance continues to improve. Fermanian (2019) suggested the problem of explaining why
the sliding window embedding performs well in practice. This empirical evidence indicates
that one such reason is the breaking of the translation invariance of the path signature.
[Heat map titled "Walk (S_{i,j})": row index i and column index j, with tick labels 3 to 18 grouped as L Arm, R Arm, L Leg, R Leg; color scale from 0 to 1.]
Figure 6: Averaged absolute second level signature matrix for the action class “walk.”
basis direction for a given primary pair. The off diagonal entries measure the positive or
negative influence of the basis direction i on the basis direction j, as defined in Section 3.5.
For a closer look at this example, we isolate the blocks corresponding to the left and right
legs, and reduce the color threshold.
[Heat map titled "Walk (|S_{i,j}|)": row index i and column index j over 11 to 18, grouped as L Leg (11-14) and R Leg (15-18); color scale from 0 to 0.3.]
Figure 7: Averaged absolute second level signature matrix for the action class “walk.”
The blocks with the largest magnitude are the diagonal blocks 14 and 18, which correspond
to the joints of the left and right foot (see Figure 5). In addition, many of the
off-diagonal entries are nonzero, which corresponds to the action of walking. While walk-
ing, we alternate moving our left and right legs, and the signature matrix measures this as
an influence of rotations about joints in one leg on the rotations about joints in the other
leg.
In Figure 8, we have plotted the absolute value of the averaged second level signature
matrix for all action classes in the data set. We omit the labelling to simplify the figure,
but all labelling is the same as Figure 6. In particular, the colors range from 0 to 1.
Figure 9: The von Mises-Fisher density on $S^2$ with mean direction x = (0, 0, 1) and κ = 0.1.
where c > 0 is the step size and γ̂1 is a randomly sampled point on SO(3).
5.2.3 Results
We perform two classes of tests.
All distributions will have a concentration parameter of κ = 0.1. For each test, we will
sample n = 50 random walks from each distribution, and each random walk will have
L = 100 steps. To generate the null distribution of the MMD, we perform a permutation
test with 2,000 permutations for a given set of samples. A level of α = 0.05 is chosen, so we
compute the 0.95 quantile of the null distribution as the threshold used for the MMD. The
path signature is truncated at level 4 in the MMD computation. We perform 1,000 tests
for each class, and each test is done using both Lie group and Euclidean path signatures.
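The testing procedure described above can be sketched as follows. This is a hedged illustration and not the code used for the reported results: the function names are hypothetical, we assume the standard unbiased estimator for MMDu, and in the small demonstration a Gaussian kernel on scalars stands in for the truncated signature kernel so that the block stays self-contained.

```python
import math
import random

def mmd_u(K, idx_x, idx_y):
    """Unbiased MMD^2 estimate from a Gram matrix K of the pooled sample."""
    m, n = len(idx_x), len(idx_y)
    kxx = sum(K[i][j] for i in idx_x for j in idx_x if i != j) / (m * (m - 1))
    kyy = sum(K[i][j] for i in idx_y for j in idx_y if i != j) / (n * (n - 1))
    kxy = sum(K[i][j] for i in idx_x for j in idx_y) / (m * n)
    return kxx + kyy - 2.0 * kxy

def permutation_test(K, n_x, num_perms=2000, alpha=0.05, seed=0):
    """Return (observed statistic, (1 - alpha) null quantile, reject H0?)."""
    rng = random.Random(seed)
    idx = list(range(len(K)))
    observed = mmd_u(K, idx[:n_x], idx[n_x:])
    null = []
    for _ in range(num_perms):
        rng.shuffle(idx)                      # random relabelling of the pooled sample
        null.append(mmd_u(K, idx[:n_x], idx[n_x:]))
    null.sort()
    threshold = null[math.ceil((1.0 - alpha) * num_perms) - 1]
    return observed, threshold, observed > threshold

# Stand-in data: two well separated scalar samples with a Gaussian kernel.
xs = [0.1 * i for i in range(8)]
ys = [5.0 + 0.1 * i for i in range(8)]
pooled = xs + ys
K = [[math.exp(-0.5 * (a - b) ** 2) for b in pooled] for a in pooled]
observed, threshold, reject = permutation_test(K, n_x=len(xs), num_perms=200)
```

Because the Gram matrix is computed once and only the labels are permuted, the cost of the permutation step is independent of how expensive the kernel is to evaluate. To follow the setup above one would replace `K` by the signature kernel Gram matrix of the 2n sampled random walks and use 2,000 permutations.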
The following table provides the error rates (false positive/negative) for the two classes
of tests using the two methods.
Table 3: Error rates for hypothesis testing. Each test was run 1,000 times.
Additionally, we provide histograms that summarize the test results. The test distribu-
tions show the distribution of MMDu over the 1,000 independent trials. The null distribution
shown is generated from a permutation test for a single trial. The red line shows the 0.95
quantile, and represents the threshold for that trial. The histograms for H0 being false are
shown in Figure 10 and the histograms for H0 being true are shown in Figure 11.
We find that the Lie group path signatures significantly outperform Euclidean path
signatures. This is due to the fact that the Euclidean representation of the data is ill-suited
for this problem. We are aiming to detect a slight drift in the direction of the rotation,
which is a translation invariant feature in SO(3). However, this is not a translation invariant
feature in the Euclidean representation of the problem, so the effect of the drift is confounded
in the Euclidean path signature.
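The difference between the two representations can be seen directly on the increments of a path. In the sketch below (an illustration with hypothetical names) we assume that the Lie group signature is built from the left-invariant increments $\gamma_t^{-1}\gamma_{t+1}$ and that the Euclidean representation embeds each rotation matrix entrywise in $\mathbb{R}^9$; both conventions are assumptions made for this example.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def group_increments(P):
    """Left-invariant increments g_t^{-1} g_{t+1}; a rotation's inverse is its transpose."""
    return [matmul(transpose(a), b) for a, b in zip(P, P[1:])]

def euclidean_increments(P):
    """Entrywise differences g_{t+1} - g_t of the matrices viewed as points of R^9."""
    return [[[b[i][j] - a[i][j] for j in range(3)] for i in range(3)]
            for a, b in zip(P, P[1:])]

def max_abs_diff(Xs, Ys):
    return max(abs(X[i][j] - Y[i][j])
               for X, Y in zip(Xs, Ys) for i in range(3) for j in range(3))

walk = [rot_z(0.1 * t) for t in range(5)]   # steady drift about the z-axis
Q = rot_x(1.0)                              # a different starting orientation
shifted = [matmul(Q, g) for g in walk]      # the same motion, translated by Q

group_gap = max_abs_diff(group_increments(walk), group_increments(shifted))
euclid_gap = max_abs_diff(euclidean_increments(walk), euclidean_increments(shifted))
```

Here `group_gap` is zero up to rounding, while `euclid_gap` is not: translating the walk by `Q` leaves the group increments unchanged but alters the increments of the embedded path, so a random starting point mixes into the Euclidean features.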
Figure 10: Test (top) and null (bottom) distributions of MMDu when H0 is false.
Figure 11: Test (top) and null (bottom) distributions of MMDu when H0 is true.
6. Conclusion
We have defined the path signature for Lie group valued time series and studied several of its
properties, the main result being the universal and characteristic properties of the signature
kernel. Because the signature is defined using only the derivative of the path, computational
techniques for Euclidean valued paths carry over cleanly to Lie group valued paths. Our
theory is validated using two detailed experiments highlighting both the universal and
characteristic properties, showing that the path signature has strong empirical performance,
while providing an interpretable feature set which can be used to better understand the
underlying phenomena.
Lie group valued data is ubiquitous; however, in previous studies analyzing such data,
the analysis pipeline can be complicated due to the ostensible complexities when dealing
with Lie groups (as described in Section 5.1). Our derivations show that Lie group valued
data can be treated in a manner nearly identical to standard Euclidean valued data.
The work in this paper provides the foundations for further studying properties and
applications of path signatures for Lie groups and more. We highlight two directions for
possible future research.
2. In Chen (1958), the path signature is defined for a manifold M by choosing a collection
of 1-forms $\{\omega_i\}_{i=1}^N$ (as no natural Lie algebra basis is available). Choose x ∈ M to be
a basepoint, and let S : P Mx → T ((RN )) be the path signature defined with respect
to these 1-forms and let α, β ∈ P Mx . If the 1-forms span the cotangent bundle of M
at every point x ∈ M , Chen’s injectivity result states that S(α) = S(β) if and only if
α and β are tree-like equivalent (in a slightly modified sense). The injectivity theorem
given in Theorem 30 is a customization of Chen’s result to the case M = G since a
basis of g∗ spans the cotangent bundle at every point.
Recently, there has been interest in studying time series evolving on manifolds, and
the development of a path signature kernel for paths on manifolds would provide a
powerful tool for geometric time series analysis. One of the difficulties of this definition
would be the representation of data on manifolds and the choice of 1-forms used. One
could begin with path signatures on parallelizable manifolds, which by definition admit
a smooth basis of vector fields and 1-forms. This would encompass all orientable 3-
dimensional manifolds.
Acknowledgments
D.L. would like to acknowledge Chad Giusti and Jakob Hansen for several helpful discus-
sions throughout this project. D.L. and R.G. are supported by the Office of the Assistant
Secretary of Defense Research & Engineering through ONR N00014-16-1-2010. D.L. is also
supported by the Natural Sciences and Engineering Research Council of Canada (NSERC)
PGS-D3.
Our goal in this section is to compute the truncated discrete signature ŜM (P ). Recall
that the discrete path signature with respect to the multi-index I = (i1 , . . . , im ) ∈ [N ]m is
computed as
\[
\hat{S}^I(\hat{\gamma}) := \sum_{(t_1, \ldots, t_m) \in \hat{\Delta}^m_T} \omega_{i_1}(p_{t_1}) \cdots \omega_{i_m}(p_{t_m}).
\]
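As an illustration of this sum, the following sketch evaluates a single signature term in two ways: by direct enumeration and by a running-sum recursion. Two conventions are assumptions made for this example: the discrete simplex is taken to consist of strictly increasing tuples $t_1 < \dots < t_m$, and $\omega_i(p_t)$ is stored as `p[t][i]` with 0-based indices. The function names are hypothetical.

```python
from itertools import combinations

def signature_term_bruteforce(p, I):
    """Sum of p[t1][i1] * ... * p[tm][im] over all t1 < ... < tm."""
    total = 0.0
    for ts in combinations(range(len(p)), len(I)):
        prod = 1.0
        for t, i in zip(ts, I):
            prod *= p[t][i]
        total += prod
    return total

def signature_term(p, I):
    """Same quantity in O(T * m) operations using cumulative sums."""
    T = len(p)
    # prev[t] holds the term for the current prefix of I using only steps < t.
    prev = [1.0] * (T + 1)            # empty multi-index: the constant 1
    for i in I:
        cur = [0.0] * (T + 1)
        for t in range(T):
            # Either step t is not the last index (cur[t]) or it is (prev[t] * p[t][i]).
            cur[t + 1] = cur[t] + prev[t] * p[t][i]
        prev = cur
    return prev[T]

# A discrete derivative with T = 4 steps in a 2-dimensional Lie algebra.
p = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0], [-1.0, 3.0]]
level1 = signature_term(p, (0,))
level2 = signature_term(p, (0, 1))
```

The recursion avoids enumerating the $\binom{T}{m}$ tuples, which is what makes higher truncation levels feasible for long time series.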
We note that the path signature computation relies only on the derivative p, so we split the
computation into two steps.
1. Compute the discrete derivative p from a discrete time series P in a Lie group G with
respect to a chosen basis of g. Note that this step is dependent on both the Lie group
G, and the choice of basis of g.
2. Compute the truncated discrete path signature ŜM (γ̂) given the discrete derivative p.
Note that this step is independent of both the Lie group G as well as the choice of
basis of g.
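For G = SO(3), the first step might look like the sketch below. It is an illustration under assumptions, not the accompanying implementation: we take the discrete derivative to be $p_t = \log(\gamma_t^{-1}\gamma_{t+1})$, expressed in the standard basis of so(3) (infinitesimal rotations about the x, y and z axes), and the function names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def so3_log(R):
    """Coordinates of log(R) in the standard basis; valid for rotation angles below pi."""
    c = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(c)
    # The antisymmetric part of R equals 2 sin(theta) times the rotation axis.
    v = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]
    # theta / (2 sin(theta)) tends to 1/2 as theta tends to 0.
    scale = 0.5 if theta < 1e-8 else theta / (2.0 * math.sin(theta))
    return [scale * x for x in v]

def discrete_derivative_so3(P):
    """p_t = log(P_t^{-1} P_{t+1}) for a list P of 3x3 rotation matrices."""
    return [so3_log(matmul(transpose(a), b)) for a, b in zip(P, P[1:])]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

P = [rot_z(0.0), rot_z(0.3), rot_z(0.5)]
p = discrete_derivative_so3(P)
```

For this series of rotations about the z-axis, the derivative has only a z-component, equal to the angle turned at each step. The output is a list of coordinate vectors, which is the only input the second step requires. Steps with a rotation angle close to $\pi$ would need separate handling, which this sketch omits.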
This abstraction allows us to write a single path signature function, though we must write
a new discrete derivative function for each Lie group we wish to consider. Let us first fix
some notation used in the pseudo-code.
2. Element-wise Multiplication. For two arrays A and B of the same size, we define
A[end] = A[T ].
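Proposition 65 Let $\psi : [1, \infty) \to [1, \infty)$ with $\psi(1) = 1$. For $t \in T_1((H))$, define
$\lambda : T_1((H)) \to (0, \infty)$ by letting $\lambda(t)$ be the unique non-negative number such that
$\|\delta_{\lambda(t)} t\|^2 = \psi(\|t\|)$. Define
\[
\Lambda : T_1((H)) \to T_1((H)), \qquad t \mapsto \delta_{\lambda(t)} t.
\]
2. If $\psi$ is injective, then so is $\Lambda$.
3. Suppose that $\sup_{x \ge 1} \psi(x)/x^2 \le 1$, $\|\psi\|_\infty < \infty$, and that $\psi$ is $K$-Lipschitz for some
$K > 0$. Then
\[
\|\Lambda(s) - \Lambda(t)\| \le \left(1 + \sqrt{K} + 2\sqrt{\|\psi\|_\infty}\right)\left(\sqrt{\|s - t\|} \vee \|s - t\|\right).
\]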
Corollary 66 Let ψ : [1, ∞) → [1, ∞) be injective satisfying ψ(1) = 1 and the conditions
of item (3.) in Proposition 65. Then, the function Λ constructed in Proposition 65 is a
tensor normalization.
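Numerically, the constant $\lambda(t)$ can be found by a one-dimensional root search. The sketch below makes several assumptions for illustration: the tensor is truncated and represented only by the squared norms of its levels, the dilation $\delta_\lambda$ scales level $m$ by $\lambda^m$, and $1 \le \psi(x) \le x^2$ so that the root lies in $[0, 1]$. The function `psi_clip` is a hypothetical stand-in chosen for simplicity; it is not the $\psi$ used in the experiments and is not injective.

```python
import math

def dilated_norm_sq(level_norms_sq, lam):
    """||delta_lam t||^2 = sum over levels m of lam^(2m) * ||t_m||^2."""
    return sum((lam ** (2 * m)) * n for m, n in enumerate(level_norms_sq))

def normalization_constant(level_norms_sq, psi, tol=1e-12):
    """Bisection for lam in [0, 1] with ||delta_lam t||^2 = psi(||t||).

    Assumes level_norms_sq[0] == 1 (t lies in T_1) and 1 <= psi(x) <= x^2.
    """
    target = psi(math.sqrt(sum(level_norms_sq)))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The dilated norm is nondecreasing in lam, so bisection applies.
        if dilated_norm_sq(level_norms_sq, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def psi_clip(x, M=4.0):
    """Hypothetical psi: the identity on squared norms up to M, constant afterwards."""
    return min(x * x, M)

big = [1.0, 9.0, 20.25, 5.0]      # squared norms of levels 0..3 of a large tensor
small = [1.0, 0.5, 0.1]           # a tensor whose squared norm is already below M
lam_big = normalization_constant(big, psi_clip)
lam_small = normalization_constant(small, psi_clip)
```

For the large tensor the search returns a constant strictly between 0 and 1 that shrinks the squared norm to 4, while the small tensor is left essentially unchanged with a constant close to 1. The normalized tensor is then obtained by multiplying level $m$ by the $m$-th power of the constant.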
References
Marcos Alexandrino and Renato Bettiol. Lie Groups and Geometric Aspects of Isometric
Actions. Springer International Publishing, 2015.
Carlos Améndola, Peter Friz, and Bernd Sturmfels. Varieties of Signature Tensors. Forum
of Mathematics, Sigma, 7:e10, 2019.
Imanol Perez Arribas, Kate Saunders, Guy Goodwin, and Terry Lyons. A signature-
based machine learning model for bipolar disorder and borderline personality disorder.
arXiv:1707.07124 [stat], July 2017.
Yuliy Baryshnikov and Emily Schlafly. Cyclicity in multivariate time series and applications
to functional MRI data. In 2016 IEEE 55th Conference on Decision and Control (CDC),
pages 1625–1630, December 2016.
Victoria Bloom, Dimitrios Makris, and Vasileios Argyriou. G3D: A gaming action dataset
and real time action recognition evaluation framework. In 2012 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition Workshops, pages 7–12, June
2012.
Jiawei Chang, Nick Duffield, Hao Ni, and Weijun Xu. Signature inversion for monotone
paths. Electronic Communications in Probability, 22, 2017.
Kuo-Tsai Chen. Integration of Paths, Geometric Invariants and a Generalized Baker-Hausdorff
Formula. Annals of Mathematics, 65(1):163–178, 1957.
Kuo-Tsai Chen. Iterated path integrals. Bulletin of the American Mathematical Society, 83
(5):831–879, September 1977.
Ilya Chevyrev and Andrey Kormilitzin. A Primer on the Signature Method in Machine
Learning. arXiv:1603.03788 [cs, stat], March 2016.
Ilya Chevyrev and Harald Oberhauser. Signature moments to characterize laws of stochastic
processes. arXiv:1810.10971 [math, stat], October 2018.
Ilya Chevyrev, Vidit Nanda, and Harald Oberhauser. Persistence Paths and Signature
Features in Topological Data Analysis. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 42(1):192–202, January 2020.
Persi Diaconis and Mehrdad Shahshahani. The Subgroup Algorithm for Generating Uniform
Random Variables. Probability in the Engineering and Informational Sciences, 1(1):15–
32, January 1987.
Joscha Diehl and Jeremy Reizenstein. Invariants of Multidimensional Time Series Based
on Their Iterated-Integral Signature. Acta Applicandae Mathematicae, 164(1):83–122,
December 2019.
Adeline Fermanian. Embedding and learning with signatures. arXiv:1911.13211 [cs, stat],
November 2019.
Peter K. Friz and Nicolas B. Victoir. Multidimensional Stochastic Processes as Rough Paths:
Theory and Applications. Cambridge Studies in Advanced Mathematics. Cambridge Uni-
versity Press, 2010.
Robin Giles. A Generalization of the Strict Topology. Transactions of the American Math-
ematical Society, 161:467–474, 1971.
Chad Giusti and Darrick Lee. Iterated Integrals and Population Time Series Analysis. In
Nils A. Baas, Gunnar E. Carlsson, Gereon Quick, Markus Szymik, and Marius Thaule,
editors, Topological Data Analysis, Abel Symposia, pages 219–246, Cham, 2020. Springer
International Publishing.
Arthur Gretton, Kenji Fukumizu, Zaı̈d Harchaoui, and Bharath K. Sriperumbudur. A Fast,
Consistent Kernel Two-Sample Test. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I.
Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems
22, pages 673–681. Curran Associates, Inc., 2009.
Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexan-
der Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13:
723–773, March 2012.
Lajos Gergely Gyurkó, Terry Lyons, Mark Kontkowski, and Jonathan Field. Extracting
information from the signature of a financial data stream. arXiv:1307.7244 [q-fin], July
2013.
Ben Hambly and Terry Lyons. Uniqueness for the signature of a path of bounded variation
and the reduced path group. Annals of Mathematics, 171(1):109–167, 2010.
Zhiwu Huang, Chengde Wan, Thomas Probst, and Luc Van Gool. Deep learning on lie
groups for skeleton-based action recognition. In The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), July 2017.
Wenzel Jakob. Numerically stable sampling of the von Mises Fisher distribution on $S^2$
(and other tricks). 2012.
Patrick Kidger and Terry Lyons. Signatory: Differentiable computations of the signature
and logsignature transforms, on both CPU and GPU. arXiv:2001.00706 [cs, stat], Jan-
uary 2020.
Franz J. Kiraly and Harald Oberhauser. Kernels for Sequentially Ordered Data. Journal
of Machine Learning Research, 20(31):1–45, 2019.
Stefan Knerr, Léon Personnaz, and Gérard Dreyfus. Single-layer learning revisited: A
stepwise procedure for building and training a neural network. In Françoise Fogelman
Soulié and Jeanny Hérault, editors, Neurocomputing, NATO ASI Series, pages 41–50,
Berlin, Heidelberg, 1990. Springer.
Enrico Le Donne. A Primer on Carnot Groups: Homogenous Groups, Carnot-Carathéodory
Spaces, and Regularity of Their Isometries. Analysis and Geometry in Metric Spaces, 5
(1):116, 2017.
John M. Lee. Introduction to Smooth Manifolds. Graduate Texts in Mathematics. Springer-
Verlag, New York, 2003.
Yanshan Li, Tianyu Guo, Xing Liu, and Rongiie Xia. Skeleton-based Action Recognition
with Lie Group and Deep Neural Networks. In 2019 IEEE 4th International Conference
on Signal and Image Processing (ICSIP), pages 26–30, July 2019.
Terry Lyons. esig. https://esig.readthedocs.io/en/latest/.
Terry Lyons. Differential equations driven by rough signals. Revista Matemática Iberoamer-
icana, 14(2):215–310, 1998.
Terry Lyons. Rough paths, Signatures and the modelling of functions on streams.
arXiv:1405.4537 [math, q-fin, stat], May 2014.
Terry Lyons and Zhongmin Qian. System Control and Rough Paths. Clarendon, Oxford,
2007.
Terry Lyons and Weijun Xu. Hyperbolic development and inversion of signature. Journal
of Functional Analysis, 272(7):2933–2955, April 2017.
Terry Lyons, Michael Caruana, and Thierry Lévy. Differential Equations Driven by Rough
Paths: Ecole d’Eté de Probabilités de Saint-Flour XXXIV-2004. École d’Été de Proba-
bilités de Saint-Flour. Springer-Verlag, Berlin Heidelberg, 2007.
Terry Lyons, Hao Ni, and Harald Oberhauser. A feature set for streams and an application
to high-frequency financial tick data. In BigDataScience ’14, 2014.
P. J. Moore, T. J. Lyons, J. Gallacher, and Alzheimer’s Disease Neuroimaging Initiative.
Using path signatures to predict a diagnosis of Alzheimer’s disease. PloS One, 14(9):
e0222212, 2019.
Max Pfeffer, Anna Seigal, and Bernd Sturmfels. Learning paths from signature tensors.
SIAM Journal on Matrix Analysis and Applications, 40(2):394–416, 2019.
Philipp Probst, Marvin N. Wright, and Anne-Laure Boulesteix. Hyperparameters and
tuning strategies for random forest. WIREs Data Mining and Knowledge Discovery, 9
(3):e1301, 2019.
Jeremy F. Reizenstein and Benjamin Graham. Algorithm 1004: The Iisignature Library:
Efficient Calculation of Iterated-Integral Signatures and Log Signatures. ACM Transac-
tions on Mathematical Software, 46(1):8:1–8:21, March 2020.
Manel Rhif, Hazem Wannous, and Imed Riadh Farah. Action Recognition from 3D Skele-
ton Sequences using Deep Networks on Lie Group Features. In 2018 24th International
Conference on Pattern Recognition (ICPR), pages 3427–3432, August 2018.
J. M. Selig. Lie Groups and Lie Algebras in Robotics. In Jim Byrnes, editor, Computa-
tional Noncommutative Algebra and Applications, NATO Science Series II: Mathematics,
Physics and Chemistry, pages 101–125, Dordrecht, 2004. Springer Netherlands.
Raviteja Vemulapalli and Rama Chellappa. Rolling Rotations for Recognizing Human Ac-
tions from 3D Skeletal Data. In 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 4471–4479, June 2016.
Raviteja Vemulapalli, Felipe Arrate, and Rama Chellappa. Human Action Recognition
by Representing 3D Skeletons as Points in a Lie Group. In 2014 IEEE Conference on
Computer Vision and Pattern Recognition, pages 588–595, June 2014.
Weixin Yang, Lianwen Jin, and Manfei Liu. DeepWriterID: An End-to-End Online Text-
Independent Writer Identification System. IEEE Intelligent Systems, 31(2):45–53, March
2016.
Weixin Yang, Terry Lyons, Hao Ni, Cordelia Schmid, and Lianwen Jin. Developing the Path
Signature Methodology and its Application to Landmark-based Human Action Recogni-
tion. arXiv:1707.03993 [cs], December 2019.
Benjamin J. Zimmerman, Ivan Abraham, Sara A. Schmidt, Yuliy Baryshnikov, and Fa-
tima T. Husain. Dissociating tinnitus patients from healthy controls using resting-state
cyclicity analysis and clustering. Network Neuroscience, pages 1–23, April 2018.