
Space-Time Continuous PDE Forecasting using

Equivariant Neural Fields

David M. Knigge*,1, David R. Wessels*,1, Riccardo Valperga1, Samuele Papa1, Jan-Jakob Sonke2, Efstratios Gavves†,1, Erik J. Bekkers†,1
1 University of Amsterdam   2 Netherlands Cancer Institute
[email protected], [email protected]
arXiv:2406.06660v1 [cs.LG] 10 Jun 2024

Abstract

Recently, Conditional Neural Fields (NeFs) have emerged as a powerful modelling paradigm for PDEs, by learning solutions as flows in the latent space of the Conditional NeF. Although benefiting from favourable properties of NeFs such as grid-agnosticity and space-time-continuous dynamics modelling, this approach limits the ability to impose known constraints of the PDE on the solutions - e.g. symmetries or boundary conditions - in favour of modelling flexibility. Instead, we propose a space-time continuous NeF-based solving framework that - by preserving geometric information in the latent space - respects known symmetries of the PDE. We show that modelling solutions as flows of pointclouds over the group of interest G improves generalization and data-efficiency. We validate that our framework readily generalizes to unseen spatial and temporal locations, as well as geometric transformations of the initial conditions - where other NeF-based PDE forecasting methods fail - and improves over baselines in a number of challenging geometries.

1 Introduction

Partial Differential Equations (PDEs) are a foundational tool in modelling and understanding spatio-
temporal dynamics across diverse scientific domains. Classically, PDEs are solved using numerical
methods such as finite elements, finite volumes, or spectral methods. In recent years, Deep Learning
(DL) methods have emerged as promising alternatives due to the abundance of observed and simulated
data as well as the accessibility of computational resources, with applications ranging from fluid
simulations and weather modelling [49, 7] to biology [32].

* shared first author, † shared lead advising

Preprint. Under review.

Figure 1: We propose to solve an equivariant PDE in function space by solving an equivariant ODE in latent space. Through our proposed framework, which leverages Equivariant Neural Fields f_θ, a field ν_t is represented by a set of latents z_t^ν = {(p_i^ν, c_i^ν)}_{i=1}^N consisting of poses p_i and context vectors c_i. Using meta-learning, the initial latent z_0^ν is fit in only 3 SGD steps, after which an equivariant neural ODE F_ψ models the solution as a latent flow.
The systems modelled by PDEs often have underlying symmetries. For example, heat diffusion
or fluid dynamics can be modeled with differential operators which are rotation equivariant, e.g.,
given a solution to the system of PDEs, its rotation is also a valid solution¹. In such scenarios it is
sensible, and even desirable, to design neural networks that incorporate and preserve such symmetries
to improve generalization and data-efficiency [12, 47, 4].
Crucially, DL-based approaches often rely on data sampled on a regular grid, without the inherent
ability to generalize outside of it, which is restrictive in many scenarios [39]. To this end, [49] propose
to use Neural Fields (NeFs) for modelling and forecasting PDE dynamics. This is done by fitting a
neural ODE [11] to the conditioning variables of a conditional Neural Field trained to reconstruct
states of the PDE [13]. However, this approach fails to leverage aforementioned known symmetries
of the system. Furthermore, using neural fields as representations has proved difficult due to the
non-linear nature of neural networks [13, 3, 34], limiting performance in more challenging settings.
We posit that NeF-based modelling of PDE dynamics benefits from representations that account
for the symmetries of the system as this allows for introducing inductive biases into the model that
ought to be reflected in its solutions. Furthermore, we show that meta-learning [28, 44] the NeF
backbone improves performance for complex PDEs by further structuring the NeF's latent space,
simplifying the task of the neural ODE.
We introduce a framework for space-time continuous equivariant PDE solving, by adapting a class
of SE(n)-Equivariant Neural Fields (ENFs) to PDE-specific symmetries. We leverage the ENF as
representation for modelling spatiotemporal dynamics. We solve PDEs by learning a flow in the
latent space of the ENF - starting at a point z0 corresponding to the initial state of the PDE - with an
equivariant graph-based neural ODE [11] we develop from previous work [5]. We extend the ENF
to equivariances beyond SE(n), by extending its weight-sharing scheme to equivalence classes for
specific symmetries relevant to our setting. Furthermore, we show how meta-learning [14, 28, 44, 13],
can not only significantly reduce inference time of the proposed framework, but also substantially
simplify the structure of the latent space of the ENF, thereby simplifying the learning process of the
latent dynamics for the neural ODE model. We present the following contributions:

• We introduce a framework for spatio-temporally continuous PDE solving that respects known symmetries of the PDE through equivariance constraints.
• We show that correctly chosen equivariance constraints as inductive biases improve performance of the solver - in terms of MSE - in spatio-temporally continuous settings, i.e. evaluated off the training grid and beyond the training horizon.
• We show how meta-learning improves the structure of the latent space of the ENF, simplifying the learning process and leading to better performance in solving PDEs.

We structure the paper as follows: in Sec. 2 we provide an overview of the mathematical preliminaries
and describe the problem setting. Our proposed framework is introduced in Sec. 3. We validate
our framework on different PDEs defined over a variety of geometries in Sec. 4, with differing
equivariance constraints, showing competitive performance compared to other neural PDE solvers. We provide
an in-depth positioning of our approach in relation to other work in Appx. A.

2 Mathematical background and problem setting


Continuous spatiotemporal dynamics forecasting. The setting considered is data-driven learning of the dynamics of a system described by continuous observables. In particular, we consider flows of fields, denoted ν̂: R^d × [0, T] → R^c. We use ν̂_t as a shorthand for ν̂(·, t). We assume the flow is governed by a PDE, and consider the Initial Value Problem (IVP) of predicting ν̂_t from a given ν_0. The dataset consists of field snapshots ν: X × ⟦T⟧ → R^c, in which ⟦T⟧ := {1, 2, . . . , T} denotes the set of time points at which the flow is sampled and X ⊂ R^d is a set of coordinate values. For each time point we are given a set of input-output pairs [X, ν(X)], where ν(X) ⊂ R^c are the values of the field at those coordinates. Importantly, the locations at which the field is sampled need not be regular, i.e., we do not require the training data to be on a grid or to be regularly spaced in time, nor need coordinate values be identical for train and test sets. Following [49], we distinguish between tin - referring to values within the training time horizon [0, T] - and tout - referring analogously to values beyond T.
¹ Assuming boundary conditions are symmetric, i.e. they transform according to the relevant group action.

Neural Fields in dynamics modelling. Conditional Neural fields (NeFs) are a class of coordinate-
based neural networks, often trained to reconstruct discretely-sampled input continuously. More
specifically, a conditional neural field f_θ: R^n → R^d is a field - parameterized by a neural network with parameters θ - that maps input coordinates x ∈ R^n in the data domain alongside conditioning latents z to d-dimensional signal values ν(x) ∈ R^d. By associating a conditioning latent z^ν ∈ R^c with each signal ν, a single conditional NeF f_θ: R^n × R^c → R^d can learn to represent families D of continuous signals such that ∀ν ∈ D: ν(x) ≈ f_θ(x; z^ν). [49] propose to use conditional NeFs for PDE modelling by learning a continuous flow in the latent space of a conditional neural field. In particular, a set of latents {z_i^ν}_{i=1}^T is obtained by fitting a conditional neural field to a given set of observations {ν_i}_{i=1}^T at timesteps 1, ..., T; simultaneously, a neural ODE [11] F_ψ is trained to
map pairs of temporally contiguous latents s.t. solutions correspond to the trajectories traced by the
learned latents. Though this approach yields impressive results for sparse and irregular data in planar
PDEs, we show it breaks down on complex geometries. We hypothesize that this is due to lack of
a latent space that preserves relevant geometric transformations that characterize the symmetries
of the systems we are modelling, and as such propose an extension of this framework where such
symmetries are preserved.

Symmetries and weight sharing. Given a group G with identity element e ∈ G, and a set X, a
group action is a map T : G × X → X. For simplicity, we denote the action of g ∈ G on x ∈ X
as gx := T (g, x), and call G-space a smooth manifold equipped with a G action. A group action
is homomorphic to G with its group product, namely it is such that ex = x and (gh)x = g(hx).
As an example, we are interested in the Special Euclidean group SE(n) = R^n ⋊ SO(n): group elements of SE(n) are identified by a translation t ∈ R^n and a rotation R ∈ SO(n), with group operation g g′ = (t, R)(t′, R′) = (R t′ + t, R R′). We denote by L_g the left action of G on function spaces, defined as L_g f(x′) = f(g^{-1} x′) = f(R^{-1}(x′ − t)). Many PDEs are defined by
equivariant differential operators such that for a given state ν: Lg N [ν] = N [Lg ν]. If the boundary
conditions do not break the symmetry, namely if the boundary is symmetric with respect to the same
group action, then a G-transformed solution to the IVP for some ν0 corresponds to the solution
for the G-transformed initial value. For example, laws of physics do not depend on the choice
of coordinate system; this implies that many PDEs are defined by SE(n)-equivariant differential
operators. The geometric deep learning literature shows that models can benefit from leveraging the
inherent symmetries or invariances present in the data by constraining the searchable function space
through weight sharing [9, 25, 5]. Recall that in our framework we model flows of fields, solutions to
PDEs defined by equivariant differential operators, with ordinary differential equations in the latent
space of conditional neural fields. We leverage the symmetries of the system for two key aspects
of the proposed method: first by making the relation between signals and corresponding latents
equivariant; second, by using equivariant ODEs, namely ODEs defined by equivariant vector fields: if
dz
dτ =F (z) is such that F (gz) = gF (z), then solutions are mapped to solutions by the group action.
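For illustration, the following minimal NumPy sketch (helper names are ours and purely illustrative, not part of the released code) checks the SE(2) conventions above: the group product, and that the left action on functions is homomorphic, i.e. L_g(L_h f) = L_{gh} f.

```python
# Numerical check of the SE(2) group product and left action (illustrative only).
import numpy as np

def rot(theta):
    """2D rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def compose(g, h):
    """SE(2) product (t, R)(t', R') = (R t' + t, R R')."""
    (t_g, R_g), (t_h, R_h) = g, h
    return (R_g @ t_h + t_g, R_g @ R_h)

def left_action(g, f):
    """(L_g f)(x) = f(g^{-1} x) = f(R^{-1}(x - t))."""
    t, R = g
    return lambda x: f(R.T @ (x - t))

f = lambda x: np.sin(x[0]) + x[1] ** 2          # an arbitrary scalar field
g = (np.array([1.0, -0.5]), rot(0.3))
h = (np.array([0.2, 2.0]), rot(-1.1))
x = np.array([0.7, 1.3])

lhs = left_action(g, left_action(h, f))(x)      # L_g (L_h f)(x)
rhs = left_action(compose(g, h), f)(x)          # L_{gh} f(x)
assert np.isclose(lhs, rhs)                     # the action is homomorphic to the group product
```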

3 Method
We adapt the work of [49], and consider the following optimization problem²:

min_{θ,ψ,z_τ^ν}  E_{ν∈D, x∈X, t∈⟦T⟧} ‖ν_t(x) − f_θ(x; z_t^ν)‖²_2 ,   where   z_t^ν = z_0^ν + ∫_0^t F_ψ(z_τ^ν) dτ ,    (1)

with f_θ(x; z_t^ν) a decoder tasked with reconstructing state ν_t from latent z_t^ν, and F_ψ a neural ODE that maps a latent to its temporal derivative, dz_τ^ν/dτ = F_ψ(z_τ^ν), modelling the solution as a flow in latent space starting at the initial latent z_0^ν - see Fig. 1 for a visual intuition.
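A minimal PyTorch sketch of this single-objective training signal, assuming a forward-Euler integrator for the latent ODE; `enf_decoder`, `latent_ode` and the toy shapes below are placeholders standing in for f_θ and F_ψ, not the released implementation.

```python
import torch

def unroll_latent(z0, latent_ode, t_steps, dt=1.0):
    """Forward-Euler integration of dz/dtau = F_psi(z), returning z_0 ... z_T."""
    zs = [z0]
    for _ in range(t_steps):
        zs.append(zs[-1] + dt * latent_ode(zs[-1]))
    return torch.stack(zs)

def reconstruction_loss(enf_decoder, latent_ode, coords, states, z0):
    """Single-objective MSE of Eq. 1: decode every unrolled latent and compare to nu_t."""
    zs = unroll_latent(z0, latent_ode, t_steps=states.shape[0] - 1)
    preds = torch.stack([enf_decoder(coords, z) for z in zs])
    return ((preds - states) ** 2).mean()

# toy usage with stand-in modules (shapes only, no real PDE data)
coords = torch.rand(100, 2)                      # 100 sample locations in R^2
states = torch.rand(10, 100, 1)                  # nu_t at 10 timesteps
z0 = torch.zeros(4, 8, requires_grad=True)       # latent set: 4 tokens, 8 channels
latent_ode = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Tanh(), torch.nn.Linear(8, 8))
enf_decoder = lambda x, z: torch.zeros(x.shape[0], 1) + z.mean()   # placeholder decoder
loss = reconstruction_loss(enf_decoder, latent_ode, coords, states, z0)
loss.backward()                                  # gradients reach z0, F_psi (and a real f_theta)
```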

Equivariant space-time continuous dynamics forecasting. A PDE defined by a G-equivariant differential operator - for which L_g N[ν] = N[L_g ν] - is such that solutions are mapped to other solutions by the group action if the boundary conditions are symmetric. We would like to leverage this property, and constrain the neural ODE F_ψ such that the solutions it finds in latent space can be mapped onto each other by the group action. Our motivation for this is twofold: (1) it is natural for
² We highlight that [49] optimize latents z_t^ν, neural field f_θ, and ODE F_ψ using two separate objectives. We instead found that our framework is more stable under single-objective optimization.

our model to have, by construction, the geometric properties that the modelled system is known to possess; (2) to get more structured latent representations and facilitate the job of the neural ODE. To achieve this we first need the latent space Z to be equipped with a well-defined group action with respect to which ∀g ∈ G, z ∈ Z: F_ψ(gz) = g F_ψ(z), and, most importantly, we need the relation between the reconstructed field and the corresponding latent to be equivariant, i.e.,

∀g ∈ G, x ∈ X: L_g f_θ(x; z_t^ν) = f_θ(g^{-1} x; z_t^ν) = f_θ(x; g z_t^ν).    (2)

Note that, somewhat imprecisely, we call this condition equivariance to convey the idea even though it is not, strictly speaking, the commonly used definition of equivariance for general operators. If we consider the decoder as a mapping from latents to fields, we can make the notion of equivariance of this mapping more precise. Namely,

f(x) = D_θ(z),  D_θ(z): z_t^ν ↦ f_θ(·; z_t^ν) ,    f(g^{-1}x) = D_θ(gz),  D_θ(gz): g z_t^ν ↦ f_θ(g^{-1} ·; z_t^ν) .    (3)
In Sec. 3.1 we describe the Equivariant Neural Field (ENF)-based decoder, which satisfies equation (2). Second, in Sec. 3.2 we outline the graph-based equivariant neural ODE. Sec. 3.3 explains the motivation for - and use of - meta-learning for obtaining the ENF backbone parameters. We show how the combination of equivariance and meta-learning produces much more structured latent representations of continuous signals (Fig. 3).

Figure 2: The proposed framework respects predefined symmetries of the PDE: a rotated solution L_g ν_T may be obtained either by solving from latent z_0^ν (top-left) and transforming the solution z_T^ν (top-right) to g z_T^ν (bottom-right), or by transforming z_0^ν to g z_0^ν (bottom-left) and solving from there.

3.1 Representing PDE states with Equivariant Neural Fields

We briefly recap ENFs here, referring the reader to [48] for more detail. We extend ENFs to symmetries for PDEs over varying geometries.
ENFs as cross-attention over bi-invariant attributes. Attention-based conditional neural fields represent a signal ν ∈ D with a corresponding latent set z^ν [50]. This class of conditional neural fields obtains signal-specific reconstructions ν(x) ≈ f_θ(x; z^ν) through a cross-attention operation between the latent set z^ν and input coordinates x. ENFs [48] extend this approach by imposing equivariance constraints w.r.t. a group G ⊆ SE(n) on the relation between the neural field and the latents, such that transformations of the signal ν correspond to transformations of the latent z^ν (Eq. (2)). For this condition to hold, we need a well-defined action on the latent space Z of f_θ. To this end, ENFs define elements of the latent set z^ν as tuples of a pose p_i ∈ G and context c_i ∈ R^d, z^ν := {(p_i, c_i)}_{i=1}^N. The latent space is then equipped with a group action defined as g z = {(g p_i, c_i)}_{i=1}^N. To achieve equivariance, ENFs follow [5], where equivariance is achieved through convolutional weight-sharing over equivalence classes of point pairs x, x′; ENFs extend this weight-sharing to cross-attention over bi-invariant attributes of z, x pairs.
Weight-sharing over bi-invariant attributes of z, x is motivated by Eq. 2, by which we have:

f_θ(x; z) = f_θ(gx; gz).    (4)

Intuitively, the above equation says that a transformation g on the domain of f_θ, i.e. g^{-1}x, can be undone by also acting with g on z. In other words, the output of the neural field f_θ should be bi-invariant to g-transformations of the pair z, x. For a specific pair (z_i, x_m) ∈ Z × X, the term bi-invariant attribute a_{i,m} describes a function a: (z_i, x_m) ↦ a(z_i, x_m) such that a(z_i, x_m) = a(g z_i, g x_m). Throughout the paper we use a_{i,m} as shorthand for a(z_i, x_m).
To parameterize f_θ, we can accordingly choose any function that is bi-invariant to G-transformations of z, x. In particular, for an input coordinate x_m, ENFs choose to make f_θ a cross-attention operation between attributes a_{i,m} and the invariant context vectors c_i:

f_θ(x_m; z) = cross_attn(a_{:,m}, c_:, c_:)    (5)
As an example, for SE(n)-equivariance, we can define the bi-invariant simply using the group action: a^{SE(n)}_{i,m} = p_i^{-1} x_m = R_i^T (x_m − x_i), which is bi-invariant by:

∀g ∈ SE(n): (p_i, x) ↦ (g p_i, g x)  ⇔  p_i^{-1} x ↦ (g p_i)^{-1} g x = p_i^{-1} g^{-1} g x = p_i^{-1} x .    (6)
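A small NumPy sketch of this SE(2) bi-invariant together with a numerical bi-invariance check; helper names are our own and purely illustrative.

```python
# a(p, x) = R^T (x - t) is unchanged when pose and coordinate are transformed by the same g.
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def se2_bi_invariant(pose, x):
    """a(p, x) = R^T (x - t) for a pose p = (t, R) and coordinate x (Eq. 6)."""
    t, R = pose
    return R.T @ (x - t)

def transform_pose(g, pose):
    """g p = (R_g t_p + t_g, R_g R_p): the group acting on a latent pose."""
    (t_g, R_g), (t_p, R_p) = g, pose
    return (R_g @ t_p + t_g, R_g @ R_p)

def act(g, x):
    t, R = g
    return R @ x + t

pose = (np.array([0.5, -1.0]), rot(0.7))
x = np.array([2.0, 0.3])
g = (np.array([-3.0, 1.2]), rot(2.1))

a = se2_bi_invariant(pose, x)
a_transformed = se2_bi_invariant(transform_pose(g, pose), act(g, x))
assert np.allclose(a, a_transformed)             # a(g p, g x) == a(p, x)
```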

Bi-invariant attributes for PDE solving. As explained above, ENF is equivariant to SE(n)-
transformations by defining fθ as a function of an SE(n)−bi-invariant attribute aSE(n) . Although
many physical processes adhere to roto-translational symmetries, we are also interested in solving
PDEs that - due to the geometry of the domain, their specific formulation, and/or their boundary
conditions - are not fully SE(n)−equivariant. As such, we are interested in extending ENFs to
equivariances that are not strictly (subsets of) SE(n), which we show we can achieve by finding
bi-invariants that respect these particular transformations. Below, we provide two examples; the other invariants we use in the experiments - including a "bi-invariant" a^∅ that is not actually bi-invariant to any geometric transformations, which we use to ablate over equivariance constraints - are given in Appx. D.
The flat 2-torus. When the physical domain of interest is continuous and extends indefinitely, periodic
boundary conditions are often used, i.e. the PDE is defined over a space topologically equivalent
to that of the 2-torus. Such boundary conditions break SO(2) symmetries; assuming the domain
has periodicity π and none of the terms of this PDE depend on the choice of coordinate frame,
these boundary conditions imply that the PDE is equivariant to periodic translations: the group of
translations modulo π: T² ≡ R²/Z². In this case, periodic functions over x, y with periods π would work as a bi-invariant, i.e. using poses p ∈ T², a^{T²} = cos(2π(x^0 − p^0)) + cos(2π(x^1 − p^1)) - which happens to be bi-invariant to rotations by π/2 as well. Instead, since we do not assume any rotational symmetries to exist on the torus, we opt for a non-rotationally symmetric function:

a^{T²}_{i,m} = cos(2π(x^0_m − p^0_i)) ⊕ cos(2π(x^1_m − p^1_i)),    (7)

where ⊕ denotes concatenation. This bi-invariant is used in experiments on Navier-Stokes over the flat 2-Torus.
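A minimal NumPy sketch of this torus bi-invariant, assuming the domain is rescaled to unit period (an assumption on our part for readability): shifting the pose and coordinate by the same periodic translation - even when the shifted values are wrapped back into the fundamental domain - leaves the attribute unchanged.

```python
import numpy as np

def torus_bi_invariant(p, x):
    """a(p, x) = cos(2*pi*(x0 - p0)) ++ cos(2*pi*(x1 - p1)), one entry per axis (Eq. 7)."""
    return np.cos(2 * np.pi * (x - p))

p = np.array([0.15, 0.80])
x = np.array([0.40, 0.05])
shift = np.array([0.37, 0.91])                   # a periodic translation

a = torus_bi_invariant(p, x)
a_shifted = torus_bi_invariant((p + shift) % 1.0, (x + shift) % 1.0)
assert np.allclose(a, a_shifted)                 # invariant to translations on the torus
```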
The 2-sphere. In some settings a PDE may be symmetric only to rotations about a certain axis. An example is that of the global shallow-water equations on the two-sphere - used to model geophysical processes such as atmospheric flow [16] - which are characterised by rotational symmetry only along the earth's axis of rotation due to the inclusion of a term for Coriolis acceleration that breaks full SO(3) equivariance. We use poses p ∈ SO(3) parametrised by Euler angles ϕ, θ, γ, and spherical coordinates ϕ, θ for x ∈ S². We make the first two Euler angles coincide with the spherical coordinates and define a bi-invariant for rotations around the axis θ = π:

a^{SW}_{i,m} = Δϕ_{p_i,x_m} ⊕ θ_{p_i} ⊕ γ_{p_i} ⊕ θ_{x_m},    (8)

where Δϕ_{p_i,x_m} = ϕ_{p_i} − ϕ_{x_m} − 2π if ϕ_{p_i} − ϕ_{x_m} > π, and Δϕ_{p_i,x_m} = ϕ_{p_i} − ϕ_{x_m} + 2π if ϕ_{p_i} − ϕ_{x_m} < −π, to adjust for periodicity.
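A NumPy sketch of this shallow-water bi-invariant (names are illustrative): the longitude difference is wrapped into (−π, π] so the attribute depends only on the relative azimuth, making it invariant to rotations about the globe's axis.

```python
import numpy as np

def wrap_angle(d):
    """Periodicity adjustment for delta-phi described in the text."""
    if d > np.pi:
        return d - 2 * np.pi
    if d < -np.pi:
        return d + 2 * np.pi
    return d

def sw_bi_invariant(pose, x):
    """pose = (phi_p, theta_p, gamma_p) Euler angles, x = (phi_x, theta_x) spherical coords (Eq. 8)."""
    phi_p, theta_p, gamma_p = pose
    phi_x, theta_x = x
    return np.array([wrap_angle(phi_p - phi_x), theta_p, gamma_p, theta_x])

pose = (0.3, 1.1, -0.4)
x = (2.9, 0.6)
alpha = 1.7                                       # rotation about the polar axis

a = sw_bi_invariant(pose, x)
a_rot = sw_bi_invariant((wrap_angle(pose[0] + alpha), pose[1], pose[2]),
                        (wrap_angle(x[0] + alpha), x[1]))
assert np.allclose(a, a_rot)                      # invariant to azimuthal rotations
```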
In summary, to parameterize an ENF equivariant with respect to a specific group we are simply
required to find attributes that are bi-invariant with respect to the same group. In general we achieve
this by using group-valued poses and their action on the PDE domain.

3.2 PDE solution as latent space flow

Let z_0^ν be a latent set that faithfully reconstructs the initial state ν_0. We want to define a neural ODE F_ψ that maps latents z_t^ν to their temporal derivatives, dz_τ^ν/dτ = F_ψ(z_τ^ν), and that is equivariant with respect to the group action: g F_ψ(z_τ^ν) = F_ψ(g z_τ^ν). To this end, we use a message passing neural network (MPNN) to learn a flow of poses p_i and contexts c_i over time. We base our architecture on PΘNITA [5], which employs convolutional weight-sharing over bi-invariants for SE(n). For an in-depth recap of message-passing frameworks, we refer the reader to Appx. A. Since F_ψ is required to be equivariant w.r.t. the group action, any updates to the poses p_i should also be equivariant. [40] propose to parameterize an equivariant node position update by using a basis spanned by relative node positions x_j − x_i. In our setting, poses p_i are points on a manifold M equipped with a group action. As such, we analogously propose parameterizing pose updates by a weighted combination of logarithmic maps log_{p_i}(p_j), which intuitively describe the relative position between p_i, p_j in the tangent space T_{p_i}M, or the displacement from p_i to p_j. We integrate the resulting pose update over the manifold through the exponential map exp_{p_i}. In the Euclidean case log_{p_i}(p_j) = x_j − x_i and we recover the node position updates of [40].
Figure 3: We show the impact of meta-learning and equivariance on the latent space of the ENF when representing trajectories of PDE states. Fig. 3a shows a T-SNE plot of the latent space of f_θ when z_t^ν is optimized with autodecoding, and no weight sharing over bi-invariants is enforced. Fig. 3b shows the latent space when meta-learning is used, but no weight sharing is enforced. Fig. 3c shows the latent space when z_t^ν are obtained using meta-learning and f_θ shares weights over a^{SE(n)}.

In short, the message passing layers we use consist of the following update functions:

c_i^{l+1} = Σ_{(p_j^l, c_j^l) ∈ z^{ν,l}} k^{context}(a^l_{i,j}) c_j^l ,    p_i^{l+1} = exp_{p_i^l}( (1/N) Σ_{(p_j^l, c_j^l) ∈ z^{ν,l}} k^{pose}(a^l_{i,j}) c_j^l log_{p_i^l}(p_j^l) ) ,    (9)

with k^{context}, k^{pose} message functions weighting the incoming context and pose updates, each parameterized by a two-layer MLP as a function of the respective bi-invariant.
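A minimal PyTorch sketch of this update, specialized to the Euclidean case where log_p(q) = q − p and exp_p(v) = p + v (the general case would use the manifold's exp/log maps). The module names and the particular way the context vectors enter the pose weights are our own illustrative choices, not the released implementation; the key point is that all learned weights depend only on the bi-invariant attributes, so updates commute with the group action.

```python
import torch
import torch.nn as nn

class LatentFlowLayer(nn.Module):
    def __init__(self, attr_dim, context_dim, pose_dim):
        super().__init__()
        self.k_context = nn.Sequential(nn.Linear(attr_dim, 64), nn.GELU(),
                                       nn.Linear(64, context_dim))
        self.k_pose = nn.Sequential(nn.Linear(attr_dim, 64), nn.GELU(),
                                    nn.Linear(64, context_dim))
        self.pose_proj = nn.Linear(context_dim, 1, bias=False)   # scalar weight per (i, j) pair

    def forward(self, poses, contexts, attributes):
        # poses: (N, pose_dim), contexts: (N, context_dim), attributes: (N, N, attr_dim)
        n = poses.shape[0]
        # context update: sum over neighbours j of k_context(a_ij) applied to c_j
        w_c = self.k_context(attributes)                          # (N, N, context_dim)
        new_contexts = (w_c * contexts[None, :, :]).sum(dim=1)
        # pose update: weighted combination of relative poses log_{p_i}(p_j) = p_j - p_i
        rel = poses[None, :, :] - poses[:, None, :]               # (N, N, pose_dim)
        w_p = self.pose_proj(self.k_pose(attributes) * contexts[None, :, :])   # (N, N, 1)
        new_poses = poses + (w_p * rel).sum(dim=1) / n            # exp_{p_i} of the mean update
        return new_poses, new_contexts

layer = LatentFlowLayer(attr_dim=4, context_dim=16, pose_dim=2)
p, c = layer(torch.randn(8, 2), torch.randn(8, 16), torch.randn(8, 8, 4))
```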

3.3 Obtaining the initial latent z0ν

Until now we have not discussed how to obtain latents corresponding to the initial condition, z_0^ν. An approach often used in the conditional neural field literature is that of autodecoding [35], where latents z^ν
are optimized for reconstruction of the input signal ν with SGD. Optimizing a NeF for reconstruction
does not necessarily lead to good quality representations [34], i.e. using MSE-based autodecoding to
obtain latents ztν - as is proposed by [49] - may complicate the latent space, impeding optimization
of the neural ODE Fψ . Moreover, autodecoding requires many optimization steps at inference (for
reference, [49] use 300-500 steps). [13] propose meta-learning as a way to overcome long inference
times, as it allows for fitting latents in a few steps - typically three or four. We hypothesize that
meta-learning may also structure the latent space - similar to the impact of equivariance constraints,
since the very limited number of optimization steps requires efficient organization of latents ztν
around the (shared) initialization, forcing together the latent representation of contiguous states. To
this end, we propose to use meta-learning for obtaining the initial latent z_0^ν, which is then unrolled by the neural ODE F_ψ to find solutions z_t^ν.
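A minimal PyTorch sketch of such an inner loop, assuming a fixed inner learning rate (the actual setup may instead learn the rate, e.g. via Meta-SGD [28]); `decoder` is a stand-in for the ENF f_θ.

```python
import torch

def fit_initial_latent(decoder, coords, values, z_init, inner_steps=3, eps=1e-2):
    """Return z_0 after a few gradient steps on the reconstruction MSE of nu_0."""
    z = z_init.detach().clone().requires_grad_(True)
    for _ in range(inner_steps):
        loss = ((decoder(coords, z) - values) ** 2).mean()
        (grad,) = torch.autograd.grad(loss, z, create_graph=True)  # keep graph for the outer loop
        z = z - eps * grad
    return z

# usage: z0 = fit_initial_latent(decoder, coords, nu0_values, shared_latent_init)
```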

3.4 Equivariance and meta-learning structure the latent space Z

As a first validation of the hypotheses that both equivariance constraints and meta-learning introduce
structure to the latent space of fθ , we visualize latent spaces of different variants of the ENF. We fit
ENFs to a dataset consisting of solutions to the heat equation for various initial conditions (details
in Appx. E). For each sample νt , we obtain a set of latents ztν , which we average over the invariant
context vectors ci ∈ Rc to obtain a single vector in Rc invariant to a group action according to the
chosen bi-invariant. Next, we apply T-SNE [46] to the resulting vectors in Rc . We use three different
setups: (a) no meta-learning, model weights θ and latents ztν optimized for every νt separately using
autodecoding [35], and no equivariance imposed (per Eq. 15), shown in Fig. 3a. (b) meta-learning
is used to obtain θ,ztν , but no equivariance imposed, shown in Fig. 3b and (c) meta-learning is
used to obtain θ,ztν and SE(2)-equivariance is imposed by weight-sharing over aSE(n) bi-invariants,
shown in Fig. 3c. The results confirm our intuition that both meta-learning and equivariance improve
latent-space structure.
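For reference, a short Python sketch of this visualization pipeline using scikit-learn; the `latents` array below is a random placeholder standing in for the fitted z_t^ν of all states.

```python
import numpy as np
from sklearn.manifold import TSNE

latents = np.random.randn(500, 16, 32)          # (num_states, num latents N, context dim c)
pooled = latents.mean(axis=1)                   # average over the latent set -> one R^c vector per state
embedding = TSNE(n_components=2, perplexity=30).fit_transform(pooled)
# embedding is (500, 2); colour points by trajectory / time index to inspect latent-space structure
```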

Recap: optimization objective. We use a meta-learning inner loop [28, 13] to obtain the initial latent z_0^ν under supervision of coordinate-value pairs (x, ν_0(x))_{x∈X} from ν_0. This latent is unrolled

for ttrain timesteps using Fψ . The obtained latents are used to reconstruct states ztν along the trajectory
of ν, and parameters of fθ , Fψ are optimised for reconstruction MSE, as shown in the left-hand side
of Eq. 1. See Appx. B for detailed pseudocode of this process.

4 Experiments
We intend to show the impact of symmetry-preservation in continuous PDE solving. To this end we
perform a range of experiments assessing different qualities of our model on tasks with different
symmetries. First, we investigate the equivariance properties of our framework by evaluating it
against unseen geometric transformations of the initial conditions. Next, we assess generalization
and extrapolation capabilities w.r.t. unseen spatial locations and time horizons inside and outside the
time ranges seen during training respectively, robustness to partial test-time observations, and data-
efficiency. As the continuous nature of NeF-based PDE solving allows, we verify these properties for
PDEs defined over challenging geometries: the plane R², the 2-torus T², the sphere S², and the 3D ball B³. Architectural details and hyperparameters are in Appx. E. Code is attached to the submission.

4.1 Datasets and evaluation

All datasets are obtained by randomly sampling disjoint sets of initial conditions for train and test sets, and solving them using numerical methods. Dataset-specific details on generation can be found in Appx. E.

Table 1: MSE ↓ for heat equation on R².

                   t_in train        t_out train       t_in test         t_out test
DINo [49]          5.92E-04          2.40E-04          3.85E-03          5.12E-03
Ours a^∅           6.23±1.01E-06     4.90±20.1E-06     2.19±0.32E-03     5.08±13.2E-04
Ours a^{SE(2)}     1.18±0.45E-05     2.53±3.50E-05     1.50±0.77E-05     2.53±3.43E-05

• Heat equation on R² and S². The heat equation describes diffusion over a surface: dc/dt = D∇²c, where c is a scalar field and D is the diffusivity coefficient. We solve it on the 2D plane, where ∇²c = ∂²c/∂x₁² + ∂²c/∂x₂², and on the 2-sphere S², where in spherical coordinates ∇²c = (1/sin θ) ∂/∂θ (sin θ ∂c/∂θ) + (1/sin²θ) ∂²c/∂ϕ². Although a relatively simple PDE, we find that defining it over a non-trivial geometry such as the sphere proves hard for non-equivariant methods.

Figure 4: A train and test sample from the planar diffusion dataset. Initial conditions for train and test are spikes in disjoint subsets of R².

• Navier-Stokes on T². We solve 2D Navier-Stokes [42] for an incompressible fluid with dynamics dv/dt = −u∇v + µ∆v + f, v = ∇ × u, ∇ · u = 0, where u is the velocity field, v the vorticity, µ the viscosity and f a forcing term (see Appx. E). We create a dataset of solutions
for the vorticity using Gaussian random fields as initial conditions. Due to the incompressibility
condition, it is natural to solve this PDE with periodic boundary conditions corresponding to the
topology of a 2-Torus T2 - implying equivariance to periodic translation. •Shallow-water on S2 . The
global shallow-water equations model large-scale oceanic and atmospheric flow on the globe, derived
from Navier-Stokes under assumption of shallow fluid depth. The global shallow-water equations (see
Appx. E) include terms for Coriolis acceleration, which makes this problem equivariant to rotation
along the globe’s axis of rotation. We follow the IVP specified by [16], and create a dataset of paired
vorticity-fluid height solutions. •Internally-heated convection in a 3D ball. We solve the Boussinesq
equation for internally heated convection in a ball, a model relevant for example in the context of
the Earth’s mantle convection. It involves continuity equations for mass conservation, momentum
equations for fluid flow under pressure, viscous forces and buoyancy, and a term modelling heat
transfer. We generate initial conditions varying the internal temperature using N (0, 1) noise and
obtain solutions for the temperature defined over a regular spherical ϕ, θ, r grid.

Evaluation. All reported MSE values are for predictions obtained given only the initial condition v0 ,
with std over 3 runs. We evaluate two settings for train and test sets both: generalization setting with
time evolution happening within the seen horizon during training (tin ); and, extrapolation setting
with the time evolution happening outside the seen horizon during training (tout ). For both cases we
measure the mean-squared error (MSE). To position our work relative to competitive data-driven
PDE solvers, on the 2D-Navier-Stokes experiment we provide comparisons with a range of baselines.

Figure 5: A Navier-Stokes test sample (top) and corresponding predictions from our model (bottom). We visualize predictions in the train horizon tin = [0, ..., 9], tout = [10, ..., 20] and beyond. The model remains stable well beyond the train horizon, but due to accumulated errors fails to capture dynamics beyond t > 40.

Figure 6: Test MSE tin for increasing training set sizes for the heat equation over the sphere. The equivariant model improves over the non-equivariant one. For reference we show the performance of DINo [49] trained on 256 trajectories.

In most other settings these models cannot straightforwardly be applied, and we only compare to
[49], to our knowledge the only other fully continuous PDE solving method in literature.

Equivariance properties - heat equation on the plane. To verify our framework respects the posed
equivariance constraints, we create a dataset of solutions to the heat equation that requires a neural
solver to respect equivariance constraints to achieve good performance. Specifically, for initial condi-
tions we randomly insert a pulse of variable intensity in x = (x1 , x2 ) ∈ R2 s.t. −1<x1 <1, 0<x2 <1
for the training data and −1<x1 <1, −1<x2 <0 for the test data. Intuitively, train and test sets
contain spikes under different disjoint sets of roto-translations (see Fig. 4). We train variants of
our framework with (aSE(2) , Eq. 6) and without (a∅ , Eq. 15) equivariance constraints. In this
dataset, we set tin = [0, ..., 9], and evaluation horizon tout = [10, ..., 20]. Results in Tab. 1 show that
the non-equivariant model, as well as the baseline [49] are unable to successfully solve test initial
conditions, whereas the equivariant model performs well.

Robustness to subsampling & time-horizons Table 2: MSE ↓ for Navier-Stokes on T2 .


- Navier-Stokes on the 2-Torus. We perform
an experiment assessing the impact of equivari- t TRAIN t
IN TRAIN OUT t TEST tIN TEST OUT

100% OF ν OBSERVED
ance constraints and meta-learning on robust- CNODE [2] 6.02E-02 3.35E-01 5.48E-02
0
3.17E-01
ness to sparse test-time observations of the ini- FNO
G-FNO
9.43E-05
3.13E-05
2.11E-03
3.49E-04
8.44E-05
3.15E-05
1.60E-03
3.52E-04
tial condition. To this end, we train a model DINo [49] 8.20E-03
T2 /π
6.85E-02 1.11E-02 9.08E-02
2 Ours a 5.60
AD, E-02 0.37 E-01 6.75
±0.43 ±0.34 E-02 4.00 E-01
±0.62 ±0.38
with (aT , Eq. 7), without (a∅ , Eq. 15) equivari- Ours a 1.41 ∅
E-02 1.67 E-01 2.60
±1.83 ±1.27 E-02 2.14 E-01
±3.16 ±1.46
T2 /π
Ours a 1.45 E-03 9.14 E-03 1.57 E-03 1.16 E-02
ance constraints, and one with equivariance con- ±0.08 ±0.36 ±0.09 ±0.14

2 50% OF ν OBSERVED 0
straints and without meta-learning (AD aT , Eq. CNODE [2] 1.38E-01 6.33E-01 1.52E-01 6.76E-01
FNO 3.31E-02 1.39E-01 3.20E-02 1.47E-01
7), on a fully-observed train set. The training G-FNO 2.75E-02 1.17E-01 2.32E-02 1.01E-01
DINo [49] 3.67E-02 2.81E-01 3.74E-02 2.83E-01
horizon tin = [0, ..., 9], and evaluation horizon Ours a 6.89
AD,
T2 /π
E-02 3.95 E-01 7.01
±2.68 ±2.18 E-02 4.01 E-01
±3.56 ±2.29

tout = [10, ..., 20]. Subsequently, we apply the Ours a 1.05 ∅


E-02 1.45
T2 /π
E-01 2.60
±0.04 ±0.01 E-02 2.14 E-01
±3.16 ±1.46
Ours a 1.50 E-03 8.97
±0.17 E-03 5.75
±1.57 E-03 5.03
±2.58E-02 ±2.63
trained model to the problem of solving from 5% OF ν OBSERVED 0
sparse initial conditions v0 , with observation CNODE [2] 1.23E+01 2.14E+01 1.20E+01 4.35E+01
FNO 4.13E-01 7.70E-01 3.84E-01 7.07E-01
rates where 50% and 5% of the initial condition G-FNO 3.56E-01 7.09E-01 3.40E-01 6.47E-01
DINo [49] 3.67E-02 2.81E-01 3.94E-02 2.91E-01
is observed (Tab. 2). Approaches operating on Ours a 6.89
AD,
T2 /π
E-02 3.95 E-01 7.01
±2.68 ±2.18 E-02 4.01 E-01
±3.56 ±2.29
discrete (CNODE [2]) and regular grids (FNO Ours a 7.31 ∅
E-02 2.97
T2 /π
E-01 7.96
±1.37 ±2.42 E-02 3.35 E-01
±1.65 ±3.41
Ours a 3.19 E-02 1.33
±1.07 E-01 3.44
±0.35 E-02 1.61
±1.43E-01 ±4.93
[29], G-FNO [20]) perform very well when eval-
uated on fully-observed regular grids, outperforming continuous approaches (ours, [49]). However,
we note that all discrete/regular models greatly deteriorate in performance when observation rates
decrease. Equivariance constraints and meta-learning clearly improve performance overall, achieving
best perfomance in all sparse settings. Our proposed framework performs competitively to discrete
baselines and other NeF based PDE solving methods [49] in the fully observed setting. To qualita-
tively assess long-term stability well-beyond the train horizon, we visualizate test trajectory and the
solution found by our model for tin = [0, ..., 9] , tout = [10, ..., 20] and beyond in Fig. 5.

Data-efficiency - Diffusion on the sphere. To assess the impact of equivariance on data efficiency,
we vary the size of the training set of heat equation solutions from 16 to 64 trajectories and apply a
model with (aSO(3) , Eq. 13) and without (a∅ , Eq. 15) equivariance constraints. In this dataset, we set

tin = [0, ..., 9], and evaluation horizon tout = [10, ..., 20]. We visualize tin test- and train MSE in Fig.
6. These results show the non-equivariant model overfitting the training set for smaller numbers of
trajectories while unable to solve the PDE satisfactorily, whereas the equivariant model generalizes
well even with only 16 training trajectories.
Super-resolution - Shallow-Water on the sphere. Due to their continuous nature, NeF-based approaches inherently support zero-shot super-resolution. In this setting, we generate a set of solutions for the global shallow-water equations over S² at 2× resolution, and apply mean-pooling with a kernel size of 2 to obtain a low-resolution dataset. We train a model that respects rotational symmetries along the rotation axis of the globe (a^{SW}, Eq. 8) at train resolution, and evaluate the model by solving initial conditions at 2× resolution (Tab. 3, Fig. 7). In this dataset, we set tin = [0, ..., 9], and evaluation horizon tout = [10, ..., 14]. First, we note that our model has difficulty capturing the dynamics near tout and beyond the training horizon, i.e. t ≥ 9; we suspect this is because the accumulation of reconstruction errors impacts the ability of F_ψ to model the relatively volatile dynamics of these equations. This points to a drawback of NeF-based solvers: error accumulation starts with the reconstruction error on the initial condition. Ranging over our experiments, we found that this error can be reduced by increasing model capacity, at a steep cost in computational complexity attributable to the global attention operator in the ENF backbone. Regarding super-resolution, the model is able to solve the high-resolution initial conditions without inducing significantly increased MSE - it does not produce significant artefacts in the process.

Table 3: MSE ↓ on Shallow-Water equations on the sphere.

                  t_in train        t_out train       t_in test         t_out test
Train resolution
DINo [49]         1.75E-04          1.36E-03          2.01E-04          1.37E-03
Ours a^{SW}       9.94±0.41E-05     1.89±0.03E-03     1.09±1.14E-04     1.87±0.04E-03
Zero-shot 2× super-resolution
DINo [49]         3.03E-04          2.03E-03          3.37E-04          2.03E-03
Ours a^{SW}       1.58±0.02E-04     1.96±0.02E-03     1.61±0.01E-04     1.93±0.02E-03

Figure 7: Test samples at train resolution (top), 2× train resolution (middle) and corresponding predictions from our equivariant model (a^{SW}, Eq. 8) (bottom). The model does not produce significant upsampling artefacts, but fails to capture dynamics outside the training horizon.

Challenging geometries - Internally heated convection in the 3D ball. We show the value of inductive biases in modelling over a challenging geometry. We apply an equivariant model (a^{B³}, Eq. 14) to a set of solutions to Boussinesq internally heated convection in a ball defined over a regular ϕ, θ, r-grid, where we set tin = [0, ..., 9], and evaluation horizon tout = [10, ..., 14]. Results (Tab. 4, Fig. 8) for our equivariant model show good generalization compared to a non-equivariant baseline [49]. We interpret this as an indication of a marked reduction in solving complexity when correctly accounting for a PDE's symmetries.

Table 4: MSE ↓ on Internally-Heated Convection in the ball.

                  t_in train        t_out train       t_in test         t_out test
DINo [49]         2.94E-03          7.56E-02          3.06E-03          7.78E-02
Ours a^{B³}       5.79±0.17E-04     7.72±0.55E-03     5.99±0.15E-04     7.97±0.46E-03

Figure 8: Test samples (top) and corresponding predictions from our model equivariant to S²-rotations in the ball (Eq. 14).
5 Conclusion
We introduce a novel equivariant space-time continuous framework for solving partial differential
equations (PDEs). Uniquely, our method handles sparse or irregularly sampled observations of the initial state while respecting symmetry constraints and boundary conditions of the underlying
PDE. We clearly show the benefit of symmetry-preservation over a range of challenging tasks, where
existing methods fail to capture the underlying dynamics.

References
[1] Ilze Amanda Auzina, Çağatay Yıldız, Sara Magliacane, Matthias Bethge, and Efstratios Gavves.
Modulated neural odes. Advances in Neural Information Processing Systems, 36, 2024.
[2] Ibrahim Ayed, Emmanuel De Bezenac, Arthur Pajot, and Patrick Gallinari. Learning the spatio-
temporal dynamics of physical processes from partial observations. In ICASSP 2020-2020
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages
3232–3236. IEEE, 2020.
[3] Matthias Bauer, Emilien Dupont, Andy Brock, Dan Rosenbaum, Jonathan Richard Schwarz,
and Hyunjik Kim. Spatial functa: Scaling functa to imagenet classification and generation.
arXiv preprint arXiv:2302.03130, 2023.
[4] Erik J Bekkers. B-spline cnns on lie groups. In International Conference on Learning Repre-
sentations, 2019.
[5] Erik J Bekkers, Sharvaree Vadgama, Rob D Hesselink, Putri A van der Linden, and David W
Romero. Fast, expressive se (n) equivariant networks through weight-sharing in position-
orientation space. arXiv preprint arXiv:2310.02970, 2023.
[6] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling.
Geometric and physical quantities improve e (3) equivariant message passing. arXiv preprint
arXiv:2110.02905, 2021.
[7] Johannes Brandstetter, Rianne van den Berg, Max Welling, and Jayesh K Gupta. Clifford neural
layers for pde modeling. arXiv preprint arXiv:2209.04934, 2022.
[8] Johannes Brandstetter, Daniel Worrall, and Max Welling. Message passing neural pde solvers.
arXiv preprint arXiv:2202.03376, 2022.
[9] Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning:
Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
[10] Keaton J. Burns, Geoffrey M. Vasil, Jeffrey S. Oishi, Daniel Lecoanet, and Benjamin P. Brown.
Dedalus: A flexible framework for numerical simulations with spectral methods. Physical
Review Research, 2(2):023068, April 2020. doi: 10.1103/PhysRevResearch.2.023068.
[11] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary
differential equations. Advances in neural information processing systems, 31, 2018.
[12] Taco Cohen and Max Welling. Group equivariant convolutional networks. In International
conference on machine learning, pages 2990–2999. PMLR, 2016.
[13] Emilien Dupont, Hyunjik Kim, SM Eslami, Danilo Rezende, and Dan Rosenbaum. From
data to functa: Your data point is a function and you can treat it like one. arXiv preprint
arXiv:2201.12204, 2022.
[14] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adap-
tation of deep networks. In International conference on machine learning, pages 1126–1135.
PMLR, 2017.
[15] Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing
convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In
International Conference on Machine Learning, pages 3165–3176. PMLR, 2020.
[16] Joseph Galewsky, Richard K Scott, and Lorenzo M Polvani. An initial-value problem for testing
numerical models of the global shallow-water equations. Tellus A: Dynamic Meteorology and
Oceanography, 56(5):429–440, 2004.
[17] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural
message passing for quantum chemistry. In International conference on machine learning,
pages 1263–1272. PMLR, 2017.

[18] Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks. Advances
in neural information processing systems, 32, 2019.

[19] Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional neural networks for steady flow ap-
proximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge
discovery and data mining, pages 481–490, 2016.

[20] Jacob Helwig, Xuan Zhang, Cong Fu, Jerry Kurtin, Stephan Wojtowytsch, and Shuiwang Ji.
Group equivariant fourier neural operators for partial differential equations. Proceedings of
the 40 th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202,
2023., 2023.

[21] Quercus Hernández, Alberto Badías, David González, Francisco Chinesta, and Elías Cueto.
Structure-preserving neural networks. Journal of Computational Physics, 426:109950, 2021.

[22] Pengzhan Jin, Zhen Zhang, Aiqing Zhu, Yifa Tang, and George Em Karniadakis. Sympnets:
Intrinsic structure-preserving symplectic networks for identifying hamiltonian systems. Neural
Networks, 132:166–179, 2020.

[23] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.

[24] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional
networks. arXiv preprint arXiv:1609.02907, 2016.

[25] David M Knigge, David W Romero, and Erik J Bekkers. Exploiting redundancy: Separable
group convolutional networks on lie groups. In International Conference on Machine Learning,
pages 11359–11386. PMLR, 2022.

[26] Miltiadis Miltos Kofinas, Erik Bekkers, Naveen Nagaraja, and Efstratios Gavves. Latent field
discovery in interacting dynamical systems with neural fields. Advances in Neural Information
Processing Systems, 36, 2023.

[27] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya,
Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function
spaces. arXiv preprint arXiv:2108.08481, 2021.

[28] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-sgd: Learning to learn quickly for
few-shot learning. arXiv preprint arXiv:1707.09835, 2017.

[29] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya,
Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differen-
tial equations. arXiv preprint arXiv:2010.08895, 2020.

[30] Yongtuo Liu, Sara Magliacane, Miltiadis Kofinas, and Efstratios Gavves. Graph switching
dynamical systems. In International Conference on Machine Learning, pages 21867–21883.
PMLR, 2023.

[31] Yongtuo Liu, Sara Magliacane, Miltiadis Kofinas, and Efstratios Gavves. Amortized equation
discovery in hybrid dynamical systems, 2024.

[32] Philipp Moser, Wolfgang Fenz, Stefan Thumfart, Isabell Ganitzer, and Michael Giretzlehner.
Modeling of 3d blood flows with physics-informed neural networks: Comparison of network
architectures. Fluids, 8(2):46, 2023.

[33] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms.
arXiv preprint arXiv:1803.02999, 2018.

[34] Samuele Papa, David M Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-
Jakob Sonke, and Efstratios Gavves. Neural modulation fields for conditional cone beam neural
tomography. arXiv preprint arXiv:2307.08351, 2023.

[35] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove.
Deepsdf: Learning continuous signed distance functions for shape representation. In Proceed-
ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 165–174,
2019.
[36] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film:
Visual reasoning with a general conditioning layer. In Proceedings of the AAAI conference on
artificial intelligence, volume 32, 2018.
[37] Adeel Pervez, Francesco Locatello, and Efstratios Gavves. Mechanistic neural networks for
scientific machine learning. arXiv preprint arXiv:2402.13077, 2024.
[38] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W Battaglia. Learning
mesh-based simulation with graph networks. arXiv preprint arXiv:2010.03409, 2020.
[39] Michael Prasthofer, Tim De Ryck, and Siddhartha Mishra. Variable-input deep operator
networks. arXiv preprint arXiv:2205.11404, 2022.
[40] Vıctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E (n) equivariant graph neural
networks. In International conference on machine learning, pages 9323–9332. PMLR, 2021.
[41] Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, and Gordon Wetzstein. Metasdf:
Meta-learning signed distance functions. Advances in Neural Information Processing Systems,
33:10136–10147, 2020.
[42] George Gabriel Stokes et al. On the effect of the internal friction of fluids on the motion of
pendulums. 1851.
[43] Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan,
Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, and Ren Ng. Fourier features let
networks learn high frequency functions in low dimensional domains. Advances in neural
information processing systems, 33:7537–7547, 2020.
[44] Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P Srinivasan, Jonathan T
Barron, and Ren Ng. Learned initializations for optimizing coordinate-based neural repre-
sentations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 2846–2855, 2021.
[45] Riccardo Valperga, Kevin Webster, Dmitry Turaev, Victoria Klein, and Jeroen Lamb. Learning
reversible symplectic dynamics. In Proceedings of The 4th Annual Learning for Dynamics
and Control Conference, volume 168 of Proceedings of Machine Learning Research, pages
906–916. PMLR, 23–24 Jun 2022.
[46] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine
learning research, 9(11), 2008.
[47] Maurice Weiler and Gabriele Cesa. General e (2)-equivariant steerable cnns. Advances in neural
information processing systems, 32, 2019.
[48] David R Wessels, David M Knigge, Samuele Papa, Riccardo Valperga, Efstratios Gavves, and
Erik J Bekkers. Grounding continuous representations in geometry: Equivariant neural fields.
ArXiv Preprint arXiv:, 2024.
[49] Yuan Yin, Matthieu Kirchmeyer, Jean-Yves Franceschi, Alain Rakotomamonjy, and Patrick
Gallinari. Continuous pde dynamics forecasting with implicit neural representations. arXiv
preprint arXiv:2209.14855, 2022.
[50] Biao Zhang, Jiapeng Tang, Matthias Niessner, and Peter Wonka. 3dshape2vecset: A 3d shape
representation for neural fields and generative diffusion models. ACM Transactions on Graphics
(TOG), 42(4):1–16, 2023.
[51] Maksim Zhdanov, David Ruhe, Maurice Weiler, Ana Lucic, Johannes Brandstetter, and Patrick
Forré. Clifford-steerable convolutional neural networks. arXiv preprint arXiv:2402.14730,
2024.

[52] David Zwicker. py-pde: A python package for solving partial differential equations. Journal
of Open Source Software, 5(48):2158, 2020. doi: 10.21105/joss.02158. URL https://doi.
org/10.21105/joss.02158.

A Related work
DL approaches to dynamics modelling In recent years, the learning of spatiotemporal dynamics has been
receiving significant attention, either for modelling interacting systems [31, 30], scientific Machine Learning
[49, 8, 7, 37, 26, 51], or even videos [1]. Most DL methods for solving PDEs attempt to directly replace solvers
with mappings between finite-dimensional Euclidean spaces, i.e. through the use of CNNs [19, 2] or GNNs
[38, 8] often applied autoregressively to an observed (discretized) PDE state. Instead, the Neural Operator (NO)
[27] paradigm attempts to learn infinite-dimensional operators, i.e. mappings between function spaces, with
limited success. Fourier Neural Operator (FNO) [29] extends this method by performing convolutions in the
spectral domain. FNO obtains much improved performance, but due to its reliance on FFT is limited to data on
regular grids.

Inductive biases in DL and dynamics modelling Geometric Deep Learning aims to improve model
generalization and performance by constraining/designing a model’s space of learnable functions based on
geometric principles. Prominent examples include Group Equivariant Convolutional Networks and Steerable
CNNs [12, 4], generalizations of CNNs that respect symmetries of the data - such as dilations and continuous
rotations [47, 15, 25]. Analogously, Graph Neural Networks (GNNs) [24] or Message Passing Neural Networks
(MPNNs) [17] are a variant of neural network that respects the set-permutations naturally found in graph data. They are typically formulated for graphs G = (V, E), with nodes i ∈ V and edges E. Typically, nodes are embedded into a node vector f_i^0, which is subsequently updated over multiple layers of message passing. Message passing consists of (1) computing messages m_{i,j} over edges (i, j) from node j to i with the message function, taking into account edge attributes a_{i,j}: m_{i,j} = ϕ_m(f_i^l, f_j^l, a_{i,j}); (2) aggregating incoming messages: m_i = Σ_{j∈N(i)} m_{i,j}; (3) computing updated node features f_i^{l+1} = ϕ_u(f_i^l, m_i) (a minimal implementation of these steps is sketched at the end of this paragraph).
Recently, such methods have also been adapted for sparse physical data, e.g. for molecular property prediction
[40, 6] - where the GNN is additionally required to respect transformation symmetries. [5] unifies these
approaches to equivariance under the guise of weight sharing over equivalence classes defined by bi-invariant
attributes of pairs of nodes i, j, a viewpoint we leverage in constructing the equivariant conditioning latent ztν
corresponding to a PDE state νt . In the context of dynamics modelling, equivariant architectures have been
employed to incorporate various properties of physical systems in the modelling process, examples of such
properties are the symplectic structure [22], discrete symmetries such as reversing symmetries [45] and energy
conservation [18, 21].
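A minimal dense PyTorch implementation of the three message-passing steps recapped above, using fully-connected neighbourhoods for brevity; module names and dimensions are illustrative only.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, node_dim, edge_dim, hidden=64):
        super().__init__()
        self.phi_m = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, hidden), nn.SiLU(),
                                   nn.Linear(hidden, hidden))
        self.phi_u = nn.Sequential(nn.Linear(node_dim + hidden, hidden), nn.SiLU(),
                                   nn.Linear(hidden, node_dim))

    def forward(self, f, a):
        # f: (N, node_dim) node features, a: (N, N, edge_dim) edge attributes
        n = f.shape[0]
        fi = f[:, None, :].expand(n, n, -1)                  # receiving node i
        fj = f[None, :, :].expand(n, n, -1)                  # sending node j
        m_ij = self.phi_m(torch.cat([fi, fj, a], dim=-1))    # (1) messages
        m_i = m_ij.sum(dim=1)                                # (2) aggregate over neighbours j
        return self.phi_u(torch.cat([f, m_i], dim=-1))       # (3) node update

layer = MessagePassingLayer(node_dim=8, edge_dim=3)
out = layer(torch.randn(5, 8), torch.randn(5, 5, 3))
```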

Neural Fields in dynamics modelling Conditional Neural fields (NeFs) are a class of coordinate-based
neural networks, often trained to reconstruct discretely-sampled input continuously. More specifically, a
conditional neural field fθ : Rn → Rd is a field –parameterized by a neural network with parameters θ– that
maps input coordinates x ∈ Rn in the data domain alongside conditioning latents z to d-dimensional signal
values ν(x) ∈ R^d. By associating a conditioning latent z^ν ∈ R^c to each signal ν, a single conditional NeF f_θ: R^n × R^c → R^d can learn to represent families D of continuous signals such that ∀ν ∈ D: ν(x) ≈ f_θ(x; z^ν).
[13] showed the viability of using the latents zi as representations for downstream tasks (e.g. classification,
generation) proposing a framework for learning on neural fields. This framework inherits desirable properties of
neural fields, such as inherent support for sparsely and/or irregularly sampled data, and independence to signal
resolution. [49] propose to use conditional NeFs for PDE modelling by learning a continuous flow in the latent
space of a conditional neural field. In particular, a set of latents {zνi }Ti=1 are obtained by fitting a conditional
neural field to a given set of observations {νi }Ti=1 at timesteps 1, ..., T ; simultaneously, a neural ODE [11] Fψ
is trained to map pairs of temporally continuous latents s.t. solutions correspond to the trajectories traced by the
learned latents. Though this approach yields impressive results for sparse and irregular data in planar PDEs, we
show it breaks down on more challenging geometries. We hypothesize that this is due to a lack of a latent space
that preserves relevant geometric transformation with respect to which systems we are modelling are symmetric,
and as such propose an extension of this framework where such symmetries are preserved.

Obtaining Neural Fields representations Most NeF-based approaches to representation or reconstruction use SGD to optimize (a subset of) the parameters of the NeF, inevitably leading to significant overhead at inference; conditional NeFs require optimizing a (set of) latents from initialization to reconstruct a novel
sample. Accordingly, research has explored ways of addressing this limitation. [41, 44] propose using Meta-
Learning [14, 33] to optimize for an initialization for the NeF from which it is possible to reconstruct for a
novel sample in as few as 3 gradient descent steps. [13] propose to meta-learn the NeF backbone, but fix the
initialization for the latent z and instead optimize the learning rate used in its optimization using Meta-SGD

[28]. Recently, work has also explored the relation between initialization/optimization of a NeF and its value as
downstream representation; [34] show that (1) using a shared NeF initialization and (2) limiting the number
of gradient updates to the NeF improves performance in downstream tasks, as this simplifies the complex
relation between a NeFs parameter space and its output function space. We combine these insights and make
Meta-Learning part of our equivariant PDE solving pipeline, as it enables fast inference and we show it to
simplify the latent space of the ENF, improving performance of the neural ODE solver.

B Pseudocode for optimization objective


See Alg. 1 for pseudocode of the training loop that we use, written for a single data sample for simplicity of notation. For simplicity, we further assume we are using an Euler stepper to solve the neural ODE, but this can be replaced by any solver. For inference, the procedure is identical, except we do not perform gradient updates to θ, ψ.

Algorithm 1 Optimization objective

Randomly initialize neural field f_θ
Randomly initialize neural ODE F_ψ
while not done do
    Sample initial states and coordinates ν_0.
    Initialize latents z_0^ν ← {(p_i, c_i)}_{i=1}^N.
    for all step ∈ {1, ..., N_initial_state_opt = 3} do
        z_0^ν ← z_0^ν − ε ∇_{z_0^ν} L_mse(f_θ(·, z_0^ν), ν_0)
    end for
    for all t ∈ [1, ..., t_in] do
        z_t^ν ← z_0^ν + ∫_0^t F_ψ(z_τ^ν) dτ
    end for
    Update θ, ψ per:
        θ ← θ − η ∇_θ L'_mse ,   ψ ← ψ − η ∇_ψ L'_mse ,   with L'_mse = Σ_{t=0}^{t_in} L_mse(f_θ(·, z_t^ν), ν_t)
end while
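For concreteness, a compact PyTorch rendering of Algorithm 1 for a single trajectory with an Euler stepper; `enf`, `latent_ode` and `init_latents` are placeholder names, and the latent initialization is treated as fixed here rather than meta-learned.

```python
import torch

def training_step(enf, latent_ode, init_latents, coords, states, outer_opt,
                  inner_steps=3, eps=1e-2, dt=1.0):
    # inner loop: fit z_0 to the initial state nu_0 in a few SGD steps
    z = init_latents.detach().clone().requires_grad_(True)
    for _ in range(inner_steps):
        inner_loss = ((enf(coords, z) - states[0]) ** 2).mean()
        (g,) = torch.autograd.grad(inner_loss, z, create_graph=True)
        z = z - eps * g
    # unroll the latent with the neural ODE and accumulate the reconstruction MSE
    loss = ((enf(coords, z) - states[0]) ** 2).mean()
    for t in range(1, states.shape[0]):
        z = z + dt * latent_ode(z)                        # Euler step of dz/dt = F_psi(z)
        loss = loss + ((enf(coords, z) - states[t]) ** 2).mean()
    outer_opt.zero_grad()
    loss.backward()                                       # updates theta and psi
    outer_opt.step()
    return loss.detach()
```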

C Equivariant Neural Fields


ENF to reconstruct PDE states For ease of notation we denote P and C the matrices containing poses
and corresponding appearances stacked row-wise, i.e. Pi,: = pTi and Ci,: = cTi . Furthermore, we denote A as
the matrix containing all bi-invariants ai,m stacked row-wise, i.e. Ai,: = aTi,m :
Q(A)KT (C)
 
fθ (x; z νt ) := softmax √ + G(A) V(C; A), (10)
dk
where the softmax is applied over the latent set and $d_k$ is the hidden dimensionality of the ENF. The query
matrix $Q$ is constructed as $Q = W_q \gamma_q^T(A)$, with $\gamma_q$ a Gaussian RFF embedding [43] followed by a linear layer $W_q$;
i.e. $Q$ consists of the RFF-embedded bi-invariants of the input coordinate $x_m$ and each of the latent poses
$p_i$, stacked row-wise. The key matrix is given by a learnable linear transformation $W_k$ of the context vectors
$c_i$: $K = W_k C^T$. The attention coefficients that result from the inner product of $Q, K$ are weighted by a
Gaussian window $G$ whose magnitude depends on the distance between latent poses and input coordinates as
$G_i = \sigma_{att}(\|p_i - x\|^2)$, with $\sigma_{att}$ a hyperparameter that determines the locality of each of the latents.
Finally, the value matrix is calculated as a learnable linear transformation $W_v$ of the appearances $C$, conditioned
through FiLM modulation [36] on a second RFF embedding of the bi-invariants, split into scale- and shift-modulations:
$V = W_v C \odot W_{v_\alpha} \gamma_{v_\alpha}(A) + W_{v_\beta} \gamma_{v_\beta}(A)$. The latents
$z_t^\nu$ are optimized for a single state $\nu_t$, whereas the parameters $\theta$ of the ENF backbone - which consist of all the
learnable parameters of the linear layers $W_q, W_k, W_v, W_{v_\alpha}, W_{v_\beta}$ used to construct $Q, K, V$ - are shared
over all states.
The overall architecture consists of a linear layer $W: \mathbb{R}^c \to \mathbb{R}^d$ applied to $c_i \in \mathbb{R}^c$, followed by a layernorm.
After this, the cross-attention listed above is applied, followed by three $d$-dimensional linear layers, the final one mapping
to the output dimension $\mathbb{R}^{out}$.
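A simplified, single-head PyTorch-style sketch of this cross-attention (Eq. 10) is given below. The tensor shapes, the rff_* embedding callables and the way the Gaussian window is passed in are assumptions made for illustration, not the exact implementation.

import torch

def enf_cross_attention(a, c, gauss_window, Wq, Wk, Wv, Wva, Wvb, rff_q, rff_v, d_k):
    """Sketch of Eq. 10 for one sample.

    a:            bi-invariants a_{i,m},       shape [num_latents, num_coords, dim_a]
    c:            context vectors c_i,         shape [num_latents, dim_c]
    gauss_window: Gaussian window G_{i,m} added to the attention logits,
                  shape [num_latents, num_coords]
    Wq, Wk, Wv, Wva, Wvb: torch.nn.Linear layers; rff_q, rff_v: RFF embeddings of a
    """
    q = Wq(rff_q(a))                                 # queries from RFF-embedded bi-invariants
    k = Wk(c)                                        # keys from context vectors
    logits = torch.einsum('imd,id->im', q, k) / d_k ** 0.5
    att = torch.softmax(logits + gauss_window, dim=0)   # softmax over the latent set

    # FiLM-modulated values: scale/shift conditioned on a second RFF embedding of a.
    gamma = rff_v(a)
    v = Wv(c)[:, None, :] * Wva(gamma) + Wvb(gamma)
    return torch.einsum('im,imd->md', att, v)        # one output vector per coordinate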

Equivariance follows from sharing Q, K, V over equivalence classes Note that the latent space of
the ENF is equipped with a group action as $g z_t^\nu = \{(g\, p_i, c_i)\}_{i=1}^N$. As an example, SE(2)-equivariance of the
ENF follows from bi-invariance of the quantity $a$ used to construct $Q$ under the group action:
$$\forall g \in SE(n): \ (p_i, x) \mapsto (g\, p_i,\ g\, x) \ \Leftrightarrow \ p_i^{-1} x \mapsto (g\, p_i)^{-1} g\, x = p_i^{-1} g^{-1} g\, x = p_i^{-1} x. \qquad (11)$$

And so, constructing the matrix containing the relative poses of the transformed poses and coordinates $(gP)^{-1} g x$
as $((gP)^{-1} g x)_{i,:} = p_i^{-1} g^{-1} g x = p_i^{-1} x$, we trivially have:

$$\forall g \in SE(n): \ (p_i, x) \mapsto (g\, p_i,\ g\, x) \ \Leftrightarrow \ Q(A) \mapsto Q(gA) = Q(A). \qquad (12)$$
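Eq. 11 can also be checked numerically; below is a small sketch for SE(2), where a pose is an (angle, translation) pair and the bi-invariant is the relative position p^{-1}x. The concrete values are arbitrary.

import numpy as np

def rot(angle):
    return np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])

def bi_invariant(pose, x):
    """Relative position p^{-1} x for an SE(2) pose p = (theta, t)."""
    theta, t = pose
    return rot(theta).T @ (x - t)        # apply the inverse pose to the coordinate

def act_on_pose(g, pose):
    """Group composition g . p for g = (phi, s), p = (theta, t)."""
    phi, s = g
    theta, t = pose
    return (phi + theta, rot(phi) @ t + s)

def act_on_point(g, x):
    phi, s = g
    return rot(phi) @ x + s

pose, x = (0.3, np.array([1.0, -2.0])), np.array([0.5, 0.7])
g = (1.1, np.array([-0.4, 2.5]))

a = bi_invariant(pose, x)
a_transformed = bi_invariant(act_on_pose(g, pose), act_on_point(g, x))
assert np.allclose(a, a_transformed)     # p^{-1}x is unchanged when g acts on both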

D Defining additional bi-invariant attributes


Other examples of bi-invariant attributes used in the experiments section are listed here.
Full rotation symmetries on the 2-sphere For the global shallow-water equations we defined $a^{SW}$ as an attribute
that is bi-invariant only to rotations about the globe's axis, i.e. rotations over $\phi$. In our experiments we also solve
diffusion on the sphere, which is fully SO(3) rotationally symmetric. To achieve equivariance to full 3D
rotations, we take poses $p \in SO(3)$ parameterized by Euler angles, which act on points $x \in S^2$ parameterized by
3D unit vectors through 3D rotation matrices, allowing us to calculate the bi-invariant $p^{-1}x$:
$$a^{SO(3)}_{i,m} = R_i\, x_m. \qquad (13)$$
This bi-invariant is used in our experiments for diffusion on the 2-sphere.
The 3D ball B³. We experiment with the Boussinesq equations for internally heated convection in a ball. The PDE is
fully rotationally symmetric, but since the heat source K is at a fixed point (the center of the ball), it is not
symmetric to translations of the initial conditions within the ball. As such, we let $p \in SO(3) \times \mathbb{R}$ with angles $\phi, \theta, \gamma$ and radius $r$
s.t. $0 < r < 1$. The PDE is defined over spherical coordinates $(\phi, \theta, r)$, which we map to vectors $x \in \mathbb{R}^3$.
We then use the following bi-invariant, which is only symmetric to rotations in SO(3):
$$a^{\mathbb{B}^3}_{i,m} = R_i\, x_m \oplus r_{p_i} \oplus r_{x_m}. \qquad (14)$$

No transformation symmetries. A simple "bi-invariant" for this setting that preserves all geometric information
is given by simply concatenating the coordinates $p$ with the coordinates $x$:

$$a^{\emptyset}_{i,m} = p_i \oplus x_m \qquad (15)$$
Parameterizing the cross-attention operation in Eq. 5 as a function of this bi-invariant results in a framework
without any equivariance constraints. We use this in experiments to ablate over equivariance constraints and their
impact on performance.
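For concreteness, a sketch of how the attributes in Eqs. 13-15 could be computed is given below; the Euler-angle convention and the helper names are illustrative assumptions, written as they appear in the equations above.

import numpy as np
from scipy.spatial.transform import Rotation

def a_so3(pose_euler, x_unit):
    """Eq. 13: latent rotation R_i applied to a unit vector x_m on the 2-sphere."""
    R = Rotation.from_euler('zyz', pose_euler).as_matrix()
    return R @ x_unit

def a_ball(pose_euler, r_pose, x):
    """Eq. 14: rotated coordinate, concatenated with the radial parts of pose and coordinate."""
    R = Rotation.from_euler('zyz', pose_euler).as_matrix()
    return np.concatenate([R @ x, [r_pose], [np.linalg.norm(x)]])

def a_none(p, x):
    """Eq. 15: plain concatenation; no equivariance constraint is imposed."""
    return np.concatenate([p, x])

# example: a latent pose (phi, theta, gamma) with radius r, and a point inside the unit ball
print(a_ball(np.array([0.1, 0.7, -0.3]), 0.5, np.array([0.2, 0.1, 0.4])))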

E Experimental Details
E.1 Dataset creation
For creating the dataset of PDE solutions we used py-pde [52] for Navier-Stokes and the diffusion equation on
the plane. For the shallow-water equation and the diffusion equation on the sphere, as well as the internally
heated convection in a 3D ball we used Dedalus [10].

Diffusion on the plane. For the diffusion equation on the plane we use as initial conditions narrow spikes
centred at random locations in one half of the domain for the train set and in the complementary half for
the test set. States are defined on a 64 × 64 grid ranging from -3 to 3. Initial condition locations are sampled
uniformly between -2 and 2 for x and between 0 and 2 for y in the training set, and between -2 and 2 for x and between -2 and 0
for y in the test set. A value uniformly sampled between 5.0 and 5.5 is inserted at the sampled location.
We solve the equation with an Euler solver for 27 steps, discarding the first 7, with a timestep dt = 0.01. We
generate 1024 training and 128 test trajectories.
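A rough py-pde sketch of how a single training trajectory for this dataset could be generated is shown below; the diffusivity value and the exact spike-insertion logic are assumptions, and the actual generation script may differ.

import numpy as np
from pde import CartesianGrid, ScalarField, DiffusionPDE, MemoryStorage

# 64 x 64 grid on [-3, 3]^2
grid = CartesianGrid([[-3, 3], [-3, 3]], [64, 64])
state = ScalarField(grid, data=0.0)

# narrow spike at a random location in the training half of the domain
x0, y0 = np.random.uniform(-2, 2), np.random.uniform(0, 2)
ix = int((x0 + 3) / 6 * 64)            # nearest grid index along x
iy = int((y0 + 3) / 6 * 64)            # nearest grid index along y
state.data[ix, iy] = np.random.uniform(5.0, 5.5)

# explicit (Euler) integration; store a snapshot every dt and discard the first 7
storage = MemoryStorage()
eq = DiffusionPDE(diffusivity=1.0)     # diffusivity is illustrative
eq.solve(state, t_range=0.27, dt=0.01, tracker=storage.tracker(0.01))
trajectory = np.array(storage.data)[7:]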

Navier-Stokes on the flat 2-torus. For Navier-Stokes on the flat 2-torus we use Gaussian random fields
as initial conditions and solve the PDE using a Crank-Nicolson method with timestep dt = 1.0 for 20 steps.
The PDE is

$$\frac{dv}{dt} = -u \cdot \nabla v + \mu \Delta v + f, \qquad v = \nabla \times u, \qquad \nabla \cdot u = 0,$$

where $u$ is the velocity field, $v$ the vorticity, $\mu$ the viscosity and $f$ a forcing term. States are defined on a 64 ×
64 grid. We generate 8192 training and 512 test trajectories.

Diffusion on the 2-sphere. For the diffusion dataset on the sphere, states are defined over a 128 × 64
(ϕ, θ) grid. Initial conditions are generated as a Gaussian peak inserted at a random point on the sphere with
σ = 0.25. The equation is solved for 20 timesteps with RK4 and dt = 1.0. We generate 256 training and 64 test
trajectories.

Spherical shallow-water equations [16]. The global shallow-water equations are

$$\frac{du}{dt} = -f\, k \times u - g \nabla h + \nu \Delta u$$
$$\frac{dh}{dt} = -h \nabla \cdot u + \nu \Delta h,$$

where $\frac{d}{dt}$ is the material derivative, $k$ is the unit vector orthogonal to the surface of the sphere, $u$ is the velocity
field tangent to the spherical surface and $h$ is the thickness of the fluid layer. The remaining symbols are constant
parameters of the Earth (see [16] for details). As initial conditions we follow [16] and use a basic zonal flow,
representing a mid-latitude tropospheric jet, with a correspondingly balanced height field.

0
 h i for ϕ ≤ ϕ0
u(ϕ) = umax e
exp 1
(ϕ−ϕ0 )(ϕ−ϕ1 )
for ϕ0 < ϕ < ϕ1
 n

0 for ϕ ≥ ϕ1

Where umax = 80ms−1 , ϕ0 = π/7, ϕ1 = π/2 − ϕ1 , and en = exp[−4(ϕ1 − ϕ0 )2 ]. With this initial zonal
flow, we numerically integrate the balance equation
tan(ϕ′ )
Z ϕ  
gh(ϕ) = gh0 − au(ϕ′ ) f + u(ϕ′ ) dϕ′ ,
a
to obtain the height h. We then randomly generate small un-balanced perturbations h′ to the height field
2 2
h′ (θ, ϕ) = ĥ cos(ϕ)e−(θ2 −θ/α) e−[(ϕ2 −ϕ)/β]
by uniformly sampling α, β, ĥ, θ2 , and ϕ2 within a neighbourhood of the values use in [16]. States are defined
on a 192 × 96 grid for the high-resolution dataset, which is subsequently downsampled by 2 × 2 mean pooling
to a 96 × 48 grid. We generate 512 training trajectories and 64 test trajectories.
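The initial zonal jet and the height perturbation can be evaluated directly from the expressions above; a small numpy sketch is given below, with α, β, ĥ, θ2, ϕ2 left as arguments to be sampled.

import numpy as np

u_max, phi_0 = 80.0, np.pi / 7
phi_1 = np.pi / 2 - phi_0
e_n = np.exp(-4.0 / (phi_1 - phi_0) ** 2)

def zonal_flow(phi):
    """Mid-latitude jet u(phi); zero outside the band (phi_0, phi_1)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    u = np.zeros_like(phi)
    band = (phi > phi_0) & (phi < phi_1)
    u[band] = u_max / e_n * np.exp(1.0 / ((phi[band] - phi_0) * (phi[band] - phi_1)))
    return u

def height_perturbation(theta, phi, h_hat, alpha, beta, theta_2, phi_2):
    """Unbalanced perturbation h'(theta, phi) added to the balanced height field."""
    return (h_hat * np.cos(phi)
            * np.exp(-(((theta_2 - theta) / alpha) ** 2))
            * np.exp(-(((phi_2 - phi) / beta) ** 2)))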

Internally-heated convection in the ball. The equations for the internally-heated convection system are
listed here; they include the thermal diffusivity (κ) and kinematic viscosity (ν), given by
$$\kappa = (\mathrm{Ra} \cdot \mathrm{Pr})^{-1/2}, \qquad \nu = \left(\frac{\mathrm{Ra}}{\mathrm{Pr}}\right)^{-1/2}.$$
We set Ra = 1e−6 and Pr = 1.
1. Incompressibility condition (continuity equation):
$$\nabla \cdot u + \tau_p = 0$$

2. Momentum equation (Navier-Stokes equation):
$$\frac{\partial u}{\partial t} - \nu \nabla^2 u + \nabla p - r T + \mathrm{lift}(\tau_u) = -u \times (\nabla \times u)$$

3. Temperature equation:
$$\frac{\partial T}{\partial t} - \kappa \nabla^2 T + \mathrm{lift}(\tau_T) = -u \cdot \nabla T + \kappa\, T_{source}$$

4. Shear stress boundary condition (stress-free condition):
$$\text{Shear stress} = 0 \ \text{on the boundary}$$

5. No-penetration boundary condition (radial component of velocity at r = 1):
$$\mathrm{radial}(u(r = 1)) = 0$$

6. Thermal boundary condition (radial gradient of temperature at r = 1):
$$\mathrm{radial}(\nabla T(r = 1)) = -2$$

7. Pressure gauge condition:
$$\int p\, dV = 0$$
The boundary conditions imposed are stress-free and no-penetration for the velocity field and a constant thermal
flux at the outer boundary. These conditions are enforced using penalty terms (τ) that are lifted into the domain
using higher-order basis functions.
States are defined over a 64 × 24 × 24 (ϕ, θ, r) grid. We use an SBDF2 solver with the timestep constrained to dtmin = 1e−4
and dtmax = 2e−2. We evolve the PDE for 26 timesteps, discarding the first 6. We generate 512 training
trajectories and 64 test trajectories.

E.2 Training details


We provide hyperparameters per experiment. We optimize the weights of the neural field fθ and the neural ODE
Fψ with Adam [23], with learning rates of 1E-4 and 1E-3 respectively. We initialize the inner learning rate
used in Meta-SGD [28] for learning z ν at 1.0 for p and 5.0 for c. For the neural ODE Fψ, we use 3 of
our message passing layers in the architecture specified in [5], with a hidden dimensionality of 128. The std
parameter of the RFF embedding functions γq, γvα, γvβ (see Appx. C) is chosen per experiment. We run all
experiments on a single A100. All experiments are run 3 times.
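The optimizer setup described above can be sketched as follows; the module names enf and node are placeholders, and whether the inner learning rates share the outer optimizer is an assumption.

import torch

enf = torch.nn.Linear(2, 1)    # placeholder for the ENF f_theta
node = torch.nn.Linear(8, 8)   # placeholder for the neural ODE F_psi

outer_opt = torch.optim.Adam([
    {"params": enf.parameters(),  "lr": 1e-4},   # neural field f_theta
    {"params": node.parameters(), "lr": 1e-3},   # neural ODE F_psi
])

# Meta-SGD inner learning rates, initialized as described and learned jointly.
inner_lr_p = torch.nn.Parameter(torch.tensor(1.0))   # for poses p
inner_lr_c = torch.nn.Parameter(torch.tensor(5.0))   # for context vectors c
outer_opt.add_param_group({"params": [inner_lr_p, inner_lr_c], "lr": 1e-4})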

Diffusion on the plane. We use 4 latents with c ∈ R16 . We set the hidden dim of the ENF to 64 and use 2
attention heads. We train the model for 1000 epochs. We set γq = 0.05, γvα = 0.01, γvβ = 0.01. We use a
batch size of 8. The model takes approximately 8 hours to train.

Navier-Stokes on the flat 2-torus. We use 4 latents with c ∈ R16 . We set the hidden dim of the ENF to
64 and use 2 attention heads. We train the model for 2000 epochs. We set γq = 0.05, γvα = 0.2, γvβ = 0.2.
We use a batch size of 4. The model takes approximately 48 hours to train.

Diffusion on the 2-sphere. We use 18 latents with c ∈ R4 . We set the hidden dim of the ENF to 16 and
use 2 attention heads. We train the model for 1500 epochs. We set γq = 0.01, γvα = 0.01, γvβ = 0.01. We use
a batch size of 2. The model takes approximately 12 hours to train.

Spherical shallow-water equations [16]. We use 8 latents with c ∈ R32. We set the hidden dim of the
ENF to 128 and use 2 attention heads. We train the model for 1500 epochs. We set γq = 0.05, γvα = 0.2, γvβ = 0.2.
We use a batch size of 2. The model takes approximately 24 hours to train.

Internally-heated convection in the ball. We use 8 latents with c ∈ R32. We set the hidden dim of the
ENF to 128 and use 2 attention heads. We train the model for 1500 epochs. We set γq = 0.05, γvα = 0.2, γvβ = 0.2.
We use a batch size of 2. The model takes approximately 24 hours to train.

Baselines As baseline models on Navier-Stokes we train FNO and GFNO [29] with 8 modes and 32 channels
for 700 epochs (until convergence). We train CNODE [2] with 4 layers of size 64 for 300 epochs (until
convergence). We train DINo on all experiments for 2000 epochs with an architecture as specified in [49]. For
the IHC and shallow-water experiments, we increase the latent dim from 100 to 200, the number of layers for
the neural ODE from 3 to 5, and the latent dim of the neural field decoder from 64 to 256, as per [49].
