Space-Time Continuous PDE Forecasting using Equivariant Neural Fields
David M. Knigge∗,1 , David R. Wessels∗,1 , Riccardo Valperga1 , Samuele Papa1 , Jan-Jakob Sonke2 ,
Efstratios Gavves †,1 , Erik J. Bekkers †,1
1 University of Amsterdam   2 Netherlands Cancer Institute
[email protected], [email protected]
Abstract
1 Introduction
Partial Differential Equations (PDEs) are a foundational tool in modelling and understanding spatio-
temporal dynamics across diverse scientific domains. Classically, PDEs are solved using numerical
methods such as finite elements, finite volumes, or spectral methods. In recent years, Deep Learning (DL) methods have emerged as promising alternatives due to the abundance of observed and simulated data and the accessibility of computational resources, with applications ranging from fluid simulations and weather modelling [49, 7] to biology [32].
Figure 1: We propose to solve an equivariant PDE in function space by solving an equivariant ODE in latent space. Through our proposed framework, which leverages Equivariant Neural Fields $f_\theta$, a field $\nu_t$ is represented by a set of latents $z_t^\nu = \{(p_i^\nu, c_i^\nu)\}_{i=1}^N$ consisting of a pose $p_i$ and a context vector $c_i$. Using meta-learning, the initial latent $z_0^\nu$ is fit in only 3 SGD steps, after which an equivariant neural ODE $F_\psi$ models the solution as a latent flow.
The systems modelled by PDEs often have underlying symmetries. For example, heat diffusion or fluid dynamics can be modelled with differential operators which are rotation equivariant, e.g., given a solution to the system of PDEs, its rotation is also a valid solution. In such scenarios it is sensible, and even desirable, to design neural networks that incorporate and preserve such symmetries to improve generalization and data-efficiency [12, 47, 4].
Crucially, DL-based approaches often rely on data sampled on a regular grid, without the inherent
ability to generalize outside of it, which is restrictive in many scenarios [39]. To this end, [49] propose
to use Neural Fields (NeFs) for modelling and forecasting PDE dynamics. This is done by fitting a
neural ODE [11] to the conditioning variables of a conditional Neural Field trained to reconstruct
states of the PDE [13]. However, this approach fails to leverage aforementioned known symmetries
of the system. Furthermore, using neural fields as representations has proved difficult due to the
non-linear nature of neural networks [13, 3, 34], limiting performance in more challenging settings.
We posit that NeF-based modelling of PDE dynamics benefits from representations that account
for the symmetries of the system as this allows for introducing inductive biases into the model that
ought to be reflected in solutions. Furthermore, we show that through meta-learning [28, 44] the NeF
backbone improves performance for complex PDEs by further structuring the NeF’s latent space,
simplifying the task of the neural ODE.
We introduce a framework for space-time continuous equivariant PDE solving, by adapting a class
of SE(n)-Equivariant Neural Fields (ENFs) to PDE-specific symmetries. We leverage the ENF as
representation for modelling spatiotemporal dynamics. We solve PDEs by learning a flow in the
latent space of the ENF - starting at a point z0 corresponding to the initial state of the PDE - with an
equivariant graph-based neural ODE [11] we develop from previous work [5]. We extend the ENF
to equivariances beyond SE(n) by extending its weight-sharing scheme to equivalence classes for
specific symmetries relevant to our setting. Furthermore, we show how meta-learning [14, 28, 44, 13],
can not only significantly reduce inference time of the proposed framework, but also substantially
simplify the structure of the latent space of the ENF, thereby simplifying the learning process of the
latent dynamics for the neural ODE model. We present the following contributions:
We structure the paper as follows: in Sec. 2 we provide an overview of the mathematical preliminaries
and describe the problem setting. Our proposed framework is introduced in Sec. 3. We validate
our framework on different PDEs defined over a variety of geometries in Sec. 4, with differing
equivariance constraints, showing competitive performance over other neural PDE solvers. We provide
an in-depth positioning of our approach in relation to other work in Appx. A.
2 Background
Neural Fields in dynamics modelling. Conditional Neural fields (NeFs) are a class of coordinate-
based neural networks, often trained to reconstruct discretely-sampled input continuously. More
specifically, a conditional neural field fθ : Rn → Rd is a field –parameterized by a neural network
with parameters θ– that maps input coordinates x ∈ Rn in the data domain alongside conditioning
latents z to d-dimensional signal values ν(x) ∈ Rd . By associating a conditioning latent z ν ∈ Rc
to each signal ν, a single conditional NeF fθ : Rn × Rc → Rd can learn to represent families D of
continuous signals such that $\forall\, \nu \in \mathcal{D}: \nu(x) \approx f_\theta(x; z^\nu)$. [49] propose to use conditional NeFs for
PDE modelling by learning a continuous flow in the latent space of a conditional neural field. In
particular, a set of latents $\{z_i^\nu\}_{i=1}^T$ is obtained by fitting a conditional neural field to a given set of observations $\{\nu_i\}_{i=1}^T$ at timesteps $1, \dots, T$; simultaneously, a neural ODE [11] $F_\psi$ is trained to
map pairs of temporally contiguous latents s.t. solutions correspond to the trajectories traced by the
learned latents. Though this approach yields impressive results for sparse and irregular data in planar
PDEs, we show it breaks down on complex geometries. We hypothesize that this is due to lack of
a latent space that preserves relevant geometric transformations that characterize the symmetries
of the systems we are modelling, and as such propose an extension of this framework where such
symmetries are preserved.
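To make the conditional-NeF interface concrete, the following is a minimal sketch (assumed PyTorch; a generic Fourier-feature MLP with latent concatenation, not the cross-attention architecture used later in this paper, and all names are hypothetical):

```python
import torch
import torch.nn as nn

class ConditionalNeF(nn.Module):
    """Minimal conditional neural field f_theta(x; z): R^n x R^c -> R^d (sketch)."""
    def __init__(self, n_in=2, d_out=1, c_latent=64, n_freq=32, hidden=128):
        super().__init__()
        # Fixed random frequencies for Fourier-feature encoding of coordinates.
        self.register_buffer("B", torch.randn(n_in, n_freq) * 10.0)
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_freq + c_latent, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, d_out),
        )

    def forward(self, x, z):
        # x: (batch, num_points, n_in), z: (batch, c_latent)
        proj = 2 * torch.pi * x @ self.B                     # (B, P, n_freq)
        feats = torch.cat([proj.sin(), proj.cos()], dim=-1)  # (B, P, 2*n_freq)
        z_exp = z[:, None, :].expand(-1, x.shape[1], -1)     # broadcast latent to all points
        return self.mlp(torch.cat([feats, z_exp], dim=-1))   # (B, P, d_out)

# Usage: reconstruct a signal value nu(x) ~= f_theta(x; z_nu).
f = ConditionalNeF()
x = torch.rand(4, 100, 2)   # 100 query coordinates per sample
z = torch.randn(4, 64)      # one latent per signal
pred = f(x, z)              # (4, 100, 1)
```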
Symmetries and weight sharing. Given a group G with identity element e ∈ G, and a set X, a
group action is a map T : G × X → X. For simplicity, we denote the action of g ∈ G on x ∈ X
as $gx := T(g, x)$, and call a smooth manifold equipped with a $G$-action a $G$-space. A group action is compatible with the group structure, namely $ex = x$ and $(gh)x = g(hx)$.
As an example, we are interested in the Special Euclidean group $SE(n) = \mathbb{R}^n \rtimes SO(n)$: group elements of $SE(n)$ are identified by a translation $t \in \mathbb{R}^n$ and a rotation $R \in SO(n)$, with group operation $g g' = (t, R)(t', R') = (R t' + t, R R')$. We denote by $L_g$ the left action of $G$ on function spaces, defined as $L_g f(x') = f(g^{-1} x') = f(R^{-1}(x' - t))$. Many PDEs are defined by equivariant differential operators such that for a given state $\nu$: $L_g \mathcal{N}[\nu] = \mathcal{N}[L_g \nu]$. If the boundary
conditions do not break the symmetry, namely if the boundary is symmetric with respect to the same
group action, then a G-transformed solution to the IVP for some ν0 corresponds to the solution
for the G-transformed initial value. For example, the laws of physics do not depend on the choice of coordinate system, which implies that many PDEs are defined by $SE(n)$-equivariant differential
operators. The geometric deep learning literature shows that models can benefit from leveraging the
inherent symmetries or invariances present in the data by constraining the searchable function space
through weight sharing [9, 25, 5]. Recall that in our framework we model flows of fields, solutions to
PDEs defined by equivariant differential operators, with ordinary differential equations in the latent
space of conditional neural fields. We leverage the symmetries of the system for two key aspects
of the proposed method: first by making the relation between signals and corresponding latents
equivariant; second, by using equivariant ODEs, namely ODEs defined by equivariant vector fields: if $\frac{dz}{d\tau} = F(z)$ is such that $F(gz) = gF(z)$, then solutions are mapped to solutions by the group action.
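As a toy numerical illustration of this last property (a hypothetical example, not our model): for a planar linear vector field that commutes with rotations, integrating from a rotated initial point gives the rotation of the original trajectory.

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# A = a*I + b*J commutes with all 2D rotations, so F(z) = A z satisfies F(Rz) = R F(z).
A = 0.1 * np.eye(2) + 0.5 * np.array([[0.0, -1.0], [1.0, 0.0]])
F = lambda z: A @ z

def euler(z0, dt=0.01, steps=200):
    z = z0.copy()
    for _ in range(steps):
        z = z + dt * F(z)
    return z

R = rot(0.7)
z0 = np.array([1.0, 0.3])
# Solving from R z0 equals rotating the solution obtained from z0 (the commutation of Fig. 2).
assert np.allclose(euler(R @ z0), R @ euler(z0), atol=1e-8)
```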
3 Method
We adapt the work of [49] and consider the following optimization problem:
$$\min_{\theta, \psi, z_0^\nu} \; \mathbb{E}_{\nu \in \mathcal{D},\, x \in \mathcal{X},\, t \in [T]} \; \big\| \nu_t(x) - f_\theta(x; z_t^\nu) \big\|_2^2\,, \quad \text{where } z_t^\nu = z_0^\nu + \int_0^t F_\psi(z_\tau^\nu)\, d\tau, \qquad (1)$$
with $f_\theta(x; z_t^\nu)$ a decoder tasked with reconstructing state $\nu_t$ from latent $z_t^\nu$, and $F_\psi$ a neural ODE that maps a latent to its temporal derivative, $\frac{dz_\tau^\nu}{d\tau} = F_\psi(z_\tau^\nu)$, modelling the solution as a flow in latent space starting at the initial latent $z_0^\nu$ - see Fig. 1 for a visual intuition.
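In code, Eq. (1) amounts to the following sketch (hypothetical names; a fixed-step Euler discretisation stands in for a proper ODE solver):

```python
def rollout_loss(f_theta, F_psi, z0, coords, states, dt=1.0, substeps=10):
    """states[t] ~ nu_t(coords); z0 is the latent fitted to the initial state nu_0."""
    z, loss = z0, 0.0
    for t in range(len(states)):
        pred = f_theta(coords, z)                      # reconstruct nu_t from z_t
        loss = loss + ((pred - states[t]) ** 2).mean()
        for _ in range(substeps):                      # Euler: z <- z + (dt/substeps) * F_psi(z)
            z = z + (dt / substeps) * F_psi(z)
    return loss / len(states)
```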
We impose equivariance for two reasons: (1) we want our model to have, by construction, the geometric properties that the modelled system is known to possess, and (2) to get more structured latent representations and facilitate the job of the neural ODE. To achieve this we first need the latent space $\mathcal{Z}$ to be equipped with a well-defined group action with respect to which $\forall g \in G, z \in \mathcal{Z}: F_\psi(gz) = gF_\psi(z)$, and, most importantly, we need the relation between the reconstructed field and the corresponding latent to be equivariant, i.e.,
$$\forall g \in G,\, x \in \mathcal{X}: \quad L_g f_\theta(x; z_t^\nu) = f_\theta(g^{-1}x; z_t^\nu) = f_\theta(x; g z_t^\nu). \qquad (2)$$
Note that, somewhat imprecisely, we call this condition equivariance to convey the idea even though it is not, strictly speaking, the commonly used definition of equivariance for general operators. If we consider the decoder as a mapping from latents to fields, we can make the notion of equivariance of this mapping more precise. Namely,
$$f(x) = D_\theta(z), \quad D_\theta: z_t^\nu \mapsto f_\theta(\,\cdot\,; z_t^\nu), \qquad f(g^{-1}x) = D_\theta(gz), \quad D_\theta: g z_t^\nu \mapsto f_\theta(g^{-1}\,\cdot\,; z_t^\nu). \qquad (3)$$
In Sec. 3.1 we describe the Equivariant Neural Field (ENF)-based decoder, which satisfies equation (2). In Sec. 3.2 we outline the graph-based equivariant neural ODE. Sec. 3.3 explains the motivation for, and use of, meta-learning for obtaining the ENF backbone parameters. We show how the combination of equivariance and meta-learning produces much more structured latent representations of continuous signals (Fig. 3).
3.1 Representing PDE states with Equivariant Neural Fields

Figure 2: The proposed framework respects predefined symmetries of the PDE: a rotated solution $L_g \nu_T$ may be obtained either by solving from latent $z_0^\nu$ (top-left) and transforming the solution $z_T^\nu$ (top-right) to $g z_T^\nu$ (bottom-right), or by transforming $z_0^\nu$ to $g z_0^\nu$ (bottom-left) and solving from this.

We briefly recap ENFs here, referring the reader to [48] for more detail. We extend ENFs to symmetries for PDEs over varying geometries.
ENFs as cross-attention over bi-invariant attributes. Attention-based conditional neural fields
represent a signal ν ∈ D with a corresponding latent set z ν [50]. This class of conditional neural fields
obtain signal-specific reconstructions ν(x) ≈ fθ (x; z ν ) through a cross-attention operation between
the latent set z ν and input coordinates x. ENFs [48] extend this approach by imposing equivariance
constraints w.r.t a group G ⊆ SE(n) on the relation between the neural field and the latents such
that transformations to the signal ν correspond to transformation of the latent z ν (Eq. (2)). For this
condition to hold, we need a well-defined action on the latent space $\mathcal{Z}$ of $f_\theta$. To this end, ENFs define elements of the latent set $z^\nu$ as tuples of a pose $p_i \in G$ and a context vector $c_i \in \mathbb{R}^d$: $z^\nu := \{(p_i, c_i)\}_{i=1}^N$. The latent space is then equipped with a group action defined as $gz = \{(g p_i, c_i)\}_{i=1}^N$. To achieve equivariance, ENFs follow [5], where equivariance is achieved through convolutional weight-sharing over equivalence classes of point pairs $x, x'$; ENFs extend this weight-sharing to cross-attention over bi-invariant attributes of $z, x$ pairs.
Weight-sharing over bi-invariant attributes of z, x is motivated by Eq. 2, by which we have:
$$f_\theta(x; z) = f_\theta(gx; gz). \qquad (4)$$
Intuitively, the above equation says that a transformation $g$ on the domain of $f_\theta$, i.e. $g^{-1}x$, can be undone by also acting with $g$ on $z$. In other words, the output of the neural field $f_\theta$ should be bi-invariant to $g$-transformations of the pair $z, x$. For a specific pair $(z_i, x_m) \in \mathcal{Z} \times \mathcal{X}$, the term bi-invariant attribute $a_{i,m}$ describes a function $a: (z_i, x_m) \mapsto a(z_i, x_m)$ such that $a(z_i, x_m) = a(g z_i, g x_m)$. Throughout the paper we use $a_{i,m}$ as shorthand for $a(z_i, x_m)$.
To parameterize $f_\theta$, we can accordingly choose any function that is bi-invariant to $G$-transformations of $z, x$. In particular, for an input coordinate $x_m$, ENFs choose to make $f_\theta$ a cross-attention operation between attributes $a_{i,m}$ and the invariant context vectors $c_i$:
$$f_\theta(x_m, z) = \mathrm{cross\_attn}(a_{:,m}, c_:, c_:). \qquad (5)$$
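A minimal single-head sketch of Eq. (5) (assumed PyTorch; the full ENF uses multiple heads and further conditioning, which we omit here): queries are functions of the bi-invariants $a_{i,m}$ only, while keys and values come from the context vectors $c_i$, so geometry enters solely through the bi-invariant.

```python
import torch
import torch.nn as nn

class BiInvariantCrossAttention(nn.Module):
    """f_theta(x_m, z) = cross_attn(a_{:,m}, c_:, c_:) -- single-head sketch."""
    def __init__(self, a_dim, c_dim, hidden=64, d_out=1):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(a_dim, hidden), nn.GELU(), nn.Linear(hidden, hidden))
        self.k = nn.Linear(c_dim, hidden)
        self.v = nn.Linear(c_dim, hidden)
        self.out = nn.Linear(hidden, d_out)

    def forward(self, a, c):
        # a: (B, N_latents, M_coords, a_dim) bi-invariants, c: (B, N_latents, c_dim)
        q = self.q(a)                                   # queries from bi-invariants
        k = self.k(c)[:, :, None, :]                    # keys from contexts
        v = self.v(c)[:, :, None, :]                    # values from contexts
        logits = (q * k).sum(-1) / q.shape[-1] ** 0.5   # (B, N, M)
        att = logits.softmax(dim=1)                     # attend over the N latents
        field = (att[..., None] * v).sum(dim=1)         # (B, M, hidden)
        return self.out(field)                          # (B, M, d_out)
```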
As an example, for $SE(n)$-equivariance we can define the bi-invariant simply using the group action: $a_{i,m}^{SE(n)} = p_i^{-1} x_m = R_i^T (x_m - x_i)$, which is bi-invariant by:
$$\forall g \in SE(n): \; (p_i, x) \mapsto (g p_i, g x) \;\Leftrightarrow\; p_i^{-1} x \mapsto (g p_i)^{-1} g x = p_i^{-1} g^{-1} g x = p_i^{-1} x. \qquad (6)$$
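A numeric check of Eq. (6) in the SE(2) case (a NumPy sketch with hypothetical helper names):

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def a_se2(pose, x):
    """Bi-invariant of Eq. (6): p^{-1} x = R^T (x - x_i) for pose p = (x_i, theta_i)."""
    t, theta = pose
    return rot(theta).T @ (x - t)

rng = np.random.default_rng(0)
pose = (rng.normal(size=2), 1.2)    # latent pose p_i = (x_i, theta_i)
x = rng.normal(size=2)              # query coordinate x_m

# Act with a random g = (t_g, theta_g) on both pose and coordinate.
t_g, th_g = rng.normal(size=2), 0.7
g_pose = (rot(th_g) @ pose[0] + t_g, th_g + pose[1])
g_x = rot(th_g) @ x + t_g

assert np.allclose(a_se2(pose, x), a_se2(g_pose, g_x))   # bi-invariance holds
```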
Bi-invariant attributes for PDE solving. As explained above, ENF is equivariant to SE(n)-
transformations by defining fθ as a function of an SE(n)−bi-invariant attribute aSE(n) . Although
many physical processes adhere to roto-translational symmetries, we are also interested in solving
PDEs that - due to the geometry of the domain, their specific formulation, and/or their boundary
conditions - are not fully SE(n)−equivariant. As such, we are interested in extending ENFs to
equivariances that are not strictly (subsets of) SE(n), which we show we can achieve by finding
bi-invariants that respect these particular transformations. Below, we provide two examples; the other invariants we use in the experiments - including a "bi-invariant" $a^\emptyset$ that is not actually bi-invariant to any geometric transformations, which we use to ablate over equivariance constraints - are given in Appx. D.
The flat 2-torus. When the physical domain of interest is continuous and extends indefinitely, periodic
boundary conditions are often used, i.e. the PDE is defined over a space topologically equivalent
to that of the 2-torus. Such boundary conditions break SO(2) symmetries; assuming the domain
has periodicity π and none of the terms of this PDE depend on the choice of coordinate frame,
these boundary conditions imply that the PDE is equivariant to periodic translations: the group of
translations modulo $\pi$: $\mathbb{T}^2 \equiv \mathbb{R}^2/\mathbb{Z}^2$. In this case, periodic functions over $x, y$ with period $\pi$ would work as a bi-invariant, i.e. using poses $p \in \mathbb{T}^2$, $a^{\mathbb{T}^2} = \cos(2\pi(x^0 - p^0)) + \cos(2\pi(x^1 - p^1))$ - which happens to be bi-invariant to rotations by $\frac{\pi}{2}$ as well. Instead, since we do not assume any rotational symmetries to exist on the torus, we opt for a non-rotationally symmetric function:
$$a_{i,m}^{\mathbb{T}^2} = \cos(2\pi(x_m^0 - p_i^0)) \oplus \cos(2\pi(x_m^1 - p_i^1)), \qquad (7)$$
where ⊕ denotes concatenation. This bi-invariant is used in experiments on Navier-Stokes over the
flat 2-Torus.
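A corresponding sketch for Eq. (7) (NumPy; assuming coordinates and poses normalised to $[0, 1)$ so the period matches the $2\pi$ inside the cosines), verifying invariance under a joint periodic translation:

```python
import numpy as np

def a_torus(p, x):
    """Eq. (7): concatenated cosines of coordinate-pose offsets on the flat 2-torus."""
    return np.array([np.cos(2 * np.pi * (x[0] - p[0])),
                     np.cos(2 * np.pi * (x[1] - p[1]))])

rng = np.random.default_rng(1)
p, x = rng.random(2), rng.random(2)
t = rng.random(2)                       # a periodic translation of the torus
assert np.allclose(a_torus(p, x), a_torus((p + t) % 1.0, (x + t) % 1.0))
```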
The 2-sphere. In some settings a PDE may be symmetric only to rotations about a certain axis. An example is that of the global shallow-water equations on the two-sphere - used to model geophysical processes such as atmospheric flow [16] - which are characterised by rotational symmetry only about the Earth's axis of rotation, due to the inclusion of a term for the Coriolis acceleration that breaks full $SO(3)$ equivariance. We use poses $p \in SO(3)$ parametrised by Euler angles $\phi, \theta, \gamma$, and spherical coordinates $\phi, \theta$ for $x \in S^2$. We make the first two Euler angles coincide with the spherical coordinates and define a bi-invariant for rotations around the axis $\theta = \pi$:
$$a_{i,m}^{SW} = \Delta\phi_{p_i, x_m} \oplus \theta_{p_i} \oplus \gamma_{p_i} \oplus \theta_{x_m}, \qquad (8)$$
where $\Delta\phi_{p_i, x_m} = \phi_{p_i} - \phi_{x_m}$, corrected for periodicity by subtracting $2\pi$ if $\phi_{p_i} - \phi_{x_m} > \pi$ and adding $2\pi$ if $\phi_{p_i} - \phi_{x_m} < -\pi$.
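The wrapped longitude offset $\Delta\phi$ of Eq. (8) can be written compactly; the sketch below (NumPy, angles in radians, hypothetical helper names) also checks that the attribute is unchanged when pose and coordinate are jointly rotated about the symmetry axis.

```python
import numpy as np

def delta_phi(phi_p, phi_x):
    """Longitude offset of Eq. (8), wrapped back into (-pi, pi]."""
    d = phi_p - phi_x
    if d > np.pi:
        d -= 2 * np.pi
    elif d < -np.pi:
        d += 2 * np.pi
    return d

def a_sw(pose, coord):
    """pose = (phi_p, theta_p, gamma_p) Euler angles; coord = (phi_x, theta_x) on S^2."""
    phi_p, theta_p, gamma_p = pose
    phi_x, theta_x = coord
    return np.array([delta_phi(phi_p, phi_x), theta_p, gamma_p, theta_x])

# Rotating pose and coordinate about the rotation axis shifts both longitudes by alpha,
# leaving the attribute unchanged.
alpha = 0.9
p, c = (0.3, 1.1, -0.4), (2.8, 0.7)
p_rot = ((p[0] + alpha) % (2 * np.pi), p[1], p[2])
c_rot = ((c[0] + alpha) % (2 * np.pi), c[1])
assert np.allclose(a_sw(p, c), a_sw(p_rot, c_rot))
```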
In summary, to parameterize an ENF equivariant with respect to a specific group we are simply
required to find attributes that are bi-invariant with respect to the same group. In general we achieve
this by using group-valued poses and their action on the PDE domain.
3.2 Modelling latent dynamics with an equivariant neural ODE

Let $z_0^\nu$ be a latent set that faithfully reconstructs the initial state $\nu_0$. We want to define a neural ODE $F_\psi$ that maps latents $z_t^\nu$ to their temporal derivatives, $\frac{dz_\tau^\nu}{d\tau} = F_\psi(z_\tau^\nu)$, and that is equivariant with respect to the group action: $gF_\psi(z_\tau^\nu) = F_\psi(g z_\tau^\nu)$. To this end, we use a message passing neural network (MPNN) to learn a flow of poses $p_i$ and contexts $c_i$ over time. We base our architecture on PONITA [5], which employs convolutional weight-sharing over bi-invariants for $SE(n)$. For an in-depth recap of message-passing frameworks, we refer the reader to Appx. A. Since $F_\psi$ is required to be equivariant w.r.t. the group action, any updates to the poses $p_i$ should also be equivariant. [40] propose to parameterize an equivariant node position update by using a basis spanned by relative node positions $x_j - x_i$. In our setting, poses $p_i$ are points on a manifold $M$ equipped with a group action. As such, we analogously propose parameterizing pose updates by a weighted combination of logarithmic maps $\log_{p_i}(p_j)$, which intuitively describe the relative position between $p_i, p_j$ in the tangent space $T_{p_i}M$, i.e. the displacement from $p_i$ to $p_j$. We integrate the resulting pose update over the manifold through the exponential map $\exp_{p_i}$. In the Euclidean case $\log_{p_i}(p_j) = x_j - x_i$ and we recover the node position updates of [40]; a sketch of this update is given below. In short, the message passing layers we use combine equivariant updates of the context vectors with these pose updates.
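A sketch of this pose update in two simple cases (hypothetical helpers; the Euclidean plane, where it reduces to the relative-position update of [40], and a periodic coordinate, where the logarithmic map is a wrapped difference):

```python
import numpy as np

# Euclidean plane: log_p(q) = q - p, exp_p(v) = p + v  (recovers [40]-style updates).
def log_euclid(p, q):  return q - p
def exp_euclid(p, v):  return p + v

# Circle / periodic coordinate of period 2*pi: log is the wrapped difference.
def log_circle(p, q):  return (q - p + np.pi) % (2 * np.pi) - np.pi
def exp_circle(p, v):  return (p + v) % (2 * np.pi)

def pose_update(poses, weights, log, exp):
    """p_i <- exp_{p_i}( sum_j w_ij * log_{p_i}(p_j) ); weights come from learned messages."""
    new = []
    for i, p_i in enumerate(poses):
        v = sum(weights[i][j] * log(p_i, p_j) for j, p_j in enumerate(poses))
        new.append(exp(p_i, v))
    return new

poses = [np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([-1.0, 0.5])]
w = np.full((3, 3), 0.1)
print(pose_update(poses, w, log_euclid, exp_euclid))
```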
Figure 3: We show the impact of meta-learning and equivariance on the latent space of the ENF when representing trajectories of PDE states. Fig. 3a shows a T-SNE plot of the latent space of $f_\theta$ when $z_t^\nu$ is optimized with autodecoding and no weight sharing over bi-invariants is enforced. Fig. 3b shows the latent space when meta-learning is used, but no weight sharing is enforced. Fig. 3c shows the latent space when $z_t^\nu$ are obtained using meta-learning and $f_\theta$ shares weights over $a^{SE(n)}$.
3.3 Meta-learning the ENF backbone

Until now we have not discussed how to obtain the latent corresponding to the initial condition, $z_0^\nu$. An approach often used in the conditional neural field literature is autodecoding [35], where latents $z^\nu$ are optimized for reconstruction of the input signal $\nu$ with SGD. Optimizing a NeF for reconstruction does not necessarily lead to good quality representations [34], i.e. using MSE-based autodecoding to obtain latents $z_t^\nu$ - as proposed by [49] - may complicate the latent space, impeding optimization of the neural ODE $F_\psi$. Moreover, autodecoding requires many optimization steps at inference (for reference, [49] use 300-500 steps). [13] propose meta-learning as a way to overcome long inference times, as it allows for fitting latents in a few steps - typically three or four. We hypothesize that meta-learning may also structure the latent space - similar to the impact of equivariance constraints - since the very limited number of optimization steps requires efficient organization of latents $z_t^\nu$ around the (shared) initialization, pulling together the latent representations of contiguous states. To this end, we propose to use meta-learning to obtain the initial latent $z_0^\nu$, which is then unrolled by the neural ODE $F_\psi$ to find solutions $z_t^\nu$; a sketch of the inner loop is given below.
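A sketch of such an inner loop (assumed PyTorch; hypothetical names): starting from a shared, learned initialisation, $z_0^\nu$ is adapted to the initial state in a few differentiable SGD steps, so that the outer loop can still update the ENF parameters and the initialisation.

```python
import torch

def fit_initial_latent(f_theta, z_init, coords, nu0, inner_steps=3, inner_lr=1.0):
    """Hypothetical inner loop: adapt the shared latent initialisation to one initial state nu_0."""
    z = z_init
    for _ in range(inner_steps):
        loss = ((f_theta(coords, z) - nu0) ** 2).mean()
        # create_graph=True keeps the adaptation differentiable, so the outer loop
        # can still update the ENF parameters theta and the initialisation z_init.
        (grad,) = torch.autograd.grad(loss, z, create_graph=True)
        z = z - inner_lr * grad
    return z
```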
As a first validation of the hypotheses that both equivariance constraints and meta-learning introduce
structure to the latent space of fθ , we visualize latent spaces of different variants of the ENF. We fit
ENFs to a dataset consisting of solutions to the heat equation for various initial conditions (details
in Appx. E). For each sample νt , we obtain a set of latents ztν , which we average over the invariant
context vectors ci ∈ Rc to obtain a single vector in Rc invariant to a group action according to the
chosen bi-invariant. Next, we apply T-SNE [46] to the resulting vectors in Rc . We use three different
setups: (a) no meta-learning, model weights θ and latents ztν optimized for every νt separately using
autodecoding [35], and no equivariance imposed (per Eq. 15), shown in Fig. 3a. (b) meta-learning
is used to obtain θ,ztν , but no equivariance imposed, shown in Fig. 3b and (c) meta-learning is
used to obtain θ,ztν and SE(2)-equivariance is imposed by weight-sharing over aSE(n) bi-invariants,
shown in Fig. 3c. The results confirm our intuition that both meta-learning and equivariance improve
latent-space structure.
Recap: optimization objective. We use a meta-learning inner loop [28, 13] to obtain the initial latent $z_0^\nu$ under supervision of coordinate-value pairs $(x, \nu_0(x))_{x \in \mathcal{X}}$ from $\nu_0$. This latent is unrolled for $t_{\text{train}}$ timesteps using $F_\psi$. The obtained latents $z_t^\nu$ are used to reconstruct states $\nu_t$ along the trajectory of $\nu$, and the parameters of $f_\theta, F_\psi$ are optimised for reconstruction MSE, as shown in the left-hand side of Eq. 1. See Appx. B for detailed pseudocode of this process.
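As a rough outline of one training step (hypothetical names, reusing the fit_initial_latent sketch above and an Euler discretisation of the latent ODE; the precise procedure is given in Appx. B):

```python
def training_step(f_theta, F_psi, z_init, coords, trajectory, optimizer, dt=1.0):
    """trajectory[t] ~ nu_t(coords) for t = 0..t_train; z_init is the shared latent initialisation."""
    optimizer.zero_grad()
    z = fit_initial_latent(f_theta, z_init, coords, trajectory[0])  # meta-learned inner loop
    loss = ((f_theta(coords, z) - trajectory[0]) ** 2).mean()
    for t in range(1, len(trajectory)):
        z = z + dt * F_psi(z)                                       # Euler sketch of the latent ODE
        loss = loss + ((f_theta(coords, z) - trajectory[t]) ** 2).mean()
    loss = loss / len(trajectory)
    loss.backward()                                                 # updates theta, psi and z_init
    optimizer.step()
    return loss.item()
```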
4 Experiments
We intend to show the impact of symmetry-preservation in continuous PDE solving. To this end we
perform a range of experiments assessing different qualities of our model on tasks with different
symmetries. First, we investigate the equivariance properties of our framework by evaluating it
against unseen geometric transformations of the initial conditions. Next, we assess generalization
and extrapolation capabilities w.r.t. unseen spatial locations and time horizons inside and outside the
time ranges seen during training respectively, robustness to partial test-time observations, and data-
efficiency. As the continuous nature of NeF-based PDE solving allows, we verify these properties for PDEs defined over challenging geometries: the plane $\mathbb{R}^2$, the 2-torus $\mathbb{T}^2$, the sphere $S^2$, and the 3D ball $B^3$. Architectural details and hyperparameters are in Appx. E. Code is attached to the submission.
Evaluation. All reported MSE values are for predictions obtained given only the initial condition $\nu_0$, with std over 3 runs. For both train and test sets we evaluate two settings: a generalization setting with the time evolution happening within the horizon seen during training ($t_{in}$); and an extrapolation setting with the time evolution happening outside the horizon seen during training ($t_{out}$). For both cases we measure the mean-squared error (MSE). To position our work relative to competitive data-driven PDE solvers, we provide comparisons with a range of baselines on the 2D Navier-Stokes experiment.
Figure 5: A Navier-Stokes test sample (top) and corresponding predictions from our model (bottom). We visualize predictions in the train horizon $t_{in} = [0, ..., 9]$, $t_{out} = [10, ..., 20]$ and beyond. The model remains stable well beyond the train horizon, but due to accumulated errors fails to capture dynamics beyond $t > 40$.

Figure 6: Test MSE $t_{in}$ for increasing training set sizes for the heat equation over the sphere. The equivariant model improves over the non-equivariant one. For reference we show the performance of DINo [49] trained on 256 trajectories.
In most other settings these models cannot straightforwardly be applied, and we only compare to [49], to our knowledge the only other fully continuous PDE solving method in the literature.
Equivariance properties - heat equation on the plane. To verify our framework respects the posed
equivariance constraints, we create a dataset of solutions to the heat equation that requires a neural
solver to respect equivariance constraints to achieve good performance. Specifically, for initial condi-
tions we randomly insert a pulse of variable intensity in x = (x1 , x2 ) ∈ R2 s.t. −1<x1 <1, 0<x2 <1
for the training data and −1<x1 <1, −1<x2 <0 for the test data. Intuitively, train and test sets
contain spikes under different disjoint sets of roto-translations (see Fig. 4). We train variants of
our framework with (aSE(2) , Eq. 6) and without (a∅ , Eq. 15) equivariance constraints. In this
dataset, we set tin = [0, ..., 9], and evaluation horizon tout = [10, ..., 20]. Results in Tab. 1 show that
the non-equivariant model, as well as the baseline [49] are unable to successfully solve test initial
conditions, whereas the equivariant model performs well.
Sparse observations - Navier-Stokes on the flat 2-torus. We assess the impact of equivariance constraints and meta-learning on robustness to sparse test-time observations of the initial condition. To this end, we train a model with ($a^{\mathbb{T}^2}$, Eq. 7), without ($a^\emptyset$, Eq. 15) equivariance constraints, and one with equivariance constraints and without meta-learning (AD $a^{\mathbb{T}^2}$, Eq. 7), on a fully-observed train set. The training horizon is $t_{in} = [0, ..., 9]$, and the evaluation horizon $t_{out} = [10, ..., 20]$.

Table 2: MSE ↓ on Navier-Stokes on the flat 2-torus, with 100% and 50% of the initial state $\nu_0$ observed.

                              tIN TRAIN          tOUT TRAIN         tIN TEST           tOUT TEST
100% OF ν0 OBSERVED
CNODE [2]                     6.02E-02           3.35E-01           5.48E-02           3.17E-01
FNO                           9.43E-05           2.11E-03           8.44E-05           1.60E-03
G-FNO                         3.13E-05           3.49E-04           3.15E-05           3.52E-04
DINo [49]                     8.20E-03           6.85E-02           1.11E-02           9.08E-02
Ours AD, $a^{\mathbb{T}^2}$   5.60E-02 ±0.43     3.70E-01 ±0.34     6.75E-02 ±0.62     4.00E-01 ±0.38
Ours $a^\emptyset$            1.41E-02 ±1.83     1.67E-01 ±1.27     2.60E-02 ±3.16     2.14E-01 ±1.46
Ours $a^{\mathbb{T}^2}$       1.45E-03 ±0.08     9.14E-03 ±0.36     1.57E-03 ±0.09     1.16E-02 ±0.14
50% OF ν0 OBSERVED
CNODE [2]                     1.38E-01           6.33E-01           1.52E-01           6.76E-01
FNO                           3.31E-02           1.39E-01           3.20E-02           1.47E-01
G-FNO                         2.75E-02           1.17E-01           2.32E-02           1.01E-01
DINo [49]                     3.67E-02           2.81E-01           3.74E-02           2.83E-01
Ours AD, $a^{\mathbb{T}^2}$   6.89E-02 ±2.68     3.95E-01 ±2.18     7.01E-02 ±3.56     4.01E-01 ±2.29
Data-efficiency - Diffusion on the sphere. To assess the impact of equivariance on data efficiency,
we vary the size of the training set of heat equation solutions from 16 to 64 trajectories and apply a
model with ($a^{SO(3)}$, Eq. 13) and without ($a^\emptyset$, Eq. 15) equivariance constraints. In this dataset, we set $t_{in} = [0, ..., 9]$, and evaluation horizon $t_{out} = [10, ..., 20]$. We visualize $t_{in}$ test and train MSE in Fig. 6. These results show that the non-equivariant model overfits the training set for smaller numbers of trajectories and is unable to solve the PDE satisfactorily, whereas the equivariant model generalizes well even with only 16 training trajectories.
Super-resolution - Shallow-Water on the sphere. Due to their continuous nature, NeF-based approaches inherently support zero-shot super-resolution. In this setting, we generate a set of solutions for the global shallow-water equations.

Table 3: MSE ↓ on Shallow-Water equations on the sphere (columns: tIN TRAIN, tOUT TRAIN, tIN TEST, tOUT TEST).
References
[1] Ilze Amanda Auzina, Çağatay Yıldız, Sara Magliacane, Matthias Bethge, and Efstratios Gavves.
Modulated neural odes. Advances in Neural Information Processing Systems, 36, 2024.
[2] Ibrahim Ayed, Emmanuel De Bezenac, Arthur Pajot, and Patrick Gallinari. Learning the spatio-
temporal dynamics of physical processes from partial observations. In ICASSP 2020-2020
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages
3232–3236. IEEE, 2020.
[3] Matthias Bauer, Emilien Dupont, Andy Brock, Dan Rosenbaum, Jonathan Richard Schwarz,
and Hyunjik Kim. Spatial functa: Scaling functa to imagenet classification and generation.
arXiv preprint arXiv:2302.03130, 2023.
[4] Erik J Bekkers. B-spline cnns on lie groups. In International Conference on Learning Repre-
sentations, 2019.
[5] Erik J Bekkers, Sharvaree Vadgama, Rob D Hesselink, Putri A van der Linden, and David W
Romero. Fast, expressive se (n) equivariant networks through weight-sharing in position-
orientation space. arXiv preprint arXiv:2310.02970, 2023.
[6] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling.
Geometric and physical quantities improve e (3) equivariant message passing. arXiv preprint
arXiv:2110.02905, 2021.
[7] Johannes Brandstetter, Rianne van den Berg, Max Welling, and Jayesh K Gupta. Clifford neural
layers for pde modeling. arXiv preprint arXiv:2209.04934, 2022.
[8] Johannes Brandstetter, Daniel Worrall, and Max Welling. Message passing neural pde solvers.
arXiv preprint arXiv:2202.03376, 2022.
[9] Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning:
Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
[10] Keaton J. Burns, Geoffrey M. Vasil, Jeffrey S. Oishi, Daniel Lecoanet, and Benjamin P. Brown.
Dedalus: A flexible framework for numerical simulations with spectral methods. Physical
Review Research, 2(2):023068, April 2020. doi: 10.1103/PhysRevResearch.2.023068.
[11] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary
differential equations. Advances in neural information processing systems, 31, 2018.
[12] Taco Cohen and Max Welling. Group equivariant convolutional networks. In International
conference on machine learning, pages 2990–2999. PMLR, 2016.
[13] Emilien Dupont, Hyunjik Kim, SM Eslami, Danilo Rezende, and Dan Rosenbaum. From
data to functa: Your data point is a function and you can treat it like one. arXiv preprint
arXiv:2201.12204, 2022.
[14] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adap-
tation of deep networks. In International conference on machine learning, pages 1126–1135.
PMLR, 2017.
[15] Marc Finzi, Samuel Stanton, Pavel Izmailov, and Andrew Gordon Wilson. Generalizing
convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In
International Conference on Machine Learning, pages 3165–3176. PMLR, 2020.
[16] Joseph Galewsky, Richard K Scott, and Lorenzo M Polvani. An initial-value problem for testing
numerical models of the global shallow-water equations. Tellus A: Dynamic Meteorology and
Oceanography, 56(5):429–440, 2004.
[17] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural
message passing for quantum chemistry. In International conference on machine learning,
pages 1263–1272. PMLR, 2017.
[18] Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks. Advances
in neural information processing systems, 32, 2019.
[19] Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional neural networks for steady flow ap-
proximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge
discovery and data mining, pages 481–490, 2016.
[20] Jacob Helwig, Xuan Zhang, Cong Fu, Jerry Kurtin, Stephan Wojtowytsch, and Shuiwang Ji. Group equivariant fourier neural operators for partial differential equations. In Proceedings of the 40th International Conference on Machine Learning, PMLR 202, 2023.
[21] Quercus Hernández, Alberto Badías, David González, Francisco Chinesta, and Elías Cueto.
Structure-preserving neural networks. Journal of Computational Physics, 426:109950, 2021.
[22] Pengzhan Jin, Zhen Zhang, Aiqing Zhu, Yifa Tang, and George Em Karniadakis. Sympnets:
Intrinsic structure-preserving symplectic networks for identifying hamiltonian systems. Neural
Networks, 132:166–179, 2020.
[23] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
[24] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional
networks. arXiv preprint arXiv:1609.02907, 2016.
[25] David M Knigge, David W Romero, and Erik J Bekkers. Exploiting redundancy: Separable
group convolutional networks on lie groups. In International Conference on Machine Learning,
pages 11359–11386. PMLR, 2022.
[26] Miltiadis Miltos Kofinas, Erik Bekkers, Naveen Nagaraja, and Efstratios Gavves. Latent field
discovery in interacting dynamical systems with neural fields. Advances in Neural Information
Processing Systems, 36, 2023.
[27] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya,
Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function
spaces. arXiv preprint arXiv:2108.08481, 2021.
[28] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-sgd: Learning to learn quickly for
few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
[29] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya,
Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differen-
tial equations. arXiv preprint arXiv:2010.08895, 2020.
[30] Yongtuo Liu, Sara Magliacane, Miltiadis Kofinas, and Efstratios Gavves. Graph switching
dynamical systems. In International Conference on Machine Learning, pages 21867–21883.
PMLR, 2023.
[31] Yongtuo Liu, Sara Magliacane, Miltiadis Kofinas, and Efstratios Gavves. Amortized equation
discovery in hybrid dynamical systems, 2024.
[32] Philipp Moser, Wolfgang Fenz, Stefan Thumfart, Isabell Ganitzer, and Michael Giretzlehner.
Modeling of 3d blood flows with physics-informed neural networks: Comparison of network
architectures. Fluids, 8(2):46, 2023.
[33] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms.
arXiv preprint arXiv:1803.02999, 2018.
[34] Samuele Papa, David M Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-
Jakob Sonke, and Efstratios Gavves. Neural modulation fields for conditional cone beam neural
tomography. arXiv preprint arXiv:2307.08351, 2023.
[35] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove.
Deepsdf: Learning continuous signed distance functions for shape representation. In Proceed-
ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 165–174,
2019.
[36] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film:
Visual reasoning with a general conditioning layer. In Proceedings of the AAAI conference on
artificial intelligence, volume 32, 2018.
[37] Adeel Pervez, Francesco Locatello, and Efstratios Gavves. Mechanistic neural networks for
scientific machine learning. arXiv preprint arXiv:2402.13077, 2024.
[38] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W Battaglia. Learning
mesh-based simulation with graph networks. arXiv preprint arXiv:2010.03409, 2020.
[39] Michael Prasthofer, Tim De Ryck, and Siddhartha Mishra. Variable-input deep operator
networks. arXiv preprint arXiv:2205.11404, 2022.
[40] Vıctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E (n) equivariant graph neural
networks. In International conference on machine learning, pages 9323–9332. PMLR, 2021.
[41] Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, and Gordon Wetzstein. Metasdf:
Meta-learning signed distance functions. Advances in Neural Information Processing Systems,
33:10136–10147, 2020.
[42] George Gabriel Stokes et al. On the effect of the internal friction of fluids on the motion of
pendulums. 1851.
[43] Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan,
Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, and Ren Ng. Fourier features let
networks learn high frequency functions in low dimensional domains. Advances in neural
information processing systems, 33:7537–7547, 2020.
[44] Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P Srinivasan, Jonathan T
Barron, and Ren Ng. Learned initializations for optimizing coordinate-based neural repre-
sentations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 2846–2855, 2021.
[45] Riccardo Valperga, Kevin Webster, Dmitry Turaev, Victoria Klein, and Jeroen Lamb. Learning
reversible symplectic dynamics. In Proceedings of The 4th Annual Learning for Dynamics
and Control Conference, volume 168 of Proceedings of Machine Learning Research, pages
906–916. PMLR, 23–24 Jun 2022.
[46] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine
learning research, 9(11), 2008.
[47] Maurice Weiler and Gabriele Cesa. General e (2)-equivariant steerable cnns. Advances in neural
information processing systems, 32, 2019.
[48] David R Wessels, David M Knigge, Samuele Papa, Riccardo Valperga, Efstratios Gavves, and Erik J Bekkers. Grounding continuous representations in geometry: Equivariant neural fields. arXiv preprint, 2024.
[49] Yuan Yin, Matthieu Kirchmeyer, Jean-Yves Franceschi, Alain Rakotomamonjy, and Patrick
Gallinari. Continuous pde dynamics forecasting with implicit neural representations. arXiv
preprint arXiv:2209.14855, 2022.
[50] Biao Zhang, Jiapeng Tang, Matthias Niessner, and Peter Wonka. 3dshape2vecset: A 3d shape
representation for neural fields and generative diffusion models. ACM Transactions on Graphics
(TOG), 42(4):1–16, 2023.
[51] Maksim Zhdanov, David Ruhe, Maurice Weiler, Ana Lucic, Johannes Brandstetter, and Patrick
Forré. Clifford-steerable convolutional neural networks. arXiv preprint arXiv:2402.14730,
2024.
[52] David Zwicker. py-pde: A python package for solving partial differential equations. Journal
of Open Source Software, 5(48):2158, 2020. doi: 10.21105/joss.02158. URL https://doi.
org/10.21105/joss.02158.
A Related work
DL approaches to dynamics modelling In recent years, the learning of spatiotemporal dynamics has been
receiving significant attention, either for modelling interacting systems [31, 30], scientific Machine Learning
[49, 8, 7, 37, 26, 51], or even videos [1]. Most DL methods for solving PDEs attempt to directly replace solvers
with mappings between finite-dimensional Euclidean spaces, i.e. through the use of CNNs [19, 2] or GNNs
[38, 8] often applied autoregressively to an observed (discretized) PDE state. Instead, the Neural Operator (NO)
[27] paradigm attempts to learn infinite-dimensional operators, i.e. mappings between function spaces, with
limited success. Fourier Neural Operator (FNO) [29] extends this method by performing convolutions in the
spectral domain. FNO obtains much improved performance, but due to its reliance on FFT is limited to data on
regular grids.
Inductive biases in DL and dynamics modelling Geometric Deep Learning aims to improve model
generalization and performance by constraining/designing a model’s space of learnable functions based on
geometric principles. Prominent examples include Group Equivariant Convolutional Networks and Steerable
CNNs [12, 4], generalizations of CNNs that respect symmetries of the data - such as dilations and continuous
rotations [47, 15, 25]. Analogously, Graph Neural Networks (GNNs) [24] and Message Passing Neural Networks (MPNNs) [17] are a variant of neural network that respects the set-permutation symmetries naturally found in graph data. They are typically formulated for graphs $G = (V, E)$, with nodes $i \in V$ and edges $E$. Typically, each node is embedded into a node vector $f_i^0$, which is subsequently updated over multiple layers of message passing. Message passing consists of (1) computing messages $m_{i,j}$ over edges $(i, j)$ from node $j$ to $i$ with a message function, taking into account edge attributes $a_{i,j}$: $m_{i,j} = \phi_m(f_i^l, f_j^l, a_{i,j})$; (2) aggregating incoming messages: $m_i = \sum_{j \in \mathcal{N}(i)} m_{i,j}$; (3) computing updated node features $f_i^{l+1} = \phi_u(f_i^l, m_i)$.
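A generic (non-equivariant) message passing layer implementing these three steps might look as follows (an assumed PyTorch sketch, not the architecture used in this paper):

```python
import torch
import torch.nn as nn

class MPNNLayer(nn.Module):
    """One message-passing layer: messages over edges, sum-aggregation, node update."""
    def __init__(self, node_dim, edge_dim, hidden=64):
        super().__init__()
        self.phi_m = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, hidden), nn.SiLU(),
                                   nn.Linear(hidden, hidden))
        self.phi_u = nn.Sequential(nn.Linear(node_dim + hidden, hidden), nn.SiLU(),
                                   nn.Linear(hidden, node_dim))

    def forward(self, f, edge_index, a):
        # f: (num_nodes, node_dim), edge_index: (2, num_edges) rows (receiver i, sender j),
        # a: (num_edges, edge_dim) edge attributes.
        i, j = edge_index
        m_ij = self.phi_m(torch.cat([f[i], f[j], a], dim=-1))            # (1) messages
        m_i = torch.zeros(f.shape[0], m_ij.shape[-1], device=f.device)   # (2) sum-aggregate per receiver
        m_i.index_add_(0, i, m_ij)
        return self.phi_u(torch.cat([f, m_i], dim=-1))                   # (3) node update
```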
Recently, such methods have also been adapted for sparse physical data, e.g. for molecular property prediction
[40, 6] - where the GNN is additionally required to respect transformation symmetries. [5] unifies these
approaches to equivariance under the guise of weight sharing over equivalence classes defined by bi-invariant
attributes of pairs of nodes i, j, a viewpoint we leverage in constructing the equivariant conditioning latent ztν
corresponding to a PDE state νt . In the context of dynamics modelling, equivariant architectures have been
employed to incorporate various properties of physical systems in the modelling process, examples of such
properties are the symplectic structure [22], discrete symmetries such as reversing symmetries [45] and energy
conservation [18, 21].
Neural Fields in dynamics modelling Conditional Neural fields (NeFs) are a class of coordinate-based
neural networks, often trained to reconstruct discretely-sampled input continuously. More specifically, a
conditional neural field $f_\theta: \mathbb{R}^n \to \mathbb{R}^d$ is a field - parameterized by a neural network with parameters $\theta$ - that maps input coordinates $x \in \mathbb{R}^n$ in the data domain alongside conditioning latents $z$ to $d$-dimensional signal values $\nu(x) \in \mathbb{R}^d$. By associating a conditioning latent $z^\nu \in \mathbb{R}^c$ to each signal $\nu$, a single conditional NeF $f_\theta: \mathbb{R}^n \times \mathbb{R}^c \to \mathbb{R}^d$ can learn to represent families $\mathcal{D}$ of continuous signals such that $\forall\, \nu \in \mathcal{D}: \nu(x) \approx f_\theta(x; z^\nu)$. [13] showed the viability of using the latents $z_i$ as representations for downstream tasks (e.g. classification, generation), proposing a framework for learning on neural fields. This framework inherits desirable properties of neural fields, such as inherent support for sparsely and/or irregularly sampled data, and independence of signal resolution. [49] propose to use conditional NeFs for PDE modelling by learning a continuous flow in the latent space of a conditional neural field. In particular, a set of latents $\{z_i^\nu\}_{i=1}^T$ is obtained by fitting a conditional neural field to a given set of observations $\{\nu_i\}_{i=1}^T$ at timesteps $1, \dots, T$; simultaneously, a neural ODE [11] $F_\psi$ is trained to map pairs of temporally contiguous latents s.t. solutions correspond to the trajectories traced by the learned latents. Though this approach yields impressive results for sparse and irregular data in planar PDEs, we show it breaks down on more challenging geometries. We hypothesize that this is due to the lack of a latent space that preserves the relevant geometric transformations with respect to which the systems we are modelling are symmetric, and as such propose an extension of this framework in which such symmetries are preserved.
[28]. Recently, work has also explored the relation between the initialization/optimization of a NeF and its value as a downstream representation; [34] show that (1) using a shared NeF initialization and (2) limiting the number of gradient updates to the NeF improves performance in downstream tasks, as this simplifies the complex relation between a NeF's parameter space and its output function space. We combine these insights and make meta-learning part of our equivariant PDE solving pipeline, as it enables fast inference and we show it simplifies the latent space of the ENF, improving the performance of the neural ODE solver.
Equivariance follows from sharing Q, K, V over equivalence classes. Note that the latent space of the ENF is equipped with a group action as $g z_t^\nu = \{(g p_i, c_i)\}_{i=1}^N$. As an example, $SE(2)$-equivariance of the ENF follows from bi-invariance of the quantity $a$ used to construct $Q$ under the group action:
$$\forall g \in SE(n): \; (p_i, x) \mapsto (g p_i, g x) \;\Leftrightarrow\; p_i^{-1} x \mapsto (g p_i)^{-1} g x = p_i^{-1} g^{-1} g x = p_i^{-1} x. \qquad (11)$$
And so, constructing the matrix containing the relative poses of the bi-transformed poses and coordinates $(g\mathbf{P})^{-1} g x$ as $((g\mathbf{P})^{-1} g x)_{i,:} = p_i^{-1} g^{-1} g x = p_i^{-1} x$, we trivially have that the queries, and hence the ENF output, are unchanged under the joint transformation.
No transformation symmetries. A simple "bi-invariant" for this setting that preserves all geometric information is given by simply concatenating pose coordinates $p$ with coordinates $x$:
$$a_{i,m}^\emptyset = p_i \oplus x_m. \qquad (15)$$
Parameterizing the cross-attention operation in Eq. 5 as a function of this attribute results in a framework without any equivariance constraints. We use this in experiments to ablate over equivariance constraints and their impact on performance.
E Experimental Details
E.1 Dataset creation
For creating the dataset of PDE solutions we used py-pde [52] for Navier-Stokes and the diffusion equation on
the plane. For the shallow-water equation and the diffusion equation on the sphere, as well as the internally
heated convection in a 3D ball we used Dedalus [10].
Diffusion on the plane. For the diffusion equation on the plane we use as initial conditions narrow spikes
centred at random locations in the left half of the domain for the train set, and in the right half of the domain for
the test set. States are defined on a 64 × 64 grid ranging from -3 to 3. Initial conditions are randomly sampled
uniformly between -2 and 2 for x and 0 and 2 for y in the training set, and between -2 and 2 for x and -2 and 0 for y in the test set. A random value uniformly sampled between 5.0 and 5.5 is inserted at the randomly sampled location.
We solve the equation with an Euler solver for 27 steps, discarding the first 7, with a timestep dt = 0.01. We
generate 1024 training and 128 test trajectories.
Navier-Stokes on the flat 2-torus. For Navier-Stokes on the flat 2-torus we use Gaussian random fields as initial conditions and solve the PDE using a Crank-Nicolson method with timestep dt = 1.0 for 20 steps. The PDE is
$$\frac{dv}{dt} = -u \nabla v + \mu \Delta v + f, \qquad v = \nabla \times u, \qquad \nabla \cdot u = 0,$$
where $u$ is the velocity field, $v$ the vorticity, $\mu$ the viscosity and $f$ a forcing term. States are defined on a 64 × 64 grid. We generate 8192 training and 512 test trajectories.
Diffusion on the 2-sphere. For the diffusion dataset on the sphere, states are defined over a 128 × 64
ϕ, θ grid. Initial conditions are generated as a Gaussian peak inserted at a random point on the sphere with σ = 0.25. The equation is solved for 20 timesteps with RK4 and dt = 1.0. We generate 256 training and 64 test
trajectories.
Shallow-water on the 2-sphere. For the initial zonal flow of [16], $u_{max} = 80\,\mathrm{m\,s^{-1}}$, $\phi_0 = \pi/7$, $\phi_1 = \pi/2 - \phi_0$, and $e_n = \exp[-4/(\phi_1 - \phi_0)^2]$. With this initial zonal flow, we numerically integrate the balance equation
$$gh(\phi) = gh_0 - \int^{\phi} a\, u(\phi') \left[ f + \frac{\tan(\phi')}{a} u(\phi') \right] d\phi',$$
to obtain the height $h$. We then randomly generate small unbalanced perturbations $h'$ to the height field,
$$h'(\theta, \phi) = \hat{h} \cos(\phi)\, e^{-[(\theta_2 - \theta)/\alpha]^2} e^{-[(\phi_2 - \phi)/\beta]^2},$$
by uniformly sampling $\alpha, \beta, \hat{h}, \theta_2$, and $\phi_2$ within a neighbourhood of the values used in [16]. States are defined on a 192 × 96 grid for the high-resolution dataset, which is subsequently downsampled by 2 × 2 mean pooling to a 96 × 48 grid. We generate 512 training trajectories and 64 test trajectories.
Internally-heated convection in the ball. The equations for the internally-heated convection system are listed here; they include the thermal diffusivity ($\kappa$) and kinematic viscosity ($\nu$), given by
$$\kappa = (\mathrm{Ra} \cdot \mathrm{Pr})^{-1/2}, \qquad \nu = \left(\frac{\mathrm{Ra}}{\mathrm{Pr}}\right)^{-1/2}.$$
We set Ra = 1e-6 and Pr = 1.
1. Incompressibility condition (continuity equation):
$$\nabla \cdot u + \tau_p = 0$$
3. Temperature equation:
$$\frac{\partial T}{\partial t} - \kappa \nabla^2 T + \mathrm{lift}(\tau_T) = -u \cdot \nabla T + \kappa\, T_{\mathrm{source}}$$
4. Shear stress boundary condition (stress-free condition): shear stress = 0 on the boundary.
7. Pressure gauge condition:
$$\int p\, dV = 0$$
The boundary conditions imposed are stress-free and no-penetration for the velocity field and a constant thermal
flux at the outer boundary. These conditions are enforced using penalty terms (τ ) that are lifted into the domain
using higher-order basis functions.
States are defined over a 64 × 24 × 24 ϕ, θ, r grid. We use a SBDF2 solver which we constrain by dtmin = 1e − 4
and dtmax = 2e − 2. We evolve the PDE for 26 timesteps, discarding the first 6. We generate 512 training
trajectories and 64 test trajectories.
Diffusion on the plane. We use 4 latents with c ∈ R16 . We set the hidden dim of the ENF to 64 and use 2
attention heads. We train the model for 1000 epochs. We set γq = 0.05, γvα = 0.01, γvβ = 0.01. We use a
batch size of 8. The model takes approximately 8 hours to train.
Navier-Stokes on the flat 2-torus. We use 4 latents with c ∈ R16 . We set the hidden dim of the ENF to
64 and use 2 attention heads. We train the model for 2000 epochs. We set γq = 0.05, γvα = 0.2, γvβ = 0.2.
We use a batch size of 4. The model takes approximately 48 hours to train.
Diffusion on the 2-sphere. We use 18 latents with c ∈ R4 . We set the hidden dim of the ENF to 16 and
use 2 attention heads. We train the model for 1500 epochs. We set γq = 0.01, γvα = 0.01, γvβ = 0.01. We use
a batch size of 2. The model takes approximately 12 hours to train.
Spherical shallow-water equations [16]. We use 8 latents with $c \in \mathbb{R}^{32}$. We set the hidden dim of the ENF to 128, and use 2 attention heads. We train the model for 1500 epochs. $\gamma_q = 0.05$, $\gamma_{v_\alpha} = 0.2$, $\gamma_{v_\beta} = 0.2$.
We use a batch size of 2. The model takes approximately 24 hours to train.
Internally-heated convection in the ball. We use 8 latents with $c \in \mathbb{R}^{32}$. We set the hidden dim of the
ENF to 128, and use 2 attention heads. We train the model for 1500 epochs. γq = 0.05, γvα = 0.2, γvβ = 0.2.
We use a batch size of 2. The model takes approximately 24 hours to train.
Baselines. As baseline models on Navier-Stokes we train FNO [29] and G-FNO [20] with 8 modes and 32 channels for 700 epochs (until convergence). We train CNODE [2] with 4 layers of size 64 for 300 epochs (until
convergence). We train DINo on all experiments for 2000 epochs with an architecture as specified in [49]. For
the IHC and shallow-water experiments, we increase the latent dim from 100 to 200, the number of layers for
the neural ODE from 3 to 5, and the latent dim of the neural field decoder from 64 to 256, as per [49].