DeepONet Extensions, E.g., POD-DeepONet (Comput. Methods Appl. Mech. Eng.)
Received 10 November 2021; received in revised form 20 January 2022; accepted 15 February 2022
Available online 11 March 2022
Abstract
Neural operators can learn nonlinear mappings between function spaces and offer a new simulation paradigm for real-time
prediction of complex dynamics for realistic diverse applications as well as for system identification in science and engineering.
Herein, we investigate the performance of two neural operators, which have shown promising results so far, and we develop
new practical extensions that will make them more accurate and robust and importantly more suitable for industrial-complexity
applications. The first neural operator, DeepONet, was published in 2019 (Lu et al., 2019), and its original architecture was
based on the universal approximation theorem of Chen & Chen (1995). The second one, named Fourier Neural Operator or
FNO, was published in 2020 (Li et al., 2020), and it is based on parameterizing the integral kernel in the Fourier space.
DeepONet is represented by a summation of products of neural networks (NNs), corresponding to the branch NN for the input
function and the trunk NN for the output function; both NNs are general architectures, e.g., the branch NN can be replaced
with a CNN or a ResNet. According to Kovachki et al. (2021), FNO in its continuous form can be viewed conceptually as
a DeepONet with a specific architecture of the branch NN and a trunk NN represented by a trigonometric basis. In order to
compare FNO with DeepONet computationally for realistic setups, we develop several extensions of FNO that can deal with
complex geometric domains as well as mappings where the input and output function spaces are of different dimensions. We
also develop an extended DeepONet with special features that provide inductive bias and accelerate training, and we present
a faster implementation of DeepONet with cost comparable to the computational cost of FNO, which is based on the Fast
Fourier Transform.
We consider 16 different benchmarks to demonstrate the relative performance of the two neural operators, including
instability wave analysis in hypersonic boundary layers, prediction of the vorticity field of a flapping airfoil, porous
media simulations in complex-geometry domains, etc. We follow the guiding principles of FAIR (Findability, Accessibility,
Interoperability, and Reusability) for scientific data management and stewardship. The performance of DeepONet and FNO is
comparable for relatively simple settings, but for complex geometries the performance of FNO deteriorates greatly. We also
compare theoretically the two neural operators and obtain similar error estimates for DeepONet and FNO under the same
regularity assumptions.
© 2022 Elsevier B.V. All rights reserved.
Keywords: Nonlinear mappings; Operator regression; Deep learning; DeepONet; FNO; Scientific machine learning
1. Introduction
While there have been rapid developments in machine learning in the last 20 years and there is currently plenty of
euphoria and admiration about the so-called “unreasonable effectiveness of deep learning in artificial intelligence”
[1], the development of physics-informed machine learning and its application to scientific problems is relatively
recent [2]. Physics-informed neural networks (PINNs) were first introduced in 2017 [3,4] for forward, inverse
and hybrid problems, and since then there have also been rapid developments in this area [5–13], although the
achievements so far in scientific and engineering applications have been modest compared to applications in imaging,
speech recognition, and natural language processing. According to some estimates, last year alone close to 100,000
papers on machine learning were published, all of which rely on the universal function approximation property of
neural networks — theoretical work that goes back to the early 1990s [14,15].
At about the same time, a much less known theorem of Chen & Chen [16] on the universal operator
approximation by single-layer neural networks was developed but remained largely unknown until our work
on DeepONet in 2019 [17] with subsequent theoretical and computational extensions in [18]. Unlike function
regression, operator regression aims to map infinite-dimensional functions (inputs) to infinite-dimensional functions
(outputs). The work in [18] extended the theorem of Chen & Chen [16] to deep neural networks, which are more
expressive and break the curse of dimensionality in the input space. More importantly, from the computational point
of view, the new paradigm of operator regression allows for simulating the dynamics of complex nonlinear systems,
e.g., fluid flows, corresponding to different boundary and initial conditions without the need for retraining the neural
network. As noted independently by DeepMind researcher Irina Higgins [19], “Once DeepONet is trained, it can
be applied to new input functions, thus producing new results substantially faster than numerical solvers. Another
benefit of DeepONet is that it can be applied to simulation data, experimental data or both, and the experimental
data may span multiple orders of magnitude in spatio-temporal scales, thus allowing scientists to estimate dynamics
better by pooling the existing data”. DeepONet can be applied to partial differential equations (PDEs) but also to
learning explicit mathematical operators, e.g., integration, fractional derivatives, Laplace transforms, etc. [18].
Another parallel effort on operator regression started in 2020 with a paper on a graph kernel network (GKN) for
PDEs [20]. The authors represented the infinite-dimensional mapping by composing nonlinear activation functions
and a class of integral operators with the kernel integration computed by message passing on graph networks.
Unfortunately, GKN was of limited use as it was shown to be unstable with the increase of the number of hidden
layers [21]. In fact, a recent extension of GKN using nonlocal operator regression [21] shows that this type of neural
operator is powerful if it is stable, but no extensive experiments have been performed yet. A different architecture
was then proposed by the same group in [22], where they formulated the operator regression by parameterizing
the integral kernel directly in Fourier space. They demonstrated very good accuracy and efficiency for relatively
simple settings, including the Burgers’ equation, Darcy flow, and the Navier–Stokes equations. However, they made
the claim that “The Fourier neural operator is the first ML-based method to successfully model turbulent flows
with zero-shot super-resolution. It is up to three orders of magnitude faster compared to traditional PDE solvers.
Additionally, it achieves superior accuracy compared to previous learning-based solvers under fixed resolution”.
This is of course erroneous as the flow considered was a simple smooth laminar flow, and their comparison with
DeepONet was not properly conducted. In fact, as we will demonstrate herein, DeepONet can achieve similar and
in fact better accuracy for the same or similar benchmarks presented in [22] and even superior accuracy in realistic
problems involving complex-geometry domains and noisy input data.
In this paper, we present a systematic and transparent study comparing the performance of DeepONet
and FNO on 16 different benchmarks, selected carefully to highlight both the advantages and the limitations of the
two neural operators. According to an independent work by [23], FNO in its continuous form can be viewed as
a DeepONet with a specific architecture of the branch and a trunk represented by a trigonometric basis. Hence,
in the current work we introduce two significant enhancements to FNO so that it can deal with mappings of
different dimensionality as well as with complex-geometry domains, so that we can make sensible comparisons with
DeepONet, which is a general neural operator. We also introduce various enhancements to DeepONet to accelerate
its training and increase its accuracy, introducing for example the POD modes in the trunk net, obtained readily
from the available training datasets. In addition to computational tests, we also perform a theoretical comparison of
DeepONet versus FNO, following the published work on the theory of DeepONet in [16,24–27], and on the more
recent theory of FNO in [23,28]. On this point, it is worth noting that DeepONet was based from the onset on the
theorem of Chen & Chen [16], whereas the formulation of FNO was not theoretically justified originally, and the
recent theoretical work covers only invariant kernels. There are also other methods for operator regression such
as [29–32], but in this work we only consider DeepONet and FNO.
In the following, we summarize the new contributions of the current work in addition to designing 16 different
benchmarks and obtaining new results for both DeepONet and FNO along with their new extensions. More
specifically, the new developments for DeepONet are the following:
• We introduce extra features in the trunk net and the branch net.
• We impose hard-constraints for Dirichlet and periodic boundary conditions via a modified trunk net.
• We develop a new extension, POD-DeepONet, that employs the POD modes of the training data as the trunk
net.
• We analyze and test a new DeepONet scaling that leads to accuracy improvement.
• We extend DeepONet to deal with multiple outputs.
• We present a new fast implementation of DeepONet, comparable to FNO for similar settings.
Similarly, the new developments for FNO are the following:
• dFNO+: We extend FNO to nonlinear mappings with inputs and outputs defined on different domains.
• gFNO+: We extend FNO to nonlinear mappings with inputs and outputs defined on a complex geometry.
• We add extra features by using them as extra network inputs.
Moreover, we employ normalization of both inputs and outputs for DeepONet and FNO and demonstrate its effect.
In our comparative studies, we designed benchmarks with important features typically encountered in real-world
applications, such as complex-geometry domains, non-smooth solutions, unsteadiness, and noisy data. On
the theoretical side, our contribution is the development of error estimates of the network size by emulating Fourier
methods for both DeepONet and FNO. We aim to make this study, including codes and data, accessible to all,
and to the degree possible we have followed the FAIR (Findability, Accessibility, Interoperability, and Reusability)
guiding principles for scientific data management and stewardship [33].
The paper is organized as follows. In Section 2, we define the problem setup and the data types. In Section 3,
we describe the DeepONet and FNO architectures as well as their extensions. In Section 4, we present a theoretical
comparison of the approximation theorems for the two neural operators and in addition we compare their error
estimates for the solution of the Burgers’ equation. In Section 5, we compare the performance of DeepONet
and FNO for 16 different benchmarks listed in Table 3. Finally, we conclude with a summary in Section 6. The
Appendix includes the proof of the theorem presented in the main text, description of the datasets, description of
the architectures employed, data generation for the Darcy problems and the cavity flows, and a comparative study
on the computational cost for different neural operators.
2. Operator learning
We first present the problem setup of operator learning and then discuss some important aspects of the dataset
used for learning.
We consider a physical system, which involves multiple functions. Among these functions, we are usually
interested in one function and aim to predict this function from other functions. For example, when the physical
system is described by partial differential equations (PDEs), these functions typically include the PDE solution
u(x, t), the forcing term f (x, t), the initial condition (IC) u 0 (x), and the boundary conditions (BCs) u b (x, t), where
x and t are the space and time coordinates, respectively.
We denote the input function by v, defined on the domain D ⊂ R^d, e.g., f(x, t) or u_0(x),
$$v : D \ni x \mapsto v(x) \in \mathbb{R},$$
and denote the output function by u, defined on the domain D′ ⊂ R^{d′},
$$u : D' \ni \xi \mapsto u(\xi) \in \mathbb{R}.$$
Let V and U be the spaces of v and u, respectively; D and D′ could be two different domains. Then, the
mapping from the input function v to the output function u is denoted by an operator
$$G : V \ni v \mapsto u \in U.$$
In this study, we aim to approximate G by neural networks and train the network from a training dataset
$$\mathcal{T} = \left\{ \left(v^{(1)}, u^{(1)}\right), \left(v^{(2)}, u^{(2)}\right), \ldots, \left(v^{(m)}, u^{(m)}\right) \right\}.$$
Deep learning is a powerful method for leveraging big data, but in many science and engineering problems,
the available data is not “big” enough to ensure accuracy and reliability of deep learning models. What we may
have instead is “dinky, dirty, dynamic, and deceptive data”, as first characterized by Alexander Kott, chief of the
Network Science Division of the US Army Research Laboratory [34]. On the other hand, scientific data should
meet principles of findability, accessibility, interoperability, and reusability (FAIR) [33].
In addition, we have to facilitate multi-modality input data that may come from diverse sources, e.g., static
images, videos, Schlieren photography, particle image velocimetry (PIV), particle tracking velocimetry (PTV),
radar and satellite images, MRI, CT, X-ray, two-photon microscopy, satellite, synthetic/simulated data, scattered
unstructured opportunistic data, etc. We also have to facilitate multi-fidelity data that not only include multi-resolution
data but also data corresponding to different levels of physical complexity, and hence training of NNs
requires special multi-fidelity methods such as in [35–37]. Noisy data are omnipresent, and effective NN training
and inference should be stable to noise and provide answers with uncertainty quantification using, e.g., Bayesian
NN as in [36,38–40].
3.1. DeepONet
Next, we provide an introduction to the vanilla DeepONet. Although the vanilla DeepONet has demonstrated good
performance in diverse applications [18,41–48], here we propose several extensions of DeepONet to achieve better
accuracy and faster training.
Fig. 1. Architecture of DeepONet. (A) DeepONet architecture. If the trunk net is a feed-forward neural network, then it is a vanilla
DeepONet. (B) Feature expansion of the trunk-net input. Periodic BCs can also be strictly imposed into the DeepONet by using the Fourier
feature expansion. (C) Dirichlet BCs are strictly enforced in DeepONet by modifying the network output.
The vanilla DeepONet is composed of two sub-networks, a “trunk” network and a “branch” network (Fig. 1A). The trunk net takes the coordinates ξ ∈ D ′ as the input, while
the branch net takes the discretized function v as input. Finally, the output of the network is expressed as:
$$G(v)(\xi) = \sum_{k=1}^{p} b_k(v)\, t_k(\xi) + b_0,$$
where b0 ∈ R is a bias. {b1 , b2 , . . . , b p } are the p outputs of the branch net, and {t1 , t2 , . . . , t p } are the p outputs
of the trunk net.
We note that DeepONet is a high-level framework without restricting the branch net and the trunk net to any
specific architecture. As ξ is usually low dimensional, a standard FNN is commonly used as the trunk net. The
choice of the branch net depends on the structure of the input function v, and it can be chosen as a FNN, ResNet,
CNN, RNN, or a graph neural network (GNN), etc. For example, if the discretization of v is on an equispaced 2D
grid, then a CNN can be used; if the discretization of v is on an unstructured mesh, then a GNN can be used.
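To make the construction above concrete, the following is a minimal sketch (not the authors' reference implementation) of a vanilla DeepONet in PyTorch, where the branch net takes the input function v sampled at m fixed sensors and the trunk net takes a coordinate ξ; all network sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VanillaDeepONet(nn.Module):
    """Minimal DeepONet: G(v)(xi) = sum_k b_k(v) * t_k(xi) + b0."""

    def __init__(self, m=100, dim_xi=1, p=64, width=128):
        super().__init__()
        # Branch net: input function v sampled at m fixed sensor locations.
        self.branch = nn.Sequential(
            nn.Linear(m, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p),
        )
        # Trunk net: coordinates xi of the output function.
        self.trunk = nn.Sequential(
            nn.Linear(dim_xi, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, p), nn.Tanh(),
        )
        self.b0 = nn.Parameter(torch.zeros(1))  # bias b0

    def forward(self, v, xi):
        # v: (batch, m) sensor values; xi: (n_points, dim_xi) query coordinates.
        b = self.branch(v)          # (batch, p)
        t = self.trunk(xi)          # (n_points, p)
        # Sum of products over the p basis terms (Case III dataset: shared xi).
        return torch.einsum("bp,np->bn", b, t) + self.b0

model = VanillaDeepONet()
v = torch.rand(8, 100)     # 8 input functions, each sampled at 100 sensors
xi = torch.rand(50, 1)     # 50 shared query locations
u_pred = model(v, xi)      # (8, 50) predicted output functions
```

The branch net here is a plain FNN; as noted above, it could equally be a CNN or GNN depending on how v is discretized.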
Feature expansion in the branch net. If the feature is a function of x, then we cannot encode it in the trunk net,
and instead we can use the feature as an extra input function of the branch net (Fig. 1A).
Dirichlet BCs. Consider a Dirichlet BC of the form
$$G(v)(\xi) = g(\xi), \quad \xi \in \Gamma_D,$$
where Γ_D is a part of the boundary. To make the DeepONet output satisfy this BC automatically, we construct the
solution as
$$G(v)(\xi) = g(\xi) + \ell(\xi)\, \mathcal{N}(v)(\xi),$$
where N(v)(ξ) is the product of the branch and trunk nets, and ℓ(ξ) is a function satisfying the following condition:
$$\ell(\xi) = 0 \text{ for } \xi \in \Gamma_D, \qquad \ell(\xi) > 0 \text{ otherwise}.$$
Here, we assume g(ξ) is well defined for any ξ; otherwise, we can construct a continuous extension of g.
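As a hedged illustration of this construction: for a zero Dirichlet BC on the boundary of the unit square, one may take g ≡ 0 and ℓ(x, y) = x(1 − x)y(1 − y), which vanishes exactly on the boundary. In the sketch below, N is a placeholder callable standing in for the branch–trunk product.

```python
import numpy as np

def hard_dirichlet_output(N, g, ell, xi):
    """u(xi) = g(xi) + ell(xi) * N(xi): satisfies u = g on the boundary where ell = 0."""
    return g(xi) + ell(xi) * N(xi)

# Example choices for a zero Dirichlet BC on the unit square (illustrative only).
g = lambda xi: np.zeros(len(xi))                         # boundary data g = 0
ell = lambda xi: xi[:, 0] * (1 - xi[:, 0]) * xi[:, 1] * (1 - xi[:, 1])
N = lambda xi: np.sin(np.pi * xi[:, 0])                  # stand-in for the branch-trunk product

xi = np.array([[0.0, 0.3], [0.5, 0.5], [1.0, 0.7]])      # first/last points lie on the boundary
u = hard_dirichlet_output(N, g, ell, xi)                 # u = 0 exactly at the boundary points
```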
Periodic BCs. Here, we first introduce how to enforce periodic BCs in neural networks in 1D [9,50] by constructing
a special feature expansion as discussed in Section 3.1.2, and then discuss how to extend it to 2D. If the solution
u(ξ) is periodic with respect to ξ with period P, then u(ξ) can be represented well by the Fourier series. Hence,
we can replace the network input ξ with the Fourier basis functions, i.e., the features in Fig. 1B are
$$\{1, \cos(\omega\xi), \sin(\omega\xi), \cos(2\omega\xi), \sin(2\omega\xi), \ldots\}$$
with ω = 2π/P. Compared to the aforementioned feature expansion, here we do not have the feature ξ any more.
Because each Fourier basis function is periodic, it is easy to prove that the DeepONet output G(v)(ξ) is also
periodic [50]. The number of Fourier features to be used is problem dependent, and we may use as few as the
first two Fourier basis functions {cos(ωξ), sin(ωξ)}.
Next, we discuss the 2D case, and with a slight abuse of the notation for x we denote ξ = (x, y) ∈ R2 . The
basis functions of the Fourier series on a 2D square are
$$\big\{\cos(n\omega_x x)\cos(m\omega_y y),\ \cos(n\omega_x x)\sin(m\omega_y y),\ \sin(n\omega_x x)\cos(m\omega_y y),\ \sin(n\omega_x x)\sin(m\omega_y y)\big\}_{m,n=0,1,2,\ldots}$$
with $\omega_x = 2\pi/P_x$ and $\omega_y = 2\pi/P_y$. If we only choose m, n ∈ {0, 1}, the basis functions are
$$\big\{1,\ \cos(\omega_x x),\ \sin(\omega_x x),\ \cos(\omega_y y),\ \sin(\omega_y y),\ \cos(\omega_x x)\cos(\omega_y y),\ \cos(\omega_x x)\sin(\omega_y y),\ \sin(\omega_x x)\cos(\omega_y y),\ \sin(\omega_x x)\sin(\omega_y y)\big\}.$$
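A minimal sketch of such a periodic feature expansion for the trunk-net input, assuming only the first Fourier pair in each direction (m, n ∈ {0, 1}), could look as follows; the feature ordering is an arbitrary choice.

```python
import numpy as np

def periodic_features_1d(xi, period):
    """Replace the trunk input xi by the first Fourier pair, so the output is P-periodic."""
    w = 2 * np.pi / period
    return np.stack([np.cos(w * xi), np.sin(w * xi)], axis=-1)

def periodic_features_2d(x, y, px, py):
    """Tensor-product Fourier features for m, n in {0, 1} on a 2D square of periods (px, py)."""
    wx, wy = 2 * np.pi / px, 2 * np.pi / py
    cx, sx = np.cos(wx * x), np.sin(wx * x)
    cy, sy = np.cos(wy * y), np.sin(wy * y)
    ones = np.ones_like(x)
    return np.stack([ones, cx, sx, cy, sy,
                     cx * cy, cx * sy, sx * cy, sx * sy], axis=-1)

x = np.linspace(0, 1, 5)
y = np.linspace(0, 1, 5)
feats = periodic_features_2d(x, y, px=1.0, py=1.0)   # shape (5, 9), fed to the trunk net
```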
POD-DeepONet. Instead of learning the trunk-net basis functions during training, we can employ precomputed proper orthogonal decomposition (POD) modes of the output function as the trunk net, i.e.,
$$G(v)(\xi) = \sum_{k=1}^{p} b_k(v)\, \phi_k(\xi) + \phi_0(\xi),$$
where φ_0(ξ) is the mean function of u(ξ) computed from the training dataset, {b_1, b_2, . . . , b_p} are the p outputs of
the branch net, and {φ_1, φ_2, . . . , φ_p} are the p precomputed POD modes of u(ξ). The proposed POD-DeepONet
shares a similar idea of using POD to represent functions as in [29].
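A hedged sketch of how the mean function and POD modes used by POD-DeepONet might be precomputed from the training outputs (snapshots on a shared grid, Case III) via an SVD; the variable names and sizes are illustrative.

```python
import numpy as np

def compute_pod_basis(U, p):
    """U: (n_samples, n_points) training outputs on a shared grid.
    Returns the mean function phi_0 and the first p POD modes phi_1..phi_p."""
    phi0 = U.mean(axis=0)                        # mean function phi_0(xi)
    # SVD of the centered snapshot matrix; rows of Vt are spatial modes.
    _, _, Vt = np.linalg.svd(U - phi0, full_matrices=False)
    return phi0, Vt[:p]                          # (n_points,), (p, n_points)

# POD-DeepONet output for branch outputs b of shape (batch, p):
#   G(v)(xi) = b @ modes + phi0   (optionally rescaled, e.g., by 1/p; see Section 3.1.5)
U_train = np.random.rand(200, 101)               # synthetic snapshots for illustration
phi0, modes = compute_pod_basis(U_train, p=32)
b = np.random.rand(8, 32)                        # stand-in branch-net outputs
u_pred = b @ modes + phi0                        # (8, 101)
```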
With the He initialization, at the initialization stage we have
$$\mathrm{Var}[b_k] = \frac{1}{2}, \qquad \mathbb{E}[t_k^2] = \frac{1}{4}.$$
Observe that b_k and t_k are independent and that E[b_k] = 0 and E[b_k t_k] = 0. Moreover, E[b_i b_j] = 1/2 if i = j and
E[b_i b_j] = 0 if i ≠ j. Then
$$\mathrm{Var}\Big[\sum_{k=1}^{p} b_k t_k\Big] = \mathbb{E}\Big[\Big(\sum_{k=1}^{p} b_k t_k\Big)^{2}\Big] = \mathbb{E}\Big[\sum_{k,l=1}^{p} b_k b_l t_k t_l\Big] = \sum_{k=1}^{p} \mathbb{E}[b_k^2]\,\mathbb{E}[t_k^2] = \sum_{k=1}^{p} \mathrm{Var}[b_k]\,\mathbb{E}[t_k^2] = \frac{p}{8}.$$
The analysis suggests that the scaling factor of the DeepONet should be $O(1/\sqrt{p})$, i.e.,
$$G(v)(\xi) = \frac{1}{\sqrt{p}}\left[\sum_{k=1}^{p} b_k(v)\, t_k(\xi)\right] + b_0.$$
However, we note that the analysis only applies to the He initialization and considers the initialization stage, and
thus in practice the scaling factor 1/√p may not be optimal for the final network accuracy.
The analysis is only suggestive, so we tested computationally the vanilla DeepONet and POD-DeepONet with
a few different scaling factors, including no rescaling, 1/√p, 1/p, and 1/p^1.5. From our experiments, we find
that the scaling factor does not have a significant effect on the accuracy of the vanilla DeepONet, but it is important
for POD-DeepONet. In some of our experiments with POD-DeepONet, e.g., the Navier–Stokes equation in Section 5.6,
among all scaling factors, we indeed obtain the best accuracy when the output is scaled by 1/√p, as suggested by our
analysis. In some other experiments, e.g., the Burgers’ equation in Section 5.1 and the advection problem (Case
I) in Section 5.4.1, POD-DeepONet scaled by the factor 1/√p obtains a better accuracy than POD-DeepONet
without rescaling, but it achieves an even better accuracy when scaled by the factor 1/p. Therefore, the optimal
scaling factor in practice is problem-dependent, and we use 1/p by default in our experiments, unless otherwise
stated.
Fig. 2. Examples of three different types of datasets. u_1 and u_2 are two examples of the output function in the dataset. The black dots
are the locations of measurements. (Case I) u_1 has measurements in 6 locations of ξ, and u_2 has measurements in 4 different locations.
(Case II) Both u_1 and u_2 have 4 measurements, but the locations are different. (Case III) Both u_1 and u_2 have 6 measurements in the same
locations.
Table 1
Comparison of three different types of datasets and code implementation.

|  | Case I | Case II | Case III |
| --- | --- | --- | --- |
| Do all u have the same number of locations of ξ? | No | Yes | Yes |
| Do all u have the same locations of ξ? | No | No | Yes |
| Flexibility | High | Medium | Low |
| Computational cost | High | Medium | Low |
3. A case similar to Case II is that all ξ are sampled from the points of the same grid, and each u uses
only a portion of the grid points (the number of ξ may be different). Then, we can reuse the same input ξ
of the trunk net and remove the computational redundancy.
4. Case III: The trunk net input ξ is exactly the same for all u, see an example in Fig. 2 right. Based on
Case II, we can further reuse the same trunk net input ξ for all v. In this way, we remove the computational
redundancy in both branch and trunk nets.
The comparison of the three different cases is summarized in Table 1. Here, we only introduce the underlying idea,
but for the detailed code implementation, we refer the reader to our code.
DeepONet can work for the datasets in all the cases, but FNO only works for the dataset in Case III. Hence, in
order to compare DeepONet and FNO, in this study we generate the data as in Case III. We also use the DeepONet
implementation discussed in Case III, instead of the original implementation, to achieve a computational cost
comparable to that of FNO. We note that compared to Case III, Cases I and II may lead to better accuracy of DeepONet.
3.2. FNO
Next, we introduce the vanilla FNO and then discuss how to generalize the vanilla FNO for problems whose
inputs and outputs are defined on different domains or on a complex geometry.
In FNO, for any location x on the mesh, the function value v(x) is first lifted to a higher-dimensional
representation z_0(x) by
$$z_0(x) = P(v(x)) \in \mathbb{R}^{d_z}$$
using a local transformation $P : \mathbb{R} \to \mathbb{R}^{d_z}$ (Fig. 3), which is parameterized by a shallow fully-connected neural
network or simply a linear layer. We note that z_0 is defined on the same mesh as v, and the values of z_0 on the
mesh can be viewed as an image with d_z channels. Then L Fourier layers are applied iteratively to z_0. Let us denote
by z_L the output of the last Fourier layer; the dimension of z_L(x) is also d_z. Hence, at the end, another local
transformation $Q : \mathbb{R}^{d_z} \to \mathbb{R}$ is applied to project z_L(x) to the output (Fig. 3) by
$$u(x) = Q(z_L(x)).$$
We parameterize Q by a fully-connected neural network.
Next we introduce the Fourier layer by using the Fast Fourier Transform (FFT). For the output of the lth Fourier
layer zl with dv channels, we first compute the following transform by FFT F and inverse FFT F −1 (the top path
of the Fourier layer in Fig. 3):
$$\mathcal{F}^{-1}\left(R_l \cdot \mathcal{F}(z_l)\right).$$
The details are as follows:
• F is applied to each channel of zl separately, and we usually truncate the higher modes of F(zl ), keeping
only the first k Fourier modes in each channel. So F(zl ) has the shape dv × k. For 2D functions, k = k1 × k2 ,
where k1 and k2 are the number of modes to keep in the first and second dimensions, respectively. For 3D
functions, it can be done similarly.
• We apply a different (complex-valued) weight matrix of shape d_v × d_v for each mode index of F(z_l), so we
have k trainable matrices, which form a weight tensor $R_l \in \mathbb{C}^{d_v \times d_v \times k}$. Then, R_l · F(z_l) has the same shape of
d_v × k as F(z_l).
• Before we perform the inverse FFT, we need to append zeros to Rl · F(zl ) to fill in the truncated modes.
Moreover, in each Fourier layer, a residual connection with a weight matrix $W_l \in \mathbb{R}^{d_v \times d_v}$ is used to compute a new
set of d_v channels, and each new channel is a linear combination of all the z_l channels (the bottom path of
the Fourier layer in Fig. 3). W_l · z_l has the same shape as z_l. We can implement W_l · z_l by a matrix multiplication
or by a convolution layer with kernel size 1. Then, the output of the (l + 1)th Fourier layer z_{l+1} is
$$z_{l+1} = \sigma\left( \mathcal{F}^{-1}(R_l \cdot \mathcal{F}(z_l)) + W_l \cdot z_l + b_l \right),$$
where σ is a nonlinear activation function, and $b_l \in \mathbb{R}^{d_v}$ is a bias. In this work, we use ReLU in all cases.
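For concreteness, below is a minimal sketch of one such Fourier layer in PyTorch for 1D inputs, showing the mode truncation, the per-mode complex weights R_l, the residual path W_l·z_l + b_l, and the ReLU; it follows the description above rather than any particular reference implementation, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class FourierLayer1d(nn.Module):
    """One Fourier layer: sigma(F^{-1}(R_l . F(z_l)) + W_l z_l + b_l), keeping k modes."""

    def __init__(self, channels=32, modes=16):
        super().__init__()
        self.modes = modes
        # R_l: one complex d_v x d_v matrix per retained Fourier mode.
        scale = 1.0 / channels
        self.R = nn.Parameter(scale * torch.randn(modes, channels, channels, dtype=torch.cfloat))
        # W_l: residual path implemented as a 1x1 convolution (its bias plays the role of b_l).
        self.W = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, z):
        # z: (batch, channels, n_grid)
        z_hat = torch.fft.rfft(z, dim=-1)                   # FFT over the grid dimension
        out_hat = torch.zeros_like(z_hat)
        k = min(self.modes, z_hat.shape[-1])
        # Apply a different weight matrix for each retained mode; truncated modes stay zero.
        out_hat[..., :k] = torch.einsum("kio,bik->bok", self.R[:k], z_hat[..., :k])
        spectral = torch.fft.irfft(out_hat, n=z.shape[-1], dim=-1)
        return torch.relu(spectral + self.W(z))

layer = FourierLayer1d()
z = torch.randn(4, 32, 128)       # lifted representation z_l on a 128-point grid
z_next = layer(z)                 # (4, 32, 128)
```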
We note that in practice, to achieve good accuracy, we need to use the coordinates x as the input as well, in
addition to v(x), i.e., the FNO input is the values of (x, v(x)) ∈ Rd+1 in a grid (i.e., an image with d + 1 channels).
This is a special case of the feature expansion that we will discuss in Section 3.2.4.
3.2.2. dFNO+: Operators with inputs and outputs defined on different domains
One limitation of FNO is that it requires D and D ′ to be the same domain, which is not always satisfied. Here,
we discuss two scenarios where D ̸= D ′ , and propose new extensions of FNO to address this issue.
Case I: The output space is a product space of the input space and another space D0 , i.e., D ′ = D × D0 . We use
a specific example to illustrate the idea. We consider the PDE solution operator mapping from the initial condition
to the solution in the whole domain:
G : v(x) = u(x, 0) ↦→ u(x, t),
where x ∈ [0, 1] and t ∈ [0, T ]. Here, D = [0, 1], D ′ = [0, 1] × [0, T ], and thus D0 = [0, T ]. In order to match
the input and output domains, we propose the following two methods.
Method 1: Expand the input domain. We can extend the input function by adding the extra coordinate t,
defining ṽ as
ṽ(x, t) = v(x).
Then, FNO is used to learn the mapping from ṽ(x, t) to u(x, t).
Method 2 [22]: Shrink the output domain via RNN. We can also reduce the dimension of the output by
decomposing G into a series of operators. We denote a new time-marching operator
G̃ : u(x, t) ↦→ u(x, t + ∆t), x ∈ D,
i.e., G̃ predicts the solution at t + ∆t from the solution at t. Then, we apply G̃ to the input v(x) repeatedly to obtain
the solution in the whole domain, which is similar to a RNN.
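A minimal sketch of Method 1 above: the 1D initial condition is repeated along the time dimension, and the (x, t) coordinate channels are appended, so that a standard 2D FNO can be applied; the grid sizes and domain bounds are illustrative assumptions.

```python
import numpy as np

def expand_input_domain(v0, nt, T=1.0):
    """Turn a 1D input v0(x) sampled on nx points into a 2D FNO input on an (nx, nt) grid.
    Channels: repeated v0, plus the x and t coordinates (the feature expansion of Section 3.2.4)."""
    nx = v0.shape[0]
    x = np.linspace(0.0, 1.0, nx)
    t = np.linspace(0.0, T, nt)
    v_tilde = np.repeat(v0[:, None], nt, axis=1)          # v_tilde(x, t) = v0(x)
    xx, tt = np.meshgrid(x, t, indexing="ij")             # coordinate channels
    return np.stack([v_tilde, xx, tt], axis=-1)           # (nx, nt, 3)

v0 = np.sin(2 * np.pi * np.linspace(0, 1, 128))
fno_input = expand_input_domain(v0, nt=100)               # fed to a 2D FNO
```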
Case II: The input space is a subset of the output space, i.e., D ⊂ D′. In general, we can always extend v from
D to D′ by padding zeros in the domain D′ \ D, i.e., we define
$$\tilde{v}(\xi) = \begin{cases} v(\xi), & \text{if } \xi \in D, \\ 0, & \text{if } \xi \in D' \setminus D, \end{cases}$$
and then learn the mapping from ṽ to u.
However, in some cases, this padding strategy may not be efficient. For example, we consider a PDE defined on
a rectangular domain (x, y) ∈ D ′ = [0, 1]2 , and the operator is the mapping from the Dirichlet boundary condition
v defined in the four boundaries (D = {0, 1} × [0, 1] ∪ [0, 1] × {0, 1}) to the solution u(x, y) inside the rectangular
domain. In this example, D is essentially a 1D space and occupies zero area in D ′ . We propose a better strategy
so that we first unfold the curve of v into a 1D function ṽ defined in [0, 4]:
$$\tilde{v}(\tilde{x}) = \begin{cases} v(\tilde{x}, 0), & \text{if } \tilde{x} \in [0, 1] \text{ (bottom boundary)}, \\ v(1, \tilde{x} - 1), & \text{if } \tilde{x} \in [1, 2] \text{ (right boundary)}, \\ v(3 - \tilde{x}, 1), & \text{if } \tilde{x} \in [2, 3] \text{ (top boundary)}, \\ v(0, 4 - \tilde{x}), & \text{if } \tilde{x} \in [3, 4] \text{ (left boundary)}, \end{cases}$$
and then, we use the method in Case I above to learn the operator from ṽ in 1D to u in 2D.
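The unfolding above can be implemented by sampling the four boundaries of the rectangle in order; a short sketch follows, with an illustrative number of points per edge and a stand-in boundary function.

```python
import numpy as np

def unfold_boundary(v, n_per_edge=64):
    """Unfold a Dirichlet BC v(x, y) given on the boundary of [0, 1]^2 into a 1D function
    v_tilde(x_tilde) on [0, 4], traversing bottom, right, top, left edges in order."""
    s = np.linspace(0.0, 1.0, n_per_edge, endpoint=False)
    bottom = v(s, np.zeros_like(s))          # x_tilde in [0, 1): v(x, 0)
    right = v(np.ones_like(s), s)            # x_tilde in [1, 2): v(1, x_tilde - 1)
    top = v(1.0 - s, np.ones_like(s))        # x_tilde in [2, 3): v(3 - x_tilde, 1)
    left = v(np.zeros_like(s), 1.0 - s)      # x_tilde in [3, 4): v(0, 4 - x_tilde)
    x_tilde = np.linspace(0.0, 4.0, 4 * n_per_edge, endpoint=False)
    return x_tilde, np.concatenate([bottom, right, top, left])

v = lambda x, y: np.sin(np.pi * x) + y       # stand-in boundary data
x_tilde, v_tilde = unfold_boundary(v)        # 1D input for the Case I method above
```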
3.2.3. gFNO+: Operators with inputs and outputs defined on a complex geometry
FNO uses FFT, which requires the input and output functions to be defined on a Cartesian domain with a lattice
grid mesh. However, for the PDEs defined on a complex geometry D (e.g., L-shape, triangular domain, etc.), an
unstructured mesh is usually used, and thus we need to deal with two issues: (1) non-Cartesian domain, and (2)
non-lattice mesh. For the second issue of unstructured mesh, we need to do interpolation between the unstructured
mesh and a lattice grid mesh.
For the issue of the Cartesian domain, we first define the Cartesian domain D̃, which is the minimum bounding
box of D, and then extend v (the same for u) from D to D̃ by
$$\tilde{v}(x) = \begin{cases} v(x), & \text{if } x \in D, \\ v_0(x), & \text{if } x \in \tilde{D} \setminus D. \end{cases}$$
Table 2
Comparison between vanilla DeepONet and vanilla FNO.

|  | DeepONet | FNO |
| --- | --- | --- |
| Input domain D & output domain D′ | Arbitrary | Cuboid, D = D′ |
| Discretization of output function u | No | Yes |
| Mesh | Arbitrary | Grid |
| Prediction location | Arbitrary | Grid points |
| Full-field observation data | No | Yes |
| Discontinuous functions | Good | Questionable |
Here, the choice of v_0(x) is not unique. The simplest choice is v_0(x) = 0, i.e., zero padding. However, we find
that such zero padding leads to a large error of FNO, which may be due to the discontinuity from v(x) to v_0(x). We
propose to compute v_0(x) by the “nearest neighbor”:
$$\text{for } x \in \tilde{D} \setminus D, \quad v_0(x) = v(x_0), \quad \text{where } x_0 = \arg\min_{p \in D} \| p - x \|,$$
so that ṽ(x) is continuous on the boundary of D. In the training, we use a mask to only consider the points inside
D in the loss function.
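A hedged sketch of this "nearest neighbor" padding on a bounding-box grid, using scipy's cKDTree to find, for every grid point outside D, the closest point of D; the mask construction and the dummy field are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_padding(grid_points, values, mask_in_D):
    """Extend a field defined at the grid points inside D (mask_in_D == True) to the whole
    bounding-box grid by copying the value of the nearest point of D, so that the padded
    field is continuous across the boundary of D."""
    v_tilde = np.array(values, dtype=float)
    tree = cKDTree(grid_points[mask_in_D])               # nearest-neighbor search inside D
    _, idx = tree.query(grid_points[~mask_in_D])
    v_tilde[~mask_in_D] = values[mask_in_D][idx]
    return v_tilde

# Illustrative usage on a 51 x 51 bounding-box grid of a triangular domain x + y <= 1.
xx, yy = np.meshgrid(np.linspace(0, 1, 51), np.linspace(0, 1, 51))
grid = np.column_stack([xx.ravel(), yy.ravel()])
mask = grid.sum(axis=1) <= 1.0                           # True for points inside D
values = np.where(mask, np.sin(np.pi * grid[:, 0]), 0.0) # dummy field defined inside D
v_tilde = nearest_neighbor_padding(grid, values, mask)
```

The same mask is then reused in the loss function so that only points inside D contribute to the training error.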
Here, we list a comparison between DeepONet and FNO in Table 2 in terms of some of their properties rather
than accuracy. The first three points have been discussed in the introductions of DeepONet and FNO above. Because
FNO also discretizes the output function, after the network training it can only predict the solution on the
same mesh as the input function, whereas DeepONet can make predictions at any location. For the training, FNO
requires full-field observation data, but DeepONet is more flexible, except that POD-DeepONet requires full-field
data to compute the POD modes. We also note that FNO relies on the Fourier transform, which may not be
very accurate for discontinuous functions; see an example in Section 5.4.3.
There are several other useful techniques that may improve the performance of DeepONet and FNO, such as
learning rate decay, L² regularization, and input normalization. Here, we emphasize the output normalization. Let us
assume that in the training dataset the mean function and the standard deviation function of u are ū(ξ) and std[u](ξ),
respectively. Then, we construct the surrogate model as
$$u(\xi) = \mathrm{std}[u](\xi) \cdot \mathcal{N}(v)(\xi) + \bar{u}(\xi),$$
where N is a DeepONet or FNO. Hence, for any ξ, the mean value of N(v)(ξ) is zero and the standard deviation
is one.
4. Theoretical comparison
In this section, we compare the universal approximation theorem of operators using DeepONet and FNO. We
also compare their error estimates for the solution operator of the Burgers’ equation. Here, we first present the
general universal approximation theorems for DeepONet and FNO for completeness. Also, we emulate the same
numerical method with both DeepONet and FNO, whereas the comparison in [23] between the two methods is
abstract and does not address specific equations such as the Burgers’ equation considered here.
Theorem 4.1 (Generalized universal approximation theorem for operators [18]). Suppose that X is a Banach
space, K_1 ⊂ X and K_2 ⊂ ℝ^d are two compact sets in X and ℝ^d, respectively, and V is a compact set in C(K_1). Assume
that G : V → C(K_2) is a nonlinear continuous operator. Then for any ε > 0, there exist positive integers m, p,
branch nets b_k : V → ℝ, trunk nets t_k : ℝ^d → ℝ, and x_1, x_2, . . . , x_m ∈ K_1, such that
$$\sup_{v \in V} \sup_{x \in K_2} \Bigg| G(v)(x) - \sum_{k=1}^{p} \underbrace{b_k\big(v(x_1), v(x_2), \ldots, v(x_m)\big)}_{\text{branch}}\, \underbrace{t_k(x)}_{\text{trunk}} \Bigg| < \epsilon.$$
Furthermore, the functions b_k and t_k can be chosen as diverse classes of neural networks satisfying the classical
universal approximation theorem of functions, e.g., fully-connected neural networks, residual neural networks, and
convolutional neural networks.
This theorem was proved in [16] with two-layer neural networks. Also, the theorem holds when the Banach
space C(K 1 ) is replaced by L q (K 1 ) and C(K 2 ) replaced by L r (K 2 ), q, r ≥ 1. Some extensions have been made
in [24] for measurable operators, which contains discontinuous operators that can be approximated by continuous
operators. The conclusion can be readily extended to the case of vector (multiple) output functions.
The FNO for operator regression is proposed in [22], and the corresponding universal approximation theorem
is presented in [23]. We present the theorem for completeness. Define $\mathbb{T}^d = [0, 2\pi]^d$ and the Hilbert space with
smoothness index s by $H^s(\mathbb{T}^d; \mathbb{R}^{d_v})$ for $\mathbb{R}^{d_v}$-valued functions defined on $\mathbb{T}^d$. Let $\mathcal{V}(\mathbb{T}^d; \mathbb{R}^{d_v})$ be a Banach space of
$\mathbb{R}^{d_v}$-valued functions defined on $\mathbb{T}^d$. Let P be a lifting operator from $\mathcal{V}(\mathbb{T}^d; \mathbb{R}^{d_v}) \to \mathcal{U}(\mathbb{T}^d; \mathbb{R}^{d_a})$ and $Q : \mathcal{U}(\mathbb{T}^d; \mathbb{R}^{d_a}) \to \mathcal{U}(\mathbb{T}^d; \mathbb{R}^{d_u})$ the projection. Let $I_N$ denote the Fourier interpolation operator, i.e., $I_N f(x) = \sum_{i=1}^{N} f(x_i) L_i(x)$,
where the $x_i$ are the Fourier collocation points on $\mathbb{T}^d$ and the $L_i(x)$ are the corresponding Lagrange interpolation trigonometric
polynomials. Therefore, the output of the lth Fourier block is defined as
$$\mathcal{L}_l(z)(x) = \sigma\Big( W_l z(x) + b_l(x) + \mathcal{F}^{-1}\big( R_l(k) \cdot \mathcal{F}(z)(k) \big)(x) \Big).$$
Here, $W_l \in \mathbb{R}^{d_l \times d_l}$, the function $b_l(x)$ is $\mathbb{R}^{d_l}$-valued, and the coefficients $R_l(k) \in \mathbb{R}^{d_l \times d_l}$ define a convolution
operator via the Fourier transform. Then, the FNO in its continuous form is defined as follows:
$$\mathcal{F}(v) = Q \circ I_N \circ \mathcal{L}_L \circ I_N \circ \cdots \circ \mathcal{L}_1 \circ I_N \circ P(v). \tag{4.1}$$
Theorem 4.2 (Universal approximation theorem of FNO [23]). Let $s, s' \ge 0$ and let $\Omega \subset \mathbb{T}^d$ be a domain with
Lipschitz boundary. Let $G : H^s(\Omega; \mathbb{R}^{d_v}) \to H^{s'}(\Omega; \mathbb{R}^{d_u})$ be a continuous operator, and let $\mathcal{V} \subset H^s(\Omega; \mathbb{R}^{d_v})$ be a
compact subset. Then for any $\epsilon > 0$, there exists an FNO of the form (4.1), $\mathcal{F} : H^s(\mathbb{T}^d; \mathbb{R}^{d_v}) \to H^{s'}(\mathbb{T}^d; \mathbb{R}^{d_u})$, such
that
$$\sup_{v \in \mathcal{V}} \| G(v) - \mathcal{F}(v) \|_{H^{s'}} \le \epsilon.$$
The conclusion above can be found in Theorem 2.15 of [23].
Both DeepONet and FNO suffer from the curse of dimensionality if one uses ReLU or tanh networks for
Lipschitz continuous operators, due to the approximation capacity of these networks for high-dimensional inputs
(v(x_1), v(x_2), . . . , v(x_m)). However, rates of convergence have been obtained in [24,25,27] for DeepONet and in [28] for
FNO, for some solution operators of PDEs. Next, we compare the error estimates of DeepONet and FNO for the 1D
Burgers’ equation with periodic boundary conditions. As will be shown, the input of the operator is first
approximated by some numerical method, so that one has a possibly high-dimensional input for the neural network
operators. This introduces approximation errors in the input and thus in the output. Then extra errors are
induced by the network approximation emulating the analytical (after approximation of the input function) method
of the solution.
Since $0 < \exp(-\pi\|u_0\|_\infty/\kappa) \le v_0(x) \le \exp(\pi\|u_0\|_\infty/\kappa)$, the solution u can be written explicitly as
$$u(\mathbf{x}) = G(u_0)(\mathbf{x}) := -2\kappa\, \frac{\int_{\mathbb{R}} \partial_x \mathcal{K}(x, y, t)\, v_0(y)\, dy}{\int_{\mathbb{R}} \mathcal{K}(x, y, t)\, v_0(y)\, dy}, \quad \mathbf{x} = (x, t), \tag{4.4}$$
where $\mathcal{K}(x, y, t) = \frac{1}{\sqrt{4\pi\kappa t}} \exp\left(-\frac{(x-y)^2}{4\kappa t}\right)$ is the heat kernel. It can be readily checked that u(x) is the unique solution
to (4.2).
We may obtain a neural network for operator regression of G by first approximating G with classical
numerical methods and then emulating these methods using neural networks. Let us first approximate G. Define
$\mathbf{V}_m := \mathbf{V}(\mathbf{u}_{0,m}) = (V_0, V_1, \ldots, V_{m-1})^\top$, where $V_0 = 1$ and $V_j = \exp\big(-\frac{1}{2\kappa}\int_{-\pi}^{x_j} \big(\sum_k u_{0k} L_k(y)\big)\, dy\big)$, $j = 1, \ldots, m-1$.
Define $G_m(\mathbf{u}_{0,m})(\mathbf{x}) = \big(\tilde{G}_m \circ \mathbf{V}(\mathbf{u}_{0,m})\big)(\mathbf{x})$, where
$$G_m(\mathbf{u}_{0,m})(\mathbf{x}) = \frac{-2\kappa \int_{\mathbb{R}} \partial_x \mathcal{K}(x, y, t)\, (I_m v_0)(y)\, dy}{\int_{\mathbb{R}} \mathcal{K}(x, y, t)\, (I_m v_0)(y)\, dy} = \frac{V_0 c_0^1(\mathbf{x}) + V_1 c_1^1(\mathbf{x}) + \cdots + V_{m-1} c_{m-1}^1(\mathbf{x})}{V_0 c_0^2(\mathbf{x}) + V_1 c_1^2(\mathbf{x}) + \cdots + V_{m-1} c_{m-1}^2(\mathbf{x})}, \tag{4.5}$$
where $\tilde{G}_m$ is a rational function with respect to $\mathbf{V}_m$, with both the numerator and the denominator being mth-degree
m-variate polynomials with m terms, $I_m$ is the Fourier interpolation operator, and for $j = 0, \ldots, m-1$,
$$c_j^1(\mathbf{x}) = -2\kappa \int_0^{2\pi} \partial_x \Big(\sum_{l \in \mathbb{Z}} \mathcal{K}(x, y + 2\pi l, t)\Big) L_j(y)\, dy, \qquad c_j^2(\mathbf{x}) = \int_0^{2\pi} \Big(\sum_{l \in \mathbb{Z}} \mathcal{K}(x, y + 2\pi l, t)\Big) L_j(y)\, dy.$$
By the Lipschitz continuity of the solution operator G(·) of the Burgers equation, we have the following estimate.
Theorem 4.3. Let $u_0 \in S' = \{ u_0|_{[-\pi,\pi]} \in S : u_0 \text{ grows at most quadratically at } \infty \}$. Let $G(u_0)(\mathbf{x})$ and $G_m(\mathbf{u}_{0,m})(\mathbf{x})$
be defined in (4.4) and (4.5), respectively. Suppose h is small enough. Then there is a uniform constant C, depending
only on t, the lower and upper bounds $M_0$ and $M_1$, and $\kappa$, such that for any $t \in (0, +\infty)$ we have
$$\sup_{u_0 \in S'} \big\| G(u_0)(\cdot, t) - G_m(\mathbf{u}_{0,m})(\cdot, t) \big\|_{L^2(-\pi,\pi)} \le C h, \quad \text{where } h = 2\pi/m.$$
Theorem 4.4 ([54]). Let $\epsilon \in (0, 1]$ and a nonnegative integer k be given. Let $p : [0,1]^d \to [-1,1]$ and $q : [0,1]^d \to [2^{-k}, 1]$
be polynomials of degree $\le r$, each with $\le s$ monomials. Then there exists a ReLU network f of size
(number of total neurons) $O\big(k^7 \ln(\tfrac{1}{\epsilon})^3 + \min\{srk \ln(sr/\epsilon),\ sdk^2 \ln(dsr/\epsilon)^2\}\big)$ such that
$$\sup_{x \in [0,1]^d} \left| f(x) - \frac{p(x)}{q(x)} \right| \le \epsilon.$$
Observe that $G_m(\mathbf{u}_{0,m}) = \tilde{G}_m(\mathbf{V}_m)$ is a rational function with respect to $\mathbf{V}_m$, while $\mathbf{V}_m$ is an exponential function
of $\mathbf{u}_{0,m}$. Then, by the approximation of rational polynomials in Theorem 4.4, there exists a ReLU network of size
$O(m^2 \ln m)$ ($s = r = m$ in Theorem 4.4) to obtain an accuracy of $O(m^{-1})$ for each $\mathbf{x}$. For the approximation of ‘exp’
by ReLU networks, we only need a network of size $O(\ln m)$ to obtain an accuracy of $O(m^{-1})$, according to [55]. Thus,
we need a ReLU network of size $O(m^3 \ln m)$ to emulate $G_m(\mathbf{u}_{0,m})$, and we denote this ReLU network by $G_m^N(\mathbf{u}_{0,m})$.
The network $G_m^N(\mathbf{u}_{0,m})(x_j)$ can be viewed as an FNO, where all the kernels $R_l \equiv 0$. Thus, an FNO of size
$O(m^3 \ln m)$ can achieve an accuracy of $O(m^{-1})$. According to [25], the network $G_m^N(\mathbf{u}_{0,m})(x_j)$ serves as a branch
net, while only a ReLU network of size $O(\ln m)$ is needed for the trunk net. The size of the DeepONet is thus
$O(m^3 \ln m)$. In conclusion, we find that
• both FNO and DeepONet of size $O(m^3 \ln m)$ can achieve an accuracy of $O(m^{-1})$.
There are many ways to approximate the solution operator and emulate the corresponding numerical methods.
Thus, the estimation of the network sizes may not be optimal. However, we can always connect FNO $\mathcal{F}(v)$ and
DeepONet by $\sum_{j=1}^{N} \mathcal{F}(v)(x_j) L_j(x)$, where $L_j(x)$ is a basis of interpolation type, i.e., $L_j(x_j) = 1$ and $L_j(x_i) = 0$
for all $i \neq j$. Depending on the underlying problems, the basis $L_j(x)$ can be trigonometric polynomials, piecewise
or global algebraic polynomials, splines, neural networks, etc. In other words, FNO can be thought of as a special
form of the branch network in DeepONet.

Table 3
16 problems tested in this study.

| Section | Problems |
| --- | --- |
| Section 5.1 | Burgers’ equation |
| Section 5.2 | 5 Darcy problems in a rectangular domain and complex geometries |
| Section 5.3 | Multiphysics electroconvection problem |
| Section 5.4.1 | 3 Advection problems |
| Section 5.4.2 | Linear instability waves in high-speed boundary layers |
| Section 5.4.3 | Compressible Euler equation with non-equilibrium chemistry |
| Section 5.5 | Predicting surface vorticity of a flapping airfoil |
| Section 5.6 | Navier–Stokes equation in the vorticity–velocity form |
| Section 5.7 | 2 problems of regularized cavity flows |
5. Numerical results
We compare the performance of DeepONet and FNO on 16 different problems listed in Table 3. To evaluate
the performance of the networks, we compute the L 2 relative error of the predictions, and for each case, five
independent training trials are performed to compute the mean error and the standard deviation. The dataset sizes
for each problem are listed in Table B.1, and the network sizes of the DeepONets are listed in Tables C.1 and C.2. All
the data and codes are available on GitHub at https://github.com/lu-group/deeponet-fno.
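The evaluation metric, stated here as a sketch consistent with the description above (the reported numbers are the mean and standard deviation of the relative L² error over five independent training trials):

```python
import numpy as np

def l2_relative_error(u_pred, u_true):
    """Mean relative L2 error over test samples: ||u_pred - u_true||_2 / ||u_true||_2."""
    num = np.linalg.norm(u_pred - u_true, axis=-1)
    den = np.linalg.norm(u_true, axis=-1)
    return np.mean(num / den)

# Five independent training trials -> mean error and standard deviation, as reported in the tables.
trial_errors = [l2_relative_error(np.random.rand(100, 128), np.random.rand(100, 128))
                for _ in range(5)]
print(f"{np.mean(trial_errors):.2%} +/- {np.std(trial_errors):.2%}")
```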
Table 4
L² relative error for the Burgers’ equation in Section 5.1. Here, p is the number of outputs of the branch
and trunk nets.

| Method | Section 5.1 Burgers’ |
| --- | --- |
| DeepONet w/o normalization | 2.29 ± 0.10% |
| DeepONet w/ normalization | 2.15 ± 0.09% |
| FNO w/o normalization | 2.23 ± 0.04% |
| FNO w/ normalization | 1.93 ± 0.04% |
| POD-DeepONet (w/o rescaling) | 3.46 ± 0.06% |
| POD-DeepONet (rescaling by 1/√p) | 2.40 ± 0.06% |
| POD-DeepONet (rescaling by 1/p) | 1.94 ± 0.07% |
| POD-DeepONet (rescaling by 1/p^1.5) | 2.41 ± 0.04% |
Results. As discussed in Section 3.1.3, in DeepONet, we impose the periodic boundary condition by applying the
four Fourier basis functions {cos(2πx), sin(2πx), cos(4πx), sin(4πx)} to the input of the trunk net. By using the output
normalization, the error of DeepONet decreases from 2.29 ± 0.10% to 2.15 ± 0.09% (Table 4), and FNO achieves
an error of 1.93 ± 0.04% (Table 4). Hence, the output normalization is helpful for both DeepONet and FNO.
As we showed in Section 3.1.5, the DeepONet output has variance proportional to p instead of 1 when each branch and
trunk output has variance 1, and thus a scaling of 1/√p is needed. A different scaling is required for POD-DeepONet,
as the precomputed POD modes are used in place of the trunk nets and are not networks to be trained. We tested
different scaling factors, including 1/√p, 1/p, and 1/p^1.5 (Table 4), and POD-DeepONet with the factor 1/p has
the smallest error of 1.94 ± 0.07%, which is almost the same as the error of FNO with the output normalization.
For all POD-DeepONets in the following problems, we use the scaling factor 1/p, unless otherwise stated.
We consider two-dimensional Darcy flows in different geometries filled with porous media, which can be
described by the following equation:
− ∇ · (K (x, y)∇h(x, y)) = f, (x, y) ∈ Ω , (5.1)
where K is the permeability field, h is the pressure, and f is a source term which can be either a constant or
a space-dependent function. Boundary conditions will be described in the problem setup below. Four different
geometries are considered in the present study, including a rectangular domain in Section 5.2.1, a pentagram with
a hole in Section 5.2.2, a triangular domain in Section 5.2.3, and a triangular domain with a notch in Section 5.2.4.
We generate the dataset by solving Eq. (5.1) using the MATLAB Partial Differential Equation Toolbox (for more
details see Appendix E).
Table 5
L² relative error for the Darcy problem in a rectangular domain in Section 5.2.1. PWC, piecewise constant;
Cont., continuous.

| Method | Section 5.2.1 Darcy (PWC) | Section 5.2.1 Darcy (Cont.) |
| --- | --- | --- |
| DeepONet w/o normalization | 2.91 ± 0.04% | 2.04 ± 0.13% |
| DeepONet w/ normalization | 2.98 ± 0.03% | 1.36 ± 0.12% |
| FNO w/o normalization | 4.83 ± 0.12% | 2.38 ± 0.02% |
| FNO w/ normalization | 2.41 ± 0.03% | 1.19 ± 0.05% |
| POD-DeepONet | 2.32 ± 0.03% | 1.26 ± 0.07% |
Specifically, we keep the leading 100 terms in the KL expansion for the Gaussian process with zero mean and
the following covariance kernel:
$$\mathcal{K}\big((x, y), (x', y')\big) = \exp\left[\frac{-(x - x')^2}{2 l_1^2} + \frac{-(y - y')^2}{2 l_2^2}\right],$$
with l1 = l2 = 0.25. Both K (x) and h(x) have the same resolution of 20 × 20.
Examples of these two Darcy datasets can be found in Fig. E.1.
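A hedged sketch of sampling such a Gaussian random field on the 20 × 20 grid by keeping the leading 100 terms of the Karhunen–Loève (KL) expansion, i.e., the leading eigenpairs of the covariance matrix built from the kernel above; the seed and helper names are illustrative, and this is not the authors' data-generation script.

```python
import numpy as np

def sample_grf_kl(n=20, l1=0.25, l2=0.25, n_terms=100, seed=0):
    """Sample a zero-mean Gaussian random field with the kernel
    K((x,y),(x',y')) = exp(-(x-x')^2/(2 l1^2) - (y-y')^2/(2 l2^2)),
    truncated to the leading n_terms KL (eigen-)modes."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    pts = np.column_stack([X.ravel(), Y.ravel()])                  # (n*n, 2)
    dx = pts[:, 0, None] - pts[None, :, 0]
    dy = pts[:, 1, None] - pts[None, :, 1]
    C = np.exp(-dx**2 / (2 * l1**2) - dy**2 / (2 * l2**2))         # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)                           # ascending order
    eigvals, eigvecs = eigvals[::-1][:n_terms], eigvecs[:, ::-1][:, :n_terms]
    xi = rng.standard_normal(n_terms)                              # i.i.d. standard normals
    field = eigvecs @ (np.sqrt(np.maximum(eigvals, 0)) * xi)       # truncated KL expansion
    return field.reshape(n, n)

K_sample = sample_grf_kl()   # one realization of the input field on the 20 x 20 grid
```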
Results. We enforce the zero Dirichlet boundary condition on DeepONet by choosing the surrogate solution as
û(x, y) = 20x(1 − x)y(1 − y)N (x, y, K ),
where N is a DeepONet, as we discussed in Section 3.1.3. We use the coefficient 20 such that 20x(1 − x)y(1 − y) is
of order 1 for x ∈ [0, 1] and y ∈ [0, 1]. Similar to the Burgers’ problem, by using the output normalization, a better
accuracy of DeepONet and FNO is obtained (Table 5). For the case PWC, all the methods have errors around 2%,
and POD-DeepONet achieves the smallest error. For the case Cont., all the methods have errors around 1%, and
FNO is slightly better. However, POD-DeepONet is only worse by one standard deviation of the error (Table 5),
so there is no significant difference between the performance of POD-DeepONet and FNO.
Fig. 4. Darcy flow in the domains of a pentagram and a triangle. (a) Darcy flow in a pentagram with a hole: two representative cases.
For each case, Left: boundary condition; Right: pressure field. (b) Darcy flow in a triangular domain: two representative cases. Left: boundary
condition; Right: pressure field. (c) Darcy flow in a triangular domain: Left: The first four POD modes used in POD-DeepONet. Right: An
example of the augmented data for FNO training, where the padding is set by “nearest neighbor”.
We further test an additional case using dgFNO+, in which we use a uniform grid with a resolution 50 × 50.
The L 2 relative error for this case is 2.78 ± 0.01%, which is better than the dgFNO+ with the resolution 44 × 44
but is still less accurate than the DeepONet and POD-DeepONet.
Fig. 5. Darcy flow in a triangular domain with a notch. For a representative boundary condition, the pressure field is obtained using
DeepONet, dgFNO+, and POD-DeepONet. The prediction errors for the three operator networks are shown against the respective plots. The
ground truth is simulated using the PDE Toolbox in Matlab. The predicted solutions and the ground truth share the same colorbar, while
the errors corresponding to each of the neural operators are plotted on the same colorbar.
Table 6
L² relative error for the Darcy flows in complex geometries. dgFNO+ is the combination of dFNO+ and gFNO+.

| Method | Section 5.2.2 Darcy (Pentagram) | Section 5.2.3 Darcy (Triangular) | Section 5.2.4 Darcy (Notch) |
| --- | --- | --- | --- |
| DeepONet | 1.19 ± 0.12% | 0.43 ± 0.02% | 2.64 ± 0.02% |
| FNO | – | – | – |
| POD-DeepONet | 0.82 ± 0.05% | 0.18 ± 0.02% | 1.00 ± 0.00% |
| dgFNO+ | 3.34 ± 0.01% | 1.00 ± 0.03% | 7.82 ± 0.03% |
The boundary conditions are prescribed on the edges of the triangular domain, and we then use the operator networks to map the boundary conditions to the pressure field
in the entire domain. Specifically, 861 nodes are employed in the numerical solver. Similarly, we also display two
representative solutions with corresponding boundary conditions in Fig. 4(b).
Results. As demonstrated in Table 6, both DeepONet and POD-DeepONet are more accurate than dgFNO+.
POD-DeepONet achieves the best accuracy (0.18%) among the three approaches tested. The first four POD modes
are shown in Fig. 4(c) (Left), where we note that each mode is scaled by its L²-norm. For dgFNO+, we use
a uniform grid with a resolution of 51 × 51 and the “nearest neighbor” padding in Section 3.2.3, and an example of the
solution is illustrated in Fig. 4(c) (Right).
Fig. 6. Electroconvection problem: three representative cases. First row: electric potential field. Second row: cation concentration field.
Note that there is a stiff boundary at y = 0 in the concentration field, where the concentration drops from c+ = 2 to approximately zero.
Problem setup. Following the setup in [42], we consider a 2D electroconvection problem, which is a multiphysics
phenomenon involving coupling of the flow field with the electric field, the cation and anion concentration fields.
The full governing equations, including the Stokes equations, the electric potential and the ion transport equations,
can be written as follows:
$$\frac{\partial \mathbf{u}}{\partial t} = -\nabla p + \nabla^2 \mathbf{u} + \mathbf{f}_e, \qquad \nabla \cdot \mathbf{u} = 0, \qquad -2\epsilon^2 \nabla^2 \phi = \rho_e, \qquad \frac{\partial c^\pm}{\partial t} = -\nabla \cdot \left( c^\pm \mathbf{u} - \nabla c^\pm \mp c^\pm \nabla \phi \right), \tag{5.3}$$
where u and p are the velocity and the pressure fields, respectively, and φ is the electric potential. Moreover, c⁺
and c⁻ are the cation and anion concentrations, respectively. Also, ρ_e = (c⁺ − c⁻) is the free charge density, and
f_e = −0.5ρ_e∇φ/(2ε²) is the electrostatic body force, where ε is the Debye length. The investigated domain is
defined as Ω : [−1, 1] × [0, 1] with a regular mesh containing 101 × 51 grid points. By setting ε = 0.01, the
electroconvection described in Eq. (5.3) becomes a steady flow, where the flow pattern is uniquely determined by
the electric potential difference (∆Φ) applied on the upper and lower boundaries.
In this problem, the operator networks are expected to learn the mapping from the 2D electric potential field
φ(x, y) to the 2D cation concentration field c+ (x, y):
G : φ(x, y) ↦→ c+ (x, y).
Different flow fields are generated by modifying the boundary condition of φ, namely using ∆Φ = 5, 10, . . . , 75,
which results in 15 steady states for network training. We also consider two unseen conditions, namely ∆Φ = 13.4
and ∆Φ = 62.15, which are applied for testing. The equations are solved by using a high-order spectral element
method. Three training cases of the electroconvection flow are demonstrated in Fig. 6 and more details are included
in [42].
Results. The evaluation errors for this example are given in Table 7, where we find that the testing errors of the
different methods are all very small. Nevertheless, POD-DeepONet outperforms the others, followed by DeepONet
with data normalization. The reason that FNO performs relatively worse in this case is the stiff boundary
at y = 0 in the output field, where the concentration drops from c⁺ = 2 to approximately zero. Note that the
output resolution of this dataset is 101 × 51, which means that there are 5151 sampling points for each output
function used for training. However, it is reported in [42] that the DeepONet can be trained with much fewer data
measurements (i.e., 800 random points for each output function) to achieve a similar accuracy of 0.49 ± 0.04%.
This also demonstrates the flexibility of DeepONet training, since FNO requires the output representation to be on
a regular mesh, while DeepONet does not have such a stringent requirement.

Table 7
L² relative error for the multiphysics electroconvection problem.

| Method | Section 5.3 Electroconvection |
| --- | --- |
| DeepONet w/o normalization | 0.26 ± 0.04% |
| DeepONet w/ normalization | 0.28 ± 0.02% |
| FNO w/o normalization | 1.00 ± 0.01% |
| FNO w/ normalization | 0.43 ± 0.01% |
| POD-DeepONet | 0.14 ± 0.03% |

Table 8
L² relative error of wave propagation for continuous and discontinuous problems in Section 5.4.

| Method | Section 5.4.1 Advection (I) | Section 5.4.1 Advection (II) | Section 5.4.1 Advection (III) |
| --- | --- | --- | --- |
| DeepONet | 0.22 ± 0.03% | 0.27 ± 0.01% | 0.32 ± 0.04% |
| FNO | 0.66 ± 0.10% | 54.4 ± 0.00% | 47.7 ± 0.00% |
| POD-DeepONet | 0.04 ± 0.00% | 0.08 ± 0.00% | 0.40 ± 0.00% |
| dFNO+1 | – | 0.22 ± 0.01% | 0.60 ± 0.02% |
| dFNO+2 | – | 3.89 ± 1.26% | 10.9 ± 2.08% |

| Method | Section 5.4.2 Instability waves | Section 5.4.3 Compressible Euler (p) | Section 5.4.3 Compressible Euler (N2) |
| --- | --- | --- | --- |
| DeepONet | 8.90 ± 0.60% | 0.068 ± 0.011% | 0.043 ± 0.006% |
| FNO | 17.8 ± 0.92% | 0.076 ± 0.005% | 0.044 ± 0.004% |
| POD-DeepONet | 20.8 ± 1.12% | 0.020 ± 0.004% | 0.012 ± 0.002% |
Results. For Case I, DeepONet, POD-DeepONet, and FNO all have errors < 1%, while POD-DeepONet achieves
the smallest error (< 0.1%, Table 8). Here, the output of POD-DeepONet is rescaled by a factor of 1/p, and the
effect of different rescaling factors is summarized in Table D.1.
For Cases II and III, DeepONet and POD-DeepONet still obtain good accuracy of the order of 0.1% (Table 8).
Because FNO cannot directly map from the 1D initial condition to the 2D solution, we considered the following
three approaches.
• Brute-force FNO (2D): We first use FNO in a brute-force way. Because the input u 0 is only a function of x,
we directly repeat the same u 0 multiple times to match the 2D size of the output function, so that it is a valid
2D input. We find that the training loss is always very large ∼ O(0.1) and cannot be improved, even if we do
not truncate Fourier modes, use a large hidden dimension, and add more hidden layers. The L 2 relative error
is ∼50% (Table 8).
• dFNO+1: We use dFNO+ in Section 3.2.2 for Case I using Method 1. Compared to the brute-force FNO
above, here we also add the t coordinate as input, and the optimization of FNO improves significantly. dFNO+1
achieves an error < 1% (Table 8), but it is still worse than DeepONet and POD-DeepONet.
• dFNO+2: We also use dFNO+ in Section 3.2.2 for Case I using Method 2, i.e., instead of using FNO in 2D,
we test FNO in 1D with RNN-style time-marching. We find that dFNO+2 is very hard to train, and the final
result is unstable, as also observed in [22]. In our 10 independent experiments, dFNO+2 got stuck at a bad
local minimum (L² relative error > 20%) 4 times. Here we truncate the FFT to the first 16 modes,
but using more modes does not improve the accuracy or reduce the probability of training failure. We also
note that the training cost of dFNO+2 is much higher than that of FNO in 2D.
Fig. 7. Linear instability waves in high-speed boundary layers. (A) Visualization of an instability wave in a spatially developing boundary
layer. At the inlet to the computational domain, the base flow is superposed with instability waves. The dashed line marks the 99% thickness
of the boundary layer. The objective is to accurately predict the downstream evolution of the instability wave. (B) The training and testing
losses during the training process. (C) The L 2 relative errors of different networks for noiseless inputs and inputs with a 0.1% Gaussian
noise. Panel A is adapted from Ref. [43].
but is much worse than DeepONet with Fourier features. Although DeepONet and FNO have a similar training error,
DeepONet has a smaller testing error than FNO (Fig. 7B). In FNO, the generalization gap between the training and
testing errors is more than one order of magnitude, while there is almost no generalization gap for DeepONet.
We further analyze the robustness of different networks to input uncertainties by adding a Gaussian noise of
0.1% to the input functions during testing. We note that the training data is still noiseless. We also add the result of
the CNN in Ref. [43] for comparison. The errors of DeepONet, POD-DeepONet and FNO only increase slightly
for noisy inputs and remain satisfactory, but the error of CNN increases by two orders of magnitude
(Fig. 7C). This implies that the mapping that CNN has learned is unstable, whereas DeepONet, POD-DeepONet and
FNO are more stable.
where the enthalpy $h_1^0$ is a constant, R is the universal gas constant, $M_s$ is the molar mass of species s, and the
internal energy $e_s(T) = 3RT/(2M_s)$ and $5RT/(2M_s)$ for mono-atomic and diatomic species, respectively. The rate of the
chemical reaction is given by
$$\omega = \left( k_f(T)\, \frac{\rho_2}{M_2} - k_b(T) \left(\frac{\rho_1}{M_1}\right)^{2} \right) \sum_{s=1}^{3} \frac{\rho_s}{M_s}, \qquad k_f = C\, T^{-2} e^{-E/T},$$
$$k_b = k_f \big/ e^{\,b_1 + b_2 \log z + b_3 z + b_4 z^2 + b_5 z^3}, \qquad z = 10{,}000/T,$$
where bi , C and E are constants which can be found in [56,58]. The model involves three species, namely, O2 , O
and N2 (ρ1 = ρ O , ρ2 = ρ O2 and ρ3 = ρ N2 ) with the reaction:
O2 + N2 ⇐⇒ O + O + N2 .
We consider the following initial condition:
$$T_0(x) = 8000\ \mathrm{K}, \qquad p_0(x) = \frac{p_L - p_R}{2}\left(1 - \tanh(x/\eta)\right) + p_L,$$
where η is a parameter in the range of [0.02, 0.2], and p L and p R are constant left and right pressure levels. This is
a multi-physics problem involving different quantities. Here, we only consider the pressure field and the Nitrogen
density to demonstrate the comparison between different operator learners. The neural operators are expected to
learn two mappings: from the initial condition p0 (x) to the solution p(x) at t = 0.0002, and from N2 (x, t = 0) to
N2 (x, t = 0.0002).
To generate the training data, we use the positivity-preserving high order discontinuous Galerkin (DG) schemes
developed by Zhang and Shu in [57]. In the simulation of the DG solver, we set ∆x = 10−3 and set ∆t according
to the CFL condition of the DG scheme.
Results. The L 2 relative errors of DeepONet and FNO are in Table 8 and a testing example is demonstrated in
Fig. 8. All the networks provide results with good accuracy, and POD-DeepONet performs the best. We note that
the FNO relies on Fourier transformation, which is not very accurate for discontinuous functions. On the contrary,
the DeepONet performs well for functions with discontinuity, as shown in the insets of Fig. 8.
Problem setup. Here, we predict the surface vorticity of a flapping airfoil based on the angle of attack. We perform
simulation of the flow over a NACA0012 airfoil. The flapping airfoil setup is achieved by defining an oscillating
inflow velocity, which is expressed as:
$$u = U_\infty \cos\!\left(\frac{\alpha_0 \pi}{180} \times \frac{\sin(2 f \pi t) + 1.0}{2}\right), \qquad v = U_\infty \sin\!\left(\frac{\alpha_0 \pi}{180} \times \frac{\sin(2 f \pi t) + 1.0}{2}\right),$$
Fig. 8. Compressible Euler equations: results of a testing example. Left: neural operators for pressure. Right: neural operators for N2
density. The black dashed line is the initial condition. The red, orange, and blue lines denote the exact solution, the FNO prediction, and the
DeepONet prediction, respectively.
where α_0 = 15° is the reference angle of attack (AOA), f = 0.2 is the frequency, and U_∞ = 1. In this case, the
Reynolds number based on the chord length is Re = 2500. We can simply define the normalized time-dependent
AOA as α(t) = sin(2fπt), which is used as the input of the learning algorithms hereafter. Specifically, we aim to
map the AOA α(t) to the vorticity on the airfoil surface, denoted by ω(s, t), where s is the location index on
the surface. The airfoil geometry is illustrated in Fig. 9A, where the surface of the airfoil is discretized using
152 points.
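For reference, a minimal sketch of the oscillating inflow and the normalized AOA defined above, using the stated values α_0 = 15°, f = 0.2, and U_∞ = 1 (the time array is illustrative):

```python
import numpy as np

alpha0, f, U_inf = 15.0, 0.2, 1.0       # reference AOA (deg), frequency, inflow speed

def inflow(t):
    """Oscillating inflow velocity (u, v) that mimics the flapping motion."""
    phase = (alpha0 * np.pi / 180.0) * (np.sin(2.0 * np.pi * f * t) + 1.0) / 2.0
    return U_inf * np.cos(phase), U_inf * np.sin(phase)

def alpha(t):
    """Normalized time-dependent AOA used as the input of the operator networks."""
    return np.sin(2.0 * np.pi * f * t)

t = np.linspace(0.0, 56.4, 2000)        # roughly the 56.4 time units of the signal
u, v = inflow(t)
```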
In this example, we only have one time-dependent signal, which covers about 56.4 time units. We use
approximately 36.6 time units for neural network training, and the rest is used for prediction and validation.
Moreover, we divide the training data into multiple independent signals, each of which spans two periods and has a
different phase. An example of the input–output functions for standard DeepONet and FNO training is illustrated
in Fig. 9B. By doing this, the training data contain 28 input–output pairs, while the testing data (about 20 time
units) are composed of two signals whose phases are not included in training.
For vanilla DeepONet and FNO, the operator can be expressed as: ω(s, t) = G(α)(s, t). In this example, we also
adopt a modified DeepONet with feature expansion, where a few historical states of the time signal are fed
into the trunk net as features, replacing the time coordinate (Section 3.1.2). Such an operator is denoted as:
ω(s, t) = G(α)(s, ω(s, t − 1), . . . , ω(s, t − k)),
where k denotes the number of historical states. We note that in the prediction stage, only the initial data of ω are
given; the predictions from the network itself are concatenated and fed into the trunk net for future predictions.
A schematic of this DeepONet is shown in Fig. 9C.
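A minimal sketch of the autoregressive prediction loop described above is given below; `don_model` stands for a hypothetical trained feature-expanded DeepONet whose trunk input is (s, ω(s, t−1), …, ω(s, t−k)), and the array shapes are illustrative.

```python
import numpy as np

def rollout(don_model, alpha_branch, s_coords, omega_init, n_steps, k=5):
    """Autoregressive evaluation of the feature-expanded DeepONet.

    don_model:    hypothetical trained network, callable (branch_input, trunk_input)
                  -> vorticity at the next time step for all surface points.
    alpha_branch: discretized AOA signal alpha(t) fed to the branch net.
    s_coords:     (n_s,) surface location indices.
    omega_init:   (k, n_s) the k known initial vorticity states.
    """
    history = [omega_init[i] for i in range(k)]        # most recent k states
    preds = []
    for _ in range(n_steps):
        # Trunk features: surface coordinate plus the k latest states,
        # ordered as (omega(t-1), ..., omega(t-k)).
        trunk = np.column_stack([s_coords] + history[-k:][::-1])
        omega_next = don_model(alpha_branch, trunk)    # (n_s,)
        preds.append(omega_next)
        history.append(omega_next)                     # feed the prediction back
    return np.stack(preds)                             # (n_steps, n_s)
```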
Results. The L² relative errors of DeepONet and FNO for the testing data are given in Table 9. Note that here we
learn an operator mapping from a 1D function (i.e., a function of time) to a 2D function (i.e., a function of time and
space). As mentioned above, FNO requires dimension augmentation for the input function in this case. If the input
only involves the time coordinate, i.e., the brute-force way in Section 5.4.1, then FNO fails to predict the output
correctly (16.20% error). However, when the input is augmented to involve both time and space coordinates
(Section 3.2.2 Case I Method 1), the relative error of dFNO+ reduces to 3.56%.
The vanilla DeepONet predicts the output with a satisfactory error (3.65%). In addition, when we apply the
modified DeepONet with 5 historical states of the investigated quantity as the features, the error becomes even
smaller (2.87%). The prediction results of this modified DeepONet are illustrated in Figs. 9D and E, where (D)
shows the time-dependent signals and (E) shows the vorticity profiles over the airfoil surface. It is worth noting
again that in the prediction process, the predictions from DeepONet are concatenated to the input of the trunk
net in order to perform future evaluations. We can observe excellent consistency between the truth and the DeepONet
prediction in these figures.
Fig. 9. Predicting vorticity on the surface of a flapping airfoil. (A) Geometry of NACA0012 airfoil. (B) An example of the input function
α(t) and output function ω(s, t) that are used to train the operator networks. (C) Schematic of the modified DeepONet, which includes
a few historical states in the trunk net as the features to replace the time coordinate. (D) Testing result of the modified DeepONet. Top:
2D visualization in space–time; bottom: 1D signals at different surface locations. (E) Vorticity profiles on the airfoil surface at three time
stamps. The red and blue colors in (D) and (E) represent the truth and the DeepONet prediction, respectively.
Table 9
L² relative error of predicting surface vorticity of a flapping airfoil.

                                  Section 5.5 Flapping airfoil
DeepONet                          3.65 ± 0.02%
FNO                               16.20 ± 0.01%
DeepONet (feature expansion)      2.87 ± 0.24%
dFNO+                             3.56 ± 0.10%
Table 10
L² relative error for the Navier–Stokes equation in the vorticity–velocity form.

                                  Section 5.6 Navier–Stokes
DeepONet w/o normalization        2.51 ± 0.07%
DeepONet w/ normalization         1.78 ± 0.02%
FNO w/o normalization             2.62 ± 0.03%
FNO w/ normalization              1.81 ± 0.02%
POD-DeepONet                      1.36 ± 0.03%
Problem setup. Following the problem setup in [22], we consider the 2D incompressible Navier–Stokes equation
in the vorticity–velocity form:
∂_t ω + u · ∇ω = ν∆ω + f,    x ∈ [0, 1]², t ∈ [0, T],
∇ · u = 0,    x ∈ [0, 1]², t ∈ [0, T],
ω(x, 0) = ω_0(x),    x ∈ [0, 1]²,
where ω(x, y, t) and u(x, y, t) are the vorticity and velocity, respectively. The viscosity ν in our experiment is
0.001. The forcing term is defined as:
f (x, y) = 0.1 sin(2π (x + y)) + 0.1 cos(2π(x + y))
and the periodic boundary condition is imposed. The dataset we use in this paper is identical to that in [22]. We are
interested in learning the operator mapping the vorticity field at the first ten time steps t ∈ [0, 10] to the vorticity
at a target time step T = 20. The solution is determined by varying the initial condition ω_0, which is drawn from
a Gaussian random field N(0, 7^{3/2}(−∆ + 49I)^{−2.5}). The spatial resolution is fixed to 64 × 64 for both training
and testing.
Results. DeepONet in this example is equipped with the periodic boundary condition of Section 3.1.3, and a CNN is
used to process the input function (namely, ω(t_1)–ω(t_{10})) in the branch net. For FNO, the input is the concatenation of
ω(t_1)–ω(t_{10}) and the grid points. The L² relative errors of DeepONet and FNO are listed in Table 10, indicating
that the accuracies of DeepONet and FNO are comparable in this case. First, we find that the networks without data
normalization perform the worst (both giving more than 2% errors). The data normalization improves the testing
accuracy for both DeepONet and FNO, resulting in errors of 1.78% and 1.81%, respectively. The best result
is obtained by POD-DeepONet (1.36%). Here, the output of POD-DeepONet is rescaled by a factor of 1/√p, and
the effect of different rescaling factors is summarized in Table D.1. Furthermore, we also computed the enstrophy
spectra (i.e., the vorticity variance) based on the vorticity information reconstructed by different networks. As shown
in Fig. 10, the spectrum of POD-DeepONet is the most consistent with the spectrum of the reference data, followed by
DeepONet and FNO. The discrepancy of FNO may be caused by the truncation in the Fourier layers.
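The enstrophy spectra in Fig. 10 can be obtained by binning the squared Fourier amplitudes of the predicted vorticity over wavenumber shells; the sketch below shows one such computation for a periodic 64 × 64 field, with the 1/2 prefactor and the shell convention being our assumptions rather than the paper's.

```python
import numpy as np

def enstrophy_spectrum(omega):
    """Radially binned spectrum of 0.5*|omega_hat(k)|**2 for a periodic n x n field."""
    n = omega.shape[0]
    omega_hat = np.fft.fft2(omega) / n**2                    # normalized Fourier coefficients
    e2 = 0.5 * np.abs(omega_hat)**2
    k = np.fft.fftfreq(n, d=1.0 / n)                         # integer wavenumbers
    kmag = np.sqrt(k[:, None]**2 + k[None, :]**2)
    shells = np.arange(0.5, n // 2, 1.0)                     # shells |k| in [m-0.5, m+0.5)
    spectrum = np.array([e2[(kmag >= lo) & (kmag < lo + 1.0)].sum() for lo in shells])
    return np.arange(1, len(shells) + 1), spectrum           # wavenumbers 1 .. n/2
```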
Here we consider a two-dimensional lid-driven flow in a square cavity (i.e., x, y ∈ [0, 1]), which can be described
by the incompressible Navier–Stokes equations as
∇ · u = 0,
∂_t u + u · ∇u = −∇P + ν∇²u,
where u = (u, v) denotes the velocity in the x- and y-directions, respectively; P is the pressure; and ν is the kinematic
viscosity. We consider two cases with different boundary conditions for the upper wall, i.e., a time-independent one (Case
A) and a time-dependent one (Case B). In particular, the boundary conditions are expressed as
Case A: u = U ( 1 − cosh[r(x − L/2)] / cosh(rL/2) ),    v = 0,
Fig. 10. Enstrophy spectra of the predicted vorticity fields by different methods. We note that the spatial resolution of the mesh for training
is 64 × 64.
Case B: u = U ( 1 − cosh[r(x − L/2)] / cosh(rL/2) + 0.8 sin(2πx) sin(5t) ),    v = 0,
where U, r, and L are constants; specifically, r = 10 and L = 1, where L is the length of the cavity. In addition, the remaining
walls are stationary in both cases. The aforementioned equations are solved using the lattice Boltzmann method
(LBM) [59] to generate the training data (see Appendix F for more details on the data generation).
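For clarity, a small sketch of the regularized lid velocity defined above; r = 10 and L = 1 follow the text, while U and the wall discretization are placeholder values.

```python
import numpy as np

r, L, U = 10.0, 1.0, 1.0        # r and L as in the text; U is a placeholder scale

def lid_velocity(x, t=None):
    """u on the upper wall: Case A if t is None, otherwise the time-dependent Case B."""
    base = U * (1.0 - np.cosh(r * (x - L / 2.0)) / np.cosh(r * L / 2.0))
    if t is None:
        return base                                                        # Case A
    return base + U * 0.8 * np.sin(2.0 * np.pi * x) * np.sin(5.0 * t)      # Case B

x = np.linspace(0.0, L, 101)    # illustrative wall discretization
u_top_A = lid_velocity(x)
u_top_B = lid_velocity(x, t=0.5)
```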
For Case A, we generate 100 velocity flow fields at different Reynolds numbers (Re = U L/ν) spanning
from 100 to 2080 with a step size of 20. We then simulate the flow fields for 10 randomly generated Reynolds numbers
within the range [100, 2080], which are not contained in the training dataset, and employ them as the testing dataset. We
take the boundary condition on the upper wall as the input for the operator networks, and the corresponding output
is the converged velocity field.
In Case B, we fix Re = 1000 and set the maximum iteration number in LBM to 1,500,000, i.e., T =
1,500,000 dt, where dt is the time step used in LBM. We then save the velocity fields every 100 time steps over the
last 10,000 iterations, yielding a total of 100 snapshots of the velocity field. Similarly, we
train the operator networks to learn the mapping from the boundary condition to the corresponding velocity field.
Specifically, we consider the following two cases:
• Unsteady I: We utilize the first 90 snapshots for training and the last 10 as the testing dataset.
• Unsteady II: We utilize the first 10 snapshots for training and the last 10 as the testing dataset.
A representative velocity field for Re = 1000 is illustrated in Fig. 11.
Note that the boundary condition is assumed to be known in both Cases A and B. We then use sin(5t), which
is introduced in the boundary condition of Case B, as an extra feature for the networks to enhance the prediction
accuracy. In particular, we employ sin(5t) as (1) an additional input for the branch network of DeepONet (by
concatenating the boundary condition and sin(5t)), and (2) an additional channel in dFNO+, i.e., we change the
inputs from (x, y, t, u_bc) to (x, y, t, sin(5t), u_bc), where u_bc represents the velocity along the x direction at the upper
wall. See more details on the feature expansion of DeepONet and FNO in Sections 3.1.2 and 3.2.4.
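A schematic sketch of the two feature-expansion variants described above, assuming the data are stored as NumPy arrays; the array names and shapes are illustrative only.

```python
import numpy as np

def expand_deeponet_branch(u_bc, t):
    """Append sin(5t) to the discretized upper-wall velocity fed to the branch net.
    u_bc: (N, m) boundary-condition samples; t: (N,) snapshot times."""
    return np.concatenate([u_bc, np.sin(5.0 * t)[:, None]], axis=1)

def expand_dfno_channels(x, y, t, u_bc):
    """Stack (x, y, t, sin(5t), u_bc) as input channels for dFNO+.
    All arguments are grids broadcast to a common shape, e.g., (N, nx, ny, nt)."""
    return np.stack([x, y, t, np.sin(5.0 * t), u_bc], axis=-1)
```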
Results. As displayed in Table 11: (1) For Case A, the results from dFNO+ are slightly more accurate than
DeepONet, while POD-DeepONet achieves the best accuracy among them. (2) For Case B, DeepONet, dFNO+,
and POD-DeepONet without feature expansion cannot provide accurate predictions for the velocity fields, and the
relative errors from these three methods are quite similar. However, the extra feature significantly improves the
prediction accuracy, as compared to the results without feature expansion, for all operator networks. In addition,
DeepONet and dFNO+ with feature expansion have similar accuracy, and POD-DeepONet provides the most
accurate predictions among all the methods. It is also interesting to observe that we can still obtain quite accurate
predictions (around 2% relative errors for all methods) for the velocity field as we reduce the number of training
data from 90 to 10 with the feature expansion in Unsteady II.
Fig. 11. Unsteady cavity flows for Re = 1000 at T = 1,500,000 dt. (a) Velocity field obtained from LBM. Left: u, Right: v. (b) Time
series for the velocity at location A. u and v are normalized by U , and t is normalized by L/U .
Table 11
L² relative error for the regularized cavity flows (Section 5.7). Unsteady I: number of training data: 90; number of testing data: 10.
Unsteady II: number of training data: 10; number of testing data: 10.

                                    Cavity (Steady)   Cavity (Unsteady I)   Cavity (Unsteady II)
DeepONet                            1.20 ± 0.23%      14.4 ± 0.53%          –
FNO                                 –                 –                     –
POD-DeepONet                        0.33 ± 0.08%      14.2 ± 0.51%          –
dFNO+                               0.63 ± 0.04%      15.0 ± 0.17%          –
DeepONet (feature expansion)        –                 0.51 ± 0.12%          2.24 ± 0.21%
POD-DeepONet (feature expansion)    –                 0.18 ± 0.02%          1.51 ± 0.27%
dFNO+ (feature expansion)           –                 0.56 ± 0.03%          1.78 ± 0.22%
6. Summary
In this study, we have investigated the performance of two neural operators that have shown early promising
results: the deep operator network (DeepONet) and the Fourier neural operator (FNO). The main difference between
DeepONet and FNO is that DeepONet does not discretize the output, but FNO does. Moreover, DeepONet can
employ any type of neural network architecture in the branch net, whereas FNO has a fixed architecture, and
hence DeepONet is more flexible than FNO in terms of problem settings and datasets (see comparison details
in Table 2). Here, we have designed 16 benchmarks that contain elements of industrial-complexity applications,
e.g., unsteadiness, complex geometry, and noisy data, and we have generated data that will be publicly available so that
interested researchers can test their own ideas against the results presented herein. In particular, we have shown
that the vanilla DeepONet and the vanilla FNO may lead to suboptimal results in several of the 16 benchmarks
that exhibit multiscale behavior and non-smooth solutions. To this end, we have proposed several extensions of
both DeepONet and FNO (e.g., POD-DeepONet, dFNO+, gFNO+, and feature expansion) to either improve their
accuracy or expand their capability to tackle diverse PDE-based applications, especially for FNO to be able to deal
with problems involving complex-geometry domains and mappings with different dimensionality of the input–output
spaces.
We have compared DeepONet and FNO both theoretically and computationally. We have shown theoretically
that FNO and DeepONet of the same size exhibit the same accuracy when emulating the Cole–Hopf transformation
for Burgers’ equation. In particular, FNO in its continuous form can be thought of as a subcase of DeepONet with a
specially-designed branch network and a discrete trigonometric basis to replace the trunk net. On the computational
side, we have performed extensive experiments on 16 PDE problems, and we have demonstrated that when proper
extensions were employed, both DeepONet and FNO exhibited good accuracy for diverse applications, with similar
performance in most problems.
Training neural operators could be very expensive, but the theoretical works in [23–25] have shown that both
DeepONet and FNO can break the curse of dimensionality in the input space for solution operators arising from
the majority of partial differential equations. We have also demonstrated exponential convergence of the DeepONet
error with respect to the size of the training data, see [18], although for larger data sizes the convergence rate
switches to algebraic due to the finite size of the neural network architecture. We believe that this is a possible
direction for future work, i.e., to design new architectures that can sustain an exponential convergence rate for all data
sizes. Moreover, the demand for large datasets could be reduced significantly by incorporating physics-informed
learning in the loss function of the neural operators, as demonstrated in [46,47]. However, as
demonstrated in [46], a hybrid physics-data training is more effective for realistic problems of the type considered
herein, and simply using the governing equations in the loss as suggested in [47] without using data may lead to
erroneous results, e.g., for the compressible Euler equations considered in the current work.
Acknowledgments
This work was supported by DARPA/CompMods HR00112090062, DOE PhILMs project (no. DE-SC0019453),
and OSD/AFOSR MURI grant FA9550-20-1-0358, United States of America. Zhang was also partially supported
by AFOSR under award number FA9550-20-1-0056, United States of America.
Proof. By the fact that ∂_x K(x, y, t) = −∂_y K(x, y, t) and integration by parts, we have
we then have ‖I_1‖_{L²(−π,π)} ≤ Ch and ‖I_2‖_{L²(−π,π)} ≤ Ch, whence for sufficiently small h (large m)
‖G(u_0)(·, t) − G_m(u_{0,m})(·, t)‖_{L²(−π,π)} ≤ Ch.
Here, C > 0 depends on t, M_0, M_1, and κ, and C may be very large when κ is small. □
Table B.1
Dataset size for each problem, unless otherwise stated.
No. of training data No. of testing data
Section 5.1 1000 200
Section 5.2.1 PWC 1000 200
Section 5.2.1 Cont. 1000 200
Section 5.2.2 1900 100
Section 5.2.3 1900 100
Section 5.2.4 1900 100
Section 5.3 Electroconvection 15 2
Section 5.4.1 Advection I 1000 1000
Section 5.4.1 Advection II/III 1000 200
Section 5.4.2 40800 10000
Section 5.4.3 Euler 100 50
Section 5.5 Airfoil 28 2
Section 5.6 NS 1000 200
Section 5.7 Cavity (Steady) 100 10
Section 5.7 Cavity (Unsteady I) 90 10
Section 5.7 Cavity (Unsteady II) 10 10
Table C.1
DeepONet architecture for each problem, unless otherwise stated.
Branch net    Trunk net    Activation function
Section 5.1 Burgers Depth 4 & Width 128 Depth 4 & Width 128 tanh
Section 5.2.1 PWC CNN Depth 5 & Width 128 ReLU
Section 5.2.1 Cont. CNN [128, 128, 100] tanh
Section 5.2.2 [128, 128] [128, 128, 128, 128] tanh
Section 5.2.3 [128, 128] [128, 128, 128] ReLU
Section 5.2.4 [128, 128] [128, 128, 128] ReLU
Section 5.3 Electroconvection [256, 256, 128] [128, 128, 128] ReLU
Section 5.4.1 Advection I Depth 2 & Width 256 Depth 4 & Width 256 ReLU
Section 5.4.1 Advection II/III Depth 2 & Width 512 Depth 4 & Width 512 ReLU
Section 5.4.2 Depth 6 & Width 200 Depth 7 & Width 200 ELU
Section 5.4.3 Euler Depth 2 & Width 256 Depth 4 & Width 256 ReLU
Section 5.5 Airfoil Depth 2 & Width 200 Depth 4 & Width 200 ReLU
Section 5.6 NS CNN [128, 128, 64] tanh
Section 5.7 Cavity CNN [128, 128, 128, 100] tanh
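To relate the specifications in Table C.1 to the model, the following is a minimal NumPy sketch of a fully connected DeepONet forward pass with random weights; the interpretation of "depth/width", the number of sensors m, and the output bias are our assumptions, and this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random Glorot-like weights for a fully connected net with the given layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / (m + n)), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x, act=np.tanh):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = act(x)
    return x

# Example: Section 5.1 Burgers, "Depth 4 & Width 128" for both nets (interpreted here
# as four linear layers of width 128); m sensors and p basis terms are assumed values.
m, d, p = 101, 1, 128
branch_params = init_mlp([m, 128, 128, 128, p])
trunk_params = init_mlp([d, 128, 128, 128, p])

def deeponet(u, y, b0=0.0):
    """G(u)(y) ~ sum_k branch_k(u) * trunk_k(y) + b0, for u of shape (N, m), y of shape (Q, d)."""
    B = mlp(branch_params, u)      # (N, p)
    T = mlp(trunk_params, y)       # (Q, p)
    return B @ T.T + b0            # (N, Q)
```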
Table C.2
POD-DeepONet architecture for each problem, unless otherwise stated. The activation functions are the same as
those in Table C.1.
Branch net No. of POD modes
Section 5.1 Burgers Depth 3 & Width 128 32
Section 5.2.1 PWC CNN 115
Section 5.2.1 Cont. CNN 10
Section 5.2.2 [64, 64] 8
Section 5.2.3 Depth 3 & Width 128 32
Section 5.2.4 CNN 20
Section 5.3 Electroconvection Depth 3 & Width 256 12
Section 5.4.1 I Depth 2 & Width 256 38
Section 5.4.1 II Depth 2 & Width 512 38
Section 5.4.1 III Depth 2 & Width 512 32
Section 5.4.2 Depth 6 & Width 256 9
Section 5.4.3 Euler Depth 2 & Width 256 16
Section 5.6 NS CNN 29
Section 5.7 CNN 6
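Similarly, a minimal sketch of how the POD modes of Table C.2 may enter the model: the trunk net is replaced by POD basis functions computed from the training outputs, the mean field is added back, and (cf. Table D.1) the branch–basis product can be rescaled, e.g., by 1/√p. Shapes, helper names, and the exact point at which the rescaling is applied are our assumptions.

```python
import numpy as np

def pod_basis(S_train, n_modes):
    """POD of the training outputs S_train (N, q): returns the mean field and the first modes."""
    mean = S_train.mean(axis=0)                              # (q,)
    _, _, Vt = np.linalg.svd(S_train - mean, full_matrices=False)
    return mean, Vt[:n_modes]                                # modes: (n_modes, q)

def pod_deeponet(branch_net, u, mean, modes, scale=None):
    """G(u) ~ scale * branch_net(u) @ modes + mean, evaluated at the q output grid points.

    branch_net: hypothetical trained network mapping inputs u (N, m) -> coefficients (N, p),
    with p equal to the number of POD modes. scale defaults to the 1/sqrt(p) rescaling
    of Table D.1 (applied here to the branch-basis product, which is our assumption).
    """
    p = modes.shape[0]
    if scale is None:
        scale = 1.0 / np.sqrt(p)
    return scale * branch_net(u) @ modes + mean              # (N, q)
```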
Table D.1
L² relative error for POD-DeepONet with different rescaling factors. Here, p is the number of outputs of the branch and trunk nets.

                                        Section 5.4.1 Advection (I)   Section 5.6 Navier–Stokes
POD-DeepONet (w/o rescaling)            0.215 ± 0.027%                3.37 ± 0.22%
POD-DeepONet (rescaling by 1/√p)        0.078 ± 0.009%                1.36 ± 0.03%
POD-DeepONet (rescaling by 1/p)         0.041 ± 0.003%                1.71 ± 0.03%
POD-DeepONet (rescaling by 1/p^1.5)     0.042 ± 0.002%                5.35 ± 0.17%
solutions for 2000 different boundary conditions. We then use 1900 of them for training the operator networks,
and utilize the remaining 100 for testing.
• Triangular domain: We utilize the same Gaussian processes in Eq. (5.2) to generate the Dirichlet boundary
conditions for all boundaries here, and then employ 861 unstructured meshes in our simulations. Similarly,
we generate 2000 solutions with different boundary conditions, and use 1900 of them for training and 100 for
testing.
• Triangular domain with a notch: We employ 1084 unstructured meshes in our simulations to generate 2000
solutions with different boundary conditions. From the generated datasets, 1900 samples are used for training,
while the remaining 100 are employed as testing samples.
Fig. E.1. Two datasets of Darcy flow. (a) Data from [22]. (b) Newly-generated data.
Fig. E.2. Darcy flows: Unstructured meshes for solving Eq. (5.1) in different geometries.
Table G.1
Computational cost (seconds) of one iteration for training different networks.
dgFNO+/dFNO+ DeepONet (vanilla) DeepONet POD-DeepONet
Section 5.2.2 Darcy (Pentagram) 0.0566 3.4862 0.0250 0.0085
Section 5.7 Cavity (case B) 0.0135 0.9072 0.0085 0.0048
Table G.2
GPU memory usage (MiB) for training different networks.
dgFNO+/dFNO+ DeepONet (vanilla) DeepONet POD-DeepONet
Section 5.2.2 Darcy (Pentagram) 1485 3311 1324 289
Section 5.7 Cavity (case B) 2183 1441 541 285
100, and we keep the first 32 and 8 modes in dgFNO+ and POD-DeepONet, respectively. For the problem
in Section 5.7, the batch size is 10, and we keep 64 and 5 modes in dFNO+ and POD-DeepONet, respectively.
All computations are performed on a workstation with 2 CPUs (Intel Xeon E5-2643) and a GPU (NVIDIA TITAN
Xp). We list the computational time of one iteration for all the cases in Table G.1 and the GPU memory usage in
Table G.2.
References
[1] T.J. Sejnowski, The unreasonable effectiveness of deep learning in artificial intelligence, Proc. Natl. Acad. Sci. 117 (48) (2020)
30033–30038.
[2] G.E. Karniadakis, I.G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, Physics-informed machine learning, Nat. Rev. Phys. 3 (6)
(2021) 422–440.
[3] M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential
equations, 2017, arXiv preprint arXiv:1711.10561.
[4] M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics informed deep learning (part ii): Data-driven discovery of nonlinear partial differential
equations, 2017, arXiv preprint arXiv:1711.10566.
[5] L. Lu, X. Meng, Z. Mao, G.E. Karniadakis, DeepXDE: A deep learning library for solving differential equations, SIAM Rev. 63 (1)
(2021) 208–228.
[6] G. Pang, L. Lu, G.E. Karniadakis, FPINNs: Fractional physics-informed neural networks, SIAM J. Sci. Comput. 41 (4) (2019)
A2603–A2626.
[7] D. Zhang, L. Lu, L. Guo, G.E. Karniadakis, Quantifying total uncertainty in physics-informed neural networks for solving forward
and inverse stochastic problems, J. Comput. Phys. 397 (2019) 108850.
[8] D. Zhang, L. Guo, G.E. Karniadakis, Learning in modal space: Solving time-dependent stochastic PDEs using physics-informed neural
networks, SIAM J. Sci. Comput. 42 (2) (2020) A639–A665.
[9] L. Lu, R. Pestourie, W. Yao, Z. Wang, F. Verdugo, S.G. Johnson, Physics-informed neural networks with hard constraints for inverse
design, SIAM J. Sci. Comput. 43 (6) (2021) B1105–B1132.
[10] X. Meng, Z. Li, D. Zhang, G.E. Karniadakis, PPINN: Parareal physics-informed neural network for time-dependent PDEs, Comput.
Methods Appl. Mech. Engrg. 370 (2020) 113250.
[11] A.D. Jagtap, G.E. Karniadakis, Extended physics-informed neural networks (XPINNs): A generalized space-time domain decomposition
based deep learning framework for nonlinear partial differential equations, Commun. Comput. Phys. 28 (5) (2020) 2002–2041.
[12] J. Yu, L. Lu, X. Meng, G.E. Karniadakis, Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems,
2021, arXiv preprint arXiv:2111.02801.
[13] S. Cai, Z. Mao, Z. Wang, M. Yin, G.E. Karniadakis, Physics-informed neural networks (PINNs) for fluid mechanics: A review, 2021,
arXiv preprint arXiv:2105.09506.
[14] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems 2 (4) (1989) 303–314.
[15] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Netw. 2 (5) (1989)
359–366.
[16] T. Chen, H. Chen, Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its
application to dynamical systems, IEEE Trans. Neural Netw. 6 (4) (1995) 911–917.
[17] L. Lu, P. Jin, G.E. Karniadakis, DeepONet: Learning nonlinear operators for identifying differential equations based on the universal
approximation theorem of operators, 2019, arXiv preprint arXiv:1910.03193.
[18] L. Lu, P. Jin, G. Pang, Z. Zhang, G.E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation
theorem of operators, Nat. Mach. Intell. 3 (3) (2021) 218–229.
[19] I. Higgins, Generalizing universal function approximators, Nat. Mach. Intell. 3 (3) (2021) 192–193.
[20] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Neural operator: Graph kernel network
for partial differential equations, 2020, arXiv preprint arXiv:2003.03485.
[21] H. You, Y. Yu, M. D'Elia, T. Gao, S. Silling, Nonlocal kernel network (NKN): a stable and resolution-independent deep neural network,
Preprint (2021).
[22] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Fourier neural operator for parametric
partial differential equations, 2020, arXiv preprint arXiv:2010.08895.
[23] N. Kovachki, S. Lanthaler, S. Mishra, On universal approximation and error bounds for Fourier neural operators, 2021, arXiv preprint
arXiv:2107.07562.
[24] S. Lanthaler, S. Mishra, G.E. Karniadakis, Error estimates for DeepONets: A deep learning framework in infinite dimensions, 2021,
arXiv preprint arXiv:2102.09618.
[25] B. Deng, Y. Shin, L. Lu, Z. Zhang, G.E. Karniadakis, Convergence rate of DeepONets for learning operators arising from
advection-diffusion equations, 2021, arXiv preprint arXiv:2102.10621.
[26] A. Yu, C. Becquey, D. Halikias, M.E. Mallory, A. Townsend, Arbitrary-depth universal approximation theorems for operator neural
networks, 2021, arXiv preprint arXiv:2109.11354.
[27] C. Marcati, C. Schwab, Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations, Tech. Rep.
2021–42, Seminar for Applied Mathematics, ETH Zürich, Switzerland, 2021.
[28] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, A. Anandkumar, Neural operator: Learning maps between
function spaces, 2021, arXiv preprint arXiv:2108.08481.
[29] K. Bhattacharya, B. Hosseini, N.B. Kovachki, A.M. Stuart, Model reduction and neural networks for parametric PDEs, 2020, arXiv
preprint arXiv:2005.03180.
[30] N. Trask, R.G. Patel, B.J. Gross, P.J. Atzberger, GMLS-Nets: A framework for learning from unstructured data, 2019, arXiv preprint
arXiv:1909.05371.
[31] H. You, Y. Yu, N. Trask, M. Gulian, M. D’Elia, Data-driven learning of nonlocal physics from high-fidelity synthetic data, Comput.
Methods Appl. Mech. Engrg. 374 (2021) 113553.
[32] R.G. Patel, N.A. Trask, M.A. Wood, E.C. Cyr, A physics-informed operator regression framework for extracting data-driven continuum
models, Comput. Methods Appl. Mech. Engrg. 373 (2021) 113500.
[33] M.D. Wilkinson, M. Dumontier, I.J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L.B. da Silva Santos,
P.E. Bourne, et al., The FAIR guiding principles for scientific data management and stewardship, Sci. Data 3 (1) (2016) 1–9.
[34] H. Maupin, US Army Research Laboratory South Partnership Summit 2018, Tech. Rep., Army Research Lab Aberdeen Proving Ground
United States, 2019.
[35] X. Meng, G.E. Karniadakis, A composite neural network that learns from multi-fidelity data: Application to function approximation
and inverse PDE problems, J. Comput. Phys. 401 (2020) 109020.
[36] X. Meng, H. Babaee, G.E. Karniadakis, Multi-fidelity Bayesian neural networks: Algorithms and applications, J. Comput. Phys. 438
(2021) 110361.
[37] L. Lu, M. Dao, P. Kumar, U. Ramamurty, G.E. Karniadakis, S. Suresh, Extraction of mechanical properties of materials through deep
learning from instrumented indentation, Proc. Natl. Acad. Sci. 117 (13) (2020) 7052–7062.
[38] L. Yang, X. Meng, G.E. Karniadakis, B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems
with noisy data, J. Comput. Phys. 425 (2021) 109913.
[39] A. Olivier, M.D. Shields, L. Graham-Brady, Bayesian neural networks for uncertainty quantification in data-driven materials modeling,
Comput. Methods Appl. Mech. Engrg. 386 (2021) 114079.
[40] X. Meng, L. Yang, Z. Mao, J.d.Á. Ferrandis, G.E. Karniadakis, Learning functional priors and posteriors from data and physics, J.
Comput. Phys. 457 (2022) 111073.
[41] C. Lin, Z. Li, L. Lu, S. Cai, M. Maxey, G.E. Karniadakis, Operator learning for predicting multiscale bubble growth dynamics, J.
Chem. Phys. 154 (10) (2021) 104118.
[42] S. Cai, Z. Wang, L. Lu, T.A. Zaki, G.E. Karniadakis, DeepM&Mnet: Inferring the electroconvection multiphysics fields based on
operator approximation by neural networks, J. Comput. Phys. 436 (2021) 110296.
[43] P.C. Di Leoni, L. Lu, C. Meneveau, G. Karniadakis, T.A. Zaki, DeepONet prediction of linear instability waves in high-speed boundary
layers, 2021, arXiv preprint arXiv:2105.08697.
[44] Z. Mao, L. Lu, O. Marxen, T.A. Zaki, G.E. Karniadakis, DeepM&Mnet for hypersonics: Predicting the coupled flow and finite-rate
chemistry behind a normal shock using neural-network approximation of operators, J. Comput. Phys. 447 (2021) 110698.
[45] C. Lin, M. Maxey, Z. Li, G.E. Karniadakis, A seamless multiscale operator neural network for inferring bubble dynamics, J. Fluid
Mech. 929 (2021).
[46] S. Goswami, M. Yin, Y. Yu, G.E. Karniadakis, A physics-informed variational DeepONet for predicting crack path in quasi-brittle
materials, Comput. Methods Appl. Mech. Engrg. 391 (2022) 114587.
[47] S. Wang, H. Wang, P. Perdikaris, Learning the solution operator of parametric partial differential equations with physics-informed
DeepONets, 2021, arXiv preprint arXiv:2103.10974.
[48] M. Yin, E. Ban, B.V. Rego, E. Zhang, C. Cavinato, J.D. Humphrey, G.E. Karniadakis, Simulating progressive intramural damage
leading to aortic dissection using an operator-regression neural network, 2021, arXiv preprint arXiv:2108.11985.
[49] A. Yazdani, L. Lu, M. Raissi, G.E. Karniadakis, Systems biology informed deep learning for inferring parameters and hidden dynamics,
PLoS Comput. Biol. 16 (11) (2020) e1007575.
[50] S. Dong, N. Ni, A method for representing periodic functions and enforcing exactly periodic boundary conditions with deep neural
networks, J. Comput. Phys. 435 (2021) 110242.
[51] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the Thirteenth
International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 2010, pp. 249–256.
[52] L. Lu, Y. Shin, Y. Su, G.E. Karniadakis, Dying ReLU and initialization: Theory and numerical examples, Commun. Comput. Phys.
28 (5) (2020) 1671–1706.
[53] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in:
Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[54] M. Telgarsky, Neural networks and rational functions, in: 34th International Conference on Machine Learning, ICML 2017, International
Machine Learning Society (IMLS), 2017, pp. 5195–5210.
[55] J.A.A. Opschoor, C. Schwab, J. Zech, Exponential ReLU DNN expression of holomorphic maps in high dimension, Constr. Approx.
(2021).
[56] W. Wang, C.-W. Shu, H.C. Yee, B. Sjögreen, High-order well-balanced schemes and applications to non-equilibrium flow, J. Comput.
Phys. 228 (18) (2009) 6682–6702.
[57] X. Zhang, C.-W. Shu, Positivity-preserving high order discontinuous Galerkin schemes for compressible Euler equations with source
terms, J. Comput. Phys. 230 (4) (2011) 1238–1248.
[58] P.A. Gnoffo, R.N. Gupta, J.L. Shinn, Conservation equations and physical models for hypersonic air flows in thermal and chemical
nonequilibrium, 1989, NASA Technical Paper 2869.
[59] X. Meng, Z. Guo, Multiple-relaxation-time lattice Boltzmann model for incompressible miscible flow with large viscosity ratio and
high Péclet number, Phys. Rev. E 92 (4) (2015) 043305.
[60] Z. Guo, C. Zheng, B. Shi, Non-equilibrium extrapolation method for velocity and pressure boundary conditions in the lattice Boltzmann
method, Chin. Phys. 11 (4) (2002) 366.