
Journal Pre-proof

Neural operator prediction of linear instability waves in high-speed boundary layers

Patricio Clark Di Leoni, Lu Lu, Charles Meneveau, George Em Karniadakis and Tamer
A. Zaki

PII: S0021-9991(22)00856-7
DOI: https://doi.org/10.1016/j.jcp.2022.111793
Reference: YJCPH 111793

To appear in: Journal of Computational Physics

Received date: 18 January 2022


Revised date: 21 September 2022
Accepted date: 14 November 2022

Please cite this article as: P.C. Di Leoni, L. Lu, C. Meneveau et al., Neural operator prediction of linear instability waves in high-speed
boundary layers, Journal of Computational Physics, 111793, doi: https://doi.org/10.1016/j.jcp.2022.111793.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and
formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and
review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal
pertain.

© 2023 Published by Elsevier.


Highlights
• Deep operator networks (DeepONet) are trained to accurately predict the linear evolution of instability waves in high-speed boundary
layers.
• DeepONet can also approximate the inverse operator, from downstream observations to upstream signal.
• Combining inverse and forward predictions, DeepONet can accelerate data assimilation in high-speed flows.
Neural operator prediction of linear instability waves in high-speed
boundary layers

Patricio Clark Di Leoni,1 Lu Lu,2 Charles Meneveau,1 George Em Karniadakis,3 and Tamer A. Zaki1,∗

1 Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
2 Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA
3 Division of Applied Mathematics and School of Engineering, Brown University, Providence, RI 02912, USA

Abstract
We investigate if neural operators can predict the linear evolution of instability waves in high-
speed boundary layers. To this end, we extend the design of the DeepONet to ensure accurate and
robust predictions, and also to perform data assimilation. In particular, we train DeepONet to
take as inputs an upstream disturbance and a downstream location of interest, and to provide as
output the perturbation field downstream in the boundary layer. DeepONet thus approximates the
linearized and parabolized Navier-Stokes operator for this flow. For successful application to the
high-speed boundary layer problem, we add sample weighting and Fourier input features to the reg-
ular DeepONet formulation. Once trained, the DeepONet can perform fast and accurate predictions
of the downstream disturbances within the range of training frequencies (inside the distribution).
In addition, we show that DeepONet can solve the inverse problem, where downstream wall mea-
surements are adopted as input, and a trained network can predict the upstream disturbances that
led to these observations. This capability, along with the forward predictions, allows us to perform
a full data assimilation cycle efficiently: starting from wall-pressure data, we predict the upstream
disturbance using the inverse DeepONet and its evolution using the forward DeepONet. Finally, we
introduce three new metrics to benchmark the training, evaluation and break-even cost of neural
operators.

PACS numbers:


corresponding author, email: [email protected]

I. INTRODUCTION

The early stages of transition to turbulence in high-speed flight often involve the ex-
ponential amplification of linear instability waves, which ultimately become nonlinear and
cause breakdown to turbulence. The potential impact of premature transition on a flight
vehicle can be undesirable due to the increased drag or even catastrophic due to the exces-
sive local heating. Therefore, ongoing research aims to accurately examine each stage of the
transition process using theory [1, 2], simulations [3] and experiments [4]. At every stage,
the transition process can be significantly altered by uncertain elements, e.g., in the flow
profile and boundary conditions [5], free-stream noise [6, 7] or vibrations [8]. While nonlinear
optimization strategies can be adopted to discover the most dangerous configurations and
to mitigate them [9, 10], these techniques are computationally very costly. Accurate and
efficient approaches to predict the different stages of transition, starting with the early devel-
opment of exponential instability waves, which is the focus of the present effort, are therefore
pacing items for robust design and optimization of high-speed flight [11, 12]. Figure 1 shows
a visualization of an instability wave in a high-speed, spatially developing boundary layer:
the goal is to accurately predict how the upstream instability wave will amplify or decay
within a region of interest downstream.
Several data-driven methods have been proposed to determine the amplification of insta-
bility waves, ranging from complex data fits [13, 14] and numerical look-up tables [15–17]
to artificial neural networks [18–21]. All these approaches attempt to predict amplification
factors over a range of frequencies of the instability waves, Reynolds numbers and flow condi-
tions based on data generated from stability theory and eN method. One issue that all these
methods encounter is how to account for the shape of both the incoming perturbations and
the underlying base-flow profile. A commonly adopted approach to characterize the base-
flow profile is to reduce its functional form to a single number, namely the shape factor,
and use this value as input to the prediction method. This approach has been shown to
lack the necessary expressivity, and hence alternatives that rely on generating reduced-order
representations using convolutional neural networks have been developed [21], although the
need to accommodate functional input still remains.
Of equal importance and complexity is the inverse of the above-described problem, namely
determining incoming perturbations from downstream data. This class of problems falls

FIG. 1: Visualization of an instability wave in a spatially developing boundary layer. At the inlet
to the computational domain, the base flow is superposed with instability waves. The dashed
line marks the 99% thickness of the boundary layer. The objective is to accurately predict the
downstream evolution of the instability wave.

within the realm of data assimilation (DA). In the context of fluid dynamics, DA has found
success in tackling the problem of state reconstruction using a variety of techniques, such as
adjoint methods [22–24], ensemble approaches [25, 26], nudging [27, 28] and neural networks
[29–31]. In the context of high-speed boundary-layer stability, Buchta et al. [32] utilized an
ensemble-variational (EnVar) method to determine inflow perturbations from downstream
wall-pressure measurements on a cone. The sensors in that case were placed in the nonlinear,
transitional flow regime. One important consideration is that the EnVar procedure must be
repeated for each new set of measurements. Therefore, pre-trained neural networks, which
can be evaluated very quickly relative to performing a new simulation, have the potential
to accelerate the solution of these inverse problems.
Traditional neural networks, by virtue of the universal approximator theorem [33], are
built to approximate functions. While useful for many applications, they may be at a dis-
advantage when tackling the problems described above where the input is itself a function,
e.g., the shape of the upstream instability wave, and we want to examine quickly the effect of
different disturbances on different velocity profiles. Two approaches can deal with this type

of input. The first is the notion of an evolutional deep neural network (EDNN [34]), which
can solve the governing equations to evaluate the downstream evolution of disturbances. In
this framework, the network represents a flow state at a given location, so its initial state
represents the upstream disturbance; the network parameters are subsequently updated de-
terministically using the governing equations to predict the downstream evolution of the
instability waves [34]. This approach is therefore accurate, predictive and solves the govern-
ing equations for every new configuration of interest. Here, we focus on a different class of
networks that rely on offline pre-training to learn the operator that governs the dynamics
of the instability waves and, once trained, can make fast online predictions. Such networks
harness the power of the universal operator approximator theorem [35], which states that
neural networks can approximate functionals and operators with arbitrary accuracy, and
have shown great promise. For example, [36] utilized long short-term memory (LSTM) net-
works to learn a functional that predicts the motion of vessels in extreme sea states, and [37]
developed a Fourier-based method and used it to predict the evolution of 2D incompressible
flows. Of particular importance to our work is the deep operator network (DeepONet, [38]),
a composite branched type of neural network based directly on the theorems by Chen and
Chen [35], and which can successfully model a wide range of problems including ODEs,
PDEs and fractional differential operators. Beyond the cost of their training, these neural
operators are able to approximate the targeted operators with high accuracy and speed.
DeepONet can also be easily integrated into data assimilation schemes involving multiscale
multiphysics problems; for example, Cai et al. [39] applied them to electroconvection while
Mao et al. [40] demonstrated their ability to predict the flow and finite-rate chemistry behind
a normal shock in high-speed flow.
To summarize, in high-Mach-number applications, the amplification of instability waves
is an important precursor to laminar-to-turbulence transition. Accurate and fast predictions
of these waves are therefore required, and motivate the development of new computational
models. In this context, we propose to use DeepONet for the prediction of the evolution
of linear instability waves in compressible boundary layers. We show that DeepONet can
learn to reproduce solutions of the parabolized stability equations (PSE), a linearized and
parabolized set of equations derived from the full Navier-Stokes equations that describe the
evolution of perturbations in a developing boundary layer. The pre-trained DeepONet is
accurate and orders of magnitude faster than recomputing the data using the equations.

Specifically, we introduce two important modifications to the original layout of DeepONet
[38], namely the usage of Fourier features at the input and a physically-motivated sample
weighting scheme that allowed us to achieve satisfactory levels of accuracy. Moreover, we
demonstrate how a neural operator like DeepONet can be used creatively to tackle the
inverse problem of determining the upstream disturbance environment from very limited
wall measurements.
The paper is organized as follows. In Section II, we present the DeepONet architecture
and provide a simple example. In Section III, we outline the equations and regions in
parameter space of the flow, while in Section IV we explain the details of the data generation
and of the training protocols. The results are presented in Section V, including a comparison
between DeepONet and both the Convolutional Neural Network (CNN) and the Fourier
Neural Operators (FNO) [37]. The conclusions are presented in Section VI.

II. DEEPONET ARCHITECTURE

FIG. 2: The DeepONet architecture consists of two neural networks, the branch (for the input
space) and the trunk (for the output space). The green nodes on the left are the inputs. The blue
nodes indicate the hidden units, with the ones marked with σ denoting those that use activation
functions σ. The crossed red node on the right indicates the output, which is obtained by taking
the dot product of the final layers of the branch and trunk networks.

Here we present some background on DeepONet, which follows the original presentation
by Lu et al. [38]. Let G† be an operator which maps an input function f to an output
function G† (f ), and let ζ ∈ Y be a point in the domain of the output function (Y can be a
subset of R or Rn , indistinctly). We define points [ξ1 , ξ2 , · · · , ξm ] in the domain of f , such

that [f (ξ1 ), f (ξ2 ), · · · , f (ξm )] is a discrete representation of f . The objective of DeepONet
is to approximate the operator G† (f )(ζ) by a DeepONet G(f (ξ1 ), f (ξ2 ), · · · , f (ξm ))(ζ). The
structure of G is shown in figure 2 and corresponds to the “stacked” version presented by
Lu et al. [38]. The input is indicated with green nodes, hidden units are indicated with
blue nodes, and the output is indicated with a red node. The network is separated into two
subnetworks, a branch network of depth db that handles the discretized function input, and
a trunk network of depth dt that handles the input of the final function. Each subnetwork
consists of a fully-connected feed-forward neural network where every hidden unit is passed
through an activation function σ, except for the last layer of the branch network which does
not go through any activation function. The depth, or number of layers, of both subnetworks
can be different, and similarly the widths of their different layers, apart from their respective
last layers which must have the same width. In practice, we use the same number of
hidden units p in every layer. The final output is obtained by performing the dot product
of the last layer of each subnetwork plus a bias term,
G(f(\xi_1), f(\xi_2), \cdots, f(\xi_m))(\zeta) = \sum_{k=1}^{p} b_k t_k + b_0,    (1)

where bk and tk are the values of the units for the last layer of the branch and trunk networks
and b0 is the extra bias term. As mentioned above, the last layer of the branch network does
not go through activation functions. Therefore, the output of the DeepONet, Eq. (1), can
be thought of as the combination of basis functions given by the trunk network weighted by
coefficients given by the branch network [35, 38].
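As a concrete illustration of Eq. (1), the following is a minimal sketch of an unstacked DeepONet written in PyTorch. The layer counts, widths and ELU activations mirror the description above, but the class and argument names are illustrative and do not correspond to the authors' implementation.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal DeepONet: the branch encodes the sampled input function, the trunk
    encodes the query point, and the output is their dot product plus a bias, Eq. (1)."""
    def __init__(self, m=940, dim_zeta=2, p=100, depth_branch=5, depth_trunk=6):
        super().__init__()
        def mlp(d_in, depth, activate_last):
            layers, d = [], d_in
            for i in range(depth):
                layers.append(nn.Linear(d, p))
                if i < depth - 1 or activate_last:
                    layers.append(nn.ELU())
                d = p
            return nn.Sequential(*layers)
        self.branch = mlp(m, depth_branch, activate_last=False)  # no activation on last branch layer
        self.trunk = mlp(dim_zeta, depth_trunk, activate_last=True)
        self.b0 = nn.Parameter(torch.zeros(1))                   # extra bias term b_0

    def forward(self, f_samples, zeta):
        # f_samples: (batch, m) discretized input function [f(xi_1), ..., f(xi_m)]
        # zeta:      (batch, dim_zeta) query points in the output domain
        b = self.branch(f_samples)   # coefficients, shape (batch, p)
        t = self.trunk(zeta)         # basis functions, shape (batch, p)
        return (b * t).sum(dim=-1, keepdim=True) + self.b0
```

A forward-case network of the kind trained below would, for instance, take the 940 sensor values of u on Yu as f_samples and the (x, y) coordinates of a point in Yd as zeta.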
For simplicity, we omit the explicit sampling/discretization of the functional input in
G(f (ξ1 ), f (ξ2 ), · · · , f (ξm ))(ζ) from now on and abbreviate it as G(f )(ζ). Finally, DeepONet
can be trained by minimizing a loss function of the type
L = \frac{1}{N} \sum_{i=1}^{N} w_i \left| G(f_i)(\zeta_i) - G^\dagger(f_i)(\zeta_i) \right|^2,    (2)

where (fi , ζi ) are the N different pairs of functions and trunk inputs used for training and
wi are the associated weights, which in the simplest case are taken to be equal to unity for
every sample. We detail below how the values of wi are chosen. The training can
be performed with either a single-batch high-order method such as L-BFGS or a mini-batch
stochastic gradient-descent method like Adam. A comparison between DeepONet and the

convolutional neural networks (CNN) as well as the Fourier Neural Operator (FNO) [37] is
presented in Sec. V F.
While the standard DeepONet architecture is established, important physical considera-
tions must be taken into account, which impact the learning of the network. We describe
these considerations in the next sections, and introduce new metrics to evaluate the compu-
tational cost of the DeepONet.

A. Loss-function weighting

In the physical cases of interest in this work, the amplitude of the different target fields
varies by more than two orders of magnitude. Having such disparity in the dataset can pose
a problem for a gradient descent-based training protocol. This becomes clear when looking
at the gradient of equation (2) with respect to the network parameters θ,
\frac{\partial L}{\partial \theta} = \frac{2}{N} \sum_{i=1}^{N} w_i \left( G(f_i)(\zeta_i) - G^\dagger(f_i)(\zeta_i) \right) \frac{\partial G(f_i)(\zeta_i)}{\partial \theta}.    (3)

During training G_i − G_i^† ∝ G_i^†, and the optimization is less sensitive to small-amplitude
solutions, which lead to small changes in the loss function when applying a gradient update,
relative to amplifying modes. In order to mitigate this issue, we propose using w_i = A_i^{-1}, with

A_i = \max_{\zeta \in Y} G(f_i)(\zeta),    (4)

as weights in the loss function in equation (2).
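A short NumPy sketch of this weighting, assuming the targets are stored as an array with one sample per row; the amplitude is taken here as the peak absolute value of the target field, a minor interpretation of Eq. (4).

```python
import numpy as np

def amplitude_weights(targets, exponent=1.0):
    # targets: array of shape (N, ...) holding the target fields G†(f_i).
    # A_i is the peak amplitude over the output domain, as in Eq. (4);
    # exponent = 1 gives w_i = A_i^{-1} (case F); 0 and 2 give cases FA0 and FA2.
    A = np.max(np.abs(targets).reshape(len(targets), -1), axis=1)
    return A ** (-exponent)

def weighted_loss(pred, true, w):
    # Weighted mean-squared error of Eq. (2), one weight per sample.
    sq_err = np.mean((pred - true).reshape(len(pred), -1) ** 2, axis=1)
    return np.mean(w * sq_err)
```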

B. Fourier input features

Another characteristic of our dataset that can hinder the convergence of the networks
is its highly oscillatory nature, as networks can easily learn to predict mean values
while ignoring high-frequency fluctuations. To mitigate this potential difficulty, we perform
a harmonic feature expansion on the input of the trunk network,

\zeta \mapsto (\zeta, \cos(2^0\pi\zeta), \sin(2^0\pi\zeta), \cos(2^1\pi\zeta), \sin(2^1\pi\zeta), \cdots, \cos(2^n\pi\zeta), \sin(2^n\pi\zeta)).    (5)

We are thus replacing the trunk input ζ by ζ and harmonics of various orders. Details on
the choice of n are given below for each case.
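The expansion of Eq. (5) amounts to appending sines and cosines of dyadically increasing wavenumbers to the (normalized) trunk input. A minimal sketch, assuming the input is supplied as a NumPy array:

```python
import numpy as np

def fourier_features(zeta, n=1):
    # Harmonic feature expansion of Eq. (5): append cos/sin of 2^k * pi * zeta,
    # for k = 0..n, to each coordinate of the trunk input.
    zeta = np.atleast_2d(zeta)            # shape (batch, d)
    feats = [zeta]
    for k in range(n + 1):
        feats.append(np.cos(2.0**k * np.pi * zeta))
        feats.append(np.sin(2.0**k * np.pi * zeta))
    return np.concatenate(feats, axis=-1)

# For the 2D trunk input (x, y) of the forward case with n = 1, the two coordinates
# become ten features, matching the expansion listed in Sec. IV A.
```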

C. Proposed DeepONet cost metrics

DeepONets are fast and inexpensive to evaluate, but they must be trained beforehand.
Training involves two types of cost: one associated with generating the dataset, which in
our case amounts to performing several numerical simulations of the desired system, and
another associated with the gradient-descent-based training procedure itself. In order to get
a good understanding of these different costs and how they compare with each other, we
define the following three metrics, the training ratio R_t, the evaluation ratio R_e, and the
break-even number N_e^*, as

R_t = \frac{C_t}{N_s C_s}, \qquad R_e = \frac{C_e}{C_s}, \qquad N_e^* = N_s + \frac{C_t}{C_s},    (6)
where Ct is the cost in time of training the DeepONet, Ns is the number of simulations needed
to generate the dataset (i.e., the number of times the PSE was solved, as is explained below
in Section IV), Cs is the cost in time of running each simulation, and Ce is the cost in time of
evaluating a DeepONet. The training ratio compares the cost of training against the cost of
generating data. The evaluation ratio compares the cost of evaluating the DeepONet against
the cost of performing a single simulation. The break-even number indicates the number
of evaluations at which the DeepONet becomes beneficial compared to the simulation tool,
and stems from the analysis of the total cost ratio
R_c = \frac{N_s C_s + C_t + N_e C_e}{N_e C_s},    (7)
which compares the total cost of generating data, training the DeepONet and evaluating
Ne different solutions to the cost of generating all Ne different solutions with simulations.
Equating the ratio Rc to unity and using that Ce ≪ Cs yields the expression for Ne∗ reported
above.
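These metrics are simple ratios of wall-clock times; a small helper that computes them (the names are illustrative):

```python
def cost_metrics(C_t, C_e, C_s, N_s):
    """Training ratio, evaluation ratio and break-even number of Eq. (6).
    C_t: training time, C_e: time per DeepONet evaluation,
    C_s: time per simulation, N_s: number of simulations in the dataset."""
    R_t = C_t / (N_s * C_s)
    R_e = C_e / C_s
    N_e_star = N_s + C_t / C_s
    return R_t, R_e, N_e_star

# With the timings quoted in Sec. V B (expressed in minutes),
# cost_metrics(C_t=3000, C_e=2.5e-2 / 60, C_s=15, N_s=59)
# approximately reproduces the R_t, R_e and N_e* reported there.
```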

III. LINEAR INSTABILITY WAVES IN COMPRESSIBLE BOUNDARY LAYERS

We now introduce the governing equations for high-Mach number flows, with our interest
being the linear evolution of instability waves in zero-pressure-gradient boundary layers.
Due to the sensitivity of the instability waves, accuracy of the generated data is essential
for assessment of the DeepONet. For this reason, we describe the data generation procedure
in detail. We take (x, y) to be the streamwise and wall-normal coordinates. The state

vector q̃ = (ρ̃, ũ, ṽ, T̃ ) is comprised of the fluid density, the velocity components in the two
coordinate directions and the temperature. The free stream has characteristic velocity U0 ,
temperature T0, specific heat ratio γ, viscosity µ0 and density ρ0. The starting location of the
flow domain under consideration is located at x0, and the Blasius length L_0 = \sqrt{\mu_0 x_0 / (\rho_0 U_0)}
is adopted as the characteristic lengthscale. The inflow Reynolds number is therefore
Re_0 = \rho_0 U_0 L_0 / \mu_0 and the Mach number is M_0 = U_0 / \sqrt{\gamma R T_0}, where R is the gas constant.
The flow satisfies the Navier-Stokes equations for an ideal compressible gas,

\frac{\partial \tilde{\rho}}{\partial t} + \nabla \cdot (\tilde{\rho}\tilde{u}) = 0,    (8)
\frac{\partial \tilde{\rho}\tilde{u}}{\partial t} + \nabla \cdot (\tilde{\rho}\tilde{u}\tilde{u} + \tilde{p}I - \tau) = 0,    (9)
\frac{\partial E}{\partial t} + \nabla \cdot (\tilde{u}[E + \tilde{p}] + \theta - \tilde{u} \cdot \tau) = 0,    (10)

where ũ is the velocity vector, I is the unit tensor, E = ρ̃e + 0.5ρ̃ũ · ũ is the total energy,
e is the specific internal energy, τ is the viscous stress tensor, and θ is the heat-flux vector.
Thermodynamic relations for p̃ and T̃ , as well as the expression for τ and θ close the system
and can be found in the literature, e.g., [9].

A. Parabolized Stability Equations

In the early stages of their development, small-amplitude instability waves in a boundary


layer can be accurately described by the linear parabolized stability equations (PSE). The
equations are derived from the Navier-Stokes equations by decomposing the flow state q̃
into the sum of a base flow and a perturbation (see figure 1). The base flow in this case is
the undistorted, spatially developing boundary-layer solution Q = (ρB , UB , VB , TB )T . The
equations governing the perturbation field q = (ρ, u, v, T )T are then linearized,

V_t \frac{\partial q}{\partial t} + L(Q) q = 0,    (11)

where Vt is the linear operator matrix, and L is the linear differential operator matrix [41]

L = V_0 + V_x \frac{\partial}{\partial x} + V_y \frac{\partial}{\partial y} + V_{xx} \frac{\partial^2}{\partial x^2} + V_{xy} \frac{\partial^2}{\partial x \partial y} + V_{yy} \frac{\partial^2}{\partial y^2}.

The exact form of the operator matrices V is provided in Appendix A. We introduce the
following ansatz for the perturbations,
q = \check{q}(x, y) \exp\left( \int_{x_0}^{x} \alpha(s)\,ds - i\omega t + i\phi \right) + \mathrm{c.c.},    (12)

where q̌ = (ρ̌, ǔ, v̌, Ť)^T , α is the local complex-valued streamwise wavenumber, ω is the
perturbation frequency, and φ is the phase. Since downstream amplification can be absorbed
in either q̌ or α, an additional constraint is required. We adopt the condition
\int_0^\infty \rho_B \left( \check{u}^* \frac{\partial \check{u}}{\partial x} + \check{v}^* \frac{\partial \check{v}}{\partial x} + \check{w}^* \frac{\partial \check{w}}{\partial x} \right) dy = 0.    (13)
Substituting the solution ansatz (12) into equation (11) yields the PSE,
\check{A}(Q) \frac{\partial \check{q}}{\partial x} + L(Q, \alpha, \omega) \check{q} = 0,    (14)
where Ǎ and L are linear differential operators whose expressions are provided in Ap-
pendix A. Further details and explanations about the derivation of the PSE can be found
in Refs. [5, 41]. This formulation is linear and the parabolization allows for the solution
to be marched downstream instead of solving the full domain all at once. Nonetheless, the
solution procedure requires careful consideration to ensure numerical stability and accuracy.
Once solved, the generated data can be used to train neural networks that are better suited
for making fast predictions.

B. Instability modes

The disturbances of interest are instability waves, which depend on the flow param-
eters. We will consider air with Prandtl number P r = 0.72 and ratio of specific heats
γ = 1.4. The free-stream Mach number is M0 = 4.5 and the free-stream temperature
is T0 = 65.15K. The neutral curves for spatial instabilities in a parallel, zero-pressure-
gradient boundary layer are shown in figure 3, reproduced from [9]. The instability frequency
is F = ω × 10^6 / Re_0, and the two shaded regions mark two different classes of unstable
modes: the lower region corresponds to three-dimensional vortical instabilities that have
their origin in Tollmien-Schlichting waves when traced back to lower Mach numbers; the
upper region corresponds to the Mack second modes. At our Mach number and above, the
Mack modes start to become dominant and are recognized as a key contributor to transi-
tion at high Mach numbers. In high-altitude flight tests at M0 > 4, flows become unstable
FIG. 3: Neutral stability curves of the compressible boundary layer at M0 = 4.5. The gray and
blue regions mark the first and second unstable Mack modes, respectively. The green line denotes
the region of parameter space used as input for the DeepONet, while the red square denotes the
output region.

at around \sqrt{Re_x} \equiv \sqrt{\rho_0 U_0 x / \mu_0} = 2000 [42, 43]. For this reason we set the inflow location
of our configuration slightly upstream, at \sqrt{Re_{x=x_0}} = 1800, so as to cover the region where
instability waves would be observed. At this chosen inflow Reynolds number, the unstable
two-dimensional Mack second modes span the frequency range 100 \lesssim F \lesssim 125, as shown in

figure 3. This frequency range will be the focus of our DeepONet training.

IV. PROBLEM SET-UP AND DATASET GENERATION

Our goal is to use DeepONet to approximate the PSE operator, and its inverse, between
upstream and downstream regions of the flow and the mapping between different field vari-
ables. We generated data to train and test our DeepONet by simulating the evolution of
instability waves using the parabolized stability equation (14) and the code described in
[5]. The code generates the inflow perturbation modes through linear stability theory and
all modes were normalized so as to have the same total energy. The domain of integration

spans 1800 ≤ \sqrt{Re_x} ≤ 2322 and y/L0 ∈ [0, 220]. The equations were solved for 67 different

FIG. 4: Real part of the streamwise wavenumber α as a function of perturbation frequency F . The
modes marked with blue dots were used to generate the training dataset, while the modes marked
with red squares were reserved for testing purposes.

perturbation frequencies in the range F ∈ [100, 125]. Figure 4 shows the real part of the
streamwise wavenumber α as a function of the perturbation frequencies F that we consider.
Modes marked with blue circles were used to generate the training datasets, while the eight
modes marked with red squares were used as independent validation data and the eight
modes marked with green stars were used to generate tests. The resolution in frequency of
the training dataset is ∆F = 0.5. Within the range of F of interest, only two-dimensional
Mack modes were considered since they are recognized as an important precursor of transi-
tion in high-speed flows. Note that in addition to selecting the frequency of an instability
wave, we can also arbitrarily adjust its phase. The data generated by evolving the PSE for
different perturbation frequencies were used to craft the different training, validation and
testing datasets for each case that we examine in this work. The details are given below.
We separate the spatio-temporal flow domain into three regions. The first,
Y_u = \{ \sqrt{Re_{x=x_0}} = 1800,\; y/L_0 \in [0, 30],\; t \in [0, 4\tau_0] \},    (15)

is the upstream position where the instability waves enter the domain of interest and, at
this position, the waves are a function of the wall-normal coordinate and time; the period
τ0 = 34.9 corresponds to the lowest-frequency wave. This region was discretized using
20 points in time and 47 points in the vertical direction following the computational grid

outlined in [5]. The second region,
Y_d = \{ \sqrt{Re_x} \in [2200, 2322],\; y/L_0 \in [0, 30],\; t = 0 \},    (16)

is a downstream region where the instability waves depend on the streamwise and wall-
normal positions, and have amplified or decayed relative to their inflow amplitude. This
domain is not spatially contiguous with Yu , and its boundaries were chosen such that the
perturbations have experienced a significant change compared to their states at Yu , and also
undergo considerable amplification and decay within Yd . Note that the DeepONet effectively
learns the flow response as a function of space at a specific time. Extension to learn the
time-dependent solution is straightforward, and is not considered here since it is a simple
harmonic dependence as shown in equation (12). Region Yd was discretized using 111 points
in the horizontal direction and 47 points in the vertical direction. The last region,
Y_w = \{ \sqrt{Re_x} \in [2300, 2322],\; y/L_0 = 0,\; t \in [0, 4\tau_0] \},    (17)

marks a narrow streamwise extent along the wall. This region was discretized using 109
points in the horizontal direction and 40 points in time.

A. Cases, hyperparameters, and training protocols

We trained several DeepONets under a variety of setups. The first version is termed the
forward case F, whose idea is illustrated in figure 1. In this case the goal is to map an inflow
perturbation to its evolution downstream and we only work with the streamwise velocity
component u of the full flow state q. The input to the branch net is u evaluated in the
subdomain Yu discretized using 47 points in the wall-normal coordinate and 20 in time, thus
totalling 940 sensors. The trunk net, on the other hand, is evaluated at a point downstream
(x, y) ∈ Yd . The output of the DeepONet is u(x, y), where the dependence on t is dropped
since Yd is evaluated at t = 0. Under the notation presented in Sec. II, f = u, ξ = (y, t) ∈ Yu ,
ζ = (x, y) ∈ Yd and G(f ) = u. Note that the input and output domains are not adjacent.
The goal of the DeepONet is shown schematically in figure 3, where the green line denotes
the input and the red area denotes the output region. The training dataset for this case
was generated by picking N solutions with different frequencies F (from the training set
marked in figure 4) and phases φ. From each solution only one point, chosen at random,

was used for the trunk evaluation. As the PSE were solved using Chebyshev discretization on
Gauss-Lobatto points, the training data preferentially resolves the near-wall region within
the boundary layer, where the instability waves are located. It is also possible to use a
smaller pool of solutions and sample more points (x, y) for the trunk evaluation; in our
experience choosing either strategy yields similar prediction accuracy as long as the number
of solutions with different frequencies and phases used is sufficiently large. The validation
dataset was generated similarly at independent frequencies F , as shown in figure 4. The
testing dataset is also comprised of unseen frequencies but, very importantly, these modes
were evaluated over the whole output domain Yd , i.e., for each solution we use many (x, y)
points for testing. For this reason, the results below using testing data are shown as contour
plots, from multiple evaluations of the DeepONets since the network output is only a single
point. For each input testing frequency out of the eight possible ones, we picked 10 random
phases; thus, the testing dataset contained a total of 417,360 points, divided into 80 different
examples.
The next cases focus on retrieving other field variables. Using the same setup as case F, we
define cases Fp and FT , where the target outputs are now the pressure and temperature fields,
respectively. Note that the input to both cases is still the streamwise velocity perturbation
u evaluated at Yu . When solving the PSE, all five fields have to be evaluated concurrently,
as they are coupled. With these two cases our objective is to show how a DeepONet, which
learns an operator using data, may be trained separately for different fields. DeepONet thus
learns the PSE and the observation operator that extracts the specific field of interest. The
respective training, validation and testing datasets for these cases were generated in the
same way as for case F, just changing the corresponding output.
As a counterpart to the forward case F, we also examine an inverse case I. The goal here
is to reconstruct an inflow perturbation from downstream wall-pressure measurements. A
diagram of this problem is shown in figure 5. We use the pressure field evaluated at Yw as
input to the branch network, evaluate the trunk network at points (y, t) ∈ Yu , and output
u. The subdomain Yw is discretized using 47 points in x and 40 points in t, totalling 1880
sensors. It is worth noting that in this setup, the inverse problem is well posed and has
unique solutions as each perturbation frequency generates a distinct response. Again, the
training, validation, and testing datasets for this case were generated following the same
procedure as for case F, but using the aforementioned inputs and outputs. We note that as

the sizes and complexities of the target fields are not the same, the number of points used
for training the forward and inverse cases are different, even though the set of simulations
from which we queried these points is indeed the same.

FIG. 5: Diagram outlining the setup of the inverse problem. The DeepONet takes measurements of
the pressure at the wall downstream as input to the branch net, and outputs the inflow perturbation
that generated such pressure fluctuations.

Finally, as the PSE is a linear equation we can generate solutions with more than one
perturbation frequency using the superposition principle. However, as neural networks are
inherently non-linear operators, DeepONets cannot predict the evolution of superposed in-
stability waves unless explicitly trained to do so. We therefore consider two further cases,
F2 and I2 , which expand upon cases F and I by considering pairs of instability waves,

q = a_1 q_1 + a_2 q_2,    (18)

where q1 and q2 are two different solutions and a1 , a2 ∈ [0.9, 1.1] are their respective ampli-
tudes, to generate the datasets. The range of values for a1 and a2 was chosen such that the
difference in amplitude between the two modes remains manageable while being significant.
The training, validation, and testing datasets for these cases were generated in a similar
manner as the one-mode cases, the only difference being a choice of the frequencies, phases

and amplitudes of the pairs of modes to superpose. A sensible estimate of the size of the
training dataset for the two-mode cases is the square of that for one mode, which is equiv-
alent to combining every mode in the dataset with all the others. In practice, the datasets
used were between two and eight times larger than the single-mode cases (precise numbers
are given below), which indicates that DeepONet training does not require every possible
combination. The testing datasets are composed of 300 examples, each evaluated at every
point in their output domain (so a total of 300 × 111 × 47 points for F2 and 300 × 20 × 47
for I2 ). In Sec. V F we show a comparison between DeepONet, CNN and FNO using F2 .
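Because the PSE are linear, two-mode samples can be generated by superposing single-mode solutions as in Eq. (18). A sketch of this step, assuming q1 and q2 are precomputed single-mode fields; the same amplitudes must be applied to the branch input (upstream field) and the corresponding target (downstream field):

```python
import numpy as np

def superpose_pair(q1, q2, rng=None):
    # Two-mode sample of Eq. (18): q = a1*q1 + a2*q2, with amplitudes drawn
    # uniformly from [0.9, 1.1]; q1 and q2 are single-mode PSE solutions with
    # different frequency/phase combinations.
    rng = np.random.default_rng() if rng is None else rng
    a1, a2 = rng.uniform(0.9, 1.1, size=2)
    return a1 * q1 + a2 * q2
```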
While all frequencies studied are unstable at the inflow to the domain, the higher fre-
quencies become stable and decay considerably as they evolve downstream. As a result, the
target fields differ in amplitudes by more than two orders of magnitude. In order to deal
with this issue, we weight the different samples following the choice of w_i = A_i^{-1} outlined in
Sec. II A. As a way of showing the importance of weighting each sample appropriately, we
define two more cases, FA0 and FA2, that use different weights. Both are based on case F
but use w_i = 1 and w_i = A_i^{-2}, respectively.

Cases   db   dt   p     N

F       5    6    100   3.3 × 10^6
Fd−2    3    4    100   3.3 × 10^6
Fd−1    4    5    100   3.3 × 10^6
Fd+1    6    7    100   3.3 × 10^6
Fd+2    7    8    100   3.3 × 10^6
F2      6    6    200   2.6 × 10^7
I       7    5    90    4.8 × 10^6
I2      8    8    100   9.6 × 10^6

TABLE I: Depth of the branch network, db, depth of the trunk network, dt, width of the networks,
p, and number of samples in the training dataset used for each case, N. All forward cases use the
same architecture as case F, except for F2, Fd−2, Fd−1, Fd+1 and Fd+2.

As mentioned in Sec. II B, we use a Fourier input feature expansion in order to deal with
the highly oscillatory nature of our data. All forward cases previously defined use feature
expansion up to n = 1, i.e., including wavenumbers 2^0 π and 2^1 π. The trunk input of case

F, for example, then takes the form,

(x, y) \mapsto (x, y, \cos(\pi x), \sin(\pi x), \cos(\pi y), \sin(\pi y), \cos(2\pi x), \sin(2\pi x), \cos(2\pi y), \sin(2\pi y)).

To examine the impact of the feature expansion we define one more set of cases, all based
on case F but with different numbers of feature expansions: Fnf with no harmonic features;
Fn{0,2,3,4} with features up to n = {0, 2, 3, 4}. Note that the inverse cases did not adopt the
feature expansion.
Finally, in order to study the impact of varying the depth of the DeepONets we define a
last set of cases, all based on case F but with different numbers of layers in both the trunk
and branch networks. Cases Fd−1 and Fd−2 have one and two fewer layers (in both trunk and
branch) than case F, respectively, while cases Fd+1 and Fd+2 have one and two more layers
than case F, respectively.
The depth and width of the networks and the number of samples in the training dataset
used is reported in Table I. In all cases, Exponential Linear Units (ELUs) were used as
activation functions and the Glorot algorithm (also known as the Xavier algorithm) was
used for initialization of the network. All networks also have an input and output min-
max normalization layer that ensures the values entering and predicted by the network are
between −1 and 1. When feature expansion is adopted, it is preceded by the normalization
layer.
All cases were trained by minimizing the loss function (2) using the Adam algorithm.
Mini-batches of 1000 samples and an initial learning rate η = 10^{-4} were used in every case.
The learning rate was reduced to 10^{-5} if the value of the loss function reached a plateau or
started to increase. An early stopping protocol was adopted in order to retain the optimal
state. No further regularization procedures were adopted.
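A sketch of this training protocol in PyTorch, assuming data loaders that yield (branch input, trunk input, target, weight) batches; the exact plateau detection and stopping criteria are not specified in the text, so ReduceLROnPlateau and best-state tracking stand in for them here.

```python
import torch

def train(model, loader, val_loader, epochs=10000, lr=1e-4, patience=50):
    # Adam with mini-batches; the learning rate drops to 1e-5 on a validation
    # plateau, and "early stopping" retains the best state seen so far (Sec. IV A).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.1,
                                                       patience=patience, min_lr=1e-5)
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        for f, zeta, target, w in loader:
            opt.zero_grad()
            loss = torch.mean(w * (model(f, zeta) - target) ** 2)  # weighted loss, Eq. (2)
            loss.backward()
            opt.step()
        with torch.no_grad():
            val_loss = sum(torch.mean(w * (model(f, z) - t) ** 2).item()
                           for f, z, t, w in val_loader)
        sched.step(val_loss)
        if val_loss < best_loss:
            best_loss = val_loss
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)
    return model
```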
As a summary, we present a list and short description of every case performed in Table II.

Cases Description

F Forward case
Fp , F T Forward cases mapping to pressure and temperature
FA0 , FA2 Forward cases with alternative loss function weights
Fn{f,0,2,3,4} Forward cases with different numbers of input features
Fd{−2,−1,+1,+2} Forward cases with different numbers of layers
F2 Forward case with two-mode combinations
I Inverse case
I2 Inverse case with two-mode combinations
A Data assimilation case

TABLE II: Summary of all the different cases presented. Details are provided in Section IV.

V. RESULTS

A. Forward problems

We start by presenting the results of case F, where an inflow instability wave is mapped to
the associated downstream velocity field. Figure 6 shows the evolution of the value of the loss
function evaluated on the training and validation datasets as a function of the training epoch.
After a brief plateau in the loss where the network outputs zero for every input (all solutions
have zero mean), both curves decrease by several orders of magnitude which indicates that
the DeepONet is able to learn the correct mapping of the data. As a first qualitative
assessment of how well the trained DeepONet performs, we show in figure 7 the prediction
of two different modes evaluated over the whole output domain Yd . The two modes were
selected from the testing dataset, i.e. their frequencies were never seen during training. The
figure also shows the true field u(x, y) and a comparison of the profiles for a fixed y/L0 = 5.
The DeepONet correctly predicts the wall-normal profile, streamwise wavelength, phase,
and amplitude of each solution. While the loss function takes much smaller values when
evaluated on the training dataset compared to the validation data (figure 6), the apparent
overfitting does not compromise the accuracy of prediction for modes within the testing set.
For a quantitative assessment of the performance of the DeepONet, we define ε as the

FIG. 6: Evolution of the loss function evaluated on the training and validation datasets for case F.

relative root mean square error evaluated over the full output domain for a given input
mode,

\epsilon(f) = \sqrt{ \frac{ \left\langle [G(f)(\zeta) - G^\dagger(f)(\zeta)]^2 \right\rangle_\zeta }{ \left\langle [G^\dagger(f)(\zeta)]^2 \right\rangle_\zeta } },    (19)

where the average operation ⟨·⟩_ζ is performed over the output domain Y (so, under proper
discretization, ⟨·⟩_ζ ≈ (1/N_ζ) Σ_ζ ·). We calculated ε over the testing dataset. The mean
and standard deviation of ε are shown in figure 8(a), grouped by the frequency F of the
input mode. The errors for all modes in the testing set are below 5%.
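For reference, Eq. (19) is a relative L2 error over all output points of one input mode; a one-line NumPy sketch:

```python
import numpy as np

def relative_rms_error(pred, true):
    # Relative root-mean-square error of Eq. (19); the averages run over all
    # points zeta of the output domain for a single input mode f.
    return np.sqrt(np.mean((pred - true) ** 2) / np.mean(true ** 2))
```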
We evaluated the robustness of the trained DeepONet against noisy input data to the
branch. We introduced an additive white noise term scaled by the amplitude of the respective
input function f , i.e.,
f \mapsto f + A \max(f)\, \eta,    (20)

where η is a delta-correlated random field with unit standard deviation with the same dimen-
sions as the corresponding f and A is the effective noise amplitude. Then, ε is calculated by
drawing modes from the testing dataset, similar to the process performed in Figure 8(a) but
without separating the results into the different input frequencies. The prediction errors are
shown in figure 8(b). The results demonstrate that the accuracy of DeepONet predictions
is unchanged when the noise is less than 1%, but deteriorates at higher noise amplitudes.
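The corruption of Eq. (20) can be reproduced in a few lines; the peak amplitude is taken here as max|f|, an interpretation of the max(f) in the equation.

```python
import numpy as np

def add_input_noise(f, A, rng=None):
    # Eq. (20): additive delta-correlated noise with unit standard deviation,
    # scaled by the effective noise amplitude A times the peak amplitude of f.
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.standard_normal(f.shape)
    return f + A * np.max(np.abs(f)) * eta
```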
The results from cases Fp and FT are similar to those from case F. Sample predictions
for the two configurations are shown in Figure 9, while the relative errors for the different
modes in the testing dataset are shown in figure 8(a). As with the previous case, the
trained DeepONets are able to accurately map inflow perturbations to the different fields
downstream and can differentiate between the different modal profiles, frequencies, and

FIG. 7: Examples from case F. For two particular input mode frequencies, (a) F = 102.75 and (b)
F = 122.75, we show the true solutions as generated by the PSE on the top row, the prediction
obtained from the DeepONet on the middle row, and a comparison of the profiles for a fixed
y/L0 = 5 on the bottom row. Both frequencies shown belong to the testing dataset.

FIG. 8: (a): Relative root mean square errors ε evaluated over the whole output domain for different
solutions at the various frequencies present in the testing datasets for cases F, Fp, FT and FA0. (b):
Relative root mean square errors ε evaluated over the whole output domain for different solutions
using noisy data as a function of the noise amplitude A for cases F, Fp and FT.

FIG. 9: Examples from cases (a) Fp and (b) FT . For one particular input mode frequency, belonging
to the testing dataset, we show the true solutions as generated by the PSE on the top row, the
prediction obtained from the DeepONet on the middle row, and a comparison of the profiles for a
fixed y/L0 = 5 on the bottom row.

spatially dependent growth rates.

B. Computational cost and training metrics

The preceding cases each require approximately 3,000 minutes to train using an Nvidia
Tesla K80 GPU card. Evaluating the DeepONet with one batch of 5,000 points (roughly
the number of points used to represent the field in Yd ) entails on the order of 108 floating
point operations and requires 2.5 × 10−2 seconds using the same card. In order to view
these computational requirements in context, we can compare them to costs associated with
solving the PSE. We stress, however, that DeepONets cannot be regarded as replacement for
classical numerical simulations which are needed to generate the training data, and which
in the present case were performed using very accurate Chebyshev spectral discretization.

The PSE solution was performed on an Intel Core i5 CPU and required approximately 15
minutes; 59 different modes were evaluated to generate the training and validation data,
thus totalling 885 minutes of single CPU time.
Following the definitions of the different cost metrics introduced in Sec. II C and the
numbers presented above, we obtain R_t = 3.39, R_e = 2.7 × 10^{-5}, and N_e^* = 250. These
results show that the training time of a DeepONet is manageable and comparable to the
data generation in the present case and, as expected, that evaluating a trained DeepONet
is a very fast operation. It is also important to state that these values may vary strongly
depending on the particular application.

C. Impact of loss function weighting, feature expansion and network depth

Some algorithmic considerations that arise when training DeepONets are important to note.
The first one is the use of weights in the loss function. In figure 10(a) we show the evolution
of the loss function evaluated on the validation dataset for cases FA0 , F and FA2 , which used
w_i equal to 1, A_i^{-1} and A_i^{-2}, respectively. All three cases go through an initial plateau where

networks output a constant value equal to the mean of the solutions over the given domain
(i.e., ⟨G(f)(ζ)⟩_ζ), which the overcompensated case FA2 is not able to exit during the
number of epochs shown. The non-compensated case FA0 and the compensated case F are able
to learn the solutions of the PSE. While the former reaches a lower loss, its prediction errors
are not necessarily smaller. The values of ε grouped by frequency from both cases FA0 and
F are shown in figure 8, indicating that not compensating for the differences in amplitudes
limits the ability of the former network to correctly learn the low-amplitude modes.
The existence of a plateau during training represents an extra cost for DeepONet. Ex-
panding input features using equation (5) proved to be a key element in reducing the time
that the networks spend in the plateau. The values of the loss function evaluated over the
validation dataset for cases Fnf , Fn0 , F, Fn2 , Fn3 and Fn4 are shown in figure 10(b). The
feature expansion reduces the duration of the initial plateau during training, to the point
of eliminating it completely when using a large number of modes (n = 3 and 4). All cases
converge to the same values, except for case Fn2, which converges to a slightly lower value. To
the best of our knowledge, there is nothing special about n = 2 in relation to our data.
Finally, we analyze the impact of network depth. In figure 11 we show the relative root

FIG. 10: (a): Evolution of the loss function evaluated on the validation datasets for cases F, FA0
and FA2 . (b): Evolution of the loss function evaluated on the validation datasets for cases Fnf ,
Fn0 , F, Fn2 , Fn3 and Fn4 .

FIG. 11: Relative root mean square errors ε evaluated over the whole output domain for different
solutions using noisy data as a function of the noise amplitude A for cases Fd−2 , Fd−1 , F, Fd+1 ,
and Fd+2 .

mean square errors ε and the robustness with respect to input corruption for cases Fd−2 ,
Fd−1 , F, Fd+1 , and Fd+2 . Making the models deeper decreases the errors in the prediction
up to a saturation point where extra layers do not improve the model's performance.

D. Inverse problem

The evolution of the loss functions for both the training and validation datasets for
case I is shown in figure 12. The network is able to learn how to map from downstream
measurements of the wall pressure to inflow perturbation modes. Contrary to the forward

cases, no initial plateau arises during training nor did the loss function require weighting,
as the outputs of the network are all of the same order of magnitude. An example of a
reconstructed inflow mode evaluated over the whole output domain is shown in figure 13.
The overall accuracy and robustness with respect to noisy inputs are analyzed in figure 14,
where the relative error ε is plotted as a function of the input noise amplitude A. The inverse
case yields results similar or better than the forward cases.

FIG. 12: Evolution of the loss function evaluated on the training and validation datasets for case I.

E. Two-mode cases

The next cases we analyze are F2 and I2 , where the DeepONets were trained to learn
linear combinations of solutions. The loss functions are shown in figure 15 and examples of
prediction and reconstruction are shown in figures 16 and 17, for case F2 and I2 , respectively.
Once again, DeepONet is able to learn the target solutions. However, training is more
challenging because the effective solution space is much larger than for a single instability
mode; here the perturbation is comprised of two instability waves with different frequencies,
phases and amplitudes. This is evidenced in figure 16(b), which shows very good qualitative
agreement between data and prediction, but has a relative error ε = 0.185 that is appreciably
higher than the values encountered in the earlier cases.
We analyze the robustness to input uncertainties in figure 18. As expected from the
comment above, the average ε when A = 0 is higher than in the respective cases where
only one mode was used to generate the solutions, and the increased difficulty of training
also leads to the networks being more sensitive to noise. Nonetheless, the predictions and
reconstructions generated by the DeepONet remain satisfactory.

FIG. 13: Example from case I. For one particular downstream pressure measurement, belonging
to the testing dataset, we show the true upstream perturbation on the top row, the reconstruction
obtained from the DeepONet on the middle row, and a comparison of the profiles for a fixed
y/L0 = 5 on the bottom row.

Due to its complexity, case F2 was chosen for comparison of DeepONet against CNN
and FNO. The results are shown in Sec. V F. The errors achieved by DeepONet are roughly
half those of the CNN or the FNO. DeepONets can also make predictions at arbitrary
locations, contrary to the fixed output of a CNN or an FNO, and are thus capable of
utilizing physics-informed constraints [44], as it is possible to apply automatic differentiation
to the trunk input variables, or use Fourier feature expansion, as shown above. The extra
flexibility of DeepONets comes at the cost of increased training times. For these examples
the CNN trained seven times faster.

FIG. 14: Relative root mean square errors ε evaluated over the whole output domain for different
solutions using noisy data as a function of the noise amplitude A for case I.

FIG. 15: Evolution of the loss function evaluated on the training and validation datasets for cases
(a) F2 and (b) I2 .

F. Comparison of DeepONet to alternative architectures

Mapping inflow signals to downstream perturbations can be performed with other deep-
learning architectures besides DeepONets. In this section we compare DeepONet to a con-
volutional neural network (CNN), which is a popular choice for image analysis, and Fourier
Neural Operators (FNO, [37]). This comparison, which was quoted in a recent survey paper
[45], establishes the accuracy and robustness of the present DeepONet results in the context
of other existing methods.
The CNN used here is an encoder-decoder architecture. Specifically, we first utilize a CNN
as an encoder to extract the features of the input image, which reduces the high-dimensional
input space (R^{20×47}) to a low-dimensional latent space (R^{64}), and then we employ another
CNN as a decoder to map the vector in the latent space to the output space (R^{111×47}). The

FIG. 16: Examples from case F2 . For two particular input mode combinations, one in panel (a)
and another in panel (b) with both belonging to the testing dataset, we show the true solutions
as generated by the PSE on the top row, the prediction obtained from the DeepONet on the
middle row, and a comparison of the profiles for a fixed y/L0 = 5 on the bottom row.

encoder has two convolution layers followed by a dense layer. In the decoder, we first have a
dense layer mapping from the latent space to a small image of size 14 × 6 × 64, and then we
use three transposed convolution layers to gradually increase the image size. The last layer
is a convolution layer to reduce the channel size to 1. Each convolution or transposed
convolution layer has 64 channels, and the size of the convolution kernel is chosen to be 3 by
3 with stride 2 and “same” padding. We used tanh as the activation function, which leads
to a smoother output and a better accuracy than ReLU in this problem. The network is
optimized by the Adam optimizer with the learning rate 0.0003 and batch size 500 for 20000
epochs.
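A sketch of this encoder-decoder in PyTorch. Kernel sizes, strides, channel counts, the 64-dimensional latent space, the 14 × 6 × 64 intermediate image and the tanh activations follow the description above; the padding choices and the final crop to the 111 × 47 grid are assumptions, since the text does not specify how the output size is matched exactly.

```python
import torch
import torch.nn as nn

class EncoderDecoderCNN(nn.Module):
    def __init__(self, ch=64, latent=64):
        super().__init__()
        self.ch = ch
        self.encoder = nn.Sequential(
            nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.Tanh(),   # (20, 47) -> (10, 24)
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.Tanh(),  # (10, 24) -> (5, 12)
            nn.Flatten(),
            nn.Linear(ch * 5 * 12, latent), nn.Tanh(),             # latent vector in R^64
        )
        self.to_image = nn.Linear(latent, ch * 14 * 6)             # dense layer -> 14 x 6 x 64 image
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 3, stride=2, padding=1, output_padding=1), nn.Tanh(),  # -> (28, 12)
            nn.ConvTranspose2d(ch, ch, 3, stride=2, padding=1, output_padding=1), nn.Tanh(),  # -> (56, 24)
            nn.ConvTranspose2d(ch, ch, 3, stride=2, padding=1, output_padding=1), nn.Tanh(),  # -> (112, 48)
            nn.Conv2d(ch, 1, 3, padding=1),                        # reduce the channel size to 1
        )

    def forward(self, x):                        # x: (batch, 1, 20, 47) branch input on Yu
        z = self.encoder(x)
        img = self.to_image(z).view(-1, self.ch, 14, 6)
        out = self.decoder(img)
        return out[:, :, :111, :47]              # crop to the (111, 47) grid of Yd
```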
For FNO, because the input and outputs functions have different mesh resolutions, we
(bi-linearly) interpolate the input function from (20 × 47) points to (111 × 47) points to
match the output. We use a similar FNO as the one used for the Navier-Stokes equation

FIG. 17: Example from case I2 . For one particular downstream pressure measurement, belonging
to the testing dataset, we show the true upstream perturbation on the top row, the reconstruction
obtained from the DeepONet on the middle row, and a comparison of the profiles for a fixed
y/L0 = 5 on the bottom row.

in [37], but we increase the number of channels from 32 to 64 and keep the first 16 Fourier
modes in each channel to achieve better accuracy.
In figure 19 we show the evolution of the loss function for both the CNN and the FNO.
The CNN achieves an error of ε = 0.176, and the FNO performs similarly with ε = 0.178. Both
of these values are higher than the respective value of ε = 0.09 achieved by DeepONet in case
F2 . In [45], an additional comparison to an alternative formulation of DeepONet is presented,
termed POD-DeepONet. This version replaces the trunk network with p precomputed POD
modes, while the branch network and the final dot-product layer are kept in place. For this
problem, the POD-DeepONet achieved an error of ε = 0.21 with A = 0, more than twice
that of the regular DeepONet.
We expand on the differences between the three approaches and compare the results when
applied to the same problem. Both CNN and FNO generally adopt gridded rectangular data

FIG. 18: Relative root mean square errors ε evaluated over the whole output domain for different
solutions using noisy data as a function of the noise amplitude A for cases F2 and I2 .

FIG. 19: Evolution of the loss function evaluated on the training and validation datasets for case
F2 using (a) a CNN-based encoder-decoder network and (b) a FNO (adapted from [45]).

                          DeepONet                    CNN & FNO

Input/Output domain       Arbitrary                   Rectangle
Mesh                      Arbitrary                   Grid
Training data             Partial observation         Complete observation
Prediction location       Arbitrary                   Grid points
Architecture flexibility  DeepONet is more flexible, e.g., adding features.
Accuracy                  Comparable
Training cost             CNN and FNO are faster (up to 7X in our test).

TABLE III: Comparison between DeepONet, CNN and FNO.

both at the input and output, and require all the grid values during training. DeepONet can
use arbitrary domains for its input, and we can train DeepONet using partial observations.
During the inference stage, DeepONet can evaluate the output at any location inside the
domain, while CNN and FNOs can only predict the output on the grid. The flexibility of
the DeepONet architecture also allows for the easy implementation of extra components,
such as the feature expansion discussed in Sec. IV A or even the adoption of convolutional
layers in the branch network. Flexibility comes at a cost, however, as DeepONet training is
less efficient than CNN or FNO. In the example shown above, performed using the same
GPU in both cases, DeepONet required 7 times as long as CNN or FNOs to train. The
differences are listed in Table III. Therefore, the only drawback of DeepONet is the high
training cost, but we note the training is offline and improvements such as data-parallel
training should help to speed up the process. For a more detailed comparison between
DeepONet and FNOs, and for comments on how to speed up the training of DeepONet, see
[45].

G. Data assimilation using trained DeepONets

Finally, we analyze a prototype problem for data assimilation (DA) using DeepONet.
While DeepONet could be trained to map measurements directly to a flow-field prediction, we
consider instead the case where two trained networks are concatenated. In particular, the output of
case I is fed into case F, and we term this configuration case A. Figure 20 shows the value of
ε at different noise levels for this case. The proposed DA protocol is able to reconstruct the
inflow condition and predict the corresponding downstream field. The higher errors obtained
even when A = 0 are due to the errors present in the output of case I.
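A schematic of this concatenation could read as follows; the two callables are placeholders standing in for the trained case I and case F DeepONets, and all names, shapes, and signatures are assumptions rather than the actual interfaces.

```python
# Sketch of case A: chain the inverse (case I) and forward (case F) operators.
import torch

n_inflow, n_field = 100, 5000
deeponet_inverse = lambda meas, xy: torch.randn(meas.shape[0], xy.shape[0])   # placeholder case I
deeponet_forward = lambda inflow, xy: torch.randn(inflow.shape[0], xy.shape[0])  # placeholder case F

def assimilate(wall_measurements, xy_inflow, xy_field):
    """Measurements -> upstream disturbance -> downstream field."""
    inflow = deeponet_inverse(wall_measurements, xy_inflow)   # case I reconstruction
    return deeponet_forward(inflow, xy_field)                 # case F prediction

field = assimilate(torch.randn(1, 200), torch.rand(n_inflow, 2), torch.rand(n_field, 2))
print(field.shape)                                            # torch.Size([1, 5000])
```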
In figure 21 we show two examples of assimilated fields, using the same solutions as
in figure 7. Qualitatively, cases F and A produce very similar results, but the errors are
slightly higher for case A. The error ε for the example shown in figure 21(a) is equal to 0.112,
compared to the previously reported ε = 0.030 in figure 7(a). For the example shown in
panel (b) of both figures, ε only increases slightly, from 0.047 to 0.054, when performing the
assimilation. Similar to what was shown in figure 16(b), the main sources of error are slight
shifts in phase and amplitude. Thanks to the robustness of case I to input uncertainty, case
A scales slightly better with noise than case F. Overall, DeepONet can act as an efficient and flexible DA

FIG. 20: Relative root mean square errors ε evaluated over the whole output domain for different
solutions using noisy data, as a function of the noise amplitude A, for case A.

framework.

VI. SUMMARY

Deep learning techniques have made great strides in numerous problems in computer
science, and their capabilities for important applications in physics and engineering are
expanding. Generating fast and accurate solutions of systems of equations is one such
major problem where machine learning is poised to accelerate scientific discovery. We showed
that DeepONet, which is a neural operator architecture, can map an instability wave in a
high-Mach-number boundary layer from an upstream to a downstream field. DeepONet can
predict different components of the state vector, and can also map downstream measurements to
upstream disturbances, which is beneficial for inverse modeling and data assimilation. The
introduction of Fourier harmonic input feature expansion and loss function weighting were
key elements to speed up training and achieve accurate predictions. We introduced three
different cost metrics as a way to assess the feasibility of neural operators for the application
at hand. DeepONet was also shown to be robust against noisy inputs and can be combined
to perform data assimilation, even though the networks were not explicitly trained for either
task. Improvements to the training procedure that take into account noisy data can further
improve the performance of DeepONets, as can the incorporation of some of the governing
equations of the physical models in the loss function during the training stage.

FIG. 21: Examples from case A. For two particular downstream pressure measurements, (a) F =
102.75 and (b) F = 122.75, we show the true solutions as generated from the PSE on the top row,
the assimilated fields obtained from the DeepONet on the middle row, and a comparison of the
profiles for a fixed y/L0 = 5 on the bottom row.

Acknowledgments

This work was supported by DARPA/CompMods (Grant HR00112090062) and the Air
Force Office of Scientific Research (Grants FA9550-19-S-0003, FA9550-21-1-0345).

Appendix A: Operator matrices of the Parabolized Stability Equations

We present the different non-zero elements of the operator matrices V, featured in equation (11),
and of the operators Ǎ and L, used in Eq. (14). Note that terms of order O(1/Re_0^2) are
neglected [46]. The indices i, j represent the row and the column entries within the matrix
operator:

V_t(1,1) = 1, \quad V_t(2,2) = V_t(3,3) = V_t(4,4) = \rho_B,   (A1)

V_0(1,1) = \frac{\partial U_B}{\partial x} + \frac{\partial V_B}{\partial y}, \quad
V_0(1,2) = \frac{\partial \rho_B}{\partial x}, \quad
V_0(1,3) = \frac{\partial \rho_B}{\partial y},   (A2)

V_0(2,1) = U_B \frac{\partial U_B}{\partial x} + V_B \frac{\partial U_B}{\partial y}
 + \frac{1}{\gamma M_0^2} \frac{\partial T_B}{\partial x}, \quad
V_0(2,2) = \rho_B \frac{\partial U_B}{\partial x}, \quad
V_0(2,3) = \rho_B \frac{\partial U_B}{\partial y},   (A3)

V_0(2,4) = \frac{1}{\gamma M_0^2} \frac{\partial \rho_B}{\partial x}
 - \frac{1}{Re_0} \Bigg\{
   l \left[ \left( \frac{\partial^2 U_B}{\partial x^2} + \frac{\partial^2 V_B}{\partial x \partial y} \right) \frac{\partial \mu_B}{\partial T_B}
          + \left( \frac{\partial U_B}{\partial x} + \frac{\partial V_B}{\partial y} \right) \frac{\partial^2 \mu_B}{\partial T_B^2} \frac{\partial T_B}{\partial x} \right]
 + 2 \left[ \frac{\partial^2 U_B}{\partial x^2} \frac{\partial \mu_B}{\partial T_B}
          + \frac{\partial U_B}{\partial x} \frac{\partial^2 \mu_B}{\partial T_B^2} \frac{\partial T_B}{\partial x} \right]
 + \left( \frac{\partial^2 U_B}{\partial y^2} + \frac{\partial^2 V_B}{\partial x \partial y} \right) \frac{\partial \mu_B}{\partial T_B}
 + \left( \frac{\partial U_B}{\partial y} + \frac{\partial V_B}{\partial x} \right) \frac{\partial^2 \mu_B}{\partial T_B^2} \frac{\partial T_B}{\partial y}
   \Bigg\},   (A4)

V_0(3,1) = U_B \frac{\partial V_B}{\partial x} + V_B \frac{\partial V_B}{\partial y}
 + \frac{1}{\gamma M_0^2} \frac{\partial T_B}{\partial y}, \quad
V_0(3,2) = \rho_B \frac{\partial V_B}{\partial x}, \quad
V_0(3,3) = \rho_B \frac{\partial V_B}{\partial y},   (A5)

V_0(3,4) = \frac{1}{\gamma M_0^2} \frac{\partial \rho_B}{\partial y}
 - \frac{1}{Re_0} \Bigg\{
   l \left[ \left( \frac{\partial^2 U_B}{\partial x \partial y} + \frac{\partial^2 V_B}{\partial y^2} \right) \frac{\partial \mu_B}{\partial T_B}
          + \left( \frac{\partial U_B}{\partial x} + \frac{\partial V_B}{\partial y} \right) \frac{\partial^2 \mu_B}{\partial T_B^2} \frac{\partial T_B}{\partial y} \right]
 + 2 \left[ \frac{\partial^2 V_B}{\partial y^2} \frac{\partial \mu_B}{\partial T_B}
          + \frac{\partial V_B}{\partial y} \frac{\partial^2 \mu_B}{\partial T_B^2} \frac{\partial T_B}{\partial y} \right]
 + \left( \frac{\partial^2 U_B}{\partial x \partial y} + \frac{\partial^2 V_B}{\partial x^2} \right) \frac{\partial \mu_B}{\partial T_B}
 + \left( \frac{\partial U_B}{\partial y} + \frac{\partial V_B}{\partial x} \right) \frac{\partial^2 \mu_B}{\partial T_B^2} \frac{\partial T_B}{\partial x}
   \Bigg\},   (A6)

V_0(4,1) = U_B \frac{\partial T_B}{\partial x} + V_B \frac{\partial T_B}{\partial y}
 + (\gamma - 1) T_B \left( \frac{\partial U_B}{\partial x} + \frac{\partial V_B}{\partial y} \right),   (A7)

V_0(4,2) = \rho_B \frac{\partial T_B}{\partial x}, \quad
V_0(4,3) = \rho_B \frac{\partial T_B}{\partial y},   (A8)

V_0(4,4) = (\gamma - 1) \rho_B \left( \frac{\partial U_B}{\partial x} + \frac{\partial V_B}{\partial y} \right)
 - \frac{\gamma}{Re_0 Pr_0} \left[ \left( \frac{\partial^2 T_B}{\partial x^2} + \frac{\partial^2 T_B}{\partial y^2} \right) \frac{\partial k_B}{\partial T_B}
 + \left\{ \left( \frac{\partial T_B}{\partial x} \right)^2 + \left( \frac{\partial T_B}{\partial y} \right)^2 \right\} \frac{\partial^2 k_B}{\partial T_B^2} \right]
 - \frac{\gamma (\gamma - 1) M_0^2}{Re_0} \frac{\partial \mu_B}{\partial T_B}
   \left[ 2 \left\{ \left( \frac{\partial U_B}{\partial x} \right)^2 + \left( \frac{\partial V_B}{\partial y} \right)^2 \right\}
 + \left( \frac{\partial V_B}{\partial x} + \frac{\partial U_B}{\partial y} \right)^2
 + l \left( \frac{\partial U_B}{\partial x} + \frac{\partial V_B}{\partial y} \right)^2 \right],   (A9)

V_x(1,1) = U_B, \quad V_x(1,2) = \rho_B,   (A10)

V_x(2,1) = \frac{T_B}{\gamma M_0^2}, \quad
V_x(2,2) = \rho_B U_B - \frac{l + 2}{Re_0} \frac{\partial \mu_B}{\partial T_B} \frac{\partial T_B}{\partial x},   (A11)

V_x(2,3) = - \frac{1}{Re_0} \frac{\partial \mu_B}{\partial T_B} \frac{\partial T_B}{\partial y},   (A12)

V_x(2,4) = \frac{\rho_B}{\gamma M_0^2} - \frac{1}{Re_0} \frac{\partial \mu_B}{\partial T_B}
 \left[ l \left( \frac{\partial U_B}{\partial x} + \frac{\partial V_B}{\partial y} \right) + 2 \frac{\partial U_B}{\partial x} \right],   (A13)

V_x(3,2) = - \frac{l}{Re_0} \frac{\partial \mu_B}{\partial T_B} \frac{\partial T_B}{\partial y},   (A14)

V_x(3,3) = \rho_B U_B - \frac{1}{Re_0} \frac{\partial \mu_B}{\partial T_B} \frac{\partial T_B}{\partial x},   (A15)

V_x(3,4) = - \frac{1}{Re_0} \frac{\partial \mu_B}{\partial T_B} \left( \frac{\partial U_B}{\partial y} + \frac{\partial V_B}{\partial x} \right),   (A16)

V_x(4,2) = (\gamma - 1) - \frac{2 \gamma (\gamma - 1) M_0^2 \mu_B}{Re_0}
 \left[ (l + 2) \frac{\partial U_B}{\partial x} + l \frac{\partial V_B}{\partial y} \right],   (A17)

V_x(4,3) = - \frac{2 \gamma (\gamma - 1) M_0^2 \mu_B}{Re_0} \left( \frac{\partial V_B}{\partial x} + \frac{\partial U_B}{\partial y} \right),   (A18)

V_x(4,4) = \rho_B U_B - \frac{2 \gamma}{Re_0 Pr_0} \frac{\partial k_B}{\partial T_B} \frac{\partial T_B}{\partial x},   (A19)

V_y(1,1) = V_B, \quad V_y(1,3) = \rho_B,   (A20)

V_y(2,2) = \rho_B V_B - \frac{1}{Re_0} \frac{\partial \mu_B}{\partial T_B} \frac{\partial T_B}{\partial y}, \quad
V_y(2,3) = - \frac{l}{Re_0} \frac{\partial \mu_B}{\partial T_B} \frac{\partial T_B}{\partial x},   (A21)

V_y(2,4) = - \frac{1}{Re_0} \frac{\partial \mu_B}{\partial T_B} \left( \frac{\partial U_B}{\partial y} + \frac{\partial V_B}{\partial x} \right),   (A22)

V_y(3,1) = \frac{T_B}{\gamma M_0^2}, \quad
V_y(3,2) = - \frac{1}{Re_0} \frac{\partial \mu_B}{\partial T_B} \frac{\partial T_B}{\partial x},   (A23)

V_y(3,3) = \rho_B V_B - \frac{l + 2}{Re_0} \frac{\partial \mu_B}{\partial T_B} \frac{\partial T_B}{\partial y},   (A24)

V_y(3,4) = \frac{\rho_B}{\gamma M_0^2} - \frac{1}{Re_0} \frac{\partial \mu_B}{\partial T_B}
 \left[ l \left( \frac{\partial U_B}{\partial x} + \frac{\partial V_B}{\partial y} \right) + 2 \frac{\partial V_B}{\partial y} \right],   (A25)

V_y(4,2) = - \frac{2 \gamma (\gamma - 1) M_0^2 \mu_B}{Re_0} \left( \frac{\partial V_B}{\partial x} + \frac{\partial U_B}{\partial y} \right),   (A26)

V_y(4,3) = (\gamma - 1) - \frac{2 \gamma (\gamma - 1) M_0^2 \mu_B}{Re_0}
 \left[ (l + 2) \frac{\partial V_B}{\partial y} + l \frac{\partial U_B}{\partial x} \right],   (A27)

V_y(4,4) = \rho_B V_B - \frac{2 \gamma}{Re_0 Pr_0} \frac{\partial k_B}{\partial T_B} \frac{\partial T_B}{\partial y},   (A28)

V_{xx}(2,2) = - (l + 2) \frac{\mu_B}{Re_0},   (A29)

V_{xx}(3,3) = - \frac{\mu_B}{Re_0}, \quad V_{xx}(4,4) = - \frac{\gamma k_B}{Re_0 Pr_0},   (A30)

V_{xy}(2,3) = - (l + 1) \frac{\mu_B}{Re_0}, \quad V_{xy}(3,2) = - (l + 1) \frac{\mu_B}{Re_0},   (A31)

V_{yy}(2,2) = - \frac{\mu_B}{Re_0},   (A32)

V_{yy}(3,3) = - (l + 2) \frac{\mu_B}{Re_0},   (A33)

V_{yy}(4,4) = - \frac{\gamma k_B}{Re_0 Pr_0},   (A34)

\check{A}(1,1) = U_B,   (A35)

\check{A}(1,2) = \rho_B,   (A36)

\check{A}(2,2) = \check{A}(3,3) = \check{A}(4,4) = \rho_B U_B,   (A37)

\check{A}(2,1) = \frac{T_B}{\gamma M_0^2},   (A38)

\check{A}(2,4) = \frac{\rho_B}{\gamma M_0^2},   (A39)

\check{A}(4,2) = (\gamma - 1).   (A40)

L = \alpha \left( V_x + \alpha V_{xx} + V_{xy} \frac{\partial}{\partial y} \right) - i \omega V_t + V_0
 + \left( V_y + V_{yy} \frac{\partial}{\partial y} \right) \frac{\partial}{\partial y}   (A41)
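For readers who wish to form the discrete operator, the sketch below assembles a matrix version of L from the V coefficient matrices and a wall-normal differentiation matrix Dy, following the structure of (A41). The discretization, array shapes, function names, and the treatment of any factors of i multiplying α are assumptions for illustration, not the authors' solver.

```python
# Hypothetical assembly of the discrete operator in (A41); Ny wall-normal points,
# 4 state variables per point, so L is a (4*Ny) x (4*Ny) matrix.
import numpy as np

def pointwise(V):
    """Block-diagonal operator for a coefficient array V of shape (Ny, 4, 4)."""
    Ny = V.shape[0]
    M = np.zeros((4 * Ny, 4 * Ny), dtype=complex)
    for i in range(Ny):
        M[4 * i:4 * i + 4, 4 * i:4 * i + 4] = V[i]
    return M

def times_derivative(V, D):
    """Operator for V(y) applied to the derivative defined by the (Ny, Ny) matrix D."""
    Ny = V.shape[0]
    M = np.zeros((4 * Ny, 4 * Ny), dtype=complex)
    for i in range(Ny):
        for j in range(Ny):
            M[4 * i:4 * i + 4, 4 * j:4 * j + 4] = D[i, j] * V[i]
    return M

def assemble_L(alpha, omega, Vt, V0, Vx, Vy, Vxx, Vxy, Vyy, Dy):
    """Discrete analogue of (A41); every V* has shape (Ny, 4, 4)."""
    return (alpha * (pointwise(Vx) + alpha * pointwise(Vxx) + times_derivative(Vxy, Dy))
            - 1j * omega * pointwise(Vt)
            + pointwise(V0)
            + times_derivative(Vy, Dy)
            + times_derivative(Vyy, Dy @ Dy))

# Tiny smoke test with random coefficient matrices.
Ny = 5
rng = np.random.default_rng(0)
V = [rng.standard_normal((Ny, 4, 4)) for _ in range(7)]
Dy = rng.standard_normal((Ny, Ny))
print(assemble_L(1.0 + 0.1j, 0.5, *V, Dy).shape)   # (20, 20)
```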

[1] L. M. Mack, Tech. Rep. (1984), URL http://adsabs.harvard.edu/abs/1984scst.agar.....M.
[2] A. Fedorov, Annual Review of Fluid Mechanics 43, 79 (2011), URL https://doi.org/10.1146/annurev-fluid-122109-160750.
[3] X. Zhong and X. Wang, Annual Review of Fluid Mechanics 44, 527 (2012), URL https://doi.org/10.1146/annurev-fluid-120710-101208.
[4] S. P. Schneider, Progress in Aerospace Sciences 72, 17 (2015), ISSN 0376-0421, URL https://www.sciencedirect.com/science/article/pii/S0376042114000876.
[5] J. Park and T. A. Zaki, Journal of Fluid Mechanics 859, 476 (2019), ISSN 0022-1120, 1469-7645.
[6] S. P. Schneider, Journal of Spacecraft and Rockets 38, 323 (2001), ISSN 0022-4650, URL https://arc.aiaa.org/doi/10.2514/2.3705.
[7] J. Joo and P. A. Durbin, Flow, Turbulence and Combustion 88, 407 (2012), ISSN 1573-1987, URL https://doi.org/10.1007/s10494-011-9372-x.
[8] A. Frendi, L. Maestrello, and A. Bayliss, AIAA Journal 31, 708 (1993), ISSN 0001-1452, URL https://arc.aiaa.org/doi/10.2514/3.49017.
[9] R. Jahanbakhshi and T. A. Zaki, Journal of Fluid Mechanics 876, 87 (2019), ISSN 0022-1120, 1469-7645.
[10] R. Jahanbakhshi and T. A. Zaki, Journal of Fluid Mechanics 916, A46 (2021).
[11] I. A. Leyva, Physics Today 70, 30 (2017), ISSN 0031-9228, URL https://physicstoday.scitation.org/doi/10.1063/PT.3.3762.
[12] J. P. Slotnick, A. Khodadoust, J. J. Alonso, D. L. Darmofal, W. D. Gropp, E. A. Lurie, and D. J. Mavriplis, CFD Vision 2030 study: A path to revolutionary computational aerosciences (2014).
[13] M. Drela and M. B. Giles, AIAA Journal 25, 1347 (1987), ISSN 0001-1452, URL https://doi.org/10.2514/3.9789.
[14] J. Perraud and A. Durant, Journal of Spacecraft and Rockets 53, 730 (2016), ISSN 0022-4650, URL https://doi.org/10.2514/1.A33475.
[15] A. Krumbein, Aerospace Science and Technology 12, 592 (2008), ISSN 1270-9638.
[16] F. Pinna, L. Zanus, S. Demange, and M. Olazabal-Loume, in 2018 Fluid Dynamics Conference (American Institute of Aeronautics and Astronautics, 2018), URL https://arc.aiaa.org/doi/abs/10.2514/6.2018-3697.
[17] J. Saint-James, H. Deniau, O. Vermeersch, and E. Piot, in AIAA Scitech 2020 Forum (American Institute of Aeronautics and Astronautics, 2020), AIAA SciTech Forum.
[18] J. D. Crouch, I. W. M. Crouch, and L. L. Ng, AIAA Journal 40, 1536 (2002), ISSN 0001-1452, 1533-385X, URL https://arc.aiaa.org/doi/10.2514/2.1850.
[19] R. Fuller, W. Saunders, and U. Vandsburger (American Institute of Aeronautics and Astronautics, 2012), URL https://arc.aiaa.org/doi/abs/10.2514/6.1997-559.
[20] F. Danvin, M. Olazabal-Loume, and F. Pinna, in 2018 Fluid Dynamics Conference (American Institute of Aeronautics and Astronautics, 2018), URL https://arc.aiaa.org/doi/abs/10.2514/6.2018-3701.
[21] M. I. Zafar, H. Xiao, M. M. Choudhari, F. Li, C.-L. Chang, P. Paredes, and B. Venkatachari, Physical Review Fluids 5, 113903 (2020), URL https://link.aps.org/doi/10.1103/PhysRevFluids.5.113903.
[22] M. Wang, Q. Wang, and T. A. Zaki, Journal of Computational Physics 396, 427 (2019), ISSN 0021-9991, URL https://www.sciencedirect.com/science/article/pii/S0021999119304735.
[23] Q. Wang, Y. Hasegawa, and T. A. Zaki, Journal of Fluid Mechanics 870, 316 (2019), ISSN 0022-1120, 1469-7645.
[24] M. Wang and T. A. Zaki, Journal of Fluid Mechanics 917, A9 (2021).
[25] V. Mons, Q. Wang, and T. A. Zaki, Journal of Computational Physics 398, 108856 (2019), ISSN 0021-9991, URL http://www.sciencedirect.com/science/article/pii/S0021999119305406.
[26] D. A. Buchta and T. A. Zaki, Journal of Fluid Mechanics 916, A44 (2021).
[27] C. Foias, C. Mondaini, and E. Titi, SIAM Journal on Applied Dynamical Systems 15, 2109 (2016), URL https://epubs.siam.org/doi/abs/10.1137/16M1076526.
[28] P. Clark Di Leoni, A. Mazzino, and L. Biferale, Physical Review X 10, 011023 (2020), URL https://link.aps.org/doi/10.1103/PhysRevX.10.011023.
[29] M. Raissi, A. Yazdani, and G. E. Karniadakis, arXiv:1808.04327 [physics, stat] (2018), URL http://arxiv.org/abs/1808.04327.
[30] X. Jin, S. Cai, H. Li, and G. E. Karniadakis, arXiv:2003.06496 [physics] (2020), URL http://arxiv.org/abs/2003.06496.
[31] M. Buzzicotti, F. Bonaccorso, P. Clark Di Leoni, and L. Biferale, arXiv:2006.09179 [cond-mat, physics:nlin, physics:physics] (2020), URL http://arxiv.org/abs/2006.09179.
[32] D. A. Buchta, S. J. Laurence, and T. A. Zaki, Journal of Fluid Mechanics 947, R2 (2022).
[33] G. Cybenko, Mathematics of Control, Signals and Systems 2, 303 (1989), ISSN 0932-4194, 1435-568X, URL https://link.springer.com/article/10.1007/BF02551274.
[34] Y. Du and T. A. Zaki, Physical Review E 104, 045303 (2021).
[35] T. Chen and H. Chen, IEEE Transactions on Neural Networks 6, 911 (1995).
[36] J. d. A. Ferrandis, M. Triantafyllou, C. Chryssostomidis, and G. Karniadakis, arXiv:1912.13382 [cs] (2019), URL http://arxiv.org/abs/1912.13382.
[37] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, arXiv:2010.08895 [cs, math] (2020), URL http://arxiv.org/abs/2010.08895.
[38] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, Nature Machine Intelligence 3, 218 (2021), ISSN 2522-5839, URL https://www.nature.com/articles/s42256-021-00302-5.
[39] S. Cai, Z. Wang, L. Lu, T. A. Zaki, and G. E. Karniadakis, Journal of Computational Physics 436, 110296 (2021), ISSN 0021-9991, URL https://www.sciencedirect.com/science/article/pii/S0021999121001911.
[40] Z. Mao, L. Lu, O. Marxen, T. A. Zaki, and G. E. Karniadakis, Journal of Computational Physics 447, 110698 (2021).
[41] C.-L. Chang, M. R. Malik, G. Erlebacher, and M. Y. Hussaini, Final Report, Institute for Computer Applications in Science and Engineering, Hampton, VA (1993).
[42] W. D. Harvey, NASA Tech. Memorandum 78635 (1978).
[43] S. P. Schneider, Journal of Spacecraft and Rockets 36, 8 (1999), ISSN 0022-4650, URL https://arc.aiaa.org/doi/10.2514/2.3428.
[44] S. Wang, H. Wang, and P. Perdikaris, arXiv:2103.10974 [cs, math, stat] (2021), URL http://arxiv.org/abs/2103.10974.
[45] L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, and G. E. Karniadakis, Computer Methods in Applied Mechanics and Engineering 393, 114778 (2022).
[46] F. P. Bertolotti, Ph.D. thesis, The Ohio State University (1991).

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests:
Patricio Clark Di Leoni: Methodology, Software, Investigation, Writing - Original draft.
Lu Lu: Methodology, Software.
Charles Meneveau: Conceptualization, Supervision, Writing - Reviewing and Editing.
George Em Karniadakis: Conceptualization, Supervision, Writing - Reviewing and Editing.
Tamer A. Zaki: Conceptualization, Supervision, Writing - Reviewing and Editing.
