
arXiv:2012.08023v3 [math.NA] 31 Dec 2022

FRIEDRICHS LEARNING: WEAK SOLUTIONS OF PARTIAL DIFFERENTIAL EQUATIONS VIA DEEP LEARNING

FAN CHEN
School of Mathematical Sciences, and MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China ([email protected])

JIANGUO HUANG
School of Mathematical Sciences, and MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China ([email protected])

CHUNMEI WANG
Department of Mathematics, University of Florida, Gainesville, FL 32611, USA ([email protected])

HAIZHAO YANG
Department of Mathematics, University of Maryland, College Park, MD 20742, USA ([email protected])

Abstract. This paper proposes Friedrichs learning, a novel deep learning methodology that learns the weak
solutions of PDEs via a minimax formulation, which transforms the PDE problem into a minimax optimization
problem whose solution identifies the weak solution. The name “Friedrichs learning” highlights the close relation between our
learning strategy and Friedrichs' theory on symmetric systems of PDEs. The weak solution and the test function in the
weak formulation are parameterized as deep neural networks in a mesh-free manner, and they are alternately updated so that
they approach the weak solution and the optimal test function, respectively.
Extensive numerical results indicate that our mesh-free Friedrichs learning method can provide reasonably good
solutions for a wide range of PDEs defined on regular and irregular domains, where conventional numerical methods
such as finite difference methods and finite element methods may be tedious or difficult to apply, especially for
problems with discontinuous solutions in high dimensions.

Key words. Partial Differential Equation; Friedrichs’ System; Minimax Optimization; Weak Solution; Deep
Neural Network; High Dimensional Complex Domain.

AMS subject classifications. 65M75; 65N75; 62M45.

1. Introduction. High-dimensional PDEs and PDEs defined on complex domains are
important tools in physical, financial, and biological models, etc. [52, 19, 68, 25, 67]. Generally
speaking, they do not have closed-form solutions, making numerical solutions of such equations
indispensable in real applications. First, developing numerical methods for high-dimensional PDEs
has been a challenging task due to the curse of dimensionality in conventional discretization. Second,
conventional numerical methods rely on mesh generation that requires profound expertise and
programming skills in the absence of commercial software. In particular, for problems defined on
complicated domains, it is challenging and time-consuming to implement conventional methods.
As an efficient parametrization tool for high-dimensional functions [8, 17, 58, 57, 64, 41, 43, 62, 63]
with user-friendly software (e.g., TensorFlow and PyTorch), neural networks have recently been applied to
solve PDEs via various approaches. The idea of using neural networks to solve PDEs dates
back to the 1990s [51, 26, 15, 50] and was revisited and popularized recently [16, 32, 18, 48, 65, 11,
53, 9, 42, 41, 13, 60, 54, 69, 7, 56, 47, 44].
Many network-based PDE solvers are concerned with classical solutions, which are differentiable
and satisfy the PDE pointwise in the classical sense. Unlike classical solutions, weak solutions are functions
for which the derivatives may not always exist but which are nonetheless deemed to satisfy the
PDE in some precisely defined sense. These solutions are crucial because many PDEs in modeling
real-world phenomena do not have sufficiently smooth solutions. Motivated by the seminal work
in [7], we propose Friedrichs learning as an alternative method that can learn the weak solutions
of elliptic, parabolic, and hyperbolic PDEs in L2 (Ω) via a novel minimax formulation devised and
analyzed in Section 2.3. Since the formulation is closely related to the work of Friedrichs theory on
symmetric systems of PDEs (cf. [24]), we call our learning strategy the Friedrichs learning. The
main idea is to transform the PDE problem into a minimax optimization problem to identify weak
solutions. Note that no regularity for the solution is required in Friedrichs learning, which is the
main advantage of the proposed method, making it applicable to a wide range of PDE problems,
especially those with discontinuous solutions. In addition, Friedrichs learning is capable of solving
PDEs with discontinuous solutions without a priori knowledge of the location of the discontinuity.
Although Friedrichs learning may not provide highly accurate solutions, it can produce
a coarse solution without a priori knowledge of the discontinuity. This rough estimate of the
discontinuity can then serve as a good initial guess for conventional computational approaches that deliver highly
accurate solutions, following the Int-Deep framework in [40]. Finally, theoretical results are provided
to justify the Friedrichs learning framework for various PDEs.
The main philosophy of Friedrichs learning is to reformulate a PDE problem into a minimax
optimization, the solution of which is a test deep neural network (DNN) that maximizes the loss
and a solution DNN that minimizes the loss. For a high-order PDE, we first reformulate it into a
first-order PDE system by introducing auxiliary variables, the weak form of which naturally leads
to a minimax optimization using integration by parts according to the theory of Friedrichs’ system
[24]. The above-mentioned feature is the crucial difference from existing deep learning methods for
weak solutions [18, 69]. Let us introduce the formulation of Friedrichs learning using first-order
boundary value problems (BVPs) with homogeneous boundary conditions without loss of generality.
The initial value problems (IVPs) can be treated as BVPs, where the time variable is considered to
be one more spatial variable. Non-homogeneous boundary conditions can easily be transformed
into homogeneous ones by subtracting the boundary functions from the solutions.
In the seminal results by Friedrichs in [24] and other investigations in [5, 23], an abstract
framework of the boundary value problem of the first-order system was established, which is referred
to as Friedrichs’ system in the literature. Let us introduce the concept of Friedrichs’ system using
a concrete and simple example and illustrate the main idea and intuition of the Friedrichs learning
proposed in this paper. A more detailed abstract framework of Friedrichs learning will be discussed
later in Section 2. Let r ∈ ℕ and Ω ⊂ R^d be an open and bounded domain with Lipschitz boundary
∂Ω. The notation (·)^⊤ denotes the transpose of a vector or a matrix throughout the paper. We
assume: 1) A_k ∈ [L^∞(Ω)]^{r×r}, ∑_{k=1}^d ∂_k A_k ∈ [L^∞(Ω)]^{r×r}, A_k = A_k^⊤ a.e. in Ω for k = 1, ..., d,
and C ∈ [L^∞(Ω)]^{r×r}; 2) the full coercivity holds true, i.e., C + C^⊤ − ∑_{k=1}^d ∂_k A_k ≥ 2µ_0 I_r a.e. in
Ω for some µ_0 > 0 and the identity matrix I_r ∈ R^{r×r}. Then the first-order differential operator
T : D → L with L = [L^2(Ω)]^r and D = [C_0^∞(Ω)]^r defined by T u := ∑_{k=1}^d A_k ∂_k u + C u is called the
Friedrichs operator, while the first-order system of PDEs T u = f is called the Friedrichs' system,
where f is a given data function in L and the space C_0^∞(Ω) consists of all infinitely differentiable
functions with compact support in Ω. Throughout this paper, the bold font will be used for vectors
and matrices in concrete examples. In our abstract framework, PDE solutions are considered as
elements of a Hilbert space, so they will not be denoted as bold letters.
Friedrichs [24] also introduced an abstract framework for representing boundary conditions
via matrix-valued boundary fields. First, let A_n := ∑_{k=1}^d n_k A_k ∈ [L^∞(∂Ω)]^{r×r}, where n =
(n_1, ..., n_d) ∈ R^d is the unit outward normal direction on ∂Ω, and let M : ∂Ω → R^{r×r} be a matrix
field on the boundary. Then a homogeneous Dirichlet boundary condition of Friedrichs' system is
prescribed by (A_n − M)u = 0 on ∂Ω by choosing an appropriate M to ensure the well-posedness of
Friedrichs' system. In real applications, M is given by physical knowledge. Let V := N(A_n − M)
and V^* := N(A_n + M^⊤), where N denotes the null space of its argument. It has been proved that u
solves the BVP

(1.1)   T u = f in Ω   and   (A_n − M)u = 0 on ∂Ω,

if and only if u solves the minimax problem

    min_{u∈V} max_{v∈V^*} L(u, v) := |(u, T̃v)_L − (f, v)_L| / ‖T̃v‖_L,

where T̃ : D → L is the formal adjoint of T . Hence, in our Friedrichs learning, DNNs are applied
to parametrize u and v to solve the above minimax problem to obtain the solution of the BVP
(1.1). Friedrichs learning also works for other kinds of boundary conditions.
This paper is organized as follows. In Section 2, we devise and analyze Friedrichs minimax
formulation for weak solutions of PDEs. In Section 3, several concrete examples of PDEs and their
minimax formulations are provided. In Section 4, network-based optimization is introduced to solve
the minimax problem in Friedrichs formulation. In Section 5, a series of numerical examples are
provided to demonstrate the effectiveness of the proposed Friedrichs learning. Finally, we conclude
this paper in Section 6.
2. Friedrichs Minimax Formulation for Weak Solutions. In this section, we
first recall some standard notations used frequently later on. Then we briefly review Friedrichs'
system in a Hilbert space setting [23, 12], and finally we introduce and analyze the Friedrichs minimax
formulation for weak solutions, which is the foundation of Friedrichs learning.
Let Ω ⊂ R^d be a bounded domain with Lipschitz boundary. Let D_j = ∂/∂x_j be the partial
derivative operator with respect to x_j in the weak sense. For a multi-index α = (α_1, ..., α_d) with
each α_i being a non-negative integer, denote D^α = D_1^{α_1} D_2^{α_2} ··· D_d^{α_d}. For a non-negative integer k
and a real number p with 1 ≤ p ≤ ∞, define the Sobolev space W^{k,p}(Ω) as the vector space consisting
of all functions v ∈ L^p(Ω) such that D^α v ∈ L^p(Ω) for all multi-indices α with |α| = ∑_{j=1}^d α_j ≤ k,
equipped with the norm

    ‖v‖_{W^{k,p}(Ω)} = ( ∑_{|α|≤k} ∫_Ω |D^α v|^p dx )^{1/p},  1 ≤ p < ∞;    ‖v‖_{W^{k,∞}(Ω)} = ∑_{|α|≤k} esssup_Ω |D^α v|,

where esssup_Ω is the essential supremum of a function over Ω. When p = 2, W^{k,2}(Ω) is simply
written as H^k(Ω). In addition, let H_0^k(Ω) be the closure of C_0^∞(Ω) with respect to the norm of
H^k(Ω), while H^{-k}(Ω) denotes the dual space of H_0^k(Ω). We refer the reader to the monograph [1]
for details about Sobolev spaces and their properties.
Let L denote a real Hilbert space, equipped with the inner product (·, ·)_L and the induced
norm ‖·‖_L. For any two vectors in a Euclidean space, we use (·, ·) to represent their
natural inner product and denote by ‖·‖_p the related ℓ^p norm for 1 ≤ p ≤ ∞; most of these symbols
will appear in Sections 4 and 5. For a vector space W and its dual space W′, the notation ⟨·, ·⟩_{W×W′}
represents the duality pairing between W and W′. For any two Hilbert spaces X and Y, denote by
𝓛(X, Y) the vector space consisting of all continuous linear operators from X into Y.
2.1. An Abstract Framework of Friedrichs' System. First of all, to be self-contained, we recall some basic
results on Friedrichs' system developed in [12, 23] for later use. Let L
be a real Hilbert space, whose dual space L′ can be identified naturally with L by
the Riesz representation theorem. For a dense subspace D of L, we consider two linear operators
T : D → L and T̃ : D → L satisfying the following properties: for any u, v ∈ D, there exists a
positive constant C such that

(2.1)   (T u, v)_L = (u, T̃ v)_L,

(2.2)   ‖(T + T̃)u‖_L ≤ C ‖u‖_L.

It is worth noting that the two operators T and T̃ are given simultaneously. Due to the property
(2.1), we often call T̃ the formal adjoint of T and vice versa. Since the operators T and T̃ play
the same roles, we focus the forthcoming discussion on T; it applies to T̃ in a
straightforward way. As shown in [6, Sect. 5.5], write W_0 for the completion of D with respect to
the scalar product (·, ·)_T = (·, ·)_L + (T·, T·)_L. Then, we have by (2.1) that

    D ⊂ W_0 ⊂ L = L′ ⊂ W_0′ ⊂ D′.

In addition, in view of (2.2), W_0 is also the completion of D with respect to the scalar
product (·, ·)_{T̃} = (·, ·)_L + (T̃·, T̃·)_L. Thus, T̃ can be extended from D to W_0, and its true adjoint
(T̃)^* ∈ 𝓛(L; W_0′) can be viewed as the extension of T to L. When there is no confusion, we
still use the notation T for this extension operator. This argument applies to T̃ as well.
We provide an example to make the above abstract treatment more accessible. Let Ω = (a, b).
Choose D = C_0^∞(Ω) and L = L^2(Ω). Let T v = v′ and T̃ v = −v′ for all v ∈ C_0^∞(Ω). In this case,
we have

    (v, w)_T = (v, w)_{T̃} = ∫_a^b (v w + v′ w′) dx,   ∀ v, w ∈ C_0^∞(Ω),

so, by definition, the completion of C_0^∞(Ω) with respect to the induced norm is exactly the Sobolev
space H_0^1(Ω). Hence, according to Theorem 1.4.4.6 in [27, p. 31], if we understand the derivative
operator in the sense of distributions, we know (T̃)^* ∈ 𝓛(L^2(Ω); H^{-1}(Ω)). In other words, the
derivative operator (·)′ can be viewed as a continuous linear operator from L^2(Ω) into H^{-1}(Ω).
Next, as given in [23, Lemma 2.1], define a graph space W by

(2.3)   W = {u ∈ L; T u ∈ L},

which is a Hilbert space with respect to the graph norm ‖·‖_T = (·, ·)_T^{1/2}. In addition, owing to
(2.2), we have

    W = {u ∈ L; T̃ u ∈ L}.
That means W is also a graph space associated with T̃ .
The abstract framework of Friedrichs’ system concerns the solvability of the problem

(2.4) T u = f ∈ L,

and its solution falls in the graph space W. Obviously, the problem (2.4) may not be well-posed
since its solution in W may not be unique. We are interested in constructing a subspace V ⊆ W
such that T : V → L is an isomorphism. A standard construction is as follows. We first define
a self-adjoint boundary operator B ∈ 𝓛(W, W′) by (cf. [23]):

(2.5)   ⟨B u, v⟩_{W′×W} = (T u, v)_L − (u, T̃ v)_L,   ∀ u, v ∈ W.

This operator plays a key role in the forthcoming analysis. Moreover, the identity (2.5) can be
reformulated in the form

    (T u, v)_L = (u, T̃ v)_L + ⟨B u, v⟩_{W′×W},
which is usually regarded as an abstract integration by parts formula (cf. [23]).
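To make (2.5) concrete, consider again the one-dimensional example above with Ω = (a, b), T v = v′, and T̃ v = −v′; the following short check (a worked illustration, assuming u and v are smooth up to the boundary) shows that B acts only through boundary values:

    ⟨B u, v⟩_{W′×W} = (u′, v)_{L^2(a,b)} − (u, −v′)_{L^2(a,b)} = ∫_a^b (u′ v + u v′) dx = u(b) v(b) − u(a) v(a).

This is why B is called a boundary operator and why (2.5) can be regarded as an abstract integration by parts formula.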


Furthermore, we assume that there exists an operator M ∈ 𝓛(W, W′) such that

(2.6)   ⟨M w, w⟩_{W′×W} ≥ 0,   ∀ w ∈ W,
(2.7)   W = N(B − M) + N(B + M),

where N denotes the null space of its argument. Meanwhile, let M^* ∈ 𝓛(W, W′) denote the adjoint
operator of M, given by ⟨M^* u, v⟩_{W′×W} = ⟨M v, u⟩_{W′×W}, ∀ u, v ∈ W.
To find V such that the problem (2.4) is well-posed, we need an additional assumption; i.e.,

(2.8)   ((T + T̃)v, v)_L ≥ 2µ_0 ‖v‖_L^2,   ∀ v ∈ L,

where µ_0 is a positive constant. Then we choose

(2.9)   V = N(B − M),   V^* = N(B + M^*).

We have the following important result for Friedrichs' system [23, Lemma 3.2 and Theorem
3.1].
Theorem 2.1. Assume (2.2), (2.6), (2.7), and (2.8) hold true. Let V and V^* be given by (2.9).
The following statements hold true:
1. For any v ∈ W, it holds

(2.10)   µ_0 ‖v‖_L ≤ ‖T v‖_L,   µ_0 ‖v‖_L ≤ ‖T̃ v‖_L.

2. For any f ∈ L, problem (2.4) has a unique solution in V. In other words, T is an isomorphism
from V onto L. Moreover, T̃ is an isomorphism from V^* onto L.
2.2. First Order PDEs of Friedrichs Type. As a typical application of the above
framework, we restrict L to be the space of square-integrable (vector-valued) functions over an open
and bounded domain Ω ⊂ R^d with Lipschitz boundary, D to be the space of test functions, and T to
be a first-order differential operator with its formal adjoint T̃. In particular, we take L = [L^2(Ω)]^r,
r ∈ ℕ, and D = [C_0^∞(Ω)]^r; D is thus dense in L. Consider T : D → L given by

(2.11)   T u = ∑_{k=1}^d A_k ∂_k u + C u = f,   ∀ u ∈ D.

The standard assumptions are imposed on A_k and C for Friedrichs' system [20, 21, 24]:

(2.12)   C ∈ [L^∞(Ω)]^{r×r},
(2.13)   A_k ∈ [L^∞(Ω)]^{r×r}, k = 1, ..., d,  and  ∑_{k=1}^d ∂_k A_k ∈ [L^∞(Ω)]^{r×r},
(2.14)   A_k = A_k^⊤  a.e. in Ω,  k = 1, ..., d.

The formal adjoint T̃ : D → L of T can be defined by

(2.15)   T̃ u = − ∑_{k=1}^d A_k ∂_k u + ( C^⊤ − ∑_{k=1}^d ∂_k A_k ) u,   ∀ u ∈ D.
It is easy to see that T and T̃ satisfy (2.1)-(2.2). All the results in this section hold true for
Friedrichs’ system satisfying (2.12)-(2.14).
For an abstract Friedrichs' system, one may find an explicit representation of B, but it is very
difficult to derive the operator M, which is governed by the conditions (2.6) and (2.7). Assume
B = ∑_{k=1}^d n_k A_k is well-defined a.e. on ∂Ω, where n = (n_1, ..., n_d) is the unit outward normal
vector of ∂Ω. For simplicity of notation, we set H^s = [H^s]^r with H^s being the usual Sobolev
space of order s, and C^1 = [C^1]^r with C^1 being the space of continuously differentiable functions;
similarly, C_0^∞ = [C_0^∞]^r.
Lemma 2.2. [3, 45] For u, v ∈ H^1(Ω) ⊂ W(Ω), there holds

    ⟨B u, v⟩_{W′(Ω)×W(Ω)} = ⟨B u, v⟩_{H^{-1/2}(∂Ω)×H^{1/2}(∂Ω)},

where W(Ω) = {u ∈ L(Ω); T u ∈ L(Ω)} and W′(Ω) is the dual space of W(Ω). Specifically,
⟨B u, v⟩_{W′(Ω)×W(Ω)} = ∫_{∂Ω} v^⊤ B u ds for any u, v ∈ C_0^∞(R^d).
If Ω has the segment property [4], then C^1(Ω) is dense in H^1(Ω) and further dense in W(Ω).
Therefore, the representation can be uniquely extended to the whole space W(Ω) in the sense
that for any u ∈ W(Ω) and v ∈ H^1(Ω),

(2.16)   ⟨B u, v⟩_{W′(Ω)×W(Ω)} = ⟨B u, v⟩_{H^{-1/2}(∂Ω)×H^{1/2}(∂Ω)}.

The coercivity condition on T, dictated by the positivity condition on the coefficients A_k and
C [20, 21, 22], is needed to show the well-posedness of PDEs of Friedrichs type. After some direct
manipulation, the abstract coercivity condition (2.8) is seen to be equivalent to the following full coercivity
for Friedrichs PDEs:

(2.17)   C + C^⊤ − ∑_{k=1}^d ∂_k A_k ≥ 2µ_0 I_r   a.e. in Ω,

where µ_0 is a positive constant and I_r is the r × r identity matrix. If a system does not satisfy the
coercivity condition (2.17), we can introduce a feasible transformation so that the modified system
satisfies this condition. In [12], the authors introduced the so-called partial coercivity condition to
study the mathematical theory of the corresponding system. Readers are referred to [12] for more
details.
2.3. Friedrichs Minimax Formulation. Throughout this subsection, we assume all the
conditions given in Theorem 2.1 hold true. Recall that V = N(B − M) and V^* = N(B + M^*) with
M ∈ 𝓛(W, W′) satisfying conditions (2.6)-(2.7). For a given f ∈ L, find the solution u ∈ V such
that

(2.18)   T u = f,

or equivalently,

(2.19)   (T u, v)_L = (f, v)_L,   ∀ v ∈ L.

In most cases, T is a differential operator whose action on a function should be understood in the
sense of distributions. The function u is thus called the weak solution of the primal variational equation (2.19).
We restrict v ∈ V^* ⊂ L. From (2.5),

    (T u, v)_L = (u, T̃ v)_L + ⟨B u, v⟩_{W′×W}
              = (u, T̃ v)_L + ⟨((B − M)/2) u, v⟩_{W′×W} + ⟨((B + M)/2) u, v⟩_{W′×W}
              = (u, T̃ v)_L + ⟨((B + M^*)/2) v, u⟩_{W′×W} = (u, T̃ v)_L,

where we used u ∈ V = N(B − M) and v ∈ V^* = N(B + M^*). This, combined with (2.19), gives

(2.20)   (u, T̃ v)_L = (f, v)_L,   ∀ v ∈ V^*.

For u ∈ V, (2.20) is equivalent to (2.19). For u ∈ L satisfying (2.20), u is called the weak solution
of the dual variational equation (2.20).
For u ∈ V, v ∈ V^*, we define

(2.21)   L(u, v) := |(u, T̃ v)_L − (f, v)_L| / ‖T̃ v‖_L.

According to the estimate (2.10), we have

    |(u, T̃ v)_L − (f, v)_L| ≤ ‖u‖_L ‖T̃ v‖_L + ‖f‖_L ‖v‖_L ≤ ( ‖u‖_L + ‖f‖_L / µ_0 ) ‖T̃ v‖_L,

where µ_0 is given in (2.8). Therefore, the functional L(u, v) is bounded with respect to v ∈ V^* for
a fixed u ∈ L.
Thus we can formally reformulate the problem (2.18), or equivalently the problem (2.19), as the following
minimax problem:

(2.22)   min_{u∈V} max_{v∈V^*} L(u, v) := min_{u∈V} max_{v∈V^*} |(u, T̃ v)_L − (f, v)_L| / ‖T̃ v‖_L,

to identify the weak solution of the primal variational equation (2.19).


Theorem 2.3. Assume all the conditions given in Theorem 2.1 hold true. Then u is the unique
weak solution of the primal variational equation (2.19) if and only if u is the unique solution that
solves the minimax problem (2.22).
Proof. On the one hand, if u ∈ V is a weak solution of (2.19), we have from (2.20) that
L(u, v) = 0 for all v ∈ V^*. Thus, u is a solution to the minimax problem (2.22).
On the other hand, if u is a solution of the minimax problem (2.22), then, since L ≥ 0 and the
first part shows that the optimal value of (2.22) is zero,

    max_{v∈V^*} L(u, v) = max_{v∈V^*} |(u, T̃ v)_L − (f, v)_L| / ‖T̃ v‖_L = 0.

Thus, we have L(u, v) = 0 for all v ∈ V^*. This implies

    (u, T̃ v)_L − (f, v)_L = 0,   ∀ v ∈ V^*.

Since u is in V, the above equation gives

    (T u − f, v)_L = 0,   ∀ v ∈ V^*.

Observing that D is contained in V^* and is dense in L, the above equation implies that u is a weak
solution of the primal variational equation (2.19).
Finally, under the conditions given in Theorem 2.1, it is well known that the weak solution
u of the primal variational equation (2.19) exists and is unique. This completes the proof of this
theorem.

Note that the above discussion and Theorem 2.3 are concerned with the weak solution of the
primal variational equation (2.19) with a solution u being in V . It is also of interest to discuss
the weak solution u of the dual variational equation (2.20) with u being in L due to Friedrichs (cf.
[24]). According to similar arguments for proving Theorem 2.3, we have the following theorem.
Theorem 2.4. Assume all the conditions given in Theorem 2.1 hold true. Then u is a weak
solution of the dual variational equation (2.20) if and only if u is a solution of the following minimax
problem:

(2.23)   min_{u∈L} max_{v∈V^*} L(u, v) = min_{u∈L} max_{v∈V^*} |(u, T̃ v)_L − (f, v)_L| / ‖T̃ v‖_L.

Note that the weak solution of the dual variational equation (2.20) in L may not be unique,
and the same is true for the minimax problem (2.23). However, the solution is unique for the Friedrichs'
system discussed in Subsection 2.2, due to the equivalence between the weak solution and the
strong solution (cf. [24]). In this case, the two problems (2.22) and (2.23) are equivalent.
Theorems 2.3-2.4 cover various interesting equations arising in real applications. However, we
would like to mention that Friedrichs learning can be extended to a more general setting, e.g.,
u ∈ L but the data function f in T u = f not necessarily in L. Since the solution space L is more
general than V and includes solutions with discontinuities, this setting has a wide range of applications
in fluid mechanics. We demonstrate this by a numerical example for the advection-reaction
problem in Section 5. Theoretical analysis for more general cases is left as future work.
3. Examples of PDEs and the Corresponding Minimax Formulation. Using the
abstract framework and the minimax formulation developed in Section 2, we will derive the minimax
formulations for several typical PDEs. From now on, we will denote by (·, ·)_Ω the standard L^2 inner
product, which induces the L^2 norm ‖·‖_Ω. These notations also apply to square-integrable vector-valued
functions. For simplicity, we will focus on PDEs with homogeneous boundary conditions throughout
this section.
3.1. Advection-Reaction Equation. The advection-reaction equation seeks u such that

(3.1)   µ u + β · ∇u = f,

where β = (β_1, ..., β_d)^⊤ ∈ [L^∞(Ω)]^d, ∇·β ∈ L^∞(Ω), µ ∈ L^∞(Ω), and f ∈ L^2(Ω). Compared with
(2.11), (3.1) is a Friedrichs' system obtained by setting A_k = β_k for k = 1, 2, ..., d and C = µ.
We assume there exists µ_0 > 0 such that

(3.2)   µ(x) − (1/2) ∇·β(x) ≥ µ_0 > 0,   a.e. in Ω.

Thus, the full coercivity condition in (2.17) holds true. The graph space W given by (2.3) is

    W = {w ∈ L^2(Ω); β · ∇w ∈ L^2(Ω)}.

We define the inflow and outflow boundaries for the advection-reaction equation (3.1):

(3.3)   ∂Ω_− = {x ∈ ∂Ω; β(x) · n(x) < 0},   ∂Ω_+ = {x ∈ ∂Ω; β(x) · n(x) > 0}.

To enforce boundary conditions, guided by the physical interpretation, we choose

(3.4)   V = {v ∈ W; v|_{∂Ω_−} = 0},   V^* = {v ∈ W; v|_{∂Ω_+} = 0}.

In this case, it is easy to check that the conditions (2.6) and (2.7) hold true. By (2.15),

    T̃ v = − ∑_{i=1}^d ( β_i ∂v/∂x_i + (∂β_i/∂x_i) v ) + C^⊤ v = −β · ∇v − (∇·β) v + µ v.
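For completeness, one standard choice of the boundary operator M in this setting is multiplication by |β · n| on the boundary (a sketch following the usual treatment of Friedrichs' systems, e.g. [23]; the paper does not spell this out here):

    ⟨B u, v⟩_{W′×W} = ∫_{∂Ω} (β · n) u v ds,   ⟨M u, v⟩_{W′×W} = ∫_{∂Ω} |β · n| u v ds,

so that (B − M)u = 0 forces u = 0 where β · n < 0, while (B + M^*)v = 0 forces v = 0 where β · n > 0, recovering exactly the spaces V and V^* in (3.4).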
The minimax problem is thus given as follows:

    min_{u∈V} max_{v∈V^*} L(u, v) = min_{u∈V} max_{v∈V^*} |(u, −β·∇v − (∇·β)v + µv)_Ω − (f, v)_Ω| / ‖β·∇v + (∇·β)v − µv‖_Ω.
Note that if the coercivity condition (3.2) does not hold true, we can introduce a transformation
u = e^{λ_0 t} ũ so that the advection-reaction equation (3.1) written in ũ satisfies (3.2) for a sufficiently large
constant λ_0 > 0.
3.2. Scalar Elliptic PDEs. Consider the second-order PDE to find u satisfying

(3.5)   −∆u + µ u = f   in Ω,

where Ω ⊂ R^d, µ ∈ L^∞(Ω) is positive and uniformly bounded away from zero, and f ∈ L^2(Ω). This
PDE can be rewritten into a first-order PDE system by introducing an auxiliary function v; i.e.,

    v + ∇u = 0,   µ u + ∇·v = f.

This first-order system can be formulated as a Friedrichs' system with r = d + 1. The
Hilbert space L is chosen as L = [L^2(Ω)]^r. Let ũ = (v^⊤, u)^⊤ ∈ L. For k = 1, 2, ..., d,

    A_k = [[0, e^k], [(e^k)^⊤, 0]],    C = [[I_d, 0], [0, µ]],

where e^k is the k-th canonical basis vector of R^d. Since µ > 0 and has
a lower bound away from zero, the full coercivity condition (2.17) is satisfied. The graph space is
W = H(div; Ω) × H^1(Ω).
One possible choice of the Dirichlet boundary condition is

(3.6)   V = V^* = H(div; Ω) × H_0^1(Ω) = {(v^⊤, u)^⊤ ∈ W; u|_{∂Ω} = 0}.

Obviously, the choice of boundary conditions is not unique. By introducing auxiliary variables,
the second-order linear PDE has been reformulated into a first-order PDE system, and the weak
solution of (3.5) can be found by solving the equivalent minimax problem in (2.22).
Denote the test function by ψ = (ψ_v^⊤, ψ_u)^⊤ in the space V^*. The minimax problem can be
written as

    min_{ũ∈V} max_{ψ∈V^*} L(ũ, ψ) = min_{ũ∈V} max_{ψ∈V^*} |−(v, ψ_v − ∇ψ_u)_Ω + (u, µψ_u − ∇·ψ_v)_Ω − (f, ψ_u)_Ω| / ‖((ψ_v − ∇ψ_u)^⊤, µψ_u − ∇·ψ_v)^⊤‖_Ω.

To reduce the computational cost, we reformulate the above formulation into a minimax problem
in primal form. To this end, letting ψ_v = ∇ψ_u and noting that ũ = (−(∇u)^⊤, u)^⊤, we have by
a direct manipulation that

    L(ũ, ψ) = |(u, µψ_u − ∆ψ_u)_Ω − (f, ψ_u)_Ω| / ‖µψ_u − ∆ψ_u‖_Ω,

which induces the following minimax problem:

(3.7)   min_{u∈H_0^1(Ω)} max_{ψ_u∈H_0^1(Ω)} L(u, ψ_u) = min_{u∈H_0^1(Ω)} max_{ψ_u∈H_0^1(Ω)} |(u, µψ_u − ∆ψ_u)_Ω − (f, ψ_u)_Ω| / ‖µψ_u − ∆ψ_u‖_Ω.
In fact, we can derive the above minimax problem in a rigorous way. From (3.5), we have

    (−∆u + µ u, ψ_u)_Ω = (f, ψ_u)_Ω,   ∀ ψ_u ∈ H_0^1(Ω),

which, after the usual integration by parts twice, gives

    (u, µψ_u − ∆ψ_u)_Ω = (f, ψ_u)_Ω,   ∀ ψ_u ∈ H_0^1(Ω).

This naturally gives the minimax problem (3.7).
3.3. Maxwell's Equations in the Diffusion Regime. Maxwell's equations in R^3 in the diffusive regime can be written as

(3.8)   µ H + ∇×E = f,   σ E − ∇×H = g,

with µ and σ being two positive functions in L^∞(Ω), uniformly bounded away from zero. The three-dimensional
functions f, g lie in the space [L^2(Ω)]^3 and the solution pair (H^⊤, E^⊤)^⊤ lies in
the space [L^2(Ω)]^3 × [L^2(Ω)]^3. In Equation (2.11), set r = 6 and let A_k ∈ R^{6×6} and C be

    A_k = [[0, R^k], [(R^k)^⊤, 0]],    C = [[µ·I_3, 0], [0, σ·I_3]],   for k = 1, 2, 3.

Here, the entries of R^k are R^k_{ij} = sign(i − j) if i + j ≡ k + 1 (mod 3) and R^k_{ij} = 0 otherwise. The graph space is defined as W = H(curl; Ω) × H(curl; Ω).
One example of the boundary condition is V = V^* = H(curl; Ω) × H_0(curl; Ω); the function pair
u := (H^⊤, E^⊤)^⊤ ∈ W is in V whenever E × n|_{∂Ω} = 0. Let ψ = (ψ_H^⊤, ψ_E^⊤)^⊤ be the test function in
V^*. Then the minimax problem in (2.22) becomes

(3.9)   min_{u∈V} max_{ψ∈V^*} |(H, −∇×ψ_E + µψ_H)_Ω + (E, ∇×ψ_H + σψ_E)_Ω − (f, ψ_H)_Ω − (g, ψ_E)_Ω| / ‖((−∇×ψ_E + µψ_H)^⊤, (∇×ψ_H + σψ_E)^⊤)^⊤‖_Ω.

4. Deep Learning-Based Solver. To complete the introduction of Friedrichs learning,
in this section we describe a deep learning-based method to solve the minimax optimization in (2.22) or (2.23)
for the weak solution of (2.18) or (2.20). For simplicity, we focus on the minimax
optimization (2.22) to identify the weak solution of (2.18).
4.1. Overview. In the deep learning-based method, one solution DNN, φ_s(x; θ_s), is applied
to parametrize the weak solution u in (2.22), and another test DNN, φ_t(x; θ_t), is used to
parametrize the test function v in (2.22). Here, θ_s and θ_t are the parameters to be identified such
that

(4.1)   (θ̄_s, θ̄_t) = arg min_{θ_s} max_{θ_t} L(φ_s(x; θ_s), φ_t(x; θ_t))
                    = arg min_{θ_s} max_{θ_t} |(φ_s(x; θ_s), T̃ φ_t(x; θ_t))_Ω − (f, φ_t(x; θ_t))_Ω| / ‖T̃ φ_t(x; θ_t)‖_Ω,

under the constraints

    φ_s(x; θ_s) ∈ V   and   φ_t(x; θ_t) ∈ V^*.

For simplicity, we write L(θ_s, θ_t) as shorthand for L(φ_s(x; θ_s), φ_t(x; θ_t)) from now on.
4.2. Network Implementation and Approximation Theory. We now introduce
the network structures of the solution DNN and the test DNN used in the previous section. In this
paper, all DNNs are chosen as ResNets [35] defined as follows. Let φ(x; θ) denote such a network
with an input x and parameters θ, defined recursively using a nonlinear activation function
σ as follows:

(4.2)   h_0 = V x,   g_ℓ = σ(W_ℓ h_{ℓ−1} + b_ℓ),   h_ℓ = Ū_ℓ h_{ℓ−2} + U_ℓ g_ℓ,   ℓ = 1, 2, ..., L,   φ(x; θ) = a^⊤ h_L,

where V ∈ R^{m×d}, W_ℓ ∈ R^{m×m}, Ū_ℓ ∈ R^{m×m}, U_ℓ ∈ R^{m×m}, b_ℓ ∈ R^m for ℓ = 1, ..., L, a ∈ R^m, and
h_{−1} = 0. Throughout this paper, U_ℓ is set as the identity matrix in the numerical implementation of
ResNets for simplicity. Furthermore, as used in [18], we set Ū_ℓ as the identity matrix
when ℓ is even and Ū_ℓ = 0 when ℓ is odd, i.e., each ResNet block has two layers of activation
functions. The parameter vector θ consists of all the weights and biases {W_ℓ, b_ℓ}_{ℓ=1}^L, V, and a. The numbers m and L are called the
width and the depth of the network, respectively. The activation function σ is problem-dependent.
For example, if the test-function DNN is required to be continuously differentiable, the Tanh
activation function can be chosen to guarantee that the DNN is in C^∞; if it is desired that φ(x; θ)
is in the H^1 space, the activation function ReLU(x) := max{0, x} can be used.
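As an illustration only (a minimal sketch, not the authors' released code), the ResNet in (4.2) with U_ℓ fixed to the identity and two layers per residual block can be written in PyTorch roughly as follows; the width, depth, and activation values are placeholders:

```python
import torch
import torch.nn as nn

class ResNet(nn.Module):
    """Minimal sketch of the ResNet in (4.2) with U_l = I and two layers per residual block."""
    def __init__(self, dim_in, width=50, depth=6, activation=None):
        super().__init__()
        self.input_layer = nn.Linear(dim_in, width, bias=False)          # h_0 = V x
        self.hidden = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])
        self.output_layer = nn.Linear(width, 1, bias=False)              # phi(x) = a^T h_L
        self.act = activation if activation is not None else nn.Tanh()

    def forward(self, x):
        h_prev, h = None, self.input_layer(x)
        for ell, layer in enumerate(self.hidden, start=1):
            g = self.act(layer(h))                                       # g_l = sigma(W_l h_{l-1} + b_l)
            h_prev, h = h, (h_prev + g if ell % 2 == 0 else g)           # bar{U}_l = I (even l), 0 (odd l)
        return self.output_layer(h)
```

In the experiments of Section 5, depth 7 and widths m_s, m_t would be passed as arguments, with ReLU or Tanh chosen following the discussion above.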
ResNets contain fully connected neural networks (FNNs) as special examples when Ū_ℓ = 0 and
U_ℓ is the identity matrix for all ℓ. Here, we quote existing approximation theory to briefly justify the
application of neural networks as a parametrization tool in this paper. Of particular interest here is
the approximation theory in Sobolev spaces W^{n,p} [30, 31, 37] for numerical PDEs. The following
lemma, proved in [31], describes the approximation power of neural networks quantitatively.
Lemma 4.1 (Theorem 4.9 of [31]). Let d ∈ ℕ, k ∈ ℕ_0, n ∈ ℕ_{≥k+1}, and 1 ≤ p ≤ ∞. There
exist constants L, C, and ε̃ such that, for every ε ∈ (0, ε̃) and every f ∈ {f ∈ W^{n,p}((0,1)^d) :
‖f‖_{W^{n,p}((0,1)^d)} ≤ 1}, there exists an FNN φ with at most L layers and at most

(4.3)   M = C ε^{−d/(n−k)} nonzero weights for the activation function max{0, x}^a,  or  M = C ε^{−d/(n−k−1)} nonzero weights for the Tanh activation function,

such that
    ‖φ − f‖_{W^{k,p}((0,1)^d)} ≤ ε.

The approximation theory in Lemma 4.1 justifies the application of Tanh, ReLU, and the power
of ReLU as activation functions in FNNs to approximate target functions in Friedrichs learning.
Since ResNets of depth L and width m contain FNNs of depth L and width m as special cases,
Lemma 4.1 can also provide a lower bound of the approximation capacity of ResNets to justify the
application of ResNets in our numerical examples. Lemma 4.1 is asymptotic in the sense that it
requires sufficiently large network width and depth. For quantitative results in terms of a finite
width and depth, the reader is referred to [37].
In theory, the target function space of neural network approximation in Friedrichs learning
may be as large as the Lp space, which is not covered by Lemma 4.1. Recently, the approximation
capacity of neural networks for Lp spaces has been characterized in [61].

4.3. Unconstrained Minimax Problem. When the domain becomes relatively complex,
the penalty method may be employed to solve the constrained minimax optimization in (4.1).
For this purpose, we introduce a distance that quantifies how well the solution DNN and the test DNN
satisfy their respective constraints. Such a distance is specified according to the boundary conditions.
Denote by dist(φ(x; θ), V) the distance between a DNN φ(x; θ) and a space V. The
penalty terms for the boundary conditions can then be written as

(4.4)   L_b(θ_s, θ_t) := λ_1 dist(φ_s(x; θ_s), V) + λ_2 dist(φ_t(x; θ_t), V^*),

where λ_1 and λ_2 are two positive hyper-parameters. Finally, the constrained minimax problem (4.1)
can be formulated as the following unconstrained minimax problem

(4.5)   (θ̄_s, θ̄_t) = arg min_{θ_s} max_{θ_t} ( L(θ_s, θ_t) + L_b(θ_s, θ_t) ),

which can be solved to obtain the solution DNN φs (x; θ̄s ) as the weak solution of the given PDE
in (2.18) by Friedrichs Learning.
4.4. Special Networks for Different Boundary Conditions. As discussed in [29, 28],
it is possible to build special networks that satisfy various boundary conditions automatically, which
simplifies the unconstrained optimization (4.5) into

(4.6)   (θ̄_s, θ̄_t) = arg min_{θ_s} max_{θ_t} L(θ_s, θ_t).

The optimization problem (4.6) is easier to solve than (4.5) since the two hyperparameters λ_1
and λ_2 in (4.4) are dropped. Note that for a regular PDE domain, e.g., a hypercube or a ball, it is
simple to construct such special networks satisfying various boundary conditions automatically.
Let us take the case of a Dirichlet boundary condition as an example; for other
cases, the reader is referred to [29, 28]. A DNN satisfying the Dirichlet boundary condition
ψ(x) = g(x) on ∂Ω can be constructed by φ(x; θ) = h(x) φ̂(x; θ) + b(x), where φ̂ is a generic
network as in (4.2), h(x) is a specifically chosen function such that h(x) = 0 on ∂Ω, and b(x)
is chosen such that b(x) = g on ∂Ω. For example, if Ω is the d-dimensional unit ball, then φ(x; θ)
can take the form φ(x; θ) = (|x|^2 − 1) φ̂(x; θ) + b(x). As another example, if Ω is the d-dimensional
hypercube [−1, 1]^d, then φ(x; θ) can take the form φ(x; θ) = ∏_{i=1}^d (x_i^2 − 1) φ̂(x; θ) + b(x), as sketched in the code below.
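The following is a minimal PyTorch sketch of this construction for the hypercube case (illustrative only, not the authors' implementation); `phi_hat` is any generic network such as the ResNet above, and `b` is a user-supplied function matching the boundary data:

```python
import torch
import torch.nn as nn

class DirichletWrappedNet(nn.Module):
    """phi(x) = h(x) * phi_hat(x) + b(x) with h vanishing on the boundary of [-1, 1]^d."""
    def __init__(self, phi_hat: nn.Module, b):
        super().__init__()
        self.phi_hat = phi_hat   # generic network phi_hat(x; theta)
        self.b = b               # callable with b(x) = g(x) on the boundary

    def forward(self, x):
        h = torch.prod(x ** 2 - 1.0, dim=-1, keepdim=True)   # h(x) = prod_i (x_i^2 - 1)
        return h * self.phi_hat(x) + self.b(x)
```

By construction, the output equals b(x), and hence the boundary data g(x), whenever any coordinate of x equals ±1, so no penalty term is needed.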

4.5. Network Training. Once the solution DNN and test DNN have been set up, the
rest is to train them to solve the minimax problem in (4.5). The stochastic gradient descent (SGD)
method or its variants (e.g., RMSProp [36] and Adam [49]) is an efficient tool to solve this problem
numerically. Although the convergence of SGD for the minimax problem is still an active research
topic [34, 14, 66], empirical success shows that SGD can provide a good approximate solution. The
training algorithm and main numerical setup are summarized in Algorithm 1.
In Algorithm 1, the outer iteration loop takes n iterations. Each inner iteration loop contains n_s
steps of θ_s updates and n_t steps of θ_t updates. In each inner iteration for updating θ_s, we generate
two new sets of random samples {x_i^1}_{i=1}^{N_1} ⊂ Ω and {x_i^2}_{i=1}^{N_2} ⊂ ∂Ω following uniform distributions.
In most of the examples, Latin hypercube sampling is employed to generate random
points in order to capture the distributional characteristics even with a relatively small number of
samples. We define the empirical loss over these training points for the Friedrichs' system (2.11) as

(4.7)   L_t(θ_s, θ_t) := L̂(θ_s, θ_t) + L̂_b(θ_s, θ_t),

where L̂(θ_s, θ_t) := |L̂_n(θ_s, θ_t)| / L̂_d(θ_s, θ_t) with

    L̂_n(θ_s, θ_t) = (A(Ω)/N_1) ∑_{i=1}^{N_1} ( ∑_{j=1}^d ∂/∂x_j(−A_j φ_t(x_i^1; θ_t)), φ_s(x_i^1; θ_s) ) + (A(Ω)/N_1) ∑_{i=1}^{N_1} ( C^⊤ φ_t(x_i^1; θ_t), φ_s(x_i^1; θ_s) )
                  − (A(Ω)/N_1) ∑_{i=1}^{N_1} ( f(x_i^1), φ_t(x_i^1; θ_t) ) + (A(∂Ω)/N_2) ∑_{i=1}^{N_2} ( (∑_{j=1}^d A_j n_j) φ_s(x_i^2; θ_s), φ_t(x_i^2; θ_t) ),

    L̂_d(θ_s, θ_t) = (A(Ω)/N_1) ∑_{i=1}^{N_1} ‖ ∑_{j=1}^d ∂/∂x_j(−A_j φ_t(x_i^1; θ_t)) + C^⊤ φ_t(x_i^1; θ_t) ‖_2^2,

where (·, ·) denotes the inner product of two vectors, ‖·‖_2 denotes the 2-norm of a vector, A(·)
denotes the area or volume of the integration region, ∂/∂x_j denotes the partial derivative with respect
to the j-th component of x, and {A_j}_{j=1}^d has been introduced in Section 2.2. As for
the boundary loss, let us take the Dirichlet boundary condition u(x) = g_d(x) as an example. In
this case, the boundary loss can be formulated as

    L̂_b(θ_s, θ_t) := (A(∂Ω)/N_2) ∑_{i=1}^{N_2} ‖φ_s(x_i^2; θ_s) − g_d(x_i^2)‖_2^2.
As mentioned in Section 4.4, if the solution DNN and test DNN are both built to satisfy their
boundary conditions automatically, L̂b (θs , θt ) is zero.
Next, we compute the gradient of L_t(θ_s, θ_t) with respect to θ_s, denoted by g_s, whose negative gives
the steepest descent direction. The gradient is evaluated via automatic differentiation (autograd) in PyTorch, which
essentially processes a sequence of chain rules since the loss function is the composition
of several simple functions with explicit formulas. For specific classes of PDEs, the computational
cost of gradients can be reduced via recent developments [10]. Besides, the optimizer uses g_s together
with some historical gradient information to output the actual descent direction, say g̃_s. Thus,
θ_s is updated along the direction g̃_s as θ_s ← θ_s − η_s g̃_s. In each outer iteration of Algorithm 1, we
repeatedly sample new training points and update θ_s for n_s steps.
In each inner iteration, θ_t is updated similarly to maximize the empirical loss L_t(θ_s, θ_t).
In each inner iteration for updating θ_t, we generate random samples and evaluate the gradient of
the empirical loss with respect to θ_t, denoted by g_t. Similar to the update of θ_s, θ_t is updated
via one step of ascent with a step size η_t as follows: θ_t ← θ_t + η_t g̃_t. In each outer iteration, we
repeatedly sample new training points and update θ_t for n_t steps.
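The alternating update just described can be sketched in PyTorch as follows. This is a minimal illustration under several assumptions: `solution_net` and `test_net` stand for φ_s and φ_t (e.g., boundary-wrapped ResNets as above), `empirical_loss` stands for an implementation of (4.7), and the samplers draw uniform points; none of these names come from the paper's released code.

```python
import torch

def train_friedrichs(solution_net, test_net, empirical_loss,
                     sample_interior, sample_boundary,
                     n_outer=1000, n_s=1, n_t=1, lr_s=3e-4, lr_t=3e-3):
    # Optimizer choices mirror Section 5: Adam for the solution DNN, RMSprop for the test DNN.
    opt_s = torch.optim.Adam(solution_net.parameters(), lr=lr_s)
    opt_t = torch.optim.RMSprop(test_net.parameters(), lr=lr_t)
    for k in range(n_outer):
        for _ in range(n_s):   # minimization steps over theta_s
            loss = empirical_loss(solution_net, test_net,
                                  sample_interior(), sample_boundary())
            opt_s.zero_grad(); loss.backward(); opt_s.step()
        for _ in range(n_t):   # maximization steps over theta_t (ascend = minimize the negative)
            loss = -empirical_loss(solution_net, test_net,
                                   sample_interior(), sample_boundary())
            opt_t.zero_grad(); loss.backward(); opt_t.step()
```

The restart steps of Algorithm 1 would wrap this loop and periodically re-initialize θ_s or θ_t as described below.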
We would like to emphasize that minimax optimization problems are generally more challenging
to solve than the minimization problems arising in network-based PDE solvers in the strong form. Note
that, when we fix the test DNN φ_t(x; θ_t), the loss function in (4.1) is a convex functional with respect
to the solution DNN φ_s(x; θ_s), but not with respect to its parameters θ_s. Hence, the difficulty
of the minimization problem for a fixed test DNN is comparable to that of the network-based least
squares method, and an appropriate choice of step size is crucial to improve the solution. Moreover, in
the extra step of updating the test DNN for a fixed solution DNN, the maximization problem
over the test DNN is convex in neither the parameter space nor the DNN space, which makes
the optimization even more difficult.
To further facilitate the convergence of Friedrichs learning, a restarting strategy is employed
to obtain the restarted Friedrichs learning in Example 5.1, which is in the same spirit as typical
restarted iterative solvers in numerical linear algebra, e.g., the restarted GMRES [46], or the restart
strategies in optimization [2, 33, 59, 39]. For simplicity and without loss of generality, the restarted
Friedrichs learning is introduced for PDEs with Dirichlet boundary conditions. For other boundary
conditions, the restarted Friedrichs learning can be designed similarly. We stress that,
except for the example in Section 5.1, the Friedrichs learning algorithm performs well enough without a
restarting strategy, so we do not use the restarting method in the subsequent experiments.
5. Numerical Experiments. In this section, all hyperparameters are listed in Table 5.1.
We set the solution DNN φ_s(x; θ_s) as a fully connected ResNet with ReLU activation functions,
depth 7, and width m_s, where m_s is problem dependent. The activation of φ_s(x; θ_s) is chosen
as ReLU due to its capacity to approximate functions with low regularity and its good numerical
performance. The test DNN φ_t(x; θ_t) has the same structure with depth 7 and width m_t. To ensure
the smoothness of φ_t(x; θ_t), we employ the Tanh activation function. The optimizers for updating
φ_s(x; θ_s) and φ_t(x; θ_t) are chosen as Adam and RMSProp, respectively. All of our experiments
share the same setting for network structures and optimizers. During the pre-training phase, we
always set the learning rate to be larger than in the subsequent training phase. Thereafter, to ensure
an effective and stable training process, the learning rate in the optimization is updated in an
exponentially decaying scheme, described after Algorithm 1 below.
Algorithm 1 Restarted Friedrichs Learning for Weak Solutions of PDEs.

Require: The desired PDE.
Ensure: Parameters θ_t and θ_s solving the minimax problem in (4.5).
Set iteration parameters n, n_s, and n_t. Set sample size parameters N_1 and N_2. Set step sizes
η_s^(k) and η_t^(k) for the k-th outer iteration. Set the restart index sets Θ_s and Θ_t.
Initialize φ_s(x; θ_s^{0,0}) and φ_t(x; θ_t^{0,0}).
for k = 1, ..., n do
  if k ∈ Θ_s then
    Keep a copy b(x) = φ_s(x; θ_s^{k−1,0}) and randomly re-initialize θ_s^{k−1,0}.
    if the penalty method for boundary conditions is used then
      Set a new DNN φ_s(x; θ_s^{k−1,0}) = φ̂_s(x; θ_s^{k−1,0}) + b(x) with a generic DNN φ̂_s(x; θ_s^{k−1,0}).
    else
      Set a new DNN φ_s(x; θ_s^{k−1,0}) = h(x) φ̂_s(x; θ_s^{k−1,0}) + b(x) with a generic DNN φ̂_s(x; θ_s^{k−1,0}) and h(x) in (5.4).
    end if
  end if
  for j = 1, ..., n_s do
    Generate uniformly distributed sample points {x_i^1}_{i=1}^{N_1} ⊂ Ω and {x_i^2}_{i=1}^{N_2} ⊂ ∂Ω.
    Compute the gradient of the loss function in (4.7) at (θ_s^{k−1,j−1}, θ_t^{k−1,0}) with respect to θ_s and denote it by g(θ_s^{k−1,j−1}, θ_t^{k−1,0}).
    Update θ_s^{k−1,j} ← θ_s^{k−1,j−1} − η_s^(k) g(θ_s^{k−1,j−1}, θ_t^{k−1,0}) with step size η_s^(k).
  end for
  θ_s^{k,0} ← θ_s^{k−1,n_s}.
  If k ∈ Θ_t, re-initialize θ_t^{k−1,0} randomly.
  for j = 1, ..., n_t do
    Generate uniformly distributed sample points {x_i^1}_{i=1}^{N_1} ⊂ Ω and {x_i^2}_{i=1}^{N_2} ⊂ ∂Ω.
    Compute the gradient of the loss function in (4.7) at (θ_s^{k,0}, θ_t^{k−1,j−1}) with respect to θ_t and denote it by g(θ_s^{k,0}, θ_t^{k−1,j−1}).
    Update θ_t^{k−1,j} ← θ_t^{k−1,j−1} + η_t^(k) g(θ_s^{k,0}, θ_t^{k−1,j−1}) with step size η_t^(k).
  end for
  θ_t^{k,0} ← θ_t^{k−1,n_t}.
  if the stopping criterion is satisfied then
    Return θ_s = θ_s^{k,0} and θ_t = θ_t^{k,0}.
  end if
end for

More precisely, at the k-th iteration, we set the learning rate
η_s^(k) = η_s^(0) (1/10)^{k/ν_s} for the solution DNN, where η_s^(0) is the initial learning rate and ν_s is the
decay rate. Similarly, we set η_t^(k) = η_t^(0) (1/10)^{k/ν_t} for the test DNN. The codes for reproducing the
numerical results are available at https://github.com/SeiruGanki/Friedrich-Learning.
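For clarity, the decay rule above amounts to the following small helper (an illustrative sketch; the variable names are not from the released code):

```python
def decayed_lr(lr0: float, k: int, nu: float) -> float:
    """Exponential decay used for both networks: lr(k) = lr0 * (1/10)**(k / nu)."""
    return lr0 * 0.1 ** (k / nu)

# Example: before the k-th outer iteration, update a PyTorch optimizer opt_s in place.
# for group in opt_s.param_groups:
#     group["lr"] = decayed_lr(3e-4, k, 9000)
```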
Throughout this section, special networks satisfying boundary conditions automatically are
used to avoid tuning the parameters λ1 and λ2 in (4.4); the inner iteration numbers are set as
ns = 1 and nt = 1. The values of other parameters listed in Table 5.1 will be specified later.
Notation    Meaning
d           the dimension of the problem
n_p         the number of pre-training iterations
n           the number of outer iterations
η_s^p       the pre-training learning rate for optimizing the solution network
η_t^p       the pre-training learning rate for optimizing the test network
η_s^(0)     the initial learning rate for optimizing the solution network
η_t^(0)     the initial learning rate for optimizing the test network
ν_s         the decay rate for η_s
ν_t         the decay rate for η_t
m_s         the width of each layer in the solution network
m_t         the width of each layer in the test network
n_s         the number of inner iterations for the solution network
n_t         the number of inner iterations for the test network
N           the number of training points inside the domain
N_b         the number of training points on the domain boundary
Θ_s         the restart index set of the solution network
Θ_t         the restart index set of the test network
Table 5.1: Parameters in the model and algorithm.

To measure the solution accuracy, the following discrete relative L^2 error at uniformly distributed test points in the domain is applied; i.e.,

    e_{L^2}(θ_s) := ( ∑_i ‖φ_s(x_i; θ_s) − u^*(x_i)‖_2^2 / ∑_i ‖u^*(x_i)‖_2^2 )^{1/2},

where u^* is the exact solution. When the true solution is continuous, the following
discrete relative L^∞ error at uniformly distributed test points in the domain is also applied; i.e.,

    e_{L^∞}(θ_s) := max_i( ‖φ_s(x_i; θ_s) − u^*(x_i)‖_∞ ) / max_i( ‖u^*(x_i)‖_∞ ),

where ‖·‖_∞ denotes the L^∞-norm of a vector. In most examples, we choose at least 10,000 testing
points for error evaluation. When the dimension is high or the value of the target function surges,
we may choose 50,000 or even 100,000 testing points.
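Assuming the predictions and exact values at the test points are stored as tensors `u_pred` and `u_true` of shape (number of test points, r), these two metrics can be computed as follows (illustrative sketch):

```python
import torch

def relative_l2_error(u_pred: torch.Tensor, u_true: torch.Tensor) -> torch.Tensor:
    # discrete relative L2 error over all test points
    return torch.sqrt(torch.sum((u_pred - u_true) ** 2) / torch.sum(u_true ** 2))

def relative_linf_error(u_pred: torch.Tensor, u_true: torch.Tensor) -> torch.Tensor:
    # discrete relative L-infinity error over all test points
    return (u_pred - u_true).abs().max() / u_true.abs().max()
```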
5.1. Advection-Reaction Equation with Plain Discontinuity. In the first example,
we identify the weak solution in L^2(Ω) of the advection-reaction equation in (3.1) with a discontinuous
solution. Following Example 2 in [38], we choose the velocity β = (1, 9/10)^⊤ and µ = 1 in the
domain Ω = [−1, 1]^2. We choose the right-hand-side function f and the boundary function g such
that the exact solution is

(5.1)   u^*(x, y) = { sin(π(x + 1)^2/4) sin(π(y − (9/10)x)/2),   for −1 ≤ x ≤ 1, (9/10)x < y ≤ 1;
                      e^{−5(x^2 + (y − (9/10)x)^2)},              for −1 ≤ x ≤ 1, −1 ≤ y < (9/10)x. }

The exact solution is visualized in Figure 5.1(b). The discontinuity of the initial value function
propagates along the characteristic line y = 9x/10; hence, the derivative of the exact solution does
not exist along that line. Classical network-based least squares algorithms in the strong form will
encounter a large residual error near the characteristic line, and hence their accuracy may not be very
attractive, which motivates our Friedrichs learning in the weak form.
As discussed in [38], a priori knowledge of the characteristic line is crucial for conventional
finite element methods with adaptive mesh to obtain high accuracy. In [38], the streamline diffusion
method (SDFEM) can obtain a solution with O(10^{-2}) accuracy using O(10^4) degrees of
freedom when the mesh is aligned with the discontinuity, i.e., when a priori knowledge of the
characteristic line is used in the mesh generation. The discontinuous Galerkin method (DGFEM)
in [38] can obtain O(10^{-8}) accuracy under the same setting. When the mesh is not aligned with the
discontinuity, e.g., when the characteristic line is not used in mesh generation, DGFEM converges
as slowly as SDFEM and the accuracy is not better than O(10^{-2}) with O(10^4) degrees of freedom,
according to the discussion in [38].
As a deep learning algorithm, Friedrichs learning is a mesh-free method, and the weak solution
can be identified without a priori knowledge of the characteristic line. By the discussion in Section
4.4, a special network φ_s(x; θ_s) is constructed as follows to fulfill the boundary condition of the
solution:

(5.2)   φ_s(x; θ_s) = cos(−π/4 + (π/4)x) cos(−π/4 + (π/4)y) φ̂_s(x; θ_s) + b(x, y),

where b(x, y) is constructed directly from the boundary condition as

(5.3)   b(x, y) = { 0,   for −1 ≤ x ≤ 1, −0.4 + x/2 < y ≤ 1;
                    e^{−5[x^2 + (−1 − (9/10)x)^2]} + e^{−5[(−1)^2 + (y + 9/10)^2]} − e^{−5[(−1)^2 + (−1 + 9/10)^2]},   for −1 ≤ x ≤ 1, −1 ≤ y ≤ −0.4 + x/2, }

satisfying b(x, y) = u(x, y) on the inflow boundary ∂Ω_−. For the test function, we fix its structure so
that φ_t(x; θ_t) = 0 on ∂Ω_+ defined in (3.3). A code sketch of (5.2)-(5.3) is given below.
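The construction (5.2)-(5.3) translates into PyTorch roughly as follows; `phi_hat` stands for a generic ResNet as in (4.2), and the code is an illustrative sketch rather than the authors' implementation:

```python
import math
import torch

def b_func(x, y):
    """Base function (5.3); discontinuous across the (deliberately misplaced) line y = x/2 - 0.4."""
    lower = (torch.exp(-5.0 * (x ** 2 + (-1.0 - 0.9 * x) ** 2))
             + torch.exp(-5.0 * (1.0 + (y + 0.9) ** 2))
             - math.exp(-5.0 * (1.0 + (-1.0 + 0.9) ** 2)))
    return torch.where(y > -0.4 + 0.5 * x, torch.zeros_like(y), lower)

def solution_dnn(phi_hat, xy):
    """Solution DNN (5.2); the cosine factor vanishes on the inflow boundary x = -1 and y = -1,
    so the output equals b there."""
    x, y = xy[:, 0:1], xy[:, 1:2]
    h = torch.cos(-math.pi / 4 + math.pi / 4 * x) * torch.cos(-math.pi / 4 + math.pi / 4 * y)
    return h * phi_hat(xy) + b_func(x, y)
```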
First of all, the restarting strategy introduced at the end of Section 4 for pre-training the base
function is employed. The special network structure satisfying the Dirichlet boundary conditions
for the solution DNN φ_s is constructed as

(5.4)   φ_s(x; θ_s) = h(x) φ̂_s(x; θ_s) + b(x),

where b(x) satisfies the boundary condition and can also be regarded as an initial guess, and h(x) = 0
on the Dirichlet boundary. We observe that if b(x) is closer to the true solution, it is easier to
train a generic DNN φ̂_s such that the solution DNN φ_s approximates the true solution more
accurately. Therefore, after a few rounds of outer iterations in the original Friedrichs learning, we
obtain a rough solution DNN, which can serve as a better b function in (5.4) to construct a
new solution DNN. After that, we continue training to obtain a more accurate solution.
Secondly, we choose b(x, y) to be discontinuous along a random line rather than along the true discontinuity
line of the exact solution. This is a reasonable reproduction of real application
scenarios and actually makes the problem more challenging. The
true solution is discontinuous along the characteristic line, the blue line in Figure 5.1(a), while b(x, y)
is discontinuous along the orange line in Figure 5.1(a). Hence, to make the solution DNN φ_s in
(5.2) approximate the true solution well, the algorithm needs to find and correct these two lines
automatically, and the DNN φ̂_s in (5.2) should be approximately discontinuous along both of them.
As shown by Figure 5.1(d), with Friedrichs learning the solution DNN φ_s has a configuration
similar to the true solution in Figure 5.1(b), which means that it has successfully learned these two
n = 50,000; m_s = 50 (pre-training) and 250 (after restart); m_t = 150; N = 90,000; N_b = 45,000; Θ_s = {1,000};
η_s^(0) = 3e-4; η_t^(0) = 3e-3; ν_s = 9,000; ν_t = 9,000; number of parameters = 327,700; Θ_t = ∅.
Table 5.2: The parameters for the Friedrichs learning solver of the experiment in Section 5.1.

n = 50,000; m_s = 250; N = 90,000; η_s^(0) = 1e-3; ν_s = 10,000.
Table 5.3: The parameters of the comparative experiment in Section 5.1.

lines. This feature can be significant because no prior knowledge of the discontinuity of the exact
solution is needed during the training, as long as the boundary condition is satisfied.
Thirdly, we can observe the mechanism of Friedrichs learning from Figure 5.1(e), where the
test DNN φ_t surges and has a larger magnitude near these two lines to emphasize the error of the
solution DNN φ_s. This makes the update of the configuration of φ_s more focused on these two
lines than on other places, which in turn facilitates the expected convergence of the solution DNN.
The whole training process is divided into two phases due to restarting. In Phase I
(pre-training), we train a ResNet of width 50 for 1,000 outer iterations to get a rough solution with
an L^2 relative error of 2.76e-1. All other parameters are shown in Table 5.2. As shown in Figure
5.1(c), the rough solution has already captured the basic shape of the solution. In Phase II
(training), we set this rough solution as the base function b(x) and again set up a ResNet of width
150. It turns out that 50,000 outer iterations are enough to decrease the L^2 error of the solution DNN
to 2.27e-2, as shown in Figure 5.1(d) and Figure 5.1(f). Our method is comparable with
the SDFEM in [38] considering the same order of degrees of freedom, summarized in Table 5.2.
However, the SDFEM in [38] requires a priori knowledge of the characteristic line while our method
does not. Therefore, from the perspective of practical computation, our method is more
convenient in real applications.
To compare Friedrichs learning and the DNN-based least squares (LS) algorithm [15, 50, 60],
we conduct comparative experiments with very similar hyper-parameters, shown in Table 5.3. After
50,000 iterations we obtain a solution with a relative L^2 error of 3.29e-2, as shown in
Figure 5.1(f). It is worth pointing out that the iterations shown are outer iterations, and each iteration
of Friedrichs learning costs about twice as much as the LS approach. Though
Friedrichs learning is more accurate, the DNN-based least squares algorithm and Friedrichs
learning have errors of the same order in this numerical test.

5.2. Advection-Reaction Equation with Curved Discontinuity. Consider the domain
Ω = {(x, y) | x^2 + y^2 ≤ 1, y ≥ 0}. The velocity is β = (sin θ, −cos θ)^⊤ = (y/√(x^2 + y^2), −x/√(x^2 + y^2))^⊤,
with θ being the polar angle, and µ = 0. The Dirichlet boundary condition on the inflow boundary
is given as u(x, 0) = 1 for −1 ≤ x ≤ −1/2 and u(x, 0) = 0 for −1/2 < x ≤ 0. The true solution is

(5.5)   u^*(x, y) = { 0,   x^2 + y^2 < 1/4;   1,   x^2 + y^2 ≥ 1/4. }

Fig. 5.1. Numerical results of Equation (3.1) when the exact solution is chosen as (5.1). (a) The characteristic line (blue) of the exact solution and the line (orange) along which b(x, y) in (5.2) is discontinuous. (b) Exact solution. (c) The solution DNN right before restarting. (d) The point-wise error of the approximate solution at epoch 50,000 by Friedrichs learning. (e) The test DNN value at epoch 50,000. (f) The relative L^2 error curves of DNN-based least squares and Friedrichs learning.

Again, without the prior knowledge of the characteristic line, to create a network satisfying the
boundary condition, we choose a solution DNN φ_s as

(5.6)   φ_s(x; θ_s) = (π/2 − arctan(−x/y)) sin((π/2) r) φ̂_s(x; θ_s) + b(x, y),

where

    b(x, y) = { 0,   x ≥ −1/2;   1,   x < −1/2, }   and   r = √(x^2 + y^2).

φ_s will be applied as the solution network of Friedrichs learning. Similarly,

(5.7)   φ_t(x; θ_t) = −(π/2 − arctan(−x/y)) φ̂_t(x; θ_t).
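A sketch of the solution network (5.6) in PyTorch (illustrative only; `phi_hat` is a generic ResNet, and interior sample points with y > 0 are assumed so that the polar-angle factor is well defined):

```python
import math
import torch

def solution_dnn_curved(phi_hat, xy):
    """Solution DNN (5.6); the angular factor vanishes on the inflow segment {y = 0, -1 <= x < 0}."""
    x, y = xy[:, 0:1], xy[:, 1:2]               # assumes y > 0 (interior samples)
    r = torch.sqrt(x ** 2 + y ** 2)
    b = (x < -0.5).to(x.dtype)                  # b(x, y): 1 for x < -1/2, else 0
    h = (math.pi / 2 - torch.atan(-x / y)) * torch.sin(math.pi / 2 * r)
    return h * phi_hat(xy) + b
```

The test network (5.7) uses the same angular factor without the sine term.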
By applying Friedrichs learning with φ_s and φ_t as the solution and test DNN, respectively, we get an
approximate solution with an L^2 relative error of 2.48e-2, with the iteration error visualized in Figure
5.3(b). Figure 5.2(a) shows the point-wise error after 100,000 iterations by Friedrichs learning.
Friedrichs learning captures the location of the discontinuity sharply. The
test function value is relatively large around the discontinuity, which puts a greater weight
on samples there and helps to obtain a more accurate PDE solution. Our experiments
are implemented on an Nvidia Tesla P100 graphics card with CUDA; in this example, 10,000
iterations take about 50 minutes, about twice the cost of the least squares method.
As a comparison with traditional PDE solvers, note that the same PDE was solved by the
adaptive least-squares finite element method (LSFEM) in [55] with the same order of degrees of
n = 100,000; m_s = 150; m_t = 150; N = 45,000; N_b = 5,000;
η_s^(0) = 3e-4; η_t^(0) = 3e-3; ν_s = 15,000; ν_t = 15,000; number of parameters = 113,850.
Table 5.4: The parameters for the Friedrichs learning solver of the experiment in Section 5.2.

freedom (≈ 1.1 × 10^5) as in Friedrichs learning. The L^2 relative error of LSFEM is 4.59e-2, which
is larger than that of Friedrichs learning. We would like to emphasize that the LSFEM in [55]
applied extra computational resources to adaptively generate the discretization mesh, without which
the error would be larger. Besides, DGFEM^1 with adaptive mesh is also applied to solve the
same PDE with the same order of degrees of freedom (107,332) as in Friedrichs learning. The
L^2 relative error of DGFEM is 2.05e-2, which is very similar to the error of Friedrichs learning.
Following the idea in [55] to visualize the solution, we project the approximate solutions of DGFEM
and Friedrichs learning onto the radial axis in Figure 5.3 and plot the scattered values corresponding to the
angle θ ranging from 0 to π; the points are chosen in the same way as for DGFEM, following the software's built-in
functions. This visualization makes it easier to compare the solutions near the discontinuity.
It is easy to see that the solution of DGFEM has a larger error than that of Friedrichs
learning near the discontinuity.
DNN-based least square is also applied to solve the same problem as a comparison. Two options of DNN-based least square are tested: one with φs as the solution network, so that no penalty term is needed in the loss function to enforce the boundary condition; the other with a standard neural network as the solution network and, hence, a penalty term added to the loss function to enforce the boundary condition. The first option, i.e., DNN-based least square with the special network structure in (5.6) to parametrize the PDE solution, fails to find a reasonable solution even though the optimization loss is almost zero, as shown in Figure 5.2(b). One possible reason is that the squared loss in the strong form is 0 for b(x, y), since DNN-based least square samples points randomly in the interior and, with probability one, none of them lie on the discontinuity line. Therefore, even if the generic network φ̂s(x, θs) is not 0 at the beginning, no information about the discontinuity is captured by the strong form and the solution network converges to 0, resulting in a spurious solution that satisfies the equation almost everywhere in the strong sense. However, this solution is wrong in the weak sense; for instance, the derivatives across the discontinuity contain Dirac delta functions.
The second option of DNN-based least square provides a meaningful solution and serves as a good baseline for Friedrichs learning. Figure 5.2(a) shows the point-wise errors after 100,000 iterations of DNN-based least square with a boundary penalty term and of Friedrichs learning. Friedrichs learning captures the location of the discontinuity line with better accuracy than DNN-based least square. The error curve of DNN-based least square in the L2 norm is shown in Figure 5.2(b) (the red line); the iteration error stagnates at an early stage. DNN-based least square with a boundary penalty term provides a solution with an L2 error of 9.35e-2 after 100,000 iterations, almost 4 times the error of Friedrichs learning.
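For reference, the baseline loss takes the following generic form; this is a minimal sketch assuming an abstract residual routine and an illustrative penalty weight lam, not the exact implementation behind the reported numbers.

    import torch

    def least_squares_loss(u_net, residual, interior_pts, boundary_pts, g_bd, lam=100.0):
        # u_net        : solution network
        # residual     : callable returning the strong-form PDE residual of u_net at given
        #                points (problem dependent; left abstract in this sketch)
        # interior_pts : random interior collocation points, shape (N, d)
        # boundary_pts : random boundary points, shape (Nb, d)
        # g_bd         : prescribed boundary values at boundary_pts, shape (Nb, 1)
        # lam          : boundary penalty weight (an illustrative choice)
        interior_pts.requires_grad_(True)      # autograd is used for the derivatives inside residual
        pde_term = (residual(u_net, interior_pts) ** 2).mean()
        bd_term = ((u_net(boundary_pts) - g_bd) ** 2).mean()
        return pde_term + lam * bd_term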
5.3. Green’s Function. The next example is to identify the Green’s function of the
Laplacian operator by solving

(5.8) ∆u(x) = δ0 (x),


^1 Available at https://github.com/dealii/dealii.

(a) Top, the point-wise error of the solution by DNN-based least square; middle, the point-wise error of the solution by Friedrichs learning; bottom, the point-wise test function value by Friedrichs learning. (b) Upper, the relative L2 error curve with respect to the iteration number for three different algorithm settings; lower, the running DNN-based least square loss with respect to the iteration number.

Fig. 5.2. Numerical results of Equation (3.1) when the exact solution is chosen as (5.5).

(a) Projected solution of DGFEM with an adaptive mesh. (b) Projected solution of Friedrichs learning. The values at points within Euclidean distance 0.005 of x = −1/2 are adjusted to the true value.

Fig. 5.3. Numerical results of Equation (3.1) when the exact solution is chosen as (5.5).

where δ0(x) is the Dirac delta function at the origin. In this example, we solve the above equation on the 3D unit ball Ω = {x ∈ R³ | ‖x‖₂ ≤ 1}. The true solution is

(5.9)    u*(x) = 1 / (8π‖x‖₂),

and the given Dirichlet boundary condition is u(x) = 1/(8π) on ∂Ω. Although the exact solution is in H^1 and has a strong singularity near the origin, Friedrichs learning provides an approximate solution with a small error, as shown in Figures 5.4(a) and 5.4(b). Figure 5.4(b) visualizes the point-wise relative error of the solution by Friedrichs learning. We can see that, except for those
  Parameters | n       | N      | ηs^(0) | νs     | ms
  Value      | 100,000 | 45,000 | 1e-3   | 15,000 | 150

Table 5.5. The parameters of the comparative experiment in Section 5.2.

  Parameters | n      | ms  | mt  | N      | Nb    | ηsp
  Value      | 20,000 | 100 | 100 | 45,000 | 5,000 | 1e-4

  Parameters | ηtp  | ηs^(0) | ηt^(0) | νs     | νt     | parameter number
  Value      | 2e-4 | 1e-5   | 2e-5   | 10,000 | 10,000 | 51,000

Table 5.6. The parameters for the Friedrichs learning solver of the experiment in Section 5.3.

locations that are very close to the origin, the relative errors are not greater than 1e-1. In Table
5.7, we summarize the relative L2 errors of the solution by Friedrichs learning in the region of
Ω\B(0, ε) with ε equal to 0.001, 0.01, 0.1, 0.2, respectively. Therefore, the solution is accurate when
the location is not very close to the origin.
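For completeness, a minimal sketch of how such truncated relative L2 errors can be estimated by Monte Carlo sampling is given below; the sampler, the sample size, and the hypothetical network handle u_net are assumptions for illustration.

    import torch

    def sample_unit_ball(n, d=3):
        # Uniform samples from the d-dimensional unit ball: random directions
        # rescaled by radii distributed as U^(1/d).
        g = torch.randn(n, d)
        directions = g / g.norm(dim=1, keepdim=True)
        radii = torch.rand(n, 1) ** (1.0 / d)
        return directions * radii

    def truncated_relative_l2(u_net, eps, n=100000):
        # Monte Carlo estimate of ||u_net - u*|| / ||u*|| in L2 on Omega \ B(0, eps),
        # where u*(x) = 1 / (8 pi ||x||_2) as in (5.9).
        x = sample_unit_ball(n)
        x = x[x.norm(dim=1) > eps]                   # keep points outside B(0, eps)
        u_true = 1.0 / (8.0 * torch.pi * x.norm(dim=1, keepdim=True))
        with torch.no_grad():
            u_pred = u_net(x)
        num = ((u_pred - u_true) ** 2).mean().sqrt()
        den = (u_true ** 2).mean().sqrt()
        return (num / den).item()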
As a comparison, the DNN-based least square method cannot find a meaningful solution for the Green's function. The right-hand side of (5.8) is a Dirac delta function and, hence, cannot be captured by the discrete analog of the least square loss function via random sampling. Therefore, even if the DNN-based least square method can be applied to form an optimization problem, the minimizer of this problem is essentially a constant function, which has a large error.

(a) The cross section of the Green's function at x3 = 0; the Green's function has a strong singularity near the origin. (b) The projected point-wise relative error by Friedrichs learning on the slice x3 = 0. (c) The relative L2 and maximum error curves with respect to the iteration number.

Fig. 5.4. Numerical results of Equation (3.1) when the exact solution is chosen as (5.9).

5.4. High-Dimensional Advection-Reaction Equation. We consider a 10D advection equation with a discontinuous solution in the domain [0, 1]^10. In particular, we find u = u(x) such that

(5.10)    2 ( 1 + exp( − Σ_{i=3}^{10} xi² ) ) ux1 + exp(2x1) ux2 = 0,

  ε     | mean relative L2 error
  0.2   | 3.47e-2
  0.1   | 4.43e-2
  0.01  | 8.16e-2
  0.001 | 9.39e-2

Table 5.7. The relative L2 errors by Friedrichs learning in the region of Ω\B(0, ε) for the Green's function experiment in Section 5.3.

  Parameters | n      | ms  | mt  | N      | Nb    | ηsp
  Value      | 50,000 | 150 | 150 | 45,000 | 5,000 | 3e-4

  Parameters | ηtp  | ηs^(0) | ηt^(0) | νs     | νt     | parameter number
  Value      | 3e-3 | 5e-5   | 5e-4   | 20,000 | 20,000 | 115,050

Table 5.8. The parameters for the Friedrichs learning solver of the experiment in Section 5.4.

where ux1 = ∂u/∂x1 and ux2 = ∂u/∂x2. The exact solution is

(5.11)    u*(x) = g( exp(2x1) − 4 ( 1 + exp( − Σ_{i=3}^{10} xi² ) ) x2 ),

where g(x) = 1 for x > 0 and g(x) = 0 for x ≤ 0. The Dirichlet boundary condition is given on the inflow boundary {x | x1 = 0 or x2 = 0}.
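For reference, a small sketch of the exact solution (5.11) and of inflow-boundary sampling in this setting is given below; the helper names (exact_solution, sample_inflow_boundary) and the equal split between the two inflow faces are illustrative assumptions rather than our exact implementation.

    import torch

    def exact_solution(x):
        # Exact solution (5.11) of the advection equation (5.10);
        # x has shape (N, 10) with columns x1, ..., x10.
        s = (x[:, 2:] ** 2).sum(dim=1, keepdim=True)     # sum_{i=3}^{10} xi^2
        psi = torch.exp(2 * x[:, :1]) - 4 * (1 + torch.exp(-s)) * x[:, 1:2]
        return (psi > 0).float()                         # g(psi): 1 if psi > 0, else 0

    def sample_inflow_boundary(n):
        # Uniform samples on the inflow boundary {x1 = 0 or x2 = 0} of [0, 1]^10.
        # The equal split between the two faces is an illustrative choice.
        x = torch.rand(n, 10)
        face = torch.randint(0, 2, (n,))
        x[face == 0, 0] = 0.0                            # points with x1 = 0
        x[face == 1, 1] = 0.0                            # points with x2 = 0
        return x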
Figures 5.5(a) and 5.5(c) show that Friedrichs learning identifies the location of low regularity via the test DNN in this high-dimensional problem. After 50,000 outer iterations, we obtain an approximate solution with a relative L2 error of 4.034e-2. As a comparison, DNN-based least square is also applied to solve the same problem; its relative L2 error is 1.015e-1, which is much larger than that of Friedrichs learning. In Figure 5.5(d), we observe that DNN-based least square is not stable during optimization due to the curved discontinuity and ultimately stagnates at a solution with a large error.
5.5. Maxwell Equations. In the last example, we consider the Maxwell equations (3.8) defined in the domain Ω = [0, π]³. Let H and E be the solutions of the Maxwell equations (3.8) with µ = σ = 1. Let f, g ∈ [L²(Ω)]³ be f = (0, 0, 0)^⊤ and g = (3 sin y sin z, 3 sin z sin x, 3 sin x sin y)^⊤. The boundary condition is set as E × n = 0, which is an ideal conductor boundary condition. The exact solutions to these equations are H* = (sin x(cos z − cos y), sin y(cos x − cos z), sin z(cos y − cos x))^⊤ and E* = (sin y sin z, sin z sin x, sin x sin y)^⊤. Considering test functions (ϕ_H^⊤, ϕ_E^⊤)^⊤ in the space V* = V mentioned in (3.4), we set up DNNs to satisfy the boundary conditions ϕE · n = 0 and ϕH × n = 0, where n is the unit outward normal to the boundary. Since the domain is a cube, the normal vector is parallel to one of the coordinate unit vectors, and the boundary condition above is indeed a Dirichlet condition. For example, on the right face S1 = {x = π} ∩ ∂Ω, the condition implies E2|S1 = E3|S1 = 0. It is worth pointing out that the Dirichlet boundary for (Ei, (ϕH)i) covers the faces of the cube shown in Figure 5.6(a). Here, we denote by Ei (i = 1, 2, 3) the i-th component of the vector E, and the same applies to the other notations.
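One simple way to hard-code these boundary conditions on the cube [0, π]³ is to multiply each output component by sine factors that vanish on the relevant faces; the sketch below, including the sub-network architecture VectorNet, is an illustrative assumption and not necessarily the exact construction used in our experiments.

    import torch
    import torch.nn as nn

    class VectorNet(nn.Module):
        # Three sub-networks, one per output component of a vector field on R^3;
        # the width and depth are illustrative.
        def __init__(self, width=250):
            super().__init__()
            self.subnets = nn.ModuleList([
                nn.Sequential(nn.Linear(3, width), nn.Tanh(),
                              nn.Linear(width, width), nn.Tanh(),
                              nn.Linear(width, 1))
                for _ in range(3)
            ])

        def forward(self, xyz):
            return torch.cat([net(xyz) for net in self.subnets], dim=1)

    def tangential_zero(raw, xyz):
        # Multiply component i by sines of the other two coordinates so that the
        # tangential trace vanishes on every face of [0, pi]^3 (E x n = 0, phi_H x n = 0).
        x, y, z = xyz[:, :1], xyz[:, 1:2], xyz[:, 2:3]
        factors = torch.cat([torch.sin(y) * torch.sin(z),
                             torch.sin(z) * torch.sin(x),
                             torch.sin(x) * torch.sin(y)], dim=1)
        return factors * raw

    def normal_zero(raw, xyz):
        # Multiply component i by the sine of its own coordinate so that the
        # normal trace vanishes on every face of [0, pi]^3 (phi_E . n = 0).
        return torch.sin(xyz) * raw

In such a construction, the solution network for E and the test network for ϕH would use tangential_zero, the test network for ϕE would use normal_zero, and no essential boundary condition is imposed on H.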

(a) The projected point-wise error by Friedrichs learning on the slice xi = 1/2, i = 3, 4, . . . , 10. (b) The projected point-wise error by DNN-based least square on the slice xi = 1/2, i = 3, 4, . . . , 10. (c) The projected point-wise test function value on the slice xi = 1/2, i = 3, 4, . . . , 10. (d) The relative L2 error curves with respect to the iteration number by DNN-based least square and Friedrichs learning.

Fig. 5.5. Numerical results of Equation (3.1) when the exact solution is chosen as (5.11).

  Parameters | n      | N      | ηs^(0) | νs     | ms
  Value      | 50,000 | 45,000 | 1e-3   | 20,000 | 150

Table 5.9. The parameters of the comparative experiment in Section 5.4.

To solve the Maxwell equations by Friedrichs learning, we initialize sub-networks of width ms for the vector-valued functions, where each sub-network outputs one component of the vector function. The test networks are set up similarly. We list all the parameters used in this experiment in Table 5.10. After 20,000 outer iterations, we obtain a relative L2 error of 1.766e-2 and a relative L∞ error of 3.467e-2. Figures 5.6(c) and 5.6(d) illustrate the absolute differences between E1 and (φE)1 and between H1 and (φH)1, respectively, after 20,000 outer iterations.
6. Conclusion. Friedrichs learning was proposed as a new deep learning methodology to learn the weak solutions of PDEs via Friedrichs' seminal minimax formulation. Extensive numerical results imply that our mesh-free method provides reasonably accurate solutions for a wide range of PDEs defined on regular and irregular domains in various dimensions, where classical numerical methods may be difficult to employ. In particular, Friedrichs learning infers the solution

(a) The boundary conditions of (E1, (φH)1). (b) The relative error versus iterations. (c) The absolute difference between E1 and (φE)1 after 20,000 outer iterations. (d) The absolute difference between H1 and (φH)1 after 20,000 outer iterations.

Fig. 5.6. Numerical results of the Maxwell equations in (3.8).

  Parameters | n      | ms  | mt | N
  Value      | 20,000 | 250 | 50 | 50,000

  Parameters | ηs^(0) | ηt^(0) | νs    | νt
  Value      | 3e-6   | 3e-3   | 8,000 | 15,000

Table 5.10. The parameters for Friedrichs learning solver of the experiment in Section 5.5.

without knowledge of the location of the discontinuity when the solution is discontinuous. Our numerical experiments show that Friedrichs learning can solve PDEs with a discontinuous solution to O(10^−2) accuracy, while the DNN-based least square method typically only reaches O(10^−1) accuracy. This demonstrates the advantage of the loss function in Friedrichs learning over the naive least square loss function. Compared with traditional finite element methods, when no prior knowledge of the discontinuity location is available, Friedrichs learning performs as well as DGFEM with an adaptive mesh and better than LSFEM with an adaptive mesh. In the future, it would be interesting to develop adaptive Friedrichs learning to further reduce the error or the network size.

Acknowledgements. J. H. was partially supported by NSFC (Grant No. 12071289), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA25010402) and Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102). C. W. was partially supported by National Science Foundation Awards DMS-2136380 and DMS-2206333. H. Y. was partially supported by NSF Awards DMS-2244988 and DMS-2206333, ONR N00014-23-1-2007, and an NVIDIA GPU grant.

REFERENCES

[1] Robert A. Adams, Sobolev spaces, Pure and Applied Mathematics, Vol. 65, Academic Press [Harcourt Brace
Jovanovich, Publishers], New York-London, 1975.
[2] A. Al-Dujaili, S. Srikant, E. Hemberg, and U.-M. O’Reilly, On the application of danskin’s theorem to
derivative-free minimax problems, in AIP Conference Proceedings, vol. 2070, AIP Publishing LLC, 2019,
p. 020026.
[3] M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University
Press, New York, NY, USA, 1st ed., 2009.
[4] N. Antonić and K. Burazin, Graph spaces of first-order linear partial differential operators, Math. Commun.,
14 (2009), pp. 135–155.
[5] N. Antonić and K. Burazin, Intrinsic boundary conditions for Friedrichs systems, Comm. Partial Differential Equations, 35 (2010), pp. 1690–1715.
[6] J.-P. Aubin, Applied functional analysis, Pure and Applied Mathematics (New York), Wiley-Interscience, New
York, second ed., 2000. With exercises by Bernard Cornet and Jean-Michel Lasry, Translated from the
French by Carole Labrousse.
[7] G. Bao, X. Ye, Y. Zang, and H. Zhou, Numerical solution of inverse problems by weak adversarial networks,
Inverse Problems, 36 (2020), pp. 115003, 31.
[8] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions
on Information theory, 39 (1993), pp. 930–945.
[9] C. Beck, S. Becker, P. Cheridito, A. Jentzen, and A. Neufeld, Deep splitting method for parabolic
PDEs, SIAM J. Sci. Comput., 43 (2021), pp. A3135–A3154.
[10] C. Beck, S. Becker, P. Grohs, N. Jaafari, and A. Jentzen, Solving the Kolmogorov PDE by means of
deep learning, J. Sci. Comput., 88 (2021), pp. Paper No. 73, 28.
[11] J. Berg and K. Nyström, A unified deep artificial neural network approach to partial differential equations
in complex geometries, Neurocomputing, 317 (2018), pp. 28–41.
[12] T. Bui-Thanh, L. Demkowicz, and O. Ghattas, A unified discontinuous Petrov-Galerkin method and its
analysis for Friedrichs’ systems, SIAM J. Numer. Anal., 51 (2013), pp. 1933–1958.
[13] W. Cai, X. Li, and L. Liu, A phase shift deep neural network for high frequency approximation and wave
problems, SIAM J. Sci. Comput., 42 (2020), pp. A3285–A3312.
[14] C. Daskalakis and I. Panageas, The limit points of (optimistic) gradient descent in min-max optimization,
in Proceedings of the 32Nd International Conference on Neural Information Processing Systems, NIPS’18,
USA, 2018, Curran Associates Inc., pp. 9256–9266.
[15] M. W. M. G. Dissanayake and N. Phan-Thien, Neural-network-based approximations for solving partial
differential equations, communications in Numerical Methods in Engineering, 10 (1994), pp. 195–201.
[16] W. E, J. Han, and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic par-
tial differential equations and backward stochastic differential equations, Commun. Math. Stat., 5 (2017),
pp. 349–380.
[17] W. E, C. Ma, and L. Wu, Barron Spaces and the Compositional Function Spaces for Neural Network Models,
arXiv e-prints, arXiv:1906.08039 (2019).
[18] W. E and B. Yu, The deep Ritz method: a deep learning-based numerical algorithm for solving variational
problems, Commun. Math. Stat., 6 (2018), pp. 1–12.
[19] M. Ehrhardt and R. E. Mickens, A fast, stable and accurate numerical method for the Black-Scholes equation
of American options, Int. J. Theor. Appl. Finance, 11 (2008), pp. 471–501.
[20] A. Ern and J.-L. Guermond, Discontinuous Galerkin methods for Friedrichs’ systems. I. General theory,
SIAM J. Numer. Anal., 44 (2006), pp. 753–778.
[21] A. Ern and J.-L. Guermond, Discontinuous Galerkin methods for Friedrichs’ systems. II. Second-order elliptic PDEs, SIAM J. Numer. Anal., 44 (2006), pp. 2363–2388.
[22] A. Ern and J.-L. Guermond, Discontinuous Galerkin methods for Friedrichs’ systems. III. Multifield theories with partial coercivity, SIAM J. Numer. Anal., 46 (2008), pp. 776–804.

[23] A. Ern, J.-L. Guermond, and G. Caplain, An intrinsic criterion for the bijectivity of Hilbert operators
related to Friedrichs’ systems, Comm. Partial Differential Equations, 32 (2007), pp. 317–341.
[24] K. O. Friedrichs, Symmetric positive linear differential equations, Comm. Pure Appl. Math., 11 (1958),
pp. 333–418.
[25] A. Gaikwad and I. M. Toke, Gpu based sparse grid technique for solving multidimensional options pricing
pdes, in Proceedings of the 2Nd Workshop on High Performance Computational Finance, WHPCF ’09, New
York, NY, USA, 2009, ACM, pp. 6:1–6:9.
[26] D. Gobovic and M. E. Zaghloul, Analog cellular neural network with application to partial differential
equations with variable mesh-size, in Proceedings of IEEE International Symposium on Circuits and Systems
- ISCAS ’94, vol. 6, May 1994, pp. 359–362 vol.6.
[27] P. Grisvard, Elliptic Problems in Nonsmooth Domains, Pitman, Boston, 1985.
[28] Y. Gu, C. Wang, and H. Yang, Structure probing neural network deflation, J. Comput. Phys., 434 (2021),
pp. Paper No. 110231, 21.
[29] Y. Gu, H. Yang, and C. Zhou, SelectNet: self-paced learning for high-dimensional partial differential equa-
tions, J. Comput. Phys., 441 (2021), pp. Paper No. 110444, 18.
[30] I. Gühring, G. Kutyniok, and P. Petersen, Error bounds for approximations with deep ReLU neural
networks in W s,p norms, Anal. Appl. (Singap.), 18 (2020), pp. 803–859.
[31] I. Gühring and M. Raslan, Approximation rates for neural networks with encodable weights in smoothness
spaces, Neural Networks, 134 (2021), p. 107–130.
[32] J. Han, A. Jentzen, and W. E, Solving high-dimensional partial differential equations using deep learning,
Proc. Natl. Acad. Sci. USA, 115 (2018), pp. 8505–8510.
[33] K. Hanada, T. Wada, and Y. Fujisaki, A restart strategy with time delay in distributed minimax optimization,
in Theory and Practice of Computation: Proceedings of Workshop on Computation: Theory and Practice
WCTP2017, World Scientific, 2019, pp. 89–100.
[34] R. Hassan, L. Mingrui, L. Qihang Lin, and Y. Tianbao, Non-convex min-max optimization: Provable
algorithms and applications in machine learning, ArXiv, abs/1810.02060 (2018).
[35] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[36] G. Hinton, N. Srivastava, and K. Swersky, Neural networks for machine learning lecture 6a overview of
mini-batch gradient descent, Cited on, 14 (2012), p. 2.
[37] S. Hon and H. Yang, Simultaneous neural network approximation for smooth functions, Neural Networks, 154
(2022), pp. 152–164.
[38] P. Houston, C. Schwab, and E. Süli, Stabilized hp-finite element methods for first-order hyperbolic problems,
SIAM J. Numer. Anal., 37 (2000), pp. 1618–1643.
[39] X. Hu, R. Shonkwiler, and M. C. Spruill, Random restarts in global optimization, (2009).
[40] J. Huang, H. Wang, and H. Yang, Int-Deep: a deep learning initialized iterative method for nonlinear
problems, J. Comput. Phys., 419 (2020), pp. 109675, 24.
[41] M. Hutzenthaler, A. Jentzen, T. Kruse, and T. A. Nguyen, A proof that rectified deep neural networks
overcome the curse of dimensionality in the numerical approximation of semilinear heat equations, Partial
Differ. Equ. Appl., 1 (2020), pp. Paper No. 10, 34.
[42] M. Hutzenthaler, A. Jentzen, T. Kruse, T. A. Nguyen, and P. von Wurstemberger, Overcoming the
curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations,
Proc. A., 476 (2020), pp. 20190630, 25.
[43] M. Hutzenthaler, A. Jentzen, and P. von Wurstemberger, Overcoming the curse of dimensionality
in the approximative pricing of financial derivatives with default risks, Electron. J. Probab., 25 (2020),
pp. Paper No. 101, 73.
[44] A. D. Jagtap and G. E. Karniadakis, Extended physics-informed neural networks (XPINNs): a generalized
space-time domain decomposition based deep learning framework for nonlinear partial differential equations,
Commun. Comput. Phys., 28 (2020), pp. 2002–2041.
[45] M. Jensen, Discontinuous Galerkin Methods for Friedrichs’ Systems with Irregular Solutions, Ph.D. thesis,
University of Oxford, Oxford, (2004).
[46] W. Joubert, On the convergence behavior of the restarted GMRES algorithm for solving nonsymmetric linear
systems, Numer. Linear Algebra Appl., 1 (1994), pp. 427–447.
[47] E. Kharazmi, Z. Zhang, and G. E. M. Karniadakis, hp-VPINNs: variational physics-informed neural
networks with domain decomposition, Comput. Methods Appl. Mech. Engrg., 374 (2021), pp. Paper No.
113547, 25.
[48] Y. Khoo, J. Lu, and L. Ying, Solving parametric PDE problems with artificial neural networks, European J.
Appl. Math., 32 (2021), pp. 421–435.
[49] D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, arXiv e-prints, (2014).
