0% found this document useful (0 votes)

26 views24 pages

Data-Driven Finite Elements Methods

This document introduces data-driven finite element methods that can accurately approximate quantities of interest even on coarse meshes. The methods are obtained by training the parameters of a stable parametric Petrov-Galerkin finite element discretization using available data. An artificial neural network is used to define a family of test spaces, and the network parameters are tuned to minimize errors in quantities of interest. Numerical examples for diffusion and advection problems demonstrate the method achieves superior accuracy compared to standard finite elements on coarse meshes.

Uploaded by

Stephen Pan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views24 pages

Data-Driven Finite Elements Methods

Uploaded by

Stephen Pan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 24

Data-Driven Finite Elements Methods:

Machine Learning Acceleration of

arXiv:2003.04485v1 [math.NA] 10 Mar 2020

Goal-Oriented Computations

I. Brevis∗, I. Muga†, and K.G. van der Zee‡

11th March, 2020

Abstract
We introduce the concept of data-driven finite element methods. These are finite-element
discretizations of partial differential equations (PDEs) that resolve quantities of interest
with striking accuracy, regardless of the underlying mesh size. The methods are obtained
within a machine-learning framework during which the parameters defining the method
are tuned against available training data. In particular, we use a stable parametric
Petrov–Galerkin method that is equivalent to a minimal-residual formulation using a
weighted norm. While the trial space is a standard finite element space, the test space
has parameters that are tuned in an off-line stage. Finding the optimal test space
therefore amounts to obtaining a goal-oriented discretization that is completely tailored
towards the quantity of interest. As is natural in deep learning, we use an artificial
neural network to define the parametric family of test spaces. Using numerical examples
for the Laplacian and advection equation in one and two dimensions, we demonstrate
that the data-driven finite element method has superior approximation of quantities of
interest even on very coarse meshes

Keywords Goal-oriented finite elements · Machine-Learning acceleration · Residual Mini-

mization · Petrov-Galerkin method · Weighted inner-products · Data-driven algorithms.
MSC 2020 41A65 · 65J05 · 65N15 · 65N30 · 65L60 · 68T07
∗
Pontificia Universidad Católica de Valparaı́so, Instituto de Matemáticas.
[email protected]
†
Pontificia Universidad Católica de Valparaı́so, Instituto de Matemáticas.
[email protected]
‡
University of Nottingham, School of Mathematical Sciences.
[email protected]

1
Contents
1 Introduction 2
1.1 Motivating example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Related literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Methodology 6
2.1 Abstract problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Main idea of the accelerated methods . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Analysis of the discrete method . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.1 Equivalent Petrov-Galerkin formulation . . . . . . . . . . . . . . . . . . 9
2.3.2 Equivalent Minimal Residual formulation . . . . . . . . . . . . . . . . . 9

3 Implementational details 10
3.1 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Offline procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Online procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4 Numerical tests 12
4.1 1D diffusion with one QoI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.2 1D advection with one QoI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.3 1D advection with multiple QoIs . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.4 2D diffusion with one QoI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

5 Conclusions 19

A Proof of Theorem 2.B 20

1 Introduction
In this paper we consider the data-driven acceleration of Galerkin-based discretizations, in
particular the finite element method, for the approximation of partial differential equations
(PDEs). The aim is to obtain approximations on meshes that are very coarse, but nevertheless
resolve quantities of interest with striking accuracy.
We follow the machine-learning framework of Mishra [27], who considered the data-driven
acceleration of finite-difference schemes for ordinary differential equations (ODEs) and PDEs.
In Mishra’s machine learning framework, one starts with a parametric family of a stable and
consistent numerical method on a fixed mesh (think of, for example, the θ-method for ODEs).
Then, a training set is prepared, typically by offline computations of the PDE subject to

2
Data-driven FEM: Machine learning acceleration of goal-oriented computations 3

a varying set of data values (initial conditions, boundary conditions, etc), using a standard
method on a (very) fine mesh. Accordingly, an optimal numerical method on the coarse grid
is found amongst the general family, by minimizing a loss function consisting of the errors in
quantities of interest with respect to the training data.
The objective of this paper is to extend Mishra’s machine-learning framework to finite
element methods. The main contribution of our work lies in the identification of a proper
stable and consistent general family of finite element methods for a given mesh that allows
for a robust optimization. In particular, we consider a parametric Petrov–Galerkin method,
where the trial space is fixed on the given mesh, but the test space has trainable parameters
that are to be determined in the offline training process. Finding this optimized test space
therefore amounts to obtaining a coarse-mesh discretization that is completely tailored for the
quantity of interest.
A crucial aspect for the stability analysis is the equivalent formulation of the parametric
Petrov–Galerkin method as a minimal-residual formulation using discrete dual norms. Such
techniques have been studied in the context of discontinuous Petrov–Galerkin (DPG) and
optimal Petrov–Galerkin methods; see for example the overview by Demkowicz & Gopalakr-
ishnan [8] (and also [29] for the recent Banach-space extension). A key insight is that we can
define a suitable test-space parametrization, by using a (discrete) trial-to-test operator for a
test-space norm based on a parametric weight function. This allows us to prove the stability
of the parametric minimal-residual method, and thus, by equivalence, proves stability for the
parametric Petrov–Galerkin method.
As is natural in deep learning, we furthermore propose to use an artificial neural network
for the weight function defining the test space in the Petrov–Galerkin method. The training
of the tuning parameters in the neural network is thus achieved by a minimization of a loss
function that is implicitly defined by the neural network (indeed via the weight function that
defines the test space, which in turn defines the Petrov-Galerkin approximation, which in turn
leads to a value for the quantity of interest).

1.1 Motivating example

To briefly illustrate our idea, let us consider a very simple motivating example. We consider
the following simple 1-D elliptic boundary-value problem:

−u00λ = δλ

in (0, 1),
0 (1)
uλ (0) = uλ (1) = 0,

where δλ denotes the usual Dirac’s delta distribution centered at the point λ ∈ (0, 1). The
quantity of interest (QoI) is the value uλ (x0 ) of the solution at some fixed point x0 ∈ (0, 1).
4 I. Brevis, I. Muga, and K.G. van der Zee

The standard variational formulation of problem (1) reads:

 1
 Find uλ ∈ H(0 (0, 1) such that:
Z 1
(2)
 u0λ v 0 = v(λ), 1
∀v ∈ H(0 (0, 1),
0

where H(01
(0, 1) := {v ∈ L2 (0, 1) : v 0 ∈ L2 (0, 1) ∧ v(0) = 0}. For the very coarse discrete
1
subspace Uh := Span{ψ} ⊂ H(0 (0, 1) consisting of the single linear trial function ψ(x) = x, the
usual Galerkin method approximating (2) delivers the discrete solution uh (x) = λx. However,
the exact solution to (1) is:

x if x ≤ λ,
uλ (x) = (3)
λ if x ≥ λ.
Hence, the relative error in the QoI for this case becomes:

|uλ (x0 ) − uh (x0 )| 1 − λ if x0 ≤ λ,
= (4)
|uλ (x0 )| 1 − x0 if x0 ≥ λ,

As may be expected for this very coarse approximation, the relative errors are large (and
actually never vanishes except in limiting cases).
Let us instead consider a Petrov–Galerkin method for (2), with the same trial space Uh ,
R1
but a special test space Vh , i.e., uh ∈ Uh := Span{ψ} such that 0 u0h vh0 = vh (λ), for all
vh ∈ Vh := Span{ϕ}. We use the parametrized test function ϕ(x) = θ1 x + e−θ2 (1 − e−θ1 x ),
which is motivated by the simplest artificial neural network; see Section 4.1 for details. By
varying the parameters θ1 , θ2 ∈ R, the errors in the quantity of interest can be significantly
reduced. Indeed, Figure 1 shows the relative error in the QoI, plotted as a function of the
θ1 -parameter, with the other parameter set to θ2 = −9, in the case of x0 = 0.1 and two values
of λ. When λ = 0.15 > 0.1 = x0 (left plot in Figure 1), the optimal value θ1 ≈ 48.5 delivers
a relative error of 0.575% in the quantity of interest. Notice that the Galerkin method has a
relative error > 80%. For λ = 0.05 < 0.1 = x0 (right plot in Figure 1), the value θ1 ≈ 13.9
actually delivers an exact approximation of the QoI, while the Galerkin method has a relative
error ≈ 90%.
This example illustrates a general trend that we have observed in our numerical test (see
Section 4): Striking improvements in quantities of interest are achieved using well-tuned test
spaces.

1.2 Related literature

Let us note that deep learning, in the form of artificial neural networks, has become extremely
popular in scientific computation in the past few years, a crucial feature being the capacity
of neural networks to approximate any continuous function [6]. While classical applications
concern classification and prediction for image and speech recognition [14, 24, 18], there have
Data-driven FEM: Machine learning acceleration of goal-oriented computations 5

(a) Relative error for λ = 0.15 (b) Relative error for λ = 0.05.

Figure 1: Relative error in the quantity of interest x0 = 0.1, for different values of θ1 .

been several new advances related to differential equations, either focussing on the data-driven
discovery of governing equations [34, 3, 31] or the numerical approximation of (parametric)
differential equations.
On the one hand, artificial neural networks can be directly employed to approximate a
single PDE solution, see e.g. [2, 23, 25], and in particular the recent high-dimensional Ritz
method [10]. On the other hand, in the area of model order reduction of differential equations,
there have been tremendous recent developments in utilizing machine learning to obtain the
reduced-order model for parametric models [19, 17, 33, 36, 22]. These developments are very
closely related to recent works that use neural networks to optimize numerical methods, e.g.,
tuning the turbulence model [26], slope limiter [32] or artificial viscosity [9].
The idea of goal-oriented adaptive (finite element) methods date back to the late 1990s,
see e.g., [1, 30, 28] for early works and analysis, and [13, 21, 38, 11, 16] for some recent
new developments. These methods are based on a different idea than the machine-learning
framework that we propose. Indeed, the classical goal-oriented methods aim to adaptively
refine the underlying meshes (or spaces) so as to control the error in the quantity of interest,
thereby adding more degrees of freedom at each adaptive step. In our framework, we train a
finite element method so as to control the error in the quantity of interest based on training
data for a parametric model. In particular, we do not change the number of degrees of freedom.

1.3 Outline
The contents of this paper are arranged as follows. Section 2 presents the machine-learning
methodology to constructing data-driven finite element methods. It also presents the stability
analysis of the discrete method as well as equivalent discrete formulations. Section 3 presents
several implementational details related to artificial neural networks and the training proce-
6 I. Brevis, I. Muga, and K.G. van der Zee

dure. Section 4 present numerical experiments for 1-D and 2-D elliptic and hyperbolic PDEs.
Finally, Section 5 contains our conclusions.

2 Methodology
2.1 Abstract problem
Let U and V be infinite dimensional Hilbert spaces spaces, with respective dual spaces U∗
and V∗ . Consider a boundedly invertible linear operator B : U → V∗ , a family of right-hand-
side functionals {`λ }λ∈Λ ⊂ V∗ that may depend non-affinely on λ, and a quantity of interest
functional q ∈ U∗ . Given λ ∈ Λ, the continuous (or infinite-dimensional) problem will be to
find uλ ∈ U such that:
Buλ = `λ , in V∗ , (5)
where the interest is put in the quantity q(uλ ). In particular, we consider the case when
hBu, viV∗ ,V := b(u, v), for a given bilinear form b : U × V → R. If so, problem (5) translates
into:
Find uλ ∈ U such that:
(6)
b(uλ , v) = `λ (v), ∀v ∈ V,
which is a type of problem that naturally arises in the context of variational formulations of
partial differential equations with multiple right-hand-sides or parametrized PDEs.1

2.2 Main idea of the accelerated methods

We assume that the space V can be endowed with a family of equivalent weighted inner
products {(·, ·)V,ω }ω∈W and inherited norms {k · kV,ω }ω∈W , without affecting the topology
given by the original norm k · kV on V. That is, for each ω ∈ W, there exist equivalence
constants C1,ω > 0 and C2,ω > 0 such that:

C1,ω kvkV,ω ≤ kvkV ≤ C2,ω kvkV,ω , ∀v ∈ V. (7)

Consider a coarse finite dimensional subspace Uh ⊂ U where we want to approximate the

solution of (6), and let Vh ⊂ V be a discrete test space such that dim Vh ≥ dim Uh . The
discrete method that we want to use to approach the solution uλ ∈ U of problem (6), is to
find (rh,λ,ω , uh,λ,ω ) ∈ Vh × Uh such that:
(
(rh,λ,ω , vh )V,ω + b(uh,λ,ω , vh ) = `λ (vh ) ∀vh ∈ Vh , (8a)
b(wh , rh,λ,ω ) =0 ∀wh ∈ Uh . (8b)
1
While parametrized bilinear forms bλ (·, ·) are also possible, they lead to quite distinct algorithmic details.
We therefore focus on parametrized right-hand sides and leave parametrized bilinear forms for future work.
Data-driven FEM: Machine learning acceleration of goal-oriented computations 7

System (8) corresponds to a residual minimization in a discrete dual norm that is equivalent
to a Petrov–Galerkin method. See Section 2.3 for equivalent formulations and analysis of this
discrete approach. In particular, the counterpart rh,λ,ω ∈ Vh of the solution of (8) is interpreted
as a minimal residual representative, while uh,λ,ω ∈ Uh is the coarse approximation of uλ ∈ U
that we are looking for.
Assume now that one has a reliable sample set of Ns ∈ N (precomputed) data
{(λi , q(uλi ))}N
i=1 , where q(uλi ) is either the quantity of interest of the exact solution of (6)
s

with λ = λi ∈ Λ, or else, a high-precision approximation of it. The main goal of this paper is
to find a particular weight ω ∗ ∈ W, such that for the finite sample of parameters {λi }N i=1 ⊂ Λ,
s

Ns
the discrete solutions {uh,λi ,ω∗ }i=1 ⊂ Uh of problem (8) with ω = ω ∗ , makes the errors in the
quantity of interest as smallest as possible, i.e.,
N s
1X
|q(uλi ) − q(uh,λi ,ω∗ )|2 → min . (9)
2 i=1

To achieve this goal we will work with a particular family of weights described by artificial
neural networks (ANN). The particular optimal weight ω ∗ will be trained using data-driven al-
gorithms that we describe in the following. Our methodology will be divided into an expensive
offline procedure (see Section 3.2) and an unexpensive online procedure (see Section 3.3).
In the offline procedure:

• A weight function ω ∗ ∈ W that minimizes (9) for a sample set of training data
{(λi , q(uλi ))}N
i=1 is obtained.
s

• From the matrix related with the discrete mixed formulation (8) using ω = ω ∗ , a static
condensation procedure is applied to condense-out the residual variable rh,λ,ω∗ . The
condensed matrices are store for the online procedure.

In the online procedure:

• The condensed mixed system (8) with ω = ω ∗ is solved for multiple right-hand-sides in
{`λ }λ∈Λ , and the quantities of interest {q(uh,λ,ω∗ )}λ∈Λ are directly computed as reliable
approximations of {q(uλ )}λ∈Λ .

2.3 Analysis of the discrete method

In this section we analyze the well-posedness of the discrete system (8), as well as equiva-
lent interpretations of it. The starting point will be always to assume well-posedness of the
continuous (or infinite-dimensional) problem (6), which we will establish below.

Theorem 2.A Let (U, k · kU ) and (V, k · kV ) be Hilbert spaces, and let k · kV,ω be the norm in-
herited from the weighted inner-product (·, ·)V,ω , which satisfies the equivalence (7). Consider
8 I. Brevis, I. Muga, and K.G. van der Zee

the problem (6) and assume the existence of constants Mω > 0 and γω > 0 such that:
|b(u, v)|
γω kukU ≤ sup ≤ Mω kukU , ∀u ∈ U. (10)
v∈V kvkV,ω
Furthermore, assume that for any λ ∈ Λ:
h`λ , viV∗ ,V = 0, ∀v ∈ V such that b(·, v) = 0 ∈ U∗ . (11)
Then, for any λ ∈ Λ, there exists a unique uλ ∈ U solution of problem (6).

Proof This result is classical. Using operator notation (see eq. (5)), condition (10) says that
the operator B : U → V∗ such that hBu, viV∗ ,V = b(u, v) is continuous, injective and has a
closed range. In particular, if uλ ∈ U exists, then it must be unique. The existence of uλ is
guaranteed by condition (11), since `λ is orthogonal to the kernel of B ∗ , which means that `λ
is in the range of B by the Banach closed range theorem.

Remark 2.1 Owing to the equivalence of norms (7), if (10) holds true for a particular weight
ω ∈ W, then it also holds true for the original norm k · kV of V, and for any other weighted
norm linked to the family of weights W.

The next Theorem 2.B establishes the well-posedness of the discrete mixed scheme (8).
Theorem 2.B Under the same assumptions of Theorem 2.A, let Uh ⊂ U and Vh ⊂ V be
finite dimensional subspaces such that dim Vh ≥ dim Uh , and such that the following discrete
inf-sup conditon is satisfied:
|b(uh , vh )|
sup ≥ γh,ω kuh kU , ∀uh ∈ Uh , (12)
vh ∈Vh kvh kV,ω
where γh,ω > 0 is the associated discrete inf-sup constant. Then, the mixed system (8) has a
unique solution (rh,λ,ω , uh,λ,ω ) ∈ Vh × Uh . Moreover, uh,λ,ω satifies the a priori estimates:
Mω Mω
kuh,λ,ω kU ≤ kuλ kU and kuλ − uh,λ,ω kU ≤ inf kuλ − uh kU . (13)
γh,ω γh,ω uh ∈Uh

Proof See Appendix A.

Remark 2.2 It is straightforward to see, using the equivalences of norms (7), that having the
discrete inf-sup condition in one weighted norm k · kV,ω is fully equivalent to have the discrete
inf-sup condition in the original norm of V, and also to have the discrete inf-sup condition in
another weighted norm linked to the family of weights W. If (12) holds true for any weight
of the family W (or for the original norm of V) we say that the the discrete pairing Uh -Vh is
compatible.

Remark 2.3 (Influence of the weight) In general, to make the weight ω ∈ W influence
the mixed system (8), we need dim Vh > dim Uh . In fact, the case dim Vh = dim Uh is not
interesting because equation (8b) becomes a square system and one would obtain rh,λ,ω = 0
from it, thus recovering a standard Petrov-Galerkin method without any influence of ω.
Data-driven FEM: Machine learning acceleration of goal-oriented computations 9

2.3.1 Equivalent Petrov-Galerkin formulation

For any weight ω ∈ W, consider the trial-to-test operator Tω : U → V such that:

(Tω u, v)V,ω = b(u, v) , ∀u ∈ U, ∀v ∈ V. (14)

Observe that for any u ∈ U, the vector Tω u ∈ V is nothing but the Riesz representative of the
functional b(u, ·) ∈ V∗ under the weighted inner-product (·, ·)V,ω .
Given a discrete subspace Uh ⊂ U, the optimal test space paired with Uh , is defined as
Tω Uh ⊂ V. The concept of optimal test space was introduced by [7] and its main advantage is
that the discrete pairing Uh -Tω Uh (with equal dimensions) satisfies automatically the inf-sup
condition (12), with inf-sup constant γω > 0, inherited from the stability at the continuous
level (see eq. (10)).
Of course, equation (14) is infinite dimensional and thus not solvable in the general case.
Instead, having the discrete finite-dimensional subspace Vh ⊂ V, we can define the discrete
trial-to-test operator Th,ω : U → Vh such that:

(Th,ω u, vh )V,ω = b(u, vh ) , ∀u ∈ U, ∀vh ∈ Vh . (15)

Observe now that the vector Th,ω u ∈ Vh corresponds to the orthogonal projection of Tω u into
the space Vh , by means of the weighted inner-product (·, ·)V,ω . This motivates the definition
of the projected optimal test space (of the same dimension of Uh ) as Vh,ω := Th,ω Uh (cf. [4]).
It can be proven that if the discrete pairing Uh -Vh satisfies the inf-sup condition (12), then
the discrete pairing Uh -Vh,ω also satisfies the inf-sup conditon (12), with the same inf-sup
constant γh,ω > 0. Moreover, the solution uh,λ,ω ∈ Uh of the mixed system (8) is also the
unique solution of the well-posed Petrov-Galerkin scheme:

b(uh,λ,ω , vh ) = `λ (vh ), ∀vh ∈ Vh,ω . (16)

Indeed, from equation (8b), for any vh = Th,ω wh ∈ Vh,ω ⊂ Vh , we obtain that

(rh,λ,ω , vh )V,ω = (rh,λ,ω , Th,ω wh )V,ω = b(wh , rh,λ,ω ) = 0,

which upon being replaced in equation (8a) of the mixed system gives (16). We refer to [4,
Proposition 2.2] for further details.

2.3.2 Equivalent Minimal Residual formulation

Let Uh ⊂ U and Vh ⊂ V as in Theorem 2.B, and consider the following discrete-dual residual
minimization:
| h · , vh iV∗ ,V |
uh,λ,ω = argmin k`λ (·) − b(wh , ·)k(Vh )∗ω , where k · k(Vh )∗ω := sup .
wh ∈Uh vh ∈Vh kvh kV,ω
10 I. Brevis, I. Muga, and K.G. van der Zee

Let Rω,Vh : Vh → (Vh )∗ be the Riesz map (isometry) linked to the weighted inner-product
(·, ·)V,ω , that is:
hRω,Vh vh , · i(Vh )∗ ,Vh = (vh , · )V,ω , ∀vh ∈ Vh .
−1
Defining the minimal residual representative rh,λ,ω := Rω,V h
(`λ (·) − b(uh,λ,ω , ·)), we observe
that the couple (rh,λ,ω , uh,λ,ω ) ∈ Vh × Uh is the solution of the mixed system (8). Indeed,
rh,λ,ω ∈ Vh satisfy:

(rh,λ,ω , vh )V,ω = `λ (vh ) − b(uh,λ,ω , vh ) , ∀vh ∈ Vh ,

which is nothing but equation (8a) of the mixed system. On the other hand, using the isometric
property of Rω,Vh we have:
2
uh,λ,ω = argmin k`λ (·) − b(wh , ·)k2(Vh )∗ω = argmin Rω,V
−1
h
(`λ (·) − b(wh , ·))
V,ω
.
wh ∈Uh wh ∈Uh

Differentiating the norm k · kV,ω and using first-order optimality conditions we obtain:
−1 −1

0 = Rω,Vh
(`λ (·) − b(uh,λ,ω , ·)), Rω,Vh
b(wh , ·) V,ω
= b(wh , rh,λ,ω ), ∀wh ∈ Uh ,

which gives equation (8b).

3 Implementational details
3.1 Artificial Neural Networks
Roughly speaking, an artificial neural network is obtained from compositions and superposi-
tions of a single, simple nonlinear activation or response function [6]. Namely, given an input
xin ∈ Rd and an activation function σ, an artificial neural network looks like:

ANN(xin ) = Θn σ(· · · σ(Θ2 σ(Θ1 xin + φ1 ) + φ2 ) · · · ) + φn , (17)

where {Θj }nj=1 are matrices (of different size) and {φj }nj=1 are vectors (of different length)
of coefficients to be determined by a “training” procedure. Depending on the application,
an extra activation function can be added at the end. A classical activation function is the
logistic sigmoid function:
1
σ(x) = . (18)
1 + e−x
Other common activation functions used in artificial neural network applications are the rec-
tified linear unit (ReLU), the leaky ReLU, and the hyperbolic tangent (see, e.g.[5, 37]).
The process of training an artificial neural network as (17) is performed by the minimization
of a given functional J(Θ1 , φ1 , Θ2 , φ2 , . . . , Θn , φn ). We search for optimal sets of parameters
{Θ∗j }nj=1 and {φ∗j }nj=1 minimizing the cost functional J. For simplicity, in what follows we
Data-driven FEM: Machine learning acceleration of goal-oriented computations 11

will denote all the parameters of an artificial neural network by θ ∈ Φ, for a given set Φ of
admissible parameters. A standard cost functional is constructed with a sample training set
of known values {x1 , x2 , . . . , xNs } and its corresponding labels {y1 , y2 , . . . , yNs } as follows:
sN
1X
J(θ) = (yi − F (ANN(xi ; θ)))2 ,
2 i=1

(for some real function F ) which is known as supervised learning [14]. Training an artificial
neural network means to solve the following minimization problem:

θ∗ = argmin J(θ). (19)

θ∈Φ

Thus, the artificial neural network evaluated in the optimal θ∗ (i.e., ANN(x; θ∗ )) is the trained
network. There are many sophisticated tailor-made procedures to perform the minimization
in (19) efficiently. The reader may refer to [35] for inquiring into this topic, which is out of
the scope of this paper.

3.2 Offline procedures

The first step is to choose an artificial neural network ANN(· ; θ) that will define a family W of
positive weight-functions to be used in the weighted inner products {(·, ·)V,ω }ω∈W . Typically
we have:
W = {ω(·) = g(ANN(· ; θ)) : θ ∈ Φ},
where g is a suitable positive an bounded continuous function.
Next, given a discrete trial-test pairing Uh -Vh satisfying (12), we construct the map W ×
Λ 3 (ω, λ) 7→ q(uh,λ,ω ) ∈ R, where uh,λ,ω ∈ Uh is the second component of the solution the
mixed system (8). Having coded this map, we proceed to train the ANN by computing:
 n
1X
∗
θ = argmin |q(uh,λi ,ω ) − q(uλi )|2 ,





 θ∈Φ 2 i=1
(20)

  ω(·) = g(ANN(· ; θ))



 s.t. (rh,λi ,ω , vh )V,ω + b(uh,λi ,ω , vh ) = `λi (vh ), ∀ vh ∈ Vh ,
∀ wh ∈ Uh .

b(wh , rh,λi ,ω ) = 0,


The last step is to build the matrices of the linear system needed for the online phase.
Denote the basis of Uh by {ψ1 , ..., ψn }, and the basis of Vh by {ϕ1 , ..., ϕm } (recall that m > n).
Having θ∗ ∈ Φ approaching (20), we extract from the mixed system (8) the matrices A ∈ Rm×m
and B ∈ Rm×n such that:

Aij = (ϕj , ϕi )V,ω∗ and Bij = b(ψj , ϕi ),

12 I. Brevis, I. Muga, and K.G. van der Zee

where ω ∗ (·) = g(ANN(· ; θ∗ )). Finally, we store the matrices B T A−1 B ∈ Rn×n and B T A−1 ∈
Rn×m to be used in the online phase to compute directly uh,λ,ω∗ ∈ Uh for any right hand side
`λ ∈ V∗ . Basically, we have condensed-out the residual variable of the mixed system (8), since
it is useless for the application of the quantity of interest q ∈ U∗ . In addition, it will be also
important to store the vector Q ∈ Rn such that:

Qj := q(ψj ) , j = 1, ..., n.

3.3 Online procedures

For each λ ∈ Λ for which we want to obtain the quantity of interest q(uh,λ,ω∗ ), we first compute
the vector Lλ ∈ Rm such that its i-th component is given by:

(Lλ )i = h`λ , ϕi iV∗ ,V ,

where ϕi is the i-th vector of in the basis of Vh . Next, we compute

q(uh,λ,ω∗ ) = QT (B T A−1 B)−1 B T A−1 Lλ .

Observe that the matrix QT (B T A−1 B)−1 B T A−1 can be fully obtained and stored from the
previous offline phase (see Section 3.2).

4 Numerical tests
In this section, we show some numerical examples in 1D and 2D to investigate the main
features of the proposed data-driven finite element method. In particular, we consider in the
following order: 1D diffusion, 1D advection, 1D advection with multiple QoIs, and finally 2D
diffusion.

4.1 1D diffusion with one QoI

We recover here the motivational example from the introduction (see Section 1.1). Consider
1
the variational formulation (2), with trial and test spaces U = V = H(0 (0, 1). We endowed V
with the weighted inner product:
Z 1
(v1 , v2 )V,ω := ω v10 v20 , ∀v1 , v2 ∈ V.
0

As in the introduction, we consider the simplest coarse discrete trial space Uh := Span{ψ} ⊂ U,
where ψ(x) = x. The optimal test function (see Section 2.3.1), paired with the trial function
ψ, is given by ϕ := Tω ψ ∈ V, which is the solution of (14) with u = ψ. Hence,
Z x
1
ϕ(x) = ds. (21)
0 ω(s)
Data-driven FEM: Machine learning acceleration of goal-oriented computations 13

Let us consider the Petrov-Galerkin formulation with optimal test functions, which is
equivalent to the mixed system (8) in the optimal case Vh = V. Consequently, the Petrov-
Galerkin scheme with trial function ψ and optimal test function ϕ, delivers the discrete solution
uh,λ,ω (x) = xϕ(λ)/ϕ(1) (notice that the trivial weight ω ≡ 1 recovers the test function ϕ = ψ,
and therefore the standard Galerkin approach).
Recalling the exact solution (3), we observe that the relative error in the quantity of interest
for our Petrov-Galerkin approach is:

 1 − ϕ(λ) if x0 ≤ λ,
ϕ(1)
Err = (22)
 1 − x0 ϕ(λ) if x0 ≥ λ.

λ ϕ(1)

Of course, any function such that ϕ(λ) = ϕ(x0 ) 6= 0 for λ ≥ x0 , and ϕ(λ) = λϕ(x0 )/x0 for
λ ≤ x0 , will produce zero error for all λ ∈ (0, 1). Notice that such a function indeed exists,
and in this one-dimensional setting it solves the adjoint problem:
 1
 Find z ∈ H(0 (0, 1) such that:
Z 1
 w0 z 0 = w(x0 ), ∀w ∈ H(01
(0, 1).
0

This optimal test function is also obtained in our framework via (21), by using a limiting
weight of the form:
c if x < x0 ,
ω(x) → (23)
+∞ if x > x0 ,
for some constant c > 0. Hence, the Petrov–Galerkin method using a test function of the
form (21) has sufficient variability to eliminate any errors for any λ!
We now restrict the variability by parametrizing ω. In the motivating example given
in Section 1.1, for illustration reasons we chose a weight of the form ω(x) = σ(θ1 x + θ2 ),
which corresponds to the most simple artificial neural network, having only one hidden layer
with one neuron. We now select a slightly more complex family of weights having the form
ω(x) = exp(ANN(x; θ)) > 0, where
5
X
ANN(x; θ) = θj3 σ(θj1 x + θj2 ). (24)
j=1

Observe that ANN(x; θ) corresponds to an artificial neural network of one hidden layer with
five neurons (see Section 3.1).
The training set of parameters has been chosen as λi = 0.1i, with i = 1, ..., 9. For compar-
isons, we perform three different experiments. The first experiment trains the network (24)
based on a cost functional that uses the relative error formula (22), where the optimal test func-
tion ϕ is computed using eq. (21). The other two experiments use the training approach (20),
with discrete spaces Vh consisting of conforming piecewise linear functions over uniform meshes
14 I. Brevis, I. Muga, and K.G. van der Zee

(a) λ = 0.35 (b) λ = 0.75

Figure 2: Discrete solutions computed using the optimal test function approach (blue line),
and discrete mixed form approach (8) with different discrete test spaces Vh (red and yellow
lines). Dotted line shows the QoI location.

of 4 and 16 elements respectively. The quantity of interest has been set to x0 = 0.6, which
does not coincide with a node of the discrete test spaces. Figure 2 shows the obtained discrete
solutions uh,λ,ω∗ for each experiment, and for two different values of λ. Figure 3a shows the
trained weight obtained for each experiment (cf. eq. (23)), while Figure 3b depicted the asso-
ciated optimal and projected-optimal test functions linked to those trained weights. Finally,
Figure 3c shows the relative errors in the quantity of interest for each discrete solution in
terms of the λ parameter.
It can be observed that the trained method using a parametrized weight function based
on (24), while consisting of only one degree of freedom, gives quite accurate quantities of
interest for the entire range of λ. This should be compared to the O(1) error for standard
Galerkin given by (4). We note that some variation can be observed depending on whether
the optimal or a projected optimal test function is used (with a richer Vh being better).

4.2 1D advection with one QoI

Consider the family of ODEs:
u0 = fλ in (0, 1),

(25)
u(0) = 0,
for a family of continuous functions {fλ }λ∈[0,1] given by fλ (x) := (x − λ)1[λ,1] (x), where 1[λ,1]
denote the characteristic function of the interval [λ, 1]. The exact solution of (25) will be used
as a reference solution and is given by uλ (x) = 12 (x − λ)2 1[λ,1] (x). The quantity of interest
considered for this example will be qx0 (uλ ) := uλ (x0 ), where x0 could be any value in [0, 1].
Data-driven FEM: Machine learning acceleration of goal-oriented computations 15

(a) Trained weights (b) Optimal test functions (c) Relative errors in QoI

Figure 3: Trained weights, optimal (and projected-optimal) test functions, and relative errors
computed with three different approaches. Dotted line shows the QoI location.

Let us consider the following variational formulation of problem (25):


 Find uλ ∈ U such that:
Z 1 Z 1
0
 b(uλ , v) := uλ v = fλ v =: `λ (v), ∀v ∈ V,
0 0
1
where U := H(0 (0, 1) := {u ∈ L2 (0, 1) : u0 ∈ L2 (0, 1) ∧ u(0) = 0}, and V := L2 (0, 1) is endowed
with the weighted inner-product:
Z 1
(v1 , v2 )V,ω := ω v1 v2 , ∀v1 , v2 ∈ V.
0

We want to approach this problem using coarse discrete trial spaces Uh ⊂ U of piecewise linear
polynomials on a partition of one, two and three elements.
We describe the weight ω(x) by the sigmoid of an artificial neural network that depends on
parameters θ, i.e., ω(x) = σ(ANN(x; θ)) > 0 (see Section 3.1). In particular, we use the artifi-
cial neural network given in (24). To train such a network, we consider a training set {λi }9i=1 ,
where λi = 0.125(i − 1), together with the set of exact quantities of interest {qx0 (uλi )}9i=1 ,
computed using the reference exact solution with x0 = 0.9. The training procedure uses the
constrained minimization problem (20), where for each low-resolution trial space Uh (based on
one, two and three elements), the same discrete test space Vh has been used: a high-resolution
space of piecewise linear and continuous functions linked to a uniform partition of 128 ele-
ments. The minimization algorithm has been stopped until the cost functional reaches the
tolerance tol= 9 · 10−7 .
After an optimal parameter θ∗ has been found (see (20)), we follow the matrix procedures
described in Section 3.2 and Section 3.3 to approach the quantity of interest of the discrete
solution for any λ ∈ [0, 1].
16 I. Brevis, I. Muga, and K.G. van der Zee

(a) One element (b) Two elements (c) Three elements

Figure 4: Petrov-Galerkin solution with projected optimal test functions with trained weight.
Dotted line shows the QoI location (0.9) and parameter value is λ = 0.19.

(a) One DoF (b) Two DoF (c) Three DoF

Figure 5: Absolute error between QoI of exact and approximate solutions for different λ values.

Figures 4 and 5 show numerical experiments considering model problem (25) in three
different trial spaces. Figure 4 shows, for λ = 0.19, the exact solution and the Petrov-Galerkin
solution computed with projected optimal test functions given by the trained weighted inner-
product. Notice that for the three cases (with one, two, and three elements) the Petrov-
Galerkin solution intends to approximate the quantity of interest (dotted line).
Figure 5 displays the QoI error |qx0 (uλ ) − qx0 (uλ,h,ω∗ )| for different values of λ ∈ [0, 1].
When the ANN-training stops at a cost functional smaller than tol= 9 · 10−7 , the QoI error
remains smaller than 10−3 for all λ ∈ [0.1]. In particular, Figure 5a shows that even in the
simplest case of one-degree of freedom, it is possible to get reasonable approximations of the
QoI for the entire range of λ.
Data-driven FEM: Machine learning acceleration of goal-oriented computations 17

(a) Three elements (b) Four elements (c) Five elements

Figure 6: Petrov-Galerkin solution with projected optimal test functions with trained weight.
Dotted lines show the QoI locations (0.3 and 0.7) and parameter value is λ = 0.2.

4.3 1D advection with multiple QoIs

This example is based on the same model problem of Section 4.2, but now we intend to
approach two quantities of interest simultaneously: q1 (uλ ) := uλ (x1 ) and q2 (uλ ) := uλ (x2 ),
where x1 , x2 ∈ [0, 1] are two different values. We also have considered now discrete trial spaces
based on three, four and five elements. The training routine has been modified accordingly,
and is driven now by the following minimization problem:

Ns
 θ∗ = argmin 1
 X
|q1 (uh,λi ,ω ) − q1 (uλi )|2 + |q2 (uh,λi ,ω ) − q2 (uλi )|2 ,



 θ∈Φ 2
 i=1

  ω(·) = σ(ANN(· ; θ)).
subject to: (rh,λi ,ω , vh )V,ω + b(uh,λi ,ω , vh ) = `λi (vh ), ∀ vh ∈ Vh ,





∀ wh ∈ Uh .

 b(wh , rh,λi ,ω ) = 0,

For this example, we consider a training set of size Ns = 12 with λi = (i − 1)/11, for all
i = 1, . . . , 12. The weight ω(x) will be described by the sigmoid of an artificial neural network
that depends on one single hidden layer and Nn = 6 hidden neurons. Numerical results are
depicted in Figures 6 and 7 for x1 = 0.3 and x2 = 0.7. Accurate values of both QoIs are
obtained for the entire range of λ. These results are roughly independent of the size of the
trial space.

4.4 2D diffusion with one QoI

Consider the two-dimensional unit square Ω = [0, 1] × [0, 1] and the family of PDEs:

−∆u = fλ in Ω,
(26)
u = 0 over ∂Ω,
18 I. Brevis, I. Muga, and K.G. van der Zee

(a) Three DoF (b) Four DoF (c) Five DoF

Figure 7: Absolute error between QoI of exact and approximate solutions for different λ values,
for each QoI q1 (u) = u(0.3) and q2 (u) = u(0.7).

where the family of functions {fλ }λ∈(0,1) is described by the formula:

fλ (x1 , x2 ) = 2π 2 (1 + λ2 ) sin(πx1 ) sin(λπx1 ) sin(πx2 ) sin(λπx2 )−

2λπ 2 [cos(πx1 ) cos(λπx1 ) sin(πx2 ) sin(λπx2 )+
sin(πx1 ) sin(λπx1 ) cos(πx2 ) cos(λπx2 )].

Accordingly, the reference exact solution of (26) is:

uλ (x) = sin(πx1 ) sin(λπx1 ) sin(πx2 ) sin(λπx2 ).

The quantity of interest chosen for this example will be the average
Z
1
q(uλ ) := uλ (x) dx, (27)
|Ω0 | Ω0

with Ω0 := [0.79 , 0.81] × [0.39 , 0.41] ⊂ Ω (see Figure 8).

The variational formulation of problem (26) will be:

 Find uλ ∈ U Z such that: Z
 b(uλ , v) := ∇uλ · ∇v = fλ v =: `λ (v), ∀v ∈ V,
Ω Ω

where U = V = H01 (Ω), and V is endowed with the weighted inner-product:

Z
(v1 , v2 )V,ω := ω ∇v1 · ∇v2 , ∀v1 , v2 ∈ V.
Ω

As in the previous example, the weight is going to be determined using an artificial neural
network so that ω(x1 , x2 ) = σ(ANN(x1 , x2 ; θ)). Such a network is composed by one single
Data-driven FEM: Machine learning acceleration of goal-oriented computations 19

(a) One DoF (b) Five DoF (c) Eight DoF

Figure 8: Meshes considered for the discrete trial space Uh . The black square represent the
quantity of interest location Ω0 = [0.79 , 0.81] × [0.39 , 0.41].

hidden layer, and Nn = 5 hidden neurons. Hence, θ contains 20 parameters to estimate, i.e.,
Nn
X
ANN(x1 , x2 ; θ) = θj4 σ(θj1 x1 + θj2 x2 + θj3 ).
j=1

To train the ANN, we use the inputs {λi }9i=1 , where λi = 0.125(i − 1), and its corresponding
quantities of interest {q(uλi )}9i=1 , by means of equation (27). Again, the training procedure
is based on the constrained minimization (20). For the experiments, we use coarse discrete
trial spaces Uh having one, five, and eight degrees of freedom respectively (see Figure 8). In
each case, the test space Vh has been set to be a piecewise quadratric conforming polynomial
space, over a uniform triangular mesh of 1024 elements. The minimization algorithm (20)
stops when a tolerance tol= 9 · 10−7 is reached.
The errors on the QoI are depicted in Figure 9 for each trial space under consideration,
and show relative errors below 10−3 for the entire range of λ.

5 Conclusions
In this paper, we introduced the concept of data-driven finite element methods. These methods
are tuned within a machine-learning framework, and are tailored for the accurate computation
of output quantities of interest, regardless of the underlying mesh size. In fact, on coarse
meshes, they deliver significant improvements in the accuracy compared to standard methods
on such meshes.
We presented a stability analysis for the discrete method, which can be regarded as a
Petrov–Galerkin scheme with parametric test space that is equivalent to a minimal-residual
formulation measuring the residual in a (discrete) dual weighted norm. Numerical examples
20 I. Brevis, I. Muga, and K.G. van der Zee

(a) One DoF (b) Five DoF (c) Eight DoF

Figure 9: Absolute error between QoI of exact and approximate solutions for different λ values.

were presented for elementary one- and two-dimensional elliptic and hyperbolic problems, in
which weight functions are tuned that are represented by artificial neural networks with up
to 20 parameters.
Various extensions of our methodology are possible. While we only focussed on linear
quantities of interest, nonlinear ones can be directly considered. Also, it is possible to consider
a dependence of the bilinear form on λ, however, this deserves a completely separate treatment,
because of the implied λ-dependence of the trial-to-test map and the B-matrix. An open
problem of significant interest is the dependence of the performance of the trained method
on the richness of the parametrized weight function. While we showed that in the simplest
example of 1-D diffusion with one degree of freedom, the weight function allows for exact
approximation of quantities of interest, it is not at all clear if this is valid in more general
cases, and what the effect is of (the size of) parametrization.

A Proof of Theorem 2.B

The mixed scheme (8) has a classical saddle point structure, which is uniquely solvable since
the top left bilinear form is an inner-product (therefore coercive) and b(·, ·) satisfies the discrete
inf-sup condition (12) (see, e.g., [12, Proposition 2.42]).
The a priori estimates (13) are very well-known in the residual minimization FEM literature
(see, e.g., [15, 4, 29]). However, for the sake of completeness, we show here how to obtain
them. Let vh,λ,ω ∈ Vh be such that (cf. (15)):
(vh,λ,ω , vh )V,ω = b(uh,λ,ω , vh ), ∀vh ∈ Vh . (28)
In particular, combining eq. (28) with eq. (8b) of the mixed scheme, we get the orthogonality
property:
(vh,λ,ω , rh,λ,ω )V,ω = b(uh,λ,ω , rh,λ,ω ) = 0 . (29)
Data-driven FEM: Machine learning acceleration of goal-oriented computations 21

To get the first estimate observe that:

For the second estimate we define the projector P : U → Uh , such that P u ∈ Uh cor-
responds to the second component of the solution of the mixed system (8) with right hand
side b(u, ·) ∈ V∗ in (8a). We easily check that P is a bounded linear proyector satisfying
P 2 = P 6= 0, I, and kP k ≤ Mω /γh,ω . Hence, from Kato’s identity kI − P k = kP k for Hilbert
space projectors [20], we get for any wn ∈ Un :

kuλ − uh,λ,ω kU = k(I − P )uλ kU = k(I − P )(uλ − wh )kU ≤ kP kkuλ − wh kU .

Thus, the a priori error estimate follows using the bound of kP k and taking the infimum over
all wh ∈ Uh .

References
[1] R. Becker and R. Rannacher, An optimal control approach to a posteriori error
estimation in finite element methods, Acta Numer., 10 (2001), pp. 1–102.

[2] J. Berg and K. Nyström, A unified deep artificial neural network approach to partial
differential equations in complex geometries, Neurocomputing, 317 (2018), pp. 28–41.

[3] J. Berg and K. Nyström, Data-driven discovery of PDEs in complex datasets, Journal
of Computational Physics, 384 (2019), pp. 239–252.

[4] D. Broersen and R. Stevenson, A robust Petrov–Galerkin discretisation of

convection–diffusion equations, Comput. Math. Appl., 68 (2014), pp. 1605–1618.
22 I. Brevis, I. Muga, and K.G. van der Zee

[5] J.-T. Chien, Source separation and machine learning, Academic Press, 2018.

[6] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of

Control, Signals, and Systems, 2 (1989), pp. 303–314.

[7] L. Demkowicz and J. Gopalakrishnan, A class of discontinuous Petrov-Galerkin

methods. II. Optimal test functions, Numerical Methods for Partial Differential Equations,
27 (2010), pp. 70–105.

[8] L. Demkowicz and J. Gopalakrishnan, An overview of the discontinuous Petrov

Galerkin method, in Recent Developments in Discontinuous Galerkin Finite Element
Methods for Partial Differential Equations: 2012 John H Barrett Memorial Lectures,
X. Feng, O. Karakashian, and Y. Xing, eds., vol. 157 of The IMA Volumes in Mathemat-
ics and its Applications, Springer, Cham, 2014, pp. 149–180.

[9] N. Discacciati, J. S. Hesthaven, and D. Ray, Controlling oscillations in high-

order Discontinuous Galerkin schemes using artificial viscosity tuned by neural networks,
To appear in J. Comput. Phys (2020), (2020).

[10] W. E and B. Yu, The deep Ritz method: A deep learning-based numerical algorithm
for solving variational problems, Commun. Math. Stat., 6 (2018), pp. 1–12.

[11] B. Endtmeyer and T. Wick, A partition-of-unity dual-weighted residual approach

for multi-objective goal functional error estimation applied to elliptic problems, Comput.
Methods Appl. Math., 17 (2017), pp. 575–600.

[12] A. Ern and J.-L. Guermond, Theory and practice of finite elements, Springer, New-
York, 2004.

[13] M. Feischl, D. Praetorius, and K. G. van der Zee, An abstract analysis of

optimal goal-oriented adaptivity, SIAM J. Numer. Anal., 54 (2016), pp. 1423–1448.

[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT Press,
2019.

[15] J. Gopalakrishnan and W. Qiu, An analysis of the practical DPG method, Math.
Comp., 83 (2014), pp. 537–552.

[16] B. Hayhurst, M. Keller, C. Rai, X. Sun, and C. R. Westphal, Adaptively

weighted least squares finite element methods for partial differential equations with singu-
larities, Comm. Applied Math. and Comput. Science, 13 (2018), pp. 1–25.

[17] J. S. Hesthaven and S. Ubbiali, Non-intrusive reduced order modeling of nonlinear

problems using neural networks, J. Comput. Phys., 363 (2018), pp. 55–78.
Data-driven FEM: Machine learning acceleration of goal-oriented computations 23

[18] C. F. Higham and D. J. Higham, Deep learning: An introduction for applied mathe-
maticians, SIREV, 61 (2019), pp. 860–891.

[19] A. C. Ionita and A. C. Antoulas, Data-driven parametrized model reduction in the

Lowner framework, SIAM Journal of Scientific Computing, 36 (2014), pp. A984–A1007.

[20] T. Kato, Estimation of iterated matrices with application to von Neumann condition,
Numer. Math., 2 (1960), pp. 22–29.

[21] K. Kergrene, S. Prudhomme, L. Chamoin, and M. Laforest, A new goal-

oriented formulation of the finite element method, Comput. Methods Appl. Mech. Engrg.,
327 (2017), pp. 256–276.

[22] G. Kutyniok, P. Petersen, M. Raslan, and R. Schneider, A theoretical analysis

of deep neural networks and parametric PDEs, arXiv preprint, arXiv:1904.00377v2 (2019).

[23] I. Lagaris, A. Likas, and D. Fotiadis, Artificial neural networks for solving ordi-
nary and partial differential equations, IEEE Transactions on Neural Networks, 9 (1998),
pp. 987–1000.

[24] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436–
444.

[25] H. Lee and I. S. Kang, Neural algorithm for solving differential equations, Journal of
Computational Physics, 91 (1990), pp. 110–131.

[26] J. Ling, A. Kurzawski, and J. Templeton, Reynolds averaged turbulence modelling

using deep neural networks with embedded invariance, J. Fluid Mech., 807 (2016), pp. 155–
166.

[27] S. Mishra, A machine learning framework for data driven acceleration of computations
of differential equations, Mathematics in Engineering, 1 (2018), pp. 118–146.

[28] M. S. Mommer and R. Stevenson, A goal-oriented adaptive finite element method

with convergence rates, SIAM J. Numer. Anal., 47 (2009), pp. 861–886.

[29] I. Muga and K. G. van der Zee, Discretization of linear problems in Banach
spaces: Residual minimization, nonlinear Petrov–Galerkin, and monotone mixed methods.
http://arxiv.org/abs/1511.04400, 2018.

[30] J. T. Oden and S. Prudhomme, Goal-oriented error estimation and adaptivity for the
finite element method, Comput. Math. Appl., 41 (2001), pp. 735–756.

[31] M. Raissi and G. E. Karniadakis, Hidden physics models: Machine learning of

nonlinear partial differential equations, J. Comput. Phys., 357 (2018), pp. 125–141.
24 I. Brevis, I. Muga, and K.G. van der Zee

[32] D. Ray and J. S. Hesthaven, An artificial neural network as a troubled-cell indicator,

J. Comput. Phys., 367 (2018), pp. 166–191.

[33] F. Regazzoni, L. Dedè, and A. Quarteroni, Machine learning for fast and reliable
solution of time-dependent differential equations, Journal of Computational Physics, 397
(2019), p. 108852.

[34] S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Data-driven discovery

of partial differential equations, Science Advances, 3 (2017), p. e1602614.

[35] S. Sra, S. Nowozin, and S. J. Wright, Optimization for machine learning, The
MIT Press, 2011.

[36] R. Swischuka, L. Maininia, B. Peherstorferb, and K. Willcox, Projection-

based model reduction: Formulations for physics-based machine learning, Computers and
Fluids, 179 (2019), pp. 704–717.

[37] G. Tsihrintzis, D. N. Sotiropoulos, and L. C. Jain, Machine learning paradigms:

Advances in data analytics, Springer, 2019.

[38] E. H. van Brummelen, S. Zhuk, and G. J. van Zwieten, Worst-case multi-

objective error estimation and adaptivity, Comput. Methods Appl. Mech. Engrg., 313
(2017), pp. 723–743.

JEE 2025 - 90 Days Planner - MathonGo
No ratings yet
JEE 2025 - 90 Days Planner - MathonGo
3 pages
Optimum Position of Outrigger System For Tall Buildings: July 2019
No ratings yet
Optimum Position of Outrigger System For Tall Buildings: July 2019
7 pages
Tank Pressure During Pump Out
No ratings yet
Tank Pressure During Pump Out
1 page
Bosch Company Catalog 2020 PDF
No ratings yet
Bosch Company Catalog 2020 PDF
37 pages
Penn State Aerospace Engineering Undergraduate Curriculum Guide
No ratings yet
Penn State Aerospace Engineering Undergraduate Curriculum Guide
37 pages
Continuum Mechanics For Engineers 3rd Edition Mase Download
100% (1)
Continuum Mechanics For Engineers 3rd Edition Mase Download
61 pages
History of Optics
No ratings yet
History of Optics
16 pages
Electricity and Magnetism: Magnetostatics
No ratings yet
Electricity and Magnetism: Magnetostatics
49 pages
The King of Mazy May
No ratings yet
The King of Mazy May
11 pages
W193674 - Equipment Passport
No ratings yet
W193674 - Equipment Passport
5 pages
Thermodynamics Lecture 3
100% (1)
Thermodynamics Lecture 3
1 page
1969 Moon Landing: A Fake Leap Forward?: Polytechnic University of The Philippines
No ratings yet
1969 Moon Landing: A Fake Leap Forward?: Polytechnic University of The Philippines
16 pages
03 - Multicomponent Systems
No ratings yet
03 - Multicomponent Systems
10 pages
Process Safety Testing Training - Preread
No ratings yet
Process Safety Testing Training - Preread
9 pages
Provisional: Progressive Transcript of Academic Records
No ratings yet
Provisional: Progressive Transcript of Academic Records
1 page
XI Assignment CH 6
0% (1)
XI Assignment CH 6
3 pages
Hydro 2
No ratings yet
Hydro 2
2 pages
Habasit TC and TCF Power
No ratings yet
Habasit TC and TCF Power
8 pages
Optics Manufacturing: Components and Systems 1st Edition Christoph Gerhard Instant Download
No ratings yet
Optics Manufacturing: Components and Systems 1st Edition Christoph Gerhard Instant Download
51 pages
LN 3 SciEd 131 Thermodynamics-1
No ratings yet
LN 3 SciEd 131 Thermodynamics-1
11 pages
Formula Sheet
No ratings yet
Formula Sheet
8 pages
CHAPTER 6-6.3 Solving System of Equation Using Inverse Method-440
No ratings yet
CHAPTER 6-6.3 Solving System of Equation Using Inverse Method-440
7 pages
Maths p1 Gr11 Memo Nov 2019 - English+afrikaans2
No ratings yet
Maths p1 Gr11 Memo Nov 2019 - English+afrikaans2
13 pages
Determining Reaction Order Initial Rates and The Method of Isolation
No ratings yet
Determining Reaction Order Initial Rates and The Method of Isolation
4 pages
T.C. Gebze Technical University Physics Department: Physics Laboratory Ii Experiment Report
No ratings yet
T.C. Gebze Technical University Physics Department: Physics Laboratory Ii Experiment Report
10 pages
Basics of Fluid Mechanics 6th Edition Genick Bar-Meir All Chapter Instant Download
100% (3)
Basics of Fluid Mechanics 6th Edition Genick Bar-Meir All Chapter Instant Download
49 pages
IMPNUMERICALS4MARKS D01-Mar-2024 240301 212054PDF 240301 212137
No ratings yet
IMPNUMERICALS4MARKS D01-Mar-2024 240301 212054PDF 240301 212137
7 pages
The Quantum and Spiritual Dimensions of Human Design: An Integrative Exploration
No ratings yet
The Quantum and Spiritual Dimensions of Human Design: An Integrative Exploration
6 pages
Abortion Essay Outline
100% (2)
Abortion Essay Outline
3 pages
7#8ALUMOWELD AluminumCladSteelWire
No ratings yet
7#8ALUMOWELD AluminumCladSteelWire
2 pages
Principles: Life and Work
From Everand
Principles: Life and Work
Ray Dalio
4/5 (643)
The Yellow House: A Memoir (2019 National Book Award Winner)
From Everand
The Yellow House: A Memoir (2019 National Book Award Winner)
Sarah M. Broom
4/5 (100)
The Unwinding: An Inner History of the New America
From Everand
The Unwinding: An Inner History of the New America
George Packer
4/5 (45)
The Emperor of All Maladies: A Biography of Cancer
From Everand
The Emperor of All Maladies: A Biography of Cancer
Siddhartha Mukherjee
4.5/5 (298)
The World Is Flat 3.0: A Brief History of the Twenty-first Century
From Everand
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Thomas L. Friedman
3.5/5 (2289)
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
From Everand
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Brene Brown
4/5 (1175)
Rise of ISIS: A Threat We Can't Ignore
From Everand
Rise of ISIS: A Threat We Can't Ignore
Jay Sekulow
3.5/5 (144)
The Glass Castle: A Memoir
From Everand
The Glass Castle: A Memoir
Jeannette Walls
4.5/5 (1856)
Fear: Trump in the White House
From Everand
Fear: Trump in the White House
Bob Woodward
3.5/5 (836)
Shoe Dog: A Memoir by the Creator of Nike
From Everand
Shoe Dog: A Memoir by the Creator of Nike
Phil Knight
4.5/5 (629)
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
From Everand
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Dave Eggers
3.5/5 (233)
Team of Rivals: The Political Genius of Abraham Lincoln
From Everand
Team of Rivals: The Political Genius of Abraham Lincoln
Doris Kearns Goodwin
4.5/5 (244)
Her Body and Other Parties: Stories
From Everand
Her Body and Other Parties: Stories
Carmen Maria Machado
4/5 (903)
Manhattan Beach: A Novel
From Everand
Manhattan Beach: A Novel
Jennifer Egan
3.5/5 (919)
Steve Jobs
From Everand
Steve Jobs
Walter Isaacson
4.5/5 (1139)
Sing, Unburied, Sing: A Novel
From Everand
Sing, Unburied, Sing: A Novel
Jesmyn Ward
4/5 (1267)
The Light Between Oceans: A Novel
From Everand
The Light Between Oceans: A Novel
M.L. Stedman
4.5/5 (815)
John Adams
From Everand
John Adams
David McCullough
4.5/5 (2546)
Angela's Ashes: A Memoir
From Everand
Angela's Ashes: A Memoir
Frank McCourt
4.5/5 (943)
The Perks of Being a Wallflower
From Everand
The Perks of Being a Wallflower
Stephen Chbosky
4.5/5 (4103)
The Outsider: A Novel
From Everand
The Outsider: A Novel
Stephen King
4/5 (2885)
Little Women
From Everand
Little Women
Louisa May Alcott
4.5/5 (2369)

Data-Driven Finite Elements Methods

Uploaded by

Data-Driven Finite Elements Methods

Uploaded by

Data-Driven Finite Elements Methods:

Machine Learning Acceleration of

I. Brevis∗, I. Muga†, and K.G. van der Zee‡

11th March, 2020

Keywords Goal-oriented finite elements · Machine-Learning acceleration · Residual Mini-

A Proof of Theorem 2.B 20

1.1 Motivating example

The standard variational formulation of problem (1) reads:

1.2 Related literature

2.2 Main idea of the accelerated methods

C1,ω kvkV,ω ≤ kvkV ≤ C2,ω kvkV,ω , ∀v ∈ V. (7)

Consider a coarse finite dimensional subspace Uh ⊂ U where we want to approximate the

In the online procedure:

2.3 Analysis of the discrete method

Proof See Appendix A.

2.3.1 Equivalent Petrov-Galerkin formulation

(Tω u, v)V,ω = b(u, v) , ∀u ∈ U, ∀v ∈ V. (14)

(Th,ω u, vh )V,ω = b(u, vh ) , ∀u ∈ U, ∀vh ∈ Vh . (15)

b(uh,λ,ω , vh ) = `λ (vh ), ∀vh ∈ Vh,ω . (16)

(rh,λ,ω , vh )V,ω = (rh,λ,ω , Th,ω wh )V,ω = b(wh , rh,λ,ω ) = 0,

2.3.2 Equivalent Minimal Residual formulation

(rh,λ,ω , vh )V,ω = `λ (vh ) − b(uh,λ,ω , vh ) , ∀vh ∈ Vh ,

which gives equation (8b).

ANN(xin ) = Θn σ(· · · σ(Θ2 σ(Θ1 xin + φ1 ) + φ2 ) · · · ) + φn , (17)

θ∗ = argmin J(θ). (19)

3.2 Offline procedures

Aij = (ϕj , ϕi )V,ω∗ and Bij = b(ψj , ϕi ),

3.3 Online procedures

(Lλ )i = h`λ , ϕi iV∗ ,V ,

where ϕi is the i-th vector of in the basis of Vh . Next, we compute

q(uh,λ,ω∗ ) = QT (B T A−1 B)−1 B T A−1 Lλ .

4.1 1D diffusion with one QoI

(a) λ = 0.35 (b) λ = 0.75

4.2 1D advection with one QoI

Let us consider the following variational formulation of problem (25):

(a) One element (b) Two elements (c) Three elements

(a) One DoF (b) Two DoF (c) Three DoF

(a) Three elements (b) Four elements (c) Five elements

4.3 1D advection with multiple QoIs

4.4 2D diffusion with one QoI

(a) Three DoF (b) Four DoF (c) Five DoF

where the family of functions {fλ }λ∈(0,1) is described by the formula:

fλ (x1 , x2 ) = 2π 2 (1 + λ2 ) sin(πx1 ) sin(λπx1 ) sin(πx2 ) sin(λπx2 )−

Accordingly, the reference exact solution of (26) is:

uλ (x) = sin(πx1 ) sin(λπx1 ) sin(πx2 ) sin(λπx2 ).

with Ω0 := [0.79 , 0.81] × [0.39 , 0.41] ⊂ Ω (see Figure 8).

where U = V = H01 (Ω), and V is endowed with the weighted inner-product:

(a) One DoF (b) Five DoF (c) Eight DoF

(a) One DoF (b) Five DoF (c) Eight DoF

A Proof of Theorem 2.B

To get the first estimate observe that:

kuλ − uh,λ,ω kU = k(I − P )uλ kU = k(I − P )(uλ − wh )kU ≤ kP kkuλ − wh kU .

[4] D. Broersen and R. Stevenson, A robust Petrov–Galerkin discretisation of

[6] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of

[7] L. Demkowicz and J. Gopalakrishnan, A class of discontinuous Petrov-Galerkin

[8] L. Demkowicz and J. Gopalakrishnan, An overview of the discontinuous Petrov

[9] N. Discacciati, J. S. Hesthaven, and D. Ray, Controlling oscillations in high-

[11] B. Endtmeyer and T. Wick, A partition-of-unity dual-weighted residual approach

[13] M. Feischl, D. Praetorius, and K. G. van der Zee, An abstract analysis of

[16] B. Hayhurst, M. Keller, C. Rai, X. Sun, and C. R. Westphal, Adaptively

[17] J. S. Hesthaven and S. Ubbiali, Non-intrusive reduced order modeling of nonlinear

[19] A. C. Ionita and A. C. Antoulas, Data-driven parametrized model reduction in the

[21] K. Kergrene, S. Prudhomme, L. Chamoin, and M. Laforest, A new goal-

[22] G. Kutyniok, P. Petersen, M. Raslan, and R. Schneider, A theoretical analysis

[26] J. Ling, A. Kurzawski, and J. Templeton, Reynolds averaged turbulence modelling

[28] M. S. Mommer and R. Stevenson, A goal-oriented adaptive finite element method

[31] M. Raissi and G. E. Karniadakis, Hidden physics models: Machine learning of

[32] D. Ray and J. S. Hesthaven, An artificial neural network as a troubled-cell indicator,

[34] S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Data-driven discovery

[36] R. Swischuka, L. Maininia, B. Peherstorferb, and K. Willcox, Projection-

[37] G. Tsihrintzis, D. N. Sotiropoulos, and L. C. Jain, Machine learning paradigms:

[38] E. H. van Brummelen, S. Zhuk, and G. J. van Zwieten, Worst-case multi-

You might also like