
Non-intrusive reduced order modeling of nonlinear problems using neural networks

J. S. Hesthaven (a), S. Ubbiali (a,b)
(a) École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015 Lausanne, Switzerland
(b) Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy

Abstract
We develop a non-intrusive reduced basis (RB) method for parametrized steady-state partial differential equations (PDEs).
The method extracts a reduced basis from a collection of high-fidelity solutions via a proper orthogonal decomposition
(POD) and employs artificial neural networks (ANNs), particularly multi-layer perceptrons (MLPs), to accurately approximate
the coefficients of the reduced model. The search for the optimal number of neurons and the minimum amount of
training samples to avoid overfitting is carried out in the offline phase through an automatic routine, relying upon a joint
use of the latin hypercube sampling (LHS) and the Levenberg-Marquardt training algorithm. This guarantees a complete
offline-online decoupling, leading to an efficient RB method, referred to as POD-NN, also suitable for general nonlinear
problems with a non-affine parametric dependence. Numerical studies are presented for the nonlinear Poisson equation
and for driven cavity viscous flows, modeled through the steady incompressible Navier-Stokes equations. Both physical
and geometrical parametrizations are considered. Several results confirm the accuracy of the POD-NN method and show
the substantial speed-up enabled at the online stage as compared to a traditional RB strategy.

Keywords: non-intrusive reduced basis method, proper orthogonal decomposition, multi-layer perceptron,
Levenberg-Marquardt algorithm, Poisson equation, driven cavity flow

1. Introduction

Many applications in engineering and the applied sciences involve mathematical models expressed as parametrized
partial differential equations (PDEs), in which boundary conditions, material properties, source terms, loads or geometric
features of the underlying physical problem are expressed by a parameter µ [18, 23, 26]. A list of notable examples includes
parameter estimation [6], topology optimization [5], optimal control [30] and uncertainty quantification [29]. In these
examples, one is typically interested in a real-time evaluation of an output of interest (defined as a functional of the state
variable [15]) for many parameter entries, i.e., for many configurations of the problem.
Increasing computational power and simultaneous algorithmic improvements nowadays enable the high-fidelity
numerical resolution of complex problems via standard discretization procedures, such as finite difference (FD),
finite volume (FV), finite element (FE), or spectral methods [41]. However, these schemes remain prohibitively expensive in
many-query and real-time contexts, both in terms of CPU time and memory demand, due to the large number of degrees of
freedom (DOFs) they need to accurately solve the PDE [1]. In light of this, reduced order modeling (ROM) methods have
received significant attention in recent decades. The objective of these methods is to replace the full-order system by one
of significantly smaller dimension, decreasing the computational burden while incurring a controlled loss of accuracy [11].
Reduced basis (RB) methods constitute a well-known and widely-used example of reduced order modeling techniques.
They are generally implemented pursuing an offline-online paradigm [31]. Based upon an ensemble of snapshots (i.e., high-
fidelity solutions to the parametrized differential problem), the goal of the offline step is to construct a solution-dependent
basis, yielding a reduced space of globally approximating functions to represent the main dynamics of the full-order model
[2, 11]. For this, two major approaches have been proposed in the literature: proper orthogonal decomposition (POD)
[22, 46] and greedy algorithms [24]. The former relies on a deterministic or random sample in the parameter space to

Email addresses: [email protected] (J. S. Hesthaven), [email protected] (S. Ubbiali)

Preprint submitted to Elsevier October 31, 2017


generate snapshots and then employs a singular value decomposition (SVD) to recover the reduced basis. In the second
approach, the basis vectors coincide with the snapshots themselves, carefully selected according to some optimality
criterion. As a result, a greedy strategy is typically more effective and efficient than POD, as it enables the exploration of a
wider region of the parameter space while entailing the computation of many fewer high-fidelity solutions [23]. However,
there exist problems for which a greedy approach is not feasible, simply because a natural criterion for the choice of the
snapshots is not available [2].
Once a reduced order framework has been properly set up, an approximation to the truth solution for a new parameter
value is sought online as a linear combination of the RB functions, with the expansion coefficients determined via a
projection of the full-order system onto the reduced space [7]. To this end, a Galerkin procedure is the most popular choice.
Despite their established effectiveness, projection-based RB methods do not provide any computational gain with
respect to a direct (expensive) approach for complex nonlinear problems with a non-affine dependence on the parameters.
This is a result of the cost of computing the projection coefficients, which depends on the dimension of the full-order model.
In fact, a full decoupling between the online stage and the high-fidelity scheme is essential for the success of any
RB procedure [41]. For this purpose, one may recover an affine expansion of the differential operator through the empirical
interpolation method (EIM) [3] or its discrete variants [10, 37]. However, for general nonlinear problems this is far from
trivial.
A valuable alternative to address this concern is represented by non-intrusive RB methods, in which the high-fidelity
model is used to generate the snapshots, but not in the projection process [11]. The projection coefficients are obtained via
interpolation over the parameter domain of a database of reduced order information [9]. However, since reduced bases
generally belong to nonlinear matrix manifolds, standard interpolation techniques may fail, as they cannot enforce the
constraints characterizing those manifolds unless a large number of samples is employed [1, 4].
In this work, we develop a non-intrusive RB method employing POD for the generation of the reduced basis and resort
to (artificial) neural networks, in particular multi-layer perceptrons, in the interpolation step. Hence, in the following we
refer to the proposed RB procedure as the POD-NN method. Being of non-intrusive nature, POD-NN is suitable for a fast
and reliable resolution of complex nonlinear PDEs featuring a non-affine parametric dependence. To test this assertion, the
POD-NN method is applied to the one- and two-dimensional nonlinear Poisson equation and to the steady incompressible
Navier-Stokes equations. Both physical and geometrical parametrizations are considered.
The paper is organized as follows. Section 2 defines the (parametrized) functional and variational framework which is
required to develop a finite element solver, briefly outlined in Subsection 2.2. The standard projection-based POD-Galerkin
(POD-G) RB method is derived in Section 3. Section 4 discusses components, topology and learning process for artificial
neural networks. This is preparatory for the subsequent Section 5, which details the non-intrusive POD-NN RB procedure;
both theoretical and practical aspects are addressed. Several numerical results, aiming to show the reliability and efficiency
of the proposed RB technique, are offered in Section 6 for the Poisson equation (Subsection 6.1) and the lid-driven cavity
problem for the steady Navier-Stokes equations (Subsection 6.2). Finally, Section 7 gathers some relevant conclusions and
suggests future developments.

2. Parametrized partial differential equations

Assume $\mathcal{P}_{ph} \subset \mathbb{R}^{P_{ph}}$ and $\mathcal{P}_{g} \subset \mathbb{R}^{P_{g}}$ are compact sets, and let $\mu_{ph} \in \mathcal{P}_{ph}$ and $\mu_{g} \in \mathcal{P}_{g}$ be respectively the physical
and geometrical parameters characterizing the differential problem, so that $\mu = (\mu_{ph}, \mu_{g}) \in \mathcal{P} = \mathcal{P}_{ph} \times \mathcal{P}_{g} \subset \mathbb{R}^{P}$, with
$P = P_{ph} + P_{g}$, represents the overall input vector parameter. While $\mu_{ph}$ addresses material properties, source terms
and boundary conditions, $\mu_{g}$ defines the shape of the computational domain $\widetilde{\Omega} = \widetilde{\Omega}(\mu_{g}) \subset \mathbb{R}^{d}$, $d = 1, 2$. We denote by
$\widetilde{\Gamma}(\mu_{g}) = \partial\widetilde{\Omega}(\mu_{g})$ the (Lipschitz) boundary of $\widetilde{\Omega}(\mu_{g})$, and by $\widetilde{\Gamma}_{D}(\mu_{g})$ and $\widetilde{\Gamma}_{N}(\mu_{g})$ the portions of $\widetilde{\Gamma}(\mu_{g})$ where Dirichlet and
Neumann boundary conditions are enforced, respectively, with $\widetilde{\Gamma}_{D} \cup \widetilde{\Gamma}_{N} = \widetilde{\Gamma}$ and $\mathring{\widetilde{\Gamma}}_{D} \cap \mathring{\widetilde{\Gamma}}_{N} = \emptyset$.
Consider a Hilbert space $\widetilde{V} = \widetilde{V}(\mu_{g}) = \widetilde{V}(\widetilde{\Omega}(\mu_{g}))$ defined over the domain $\widetilde{\Omega}(\mu_{g})$, equipped with the scalar product $(\cdot,\cdot)_{\widetilde{V}}$
and the induced norm $\|\cdot\|_{\widetilde{V}} = \sqrt{(\cdot,\cdot)_{\widetilde{V}}}$. Furthermore, let $\widetilde{V}' = \widetilde{V}'(\mu_{g})$ be the dual space of $\widetilde{V}$. Denoting by $\widetilde{G} : \widetilde{V} \times \mathcal{P}_{ph} \to \widetilde{V}'$
the map representing a parametrized nonlinear second-order PDE, the differential (strong) form of the problem of interest
reads: given $\mu = (\mu_{ph}, \mu_{g}) \in \mathcal{P}$, find $\widetilde{u}(\mu) \in \widetilde{V}(\mu_{g})$ such that

$$\widetilde{G}(\widetilde{u}(\mu); \mu_{ph}) = 0 \quad \text{in } \widetilde{V}'(\mu_{g}), \qquad (2.1)$$

namely

$$\langle \widetilde{G}(\widetilde{u}(\mu); \mu_{ph}), v \rangle_{\widetilde{V}',\widetilde{V}} = 0 \quad \forall v \in \widetilde{V}(\mu_{g}),$$

with $\langle\cdot,\cdot\rangle_{\widetilde{V}',\widetilde{V}} : \widetilde{V}' \times \widetilde{V} \to \mathbb{R}$ the duality pairing between $\widetilde{V}'$ and $\widetilde{V}$.


The finite element method requires problem (2.1) to be stated in a weak (or variational) form [40]. To this end, let us
introduce the form $\widetilde{g} : \widetilde{V} \times \widetilde{V} \times \mathcal{P} \to \mathbb{R}$, with $\widetilde{g}(\cdot,\cdot;\mu)$ defined as

$$\widetilde{g}(w, v; \mu) = \langle \widetilde{G}(w; \mu_{ph}), v \rangle_{\widetilde{V}',\widetilde{V}} \quad \forall w, v \in \widetilde{V}.$$

The variational formulation of (2.1) then reads: given $\mu = (\mu_{ph}, \mu_{g}) \in \mathcal{P}$, find $\widetilde{u}(\mu) \in \widetilde{V}(\mu_{g})$ such that

$$\widetilde{g}(\widetilde{u}(\mu), v; \mu) = 0 \quad \forall v \in \widetilde{V}(\mu_{g}).$$

2.1. From physical to reference domain


As discussed in the Introduction, a reduced basis method seeks an approximate solution to a problem as a combination
of (few) well-chosen basis vectors. These typically result from a suitable combination of a collection of high-fidelity
approximations, called snapshots. Therefore, when addressing problems defined on variable-shape domains, ensuring
compatibility among snapshots is crucial. To this end, it is common practice to formulate and solve the differential problem
over a fixed, parameter-independent domain $\Omega \subset \mathbb{R}^{d}$ [35]. This can be accomplished by introducing a parametrized map
$\Phi : \Omega \times \mathcal{P}_{g} \to \widetilde{\Omega}$ such that

$$\widetilde{\Omega}(\mu_{g}) = \Phi(\Omega; \mu_{g}).$$

The transformation $\Phi(\cdot;\mu_{g})$ allows one to restate the general problem (2.1). Let $V$ be a suitable Hilbert space over $\Omega$
and $V'$ be its dual. Suppose $V$ is equipped with the scalar product $(\cdot,\cdot)_{V}$ and the induced norm $\|\cdot\|_{V} = \sqrt{(\cdot,\cdot)_{V}}$. Given the
parametrized map $G : V \times \mathcal{P} \to V'$ representing the (nonlinear) PDE over the reference domain $\Omega$, we focus on problems of
the form: given $\mu \in \mathcal{P}$, find $u(\mu) \in V$ such that

$$G(u(\mu); \mu) = 0 \quad \text{in } V'. \qquad (2.2)$$

The weak formulation of problem (2.2) reads: given $\mu \in \mathcal{P}$, seek $u(\mu) \in V$ such that

$$g(u(\mu), v; \mu) = 0 \quad \forall v \in V, \qquad (2.3)$$

where $g : V \times V \times \mathcal{P} \to \mathbb{R}$ is defined as

$$g(w, v; \mu) = \langle G(w; \mu), v \rangle_{V',V} \quad \forall w, v \in V, \ \forall \mu \in \mathcal{P},$$

with $\langle\cdot,\cdot\rangle_{V',V} : V' \times V \to \mathbb{R}$ the duality pairing between $V'$ and $V$. Observe that the explicit expression of $g(\cdot,\cdot;\mu)$ involves the
map $\Phi(\cdot;\mu_{g})$, thus keeping track of the original domain $\widetilde{\Omega}(\mu_{g})$. Then, the solution $\widetilde{u}(\mu)$ over the original domain $\widetilde{\Omega}(\mu_{g})$ can
be recovered as

$$\widetilde{u}(\mu) = u(\mu) \circ \Phi(\mu_{g}).$$

In our numerical tests, we employ the square reference domain shown on the right in Fig. 6.3, and we resort to a particular
choice for $\Phi(\cdot;\mu_{g})$: the boundary displacement-dependent transfinite map (BDD TM) proposed by Jaggli et al. [26].

2.2. Discrete full-order model


Let $V_h \subset V$ be a suitable FE subspace of $V$ of (finite) dimension $N_h$, with $h \geq 0$ being the characteristic size of the
underlying mesh $\Omega_h$ which discretizes $\Omega$. The FE approximation of the weak problem (2.3) can be cast in the form: given
$\mu \in \mathcal{P}$, find $u_h(\mu) \in V_h$ such that

$$g(u_h(\mu), v_h; \mu) = 0 \quad \forall v_h \in V_h. \qquad (2.4)$$

From an algebraic standpoint, letting $\{\phi_1, \ldots, \phi_{N_h}\}$ be a Lagrangian basis for $V_h$ and denoting by $\mathbf{u}_h(\mu) \in \mathbb{R}^{N_h}$ the vector
collecting the nodal values $\{u_h^{(1)}(\mu), \ldots, u_h^{(N_h)}(\mu)\}$ of $u_h(\mu)$, problem (2.4) is equivalent to: given $\mu \in \mathcal{P}$, find $\mathbf{u}_h(\mu) \in \mathbb{R}^{N_h}$
such that

$$\mathbf{G}_h(\mathbf{u}_h(\mu); \mu) = \mathbf{0} \in \mathbb{R}^{N_h}, \qquad (2.5)$$

where the $i$-th component of the residual vector $\mathbf{G}_h(\cdot; \mu)$ is given by

$$\big(\mathbf{G}_h(\mathbf{u}_h(\mu); \mu)\big)_i = g(u_h(\mu), \phi_i; \mu), \quad i = 1, \ldots, N_h. \qquad (2.6)$$

Observe that, due to the nonlinearity of $g(\cdot,\cdot;\mu)$ in its first argument, one has to resort to some iterative method, e.g.,
Newton's method, to solve the Galerkin problem (2.5).
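The paper does not show the iteration itself, so the following is a minimal illustrative sketch (in Python, not the authors' MATLAB/FE code) of a plain Newton loop applied to the discrete residual (2.5); the callables `G_h` and `J_h` are hypothetical stand-ins for the assembled FE residual and its Jacobian.

```python
import numpy as np

def newton_solve(G_h, J_h, u0, tol=1e-10, max_iter=25):
    """Plain Newton iteration for the nonlinear algebraic system G_h(u; mu) = 0.

    G_h : callable, returns the residual vector (2.5) for a nodal vector u
    J_h : callable, returns the Jacobian of G_h at u (an N_h x N_h array)
    u0  : initial guess for the nodal values
    """
    u = u0.copy()
    for _ in range(max_iter):
        r = G_h(u)
        if np.linalg.norm(r) < tol:          # converged
            break
        u += np.linalg.solve(J_h(u), -r)     # Newton correction
    return u
```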

3. Projection-based reduced basis method

As outlined above, the finite element discretization of the $\mu$-dependent nonlinear differential problem (2.3), combined
with Newton's method, entails the assembly and solution of (possibly) many linear systems, whose dimension $N_h$ is directly
related to (i) the size of the underlying grid and (ii) the order of the polynomial FE space adopted. Since the accuracy of the
resulting discretization heavily relies on these two factors, a direct numerical approximation of the full-order model implies
severe computational costs. Therefore, this approach is not affordable in many-query and real-time contexts. This motivates
the use of reduced order models. In particular, reduced basis methods seek an approximate solution to problem (2.3) as
a linear combination of parameter-independent functions $\{\psi_1, \ldots, \psi_L\} \subset V_h$, called reduced basis functions, built from a
collection of high-fidelity snapshots $\{u_h(\mu^{(1)}), \ldots, u_h(\mu^{(N)})\}$, where the discrete and finite set $\Xi_N = \{\mu^{(1)}, \ldots, \mu^{(N)}\} \subset \mathcal{P}$
may consist of either a uniform lattice or randomly generated points over the parameter domain $\mathcal{P}$ [23]. The basis functions
$\{\psi_l\}_{1 \le l \le L}$ generally follow from a principal component analysis (PCA) of the set of snapshots (in that case, $N > L$), or they
might coincide with the snapshots themselves (in that case, $N = L$). In the latter approach, typical of any greedy method,
the parameters $\{\mu^{(n)}\}_{1 \le n \le N}$ must be carefully chosen following some optimality criterion (see, e.g., [11]). Here, we pursue
the first approach, employing the well-known proper orthogonal decomposition (POD) method [22, 46], detailed in the
following subsection. For now, assume that a reduced basis is available and let $V_{rb} \subset V_h$ be the associated reduced basis
space, i.e.,

$$V_{rb} = \mathrm{span}\{\psi_1, \ldots, \psi_L\}.$$
A reduced basis solution $u_L(\mu)$ is sought in the form

$$u_L(x; \mu) = \sum_{l=1}^{L} u_{rb}^{(l)}(\mu)\, \psi_l(x) \in V_{rb},$$

with

$$\mathbf{u}_{rb}(\mu) = \big[u_{rb}^{(1)}(\mu), \ldots, u_{rb}^{(L)}(\mu)\big]^T \in \mathbb{R}^{L}$$

being the reduced coefficients (also called generalized coordinates) for the expansion of the RB solution in the RB basis
functions. To recover $u_L(\mu)$, we proceed to project the variational problem (2.3) onto the RB space $V_{rb}$ by pursuing a
standard Galerkin approach, leading to the following reduced basis problem: given $\mu \in \mathcal{P}$, find $u_L(\mu) \in V_{rb}$ so that

$$g(u_L(\mu), v_L; \mu) = 0 \quad \forall v_L \in V_{rb}. \qquad (3.1)$$

Denoting by $\boldsymbol{\psi}_l \in \mathbb{R}^{N_h}$ the vector gathering the nodal values of $\psi_l$, for $l = 1, \ldots, L$, let us introduce the matrix

$$\mathbf{V} = \big[\boldsymbol{\psi}_1 \,|\, \ldots \,|\, \boldsymbol{\psi}_L\big] \in \mathbb{R}^{N_h \times L}.$$

For any $v_L \in V_{rb}$, $\mathbf{V}$ encodes the change of variables from the RB basis to the Lagrangian FE basis, i.e.,

$$\mathbf{v}_L = \mathbf{V}\,\mathbf{v}_{rb}. \qquad (3.2)$$

Then, due to (2.6) and (3.2), the algebraic formulation of the reduced basis problem (3.1) reads: given $\mu \in \mathcal{P}$, seek
$\mathbf{u}_{rb}(\mu) \in \mathbb{R}^{L}$ such that

$$\mathbf{G}_{rb}(\mathbf{u}_{rb}(\mu); \mu) = \mathbf{V}^T \mathbf{G}_h(\mathbf{u}_L(\mu); \mu) = \mathbf{V}^T \mathbf{G}_h(\mathbf{V}\,\mathbf{u}_{rb}(\mu); \mu) = \mathbf{0} \in \mathbb{R}^{L}. \qquad (3.3)$$
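For illustration only, the hypothetical residual and Jacobian callables introduced after (2.6) can be reused to sketch a Galerkin-reduced Newton solve of (3.3); this is not the authors' implementation, just a minimal rendering of the projection idea.

```python
import numpy as np

def galerkin_rb_solve(G_h, J_h, V, urb0, tol=1e-10, max_iter=25):
    """Newton iteration for the reduced residual (3.3): V^T G_h(V u_rb; mu) = 0."""
    urb = urb0.copy()
    for _ in range(max_iter):
        uh = V @ urb                      # lift the reduced coefficients to the FE space
        r_rb = V.T @ G_h(uh)              # reduced residual in R^L
        if np.linalg.norm(r_rb) < tol:
            break
        J_rb = V.T @ J_h(uh) @ V          # reduced (L x L) Jacobian
        urb += np.linalg.solve(J_rb, -r_rb)
    return urb
```

Even though the linear systems are only L x L, both `G_h` and `J_h` are still assembled at full order in every iteration, which is exactly the efficiency bottleneck discussed in Subsection 3.2.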
3.1. Proper orthogonal decomposition
Consider a collection of $N$ snapshots $\{u_h(\mu^{(1)}), \ldots, u_h(\mu^{(N)})\}$, corresponding to the finite and discrete parameter set
$\Xi_N = \{\mu^{(1)}, \ldots, \mu^{(N)}\} \subset \mathcal{P}$, and let $\mathcal{M}_{\Xi_N}$ be the associated subspace, i.e.,

$$\mathcal{M}_{\Xi_N} = \mathrm{span}\{u_h(\mu^{(1)}), \ldots, u_h(\mu^{(N)})\}.$$

We assume that $\mathcal{M}_{\Xi_N}$ provides a good approximation of the discrete solution manifold $\mathcal{M}_h$,

$$\mathcal{M}_h = \{u_h(\mu) : \mu \in \mathcal{P}\},$$

as long as the number of snapshots is sufficiently large (but typically much smaller than the dimension $N_h$ of the FE
space). Then, we aim at finding a parameter-independent reduced basis for $\mathcal{M}_{\Xi_N}$, i.e., a collection of FE functions
$\{\psi_1, \ldots, \psi_L\} \subset \mathcal{M}_{\Xi_N}$, with $L \ll N_h$ and $L$ independent of $N_h$, so that the associated linear space constitutes a low-rank
approximation of $\mathcal{M}_{\Xi_N}$, optimal in some sense to be defined later. To this end, consider the snapshot matrix $\mathbf{S} \in \mathbb{R}^{N_h \times N}$
gathering the nodal values of the snapshots in a column-wise sense, i.e.,

$$\mathbf{S} = \big[\mathbf{u}_h(\mu^{(1)}) \,|\, \ldots \,|\, \mathbf{u}_h(\mu^{(N)})\big].$$

Denoting by $R$ the rank of $\mathbf{S}$, with $R \le \min\{N_h, N\}$, the singular value decomposition (SVD) of $\mathbf{S}$ ensures the existence of two
orthogonal matrices $\mathbf{W} = [\mathbf{w}_1 \,|\, \ldots \,|\, \mathbf{w}_{N_h}] \in \mathbb{R}^{N_h \times N_h}$ and $\mathbf{Z} = [\mathbf{z}_1 \,|\, \ldots \,|\, \mathbf{z}_N] \in \mathbb{R}^{N \times N}$, and a diagonal matrix
$\mathbf{D} = \mathrm{diag}(\sigma_1, \ldots, \sigma_R) \in \mathbb{R}^{R \times R}$, with $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_R > 0$, such that

$$\mathbf{S} = \mathbf{W} \begin{bmatrix} \mathbf{D} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{Z}^T = \mathbf{W}\,\boldsymbol{\Sigma}\,\mathbf{Z}^T,$$

where the zeros denote null matrices of appropriate dimensions. The real values $\{\sigma_i\}_{1 \le i \le R}$ are called singular values of $\mathbf{S}$,
the columns $\{\mathbf{w}_m\}_{1 \le m \le N_h}$ of $\mathbf{W}$ are called left singular vectors of $\mathbf{S}$, and the columns $\{\mathbf{z}_n\}_{1 \le n \le N}$ of $\mathbf{Z}$ are called right singular
vectors of $\mathbf{S}$; they are related by the following relations:

$$\mathbf{S}\mathbf{S}^T \mathbf{w}_m = \begin{cases} \sigma_m^2\,\mathbf{w}_m & 1 \le m \le R, \\ \mathbf{0} & R+1 \le m \le N_h, \end{cases} \qquad
\mathbf{S}^T\mathbf{S}\,\mathbf{z}_n = \begin{cases} \sigma_n^2\,\mathbf{z}_n & 1 \le n \le R, \\ \mathbf{0} & R+1 \le n \le N, \end{cases} \qquad (3.4)$$
$$\mathbf{S}\,\mathbf{z}_i = \sigma_i\,\mathbf{w}_i \quad \text{for } 1 \le i \le R, \qquad \mathbf{S}^T \mathbf{w}_i = \sigma_i\,\mathbf{z}_i \quad \text{for } 1 \le i \le R.$$

At the algebraic level, our goal is to approximate the columns of $\mathbf{S}$ by means of $L$ orthonormal vectors $\{\widetilde{\mathbf{w}}_1, \ldots, \widetilde{\mathbf{w}}_L\}$, with
$L < R$. It is an easy matter to show that for each $\mathbf{s}_n$, $n = 1, \ldots, N$, the element of $\mathrm{span}\{\widetilde{\mathbf{w}}_1, \ldots, \widetilde{\mathbf{w}}_L\}$ closest to $\mathbf{s}_n$ in the
Euclidean norm $\|\cdot\|_{\mathbb{R}^{N_h}} = \sqrt{(\cdot,\cdot)_{\mathbb{R}^{N_h}}}$ is given by

$$\sum_{l=1}^{L} \big(\mathbf{s}_n, \widetilde{\mathbf{w}}_l\big)_{\mathbb{R}^{N_h}}\, \widetilde{\mathbf{w}}_l.$$

Hence, we could measure the error committed by approximating the columns of $\mathbf{S}$ via the vectors $\{\widetilde{\mathbf{w}}_l\}_{1 \le l \le L}$ through the
quantity

$$\varepsilon(\widetilde{\mathbf{w}}_1, \ldots, \widetilde{\mathbf{w}}_L) = \sum_{n=1}^{N} \Big\| \mathbf{s}_n - \sum_{l=1}^{L} \big(\mathbf{s}_n, \widetilde{\mathbf{w}}_l\big)_{\mathbb{R}^{N_h}}\, \widetilde{\mathbf{w}}_l \Big\|^2_{\mathbb{R}^{N_h}}. \qquad (3.5)$$

The Schmidt-Eckart-Young theorem [17, 44] states that the POD basis of rank $L$, $\{\mathbf{w}_1, \ldots, \mathbf{w}_L\}$, consisting of the first $L$ left
singular vectors of $\mathbf{S}$, minimizes (3.5) among all orthonormal sets of $L$ vectors in $\mathbb{R}^{N_h}$. Therefore, in the POD-Galerkin RB
method, we set $\boldsymbol{\psi}_l = \mathbf{w}_l$, for all $l = 1, \ldots, L$, so that

$$\mathbf{V} = \big[\mathbf{w}_1 \,|\, \ldots \,|\, \mathbf{w}_L\big].$$

From a computational viewpoint, the first $L$ left singular vectors $\{\mathbf{w}_l\}_{1 \le l \le L}$ of $\mathbf{S}$ can be efficiently computed through the
so-called method of snapshots. We should distinguish two cases:

(a) if $N_h \le N$: directly solve the eigenvalue problems $\mathbf{S}\mathbf{S}^T \mathbf{w}_l = \lambda_l\,\mathbf{w}_l$, for $1 \le l \le L$;

(b) if $N_h > N$: compute the correlation matrix $\mathbf{M} = \mathbf{S}^T\mathbf{S}$ and solve the eigenvalue problems $\mathbf{M}\,\mathbf{z}_l = \lambda_l\,\mathbf{z}_l$, for $1 \le l \le L$.
Then, by (3.4), set $\mathbf{w}_l = \lambda_l^{-1/2}\,\mathbf{S}\,\mathbf{z}_l$, for $1 \le l \le L$.
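As an illustration of the two computational routes above, here is a short NumPy sketch: a thin-SVD variant that also picks L by the relative information content criterion mentioned in Subsection 3.2, and the method of snapshots for the case N_h > N. The function names and the tolerance are illustrative only, not taken from the paper.

```python
import numpy as np

def pod_basis(S, tol=1e-4):
    """POD basis of the snapshot matrix S (N_h x N) via a thin SVD.

    The rank L is selected through the relative information content
    criterion: keep the smallest L capturing a fraction 1 - tol of the
    total squared singular values.
    """
    W, sigma, _ = np.linalg.svd(S, full_matrices=False)
    energy = np.cumsum(sigma**2) / np.sum(sigma**2)
    L = int(np.searchsorted(energy, 1.0 - tol)) + 1
    return W[:, :L], sigma[:L]

def pod_basis_snapshots(S, L):
    """Method of snapshots for the case N_h > N: eigenpairs of M = S^T S."""
    lam, Z = np.linalg.eigh(S.T @ S)              # ascending eigenvalues
    lam, Z = lam[::-1][:L], Z[:, ::-1][:, :L]     # keep the L largest
    return (S @ Z) / np.sqrt(lam)                 # w_l = lambda_l^{-1/2} S z_l
```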

3.2. Implementation: details and issues


The numerical procedure presented so far can be efficiently carried out within an offline-online framework [39]. The
parameter-independent offline step consists of the generation of the snapshots through a high-fidelity, expensive scheme
and the subsequent construction of the reduced basis via POD. To determine an appropriate dimension for the basis, which
ensures a desired degree of accuracy, one can resort to empirical criteria, like, e.g., the relative information content [41].
Then, given a new parameter value $\mu \in \mathcal{P}$, the nonlinear reduced system (3.3) is solved online.
However, to enjoy a significant reduction in the computational burden with respect to traditional (full-order) discretization
techniques, the complexity of any online query should be independent of the original size of the problem. Yet, due
to the nonlinearity of the underlying PDE and the non-affinity in the parameter dependence (partially induced by the
transformation map $\Phi(\cdot;\mu_{g})$), the assembly of the reduced problems has to be embodied directly in the online stage, thus
seriously compromising the efficiency of the overall procedure [3]. Without escaping the algebraic framework, this can be
overcome by resorting to suitable techniques such as the discrete empirical interpolation method (DEIM) [10] or its matrix
variant (MDEIM) [37], aiming at recovering an affine dependency on the parameter $\mu$. However, the implementation of
such techniques is problem-dependent and of an intrusive nature, as it requires one to modify the assembly routines of the
corresponding computational code [9]. Moreover, any interpolation procedure unavoidably introduces a further level of
approximation. As a matter of fact, typically one needs to generate a larger number of snapshots in the offline stage and
then retain a larger number of POD modes to guarantee the same accuracy provided by the standard POD-Galerkin method
[3].

4. Artificial neural networks

Inspired by the biological information processing system (see, e.g., [21, 28]), an artificial neural network (ANN), often
referred to simply as a neural network, is a computational model able to learn from observational data, i.e., by example, thus
providing an alternative to the algorithmic programming paradigm [36]. Like its biological counterpart, it consists of a
collection of processing units, called (artificial) neurons, and a set of directed weighted synaptic connections among the
neurons. Data travel among neurons through the connections, following the direction imposed by the synapses. Hence, an
artificial neural network is an oriented graph, with the neurons as nodes and the synapses as oriented edges, whose weights
are adjusted by means of a training process to configure the network for a specific application [45].
In the following, we discuss the structure and training of a neural network, starting by detailing the working principles
of an artificial neuron.

4.1. Neuronal model


An artificial neuron represents a simplified model of a biological neuron [28]. To introduce the components of the
model, let us consider the neuron $j$ represented on the left in Fig. 4.1. Suppose that it is connected with $m$ sending neurons
$\{s_1, \ldots, s_m\}$ and $n$ receiving (target) neurons $\{r_1, \ldots, r_n\}$. Denoting by $y_\alpha(t) \in \mathbb{R}$ the (scalar) output of a generic neuron $\alpha$ at
time $t$ and by $w_{\alpha,\beta}$ the weight of the connection $(\alpha, \beta)$, neuron $j$ gets the weighted inputs $w_{s_k,j}\, y_{s_k}(t)$, $k = 1, \ldots, m$, at time
$t$, and sends out the output $y_j(t + \Delta t)$ to the target neurons $\{r_1, \ldots, r_n\}$ at time $t + \Delta t$. In particular, neuron $r_i$, $i = 1, \ldots, n$,
receives as input $w_{j,r_i}\, y_j(t + \Delta t)$. Note that in the context of ANNs, time is discretized by introducing the timestep $\Delta t$. This
is clearly not plausible from a biological viewpoint; however, it substantially simplifies the implementation. In the following,
we will avoid specifying the dependence on time unless strictly necessary, to lighten the notation.
An artificial neuron $j$ is completely characterized by three functions: the propagation function, the activation function
and the output function. The propagation function $f_{prop}$ converts the vectorial input $\mathbf{p} = [y_{s_1}, \ldots, y_{s_m}]^T \in \mathbb{R}^m$ into a scalar
$u_j$, often called net input, i.e.,

$$u_j = f_{prop}(w_{s_1,j}, \ldots, w_{s_m,j}, y_{s_1}, \ldots, y_{s_m}).$$

A common choice for $f_{prop}$ (used also in this work) is the weighted sum, adding up the scalar inputs multiplied by their
respective weights:

$$f_{prop}(w_{s_1,j}, \ldots, w_{s_m,j}, y_{s_1}, \ldots, y_{s_m}) = \sum_{k=1}^{m} w_{s_k,j}\, y_{s_k}.$$

[Figure 4.1. Visualization of the generic $j$-th neuron of an artificial neural network, including (right) or not (left) a bias neuron. On the left,
the neuron accumulates the weighted inputs $\{w_{s_1,j}\,y_{s_1}, \ldots, w_{s_m,j}\,y_{s_m}\}$ coming from the sending neurons $\{s_1, \ldots, s_m\}$; on
the right, the neuron accumulates the weighted inputs $\{w_{s_1,j}\,y_{s_1}, \ldots, w_{s_m,j}\,y_{s_m}, -\theta_j\}$ coming from the sending neurons
$\{s_1, \ldots, s_m, b\}$, with $b$ the bias neuron. In both situations, the neuron then fires $y_j$, sent to the target neurons $\{r_1, \ldots, r_n\}$ through the
synapses $\{w_{j,r_1}, \ldots, w_{j,r_n}\}$. The neuron threshold is reported in brackets within its body.]

At each timestep, the activation state $a_j$, often referred to as activation, quantifies to which extent neuron $j$ is currently
active or excited. It results from the activation function $f_{act}$, which combines the net input $u_j$ with a threshold $\theta_j \in \mathbb{R}$ [28]:

$$a_j = f_{act}(u_j; \theta_j) = f_{act}\Big(\sum_{k=1}^{m} w_{s_k,j}\, y_{s_k};\ \theta_j\Big).$$

Note that the threshold $\theta_j$ is a parameter of the network and as such one may choose to adapt it through a training process,
exactly as it can be done for the synaptic weights. To ease the runtime access of $\theta_j$, it is common practice to introduce a
bias neuron in the network. A bias neuron is a continuously firing neuron, with constant output $y_b = 1$, which is directly
connected with neuron $j$, assigning the bias weight $w_{b,j} = -\theta_j$ to the connection. As can be deduced from the representation
on the right in Fig. 4.1, $\theta_j$ is now treated as a synaptic weight, while the neuron threshold is set to zero. Therefore, the net
input and the activation state can respectively be expressed as

$$u_j = \sum_{k=1}^{m} w_{s_k,j}\, y_{s_k} - \theta_j \qquad \text{and} \qquad a_j = f_{act}\Big(\sum_{k=1}^{m} w_{s_k,j}\, y_{s_k} - \theta_j\Big).$$

There exist various choices for the activation function. Sigmoid activation functions have been widely used for the
realization of artificial neural networks due to their graceful combination of linear and nonlinear behaviour [21]. Sigmoid
functions are s-shaped, monotonically increasing, and assume values in a bounded interval; a well-known instance is given by
the hyperbolic tangent,

$$f_{act}(v) = \frac{e^{v} - e^{-v}}{e^{v} + e^{-v}}.$$

Finally, the output function $f_{out}$ calculates the scalar output $y_j \in \mathbb{R}$ based on the activation state $a_j$ of the neuron:

$$y_j = f_{out}(a_j).$$

Often, $f_{out}$ is the identity function, so that the activation and the output of a neuron coincide, i.e., $y_j = f_{out}(a_j) = a_j$. The output
$y_j$ can then either be sent to other neurons or constitute a component of the overall output vector of the network, as for
the neurons in the output layer of a feedforward neural network, illustrated in the following subsection.
It should be pointed out that the neuronal model presented so far refers to the so-called computing neuron, i.e., a neuron
which processes input information to provide a response. However, in a neural network one may also identify source
neurons, supplying the network with the respective components of the activation pattern (input vector), without performing
any computation [21].
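A minimal sketch of the neuron model just described, assuming the weighted-sum propagation function, the hyperbolic-tangent activation and the identity output function; the function name is illustrative and not part of the paper.

```python
import numpy as np

def neuron_output(y_in, w, theta):
    """Forward pass of a single computing neuron j.

    y_in  : outputs y_{s_1}, ..., y_{s_m} of the sending neurons
    w     : synaptic weights w_{s_1,j}, ..., w_{s_m,j}
    theta : threshold theta_j (equivalently, a bias weight -theta_j)
    """
    u = np.dot(w, y_in) - theta   # weighted-sum propagation with threshold
    a = np.tanh(u)                # hyperbolic-tangent activation
    return a                      # identity output function: y_j = a_j
```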

[Figure 4.2. A three-layer feedforward neural network, with three input neurons, two hidden layers each one consisting of six neurons,
and four output neurons. Within each connection, information flows from left to right.]

4.2. Network topology: the feedforward neural network


The interconnection of neurons within a network defines the topology of the network itself, i.e., its design. In the
literature, many network architectures have been proposed, sometimes tailored to a specific application. Among all,
feedforward neural networks, also called perceptrons [43], have been preferred in function regression tasks.
In a feedforward neural network, neurons are arranged into layers, with one input layer of $M_I$ source neurons, $K$ hidden
layers, each consisting of $H_k$ computing neurons, $k = 1, \ldots, K$, and an output layer of $M_O$ computing neurons. As a
characteristic property, neurons in a layer can only be connected with neurons in the next layer towards the output layer.
Then, an activation pattern $\mathbf{p} \in \mathbb{R}^{M_I}$, supplied to the network through the source nodes in the first layer, provides the input
signal for the neurons in the first hidden layer. For each hidden layer, its output signals give the input pattern for the
following layer. In this way, information travels towards the last layer of the network, i.e., the output layer, whose outputs
constitute the components of the overall output $\mathbf{q} \in \mathbb{R}^{M_O}$ of the network¹. Hence, a feedforward network establishes a map
between the input space $\mathbb{R}^{M_I}$ and the output space $\mathbb{R}^{M_O}$. This makes this network architecture particularly suitable for
continuous function approximation.
Feedforward networks can be classified according to the number of hidden layers or, equivalently, the number of
layers of trainable weights. Single-layer perceptrons (SLPs) consist of the input and output layer, without any hidden layer.
Because of their simple structure, the range of application of SLPs is rather limited. Indeed, only linearly separable data can
be properly represented using SLPs [28]. Conversely, multi-layer perceptrons (MLPs), with at least one hidden layer, are
universal function approximators, as stated by Cybenko [12, 13]. In detail:

(i) MLPs with one hidden layer and differentiable activation functions can approximate any continuous function;

(ii) MLPs with two hidden layers and differentiable activation functions can approximate any function.

Therefore, in many practical applications there is no reason to employ MLPs with more than two hidden layers. However, (i)
and (ii) give no practical advice on either the number of hidden neurons or the number of samples required to
train the network: these must be found through a trial-and-error (and likely time-consuming) approach [20].
An instance of a three-layer (i.e., two hidden layers plus the output layer) feedforward network is offered in Fig. 4.2. In
this case, we have $M_I = 3$ input neurons (denoted with the letter i), $H_1 = H_2 = 6$ hidden neurons (letter h for both hidden
layers), and $M_O = 4$ output neurons (letter o). In particular, it represents an instance of a completely linked perceptron,
since each neuron is directly connected with all neurons in the following layer.
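To illustrate the flow of information through the completely linked perceptron of Fig. 4.2, here is a small sketch with the same 3-6-6-4 layout; the random weights and the linear output layer are assumptions made purely for illustration (the paper does not fix the output activation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths of the completely linked perceptron in Fig. 4.2: 3-6-6-4.
sizes = [3, 6, 6, 4]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]

def mlp_forward(p):
    """Propagate an activation pattern p in R^{M_I} to the output q in R^{M_O}."""
    a = p
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)                 # hidden layers: tanh activation
    return a @ weights[-1] + biases[-1]        # (assumed) linear output layer

q = mlp_forward(rng.uniform(size=3))
print(q.shape)   # (4,)
```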

1 Please note that while the output of a single neuron is denoted with the letter y, we use the letter q (bolded) to indicate the overall output of the
network. Clearly, for the j -th output neuron, its output y j coincides with the corresponding entry of q, i.e., q j = y j for any j = 1, . . . , MO .

4.3. Training a multi-layer feedforward neural network
As previously mentioned, the principal characteristic of a neural network is its capability of learning from the surrounding
environment, storing the acquired knowledge via its internal parameters, i.e., the synaptic and bias weights.
Learning is accomplished through a training process, during which the network is exposed to a collection of examples,
called training patterns. According to some performance measure, the weights are then adjusted by means of a well-defined
set of rules. The learning procedure is therefore an algorithm, typically iterative, such that after a successful training, the
neural network provides reasonable responses for unknown problems of the same class as the training set. This property is
known as generalization [28].
Training algorithms can be classified based on the nature of the training set, i.e., the set of training patterns. We can
then distinguish three learning paradigms: supervised learning, unsupervised learning and reinforcement learning [20].
The choice of the learning paradigm is clearly task-dependent. In particular, for function approximation, the supervised
learning paradigm is the natural choice.
Consider the nonlinear unknown function $f : \mathbb{R}^{M_I} \to \mathbb{R}^{M_O}$ and a set of labeled examples $\{\mathbf{p}_i, \mathbf{t}_i = f(\mathbf{p}_i)\}_{1 \le i \le N_{tr}}$, which
form the training set. Note that to each input pattern $\mathbf{p}_i \in \mathbb{R}^{M_I}$, $i = 1, \ldots, N_{tr}$, corresponds an associated desired output or
teaching input $\mathbf{t}_i \in \mathbb{R}^{M_O}$. The goal is to approximate $f$ over a domain $D \subset \mathbb{R}^{M_I}$ up to a user-defined tolerance $\epsilon$, i.e.,

$$\|F(\mathbf{x}) - f(\mathbf{x})\| < \epsilon \quad \forall \mathbf{x} \in D,$$

where $F : \mathbb{R}^{M_I} \to \mathbb{R}^{M_O}$ is the actual input-output map established by the neural network and $\|\cdot\|$ is some suitable norm
on $\mathbb{R}^{M_O}$. For this purpose, consider the synapse between a sending neuron $i$ and a target neuron $j$. At the $t$-th iteration
(also called epoch) of the training procedure, the weight $w_{i,j}(t)$ of the connection $(i,j)$ is modified by the time-dependent
quantity $\Delta w_{i,j}(t)$, whose form depends on the specific learning rule. Hence, at the subsequent iteration $t+1$ the synaptic
weight is simply given by

$$w_{i,j}(t+1) = w_{i,j}(t) + \Delta w_{i,j}(t).$$
The whole training process is driven by an error or performance function $E$, which measures the discrepancy between
the neural network knowledge of the surrounding environment and the actual state of the environment itself. Therefore,
every learning rule aims to minimize the performance $E$, thought of as a scalar function of the free parameters of the network,
namely

$$E = E(\mathbf{w}) \in \mathbb{R},$$

with $\mathbf{w} \in \mathbb{R}^{W}$ being the vector collecting all the synaptic and bias weights. Thus, the point on the error surface reached at
the end of a successful training process provides the optimal configuration $\mathbf{w}_{opt}$ for the network. A common choice for the
performance function is the (accumulated) mean squared error (MSE)

$$E(\mathbf{w}) = \sum_{p \in P} E_p(\mathbf{w}) = \sum_{p \in P} \frac{1}{M_O} \sum_{j=1}^{M_O} \big(t_{p,j} - q_{p,j}\big)^2, \qquad (4.1)$$

where for each input pattern $p$ belonging to the training set $P$, $\mathbf{t}_p$ and $\mathbf{q}_p$ denote the corresponding teaching input and
actual output, respectively. Observe that (4.1) accounts for the error committed on each input pattern in the training set,
leading to an offline learning procedure.
In our numerical tests, the weights are iteratively adjusted via the Levenberg-Marquardt algorithm [19, 32], whose cost
increases nonlinearly with the number of neurons in the network, making it inefficient for large networks. However, it
turns out to be efficient and accurate for networks with a few hundred connections.
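The authors train with a MATLAB Levenberg-Marquardt routine; purely as an illustration of the idea, the sketch below fits the weights of a two-hidden-layer perceptron by applying SciPy's MINPACK-based Levenberg-Marquardt solver to the stacked residuals $t_p - q_p$, whose squared sum is proportional to the accumulated MSE (4.1). All sizes and data are placeholders, and this is not the authors' training code.

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative sizes only: P inputs, L outputs, H neurons per hidden layer.
P, H, L, Ntr = 3, 10, 5, 50
rng = np.random.default_rng(0)
X = rng.uniform(size=(Ntr, P))      # training inputs (placeholder data)
T = rng.normal(size=(Ntr, L))       # teaching inputs (placeholder data)

shapes = [(P, H), (H,), (H, H), (H,), (H, L), (L,)]
sizes = [int(np.prod(s)) for s in shapes]

def unpack(w):
    parts, k = [], 0
    for s, n in zip(shapes, sizes):
        parts.append(w[k:k + n].reshape(s))
        k += n
    return parts

def forward(w, X):
    W1, b1, W2, b2, W3, b3 = unpack(w)
    A1 = np.tanh(X @ W1 + b1)       # first hidden layer
    A2 = np.tanh(A1 @ W2 + b2)      # second hidden layer
    return A2 @ W3 + b3             # linear output layer

def residuals(w):
    # Stacked residuals t - q; their squared sum is proportional to the MSE (4.1).
    return (T - forward(w, X)).ravel()

w0 = 0.1 * rng.normal(size=sum(sizes))
sol = least_squares(residuals, w0, method="lm")   # Levenberg-Marquardt (MINPACK)
print("sum of squared residuals:", 2 * sol.cost)
```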

5. A non-intrusive reduced basis method using artificial neural networks

The scenario portrayed so far motivates the search for an alternative approach to tackle any online query within the
reduced basis framework. To this end, let us remark that there exists a one-to-one correspondence between the reduced
space $V_{rb}$ and the column space $\mathrm{Col}(\mathbf{V})$ of $\mathbf{V}$. Indeed, letting $\{\phi_1, \ldots, \phi_{N_h}\}$ be a basis for $V_h$ and $\{\psi_1, \ldots, \psi_L\}$ be the reduced
basis, from Eq. (3.2) it follows that

$$V_{rb} \ni v_L = \sum_{j=1}^{L} v_{rb}^{(j)}\, \psi_j = \sum_{j=1}^{L} v_{rb}^{(j)} \sum_{i=1}^{N_h} V_{i,j}\, \phi_i = \sum_{i=1}^{N_h} \big(\mathbf{V}\,\mathbf{v}_{rb}\big)_i\, \phi_i \;\longleftrightarrow\; \mathbf{v}_L \in \mathrm{Col}(\mathbf{V}).$$

In particular, this implies that the projection of any $v_h \in V_h$ onto $V_{rb}$ in the discrete scalar product $(\cdot,\cdot)_h$,

$$(\chi_h, \xi_h)_h = \sum_{i=1}^{N_h} \chi_h^{(i)}\, \xi_h^{(i)} = (\boldsymbol{\chi}_h, \boldsymbol{\xi}_h)_{\mathbb{R}^{N_h}} \quad \forall \chi_h, \xi_h \in V_h, \qquad (5.1)$$

algebraically corresponds to the projection $\mathbf{v}_h^{V}$ of $\mathbf{v}_h$ onto $\mathrm{Col}(\mathbf{V})$ in the Euclidean scalar product, given by

$$\mathbf{v}_h^{V} = \mathbf{P}\,\mathbf{v}_h \quad \text{with} \quad \mathbf{P} = \mathbf{V}\mathbf{V}^T \in \mathbb{R}^{N_h \times N_h}.$$

Note that $\mathbf{v}_h^{V}$ is the element of $\mathrm{Col}(\mathbf{V})$ closest to $\mathbf{v}_h$ in the Euclidean norm, i.e.,

$$\big\|\mathbf{v}_h - \mathbf{v}_h^{V}\big\|_{\mathbb{R}^{N_h}} = \inf_{\mathbf{w}_h \in \mathrm{Col}(\mathbf{V})} \|\mathbf{v}_h - \mathbf{w}_h\|_{\mathbb{R}^{N_h}}.$$

Therefore, the element of $V_{rb}$ closest to the high-fidelity solution $u_h$ in the discrete norm $\|\cdot\|_h = \sqrt{(\cdot,\cdot)_h}$ can be expressed as

$$u_h^{V}(x; \mu) = \sum_{j=1}^{N_h} \big(\mathbf{V}\mathbf{V}^T \mathbf{u}_h(\mu)\big)_j\, \phi_j(x) = \sum_{i=1}^{L} \big(\mathbf{V}^T \mathbf{u}_h(\mu)\big)_i\, \psi_i(x).$$

Motivated by this last equality, once a reduced basis has been constructed (e.g., via POD of the snapshot matrix), we aim at
approximating the function

$$\pi : \mathcal{P} \subset \mathbb{R}^{P} \to \mathbb{R}^{L}, \qquad \mu \mapsto \mathbf{V}^T \mathbf{u}_h(\mu), \qquad (5.2)$$

which maps each input vector parameter $\mu \in \mathcal{P}$ to the coefficients $\mathbf{V}^T \mathbf{u}_h(\mu)$ for the expansion of $u_h^{V}$ in the reduced basis
$\{\psi_i\}_{1 \le i \le L}$. Then, given a new parameter instance $\mu$, the associated RB solution is simply given by the evaluation at $\mu$ of the
so-built approximation $\hat{\pi}$ of $\pi$, i.e.,

$$\mathbf{u}_{rb}(\mu) = \hat{\pi}(\mu),$$

and, consequently,

$$\mathbf{u}_L(\mu) = \mathbf{V}\,\hat{\pi}(\mu).$$

Note that, provided that the construction of $\hat{\pi}$ is carried out within the offline stage, this approach leads to a non-intrusive
RB method, enabling a complete decoupling between the online step and the underlying full-order model. Moreover, the
accuracy of the resulting reduced solution uniquely relies on the quality of the reduced basis and the effectiveness of the
approximation $\hat{\pi}$ of the map $\pi$, which we assume sufficiently smooth.
In the literature, different approaches for the interpolation of (5.2) have been developed, e.g., exploiting some geometrical
considerations concerning the discrete solution manifold $\mathcal{M}_h$ [1], or employing radial basis functions [11]. In this work
we resort to artificial neural networks for the nonlinear regression of the map $\pi$, leading to the POD-NN RB method. As
described in Subsection 4.3, any neural network is tailored to the particular application at hand by means of a preliminary
training phase. Here, we are concerned with a function regression task, thus we straightforwardly adopt a supervised
learning paradigm, training the perceptron via exposure to a collection of (known) input-output pairs

$$P_{tr} = \big\{\big(\mu^{(i)},\ \mathbf{V}^T \mathbf{u}_h(\mu^{(i)})\big)\big\}_{1 \le i \le N_{tr}}.$$

According to the notation and nomenclature previously introduced, for any $i = 1, \ldots, N_{tr}$, $\mathbf{p}_i = \mu^{(i)} \in \mathbb{R}^{P}$ represents the
input pattern and $\mathbf{t}_i = \mathbf{V}^T \mathbf{u}_h(\mu^{(i)}) \in \mathbb{R}^{L}$ the associated teaching input; together, they constitute a training pattern. In this
respect, note that the teaching inputs $\mathbf{V}^T \mathbf{u}_h(\mu^{(i)})$, $i = 1, \ldots, N_{tr}$, are generated through the FE solver. On the one hand,
this ensures the reliability of the teaching patterns, given the assumed high fidelity of the FE scheme. On the other hand,

Algorithm 5.1 The offline and online stages for the POD-NN RB method.

1: function [V, w_rb] = PODNN_OFFLINE(P, Omega_h, N)
2:   generate the parameter set Xi_N = {mu^(1), ..., mu^(N)}
3:   compute the high-fidelity solutions {u_h(mu^(1)), ..., u_h(mu^(N))} via the FE-Newton method
4:   generate the POD basis functions {w_1, ..., w_L} via the method of snapshots
5:   assemble V = [w_1 | ... | w_L]
6:   find an optimal network configuration w_rb relying upon LHS and the Levenberg-Marquardt algorithm
7: end function

1: function u_L^NN(mu) = PODNN_ONLINE(mu, V, w_rb)
2:   evaluate the output u_rb^NN(mu) of the network for the input vector mu
3:   compute u_L^NN(mu) = V u_rb^NN(mu)
4: end function

this also suggests incorporating the learning phase of the perceptron into the offline step of the POD-NN RB method, as
described in Algorithm 5.1. In doing so, we exploit the natural decoupling between the training and the evaluation of neural
networks, thus fulfilling the necessary requirement to ensure online efficiency.
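A compact sketch of the two stages of Algorithm 5.1, under the assumption that a high-fidelity solver `fe_solve(mu)` returning nodal vectors, a POD routine such as the one sketched after Subsection 3.1 (returning the basis and the singular values), and a regressor with scikit-learn-style `fit`/`predict` methods are available; all of these are hypothetical stand-ins for the authors' FE and network codes.

```python
import numpy as np

def podnn_offline(sample_params, fe_solve, pod_basis, regressor):
    """Offline stage: snapshots -> POD basis V -> train network on mu -> V^T u_h(mu)."""
    S = np.column_stack([fe_solve(mu) for mu in sample_params])  # snapshot matrix
    V, _ = pod_basis(S)                                          # assumed to return (V, singular values)
    targets = V.T @ S                                            # reduced coefficients (L x N)
    regressor.fit(np.asarray(sample_params), targets.T)          # learn mu -> V^T u_h(mu)
    return V, regressor

def podnn_online(mu, V, regressor):
    """Online stage: evaluate the network and lift to the FE space."""
    urb = regressor.predict(np.atleast_2d(mu))[0]                # u_rb^NN(mu)
    return V @ urb                                               # u_L^NN(mu) = V u_rb^NN(mu)
```

The key point is that `fe_solve` is never called inside `podnn_online`, which is what makes the method non-intrusive and online-efficient.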
It should be pointed out that the design of an effective learning procedure may require a larger number of snapshots than
the generation of the reduced space does. Moreover, we mentioned in Subsection 4.2 the time-consuming yet unavoidable
trial-and-error approach which one should pursue in the search for an optimal network topology. To this end, we propose
an automatic procedure.
While Cybenko's theorems (see Subsection 4.2) allow one to consider perceptrons with no more than two hidden layers,
no similar a priori and general results are available for the number $H$ of neurons per layer². Hence, given an initial number
$N_{tr}$ of training samples (say $N_{tr} = 100$), we train the network for increasing values of $H$, stopping when overfitting of the
training data occurs, due to an excessive number of neurons with respect to the number of training patterns. In case the
best configuration, i.e., the one yielding the smallest error on a test data set, does not meet a desired level of accuracy, we
generate a new set of snapshots, which will enlarge the current training set, and then proceed to re-train the network in
different configurations. At this point, it is worth pointing out two issues.
(i) Once the training set has been enriched, we can limit ourselves to network configurations including a number of
neurons no smaller than that of the current optimal network. Indeed, the error on the test data set decreases with the
number of patterns the neural network is exposed to during the learning phase, and the larger the number of neurons,
the faster the decay.

(ii) In order to maximize the additional quantity of information available for the learning, we should ensure that the new
training inputs, i.e., parameter values, do not overlap with the ones already present in the training parameter set. To
this aim, we pursue a heuristic approach, employing, at each iteration, latin hypercube sampling [25], which
provides a good compromise between randomness and even coverage of the parameter domain; a minimal sampling
sketch is given after this list.
The procedure is then iterated until a satisfactory degree of accuracy and generalization is attained, or the available
resources (i.e., computing power, running time and memory space) are exhausted. Therefore, the speed-up enabled at the
online stage comes at the cost of an extended offline phase.
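A minimal sketch of one such latin hypercube draw using scipy.stats.qmc; the parameter bounds are those of the 1D test case in Subsection 6.1.1 and serve only as an example.

```python
import numpy as np
from scipy.stats import qmc

# Parameter box of the 1D Poisson test case (Subsection 6.1.1), used here as an example.
l_bounds, u_bounds = [1.0, 1.0, -0.5], [3.0, 3.0, 0.5]

sampler = qmc.LatinHypercube(d=3, seed=0)
unit_sample = sampler.random(n=100)                    # 100 points in [0, 1]^3
Xi_tr = qmc.scale(unit_sample, l_bounds, u_bounds)     # map to the parameter domain P
print(Xi_tr.shape)  # (100, 3)
```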
In our numerical tests, we rely on the mean squared error (4.1) as the performance function driving the neural network
training process. To motivate this choice, let $\mathbf{u}_{rb}^{NN}(\mu) \in \mathbb{R}^{L}$ be the (actual) output provided by the network for a given
input $\mu$, and

$$\mathbf{u}_L^{NN}(\mu) = \mathbf{V}\,\mathbf{u}_{rb}^{NN}(\mu) \in \mathrm{Col}(\mathbf{V}) \subset \mathbb{R}^{N_h}.$$

2 Here we uniquely consider networks with the same number of neurons in both the first and the second hidden layer, but this is not required.

Then (omitting the dependence on the input vector for ease of notation):

$$MSE\big(\mathbf{u}_{rb}^{NN}, \mathbf{V}^T \mathbf{u}_h\big) \propto \big\|\mathbf{u}_{rb}^{NN} - \mathbf{V}^T \mathbf{u}_h\big\|^2_{\mathbb{R}^{L}} = \big\|\mathbf{u}_L^{NN} - \mathbf{V}\mathbf{V}^T \mathbf{u}_h\big\|^2_{\mathbb{R}^{N_h}} = \big\|u_L^{NN} - u_h^{V}\big\|^2_h,$$

where

$$u_L^{NN}(x; \mu) = \sum_{i=1}^{L} \big(\mathbf{u}_{rb}^{NN}(\mu)\big)_i\, \psi_i(x) \in V_{rb}.$$

Therefore, for any training input $\mu^{(i)}$, $i = 1, \ldots, N_{tr}$, by minimizing the MSE we minimize the distance (in the discrete norm
$\|\cdot\|_h$) between the approximation provided by the neural network and the projection of the FE solution onto the reduced
space $V_{rb}$. The proper generalization to other parameter instances, not included in the training set, is then ensured by the
implementation of suitable techniques (e.g., early stopping [33], generalized cross validation [27]) aimed at preventing the
network from overfitting the training data.

6. Numerical results

In this section, we present the numerical results obtained via the FE, POD-G and POD-NN methods applied to
parametrized full-Dirichlet boundary value problems (BVPs) for the one-dimensional and two-dimensional nonlinear
Poisson equation and the two-dimensional incompressible steady Navier-Stokes equations. In the one-dimensional case we
deal uniquely with physical parameters, whereas in two spatial dimensions we consider purely geometric parametrizations.
The two RB techniques considered in this work are compared both in terms of accuracy and performance during the
online stage. For this, let $\|\cdot\|$ be the canonical (Euclidean) norm on $\mathbb{R}^{N_h}$. In the online phases of the POD-G and POD-NN
methods, for a new parameter value $\mu \in \mathcal{P}$ (involved neither in the generation of the POD basis nor in the identification
of an optimal neural network), the following relative errors with respect to the high-fidelity solution $\mathbf{u}_h(\mu)$ are analyzed:

(a) the POD-G relative error,
$$\varepsilon_{PODG}(L, \mu) = \frac{\|\mathbf{u}_h(\mu) - \mathbf{u}_L(\mu)\|}{\|\mathbf{u}_h(\mu)\|} = \frac{\|\mathbf{u}_h(\mu) - \mathbf{V}\,\mathbf{u}_{rb}(\mu)\|}{\|\mathbf{u}_h(\mu)\|}; \qquad (6.1)$$

(b) the POD-NN relative error,
$$\varepsilon_{PODNN}(L, \mu) = \frac{\|\mathbf{u}_h(\mu) - \mathbf{u}_L^{NN}(\mu)\|}{\|\mathbf{u}_h(\mu)\|} = \frac{\|\mathbf{u}_h(\mu) - \mathbf{V}\,\mathbf{u}_{rb}^{NN}(\mu)\|}{\|\mathbf{u}_h(\mu)\|}; \qquad (6.2)$$

(c) the relative projection error, i.e., the error committed by approximating the high-fidelity solution with its projection
(in the discrete scalar product $(\cdot,\cdot)_h$, see (5.1)) onto the reduced space $V_{rb}$,
$$\varepsilon_{V}(L, \mu) = \frac{\|\mathbf{u}_h(\mu) - \mathbf{u}_h^{V}(\mu)\|}{\|\mathbf{u}_h(\mu)\|} = \frac{\|\mathbf{u}_h(\mu) - \mathbf{V}\mathbf{V}^T \mathbf{u}_h(\mu)\|}{\|\mathbf{u}_h(\mu)\|}. \qquad (6.3)$$

Clearly, (6.3) provides a lower bound for both (6.1) and (6.2).

The above errors are evaluated on a test parameter set $\Xi_{te} \subset \mathcal{P}$ consisting of $N_{te}$ randomly picked samples. Provided that a
sufficiently large number of test values is chosen, statistics for $\varepsilon_{PODG}(L,\cdot)$, $\varepsilon_{PODNN}(L,\cdot)$ and $\varepsilon_{V}(L,\cdot)$ on $\Xi_{te}$ can reasonably
be assumed independent of the particular choice made for $\Xi_{te}$, thus making any subsequent error analysis reliable. In
particular, in our numerical studies we let $N_{te}$ range from 50 up to 100, and we consider the average of the errors over the
test data set, respectively denoted by

$$\bar{\varepsilon}_{PODG} = \bar{\varepsilon}_{PODG}(L) = \frac{\sum_{\mu \in \Xi_{te}} \varepsilon_{PODG}(L, \mu)}{N_{te}}, \qquad
\bar{\varepsilon}_{PODNN} = \bar{\varepsilon}_{PODNN}(L) = \frac{\sum_{\mu \in \Xi_{te}} \varepsilon_{PODNN}(L, \mu)}{N_{te}}, \qquad
\bar{\varepsilon}_{V} = \bar{\varepsilon}_{V}(L) = \frac{\sum_{\mu \in \Xi_{te}} \varepsilon_{V}(L, \mu)}{N_{te}}.$$
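Given the quantities above, the three relative errors reduce to a few norm evaluations; a small sketch (with hypothetical argument names) follows.

```python
import numpy as np

def relative_errors(uh, V, urb_g, urb_nn):
    """Relative errors (6.1)-(6.3) for a single parameter value.

    uh     : high-fidelity nodal vector u_h(mu)
    V      : POD basis matrix (N_h x L)
    urb_g  : POD-G reduced coefficients u_rb(mu)
    urb_nn : POD-NN reduced coefficients u_rb^NN(mu)
    """
    nrm = np.linalg.norm(uh)
    eps_podg = np.linalg.norm(uh - V @ urb_g) / nrm       # (6.1)
    eps_podnn = np.linalg.norm(uh - V @ urb_nn) / nrm     # (6.2)
    eps_proj = np.linalg.norm(uh - V @ (V.T @ uh)) / nrm  # (6.3), lower bound for both
    return eps_podg, eps_podnn, eps_proj
```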

In the training phase of the neural networks, we rely upon the well-known cross-validation procedure [27]. While the training
samples $\Xi_{tr} \subset \mathcal{P}$ are generated via successive latin hypercube samplings (as explained in Section 5), the validation inputs
$\Xi_{va} \subset \mathcal{P}$ are randomly picked through a Monte Carlo sampling, as done for $\Xi_{te}$. The ratio between the size of $\Xi_{va}$ and $\Xi_{tr}$
is set to 0.3. The training is then performed using the multiple-restarts approach to prevent the results from depending on
the way the weights are (randomly) initialized; see, e.g., [28, 33]. Typically, five restarts are performed for each network
topology. To ease the search for an optimal network configuration and to exploit Cybenko's results (see Subsection
4.2), we limit ourselves to three-layer neural networks, with the same number of neurons in both hidden layers. For each
topology, the hyperbolic tangent is used as the activation function for the hidden neurons.
All the results presented in the following subsections have been obtained via a MATLAB® implementation of the FE,
POD-G and POD-NN methods, with all the simulations carried out on a laptop equipped with an Intel® Core™ i7 CPU @
2.50 GHz.

6.1. Nonlinear Poisson equation


Despite its rather simple form, the Poisson equation has proved effective for modeling steady phenomena
occurring in, e.g., electromagnetism, heat transfer and underground flows [34]. We consider the following version of the
parametrized Poisson equation for a state variable $\widetilde{u} = \widetilde{u}(\mu)$:

$$-\widetilde{\nabla} \cdot \big(\widetilde{k}(\widetilde{x}, \widetilde{u}(\mu); \mu_{ph})\, \widetilde{\nabla}\widetilde{u}(\mu)\big) = \widetilde{s}(\widetilde{x}; \mu_{ph}) \quad \text{in } \widetilde{\Omega}(\mu_g), \qquad (6.4a)$$
$$\widetilde{u}(\mu) = \widetilde{h}(\widetilde{\sigma}; \mu_{ph}) \quad \text{on } \widetilde{\Gamma}_D, \qquad (6.4b)$$
$$\widetilde{k}(\widetilde{\sigma}, \widetilde{u}(\mu); \mu_{ph})\, \widetilde{\nabla}\widetilde{u}(\mu) \cdot \widetilde{\mathbf{n}} = 0 \quad \text{on } \widetilde{\Gamma}_N. \qquad (6.4c)$$

Here, for any $\mu_g \in \mathcal{P}_g$, $\widetilde{x}$ and $\widetilde{\sigma}$ denote a generic point in $\widetilde{\Omega}$ and on $\widetilde{\Gamma}$, respectively, $\widetilde{\nabla}$ is the nabla operator with respect to $\widetilde{x}$,
$\widetilde{\mathbf{n}} = \widetilde{\mathbf{n}}(\widetilde{\sigma})$ denotes the outward normal to $\widetilde{\Gamma}$ in $\widetilde{\sigma}$, $\widetilde{k} : \widetilde{\Omega} \times \mathbb{R} \times \mathcal{P}_{ph} \to (0, \infty)$ is the diffusion coefficient, $\widetilde{s} : \widetilde{\Omega} \times \mathcal{P}_{ph} \to \mathbb{R}$ is the
source term, and $\widetilde{h} : \widetilde{\Gamma}_D \times \mathcal{P}_{ph} \to \mathbb{R}$ encodes the Dirichlet boundary conditions. To ease the subsequent discussion, we
limit the attention to homogeneous Neumann boundary constraints.
Let us fix $\mu \in \mathcal{P}$ and set

$$\widetilde{V} = \widetilde{V}(\mu_g) = H^1_{\widetilde{\Gamma}_D}\big(\widetilde{\Omega}(\mu_g)\big) = \big\{v \in H^1\big(\widetilde{\Omega}(\mu_g)\big) : v|_{\widetilde{\Gamma}_D} = 0\big\}.$$

Multiplying (6.4a) by a test function $v \in \widetilde{V}$, integrating over $\widetilde{\Omega}$, and exploiting integration by parts on the left-hand side
yields

$$\int_{\widetilde{\Omega}(\mu_g)} \widetilde{k}(\widetilde{u}(\mu); \mu_{ph})\, \widetilde{\nabla}\widetilde{u}(\mu) \cdot \widetilde{\nabla}v \, d\widetilde{\Omega}(\mu_g) = \int_{\widetilde{\Omega}(\mu_g)} \widetilde{s}(\mu_{ph})\, v \, d\widetilde{\Omega}(\mu_g), \qquad (6.5)$$

where we have omitted the dependence on the space variable $\widetilde{x}$ for ease of notation. For the integrals in Eq. (6.5) to be
well-defined, we require, for any $\mu_g \in \mathcal{P}_g$,

$$|\widetilde{k}(\widetilde{x}, r; \mu_g)| < \infty \ \text{ for almost any (a.a.) } \widetilde{x} \in \widetilde{\Omega}(\mu_g), \ r \in \mathbb{R}, \qquad \text{and} \qquad \widetilde{s}(\mu_{ph}) \in L^2\big(\widetilde{\Omega}(\mu_g)\big).$$

Let then $\widetilde{l} = \widetilde{l}(\mu) \in H^1\big(\widetilde{\Omega}(\mu_g)\big)$ be a lifting function such that $\widetilde{l}(\mu)|_{\widetilde{\Gamma}_D} = \widetilde{h}(\mu_{ph})$, with $\widetilde{h}(\mu_{ph}) \in H^{1/2}\big(\widetilde{\Gamma}_D\big)$. We assume that
such a function can be constructed, e.g., by interpolation of the boundary condition. Hence, upon defining

$$\widetilde{a}(w, v; \mu) := \int_{\widetilde{\Omega}(\mu_g)} \widetilde{k}(w + \widetilde{l}(\mu); \mu_{ph})\, \widetilde{\nabla}w \cdot \widetilde{\nabla}v \, d\widetilde{\Omega}(\mu_g) + \int_{\widetilde{\Omega}(\mu_g)} \widetilde{k}(w + \widetilde{l}(\mu); \mu_{ph})\, \widetilde{\nabla}\widetilde{l}(\mu) \cdot \widetilde{\nabla}v \, d\widetilde{\Omega}(\mu_g) \quad \forall w, v \in \widetilde{V},$$
$$\widetilde{f}(v; \mu) := \int_{\widetilde{\Omega}(\mu_g)} \widetilde{s}(\mu_{ph})\, v \, d\widetilde{\Omega}(\mu_g) \quad \forall v \in \widetilde{V}, \qquad (6.6)$$

the weak formulation of problem (6.4) reads: given $\mu \in \mathcal{P}$, find $\widetilde{u}(\mu) \in \widetilde{V}(\mu_g)$ such that

$$\widetilde{a}(\widetilde{u}(\mu), v; \mu) = \widetilde{f}(v; \mu) \quad \forall v \in \widetilde{V}(\mu_g). \qquad (6.7)$$

Then, the weak solution of problem (6.4) is given by $\widetilde{u}(\mu) + \widetilde{l}(\mu)$.
Let us now re-state the variational problem (6.7) on the reference domain $\Omega$. For this purpose, let $\Gamma_D$ and $\Gamma_N$ be the
portions of the boundary $\Gamma = \partial\Omega$ on which we impose Dirichlet and Neumann boundary conditions, respectively. Moreover,
we denote by $J_\Phi(\cdot;\mu_g)$ the Jacobian of the parametrized map $\Phi(\cdot;\mu_g)$, with determinant $|J_\Phi(\cdot;\mu_g)|$, and we set

$$k(x, \cdot; \mu) = \widetilde{k}(\Phi(x;\mu_g), \cdot; \mu_{ph}), \qquad s(x; \mu) = \widetilde{s}(\Phi(x;\mu_g); \mu_{ph}) \qquad \text{and} \qquad h(x; \mu) = \widetilde{h}(\Phi(x;\mu_g); \mu_{ph}).$$

Letting

$$V = H^1_{\Gamma_D}(\Omega)$$

and exploiting standard change-of-variables formulas, the variational formulation of the Poisson problem (6.4) over $\Omega$ reads:
given $\mu \in \mathcal{P}$, find $u(\mu) \in V$ such that

$$a(u(\mu), v; \mu) = f(v; \mu) \quad \forall v \in V,$$

with

$$a(w, v; \mu) = \int_{\Omega} k(w + l(\mu); \mu)\, J_\Phi^{-T}(\mu_g)\nabla w \cdot J_\Phi^{-T}(\mu_g)\nabla v \, |J_\Phi(\mu_g)| \, d\Omega
+ \int_{\Omega} k(w + l(\mu); \mu)\, J_\Phi^{-T}(\mu_g)\nabla l(\mu) \cdot J_\Phi^{-T}(\mu_g)\nabla v \, |J_\Phi(\mu_g)| \, d\Omega,$$

$$f(v; \mu) = \int_{\Omega} s(\mu)\, v \, |J_\Phi(\mu_g)| \, d\Omega,$$

for any $w, v \in V$ and $\mu \in \mathcal{P}$. Note that, as in (6.6), we resort to a lifting function $l(\mu) \in H^1(\Omega)$ with $l(\mu)|_{\Gamma_D} = h(\mu)$, such that
the weak solution to problem (6.4) re-stated over $\Omega$ is obtained as $u(\mu) + l(\mu)$.

6.1.1. One-dimensional test case


Consider the following BVP for the one-dimensional Poisson equation, featuring an exponential nonlinearity (in the
diffusion coefficient) and involving three parameters:

$$-\big(\exp(u(\mu))\, u(\mu)'\big)' = s(x; \mu) \quad \text{in } \Omega = \Big(-\frac{\pi}{2}, \frac{\pi}{2}\Big), \qquad
u(\mu)\big|_{x = \pm\pi/2} = \mu_2 \Big(2 \pm \sin\Big(\frac{\mu_1 \pi}{2}\Big)\Big) \exp\Big(\pm\frac{\mu_3 \pi}{2}\Big). \qquad (6.8)$$

Here, $\mu = (\mu_1, \mu_2, \mu_3) \in \mathcal{P} = [1, 3] \times [1, 3] \times [-0.5, 0.5]$ and $s(\cdot; \mu)$ is defined such that

$$u_{ex}(x; \mu) = \mu_2 \big(2 + \sin(\mu_1 x)\big) \exp\big(\mu_3 x\big)$$

is the exact solution to the problem for any $\mu \in \mathcal{P}$.
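The source term is defined only implicitly by the manufactured solution; one way to recover it is symbolic differentiation. A short SymPy sketch, under the convention of (6.8) that $s(x;\mu) = -(\exp(u)\, u')'$:

```python
import sympy as sp

x, mu1, mu2, mu3 = sp.symbols("x mu1 mu2 mu3")

# Manufactured exact solution of (6.8).
u_ex = mu2 * (2 + sp.sin(mu1 * x)) * sp.exp(mu3 * x)

# Source term consistent with -(exp(u) u')' = s(x; mu).
s = sp.simplify(-sp.diff(sp.exp(u_ex) * sp.diff(u_ex, x), x))
print(s)
```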
The left plot of Fig. 6.1 shows the convergence of the POD-G and POD-NN
solutions with respect to the number of basis functions $L$ retained in the model, generated via POD of $N = 100$ FE snapshots
computed over a uniform mesh of 100 nodes. Observe how the performance of the POD-NN method depends strongly on the
number of training patterns and hidden neurons used. In fact, for $N_{tr} = 100$ and $H_1 = H_2 = 15$ (upward-pointing triangles),
the method can approximate only the first five generalized coordinates, actually providing more satisfactory results than
POD-G does. Yet, the error stagnates for $L > 5$. However, as we expand the training set, we obtain more accurate predictions
for a larger number of POD coefficients, provided that we allow the size of the network to increase. In particular, employing
$N_{tr} = 400$ training samples and $H_1 = H_2 = 35$ internal neurons (right-pointing triangles), the POD-NN error features an
exponential decay for $L \le 10$, resembling the projection error (squares). The reliability of the proposed RB method is also
confirmed by the right plot of Fig. 6.1, offering a comparison between the FE and POD-NN solutions for some parameter
values.

[Figure 6.1. Convergence analysis for the POD-G and POD-NN methods applied to problem (6.8) (left) and comparison between the
FE and the POD-NN solutions for three parameter values (right). The latter results have been obtained via a neural network with
$H_1 = H_2 = 35$ units per hidden layer and employing $L = 10$ POD modes.]

[Figure 6.2. Error analysis for the POD-NN RB method applied to problem (6.8) for several numbers of training samples. The solid
sections represent the steps followed by the automatic routine described in Section 5.]
Diving deeper into the analysis of the POD-NN method, Fig. 6.2 offers the results concerning the search for an optimal
three-layer network configuration using the methodology described in Section 5. The steps followed by the routine are
represented by solid tracts; however, for the sake of completeness, we also report the test error (i.e., the minimum over
multiple restarts) for any considered number of hidden neurons per layer ($H_1$ and $H_2$) and any dimension of the training
set ($N_{tr}$). We remark that the basic assumption on which the proposed routine relies is fulfilled: as the number of available
samples increases, the number of neurons needed to attain the minimum error increases as well, while the minimum error
itself decreases.

6.1.2. Two-dimensional test case


Let $\mu = [\mu_1, \mu_2, \mu_3] \in \mathcal{P} = [-0.5, 0.5] \times [-0.5, 0.5] \times [1, 5]$ and $\widetilde{\Omega}(\mu)$ be the stenosis geometry reported on the left in Fig.
6.3, parametrized in the depths $\mu_1$ and $\mu_2$ of the bottom and top restrictions (or inflations), respectively, and the length $\mu_3$
of the vessel. The Poisson problem we deal with reads:

$$-\widetilde{\nabla} \cdot \big(\exp(\widetilde{u}(\mu))\, \widetilde{\nabla}\widetilde{u}(\mu)\big) = \widetilde{s}(\widetilde{x}, \widetilde{y}) \quad \text{in } \widetilde{\Omega}(\mu), \qquad
\widetilde{u}(\mu) = \widetilde{\sigma}_y \sin(\pi\, \widetilde{\sigma}_x) \cos(\pi\, \widetilde{\sigma}_y) \quad \text{on } \partial\widetilde{\Omega}(\mu), \qquad (6.9)$$

with the source term se = se(e


x , ye) properly chosen so that the exact solution to the problem is given by

ueex (e
x , ye) = ye sin(º xe) cos(º ye)

for all µ 2 P . Although the state equation has an exponential nonlinearity and the computational domain presents curvilin-
ear boundaries, both of the RB methodologies studied in this work provide accurate solutions, close to the optimal one, i.e.,
u hV (µ). This can be seen in Fig. 6.4, offering the finite element solution (top left) to (6.9) when µ = (0.349, °0.413, 4.257),

15
(0, 1) °3 (1, 1)
µ2 e3
°

e2 e
≠ e4 °2 ≠ °4
H ° °

e1
° µ1
µ3 (0, 0) (1, 0)
°1

Figure 6.3. The physical (left) and reference (right) domains for the Poisson problem (6.9).

Figure 6.4. FE solution (top left) to the Poisson problem (6.9) with µ = (0.349, °0.413, 4.257), and pointwise errors yielded by either
its projection onto Vrb (top right), or the POD-G method (bottom left), or the POD-NN method (bottom right). The results have been
obtained by employing L = 30 POD modes.

and the pointwise errors committed by either the projection onto Vrb (top right), or the POD-G method (bottom left), or
the POD-NN method (bottom right). The computational mesh consists of 2792 nodes, resulting in Nh = 2632 degrees of
freedom (as many as the inner nodes) for the FE scheme. The reduced space Vrb is generated by L = 30 basis functions,
given by POD of N = 100 snapshots. Specifically, the results concerning the POD-NN method have been obtained by
employing a neural network equipped with H1 = H2 = 35 neurons per hidden layer and trained with N t r = 200 learning
samples. In this way, the POD-NN procedure leads to an error which shows patterns similar to those featured by the
projection error. This is not surprising, due to the way neural networks are trained within the POD-NN framework (see
Section 5).
Quantitative evidence of the efficacy of the RB approximations is given in the left plot of Fig. 6.5, which reports the
convergence analysis for both the POD-G and the POD-NN methods with respect to the number L of retained modal
basis functions. As usual, the error is to be interpreted as an average over a parameter test set Ξ_te, here consisting of
N_te = 50 samples. The right plot of Fig. 6.5 reveals, however, that the former scheme answers any online query
approximately 10³ times more slowly than the proposed POD-NN procedure.

[Figure 6.5 here. Left: ε̄_PODNN for (N_tr = 100, H_{1,2} = 20) and (N_tr = 200, H_{1,2} = 35), ε̄_PODG and ε̄_V versus L. Right: online run time [s] for each test sample.]

Figure 6.5. Error analysis (left) and online CPU time (right) for the POD-G and the POD-NN methods applied to problem (6.9). The
reduced bases have been generated via POD, relying on N = 100 snapshots. The second plot refers to RB models including L = 30 modal
functions; within the POD-NN framework, an MLP embodying 35 neurons per inner layer has been used.

[Figure 6.6 here. Left: ε̄_PODNN versus H_1 = H_2 for N_tr = 100 and N_tr = 200, with ε̄_V at L = 30 as reference. Right: ε̄_PODNN versus L for the combinations N_tr ∈ {100, 200} and H_1 = H_2 ∈ {20, 35}.]
Figure 6.6. Convergence analysis with respect to the number of hidden neurons (left) and modal functions (right) used within the
POD-NN framework applied to problem (6.9). The results provided in the first plot have been obtained using L = 30 modes; the solid
tracts refer to the steps performed by the automatic routine carried out to find an optimal network configuration.

Let us further analyze the first plot of Fig. 6.5. When the proper orthogonal decomposition of a set of N = 100 snapshots
is coupled with a three-layer perceptron with 35 neurons per hidden layer, trained to approximate the map (5.2) relying
on N_tr = 200 learning patterns, the relative error is smaller than the error committed by the POD-Galerkin procedure for
each L ≤ 30. Yet, halving the number of learning patterns employed, the POD-NN error curve stops decreasing at L = 20
and then incurs a plateau. In this regard, Fig. 6.6 suggests that as the size of the available training set decreases, one should
reduce the number of computing units embodied in the network to limit the risk of overfitting. In particular, when
only N_tr = 100 training samples are used, the optimal network configuration comprises 20 neurons per internal layer (left).
However, when doubling the dimension of the training set, the same configuration does not fully exploit the augmented
amount of information; in that case, a network with, e.g., 35 neurons per hidden layer is preferable (right).
The distinguishing and novel feature of the POD-NN method is the use of multi-layer perceptrons to recover the
coefficients of the reduced model. To motivate this choice, let us pursue a more traditional approach in the interpolation
step, resorting to cubic splines [14].

[Figure 6.7 here. Left: ε̄_PODCS for N_δ = 5³ = 125, 7³ = 343, 10³ = 1000 and ε̄_PODNN for N_tr = 200, versus L. Right: online run time [s] for each test sample.]
Figure 6.7. Average relative errors (left) and online run times (right) on Ξ_te for the POD-CS and the POD-NN methods applied to problem
(6.9). For the latter, the results refer to an MLP equipped with H_1 = H_2 = 35 neurons per hidden layer.

To this end, let Ξ_δ ⊂ P be a tensor-product grid on the parameter domain, based on, e.g., Chebyshev nodes. In the offline
phase, for each µ_δ ∈ Ξ_δ, we compute the truth solution u_h(µ_δ) ∈ R^{N_h} and we extract the expansion coefficients
V^T u_h(µ_δ) ∈ R^L. At the online stage, given a new parameter value µ ∈ P, the i-th expansion coefficient, i = 1, ..., L, is sought
by cubic spline interpolation of the samples

    { ( µ_δ, (V^T u_h(µ_δ))_i ) }_{µ_δ ∈ Ξ_δ} .

Hence, denoting by u^CS_rb(µ) ∈ R^L the approximation of V^T u_h(µ), the reduced order approximation of u_h(µ) is given by
u^CS_L(µ) = V u^CS_rb(µ). Similarly to the POD-G and the POD-NN methods, the accuracy of the resulting POD-CS procedure can
be assessed by evaluating and averaging the relative error ε_PODCS(L, µ), defined as

    ε_PODCS(L, µ) = ‖u_h(µ) − u^CS_L(µ)‖ / ‖u_h(µ)‖ = ‖u_h(µ) − V u^CS_rb(µ)‖ / ‖u_h(µ)‖ ,

on a test parameter set Ξ_te ⊂ P, with Ξ_te ∩ Ξ_δ = ∅.

For the Poisson problem (6.9), the left plot of Fig. 6.7 reports the relative errors yielded by both the POD-CS and the POD-
NN methods. For the former, we provide the results obtained on tensor-product grids consisting of N_δ interpolation points,
with N_δ ∈ {5³, 7³, 10³}. For the latter, H_1 = H_2 = 35 neurons per hidden layer and N_tr = 200 training patterns have been used.
Although the online performance of the two procedures is essentially the same, as shown by the second plot in Fig. 6.7, we
observe that the level of predictive accuracy enabled by the POD-NN method is attained, or at least approached, by
cubic spline interpolation only when the latter relies on N_δ = 1000 samples. In fact, as mentioned in the Introduction, a standard
interpolation technique may require a large number of samples to enforce the constraints characterizing the
nonlinear manifolds which the reduced bases typically belong to [1]. Hence, although this approach provides a valuable
alternative for parametrized problems with a few parameters, it may be infeasible in real-life applications involving a large
number of parameters. Conversely, the hope behind the choice of a neural network-based regression is that the number of
required snapshots scales more gracefully with the dimension of the parameter space, which would justify the overhead of
the training phase.

6.2. Steady incompressible Navier-Stokes equations


The system of Navier-Stokes equations models the conservation of mass and momentum for an incompressible
Newtonian viscous fluid confined in a two- or three-dimensional region [42]. Letting µ ≡ µ_g and ṽ = ṽ(x̃; µ) and p̃ = p̃(x̃; µ)
be, respectively, the velocity and pressure of the fluid, the parametrized steady version of the Navier-Stokes equations
considered here reads

    ∇̃ · ṽ(µ) = 0                                              in Ω̃(µ) ,      (6.10a)
    −ν(µ) Δ̃ṽ(µ) + (ṽ(µ) · ∇̃) ṽ(µ) + (1/ρ(µ)) ∇̃p̃(µ) = 0       in Ω̃(µ) ,      (6.10b)
    ṽ(µ) = h̃                                                  on Γ̃_D(µ) ,    (6.10c)
    p̃(µ) ñ − ν(µ) ∇̃ṽ(µ) · ñ = 0                               on Γ̃_N(µ) .    (6.10d)

Here, h̃ denotes the velocity field prescribed on Γ̃_D(µ), whereas homogeneous Neumann conditions are applied on Γ̃_N(µ).
Furthermore, ρ(µ) and ν(µ) represent the uniform density and kinematic viscosity of the fluid, respectively. Note that,
although these quantities encode physical properties, we let them depend on the geometrical parameters as well. Indeed,
fluid dynamics can be characterized (and controlled) by means of a wealth of dimensionless quantities, e.g., the Reynolds
number, which combine physical properties of the fluid with geometrical features of the domain. Therefore, a numerical
study of the sensitivity of the system (6.10) with respect to µ may be carried out by adapting either ρ(µ) or ν(µ) as µ varies,
so as to preserve a dimensionless quantity of interest.
To write the differential system (6.10) in weak form over a µ-independent configuration Ω, let us introduce the velocity
and pressure spaces

    X̃(µ) = [H¹_{Γ̃_D}(Ω̃(µ))]^d   and   Q̃(µ) = L²(Ω̃(µ)) ,

respectively, and let

    X = [H¹_{Γ_D}(Ω)]^d   and   Q = L²(Ω)

be their respective counterparts over Ω. By multiplying (6.10) by test functions (χ̃, ξ̃) ∈ Ṽ(µ) = X̃(µ) × Q̃(µ), integrating by
parts and then tracing everything back onto Ω by means of the parametrized map Φ(·; µ), we end up with the following
parametrized weak variational problem: given µ ∈ P, find u(µ) = (v(µ), p(µ)) ∈ V = X × Q such that

    a(v(µ), χ; µ) + c(v(µ), v(µ), χ; µ) + d(v(µ), χ; µ) + b(p(µ), ∇ · χ; µ) = f_1(χ; µ) ,
    b(∇ · v(µ), ξ; µ) = f_2(ξ; µ) ,

for all (χ, ξ) ∈ V, with, for any µ ∈ P,


    c(ψ, χ, η; µ) = ∫_Ω ( (ψ · J_Φ^{-T}(µ)∇) χ ) · η |J_Φ(µ)| dΩ ,              d(ψ, χ; µ) = c(l(µ), ψ, χ; µ) + c(ψ, l(µ), χ; µ) ,

    a(ψ, χ; µ) = ν(µ) ∫_Ω J_Φ^{-T}(µ)∇ψ : J_Φ^{-T}(µ)∇χ |J_Φ(µ)| dΩ ,           f_1(ψ; µ) = −a(l(µ), ψ; µ) − c(l(µ), l(µ), ψ; µ) ,

    b(ψ, ξ; µ) = −(1/ρ(µ)) ∫_Ω ( J_Φ^{-T}(µ)∇ · ψ ) ξ |J_Φ(µ)| dΩ ,             f_2(ξ; µ) = −b(l(µ), ξ; µ) ,

for all ψ, χ, η ∈ X and ξ ∈ Q. In the definitions of d(·, ·; µ), f_1(·; µ) and f_2(·; µ), l(µ) ∈ [H¹(Ω)]^d denotes the lifting vector
field, with l(µ)|_{Γ_D} = h(µ), h(x; µ) = h̃(Φ(x; µ)) being the velocity field prescribed on Γ_D. Hence, the weak solution to (6.10)
defined over the fixed domain Ω is given by (v(µ) + l(µ), p(µ)).

6.2.1. The lid-driven cavity problem


In this subsection, we are concerned with the numerical simulation of a viscous flow within a parallelogram-shaped
cavity, illustrated in Fig. 6.8 (left). The geometry of the domain is governed by three parameters: µ_1 ∈ [1, 2] and µ_2 ∈ [1, 2]
define the length of the horizontal and slanting (possibly vertical) edges, respectively, whereas µ_3 ∈ [π/6, 5π/6] denotes the
angle between the oblique sides and the positive x̃-semiaxis. The Dirichlet boundary conditions enforced on the velocity
field, graphically represented in Fig. 6.8 (center), are such that the flow is driven by a unit horizontal velocity at the top
boundary [38]. Hence, this benchmark is typically referred to as the lid-driven cavity problem. In addition, we fix the
pressure at the lower-left corner in order to retain the uniqueness of the solution [16]. The computational mesh
Ω_h (Fig. 6.8, right) is refined in the upper part of the domain, so as to properly resolve the steep velocity gradient near the
upper boundary and the pressure singularities at the top corners.


Figure 6.8. Computational domain Ω̃(µ) (left), enforced velocity at the boundaries (center) and computational mesh Ω_h (right) for the
lid-driven cavity problem for the incompressible Navier-Stokes equations.

[Figure 6.9 panels: (a, d) µ = (1, 2/√3, 2π/3); (b, e) µ = (1, 1, π/2); (c, f) µ = (1, 2/√3, π/3).]

Figure 6.9. Velocity streamlines (top) and pressure distribution (bottom) for the lid-driven cavity problem, computed through the FE
method. Three different parameter values are considered; for all configurations, the Reynolds number is 400.

For the Navier-Stokes equations, a suitable choice of the FE spaces is crucial to fulfill the well-known inf-sup stability
condition [42]. A common and effective choice consists in using quadratic finite elements for the components of the
velocity field and linear finite elements for the pressure, leading to the so-called P2−P1 (or Taylor-Hood) FE discretization
[38]. Figure 6.9 shows the velocity streamlines (top) and the pressure distribution (bottom) obtained through the resulting
FE method for, from left to right, µ = (1, 2/√3, 2π/3), µ = (1, 1, π/2) and µ = (1, 2/√3, π/6). Observe how the moving lid
induces a large velocity gradient close to the top boundary, while it only slightly affects the fluid in the lower part of the
cavity. In turn, this gives rise to two vortices, whose extent strongly depends on the shape of the domain or, equivalently, on µ.
Conversely, the pressure field does not undergo noticeable alterations across the parameter space. For all configurations,
a low-pressure region forms at the upper-left corner of the domain and in the center of the major vortex (although it
may not be clearly visible). The upper-right corner, instead, represents a stagnation point for the x̃-velocity, so that therein
the pressure assumes values larger than in the rest of the cavity [16]. All the results presented in Fig. 6.9 refer to a Reynolds
number of 400. For the specific problem at hand, the Reynolds number reads

    Re = max{µ_1, µ_2} / ν(µ) .

In our analyses, ν(µ) is adapted as the geometry varies so that the Reynolds number is either 200 or 400.
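In code, this adaptation reduces to a one-line helper (a sketch, with mu collecting the three geometrical parameters and Re the target Reynolds number):

def viscosity(mu, Re):
    # Set the kinematic viscosity so that max(mu1, mu2) / nu matches the target Re.
    return max(mu[0], mu[1]) / Re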
Regarding the reduced order modeling of the parametrized problem of interest, we slightly depart from the
general workflow depicted in Sections 3 and 5. Indeed, for the Navier-Stokes equations it is convenient to construct two
separate reduced spaces [2, 11, 41]: X_rb, of dimension L_v, for the velocity field and Q_rb, of dimension L_p, for the pressure
distribution, respectively represented by the matrices

    V_v = [ψ_1^v | ... | ψ_{L_v}^v] ∈ R^{N_h^v × L_v}   and   V_p = [ψ_1^p | ... | ψ_{L_p}^p] ∈ R^{N_h^p × L_p} ,
with N_h^v (respectively, N_h^p) the dimension of the FE velocity (resp., pressure) space. Although the underpinning finite
element solver has been designed to fulfill the inf-sup condition, thus ensuring the stability of the scheme, when dealing
with incompressible flows this property may not be automatically inherited by the POD-Galerkin reduced order solver.
As a result, the latter may show severe instabilities [8]. Indeed, one has to carefully choose the reduced velocity space X_rb so as to
meet an equivalent of the inf-sup condition at the reduced level. To this end, in our simulations we resort to a supremizer
enrichment of X_rb, as proposed in [2]. To each pressure snapshot p_h(µ^(n)) ∈ R^{N_h^p}, n = 1, ..., N, we associate the supremizer
solution s_h(µ^(n)) ∈ R^{N_h^v}, defined as the solution to a linear system of the form

    A s_h(µ^(n)) = B(µ^(n)) p_h(µ^(n)) ,

with A ∈ R^{N_h^v × N_h^v} and B(µ^(n)) ∈ R^{N_h^v × N_h^p}. Letting

    V_s = [ψ_1^s | ... | ψ_{L_s}^s] ∈ R^{N_h^v × L_s}

be the POD basis of rank L_s extracted from the ensemble of snapshots {s_h(µ^(1)), ..., s_h(µ^(N))}, the POD-G approximation
of the velocity field is then sought in the form

    v̄_L(µ) = V̄_v v̄_rb(µ) = Σ_{i=1}^{L_v} v_rb^(i) ψ_i^v + Σ_{i=1}^{L_s} s_rb^(i) ψ_i^s ,

where

    V̄_v = [V_v  V_s] ∈ R^{N_h^v × (L_v + L_s)}   and   v̄_rb(µ) = [v_rb(µ); s_rb(µ)] ∈ R^{L_v + L_s} .
In [2], numerical evidence suggests that a proper value for L_s should lie between L_p/2 and L_p. In our simulations,
we set L_s = L_p = L_v, leading to a stable reduced (nonlinear) system in L = 3 L_v unknowns.
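A compact sketch of the supremizer enrichment, assuming the full-order matrices A and B(µ) are available as scipy sparse matrices (B_of below is a hypothetical callable returning B(µ)):

import numpy as np
from scipy.sparse.linalg import spsolve

def pod(S, L):
    # Leading L left singular vectors of the snapshot matrix S (columns = snapshots).
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :L]

def enrich_with_supremizers(A, B_of, P_snap, mus, Vv, Ls):
    # P_snap: pressure snapshots, one column per parameter value in mus.
    # For each pressure snapshot solve A s = B(mu) p to get the supremizer.
    S = np.column_stack([spsolve(A, B_of(mu) @ P_snap[:, n])
                         for n, mu in enumerate(mus)])
    Vs = pod(S, Ls)
    # Enriched (rectangular) velocity basis [Vv Vs] used by the POD-G solver.
    return np.hstack([Vv, Vs])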
Concerning the POD-NN procedure, for any µ ∈ P, reduced order discretizations v_{L_v}^NN(µ) and p_{L_p}^NN(µ) of v(µ) and p(µ),
respectively, are sought in the form

    v_{L_v}^NN(µ) = V_v v_rb^NN(µ)   and   p_{L_p}^NN(µ) = V_p p_rb^NN(µ) .

Here, v_rb^NN(µ) and p_rb^NN(µ) denote the output vectors provided by two distinct multi-layer perceptrons, respectively trained
with the ensembles of input-output patterns

    { ( µ^(i), V_v^T v_h(µ^(i)) ) }_{1 ≤ i ≤ N_tr}   and   { ( µ^(i), V_p^T p_h(µ^(i)) ) }_{1 ≤ i ≤ N_tr} .
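A sketch of this step is given below, again with scikit-learn's MLPRegressor standing in for the Levenberg-Marquardt-trained perceptrons used in this work; the parameter matrix M_tr and the snapshot matrices V_snap and P_snap (one column per training sample) are assumed to be available.

from sklearn.neural_network import MLPRegressor

def train_pod_nn_networks(M_tr, V_snap, P_snap, Vv, Vp, Hv, Hp):
    # Targets: reduced coefficients of the velocity and pressure snapshots.
    Yv = (Vv.T @ V_snap).T      # shape (N_tr, L_v)
    Yp = (Vp.T @ P_snap).T      # shape (N_tr, L_p)
    net_v = MLPRegressor(hidden_layer_sizes=(Hv, Hv), activation="tanh",
                         solver="lbfgs", max_iter=5000).fit(M_tr, Yv)
    net_p = MLPRegressor(hidden_layer_sizes=(Hp, Hp), activation="tanh",
                         solver="lbfgs", max_iter=5000).fit(M_tr, Yp)
    return net_v, net_p

def pod_nn_online(mu, net_v, net_p, Vv, Vp):
    # Online query: evaluate both networks and lift to the FE spaces.
    v = Vv @ net_v.predict(mu.reshape(1, -1)).ravel()
    p = Vp @ net_p.predict(mu.reshape(1, -1)).ravel()
    return v, p

At the online stage, a query thus amounts to two network evaluations followed by two matrix-vector products, which is consistent with the run times reported in Fig. 6.12.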

[Figure 6.10 here. Each panel reports ε̄^v_PODNN or ε̄^p_PODNN for N_tr ∈ {100, 200, 300} with the corresponding optimal H_{1,2}, together with ε̄^v_PODG, ε̄^p_PODG and the projection errors ε̄^v_V, ε̄^p_V, versus L_v (left) and L_p (right); Re = 200 on top, Re = 400 on the bottom.]

Figure 6.10. Velocity (left) and pressure (right) error analysis for the POD-G and POD-NN methods applied to the lid-driven cavity
problem with Re = 200 (top) and Re = 400 (bottom).

In this regard, let us point out two important remarks.

(i) The supremizer solutions are not involved (either directly or indirectly) in the POD-NN framework. Indeed, they
ensure the overall stability of the POD-G procedure, but they do not improve the accuracy of the method, and so they
can be disregarded whenever stability is not an issue.

(ii) The routine outlined in Section 5, aiming at finding an optimal network configuration, is applied separately to the
perceptrons designated to predict the velocity field and to those required to approximate the pressure
distribution. As a result, once the offline phase of the POD-NN method is completed, we end up with two different
networks, equipped with H_1^v = H_2^v and H_1^p = H_2^p computing units per hidden layer, respectively.

Figure 6.10 reports the error committed by both RB procedures when approximating the velocity field (left) and the
pressure distribution (right) by means of L_v velocity modes, L_p pressure modes and, exclusively for the POD-Galerkin
method, L_s supremizer modes, with 5 ≤ L_v = L_s = L_p ≤ 35. The plots on the top refer to a Reynolds number of 200, the
ones on the bottom to a Reynolds number of 400. Note that, for clarity of illustration, the symbols denoting the projection,
POD-G and POD-NN errors carry a superscript (either v or p) recalling the state variable they refer to.
While the error on the velocity field shows an almost perfect exponential decay with the number of POD modes
included in the RB model, the POD-G method struggles to provide a correct recovery of the pressure, already
for Re = 200. Indeed, the corresponding error curve is not monotone and is generally at least one order of magnitude
[Figure 6.11 here. Left: ε̄^v_PODNN versus H_1^v = H_2^v; right: ε̄^p_PODNN versus H_1^p = H_2^p; curves for N_tr ∈ {100, 200, 300}, with ε̄^v_PODG and ε̄^p_PODG at L = 35 as references; Re = 200 on top, Re = 400 on the bottom.]

Figure 6.11. Convergence analysis with respect to the number of hidden neurons and training samples used within the POD-NN
procedure to approximate the velocity field (left) and the pressure distribution (right) by means of 35 modal functions. The Reynolds
number is either 200 (top) or 400 (bottom).

above the projection error. Conversely, the POD-NN method attains a satisfactory predictive accuracy. The advantages of
resorting to a neural network-based nonlinear regression coupled with POD are evident when the approximation is built
upon a few basis functions, say L_p < 20. Moreover, the proposed routine recommends the use of fewer neurons to predict
the POD coefficients for the pressure than for the velocity, thus resulting in a lighter perceptron.
This is confirmed by Fig. 6.11, which provides a sensitivity analysis of the predictive accuracy of the POD-NN
method with respect to the number of neurons and training samples used. Observe that for the pressure (right),
employing more than 30 neurons within each hidden layer is counter-productive, both for Re = 200 (top) and Re = 400
(bottom). This agrees with the peculiar role played by the pressure in the parametrized lid-driven cavity problem,
with similar patterns featured across the entire parameter domain.
On the contrary, we have seen that the velocity field presents more complex dynamics, varying considerably with the domain
configuration. As a result, an optimal approximation of the velocity is obtained for the largest values of H_i^v, i = 1, 2, and
N_tr tested, that is, H_1^v = H_2^v = 35 and N_tr = 300 for Re = 200, and H_1^v = H_2^v = 40 and N_tr = 300 for Re = 400. Moreover, we may
not be able to exactly match the precision of the standard POD-Galerkin procedure, although the results are
quite similar. This (slight) loss of accuracy is, however, offset by a (great) reduction in the online run time: the POD-NN
method takes around 2/100 of a second per query, against the average 40 seconds required by the POD-G method; see Fig.
6.12. As for the test cases for the Poisson problem, this comes at the cost of a longer offline phase.

Figure 6.12. Online run times for the POD-G and the POD-NN method applied to the lid-driven cavity problem with Re = 200 (left) and
Re = 400 (right). N_te = 75 test configurations are considered. For the POD-NN method, the reported times include the (sequential)
evaluation of both neural networks for the velocity and pressure field.

[Figure 6.13 panels: µ = (1.12, 1.70, 1.08), µ = (1.90, 1.50, 1.60), µ = (1.78, 1.99, 2.29); FEM on the top row, POD-NN on the bottom row.]

Figure 6.13. x̃-velocity contour at three parameter values, as computed through the FE (top row) and POD-NN (bottom row) method. For
each configuration, the Reynolds number is 400.

Further numerical evidence of the predictive accuracy of the proposed POD-NN RB method is provided in Fig. 6.13,
showing the contour plots of the x̃-velocity computed through the FE (top row) and the POD-NN (bottom row) scheme.
Three different configurations, corresponding to as many input vectors, are considered; the Reynolds number is fixed to
400. We observe good agreement between the solutions given by the full-order and the reduced order methods.
Lastly, Fig. 6.14 compares the streamlines obtained through the direct method (top) and the proposed reduced basis
approach (bottom) for the three configurations previously considered. Streamlines provide an interesting test, as minor
variations in the velocity contours may lead to substantial differences in the streamlines [11]. Nevertheless, we observe a good


Figure 6.14. Streamlines at three parameter values, as computed through the FE (top row) and POD-NN (bottom row) method. For each
configuration, the Reynolds number is 400.

agreement between the two methods. In particular, in the second example the POD-NN method is still able to detect the
two small recirculation zones at the lower corners of the domain. On the other hand, the method partially fails to properly
describe the velocity field at the bottom-left corner in the first configuration and at the bottom-right corner in the last
configuration. However, these are effectively dead zones, and these small imprecisions can safely be disregarded.

7. Conclusion

In this work, we propose a non-intrusive RB method (referred to as POD-NN) for parametrized steady-state PDEs.
The method extracts a reduced basis from a collection of snapshots through a POD procedure and employs multi-layer
perceptrons to approximate the coefficients of the reduced model. By exploiting the fundamental results by Cybenko
(see Subsection 4.2), we limit ourselves to perceptrons endowed with two hidden layers. The identification of the optimal
number of inner neurons and the minimum amount of training samples to prevent overfitting is performed during the
offline stage through an automatic routine, relying upon the latin hypercube sampling and the Levenberg-Marquardt
training algorithm. On the one hand, this guarantees a complete decoupling between the offline and the online phase,
with the latter having a computational cost independent of the dimension of the full-order model. On the other hand, this
extends the offline stage with respect to a standard projection-based RB procedure, making the POD-NN method practically
convenient only when the underlying PDE has to be solved for many parameter values (many-query context).
The POD-NN method has been successfully tested on the nonlinear Poisson equation in one and two spatial dimensions,
and on two-dimensional cavity viscous flows, modeled through the steady incompressible Navier-Stokes equations. In
particular, the proposed RB strategy enables the same predictive accuracy provided by the POD-Galerkin method while
reducing the CPU time required to process an online query by two or even three orders of magnitude.
All test cases considered in our numerical studies involved three parameters, affecting either physical or geometrical
factors of the differential problem. The extension of the POD-NN method to time-dependent problems depending on many
parameters is left as future work.
Lastly, let us point out that although in this work we used POD to recover a reduced space, this choice is not binding,
and a greedy approach may also be pursued.

References

[1] Amsallem., D. (2010). Interpolation on manifolds of CFD-based fluid and finite element-based structural reduced-order
models for on-line aeroelastic predictions. Doctoral dissertation, Department of Aeronautics and Astronautics, Stanford
University.

[2] Ballarin, F., Manzoni, A., Quarteroni, A., Rozza, G. (2014). Supremizer stabilization of POD-Galerkin approximation of
parametrized Navier-Stokes equations. MATHICSE Technical Report, École Polytechnique Fédérale de Lausanne.

[3] Barrault, M., Maday, Y., Nguyen, N. C., Patera, A. T. (2004). An ’empirical interpolation’ method: Application to efficient
reduced-basis discretization of partial differential equations. Comptes Rendus Mathematique, 339(9):667-672.

[4] Barthelmann, V., Novak, E., Ritter, K. (2000). High-dimensional polynomial interpolation on sparse grids. Advances in
Computational Mathematics, 12(4):273-288.

[5] Bendsøe, M. P., Sigmund, O. (2004). Topology optimization: Theory, methods and applications. Heidelberg, DE: Springer
Science & Business Media.

[6] Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., Mercer, R. L. (1993). The mathematics of statistical machine translation:
Parameter estimation. Computational linguistics, 19(2):263-311.

[7] Buffa, A., Maday, Y., Patera, A. T., Prud’Homme, C., Turinici, G. (2012). A priori convergence of the greedy algorithm for
the parametrized reduced basis method. ESAIM: Mathematical Modelling and Numerical Analysis, 46:595-603.

[8] Burkardt, J., Gunzburger, M., Lee, H. C. (2006). POD and CVT-based reduced-order modeling of Navier-Stokes flows.
Computer Methods in Applied Mechanics and Engineering, 196:337-355.

[9] Casenave, F., Ern, A., Lelièvre, T. (2015). A nonintrusive reduced basis method applied to aeroacoustic simulations.
Advances in Computational Mathematics, 41:961-986.

[10] Chaturantabut, S., Sorensen, D. C. (2010). Nonlinear model reduction via discrete empirical interpolation. SIAM Journal
on Scientific Computing, 32(5):2737-2764.

[11] Chen, W., Hesthaven, J. S., Junqiang, B., Yang, Z., Tihao, Y. (2017). A greedy non-intrusive reduced order model for fluid
dynamics. Submitted to American Institute of Aeronautics and Astronautics.

[12] Cybenko, G. (1988). Continuous valued neural networks with two hidden layers are sufficient. Technical Report,
Department of Computer Science, Tufts University.

[13] Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and
Systems, 2(4):303–314.

[14] De Boor, C. (1978). A practical guide to splines. New York, NY: Springer-Verlag.

[15] Deparis, S. (2008). Reduced basis error bound computation of parameter-dependent Navier-Stokes equations by the
natural norm approach. SIAM Journal of Numerical Analysis, 46(4):2039-2067.

[16] Dhondt, G. (2014). CalculiX CrunchiX user's manual. Available at http://web.mit.edu/calculix_v2.7/CalculiX/ccx_2.7/doc/ccx/node1.html.

[17] Eckart, C., Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1:211-218.

[18] Eftang, J. L. (2008). Reduced basis methods for partial differential equations. Master thesis, Department of Mathematical
Sciences, Norwegian University of Science and Technology.

[19] Hagan, M. T., Menhaj, M. B. (1994). Training feedforward networks with the Marquardt algorithm. IEEE Transactions
on Neural Networks, 5(6):989-993.

[20] Hagan, M. T., Demuth, H. B., Beale, M. H., De Jesús, O. (2014). Neural Network Design, 2nd Edition. Retrieved from
http://hagan.okstate.edu/NNDesign.pdf.
[21] Haykin, S. (2004). Neural Networks: A comprehensive foundation. Upper Saddle River, NJ: Prentice Hall.

[22] Liang, Y. C., Lee, H. P., Lim, S. P., Lin, W. Z., Lee, K. H., Wu, C. G. (2002) Proper orthogonal decomposition and its
applications - Part I: Theory. Journal of Sound and Vibration, 252(3):527-544.

[23] Hesthaven, J. S., Stamm, B., Rozza, G. (2016). Certified reduced basis methods for parametrized partial differential
equations. New York, NY: Springer.

[24] Hesthaven, J. S., Stamm, B., Zhang, S. (2014). Efficient greedy algorithms for high-dimensional parameter spaces with
applications to empirical interpolation and reduced basis methods. ESAIM: Mathematical Modelling and Numerical
Analysis, 48(1):259-283.

[25] Iman, R. L. (2008). Latin hypercube sampling. Encyclopedia of Quantitative Risk Analysis and Assessment.

[26] Jaggli, C., Iapichino, L., Rozza, G. (2014). An improvement on geometrical parametrizations by transfinite maps. Comptes
Rendus de l’Académie des Sciences Paris, Series I, 352:263-268.

[27] Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of
the 14th International Joint Conference on Artificial Intelligence, 2(12):1137-1143.

[28] Kriesel, D. (2007). A Brief Introduction to Neural Networks. Retrieved from http://www.dkriesel.com/en/science/neural_networks.
[29] Le Maître, O., Knio, O. M. (2010). Spectral methods for uncertainty quantification with applications to computational
fluid dynamics. Berlin, DE: Springer Science & Business Media.

[30] Lee, E. B., Markus, L. (1967). Foundations of optimal control theory. New York, NY: John Wiley & Sons.

[31] Maday, Y. (2006) Reduced basis method for the rapid and reliable solution of partial differential equations. Proceedings
of the International Congress of Mathematicians, Madrid, Spain, 1255-1269.

[32] Marquardt, D. W. (1963). An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for
Industrial and Applied Mathematics, 11(2):431-441.

[33] The MathWorks, Inc. (2016). Machine learning challenges: Choosing the best model and avoiding overfitting. Retrieved from https://it.mathworks.com/campaigns/products/offer/common-machine-learning-challenges.html.
[34] Mitchell, W., McClain, M. A. (2010). A collection of 2D elliptic problems for testing adaptive algorithms. NISTIR 7668.

[35] Manzoni, A., Negri, F. (2016). Automatic reduction of PDEs defined on domains with variable shape. MATHICSE
technical report, École Polytechnique Fédérale de Lausanne.

[36] Nielsen, M. A. (2015). Neural Networks and Deep Learning. Determination Press.

[37] Negri, F., Manzoni, A., Amsallem, D. (2015). Efficient model reduction of parametrized systems by matrix discrete
empirical interpolation. Journal of Computational Physics, 303:431-454.

[38] Persson, P. O. (2002). Implementation of finite-element based Navier-Stokes solver. Massachusetts Institute of Technology.

[39] Prud’homme, C., Rovas, D. V., Veroy, K., Machiels, L., Maday, Y., Patera, A. T., Turinici, G. (2002). Reliable real-
time solution of parametrized partial differential equations: Reduced-basis output bound methods. Journal of Fluids
Engineering, 124(1):70-80.

[40] Quarteroni, A. (2010). Numerical models for differential problems (Vol. 2). New York, NY: Springer Science & Business
Media.

[41] Quarteroni, A., Manzoni, A., Negri, F. (2015). Reduced basis methods for partial differential equations: An introduction
(Vol. 92). New York, NY: Springer, 2015.

[42] Rannacher, R. (1999). Finite element methods for the incompressible Navier-Stokes equations. Lecture notes, Institute of
Applied Mathematics, University of Heidelberg.

[43] Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain.
Psychological Review, 65:386-408.

[44] Schmidt, E. (1907). Zur theorie der linearen und nichtlinearen integralgleichungen. I. Teil: Entwicklung willkürlicher
funktionen nach systemen vorgeschriebener. Mathematische Annalen, 63:433-476.

[45] Stergiou, C., Siganos, D. (2013). Neural Networks. Retrieved from https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#Introductiontoneuralnetworks.

[46] Volkwein, S. (2008). Model reduction using proper orthogonal decomposition. Lecture notes, Department of Mathemat-
ics, University of Konstanz.
