Oasis: A high-level/high-performance open source Navier–Stokes solver
M. Mortensen, K. Valen-Sendstad
Program summary
Solution method:
The finite element method.
Unusual features:
FEniCS automatically generates and compiles low-level C++ code based on high-level Python code.
Running time:
The example provided takes a couple of minutes on a single processor.
1. Introduction

The Navier–Stokes equations describe the flow of incompressible, Newtonian fluids. The equations are transient, nonlinear, and velocity is non-trivially coupled with pressure. A lot of research has been devoted to finding efficient ways of linearizing, coupling and solving these equations. Many commercial solvers for Computational Fluid Dynamics (CFD) are available, and, due to the complexity of the implementations (usually Fortran or C), users are often operating these solvers through a Graphical User Interface (GUI). To implement a generic, unstructured Navier–Stokes solver from scratch in a low-level language like C or Fortran is a considerable and time-consuming task involving tens of thousands of lines of error-prone code that require much maintenance. Nowadays, as will be shown in this paper, the use of new and modern high-level software tools enables developers to cut the size of programs down to a few hundred lines and development times to hours.

The implementation of any unstructured (Eulerian) CFD solver requires a computational mesh. For most CFD software packages today the mesh is generated by third-party software like, e.g., the open source projects VMTK [1], Gmsh [2] or Cubit [3]. To solve the governing equations on this computational mesh, the equations must be linearized and discretized such that a solution can be found for a certain (large) set of degrees of freedom. Large systems of linear equations need to be assembled and subsequently solved by appropriate direct or iterative methods. Like for mesh generation, basic linear algebra, with matrix/vector storage and operations, is nowadays most commonly outsourced to third-party software packages like PETSc [4] and Trilinos [5] (see, e.g., [6–8]). With both mesh generation and linear algebra outsourced, the main job of CFD solvers boils down to linearization, discretization and assembly of the linear system of equations. This is by no means a trivial task, as it requires, e.g., maps from computational cells to global degrees of freedom and the connectivity of cells, facets and vertices. For parallel performance it is also necessary to distribute the mesh between processors and set up the inter-communication between compute nodes. Fortunately, much of the Message Passing Interface (MPI) is already handled by the providers of basic linear algebra. When it comes down to the actual discretization, the most common approaches are probably the finite volume method, which is very popular for fluid flow, finite differences, or the finite element method.

FEniCS [9] is a generic open source software framework that aims at automating the discretization of differential equations through the finite element method. FEniCS takes full advantage of specialized, reliable and robust third-party providers of computational software and interfaces to both PETSc and Trilinos for linear algebra, as well as to several third-party mesh generators. FEniCS utilizes the Unified Form Language (UFL, [10]) and the FEniCS Form Compiler (FFC, [11]) to automatically generate low-level C++ code that efficiently evaluates any equation formulated as a finite element variational form. The FEniCS user has to provide the high-level variational form that is to be solved, but does not need to perform any coding on the level of the computational cell, or element. A choice is made of finite element basis functions, and code is then generated for the form accordingly. There is a large library of possible finite elements to choose from, and they may be combined both implicitly in a coupled manner or explicitly in a segregated manner—all at the same level of complexity to the user. The user never has to see the generated low-level code, but, this being an open source project, the code is wide open for inspection and even manual fine-tuning and optimization is possible.

In this paper we describe the Navier–Stokes solver Oasis, which is written from scratch in Python, using building blocks from FEniCS and the PETSc backend. Our goal with this paper is to describe a code that is (i) short and easily understood, (ii) easily configured and (iii) as fast and accurate as state-of-the-art Navier–Stokes solvers developed entirely in low-level languages. We assume that the reader has some basic knowledge of how to write simple solvers for partial differential equations using the FEniCS framework; otherwise, reference is given to the online FEniCS tutorial [12].

2. Fractional step algorithm

In Oasis we are solving the incompressible Navier–Stokes equations, optionally complemented with any number of passive or reactive scalars. The governing equations are thus

∂u/∂t + (u · ∇)u = ν∇²u − ∇p + f, (1)

∇ · u = 0, (2)

∂c_α/∂t + u · ∇c_α = D_α∇²c_α + f_α, (3)

where u(x, t) is the velocity vector, ν the kinematic viscosity, p(x, t) the fluid pressure, c_α(x, t) the concentration of species α and D_α its diffusivity. Any volumetric forces (like buoyancy) are denoted by f(x, t) and chemical reaction rates (or other scalar sources) by f_α(c), where c(x, t) is the vector of all species concentrations. The constant fluid density is incorporated into the pressure. Note that through the volumetric forces there is a possible feedback to the Navier–Stokes equations from the species, and, as such, a Boussinesq formulation for natural convection (see, e.g., [13]) is possible within the current framework.

We will now outline a generic fractional step method, where the velocity and pressure are solved for in a segregated manner. Since it is important for the efficiency of the constructed solver, the velocity vector u will be split up into its individual components u_k.¹ Time is split up into uniform intervals² using a constant time step △t = t^n − t^{n−1}, where superscript n is an integer and t^n ∈ R+.

¹ FEniCS can alternatively solve vector equations where all components are coupled.
² It is trivial to use nonuniform intervals, but uniform intervals are used here for convenience.
The governing equations are discretized in both space and time. Discretization in space is performed using finite elements, whereas discretization in time is performed with finite differences. Following Simo and Armero [14], the generic fractional step algorithm can be written as

(u^I_k − u^{n−1}_k)/△t + B^{n−1/2}_k = ν∇²ũ_k − ∇_k p* + f^{n−1/2}_k   for k = 1, …, d, (4)

∇²φ = −(1/△t) ∇ · u^I, (5)

(u^n_k − u^I_k)/△t = −∇_k φ   for k = 1, …, d, (6)

(c^n_α − c^{n−1}_α)/△t + B^{n−1/2}_α = D_α∇²c̃_α + f^{n−1/2}_α, (7)

where u^n_k is component k of the velocity vector at time t^n, d is the dimension of the problem, φ = p^{n−1/2} − p* is a pressure correction and p* is a tentative pressure. We are solving for the velocity and pressure at the next time step, i.e., u^n_k for k = 1, …, d and p^{n−1/2}. However, the tentative velocity equation (4) is solved with the tentative velocity component u^I_k as unknown. To avoid strict time step restrictions, the viscous term is discretized using a semi-implicit Crank–Nicolson interpolated velocity component ũ_k = 0.5 (u^I_k + u^{n−1}_k). The nonlinear convection term is denoted by B^{n−1/2}_k, indicating that it should be evaluated at the midpoint between time steps n and n − 1. Two different discretizations of convection are currently used by Oasis:

B^{n−1/2}_k = (3/2) u^{n−1} · ∇u^{n−1}_k − (1/2) u^{n−2} · ∇u^{n−2}_k, (8)

B^{n−1/2}_k = ū · ∇ũ_k, (9)

where the first is a fully explicit Adams–Bashforth discretization and the second is implicit, with an Adams–Bashforth projected convecting velocity vector ū = 1.5 u^{n−1} − 0.5 u^{n−2} and Crank–Nicolson for the convected velocity. Both discretizations are second order accurate in time, and, since the convecting velocity is known, there is no implicit coupling between the (possibly) three velocity components solved for.

Convection of the scalar is denoted by B^{n−1/2}_α. The term must be at most linear in c^n_α; otherwise any known velocity and scalar may be used in the discretization. Note that when solving for c^n_α the velocity u^n will be known and may be used to discretize B^{n−1/2}_α. The discretization used in Oasis is

B^{n−1/2}_α = u^n · ∇c̃_α,

where c̃_α = 0.5 (c^n_α + c^{n−1}_α).

An iterative fractional step method involves solving Eq. (4) for all tentative velocity components and Eq. (5) for a pressure correction. The procedure is repeated a desired number of times before finally a velocity correction (6) is solved to ensure conservation of mass before moving on to the next time step. The fractional step method can thus be outlined as shown in Algorithm 1. Note that if the momentum equation depends on the scalar (e.g., when using a Boussinesq model), then there may also be a second iterative loop over Navier–Stokes and temperature. The iterative scheme shown in Algorithm 1 is based on the observation that the tentative velocity computed in Eq. (4) only depends on the previous known solutions u^{n−1}, u^{n−2} and not on u^n. As such, the velocity update can be placed outside the inner iteration. In case of an iterative scheme where the convection depends on u^n (e.g., u^n · ∇ũ_k), the update would have to be moved inside the inner loop.

Set time and initial conditions
t = 0
for time steps n = 0, 1, 2, … do
    t = t + dt
    for inner iterations i = 0, 1, … do
        φ = p* = p^{n−1/2}
        solve (4) for u^I_k, k = 1, …, d
        solve (5) for p^{n−1/2}
        φ = p^{n−1/2} − φ
    end
    solve (6) for u^n_k, k = 1, …, d
    solve (7) for c^n_α
    update to next time step
end
Algorithm 1: Generic fractional step algorithm for the Navier–Stokes equations.

We now have an algorithm that can be used to integrate the solution forward in time, and it is clear that the fractional step algorithm allows us to solve for the coupled velocity and pressure fields in a segregated manner. We should mention here that there are plenty of similar, alternative algorithms for the time stepping of segregated solvers. The most common algorithm is perhaps Pressure Implicit with Splitting of Operators (PISO) [15], which is used by Ansys-Fluent [16], Star-CD [17] and OpenFOAM [18]. A completely different strategy would be to solve for velocity and pressure simultaneously (coupled solvers). Using FEniCS such a coupled approach is straightforward to implement and, in fact, it requires less coding than the segregated one. However, since the coupled approach requires more memory than a segregated one, and since there are more issues with the efficiency of linear algebra solvers, the segregated approach is favored here.

We are still left with the spatial discretization and the actual implementation. To this end we will first show how the implementation can be performed naively, using very few lines of Python code. We will then, finally, describe the implementation of the high-performance solver.

3. Variational formulations for the fractional step solver

The governing PDEs (4)–(7) are discretized with the finite element method in space on a bounded domain Ω ⊂ R^d, with 2 ≤ d ≤ 3, and with boundary ∂Ω. Trial and test spaces for the velocity components are defined as

V = {v ∈ H¹(Ω) : v = u₀ on ∂Ω},
V̂ = {v ∈ H¹(Ω) : v = 0 on ∂Ω}, (10)

where u₀ is a prescribed velocity component on the part ∂Ω of the boundary and H¹(Ω) is the Sobolev space containing functions v such that v² and |∇v|² have finite integrals over Ω. Both the scalars and the pressure use the same H¹(Ω) space without the restriction on the boundary. The test functions for a velocity component and the pressure are denoted v and q, respectively, whereas the scalar simply uses the same test function as the velocity component.

To obtain a variational form for component k of the tentative velocity vector, we multiply Eq. (4) by v and then integrate over the entire domain, using integration by parts on the Laplacian:

∫_Ω [ (u^I_k − u^{n−1}_k)/△t + B^{n−1/2}_k ] v + ν ∇ũ_k · ∇v dx = ∫_Ω ( −∇_k p* + f^{n−1/2}_k ) v dx + ∫_{∂Ω} ν ∇_n ũ_k v ds. (11)

Here ∇_n represents the gradient in the direction of the outward normal on the boundary. Note that the trial function u^I_k also enters through the Crank–Nicolson velocity ũ_k = 0.5 (u^I_k + u^{n−1}_k).
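The naive FEniCS implementation referred to at the end of Section 2 is not reproduced in this extract. Purely as an illustration, the sketch below (our own, with assumed names such as u_1, u_2, w_1, w_2 and p_) shows how the tentative velocity form (11) for one velocity component could be expressed in FEniCS/UFL, neglecting the body force and the natural boundary term.

from dolfin import *

# Minimal sketch (not the optimized Oasis code): variational form of Eq. (11)
# for one tentative velocity component, with Crank-Nicolson for the diffused
# velocity and the Adams-Bashforth projected convecting velocity of Eq. (9).
mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "CG", 2)          # space for one velocity component
Q = FunctionSpace(mesh, "CG", 1)          # pressure space
u, v = TrialFunction(V), TestFunction(V)  # u plays the role of u_k^I
u_1, u_2 = Function(V), Function(V)       # this component at t^{n-1} and t^{n-2}
w_1, w_2 = Function(V), Function(V)       # the other 2D component at t^{n-1}, t^{n-2}
p_ = Function(Q)                          # tentative pressure p*
nu, dt, k = 0.001, Constant(0.001), 0     # viscosity, time step, component index

u_ab = as_vector([1.5*u_1 - 0.5*u_2,      # Adams-Bashforth projected convecting
                  1.5*w_1 - 0.5*w_2])     # velocity of Eq. (9)
u_cn = 0.5*(u + u_1)                      # Crank-Nicolson interpolated component

F = ((u - u_1)/dt*v                       # time derivative
     + dot(u_ab, grad(u_cn))*v            # implicit convection, Eq. (9)
     + nu*inner(grad(u_cn), grad(v))      # viscous term (integrated by parts)
     + p_.dx(k)*v)*dx                     # pressure gradient; body force omitted
a, L = lhs(F), rhs(F)                     # bilinear and linear parts of Eq. (11)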
Fig. 2. The opening section of NSfracStep.py. Allocation of the necessary storage and parameters for solving the momentum equation through its segregated components. Note that a mesh, some parameters (e.g., viscosity, time step, end time) and some functions (e.g., body force, boundary conditions or initialization of the solution) must be imported from the problem module. The UFL function as_vector creates vectors (u_, u_1, u_2) from the segregated velocity components. The built-in function vars() returns the current module's namespace. Neglecting scalar components, the list sys_comp equals ["u0", "u1", "p"] for 2D and ["u0", "u1", "u2", "p"] for 3D problems. The list is used as keys for the dictionary bcs.
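Since Fig. 2 itself is only described, not reproduced, in this extract, the following is a hedged sketch of the kind of allocation its caption outlines; names such as q_, q_1 and q_2 are assumptions, and the real NSfracStep.py imports the mesh and parameters from the problem module rather than creating them.

from dolfin import *

# Rough reconstruction of the allocation described in the Fig. 2 caption.
# A unit square stands in for the problem-module mesh so the sketch runs.
mesh = UnitSquareMesh(16, 16)
V = FunctionSpace(mesh, "CG", 1)                 # velocity component space
Q = FunctionSpace(mesh, "CG", 1)                 # pressure space

dim = mesh.geometry().dim()
u_components = ["u0", "u1", "u2"][:dim]          # segregated velocity components
sys_comp = u_components + ["p"]                  # keys for, e.g., the bcs dictionary

# Three time levels for each velocity component, plus the pressure
q_  = dict((ui, Function(V)) for ui in u_components)   # current solution
q_1 = dict((ui, Function(V)) for ui in u_components)   # t^{n-1}
q_2 = dict((ui, Function(V)) for ui in u_components)   # t^{n-2}
q_["p"] = p_ = Function(Q)

# Convenience vectors created with the UFL function as_vector
u_  = as_vector([q_[ui]  for ui in u_components])
u_1 = as_vector([q_1[ui] for ui in u_components])
u_2 = as_vector([q_2[ui] for ui in u_components])

bcs = dict((key, []) for key in sys_comp)        # boundary conditions per unknown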
and zero for the remaining walls. We start the simulations from a fluid at rest and advance the solution in time steps of △t = 0.001 from t = 0 to t = 1. The viscosity is set to ν = 0.001. This problem can be implemented as shown in Fig. 5. Here we have made use of the standard Python package numpy and the two dolfin classes UnitSquareMesh and DirichletBC. UnitSquareMesh creates a computational mesh on the unit square, whereas DirichletBC creates Dirichlet boundary conditions for certain segments of the boundary, identified through the two strings noslip and top (x[0] and x[1] represent the coordinates x and y, respectively). A default set of problem parameters can be found in the dictionary NS_parameters declared in problems/__init__.py, and all these parameters may be overloaded, either as shown in Fig. 5 or through the command line. A comprehensive list of parameters and their use is given in the user manual. We use preconditioned iterative Krylov solvers (NS_parameters["use_krylov_solvers"]=True), and not the default direct solvers based on LU decomposition, since the former here are faster and require less memory (the exact choice of iterative solvers is discussed further in Section 7). Note that FEniCS interfaces to a wide range of different linear algebra solvers and preconditioners.
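Fig. 5 is likewise not reproduced here. A minimal sketch of a problem module along the lines described above could look as follows; the boundary strings and the create_bcs interface follow the text, while the remaining details (including the stand-in NS_parameters dictionary) are assumptions.

from dolfin import UnitSquareMesh, DirichletBC

# Stand-in for the default parameter dictionary declared in problems/__init__.py
NS_parameters = {}
NS_parameters.update(nu=0.001, dt=0.001, T=1.0, use_krylov_solvers=True)

mesh = UnitSquareMesh(50, 50)

# Boundary segments identified through strings; x[0] and x[1] are x and y
noslip = "std::abs(x[0]*x[1]*(1 - x[0])) < 1e-8"   # bottom and side walls
top = "std::abs(x[1] - 1) < 1e-8"                  # moving lid

def create_bcs(V, **NS_namespace):
    # u = 1 on the lid, zero velocity on the remaining walls, no pressure bc
    bc0 = DirichletBC(V, 0.0, noslip)
    bc00 = DirichletBC(V, 1.0, top)
    bc01 = DirichletBC(V, 0.0, top)
    return dict(u0=[bc00, bc0], u1=[bc01, bc0], p=[])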
Each term inside the parenthesis on the right hand side represents a matrix

M_ij = ∫_Ω φ_j φ_i dx, (18)

K_ij = ∫_Ω ∇φ_j · ∇φ_i dx. (19)

The two matrices are independent of time and can be preassembled once through (u, v are trial and test functions, respectively)

M = assemble(inner(u, v)*dx)
K = assemble(inner(grad(u), grad(v))*dx)

Note that the solution vectors and matrices represent the major cost in terms of memory use for the solver. The matrices are sparse and allocated by the linear algebra backend, using appropriate wrappers that are hidden from the user. The allocation takes place just once, when the matrices/vectors are created.

The nonlinear convection form contains the evolving solution and requires special attention. We use the implicit convection form given in Eq. (9) and write out the implicit Crank–Nicolson convected velocity for component k

ū · ∇ũ_k = (1/2) ū · ∇(u^I_k + u^{n−1}_k). (20)

Inserting the algebraic form of the finite element trial and test functions, the variational form for the bilinear convection term becomes

∫_Ω ū · ∇u^I_k v dx = Σ_{j=1}^{N_u} ( ∫_Ω ū · ∇φ_j φ_i dx ) U^{k,I}_j, (21)

where ū = 1.5 u^{n−1} − 0.5 u^{n−2}. The convection matrix can be recognized as the term inside the parenthesis

C_ij = ∫_Ω ū · ∇φ_j φ_i dx. (22)

The convecting velocity is time-dependent and interpolated at t^{n−1/2}. As such, the convection matrix is also evaluated at n − 1/2 and needs to be reassembled each time step. To simplify notation, though, we have for the rest of this paper omitted the time notation on C_ij. The assembly of the C_ij matrix is prepared in the setup function:

# Defined in setup
u_ab = as_vector([Function(V) for i in range(len(u_components))])
aconv = inner(v, dot(grad(u), u_ab))*dx

where u_ab is used as a container for the convecting velocity ū. Note that u_ab is assembled (see Fig. 8) before assembling the matrix C_ij, because this leads to code that is a factor of 2 faster than simply using a form based on the velocity functions at the two previous levels directly (i.e., aconv = inner(v, dot(grad(u), 1.5*u_1 - 0.5*u_2))*dx).

Consider now the linear terms, where the known solution function is written as u^{n−1}_k = Σ_{j=1}^{N_u} U^{k,n−1}_j φ_j, where U^{k,n−1}_j are the known coefficients of velocity component k at the previous time step t^{n−1}. We have the following linear terms in Eq. (11):

∫_Ω u^{n−1}_k v dx = M_ij U^{k,n−1}_j, (23)

∫_Ω ∇u^{n−1}_k · ∇v dx = K_ij U^{k,n−1}_j, (24)

∫_Ω ū · ∇u^{n−1}_k v dx = C_ij U^{k,n−1}_j, (25)

that are all very quickly computed using simple matrix–vector products.

Fig. 8. Inside assemble_first_inner_iter. Fast assembly of the coefficient matrix and parts of the right hand side vector. A temporary rhs vector b_tmp is used for each velocity component since this routine is called only on the first inner iteration. x_1 is the vector of degrees of freedom at t^{n−1}.

We may now reformulate our variational problem on the algebraic level using the three assembled matrices. It is required that for each test function v = φ_i, i = 1, …, N_u, the following equations hold

M_ij (U^{k,I}_j − U^{k,n−1}_j)/△t + C_ij (U^{k,I}_j + U^{k,n−1}_j)/2 + ν K_ij (U^{k,I}_j + U^{k,n−1}_j)/2 = Φ^{k,n−1/2}_i, (26)

where

Φ^{k,n−1/2}_i = ∫_Ω ( −∇_k p* + f^{n−1/2}_k ) φ_i dx. (27)

If separated into bilinear and linear terms, the following system of algebraic equations is obtained

( M_ij/△t + C_ij/2 + ν K_ij/2 ) U^{k,I}_j = ( M_ij/△t − C_ij/2 − ν K_ij/2 ) U^{k,n−1}_j + Φ^{k,n−1/2}_i. (28)

If now A_ij = M_ij/△t + C_ij/2 + ν K_ij/2 is used as the final coefficient matrix, then the equation may be written as

A_ij U^{k,I}_j = ( 2M_ij/△t − A_ij ) U^{k,n−1}_j + Φ^{k,n−1/2}_i, (29)

or

A_ij U^{k,I}_j = b^{k,n−1/2}_i,   for k = 1, …, d, (30)

where b^{k,n−1/2}_i is the right hand side of (29). Note that the same coefficient matrix is used by all velocity components, even when there are Dirichlet boundary conditions applied.
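To make the matrix–vector view above concrete, the following is a small illustrative sketch (our own, not the optimized Oasis code, which instead reuses the coefficient matrix in place as in Algorithm 2 below) of how A and most of b in Eqs. (29) and (30) could be formed for one velocity component, with the body force and pressure gradient omitted.

from dolfin import *

# Illustrative only: A and most of b of Eq. (29)-(30) for a single velocity
# component via plain matrix-vector products; the names are assumptions.
mesh = UnitSquareMesh(16, 16)
V = FunctionSpace(mesh, "CG", 1)
u, v = TrialFunction(V), TestFunction(V)
u_1, u_2 = Function(V), Function(V)        # this component at t^{n-1} and t^{n-2}
nu, dt = 0.001, 0.001

# Stand-in for the Adams-Bashforth projected convecting velocity (2D here)
u_ab = as_vector([1.5*u_1 - 0.5*u_2, 1.5*u_1 - 0.5*u_2])

M = assemble(inner(u, v)*dx)                    # mass matrix, Eq. (18)
K = assemble(inner(grad(u), grad(v))*dx)        # stiffness matrix, Eq. (19)
C = assemble(inner(v, dot(grad(u), u_ab))*dx)   # convection matrix, Eq. (22)

x_1 = u_1.vector()                              # U^{k,n-1}, dofs at t^{n-1}
# Right hand side of Eq. (29) through the matrix-vector products (23)-(25)
b = (1.0/dt)*(M*x_1) - 0.5*(C*x_1) - 0.5*nu*(K*x_1)

# Coefficient matrix A of Eq. (30), here simply assembled from its bilinear form
A = assemble((1.0/dt)*inner(u, v)*dx
             + 0.5*inner(v, dot(grad(u), u_ab))*dx
             + 0.5*nu*inner(grad(u), grad(v))*dx)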
An efficient algorithm (Algorithm 2) can now be designed to assemble both large parts of the right hand side and the left hand side of Eq. (30) at the same time.

Assemble A_ij ←− C_ij
A_ij = M_ij/△t − A_ij/2 − ν K_ij/2
b^{k,n−1/2}_i = f^{k,n−1/2}_i + A_ij U^{k,n−1}_j,   for k = 1, …, d
A_ij = −A_ij + 2M_ij/△t
Algorithm 2: Efficient algorithm for assembling the coefficient matrix A_ij, where most of the right hand side of Eq. (30) is assembled in an intermediate step.

Algorithm 2 is implemented as shown in Fig. 8. At the end of this algorithm, most of b^{k,n−1/2} (except for the pressure gradient) has been assembled and the coefficient matrix A_ij is ready to be used in Eq. (30). The convection matrix needs to be reassembled each new time step, but only on the first inner velocity–pressure iteration, since ū only contains old and known velocities, not the new u^n_k. For this reason the code in Fig. 8 is placed inside assemble_first_inner_iter, called in Fig. 4. Notice that there is no separate matrix used for 2M_ij/△t − A_ij or C_ij, and the total memory cost of the algorithm is exactly three individual sparse matrices (A_ij, M_ij and K_ij). The sparsity pattern of the matrices is computed on the first assemble, and the matrix axpy operations take advantage of the fact that all these matrices share the same pattern.

The linear term Φ^{k,n−1/2}_i needs some further comments. Neglecting the constant forcing, f^{n−1/2}_k, the second part of Φ^{k,n−1/2}_i is

∫_Ω −∇_k p* φ_i dx, (31)

where p* = Σ_{j=1}^{N_p} P*_j φ̂_j, φ̂_j is the basis function for the pressure and P*_j are the known degrees of freedom. On algebraic form we get

∫_Ω ∇_k p* φ_i dx = Σ_{j=1}^{N_p} ( ∫_Ω ∇_k φ̂_j φ_i dx ) P*_j = dP^k_ij P*_j, (32)

where dP^k_ij for k = 1, …, d are d matrices that are constant in time. Since the matrices can be preassembled, the computation of Φ^{k,n−1/2}_i through a matrix–vector product is very fast. Unfortunately, though, three additional matrices require storage (in 3D), which may be too expensive. In that case there is a parameter in Oasis that can be used. Setting

NS_parameters["low_memory_version"] = False

enables the creation of the matrices dP^k_ij. If disabled, the term is computed simply through

assemble(inner(p_.dx(k), v)*dx)

for k = 0, …, d − 1. The pressure gradient is added to b^k in velocity_tentative_assemble and not in Fig. 8, since the pressure is modified on inner iterations.

The pressure correction equation can also be optimized on the algebraic level, using the trial function p^{n−1/2} = Σ_{j=1}^{N_p} P^{n−1/2}_j φ̂_j and the test function q = φ̂_i to write (12) for each test function. The Laplacian matrix K̂_ij can be preassembled. If the pressure function space is the same as the velocity function space, then K̂_ij = K_ij and no additional work is required. The divergence term may be computed as

∫_Ω (1/△t) ∇ · u^I φ̂_i dx = (1/△t) Σ_{k=1}^{d} Σ_{j=1}^{N_u} ( ∫_Ω ∇_k φ_j φ̂_i dx ) U^{k,I}_j = (1/△t) Σ_{k=1}^{d} dU^k_ij U^{k,I}_j, (34)

where the matrices dU^k_ij for k = 1, …, d can be preassembled. Again, the cost is three additional sparse matrices, unless the function spaces of pressure and velocity are the same. In that case dU^k_ij = dP^k_ij and memory can be saved. If the low_memory_version is chosen, then we simply use the slower finite element assembly

assemble((1/dt)*div(u_)*q*dx)

The final step for the fractional step solver is the velocity update, which can be written for component k as

M_ij U^{k,n}_j = M_ij U^{k,I}_j − △t dP^k_ij P^{n−1/2}_j, (35)

where U^{k,I}_j and P^{n−1/2}_j now are the known degrees of freedom of the tentative velocity and pressure, respectively, whereas U^{k,n}_j represent the unknowns. The velocity update requires a linear algebra Krylov or direct solve, and as such it is quite expensive even though the equation is cheap to assemble. For this reason the velocity update has an additional option to use either a weighted gradient matrix³ G^k_ij or lumping of the mass matrix, which allows the update to be performed directly:

U^{k,n}_i = U^{k,I}_i − △t G^k_ij P^{n−1/2}_j,   for i = 1, …, N_u. (36)

The parameter used to enable the direct approach is NS_parameters["velocity_update_type"], which can be set to "gradient_matrix" or "lumping".

6. Verification of implementation

The fractional step algorithm implemented in NSfracStep is targeting transient flows in large-scale applications, with turbulent as well as laminar or transitional flow. It is not intended to be used as a steady state solver.⁴ Oasis has previously been used to study, e.g., blood flow in highly complex intracranial aneurysms [21,22], where the results compare very well with, e.g., the spectral element code NEKTAR [23]. Simulations by Steinman and Valen-Sendstad [22] are also commented on by Ventikos [24], who states this is "the right way to do it"—referring to the need for highly resolved CFD simulations of transitional blood flow in aneurysms.

Considering the end use of the solver in biomedical applications and research, it is essential that we establish the accuracy as well as the efficiency of the solver.

6.1. 2D Taylor–Green flow

Two-dimensional Taylor–Green flow is one of very few non-trivial analytical and transient solutions to the Navier–Stokes equations. For this reason it is often used for verification of computer codes. The implementation can be found in Oasis/problems/
Table 1
Taylor–Green flow convergence errors O(h^k), where h and k are mesh size and order of convergence, respectively. ∥·∥_h represents an L2 norm. The velocity is either quadratic (P2) or linear (P1), whereas the pressure is always linear (P1). Columns: h, ∥u − u_e∥_h, k, ∥p − p_e∥_h, k.

Table 2
Taylor–Green flow convergence errors O(dt^k), where dt and k are time step and order of convergence, respectively. The velocity uses Lagrange elements of degree four (P4), whereas the pressure uses third degree (P3). Columns: dt, ∥u − u_e∥_h, k, ∥p − p_e∥_h, k.

⁵ A direct numerical simulation indicates a simulation where all scales of turbulence have been resolved.
⁶ Due to two periodic directions, the number of degrees of freedom for the fine mesh are 128 · 129 · 128 for each velocity component and pressure, which is the same as used by MKM.
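The numerical entries of Tables 1 and 2 are not reproduced in this extract. For reference, the observed order of convergence k reported in such tables is conventionally estimated from the errors on two consecutive refinements, as in the following generic sketch (not code from Oasis; the numbers in the example are hypothetical).

from math import log

def convergence_order(e_coarse, e_fine, h_coarse, h_fine):
    """Observed order k such that the error behaves like O(h^k), estimated
    from the errors on two consecutive mesh sizes (or time steps)."""
    return log(e_coarse / e_fine) / log(h_coarse / h_fine)

# Example: halving h while the error drops by a factor of 16 corresponds to k = 4
print(convergence_order(1.6e-4, 1.0e-5, 0.1, 0.05))   # prints 4.0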
Fig. 9. Implementation of the Channel problem.

Fig. 10. Mean velocity in the x-direction normalized by u_τ as a function of scaled distance to the wall y⁺. Dotted and dashed curves are computed with Oasis using respectively 6 · 64³ and 6 · 128³ computational cells. The solid line is the reference solution from MKM.

Fig. 11. Normal Reynolds stresses scaled by u_τ² shown as functions of scaled distance to the wall y⁺. Dotted and dashed curves are computed with Oasis using respectively 6 · 64³ and 6 · 128³ computational cells. The solid lines are from the reference solution of MKM. The three different profiles represent, in decreasing magnitude, the normal stresses uu⁺, ww⁺ and vv⁺, where u, v and w are velocity components.

an upwind scheme or a monotonically integrated implicit LES [27], would have the same effect as an LES model that adds viscosity, and as such could lead to coarse simulations with mean velocity profiles closer to MKM.

Fig. 11 shows the normal, non-dimensionalized Reynolds stresses. The results confirm that the underresolved stresses are underpredicted close to the wall, whereas the fine simulations converge towards the spectral MKM results.

The channel simulations do not require more computational power than can be provided by a relatively new laptop computer. However, since these simulations are run for more than 30,000 time steps to sample statistics, we have performed parallel computations on the Abel supercomputer at the University of Oslo. The simulations scale weakly when approximately 200,000 elements are used per CPU, and thus we have run our simulations using 8 CPUs for the coarse mesh and 64 for the fine, which contains 12.5 million tetrahedrons. Simulations for the fine grid take approximately 1.5–1.7 s real time per computational time step, depending on traffic and distribution on Abel (20%–25% lower for the coarse simulations), and thus close to 12 h for the entire test (30,000 time steps). Approximately 75% of the computing time is spent in the linear algebra backend's iterative Krylov solvers, and assembling of the coefficient matrix, as detailed in Section 5, is responsible for most of the remaining time. The backend (here PETSc) and the iterative linear algebra solvers are thus key to performance. For the tentative velocity computations we have used a stabilized version of a biconjugate gradient squared solver [28] with a very cheap (fast, low memory) Jacobi preconditioner, imported from the method get_solvers, where it is specified as KrylovSolver('bicgstab', 'jacobi'). This choice is justified since the tentative velocity coefficient matrix is diagonally dominant due to the short time steps, and each solve requires approximately 10 iterations to converge (the same number for coarse and fine). The pressure coefficient matrix represents a symmetric and elliptic system, and thus we choose a solver based on minimal residuals [29] and the hypre [30] algebraic multigrid preconditioner (KrylovSolver('minres', 'hypre_amg')). The pressure solver uses an average of 6 iterations (same for coarse and fine) to converge to a given tolerance. The velocity update is computed using a lumped mass matrix, and no linear algebra solver is thus required for this final step.

7. Concluding notes on performance

The computational speed of any implicit, large-scale Navier–Stokes solver is determined by many competing factors, but most likely it will be limited by hardware and by the routines for setting up (assembly) and solving its linear algebra subsystems. In Oasis, and many comparable Navier–Stokes solvers, the linear algebra is performed through routines provided by a backend (here PETSc) and is thus arguably beyond our control. Accepting that we cannot do better than the limits imposed by hardware and the backend, the best we can really hope for through high-level implementations is to eliminate the cost of assembly. In Oasis we take all conceivable measures to do just this, as well as reducing the number of required linear algebra solves. As mentioned in the previous section, for the turbulent channel case with 12.5 million tetrahedrons, 75% of the computational time was found to be spent inside very efficient Krylov solvers, and we are thus, arguably, pushing at the very boundaries of what may be achieved by a solver developed with similar numerical schemes, using the same backend.
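The iterative solvers quoted for the channel case above are created in the method get_solvers; purely as an illustration, a stripped-down version of such a routine could look as follows (the constructor arguments are those given in the text, while the tolerance setting is an assumption).

from dolfin import KrylovSolver

def get_solvers(**NS_namespace):
    # BiCGStab with a Jacobi preconditioner for the tentative velocity and
    # MINRES with hypre algebraic multigrid for the pressure correction
    u_sol = KrylovSolver('bicgstab', 'jacobi')
    p_sol = KrylovSolver('minres', 'hypre_amg')
    for sol in (u_sol, p_sol):
        sol.parameters['relative_tolerance'] = 1e-8   # assumed, not from the paper
    return u_sol, p_sol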
To further support this claim, without making a complete comparison in terms of accuracy, we have also set up and tested the channel simulations on Abel for two low-level, second order accurate, semi-implicit, fractional step solvers, OpenFOAM [18] and CDP [31], which both target high performance on massively parallel clusters. We used channelFoam, distributed with OpenFOAM version 2.2.1 [32], and version 2.5.0 of CDP (requires license). The channelFoam LES solver was modified slightly to run with constant viscosity, and parameters were set to match the finest channel simulations using 128³ hexahedral cells. For the tentative velocity, OpenFOAM used a biconjugate gradient [28] solver with a diagonal incomplete-LU preconditioner. For the pressure, a conjugate gradient solver was used with a diagonal incomplete Cholesky preconditioner. The CDP solver was set up with the same hexahedral mesh as OpenFOAM, using no model for the LES subgrid viscosity. The linear solvers used by CDP were very similar to those used by Oasis, with a Jacobi preconditioned biconjugate gradient solver for the tentative velocity and a generalized minimum residual method [33] with the hypre algebraic multigrid preconditioner. Depending on traffic on Abel, both CDP and channelFoam required approximately 1.4–1.7 s real time per time step, which is very close to the speed obtained by Oasis. For both CDP and OpenFOAM the speed was strongly dominated by the Krylov solvers, and both showed the same type of weak scaling as Oasis on the Abel supercomputer.

Acknowledgment

This work has been supported by a Center of Excellence grant from the Research Council of Norway to the Center for Biomedical Computing at Simula Research Laboratory.

References

[1] VMTK—The Vascular Modeling Toolkit. URL: http://www.vmtk.org.
[2] Gmsh: a three-dimensional finite element mesh generator with built-in pre- and post-processing facilities. URL: http://www.geuz.org/gmsh/.
[3] The CUBIT geometry and mesh generation toolkit. URL: https://cubit.sandia.gov.
[4] S. Balay, et al., PETSc Web page, http://www.mcs.anl.gov/petsc (2013).
[5] M.A. Heroux, et al., An overview of the Trilinos project, ACM Trans. Math. Software 31 (3) (2005) 397–423.
[6] OpenFVM. URL: http://openfvm.sourceforge.net/.
[7] Fluidity. URL: imperial.ac.uk/earthscienceandengineering.
[8] OOFEM—object oriented finite element solver. URL: http://www.oofem.org/en/oofem.html.
[9] FEniCS. URL: http://fenicsproject.org.
[10] M.S. Alnæs, A. Logg, K.B. Ølgaard, M.E. Rognes, G.N. Wells, Unified form language: A domain-specific language for weak formulations of partial differential equations, ACM Trans. Math. Software 40 (2) (2014).
[11] R.C. Kirby, A. Logg, A compiler for variational forms, ACM Trans. Math. Software 32 (3) (2006) 417–444. http://dx.doi.org/10.1145/1163641.1163644.
[12] FEniCS tutorial. URL: http://fenicsproject.org/documentation/tutorial.
[13] M.A. Christon, P.M. Gresho, S.B. Sutton, Computational predictability of time-dependent natural convection flows in enclosures (including a benchmark solution), Internat. J. Numer. Methods Fluids (2012).
[14] J. Simo, F. Armero, Unconditional stability and long-term behavior of transient algorithms for the incompressible Navier–Stokes and Euler equations, Comput. Methods Appl. Mech. Engrg. 111 (1994) 111–154.
[15] R.I. Issa, Solution of the implicitly discretized fluid flow equations by operator-splitting, J. Comput. Phys. 62 (1985) 40–65.
[16] Ansys-Fluent. URL: www.ansys.com.
[17] Star-CD. URL: http://www.cd-adapco.com/products/star-cd.
[18] OpenFOAM—The open source CFD toolbox. URL: www.openfoam.com.
[19] Oasis user manual. URL: https://github.com/mikaem/Oasis/blob/master/doc/usermanual.pdf.
[20] fenicstools. URL: https://github.com/mikaem/fenicstools.
[21] D.A. Steinman, et al., Variability of computational fluid dynamics solutions for pressure and flow in a giant aneurysm: The ASME 2012 Summer Bioengineering Conference CFD Challenge, J. Biomech. Eng. 135 (2) (2013).
[22] K. Valen-Sendstad, D.A. Steinman, Mind the gap: Impact of computational fluid dynamics solution strategy on prediction of intracranial aneurysm hemodynamics and rupture status indicators, Am. J. Neuroradiol. 35 (3) (2014) 536–543.
[23] NEKTAR. URL: http://wwwf.imperial.ac.uk/ssherw/spectralhp/nektar.
[24] Y. Ventikos, Resolving the issue of resolution (2014).
[25] F. Guillén-González, G. Tierra, Superconvergence in velocity and pressure for the 3D time-dependent Navier–Stokes equations, SeMA J. 57 (1) (2012) 49–67.
[26] R.D. Moser, J. Kim, N.N. Mansour, Direct numerical simulation of turbulent channel flow up to Re_τ = 590, Phys. Fluids 11 (1999) 943–945.
[27] C. Fureby, F.F. Grinstein, Monotonically integrated large eddy simulation of free shear flows, AIAA J. 37 (5) (1999) 544–556.
[28] H. van der Vorst, Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 13 (2) (1992) 631–644.
[29] C.C. Paige, M.A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal. 12 (1975) 617–629.
[30] Hypre. URL: http://acts.nersc.gov/hypre/.
[31] CDP. URL: http://web.stanford.edu/group/cits/research/combustor/cdp.html.
[32] channelFoam. URL: https://github.com/OpenFOAM/OpenFOAM-2.1.x/tree/master/applications/solvers/incompressible/channelFoam.
[33] Y. Saad, M. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7 (3) (1986) 856–869.