
Computer Physics Communications 188 (2015) 177–188

Oasis: A high-level/high-performance open source Navier–Stokes solver✩

Mikael Mortensen a,b,∗, Kristian Valen-Sendstad b,c

a University of Oslo, Moltke Moes vei 35, 0851 Oslo, Norway
b Center for Biomedical Computing at Simula Research Laboratory, P.O. Box 134, N-1325 Lysaker, Norway
c University of Toronto, 5 Kings College Road, Toronto, ON, Canada

Article info

Article history:
Received 19 February 2014
Received in revised form 8 August 2014
Accepted 27 October 2014
Available online 18 November 2014

Keywords:
CFD
FEniCS
Python
Navier–Stokes

Abstract

Oasis is a high-level/high-performance finite element Navier–Stokes solver written from scratch in Python using building blocks from the FEniCS project (fenicsproject.org). The solver is unstructured and targets large-scale applications in complex geometries on massively parallel clusters. Oasis utilizes MPI and interfaces, through FEniCS, to the linear algebra backend PETSc. Oasis advocates a high-level, programmable user interface through the creation of highly flexible Python modules for new problems. Through the high-level Python interface the user is placed in complete control of every aspect of the solver. A version of the solver that uses piecewise linear elements for both velocity and pressure is shown to reproduce very well the classical, spectral, turbulent channel simulations of Moser et al. (1999). The computational speed is strongly dominated by the iterative solvers provided by the linear algebra backend, which is arguably the best performance any similar implicit solver using PETSc may hope for. Higher order accuracy is also demonstrated and new solvers may be easily added within the same framework.

Program summary

Program title: Oasis
Catalogue identifier: AEUW_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEUW_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Lesser GPL version 3 or any later version
No. of lines in distributed program, including test data, etc.: 3491
No. of bytes in distributed program, including test data, etc.: 266924
Distribution format: tar.gz
Programming language: Python/C++.
Computer: Any single laptop computer or cluster.
Operating system: Any (Linux, OSX, Windows).
RAM: a few Megabytes to several hundred Gigabytes
Classification: 12.
External routines: FEniCS 1.3.0 (www.fenicsproject.org), which in turn depends on a number of external libraries like MPI, PETSc, Epetra, Boost and ParMetis
Nature of problem:
Incompressible, Newtonian fluid flow.

✩ This paper and its associated computer program are available via the Computer Physics Communication homepage on ScienceDirect (http://www.sciencedirect.com/science/journal/00104655).
∗ Corresponding author at: University of Oslo, Moltke Moes vei 35, 0851 Oslo, Norway.
E-mail address: [email protected] (M. Mortensen).

http://dx.doi.org/10.1016/j.cpc.2014.10.026
0010-4655/© 2014 Elsevier B.V. All rights reserved.
Solution method:
The finite element method.
Unusual features:
FEniCS automatically generates and compiles low-level C++ code based on high-level Python code.
Running time:
The example provided takes a couple of minutes on a single processor.

1. Introduction

The Navier–Stokes equations describe the flow of incompressible, Newtonian fluids. The equations are transient, nonlinear and velocity is non-trivially coupled with pressure. A lot of research has been devoted to finding efficient ways of linearizing, coupling and solving these equations. Many commercial solvers for Computational Fluid Dynamics (CFD) are available, and, due to the complexity of the low-level implementations (usually Fortran or C), users are often operating these solvers through a Graphical User Interface (GUI). To implement a generic, unstructured Navier–Stokes solver from scratch in a low-level language like C or Fortran is a considerable and time-consuming task involving tens of thousands of lines of error-prone code that require much maintenance. Nowadays, as will be shown in this paper, the use of new and modern high-level software tools enables developers to cut the size of programs down to a few hundred lines and development times to hours.

The implementation of any unstructured (Eulerian) CFD solver requires a computational mesh. For most CFD software packages today the mesh is generated by third-party software like, e.g., the open source projects VMTK [1], Gmsh [2] or Cubit [3]. To solve the governing equations on this computational mesh, the equations must be linearized and discretized such that a solution can be found for a certain (large) set of degrees of freedom. Large systems of linear equations need to be assembled and subsequently solved by appropriate direct or iterative methods. Like for mesh generation, basic linear algebra, with matrix/vector storage and operations, is nowadays most commonly outsourced to third-party software packages like PETSc [4] and Trilinos [5] (see, e.g., [6–8]). With both mesh generation and linear algebra outsourced, the main job of CFD solvers boils down to linearization, discretization and assembly of the linear system of equations. This is by no means a trivial task, as it requires, e.g., maps from computational cells to global degrees of freedom and connectivity of cells, facets and vertices. For parallel performance it is also necessary to distribute the mesh between processors and set up for inter-communication between compute nodes. Fortunately, much of the Message Passing Interface (MPI) is already handled by the providers of basic linear algebra. When it comes down to the actual discretization, the most common approaches are probably the finite volume method, which is very popular for fluid flow, finite differences or the finite element method.

FEniCS [9] is a generic open source software framework that aims at automating the discretization of differential equations through the finite element method. FEniCS takes full advantage of specialized, reliable and robust third-party providers of computational software and interfaces to both PETSc and Trilinos for linear algebra and to several third-party mesh generators. FEniCS utilizes the Unified Form Language (UFL, [10]) and the FEniCS Form Compiler (FFC, [11]) to automatically generate low-level C++ code that efficiently evaluates any equation formulated as a finite element variational form. The FEniCS user has to provide the high-level variational form that is to be solved, but does not need to actually perform any coding on the level of the computational cell, or element. A choice is made of finite element basis functions, and code is then generated for the form accordingly. There is a large library of possible finite elements to choose from and they may be combined both implicitly in a coupled manner or explicitly in a segregated manner—all at the same level of complexity to the user. The user never has to see the generated low-level code, but, this being an open source project, the code is wide open for inspection and even manual fine-tuning and optimization is possible.

In this paper we will describe the Navier–Stokes solver Oasis, which is written from scratch in Python, using building blocks from FEniCS and the PETSc backend. Our goal with this paper is to describe a code that is (i) short and easily understood, (ii) easily configured and (iii) as fast and accurate as state-of-the-art Navier–Stokes solvers developed entirely in low-level languages. We assume that the reader has some basic knowledge of how to write simple solvers for partial differential equations using the FEniCS framework. Otherwise, reference is given to the online FEniCS tutorial [12].

2. Fractional step algorithm

In Oasis we are solving the incompressible Navier–Stokes equations, optionally complemented with any number of passive or reactive scalars. The governing equations are thus

∂u/∂t + (u · ∇)u = ν∇²u − ∇p + f,   (1)

∇ · u = 0,   (2)

∂c_α/∂t + u · ∇c_α = D_α ∇²c_α + f_α,   (3)

where u(x, t) is the velocity vector, ν the kinematic viscosity, p(x, t) the fluid pressure, c_α(x, t) is the concentration of species α and D_α its diffusivity. Any volumetric forces (like buoyancy) are denoted by f(x, t) and chemical reaction rates (or other scalar sources) by f_α(c), where c(x, t) is the vector of all species concentrations. The constant fluid density is incorporated into the pressure. Note that through the volumetric forces there is a possible feedback to the Navier–Stokes equations from the species, and, as such, a Boussinesq formulation for natural convection (see, e.g., [13]) is possible within the current framework.
We will now outline a generic fractional step method, where the velocity and pressure are solved for in a segregated manner. Since it is important for the efficiency of the constructed solver, the velocity vector u will be split up into its individual components u_k.¹ Time is split up into uniform intervals² using a constant time step △t = t^n − t^{n−1}, where superscript n is an integer and t^n ∈ R⁺. The governing equations are discretized in both space and time. Discretization in space is performed using finite elements, whereas discretization in time is performed with finite differences. Following Simo and Armero [14] the generic fractional step algorithm can be written as

(u_k^I − u_k^{n−1})/△t + B_k^{n−1/2} = ν∇²ũ_k − ∇_k p* + f_k^{n−1/2}   for k = 1, . . . , d,   (4)

∇²ϕ = ∇ · u^I / △t,   (5)

(u_k^n − u_k^I)/△t = −∇_k ϕ   for k = 1, . . . , d,   (6)

(c_α^n − c_α^{n−1})/△t + B_α^{n−1/2} = D_α ∇²c̃_α + f_α^{n−1/2},   (7)

where u_k^n is component k of the velocity vector at time t^n, d is the dimension of the problem, ϕ = p^{n−1/2} − p* is a pressure correction and p* is a tentative pressure. We are solving for the velocity and pressure at the next time step, i.e., u_k^n for k = 1, . . . , d and p^{n−1/2}. However, the tentative velocity Eq. (4) is solved with the tentative velocity component u_k^I as unknown. To avoid strict time step restrictions, the viscous term is discretized using a semi-implicit Crank–Nicolson interpolated velocity component ũ_k = 0.5(u_k^I + u_k^{n−1}). The nonlinear convection term is denoted by B_k^{n−1/2}, indicating that it should be evaluated at the midpoint between time steps n and n − 1. Two different discretizations of convection are currently used by Oasis

B_k^{n−1/2} = (3/2) u^{n−1} · ∇u_k^{n−1} − (1/2) u^{n−2} · ∇u_k^{n−2},   (8)

B_k^{n−1/2} = u̅ · ∇ũ_k,   (9)

where the first is a fully explicit Adams–Bashforth discretization and the second is implicit, with an Adams–Bashforth projected convecting velocity vector u̅ = 1.5 u^{n−1} − 0.5 u^{n−2} and Crank–Nicolson for the convected velocity. Both discretizations are second order accurate in time, and, since the convecting velocity is known, there is no implicit coupling between the (possibly) three velocity components solved for.

Convection of the scalar is denoted by B_α^{n−1/2}. The term must be at most linear in c_α^n and otherwise any known velocity and scalar may be used in the discretization. Note that when solving for c_α^n the velocity u^n will be known and may be used to discretize B_α^{n−1/2}. The discretization used in Oasis is

B_α^{n−1/2} = u̅ · ∇c̃_α,

where c̃_α = 0.5 (c_α^n + c_α^{n−1}).

An iterative fractional step method involves solving Eq. (4) for all tentative velocity components and (5) for a pressure correction. The procedure is repeated a desired number of times before finally a velocity correction (6) is solved to ensure conservation of mass before moving on to the next time step. The fractional step method can thus be outlined as shown in Algorithm 1.

    Set time and initial conditions
    t = 0
    for time steps n = 0, 1, 2, . . . do
        t = t + dt
        for inner iterations i = 0, 1, . . . do
            ϕ = p* = p^{n−1/2}
            solve (4) for u_k^I, k = 1, . . . , d
            solve (5) for p^{n−1/2}
            ϕ = p^{n−1/2} − ϕ
        end
        solve (6) for u_k^n, k = 1, . . . , d
        solve (7) for c_α^n
        update to next time step
    end

Algorithm 1: Generic fractional step algorithm for the Navier–Stokes equations.

Note that if the momentum equation depends on the scalar (e.g., when using a Boussinesq model), then there may also be a second iterative loop over Navier–Stokes and temperature. The iterative scheme shown in Algorithm 1 is based on the observation that the tentative velocity computed in Eq. (4) only depends on the previous known solutions u^{n−1}, u^{n−2} and not on u^n. As such, the velocity update can be placed outside the inner iteration. In case of an iterative scheme where the convection depends on u^n (e.g., u^n · ∇ũ_k) the update would have to be moved inside the inner loop.

We now have an algorithm that can be used to integrate the solution forward in time, and it is clear that the fractional step algorithm allows us to solve for the coupled velocity and pressure fields in a segregated manner. We should mention here that there are plenty of similar, alternative algorithms for time stepping of segregated solvers. The most common algorithm is perhaps Pressure Implicit with Splitting of Operators (PISO) [15], which is used by Ansys-Fluent [16], Star-CD [17] and OpenFOAM [18]. A completely different strategy would be to solve for velocity and pressure simultaneously (coupled solvers). Using FEniCS such a coupled approach is straightforward to implement, and, in fact, it requires less coding than the segregated one. However, since the coupled approach requires more memory than a segregated one, and since there are more issues with the efficiency of linear algebra solvers, the segregated approach is favored here.

We are still left with the spatial discretization and the actual implementation. To this end we will first show how the implementation can be performed naively, using very few lines of Python code. We will then, finally, describe the implementation of the high-performance solver.

¹ FEniCS can alternatively solve vector equations where all components are coupled.
² It is trivial to use nonuniform intervals, but uniform is used here for convenience.
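Algorithm 1 maps almost line for line onto a Python loop. The following schematic is a sketch only, not Oasis code: solve_tentative, solve_pressure, update_velocity and solve_scalar are placeholders for the solves of Eqs. (4)–(7).

t = 0.0
for n in range(1, N + 1):                    # time steps
    t += dt
    for i in range(max_inner_iters):         # inner velocity-pressure iterations
        p_star = p.copy()                    # phi = p* = p^{n-1/2}
        for k in range(d):
            u_I[k] = solve_tentative(k, u_prev, p_star)    # Eq. (4)
        p = solve_pressure(u_I, p_star)      # Eq. (5)
        phi = p - p_star
    for k in range(d):
        u[k] = update_velocity(k, u_I[k], phi)             # Eq. (6)
    c = solve_scalar(u, c)                   # Eq. (7)
    u_prev = u                               # move on to the next time step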
3. Variational formulations for the fractional step solver

The governing PDEs (4)–(7) are discretized with the finite element method in space on a bounded domain Ω ⊂ R^d, with 2 ≤ d ≤ 3, and the boundary ∂Ω. Trial and test spaces for the velocity components are defined as

V = {v ∈ H¹(Ω) : v = u_0 on ∂Ω},
V̂ = {v ∈ H¹(Ω) : v = 0 on ∂Ω},   (10)

where u_0 is a prescribed velocity component on the part ∂Ω of the boundary and H¹(Ω) is the Sobolev space containing functions v such that v² and |∇v|² have finite integrals over Ω. Both the scalars and the pressure use the same H¹(Ω) space without the restricted boundary part. The test functions for velocity component and pressure are denoted as v and q, respectively, whereas the scalar simply uses the same test function as the velocity component.
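In FEniCS 1.3 syntax these spaces, and the strongly enforced boundary part of (10), would be set up along the following lines; the mesh, polynomial degree and boundary value below are placeholders, not values prescribed by Oasis.

from dolfin import *

mesh = UnitSquareMesh(50, 50)               # placeholder mesh
V = FunctionSpace(mesh, 'CG', 1)            # one scalar space per velocity component
Q = FunctionSpace(mesh, 'CG', 1)            # pressure (and scalars) also live in H1
u, v = TrialFunction(V), TestFunction(V)
p, q = TrialFunction(Q), TestFunction(Q)
bc = DirichletBC(V, Constant(1.0), 'on_boundary')   # v = u0 on the Dirichlet part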
To obtain a variational form for component k of the tentative velocity vector, we multiply Eq. (4) by v and then integrate over the entire domain, using integration by parts on the Laplacian

∫_Ω [ (u_k^I − u_k^{n−1})/△t + B_k^{n−1/2} ] v + ν ∇ũ_k · ∇v dx = ∫_Ω ( −∇_k p* + f_k^{n−1/2} ) v dx + ∫_∂Ω ν ∇_n ũ_k v ds.   (11)

Here ∇_n represents the gradient in the direction of the outward normal on the boundary. Note that the trial function u_k^I enters also through the Crank–Nicolson velocity component ũ_k = 0.5(u_k^I + u_k^{n−1}). The boundary term is only important for some boundaries and is neglected for the rest of this paper.

The variational form for the pressure correction is obtained by multiplying Eq. (5) by q and then integrating over the domain, using again integration by parts

∫_Ω ∇ϕ · ∇q dx − ∫_∂Ω ∇_n ϕ q ds = −∫_Ω (∇ · u^I/△t) q dx.   (12)

The boundary integral can be neglected for all parts of the domain where the velocity is prescribed.

A variational form for the velocity update of component k is obtained by multiplying (6) by v and integrating over the domain

∫_Ω [ (u_k^n − u_k^I)/△t ] v dx = −∫_Ω ∇_k ϕ v dx.   (13)

Finally, a variational form for the scalar component α is obtained by multiplying Eq. (7) by v, and then integrating over the domain using integration by parts on the diffusion term

∫_Ω [ (c_α^n − c_α^{n−1})/△t + B_α^{n−1/2} ] v + D_α ∇c̃_α · ∇v dx = ∫_Ω f_α^{n−1/2} v dx + ∫_∂Ω D_α ∇_n c̃_α v ds.   (14)

4. Oasis

We now have all the variational forms that together constitute a fractional step solver for the Navier–Stokes equations, complemented with any number of scalar fields. We will now describe how the fractional step algorithm has been implemented in Oasis and discuss the design of the solver package. For installation of the software, see the user manual [19]. Note that this paper refers to version 1.3 of the Oasis solver, which in turn is consistent with version 1.3 of FEniCS.

4.1. Python package

The Oasis solver is designed as a Python package with the tree structure shown in Fig. 1. The generic fractional step algorithm is implemented in the top level Python module NSfracStep.py and the solver is run by executing this module within a Python shell using appropriate keyword arguments, e.g.,

>>> python NSfracStep.py problem=Channel solver=IPCS

Fig. 1. Directory tree structure of Python package Oasis.

The fractional step solver pulls in a required mesh, parameters and functions from two submodules located in the folders solvers and problems. The user communicates with the solver through the implementation of new problem modules in the problems folder. With the design choice of placing the solver at the root level of a Python module, there is a conscious decision of avoiding object oriented classes. However, remembering that everything in Python is an object, we still, as will be shown, make heavy use of overloading Python objects (functions, variables).

The fractional step module NSfracStep.py is merely one hundred lines of code (excluding comments and spaces) dedicated to allocation of necessary storage and variables, plus the implementation of the generic fractional step Algorithm 1. The first half of NSfracStep.py is shown in Fig. 2. Except for the fact that most details are kept in submodules, the design is very similar to most FEniCS Python demos, and, as such, Oasis should feel familiar and be quite easily accessible to new users with some FEniCS experience.

Consider the three functions towards the end of Fig. 2 that take **vars() as argument. The body_force function returns f in (1) and should thus by default return a Constant vector of zero values (length 2 or 3 depending on whether the problem is 2D or 3D). The initialize function initializes the solution in q_, q_1, q_2, and create_bcs must return a dictionary of boundary conditions. These functions are clearly problem specific and thus default implementations are found in the problems/__init__.py module that all new problems are required to import from. The default functions may then be overloaded as required by the user in the new problem module (see, e.g., Fig. 5). An interesting feature is the argument **vars(), which is used for all three functions. The Python built-in function vars() returns a dictionary of the current module's namespace, i.e., it is here NSfracStep's namespace containing V, Q, u, v, and all the other variables seen in Fig. 2. When **vars() is used in a function's signature, any variable declared within NSfracStep's namespace may be unpacked in that function's list of arguments and accessed by reference. Fig. 3 illustrates this nicely through the default implementations (found in problems/__init__.py) of the three previously mentioned functions.
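The mechanism is plain Python and can be demonstrated in isolation. The toy module below uses generic names, not Oasis's, and simply mimics the pattern:

V, Q, nu = 'V-space', 'Q-space', 0.001       # stand-ins for NSfracStep variables

def create_bcs(V, nu, **NS_namespace):
    # V and nu are unpacked from the caller's namespace by keyword;
    # every other entry is swallowed (and ignored) by **NS_namespace.
    print(V, nu)
    return {'u0': [], 'u1': [], 'p': []}

bcs = create_bcs(**vars())                   # vars() is this module's namespace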
After initialization the solution needs to be advanced in time. The entire implementation of the time integration performed in NSfracStep.py is shown in Fig. 4, which closely resembles Algorithm 1. In Fig. 4 the functions ending in hook are imported through the problems submodule, save_solution from common, and the rest of the functions are imported from the solvers submodule.

The common submodule basically contains routines for parsing the command line and for storing and retrieving the solution (common/io.py). There is, for example, a routine here that can be used if the solver needs to be restarted from a previous simulation. The problems and solvers submodules are more elaborate and will be described next.

The problems submodule

Oasis is a programmable solver and the user is required to implement the problem that is to be solved. The implemented problem module's namespace must include at least a computational mesh and functions for specifying boundary conditions and initialization of the solution. Other than that, the user may interact with NSfracStep through certain hook functions strategically placed within the time advancement loop, as seen in Fig. 4, and as such there is no need to modify NSfracStep itself.
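As an illustration, a problem module could override two of the no-op defaults along these lines. The hook names follow the naming convention described above, but the bodies here are made up for demonstration:

def start_timestep_hook(t, **NS_namespace):
    pass                                   # called before each new time step

def temporal_hook(tstep, u_, p_, **NS_namespace):
    if tstep % 100 == 0:                   # e.g., lightweight runtime monitoring
        print('Completed time step', tstep)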
Fig. 2. The opening section of NSfracStep.py. Allocation of necessary storage and parameters for solving the momentum equation through its segregated components. Note that a mesh, some parameters (e.g., viscosity, time step, end time) and some functions (e.g., body force, boundary conditions or initialization of the solution) must be imported from the problem module. The UFL function as_vector creates vectors (u_, u_1, u_2) from the segregated velocity components. The built-in function vars() returns the current module's namespace. Neglecting scalar components, the list sys_comp = ["u0", "u1", "p"] for 2D and ["u0", "u1", "u2", "p"] for 3D problems. The list is used as keys for the dictionary bcs.

Consider a lid driven cavity with Ω = [0, 1] × [0, 1]. The velocity boundary conditions are u = (1, 0) for the top lid (y = 1) and zero for the remaining walls. We start the simulations from a fluid at rest and advance the solution in time steps of △t = 0.001 from t = 0 to t = 1. The viscosity is set to ν = 0.001. This problem can be implemented as shown in Fig. 5. Here we have made use of the standard Python package numpy and two dolfin classes, UnitSquareMesh and DirichletBC. UnitSquareMesh creates a computational mesh on the unit square, whereas DirichletBC creates Dirichlet boundary conditions for certain segments of the boundary, identified through two strings noslip and top (x[0] and x[1] represent coordinates x and y respectively). A default set of problem parameters can be found in the dictionary NS_parameters declared in problems/__init__.py, and all these parameters may be overloaded, either as shown in Fig. 5, or through the command line. A comprehensive list of parameters and their use is given in the user manual. We use preconditioned iterative Krylov solvers (NS_parameters["use_krylov_solvers"] = True), and not the default direct solvers based on LU decomposition, since the former here are faster and require less memory (the exact choice of iterative solvers is discussed further in Section 7). Note that FEniCS interfaces to a wide range of different linear algebra solvers and preconditioners. The iterative solvers used by Oasis are defined in the function get_solvers imported from the solvers submodule.
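A condensed sketch of what Fig. 5 implements is given below. The boundary strings and parameter values follow the description above, but exact names and signatures are assumptions and may deviate from the released Drivencavity.py:

from problems import *          # pulls in defaults and NS_parameters

# Overload default parameters declared in problems/__init__.py
NS_parameters.update(nu=0.001, T=1.0, dt=0.001, Nx=50, Ny=50,
                     use_krylov_solvers=True)

def mesh(Nx, Ny, **params):     # Nx, Ny may be overloaded on the command line
    return UnitSquareMesh(Nx, Ny)

noslip = "std::abs(x[0]*x[1]*(1 - x[0])) < 1e-8"   # walls (illustrative string)
top = "std::abs(x[1] - 1) < 1e-8"                  # moving lid at y = 1

def create_bcs(V, **NS_namespace):
    bc0 = DirichletBC(V, 0, noslip)
    bc1 = DirichletBC(V, 1, top)    # u = 1 on the lid
    bc2 = DirichletBC(V, 0, top)    # v = 0 on the lid
    return dict(u0=[bc1, bc0], u1=[bc2, bc0], p=[])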
Fig. 3. Default implementations of three of the functions found in problems/__init__.py.

Fig. 5. Drivencavity.py—Implementation of the driven cavity problem.

To run the solver for the driven cavity problem we need to specify this through the command line—along with any other parameter we wish to modify at runtime. For example, the default size of the computational mesh has been implemented in Fig. 5 as Nx=Ny=50. This may be overloaded through the command line while running the solver, like

>>> python NSfracStep.py problem=DrivenCavity Nx=20 Ny=20

The ability to overload parameters through the command line is useful for, e.g., fast convergence testing.

The computational mesh has to be part of the problem module's namespace. However, it does not need to be defined as a callable function, like that used in Fig. 5. Three equally valid examples are

mesh = UnitSquareMesh(10, 10)

mesh = Mesh("SomeMesh.xml.gz")

def mesh(N, **params):
    return UnitSquareMesh(N, N)

The first mesh is hardcoded in the module and cannot be modified through the command line. The second approach, mesh = Mesh("SomeMesh.xml.gz"), is usually used whenever the mesh has been created by an external software. The third option uses a callable function, making it possible to modify the mesh size through the command line.

A complete list of all default functions and parameters that may be overloaded by the user in their implemented problem module is found in problems/__init__.py.

Fig. 4. Time loop in NSfracStep.py.

The solvers submodule

The finer details of the fractional step solver are implemented in the solvers submodule. A list of all functions that are imported by NSfracStep is found in the solvers/__init__.py module.
The most important can be seen in Fig. 4. Note the special calling routine for the function setup

vars().update(setup(**vars()))

The purpose of this setup function is to prepare the solver for time advancement. This could mean either defining UFL forms of the variational problems (see Fig. 6) or preassembling matrices that do not change in time, e.g., diffusion and mass matrices (see Section 5). The setup function returns a dictionary and this dictionary is updated and made part of the NSfracStep namespace through the use of vars().update.

We may now take the naive approach and implement all variational forms exactly as described in Section 3. A smart approach, on the other hand, will take advantage of certain special features of the Navier–Stokes equations. The starting point for implementing a new solver, though, will usually be the naive approach. A naive implementation requires very few lines of code, it is easy to debug, and as such it can be very useful for verification of the slightly more complex and optimized solvers to be discussed in the next section.

The solvers/IPCS.py module contains a naive implementation of the variational forms (11)–(13). The forms are implemented using the setup function shown in Fig. 6. Dictionaries are used to hold the forms for the velocity components, whereas there is only one form required for the pressure. Note the very close correspondence between the high-level Python code and the mathematical description of the variational forms. The variational forms are assembled and solved through the very compact routines velocity_tentative_solve, pressure_solve and velocity_update, which are implemented as shown in Fig. 7. The remaining default functions are left to do nothing, as implemented already in solvers/__init__.py, and as such these four functions shown in Figs. 6 and 7 are all it takes to complete the implementation of the naive incremental pressure correction solver. Note that this implementation works for any order of the velocity/pressure function spaces. There is simply no additional implementation cost for using higher order elements.

Fig. 6. Naive implementation in solvers/IPCS.py of variational forms used for solving the momentum equation (11), pressure correction (12) and momentum update (13).

Fig. 7. Implementation in solvers/IPCS.py of routines called in Fig. 4.
that are implemented as shown in Fig. 7. The remaining de- the bilinear terms of the variational form (11)
fault functions are left to do nothing, as implemented already in
solvers/__init__.py, and as such these 4 functions shown in Figs. 6  Nu  
φj φi dx Ukj ,I ,

and 7 are all it takes to complete the implementation of the naive uIk v dx = (16)
incremental pressure correction solver. Note that this implemen- Ω j =1 Ω

tation works for any order of the velocity/pressure function spaces.  Nu  


∇φj · ∇φi dx Ukj ,I .

There is simply no additional implementation cost for using higher ∇ uIk · ∇v dx = (17)
order elements. Ω j =1 Ω
Each term inside the parenthesis on the right hand side represents a matrix

M_ij = ∫_Ω φ_j φ_i dx,   (18)

K_ij = ∫_Ω ∇φ_j · ∇φ_i dx.   (19)

The two matrices are independent of time and can be preassembled once through (u, v are trial and test functions respectively)

M = assemble(inner(u, v)*dx)
K = assemble(inner(grad(u), grad(v))*dx)

Note that the solution vectors and matrices represent the major cost in terms of memory use for the solver. The matrices are sparse and allocated by the linear algebra backend, using appropriate wrappers that are hidden to the user. The allocation takes place just once, when the matrices/vectors are created.

The nonlinear convection form contains the evolving solution and requires special attention. We use the implicit convection form given in Eq. (9) and write out the implicit Crank–Nicolson convected velocity for component k

u̅ · ∇ũ_k = (1/2) u̅ · ∇( u_k^I + u_k^{n−1} ).   (20)

Inserting for the algebraic form of the finite element trial and test functions, the variational form for the bilinear convection term becomes

∫_Ω u̅ · ∇u_k^I v dx = Σ_{j=1}^{N_u} ( ∫_Ω u̅ · ∇φ_j φ_i dx ) U_j^{k,I},   (21)

where u̅ = 1.5 u^{n−1} − 0.5 u^{n−2}. The convection matrix can be recognized as the term inside the parenthesis

C_ij = ∫_Ω u̅ · ∇φ_j φ_i dx.   (22)

The convecting velocity is time-dependent and interpolated at t^{n−1/2}. As such, the convection matrix is also evaluated at n − 1/2 and needs to be reassembled each time step. To simplify notations, though, we have for the rest of this paper omitted the time notation on C_ij. The assembly of the C_ij matrix is prepared in the setup function:

# Defined in setup
u_ab = as_vector([Function(V) for i in range(len(u_components))])
aconv = inner(v, dot(grad(u), u_ab))*dx

where u_ab is used as a container for the convecting velocity u̅. Note that u_ab is assembled (see Fig. 8) before assembling the matrix C_ij, because this leads to code that is a factor 2 faster than simply using a form based on the velocity functions at the two previous time levels directly (i.e., aconv = inner(v, dot(grad(u), 1.5*u_1 - 0.5*u_2))*dx).

Consider now the linear terms, where the known solution function is written as u_k^{n−1} = Σ_{j=1}^{N_u} U_j^{k,n−1} φ_j, where U_j^{k,n−1} are the known coefficients of velocity component k at the previous time step t^{n−1}. We have the following linear terms in Eq. (11)

∫_Ω u_k^{n−1} v dx = M_ij U_j^{k,n−1},   (23)

∫_Ω ∇u_k^{n−1} · ∇v dx = K_ij U_j^{k,n−1},   (24)

∫_Ω u̅ · ∇u_k^{n−1} v dx = C_ij U_j^{k,n−1},   (25)

which are all very quickly computed using simple matrix-vector products.

We may now reformulate our variational problem on the algebraic level using the three assembled matrices. It is required that for each test function v = φ_i, i = 1, . . . , N_u, the following equations must hold

M_ij (U_j^{k,I} − U_j^{k,n−1})/△t + C_ij (U_j^{k,I} + U_j^{k,n−1})/2 + ν K_ij (U_j^{k,I} + U_j^{k,n−1})/2 = Φ_i^{k,n−1/2},   (26)

where

Φ_i^{k,n−1/2} = ∫_Ω ( −∇_k p* + f_k^{n−1/2} ) φ_i dx.   (27)

If separated into bilinear and linear terms, the following system of algebraic equations is obtained

( M_ij/△t + C_ij/2 + ν K_ij/2 ) U_j^{k,I} = ( M_ij/△t − C_ij/2 − ν K_ij/2 ) U_j^{k,n−1} + Φ_i^{k,n−1/2}.   (28)

If now A_ij = M_ij/△t + C_ij/2 + ν K_ij/2 is used as the final coefficient matrix, then the equation may be written as

A_ij U_j^{k,I} = ( 2M_ij/△t − A_ij ) U_j^{k,n−1} + Φ_i^{k,n−1/2},   (29)

or

A_ij U_j^{k,I} = b_i^{k,n−1/2}   for k = 1, . . . , d,   (30)

where b_i^{k,n−1/2} is the right hand side of (29). Note that the same coefficient matrix is used by all velocity components, even when there are Dirichlet boundary conditions applied.

An efficient algorithm (Algorithm 2) can now be designed to assemble both large parts of the right hand side and the left hand side of Eq. (30) at the same time.

    Assemble A_ij ← C_ij
    A_ij = M_ij/△t − A_ij/2 − ν K_ij/2
    b_i^{k,n−1/2} = f_i^{k,n−1/2} + A_ij U_j^{k,n−1},  for k = 1, . . . , d
    A_ij = −A_ij + 2M_ij/△t

Algorithm 2: Efficient algorithm for assembling the coefficient matrix A_ij, where most of the right hand side of Eq. (30) is assembled in an intermediate step.

Algorithm 2 is implemented as shown in Fig. 8. At the end of this algorithm, most of b^{k,n−1/2} (except for the pressure gradient) has been assembled and the coefficient matrix A_ij is ready to be used in Eq. (30). The convection matrix needs to be reassembled each new time step, but only on the first inner velocity-pressure iteration, since u̅ only contains old and known velocities, not the new u_k^n. For this reason the code in Fig. 8 is placed inside assemble_first_inner_iter, called in Fig. 4. Notice that there is no separate matrix used for 2M_ij/△t − A_ij or C_ij, and the total memory cost of the algorithm is exactly three individual sparse matrices (A_ij, M_ij and K_ij). The sparsity pattern of the matrices is computed on the first assemble and the matrix axpy operations take advantage of the fact that all these matrices share the same pattern.

Fig. 8. Inside assemble_first_inner_iter. Fast assembly of the coefficient matrix and parts of the right hand side vector. A temporary rhs vector b_tmp is used for each velocity component since this routine is called only on the first inner iteration. x_1 is the vector of degrees of freedom at t^{n−1}.
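In dolfin 1.3 terms, the core of Fig. 8 can be sketched as follows. A, M, K, b_tmp, x_1 and aconv are the objects introduced above; the exact calls are an assumption based on the text, not a copy of the Oasis source.

A = assemble(aconv, tensor=A, reset_sparsity=False)  # A <- C, reusing sparsity
A *= -0.5                                            # A <- -C/2
A.axpy(1.0/dt, M, True)                              # A <- M/dt - C/2
A.axpy(-0.5*nu, K, True)                             # A <- M/dt - C/2 - nu K/2
for ui in u_components:                              # rhs by matrix-vector products
    b_tmp[ui].zero()
    b_tmp[ui].axpy(1.0, A * x_1[ui])                 # b <- A U^{n-1}, cf. Eq. (29)
A *= -1.0                                            # A <- -(M/dt - C/2 - nu K/2)
A.axpy(2.0/dt, M, True)                              # A <- M/dt + C/2 + nu K/2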
The linear term Φ_i^{k,n−1/2} needs some further comments. Neglecting the constant forcing, f_k^{n−1/2}, the second part of Φ_i^{k,n−1/2} is

−∫_Ω ∇_k p* φ_i dx,   (31)

where p* = Σ_{j=1}^{N_p} P_j^* φ̂_j, φ̂_j is the basis function for the pressure and P_j^* are the known degrees of freedom. On algebraic form we get

∫_Ω ∇_k p* φ_i dx = Σ_{j=1}^{N_p} ( ∫_Ω ∇_k φ̂_j φ_i dx ) P_j^* = dP_ij^k P_j^*,   (32)

where dP_ij^k for k = 1, . . . , d are d matrices that are constant in time. Since the matrices can be preassembled, the computation of Φ_i^{k,n−1/2} through a matrix-vector product is very fast. Unfortunately, though, three additional matrices require storage (in 3D), which may be too expensive. In that case there is a parameter in Oasis that can be used. Setting

NS_parameters["low_memory_version"] = False

enables the creation of the matrices dP_ij^k. If disabled, the term is computed simply through

assemble(inner(p_.dx(k), v)*dx)

for k = 0, . . . , d − 1. The pressure gradient is added to b^k in velocity_tentative_assemble and not in Fig. 8, since the pressure is modified on inner iterations.

The pressure correction equation can also be optimized on the algebraic level. Using trial function p^{n−1/2} = Σ_{j=1}^{N_p} P_j^{n−1/2} φ̂_j and test function q = φ̂_i we can write (12) for each test function

K̂_ij P_j^{n−1/2} = K̂_ij P_j^* − ∫_Ω (∇ · u^I/△t) φ̂_i dx.   (33)

The Laplacian matrix K̂_ij can be preassembled. If the pressure function space is the same as the velocity function space, then K̂_ij = K_ij and no additional work is required. The divergence term may be computed as

∫_Ω (∇ · u^I/△t) φ̂_i dx = (1/△t) Σ_{k=1}^{d} Σ_{j=1}^{N_u} ( ∫_Ω ∇_k φ_j φ̂_i dx ) U_j^{k,I} = (1/△t) Σ_{k=1}^{d} dU_ij^k U_j^{k,I},   (34)

where the matrices dU_ij^k for k = 1, . . . , d can be preassembled. Again, the cost is three additional sparse matrices, unless the function spaces of pressure and velocity are the same. In that case dU_ij^k = dP_ij^k and memory can be saved. If the low_memory_version is chosen, then we simply use the slower finite element assembly

assemble((1/dt)*div(u_)*q*dx)

The final step for the fractional step solver is the velocity update, which can be written for component k as

M_ij U_j^{k,n} = M_ij U_j^{k,I} − △t dP_ij^k P_j^{n−1/2},   (35)

where U_j^{k,I} and P_j^{n−1/2} now are the known degrees of freedom of the tentative velocity and pressure respectively, whereas U_j^{k,n} represent the unknowns. The velocity update requires a linear algebra Krylov or direct solve and as such it is quite expensive even though the equation is cheap to assemble. For this reason the velocity update has an additional option to use either a weighted gradient matrix³ G_ij^k or lumping of the mass matrix, which allows the update to be performed directly

U_i^{k,n} = U_i^{k,I} − △t G_ij^k P_j^{n−1/2}   for i = 1, . . . , N_u.   (36)

The parameter used to enable the direct approach is NS_parameters["velocity_update_type"], which can be set to "gradient_matrix" or "lumping".

³ Requires the fenicstools [20] package.
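As an illustration of the lumping option, the direct update (36) can be realized without a linear solve roughly as follows. dP, x_, p_ and u_components are the names used in the text above; this is a sketch, not the Oasis source.

ones = Function(V)
ones.vector()[:] = 1.0
ML = (M * ones.vector()).array()            # lumped mass matrix: row sums of M
for k, ui in enumerate(u_components):
    # U^n = U^I - dt * (dP^k P^{n-1/2}) / ML, cf. Eq. (36)
    unew = x_[ui].array() - dt*(dP[k]*p_.vector()).array()/ML
    x_[ui].set_local(unew)
    x_[ui].apply('insert')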
6. Verification of implementation

The fractional step algorithm implemented in NSfracStep is targeting transient flows in large-scale applications, with turbulent as well as laminar or transitional flow. It is not intended to be used as a steady state solver.⁴ Oasis has previously been used to study, e.g., blood flow in highly complex intracranial aneurysms [21,22], where the results compare very well with, e.g., the spectral element code NEKTAR [23]. Simulations by Steinman and Valen-Sendstad [22] are also commented on by Ventikos [24], who states this is ''the right way to do it''—referring to the need for highly resolved CFD simulations of transitional blood flow in aneurysms. Considering the end use of the solver in biomedical applications and research, it is essential that we establish the accuracy as well as the efficiency of the solver.

⁴ As of 14 Sep 2014 Oasis ships with a coupled steady state solver for this purpose.

6.1. 2D Taylor–Green flow

Two dimensional Taylor–Green flow is one of very few non-trivial analytical and transient solutions to the Navier–Stokes equations. For this reason it is often used for verification of computer codes. The implementation can be found in Oasis/problems/TaylorGreen.py and the Taylor–Green solution reads

u_e = ( −sin(πy) cos(πx) exp(−2π²νt),   (37)
        sin(πx) cos(πy) exp(−2π²νt) ),   (38)

p_e = −(1/4) ( cos(2πx) + cos(2πy) ) exp(−4π²νt),   (39)

on the doubly periodic domain (x, y) = [0, 2] × [0, 2]. The analytical solution is used to initialize the solver and to compute the norms of the error, i.e., ∥u − u_e∥_h and ∥p − p_e∥_h, where ∥·∥_h represents an L2 error norm. The mesh consists entirely of right triangles and is uniform in both spatial directions. The mesh size h is computed as two times the circumradius of a triangle. The kinematic viscosity is set to ν = 0.01 and time is integrated for t = [0, 1] with a short time step (△t = 0.001) to practically eliminate temporal integration errors. The solver is run for a range of mesh sizes and the order of convergence is shown in Table 1. The velocity is either piecewise quadratic (P2) or piecewise linear (P1), whereas the pressure is always piecewise linear. The P2P1 solver achieves fourth order accuracy in velocity and second order in pressure, whereas the P1P1 solver is second order accurate in both. Note that the fourth order in velocity is due to superconvergence [25] and it will drop to three for a mesh that is not regularly sized and aligned with the coordinate axes. The order of the L2 error (k) is computed by comparing the error norms of two consecutive discretization levels i and i − 1, and assuming that the error can be written as E_i = C h_i^k, where C is an arbitrary constant. Comparing E_i = C h_i^k and E_{i−1} = C h_{i−1}^k we can isolate k = ln(E_i/E_{i−1})/ln(h_i/h_{i−1}).

Table 1
Taylor–Green flow convergence errors O(h^k), where h and k are mesh size and order of convergence respectively. ∥·∥_h represents an L2 norm. The velocity is either quadratic (P2) or linear (P1), whereas the pressure is always linear (P1).

P2P1
h          ∥u − u_e∥_h   k      ∥p − p_e∥_h   k
2.83E−01   2.14E−02      –      1.81E−02      –
1.41E−01   1.44E−03      3.89   5.49E−03      1.72
9.43E−02   2.84E−04      4.01   2.46E−03      1.97
7.07E−02   8.94E−05      4.01   1.39E−03      2.00
5.66E−02   3.65E−05      4.01   8.88E−04      2.00

P1P1
h          ∥u − u_e∥_h   k      ∥p − p_e∥_h   k
2.83E−01   9.31E−03      –      4.97E−03      –
1.41E−01   2.36E−03      1.98   1.55E−03      1.68
9.43E−02   1.06E−03      1.98   7.12E−04      1.92
7.07E−02   5.98E−04      1.99   4.05E−04      1.97
5.66E−02   3.83E−04      1.99   2.60E−04      1.98

To verify the convergence of the transient fractional step scheme, we isolate temporal errors by practically eliminating spatial discretization errors through the use of high order P4 and P3 elements for velocity and pressure respectively. The solver is then run for a range of time step sizes for t = [0, 6]. The error norms at the end of the runs are shown in Table 2, indicating that both pressure and velocity achieve second order accuracy in time. Note that in Table 2, the order of the error is computed from E_i = C dt_i^k, where dt_i is the time step used at level i.

Table 2
Taylor–Green flow convergence errors O(dt^k), where dt and k are time step and order of convergence respectively. The velocity uses Lagrange elements of degree four (P4), whereas the pressure uses third degree (P3).

P4P3
dt         ∥u − u_e∥_h   k      ∥p − p_e∥_h   k
5.00E−01   5.08E−01      –      1.29E+00      –
2.50E−01   1.36E−01      1.91   2.97E−01      2.11
1.25E−01   3.42E−02      1.99   7.12E−02      2.06
6.25E−02   8.62E−03      1.99   1.77E−02      2.01
3.12E−02   2.17E−03      1.99   4.41E−03      2.00
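The convergence orders in Tables 1 and 2 follow directly from the formula k = ln(E_i/E_{i−1})/ln(h_i/h_{i−1}) above, which is simple enough to verify in a few lines of Python:

from math import log

def convergence_order(E, h):
    # observed order between consecutive refinement levels
    return [log(E[i]/E[i-1]) / log(h[i]/h[i-1]) for i in range(1, len(E))]

# First rows of the P2P1 velocity column in Table 1:
print(convergence_order([2.14e-2, 1.44e-3, 2.84e-4],
                        [2.83e-1, 1.41e-1, 9.43e-2]))
# -> approximately [3.89, 4.01]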
6.2. Turbulent channel flow

The second test case is a direct numerical simulation⁵ of turbulent, fully developed, plane channel flow. The flow is bounded between two parallel planes located at y = ±1 and is periodic in the x and z directions. The flow is driven by an applied constant pressure gradient (forcing) in the x-direction. This flow has been studied extensively with numerous CFD codes, often using spectral accuracy, since it is of primary importance to capture the rate of dissipation of turbulent kinetic energy, allowing no (or very little) numerical diffusion. To verify our implementation we will here attempt to reproduce the classical simulations of Moser, Kim and Mansour (MKM, [26]) for Re_τ = 180, based on the wall friction velocity u_τ = √(ν ∂u/∂y|_wall). The computational box is of size L_x = 4π, L_y = 2 and L_z = 4π/3. The resolution of MKM was a box of size 128³, uniform in the x and z-directions and skewed towards the walls using Chebyshev points in the y-direction. In this test we use one under-resolved box of size 64³ and one of the same size as MKM to show convergence towards the correct solution. Since each hexahedron is further divided into 6 tetrahedrons, this corresponds to 6 · 64³ and 6 · 128³ finite elements.⁶ MKM performed their simulations using spectral accuracy with Fourier representation in the periodic directions and a Chebyshev-tau formulation in the y-direction. Here we use piecewise linear Lagrange elements (P1P1) of second order accuracy. The creation of the mesh and boundary conditions in module problems/Channel.py is shown in Fig. 9.

The sampling of statistics is performed using routines from the fenicstools [20] package and is not described in detail here. Reference is given to the complete source code in problems/Channel.py in the Oasis repository. Fig. 10 shows the statistically converged mean velocity in the x-direction across the channel, normalized by u_τ. The black curve shows the spectral solution of MKM. The dashed and dotted curves show, respectively, the Oasis solutions using 6 · 64³ and 6 · 128³ computational cells. The coarse solution represents an underresolved simulation where the sharpest velocity gradients cannot be captured. The total amount of dissipation within the flow is thus underpredicted and the mean predicted velocity is consequently higher than it should be. This result is in agreement with the understanding of underresolved Large Eddy Simulations (LES) of turbulent flows, which in effect add viscosity to the large resolved scales to counteract the lack of dissipation from the unresolved small scales. Hence, simply by increasing the kinematic viscosity, the predicted mean flow could be forced closer to the spectral DNS solution seen in Fig. 10. Another option is, of course, to refine the mesh and thereby resolve the smallest scales. As expected, we see in Fig. 10 that the 6 · 128³ simulations are in much closer agreement with the spectral DNS. There is still a slight mismatch, though, that should be attributed to the lower order of the Oasis solver, incapable of capturing all the finest scales of turbulence. It is worth mentioning that the Galerkin finite element method used by Oasis contains no, or very little, numerical diffusion.

⁵ A direct numerical simulation indicates a simulation where all scales of turbulence have been resolved.
⁶ Due to the two periodic directions, the number of degrees of freedom for the fine mesh is 128 · 129 · 128 for each velocity component and the pressure, which is the same as used by MKM.
A dissipative solver, like, e.g., a finite volume solver using an upwind scheme or a monotonically integrated implicit LES [27], would have the same effect as an LES model that adds viscosity and as such could lead to coarse simulations with mean velocity profiles closer to MKM.

Fig. 11 shows the normal, non-dimensionalized, Reynolds stresses. The results confirm that the underresolved stresses are underpredicted close to the wall, whereas the fine simulations converge towards the spectral MKM results.

Fig. 9. Implementation of the Channel problem.

Fig. 10. Mean velocity in the x-direction normalized by u_τ as a function of scaled distance to the wall y+. Dotted and dashed curves are computed with Oasis using respectively 6 · 64³ and 6 · 128³ computational cells. The solid line is the reference solution from MKM.

Fig. 11. Normal Reynolds stresses scaled by u_τ² shown as functions of scaled distance to the wall y+. Dotted and dashed curves are computed with Oasis using respectively 6 · 64³ and 6 · 128³ computational cells. The solid lines are from the reference solution of MKM. The three different profiles represent, in decreasing magnitude, the normal stresses uu+, ww+ and vv+, where u, v and w are velocity fluctuations in the x, y and z directions respectively.

The channel simulations do not require more computational power than can be provided by a relatively new laptop computer. However, since these simulations are run for more than 30,000 time steps to sample statistics, we have performed parallel computations on the Abel supercomputer at the University of Oslo. The simulations scale weakly when approximately 200,000 elements are used per CPU, and thus we have run our simulations using 8 CPUs for the coarse mesh and 64 for the fine, which contains 12.5 million tetrahedrons. Simulations for the fine grid take approximately 1.5–1.7 s real time per computational time step, depending on traffic and distribution on Abel (20%–25% lower for the coarse simulations), and thus close to 12 h for the entire test (30,000 time steps). Approximately 75% of the computing time is spent in the linear algebra backend's iterative Krylov solvers, and assembling of the coefficient matrix, as detailed in Section 5, is responsible for most of the remaining time. The backend (here PETSc) and the iterative linear algebra solvers are thus key to performance. For the tentative velocity computations we have used a stabilized version of a biconjugate gradient squared solver [28] with a very cheap (fast, low memory) Jacobi preconditioner, imported from the method get_solvers, where it is specified as KrylovSolver('bicgstab', 'jacobi'). This choice is justified since the tentative velocity coefficient matrix is diagonally dominant due to the short time steps, and each solve requires approximately 10 iterations to converge (the same number for coarse and fine). The pressure coefficient matrix represents a symmetric and elliptic system and thus we choose a solver based on minimal residuals [29] and the hypre [30] algebraic multigrid preconditioner (KrylovSolver('minres', 'hypre_amg')). The pressure solver uses an average of 6 iterations (same for coarse and fine) to converge to a given tolerance. The velocity update is computed using a lumped mass matrix and no linear algebra solver is thus required for this final step.
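In code, these choices amount to roughly the following inside get_solvers. This is a sketch; the tolerances shown are placeholders, not the values used for the reported runs.

u_sol = KrylovSolver('bicgstab', 'jacobi')      # tentative velocity solves
u_sol.parameters['relative_tolerance'] = 1e-8   # placeholder tolerance
p_sol = KrylovSolver('minres', 'hypre_amg')     # pressure correction solves
p_sol.parameters['relative_tolerance'] = 1e-8
du_sol = None   # lumped-mass velocity update needs no Krylov solver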

7. Concluding notes on performance

The computational speed of any implicit, large-scale Navier–Stokes solver is determined by many competing factors, but most likely it will be limited by hardware and by routines for setting up (assembly) and solving for its linear algebra subsystems. In Oasis, and many comparable Navier–Stokes solvers, the linear algebra is performed through routines provided by a backend (here PETSc) and is thus arguably beyond our control. Accepting that we cannot do better than the limits imposed by hardware and the backend, the best we can really hope for through high-level implementations is to eliminate the cost of assembly. In Oasis we take all conceivable measures to do just this, as well as even reducing the number of required linear algebra solves. As mentioned in the previous section, for the turbulent channel case with 12.5 million tetrahedrons, 75% of the computational time was found to be spent inside very efficient Krylov solvers, and we are thus, arguably, pushing at the very boundaries of what may be achieved by a solver developed with similar numerical schemes, using the same backend.
To further support this claim, without making a complete comparison in terms of accuracy, we have also set up and tested the channel simulations on Abel for two low-level, second order accurate, semi-implicit, fractional step solvers, OpenFOAM [18] and CDP [31], which both are targeting high performance on massively parallel clusters. We used channelFoam, distributed with OpenFOAM version 2.2.1 [32], and version 2.5.0 of CDP (requires license). The channelFoam LES solver was modified slightly to run with constant viscosity, and parameters were set to match the finest channel simulations using 128³ hexahedral cells. OpenFOAM used for the tentative velocity a biconjugate gradient [28] solver with a diagonal incomplete-LU preconditioner. For the pressure a conjugate gradient solver was used with a diagonal incomplete Cholesky preconditioner. The CDP solver was set up with the same hexahedral mesh as OpenFOAM, using no model for the LES subgrid viscosity. The linear solvers used by CDP were very similar to those used by Oasis, with a Jacobi preconditioned biconjugate gradient solver for the tentative velocity and a generalized minimum residual method [33] with the hypre algebraic multigrid preconditioner. Depending on traffic on Abel, both CDP and channelFoam required approximately 1.4–1.7 s real time per time step, which is very close to the speed obtained by Oasis. For both CDP and OpenFOAM the speed was strongly dominated by the Krylov solvers and both showed the same type of weak scaling as Oasis on the Abel supercomputer.

Acknowledgment

This work has been supported by a Center of Excellence grant from the Research Council of Norway to the Center for Biomedical Computing at Simula Research Laboratory.

References

[1] VMTK—The Vascular Modeling Toolkit. URL: http://www.vmtk.org.
[2] Gmsh: a three-dimensional finite element mesh generator with built-in pre- and post-processing facilities. URL: http://www.geuz.org/gmsh/.
[3] The CUBIT geometry and mesh generation toolkit. URL: https://cubit.sandia.gov.
[4] S. Balay, et al., PETSc Web page, http://www.mcs.anl.gov/petsc (2013).
[5] M.A. Heroux, et al., An overview of the Trilinos project, ACM Trans. Math. Software 31 (3) (2005) 397–423.
[6] OpenFVM. URL: http://openfvm.sourceforge.net/.
[7] Fluidity. URL: imperial.ac.uk/earthscienceandengineering.
[8] OOFEM—object oriented finite element solver. URL: http://www.oofem.org/en/oofem.html.
[9] FEniCS. URL: http://fenicsproject.org.
[10] M.S. Alnæs, A. Logg, K.B. Ølgaard, M.E. Rognes, G.N. Wells, Unified Form Language: A domain-specific language for weak formulations of partial differential equations, ACM Trans. Math. Software 40 (2) (2014).
[11] R.C. Kirby, A. Logg, A compiler for variational forms, ACM Trans. Math. Software 32 (3) (2006) 417–444. http://dx.doi.org/10.1145/1163641.1163644.
[12] FEniCS tutorial. URL: http://fenicsproject.org/documentation/tutorial.
[13] M.A. Christon, P.M. Gresho, S.B. Sutton, Computational predictability of time-dependent natural convection flows in enclosures (including a benchmark solution), Internat. J. Numer. Methods Fluids (2012).
[14] J. Simo, F. Armero, Unconditional stability and long-term behavior of transient algorithms for the incompressible Navier–Stokes and Euler equations, Comput. Methods Appl. Mech. Engrg. 111 (1994) 111–154.
[15] R.I. Issa, Solution of the implicitly discretized fluid flow equations by operator-splitting, J. Comput. Phys. 62 (1985) 40–65.
[16] Ansys-Fluent. URL: www.ansys.com.
[17] Star-CD. URL: http://www.cd-adapco.com/products/star-cd.
[18] OpenFOAM—The open source CFD toolbox. URL: www.openfoam.com.
[19] Oasis user manual. URL: https://github.com/mikaem/Oasis/blob/master/doc/usermanual.pdf.
[20] fenicstools. URL: https://github.com/mikaem/fenicstools.
[21] D.A. Steinman, et al., Variability of computational fluid dynamics solutions for pressure and flow in a giant aneurysm: The ASME 2012 Summer Bioengineering Conference CFD Challenge, J. Biomech. Eng. 135 (2) (2013).
[22] K. Valen-Sendstad, D.A. Steinman, Mind the gap: Impact of computational fluid dynamics solution strategy on prediction of intracranial aneurysm hemodynamics and rupture status indicators, Am. J. Neuroradiol. 35 (3) (2014) 536–543.
[23] NEKTAR. URL: http://wwwf.imperial.ac.uk/ssherw/spectralhp/nektar.
[24] Y. Ventikos, Resolving the issue of resolution (2014).
[25] F. Guillén-González, G. Tierra, Superconvergence in velocity and pressure for the 3D time-dependent Navier–Stokes equations, SeMA J. 57 (1) (2012) 49–67.
[26] R.D. Moser, J. Kim, N.N. Mansour, Direct numerical simulation of turbulent channel flow up to Re_τ = 590, Phys. Fluids 11 (1999) 943–945.
[27] C. Fureby, F.F. Grinstein, Monotonically integrated large eddy simulation of free shear flows, AIAA J. 37 (5) (1999) 544–556.
[28] H. van der Vorst, Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 13 (2) (1992) 631–644.
[29] C.C. Paige, M.A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal. 12 (1975) 617–629.
[30] Hypre. URL: http://acts.nersc.gov/hypre/.
[31] CDP. URL: http://web.stanford.edu/group/cits/research/combustor/cdp.html.
[32] channelFoam. URL: github.com/OpenFOAM/OpenFOAM-2.1.x/tree/master/applications/solvers/incompressible/channelFoam.
[33] Y. Saad, M. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput. 7 (3) (1986) 856–869.
