Hokpunna, Compact Fourth Order
Fachgebiet Hydromechanik
ARPIRUK HOKPUNNA
Complete reprint of the dissertation approved by the Fakultät für Bauingenieur- und Vermessungswesen of the Technische Universität München for the award of the academic degree of
Doktor-Ingenieur.
The dissertation was submitted to the Technische Universität München on 23.09.2009 and accepted by the Fakultät für Bauingenieur- und Vermessungswesen on 10.12.2009.
Acknowledgements
I would like to thank my colleagues at the Fachgebiet Hydromechanik of the Technische Universität München and express my special appreciation to Prof. M. Manhart for supervising my research. His insightful suggestions have always been a guiding light for my work. None of this work would have been possible without his constant help and support. I would like to express my sincere gratitude to Dr.-Ing. M. Haselbauer, Dr.-Ing. F. Schwertfirm, Dipl.-Ing. N. Peller and Dipl.-Tech. math. C. Gobert. Discussions with them during coffee breaks have always been enjoyable moments and, most of the time, brought along interesting new ideas. Many thanks to Dr.-Ing. C. Raap, who always gave excellent suggestions regarding the regulations and administrative requirements of the Faculty.
There are no words that could fully express my appreciation to my parents, Prof. Wech and Aj. Wilaiwan Hokpunna. Their love, their encouragement and their thoughtful planning of my basic education have shaped and equipped me with the qualities necessary to carry out this work. Last, and by no means least, I would like to express my deep gratitude to my wife, Dr. med. Warangkana A. Hokpunna. Her companionship brings out the best in me.
Contents
Acknowledgements I
Contents V
List of figures IX
Nomenclature XIII
Zusammenfassung XVII
Abstract XIX
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Contribution of this work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Validations 31
3.1 Taylor-Green vortex flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Doubly-Periodic shear layer . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Instability of plane channel flow . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Turbulent channel flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.1 Choice of the approximation of the convective velocities . . . . . . . . 40
3.4.2 Grid dependency study . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.3 Comparison with the second-order scheme . . . . . . . . . . . . . . . 43
3.4.4 Effects of second-order solution of pressure and nonlinear correction . 44
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Literature 121
List of Tables
2.1 Square root of the $L_2$-norm of errors of the correction term ($\|N_{exact} - N_i\|_2$) over
$kh \in [0, \pi/2]$. . . . 27
3.1 Maximum error of the streamwise velocity and its convergence rate for T4 and
DF4 convective velocities using the fourth-order solution of the pressure at t = 1.2,
in the doubly periodic shear layer flow. . . . 34
3.2 Error and convergence rate of the streamwise velocity at t = 1.2 using
DF4 convective velocities and the second-order solution of the pressure ($D_2 G_2$). . . . 34
3.3 Convergence study in the y-direction of the instability of plane channel flow on a
uniform grid with Nx = 64. The table shows growth rates of perturbations
($G_p$), their errors ($\epsilon_p$) and the corresponding convergence rates ($C_a$). . . . 37
3.4 Convergence study in y-direction of the instability of plane channel flow on
nonuniform grid with Nx = 64. . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Error of the perturbation growth rate ($\epsilon_p$) relative to the growth rate on the
finest grid (4.428E-03) using the second-order approximation of the pressure. . . . 39
3.6 Specification of numerical grids used in the grid dependency study. . . . . . 40
3.7 Mean flow variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.8 CPU-seconds per time step spent in the momentum equation (M) and the enforce-
ment of continuity (P) on an Opteron 8216. . . . 44
5.1 Maximum error and convergence rate of the streamwise velocity at t = 1.2
under different projection methods. Extra digits are given for P44 and
P42. . . . 84
5.2 Global flow parameters of the turbulent channel flow under different projection
methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.1 Numerical grids used in this study and the grid spacings at the wall and the
center of the channel in terms of the channel half-width (H). . . . 92
6.2 CPU-seconds per time integration on an AMD Opteron 8216 of the sequential
fourth-order scheme with direct and iterative solvers for the pressure. . . . 94
6.3 Recommended grid resolutions for DNS and LES from [WHS07]. . . . . . . 99
6.4 Summary of turbulent channel flow simulations on fine grids. . . . 100
6.5 Summary of turbulent channel flow simulations on coarse grids. . . . 108
6.6 Scalability of the parallel compact scheme in the T180-G1 case on a commodity work-
station. . . . 110
6.7 CPU-time per time step used in T180-G1 and the scalability of the fourth-
order scheme on ALTIX 4700. . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.8 Comparison of CPU-time per time integration and the scalability of the
fourth- and the second-order schemes in T390-G2 on ALTIX 4700. . . . . . . 111
List of Figures
3.1 (a) Maximum errors of the velocity at t = 10 of the fourth-order and the
second-order scheme applied to inviscid TGV with c1 = 0 and c2 = 0. (b)
Maximum error of the fourth-order scheme applied to classical TGV(A) and
convective TGV(B). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Contour plot of vorticity from -36 to 36. Left: second-order, Right: fourth-
order (DF4). The resolutions are 64², 128² and 256² from top to bottom. . . . 35
3.3 Time evolution of the perturbation energy on: (a) two uniform grids and (b)
non-uniform grids with different stretching factors (S). All grids are computed
with Nx = 64. On the uniform grids, the resolutions in the wall-normal direction of
the two grids are Ny = 64 and 128, and the results with nonlinear correction are
denoted by 64+NC and 128+NC. The results without the nonlinear correction
are denoted by 64 and 128. . . . 38
3.4 Mean profiles of streamwise velocity along wall-normal direction on grid A (a)
and grid B (b). Averaged mass imbalance per unit volume on u-momentum
cells of T4 (c) and DF4 (d). . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5 (a) Convergence of mean streamwise velocity. (b) Mean streamwise velocity
on grid M2 compared with other two spectral solutions from [KMM87] (dashed
line) and [MKM99] (solid line). (c) Square root of Reynolds normal stresses
on grid M2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.6 Skewness factor (a) and Flatness factor (b). . . . . . . . . . . . . . . . . . . 48
3.7 One-dimensional energy spectra in spanwise (a) and streamwise (b) directions
of the streamwise velocity at z + = 178. . . . . . . . . . . . . . . . . . . . . 48
3.8 Comparison of the mean streamwise velocity profiles of second-order and
fourth-order scheme: (a) mean streamwise velocity, (b) one-dimensional spec-
tra of streamwise velocity in x-direction on grid F. . . . . . . . . . . . . . . 49
3.9 Mean streamwise velocity on grids C (a), M1 (b) and M2 (c) and the Reynolds
shear stress on grid M2 (d). The thin solid line is the full fourth-order scheme
(4M-4P-wNON). The open circles (4M-2P-wNON) are the solution using the
fourth-order scheme for the convective and diffusive terms but the second-order ap-
proximation of the divergence and the pressure gradient. The plus symbols (4M-4P-
woNON) are the full fourth-order scheme without the nonlinear correction. . . 50
4.1 Matrix Q of the interface-splitting algorithm and its N - k subdiagonal, where
c = (k - 1)m. . . . 54
4.2 Absolute speedup (S) of the multiple right-hand-side problem on a Linux cluster. 63
4.3 Running time (a) and scaled efficiency (b) of the interface-splitting algo-
rithm on the Altix 4700 compared to ScaLAPACK. . . . 64
5.1 Contour plot of the divergence introduced by the numerical approximations, $\|T_y(k_y) - T_x(k_x)\|$
from Eq. (5.12): (a) second-order on collocated grids, (b) second-order
on staggered grids, (c) fourth-order on collocated grids and (d) fourth-order
on staggered grids. The horizontal and vertical axes are the components of the
Fourier mode in x and y, respectively. See (a) for the numerical values of the
contours. . . . 75
6.1 Comparison of statistics from the solutions of the parallel and sequential versions
of the fourth-order scheme on 128³ grid cells. . . . 95
6.2 Effect of the sampling procedures on the Reynolds shear stress in turbulent
channel flow. . . . 96
6.3 Effect of the nonlinear correction on the Reynolds normal stress in turbulent
channel flow: solid line: cell-averaged values; dashed line: surface-averaged
values with nonlinear correction. The top pair is < u u >, the middle pair is
< v v > and the bottom pair is < w w >. . . . 97
6.4 Comparison of the probability density functions of the streamwise velocity be-
tween the parallel and sequential versions of the fourth-order scheme at two po-
sitions: (a) close to the wall at z+ = 5 and (b) at the center of the channel. . . 102
6.5 Mean streamwise velocity profile (left) and r.m.s. of velocity fluctuations
(right) of turbulent channel flow on the domain and the grids comparable
with those used by the spectral code [MKM99]. Top: Re_tau = 180; middle: Re_tau =
395; bottom: Re_tau = 590. Plus symbol: u-component; square symbol: v-
component; triangle symbol: w-component. . . . 103
6.6 Skewness factor (left) and flatness factor (right) of turbulent channel flow.
Top: Re_tau = 180. Middle: Re_tau = 395. Bottom: Re_tau = 590. . . . 104
6.7 One-dimensional spectra of the streamwise velocity at z+ = 5 (left) and at
the center of the channel (right) of turbulent channel flow. Top: Re_tau = 180.
Middle: Re_tau = 395. Bottom: Re_tau = 590. . . . 105
6.8 Mean streamwise velocity (a) and r.m.s. of velocity fluctuations (b) of T950-
G4 (every second grid cell is shown). . . . 106
ε Machine accuracy
ν Kinematic viscosity
(u1, u2) Difference of the convective and diffusive fluxes between two velocities
Roman Symbols
k Wave number
u Velocity vector
u_τ Friction velocity
û Fourier component of u
u, v, w Velocity in x, y, z direction
u, v, w Momentum in x, y, z direction
x Exact solution to Ax = b
x̃ Approximate solution to Ax = b
x, y, z Cartesian coordinates
A Tridiagonal matrix
Re Reynolds number
T Transformation matrix
T Transfer function
1.1 Motivation
For over half a century, Computational Fluid Dynamics (CFD) has played a very important part in helping engineers and scientists understand the nature of turbulent flows. An accurate time-dependent numerical simulation of incompressible fluids can be obtained by Direct Numerical Simulation (DNS), which solves the discrete Navier-Stokes equations directly. DNS can be very accurate, but it is extremely expensive. Physics dictates that one must resolve the flow down to the Kolmogorov length scale. This spatial resolution requirement rises roughly as O(Re^{9/4}). Taking the number of time-integration steps into account, a complexity of O(Re^3) is expected. This scaling restricts DNS to low or moderate Reynolds numbers. We can, however, expect an accurate prediction when this grid resolution is used together with a spectral scheme. When a second-order scheme is used instead, but the same quality of the solution is desired, this high complexity must be multiplied by a factor of 16, which can turn a prohibitively expensive computation into an impossible one. In order to reduce this factor, a higher-order scheme is required.
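The factor of 16 mentioned above can be made concrete with a back-of-the-envelope sketch (ours, not from the thesis): refining an unsteady three-dimensional simulation by a factor r in each spatial direction forces, via the CFL condition, roughly r times more time steps, so the total work grows like r^4.

```python
# Hypothetical cost model (illustration only): refining a 3-D unsteady
# simulation by a factor r per spatial direction multiplies the number of
# cells by r**3 and, through the CFL time-step restriction, the number of
# time steps by r, so the total work grows like r**4.
def cost_factor(r):
    spatial = r ** 3   # three spatial directions
    temporal = r       # time-step restriction (CFL)
    return spatial * temporal
```

Doubling the resolution in every direction (r = 2) thus multiplies the cost by 16, which is the factor quoted in the text.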
If one wishes to learn only about the large-scale structures of the flow, a promising alternative to DNS is Large Eddy Simulation (LES), in which the large-scale structures of the flow are resolved and the small-scale structures are modelled. In essence, both approaches rely heavily on the accuracy of the spatial discretisation. A satisfactory simulation cannot be obtained if the dynamics of the flow are not described in a sufficiently accurate way. Modelling the effects of the small scales in an LES will not improve the overall accuracy of the solution when the numerical error is larger than the effects of the small scales [Gho96]. The numerical accuracy can be improved by refining the numerical grid or by increasing the order of accuracy of the numerical approximations. The latter approach has recently become an active field of research because, in three-dimensional simulations, the cost of a higher-order scheme grows only linearly relative to a second-order scheme, while the required number of grid points is reduced cubically. This scaling gives a tremendous advantage to higher-order methods over a brute-force increase of the number of grid points.
Higher-order approximations can be computed explicitly using Lagrange polynomials. The n-th order approximation of the m-th derivative requires at most n + m abscissas. Alternatively, one can couple unknown values to the abscissas and solve a linear equation system. These implicit approximations have shorter stencils and have been called compact schemes by Lele [Lel92]. He demonstrated the superiority of compact schemes over traditional explicit schemes and showed that, for intermediate wave numbers, the compact fourth-order scheme is even better than the explicit sixth-order scheme. He quantified the resolution characteristics of second- and higher-order schemes and pointed out that, for a relative error of 0.1%, compact fourth-order differentiation requires 5 points per half-wave. The explicit fourth-order scheme requires 8 grid points and the second-order scheme requires 50. The explicit fourth-order scheme thus requires 60% more grid points than the compact scheme to achieve the same accuracy. Therefore the compact fourth-order scheme is very attractive.
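The resolution figures quoted above can be checked from the modified wavenumbers of the standard schemes. The sketch below (ours, not from the thesis) compares the classical Padé fourth-order first derivative with the explicit fourth- and second-order central differences:

```python
import math

# Modified wavenumbers k'h of three first-derivative schemes on a uniform
# grid of spacing h; the closer k'(kh) is to kh, the better the resolution.
def k_mod_c4(kh):
    # compact (Pade) fourth-order:
    #   1/4 f'_{i-1} + f'_i + 1/4 f'_{i+1} = 3/2 (f_{i+1} - f_{i-1}) / (2h)
    return 1.5 * math.sin(kh) / (1.0 + 0.5 * math.cos(kh))

def k_mod_e4(kh):
    # explicit fourth-order central difference
    return (8.0 * math.sin(kh) - math.sin(2.0 * kh)) / 6.0

def k_mod_e2(kh):
    # explicit second-order central difference
    return math.sin(kh)

def rel_err(scheme, points_per_halfwave):
    kh = math.pi / points_per_halfwave
    return abs(scheme(kh) - kh) / kh
```

With these definitions, `rel_err(k_mod_c4, 5)`, `rel_err(k_mod_e4, 8)` and `rel_err(k_mod_e2, 50)` all come out just under 0.1%, consistent with the resolution figures quoted above.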
The finite-volume method (FVM) holds a strong position in the CFD community because of its intrinsic conservation properties. Despite the popularity of the second-order FVM, there are only a few papers addressing its development towards higher order. The complicated relationship between volume-averaged values and surface fluxes makes higher-order FVM more difficult than its finite-difference (FD) counterpart. The first concept linking compact schemes to FVM was presented by Gaitonde and Shang [GS97], who proposed fourth- and sixth-order compact finite-volume methods for linear wave phenomena. However, a so-called reconstruction procedure is needed to compute the primitive values, and this costs significant computational time. A more economical approach was proposed by Kobayashi [Kob99], who calculates the surface-averaged values directly from the cell-averaged values. He analysed explicit and implicit approximations based on cell-averaged values up to twelfth order. Pereira et al. [PKP01] presented a compact finite-volume method for the two-dimensional Navier-Stokes equations on collocated grids. Piller and Stalio [PS04] presented a compact finite-volume method on staggered grids in two dimensions. Lacor et al. [CSM04] proposed a finite-volume method on arbitrary collocated structured grids and performed LES of turbulent channel flow at Re_tau = 180. LES of the same flow with explicit filtering was performed in [EK05] using the spatial discretisation proposed in [PKP01]. A fourth-order finite-volume method in cylindrical domains was developed in [SW07] and a DNS of pipe flow at Re_tau = 360 was performed.
Staggered grids have become a favorable arrangement over collocated grids because they avoid the pressure-decoupling problem. Pressure decoupling is not confined to lower-order schemes: the problem was already reported in [PKP01] when an even number of cells was used with the fourth-order scheme. It can be avoided by restricting oneself to an odd number of cells, which poses an undesirable limitation on grid design. Recently, staggered grids were shown to be more robust than collocated grids by Nagarajan et al. [NLF03] in large-eddy simulations. Thus, compact finite-volume methods on staggered grids deserve more attention.
1.2 Contribution of this work
approximation of the Poisson equation for the pressure is sufficient, then the higher-order accuracy of the velocities can be achieved at a marginal cost. However, if a fourth-order treatment of the pressure is necessary, a 19-point stencil of the Laplacian operator must be used instead of the simple 7-point stencil.
(iv.) Efficient iterative solution for the pressure: The stencils of the fourth-order Laplacians given by the projection method are three times wider than those of the second-order scheme. Such a stencil requires wider overlapping domains and imposes a significant cost on the solution of the pressure. The computation of the residual alone is already three times more expensive than in the second-order projection. An accurate and efficient algorithm for solving this Laplacian is needed to make the fourth-order method useful in practice.
(v.) Parallelisation of tridiagonal systems: The implicitness that gives the compact schemes their higher resolving power prevents a simple parallelisation. The compact fourth-order scheme needs to solve a tridiagonal system along the direction of approximation. This process is strictly sequential in two passes: a forward elimination followed by a backward substitution. Therefore, when the data along this line are distributed over different processors, a special implementation is needed. Classical algorithms for solving tridiagonal matrices in parallel are either twice as expensive or require frequent data transfers. The recent algorithm of Sun [Sun95] is efficient, but when the solver is called, the caller has to have full knowledge of the grid connectivity, which is against the concept of domain decomposition. His algorithm requires two rounds of unidirectional communication. Most computer architectures today can handle bidirectional communication, and thus more efficient algorithms are still needed.
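The two-pass sequential structure described above is visible in the classical Thomas algorithm, sketched here for reference (this is the baseline that a parallel tridiagonal algorithm has to compete with, not the interface-splitting algorithm itself):

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system. a, b, c are the sub-, main and
    super-diagonals; d is the right-hand side. Both sweeps below carry a
    dependency from one unknown to the next, which is what makes
    distributing a single line across processors difficult."""
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # backward substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For the system with diagonals (1, 2, 1) and right-hand side chosen so that the exact solution is all ones, `thomas([0, 1, 1], [2, 2, 2], [1, 1, 0], [3, 4, 3])` recovers that solution.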
The objective of this work is to address the problems mentioned above through a systematic study. The first phase of the work is dedicated to the fundamental development of the method, where problems (i.)-(iii.) are solved. The developed algorithm is then evaluated with well-known numerical benchmarks. Once a successful algorithm is developed, we proceed to the second phase, where we deal with problems (iv.) and (v.). The code is parallelised and an efficient solver for the higher-order discretisation of the Poisson equation is developed.
The main achievement of this work is the highly accurate fourth-order solver for the Navier-Stokes equations and the accompanying novel algorithms, namely:
1.3 Outline
In chapter 2, the numerical approximations of each term in the Navier-Stokes equations are presented and analysed. This chapter presents the development of a fourth-order finite-volume discretisation of the Navier-Stokes equations. A novel interpolation that preserves the discrete divergence-free property of the velocity fields on all discrete cells is presented. This method is generalised to arbitrary order of accuracy. Another fourth-order convective velocity that is not divergence-free is presented for comparison. Several choices of nonlinear corrections and the role of the pressure term are studied. The higher resolution properties of the cell-centered deconvolutions for divergence and gradient calculations are demonstrated.
In chapter 3, the proposed scheme is evaluated. The necessity of fourth-order approximations of the divergence and gradients is numerically verified. Fourier analysis shows that staggered grids can satisfy the incompressibility constraint better than collocated grids, thus retaining more accurate information in the flow field. The performance of the fourth-order scheme is carefully investigated. Although higher-order schemes have been shown by numerous authors to be vastly superior to second-order schemes in laminar flows, some recent papers report disappointing findings for higher-order schemes applied to turbulent flows. Gullbrand [Gul00] applied the fully conservative explicit fourth-order scheme of Morinishi et al. [MLVM98] and Vasilyev [Vas00] to a DNS of turbulent channel flow. Knikker [Kni08] used a fully conservative compact fourth-order scheme on the same flow. The grid resolutions used in their simulations are comparable to those used by the spectral code in [MKM99]. They both report that the differences between the second-order and the fourth-order schemes are negligible and that both differ significantly from the reference solution. Meinke et al. [MSKR02] comment that the sixth-order compact scheme performs comparably to the second-order upwind scheme in large-eddy simulations of turbulent channel and jet flows. Shishkina and Wagner [SW07] note a similar finding in their DNS of turbulent pipe flow, but point out that the fourth-order scheme improves the third- and fourth-order statistics. In this work we will show that, in a turbulent channel flow, our fourth-order scheme can deliver results comparable to the second-order scheme using eight times fewer cells. The goal of this chapter is to verify the fourth-order convergence rate and to investigate carefully whether those unfavourable findings are observed with the proposed scheme.
The interface-splitting algorithm presented in chapter 4 allows an efficient solution of tridiagonal systems in parallel on distributed-memory machines. The factors determining the accuracy and efficiency of the algorithm are presented and the error bound is derived. The performance and scalability of the algorithm are evaluated on a Gigabit cluster and an ALTIX 4700.
Chapter 5 presents the divergence-free approximate projection, which ensures the fourth-order accuracy of the spatial approximations without being excessively expensive. According to the projection method, we would have to solve a 19-point Laplacian in order to preserve the fourth-order accuracy of the approximation used in the momentum equation. This would need three ghost cells, whereas in second-order codes two ghost cells are usually sufficient. Implementing such a Laplacian would require re-engineering the whole code. The approximate projection method presented in this chapter allows the fourth-order scheme to be added to second-order codes without modifying the number of ghost cells. The developed algorithm needs to solve only a 13-point Laplacian, without recomputing the divergence.
The presented numerical algorithms are combined and implemented in the MGLET code. This code has been developed over several decades and currently belongs to the Fachgebiet Hydromechanik, Technische Universität München. Detailed information on the numerical approaches used in this code can be found in [Man04]. In chapter 6, the implementation of the parallel compact fourth-order scheme is evaluated. The parallel version is first compared to the sequential version using the turbulent channel flow at Re_tau = 180. The parallel version is then used to simulate turbulent channel flow up to Re_tau = 950. The grid resolutions necessary to achieve good predictions of the first- and second-order statistics are identified. The scalability of the parallelised fourth-order scheme is finally compared to the parallel version of the second-order scheme.
2 Finite Volume Discretisation of
Navier-Stokes on Staggered Grids
In this chapter we define the governing equations of incompressible fluids that we intend to solve using a finite-volume method. We then describe our staggered grid system. Next, the numerical approximations of each term in the Navier-Stokes equations are discussed, followed by the projection method. Finally, we close the chapter with Fourier analyses of the approximations of the convolution, the differentiation and the nonlinear terms.
Here, u denotes the velocity vector, p the pressure, T the strain rate tensor, ρ the density and ν the kinematic viscosity of the fluid, while n is the unit vector on dA pointing outwards from the volume Ω.
the following mapping between the two indices: $xs_{i+1/2} = x_{i+1}$ and $x_{i+1/2} = xs_i$. We explicitly denote the vector of staggered grid points by xs to emphasise that $xs_i = \frac{1}{2}(x(i) + x(i+1)) \neq x(i + \frac{1}{2})$. This setting allows an accurate calculation of the divergence on the pressure cell because the surfaces of the pressure cells are placed exactly at the middle of the momentum cells.
The finite-volume method describes the changes of a volume-averaged quantity by the net fluxes over the surface enclosing that control volume. These fluxes are surface-averaged quantities. In a second-order context, pointwise, cell-averaged and surface-averaged values are interchangeable because the second-order local truncation error is acceptable. In a higher-order context they are not, and they must be clearly distinguished. In this work, a cell-averaged value of f defined on a collocated control volume $\Omega_{i,j,k} = \Delta x_i \Delta y_j \Delta z_k$ is denoted by

$$[f]^{xyz}_{i,j,k} = \frac{1}{\Omega_{i,j,k}} \int_{x_{i-1/2}}^{x_{i+1/2}} \int_{y_{j-1/2}}^{y_{j+1/2}} \int_{z_{k-1/2}}^{z_{k+1/2}} f(x,y,z)\, dx\, dy\, dz. \qquad (2.3)$$
In order to distinguish averaged values defined at staggered grid points from the collocated ones, the letter s is appended to the indices. Here, is stands for the i-th staggered grid point, x = xs(i). For example, $[u]^{xyz}_{is,j,k}$ is staggered half a cell from $[p]^{xyz}_{i,j,k}$ in the positive x-direction. Surface- and line-averaged values can be defined in a similar way by reducing the dimension of the integration to two and one, respectively. For example, $[p]^{yz}_{is,j,k}$ is the surface-averaged value of p on the yz-plane located at $xs_i$.
[Figure: staggered arrangement of the pressure values p(i,j,k), p(i+1,j,k) and the velocity components u(is,j,k), u(is+1,j,k), w(i,j,ks), w(i,j,ks+1) on neighbouring cells.]
$$[div]^{xyz}_{i,j,k}\, \Delta x_i \Delta y_j \Delta z_k = \left([u]^{yz}_{i+\frac{1}{2},j,k} - [u]^{yz}_{i-\frac{1}{2},j,k}\right) \Delta y_j \Delta z_k + \left([v]^{xz}_{i,j+\frac{1}{2},k} - [v]^{xz}_{i,j-\frac{1}{2},k}\right) \Delta x_i \Delta z_k + \left([w]^{xy}_{i,j,k+\frac{1}{2}} - [w]^{xy}_{i,j,k-\frac{1}{2}}\right) \Delta x_i \Delta y_j \qquad (2.5)$$
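As a sanity check of the discrete divergence in Eq. (2.5), the following small two-dimensional numpy sketch (ours, not part of the thesis code) builds staggered face fluxes from a discrete streamfunction defined on cell corners, so the per-cell divergence vanishes to machine accuracy:

```python
import numpy as np

# psi lives on cell corners; differencing it gives face-normal fluxes whose
# per-cell divergence cancels exactly by construction.
nx, ny = 16, 12
dx, dy = 1.0 / nx, 1.0 / ny
x = np.linspace(0.0, 1.0, nx + 1)            # corner coordinates
y = np.linspace(0.0, 1.0, ny + 1)
psi = np.sin(2*np.pi*x)[:, None] * np.cos(2*np.pi*y)[None, :]  # (nx+1, ny+1)

u = (psi[:, 1:] - psi[:, :-1]) / dy          # u on x-faces, shape (nx+1, ny)
v = -(psi[1:, :] - psi[:-1, :]) / dx         # v on y-faces, shape (nx, ny+1)

# 2-D analogue of Eq. (2.5): net flux out of every pressure cell
div = (u[1:, :] - u[:-1, :]) * dy + (v[:, 1:] - v[:, :-1]) * dx
assert np.abs(div).max() < 1e-12
```

This is exactly the kind of discretely divergence-free flux field that the convective-velocity construction later in this chapter starts from.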
$$\frac{\partial [u]^{xyz}_{is,j,k}}{\partial t} = \frac{1}{\Omega_{is,j,k}} \left( -C_{is,j,k} + D_{is,j,k} - P_{is,j,k} \right) \qquad (2.6)$$
The terms $C_{is,j,k}$, $D_{is,j,k}$ and $P_{is,j,k}$ are shorthand notations for the net convective, diffusive and pressure fluxes, respectively. On a Cartesian grid they are defined as follows:
$$C_{is,j,k} = \left([\mathsf{u}u]^{yz}_{is+\frac{1}{2},j,k} - [\mathsf{u}u]^{yz}_{is-\frac{1}{2},j,k}\right) \Delta y_j \Delta z_k + \left([\mathsf{v}u]^{xz}_{is,j+\frac{1}{2},k} - [\mathsf{v}u]^{xz}_{is,j-\frac{1}{2},k}\right) \Delta xs_{is} \Delta z_k + \left([\mathsf{w}u]^{xy}_{is,j,k+\frac{1}{2}} - [\mathsf{w}u]^{xy}_{is,j,k-\frac{1}{2}}\right) \Delta xs_{is} \Delta y_j, \qquad (2.7)$$
$$D_{is,j,k} = \nu \left[ \left( \left[\frac{\partial u}{\partial x}\right]^{yz}_{is+\frac{1}{2},j,k} - \left[\frac{\partial u}{\partial x}\right]^{yz}_{is-\frac{1}{2},j,k} \right) \Delta y_j \Delta z_k + \left( \left[\frac{\partial u}{\partial y}\right]^{xz}_{is,j+\frac{1}{2},k} - \left[\frac{\partial u}{\partial y}\right]^{xz}_{is,j-\frac{1}{2},k} \right) \Delta xs_{is} \Delta z_k + \left( \left[\frac{\partial u}{\partial z}\right]^{xy}_{is,j,k+\frac{1}{2}} - \left[\frac{\partial u}{\partial z}\right]^{xy}_{is,j,k-\frac{1}{2}} \right) \Delta xs_{is} \Delta y_j \right], \qquad (2.8)$$
$$P_{is,j,k} = \left([p]^{yz}_{is+\frac{1}{2},j,k} - [p]^{yz}_{is-\frac{1}{2},j,k}\right) \Delta y_j \Delta z_k + \left([p]^{xz}_{is,j+\frac{1}{2},k} - [p]^{xz}_{is,j-\frac{1}{2},k}\right) \Delta xs_{is} \Delta z_k + \left([p]^{xy}_{is,j,k+\frac{1}{2}} - [p]^{xy}_{is,j,k-\frac{1}{2}}\right) \Delta xs_{is} \Delta y_j. \qquad (2.9)$$
Here we introduce a distinction between the two velocities in the convective term: the convective velocities and the convected velocities. The convective velocities are denoted by sans-serif fonts u, v and w. The convected velocities (momentum per unit mass) are denoted by Roman fonts u, v and w. It is important to note that the convective velocities have to be conservative, i.e. their divergence over the momentum cells has to be zero. If the convective velocities are not mass-conservative, an additional source term will be added to the r.h.s. of the momentum equation (Eq. (2.6)). Although this source term does not affect the global conservation of momentum, owing to the telescoping property of the FVM, the quality of the local solution is degraded and Galilean invariance is violated as well.
All the discrete equations in this section are exact; no simplifications or approximations have been introduced so far. Approximation errors will be introduced when these fluxes are approximated from the volume-averaged values.
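The telescoping property invoked above is easy to illustrate in one periodic direction: the net flux out of cell i is F[i+1] - F[i], and in the sum over all cells every face flux appears once with each sign, so the global budget is unaffected no matter how the face flux itself was approximated. A minimal sketch (ours, for illustration):

```python
import numpy as np

# Arbitrary face fluxes on a periodic line of 32 cells; the per-cell net
# flux is F[i+1] - F[i] (with F[0] reused as the closing face), and the
# face contributions cancel pairwise in the global sum.
rng = np.random.default_rng(0)
F = rng.standard_normal(32)
net = np.roll(F, -1) - F        # net flux out of each cell
assert abs(net.sum()) < 1e-12   # global conservation regardless of F
```

Only the local solution quality and Galilean invariance suffer from a non-conservative convective velocity; the global momentum budget does not.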
The coefficients of this deconvolution can be found by Taylor expansion or by the method of undetermined coefficients. On uniform grids they are $\alpha_1 = \alpha_3 = -\frac{1}{24}$ and $\alpha_2 = \frac{13}{12}$. Let $h_{is-1} = \Delta xs_{is-1}$, $h_{is} = \Delta xs_{is}$ and $h_{is+1} = \Delta xs_{is+1}$; the coefficients on non-uniform grids are given by the following formulas:

$$\alpha_1 = -\frac{h_{is}^2}{4(h_{is-1} + h_{is})(h_{is-1} + h_{is} + h_{is+1})}, \qquad \alpha_2 = 1 - (\alpha_1 + \alpha_3),$$
$$\alpha_3 = -\frac{h_{is}^2}{4(h_{is} + h_{is+1})(h_{is-1} + h_{is} + h_{is+1})}.$$

Here we do not explicitly compute $\alpha_2$ from the grid spacings; the consistency criterion is used instead. We call this approximation a cell-centered deconvolution because it approximates the surface-averaged values at the centers of the cells from the volume-averaged ones.
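The fourth-order behaviour of the cell-centered deconvolution can be verified numerically. The sketch below (ours, assuming the uniform-grid coefficients -1/24, 13/12 and -1/24 of a fourth-order deconvolution) recovers the point value of sin(x) at a cell center from three exact cell averages and measures the convergence rate:

```python
import math

# Uniform-grid cell-centered deconvolution (assumed coefficients):
#   f(x0) ~ -1/24 [f]_{i-1} + 13/12 [f]_i - 1/24 [f]_{i+1}
def cell_avg(a, b):
    # exact average of sin(x) over [a, b]
    return (math.cos(a) - math.cos(b)) / (b - a)

def deconv_error(h, x0=1.0):
    fm = cell_avg(x0 - 1.5 * h, x0 - 0.5 * h)
    f0 = cell_avg(x0 - 0.5 * h, x0 + 0.5 * h)
    fp = cell_avg(x0 + 0.5 * h, x0 + 1.5 * h)
    approx = -fm / 24.0 + 13.0 * f0 / 12.0 - fp / 24.0
    return abs(approx - math.sin(x0))

# halving h should reduce the error by roughly 2**4
rate = math.log2(deconv_error(0.1) / deconv_error(0.05))
assert 3.7 < rate < 4.3
```

The measured rate sits very close to 4, consistent with the formal order of the stencil.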
Note that, even if the stencil is exactly the same as in the previously introduced cell-centered deconvolution, the coefficients can be different when the grid is not uniform. This is because the deconvolved values do not lie at the centers of the cells, and the following coefficients must be used:
According to Eq. (2.7), the convected velocities u on the surfaces enclosing the staggered control volumes are needed. In contrast to Eq. (2.11), where the deconvolved quantity is located at the center of the volume-averaged one, here the desired surface-averaged fluxes are needed at the interfaces between momentum cells, i.e. between the volume-averaged quantities. We call this an intercell deconvolution. These surface averages of the momentum are positioned at non-staggered grid points, e.g. $x_i$, and they can be approximated by the following fourth-order compact deconvolution [Kob99]:
$$\beta_1 = \frac{h_{is}^2}{(h_{is} + h_{is-1})^2}, \qquad \beta_2 = \frac{h_{is-1}^2}{(h_{is} + h_{is-1})^2},$$
$$\beta_7 = \frac{2 h_{is}^2 (h_{is} + 2 h_{is-1})}{(h_{is} + h_{is-1})^3}, \qquad \beta_8 = \frac{2 h_{is-1}^2 (h_{is-1} + 2 h_{is})}{(h_{is} + h_{is-1})^3}.$$
It is possible to tune these coefficients in Fourier space and obtain a better resolution at high wave numbers [Lel92, KL96], however at the expense of the asymptotic convergence rate. In this work we aim to construct a genuine fourth-order numerical scheme for the Navier-Stokes equations; therefore only formally fourth-order schemes are studied here.
$$\beta_3 = \frac{h_{is}(h_{is-1}^2 + h_{is-1} h_{is} - h_{is}^2)}{K}, \qquad \beta_4 = \frac{(h_{is}^2 + h_{is-1} h_{is} - h_{is-1}^2) h_{is-1}}{K},$$
$$\beta_9 = \frac{12\, h_{is-1} h_{is}}{K}, \qquad \beta_{10} = -\frac{12\, h_{is-1} h_{is}}{K},$$
$$K = (h_{is} + h_{is-1})(h_{is}^2 + 3 h_{is-1} h_{is} + h_{is-1}^2).$$
[Figure: stencil of the intercell deconvolution, involving the cell-averaged values $[u]^{xyz}_{is-1,j,k}$, $[u]^{xyz}_{is,j,k}$ and the surface-averaged values $[u]^{yz}_{i-1,j,k}$, $[u]^{yz}_{i,j,k}$, $[u]^{yz}_{i+1,j,k}$.]
the momentum fluxes. Here we present a more compact form of the fourth-order approximation. This approximation utilizes only the information on the two cells enclosing the surface of interest (see Fig. 2.3). The compact approximation for the convective velocity $[\mathsf{w}]^{xy}_{is,j,ks}$ reads

$$[\mathsf{w}]^{xy}_{is,j,ks} = k_1 - k_2 - k_3 + R(\Delta^4) \qquad (2.15)$$
where

$$k_1 = \frac{6}{8} \left( [\mathsf{w}]^{xyz}_{is-\frac{1}{2},j,ks} + [\mathsf{w}]^{xyz}_{is+\frac{1}{2},j,ks} \right),$$
$$k_2 = \frac{1}{8} \left( [\mathsf{w}]^{yz}_{is+1,j,k} - 2[\mathsf{w}]^{yz}_{is,j,k} + [\mathsf{w}]^{yz}_{is-1,j,k} \right),$$
$$k_3 = \frac{1}{8} \left( [\mathsf{w}]^{xy}_{is-\frac{1}{2},j,ks-\frac{1}{2}} + [\mathsf{w}]^{xy}_{is+\frac{1}{2},j,ks-\frac{1}{2}} + [\mathsf{w}]^{xy}_{is+\frac{1}{2},j,ks+\frac{1}{2}} + [\mathsf{w}]^{xy}_{is-\frac{1}{2},j,ks+\frac{1}{2}} \right),$$
$$R = -\frac{1}{384} \frac{\partial^4 w}{\partial x^4} \Delta x^4 - \frac{1}{192} \frac{\partial^4 w}{\partial x^2 \partial z^2} \Delta x^2 \Delta z^2 - \frac{1}{1920} \frac{\partial^4 w}{\partial z^4} \Delta z^4.$$
This stencil is as small as the second-order stencil, thus it does not require extra boundary
closures. The surface-averaged velocities obtained from the two deconvolutions in Eq.(2.13)
and Eq.(2.15) can be used as convective velocities. These convective velocities are fourth-
order accurate but not necessarily mass-conservative. We denote these convective velocities
as T4.
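The fourth-order behaviour of Eq.(2.15) can be verified on an analytic field. The sketch below is our own check (not from the dissertation): for the separable field w(x,z) = sin(ax + φ)cos(bz) on a uniform grid, all averages entering k1, k2 and k3 are evaluated in closed form, and halving both grid spacings should reduce the error by roughly 2⁴:

```python
from math import sin, cos

# Our own verification of Eq.(2.15): target face centred at x = 0
# (spanning [-h/2, h/2]) and z = 0, for w(x,z) = sin(a x + phi) * cos(b z).
a, b, phi = 1.3, 0.9, 0.7

def xavg(lo, hi):            # average of sin(a x + phi) over [lo, hi]
    return (cos(a * lo + phi) - cos(a * hi + phi)) / (a * (hi - lo))

def zavg(lo, hi):            # average of cos(b z) over [lo, hi]
    return (sin(b * hi) - sin(b * lo)) / (b * (hi - lo))

def error(h, d):
    exact = xavg(-h / 2, h / 2)                 # z-factor cos(0) = 1
    zc = zavg(-d / 2, d / 2)                    # z-average over the w-cell
    # volume averages of the two w-cells enclosing the face
    k1 = 6.0 / 8.0 * (xavg(-h, 0) + xavg(0, h)) * zc
    # second difference of the yz-averages at the three x-interfaces
    k2 = 1.0 / 8.0 * (sin(a * h + phi) - 2.0 * sin(phi) + sin(-a * h + phi)) * zc
    # the four xy-averages at the corners of the two w-cells
    k3 = 1.0 / 8.0 * (xavg(-h, 0) + xavg(0, h)) * 2.0 * cos(b * d / 2)
    return abs(k1 - k2 - k3 - exact)

ratio = error(0.2, 0.2) / error(0.1, 0.1)
assert 13.0 < ratio < 19.0    # halving h and d shrinks the error by ~2^4
```

The assumed cell geometry (w-cells coinciding with the pressure cells in x) is our reading of Fig.2.3; with it, a Taylor expansion of k1 - k2 - k3 reproduces exactly the truncation term R stated above.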
Concept
The simplest way of computing divergence-free convective velocities is to use a linear combi-
nation of the volume fluxes that are already divergence-free. This means we should compute
the convective velocities from the volume fluxes over the surface of the pressure cells where
the continuity is enforced. For example, the convective velocity of the u-momentum can
be computed from the volume fluxes on neighbouring pressure cells sharing the same x-
coordinate. The remaining difficulty is that we have to work with two directions of fluxes.
In the first, the fluxes are aligned with the momentum, e.g. the approximation of u for the
u-momentum. In the second, the fluxes are normal to the momentum, e.g. the approximation
of w for the u-momentum. These fluxes are defined at different positions and, when the grid
is not uniform, they require different sets of coefficients.
Now, we consider the discrete divergence written as a summation of matrix-vector
multiplications:
Dx u + Dy v + Dz w = div. (2.16)
Any linear transformation T applied to this equation will not change the summation. This
means that if the same interpolation is used for all three velocities, the mass conservation
remains unchanged. Using constant coefficients is of course one possibility, but this does not
give fourth-order convergence. In order to use the same interpolation, we have to convert
one of the fluxes into a form compatible with the other one.
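This argument can be illustrated on a small periodic grid. The sketch below is our own toy example with simple one-point difference operators, not the solver's actual stencils: a flux field built from a discrete streamfunction is exactly divergence-free, and applying one and the same circulant interpolation T to all flux components leaves the discrete divergence at machine zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
psi = rng.standard_normal((n, n))            # discrete streamfunction
# divergence-free fluxes: U = D_y psi, V = -D_x psi (periodic differences)
U = np.roll(psi, -1, axis=1) - psi
V = -(np.roll(psi, -1, axis=0) - psi)

def div(U, V):                               # discrete divergence
    return (np.roll(U, -1, axis=0) - U) + (np.roll(V, -1, axis=1) - V)

def T(f):                                    # one interpolation for all fluxes
    return (-np.roll(f, 1, axis=0) + 9 * f + 9 * np.roll(f, -1, axis=0)
            - np.roll(f, -2, axis=0)) / 16.0

assert np.max(np.abs(div(U, V))) < 1e-12
assert np.max(np.abs(div(T(U), T(V)))) < 1e-12   # mass conservation unchanged
```

Because the circulant operator T commutes with the difference operators, div(TU, TV) = T(div(U, V)), which is exactly the statement made for Eq.(2.16).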
The interpolation of the fluxes aligned with the momentum can be done easily using
Lagrange interpolations. Therefore we choose to convert the fluxes normal to the direction of
the momentum. Inspired by the primitive value reconstruction of Gaitonde and Shang[GS97],
we convert the fluxes normal to the direction of the momentum to line averaged ones such
that the same interpolation can be used. To this end, we invoke the second fundamental
theorem of Calculus:
∫_a^b f(x) dx = F(b) - F(a),   (2.17)

F(x) = ∫ f(x) dx.   (2.18)
In two dimensions, the surface-averaged value of the volume flux on top of the pressure cell
and the associated line-averaged primitive are related by

[w]^{xy}_{i,j,ks} = (1/Δx_i) ( [W]^y_{is,j,ks} - [W]^y_{is-1,j,ks} ).   (2.19)
The primitive values can be reconstructed at the top of the East and West faces of the
pressure cell using this formula. We can now interpolate these values using the same method
as was used for u. After that, the interpolated primitive values can be converted back
to surface-averaged values using the same relationship. These two conversions are exact.
However, a direct implementation of the above method is expensive: a total of 8m floating-point
operations is required for the (2m)th-order interpolation, instead of just 4m - 1. The
novelty of our approach is the elimination of these extra costs.
In what follows, we derive a method to approximate the second-order divergence-free
convective velocities and generalize the method for arbitrary order of accuracy.
To derive an expression for the convective velocities which is divergence-free and second-
order accurate, we start from the mass conservation equation of the u-momentum cell on
Si,j,k :
[div]^{xyz}_{is,j,k} Δxs_i Δy_j Δz_k = ( [u]^{yz}_{i+1,j,k} - [u]^{yz}_{i,j,k} ) Δy_j Δz_k
    + ( [v]^{xz}_{is,js,k} - [v]^{xz}_{is,js-1,k} ) Δxs_i Δz_k   (2.20)
    + ( [w]^{xy}_{is,j,ks} - [w]^{xy}_{is,j,ks-1} ) Δxs_i Δy_j.
Applying the second fundamental theorem of Calculus to the above equation leads to
[div]^{xyz}_{is,j,k} Δxs_i Δy_j Δz_k = ( [u]^{yz}_{i+1,j,k} - [u]^{yz}_{i,j,k} ) Δy_j Δz_k
    + ( [V]^z_{i+1,js,k} - [V]^z_{i,js,k} - [V]^z_{i+1,js-1,k} + [V]^z_{i,js-1,k} ) Δz_k   (2.21)
    + ( [W]^y_{i+1,j,ks} - [W]^y_{i,j,ks} - [W]^y_{i+1,j,ks-1} + [W]^y_{i,j,ks-1} ) Δy_j.
The divergence of the convective velocities is expressed in terms of variables given at x_i and
x_{i+1}. A desired variable f at x_i can be obtained by a second-order interpolation using the
respective variables from xs_{i-1} and xs_i by the following formula:

f_i = γ_{i,1} f_{is-1} + γ_{i,2} f_{is},   (2.22)

where γ_{i,1} and γ_{i,2} are the respective interpolation coefficients. The reconstruction of the
primitive values of w on the pressure cells can be started by assuming that [W]^y_{is-1,j,ks} is
known; the subsequent primitive values can be computed using Eq.(2.19) and Eq.(2.22).
Together with Eq.(2.19) and Eq.(2.22) we obtain
[w]^{xy}_{is,j,ks} Δx_is = γ_{i+1,1} [W]^y_{is,j,ks} + γ_{i+1,2} [W]^y_{is+1,j,ks}
    - γ_{i,1} [W]^y_{is-1,j,ks} - γ_{i,2} [W]^y_{is,j,ks}.   (2.23)
After regrouping of variables, the convective velocity on the top surface of [u]^{xyz}_{is,j,k} is given
by

[w]^{xy}_{is,j,ks} = β0 [W]^y_{is-1,j,ks} + β1 [w]^{xy}_{i,j,ks} + β2 [w]^{xy}_{i+1,j,ks},   (2.24)

with

β0 = ( (γ_{i+1,1} + γ_{i+1,2}) - (γ_{i,1} + γ_{i,2}) ) / Δx_is,

β1 = ( (γ_{i+1,1} + γ_{i+1,2}) - γ_{i,2} ) Δx_i / Δx_is,

β2 = γ_{i+1,2} Δx_{i+1} / Δx_is.
The coefficient of the unknown primitive value [W]^y_{is-1,j,ks} reduces to the difference between
the sums of the two sets of interpolation coefficients. Consistency dictates that the coefficients
of any interpolation sum to unity, thus β0 = 0 and [W]^y_{is-1,j,ks} can be removed from the
interpolation. This leads to a convenient way of computing convective velocities using the
new set of interpolation coefficients β. In this formulation, the construction of primitive
values and the back transformation are fully avoided.
The net volume flux leaving the control volume of is,j,k under the second-order divergence-
free interpolation is
This equation indicates that the imbalance of mass fluxes at the momentum cell is of the same
order of magnitude as the one enforced at the pressure cells.
[u]^{yz}_{i,j,k} = γ_{i,1} [u]^{yz}_{is-2,j,k} + γ_{i,2} [u]^{yz}_{is-1,j,k} + γ_{i,3} [u]^{yz}_{is,j,k} + γ_{i,4} [u]^{yz}_{is+1,j,k}   (2.26)
The coefficients of this interpolation are the same as the ones for a fourth-order Lagrange
interpolation of pointwise values. We can proceed with a similar procedure as in the
second-order divergence-free interpolation and arrive at

[w]^{xy}_{is,j,ks} = β_{is,1} [w]^{xy}_{i-1,j,ks} + β_{is,2} [w]^{xy}_{i,j,ks} + β_{is,3} [w]^{xy}_{i+1,j,ks} + β_{is,4} [w]^{xy}_{i+2,j,ks}.   (2.27)

We call this convective velocity DF4. On uniform grids, the two sets of interpolating coefficients
γ_{i,1-4} and β_{is,1-4} are both [-1/16, 9/16, 9/16, -1/16], in numerical order. The divergence-free
interpolation of the convective velocities can be generalised to arbitrary order. Suppose that
a (2m)th-order Lagrange interpolation is used instead of Eq.(2.26); then the (2m)th-order
divergence-free interpolation of the convective velocity on the top surface of the u-momentum
cell is given by
[w]^{xy}_{is,j,ks} = Σ_{l=1}^{2m} β_{is,l} [w]^{xy}_{i-m+l,j,ks}   (2.28)

with

β_{is,l} = ( Σ_{j=l}^{2m} γ_{i+1,j} - Σ_{j=l+1}^{2m} γ_{i,j} ) Δx_{i-m+l} / Δx_is.   (2.29)
Identical coefficients are used for v. This higher-order divergence-free interpolation can be
applied for any position in the field.
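A quick uniform-grid check of Eq.(2.29) (our own sketch, using γ for the Lagrange interpolation coefficients and β for the derived divergence-free set): when the coefficients γ are identical on every cell, the two inner sums telescope and β reduces to γ, reproducing the stated values [-1/16, 9/16, 9/16, -1/16].

```python
import numpy as np

def lagrange_weights(nodes, x):
    # coefficients of the Lagrange interpolation through 'nodes' evaluated at x
    w = np.ones(len(nodes))
    for j in range(len(nodes)):
        for k in range(len(nodes)):
            if k != j:
                w[j] *= (x - nodes[k]) / (nodes[j] - nodes[k])
    return w

m = 2                                      # (2m)th order = fourth order
# staggered neighbours of a pressure point on a uniform grid (spacing 1)
g = lagrange_weights([-1.5, -0.5, 0.5, 1.5], 0.0)
# Eq.(2.29) with gamma_{i+1,j} = gamma_{i,j} = g and unit spacing ratios
beta = np.array([g[l:].sum() - g[l + 1:].sum() for l in range(2 * m)])

assert np.allclose(g, [-1/16, 9/16, 9/16, -1/16])
assert np.allclose(beta, g)                # on uniform grids beta equals gamma
assert abs(beta.sum() - 1.0) < 1e-14       # consistency: coefficients sum to one
```

On nonuniform grids the two γ sets differ and the spacing ratio Δx_{i-m+l}/Δx_is enters, but the sum of the β still equals one, so a constant field is convected consistently.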
[fg]^{yz} = [f]^{yz} [g]^{yz} + (Δy²/12) (∂f/∂y)(∂g/∂y) + (Δz²/12) (∂f/∂z)(∂g/∂z) + O(Δy⁴, Δz⁴).   (2.30)
The original formula of [PKP01] computes the correction term from the cell-averaged values:
first a second-order interpolation is used to compute the face-averaged values, then the
computed values are used for the approximation of the first derivative. Here, we use the
surface-averaged values which are readily available from the approximation of the momentum
fluxes (Eq.(2.13)). The use of the surface-averaged values allows a cheaper computation and
better resolution characteristics. The fourth-order approximation for the convective term
on the East face of the u-momentum is given by:
[uu]^{yz}_{i+1,j,k} = [u]^{yz}_{i+1,j,k} [u]^{yz}_{i+1,j,k}
    + (1/48) ( [u]^{yz}_{i+1,j+1,k} - [u]^{yz}_{i+1,j-1,k} )²   (2.31)
    + (1/48) ( [u]^{yz}_{i+1,j,k+1} - [u]^{yz}_{i+1,j,k-1} )².
On the top face we use the following formula:

[wu]^{xy}_{is,j,ks} = [w]^{xy}_{is,j,ks} [u]^{xy}_{is,j,ks}
    + (1/24) ( [u]^{xy}_{is+1,j,ks} - [u]^{xy}_{is-1,j,ks} ) ( [w]^{xyz}_{i+1,j,ks} - [w]^{xyz}_{i,j,ks} )   (2.32)
    + (1/24) ( [u]^{xy}_{is,j+1,k} - [u]^{xy}_{is,j-1,k} ) ( [w]^{xz}_{i,js,k} - [w]^{xz}_{i,js-1,k} ).
The forms proposed above are one of many possibilities. In section 2.7.2 we consider some
other possible forms of nonlinear corrections; however, the forms proposed here are the most
accurate.
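The effect of the correction term in Eq.(2.30) is easy to demonstrate in one dimension. The sketch below is our own check with exact averages of f = sin(1.3y + 0.2) and g = exp(y/2) evaluated in closed form: without the correction the product-of-averages error is second-order in Δy, with the correction it becomes fourth-order.

```python
from math import sin, cos, exp

af, c, bg = 1.3, 0.2, 0.5      # f = sin(af*y + c), g = exp(bg*y)

def avg_f(lo, hi):             # exact average of f over [lo, hi]
    return (cos(af * lo + c) - cos(af * hi + c)) / (af * (hi - lo))

def avg_g(lo, hi):             # exact average of g
    return (exp(bg * hi) - exp(bg * lo)) / (bg * (hi - lo))

def avg_fg(lo, hi):            # exact average of the product f*g
    def F(y):                  # antiderivative of exp(bg y) sin(af y + c)
        return exp(bg * y) * (bg * sin(af * y + c) - af * cos(af * y + c)) / (af**2 + bg**2)
    return (F(hi) - F(lo)) / (hi - lo)

def errors(dy):
    lo, hi = -dy / 2, dy / 2
    exact = avg_fg(lo, hi)
    e2 = abs(exact - avg_f(lo, hi) * avg_g(lo, hi))
    corr = dy**2 / 12.0 * (af * cos(c)) * bg          # (dy^2/12) f'(0) g'(0)
    e4 = abs(exact - (avg_f(lo, hi) * avg_g(lo, hi) + corr))
    return e2, e4

e2a, e4a = errors(0.2)
e2b, e4b = errors(0.1)
assert 3.5 < e2a / e2b < 4.5      # second order without the correction
assert 13.0 < e4a / e4b < 19.0    # fourth order with the correction
```

The same expansion that yields Eq.(2.30) shows why: the difference between the average of a product and the product of averages is (Δ²/12) f'g' plus fourth-order terms.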
Both approaches have to solve a Poisson equation, but with different forms of the discrete
Laplacian.
Consider the explicit Euler time integration of the u-momentum. Let uⁿ be the velocity
and Hⁿ the contribution of the convective and diffusive terms at time tⁿ. Let u* be the
provisional velocity evaluated without the pressure term and u^{n+1} the divergence-free
velocity field at the new time step when a suitable p is used. The equations for u* and u^{n+1}
are shown below.
u* = uⁿ + Δt Hⁿ   (2.33)

u^{n+1} = uⁿ + Δt Hⁿ - Δt ∇p   (2.34)

The divergence of the difference between (2.33) and (2.34) gives the Poisson equation for the
pressure,

∇²p = (1/Δt) ∇·u*.   (2.35)
This equation is identical to the one obtained from taking the divergence of the momen-
tum equation. Thus the pressure found in (2.35) is essentially the pressure at time tn .
Once the solution for the pressure is obtained, the divergence-free velocity field can be recovered
by

u^{n+1} = u* - Δt ∇p.   (2.36)

The new velocity is divergence-free and its vorticity is equal to that of the provisional velocity
because

∇ × u^{n+1} = ∇ × u* - Δt ∇ × ∇p = ∇ × u*.   (2.37)
In this derivation, the projection method and the pressure-Poisson method are essentially
the same. They go separate ways when the approximations of the gradient and the divergence
are introduced. Suppose discrete operators D and G are used to approximate the divergence
and the gradient, respectively. Then the discrete form of equation (2.35) is

D G p = (1/Δt) D u*.   (2.38)
The projection method adheres to this derivation and the discrete Laplacian is given by
L = DG. The Laplacian in a pressure-Poisson formulation represents a minimisation which
is not related to the Navier-Stokes equations, and thus any discrete Laplacian will
suffice. Solving Eq.(2.38) by a direct method and correcting the velocity using the respective
discrete gradient will result in a machine accurate divergence. Using a Laplacian operator
other than this one will leave a significant divergence in the velocity fields, even when solved
with a direct method. When the Poisson equation is solved by iterative methods, pressure-
Poisson formulations need to recalculate the divergence and then start the iteration again.
On the other hand, projection methods only need to compute the divergence once. Therefore
the projection method offers a clear computational advantage over the pressure-Poisson
formulation when aiming at small mass-conservation errors. The projection operator, from
which the method derives its name, is defined as

P = I - G (DG)⁻¹ D.
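The defining properties of P (it is a projector and its output is divergence-free) can be checked on a toy one-dimensional staggered setup. This is our own illustration with first-order difference operators and a pseudo-inverse standing in for the Poisson solve, not the scheme's actual D and G:

```python
import numpy as np

n = 16                                    # pressure cells
# G: n cell pressures -> n-1 interior face gradients; D = -G^T maps
# n-1 face velocities -> n cell divergences (boundary faces closed).
G = np.zeros((n - 1, n))
for i in range(n - 1):
    G[i, i], G[i, i + 1] = -1.0, 1.0
D = -G.T
L = D @ G                                 # discrete Laplacian, L = DG (singular)
P = np.eye(n - 1) - G @ np.linalg.pinv(L) @ D

u = np.random.default_rng(1).standard_normal(n - 1)   # provisional velocity
assert np.allclose(P @ P, P)                          # P is a projector
assert np.max(np.abs(D @ (P @ u))) < 1e-10            # result is divergence-free
assert np.linalg.norm(P @ u) <= np.linalg.norm(u)     # energy cannot increase
```

With D = -G^T the projector is symmetric, i.e. orthogonal, which is the property used in the energy argument below; the pseudo-inverse handles the constant pressure null space that a real solver removes by fixing the pressure level.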
Because the two components are orthogonal, the third term on the r.h.s. vanishes and we
have

‖u*‖² = ‖u^{n+1}‖² + Δt² ‖∇p‖².

If the divergence-free field u^{n+1} were to have the same L2-norm as u*, then ∇p and u^{n+1}
could not be orthogonal, which contradicts the underlying concept of projection methods.
This equation indicates that the energy is strictly decreasing whenever the provisional
velocity is not divergence-free. This fact is used by Chorin [Cho68] to show the stability of
the projection method. The projection method is therefore stable, but not energy-conserving:
even if the numerical scheme for the momentum equation were energy-conserving, a reduction
in the L2-norm of the momentum can be expected.
In the fourth-order context, we have the freedom to use second-order or fourth-order
approximations for D and G. This leads to four possible choices of the Laplacian, namely (i)
D2G2, (ii) D2G4, (iii) D4G2 and (iv) D4G4. The first and the fourth Laplacian are formally
second- and fourth-order, respectively; the other two are non-formal. In an existing
second-order code, the second-order projection method (D2G2) is usually implemented. On
staggered grids this D2G2 is the well-known 7-point Laplacian, which can be solved very
efficiently. The most important question here is whether D2G2 is sufficient to deliver a
fourth-order accurate solution of the velocities. In this chapter we restrict the study to the
two formal Laplacians; an in-depth investigation of all four Laplacians will be presented in
chapter 5.
∂²[p]^{xyz}_{i,j,k}/∂x² = ( [p]^{xyz}_{i-3,j,k} - 54 [p]^{xyz}_{i-2,j,k} + 783 [p]^{xyz}_{i-1,j,k} - 1460 [p]^{xyz}_{i,j,k}
    + 783 [p]^{xyz}_{i+1,j,k} - 54 [p]^{xyz}_{i+2,j,k} + [p]^{xyz}_{i+3,j,k} ) / (576 Δx²).   (2.39)
On nonuniform grids it is convenient to construct the Laplacian from the matrices D4x and
G4x which are the approximations of the cell-averaged values of divergence and gradient,
respectively. They are given by
D4x = ( Ic(is) - Ic(is-1) ) / ( xs(is) - xs(is-1) )   and   G4x = ( I′c(i+1) - I′c(i) ) / ( x(i+1) - x(i) ),

where Ic and I′c are the cell-centered deconvolutions defined in Eq.(2.11) and Eq.(2.12),
respectively. The consistent Laplacian operator in the x-direction is simply given by L4x =
D4x G4x, and the three-dimensional Laplacian is

L4 = L4x + L4y + L4z.
In our code, we use fast Fourier transformations in the xy-plane and Gaussian elimination
in z direction. After solving the Poisson equation, the divergence-free velocity is recovered
by Eq.(2.36) with the respective discrete gradient.
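A minimal version of this solution strategy (our own sketch, not the production code: one periodic direction treated by FFT, the wall-normal direction by direct elimination, second-order stencils and homogeneous Dirichlet walls) looks as follows:

```python
import numpy as np

# Sketch: Poisson solve with FFT in the periodic x-direction and
# Gaussian elimination (a dense tridiagonal solve) in the z-direction.
nx, nz, hx, hz = 16, 12, 0.3, 0.2
rng = np.random.default_rng(2)
p_exact = rng.standard_normal((nx, nz))

def laplace(p):                      # discrete Laplacian of the test operator
    lap = (np.roll(p, 1, 0) - 2 * p + np.roll(p, -1, 0)) / hx**2
    pz = np.zeros((nx, nz + 2))      # homogeneous Dirichlet walls in z
    pz[:, 1:-1] = p
    lap += (pz[:, :-2] - 2 * p + pz[:, 2:]) / hz**2
    return lap

f = laplace(p_exact)                 # manufactured right-hand side
fhat = np.fft.fft(f, axis=0)
k2 = (2 - 2 * np.cos(2 * np.pi * np.arange(nx) / nx)) / hx**2   # modified wavenumbers
T = (np.diag(-2 * np.ones(nz)) + np.diag(np.ones(nz - 1), 1)
     + np.diag(np.ones(nz - 1), -1)) / hz**2
phat = np.empty_like(fhat)
for k in range(nx):                  # one banded solve per Fourier mode
    phat[k] = np.linalg.solve(T - k2[k] * np.eye(nz), fhat[k])
p = np.real(np.fft.ifft(phat, axis=0))
assert np.max(np.abs(p - p_exact)) < 1e-10   # discrete operator inverted exactly
```

Because the same discrete operator is inverted mode by mode, the recovered pressure matches the manufactured one to machine accuracy, which is the property exploited when the projection requires a machine-accurate divergence.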
section we consider the treatment of the solid surface as a Dirichlet boundary condition
as depicted in Fig.2.4. Approximation stencils near solid surfaces should not extend too far
into the inner domain, because the strong difference between the velocity gradient near the
wall and that in the inner domain decreases the accuracy. The compact fourth-order schemes
require only the two cells nearest to the boundary and are thus less sensitive to this problem
than explicit fourth-order schemes.
Figure 2.4: Arrangement of the closure stencils near the Dirichlet boundary condition: (a)
compact differentiation of collocated variables and (b) compact deconvolution
and differentiation of staggered variables. Solid rectangles are the collocated
cells (u) and dashed rectangles are the staggered cells (w). Known values are
shown by the circles and the arrows represent the positions of the approximated
surface-averaged values.
This closure has the same convergence rate as the differentiation in the inner domain when
the grid is not uniform. Thus using this third-order differentiation here does not degrade
the global accuracy.
The closure for the deconvolution at a Dirichlet boundary reads

[w]^{xy}_{i,j,1} + (19/21) [w]^{xy}_{i,j,2} = (11/7) [w]^{xyz}_{i,j,1s} + (1/7) [w]^{xyz}_{i,j,2s} + (4/21) [w]^{xy}_{wall}.   (2.42)
The third-order closure for the differentiation in Eq.(2.14) is

∂w/∂z|^{xy}_{i,j,1} + (11/13) ∂w/∂z|^{xy}_{i,j,2} = (36/23h) [w]^{xyz}_{i,j,1s} - (12/23h) [w]^{xyz}_{i,j,2s} - (24/23h) [w]^{xy}_{i,j,wall}.   (2.43)
It is noteworthy that using deconvolved values for the differentiation is not recommended
even though their asymptotic errors are fourth-order: the nth-order leading truncation term
of the deconvolution transfers into an (n-1)th-order one for the differentiation, thus using
deconvolved values does not improve the accuracy of the differentiation.

In the cell-centered deconvolution and the approximation of the convective velocities, we
simply set the velocity to the wall value, for example zero in the case of a no-slip wall.
The pressure cell within the wall is assumed to be equal to the pressure in the first cell
inside the domain. This ensures a second-order accurate enforcement of ∂p/∂n = 0 at the
wall. A similar extrapolation is also used by Verstappen and Veldman in [VV03]. These
treatments are sufficient for fourth-order convergence, which will be shown numerically in
the next chapter.
2.7 Analysis
The accuracy of the NSE solver is determined by every single approximation step in the code,
and it is important to understand how large the errors generated by each term are. Fourier
analysis provides us with a quantitative error for each wave number. The convective and
diffusive terms used in this work have already been studied in [Kob99] and [PS04]. In this
section we perform a comparative study of the numerical errors in Fourier space: the fourth-
order compact deconvolution, the compact differentiation and the cell-centered deconvolutions
are compared.

A good measure for such a comparison is the transfer function, in which the approximated
value is normalized by the exact one. In this section we use the concept of the transfer
function to compare each approximation in the momentum equation as well as in the mass
conservation equation.
In order to perform a Fourier analysis of a periodic function u(x) over the domain
[0, L], the function u(x) is decomposed into its respective Fourier components. We use a
scaled wave number kh ∈ [0, π], similar to [Lel92], and each Fourier component is given by
û_k exp(ikx). The model equation we consider here is the one-dimensional transport equation
in a periodic domain,

∂u/∂t + c ∂u/∂x = ν ∂²u/∂x².   (2.44)
The finite volume discretisation of the above equation is

∂ū_i/∂t + (c/Δx) ( ũ_{i+1/2} - ũ_{i-1/2} ) = (ν/Δx) ( ∂ũ/∂x|_{i+1/2} - ∂ũ/∂x|_{i-1/2} ).   (2.45)

In this one-dimensional problem we use the overbar to represent cell-averaged values and the
tilde to represent the approximations. Projecting the above equation into Fourier space, we
obtain the following equation for each wave number k:
∂û_k/∂t + ick T_I(k) û_k = -ν k² T_D(k) û_k.   (2.46)

The transfer function of an approximation is defined as the ratio of the approximated value
to the exact value; for example, T_I(k) = ũ_k/û_k defines the transfer function of a
deconvolution. The transfer function of the differentiation is defined equivalently. The overall
accuracy of a numerical solution of the transport equation is determined by T_I and T_D. For
the purpose of a general analysis, let c = ν = 1 such that only the errors of the approximations
are considered.
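For the uniform-grid intercell deconvolutions the transfer functions can be written in closed form. The sketch below is our own derivation, for the second-order two-cell average and for the compact scheme with the uniform coefficients 1/4, 1, 1/4 on the left-hand side and 3/4, 3/4 on the right; it confirms the second- versus fourth-order behaviour as kh → 0:

```python
import numpy as np

def T_I2(t):
    # simple average of the two adjacent cell averages
    return np.sin(t) / t

def T_I4(t):
    # compact deconvolution on a uniform grid:
    # (1/4) u_{i-1} + u_i + (1/4) u_{i+1} = (3/4)([u]_{i-1/2} + [u]_{i+1/2})
    return 1.5 * np.sin(t) / (t * (1.0 + 0.5 * np.cos(t)))

r2 = (1.0 - T_I2(0.2)) / (1.0 - T_I2(0.1))
r4 = (1.0 - T_I4(0.2)) / (1.0 - T_I4(0.1))
assert 3.9 < r2 < 4.1      # 1 - T ~ (kh)^2 / 6
assert 15.0 < r4 < 17.0    # 1 - T ~ (kh)^4 / 180
```

The much smaller leading coefficient (1/180 versus 1/6) is what Fig.2.5 shows graphically: the compact scheme stays close to unity over a far wider range of kh.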
The transfer functions of the cell-centered deconvolution (Eq.(2.11)), the inter-cell deconvolution
(Eq.(2.13)) and the differentiation (Eq.(2.14)) are plotted in Fig.2.5. The significant
improvement of the compact fourth-order deconvolution over the second-order one is clearly
shown. During the projection step, the mass fluxes are computed by the cell-centered
deconvolution (Eq.(2.11)). According to Fig.2.5, the second-order cell-centered deconvolution
(mid-point rule) is less accurate than the fourth-order inter-cell deconvolution (Eq.(2.13)),
especially for 0.5 < kh < 2. When the mid-point rule is used to enforce the mass conservation,
the errors of the wave components in this range remain in the velocity fields and thus degrade
the level of accuracy that was achieved by the compact fourth-order scheme. The fourth-order
cell-centered deconvolution (Eq.(2.11)) is more accurate than the compact fourth-order inter-cell
Figure 2.5: Comparison of the transfer function of standard second-order and higher-order
schemes: second-order inter-cell deconvolution (TI2 ), fourth-order compact inter-
cell deconvolution (TI4 ), fourth-order compact differentiation (TD4 ), second-order
cell-centered deconvolution (TC2 ) and fourth-order cell-centered deconvolution
(TC4 )
deconvolution throughout the Fourier space. Thus we do not need a compact deconvolution
for the approximations of the mass and pressure fluxes. This finding has an important
consequence, because these two explicit stencils (Eq.(2.11) and Eq.(2.12)) lead to a narrow-
banded Laplacian operator which can be solved much more easily than the full Laplacian
operator that would arise from an implicit scheme for D or G.
The discussed cell-centered deconvolutions are used for the mass conservation and the
pressure gradient in the staggered grid arrangement. However, in a collocated grid arrangement,
the inter-cell deconvolution has to be used for these two tasks. According to Fig.2.5, we
can therefore expect a more accurate mass conservation on staggered grids than on collocated
ones. With this analysis, we can explain why the second-order solution of the pressure on
collocated grids strictly limits the accuracy to second-order, as reported in [ARM01]. This
limitation is, however, less severe on staggered grids: it will be shown later that, on staggered
grids, a convergence rate of approximately third-order can be achieved with the second-order
solution of the pressure.
[uu]^{yz}_{i+1,j,k} = C2 + N_i   (2.47)

where C2 = [u]^{yz}_{i+1,j,k} [u]^{yz}_{i+1,j,k} is the second-order approximation of the nonlinear convective
term. There are several straightforward methods which can be used to compute the nonlinear
correction N_i. In this study we consider three forms for the correction term
(Δz²/12)(∂f/∂z)(∂g/∂z) in Eq.(2.30):
N1 = (1/48) ( [u]^{yz}_{i+1,j,k+1} - [u]^{yz}_{i+1,j,k-1} )²   (2.48)

N2 = (1/192) ( [u]^{xyz}_{is,j,k+1} + [u]^{xyz}_{is+1,j,k+1} - [u]^{xyz}_{is,j,k-1} - [u]^{xyz}_{is+1,j,k-1} )²   (2.49)

N3 = (Δz²/192) ( ∂u/∂z|^{xy}_{is,j,ks} + ∂u/∂z|^{xy}_{is+1,j,ks} + ∂u/∂z|^{xy}_{is,j,ks+1} + ∂u/∂z|^{xy}_{is+1,j,ks+1} )²   (2.50)
The first correction uses surface-averaged values, the second uses cell-averaged values and
the third averages the first derivatives provided by the compact differentiation. The nonlinear
correction term is analysed using the ansatz function u(x, z) = exp((i + 0.4)kx + z). This
function mimics an oscillating velocity under an exponential gradient in the z-direction whose
amplitude varies in the x-direction. The cell-averaged values here are treated as exact and
the surface-averaged values used for C2 are fourth-order accurate. The interpolated terms
are computed by multiplying the analytical value with the modified amplitude of this ansatz
function.
Instead of looking at the whole nonlinear convective term, we consider here just the
transfer function of the correction term. The norm of the error, L2(|1 - T(N_i)|), over
kh ∈ [0, π/2] is shown in Tab.2.1. The first row is the norm of the analytical correction, and
at the same time the error when no correction is applied. According to the table, the
nonlinear corrections improve the accuracy when the gradient is not too high and they are
able to predict roughly two digits of the correction term. The correction using face-averaged
values (N1) is more accurate than the one using cell-averaged values (N2). This is attributed
to the inferior transfer functions of the second-order approximations. The third form (N3) is
slightly more accurate than the second form (N2) at the lowest gradient, but it performs
poorly otherwise. Thus computing the nonlinear correction using face-averaged values is
recommended.
Table 2.1: Square root of the L2-norm of the errors of the correction term (‖N_exact - N_i‖2)
over kh ∈ [0, π/2].
2.9 Conclusion
We have presented a fourth-order finite volume method using compact schemes for trans-
ported momentum and a divergence-free convective velocity. The accuracy of spatial approx-
imations was studied by Fourier analysis and a priori testing. The deconvolution needed to
approximate the momentum fluxes from the volume-averaged velocities is found to be the
critical part of the whole scheme.
Figure 3.1: (a) Maximum errors of the velocity at t = 10 of the fourth-order and the second-
order scheme applied to inviscid TGV with c1 = 0 and c2 = 0. (b) Maximum
error of the fourth-order scheme applied to classical TGV(A) and convective
TGV(B).
we proceed to the validation of the deconvolution and the differentiation. The classical TGV
(c1 = 0, c2 = 0, Re = 100) is used to test the differentiation, denoted by A in Fig.3.1(b).
The convected inviscid TGV is used to test the deconvolution by setting c1 = 1, c2 = 0 and
ν = 0, denoted by B in Fig.3.1(b). We keep the CFL number constant at 0.05 and march the
solution to t = 11, where in case A the magnitude of the velocities is reduced to half of its
initial value. The maximum errors at the end of the simulations are shown in Fig.3.1(b),
which clearly indicates that the convergence rates are fourth-order.
v = sin(2x). (3.5)
This flow is governed by three parameters: the shear-layer width parameter, the perturbation
magnitude, and the Reynolds number. In this study the Reynolds number based
on the maximum velocity and the length of the computational domain is set to 10,000; the
shear-layer parameter and the perturbation magnitude are 30 and 0.05, respectively. This
setting is similar to a thick shear-layer problem studied in [Bro95]. In order to show that the
proposed scheme converges towards the correct solution, we generate a reference solution
of this case on a 512² grid using a pseudo-spectral code. In this code, the computation is
performed in physical space and the derivatives are computed using FFT differentiations,
while the divergence form is used for the convective term. The authenticity of the solution
is checked by successively refining the grid from 64² to 512². A smooth solution is already
obtained at a resolution of 128², and the maximum difference between the solutions on 256²
and 512² at the end of the simulation is 10⁻⁹. Further, the amplitudes of the wave numbers
larger than k = 300 are clipped at machine accuracy. These checks assure the authenticity
of the reference solution.
Comparing a finite volume solution with one obtained from a finite difference method requires
some interpolation or integration in order to relocate and recast the two solutions in
comparable forms. The solution of the finite volume method can be deconvoluted to
pointwise values, or we can integrate the solution of the finite difference method. We chose
the latter approach because interpolations and integrations on the fine grid of the reference
solution are much more accurate than operations on the coarse grids used for the fourth-order
scheme. The reference solution is first interpolated to the integration points using cubic
splines and then a seventh-order integration is applied.
We provide a qualitative overview of the solutions using the contour plots of the vorticity
in Fig.3.2. The improvement of the fourth-order over the second-order scheme is clearly
visible. At the lowest resolution, the fourth-order solution preserves the correct shape of
the vortex sheet, while it is already distorted in the second-order solution. The fourth-order
solution has more wiggles, but their magnitudes are smaller than those of the second-order
one. When the resolution is doubled, numerical wiggles still disturb the second-order solution.
The solution of the fourth-order scheme, on the other hand, is already smooth, and the
numerical wiggles at this resolution are comparable to those of the finest solution (256²)
of the second-order scheme. This figure shows that even on the coarsest grid, which is
heavily under-resolved, the fourth-order scheme still delivers an appreciable solution, unlike
the second-order scheme.
The maximum errors of the fourth-order schemes and the convergence rates with respect
to the reference solution are shown in Tab.3.1. Here we compare the two formulations of
the convective velocities. On coarse grids, T4 is slightly more accurate than DF4 because
its leading truncation term is smaller. The convergence rates of both formulations approach
fourth-order when the grid is sufficiently fine. The differences
N       ‖ε‖∞ of u (T4)   ‖ε‖∞ of u (DF4)   Rate (T4)   Rate (DF4)
64²     2.5835E-02       2.7744E-02        -           -
96²     1.0762E-02       1.1147E-02        2.16        2.25
128²    4.4665E-03       5.0558E-03        3.06        2.75
192²    8.9431E-04       1.2273E-03        3.97        3.49
256²    2.6394E-04       2.5998E-04        4.24        5.39

Table 3.1: Maximum error of the streamwise velocity and its convergence rate for the T4 and
DF4 convective velocities using the fourth-order solution of the pressure at t = 1.2
in the doubly periodic shear layer flow.
between the solutions using the two convective velocities are very small and the solutions
are visually indistinguishable. For this reason, only the results from DF4 are shown in
Fig.3.2.
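The tabulated convergence rates follow directly from successive errors, p = ln(e1/e2)/ln(N2/N1). A small sketch of this (our own, using the T4 maximum errors of Tab.3.1; the fourth grid size is read as 192², which is what the tabulated rates imply):

```python
import math

def rate(e_coarse, e_fine, n_coarse, n_fine):
    # observed convergence order between two successive grids
    return math.log(e_coarse / e_fine) / math.log(n_fine / n_coarse)

errors = {64: 2.5835e-2, 96: 1.0762e-2, 128: 4.4665e-3,
          192: 8.9431e-4, 256: 2.6394e-4}          # T4, max error of u
grids = sorted(errors)
rates = [rate(errors[a], errors[b], a, b) for a, b in zip(grids, grids[1:])]
print([round(r, 2) for r in rates])                # approaches fourth order
assert abs(rates[-2] - 3.97) < 0.01
assert abs(rates[-1] - 4.24) < 0.01
```

The grid size enters only through its ratio, so using N or N² in the formula gives the same rate for these square grids as long as it is done consistently.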
We repeat the simulation, but with a second-order solution of the pressure, i.e. the
divergence and the pressure gradient are computed with second-order schemes and the discrete
Laplacian is D2G2. The result is given in Tab.3.2 for the T4 convective velocity. The effect
of the second-order pressure can be observed at every grid size. The convergence rate of the
L∞-norm falls between second- and third-order; the convergence rate of the L2-norm tends
to third-order. At low resolution, the errors primarily stem from the wiggles inherent in the
approximation of the convective term, and therefore we see little difference between the
second-order and fourth-order pressure. Once the wiggles have disappeared (N ≥ 192²), the
error of the solution using the second-order pressure is significantly larger.
Table 3.2: Error and convergence rate of the streamwise velocity at t = 1.2 using DF4
convective velocities and the second-order solution of the pressure (D2G2).
Figure 3.2: Contour plots of the vorticity from -36 to 36. Left: second-order; right: fourth-order
(DF4). The resolutions are 64², 128² and 256² from top to bottom.
The Orr-Sommerfeld eigenfunction (y) here is a stream function, and only the real
part of the perturbation is taken into the velocities. This test case is sensitive to the balance
among the terms in the Navier-Stokes equations. The viscous term attenuates the
perturbation while the convective term transfers energy from the main flow to the perturbation.
If the approximation of the diffusion term is accurate and the convective term is
under-approximated, the growth rate of the disturbance will be smaller than the analytical
one. This is a common situation found in finite difference methods applied to this case
[CM04, MZH85, RM91, GPPZ98, DM01]. On the other hand, the growth rate will be larger
than the analytical one when the situation is reversed. Higher-order convergence can only
be achieved if every approximation is treated correctly. Therefore this is a formidable test
case for numerical schemes and boundary closures. The conditions of this test are set
to the same conditions used in [MZH85], where Re = 7500, the streamwise wave number
is 1, the perturbation amplitude is 0.0001 and the only unstable mode is ω = 0.24989154 +
0.00223498i. The expected analytical growth rate is Gp = 4.46996E-03. The computational
domain is [Lx, Ly] = [2πH, 2H] based on the channel half-width H. The CFL number is kept
constant at 0.05 such that the errors are dominated by the spatial approximations. The
simulations are calculated using double precision and the growth rate of the perturbation is
measured at t = 50.29H/uc, where uc is the velocity at the center of the channel.
Several grid systems in the wall-normal direction have been used to simulate this flow.
Chebyshev grids are used in [MZH85] and geometric grids in [RM91] and [CM04]. The
geometric grids deliver better results at a much lower number of grid points; further, the
Chebyshev grid does not offer flexibility in grid placement. Therefore we use the geometric
grid in our code, in which the grid spacing is increased or decreased by a constant factor.
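A geometric grid of this kind can be sketched as follows (our own illustration, mirrored about the channel centre; the stretching factor S = 1.087 is used here merely as an example value, it is one of the factors appearing in Fig.3.3):

```python
import numpy as np

# Geometric grid on [0, 2H]: spacings grow by a constant factor S from
# each wall toward the channel centre.
def geometric_half(n, S, half_width):
    d = S ** np.arange(n)              # spacings h, h*S, h*S^2, ...
    d *= half_width / d.sum()          # scale so they fill the half channel
    return np.concatenate(([0.0], np.cumsum(d)))

H, n, S = 1.0, 8, 1.087
lower = geometric_half(n, S, H)                    # wall -> centre
z = np.concatenate((lower, 2 * H - lower[-2::-1]))  # mirror to the upper half
assert abs(z[-1] - 2 * H) < 1e-14
dz = np.diff(z[:n + 1])
assert np.allclose(dz[1:] / dz[:-1], S)            # constant stretching factor
```

Scaling the spacings after generating them makes it easy to prescribe both the stretching factor and the total extent, while clustering the points near the walls where the gradients are strongest.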
Table 3.3: Convergence study in the y-direction of the instability of plane channel flow on a
uniform grid with Nx = 64. The table shows the growth rates of the perturbations
(Gp), their errors (εp) and the corresponding convergence rates (Ca).

Table 3.4: Convergence study in the y-direction of the instability of plane channel flow on a
nonuniform grid with Nx = 64.
Figure 3.3: Time evolution of the perturbation energy on: (a) two uniform grids and (b)
non-uniform grids with different stretching factors (S). All grids are computed
with Nx = 64. On the uniform grids, the resolutions in the wall-normal direction
are Ny = 64 and 128; the results with the nonlinear correction are denoted by
64+NC and 128+NC, the results without it by 64 and 128.
NY \ NX    32         64         128        160        256
64         6.64e-04   3.18e-04   5.64e-04   5.93e-04   6.26e-04
128        1.43e-03   4.42e-04   1.94e-04   1.65e-04   1.33e-04
256        1.35e-03   3.62e-04   1.15e-04   8.50e-04   5.30e-05
512        1.16e-03   3.10e-04   6.20e-05   3.20e-05   -
Table 3.5: Error of the perturbation growth rate (εp) relative to the growth rate on the
finest grid (4.428E-03), using the second-order approximation of pressure.
boundary closures. A fourth-order convergence rate is obtained in all cases with the fourth-
order approximation of pressure. The overall accuracy falls to third order if the pressure is
treated only by a second-order scheme. The nonlinear correction plays a crucial role in the
energy transfer in the instability of plane channel flow. In the next step we investigate the
application of the fourth-order scheme to a fully turbulent flow.
Grid               NX    NY    NZ    Δx+    Δy+    Δz+min   Δz+max
A                  48    42    64    47.2   18.0   5.6      5.6
B                  96    64    128   23.6   11.8   2.8      2.8
C                  32    32    32    70.7   23.6   5.4      18.6
M1                 64    64    64    35.4   11.8   2.7      9.7
M2                 96    80    96    23.6   9.4    1.1      5.8
F                  128   128   128   17.7   5.9    0.72     4.4
MKM1999, [MKM99]   128   129   128   17.7   5.9    0.054    4.4
KMM1987, [KMM87]   192   129   160   11.9   7.1    0.054    4.4
H2006, [Hu06]      256   256   121   16.9   8.4    0.062    4.7
Table 3.6: Specification of numerical grids used in the grid dependency study.
that a comparable amount of artificial momentum is removed or added per unit momentum
(see Eq. (2.10)). Therefore we choose DF4 to approximate the convective velocities for the
remaining tests.
Figure 3.4: Mean profiles of the streamwise velocity along the wall-normal direction on grid
A (a) and grid B (b). Averaged mass imbalance per unit volume on the u-momentum cells
of T4 (c) and DF4 (d).
Table 3.7: Centerline velocity, bulk velocity and skin-friction coefficient on the four grids
compared with the reference solutions.
Grid               uc/uτ   ub/uτ   Cf
C                  16.68   14.55   9.35E-3
M1                 17.62   15.20   8.57E-3
M2                 18.19   15.63   8.20E-3
F                  18.34   15.77   8.16E-3
KMM1987, [KMM87]   18.20   15.63   8.18E-3
MKM1999, [MKM99]   18.30   15.52   8.18E-3
solution. On grid C, the bulk flow is underestimated by 6.4% and the error is reduced
to 2.3% on M1. The results on grids M2 and F are very close to each other and comparable
to the reference solutions.
Qualitative convergence of the scheme is shown in Fig. 3.5(a). Amiri and Hanami [EK05]
incorporated LES into Pereira's fourth-order scheme and found that the result without
explicit filtering is poor on the 64³ grid, which is equivalent to M1. Here we obtain a
satisfactory result at the same resolution without any filtering or modelling. Increasing the
resolution from grid M1 to grid M2 places the profile almost on top of the spectral solution.
When this profile is plotted together with two spectral solutions in Fig. 3.5(b), we see that
it lies between the two. The mean streamwise velocity on grid F is not plotted here because
it lies between the two spectral solutions as well.
In Fig. 3.5(c) the square roots of the surface-averaged values of the Reynolds normal
stresses, e.g. [uu]^yz_{i,j,k}, extracted from the momentum equation are plotted against
the two reference spectral solutions. This confirms that grid M2 is sufficient to capture all
scales of engineering interest. The Reynolds normal stresses of the fourth-order scheme do
not differ from the reference solutions by more than the difference between those two. This
level of difference is much smaller than the uncertainties occurring in physical experiments.
Next we consider the higher-order statistics, the skewness and flatness factors. The
skewness factors shown in Fig. 3.6(a) confirm the consistency and convergence of the scheme.
The profiles on grid M2 are satisfactorily close to the spectral solution. The profile of S(u)
lies on top of the reference solution almost everywhere and the profile of S(w) is satisfactory.
A value of 0.06 is obtained near the wall, compared to −1.3 when the second-order scheme
was used (not shown). To the best of our knowledge, all second-order codes predict a negative
S(w) near the wall when the grid resolution is not finer than those used in [MKM99] and
[KMM87]. Increasing the resolution to grid F clearly improves the accuracy of the solution and
brings S(w) on top of the spectral solution. The flatness factors are plotted in Fig. 3.6(b).
The profiles on both grids are highly satisfactory for the streamwise velocity, but notable
shortcomings are observed in F(w). The deviations seen here in the skewness and flatness
factors can be attributed to small-scale structures at the far end of the spectrum. In order
to reveal how far the fourth-order scheme is behind the spectral scheme, one-dimensional
spectra of the cell-averaged values are investigated and plotted in Fig. 3.7(a) and 3.7(b). In
these figures, the Fourier spectra are normalised by the value of the first mode. The energy
spectrum Euu in the y-direction on grid C is far from the spectral solution, but the spectra on
the other grids follow that of the spectral code nicely. In the streamwise direction, where
the convection is much stronger, the energy spectra on all grids follow the spectral solution
up to about 60% of the Nyquist limit and then start to fall sharply. This is consistent with
the sharp drop of the transfer function of the fourth-order compact deconvolution used for
the convective fluxes shown in Fig. 2.5.
10 times more efficient than the second-order scheme. This surplus efficiency can be spent
to get a more accurate solution in a shorter time.
Table 3.8: CPU-seconds per time step spent in the momentum equation (M) and the
enforcement of continuity (P) on an Opteron 8216.
cost/performance ratio. The remaining question is whether the nonlinear correction pays
off on very fine grids. This is answered in Fig. 3.9(c) and (d). The thin solid line of the
fourth-order scheme with nonlinear correction (4M-4P-wNON) is absorbed into the thick
solid line of the reference solution, while the plus symbols of the fourth-order scheme without
nonlinear correction (4M-4P-woNON) lie slightly lower. Moreover, the solution without the
nonlinear correction predicts a wrong value of the Reynolds shear stress (Fig. 3.9(d)).
Therefore, if the nonlinear correction is not used, the turbulence interactions may not be
captured accurately. In summary, the nonlinear correction should be turned off when
performing LES, leaving the subgrid-scale model to handle the small-scale interactions. In
DNS, the nonlinear correction is necessary when highly accurate solutions (comparable to a
spectral code) are sought and the grid is sufficiently fine.
3.5 Conclusion
We studied two formulations for the fourth-order convective velocities, a conservative one
(DF4) and a non-conservative one (T4). A difference between them can hardly be observed
in laminar flows because of the smoothness of the field. On staggered grids the second-order
solution of the pressure limits the accuracy of the solver, but not as strongly as observed
in [WD01] for collocated grids. A convergence rate of third order can be achieved on
staggered grids when the pressure is treated only with second-order approximations. This
finding is supported by the comparative error analysis in Fourier space in Chapter 2. The
cell-centered deconvolution used to enforce the continuity on staggered grids has a higher
resolving power than the inter-cell deconvolution on collocated grids. Thus more information
is preserved in the velocity field, and a third-order convergence rate can be obtained, whilst
it is capped at second order on collocated grids. Nevertheless, a fourth-order solution of the
pressure is required to reach overall fourth-order convergence.
In turbulent flows the two convective velocities give significantly different results on
coarse grids. Therefore divergence-free formulations should be used for the convective veloc-
ity to maintain the underlying conservation properties of the Navier-Stokes equations.
The high resolution property and the efficiency of the proposed scheme are demonstrated
using a turbulent channel flow, for which convergence towards the spectral solution is
shown. The proposed scheme is robust and gives a reasonable solution using
only 32³ grid cells. In fact, our code delivers a stable DNS solution even on 20³ cells.
The solution using 0.7M grid points is in excellent agreement with the spectral solutions up
to second-order statistics, although it uses only one-third and one-sixth of the grid cells used
by the spectral solutions in [MKM99] and [KMM87], respectively. Small deviations near the
wall are observed in the third- and fourth-order statistics. Increasing the grid resolution
improves these higher-order statistics. We have quantified the efficiency of the proposed
scheme. It
requires only half of the resolution per coordinate direction to achieve a result comparable
to that of the second-order scheme. Effectively, the fourth-order scheme can be ten times
faster than the second-order scheme in advancing the momentum per dimensionless time
unit, at comparable accuracy.
The fourth-order solution of the pressure is essential, physically and numerically. The
second-order pressure gives, at best, a third-order convergence rate for the velocities and
delivers poor solutions in turbulent channel flows. The nonlinear correction is found to be
useful on very fine grids and unimportant otherwise. It can be turned off when performing
LES for a better cost/performance ratio.
Figure 3.5: (a) Convergence of the mean streamwise velocity. (b) Mean streamwise velocity
on grid M2 compared with the two spectral solutions from [KMM87] (dashed line) and
[MKM99] (solid line). (c) Square root of the Reynolds normal stresses on grid M2.
Figure 3.6: Skewness (a) and flatness (b) factors of the streamwise (u) and wall-normal (w)
velocities on grids M2 and F, compared with [MKM99].
Figure 3.7: One-dimensional energy spectra in spanwise (a) and streamwise (b) directions of
the streamwise velocity at z + = 178.
Figure 3.8: Comparison of the second-order and fourth-order schemes: (a) mean streamwise
velocity, (b) one-dimensional spectra of the streamwise velocity in the x-direction on grid F.
Figure 3.9: Mean streamwise velocity on grids C (a), M1 (b) and M2 (c), and the Reynolds
shear stress on grid M2 (d). The thin solid line is the full fourth-order scheme (4M-4P-
wNON). The open circles (4M-2P-wNON) denote the solution using the fourth-order scheme
for the convective and diffusive terms but second-order approximations of the divergence
and the pressure gradient. The plus symbols (4M-4P-woNON) denote the full fourth-order
scheme without the nonlinear correction.
4 Parallel Solution to Tridiagonal
Systems
4.1 Introduction
In order to parallelised the compact scheme, we need an algorithm solving tridiagonal sys-
tems in parallel. Solving tridiagonal systems is one of the important kernels of scientific
computing. These systems appear in many approximation problems such as spline interpo-
lations, wavelets or numerical solutions to differential equations. These systems are usually
solved repeatedly for an enormous number of right-hand sides. In three-dimensional prob-
lems, this number can easily reach 232 which may not fit on a memory of a single processor
machine. Efficient parallelisation of these systems is thus critical for scientific computing
and our compact fourth-order scheme.
Modern parallel computers mostly belong to the massively parallel processing (MPP)
category, consisting of a large number of powerful nodes. The right-hand sides of the
system can be distributed among the processors, and thus a processor holding a certain
subsystem may not have explicit access to the subsystems owned by other processors.
Despite the availability of NUMA architectures, which allow a processor to access the
memory of other processors, such nonlocal access is much more expensive than local access.
This problem is more pronounced when the number of processors does not fit on a single
partition, where the computational nodes are connected by special interconnection networks.
Therefore an efficient and scalable algorithm solving tridiagonal matrices in parallel should
minimise the number of communications and synchronisations. Divide-and-conquer
algorithms [Wan81], [Bon91], [Sun95], [AG96] and [Heg96] are well suited to the current
trend in supercomputer architecture. Bondeli [Bon91] and Sun [Sun95] independently
presented algorithms specialised for diagonally dominant tridiagonal matrices. The reduced
parallel diagonal dominant (PDD) algorithm in [Sun95] solves a tridiagonal system of n
equations with multiple right-hand sides using (5n/p + 4J + 1) operations per right-hand
side on p processors, for some small number J which will be described later. His algorithm
is among the most efficient algorithms for this problem. It requires unidirectional
communication at two stages of the computation; the bidirectional communication links of
modern computers are thus left unused.
This shortcoming, together with load-balancing problems among the processors, can double
the cost of the communication.
In the present work, we develop a novel interface-splitting algorithm with a complexity
of (5n/p + 4J − 4). The algorithm is designed for diagonally dominant tridiagonal matrices.
The idea is to decrease the communication and reduce the data dependency. This is
achieved by computing the solution at the interface between two processors before computing
the solution at the inner points. The algorithm exploits the exponential decay of the
inverse of diagonally dominant matrices demonstrated in [Nab99]. The proposed scheme is
competitive: it has a lower complexity than the algorithm presented in [Sun95] and requires
one less synchronisation phase. Therefore the proposed algorithm is less sensitive to load-
balancing and network-congestion problems. The scheme is applicable to non-Toeplitz as
well as periodic systems.
This chapter is organised as follows. First, the interface-splitting algorithm is derived.
Then, similarities and differences with existing divide-and-conquer algorithms are discussed
and the complexity of the proposed algorithm is presented. Finally, the accuracy and the
performance of the proposed scheme on workstations, an Ethernet cluster and a
supercomputer are presented and compared with ScaLAPACK.

4.2 Parallel algorithm solving tridiagonal systems

Consider the tridiagonal system

Ax = b,    (4.1)

where A is a strictly diagonally dominant tridiagonal matrix, A = [li, di, ri] with
|di| > |li| + |ri| and l1 = rn = 0.
In order to solve this system in parallel, one can assume that there is a simpler matrix Q
and a perturbed right-hand side v with an accompanying transformation matrix T such
that

Tv = b,    (4.2a)
Qx = v.    (4.2b)
This means that the original tridiagonal matrix is decomposed in the form A = TQ.
The structure of the algorithm depends on the choice of the matrix Q. Wang's partitioning
algorithm [Wan81] and the parallel line solver [ABM04] belong to this type of factorisation.
It can be considered a pre-processing scheme in which the right-hand side is perturbed such
that the solution of the simpler block matrix gives the desired solution.
On the other hand, one can assume a decomposition of the form A = Q′S and solve

Q′w = b,    (4.3a)
Sx = w.    (4.3b)

The earliest algorithm of this type applied to tridiagonal matrices is the SPIKE algorithm
[SK78], [LPJN93]. Subsequent algorithms are the parallel partition algorithm (PPT) of
Sun [SSN89] and the divide-and-conquer algorithm (DAC) of Bondeli [Bon91]. Even
though these algorithms are derived differently, their implementations can be identical.
In practice, these three algorithms solve Eq. (4.3b) in two substeps, and the solution x is
obtained by

x = w − Δw,    (4.4)

using a correction vector Δw. Sun further exploits the decay of the correction vector and
derives the reduced PDD algorithm in [Sun95]. These algorithms can be considered post-
processing schemes in which the first solution w is corrected by solving Eq. (4.3b). All of
these algorithms use the block subdiagonal matrix of A as Q′.
A parallel algorithm solving the tridiagonal system is efficient if the pre-processing step
(Eq. (4.2a)) or the post-processing step (Eq. (4.3b)) is easy to solve in parallel. The
post-processing step of the algorithms in the previous paragraph solves a block tridiagonal
system of p subsystems where each block is a 4 × 4 matrix. In the case of a strictly
diagonally dominant system, and if the subsystem size is sufficiently large, the matrix S can
be reduced to a p-block diagonal matrix and solved by nearest-neighbour communications.
For the pre-processing scheme, an equivalent algorithm with the same advantages can be
formulated for the solution of Eq. (4.2a). The corresponding pre-processing matrix can be
derived from S, and it is straightforward to show that for any algorithm of the post-
processing type there is an equivalent variant of the pre-processing type using the same
block matrix, Q = Q′. There will be small differences between these two variants of the
equivalent algorithm due to floating-point operations, but the choice can make a significant
difference in the programming of the application using these algorithms. For example, if the
post-processing algorithm is called from a certain subroutine, the calling routine must have
knowledge of the global topology, or at least of its nearest neighbours. On the contrary,
when the pre-processing algorithm is used, knowledge of the topology is not necessary and
the calling routine can proceed as if it were working alone, because solving Q, unlike
solving S, does not require information from the neighbours. The pre-processing operations
can be done in a higher-level subroutine which has knowledge of the topology. Therefore
the pre-processing algorithm is more suitable for numerical programs adopting the domain-
decomposition concept.
Figure 4.1: Matrix Q of the interface-splitting algorithm and its subdiagonal block N^k,
where c = (k − 1)m. Q = diag(N^1, ..., N^p) is block diagonal; each block N^k carries the
tridiagonal rows (l_{c+j}, d_{c+j}, r_{c+j}), j = 1, ..., m − 1, of A, and its last row is the
unit row at the interface.
The matrix Q is accompanied by the transformation matrix T, which is simply the identity
matrix whose km-th rows are replaced by vectors (y^k)^T. T can be inverted easily, thus
allowing us to solve (4.2a) by

v = T⁻¹ b.    (4.5)

It can be shown that the matrix T⁻¹ has exactly the same structure as T, but with the
km-th rows changed to (z^k)^T, where z^k_j = c_{km,j}, 1 ≤ j ≤ n, and C = A⁻¹. This is
equivalent to computing the solution x_km explicitly using the corresponding row of the
inverse of A. The solution vector v can be obtained by manipulating b only at the
interfaces, i.e. v = b − f, using a sparse vector f whose components are non-zero only in the
neighbourhood of the interfaces. The application of this approach to a general matrix is of
course expensive and prone to numerical instability. However, when A is strictly diagonally
dominant, the calculations of (z^k)^T and (z^k)^T b are stable and accurate. Nabben has
shown in [Nab99] that the components of the matrix C decay exponentially away from the
diagonal. The key to the efficiency of the proposed algorithm is the truncation of the scalar
product (z^k)^T b to a certain bandwidth 2J, which introduces an approximation error e^k
in the right-hand side of (4.5) (see below).
Thus the interface-splitting algorithm partitions the whole system of Eq. (4.1) into p
smaller independent subsystems. Each subsystem x^k, with x^k_j = x_{(k−1)m+j},
1 ≤ j ≤ m, is separated from the others by the interface x_km. The solution at the interface
is explicitly computed by a truncated scalar product, x̃_km = (z^k)^T b. The dependencies
between the subsystems are replaced by these precomputed solutions. The k-th subsystem
of A then takes the form

N^k x^k = b^k − f^k − e^k.    (4.6)

The components of the vector f^k are zero except for the first and the last, which are given
by f^k_1 = l_{c+1} x̃^{k−1}_m and f^k_m = b^k_m − x̃^k_m. The vector e^k contains the
error of the approximations introduced at the top and the bottom interfaces; it is non-zero
only in the first and the last components. The above equation is exact; however, the error
term e^k will be omitted, and we solve for the following approximate solution:

N^k x̃^k = ṽ^k = b^k − f^k.    (4.7)
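The complete procedure can be illustrated by a serial sketch that mimics the parallel algorithm: the interface values are precomputed from a truncated window of the corresponding rows of A⁻¹, after which the p subsystems are solved independently by the Thomas algorithm. This is our own illustrative code, not the thesis implementation; in particular, the interface row of the inverse is obtained here by solving a small windowed system rather than by the LU-based procedure of the algorithm's step 2.

```python
import numpy as np

def thomas(l, d, r, b):
    """Sequential tridiagonal solve (no pivoting; fine for diagonal dominance)."""
    n = len(d)
    dd, x = d.astype(float).copy(), b.astype(float).copy()
    for i in range(1, n):                     # forward elimination
        w = l[i] / dd[i - 1]
        dd[i] -= w * r[i - 1]
        x[i] -= w * x[i - 1]
    x[-1] /= dd[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        x[i] = (x[i] - r[i] * x[i + 1]) / dd[i]
    return x

def interface_split_solve(l, d, r, b, p, J):
    """Serial emulation of the interface-splitting idea (n divisible by p)."""
    n = len(d); m = n // p
    x = np.empty(n)
    xi = {}
    # precompute interface solutions x_km from a truncated window of the
    # km-th rows of A^{-1}; their entries decay exponentially away from the
    # diagonal for diagonally dominant A ([Nab99])
    for k in range(1, p):
        i = k * m - 1                         # 0-based interface index
        lo, hi = max(0, i - 2 * J), min(n, i + 2 * J + 1)
        Aw = (np.diag(d[lo:hi]) + np.diag(r[lo:hi - 1], 1)
              + np.diag(l[lo + 1:hi], -1))
        e = np.zeros(hi - lo); e[i - lo] = 1.0
        z = np.linalg.solve(Aw.T, e)          # ~ row i of A^{-1}
        xi[i] = z @ b[lo:hi]
    # solve the p decoupled subsystems; known interface values go to the rhs
    for k in range(p):
        s, t = k * m, (k + 1) * m
        ls, ds = l[s:t].copy(), d[s:t].copy()
        rs, bs = r[s:t].copy(), b[s:t].copy()
        if k > 0:                             # left neighbour's interface
            bs[0] -= ls[0] * xi[s - 1]; ls[0] = 0.0
        if k < p - 1:                         # own interface row -> identity
            ls[-1], ds[-1], rs[-1], bs[-1] = 0.0, 1.0, 0.0, xi[t - 1]
        x[s:t] = thomas(ls, ds, rs, bs)
    return x
```

In a real implementation each loop iteration over k runs on its own processor, and only the interface values cross processor boundaries.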
4.3 Accuracy Analysis

The error introduced at the interfaces propagates into the inner points of each subsystem.
To analyse it, consider the extended system

F^k u^k = z^k    (4.8)

for u^k = [x^{k−1}_m, x^k_1, ..., x^k_m]^T, with

F = [ 1   0
      w   N^k ],   z^k = [x^{k−1}_m, b^k_1, b^k_2, ..., b^k_{m−1}, x^k_m]^T   and
w = [l_1, 0, ..., 0]^T.    (4.9)

The interface-splitting algorithm introduces approximations to x^{k−1}_m and x^k_m, and
the system F ũ^k = z̃^k is solved instead, using
z̃^k = [x̃^{k−1}_m, b^k_1, b^k_2, ..., b^k_{m−1}, x̃^k_m]^T. These approximations create
errors which can be written as follows:

h^k = ũ^k − u^k,    (4.11)
F h^k = z̃^k − z^k,    (4.12)
F h^k = [e^{k−1}_m, 0, ..., 0]^T + [0, ..., 0, e^k_m]^T.    (4.13)
The error at the inner indices (h^k_i, 1 < i < m + 1) is thus a sum of the errors propagated
from both interfaces. The position of the maximum error is given by the following
proposition. The error at the inner indices, h^k_i, 1 < i < m, is not larger than the
maximum error introduced at the interfaces, because the inner rows of Eq. (4.13) give
d_i h^k_i = −(l_i h^k_{i−1} + r_i h^k_{i+1}) and therefore

|h^k_i| ≤ ( |l_i| |h^k_{i−1}| + |r_i| |h^k_{i+1}| ) / |d_i|
        ≤ ( (|l_i| + |r_i|) / |d_i| ) max( |h^k_{i−1}|, |h^k_{i+1}| )
        < max( |h^k_{i−1}|, |h^k_{i+1}| ) ≤ max( |e^{k−1}_m|, |e^k_m| ).
In what follows we assume that the threshold εc was used to truncate the vector z^k, and
J is the smallest j > 0 such that |z^k_{km±j′}| < εc for all j′ > j. The maximum error of
the interface-splitting algorithm thus consists of the truncated terms and the round-off error
in the calculation of the dot product in step 3. These errors are, however, bounded by a
small factor of the cut-off threshold.
Theorem 1. Let A be the matrix of size n = pm mentioned in Eq. (4.1) and let (z^k)^T be
the km-th row of the inverse of A, which can be used to compute the m-th solution of the
k-th subsystem. Then the maximum error of the interface-splitting algorithm is bounded by

e_max = ( (2L + 1/2) ε + (1/2) L εc ) |b|∞,    (4.14)

where εc is the threshold for the cut-off of the coefficients, ε is the machine accuracy and L
is the bandwidth over which the magnitude of the coefficients is reduced by one significant
digit. Note that |b|∞ is the maximum norm of b.
Proof. Assume that the right-hand-side vector is represented exactly by machine numbers,
and let δ^k denote the error of representing the exact z^k by the machine vector ẑ^k, i.e.
ẑ^k = z^k + δ^k. The hat denotes a numerically computed value that differs from the exact
one only through machine operations. Let fl(x) be the floating-point evaluation of x. The
numerical computation of x̂^k_m by (ẑ^k)^T b is given by

x̂^k_m = fl( Σ_{j=1}^{n} (z^k_j + δ^k_j) b^k_j )    (4.15)
       = fl( Σ_{j=1}^{km} (z^k_j + δ^k_j) b^k_j )
         + fl( Σ_{j=km+1}^{n} (z^k_j + δ^k_j) b^k_j ) + η,    (4.16)

with |η| < 2ε. Because z^k decays exponentially, there is a smallest number L such that
|z^k_{j−L}| < (1/10) |z^k_j| for j ≤ km and |z^k_{j+L}| < (1/10) |z^k_j| for j > km. This
means that if the machine accuracy ε lies in (10^{−ν}, 10^{−ν+1}), then Eq. (4.16) is
equivalent to

x̂^k_m = fl( Σ_{j=km−νL}^{km} (z^k_j + δ^k_j) b^k_j )
         + fl( Σ_{j=km+1}^{km+νL+1} (z^k_j + δ^k_j) b^k_j ) + η.    (4.17)

Let â^k_0 and â^k_1 be the first and the second sum in Eq. (4.17). Again, due to the
decaying nature of z^k, the round-off errors affect the scalar product only through its L
largest terms; thus the first sum reduces to

â^k_0 = fl( Σ_{j=km−L}^{km} (z^k_j + δ^k_j) b^k_j )    (4.18)
      = fl( Σ_{j=km−L}^{km} z^k_j b^k_j ) + fl( Σ_{j=km−L}^{km} δ^k_j b^k_j ),    (4.19)

and, because the exponential decay is bounded by a linear decay, we arrive at the following
error bound for the untruncated product:

ĥ^k_m = x̂^k_m − x^k_m,    (4.20)
|ĥ^k_m| < ( L ε + (L + 1/2) ε ) |b|∞    (4.21)
        < ( 2L + 1/2 ) ε |b|∞.    (4.22)

Now we have a bound for the case of using the full product (ẑ^k)^T b. It is then
straightforward that the error of the truncated scalar product is bounded by

h^k_m = x̃^k_m − x^k_m    (4.23)
      = ĥ^k_m − fl( Σ_{j=1}^{km−J} z^k_j b^k_j )
        − fl( Σ_{j=km+J+2}^{n} z^k_j b^k_j ),    (4.24)

and therefore

|h^k_m| < ( (2L + 1/2) ε + (1/2) L εc ) |b|∞.    (4.25)
4.4 Complexity and performance

The bandwidth J depends on the degree of diagonal dominance, σ = |di| / (|li| + |ri|). For
the Toeplitz system T = [1, σ, 1], the entries of the matrices of the LU-decomposition of
this system converge to fixed values, and the bandwidth J can be deduced from them. The
bandwidth J achieving the cut-off threshold εc can be approximated for this special system
by

J = ⌈ −ln εc / ln( ( |σ| + √(σ² − 4) ) / 2 ) ⌉.    (4.26)

For instance, J equals 7 and 27 for εc = 10⁻⁴ and 10⁻¹⁵ when σ = 4. These two numbers
are much smaller than the usual size of the subsystems used in scientific computing. The
interface-splitting algorithm is an approximate method, but it can be made equivalent to
other direct methods by setting εc to machine accuracy.
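Eq. (4.26) is easy to evaluate; a small helper (our own naming) reproduces the two quoted bandwidths for σ = 4:

```python
import math

def bandwidth(eps_c, sigma):
    """Bandwidth J reaching the cut-off threshold eps_c for the Toeplitz
    system [1, sigma, 1]; the row of the inverse decays by the factor
    2 / (|sigma| + sqrt(sigma^2 - 4)) per index, cf. Eq. (4.26)."""
    rate = (abs(sigma) + math.sqrt(sigma**2 - 4.0)) / 2.0
    return math.ceil(-math.log(eps_c) / math.log(rate))

print(bandwidth(1e-4, 4), bandwidth(1e-15, 4))  # 7 27
```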
In step 2, we have to solve a linear system for the km-th row of C. This system has to be
larger than 2J such that the coefficients of (z^k)^T are sufficiently accurate. In this work
we solve a system of size 2J + 4L, which gives a two-digit-accurate representation of the
smallest coefficient of the truncated (z^k)^T. It would be rare that one is satisfied with an
error larger than 10⁻⁴; it is thus safe to assume that L = J/4. This value of L is substituted
into the operation counts of the interface-splitting algorithm and the results are shown in
Tab. 4.1. Note that an error threshold smaller than this will lead to a smaller L for the
same level of accuracy of the coefficient vector and consequently a smaller number of
operations relative to J. In this table we also list the communication time, which can be
expressed by the simple model τcom = α + βQ, where α is the fixed latency and β is the
transmission time per datum. For single-r.h.s. systems, we assume that the system is so
large (otherwise we would not need parallelisation) that each processor only knows its own
right-hand side and the submatrix A^k. This leads to a higher communication cost
compared to [Bon91] and [Sun95], in which the global matrix is known to all processors. If
the same assumption is made here, the communication is reduced to α + 2β. For
multiple-right-hand-side systems, we do not count the computation in the first two steps.
This leads to the absolute speedups of the single-r.h.s. problem (S₁) and the multiple-r.h.s.
problem (S∞) given in Eq. (4.27) and Eq. (4.28), valid when n is much greater than p and
the communication cost is small.
S₁ = p / (1 + 3J/m),    (4.27)
S∞ = p / (1 + 0.8 J/m).    (4.28)
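The efficiency figures quoted below follow directly from Eq. (4.28); a quick check with our own helper names:

```python
def speedup_single(p, J, m):
    # Eq. (4.27): absolute speedup for a single right-hand side
    return p / (1.0 + 3.0 * J / m)

def speedup_multi(p, J, m):
    # Eq. (4.28): absolute speedup for many right-hand sides
    return p / (1.0 + 0.8 * J / m)

# parallel efficiency S/p for J = 0.1 m and J = m: roughly 93% and 56%
print(speedup_multi(64, 1, 10) / 64, speedup_multi(64, 10, 10) / 64)
```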
The key number determining the performance of the interface-splitting algorithm is the
ratio J/m. When this ratio is small, an excellent speedup can be expected; otherwise the
algorithm suffers a penalty. For example, the efficiency drops from 92% to 56% when J
is increased from 0.1m to m in a multiple-right-hand-side system. According to Tab. 4.1,
the interface-splitting algorithm is slightly faster than the reduced PDD algorithm [Sun95]
for multiple-right-hand-side problems. Our algorithm is also faster for a single-right-hand-
side system when J < 0.375m. Therefore the interface-splitting algorithm is competitive
both as an approximate and as a direct method (close to machine accuracy).
It is common practice that the subsystem size should be reasonably large, such that the
speedup from the load distribution justifies the communications and other overheads. The
algorithm assumes that J ≪ m; if this does not hold, a program adopting this algorithm
should issue a warning or an error to the user.
Interface-Splitting Algorithm
System            Matrix        Sequential   Computation         Communication (τcom)
Single r.h.s.     Nonperiodic   8n − 7       8n/p + 24J − 17     α + (9J/2 + 2)β
                  Periodic      14n − 16     8n/p + 24J − 17     α + (9J/2 + 2)β
Multiple r.h.s.   Nonperiodic   (5n − 3)γ    (5n/p + 4J − 4)γ    α + 2γβ
                  Periodic      (7n − 1)γ    (5n/p + 4J − 4)γ    α + 2γβ
Table 4.1: Computation and communication costs of the interface-splitting algorithm for
solving a linear system of order n on p processors with single and multiple right-hand sides.
Here α is the latency, β the transmission time per datum and γ the number of right-hand
sides.
4.5 Results
In this section, we present results of the interface-splitting algorithm applied to single and
multiple right-hand sides on two parallel computers. First, the accuracy of the interface-
splitting algorithm is studied for a simple matrix in an approximation problem. In the
second step, we evaluate its performance on a Linux cluster. Finally, the performance and
scalability of the interface-splitting algorithm are presented and compared with the
ScaLAPACK package.
Table 4.2 shows the error of the interface-splitting algorithm applied to the matrix [1, 4, 1]
encountered in approximation problems such as spline interpolation, compact differentiation
[Lel92] and compact deconvolution [Kob99]. In this table we consider the differentiation of
f = sin(20πx) on x = [0, 1]. The unknowns are placed at xi = ih, 0 ≤ i ≤ 251. This
problem is solved using three partitions. The table shows that the bound given in Eq. (4.14)
is not far from the actual error, as seen in the first and the last rows. Sometimes the error
can be much smaller than the bound due to the cancellation of the neglected terms. The
error of the algorithm using the smallest bandwidth, J = 7, is already much smaller than
the local truncation error of the differentiation. Considering that there are 25 grid points
per wavelength in this problem, which is already very fine, we expect the smallest J here to
be adequate for most simulation-based applications. If higher accuracy is required, the band
can be expanded as necessary. Note that the bandwidth J depends only on the diagonal
dominance of the matrix around the interface and on the choice of the cut-off threshold εc.
The number of unknowns has no influence on this bandwidth.
In what follows, we demonstrate the accuracy of the proposed algorithm when applied
to a matrix with non-constant coefficients. In order to have a reproducible test, we solve
the test matrix A = [sin(i), 2(|sin(i)| + |cos(i)|), cos(i)] of order 1000 with bi = 1 on 4
processors. The off-diagonal coefficients of this matrix vary relatively fast; their signs
change approximately once every three rows. The numerical errors of the algorithm shown
in Tab. 4.3 indicate that the non-constant coefficients of the matrix do not have adverse
effects on the algorithm.
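This test matrix is strictly diagonally dominant by construction, with a uniform dominance factor of two (ignoring the zeroed first sub- and last super-diagonal entries); a short check with our own script:

```python
import numpy as np

i = np.arange(1.0, 1001.0)
l, r = np.sin(i), np.cos(i)
d = 2.0 * (np.abs(np.sin(i)) + np.abs(np.cos(i)))
# |d_i| = 2 (|l_i| + |r_i|): the dominance factor sigma_i is exactly 2,
# and sin(i), cos(i) never vanish simultaneously
sigma = np.abs(d) / (np.abs(l) + np.abs(r))
print(sigma.min(), sigma.max())  # 2.0 2.0
```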
J     εc            (1/|b|∞) |f′seq − f′par|   (1/|b|∞) |f′exact − f′seq|
7     9.9167e-05    7.1325e-06
15    2.6350e-09    7.2559e-11                 1.7394e-03
27    3.6092e-16    3.8095e-17
Table 4.2: Normalised error of the interface-splitting algorithm, (1/|b|∞) |f′seq − f′par|, for
different cut-off thresholds (εc). The normalised error of the differentiation itself,
(1/|b|∞) |f′exact − f′seq|, is shown for comparison.
J     (1/|b|∞) |xseq − xpar|
7     1.41E-005
15    2.06E-011
18    4.66E-014
20    4.44E-016
27    4.44E-016
Table 4.3: Normalised error of the interface-splitting algorithm applied to a system with
non-constant coefficients, using different bandwidths.
The fixed-size speedup of the algorithm is studied by solving a multiple-r.h.s. system of order 25600 with 10^4 right-hand sides. In all following tests, the interface bandwidth J is set to 9. The absolute fixed-size speedup of the interface-splitting algorithm on a Linux cluster, shown in Fig. 4.2, is almost ideal. Here the absolute speedup is defined as the solution time on a single processor using the fastest sequential algorithm (Gaussian elimination) divided by the time used by the interface-splitting algorithm. Because the algorithm only communicates with its nearest neighbours, it is highly scalable. When the subsystem size is sufficiently large, i.e. J/m ≪ 1, the overhead is small and the ideal absolute speedup can be achieved. At p = 64 the speedup is even slightly better than the ideal one, which could be attributed to better caching.
Figure 4.2: Absolute speedup (S) of the multiple right-hand-side problem on the Linux cluster.
The scalability of the algorithm is studied by measuring runtimes on the Altix 4700 for two types of problem, a single r.h.s. and multiple right-hand sides. Here we consider a scaled problem where the total system size grows linearly with the number of processors, i.e. n_tot = p·n_sub. The subsystem size is set to 10^6 in the single-r.h.s. problem. For the multiple right-hand-side problem, it is set to 100 with 10^4 right-hand sides. The results of the test are shown in Fig. 4.3(a). The runtime of the interface-splitting algorithm grows approximately linearly with the logarithm of the number of processors in both types of problem. This can be attributed to the fat-tree topology of the interconnect of the machine. Note that the numbers of unknowns in both problems are equal, i.e. 10^6. The difference in CPU-time seen here is thus solely the communication time. Here we cannot achieve an ideal speedup, unlike on the cluster; this is a limitation imposed by the hardware. It is obvious that ScaLAPACK does not scale as well as our algorithm. Its CPU-time approximately doubles when the number of processors is increased from one to two, in accordance with the increase in complexity of the parallel algorithm used by ScaLAPACK. Interestingly, the differences in the CPU-times of the multiple right-hand-side problems increase sharply with the number of processors. This indicates that ScaLAPACK, unlike the proposed algorithm, is very sensitive to the characteristics of the interconnection network.
In Fig. 4.3(b) the efficiencies of the interface-splitting algorithm and ScaLAPACK are presented. On a single r.h.s., the efficiency of the proposed algorithm falls to 50% at p = 64 because of the increase in communication time, which occupies 50% of the CPU-time there.
Figure 4.3: Running time (a) and scaled efficiency (b) of the interface-splitting algorithm (ITSA) on the Altix 4700 compared to ScaLAPACK, for single and multiple right-hand-side problems.
Interestingly, the increased communication time does not much affect the solution time of ScaLAPACK (thin dashed line), especially in the single right-hand-side problem, because its cost is dominated by computation. In the multiple-r.h.s. problem, both algorithms enjoy better efficiencies. At p = 64, the interface-splitting algorithm delivers an 85% efficiency compared to 30% for ScaLAPACK. On both problems, the interface-splitting algorithm is at least four times faster than ScaLAPACK.
4.6 Conclusion
We have presented the interface-splitting algorithm for solving diagonally dominant tridiagonal systems. The accuracy of the algorithm depends on the diagonal dominance of the matrix and not on the order of the matrix. The algorithm is an approximate algorithm, but the user has full control over the accuracy. It can be used equivalently to a direct method if desired, provided that the subsystem size is sufficiently large. The algorithm is highly efficient and scalable, especially for multiple right-hand-side problems. Excellent efficiencies are obtained on two parallel computers, and the algorithm is at least four times faster than ScaLAPACK, which uses a direct algorithm. Unlike existing algorithms, this algorithm fully utilises the bidirectional links, which allows the communications to overlap; a low communication overhead can therefore be expected from this algorithm.
5 Approximate Projection Method
In chapter 3, we numerically verified that a fourth-order solution of the pressure is essential for a fourth-order convergence rate of the velocities. The fourth-order projection method used earlier delivers excellent results and ultimately leads to a reduction of computational time. However, this was partially because the cost of the direct solver of the fourth-order projection was not fully revealed: since FFTs were used in the xy-plane, the costs of the fourth-order and the second-order projection are approximately the same in this plane. We only had to solve a seven-banded system in the wall-normal direction for each wave number. The major cost in this process was the FFT, and thus the fourth-order projection came at marginal cost (see Tab. 3.8). In general situations, the flow may not have any homogeneous directions and thus the FFT cannot be used. In these situations, we have to deal directly with the 19-banded matrix given by the fourth-order projection.
At first glance, this would seem to be a simple task: one can construct a sparse matrix out of this discrete Laplacian and use a preconditioned Krylov subspace algorithm to solve it. This approach has been tried, and convergence of the iterative solution is obtained. However, the performance was not satisfactory for two reasons: the matrix is singular, and the preconditioning step is too expensive. Since the matrix is singular, we have to use BiCGSTAB or GMRES with restart, which are quite expensive. Because turbulent flows contain a wide range of length scales, preconditioning is necessary. However, the matrix of the fourth-order projection method is new, and thus there is no specialised preconditioner for it. The incomplete LU decomposition has been tried as a preconditioner, but a truncation threshold that gives a reasonable convergence rate leads to a large number of non-zero entries. Consequently, the memory requirement is too large for large-scale computing.
In the fourth-order projection method, one has to solve a special 19-point stencil, compared with the 13 points of the standard fourth-order approximation of the Laplacian. This 19-point stencil is expensive: the operation count increases by 46% in the computation of the residual alone (div − Lp), compared to the 13-point stencil. Even though this wide stencil has certain advantages due to the properties of the projection method, it spans three cells in each direction, which leads to a need for three ghost cells. This can be a big problem in the implementation, especially when
5.1 Background
A unique feature of the projection method is that the residual of the corresponding discrete Laplacian is the mass imbalance that will reside in the velocity field after the velocity correction. This means that when the discrete Poisson equation is solved by a direct method, the divergence of the corrected velocity field is close to machine accuracy. During the solution process, only the pressure and the residual are needed. In the pressure-Poisson formulation, this is not the case: the residual of the Poisson equation reflects the discrete mass imbalance only up to the local truncation error. If one wishes to achieve a certain level of mass conservation, it is necessary to correct the velocity and then recompute the divergence. If the result is unsatisfactory, the Poisson equation must be solved again. In general, discrete Poisson equations are solved by iterative methods. This leads to a dual iteration in which the inner loop solves for the pressure and the outer loop checks the discrete mass imbalance. This inner-outer iteration must be repeated until the desired divergence is obtained. The pressure-Poisson formulations are thus at a disadvantage, because six variables (pressure, divergence, residual and the three velocities) must be continuously accessed, instead of just two variables in the projection method. Memory bandwidth and cache misses can increase the computational time significantly.
On the other hand, projection methods only allow a unique form of the discrete Laplacian, given by L = DG, where D and G are the discrete divergence and gradient operators, respectively (see section 2.5 for the derivation). On collocated grids, this restriction gives rise to discrete Laplacians that are wider than usual and thereby increases the cost of computation. In the second-order projection method, the number of stencil points of the discrete Laplacian in 3D is 13 instead of 7. This is, however, merely a performance problem. In fact, a more serious problem of the projection method on collocated grids is the local decoupling of the pressure: the solution on odd cells is independent of the solution on even cells. This decoupling allows an unphysical oscillation pattern in the pressure field. Approximate projection methods [ABS96, ABC00] have been proposed to overcome these two specific problems. However, the approximate projection methods proposed so far do not satisfy the discrete divergence exactly, but only up to the local truncation error. Losing the perfect mass-conservation property, these approximate projection methods become similar to the pressure-Poisson formulations and do not offer any significant advantage.
On staggered grids, the situation is simpler. The Laplacian given by the projection method has a minimum number of stencil points, and it does not suffer from the local-decoupling problem. Therefore, approximate projections for staggered grids do not exist in the second-order context. The situation changes, however, when we move to fourth-order schemes. As mentioned earlier, the fourth-order projection method needs to solve the 19-point Laplacian, while the 13-point stencil is sufficient to reach a formal fourth-order convergence rate. Therefore, if we solved the 13-point stencil and corrected the velocity with the fourth-order gradient, we could expect a fourth-order convergence rate in the velocity, and an improvement in performance could be gained. However, the corrected velocity would not be divergence-free. Nevertheless, this sheds some light on the possibility that there could be some 13-point stencil which leads to a divergence-free velocity after the correction.
Recent developments in higher-order schemes for the Navier-Stokes equations, such as those in [ARM01] and [Kni08], as well as our own investigations, require that the divergence and gradient approximations be as accurate as those of the convective and diffusive terms, in order to achieve the full potential of higher-order schemes. This requirement widens the Laplacian stencil, and in some cases the discrete Laplacian becomes a full matrix; for example, when implicit schemes are used to approximate the divergence or the pressure, the Laplacian becomes a full matrix. However, on staggered grids, the explicit approximations of the divergence and gradient are accurate enough and can accommodate compact schemes without compromising the accuracy. The full matrix of the discrete Laplacian, such as the one used in [ARM01], is thus not needed. Nagarajan et al. [NLF03] comment that staggered grids have better conservation properties and are more robust than their collocated counterparts. Therefore, efficient projection methods for higher-order schemes on staggered grids deserve more attention.
This chapter is organised as follows. First, we outline the advantages and disadvantages of projection methods; then the Helmholtz-Hodge decomposition is introduced, followed by the definition of consistent projection methods. The effects of the choices in the discretisation of the pressure and the divergence are analysed. After that, the approximate projection method is presented and its local truncation error analysis is performed. Convergence and accuracy of the proposed method are then evaluated using numerical simulations of laminar flow. Finally, the applicability to realistic flows is demonstrated by direct numerical simulation of a turbulent channel flow.
5.2 Consistent Projection methods

∂u/∂t = −(u·∇)u + ν∇²u − (1/ρ)∇p    (5.1)

∇·u = 0    (5.2)
These equations are non-linearly coupled and difficult to solve. Projection methods allow a simpler numerical procedure to be used. They are based on the Helmholtz-Hodge decomposition theorem, which can be stated as follows:

u* = u + ∇φ    (5.3)

with ∇·u = 0 in the domain and ∮ u·n dS = 0 on its boundary. This theorem states that any vector field can be decomposed into two components, the divergence-free vector field u and the curl-free vector field ∇φ. When applying this theorem to the NSE, one can integrate Eq. (5.1) without enforcing mass conservation and then project the provisional velocities back onto the divergence-free space.
The momentum equation can be integrated in time in different ways. In this work, the low-storage third-order Runge-Kutta (RK) time integration of Williamson [Wil80] is considered. Let H(u^k) be the contribution of the convective and diffusive terms at the k-th substep of the time step t_n = nΔt. Each substep k, 1 ≤ k ≤ 3, of the third-order Runge-Kutta time integration for the time step n is given by

q^k = a_k q^{k−1} + Δt [ H(u^{k−1}) − (1/ρ) G p^{k−1} ]    (5.4)

u*^k = u^{k−1} + b_k q^k    (5.5)

where the variables of the zeroth substep are those of the n-th time step, i.e. p^0 = p^n, except for q^0, which is set to zero. A numerical superscript in this chapter, e.g. p^0, refers to the corresponding RK substep. There will be no reference to other steps of the time integration; the notation p^1, p^n and p^k will therefore be clear from the context.
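The substep structure of Eqs. (5.4)-(5.5) can be sketched for a scalar model problem with the pressure term omitted. The coefficients a_k and b_k below are the commonly quoted Williamson low-storage RK3 values; they are an assumption, since this excerpt does not list them:

```python
import numpy as np

# Williamson low-storage RK3 coefficients (commonly used values, assumed here)
A = (0.0, -5.0/9.0, -153.0/128.0)
B = (1.0/3.0, 15.0/16.0, 8.0/15.0)

def rk3_step(f, u, dt):
    """One full step of Eqs. (5.4)-(5.5) with the pressure term omitted:
    q^k = a_k q^{k-1} + dt f(u^{k-1}),  u^k = u^{k-1} + b_k q^k,  q^0 = 0."""
    q = 0.0
    for a, b in zip(A, B):
        q = a*q + dt*f(u)
        u = u + b*q
    return u

def integrate(nsteps, T=1.0):
    """Integrate du/dt = -u from u(0) = 1 to time T."""
    dt = T/nsteps
    u = 1.0
    for _ in range(nsteps):
        u = rk3_step(lambda x: -x, u, dt)
    return u

e1 = abs(integrate(50) - np.exp(-1.0))
e2 = abs(integrate(100) - np.exp(-1.0))
print(e1/e2)   # ~8 for a third-order scheme
```

Halving the time step reduces the global error by a factor of about 2³ = 8, confirming the third-order accuracy of the low-storage scheme.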
The provisional velocity u*^k above is usually not divergence-free, and the mass imbalance can accumulate from substep to substep and degrade the quality of the final solution. This problem is cured by applying the projection operator P = I − G(DG)⁻¹D to u*^k. This projection requires the solution of the Poisson equation Lφ = (ρ/Δt) D u*^k under the consistent Laplacian operator L = DG. The projection method appended after Eq. (5.5) is given below:

DG φ = (ρ/Δt) D u*^k    (5.6)

u^k = u*^k − (Δt/ρ) G φ    (5.7)

p^k = p^{k−1} + φ    (5.8)

The resulting u^k will have a divergence exactly equal to the residual (ρ/Δt) D u*^k − DG φ of Eq. (5.6) when solved analytically. The above projection method obeys the Helmholtz-Hodge decomposition exactly in a discrete sense, i.e. u satisfies the discrete divergence operator close to machine accuracy, and essentially the same holds for the conservation of the vorticity. This projection method is thus called an exact projection method by Almgren, Bell and Crutchfield [ABC00]. It must be emphasised that the term exact here does not mean that the projection leads to the exact solutions of p and u.
In the fourth-order context, D and G in equations (5.4), (5.6) and (5.7) can be approximated by second-order or fourth-order schemes. This leads to four possible choices of the Laplacian, namely (i) D2G2, (ii) D2G4, (iii) D4G2 and (iv) D4G4. They are called composite Laplacians here for an obvious reason. We call a projection method that solves one of these Laplacians and corrects the velocity using the respective discrete gradient a consistent projection method. The consistent projection methods solving (i) and (iv) are the second-order and fourth-order exact projection methods, because they obey the Helmholtz-Hodge decomposition up to their corresponding order of accuracy. The consistent projection methods solving (ii) and (iii) are thus called mixed-order projection methods.
In [ARM01], [Kni08] and earlier in chapter 3, the two exact projection methods (i and iv, D2G2 and D4G4) are studied. The conclusions from these works indicate that the second-order exact projection method is unable to deliver a fourth-order convergence rate in both velocity and pressure, and that the fourth-order projection method is necessary. We have not found any investigation of mixed-order consistent projection methods in the literature.
5.3 Analysis of consistent Projection methods

∂u/∂x + ∂v/∂y = 0.    (5.9)
Applying the Fourier transform to the above equation gives the relation of Eq. (5.10) for each wavenumber pair, where q̂(kx, ky) is the Fourier component of the divergence and Tx(kx) and Ty(ky) are the transfer functions of the differentiation in the x- and y-directions. One can make use of the continuity equation and arrive at Eq. (5.11). Suppose that the two transfer functions Tx(kx) and Ty(ky) belong to scheme A, having D_A and G_A as its discrete divergence and gradient operators, respectively. Projection methods search for a suitable field φ(x, y) such that

D_A ( u* − (1/ρ) G_A φ ) = 0    (5.13)
When the divergence of this corrected velocity is computed analytically, it is of course equal to the negative of the source term: div u = −q(x, y) = −Σ_{kx,ky} q̂(kx, ky) exp(i(kx x + ky y)). This function q(x, y) converges towards zero at the same rate as the convergence rate of scheme A (see Eq. (5.12)). It is evident that when an m-th-order scheme is used for the divergence approximation, the velocity field satisfies mass conservation at m-th order of accuracy. Thus, when a second-order scheme is used to approximate the divergence, the error introduced into the velocity is O(Δx²). When the solution is integrated in time from t0 to t1, these errors accumulate to (t1 − t0)·O(Δx²). Therefore, a fourth-order approximation of the divergence is essential to achieve a fourth-order convergence rate in the velocity.
The contour plots of the relative divergence introduced into the velocity field (‖Ty(ky) − Tx(kx)‖ in Eq. (5.11)) by the second- and fourth-order schemes are shown in Fig. 5.1. In all figures, the most unreliable regions are the top-left and bottom-right corners. In these regions, the approximation in one direction is very accurate while that in the other direction is poorly determined. This imbalance leads to a larger value of q. According to the figures, the accuracy of the second-order and fourth-order approximations of the divergence on staggered grids is impressive. For a relative error of 0.1, the second-order approximation on staggered grids is comparable to the fourth-order one on collocated grids (Fig. 5.1(b) and Fig. 5.1(c)); they can represent approximately 20% of all the modes within the Nyquist limit. At this level of relative error, the fourth-order scheme on staggered grids can capture at least 60% of the whole wave space. For a more accurate result, at a relative error of 0.001, the fourth-order scheme on collocated grids is much better than the second-order scheme on staggered grids. However, the fourth-order scheme on staggered grids is the most accurate at all levels of relative error.
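The transfer functions behind Fig. 5.1 can be sketched in 1D with the standard explicit central-difference modified wavenumbers (assumed here to correspond to the schemes of the figure):

```python
import numpy as np

# modified wavenumbers k'(k) of explicit central first-derivative schemes
# (standard formulas, assumed to match the schemes of Fig. 5.1);
# grid spacing normalised to dx = 1, so the Nyquist limit is k = pi
def t2_col(k):  return np.sin(k)                              # 2nd order, collocated
def t4_col(k):  return (8*np.sin(k) - np.sin(2*k))/6.0        # 4th order, collocated
def t2_stag(k): return 2*np.sin(k/2)                          # 2nd order, staggered
def t4_stag(k): return (27*np.sin(k/2) - np.sin(3*k/2))/12.0  # 4th order, staggered

k = np.linspace(1e-3, np.pi, 500)
rel_err = {name: np.abs(T(k) - k)/k for name, T in
           [('2col', t2_col), ('4col', t4_col),
            ('2stag', t2_stag), ('4stag', t4_stag)]}

# fraction of the 1D wave space resolved below a 10% relative error
frac = {name: float(np.mean(e < 0.1)) for name, e in rel_err.items()}
for name in ('2col', '4col', '2stag', '4stag'):
    print(name, frac[name])
```

Even in this 1D sketch, the second-order staggered scheme resolves at least as large a fraction of the wave space as the fourth-order collocated scheme, and the fourth-order staggered scheme is the most accurate, mirroring the ordering described above.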
In what follows, we consider how the approximation of the pressure gradient affects the accuracy of the velocity, using Fourier analysis in 3D.
The velocity field can be written in vector form as a Fourier series:

u(x, t) = Σ_k e^{i k·x} û(k, t).    (5.16)
The divergence of the velocity in physical space translates to the following relationship in Fourier space:

k · û = 0.    (5.17)

This means that the divergence-free condition of a vector field u in physical space translates into orthogonality between the wave vector k and the Fourier component vector û.
In the momentum equation there are only two terms that can generate a divergence, namely (i) the convection term and (ii) the pressure term. Let the nonlinear term be w(x, t). One can apply the Helmholtz-Hodge decomposition to this term and arrive at the following relationship between the nonlinear convective and pressure terms:

k · ŵ = (i/ρ) |k|² p̂    (5.18)
This is similar to the situation we saw earlier in Eq. (5.10). The only difference is that the pressure must cancel out the image of the nonlinear convection in the direction of k. Therefore, when we approximate the convective term with higher-order schemes but keep the approximation of the pressure gradient at second order, the part of the nonlinear term parallel to k is cancelled only at a second-order rate. Consequently, the accuracy of the convective term is effectively degraded to second order. When the momentum equation is integrated in time, this second-order error is multiplied by Δt, and thus errors of O(ΔtΔx²) are added to the momentum in each time integration. This error, first order in time, accumulates over time and eventually leads to a global error of O(Δx²). It should be noted that if a second-order scheme is used for the divergence approximation, an error of O(Δx²) is added at each RK substep. The error introduced by the second-order approximation of the pressure term in a time-dependent problem is thus less severe than the effect of the second-order approximation of the divergence.
According to the above analysis, the approximations of the divergence and the pressure gradient must be fourth-order accurate if one wishes to achieve a fourth-order convergence rate for the velocity. The result of the analysis in this section indicates that, within the framework of consistent projection methods, the fourth-order convergence rate can only be achieved by the consistent fourth-order projection method solving D4G4.
Figure 5.1: Contour plots of the relative divergence ‖Ty(ky) − Tx(kx)‖ introduced by the second- and fourth-order approximations on collocated and staggered grids (panels (a)-(d); contour levels 0.001, 0.010, 0.050, 0.100, 0.200).
This projection ensures that the velocity is divergence-free, because Eq. (5.22) is consistent with Eq. (5.21). However, it is not consistent with Eq. (5.19) and is thus called an approximate projection method. On a uniform grid, we have to solve D4G2, whose component in the x-direction reads

D4x G2x p = (1/(24Δx²)) ( −p_{i−2,j,k} + 28 p_{i−1,j,k} − 54 p_{i,j,k} + 28 p_{i+1,j,k} − p_{i+2,j,k} ).    (5.24)

The components in the y- and z-directions can be found by switching the running index to j and k, respectively. On nonuniform grids, the method described in section 2.5.1 can be used to construct the discrete Laplacian.
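A quick check of Eq. (5.24) on a periodic 1D grid (a sketch; grid size, wavenumber and test function are chosen for illustration only) confirms that the composite operator D4G2 is a second-order approximation of the Laplacian:

```python
import numpy as np

def d4g2(p, dx):
    """x-component of the composite Laplacian D4G2, Eq. (5.24), applied on a
    periodic uniform grid via circular shifts."""
    return (-np.roll(p, 2) + 28*np.roll(p, 1) - 54*p
            + 28*np.roll(p, -1) - np.roll(p, -2)) / (24*dx**2)

def max_err(n):
    """Maximum error of d4g2 applied to sin(x); the exact result is -sin(x)."""
    x = 2*np.pi*np.arange(n)/n
    dx = 2*np.pi/n
    return np.max(np.abs(d4g2(np.sin(x), dx) + np.sin(x)))

e32, e64 = max_err(32), max_err(64)
print(e32/e64)   # ~4: the composite operator is second-order accurate
```

Doubling the resolution reduces the error by a factor of about four, the second-order rate expected from the truncation error analysis of the mixed-order Laplacian.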
It is clear that in steady-state problems this approximate projection method converges to the exact fourth-order projection method, because the velocities cease to change only when either the momentum equation is in balance or the divergence of the provisional velocity is zero. If one of these conditions is reached, the projection step does not change the velocity field. The remaining question is how it performs in time-dependent problems, which is investigated in the subsequent sections.
5.5 Analysis of the approximate projection method
The subscripts 1 and 2 denote the solutions of the exact and the approximate projection methods, respectively. These two projection methods are called incremental projection methods [BCM01], because the initial pressure is used in the momentum equation and is then corrected by the solution of the Poisson equation. Alternatively, one can omit the pressure in the momentum equation and solve for the whole pressure itself. Such a strategy is called a pressure-free projection method [BCM01]; the work of Kim and Moin [KM85] is an example of this type. In consistent projection methods, if the Poisson equation is solved by a direct solver, these two types of projection method are essentially the same. However, if an iterative solver is used instead, different results are obtained, because the magnitudes of the solutions differ. For example, if the stopping criterion is set to 1E−3, the incremental projection method will have φ accurate up to three digits, while the pressure-free projection method will have p^k accurate up to the same number of digits. Since ‖φ‖ < ‖p^k‖ in general, the final solution of p^k will be more accurate under the incremental projection method.
Let us first consider the error in the first RK substep. The difference in the velocity field between the two projection methods after the first substep is simply

e^1_apx = ‖u^1_1 − u^1_2‖ = (Δt/ρ) ‖G2 φ^1_2 − G4 φ^1_1‖
        = ‖[ G2 (D4G2)⁻¹ − G4 (D4G4)⁻¹ ] D4 u*^1_1‖
        ≤ ‖G2 (D4G2)⁻¹ − G4 (D4G4)⁻¹‖ ‖D4 u*^1_1‖
        = O(Δx²) ‖D4 u*^1_1‖

The numerical superscript 1 of u and φ in the above equation denotes that they belong to the first RK substep; it should not be confused with a power. The inverse of the Laplacian is denoted by the superscript −1. According to the above error, it is straightforward to bound e_apx once we have a bound for the initial divergence ‖D4 u*^1_1‖. Since we assume exact initial conditions, ‖D4 u*^1_1‖ naturally converges to zero at a fourth-order rate. The error e^1_apx is thus of order O(Δx⁶), because we solved a second-order Laplacian. Likewise, the error in the pressure term is sixth-order, i.e. p^1_2 − p^1_1 = O(Δx⁶).
In the second substep, the velocities of both projections have changed, and the Poisson equation of the exact projection in this substep becomes

D4G4 φ^2_1 = (ρ/Δt) D4 u*^2_1 = (ρ/Δt) D4 [ b2 Δt ( H(u^1_1) − (1/ρ) G4 p^1_1 ) + a2 b2 q^1_1 ]    (5.31)

Applying the corresponding inverses of the discrete Laplacians to equations (5.32) and (5.31) and computing the difference gives the equation for the error in the incremental pressure at this step. Following the errors obtained in the previous paragraph, the keys determining the error of the incremental pressure in this substep are the difference in the order of accuracy and the magnitude of the source term of this error equation. Consider Eq. (5.31): since p^1_1 is responsible for cancelling the divergent part of H(u^n), the only contributor to the divergence in the second substep is the added velocity du^1 = u^2_1 − u^1_1. In order to quantify how large this added velocity is, let us consider the divergence of the momentum equation.
Recall that taking the divergence of the momentum equation leads to the following relationship:

∇²p = −ρ (∂u_i/∂x_j)(∂u_j/∂x_i),    (5.33)

where i and j are Einstein summation indices. Using this relation together with u^2_1 = u^2_2 + O(Δx⁶), Eq. (5.31) is equivalent to

D4G4 φ^2_1 = −ρ [ (∂u_i/∂x_j)(∂du_j/∂x_i) + (∂du_i/∂x_j)(∂u_j/∂x_i) + (∂du_i/∂x_j)(∂du_j/∂x_i) ] + O(Δx⁴),    (5.34)
Note that, to keep the equation simple, the r.h.s. of Eq. (5.33) is left in analytic form and the error of the approximation is absorbed into the big-O term of the above equation. Since du is of order O(Δt), the r.h.s. of the above equation is of order O(Δt). The Poisson equation of the approximate projection method takes the same r.h.s. Therefore, the error bound for the incremental pressure term φ of the approximate projection method (relative to the exact projection method) at the second RK substep is given by

‖φ^2_1 − φ^2_2‖ = ‖[ (D4G4)⁻¹ − (D4G2)⁻¹ ] D4 u*^2_1‖ = O(ΔtΔx²)    (5.35)

This incremental pressure is used to correct the velocity in Eq. (5.29), which leads to a local truncation error of O(Δt²Δx²) in the velocity.
At this point, it would seem that the approximate projection method delivers a fourth-order accurate velocity but a third-order accurate pressure. When the NSE are integrated for a long time, the time error might be expected to accumulate and lead to third-order and second-order global errors in the velocity and the pressure, respectively. Actually, this accumulation of the error in the pressure does not occur, due to the self-correction property described in the following paragraphs.
The provisional velocity at the third RK substep is given by

u*^3_2 = u^2_2 + b3 Δt [ H(u^2_2) − (1/ρ) G4 p^2_2 ] + a3 b3 q^2_2    (5.36)
       = u^2_2 + b3 Δt [ H(u^2_1 + du^2) − (1/ρ) G4 (p^1_2 + φ^2_2) ] + a3 b3 q^2_2    (5.37)
Before we continue, let us consider an extreme case where the time step size is close to machine accuracy, i.e. Δt ≈ ε. When we perform the time integration in this case, the velocity stays constant and only the pressure changes. Since q ≈ 0, each RK substep has the same convection and diffusion contributions, i.e. H(u^1_2) = H(u^2_2) = H(u^3_2) = H(u^n). We then simply perform an iteration for the pressure field p∞ satisfying D4G4 p∞ = ρ D4 H(u^n). Since p^n = p∞ + O(Δx⁴) and we solve D4G2, which is a second-order approximation of D4G4, the pressure after the first RK substep is given by p^1_2 = p∞ + O(Δx⁶). Substituting this pressure into Eq. (5.31), the source term obviously becomes O(Δx⁶). The solution p^2_2 is thus equal to p∞ + O(Δx⁸). This convergence continues until p_2 → p∞.
Now let us come back to the general case where Δt is of the same order of magnitude as the grid size. We define Ψ as the difference of the net diffusive and convective fluxes of two velocity fields, Ψ(w1, w2) = H(w1 + w2) − H(w1). The first square bracket would be zero if we had solved D4G4, and the second square bracket would be our source term for φ^3_2. Since we actually solved D4G2, the first bracket leads to a divergence of order O(ΔtΔx²). For clarity, we write the divergence of the provisional velocity at this substep for the exact projection below:

D4 u*^3_1 = D4 [ a3 b3 q^2_1 + b3 Δt Ψ(u^1_1, du^2_1) ].    (5.41)
Applying the respective inverses of the Laplacian operators to Eq. (5.41) and Eq. (5.40), and then correcting the pressure, leads to the following difference in the pressure:

p^3_2 − p^3_1 = (φ^1_2 − φ^1_1) + (φ^2_2 − φ^2_1) + (φ^3_2 − φ^3_1)
             = (D4G2)⁻¹ D4 [ H(u^1_2) − (1/ρ) G4 (p^1_2 + φ^2_2) ] + O(ΔtΔx²)

Since we are not interested in the solutions attributed to D4[ a3 b3 q^2_2 + b3 Δt Ψ(u^1_2, du^2_2) ] and D4[ a3 b3 q^2_1 + b3 Δt Ψ(u^1_1, du^2_1) ], we simply bound the error due to these terms at third order. The first bracket on the l.h.s. of the above equation can be neglected, and only the following equation determines the leading truncation error:

(φ^2_2 − φ^2_1) + (φ^3_2 − φ^3_1) = (D4G2)⁻¹ D4 [ H(u^1_2) − (1/ρ) G4 (p^1_2 + φ^2_2) ] + O(ΔtΔx²)    (5.42)

It follows that

(φ^2_2 − φ^2_1) − (D4G2)⁻¹ D4 [ H(u^1_2) − (1/ρ) G4 (p^1_2 + φ^2_2) ] + (φ^3_2 − φ^3_1) = O(ΔtΔx²)    (5.43)

Since p^1_1 = p^1_2 + O(Δx⁶) and u^1_1 = u^1_2 + O(Δx⁶), the above equation can be written as

(φ^2_2 − φ^2_1) − (D4G2)⁻¹ D4 [ H(u^1_1) − (1/ρ) G4 (p^1_1 + φ^2_2) ] + (φ^3_2 − φ^3_1) = O(ΔtΔx²)    (5.44)

The solution of the Poisson equation in the above equation is thus only a correction for φ^2_2, that is

(D4G2)⁻¹ D4 [ H(u^1_1) − (1/ρ) G4 (p^1_1 + φ^2_2) ] = φ^2_2 − φ^2_1 + O(ΔtΔx⁴).    (5.45)
Therefore the error in computing 22 at the second RK substep is damped in the third substep.
We call this error cancellation the self-correction mechanism. Similar to the extreme case
discussed earlier, this self-correction will be saturated at the same magnitude of changes
introduced to the velocity field. In our case, the limit is given by the error introduced to the
velocity field at the second substep which is O(t2x2 ).
In summary, the local truncation error of the approximate projection method relative to the exact projection method is O(Δt^2 Δx^2) for the velocity and O(Δt Δx^2) for the pressure. Since Δt ∝ Δx in explicit time integrations, the approximate projection compromises neither the accuracy of the third-order time integration nor the fourth-order spatial approximation, because O(Δt^2 Δx^2) = O(Δt^4) = O(Δx^4) in that case. This local truncation error leads to an O(Δt Δx^2) global error in the velocity. The local truncation error of the pressure term in the approximate projection method is O(Δt Δx^2). This local truncation error is not accumulated, owing to the self-correction mechanism of the approximate projection method, and therefore the global error of the pressure is also O(Δt Δx^2).
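The scaling argument above can be made explicit. For an explicit time integration, the step size is tied to the grid spacing through the CFL condition; writing the convective limit with a fixed CFL number C and a velocity scale u (a sketch of the bookkeeping, not a new result):

```latex
\Delta t \;=\; C\,\frac{\Delta x}{u}
\quad\Longrightarrow\quad
\mathcal{O}\!\left(\Delta t^{2}\,\Delta x^{2}\right)
\;=\; \mathcal{O}\!\left(\frac{C^{2}}{u^{2}}\,\Delta x^{4}\right)
\;=\; \mathcal{O}\!\left(\Delta x^{4}\right).
```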
Consider now the difference between the discrete curls of the two projected velocity fields:

G_4 \times u_1^k - G_4 \times u_2^k = -\Delta t \left( G_4 \times G_4\, \phi_1^k - G_4 \times G_2\, \phi_2^k \right)    (5.46)

G_4 \times G_2\, \phi_2^k = \left[ (G_4^y G_2^z - G_4^z G_2^y)\,\mathbf{i} + (G_4^z G_2^x - G_4^x G_2^z)\,\mathbf{j} + (G_4^x G_2^y - G_4^y G_2^x)\,\mathbf{k} \right] \phi_2^k    (5.47)
In analytical operations, ∇ × ∇φ = 0 for every scalar field φ. This property is not automatically inherited by discrete operators. However, on a uniform grid with a periodic domain, the property is inherited if the same operator is used for both computations. For example, G_4 × G_4 φ is equal to zero because the order in which the operators are applied is immaterial. Therefore, the first cross product vanishes under these conditions.
The second cross product, on the other hand, is not equal to zero, and the error is the commutation error between the second- and the fourth-order operators. Consider the i-component: here the second-order and the fourth-order approximations of the first derivative are applied in different orders. Since these approximations are not exact, some information is lost when they are applied, and two different approximations applied in different orders give different results. It can be shown by Fourier analysis, using a procedure similar to that in section 5.3.1, that this cross product is equal to zero if the two wavenumbers are equal and otherwise converges to zero at a second-order rate. The error of the approximate projection method therefore lies in the conservation of the vorticity.
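The second-order decay of this commutation error can be checked directly from the modified wavenumbers of the two schemes. The sketch below uses the standard explicit second- and fourth-order central differences as stand-ins for the G_2 and G_4 operators of the text; the actual operators in this work are staggered and compact, so the constants differ, but the mechanism is the same: the cross term vanishes for equal wavenumbers and otherwise decays as O(h^2).

```python
import math

def ktilde2(k, h):
    # modified wavenumber of the explicit second-order central difference
    return math.sin(k * h) / h

def ktilde4(k, h):
    # modified wavenumber of the explicit fourth-order central difference
    return (8.0 * math.sin(k * h) - math.sin(2.0 * k * h)) / (6.0 * h)

def commutation_error(ky, kz, h):
    # Fourier symbol of the i-component G4y G2z - G2y G4z acting on exp(i(ky*y + kz*z))
    return ktilde4(ky, h) * ktilde2(kz, h) - ktilde2(ky, h) * ktilde4(kz, h)

# equal wavenumbers: the cross term cancels exactly
print(commutation_error(1.5, 1.5, 0.1))   # 0.0

# unequal wavenumbers: halving h reduces the error by ~4, i.e. second order
e1 = commutation_error(1.0, 2.0, 0.10)
e2 = commutation_error(1.0, 2.0, 0.05)
print(e1 / e2)   # close to 4
```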
Before proceeding to the results section, let us summarize the projection methods discussed so far. The fourth-order exact projection (P44), the consistent mixed-order projection (P42), and the approximate projection method (P42a) are summarized in Alg. 1, Alg. 2 and Alg. 3, respectively.
5.6 Results
In this section we evaluate the accuracy of the approximate projection in comparison with the mixed-order projection and the second-order projection methods, using the doubly periodic shear layer flow and the turbulent channel flow used earlier in chapter 3.
Table 5.1: Maximum error and convergence rate of the streamwise velocity at t = 1.2 sub-
jected to different projection methods. Extra digits are given to P44 and P42.
Figure 5.2: Convergence of the approximate projection method compared to the solution interpolated from the pseudo-spectral method.
The results in this test can easily mislead us into believing that the convergence rate of the proposed projection method (P42a) is fourth-order. In the analysis section we found that the global convergence rate, compared to P44, is in fact O(Δt Δx^2). It is possible that the errors of the projection method are so small that they are hidden under the local truncation errors of the other spatial approximations. In order to verify this, the errors in each time step are monitored. The solution of the full fourth-order scheme with the exact projection method at time t = 0.8 is used as the initial condition, and the solution is then integrated in time using a fixed CFL = 0.032 such that the error of the time integration is negligible. The mean streamwise velocity and the pressure from P42 and P42a are compared with P44, and the maximum errors are plotted in Fig. 5.3(a) and 5.3(b). The error in the streamwise velocity of the mixed-order projection (P42) accumulates linearly in time, while that of the approximate projection (P42a) remains relatively constant. This is explained by Fig. 5.3(b), which shows that the level of the pressure error is constant in P42. With the approximate projection, however, the error of the pressure is decreased by an approximately constant factor (one order of magnitude per time step in this case) for several time steps and then saturates at a level comparable to the error of the velocity. Therefore P42a does not lead to error accumulation in the velocity, unlike P42.
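The qualitative difference between linear accumulation and self-correction can be illustrated with a deliberately crude toy model; every rate below is an invented illustrative number, not a value from the analysis. The pressure error feeds the velocity error at every step; without self-correction the pressure error stays put, with self-correction it shrinks by roughly an order of magnitude per step until it saturates at the level of the velocity error.

```python
def velocity_error_history(n_steps, self_correcting):
    ep = 1e-3        # initial pressure error (arbitrary illustrative value)
    ev = 1e-6        # initial velocity error (arbitrary illustrative value)
    coupling = 1e-2  # fraction of the pressure error fed into the velocity per step
    history = []
    for _ in range(n_steps):
        ev += coupling * ep
        if self_correcting:
            # pressure error drops ~one order of magnitude per step,
            # saturating at the velocity-error level
            ep = max(ep / 10.0, ev)
        history.append(ev)
    return history

p42  = velocity_error_history(20, self_correcting=False)  # grows linearly
p42a = velocity_error_history(20, self_correcting=True)   # flattens out
print(p42[-1] / p42[9])    # ~2: the error doubles over the second half
print(p42a[-1] / p42a[9])  # ~1.1: nearly constant
```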
Figure 5.3: Error of the streamwise velocity (a) and pressure (b) of the approximate projection method (P42a) compared to the exact projection method (P44).
The convergence of the flow variables from P42 and P42a towards the exact projection method is plotted in Fig. 5.4(a). It shows clearly that the convergence rates of P42 and P42a are second- and third-order, respectively. The extra order of convergence of the approximate projection method comes from the first-order convergence in time shown in Fig. 5.4(b). These findings agree well with the analysis. The errors of both projection algorithms are smaller than those of the convective and diffusive terms, and thus comparable errors were observed in Tab. 5.1 up to N = 192. At the finest resolution, the error of P42 becomes larger than the error of the approximations of the convective and diffusive terms.
Figure 5.4: (a) Convergence of the flow variables under spatial grid refinement at fixed CFL of the mixed-order projection method (Alg. 2) and the approximate projection method (Alg. 3) towards the exact projection method (Alg. 1). (b) Convergence of the pressure under time-step refinement on the N = 64^2 grid.
The correlation between the velocity fields of two algorithms is measured by the cross-correlation C(t) = \overline{(u_1 - \bar{u}_1)(u_2 - \bar{u}_2)} / (\sigma_1 \sigma_2), where u_1 is the streamwise velocity from Alg. 1 (P44), the overbar denotes an averaging over a streamwise-spanwise plane, and σ_1 and σ_2 are the standard deviations of u_1 and u_2, respectively. The simulations are performed for 60 H/u_b with CFL = 0.51. The results are shown in Fig. 5.5(a) and Fig. 5.5(b) for two positions. Near the wall, the cross-correlation of the streamwise velocity from the approximate projection method is approximately one up to t = 20 H/u_b, while that of the mixed-order projection method drops already at t ≈ 10 H/u_b. The cross-correlation of P42 drops to approximately zero at t = 30 H/u_b, while that of P42a lasts twice as long. Near the center of the channel, both algorithms correlate better with the exact projection method. This is because, at the center of the channel, large-scale structures are dominant, and it thus takes a longer time before they are affected by the errors introduced in the projection step. These two figures show that the approximate projection is closer to the exact projection, since it produces a streamwise velocity field that has a higher and longer-lasting correlation with the one provided by the exact projection method. The one-dimensional spectra were investigated at the end of the simulation (not shown here); the spectra from the approximate projection method and the exact projection method are virtually identical, whereas the ones from the mixed-order projection method differ significantly.
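The cross-correlation diagnostic above can be sketched as follows; the function names are invented, and the real code samples u over a streamwise-spanwise plane at fixed z, whereas here a one-dimensional sample stands in for the plane.

```python
import math

def cross_correlation(u1, u2):
    """Normalized cross-correlation of two velocity samples on the same plane."""
    n = len(u1)
    m1 = sum(u1) / n
    m2 = sum(u2) / n
    f1 = [a - m1 for a in u1]   # fluctuations of field 1
    f2 = [b - m2 for b in u2]   # fluctuations of field 2
    cov = sum(a * b for a, b in zip(f1, f2)) / n
    s1 = math.sqrt(sum(a * a for a in f1) / n)
    s2 = math.sqrt(sum(b * b for b in f2) / n)
    return cov / (s1 * s2)

u_exact = [math.sin(0.37 * i) for i in range(256)]  # stand-in velocity sample
u_shift = [v + 5.0 for v in u_exact]                # a mean offset does not matter
print(cross_correlation(u_exact, u_shift))          # ~1: perfectly correlated
```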
Figure 5.5: Cross-correlation of the streamwise velocity near the wall (a) and near the center of the channel (b).
In Tab. 5.2, the bulk flow parameters are listed; the rightmost column shows the imposed parameters. Note that the flow conditions used here differ slightly from those used earlier. In chapter 3, the flow conditions were set to match the conditions in [MKM99]; here, they are set to the ones used in [KMM87]. This leads to small differences in the wall units and u_b. According to the previous analysis, one would expect P42 to arrive at a different time-averaged profile due to the second-order approximation of the pressure in the momentum equation. This is indeed the case: the bulk velocity from P42 is slightly lower than the ones obtained from P44 and P42a. Similar to what was seen earlier with the doubly periodic shear layer flow, the global parameters of the solutions from P44 and P42a are identical in the first three digits. The first- and second-order statistics of the
flow are shown in Fig. 5.6(a) and Fig. 5.6(b). The profile of the mean streamwise velocity of the approximate projection (P42a) passes through the circle symbols of the exact projection, while that of the mixed-order projection (P42) follows the bottom of the circle symbols from z+ = 30 onwards. A similar situation is seen in the r.m.s. of the streamwise velocity, but the other two r.m.s. profiles collapse. This level of difference was expected, since the maximum difference seen in the global flow parameters was only 1% (in the bulk flow velocity). A similar study on a 32^3 grid was also performed and led to the same conclusion; it is therefore not reported here. Comparing the second-order projection to P42 and P42a, the bulk flow velocity normalized by the shear stress velocity is significantly higher. In the mean streamwise velocity profile, the differences are insignificant. However, the level of the velocity fluctuations of P22 is notably lower than the others. It should be emphasized that at this resolution P22 accidentally collapses onto the profile of P44 (see Fig. 3.9(b)). The difference between these projection methods might be seen better in the mean streamwise profile on other grids.
Table 5.2: Global flow parameters of the turbulent channel flow under different projection
methods.
5.7 Conclusion
The mixed-order projection method (P42) is much better than the second-order projection (P22), which was shown to be unacceptable in chapter 3. Here, the solutions from P42 are hardly distinguishable from the ones obtained with the exact projection method when plotted together. Nevertheless, the approximate projection, which solves the same discrete Laplacian, gives even better results. Further, it was shown that the errors introduced by the approximate projection at each substep of the Runge-Kutta time integration are of the order O(Δx^4) when Δx ∝ Δt. Therefore it degrades neither the overall accuracy of the spatial approximations nor that of the time integration. It was evident that the mixed-order projection method (P42) reaches the point where the projection error becomes dominant, and its solution decorrelates from the exact projection method faster than that of the approximate projection method (P42a). Since the complexities of both projections are comparable, the approximate projection method is preferred.
Figure 5.6: Time-averaged streamwise velocity profile (a) and r.m.s. of the velocity fluctuations (b) with the approximate projection and mixed-order projection methods. Results of the second-order projection method (P22) from chapter 3 are shown for comparison.
6 Application to Massive-scaled
Simulations
Direct Numerical Simulation (DNS) is a useful tool in the investigation of turbulence. When the flow field is sufficiently resolved, DNS can be very accurate, and it provides great detail of the dynamics and the time history of the whole flow field. The quality of different simulations of the same flow can differ slightly due to the numerical methods, grid resolutions, and boundary conditions used. Among the wide range of DNS in the literature, turbulent channel flow is one of the most studied, because of the simplicity of the geometry and the well-defined boundary conditions. The forcing of the flow is usually done by assuming a constant pressure gradient, which is added as a body force to the streamwise momentum equation. These conditions allow the simulations to be repeated with great precision. A large number of publications have shown that the statistics of turbulent channel flow at Re_τ = 180 converge to those reported in [KMM87]. This code was subsequently used to simulate the flow at higher Reynolds numbers, up to Re_τ = 590 in 1999 [MKM99]. Later, the simulation of turbulent channel flow at Re_τ = 2003 [HJ08] was made public; this database represents the highest Reynolds number in the literature at the time of this work. That simulation was performed on the MareNostrum at the Barcelona Supercomputing Center. From the amount of CPU hours and the specification of the MareNostrum at that time, we can estimate that the simulation occupied half of the MareNostrum for about six months. Disregarding the cost of the computer itself and taking only the operating budget of the supercomputing center, this simulation cost roughly 1.5 million euro. If the performance of the code had been 50% poorer, the run would have taken a year and the cost would have doubled. This emphasizes that a code solving the Navier-Stokes equations must be parallelised with excellent scalability if one aims at attacking high Reynolds number flows.
It is relatively straightforward to parallelise the compact scheme on a shared-memory computer, because the approximation problem is one-dimensional. For example, if the total number of grid cells is Nx × Ny × Nz, the deconvolution in the x-direction consists of Ny × Nz independent systems. Processors can share this workload, and the implementation with OpenMP directives is simple. However, this shared-memory paradigm is not as scalable as the distributed one. We have already addressed how to solve tridiagonal systems in a distributed-memory setting.
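The shared-memory strategy can be sketched as follows: each of the Ny·Nz lines is an independent tridiagonal solve (the Thomas algorithm), so the outer loop over lines is embarrassingly parallel; in the Fortran or C production code this outer loop would carry the OpenMP directive. This is an illustrative sketch, not MGLET code.

```python
def thomas(a, b, c, d):
    """Solve one tridiagonal system; a[i], b[i], c[i] are the sub-, main and
    super-diagonal entries of row i (a[0] and c[-1] are unused), d is the rhs."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):               # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_lines(a, b, c, rhs_lines):
    # rhs_lines holds the Ny*Nz independent right-hand sides; this loop is
    # the one that would be distributed over threads with an OpenMP directive
    return [thomas(a, b, c, d) for d in rhs_lines]

# one system with known solution x = [1, 1, 1]
x = thomas([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [3.0, 4.0, 3.0])
print(x)   # ~[1.0, 1.0, 1.0]
```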
Table 6.1: Numerical grids used in this study and the grid spacings at the wall and the center of the channel in terms of the channel half-width (H).
first approximation opts for performance by trading off some negligible errors. Otherwise, the calculation of the convective and diffusive terms would be four times more expensive if the direct method from ScaLAPACK were used. The approximate projection method offers a convenient implementation for existing second-order codes: the three ghost cells required by the fourth-order projection method are avoided, and the cost of the projection step is reduced significantly. In what follows, the parallelised version is compared to the sequential one in terms of accuracy and performance. The results reported in the table below were obtained under identical conditions, which allows a direct comparison.
    Grid               Time (DIRECT)   Time (SIP)   Divergence (DIRECT)   Divergence (SIP)
    32 x 32 x 32           0.16           0.24           5.76E-15             9.93E-6
    64 x 64 x 64           2.77           3.21           2.56E-14             8.37E-5
    96 x 80 x 96           6.79           6.60           1.56E-14             2.17E-5
    128 x 128 x 128       22.86          18.84           2.25E-14             4.47E-5

Table 6.2: CPU-seconds per time integration on an AMD Opteron 8216 of the sequential fourth-order scheme with direct and iterative solvers for the pressure.
The most notable deviation can be seen in the flatness factor F(w) at the first cell next to the wall. Here, the sequential version predicts a smaller flatness than the spectral scheme in [MKM99], while the parallelised version predicts a higher value (≈ 47), a comparable deviation from the spectral scheme. This difference between the two versions is only seen at the first cell of the wall-normal velocity. It should not be an effect of the interface splitting algorithm: the error of the interface splitting algorithm is set to be five orders of magnitude smaller than the maximum velocity (the approximate bandwidth J was set to 9), and the velocity there is much smaller than the maximum velocity. Therefore, the difference in the first cell should come from the treatment of the pressure term. Recall that we use the third-order approximation of the pressure term in the momentum equation for both versions; the only differences are the velocity correction and the discrete Laplacian. Because the velocity close to the wall has a steep gradient and the level of fluctuation changes rapidly, the difference between the exact projection and the approximate projection becomes notable in the fourth-order statistics close to the wall.
Figure 6.1: Comparison of statistics of the solutions from the parallel and sequential versions of the fourth-order scheme on 128^3 grid cells: (a) mean streamwise velocity, (b) r.m.s. of the velocity fluctuations, (c) skewness factor, (d) flatness factor.
Another factor that can affect the interpretation of the statistics is the type of averaged values being sampled. On staggered grids, one can store the statistics at their original positions or average them to the pressure cells. MGLET samples the second-order statistics at the pressure cells by simply averaging the two cells sharing the same pressure cell. The third- and fourth-order statistics, however, are stored at their original positions. The averaging procedure, of course, introduces a filter to the velocity field, and the information on the small scales thus becomes less accurate. This implies that even higher-order approximations would be needed to process the statistics.
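The filtering effect of this two-cell average can be quantified directly: averaging two adjacent samples multiplies a Fourier mode of wavenumber k by cos(kΔ/2), so the shortest resolved scales are damped most. A small sketch, with grid size and wavenumber chosen arbitrarily:

```python
import math

N = 64                 # number of cells on a periodic line
k = 16                 # wavenumber of the test mode (k < N/2)
h = 2.0 * math.pi / N  # grid spacing

u   = [math.sin(k * i * h) for i in range(N)]            # pointwise samples
avg = [0.5 * (u[i] + u[(i + 1) % N]) for i in range(N)]  # two-cell average

rms = lambda v: math.sqrt(sum(x * x for x in v) / len(v))
damping = rms(avg) / rms(u)
print(damping, math.cos(0.5 * k * h))   # both ~0.707: the mode loses ~30% amplitude
```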
The effect of using lower-order approximations for the statistics can be highlighted with the Reynolds shear stress profiles shown in Fig. 6.2. When the cell-averaged values are used to compute < u'w' >, u and w are averaged to the pressure cell, where < uu >, < ww > and < uw > are computed and stored. The profiles of the cell-averaged < u'w' > close to the wall of both versions deviate significantly from the theoretical prediction of the Reynolds shear stress. Post-processing by deconvolving the stored cell-averaged < uw > (not shown) reduces the deviation by half. However, when < u'w' > is taken directly from the surface average at the top of the u-momentum cell, the profile follows the theoretical prediction perfectly. Another effect that one must take into account when interpreting the statistics
Figure 6.2: Effect of the sampling procedure on the Reynolds shear stress in turbulent channel flow.
statistics from different processors. In Fig. 6.3, the cell-averaged and the surface-averaged values (with nonlinear correction) of the r.m.s. of the velocity fluctuations are compared. The contribution of the nonlinear correction is actually small: when the solution is computed with the full fourth-order scheme but the surface-averaged < u_i u_i > is sampled without the nonlinear correction, the resulting r.m.s. profiles lie between the two profiles plotted in Fig. 6.3. This is not unexpected, because the magnitude of the nonlinear correction is second-order and < u_i u_i > is already a second-order quantity, so only small differences in the profile can be expected. Nevertheless, this correction is necessary for a fine comparison with the spectral scheme; if it is neglected, the level of the r.m.s. is notably lower than the spectral solution. In practice, one can deconvolve the cell-averaged values to pointwise ones, but this would involve a triple deconvolution, and a very high-order approximation would be needed to avoid damping the high-wavenumber components. At the moment, there is no infrastructure to sample face-averaged or pointwise values in the parallel version of the code, and the results in the rest of this chapter are therefore reported using the cell-averaged values. This should be noted for future improvement.
Figure 6.3: Effect of the nonlinear correction on the Reynolds normal stresses in turbulent channel flow: solid line: cell-averaged values; dashed line: surface-averaged values with nonlinear correction. The top pair is < u'u' >, the middle pair is < v'v' > and the bottom pair is < w'w' >.
The probability density functions (PDF) are investigated in Fig. 6.4. At first it was difficult to compare the fourth-order results with the reference [MKM99]. This is due to the normalisation, which can introduce a different scaling when the cumulative density functions are computed with different intervals; this results in a different weighting, and the PDF curve can be scaled up or down. In Fig. 6.4(a), the PDF of u at z+ = 5 is shown; the number of intervals is set to 201, similar to the reference. Fig. 6.4(b) shows a similar plot at the center of the channel, where the number of intervals is set to 160. In both figures, all the distributions are comparable. The distribution of the fourth-order scheme at the center of the channel is slightly broader than that of the spectral scheme. Nonetheless, there are no notable differences between the parallel and the sequential versions. According to these results, we can conclude that the parallelisation of the compact fourth-order scheme is implemented correctly and that the parallel algorithms used here have no negative effects on the quality of the solution.
to show that larger structures (low wavenumbers) are created when the Reynolds number is increased. When the computational box is not large enough to allow these large-scale structures to form, the statistics and turbulence mechanisms may not represent the correct physics. If they contain enough energy, numerical simulations of turbulent channel flow in different domains (with the same grid resolutions) may deliver different results. However, it is not yet known at what Reynolds number these large-scale structures become significant and how much they affect the flow statistics. To remove this uncertainty, three DNS of turbulent channel flow up to Re_τ = 590 are performed using the same computational box and comparable resolutions as those used in [MKM99], which will be used as the reference. The computational domain, grid resolutions, nominal flow conditions used to force the flows, and the resulting Reynolds numbers are listed in Tab. 6.4.
In the next step, we extend the study to a higher Reynolds number, the turbulent channel flow at Re_τ = 950. In order to avoid excessive use of resources, we scale down the computational domain and perform a simulation using the recommended grid resolutions; it will be explained later that this domain is sufficiently large for this simulation. All of the simulations in this section are run on the ALTIX 4700, with the number of processors spanning from 8 to 128.
Table 6.3: Recommended grid resolutions for DNS and LES from [WHS07].
Table 6.4: Computational domain, grid resolutions and flow conditions of the simulated cases.

    Case      Nominal Re_tau  Effective Re_tau  Dx+    Dy+   Dz+_wall  Dz+_center  t.u_b/L_x
    T180-G1   180.0           179.7             17.7   5.90  0.72      4.4         100
    T395-G2   392.2           391.9             9.61   4.81  0.86      6.5         41.1
    T590-G3   587.2           575.1             9.45   4.70  0.63      9.3         17.3
    T950-G4   943.1           943.6             12.41  6.21  0.78      9.7         21.0
streamwise velocity are slightly lower than predicted by the spectral scheme. This difference is due to the averaging procedure mentioned earlier, which explicitly filters the field and removes the small scales from the statistics. This shortcoming can be remedied by deconvolving the field to pointwise values before sampling, by deconvolving the field to the surfaces, or by adding the nonlinear correction such that u_i u_i represents the true Reynolds normal stresses in the momentum equation, as was done earlier in chapter 3.
The profiles of the skewness factor of the wall-normal velocity at the higher Reynolds numbers (Fig. 6.6) agree with the spectral scheme better than at the lowest Reynolds number (Re_τ = 180). The flatness factors are also correctly predicted. At the center of the channel, the spectral code predicts [F(u), F(v), F(w)] = [3.58, 3.73, 3.92] and the fourth-order scheme predicts [3.57, 3.40, 3.76].
The one-dimensional energy spectra are investigated in Fig. 6.7, which shows some interesting results. Near the wall, E_uu of the fourth-order scheme in T180-G1 drops near the end of the spectrum, while at the other Reynolds numbers it follows the profile of the spectrum almost to the Nyquist limit. On the other hand, near the wall the spectrum of T180-G1 follows that of the spectral scheme up to 75% of the resolvable wavenumbers, whereas at the higher Reynolds numbers the profiles follow the predictions of the spectral scheme up to only 60%. This opposite trend can be attributed to the grid spacings in the streamwise and wall-normal directions, as well as to the profile of the energy spectrum itself. The grid spacing in the streamwise direction of T180-G1 is approximately twice as large as those used in the other two cases; therefore, near the wall it is not as accurate as the other cases. At the center of the channel, the grid spacing of T180-G1 in the wall-normal direction is two-thirds of the others, and the slope of the reference spectrum is steeper, which means the small-scale structures are less significant than in the other two cases. These two factors lead to a better prediction of the energy cascade and, in turn, an accurate energy spectrum there.
Figure 6.4: Comparison of the probability density functions of the streamwise velocity between the parallel and sequential versions of the fourth-order scheme at two positions: (a) close to the wall at z+ = 5 and (b) at the center of the channel.
Figure 6.5: Mean streamwise velocity profile (left) and r.m.s. of the velocity fluctuations (right) of turbulent channel flow on domains and grids comparable with those used by the spectral code [MKM99]. Top: Re_τ = 180; middle: Re_τ = 395; bottom: Re_τ = 590. Plus symbols: u-component; square symbols: v-component; triangle symbols: w-component.
Figure 6.6: Skewness factor (left) and flatness factor (right) of turbulent channel flow. Top: Re_τ = 180; middle: Re_τ = 395; bottom: Re_τ = 590.
Figure 6.7: One-dimensional spectra of the streamwise velocity at z+ = 5 (left) and at the center of the channel (right) of turbulent channel flow. Top: Re_τ = 180; middle: Re_τ = 395; bottom: Re_τ = 590.
Figure 6.8: Mean streamwise velocity (a) and r.m.s. of the velocity fluctuations (b) of T950-G4 (every second grid cell shown).
Table 6.5: Effective flow conditions and grid resolutions of the coarse-grid cases.

    Case      Effective Re_tau  Dx+    Dy+    Dz+_wall  Dz+_center  t.u_b/L_x
    T395-G1   390.3             38.33  12.81  1.56      9.63        35
    T590-G2   579.5             14.22  9.47   1.27      9.58        68
    T950-G3   965.9             15.96  7.98   2.13      15.97       59
to the center of the channel. On the other hand, the profile of the r.m.s. of the velocity fluctuations of T950-G3 is very good close to the peak, but the fluctuations are slightly over-predicted at the center. This indicates that the grid spacing near the channel center is relatively coarse; the wall-normal grid spacing of this case is the coarsest among the cases. According to the grid resolutions used in this study, it may be possible to predict the first- and second-order statistics of turbulent channel flow accurately using grid spacings twice as large as the recommended values in Tab. 6.3.
In this section, the performance of the parallel compact scheme is evaluated on two types of parallel machines: a low-cost workstation and a supercomputer. The workstation used in this test has four dual-core AMD 8216 processors. The supercomputer on which the large-scale benchmark is performed is the ALTIX 4700 located at the Leibniz-Rechenzentrum.
The results from the workstation are presented in Tab. 6.6, which documents the parallelisation of the T180-G1 case. The domain is sliced first in the streamwise direction, then in the spanwise direction, repeating in this order each time the number of processors is doubled. According to the table, there is a significant overhead in going from one processor to two; this overhead is as high as one-third of the total work. It is attributed to three factors: the communications, the domain decomposition of the SIP, and the interface splitting algorithm. All simulations in this section use 16 iterations of SIP. The absolute efficiency, defined by E_a = 100 T_1 / (p T_p) relative to the time used on a single processor (T_1), reflects all accumulated overhead of the parallelisation and is thus a pessimistic indicator of scalability. A more practical indicator, named here the incremental efficiency, measures the performance gain when the number of processors is increased. It is defined by E_i = 100 T_{p1} / ((p_2/p_1) T_{p2}) for p_2 > p_1. This indicator states clearly how effectively the added processors are being used, relative to the simulation on p_1 processors.
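The two indicators can be written compactly as follows; the check reproduces entries of Tab. 6.6 (function names are ours).

```python
def absolute_efficiency(t1, tp, p):
    # E_a = 100 * T_1 / (p * T_p): achieved speedup relative to the ideal p-fold speedup
    return 100.0 * t1 / (p * tp)

def incremental_efficiency(tp1, p1, tp2, p2):
    # E_i = 100 * T_p1 / ((p2/p1) * T_p2): gain attributable to the processors added since p1
    return 100.0 * tp1 / ((p2 / p1) * tp2)

# values from Tab. 6.6 (T180-G1 on the workstation)
print(round(absolute_efficiency(12.47, 8.47, 2)))       # 74
print(round(incremental_efficiency(5.65, 4, 3.54, 8)))  # 80
```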
In this table, Ei is measured between two consecutive numbers of processors. On
the workstation, roughly 75% of the added processors help in speeding up the solution
process. The performance on the ALTIX 4700 in Tab. 6.7 shows a similar picture. The
performance on a single processor is slightly lower than that on the workstation, and the
values of the absolute efficiency are also lower. For the incremental efficiency, the ALTIX 4700
shows an interesting behaviour. At lower numbers of processors, the incremental efficiencies
are lower than those on the workstation, but on eight and sixteen processors the values
are higher. This means that the overhead on the ALTIX 4700 increases sharply up to four
processors and then becomes almost saturated. The incremental efficiency of 92% is evidence
of this saturation. Three factors contribute to this behaviour. First, when we
increase the number of processors from one to two, we introduce communication overhead in
the streamwise direction. Increasing the number of processors from two to four adds overhead
again in the spanwise direction. After this, no extra direction is added. The additional
overhead at eight and sixteen processors is therefore not as high, which contributes to a
higher incremental efficiency. This behaviour is seen on the cluster as well. The second
contributor is the architectural design. The ALTIX 4700 is composed of building blocks which
are blade systems consisting of two dual-core Intel Itanium2 Montecito processors per blade.
These four cores share the same 8.5 GB/s memory bus. Each blade is linked to the other blades
via a 6.4 GB/s NUMAlink4 interconnect. The peak performance of a blade is 25.6 GFlops, so
204.8 GB/s would have to be fed to the blade for the processors to achieve their full potential.
Unfortunately, the bus provides only 4% of this value, which results in a sharp drop of
performance over the first four processors. This effect is clearly visible in the speedup value
of 1.76 on four processors, compared to 2.21 on the workstation.
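The bandwidth argument can be checked with one line of arithmetic, assuming the common
rough estimate of 8 bytes of memory traffic per double-precision flop:

```python
peak_gflops = 25.6     # peak performance of one blade (four cores), GFlops
bytes_per_flop = 8.0   # rough upper bound: one double-precision operand per flop

required_bw = peak_gflops * bytes_per_flop  # GB/s needed to keep the blade busy
available_bw = 8.5                          # GB/s shared memory bus of the blade

print(required_bw)                                  # 204.8
print(round(100 * available_bw / required_bw, 1))   # 4.2
```

The available bus bandwidth is thus roughly 4% of what the peak rate would demand,
consistent with the performance drop described above.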
  p   CPU-seconds   Speedup   Ea (%)   Ei (%)
  1      12.47         -        -        -
  2       8.47        1.47     74       74
  4       5.65        2.21     55       75
  8       3.54        3.52     44       80

Table 6.6: Scalability of the parallel compact scheme for the T180-G1 case on a commodity
workstation.
  p   CPU-seconds   Speedup   Ea (%)   Ei (%)
  1      13.61         -        -        -
  2       9.87        1.38     69       69
  4       7.75        1.76     43       64
  8       4.21        3.23     40       92
 16       2.38        5.72     35       88

Table 6.7: CPU-time per time step of T180-G1 and the scalability of the fourth-order
scheme on the ALTIX 4700.
The comparison of the performance of the second-order and the fourth-order schemes
on the ALTIX 4700, using the turbulent channel flow case T395-G2, is shown in Tab. 6.8. On
four and eight processors, the fourth-order scheme is slower than the second-order one by a
factor of 2.7. This factor is reduced to 2.1 on thirty-two processors. Both schemes achieve
almost ideal scalability relative to the performance on four processors. In this range, the
incremental efficiency is free from the memory bottleneck mentioned earlier. The amount of
memory per variable on four and eight processors is 17.9 MB and 8.96 MB, respectively. Thus,
on eight processors a single variable fits perfectly into the processor cache. This better data
locality compensates for the increased communication and results in excellent scalability of
both schemes. The most communication-intensive part of the code is the solution of the
pressure, where the SIP is used. There, the subroutine must access two variables, the pressure
and the residual. On sixteen processors, both can be stored entirely in the cache, which gives
a super-linear relative speedup, shown by the 101% incremental efficiency.
Another reason why the fourth-order scheme is less sensitive to the communication
is the communication pattern in the SIP. The original second-order code first solves the
L-system and then synchronises the residual; it then solves the U-system and synchronises
again. The modified version for the fourth-order scheme synchronises only once, after both
systems are solved. This halves the number of communications. Even though the second-
order scheme needs to synchronise just one ghost cell, compared to two ghost cells in the
fourth-order scheme, the lower number of communications helps to avoid the idle time in
which processors must wait for mismatched send-receive pairs to complete.
According to the results from chapter 2, the second-order scheme needs twice as many grid
points per direction to reach the same accuracy as the fourth-order scheme. Taking this
resolution requirement and the time integration into account, we can expect the second-order
code to be 6 to 8 times more expensive than the fourth-order code. Compared to the factor
of 10 found earlier in chapter 2, the advantage of the fourth-order scheme is thus reduced
slightly in the parallel version. This can be attributed to the overhead of the interface-
splitting algorithm and the SIP solver. Nevertheless, the parallel version of the fourth-order
scheme is still much more efficient than the second-order scheme.
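The factor of 6 to 8 can be reconstructed from the numbers in this chapter: twice the
resolution per direction gives 2^3 = 8 times the cells, the CFL condition then roughly halves
the admissible time step, and Tab. 6.8 puts the per-step cost of the fourth-order scheme at
2.1 to 2.7 times that of the second-order scheme. A short sketch of this arithmetic:

```python
# Second-order scheme at equal accuracy: 2x resolution per direction
# (2^3 = 8x cells) and, by the CFL condition, roughly half the time step.
work_ratio = 2**3 * 2   # 16 times the work for the second-order scheme

# Per time step the fourth-order scheme costs 2.1-2.7x more (Tab. 6.8).
advantage_worst = work_ratio / 2.7   # ~5.9
advantage_best = work_ratio / 2.1    # ~7.6

print(round(advantage_worst), round(advantage_best))  # 6 8
```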
Table 6.8: Comparison of the CPU-time per time integration and the scalability of the
fourth- and the second-order schemes for T395-G2 on the ALTIX 4700.
6.6 Conclusion
The results shown in this chapter highlight the desirable properties of the parallelised com-
pact fourth-order scheme. It is highly accurate and matches the spectral scheme closely.
It recreates the one-dimensional energy spectra up to 99.9% of the fluctuation energy using
the same grid as the spectral scheme. If one is satisfied with grid-independent solutions
of the first- and second-order statistics, a grid twice as coarse as the usually recommended
values can also be used. This is made possible by the fact that the small scales do not
contribute much to the large-scale structures.
In general, it is relatively difficult to obtain good scalability in a fixed-size speedup test.
Here we obtain approximately linear scalability, a desirable property owed to the linear
complexity of the algorithms used. In every algorithm presented in this work, the
complexity does not depend on the number of processors. Therefore the proposed scheme
should scale to any number of processors, provided that the interconnection fabric of
the parallel machine is scalable.
Figure 6.10: Mean streamwise velocity profile (left) and r.m.s. of velocity fluctuations (right)
of turbulent channel flow on coarse grids. Top: Re = 395; middle: Re = 590;
bottom: Re = 950. Plus symbols: u-component; square symbols: v-component;
triangle symbols: w-component.
Figure 6.11: Mean streamwise velocity (left) and velocity fluctuations (right) normalised by
the nominal parameters of Re = 180 (top) and 950 (bottom) on grids twice as
coarse as the recommended resolutions.
Figure 6.12: One-dimensional spectra Euu of the streamwise velocity for Re = 180 (a) and
950 (b) on very coarse grids.
7 Conclusion and outlook
In the course of this work, a compact fourth-order scheme for the numerical solution of the
Navier-Stokes equations (NSE), tailored specifically to staggered grids, has been developed.
The scheme is efficiently parallelised and can conveniently be applied to any existing second-
order code having two ghost cells. This development is a complete fourth-order scheme for
finite-volume discretisation on staggered grids which ensures perfect mass conservation on
every control volume. Compared to earlier work on higher-order methods for the Navier-
Stokes equations, we have considered many aspects of the numerical scheme, such as accuracy,
efficiency and scalability, including compatibility with existing CFD codes. The outcome of
this work is a highly efficient algorithm for solving the Navier-Stokes equations which
outperforms standard second-order schemes in both accuracy and performance. Careful
evaluations have been carried out for laminar and turbulent flows. The approximation of the
momentum term is the least accurate among the spatial approximations used in the NSE.
The role and importance of the pressure term have been investigated. Improving all other
approximations to fourth-order while keeping that of the pressure term at second-order
prevents the convergence rate from reaching fourth-order. We also found evidence suggesting
that the enforcement of continuity may be more accurate on staggered grids than on
collocated ones. When the approximation of the pressure gradient and divergence is kept at
second-order, an overall convergence rate of third-order can be achieved on staggered grids,
while it is limited to second-order on collocated grids. This could be the reason why previous
developments of higher-order schemes on collocated grids delivered disappointing results for
turbulent flows, despite the fact that higher-order schemes were shown to be much more
accurate than second-order schemes for laminar flows.
The tridiagonal matrices of the compact scheme have been parallelised efficiently by the
interface-splitting algorithm. This algorithm is an approximate method, but the accuracy
of the approximation is predetermined by the cut-off threshold of the coefficient vector.
The overhead of the algorithm is small when the size of the subsystems is properly chosen.
The proposed algorithm requires minimal communication and the least number of floating-
point operations. The algorithm is designed such that the additional computation and
the bidirectional communication can be overlapped. It is shown to be at least four times
faster than the ScaLAPACK library. Under optimal conditions, ideal absolute speedup can
be obtained.
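The cut-off argument can be illustrated with a small experiment, not tied to the actual
implementation: for the diagonally dominant tridiagonal systems of the compact scheme
(stencil 1/4, 1, 1/4), the entries of the inverse decay exponentially away from the diagonal,
so the influence of an interface value on distant unknowns drops below round-off after a
modest number of points. A minimal sketch using the standard Thomas algorithm:

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-, main- and super-diagonals a, b, c."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

n = 64
a = [0.25] * n; b = [1.0] * n; c = [0.25] * n
d = [0.0] * n; d[0] = 1.0   # unit disturbance at a subdomain interface
x = thomas(a, b, c, d)

# The response decays by a fixed factor per point; some 30 points away it is
# far below double precision, so the coefficient vector can safely be cut off.
print(abs(x[30]) < 1e-12)   # True
```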
The developed NSE solver uses a novel divergence-free interpolation for the convective
velocities. This interpolation ensures the divergence-free property of the convective velocities
on all control volumes inside the computational domain. This property is a necessary
condition for Galilean invariance of the numerical solution of the NSE and can be very
important in numerical simulations of turbulent flows on coarse grids. A divergence-free
approximate projection method was developed to enforce mass conservation using a narrow-
banded matrix. This method has very good data locality and does not need to recompute
the divergence during the solution process. The proposed approximate projection method
shows an excellent correlation with the fourth-order projection method and is fully
divergence-free with a third-order global convergence rate.
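The value of discrete (rather than merely high-order) divergence freedom is easy to
demonstrate in a generic setting, unrelated to the specific fourth-order interpolation of this
work: on a staggered grid, face velocities derived from a discrete streamfunction satisfy the
discrete continuity equation on every control volume to machine precision. A sketch in two
dimensions:

```python
import math

nx, ny, dx, dy = 16, 16, 0.1, 0.1

# Discrete streamfunction at cell corners (any smooth field will do).
psi = [[math.sin(i * dx) * math.cos(j * dy) for j in range(ny + 1)]
       for i in range(nx + 1)]

# Staggered face velocities: u on x-faces, v on y-faces.
u = [[(psi[i][j + 1] - psi[i][j]) / dy for j in range(ny)]
     for i in range(nx + 1)]
v = [[-(psi[i + 1][j] - psi[i][j]) / dx for j in range(ny + 1)]
     for i in range(nx)]

# The second-order discrete divergence on each control volume cancels exactly.
max_div = max(abs((u[i + 1][j] - u[i][j]) / dx + (v[i][j + 1] - v[i][j]) / dy)
              for i in range(nx) for j in range(ny))
print(max_div < 1e-12)   # True
```

The telescoping of the streamfunction differences makes the divergence vanish identically,
independent of the resolution.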
These pieces, put together, create a highly efficient and scalable code. To the best of the
author's knowledge, this scheme is the only non-spectral scheme that can produce collapsing
profiles of the first- and second-order statistics of turbulent channel flows up to Re = 950
using approximately one-third of the total number of grid points used by the spectral code.
This reduction in the number of cells is permitted by the fact that the small-scale structures
contribute very little to the large-scale structures. Ultimately, we expect that the fourth-
order scheme can use a grid resolution twice as coarse as the usually recommended values
for turbulent channel flow and yet deliver accurate results for the first- and second-order
statistics.
The current code can easily be used to simulate classic turbulent flows such as the flow
over a wall-mounted cube, backward/forward-facing steps, flow over a rectangular cavity,
mixing layers, etc. Owing to the excellent scalability of the parallelised version of the
proposed scheme, direct numerical simulations of these classic test cases can be performed
at higher Reynolds numbers and expand our understanding of turbulent flows.
In order for the proposed scheme to be applicable to complex geometries, further
developments are needed. These could take the form of a higher-order immersed boundary
method, or of conformal matching of Cartesian and curvilinear grids, or of Cartesian and
unstructured grids. This topic could be a very interesting line of research. The fourth-order
scheme could also be beneficial for large-eddy simulations (LES). We have seen clearly that
the one-dimensional energy spectra of the fourth-order scheme follow those of the spectral
scheme 50% longer than those of the second-order scheme. The local truncation errors are
now small. For the well-known turbulent channel flow at Re = 180, the DNS of the fourth-
order scheme using 32^3 grid cells delivers a 5% error in the mean flow profile. Which
sub-grid scale model will improve the solution? How should the modelling parameters be
adjusted, in comparison to the second-order scheme? These questions must be answered in
order to enable the fourth-order scheme for industrial applications.