Pressure Solver
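$$ \frac{\partial (\rho u_i)}{\partial t} + \frac{\partial (\rho u_j u_i)}{\partial x_j} = -\frac{\partial p}{\partial x_i} + \frac{\partial \tau_{ij}}{\partial x_j} \tag{2} $$
$$ \frac{\partial (\rho H)}{\partial t} + \frac{\partial (\rho u_j H)}{\partial x_j} = -\frac{\partial q_j}{\partial x_j} + \frac{\partial (u_i \tau_{ij})}{\partial x_j} + \frac{\partial p}{\partial t} \tag{3} $$
$$ \frac{\partial (\rho Y_i)}{\partial t} + \frac{\partial (\rho u_j Y_i)}{\partial x_j} = \frac{\partial}{\partial x_j}\left( \rho D \frac{\partial Y_i}{\partial x_j} \right) + \dot{\omega}_i, \qquad i = 1, \ldots, NS \tag{4} $$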
where $\rho$ is the density, $u_j$ is the velocity vector, $p$ is the pressure, $Y_i$ is the mass fraction of species $i$ (out of a total of $NS$ species), $D$ is the mass diffusivity, $\tau_{ij}$ is the viscous stress tensor and $q_j$ is the heat flux vector; $H$ is the total (or stagnation) enthalpy, given by
$$ H = h + \frac{1}{2} u_i u_i \tag{5} $$
with $h$ being the specific enthalpy, which is related to the temperature ($T$) as
$$ h = \sum_{i=1}^{NS} Y_i \left( h_i^0 + \int_{T_{ref}}^{T} C_{p_i}\, dT \right) \tag{6} $$
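where $h_i^0$ is the heat of formation of species $i$ at the reference temperature $T_{ref}$ and $C_{p_i}$ is the specific heat (for constant-pressure processes) of the $i$th species. An equation of state is required to relate the density to the thermodynamic variables; for an ideal gas, we use
$$ p = \rho R T \tag{7} $$
where $R$ is the gas constant.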
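As a small numerical illustration of Eqs. (5) and (7) and the enthalpy relation, the C++ sketch below evaluates the ideal-gas density and the mixture and total enthalpies. It assumes a constant specific heat per species, so the temperature integral reduces to $C_{p_i}(T - T_{ref})$. The `Species` struct and the function names are hypothetical; they are not taken from STREAM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-species data; a constant Cp is an illustrative assumption.
struct Species {
    double h0;  // heat of formation at the reference temperature [J/kg]
    double cp;  // constant-pressure specific heat [J/(kg K)]
};

// Ideal-gas equation of state, Eq. (7): p = rho R T  ->  rho = p / (R T).
double idealGasDensity(double p, double R, double T) {
    assert(R > 0.0 && T > 0.0);
    return p / (R * T);
}

// Mixture static enthalpy: h = sum_i Y_i (h0_i + cp_i (T - Tref)).
double mixtureEnthalpy(const std::vector<Species>& sp,
                       const std::vector<double>& Y, double T, double Tref) {
    assert(sp.size() == Y.size());
    double h = 0.0;
    for (std::size_t i = 0; i < sp.size(); ++i)
        h += Y[i] * (sp[i].h0 + sp[i].cp * (T - Tref));
    return h;
}

// Total (stagnation) enthalpy, Eq. (5): H = h + u_i u_i / 2.
double totalEnthalpy(double h, double u, double v, double w) {
    return h + 0.5 * (u * u + v * v + w * w);
}
```

For air at 101325 Pa and 300 K with R = 287 J/(kg K), `idealGasDensity` gives about 1.177 kg/m³. A real code would replace the constant-Cp assumption with temperature-dependent fits.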
The constitutive relation between stress and strain rate for a Newtonian fluid is used to relate
the components of the stress tensor to velocity gradients:
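$$ \tau_{ij} = (\mu + \mu_t)\left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} - \frac{2}{3}\delta_{ij}\frac{\partial u_l}{\partial x_l} \right) - \frac{2}{3}\rho k\, \delta_{ij} \tag{8} $$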
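where $\mu$ is the molecular viscosity and $\mu_t$ is the turbulent (eddy) viscosity, to be defined later. The heat flux vector is obtained from Fourier's law:
$$ q_j = -\left( \frac{\mu}{Pr} + \frac{\mu_t}{Pr_t} \right) \frac{\partial h}{\partial x_j} \tag{9} $$
In the above, $\delta_{ij}$ is the Kronecker delta and $\kappa$ is the thermal conductivity; $Pr_t$ is the turbulent Prandtl number and $Pr$ is the laminar Prandtl number, defined as
$$ Pr = \frac{\mu C_p}{\kappa} \tag{10} $$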
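The Newtonian constitutive relation can be checked numerically. The sketch below assumes the standard eddy-viscosity form $\tau_{ij} = (\mu+\mu_t)(\partial u_i/\partial x_j + \partial u_j/\partial x_i - \tfrac{2}{3}\delta_{ij}\,\partial u_l/\partial x_l) - \tfrac{2}{3}\rho k\,\delta_{ij}$. It takes the velocity-gradient tensor `g[i][j]` $= \partial u_i/\partial x_j$ as input; this layout is an illustrative choice, not STREAM's data structure. A convenient consistency check is that the trace of $\tau$ equals $-2\rho k$, because the bracketed deviatoric part is traceless.

```cpp
#include <array>
#include <cassert>

using Tensor3 = std::array<std::array<double, 3>, 3>;

// tau_ij = (mu + mut) (g_ij + g_ji - 2/3 delta_ij div u) - 2/3 rho k delta_ij,
// where g[i][j] = du_i/dx_j.
Tensor3 stressTensor(const Tensor3& g, double mu, double mut,
                     double rho, double k) {
    assert(mu >= 0.0 && mut >= 0.0);
    const double divU = g[0][0] + g[1][1] + g[2][2];
    Tensor3 tau{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            const double delta = (i == j) ? 1.0 : 0.0;
            tau[i][j] = (mu + mut) * (g[i][j] + g[j][i] - 2.0 / 3.0 * delta * divU)
                        - 2.0 / 3.0 * rho * k * delta;
        }
    return tau;
}
```

For a simple shear $\partial u/\partial y = S$, only $\tau_{12} = \tau_{21} = (\mu+\mu_t)S$ survive, apart from the isotropic $-\tfrac{2}{3}\rho k$ on the diagonal.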
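For turbulence closure, the k-ε model is employed. The eddy viscosity is estimated from the turbulent kinetic energy ($k$) and the rate of dissipation of turbulent kinetic energy ($\varepsilon$) by the following relationship:
$$ \mu_t = C_\mu f_\mu \frac{\rho k^2}{\varepsilon} \tag{11} $$
where $C_\mu$ is a model constant and $f_\mu$ is a damping function.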
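Both $k$ and $\varepsilon$ are obtained from their own transport equations, which can be written in Cartesian coordinates as follows:
$$ \frac{\partial (\rho k)}{\partial t} + \frac{\partial (\rho u_i k)}{\partial x_i} = \frac{\partial}{\partial x_i}\left[ \left( \mu + \frac{\mu_t}{\sigma_k} \right) \frac{\partial k}{\partial x_i} \right] + P_k - \rho\varepsilon \tag{12} $$
$$ \frac{\partial (\rho \varepsilon)}{\partial t} + \frac{\partial (\rho u_i \varepsilon)}{\partial x_i} = \frac{\partial}{\partial x_i}\left[ \left( \mu + \frac{\mu_t}{\sigma_\varepsilon} \right) \frac{\partial \varepsilon}{\partial x_i} \right] + C_1 \frac{\varepsilon}{k} P_k - C_2 \frac{\rho \varepsilon^2}{k} \tag{13} $$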
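$P_k$ is the production of $k$ from the mean-flow shear stresses and is given by
$$ P_k = \tau_{ij}\frac{\partial u_i}{\partial x_j} = \mu_t \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) \frac{\partial u_i}{\partial x_j} \tag{14} $$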
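To make Eq. (11) and the production term concrete, the sketch below evaluates both at a single point. It assumes $C_\mu = 0.09$, the customary high-Reynolds-number value, and $f_\mu = 1$ away from walls; STREAM's actual constants and damping function are not given here. The gradient layout `g[i][j]` $= \partial u_i/\partial x_j$ is an illustrative choice.

```cpp
#include <array>
#include <cassert>

using Tensor3 = std::array<std::array<double, 3>, 3>;

// Eddy viscosity, Eq. (11): mu_t = C_mu f_mu rho k^2 / eps.
// Defaults: C_mu = 0.09 (standard value), f_mu = 1 (no wall damping).
double eddyViscosity(double rho, double k, double eps,
                     double Cmu = 0.09, double fmu = 1.0) {
    assert(eps > 0.0);
    return Cmu * fmu * rho * k * k / eps;
}

// Production of k by mean shear:
// P_k = mu_t (du_i/dx_j + du_j/dx_i) du_i/dx_j, with g[i][j] = du_i/dx_j.
double production(const Tensor3& g, double mut) {
    double pk = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            pk += (g[i][j] + g[j][i]) * g[i][j];
    return mut * pk;
}
```

For a simple shear $\partial u/\partial y = S$, the double sum reduces to $S^2$, so $P_k = \mu_t S^2$.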
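The term $\dot{\omega}_i$ in the species equation represents the chemical source terms, which are responsible for the heat release and are obtained from the law of mass action. A set of chemical reactions can be expressed as follows for the $i$th species of the $j$th reaction, in terms of the stoichiometric coefficients $\nu'_{ij}$ for the reactants and $\nu''_{ij}$ for the products:
$$ \sum_{i=1}^{NS} \nu'_{ij} M_i \rightleftharpoons \sum_{i=1}^{NS} \nu''_{ij} M_i, \qquad j = 1, \ldots, NR \tag{15} $$
where $M_i$ is the chemical symbol of species $i$ and $NR$ is the number of reactions. The net rate of change of the molar concentration $X_i$ of species $i$ due to reaction $j$ is given by
$$ \dot{X}_{ij} = \left( \nu''_{ij} - \nu'_{ij} \right) \left[ k_{f_j} \prod_{l=1}^{NS} \left( \frac{\rho Y_l}{W_l} \right)^{\nu'_{lj}} - k_{b_j} \prod_{l=1}^{NS} \left( \frac{\rho Y_l}{W_l} \right)^{\nu''_{lj}} \right] \tag{16} $$
and the net species production rate $\dot{\omega}_i$ is obtained by summing over all reactions:
$$ \dot{\omega}_i = W_i \sum_{j=1}^{NR} \dot{X}_{ij} \tag{17} $$
where $W_i$ is the molecular weight of species $i$.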
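The forward rate constant is given by the modified Arrhenius law:
$$ k_{f_j} = A_j T^{B_j} \exp\left( -\frac{E_j}{R_u T} \right) \tag{18} $$
and the corresponding backward rate constant can be obtained from
$$ k_{b_j} = \frac{k_{f_j}}{K_j} \tag{19} $$
where $K_j$ is the equilibrium constant, given by
$$ K_j = \left( \frac{p_{atm}}{R_u T} \right)^{\Delta\nu_j} \exp\left( -\frac{\Delta G_j}{R_u T} \right) \tag{20} $$
Here $\Delta G_j$ is the change in Gibbs free energy for reaction $j$ and
$$ \Delta\nu_j = \sum_{i=1}^{NS} \nu''_{ij} - \sum_{i=1}^{NS} \nu'_{ij} \tag{21} $$
In the above equations, $A_j$, $B_j$ and $E_j$ are constants (the pre-exponential factor, temperature exponent and activation energy of reaction $j$), $R_u$ is the universal gas constant and $p_{atm}$ is the atmospheric pressure.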
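The rate expressions can be sketched for a single reversible reaction as follows. This is a minimal illustration, not STREAM's chemistry module. It makes four assumptions:
- The equilibrium constant $K_j$ is passed in rather than evaluated from Gibbs energies.
- Molar concentrations $C_i = \rho Y_i / W_i$ are in mol/m³.
- $R_u = 8.314462618$ J/(mol K).
- The `Reaction` struct and the function names are hypothetical.

The backward rate constant follows Eq. (19).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kRu = 8.314462618;  // universal gas constant [J/(mol K)]

// Modified Arrhenius law: k_f = A T^B exp(-E / (Ru T)).
double forwardRate(double A, double B, double E, double T) {
    assert(T > 0.0);
    return A * std::pow(T, B) * std::exp(-E / (kRu * T));
}

// Hypothetical container for one reaction j: stoichiometric coefficients
// nu'_ij (reactants) and nu''_ij (products) for every species i.
struct Reaction {
    std::vector<double> nuF;  // nu'_ij
    std::vector<double> nuB;  // nu''_ij
};

// Rate of progress q_j = k_f prod_i C_i^nu'_ij - k_b prod_i C_i^nu''_ij,
// with k_b = k_f / K_j (Eq. (19)); C_i are molar concentrations.
double rateOfProgress(const Reaction& r, const std::vector<double>& C,
                      double kf, double Kj) {
    assert(r.nuF.size() == C.size() && r.nuB.size() == C.size() && Kj > 0.0);
    double fwd = kf, bwd = kf / Kj;
    for (std::size_t i = 0; i < C.size(); ++i) {
        fwd *= std::pow(C[i], r.nuF[i]);  // pow(x, 0) == 1 for non-participants
        bwd *= std::pow(C[i], r.nuB[i]);
    }
    return fwd - bwd;
}

// Mass production rate of species i from this reaction:
// omega_i = W_i (nu''_ij - nu'_ij) q_j.
double speciesSource(const Reaction& r, std::size_t i, double Wi, double q) {
    return Wi * (r.nuB[i] - r.nuF[i]) * q;
}
```

Summing `speciesSource` over all reactions gives the net production rate of each species. At chemical equilibrium the forward and backward terms cancel and the rate of progress vanishes.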
2.1. Transformation to Curvilinear Coordinates
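For arbitrarily shaped geometries, the governing equations are transformed into generalized curvilinear coordinates $(\xi, \eta, \zeta)$, where $\xi = \xi(x,y,z)$, $\eta = \eta(x,y,z)$ and $\zeta = \zeta(x,y,z)$. The transformation of the physical domain $(x,y,z)$ to the computational domain $(\xi,\eta,\zeta)$ is achieved by the following relation:
$$ \begin{bmatrix} \xi_x & \xi_y & \xi_z \\ \eta_x & \eta_y & \eta_z \\ \zeta_x & \zeta_y & \zeta_z \end{bmatrix} = \frac{1}{J} \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix} \tag{22} $$
where the metrics are
$$ \begin{aligned} f_{11} &= y_\eta z_\zeta - z_\eta y_\zeta, & f_{12} &= z_\eta x_\zeta - x_\eta z_\zeta, & f_{13} &= x_\eta y_\zeta - y_\eta x_\zeta, \\ f_{21} &= z_\xi y_\zeta - y_\xi z_\zeta, & f_{22} &= x_\xi z_\zeta - z_\xi x_\zeta, & f_{23} &= y_\xi x_\zeta - x_\xi y_\zeta, \\ f_{31} &= y_\xi z_\eta - z_\xi y_\eta, & f_{32} &= z_\xi x_\eta - x_\xi z_\eta, & f_{33} &= x_\xi y_\eta - y_\xi x_\eta \end{aligned} \tag{23} $$
and $J$ is the Jacobian determinant of the transformation, given by
$$ J = \frac{\partial(x,y,z)}{\partial(\xi,\eta,\zeta)} = \begin{vmatrix} x_\xi & x_\eta & x_\zeta \\ y_\xi & y_\eta & y_\zeta \\ z_\xi & z_\eta & z_\zeta \end{vmatrix} \tag{24} $$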
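The metrics of Eq. (23) are the components of cross products of the tangent vectors $\mathbf{r}_\xi = (x_\xi, y_\xi, z_\xi)$, $\mathbf{r}_\eta$ and $\mathbf{r}_\zeta$. The first row is $\mathbf{r}_\eta \times \mathbf{r}_\zeta$, and the other rows follow cyclically. $J$ is their triple product. The sketch below assumes these derivatives are already available at a point, for example from central differences of grid-node positions. It verifies that the rows of $f/J$, which are $\nabla\xi$, $\nabla\eta$ and $\nabla\zeta$, are dual to the tangent vectors.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Given r_xi = (x_xi, y_xi, z_xi), r_eta and r_zeta, return the metric matrix
// f (row 0 = J grad xi, row 1 = J grad eta, row 2 = J grad zeta) and set
// J = r_xi . (r_eta x r_zeta), which equals the determinant in Eq. (24).
Mat3 metrics(const Vec3& rXi, const Vec3& rEta, const Vec3& rZeta, double& J) {
    J = dot(rXi, cross(rEta, rZeta));
    assert(std::fabs(J) > 0.0);  // a degenerate cell has zero volume
    return {cross(rEta, rZeta), cross(rZeta, rXi), cross(rXi, rEta)};
}
```

Expanding `cross(rEta, rZeta)` reproduces $f_{11}$, $f_{12}$ and $f_{13}$ term by term. The dual-basis property $\nabla\xi \cdot \mathbf{r}_\xi = 1$, $\nabla\xi \cdot \mathbf{r}_\eta = 0$, and so on, is a cheap runtime check on grid quality.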
Figure 1. 2-D structured grid (a) Physical plane. (b) Transformed (computational) plane.
2.3. SIMPLE Algorithm for Pressure-Velocity Coupling
Following the standard procedure employed in the SIMPLE algorithm (Ref. [7]), suppose that
the velocity field at an intermediate step of the iterative solution procedure is given by u*, v* and
w*, corresponding to a pressure field p*. The new predicted velocity and pressure fields can then
be obtained by adding a correction as follows:
$$ p = p^* + p', \quad u = u^* + u', \quad v = v^* + v', \quad w = w^* + w' \tag{25} $$
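To obtain the pressure and velocity corrections, the discretized u-momentum equation for this intermediate velocity field is first written as
$$ A_P u_P^* = \sum_{nbr} A_{nbr} u_{nbr}^* + S_P^{u*} + S_D + S_C \tag{26} $$
where the pressure-gradient source term is
$$ S_P^{u*} = -\left( f_{11} \frac{\partial p^*}{\partial \xi} + f_{21} \frac{\partial p^*}{\partial \eta} + f_{31} \frac{\partial p^*}{\partial \zeta} \right) \tag{27} $$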
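Writing the same equation for the corrected fields, subtracting Eq. (26) and, as is customary in SIMPLE, neglecting the neighbor velocity corrections gives an expression for the correction of the u-component of velocity:
$$ u'_P = -\frac{1}{A_P}\left( f_{11} \frac{\partial p'}{\partial \xi} + f_{21} \frac{\partial p'}{\partial \eta} + f_{31} \frac{\partial p'}{\partial \zeta} \right) \tag{28} $$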
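Similarly, $v'$ and $w'$ can be written as functions of $p'$. Finally, substituting the predicted velocity field ($u = u^* + u'$, etc.) into the continuity equation yields an equation for $p'$:
$$ \frac{p'}{RT}\frac{V}{\Delta t} - \frac{\partial}{\partial \xi_i}\left( \rho^* C_i \frac{\partial p'}{\partial \xi_i} \right) = -\left( \frac{p^*}{RT} - \rho^n \right)\frac{V}{\Delta t} - \frac{\partial (\rho^* U_i^*)}{\partial \xi_i} \tag{29} $$
where $V$ is the cell volume, $\Delta t$ is the timestep, $C_i$ are the coefficients arising from Eq. (28) and its $v$ and $w$ counterparts, $U_i^*$ are the contravariant velocity components computed from $u^*$, $v^*$ and $w^*$, and the superscript $n$ represents the solution at the old time level.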
2.4. PISO-Based Predictor-Corrector Algorithm for Unsteady Flows
The steady-state SIMPLE algorithm can be extended in a straightforward manner to unsteady flows by including the unsteady terms in the Navier-Stokes equations. However, this approach is computationally expensive, since several iterations have to be conducted at each timestep. A more efficient procedure, based on the PISO algorithm of Ref. [10], has been developed in Ref. [9]. The unsteady algorithm can be summarized as follows:
1) Start with the previous timestep values: $u^n$, $v^n$, $p^n$, $\rho^n$, $T^n$.
2) Predictor: solve the momentum equations implicitly, treating the pressure gradient from the previous timestep explicitly, to get $u^*$ and $v^*$.
3) First corrector:
(a) Compute the control-volume interface velocities.
(b) Compute $p'$ and update the velocities using $p'$ to get $u^{**}$ and $v^{**}$.
(c) Update the pressure using $p'$, and the density from the equation of state, to yield $p^*$ and $\rho^*$.
4) Second corrector:
(a) Compute $p''$.
(b) Update the velocities using $p''$ to get $u^{***}$ and $v^{***}$.
(c) Update the pressure and density to obtain $p^{**}$ and $\rho^{**}$ from the equation of state.
5) $u^{***}$, $v^{***}$, $p^{**}$ and $\rho^{**}$ are taken as the values at the new time level (n+1), and we proceed to step 1 for the next time level.
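The control flow of the steps above can be sketched as one predictor followed by a fixed number of corrector stages. In the sketch below, `UnsteadySolver` and its methods are hypothetical placeholders that only record their invocation. A real implementation would perform the discretized solves described in the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical solver interface: each method stands in for a full discretized
// operation and here only records that it was called, to expose the ordering.
struct UnsteadySolver {
    std::vector<std::string> log;
    void solveMomentumPredictor()  { log.push_back("predictor"); }            // u*, v*
    void computeFaceVelocities()   { log.push_back("face-velocities"); }
    void solvePressureCorrection() { log.push_back("pressure-correction"); }  // p', p''
    void correctVelocities()       { log.push_back("correct-velocities"); }
    void updatePressureDensity()   { log.push_back("update-p-rho"); }         // via EOS
    void storeOldTimeLevel()       { log.push_back("store-old"); }            // n <- n+1
};

// One PISO-type time step: implicit momentum predictor, then nCorrectors
// pressure-correction stages (two in the algorithm described above). Face
// velocities are computed once, in the first corrector.
void pisoTimeStep(UnsteadySolver& s, int nCorrectors = 2) {
    s.solveMomentumPredictor();
    for (int c = 0; c < nCorrectors; ++c) {
        if (c == 0) s.computeFaceVelocities();
        s.solvePressureCorrection();
        s.correctVelocities();
        s.updatePressureDensity();
    }
    s.storeOldTimeLevel();
}
```

The key point is that no outer iteration loop surrounds `pisoTimeStep`. This absence is where the savings over iterating SIMPLE within each timestep come from.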
An example of the application of STREAM for unsteady flow in a multi-stage turbomachine is
shown in Fig. 2.
(a) Transient pressure field in the first-stage impeller. (b) Pressure fluctuation (with time) at the diffuser inlet. (c) FFT of the pressure signal showing the harmonics.
Figure 2. Unsteady turbomachinery computation using STREAM.
In writing finite-volume applications within the structured curvilinear-coordinate framework, a number of complexities arise simply as a consequence of the framework itself. These issues can be categorized as follows:
1) Curvilinear coordinate framework issues.
2) Multi-block framework issues.
In the early development of structured grid algorithms for grid generation and finite-volume Navier-Stokes solvers, curvilinear coordinates were adopted as a natural way to map physical space to computational space. While this mapping initially appears convenient, it introduces considerable unnecessary difficulty into finite-volume discrete integration. The primary source of excess complexity is terms in the governing equations that contain products of derivatives, such as the viscous dissipation term of the energy equation. Such a term is naturally expressed in terms of Cartesian gradients. In curvilinear coordinates, however, each Cartesian gradient is expanded into gradients in computational space multiplied by the corresponding metrics (e.g., $\partial\phi/\partial x = \xi_x\,\partial\phi/\partial\xi + \eta_x\,\partial\phi/\partial\eta + \zeta_x\,\partial\phi/\partial\zeta$), which immediately triples the number of terms in the equation. In unstructured grid assembly, curvilinear coordinates are abandoned entirely, and the Cartesian gradient components are evaluated directly with the aid of a relation derived from the divergence theorem.
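One common relation of this kind is the Green-Gauss gradient, $\nabla\phi \approx \frac{1}{V}\sum_f \phi_f\,\mathbf{n}_f A_f$. The sketch below applies it to a single 2-D polygonal cell, with vertices listed counter-clockwise. As an illustrative assumption, face values are taken as the average of the two end-point values, which is exact for a linearly varying $\phi$. A cell-centered solver would instead interpolate $\phi_f$ from neighboring cell centers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Green-Gauss gradient over a 2-D polygonal cell (vertices counter-clockwise):
// grad(phi) ~ (1/A) sum_faces phi_f n_f L_f. No metrics or computational-space
// derivatives are needed, only face normals and the cell area.
std::array<double, 2> greenGaussGradient(const std::vector<Point>& v,
                                         const std::vector<double>& phi) {
    assert(v.size() >= 3 && phi.size() == v.size());
    double area = 0.0, gx = 0.0, gy = 0.0;
    for (std::size_t a = 0; a < v.size(); ++a) {
        const std::size_t b = (a + 1) % v.size();
        const double dx = v[b].x - v[a].x, dy = v[b].y - v[a].y;
        const double phiF = 0.5 * (phi[a] + phi[b]);           // face value
        gx += phiF * dy;                                      // outward n L = (dy, -dx)
        gy -= phiF * dx;
        area += 0.5 * (v[a].x * v[b].y - v[b].x * v[a].y);  // shoelace formula
    }
    assert(area > 0.0);  // counter-clockwise, non-degenerate cell
    return {gx / area, gy / area};
}
```

For the triangle (0,0), (1,0), (0,1) with $\phi = 2x + 3y + 1$, the routine returns the exact gradient (2, 3). The same loop works unchanged for quadrilaterals or any other polygon, which is the point made in the text.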
Another major complexity introduced by curvilinear coordinates is the artificial separation of control-volume faces into three distinct types, one for each computational coordinate direction. These are commonly labeled as the North/South, East/West and Top/Bottom faces, as shown in Fig. 1. The typical assembly for a scalar convection/diffusion/source equation involves an internal loop over the interior control volumes of a block to compute fluxes. Separate loops are then undertaken to handle the boundaries; since the boundaries fall along three different computational-space coordinate surfaces, three distinct loops are required to complete the assembly process. In unstructured grid finite-volume integration, by contrast, modern solvers generally use either face-based or edge-based assembly. In this process, the fluxes for each control volume are assembled by a sweep over the faces (when variables are cell-centered) or the edges (when variables are node-centered) of the grid. Taking the face-based process as an example, the only distinction made during flux assembly is between an interior face and a boundary face. Interior faces are those with elements on both sides, and boundary faces are those connected to a single element. This type of assembly process has been found to greatly reduce the amount of code required.
In addition to these complexities, the multi-block framework itself leads to several algorithmic complications which disappear when employing unstructured grids. Since the complete domain is decomposed into a number of blocks, an entirely separate set of functions must be employed to communicate information between these blocks. For general block-to-block connectivities, these functions can become quite complex, especially if one-to-one connectivity (grid points matching at the boundary between blocks) is not present. Since the unstructured grid contains no blocks, this additional complexity does not arise.
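The face-based assembly described earlier can be sketched in a few lines. The `Face` record below is hypothetical: each face stores an owner cell, a neighbor cell (−1 on a boundary) and the flux through the face in the owner-to-neighbor direction. A single sweep adds each flux to the owner and subtracts it from the neighbor, so interior fluxes cancel in the global sum and conservation holds by construction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical face record: the face normal points from 'owner' to 'neighbor';
// 'neighbor' is -1 for a boundary face.
struct Face {
    int owner;
    int neighbor;
    double flux;  // net (convective + diffusive) flux along the face normal
};

// Face-based assembly: a single sweep over all faces accumulates the net
// outflow of every cell. The only distinction is interior vs. boundary face.
std::vector<double> assembleNetOutflow(const std::vector<Face>& faces,
                                       std::size_t nCells) {
    std::vector<double> outflow(nCells, 0.0);
    for (const Face& f : faces) {
        assert(f.owner >= 0 && static_cast<std::size_t>(f.owner) < nCells);
        assert(f.neighbor < 0 || static_cast<std::size_t>(f.neighbor) < nCells);
        outflow[f.owner] += f.flux;          // leaves the owner cell
        if (f.neighbor >= 0)                 // interior face
            outflow[f.neighbor] -= f.flux;   // enters the neighbor cell
    }
    return outflow;
}
```

The loop never asks which coordinate direction a face belongs to. The same code therefore serves triangles, quadrilaterals or general polygons, in contrast to the three direction-specific boundary loops of a structured code.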
The multi-block approach also creates complications regarding domain decomposition. Since multi-block codes are most naturally decomposed into functions which operate on individual blocks, the desire to maintain this structure often means that domain decomposition is performed at the block level as well, with each processor performing the calculations for a single block of the domain. Such a choice often leads to load-balancing problems in the distributed-memory environment, since blocks generally differ in size. To give every processor an equal amount of work, some structured multi-block codes require all blocks to have the same number of grid points; while this alleviates the load-balancing problem, it can result in excessive grid generation time and an overall waste of grid points. With the unstructured grid approach, sophisticated automatic domain decomposition methods such as METIS (Ref. [11]) have been developed which eliminate these problems and achieve a good level of load balancing for general unstructured grids.
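The block-granularity problem can be quantified with a simple metric, the ratio of the maximum processor load to the mean load, where 1.0 is perfect balance. The sketch below assigns whole blocks to processors with a greedy largest-block-first heuristic. This is a simple stand-in chosen for illustration, not the scheme used by any particular multi-block code. Partitioners such as METIS work at the granularity of individual cells instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Assign whole blocks (given by their cell counts) to nProcs processors,
// largest block first, each to the currently least-loaded processor.
// Returns max load / mean load (1.0 means perfectly balanced).
double blockImbalance(std::vector<long> blockCells, int nProcs) {
    assert(nProcs > 0 && !blockCells.empty());
    std::sort(blockCells.begin(), blockCells.end(), std::greater<long>());
    std::vector<long> load(static_cast<std::size_t>(nProcs), 0L);
    long total = 0;
    for (long cells : blockCells) {
        *std::min_element(load.begin(), load.end()) += cells;
        total += cells;
    }
    const double mean = static_cast<double>(total) / nProcs;
    return *std::max_element(load.begin(), load.end()) / mean;
}
```

With blocks of 1000, 300 and 300 cells on two processors, no assignment of whole blocks can do better than loads of 1000 and 600, an imbalance of 1.25. Splitting at the cell level removes that floor.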
While the benefits of migrating to unstructured grids are clear, the structured grid framework does provide one advantage. Setting aside the code complexity issue, once a structured code is written and well tested, it tends to give higher performance than an unstructured code on cache-based memory architectures, since memory is accessed in a more uniform manner. This is entirely attributable to the ordered nature of the arrays used in structured codes. To use cache-based memory machines efficiently, unstructured codes must employ node/edge/element reordering functions to ensure efficient memory access. Even with such reordering, structured grid codes are still superior in this regard.
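One widely used reordering of this kind is Cuthill-McKee, which renumbers cells in breadth-first order so that neighboring cells receive nearby indices. The sketch below uses a plain adjacency-list mesh representation, which is an assumption for illustration. It measures the effect through the matrix bandwidth, the largest index difference between connected cells; a smaller bandwidth generally means neighbor data is closer in memory. The reverse variant (RCM) simply reverses the resulting order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <queue>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // cell adjacency lists

// Cuthill-McKee ordering: breadth-first search from a minimum-degree cell,
// visiting unvisited neighbours in order of increasing degree. Returns
// newIndex[old] for every cell; disconnected meshes are handled per component.
std::vector<int> cuthillMcKee(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<int> newIndex(n, -1);
    int next = 0;
    auto degree = [&](int c) { return g[c].size(); };
    while (next < n) {
        int start = -1;  // unvisited cell of smallest degree
        for (int c = 0; c < n; ++c)
            if (newIndex[c] < 0 && (start < 0 || degree(c) < degree(start)))
                start = c;
        std::queue<int> q;
        q.push(start);
        newIndex[start] = next++;
        while (!q.empty()) {
            const int c = q.front();
            q.pop();
            std::vector<int> nbrs;
            for (int nb : g[c])
                if (newIndex[nb] < 0) nbrs.push_back(nb);
            std::sort(nbrs.begin(), nbrs.end(),
                      [&](int a, int b) { return degree(a) < degree(b); });
            for (int nb : nbrs) {
                newIndex[nb] = next++;
                q.push(nb);
            }
        }
    }
    return newIndex;
}

// Bandwidth of the cell-connectivity matrix under a given numbering.
int bandwidth(const Graph& g, const std::vector<int>& index) {
    int bw = 0;
    for (int c = 0; c < static_cast<int>(g.size()); ++c)
        for (int nb : g[c])
            bw = std::max(bw, std::abs(index[c] - index[nb]));
    return bw;
}
```

For a chain of cells numbered 0-3-1-4-2, the original bandwidth is 3. After reordering it drops to 1, the natural ordering a structured code would have had from the start.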
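The underlying algorithm of STREAM-UNS is the same as that of STREAM, the primary difference being the use of face-based assembly, which can accommodate arbitrary polygonal unstructured meshes. The generic convection-diffusion-source equation for any variable $\phi$ (such as momentum, energy, species, etc.) is written in integral form for a control volume $\Omega$ (e.g., the shaded region surrounding a control point P in Fig. 3) bounded by the surface $\Gamma$ as:
$$ \frac{\partial}{\partial t}\int_{\Omega} \rho\phi\, d\Omega + \oint_{\Gamma} \mathbf{F}\cdot\mathbf{n}\, d\Gamma = \int_{\Omega} S_\phi\, d\Omega $$
where $\mathbf{F}$ is the combined convective and diffusive flux of $\phi$, $\mathbf{n}$ is the outward unit normal and $S_\phi$ is the source term.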
STREAM-UNS is coded in C++ using object-oriented design. Figure 4 shows a steady-state computation of incompressible flow over a swimmer delivery vehicle (SDV) using STREAM-UNS.
Figure 3. Control volume for integration on unstructured grid.
Figure 4. Submerged vehicle computation using STREAM-UNS.
In this section we provide a basic overview of the LOCI programming system (Ref. [5]) and
describe how it can be used to implement unstructured grid solvers. For more complete details on
the LOCI system, the reader should consult the LOCI tutorial (Ref. [6]) which is available with the
LOCI distribution. Following this overview, we will discuss some of the anticipated benefits of
using LOCI and outline the strategy for testing the feasibility of using LOCI for pressure-based
unstructured grid assembly.
4.1. LOCI: Overview
Programs written using LOCI consist of the following three general components:
1) Fact Database: This database, which is maintained by LOCI, contains all information that is known about the problem being solved. For finite-volume programs for fluid-flow and heat
transfer, this information usually consists of items such as boundary conditions, initial conditions,
material properties, and the combustion mechanism among other things. The fact database is
usually constructed during the input section of the user's program. For example, the user may have
a function called readBoundaryConditions() inside which the boundary conditions associated with
the problem would be read and entered into the fact database.
2) Rules: Rules can be thought of as the components of the finite-volume algorithm which are used
to compute the desired solution from the known facts. Each rule can be represented in symbolic
form by a rule signature. For example, a rule to compute the centroid of the triangles in a 2-D unstructured grid can be represented by the following rule signature:
triangleCentroid <- triangleNodes->position
This rule can be read as: the centroid of each triangle is computed from the position property of the triangle's nodes. The <- operator separates the output of the rule (on its left) from its inputs (on its right), while the -> operator signifies that the position property is accessed through the triangle's nodes in the calculation process.
This rule signature is really only a symbolic representation of what the rule is doing. For each rule
signature, the programmer must supply a C++ class which actually implements the functions of the
rule. An example of this will be given in a later section.
3) Query: Once the facts and rules are specified, one obtains the solution by executing a query to
the fact database for a desired solution. At this point LOCI attempts to order all the rules into an
execution schedule which can produce the solution. If LOCI finds that it is not possible to arrive at
a solution given the known facts and list of rules, it will inform the user.
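The query step can be pictured as backward chaining over rule signatures. The sketch below is a greatly simplified, hypothetical model written in plain C++; it is not LOCI's scheduler or API. Starting from the queried variable, it finds a rule that produces each missing variable and schedules that rule's inputs first. If some variable can be neither found among the facts nor produced by any rule, it reports failure. The `triangleArea` rule in the usage note is an invented example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified rule signature: output <- inputs.
struct RuleSig {
    std::string output;
    std::vector<std::string> inputs;
};

// Recursively make 'var' available. Rules are appended to 'schedule' so that
// each rule appears after every rule that computes one of its inputs.
bool require(const std::string& var, const std::vector<RuleSig>& rules,
             std::set<std::string>& known, std::set<std::string>& inProgress,
             std::vector<std::string>& schedule) {
    if (known.count(var)) return true;
    if (!inProgress.insert(var).second) return false;  // circular dependency
    for (const RuleSig& r : rules) {
        if (r.output != var) continue;
        bool ok = true;
        for (const std::string& in : r.inputs)
            ok = ok && require(in, rules, known, inProgress, schedule);
        if (ok) {
            schedule.push_back(r.output);
            known.insert(var);
            inProgress.erase(var);
            return true;
        }
    }
    inProgress.erase(var);
    return false;  // no rule can produce 'var' from the available facts
}

// Returns true and fills 'schedule' if the query can be satisfied.
bool makeSchedule(const std::string& query, const std::vector<RuleSig>& rules,
                  std::set<std::string> facts, std::vector<std::string>& schedule) {
    std::set<std::string> inProgress;
    schedule.clear();
    return require(query, rules, facts, inProgress, schedule);
}
```

Given the facts `triangleNodes` and `position`, a query for `solution` schedules `triangleCentroid` and `triangleArea` before `solution`. Removing `position` from the facts makes the query fail, which mirrors the behavior described in item 3).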
For completeness, a skeleton main function for a finite volume code is shown in Fig. 5.
#include <Loci.h>
#include <string>
#include <vector>

int main(int argc,char *argv[]){
  // Initialize the LOCI system.
  Loci::Init(&argc,&argv) ;
  // Setup the fact database and read known facts. Grid and boundary conditions
  // are inserted into the fact database by user-supplied input functions.
  fact_db factDatabase ;
  readGrid(factDatabase) ;
  readBoundaryConditions(factDatabase) ;
  // ...
  // Add all previously registered rules which define the finite-volume
  // program to the rule database. Each rule is specified as a C++ class and
  // registered in the global rule list in a separate implementation file.
  rule_db ruleDatabase ;
  ruleDatabase.add_rules(global_rule_list) ;
  // Distribute the rules and facts to the various processors.
  int numProcesses=Loci::MPI_processes,myID=Loci::MPI_rank ;
  std::vector<entitySet> partition=Loci::generate_distribution
    (factDatabase,ruleDatabase) ;
  Loci::distribute_facts(partition,factDatabase,ruleDatabase) ;
  // Specify the 'query' and set up an execution schedule to satisfy it. Here
  // we are asking for the solution, which also happens to be a rule.
  std::string query("solution") ;
  executeP schedule=create_execution_schedule(ruleDatabase,factDatabase,query) ;
  // Execute the schedule to produce the solution.
  schedule->execute(factDatabase) ;
  // Finalize the LOCI system.
  Loci::Finalize() ;
  return 0 ;
}
Figure 5. LOCI: Example 1.
4.2. Basic Data Structures of LOCI
In order to provide some foundation for understanding the implementation files for the rules
which compose any finite-volume program constructed using the LOCI system, we present some of
the basic data structures used in LOCI. Only the data structures required for understanding the
following material are provided.
(a) Entity:
This data type is used to identify objects in LOCI (e.g. triangles, edges, etc.). In LOCI, each entity
is given an integer number for identification. For example, a list of triangle entities can be created
by the following statement, where numTriangle has been previously defined (e.g., by reading a grid file):
Entity triangles(numTriangle) ;
The triangle entities in this list are numbered sequentially from 0 to (numTriangle-1).
(b) store:
This data type is essentially an array which holds a number of values. Stores are usually associated
with a collection of entities. The store then holds a single value for each entity. For example, we
may have a store to hold the centroid value for a collection of triangles as follows:
store<vector2d<double>> triangleCentroid ;
(c) MapVec:
This data structure is used to map one collection of entities to another. For example, to hold the
triangle-to-node connectivity information in a 2-D finite-volume code, we would use the following declaration:
MapVec<3> triangleNodes ;
Thus, for each triangle, we hold the three global node numbers which define the triangle.
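To show how a store and a MapVec cooperate, here is a plain-C++ analogue using standard containers: a vector with one value per entity stands in for a store, and a vector of three node numbers per triangle stands in for `MapVec<3>`. These aliases are hypothetical stand-ins, not LOCI's types. The computation is the same centroid calculation used as the running example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };

// Hypothetical stand-ins (not LOCI types): a "store" holds one value per
// entity; a MapVec<3>-like container holds three node numbers per triangle.
using PositionStore = std::vector<Vec2>;
using TriangleNodes = std::vector<std::array<int, 3>>;

// Centroid of each triangle entity, reached through the triangle-to-node map.
std::vector<Vec2> triangleCentroids(const TriangleNodes& tri,
                                    const PositionStore& position) {
    std::vector<Vec2> centroid(tri.size());
    for (std::size_t e = 0; e < tri.size(); ++e) {  // entity e = triangle e
        Vec2 c{0.0, 0.0};
        for (int node : tri[e]) {
            assert(node >= 0 && static_cast<std::size_t>(node) < position.size());
            c.x += position[node].x;
            c.y += position[node].y;
        }
        centroid[e] = {c.x / 3.0, c.y / 3.0};
    }
    return centroid;
}
```

The indirection `position[tri[e][k]]` is exactly what the signature `triangleNodes->position` expresses symbolically.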
4.3. Implementation of Rules in LOCI.
In LOCI, each rule that composes the program is implemented in the form of a C++ class,
which provides the functionality associated with the rule. A sample implementation for the triangle
centroid rule discussed above is shown in Fig. 6.
Each rule class provides three basic functions:
(1) A constructor, which essentially registers the data used and produced by the rule with LOCI
(2) A calculation method which specifies the procedure for computing the output for a single entity.
(3) A compute method which calls calculate() for a sequence of entities. This method is
implemented in LOCI as a template function, which allows LOCI to avoid calling the virtual
method calculate() at the loop level, which would significantly decrease the calculation efficiency.
In addition to the rule implementation class, one also creates a global register_rule<> object
which allows the rule to be registered with the global rule list which is maintained by LOCI.
class triangleCentroidRule : public pointwise_rule {
  const_store<vector2d<double> > position ;
  const_MapVec<3> triangleNodes ;
  store<vector2d<double> > triangleCentroid ;
public:
  // Constructor to provide symbolic names for the data used in
  // this rule and to define the input and output quantities.
  triangleCentroidRule() {
    name_store("position",position) ;
    name_store("triangleNodes",triangleNodes) ;
    name_store("triangleCentroid",triangleCentroid) ;
    input("triangleNodes->position") ;
    output("triangleCentroid") ;
  }
  // Method which performs the calculation for a single Entity.
  void calculate(Entity e) {
    triangleCentroid[e] = (position[triangleNodes[e][0]]+
      position[triangleNodes[e][1]]+position[triangleNodes[e][2]])/3.0 ;
  }
  // Template function to call the calculate method for a sequence of
  // entities.
  virtual void compute(const sequence &seq) {
    do_loop(seq,this) ;
  }
} ;
// Create a global object that will register this rule in the global
// rule list.
register_rule<triangleCentroidRule> registerTriangleCentroid ;
Figure 6. LOCI: Example 2.
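The efficiency argument for the template compute method can be illustrated without LOCI. In the hypothetical analogue below, `RuleBase`, `doLoop` and `ScaleRule` are invented names, not LOCI's classes. The framework pays one virtual call per sequence, through `compute()`. Inside, the template `doLoop` sees the concrete rule type, so the per-entity `calculate()` call is an ordinary, inlinable member call rather than a virtual dispatch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the design described in the text (not LOCI code).
struct RuleBase {
    virtual ~RuleBase() = default;
    virtual void compute(const std::vector<int>& seq) = 0;  // one virtual call per sequence
};

// Calls Rule::calculate for every entity. Because Rule is the concrete type,
// calculate() is bound at compile time and can be inlined into the loop.
template <class Rule>
void doLoop(const std::vector<int>& seq, Rule* rule) {
    for (int e : seq) rule->calculate(e);
}

// Example rule: out[e] = 2 in[e] for each entity e in the sequence.
struct ScaleRule : RuleBase {
    std::vector<double> in, out;
    void calculate(int e) { out[e] = 2.0 * in[e]; }  // non-virtual, per entity
    void compute(const std::vector<int>& seq) override { doLoop(seq, this); }
};
```

Calling `compute` through a `RuleBase&` with the sequence {0, 2} updates only entities 0 and 2. The dispatch cost is paid once for the whole sequence rather than once per entity.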
4.4. Benefits of Using the LOCI Framework
In using any new system for writing finite-volume applications, the general hope is that one
will spend less time on the actual mechanics of writing code, and thus more time concentrating on
improving other aspects of the solver, such as the implementation of additional turbulence models
or the integration with other solvers to handle multi-disciplinary physics. After all, scientists and engineers are not usually in the business of writing code for its own sake; there is usually a physical problem that
needs to be solved. In this regard, there are two major benefits which are envisioned in using LOCI,
rather than other modern coding techniques such as standard object-oriented programming in C++.
(a) LOCI is designed with multi-disciplinary problems in mind.
(b) LOCI automatically handles the partitioning of the unstructured problem in the distributed-
memory environment.
(a) Seamless Integration of Multi-Disciplinary Physics
With the rapid development of computer hardware, it is now feasible to solve quite complex
problems involving multi-disciplinary physics. For example, one may solve a fluid flow/heat
transfer problem in some combustion device, where one not only computes the fluid flow using a
finite-volume solver, but also solves the heat-transfer and stress problem in the solid section of the
device simultaneously. One approach commonly used these days is to solve each section (fluid or
solid) independently and iterate several times until a converged solution is obtained in both regions.
In this approach, the fluid and solid solution procedures are said to be loosely coupled, and in fact
most often are obtained using different solvers which know nothing of each other. This approach works, but is usually a very slow process due to the loose nature of the coupling between the two solvers.
A better approach to the multidisciplinary problem involves the so-called tight coupling
between the components, in which the different solvers operate in a more closely coordinated
manner. In such a fashion, each of the solvers has some knowledge of the other, and the interface
between the components allows data to be exchanged at a much higher frequency, usually at the
inner iteration level. In the most extreme case of tight coupling, all components (fluid/heat-
transfer/stress) are solved together simultaneously at every iteration. In this case, there is really
only one solver.
One of the major strengths of LOCI is its ability to handle all approaches, from loosely-coupled
to tightly-coupled. When writing applications in LOCI, it is not necessary that all components of
the application be entirely written within the rule-based framework. Applications can exist as
independent modular components which can be linked to other components written entirely in
LOCI by encapsulating the component as a rule. For example, LOCI has an interface to the PETSc (Ref. [12]) linear algebra library. In both the loosely- and tightly-coupled approaches, a significant advantage of LOCI is its ability to check the internal consistency of a program. Often, components of a multidisciplinary application may be written by different developers, who may not
have detailed knowledge of all system components. When all components are used to solve a given
problem, LOCI guarantees that a program schedule is generated which ensures that information
between the components is computed at the appropriate time and all information required by each
component is available when it is needed. If the components cannot be linked together due to an
insufficient specification of the interface between them, LOCI informs the user that a schedule
cannot be generated and terminates execution. This feature completely eliminates errors associated
with inter-component coordination, which become more common in complex codes written in a
multidisciplinary environment.
(b) Grid Partitioning for Distributed-Memory Environment
Regarding the second issue, one can see from the main program example in Fig. 5 that it is very
simple to create programs which can run in a distributed memory environment. LOCI handles all
partitioning of the problem, including the grid, the rule database and the fact database. This feature
of LOCI is a major advantage over other approaches such as standard object-oriented C++, where
the programmer must entirely code and debug a separate layer of the program devoted to
partitioning of the problem. For scientists/engineers inexperienced in the area of message-passing
(e.g. MPI), the use of LOCI can entirely eliminate the need for this extra complexity.
4.6. Plan for Implementation and Testing of STREAM-UNS-LOCI
The primary criteria for judging the effectiveness of LOCI lie in answering the following questions:
(1) Is there a significant advantage to be gained in program organization and simplification by
using LOCI, in comparison with standard object-oriented C++?
(2) Since the internals of LOCI are a black box to the programmer, are the resulting programs as efficient as the existing version of STREAM-UNS, which employs standard object-oriented techniques in C++?
In order to answer these questions, the technology of STREAM-UNS will be ported to a new
code called STREAM-UNS-LOCI written entirely in the LOCI framework. This new code will be
tested against the existing implementation of STREAM-UNS written using standard object-oriented
techniques in C++, with standard MPI message passing for the distributed computing environment.
Regarding code organization and maintenance, we have found that object-oriented techniques
significantly improve both the initial program design as well as the ability to enhance and maintain
codes as new features are added. Whether the use of a rule-based (as opposed to object-based)
system improves one's ability to develop and maintain code can only be answered by a direct head-
to-head comparison. In addition, since all data is allocated and controlled by the LOCI system, the
programmer has less control of the internal representation of the data. In traditional object-oriented
approaches, especially for codes finely tailored to a specific application (e.g. fluid flow solvers)
concrete data types are created which have a very narrow scope of function, but are therefore
highly efficient. Will the more general data types provided by LOCI achieve the same level of
performance? Results of this ongoing work will be reported at a later date in hopes that a better
understanding of LOCI in relation to its application to pressure-based solvers can assist others in
deciding if LOCI is a viable alternative to the more traditional approaches for program
