PETSc Tutorial

This document provides an introduction to the Portable, Extensible Toolkit for Scientific Computation (PETSc). It outlines the tutorial objectives of introducing PETSc, demonstrating how to write a parallel implicit PDE solver using PETSc, and learning about interfaces to other packages. The tutorial covers topics such as getting started with PETSc, defining data objects like vectors and matrices, using solvers, data layout, and putting together a complete example application.

Uploaded by

Pranav Ladkat
Copyright
© All Rights Reserved

PETSc Tutorial

Numerical Software Libraries for the Scalable Solution of PDEs
Satish Balay
Kris Buschelman
Bill Gropp
Lois Curfman McInnes
Barry Smith

Mathematics and Computer Science Division
Argonne National Laboratory
http://www.mcs.anl.gov/petsc
Intended for use with version 2.0.29 of PETSc

2 of 132

Tutorial Objectives
Introduce the Portable, Extensible Toolkit for
Scientific Computation (PETSc)
Demonstrate how to write a complete parallel
implicit PDE solver using PETSc
Learn about PETSc interfaces to other packages
How to learn more about PETSc

3 of 132

The Role of PETSc


Developing parallel, non-trivial PDE solvers that
deliver high performance is still difficult, and
requires months (or even years) of concentrated
effort.
PETSc is a toolkit that can ease these difficulties
and reduce the development time, but it is not a
black-box PDE solver nor a silver bullet.

4 of 132

What is PETSc?

A freely available and supported research code
  Available via http://www.mcs.anl.gov/petsc
  Hyperlinked documentation and manual pages for all routines
  Many tutorial-style examples
  Support via email: petsc [email protected]
Usable from Fortran 77/90, C, and C++
Portable to any parallel system supporting MPI, including
  Tightly coupled systems
    Cray T3E, SGI Origin, IBM SP, HP 9000, Sun Enterprise
  Loosely coupled systems, e.g., networks of workstations
    Compaq, HP, IBM, SGI, Sun
    PCs running Linux or NT
PETSc history
  Begun in September 1991
  Now: over 4,000 downloads of version 2.0
PETSc funding and support
  Department of Energy, MICS Program DOE2000
  National Science Foundation, Multidisciplinary Challenge Program, CISE

5 of 132

PETSc Concepts
How to specify the mathematics of the problem
  Data objects: vectors, matrices
How to solve the problem
  Solvers: linear, nonlinear, and time stepping (ODE) solvers
Parallel computing complications
  Parallel data layout: structured and unstructured meshes

6 of 132

Tutorial Topics
Getting started
  sample results
  programming paradigm
Data objects
  vectors (e.g., field variables)
  matrices (e.g., sparse Jacobians)
Viewers
  object information
  visualization
Solvers
  linear
  nonlinear
  timestepping (and ODEs)
Data layout and ghost values
  structured and unstructured mesh problems
  partitioning and coloring
Putting it all together
  a complete example
Debugging and error handling
Profiling and performance tuning
Extensibility issues
Using PETSc with other software packages

Tutorial Topics: Using PETSc with Other Packages
PVODE ODE integrator
  A. Hindmarsh et al. - http://www.llnl.gov/CASC/PVODE
ILUDTP drop tolerance ILU
  Y. Saad - http://www.cs.umn.edu/~saad
ParMETIS parallel partitioner
  G. Karypis - http://www.cs.umn.edu/~karypis
Overture composite mesh PDE package
  D. Brown, W. Henshaw, and D. Quinlan - http://www.llnl.gov/CASC/Overture
SAMRAI AMR package
  S. Kohn, X. Garaiza, R. Hornung, and S. Smith - http://www.llnl.gov/CASC/SAMRAI
SPAI sparse approximate inverse preconditioner
  S. Bernhard and M. Grote - http://www.sam.math.ethz.ch/~grote/spai
Matlab
  http://www.mathworks.com
TAO optimization software
  S. Benson, L.C. McInnes, and J. Moré - http://www.mcs.anl.gov/tao

7 of 132

8 of 132

Tutorial Approach
From the perspective of an application programmer:
Beginner
  basic functionality, intended for use by most programmers
Intermediate
  selecting options, performance evaluation and tuning
Advanced
  user-defined customization of algorithms and data structures
Developer
  advanced customizations, intended primarily for use by library developers
Level tags (only in this tutorial): beginner, intermediate, advanced, developer

9 of 132

Incremental Application Improvement


Beginner
Get the application up and walking

Intermediate
Experiment with options
Determine opportunities for improvement

Advanced
Extend algorithms and/or data structures as needed

Developer
Consider interface and efficiency issues for integration and
interoperability of multiple toolkits

Full tutorials available at http://www.mcs.anl.gov/petsc/docs/tutorials

10 of 132

Structure of PETSc
[Layered diagram, top to bottom]
PETSc PDE Application Codes
ODE Integrators; Visualization Interface
Nonlinear Solvers, Unconstrained Minimization
Linear Solvers: Preconditioners + Krylov Methods
Object-Oriented Matrices, Vectors, Indices; Grid Management
Profiling Interface
Computation and Communication Kernels: MPI, MPI-IO, BLAS, LAPACK

11 of 132

PETSc Numerical Components
[Component diagram]
Nonlinear Solvers
  Newton-based Methods: Line Search, Trust Region
  Other
Time Steppers
  Euler, Backward Euler, Pseudo Time Stepping, Other
Krylov Subspace Methods
  GMRES, CG, CGS, Bi-CG-STAB, TFQMR, Richardson, Chebychev, Other
Preconditioners
  Additive Schwarz, Block Jacobi, Jacobi, ILU, ICC, LU (Sequential only), Others
Matrices
  Compressed Sparse Row (AIJ), Blocked Compressed Sparse Row (BAIJ), Block Diagonal (BDIAG), Dense, Other
Distributed Arrays
Index Sets
  Indices, Block Indices, Stride, Other
Vectors

12 of 132

Flow of Control for PDE Solution
[Diagram]
Main Routine
PETSc code: Timestepping Solvers (TS), Nonlinear Solvers (SNES), Linear Solvers (SLES) with PC and KSP
User code: Application Initialization, Function Evaluation, Jacobian Evaluation, PostProcessing

13 of 132

Flow of Control for PDE Solution
[Diagram, extended with other tools]
Main Routine
PETSc code: Timestepping Solvers (TS), Nonlinear Solvers (SNES), Linear Solvers (SLES) with PC and KSP
User code: Application Initialization, Function Evaluation, Jacobian Evaluation, PostProcessing
Other Tools: Overture, SAMRAI, SPAI, ILUDTP, PVODE

14 of 132

Levels of Abstraction in Mathematical Software
Application-specific interface
  Programmer manipulates objects associated with the application
High-level mathematics interface
  Programmer manipulates mathematical objects, such as PDEs and boundary conditions
Algorithmic and discrete mathematics interface (PETSc emphasis)
  Programmer manipulates mathematical objects (sparse matrices, nonlinear equations), algorithmic objects (solvers) and discrete geometry (meshes)
Low-level computational kernels
  e.g., BLAS-type operations

15 of 132

Basic PETSc Components


Data Objects
Vec (vectors) and Mat (matrices)
Viewers

Solvers
Linear Systems
Nonlinear Systems
Timestepping

Data Layout and Ghost Values


Structured Mesh
Unstructured Mesh

16 of 132

PETSc Programming Aids


Correctness Debugging
Automatic generation of tracebacks
Detecting memory corruption and leaks
Optional user-defined error handlers

Performance Debugging
Integrated profiling using -log_summary
Profiling by stages of an application
User-defined events

17 of 132

The PETSc Programming Model


Goals
  Portable, runs everywhere
  Performance
  Scalable parallelism
Approach
  Distributed memory, shared-nothing
    Requires only a compiler (single node or processor)
    Access to data on remote machines through MPI
  Can still exploit compiler-discovered parallelism on each node (e.g., SMP)
  Hide within parallel objects the details of the communication
  User orchestrates communication at a higher abstract level than message passing

18 of 132

Collectivity
MPI communicators (MPI_Comm) specify collectivity (processors involved in a computation)
All PETSc creation routines for solver and data objects are collective with respect to a communicator, e.g.,
  VecCreate(MPI_Comm comm, int m, int M, Vec *x)
Some operations are collective, while others are not, e.g.,
  collective: VecNorm( )
  not collective: VecGetLocalSize( )
If a sequence of collective routines is used, they must be called in the same order on each processor

19 of 132

Hello World
#include "petsc.h"
int main( int argc, char *argv[] )
{
  PetscInitialize( &argc, &argv, NULL, NULL );
  PetscPrintf( PETSC_COMM_WORLD, "Hello World\n" );
  PetscFinalize();
  return 0;
}

20 of 132

Hello World (Fortran)


      program main
      integer ierr, rank
#include "include/finclude/petsc.h"
      call PetscInitialize( PETSC_NULL_CHARACTER, ierr )
      call MPI_Comm_rank( PETSC_COMM_WORLD, rank, ierr )
      if (rank .eq. 0) then
         print *, 'Hello World'
      endif
      call PetscFinalize(ierr)
      end

21 of 132

Fancier Hello World


#include "petsc.h"
int main( int argc, char *argv[] )
{
  int rank;
  PetscInitialize( &argc, &argv, NULL, NULL );
  MPI_Comm_rank( PETSC_COMM_WORLD, &rank );
  PetscSynchronizedPrintf( PETSC_COMM_WORLD,
                           "Hello World from %d\n", rank );
  /* queued messages are printed, in rank order, at the flush */
  PetscSynchronizedFlush( PETSC_COMM_WORLD );
  PetscFinalize();
  return 0;
}

22 of 132

Solver Definitions: For Our Purposes


Explicit: Field variables are updated using
neighbor information (no global linear or
nonlinear solves)
Semi-implicit: Some subsets of variables (e.g.,
pressure) are updated with global solves
Implicit: Most or all variables are updated in a
single global linear or nonlinear solve

23 of 132

Focus On Implicit Methods


Explicit and semi-implicit are easier cases
No direct PETSc support for
  ADI-type schemes
  spectral methods
  particle-type methods

24 of 132

Numerical Methods Paradigm


Encapsulate the latest numerical algorithms in a
consistent, application-friendly manner
Use mathematical and algorithmic objects, not
low-level programming language objects
Application code focuses on mathematics of the
global problem, not parallel programming details

25 of 132

Data Objects
Vectors (Vec)
  focus: field data arising in nonlinear PDEs
Matrices (Mat)
  focus: linear operators arising in nonlinear PDEs (i.e., Jacobians)
beginner: Object creation
beginner: Object assembly
intermediate: Setting options
intermediate: Viewing
advanced: User-defined customizations
tutorial outline: data objects

26 of 132

Vectors
Fundamental objects for storing field solutions, right-hand sides, etc.
VecCreateMPI(...,Vec *)
  MPI_Comm - processors that share the vector
  number of elements local to this processor
  total number of elements
Each process locally owns a subvector of contiguously numbered global indices
[Figure: a vector split into contiguous pieces owned by proc 0 through proc 4]
beginner
data objects: vectors
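The contiguous ownership described on this slide can be sketched in plain C, without PETSc or MPI. The helper name `owned_range` is hypothetical, and the even split that hands the leftover entries to the lowest ranks is an assumed layout for illustration; it is not claimed to be the rule PETSc applies.

```c
#include <assert.h>

/* Half-open range [start, end) of global indices owned by one process. */
typedef struct { int start; int end; } Range;

/* Split n_global indices into contiguous blocks over nprocs processes.
   Assumed layout: the first (n_global % nprocs) ranks get one extra entry. */
Range owned_range(int n_global, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && n_global >= 0);
    int base  = n_global / nprocs;   /* entries every rank receives */
    int extra = n_global % nprocs;   /* leftover entries to hand out */
    Range r;
    r.start = rank * base + (rank < extra ? rank : extra);
    r.end   = r.start + base + (rank < extra ? 1 : 0);
    return r;
}
```

For 12 entries on 5 processes this gives local sizes 3, 3, 2, 2, 2, so every global index belongs to exactly one process and each process holds one contiguous block.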

27 of 132

Vector Assembly
VecSetValues(Vec,...)
  number of entries to insert/add
  indices of entries
  values to add
  mode: [INSERT_VALUES,ADD_VALUES]
VecAssemblyBegin(Vec)
VecAssemblyEnd(Vec)
beginner
data objects: vectors
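The two assembly modes differ only in whether a new value overwrites or accumulates into an entry. The following is a minimal serial sketch of that distinction on a plain array; `set_values`, `SetMode`, `INSERT_MODE`, and `ADD_MODE` are hypothetical stand-ins, not PETSc names, and no parallel communication is modeled.

```c
#include <assert.h>

typedef enum { INSERT_MODE, ADD_MODE } SetMode;  /* stand-ins for the two modes */

/* Apply n (index, value) pairs to the array v of length len. */
void set_values(double *v, int len, int n, const int *idx,
                const double *vals, SetMode mode)
{
    for (int k = 0; k < n; k++) {
        assert(idx[k] >= 0 && idx[k] < len);
        if (mode == INSERT_MODE)
            v[idx[k]] = vals[k];      /* overwrite the entry */
        else
            v[idx[k]] += vals[k];     /* accumulate into the entry */
    }
}
```

Accumulation is the natural mode for finite element assembly, where several elements contribute to the same entry.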

28 of 132

Parallel Matrix and Vector Assembly


Processors may generate any entries in vectors and matrices
Entries need not be generated on the processor on which they ultimately will be stored
PETSc automatically moves data during the assembly process if necessary
beginner
data objects: vectors and matrices

29 of 132

Selected Vector Operations


Function Name                                Operation
VecAXPY(Scalar *a, Vec x, Vec y)             y = y + a*x
VecAYPX(Scalar *a, Vec x, Vec y)             y = x + a*y
VecWAXPY(Scalar *a, Vec x, Vec y, Vec w)     w = a*x + y
VecScale(Scalar *a, Vec x)                   x = a*x
VecCopy(Vec x, Vec y)                        y = x
VecPointwiseMult(Vec x, Vec y, Vec w)        w_i = x_i*y_i
VecMax(Vec x, int *idx, double *r)           r = max x_i
VecShift(Scalar *s, Vec x)                   x_i = s + x_i
VecAbs(Vec x)                                x_i = |x_i|
VecNorm(Vec x, NormType type, double *r)     r = ||x||
beginner
data objects: vectors
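The formulas in the table are easy to confuse, so here is a serial sketch of three of them on plain arrays. The function names `axpy`, `aypx`, `waxpy`, and `norm1` are hypothetical and operate on `double` arrays rather than `Vec` objects; the 1-norm is chosen as one possible norm type so that no math library is needed.

```c
#include <assert.h>

/* y = y + a*x */
void axpy(int n, double a, const double *x, double *y)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) y[i] += a * x[i];
}

/* y = x + a*y */
void aypx(int n, double a, const double *x, double *y)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) y[i] = x[i] + a * y[i];
}

/* w = a*x + y */
void waxpy(int n, double a, const double *x, const double *y, double *w)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) w[i] = a * x[i] + y[i];
}

/* r = sum of |x_i| (the 1-norm) */
double norm1(int n, const double *x)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += (x[i] < 0.0) ? -x[i] : x[i];
    return s;
}
```

In a parallel setting each process would run such loops over its local entries only, and a norm would additionally need a global reduction, which is why the Collectivity slide lists VecNorm( ) as collective.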

30 of 132

Simple Example Programs


Location: petsc/src/sys/examples/tutorials/
  E ex2.c - synchronized printing
Location: petsc/src/vec/examples/tutorials/
  E ex1.c, ex1f.F, ex1f90.F - basic vector routines
  E ex3.c, ex3f.F - parallel vector layout
And many more examples ...
E - on-line exercise
beginner
data objects: vectors

31 of 132

Sparse Matrices
Fundamental objects for storing linear operators
(e.g., Jacobians)
MatCreateMPIAIJ(...,Mat *)
  MPI_Comm - processors that share the matrix
  number of local rows and columns
  number of global rows and columns
  optional storage pre-allocation information
beginner
data objects: matrices
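AIJ is the compressed sparse row format named on the Numerical Components slide. The following is a minimal serial sketch of a matrix-vector product in that layout; the function name `csr_matvec` and the array names `rowptr`, `col`, and `val` are assumptions for illustration and do not reflect PETSc's internal data structures.

```c
#include <assert.h>

/* y = A*x for an m-row matrix in compressed sparse row form:
   row i holds the entries val[rowptr[i]] .. val[rowptr[i+1]-1],
   located in the columns col[rowptr[i]] .. col[rowptr[i+1]-1]. */
void csr_matvec(int m, const int *rowptr, const int *col,
                const double *val, const double *x, double *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[col[k]];
        y[i] = sum;
    }
}
```

Only nonzeros are stored and visited, which is the reason the creation routine accepts pre-allocation information: knowing the number of nonzeros per row in advance avoids repeated reallocation during assembly.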

32 of 132

Parallel Matrix Distribution


Each process locally owns a submatrix of contiguously numbered global rows.
[Figure: matrix rows divided into contiguous blocks owned by proc 0 through proc 4, with the locally owned rows of proc 3 marked]
MatGetOwnershipRange(Mat A, int *rstart, int *rend)
  rstart: first locally owned row of global matrix
  rend - 1: last locally owned row of global matrix
beginner
data objects: matrices

33 of 132

Matrix Assembly
MatSetValues(Mat,...)
  number of rows to insert/add
  indices of rows and columns
  number of columns to insert/add
  values to add
  mode: [INSERT_VALUES,ADD_VALUES]
MatAssemblyBegin(Mat)
MatAssemblyEnd(Mat)
beginner
data objects: matrices

34 of 132

Blocked Sparse Matrices


For multi-component PDEs
MatCreateMPIBAIJ(...,Mat *)
  MPI_Comm - processors that share the matrix
  block size
  number of local rows and columns
  number of global rows and columns
  optional storage pre-allocation information
beginner
data objects: matrices

35 of 132

Blocking: Performance Benefits


More issues and details discussed in Performance Tuning section
[Bar chart: MFlop/sec (axis ticks 20 to 100) for matrix-vector products and triangular solves, comparing Basic and Blocked storage. 3D compressible Euler code, block size 5, IBM Power2]
beginner
data objects: matrices

36 of 132

Viewers
beginner: Printing information about solver and data objects
beginner: Visualization of field and matrix data
intermediate: Binary output of vector and matrix data
tutorial outline: viewers

37 of 132

Viewer Concepts
Information about PETSc objects
  runtime choices for solvers, nonzero info for matrices, etc.
Data for later use in restarts or external tools
  vector fields, matrix contents
  various formats (ASCII, binary)
Visualization
  simple x-window graphics
    vector fields
    matrix sparsity structure
beginner
viewers

38 of 132

Viewing Vector Fields


VecView(Vec x,Viewer v);
Default viewers
  ASCII (sequential): VIEWER_STDOUT_SELF
  ASCII (parallel): VIEWER_STDOUT_WORLD
  X-windows: VIEWER_DRAW_WORLD
Default ASCII formats
  VIEWER_FORMAT_ASCII_DEFAULT
  VIEWER_FORMAT_ASCII_MATLAB
  VIEWER_FORMAT_ASCII_COMMON
  VIEWER_FORMAT_ASCII_INFO
  etc.
[Figure: Solution components, using runtime option -snes_vecmonitor. Panels: velocity: u, velocity: v, vorticity, temperature: T]
beginner
viewers

39 of 132

Viewing Matrix Data


MatView(Mat A, Viewer v);
Runtime options available after matrix assembly
  -mat_view_info: info about matrix assembly
  -mat_view_draw: sparsity structure
  -mat_view: data in ASCII
  etc.
beginner
viewers

40 of 132

Solvers: Usage Concepts


Solver Classes
  Linear (SLES)
  Nonlinear (SNES)
  Timestepping (TS)
Usage Concepts (important concepts)
  Context variables
  Solver options
  Callback routines
  Customization
tutorial outline: solvers

41 of 132

Linear PDE Solution


[Diagram]
Main Routine
PETSc code: Linear Solvers (SLES) with PC and KSP; Solve Ax = b
User code: Application Initialization, Evaluation of A and b, PostProcessing
beginner
solvers: linear

42 of 132

Linear Solvers
Goal: Support the solution of linear systems, Ax=b, particularly for sparse, parallel problems arising within PDE-based models
User provides:
  Code to evaluate A, b
beginner
solvers: linear

43 of 132

Sample Linear Application: Exterior Helmholtz Problem
$\nabla^2 u + k^2 u = 0$
$\lim_{r \to \infty} r^{1/2} \left( \frac{\partial u}{\partial r} + iku \right) = 0$
[Figure: Solution Components. Panels: Real, Imaginary]
Collaborators: H. M. Atassi, D. E. Keyes, L. C. McInnes, R. Susan-Resiga
beginner
solvers: linear

44 of 132

Helmholtz: The Linear System

Logically regular grid, parallelized with DAs
Finite element discretization (bilinear quads)
Nonreflecting exterior BC (via DtN map)
Matrix sparsity structure (option: -mat_view_draw)
[Figure: sparsity patterns. Panels: Natural ordering, Close-up, Nested dissection ordering]
beginner
solvers: linear

45 of 132

Linear Solvers (SLES)


SLES: Scalable Linear Equations Solvers
beginner: Application code interface
beginner: Choosing the solver
intermediate: Setting algorithmic options
intermediate: Viewing the solver
intermediate: Determining and monitoring convergence
intermediate: Providing a different preconditioner matrix
advanced: Matrix-free solvers
advanced: User-defined customizations
tutorial outline: solvers: linear

46 of 132

Context Variables
Are the key to solver organization
Contain the complete state of an algorithm, including
  parameters (e.g., convergence tolerance)
  functions that run the algorithm (e.g., convergence monitoring routine)
  information about the current state (e.g., iteration number)
beginner
solvers: linear

47 of 132

Creating the SLES Context


C/C++ version
  ierr = SLESCreate(MPI_COMM_WORLD,&sles);
Fortran version
  call SLESCreate(MPI_COMM_WORLD,sles,ierr)
Provides an identical user interface for all linear solvers
  uniprocessor and parallel
  real and complex numbers
beginner
solvers: linear

48 of 132

Linear Solvers in PETSc 2.0


Krylov Methods (KSP)
  Conjugate Gradient
  GMRES
  CG-Squared
  Bi-CG-stab
  Transpose-free QMR
  etc.
Preconditioners (PC)
  Block Jacobi
  Overlapping Additive Schwarz
  ICC, ILU via BlockSolve95
  ILU(k), LU (sequential only)
  etc.
beginner
solvers: linear

49 of 132

Basic Linear Solver Code (C/C++)


SLES sles;    /* linear solver context */
Mat  A;       /* matrix */
Vec  x, b;    /* solution, RHS vectors */
int  n, its;  /* problem dimension, number of iterations */

MatCreate(MPI_COMM_WORLD,n,n,&A);
/* assemble matrix */
VecCreate(MPI_COMM_WORLD,n,&x);
VecDuplicate(x,&b);
/* assemble RHS vector */
SLESCreate(MPI_COMM_WORLD,&sles);
SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN);
SLESSetFromOptions(sles);
SLESSolve(sles,b,x,&its);

beginner
solvers: linear

50 of 132

Basic Linear Solver Code (Fortran)


      SLES     sles
      Mat      A
      Vec      x, b
      integer  n, its, ierr

      call MatCreate(MPI_COMM_WORLD,n,n,A,ierr)
      call VecCreate(MPI_COMM_WORLD,n,x,ierr)
      call VecDuplicate(x,b,ierr)
C     then assemble matrix and right-hand-side vector
      call SLESCreate(MPI_COMM_WORLD,sles,ierr)
      call SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN,ierr)
      call SLESSetFromOptions(sles,ierr)
      call SLESSolve(sles,b,x,its,ierr)

beginner
solvers: linear

51 of 132

Setting Solver Options at Runtime

-ksp_type [cg,gmres,bcgs,tfqmr,...]
-pc_type [lu,ilu,jacobi,sor,asm,...]
-ksp_max_it <max_iters>
-ksp_gmres_restart <restart>
-pc_asm_overlap <overlap>
-pc_asm_type [basic,restrict,interpolate,none]
etc ...
beginner intermediate
solvers: linear

52 of 132

Linear Solvers: Monitoring Convergence

-ksp_monitor       - Prints preconditioned residual norm
-ksp_xmonitor      - Plots preconditioned residual norm
-ksp_truemonitor   - Prints true residual norm || b-Ax ||
-ksp_xtruemonitor  - Plots true residual norm || b-Ax ||
User-defined monitors, using callbacks
beginner intermediate advanced
solvers: linear
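The two monitored quantities can differ considerably, which a small serial sketch makes concrete. It assumes a Jacobi (diagonal) preconditioner applied from the left and a dense row-major matrix; the function name `residual_norms_sq` is hypothetical, and squared norms are returned so that the example needs no math library.

```c
#include <assert.h>

/* Compute the true residual r = b - A*x and the Jacobi-preconditioned
   residual z = D^{-1} r, where D is the diagonal of the dense n-by-n
   row-major matrix A. The squared 2-norms of r and z are returned. */
void residual_norms_sq(int n, const double *A, const double *x,
                       const double *b, double *true_sq, double *prec_sq)
{
    *true_sq = 0.0;
    *prec_sq = 0.0;
    for (int i = 0; i < n; i++) {
        double r = b[i];
        for (int j = 0; j < n; j++)
            r -= A[i * n + j] * x[j];
        assert(A[i * n + i] != 0.0);   /* Jacobi needs a nonzero diagonal */
        double z = r / A[i * n + i];
        *true_sq += r * r;
        *prec_sq += z * z;
    }
}
```

For A = diag(2, 4), x = 0, and b = (2, 4), the true residual has squared norm 20 while the preconditioned residual has squared norm 2, so a tolerance met in one norm is not necessarily met in the other.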

53 of 132

Helmholtz: Scalability
128x512 grid, wave number = 13, IBM SP
GMRES(30)/Restricted Additive Schwarz
1 block per proc, 1-cell overlap, ILU(1) subdomain solver
Procs   Iterations   Time (Sec)   Speedup
1       221          163.01       -
2       222          81.06        2.0
4       224          37.36        4.4
8       228          19.49        8.4
16      229          10.85        15.0
32      230          6.37         25.6
beginner
solvers: linear
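The speedup column follows from the timings as the one-processor time divided by the p-processor time. The sketch below spells out that arithmetic; the function names `speedup` and `efficiency` are hypothetical, and parallel efficiency (speedup divided by processor count) is a derived quantity that the slide itself does not report.

```c
#include <assert.h>

/* Speedup relative to the one-processor run. */
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency: speedup divided by the processor count. */
double efficiency(double t1, double tp, int procs)
{
    assert(procs > 0);
    return speedup(t1, tp) / procs;
}
```

With the timings in the table, `speedup(163.01, 6.37)` is about 25.6, so the efficiency on 32 processors is about 0.80, while the iteration count grows only from 221 to 230.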

54 of 132

SLES: Review of Basic Usage

SLESCreate( )          - Create SLES context
SLESSetOperators( )    - Set linear operators
SLESSetFromOptions( )  - Set runtime solver options for [SLES, KSP, PC]
SLESSolve( )           - Run linear solver
SLESView( )            - View solver options actually used at runtime (alternative: -sles_view)
SLESDestroy( )         - Destroy solver
beginner
solvers: linear

55 of 132

SLES: Review of Selected Preconditioner Options


Functionality                  Procedural Interface    Runtime Option
Set preconditioner type        PCSetType( )            -pc_type [lu,ilu,jacobi,sor,asm,...]
Set level of fill for ILU      PCILULevels( )          -pc_ilu_levels <levels>
Set SOR iterations             PCSORSetIterations( )   -pc_sor_its <its>
Set SOR parameter              PCSORSetOmega( )        -pc_sor_omega <omega>
Set additive Schwarz variant   PCASMSetType( )         -pc_asm_type [basic,restrict,interpolate,none]
Set subdomain solver options   PCGetSubSLES( )         -sub_pc_type <pctype> -sub_ksp_type <ksptype> -sub_ksp_rtol <rtol>
And many more options...
beginner intermediate
solvers: linear: preconditioners

56 of 132

SLES: Review of Selected Krylov Method Options


Functionality                             Procedural Interface             Runtime Option
Set Krylov method                         KSPSetType( )                    -ksp_type [cg,gmres,bcgs,tfqmr,cgs,...]
Set monitoring routine                    KSPSetMonitor( )                 -ksp_monitor, -ksp_xmonitor, -ksp_truemonitor, -ksp_xtruemonitor
Set convergence tolerances                KSPSetTolerances( )              -ksp_rtol <rt> -ksp_atol <at> -ksp_max_it <its>
Set GMRES restart parameter               KSPGMRESSetRestart( )            -ksp_gmres_restart <restart>
Set orthogonalization routine for GMRES   KSPGMRESSetOrthogonalization( )  -ksp_unmodifiedgramschmidt -ksp_irorthog
And many more options...
beginner intermediate
solvers: linear: Krylov methods

57 of 132

SLES: Example Programs


Location: petsc/src/sles/examples/tutorials/
  ex1.c, ex1f.F - basic uniprocessor codes
  E ex23.c - basic parallel code
  ex11.c - using complex numbers
  ex4.c - using different linear system and preconditioner matrices
  ex9.c - repeatedly solving different linear systems
  E ex22.c - 3D Laplacian using multigrid
  ex15.c - setting a user-defined preconditioner
And many more examples ...
E - on-line exercise
beginner intermediate advanced
solvers: linear

58 of 132

Nonlinear Solvers (SNES)


SNES: Scalable Nonlinear Equations Solvers
beginner: Application code interface
beginner: Choosing the solver
intermediate: Setting algorithmic options
intermediate: Viewing the solver
intermediate: Determining and monitoring convergence
advanced: Matrix-free solvers
advanced: User-defined customizations
tutorial outline: solvers: nonlinear

59 of 132

Nonlinear PDE Solution


[Diagram]
Main Routine
PETSc code: Nonlinear Solvers (SNES), Linear Solvers (SLES) with PC and KSP; Solve F(u) = 0
User code: Application Initialization, Function Evaluation, Jacobian Evaluation, PostProcessing
beginner
solvers: nonlinear

60 of 132

Nonlinear Solvers
Goal: For problems arising from PDEs,
support the general solution of F(u) = 0
User provides:
Code to evaluate F(u)
Code to evaluate Jacobian of F(u) (optional)
or use sparse finite difference approximation
or use automatic differentiation (coming soon!)

beginner
solvers: nonlinear
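The option of supplying only F(u) and letting the Jacobian be approximated by finite differences can be sketched for a single scalar equation. This is an illustration of the idea, not PETSc's implementation: `newton_fd`, `ResidualFn`, and `square_minus_two` are hypothetical names, the differencing step `h` is an assumed constant, and no line search or trust region is included.

```c
#include <assert.h>

typedef double (*ResidualFn)(double u, void *ctx);

/* Newton's method for a scalar equation F(u) = 0, with F'(u) approximated
   by a forward difference. Returns the number of Newton steps taken, or -1
   if the iteration did not converge within max_its steps. */
int newton_fd(ResidualFn F, void *ctx, double *u, double tol, int max_its)
{
    assert(tol > 0.0 && max_its > 0);
    const double h = 1.0e-7;                    /* assumed differencing step */
    for (int it = 0; it < max_its; it++) {
        double f = F(*u, ctx);
        if (f < tol && f > -tol) return it;     /* converged: |F(u)| < tol */
        double dF = (F(*u + h, ctx) - f) / h;   /* finite-difference Jacobian */
        if (dF == 0.0) return -1;               /* cannot take a Newton step */
        *u -= f / dF;
    }
    return -1;
}

/* Example residual: F(u) = u*u - 2, whose positive root is sqrt(2). */
double square_minus_two(double u, void *ctx)
{
    (void)ctx;                                  /* no user data needed here */
    return u * u - 2.0;
}
```

Starting from u = 1, a call such as `newton_fd(square_minus_two, 0, &u, 1e-10, 50)` drives u toward the square root of 2. For a PDE the same structure applies with u a vector and the division replaced by a linear solve.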

61 of 132

Nonlinear Solvers (SNES)


Newton-based methods, including
  Line search strategies
  Trust region approaches
  Pseudo-transient continuation
  Matrix-free variants
User can customize all phases of the solution process
beginner
solvers: nonlinear

62 of 132

Sample Nonlinear Application: Driven Cavity Problem
Velocity-vorticity formulation
Flow driven by lid and/or buoyancy
Logically regular grid, parallelized with DAs
Finite difference discretization
source code: petsc/src/snes/examples/tutorials/ex8.c
[Figure: Solution Components. Panels: velocity: u, velocity: v, vorticity, temperature: T]
Application code author: D. E. Keyes
beginner
solvers: nonlinear

63 of 132

Basic Nonlinear Solver Code (C/C++)


SNES snes;                   /* nonlinear solver context */
Mat  J;                      /* Jacobian matrix */
Vec  x, F;                   /* solution, residual vectors */
int  n, its;                 /* problem dimension, number of iterations */
ApplicationCtx usercontext;  /* user-defined application context */

...

MatCreate(MPI_COMM_WORLD,n,n,&J);
VecCreate(MPI_COMM_WORLD,n,&x);
VecDuplicate(x,&F);
SNESCreate(MPI_COMM_WORLD,SNES_NONLINEAR_EQUATIONS,&snes);
SNESSetFunction(snes,F,EvaluateFunction,usercontext);
SNESSetJacobian(snes,J,EvaluateJacobian,usercontext);
SNESSetFromOptions(snes);
SNESSolve(snes,x,&its);

beginner
solvers: nonlinear

64 of 132

Basic Nonlinear Solver Code (Fortran)


      SNES     snes
      Mat      J
      Vec      x, F
      integer  n, its, ierr

      ...

      call MatCreate(MPI_COMM_WORLD,n,n,J,ierr)
      call VecCreate(MPI_COMM_WORLD,n,x,ierr)
      call VecDuplicate(x,F,ierr)
      call SNESCreate(MPI_COMM_WORLD,
     &     SNES_NONLINEAR_EQUATIONS,snes,ierr)
      call SNESSetFunction(snes,F,EvaluateFunction,PETSC_NULL,ierr)
      call SNESSetJacobian(snes,J,EvaluateJacobian,PETSC_NULL,ierr)
      call SNESSetFromOptions(snes,ierr)
      call SNESSolve(snes,x,its,ierr)

beginner
solvers: nonlinear

65 of 132

Solvers Based on Callbacks


User provides routines to perform actions that the library requires. For example,
SNESSetFunction(SNES,...)
  uservector - vector to store function values
  userfunction - name of the user's function
  usercontext - pointer to private data for the user's function
Now, whenever the library needs to evaluate the user's nonlinear function, the solver may call the application code directly with its own local state.
usercontext: serves as an application context object. Data are handled through such opaque objects; the library never sees irrelevant application data.
important concept
beginner
solvers: nonlinear
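The callback-plus-context pattern maps directly onto C function pointers and a `void *` argument. The sketch below is library-independent: `UserFunction`, `AppCtx`, `my_function`, and `library_evaluate` are hypothetical names chosen for illustration and are not PETSc signatures.

```c
#include <assert.h>

/* Signature the "library" expects: input x, output f, opaque user context. */
typedef void (*UserFunction)(int n, const double *x, double *f, void *ctx);

/* Hypothetical application context: private data the library never inspects. */
typedef struct { double scale; int calls; } AppCtx;

/* User callback: f_i = scale * x_i, and count how often it was called. */
void my_function(int n, const double *x, double *f, void *ctx)
{
    AppCtx *user = (AppCtx *)ctx;    /* recover the application's own type */
    for (int i = 0; i < n; i++) f[i] = user->scale * x[i];
    user->calls++;
}

/* Stand-in for a library routine: it only forwards ctx to the callback. */
void library_evaluate(UserFunction fn, void *ctx, int n,
                      const double *x, double *f)
{
    assert(fn != 0);
    fn(n, x, f, ctx);
}
```

Because the library side handles the context only as `void *`, the application is free to change `AppCtx` without the library needing to know its layout.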

66 of 132

Uniform access to all linear and nonlinear solvers
-ksp_type [cg,gmres,bcgs,tfqmr,...]
-pc_type [lu,ilu,jacobi,sor,asm,...]
-snes_type [ls,tr,...]
-snes_line_search <line search method>
-sles_ls <parameters>
-snes_convergence <tolerance>
etc...
beginner intermediate
solvers: nonlinear

67 of 132

SNES: Review of Basic Usage

SNESCreate( )          - Create SNES context
SNESSetFunction( )     - Set function eval. routine
SNESSetJacobian( )     - Set Jacobian eval. routine
SNESSetFromOptions( )  - Set runtime solver options for [SNES, SLES, KSP, PC]
SNESSolve( )           - Run nonlinear solver
SNESView( )            - View solver options actually used at runtime (alternative: -snes_view)
SNESDestroy( )         - Destroy solver
beginner
solvers: nonlinear

68 of 132

SNES: Review of Selected Options


Functionality                Procedural Interface                         Runtime Option
Set nonlinear solver         SNESSetType( )                               -snes_type [ls,tr,umls,umtr,...]
Set monitoring routine       SNESSetMonitor( )                            -snes_monitor, -snes_xmonitor, ...
Set convergence tolerances   SNESSetTolerances( )                         -snes_rtol <rt> -snes_atol <at> -snes_max_its <its>
Set line search routine      SNESSetLineSearch( )                         -snes_eq_ls [cubic,quadratic,...]
View solver options          SNESView( )                                  -snes_view
Set linear solver options    SNESGetSLES( ), SLESGetKSP( ), SLESGetPC( )  -ksp_type <ksptype> -ksp_rtol <krt> -pc_type <pctype>
And many more options...
beginner intermediate
solvers: nonlinear

69 of 132

SNES: Example Programs


Location: petsc/src/snes/examples/tutorials/
  ex1.c, ex1f.F - basic uniprocessor codes
  ex4.c, ex4f.F - uniprocessor nonlinear PDE (1 DoF per node)
  E ex5.c, ex5f.F, ex5f90.F - parallel nonlinear PDE (1 DoF per node)
  E ex18.c - parallel radiative transport problem with multigrid
  E ex19.c - parallel driven cavity problem with multigrid
And many more examples ...
E - on-line exercise
beginner intermediate
solvers: nonlinear

70 of 132

Timestepping Solvers (TS) (and ODE Integrators)
beginner: Application code interface
beginner: Choosing the solver
intermediate: Setting algorithmic options
intermediate: Viewing the solver
intermediate: Determining and monitoring convergence
advanced: User-defined customizations
tutorial outline: solvers: timestepping

71 of 132

Time-Dependent PDE Solution


[Diagram]
Main Routine
PETSc code: Timestepping Solvers (TS), Nonlinear Solvers (SNES), Linear Solvers (SLES) with PC and KSP; Solve U_t = F(U,U_x,U_xx)
User code: Application Initialization, Function Evaluation, Jacobian Evaluation, PostProcessing
beginner
solvers: timestepping

72 of 132

Timestepping Solvers
Goal: Support the (real and pseudo) time evolution of PDE systems
  U_t = F(U,U_x,U_xx,t)
User provides:
  Code to evaluate F(U,U_x,U_xx,t)
  Code to evaluate Jacobian of F(U,U_x,U_xx,t)
    or use sparse finite difference approximation
    or use automatic differentiation (coming soon!)
beginner
solvers: timestepping

Sample Timestepping Application: Burgers Equation
$U_t = U U_x + \epsilon U_{xx}$
$U(0,x) = \sin(2\pi x)$
$U(t,0) = U(t,1)$
beginner
solvers: timestepping

74 of 132

Actual Local Function Code


U_t = F(t,U) = U_i (U_{i+1} - U_{i-1})/(2h) + e (U_{i+1} - 2U_i + U_{i-1})/(h*h)

      do 10, i=1,localsize
         F(i) = (.5/h)*u(i)*(u(i+1)-u(i-1)) +
     &          (e/(h*h))*(u(i+1) - 2.0*u(i) + u(i-1))
 10   continue

beginner
solvers: timestepping
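The same local function can be written in C. The sketch below keeps the formula of the Fortran loop and, as an assumption made only to keep it self-contained, replaces the ghost values u(0) and u(localsize+1) with periodic wraparound on a single process, matching the boundary condition U(t,0) = U(t,1). The name `burgers_rhs` is hypothetical.

```c
#include <assert.h>

/* F_i = u_i (u_{i+1} - u_{i-1}) / (2h) + e (u_{i+1} - 2 u_i + u_{i-1}) / h^2
   on n grid points. Neighbors wrap around periodically, which stands in for
   the ghost values a parallel code would receive from adjacent processes. */
void burgers_rhs(int n, double h, double e, const double *u, double *F)
{
    assert(n >= 3 && h > 0.0);
    for (int i = 0; i < n; i++) {
        double um = u[(i + n - 1) % n];   /* left neighbor  u_{i-1} */
        double up = u[(i + 1) % n];       /* right neighbor u_{i+1} */
        F[i] = (0.5 / h) * u[i] * (up - um)
             + (e / (h * h)) * (up - 2.0 * u[i] + um);
    }
}
```

A constant field gives F = 0 everywhere, which is a quick sanity check for both difference stencils.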

75 of 132

Timestepping Solvers

Euler
Backward Euler
Pseudo-transient continuation
Interface to PVODE, a sophisticated parallel ODE solver package by Hindmarsh et al. of LLNL
  Adams
  BDF
beginner
solvers: timestepping
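The difference between the first two methods in the list can be seen on the scalar test equation u' = lambda*u. This is a sketch with hypothetical function names (`euler_step`, `backward_euler_step`); in the linear scalar case the implicit equation can be solved in closed form, whereas a PDE would need a linear or nonlinear solve at each step.

```c
#include <assert.h>

/* One explicit (forward) Euler step for u' = lambda*u. */
double euler_step(double u, double lambda, double dt)
{
    return u + dt * lambda * u;
}

/* One implicit (backward) Euler step: solve u_new = u + dt*lambda*u_new. */
double backward_euler_step(double u, double lambda, double dt)
{
    assert(1.0 - dt * lambda != 0.0);
    return u / (1.0 - dt * lambda);
}
```

With lambda = -4 and dt = 0.5, the explicit step maps u = 1 to -1, which oscillates without decaying, while the implicit step maps it to 1/3 and decays like the exact solution. This stability at large time steps is a main reason for the tutorial's focus on implicit methods.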

76 of 132

Timestepping Solvers
Allow full access to all of the PETSc
  nonlinear solvers
  linear solvers
  distributed arrays, matrix assembly tools, etc.
User can customize all phases of the solution process
beginner
solvers: timestepping

77 of 132

TS: Review of Basic Usage

TSCreate( )          - Create TS context
TSSetRHSFunction( )  - Set function eval. routine
TSSetRHSJacobian( )  - Set Jacobian eval. routine
TSSetFromOptions( )  - Set runtime solver options for [TS, SNES, SLES, KSP, PC]
TSSolve( )           - Run timestepping solver
TSView( )            - View solver options actually used at runtime (alternative: -ts_view)
TSDestroy( )         - Destroy solver
beginner
solvers: timestepping

78 of 132

TS: Review of Selected Options


Functionality                     Procedural Interface                                       Runtime Option
Set timestepping solver           TSSetType( )                                               -ts_type [euler,beuler,pseudo,...]
Set monitoring routine            TSSetMonitor( )                                            -ts_monitor, -ts_xmonitor, ...
Set timestep duration             TSSetDuration( )                                           -ts_max_steps <maxsteps> -ts_max_time <maxtime>
View solver options               TSView( )                                                  -ts_view
Set timestepping solver options   TSGetSNES( ), SNESGetSLES( ), SLESGetKSP( ), SLESGetPC( )  -snes_monitor -snes_rtol <rt> -ksp_type <ksptype> -ksp_rtol <rt> -pc_type <pctype>
And many more options...
beginner intermediate
solvers: timestepping

79 of 132

TS: Example Programs


Location: petsc/src/ts/examples/tutorials/

- ex1.c, ex1f.F - basic uniprocessor codes (time-dependent nonlinear PDE)
- E ex2.c, ex2f.F - basic parallel codes (time-dependent nonlinear PDE)
- ex3.c - uniprocessor heat equation
- ex4.c - parallel heat equation

And many more examples ...

E - on-line exercise

Mesh Definitions: For Our Purposes

- Structured: Determine neighbor relationships purely from logical I, J, K coordinates
- Semi-Structured: In well-defined regions, determine neighbor relationships purely from logical I, J, K coordinates
- Unstructured: Do not explicitly use logical I, J, K coordinates

Structured Meshes

PETSc support provided via DA objects



Semi-Structured Meshes

No explicit PETSc support
- OVERTURE-PETSc for composite meshes
- SAMRAI-PETSc for AMR

Data Layout and Ghost Values: Usage Concepts

Managing field data layout and required ghost values is the key to high performance of most PDE-based parallel programs.

Mesh Types
- Structured: DA objects
- Unstructured: VecScatter objects

Usage Concepts
- Geometric data
- Data structure creation
- Ghost point updates
- Local numerical computation

Ghost Values
[Figure: mesh partitioned among processes; legend: Local node, Ghost node]

Ghost values: To evaluate a local function f(x), each process requires its local portion of the vector x as well as its ghost values, or bordering portions of x that are owned by neighboring processes.
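
As a rough illustration of which entries count as ghost values, the sketch below computes the ghosted index range for one process on a nonperiodic 1D grid. The function name and the half-open range convention are assumptions for this example; PETSc's own data structures handle this bookkeeping internally.

```c
#include <assert.h>

/* A process owns the half-open global index range [start, end) of a
   nonperiodic 1D grid with n_global points. With stencil width sw it needs
   the range [*gstart, *gend), clipped at the physical boundaries. Entries in
   that range but outside [start, end) are its ghost values. */
void ghosted_range(int n_global, int start, int end, int sw,
                   int *gstart, int *gend)
{
    assert(0 <= start && start <= end && end <= n_global && sw >= 0);
    *gstart = (start - sw < 0) ? 0 : start - sw;
    *gend   = (end + sw > n_global) ? n_global : end + sw;
}
```

For example, a process owning indices 3 through 5 of a 10-point grid with stencil width 1 also needs entries 2 and 6 from its neighbors.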

Communication and Physical Discretization


                              structured meshes          unstructured meshes
Geometric Data                stencil [implicit]         elements, edges, vertices
Data Structure Creation       DACreate( )                VecScatterCreate( )
Ghost Point Data Structures   DA, AO                     VecScatter, AO
Ghost Point Updates           DAGlobalToLocal( )         VecScatter( )
Local Numerical Computation   Loops over I,J,K indices   Loops over entities

Communication: data structure creation, ghost point data structures, and ghost point updates.

DA: Parallel Data Layout and Ghost Values for Structured Meshes

- Local and global indices (beginner)
- Local and global vectors (beginner)
- DA creation (beginner)
- Ghost point updates (intermediate)
- Viewing (intermediate)

Communication and Physical Discretization: Structured Meshes

Geometric Data                stencil [implicit]
Data Structure Creation       DACreate( )
Ghost Point Data Structures   DA, AO
Ghost Point Updates           DAGlobalToLocal( )
Local Numerical Computation   Loops over I,J,K indices

Global and Local Representations


[Figure: global and local representations of a partitioned mesh; legend: Local node, Ghost node]

- Global: each process stores a unique local set of vertices (and each vertex is owned by exactly one process)
- Local: each process stores a unique local set of vertices as well as ghost nodes from neighboring processes

Logically Regular Meshes


DA - Distributed Array: object containing information about vector layout across the processes and communication of ghost values
- Form a DA
    DACreateXX(...,DA *)
- Update ghostpoints
    DAGlobalToLocalBegin(DA,...)
    DAGlobalToLocalEnd(DA,...)

Distributed Arrays
Data layout and ghost values

[Figure: two panels, one for a box-type stencil and one for a star-type stencil, each showing Proc 0 with neighboring Proc 1 and Proc 10]
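
A box-type stencil reaches diagonal neighbors, so a process also needs the corner blocks of ghost values; a star-type stencil reaches only along the coordinate directions. The sketch below counts the ghost values an interior process would need to receive in 2D under that reading. The function names are hypothetical, and the formulas assume a process with neighbors on all sides.

```c
#include <assert.h>

/* Ghost values needed around an nx-by-ny local block with stencil width sw,
   for a process that has neighbors on every side. */

/* Box stencil: the full frame of width sw, including the four corners. */
int ghost_count_box(int nx, int ny, int sw)
{
    assert(nx > 0 && ny > 0 && sw >= 0);
    return (nx + 2 * sw) * (ny + 2 * sw) - nx * ny;
}

/* Star stencil: only the four edge strips, no corners. */
int ghost_count_star(int nx, int ny, int sw)
{
    assert(nx > 0 && ny > 0 && sw >= 0);
    return 2 * sw * (nx + ny);
}
```

The difference between the two counts is the four sw-by-sw corner blocks, which is why a star stencil can reduce communication.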

Vectors and DAs


The DA object contains information about the data layout and ghost values, but not the actual field data, which is contained in PETSc vectors
- Global vector: parallel
  - each process stores a unique local portion
  - DACreateGlobalVector(DA da,Vec *gvec);
- Local work vector: sequential
  - each processor stores its local portion plus ghost values
  - DACreateLocalVector(DA da,Vec *lvec);
  - uses natural local numbering of indices (0,1,...,nlocal-1)

DACreate1d(...,*DA)
- MPI_Comm - processors containing array
- DA_STENCIL_[BOX,STAR]
- DA_[NONPERIODIC,XPERIODIC]
- number of grid points in x-direction
- degrees of freedom per node
- stencil width
- ...
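
Once the number of grid points and the number of processes are known, each process is assigned a contiguous range of points. The sketch below shows one common way to split the points as evenly as possible. It is an illustration with a hypothetical function name; the source does not specify how the DA routines distribute points.

```c
#include <assert.h>

/* Split m grid points among nproc processes as evenly as possible: the first
   (m % nproc) ranks receive one extra point. Returns the half-open range
   [*start, *end) owned by the given rank. */
void ownership_range(int m, int nproc, int rank, int *start, int *end)
{
    assert(m >= 0 && nproc > 0 && 0 <= rank && rank < nproc);
    int base  = m / nproc;
    int extra = m % nproc;
    *start = rank * base + (rank < extra ? rank : extra);
    *end   = *start + base + (rank < extra ? 1 : 0);
}
```

With 10 points on 3 processes this gives ranges of 4, 3, and 3 points.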


DACreate2d(...,*DA)
- DA_[NON,X,Y,XY]PERIODIC
- number of grid points in x- and y-directions
- processors in x- and y-directions
- degrees of freedom per node
- stencil width
- ...

And similarly for DACreate3d()

Updating the Local Representation


Two-step process that enables overlapping computation and communication

    DAGlobalToLocalBegin(DA,
                         Vec global_vec,
                         INSERT_VALUES or ADD_VALUES,
                         Vec local_vec);
    DAGlobalToLocalEnd(DA,...)
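
The result of a ghost point update can be pictured with a serial stand-in: the local array receives the owned entries plus the bordering entries from the global array. The sketch below does this for a periodic 1D grid. It shows only the data movement, not the split into a Begin and End phase, and the function name is hypothetical.

```c
#include <assert.h>

/* Serial stand-in for a ghost update on a periodic 1D grid: copy the owned
   range [start, end) of global[] plus sw ghost values on each side into
   local[], which must hold (end - start) + 2*sw entries. */
void global_to_local_1d(int n_global, const double *global,
                        int start, int end, int sw, double *local)
{
    assert(n_global > 0 && 0 <= start && start <= end && end <= n_global);
    assert(sw >= 0);
    int n_local = (end - start) + 2 * sw;
    for (int j = 0; j < n_local; ++j) {
        int g = start - sw + j;
        g = ((g % n_global) + n_global) % n_global;  /* periodic wrap */
        local[j] = global[g];
    }
}
```

In the parallel case, useful local work that does not touch ghost values can be placed between the Begin and End calls.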


Unstructured Meshes
Setting up communication patterns is much more complicated than the structured case due to
- mesh dependence
- discretization dependence

Sample Differences Among Discretizations

- Cell-centered
- Vertex-centered
- Cell and vertex centered (e.g., staggered grids)
- Mixed triangles and quadrilaterals

Communication and Physical Discretization


                              structured mesh            unstructured mesh
Geometric Data                stencil [implicit]         elements, edges, vertices
Data Structure Creation       DACreate( )                VecScatterCreate( )
Ghost Point Data Structures   DA, AO                     VecScatter, AO
Ghost Point Updates           DAGlobalToLocal( )         VecScatter( )
Local Numerical Computation   Loops over I,J,K indices   Loops over entities

Communication: data structure creation, ghost point data structures, and ghost point updates.

Driven Cavity Model


Example code: petsc/src/snes/examples/tutorials/ex8.c

Solution Components
- Velocity-vorticity formulation, with flow driven by lid and/or buoyancy
- Finite difference discretization with 4 DoF per mesh point: [u, v, ω, T]
  - velocity: u
  - velocity: v
  - vorticity: ω
  - temperature: T

Driven Cavity Program


- Part A: Parallel data layout
- Part B: Nonlinear solver creation, setup, and usage
- Part C: Nonlinear function evaluation
  - ghost point updates
  - local function computation
- Part D: Jacobian evaluation
  - default colored finite differencing approximation
- Experimentation
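
The idea behind colored finite differencing can be sketched for a tridiagonal Jacobian: columns that never share a nonzero row get the same color and are perturbed together, so three extra function evaluations recover every entry. The code below is an illustrative serial sketch with hypothetical names, not the PETSc implementation; `laplace_residual` is an example function included only so the sketch can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*ResidualFn)(size_t n, const double *u, double *f);

/* Approximate a tridiagonal Jacobian J[i][j] = dF_i/du_j using 3 colors.
   Columns j, j+3, j+6, ... touch disjoint rows, so they share one perturbed
   evaluation. Outputs: lower[i] = J[i][i-1], diag[i] = J[i][i],
   upper[i] = J[i][i+1]; lower[0] and upper[n-1] are not written.
   Returns 0 on success, -1 if scratch memory cannot be allocated. */
int fd_jacobian_tridiag(ResidualFn F, size_t n, const double *u, double eps,
                        double *lower, double *diag, double *upper)
{
    assert(n > 0 && eps != 0.0);
    double *w = malloc(3 * n * sizeof *w);
    if (!w) return -1;
    double *up = w, *f0 = w + n, *f1 = w + 2 * n;
    F(n, u, f0);                                   /* base evaluation */
    for (size_t color = 0; color < 3; ++color) {
        for (size_t j = 0; j < n; ++j)             /* perturb one color */
            up[j] = u[j] + ((j % 3 == color) ? eps : 0.0);
        F(n, up, f1);
        for (size_t j = color; j < n; j += 3) {    /* column j: rows j-1..j+1 */
            if (j > 0)     upper[j - 1] = (f1[j - 1] - f0[j - 1]) / eps;
            diag[j] = (f1[j] - f0[j]) / eps;
            if (j + 1 < n) lower[j + 1] = (f1[j + 1] - f0[j + 1]) / eps;
        }
    }
    free(w);
    return 0;
}

/* Example function: F_i = 2u_i - u_{i-1} - u_{i+1}, zero outside the grid. */
void laplace_residual(size_t n, const double *u, double *f)
{
    for (size_t i = 0; i < n; ++i) {
        double left  = (i > 0) ? u[i - 1] : 0.0;
        double right = (i + 1 < n) ? u[i + 1] : 0.0;
        f[i] = 2.0 * u[i] - left - right;
    }
}
```

The number of function evaluations depends on the number of colors, not on the number of unknowns, which is what makes the approach practical for sparse Jacobians.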

Driven Cavity Solution Approach


[Figure: the Main Routine calls the PETSc Nonlinear Solvers (SNES) to solve F(u) = 0, which in turn use the Linear Solvers (SLES: PC and KSP). Surrounding boxes: Application Initialization, Function Evaluation, Jacobian Evaluation, PostProcessing. Labels A through D mark the program parts; the legend distinguishes user code from PETSc code.]

Driven Cavity: Running the program (1)

Matrix-free Jacobian approximation with no preconditioning (via -snes_mf) does not use explicit Jacobian evaluation
- 1 processor (thermally-driven flow):
    mpirun -np 1 ex8 -snes_mf -snes_monitor -grashof 1000.0 -lidvelocity 0.0
- 2 processors, view DA (and pausing for mouse input):
    mpirun -np 2 ex8 -snes_mf -snes_monitor -da_view_draw -draw_pause -1
- View contour plots of converging iterates:
    mpirun ex8 -snes_mf -snes_monitor -snes_vecmonitor
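
A matrix-free method never forms the Jacobian; it only needs the action of the Jacobian on a vector, which can be approximated by differencing the function. The sketch below shows that idea in plain C. The names are hypothetical and the step size h is supplied by the caller; how PETSc chooses the differencing parameter is not covered by the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*ResidualFn)(size_t n, const double *u, double *f);

/* Matrix-free Jacobian action: jv ~= (F(u + h*v) - F(u)) / h.
   jv must not alias u or v. Returns 0 on success, -1 if scratch memory
   cannot be allocated. */
int jacobian_vector_product(ResidualFn F, size_t n, const double *u,
                            const double *v, double h, double *jv)
{
    assert(n > 0 && h != 0.0);
    double *w = malloc(2 * n * sizeof *w);
    if (!w) return -1;
    double *up = w, *f0 = w + n;
    for (size_t i = 0; i < n; ++i) up[i] = u[i] + h * v[i];
    F(n, u, f0);                 /* F(u) */
    F(n, up, jv);                /* F(u + h*v) */
    for (size_t i = 0; i < n; ++i) jv[i] = (jv[i] - f0[i]) / h;
    free(w);
    return 0;
}

/* Example function for illustration: F_i(u) = u_i^2, so J = diag(2*u_i). */
void square_residual(size_t n, const double *u, double *f)
{
    for (size_t i = 0; i < n; ++i) f[i] = u[i] * u[i];
}
```

Each Krylov iteration then costs one extra function evaluation instead of a matrix-vector product with a stored Jacobian.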


Debugging and Error Handling

- Automatic generation of tracebacks (beginner)
- Detecting memory corruption and leaks (beginner)
- Optional user-defined error handlers (developer)

Sample Error Traceback


Breakdown in ILU factorization due to a zero pivot
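
To show what a zero pivot means, the sketch below factors a small dense matrix without pivoting and reports the row at which a pivot vanishes. It is a simplified dense illustration with a hypothetical function name; an incomplete factorization works on the sparse nonzero pattern, but it breaks down for the same reason.

```c
#include <assert.h>

/* In-place LU factorization without pivoting of a dense n-by-n matrix stored
   row-major in a[]. Returns -1 on success, or the index of the first row
   whose pivot satisfies |pivot| <= tol. */
int lu_nopivot(int n, double *a, double tol)
{
    assert(n > 0 && tol >= 0.0);
    for (int k = 0; k < n; ++k) {
        double p = a[k * n + k];
        if (p <= tol && p >= -tol) return k;        /* zero pivot: stop */
        for (int i = k + 1; i < n; ++i) {
            a[i * n + k] /= p;                      /* multiplier l_ik */
            for (int j = k + 1; j < n; ++j)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
        }
    }
    return -1;
}
```

A caller can turn the returned row index into an error message that names the failing row.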


Sample Memory Corruption Error


Sample Out-of-Memory Error


Sample Floating Point Error


Profiling and Performance Tuning

Profiling:
- Integrated profiling using -log_summary (beginner)
- Profiling by stages of an application (intermediate)
- User-defined events (intermediate)

Performance Tuning:
- Matrix optimizations (intermediate)
- Application optimizations (intermediate)
- Algorithmic tuning (advanced)

Profiling
Integrated monitoring of
- time
- floating-point performance
- memory usage
- communication

All PETSc events are logged if compiled with -DPETSC_LOG (default); can also profile application code segments
Print summary data with option: -log_summary
See supplementary handout with summary data

Conclusion
- Summary (beginner)
- New features (beginner)
- Interfacing with other packages (beginner)
- Extensibility issues (developer)
- References (beginner)

Summary
- Using callbacks to set up the problems for ODE and nonlinear solvers
- Managing data layout and ghost point communication with DAs and VecScatters
- Evaluating parallel functions and Jacobians
- Consistent profiling and error handling

Multigrid Support: Recently simplified for structured grids

Linear Example:
- 3-dim linear problem on mesh of dimensions mx x my x mz
- stencil width = sw, degrees of freedom per point = dof
- using piecewise linear interpolation
- ComputeRHS() and ComputeMatrix() are user-provided functions

    DAMG *damg;
    DAMGCreate(comm,nlevels,NULL,&damg)
    DAMGSetGrid(damg,3,DA_NONPERIODIC,DA_STENCIL_STAR,
                mx,my,mz,sw,dof)
    DAMGSetSLES(damg,ComputeRHS,ComputeMatrix)
    DAMGSolve(damg)
    solution = DAMGGetx(damg)

All standard SLES, PC and MG options apply.
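
Piecewise linear interpolation is the operation that transfers a coarse-grid correction to the next finer grid. A minimal 1D sketch follows, assuming the fine grid has 2*nc - 1 points with every other point coinciding with a coarse point; the function name and that grid relationship are assumptions for illustration.

```c
#include <assert.h>

/* Piecewise linear interpolation from nc coarse points to 2*nc - 1 fine
   points: coincident points are copied, in-between points are averaged. */
void interpolate_linear_1d(int nc, const double *coarse, double *fine)
{
    assert(nc >= 1);
    for (int i = 0; i < nc; ++i) {
        fine[2 * i] = coarse[i];
        if (i + 1 < nc)
            fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[i + 1]);
    }
}
```

In 3D the same rule is applied along each coordinate direction.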


Multigrid Support

Nonlinear Example:
- 3-dim nonlinear problem on mesh of dimensions mx x my x mz
- stencil width = sw, degrees of freedom per point = dof
- using piecewise linear interpolation
- ComputeFunc() and ComputeJacobian() are user-provided functions

    DAMG *damg;
    DAMGCreate(comm,nlevels,NULL,&damg)
    DAMGSetGrid(damg,3,DA_NONPERIODIC,DA_STENCIL_STAR,
                mx,my,mz,sw,dof)
    DAMGSetSNES(damg,ComputeFunc,ComputeJacobian)
    DAMGSolve(damg)
    solution = DAMGGetx(damg)

All standard SNES, SLES, PC and MG options apply.

Using PETSc with Other Packages: Overture

- Overture is a framework for generating discretizations of PDEs on composite grids.
- PETSc can be used as a "black box" linear equation solver (a nonlinear equation solver is under development).
- Advanced features of PETSc such as the runtime options database, profiling, debugging info, etc., can be exploited through explicit calls to the PETSc API.

Overture Essentials
- Read the grid
    CompositeGrid cg;
    getFromADataBase(cg,nameOfOGFile);
    cg.update();
- Create differential operators for the grid
    int stencilSize = pow(3,cg.numberOfDimensions())+1;
    CompositeGridOperators ops(cg);
    ops.setStencilSize(stencilSize);
- Create grid functions to hold matrix and vector values
- Attach the operators to the grid functions
- Assign values to the grid functions
- Create an Oges (Overlapping Grid Equation Solver) object to solve the system

Constructing Matrix Coefficients


Laplace operator with Dirichlet BCs:
- Make a grid function to hold the matrix coefficients:
    Range all;
    realCompositeGridFunction coeff(cg,stencilSize,all,all,all);
- Attach operators to this grid function:
    coeff.setOperators(ops);
- Designate this grid function for holding matrix coefficients:
    coeff.setIsACoefficientMatrix(TRUE,stencilSize);
- Get the coefficients for the Laplace operator:
    coeff=ops.laplacianCoefficients();
- Fill in the coefficients for the boundary conditions:
    coeff.applyBoundaryConditionCoefficients(0,0,dirichlet,allBoundaries);
- Fill in extrapolation coefficients for defining ghost cells:
    coeff.applyBoundaryConditionCoefficients(0,0,extrapolate,allBoundaries);
    coeff.finishBoundaryConditions();

Simple Usage of PETSc through Oges


PETSc API can be hidden from the user
- Make the solver:
    Oges solver(cg);
- Set solver parameters:
    solver.set(OgesParameters::THEsolverType,OgesParameters::PETSc);
    solver.set(blockJacobiPreconditioner);
    solver.set(gmres);
- Solve the system:
    solver.solve(sol,rhs);

Hides explicit matrix and vector conversions
Allows easy swapping of solver types (i.e., PETSc, Yale, SLAP, etc.)

Advanced usage of PETSc with Oges


Exposing the PETSc API to the user
- Set up PETSc:
    PetscInitialize(&argc,&argv,...);
    PCRegister("MyPC",...);
- Build a PETScEquationSolver via Oges:
    solver.set(OgesParameters::THEsolverType,OgesParameters::PETSc);
    solver.buildEquationSolver(solver.parameters.solver);
- Use Oges for matrix and vector conversions:
    solver.formMatrix();
    solver.formRhsAndSolutionVectors(sol,rhs);
- Get a pointer to the PETScEquationSolver:
    pes=(PETScEquationSolver*)solver.equationSolver[solver.parameters.solver];
- Use PETSc API directly:
    PCSetType(pes->pc,"MyPC");
    SLESSolve(pes->sles,pes->xsol,pes->brhs,&its);
- Use Oges to convert vector into GridFunction:
    solver.storeSolutionIntoGridFunction();

PETSc-Overture Black Box Example


    #include "Overture.h"
    #include "CompositeGridOperators.h"
    #include "Oges.h"
    int main() {
      printf("This is Overture's Primer example7.C");
      // Read in Composite Grid generated by Ogen:
      String nameOfOGFile="TheGrid.hdf";
      CompositeGrid cg;
      getFromADataBase(cg,nameOfOGFile);
      cg.update();
      // Make some differential operators:
      CompositeGridOperators op(cg);
      int stencilSize=pow(3,cg.numberOfDimensions())+1;
      op.setStencilSize(stencilSize);
      // Make grid functions to hold vector coefficients:
      realCompositeGridFunction u(cg),f(cg);
      // Assign the right hand side vector coefficients:

      // Make a grid function to hold the matrix coefficients:
      Range all;
      realCompositeGridFunction coeff(cg,stencilSize,all,all,all);
      // Attach operators to this grid function:
      coeff.setOperators(op);
      // Designate this grid function for holding matrix coefficients:
      coeff.setIsACoefficientMatrix(TRUE,stencilSize);
      // Get the coefficients for the Laplace operator:
      coeff=op.laplacianCoefficients();
      // Fill in the coefficients for the boundary conditions:
      coeff.applyBoundaryConditionCoefficients(0,0,
          BCTypes::dirichlet,BCTypes::allBoundaries);
      // Fill in extrapolation coefficients for ghost line:
      coeff.applyBoundaryConditionCoefficients(0,0,
          BCTypes::extrapolate,BCTypes::allBoundaries);
      coeff.finishBoundaryConditions();
      // Create an Overlapping Grid Equation Solver:
      Oges solver(cg);
      // Tell Oges to use PETSc:
      solver.set(OgesParameters::THEsolverType,
                 OgesParameters::PETSc);
      // Tell Oges which preconditioner and Krylov solver to use:
      solver.set(blockJacobiPreconditioner);
      solver.set(gmres);
      // Prescribe the location of the matrix coefficients:
      solver.setCoefficientArray( coeff );
      // Solve the system:
      solver.solve( u,f );
      // Display the solution using Overture's ASCII format:
      u.display();
      return(0);
    }

Advanced PETSc Usage In Overture


    #include "mpi.h"
    #include "Overture.h"
    #include "CompositeGridOperators.h"
    #include "Oges.h"
    #include "petscpc.h"
    EXTERN_C_BEGIN
    extern int CreateMyPC(PC);
    EXTERN_C_END
    char help[]="This is Overture's Primer example7.C using \
    advanced PETSc features. \n Use of the Preconditioner \
    MyPC is enabled via the option \n \t -pc_type MyPC";
    int main(int argc,char *argv[]) {
      int ierr = PetscInitialize(&argc,&argv,0,help); {
        // Allow PETSc to select a Preconditioner I wrote:
        ierr = PCRegister("MyPC",0,"CreateMyPC",CreateMyPC);
        // Read in Composite Grid generated by Ogen:
        String nameOfOGFile="TheGrid.hdf";
        // Determine file with runtime option -file
        PetscTruth flag;
        ierr = OptionsGetString(0,"-file",(char*)nameOfOGFile,
                                &flag); CHKERRA(ierr);
        CompositeGrid cg;
        getFromADataBase(cg,nameOfOGFile);
        cg.update();
        // Make some differential operators:
        // Make grid functions to hold vector coefficients:
        // Make a grid function to hold the matrix coefficients:
        // Create an Overlapping Grid Equation Solver:
        Oges solver(cg);
        // Prescribe the location of the matrix coefficients:
        solver.setCoefficientArray(coeff);
        // Tell Oges to use PETSc:
        solver.set(OgesParameters::THEsolverType,
                   OgesParameters::PETSc);
        // Tell Oges which preconditioner and Krylov solver to use:
        solver.set(blockJacobiPreconditioner);
        solver.set(gmres);
        // Allow command line arguments to supersede the above,
        // enabling use of the runtime option: -pc_type MyPC
        solver.setCommandLineArguments(argc,argv);
        // Solve the system:
        solver.solve( u,f );
        // Access PETSc Data Structures:
        PETScEquationSolver &pes = *(PETScEquationSolver *)
            solver.equationSolver[OgesParameters::PETSc];
        // View the actual (PETSc) matrix generated by Overture:
        ierr = MatView(pes.Amx,VIEWER_STDOUT_SELF);
        CHKERRA(ierr);
        // Display the solution using Overture's ASCII format:
        u.display();
      }
      PetscFinalize();
      return(0);
    }

Using PETSc with Other Packages

ILUDTP - Drop Tolerance ILU

- Use PETSc SeqAIJ or MPIAIJ (for block Jacobi or ASM) matrix formats
- -pc_ilu_use_drop_tolerance <dt,dtcol,maxrowcount>
  - dt - drop tolerance
  - dtcol - tolerance for column pivot
  - maxrowcount - maximum number of nonzeros kept per row
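
The roles of a drop tolerance and a per-row cap can be sketched on a single sparse row. The code below is a generic illustration, not ILUDTP's actual dropping rule: it assumes an absolute threshold, always keeps the diagonal, and uses hypothetical function names.

```c
#include <assert.h>

static double abs_value(double x) { return x < 0.0 ? -x : x; }

/* Apply a drop rule to one sparse row stored as (cols[k], vals[k]), k < nnz.
   Off-diagonal entries with |value| < dt are dropped. If more than
   maxrowcount entries remain, the smallest off-diagonal entries are dropped
   until the cap is met. The row is compacted in place; returns the new count. */
int drop_row(int row, int nnz, int *cols, double *vals,
             double dt, int maxrowcount)
{
    assert(nnz >= 0 && dt >= 0.0 && maxrowcount >= 1);
    int kept = 0;
    for (int k = 0; k < nnz; ++k) {                 /* threshold pass */
        if (cols[k] == row || abs_value(vals[k]) >= dt) {
            cols[kept] = cols[k];
            vals[kept] = vals[k];
            ++kept;
        }
    }
    while (kept > maxrowcount) {                    /* enforce the row cap */
        int smallest = -1;
        for (int k = 0; k < kept; ++k) {
            if (cols[k] == row) continue;           /* never drop the diagonal */
            if (smallest < 0 ||
                abs_value(vals[k]) < abs_value(vals[smallest]))
                smallest = k;
        }
        if (smallest < 0) break;                    /* only the diagonal is left */
        for (int k = smallest; k + 1 < kept; ++k) {
            cols[k] = cols[k + 1];
            vals[k] = vals[k + 1];
        }
        --kept;
    }
    return kept;
}
```

A smaller tolerance or a larger cap keeps more fill, which usually gives a stronger preconditioner at higher memory cost.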


Using PETSc with Other Packages

ParMETIS - Graph Partitioning

- Use PETSc MPIAIJ or MPIAdj matrix formats
- MatPartitioningCreate(MPI_Comm,MatPartitioning ctx)
- MatPartitioningSetAdjacency(ctx,matrix)
- Optional - MatPartitioningSetVertexWeights(ctx,weights)
- MatPartitioningSetFromOptions(ctx)
- MatPartitioningApply(ctx,IS *partitioning)

Using PETSc with Other Packages

PVODE - ODE Integrator

- TSCreate(MPI_Comm,TS_NONLINEAR,&ts)
- TSSetType(ts,TS_PVODE)
- ... regular TS functions
- TSPVODESetType(ts,PVODE_ADAMS)
- ... other PVODE options
- TSSetFromOptions(ts) - accepts PVODE options

Using PETSc with Other Packages

SPAI - Sparse Approximate Inverse

- PCSetType(pc,PCSPAI)
- PCSPAISetXXX(pc,...) - set SPAI options
- PCSetFromOptions(pc) - accepts SPAI options

Using PETSc with Other Packages

Matlab

- PetscMatlabEngineCreate(MPI_Comm,machinename,PetscMatlabEngine eng)
- PetscMatlabEnginePut(eng,PetscObject obj)
  - Vector
  - Matrix
- PetscMatlabEngineEvaluate(eng,"R = QR(A);")
- PetscMatlabEngineGet(eng,PetscObject obj)

Using PETSc with Other Packages

SAMRAI

- SAMRAI provides an infrastructure for solving PDEs using adaptive mesh refinement with structured grids.
- SAMRAI developers wrote a new class of PETSc vectors that uses SAMRAI data structures and methods.
- This enables use of the PETSc matrix-free linear and nonlinear solvers.

Sample Usage of SAMRAI with PETSc


Exposes PETSc API to the user
- Make a SAMRAI Vector:
    Samrai_Vector = new SAMRAIVectorReal2<double>();
- Generate vector coefficients using SAMRAI
- Create the PETSc Vector object wrapper for the SAMRAI Vector:
    Vec PETSc_Vector = createPETScVector(Samrai_Vector);
- Use PETSc API to solve the system:
    SNESCreate(...);
    SNESSolve(...);

Both PETSc_Vector and Samrai_Vector refer to the same data

Using PETSc with Other Packages

TAO

The Toolkit for Advanced Optimization (TAO) provides software for large-scale optimization problems, including
- unconstrained optimization
- bound constrained optimization
- nonlinear least squares
- nonlinearly constrained optimization

TAO uses abstractions for vectors, matrices, linear solvers, etc.; currently PETSc provides these implementations.
TAO interface is similar to that of PETSc
See tutorial by S. Benson, L.C. McInnes, and J. Moré, available via http://www.mcs.anl.gov/tao

TAO Interface
    TAO_SOLVER     tao;          /* optimization solver */
    Vec            x, g;         /* solution and gradient vectors */
    ApplicationCtx usercontext;  /* user-defined context */

    TaoInitialize();
    /* Initialize Application -- Create variable and gradient vectors x and g */ ...
    TaoCreate(MPI_COMM_WORLD,"tao_lmvm",&tao);
    TaoSetFunctionGradient(tao,x,g,FctGrad,(void*)&usercontext);
    TaoSolve(tao);
    /* Finalize application -- Destroy vectors x and g */ ...
    TaoDestroy(tao);
    TaoFinalize();

Similar Fortran interface, e.g., call TaoCreate(...)

Extensibility Issues
- Most PETSc objects are designed to allow one to "drop in" a new implementation with a new set of data structures (similar to implementing a new class in C++).
- Heavily commented example codes include
  - Krylov methods: petsc/src/sles/ksp/impls/cg
  - preconditioners: petsc/src/sles/pc/impls/jacobi
- Feel free to discuss more details with us in person.

Caveats Revisited
- Developing parallel, non-trivial PDE solvers that deliver high performance is still difficult, and requires months (or even years) of concentrated effort.
- PETSc is a toolkit that can ease these difficulties and reduce the development time, but it is not a black-box PDE solver nor a silver bullet.
- Users are invited to interact directly with us regarding correctness or performance issues by writing to [email protected].

References
- http://www.mcs.anl.gov/petsc/docs
- Example codes: docs/exercises/main.htm
- http://www.mpi-forum.org
- Using MPI (2nd Edition), Gropp, Lusk, and Skjellum
- Domain Decomposition, Smith, Bjorstad, and Gropp