
Scientific Computing II

Overview

Michael Bader
Technical University of Munich
Summer 2023
Remember: The Simulation Pipeline
(figure: the simulation pipeline)

phenomenon, process etc.
   | modelling
   v
mathematical model
   | numerical treatment
   v
numerical algorithm
   | implementation
   v
simulation code
   | visualization
   v
results to interpret
   | embedding
   v
statement, tool

(validation loops back along all stages of the pipeline)
Michael Bader | Scientific Computing II | Overview | Summer 2023 2


Topic #1: Solving Systems of Linear Equations

Focussing on
• large systems: 10^6 - 10^9 unknowns
• sparse systems: typically only O(N) non-zeros in the system matrix
(N unknowns)
• systems resulting from the discretization of PDEs

Topics
• relaxation methods (as smoothers)
• multigrid methods
• Conjugate Gradient methods
• preconditioning

Michael Bader | Scientific Computing II | Overview | Summer 2023 3


Recall: Finite Volume Model for Heat Equation
• object: a rectangular metal plate
• model as a collection of small connected rectangular cells

(figure: plate subdivided into rectangular grid cells of size hx x hy)
• compute the temperature distribution on this plate!
Michael Bader | Scientific Computing II | Overview | Summer 2023 4
A Finite Volume Model (2)

• model assumption: temperatures in equilibrium in every grid cell


• heat flow across a given edge is proportional to
• temperature difference (T1 − T0 ) between the adjacent cells
• length h of the edge
• e.g.: heat flow across the left edge:

      q_{i,j}^{(left)} = k_x (T_{i,j} − T_{i−1,j}) h_y

  note: heat flow out of the cell (and k_x > 0)

• heat flow across all edges determines change of heat energy:

      q_{ij} = k_x (T_{ij} − T_{i−1,j}) h_y + k_x (T_{ij} − T_{i+1,j}) h_y
             + k_y (T_{ij} − T_{i,j−1}) h_x + k_y (T_{ij} − T_{i,j+1}) h_x

Michael Bader | Scientific Computing II | Overview | Summer 2023 5


A Steady-State Model
. . . and a large system of linear equations

• heat sources: consider additional source term Fi,j due to


• external heating
• radiation
• Fi,j = fi,j hx hy (fi,j heat flow per area)
• equilibrium with source term requires qi,j + Fi,j = 0:

      f_{i,j} h_x h_y = −k_x h_y (2T_{i,j} − T_{i−1,j} − T_{i+1,j})
                      − k_y h_x (2T_{i,j} − T_{i,j−1} − T_{i,j+1})

• leads to large system of linear equations


• 1/h² unknowns, sparse system matrix (only 5 entries per row)

→ will be our model problem!
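
A small NumPy sketch of how such a system can be assembled (not from the slides; the grid size, k_x, k_y and the 0-Dirichlet treatment of missing neighbours are assumptions chosen for illustration):

import numpy as np

def assemble_fv_system(nx, ny, hx, hy, kx=1.0, ky=1.0):
    idx = lambda i, j: j * nx + i                # flatten the 2D cell index
    A = np.zeros((nx * ny, nx * ny))
    for j in range(ny):
        for i in range(nx):
            k = idx(i, j)
            A[k, k] = 2 * kx * hy + 2 * ky * hx  # cell couples to its 4 edges
            if i > 0:      A[k, idx(i - 1, j)] = -kx * hy
            if i < nx - 1: A[k, idx(i + 1, j)] = -kx * hy
            if j > 0:      A[k, idx(i, j - 1)] = -ky * hx
            if j < ny - 1: A[k, idx(i, j + 1)] = -ky * hx
    return A    # A T = b with b_k = -f_{i,j} h_x h_y (sign as on this slide)

A = assemble_fv_system(4, 4, hx=0.25, hy=0.25)
print(int((A != 0).sum(axis=1).max()))           # -> 5 non-zeros per row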

Michael Bader | Scientific Computing II | Overview | Summer 2023 6


Multigrid: HHG for Mantle Convection
(Rüde et al., 2013; project: TERRA NEO)

Michael Bader | Scientific Computing II | Overview | Summer 2023 7


Multigrid: HHG for Mantle Convection (2)
Mantle Convection on PetaScale Supercomputers:
• mantle convection modeled via Stokes equation (“creeping flow”)
• linear Finite Element method on an hierarchically structured tetrahedral
mesh
• requires solution of global pressure equation in each time step

Weak Scaling of HHG Multigrid Solver on JuQueen:


• geometric multigrid for Stokes flow via pressure-correction
• pressure residual reduced by 10^{−3} (A) or 10^{−8} (B)

  Nodes    Threads    Grid points    Resolution    Time (A)    Time (B)
  1        30         2.1 · 10^7     32 km         30 s        89 s
  4        240        1.6 · 10^8     16 km         38 s        114 s
  30       1 920      1.3 · 10^9     8 km          40 s        121 s
  240      15 360     1.1 · 10^10    4 km          44 s        133 s
  1 920    122 880    8.5 · 10^10    2 km          48 s        153 s
  15 360   983 040    6.9 · 10^11    1 km          54 s        170 s

Michael Bader | Scientific Computing II | Overview | Summer 2023 8


Topic #2: Molecular Dynamics

Discuss large part of the simulation pipeline:


• modelling: potentials, forces, systems of ODE
• numerics: suitable numerical methods for the ODEs
• implementation: short-range vs. long-range forces
• visualisation? (well, actually not the entire pipeline . . . )

Focussing on
• large systems: 10^6 - 10^9 particles
• short-range vs. long-range forces
• N-body methods, parallelization

Michael Bader | Scientific Computing II | Overview | Summer 2023 9


N-Body Methods: Millennium-XXL Project

(Springel, Angulo, et al., 2010)


• N-body simulation with N = 3 · 10^11 “particles”
• compute gravitational forces and effects
  (every “particle” corresponds to ∼ 10^9 suns)
• simulation of the generation of galaxy clusters
  → plausibility of the “cold dark matter” model
Michael Bader | Scientific Computing II | Overview | Summer 2023 10
N-Body Methods: Particulate Flow Simulation

(Rahimian, . . . , Biros, 2010)

• direct simulation of blood flow


• particulate flow simulation (coupled problem)
• Stokes flow for blood plasma
• red blood cells as immersed, deformable particles
Michael Bader | Scientific Computing II | Overview | Summer 2023 11
Organisation

Michael Bader | Scientific Computing II | Overview | Summer 2023 12


Lectures

Lecturers:
• Michael Bader
• recorded lectures by Anne Reinarz, from summer 2020

Time & Day:


• by default, lectures will be on Tuesday (10 c.t.)

“Style”: online-augmented presence teaching


• onsite lectures
→ focus on use cases, examples, questions, discussions, etc.
• recorded lectures from summer 2020
→ for preparation, repeating, etc.

Michael Bader | Scientific Computing II | Overview | Summer 2023 13


Tutorials

Tutors:
• David Schneller (multigrid and CG)
• Sam Newcome (molecular dynamics)

Time & Day:


• by default, tutorials will be on Fridays (14 c.t.)
• first tutorial on . . .

“Style”:
• worksheets with applications & examples
• no compulsory part

Michael Bader | Scientific Computing II | Overview | Summer 2023 14


Exams, ECTS, Modules

ECTS, Modules
• 5 ECTS (2+2 lectures/tutorials per week)
• CSE: compulsory course
• Biomed. Computing/Computer Science:
elective/Master catalogue
• others?

Exam:
• written exam at end of semester
• based on exercises presented in the tutorials
• one implementation-oriented exercise (in Python)

Michael Bader | Scientific Computing II | Overview | Summer 2023 15


Lecture Slides: Color Code for Headers

Black Headers:
• for all slides with regular topics

Green Headers:
• summarized details: will be explained in the lecture, but usually not as an
explicit slide; “green” slides will only appear in the handout versions

Red Headers:
• advanced topics or outlook: will not be part of the exam topics

Blue Headers:
• background information or fundamental concepts that are probably
already known, but are important throughout the lecture

Michael Bader | Scientific Computing II | Overview | Summer 2023 16


Scientific Computing II
Relaxation Methods and the Smoothing Property

Michael Bader
Technical University of Munich
Summer 2023
Part I

(Recap on) Relaxation Methods

Residual-Based Correction
The Residual Equation
Relaxation
Jacobi Relaxation
Gauss-Seidel Relaxation
Successive-Over-Relaxation (SOR)

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 2
The Residual Equation
• we consider a system of linear equations: Ax = b
  (as stemming from the FD/FV/FEM discretisation of a PDE)
• for which we compute a sequence of approximate solutions x^(k)
• the residual r^(k) shall then be defined as:

      r^(k) = b − A x^(k)

• short computation:

      r^(k) = b − A x^(k) = A x − A x^(k) = A (x − x^(k)) = A e^(k).

• relates the residual r^(k) to the error e^(k) := x − x^(k)
  (note that x = x^(k) + e^(k));
• we will call this equation the residual equation:

      A e^(k) = r^(k)

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 3
Residual Based Correction

Solve Ax = b using the residual equation A e^(k) = r^(k)

• the residual r (which can be computed) is an indicator for the size of the
  error e (which is not known).
• therefore: use residual equation to compute a correction to x^(k)
• one possible approach: solve a modified (easier) SLE

      B ê^(k) = r^(k)   where B ∼ A

• use ê^(k) as an approximation for e^(k), and set

      x^(k+1) = x^(k) + ê^(k)

  for the next iteration

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 4
Relaxation

How should we choose B?


• B should be “similar” to A (B ∼ A)
• more precisely: B^{−1} ≈ A^{−1}
• or at least B^{−1} y ≈ A^{−1} y for most vectors y
• Be = r should be easy/fast to solve

Examples:
• B = diag(A) = DA (diagonal part of A)
⇒ Jacobi method (“Jacobi relaxation”)
• B = LA (lower triangular part of A)
⇒ Gauss-Seidel method (“Gauss-Seidel relaxation”)
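
A minimal NumPy sketch of this framework (assumptions: a small dense test matrix and a fixed number of sweeps), instantiated with both choices of B:

import numpy as np

def relax(A, b, B, x, steps=50):
    for _ in range(steps):
        r = b - A @ x                    # residual r^(k) = b - A x^(k)
        x = x + np.linalg.solve(B, r)    # correction from B e^(k) = r^(k)
    return x

A = np.array([[4., 1.], [1., 3.]]); b = np.array([1., 2.])
x_jac = relax(A, b, np.diag(np.diag(A)), np.zeros(2))   # B = D_A (Jacobi)
x_gs  = relax(A, b, np.tril(A),          np.zeros(2))   # B = L_A (Gauss-Seidel)
print(np.allclose(A @ x_jac, b), np.allclose(A @ x_gs, b))   # True True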

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 5
Jacobi Relaxation

Iteration formulas in matrix-vector notation:


1. residual notation:

      x^(k+1) = x^(k) + D_A^{−1} r^(k) = x^(k) + D_A^{−1} (b − A x^(k))

2. for implementation:

      x^(k+1) = D_A^{−1} (b − (A − D_A) x^(k))

3. for analysis:

      x^(k+1) = (I − D_A^{−1} A) x^(k) + D_A^{−1} b =: M x^(k) + N b

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 6
Jacobi Relaxation – Algorithm

• based on: x^(k+1) = D_A^{−1} (b − (A − D_A) x^(k))




for i from 1 to n do
xnew[i] := ( b[i]
- sum( A[i,j]*x[j], j=1..i-1)
- sum( A[i,j]*x[j], j=i+1..n)
) / A[i,i];
end do;
for i from 1 to n do
x[i] := xnew[i];
end do;
• properties:
  • additional storage required (xnew)
  • the entries of xnew can be computed in any order
  • the entries of xnew can be computed in parallel
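
A hedged NumPy translation of the pseudocode (assuming a dense A with non-zero diagonal); since xnew depends only on the old x, the whole sweep is one vectorized expression:

import numpy as np

def jacobi_step(A, b, x):
    d = np.diag(A)                       # D_A, stored as a vector
    return (b - (A @ x - d * x)) / d     # (b - (A - D_A) x) / D_A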

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 7
Gauss-Seidel Relaxation

Iteration formulas in matrix-vector notation:


1. residual notation:

      x^(k+1) = x^(k) + L_A^{−1} r^(k) = x^(k) + L_A^{−1} (b − A x^(k))

2. for implementation:

      x^(k+1) = L_A^{−1} (b − (A − L_A) x^(k))

3. for analysis:

      x^(k+1) = (I − L_A^{−1} A) x^(k) + L_A^{−1} b =: M x^(k) + N b

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 8
Gauss-Seidel Relaxation – Algorithm

• based on: x^(k+1) = L_A^{−1} (b − (A − L_A) x^(k))
• solve L_A x^(k+1) = b − (A − L_A) x^(k)
  via forward substitution (L_A is lower triangular):

for i from 1 to n do
  x[i] := ( b[i]
    - sum( A[i,j]*x[j], j=1..i-1)   !updated values of x
    - sum( A[i,j]*x[j], j=i+1..n)   !previous values of x
  ) / A[i,i];
end do;
• properties:
• no additional storage required
• inherently sequential computation of x
• usually faster convergence than Jacobi

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 9
Successive-Over-Relaxation (SOR)

• observation: Gauss-Seidel corrections are “too small”


• add an over-relaxation-factor α:
for i from 1 to n do
x[i] := x[i] + alpha * ( b[i]
- sum( A[i,j]*x[j], j=1..n)
) / A[i,i];
end do;
• for 2D Poisson model problem:
  optimal α (≈ 1.7) improves convergence: O(n²) → O(n^{3/2})

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 10
Does It Always Work?

• simple answer: no (life is not that easy . . . )


• Jacobi: matrix A needs to be diagonally dominant
• Gauss-Seidel: matrix A needs to be positive definite
• How about performance?
→ usually quite slow

Our next topics:


1. How slow are the methods exactly?
2. What is the underlying reason?
3. Is there a fix?

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 11
Part II

Smoothing Property of Relaxation Methods
The Model Problem – 1D Poisson
Convergence of Relaxation Methods
The Smoothing Property

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 12
The Model Problem – 1D Poisson
1D Poisson equation:
• −u″(x) = 0 on Ω = (0, 1), u(0) = u(1) = 0
• thus: u(x) = 0 (boring, but easy to examine the error)
• discretised on a uniform grid of mesh size h = 1/n
• compute approximate values u_j ≈ u(x_j)
  at grid points x_j := jh, with j = 1, . . . , (n − 1)
• tridiagonal system matrix A_h (size (n − 1) × (n − 1)) built from 3-point
  stencil:

      (1/h²) [−1  2  −1]

• assume zero right-hand side to obtain the following system:

      (1/h²) (−u_{j−1} + 2u_j − u_{j+1}) = 0   ⇔   u_j = ½ (u_{j−1} + u_{j+1})

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 13
1D Poisson: Jacobi Relaxation

Iterative scheme for Jacobi relaxation:
• leads to relaxation scheme u_j^(k+1) = ½ (u_{j+1}^(k) + u_{j−1}^(k))
  (“place peas on the line between two neighbours, in parallel”)
• start with initial guess u_j^(0) ≠ 0
• in this case: e_j^(k) = u_j − u_j^(k) = −u_j^(k)

Visualisation of relaxation process:
(figure: several Jacobi sweeps on an oscillatory initial guess; we get a high
plus a low frequency oscillation in the error)

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 14
1D Poisson: Gauss-Seidel Relaxation

Iterative scheme for Gauss-Seidel relaxation:
• leads to relaxation scheme u_j^(k+1) = ½ (u_{j+1}^(k) + u_{j−1}^(k+1))
  (“sequentially place peas on the line between two neighbours”)
• start with initial guess u_j^(0) ≠ 0
• in this case: e_j^(k) = u_j − u_j^(k) = −u_j^(k)

Visualisation of relaxation process:
(figure: sequence of sequential Gauss-Seidel sweeps on the same initial guess)

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 15
Convergence of Relaxation Methods

Observation (see also tutorials)


• slow convergence
• smooth error components are reduced very slowly
• high frequency error components are damped more efficiently (esp. for
Gauss-Seidel relaxation)

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 16
Convergence Analysis

• remember iteration scheme: x^(i+1) = M x^(i) + N b
• derive iterative scheme for the error e^(i) := x − x^(i):

      e^(i+1) = x − x^(i+1) = x − M x^(i) − N b

• for a consistent scheme, x is a fixpoint of the iteration equation:

      x = M x + N b

• hence:

      e^(i+1) = M x + N b − M x^(i) − N b
              = M x − M x^(i) = M e^(i)
      ⇒ e^(i) = M^i e^(0).

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 17
Convergence Analysis (2)

• iteration equation for error: e^(i) = M^i e^(0)
• consider eigenvalues µ_k and eigenvectors v_k of iteration matrix M:

      M v_k = µ_k v_k   ⇒   M (∑_k α_k v_k) = ∑_k α_k M v_k = ∑_k µ_k α_k v_k

• write error as combination of eigenvectors: e^(0) = ∑_k α_k v_k, then:

      M^i e^(0) = M^i (∑_k α_k v_k) = ∑_k (µ_k)^i α_k v_k

• convergence, if all |µ_k| < 1
• speed of convergence dominated by the largest |µ_k|

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 18
The Smoothing Property
Eigenvalues and -vectors of A_h: (compare with tutorials!)
• eigenvalues: λ_k = (4/h²) sin²(kπ/(2n)) = (4/h²) sin²(kπh/2)
• eigenvectors: v^(k) = (sin(kπj/n))_{j=1,...,n−1}
• both for k = 1, . . . , (n − 1)

For Jacobi relaxation:
• iteration matrix M = I − D_A^{−1} A = I − (h²/2) A
• eigenvalues of M: µ_k := 1 − 2 sin²(kπh/2)
• |µ_k| < 1 for all k, but |µ_k| ≈ 1 if k = 1 or k = n − 1
• µ_1 ∈ O(1 − h²): slow convergence of smooth errors
• µ_{n−1} ≈ −1: “sign-flip” (but slow reduction) of “zig-zag” error components
• convergence factor determined by O(1 − h²)

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 19
The Smoothing Property
Eigenvalues and -vectors of A_h: (compare with tutorials!)
• eigenvalues: λ_k = (4/h²) sin²(kπ/(2n)) = (4/h²) sin²(kπh/2)
• eigenvectors: v^(k) = (sin(kπj/n))_{j=1,...,n−1}
• both for k = 1, . . . , (n − 1)

For weighted Jacobi relaxation:
• iteration matrix M = I − ω D_A^{−1} A = I − (h²/2) ω A
• eigenvalues of M: µ_k = 1 − 2ω sin²(kπh/2)
• µ_1 ∈ O(1 − h²): slow convergence of smooth errors
• µ_{n−1} ≈ 0 for ω = ½; µ_{n−1} ≈ −1/3 for ω = 2/3
  thus quick reduction of high-frequency errors
• convergence determined by O(1 − n^{−2})
  (slower than normal Jacobi due to ω)
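
A quick numerical check of these formulas (assumption: n = 64), comparing the worst damping factor on the high-frequency half of the spectrum for several ω:

import numpy as np

n = 64; h = 1.0 / n
k = np.arange(1, n)
for omega in (1.0, 0.5, 2.0 / 3.0):
    mu = 1 - 2 * omega * np.sin(k * np.pi * h / 2) ** 2
    print(omega, round(float(np.abs(mu[k >= n // 2]).max()), 3))
# omega = 1 barely damps the zig-zag modes; omega = 2/3 gives factor ~ 1/3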

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 19
The Smoothing Property (2)

“Fourier mode analysis”


• decompose the error e(i) into eigenvectors → for 1D Poisson: sin(kπxj )
• determine convergence factors for “eigenmodes”

Observation for weighted Jacobi and Gauss-Seidel:


• The high frequency part (with respect to the underlying grid) is reduced
quite quickly.
• The low frequency part (w.r.t. the grid) decreases only very slowly;
actually the slower, the finer the grid is.
⇒ “smoothing property”

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 20
The Smoothing Property (2)

“Fourier mode analysis”


• decompose the error e(i) into eigenvectors → for 1D Poisson: sin(kπxj )
• determine convergence factors for “eigenmodes”

Another Observation:
• the smoothest (slowest converging) component corresponds to the
smallest eigenvalue of A (k = 1)
• remember residual equation: Ae = r :
if e = v (1) , then r = λ1 v (1)
⇒ “small residual, but large error”
⇒ in such a situation, any residual-based correction will normally fail

Michael Bader | Scientific Computing II | Relaxation Methods and the Smoothing Property | Summer 2023 20
Scientific Computing II
Towards Multigrid Methods

Michael Bader
Technical University of Munich
Summer 2022
Part I

Multigrid Methods

Fundamental Multigrid Ideas


Nested Iteration
Coarse-Grid Correction – A Two-Grid Method
Multigrid V-Cycle
V-Cycle as Recursive Coarse-Grid Correction
Costs per Iteration
Speed of Convergence

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 2


Multigrid Idea No. 1

Observation and convergence analysis show:


• “high-frequency error” is relative to mesh size
• on a sufficiently coarse grid, even very low frequencies can be
“high-frequency”
(if the mesh size is big)

“Multigrid” idea:
• use multiple grids to solve the system of equations
• hope that on each grid, a certain range of error frequencies will be
reduced efficiently

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 3


Nested Iteration

Solve the problem on a coarser grid:


• will be comparably (very) fast
• can give us a good initial guess:
• leads to “poor man’s multigrid”: nested iteration

Algorithm:
1. Start on a very coarse grid with mesh size h = h0 ;
guess an initial solution xh
2. Iterate over Ah xh = bh using relaxation method
⇒ approximate solution xh
3. interpolate the solution xh to a finer grid Ωh/2
4. proceed with step 2 (now with mesh size h := h/2) using interpolated
xh/2 as initial solution

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 4


Multigrid Idea No. 2

Observation for nested iteration:


• error in interpolated initial guess also includes low frequencies
• relaxation therefore still slow
• can we go “back” to a coarser grid later in the algorithm?

Idea No. 2: use the residual equation


⇒ coarse-grid correction:
• relaxation leads to a smooth error e
• thus: solve Ae = r on a coarser grid
• leads to an approximation of the error e
• add this approximation to the fine-grid solution

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 5


A Two-Grid Method

Algorithm:
1. relaxation/smoothing on the fine level system
⇒ solution xh
2. compute the residual r_h = b_h − A_h x_h
3. restriction of r_h to the coarse grid Ω_H → r_H
4. compute a solution to A_H e_H = r_H on Ω_H
5. interpolate the coarse grid solution e_H to the fine grid Ω_h
6. add the resulting correction to x_h
7. again, relaxation/smoothing on the fine grid

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 6


Correction Scheme – Components

• smoother:
reduce the high-frequency error components, and get a smooth error
• restriction:
transfer residual from fine grid to coarse grid
• coarse grid equation:
(acts as) discretisation of the PDE on the coarse grid;
requires a coarse-grid solver
• interpolation:
transfer coarse grid solution/correction from coarse grid to fine grid
(and update solution accordingly)

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 7


The Multigrid V-Cycle

Crucial idea: recursive call of Two-/Multigrid solver on coarse grids

Algorithm:
1. pre-smoothing on the fine level system A_l x_l = b_l
   → approximate solution x_l
2. compute the residual r_l = b_l − A_l x_l
3. restriction of r_l to the coarse grid Ω_{l−1} → r̂_{l−1}
4. solve coarse grid system A_{l−1} e_{l−1} = r̂_{l−1} =: b_{l−1}
   by a recursive call to the V-cycle algorithm
5. interpolate the coarse grid solution e_{l−1} → ê_l to the fine grid Ω_l
6. add the interpolated coarse-grid correction: x_l = x_l + ê_l
7. post-smoothing on the fine grid
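
A compact, hedged V-cycle sketch for the 1D model problem (assumptions: n a power of two, weighted Jacobi with ω = 2/3 as smoother, one pre- and one post-smoothing step, linear interpolation, and full weighting scaled so that R = ½ Pᵀ):

import numpy as np

def apply_A(u, h):                    # stencil (1/h^2)[-1 2 -1], 0-Dirichlet
    return (2 * u - np.r_[0.0, u[:-1]] - np.r_[u[1:], 0.0]) / h**2

def smooth(u, b, h, omega=2/3):       # weighted Jacobi, D_A = (2/h^2) I
    return u + omega * (h**2 / 2) * (b - apply_A(u, h))

def restrict(r):                      # full weighting, weights 1/4, 1/2, 1/4
    return 0.25 * r[:-2:2] + 0.5 * r[1::2] + 0.25 * r[2::2]

def interpolate(e):                   # linear interpolation to the fine grid
    fine = np.zeros(2 * len(e) + 1)
    fine[1::2] = e                                       # coarse points copied
    fine[0::2] = 0.5 * (np.r_[0.0, e] + np.r_[e, 0.0])   # midpoints averaged
    return fine

def v_cycle(u, b, h):
    if len(u) == 1:                            # coarsest grid: direct solve
        return b * h**2 / 2                    # (A is the scalar 2/h^2)
    u = smooth(u, b, h)                        # 1. pre-smoothing
    r = restrict(b - apply_A(u, h))            # 2./3. residual + restriction
    e = v_cycle(np.zeros(len(r)), r, 2 * h)    # 4. recursive coarse-grid solve
    u = u + interpolate(e)                     # 5./6. coarse-grid correction
    return smooth(u, b, h)                     # 7. post-smoothing

n = 64; h = 1.0 / n; x = np.arange(1, n) * h
b = np.pi**2 * np.sin(np.pi * x)               # -u'' = f with solution sin(pi x)
u = np.zeros(n - 1)
for _ in range(10):
    u = v_cycle(u, b, h)
print(float(np.abs(u - np.sin(np.pi * x)).max()))   # ~ discretisation error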

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 8


V-Cycle – Computational Costs
Further algorithmic details:
• on the coarsest grid: direct solution;
but: number of unknowns small, ideally O(1)
• number of smoothing steps is typically very small (even only 1 or 2)
and must not depend on problem size

Computational Costs (storage and computing time):


• 1D: c·n + c·n/2 + c·n/4 + . . . ≤ 2 c·n
• 2D: c·n + c·n/4 + c·n/16 + . . . ≤ (4/3) c·n
• 3D: c·n + c·n/8 + c·n/64 + . . . ≤ (8/7) c·n
• overall costs are dominated by the costs of the finest grid
  (n the number of grid points on the finest grid: typically n = h^{−D})

Thus: runtime O(n) per iteration, but how many iterations necessary?

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 9


Speed of Convergence

• fastest method around (if all components are chosen carefully), but:
best-possible convergence often hard to obtain
• “textbook multigrid efficiency”:

e(m+1) ≤ γ e(m) ,

where convergence rate γ < 1 (esp. γ  1) is independent of the number


of unknowns
⇒ constant number of multigrid steps to obtain a given number of digits
⇒ overall computational work increases only linearly with the number of
unknowns
• see exercises: analysis of two-grid convergence

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 10


Part II

Components of Multigrid Methods

Interpolation
Restriction
Coarse Grid Operator
Smoothers
Multigrid Cycles
Multigrid W-Cycle
Full Multigrid V-Cycle

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 11


Interpolation (aka “Prolongation”)

For Poisson problem:


• (bi-)linear interpolation:
in 1D: resembles homogeneous (f = 0) solution
• constant (in general too small approximation order):
sometimes used for cell-based coarsening (unknowns located in cell
centers)
• quadratic, cubic, etc.:
often too costly, more smoothing steps are cheaper and can eliminate the
disadvantage of a lower-order interpolation
• but: in an FMV-cycle (to be discussed) interpolation to finer grid (after a
completed V-cycle) should be higher-order (to limit the introduced error)

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 12


Interpolation – Matrix Notation

For linear interpolation (1D):

      [ 1/2   0    0  ]             [ (0 + x1)/2  ]
      [  1    0    0  ]             [ x1          ]
      [ 1/2  1/2   0  ]   [ x1 ]    [ (x1 + x2)/2 ]
      [  0    1    0  ] · [ x2 ]  = [ x2          ]
      [  0   1/2  1/2 ]   [ x3 ]    [ (x2 + x3)/2 ]
      [  0    0    1  ]             [ x3          ]
      [  0    0   1/2 ]             [ (x3 + 0)/2  ]

Notation: I_{2h}^h x_{2h} = x_h  or  P_{2h}^h x_{2h} = x_h

Note: disregards boundary values (here: 0-Dirichlet condition assumed)

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 13


Interpolation – Convection-Diffusion
Example problem: 1D convection-diffusion equation

      −ε u_xx + c u_x = f,   0 < ε ≪ c

Operator-dependent Interpolation:
• consider homogeneous problem (f = 0)
  with Dirichlet boundaries: u(0) = 1, u(1) = 0
• exact solution for this case:

      u(x) = (e^{cx/ε} − e^{c/ε}) / (1 − e^{c/ε}) = 1 − (1 − e^{cx/ε}) / (1 − e^{c/ε})

• interpolate at x = ½:

      u(½) = 1 − (1 − e^{c/(2ε)}) / (1 − e^{c/ε}) =: 1 − (1 − z)/(1 − z²);

  for large z = e^{c/(2ε)}, we have u(½) ≈ 1 − 1/z ≈ 1
• thus: linear interpolation inappropriate (and leads to slow convergence)
  → interpolation should be operator-dependent
Restriction
For Poisson problem:
• “injection”: pick values at corresp. coarse grid points
• “full weighting” = transpose of bilinear interpolation (safer, more robust
convergence), see illustration below for the 1D case

(figure: 1D stencil illustration; linear interpolation distributes a coarse
value with weights 1/2, 1, 1/2, and full weighting collects fine values with
the transposed weights)
Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 15


Restriction – Matrix Notation

For full weighting (1D):

                                              [ x1 ]
      [ 1/2   1   1/2   0    0    0    0  ]   [ x2 ]     [ (x1 + 2x2 + x3)/2 ]
      [  0    0   1/2   1   1/2   0    0  ] · [ x3 ]  =  [ (x3 + 2x4 + x5)/2 ]
      [  0    0    0    0   1/2   1   1/2 ]   [ x4 ]     [ (x5 + 2x6 + x7)/2 ]
                                              [ x5 ]
                                              [ x6 ]
                                              [ x7 ]

Notation: I_h^{2h} x_h = x_{2h}  or  R_h^{2h} x_h = x_{2h}
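
A quick numerical check (assumption: the 7-point fine grid of these two matrix slides) that full weighting, as defined here, is the transpose of linear interpolation:

import numpy as np

m = 3                                        # coarse-grid points
P = np.zeros((2 * m + 1, m))                 # linear interpolation (7 x 3)
for c in range(m):
    P[2 * c, c] += 0.5; P[2 * c + 1, c] = 1.0; P[2 * c + 2, c] += 0.5
R = np.zeros((m, 2 * m + 1))                 # full weighting (3 x 7)
for c in range(m):
    R[c, 2 * c:2 * c + 3] = [0.5, 1.0, 0.5]
print(np.allclose(R, P.T))                   # -> True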

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 16


Coarse Grid Operator

Two main options:

1. discretise PDE on grid Ω_{2h} to obtain A_{2h}
2. “Galerkin approach”: A_{2h} := R_h^{2h} A_h P_{2h}^h

→ compare effect on vector x_{2h}:

      A_{2h} x_{2h} := R_h^{2h} A_h P_{2h}^h x_{2h}

→ evaluate from right to left:
• interpolate x_{2h} to x̂_h := P_{2h}^h x_{2h}
• apply fine-grid operator A_h to interpolated x̂_h
• restrict resulting matrix-vector product to Ω_{2h}

Exercise:
• Compute A_{2h} := R_h^{2h} A_h P_{2h}^h for A_h := (1/h²) tridiag(−1, 2, −1)
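
A numerical sketch of this exercise (assumptions: n = 8 and R = Pᵀ as on the matrix slides); up to scaling, the coarse-grid operator is again a [−1 2 −1]-type 3-point stencil:

import numpy as np

n = 8; h = 1.0 / n
Ah = (np.diag(2 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
      - np.diag(np.ones(n - 2), -1)) / h**2
m = n // 2 - 1
P = np.zeros((n - 1, m))
for c in range(m):
    P[2 * c, c] += 0.5; P[2 * c + 1, c] = 1.0; P[2 * c + 2, c] += 0.5
A2h = P.T @ Ah @ P                     # Galerkin coarsening with R = P^T
print(np.round(A2h * 2 * h**2, 10))    # -> tridiag(-1, 2, -1), up to scaling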

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 17


A Matrix-Oriented View of Coarse-Grid Correction
1. given a system of equations A_h x_h = b_h
2. pre-smoothing leads to approximate solution x_h^(i)
   and resp. error: x_h = x_h^(i) + e_h^(i)
3. compute residual r_h^(i) = b_h − A_h x_h^(i);
   respective residual equation: A_h e_h^(i) = r_h^(i)
4. restriction of residual equation: R_h^{2h} A_h e_h^(i) = R_h^{2h} r_h^(i)
5. approximate error on coarse grid: e_h^(i) ≈ P_{2h}^h ê_{2h}^(i)
   → leads to Galerkin coarsening: R_h^{2h} A_h P_{2h}^h ê_{2h}^(i) = R_h^{2h} r_h^(i)
6. with A_{2h} := R_h^{2h} A_h P_{2h}^h and b_{2h} := R_h^{2h} r_h^(i),
   compute ê_{2h}^(i) from A_{2h} ê_{2h}^(i) = b_{2h}
7. interpolate coarse-grid error, e_h^(i) ≈ P_{2h}^h ê_{2h}^(i),
   and apply as coarse-grid correction: x_h^(i+1) = x_h^(i) + P_{2h}^h ê_{2h}^(i)
8. post-smoothing on approximate solution x_h^(i+1)

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 18


Galerkin Coarse Grid Operator: A_{2h} := R_h^{2h} A_h P_{2h}^h

• assume linear interpolation and full weighting restriction:

      R_h^{2h} = [ 1/2   1   1/2    0    0    0    0  ]
                 [  0    0   1/2    1   1/2   0    0  ]      P_{2h}^h = (R_h^{2h})^T
                 [  0    0    0     0   1/2   1   1/2 ]

• for Poisson equation with stencil (1/h²) [−1 2 −1] (see exercises)
  → coarse-grid stencil reproduces: (1/(2h)²) [−1 2 −1]
• for pure convection and central differencing, stencil (1/h) [1 0 −1],
  → coarse-grid stencil reproduces: (1/(2h)) [1 0 −1] (check!)
  → but leads to unstable discretisation
• question: result for upwind discretisation: stencil (1/h) [−1 1 0]?

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 19


Galerkin Coarsening and Convection
• coarsening with linear interpolation, full weighting restriction,
  and upwind stencil (1/h) [−1 1 0]
• leads to coarse-grid stencil (1/h) [−3/4  1/2  1/4]
  → not a diagonally dominant matrix
  → unstable discretisation
• remedy: use “downwind” interpolation and “upwind” restriction:

                                                   [ 0 0 0 ]
                                                   [ 1 0 0 ]
      R_h^{2h} = [ 1 1 0 0 0 0 0 ]                 [ 1 0 0 ]
                 [ 0 0 1 1 0 0 0 ]     P_{2h}^h =  [ 0 1 0 ]
                 [ 0 0 0 0 1 1 0 ]                 [ 0 1 0 ]
                                                   [ 0 0 1 ]
                                                   [ 0 0 1 ]

• result: upwind discretisation stencil reproduced: (1/(2h)) [−1 1 0]!

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 20


Matrix-Dependent Interpolation/Restriction
• assume general 1D 3-point discretisation stencil: [s_l  s_c  s_r]
  where s_c = −(s_l + s_r) > 0
• use the following 3-point interpolation stencil: [−s_l/s_c  1  −s_r/s_c]
  and P_{2h}^h = (R_h^{2h})^T, thus:

      R_h^{2h} = [ −s_l/s_c  1  −s_r/s_c     0        1*0       0        0    ]
                 [     0     0  −s_l/s_c     1     −s_r/s_c     0        0    ]
                 [     0     0      0        0     −s_l/s_c     1    −s_r/s_c ]

• Galerkin coarsening A_{2h} := R_h^{2h} A_h P_{2h}^h leads to stencil

      (1/s_c) [−s_l²   s_l² + s_r²   −s_r²]      → check this!

• remains diagonally dominant → stable coarse-grid discretisation
• try as exercise:
  compute coarse-grid operator for convection-diffusion equation;
  compare operator- and matrix-dependent interpolation/restriction
Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 21
Matrix-Dependent Interpolation/Restriction (2)
• take a closer look at the matrix multiplication R_h^{2h} A_h:
  each row [· · ·  −s_l/s_c  1  −s_r/s_c  · · ·] of R_h^{2h} combines three
  consecutive stencil rows [· · ·  s_l  s_c  s_r  · · ·] of A_h
• equivalent to performing row operations as in Gaussian elimination:
  the rows of the two neighbouring fine-grid unknowns are scaled by −s_l/s_c
  resp. −s_r/s_c and added to the row of the coarse-grid unknown
• the entries at the fine-grid-only positions become 0
  ⇒ coarse-grid unknown no longer depends on fine-grid unknowns
• similar for the multiplication A_h P_{2h}^h, but with column operations

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 22


Smoothers
Efficient smoothers for the Poisson problem (see tutorials):
• Gauss-Seidel
• red-black Gauss-Seidel
• damped (ω = 2/3) Jacobi
  → why ω = 2/3?
  → examine smoothing factors (dependent on wave number k)

How about . . .
• Jacobi (non-weighted)?
→ does not work (zig-zag pattern prevents smoothing)
• SOR?
→ typically does not work well for Poisson model problem
(does not smooth high frequencies efficiently)
→ can help for other problems using a tailored ω

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 23


Smoothers – Anisotropic Problems
Model problems and observation:
• anisotropic Poisson equation: u_xx + ε u_yy = f with ε ≪ 1
  (or similar: ε ≫ 1)
• Poisson equation on stretched grids:

      (1/h_x²)(u_{i+1,j} − 2u_{i,j} + u_{i−1,j}) + (1/h_y²)(u_{i,j+1} − 2u_{i,j} + u_{i,j−1}) = f_{ij}

  with h_x ≪ h_y (or similar: h_x ≫ h_y)
• strong dependency in x-direction, weak dependency in y-direction
  (or vice versa)
• good smoothing of the error only in x-direction
  (or in y-direction):

      u_{i,j} = (1/(2h_x² + 2h_y²)) (−h_x² h_y² f_{ij} + h_y² (u_{i+1,j} + u_{i−1,j}) + h_x² (u_{i,j+1} + u_{i,j−1}))

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 24


Smoothers – Anisotropic Problems (2)
Semi-Coarsening:
• situation on coarser grid, for example H_x = 2h_x, H_y = h_y:

      (1/(4h_x²))(u_{i+1,j} − 2u_{i,j} + u_{i−1,j}) + (1/h_y²)(u_{i,j+1} − 2u_{i,j} + u_{i,j−1}) = f_{ij}

• thus: anisotropy has weakened on the semi-coarsened grid

Line Smoothers:
• perform a column-wise (or row-wise) Jacobi/Gauss-Seidel relaxation
  → solve each column (or row) simultaneously:

      u_{i−1,j}^(n+1) − (2 + 2ε) u_{i,j}^(n+1) + u_{i+1,j}^(n+1) = h² f_{ij} − ε (u_{i,j−1}^(n) + u_{i,j+1}^(n))

• use direct, tridiagonal solver for each “line” (i.e., row or column)
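
A sketch of one such sweep (assumptions: uniform mesh size h = h_x = h_y, SciPy available for the banded solver, 0-Dirichlet values around the interior unknowns):

import numpy as np
from scipy.linalg import solve_banded

def line_smooth(u, f, h, eps):
    nx, ny = u.shape                  # interior unknowns u[i, j]
    for j in range(ny):               # sweep line by line (Gauss-Seidel order)
        below = u[:, j - 1] if j > 0 else 0.0
        above = u[:, j + 1] if j < ny - 1 else 0.0
        rhs = h**2 * f[:, j] - eps * (below + above)
        ab = np.zeros((3, nx))        # banded storage of the tridiagonal line
        ab[0, 1:] = 1.0               # super-diagonal
        ab[1, :] = -(2 + 2 * eps)     # main diagonal
        ab[2, :-1] = 1.0              # sub-diagonal
        u[:, j] = solve_banded((1, 1), ab, rhs)
    return u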

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 25


Smoothers – Convection-Diffusion

• example: 1D convection-diffusion equation

      −ε u_xx + u_x = f,   ε ≪ 1

• “upwind discretisation”:

      −(ε/h²)(u_{n−1} − 2u_n + u_{n+1}) + (1/h)(u_n − u_{n−1}) = f_n

• (weighted) Jacobi and red-black Gauss-Seidel?
  → no smoothing, basically updates one grid point per iteration
• Gauss-Seidel (relaxation from “left to right”)?
  → almost an exact solver

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 26


Smoothers – More Complicated Situations

Problems:
• anisotropic Poisson with space-dependent ε = ε(x, y),
  or more general:

      −∇ · (D(x, y) ∇u(x, y)) = f(x, y)

• convection-diffusion with variable convection:

      −ε u_xx + v(x) u_x = f      or      −ε ∆u + v(x, y) · ∇u(x, y) = f(x, y)

Approaches for Smoothing:


• alternating line smoothers, “plane smoothers” in 3D
• “Zebra” line smoothers (similar to red-black-GS)
• Gauss-Seidel smoothing in “downwind” order
→ difficult to do for complicated flows in 2D and 3D

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 27


Multigrid Cycles Revisited

Recall: The Multigrid V-Cycle


1. pre-smoothing on the fine level system A_l x_l = b_l
   → approximate solution x_l
2. compute the residual r_l = b_l − A_l x_l
3. restriction of r_l to the coarse grid Ω_{l−1} → r̂_{l−1}
4. solve coarse grid system A_{l−1} e_{l−1} = r̂_{l−1} =: b_{l−1}
   by a recursive call to the V-cycle algorithm
5. interpolate the coarse grid solution e_{l−1} → ê_l to the fine grid Ω_l
6. add the interpolated coarse-grid correction: x_l = x_l + ê_l
7. post-smoothing on the fine grid

Now: improvements possible on the cycling scheme?

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 28


The W-Cycle

• perform two coarse grid correction steps instead of one

(figure: level diagrams of a V-cycle and a W-cycle over the grids
Ω_h, Ω_{2h}, Ω_{4h}, Ω_{8h}; the W-cycle visits each coarser level twice)
• more expensive
• useful in situations where the coarse grid correction is not very accurate

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 29


The Full Multigrid V-Cycle (FMV- or F-Cycle)
Recursive algorithm:
• combines nested iteration and V-cycle
• (recursively!) perform an FMV-cycle on the next coarser grid to get a
good initial solution
• interpolate this initial guess to the current grid
• perform a V-cycle to improve the solution

(figure: level diagram of the full multigrid V-cycle over Ω_h, Ω_{2h},
Ω_{4h}, Ω_{8h}: starting on the coarsest grid, each finer level is reached
by interpolation and then improved by a V-cycle)

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 30


Speed of Convergence vs. Multigrid Cycles
For the “Model Problem” (i.e., Poisson Problem):
• O(n) to solve up to “reduce error by a factor of . . . ” (10^{−8}, e.g.)
• wanted: solve to the “level of truncation”
  → depending on discretisation error; for example O(h²)
• O(n) up to “level of truncation” achieved by FMV-cycle;
  ideal case: 1 cycle; after each V-cycle, the “level of truncation” error is
  achieved on that grid

For Other Problems:


• OK for strongly elliptic problems
• multigrid variants for non-linear problems, parabolic/hyperbolic, . . .
• every component may fail, leading to slow or no convergence:
smoother, interpolation/restriction, coarse-grid operator
• achieving “textbook efficiency” usually a demanding task

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 31


Literature/References – Multigrid

• Briggs, Henson, McCormick: A Multigrid Tutorial (2nd ed.), SIAM, 2000.


• Trottenberg, Oosterlee, Schüller: Multigrid, Elsevier, 2001.
• Shapira: Matrix-Based Multigrid: Theory and Applications, Springer,
2008.
• Hackbusch: Iterative Solution of Large Sparse Systems of Equations,
Springer 1993.
• Brandt, Livne: Multigrid Techniques: 1984 Guide with Applications to
Fluid Dynamics, Revised Edition, SIAM
• M. Griebel: Multilevelmethoden als Iterationsverfahren über
Erzeugendensystemen, Teubner Skripten zur Numerik, 1994
M. Griebel: Multilevel algorithms considered as iterative methods on
semidefinite systems, SIAM Int. J. Sci. Stat. Comput. 15(3), 1994.

Michael Bader | Scientific Computing II | Towards Multigrid Methods | Summer 2022 32


Scientific Computing II
Multigrid and Finite Elements

Michael Bader
Technical University of Munich
Summer 2022
Part I

Finite Elements and Hierarchical Bases

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 2
Remember: Finite Elements – Main Ingredients
1. solve weak form of PDE to reduce regularity requirements:

      −u″ = f   −→   ∫ v′ u′ dx = ∫ v f dx

   → allows additional weak solutions
2. compute a function as numerical solution
   → search in a function space W_h:

      u_h = ∑_j u_j ϕ_j(x),    span{ϕ_1, . . . , ϕ_J} = W_h

3. find weak solutions of simple form:
   for example piecewise linear functions, and choose basis functions with
   local support (“hat functions”)
   → leads to system of linear equations

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 3
Test and Shape Functions
• consider a general PDE Lu = f on some domain Ω
• search for solution functions u_h of the form

      u_h = ∑_j u_j ϕ_j(x)

  the ϕ_j(x) are typically called shape or ansatz functions
• the basis functions ϕ_j(x) build a vector space
  (i.e., a function space) W_h:

      span{ϕ_1, . . . , ϕ_J} = W_h

• insert into weak formulation:

      ∫ v L(∑_j u_j ϕ_j(x)) dx = ∫ v f dx   ∀v ∈ V

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 4
Test and Shape Functions (2)
• choose a basis {ψ_i} of the test space V_h,
  typically defined on some discretisation grid Ω_h
• then: if all basis functions ψ_i satisfy

      ∫ ψ_i(x) L(∑_j u_j ϕ_j(x)) dx = ∫ ψ_i(x) f(x) dx   ∀ψ_i

  then all v ∈ V_h satisfy the equation
• the {ψ_i} are therefore often called test functions
• we obtain a system of equations for unknowns u_j:
  one equation for each test function ψ_i
• V_h is often chosen to be identical to W_h (Ritz-Galerkin method)
  → we then have as many equations as unknowns
• leads to system of linear equations Au = b where

      (Au)_i := ∑_j u_j ∫ ψ_i(x) L ϕ_j(x) dx = ∫ ψ_i(x) f(x) dx =: b_i

  (with A_ij = ∫ ψ_i(x) L ϕ_j(x) dx)
Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 5
Example: Nodal Basis

      ϕ_i(x) :=  (x − x_{i−1})/h   for x_{i−1} < x < x_i
                 (x_{i+1} − x)/h   for x_i < x < x_{i+1}
                 0                 otherwise

(figure: the resulting piecewise linear “hat” functions on [0, 1])

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 6
Example Problem: 1D Poisson
• in 1D: −u″(x) = f(x) on Ω = (0, 1),
  hom. Dirichlet boundary cond.: u(0) = u(1) = 0
• weak form:

      ∫₀¹ v′(x) · u′(x) dx = ∫₀¹ v(x) f(x) dx   ∀v

• grid points x_i = ih (for i = 1, . . . , n − 1); mesh size h = 1/n
• V_h = W_h: piecewise linear functions (on intervals [x_i, x_{i+1}])
• leads to stiffness matrix:

      (1/h) tridiag(−1, 2, −1)

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 7
Hierarchical Basis
• hat functions with multi-level resolution

(figure: hierarchical hat functions of widths 1, 1/2, 1/4, . . . on [0, 1])

• FEM solution identical for hierarchical and nodal basis


(same function space!)
• known from Scientific Computing I:
  diagonal stiffness matrix for 1D Poisson!
Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 8
Hierarchical vs. Nodal Basis
(figure: the same function u(x) represented once in the nodal basis, as a
combination of hat functions ϕ_i(x) on the grid with h_3 = 2^{−3}, and once
in the hierarchical basis, as a combination of hat functions ψ_i(x) from the
levels 1, 2, 3)
Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 9
Hierarchical Basis and Multigrid
• What happens, if we use FEM on hat function bases with different
  resolutions?
• Define “mother of all hat functions”:

      φ(x) := max{1 − |x|, 0}

• consider mesh size h_n = 2^{−n} and grid points x_{n,i} = i · h_n
• nodal basis then Φ_n := {φ_{n,i}, 0 ≤ i ≤ 2^n} with

      φ_{n,i}(x) := φ((x − x_{n,i}) / h_n)

• hierarchical basis combines Φ̂_n := {φ_{n,i}, i = 1, 3, . . . , 2^n − 1}
  (only odd indices) and defines basis as

      Ψ_n := ⋃_{l=1}^{n} Φ̂_l

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 10
Hierarchical Basis Transformation
(or: How to represent functions on a coarser grid?)

(figure: fine-grid hat functions combined into a coarse-grid hat function)

• represent hat functions φ_{n−1,i}(x) via fine-level functions φ_{n,j}(x):

      φ_{n−1,i}(x) = ½ φ_{n,2i−1}(x) + φ_{n,2i}(x) + ½ φ_{n,2i+1}(x)

• hierarchical-basis transformation as matrix-vector product:

      [ ψ_{n,i−1}(x) ]      [ φ_{n,2i−1}(x) ]   [ 1    0    0  ] [ φ_{n,2i−1}(x) ]
      [ ψ_{n,i}(x)   ]  :=  [ φ_{n−1,i}(x)  ] = [ 1/2  1   1/2 ] [ φ_{n,2i}(x)   ]
      [ ψ_{n,i+1}(x) ]      [ φ_{n,2i+1}(x) ]   [ 0    0    1  ] [ φ_{n,2i+1}(x) ]
Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 11
Hierarchical Basis Transformation (2)
Level-by-level algorithm for hierarchical transform:

(figure: the nodal basis Φ_1, . . . , Φ_7 on grid points x_1, . . . , x_7 is
transformed step by step, level by level, into the hierarchical basis
Ψ_1, . . . , Ψ_7)

Remark: allows to implement transform in O(N)
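
A sketch of this O(N) algorithm (assumptions: n = 2^L intervals, zero boundary values; the routine converts nodal values into hierarchical surpluses in place, finest level first):

import numpy as np

def hierarchize(u):
    u = u.copy(); n = len(u) + 1          # values at interior points 1..n-1
    s = 1                                 # distance to the hierarchical parents
    while s < n:
        for j in range(s, n, 2 * s):      # points owned by the current level
            left  = u[j - s - 1] if j - s > 0 else 0.0
            right = u[j + s - 1] if j + s < n else 0.0
            u[j - 1] -= 0.5 * (left + right)
        s *= 2
    return u

print(hierarchize(np.array([0.5, 1.0, 0.5])))   # -> [0. 1. 0.]: one coarse hat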

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 12
Hierarchical Basis Transformation (3)
• hierarchical basis transformation: ψ_{n,i}(x) = ∑_j H_{i,j} φ_{n,j}(x)
• transform can be written as matrix-vector product: ψ⃗_n = H_n φ⃗_n
• step-wise transform from each grid level to the next, similar to

                [ 1    0    0    0    0    0    0  ]
                [ 1/2  1   1/2   0    0    0    0  ]
                [ 0    0    1    0    0    0    0  ]
      H_3^(2) = [ 0    0   1/2   1   1/2   0    0  ]      (cmp. restriction operator!)
                [ 0    0    0    0    1    0    0  ]
                [ 0    0    0    0   1/2   1   1/2 ]
                [ 0    0    0    0    0    0    1  ]

• H_n is then a sequence of level-to-next-level transforms:

      H_n = H_n^(1) H_n^(2) · · · H_n^(n−2) H_n^(n−1)
Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 13
Hierarchical Coordinate Transformation
• consider function f(x) ≈ ∑_i a_i ψ_{n,i}(x) represented via hier. basis
• wanted: corresponding representation in nodal basis:

      ∑_k b_k φ_{n,k}(x) = ∑_i a_i ψ_{n,i}(x) ≈ f(x)

• with ψ_{n,i}(x) = ∑_j H_{i,j} φ_{n,j}(x) we obtain

      ∑_k b_k φ_{n,k}(x) = ∑_i a_i ∑_j H_{i,j} φ_{n,j}(x) = ∑_j (∑_i a_i H_{i,j}) φ_{n,j}(x)

• compare coordinates (identify indices j and k) and get

      b_j = ∑_i H_{i,j} a_i = ∑_i (H^T)_{j,i} a_i

• written in vector notation: b = H^T a

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 14
FEM and Hierarchical Basis Transform
• FEM discretisation with hierarchical test and shape functions:

      ∫ ψ_i(x) L(∑_j u_j ψ_j(x)) dx = ∫ ψ_i(x) f(x) dx   ∀ψ_i

• leads to respective stiffness matrix A^HB_{i,j}:

      ∫ ψ_i(x) L(∑_j u_j ψ_j(x)) dx = ∑_j u_j ∫ ψ_i(x) L ψ_j(x) dx = ∑_j u_j A^HB_{i,j}

• vs. stiffness matrix with nodal basis as shape functions:

      ∫ ψ_i(x) L(∑_j v_j φ_j(x)) dx = ∑_j v_j ∫ ψ_i(x) L φ_j(x) dx = ∑_j v_j A*_{i,j}

• Note: ∑_j u_j A^HB_{i,j} and ∑_j v_j A*_{i,j} are both equal to ∫ ψ_i(x) f(x) dx
• Therefore: (A^HB u)_i = ∑_j u_j A^HB_{i,j} = ∑_j v_j A*_{i,j} = (A* v)_i and v = H^T u

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 15
FEM and Hierarchical Basis Transform (2)
• status: FEM with hierarchical test and nodal shape functions:

      ∫ ψ_i(x) L(∑_j v_j φ_j(x)) dx = ∫ ψ_i(x) f(x) dx

• represent test functions via nodal basis:

      ∫ (∑_k H_{i,k} φ_k(x)) L(∑_j v_j φ_j(x)) dx = ∫ (∑_k H_{i,k} φ_k(x)) f(x) dx

      ∑_k H_{i,k} ∫ φ_k(x) L(∑_j v_j φ_j(x)) dx = ∑_k H_{i,k} ∫ φ_k(x) f(x) dx

• leads to new system of equations: H A^NB v = H b^NB,
  where A^NB and b^NB stem from nodal-basis FEM discretisation!
• with v = H^T u we obtain H A^NB H^T u = H b^NB as system of equations, thus:

      A^HB = H A^NB H^T      (→ Galerkin coarsening)
Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 16
FEM and Hierarchical Basis Transform – Summary
• in general: FEM with nodal test and shape functions:

      ∫ φ_i(x) L(∑_j v_j φ_j(x)) dx = ∫ φ_i(x) f(x) dx      A^NB u^NB = b^NB

  changed to different test and shape functions:

      ∫ ψ_i(x) L(∑_j v_j ψ_j(x)) dx = ∫ ψ_i(x) f(x) dx      A^?? u^?? = b^??

• change from nodal to hierarchical basis:

      A^HB u^HB = H A^NB H^T u^HB = H b^NB = b^HB

• change from nodal to coarser-level nodal basis:

      A^C u^C = R A^NB R^T u^C = R b^NB = b^C

• Note: R, R^T as in Galerkin coarsening; strongly related to H, H^T

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 17
Part II

Finite Elements and Multigrid

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 18
Hierarchical Generating System
(figure: hat functions on levels l = 1, 2, 3; the hierarchical basis keeps
only the odd-indexed functions Φ̂_l of each level, while the generating
system keeps the complete nodal basis Φ_l of every level)

• hierarchical basis combines Φ̂_n := {φ_{n,i}, i = 1, 3, . . . , 2^n − 1}
  (only odd indices)
• generating system combines nodal bases of all levels: ⋃_{l=1}^{n} Φ_l
Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 19
FEM and Hierarchical Generating Systems
• solution function u_h represented as

      u_h = ∑_{l=1}^{n} ∑_{j=1}^{2^l−1} v_{l,j} φ_{l,j}(x)

  → non-unique “multi-grid” representation!
• FEM discretisation with test and shape functions from hierarchical
  generating system:

      ∫ φ_{k,i}(x) L(∑_l ∑_j v_{l,j} φ_{l,j}(x)) dx = ∫ φ_{k,i}(x) f(x) dx   ∀φ_{k,i} ∈ ⋃_{k=1}^{n} Φ_k

• assume order of test/shape functions:

      φ_{n,1}, . . . , φ_{n,2^n−1}, φ_{n−1,1}, . . . , φ_{n−1,2^{n−1}−1}, . . . , φ_{2,1}, φ_{2,2}, φ_{2,3}, φ_{1,1}

• leads to (linearly dependent!) system of linear equations:

      A^GS u_h = b^GS
Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 20
Multi-Level System of Equations
• system matrix A^GS given as (for example)

              [ A_h        A_h^{2h}     A_h^{4h}    ]
      A^GS =  [ A_{2h}^h   A_{2h}       A_{2h}^{4h} ]
              [ A_{4h}^h   A_{4h}^{2h}  A_{4h}      ]

• A_h, A_{2h}, and A_{4h} are the nodal-basis stiffness matrices for
  resolution h, 2h, and 4h
• consider submatrix A_{2h}^h → computed as

      ∫ φ_{2h,i}(x) L(∑_j v_{h,j} φ_{h,j}(x)) dx = ∫ φ_{2h,i}(x) f(x) dx   ∀φ_{2h,i} ∈ Φ_{2h}

• test functions: φ_{2h,i}(x) = ½ φ_{h,2i−1}(x) + φ_{h,2i}(x) + ½ φ_{h,2i+1}(x)
• not a hierarchical transform, but a restriction operation R_h^{2h}:

      φ⃗_{2h} = R_h^{2h} φ⃗_h      thus: A_{2h}^h = R_h^{2h} A_h

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 21
Multi-Level System of Equations (2)

• system of linear equations A^GS v^GS = b^GS thus given as

      [ A_h            A_h P_{2h}^h         A_h P_{4h}^h       ] [ v_h    ]   [ b_h          ]
      [ R_h^{2h} A_h   A_{2h}               A_{2h} P_{4h}^{2h} ] [ v_{2h} ] = [ R_h^{2h} b_h ]
      [ R_h^{4h} A_h   R_{2h}^{4h} A_{2h}   A_{4h}             ] [ v_{4h} ]   [ R_h^{4h} b_h ]

  with restriction and prolongation operators: R = P^T
• matrix is singular(!) due to obviously linearly dependent rows
• however: all solutions lead to same piecewise linear function
• result of a relaxation method on this system of equations?
  → a symmetric Gauss-Seidel sweep corresponds to one multigrid
    V-cycle! (Griebel, 1994)

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 22
Symmetric Gauss-Seidel on AGS v GS = bGS
• pick out second block row of the system A^GS v^GS = b^GS:

      R_h^{2h} A_h v_h + A_{2h} v_{2h} + A_{2h} P_{4h}^{2h} v_{4h} = R_h^{2h} b_h

• Gauss-Seidel → thus solve for unknowns in v_{2h}:

      A_{2h} v_{2h} + A_{2h} P_{4h}^{2h} v_{4h}^{(old)} = R_h^{2h} b_h − R_h^{2h} A_h v_h^{(new)} = R_h^{2h} (b_h − A_h v_h^{(new)})

• observation #1: relaxation works on the restricted fine-grid residual
  R_h^{2h} (b_h − A_h v_h^{(new)})
• observation #2: relaxation considers the prolongated coarse-grid correction
  from the previous iteration: A_{2h} P_{4h}^{2h} v_{4h}^{(old)}
⇒ FEM on hierarchical generating systems matches V-cycle multigrid
  with Galerkin coarsening

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 23
Literature/References – Multigrid

• Briggs, Henson, McCormick: A Multigrid Tutorial (2nd ed.), SIAM, 2000.


• Trottenberg, Oosterlee, Schüller: Multigrid, Elsevier, 2001.
• Shapira: Matrix-Based Multigrid: Theory and Applications, Springer,
2008.
• Hackbusch: Iterative Solution of Large Sparse Systems of Equations,
Springer 1993.
• Brandt, Livne: Multigrid Techniques: 1984 Guide with Applications to
Fluid Dynamics, Revised Edition, SIAM
• M. Griebel: Multilevelmethoden als Iterationsverfahren über
Erzeugendensystemen, Teubner Skripten zur Numerik, 1994
M. Griebel: Multilevel algorithms considered as iterative methods on
semidefinite systems, SIAM Int. J. Sci. Stat. Comput. 15(3), 1994.

Michael Bader | Scientific Computing II | Multigrid and Finite Elements | Summer 2022 24
Scientific Computing II
Conjugate Gradient Methods

Michael Bader
Technical University of Munich
Summer 2022
Families of Iterative Solvers

• relaxation methods:
• Jacobi-, Gauss-Seidel-Relaxation, . . .
• Over-Relaxation-Methods
• Krylov methods:
• Steepest Descent, Conjugate Gradient, . . .
• GMRES, . . .
• Multilevel/Multigrid methods,
Domain Decomposition, . . .

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 2


Remember: The Residual Equation
• for Ax = b, we defined the residual as:

      r^(i) = b − A x^(i)

• and the error: e^(i) := x − x^(i)
• leads to the residual equation:

      A e^(i) = r^(i)

• relaxation methods: solve a modified (easier) SLE:

      B ê^(i) = r^(i)   where B ∼ A

• multigrid methods: coarse-grid correction on residual equation:

      A_H e_H^(i) = r_H^(i)   and   x^(i+1) := x^(i) + P_H^h e_H^(i)

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 3


Part I

Quadratic Forms and Steepest Descent
Quadratic Forms
Direction of Steepest Descent
Steepest Descent

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 4


Quadratic Forms
A quadratic form is a scalar, quadratic function of a vector of the form:

      f(x) = ½ x^T A x − b^T x + c,      where A = A^T

(figure: surface and contour plot of a quadratic form in two variables)
Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 5


Quadratic Forms (2)

The gradient of a quadratic form is defined as

      f′(x) = ( ∂f/∂x_1, . . . , ∂f/∂x_n )^T

• apply to f(x) = ½ x^T A x − b^T x + c, then
  • f′(x) = Ax − b
  • f′(x) = 0 ⇔ Ax − b = 0 ⇔ Ax = b
⇒ Ax = b equivalent to a minimisation problem
⇒ proper minimum only if A positive definite
Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 6


Direction of Steepest Descent
• gradient f′(x): direction of “steepest ascent”
• f′(x) = Ax − b = −r (with residual r = b − Ax)
• residual r: direction of “steepest descent”

(figure: contour plot with gradient and residual directions)

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 7


Solving SLE via Minimum Search

• basic idea to find minimum:
  move into direction of steepest descent
• most simple scheme:

      x^(i+1) = x^(i) + α r^(i)

• α constant ⇒ Richardson iteration
  (usually considered as a relaxation method)
• better choice of α:
  move to lowest point in that direction → Steepest Descent

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 8


Steepest Descent – find an optimal α
• task: line search along the line x^(1) = x^(0) + α r^(0)
• choose α such that f(x^(1)) is minimal:

      ∂/∂α f(x^(1)) = 0

• use chain rule:

      ∂/∂α f(x^(1)) = f′(x^(1))^T ∂/∂α x^(1) = f′(x^(1))^T r^(0)

• remember f′(x^(1)) = −r^(1), thus:

      −(r^(1))^T r^(0) =! 0

  hence, f′(x^(1)) = −r^(1) should be orthogonal to r^(0)

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 9


Steepest Descent – find α (2)

      (r^(1))^T r^(0) = (b − A x^(1))^T r^(0) = 0
      (b − A (x^(0) + α r^(0)))^T r^(0) = 0
      (b − A x^(0))^T r^(0) − α (A r^(0))^T r^(0) = 0
      (r^(0))^T r^(0) − α (r^(0))^T A r^(0) = 0

Solve for α:

      α = (r^(0))^T r^(0) / ((r^(0))^T A r^(0))

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 10


Steepest Descent – Algorithm

1. r^(i) = b − A x^(i)
2. α_i = (r^(i))^T r^(i) / ((r^(i))^T A r^(i))
3. x^(i+1) = x^(i) + α_i r^(i)

(figure: zig-zagging descent path on the contour plot)

Observations:
• slow convergence (sim. to Jacobi relaxation)
• detailed analysis reveals: ‖e^(i)‖_A ≤ ((κ − 1)/(κ + 1))^i ‖e^(0)‖_A
• with κ = λ_max/λ_min
  (largest/smallest eigenvalues of A; for positive definite A)
• many steps in the same direction
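
A direct transcription of steps 1-3 (a sketch; it uses the data of Example 1 below):

import numpy as np

def steepest_descent(A, b, x, steps=100):
    for _ in range(steps):
        r = b - A @ x                    # 1. residual = descent direction
        alpha = (r @ r) / (r @ (A @ r))  # 2. optimal step length
        x = x + alpha * r                # 3. move to the line minimum
    return x

A = np.array([[3., 2.], [2., 6.]]); b = np.array([2., -8.])
x = steepest_descent(A, b, np.array([-3., -3.]))
print(x)                                 # -> [ 2. -2.], the exact solution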

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 11


Steepest Descent – Example 1

Consider example  [3 2; 2 6] x = [2; −8],  starting solution x^(0) = (−3, −3)^T

(figure: descent path for this well-conditioned example)
Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 12


Steepest Descent – Example 2

Consider example  [3 2; 2 100] x = [2; −8],  starting solution x^(0) = (−10, −2)^T

(figure: descent path for this ill-conditioned example)

Multiple steps in the same search direction!

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 13


Part II

Conjugate Gradients

Conjugate Directions
A-Orthogonality
Conjugate Gradients
A Miracle Occurs . . .
CG Algorithm

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 14


Conjugate Directions

• observation:
Steepest Descent takes repeated steps in the same direction
• obvious idea:
try to do only one step in each direction
• possible approach:
choose orthogonal search directions d (0) ⊥ d (1) ⊥ d (2) ⊥ . . .
• notice:
errors then orthogonal to previous directions:

e(1) ⊥ d (0) , e(2) ⊥ d (1) ⊥ d (0) , . . .

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 15


Conjugate Directions (2)
• wanted: best-possible α for correction x^(i+1) = x^(i) + α_i d^(i)
• use the orthogonality criterion to determine α from

  (d^(0))^T e^(1) = (d^(0))^T ( e^(0) − α d^(0) ) = 0

  requires propagation of the error e^(1) = x − x^(1):

  x^(1) = x^(0) + α_i d^(0)
  x − x^(1) = x − x^(0) − α_i d^(0)
  e^(1) = e^(0) − α_i d^(0)

• formula for α:

  α = (d^(0))^T e^(0) / ( (d^(0))^T d^(0) )

• but: we don’t know e^(0)

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 16


A-Orthogonality
• make the search directions A-orthogonal:

  (d^(i))^T A d^(j) = 0

• again: errors A-orthogonal to previous directions:

  (e^(i+1))^T A d^(i) = 0

• equivalent to minimisation in search direction d^(i):

  ∂/∂α f(x^(i+1)) = f'(x^(i+1))^T ∂/∂α x^(i+1) = 0
  ⇔ −(r^(i+1))^T d^(i) = 0
  ⇔ −(d^(i))^T A e^(i+1) = 0

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 17


A-Conjugate Directions

• remember the formula for conjugate directions:

  α = (d^(0))^T e^(0) / ( (d^(0))^T d^(0) )

• same computation, but with A-orthogonality:

  α_i = (d^(i))^T A e^(i) / ( (d^(i))^T A d^(i) ) = (d^(i))^T r^(i) / ( (d^(i))^T A d^(i) )

  (for the i-th iteration)

• these α_i can be computed!
• still to do: find A-orthogonal search directions

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 18


A-Conjugate Directions (2)

classical approach to find orthogonal directions → conjugate Gram-Schmidt process:
• from linearly independent vectors u^(0), u^(1), ..., u^(i−1)
• construct A-orthogonal directions d^(0), d^(1), ..., d^(i−1):

  d^(i) = u^(i) + Σ_{k=0}^{i−1} β_ik d^(k)
  β_ik = − (u^(i))^T A d^(k) / ( (d^(k))^T A d^(k) )

• needs to keep all old search vectors in memory
• O(n³) computational complexity ⇒ infeasible (much too expensive!)

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 19


Conjugate Gradients
• use residuals (i.e., u^(i) := r^(i)) to construct conjugate directions:

  d^(i) = r^(i) + Σ_{k=0}^{i−1} β_ik d^(k)

• new direction d^(i) should be A-orthogonal to all d^(j):

  0 = (d^(i))^T A d^(j) = (r^(i))^T A d^(j) + Σ_{k=0}^{i−1} β_ik (d^(k))^T A d^(j)

• all directions d^(k) (for k = 0, ..., i−1) are already A-orthogonal (and j < i), hence:

  0 = (r^(i))^T A d^(j) + β_ij (d^(j))^T A d^(j)   ⇒   β_ij = − (r^(i))^T A d^(j) / ( (d^(j))^T A d^(j) )

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 20


Conjugate Gradients – Status
1. conjugate directions and computation of α_i:

   α_i = (d^(i))^T r^(i) / ( (d^(i))^T A d^(i) )
   x^(i+1) = x^(i) + α_i d^(i)

2. use residuals to compute search directions:

   d^(i) = r^(i) + Σ_{k=0}^{i−1} β_ik d^(k)
   β_ik = − (r^(i))^T A d^(k) / ( (d^(k))^T A d^(k) )

→ still too expensive, as we need to store all vectors d^(k)


Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 21
A Miracle Occurs – Part 1

Two small contributions:


1. propagation of the error e^(i) = x − x^(i):

   x^(i+1) = x^(i) + α_i d^(i)
   x − x^(i+1) = x − x^(i) − α_i d^(i)
   e^(i+1) = e^(i) − α_i d^(i)

   (we have used this once, already)

2. propagation of residuals:

   r^(i+1) = A e^(i+1) = A ( e^(i) − α_i d^(i) )
   ⇒ r^(i+1) = r^(i) − α_i A d^(i)

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 22


A Miracle Occurs – Part 2

Orthogonality of the residuals:


• search directions are A-orthogonal
• only one step in each direction
• hence: error is A-orthogonal to all previous search directions:
  (d^(i))^T A e^(j) = 0   for i < j
• residuals are orthogonal to all previous search directions:
  (d^(i))^T r^(j) = 0   for i < j
• search directions are built from residuals:
  span{ d^(0), ..., d^(i−1) } = span{ r^(0), ..., r^(i−1) }
• hence: residuals are orthogonal:
  (r^(i))^T r^(j) = 0   for i < j

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 23


A Miracle Occurs – Part 3
• Recall: β_ij = − (r^(i))^T A d^(j) / ( (d^(j))^T A d^(j) ); look at the numerator (r^(i))^T A d^(j)
• combine orthogonality and recurrence for residuals:

  (r^(i))^T r^(j+1) = (r^(i))^T r^(j) − α_j (r^(i))^T A d^(j)
  ⇒ α_j (r^(i))^T A d^(j) = (r^(i))^T r^(j) − (r^(i))^T r^(j+1)

• with (r^(i))^T r^(j) = 0 for i ≠ j:

  (r^(i))^T A d^(j) =   (1/α_i) (r^(i))^T r^(i)         for i = j
                        −(1/α_{i−1}) (r^(i))^T r^(i)    for i = j + 1
                        0                               otherwise

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 24


A Miracle Occurs – Part 4
• computation of β_ik (for k = 0, ..., i−1):

  β_ik = − (r^(i))^T A d^(k) / ( (d^(k))^T A d^(k) )
       =   (r^(i))^T r^(i) / ( α_{i−1} (d^(i−1))^T A d^(i−1) )   if i = k + 1
           0                                                     if i > k + 1

• thus: search directions

  d^(i) = r^(i) + Σ_{k=0}^{i−1} β_ik d^(k) = r^(i) + β_{i,i−1} d^(i−1)

  β_i := β_{i,i−1} = (r^(i))^T r^(i) / ( α_{i−1} (d^(i−1))^T A d^(i−1) )

⇒ reduces to a simple iterative scheme for β_i

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 25


A Miracle Occurs – Part 5
• build search directions:

  d^(i+1) = r^(i+1) + β_{i+1} d^(i)
  β_{i+1} = (r^(i+1))^T r^(i+1) / ( α_i (d^(i))^T A d^(i) )

• remember: α_i = (d^(i))^T r^(i) / ( (d^(i))^T A d^(i) )
• thus: α_i (d^(i))^T A d^(i) = (d^(i))^T r^(i)

  ⇒ β_{i+1} = (r^(i+1))^T r^(i+1) / ( (d^(i))^T r^(i) ) = (r^(i+1))^T r^(i+1) / ( (r^(i))^T r^(i) )

• last step: (d^(i))^T r^(i) = ( r^(i) + β_i d^(i−1) )^T r^(i) = (r^(i))^T r^(i) + β_i (d^(i−1))^T r^(i) = (r^(i))^T r^(i)
  (residual r^(i) orthogonal to previous search direction d^(i−1))

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 26


Conjugate Gradients – Algorithm
Start with d^(0) = r^(0) = b − A x^(0)
While ||r^(i)|| > ε iterate over:

1. α_i = (r^(i))^T r^(i) / ( (d^(i))^T A d^(i) )
2. x^(i+1) = x^(i) + α_i d^(i)
3. r^(i+1) = r^(i) − α_i A d^(i)
4. β_{i+1} = (r^(i+1))^T r^(i+1) / ( (r^(i))^T r^(i) )
5. d^(i+1) = r^(i+1) + β_{i+1} d^(i)

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 27
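The five steps translate one-to-one into code; a minimal NumPy sketch (tolerance handling and names are our own choices):

import numpy as np

def conjugate_gradients(A, b, x0, eps=1e-10, max_iter=None):
    # CG sketch for symmetric positive definite A, following steps 1-5 above
    max_iter = max_iter or len(b)
    x = x0.astype(float)
    r = b - A @ x                 # initial residual
    d = r.copy()                  # initial search direction
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= eps:
            break
        Ad = A @ d                # only one matrix-vector product per iteration
        alpha = rr / (d @ Ad)     # step 1
        x = x + alpha * d         # step 2
        r = r - alpha * Ad        # step 3
        rr_new = r @ r
        beta = rr_new / rr        # step 4
        d = r + beta * d          # step 5
        rr = rr_new
    return x

In exact arithmetic the loop terminates after at most n iterations, which matches the example on the next slide.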


Conjugate Gradients – Example
     
Consider the example A x = b with A = ( 3 2 ; 2 100 ), b = ( 2, −8 )^T, starting solution x^(0) = ( −10, −2 )^T

Convergence to solution after n = 2 steps!

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 28


Literature/References

Conjugate Gradients:
• Shewchuk: An Introduction to the Conjugate Gradient Method Without
the Agonizing Pain.
• Hackbusch: Iterative Solution of Large Sparse Systems of Equations,
Springer 1993.

Michael Bader | Scientific Computing II | Conjugate Gradient Methods | Summer 2022 29


Scientific Computing II
Conjugate Gradients and Preconditioning

Michael Bader
Technical University of Munich
Summer 2022
Part I

Preconditioning

CG with Matrix Preconditioner


Preconditioners – Examples
ILU and Incomplete Cholesky

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 2
Conjugate Gradients – Convergence
Convergence Analysis:
• uses Krylov subspace:

  span{ r^(0), A r^(0), A² r^(0), ..., A^{i−1} r^(0) }

• “Krylov subspace method”

Convergence Results:
• in principle: direct method (n steps)
  (however: orthogonality lost due to round-off errors → exact solution not found)
• in practice: iterative scheme

  ||e^(i)||_A ≤ 2 ( (√κ − 1)/(√κ + 1) )^i ||e^(0)||_A,   κ = λ_max/λ_min

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 3
Preconditioning
• convergence depends on matrix A
• idea: modify linear system

  Ax = b  →  M^{−1} A x = M^{−1} b,

  then: convergence depends on matrix M^{−1} A
• optimal preconditioner: M^{−1} = A^{−1}:

  A^{−1} A x = A^{−1} b ⇔ x = A^{−1} b.

• in practice:
  – avoid explicit computation of M^{−1} A
  – find an M similar to A, compute effect of M^{−1}
    (i.e., approximate solution of SLE)
  – or: find an M^{−1} similar to A^{−1}
• also possible: solve A M y = b for y, and then x = M y

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 4
CG and Preconditioning
• just replace A by M^{−1} A in the algorithm??
• problem: M^{−1} A not necessarily symmetric
  (even if M and A both are)
• we will try an alternative first: symmetric preconditioning

  Ax = b  →  L^T A L x̂ = L^T b,   x = L x̂

• guarantees symmetry: (L^T A L)^T = L^T A^T (L^T)^T = L^T A L
• remember: for Finite Element discretization, this corresponds to a change of basis functions!
• can we implement CG without having to set up L^T A L?
  goal: compute all steps of CG on the preconditioned system, but only perform operations with A, L and L^T
• requires some re-computations in the CG algorithm
  (see following slides)

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 5
“Change-of-Basis” Preconditioning
• preconditioned system of equations:

  Ax = b  →  L^T A L x̂ = L^T b,   with  Â := L^T A L,  b̂ := L^T b,  x = L x̂

• computation of residual:

  r̂ = b̂ − Â x̂ = L^T b − L^T A L x̂ = L^T (b − A x) = L^T r

• computation of α (for preconditioned system):

  α_i := (r̂^(i))^T r̂^(i) / ( (d̂^(i))^T Â d̂^(i) ) = (r̂^(i))^T r̂^(i) / ( (d̂^(i))^T L^T A L d̂^(i) ) = (r̂^(i))^T r̂^(i) / ( (d̃^(i))^T A d̃^(i) )

  where we defined d̃^(i) := L d̂^(i)
• update of solution:

  x̂^(i+1) = x̂^(i) + α_i d̂^(i)
  ⇒ x^(i+1) = L x̂^(i+1) = L x̂^(i) + L α_i d̂^(i) = x^(i) + α_i d̃^(i)

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 6
“Change-of-Basis” Preconditioning (2)
• update of residual r̂:

  r̂^(i+1) = r̂^(i) − α_i Â d̂^(i) = r̂^(i) − α_i L^T A L d̂^(i)
          = r̂^(i) − α_i L^T A d̃^(i)

• computation of β:

  β_{i+1} = (r̂^(i+1))^T r̂^(i+1) / ( (r̂^(i))^T r̂^(i) )   → no change needed

• update of search directions:

  d̂^(i+1) = r̂^(i+1) + β_{i+1} d̂^(i)
  ⇒ d̃^(i+1) = L d̂^(i+1) = L r̂^(i+1) + L β_{i+1} d̂^(i) = L r̂^(i+1) + β_{i+1} d̃^(i)

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 7
CG with “Change-of-Basis” Preconditioning
Start with r̂^(0) = L^T (b − A x^(0)) and d̃^(0) = L r̂^(0);
While ||r̂^(i)|| > ε iterate over:

1. α_i = (r̂^(i))^T r̂^(i) / ( (d̃^(i))^T A d̃^(i) )
2. x^(i+1) = x^(i) + α_i d̃^(i)
3. r̂^(i+1) = r̂^(i) − α_i L^T A d̃^(i)
4. β_{i+1} = (r̂^(i+1))^T r̂^(i+1) / ( (r̂^(i))^T r̂^(i) )
5. d̃^(i+1) = L r̂^(i+1) + β_{i+1} d̃^(i)

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 8
Part II

Hierarchical Basis Preconditioning

CG with Matrix Preconditioner


Preconditioners – Examples
ILU and Incomplete Cholesky

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 9
Hierarchical Basis Preconditioning
Specifics for CG implementation, if L/LT stems from hierarchical transform:
• L transforms coefficient vector from hierarchical basis to nodal basis,
for example x = L x̂ or d̃ = L d̂
• LT transforms the vector of basis functions from nodal basis to
hierarchical basis (cmp. FEM), thus r̂ = LT r
• effect of L and LT can be computed in O(N) operations

HB-CG for the Poisson problem:


• in 1D: convergence after log N iterations!
(in this case: LT AL diagonal matrix with log N different eigenvalues)
• in 2D and 3D very fast convergence!
• further improved by additional diagonal preconditioning
• so-called hierarchical generating systems (change to a multigrid basis)
achieve multigrid-like performance

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 10
“Change-of-Basis” Preconditioning
Hierarchical vs. “non-hierarchical” vectors

We switch between working on “hierarchical” and “non-hierarchical” vectors:

• computation of residual: r̂ = L^T r
  → residual computed for preconditioned system
• computation of α: α_i := (r̂^(i))^T r̂^(i) / ( (d̃^(i))^T A d̃^(i) )
  → uses “hierarchical” residuals, but ...
• ... with d̃^(i) := L d̂^(i)
  → “hierarchical” search directions, transformed back to nodal basis
• update of solution: x^(i+1) = x^(i) + α_i d̃^(i)
  → computes “non-hierarchical” solution directly
• computation of β: β_{i+1} = (r̂^(i+1))^T r̂^(i+1) / ( (r̂^(i))^T r̂^(i) )
  → uses “hierarchical” residuals
• update of search directions: d̃^(i+1) = L r̂^(i+1) + β_{i+1} d̃^(i)
  → “hierarchical” search directions, transformed back to nodal basis

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 11
CG with Hierarchical Generating Systems
Recall: system of linear equations A_GS v_GS = b_GS given as

  ( A_h            A_h P^h_2h        A_h P^h_4h    ) ( v_h  )   ( b_h        )
  ( R^2h_h A_h     A_2h              A_2h P^2h_4h  ) ( v_2h ) = ( R^2h_h b_h )
  ( R^4h_h A_h     R^4h_2h A_2h      A_4h          ) ( v_4h )   ( R^4h_h b_h )

Preconditioning for CG? (cmp. Griebel, 1994):
• system A_GS v_GS = b_GS is singular (A_GS positive semi-definite)
  → subspace of solutions v_GS (minima of the quadratic form!)
• all solutions v_GS transform to the same non-hierarchical solution!
• preconditioned CG converges(!) with:

  ||e^(i)||_A ≤ 2 ( (√κ̂_GS − 1)/(√κ̂_GS + 1) )^i ||e^(0)||_A,   κ̂_GS = λ̂_max/λ̂_min

  where κ̂_GS is the ratio of largest vs. smallest non-zero eigenvalue
• for Poisson eq.: κ̂_GS independent of h → multigrid convergence
Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 12
Hierarchical Basis Transformation
Towards level-wise approach

Consider the “semi-hierarchical” transform:

[Figure: nodal basis functions Φ1 ... Φ7 on nodes x1 ... x7, transformed into the semi-hierarchical basis functions Ψ1, Ψ2, Ψ3, Ψ5, Ψ6, Ψ7]

Matrices for change of basis are then (H3^(2) to transform to hierarchical basis):

  H3^(1) = ( 1    0    0    0    0    0    0   )
           ( 1/2  1    1/2  0    0    0    0   )
           ( 0    0    1    0    0    0    0   )
           ( 0    0    1/2  1    1/2  0    0   )
           ( 0    0    0    0    1    0    0   )
           ( 0    0    0    0    1/2  1    1/2 )
           ( 0    0    0    0    0    0    1   )

  H3^(2) = ( 1    0    0    0    0    0    0 )
           ( 0    1    0    0    0    0    0 )
           ( 0    0    1    0    0    0    0 )
           ( 0    1/2  0    1    0    1/2  0 )
           ( 0    0    0    0    1    0    0 )
           ( 0    0    0    0    0    1    0 )
           ( 0    0    0    0    0    0    1 )
Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 13
Hierarchical Basis Transformation
Level-wise hierarchical transform:
• hierarchical basis transformation: ψ_{n,i}(x) = Σ_j H_{i,j} φ_{n,j}(x)
• written as matrix-vector product: ψ_n = H_n φ_n
• H_n φ_n can be performed as a sequence of level-wise transforms:

  H_n φ_n = H_n^(n−1) H_n^(n−2) ... H_n^(2) H_n^(1) φ_n

• consider Hierarchical-Basis-Preconditioning for FEM-discretization:
  L^T := H_n, and we need to compute r̂^(0) = L^T r^(0) and L^T A d̃^(i)
• each level-wise transform H_n^(k) has a simple loop implementation:

  For k from 1 to n−1 do
    For j from 2^k to 2^n step 2^k
      r_j := (1/2) r_{j−2^{k−1}} + r_j + (1/2) r_{j+2^{k−1}}

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 14
Hierarchical Coordinate Transformation
• transform b = H_n^T a turns “hierarchical” coefficients a into “nodal” coefficients b:

  Σ_j b_j φ_{n,j}(x) = Σ_i a_i ψ_{n,i}(x) ≈ f(x)

• H_n = H_n^(n−1) H_n^(n−2) ... H_n^(2) H_n^(1) has a level-wise representation, thus:

  H_n^T = ( H_n^(1) )^T ( H_n^(2) )^T ... ( H_n^(n−2) )^T ( H_n^(n−1) )^T

• consider Hierarchical-Basis-Preconditioning for FEM-discretization:
  L := H_n^T for computation of d̃^(0) = L r̂^(0) and d̃^(i+1) = L r̂^(i+1) + β_{i+1} d̃^(i)
• again: use loop-based implementations for ( H_n^(k) )^T a:

  For k from n−1 downto 1
    For i from 2^{k−1} to 2^n step 2^k
      d_i := (1/2) d_{i−2^{k−1}} + d_i + (1/2) d_{i+2^{k−1}}   (with d_0 = d_{2^n} = 0)

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 15
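The two loop sketches above translate almost directly into code; a minimal Python version (our own naming; vectors have length 2^n + 1 with zero boundary values at indices 0 and 2^n):

def hierarchize(r, n):
    # apply L^T = H_n = H_n^(n-1) ... H_n^(1) level by level, in place
    for k in range(1, n):                        # k = 1, ..., n-1
        s = 2 ** (k - 1)
        for j in range(2 ** k, 2 ** n, 2 ** k):
            r[j] += 0.5 * r[j - s] + 0.5 * r[j + s]
    return r

def hierarchize_transposed(d, n):
    # apply L = H_n^T: the level-wise factors in reverse order, in place
    for k in range(n - 1, 0, -1):                # k = n-1, ..., 1
        s = 2 ** (k - 1)
        for i in range(s, 2 ** n, 2 ** k):       # odd multiples of 2^(k-1)
            d[i] += 0.5 * d[i - s] + 0.5 * d[i + s]
    return d

Each level updates only half as many nodes as the one before, so the total effort stays O(N), matching the O(N) claim for applying L and L^T.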
Part III

Matrix-based Preconditioning

CG with Matrix Preconditioner


Preconditioners – Examples
ILU and Incomplete Cholesky

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 16
CG and Preconditioning (revisited)
• preconditioning: replace A by M^{−1} A
• problem: M^{−1} A not necessarily symmetric
• compare symmetric preconditioning:

  Ax = b  →  L^T A L x̂ = L^T b,   x = L x̂

• workaround: find E^T E = M (Cholesky fact.), then

  Ax = b  →  E^{−T} A E^{−1} x̂ = E^{−T} b,   x̂ = E x

• what if E cannot be computed (efficiently)?
  (neither M nor M^{−1} might be known explicitly!)
• E, E^{−T}, E^{−1} can be eliminated from the algorithm (again requires some re-computations):
  set d̂ = E d and use r̂ = E^{−T} r, x̂ = E x, E^{−1} E^{−T} = M^{−1}

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 17
CG with Preconditioner
Start: r^(0) = b − A x^(0);  d^(0) = M^{−1} r^(0)

1. α_i = (r^(i))^T M^{−1} r^(i) / ( (d^(i))^T A d^(i) )
2. x^(i+1) = x^(i) + α_i d^(i)
3. r^(i+1) = r^(i) − α_i A d^(i)
4. β_{i+1} = (r^(i+1))^T M^{−1} r^(i+1) / ( (r^(i))^T M^{−1} r^(i) )
5. d^(i+1) = M^{−1} r^(i+1) + β_{i+1} d^(i)

(for detailed derivation, see Shewchuk)

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 18
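A sketch of how the preconditioner enters the implementation; here with a Jacobi preconditioner M = D_A as an example, but any routine that applies M^{-1} (or approximately solves with A) could be passed instead:

import numpy as np

def preconditioned_cg(A, b, x0, apply_Minv, eps=1e-10, max_iter=None):
    # PCG sketch; apply_Minv(r) must return M^{-1} r for an s.p.d. M
    max_iter = max_iter or len(b)
    x = x0.astype(float)
    r = b - A @ x
    z = apply_Minv(r)             # z = M^{-1} r
    d = z.copy()
    rz = r @ z                    # r^T M^{-1} r
    for _ in range(max_iter):
        if np.linalg.norm(r) <= eps:
            break
        Ad = A @ d
        alpha = rz / (d @ Ad)     # step 1
        x = x + alpha * d         # step 2
        r = r - alpha * Ad        # step 3
        z = apply_Minv(r)
        rz_new = r @ z
        beta = rz_new / rz        # step 4
        d = z + beta * d          # step 5
        rz = rz_new
    return x

# Jacobi preconditioning: M := D_A, so M^{-1} r is an entry-wise division
A = np.array([[3.0, 2.0], [2.0, 100.0]])
b = np.array([2.0, -8.0])
print(preconditioned_cg(A, b, np.zeros(2), lambda r: r / np.diag(A)))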
Implementation

Preconditioning steps: M^{−1} r^(i), M^{−1} r^(i+1)

• M^{−1} known? Then multiply: M^{−1} r^(i)
• M known? Then solve M y = r^(i) to obtain y = M^{−1} r^(i)
• neither M nor M^{−1} known explicitly:
  • an algorithm to solve M y = r^(i) is sufficient!
  • an algorithm to compute the action of M^{−1} on a vector is sufficient!
⇒ any approximate solver for A e = r^(i) is sufficient
  (if it is equivalent to applying a symmetric and pos. definite matrix)

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 19
Preconditioners for CG – Examples
Find M ≈ A and compute effect of M^{−1}:
• Jacobi preconditioning: M := D_A
• (Symmetric) Gauss-Seidel preconditioning: M := L_A or
  M = (D_A + L'_A) D_A^{−1} (D_A + (L'_A)^T), etc.
Just compute effect of M^{−1}:
• any approximate solver might do → incl. multigrid methods
• incomplete Cholesky factorization
  → i.e., incomplete LU-decomp. (ILU) for symm. positive definite matrix
• use a multigrid method as preconditioner(?)
  → worthwhile (only) in situations where multigrid does not work (well) as stand-alone solver
Find an M^{−1} similar to A^{−1}:
• “sparse approximate inverse” (SPAI)
• tries to minimise ||I − M A||_F, where M is a matrix with (given) sparse non-zero pattern
Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 20
Preconditioners – ILU and Incomplete Cholesky

Recall LU decomposition and Cholesky factorization:

• LU decomposition: given A, find lower/upper triangular matrices L and U such that A = LU
• Cholesky factorization: given A = A^T, find lower triangular matrix L such that A = L L^T → symmetric preconditioning!
• variants with explicit diagonal matrix D:

  A = L D^{−1} U   or   A = L D^{−1} L^T,

  where L = D + L' and U = D + U' with strict lower/upper triangular L', U'
• but: for sparse A, L and U may be non-sparse

Idea: disregard all fill-in during factorization

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 21
Cholesky Factorization
   −1  T   
L11 D11 L11 LT21 LT31 A11 AT21 AT31
−1
L21 L22 D22 LT22 T 
L32  = A21 A22 AT32 
    
 
L31 L32 L33 −1 LT33 A31 A32 A33
D33

Derive the factorization algorithm:


! −1 T
• assume that A11 = L11 D11 L11 is already factorized
• let L21 be a 1 × k submatrix, i.e., to compute next row of L:
−1 T !
L21 D11 L11 = A21
−1 T
D11 L11 upper triangular matrix → solve triangular system for L21
• by convention L22 = D22 (1 × 1 “matrix”), which is computed from:
−1 T −1 −T ! −1 T
L21 D11 L21 + L22 D22 L22 = A22 ⇒ L22 = D22 := A22 − L21 D11 L21

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 22
Incomplete Cholesky Factorization
Algorithm: ( A → L D^{−1} L^T )
• initialize D := 0, L := 0
• for i = 1, ..., n:
  1. for k = 1, ..., i−1:
     if (i, k) ∈ S then set L_ik := A_ik − Σ'_{j<k} L_ij D_jj^{−1} L_kj
  2. set L_ii = D_ii := A_ii − Σ'_{j<i} L_ij D_jj^{−1} L_ij
• note: the sums Σ'_{j<k} and Σ'_{j<i} only consider non-zero elements ∈ S
• uses a given pattern S of non-zero elements in the factorization
  (frequent choice: use non-zeros of A for S)
• Cholesky factorization computed in O(n) operations for sparse matrices (with c · n non-zeros)
• frequently used for preconditioning

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 23
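A compact, dense-storage sketch of this algorithm (purely illustrative; a sparse implementation would store only the entries in S). The pattern S is chosen as the non-zeros of A, and positive pivots D_ii are assumed (s.p.d. A):

import numpy as np

def incomplete_cholesky(A):
    # IC(0) sketch: A ~ L D^{-1} L^T with fill-in restricted to the pattern of A
    n = A.shape[0]
    S = A != 0.0                              # non-zero pattern S
    L = np.zeros_like(A, dtype=float)
    D = np.zeros(n)
    for i in range(n):
        for k in range(i):
            if S[i, k]:                       # step 1: row entries inside S
                s = sum(L[i, j] * L[k, j] / D[j]
                        for j in range(k) if S[i, j] and S[k, j])
                L[i, k] = A[i, k] - s
        s = sum(L[i, j] ** 2 / D[j] for j in range(i) if S[i, j])
        D[i] = L[i, i] = A[i, i] - s          # step 2: diagonal entry
    return L, D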
Literature/References

Conjugate Gradients:
• Shewchuk: An Introduction to the Conjugate Gradient Method Without
the Agonizing Pain.
• Hackbusch: Iterative Solution of Large Sparse Systems of Equations,
Springer 1993.
• M. Griebel: Multilevelmethoden als Iterationsverfahren über
Erzeugendensystemen, Teubner Skripten zur Numerik, 1994
M. Griebel: Multilevel algorithms considered as iterative methods on
semidefinite systems, SIAM Int. J. Sci. Stat. Comput. 15(3), 1994.

Michael Bader | Scientific Computing II | Conjugate Gradients and Preconditioning | Summer 2022 24
Scientific Computing II
Molecular Dynamics Simulation – Introduction

Michael Bader – SCCS


Technical University of Munich
Summer 2022
The Simulation Pipeline – Revisited
phenomenon, process etc.
  → modelling → mathematical model
  → numerical treatment → numerical algorithm
  → implementation → simulation code
  → visualization → results to interpret
  → embedding → statement / tool
(validation accompanies all stages of the pipeline)
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 2
The Seven Dwarfs of HPC – Dwarf # 4

“dwarfs” = key algorithmic kernels in many scientific computing


applications

P. Colella (LBNL), 2004:


1. dense linear algebra
2. sparse linear algebra
3. spectral methods
4. N-body methods
5. structured grids
6. unstructured grids
7. Monte Carlo

→ discuss simulation pipeline for molecular dynamics


(as example for N-body methods)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 3
Overview: Particle-Oriented Simulation Methods
General Modelling Approach:
• “N-body problem”
→ compute motion paths of many individual particles
• requires modelling and computation of inter-particle forces
• typically leads to ODE for particle positions and velocities

Numerical Aspects:
• how to discretize the resulting modelling equations?
• efficient time stepping algorithms?

Implementation Aspects:
• suitable data structures?
• efficient algorithms to compute short- and long-range forces?
• parallelisation?

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 4
Some types of N-Body Simulations
Smoothed Particle Hydrodynamics (SPH):
• approximate gas/fluid via many particles, smoothed over gaps
• cosmology simulations, e.g. formation of galaxies
  (source: http://swift.dur.ac.uk/)

Discrete Element Methods (DEM):
• particles have geometry
• complex contact potentials (e.g. twisting friction)
• e.g. ship breaking through ice
  (source: Raza, N. et al., 2019. Analysis of Oden Icebreaker Performance in Level Ice Using Simulator for Arctic Marine Structures (SAMS). In Proceedings of the 25th International Conference of Port and Ocean Engineering under Arctic Conditions, Delft, The Netherlands.)

Molecular Dynamics (MD):
• particles are single- or multi-atom molecules
• protein folding, free energy calculations, ...
• e.g. SARS-CoV-2 spike protein
  (source: PDB, https://www.rcsb.org/structure/7DX5)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 5
Molecular Dynamics – Large Complex Molecules
• Short- and long-range interaction potentials
• E.g. life-science applications

Ion / water flow through channels in a double membrane.


(A) Setup (B) Simulation (C) Potential difference
Simulation with GROMACS; source: https://manual.gromacs.org/

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 6
Molecular Dynamics – Small Rigid Molecules
• Large number of particles
• E.g. thermodynamics applications

Comparison of a phase-field vs MD simulation of a droplet sliding on a surface. (l) mirrored


visualizations. (r) droplet location and speed.
Source: Diewald, Felix, et al. ”Molecular dynamics and phase field simulations of droplets on surfaces with wettability gradient.” Computer Methods in
Applied Mechanics and Engineering 361 (2020): 112773.

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 7
ls1 Mardyn

• Highly parallel MPI+OpenMP MD simulations with dynamic load balancing and automatic algorithm selection.
• Focus on large numbers of small, rigid molecules.
• Applications in process engineering
  – State transitions (e.g. evaporation, droplet formation)
  – Determining equations of state
  – Behavior of droplets
  – Flow in shock tubes

[Figure: world record simulation of 2.1 · 10^13 Xenon molecules; annotated scale: 11.8 μm]
Source: Tchipev, Nikola, et al. ”TweTriS: Twenty trillion-atom simulation.” The International Journal of High Performance Computing Applications 33.5 (2019): 838-854.

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 8
Stationary Evaporation under Pressure
(Prof. Dr.-Ing. habil. Jadran Vrabec, Fachgebiet Thermodynamik und Thermische Verfahrenstechnik, Fakultät III – Prozesstechnik)

Evaporation across a planar vapor-liquid interface:
• delete particles of forward flux j+
• replenish liquid phase by the same amount of particles deleted at the vapor boundary condition
• establish appropriate backward flux j− according to shifted Maxwell distribution

Coloring: boundary conditions / forward moving particles / backward moving particles (at instance of time)

Heinen & Vrabec: Evaporation sampled by stationary molecular dynamics simulation. The Journal of Chemical Physics 151 (2019): 044704.

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 9
HPC Example – ACM Gordon Bell Prize 2005
• Gordon-Bell-Prize 2005 (most important annual supercomputing award)
• phenomenon studied: solidification processes in Tantalum and Uranium
• method: 3D molecular dynamics, up to 524,000,000 atoms simulated
• machine: IBM Blue Gene/L, 131,072 processors (world’s #1 in November
2005)
• performance: more than 101 TeraFlops (almost 30% of the peak
performance)

(Streitz et al., 2005)


Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 10
HPC Example – ACM Gordon Bell Prize 2010

(Rahimian, . . . , Biros, 2010: DOI: 10.1109/SC.2010.42)


• direct simulation of blood flow
• particulate flow simulation (coupled problem)
• Stokes flow for blood plasma
• red blood cells as immersed, deformable particles
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 11
HPC Example – ACM Gordon Bell Prize 2010 (2)

Simulation – HPC-Related Data:


• up to 260 Mio blood cells, up to 9 · 10^10 unknowns
• fast multipole method to compute Stokes flow
(octree-based; octree-level 4–24)
• scalability: 327 CPU-GPU nodes on Keeneland cluster,
200,000 AMD cores on Jaguar (ORNL)
• 0.7 Petaflops/s sustained performance on Jaguar
• extensive use of GEMM routine (matrix multiplication)
• runtime: ≈ 1 minute per time step

Article (preprint) for Supercomputing conference:


http://www.cc.gatech.edu/~gbiros/papers/sc10.pdf

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 12
HPC Example – ACM Gordon Bell Prize 2014
Anton 2 – special-purpose MD supercomputer:

(Shaw et al.: Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer, DOI: 10.1109/SC.2014.9)

[Fig. 1: (a) An Anton 2 ASIC. (b) An Anton 2 node board, with one ASIC underneath a phase-change heat sink. (c) A 512-node Anton 2 machine, with four racks.]

• special-purpose ASIC to support event-driven computation, “pairwise point interaction pipelines” etc.
• nodes/ASICs directly connected to form a 3D torus topology
• dedicated to COVID-19 research in 2019

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 13
HPC Example – Millennium-XXL Project

(Springel, Angulo, et al., 2010)


• N-body simulation with N = 3 · 10^11 “particles”
• study gravitational forces
  (each “particle” corresponds to ∼ 10^9 suns)
• simulates the generation of galaxy clusters
  → served to “validate” the cold dark matter model
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 14
Millennium-XXL Project (2)

Simulation Figures:
• N = 3 · 10^11 particles
• 10 TB RAM required only to store positions and velocities (32-bit floats)
• entire memory requirements: 29 TB
• JuRoPa Supercomputer (Jülich)
• computation on 1536 nodes (each 2x QuadCore → 12,288 cores in total)
• hybrid parallelisation: MPI plus OpenMP/Posix threads
• execution time: 9.3 days (ca. 300 CPU years)

[Figure: development of N-body problem sizes for cosmology simulations (source: www.magneticum.org)]

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 15
Scales – an Important Issue

• length scales in simulations:
  – from 10^−9 m (atoms)
  – to 10^23 m (galaxy clusters)
• time scales in simulations:
  – from 10^−15 s
  – to 10^17 s
• mass scales in simulations:
  – from 10^−24 g (atoms)
  – to 10^43 g (galaxies)
• obviously impossible to take all scales into account in an explicit and simultaneous way
• first molecular dynamics simulations reported in 1957

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 16
Laws of Motion

• force on a molecule: F_i = Σ_{j≠i} F_ij
• leads to acceleration (Newton’s 2nd Law):

  r̈_i = F_i / m_i = ( Σ_{j≠i} F_ij ) / m_i = − ( Σ_{j≠i} ∂U(r_i, r_j)/∂|r_ij| ) / m_i     (1)

• system of dN ODEs (2nd order)
  (N: number of molecules, d: dimension),
• reformulated into a system of 2dN 1st-order ODEs:

  p_i := m_i ṙ_i     (2a)
  ṗ_i = F_i          (2b)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 17
Example: Hooke’s Law

[Figure: two molecules i and j at distance r_ij]

• “harmonic potential”: U_harm(r_ij) = (1/2) k (r_ij − r_0)²
• potential energy of a spring of length r_0 when extended or compressed to length r_ij
• resulting force:

  1D:     F_ij = −grad U(r_ij) = −∂U/∂r_ij = −k (r_ij − r_0)
  2D, 3D: F_ij = −k (r_ij − r_0)   (vector form)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 18
Example: Gravity

• attractive force due to the mass of two bodies (planets, etc.)
• gravity potential: U_grav(r_ij) = −g m_i m_j / r_ij
• resulting force:

  1D: F_ij = −grad U(r_ij) = −g m_i m_j / r_ij²
rij
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 19
Example: Coulomb Potential

[Figure: two point charges q_1 (+) and q_2 (−) at distance r_12]

• attractive or repulsive force between charged particles
• Coulomb potential: U_col(r_ij) = (1/(4πε_0)) q_i q_j / r_ij
• resulting force:

  1D: F_ij = −grad U(r_ij) = (1/(4πε_0)) q_i q_j / r_ij²

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 20
Example – Smoothed Particle Hydrodynamics

“Forces” result from discretisation of a PDE:

• approximate functions using kernel functions W:

  f(x) ≈ ∫_V f(r') W(|r − r'|, h) dV'

• for h → 0: W → δ (Dirac function)
• approximation of derivatives → integration by parts:

  ∇f(x) ≈ ∫_V f(r') ∇W(|r − r'|, h) dV'

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 21
Example – Smoothed Particle Hydrodynamics (2)
• approximate integrals at particle positions:

  f(r_i) ≈ Σ_{j=1}^N ( m_j / ρ(r_j) ) f(r_j) W(|r_i − r_j|, h)

• in particular for the density:

  ρ(r_i) ≈ Σ_{j=1}^N m_j W(|r_i − r_j|, h)

• similar for derivatives:

  ∇f(r_i) ≈ Σ_{j=1}^N ( m_j / ρ(r_j) ) f(r_j) ∇W(|r_i − r_j|, h)

• leads to N-body problem (based on Navier-Stokes equations, e.g.)


Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Intro | Summer 2022 22
Scientific Computing II
Molecular Dynamics Simulation – Modelling

Michael Bader – SCCS


Technical University of Munich
Summer 2022
Part I

Intro: Molecular Models for Fluid


Mechanics?

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 2
Continuum Mechanics for Fluids
Fluid:
• term “fluid” covers liquids and gases
• liquids: hardly compressible
• gases: volume depends on pressure
• both: small resistance to changes of form

Continuum:
• “continuum” = space, continuously filled with mass
• homogeneous
• subdivision into small fluid voxels with constant physical properties is
possible
• idea valid on micro scale upward (where we consider continuous masses
and not discrete particles)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 3
Description of State

• consideration of a control volume V0 (Eulerian perspective)


• description of the fluid’s state via
– the velocity field ~v (~x , t) and two thermodynamical quantities, typically
– the pressure p(~x , t) and
– the density ρ(~x , t)
• for incompressible fluids, the density ρ is constant
(if there are no chemical reactions)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 4
Molecular Dynamics for Fluids?
N-Body Problem – Newton’s Laws of Motion:
• force on a molecule: F_i = Σ_{j≠i} F_ij
• leads to acceleration (Newton’s 2nd Law):

  r̈_i = F_i / m_i = ( Σ_{j≠i} F_ij ) / m_i = − ( Σ_{j≠i} ∂U(r_i, r_j)/∂|r_ij| ) / m_i     (1)

• system of dN ODEs (2nd order)
  (N: number of molecules, d: dimension),
• reformulated into a system of 2dN 1st-order ODEs:

  p_i := m_i ṙ_i     (2a)
  ṗ_i = F_i          (2b)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 5
Continuum vs. Molecular Dynamics
Compare Simulation Results for a (Micro-/Nano-)Channel Flow

For various Knudsen numbers: Kn = mean free path / characteristic length

[Figure: velocity profiles u_x(y/H)/u over y/H for Kn = 0.1128 and Kn = 4.5135; curves: LBM Li et al., Present LBM (2nd order, VA), 2nd-order slip NS, Ohwada et al.]

→ continuum description only valid on coarse scale
→ other methods (Molecular Dynamics, Direct Simulation Monte Carlo, ...) required
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 6
Scope of Application

• Loschmidt number: 2.687 · 10^19 cm^−3
  → number of molecules in 1 cm³ of an ideal gas
• Avogadro constant: 6.0221415 · 10^23 mol^−1
  → number of C12 atoms in 12 g of C12
  → number of molecules in 1 mol of a substance
  (1 mol of ideal gas, under normal conditions, takes a volume of 22.4 litres)
• “Avogadro number”:
  notion used in different ways for both of the above constants, which depend on each other:

  2.687 · 10^19 cm^−3 · 22.413996 · 10^3 cm³ mol^−1 = 6.0221415 · 10^23 mol^−1

• time steps for numerical simulations are typically in the order of femtoseconds (1 fs := 10^−15 s)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 7
Pour Me A Glass . . .
• assume we want to simulate all molecules in one glass (0.5 l, i.e. 500 g) of water
• assume a simulation over 1 second with a time step size of 1 fs
• assume we only need one floating point operation per molecule in each time step
→ 1.673 · 10^25 molecules (biggest MD simulation: ≈ 10^13 molecules)
→ 10^15 timesteps
→ 1.673 · 10^40 operations

Using SuperMUC (3 · 10^15 operations per second), we need at least
• 1.77 · 10^17 years for the computations
• 8.03 · 10^11 PB of memory, assuming we need to store only 3+3 unknowns per molecule (position+velocity)
Scope of application thus limited to micro- and nanoscale simulations (at least for the near future)
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 8
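The estimate is easy to reproduce; a few lines of Python re-derive the numbers above (assuming 18 g/mol for water and 8-byte doubles for the 3+3 unknowns per molecule):

N_A = 6.0221415e23                       # Avogadro constant [1/mol]
molecules = 500.0 / 18.0 * N_A           # 0.5 l = 500 g of water
timesteps = 1.0 / 1e-15                  # 1 s simulated in 1 fs steps
ops = molecules * timesteps              # one flop per molecule and step
years = ops / 3e15 / (3600 * 24 * 365)   # SuperMUC: 3e15 flop/s
memory_pb = molecules * 6 * 8 / 1e15     # 6 doubles per molecule, in PB
print(f"{molecules:.3e} molecules, {years:.2e} years, {memory_pb:.2e} PB")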
Part II

Molecular Dynamics – the Physical


Model

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 9
Classical Molecular Dynamics

• Quantum mechanics: Schrödinger equation, probability distributions, etc.


→ too difficult for N-body settings, where N is large
→ use other approaches, such as Density Functional Theory (DFT)
• approximation via “classical” Molecular Dynamics
  → based on Newton’s equations of motion: F_i = m_i r̈_i
• molecules are modelled as particles (simplest case: point masses)
• there are interactions (forces) between molecules
• multibody potential functions describe the potential energy of the system;
the velocities of the molecules (kinetic energy) are a composition of
– Brownian motion (high velocities, no macroscopic movement)
– flow velocity (for fluids)
• total energy is constant ↔ energy conservation

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 10
Fundamental Interactions
• Classification of the fundamental interactions:
  – strong nuclear force
  – electromagnetic force
  – weak nuclear force
  – gravity
• interaction → potential energy
• the total potential of N particles is the sum of multibody potentials:

  U := Σ_{i=1}^N U1(r_i) + Σ_{i=1}^N Σ_{j=i+1}^N U2(r_i, r_j) + Σ_{i=1}^N Σ_{j=i+1}^N Σ_{k=j+1}^N U3(r_i, r_j, r_k) + ...

  there are (N choose n) = N! / ( n! (N−n)! ) ∈ O(N^n) n-body potentials U_n,
  particularly N one-body and (1/2) N(N−1) two-body potentials
• force F = −grad U
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 11
Forces vs. Potentials

[Figure: two molecules i and j at distance r_ij]

• some potentials from mechanics:
  – harmonic potential (Hooke’s law): U_harm(r_ij) = (1/2) k (r_ij − r_0)²
    potential energy of a spring with length r_0, stretched/compressed to a length r_ij
  – gravitational potential: U_grav(r_ij) = −g m_i m_j / r_ij
    potential energy caused by a mass attraction of two bodies (planets, e.g.)
• the resulting force is F_ij = −grad U(r_ij) = −∂U/∂r_ij
  (integration of the force over the displacement results in the energy or a potential difference)
• Newton’s 3rd law (“actio = reactio”): F_ij = −F_ji

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 12
Intermolecular Two-Body Potentials
• hard sphere potential: U_HS(r_ij) = ∞ for r_ij ≤ d,  0 for r_ij > d   (force: Dirac function)
• soft sphere potential: U_SS(r_ij) = ε (σ/r_ij)^n
• square-well potential: U_SW(r_ij) = ∞ for r_ij ≤ d_1,  −ε for d_1 < r_ij < d_2,  0 for r_ij ≥ d_2
• Sutherland potential: U_Su(r_ij) = ∞ for r_ij ≤ d,  −ε/r_ij^6 for r_ij > d
• Lennard-Jones potential
• van der Waals potential: U_W(r_ij) = −4 ε σ^6 (1/r_ij)^6
  ε = energy parameter
  σ = length parameter (rel. to atom diameter, cmp. van der Waals radius)
• Coulomb potential: U_col(r_ij) = (1/(4πε_0)) q_i q_j / r_ij

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 13
Two-Body Potentials: Hard vs. Soft Spheres
[Figure: potential U over distance r for hard sphere potentials (hard sphere, square-well, Sutherland) and soft sphere potentials (soft sphere, Lennard-Jones, van der Waals)]

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 14
Van der Waals Attraction
• intermolecular, electrostatic interactions
• electron motion in the atomic hull may result in a temporary asymmetric
charge distribution in the atom; i.e. more electrons (or negative charge,
resp.) on one side of the atom than on the opposite one
• charge displacement ⇒ temporary dipole
• a temporary dipole
– attracts another temporary dipole
– induces an opposite dipole moment in an adjacent non-dipole atom
and attracts it
• dipole moments are very small and the resulting electric attraction forces
are weak and act in a short range only
• atoms have to be very close to attract each other, for a long distance the
two dipole partial charges cancel each other
• high temperature (kinetic energy) breaks van der Waals bonds

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 15
Lennard-Jones Potential
• general Lennard-Jones potential:

  U_LJ(r_ij) = α ε [ (σ/r_ij)^n − (σ/r_ij)^m ]
  with n > m and α = 1/(n−m) · ( n^n / m^m )^{1/(n−m)}

• LJ-12-6 potential:

  U_LJ(r_ij) = 4 ε [ (σ/r_ij)^12 − (σ/r_ij)^6 ]

• m = 6: van der Waals attraction (matches van der Waals potential)
• n = 12: Pauli repulsion (soft sphere potential) → purely heuristic
• application: simulation of ideal gases (e.g. Argon)
• force between 2 molecules:

  F_ij = −∂U(r_ij)/∂r_ij = (24ε/r_ij) [ 2 (σ/r_ij)^12 − (σ/r_ij)^6 ]

• very fast decay ⇒ short range (m = 6 > 3 = d dimension)


Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 16
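For later use in the force calculation, the LJ-12-6 potential and force translate directly into code; a small sketch (plain NumPy; names and the sign convention noted in the comments are our own):

import numpy as np

def lj_potential(r, eps=1.0, sigma=1.0):
    # U(r) = 4 eps ((sigma/r)^12 - (sigma/r)^6)
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 ** 2 - sr6)

def lj_force(rij, eps=1.0, sigma=1.0):
    # force on molecule i, with rij = r_i - r_j (repulsive for small |rij|)
    r2 = rij @ rij
    sr6 = sigma ** 6 / r2 ** 3
    return 24.0 * eps * (2.0 * sr6 ** 2 - sr6) / r2 * rij

# sanity check: the force vanishes at the potential minimum r = 2^(1/6) sigma
print(lj_force(np.array([2.0 ** (1.0 / 6.0), 0.0])))    # -> approx. (0, 0)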
LJ Atom-Interaction Parameters
ε and σ need to be determined for any atom/molecule:

  atom   ε [1.38066 · 10^−23 J]^a   σ [10^−1 nm]^b
  H        8.6                      2.81
  He      10.2                      2.28
  C       51.2                      3.35
  N       37.3                      3.31
  O       61.6                      2.95
  F       52.8                      2.83
  Ne      47.0                      2.72
  S      183.0                      3.52
  Cl     173.5                      3.35
  Ar     119.8                      3.41
  Br     257.2                      3.54
  Kr     164.0                      3.83

ε = energy parameter, σ = length parameter (cmp. van der Waals radius)
→ parameter fitting to match experiments

^a Boltzmann constant: k_B := 1.38066 · 10^−23 J/K
^b 10^−1 nm = 1 Å (Ångström)
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 17
Dimensionless Formulation
“Dimensionless”:
use reference values such as σ, ε, ... to derive equations in which quantities no longer carry any dimensional units (m/s, kg, etc.)

Re-scaling to dimensionless quantities:

• position, distance:   r* := (1/σ) r            (3a)
• time:                 t* := (1/σ) √(ε/m) · t   (3b)
• velocity:             v* := (Δt/σ) v           (3c)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 18
Dimensionless Formulation (2)

• potential (atom-interaction parameters are eliminated!): U* := U/ε

  U*_LJ(r*_ij) := U_LJ(r_ij)/ε = 4 ( (r*_ij²)^{−6} − (r*_ij²)^{−3} )     (4a)

  U*_kin := U_kin/ε = (1/ε) · (m v²)/2 = v*² / (2 Δt*²)                  (4b)

• force:

  F*_ij := F_ij σ/ε = 24 ( 2 (r*_ij²)^{−6} − (r*_ij²)^{−3} ) r*_ij / r*_ij²   (4c)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 19
Advanced Modelling: Multi-Centered Molecules
• molecules composed of multiple LJ-centers (rigid bodies without internal degrees of freedom)
• additionally: orientation (quaternions), angular velocity
• additionally: moment of inertia (principal axes transformation)
• calculation of the interactions between each center of one molecule to each center of the other
• resulting force (sum) acts at the center of gravity, additional calculation of torque

[Figure: two two-center molecules A and B; the pairwise center-center forces F_A1B1, F_A1B2, F_A2B1, F_A2B2 sum up to the resulting forces F_AB and F_BA]

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 20
Molecular Dynamics – the Mathematical Model
System of ODEs

• resulting force acting on a molecule: F_i = Σ_{j≠i} F_ij
• acceleration of a molecule (Newton’s 2nd law):

  r̈_i = F_i / m_i = ( Σ_{j≠i} F_ij ) / m_i = − ( Σ_{j≠i} ∂U(r_i, r_j)/∂|r_ij| ) / m_i     (5)

• system of dN coupled ordinary differential equations of 2nd order
  (N: number of molecules, d: dimension)
• transferable to 2dN coupled ordinary differential equations of 1st order, e.g. by introducing velocity v := ṙ (“derivative” variable), or (even better) momentum p:

  p_i := m_i ṙ_i     (6a)
  ṗ_i = F_i          (6b)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 21
Initial Conditions
Initial Value Problem:
Molecule positions and velocities have to be given:
• place molecules as in a crystal lattice (body-/face-centered cell)
• choose initial velocity to match temperature:

  (d/2) N k_B T = Σ_{i=1}^N (1/2) m v_i²   with v_i := v_0

• set velocities according to normal or uniform distribution around

  v_0 := √( d k_B T / m )   resp.   v_0* := √(d T*) · Δt*

  and with random direction

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 22
NVT Ensemble, Thermostat
• statistical (thermodynamics) “ensemble”: system in equilibrium, described by a set of macroscopically observable variables
• for the simulation of a (canonical) NVT ensemble, the following values have to be kept constant:
  – N: number of molecules
  – V: volume
  – T: temperature
• thermostat regulates and controls the temperature (the kinetic energy), which is fluctuating in a simulation
• kinetic energy specified by the velocity of the molecules: E_kin = (1/2) Σ_i m_i v_i²
• temperature is defined by T = 2 E_kin / (d N k_B)   (k_B: Boltzmann constant)
• simple method: isokinetic (velocity) scaling, as sketched below:

  v_corr := β v_act   with β = √( T_ref / T_act )

• further methods exist, e.g. Berendsen or Nosé-Hoover thermostat

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 23
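A minimal sketch of the isokinetic velocity scaling (dimensionless units; array shapes and names are our own choices):

import numpy as np

def isokinetic_scaling(v, m, T_ref, d=3, kB=1.0):
    # rescale all velocities so that the kinetic temperature matches T_ref
    N = v.shape[0]
    E_kin = 0.5 * np.sum(m[:, None] * v ** 2)   # E_kin = 1/2 sum_i m_i v_i^2
    T_act = 2.0 * E_kin / (d * N * kB)          # current temperature
    return np.sqrt(T_ref / T_act) * v           # v_corr = beta * v_act

# usage: v is an (N, d) array of velocities, m an (N,) array of masses
v = np.random.randn(100, 3)
m = np.ones(100)
v = isokinetic_scaling(v, m, T_ref=1.2)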
Periodic Boundary Conditions

[Figure: simulation box of side length a, replicated periodically in all dimensions]

• replicate simulated box periodically in all dimensions
• models an infinite space, built from identical cells
  ⇒ domain with torus topology
Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Modelling | Summer 2022 24
Scientific Computing II
Molecular Dynamics – Forces

Michael Bader
Technical University of Munich
Summer 2022
Computational Effort for N-Body Problems

• to solve: system of ODEs

  d²r_i/dt² = F_i / m_i = (1/m_i) Σ_{j≠i} F_ij

• requires forces between all pairs of molecules:

  “for all i: compute F_i = Σ_{j≠i} F_ij”

• computational effort to compute all forces F_i thus O(N²)
• unfeasible for systems with 10^6 to 10^9 molecules
⇒ How can the computational effort be reduced?

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 2


Part I

Short-Range Potentials

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 3


Short-Range Potential – Lennard-Jones
• consider LJ-12-6 potential: fast decay of potential and force
• for each molecule, an influence volume (closed sphere) with cut-off radius r_c can be assumed, where every molecule outside this influence volume is neglected:

  U*_{LJ,rc}(r*_ij) = 4 ( (r*_ij²)^{−6} − (r*_ij²)^{−3} )   for r*_ij ≤ r_c,   0 for r*_ij > r_c     (1a)

  F*_{ij,rc}(r*_ij) = 24 ( 2 (r*_ij²)^{−6} − (r*_ij²)^{−3} ) r*_ij / r*_ij²   for r*_ij ≤ r_c,   0 for r*_ij > r_c     (1b)

• consider only a subgraph of the interaction-graph

Some conventions: r_ij := |r⃗_ij|, F_ij := |F⃗_ij|, and we write, e.g., U*_{LJ,rc}(r*_ij) instead of U*_{LJ,rc}(r⃗*_ij) if the potential only depends on the absolute distance r_ij.

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 4


Short-Range Interactions – Force Matrix
[Figure: dimensionless Lennard-Jones 12-6 potential U* and force F* over the distance r*]

Force matrix / interaction graph (example with 5 molecules):

  (   -     F12    F13    F14    F15 )
  ( −F12     -     F23    F24    F25 )
  ( −F13   −F23     -     F34    F35 )
  ( −F14   −F24   −F34     -     F45 )
  ( −F15   −F25   −F35   −F45     -  )
Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 5


Sparse Force Matrix with Cut-Off Potentials
[Figure: dimensionless truncated Lennard-Jones 12-6 potential U* and force F* (r_c = 2) over the distance r*]

Force matrix / interaction graph (entries beyond the cut-off radius vanish):

  (   -     F12    F13    F14     0  )
  ( −F12     -      0     F24    F25 )
  ( −F13     0      -     F34     0  )
  ( −F14   −F24   −F34     -     F45 )
  (   0    −F25     0    −F45     -  )
Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 6


Cut-Off Corrections

• due to the cut-off radius, the calculation of


– the potential energy
– the pressure
neglects some addends with small absolute values
⇒ (small) errors
• cut-off correction tries to correct this error
• constant density and a homogeneous distribution are a
prerequisite
• physical values in the calculated volume can be approximately
extrapolated

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 7


Shifted Potentials
[Figure: truncated Lennard-Jones 12-6 potential and force (r_c = 2); the truncated potential jumps to 0 at the cut-off radius]

  U*_{LJ,rc,shifted}(r*_ij) = U*_LJ(r*_ij) − U*_LJ(r*_c)   for r*_ij ≤ r*_c,   0 for r*_ij > r*_c

  F*_{ij,rc}(r*_ij) = F*_ij(r*_ij)   for r*_ij ≤ r*_c,   0 for r*_ij > r*_c
Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 8


Shifted Potentials (2)

[Figure: shifted truncated Lennard-Jones 12-6 potential and force (r_c = 2)]

• additionally, constant additive term for the potential
  ⇒ continuous potential
  ⇒ reduced error for the overall potential

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 9


Shifted Potentials (3)

[Figure: shifted truncated Lennard-Jones 12-6 potential and force (r_c = 2)]

  U*_{LJ,rc,shifted}(r*_ij) = U*_LJ(r*_ij) − U*_LJ(r*_c) − F*_LJ(r*_c) (r*_ij − r*_c)   for r*_ij ≤ r*_c,   0 for r*_ij > r*_c

  F*_{ij,rc,shifted}(r*_ij) = F*_ij(r*_ij) − F*_LJ(r*_c) r*_ij / r*_ij   for r*_ij ≤ r*_c,   0 for r*_ij > r*_c
Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 10


Shifted Potentials (4)

[Figure: shifted truncated Lennard-Jones 12-6 potential and force (r_c = 2)]

• additionally, constant additive term for the potential
  ⇒ continuous potential
• additionally, linear additive term for the potential
  ⇒ continuous force

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 11
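The truncated, shifted variants are conveniently wrapped into one function; a sketch in dimensionless units (with the sign convention F = −dU/dr, the linear correction enters the potential with +f(r_c), which makes the force continuous as well):

def lj(r):
    # dimensionless LJ-12-6 potential and (scalar, radial) force
    sr6 = 1.0 / r ** 6
    u = 4.0 * (sr6 ** 2 - sr6)
    f = 24.0 / r * (2.0 * sr6 ** 2 - sr6)
    return u, f

def lj_shifted_force(r, rc=2.0):
    # truncated LJ with shifted potential and shifted force:
    # both U and F are continuous at the cut-off radius rc
    if r > rc:
        return 0.0, 0.0
    u, f = lj(r)
    uc, fc = lj(rc)
    # note: dU/dr at rc equals -fc, hence the +fc*(r - rc) term below
    return u - uc + fc * (r - rc), f - fc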


Cut-Off Potentials – Summary

• fast decay of force contributions with increasing distance
  → dense force matrix with O(N²) entries, but mostly very small ones
• with cut-off: force matrix is sparse (and anti-symmetric)
• only a small (constant) number of molecules fits into the cut-off radius
• cut-off radius thus leads to a reduction of the computational effort
• complexity of the entire force calculation can thus be reduced from O(N²) to O(N)
• todo: efficient implementation to identify the close neighbours

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 12


Part II

Linked Cell Algorithm

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 13


MD – Implementational Aspects
Verlet Neighbour Lists

• every molecule stores its neighbours


for a distance rmax > rc
rmax • every nupd time steps (dep. on rmax ),
rc
the lists are updated
→ needs to be O(N)
• the ”buffer” has to be larger than the
covered distance of all involved
molecules for that time:

rmax − rc > vm · nupd ∆t

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 14


Linked-Cell Algorithm

molecules arranged in a lattice of cubic cells of side length ∼ rc

• similar to a hash table with


”geometrically motivated”
hash function
• ”Binning” resp.
”Bucketing”-techniques from
”Computational Geometry”
• direct volume representation (voxel)
of the influence region

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 15


Classical Linked-Cell Algorithm – Data Structures

[Figure: molecules 1–9 distributed over cells 1–3]

Memory Layout: Alternative 1
  molecules stored cell-by-cell in one contiguous array: 1 3 5 8 | 2 4 6 7 9

Memory Layout: Linked List
  molecules 1 ... 9 stored in a linked list per cell
Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 16
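A minimal 2D sketch of the linked-cell idea: binning molecules into cells of side length r_c and restricting the distance checks to the 3x3 block of adjacent cells (data layout and names are our own; periodic boundaries are omitted for brevity):

import numpy as np
from collections import defaultdict

def build_cells(pos, rc):
    # sort molecule indices into square cells of side length rc
    cells = defaultdict(list)
    for i, p in enumerate(pos):
        cells[tuple((p // rc).astype(int))].append(i)
    return cells

def neighbour_pairs(pos, rc):
    # yield every pair (i, j), i < j, with distance <= rc
    cells = build_cells(pos, rc)
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and np.linalg.norm(pos[i] - pos[j]) <= rc:
                            yield i, j

For bounded density, each cell holds O(1) molecules, so enumerating all interacting pairs costs O(N) instead of O(N²).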


Linked-Cell: Computational Effort

molecules arranged in a lattice of cubic cells of side length ∼ rc

• runtime: O(n)
• only half (point symmetry) of the
neighbour cells are explicitly traversed
(Newton’s 3rd law)
orange vs. yellow cells in the picture
• erase and generate the data structure
in each time step

Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 17


Variable Linked-Cell Algorithm
• lattice might be built up from cells of side length r_c/t with t ∈ R+
• integer numbers are preferable for the divisor t ∈ N
• for t → ∞, the examined influence volume will converge to the (optimal) sphere

[Figure: ratio of search volume to hemisphere volume plotted over r_c/cellwidth; search regions illustrated for t = 1, 2, 3, 4]
Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 18
Outlook: Parallelisation and Actio=Reactio
“Actio = Reactio”:
• symmetrically acting force between two
molecules
• straightforward optimisation: compute
force once and apply to both molecules
• can lead to race conditions for
parallelisation in shared memory
Mitigation: Colouring Schemes
• graph colouring of linked cells:
adjacent cells have different colours
• only parallelise within cells of the
same colour
• sequential processing of colours.
Michael Bader | Scientific Computing II | Molecular Dynamics – Forces | Summer 2022 19
Scientific Computing II
Molecular Dynamics – Numerics

Michael Bader – SCCS


Technical University of Munich
Summer 2022
Recall: Molecular Dynamics – System of ODEs

• resulting force acting on a molecule: F_i = Σ_{j≠i} F_ij
• acceleration of a molecule (Newton’s 2nd law):

  r̈_i = F_i / m_i = ( Σ_{j≠i} F_ij ) / m_i = − ( Σ_{j≠i} ∂U(r_i, r_j)/∂|r_ij| ) / m_i

• or, with acceleration a_i := (1/m_i) F_i:  r̈_i = a_i,
  where F_i and a_i depend on the positions of all molecules
• transfer to 2dN coupled ordinary differential equations of 1st order:

  m_i ṙ_i = p_i   or   ṙ_i = v_i
  ṗ_i = F_i             v̇_i = F_i / m_i = a_i

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 2
Euler Time Stepping for MD

Explicit Euler Method:
• Taylor series expansion of the positions in time:

  r(t + Δt) = r(t) + Δt ṙ(t) + (1/2) Δt² r̈(t) + ... + (Δt^i / i!) r^(i)(t) + ...     (1)

  (ṙ, r̈, r^(i): derivatives)
• neglecting terms of higher order in Δt, and analogous formulation for v(t) := ṙ(t) with a(t) := v̇(t) = r̈(t) = F(t)/m, leads to the explicit Euler method:

  v(t + Δt) ≈ v(t) + Δt a(t)
  r(t + Δt) ≈ r(t) + Δt v(t)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 3
Euler Time Stepping for MD (cont.)

• explicit Euler method:

  v(t + Δt) ≈ v(t) + Δt a(t)     (2a)
  r(t + Δt) ≈ r(t) + Δt v(t)     (2b)

• similar for implicit Euler method → derivatives at the time step end:

  v(t + Δt) ≈ v(t) + Δt a(t + Δt)     (3a)
  r(t + Δt) ≈ r(t) + Δt v(t + Δt)     (3b)

• disadvantage for both schemes: they do not conserve critical properties and lead to wrong long-term solutions (compare tutorial on circular motion); plus: low accuracy

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 4
Classical Störmer Verlet Method
• the Taylor series expansion in (1) can also be performed for −Δt:

  r(t − Δt) = r(t) − Δt ṙ(t) + (1/2) Δt² r̈(t) − ... + ((−Δt)^i / i!) r^(i)(t) + ...     (4)

• from (1) and (4) the classical Verlet algorithm can be derived:

  r(t + Δt) = 2 r(t) − r(t − Δt) + Δt² r̈(t) + O(Δt⁴)     (5)
            ≈ 2 r(t) − r(t − Δt) + Δt² a(t)

  note: direct calculation of r(t + Δt) from r(t) and F(t)
• velocity can be estimated via

  v(t) = ṙ(t) ≈ ( r(t + Δt) − r(t − Δt) ) / (2 Δt)     (6)

• disadvantage: needs to store two previous time steps

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 5
Crank Nicolson Method

• explicit approximation (7a) for half step [t, t + Δt/2] inserted into implicit approximation (7b) for half step [t + Δt/2, t + Δt]:

  v(t + Δt/2) = v(t) + (Δt/2) a(t)               (7a)
  v(t + Δt) = v(t + Δt/2) + (Δt/2) a(t + Δt)     (7b)

• leads to Crank-Nicolson scheme for v:

  v(t + Δt) = v(t) + (Δt/2) ( a(t) + a(t + Δt) )     (8)

• key disadvantage: implicit scheme, as a(t + Δt) depends on r(t + Δt); needs to solve non-linear system of equations

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 6
Velocity Störmer Verlet Method
The Velocity Störmer Verlet method is a composition of a
• Taylor series expansion of 2nd order for the positions, as in Eq. (1)
• and a Crank Nicolson method for the velocities, as in Eq. (8):

  r(t + Δt) = r(t) + Δt v(t) + (Δt²/2) a(t)          (9a)
  v(t + Δt) = v(t) + (Δt/2) ( a(t) + a(t + Δt) )     (9b)

[Schematic: forward Euler for r, force calculation, Crank-Nicolson for v, on the time grid t−Δt, t, t+Δt]

Memory requirement: (3 + 1) · 3N (3+1 vector fields);
the update of v(t + Δt) requires v(t), r(t + Δt) and F(t + Δt), but also F(t)

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 7
Velocity Störmer Verlet – Implementation
• reformulate equation for positions r:

  r(t + Δt) = r(t) + Δt v(t) + (Δt²/2) a(t)
            = r(t) + Δt ( v(t) + (Δt/2) a(t) )

  contains “half an Euler time step” (i.e., an Euler time step of size Δt/2) for v
• similar for the velocities v:

  v(t + Δt) = v(t) + (Δt/2) ( a(t) + a(t + Δt) )
            = ( v(t) + (Δt/2) a(t) ) + (Δt/2) a(t + Δt)

  reuses the result of the half Euler time step for v

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 8
Velocity Störmer Verlet – Implementation (2)

1. compute half an Euler time step for $\vec v$:
   $\vec v(t+\frac{\Delta t}{2}) = \vec v(t) + \frac{\Delta t}{2}\, \vec a(t)$

2. update the positions $\vec r$:
   $\vec r(t+\Delta t) = \vec r(t) + \Delta t\, \vec v(t+\frac{\Delta t}{2})$

3. calculate the new forces, i.e. accelerations $\vec a(t+\Delta t)$, from the positions $\vec r(t+\Delta t)$

4. update the velocities $\vec v$:
   $\vec v(t+\Delta t) = \vec v(t+\frac{\Delta t}{2}) + \frac{\Delta t}{2}\, \vec a(t+\Delta t)$

Note: memory requirements: 3 · 3N (3 vector fields);
the vectors $\vec v$ and $\vec r$, as well as the forces/accelerations $\vec a$, may be updated in place in each time step
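A sketch of steps 1–4 above (same assumptions as the earlier sketches); the in-place updates are what makes three vector fields sufficient:

```python
import numpy as np

def velocity_verlet_step(r, v, a, m, forces, dt):
    """One Velocity Stoermer-Verlet step (steps 1-4 above); the arrays
    r, v, a are overwritten in place, so only 3 vector fields are stored."""
    v += 0.5 * dt * a                # 1. half an Euler step: v(t + dt/2)
    r += dt * v                      # 2. r(t + dt) from v(t + dt/2)
    a[:] = forces(r) / m[:, None]    # 3. new accelerations a(t + dt)
    v += 0.5 * dt * a                # 4. second half step: v(t + dt)
    return r, v, a
```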

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 9
Leapfrog Method

• combine steps 4 (from the previous time step) and 1 into a single step


• velocity calculation thus shifted by half a time step:

  $\vec v(t+\tfrac{\Delta t}{2}) = \vec v(t-\tfrac{\Delta t}{2}) + \Delta t\, \vec a(t)$   (10a)
  $\vec r(t+\Delta t) = \vec r(t) + \Delta t\, \vec v(t+\tfrac{\Delta t}{2})$   (10b)

[Diagram: leapfrog staggering; r and F live at full time steps t−Δt, t, t+Δt, v at half time steps t−Δt/2, t+Δt/2]

• in exact arithmetic, the Störmer Verlet, Velocity Störmer Verlet and Leapfrog
  schemes are equivalent
• the latter two are more robust w.r.t. roundoff errors
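For comparison, a sketch of one leapfrog step (same assumptions); the velocities exist only at half time steps:

```python
import numpy as np

def leapfrog_step(r, v_half, m, forces, dt):
    """One leapfrog step, Eqs. (10a)/(10b): velocities live at t - dt/2
    on entry and at t + dt/2 on exit."""
    a = forces(r) / m[:, None]   # a(t) from the current positions
    v_half += dt * a             # (10a): v(t + dt/2)
    r += dt * v_half             # (10b): r(t + dt)
    return r, v_half
```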

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 10
Dimensionless Velocity Störmer Verlet

• remember the dimensionless formulation:
  $\vec r := \sigma \vec r^*$,  $\vec v := \frac{\sigma}{\Delta t} \vec v^*$,  $\Delta t^2 := \sigma^2 m\, {\Delta t^*}^2$,  $\ddot{\vec r} = \frac{1}{m} \vec F := \frac{1}{m\sigma} \vec F^*$
• insert into the Velocity Störmer Verlet method to get:

  $\vec r^*(t+\Delta t) = \vec r^*(t) + \vec v^*(t) + \frac{{\Delta t^*}^2}{2} \vec F^*(t)$   (11a)
  $\vec v^*(t+\Delta t) = \vec v^*(t) + \frac{{\Delta t^*}^2}{2} \vec F^*(t) + \frac{{\Delta t^*}^2}{2} \vec F^*(t+\Delta t)$   (11b)
[Diagram: stages of one dimensionless time step: forward Euler update of r, half forward Euler step for v, force calculation, half backward Euler step for v; rows r, v, F over times t−Δt, t, t+Δt]

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 11
Dimensionless Velocity Störmer Verlet (2)

Procedure:
1. calculate the new positions (11a);
   partial velocity update: add $\frac{{\Delta t^*}^2}{2} \vec F^*(t)$ in (11b)
2. calculate the new forces/accelerations (computationally intensive!)
3. calculate the new velocities: add $\frac{{\Delta t^*}^2}{2} \vec F^*(t+\Delta t)$ in (11b)
→ memory requirements: 3 · 3N

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 12
Outlook: Leapfrog Method with Thermostat
• Leapfrog method:

  $\vec v(t+\tfrac{\Delta t}{2}) = \vec v(t-\tfrac{\Delta t}{2}) + \Delta t\, \vec a(t)$
  $\vec r(t+\Delta t) = \vec r(t) + \Delta t\, \vec v(t+\tfrac{\Delta t}{2})$

[Diagram: leapfrog time stepping with an additional thermostat stage; rows r, v, F; the thermostat synchronizes v at the full time steps]

• an intermediate step may be introduced for the thermostat,
  $\vec v(t) := \tfrac{1}{2} \left( \vec v(t+\tfrac{\Delta t}{2}) + \vec v(t-\tfrac{\Delta t}{2}) \right)$, to synchronize the velocity:

  $\vec v_{\text{act}}(t) = \vec v(t-\tfrac{\Delta t}{2}) + \tfrac{\Delta t}{2}\, \vec a(t)$   (13a)
  $\vec v(t+\tfrac{\Delta t}{2}) = (2\beta - 1)\, \vec v_{\text{act}}(t) + \tfrac{\Delta t}{2}\, \vec a(t)$   (13b)
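A hedged sketch of such a step (same assumptions as before); the choice of the scaling factor β, e.g. β = √(T_target/T_actual) for simple velocity rescaling, is an assumption here, not something fixed by the slide:

```python
import numpy as np

def leapfrog_thermostat_step(r, v_half, m, forces, dt, beta):
    """One leapfrog step with intermediate synchronized velocity,
    Eqs. (13a)/(13b); beta = 1 recovers the plain leapfrog step."""
    a = forces(r) / m[:, None]
    v_act = v_half + 0.5 * dt * a                        # (13a): v at full step t
    v_half = (2.0 * beta - 1.0) * v_act + 0.5 * dt * a   # (13b): scaled v(t + dt/2)
    r += dt * v_half
    return r, v_half
```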

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 13
Evaluation of Time Integration Methods

Evaluation criteria:
• accuracy (exact particle trajectories are often not of prime importance)
• stability
• conservation
→ of phase space density (symplectic)
→ of energy
→ of momentum
(especially with PBC → Periodic Boundary Conditions)
• reversibility of time
• use of resources:
– computational effort (number of force evaluations)
– maximum time step size
– memory usage

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 14
Reversibility of Time
• time reversal for a closed system means
  • a reversal of the velocities and momenta;
    the positions at the inversion point stay constant
  • the trajectory is traversed backwards, towards its origin
• this demands time symmetry of the integration method
  + satisfied, e.g., by the Verlet method
  − not satisfied by, e.g., the Euler method or Predictor-Corrector methods
    (also not by standard Runge-Kutta methods)
• contradiction with
  • the H-theorem (increase of entropy, irreversible processes)?
    (Loschmidt's paradox)
  • the second law of thermodynamics?
• in practice, reversibility holds only for a very short time
  • Lyapunov instability ⇒ Kolmogorov entropy

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 15
Lyapunov Instability

• basic question: how does a model behave with slightly disturbed initial
  conditions?
• example of a simple system:
  • stable case:
    a bouncing ball on a plane with slightly disturbed initial horizontal
    velocity ⇒ linear growth of the disturbance
  • unstable case:
    a bouncing ball on a sphere with slightly disturbed initial horizontal
    velocity ⇒ exponential growth of the disturbance (Lyapunov exponent)
• in the unstable case, small disturbances result in large changes:
  chaotic behaviour (a butterfly causing a hurricane?)
• non-linear differential equations are often dynamically unstable

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 16
Lyapunov Instability: A Numerical Experiment

• setup of 4000 fcc atoms


• for a second setup, the position of a single
atom was displaced by 0.001
• this atom is traced in both setups
[Plot: trace of molecule 25 in run 1 and run 2 (one atom initially displaced); colours indicate velocity]

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 17
Lyapunov Instability: A Numerical Experiment
• calculation of trajectories is a badly conditioned problem:
  a small change of the initial position of a molecule may, after some time,
  result in a deviation from the original trajectory on the scale of the whole domain!
• thus: do not aim at simulating individual trajectories
  → a numerical simulation of the overall behaviour of the system is wanted!
[Plots: trace of molecule 25 in both runs, and the deviation between the two runs plotted over time]

Michael Bader – SCCS | Scientific Computing II | Molecular Dynamics – Numerics | Summer 2022 18
Scientific Computing II
Molecular Dynamics – Barnes-Hut and Fast Multipole

Michael Bader
Technical University of Munich
Summer 2022
Computational Effort for N-Body Problems
• to solve: system of ODEs

  $\dfrac{d^2}{dt^2} \vec r_i = \dfrac{\vec F_i}{m_i} = \dfrac{1}{m_i} \sum_{j \neq i} \vec F_{ij}$

• requires forces between all pairs of molecules:

  “for all $i$: compute $\vec F_i = \sum_{j \neq i} \vec F_{ij}$”

• computational effort to compute all forces $\vec F_i$ is thus $O(N^2)$
• infeasible for systems with $10^6$ to $10^9$ molecules
⇒ How can the computational effort be reduced?
. . . For long-range potentials?
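The quadratic effort is evident from a naive implementation; a sketch for the gravitational case (Python/NumPy assumed):

```python
import numpy as np

def all_pairs_forces(r, m, gamma=1.0):
    """Naive all-to-all force computation: two nested loops over all
    particles, i.e. O(N^2) force evaluations."""
    N = r.shape[0]
    F = np.zeros_like(r)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = r[j] - r[i]                             # vector from i to j
            dist = np.linalg.norm(d)
            F[i] += gamma * m[i] * m[j] * d / dist**3   # attractive gravity
    return F
```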
Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 2
Part I

The Barnes-Hut Algorithm

J. Barnes, P. Hut: A hierarchical O(N log N) force-calculation algorithm.
Nature 324, 1986, p. 446 ff.

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 3
Barnes Hut Method – Key Ideas

Consider Astrophysics:
• force w.r.t. a far-away individual star might be neglected
• but not the force w.r.t. a far-away(?) galaxy
• thus: approximate forces on an individual star by grouping far-away
  stars, galaxies, etc. into clusters
• represent each cluster by its accumulated mass located at its
  centre of mass

Clustering via Domain Decomposition:


• clustering of particles required, where size of clusters depends on
the distance to each individual particle
• solved by multi-level tree-based domain decomposition

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 4
Domain Decomposition
• decompose the long-range region into subdomains: $\Omega^{\mathrm{far}} = \bigcup_i \Omega^{\mathrm{far}}_i$
• to be done for every particle position
  (in practice via hierarchical domain decomposition)
• assign a point $y^0_i$ to each $\Omega^{\mathrm{far}}_i$
• decomposition depends on the size of the subdomains:

  $\mathrm{diam} := \sup_{y \in \Omega^{\mathrm{far}}_i} \|y - y^0_i\|$

• choose the decomposition such that

  $\dfrac{\mathrm{diam}}{\|x - y^0_i\|} \le \theta$

  for a suitable constant $0 < \theta < 1$


Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 5
Octrees for Domain Decomposition
• efficient realisation of the required decompositions
• recursive decomposition of $\Omega$ into subdomains
• stop if only one particle is left per cell
• use the respective subtree for each $x_i$

Octrees:

[Figure: recursive quadtree/octree decomposition of the domain and the cells used for a particle at position $x_i$]
Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 6
Barnes-Hut Algorithm
• developed 1986 for applications in Astrophysics
• for the gravity potential/force:

  $U(r_{ij}) = -\gamma_{\mathrm{grav}} \dfrac{m_i m_j}{r_{ij}}$ ,  $\vec F_{ij} = \vec F(\vec r_i, \vec r_j) = -\gamma_{\mathrm{grav}} \dfrac{m_i m_j\, (\vec r_i - \vec r_j)}{\|\vec r_i - \vec r_j\|^3}$

• uses octree with 0 or 1 particles per cell


• inner nodes corresp. to clusters of particles
(pseudo particle)
• idea: gravity force of particle cluster approximated
(sum of masses, localised in centre of mass)
• computation of forces: for each particle, do an incomplete(!) octree
traversal

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 7
Barnes-Hut: Computation of Forces

For each particle (position x ∈ Ω):


• start in root node
• descend into the subdomains until the θ-rule is satisfied: $\frac{\mathrm{diam}}{r} \le \theta$,
  with $r$ the distance of the pseudo particle from $x$
• accumulate corresp. partial force to current particle

Implicit separation of short- and long-range forces:


• short-range: all leaf nodes that are reached
(containing 1 particle)
• long-range: all inner nodes, where descent is stopped (force
caused by pseudo particle)
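A recursive sketch of this traversal (`Node` is a hypothetical octree node with fields `mass`, `com` for the centre of mass, `diam`, a list `children`, and a method `is_leaf()`; none of these names are prescribed by the slides):

```python
import numpy as np

def barnes_hut_force(node, x, m_x, theta, gamma=1.0):
    """Force on a particle at x (mass m_x): incomplete octree traversal
    that stops as soon as the theta-rule diam/r <= theta is satisfied."""
    d = node.com - x
    r = np.linalg.norm(d)
    if node.is_leaf():                     # short-range: single particle
        if node.mass == 0.0 or r == 0.0:   # empty cell, or x itself
            return np.zeros_like(x)
        return gamma * m_x * node.mass * d / r**3
    if node.diam / r <= theta:             # long-range: pseudo particle
        return gamma * m_x * node.mass * d / r**3
    return sum(barnes_hut_force(c, x, m_x, theta, gamma)
               for c in node.children)     # otherwise: descend
```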

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 8
Barnes-Hut: Computation of Forces (2)

Tree traversal:

[Figure: incomplete octree traversal for a particle at $x_i$; leaves near $x_i$ are visited individually, far-away subtrees are cut off at their pseudo particles]
Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 9
Barnes-Hut: Accuracy and Complexity
Accuracy of Barnes-Hut:
• depends on choice of θ
• the smaller θ, the more accurate the long-range forces
• the smaller θ, the larger the short-range (i.e., the costs)
• slow convergence w.r.t. θ (low-order method)

Complexity:
• grows for small θ
• for θ → 0: algorithm degenerates to “all-to-all” → O(N 2 )
• for more or less homogeneously distributed particles:
  – number of active cells per particle: $O(\theta^{-3} \log N)$
  – total effort therefore $O(\theta^{-3} N \log N)$

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 10
Barnes-Hut: Implementation

• computation of pseudo particles:


– bottom-up-traversal (post-order)
– sum up masses, weighted average for centre-of-mass
• computation of forces:
– traversal of entire tree (outer loop on all particles)
– top-down traversal (pre-order) until θ-rule satisfied (inner loop)
• further traversals for time integration
• re-build (or update) octree structure after each time step
→ requires efficient data structures and algorithms
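A sketch of the bottom-up pass that fills the pseudo particles, using the same hypothetical node structure as in the traversal sketch above:

```python
import numpy as np

def compute_pseudo_particles(node):
    """Post-order traversal: each inner node accumulates the total mass
    of its children and the mass-weighted centre of mass (3D assumed)."""
    if node.is_leaf():
        return          # mass/com already set from the contained particle
    node.mass = 0.0
    node.com = np.zeros(3)
    for c in node.children:
        compute_pseudo_particles(c)   # children first (post-order)
        node.mass += c.mass
        node.com += c.mass * c.com
    if node.mass > 0.0:
        node.com /= node.mass         # weighted average of child positions
```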

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 11
Part II

The Fast Multipole Method

L. Greengard, V. Rokhlin: A fast algorithm for particle simulations.
J. Comp. Phys. 73, 1987. https://doi.org/10.1016/0021-9991(87)90140-9

C. R. Anderson: An implementation of the Fast Multipole Method without
multipoles. SIAM J. Sci. Stat. Comput. 13(4), 1992. https://doi.org/10.1137/0913055

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 12
Barnes-Hut vs. Fast Multipole
Barnes-Hut: compute forces of pseudo particles on individual particles:

[Figure: arrows from cluster centres $y^0_i$ to a single particle at $x$]

Fast Multipole: compute forces between pseudo particles:

[Figure: arrow between cluster centres $x^0_l$ and $y^0_i$]

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 13
Revisited: Force, Potential Energy and Potential
For Coulomb and gravity interaction:
• force on a particle with mass $m_i$ located at position $\vec x_i$, caused by many particles
  with masses $m_j$ at positions $\vec x_j$:

  $\vec F^{(\mathrm{grav})}_i = -\gamma \sum_j \dfrac{m_i m_j}{r_{ij}^3}\, \vec r_{ij}$ ,  $\vec F^{(\mathrm{Coul})}_i = \dfrac{1}{4\pi\epsilon_0} \sum_j \dfrac{q_i q_j}{r_{ij}^3}\, \vec r_{ij}$

• corresponding potential energy of the particle with mass $m_i$ at position $\vec x_i$:

  $U_{\mathrm{grav}}(\vec x_i) = -\gamma \sum_j \dfrac{m_i m_j}{r_{ij}}$ ,  $U_{\mathrm{Coul}}(\vec x_i) = \dfrac{1}{4\pi\epsilon_0} \sum_j \dfrac{q_i q_j}{r_{ij}}$

  (the negative gradient of $U(\vec x_i)$ leads to the force)
• potential at $\vec x_i$ caused by all particles with masses $m_j$ at positions $\vec x_j$:

  $\Psi_{\mathrm{grav}}(\vec x_i) = -\gamma \sum_j \dfrac{m_j}{r_{ij}}$ ,  $\Psi_{\mathrm{Coul}}(\vec x_i) = \dfrac{1}{4\pi\epsilon_0} \sum_j \dfrac{q_j}{r_{ij}}$

  (the potential is the potential energy of a unit mass/charge at location $\vec x_i$)

⇒ accumulate potentials $\Psi_{\mathrm{grav/Coul}}$, then compute potential energy and force
Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 14
“Box–Box Interactions” in a “Linked Cell” Fashion
Let’s draft a possible algorithm in a strongly simplified setting:
• consider a “Linked Cell”-type grid (possibly even in 1D)
→ each grid cell contains a list of particles
• define a pseudo particle (as in Barnes-Hut) for each linked cell
→ accumulated mass located in centre of mass
• force computation:
1. between particles in the same or in adjacent cells:
add particle–particle force
2. between particles in separated cells:
add forces between corresponding pseudo-particles;
accumulate that force to all particles of a pseudo particle

What’s missing?

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 15
“Box–Box Interactions” in a “Linked Cell” Fashion
Findings:
• consider a “Linked Cell”-type grid (in 1D)
  → each grid cell $C_k$ contains a list of particles: $x_i^{(k)}, m_i^{(k)}$
• define a pseudo particle (as in Barnes-Hut) for each linked cell
  → accumulated mass $M_l := \sum_j m_j^{(l)}$
  → located in the centre of mass $X_l := \frac{1}{M_l} \sum_j m_j^{(l)} x_j^{(l)}$
• add up particle–particle forces between particles in adjacent cells $C_k$ and $C_l$:
  for all particles $x_i^{(k)}, m_i^{(k)}$ in cell $C_k$:
  $F_i^{(k)} = F_i^{(k)} \pm \sum_{j \in C_l} \gamma\, m_i^{(k)} m_j^{(l)} \big/ \big(x_i^{(k)} - x_j^{(l)}\big)^2$
• for all separated cells $C_l$ → add the potential caused by the pseudo particles of $C_l$
  (sum over all cells $C_l$ with pseudo particle of mass $M_l$ at position $X_l$):
  $\Psi_k = -\gamma \sum_{l \neq k,\, l \neq k\pm1} M_l \big/ \big|X_k - X_l\big|$
  (or, accumulate forces, considering the correct sign:)
  $F_i^{(k)} = F_i^{(k)} \pm \sum_{l \neq k,\, l \neq k\pm1} \gamma\, m_i^{(k)} M_l \big/ \big(X_k - X_l\big)^2$
⇒ idea: accumulate the potentials $\Psi_k$, then multiply with the factors $m_i^{(k)}$
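A sketch of this simplified 1D scheme (Python/NumPy assumed, non-empty cells assumed); per particle it accumulates the direct terms from the same and adjacent cells and the pseudo-particle potential from all separated cells:

```python
import numpy as np

def cell_potentials_1d(cells, gamma=1.0):
    """cells: list of (x, m) arrays, one pair per grid cell C_k.
    Returns a list with one potential value per particle."""
    M = [m.sum() for x, m in cells]                   # pseudo-particle masses M_l
    X = [(m * x).sum() / m.sum() for x, m in cells]   # centres of mass X_l
    result = []
    for k, (xk, mk) in enumerate(cells):
        psi = np.zeros_like(xk)
        for l, (xl, ml) in enumerate(cells):
            if abs(k - l) <= 1:                  # same/adjacent: particle terms
                for xj, mj in zip(xl, ml):
                    d = np.abs(xk - xj)
                    with np.errstate(divide="ignore"):
                        contrib = -gamma * mj / d
                    psi += np.where(d > 0.0, contrib, 0.0)   # skip j == i
            else:                                # separated: pseudo particle
                psi += -gamma * M[l] / abs(X[k] - X[l])
        result.append(psi)   # potential energy of particle i: m_i * psi[i]
    return result
```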

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 16
“Box–Box Interactions” in a “Multigrid” Fashion
Less simplified setting:
• add a hierarchy of grids as in multigrid methods
→ finest grid contains a list of particles for each cell
→ all grids contain a pseudo particle (as in Barnes-Hut)
• force computation on the finest level:
identical to “Linked Cell” Fashion on previous slide
• force computation between pseudo particles:
1. between pseudo particles in “nearby” cells:
add pseudo-particle–pseudo-particle force
2. between pseudo particles in “far away” cells:
add force between corresp. pseudo-particles on next-coarser level
Do we catch all interactions? How to define “nearby”/“far away”?
What’s (still) missing?
Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 17
“Box–Box Interactions” in a “Multigrid” Fashion
Findings:
• box–box interactions occur at multiple levels → as particles are part of all
parent/grand-parent pseudo particles, the interaction between two particles might
be captured by box–box interactions on multiple levels
⇒ make sure that each particle–particle interaction is considered exactly once!
• different concepts for “far away” boxes:
Barnes-Hut-type: θ-criterion
Fast-Multipole-type: not in an adjacent cell (“well separated”)
• force computation between pseudo particles occurs, if:
1. pseudo particles are not in cells that are direct neighbours (requires
particle–particle interaction, no approximation via pseudo particles allowed)
2. interaction between the boxes that contain those pseudo particles is not
considered on coarser levels
3. hence, considers comparably few interactions on each level that are
“nearby” but neither direct neighbours nor too far away

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 18
“Box–Box Interactions” in Hierarchical Methods

Fast Multipole introduces “box–box interactions”:


• compute the approximate potential (and resulting forces) that
(all particles of) a box cause(s) in a remote box (on the same level)
• pass on the accumulated (approximate) potentials resulting from
the box–box approximations to the child boxes

“Well-separated boxes” instead of θ-criterion


• two boxes are called “well separated”, if there is at least one entire
box (of the same level) between them
→ fixed geometric criterion
→ no influence on pseudo-particle positions
→ no θ available to control accuracy

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 19
Comparison of Barnes-Hut and Fast Multipole

What components are required:


• approximate potential of sets/clusters of particles
→ Barnes-Hut: pseudo particles in tree cells
→ Fast Multipole: higher-order representations necessary!
(simple pseudo particles together with well-separated criterion
is too inaccurate)
• hierarchical computation of “box potentials”
→ Barnes-Hut: combine pseudo particles in child cells
→ Fast Multipole: generate high order “box potentials”
(accumulate approximate potentials of child boxes to
approximate potential of parent box → needs to be derived for
high-order representation)

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 20
Approximate “Box potentials”
Approximate the potential of a set/cluster of particles by:
• multipole expansion (Greengard & Rokhlin, 1987)
→ similar concept as Taylor series
→ complicated to derive, esp. in 3D (spherical harmonics)
→ complicated formula for hierarchical assembly
• inner/outer ring approximations (Anderson, 1992)
→ derived via numerical integration of an integral formula
→ uniform interaction with child and remote boxes
→ hierarchical assembly via evaluation of potentials
at integration points
• both approaches apply principle of “well-separated boxes”
→ box–box interaction allowed between boxes that are
separated by one box of the same hierarchical level

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 21
Outer Ring Approximations
• fundamental idea: represent the potential via a surface integral

  $\Psi_a(\vec x) = \int_{S^2} g(a\vec s) \left( \sum_{n=0}^{\infty} \left( \dfrac{a}{r} \right)^{n+1} Q_n(\vec s \cdot \vec x_p) \right) ds$

  in spherical coordinates $\vec x = (r, \theta, \psi)$, with “suitable” functions $Q_n$,
  $\vec x_p = (1, \theta, \psi)$, and:
  • $g(a\vec s)$: the potential on a sphere of radius $a$ that contains all
    particles; $g(a\vec s)$ is induced by the particles in the box!
• use a numerical integration rule:

  $\Psi_a(\vec x) \approx \sum_{i=1}^{K} w_i\, g(a\vec s_i) \left( \sum_{n=0}^{M} \left( \dfrac{a}{r} \right)^{n+1} Q_n(\vec s_i \cdot \vec x_p) \right)$

  using $K$ integration points $\vec s_i$ on the sphere $S^2$ (weights $w_i$)
  and choosing $M$ relative to the accuracy of the integration rule
Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 22
Outer Ring Approximations (2)
  $\Psi_a(\vec x) \approx \sum_{i=1}^{K} w_i\, g(a\vec s_i) \left( \sum_{n=0}^{M} \left( \dfrac{a}{r} \right)^{n+1} Q_n(\vec s_i \cdot \vec x_p) \right)$

• $w_i$ and $\vec s_i$ are determined by the integration rule
• $r$ and $\vec x_p$ are determined by the evaluation position $\vec x$
• $Q_n$ depends on the type of potential (gravity/Coulomb, etc.)
• choice of $a$: usually twice the size of the box
• only the values $g(a\vec s_i)$ need to be computed and stored for each box:
  → leaf box: accumulate the potentials of all box-local particles at $a\vec s_i$
  → parent box: accumulate the (outer-ring-)approximate potentials of
    all child boxes at the $a\vec s_i$ of the “parent” ring

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 23
Outer Ring Approximations – Illustrations

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 24
Outer Ring Approximations – Procedures

Ring Approximations from Particles (“particle to multipole”, P2M)


• approximate potential caused by all particles of the current box
• evaluate and accumulate particle potentials at positions a~si
to obtain values g(a~si )

Ring Approximations from Children (“multipole to multipole”, M2M)


• approximate potential caused by all child boxes of the current box
• evaluate and accumulate outer ring approximations of all child
boxes at positions a~si (of the parent box) to obtain values g(a~si )
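A sketch of the P2M step for the gravity potential (Python/NumPy assumed; `s_points` holds the unit vectors $\vec s_i$ of the integration rule). The M2M step has the same structure, except that the inner loop would evaluate the child boxes' outer ring approximations instead of raw particle potentials:

```python
import numpy as np

def p2m(center, a, s_points, positions, masses, gamma=1.0):
    """P2M: accumulate the potential induced by the box-local particles
    at the integration points a*s_i on the sphere around the box centre,
    yielding the stored values g(a s_i)."""
    g = np.zeros(len(s_points))
    for i, s in enumerate(s_points):
        y = center + a * s                          # integration point on the ring
        for x, m in zip(positions, masses):
            g[i] += -gamma * m / np.linalg.norm(y - x)
    return g
```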

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 25
Inner Ring Approximations

Now: approximate the box-local potential caused by all external boxes!

Almost the same formula:

  $\Psi_a(\vec x) \approx \sum_{i=1}^{K} w_i\, g(a\vec s_i) \left( \sum_{n=0}^{M} \left( \dfrac{r}{a} \right)^{n+1} Q_n(\vec s_i \cdot \vec x_p) \right)$

• parent box: evaluate $\Psi_a(\vec x)$ at all integration points of all child-box rings
  ⇒ obtain ring approximations of the child boxes
• leaf box: evaluate $\Psi_a(\vec x)$ at all interior particles
  ⇒ obtain approximate box-induced forces on each particle

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 26
Hierarchical Computation of Potentials and Forces

Potential for a box results from:


• inner ring approximation of potential of its parent box
(“local to local”, L2L)
→ contains all accumulated box–box interactions on the parent
box
• box–box interaction with well-separated boxes on the same level
(“multipole to local”, M2L)
→ excludes boxes for which interactions are already considered
on parent levels
Result is an inner ring approximation of the potential
→ fixed approximation order (K, M) on each level

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 27
Hierarchical Computation of Potentials and Forces
(continued)

Forces on particles include:


• direct interaction (“particle to particle”, P2P), if particles are in
the same leaf-box or in a not well-separated (neighbour) leaf-box
• inner ring approximation (“local to particle”, L2P)
→ ring approximation of the potential leads to the force on each particle
in the resp. box
→ ring approximation contains all accumulated box–box
interactions with this box (i.e., with all well-separated boxes)
→ move from potential to force representation, i.e., compute
gradient of the ring approximation
Don’t forget: compute derivatives to obtain forces from potentials!

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 28
Hierarchical Force Computation – Illustrations
Forces on particles:
particle-to-particle (P2P) and local-to-particle (L2P)

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 29
Hierarchical Force Computation – Illustrations
Forces on ring approximations:
multipole-to-local (M2L) and local-to-local (L2L)

Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 30
Kernels in Fast Multipole Methods – Illustration

R. Yokota: illustration of data/information flow in FMM vs. treecodes/Barnes-Hut
(source: https://www.bu.edu/pasi/courses/12-steps-to-having-a-fast-multipole-method-on-gpus/)
Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 31
Fast Multipole Method – Accuracy and Complexity
Accuracy:
• depends on accuracy of integration rule
→ determined by number of integration points
• in practice: can be increased to allow approximations that are
accurate up to machine precision

Complexity:
• computation of the box approximations, i.e., all $g(a\vec s_i)$
  → constant effort per box (leaf and inner boxes)
  → thus $O(N_B)$ effort ($N_B$ boxes); if the max. number of particles per
    box is constant, then $O(N)$ ($N$ particles)
• computation of forces
  → the multilevel algorithm leads to $O(N)$ effort
Michael Bader | Scientific Computing II | Barnes-Hut and Fast Multipole | Summer 2022 32
