0% found this document useful (0 votes)
52 views32 pages

CS 294-73 Software Engineering For Scientific Computing Lecture 12: Particle Methods Homework 3

This lecture discusses particle methods for numerical solutions of partial differential equations. Particle methods represent the solution as a collection of particles that evolve over time. The document outlines techniques for efficiently computing forces between particles, including using short-range cutoffs, binning particles into grids, and representing long-range forces using the particle-mesh (PM) method. It describes how the PM method approximates long-range forces by depositing particle properties onto a grid and solving Poisson's equation on the grid. The lecture also discusses the particle-particle particle-mesh (P3M) method, which combines short-range particle-particle interactions with the PM method for long-range forces.

Uploaded by

Edmund Zin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
52 views32 pages

CS 294-73 Software Engineering For Scientific Computing Lecture 12: Particle Methods Homework 3

This lecture discusses particle methods for numerical solutions of partial differential equations. Particle methods represent the solution as a collection of particles that evolve over time. The document outlines techniques for efficiently computing forces between particles, including using short-range cutoffs, binning particles into grids, and representing long-range forces using the particle-mesh (PM) method. It describes how the PM method approximates long-range forces by depositing particle properties onto a grid and solving Poisson's equation on the grid. The lecture also discusses the particle-particle particle-mesh (P3M) method, which combines short-range particle-particle interactions with the PM method for long-range forces.

Uploaded by

Edmund Zin
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 32

CS 294-73 


Software Engineering for


Scientific Computing


Lecture 12: Particle Methods;
Homework 3.

Particle Methods
Numerical methods defined by the evolution of a finite
collection of points in space. Each point carries both the
position in space, along with other properties, which
themselves may or may not evolve.
Example: a collection of N physical particles, evolving
under Newton’s laws of classical mechanics.
{xk , v k , wk }N
k=1
dxk
= vk
dt
dv k
= F (xk )
Xdt
F (x) = wk0 (r )(x xk 0 )
k0

10/08/19 CS294-73 – Lecture 12 2


Particle Methods
Particle methods are also employed as discretizations of
partial differential equations.
•  Vorticity form of the incompressible Euler equations.

10/08/19 CS294-73 – Lecture 12 3


dxk
= vk
Particle Methods dv dt
k
= F (xk )
dt
X
F (x) = wk0 (r )(x xk 0 )
k0

To evaluate the fields for a single particle requires N


evaluations of the field functions, leading to an O(N2)
cost per time step: How do we reduce that cost ?

10/08/19 CS294-73 – Lecture 12 4


Short-Range Forces

•  Short-range forces (e.g. Lennard-Jones potential).


C1 C2
(x) =
|x|6 |x|12

The forces fall off sufficiently rapidly that the


approximation ⇥ (x) 0 if |x| > introduces acceptably
small errors for practical values of the cutoff distance .

10/08/19 CS294-73 – Lecture 12 5


Long-Range Forces
•  Coulomb / Newtonian potentials

cannot be localized by cutoffs without an unacceptable loss of


accuracy. But for this special case, we can take advantage of the
fact, that the potential is a localized solution to Poisson’s equation,
for ,

Harmonic functions are very smooth -> can be represented with a


small number of computational degrees of freedom.

10/08/19 CS294-73 – Lecture 12 6


Bin Sorting
For both long-range and short-range forces, need to sort particles to
determine which ones are near / far from a particle. The easiest way
to do this is with bin sorting.

•  Cost of sorting into bins: O(1)


per particle.
• Cost of computing which bins
are close: O(1) per bin.

10/08/19 CS294-73 – Lecture 12 7


Bin Sorting
Can use locally-refined grids / to maintain number of particle / bin
fixed.

10/08/19 CS294-73 – Lecture 12 8


Coulomb Forces
For short-range forces, choose h ~ σ, and to determine which
particles are close are close enough to each other to require
evaluation of the force. However, we still have an
O(N (h/L)D)2 x (L/h)D = O(N2) (h/L)D calculation for a uniformly-
distributed collection of particles, i.e. we’ve reduced the number of
pair-potential calculations by the number of bins.
For Coulomb forces, need to take advantage of the smoothness of
the far field, and the relationship of the Coulomb potential to
Poisson’s equation to reduce the work required to compute the
potential of distantly-separated particles.
•  Multipole methods / tree methods.

•  Particle-in-cell (PIC) methods / Particle-Particle Particle-


Mesh(PPPM, P3M) methods.

10/08/19 CS294-73 – Lecture 12 9


Particle-In-Cell Methods
Uses bin-sorting grid to compute the solution to Poisson’s equation
to represent the field induced by the particles. This is the most
common approach for problems in which the particles are used to
discretize a solution to a partial differential equation.
•  Deposit charges onto the grid.
•  Compute potential on the grid as a discrete convolution: = Gh ⇤ q
Can do this fast using Hockney’s method (O(Ng log Ng).
•  Compute fields using finite differences on the grid.
•  Interpolate fields to the particles.
Examples charge-deposition functions
in 1D:

Both of these functions conserve total


charge:

10/08/19 CS294-73 – Lecture 12 10


Particle-In-Cell Methods
Difficulties with particle-in-cell methods.
1.  Not the right answer for “real” particles (e.g. molecular dynamics).
2.  Accumulation of numerical error for long-time integrations.
3.  Solutions to 2:
I.  Introduce a smoothing of the particles that scales more slowly than
the mesh spacing.
II.  Remap the particles onto a fixed grid every few time steps. Accuracy
vs. positivity.

10/08/19 CS294-73 – Lecture 12 11


Different deposition functions

10/08/19 CS294-73 – Lecture 12 12


Splitting into short-range and long-range

To compute the force on a particle (red), represent as a combination of


•  Contributions from nearby particles (black), using (local) N-Body calculations.
•  Contributions from far-away particles, using PIC (blue).
How do you do this without double-counting, or having many PIC solves ?
What is the error of the resulting method for an arbitrary distribution of a
finite number of particles ?

10/08/19 CS294-73 – Lecture 12 13


PPPM (Hockney and Eastwood)

Idea: express the potential as a sum.

G(z) = GP P (z) + GP M (z)

GP P (z) =0 for |z| >


0
=G(z) for |z| < < , = O( 0 )
X
•  (rGP P )(x x0k ) Computed using N-body calculation.
k0
X
•  (rGP M )(x x0k ) Approximated using PIC.
k0

3
Cost of PP calculation: O(N )N = O(N 2 3 )

Cost of PM calculation: O(N ) + O(Ng logNg )

10/08/19 CS294-73 – Lecture 12 14


P3M (Hockney and Eastwood)

Issues:
•  How to implement ?
• Approach: basic particle representation from PIC doesn’t change - only
force calculation does.
•  What is the error ?
•  This is only a question about the error in the PIC calculation - the
decomposition and the PP calculation are exact.
•  Error analysis is tricky, since the number of particles is fixed.

10/08/19 CS294-73 – Lecture 12 15


PM Force Calculation
X
⇢gi = qk (ihg xk )
k
g
X
i = GP M (ihg jhg )⇢gi
j2Z3

F~ig = rg ( g )i
X g
F~P M,k = F~i (xk ihg )
i

Focus on the first two steps. We are making the approximation


X X
qk GP M (x xk ) ⇡ GP M (x jhg )⇢gi
k j2Z3
X
= GP M (x jhg )qk (jhg xk )
j2Z3
X X
= qk GP M (x jhg ) (jhg xk )
k j2Z3
10/08/19 CS294-73 – Lecture 12 16
PM Force Calculation
In what sense does
X
GP M (x xk ) ⇡ GP M (x jhg ) (jhg xk )?
j2Z3

Taylor expansion of GPM(y) = GPM(x - y) about y = xk (multi-index


notation).
X X
GP M (x jhg ) (jhg xk ) =GP M (x xk ) (jhg xk ) =1 (conservation of charge)
j2Z3 j2Z3
X ( 1)|p|1 X
+ (rp GP M )|(x xk ) (jhg xk )p (jhg xk )
p
p!
j2Z3
⇣ hP ⌘ =0 (moment conditions)
g
+O
max(|x xk |, )P +1
Remainder term in the Taylor expansion.

10/08/19 CS294-73 – Lecture 12 17


PM Force Calculation
Thus the error in the field induced by a single particle is
⇣ hP ⌘
g
O
max(|x xk |, )P +1

What happens for many particles - some nearby, some farther away ?
X ⇣ hP ⌘
g
✏= qk O
max(|x xk |, )P +1
k
X ⇣ hP ⌘
2
= O P +1
q̄ (r )
r
(r )
⇣ hPP X R
1 ⌘ ⇣ ⇣ h ⌘P 2 ⌘
g 2 g
=O
= O PP 22 P
P 11
= O h g
r>0
r
r=1

(Missing a factor of log(R) if P=2).

10/08/19 CS294-73 – Lecture 12 18


Method of Local Corrections (a P3M method)
Idea: compute far field effects with particle-in-cell, effect of nearby particles
with N-body calculation. For this approach, it is conceptually simpler to work
directly with the fields / forces.

•  Deposit fields on the grid.

•  Compute discrete convolution to get forces: C = ∞, original force calculation


evaluated at grid points. Why can
~ h = Gh ⇤ D
F ~h we take C to be finite with only a
small error ?
•  Compute fields at particle locations:

10/08/19 CS294-73 – Lecture 12 19


Finite-Difference Localization
Decay of truncation error:

Plot of for 27-point operator (q = 6) left; 53-point operator (q=8), right.

q
✏ = O(C )

But it is known how to construct discretizations that are very high


order accurate on harmonic functions: 5x5x5 stencil -> q = 10.
Size of stencil impacts cost of deposition.

10/08/19 CS294-73 – Lecture 12


Homework 3: Matrix Multiply, fast and slow.
•  You will implement dense general matrix matrix multiply for two column-wise stored dense
matrices A and B. This data looks like the data layout we described for the float* indexing in
Homework 1, except we will now be using double precision (double not float). The result is
stored into another Matrix C. These will be square matrices, and a few dozen matrix sizes will
be executed and then checked for correctness. A and B are initialized with random data.
1. implement your own triply-nested loop version of dense matrix multiply and put it in the file
dgemm-naive.cpp. it will contain one function declared as
void square_dgemm( int n, double *A, double *B, double *C )

compile the 'naive' makefile target and execute and capture the output in a file named
'naive.out' which you check into the repo
2. change the compiler flags from the default '-g -Wall' flags to '-O3'. i.e., turn on compiler
optimization. build 'naive' again, and produce a 'naive_opt.out' output.
3. implement a version of this function in a file named dgemm-blas.cpp with a call to cblas
third party library. Build the 'blas' makefile target. run the code and create a 'blas.out'
output for your problem to submit.

10/08/19 CS294-73 – Lecture 12 21


Homework 3: FFT, fast and slow

•  You will write classes FFT1DRecursive, FFT1DW, derived


from FFT1D. You will build them with the FFT1D class, and time
them using the Unix time shell command. We will provide an
implementation of the bit-reversed algorithm (BRI). Time them
with both the debug and optimized versions.
•  Once you have your three implementations running, you will test
them inside the 3D driver, timing them for the sizes indicated.

10/08/19 CS294-73 – Lecture 12 22


FFTW (From Section 2.1 of http://www.fftw.org/fftw3_doc/ )

#include “fftw3.h”
...
{
fftw_complex *in, *out;
fftw_plan p; ... in = (fftw_complex*)
fftw_malloc(sizeof(fftw_complex) * N);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE); ...
fftw_execute(p);
/* repeat as needed */ ...
fftw_destroy_plan(p);
fftw_free(in);
fftw_free(out);
}

•  Need to go to the fftw site and install fftw on your own machine.
•  You won’t use this approach from the FFTW document. fftw needs to have its own view of the data
– you will alias your own complex data to fftw’s data, as indicated in the last lecture.

10/08/19 CS294-73 – Lecture 12 23


Homework #3 as seen through the makefiles

•  hw3/GNUmakefile – contains common makefile definitions.


•  hw3/dgemmTest/GNUmakefile – build rules for matrix-multiply tests.
•  hw3/fftTest/GNUmakefile – build rules for FFT tests.
•  hw3/fftTools/GNUmakefile – build rules for FFT libraries.

10/08/19 CS294-73 – Lecture 12 24


Building and Using Libraries
Makefile for building FFT1D test.

In $(HOME)/GNUmakefile:
LIBS_LOCAL = $(HOME)/lib
LIB_FLAGS:= -L$(LIBS_LOCAL) -L$(FFTW_HOME)/lib -lfftw3 -lfft1D

test1d: FFT1DTest.cpp GNUmakefile $(LIBS_LOCAL)/libfft1D.a


$(CXX) $(CXXFLAGS) FFT1DTest.cpp $(LIB_FLAGS) -o test1d.exe

What is libfft1D.a ?
•  Collection of binaries (.o files), assembled into an archive (.a)
•  Unix utilities for building archives: ar (Linux), libTool (Mac).
•  You’ve already been accessing such archives (part of the compilation system).
- /usr/lib , /usr/local/lib
•  Compiler / linker told where to look for them through -L , -l flags.
- -L<dir_name> : search this directory for .a files.
- -l<root> : look for library name of the form lib<root>.a

10/08/19 CS294-73 – Lecture 12 25


Building and Using Libraries
In fftTools/GNUmakefile:

1DFFTOBJS = FFT1DBRI.o PowerItoI.o FFTCTBRI.o FFT1DRecursive.o FFTW1D.o


libfft1D.a: GNUmakefile $(1DFFTOBJS)
$(LIBTOOL) libfft1D.a $(1DFFTOBJS)
mkdir -p ../lib;mv libfft1D.a $(LIBS_LOCAL)

In hw3/GNUmakefile:
$(LIBS_LOCAL)/libfft1D.a:$(wildcard $(FFT_HOME)/*.{H,cpp} ) GNUmakefile
cd $(FFT_HOME);make clean;make libfft1D.a DIM=1 CXX=$(CXX)

10/08/19 CS294-73 – Lecture 12 26


Test Programs for Assignment 3
class FFT1D
{
public:
// Interface class for complex-to-complex power-of-two FFT on the unit interval.
FFT1D();
// Constructor. argument a_M specifies number of points is N= 2^{a_M}
FFT1D(unsigned int a_M){m_M = a_M; m_N = Power(2,m_M);}
virtual ~FFT1D() { }
// Forward FFT: a_fHat[k] = \sum_j=0^{N-1} a_f[j] z^{j k}, z = e^{-2 \pi \iota /
m_N}
virtual void forwardFFTCC(vector<complex<double> > & a_fHat,
const vector<complex<double> >& f) const = 0;
// inverse FFT: a_f[j] = \sum_{k=0}^{N-1} a_fHat[k] z^{j k}, z = e^{2 \pi \iota /
m_N}
virtual void inverseFFTCC(vector<complex<double> > & a_f,
const vector<complex<double> > & a_fHat) const = 0;
// Access functions.
const unsigned int& getN(){return m_N;};
const unsigned int& getM(){return m_M;}
protected:
unsigned int m_M, m_N;
};

10/08/19 CS294-73 – Lecture 12 27


Test Programs for Assignment 3
int main(int argc, char* argv[])
{
int M;
int inputMode;
string fft_string;

cout << "input log_2(N), N = number of points" << endl;


cin >> M ;
cout << "input test mode < N" << endl;
cin >> inputMode;

10/08/19 CS294-73 – Lecture 12 28


Test Programs for Assignment 3
cout << "input FFT being tested: Recursive, BRI, FFTW" << endl;
cin >> fft_string;
shared_ptr<FFT1D> p_fft;

if (fft_string == "Recursive")
{
// Uncomment and delete the abort when ready to test.
// shared_ptr<FFT1DRecursive> p_fft1dR
// =shared_ptr<FFT1DRecursive>(new FFT1DRecursive(M));
// p_fft = dynamic_pointer_cast<FFT1D >(p_fft1dR);
cout << "this one is for the students to do" << endl;
abort();
}
else if (fft_string == "BRI")
{
shared_ptr<FFT1DBRI> p_fft1dBRI =
shared_ptr<FFT1DBRI>(new FFT1DBRI(M));
p_fft = dynamic_pointer_cast<FFT1D >(p_fft1dBRI);
}

10/08/19 CS294-73 – Lecture 12 29


Test Programs for Assignment 3
else if (fft_string == "BRI")
{
shared_ptr<FFT1DBRI> p_fft1dBRI =
shared_ptr<FFT1DBRI>(new FFT1DBRI(M));
p_fft = dynamic_pointer_cast<FFT1D >(p_fft1dBRI);
}
else if (fft_string == "FFTW")
{
// Uncomment and delete the abort when ready to test.
// p_fft =
// dynamic_pointer_cast<FFT1D >(shared_ptr<FFTW1D>(new FFTW1D(M)));

cout << "this one is for the students to do" << endl;
abort();
}
else
{cout << "invalid input - should use BRI, Recursive or FFTW as name
for FFT implementation to be tested" << endl;
abort();}

10/08/19 CS294-73 – Lecture 12 30


Test Programs for Assignment 3
double error = test1(p_fft);
int mode = test2(p_fft,inputMode);
cout << fft_string<< ": test 1: error in Gaussian = " << error << endl;
cout << fft_string << ": test 2: reproducing input Fourier mode " <<
inputMode <<
" , output mode " << mode <<endl;
cout << "The input mode number and the output mode number should match"
<< endl;
cout << "if multiple modes have non-roundoff amplitude, this is an error
and the output mode is set to -1" << endl;

10/08/19 CS294-73 – Lecture 12 31


Test Programs for Assignment 3
int main(int argc, char* argv[])
{
int M;
double time;

sscanf(argv[1],"%d",&M);
dynamic_pointer_cast<FFT1D >(shared_ptr<FFTW1D>(new FFTW1D(M)));
FFTMD fftmd(p_fft);
double error = test(fftmd,time);
cout << "test 1: error in Gaussian = " << error << endl;
cout << "time in FFTMD = " << time << " seconds" << endl;
};

10/08/19 CS294-73 – Lecture 12 32

You might also like