
JOURNAL OF COMPUTATIONAL PHYSICS 135, 280–292 (1997)

ARTICLE NO. CP975706

A Fast Algorithm for Particle Simulations*


L. Greengard and V. Rokhlin
Department of Computer Science, Yale University, New Haven, Connecticut 06520

Received June 10, 1986; revised February 5, 1987

An algorithm is presented for the rapid evaluation of the potential and force fields in systems involving large numbers of particles whose interactions are Coulombic or gravitational in nature. For a system of $N$ particles, an amount of work of the order $O(N^2)$ has traditionally been required to evaluate all pairwise interactions, unless some approximation or truncation method is used. The algorithm of the present paper requires an amount of work proportional to $N$ to evaluate all interactions to within roundoff error, making it considerably more practical for large-scale problems encountered in plasma physics, fluid dynamics, molecular dynamics, and celestial mechanics. © 1987 Academic Press

* The authors were supported in part by the Office of Naval Research under Grant N00014-82-K-0184.

Reprinted from Volume 73, Number 2, December 1987, pages 325–348.

0021-9991/97 $25.00

Copyright © 1987 by Academic Press. All rights of reproduction in any form reserved.

1. INTRODUCTION

The study of physical systems by means of particle simulations is well established in a number of fields and is becoming increasingly important in others. The most classical example is probably celestial mechanics, but much recent work has been done in formulating and studying particle models in plasma physics, fluid dynamics, and molecular dynamics [5].

There are two major classes of simulation methods. Dynamical simulations follow the trajectories of $N$ particles over some time interval of interest. Given initial positions $\{x_i\}$ and velocities, the trajectory of each particle is governed by Newton's second law of motion,

$$m_i\frac{d^2x_i}{dt^2} = -\nabla_i\Phi \quad \text{for } i = 1, \ldots, N,$$

where $m_i$ is the mass of the $i$th particle and the force is obtained from the gradient of a potential function $\Phi$. When one is interested in an equilibrium configuration of a set of particles rather than their time-dependent properties, an alternative approach is the Monte Carlo method. In this case, the potential function $\Phi$ has to be evaluated for a large number of configurations in an attempt to determine the potential minimum.

We restrict our attention in this paper to the case where the potential (or force) at a point is a sum of pairwise interactions. More specifically, we consider potentials of the form

$$\Phi = \Phi_{\text{far}} + (\Phi_{\text{near}} + \Phi_{\text{external}}),$$

where $\Phi_{\text{near}}$ (when present) is a rapidly decaying potential (e.g., Van der Waals), $\Phi_{\text{external}}$ (when present) is independent of the number of particles, and $\Phi_{\text{far}}$, the far-field potential, is Coulombic or gravitational. Such models describe classical celestial mechanics and many problems in plasma physics and molecular dynamics. In the vortex method for incompressible fluid flow calculations [4], an important and expensive portion of the computation has the same formal structure (the stream function and the vorticity are related by Poisson's equation).

In a system of $N$ particles, the calculation of $\Phi_{\text{near}}$ requires an amount of work proportional to $N$, as does the calculation of $\Phi_{\text{external}}$. The decay of the Coulombic or gravitational potential, however, is sufficiently slow that all interactions must be accounted for, resulting in CPU time requirements of the order $O(N^2)$. In this paper a method is presented for the rapid (order $O(N)$) evaluation of these interactions for all particles.

There have been a number of previous efforts aimed at reducing the computational complexity of the $N$-body problem. Particle-in-cell methods [5] have received careful study and are used with much success, most notably in plasma physics. Assuming the potential satisfies Poisson's equation, a regular mesh is laid out over the computational domain and the method proceeds by:

(1) interpolating the source density at mesh points,
(2) using a "fast Poisson solver" to obtain potential values on the mesh,
(3) computing the force from the potential and interpolating to the particle positions.

The complexity of these methods is of the order $O(N + M\log M)$, where $M$ is the number of mesh points. The number of mesh points is usually chosen to be proportional to the number of particles, but with a small constant
of proportionality so that $M \ll N$. Therefore, although the asymptotic complexity for the method is $O(N\log N)$, the computational cost in practical calculations is usually observed to be proportional to $N$. Unfortunately, the mesh provides limited resolution, and highly nonuniform source distributions cause a significant degradation of performance. Further errors are introduced in step (3) by the necessity for numerical differentiation to obtain the force.

To improve the accuracy of particle-in-cell calculations, short-range interactions can be handled by direct computation, while far-field interactions are obtained from the mesh, giving rise to so-called particle–particle/particle–mesh (P³M) methods [5]. For an implementation of these ideas in the context of vortex calculations, see [1]. While these algorithms still depend for their efficient performance on a reasonably uniform distribution of particles, in theory they do permit arbitrarily high accuracy to be obtained. As a rule, when the required precision is relatively low, and the particles are distributed more or less uniformly in a rectangular region, P³M methods perform satisfactorily. However, when the required precision is high (as, for example, in the modeling of highly correlated systems), the CPU time requirements of such algorithms tend to become excessive.

Appel [2] introduced a "gridless" method for many-body simulation with a computational complexity estimated to be of the order $O(N\log N)$. It relies on using a monopole (center-of-mass) approximation for computing forces over large distances and sophisticated data structures to keep track of which particles are sufficiently clustered to make the approximation valid. For certain types of problems, the method achieves a dramatic speedup compared to the naive $O(N^2)$ approach. It is less efficient when the distribution of particles is relatively uniform and the required precision is high.

The algorithm we present uses multipole expansions to compute potentials or forces to whatever precision is required, and the CPU time expended is proportional to $N$. The approach we use is similar to the one introduced in [7] for the solution of boundary value problems for the Laplace equation. In the following section, we describe the necessary analytical tools, while Section 3 is devoted to a detailed description of the method.

2. PHYSICAL AND MATHEMATICAL PRELIMINARIES

In this paper, we consider a two-dimensional physical model which consists of a set of $N$ charged particles with the potential and force obtained as the sum of pairwise interactions from Coulomb's law. Suppose that a point charge of unit strength is located at the point $(x_0, y_0) = \mathbf{x}_0 \in \mathbb{R}^2$. Then, for any $\mathbf{x} = (x, y) \in \mathbb{R}^2$ with $\mathbf{x} \neq \mathbf{x}_0$, the potential and electrostatic field due to this charge are described by the expressions

$$\phi_{\mathbf{x}_0}(x, y) = -\log(\|\mathbf{x} - \mathbf{x}_0\|)$$

and

$$E_{\mathbf{x}_0}(x, y) = \frac{\mathbf{x} - \mathbf{x}_0}{\|\mathbf{x} - \mathbf{x}_0\|^2},$$

respectively.

It is well known that $\phi_{\mathbf{x}_0}$ is harmonic in any region not containing the point $\mathbf{x}_0$. Moreover, for every harmonic function $u$, there exists an analytic function $w: \mathbb{C} \to \mathbb{C}$ such that $u(x, y) = \operatorname{Re}(w(x, y))$, and $w$ is unique except for an additive constant. In the remainder of the paper we will work with analytic functions, making no distinction between a point $(x, y) \in \mathbb{R}^2$ and a point $x + iy = z \in \mathbb{C}$. We note that

$$\phi_{\mathbf{x}_0}(\mathbf{x}) = \operatorname{Re}(-\log(z - z_0)),$$

and, following standard practice, we will refer to the analytic function $\log(z)$ as the potential due to a charge. As we develop expressions for the potential due to more complicated charge distributions, we will continue to use complex notation and will refer to the corresponding analytic functions themselves as the potentials. The following lemma is an immediate consequence of the Cauchy–Riemann equations.

LEMMA 2.1. If $u(x, y) = \operatorname{Re}(w(x, y))$ describes the potential field at $(x, y)$, then the corresponding force field is given by

$$\nabla u = (u_x, u_y) = (\operatorname{Re}(w'), -\operatorname{Im}(w')),$$

where $w'$ is the derivative of $w$.

The following lemma is used in obtaining the multipole expansion for the field due to $m$ charges.

LEMMA 2.2. Let a point charge of intensity $q$ be located at $z_0$. Then for any $z$ such that $|z| > |z_0|$,

$$\phi_{z_0}(z) = q\log(z - z_0) = q\left(\log(z) - \sum_{k=1}^{\infty}\frac{1}{k}\left(\frac{z_0}{z}\right)^k\right). \tag{2.1}$$

Proof. Note first that $\log(z - z_0) - \log(z) = \log(1 - z_0/z)$ and that $|z_0/z| < 1$. The lemma now follows from the expansion

$$\log(1 - w) = (-1)\sum_{k=1}^{\infty}\frac{w^k}{k},$$

which is valid for any $w$ such that $|w| < 1$.
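Lemmas 2.1 and 2.2 can be checked numerically. The Python sketch below is only an illustration: the function names and the sample values of $q$, $z_0$, and $z$ are our own, not the paper's.

It does two things:

- It compares $q\log(z - z_0)$ with the series (2.1) truncated after $p$ terms. The comparison uses the tail estimate $|q|\,\rho^{p+1}/(1-\rho)$, with $\rho = |z_0/z|$, which follows from bounding each dropped term $\rho^k/k$ by $\rho^k$.
- It checks Lemma 2.1 for $w(z) = q\log(z - z_0)$ against central differences of $u = \operatorname{Re}(w)$.

Only real parts are compared. The principal-branch logarithms on the two sides of (2.1) may differ by $2\pi i k q$ for some integer $k$, and that does not change the physical potential.

```python
import cmath
import math


def charge_potential(q, z0, z):
    """Complex potential q*log(z - z0) of a charge q at z0 (left side of (2.1))."""
    return q * cmath.log(z - z0)


def charge_potential_series(q, z0, z, p):
    """Right side of (2.1) truncated after p terms; requires |z| > |z0|."""
    w = z0 / z
    if abs(w) >= 1:
        raise ValueError("Lemma 2.2 requires |z| > |z0|")
    return q * (cmath.log(z) - sum(w**k / k for k in range(1, p + 1)))


def series_tail_bound(q, z0, z, p):
    """Estimate |q| * rho**(p+1) / (1 - rho), rho = |z0/z|, for the dropped terms."""
    rho = abs(z0 / z)
    return abs(q) * rho ** (p + 1) / (1 - rho)


def force_from_derivative(w_prime):
    """Lemma 2.1: the gradient of Re(w) is (Re w', -Im w')."""
    return (w_prime.real, -w_prime.imag)


def real_potential(q, z0, zz):
    """Physical potential Re(q*log(zz - z0)) = q*log|zz - z0|."""
    return q * math.log(abs(zz - z0))


# Hypothetical sample values: a charge q at z0, observed at z with |z| > |z0|.
q, z0, z = 1.5, 0.3 + 0.2j, 1.0 - 0.8j

for p in (2, 5, 10, 20):
    # Compare real parts only: the two sides may differ by 2*pi*i*k*q.
    err = abs(charge_potential(q, z0, z).real - charge_potential_series(q, z0, z, p).real)
    print(f"p = {p:2d}: error = {err:.3e}, estimate = {series_tail_bound(q, z0, z, p):.3e}")

# Lemma 2.1 for w(z) = q*log(z - z0), with w'(z) = q / (z - z0),
# compared with central differences in x and y.
h = 1e-6
fd_grad = (
    (real_potential(q, z0, z + h) - real_potential(q, z0, z - h)) / (2 * h),
    (real_potential(q, z0, z + 1j * h) - real_potential(q, z0, z - 1j * h)) / (2 * h),
)
analytic_grad = force_from_derivative(q / (z - z0))
print("finite differences:", fd_grad)
print("Lemma 2.1:         ", analytic_grad)
```

The error should fall off geometrically in $p$, at the rate $|z_0/z|$, and stay below the estimate. The two gradients should agree up to the finite-difference error.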
THEOREM 2.1 (Multipole expansion). Suppose that $m$ charges of strengths $\{q_i, i = 1, \ldots, m\}$ are located at points $\{z_i, i = 1, \ldots, m\}$, with $|z_i| < r$. Then for any $z \in \mathbb{C}$ with $|z| > r$, the potential $\phi(z)$ is given by

$$\phi(z) = Q\log(z) + \sum_{k=1}^{\infty}\frac{a_k}{z^k}, \tag{2.2}$$

where

$$Q = \sum_{i=1}^{m} q_i, \qquad a_k = \sum_{i=1}^{m}\frac{-q_i z_i^k}{k}. \tag{2.3}$$

Furthermore, for any $p \ge 1$,

$$\left|\phi(z) - Q\log(z) - \sum_{k=1}^{p}\frac{a_k}{z^k}\right| \le \alpha\left|\frac{r}{z}\right|^{p+1} \le \left(\frac{A}{c-1}\right)\left(\frac{1}{c}\right)^p, \tag{2.4}$$

where

$$c = \left|\frac{z}{r}\right|, \qquad A = \sum_{i=1}^{m}|q_i|, \qquad \alpha = \frac{A}{1 - |r/z|}. \tag{2.5}$$

Proof. The form of the multipole expansion (2.2) is an immediate consequence of the preceding lemma and the fact that $\phi(z) = \sum_{i=1}^{m}\phi_{z_i}(z)$. To obtain the error bound (2.4), observe that

$$\left|\phi(z) - Q\log(z) - \sum_{k=1}^{p}\frac{a_k}{z^k}\right| = \left|\sum_{k=p+1}^{\infty}\frac{a_k}{z^k}\right|.$$

Substituting for $a_k$ the expression in (2.3), we have

$$\left|\sum_{k=p+1}^{\infty}\frac{a_k}{z^k}\right| \le A\sum_{k=p+1}^{\infty}\frac{r^k}{k|z|^k} \le A\sum_{k=p+1}^{\infty}\left|\frac{r}{z}\right|^k = \alpha\left|\frac{r}{z}\right|^{p+1} = \left(\frac{A}{c-1}\right)\left(\frac{1}{c}\right)^p.$$

In particular, if $c \ge 2$, then

$$\left|\phi(z) - Q\log(z) - \sum_{k=1}^{p}\frac{a_k}{z^k}\right| \le A\left(\frac{1}{2}\right)^p. \tag{2.6}$$

Finally, we demonstrate, with a simple example, how multipole expansions can be used to speed up calculations with potential fields. Suppose that charges of strengths $q_1, q_2, \ldots, q_m$ are located at the points $x_1, x_2, \ldots, x_m \in \mathbb{C}$ and that $\{y_1, y_2, \ldots, y_n\}$ is another set of points in $\mathbb{C}$ (Fig. 1). We say that the sets $\{x_i\}$ and $\{y_i\}$ are well separated if there exist points $x_0, y_0 \in \mathbb{C}$ and a real $r > 0$ such that

$$|x_i - x_0| < r \quad \text{for all } i = 1, \ldots, m,$$

$$|y_j - y_0| < r \quad \text{for all } j = 1, \ldots, n,$$

$$|x_0 - y_0| > 3r.$$

FIG. 1. Well-separated sets in the plane.

In order to obtain the potential (or force) at the points $\{y_j\}$ due to the charges at the points $\{x_i\}$ directly, we could compute

$$\sum_{i=1}^{m}\phi_{x_i}(y_j) \quad \text{for all } j = 1, \ldots, n. \tag{2.7}$$

This clearly requires order $nm$ work (evaluating $m$ fields at $n$ points). Now suppose that we first compute the coefficients of a $p$-term multipole expansion of the potential due to the charges $q_1, q_2, \ldots, q_m$ about $x_0$, using Theorem 2.1. This requires a number of operations proportional to $mp$. Evaluating the resulting multipole expansion at all points $y_j$ requires order $np$ work, and the total amount of computation is of the order $O(mp + np)$. Moreover, since $|y_j - x_0| \ge |x_0 - y_0| - |y_j - y_0| > 2r$, the ratio $c$ in (2.5) is at least 2, so by (2.6),

$$\left|\sum_{i=1}^{m}\phi_{x_i}(y_j) - Q\log(y_j - x_0) - \sum_{k=1}^{p}\frac{a_k}{(y_j - x_0)^k}\right| \le A\left(\frac{1}{2}\right)^p,$$

and in order to obtain a relative precision $\epsilon$ (with respect to the total charge), $p$ must be of the order $-\log_2(\epsilon)$. Once the precision is specified, the amount of computation has been reduced to

$$O(m) + O(n),$$

which is significantly smaller than $nm$ for large $n$ and $m$.

2.1. Translation Operators and Error Bounds

The following three lemmas constitute the principal analytical tool of this paper, allowing us to manipulate multipole expansions in the manner required by the fast algorithm. Lemma 2.3 provides a formula for shifting the
center of a multipole expansion, Lemma 2.4 describes how to convert such an expansion into a local (Taylor) expansion in a circular region of analyticity, and Lemma 2.5 furnishes a mechanism for shifting the center of a Taylor expansion within a region of analyticity. We also derive error bounds associated with these translation operators which allow us to carry out numerical computations to any specified accuracy.

LEMMA 2.3. Suppose that

$$\phi(z) = a_0\log(z - z_0) + \sum_{k=1}^{\infty}\frac{a_k}{(z - z_0)^k} \tag{2.8}$$

is a multipole expansion of the potential due to a set of $m$ charges of strengths $q_1, q_2, \ldots, q_m$, all of which are located inside the circle $D$ of radius $R$ with center at $z_0$. Then for $z$ outside the circle $D_1$ of radius $(R + |z_0|)$ and center at the origin,

$$\phi(z) = a_0\log(z) + \sum_{l=1}^{\infty}\frac{b_l}{z^l}, \tag{2.9}$$

where

$$b_l = \left(\sum_{k=1}^{l} a_k z_0^{l-k}\binom{l-1}{k-1}\right) - \frac{a_0 z_0^l}{l}, \tag{2.10}$$

with $\binom{l}{k}$ the binomial coefficients. Furthermore, for any $p \ge 1$,

$$\left|\phi(z) - a_0\log(z) - \sum_{l=1}^{p}\frac{b_l}{z^l}\right| \le \left(\frac{A}{1 - \left|\frac{|z_0| + R}{z}\right|}\right)\left|\frac{|z_0| + R}{z}\right|^{p+1} \tag{2.11}$$

with $A$ defined in (2.5).

Proof. The coefficients of the shifted expansion (2.9) are obtained by expanding the expression (2.8) into a Taylor series with respect to $z_0$. For the error bound (2.11), observe that the terms $\{b_l\}$ are the coefficients of the (unique) multipole expansion about the origin of those charges contained in the circle $D$, and Theorem 2.1 applies immediately with $r$ replaced by $|z_0| + R$.

Remark. Once the values $\{a_0, a_1, \ldots, a_p\}$ in the expansion (2.8) about $z_0$ are computed, we can obtain $\{b_1, \ldots, b_p\}$ exactly by (2.10). In other words, we may shift the center of a truncated multipole expansion without any loss of precision.

FIG. 2. Source charges $q_1, q_2, \ldots, q_m$ are contained in the circle $D_1$. The corresponding multipole expansion about $z_0$ converges inside $D_2$. $C$ is a circle of radius $s$, with $s > R$.

LEMMA 2.4. Suppose that $m$ charges of strengths $q_1, q_2, \ldots, q_m$ are located inside the circle $D_1$ with radius $R$ and center at $z_0$, and that $|z_0| > (c + 1)R$ with $c > 1$ (Fig. 2). Then the corresponding multipole expansion (2.8) converges inside the circle $D_2$ of radius $R$ centered about the origin. Inside $D_2$, the potential due to the charges is described by a power series,

$$\phi(z) = \sum_{l=0}^{\infty} b_l\cdot z^l, \tag{2.12}$$

where

$$b_0 = \sum_{k=1}^{\infty}\frac{a_k}{z_0^k}(-1)^k + a_0\log(-z_0) \tag{2.13}$$

and

$$b_l = \left(\frac{1}{z_0^l}\sum_{k=1}^{\infty}\frac{a_k}{z_0^k}\binom{l+k-1}{k-1}(-1)^k\right) - \frac{a_0}{l\cdot z_0^l} \quad \text{for } l \ge 1. \tag{2.14}$$

Furthermore, for any $p \ge \max(2, 2c/(c-1))$, an error bound for the truncated series is given by

$$\left|\phi(z) - \sum_{l=0}^{p} b_l\cdot z^l\right| < \frac{A\left(4e(p + c)(c + 1) + c^2\right)}{c(c - 1)}\left(\frac{1}{c}\right)^{p+1}, \tag{2.15}$$
where $A$ is defined in (2.5) and $e$ is the base of natural logarithms.

Proof. We obtain the coefficients of the local expansion (2.12) from Maclaurin's theorem applied to the multipole expansion (2.8). To derive the error bound (2.15), we let $c_0 = a_0\log(-z_0)$, $c_l = -a_0/(l\cdot z_0^l)$ for $l \ge 1$, and $\beta_l = b_l - c_l$ for $l \ge 0$. Then

$$\left|\phi(z) - \sum_{l=0}^{p} b_l\cdot z^l\right| = \left|\sum_{l=p+1}^{\infty} b_l\cdot z^l\right| \le S_1 + S_2 \tag{2.16}$$

with

$$S_1 = \left|\sum_{l=p+1}^{\infty} c_l\cdot z^l\right|, \qquad S_2 = \left|\sum_{l=p+1}^{\infty}\beta_l\cdot z^l\right|.$$

A bound for $S_1$ is easily found by observing that

$$S_1 = \left|\sum_{l=p+1}^{\infty} c_l z^l\right| \le |a_0|\sum_{l=p+1}^{\infty}\frac{|z|^l}{l\cdot|z_0|^l} \le A\sum_{l=p+1}^{\infty}\frac{|z|^l}{l\cdot|z_0|^l} \le A\sum_{l=p+1}^{\infty}\left(\frac{1}{c+1}\right)^l < A\sum_{l=p+1}^{\infty}\left(\frac{1}{c}\right)^l = \left(\frac{A}{c-1}\right)\left(\frac{1}{c}\right)^p.$$

To obtain a bound for $S_2$, let $C$ be a circle of radius $s$, where $s = cR((p-1)/p)$ (Fig. 2). Note first that for any $p \ge 2c/(c-1)$,

$$R < \frac{cR + R}{2} < s < cR.$$

Defining the function $f_1: \mathbb{C}\setminus D_1 \to \mathbb{C}$ by the expression

$$f_1(z) = \phi(z) - a_0\cdot\log(z - z_0),$$

and using Taylor's theorem for complex analytic functions (see [6, p. 190]), we obtain

$$S_2 = \left|f_1(z) - \sum_{l=0}^{p}\beta_l z^l\right| = \left|\sum_{l=p+1}^{\infty}\beta_l z^l\right| \le \frac{M}{1 - \frac{|z|}{s}}\left(\frac{|z|}{s}\right)^{p+1},$$

where

$$M = \max_{C}|f_1(t)|.$$

Obviously, for any $t$ lying on $C$,

$$|f_1(t)| \le \sum_{k=1}^{\infty}\left|\frac{a_k}{(t - z_0)^k}\right|,$$

and it is easy to see that

$$|a_k| \le AR^k, \qquad |t - z_0| \ge R + cR - s = R + cR/p.$$

After some algebraic manipulation, we have

$$M \le A\left(\frac{pR + cR}{cR}\right), \qquad 1 - \frac{|z|}{s} \ge \frac{cR - R}{cR + R}.$$

Observing that for any positive integer $n$ and any integer $p \ge 2$,

$$\left(1 + \frac{1}{n}\right)^n \le e, \qquad \left(1 + \frac{1}{p-1}\right)^2 \le 4,$$

we obtain

$$S_2 \le \frac{A(pR + cR)(cR + R)}{cR(cR - R)}\left(\frac{|z|}{cR}\right)^{p+1}\left(\frac{p}{p-1}\right)^{p+1} \le \frac{A(p + c)(c + 1)}{c(c - 1)}\left(\frac{1}{c}\right)^{p+1}\left(1 + \frac{1}{p-1}\right)^{p-1}\left(1 + \frac{1}{p-1}\right)^2 \le \frac{4Ae(p + c)(c + 1)}{c(c - 1)}\left(\frac{1}{c}\right)^{p+1}.$$

Adding the last expression to the error bound for $S_1$ completes the proof.

The following lemma is an immediate consequence of Maclaurin's theorem. It describes an exact translation operation with a finite number of terms, and no error bound is needed.

LEMMA 2.5. For any complex $z_0$, $z$, and $\{a_k\}$, $k = 0, 1, 2, \ldots, n$,

$$\sum_{k=0}^{n} a_k(z - z_0)^k = \sum_{l=0}^{n}\left(\sum_{k=l}^{n} a_k\binom{k}{l}(-z_0)^{k-l}\right)z^l. \tag{2.17}$$

3. THE FAST MULTIPOLE ALGORITHM

In this section, we present an algorithm for the rapid evaluation of the potentials and/or electrostatic fields due to distributions of charges. The central strategy used is that of clustering particles at various spatial lengths and computing interactions with other clusters which are sufficiently far away by means of multipole expansions. Interactions with particles which are nearby are handled directly.

To be more specific, let us consider the geometry of the computational box, depicted in Fig. 3. It is a square with sides of length one, centered about the origin of the coordinate system, and is assumed to contain all $N$ particles of the system under consideration. The eight nearest neighbor boxes are also shown and will be needed in the next section when considering various boundary conditions. First, we will describe the method for free-space problems, where the boundary can be ignored and the only interactions to be accounted for involve particles within the computational box itself.

FIG. 3. The computational box (shaded) and its nearest periodic images. The box is centered at the origin "0" and has area one.

Fixing a precision $\epsilon$, we choose $p \approx -\log_2(\epsilon)$ and specify that no interactions be computed for clusters of particles which are not well separated. This is precisely the condition needed for the error bounds (2.4), (2.11), and (2.15) to apply with $c = 2$, the truncation error to be bounded by $2^{-p}$, and the desired precision to be achieved. In order to impose such a condition, we introduce a hierarchy of meshes which refine the computational box into smaller and smaller regions (Fig. 4). Mesh level 0 is equivalent to the entire box, while mesh level $l + 1$ is obtained from level $l$ by subdivision of each region into four equal parts. The number of distinct boxes at mesh level $l$ is equal to $4^l$. A tree structure is imposed on this mesh hierarchy, so that if ibox is a fixed box at level $l$, the four boxes at level $l + 1$ obtained by subdivision of ibox are considered its children.

FIG. 4. The computational box and three levels of refinement.

Other notation used in the description of the algorithm includes

$\Phi_{l,i}$: the $p$-term multipole expansion (about the box center) of the potential field created by the particles contained inside box $i$ at level $l$,

$\Psi_{l,i}$: the $p$-term expansion about the center of box $i$ at level $l$, describing the potential field due to all particles outside the box and its nearest neighbors,

$\tilde\Psi_{l,i}$: the $p$-term local expansion about the center of box $i$ at level $l$, describing the potential field due to all particles outside $i$'s parent box and the parent box's nearest neighbors,

Interaction list: for box $i$ at level $l$, the set of boxes which are children of the nearest neighbors of $i$'s parent and which are well separated from box $i$ (Fig. 5).

FIG. 5. Interaction list for box $i$. Thick lines correspond to mesh level 2 and thin lines to level 3. Boxes marked with an "x" are well separated from box $i$ and contained within the nearest neighbors of box $i$'s parent.

Suppose now that at level $l - 1$, the local expansion $\Psi_{l-1,i}$ has been obtained for all boxes. Then, by using Lemma 2.5
to shift (for all $i$) the expansion $\Psi_{l-1,i}$ to each of box $i$'s children, we have, for each box $j$ at level $l$, a local representation of the potential due to all particles outside of $j$'s parent's neighbors, namely $\tilde\Psi_{l,j}$. The interaction list is, therefore, precisely that set of boxes whose contribution to the potential must be added to $\tilde\Psi_{l,j}$ in order to create $\Psi_{l,j}$. This is done by using Lemma 2.4 to convert the multipole expansions of these interaction boxes to local expansions about the current box center and adding them to the expansion obtained from the parent. Note also that with free-space boundary conditions, $\Psi_{0,i}$ and $\Psi_{1,i}$ are equal to zero since there are no well-separated boxes to consider, and we can begin forming local expansions at level 2. The following is a formal description of the algorithm.

ALGORITHM.

Initialization
  Choose a level of refinement $n \approx \log_4 N$, a precision $\epsilon$, and set $p \approx -\log_2(\epsilon)$.

Upward Pass

Step 1
  Comment [Form multipole expansions of the potential field due to particles in each box about the box center at the finest mesh level.]
  do ibox = 1, ..., $4^n$
    Form a $p$-term multipole expansion $\Phi_{n,ibox}$, by using Theorem 2.1.
  enddo

Step 2
  Comment [Form multipole expansions about the centers of all boxes at all coarser mesh levels, each expansion representing the potential field due to all particles contained in one box.]
  do l = n - 1, ..., 0
    do ibox = 1, ..., $4^l$
      Form a $p$-term multipole expansion $\Phi_{l,ibox}$, by using Lemma 2.3 to shift the center of each child box's expansion to the current box center and adding them together.
    enddo
  enddo

Downward Pass
  Comment [In the downward pass, interactions are consistently computed at the coarsest possible level. For a given box, this is accomplished by including interactions with those boxes which are well separated and whose interactions have not been accounted for at the parent's level.]

Step 3
  Comment [Form a local expansion about the center of each box at each mesh level $l \le n - 1$. This local expansion describes the field due to all particles in the system that are not contained in the current box or its nearest neighbors. Once the local expansion is obtained for a given box, it is shifted, in the second inner loop, to the centers of the box's children, forming the initial expansion for the boxes at the next level.]
  Set $\tilde\Psi_{1,1} = \tilde\Psi_{1,2} = \tilde\Psi_{1,3} = \tilde\Psi_{1,4} = (0, 0, \ldots, 0)$
  do l = 1, ..., n - 1
    do ibox = 1, ..., $4^l$
      Form $\Psi_{l,ibox}$ by using Lemma 2.4 to convert the multipole expansion $\Phi_{l,j}$ of each box $j$ in the interaction list of box ibox to a local expansion about the center of box ibox, adding these local expansions together, and adding the result to $\tilde\Psi_{l,ibox}$.
    enddo
    do ibox = 1, ..., $4^l$
      Form the expansion $\tilde\Psi_{l+1,j}$ for ibox's children by using Lemma 2.5 to expand $\Psi_{l,ibox}$ about the children's box centers.
    enddo
  enddo

Step 4
  Comment [Compute interactions at finest mesh level.]
  do ibox = 1, ..., $4^n$
    Form $\Psi_{n,ibox}$ by using Lemma 2.4 to convert the multipole expansion $\Phi_{n,j}$ of each box $j$ in the interaction list of box ibox to a local expansion about the center of box ibox, adding these local expansions together, and adding the result to $\tilde\Psi_{n,ibox}$.
  enddo
  Comment [Local expansions at finest mesh level are now available. They can be used to generate the potential or force due to all particles outside the nearest neighbor boxes at finest mesh level.]

Step 5
  Comment [Evaluate local expansions at particle positions.]
  do ibox = 1, ..., $4^n$
    For every particle $p_j$ located at the point $z_j$ in box ibox, evaluate $\Psi_{n,ibox}(z_j)$.
  enddo

Step 6
  Comment [Compute potential (or force) due to nearest neighbors directly.]
  do ibox = 1, ..., $4^n$
    For every particle $p_j$ in box ibox, compute interactions with all other particles within the box and its nearest neighbors.
  enddo
Step 7
  do ibox = 1, ..., $4^n$
    For every particle in box ibox, add direct and far-field terms together.
  enddo

Remark. Each local expansion is described by the coefficients of a $p$-term polynomial. Direct evaluation of this polynomial at a point yields the potential. But, by Lemma 2.1, the force is immediately obtained from the derivative, which is available analytically. There is no need for numerical differentiation. Furthermore, due to the analyticity of $\Phi'$, there exist error bounds for the force of exactly the same form as (2.4), (2.11), and (2.15).

A brief analysis of the algorithmic complexity is given below.

| Step | Operation count | Explanation |
|---|---|---|
| 1 | order $Np$ | Each particle contributes to one expansion at the finest level. |
| 2 | order $Np^2$ | At the $l$th level, $4^l$ shifts involving order $p^2$ work per shift must be performed. |
| 3 | order $\le 28Np^2$ | There are at most 27 entries in the interaction list for each box at each level. An extra order $Np^2$ work is required for the second loop. |
| 4 | order $\le 27Np^2$ | Again, there are at most 27 entries in the interaction list for each box and $\approx N$ boxes. |
| 5 | order $Np$ | One $p$-term expansion is evaluated for each particle. |
| 6 | order $\tfrac{9}{2}Nk_n$ | Let $k_n$ be a bound on the number of particles per box at the finest mesh level. Interactions must be computed within the box and its eight nearest neighbors, but using Newton's third law, we need only compute half of the pairwise interactions. |
| 7 | order $N$ | Adding two terms for each particle. |

The estimate for the running time is therefore

$$N\left(-2a\log_2(\epsilon) + 56b(\log_2(\epsilon))^2 + 4.5dk_n + e\right),$$

with the constants $a$, $b$, $c$, $d$, and $e$ determined by the computer system, language, implementation, etc.

In addition to the asymptotic time complexity, asymptotic storage requirements are an important characteristic of a numerical procedure. The algorithm requires that $\Phi_{l,j}$ and $\Psi_{l,j}$ be stored, as well as the locations of the particles, their charges, and the results of the calculations (the potentials and/or electric fields). Since every box at every level has a pair of $p$-term expansions, $\Phi$ and $\Psi$, associated with it, and the lengths of all other storage arrays are proportional to $N$, it is easy to see that the asymptotic storage requirements of the algorithm are of the form

$$(a + bp)\cdot N$$

or

$$(a - b\log_2(\epsilon))\cdot N,$$

with the coefficients $a$ and $b$ determined, as above, by the computer system, language, implementation, etc. In our numerical experiments, the actual storage requirements were of the order

$$(25 - \log_2(\epsilon))\cdot N$$

single precision words.

Remark. It is clear that the operation count for Step 6 assumes a reasonably homogeneous distribution of particles. If the distribution were highly nonhomogeneous, then we would need to refine only those portions of space where the number of particles is large. Although its description is more involved, an adaptive version retains both the accuracy and the computational speed of the algorithm (see [3]).

4. BOUNDARY CONDITIONS

A variety of boundary conditions are used in particle simulations, including periodic boundary conditions, homogeneous Dirichlet or Neumann conditions, and several types of mixed conditions. The periodic case will be treated first in some detail. We then turn to the imposition of Dirichlet conditions and end with a brief discussion of the other cases.

4.1. Periodic Boundary Conditions

We begin by reconsidering the computational domain depicted in Fig. 5. At the end of the upward pass of the algorithm, we have a net multipole expansion

$$\Phi_{0,1}(z) = \sum_{k=1}^{p}\frac{a_k}{z^k} \tag{4.1}$$

for the entire computational box. This is then the expansion for each of the periodic images of the box with respect to its own center. All of these images except for the ones depicted in Fig. 3 are well separated from the computational box itself, and their induced fields are accurately representable by a $p$-term local expansion, where, as before, $p \approx -\log_2(\epsilon)$ is the number of terms needed to achieve a relative precision $\epsilon$. We assume that the periodic particle model
has no net charge and, therefore, that the local representation given by Lemma 2.4 can be written as

$$\Psi_{0,1} = \sum_{m=0}^{p} b_m\cdot z^m \tag{4.2}$$

with

$$b_m = \frac{1}{z_0^m}\sum_{k=1}^{p}\frac{a_k}{z_0^k}\binom{m+k-1}{k-1}(-1)^k \quad \text{with } m = 0, 1, \ldots, p, \tag{4.3}$$

with $z_0$ the center of the image box under consideration.

Remark. In certain problems (e.g., cosmology), the computational box obviously cannot satisfy the condition of no net charge (mass). This condition is necessary for the potential to be well defined, since the logarithmic term becomes unbounded as $z_0 \to \infty$. Force calculations, however, may still be carried out. Indeed, using the notation of the algorithm, $\Phi_{l,i}$, $\Psi_{l,i}$, $\tilde\Psi_{l,i}$ are expansions of analytic functions representing the potential, so that their derivatives are also analytic functions (with the same regions of analyticity). Moreover, it is clear from Theorem 2.1 that the derivatives $\Phi'_{l,i}$ are described by pure inverse power series. Therefore, the identical formal structure of the algorithm can, due to Lemma 2.1, be used to evaluate force fields everywhere, bypassing the difficulty introduced by the logarithmic term. The only change required is that the initial expansions computed be the derivatives of the multipole expansions and not the multipole expansions themselves.

Note now that well-separated images of the computational cell are boxes whose centers $z_0$ have integer real and imaginary parts, with $|\operatorname{Re}(z_0)| \ge 2$ or $|\operatorname{Im}(z_0)| \ge 2$. Let $S$ be the set of such centers. To account for the field due to all well-separated images, we form the coefficients for the local representation by adding the local shifted expansions of the form (4.3) for all $z_0 \in S$ to obtain

$$b_m^{\text{total}} = \sum_{k=1}^{p} a_k\binom{m+k-1}{k-1}(-1)^k\left(\sum_{S}\frac{1}{z_0^{m+k}}\right). \tag{4.4}$$

The summation over $S$ for each inverse power of $z_0$ can be precomputed and stored. For $(m + k) > 2$, the series is absolutely convergent. However, for $(m + k) \le 2$, the series is not absolutely convergent, and the computed value depends on the order of addition. Choosing a reasonable value for the sum of the series requires careful consideration of the physical model.

Suppose first that the only particle in the simulation is a charge of unit strength located at the origin. Then the periodic model corresponds to a uniform lattice of charges, and Newton's third law requires that the net force on each particle be zero. But the net force on the particle at the origin corresponds to the summation over $S$ of $1/z_0$, so that we set

$$\sum_{S}\frac{1}{z_0} = 0.$$

To determine a value for the second term,

$$\sum_{S}\frac{1}{z_0^2},$$

suppose that the only particle in the simulation is a dipole of strength one, oriented along the $x$-axis and located at the origin. Then the periodic model is again a uniform lattice and the difference in potential between the equivalent sites $(-1/2, 0)$ and $(1/2, 0)$ must be zero; i.e.,

$$\Phi(1/2, 0) - \Phi(-1/2, 0) \equiv d\Phi = 0. \tag{4.5}$$

The contribution to the potential difference, $d\Phi$, of a single dipole located at $z_0$ is

$$\frac{1}{z_0 - 1/2} - \frac{1}{z_0 + 1/2} = \frac{1}{z_0^2 - 1/4}.$$

Thus, we find that the potential difference due to the original dipole located at the origin is $-4$. For an image dipole located at $z_0$, with $|z_0| \ge 1$, we can expand the contribution to $d\Phi$ as

$$\frac{1}{z_0^2 - 1/4} = \frac{1}{z_0^2} + \frac{1}{4z_0^4 - z_0^2}.$$

Now let $S'$ be the set of the centers of all image boxes. That is, $S'$ is the set of all points $z_0$ with integer real and imaginary parts, excluding the origin. Then

$$d\Phi = -4 + \sum_{S'}\frac{1}{z_0^2} + \sum_{S'}\frac{1}{4z_0^4 - z_0^2}.$$

A somewhat involved calculation shows that

$$\sum_{S'}\frac{1}{4z_0^4 - z_0^2} = 4 - \pi.$$

Therefore, to satisfy (4.5), we set

$$\sum_{S'}\frac{1}{z_0^2} = \pi.$$
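The dependence on summation order can be seen numerically. The Python sketch below is only an illustration, not the paper's "somewhat involved calculation". Its helper names (`square_sum`, `row_first_sum`, `dipole_remainder`) and truncation sizes are our own, and the infinite lattice $S'$ is replaced by finite sets.

It computes two things:

- The absolutely convergent remainder $\sum_{S'} 1/(4z_0^4 - z_0^2)$, summed over a large square. This should come out close to $4 - \pi$.
- The conditionally convergent $\sum_{S'} 1/z_0^2$, in two orders:
  - Over squares: here the terms at $z_0$ and $iz_0$ cancel in pairs, so the sum is $0$.
  - Row by row along the dipole's $x$-axis first: this should tend to $\pi$.

Only the second ordering makes $d\Phi$ in (4.5) vanish, consistent with the value adopted above.

```python
import math


def square_sum(f, n):
    """Sum f(z0) over z0 = a + b*i with |a| <= n, |b| <= n, z0 != 0 (square truncation)."""
    total = 0j
    for a in range(-n, n + 1):
        for b in range(-n, n + 1):
            if a or b:
                total += f(complex(a, b))
    return total


def row_first_sum(f, n_rows, n_cols):
    """Sum f(z0) along each row (fixed Im z0 = b, |a| <= n_cols) first, then over rows |b| <= n_rows."""
    total = 0j
    for b in range(-n_rows, n_rows + 1):
        row = 0j
        for a in range(-n_cols, n_cols + 1):
            if a or b:
                row += f(complex(a, b))
        total += row
    return total


def inverse_square(z0):
    return 1 / z0**2


def dipole_remainder(z0):
    # 1/(z0^2 - 1/4) = 1/z0^2 + 1/(4 z0^4 - z0^2); this remainder decays like
    # |z0|^-4, so its lattice sum converges absolutely and the order is irrelevant.
    return 1 / (4 * z0**4 - z0**2)


remainder = square_sum(dipole_remainder, 200)        # expected to be close to 4 - pi
sum_square = square_sum(inverse_square, 200)         # terms at z0 and i*z0 cancel: 0 up to rounding
sum_rows = row_first_sum(inverse_square, 3, 50_000)  # approaches pi from below as n_cols grows

print("sum 1/(4z^4 - z^2):", remainder.real, " vs 4 - pi =", 4 - math.pi)
for label, s in (("square", sum_square), ("row-first", sum_rows)):
    d_phi = -4 + s + remainder  # the potential difference d(Phi) of (4.5)
    print(f"{label:>9}: sum 1/z^2 = {s.real:+.5f}, dPhi = {d_phi.real:+.5f}")
```

The row-first sum converges slowly. Each truncated row misses a tail of roughly $2/n_{\text{cols}}$, so with these sizes it should land only within about $10^{-3}$ of $\pi$. Rows far from the $x$-axis contribute exponentially little, which is why three rows on each side suffice. The square ordering should instead give $d\Phi \approx -\pi$.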
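Now

$$
\sum_{S'} \frac{1}{z_0^2} = \sum_{S} \frac{1}{z_0^2} + \sum_{S' \setminus S} \frac{1}{z_0^2},
$$

and the sum $\sum_{S' \setminus S} (1/z_0^2)$ over the eight nearest-neighbor centers is easily evaluated and found to be equal to zero. Therefore, we have

$$
\sum_{S} \frac{1}{z_0^2} = \pi,
$$

and the summation over $S$ for every inverse power of $z_0$ is defined.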
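Since $S$ is unchanged by $z_0 \mapsto -z_0$ and by $z_0 \mapsto i z_0$, each absolutely convergent sum $\sum_S z_0^{-n}$ ($n \ge 3$) equals $i^{-n}$ times itself and therefore vanishes unless $n$ is a multiple of 4; only $n = 2$ needs the convention just adopted. The Python sketch below precomputes the stored table of lattice sums under that convention. It is a minimal illustration, assuming a square truncation of $S$ at `M` (our parameter); the function names are hypothetical.

```python
import math


def well_separated_centers(M):
    """Centers z0 of the well-separated images (the set S): integer real and imaginary
    parts with max(|Re z0|, |Im z0|) >= 2, truncated here to |Re z0|, |Im z0| <= M."""
    return [complex(m, n)
            for n in range(-M, M + 1)
            for m in range(-M, M + 1)
            if max(abs(m), abs(n)) >= 2]


def nearest_neighbor_centers():
    """The eight image boxes adjacent to the computational cell (the set S' minus S)."""
    return [complex(m, n)
            for m in (-1, 0, 1)
            for n in (-1, 0, 1)
            if (m, n) != (0, 0)]


def lattice_sums(n_max, M=100):
    """Precomputed table {n: sum over S of z0**(-n)} for 2 <= n <= n_max.

    n = 2 is only conditionally convergent, so it is fixed by the convention of the
    text: the sum over S' is pi and the eight neighbours contribute zero.
    For n >= 3 the series converges absolutely; the square truncation at M is an
    assumption of this sketch (tail error roughly of order 1/M**2 for n >= 4).
    """
    centers = well_separated_centers(M)
    table = {2: complex(math.pi, 0.0)}
    for n in range(3, n_max + 1):
        table[n] = sum(z ** (-n) for z in centers)
    return table


if __name__ == "__main__":
    print("8-neighbour sum of 1/z^2:", sum(1 / z**2 for z in nearest_neighbor_centers()))
    for n, value in lattice_sums(12).items():
        print(n, value)
```

By the symmetry argument, the entries for odd $n$ and for $n \equiv 2 \pmod 4$ with $n \ge 6$ come out as zero up to rounding, so only $n = 2$ and the multiples of 4 carry information.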
FIG. 6. The computational cell centered at the origin is represented by C. The cell directly above C is the image of cell C reflected across the top boundary, with corresponding particles assigned charges of opposite sign. The cell C̃ is the image of cell C reflected across the left boundary, again with corresponding particles assigned charges of opposite sign. The cell C′ is the image of cell C reflected through the origin, with corresponding particles assigned charges of the same sign. Successive reflections across the four boundaries of the computational cell yield an infinite expansion of image boxes, as indicated in (d).

The procedure of converting the multipole expansion $\Phi_{0,1}$ of the whole computational cell into a local expansion $\Psi_{0,1}$, which describes the potential field due to all well-separated images, can be written in the notation of the algorithm as

$$
\Psi_{0,1} = T \cdot \Phi_{0,1},
$$

where $T$ is a constant $p \times p$ matrix whose entries are defined by the formula

$$
T_{m,k} = \binom{m+k-1}{k-1} (-1)^k \left( \sum_{S} \frac{1}{z_0^{m+k}} \right).
$$
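The matrix $T$ depends only on $p$ and the lattice sums, so it can be formed once and applied to $\Phi_{0,1}$ whenever the periodic local expansion is needed. A minimal Python sketch, assuming coefficients indexed $1, \dots, p$ to match the $p \times p$ size of $T$, a square truncation `M` for the absolutely convergent sums, and the convention $\sum_S 1/z_0^2 = \pi$; the function names are ours, not the paper's, and `math.comb` requires Python 3.8 or later.

```python
import math


def lattice_sums(n_max, M=100):
    """Sum over S of z0**(-n) for 2 <= n <= n_max, S truncated to |Re|, |Im| <= M.
    The conditionally convergent n = 2 entry is set to pi, following the text."""
    centers = [complex(m, n)
               for n in range(-M, M + 1)
               for m in range(-M, M + 1)
               if max(abs(m), abs(n)) >= 2]
    sums = {2: complex(math.pi, 0.0)}
    for n in range(3, n_max + 1):
        sums[n] = sum(z ** (-n) for z in centers)
    return sums


def periodic_translation_matrix(p, M=100):
    """Entries T[m-1][k-1] = C(m+k-1, k-1) * (-1)**k * sum_S z0**(-(m+k)), 1 <= m, k <= p."""
    sums = lattice_sums(2 * p, M)
    return [[math.comb(m + k - 1, k - 1) * (-1) ** k * sums[m + k]
             for k in range(1, p + 1)]
            for m in range(1, p + 1)]


def apply_T(T, a):
    """Local coefficients b_1..b_p of Psi_{0,1} = T . Phi_{0,1}, given the multipole
    coefficients a_1..a_p of the whole computational cell."""
    return [sum(t_mk * a_k for t_mk, a_k in zip(row, a)) for row in T]


if __name__ == "__main__":
    T = periodic_translation_matrix(4)
    b = apply_T(T, [1.0, 0.5, 0.25, 0.125])  # hypothetical multipole coefficients
    for m, b_m in enumerate(b, start=1):
        print(f"b_{m} = {b_m:.6f}")
```

For example, `apply_T(periodic_translation_matrix(3), [1.0, 0.0, 0.0])` returns the first column of $T$, whose leading entry is $T_{1,1} = -\pi$.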
The conversion $\Psi_{0,1} = T \cdot \Phi_{0,1}$ can be viewed as the first step in the downward pass of the algorithm for periodic boundary conditions. At this point, we have accounted for all interactions except those within the immediate neighbors of the computational box, as depicted in Fig. 3. But the expansions $\Phi_{l,i}$ for boxes inside the computational cell are also the expansions of the corresponding boxes inside the nearest-neighbor images of the computational cell. By adding the appropriate boxes to the interaction list, we maintain the formal structure of the algorithm and the associated computational complexity.

4.2. Dirichlet Boundary Conditions

We turn now to the imposition of homogeneous Dirichlet boundary conditions, namely

$$
\Phi(x, y) = 0 \quad \text{for } (x, y) \in \partial D,
$$

where $\partial D$ is the boundary of the computational domain. Analytically, this can be accomplished by the method of images, described in detail below. In general terms, we consider the potential field to be composed of two parts; that is,

$$
\Phi = \Phi_{\mathrm{sources}} + \Phi_{\mathrm{images}},
$$

where $\Phi_{\mathrm{sources}}$ is the field due to the particles inside the computational cell and $\Phi_{\mathrm{images}}$ is the field due to selected image charges located outside the computational cell. The image charge positions and strengths are chosen so that

$$
\Phi_{\mathrm{sources}}(x, y) = -\Phi_{\mathrm{images}}(x, y) \quad \text{for } (x, y) \in \partial D.
$$

For the computational domain we are considering, appropriate locations for the image charges can be determined by an iterative process, illustrated in Fig. 6. We first reflect each particle $p_i$ of charge strength $s_i$ in the computational cell across the top boundary line and place an image charge of strength $-s_i$ at that location, generating the image box directly above the computational cell (Fig. 6b). The set of image charges is denoted by $V_1$, and the field they induce is called $\Phi_{V_1}$. Adding $\Phi_{V_1}$ to $\Phi_{\mathrm{sources}}$ clearly enforces the desired condition along the top boundary. To impose the boundary condition along the bottom of the computational cell, we must reflect all charges (source and image) currently in the model across the bottom boundary, generating two more image boxes (which are copies of C and of its top image). The set of all image charges after this second reflection step is denoted by $V_2$. Now, while $\Phi_{\mathrm{sources}} + \Phi_{V_2}$ is equal to zero along the bottom boundary, the resulting field violates the top boundary condition. We therefore reflect again across the top boundary, creating two new image boxes and a new set of image charges $V_3$, such that $\Phi_{\mathrm{sources}} + \Phi_{V_3}$ satisfies the top condition but violates the bottom one. By
iterating in this manner, we generate a sequence of sets of image charges $\{V_i\}$ with

$$
V_1 \subset V_2 \subset V_3 \subset \cdots \subset V,
$$

where $V = \bigcup_{i=1}^{\infty} V_i$ is the set of charges contained in the infinite array of image boxes depicted in Fig. 6c. It is easy to see that the corresponding sequence of image fields $\{\Phi_{V_i}\}$ converges inside the computational cell and that the potential field $\Phi_{\mathrm{sources}} + \Phi_V$ does satisfy both the top and bottom boundary conditions.

In order to enforce the Dirichlet condition on the remaining two sides, we proceed analogously. First, we reflect all the charges currently in the model (the original sources plus the images in $V$) across the left boundary. This obviously does not affect the top and bottom conditions and enforces the homogeneous boundary condition along the left side of the computational cell. The current set of (all) image charges is now denoted $H_1$. Reflecting across the right boundary creates a new set $H_2$, with the field $\Phi_{\mathrm{sources}} + \Phi_{H_2}$ satisfying the Dirichlet condition along the right (but not the left) boundary. Repeated reflection across the left and right boundaries of the computational cell yields a sequence $\{H_i\}$ of infinite sets of image charges,

$$
H_1 \subset H_2 \subset H_3 \subset \cdots \subset H,
$$

where $H = \bigcup_{i=1}^{\infty} H_i$ is the set of charges contained in the two-dimensional family of image boxes depicted in Fig. 6d. It is easy to see that the sequence $\{\Phi_{H_i}\}$ converges inside the computational cell, and we denote its limit by $\Phi_H$. Finally, we observe that $\Phi_{\mathrm{sources}} + \Phi_H = 0$ on the entire boundary $\partial D$.

From a computational point of view, the rate of convergence of the method of images is quite unsatisfactory. In conjunction with our algorithm, however, this method can be turned into an extremely efficient numerical tool. In the terminology previously introduced, all of the image boxes except the nearest neighbors of the computational cell are well separated, and their induced fields can be represented by a single local expansion, denoted $\Psi_{0,1}$. Once the coefficients of this local expansion have been computed, we need only account for interactions within the nearest neighbors of the computational cell itself. To do this, as in the periodic case, we simply add the appropriate image boxes to the interaction lists of the boxes inside the computational cell.

Thus, it remains only to calculate $\Psi_{0,1}$. We first observe that the plane of images has a periodic structure with unit "supercell" centered at $(\tfrac{1}{2}, \tfrac{1}{2})$, indicated by thick lines in Fig. 6d. But then, by the method developed above for periodic problems, we can obtain an expansion about the point $(\tfrac{1}{2}, \tfrac{1}{2})$ which accounts for all interactions beyond the nearest neighbors of the supercell. This expansion can be converted, by using Lemma 2.5, into an expansion about the origin (the center of the computational cell), which we call $\tilde{\Psi}_{0,1}$. It remains to account for the well-separated boxes which are contained inside the supercell's nearest neighbors. There are exactly 27 of these boxes (the 36 unit boxes covered by the supercell and its eight neighbors, minus the 9 boxes formed by the computational cell and its nearest neighbors), and their multipole expansions can be shifted (by using Lemma 2.4) to local expansions about the origin, which are then added to $\tilde{\Psi}_{0,1}$ to form $\Psi_{0,1}$.

4.3. Other Boundary Conditions

While periodic or Dirichlet boundary conditions are called for in certain applications, in others Neumann or mixed conditions have to be imposed on the boundary of the computational domain. A typical example of a problem with mixed conditions is the computational cell with Neumann conditions on two opposing sides and Dirichlet conditions on the two others. Other models require periodic boundary conditions on the left and right sides of the computational cell and Dirichlet or Neumann conditions on the top and bottom. These conditions are imposed by a procedure essentially identical to the one described above. By reflection and/or periodic extension, one first generates an entire plane of images. The local expansion $\Psi_{0,1}$ is then computed by an appropriate summation over all well-separated image boxes, and the remaining image interactions are handled as above.

FIG. 7. 1600 randomly located charges in the computational cell.

5. NUMERICAL RESULTS

A computer program implementing the algorithm of this paper has been written; it is capable of handling free-space problems and problems with periodic, homogeneous Dirichlet, or homogeneous Neumann boundary conditions. For testing purposes, we randomly assigned charged particles to positions in the computational cell (Fig. 7), with charge strengths between 0 and 1, and with the numbers
of particles varying from 100 to 12,800. The calculations were performed on a VAX-8600, and the number of terms in the expansions $\Psi_{l,i}$, $\tilde{\Psi}_{l,i}$, $\Phi_{l,i}$ was set to 20, guaranteeing roughly 5-digit accuracy of the result. In each case, we performed the calculation in three ways: (1) via the algorithm of the present paper in single precision arithmetic; (2) directly (via formula (2.7)) in single precision arithmetic; and (3) via formula (2.7) in double precision arithmetic. The first two calculations were used to compare the speed and accuracy of our algorithm with those of the direct method. The direct evaluation of the field in double precision was used as the standard against which the relative accuracies of the first two computations were measured. In all cases, the calculation was performed for a periodic model, the periodic boundary condition being imposed by means of the algorithm described in Section 4 of this paper.

The results of these numerical experiments are summarized in Table I. The first column of the table contains the numbers $N$ of particles for which calculations have been performed. The second and third columns contain the CPU times $T_{\mathrm{alg}}$ that were required by the algorithm of the present paper to obtain the fields at all $N$ particles, and the greatest relative error $\delta_{\mathrm{alg}}$ obtained at any of the particles, respectively. Columns 4 and 5 contain the CPU times $T_{\mathrm{dir}}$ that were required by the direct algorithm (2.7) to obtain the fields at all $N$ particles, and the greatest relative error $\delta_{\mathrm{dir}}$ obtained at any one particle, respectively.

Remark. For the example involving 12,800 particles, the algorithm of the present paper required about one minute of CPU time (see Table I). However, it was not considered practical to use the direct algorithm to evaluate the field at all 12,800 points, since it would take about 5 h of CPU time without producing much useful information. Therefore, we used the direct algorithm to evaluate the field at only 100 of the 12,800 particles, both in single and double precision, and used the resulting data to estimate $\delta_{\mathrm{alg}}$ and $\delta_{\mathrm{dir}}$. The value of $T_{\mathrm{dir}}$ in this case was estimated by scaling.

TABLE I
Computational Results

| $N$ | $T_{\mathrm{alg}}$ (s) | $\delta_{\mathrm{alg}}$ | $T_{\mathrm{dir}}$ (s) | $\delta_{\mathrm{dir}}$ |
|---:|---:|---:|---:|---:|
| 100 | 0.6 | 1.1 × 10⁻⁵ | 1.1 | 1.9 × 10⁻⁵ |
| 200 | 1.4 | 4.1 × 10⁻⁵ | 4.5 | 3.2 × 10⁻⁵ |
| 400 | 2.0 | 3.6 × 10⁻⁵ | 18 | 6.6 × 10⁻⁵ |
| 800 | 3.8 | 4.6 × 10⁻⁵ | 69 | 7.3 × 10⁻⁵ |
| 1,600 | 6.6 | 1.4 × 10⁻⁵ | 272 | 7.0 × 10⁻⁵ |
| 3,200 | 16.5 | 0.9 × 10⁻⁵ | 1088 | 3.1 × 10⁻⁵ |
| 6,400 | 24.7 | 7.2 × 10⁻⁵ | 4480 | 6.8 × 10⁻⁵ |
| 12,800 | 60.9 | 3.0 × 10⁻⁵ | 17920 (est.) | 1.8 × 10⁻⁵ |

The following observations can be made from Table I:

1. The accuracy of the results produced by the algorithm is about the same as that predicted by the estimates (2.4), (2.11), and (2.15) for the number of terms used in the expansions $\Phi_{l,i}$, $\tilde{\Psi}_{l,i}$, $\Psi_{l,i}$. There is no evidence of accuracy problems due to truncation errors.
2. The calculation time grows linearly with the number of charges in the model, although somewhat erratically.
3. For as few as 1600 particles in the model, the computational effort required by the direct algorithm is roughly 40 times greater than that required by the algorithm of the present paper (272 s versus 6.6 s). For 12,800 particles, the effort is nearly 300 times greater (an estimated 17920 s versus 60.9 s).

Similar calculations have been performed for homogeneous Dirichlet and Neumann boundary conditions, and the observations made above for the periodic model are equally applicable in these cases.

For illustration, the equipotential lines for a box with 10 randomly distributed particles and Dirichlet boundary conditions are shown in Fig. 8. The entire calculation required 15 s of CPU time; about half the time was spent evaluating the field at more than 10,000 points, while the rest was used up by the plotting routine.

FIG. 8. The equipotential lines for the electrostatic field due to 10 randomly located charges in the computational cell, with homogeneous Dirichlet boundary conditions.

6. CONCLUSIONS

An algorithm has been constructed for the rapid evaluation of potential fields generated by ensembles of particles of the type encountered in plasma physics, molecular dynamics, fluid dynamics (the vortex method), and celestial mechanics. The algorithm is applicable both to dynamical simulations and to Monte Carlo simulations, provided that the fields to be evaluated are Coulombic in nature. The asymptotic CPU time estimate for the algorithm of the present paper is of the order $O(N)$, where $N$ is the number of particles in the simulation, and the numerical examples presented in Section 5 indicate that even very large-scale problems result in acceptable CPU time requirements. In the present paper, a two-dimensional version of the algorithm is described. Generalizing this result to three dimensions is fairly straightforward and will be reported at a later date.

ACKNOWLEDGMENTS

It is the authors' pleasure to thank Professor M. H. Schultz for drawing their attention to the subject of this paper and for his continuing interest and support.

REFERENCES

1. C. R. Anderson, J. Comput. Phys. 62, 111 (1986).
2. A. W. Appel, SIAM J. Sci. Stat. Comput. 6, 85 (1985).
3. J. Carrier, L. Greengard, and V. Rokhlin, A fast adaptive multipole algorithm for particle simulations, Technical Report 496, Yale Computer Science Department, 1986.
4. A. J. Chorin, J. Fluid Mech. 57, 785 (1973).
5. R. W. Hockney and J. W. Eastwood, Computer Simulation Using Particles (McGraw-Hill, New York, 1981).
6. G. Polya and G. Latta, Complex Variables (Wiley, New York, 1974).
7. V. Rokhlin, J. Comput. Phys. 60, 187 (1985).
