A Fast Algorithm For Particle Simulations Greengard Rokhlin
2. PHYSICAL AND MATHEMATICAL PRELIMINARIES

In this paper, we consider a two-dimensional physical model which consists of a set of $N$ charged particles with the potential and force obtained as the sum of pairwise interactions from Coulomb's law. Suppose that a point charge of unit strength is located at the point $(x_0, y_0) = \mathbf{x}_0 \in \mathbb{R}^2$. Then, for any $\mathbf{x} = (x, y) \in \mathbb{R}^2$ with $\mathbf{x} \neq \mathbf{x}_0$,

[...]

$$\phi_{z_0}(z) = q\log(z - z_0) = q\left(\log(z) - \sum_{k=1}^{\infty}\frac{1}{k}\left(\frac{z_0}{z}\right)^{k}\right). \tag{2.1}$$

Proof. Note first that $\log(z - z_0) - \log(z) = \log(1 - z_0/z)$ and that $|z_0/z| < 1$. The lemma now follows from the expansion

$$\log(1 - \omega) = (-1)\sum_{k=1}^{\infty}\frac{\omega^{k}}{k}.$$
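The series expansion used in the proof can be checked numerically. The following Python sketch compares $q\log(z - z_0)$ with the truncated series $q\left(\log z - \sum_{k=1}^{p}(z_0/z)^k/k\right)$ at a point with $|z| > |z_0|$. The function name `truncated_expansion`, the charge `q`, and the sample points are illustrative assumptions, not taken from the paper.

```python
import cmath

def truncated_expansion(q, z, z0, p):
    """q * (log(z) - sum_{k=1}^{p} (z0/z)^k / k), valid for |z| > |z0|."""
    w = z0 / z
    return q * (cmath.log(z) - sum(w ** k / k for k in range(1, p + 1)))

q = 1.5                 # hypothetical charge strength
z0 = 0.3 + 0.2j         # source location
z = 2.0 + 1.0j          # evaluation point, |z| > |z0|

exact = q * cmath.log(z - z0)
# Truncation errors for increasing numbers of terms p
errors = [abs(exact - truncated_expansion(q, z, z0, p)) for p in (5, 10, 20)]
for p, err in zip((5, 10, 20), errors):
    print(f"p = {p:2d}  error = {err:.3e}")
```

The error shrinks roughly like $|z_0/z|^{p+1}$, so each additional term gains a fixed factor of accuracy when the evaluation point is well away from the source.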
$$\phi(z) = Q\log(z) + \sum_{k=1}^{\infty}\frac{a_k}{z^{k}}, \tag{2.2}$$

where

$$Q = \sum_{i=1}^{m} q_i, \qquad a_k = \sum_{i=1}^{m}\frac{-q_i z_i^{k}}{k}. \tag{2.3}$$

Furthermore, for any $p \ge 1$,

$$\left|\phi(z) - Q\log(z) - \sum_{k=1}^{p}\frac{a_k}{z^{k}}\right| \le \alpha\left|\frac{r}{z}\right|^{p+1} \le \left(\frac{A}{c-1}\right)\left(\frac{1}{c}\right)^{p}, \tag{2.4}$$

where

$$c = \left|\frac{z}{r}\right|, \qquad A = \sum_{i=1}^{m}|q_i|, \qquad \text{and} \qquad \alpha = \frac{A}{1 - |r/z|}. \tag{2.5}$$

Proof. The form of the multipole expansion (2.2) is an immediate consequence of the preceding lemma and the fact that $\phi(z) = \sum_{i=1}^{m}\phi_{z_i}(z)$. To obtain the error bound (2.4), observe that

$$\left|\phi(z) - Q\log(z) - \sum_{k=1}^{p}\frac{a_k}{z^{k}}\right| = \left|\sum_{k=p+1}^{\infty}\frac{a_k}{z^{k}}\right|.$$

Substituting for $a_k$ the expression in (2.3), we have

$$\left|\sum_{k=p+1}^{\infty}\frac{a_k}{z^{k}}\right| \le A\sum_{k=p+1}^{\infty}\frac{r^{k}}{k|z|^{k}} \le A\sum_{k=p+1}^{\infty}\left|\frac{r}{z}\right|^{k} = \alpha\left|\frac{r}{z}\right|^{p+1} = \left(\frac{A}{c-1}\right)\left(\frac{1}{c}\right)^{p}.$$

In particular, if $c \ge 2$, then

$$\left|\phi(z) - Q\log(z) - \sum_{k=1}^{p}\frac{a_k}{z^{k}}\right| \le A\left(\frac{1}{2}\right)^{p}. \tag{2.6}$$

Finally, we demonstrate, with a simple example, how multipole expansions can be used to speed up calculations with potential fields. Suppose that charges of strengths $q_1, q_2, \ldots, q_m$ are located at the points $x_1, x_2, \ldots, x_m \in \mathbb{C}$ and that $\{y_1, y_2, \ldots, y_n\}$ is another set of points in $\mathbb{C}$ (Fig. 1). We say that the sets $\{x_i\}$ and $\{y_j\}$ are well separated if there exist points $x_0, y_0 \in \mathbb{C}$ and a real $r > 0$ such that

$$\begin{aligned}
|x_i - x_0| &< r \quad \text{for all } i = 1, \ldots, m,\\
|y_j - y_0| &< r \quad \text{for all } j = 1, \ldots, n,\\
|x_0 - y_0| &> 3r.
\end{aligned}$$

In order to obtain the potential (or force) at the points $\{y_j\}$ due to the charges at the points $\{x_i\}$ directly, we could compute

$$\sum_{i=1}^{m}\phi_{x_i}(y_j) \quad \text{for all } j = 1, \ldots, n. \tag{2.7}$$

This clearly requires order $nm$ work (evaluating $m$ fields at $n$ points). Now suppose that we first compute the coefficients of a $p$-term multipole expansion of the potential due to the charges $q_1, q_2, \ldots, q_m$ about $x_0$, using Theorem 2.1. This requires a number of operations proportional to $mp$. Evaluating the resulting multipole expansion at all points $y_j$ requires order $np$ work, and the total amount of computation is of the order $O(mp + np)$. Moreover, by (2.6),

$$\left|\sum_{i=1}^{m}\phi_{x_i}(y_j) - Q\log(y_j - x_0) - \sum_{k=1}^{p}\frac{a_k}{(y_j - x_0)^{k}}\right| \le A\left(\frac{1}{2}\right)^{p},$$

and in order to obtain a relative precision $\varepsilon$ (with respect to the total charge), $p$ must be of the order $-\log_2(\varepsilon)$. Once the precision is specified, the amount of computation has been reduced to

$$O(m) + O(n),$$

which is significantly smaller than $nm$ for large $n$ and $m$.

2.1. Translation Operators and Error Bounds

The following three lemmas constitute the principal analytical tool of this paper, allowing us to manipulate multipole expansions in the manner required by the fast algorithm. Lemma 2.3 provides a formula for shifting the
ALGORITHM FOR PARTICLE SIMULATION 283
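$$\phi(z) = a_0\log(z - z_0) + \sum_{k=1}^{\infty}\frac{a_k}{(z - z_0)^{k}} \tag{2.8}$$

[...]

$$\phi(z) = a_0\log(z) + \sum_{l=1}^{\infty}\frac{b_l}{z^{l}}, \tag{2.9}$$

where

$$b_l = \left(\sum_{k=1}^{l} a_k z_0^{\,l-k}\binom{l-1}{k-1}\right) - \frac{a_0 z_0^{\,l}}{l}, \tag{2.10}$$

with $\binom{l}{k}$ the binomial coefficients. Furthermore, for any $p \ge 1$,

$$\left|\phi(z) - a_0\log(z) - \sum_{l=1}^{p}\frac{b_l}{z^{l}}\right| \le \left(\frac{A}{1 - \left|\dfrac{|z_0| + R}{z}\right|}\right)\left|\frac{|z_0| + R}{z}\right|^{p+1}. \tag{2.11}$$

FIG. 2. Source charges $q_1, q_2, \ldots, q_m$ are contained in the circle $D_1$. The corresponding multipole expansion about $z_0$ converges inside $D_2$. $C$ is a circle of radius $s$, with $s > R$.

LEMMA 2.4. Suppose that $m$ charges of strengths $q_1, q_2, \ldots, q_m$ are located inside the circle $D_1$ with radius $R$ and center at $z_0$, and that $|z_0| > (c + 1)R$ with $c > 1$ (Fig. 2). Then the corresponding multipole expansion (2.8) converges inside the circle $D_2$ of radius $R$ centered about the origin. Inside $D_2$, the potential due to the charges is described by a power series,

$$\phi(z) = \sum_{l=0}^{\infty} b_l \cdot z^{l}, \tag{2.12}$$

where

$$b_0 = \sum_{k=1}^{\infty}\frac{a_k}{z_0^{k}}(-1)^{k} + a_0\log(-z_0), \tag{2.13}$$

and

$$b_l = \frac{1}{z_0^{l}}\sum_{k=1}^{\infty}\frac{a_k}{z_0^{k}}\binom{l+k-1}{k-1}(-1)^{k} - \frac{a_0}{l \cdot z_0^{l}} \quad \text{for } l \ge 1. \tag{2.14}$$

Furthermore, for any $p \ge \max(2, 2c/(c-1))$, an error bound for the truncated series is given by

$$\left|\phi(z) - \sum_{l=0}^{p} b_l \cdot z^{l}\right| < \frac{A\left(4e(p + c)(c + 1) + c^{2}\right)}{c(c - 1)}\left(\frac{1}{c}\right)^{p+1}, \tag{2.15}$$

where $A$ is defined in (2.5) and $e$ is the base of natural logarithms.

[...]

Writing $b_l = \gamma_l + \beta_l$, where $\gamma_l = -a_0/(l \cdot z_0^{l})$ is the contribution of the $a_0$ term in (2.14), we have

$$\left|\phi(z) - \sum_{l=0}^{p} b_l \cdot z^{l}\right| = \left|\sum_{l=p+1}^{\infty} b_l \cdot z^{l}\right| \le S_1 + S_2 \tag{2.16}$$

with

$$S_1 = \left|\sum_{l=p+1}^{\infty}\gamma_l \cdot z^{l}\right|, \qquad S_2 = \left|\sum_{l=p+1}^{\infty}\beta_l \cdot z^{l}\right|.$$

For $S_1$, since $|z| < R$ and $|z_0| > (c + 1)R$,

$$S_1 = \left|\sum_{l=p+1}^{\infty}\gamma_l z^{l}\right| \le |a_0|\sum_{l=p+1}^{\infty}\left|\frac{z^{l}}{l \cdot z_0^{l}}\right| \le A\sum_{l=p+1}^{\infty}\left|\frac{z^{l}}{l \cdot z_0^{l}}\right| \le A\sum_{l=p+1}^{\infty}\left(\frac{1}{c+1}\right)^{l} < A\sum_{l=p+1}^{\infty}\left(\frac{1}{c}\right)^{l} = \left(\frac{A}{c-1}\right)\left(\frac{1}{c}\right)^{p}.$$

To bound $S_2$, let $C$ be the circle of radius $s$ centered at the origin, where $s = cR((p - 1)/p)$ (Fig. 2). Note first that for any $p \ge 2c/(c - 1)$,

$$R < \frac{cR + R}{2} < s < cR.$$

Defining the function $\phi_1 : \mathbb{C} \setminus D_1 \to \mathbb{C}$ by the expression

$$\phi_1(z) = \phi(z) - a_0 \cdot \log(z - z_0),$$

and using Taylor's theorem for complex analytic functions (see [6, p. 190]), we obtain

$$S_2 = \left|\phi_1(z) - \sum_{l=0}^{p}\beta_l \cdot z^{l}\right| = \left|\sum_{l=p+1}^{\infty}\beta_l \cdot z^{l}\right| \le \frac{M}{1 - |z|/s}\left(\frac{|z|}{s}\right)^{p+1}, \qquad M = \max_{t \in C}|\phi_1(t)|.$$

Obviously, for any $t$ lying on $C$,

[...]

After some algebraic manipulation, we have

$$M \le A\left(\frac{pR + cR}{cR}\right), \qquad 1 - \frac{|z|}{s} \ge \frac{cR - R}{cR + R}.$$

Observing that for any positive integer $n$ and any integer $p \ge 2$,

$$\left(1 + \frac{1}{n}\right)^{n} \le e \qquad \text{and} \qquad \left(1 + \frac{1}{p-1}\right)^{2} \le 4,$$

we obtain

$$S_2 \le \frac{A(pR + cR)(cR + R)}{cR(cR - R)}\left(\frac{|z|}{cR}\right)^{p+1}\left(\frac{p}{p-1}\right)^{p+1} \le \frac{4Ae(p + c)(c + 1)}{c(c - 1)}\left(\frac{1}{c}\right)^{p+1}.$$

Adding the last expression to the error bound for $S_1$ completes the proof.

The following lemma is an immediate consequence of Maclaurin's theorem. It describes an exact translation operation with a finite number of terms, and no error bound is needed.

LEMMA 2.5. For any complex $z_0$, $z$, and $\{a_k\}$, $k = 0, 1, 2, \ldots, n$,

$$\sum_{k=0}^{n} a_k (z - z_0)^{k} = \sum_{l=0}^{n}\left(\sum_{k=l}^{n} a_k\binom{k}{l}(-z_0)^{k-l}\right) z^{l}. \tag{2.17}$$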
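The three translation operators can be exercised numerically. The Python sketch below forms a multipole expansion about a center $z_0$, shifts it to the origin (Lemma 2.3), converts it to a power series about the origin (Lemma 2.4), and re-centres that polynomial (Lemma 2.5), comparing each result against direct summation of $q_i\log(z - z_i)$. All function names, the test charges, and the truncation of the inner $k$-sums at $p$ are assumptions made for illustration; the multipole-to-local coefficients use the standard form $b_0 = a_0\log(-z_0) + \sum_k a_k(-1)^k/z_0^k$ and $b_l = z_0^{-l}\sum_k a_k z_0^{-k}\binom{l+k-1}{k-1}(-1)^k - a_0/(l z_0^l)$.

```python
import cmath
from math import comb

def multipole_coeffs(charges, points, center, p):
    """[a_0..a_p] about `center`: a_0 = sum q_i, a_k = sum -q_i (z_i - center)^k / k."""
    a = [complex(sum(charges))]
    for k in range(1, p + 1):
        a.append(sum(-q * (z - center) ** k / k for q, z in zip(charges, points)))
    return a

def eval_multipole(a, center, z):
    """a_0 log(z - center) + sum_{k>=1} a_k / (z - center)^k."""
    u = z - center
    return a[0] * cmath.log(u) + sum(a[k] / u ** k for k in range(1, len(a)))

def shift_multipole(a, z0):
    """Lemma 2.3: multipole expansion about z0 -> multipole expansion about 0."""
    b = [a[0]]
    for l in range(1, len(a)):
        s = sum(a[k] * z0 ** (l - k) * comb(l - 1, k - 1) for k in range(1, l + 1))
        b.append(s - a[0] * z0 ** l / l)
    return b

def multipole_to_local(a, z0):
    """Lemma 2.4: multipole expansion about z0 -> power series about 0 (k-sum cut at p)."""
    p = len(a) - 1
    b = [a[0] * cmath.log(-z0) + sum(a[k] * (-1) ** k / z0 ** k for k in range(1, p + 1))]
    for l in range(1, p + 1):
        s = sum(a[k] * (-1) ** k * comb(l + k - 1, k - 1) / z0 ** k for k in range(1, p + 1))
        b.append(s / z0 ** l - a[0] / (l * z0 ** l))
    return b

def shift_local(a, z0):
    """Lemma 2.5: sum_k a_k (z - z0)^k -> coefficients of sum_l c_l z^l (exact)."""
    n = len(a) - 1
    return [sum(a[k] * comb(k, l) * (-z0) ** (k - l) for k in range(l, n + 1))
            for l in range(n + 1)]

def eval_power_series(b, z):
    return sum(c * z ** l for l, c in enumerate(b))

# Hypothetical test data: three charges within R = 0.5 of z0 = 4 + 3i.
charges = [1.0, -0.5, 2.0]
z0 = 4 + 3j
points = [z0 + 0.3, z0 - 0.2 + 0.1j, z0 + 0.1 - 0.4j]
p = 30

def direct(z):
    return sum(q * cmath.log(z - zi) for q, zi in zip(charges, points))

a = multipole_coeffs(charges, points, z0, p)

far = 10 + 8j                      # |far| > |z0| + R, so the shifted expansion converges
err_shift = abs(eval_multipole(shift_multipole(a, z0), 0, far) - direct(far))

near = 0.2 + 0.1j                  # inside the circle of radius R about the origin
local = multipole_to_local(a, z0)
err_local = abs(eval_power_series(local, near) - direct(near))

child = 0.25 + 0.25j               # new expansion center, e.g. a child box center
recentred = shift_local(local, -child)   # coefficients in powers of (z - child)
err_child = abs(eval_power_series(recentred, near - child)
                - eval_power_series(local, near))
print(err_shift, err_local, err_child)
```

Passing `-child` to `shift_local` reuses (2.17) in the direction needed by the downward pass: a polynomial in $z$ is rewritten as a polynomial in $z - \text{child}$. The evaluation points were chosen away from the branch cut of the complex logarithm so that direct and expanded values can be compared as complex numbers.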
to shift (for all $i$) the expansion $\Psi_{l-1,i}$ to each of box $i$'s children, we have, for each box $j$ at level $l$, a local representation of the potential due to all particles outside of $j$'s parent's neighbors, namely $\tilde{\Psi}_{l,j}$. The interaction list is, therefore, precisely that set of boxes whose contribution to the potential must be added to $\tilde{\Psi}_{l,j}$ in order to create $\Psi_{l,j}$. This is done by using Lemma 2.4 to convert the multipole expansions of these interaction boxes to local expansions about the current box center and adding them to the expansion obtained from the parent. Note also that with free-space boundary conditions, $\Psi_{0,i}$ and $\Psi_{1,i}$ are equal to zero since there are no well-separated boxes to consider, and we can begin forming local expansions at level 2. The following is a formal description of the algorithm.

ALGORITHM.

Initialization
    Choose a level of refinement $n \approx \log_4 N$, a precision $\varepsilon$, and set $p \approx -\log_2(\varepsilon)$.

Upward Pass

Step 1
    Comment [Form multipole expansions of potential field due to particles in each box about the box center at the finest mesh level.]
    do ibox = 1, ..., $4^n$
        Form a $p$-term multipole expansion $\Phi_{n,\text{ibox}}$, by using Theorem 2.1.
    enddo

Step 2
    Comment [Form multipole expansions about the centers of all boxes at all coarser mesh levels, each expansion representing the potential field due to all particles contained in one box.]
    do $l$ = $n - 1$, ..., 0
        do ibox = 1, ..., $4^l$
            Form a $p$-term multipole expansion $\Phi_{l,\text{ibox}}$, by using Lemma 2.3 to shift the center of each child box's expansion to the current box center and adding them together.
        enddo
    enddo

Downward Pass
    Comment [In the downward pass, interactions are consistently computed at the coarsest possible level. For a given box, this is accomplished by including interactions with those boxes which are well separated and whose interactions have not been accounted for at the parent's level.]

Step 3
    Comment [Form a local expansion about the center of each box at each mesh level $l \le n - 1$. This local expansion describes the field due to all particles in the system that are not contained in the current box or its nearest neighbors. Once the local expansion is obtained for a given box, it is shifted, in the second inner loop, to the centers of the box's children, forming the initial expansion for the boxes at the next level.]
    Set $\tilde{\Psi}_{1,1} = \tilde{\Psi}_{1,2} = \tilde{\Psi}_{1,3} = \tilde{\Psi}_{1,4} = (0, 0, \ldots, 0)$
    do $l$ = 1, ..., $n - 1$
        do ibox = 1, ..., $4^l$
            Form $\Psi_{l,\text{ibox}}$ by using Lemma 2.4 to convert the multipole expansion $\Phi_{l,j}$ of each box $j$ in the interaction list of box ibox to a local expansion about the center of box ibox, adding these local expansions together, and adding the result to $\tilde{\Psi}_{l,\text{ibox}}$.
        enddo
        do ibox = 1, ..., $4^l$
            Form the expansion $\tilde{\Psi}_{l+1,j}$ for ibox's children by using Lemma 2.5 to expand $\Psi_{l,\text{ibox}}$ about the children's box centers.
        enddo
    enddo

Step 4
    Comment [Compute interactions at finest mesh level.]
    do ibox = 1, ..., $4^n$
        Form $\Psi_{n,\text{ibox}}$ by using Lemma 2.4 to convert the multipole expansion $\Phi_{n,j}$ of each box $j$ in the interaction list of box ibox to a local expansion about the center of box ibox, adding these local expansions together, and adding the result to $\tilde{\Psi}_{n,\text{ibox}}$.
    enddo
    Comment [Local expansions at finest mesh level are now available. They can be used to generate the potential or force due to all particles outside the nearest neighbor boxes at finest mesh level.]

Step 5
    Comment [Evaluate local expansions at particle positions.]
    do ibox = 1, ..., $4^n$
        For every particle $p_j$ located at the point $z_j$ in box ibox, evaluate $\Psi_{n,\text{ibox}}(z_j)$.
    enddo

Step 6
    Comment [Compute potential (or force) due to nearest neighbors directly.]
    do ibox = 1, ..., $4^n$
        For every particle $p_j$ in box ibox, compute interactions with all other particles within the box and its nearest neighbors.
    enddo
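The interaction lists used in Steps 3 and 4 can be generated from box indices alone. The sketch below assumes the usual definition (children of the parent's nearest neighbors that are well separated from the box, i.e. not adjacent to it) and indexes the boxes at level $l$ by integer pairs `(ix, iy)` on a $2^l \times 2^l$ grid rather than by a single `ibox`; both the indexing and the function names are illustrative assumptions.

```python
def neighbors(level, ix, iy):
    """Boxes at `level` that share a boundary point with (ix, iy), excluding the box itself."""
    n = 2 ** level
    return [(jx, jy)
            for jx in range(max(ix - 1, 0), min(ix + 2, n))
            for jy in range(max(iy - 1, 0), min(iy + 2, n))
            if (jx, jy) != (ix, iy)]

def interaction_list(level, ix, iy):
    """Children of the parent's neighbors (and of the parent) that are not adjacent to (ix, iy)."""
    if level == 0:
        return []
    px, py = ix // 2, iy // 2                      # parent box at level - 1
    candidates = neighbors(level - 1, px, py) + [(px, py)]
    result = []
    for qx, qy in candidates:
        for cx in (2 * qx, 2 * qx + 1):            # the four children of (qx, qy)
            for cy in (2 * qy, 2 * qy + 1):
                # Chebyshev distance > 1 means the boxes do not touch: well separated
                if max(abs(cx - ix), abs(cy - iy)) > 1:
                    result.append((cx, cy))
    return result

# With free-space boundaries there is nothing well separated at levels 0 and 1.
sizes_level1 = [len(interaction_list(1, ix, iy)) for ix in range(2) for iy in range(2)]
corner_level2 = len(interaction_list(2, 0, 0))
interior_level3 = len(interaction_list(3, 3, 3))
largest_level4 = max(len(interaction_list(4, ix, iy))
                     for ix in range(16) for iy in range(16))
print(sizes_level1, corner_level2, interior_level3, largest_level4)
```

Under these assumptions the lists are empty at level 1, which matches the remark that local expansions can first be formed at level 2, and no box has more than 27 entries, so the work per box in Steps 3 and 4 is bounded independently of $N$.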
has no net charge and, therefore, that the local representation given by Lemma 2.4 can be written as

$$\Psi_{0,1} = \sum_{m=1}^{p} b_m \cdot z^{m} \tag{4.2}$$

with

$$b_m = \frac{1}{z_0^{m}}\sum_{k=1}^{p}\frac{a_k}{z_0^{k}}\binom{m+k-1}{k-1}(-1)^{k} \quad \text{with } m = 0, 1, \ldots, p. \tag{4.3}$$

[...]

$$b_m^{\text{total}} = \sum_{k=1}^{p} a_k\binom{m+k-1}{k-1}(-1)^{k}\left(\sum_{S}\frac{1}{z_0^{m+k}}\right). \tag{4.4}$$

The summation over $S$ for each inverse power of $z_0$ can be precomputed and stored. For $(m + k) > 2$, the series is absolutely convergent. However, for $(m + k) \le 2$, the series is not absolutely convergent, and the computed value depends on the order of addition. Choosing a reasonable value for the sum of the series requires careful consideration of the physical model.

Suppose first that the only particle in the simulation is a charge of unit strength located at the origin. Then the periodic model corresponds to a uniform lattice of charges, and Newton's third law requires that the net force on each particle be zero. But the net force on the particle at the origin corresponds to the summation over $S$ of $1/z_0$, so that we set

$$\sum_{S}\frac{1}{z_0} = 0.$$

To determine a value for the second term,

$$\sum_{S}\frac{1}{z_0^{2}},$$

[...]

$$\delta\Phi = -4 + \sum_{S'}\frac{1}{z_0^{2}} + \sum_{S'}\frac{1}{4z_0^{4} - z_0^{2}}.$$

A somewhat involved calculation shows that

$$\sum_{S'}\frac{1}{4z_0^{4} - z_0^{2}} = 4 - \pi.$$

Therefore, to satisfy (4.5), we set

$$\sum_{S'}\frac{1}{z_0^{2}} = \pi.$$
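The value $4 - \pi$ quoted for the absolutely convergent sum can be checked by brute force. The sketch below assumes that $S'$ is the set of all nonzero points $z_0 = x + iy$ of the unit square lattice (an assumption, since the definition of $S'$ is not reproduced here) and sums $1/(4z_0^4 - z_0^2)$ over a large square; the function name `lattice_sum` and the cutoff are illustrative.

```python
import math

def lattice_sum(n_max):
    """Sum of 1 / (4 z^4 - z^2) over nonzero Gaussian integers with |x|, |y| <= n_max."""
    total = 0j
    for x in range(-n_max, n_max + 1):
        for y in range(-n_max, n_max + 1):
            if x == 0 and y == 0:
                continue                      # the origin is excluded from the sum
            z2 = complex(x, y) ** 2
            total += 1 / (4 * z2 * z2 - z2)
    return total

s = lattice_sum(100)
print(s.real, 4 - math.pi)   # the two values should agree to several digits
```

Because the terms decay like $1/|z_0|^4$, the truncated sum converges quickly, in contrast to $\sum 1/z_0^2$, whose value depends on the order of summation and therefore has to be fixed by the physical argument in the text.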
Now

$$\sum_{S'}\frac{1}{z_0^{2}} = \sum_{S}\frac{1}{z_0^{2}} + \sum_{S' \setminus S}\frac{1}{z_0^{2}},$$

[...]

$$\sum_{S}\frac{1}{z_0^{2}} = \pi,$$

[...]

$$T_{m,k} = \binom{m+k-1}{k-1}(-1)^{k}\left(\sum_{S}\frac{1}{z_0^{m+k}}\right).$$

This can be viewed as the first step in the downward pass of the algorithm for periodic boundary conditions. At this point, we have accounted for all interactions excluding the ones within the immediate neighbors of the computational box as depicted in Fig. 3. But the expansions $\Phi_{l,i}$ for boxes inside the computational cell are also the expansions of the corresponding boxes inside the nearest neighbor images of the computational cell. By adding to the interaction list the appropriate boxes, we maintain the formal structure of the algorithm and the associated computational complexity.

4.2. Dirichlet Boundary Conditions

We turn now to the imposition of homogeneous Dirichlet boundary conditions, namely

$$\Phi(x, y) = 0 \quad \text{for } (x, y) \in \partial D,$$

where $\partial D$ is the boundary of the computational domain. Analytically speaking, this can be accomplished by the method of images, described in detail below. In general terms, we consider the potential field to be composed of two parts; that is,

$$\Phi = \Phi_{\text{sources}} + \Phi_{\text{images}},$$

where $\Phi_{\text{sources}}$ is the field due to the particles inside the computational cell and $\Phi_{\text{images}}$ is the field due to selected image charges located outside the computational cell. The image charge positions and strengths are chosen so that

$$\Phi_{\text{sources}}(x, y) = -\Phi_{\text{images}}(x, y) \quad \text{for } (x, y) \in \partial D.$$

For the computational domain we are considering, appropriate locations for the image charges can be determined by an iterative process, illustrated in Fig. 6. We first reflect each particle $p_i$ of charge strength $\sigma_i$ in the computational cell across the top boundary line and place an image charge of strength $-\sigma_i$ at that location, generating an image box which we denote $\bar{C}$ (Fig. 6b). The set of image charges is denoted by $V_1$, and the field they induce is called $\Phi_{V_1}$. Adding $\Phi_{V_1}$ to $\Phi_{\text{sources}}$ clearly enforces the desired condition along the top boundary. To impose the boundary condition along the bottom of the computational cell, we must reflect all charges (source and image) currently in the model across the bottom boundary, generating two more image boxes (which are copies of $C$ and $\bar{C}$). The set of all image charges after this second reflection step is denoted by $V_2$. Now, while $\Phi_{\text{sources}} + \Phi_{V_2}$ is equal to zero along the bottom boundary, the resulting field violates the top boundary condition. We therefore reflect again across the top boundary, creating two new image boxes and a new set of image charges $V_3$, such that $\Phi_{\text{sources}} + \Phi_{V_3}$ satisfies the top condition but violates the bottom one. By
iterating in this manner, we generate a sequence of sets of image charges $\{V_i\}$ with

$$V_1 \subset V_2 \subset V_3 \subset \cdots \subset V,$$

where $V = \bigcup_{i=1}^{\infty} V_i$ is the set of charges contained in the infinite array of image boxes depicted in Fig. 6c. It is easy to see that the corresponding sequence of image fields $\{\Phi_{V_i}\}$ converges inside the computational cell and that the potential field $\Phi_{\text{sources}} + \Phi_{V}$ does satisfy both the top and bottom boundary conditions.

In order to enforce the Dirichlet condition on the remaining two sides, we proceed analogously. First, we reflect all the charges currently in the model (the original sources plus the images in $V$) across the left boundary. This obviously does not affect the top and bottom conditions and enforces the homogeneous boundary condition along the left side of the computational cell. The current set of (all) image charges is now denoted $H_1$. Reflecting across the right boundary creates a new set $H_2$, with the field $\Phi_{\text{sources}} + \Phi_{H_2}$ satisfying the Dirichlet condition along the right (but not the left) boundary. Repeated reflection across the left and right boundaries of the computational cell yields a sequence $\{H_i\}$ of infinite sets of image charges,

$$H_1 \subset H_2 \subset H_3 \subset \cdots \subset H,$$

where $H = \bigcup_{i=1}^{\infty} H_i$ is the set of charges contained in the two-dimensional family of image boxes depicted in Fig. 6d. It is easy to see that the sequence $\{\Phi_{H_i}\}$ converges inside the computational cell, and we denote its limit by $\Phi_{H}$. Finally, we observe that $\Phi_{\text{sources}} + \Phi_{H} = 0$ on the entire boundary $\partial D$.

From a computational point of view, the rate of convergence of the method of images is quite unsatisfactory. In conjunction with our algorithm, however, this method can be turned into an extremely efficient numerical tool. In the terminology previously introduced, all of the image boxes except the nearest neighbors of the computational cell are well separated and their induced fields can be represented by a single local expansion, denoted $\Psi_{0,1}$. Once the coefficients of this local expansion have been computed, we need only account for interactions within the nearest neighbors of the computational cell itself. To do this, as in the periodic case, we simply add the appropriate image boxes to the interaction lists of the boxes inside the computational cell.

Thus, it remains only to calculate $\Psi_{0,1}$. We first observe that the plane of images has a periodic structure with unit "supercell" centered at $(\tfrac{1}{2}, \tfrac{1}{2})$, indicated by thick lines in Fig. 6d. But then, by the method developed above for periodic problems, we can obtain an expansion about the point $(\tfrac{1}{2}, \tfrac{1}{2})$ which accounts for all interactions beyond the nearest neighbors of the supercell. This expansion can be converted, by using Lemma 2.5, into an expansion about the origin (the center of the computational cell), which we call $\tilde{\Psi}_{0,1}$. It remains to account for the well-separated boxes which are contained inside the supercell's nearest neighbors. There are exactly 27 of these boxes, and their multipole expansions can be shifted (by using Lemma 2.4) to local expansions about the origin which are then added to $\tilde{\Psi}_{0,1}$ to finally form $\Psi_{0,1}$.

4.3. Other Boundary Conditions

While in certain applications, periodic or Dirichlet boundary conditions are called for, in others, Neumann or mixed conditions have to be imposed on the boundary of the computational domain. A typical example of a problem with mixed conditions is the computational cell with Neumann conditions on two opposing sides and Dirichlet conditions on the two others. Other models require periodic boundary conditions on the left and right sides of the computational cell and Dirichlet or Neumann conditions on the top and bottom. The imposition of these conditions is achieved by a procedure essentially identical to the one described above. By reflection and/or periodic extension, one first generates an entire plane of images. The local expansion $\Psi_{0,1}$ is then computed by an appropriate summation over all well-separated image boxes, and the remaining image interactions are handled as above.

5. NUMERICAL RESULTS

A computer program has been implemented utilizing the algorithm of this paper and capable of handling free-space problems and problems with periodic, homogeneous Dirichlet or homogeneous Neumann boundary conditions. For testing purposes, we randomly assigned charged particles to positions in the computational cell (Fig. 7), with charge strengths between 0 and 1, and with the numbers

FIG. 7. 1600 randomly located charges in the computational cell.
very large-scale problems result in acceptable CPU time requirements. In the present paper, a two-dimensional version of the algorithm is described. Generalizing this result to three dimensions is fairly straightforward and will be

2. A. W. Appel, SIAM J. Sci. Stat. Comput. 6, 85 (1985).
3. J. Carrier, L. Greengard, and V. Rokhlin, A fast adaptive multipole algorithm for particle simulations, Technical Report 496, Yale Computer Science Department, 1986.