
N-body Problems

Victor Eijkhout

Fall 2022
Summing forces

Particle interactions

for each particle i
    for each particle j
        let $\bar r_{ij}$ be the vector between i and j;
        then the force on i because of j is
            $f_{ij} = -\bar r_{ij}\,\dfrac{m_i m_j}{|r_{ij}|}$
        (where $m_i$, $m_j$ are the masses or charges) and
            $f_{ji} = -f_{ij}$.
Sum forces and move particle over $\Delta t$
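The double loop above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the 2D setting, the inverse-square attraction with unit constant, the sign convention (vector taken from i to j) and the explicit time step in `step` are all assumptions made for the example, and the names `all_pairs_forces` and `step` are hypothetical.

```python
import math

def all_pairs_forces(pos, mass):
    """Return the net force on every particle; pos is a list of (x, y) tuples."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):            # visit each pair only once
            rx = pos[j][0] - pos[i][0]       # vector from i to j
            ry = pos[j][1] - pos[i][1]
            dist = math.hypot(rx, ry)
            # Assumed law: attraction of magnitude m_i * m_j / dist^2 (constant = 1).
            s = mass[i] * mass[j] / dist ** 3
            fx, fy = s * rx, s * ry          # force on i, pointing towards j
            forces[i][0] += fx
            forces[i][1] += fy
            forces[j][0] -= fx               # f_ji = -f_ij
            forces[j][1] -= fy
    return forces

def step(pos, vel, mass, dt):
    """Sum forces, then move every particle over dt (assumed simple Euler-type update)."""
    forces = all_pairs_forces(pos, mass)
    for i in range(len(pos)):
        vel[i] = (vel[i][0] + dt * forces[i][0] / mass[i],
                  vel[i][1] + dt * forces[i][1] / mass[i])
        pos[i] = (pos[i][0] + dt * vel[i][0],
                  pos[i][1] + dt * vel[i][1])
```

Using $f_{ji} = -f_{ij}$ halves the number of force evaluations, but the work is still $O(N^2)$ per time step.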

Complexity reduction

• Naive all-pairs algorithm: $O(N^2)$
• Clever algorithms: $O(N \log N)$, sometimes even $O(N)$
• Octree algorithm: Barnes-Hut

Octree

Dynamic octree creation
Procedure Quad_Tree_Build
    Quad_Tree = {empty}
    for j = 1 to N                      // loop over all N particles
        Quad_Tree_Insert(j, root)       // insert particle j in Quad_Tree
    endfor
    Traverse the Quad_Tree eliminating empty leaves

Procedure Quad_Tree_Insert(j, n)        // try to insert particle j at node n
    if n is an internal node            // n has 4 children
        determine which child c of node n contains particle j
        Quad_Tree_Insert(j, c)
    else if n contains 1 particle       // n is a leaf
        add n's 4 children to the Quad_Tree
        move the particle already in n into the child containing it
        let c be the child of n containing j
        Quad_Tree_Insert(j, c)
    else                                // n empty
        store particle j in node n
    end
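A minimal Python sketch of the insertion procedure, for the 2D (quadtree) case. The class `Node` and the helpers `build`, `prune` and `particles_in` are hypothetical names introduced for this example; it assumes all particles lie in the square $[0,\mathit{size})^2$ and have distinct positions (coincident particles would make the splitting recurse forever).

```python
class Node:
    """Square cell with lower-left corner (x, y) and side length `size`."""
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.children = []       # empty for a leaf, four cells for an internal node
        self.particle = None     # index of the single particle stored in a leaf

    def child_containing(self, p):
        half = self.size / 2
        col = 1 if p[0] >= self.x + half else 0
        row = 1 if p[1] >= self.y + half else 0
        return self.children[2 * row + col]

def insert(pos, j, n):
    """Insert particle j (coordinates pos[j]) into the subtree rooted at n."""
    if n.children:                           # internal node: descend
        insert(pos, j, n.child_containing(pos[j]))
    elif n.particle is not None:             # occupied leaf: split it
        half = n.size / 2
        n.children = [Node(n.x + cx * half, n.y + cy * half, half)
                      for cy in (0, 1) for cx in (0, 1)]
        old, n.particle = n.particle, None
        insert(pos, old, n.child_containing(pos[old]))   # move the resident particle
        insert(pos, j, n.child_containing(pos[j]))
    else:                                    # empty leaf
        n.particle = j

def build(pos, size=1.0):
    root = Node(0.0, 0.0, size)
    for j in range(len(pos)):
        insert(pos, j, root)
    return root

def prune(n):
    """Eliminate empty leaves; do not call insert() on the tree afterwards."""
    for c in n.children:
        prune(c)
    n.children = [c for c in n.children if c.children or c.particle is not None]

def particles_in(n):
    """List the particle indices stored below node n."""
    if not n.children:
        return [] if n.particle is None else [n.particle]
    return [p for c in n.children for p in particles_in(c)]
```

An octree version would differ only in having eight children per cell and a third coordinate in `child_containing`.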

Octree algorithm

• Consider cells on the top level
• if the diameter/distance ratio is small enough, take the center of mass
• otherwise consider the children cells

Masses calculation
// Compute the CM = Center of Mass and TM = Total Mass of all the particles
( TM, CM ) = Compute_Mass( root )

function ( TM, CM ) = Compute_Mass( n )
    if n contains 1 particle
        // TM and CM are the mass and position of that particle
        store ( TM, CM ) at n
        return ( TM, CM )
    else                                // post order traversal:
                                        // process parent after all children
        for all children c(j) of n
            ( TM(j), CM(j) ) = Compute_Mass( c(j) )
        // total mass is the sum
        TM = sum over children j of n: TM(j)
        // center of mass is weighted sum
        CM = ( sum over children j of n: TM(j)*CM(j) ) / TM
        store ( TM, CM ) at n
        return ( TM, CM )
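The post-order traversal can be sketched in Python as below. The `Node` class here is a hypothetical stand-in for whatever tree structure is in use: a leaf is created with its particle's mass and position, an internal node with a list of children. Positions are 2D tuples for brevity.

```python
class Node:
    def __init__(self, children=(), mass=0.0, pos=(0.0, 0.0)):
        self.children = list(children)
        self.tm = mass       # total mass; for a leaf, the particle's mass
        self.cm = pos        # center of mass; for a leaf, the particle's position

def compute_mass(n):
    """Store and return (TM, CM) for node n, processing children first."""
    if n.children:
        results = [compute_mass(c) for c in n.children]
        tm = sum(m for m, _ in results)                    # total mass is the sum
        cx = sum(m * c[0] for m, c in results) / tm        # mass-weighted average
        cy = sum(m * c[1] for m, c in results) / tm
        n.tm, n.cm = tm, (cx, cy)
    return n.tm, n.cm
```

For example, a cell holding masses 1 at (0, 0) and 3 at (4, 0) gets total mass 4 and center of mass (3, 0).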

Force evaluation
// for each particle, compute the force on it by tree traversal
for k = 1 to N
    f(k) = TreeForce( k, root )
    // compute force on particle k due to all particles inside root

function f = TreeForce( k, n )
    // compute force on particle k due to all particles inside node n
    f = 0
    if n contains one particle          // evaluate directly
        f = force computed using the pairwise formula from the Particle interactions slide
    else
        r = distance from particle k to CM of particles in n
        D = size of n
        if D/r < theta                  // ok to approximate by CM and TM
            compute f from the CM and TM of n
        else                            // need to look inside node
            for all children c of n
                f = f + TreeForce( k, c )
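A Python sketch of the traversal, under stated assumptions: 2D positions, an inverse-square attraction with unit constant for the pairwise force, and a hypothetical `Node` class whose `tm` and `cm` fields have already been filled in by the mass calculation. The pseudocode does not say how a particle avoids interacting with itself; here a zero distance simply contributes no force.

```python
import math

class Node:
    def __init__(self, size, tm, cm, children=()):
        self.size = size               # D: side length of the cell
        self.tm, self.cm = tm, cm      # total mass and center of mass
        self.children = list(children)

def pair_force(p, m, q, mq):
    """Force on mass m at p due to mass mq at q (assumed inverse-square law)."""
    rx, ry = q[0] - p[0], q[1] - p[1]
    r = math.hypot(rx, ry)
    if r == 0.0:                       # the particle itself: no self-force
        return (0.0, 0.0)
    s = m * mq / r ** 3
    return (s * rx, s * ry)

def tree_force(p, m, n, theta):
    """Force on the particle (position p, mass m) due to everything inside node n."""
    if not n.children:                 # leaf: evaluate directly
        return pair_force(p, m, n.cm, n.tm)
    r = math.hypot(n.cm[0] - p[0], n.cm[1] - p[1])
    if r > 0.0 and n.size / r < theta: # far enough: approximate by CM and TM
        return pair_force(p, m, n.cm, n.tm)
    fx = fy = 0.0
    for c in n.children:               # otherwise look inside the node
        cfx, cfy = tree_force(p, m, c, theta)
        fx += cfx
        fy += cfy
    return (fx, fy)
```

With `theta = 0` no cell is ever approximated and the result equals the direct sum; larger values of `theta` trade accuracy for fewer evaluations.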

Complexity

• Each cell considers ‘rings’ of equidistant cells
• but at doubling distance
• $c \log N$ cells to consider for each particle
• $O(N \log N)$ overall

Computational aspects

• After the position update, particles can move to a neighbouring box: this requires load redistribution
• Naive octree algorithm is formulated for shared memory
• Distributed memory by using inspector-executor paradigm

Step 1: force by a particle

for level ℓ from one above the finest to the coarsest:
    for each cell c on level ℓ
        let $g_c^{(\ell)}$ be the combination of the $g_i^{(\ell+1)}$ for all children i of c

Step 2: force on a particle

for level ℓ from one below the coarsest to the finest:
    for each cell c on level ℓ:
        let $f_c^{(\ell)}$ be the sum of
        1. the force $f_p^{(\ell-1)}$ on the parent p of c, and
        2. the sums $g_i^{(\ell)}$ for all i on level ℓ that
           satisfy the cell opening criterion
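Steps 1 and 2 can be illustrated on a one-dimensional domain $[0,1)$ with a complete binary tree of cells. Everything specific in this sketch is an assumption made for the example: the 1D setting, storing $g$ as a (mass, mass-weighted position) pair, and in particular the cell opening criterion, taken here to be "cell i is a child of a neighbour of c's parent, but not itself a neighbour of c". The names `upward`, `downward` and `kernel` are hypothetical.

```python
def upward(leaf_mass):
    """Step 1: g[l][c] = (mass, mass-weighted position) of cell c on level l."""
    n = len(leaf_mass)                     # number of finest cells, a power of 2
    g = [[(m, m * (c + 0.5) / n) for c, m in enumerate(leaf_mass)]]
    while len(g[0]) > 1:                   # combine the two children of each cell
        kids = g[0]
        g.insert(0, [(kids[2 * c][0] + kids[2 * c + 1][0],
                      kids[2 * c][1] + kids[2 * c + 1][1])
                     for c in range(len(kids) // 2)])
    return g                               # g[0] is the coarsest level

def downward(g, kernel):
    """Step 2: f[l][c] = f of the parent plus contributions of admissible cells."""
    f = [[0.0] * len(level) for level in g]
    for l in range(2, len(g)):             # levels 0 and 1 have no admissible cells
        ncell = len(g[l])
        for c in range(ncell):
            p = c // 2
            total = f[l - 1][p]            # 1. inherit the parent's value
            # 2. children of the parent's neighbours that are not neighbours of c
            for i in range(max(0, 2 * (p - 1)), min(ncell, 2 * (p + 2))):
                if abs(i - c) > 1:
                    mass, wpos = g[l][i]
                    if mass > 0.0:
                        total += kernel((c + 0.5) / ncell, mass, wpos / mass)
            f[l][c] = total
    return f                               # f[-1]: far-field value per finest cell
```

Here `kernel(x, m, y)` returns the contribution at point `x` of a mass `m` located at `y`, for instance `lambda x, m, y: m * (y - x) / abs(y - x) ** 3` for a 1D inverse-square attraction. Passing `lambda x, m, y: m` instead shows that every finest cell receives each non-neighbouring cell exactly once; the interaction with the cell itself and its direct neighbours is left to direct summation.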

• Center of mass calculation and force prolongation are local
• Force from neighbouring cells is a neighbour communication
• Neighbour communication can start before up/down tree
calculation is finished: latency hiding

All-pairs methods

• Traditional algorithm: distribute the particles; each process gathers all particles, then computes and updates its own
• Problem: the allgather has $O(N)\beta$ bandwidth cost
• this does not go down with P, so the method does not scale weakly
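As a rough check of the bandwidth term, assume the particles are spread evenly so that every process owns $N/P$ of them (the symbol $n$ for this local count is introduced here only for illustration). Each process must receive all particles it does not own, so whatever allgather algorithm is used:

```latex
T_{\mathrm{allgather}} \;\ge\; \beta\,\frac{P-1}{P}\,N \;\approx\; \beta N ,
\qquad\text{and for fixed } n = N/P:\qquad
\beta\,\frac{P-1}{P}\,N \;=\; \beta\,(P-1)\,n .
```

With the work per process held constant, the communication per process therefore grows linearly in $P$.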

1.5D calculation

• Better algorithm: use a $\sqrt{P} \times \sqrt{P}$ processor grid
• Divide the particles in boxes of $M = N/\sqrt{P}$
• Processor $(i,j)$ computes the interaction of boxes $i$ and $j$:
• this requires broadcast (for duplication) and reduction (for summing) in processor rows and columns
• Bandwidth cost $\beta N/\sqrt{P}$, which is proportional to $M$: scalable.
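The data flow of the processor grid can be emulated in a single Python process. This is a sketch of the decomposition only, with no actual message passing: the function names are hypothetical, positions are one-dimensional for brevity, the force is an assumed inverse-square attraction, particle positions are assumed distinct, and `q` (standing for $\sqrt{P}$) is assumed to divide the number of particles.

```python
def force(xi, mi, xj, mj):
    """Force on particle i due to particle j in 1D (assumed inverse-square law)."""
    d = xj - xi
    return mi * mj * d / abs(d) ** 3

def forces_15d(x, m, q):
    """Emulate a q-by-q processor grid, q = sqrt(P), on N = len(x) particles."""
    size = len(x) // q                             # M = N / sqrt(P) particles per box
    boxes = [range(b * size, (b + 1) * size) for b in range(q)]
    # After the broadcasts, processor (i, j) holds copies of box i and box j
    # and computes the force on every particle of box i due to box j.
    partial = [[[sum(force(x[a], m[a], x[b], m[b]) for b in boxes[j] if b != a)
                 for a in boxes[i]]
                for j in range(q)]
               for i in range(q)]
    # The reduction along processor row i sums the q partial results per particle.
    total = []
    for i in range(q):
        for k in range(size):
            total.append(sum(partial[i][j][k] for j in range(q)))
    return total
```

Each emulated processor touches only two boxes, that is $2M$ particles, rather than all $N$ as in the allgather approach.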
