HPC Nbody
Victor Eijkhout
Fall 2022
Summing forces
Particle interactions
Complexity reduction
Octree
Dynamic octree creation
Procedure Quad_Tree_Build
  Quad_Tree = {empty}
  for j = 1 to N                  // loop over all N particles
    Quad_Tree_Insert(j, root)     // insert particle j in QuadTree
  endfor
  Traverse the Quad_Tree eliminating empty leaves
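
The build procedure above can be sketched in Python roughly as follows. This is a minimal 2D sketch, not the course's reference code: the `Node` class, the helper names (`child_of`, `prune`, `leaves`), the unit-square domain, and the rule "one particle per leaf" are assumptions made for illustration. It also assumes all particle positions are distinct; coincident particles would subdivide forever.

```python
class Node:
    """Square cell with center (cx, cy) and half-width `half` (hypothetical layout)."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.particle = None   # index of the single particle stored in a leaf
        self.children = None   # list of child Nodes once the cell is subdivided

def child_of(node, p):
    """Child quadrant of `node` that contains point p."""
    return node.children[(p[0] >= node.cx) + 2 * (p[1] >= node.cy)]

def quad_tree_insert(j, node, pos):
    """Insert particle j (coordinates pos[j]) into the subtree rooted at `node`."""
    if node.children is None and node.particle is None:
        node.particle = j                       # empty leaf: store the particle
        return
    if node.children is None:                   # occupied leaf: subdivide it
        h = node.half / 2
        node.children = [Node(node.cx + dx * h, node.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]
        old, node.particle = node.particle, None
        quad_tree_insert(old, child_of(node, pos[old]), pos)
    quad_tree_insert(j, child_of(node, pos[j]), pos)

def prune(node):
    """Traverse the tree, eliminating empty leaves."""
    if node.children is not None:
        node.children = [c for c in node.children
                         if c.children is not None or c.particle is not None]
        for c in node.children:
            prune(c)

def quad_tree_build(pos, cx=0.5, cy=0.5, half=0.5):
    """Build the tree for particles inside the square centered at (cx, cy)."""
    root = Node(cx, cy, half)
    for j in range(len(pos)):                   # loop over all N particles
        quad_tree_insert(j, root, pos)
    prune(root)
    return root

def leaves(node):
    """Particle indices stored in the leaves below `node`."""
    if node.children is None:
        return [node.particle]
    return [p for c in node.children for p in leaves(c)]
```

For example, `quad_tree_build([(0.1, 0.1), (0.9, 0.9), (0.8, 0.8)])` keeps subdividing the upper-right quadrant until the two nearby particles land in different cells.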
Octree algorithm
Masses calculation
// Compute the CM = Center of Mass and TM = Total Mass of all the particles
( TM, CM ) = Compute_Mass( root )
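
One way to realize Compute_Mass is a post-order traversal: each leaf reports its particle's mass and position, and each interior node combines the results of its children. The sketch below is illustrative only; the `Node` class and its field names `tm` and `cm` are assumptions, the example is 2D, and masses are assumed to be nonzero.

```python
class Node:
    """Tree node (hypothetical layout): a leaf holds one particle."""
    def __init__(self, children=(), mass=0.0, pos=(0.0, 0.0)):
        self.children = list(children)   # empty list for a leaf
        self.tm = mass                   # total mass (leaf: particle mass)
        self.cm = pos                    # center of mass (leaf: particle position)

def compute_mass(n):
    """Return (TM, CM) of all particles inside node n, storing them on n."""
    if n.children:
        parts = [compute_mass(c) for c in n.children]   # children first
        n.tm = sum(tm for tm, _ in parts)
        # mass-weighted average of the children's centers of mass
        n.cm = tuple(sum(tm * cm[d] for tm, cm in parts) / n.tm
                     for d in range(2))
    return n.tm, n.cm
```

Because every node is visited once, this pass costs O(N) work for a tree with N leaves.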
Force evaluation
// for each particle, compute the force on it by tree traversal
for k = 1 to N
  f(k) = TreeForce( k, root )
  // compute force on particle k due to all particles inside root

function f = TreeForce( k, n )
  // compute force on particle k due to all particles inside node n
  f = 0
  if n contains one particle        // evaluate directly
    f = force computed using formula on last slide
  else
    r = distance from particle k to CM of particles in n
    D = size of n
    if D/r < theta                  // ok to approximate by CM and TM
      compute f
    else                            // need to look inside node
      for all children c of n
        f = f + TreeForce ( k, c )
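
The traversal can be sketched in Python as below. Since the force formula itself is not reproduced here, the sketch assumes Newtonian gravity in 2D with G = 1; the `Node` class, `point_force`, and the default `theta=0.5` are likewise assumptions for illustration. A leaf that holds particle k itself contributes zero force, a case the pseudocode leaves implicit.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption)

class Node:
    """Tree node (hypothetical layout) carrying size D, total mass TM and CM."""
    def __init__(self, size, children=(), particle=None, mass=0.0, pos=(0.0, 0.0)):
        self.size = size                 # D: edge length of the cell
        self.children = list(children)   # empty list for a leaf
        self.particle = particle         # particle index stored in a leaf
        if self.children:                # interior node: aggregate TM and CM
            self.tm = sum(c.tm for c in self.children)
            self.cm = tuple(sum(c.tm * c.cm[d] for c in self.children) / self.tm
                            for d in range(2))
        else:
            self.tm, self.cm = mass, pos

def point_force(xk, mk, p, m):
    """Force on mass mk at xk due to point mass m at p (positions must differ)."""
    dx, dy = p[0] - xk[0], p[1] - xk[1]
    r = math.hypot(dx, dy)
    s = G * mk * m / r**3
    return (s * dx, s * dy)

def tree_force(k, n, pos, mass, theta=0.5):
    """Force on particle k due to all particles inside node n."""
    if not n.children:                           # leaf: evaluate directly
        if n.particle == k:
            return (0.0, 0.0)                    # no self-interaction
        return point_force(pos[k], mass[k], n.cm, n.tm)
    r = math.hypot(n.cm[0] - pos[k][0], n.cm[1] - pos[k][1])
    if r > 0 and n.size / r < theta:             # far away: use CM and TM
        return point_force(pos[k], mass[k], n.cm, n.tm)
    fx = fy = 0.0
    for c in n.children:                         # too close: look inside node
        cfx, cfy = tree_force(k, c, pos, mass, theta)
        fx += cfx
        fy += cfy
    return (fx, fy)
```

With `theta=0` no node is ever approximated and the traversal reduces to the direct all-pairs sum; larger values of `theta` trade accuracy for fewer node visits.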
Complexity
Computational aspects
Step 1: force by a particle
Step 2: force on a particle
• Center of mass calculation and force prolongation are local
• Force from neighbouring cells is a neighbour communication
• Neighbour communication can start before up/down tree
calculation is finished: latency hiding
All-pairs methods
1.5D calculation
• Better algorithm: use a √P × √P processor grid
• Divide particles in boxes of M = N/√P
• Processor (i, j) computes interaction of boxes i and j:
• this requires broadcast (for duplication) and reduction (for
summing) in processor rows and columns
• Bandwidth cost βN/√P = βM: scalable.
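
The data flow of the processor grid can be checked with a serial simulation. The sketch below is not a parallel implementation: the loops over (i, j) stand in for the processors, the copies of boxes i and j stand in for the broadcasts, and the final sums stand in for the row reductions. It assumes unit masses, G = 1, 2D positions, and N divisible by √P; the names `pair_force`, `forces_on_grid`, and `q` (for √P) are hypothetical.

```python
import math

def pair_force(xi, xj):
    """Force on a unit mass at xi due to a unit mass at xj (G = 1, assumption)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    r = math.hypot(dx, dy)
    return (dx / r**3, dy / r**3)

def forces_on_grid(pos, q):
    """Serially simulate a q x q processor grid (P = q*q processors)."""
    n = len(pos)
    m = n // q                                   # box size M = N / sqrt(P)
    boxes = [range(b * m, (b + 1) * m) for b in range(q)]
    # partial[i][j]: forces on box i due to box j, computed by processor (i, j)
    partial = [[None] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            # processor (i, j) holds copies of boxes i and j (the broadcast step)
            out = []
            for a in boxes[i]:
                fx = fy = 0.0
                for b in boxes[j]:
                    if a != b:                   # skip self-interaction
                        gx, gy = pair_force(pos[a], pos[b])
                        fx += gx
                        fy += gy
                out.append((fx, fy))
            partial[i][j] = out
    # reduction along processor row i sums the contributions of all boxes j
    total = []
    for i in range(q):
        for t in range(m):
            total.append((sum(partial[i][j][t][0] for j in range(q)),
                          sum(partial[i][j][t][1] for j in range(q))))
    return total
```

Each simulated processor touches only 2M particle positions, which is the source of the N/√P term in the bandwidth cost.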