
Load balancing strategies for the DSMC simulation of hypersonic flows using HPC

arXiv:1811.04742v1 [physics.comp-ph] 12 Nov 2018

T. Binder1, S. Copplestone2, A. Mirza1, P. Nizenkov1, P. Ortwein2, M. Pfeiffer1, W. Reschke1, C.-D. Munz2, and S. Fasoulas1

1 Institute of Space Systems (IRS), University of Stuttgart, 70569 Stuttgart, Germany, [email protected]
2 Institute of Aerodynamics and Gas Dynamics (IAG), University of Stuttgart, 70569 Stuttgart, Germany, [email protected]

Abstract In the context of the validation of PICLas, a kinetic particle suite for the
simulation of rarefied, non-equilibrium plasma flows, the biased hypersonic nitrogen
flow around a blunted cone was simulated with the Direct Simulation Monte Carlo
method. The setup is characterized by a complex flow with strong local gradients
and thermal non-equilibrium, resulting in a highly inhomogeneous computational
load. The load distribution is of particular interest because it determines how
efficiently the available computational resources are exploited. Different load
distribution algorithms are investigated and compared within a strong scaling study.
This investigation of the parallel performance of PICLas is accompanied by simulation
results in terms of the velocity magnitude, translational temperature and heat flux,
the latter being compared to experimental measurements.

1 Introduction

For the numerical simulation of highly rarefied plasma flows, a fully kinetic
modelling of Boltzmann’s equation complemented by Maxwell’s equations is
necessary. For this purpose, a particle code that combines the PIC (Particle-in-Cell)
and DSMC (Direct Simulation Monte Carlo) methods has been developed at
the IAG (Institute of Aerodynamics and Gas Dynamics) and the IRS (Institute of
Space Systems) in recent years [7]. Particle codes are inherently numerically
expensive and thus are an excellent application for parallel computing. The
modelling of the Maxwell-Vlasov equations (PIC solver) has been described in
previous reports [9, 8, 4]. In the present report we focus our attention on the
simulation of rarefied, non-equilibrium, neutral gas flows, which are typical
for atmospheric entry conditions at high altitude and are simulated using the
DSMC part of the coupled code PICLas. The inhomogeneous particle distri-
bution throughout the domain leads to strong imbalances. These are reduced
through load balancing, for which different load distribution algorithms are
investigated.
The physical basis of the coupled solver is the approximation of Boltz-
mann’s equation
 
\left( \frac{\partial}{\partial t} + v \cdot \nabla + \frac{1}{m^s} F \cdot \nabla_v \right) f^s(x, v, t) = \left. \frac{\partial f^s}{\partial t} \right|_{\mathrm{coll}} ,    (1)

which covers basic particle kinetics, where f^s(x, v, t) is the six-dimensional
Particle Distribution Function (PDF) in phase-space for each species s with
mass m^s. It describes the number of particles per unit volume which are found
at a certain point (x, v) in phase-space and time t. The left hand side of (1),
where F is an external force field, is solved using a deterministic Particle-in-
Cell [5] method, while the right hand side, where the collision integral
∂f/∂t |_coll accounts for all particle collisions in the system, is solved by applying the
non-deterministic DSMC [3] method.
The PDF is approximated by summing up a certain number of weighted
particles N_part and is given by

f^s(x, v, t) \approx \sum_{n=1}^{N_{\mathrm{part}}} w_n \, \delta(x - x_n) \, \delta(v - v_n) ,

where the δ-function is applied to position and velocity space separately, and
the particle weighting factor w_n = N_phy / N_sim is used to describe the ratio of
physical to simulated particles.
The DSMC method is briefly reviewed in Section 2. In Section 3, the
numerical setup and results of the simulation of the flow around a 70◦ blunted
cone geometry are presented. The load-distribution algorithms and the parallel
performance of the DSMC code are investigated in detail in Section 4, followed
by a summary and conclusion in Section 5.

2 DSMC Solver

The DSMC method approximates the right hand side of Eq. (1) by modelling
binary particle collisions in a probabilistic and transient manner. The main
idea of the DSMC method is the non-deterministic, statistical calculation of
changes in particle velocity utilizing random numbers in a collision process.
Additionally, chemical reactions may occur in such collision events. The original
concept of DSMC was developed by Bird [3] and is commonly applied
to the simulation of rarefied, neutral gas flows. The collision operator in
Eq. (1) is given by

\left. \frac{\partial f}{\partial t} \right|_{\mathrm{coll}} = \int W(v_1, v_2, v_3, v_4) \left\{ f(x, v_1, t) f(x, v_2, t) - f(x, v_3, t) f(x, v_4, t) \right\} \mathrm{d}v_1 \, \mathrm{d}v_2 \, \mathrm{d}v_3 ,    (2)

where W represents the probability per unit time with which two particles collide
and change their velocities from v_1 and v_2 to v_3 and v_4, respectively. However,
the DSMC method does not solve this collision integral directly, but rather
applies a phenomenological approach to the collision process of simulation
particles in a statistical framework.
A single standard DSMC time step is depicted schematically in Fig. 1.
First, a particle pair for the collision process is found by examining each cell
and applying a nearest neighbour search with an octree based pre-sorting.
An alternative method is the random pairing of all particles in each cell,
but with additional restrictions to the cell size. The collision probability is
modelled by choosing a cross section for each particle species using microscopic
considerations. As with the PIC method, macro particles are simulated instead
of real particles to reduce computational effort. The collision probability of
two particles, 1 and 2, is determined by methods found in [3, 2], which yields
P_{12} = \frac{N_{p,1} N_{p,2}}{1 + \delta_{12}} \, w \, \frac{\Delta t}{V_c S_{12}} \, (\sigma_{12} g_{12}) ,    (3)
where δ12 is the Kronecker delta, Vc the cell volume, ∆t the time step, σ the
cross section, S12 the number of particle pairs of species 1 and 2 in Vc and
g the relative velocity between the two particles considered. This probabil-
ity is compared to a pseudo random number R ∈ [0, 1) and if R < P12, the
collision occurs, otherwise it does not.

Fig. 1: Schematic of the standard DSMC method: particle pairing, collision
process (Δt → (Δv)part.), particle movement ((Δv)part. → (x, v)part.) with
localization and boundary treatment, and sampling.

Subsequent events such as chemical
reactions or relaxation processes are computed in the same manner, but using
additional probabilities. These may change the internal energy of the particles,
i.e., their rotational and vibrational energies and electronic excitation. Chemical reac-
tions are modelled via the Arrhenius law or quantum-kinetic considerations,
which lead to dissociation, recombination, exchange reactions or ionization.
Macroscopic properties like temperature or density are calculated by sampling
particle positions and velocities over time within each cell.
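
To make the acceptance test concrete, the following Python sketch evaluates the pair collision probability of Eq. (3) and performs the random-number comparison for a single candidate pair. Function and variable names are illustrative and not taken from PICLas.

    import random

    def accept_collision(n_p1, n_p2, same_species, w, dt, v_cell, s_12, sigma_g):
        """Evaluate Eq. (3) for one candidate pair and decide whether it collides.

        n_p1, n_p2   -- particle numbers of species 1 and 2 in the cell
        same_species -- True if species 1 and 2 are identical (Kronecker delta)
        w            -- particle weighting factor
        dt           -- time step [s]
        v_cell       -- cell volume [m^3]
        s_12         -- number of particle pairs of species 1 and 2 in the cell
        sigma_g      -- product of cross section and relative velocity [m^3 s^-1]
        """
        delta_12 = 1.0 if same_species else 0.0
        p_12 = (n_p1 * n_p2) / (1.0 + delta_12) * w * dt / (v_cell * s_12) * sigma_g
        # acceptance test with a pseudo random number R in [0, 1)
        return random.random() < p_12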
A major requirement for a physical DSMC simulation is the ratio of the
mean collision separation distance to the mean free path in each cell
\frac{l_{\mathrm{mcs}}}{\lambda} \overset{!}{<} 1 .    (4)
The former represents the distance of two simulation particles that perform
a collision, while the latter is a function of the gas density. The ratio can be
modified by the weighting factor wn as introduced in Section 1, which then
directly depends on the local number density
w < \frac{1}{\left( \sqrt{2} \, \pi \, d_{\mathrm{ref}}^2 \, n^{2/3} \right)^3} ,    (5)
where dref is a species-specific reference diameter.
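
As a small illustration of Eqs. (4) and (5), the Python sketch below evaluates the hard-sphere mean free path and the resulting upper bound of the weighting factor. The reference diameter for nitrogen is an assumed example value (not taken from the paper); the number density is the free-stream value of Table 1.

    import math

    d_ref = 4.17e-10   # m, reference diameter of N2 (assumed example value)
    n = 1.115e21       # 1/m^3, free-stream number density of the Set 2 case

    # hard-sphere mean free path, lambda = 1 / (sqrt(2) * pi * d_ref^2 * n)
    mfp = 1.0 / (math.sqrt(2.0) * math.pi * d_ref**2 * n)

    # upper bound of the weighting factor according to Eq. (5)
    w_max = 1.0 / (math.sqrt(2.0) * math.pi * d_ref**2 * n**(2.0 / 3.0))**3

    print(f"mean free path lambda = {mfp:.3e} m")
    print(f"maximum weighting factor w_max = {w_max:.3e}")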

Fig. 2: Geometry of the 70◦ blunted cone test case. Radii [mm]: Rb = 25.0,
Rc = 1.25, Rj = 2.08, Rn = 12.5, Rs = 6.25. Heat flux measurement positions
S/Rn [−]: (1) 0.00, (2) 0.52, (3) 1.04, (4) 1.56, (5) 2.68, (6) 3.32, (7) 5.06,
(8) 6.50, (9) 7.94.

Table 1: Free-stream conditions of the 70◦ blunted cone test case.

Case  | |v∞| [m s^-1] | T∞ [K] | n∞ [m^-3]     | ∆t [s]   | w [−]    | Npart [−]
Set 2 | 1502.4        | 13.58  | 1.115 · 10^21 | 5 · 10^-8 | 2 · 10^10 | 2.84 · 10^7

3 Test Case: 70◦ Blunted Cone

A popular validation case for rarefied gas flows is the wind tunnel test of the
70◦ blunted cone in a diatomic nitrogen flow at a Mach number of M = 20 [1].
The geometry of the model is depicted in Fig. 2. Positions of the heat flux
measurements are depicted by the numbers 1-9. While the experiments were
conducted at different rarefaction levels and angles of attack, the case denoted
by Set 2 and α = 30◦ is used for the investigation. The free-stream conditions
and simulation parameters are given in Table 1. Half of the fluid domain was
simulated to exploit the symmetry in the xy-plane.
An exemplary simulation result is shown in Fig. 3. Here, the translational
temperature in the symmetry plane and the velocity streamlines are shown.
The simulation results are compared to the experimental measurements in
terms of the heat flux in Fig. 4. Overall good agreement can be observed for
the first four thermocouples, where the error is below 10% and within exper-
imental uncertainty [1]. The agreement on the sting deteriorates for thermo-
couples further downstream to error values of up to 45%.

Fig. 3: Exemplary simulation result: translational temperature T [K] in the
symmetry plane and velocity streamlines (v [m s^-1]).

Fig. 4: Comparison of measured (Experiment) and calculated (PICLas) heat
flux qw [kW m^-2] over the thermocouple position S/Rn [−] at locations 1–9.



4 Parallelization of the DSMC Method

4.1 Load Computation and Distribution

The code framework of PICLas utilizes implementations of the MPI 2.0 stan-
dard for parallelization. Load distribution between the MPI processes is a
crucial step. A domain decomposition by grid elements was chosen as strat-
egy. In a preprocessing step, all elements within the computational domain
are sorted along a Hilbert curve due to its clustering property [6]. Then, each
MPI process receives a certain segment of the space filling curve (SFC). To
illustrate an optimal load balance scenario, a simplified grid is considered that
consists of 8 × 8 = 64 elements, which are ordered along a SFC. Fig. 5 de-
picts the decomposition of the grid into four regions, each corresponding to
an individual MPI process when the number of processes is Np = 4. For inho-
mogeneous particle distributions or elements of significantly different size, the
load has to be assigned carefully. In the DSMC method, the computational
cost L of each grid element is assumed to depend linearly on the number of
particles it contains. In an optimally balanced case, each process receives
approximately the average load.

Fig. 5: Domain decomposition for homogeneous load distribution.
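
As an illustration of the ordering step, the following minimal Python sketch of the standard 2-D Hilbert index computation (not the actual PICLas implementation) shows how the 8 × 8 example grid of Fig. 5 could be sorted along the SFC:

    def hilbert_index(n, x, y):
        """1-D index of cell (x, y) along the Hilbert curve on an n x n grid
        (n must be a power of two); standard bit-manipulation formulation."""
        d = 0
        s = n // 2
        while s > 0:
            rx = 1 if (x & s) > 0 else 0
            ry = 1 if (y & s) > 0 else 0
            d += s * s * ((3 * rx) ^ ry)
            # rotate/flip the quadrant so lower bits are read in its local frame
            if ry == 0:
                if rx == 1:
                    x = n - 1 - x
                    y = n - 1 - y
                x, y = y, x
            s //= 2
        return d

    # order the 8 x 8 = 64 elements of the example grid along the SFC
    cells = [(i, j) for j in range(8) for i in range(8)]
    sfc_ordered = sorted(cells, key=lambda c: hilbert_index(8, c[0], c[1]))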

Offset elements (i.e., an index I along the SFC) define the assigned segment
of a process. When the SFC describes the interval of [1, Nelem ], the segment
of each process p is defined by [I(p) + 1, I(p + 1)] with I(Np + 1) = Nelem .
Thus, the total load assigned to a single process results in:
L_{\mathrm{tot}}^{p} = \sum_{i=I(p)+1}^{I(p+1)} L_i    (6)

The main goal of a proper load distribution is to minimize the idle time
of waiting processes, i.e., the maximum of all total, process-specific loads
Lptot needs to be minimized. To achieve that, several distribution methods are
implemented in PICLas.
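
The process-specific total load of Eq. (6), and the maximum that is to be minimized, can be evaluated directly from the element loads and the offset elements. A minimal Python sketch (helper names are illustrative) reads:

    def process_loads(loads, offsets):
        """Eq. (6): process p owns the elements offsets[p]+1 .. offsets[p+1]
        (1-based element indices along the SFC); 'loads' is the 0-based list
        of element loads L_1 .. L_Nelem, 'offsets' the list I(1) .. I(Np+1)."""
        return [sum(loads[offsets[p]:offsets[p + 1]])
                for p in range(len(offsets) - 1)]

    def max_load(loads, offsets):
        """The quantity to be minimized: the largest process-specific load,
        which governs the idle time of all other processes."""
        return max(process_loads(loads, offsets))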

Distribution by elements

Assuming a homogeneous particle population, a distribution only by elements
is favourable, i.e., L_elem = const. This can be achieved by dividing the number
of elements into:

N_{\mathrm{Elems}} = N_p \cdot A + B , \qquad A = \left\lfloor \frac{N_{\mathrm{Elems}}}{N_p} \right\rfloor , \qquad B = N_{\mathrm{Elems}} \bmod N_p    (7)
Based on this, each process receives A elements and the first B processes an
additional one, which can be calculated in a straightforward manner by:

Algorithm 1 Distribution by elements


ip ← 1
while ip ≤ Np do
    I(ip) ← A · (ip − 1) + min(ip − 1, B)
    ip ← ip + 1
end while
I(Np + 1) ← Nelem
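
A direct Python transcription of Algorithm 1 together with Eq. (7) might look as follows (a sketch for illustration; PICLas itself is not written in Python):

    def distribute_by_elements(n_elems, n_procs):
        """Offset elements I(1) .. I(Np+1) for the purely element-based
        distribution of Eq. (7): every process receives A elements and the
        first B processes one additional element."""
        a, b = divmod(n_elems, n_procs)
        offsets = [a * p + min(p, b) for p in range(n_procs)]  # I(1) .. I(Np)
        offsets.append(n_elems)                                # I(Np+1) = Nelem
        return offsets

    # example: the 8 x 8 grid of Fig. 5 split among 4 processes
    # distribute_by_elements(64, 4) -> [0, 16, 32, 48, 64]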

Simple load balance

The previous method is, however, not applicable if the elements have different
loads, since a subdivision by element number does not necessarily correspond
to the same fraction of the total load. Therefore, while looping over the pro-
cesses along the SFC, each process in our “simple” balance scheme receives
an iteratively growing segment until the load gathered so far is equal to or
greater than the ideal cumulative fraction. To ensure that the following processes
receive at least one element each, the respective number of assignable elements
is reduced. The algorithm reads:
Algorithm 2 Simple load balance


Ltot ← 0
ielem ← 1
ip ← 1
while ip ≤ Np do
    I(ip) ← ielem − 1
    j ← ielem
    while j ≤ Nelem − Np + ip ∧ Ltot < (ip / Np) · Σ_{k=1}^{Nelem} Lk do
        Ltot ← Ltot + Lj
        j ← j + 1
    end while
    ielem ← j + 1
    ip ← ip + 1
end while
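
A condensed Python sketch of the same idea is given below. It is a simplified interpretation of Algorithm 2, in which each new segment starts immediately after the last element whose load was accumulated:

    def simple_load_balance(loads, n_procs):
        """'Simple' balance scheme: walk along the SFC and close a segment
        once the accumulated load reaches the ideal cumulative fraction,
        while reserving at least one element for every following process."""
        n_elems = len(loads)
        total = sum(loads)
        offsets = []
        l_acc = 0.0
        j = 0                      # 0-based index of the next unassigned element
        for p in range(1, n_procs + 1):
            offsets.append(j)      # I(p): elements before the segment of process p
            while j <= n_elems - n_procs + p - 1 and l_acc < p / n_procs * total:
                l_acc += loads[j]
                j += 1
        offsets.append(n_elems)    # I(Np+1) = Nelem
        return offsets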

“Combing” algorithm

The “simple” algorithm yields a very smooth load distribution for large
element numbers, since the ideal cumulative fraction can be matched closely by
iteratively adding elements. However, if some elements carry much higher
loads than most of the remaining ones, this distribution method fails. For such
cases we developed a smoothing algorithm that “combs” the offset elements
along the SFC iteratively from the beginning towards the end. Only the main
characteristics of the method are outlined here; an illustrative sketch of a single
smoothing pass is given after the list:

• The initial load distribution is taken from the “simple” balance method.
• A large number of different distributions is evaluated in terms of the maximum
  process-total load max(L^p_tot); the one with the minimum value is chosen as
  the final solution.
• If the maximum L^p_tot belongs to a process p with a greater SFC-index
  than the minimum one (the maximum is “right” of the minimum), all offset
  elements are shifted accordingly to the left.
• Maxima are smoothed to the right, i.e., small L^p_tot-intervals are increased
  by shifting elements from maxima to minima.
• If the resulting optimum distribution has already been reached before, elements
  are shifted from the last process towards the first one.
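
As a purely illustrative simplification, and not the actual combing implementation in PICLas, a single smoothing pass that hands one boundary element of the most loaded segment to a less loaded neighbour could be sketched as:

    def smooth_once(loads, offsets):
        """One illustrative smoothing step: the process with the maximum total
        load hands one boundary element over to the less loaded of its
        neighbours by shifting the corresponding offset element."""
        ltot = [sum(loads[offsets[p]:offsets[p + 1]])
                for p in range(len(offsets) - 1)]
        p_max = ltot.index(max(ltot))
        neighbours = [p for p in (p_max - 1, p_max + 1) if 0 <= p < len(ltot)]
        if not neighbours:
            return list(offsets)
        p_min = min(neighbours, key=lambda p: ltot[p])
        new_offsets = list(offsets)
        if offsets[p_max + 1] - offsets[p_max] > 1:  # keep at least one element
            if p_min < p_max:
                new_offsets[p_max] += 1              # first element goes left
            else:
                new_offsets[p_max + 1] -= 1          # last element goes right
        return new_offsets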

4.2 Scaling performance of PICLas

For the test of parallelization, multiple simulations were run for a simulation
time of 1 · 10−4 s, corresponding to 2000 iterations. The speed-up between 720
and 5760 cores was calculated by
S_N = \frac{t_{720}}{t_N} .    (8)
The respective parallel efficiency was determined by
\eta_N = \frac{720 \cdot t_{720}}{N \cdot t_N} ,    (9)

where t_720 and t_N are the computational times using 720 and N cores, respec-
tively.
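
Eqs. (8) and (9) translate directly into a few lines of Python; the function below is a sketch, with the timings to be supplied from the measured wall-clock times:

    def strong_scaling(t_ref, n_ref, t_n, n):
        """Speed-up S_N (Eq. 8) and parallel efficiency eta_N (Eq. 9) of a run
        on n cores relative to the reference run on n_ref cores."""
        speed_up = t_ref / t_n
        efficiency = (n_ref * t_ref) / (n * t_n)
        return speed_up, efficiency

    # usage with the 720-core reference, e.g.:
    # s_5760, eta_5760 = strong_scaling(t_ref=t_720, n_ref=720, t_n=t_5760, n=5760)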
Fig. 6 shows the speed-up over the number of utilized cores, with the respec-
tive parallel efficiency given as a label. The case without actual load balancing
(distribution by elements) and the distribution method by particle number per
element are compared against the ideal scaling behavior. The “combing”
algorithm resulted in the same performance values as the “simple” balance
method; therefore, only the latter is displayed. The speed-up decreases
with an increasing number of cores due to the more frequent communication
between MPI processes. Nevertheless, a parallel efficiency of η = 0.87 can be
achieved using 5760 cores for the blunted cone test case.

5 Summary and Conclusions

The hypersonic flow around a 70◦ blunted cone was simulated with the Direct
Simulation Monte Carlo method. The case features complex flow phenomena
such as a detached compression shock in front and rarefied gas flow in the wake
of the heat shield. A comparison of the experimentally measured heat flux
yielded good agreement with the simulation results. The test case was utilized
to perform a strong scaling of the DSMC implementation of PICLas. With
regard to the computational duration on 720 cores, a parallel efficiency of 99%
to 87% could be achieved for 1440 and 5760 cores, respectively. The decrease
in parallel efficiency can be explained by an increasing MPI communication
effort. Currently, the implementation of cpu-time measurements into PICLas
is investigated for calculating the element loads directly instead of a simple
weighting of particle number, which will be focus of future reports.
Fig. 6: Parallel performance of the blunted cone test case between 720 and
5760 cores: speed-up S [-] over the number of cores Nproc [-] for the
distribution by elements and the simple load balance, compared to ideal
scaling; the data points are labelled with the parallel efficiency η.

6 Acknowledgements

We gratefully acknowledge the Deutsche Forschungsgemeinschaft (DFG) for
funding within the projects “Kinetic Algorithms for the Maxwell-Boltzmann
System and the Simulation of Magnetospheric Propulsion Systems” and
“Coupled PIC-DSMC-Simulation of Laser Driven Ablative Gas Expansions”,
the latter being a subproject of the Collaborative Research Center (SFB)
716 at the University of Stuttgart. The authors also wish to thank the
Landesgraduiertenförderung Baden-Württemberg for supporting the research.
Computational resources have been provided by the Höchstleistungsrechen-
zentrum Stuttgart (HLRS).

References

1. J. Allègre, D. Bisch, and J. C. Lengrand. Experimental Rarefied Heat Transfer
at Hypersonic Conditions over 70-Degree Blunted Cone. Journal of Spacecraft
and Rockets, 34(6):724–728, 1997.
2. D. Baganoff and J. D. McDonald. A collision selection rule for a particle simula-
tion method suited to vector computers. Phys. Fluids A, 2:1248–1259, 1990.
3. G. A. Bird. Molecular Gas Dynamics and the Direct Simulation of Gas Flows.
Oxford University Press, Oxford, 1994.
4. S. Copplestone, T. Binder, A. Mirza, P. Nizenkov, P. Ortwein, M. Pfeiffer, S. Fa-
soulas, and C.-D. Munz. Coupled PIC-DSMC simulations of a laser-driven plasma
expansion. In W. E. Nagel, D. H. Kröner, and M. M. Resch, editors, High
Performance Computing in Science and Engineering ‘15. Springer, 2016.
5. R. W. Hockney and J. W. Eastwood. Computer Simulation Using Particles.
McGraw-Hill, Inc., New York, 1988.
6. B. Moon, H.V. Jagadish, C. Faloutsos, and J.H. Saltz. Analysis of the clustering
properties of the Hilbert space-filling curve. Knowledge and Data Engineering,
IEEE Transactions on, 13(1):124–141, Jan 2001.
7. C.-D. Munz, M. Auweter-Kurtz, S. Fasoulas, A. Mirza, P. Ortwein, M. Pfeif-
fer, and T. Stindl. Coupled Particle-In-Cell and Direct Simulation Monte Carlo
method for simulating reactive plasma flows. Comptes Rendus Mécanique,
342(10-11):662–670, October 2014.
8. P. Ortwein, T. Binder, S. Copplestone, A. Mirza, P. Nizenkov, M. Pfeiffer,
T. Stindl, S. Fasoulas, and C.-D. Munz. Parallel performance of a discontinu-
ous Galerkin spectral element method based PIC-DSMC solver. In W. E. Nagel,
D. H. Kröner, and M. M. Resch, editors, High Performance Computing in Science
and Engineering ‘14. Springer, 2015.
9. A. Stock, J. Neudorfer, B. Steinbusch, T. Stindl, R. Schneider, S. Roller, C.-D.
Munz, and M. Auweter-Kurtz. Three-dimensional gyrotron simulation using a
high-order particle-in-cell method. In W. E. Nagel, D. H. Kröner, and M. M.
Resch, editors, High Performance Computing in Science and Engineering ’11.
Springer Berlin Heidelberg.
