Parallel Computing: Unsteady 1D Heat Equation with MPI
Ipung Pramono
INTRODUCTION
- Modeling: from the physical problem to a mathematical model
- Simulation
- Computation
INTRODUCTION (CONT.)
[Figure: problem setup, a 1D rod with boundary temperature T = ... and an isolated (insulated) boundary.]
DISCRETIZATION
- Heat equation: dT/dt = alpha * d2T/dx2
- The rod is discretized into nodes i = 1 ... n and time levels t1, t2, ...; the unknowns are T(i, tj).
- Finite-difference update (explicit in time): the value at node i and time level j+1 is computed from nodes i-1, i, i+1 at level j:
  T(i, j+1) = T(i, j) + alpha * dt / dx^2 * ( T(i+1, j) - 2 T(i, j) + T(i-1, j) )
[Figure: space-time grid of unknowns T(1,t1) ... T(n,t1), T(1,t2) ... T(n,t2), with the stencil linking nodes i-1, i, i+1 at level j to node i at level j+1.]
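A minimal sequential sketch in C of this explicit update (node count, constants, and boundary values are illustrative assumptions, not values from the slides):

#include <stdio.h>

#define N 120                              /* number of interior nodes (illustrative) */

int main(void)
{
    double T[N + 2], Tnew[N + 2];          /* interior nodes 1..N plus two boundary nodes */
    double alpha = 1.0;
    double dx = 1.0 / (N + 1);
    double dt = 0.4 * dx * dx / alpha;     /* explicit scheme is stable for alpha*dt/dx^2 <= 0.5 */
    double r = alpha * dt / (dx * dx);
    int nsteps = 1000;

    for (int i = 0; i < N + 2; i++) T[i] = 0.0;
    T[0] = 100.0;                          /* fixed temperature at the left end (assumption) */

    for (int step = 0; step < nsteps; step++) {
        for (int i = 1; i <= N; i++)
            Tnew[i] = T[i] + r * (T[i + 1] - 2.0 * T[i] + T[i - 1]);
        Tnew[0] = T[0];                    /* left end: prescribed temperature */
        Tnew[N + 1] = Tnew[N];             /* right end: isolated (zero heat flux) */
        for (int i = 0; i < N + 2; i++) T[i] = Tnew[i];
    }

    printf("T at the midpoint: %f\n", T[(N + 1) / 2]);
    return 0;
}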
IMPLEMENTATION
[Figure: domain decomposition, the nodes i = 1 ... n of each time level j+1 are split into contiguous blocks handled by Proc.1, Proc.2, Proc.3, and Proc.4; neighboring processes exchange the boundary values of their blocks.]
Algo-1, blocking point-to-point exchange of the boundary values (the guard for the last rank and the send to the right neighbor are reconstructed; num_procs denotes the total number of processes):

if (my_rank > 0)                      /* send my left boundary value to the left neighbor */
    MPI_Send(&T[1], 1, MPI_DOUBLE, my_rank - 1, tag, MPI_COMM_WORLD);
if (my_rank < num_procs - 1)          /* send my right boundary value to the right neighbor */
    MPI_Send(&T[n], 1, MPI_DOUBLE, my_rank + 1, tag, MPI_COMM_WORLD);
if (my_rank < num_procs - 1)          /* receive the right ghost value from the right neighbor */
    MPI_Recv(&T[n + 1], 1, MPI_DOUBLE, my_rank + 1, tag, MPI_COMM_WORLD, &status);
if (my_rank > 0)                      /* receive the left ghost value from the left neighbor */
    MPI_Recv(&T[0], 1, MPI_DOUBLE, my_rank - 1, tag, MPI_COMM_WORLD, &status);
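Assuming the exchange above is part of a complete program, say heat_mpi.c (the file name is illustrative), it would typically be compiled with mpicc and launched with a command such as mpirun -np 4 ./heat_mpi to produce the 2-, 3-, and 4-processor runs timed below.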
Timing results (seconds) and speedup relative to the sequential run:

Number of nodes | Sequential (s) | 2 processors (s) | speedup | 3 processors (s) | speedup | 4 processors (s) | speedup
120             | 0.004658       | 0.010802         | 0.431   | 0.013614         | 0.342   | 0.009988         | 0.466
600             | 0.019435       | 0.020224         | 0.961   | 0.012305         | 1.579   | 0.011569         | 1.680
1200            | 0.033949       | 0.026332         | 1.289   | 0.019505         | 1.741   | 0.021559         | 1.575
1800            | 0.04262        | 0.029984         | 1.421   | 0.028406         | 1.500   | 0.022755         | 1.873
3000            | 0.067064       | 0.04664          | 1.438   | 0.036598         | 1.832   | 0.031644         | 2.119
6000            | 0.112408       | 0.070857         | 1.586   | 0.059541         | 1.888   | 0.046107         | 2.438
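Speedup here is the sequential time divided by the parallel time for the same number of nodes; for example, for 600 nodes on 3 processors: 0.019435 / 0.012305 = 1.579.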
ANALYSE
- For a small problem size, e.g. 120 nodes, the sequential method is faster than the parallel one because communication time dominates the computation time: when several processors are involved, each one spends more time transferring data to its neighbors than computing the next time step.
- If two neighboring processors both start with MPI_Recv and only then continue with MPI_Send, a deadlock arises, because neither of them can return from MPI_Recv.
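To illustrate the deadlock (a sketch using the same variable names as the code above, not a listing from the slides): every rank blocks in its first MPI_Recv, waiting for a message that its neighbor has not sent yet, so the sends are never reached.

if (my_rank > 0)                      /* blocks: the left neighbor has not sent yet */
    MPI_Recv(&T[0], 1, MPI_DOUBLE, my_rank - 1, tag, MPI_COMM_WORLD, &status);
if (my_rank < num_procs - 1)
    MPI_Recv(&T[n + 1], 1, MPI_DOUBLE, my_rank + 1, tag, MPI_COMM_WORLD, &status);
/* never reached: the matching sends */
if (my_rank < num_procs - 1)
    MPI_Send(&T[n], 1, MPI_DOUBLE, my_rank + 1, tag, MPI_COMM_WORLD);
if (my_rank > 0)
    MPI_Send(&T[1], 1, MPI_DOUBLE, my_rank - 1, tag, MPI_COMM_WORLD);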
OPTIMIZE
- The MPI_Sendrecv command is designed to handle exactly this kind of one-to-one data exchange: it pairs each send with a receive in a single call, so the deadlock above cannot occur (Algo-2).
- Use non-blocking MPI (MPI_Isend / MPI_Irecv). The more important motivation for non-blocking communication is to overlap communication with computation, i.e. to hide the communication overhead; a sketch follows below.
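A sketch of the non-blocking exchange, assuming the same variables as in the slide's code (T, n, my_rank, num_procs, tag); the interior nodes can be updated between posting the requests and MPI_Waitall, which is where the overlap of communication and computation comes from.

MPI_Request reqs[4];
int nreq = 0;
if (my_rank > 0) {
    MPI_Irecv(&T[0], 1, MPI_DOUBLE, my_rank - 1, tag, MPI_COMM_WORLD, &reqs[nreq++]);
    MPI_Isend(&T[1], 1, MPI_DOUBLE, my_rank - 1, tag, MPI_COMM_WORLD, &reqs[nreq++]);
}
if (my_rank < num_procs - 1) {
    MPI_Irecv(&T[n + 1], 1, MPI_DOUBLE, my_rank + 1, tag, MPI_COMM_WORLD, &reqs[nreq++]);
    MPI_Isend(&T[n], 1, MPI_DOUBLE, my_rank + 1, tag, MPI_COMM_WORLD, &reqs[nreq++]);
}
/* update interior nodes i = 2 .. n-1 here; they do not depend on the ghost values */
MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
/* now update the boundary nodes i = 1 and i = n using T[0] and T[n+1] */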
OPTIMIZE (CONT.)
- Use MPI_Sendrecv for the boundary exchange, as sketched below.
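A sketch of the Algo-2 exchange with MPI_Sendrecv, again assuming the variables from the slide's code (T, n, my_rank, num_procs, tag, status); MPI_PROC_NULL turns the calls at the first and last rank into no-ops, so no extra branches are needed.

tag = 1;
int left  = (my_rank > 0)             ? my_rank - 1 : MPI_PROC_NULL;
int right = (my_rank < num_procs - 1) ? my_rank + 1 : MPI_PROC_NULL;

/* send my left boundary value to the left neighbor,
   receive the right ghost value from the right neighbor */
MPI_Sendrecv(&T[1], 1, MPI_DOUBLE, left, tag,
             &T[n + 1], 1, MPI_DOUBLE, right, tag,
             MPI_COMM_WORLD, &status);

/* send my right boundary value to the right neighbor,
   receive the left ghost value from the left neighbor */
MPI_Sendrecv(&T[n], 1, MPI_DOUBLE, right, tag,
             &T[0], 1, MPI_DOUBLE, left, tag,
             MPI_COMM_WORLD, &status);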
RESULT

Table I (Algo-1, blocking MPI_Send/MPI_Recv):
No | Number of nodes | Sequential (s) | 2 processors (s) | speedup | 3 processors (s) | speedup | 4 processors (s) | speedup
1  | 600             | 0.019435       | 0.020224         | 0.961   | 0.012305         | 1.579   | 0.011569         | 1.680
2  | 1200            | 0.033949       | 0.026332         | 1.289   | 0.019505         | 1.741   | 0.021559         | 1.575
3  | 1800            | 0.04262        | 0.029984         | 1.421   | 0.028406         | 1.500   | 0.022755         | 1.873
4  | 3000            | 0.067064       | 0.04664          | 1.438   | 0.036598         | 1.832   | 0.031644         | 2.119
5  | 6000            | 0.112408       | 0.070857         | 1.586   | 0.059541         | 1.888   | 0.046107         | 2.438

Table II (Algo-2, MPI_Sendrecv):
No | Number of nodes | Sequential (s) | 2 processors (s) | speedup | 3 processors (s) | speedup | 4 processors (s) | speedup
1  | 600             | 0.019435       | 0.015873         | 1.224   | 0.021008         | 0.925   | 0.026304         | 0.739
2  | 1200            | 0.033949       | 0.028509         | 1.191   | 0.024182         | 1.404   | 0.016937         | 2.004
3  | 1800            | 0.04262        | 0.033802         | 1.261   | 0.033681         | 1.265   | 0.023062         | 1.848
4  | 3000            | 0.067064       | 0.045894         | 1.461   | 0.039973         | 1.678   | 0.030843         | 2.174
5  | 6000            | 0.112408       | 0.065463         | 1.717   | 0.0558505        | 2.013   | 0.045793         | 2.455

Table III (non-blocking MPI):
No | Number of nodes | Sequential (s) | 2 processors (s) | speedup | 3 processors (s) | speedup | 4 processors (s) | speedup
1  | 600             | 0.019435       | 0.020165         | 0.964   | 0.009701         | 2.003   | 0.009228         | 2.106
2  | 1200            | 0.033949       | 0.029884         | 1.136   | 0.022478         | 1.510   | 0.018715         | 1.814
3  | 1800            | 0.04262        | 0.031941         | 1.334   | 0.031863         | 1.338   | 0.021508         | 1.982
4  | 3000            | 0.067064       | 0.044221         | 1.517   | 0.037001         | 1.812   | 0.030798         | 2.178
5  | 6000            | 0.112408       | 0.067822         | 1.657   | 0.050361         | 2.232   | 0.04424          | 2.541
RESULT (FOR 2 PROCESSORS)
[Chart: speedup vs. number of nodes (0 to 7000) on 2 processors for algo-1, algo-2, and non-blocking; speedup axis from 0.8 to 1.8.]
RESULT (FOR 3 PROCESSORS)
[Chart: speedup vs. number of nodes (0 to 7000) on 3 processors for algo-1, algo-2, and non-blocking; speedup axis from 0.8 to 2.4.]
RESULT (FOR 4 PROCESSORS)
[Chart: speedup vs. number of nodes (0 to 7000) on 4 processors for algo-1, algo-2, and non-blocking; speedup axis from 0.5 to 2.7.]
CONCLUSION