Overview on Parallel Programming Paradigms
Ivan Girotto – [email protected]
Information & Communication Technology Section (ICTS)
International Centre for Theoretical Physics (ICTP)
What Determines Performance?
• How fast is my CPU?
• How fast can I move data around?
• How well can I split work into pieces?
– Very application specific: never assume that a good solution for one problem is as good a solution for another
– always run benchmarks to understand requirements of your applications and properties of your hardware
– respect Amdahl's law
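As a reminder of what "respect Amdahl's law" implies, here is a minimal sketch of the usual textbook form, S(N) = 1 / ((1 - P) + P / N), where P is the fraction of the run time that can be parallelised and N is the number of workers. The function name `amdahl_speedup` and the sample values are illustrative choices, not taken from the slides.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on speedup for a given parallel fraction (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Illustrative values only: the serial part caps the speedup at 1 / (1 - P),
# no matter how many workers are added.
for p in (0.5, 0.9, 0.99):
    for n in (4, 64, 1024):
        print(f"P={p:<5} N={n:<5} speedup={amdahl_speedup(p, n):.2f}")
```

Even with 99% of the work parallelised, the speedup can never exceed 100, which is why benchmarking the serial part matters before scaling out.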
[Figure: nodes, each with several CPUs and a MEMORY holding shared data, connected through a NETWORK]
01/10/2015 – Ivan Girotto – Overview on Parallel Programming Paradigms – [email protected] – ICTP, smr2761
The Hybrid Mode
[Figure: four nodes connected through a network]
~ 8 GBytes
export OMP_NUM_THREADS=4
export OPENBLAS_NUM_THREADS=$OMP_NUM_THREADS
mpirun -np 2 pw-gpu.x -inp input file
CUDA/OpenCL/OpenACC: floating point accelerators
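The launch lines above combine 2 MPI processes with 4 OpenMP threads each. A small sketch of the resulting layout follows; the helper name `hybrid_layout` is hypothetical, and falling back to 1 thread when `OMP_NUM_THREADS` is unset is a choice of this sketch, not the OpenMP default.

```python
import os


def hybrid_layout(mpi_ranks, env=None):
    """Return (threads_per_rank, total_threads) for an MPI + OpenMP launch."""
    env = os.environ if env is None else env
    # Assumption of this sketch: treat an unset OMP_NUM_THREADS as 1 thread.
    threads = int(env.get("OMP_NUM_THREADS", "1"))
    return threads, mpi_ranks * threads


# Same numbers as the launch lines: OMP_NUM_THREADS=4 and mpirun -np 2.
threads, total = hybrid_layout(2, {"OMP_NUM_THREADS": "4"})
print(f"{threads} threads per rank, {total} threads in total")
```

Checking this product against the number of cores actually available is a quick way to spot oversubscription before a run.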
• Data Parallelism: different people are performing the same task, but on different equivalent and independent objects
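In code, data parallelism amounts to applying one function to many independent items. The sketch below uses Python's standard `concurrent.futures` thread pool; `task` and `data_parallel_map` are illustrative names, and a thread pool is used only to keep the example short (for CPU-bound work in CPython a process pool or MPI would be the more realistic choice).

```python
from concurrent.futures import ThreadPoolExecutor


def task(x):
    """The same task, applied independently to each object."""
    return x * x


def data_parallel_map(func, items, n_workers=4):
    """Apply func to every item using a pool of workers, preserving order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(func, items))


print(data_parallel_map(task, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the items do not depend on each other, the workers never need to communicate, which is what makes this pattern easy to scale.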
all here?
1000000 cores
P = 0.999999
E_P = S_P / p
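Reading the formula as the usual definition of parallel efficiency (speedup S_P divided by the number of processes p), a minimal sketch could look like this. The function names and the timings are made up for illustration, not measurements from the slides.

```python
def speedup(t_serial, t_parallel):
    """S_P: how many times faster the parallel run is than the serial one."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """E_P = S_P / p: fraction of the ideal p-fold speedup actually obtained."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical timings: 100 s on one process, 30 s on four processes.
s = speedup(100.0, 30.0)
e = efficiency(100.0, 30.0, 4)
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
```

An efficiency of 1.0 means perfect scaling; values well below 1.0 suggest that communication or serial sections dominate.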
Efficiency
mynode01 mynode02
OMPI_COMM_WORLD_LOCAL_RANK - the relative rank of this process on this node within its job. For example, if four processes in a job share a node, they will each be given a local rank ranging from 0 to 3.
OMPI_UNIVERSE_SIZE - the number of process slots allocated to this job. Note that this may be different than the number of processes in the job.
OMPI_COMM_WORLD_LOCAL_SIZE - the number of ranks from this job that are running on this node.
OMPI_COMM_WORLD_NODE_RANK - the relative rank of this process on this node looking across ALL jobs.
http://www.open-mpi.org
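A script started by Open MPI's mpirun can read these variables to find its place in the job. The sketch below collects them into a dictionary; the helper name `ompi_layout` and the fallback values used when a variable is missing (for example when the script is run without mpirun) are assumptions of this example.

```python
import os


def ompi_layout(env=None):
    """Collect the Open MPI placement variables as integers."""
    env = os.environ if env is None else env

    def get(name, default):
        # Fallbacks are a choice of this sketch for runs outside mpirun.
        return int(env.get(name, default))

    return {
        "rank": get("OMPI_COMM_WORLD_RANK", 0),
        "local_rank": get("OMPI_COMM_WORLD_LOCAL_RANK", 0),
        "local_size": get("OMPI_COMM_WORLD_LOCAL_SIZE", 1),
        "node_rank": get("OMPI_COMM_WORLD_NODE_RANK", 0),
        "universe_size": get("OMPI_UNIVERSE_SIZE", 1),
    }


# Made-up environment: the third of four processes sharing a node.
fake_env = {
    "OMPI_COMM_WORLD_RANK": "6",
    "OMPI_COMM_WORLD_LOCAL_RANK": "2",
    "OMPI_COMM_WORLD_LOCAL_SIZE": "4",
}
print(ompi_layout(fake_env))
```

Passing the environment as an argument keeps the helper easy to try out on a laptop, without launching an MPI job.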
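In Python
import os
# Rank of this process in the job, exported by Open MPI's mpirun.
# Environment values are strings: use int(myid) if a number is needed.
myid = os.environ['OMPI_COMM_WORLD_RANK']
[...]
In BASH
#!/bin/bash
# Rank of this process in the job, exported by Open MPI's mpirun.
myid=${OMPI_COMM_WORLD_RANK}
[...]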
[Figure: work items W2, W3, W4 and processes P1, P2, P3, P4]
Parallel I/O
[Figure: processes P0, P1, P2, P3; I/O Bandwidth; File System]
Parallel I/O
[Figure: processes P0, P1, P2, P3, each performing I/O]