
Lecture 9

Message Passing Interface (MPI)

1
Current HPC Platforms: COTS-Based Clusters

COTS = Commercial off-the-shelf

[Figure: cluster layout with access control, file server(s), login node(s), and compute nodes]

2
Shared and Distributed Memory
[Figure: within a node, multi-core CPUs share RAM via the FSB and memory controller (shared memory: OpenMP, Pthreads, Java threads, MPI); across nodes, node1, node2, node3, … are connected by the cluster interconnect (distributed memory: MPI)]

Shared memory on each node… distributed memory across the cluster

Multi-core CPUs in clusters – two types of parallelism to consider

3
TOP 500 Supercomputers

Top positions of the TOP500 in November 2021

4
TOP 500 Supercomputers

Countries’ share

5
TOP 500 Supercomputers
November 2021

6
TOP 500 Supercomputers

Fugaku Supercomputer

7
MareNostrum Supercomputer, Barcelona 2010

8
What to do with huge problems?

• If the problem is suitable for parallelization and computational resources are available, then parallelize!
• In many cases, simple parallelization (multiple threads) will be good enough
• If the problem is huge and can be parallelized, distribute! (but remember Amdahl's Law)
• How to deal with communication?

9
What to do with huge problems?

• How to deal with issues such as group communication, synchronization, and termination detection?
  1. Implement your own specialized algorithms? → not always suitable, large codebase, easy to introduce errors
  2. Use several existing implementations? → software becomes too heterogeneous and complicated, many libraries, hard to maintain
• We want a uniform programming interface and implementations which provide the services we need.

10
Message Passing Interface

• The Message Passing Interface (MPI) is a standard for message exchange and synchronization in parallel computations on distributed computing systems
• Developed since 1992
• It provides a set of operations and their semantics, i.e., a programming interface
• It does not define a specific protocol or implementation
• All definitions are hardware-independent

11
Message Passing Interface

• MPI is a standard
• Agreed upon through extensive joint effort of ~100 representatives
from ~40 different organisations (the MPI Forum)
• Academics
• Industry experts
• Vendors
• Application developers
• Users
• First version (MPI 1.0) drafted in 1993
• Now on version 3 (version 4 being drafted)

12
MPI Sources

The Standard itself:
  http://www.mpi-forum.org
  All MPI official releases, in both PostScript and HTML

Books:
  Using MPI: Portable Parallel Programming with the Message-Passing Interface, by Gropp, Lusk, and Skjellum, MIT Press, 1994.
  MPI: The Complete Reference, by Snir, Otto, Huss-Lederman, Walker, and Dongarra, MIT Press, 1996.
  Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995.
  Parallel Programming with MPI, by Peter Pacheco, Morgan Kaufmann, 1997.
  MPI: The Complete Reference, Vol. 1 and 2, MIT Press, 1998.

Other information on the Web:
  http://www.mcs.anl.gov/mpi
  Pointers to lots of material, including other talks and tutorials, a FAQ, and other MPI pages

13
MPI Datatypes
MPI Datatype          C datatype
MPI_CHAR              signed char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED          unsigned int
MPI_UNSIGNED_LONG     unsigned long int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
…

14
Implementations

• C/C++/Fortran:
1 MPICH
2 Open MPI
3 DeinoMPI

• Java: MPJExpress
• C#: MPI.NET
• Python: pyMPI

15
MPI

Compilation in C:
mpicc -o prog prog.c

Compilation in C++:
mpiCC -o prog prog.cpp
mpicxx -o prog prog.cpp

Executing the program with num processes:
mpirun -n num ./prog
mpiexec -n num ./prog

Compilation in Java:
Compile: javac -cp .;%MPJ_HOME%/lib/mpj.jar HelloWorld.java
Executing program with 2 processes:
mpjrun.bat -np 2 HelloWorld

(http://mpjexpress.org/docs/guides/windowsguide.pdf)

16
Typical Structure

• A typical MPI application:
  • a set of communicating processes
  • started in parallel, possibly on
    1. multiple different computers, e.g., in a cluster, or
    2. dedicated parallel computers
  • processes work together on one problem
  • processes use messages for information exchange
• Basic paradigms: message-based, group communication, reliable

17
Typical Structure

18
Message passing
• Messages are packets of data moving between sub-programs
• Necessary information for the message passing system:
  • sending process – receiving process, i.e., the ranks
  • source location – destination location
  • source data type – destination data type
  • source data size – destination buffer size

[Figure: data moving between programs across the communication network]

19
Access

• A sub-program needs to be connected to a message passing system
• A message passing system is similar to:
  • a phone line
  • a mail box
  • a fax machine
  • etc.
• MPI:
  • the program must be linked with an MPI library
  • the program must be started with the MPI startup tool

20
Messages

• A message contains a number of elements of some particular datatype
• MPI datatypes:
  • basic datatypes
  • derived datatypes
• C types are different from Fortran types
• Datatype handles are used to describe the type of the data in memory

Example: a message with 5 integers
  2345  654  96574  -12  7676

21
The Message-Passing Programming Paradigm

• Sequential programming paradigm:
  [Figure: one program with its data in one memory, executed by a single processor/process; a processor may run many processes]

• Message-passing programming paradigm:
  [Figure: several programs, each with its own data in distributed memory, running on parallel processors connected by a communication network]

22
The Message-Passing Programming Paradigm
• A process is a program performing a task on a processor
• Each processor/process in a message-passing program runs an instance/copy of a program:
  • written in a conventional sequential language, e.g., C or Fortran,
  • typically a single program operating on multiple datasets
  • Single Program, Multiple Data (SPMD)
• the variables of each sub-program have
  • the same name
  • but different locations (distributed memory) and different data!
  • i.e., all variables are local to a process
• processes communicate via special send & receive routines (message passing)

[Figure: distributed memory, parallel processors, communication network, as on the previous slide]

23
Data and Work Distribution

• To communicate with each other, MPI processes need identifiers:
  rank = identifying number
• All distribution decisions are based on the rank
  • i.e., which process works on which data

[Figure: processes with myrank = 0, 1, 2, …, (size-1), each with its own data and program, connected by the communication network]

24
Communicators

• Basis for group communication:
  • communicators are special MPI constructs that
  • hold a subset of processes and
  • are passed as a parameter for communication
• For now, we just use MPI.COMM_WORLD (MPI_COMM_WORLD in the C bindings)
  • which contains all MPI processes

25
Simple MPI Program

• MPI.Init starts the MPI subsystem


• MPI.Finalize shuts down the MPI subsystem
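The program listing on this slide was an image in the original; a minimal C sketch of such a program (the slides' MPI.Init and MPI.Finalize correspond to MPI_Init and MPI_Finalize in the C bindings) could look like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);   /* start the MPI subsystem */
    printf("Hello, MPI!\n");  /* every process executes this line */
    MPI_Finalize();           /* shut down the MPI subsystem */
    return 0;
}

Compiled with mpicc and started with, e.g., mpirun -n 4 ./prog, each of the four processes prints the greeting once.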

26
MPI Init and MPI Finalize

• MPI.Init executes all actions which are necessary for later communication, such as
  1. establishing connections
  2. initialization of variables
  3. exploring the network
  4. assigning a unique rank to each process
  5. ...
• MPI.Finalize is the last MPI call in a program
  • all communication must be finished before that

27
Important Runtime Parameters

• After MPI.Init, there are two main functions that are typically called.
• MPI programs need information about
  1. "themselves" and
  2. the current system of processes
• How many processes are there?
  • MPI.Comm_World.Size()
  • returns the size of a communicator
• Which ID do I have?
  • MPI.Comm_World.Rank()
  • rank ∈ {0, …, size-1}
  • returns the rank of a process in a communicator

28
Listing: Extended MPI Program
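The listing itself appeared as an image in the original slide; a plausible C sketch that extends the simple program with the rank and size queries described on the previous slide (MPI_Comm_rank and MPI_Comm_size in the C bindings) is:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes are there? */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which ID do I have? rank in {0..size-1} */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}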

29
Point to Point Communication

• Communication between two processes
• one process performs a send operation and the other performs a matching receive operation
• A system buffer allows asynchronous send-receive operations

[Figure: message envelope with "From: source rank, tag" and "To: destination rank, tag", carrying "count" elements (item-1 … item-n)]

30
Blocking Send

• Performs a blocking send:
  • Will block until the message has been copied to OS/network stack buffers
  • May block until the message has been received at the destination process
  • The buffer can be overwritten after the function returns
31
Blocking Send
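The call shown on this slide was an image; for reference, the C binding of the blocking send has the following signature, with the parameter roles matching the message information listed earlier:

int MPI_Send(const void *buf,        /* address of the data to send           */
             int count,              /* number of elements in the send buffer */
             MPI_Datatype datatype,  /* e.g., MPI_INT, MPI_DOUBLE             */
             int dest,               /* rank of the destination process       */
             int tag,                /* user-chosen message tag               */
             MPI_Comm comm);         /* communicator, e.g., MPI_COMM_WORLD    */

A typical call such as MPI_Send(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD) sends five integers to rank 1 with tag 0.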

32
Blocking Receive

• Performs a blocking receive: Waits until a message has been received

33
Blocking Receive
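The matching receive call was also shown as an image; the C binding is:

int MPI_Recv(void *buf,              /* address of the receive buffer                    */
             int count,              /* capacity of the buffer (must be large enough)    */
             MPI_Datatype datatype,  /* must match the sender's datatype                 */
             int source,             /* rank of the sender, or MPI_ANY_SOURCE            */
             int tag,                /* tag to match, or MPI_ANY_TAG                     */
             MPI_Comm comm,          /* communicator, e.g., MPI_COMM_WORLD               */
             MPI_Status *status);    /* actual source, tag and error code of the message */

If the status is not needed, MPI_STATUS_IGNORE can be passed instead.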

34
Deadlock

• Code in each MPI process:
  • Send(…, right_rank, …)
  • Recv(…, left_rank, …)
• Will block and never return, because every process blocks in Send, so the matching MPI.Recv can never be called in the right-hand neighbor

[Figure: ring of 7 processes, ranks 0 to 6]
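A minimal C sketch of this deadlock-prone pattern (the buffer size N is illustrative; for messages too large for the MPI implementation's internal buffering, every MPI_Send blocks and no process ever reaches its MPI_Recv):

#include <mpi.h>

#define N (1 << 20)   /* large message: typically forces the send to block */

int main(int argc, char *argv[]) {
    int rank, size;
    static int sendbuf[N], recvbuf[N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int right = (rank + 1) % size;           /* right neighbor in the ring */
    int left  = (rank - 1 + size) % size;    /* left neighbor in the ring  */

    /* Every process sends first and receives afterwards: deadlock */
    MPI_Send(sendbuf, N, MPI_INT, right, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, N, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}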

35
Non-Blocking Communications

• Separate communication into three phases:
  1. Initiate non-blocking communication
     • returns immediately
     • routine names start with MPI_I…
  2. Do some work
     • "latency hiding"
  3. Wait for the non-blocking communication to complete

36
Non-Blocking Examples

• Non-blocking send (process 0):
  Isend(...) → do some other work → Wait(...)

• Non-blocking receive (process 1):
  Irecv(...) → do some other work → Wait(...)

Wait(...) = waiting until the operation is locally completed

37
Test

• If we start an asynchronous operation, like sending or receiving...
• ...how do we know when we can change the data (being sent) or use the data (being received) if the operation returns immediately?

38
Wait

• If we start an asynchronous operation, like sending or receiving...
• ...we can come back later and wait for the operation to complete

39
Non-blocking Send
• Sometimes, we want to keep calculating while sending/receiving is going on: non-blocking operations
• Returns a request object that can be used to check the status of the operation

40
Non-blocking Recv
• Sometimes, we want to keep calculating while sending/receiving is going on: non-blocking operations
• Returns a request object that can be used to check the status of the operation

41
Non-blocking Send/Recv
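The original slide showed the calls as an image; the corresponding C bindings are listed below. The extra MPI_Request argument is the request object mentioned above, and MPI_Wait / MPI_Test are the completion calls from the previous slides:

int MPI_Isend(const void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm,
              MPI_Request *request);      /* handle for the pending send */

int MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm,
              MPI_Request *request);      /* handle for the pending receive */

int MPI_Wait(MPI_Request *request,
             MPI_Status *status);         /* block until the operation completes */

int MPI_Test(MPI_Request *request, int *flag,
             MPI_Status *status);         /* flag = 1 if it has completed */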

42
Non-Blocking Send

• Initiate non-blocking send
  • in the ring example: initiate a non-blocking send to the right neighbor
• Do some work:
  • in the ring example: receive the message from the left neighbor
• Now, the message transfer can be completed
• Wait for the non-blocking send to complete

[Figure: ring of 7 processes, ranks 0 to 6]

43
Non-Blocking Receive

• Initiate non-blocking receive
  • in the ring example: initiate a non-blocking receive from the left neighbor
• Do some work:
  • in the ring example: send the message to the right neighbor
• Now, the message transfer can be completed
• Wait for the non-blocking receive to complete

[Figure: ring of 7 processes, ranks 0 to 6]
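A minimal C sketch of the ring pattern described on this and the previous slide, assuming each process passes its rank one step around the ring: the non-blocking receive is initiated first, the send is the "work" phase, and MPI_Wait completes the transfer without the deadlock seen earlier:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, recv_val;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    /* 1. initiate non-blocking receive from the left neighbor */
    MPI_Irecv(&recv_val, 1, MPI_INT, left, 0, MPI_COMM_WORLD, &req);
    /* 2. do some work: send own rank to the right neighbor */
    MPI_Send(&rank, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
    /* 3. wait for the non-blocking receive to complete */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    printf("Process %d received %d from its left neighbor %d\n", rank, recv_val, left);
    MPI_Finalize();
    return 0;
}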

44
Requirements for Point-to-Point Communications

• For a communication to succeed:
  • Sender must specify a valid destination rank.
  • Receiver must specify a valid source rank.
  • The communicator must be the same.
  • Tags must match.
  • Message datatypes must match.
  • Receiver's buffer must be large enough.

45
Example – Simple send and receive

46
Example – Simple send and receive (Blocking)
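The listings on this and the previous slide were images; a minimal blocking send/receive in C between ranks 0 and 1 (reusing the 5-integer message from the earlier Messages slide; the values are illustrative) might look like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    int data[5] = {2345, 654, 96574, -12, 7676};   /* the 5-integer example message */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(data, 5, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received: %d %d %d %d %d\n",
               data[0], data[1], data[2], data[3], data[4]);
    }
    MPI_Finalize();
    return 0;
}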

47
Assignment – MPI Ping Pong Program

• Write pseudocode
• Two processes ping-pong an integer back and forth, incrementing it until it reaches a given value.

48
Example – Ping Pong
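The ping-pong listing on this slide was an image; a hedged C sketch of one possible solution (PING_PONG_LIMIT and the turn-taking scheme are illustrative, and exactly two processes are assumed):

#include <mpi.h>
#include <stdio.h>

#define PING_PONG_LIMIT 10   /* illustrative target value */

int main(int argc, char *argv[]) {
    int rank, counter = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int partner = 1 - rank;            /* assumes exactly two processes, ranks 0 and 1 */

    while (counter < PING_PONG_LIMIT) {
        if (rank == counter % 2) {     /* my turn: increment and send */
            counter++;
            MPI_Send(&counter, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);
            printf("Rank %d sent counter %d to rank %d\n", rank, counter, partner);
        } else {                       /* partner's turn: receive the updated value */
            MPI_Recv(&counter, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
    MPI_Finalize();
    return 0;
}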

50
Collective Communication

• All processes within a communicator can exchange information at the same time
• There are different semantics for the information exchange
  • either all processes or pair-wise
• Synchronization is usually implicitly contained
• Every collective operation can also be expressed with MPI_Send / MPI_Recv

51
Barrier

• One of the things to remember about collective communication is that it implies a synchronization point among processes. This means that all processes must reach a point in their code before they can all begin executing again.

• A barrier can be used to synchronize all processes in a communicator. Each process waits until all processes reach this point before proceeding further.

• MPI_Barrier() (see the sketch below)
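A minimal C sketch of the barrier call (the printed messages are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Process %d reached the barrier\n", rank);
    MPI_Barrier(MPI_COMM_WORLD);   /* nobody continues until every process has arrived */
    printf("Process %d passed the barrier\n", rank);

    MPI_Finalize();
    return 0;
}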

52
Barrier

• Process zero first calls MPI_Barrier at the first time snapshot (T1).

• While process zero is hung up at the barrier, processes one and three eventually make it (T2).

• When process two finally makes it to the barrier (T3), all of the processes then begin execution again (T4).

53
Broadcast, Scatter, & Gather

54
Broadcast
• The simplest way to do collective communication is a broadcast (see the sketch below)
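The broadcast listing was an image in the original; a minimal C sketch, assuming rank 0 is the root and the broadcast value 42 is illustrative:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42;                                   /* only the root has the value initially */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD); /* afterwards every process has it */
    printf("Process %d has value %d\n", rank, value);

    MPI_Finalize();
    return 0;
}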

55
Scatter
• Divides the root's array into n pieces of sendcount elements each, where n is the number of processes in the communicator
• If root: sends one piece to each of the n processes (including itself)
• If not root: receives a data piece (see the sketch below)
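The scatter listing was an image in the original; a minimal C sketch (the array contents are illustrative, and the fixed-size send buffer assumes at most 8 processes):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, piece;
    int sendbuf[8] = {10, 11, 12, 13, 14, 15, 16, 17};  /* root's array: one element per process */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* sendcount = 1: each process (including the root) receives one element */
    MPI_Scatter(sendbuf, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d of %d got %d\n", rank, size, piece);

    MPI_Finalize();
    return 0;
}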

56
Gather
• Complement to MPI_Scatter
• Receives data in small arrays from all processes in a communicator
• If root: combines all the data into one array
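A minimal C sketch of MPI_Gather, the complement of the scatter sketch above: each process contributes its rank and the root collects the values into one array (again assuming at most 8 processes for the fixed-size receive buffer):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    int recvbuf[8];                              /* only meaningful at the root */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* every process sends one integer (its rank); the root combines them */
    MPI_Gather(&rank, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Root gathered:");
        for (int i = 0; i < size; i++)
            printf(" %d", recvbuf[i]);
        printf("\n");
    }
    MPI_Finalize();
    return 0;
}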

57
Thank you

58
