Lecture 13-Derived Datatypes in MPI

The document discusses high-performance computing and parallel programming, focusing on MPI (Message Passing Interface) collective communication and derived data types. It outlines various MPI functions for synchronization, data movement, and global computation, as well as the importance of defining custom data types for efficient communication. Additionally, it covers the creation and management of derived data types, including MPI_Type_vector, MPI_Type_create_subarray, and MPI_Type_create_struct, to facilitate the handling of complex data structures in a portable manner.


Applied High-Performance Computing and Parallel Programming

Presenter: Liangqiong Qu

Assistant Professor

The University of Hong Kong


Administration

• Assignment 2 has been released


- Due April 1, 2025, Tuesday, 11:59 PM
- Important: HPC system accounts for the second assignment are available from
March 20 to April 1, 11:59 PM.
Review of Lecture 12: Collective Communication in MPI
▪ Collective communication allows you to exchange data among a group
of processes
▪ Rules for all collectives
• Data type matching
• Do not use tags
• Count must be exact, i.e., there is only one message length, and the buffer
must be large enough
▪ Types:
• Synchronization (barrier)
• Data movement (broadcast, scatter, gather)
• Global computation (reduction, scan)

▪ General assumption: MPI does a better job at collectives than you would by
emulating them with point-to-point calls
Review of Lecture 12: Synchronization and Data Movement
▪ Synchronization (barrier) MPI_Barrier(MPI_Comm comm)
• Explicit synchronization of all ranks from specified communicator
▪ Data movement (broadcast, scatter, gather)
• Broadcasting happens when one process wants to send the same information to every
other process.
MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
• Scatter: Distributes distinct messages from a single root rank to each rank in the
communicator.
MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype
recvtype, int root, MPI_Comm comm)
• Gather: Receives a message from each rank and places the i-th rank's message at the
i-th position in the receive buffer
int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int
recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm )

[Figure: broadcast, scatter, and gather data-movement patterns]
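
A minimal sketch of how the scatter and gather signatures above fit together (array names and values are illustrative; assumes the program runs with exactly 4 ranks):

// Each of 4 ranks receives one int from root 0 via MPI_Scatter,
// doubles it locally, and root collects the results via MPI_Gather.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, x;
    int send[4] = {10, 20, 30, 40};   // meaningful on root only
    int recv[4];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Scatter(send, 1, MPI_INT, &x, 1, MPI_INT, 0, MPI_COMM_WORLD);
    x *= 2;                           // local work on the received element
    MPI_Gather(&x, 1, MPI_INT, recv, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0)
        for (int i = 0; i < 4; i++) printf("recv[%d] = %d\n", i, recv[i]);
    MPI_Finalize();
    return 0;
}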
Review of Lecture 12: Global Computation in MPI
▪ Global computation (MPI_Reduce, MPI_Scan)
• MPI_Reduce: Collective computation operation. Applies a reduction operation on all tasks in
communicator and places the result in root rank.
MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
MPI_Op op, int root, MPI_Comm comm);

• MPI_Scan: Performs a prefix reduction of the data stored in sendbuf: each process
with rank i receives in its recvbuf the reduction of the values from processes 0 through i.
MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

MPI_Op op here indicates the reduce operation (MPI predefined or your own)

[Figure: MPI_Scan prefix reduction across ranks 0 to 3]

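A minimal sketch of the two calls above (the contributed values are illustrative):

// Sum each rank's value with MPI_Reduce (result on root only) and
// compute the running prefix sum with MPI_Scan (result on every rank).
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, total, prefix;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int value = rank + 1;             // each rank contributes rank+1
    MPI_Reduce(&value, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: prefix sum = %d\n", rank, prefix);
    if (rank == 0) printf("total = %d\n", total);
    MPI_Finalize();
    return 0;
}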

Outline

▪ MPI Derived Data Types

▪ Why derived data types?

▪ MPI_Type_vector

▪ MPI_Type_create_subarray

▪ MPI_Type_create_struct
Prerequisite: Bits and Bytes in Computer
• A bit is the smallest unit of information in computing and
digital communication.
• Everything in a computer is 0's and 1's. The bit stores just
a 0 or 1: it's the smallest building block of storage.

• Byte: One byte = a collection of 8 bits, e.g. 0 1 0 1 1 0 1 0

Review of Lecture 4: At the core of CPU performance lies the transistor, a tiny
electronic switch that can either allow a signal to pass (representing the on state,
or 1) or block it (representing the off state, or 0). This fundamental behavior of
transistors is what enables the representation of a bit, the smallest unit of data
in computing.
Prerequisite: Bits and Bytes in Computer
• The byte is a unit of digital information that most commonly consists of 8 bits.
Historically, the byte was the number of bits used to encode a single character of
text in a computer.
• Different data types (e.g., int, float) require different amounts of memory
(bits/bytes).
• In programming, specifying the data type of a variable tells the computer how much
memory to allocate and how to interpret the data.
Predefined Data Types in MPI
• Different data types (e.g., int, float) require
different amounts of memory (bits/bytes).
• MPI provides predefined datatypes
like MPI_INT and MPI_FLOAT to ensure proper
memory allocation and data interpretation during
communication.

• Unlike other MPI datatypes (e.g., MPI_INT, MPI_FLOAT), MPI_BYTE does not
assume any particular data format or structure. It is simply a contiguous
sequence of bytes, making it useful for transferring binary data or data of
unknown or mixed types.
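
A minimal sketch of transferring an opaque buffer as raw bytes (buffer contents and ranks are illustrative; assumes at least 2 ranks):

// Rank 0 sends a 64-byte buffer to rank 1 as MPI_BYTE. No representation
// conversion takes place, so both sides must agree on the layout.
#include <mpi.h>
#include <string.h>

int main(int argc, char **argv) {
    char buffer[64];
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        memset(buffer, 0xAB, sizeof(buffer));   // fill with arbitrary binary data
        MPI_Send(buffer, 64, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buffer, 64, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}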
Predefined Data Types in MPI
• Different machines store data types differently, such
as Little-Endian and Big-Endian systems:
• Little-Endian: the least significant byte of a multibyte data type is
stored first.
• Big-Endian: the most significant byte of a multibyte data type is
stored first.

• MPI derived data types provide support for heterogeneous systems:
automatic data type conversion.
• Process A on a Little-Endian machine sends a 32-bit
integer to Process B on a Big-Endian machine using
MPI.
• MPI automatically handles the conversion, ensuring
the data is correctly interpreted by both processes.
C Structures
▪ Structures (or structs) allow you to group multiple related variables into a single
unit. Each variable within the structure is called a member.
▪ To define a structure, use the struct keyword and declare its members inside curly
braces {}. To access a member of a structure, use the dot syntax (.):

C requires data types to be stored at memory addresses that are multiples of
their size due to hardware design and performance optimization.
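
A minimal sketch of defining a struct and accessing its members (the struct and its field names are illustrative); note that sizeof may exceed the sum of the members because the compiler inserts alignment padding:

#include <stdio.h>

struct Particle {
    char   tag;     // 1 byte
    double x;       // 8 bytes, typically aligned to an 8-byte boundary
    int    id;      // 4 bytes
};

int main(void) {
    struct Particle p;
    p.tag = 'A';    // access members with the dot syntax
    p.x   = 3.14;
    p.id  = 42;
    // On a typical 64-bit system: 13 bytes of members, sizeof(struct) = 24.
    printf("sum of members = %zu, sizeof(struct) = %zu\n",
           sizeof(char) + sizeof(double) + sizeof(int),
           sizeof(struct Particle));
    return 0;
}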
C Structures
▪ Structures (or structs) allow you to group multiple related variables into a single
unit. Each variable within the structure is called a member.
▪ To define a structure, use the struct keyword and declare its members inside curly
braces {}. To access a member of a structure, use the dot syntax (.):

NOTE: MPI is a library and it has no idea about the C struct that we have set up
in our main program. C structs can have different memory layouts and padding
depending on the compiler and architecture, making them non-portable for
communication between different MPI processes.
Why MPI Derived Datatypes
▪ Beyond the predefined MPI datatypes, it is possible to define new datatypes by
grouping existing ones. This class of datatypes is called derived datatypes.
▪ Example: Root reads configuration and broadcasts it to all others

Want to do something like:

MPI_Bcast(&cfg, 1, <type cfg>, …);

However, MPI is a library and it has no idea about the struct that we have set up
in our main program.

MPI_Bcast(&cfg, sizeof(cfg), MPI_BYTE, …) is also not a solution: it is not
portable, as no data conversion can take place for MPI_BYTE.
Why MPI Derived Datatypes
▪ Example: Send column of matrix (noncontiguous in C):
• Send each element alone?
• Manually copy elements out into a contiguous buffer and send it?
MPI Derived Datatypes
▪ MPI allows the programmer to create their own data types, analogous to defining
structures in C. This class of data is the derived datatype.
▪ Derived datatypes in MPI can be used for grouping data of different datatypes for
communication, and for grouping noncontiguous data for communication.
▪ Three steps to create a new MPI data type
• Construct the new data type
MPI_Datatype newtype;
MPI_Type_*(…); // define the new data type
Use a function like MPI_Type_create_struct, MPI_Type_vector, or MPI_Type_create_subarray to
define the layout of the new data type.
• Commit new data type with
MPI_Type_commit(MPI_Datatype * newtype);
A datatype object has to be committed before it can be used in a communication.
• After use, deallocate the data type with
MPI_Type_free(MPI_Datatype * newtype);
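
A minimal sketch of the three-step lifecycle above. MPI_Type_contiguous (a simple constructor for count adjacent elements) stands in for any MPI_Type_* constructor; ranks and buffer values are illustrative, and at least 2 ranks are assumed:

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, buf[4] = {1, 2, 3, 4};
    MPI_Datatype newtype;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // 1. Construct: any MPI_Type_* constructor works the same way.
    MPI_Type_contiguous(4, MPI_INT, &newtype);
    // 2. Commit before using the type in communication.
    MPI_Type_commit(&newtype);
    if (rank == 0)
        MPI_Send(buf, 1, newtype, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, 1, newtype, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    // 3. Free the type after use.
    MPI_Type_free(&newtype);
    MPI_Finalize();
    return 0;
}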
MPI Derived Datatypes
▪ MPI has the following functions to define MPI derived datatypes

• MPI_Type_create_struct(…)
specifies the data layout of user-defined structs (or classes)

• MPI_Type_vector(…)
specifies strided data, i.e. same-type data with missing
elements

• MPI_Type_create_subarray(…)
specifies sub-ranges of multi-dimensional arrays
A Flexible, Vector-Like Type: MPI_Type_vector
▪ Creates a vector (strided) datatype:
MPI_Type_vector(int count, int blocklength, int stride, MPI_Datatype
oldtype, MPI_Datatype * newtype);

Input arguments:
• count is the number of contiguous blocks
• blocklength is the number of elements in each block
• stride is the number of elements between the start of each block
• oldtype is the datatype of the elements

Output arguments:
• newtype: new datatype (handle)
A Flexible, Vector-Like Type: MPI_Type_vector
MPI_Type_vector(int count, int blocklength, int stride, MPI_Datatype oldtype,
MPI_Datatype * newtype);

• count: 2 (no. of blocks)
• blocklength: 3 (no. of elements in each block)
• stride: 5 (no. of elements between the start of each block)
• oldtype: MPI_INT
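
A minimal sketch with exactly these parameters (buffer contents are illustrative; assumes at least 2 ranks): the type picks elements 0-2 and 5-7 of the array.

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, a[10], b[6];
    MPI_Datatype vec;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < 10; i++) a[i] = i;
    // count=2, blocklength=3, stride=5
    MPI_Type_vector(2, 3, 5, MPI_INT, &vec);
    MPI_Type_commit(&vec);
    if (rank == 0)
        MPI_Send(a, 1, vec, 1, 0, MPI_COMM_WORLD);   // sends 0,1,2,5,6,7
    else if (rank == 1)                              // receive as 6 contiguous ints
        MPI_Recv(b, 6, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Type_free(&vec);
    MPI_Finalize();
    return 0;
}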
Derived Type Size and Extent
▪ Get the total size (in bytes) of a datatype in a message:
• MPI_Type_size(MPI_Datatype datatype, int *size);
• The size of the datatype refers to the total number of bytes occupied by the
datatype, not including any gaps.

▪ Get the lower bound and the extent (span from the first byte to the last byte) of
a datatype:
• MPI_Type_get_extent(MPI_Datatype datatype, MPI_Aint *lb, MPI_Aint *extent);
• The lower bound refers to the starting byte address of the datatype, while the extent
represents the span from the first byte to the last byte of the datatype.
• MPI_Aint is an MPI type that represents an address or offset in memory.
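
A minimal sketch of size versus extent, using the 2-blocks-of-3-ints, stride-5 vector type from the previous slides (assumes a 4-byte int):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Datatype vec;
    MPI_Aint lb, extent;
    int size;
    MPI_Init(&argc, &argv);
    MPI_Type_vector(2, 3, 5, MPI_INT, &vec);
    MPI_Type_commit(&vec);
    MPI_Type_size(vec, &size);              // 6 ints = 24 bytes, gaps excluded
    MPI_Type_get_extent(vec, &lb, &extent); // lb = 0; spans 8 ints = 32 bytes
    printf("size = %d, lb = %ld, extent = %ld\n", size, (long)lb, (long)extent);
    MPI_Type_free(&vec);
    MPI_Finalize();
    return 0;
}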
How to Obtain and Handle Address
▪ int MPI_Get_address(const void *location, MPI_Aint *address);
• Get the address of a location in memory
• (input argument) location: The element to obtain the address of.
• (output argument) address: Address of location

▪ MPI_Aint MPI_Aint_diff(MPI_Aint addr1, MPI_Aint addr2);
• Returns the difference between addr1 and addr2

▪ Example: taking the difference between the addresses of two elements 50
positions apart in an array of 8-byte elements, the result would usually be
disp = 400 (50 x 8).
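
A minimal sketch of this example (the element type double is an assumption inferred from the 50 x 8 figure; the array name is illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    double a[100];                       // 8-byte elements
    MPI_Aint addr0, addr50, disp;
    MPI_Init(&argc, &argv);
    MPI_Get_address(&a[0], &addr0);      // address of the first element
    MPI_Get_address(&a[50], &addr50);    // address of the 51st element
    disp = MPI_Aint_diff(addr50, addr0); // usually 400 bytes (50 x 8)
    printf("disp = %ld\n", (long)disp);
    MPI_Finalize();
    return 0;
}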


How to Obtain and Handle Address
▪ MPI_Aint MPI_Aint_add(MPI_Aint base, MPI_Aint disp);
• base: The address to start from. disp: The displacement to apply to the start address.
Displacement refers to the difference or offset between two memory addresses.
• Return: The address obtained by adding the displacement to the base address

• MPI_Aint_add returns the address obtained by adding the displacement to the
base address.
• MPI_Get_address gets the address of a[4] and saves it to
orig_address_fifth_element.
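
A minimal sketch matching the description above (the array type and the 2-element displacement are illustrative; the variable name orig_address_fifth_element follows the slide):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    double a[10];
    MPI_Aint orig_address_fifth_element, new_address;
    MPI_Init(&argc, &argv);
    // Get the address of a[4] (the fifth element).
    MPI_Get_address(&a[4], &orig_address_fifth_element);
    // Apply a displacement of 2 elements (2 x 8 bytes); the result is
    // usually the address of a[6].
    new_address = MPI_Aint_add(orig_address_fifth_element, 2 * sizeof(double));
    printf("moved by %ld bytes\n",
           (long)MPI_Aint_diff(new_address, orig_address_fifth_element));
    MPI_Finalize();
    return 0;
}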
Use of MPI_Type_vector: Sending a Column of a Matrix in C
• Row-major data layout in C → cannot use a plain array. In C, multi-dimensional
arrays are stored in row-major order in memory: the elements of the first row of
the array are stored first, followed by the elements of the second row, and so on.

[Figure: row-major memory layout of a 3×3 matrix a in C]
Use of MPI_Type_vector: Sending a Column of a Matrix in C
• Row-major data layout in C → cannot use a plain array. We can instead use
MPI_Type_vector to create a strided datatype that tells MPI how to access the memory in
such a way that it matches the data layout of a column.
Use of MPI_Type_vector: Sending a Column of a Matrix in C

• MPI_Type_vector creates a vector (strided) datatype, with count = nrows,
blocklength = 1 (the number of elements in each block), stride = ncols (the
number of elements between the start of each block), and the original
MPI_FLOAT data type.
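
A minimal sketch of sending one column of a row-major float matrix with exactly this vector type (matrix size, column index, and ranks are illustrative; assumes at least 2 ranks):

#include <mpi.h>

int main(int argc, char **argv) {
    enum { nrows = 3, ncols = 3 };
    float a[nrows][ncols], col_buf[nrows];
    int rank, col = 1;                       // send the second column
    MPI_Datatype column;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < nrows; i++)
        for (int j = 0; j < ncols; j++) a[i][j] = 10.0f * i + j;
    // One float per block, blocks ncols floats apart: one column.
    MPI_Type_vector(nrows, 1, ncols, MPI_FLOAT, &column);
    MPI_Type_commit(&column);
    if (rank == 0)
        MPI_Send(&a[0][col], 1, column, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)                      // receive as nrows contiguous floats
        MPI_Recv(col_buf, nrows, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}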
A Sub-array Type: MPI_Type_create_subarray
MPI_Type_create_subarray(int ndims, const int array_of_sizes[], const int
array_of_subsizes[], const int array_of_starts[], int order, MPI_Datatype
oldtype, MPI_Datatype *newtype)

Input arguments:
• ndims: number of array dimensions
• array_of_sizes: number of elements in each dimension of the full array
• array_of_subsizes: number of elements in each dimension of the subarray
• array_of_starts: starting coordinates of the subarray in each dimension
• order: array storage order flag (row-major: MPI_ORDER_C or column-major:
MPI_ORDER_FORTRAN)

Output arguments:
• newtype: new datatype (handle)
A Sub-array Type: MPI_Type_create_subarray
MPI_Type_create_subarray(int ndims, const int array_of_sizes[],
const int array_of_subsizes[], const int array_of_starts[], int order,
MPI_Datatype oldtype, MPI_Datatype *newtype)

• ndims: 2 (number of array dimensions)
• array_of_sizes: {nrows, ncols} (dimensions of the original full array)
• array_of_subsizes: {nrows-2, ncols-2} (dimensions of the subarray)
• array_of_starts: {1, 1}
• order: MPI_ORDER_C
• oldtype: MPI_INT
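
A minimal sketch with these parameters, describing the interior (all but a one-element border) of an nrows x ncols int array (array size and ranks are illustrative; assumes at least 2 ranks):

#include <mpi.h>

int main(int argc, char **argv) {
    enum { nrows = 5, ncols = 5 };
    int a[nrows][ncols], rank;
    MPI_Datatype interior;
    int sizes[2]    = {nrows, ncols};
    int subsizes[2] = {nrows - 2, ncols - 2};
    int starts[2]   = {1, 1};                // skip the first row and column
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < nrows; i++)
        for (int j = 0; j < ncols; j++) a[i][j] = 10 * i + j;
    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_INT, &interior);
    MPI_Type_commit(&interior);
    if (rank == 0)
        MPI_Send(a, 1, interior, 1, 0, MPI_COMM_WORLD);  // the 9 interior ints
    else if (rank == 1) {
        int buf[(nrows - 2) * (ncols - 2)];
        MPI_Recv(buf, (nrows - 2) * (ncols - 2), MPI_INT, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Type_free(&interior);
    MPI_Finalize();
    return 0;
}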
Most Flexible Type: MPI_Type_create_struct
▪ MPI does not directly support sending or receiving C struct types because MPI is
designed to be portable across different architectures and programming
languages. C structs can have different memory layouts and padding depending
on the compiler and architecture, making them non-portable for communication
between different MPI processes.

▪ To ensure portability, MPI provides functions like `MPI_Type_create_struct` to
create new MPI datatypes that can represent complex structures.

MPI_Type_create_struct(int block_count, const int block_lengths[], const MPI_Aint displs[],
MPI_Datatype block_types[], MPI_Datatype* new_datatype);
Most Flexible Type: MPI_Type_create_struct
▪ MPI_Type_create_struct is the most flexible routine to create an MPI datatype. It
describes blocks with arbitrary data types and arbitrary displacements.
MPI_Type_create_struct(int block_count, const int block_lengths[], const MPI_Aint displs[],
MPI_Datatype block_types[], MPI_Datatype* new_datatype);

Input arguments:
• block_count: The number of blocks to create.
• block_lengths : Array containing the length of each block.
• displs: Array containing the displacement for each block, expressed in bytes.
The displacement is the distance between the start of the MPI datatype created
and the start of the block.
• block_types : Type of elements in each block

Output arguments:
• newtype: new datatype (handle)
Most Flexible Type: MPI_Type_create_struct
▪ MPI_Type_create_struct is the most flexible routine to create an MPI datatype. It
describes blocks with arbitrary data types and arbitrary displacements.

MPI_Type_create_struct(int block_count, const int block_lengths[], const MPI_Aint displs[],
MPI_Datatype block_types[], MPI_Datatype* new_datatype);

• The contents of displs are either the displacements in bytes of the block bases
or MPI addresses.
• displs (displacements) are important in order to let MPI know where each field
is located in memory so it can correctly pack, send, receive, and unpack the data.
Most Flexible Type: MPI_Type_create_struct
MPI_Type_create_struct(int block_count, const int block_lengths[], const MPI_Aint displs[],
MPI_Datatype block_types[], MPI_Datatype* new_datatype);
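
A minimal sketch of building an MPI datatype for a struct (the struct and its field names are illustrative, not from the slides), using MPI_Get_address and MPI_Aint_diff so that compiler padding is handled correctly:

#include <mpi.h>

struct Particle {
    char   tag;
    double x;
    int    id;
};

int main(int argc, char **argv) {
    struct Particle p = {'A', 3.14, 42};
    MPI_Datatype particle_type;
    int          block_lengths[3] = {1, 1, 1};
    MPI_Aint     displs[3], base;
    MPI_Datatype block_types[3]  = {MPI_CHAR, MPI_DOUBLE, MPI_INT};
    MPI_Init(&argc, &argv);
    MPI_Get_address(&p, &base);              // start of the struct
    MPI_Get_address(&p.tag, &displs[0]);
    MPI_Get_address(&p.x,   &displs[1]);
    MPI_Get_address(&p.id,  &displs[2]);
    for (int i = 0; i < 3; i++)              // offsets in bytes from the start
        displs[i] = MPI_Aint_diff(displs[i], base);
    MPI_Type_create_struct(3, block_lengths, displs, block_types, &particle_type);
    MPI_Type_commit(&particle_type);
    MPI_Bcast(&p, 1, particle_type, 0, MPI_COMM_WORLD);  // now portable
    MPI_Type_free(&particle_type);
    MPI_Finalize();
    return 0;
}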
Derived Data Types: Summary
▪ A flexible tool to communicate complex data structures in MPI
▪ Most important calls:

• MPI_Type_create_struct(…)
specifies the data layout of user-defined structs (or classes)
• MPI_Type_vector(…)
specifies strided data, i.e. same-type data with missing elements
• MPI_Type_create_subarray(…)
specifies sub-ranges of multi-dimensional arrays
• MPI_Type_commit, MPI_Type_free
• MPI_Get_address, MPI_Aint_add, MPI_Aint_diff

▪ Matching rule: send and receive match if specified basic datatypes match one by
one, regardless of displacements
▪ Correct displacements at the receiver side are automatically matched to the
corresponding data items
Thank You!
