XcalableMP PGAS Programming Language
From Programming Model to Applications
Mitsuhisa Sato, Editor
Editor
Mitsuhisa Sato
Programming Envt Research Team
RIKEN Center for Computational Science
Kobe, Hyogo, Japan
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
Germany, and France, had been conducted to address fundamental research on the
various aspects of HPC software during 2016–2018 (phase II). The project “MUST
Correctness Checking for YML and XMP Programs (MYX)” was selected; in it,
the correctness-checking tool MUST is applied to XcalableMP programs.
XcalableMP was adopted as a parallel programming language project in the
FLAGSHIP 2020 project, whose goal was to develop the Japanese flagship supercomputer,
Fugaku, in order to improve the productivity and performance of parallel programming.
XcalableMP is now available on Fugaku and the performance is enhanced by the
Fugaku interconnect, Tofu-D. Chapter “XcalableMP 2.0 and Future Directions”
presents the current status of XcalableMP on Fugaku and a proposal for the next
version, XcalableMP 2.0. I conclude this book with challenges for future PGAS
models and some retrospectives about XcalableMP.
I hope this book will provide useful information to XcalableMP programmers.
Furthermore, for future exascale computing and beyond, an important role
of a programming model is to bridge the gap between a new architecture and
the programmer’s view, as well as to provide a comprehensive architectural view of the
system. I believe that the PGAS model and its extensions will remain an important
and suggestive model integrating computation and memory.
Finally, I would like to thank the members of the XcalableMP Specification
Working Group and all the people who contributed to the XcalableMP project.
1 Introduction
Note For more details, please refer to: Ken Kennedy, Charles Koelbel and Hans Zima:
The Rise and Fall of High Performance Fortran: An Historical Object Lesson, Proc.
3rd ACM SIGPLAN History of Programming Languages Conf. (HOPL-III),
pp. 7-1–7-22 (2007).
it is assumed that accessing remote memory may be much slower than accessing
local memory.
The execution entities in an XMP program are referred to as XMP nodes or, more
simply, nodes, each of which has its own memory and can communicate with the others.
An XcalableMP program execution is based on the Single Program Multiple
Data (SPMD) model, where each node starts execution from the same main routine,
and continues to execute the same code independently (i.e. asynchronously) until it
encounters an XcalableMP construct (Fig. 2).
A set of nodes that executes a procedure, statement, loop, block, etc. is referred
to as its executing node set, and is determined dynamically (that is, at runtime) by
the innermost task, loop, or array directive surrounding it. The current executing
node set is the executing node set of the current context, which is managed by the
XcalableMP runtime system on each node.
The current executing node set at the beginning of the program execution, or
entire node set, is a node set that contains all the available nodes, which can be
specified in an implementation-defined way (e.g. through a command-line option).
When, at runtime, a node encounters a loop, array, or task construct
and is contained in the node set specified (explicitly or implicitly) by the on clause
of the directive, it replaces the current executing node set with the specified one
and executes the body of the construct, after which it restores the previous executing
node set and proceeds to execute the subsequent statements.
In particular, when a node in the current executing node set encounters a loop
or an array construct, it executes the loop or the array assignment in parallel with
the other nodes, so that each iteration of the loop or element of the assignment is
independently executed by the node in which the specified data element resides.
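The rule that each iteration is executed by the node holding the corresponding data element can be modelled in a few lines of Python. This is only a sketch of the idea, not XcalableMP code: the names block_owner and parallel_scale are hypothetical, a block distribution is assumed, and the nodes are simulated one after another in a single process.

```python
def block_owner(i, n, k):
    """Node that owns element i of n elements block-distributed over k nodes."""
    chunk = -(-n // k)  # ceil(n / k)
    return i // chunk


def parallel_scale(data, k):
    """Simulate k nodes, each doubling only the elements it owns."""
    n = len(data)
    result = list(data)
    for node in range(k):          # one pass per simulated node
        for i in range(n):         # every node sees the full loop range
            if block_owner(i, n, k) == node:
                result[i] = 2 * data[i]   # executed by the owner only
    return result
```

In this model the result does not depend on the number of simulated nodes, because every element is updated exactly once by its owner.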
When a node encounters a synchronization or a communication directive,
synchronization or communication occurs between it and the other nodes. That is,
such global constructs are performed collectively by the current executing nodes.
Note that neither synchronization nor communication occurs unless these constructs
are specified.
There are two classes of data in XcalableMP: global data and local data. Data
declared in an XcalableMP program are local by default.
Global data are distributed onto a node set by the align directive (see
Sect. 2.4). Each fragment of distributed global data is allocated in the local memory
of a node in the node set.
Local data comprise all data that are not global. They are replicated within the
local memory of each of the executing nodes.
A node can directly access only local data and the sections of global data that reside
in its local memory. To access data in remote memory, explicit communication
must be specified by means such as global communication constructs and coarray
assignments (Fig. 3).
XMP can be classified as a partitioned global address space (PGAS) language, like
Co-Array Fortran [1], Unified Parallel C [2], and Chapel [3].
In such PGAS languages, multiple executing entities (i.e. threads, processes, or
nodes in XMP) share a part of their address space; this shared space is partitioned,
and a portion of it is local to each executing entity.
XMP supports two programming models, global-view and local-view, to achieve
high performance and productivity on PGAS; they are explained below.
The global-view programming model is useful when, starting from a serial version
of a program, the programmer parallelizes it in a data-parallel style by adding
directives with minimum modification. Based on this model, the programmer
specifies the distribution of data among nodes using the data distribution directives.
The loop construct assigns each iteration of a loop to the node at which the
computed data is located. The global-view communication directives are used to
synchronize nodes, maintain the consistency of shadow areas of distributed data,
and move sections of distributed data globally. Note that the programmer must
explicitly specify communication, using appropriate directives, so that all data
references in the program are local.
In many cases, the XcalableMP program following the global-view programming
model is based on a serial program, and it can produce the same result, regardless
of the number of nodes (Fig. 4).
There are three groups of directives for this model:
• Data mapping, which specifies the data distribution and mapping to nodes
• Work mapping (parallelization), which specifies the work distribution and
mapping to nodes.
• Communication and synchronization, which specify how a node communicates
and synchronizes with the other nodes.
Because these directives are ignored, like comments, by the compilers of the base
languages (Fortran and C), an XcalableMP program can usually also be compiled by
those compilers and still run properly.
For the local-view model, some language extensions and directives are provided. The
coarray notation, which is imported from Fortran 2008, is one such extension,
and can be used to specify explicitly which node's data is to be accessed. For
example, the expression A(i)[N] in XcalableMP Fortran is used to access the
array element A(i) located on node N. If the access is a reference, then
a one-sided communication to read the value from the remote memory (i.e. the
get operation) is issued by the executing node. If the access is a definition, then
a one-sided communication to write the value to the remote memory (i.e. the put
operation) is issued by the executing node.
In the global-view model, nodes are used to distribute data and work. In the
local-view model, nodes are used to address remote data in the coarray notation. In
application programs, the programmers should choose an appropriate data model
according to the characteristics of their program. Figure 6 illustrates the global view
and the local view of data.
Data can have both a global view and a local view, and can be accessed in both of
the views. XcalableMP provides a directive to give the local name (alias) to global
data declared in the global-view programming model to enable them to also be
accessed in the local-view programming model. This feature is useful to optimize
a certain part of a program by using explicit remote data access in the local-view
programming model.
In XcalableMP C, the base language C is extended so that a part of an array, that is,
an array section or subarray, can appear in an array assignment statement, which
is described in Sect. 1.5.2, and in some XcalableMP constructs. An array section is
built from a subset of the elements of an array, which is specified by a sequence of
square-bracketed integer expressions or triplets of the form:
[ base ] : [ length ] [ : step ]
When step is positive, the triplet specifies a set of subscripts that is a regularly
spaced integer sequence of length length, beginning with base and proceeding in
increments of step. The same applies when step is negative.
When base is omitted, it is assumed to be 0. When length is omitted, it is assumed
to be the number of remaining elements of the dimension of the array. When step is
omitted, it is assumed to be 1.
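The triplet rules can be made concrete with a small Python model that expands a triplet into the list of subscripts it selects. This is a sketch under stated assumptions, not part of XcalableMP: expand_triplet is a hypothetical helper, only positive steps are handled, and when length is omitted together with a step larger than 1 the function assumes that the count is the number of remaining elements reachable with that step.

```python
def expand_triplet(size, base=None, length=None, step=None):
    """Subscripts selected by [base : length : step] in a dimension of `size`."""
    base = 0 if base is None else base      # omitted base defaults to 0
    step = 1 if step is None else step      # omitted step defaults to 1
    if step <= 0:
        raise ValueError("this sketch handles positive steps only")
    if not 0 <= base <= size:
        raise IndexError("base is outside the dimension")
    if length is None:
        # Assumption: count the remaining elements reachable with `step`.
        length = -(-(size - base) // step)
    subscripts = [base + k * step for k in range(length)]
    if subscripts and subscripts[-1] >= size:
        raise IndexError("triplet runs past the end of the dimension")
    return subscripts
```

For example, expand_triplet(10, 2, 3, 2) selects subscripts 2, 4, and 6, and expand_triplet(10, base=7) selects the remaining elements 7, 8, and 9.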
1.6 Interoperability
Most existing parallel applications are written with MPI. It is not realistic to
port them to XMP because such applications can consist of millions of lines.
Because XMP is interoperable with MPI, users can develop an XMP application
by modifying part of an existing one instead of rewriting it entirely. Besides, when
developing a parallel application from scratch, users can write a complicated part,
such as domain decomposition, in XMP while using MPI, which
could be faster than XMP, for a hot spot that needs to be tuned carefully. In
addition, XMP is interoperable with OpenMP and Python (see Chap. 5).
It might be difficult to develop an application with just one programming
language or framework, since each has its own strengths and weaknesses. Thus,
an XMP program is interoperable with programs in other languages to provide both high
productivity and performance.
2 Data Mapping
The nodes directive declares a one-dimensional node array p that includes four
nodes. In XMP/C, it is zero-based and consists of p[0], p[1], p[2], and p[3].
In XMP/Fortran, it is one-based and consists of p(1), p(2), p(3), and p(4).
XcalableMP C
#pragma xmp nodes p[2][3]
XcalableMP Fortran
!$xmp nodes p(3,2)
The nodes directive declares a two-dimensional node array p that includes six
nodes. In XMP/C, it consists of p[0][0], p[0][1], p[0][2], p[1][0],
p[1][1], and p[1][2]. In XMP/Fortran, it consists of p(1,1), p(2,1),
p(3,1), p(1,2), p(2,2), and p(3,2).
Note The ordering of the elements in a node array follows that of a normal array
in the base language, C or Fortran.
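The ordering rule in the Note can be illustrated with a short Python model that converts a linear position in the node array into an index tuple, row-major for C and column-major for Fortran. This is a sketch only: c_index and fortran_index are hypothetical helpers, and the linear position ("rank") is a term introduced here for illustration.

```python
def c_index(rank, shape):
    """Zero-based C index of the rank-th node: last dimension varies fastest."""
    index = []
    for extent in reversed(shape):
        index.append(rank % extent)
        rank //= extent
    return tuple(reversed(index))


def fortran_index(rank, shape):
    """One-based Fortran index of the rank-th node: first dimension varies fastest."""
    index = []
    for extent in shape:
        index.append(rank % extent + 1)
        rank //= extent
    return tuple(index)
```

Enumerating ranks 0 to 5 with shape (2, 3) for C and (3, 2) for Fortran reproduces the two listings given above for p[2][3] and p(3,2).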
XcalableMP C
#pragma xmp nodes p[*]
XcalableMP Fortran
!$xmp nodes p(*)
XcalableMP Fortran
!$xmp nodes p(3,*)
When the programmer runs the sample code with 12 nodes, the node array p
has a shape of 4 × 3, in C, or 3 × 4, in Fortran.
Note The programmer can put an asterisk only in the last dimension, in
XMP/Fortran, or the first dimension, in XMP/C, of the node array.
Hint The dynamic node array may interfere with compiler optimizations. In
general, programs with static ones achieve better performance.
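How the asterisk dimension is resolved at run time can be sketched in Python. This is an illustrative model rather than the runtime's actual algorithm: resolve_dynamic_shape is a hypothetical name, and raising an error when the node count is not divisible by the fixed extents is an assumption of this sketch.

```python
def resolve_dynamic_shape(shape, total_nodes):
    """Replace the single "*" in `shape` so the extents multiply to total_nodes."""
    if shape.count("*") != 1:
        raise ValueError("exactly one dimension must be an asterisk")
    fixed = 1
    for extent in shape:
        if extent != "*":
            fixed *= extent
    if total_nodes % fixed != 0:
        # Assumption: an indivisible node count is treated as an error here.
        raise ValueError("node count is not divisible by the fixed extents")
    return tuple(total_nodes // fixed if e == "*" else e for e in shape)
```

With 12 nodes, resolve_dynamic_shape(("*", 3), 12) gives the 4 × 3 shape of the C case and resolve_dynamic_shape((3, "*"), 12) gives the 3 × 4 shape of the Fortran case.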
The programmer can declare a node subarray derived from an existing node array.
Node subarrays can be used, for example, to optimize inter-node communication by
reducing the number of nodes participating in the communication.
XcalableMP C
#pragma xmp nodes p[16]
#pragma xmp nodes q[8]=p[0:8]
#pragma xmp nodes r[4][2]=p[8:8]
XcalableMP Fortran
!$xmp nodes p(16)
!$xmp nodes q(8)=p(1:8)
!$xmp nodes r(2,4)=p(9:16)
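The correspondence between a node subarray and its parent can be modelled in Python for the XMP/C declarations above, where p[8:8] denotes eight nodes starting at p[8]. This is a sketch under an assumption: the elements of the subarray are taken to follow the row-major ordering of C, in line with the earlier Note on node-array ordering. The name subarray_member is hypothetical.

```python
def subarray_member(index, shape, base):
    """Index into the parent array p of element `index` of a node subarray.

    `shape` is the subarray shape and `base` the first parent index of the
    section it is carved from (row-major ordering is assumed).
    """
    offset = 0
    for i, extent in zip(index, shape):
        if not 0 <= i < extent:
            raise IndexError("subarray index out of range")
        offset = offset * extent + i
    return base + offset
```

Under this assumption, q[3] is p[3], r[0][0] is p[8], and r[3][1] is p[15].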
A colon represents a triplet which indicates all possible indices in the dimension.
An asterisk indicates the index of the current executing node in the dimension.
For example, col[2] corresponds to p[0][0:2] on nodes p[0][0] and
p[0][1], and to p[1][0:2] on nodes p[1][0] and p[1][1] in XMP/C.
Similarly, col(2) corresponds to p(1:2,1) on nodes p(1,1) and p(2,1),
and to p(1:2,2) on nodes p(1,2) and p(2,2) in XMP/Fortran.
In XMP/C, row[0] corresponds to p[0][0] and p[0][1] on p[:][0]
and p[:][1], respectively; col[0] corresponds to p[0][0], p[1][0],
p[2][0], and p[3][0] on p[0][:], p[1][:], p[2][:], p[3][:],
respectively. In XMP/Fortran, row(1) corresponds to p(1,1) and p(2,1) on
p(1,:) and p(2,:), respectively; col(1) corresponds to p(1,1), p(1,2),
p(1,3), and p(1,4) on p(:,1), p(:,2), p(:,3), p(:,4), respectively.
The template directive declares a template, which is a virtual array that is used
as a “template” of parallelization in the program and is to be distributed onto a node
array.
XcalableMP C
#pragma xmp template t[10]
XcalableMP Fortran
!$xmp template t(10)
Hint In many cases, a template should be declared to have the same shape as your
target array.
XcalableMP C
#pragma xmp template t[10][20]
XcalableMP Fortran
!$xmp template t(20,10)
The target template t is divided into contiguous blocks and distributed among
nodes in the node array p (Fig. 8). Suppose that the size of the template is
N and the number of nodes is K. If N is divisible by K, a block of size N/K is
assigned to each node; otherwise, a block of size ceil(N/K) is assigned to each of
floor(N/ceil(N/K)) nodes, a block of size mod(N, ceil(N/K)) to one node, and no
block to the remaining (K − floor(N/ceil(N/K)) − 1) nodes, where floor(x) is the
largest integer less than or equal to x. The block distribution is useful for regular
computations such as stencil computations.
Note The function ceil(x) returns the smallest integer greater than or equal to
x, and mod(x, y) returns x modulo y.
XcalableMP C
#pragma xmp nodes p[3]
#pragma xmp template t[22]
#pragma xmp distribute t[block] onto p
XcalableMP Fortran
!$xmp nodes p(3)
!$xmp template t(22)
!$xmp distribute t(block) onto p
Since ceil(22/3) is 8, eight elements are allocated on each of p[0] and p[1],
and the remaining six elements are allocated on p[2].
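The block sizes in this example can be checked with a short Python model. This is a sketch, not XcalableMP code: block_sizes is a hypothetical helper that hands out blocks of ceil(N/K) elements node by node until the template is exhausted.

```python
def block_sizes(n, k):
    """Number of template elements each of k nodes receives under block distribution."""
    chunk = -(-n // k)          # ceil(n / k)
    sizes = []
    remaining = n
    for _ in range(k):
        take = min(chunk, remaining)   # the last non-empty block may be shorter
        sizes.append(take)
        remaining -= take
    return sizes
```

For the 22-element template on three nodes this yields 8, 8, and 6 elements. A case such as block_sizes(5, 4) also shows that trailing nodes can receive no block at all.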
[Figure: the 22 elements of template t (indices 0–21 in XMP/C, 1–22 in XMP/Fortran) distributed among nodes p[0]/p(1), p[1]/p(2), and p[2]/p(3)]
The target template t is divided into chunks of size one and distributed among
nodes in the node array p in a round-robin manner (Fig. 9). The cyclic distribution
is useful for the case where the load on each element of the template is not balanced.
XcalableMP C
#pragma xmp nodes p[3]
#pragma xmp template t[22]
#pragma xmp distribute t[cyclic] onto p
XcalableMP Fortran
!$xmp nodes p(3)
!$xmp template t(22)
!$xmp distribute t(cyclic) onto p
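The round-robin assignment can be modelled in Python with zero-based indices as in XMP/C. This is a sketch; cyclic_elements is a hypothetical helper that lists, for each node, the template elements it receives.

```python
def cyclic_elements(n, k):
    """Template elements held by each of k nodes under cyclic distribution."""
    # Element i goes to node i mod k, so node r holds r, r + k, r + 2k, ...
    return [list(range(node, n, k)) for node in range(k)]
```

For the 22-element template on three nodes, p[0] receives elements 0, 3, 6, ..., 21 (eight elements), while p[1] and p[2] receive seven elements each.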
The target template t is divided into chunks of size w and distributed among
nodes in the node array p in a round-robin manner (Fig. 10). The block-cyclic
distribution is useful for the case where the load on each element of the template is
not balanced but the locality of the elements is required.
XcalableMP C
#pragma xmp nodes p[3]
#pragma xmp template t[22]
#pragma xmp distribute t[cyclic(3)] onto p
XcalableMP Fortran
!$xmp nodes p(3)
!$xmp template t(22)
!$xmp distribute t(cyclic(3)) onto p
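A Python model of the block-cyclic case with chunk size 3, using zero-based indices as in XMP/C, may help in reading this example. It is a sketch; block_cyclic_owner and block_cyclic_elements are hypothetical helpers.

```python
def block_cyclic_owner(i, w, k):
    """Node that owns template element i when chunks of w go round-robin to k nodes."""
    return (i // w) % k


def block_cyclic_elements(n, w, k):
    """Template elements held by each node under cyclic(w) distribution."""
    layout = [[] for _ in range(k)]
    for i in range(n):
        layout[block_cyclic_owner(i, w, k)].append(i)
    return layout
```

For 22 elements, w = 3, and three nodes, p[0] holds elements 0–2, 9–11, and 18–20, p[1] holds 3–5, 12–14, and 21, and p[2] holds 6–8 and 15–17.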
The target template t is divided into contiguous blocks of size W[0], W[1], · · · ,
in XMP/C, or W(1), W(2), · · · , in XMP/Fortran, and distributed among nodes in
the node array p (Fig. 11). The array W is called a mapping array. The programmer
can specify irregular (uneven) block distribution with the gblock format.
XcalableMP C
#pragma xmp nodes p[3]
#pragma xmp template t[22]
int W[3] = {6, 11, 5};
#pragma xmp distribute t[gblock(W)] onto p
XcalableMP Fortran
!$xmp nodes p(3)
!$xmp template t(22)
integer, parameter :: W(3) = (/6,11,5/)
!$xmp distribute t(gblock(W)) onto p
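The effect of the mapping array can be modelled in Python by taking running totals of W and locating each zero-based template index among them. This is a sketch; gblock_owner is a hypothetical helper.

```python
from bisect import bisect_right
from itertools import accumulate


def gblock_owner(i, widths):
    """Node that owns template element i when node r receives widths[r] elements."""
    ends = list(accumulate(widths))      # e.g. [6, 17, 22] for widths [6, 11, 5]
    if not 0 <= i < ends[-1]:
        raise IndexError("template index out of range")
    # The owner is the first node whose running total exceeds i.
    return bisect_right(ends, i)
```

With W = {6, 11, 5}, elements 0–5 belong to p[0], elements 6–16 to p[1], and elements 17–21 to p[2].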
[Figures: 8 × 8 templates (indices 0–7 in XMP/C, 1–8 in XMP/Fortran) distributed among the 2 × 2 node array p[0][0]/p(1,1), p[0][1]/p(2,1), p[1][0]/p(1,2), p[1][1]/p(2,2) (two panels), and among the one-dimensional node array p[0]/p(1), p[1]/p(2), p[2]/p(3), p[3]/p(4) (one panel)]
XcalableMP C
#pragma xmp nodes p[4]
#pragma xmp template t[10][10]
#pragma xmp distribute t[block][*] onto p
XcalableMP Fortran
!$xmp nodes p(4)
!$xmp template t(10,10)
!$xmp distribute t(*,block) onto p
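For the XMP/C form of this example, where only the first dimension is block-distributed and the asterisk leaves the second dimension undistributed, ownership can be modelled as follows. This is a sketch; owner_block_star is a hypothetical helper, and the block size follows the ceil(N/K) rule of the block distribution.

```python
def owner_block_star(i, j, rows, cols, k):
    """Node owning template element t[i][j] under t[block][*] onto k nodes."""
    if not (0 <= i < rows and 0 <= j < cols):
        raise IndexError("template index out of range")
    chunk = -(-rows // k)      # ceil(rows / k)
    return i // chunk          # j plays no part: whole rows stay on one node
```

For the 10 × 10 template on four nodes, rows 0–2 belong to p[0], rows 3–5 to p[1], rows 6–8 to p[2], and row 9 to p[3], whatever the column index.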
The align directive specifies that an array is to be mapped in the same way as a
specified template. In other words, an align directive defines the correspondence
of elements between an array and a template, and each array element is
allocated on the node to which the corresponding template element is assigned.
XcalableMP C
#pragma xmp nodes p[4]
#pragma xmp template t[8]
#pragma xmp distribute t[block] onto p
int a[8];
#pragma xmp align a[i] with t[i]
XcalableMP Fortran
!$xmp nodes p(4)
!$xmp template t(8)
!$xmp distribute t(block) onto p
integer :: a(8)
!$xmp align a(i) with t(i)
The array a is decomposed and laid out so that each element a(i) is colocated
with the corresponding template element t(i) (Fig. 15).
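The resulting layout can be modelled in Python by splitting the array in the same way as the block-distributed template. This is a sketch with zero-based indices as in XMP/C; local_sections is a hypothetical helper, and the identity alignment a[i] with t[i] is assumed.

```python
def local_sections(a, k):
    """Section of array `a` stored on each of k nodes when a[i] is aligned with
    element i of a block-distributed template of the same length."""
    n = len(a)
    chunk = -(-n // k)     # block size of the template: ceil(n / k)
    return [a[node * chunk:(node + 1) * chunk] for node in range(k)]
```

For the eight-element array on four nodes, each node holds two consecutive elements: a[0] and a[1] on p[0], a[2] and a[3] on p[1], and so on.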
The align directive can also be used for multi-dimensional arrays (Fig. 16).
XcalableMP C
#pragma xmp nodes p[2][2]
#pragma xmp template t[8][8]
#pragma xmp distribute t[block][block] onto p
int a[8][8];
#pragma xmp align a[i][j] with t[i][j]
XcalableMP Fortran
!$xmp nodes p(2,2)
!$xmp template t(8,8)
!$xmp distribute t(block,block) onto p
integer :: a(8,8)
!$xmp align a(j,i) with t(j,i)
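For the XMP/C version of the two-dimensional example, ownership of an array element can be modelled in Python. This is a sketch under an assumption: the first template dimension is taken to be distributed over the first dimension of the node array and the second over the second. The name owner_2d is hypothetical.

```python
def owner_2d(i, j, shape=(8, 8), nodes=(2, 2)):
    """Node index (pi, pj) owning a[i][j] under t[block][block] onto p."""
    if not (0 <= i < shape[0] and 0 <= j < shape[1]):
        raise IndexError("array index out of range")
    chunk_i = -(-shape[0] // nodes[0])   # block size along the first dimension
    chunk_j = -(-shape[1] // nodes[1])   # block size along the second dimension
    return (i // chunk_i, j // chunk_j)
```

Under this assumption the 8 × 8 array is split into four 4 × 4 tiles, so that a[3][4], for example, resides on p[0][1].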