Assignment Exercise 1
v1.2
History of changes
v1.2: added a section about the expected structure of the report; added a detail about how to run hybrid code when calling mpirun.
Game of Life
You are asked to implement a parallel version of a variant of the famous Conway's "game of life" (references: 1, 2), that is, a cellular automaton that plays autonomously on an infinite 2D grid.
The game itself has several exciting properties and pertains to a very interesting field, but we leave it to the reader to go deeper into it at will.
Playground
As said, the game evolves on a discrete 2D world, where the smallest position is a single cell; you can imagine it as a point on a system of integer coordinates.
The neighbours of a cell are the 8 adjacent cells, i.e. those that on a grid representation share an edge or a corner with the considered cell, as depicted in Fig. 1 below.
The playground, which will be a square grid of size k x k, has periodic boundary conditions at the edges. This means that cells at an edge have to be considered neighbours of the cells at the opposite edge along the same axis. For instance, cell (0, j) will have cells (k-1, j) and (1, j) among its neighbours.
Figure 2: Periodic boundary conditions: the cells at the edges with the same colour are adjacent.
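These wrap-around rules are easily implemented with modular arithmetic. Below is a minimal sketch of a neighbour-counting routine, assuming the playground is stored as a linear array of k x k unsigned chars with 0 = dead and 255 = alive; all names and the storage layout are illustrative, not prescribed.

/* Count the alive neighbours of cell (i, j) on a k x k grid with
 * periodic boundaries; values are assumed 0 (dead) or 255 (alive). */
int count_alive_neighbours(const unsigned char *grid, int k, int i, int j)
{
    int alive = 0;
    for (int di = -1; di <= 1; di++)
        for (int dj = -1; dj <= 1; dj++) {
            if (di == 0 && dj == 0)
                continue;                  /* skip the cell itself  */
            int ni = (i + di + k) % k;     /* wrap around the edges */
            int nj = (j + dj + k) % k;
            if (grid[ni * k + nj])
                alive++;
        }
    return alive;
}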
Each cell can be either "alive" or "dead" depending on the conditions of the neighbouring cells:
a cell becomes, or remains, alive if 2 to 3 cells in its neighbourhood are alive;
otherwise, it dies or stays dead.
In this way, the evolution of the system is completely determined by the initial conditions
and by the rules adopted to decide the update of the cells’ status.
Classically, the cells are updated in place, in row-major order; let's call this "ordered evolution".
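A minimal sketch of such an in-place, row-major update, under the same storage assumptions and using the rule stated above (again, names are illustrative):

/* Ordered evolution (sketch): cells are updated in place in row-major
 * order, so a cell already sees the new status of the cells that
 * precede it in the sweep.                                           */
void ordered_step(unsigned char *grid, int k)
{
    for (int i = 0; i < k; i++)
        for (int j = 0; j < k; j++) {
            int alive = count_alive_neighbours(grid, k, i, j);
            grid[i * k + j] = (alive == 2 || alive == 3) ? 255 : 0;
        }
}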
A second option is to disentangle the status evaluation and the status update of each cell: the status of all the cells (i.e., how many alive neighbours each of them has) is evaluated first, freezing the system, and the status of the cells is updated only afterwards. Let's call this "static evolution".
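Under the same assumptions, a sketch of one static-evolution step; it needs two grids, with the frozen one being read-only during the step:

/* Static evolution (sketch): the whole grid is evaluated against the
 * frozen status in old, and all the new statuses land in next.       */
void static_step(const unsigned char *old, unsigned char *next, int k)
{
    for (int i = 0; i < k; i++)
        for (int j = 0; j < k; j++) {
            int alive = count_alive_neighbours(old, k, i, j);
            next[i * k + j] = (alive == 2 || alive == 3) ? 255 : 0;
        }
}

Swapping the old and next pointers after each step avoids copying the whole grid.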
A third option, among many others, is that the ordered evolution does not always start from the same point but from a random position, propagating in all directions as a square wave, as illustrated in Fig. 5.
Figure 5: cells update starting from a random point, propagating as a square wave.
Requirements
5. During a run, a dump of the system has to be saved with a given frequency.
If a cell is set alive by chance, and it has fewer than 2 alive neighbours, then it spreads life to neighbouring cells so that at least 2 of them are set alive. It's easy to understand that, given the previous rules, whether this new life will survive depends on its initial shape (you will quickly discover that a surprising number of life forms can even move).
You are free to define whatever set of initial live patterns to be used in this case, and to experiment with it (see the sketch after this list).
An obvious piece of advice is to use very small values for the probability, so as to keep the Finger of God a perturbation instead of the dominant force.
8. [OPTIONAL] Implement the evolution with a square-wave signal from a grid point randomly chosen at every time-step.
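A possible sketch of the Finger-of-God seeding mentioned above; the probability, the choice of which neighbours to revive, and their order are all free choices, not prescribed by the assignment:

/* "Finger of God" (sketch): a dead cell picked at random is revived;
 * if it then has fewer than 2 alive neighbours, dead neighbours are
 * set alive until at least 2 are.                                    */
void finger_of_god(unsigned char *grid, int k, int i, int j)
{
    grid[i * k + j] = 255;                         /* revive the cell */
    int alive = count_alive_neighbours(grid, k, i, j);
    for (int di = -1; di <= 1 && alive < 2; di++)
        for (int dj = -1; dj <= 1 && alive < 2; dj++) {
            if (di == 0 && dj == 0)
                continue;
            int ni = (i + di + k) % k, nj = (j + dj + k) % k;
            if (!grid[ni * k + nj]) {
                grid[ni * k + nj] = 255;
                alive++;
            }
        }
}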
Your code should accept and handle the following command-line arguments:
ARGUMENT        MEANING
-i              initialize a playground
-r              run a playground
-k num. value   playground size
For instance, a run may evolve the initial status of the file $initial_conditions for 10000 steps with the static evolution mode, producing a snapshot at every time-step.
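As a sketch of how the arguments may be handled, here is a minimal parser based on POSIX getopt(); only the three options listed above are parsed, and the defaults are purely illustrative, since your actual code will likely define more options:

#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int action = 0;   /* 0 = none, 1 = initialize, 2 = run          */
    int k      = 100; /* default playground size (illustrative)     */
    int opt;

    while ((opt = getopt(argc, argv, "irk:")) != -1) {
        switch (opt) {
        case 'i': action = 1; break;         /* initialize a playground */
        case 'r': action = 2; break;         /* run a playground        */
        case 'k': k = atoi(optarg); break;   /* playground size         */
        default:  return 1;                  /* unknown option          */
        }
    }
    /* ... initialize or run according to action and k ... */
    return 0;
}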
File format
The adopted file format is the binary PGM format for both the initial conditions and the evolution snapshots; this allows any snapshot to be used as a new initial-conditions file for a new experiment.
Each snapshot has to be named snapshot_nnnnn, where nnnnn (5 digits, zero-padded) is the time-step it refers to.
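Such a name is easily produced with a zero-padded format string; a small sketch, where step is assumed to hold the current time-step:

#include <stdio.h>

/* sketch: build the snapshot file name for a given time-step,
 * e.g. step 42 -> "snapshot_00042"                              */
void snapshot_name(char *buf, size_t len, int step)
{
    snprintf(buf, len, "snapshot_%05d", step);
}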
The PGM image format, companion of the PBM and PPM formats, is quite simple and portable.
It consists of a small header, written in ASCII, followed by the pixels that compose the image, written one after the other as integer values.
Each pixel may occupy either 1 or 2 bytes, and its value in PGM corresponds to the grey level of the pixel, expressed in the range [0..255] or [0..65535]; since in our case the only possibility is "dead/alive", we adopt the single-byte representation, and the only two meaningful values are 0 and 255.
Even though the pixels could also be written in ASCII, we require the usage of the binary format.
The header has the following structure:

magic
width height
maximum_value

where magic is a magic number that for binary PGM is "P5", width and height are the dimensions of the image in pixels, and maximum_value is either < 256 or < 65536.
If maximum_value < 256, then 1 byte is sufficient to represent all the possible values and each pixel will be stored as 1 byte; instead, if 256 <= maximum_value < 65536, 2 bytes are needed to represent each pixel (which in the current case would be a waste of bytes).
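To make the layout concrete, here is a minimal sketch of a binary PGM writer for the single-byte case; it is only illustrative, and the provided write_pgm_image() described below is the function to actually use:

#include <stdio.h>

/* Minimal sketch of a binary PGM ("P5") writer, assuming one byte
 * per pixel (maximum_value < 256).                                */
void write_pgm_sketch(const char *name, const unsigned char *pixels,
                      int width, int height)
{
    FILE *f = fopen(name, "wb");
    if (f == NULL)
        return;
    fprintf(f, "P5\n%d %d\n255\n", width, height);    /* ASCII header  */
    fwrite(pixels, 1, (size_t)width * height, f);     /* binary pixels */
    fclose(f);
}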
In the sample file read_write_pgm_image.c that you find in the folder, there are the functions write_pgm_image() and read_pgm_image() that you can use to write and read, respectively, such a file.
In the same file, there is a sample code that generates a square image and writes it using the write_pgm_image() function.
cc -o read_write_pgm_image read_write_pgm_image.c
./read_write_pgm_image [max_val] [width height]
As output you will find the image image.pgm, which should be easily rendered by any decent visualizer.
NOTE: the pbm file format is conceptually similar to the proposed pgm; every pixel requires one bit only of information, because it is a black&white encoding. As such - and that is the slightly trickier part - every byte in the file corresponds to 8 pixels, which are mapped onto its single bits. In the pgm format, instead, a pixel corresponds to a byte in the file, which is easier to code. Optionally, you may want to implement the pbm format as well, which is perfectly adequate to our case because every cell (that is, a pixel in the image) may just be dead (0) or alive (1).
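A sketch of the trickier bit-packing step, assuming that alive cells are mapped to black (value 1 in PBM, which is our choice here, not a requirement) and that each row is padded to a whole number of bytes, as the format requires:

/* Pack one row of cells (0 = dead, non-zero = alive) into PBM bytes:
 * 8 pixels per byte, most significant bit first. out must hold at
 * least (width + 7) / 8 bytes.                                       */
void pack_pbm_row(const unsigned char *row, int width, unsigned char *out)
{
    for (int j = 0; j < (width + 7) / 8; j++)
        out[j] = 0;                                    /* clear padding */
    for (int j = 0; j < width; j++)
        if (row[j])
            out[j / 8] |= (unsigned char)(0x80 >> (j % 8)); /* 1 = black */
}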
Appendix II - A note about hybrid MPI+OpenMP
As we mentioned in class, a simple hybridization of MPI with OpenMP is quite straightforward. As you have seen, it is obviously not a requirement but just an opportunity for those among you who like to be challenged.
As long as you use OpenMP regions in a MPI process for computation only and not to
execute MPI calls, everything is basically safe and you can proceed as usual with both MPI
and OpenMP calls and constructs.
At a more advanced level, only the main thread (i.e., thread 0, the one that opens the OpenMP regions), and no other, may make MPI calls, even from within an OpenMP region ("funneled" mode).
Going further, every thread may call MPI routines, but only one at a time ("serialized" mode).
Finally, multiple threads can make MPI calls at the same time, which has to be handled carefully ("multiple" mode).
Initialize the MPI library with a call slightly different from MPI_Init():
int mpi_provided_thread_level;
/* request the thread support level you need, e.g. MPI_THREAD_FUNNELED */
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &mpi_provided_thread_level);
/* ... */
MPI_Finalize();
return 0;
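Putting it together, a minimal hybrid skeleton in funneled mode might look like the following; the requested level and the barrier are just for illustration:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int mpi_provided_thread_level;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED,
                    &mpi_provided_thread_level);
    if (mpi_provided_thread_level < MPI_THREAD_FUNNELED) {
        printf("the requested thread support level is not available\n");
        MPI_Finalize();
        return 1;
    }

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   #pragma omp parallel
    {
        /* computation may happen in every thread here ...           */
       #pragma omp master
        {
            /* ... but in funneled mode only the master thread is
             * allowed to issue MPI calls from inside the region     */
            printf("rank %d: master thread calling MPI\n", rank);
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}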
On several platforms you may notice that running something like the following

export OMP_NUM_THREADS=$MY_NUM_THREADS
export OMP_PLACES=$MY_PLACE_CHOICE
export OMP_PROC_BIND=$MY_BIND_CHOICE
export OMP_DISPLAY_ENV=TRUE
mpirun -np $MPI_NTASKS $MY_EXEC ...

results in all the threads spawned by an MPI task running on the same physical core.
That is due to the default mapping of MPI tasks to physical resources. You know that it is possible to request a given mapping of the MPI tasks onto the hardware by using, for instance, --map-by.
The set of resources visible to a given MPI task will then include only those covered by the requested mapping level. For instance, if you require --map-by core, the resources visible to a given MPI task will include only the logical threads of the core it is assigned to.
If you require --map-by L3cache, the resource set will consist of all the cores that share the L3 cache with the core that hosts the task itself. And so on.
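For instance, with Open MPI something like

mpirun -np $MPI_NTASKS --map-by socket $MY_EXEC ...

gives each task all the cores of one socket for its threads to spread on (the exact option spelling may differ across MPI implementations and versions).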
Summarizing the effect of the main mapping levels:

hwthread : only the logical core on which the task runs is visible; whatever the PLACES and BIND options, all the threads will run on that same logical core.
core : the physical core on which the task runs is visible; the threads will distribute over its logical cores according to your policy choice.
L?cache : the physical cores that share the ?-level of cache with the core the task runs on are visible; threads distribute as above.
socket : the physical cores on the same socket the task runs on are visible; threads distribute as above.
node : all the cores of the node the task runs on are visible; threads distribute as above.
In the following we suggest a structure that is fairly typical for any report/paper of this kind.

SECTION         DESCRIPTION
Introduction    Brief overview and description of the problem tackled by your work.
                Be concise and focused, leaving a comprehensive discussion to some
                well-motivated reference.