Chapter 1: PARALLEL PROGRAMMING
Parallelism
Parallel, concurrent and distributed systems are types of computing systems that involve
multiple components or processes that can execute simultaneously or asynchronously.
These systems can offer advantages such as higher performance, scalability, reliability,
and fault tolerance over traditional sequential or centralized systems.
Parallel systems are systems that use multiple processors or cores to execute multiple
tasks or subtasks of a single problem at the same time. Parallel systems can be classified
according to Flynn's taxonomy of computer architecture [1], which is based on the number
of instruction streams and data streams in the system. The four categories are:
SISD (Single Instruction, Single Data): A system that executes a single instruction on a
single data element at a time. This is the simplest and most common type of system,
such as a single-core CPU.
SIMD (Single Instruction, Multiple Data): A system that executes a single instruction on
multiple data elements at the same time. This type of system can exploit data
parallelism, which is when the same operation can be applied to different parts of the
data. An example of this type of system is a GPU, which can perform graphics operations
on many pixels or vertices in parallel.
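As a minimal sketch of data parallelism in C (an illustration, not tied to any particular GPU), the following loop applies the same addition to every element of two arrays; a vectorizing compiler such as gcc with -O3 can typically map it to SIMD instructions that process several elements per instruction.

    #include <stdio.h>

    #define N 1024

    int main(void) {
        float a[N], b[N], c[N];

        /* Initialize the input arrays. */
        for (int i = 0; i < N; i++) {
            a[i] = (float)i;
            b[i] = (float)(N - i);
        }

        /* The same operation is applied to every element, so a
           vectorizing compiler can execute several iterations at
           once using SIMD instructions. */
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[0] = %.1f, c[N-1] = %.1f\n", c[0], c[N - 1]);
        return 0;
    }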
MISD (Multiple Instruction, Single Data): A system that executes multiple instructions on
a single data element at a time. This type of system is rare and mostly theoretical, as few
problems benefit from it. One often-cited example is a fault-tolerant system that uses
multiple processors to perform the same computation on the same data and compare
the results for consistency.
MIMD (Multiple Instruction, Multiple Data): A system that executes multiple instructions
on multiple data elements at a time. This type of system can exploit both data
parallelism and task parallelism, which is when different operations can be applied to
different parts of the data or different subproblems. An example of this type of system is
a multiprocessor or a multicore CPU, which can run multiple threads or processes in
parallel.
Concurrent systems are systems that use multiple processes or threads to execute
multiple tasks or subtasks of a single or multiple problems in an interleaved or
overlapping manner. Concurrent systems may or may not be parallel, depending on
whether the processes or threads can run simultaneously on multiple processors or
cores, or whether they have to share a single processor or core and switch between
them. Concurrent systems can be classified according to the parallel programming
models [2], which are abstractions of parallel hardware and software that define how
parallel processes communicate and synchronize. The common models are:
Shared memory model: A model that assumes that all processes share a common
address space and can access the same variables or data structures. This model provides
a unified and convenient way of communication and data sharing, but also poses
challenges such as memory consistency, cache coherence, synchronization, and
scalability. An example of this model is a shared memory multiprocessor or a multicore
CPU, which can use locks, semaphores, monitors, or atomic operations to coordinate the
access to shared data.
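As a minimal sketch of this model (assuming a POSIX system with pthreads), several threads increment a counter in the shared address space, and a mutex lock serializes the updates so none are lost.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define INCREMENTS 100000

    long counter = 0;  /* shared data in the common address space */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < INCREMENTS; i++) {
            pthread_mutex_lock(&lock);   /* coordinate access to shared data */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);
        /* Without the lock, updates could be lost and the final
           value would likely fall short of the expected total. */
        printf("counter = %ld (expected %d)\n", counter,
               NUM_THREADS * INCREMENTS);
        return 0;
    }

Compile with, for example, gcc -pthread; removing the lock calls demonstrates the synchronization challenge mentioned above.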
Message passing model: A model that assumes that each process has its own address
space and can only communicate with other processes by sending and receiving
messages. This model provides a scalable and fault-tolerant way of communication and
data sharing, but also poses challenges such as latency, bandwidth, load balancing, and
synchronization. An example of this model is a distributed memory multiprocessor or a
cluster, which can use MPI, PVM, or sockets to exchange messages between processes.
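As a minimal sketch of this model using MPI (assuming an MPI implementation is installed; compile with mpicc and launch with mpirun using at least two processes), each process has its own address space, and rank 1 sends an integer to rank 0.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size < 2) {
            if (rank == 0) printf("Run with at least 2 processes.\n");
            MPI_Finalize();
            return 0;
        }

        if (rank == 1) {
            value = 42;  /* this variable exists only in this process */
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0) {
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 received %d from rank 1\n", value);
        }

        MPI_Finalize();
        return 0;
    }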
Threads model: A model that assumes that each process can create multiple threads
that share the same address space and resources of the process, but can execute
independently and concurrently. This model provides a way of exploiting both
concurrency and parallelism within a single process, but also poses challenges such as
thread management, synchronization, and deadlock. An example of this model is a
multithreaded program that can run on a single or multiple processors or cores, which
can use pthreads, Java threads, or OpenMP to create and manage threads.
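As a minimal pthreads sketch of this model, the main thread creates workers that share the process's address space; each sums a disjoint slice of a shared array, so no locking is needed, and the main thread joins them and combines the partial results.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000
    #define NUM_THREADS 4

    int data[N];                /* shared by all threads of the process */
    long partial[NUM_THREADS];  /* one slot per thread: no lock needed */

    void *sum_slice(void *arg) {
        int id = *(int *)arg;
        int chunk = N / NUM_THREADS;
        long sum = 0;
        for (int i = id * chunk; i < (id + 1) * chunk; i++)
            sum += data[i];
        partial[id] = sum;
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        int ids[NUM_THREADS];
        for (int i = 0; i < N; i++)
            data[i] = 1;

        for (int i = 0; i < NUM_THREADS; i++) {
            ids[i] = i;
            pthread_create(&threads[i], NULL, sum_slice, &ids[i]);
        }

        long total = 0;
        for (int i = 0; i < NUM_THREADS; i++) {
            pthread_join(threads[i], NULL);
            total += partial[i];
        }
        printf("total = %ld\n", total);  /* prints 1000 */
        return 0;
    }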
Data parallel model: A model that assumes that the same computation can be applied
to different parts of a large data set in parallel. This model provides a way of exploiting
data parallelism without explicitly managing the communication and synchronization
between processes, but also poses challenges such as data distribution, load balancing,
and scalability. An example of this model is a data parallel program that can run on a
SIMD array processor or a vector processor, which can use CUDA, OpenCL, or Fortran 90
to express data parallel operations.
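The examples above name CUDA, OpenCL, and Fortran 90; as a minimal stand-in sketch in C, OpenMP's parallel-for directive expresses the same idea: the same computation is applied to different parts of the data set, and the runtime distributes the iterations across threads without explicit communication code (compile with, for example, gcc -fopenmp).

    #include <stdio.h>

    #define N 1000000

    static double x[N];

    int main(void) {
        for (int i = 0; i < N; i++)
            x[i] = (double)i;

        /* The same operation is applied to every element; the OpenMP
           runtime distributes the iterations across threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            x[i] = 2.0 * x[i];

        printf("x[N-1] = %.1f\n", x[N - 1]);
        return 0;
    }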
The need for parallelism arises from the increasing demand for higher performance,
scalability, reliability, and fault tolerance in computing systems. Parallelism can improve
these aspects by exploiting the inherent parallelism in a problem or application, by using
multiple processors or cores to execute tasks or subtasks simultaneously or in an
interleaved manner, and by using multiple nodes or computers that communicate and
coordinate to achieve a common goal. However, parallelism also introduces new
challenges and complexities in the design, development, analysis, and evaluation of
parallel, concurrent, and distributed systems, which require appropriate hardware,
software, and tools to support them.
Array Processors
An array processor is a parallel system that consists of multiple processing elements
operating on arrays of data in parallel. There are two types: the SIMD array processor
and the vector processor. Examples include GPUs, DSPs, and the Cray-1.
Parallel Programming Models
A parallel programming model is an abstraction of parallel hardware and software that
defines how parallel processes communicate and synchronize. It covers two aspects:
process interaction and problem decomposition. Common models include the shared
memory model, the message passing model, the threads model, and the data parallel
model.
Processes and Threads
Processes are independent units of execution that have their own address space and
resources, while threads are lightweight units of execution that share the address space
and resources of a process; the sketch below illustrates the difference. Using processes
and threads brings benefits such as concurrency, parallelism, modularity, and
responsiveness.
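As a minimal POSIX sketch of the address-space difference (illustrative only), a child created with fork() gets its own copy of a variable, so its update is invisible to the parent, while a thread created in the same process shares the variable and its update is visible.

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int shared = 0;

    void *thread_fn(void *arg) {
        (void)arg;
        shared = 1;  /* same address space: the write is visible */
        return NULL;
    }

    int main(void) {
        pid_t pid = fork();
        if (pid == 0) {    /* child process: separate address space */
            shared = 99;   /* modifies the child's private copy only */
            _exit(0);
        }
        wait(NULL);
        printf("after fork:   shared = %d\n", shared);  /* still 0 */

        pthread_t t;
        pthread_create(&t, NULL, thread_fn, NULL);
        pthread_join(t, NULL);
        printf("after thread: shared = %d\n", shared);  /* now 1 */
        return 0;
    }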
Amdahl's Law
Amdahl's law relates the speedup of a parallel program to the fraction of the program
that can be parallelized:
Speedup = 1 / ((1 - p) + p / n)
where p is the fraction of the program that can be parallelized and n is the number of
processors. The implication is that the speedup is limited by the sequential part of the
program: even with unlimited processors, the speedup cannot exceed 1 / (1 - p).
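For example, with p = 0.95 and n = 8, Speedup = 1 / (0.05 + 0.95 / 8) = 1 / 0.16875 ≈ 5.93,
and no number of processors can push the speedup past 1 / (1 - p) = 20.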
Gustafson's Law
Gustafson's law relates the scaled speedup of a parallel program to the fraction of the
program that is sequential:
Speedup = n - s * (n - 1)
where s is the fraction of the program that is sequential and n is the number of
processors. The implication is that the speedup can be increased by increasing the
problem size, because the parallel portion grows with the problem while the sequential
portion stays roughly fixed.
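For example, with s = 0.05 and n = 8, Speedup = 8 - 0.05 * (8 - 1) = 7.65, much closer to
the ideal value of 8 than Amdahl's law predicts for a fixed-size problem.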