Parallel Computing
Parallel Computer Memory Architectures
- Shared Memory
• Shared memory machines can be divided into two groups based upon
memory access time:
UMA: Uniform Memory Access
NUMA: Non-Uniform Memory Access
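Although UMA and NUMA differ in hardware memory access times, both present the programmer with a single shared address space. Below is a minimal sketch of that model, assuming OpenMP in C (the slides do not name a programming model): every thread reads and writes the same array.

/* Minimal shared-memory sketch (OpenMP assumed): all threads work on one
 * array in a single address space. Whether access times are uniform (UMA)
 * or not (NUMA) is a property of the hardware, not of this code.
 * Compile with: gcc -fopenmp shared.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];              /* one array, shared by every thread */

    #pragma omp parallel for         /* threads split the iterations      */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[N-1] = %f, max threads = %d\n", a[N - 1], omp_get_max_threads());
    return 0;
}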
Designing Parallel Programs
Two ways to decompose the problem:
• Domain decomposition
• Functional decomposition
Domain Decomposition
The data associated with the problem is decomposed; each parallel task then
works on a portion of the data.
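A minimal sketch of domain decomposition, assuming MPI in C and a problem size that divides evenly among tasks: the global array lives on task 0, each task receives one chunk, computes a partial sum on it, and the partial results are combined.

/* Domain decomposition sketch (MPI assumed): the global array is split into
 * equal chunks and each task works only on its own chunk.
 * Build: mpicc decomp.c   Run: mpirun -np 4 ./a.out */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 1024                           /* global problem size (assumption) */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;                /* size of each task's subdomain */
    double *global = NULL;
    double *local  = malloc(chunk * sizeof(double));

    if (rank == 0) {                     /* task 0 holds the full data set */
        global = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) global[i] = (double)i;
    }

    /* hand one chunk of the domain to every task */
    MPI_Scatter(global, chunk, MPI_DOUBLE,
                local,  chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double local_sum = 0.0;              /* each task computes on its chunk */
    for (int i = 0; i < chunk; i++) local_sum += local[i];

    double total = 0.0;
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", total);

    free(local); free(global);
    MPI_Finalize();
    return 0;
}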
Communications
• Efficiency of communications is an important design consideration.
Designing Parallel Programs
Granularity
• In parallel computing, granularity is a qualitative measure of the ratio of
computation to communication (computation / communication).
• Periods of computation are typically separated from periods of
communication by synchronization events.
• Fine-grain Parallelism
• Coarse-grain Parallelism
Designing Parallel Programs
• Fine-grain Parallelism
• Relatively small amounts of computational work
are done between communication events
• Low computation to communication ratio
• Facilitates load balancing
• Implies high communication overhead and less
opportunity for performance enhancement
• If granularity is too fine it is possible that the
overhead required for communications and
synchronization between tasks takes longer than
the computation.
• Coarse-grain Parallelism
• Relatively large amounts of computational work
are done between communication/synchronization
events
• High computation to communication ratio
• Implies more opportunity for performance increase
• Harder to load balance efficiently
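The contrast can be made concrete with a small sketch, again assuming MPI in C and a toy sum chosen only for illustration: the fine-grain version performs a global reduction after every element, while the coarse-grain version computes everything locally and communicates once.

/* Fine-grain vs. coarse-grain versions of the same sum (MPI assumed).
 * Both produce the same total; the fine-grain loop pays the communication
 * and synchronization cost on every iteration. */
#include <stdio.h>
#include <mpi.h>

#define LOCAL_N 10000                    /* iterations per task (assumption) */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fine-grain: tiny computation, then a global reduction, repeated. */
    double fine = 0.0;
    for (int i = 0; i < LOCAL_N; i++) {
        double x = (double)(rank + i), r;
        MPI_Allreduce(&x, &r, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        fine += r;
    }

    /* Coarse-grain: all computation first, one reduction at the end. */
    double local = 0.0, coarse = 0.0;
    for (int i = 0; i < LOCAL_N; i++)
        local += (double)(rank + i);
    MPI_Allreduce(&local, &coarse, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("fine-grain total = %f, coarse-grain total = %f\n", fine, coarse);
    MPI_Finalize();
    return 0;
}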
Designing Parallel Programs
I/O
• Rule #1: Reduce overall I/O as much as possible
• If you have access to a parallel file system, use it.
• Writing large chunks of data rather than small chunks is usually
significantly more efficient.
• Fewer, larger files perform better than many small files.
• Confine I/O to specific serial portions of the job, and then use parallel
communications to distribute data to parallel tasks. For example, Task 1
could read an input file and then communicate the required data to other
tasks (sketched below). Likewise, Task 1 could perform the write operation
after receiving the required data from all other tasks.
• Aggregate I/O operations across tasks - rather than having many tasks
perform I/O, have a subset of tasks perform it.
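A sketch of the "Task 1 reads, then distributes" pattern from the list above, assuming MPI in C (rank 0 plays the role of the slides' Task 1); the file name input.dat and its layout (an element count followed by doubles) are hypothetical.

/* Serial read on one task, parallel distribution to the rest (MPI assumed).
 * Only rank 0 touches the file; every other task receives the data over
 * communication. File name and format are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, n = 0;
    double *data = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                          /* only task 0 performs I/O */
        FILE *fp = fopen("input.dat", "rb");
        if (!fp) MPI_Abort(MPI_COMM_WORLD, 1);
        fread(&n, sizeof(int), 1, fp);        /* element count            */
        data = malloc(n * sizeof(double));
        fread(data, sizeof(double), n, fp);   /* one large read, not many small ones */
        fclose(fp);
    }

    /* distribute with parallel communication instead of parallel reads */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank != 0) data = malloc(n * sizeof(double));
    MPI_Bcast(data, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ... each task now works on its share of data ... */

    free(data);
    MPI_Finalize();
    return 0;
}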
Designing Parallel Programs
Debugging
• TotalView from RogueWave Software
• DDT from Allinea
• Inspector from Intel