Parallel Computing
Presented by Justin Reschke 9-14-04
Overview
Concepts and Terminology
Parallel Computer Memory Architectures
Parallel Programming Models
Designing Parallel Programs
Parallel Algorithm Examples
Conclusion
Concepts and Terminology: What is Parallel Computing?
Traditionally software has been written for serial computation. Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.
Concepts and Terminology: Why Use Parallel Computing?
Saves wall-clock time
Cost savings
Overcomes memory constraints
It's the future of computing
Concepts and Terminology: Flynn's Classical Taxonomy
Distinguishes multi-processor architectures by their instruction and data streams:
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
Flynn's Classical Taxonomy: SISD
Serial. Only one instruction stream and one data stream are acted on during any one clock cycle.
Flynn's Classical Taxonomy: SIMD
All processing units execute the same instruction at any given clock cycle. Each processing unit operates on a different data element.
Flynn's Classical Taxonomy: MISD
Different instructions operate on a single data element. This classification has very few practical uses. Example: multiple cryptography algorithms attempting to crack a single coded message.
Flynn's Classical Taxonomy: MIMD
Can execute different instructions on different data elements. Most common type of parallel computer.
Concepts and Terminology: General Terminology
Task: a logically discrete section of computational work
Parallel Task: a task that can be executed by multiple processors safely
Communications: data exchange between parallel tasks
Synchronization: the coordination of parallel tasks in real time
Concepts and Terminology: More Terminology
Granularity: the ratio of computation to communication
  Coarse: high computation, low communication
  Fine: low computation, high communication
Parallel Overhead
  Synchronizations
  Data communications
  Overhead imposed by compilers, libraries, tools, operating systems, etc.
Parallel Computer Memory Architectures: Shared Memory Architecture
All processors access all memory as a single global address space. Data sharing is fast. Scalability between memory and CPUs is limited: adding more CPUs increases traffic on the shared memory path.
Parallel Computer Memory Architectures: Distributed Memory
Each processor has its own memory. Scalable, with no overhead for cache coherency. The programmer is responsible for many details of communication between processors.
Parallel Programming Models
Exist as an abstraction above hardware and memory architectures. Examples:
Shared Memory
Threads
Message Passing
Data Parallel
Parallel Programming Models: Shared Memory Model
Appears to the user as a single shared memory, regardless of the underlying hardware implementation. Locks and semaphores may be used to control shared memory access. Program development can be simplified since there is no need to explicitly specify communication between tasks.
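The role of a lock in this model can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name run_shared_counter and its parameters are hypothetical, and threads from the standard threading module stand in for the tasks.

```python
import threading

def run_shared_counter(num_tasks=4, increments_per_task=10_000):
    """Have several tasks update one shared variable under a lock."""
    shared = {"total": 0}          # memory visible to every task
    lock = threading.Lock()        # controls access to the shared data

    def task():
        for _ in range(increments_per_task):
            with lock:             # only one task updates at a time
                shared["total"] += 1

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                   # wait for every task to finish
    return shared["total"]
```

No task ever sends data to another; they all read and write the same variable, and the lock keeps the updates from interleaving.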
Parallel Programming Models: Threads Model
A single process may have multiple, concurrent execution paths. Typically used with a shared memory architecture. Programmer is responsible for determining all parallelism.
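A minimal sketch of the threads model, assuming Python's standard threading module: one process starts two execution paths that do different work on the same data. The names run_two_paths, sum_path and max_path are hypothetical. In standard CPython builds the global interpreter lock limits the speedup for CPU-bound work, so this shows the structure of the model rather than a performance gain.

```python
import threading

def run_two_paths(values):
    """Run two different routines as concurrent threads of one process."""
    results = {}                   # shared by all threads in the process

    def sum_path():
        results["sum"] = sum(values)

    def max_path():
        results["max"] = max(values)

    # The programmer decides what runs concurrently and when to wait.
    paths = [threading.Thread(target=sum_path),
             threading.Thread(target=max_path)]
    for p in paths:
        p.start()
    for p in paths:
        p.join()
    return results
```

Calling run_two_paths([3, 1, 4, 1, 5]) returns a dictionary holding both results once the two threads have been joined.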
Parallel Programming Models: Message Passing Model
Tasks exchange data by sending and receiving messages. Typically used with distributed memory architectures. Data transfer requires cooperative operations to be performed by each process. For example, every send operation must have a matching receive operation. MPI (Message Passing Interface) is the interface standard for message passing.
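The send/receive pairing can be illustrated without an MPI installation. The sketch below is not MPI: it assumes two tasks that share nothing except a pair of queue.Queue mailboxes, with put() standing in for a send and get() for the matching receive. The names worker and send_and_receive are hypothetical.

```python
import queue
import threading

def worker(inbox, outbox):
    """Receive numbers until a None sentinel arrives, then send back their sum."""
    total = 0
    while True:
        msg = inbox.get()          # receive: blocks until a message arrives
        if msg is None:            # sentinel: no more data
            break
        total += msg
    outbox.put(total)              # send the result back

def send_and_receive(data):
    to_worker = queue.Queue()
    from_worker = queue.Queue()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    for item in data:
        to_worker.put(item)        # each send is matched by a get() in worker
    to_worker.put(None)
    result = from_worker.get()     # matching receive for the worker's send
    t.join()
    return result
```

If the worker never called get(), the data would simply sit in the mailbox, and if the sender never called put(), the worker would block forever. That is the cooperative requirement described above.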
Parallel Programming Models: Data Parallel Model
Tasks perform the same operation on a set of data, with each task working on a separate piece of the set. Works well with either shared memory or distributed memory architectures.
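As a rough sketch of the data parallel model, the code below splits a list into pieces and has every task apply the same operation to its own piece. It assumes Python's concurrent.futures thread pool as the task mechanism; split, square_piece and parallel_square are hypothetical names chosen for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Split data into `parts` contiguous pieces of nearly equal size."""
    size, extra = divmod(len(data), parts)
    pieces, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        pieces.append(data[start:end])
        start = end
    return pieces

def square_piece(piece):
    """The same operation, applied by every task to its own piece."""
    return [x * x for x in piece]

def parallel_square(data, tasks=4):
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        # map() returns results in the same order as the input pieces
        results = list(pool.map(square_piece, split(data, tasks)))
    return [y for piece in results for y in piece]
```

Because map() preserves the order of the pieces, the combined result matches what a serial loop over the whole list would produce.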
Designing Parallel Programs: Automatic Parallelization
Automatic
The compiler analyzes the code and identifies opportunities for parallelism. The analysis includes estimating whether or not the parallelism actually improves performance. Loops are the most frequent target for automatic parallelization.
Designing Parallel Programs: Manual Parallelization
Understand the problem
A Parallelizable Problem:
Calculate the potential energy for each of several thousand independent conformations of a molecule. When done find the minimum energy conformation.
A Non-Parallelizable Problem:
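Calculating the Fibonacci series
Each term is the sum of the two terms before it, so the calculations are dependent on one another and cannot be done independently.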
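The contrast between the two problems can be sketched in code. The energy function here is a made-up stand-in (a sum of squared coordinates), not a real molecular model, and the names potential_energy, minimum_energy_conformation and fibonacci are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def potential_energy(conformation):
    """Hypothetical stand-in for an energy calculation: sum of squared coordinates."""
    return sum(c * c for c in conformation)

def minimum_energy_conformation(conformations, tasks=4):
    # Each energy depends only on its own conformation, so the calls
    # can run in any order or at the same time.
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        energies = list(pool.map(potential_energy, conformations))
    # Finding the minimum happens once all energies are known.
    best = min(range(len(conformations)), key=energies.__getitem__)
    return conformations[best], energies[best]

def fibonacci(count):
    """Each term needs the two before it, so the loop must run in order."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]
```

In the first function the work inside map() has no ordering constraint; in the second, no iteration can start until the previous one has finished.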
Designing Parallel Programs: Domain Decomposition
Each task handles a portion of the data set.
Designing Parallel Programs: Functional Decomposition
Each task performs a function of the overall work
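One common shape for functional decomposition is a pipeline, where each task performs one function and passes its output to the next. The sketch below assumes threads connected by queue.Queue objects; stage, run_pipeline and the DONE sentinel are hypothetical names for this illustration.

```python
import queue
import threading

DONE = object()   # sentinel marking the end of the stream

def stage(func, inbox, outbox):
    """Apply one function of the overall work to every item passing through."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)       # pass the end marker downstream
            break
        outbox.put(func(item))

def run_pipeline(items, funcs):
    # One queue before each stage and one after the last.
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, run_pipeline(["a b", "c d e"], [str.split, len]) has one task splitting each line into words while a second task counts the words of lines already split.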
Parallel Algorithm Examples: Array Processing
Serial Solution
Perform a function on a 2D array. A single processor iterates through each element in the array.
Possible Parallel Solution
Assign each processor a partition of the array. Each process iterates through its own partition.
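Both solutions can be sketched side by side. The per-element function fcn is hypothetical, as are the other names; the partitioning here is by contiguous blocks of rows, which is one possible choice among several.

```python
from concurrent.futures import ThreadPoolExecutor

def fcn(i, j):
    """Hypothetical per-element function; any independent function would do."""
    return i * 10 + j

def serial_fill(rows, cols):
    # One processor visits every element in turn.
    return [[fcn(i, j) for j in range(cols)] for i in range(rows)]

def fill_partition(row_range, cols):
    # Each task iterates only over its own block of rows.
    return [[fcn(i, j) for j in range(cols)] for i in row_range]

def parallel_fill(rows, cols, tasks=4):
    # Partition the row indices into contiguous blocks, one per task.
    bounds = [rows * t // tasks for t in range(tasks + 1)]
    blocks = [range(bounds[t], bounds[t + 1]) for t in range(tasks)]
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        parts = list(pool.map(fill_partition, blocks, [cols] * tasks))
    # Stitch the partitions back together in row order.
    return [row for part in parts for row in part]
```

Since no element depends on any other, the parallel version returns the same array as the serial one regardless of how many tasks are used.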
Parallel Algorithm Examples: Odd-Even Transposition Sort
The basic idea is bubble sort, but with the comparisons done concurrently: first each odd-indexed element is compared with its adjacent element, then each even-indexed element. With n elements in the array and n/2 processors, the algorithm is effectively O(n).
Parallel Algorithm Examples: Odd-Even Transposition Sort
Worst case scenario.
Initial array: 6, 5, 4, 3, 2, 1, 0
Phase 1: 6, 4, 5, 2, 3, 0, 1
Phase 2: 4, 6, 2, 5, 0, 3, 1
Phase 1: 4, 2, 6, 0, 5, 1, 3
Phase 2: 2, 4, 0, 6, 1, 5, 3
Phase 1: 2, 0, 4, 1, 6, 3, 5
Phase 2: 0, 2, 1, 4, 3, 6, 5
Phase 1: 0, 1, 2, 3, 4, 5, 6
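The phases in this example can be reproduced with a short sequential simulation. This is a sketch, not a parallel implementation: the inner loop visits the pairs one after another, whereas on n/2 processors each pair in a phase would be handled at the same time. Indices are counted from zero, and the function name is hypothetical.

```python
def odd_even_transposition_sort(values):
    """Return (sorted list, list of array states after each phase)."""
    a = list(values)
    n = len(a)
    history = []
    for phase in range(n):         # n phases are enough to sort n elements
        # Alternate between pairs starting at odd indices (1,2), (3,4), ...
        # and pairs starting at even indices (0,1), (2,3), ...
        start = 1 if phase % 2 == 0 else 0
        # The pairs within one phase do not overlap, so with one
        # processor per pair they could all be compared at once.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        history.append(list(a))
    return a, history
```

Running it on [6, 5, 4, 3, 2, 1, 0] gives seven recorded states, the first being [6, 4, 5, 2, 3, 0, 1] and the last the fully sorted array.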
Other Parallelizable Problems
The n-body problem
Floyd's Algorithm
  Serial: O(n^3). Parallel: O(n^3 / p) computation on p processors, plus on the order of n log p communication steps.
Game Trees
Divide and Conquer Algorithms
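For reference, the serial form of Floyd's algorithm (all-pairs shortest paths) is the triple loop below, which is where the O(n^3) figure comes from. This is a plain sketch assuming a distance matrix with no negative cycles; the names floyd and INF are chosen for the illustration.

```python
INF = float("inf")

def floyd(dist):
    """Return all-pairs shortest path lengths for an n x n distance matrix."""
    n = len(dist)
    d = [row[:] for row in dist]   # work on a copy
    for k in range(n):             # allow vertex k as an intermediate stop
        # For a fixed k, each row i only needs its own entries and row k,
        # which is why the rows can be divided among processors.
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

The outer loop over k must stay sequential, while the work inside each k step is the part a parallel version would distribute.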
Conclusion
Parallel computing can greatly reduce the wall-clock time needed to solve a problem. There are many different approaches and models of parallel computing. Parallel computing is the future of computing.
References
A Library of Parallel Algorithms, www2.cs.cmu.edu/~scandal/nesl/algorithms.html
Internet Parallel Computing Archive, wotug.ukc.ac.uk/parallel
Introduction to Parallel Computing, www.llnl.gov/computing/tutorials/parallel_comp/#Whatis
Parallel Programming in C with MPI and OpenMP, Michael J. Quinn, McGraw Hill Higher Education, 2003
The New Turing Omnibus, A. K. Dewdney, Henry Holt and Company, 1993