Unit-2 Process Parallel Lang
The past decade has brought explosive growth in multiprocessor computing, including multi-core
processors and distributed data centers. As a result, parallel and distributed computing has
moved from a largely elective topic to become more of a core component of undergraduate
computing curricula. Both parallel and distributed computing entail the logically simultaneous
execution of multiple processes, whose operations have the potential to interleave in complex
ways. Parallel and distributed computing builds on foundations in many areas, including an
understanding of fundamental systems concepts such as concurrency and parallel execution,
consistency in state/memory manipulation, and latency. Communication and coordination
among processes is rooted in the message-passing and shared-memory models of computing and
such algorithmic concepts as atomicity, consensus, and conditional waiting. Achieving speedup
in practice requires an understanding of parallel algorithms, strategies for problem
decomposition, system architecture, detailed implementation strategies, and performance
analysis and tuning. Distributed systems highlight the problems of security and fault tolerance,
emphasize the maintenance of replicated state, and introduce additional issues that bridge to
computer networking.
Because parallelism interacts with so many areas of computing, including at least algorithms,
languages, systems, networking, and hardware, many curricula will put different parts of the
knowledge area in different courses, rather than in a dedicated course. While we acknowledge
that computer science is moving in this direction and may reach that point, in 2013 this process is
still in flux and we feel it provides more useful guidance to curriculum designers to aggregate the
fundamental parallelism topics in one place. Note, however, that the fundamentals of
concurrency and mutual exclusion appear in the Systems Fundamentals (SF) Knowledge Area.
Many curricula may choose to introduce parallelism and concurrency in the same course (see
below for the distinction intended by these terms). Further, we note that the topics and learning
outcomes listed below include only brief mentions of purely elective coverage. At the present
time, there is too much diversity in topics that share little in common (including for example,
parallel scientific computing, process calculi, and non-blocking data structures) to recommend
particular topics be covered in elective courses.
Because the terminology of parallel and distributed computing varies among communities, we
provide here brief descriptions of the intended senses of a few terms. This list is not exhaustive
or definitive, but is provided for the sake of clarity.
• Activity: A computation that may proceed concurrently with others; for example a
program, process, thread, or active parallel hardware component.
• Consensus: Agreement among two or more activities about a given predicate; for
example, the value of a counter, the owner of a lock, or the termination of a thread.
• Consistency: Rules and properties governing agreement about the values of variables
written, or messages produced, by some activities and used by others (thus possibly
exhibiting a data race); for example, sequential consistency, stating that the values of all
variables in a shared memory parallel program are equivalent to that of a single program
performing some interleaving of the memory accesses of these activities.
• Multicast: A message sent to possibly many recipients, generally without any constraints
about whether some recipients receive the message before others. An event is a multicast
message sent to a designated set of listeners or subscribers.
As multi-processor computing continues to grow in the coming years, so too will the role of
parallel and distributed computing in undergraduate computing curricula. In addition to the
guidelines presented here, we also direct the interested reader to the document entitled
"NSF/TCPP Curriculum Initiative on Parallel and Distributed Computing - Core Topics for
Undergraduates", available from the website: http://www.cs.gsu.edu/~tcpp/curriculum/.
The introduction to parallelism in SF complements the one here and there is no ordering
constraint between them. In SF, the idea is to provide a unified view of the system support for
simultaneous execution at multiple levels of abstraction (parallelism is inherent in gates,
processors, operating systems, and servers), whereas here the focus is on a preliminary
understanding of parallelism as a computing primitive and the complications that arise in parallel
and concurrent programming. Given these different perspectives, the hours assigned to each are
not redundant: the layered systems view and the high-level computing concepts are accounted
for separately in terms of the core hours.
Knowledge Unit                   Core-Tier1 hours   Core-Tier2 hours   Includes Electives
PD/Parallelism Fundamentals      2                  -                  N
PD/Parallel Decomposition        1                  3                  N
PD/Parallel Architecture         1                  1                  Y
PD/Parallel Performance          -                  -                  Y
PD/Distributed Systems           -                  -                  Y
PD/Cloud Computing               -                  -                  Y
PD/Parallelism Fundamentals
[2 Core-Tier1 hours]
Build upon students’ familiarity with the notion of basic parallel execution—a concept addressed
in Systems Fundamentals—to delve into the complicating issues that stem from this notion, such
as race conditions and liveness.
Cross-reference SF/Computational Paradigms and SF/System Support for Parallelism.
Topics:
Learning outcomes:
1. Distinguish using computational resources for a faster answer from managing efficient access to a shared
resource. (Cross-reference GV/Fundamental Concepts, outcome 5.) [Familiarity]
2. Distinguish multiple sufficient programming constructs for synchronization that may be inter-
implementable but have complementary advantages. [Familiarity]
3. Distinguish data races from higher level races. [Familiarity]
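As a concrete illustration of the data races in outcome 3, consider the following minimal Java sketch (the class and field names are ours, chosen only for illustration): two threads increment one counter without synchronization and one with an atomic class, and only the unsynchronized counter can lose updates.

```java
// Minimal sketch of a data race: two threads increment a shared int without
// synchronization, so read-modify-write updates can be lost. The AtomicInteger
// version does the same work atomically and never loses an update.
import java.util.concurrent.atomic.AtomicInteger;

public class CounterRace {
    static int unsafeCount = 0;                            // plain field: racy
    static AtomicInteger safeCount = new AtomicInteger();  // atomic counter

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;               // not atomic: read, add, write back
                safeCount.incrementAndGet(); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // unsafeCount is frequently below 200000; safeCount is always 200000.
        System.out.println("unsafe=" + unsafeCount + " safe=" + safeCount.get());
    }
}
```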
PD/Parallel Decomposition
[1 Core-Tier1 hour, 3 Core-Tier2 hours]
(Cross-reference SF/System Support for Parallelism)
Topics:
[Core-Tier1]
[Core-Tier2]
• Basic knowledge of parallel decomposition concepts (cross-reference SF/System Support for Parallelism)
• Task-based decomposition
o Implementation strategies such as threads (see the sketch after this list)
• Data-parallel decomposition
o Strategies such as SIMD and MapReduce
• Actors and reactive processes (e.g., request handlers)
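To make the task-based and data-parallel strategies above concrete, here is a minimal Java sketch under an assumed array-summing workload (the class and method names are illustrative, not from the guidelines): two explicitly submitted tasks each sum one partition of the array, and a parallel stream then expresses the same computation in a data-parallel style.

```java
// Sketch of task-based decomposition: each task sums one independent partition
// of the data, and the partial results are combined afterwards. A data-parallel
// variant using a parallel stream follows at the end of main.
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class DecompositionSketch {
    static long sum(int[] a, int from, int to) {
        long s = 0;
        for (int i = from; i < to; i++) s += a[i];
        return s;
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);

        // Task-based decomposition: one task per partition, run on a thread pool.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        int mid = data.length / 2;
        Future<Long> left  = pool.submit(() -> sum(data, 0, mid));
        Future<Long> right = pool.submit(() -> sum(data, mid, data.length));
        System.out.println(left.get() + right.get());   // 1000000
        pool.shutdown();

        // Data-parallel decomposition: the runtime partitions the index range.
        long total = IntStream.range(0, data.length).parallel()
                              .mapToLong(i -> data[i]).sum();
        System.out.println(total);                      // 1000000
    }
}
```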
Learning outcomes:
[Core-Tier1]
[Core-Tier2]
PD/Communication and Coordination
Topics:
[Core-Tier1]
• Shared Memory
• Consistency, and its role in programming language guarantees for data-race-free programs
[Core-Tier2]
• Message passing
o Point-to-point versus multicast (or event-based) messages
o Blocking versus non-blocking styles for sending and receiving messages
o Message buffering (cross-reference PF/Fundamental Data Structures/Queues)
• Atomicity
o Specifying and testing atomicity and safety requirements
o Granularity of atomic accesses and updates, and the use of constructs such as critical sections or
transactions to describe them
o Mutual Exclusion using locks, semaphores, monitors, or related constructs
▪ Potential for liveness failures and deadlock (causes, conditions, prevention)
o Composition
▪ Composing larger granularity atomic actions using synchronization
▪ Transactions, including optimistic and conservative approaches
[Elective]
• Consensus
o (Cyclic) barriers, counters, or related constructs
• Conditional actions
o Conditional waiting (e.g., using condition variables)
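The sketch below ties together the mutual exclusion and conditional waiting constructs listed above, using an assumed bounded-buffer example (all names are ours): a lock guards the queue, and two condition variables block producers while the buffer is full and consumers while it is empty.

```java
// Sketch of mutual exclusion plus conditional waiting: a lock protects the
// buffer, and condition variables let callers wait for "not full" / "not empty".
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class BoundedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull  = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) notFull.await(); // wait for space
            items.addLast(item);
            notEmpty.signal();                                // wake one consumer
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) notEmpty.await();         // wait for data
            T item = items.removeFirst();
            notFull.signal();                                 // wake one producer
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```

A library class such as java.util.concurrent.ArrayBlockingQueue provides the same blocking behavior out of the box; the hand-written version is shown only to expose the lock and condition-variable pattern.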
Learning outcomes:
[Core-Tier1]
[Core-Tier2]
3. Give an example of a scenario in which blocking message sends can deadlock. [Usage]
4. Explain when and why multicast or event-based messaging can be preferable to alternatives. [Familiarity]
5. Write a program that correctly terminates when all of a set of concurrent tasks have completed. [Usage]
6. Use a properly synchronized queue to buffer data passed among activities. [Usage]
7. Explain why checks for preconditions, and actions based on these checks, must share the same unit of
atomicity to be effective. [Familiarity]
8. Write a test program that can reveal a concurrent programming error; for example, missing an update when
two activities both try to increment a variable. [Usage]
9. Describe at least one design technique for avoiding liveness failures in programs using multiple locks or
semaphores. [Familiarity]
10. Describe the relative merits of optimistic versus conservative concurrency control under different rates of
contention among updates. [Familiarity]
11. Give an example of a scenario in which an attempted optimistic update may never complete (see the sketch after this list). [Familiarity]
[Elective]
12. Use semaphores or condition variables to block threads until a necessary precondition holds. [Usage]
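The following minimal Java sketch (an illustrative counter; the names are ours) shows the optimistic style behind outcomes 10 and 11: read the current value, compute a new one, and commit it with compareAndSet, retrying if another activity interfered. Under sustained contention a particular attempt can keep retrying indefinitely, which is the scenario outcome 11 asks about.

```java
// Sketch of an optimistic update: read, compute, then commit with compareAndSet,
// retrying whenever another activity changed the value in the meantime.
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger();

    int incrementOptimistically() {
        while (true) {
            int current = value.get();        // read the current value
            int next = current + 1;           // compute the intended new value
            if (value.compareAndSet(current, next)) {
                return next;                  // no interference: update committed
            }
            // Another activity updated value between get() and compareAndSet();
            // loop and retry. Under heavy contention an unlucky caller may keep
            // losing this race.
        }
    }
}
```

Library methods such as AtomicInteger.incrementAndGet encapsulate this retry pattern (or map it onto a single hardware instruction where one exists).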
PD/Parallel Algorithms, Analysis, and Programming
Topics:
[Core-Tier2]
• Critical paths, work and span, and the relation to Amdahl’s law (cross-reference SF/Performance)
• Speed-up and scalability
• Naturally (embarrassingly) parallel algorithms
• Parallel algorithmic patterns (divide-and-conquer, map and reduce, master-workers, others) (see the sketch after this list)
o Specific algorithms (e.g., parallel MergeSort)
[Elective]
• Parallel graph algorithms (e.g., parallel shortest path, parallel spanning tree) (cross-reference
AL/Algorithmic Strategies/Divide-and-conquer)
• Parallel matrix computations
• Producer-consumer and pipelined algorithms
• Examples of non-scalable parallel algorithms
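As a reminder of the bound referenced above, Amdahl's law states that if a fraction s of the work is inherently serial, speedup on p processors is at most 1/(s + (1-s)/p); with s = 0.1 the speedup cannot exceed 10 no matter how large p grows. The following minimal Java sketch (an assumed array-summing example, not taken from the guidelines) then illustrates the divide-and-conquer pattern with the fork/join framework: split the range until it is small, solve the halves in parallel, and combine the partial results. The same structure underlies parallel MergeSort, with sorting and merging in place of summing.

```java
// Sketch of the divide-and-conquer parallel pattern using fork/join: split the
// range until it is small, compute the halves in parallel, combine the results.
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this, solve sequentially
    private final int[] data;
    private final int from, to;

    ForkJoinSum(int[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {            // base case: sequential sum
            long s = 0;
            for (int i = from; i < to; i++) s += data[i];
            return s;
        }
        int mid = (from + to) / 2;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                             // run the left half asynchronously
        long rightSum = right.compute();         // compute the right half here
        return left.join() + rightSum;           // combine partial results
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);
        long total = ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
        System.out.println(total);               // 1000000
    }
}
```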
Learning outcomes:
[Core-Tier2]
[Elective]
PD/Parallel Architecture
[1 Core-Tier1 hour, 1 Core-Tier2 hour]
The topics listed here are related to knowledge units in the Architecture and Organization (AR)
knowledge area (AR/Assembly Level Machine Organization and AR/Multiprocessing and
Alternative Architectures). Here, we focus on parallel architecture from the standpoint of
applications, whereas the Architecture and Organization knowledge area presents the topic from
the hardware perspective.
[Core-Tier1]
• Multicore processors
• Shared vs. distributed memory
[Core-Tier2]
[Elective]
• GPU, co-processing
• Flynn’s taxonomy
• Instruction level support for parallel programming
o Atomic instructions such as Compare and Set (see the sketch after this list)
• Memory issues
o Multiprocessor caches and cache coherence
o Non-uniform memory access (NUMA)
• Topologies
o Interconnects
o Clusters
o Resource sharing (e.g., buses and interconnects)
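To connect the instruction-level support above with the synchronization constructs discussed earlier, here is a minimal Java sketch (an illustrative spinlock; production code would normally use the library locks) that builds mutual exclusion directly on an atomic compare-and-set operation.

```java
// Sketch of mutual exclusion built directly on compare-and-set (an illustrative
// spinlock). compareAndSet is implemented with an atomic instruction on common
// hardware (e.g., LOCK CMPXCHG on x86).
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Keep trying to flip false -> true; only one thread can succeed at a time.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();   // Java 9+: hint that this thread is busy-waiting
        }
    }

    public void unlock() {
        locked.set(false);         // release so another spinning thread can acquire
    }
}
```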
Learning outcomes:
[Core-Tier1]
[Core-Tier2]
2. Describe the SMP architecture and note its key features. [Familiarity]
3. Characterize the kinds of tasks that are a natural match for SIMD machines. [Familiarity]
[Elective]
PD/Parallel Performance
[Elective]
Topics:
• Load balancing
• Performance measurement
• Scheduling and contention (cross-reference OS/Scheduling and Dispatch)
• Evaluating communication overhead
• Data management
o Non-uniform communication costs due to proximity (cross-reference SF/Proximity)
o Cache effects (e.g., false sharing; see the sketch after this list)
o Maintaining spatial locality
• Power usage and management
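As a sketch of the cache effects listed above, the following Java micro-benchmark (our own illustrative example; the measured gap depends on the hardware, the JVM's field layout, and the JIT) times two threads updating volatile counters that likely share a cache line, and then two counters separated by padding. The threads share no data logically, yet writes to the same cache line still force coherence traffic.

```java
// Sketch illustrating false sharing. Exact timings and field layout are
// JVM- and hardware-dependent; this only demonstrates the general effect.
public class FalseSharingDemo {
    static final long ITERS = 20_000_000;

    static class Shared {            // a and b likely sit on the same cache line
        volatile long a, b;
    }
    static class Padded {            // padding pushes b onto a different line
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long b;
    }

    static long time(Runnable wa, Runnable wb) throws InterruptedException {
        Thread t1 = new Thread(wa), t2 = new Thread(wb);
        long start = System.nanoTime();
        t1.start(); t2.start(); t1.join(); t2.join();
        return (System.nanoTime() - start) / 1_000_000;   // elapsed milliseconds
    }

    public static void main(String[] args) throws InterruptedException {
        Shared s = new Shared();
        Padded p = new Padded();
        long shared = time(() -> { for (long i = 0; i < ITERS; i++) s.a++; },
                           () -> { for (long i = 0; i < ITERS; i++) s.b++; });
        long padded = time(() -> { for (long i = 0; i < ITERS; i++) p.a++; },
                           () -> { for (long i = 0; i < ITERS; i++) p.b++; });
        System.out.println("same cache line: " + shared + " ms, padded: " + padded + " ms");
    }
}
```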
Learning outcomes:
PD/Distributed Systems
[Elective]
Topics:
Learning outcomes:
PD/Cloud Computing
[Elective]
Topics:
• Internet-Scale computing
o Task partitioning (cross-reference PD/Parallel Algorithms, Analysis, and Programming)
o Data access
o Clusters, grids, and meshes
• Cloud services
o Infrastructure as a service
▪ Elasticity of resources
▪ Platform APIs
o Software as a service
o Security
o Cost management
• Virtualization (cross-reference SF/Virtualization and Isolation and OS/Virtual Machines)
o Shared resource management
o Migration of processes
• Cloud-based data storage
o Shared access to weakly consistent data stores
o Data synchronization
o Data partitioning
o Distributed file systems (cross-reference IM/Distributed Databases)
o Replication
Learning outcomes:
1. Discuss the importance of elasticity and resource management in cloud computing. [Familiarity]
2. Explain strategies to synchronize a common view of shared data across a collection of devices.
[Familiarity]
3. Explain the advantages and disadvantages of using virtualized infrastructure. [Familiarity]
4. Deploy an application that uses cloud infrastructure for computing and/or data resources. [Usage]
5. Appropriately partition an application between a client and resources. [Usage]
PD/Formal Models and Semantics
[Elective]
Topics:
• Formal models of processes and message passing, including algebras such as Communicating Sequential Processes (CSP) and pi-calculus
• Formal models of parallel computation, including the Parallel Random Access Machine (PRAM) and
alternatives such as Bulk Synchronous Parallel (BSP)
• Formal models of computational dependencies
• Models of (relaxed) shared memory consistency and their relation to programming language specifications
• Algorithmic correctness criteria including linearizability
• Models of algorithmic progress, including non-blocking guarantees and fairness
• Techniques for specifying and checking correctness properties such as atomicity and freedom from data
races
Learning outcomes: