
Parallel Algorithm Design Principles and Programming
UNIT-2
Need for communication and coordination/synchronization

Synchronization:

• Synchronization means organizing the sequence of work and the tasks that perform it. This is very important
in programs that run tasks in parallel (simultaneously).

• It ensures that tasks are coordinated correctly, and it can be a significant factor in program performance because it often involves "serializing" parts of the program.

Types of Synchronization:
Barrier:
•A barrier ensures all tasks finish their work before moving forward.
•Each task keeps working until it reaches the barrier. Once all tasks reach it, they synchronize and move to the
next step together.
•Sometimes, a specific task or part of the work must be completed before others continue.
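A minimal sketch of this idea using Python's threading.Barrier; the worker function, its phases, and NUM_WORKERS are assumptions made purely for illustration:

```python
# Minimal barrier sketch: no task moves to phase 2 until every task
# has finished phase 1 and reached the barrier.
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)

def worker(task_id):
    print(f"task {task_id}: phase 1 done")
    barrier.wait()   # block here until all NUM_WORKERS tasks arrive
    print(f"task {task_id}: phase 2 starts only after everyone arrived")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```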
Lock/Semaphore:
•This is used to control access to shared resources like data or code.
•Only one task can use the resource at a time. A task must "lock" the resource before using it and
"unlock" it afterward.
•If another task tries to access the locked resource, it has to wait. This can either pause the task
(blocking) or allow it to do something else in the meantime (non-blocking).
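A minimal sketch of lock-based mutual exclusion with Python's threading.Lock; the shared counter and the thread/iteration counts are illustrative assumptions:

```python
# Lock sketch: only one task at a time may update the shared counter.
import threading

lock = threading.Lock()
counter = 0

def increment(times):
    global counter
    for _ in range(times):
        with lock:        # "lock" before touching the shared resource,
            counter += 1  # released ("unlocked") automatically on exit

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # always 400000; without the lock the result may be lower
```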
Synchronous Communication Operations:
•These involve two or more tasks that need to communicate while working.
•For example, if a task sends data, it waits for confirmation that the other task received it. This ensures
both tasks are properly coordinated.
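One way to sketch such an acknowledged exchange is with a pair of queues between two Python threads; the queue names and the payload below are assumed for illustration only:

```python
# Synchronous-communication sketch: the sender blocks until the
# receiver confirms that the data arrived.
import threading
import queue

data_q, ack_q = queue.Queue(), queue.Queue()

def sender():
    data_q.put([1, 2, 3])   # send the data
    ack_q.get()             # wait for confirmation before continuing
    print("sender: receipt confirmed")

def receiver():
    data = data_q.get()     # receive the data
    print("receiver: got", data)
    ack_q.put("ack")        # confirm receipt

threads = [threading.Thread(target=f) for f in (sender, receiver)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```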
• Some problems can be solved in parallel without much coordination. For instance, in image processing,
different parts of an image can be handled by different tasks.

• However, some tasks depend on each other and need to share data or resources. This requires proper
synchronization.

Design Considerations for Synchronization:

• When designing systems, consider:

• The cost of communication between tasks.

• Latency (delays) and bandwidth usage.

• Whether tasks use synchronous (waiting for confirmation) or asynchronous (not waiting)
communication.

• The scope and efficiency of communication.


Scheduling and Contention:

• In a parallel system, multiple jobs arrive, and the system decides the order in which they should be executed.

• The goal is to reduce the total time it takes to complete all the jobs (minimize turnaround time).

•A parallel job is assigned to a group of processors, which is called a partition.

•Parallel machines are divided into separate, non-overlapping partitions, where different jobs run at the same time. This is called space slicing or space partitioning.

Job Submission and Scheduling:

•Users send their jobs to a machine’s scheduler.

•Jobs wait in a queue until processors are allocated to them; the scheduler reconsiders the queued jobs whenever the system's state changes (for example, when a running job finishes and frees processors).
Goal:
•The aim is to maximize processor usage.
•However, since future jobs and their execution times are unknown, the system uses simple rules (heuristics) to
allocate jobs efficiently at each scheduling step.
How Scheduling Works:
•The scheduler assigns resources (like processors and nodes) to a job based on its requirements, using data
provided by the resource manager.
Scheduling Policies:
1.FCFS (First Come First Serve): Jobs are processed in the order they arrive (see the sketch after this list).
2.Lookahead Optimizing Scheduler: Tries to predict job requirements for better scheduling.
3.Gang Scheduling: Schedules related jobs (from the same group) to run simultaneously.
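A hypothetical sketch of FCFS space partitioning: the machine size, the job list, and the run times below are invented, and backfilling is deliberately not modelled.

```python
# FCFS space-partitioning sketch: each job requests a partition of
# `procs` processors for `runtime` time units; jobs start strictly in
# arrival order, as soon as a large enough partition is free.
TOTAL_PROCS = 8
jobs = [("J1", 4, 10), ("J2", 6, 5), ("J3", 2, 3)]   # (name, procs, runtime)

free_at = [0] * TOTAL_PROCS   # time at which each processor becomes free
prev_start = 0
for name, procs, runtime in jobs:
    free_at.sort()
    start = max(free_at[procs - 1], prev_start)   # wait until `procs` processors are free
    finish = start + runtime
    for i in range(procs):                        # allocate the earliest-free processors
        free_at[i] = finish
    prev_start = start
    print(f"{name}: start={start}  finish={finish}  procs={procs}")

print("makespan:", max(free_at))   # total time to complete all jobs
```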
Independence & Partitioning
•The first step in creating a parallel algorithm is to break the problem into smaller tasks that can run at
the same time.
•These tasks can vary in size or complexity.
Decomposition:
•Tasks can be represented using a task dependency graph, which shows the order in which tasks must
be executed.
•In the graph, nodes represent tasks, and edges show which tasks depend on the results of others.
Tasks:
•A task is a small unit of work within the system.
•Decomposition divides the main computation into these tasks.
Common Tasks Include:
1.Identifying work that can run in parallel.
2.Assigning tasks to processors.
3.Distributing inputs, outputs, and data among tasks.
4.Managing shared resources.
5.Synchronizing processors to ensure proper task execution.
• Running multiple tasks at the same time helps solve problems faster.

• Tasks can be of any size, but once defined, they are the smallest units that can run in parallel.

Ex: Multiplying a Dense Matrix with a Vector

• Each element of the output vector (y) is calculated independently of the others.

• This allows the matrix-vector multiplication to be divided into n tasks, where each task computes one element of y using one row of the matrix and the vector b.
Observations: Tasks share data (e.g., the vector b), but there are no control dependencies.

• This means that no task needs to wait for another to complete before starting.

• All tasks perform the same number of operations, making them equal in size.

• A question arises: Is this the maximum number of tasks that can be created for this problem?
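As a concrete illustration of this decomposition, the sketch below assumes a small 3x3 matrix A and vector b; each call to task(i) plays the role of one of the n tasks and computes y[i] independently:

```python
# n-task decomposition of y = A * b: task i computes only y[i], the dot
# product of row i of A with b.  All tasks share b, but there are no
# control dependencies, so they can run concurrently.
from concurrent.futures import ThreadPoolExecutor

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [1, 0, 2]

def task(i):
    return sum(A[i][j] * b[j] for j in range(len(b)))   # task i -> y[i]

with ThreadPoolExecutor() as pool:
    y = list(pool.map(task, range(len(A))))
print(y)   # [7, 16, 25]
```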

Ex: Database Query Processing


•Consider the execution of the query:
•MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")
•This query processes a database by selecting rows that match all these conditions.
• Task: create the set of elements that satisfy a criterion (or several criteria).
• Edge: the output of one task serves as input to the next.
Different tables & their dependencies in a query processing operation
An alternate task-dependency graph for query
• Different task decomposition leads to different parallelism.
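A hypothetical sketch of one such decomposition on a tiny in-memory table (the rows are invented): each leaf task builds the set of matching row ids, and the combining tasks intersect or union those sets along the edges of the task-dependency graph.

```python
# Query decomposition sketch for:
# MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")
rows = [
    {"id": 1, "model": "CIVIC",   "year": 2001, "color": "GREEN"},
    {"id": 2, "model": "CIVIC",   "year": 2000, "color": "WHITE"},
    {"id": 3, "model": "CIVIC",   "year": 2001, "color": "WHITE"},
    {"id": 4, "model": "COROLLA", "year": 2001, "color": "GREEN"},
]

# Leaf tasks: independent selections that can run in parallel.
civic = {r["id"] for r in rows if r["model"] == "CIVIC"}
year  = {r["id"] for r in rows if r["year"] == 2001}
green = {r["id"] for r in rows if r["color"] == "GREEN"}
white = {r["id"] for r in rows if r["color"] == "WHITE"}

# Internal tasks: each depends on the outputs of the tasks above it.
green_or_white = green | white
civic_and_year = civic & year
result = civic_and_year & green_or_white
print(result)   # {1, 3}
```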
Granularity of Task Decomposition
• Fine-grained decomposition: large number of small tasks
• Coarse-grained decomposition: small number of large tasks
Matrix-vector multiplication example
Degree of Concurrency: the number of tasks that can execute in parallel.
• Maximum degree of concurrency: the largest number of concurrent tasks at any point of the execution.
• Average degree of concurrency: the average number of tasks that can be executed concurrently over the entire execution.
• Degree of Concurrency vs. Task Granularity: inverse relation (finer-grained decompositions generally allow a higher degree of concurrency, as the sketch below illustrates).
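A toy sketch of this tradeoff for the matrix-vector example, with assumed sizes of n = 8 rows and p = 2 coarse tasks:

```python
# Granularity sketch: finer task decomposition of y = A * b yields more,
# smaller tasks and therefore a higher degree of concurrency.
n, p = 8, 2
rows = list(range(n))

fine   = [[i] for i in rows]                                  # n tasks of one row each
chunk  = n // p
coarse = [rows[k * chunk:(k + 1) * chunk] for k in range(p)]  # p tasks of n/p rows each

print("fine-grained:  ", len(fine),   "concurrent tasks")     # 8
print("coarse-grained:", len(coarse), "concurrent tasks")     # 2
```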
Advantages of concurrent tasking:
A Natural Model for Many Real-Time Applications:
•Concurrent tasking aligns well with real-time applications, where tasks or processes run
simultaneously to meet time constraints and deliver results efficiently.
Separation of Concerns:
•Concurrent tasking divides responsibilities by focusing on what a task does (its
functionality) separately from when it executes (its timing or scheduling). This separation
simplifies system design, making it easier to:
• Understand the system.
• Manage the individual components.
• Construct a robust and scalable architecture.
Reduction in System Execution Time:
•By overlapping the execution of independent tasks, concurrent tasking can minimize the total execution
time of a system, leading to more efficient use of resources.
Greater Scheduling Flexibility:
•Concurrent tasking enables flexibility in scheduling by allowing time-critical tasks with strict deadlines
to be prioritized over less critical tasks. This ensures that high-priority tasks are completed on time.
Early Performance Analysis:
•Identifying concurrent tasks early in the system design phase allows developers to conduct performance
analysis at an early stage. This can help in optimizing the system and addressing potential bottlenecks
before implementation.
Critical Path of Task Graph

• Critical Path Length: The longest path from the start to the end of the task graph. It defines the
minimum time required to complete all tasks in the graph.

• Average Degree of Concurrency: A measure of how many tasks, on average, can be executed
concurrently.

• It is calculated as the total sum of task weights divided by the critical path length.

Average degree of concurrency = total amount of work / critical path length


Ex: Critical Path Length
Task-dependency graphs of query processing operation
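Since the task-graph figures are not reproduced here, the sketch below uses made-up task weights and dependencies to show how the critical path length and the average degree of concurrency are computed:

```python
# Critical path sketch: weights and edges are illustrative values only.
from functools import lru_cache

weights = {"A": 10, "B": 10, "C": 10, "D": 10, "E": 6, "F": 8}
deps    = {"A": [], "B": [], "C": [], "D": [], "E": ["A", "B"], "F": ["C", "D", "E"]}

@lru_cache(maxsize=None)
def finish(task):
    # Longest weight-summed path ending at (and including) `task`.
    return weights[task] + max((finish(p) for p in deps[task]), default=0)

critical_path_length = max(finish(t) for t in weights)
total_work = sum(weights.values())

print("critical path length:", critical_path_length)                        # 24
print("average degree of concurrency:", total_work / critical_path_length)  # 54 / 24 = 2.25
```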
Limits on Parallel Performance:

Decomposition Granularity:

• Theoretically, breaking a task into smaller subtasks (finer granularity) can reduce parallel execution time.

• However, practical limits exist because:

• Dividing tasks too finely can result in excessive overhead for task management and coordination.

• Communication and synchronization costs may outweigh the benefits of finer granularity.
Bound on Granularity:
• There is an inherent upper bound to how finely a task can be divided.
• Example:
• For matrix-vector multiplication, there are at most n^2 concurrent tasks (one task per matrix entry).
• Beyond this limit, no further decomposition is possible because the computational work is
inherently limited.
Communication Overhead:

•Concurrent tasks often need to exchange data (e.g., sharing intermediate results).

•This introduces communication overhead, which can:

•Reduce the overall efficiency of parallel execution.

•Create a tradeoff between finer decomposition (to maximize concurrency) and the overhead caused by data transfer and synchronization.

Tradeoff Between Granularity and Overhead:

•The performance bounds of a parallel system are determined by finding the optimal balance between:

•Task granularity (how small the subtasks are).

•Communication overhead (cost of data transfer and synchronization).


• Parallel performance does not scale infinitely with finer task decomposition due to:

• Limits on granularity (the computational structure restricts how many tasks can run concurrently).

• Overheads (communication, coordination, and task scheduling).

• Optimizing parallel performance requires balancing task size and communication costs, which is often
application-specific.

• This highlights that while parallel computing can significantly speed up processes, it is limited by the inherent
nature of the computation and the associated overheads.
Task Interaction Graph:
Subtasks Exchange Data:
•In a decomposition (dividing a problem into smaller tasks), subtasks often need to communicate data with each
other.
•Example:
•In the decomposition of a dense matrix-vector multiplication:
•If the vector is not replicated across all tasks, subtasks will need to communicate elements of the
vector with one another.
Graph Representation:
•A task interaction graph is a representation of:
•Tasks as nodes.
•Interactions or data exchanges between tasks as edges.
•This graph helps visualize and analyze the communication dependencies among tasks.
Importance of the Task Interaction Graph:

• Analyzing Communication Overhead:


• The task interaction graph helps identify communication requirements among tasks, which is crucial for
understanding and minimizing communication overhead in parallel systems.

• Scheduling and Optimization:


• By examining the task interaction graph, tasks can be scheduled to minimize inter-task communication or
overlapping dependencies, improving overall efficiency.

• Designing Parallel Algorithms:


• The graph provides insight into how to design algorithms to reduce dependencies and enhance scalability.
Sparse Matrix Representation (Fig. a):
•The matrix A shown is sparse, meaning most of its elements are zero.
•Non-zero entries are marked, and these are the elements that will participate in the multiplication with the vector b.
•Each row of the matrix corresponds to a task in the multiplication process:
•Task i: Computes the dot product of the i-th row of A with the vector b.
Task Interaction Graph (Fig. b):
•Each node in the graph represents a task (row of A).
•Edges indicate dependencies or data exchange between tasks:
•For example, if two tasks share a common non-zero column in A, they might need to exchange the corresponding
value from b.
Steps in Sparse Matrix-Vector Multiplication:
• Decompose the matrix into rows (tasks).
• For each task (row), perform the dot product using only the non-zero entries in the row and the
corresponding elements of b.
• Communicate values of b to tasks that depend on shared columns.
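A hypothetical sketch of these steps on a tiny sparse matrix; the matrix, the vector, and the assumption that task j initially owns b[j] are all made up for illustration:

```python
# Sparse matrix-vector multiplication with one task per row, plus the
# task interaction edges implied by shared non-zero columns.
A = {0: {0: 2, 2: 1},        # row -> {column: non-zero value}
     1: {1: 3},
     2: {0: 4, 2: 5}}
b = [1, 2, 3]
owner = {j: j for j in range(len(b))}   # assume task j initially owns b[j]

# Task i: dot product over the non-zero entries of row i only.
y = {i: sum(v * b[j] for j, v in row.items()) for i, row in A.items()}
print(y)   # {0: 5, 1: 6, 2: 19}

# Interaction edges: task i must fetch b[j] from task owner[j] when j != i.
edges = {(i, owner[j]) for i, row in A.items() for j in row if owner[j] != i}
print(sorted(edges))   # [(0, 2), (2, 0)]
```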
Decomposition:
• Decomposition refers to dividing a task into smaller subtasks to facilitate parallel execution. This process
helps achieve concurrency and is critical for parallel computing.

• There is no single universal method for task decomposition; techniques depend on the specific problem
being solved.

• Common types of decomposition include:

• Recursive decomposition: Based on breaking down problems recursively.

• Data decomposition: Dividing tasks based on data distribution.

• Exploratory decomposition: For tasks involving dynamic exploration, like search problems.

• Speculative decomposition: Used when predicting which tasks will be required in the future.
Recursive Decomposition:
• Recursive decomposition is particularly effective for problems that follow the divide-and-conquer strategy,
which breaks a problem into smaller independent subproblems and solves them recursively.

• Steps:

• Decompose the problem into independent sub-problems:

• The problem is divided into smaller parts that can be solved independently.

• Example: Breaking a sorting problem into sorting smaller subsets.

• Recursively decompose each sub-problem:

• Each subproblem is further divided until the tasks become simple enough to be solved directly (base case).
Advantages of Recursive Decomposition:
• Naturally introduces concurrency, as the independent subproblems can be solved in parallel.
• Suited for problems with hierarchical or tree-like structures.
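A minimal sketch of recursive decomposition using merge sort: the two halves of the input are independent subproblems submitted as concurrent tasks. The depth cutoff and pool size are illustrative choices, not part of the general method.

```python
# Recursive decomposition sketch: divide-and-conquer merge sort where
# independent subproblems are run as concurrent tasks.
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def parallel_merge_sort(data, pool, depth=2):
    if len(data) <= 1:
        return data                     # base case: nothing left to divide
    mid = len(data) // 2
    if depth == 0:                      # stop spawning tasks once granularity is fine enough
        return merge(parallel_merge_sort(data[:mid], pool, 0),
                     parallel_merge_sort(data[mid:], pool, 0))
    left  = pool.submit(parallel_merge_sort, data[:mid], pool, depth - 1)
    right = pool.submit(parallel_merge_sort, data[mid:], pool, depth - 1)
    return merge(left.result(), right.result())

with ThreadPoolExecutor(max_workers=8) as pool:
    print(parallel_merge_sort([5, 3, 8, 1, 9, 2, 7, 4], pool))   # [1, 2, 3, 4, 5, 7, 8, 9]
```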
Data Decomposition
