
Parallel Algorithm Design Principles and Programming
UNIT-2
Need for communication and coordination/synchronization

Synchronization:

• Synchronization means organizing the sequence of work and the tasks that perform it. This is very important
in programs that run tasks in parallel (simultaneously).

• It ensures that tasks are coordinated correctly, and it can be a significant factor in program performance because it often involves "serializing" parts of the program.

Types of Synchronization:
Barrier:
•A barrier ensures all tasks finish their work before moving forward.
•Each task keeps working until it reaches the barrier. Once all tasks reach it, they synchronize and move to the
next step together.
•Sometimes, a specific task or part of the work must be completed before others continue.
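A minimal sketch of this idea using Python's threading.Barrier; the worker function, its phases, and NUM_WORKERS are assumptions made purely for illustration:

```python
# Minimal barrier sketch: no task moves to phase 2 until every task
# has finished phase 1 and reached the barrier.
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)

def worker(task_id):
    print(f"task {task_id}: phase 1 done")
    barrier.wait()   # block here until all NUM_WORKERS tasks arrive
    print(f"task {task_id}: phase 2 starts only after everyone arrived")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```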
Lock/Semaphore:
•This is used to control access to shared resources like data or code.
•Only one task can use the resource at a time. A task must "lock" the resource before using it and
"unlock" it afterward.
•If another task tries to access the locked resource, it has to wait. This can either pause the task
(blocking) or allow it to do something else in the meantime (non-blocking).
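A minimal sketch of lock-based mutual exclusion with Python's threading.Lock; the shared counter and the thread/iteration counts are illustrative assumptions:

```python
# Lock sketch: only one task at a time may update the shared counter.
import threading

lock = threading.Lock()
counter = 0

def increment(times):
    global counter
    for _ in range(times):
        with lock:        # "lock" before touching the shared resource,
            counter += 1  # released ("unlocked") automatically on exit

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # always 400000; without the lock the result may be lower
```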
Synchronous Communication Operations:
•These involve two or more tasks that need to communicate while working.
•For example, if a task sends data, it waits for confirmation that the other task received it. This ensures
both tasks are properly coordinated.
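One way to sketch such an acknowledged exchange is with a pair of queues between two Python threads; the queue names and the payload below are assumed for illustration only:

```python
# Synchronous-communication sketch: the sender blocks until the
# receiver confirms that the data arrived.
import threading
import queue

data_q, ack_q = queue.Queue(), queue.Queue()

def sender():
    data_q.put([1, 2, 3])   # send the data
    ack_q.get()             # wait for confirmation before continuing
    print("sender: receipt confirmed")

def receiver():
    data = data_q.get()     # receive the data
    print("receiver: got", data)
    ack_q.put("ack")        # confirm receipt

threads = [threading.Thread(target=f) for f in (sender, receiver)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```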
• Some problems can be solved in parallel without much coordination. For instance, in image processing,
different parts of an image can be handled by different tasks.

• However, some tasks depend on each other and need to share data or resources. This requires proper
synchronization.

Design Considerations for Synchronization:

• When designing systems, consider:

• The cost of communication between tasks.

• Latency (delays) and bandwidth usage.

• Whether tasks use synchronous (waiting for confirmation) or asynchronous (not waiting)
communication.

• The scope and efficiency of communication.


Scheduling and Contention:

• In a parallel system, multiple jobs arrive, and the system decides the order in which they should be executed.

• The goal is to reduce the total time it takes to complete all the jobs (minimize turnaround time).

•A parallel job is assigned to a group of processors, which is called a partition.

•Parallel machines are divided into separate, non-overlapping partitions, where different jobs run at the same time. This is called space slicing or space partitioning.

Job Submission and Scheduling:

•Users send their jobs to a machine’s scheduler.

•Jobs wait in a queue until processors are allocated to them; the scheduler reconsiders the queued jobs whenever the system's state changes (for example, when a running job finishes and frees processors).
Goal:
•The aim is to maximize processor usage.
•However, since future jobs and their execution times are unknown, the system uses simple rules (heuristics) to
allocate jobs efficiently at each scheduling step.
How Scheduling Works:
•The scheduler assigns resources (like processors and nodes) to a job based on its requirements, using data
provided by the resource manager.
Scheduling Policies:
1.FCFS (First Come First Serve): Jobs are processed in the order they arrive (see the sketch after this list).
2.Lookahead Optimizing Scheduler: Tries to predict job requirements for better scheduling.
3.Gang Scheduling: Schedules related jobs (from the same group) to run simultaneously.
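A hypothetical sketch of FCFS space partitioning: the machine size, the job list, and the run times below are invented, and backfilling is deliberately not modelled.

```python
# FCFS space-partitioning sketch: each job requests a partition of
# `procs` processors for `runtime` time units; jobs start strictly in
# arrival order, as soon as a large enough partition is free.
TOTAL_PROCS = 8
jobs = [("J1", 4, 10), ("J2", 6, 5), ("J3", 2, 3)]   # (name, procs, runtime)

free_at = [0] * TOTAL_PROCS   # time at which each processor becomes free
prev_start = 0
for name, procs, runtime in jobs:
    free_at.sort()
    start = max(free_at[procs - 1], prev_start)   # wait until `procs` processors are free
    finish = start + runtime
    for i in range(procs):                        # allocate the earliest-free processors
        free_at[i] = finish
    prev_start = start
    print(f"{name}: start={start}  finish={finish}  procs={procs}")

print("makespan:", max(free_at))   # total time to complete all jobs
```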
Independence & Partitioning
•The first step in creating a parallel algorithm is to break the problem into smaller tasks that can run at
the same time.
•These tasks can vary in size or complexity.
Decomposition:
•Tasks can be represented using a task dependency graph, which shows the order in which tasks must
be executed.
•In the graph, nodes represent tasks, and edges show which tasks depend on the results of others.
Tasks:
•A task is a small unit of work within the system.
•Decomposition divides the main computation into these tasks.
Common Tasks Include:
1.Identifying work that can run in parallel.
2.Assigning tasks to processors.
3.Distributing inputs, outputs, and data among tasks.
4.Managing shared resources.
5.Synchronizing processors to ensure proper task execution.
• Running multiple tasks at the same time helps solve problems faster.

• Tasks can be of any size, but once defined, they are the smallest units that can run in parallel.

Ex: Multiplying a Dense Matrix with a Vector

• Each element of the output vector (y) is calculated independently of the others.

• This allows the matrix-vector multiplication to be divided into n tasks, where each task computes one element of y using one row of the matrix and the vector b.
Observations: Tasks share data (e.g., the vector b), but there are no control dependencies.

• This means that no task needs to wait for another to complete before starting.

• All tasks perform the same number of operations, making them equal in size.

• A question arises: Is this the maximum number of tasks that can be created for this problem?
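As a concrete illustration of this decomposition, the sketch below assumes a small 3x3 matrix A and vector b; each call to task(i) plays the role of one of the n tasks and computes y[i] independently:

```python
# n-task decomposition of y = A * b: task i computes only y[i], the dot
# product of row i of A with b.  All tasks share b, but there are no
# control dependencies, so they can run concurrently.
from concurrent.futures import ThreadPoolExecutor

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [1, 0, 2]

def task(i):
    return sum(A[i][j] * b[j] for j in range(len(b)))   # task i -> y[i]

with ThreadPoolExecutor() as pool:
    y = list(pool.map(task, range(len(A))))
print(y)   # [7, 16, 25]
```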

Ex: Database Query Processing


•Consider the execution of the query:
•MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")
•This query processes a database by selecting rows that match all these conditions.
• Task: create the set of elements that satisfy a criterion (or several criteria).
• Edge: the output of one task serves as input to the next.
Different tables & their dependencies in a query processing operation
An alternate task-dependency graph for query
• Different task decomposition leads to different parallelism.
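A hypothetical sketch of one such decomposition on a tiny in-memory table (the rows are invented): each leaf task builds the set of matching row ids, and the combining tasks intersect or union those sets along the edges of the task-dependency graph.

```python
# Query decomposition sketch for:
# MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")
rows = [
    {"id": 1, "model": "CIVIC",   "year": 2001, "color": "GREEN"},
    {"id": 2, "model": "CIVIC",   "year": 2000, "color": "WHITE"},
    {"id": 3, "model": "CIVIC",   "year": 2001, "color": "WHITE"},
    {"id": 4, "model": "COROLLA", "year": 2001, "color": "GREEN"},
]

# Leaf tasks: independent selections that can run in parallel.
civic = {r["id"] for r in rows if r["model"] == "CIVIC"}
year  = {r["id"] for r in rows if r["year"] == 2001}
green = {r["id"] for r in rows if r["color"] == "GREEN"}
white = {r["id"] for r in rows if r["color"] == "WHITE"}

# Internal tasks: each depends on the outputs of the tasks above it.
green_or_white = green | white
civic_and_year = civic & year
result = civic_and_year & green_or_white
print(result)   # {1, 3}
```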
Granularity of Task Decomposition
• Fine-grained decomposition: large number of small tasks
• Coarse-grained decomposition: small number of large tasks
Matrix-vector multiplication example
Degree of Concurrency: the number of tasks that can execute in parallel.
• Maximum degree of concurrency: the largest number of concurrent tasks at any point of the execution.
• Average degree of concurrency: the average number of tasks that can be executed concurrently over the entire execution.
• Degree of Concurrency vs. Task Granularity: inverse relation (finer-grained decompositions generally allow a higher degree of concurrency, as the sketch below illustrates).
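A toy sketch of this tradeoff for the matrix-vector example, with assumed sizes of n = 8 rows and p = 2 coarse tasks:

```python
# Granularity sketch: finer task decomposition of y = A * b yields more,
# smaller tasks and therefore a higher degree of concurrency.
n, p = 8, 2
rows = list(range(n))

fine   = [[i] for i in rows]                                  # n tasks of one row each
chunk  = n // p
coarse = [rows[k * chunk:(k + 1) * chunk] for k in range(p)]  # p tasks of n/p rows each

print("fine-grained:  ", len(fine),   "concurrent tasks")     # 8
print("coarse-grained:", len(coarse), "concurrent tasks")     # 2
```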
Advantages of concurrent tasking:
A Natural Model for Many Real-Time Applications:
•Concurrent tasking aligns well with real-time applications, where tasks or processes run
simultaneously to meet time constraints and deliver results efficiently.
Separation of Concerns:
•Concurrent tasking divides responsibilities by focusing on what a task does (its
functionality) separately from when it executes (its timing or scheduling). This separation
simplifies system design, making it easier to:
• Understand the system.
• Manage the individual components.
• Construct a robust and scalable architecture.
Reduction in System Execution Time:
•By overlapping the execution of independent tasks, concurrent tasking can minimize the total execution
time of a system, leading to more efficient use of resources.
Greater Scheduling Flexibility:
•Concurrent tasking enables flexibility in scheduling by allowing time-critical tasks with strict deadlines
to be prioritized over less critical tasks. This ensures that high-priority tasks are completed on time.
Early Performance Analysis:
•Identifying concurrent tasks early in the system design phase allows developers to conduct performance
analysis at an early stage. This can help in optimizing the system and addressing potential bottlenecks
before implementation.
Critical Path of Task Graph

• Critical Path Length: The longest path from the start to the end of the task graph. It defines the
minimum time required to complete all tasks in the graph.

• Average Degree of Concurrency: A measure of how many tasks, on average, can be executed
concurrently.

• It is calculated as the total sum of task weights divided by the critical path length.

Average degree of concurrency = total amount of work / critical path length


Ex: Critical Path Length
Task-dependency graphs of query processing operation
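Since the task-graph figures are not reproduced here, the sketch below uses made-up task weights and dependencies to show how the critical path length and the average degree of concurrency are computed:

```python
# Critical path sketch: weights and edges are illustrative values only.
from functools import lru_cache

weights = {"A": 10, "B": 10, "C": 10, "D": 10, "E": 6, "F": 8}
deps    = {"A": [], "B": [], "C": [], "D": [], "E": ["A", "B"], "F": ["C", "D", "E"]}

@lru_cache(maxsize=None)
def finish(task):
    # Longest weight-summed path ending at (and including) `task`.
    return weights[task] + max((finish(p) for p in deps[task]), default=0)

critical_path_length = max(finish(t) for t in weights)
total_work = sum(weights.values())

print("critical path length:", critical_path_length)                        # 24
print("average degree of concurrency:", total_work / critical_path_length)  # 54 / 24 = 2.25
```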
Limits on Parallel Performance:

Decomposition Granularity:

• Theoretically, breaking a task into smaller subtasks (finer granularity) can reduce parallel execution time.

• However, practical limits exist because:

• Dividing tasks too finely can result in excessive overhead for task management and coordination.

• Communication and synchronization costs may outweigh the benefits of finer granularity.
Bound on Granularity:
• There is an inherent upper bound to how finely a task can be divided.
• Example:
• For matrix-vector multiplication, there are at most n^2 concurrent tasks (one task per matrix entry).
• Beyond this limit, no further decomposition is possible because the computational work is
inherently limited.
Communication Overhead:

•Concurrent tasks often need to exchange data (e.g., sharing intermediate results).

•This introduces communication overhead, which can:

•Reduce the overall efficiency of parallel execution.

•Create a tradeoff between finer decomposition (to maximize concurrency) and the overhead caused by data transfer and synchronization.

Tradeoff Between Granularity and Overhead:

•The performance bounds of a parallel system are determined by finding the optimal balance between:

•Task granularity (how small the subtasks are).

•Communication overhead (cost of data transfer and synchronization).


• Parallel performance does not scale infinitely with finer task decomposition due to:

• Limits on granularity (the computational structure restricts how many tasks can run concurrently).

• Overheads (communication, coordination, and task scheduling).

• Optimizing parallel performance requires balancing task size and communication costs, which is often
application-specific.

• This highlights that while parallel computing can significantly speed up processes, it is limited by the inherent
nature of the computation and the associated overheads.
Task Interaction Graph:
Subtasks Exchange Data:
•In a decomposition (dividing a problem into smaller tasks), subtasks often need to communicate data with each
other.
•Example:
•In the decomposition of a dense matrix-vector multiplication:
•If the vector is not replicated across all tasks, subtasks will need to communicate elements of the
vector with one another.
Graph Representation:
•A task interaction graph is a representation of:
•Tasks as nodes.
•Interactions or data exchanges between tasks as edges.
•This graph helps visualize and analyze the communication dependencies among tasks.
Importance of the Task Interaction Graph:

• Analyzing Communication Overhead:


• The task interaction graph helps identify communication requirements among tasks, which is crucial for
understanding and minimizing communication overhead in parallel systems.

• Scheduling and Optimization:


• By examining the task interaction graph, tasks can be scheduled to minimize inter-task communication or
overlapping dependencies, improving overall efficiency.

• Designing Parallel Algorithms:


• The graph provides insight into how to design algorithms to reduce dependencies and enhance scalability.
Sparse Matrix Representation (Fig. a):
•The matrix A shown is sparse, meaning most of its elements are zero.
•Non-zero entries are marked, and these are the elements that will participate in the multiplication with the vector b.
•Each row of the matrix corresponds to a task in the multiplication process:
•Task i: Computes the dot product of the i-th row of A with the vector b.
Task Interaction Graph (Fig. b):
•Each node in the graph represents a task (row of A).
•Edges indicate dependencies or data exchange between tasks:
•For example, if two tasks share a common non-zero column in A, they might need to exchange the corresponding
value from b.
Steps in Sparse Matrix-Vector Multiplication:
• Decompose the matrix into rows (tasks).
• For each task (row), perform the dot product using only the non-zero entries in the row and the
corresponding elements of b.
• Communicate values of b to tasks that depend on shared columns.
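A hypothetical sketch of these steps on a tiny sparse matrix; the matrix, the vector, and the assumption that task j initially owns b[j] are all made up for illustration:

```python
# Sparse matrix-vector multiplication with one task per row, plus the
# task interaction edges implied by shared non-zero columns.
A = {0: {0: 2, 2: 1},        # row -> {column: non-zero value}
     1: {1: 3},
     2: {0: 4, 2: 5}}
b = [1, 2, 3]
owner = {j: j for j in range(len(b))}   # assume task j initially owns b[j]

# Task i: dot product over the non-zero entries of row i only.
y = {i: sum(v * b[j] for j, v in row.items()) for i, row in A.items()}
print(y)   # {0: 5, 1: 6, 2: 19}

# Interaction edges: task i must fetch b[j] from task owner[j] when j != i.
edges = {(i, owner[j]) for i, row in A.items() for j in row if owner[j] != i}
print(sorted(edges))   # [(0, 2), (2, 0)]
```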
Decomposition:
• Decomposition refers to dividing a task into smaller subtasks to facilitate parallel execution. This process
helps achieve concurrency and is critical for parallel computing.

• There is no single universal method for task decomposition; techniques depend on the specific problem
being solved.

• Common types of decomposition include:

• Recursive decomposition: Based on breaking down problems recursively.

• Data decomposition: Dividing tasks based on data distribution.

• Exploratory decomposition: For tasks involving dynamic exploration, like search problems.

• Speculative decomposition: Used when predicting which tasks will be required in the future.
Recursive Decomposition:
• Recursive decomposition is particularly effective for problems that follow the divide-and-conquer strategy,
which breaks a problem into smaller independent subproblems and solves them recursively.

• Steps:

• Decompose the problem into independent sub-problems:

• The problem is divided into smaller parts that can be solved independently.

• Example: Breaking a sorting problem into sorting smaller subsets.

• Recursively decompose each sub-problem:

• Each subproblem is further divided until the tasks become simple enough to be solved directly (base case).
Advantages of Recursive Decomposition:
• Naturally introduces concurrency, as the independent subproblems can be solved in parallel.
• Suited for problems with hierarchical or tree-like structures.
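A minimal sketch of recursive decomposition using merge sort: the two halves of the input are independent subproblems submitted as concurrent tasks. The depth cutoff and pool size are illustrative choices, not part of the general method.

```python
# Recursive decomposition sketch: divide-and-conquer merge sort where
# independent subproblems are run as concurrent tasks.
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def parallel_merge_sort(data, pool, depth=2):
    if len(data) <= 1:
        return data                     # base case: nothing left to divide
    mid = len(data) // 2
    if depth == 0:                      # stop spawning tasks once granularity is fine enough
        return merge(parallel_merge_sort(data[:mid], pool, 0),
                     parallel_merge_sort(data[mid:], pool, 0))
    left  = pool.submit(parallel_merge_sort, data[:mid], pool, depth - 1)
    right = pool.submit(parallel_merge_sort, data[mid:], pool, depth - 1)
    return merge(left.result(), right.result())

with ThreadPoolExecutor(max_workers=8) as pool:
    print(parallel_merge_sort([5, 3, 8, 1, 9, 2, 7, 4], pool))   # [1, 2, 3, 4, 5, 7, 8, 9]
```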
Data Decomposition
