
[Parallel and Distributed Computing]

Lecture # 04
Parallel Algorithm:
A parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm that can perform multiple
operations at the same time.

Many parallel algorithms are executed concurrently, though in general concurrent algorithms are a
distinct concept, and thus the two concepts are often conflated, with which aspect of an algorithm is
parallel and which is concurrent not being clearly distinguished. Further, non-parallel, non-concurrent
algorithms are often referred to as "sequential algorithms", by contrast with concurrent algorithms.

Algorithms vary significantly in how parallelizable they are, ranging from easily parallelizable to
completely unparallelizable.

Some problems are easy to divide into independent pieces and can be solved in parallel; such problems
are often called embarrassingly parallel.

Some problems cannot be split up into parallel portions, as they require the results from a preceding step
to effectively carry on with the next step – these are called inherently serial problems.

Motivation
Parallel algorithms on individual devices have become more common since the early 2000s because of
substantial improvements in multiprocessing systems and the rise of multi-core processors. Up until the
end of 2004, single-core processor performance rapidly increased via frequency scaling, and thus it was
easier to construct a computer with a single fast core than one with many slower cores with the same
throughput, so multicore systems were of more limited use. Since 2004 however, frequency scaling hit
a wall, and thus multicore systems have become more widespread, making parallel algorithms of more
general use.

Parallel Algorithm - Structure


To apply any algorithm properly, it is very important to select a suitable data structure, because a
particular operation performed on one data structure may take much more time than the same operation
performed on another.


Example − Accessing the ith element of a set stored in an array takes constant time, but with a linked
list the time required for the same operation grows linearly with i.
Therefore, the selection of a data structure must be done considering the architecture and the type of
operations to be performed.
The following data structures are commonly used in parallel programming −
1. Linked List
2. Arrays
3. Hypercube Network

Linked List
A linked list is a data structure having zero or more nodes connected by pointers. Nodes may or may
not occupy consecutive memory locations. Each node has two or three parts: one data part that stores
the data, and one or two link fields that store the address of the next or previous node. The first
node's address is stored in an external pointer called head. The last node, known as tail, generally does
not contain the address of any further node.
There are three types of linked lists −
1. Singly Linked List
2. Doubly Linked List
3. Circular Linked List
Singly Linked List
A node of a singly linked list contains data and the address of the next node. An external pointer
called head stores the address of the first node.

Doubly Linked List


A node of a doubly linked list contains data and the address of both the previous and the next node. An
external pointer called head stores the address of the first node and the external pointer
called tail stores the address of the last node.

Circular Linked List


A circular linked list is very similar to a singly linked list, except that the last node stores the
address of the first node.
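
As a minimal sketch, the C structs below show the node layouts just described; the type and field names
are illustrative, not a fixed convention.

/* Singly linked list node: data plus a pointer to the next node. */
struct snode {
    int data;
    struct snode *next;      /* NULL in the tail node */
};

/* Doubly linked list node: data plus pointers to both neighbours. */
struct dnode {
    int data;
    struct dnode *prev;
    struct dnode *next;
};

/* A circular singly linked list reuses struct snode; the only
 * difference is that the tail's next pointer is set back to the
 * head instead of NULL. */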

Muhammad Ehsan 2
[Parallel and Distributed Computing]

Arrays
An array is a data structure where we can store similar types of data. It can be one-dimensional or
multi-dimensional. Arrays can be created statically or dynamically.
• In statically declared arrays, dimension and size of the arrays are known at the time of
compilation.
• In dynamically declared arrays, dimension and size of the array are known at runtime.
In shared-memory programming, arrays can be used as a common memory; in data-parallel
programming, they can be used by partitioning them into sub-arrays.
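
A short C sketch of both declaration styles, and of handing a worker one partition of the data; the
variable names and partition bounds are illustrative only.

#include <stdlib.h>

int main(void) {
    /* Statically declared: dimension and size fixed at compile time. */
    int fixed[4][4];
    fixed[0][0] = 42;

    /* Dynamically declared: size chosen at runtime. */
    size_t n = 8;                        /* e.g. read from input */
    int *dyn = malloc(n * sizeof *dyn);
    if (dyn == NULL) return 1;

    /* In data-parallel code, a worker could be handed the sub-array
     * [lo, hi) of this array as its partition. */
    size_t lo = 0, hi = n / 2;           /* first half of the data */
    for (size_t i = lo; i < hi; i++)
        dyn[i] = (int)i;

    free(dyn);
    return 0;
}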

Hypercube Network
Hypercube architecture is helpful for those parallel algorithms where each task has to communicate
with other tasks. Hypercube topology can easily embed other topologies such as ring and mesh. It is
also known as n-cubes, where n is the number of dimensions. A hypercube can be constructed
recursively.
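
Because neighbouring labels differ in exactly one bit, a node's neighbours can be computed with XOR.
A minimal sketch in C (the function name is illustrative):

#include <stdio.h>

/* Print the n neighbours of a node in an n-dimensional hypercube.
 * Each node has an n-bit label; its neighbours are the labels that
 * differ from it in exactly one bit, obtained by XOR with 1 << b. */
void print_neighbours(unsigned id, unsigned n) {
    for (unsigned b = 0; b < n; b++)
        printf("node %u <-> node %u (dimension %u)\n",
               id, id ^ (1u << b), b);
}

int main(void) {
    print_neighbours(5, 3);   /* node 101 in a 3-cube: 100, 111, 001 */
    return 0;
}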

Parallel Algorithm - Design Techniques


Selecting a proper design technique for a parallel algorithm is the most difficult and important task.
Most parallel programming problems may have more than one solution. In this chapter, we will
discuss the following design techniques for parallel algorithms −
• Divide and conquer
• Greedy method
• Dynamic programming
• Backtracking
• Branch and bound
• Linear programming
Divide and Conquer Method
In the divide and conquer approach, the problem is divided into several small sub-problems. Then the
sub-problems are solved recursively and combined to get the solution of the original problem.
The divide and conquer approach involves the following steps at each level −
• Divide − The original problem is divided into sub-problems.
• Conquer − The sub-problems are solved recursively.
• Combine − The solutions of the sub-problems are combined together to get the solution of the
original problem.


The divide and conquer approach is applied in the following algorithms −
• Binary search
• Quick sort
• Merge sort
• Integer multiplication
• Matrix inversion
• Matrix multiplication
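
As a hedged sketch of the divide-conquer-combine steps, the C/OpenMP merge sort below sorts the two
halves as concurrent tasks and then combines them. It assumes a compiler with OpenMP support
(compile with -fopenmp) and must be called from inside an omp parallel/single region.

#include <string.h>

/* Combine: merge the sorted halves a[lo..mid) and a[mid..hi). */
static void merge(int *a, int *tmp, int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof *a);
}

/* Divide: split in half.  Conquer: sort the halves recursively, the
 * left half as a separate OpenMP task.  Combine: merge the results. */
void merge_sort(int *a, int *tmp, int lo, int hi) {
    if (hi - lo < 2) return;
    int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(a, tmp) if (hi - lo > 1024)
    merge_sort(a, tmp, lo, mid);
    merge_sort(a, tmp, mid, hi);
    #pragma omp taskwait                 /* wait for both halves */
    merge(a, tmp, lo, mid, hi);
}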

Greedy Method
A greedy algorithm builds an optimized solution by choosing, at each step, the option that looks best
at that moment. A greedy algorithm is very easy to apply, even to complex problems: at every step it
decides which choice is locally most promising for the next step.
The algorithm is called greedy because, once the optimal solution to the smaller instance is chosen, it
does not reconsider the program as a whole. Once a choice is made, the greedy algorithm never
reconsiders it.
A greedy algorithm works by building a solution from the smallest possible component parts. Recursion
is a procedure to solve a problem in which the solution to a specific problem depends on the solution
of a smaller instance of that problem.
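
A small illustration in C, assuming a canonical coin system such as {25, 10, 5, 1}, for which the
locally best choice (the largest coin that still fits) happens to give a globally optimal answer:

#include <stdio.h>

/* Greedy coin change: at every step take the largest coin that
 * still fits; the choice is never reconsidered. */
int greedy_change(int amount, const int *coins, int ncoins) {
    int used = 0;
    for (int i = 0; i < ncoins; i++)      /* coins sorted descending */
        while (amount >= coins[i]) {
            amount -= coins[i];
            used++;
        }
    return used;
}

int main(void) {
    int coins[] = {25, 10, 5, 1};
    printf("%d\n", greedy_change(67, coins, 4));  /* 25+25+10+5+1+1 = 6 coins */
    return 0;
}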

Dynamic Programming
Dynamic programming is an optimization technique that divides the problem into smaller sub-
problems and, after solving each sub-problem, combines their solutions to get the final solution.
Unlike the divide and conquer method, dynamic programming reuses the solutions to the sub-problems
many times.
A memoized algorithm for the Fibonacci series is a classic example of dynamic programming.
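
A minimal C sketch of the memoized version: each fib(i) is computed once, and its stored solution is
reused by later calls instead of being recomputed as in the plain exponential-time recursion.

#include <stdio.h>

long long memo[93];    /* fib(92) is the largest Fibonacci number
                          that fits in a signed 64-bit integer */

long long fib(int n) {
    if (n < 2) return n;
    if (memo[n] != 0) return memo[n];     /* reuse a stored solution */
    return memo[n] = fib(n - 1) + fib(n - 2);
}

int main(void) {
    printf("%lld\n", fib(50));   /* prints 12586269025 */
    return 0;
}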

Backtracking Algorithm
Backtracking is a technique for solving combinatorial problems, applied to both programmatic and
real-life problems. The eight queens problem, the Sudoku puzzle, and finding a way through a maze
are popular examples where a backtracking algorithm is used.
In backtracking, we extend a partial solution that satisfies all the required conditions so far. We then
move to the next level, and if that level does not produce a satisfactory solution, we return one level
back and start with a new option.
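
A compact C sketch of backtracking on the eight queens problem (helper names are illustrative): one
queen is placed per row, and a dead end causes a return to the previous row to try the next column.

#include <stdio.h>
#include <stdlib.h>

/* A column is acceptable if no earlier queen shares it or a diagonal. */
static int ok(const int *col, int row, int c) {
    for (int r = 0; r < row; r++)
        if (col[r] == c || abs(col[r] - c) == row - r)
            return 0;
    return 1;
}

static int solve(int *col, int row, int n) {
    if (row == n) return 1;               /* all queens placed */
    for (int c = 0; c < n; c++)
        if (ok(col, row, c)) {
            col[row] = c;                 /* try this option */
            if (solve(col, row + 1, n)) return 1;
        }                                 /* dead end: backtrack */
    return 0;
}

int main(void) {
    int col[8];
    if (solve(col, 0, 8))
        for (int r = 0; r < 8; r++)
            printf("row %d -> column %d\n", r, col[r]);
    return 0;
}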

Branch and Bound


A branch and bound algorithm is an optimization technique for obtaining an optimal solution to a
problem. It looks for the best solution in the entire solution space, but compares bounds on the
function being optimized with the value of the latest best solution found so far. This allows the
algorithm to discard parts of the solution space without exploring them completely.


The purpose of a branch and bound search is to maintain the lowest-cost path to a target; once a solution
is found, it can keep improving that solution. Branch and bound search is commonly implemented on
top of depth-bounded search and depth-first search.

Linear Programming
Linear programming describes a wide class of optimization jobs where both the optimization criterion
and the constraints are linear functions. It is a technique to get the best outcome, such as maximum
profit, shortest path, or lowest cost.
In this kind of programming, we have a set of variables, and we have to assign values to them so as to
satisfy a set of linear constraints and to maximize or minimize a given linear objective function.
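
As a small made-up instance for illustration, consider:

maximize    3x + 2y
subject to  x + y <= 4
            x     <= 2
            x, y  >= 0

The feasible region is a polygon with vertices (0,0), (2,0), (2,2) and (0,4). Evaluating the objective
at each vertex gives 0, 6, 10 and 8 respectively, so the optimum is 10 at x = 2, y = 2; when a linear
program has an optimum, it is always attained at a vertex of the feasible region.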

https://www.tutorialspoint.com/parallel_algorithm/parallel_algorithm_quick_guide.htm#:~:text=A%20parallel%20algorithm%20is%20an,to%20produce%20the%20final%20result.

Concurrency and Synchronization in Parallel Computing


What is Concurrency?
Concurrency is the tendency for things to happen at the same time in a system. Concurrency is a
natural phenomenon, of course: in the real world, at any given time, many things are happening
simultaneously. When we design software to monitor and control real-world systems, we must deal
with this natural concurrency.

Figure 1: Example of concurrency at work − parallel activities that do not interact have simple
concurrency issues. It is when parallel activities interact or share the same resources that concurrency
issues become important.
When dealing with concurrency issues in software systems, there are generally two aspects that are
important:
1. being able to detect and respond to external events occurring in a random order,
2. and ensuring that these events are responded to in some minimum required interval.

The challenges of designing concurrent systems arise mostly because of the interactions which happen
between concurrent activities. When concurrent activities interact, some sort of coordination is required.
This coordination is also called synchronization.


Concurrency means that an application is processing more than one task at the same time. It is an
approach used to decrease the response time of a system using a single processing unit. Concurrency
creates the illusion of parallelism: the chunks of a task are not actually processed in parallel, but
inside the application more than one task is in progress at a time. The application does not fully
finish one task before it begins the next.
Concurrency is achieved by interleaving the operations of processes on the central processing unit
(CPU), in other words by context switching. That is why it resembles parallel processing, and it
increases the amount of work finished in a given time.
In the accompanying figure, multiple tasks make progress at the same time; this illustrates
concurrency, the technique of dealing with many things at once.
Parallelism:
Parallelism is related to an application where tasks are divided into smaller sub-tasks that are processed
genuinely simultaneously, in parallel. It is used to increase the throughput and computational speed of
the system by using multiple processors or cores.

Parallelism allows the CPU and input-output tasks of one process to overlap with the CPU and
input-output tasks of another process running at the same time. In concurrency, by contrast, speed is
gained by overlapping the input-output activities of one process with the CPU activity of another
process.

In the accompanying figure, the tasks are divided into smaller sub-tasks that are processed
simultaneously; this illustrates parallelism, the technique that runs threads at the same time.

Basic synchronization primitives for concurrency control


• Mutual exclusion: locks/mutexes, semaphores, monitors, ...
• Conditions: flags, condition variables, signals, ...


Mutual Exclusion:
Mutual exclusion is a property of concurrency control, instituted for the purpose of preventing race
conditions. It is the requirement that one thread of execution never enters a critical section while a
concurrent thread of execution is already accessing it. A critical section is an interval of time during
which a thread of execution accesses a shared resource, such as shared data objects, shared devices,
or shared memory.

During concurrent execution of processes, processes need to enter the critical section (or the section of
the program shared across processes) at times for execution. It might so happen that because of the
execution of multiple processes at once, the values stored in the critical section become inconsistent. In
other words, the values depend on the sequence of execution of instructions – also known as a race
condition. The primary task of process synchronization is to get rid of race conditions while executing
the critical section.
This is primarily achieved through mutual exclusion.
Mutual exclusion is a property of process synchronization which states that “no two processes can
exist in the critical section at any given point of time”. The term was first coined by Dijkstra. Any
process synchronization technique being used must satisfy the property of mutual exclusion, without
which it would not be possible to get rid of a race condition.
To understand mutual exclusion, let’s take an example.

Example:
In the clothes section of a supermarket, two people are shopping for clothes.
Boy A decides upon some clothes to buy and heads to the changing room to try them out. While boy A
is inside the changing room, there is an 'occupied' sign on it, indicating that no one else can come in.
Girl B has to use the changing room too, so she has to wait till boy A is done using it.


Once boy A comes out of the changing room, the sign on it changes from 'occupied' to 'vacant',
indicating that another person can use it. Hence, girl B proceeds to use the changing room, while the
sign displays 'occupied' again.

The changing room is nothing but the critical section, boy A and girl B are two different processes,
while the sign outside the changing room indicates the process synchronization mechanism being used.
A lock or mutex (from mutual exclusion) is a synchronization primitive: a mechanism that enforces
limits on access to a resource when there are many threads of execution.
The first person to propose such a primitive was Edsger Dijkstra, who suggested a new data type called
a semaphore.
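
A minimal POSIX threads sketch of a mutex in C (compile with -pthread): without the mutex the two
threads' increments would race; with it, the final count is deterministic.

#include <pthread.h>
#include <stdio.h>

/* Two threads increment a shared counter; the mutex makes the
 * read-modify-write in the critical section atomic with respect
 * to the other thread, preventing a race condition. */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);     /* enter critical section */
        counter++;
        pthread_mutex_unlock(&lock);   /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld\n", counter);   /* always 2000000 with the mutex */
    return 0;
}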

What is Semaphore?
Semaphore is simply a variable that is non-negative and shared between threads. A semaphore is a
signaling mechanism, and a thread that is waiting on a semaphore can be signaled by another thread. It
uses two atomic operations, 1) Wait, and 2) Signal for the process synchronization.

Characteristic of Semaphore
Here are the characteristics of a semaphore:
• It is a mechanism that can be used to provide synchronization of tasks.
• It is a low-level synchronization mechanism.
• A semaphore always holds a non-negative integer value.
• A semaphore can be implemented using atomic test operations, with support from the operating
system.

Types of Semaphores
The two common kinds of semaphores are
• Counting semaphores
• Binary semaphores.


Counting Semaphores
This type of semaphore uses a count that allows a resource to be acquired or released numerous
times. If the initial count is 0, the counting semaphore is created in the unavailable state.

However, if the count is greater than 0, the semaphore is created in the available state, and the number
of tokens it has equals its count.

Binary Semaphores
Binary semaphores are quite similar to counting semaphores, but their value is restricted to 0 and
1. In this type of semaphore, the wait operation succeeds only when the semaphore is 1, and the signal
operation succeeds only when the semaphore is 0. Binary semaphores are easier to implement than
counting semaphores.

Example of Semaphore
The pseudocode below is a step-by-step illustration of declaring a semaphore and using it to guard a
critical section (CS):

shared var mutex: semaphore = 1;

Process i
begin
    ...
    P(mutex);
    execute CS;
    V(mutex);
    ...
end;
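
A runnable POSIX equivalent of that pseudocode, as a sketch (compile with -pthread): sem_wait plays
the role of P(mutex) and sem_post the role of V(mutex), with the semaphore initialised to 1.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t mutex;
int shared = 0;

void *process(void *arg) {
    (void)arg;
    sem_wait(&mutex);    /* P(mutex): block until the value is > 0 */
    shared++;            /* critical section */
    sem_post(&mutex);    /* V(mutex): release, waking one waiter */
    return NULL;
}

int main(void) {
    sem_init(&mutex, 0, 1);   /* shared between threads, value = 1 */
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, process, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%d\n", shared);   /* prints 4 */
    sem_destroy(&mutex);
    return 0;
}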
Wait and Signal Operations in Semaphores
Both of these operations are used to implement process synchronization; their goal is to achieve
mutual exclusion.


Wait Operation

This semaphore operation controls the entry of a task into the critical section. If the value of the
semaphore S is positive, it is decremented and the caller proceeds; if the value is zero or negative,
the caller is held up until the required condition is satisfied. It is also called the P(S) operation.
In the busy-waiting version below, the caller simply spins until S becomes positive:

P(S)
{
    while (S <= 0)
        ;       /* busy-wait until S becomes positive */
    S--;
}

Signal operation
This semaphore operation is used to control the exit of a task from a critical section. It increases the
value of the semaphore by 1 and is denoted V(S):

V(S)
{
    S++;        /* release: allow one waiting task to proceed */
}
Counting Semaphore vs. Binary Semaphore
Here are some major differences between counting and binary semaphores:

Counting Semaphore | Binary Semaphore
Does not by itself give mutual exclusion | Provides mutual exclusion
Can take any non-negative integer value | Value is only 0 or 1
More than one slot | Only one slot
Coordinates access for a set of processes | Acts as a mutual exclusion mechanism
Advantages of Semaphores
Here are the pros/benefits of using semaphores:
1. A counting semaphore can allow more than one thread to access a pool of resources, up to its
count.
2. Semaphores are machine-independent, as they are implemented in the machine-independent code
of the microkernel.
3. A binary semaphore does not allow multiple processes to enter the critical section at once.
4. Because waiting processes can block instead of busy-waiting, processor time and resources are
not wasted.
5. They allow flexible management of resources.


Disadvantages of semaphores
Here are the cons/drawbacks of semaphores:
1. One of the biggest limitations of a semaphore is priority inversion.
2. The operating system has to keep track of all calls to wait and signal.
3. Their use is never enforced; it is by convention only.
4. To avoid deadlocks, the wait and signal operations must be executed in the correct order.
5. Semaphore programming is complicated, so there is a risk of not achieving mutual exclusion.
6. It is not a practical method for large-scale use, as it leads to loss of modularity.
7. Semaphores are prone to programmer error, which may cause deadlock or violation of mutual
exclusion.

Difference between Semaphore and Mutex? (See the comparison table at the end of this lecture.)


Data and work partitioning (Decomposition)
Decomposition means dividing a big task into sub-tasks, which are then allocated to different
processors.
Broadly, there are four decomposition techniques in parallel computing:
1. Recursive Decomposition
2. Data Decomposition
3. Exploratory Decomposition
4. Speculative Decomposition
The recursive and data decomposition techniques are relatively general purpose, as they can be used
to decompose a wide variety of problems. The speculative and exploratory decomposition techniques,
on the other hand, are more special purpose in nature, because they apply to specific classes of
problems.

1. Recursive Decomposition
Recursive decomposition is a method for introducing concurrency into problems that can be solved
using the divide and conquer strategy. In this technique, a problem is solved by first dividing it into a
set of independent subproblems. Each of these subproblems is then solved by recursively applying a
similar division into smaller subproblems, followed by a combination of their results.

The divide and conquer strategy results in natural concurrency, as the different subproblems can be
solved concurrently. It is based on recursion, e.g. quicksort, sketched below.
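
A C/OpenMP sketch of this recursive decomposition (assuming -fopenmp and a surrounding omp
parallel/single region, as in the merge sort sketch earlier): after partitioning, the left subproblem
is sorted as an independent task while the current task sorts the right.

/* Quicksort with recursive decomposition: the two independent
 * subproblems produced by partitioning are sorted concurrently. */
static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

void quicksort(int *a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)          /* partition around pivot */
        if (a[j] < pivot) swap(&a[i++], &a[j]);
    swap(&a[i], &a[hi]);
    #pragma omp task shared(a) if (hi - lo > 1024)
    quicksort(a, lo, i - 1);               /* left half: its own task */
    quicksort(a, i + 1, hi);               /* right half: current task */
    #pragma omp taskwait
}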


2. Data Decomposition
Data decomposition is a powerful and commonly used method for deriving concurrency in algorithms
that operate on large data structures.
In this method, the decomposition of computations is done in two steps:
1. In the first step, the data on which the computations are performed is partitioned, and
2. In the second step, this data partitioning is used to induce a partitioning of the computations
into tasks.
The operations that these tasks perform on different data partitions are usually similar. The partitioning
of data can be performed in many possible ways; here we illustrate it with matrix multiplication.
To multiply one matrix by another, we take the dot product of rows and columns. For example, for
2 x 2 matrices with entries a1..a4 and b1..b4, the entries c1..c4 of the product are:

c1 = a1 * b1 + a2 * b3
c2 = a1 * b2 + a2 * b4
c3 = a3 * b1 + a4 * b3
c4 = a3 * b2 + a4 * b4
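
A C/OpenMP sketch of this data decomposition (assuming -fopenmp): the output matrix C is
partitioned by rows, and each thread computes its own rows independently.

#include <stdio.h>

#define N 2

int main(void) {
    double A[N][N] = {{1, 2}, {3, 4}};
    double B[N][N] = {{5, 6}, {7, 8}};
    double C[N][N];

    /* Data decomposition of C = A * B: one partition = one row of C,
     * so the tasks perform similar operations on different data. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }

    for (int i = 0; i < N; i++)
        printf("%g %g\n", C[i][0], C[i][1]);  /* 19 22 / 43 50 */
    return 0;
}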

3. Exploratory Decomposition
Exploratory decomposition is used to decompose problems whose underlying computations
correspond to a search for solutions in a search space.
In exploratory decomposition, we partition the search space into smaller parts and search each of
these parts concurrently until the desired solutions are found.

4. Speculative Decomposition
Speculative decomposition is used when a program may take one of many possible computationally
significant branches depending on the output of another computation.

In this situation, while one task performs the computation whose output will be used to decide the
next computation, other tasks can concurrently start the computations of the possible branches,
e.g. the branches of a switch case.


Parameters | Semaphore | Mutex
Mechanism | It is a type of signaling mechanism. | It is a locking mechanism.
Data Type | A semaphore is an integer variable. | A mutex is just an object.
Modification | The wait and signal operations can modify a semaphore. | It is modified only by the process that may request or release a resource.
Resource management | If no resource is free, the process requiring a resource executes a wait operation and waits until the semaphore count is greater than 0. | If the mutex is locked, the process has to wait; it is kept in a queue, and the resource is accessed only when the mutex is unlocked.
Thread | You can have multiple program threads. | You can have multiple program threads in a mutex, but not simultaneously.
Ownership | The value can be changed by any process releasing or obtaining the resource. | The object lock is released only by the process that obtained the lock on it.
Types | The types of semaphore are counting semaphore and binary semaphore. | A mutex has no subtypes.
Operation | The semaphore value is modified using the wait() and signal() operations. | A mutex object is locked or unlocked.
Resources occupancy | It is occupied if all resources are being used; the process requesting a resource performs wait() and blocks itself until the semaphore count becomes greater than 0. | If the object is already locked, the process requesting the resource waits and is queued by the system until the lock is released.
