Lecture # 04
Parallel Algorithm:
A parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm that can perform multiple operations at the same time.
Algorithms vary significantly in how parallelizable they are, ranging from easily parallelizable to
completely unparallelizable.
Some problems are easy to divide into pieces, so they can be solved as parallel problems.
Other problems cannot be split into parallel portions, because each step requires the results of the preceding step to carry on with the next step; these are called inherently serial problems.
Motivation
Parallel algorithms on individual devices have become more common since the early 2000s because of
substantial improvements in multiprocessing systems and the rise of multi-core processors. Up until the
end of 2004, single-core processor performance rapidly increased via frequency scaling, and thus it was
easier to construct a computer with a single fast core than one with many slower cores with the same
throughput, so multicore systems were of more limited use. Since 2004 however, frequency scaling hit
a wall, and thus multicore systems have become more widespread, making parallel algorithms of more
general use.
Data Structures for Parallel Algorithms
Example − To access the ith element in a set by using an array, it takes constant time, but by using a linked list, the same operation takes linear time, since the list must be walked from its head.
Therefore, the selection of a data structure must be done considering the architecture and the type of operations to be performed.
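A rough Python sketch of the contrast (the tuple chain here is an ad-hoc stand-in for linked-list nodes, purely for illustration):

def nth_from_chain(head, i):
    """Walk i links in a chain of (data, next) pairs: O(n)."""
    node = head
    for _ in range(i):
        node = node[1]                 # follow the 'next' link
    return node[0]

arr = [10, 20, 30, 40]                 # contiguous storage
chain = (10, (20, (30, (40, None))))   # linked storage

print(arr[2])                    # O(1): direct index arithmetic
print(nth_from_chain(chain, 2))  # O(n): pointer chasing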
The following data structures are commonly used in parallel programming −
1. Linked List
2. Arrays
3. Hypercube Network
Linked List
A linked list is a data structure having zero or more nodes connected by pointers. Nodes may or may not occupy consecutive memory locations. Each node has two or three parts: one data part that stores the data, and one or two link fields that store the address of the next and/or previous node. The first node's address is stored in an external pointer called head. The last node, known as the tail, generally does not contain any address (its link is null).
There are three types of linked lists −
1. Singly Linked List
2. Doubly Linked List
3. Circular Linked List
Singly Linked List
A node of a singly linked list contains data and the address of the next node. An external pointer
called head stores the address of the first node.
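A minimal Python sketch of a singly linked list node and a traversal starting from head:

class Node:
    def __init__(self, data, next=None):
        self.data = data   # data part
        self.next = next   # link field: address of the next node

head = Node(1, Node(2, Node(3)))   # 1 -> 2 -> 3, the tail's link is None

node = head
while node is not None:            # traverse from head to tail
    print(node.data)
    node = node.next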
Arrays
An array is a data structure in which we can store data items of the same type. It can be one-dimensional or
multi-dimensional. Arrays can be created statically or dynamically.
• In statically declared arrays, dimension and size of the arrays are known at the time of
compilation.
• In dynamically declared arrays, dimension and size of the array are known at runtime.
For shared memory programming, arrays can be used as a common memory and for data parallel
programming, they can be used by partitioning into sub-arrays.
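A hedged Python sketch of the data-parallel use: the array is partitioned into sub-arrays and each worker processes one partition (the chunk count and the doubling operation are illustrative assumptions):

from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    return [x * 2 for x in chunk]          # per-partition work

def split(data, parts):
    """Split a list into `parts` sub-arrays of near-equal size."""
    k, m = divmod(len(data), parts)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(parts)]

if __name__ == "__main__":
    data = list(range(16))
    chunks = split(data, 4)                # 4 sub-arrays
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = pool.map(process_chunk, chunks)
    print([x for part in results for x in part])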
Hypercube Network
Hypercube architecture is helpful for those parallel algorithms where each task has to communicate
with other tasks. Hypercube topology can easily embed other topologies such as ring and mesh. A hypercube is
also known as an n-cube, where n is the number of dimensions. A hypercube can be constructed
recursively: an (n+1)-cube is formed by joining the corresponding nodes of two n-cubes.
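A small Python sketch of this structure: node labels are n-bit numbers, and two nodes are connected exactly when their labels differ in a single bit, so a node's neighbours can be listed by flipping each bit with XOR:

def neighbours(node, n):
    """All nodes adjacent to `node` in an n-dimensional hypercube."""
    return [node ^ (1 << d) for d in range(n)]

n = 3                                  # a 3-cube has 2**3 = 8 nodes
for node in range(2 ** n):
    print(f"{node:03b} ->", [f"{m:03b}" for m in neighbours(node, n)])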
Greedy Method
In a greedy algorithm for an optimization problem, the locally best choice is made at each step. A greedy
algorithm is very easy to apply to complex problems: at every step, it picks the option that looks best for the next step.

The algorithm is called greedy because, once the optimal solution to the smaller instance is chosen, it
does not consider the program as a whole. Once a choice is made, the greedy algorithm never reconsiders it.

A greedy algorithm works by recursively building a solution from the smallest possible component
parts. Recursion is a procedure to solve a problem in which the solution to a specific problem is
dependent on the solution of a smaller instance of that problem.
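A minimal Python sketch: making change with the fewest coins by always taking the largest denomination that still fits. With the denominations below the greedy choice happens to be optimal; for arbitrary denominations it need not be:

def greedy_change(amount, coins=(25, 10, 5, 1)):
    result = []
    for coin in sorted(coins, reverse=True):   # best local choice first
        while amount >= coin:                  # commit, never reconsider
            result.append(coin)
            amount -= coin
    return result

print(greedy_change(63))   # [25, 25, 10, 1, 1, 1]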
Dynamic Programming
Dynamic programming is an optimization technique which divides the problem into smaller sub-problems and, after solving each sub-problem, combines their solutions to obtain the ultimate solution. Unlike the divide and conquer method, dynamic programming reuses the solutions to sub-problems many times.

A memoized recursive algorithm for the Fibonacci series is a classic example of dynamic programming.
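A sketch in Python: the recursive structure is kept, but each subproblem's solution is cached and reused instead of being recomputed:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)   # reuses cached subproblem results

print(fib(40))   # 102334155, computed with only 41 distinct calls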
Backtracking Algorithm
Backtracking is an optimization technique for solving combinatorial problems. It is applied to both
programmatic and real-life problems. The eight queens problem, Sudoku puzzles, and going through a maze
are popular examples where a backtracking algorithm is used.

In backtracking, we start with a partial solution that satisfies all the required conditions. Then we
move to the next level, and if that level does not produce a satisfactory solution, we return one level
back and start with a new option.
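A minimal backtracking sketch in Python for the eight queens problem: one queen is placed per row, and a row with no safe column forces a return one level back:

def solve(n, cols=()):
    row = len(cols)
    if row == n:                       # all queens placed: a solution
        return cols
    for col in range(n):
        # safe if no earlier queen shares a column or a diagonal
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(cols)):
            result = solve(n, cols + (col,))   # go one level deeper
            if result is not None:
                return result
    return None                        # dead end: backtrack

print(solve(8))   # (0, 4, 7, 5, 2, 6, 1, 3), one column per row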
Branch and Bound
The purpose of a branch and bound search is to maintain the lowest-cost path to a target. Once a solution
is found, it can keep improving that solution, pruning any branch whose cost already exceeds the best found so far. Branch and bound search is commonly implemented on top of depth-bounded
search and depth-first search.
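A hedged Python sketch of branch and bound on a tiny cost graph (the graph itself is an illustrative assumption): partial paths are branched on, and any branch whose cost already reaches the best known solution is pruned:

import heapq

graph = {"A": {"B": 2, "C": 5}, "B": {"C": 1, "D": 4}, "C": {"D": 1}, "D": {}}

def branch_and_bound(start, goal):
    best_cost, best_path = float("inf"), None
    frontier = [(0, start, [start])]           # (cost so far, node, path)
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if cost >= best_cost:
            continue                           # bound: prune this branch
        if node == goal:
            best_cost, best_path = cost, path  # improved solution, keep going
            continue
        for nxt, w in graph[node].items():     # branch on successors
            heapq.heappush(frontier, (cost + w, nxt, path + [nxt]))
    return best_cost, best_path

print(branch_and_bound("A", "D"))   # (4, ['A', 'B', 'C', 'D'])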
Linear Programming
Linear programming describes a wide class of optimization jobs in which both the optimization criterion
and the constraints are linear functions. It is a technique to get the best outcome, like maximum profit,
shortest path, or lowest cost.

In this kind of programming, we have a set of variables and we have to assign values to them so as to satisfy
a set of linear constraints and to maximize or minimize a given linear objective function.
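A minimal sketch, assuming SciPy is available: maximize the illustrative profit 3x + 2y under two made-up linear constraints (linprog minimizes, so the objective is negated):

from scipy.optimize import linprog

c = [-3, -2]              # maximize 3x + 2y  ==  minimize -3x - 2y
A_ub = [[1, 1],           # x + y  <= 4
        [1, 3]]           # x + 3y <= 6
b_ub = [4, 6]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)    # optimal (x, y) and the maximum profit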
Reference: https://www.tutorialspoint.com/parallel_algorithm/parallel_algorithm_quick_guide.htm
The challenges of designing concurrent systems arise mostly because of the interactions between concurrent activities. When concurrent activities interact, some sort of coordination is required. This coordination is also called synchronization.
Concurrency:
Concurrency relates to an application that is processing more than one task at the same time. It is an approach used for decreasing the response time of the system while using a single processing unit. Concurrency creates an illusion of parallelism: the chunks of a task are not actually processed in parallel, but inside the application more than one task is being processed at a time. The application does not fully end one task before it begins the next.

Concurrency is achieved by interleaving the operation of processes on the central processing unit (CPU), in other words by context switching. That is why it looks like parallel processing. It increases the amount of work finished at a time.

The accompanying figure shows multiple tasks making progress at the same time; it illustrates concurrency, the technique that deals with a lot of things at a time.
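A minimal Python sketch of concurrency on one processing unit, using asyncio (task names and step counts are illustrative): the two tasks interleave rather than run in parallel.

import asyncio

async def task(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        await asyncio.sleep(0)   # yield: the CPU switches to the other task

async def main():
    # Neither task fully ends before the other begins; their steps interleave.
    await asyncio.gather(task("A", 3), task("B", 3))

asyncio.run(main())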
Parallelism:
Parallelism is related to an application where tasks are divided into smaller sub-tasks that are processed
simultaneously, that is, in parallel. It is used to increase the throughput and computational speed of
the system by using multiple processors. Unlike concurrency on a single CPU, true parallelism requires hardware with multiple processing units.

The figure referred to here shows the tasks divided into smaller sub-tasks that are processed
simultaneously; it illustrates parallelism, the technique that runs threads simultaneously.
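A hedged sketch of parallelism in Python using a process pool, so the sub-tasks can run on separate cores (the work function is an illustrative placeholder):

from concurrent.futures import ProcessPoolExecutor

def subtask(n):
    return sum(i * i for i in range(n))     # CPU-bound piece of the job

if __name__ == "__main__":
    pieces = [10**6] * 4                    # the task divided into 4 sub-tasks
    with ProcessPoolExecutor() as pool:     # one process per core by default
        results = list(pool.map(subtask, pieces))
    print(sum(results))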
Mutual Exclusion:
Mutual exclusion is a property of concurrency control, instituted for the purpose of
preventing race conditions. It is the requirement that one thread of execution never enters a critical
section while a concurrent thread of execution is already accessing it. A critical section is an
interval of time during which a thread of execution accesses a shared resource, such as a shared data
object, a shared device, or shared memory.
During concurrent execution of processes, processes need to enter the critical section (or the section of
the program shared across processes) at times for execution. It might so happen that because of the
execution of multiple processes at once, the values stored in the critical section become inconsistent. In
other words, the values depend on the sequence of execution of instructions – also known as a race
condition. The primary task of process synchronization is to get rid of race conditions while executing
the critical section.
This is primarily achieved through mutual exclusion.
Mutual exclusion is a property of process synchronization which states that “no two processes can
exist in the critical section at any given point of time”. The term was first coined by Dijkstra. Any
process synchronization technique being used must satisfy the property of mutual exclusion, without
which it would not be possible to get rid of a race condition.
To understand mutual exclusion, let’s take an example.
Example:
In the clothes section of a supermarket, two people are shopping for clothes. Boy A decides upon some clothes to buy and heads to the changing room to try them out. Now, while boy A is inside the changing room, there is an 'occupied' sign on it, indicating that no one else can come in. Girl B has to use the changing room too, so she has to wait till boy A is done using the changing room.
The changing room is nothing but the critical section, boy A and girl B are two different processes,
and the sign outside the changing room indicates the process synchronization mechanism being used.

A lock or mutex (mutual exclusion) is a synchronization primitive: a mechanism that enforces limits
on accessing a resource when there are many threads of execution.

The first person to propose such a primitive was Edsger Dijkstra, who suggested a new data type called a
semaphore.
What is a Semaphore?
A semaphore is simply a non-negative variable shared between threads. A semaphore is a
signaling mechanism: a thread that is waiting on a semaphore can be signaled by another thread. It
uses two atomic operations for process synchronization: 1) Wait, and 2) Signal.
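A sketch of the signaling behaviour in Python, where threading.Semaphore names Wait/Signal acquire/release:

import threading

ready = threading.Semaphore(0)    # non-negative shared counter, starts at 0

def waiter():
    print("waiter: waiting for the signal")
    ready.acquire()               # Wait (P): blocks while the count is 0
    print("waiter: got the signal, continuing")

def signaler():
    print("signaler: doing some work, then signalling")
    ready.release()               # Signal (V): increments the count

t = threading.Thread(target=waiter)
t.start()
signaler()
t.join()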
Characteristics of a Semaphore
Here are the characteristics of a semaphore:
• It is a mechanism that can be used to provide synchronization of tasks.
• It is a low-level synchronization mechanism.
• A semaphore always holds a non-negative integer value.
• A semaphore can be implemented using atomic test operations and interrupts, or with operating system support.
Types of Semaphores
The two common kinds of semaphores are
• Counting semaphores
• Binary semaphores.
Counting Semaphores
This type of semaphore uses a count that allows a resource to be acquired or released numerous times. If the initial count = 0, the counting semaphore is created in the unavailable state.

However, if the count is > 0, the semaphore is created in the available state, and the number of tokens
it has equals its count.
Binary Semaphores
Binary semaphores are quite similar to counting semaphores, but their value is restricted to 0 and
1. In this type of semaphore, the wait operation succeeds only if the semaphore = 1, and the signal operation
succeeds only when the semaphore = 0. Binary semaphores are easier to implement than counting semaphores.
Example of Semaphore
The program below is a step-by-step sketch, showing the declaration and usage of a semaphore as a mutex around a critical section (CS):

shared var mutex: semaphore = 1;

Process i
begin
    ...
    P(mutex);      // wait: acquire the semaphore
    execute CS;    // critical section
    V(mutex);      // signal: release the semaphore
    ...
end;
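The same pattern as runnable Python, a sketch using threading.Semaphore initialised to 1 as the mutex (the counter workload is an illustrative stand-in for the critical section):

import threading

mutex = threading.Semaphore(1)    # shared var mutex: semaphore = 1
counter = 0

def process(i):
    global counter
    for _ in range(100_000):
        mutex.acquire()           # P(mutex): wait
        counter += 1              # execute CS
        mutex.release()           # V(mutex): signal

threads = [threading.Thread(target=process, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                    # always 400000: no race condition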
Wait and Signal Operations in Semaphores
Both of these operations are used to implement process synchronization. The goal of these semaphore
operations is to achieve mutual exclusion.
Wait operation
This type of semaphore operation is used to control the entry of a task into a critical section. If the
value of the argument S is positive, it is decremented; otherwise the task waits. It is denoted as P(S).

P(S)
{
    while (S <= 0)
        ;          // wait until the semaphore becomes positive
    S--;
}

Signal operation
This type of semaphore operation is used to control the exit of a task from a critical section. It
increases the value of the argument by 1 and is denoted as V(S).

V(S)
{
    S++;
}
Counting Semaphore vs. Binary Semaphore
Here are some major differences between counting and binary semaphores:

Counting Semaphore                         Binary Semaphore
------------------                         ----------------
No mutual exclusion                        Mutual exclusion
Any non-negative integer value             Value only 0 and 1
More than one slot                         Only one slot
Provides a set of processes                Provides a mutual exclusion mechanism
Advantages of Semaphores
Here are the pros/benefits of using semaphores:
1. A counting semaphore can allow more than one thread to access the critical section (one per available resource instance).
2. Semaphores are machine-independent, as they are implemented in the machine-independent code of the microkernel.
3. Used as a binary semaphore (mutex), they do not allow multiple processes to enter the critical section.
4. Since the wait operation can block rather than busy-wait, there is no wastage of process time and resources.
5. They allow flexible management of resources.
Disadvantages of Semaphores
Here are the cons/drawbacks of semaphores:
1. One of the biggest limitations of a semaphore is priority inversion.
2. The operating system has to keep track of all calls to wait and signal on a semaphore.
3. Their use is never enforced; it is by convention only.
4. In order to avoid deadlocks, the wait and signal operations are required to be executed in the correct order.
5. Semaphore programming is complicated, so there is a chance of not achieving mutual exclusion.
6. It is also not a practical method for large-scale use, as it leads to loss of modularity.
7. Semaphores are prone to programmer error, which may cause deadlock or violation of mutual exclusion.
Decomposition Techniques
1. Recursive Decomposition
Recursive decomposition is a method for introducing concurrency into problems that can be solved using
the divide and conquer strategy. In this technique, a problem is solved by first dividing it into a set of
independent subproblems. Each of these subproblems is solved by recursively applying a similar
division into smaller subproblems, followed by a combination of their results.
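A hedged Python sketch of the structure (not a performance claim: CPython threads do not speed up CPU-bound work because of the GIL; a process pool would be used in practice). Each split yields two independent subproblems, one handed to the pool while the other is solved recursively:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=32)   # enough workers for the recursion

def parallel_sum(data):
    if len(data) <= 1_000:                  # base case: solve directly
        return sum(data)
    mid = len(data) // 2
    left = pool.submit(parallel_sum, data[:mid])   # independent subproblem
    right = parallel_sum(data[mid:])               # solved meanwhile
    return left.result() + right                   # combine the results

print(parallel_sum(list(range(10_000))))    # 49995000
pool.shutdown()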
2. Data Decomposition
Data decomposition is a powerful and commonly used method for deriving concurrency in algorithms
that operate on large data structures.

In this method, the decomposition of computations is done in two steps:
1. In the first step, the data on which the computations are performed is partitioned, and
2. In the second step, this data partitioning is used to induce a partitioning of the computations
into tasks.

The operations that these tasks perform on different data partitions are usually similar. The partitioning of data can be performed in many possible ways; here we illustrate it with matrix
multiplication.
To multiply one matrix by another, we take the dot product of the rows of the first matrix with the
columns of the second. For example, for 2 x 2 matrices

A = | a1  a2 |        B = | b1  b2 |
    | a3  a4 |            | b3  b4 |

the entries of the product C = A * B are:

c1 = a1 * b1 + a2 * b3        (1st row * 1st column)
c2 = a1 * b2 + a2 * b4        (1st row * 2nd column)
c3 = a3 * b1 + a4 * b3        (2nd row * 1st column)
c4 = a3 * b2 + a4 * b4        (2nd row * 2nd column)

Each ci depends on only one row of A and one column of B, so the four dot products induce four independent tasks.
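A minimal Python sketch of this induced task partitioning (the thread pool and the 2 x 2 values are illustrative assumptions):

from concurrent.futures import ThreadPoolExecutor

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

def dot(row, col):
    # one task: the dot product of a row of A with a column of B
    return sum(a * b for a, b in zip(row, col))

# partition the output: entry (i, j) needs row i of A and column j of B
tasks = {(i, j): (A[i], [B[0][j], B[1][j]]) for i in range(2) for j in range(2)}

C = [[0, 0], [0, 0]]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(dot, row, col): pos for pos, (row, col) in tasks.items()}
    for fut, (i, j) in futures.items():
        C[i][j] = fut.result()

print(C)   # [[19, 22], [43, 50]]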
3. Exploratory Decomposition
Exploratory decomposition is used to decompose problems whose underlying computation corresponds
to a search of a space of solutions.

In exploratory decomposition, we partition the search space into smaller parts and search each of
these parts concurrently, until the desired solutions are found.
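A minimal sketch in Python (the numeric 'solution condition' and the partition bounds are illustrative assumptions):

from concurrent.futures import ProcessPoolExecutor

def search_part(bounds):
    """Search one partition of the space; None if no solution is here."""
    lo, hi = bounds
    for x in range(lo, hi):
        if x * x == 10609:            # the desired solution condition
            return x
    return None

if __name__ == "__main__":
    parts = [(0, 50), (50, 100), (100, 150), (150, 200)]  # partitioned space
    with ProcessPoolExecutor(max_workers=4) as pool:
        for found in pool.map(search_part, parts):
            if found is not None:
                print("solution:", found)   # 103, found in the third part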
4. Speculative Decomposition
Speculative decomposition is used when a program may take one of many possible computationally
significant branches depending on the output of other computations.
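A hedged sketch: while the branch-selecting computation runs, both candidate branches are evaluated speculatively, and the result of the branch that was not chosen is discarded (all functions here are illustrative):

from concurrent.futures import ThreadPoolExecutor
import time

def slow_condition():
    time.sleep(0.1)                 # the computation whose output picks a branch
    return True

def branch_a():
    return "result of branch A"

def branch_b():
    return "result of branch B"

with ThreadPoolExecutor() as pool:
    fut_a = pool.submit(branch_a)   # start both branches speculatively
    fut_b = pool.submit(branch_b)
    chosen = fut_a if slow_condition() else fut_b
    print(chosen.result())          # the other branch's work is discarded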