
UNIT-3

Concurrent and Parallel Programming: Parallel algorithms – sorting, ranking, searching, traversals, prefix sum, etc.

SOME SIMPLE COMPUTATIONS:

In this section, five fundamental building-block computations are defined:

1. Semigroup (reduction, fan-in) computation
2. Parallel prefix computation
3. Packet routing
4. Broadcasting, and its more general version, multicasting
5. Sorting records in ascending/descending order of their keys

1. Semigroup Computation. Let ⊕ be an associative binary operator; i.e., (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z) for all x, y, z ∈ S. A semigroup is simply a pair (S, ⊕), where S is a set of elements on which ⊕ is defined. Semigroup (also known as reduction or fan-in) computation is defined as: Given a list of n values x0, x1, ..., xn–1, compute x0 ⊕ x1 ⊕ ... ⊕ xn–1. The operator ⊕ may or may not be commutative, i.e., it may or may not satisfy x ⊕ y = y ⊕ x (common examples such as addition and maximum are commutative, but the carry computation, e.g., is not). This last point is important; while the parallel algorithm can compute chunks of the expression using any partitioning scheme, the chunks must eventually be combined in left-to-right order. Figure 3.1 depicts a semigroup computation on a uniprocessor.
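The semigroup (reduction) computation above can be sketched in a few lines of Python; the function name is illustrative, and the left-to-right combination order matters precisely because ⊕ need not be commutative:

```python
from functools import reduce

def semigroup(op, xs):
    """Combine x0 (+) x1 (+) ... (+) x(n-1) strictly left to right,
    which is safe even when the operator is not commutative."""
    return reduce(op, xs)

# Maximum finding is one instance of a semigroup computation.
print(semigroup(max, [5, 17, 2, 9]))                 # 17
print(semigroup(lambda a, b: a + b, [5, 17, 2, 9]))  # 33
```

String concatenation, which is associative but not commutative, also fits this definition, illustrating why chunks must be recombined in order.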

2. Parallel Prefix Computation. With the same assumptions as in the preceding paragraph, a parallel prefix computation is defined as simultaneously evaluating all of the prefixes of the expression x0 ⊕ x1 ⊕ ... ⊕ xn–1; i.e., x0, x0 ⊕ x1, x0 ⊕ x1 ⊕ x2, ..., x0 ⊕ x1 ⊕ ... ⊕ xn–1. Note that the ith prefix expression is si = x0 ⊕ x1 ⊕ ... ⊕ xi.

The graph representing the prefix computation on a uniprocessor is similar to Fig. 3.1, but with the intermediate values also output.
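A sequential sketch of the prefix computation, outputting every intermediate value si (function name illustrative):

```python
def prefix_scan(op, xs):
    """Return all prefixes s_i = x0 (+) x1 (+) ... (+) xi of the list xs."""
    out, acc = [], None
    for x in xs:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out

print(prefix_scan(lambda a, b: a + b, [5, 2, 8, 6]))  # [5, 7, 15, 21]
```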

3. Packet Routing. A packet of information resides at Processor i and must be sent to Processor j. The problem is to route the packet through intermediate processors, if needed,

Fig 3.1: Semigroup computation on a uniprocessor.

such that it gets to the destination as quickly as possible. The problem becomes more challenging when multiple packets reside at different processors, each with its own destination. In this case, the packet routes may interfere with one another as they go through common intermediate processors. When each processor has at most one packet to send and one packet to receive, the packet routing problem is called one-to-one communication or 1–1 routing.

4. Broadcasting: Given a value a known at a certain processor i, disseminate it to all p processors as quickly as possible, so that at the end, every processor has access to, or “knows,” the value. This is sometimes referred to as one-to-all communication. The more general case of this operation, i.e., one-to-many communication, is known as multicasting. From a programming viewpoint, we make the assignments xj := a for 1 ≤ j ≤ p (broadcasting) or for j ∈ G (multicasting), where G is the multicast group and xj is a local variable in processor j.

5. Sorting: Rather than sorting a set of records, each with a key and data elements, we focus on sorting a set of keys for simplicity. Our sorting problem is thus defined as: Given a list of n keys x0, x1, ..., xn–1, and a total order ≤ on key values, rearrange the n keys as x_i0, x_i1, ..., x_i(n–1), such that x_i0 ≤ x_i1 ≤ ... ≤ x_i(n–1). We consider only sorting the keys in nondescending order.

SOME SIMPLE ARCHITECTURES
In this section, we define four simple parallel architectures:

1. Linear array of processors


2. Binary tree of processors
3. Two-dimensional mesh of processors
4. Multiple processors with shared variables

Linear Array: Figure 3.2 shows a linear array of nine processors, numbered 0 to 8. The diameter of a p-processor linear array, defined as the longest of the shortest distances between pairs of processors, is D = p – 1. The (maximum) node degree, defined as the largest number of links or communication channels associated with a processor, is d = 2. The ring variant, also shown in Fig. 3.2, has the same node degree of 2 but a smaller diameter of D = ⌊p/2⌋.

Fig 3.2: A linear array of nine processors and its ring variant.
Binary Tree: Figure 3.3 shows a binary tree of nine processors. This binary tree is balanced in that the leaf levels differ by at most 1. If all leaf levels are identical and every nonleaf processor has two children, the binary tree is said to be complete. The diameter

Fig 3.3: A balanced (but incomplete) binary tree of nine processors.

of a p-processor complete binary tree is 2 log2(p + 1) – 2. More generally, the diameter of a p-processor balanced binary tree architecture is 2⌊log2 p⌋ or 2⌊log2 p⌋ – 1, depending on the placement of leaf nodes at the last level. Unlike the linear array, several different p-processor binary tree architectures may exist. This is usually not a problem, as we almost always deal with complete binary trees. The (maximum) node degree in a binary tree is d = 3.

2D Mesh: Figure 3.4 shows a square 2D mesh of nine processors. The diameter of a p-processor square mesh is D = 2√p – 2. More generally, the mesh does not have to be square. The diameter of a p-processor r × (p/r) mesh is D = r + p/r – 2. Again, multiple 2D meshes may exist for the same number p of processors, e.g., 2 × 8 or 4 × 4 for p = 16. Square meshes are usually preferred because they minimize the diameter. The torus variant, also shown in Fig. 3.4, has end-around or wraparound links for rows and columns. The node degree for both meshes and tori is d = 4, but a p-processor r × (p/r) torus has a smaller diameter of D = ⌊r/2⌋ + ⌊p/(2r)⌋.

Shared Memory: A shared-memory multiprocessor can be modeled as a complete graph, in which every node is connected to every other node, as shown in Fig. 3.5 for p = 9.

In the 2D mesh of Fig. 3.4, Processor 0 can send/receive data directly to/from P1 and P3. However, it has to go through an intermediary to send/receive data to/from P4, say. In a shared-memory multiprocessor, every piece of data is directly accessible to every processor (we assume that each processor can simultaneously send/receive data over all of its p – 1 links). The diameter D = 1 of a complete graph is an indicator of this direct access. The node

Fig 3.4: A 2D mesh of nine processors and its torus variant.

Fig 3.5: A shared-variable architecture modeled as a complete graph.

degree d = p – 1, on the other hand, indicates that such an architecture would be quite costly to implement if no restriction is placed on data accesses.

ALGORITHMS FOR A LINEAR ARRAY


Semigroup Computation: Let us consider first a special case of semigroup computation, namely, that of maximum finding. Each of the p processors holds a value initially, and our goal is for every processor to know the largest of these values. A local variable, max-thus-far, can be initialized to the processor’s own data value. In each step, a processor sends its max-thus-far value to its two neighbors. Each processor, on receiving values from its left and right neighbors, sets its max-thus-far value to the largest of the three values, i.e., max(left, own, right). Figure 3.6 depicts the execution of this algorithm for p = 9 processors. The dotted lines in Fig. 3.6 show how the maximum value propagates from P6 to all other processors. Had there been two maximum values, say in P2 and P6, the propagation would have been faster. In the worst case, p – 1 communication steps (each involving sending a processor’s value to both neighbors), and the same number of three-way comparison steps, are needed. This is the best one can hope for, given that the diameter of a p-processor linear array is D = p – 1 (diameter-based lower bound).
Fig 3.6: Maximum-finding on a linear array of nine processors
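The synchronous max-finding algorithm can be simulated sequentially; one list snapshot per step stands in for the processors' max-thus-far registers (function name illustrative):

```python
def linear_array_max(vals):
    """Simulate max-finding on a linear array: in each synchronous step,
    every processor replaces its max-thus-far value with
    max(left, own, right). After p - 1 steps, every processor holds
    the global maximum."""
    p = len(vals)
    cur = list(vals)
    for _ in range(p - 1):
        nxt = []
        for i in range(p):
            left = cur[i - 1] if i > 0 else cur[i]
            right = cur[i + 1] if i < p - 1 else cur[i]
            nxt.append(max(left, cur[i], right))
        cur = nxt
    return cur

# All nine processors end up holding the maximum value, 9.
print(linear_array_max([3, 1, 4, 1, 5, 9, 2, 6, 5]))
```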
For a general semigroup computation, the processor at the left end of the array (the one with no left neighbor) becomes active and sends its data value to the right (initially, all processors are dormant or inactive). On receiving a value from its left neighbor, a processor becomes active, applies the semigroup operation ⊕ to the value received from the left and its own data value, sends the result to the right, and becomes inactive again. This wave of activity propagates to the right until the rightmost processor obtains the desired result. The computation result is then propagated leftward to all processors. In all, 2p – 2 communication steps are needed.
Parallel Prefix Computation. Let us assume that we want the ith prefix result to be obtained at the ith processor, 0 ≤ i ≤ p – 1. The general semigroup algorithm described in the preceding paragraph in fact performs a parallel prefix computation first and then does a broadcast of the final value to all processors. Thus, we already have an algorithm for parallel prefix computation that takes p – 1 communication/combining steps. A variant of the parallel prefix computation, in which Processor i ends up with the prefix result up to the (i – 1)th value, is sometimes useful. This diminished prefix computation can be performed just as easily if each processor holds onto the value received from the left rather than the one it sends to the right. The diminished prefix sum results for the example of Fig. 3.7 would be 0, 5, 7, 15, 21, 24, 31, 40, 41.
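A sequential sketch of the diminished prefix computation follows. The data values used are those implied by the results quoted above for Fig. 3.7 (the last processor's own value does not affect any diminished result, so 4 is an arbitrary placeholder):

```python
def diminished_prefix_sums(xs):
    """Processor i ends up with x0 + ... + x(i-1); the first gets 0."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out

# Values consistent with the diminished prefix sums quoted for Fig. 3.7.
data = [5, 2, 8, 6, 3, 7, 9, 1, 4]
print(diminished_prefix_sums(data))  # [0, 5, 7, 15, 21, 24, 31, 40, 41]
```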

Thus far, we have assumed that each processor holds a single data item. Extension of the semigroup and parallel prefix algorithms to the case where each processor initially holds several data items is straightforward. Figure 3.8 shows a parallel prefix sum computation with each processor initially holding two data items. The algorithm consists of each processor doing a prefix computation on its own data set of size n/p (this takes n/p – 1 combining steps), then doing a diminished parallel prefix computation on the linear array as above (p – 1 communication/combining steps), and finally combining the local prefix result from this last computation with the locally computed prefixes (n/p combining steps). In all, 2n/p + p – 2 combining steps and p – 1 communication steps are required.
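The three-phase algorithm just described can be sketched sequentially, with each sublist standing in for one processor's block of n/p items (names illustrative):

```python
def blockwise_prefix_sums(blocks):
    """Three-phase prefix sums when each 'processor' holds a block:
    (1) local prefix sums per block, (2) diminished prefix of the
    block totals, (3) add each block's offset to its local prefixes."""
    local = []
    for b in blocks:
        acc, pre = 0, []
        for x in b:
            acc += x
            pre.append(acc)
        local.append(pre)
    offset, result = 0, []
    for pre in local:
        result.extend(v + offset for v in pre)
        offset += pre[-1]          # running total = diminished prefix
    return result

print(blockwise_prefix_sums([[1, 2], [3, 4], [5, 6]]))  # [1, 3, 6, 10, 15, 21]
```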

Packet Routing. To send a packet of information from Processor i to Processor j on a linear array, we simply attach a routing tag with the value j – i to it. The sign of a routing tag determines the direction in which it should move (+ = right, – = left) while its magnitude indicates the action to be performed (0 = remove the packet, nonzero = forward the packet). With each forwarding, the magnitude of the routing tag is decremented by 1. Multiple packets

Fig 3.7: Computing prefix sums on a linear array of nine processors.

Fig 3.8: Computing prefix sums on a linear array with two items per processor.

originating at different processors can flow rightward and leftward in lockstep, without ever interfering with each other.
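The routing-tag mechanism can be sketched for a single packet as follows (function name illustrative):

```python
def route(src, dst):
    """Simulate routing one packet on a linear array: the tag dst - src
    moves the packet right (+) or left (-); each hop decrements the
    tag's magnitude, and the packet is removed when the tag reaches 0."""
    pos, tag, hops = src, dst - src, 0
    while tag != 0:
        step = 1 if tag > 0 else -1
        pos += step
        tag -= step
        hops += 1
    return pos, hops

print(route(2, 7))  # (7, 5): arrives at Processor 7 after 5 hops
```

The number of hops is |j – i|, which matches the magnitude of the initial routing tag.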
Broadcasting. If Processor i wants to broadcast a value a to all processors, it sends an rbcast(a) (read r-broadcast) message to its right neighbor and an lbcast(a) message to its left neighbor. Any processor receiving an rbcast(a) message simply copies the value a and forwards the message to its right neighbor (if any). Similarly, receiving an lbcast(a) message causes a to be copied locally and the message forwarded to the left neighbor. The worst-case number of communication steps for broadcasting is p – 1.

Sorting. We consider two versions of sorting on a linear array: with and without I/O. Figure 3.9
depicts a linear-array sorting algorithm when p keys are input, one at a time, from the left end.
Each processor, on receiving a key value from the left, compares the received value with the value stored in its local register. The smaller of the two values is kept in the local register and the larger value is passed on to the right. Once all p inputs have been received, we must allow p – 1
additional communication cycles for the key values that are in transit to settle into their
respective positions in the linear array. If the sorted list is to be output from the left, the output
phase can start immediately after the last key value has been received. In this case, an array half
the size of the input list would be adequate and we effectively have zero-time sorting, i.e., the
total sorting time is equal to the I/O time.
If the key values are already in place, one per processor, then an algorithm known as odd–even
transposition can be used for sorting. A total of p steps are required. In an odd-numbered step,
odd-numbered processors compare values with their even-numbered right neighbors. The two
processors exchange their values if they are out of order. Similarly, in an even-numbered step,
even-numbered processors compare–exchange values with their right neighbors (see Fig. 3.10).
In the worst case, the largest key value resides in Processor 0 and must move all the way to the
other end of the array. This needs p – 1 right moves. One step must be added because no movement occurs in the first step. Of course one could use even–odd transposition, but this will not affect the worst-case time complexity of the algorithm for our nine-processor linear array.

Fig 3.9: Sorting on a linear array with the keys input sequentially from the left.

Fig 3.10: Odd–even transposition sort on a linear array.
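Odd–even transposition sort can be simulated sequentially. This sketch uses 0-based indexing, so "odd" steps compare pairs starting at an even index and vice versa; the phase labeling differs from the 1-based description above but the algorithm is the same:

```python
def odd_even_transposition_sort(a):
    """p synchronous steps on p keys: alternate between compare-exchanging
    pairs (0,1), (2,3), ... and pairs (1,2), (3,4), ...; after p steps
    the list is sorted."""
    a = list(a)
    p = len(a)
    for step in range(p):
        start = 0 if step % 2 == 0 else 1
        for i in range(start, p - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([9, 8, 7, 6, 5, 4, 3, 2, 1]))
```

The worst case shown (largest key at the far end, reversed order) still finishes within the p steps.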


Let us evaluate the odd–even transposition algorithm with respect to the various measures introduced in Section 1.6. The best sequential sorting algorithms take on the order of p log p compare–exchange steps to sort a list of size p. Let us assume, for simplicity, that they take exactly p log2 p steps. Then, we have T(1) = W(1) = p log2 p, T(p) = p, W(p) = p²/2, S(p) = log2 p (Minsky’s conjecture?), E(p) = (log2 p)/p, R(p) = p/(2 log2 p), U(p) = 1/2, and Q(p) = 2(log2 p)³/p².
Ranking the Elements of a Linked List

Our next example computation is important not only because it is a very useful building block in many applications, but also because it demonstrates how a problem that seems hopelessly sequential can be efficiently parallelized.

The problem will be presented in terms of a linear linked list of size p, but in practice it often
arises in the context of graphs of the types found in image processing and computer vision
applications. Many graph-theoretic problems deal with (directed) paths between various pairs of
nodes. Such a path essentially consists of a sequence of nodes, each “pointing” to the next node
on the path; thus, a directed path can be viewed as a linear linked list.
The problem of list ranking can be defined as follows: Given a linear linked list of the type
shown in Fig. 3.11, rank the list elements in terms of the distance from each to the terminal

Fig 3.11: Another divide-and-conquer scheme for parallel prefix computation.

Fig 3.12: Example linked list and the ranks of its elements.

element. The terminal element is thus ranked 0, the one pointing to it 1, and so forth. In a list of
length p, each element’s rank will be a unique integer between 0 and p–1.
A sequential algorithm for list ranking requires Θ(p) time. Basically, the list must be traversed once to determine the distance of each element from the head, storing the results in the linked list itself or in a separate integer vector. This first pass can also yield the length of the list (six in the example of Fig. 3.12). A second pass, through the list or the vector of p intermediate results, then suffices to compute all of the ranks.

The list ranking problem for the example linked list of Fig. 3.12 may be approached with the
PRAM input and output data structures depicted in Fig. 3.13. The info and next vectors are given,
as is the head pointer (in our example, head = 2). The rank vector must be filled with the unique
element ranks at the termination of the algorithm.

The parallel solution method for this problem is known as pointer jumping:
Repeatedly make each element point to the successor of its successor (i.e., make the pointer jump
over the current successor) until all elements end up pointing to the terminal node, keeping track
of the number of list elements that have been skipped over. If the original list is not to be
modified, a copy can be made in the PRAM’s shared memory in constant time before the
algorithm is applied.

Processor j, 0 ≤j <p, will be responsible for computing rank [j]. The invariant of the list ranking
algorithm given below is that initially and after each iteration, the partial computed rank of each
element is the difference between its rank and the rank of its successor. With the difference
between the rank of a list element and the rank of its successor available, the rank of an element
can be determined as soon as the rank of its successor becomes known. Again, a doubling
process takes place. Initially, only the rank of the terminal element (the only node that points to
itself) is known. In successive iterations of the algorithm, the ranks

Fig 3.13: PRAM data structures representing a linked list and the ranking results.
of two elements, then four elements, then eight elements, and so forth become known, until the ranks of all elements have been determined.
PRAM list ranking algorithm (via pointer jumping)

Processor j, 0 ≤ j < p, do  {initialize the partial ranks}
    if next[j] = j then rank[j] := 0 else rank[j] := 1 endif
while rank[next[head]] ≠ 0
    Processor j, 0 ≤ j < p, do
        rank[j] := rank[j] + rank[next[j]]
        next[j] := next[next[j]]
endwhile
Figure 3.14 shows the intermediate values in the vectors rank (numbers within boxes) and next (arrows) as the above list ranking algorithm is applied to the example list. Because the number of elements that are skipped doubles with each iteration, the number of iterations, and thus the running time of the algorithm, is logarithmic in p.
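The pointer-jumping algorithm can be simulated sequentially; the list comprehensions mimic the PRAM's synchronous updates by always reading the old rank and next vectors. The example next vector and head below are illustrative, not the data of Fig. 3.13:

```python
def list_rank(next_, head):
    """Pointer jumping: repeatedly make every element point to the
    successor of its successor, accumulating skipped-element counts in
    rank. The terminal node is the one that points to itself."""
    p = len(next_)
    nxt = list(next_)
    rank = [0 if nxt[j] == j else 1 for j in range(p)]
    while rank[nxt[head]] != 0:
        # Both vectors are rebuilt from the old values, as on a
        # synchronous PRAM.
        rank = [rank[j] + rank[nxt[j]] for j in range(p)]
        nxt = [nxt[nxt[j]] for j in range(p)]
    return rank

# Hypothetical 6-element list: 2 -> 4 -> 1 -> 0 -> 3 -> 5 (terminal).
print(list_rank([3, 0, 4, 5, 1, 5], head=2))  # [2, 3, 5, 1, 4, 0]
```

Three iterations suffice here, consistent with the logarithmic iteration count noted above.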

List ranking appears to be hopelessly sequential, as no access to list elements is possible without traversing all previous elements. However, the list ranking algorithm presented above shows that we can in fact use a recursive doubling scheme to determine the rank of each element in a logarithmic number of steps. The problems at the end of the chapter contain other examples of computations on lists that can be performed just as efficiently. This is why intuition can be misleading when it comes to determining which computations are or are not efficiently parallelizable (formally, whether a computation is or is not in NC).

Fig 3.14: Element ranks initially and after each of the three iterations

Parallel Algorithm - Introduction
An algorithm is a sequence of steps that takes inputs from the user and, after some computation, produces an output. A parallel algorithm is an algorithm that can execute several instructions simultaneously on different processing devices and then combine all the individual outputs to produce the final result.

Concurrent Processing

The easy availability of computers along with the growth of the Internet has changed the way we store and process data. We are living in a day and age where data is available in abundance. Every day we deal with huge volumes of data that require complex computing, and in quick time. Sometimes, we need to fetch data from similar or interrelated events that occur simultaneously. This is where we require concurrent processing, which can divide a complex task and process it on multiple systems to produce the output in quick time.

Concurrent processing is essential where the task involves processing a huge bulk of complex data. Examples include accessing large databases, aircraft testing, astronomical calculations, atomic and nuclear physics, biomedical analysis, economic planning, image processing, robotics, weather forecasting, web-based services, etc.

What is Parallelism?

Parallelism is the process of processing several sets of instructions simultaneously. It reduces the total computational time. Parallelism can be implemented by using parallel computers, i.e., computers with many processors. Parallel computers require parallel algorithms, programming languages, compilers, and operating systems that support multitasking.

In this tutorial, we will discuss only about parallel algorithms. Before moving further, let us first
discuss about algorithms and their types.
What is an Algorithm?
An algorithm is a sequence of instructions followed to solve a problem. While designing an
algorithm, we should consider the architecture of computer on which the algorithm will be
executed. As per the architecture, there are two types of computers–
• Sequential computer
• Parallel computer
Depending on the architecture of computers, we have two types of algorithms –
• Sequential Algorithm − An algorithm in which the instructions are executed one after another, in chronological order, to solve a problem.

• Parallel Algorithm − The problem is divided into sub-problems, which are executed in parallel to get individual outputs. Later on, these individual outputs are combined together to get the final desired output.

It is not easy to divide a large problem into sub-problems. Sub-problems may have data
dependency among them. Therefore, the processors have to communicate with each other to
solve the problem.

It is often found that the time the processors spend communicating with one another exceeds the actual processing time. So, while designing a parallel algorithm, proper CPU utilization should be considered to get an efficient algorithm.

To design an algorithm properly, we must have a clear idea of the basic model of computation in
a parallel computer.
Model of Computation
Both sequential and parallel computers operate on a set (stream) of instructions called
algorithms. These set of instructions (algorithm) instruct the computer about what it has to do in
each step.
Depending on the instruction stream and data stream, computers can be classified into four categories:
• Single Instruction stream, Single Data stream (SISD) computers
• Single Instruction stream, Multiple Data stream (SIMD) computers
• Multiple Instruction stream, Single Data stream (MISD) computers
• Multiple Instruction stream, Multiple Data stream (MIMD) computers

SISD Computers
SISD computers contain one control unit, one processing unit, and one memory unit.

Fig 3.15: SISD computers

In this type of computers, the processor receives a single stream of instructions from the control
unit and operates on a single stream of data from the memory unit. During computation, at each
step, the processor receives one instruction from the control unit and operates on a single data
received from the memory unit.
SIMD Computers:
SIMD computers contain one control unit, multiple processing units, and shared memory or
interconnection network.

Fig 3.16: Control Unit and Shared Memory


Here, one single control unit sends instructions to all processing units. During computation, at each step, all the processors receive a single set of instructions from the control unit and operate on different sets of data from the memory unit.

Each of the processing units has its own local memory unit to store both data and instructions. In SIMD computers, processors need to communicate among themselves. This is done by shared memory or by an interconnection network.

While some of the processors execute a set of instructions, the remaining processors wait for their next set of instructions. Instructions from the control unit decide which processors will be active (execute instructions) or inactive (wait for the next instruction).

MISD Computers

As the name suggests, MISD computers contain multiple control units, multiple processing units,
and one common memory unit.

Fig 3.17: Flow of instruction from control unit to memory


Here, each processor has its own control unit and they share a common memory unit. All the processors get instructions individually from their own control units and operate on a single stream of data as per the instructions they have received from their respective control units. These processors operate simultaneously.

MIMD Computers

MIMD computers have multiple control units, multiple processing units, and a shared
memory or interconnection network.
Fig 3.18: Instruction and data stream

Here, each processor has its own control unit, local memory unit, and arithmetic and logic
unit. They receive different sets of instructions from their respective control units and
operate on different sets of data.

• An MIMD computer that shares a common memory is known as a multiprocessor, while one that uses an interconnection network is known as a multicomputer.
• Based on the physical distance between the processors, multicomputers are of two types:

➢ Multicomputer − When all the processors are very close to one another (e.g., in the same room).

➢ Distributed system − When all the processors are far away from one another (e.g., in different cities).

Parallel Algorithm-Structure

To apply any algorithm properly, it is very important that you select a proper data structure. It is
because a particular operation performed on a data structure may take more time as compared to
the same operation performed on another data structure.
Example−To access the ith element in a set by using an array, it may take a constant time but by
using a linked list, the time required to perform the same operation may become a polynomial.
Therefore, the selection of a data structure must be done considering the architecture and the type
of operations to be performed.
The following data structures are commonly used in parallel programming:
• Linked List
• Arrays
• Hypercube Network

Linked List

A linked list is a data structure having zero or more nodes connected by pointers. Nodes may or may not occupy consecutive memory locations. Each node has two or three parts: one data part that stores the data, and one or two link fields that store the address of the previous and/or next node. The first node’s address is stored in an external pointer called head. The last node, known as tail, generally does not contain any address.
There are three types of linked lists:
• Singly Linked List
• Doubly Linked List
• Circular Linked List
Singly Linked List
A node of a singly linked list contains data and the address of the next node. An external pointer called head stores the address of the first node.

Fig 3.19: Singly linked list


Doubly Linked List
A node of a doubly linked list contains data and the address of both the previous and the next
node. An external pointer called head stores the address of the first node and the external pointer
called tail stores the address of the last node.

Fig 3.20: Doubly linked list


Circular Linked List
A circular linked list is very similar to the singly linked list except that the last node stores the address of the first node.
Arrays
An array is a data structure where we can store similar types of data. It can be one-dimensional
or multi- dimensional. Arrays can be created statically or dynamically.
In statically declared arrays, dimension and size of the arrays are known at the time of
compilation.
In dynamically declared arrays, dimension and size of the array are known at runtime.
For shared memory programming, arrays can be used as a common memory and for data parallel
programming, they can be used by partitioning into sub-arrays.

Hypercube Network
Hyper cube architecture is helpful for those parallel algorithms where each task has to
communicate with other tasks. Hypercube topology can easily embed other topologies such as
ring and mesh. It is also known as n-cubes, where n is the number of dimensions. A hypercube
can be constructed recursively.
Parallel Algorithm-Matrix Multiplication
A matrix is a set of numerical and non-numerical data arranged in a fixed number of rows and
column. Matrix multiplication is an important multiplication design in parallel communication.
Here we will discuss the implementation of matrix multiplication on various communication
networks like mesh and hypercube. Mesh and hypercube have higher network connectivity, so
they allow faster algorithm than other networks like ring network.

Mesh Network
A topology where a set of nodes forms a p-dimensional grid is called a mesh topology. Here, all the edges are parallel to the grid axes and all adjacent nodes can communicate with each other.
Total number of nodes = (number of nodes in a row) × (number of nodes in a column)
A mesh network can be evaluated using the following factors−

• Diameter
• Bisection width

Diameter − In a mesh network, the longest distance between two nodes is its diameter. A p-dimensional mesh network having k^p nodes has a diameter of p(k – 1).

Bisection width − Bisection width is the minimum number of edges needed to be removed from
a network to divide the mesh network into two halves.

Matrix Multiplication Using Mesh Network

We have considered a 2D mesh-network SIMD model having wraparound connections. We will design an algorithm to multiply two n × n matrices using n² processors in a particular amount of time.

Matrices A and B have elements aij and bij respectively. Processing element PEij holds aij and bij. Arrange the matrices A and B in such a way that every processor has a pair of elements to multiply. The elements of matrix A will move in the left direction and the elements of matrix B will move in the upward direction. These changes in the positions of the elements present each processing element, PE, with a new pair of values to multiply.

Steps in Algorithm

1. Stagger the two matrices.
2. Calculate all products aik × bkj.
3. Calculate the sums when step 2 is complete.

Algorithm

Procedure MatrixMulti
Begin
   for k = 1 to n-1
      forall Pij, where i and j range from 1 to n
         if i is greater than k then
            rotate a in left direction
         endif
         if j is greater than k then
            rotate b in the upward direction
         endif
   forall Pij, where i and j lie between 1 and n
      compute the product of a and b and store it in c
   for k = 1 to n-1 step 1
      forall Pij, where i and j range from 1 to n
         rotate a in left direction
         rotate b in the upward direction
         c = c + a × b
End
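The stagger-then-rotate scheme above is the standard mesh algorithm commonly known as Cannon's algorithm. A sequential simulation follows (function name illustrative); the initial stagger is done directly by indexing instead of by the k rotation loop, which produces the same alignment:

```python
def mesh_matrix_multiply(A, B):
    """Simulate Cannon's algorithm on an n x n mesh with wraparound:
    stagger row i of A left by i and column j of B up by j, then repeat
    n times: multiply-accumulate locally, rotate A left and B up."""
    n = len(A)
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]   # staggered A
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]   # staggered B
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]                        # local product
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]  # rotate left
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]  # rotate up
    return c

print(mesh_matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

After the stagger, PEij holds A[i][(i+j) mod n] and B[(i+j) mod n][j], so the n rotate-and-accumulate steps sweep through all k, giving cij = Σk aik bkj.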

Hypercube Network
A hypercube is an n-dimensional construct where edges are perpendicular among themselves and are of the same length. An n-dimensional hypercube is also known as an n-cube or an n-dimensional cube.

Features of a Hypercube with 2^k nodes
• Diameter = k
• Bisection width = 2^(k–1)
• Edges per node = k

Matrix Multiplication using Hypercube Network
General specification of hypercube networks −
• Let N = 2^m be the total number of processors, labelled P0, P1, ..., PN-1.
• Let i and i^b be two integers, 0 ≤ i, i^b ≤ N−1, whose binary representations differ
only in position b, 0 ≤ b ≤ m−1.
• Let us consider two n × n matrices, matrix A and matrix B.
• Step 1 − The elements of matrix A and matrix B are assigned to the n^3 processors such that
the processor in position (i, j, k) will have aji and bik.
• Step 2 − Every processor in position (i, j, k) computes the product
C(i, j, k) = A(i, j, k) × B(i, j, k)
• Step 3 − The sum C(0, j, k) = Σ C(i, j, k) for 0 ≤ i ≤ n−1, where 0 ≤ j, k ≤ n−1.
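The three steps can be checked with a short sequential sketch: every position (i, j, k) forms the single product aji × bik, and the products are then summed over i. The function name and the matrix representation below are illustrative assumptions.

```python
def hypercube_style_matmul(A, B):
    """Simulate the n^3-processor scheme: one product per (i, j, k), then reduce over i."""
    n = len(A)
    # Steps 1-2: processor (i, j, k) holds a_ji and b_ik and multiplies them
    prod = [[[A[j][i] * B[i][k] for k in range(n)]
             for j in range(n)] for i in range(n)]
    # Step 3: the reduction leaves C(0, j, k) = sum over i of C(i, j, k)
    return [[sum(prod[i][j][k] for i in range(n)) for k in range(n)]
            for j in range(n)]
```

Since C(0, j, k) = Σ aji × bik over i, the result is exactly the ordinary matrix product A × B.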

Block Matrix
A block matrix or partitioned matrix is a matrix where each element itself represents an individual
matrix. These individual sections are known as blocks or sub-matrices.

Example
In Figure (a), X is a block matrix where A, B, C, D are matrices themselves. Figure (f) shows the
total matrix.

Block Matrix Multiplication


When two block matrices are square matrices, they are multiplied just the way we perform
simple matrix multiplication. For example,
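Since the worked figure is not reproduced here, a minimal sketch of block multiplication may help. It assumes equal square blocks of size s; the helper names are illustrative.

```python
def sub(M, r, c, s):
    """Extract the s x s block at block-row r, block-column c."""
    return [row[c * s:(c + 1) * s] for row in M[r * s:(r + 1) * s]]

def block_matmul(A, B, s):
    """Multiply two square matrices block by block, exactly as in plain
    matrix multiplication but with s x s blocks as the 'elements'."""
    n = len(A)
    nb = n // s                       # number of blocks per row/column
    C = [[0] * n for _ in range(n)]
    for r in range(nb):
        for c in range(nb):
            for k in range(nb):       # C_rc += A_rk * B_kc (block product)
                Ab, Bb = sub(A, r, k, s), sub(B, k, c, s)
                for i in range(s):
                    for j in range(s):
                        C[r * s + i][c * s + j] += sum(
                            Ab[i][t] * Bb[t][j] for t in range(s))
    return C
```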

Parallel Algorithm-Sorting
Sorting is a process of arranging elements in a group in a particular order, i.e., ascending order,
descending order, alphabetic order, etc. Here we will discuss the following −

• Enumeration Sort
• Odd-Even Transposition Sort
• Parallel Merge Sort
• Hyper Quick Sort

Sorting a list of elements is a very common operation. A sequential sorting algorithm may not be
efficient enough when we have to sort a huge volume of data. Therefore, parallel algorithms are
used in sorting.

Enumeration Sort
Enumeration sort is a method of arranging all the elements in a list by finding the final position
of each element in a sorted list. It is done by comparing each element with all other elements and
finding the number of elements having smaller value.

Therefore, for any two elements ai and aj, exactly one of the following cases must be true −

ai < aj,  ai > aj,  or  ai = aj
Algorithm

procedure ENUM_SORTING (n)

begin
   for each process P1,j do
       C[j] := 0;

   for each process Pi,j do
       if (A[i] < A[j]) or (A[i] = A[j] and i < j) then
           C[j] := 1;
       else
           C[j] := 0;

   for each process P1,j do
       A[C[j]] := A[j];

end ENUM_SORTING
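A sequential sketch of the same rank-counting idea: the comparisons that the Pi,j processes perform in parallel are done in a loop here, and the tie-breaking test (A[i] = A[j] and i < j) keeps equal keys stable. Names are illustrative.

```python
def enum_sort(A):
    """Enumeration (rank) sort: place each element at the count of
    elements that must precede it; ties are broken by index."""
    n = len(A)
    out = [None] * n
    for j in range(n):
        rank = sum(1 for i in range(n)
                   if A[i] < A[j] or (A[i] == A[j] and i < j))
        out[rank] = A[j]
    return out
```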

Odd-Even Transposition Sort


Odd-Even Transposition Sort is based on the bubble sort technique. It compares two adjacent
numbers and switches them if the first number is greater than the second, to get an
ascending-order list. The opposite case applies for a descending-order series. Odd-even
transposition sort operates in two phases − an odd phase and an even phase. In both phases,
processes exchange numbers with their adjacent neighbor to the right.

Fig 3.21: Example for Odd-Even Transposition sort


Algorithm

procedure ODD-EVEN_PAR (n)

begin
   id := process's label

   for i := 1 to n do
   begin
       if i is odd then
           if id is odd then
               compare-exchange_min(id + 1);
           else
               compare-exchange_max(id - 1);

       if i is even then
           if id is even then
               compare-exchange_min(id + 1);
           else
               compare-exchange_max(id - 1);
   end for

end ODD-EVEN_PAR
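A sequential simulation of the two alternating phases, assuming the 1-based pair numbering used in the figure (odd phase compares pairs (1,2), (3,4), ...; even phase compares (2,3), (4,5), ...):

```python
def odd_even_transposition_sort(a):
    """Simulate n phases of odd-even transposition sort; n phases always
    suffice to sort n elements."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        # 0-based starts: odd phase pairs begin at index 0, even phase at 1
        start = 0 if phase % 2 == 0 else 1
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:          # compare-exchange with right neighbor
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```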

Parallel Merge Sort


Merge sort first divides the unsorted list into the smallest possible sub-lists, compares each with
the adjacent list, and merges them in sorted order. It implements parallelism very nicely by
following the divide-and-conquer approach.
Fig 3.22: Parallel Merge Sort
Algorithm

procedure parallelmergesort(id, n, data, newdata)

begin
   data = sequentialmergesort(data)

   for dim = 1 to n
       data = parallelmerge(id, dim, data)
   end for

   newdata = data
end
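A sketch of the same divide-sort-merge structure using a thread pool in place of the hypercube merge steps. The pool size, the recursion-depth cutoff, and the helper names are assumptions, not part of the pseudocode.

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def parallel_merge_sort(data, pool, depth=2):
    """Sort the left half in a pool task while the caller sorts the right
    half; stop spawning new tasks below the given depth."""
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    if depth > 0:
        future = pool.submit(parallel_merge_sort, data[:mid], pool, depth - 1)
        right = parallel_merge_sort(data[mid:], pool, depth - 1)
        left = future.result()
    else:
        left = parallel_merge_sort(data[:mid], pool, 0)
        right = parallel_merge_sort(data[mid:], pool, 0)
    return merge(left, right)
```

Typical usage: `with ThreadPoolExecutor(max_workers=4) as pool: parallel_merge_sort(xs, pool)`. The depth cutoff keeps the number of in-flight tasks small so recursive submissions cannot exhaust the pool.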

Hyper Quick Sort

Hyper quick sort is an implementation of quick sort on hypercube. Its steps are as follows −

• Divide the unsorted list among the nodes.
• Sort each node locally.
• From node 0, broadcast the median value.
• Split each list locally, then exchange the halves across the highest dimension.
• Repeat steps 3 and 4 in parallel until the dimension reaches 0.
Algorithm

procedure HYPERQUICKSORT (B, n)

begin
   id := process's label;

   for i := 1 to d do
   begin
       x := pivot;
       partition B into B1 and B2 such that B1 ≤ x < B2;
       if ith bit is 0 then
       begin
           send B2 to the process along the ith communication link;
           C := subsequence received along the ith communication link;
           B := B1 U C;
       end
       else
       begin
           send B1 to the process along the ith communication link;
           C := subsequence received along the ith communication link;
           B := B2 U C;
       end
   end for

   sort B using sequential quicksort;

end HYPERQUICKSORT
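The exchange pattern can be simulated single-threaded by keeping one list per process. In this sketch the pivot is taken as the median of the lowest-numbered process in each subcube, which is an assumption about how the broadcast step is realized.

```python
def hyperquicksort(blocks, d):
    """Simulate hyperquicksort on 2**d 'processes', one list per process."""
    blocks = [list(b) for b in blocks]
    p = 1 << d
    for b in reversed(range(d)):              # split across highest dimension first
        group = 1 << (b + 1)                  # size of the current subcube
        for base in range(0, p, group):
            src = blocks[base]                # pivot broadcast from lowest id
            if not src:
                continue
            pivot = sorted(src)[len(src) // 2]
            for pid in range(base, base + group // 2):
                partner = pid | (1 << b)      # neighbor across dimension b
                both = blocks[pid] + blocks[partner]
                # low half stays on the 0-bit side, high half on the 1-bit side
                blocks[pid] = [x for x in both if x <= pivot]
                blocks[partner] = [x for x in both if x > pivot]
    return [sorted(blk) for blk in blocks]    # final local sequential sort
```

After the dimension loop, concatenating the blocks in process order yields a globally sorted sequence.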

Parallel Search Algorithm

Searching is one of the fundamental operations in computer science. It is used in all
applications where we need to find whether an element is in the given list or not. In this
chapter, we will discuss the following search algorithms −

• Divide and Conquer
• Depth-First Search
• Breadth-First Search
• Best-First Search

Divide and Conquer

In the divide and conquer approach, the problem is divided into several small sub-problems.
Then the sub-problems are solved recursively and combined to get the solution of the
original problem.
The divide and conquer approach involves the following steps at each level −
Divide − The original problem is divided into sub-problems.
Conquer − The sub-problems are solved recursively.

Combine − The solutions of the sub-problems are combined to get the solution of the
original problem.

Binary search is an example of divide and conquer algorithm.

Pseudocode

BinarySearch(a, b, low, high)

if low > high then
   return NOT FOUND
else
   mid ← (low + high) / 2
   if b = key(mid) then
       return key(mid)
   else if b < key(mid) then
       return BinarySearch(a, b, low, mid−1)
   else
       return BinarySearch(a, b, mid+1, high)
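An index-returning Python version of the pseudocode; returning −1 for NOT FOUND is an assumption of this sketch.

```python
def binary_search(a, key, low, high):
    """Return the index of key in the sorted list a[low..high], or -1."""
    if low > high:
        return -1                      # NOT FOUND
    mid = (low + high) // 2
    if a[mid] == key:
        return mid
    elif key < a[mid]:
        return binary_search(a, key, low, mid - 1)
    else:
        return binary_search(a, key, mid + 1, high)
```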

Depth-First Search

Depth-First Search (or DFS) is an algorithm for searching a tree or an undirected graph
data structure. Here, the concept is to start from the starting node known as the root and
traverse as far as possible along the same branch. If we reach a node with no successor node,
we return and continue with the vertex that is yet to be visited.
Steps of Depth-First Search

• Consider a node (root) that has not been visited previously and mark it visited. Visit the first
adjacent successor node and mark it visited.

• If all the successor nodes of the considered node are already visited, or it doesn’t have
any more successor nodes, return to its parent node.
Pseudo code
Let v be the vertex where the search starts in Graph G.
DFS(G, v)

Stack S := {};

for each vertex u, set visited[u] := false;
push S, v;

while (S is not empty) do
   u := pop S;
   if (not visited[u]) then
       visited[u] := true;
       for each unvisited neighbour w of u
           push S, w;
   end if
end while

END DFS()
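The stack-based pseudocode translates directly to Python; the adjacency-dict graph representation is an assumption of this sketch.

```python
def dfs(graph, start):
    """Iterative depth-first search; returns vertices in visit order.
    graph: dict mapping each vertex to a list of neighbors."""
    visited, order, stack = set(), [], [start]
    while stack:
        u = stack.pop()
        if u not in visited:
            visited.add(u)
            order.append(u)
            for w in graph[u]:
                if w not in visited:
                    stack.append(w)
    return order
```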

Breadth-First Search

Breadth-First Search (or BFS) is an algorithm for searching a tree or an undirected graph data
structure. Here, we start with a node and then visit all the adjacent nodes in the same level and
then move to the adjacent successor node in the next level. This is also known as level-by-level
search.
Steps of Breadth-First Search
• Start with the root node, mark it visited.
• As the root node has no node in the same level, go to the next level. Visit all adjacent
nodes and mark them visited.
• Go to the next level and visit all the unvisited adjacent nodes. Continue this process until
all the nodes are visited.
Pseudocode
Let v be the vertex where the search starts in Graph G.

BFS(G, v)

Queue Q := {};

for each vertex u, set visited[u] := false;
insert Q, v;

while (Q is not empty) do
   u := delete Q;
   if (not visited[u]) then
       visited[u] := true;
       for each unvisited neighbor w of u
           insert Q, w;
   end if
end while

END BFS()
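A Python version of the queue-based traversal, again assuming an adjacency-dict graph:

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search; returns vertices in level-by-level visit order."""
    visited, order = {start}, []
    q = deque([start])
    while q:
        u = q.popleft()
        order.append(u)
        for w in graph[u]:
            if w not in visited:     # mark on insertion to avoid duplicates
                visited.add(w)
                q.append(w)
    return order
```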

Best-First Search
Best-First Search is an algorithm that traverses a graph to reach a target in the shortest possible
path. Unlike BFS and DFS, Best-First Search follows an evaluation function to determine which
node is the most appropriate to traverse next.
Steps of Best-First Search
• Start with the root node, mark it visited.
• Find the next appropriate node and mark it visited.
• Go to the next level and find the appropriate node and mark it visited. Continue this
process until the target is reached.

Pseudocode
BFS(m)

Insert(m.StartNode)
Until PriorityQueue is empty
   c ← PriorityQueue.DeleteMin
   If c is the goal
       Exit
   Else
       For each neighbor n of c
           If n "Unvisited"
               Mark n "Visited"
               Insert(n)
       Mark c "Examined"

End procedure
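A sketch using Python's heapq as the priority queue; the heuristic table h is an assumed input that plays the role of the evaluation function.

```python
import heapq

def best_first_search(graph, h, start, goal):
    """Greedy best-first search. h maps each node to its heuristic estimate;
    returns the sequence of expanded nodes, ending at goal if it is reached."""
    visited = {start}
    pq = [(h[start], start)]          # DeleteMin pops the lowest estimate
    expanded = []
    while pq:
        _, c = heapq.heappop(pq)
        expanded.append(c)
        if c == goal:
            return expanded
        for n in graph[c]:
            if n not in visited:
                visited.add(n)
                heapq.heappush(pq, (h[n], n))
    return expanded
```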

Graph Algorithm
A graph is an abstract notation used to represent the connection between pairs of objects. A
graph consists of −

• Vertices − Interconnected objects in a graph are called vertices. Vertices are also known
as nodes.

• Edges − Edges are the links that connect the vertices.

There are two types of graphs −

• Directed graph − In a directed graph, edges have direction, i.e., edges go from one vertex
to another.

• Undirected graph − In an undirected graph, edges have no direction.

Graph Coloring
Graph coloring is a method to assign colors to the vertices of a graph so that no two adjacent
vertices have the same color. Some graph coloring problems are −

• Vertex coloring − A way of coloring the vertices of a graph so that no two adjacent
vertices share the same color.

• Edge coloring − The method of assigning a color to each edge so that no two adjacent
edges have the same color.

• Face coloring − It assigns a color to each face or region of a planar graph so that no two
faces that share a common boundary have the same color.
Chromatic Number
Chromatic number is the minimum number of colors required to color a graph. For example, the
chromatic number of the following graph is 3.

Fig 3.23: Chromatic Number


The concept of graph coloring is applied in preparing timetables, mobile radio frequency
assignment, Sudoku, register allocation, and coloring of maps.
Steps for graph coloring

• Set the initial value of each processor in the n-dimensional array to 1.

• Now, to assign a particular color to a vertex, determine whether that color is already
assigned to the adjacent vertices or not.

• If a processor detects the same color in the adjacent vertices, it sets its value in the array to 0.

After making n^2 comparisons, if any element of the array is 1, then it is a valid coloring.

Pseudocode for graph coloring


begin

   create the processors P(i0, i1, ..., in-1) where 0 ≤ iv < m, 0 ≤ v < n
   status[i0, ..., in-1] = 1

   for j varies from 0 to n-1 do
   begin
       for k varies from 0 to n-1 do
       begin
           if aj,k = 1 and ij = ik then
               status[i0, ..., in-1] = 0
       end
   end

   ok = Σ status

   if ok > 0, then
       display valid coloring exists
   else
       display invalid coloring

end
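The validity test at the heart of the pseudocode can be done sequentially over an adjacency matrix; the names below are illustrative.

```python
def valid_coloring(adj, colors):
    """Return True iff no edge of the graph (given as an n x n 0/1 adjacency
    matrix) joins two vertices with the same color."""
    n = len(adj)
    for j in range(n):
        for k in range(n):
            if adj[j][k] == 1 and j != k and colors[j] == colors[k]:
                return False          # same color across an edge: invalid
    return True
```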

Minimal Spanning Tree


A spanning tree whose sum of edge weights (or lengths) is less than that of all other possible
spanning trees of graph G is known as a minimal spanning tree or minimum-cost spanning tree.
The following figure shows a weighted connected graph.

Some possible spanning trees of the above graph are shown below −
Among all the above spanning trees, figure (d) is the minimum spanning tree. The concept of
minimum-cost spanning tree is applied in the travelling salesman problem, designing electronic
circuits, designing efficient networks, and designing efficient routing algorithms.
To implement the minimum-cost spanning tree, the following two methods are used −

• Prim’s Algorithm
• Kruskal’s Algorithm
Prim's Algorithm
Prim’s algorithm is a greedy algorithm, which helps us find the minimum spanning tree for a
weighted undirected graph. It selects a vertex first and finds an edge with the lowest weight
incident on that vertex.
Steps of Prim’s Algorithm
• Select any vertex, say v1, of graph G.
• Select an edge, say e1, of G such that e1 = v1 v2, v1 ≠ v2, and e1 has the minimum
weight among the edges incident on v1 in graph G.
• Now, following step 2, select the minimum-weight edge incident on v2.
• Continue this till n−1 edges have been chosen. Here n is the number of vertices.

The minimum spanning tree is −
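A sequential sketch of Prim's algorithm using a priority queue of candidate edges; the adjacency-list input format is an assumption.

```python
import heapq

def prim_mst(graph, start):
    """graph: dict vertex -> list of (weight, neighbor) pairs.
    Returns (total cost, list of chosen (u, v, weight) edges)."""
    visited = {start}
    edges = []
    pq = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(pq)
    total = 0
    while pq and len(visited) < len(graph):
        w, u, v = heapq.heappop(pq)   # lowest-weight edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        total += w
        edges.append((u, v, w))
        for w2, x in graph[v]:
            if x not in visited:
                heapq.heappush(pq, (w2, v, x))
    return total, edges
```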

Kruskal's Algorithm
Kruskal’s algorithm is a greedy algorithm, which helps us find the minimum spanning tree
for a connected weighted graph, adding increasing-cost arcs at each step. It is a minimum-
spanning-tree algorithm that finds an edge of the least possible weight that connects any two
trees in the forest.
Steps of Kruskal’s Algorithm
• Select an edge of minimum weight, say e1, of graph G such that e1 is not a loop.
• Select the next minimum-weight edge connected to e1.
• Continue this till n−1 edges have been chosen. Here n is the number of vertices.
The minimum spanning tree of the above graph is −
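A sequential sketch of Kruskal's algorithm; the union-find structure used to detect loops is the standard implementation choice, not something mandated by the steps above.

```python
def kruskal_mst(n, edges):
    """edges: list of (weight, u, v) with vertices numbered 0..n-1.
    Returns (total cost, list of chosen (u, v, weight) edges)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):           # increasing-cost order
        ru, rv = find(u), find(v)
        if ru != rv:                        # skip edges that would form a loop
            parent[ru] = rv
            total += w
            chosen.append((u, v, w))
            if len(chosen) == n - 1:
                break
    return total, chosen
```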

Shortest Path Algorithm


Shortest path algorithm is a method of finding the least-cost path from the source node (S) to
the destination node (D). Here, we will discuss Moore’s algorithm, also known as the Breadth-
First Search algorithm.
Moore’s algorithm

• Label the source vertex S with i, and set i = 0.

• Find all unlabeled vertices adjacent to the vertices labeled i. If no vertices are
connected to the vertex S, then vertex D is not connected to S. If there are vertices
connected to S, label them i + 1.

• If D is labeled, then go to step 4; else set i = i + 1 and go to step 2.

• Stop once the length of the shortest path is found.
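The labelling scheme above can be sketched in Python, returning the shortest-path length in edges; using −1 for "not connected" is an assumption of this sketch.

```python
from collections import deque

def moore_shortest_path(graph, s, d):
    """BFS labelling as in Moore's algorithm: label[s] = 0, neighbors of a
    vertex labeled i get label i + 1; stops when d is labeled."""
    label = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == d:
            return label[u]
        for w in graph[u]:
            if w not in label:
                label[w] = label[u] + 1
                q.append(w)
    return -1                         # d is not connected to s
```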
