SIMD Array Processors
• SIMD array processors were developed to
perform parallel computations on vector or
matrix types of data.
• Parallel processing algorithms have been
developed by many computer scientists for
SIMD computers.
• Important SIMD algorithms can be used to
perform:
Matrix multiplication,
Fast Fourier transform (FFT),
Matrix transposition,
Summation of vector elements,
Matrix inversion,
Parallel sorting,
Linear recurrence,
Boolean matrix operations, and
Solution of partial differential equations.
• The implementation of parallel algorithms on
SIMD machines is described using concurrent
ALGOL.
• The physical memory allocations and program
implementation depend on the specific
architecture of a given SIMD machine.
SIMD Matrix Multiplication
• Cumulative multiplication refers to the linked
multiply-add operation c ← c + a × b.
• The addition is merged into the multiplication
because the multiply is equivalent to multi-
operand addition.
• Therefore, unit time is the time required to
perform one cumulative multiplication.
Example: O(n²) algorithm for
SIMD matrix multiplication
• It should be noted that the vector load
operation is performed to initialize the row
vectors of matrix C one row at a time.
• In the vector multiply operation, the same
multiplier aij is broadcast from the CU to all PEs
to multiply all n elements {bjk for k = 1, 2, ..., n}
of the jth row vector of B.
• In total, n² vector multiply operations are
needed in the double loops.
• Each vector multiply instruction performs n
parallel scalar multiplications, one in each PE,
in each of the n² iterations.
• This algorithm is implementable on an array of
n PEs.
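A minimal Python sketch of this O(n²) algorithm (the simulation and the function name simd_matmul are illustrative assumptions, not from the text; the per-PE parallelism of a vector multiply is modelled by the innermost loop):

```python
def simd_matmul(A, B):
    """Simulate the O(n^2) SIMD algorithm: PE k owns column k of B and C."""
    n = len(A)
    C = []
    for i in range(n):
        row_c = [0] * n              # vector load: clear row i of C in all PEs
        for j in range(n):
            a_ij = A[i][j]           # CU broadcasts the scalar multiplier a_ij
            # vector multiply: every PE k does c_ik <- c_ik + a_ij * b_jk;
            # on real hardware this loop is one parallel step
            for k in range(n):
                row_c[k] += a_ij * B[j][k]
        C.append(row_c)
    return C
```

Counting each broadcast-and-vector-multiply as unit time, only the two outer loops remain, which is where the O(n²) step count comes from.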
Memory allocation
• Implementation of matrix multiplication on a
SIMD computer with n PEs.
• The algorithm construct depends heavily on the
memory allocation of the A, B and C matrices in
the PEMs.
• Each row vector of a matrix is stored across
the PEMs.
• Column vectors are then stored within the
same PEM.
• This memory allocation scheme allows parallel
access of all the elements in each row vector of
the matrices.
• Based on this data distribution, we obtain the
O(n²) matrix multiplication parallel algorithm.
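A small Python sketch of this allocation (the PEM contents are simulated as plain lists; all names are illustrative): PEM m keeps the m-th element of every row vector, so a column vector lives inside a single PEM while a row vector is fetched in one parallel access across all n PEMs.

```python
n = 4
A = [[i * n + j for j in range(n)] for i in range(n)]   # sample matrix

# pems[m][i] is the element of row i of A stored in PEM m,
# i.e. PEM m holds column m of A.
pems = [[A[i][m] for i in range(n)] for m in range(n)]

def fetch_row(pems, i):
    # one parallel access: each PEM contributes one element of row i
    return [pem[i] for pem in pems]
```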
• The two parallel do operations correspond to
vector load for initialization and vector
multiply for the inner loop of additive
multiplications.
• The time complexity has been reduced to
O(n²).
• Therefore, the SIMD algorithm is n times
faster than the O(n³) SISD algorithm for matrix
multiplication.
• The successive memory contents in the
execution of the SIMD matrix multiplication
program are shown in the accompanying figure.
Parallel sorting on array processors
• A SIMD algorithm is to be presented for sorting
n² elements on a mesh-connected processor
array in O(n) routing and comparison steps.
• This shows a speedup of O(log₂n) over the best
sorting algorithm, which takes O(n log₂n) steps
on a uniprocessor system.
• Assume an array processor with N = n² identical
PEs interconnected by a mesh network similar
to the Illiac-IV, except that the PEs at the
perimeter have two or three rather than four
neighbours.
• That is, there are no wraparound connections
in this simplified mesh network.
• Eliminating the wraparound connections
simplifies the array sorting algorithm.
• The time complexity of the array sorting
algorithm would be affected by at most a
factor of two if the wraparound connections
were included.
Two time measures are needed to estimate
the time complexity of the parallel sorting
algorithm.
1. Routing time, tR
2. Comparison time, tC
• Let tR be the routing time required to move
one item from a PE to one of its neighbours,
and tC be the comparison time required for
one comparison step.
• Concurrent data routing is allowed, and up to
N comparisons may be performed
simultaneously.
• This means that a comparison-interchange step
between two items in adjacent PEs can be
done in 2tR + tC time units (route left, compare
and route right).
• A mixture of horizontal and vertical
comparison interchanges requires at least
4tR + tC time units.
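As an illustrative aside (this linear-array sorter is not the mesh algorithm of this section, just the simplest sorter built from the comparison-interchange primitive), odd-even transposition sort performs n such steps, each costing 2tR + tC:

```python
def odd_even_transposition_sort(items):
    """Sort on a simulated linear array of PEs using only
    comparison-interchange steps between adjacent neighbours."""
    a = list(items)
    n = len(a)
    steps = 0
    for phase in range(n):
        start = phase % 2                    # alternate even/odd pairs
        for i in range(start, n - 1, 2):     # disjoint pairs run in parallel
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        steps += 1                           # one comparison-interchange step
    return a, steps
```

Each of the n phases is one parallel comparison-interchange step, so the total time on a linear array is n(2tR + tC).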
• The sorting problem depends on the indexing
scheme of the PEs.
• The PEs may be indexed by a bijection from
{1, 2, ..., n} × {1, 2, ..., n} to {0, 1, ..., N−1}, where N = n².
• The choice of a particular indexing scheme
depends upon how the sorted elements will
be used.
• The longest routing path on the mesh in a
sorting process is the transposition of two
elements initially loaded at opposite corner
PEs.
• This transposition needs at least 4(n−1) routing
steps.
Batcher's odd-even merge sort of two sorted
sequences on a set of linearly connected PEs
• The shuffle and unshuffle operations can each
be implemented with a sequence of
interchange operations (marked by the
double-arrows).
• Both the perfect shuffle and its inverse
(unshuffle) can be done in k − 1 interchanges,
or 2(k − 1) routing steps, on a linear array of 2k
PEs.
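A Python sketch of this interchange schedule (0-indexed positions; the triangular swap pattern below is an assumption consistent with the k − 1 step count quoted above, not a schedule taken from the text):

```python
def shuffle_by_interchanges(a):
    """Perfect shuffle of 2k items on a simulated linear array,
    using k - 1 parallel interchange steps of adjacent swaps."""
    a = list(a)
    k = len(a) // 2
    steps = 0
    for s in range(1, k):
        # the s swaps in this step touch disjoint PE pairs,
        # so a real array performs them simultaneously
        for j in range(s):
            p = k - s + 2 * j
            a[p], a[p + 1] = a[p + 1], a[p]
        steps += 1                 # one interchange = 2 routing steps
    return a, steps
```

For k = 4, the eight items (a1..a4, b1..b4) are interleaved into (a1, b1, a2, b2, a3, b3, a4, b4) in 3 interchange steps, i.e. 6 routing steps.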
Example: M(j,2) sorting algorithm
• Given two sorted columns of length j ≥ 2, the
M(j,2) algorithm consists of the following
steps:
The M(j,2) algorithm is illustrated for an
M(4,2) sort.
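Since the step list itself is not reproduced here, the following is a hedged Python sketch of the underlying Batcher odd-even merge of two sorted sequences (recursive formulation; equal power-of-two lengths are assumed for simplicity):

```python
def odd_even_merge(x, y):
    """Batcher's odd-even merge of two sorted sequences of equal
    power-of-two length."""
    if len(x) == 1:
        return [min(x[0], y[0]), max(x[0], y[0])]
    # merge the odd-indexed and even-indexed subsequences recursively
    odd = odd_even_merge(x[::2], y[::2])
    even = odd_even_merge(x[1::2], y[1::2])
    merged = [0] * (len(x) + len(y))
    merged[::2] = odd                 # shuffle the two partial results
    merged[1::2] = even
    # final sweep of comparison-interchanges between neighbours
    for i in range(1, len(merged) - 1, 2):
        if merged[i] > merged[i + 1]:
            merged[i], merged[i + 1] = merged[i + 1], merged[i]
    return merged
```

The recursion mirrors the structure sketched above: unshuffle into odd and even subsequences, merge each half, shuffle them back together, and finish with one round of comparison-interchanges.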
Connection Issues for SIMD
Processing
• SIMD array processors allow explicit expression
of parallelism in user programs.
• The compiler detects the parallelism and
generates object code suitable for execution in
the multiple processing elements and the
control unit.
• Program segments that cannot be converted
into parallel executable forms are executed in
the control unit.
• Program segments that can be converted into
parallel executable forms are sent to the PEs
and executed synchronously on data fetched
from parallel memory modules under the
control of the control unit.
• To enable synchronous manipulation in the
PEs, the data is permuted and arranged in
vector form.
• Thus, to run a program more efficiently on an
array processor, one must develop a
technique for vectorizing the program codes.
• The interconnection network plays a major
role in vectorization.
• Several connection issues in using SIMD
interconnection networks are:
1. Permutation and connectivity
2. Partitioning and reconfigurability
3. Reliability and bandwidth