Unit-7 Design Issues For Parallel Computers
Definition:
Processing of multiple tasks simultaneously on multiple processors is called parallel processing.
A parallel program consists of multiple active processes simultaneously solving a given problem. A given task is divided into multiple subtasks using the divide-and-conquer technique, and each subtask is processed on a different CPU. Programming a multiprocessor system using the divide-and-conquer technique is called parallel programming.
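As an illustration of the divide-and-conquer idea, the sketch below (a minimal example, not from the source; names such as partial_sum are invented) splits a summation into subtasks and hands each chunk to a separate worker process using Python's standard multiprocessing module.

from multiprocessing import Pool

def partial_sum(chunk):
    # Each subtask: sum one slice of the data in its own worker process.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # Divide: split the task into equal-sized subtasks.
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    # Conquer: process every subtask on a different worker, then combine.
    with Pool(n_workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # same result as sum(data), computed in parallel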
SIMD Processors
• Some of the earliest parallel computers such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1
belonged to this class of machines.
• Variants of this concept have found use in co-processing units such as the MMX units in Intel
processors and DSP chips such as the Sharc.
• SIMD relies on the regular structure of computations (such as those in image processing).
• It is often necessary to selectively turn off operations on certain data items. For this reason,
most SIMD programming paradigms allow for an "activity mask", which determines whether a
processor should participate in a computation or not.
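The activity-mask idea can be sketched in a few lines, assuming NumPy is available (an illustrative example, not from the source): each array slot plays the role of one SIMD processing element, and a boolean mask decides which slots take part in the operation.

import numpy as np

data = np.array([3, -1, 7, -5, 2, -8])
# Activity mask: True = processing element participates, False = idle.
mask = data > 0
result = data.copy()
# The same operation (doubling) is applied only where the mask is on,
# mimicking SIMD lanes that are selectively switched off.
result[mask] = data[mask] * 2
print(result)   # [ 6 -1 14 -5  4 -8]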
MIMD Processors
• In contrast to SIMD processors, MIMD processors can execute different programs on different
processors.
• A variant of this, called single program, multiple data (SPMD), executes the same program on different processors, each operating on its own data.
• It is easy to see that SPMD and MIMD are closely related in terms of programming flexibility and
underlying architectural support.
• Examples of such platforms include current-generation Sun Ultra Servers, SGI Origin Servers, multiprocessor PCs, workstation clusters, and the IBM SP.
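A minimal SPMD sketch, assuming the mpi4py library and an MPI runtime are available (illustrative only, not from the source): every processor runs the same program, and the rank assigned at runtime decides which data it works on.

# Run with e.g.:  mpiexec -n 4 python spmd_example.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # identity of this processor
size = comm.Get_size()      # total number of processors

# Same program everywhere, different data per rank (SPMD).
my_chunk = range(rank * 100, (rank + 1) * 100)
local_sum = sum(my_chunk)

# Combine the partial results on rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("global sum:", total)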
SIMD-MIMD Comparison
• SIMD computers require less hardware than MIMD computers (single control unit).
• However, since SIMD processors are specially designed, they tend to be expensive and have long
design cycles.
• Not all applications are naturally suited to SIMD processors.
• In contrast, platforms supporting the SPMD paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time.
Communication Model of Parallel computers
• There are two primary forms of data exchange between parallel tasks: accessing a shared data space and exchanging messages.
• Platforms that provide a shared data space are called shared-address-space machines or
multiprocessors.
• Platforms that support messaging are also called message passing platforms or
multicomputers.
Shared-Address-Space Platforms
• Part (or all) of the memory is accessible to all processors.
• Processors interact by modifying data objects stored in this shared-address-space.
• If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise, it is a non-uniform memory access (NUMA) machine.
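A small sketch of shared-address-space interaction using Python threads (illustrative; the variable names are invented): every thread sees and modifies the same data object, so a lock is used to keep the updates consistent.

import threading

counter = 0                     # data object in the shared address space
lock = threading.Lock()

def worker(n_updates):
    global counter
    for _ in range(n_updates):
        with lock:              # threads coordinate access to shared data
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                  # 40000: all threads updated the same variable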
NUMA and UMA Shared-Address-Space Platforms
Message-Passing Platforms:
• These platforms comprise a set of processors, each with its own (exclusive) memory.
• Instances of such a view come naturally from clustered workstations and non-shared-address-space multicomputers.
• These platforms are programmed using (variants of) send and receive primitives.
• Libraries such as MPI and PVM provide such primitives.
• Shared address space platforms can easily emulate message passing. The reverse is more
difficult to do (in an efficient manner).
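A sketch of the send and receive primitives, again assuming mpi4py is installed (illustrative, not from the source): rank 0 sends a message to rank 1, which receives it; no shared memory is involved.

# Run with e.g.:  mpiexec -n 2 python sendrecv_example.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    msg = {"task": "multiply", "values": [2, 3, 4]}
    comm.send(msg, dest=1, tag=11)        # explicit message passing
elif rank == 1:
    msg = comm.recv(source=0, tag=11)     # data arrives only via the message
    print("rank 1 received:", msg)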
Performance Metrics
• Many people believe that execution time is the only reliable metric to measure computer performance.
Approach :
• Run the user's application and measure the elapsed (wall-clock) time.
Remarks :
• This approach is sometimes difficult to apply, and it could permit misleading interpretations.
• Pitfalls of using execution time as a performance metric:
o Execution time alone does not give the user much clue to the true performance of the parallel machine.
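A small sketch of the wall-clock-time approach (the workload below is only a placeholder): the elapsed time is measured around one complete run of the application.

import time

def application():
    # Placeholder workload standing in for the user's real application.
    return sum(i * i for i in range(5_000_000))

start = time.perf_counter()     # wall-clock timer
application()
elapsed = time.perf_counter() - start
print(f"elapsed wall-clock time: {elapsed:.3f} s")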
Remarks :
These requirements could lead to quite different conclusions for the same application on the same computer platform.
Remarks :
• Higher utilization corresponds to higher Gflop/s per dollar, provided CPU-hours are charged at a fixed rate.
• A low utilization always indicates a poor program or compiler.
• A good program could have a long execution time due to a large workload, or a low speed due to a slow machine.
• Utilization factor varies from 5% to 38%. Generally the utilization drops as more nodes are used.
• Utilization values generated from the vendor’s benchmark programs are often highly optimized.
• Speedup (Sp) is defined as the ratio of the serial runtime of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on P processors.
• The P processors used by the parallel algorithm are assumed to be identical to the one used by the sequential algorithm.
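A worked example of the speedup definition, with invented numbers:

# Serial runtime of the best sequential algorithm and parallel runtime on P processors.
T_serial = 100.0        # seconds
T_parallel = 15.0       # seconds on P = 8 processors

speedup = T_serial / T_parallel      # Sp = Ts / Tp
print(f"Sp = {speedup:.2f}")         # Sp = 6.67 (out of an ideal 8)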
Cost :
• The cost of solving a problem on a parallel system is the product of the parallel runtime and the number of processors used: C = P * Tp.
Efficiency :
• Efficiency is the ratio of speedup to the number of processors used: E = Sp / P.
Cost-Optimal :
• A parallel system is said to be cost-optimal if the cost of solving a problem on the parallel computer is proportional to the execution time of the fastest known sequential algorithm on a single processor.
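Continuing with the same invented numbers, a short sketch of cost, efficiency, and the cost-optimality idea:

P = 8
T_serial = 100.0
T_parallel = 15.0

cost = P * T_parallel                # C = P * Tp
speedup = T_serial / T_parallel      # Sp = Ts / Tp
efficiency = speedup / P             # E = Sp / P

print(f"cost = {cost} processor-seconds")
print(f"efficiency = {efficiency:.2f}")
# The system is cost-optimal when the cost grows in proportion to T_serial,
# i.e. when the efficiency stays bounded by a constant as P grows.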
Three performance models based on three speedup metrics are commonly used.
Business data processing applications are characterized by record processing, and the size of
the data typically creates significant I/O performance issues as well as the need for fast
computation. Parallel processing software assists business applications in two significant ways:
Frameworks - Dataflow frameworks provide the highest performance and simplest method for
expressing record-processing applications so that they are able to achieve high scalability and
total throughput. Dataflow frameworks underlie the internals of most relational database management systems (RDBMSs) and are also available for direct use in the construction of data warehouse, business intelligence, and analytic CRM (customer relationship management) applications. Frameworks hide most or all of the details of inter-process and inter-processor communication from application developers, making it simpler to create these applications than it would be using low-level message passing.
RDBMS - As the most common repositories for commercial record-oriented data, RDBMS systems have evolved so that the Structured Query Language (SQL) used to access them is executed in parallel. The nature of the SQL language lends itself to faster processing using parallel techniques.
Technical and scientific applications tend to be "compute-bound" (they require much processor computation) and have tended to be associated with the supercomputer. There are two primary techniques used in the creation of most of these applications: message passing and parallelizing compilers.
Message Passing - Application programs can be built using mechanisms for communication between one processor and others operating concurrently with it. This is the lowest-level mechanism available and can lead to the highest possible performance, but at the greatest implementation cost
and complexity. (Note that message passing for parallel computation should not be confused
with the term messaging which is also used to describe transactional communication systems
for distributed client-to-server and inter-server business applications.)
Working
1. The control unit fetches the next instruction from memory, using the program counter to determine where the instruction is located.
2. The instruction is decoded into a form that the ALU can understand.
3. Any data operands required to execute the instruction are fetched from memory and placed into registers within the CPU.
4. The ALU executes the instruction and places the results in registers or memory.
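A toy sketch of this fetch-decode-execute cycle (entirely illustrative; the instruction format is invented): a program counter selects the next instruction, which is decoded and then executed on register values.

# Toy instruction memory: (operation, destination register, operands).
memory = [
    ("LOAD", "r0", 5),             # r0 <- 5
    ("LOAD", "r1", 7),             # r1 <- 7
    ("ADD",  "r2", ("r0", "r1")),  # r2 <- r0 + r1
]
registers = {}
pc = 0                             # program counter

while pc < len(memory):
    instruction = memory[pc]            # fetch: program counter locates the instruction
    op, dest, operands = instruction    # decode: split into fields the "ALU" understands
    if op == "LOAD":
        registers[dest] = operands
    elif op == "ADD":                   # execute: operate on register operands
        a, b = operands
        registers[dest] = registers[a] + registers[b]
    pc += 1                             # move to the next instruction

print(registers)                   # {'r0': 5, 'r1': 7, 'r2': 12}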
Processor configuration
Processor configuration refers to the way in which a computer's central processing unit or units
are organized. It also concerns the type and capabilities of the CPUs. In turn, this determines
how quickly a computer can carry out software instructions.
Processor or CPU arrangements:
1. Single processor
2. Multi-core CPUs
o Traditionally, a given CPU contained a single processor. This configuration allowed the
processor to execute one software instruction at a time. Multi-core processors contain two or
more processors, allowing them to process multiple instructions simultaneously.
o Number of CPUs
Some computers can have more than one CPU, which, in turn, may contain single or multiple
cores. In these configurations, the processors run in parallel, processing instructions at the same
time, similar to a CPU with multiple cores except that each CPU is an entirely separate
component.
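A short sketch showing how a program can inspect the processor configuration it runs on, using only standard-library calls (output depends on the machine):

import os

# Number of logical CPUs (cores, possibly counting hardware threads) visible to the OS.
print("logical CPUs:", os.cpu_count())

# On Linux, the set of CPUs this process may actually run on can be smaller
# (e.g. inside a container); the call is not available on every platform.
try:
    print("usable CPUs:", len(os.sched_getaffinity(0)))
except AttributeError:
    pass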
Capabilities
o Configuration also depends on the specifications of the CPU or CPUs installed on a system. One
such specification is clock speed, the number of cycles per second at which a processor executes instructions. Other important values are the amount of cache memory and the
thermal design power (TDP). Cache is high-speed memory installed on the CPU. TDP is the
amount of heat the cooling system must dissipate to prevent malfunction, according to CPU
World.
PCO reference: Tanenbaum; William Stallings. Compiled by Er. Amit Khan, DEX-III/I.