Unit-7: Design Issues for Parallel Computers

Definition:
Processing of multiple tasks simultaneously on multiple processors is called parallel processing.
A parallel program consists of multiple active processes simultaneously solving a given problem.
A given task is divided into multiple subtasks using a divide-and-conquer technique, and each
subtask is processed on a different CPU. Programming a multiprocessor system using the
divide-and-conquer technique is called parallel programming.

Generally, issues in parallel computing are:

• Design of parallel computers
• Design of efficient parallel algorithms
• Parallel programming models
• Parallel computer languages
• Methods for evaluating parallel algorithms
• Parallel programming tools
• Portable parallel programs
Why parallel processing?

Parallel processing is used to reduce execution time and to handle workloads too large for a
single processor, by dividing the work among many processors operating simultaneously.

Control Structure of Parallel Computers (Architecture of Parallel Computers)


• Processing units in parallel computers either operate under the centralized control of a single
control unit or work independently.
• If there is a single control unit that dispatches the same instruction to various processors (that
work on different data), the model is referred to as single instruction stream, multiple data
stream (SIMD).
• If each processor has its own control unit, each processor can execute different instructions on
different data items. This model is called multiple instruction stream, multiple data stream
(MIMD).

Figure: A typical SIMD architecture (a) and a typical MIMD architecture (b)

SIMD Processors
• Some of the earliest parallel computers such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1
belonged to this class of machines.
• Variants of this concept have found use in co-processing units such as the MMX units in Intel
processors and DSP chips such as the Sharc.
• SIMD relies on the regular structure of computations (such as those in image processing).
• It is often necessary to selectively turn off operations on certain data items. For this reason,
most SIMD programming paradigms allow for an "activity mask", which determines whether a
processor participates in a computation or not (a minimal sketch follows).
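A minimal sketch of how an activity mask works, written in plain C as a model of what each SIMD lane would do in lock-step (the lane count, the mask, and the arrays a, b, c are all hypothetical):

    #include <stdio.h>

    #define LANES 8  /* number of SIMD processing elements (hypothetical) */

    int main(void) {
        int a[LANES]    = {1, 2, 3, 4, 5, 6, 7, 8};
        int b[LANES]    = {10, 20, 30, 40, 50, 60, 70, 80};
        int mask[LANES] = {1, 0, 1, 0, 1, 0, 1, 0};  /* activity mask: 1 = participate */
        int c[LANES];

        /* Every lane receives the same "add" instruction, but only lanes whose
           mask bit is set commit a result; masked-off lanes are left at 0. */
        for (int lane = 0; lane < LANES; lane++)
            c[lane] = mask[lane] ? a[lane] + b[lane] : 0;

        for (int lane = 0; lane < LANES; lane++)
            printf("lane %d: %d\n", lane, c[lane]);
        return 0;
    }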


MIMD Processors
• In contrast to SIMD processors, MIMD processors can execute different programs on different
processors.
• A variant of this, called single program, multiple data streams (SPMD), executes the same
program on different processors.
• It is easy to see that SPMD and MIMD are closely related in terms of programming flexibility and
underlying architectural support.
Examples of such platforms include current-generation Sun Ultra servers, SGI Origin servers,
multiprocessor PCs, workstation clusters, and the IBM SP.
SIMD-MIMD Comparison
• SIMD computers require less hardware than MIMD computers (single control unit).
• However, since SIMD processors are specially designed, they tend to be expensive and to have
long design cycles.
• Not all applications are naturally suited to SIMD processors.
In contrast, platforms supporting the SPMD paradigm can be built from inexpensive off-the-shelf
components with relatively little effort in a short amount of time.
Communication Model of Parallel computers
• There are two primary forms of data exchange between parallel tasks: accessing a shared data
space and exchanging messages.
• Platforms that provide a shared data space are called shared-address-space machines or
multiprocessors.
• Platforms that support messaging are also called message passing platforms or
multicomputers.
Shared-Address-Space Platforms
• Part (or all) of the memory is accessible to all processors.
• Processors interact by modifying data objects stored in this shared-address-space.
• If the time taken by a processor to access any memory word in the system (global or local) is
identical, the platform is classified as a uniform memory access (UMA) machine; otherwise it is a
non-uniform memory access (NUMA) machine.
NUMA and UMA Shared-Address-Space Platforms


Figure: Typical shared-address-space architectures:


(a) Uniform-memory-access shared-address-space computer;
(b) Uniform-memory-access shared-address-space computer with caches and memories;
(c) Non-uniform-memory-access shared-address-space computer with local memory only.
NUMA and UMA Shared-Address-Space Platforms comparison
• The distinction between NUMA and UMA platforms is important from the point of view of
algorithm design. NUMA machines require locality from underlying algorithms for performance.
• Programming these platforms is easier since reads and writes are implicitly visible to other
processors.
• However, read and write accesses to shared data must be coordinated (this will be discussed in
greater detail in the context of threads programming; a minimal sketch follows this list).
• Caches in such machines require coordinated access to multiple copies. This leads to the cache
coherence problem.
• A weaker model of these machines provides an address map, but not coordinated access. These
models are called non-cache-coherent shared-address-space machines.
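As a minimal sketch of such coordination, assuming a POSIX threads environment (the shared counter, thread count, and iteration count are illustrative):

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static long shared_counter = 0;   /* data object in the shared address space */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* coordinate the read-modify-write on shared data */
            shared_counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t threads[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&threads[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(threads[i], NULL);
        printf("counter = %ld\n", shared_counter);   /* expected: 400000 */
        return 0;
    }

Without the mutex, the concurrent increments would race and the final value would generally be wrong; the lock is exactly the coordination the bullet above refers to.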
Shared-Address-Space vs. Shared Memory Machines
• It is important to note the difference between the terms shared address space and shared
memory.
• We refer to the former as a programming abstraction and to the latter as a physical machine
attribute.
• It is possible to provide a shared address space using a physically distributed memory.

Message-Passing Platforms:
• These platforms comprise a set of processors, each with its own (exclusive) memory.


• Instances of such a view come naturally from clustered workstations and non-shared-address-
space multicomputers.
• These platforms are programmed using (variants of) send and receive primitives.
• Libraries such as MPI and PVM provide such primitives.
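A minimal sketch of the send/receive style using the standard MPI primitives MPI_Send and MPI_Recv; it is also an SPMD program, since every process runs the same code and selects its behaviour by rank. Compile with an MPI wrapper such as mpicc and run with at least two processes:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* SPMD: behaviour chosen by rank */

        if (rank == 0) {
            int value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to process 1 */
            printf("process 0 sent %d\n", value);
        } else if (rank == 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);                          /* receive from process 0 */
            printf("process 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }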

Message Passing vs. Shared Address Space Platforms


• Message passing requires little hardware support, other than a network.

• Shared address space platforms can easily emulate message passing. The reverse is more
difficult to do (in an efficient manner).

Performance and Scalability of Parallel Computer Systems


How do we measure the performance of a computer system?

• Many people believe that execution time is the only reliable metric for measuring computer
performance.
• Approach: run the user's application and measure the elapsed wall-clock time (a minimal timing
sketch follows this list).
• Remarks:
o This approach is sometimes difficult to apply, and it can permit misleading interpretations.
• Pitfalls of using execution time as a performance metric:
o Execution time alone does not give the user much of a clue to the true performance of the
parallel machine.
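A minimal sketch of measuring elapsed wall-clock time around a kernel on a POSIX system, using clock_gettime with CLOCK_MONOTONIC (the work loop is only a placeholder for the user's application):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);      /* wall-clock start */

        /* placeholder for the user's application kernel */
        volatile double sum = 0.0;
        for (long i = 0; i < 100000000L; i++)
            sum += (double)i;

        clock_gettime(CLOCK_MONOTONIC, &end);        /* wall-clock end */
        double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("elapsed wall-clock time: %.3f s (sum=%g)\n", elapsed, sum);
        return 0;
    }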

Types of performance requirements


Six types of performance requirements are posed by users:

1. Execution time and throughput
2. Processing speed
3. System throughput
4. Utilization
5. Cost effectiveness
6. Performance/cost ratio

Remarks :

These requirements could lead to quite different conclusions for the same application on the same
computer platform

Remarks :


• Higher utilization corresponds to higher Gflop/s per dollar, provided CPU-hours are charged at a
fixed rate.
• A low utilization always indicates a poor program or compiler.
• A good program could have a long execution time due to a large workload, or a low speed due to
a slow machine.
• Utilization factor varies from 5% to 38%. Generally the utilization drops as more nodes are used.
• Utilization values generated from the vendor’s benchmark programs are often highly optimized.

Performance Metrics of Parallel Systems


Speedup :

• Speedup (Sp) is defined as the ratio of the serial runtime of the best sequential algorithm for
solving a problem to the time taken by the parallel algorithm to solve the same problem on p
processors.
• The p processors used by the parallel algorithm are assumed to be identical to the one used by
the sequential algorithm.
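In symbols, with Ts the runtime of the best sequential algorithm and Tp the parallel runtime on p processors:

\[ S_p = \frac{T_s}{T_p} \]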

Cost :

• The cost of solving a problem on a parallel system is the product of the parallel runtime and the
number of processors used:

C = p * Tp

Efficiency :

• Ratio of speedup to the number of processors.


• Efficiency can also be expressed as the ratio of the execution time of the fastest known
sequential algorithm for solving a problem to the cost of solving the same problem on p
processors.
• The cost of solving a problem on a single processor is the execution time of the best known
sequential algorithm.
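In the same notation:

\[ E = \frac{S_p}{p} = \frac{T_s}{p\,T_p} \]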

Cost-Optimal :

• A parallel system is said to be cost-optimal if the cost of solving a problem on parallel computer
is proportional to the execution time of the fastest known sequential algorithm on a single
processor.
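Equivalently, in the notation above, cost-optimality means the cost grows in proportion to the best sequential runtime, i.e. the efficiency remains bounded by a constant:

\[ p\,T_p = \Theta(T_s) \quad \Longleftrightarrow \quad E = \Theta(1) \]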

Some Speedup metrics:

Three performance models based on three speedup metrics are commonly used.


• Amdahl's law -- fixed problem size (fixed-size speedup)
• Gustafson's law -- fixed-time speedup
• Sun-Ni's law -- memory-bounded speedup
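For reference, the first two laws can be stated compactly; here p is the number of processors and f the serial fraction (of the total work for Amdahl's law, of the parallel execution time for Gustafson's law). Sun-Ni's law additionally involves a memory-bounded scaling function and is not reproduced here.

\[ \text{Amdahl:}\quad S_p = \frac{1}{f + \frac{1-f}{p}} \qquad\qquad \text{Gustafson:}\quad S_p = f + p\,(1-f) \]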

Three approaches to scalability analysis are based on

• Maintaining a constant efficiency,

• A constant speed, and

• A constant utilization

Parallel processing software


Parallel processing software manages the execution of a program on parallel processing
hardware with the objectives of obtaining unlimited scalability (being able to handle an
increasing number of interactions at the same time) and reducing execution time. Applications
that benefit from parallel processing divide roughly into business data processing and
technical/scientific processing.

Business data processing applications are characterized by record processing, and the size of
the data typically creates significant I/O performance issues as well as the need for fast
computation. Parallel processing software assists business applications in two significant ways:

Frameworks - Dataflow frameworks provide the highest performance and simplest method for
expressing record-processing applications so that they are able to achieve high scalability and
total throughput. Dataflow frameworks underlie the internals of most relational database
management systems (RDBMSs) as well as being available for direct use in the construction of
data warehouse, business intelligence, and analytic CRM (customer relationship management)
applications. Frameworks hide most or all of the details of inter-process and inter-processor
communication from application developers, making it simpler to create these applications than
it would be using low-level message passing.

RDBMS - As the most common repositories for commercial record-oriented data, RDBMS
systems have evolved so that the Structured Query Language (SQL) used to access them is
executed in parallel. The nature of the SQL language lends itself to faster processing using
parallel techniques.

Technical and scientific applications tend to be "compute-bound" (they require much processor
computation) and have tended to be associated with the supercomputer. There are two primary
techniques used in the creation of most of these applications - message passing and
parallelizing compilers.

PCO reference –Tannebum, William stalling Compiled by Er. Amit Khan DEX-III/I
8

Message Passing - Application programs can be built using mechanisms for communication
between processors operating concurrently. This is the lowest-level mechanism available and
can lead to the highest possible performance, at the greatest implementation cost and
complexity. (Note that message passing for parallel computation should not be confused with
the term messaging, which is also used to describe transactional communication systems for
distributed client-to-server and inter-server business applications.)

Parallelizing Compilers - For technical and mathematical applications dominated by matrix
algebra, there are compilers that can create parallel execution from seemingly sequential
program source code. These compilers can decompose a program and insert the necessary
message-passing structures and other parallel constructs automatically.
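As an illustration, the sketch below shows the kind of loop such compilers target: a dense matrix-vector product whose outer iterations are independent. Whether a particular compiler actually parallelizes it depends on its analysis and options (for example an auto-parallelization flag or an explicit OpenMP-style directive); the function name and signature here are hypothetical.

    #include <stddef.h>

    /* Dense matrix-vector product y = A*x, with A stored row-major.
       The outer loop's iterations are independent, so a parallelizing
       compiler (or an explicit directive) can distribute rows across
       processors and insert the required synchronisation itself. */
    void matvec(size_t n, const double *A, const double *x, double *y) {
        for (size_t i = 0; i < n; i++) {          /* independent iterations */
            double sum = 0.0;
            for (size_t j = 0; j < n; j++)
                sum += A[i * n + j] * x[j];
            y[i] = sum;
        }
    }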


Remaining topics of unit-1

The Von Neumann Model


• The invention of stored-program computers has been ascribed to a mathematician, John
von Neumann, who was a contemporary of Mauchly and Eckert.
• Stored-program computers have become known as von Neumann architecture systems.

Today’s stored-program computers have the following characteristics:


• Three hardware systems:
o A central processing unit (CPU)
o A main memory system
o An I/O system
• The capacity to carry out sequential instruction processing.
• A single data path between the CPU and main memory. This single path is known as the von
Neumann bottleneck.


Working
• The control unit fetches the next instruction from memory using the program counter to
determine where the instruction is located.
• The instruction is decoded into a language that the ALU can understand.
• Any data operands required to execute the instruction are fetched from memory and
placed into registers within the CPU.
• The ALU executes the instruction and places results in registers or memory.
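The cycle above can be sketched as a toy simulator in C; the single memory holding both instructions and data mirrors the stored-program idea, but the instruction encoding, opcodes, and memory size are entirely made up for illustration.

    #include <stdio.h>

    /* Toy von Neumann machine: instructions and data share one memory.
       Hypothetical encoding: instruction = opcode*100 + operand address. */
    enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

    int main(void) {
        int memory[16] = {
            110,  /* 0: LOAD  mem[10] into ACC */
            211,  /* 1: ADD   mem[11] to  ACC  */
            312,  /* 2: STORE ACC into mem[12] */
            0,    /* 3: HALT                   */
            0, 0, 0, 0, 0, 0,
            5, 7, 0, 0, 0, 0    /* data at addresses 10..12 */
        };
        int pc = 0, acc = 0, running = 1;

        while (running) {
            int instr  = memory[pc++];     /* fetch: program counter selects the instruction */
            int opcode = instr / 100;      /* decode */
            int addr   = instr % 100;
            switch (opcode) {              /* execute */
                case OP_LOAD:  acc = memory[addr];   break;
                case OP_ADD:   acc += memory[addr];  break;
                case OP_STORE: memory[addr] = acc;   break;
                case OP_HALT:  running = 0;          break;
            }
        }
        printf("mem[12] = %d\n", memory[12]);  /* expected: 12 */
        return 0;
    }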

Processor configuration
Processor configuration refers to the way in which a computer's central processing unit or units
are organized. It also concerns the type and capabilities of the CPUs. In turn, this determines
how quickly a computer can carry out software instructions.
Processor or CPU arrangements:
1. Single processor
2. Multi-core CPUs
o Traditionally, a given CPU contained a single processor. This configuration allowed the
processor to execute one software instruction at a time. Multi-core processors contain two or
more processor cores, allowing them to process multiple instructions simultaneously.

• Number of CPUs


Some computers can have more than one CPU, which, in turn, may contain single or multiple
cores. In these configurations, the processors run in parallel, processing instructions at the same
time, similar to a CPU with multiple cores except that each CPU is an entirely separate
component.
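On most POSIX-style systems the number of processors currently online can be queried with sysconf, as sketched below (whether the value counts physical cores or hardware threads depends on the platform):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Number of processors currently online (cores or hardware threads,
           depending on how the platform reports them). */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        if (n < 1) {
            perror("sysconf");
            return 1;
        }
        printf("online processors: %ld\n", n);
        return 0;
    }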

• Capabilities
o Configuration also depends on the specifications of the CPU or CPUs installed in a system. One
such specification is clock speed, the number of clock cycles the processor completes per second
while executing software commands. Other important values are the amount of cache memory
and the thermal design power (TDP). Cache is high-speed memory installed on the CPU. TDP is
the amount of heat the cooling system must dissipate to prevent malfunction, according to CPU
World.


PCO reference: Tanenbaum, William Stallings. Compiled by Er. Amit Khan, DEX-III/I
