Unit-7 Design Issues For Parallel Computers
Definition:
Processing of multiple tasks simultaneously on multiple processors is called parallel processing.
A parallel program consists of multiple active processes simultaneously solving a given problem. A given task is divided into multiple subtasks using the divide-and-conquer technique, and each subtask is processed on a different CPU. Programming a multiprocessor system using the divide-and-conquer technique is called parallel programming.
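As an illustration of the divide-and-conquer idea, the sketch below (a minimal example, not from the source; names such as partial_sum are invented) splits a summation into subtasks and hands each chunk to a separate worker process using Python's standard multiprocessing module.

from multiprocessing import Pool

def partial_sum(chunk):
    # Each subtask: sum one slice of the data in its own worker process.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # Divide: split the task into equal-sized subtasks.
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    # Conquer: process every subtask on a different worker, then combine.
    with Pool(n_workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # same result as sum(data), computed in parallel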
SIMD Processors
• Some of the earliest parallel computers such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1
belonged to this class of machines.
• Variants of this concept have found use in co-processing units such as the MMX units in Intel
processors and DSP chips such as the Sharc.
• SIMD relies on the regular structure of computations (such as those in image processing).
• It is often necessary to selectively turn off operations on certain data items. For this reason,
most SIMD programming paradigms allow for an "activity mask", which determines whether a
processor should participate in a computation or not.
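The activity-mask idea can be sketched in a few lines, assuming NumPy is available (an illustrative example, not from the source): each array slot plays the role of one SIMD processing element, and a boolean mask decides which slots take part in the operation.

import numpy as np

data = np.array([3, -1, 7, -5, 2, -8])
# Activity mask: True = processing element participates, False = idle.
mask = data > 0
result = data.copy()
# The same operation (doubling) is applied only where the mask is on,
# mimicking SIMD lanes that are selectively switched off.
result[mask] = data[mask] * 2
print(result)   # [ 6 -1 14 -5  4 -8]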
MIMD Processors
• In contrast to SIMD processors, MIMD processors can execute different programs on different
processors.
• A variant of this, called single program, multiple data (SPMD), executes the same program on different processors, each operating on its own data.
• It is easy to see that SPMD and MIMD are closely related in terms of programming flexibility and
underlying architectural support.
• Examples of such platforms include current-generation Sun Ultra Servers, SGI Origin Servers, multiprocessor PCs, workstation clusters, and the IBM SP.
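A minimal SPMD sketch, assuming the mpi4py library and an MPI runtime are available (illustrative only, not from the source): every processor runs the same program, and the rank assigned at runtime decides which data it works on.

# Run with e.g.:  mpiexec -n 4 python spmd_example.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # identity of this processor
size = comm.Get_size()      # total number of processors

# Same program everywhere, different data per rank (SPMD).
my_chunk = range(rank * 100, (rank + 1) * 100)
local_sum = sum(my_chunk)

# Combine the partial results on rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("global sum:", total)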
SIMD-MIMD Comparison
• SIMD computers require less hardware than MIMD computers (single control unit).
• However, since SIMD processors are specially designed, they tend to be expensive and have long
design cycles.
• Not all applications are naturally suited to SIMD processors.
• In contrast, platforms supporting the SPMD paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time.
Communication Model of Parallel computers
• There are two primary forms of data exchange between parallel tasks: accessing a shared data space and exchanging messages.
• Platforms that provide a shared data space are called shared-address-space machines or
multiprocessors.
• Platforms that support messaging are also called message passing platforms or
multicomputers.
Shared-Address-Space Platforms
• Part (or all) of the memory is accessible to all processors.
• Processors interact by modifying data objects stored in this shared-address-space.
• If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise, it is a non-uniform memory access (NUMA) machine.
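A small sketch of shared-address-space interaction using Python threads (illustrative; the variable names are invented): every thread sees and modifies the same data object, so a lock is used to keep the updates consistent.

import threading

counter = 0                     # data object in the shared address space
lock = threading.Lock()

def worker(n_updates):
    global counter
    for _ in range(n_updates):
        with lock:              # threads coordinate access to shared data
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                  # 40000: all threads updated the same variable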
NUMA and UMA Shared-Address-Space Platforms
Message-Passing Platforms:
• These platforms comprise a set of processors, each with its own (exclusive) memory.
• Instances of such a view come naturally from clustered workstations and non-shared-address-space multicomputers.
• These platforms are programmed using (variants of) send and receive primitives.
• Libraries such as MPI and PVM provide such primitives.
• Shared address space platforms can easily emulate message passing. The reverse is more
difficult to do (in an efficient manner).
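A sketch of the send and receive primitives, again assuming mpi4py is installed (illustrative, not from the source): rank 0 sends a message to rank 1, which receives it; no shared memory is involved.

# Run with e.g.:  mpiexec -n 2 python sendrecv_example.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    msg = {"task": "multiply", "values": [2, 3, 4]}
    comm.send(msg, dest=1, tag=11)        # explicit message passing
elif rank == 1:
    msg = comm.recv(source=0, tag=11)     # data arrives only via the message
    print("rank 1 received:", msg)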
Performance Metrics
• Many people believe that execution time is the only reliable metric to measure computer performance.
Approach :
• Run the user's application and measure the elapsed (wall-clock) time.
Remarks :
• This approach is sometimes difficult to apply, and it could permit misleading interpretations.
• Pitfalls of using execution time as a performance metric:
o Execution time alone does not give the user much clue to the true performance of the parallel machine.
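A small sketch of the wall-clock-time approach (the workload below is only a placeholder): the elapsed time is measured around one complete run of the application.

import time

def application():
    # Placeholder workload standing in for the user's real application.
    return sum(i * i for i in range(5_000_000))

start = time.perf_counter()     # wall-clock timer
application()
elapsed = time.perf_counter() - start
print(f"elapsed wall-clock time: {elapsed:.3f} s")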
Remarks :
These requirements could lead to quite different conclusions for the same application on the same computer platform.
Remarks :
• Higher utilization corresponds to higher Gflop/s per dollar, provided CPU-hours are charged at a fixed rate.
• A low utilization always indicates a poor program or compiler.
• A good program could have a long execution time due to a large workload, or a low speed due to a slow machine.
• Utilization factor varies from 5% to 38%. Generally the utilization drops as more nodes are used.
• Utilization values generated from the vendor’s benchmark programs are often highly optimized.
• Speedup (Sp) is defined as the ratio of the serial runtime of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on P processors.
• The P processors used by the parallel algorithm are assumed to be identical to the one used by the sequential algorithm.
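A worked example of the speedup definition, with invented numbers:

# Serial runtime of the best sequential algorithm and parallel runtime on P processors.
T_serial = 100.0        # seconds
T_parallel = 15.0       # seconds on P = 8 processors

speedup = T_serial / T_parallel      # Sp = Ts / Tp
print(f"Sp = {speedup:.2f}")         # Sp = 6.67 (out of an ideal 8)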
Cost :
• The cost of solving a problem on a parallel system is the product of the parallel runtime and the number of processors used: C = P * Tp.
Efficiency :
• Efficiency is the ratio of speedup to the number of processors used: E = Sp / P.
Cost-Optimal :
• A parallel system is said to be cost-optimal if the cost of solving a problem on the parallel computer is proportional to the execution time of the fastest known sequential algorithm on a single processor.
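Continuing with the same invented numbers, a short sketch of cost, efficiency, and the cost-optimality idea:

P = 8
T_serial = 100.0
T_parallel = 15.0

cost = P * T_parallel                # C = P * Tp
speedup = T_serial / T_parallel      # Sp = Ts / Tp
efficiency = speedup / P             # E = Sp / P

print(f"cost = {cost} processor-seconds")
print(f"efficiency = {efficiency:.2f}")
# The system is cost-optimal when the cost grows in proportion to T_serial,
# i.e. when the efficiency stays bounded by a constant as P grows.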
Three performance models based on three speedup metrics are commonly used.
Business data processing applications are characterized by record processing, and the size of
the data typically creates significant I/O performance issues as well as the need for fast
computation. Parallel processing software assists business applications in two significant ways:
Frameworks - Dataflow frameworks provide the highest performance and simplest method for
expressing record-processing applications so that they are able to achieve high scalability and
total throughput. Dataflow frameworks underlie the internals of most relational database management systems (RDBMSs) and are also available for direct use in the construction of data warehouse, business intelligence, and analytic CRM (customer relationship management) applications. Frameworks hide most or all of the details of inter-process and inter-processor communication from application developers, making it simpler to create these applications than it would be using low-level message passing.
RDBMS - As the most common repositories for commercial record-oriented data, RDBMS systems have evolved so that the Structured Query Language (SQL) used to access them is executed in parallel. The nature of the SQL language lends itself to faster processing using parallel techniques.
Technical and scientific applications tend to be "compute-bound" (they require much processor computation) and have tended to be associated with the supercomputer. There are two primary techniques used in the creation of most of these applications: message passing and parallelizing compilers.
Message Passing - Application programs can be built using mechanisms for communication between one processor and others operating concurrently with it. This is the lowest-level mechanism available and can lead to the highest possible performance, but at the greatest implementation cost
and complexity. (Note that message passing for parallel computation should not be confused
with the term messaging which is also used to describe transactional communication systems
for distributed client-to-server and inter-server business applications.)
Working
1. The control unit fetches the next instruction from memory, using the program counter to determine where the instruction is located.
2. The instruction is decoded into a form that the ALU can understand.
3. Any data operands required to execute the instruction are fetched from memory and placed into registers within the CPU.
4. The ALU executes the instruction and places the results in registers or memory.
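A toy sketch of this fetch-decode-execute cycle (entirely illustrative; the instruction format is invented): a program counter selects the next instruction, which is decoded and then executed on register values.

# Toy instruction memory: (operation, destination register, operands).
memory = [
    ("LOAD", "r0", 5),             # r0 <- 5
    ("LOAD", "r1", 7),             # r1 <- 7
    ("ADD",  "r2", ("r0", "r1")),  # r2 <- r0 + r1
]
registers = {}
pc = 0                             # program counter

while pc < len(memory):
    instruction = memory[pc]            # fetch: program counter locates the instruction
    op, dest, operands = instruction    # decode: split into fields the "ALU" understands
    if op == "LOAD":
        registers[dest] = operands
    elif op == "ADD":                   # execute: operate on register operands
        a, b = operands
        registers[dest] = registers[a] + registers[b]
    pc += 1                             # move to the next instruction

print(registers)                   # {'r0': 5, 'r1': 7, 'r2': 12}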
Processor configuration
Processor configuration refers to the way in which a computer's central processing unit or units
are organized. It also concerns the type and capabilities of the CPUs. In turn, this determines
how quickly a computer can carry out software instructions.
Processor or CPU arrangements:
1. Single processor
2. Multi-core CPUs
o Traditionally, a given CPU contained a single processor. This configuration allowed the
processor to execute one software instruction at a time. Multi-core processors contain two or
more processors, allowing them to process multiple instructions simultaneously.
o Number of CPUs
Some computers can have more than one CPU, which, in turn, may contain single or multiple
cores. In these configurations, the processors run in parallel, processing instructions at the same
time, similar to a CPU with multiple cores except that each CPU is an entirely separate
component.
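A short sketch showing how a program can inspect the processor configuration it runs on, using only standard-library calls (output depends on the machine):

import os

# Number of logical CPUs (cores, possibly counting hardware threads) visible to the OS.
print("logical CPUs:", os.cpu_count())

# On Linux, the set of CPUs this process may actually run on can be smaller
# (e.g. inside a container); the call is not available on every platform.
try:
    print("usable CPUs:", len(os.sched_getaffinity(0)))
except AttributeError:
    pass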
Capabilities
o Configuration also depends on the specifications of the CPU or CPUs installed on a system. One
such specification is clock speed, the number of cycles per second at which a processor executes instructions. Other important values are the amount of cache memory and the
thermal design power (TDP). Cache is high-speed memory installed on the CPU. TDP is the
amount of heat the cooling system must dissipate to prevent malfunction, according to CPU
World.
PCO reference: Tanenbaum; William Stallings. Compiled by Er. Amit Khan, DEX-III/I.