Advanced Computer Architecture: Section 1 Parallel Computer Models
Section 1
Parallel Computer Models
[Figure: architectural evolution of computers: sequential and lookahead scalar machines; memory-to-memory and register-to-register vector architectures; associative processor and processor array SIMD machines; multiprocessor and multicomputer MIMD systems]
Flynn’s Classification (1972)
Single instruction, single data stream (SISD)
– conventional sequential machines
Single instruction, multiple data streams (SIMD)
– vector computers with scalar and vector hardware
Multiple instructions, multiple data streams (MIMD)
– parallel computers
Multiple instructions, single data stream (MISD)
– systolic arrays
Among parallel machines, MIMD is most popular, followed
by SIMD, and finally MISD.
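The taxonomy above can be restated compactly as a lookup keyed by the number of instruction and data streams (a toy sketch; the dictionary name and labels are mine):

```python
# Flynn's taxonomy, keyed by (instruction streams, data streams).
flynn = {
    ("single", "single"):     "SISD -- conventional sequential machines",
    ("single", "multiple"):   "SIMD -- vector computers",
    ("multiple", "multiple"): "MIMD -- parallel computers",
    ("multiple", "single"):   "MISD -- systolic arrays",
}

print(flynn[("single", "multiple")])  # SIMD -- vector computers
```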
Parallel/Vector Computers
SIMD architecture
A single instruction is applied to a vector (one-
dimensional array) of operands.
Two families:
– Memory-to-memory: operands flow from memory to
vector pipelines and back to memory
– Register-to-register: vector registers used to interface
between memory and functional pipelines
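The defining property, a single instruction applied elementwise to a vector of operands, can be sketched in software (the function and register names below are illustrative, not from any real machine):

```python
# One SIMD "instruction": elementwise add over two vector registers.
# Register-to-register style: both operands already sit in vector registers.
def vector_add(vreg_a, vreg_b):
    """Apply one add operation across all elements of two vectors."""
    return [a + b for a, b in zip(vreg_a, vreg_b)]

v1 = [1.0, 2.0, 3.0, 4.0]
v2 = [10.0, 20.0, 30.0, 40.0]
v3 = vector_add(v1, v2)  # [11.0, 22.0, 33.0, 44.0]
```

In a memory-to-memory machine the operands would instead stream from memory through the pipeline and back, with no vector registers in between.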
SIMD Computers
Performance depends on
– hardware technology
– architectural features
– efficient resource management
– algorithm design
– data structures
– language efficiency
– programmer skill
– compiler technology
Performance Indicators
Turnaround time depends on:
– disk and memory accesses
– input and output
– compilation time
– operating system overhead
– CPU time
Since I/O and system overhead frequently overlap with processing
by other programs, it is fair to consider only the CPU time used
by a program, and the user CPU time is the most important factor.
Clock Rate and CPI
The CPU is driven by a clock with a constant cycle time τ
(usually measured in nanoseconds).
The inverse of the cycle time is the clock rate (f = 1/τ,
usually measured in megahertz).
The size of a program is determined by its instruction
count, Ic, the number of machine instructions to be
executed by the program.
Different machine instructions require different numbers
of clock cycles to execute. CPI (cycles per instruction) is
thus an important parameter.
Average CPI
The CPU time needed to execute a program is T = Ic × CPI / f.
The CPU throughput is Wp = f / (Ic × CPI), the number of
programs executed per unit time.
In a multiprogrammed system, the system throughput is often less
than the CPU throughput.
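The CPU time and throughput relations, T = Ic × CPI / f and Wp = f / (Ic × CPI), can be written directly as code; a minimal sketch (function and variable names are my own):

```python
def cpu_time(ic, cpi, f):
    """CPU time in seconds: T = Ic * CPI / f, with f in Hz."""
    return ic * cpi / f

def cpu_throughput(ic, cpi, f):
    """CPU throughput in programs per second: Wp = f / (Ic * CPI)."""
    return f / (ic * cpi)

# A program of 1 million instructions at average CPI 2 on a 100 MHz clock:
t = cpu_time(1e6, 2.0, 100e6)         # 0.02 seconds
wp = cpu_throughput(1e6, 2.0, 100e6)  # 50 programs per second
```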
Example 1. VAX 11/780 and IBM RS/6000

Machine        Clock     Performance   CPU Time
VAX 11/780     5 MHz     1 MIPS        12x seconds
IBM RS/6000    25 MHz    18 MIPS       x seconds

The instruction count on the RS/6000 is 1.5 times that
of the code on the VAX.
Average CPI on the VAX is assumed to be 5.
Average CPI on the RS/6000 is assumed to be 1.39.
VAX has typical CISC architecture.
RS/6000 has typical RISC architecture.
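The figures in Example 1 can be checked with T = Ic × CPI / f and MIPS rate = f / (CPI × 10^6); a quick sketch (variable names are mine):

```python
# Verify the VAX 11/780 vs. IBM RS/6000 comparison.
f_vax, cpi_vax = 5e6, 5.0     # 5 MHz clock, average CPI 5
f_rs,  cpi_rs  = 25e6, 1.39   # 25 MHz clock, average CPI 1.39
ic_ratio = 1.5                # RS/6000 executes 1.5x the instructions

mips_vax = f_vax / (cpi_vax * 1e6)  # -> 1 MIPS
mips_rs  = f_rs / (cpi_rs * 1e6)    # -> ~18 MIPS

# Ratio of CPU times for the same program (VAX instruction count taken as 1):
t_vax = 1.0 * cpi_vax / f_vax
t_rs  = ic_ratio * cpi_rs / f_rs
ratio = t_vax / t_rs                # -> ~12, matching "12x" vs "x" seconds
```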
Programming Environments
Programmability depends on the programming environment
provided to the users.
Conventional computers are used in a sequential
programming environment with tools developed for a
uniprocessor computer.
Parallel computers need parallel tools that allow specification
or easy detection of parallelism, and operating systems that
can perform parallel scheduling of concurrent events, shared
memory allocation, and sharing of peripheral and communication
links.
Implicit Parallelism
Use a conventional language (like C, Fortran, Lisp, or Pascal)
to write the program.
Use a parallelizing compiler to translate the source code into
parallel code.
The compiler must detect parallelism and assign target
machine resources.
Success relies heavily on the quality of the compiler.
Kuck (U. of Illinois) and Kennedy (Rice U.) used this approach.
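As an illustration of what a parallelizing compiler looks for, here is a loop whose iterations are independent and therefore safe to distribute across processors (a hypothetical sketch in Python; such compilers typically target Fortran or C):

```python
# Each iteration writes a distinct output element and reads only the
# inputs, so there are no loop-carried dependences: a parallelizing
# compiler may legally run the iterations concurrently.
def saxpy(a, x, y):
    """Compute z[i] = a * x[i] + y[i] for all i."""
    return [a * xi + yi for xi, yi in zip(x, y)]

z = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])  # [12.0, 24.0, 36.0]
```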
Explicit Parallelism
[Figures: a shared-memory multiprocessor, with processors P1 ... Pn connected through a system interconnect (bus, crossbar, or multistage network); and a distributed-memory model, with local memories LM1 ... LMn attached to processors P1 ... Pn through an interconnection network]
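With explicit parallelism, the programmer states the concurrency directly rather than relying on the compiler to discover it; a minimal sketch using Python threads (all names illustrative):

```python
# Explicitly parallel reduction: the programmer partitions the data
# and declares the concurrent tasks; nothing is inferred by a compiler.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """One task: sum a disjoint slice of the data."""
    return sum(chunk)

data = list(range(1000))
chunks = [data[i:i + 250] for i in range(0, 1000, 250)]

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
# total == sum(range(1000)) == 499500
```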
Hierarchical Cluster Model
[Figure: hierarchical cluster model: clusters of processors (P) with caches (C) and cluster shared memory (CSM), each cluster joined by a cluster interconnection network (CIN), with global shared memory (GSM) modules at the top level]
The COMA Model
[Figure: the COMA model: each processor (P) has a cache (C) with an associated directory (D); the caches collectively form the distributed shared memory]
Other Models
[Figure: message-passing multicomputer: nodes, each pairing a processor (P) with local memory (M), connected by a message-passing interconnection network]
Multicomputer Generations
Each multicomputer uses routers and channels in its
interconnection network; heterogeneous systems may involve
mixed node types but use uniform data representation and
communication protocols.
First generation: hypercube architecture, software-controlled
message switching, processor boards.
Second generation: mesh-connected architecture, hardware
message switching, software for medium-grain distributed
computing.
Third generation: fine-grained distributed computing, with each
VLSI chip containing the processor and communication resources.
Multivector and SIMD Computers
[Figure: processors (P) connected to memory modules (M) through an interconnection network]