Chapter 9
Multiprocessor
• Multiprocessors
• Parallel processing
• Pipelining
• Vector processing
Introduction
The CPU reads numerical values (instructions and/or data) from
locations in main memory and writes values back to them.
The CPU executes program instructions one at a time, in
sequence, until directed otherwise.
When the computer is turned on, the internal hardware gives
the CPU the address of the first program instruction.
The three major parts of the CPU:
1) Register set
2) ALU (arithmetic/logic unit)
3) Control unit
Cont’d…
Control unit: coordinates the transfer of data and
instructions between main memory and the registers in
the CPU.
Registers: provide small, fast storage inside the CPU, e.g.:
1. Instruction register: holds the instruction currently
being executed
2. Program counter: holds the address of the next
instruction
3. General register(s): temporary storage for values as
needed
Arithmetic/logic unit (ALU): performs calculations
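The way the program counter, instruction register, and ALU cooperate can be sketched with a toy fetch-execute loop. This is only an illustration: the four-opcode instruction set (LOAD, ADD, JUMP, HALT), the single accumulator register, and the function name `run` are assumptions made for the sketch, not part of any real CPU.

```python
# Toy machine: memory holds (opcode, operand) pairs.
# The instruction set below is invented for this sketch.
def run(memory, start_address=0):
    pc = start_address      # program counter: address of the next instruction
    ir = None               # instruction register: instruction being executed
    acc = 0                 # one general register (an accumulator)
    trace = []              # addresses in the order they were executed

    while True:
        ir = memory[pc]     # fetch: copy memory[pc] into the IR
        trace.append(pc)
        pc += 1             # by default, continue with the next address
        opcode, operand = ir
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":       # the ALU performs the calculation
            acc = acc + operand
        elif opcode == "JUMP":      # "directed otherwise": overwrite the PC
            pc = operand
        elif opcode == "HALT":
            return acc, trace
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [
    ("LOAD", 5),    # address 0
    ("JUMP", 3),    # address 1: skip address 2
    ("ADD", 100),   # address 2: never executed
    ("ADD", 2),     # address 3
    ("HALT", 0),    # address 4
]
result, executed = run(program)
print(result, executed)
```

The trace shows sequential execution being interrupted only where the JUMP instruction changes the program counter.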
Cont’d…
The ALU executes all arithmetic and logical operations.
Arithmetic operations
Addition, subtraction, multiplication, division
Logical operations
Compare numbers, letters, or special characters
Tests for one of three conditions
o Equal-to condition
o Less-than condition
o Greater-than condition
Multiprocessor
A multiprocessor is a computer system in which two or more
CPUs share full access to a common RAM.
The term “processor” in multiprocessor can mean either a central
processing unit (CPU) or an input-output processor (IOP).
As it is most commonly defined, a multiprocessor system implies
the existence of multiple CPUs, although usually there will be
one or more IOPs as well.
A program running on any of the CPUs sees a normal (usually
paged) virtual address space.
The only unusual property of this system is that a CPU can
write some value into a memory word, read the word back, and
get a different value (because another CPU has changed it).
Cont’d…
When organized correctly, this property forms the basis of
interprocessor communication: one CPU writes some data
into memory and another one reads the data out.
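A minimal sketch of this write-then-read communication, assuming two Python threads stand in for the two CPUs and a one-element list stands in for the shared memory word. The names `shared_memory`, `data_ready`, `producer`, and `consumer` are illustrative; the `threading.Event` plays the role of "organized correctly" by making the reader wait until the write has happened.

```python
import threading

shared_memory = [0]              # one "memory word" visible to both workers
data_ready = threading.Event()   # signals that the word has been written
received = []

def producer():
    shared_memory[0] = 42        # CPU A writes a value into the shared word
    data_ready.set()             # then announces that the data is available

def consumer():
    data_ready.wait()            # CPU B waits until the write has happened
    received.append(shared_memory[0])   # then reads the value out

# Start the consumer first to show that the ordering is enforced by the event.
threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)
```

Without the event, the consumer could read the word before the producer has written it, which is exactly the "different value" effect described above.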
A multiprocessor system is controlled by one operating
system that provides interaction between processors, and
all the components of the system cooperate in the solution
of a problem.
The benefit derived from a multiprocessor organization is
improved system performance.
The system derives its high performance from the fact that
computations can proceed in parallel in one of two ways.
1. Multiple independent jobs can be made to operate in parallel.
2. A single job can be partitioned into multiple parallel tasks.
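The two forms of parallelism can be sketched as follows. This is a structural illustration only: the worker pool uses threads from the standard library, which in CPython do not speed up CPU-bound work, and the helper names `sum_of_squares` and `partition` are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(numbers):
    return sum(x * x for x in numbers)

def partition(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

with ThreadPoolExecutor(max_workers=4) as pool:
    # 1. Multiple independent jobs: each job has its own input and result.
    jobs = [[1, 2, 3], [10, 20], [7]]
    independent_results = list(pool.map(sum_of_squares, jobs))

    # 2. One job partitioned into parallel tasks, then combined.
    data = list(range(1, 101))
    partial_sums = list(pool.map(sum_of_squares, partition(data, 4)))
    total = sum(partial_sums)

print(independent_results, total)
```

In the second case the partial results must be combined at the end, a step that independent jobs do not need.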
Cont’d…
Multiprocessing improves the reliability of the system, so that a
failure or error in one part has a limited effect on the rest of the
system.
Multiprocessors are classified by the way their memory is
organized:
1. A multiprocessor with common shared memory is a
shared-memory or tightly coupled multiprocessor.
2. A multiprocessor in which each processor has its own private
local memory is a distributed-memory or loosely coupled system.
Parallel processing
Parallel processing is the simultaneous processing of the
same task on two or more microprocessors in order to
obtain faster results.
It executes program instructions by dividing them among
multiple processors.
The computing resources can include a single computer
with multiple processors, a number of computers
connected by a network, or a combination of both.
In a shared-memory system the processors access data through
shared memory. Some supercomputer parallel processing systems
have hundreds of thousands of microprocessors.
Flynn's classification of parallel computer organizations:
Single instruction, single data stream (SISD)
Single instruction, multiple data stream (SIMD)
Multiple instruction, single data stream (MISD)
Multiple instruction, multiple data stream (MIMD)
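The four categories follow mechanically from two counts: the number of instruction streams and the number of data streams. A small sketch, with the function name `flynn_class` chosen only for illustration:

```python
def flynn_class(instruction_streams, data_streams):
    """Return the four-letter category for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("need at least one instruction and one data stream")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"

print(flynn_class(1, 1))   # uniprocessor
print(flynn_class(1, 8))   # one instruction applied to 8 data streams
print(flynn_class(4, 4))   # 4 processors, each with its own program and data
```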
SISD
Single processor
Single instruction stream
Data stored in a single memory
Uniprocessor
SIMD
A single machine instruction controls the simultaneous
execution of a number of processing elements on a
lockstep basis
Each processing element has an associated data memory
Each instruction is executed on a different set of data by
different processors
Examples: vector and array processors
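The lockstep idea can be sketched in software, assuming a hypothetical `ProcessingElement` class with its own local data memory and a `broadcast` function acting as the control unit. The elements are simulated one after another here; in SIMD hardware they would all execute the instruction at the same time.

```python
class ProcessingElement:
    def __init__(self, local_data):
        self.local_memory = list(local_data)   # data memory private to this element
        self.result = None

    def execute(self, instruction):
        # Every element applies the instruction it is handed to its own data.
        self.result = instruction(self.local_memory)

def broadcast(instruction, elements):
    """Control unit: issue one instruction to every processing element."""
    for pe in elements:      # sequential simulation of a simultaneous step
        pe.execute(instruction)
    return [pe.result for pe in elements]

elements = [ProcessingElement(d) for d in ([1, 2], [3, 4], [5, 6], [7, 8])]
sums = broadcast(sum, elements)      # single instruction: "add your operands"
maxima = broadcast(max, elements)    # next instruction, again applied by all
print(sums, maxima)
```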
MISD
A sequence of data is transmitted to a set of processors
Each processor executes a different instruction sequence
Has not been implemented commercially
MIMD
A set of processors simultaneously executes different
instruction sequences on different sets of data
Examples: SMPs, clusters, and NUMA systems
Applications of Parallel Processing
Parallel computing on massive datasets is now used
to build everything from artificial brains to
real-time video analysis systems.
It is also used to maintain and improve the catalog of
Earth-orbiting satellites and to solve problems associated
with that catalog.
Pipelining
In computing, a pipeline is the continuous and somewhat
overlapped movement of instructions to the processor, or of
the arithmetic steps taken by the processor to perform an
instruction.
A pipeline is a set of data processing elements connected in
series, where the output of one element is the input of the next.
The elements of a pipeline are often executed in parallel
or in time-sliced fashion; in that case, some amount of
buffer storage is often inserted between the stages to hold
the intermediate results passed from one stage to the next.
Cont’d…
Pipelining is the use of a pipeline.
Without a pipeline, a computer processor gets the first
instruction from memory, performs the operation it calls for,
and then goes to get the next instruction from memory, and so forth.
While the instruction is being fetched, the arithmetic part of the
processor is idle.
It must wait until it gets the next instruction.
Five stages of a pipeline
1. Instruction fetch (IF)
2. Instruction decode (ID)
3. Execution (EX)
4. Memory read/write (MEM)
5. Result write-back (WB)
Most modern processors are pipelined, typically with five or more stages.
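How the five stages overlap can be sketched under idealized assumptions: every stage takes exactly one clock cycle and there are no stalls. With k stages and n instructions, such a pipeline needs k + n - 1 cycles, against n × k cycles when each instruction runs through all stages alone. The function names below are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each clock cycle to the (instruction, stage) pairs active in it.

    Assumes an ideal pipeline: one cycle per stage, no stalls.
    """
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(stages):
            cycle = i + s + 1    # instruction i occupies stage s in this cycle
            schedule.setdefault(cycle, []).append((i + 1, stage))
    return schedule

def cycles_pipelined(n, k=len(STAGES)):
    return k + n - 1             # fill the pipe once, then one result per cycle

def cycles_unpipelined(n, k=len(STAGES)):
    return n * k                 # each instruction uses all k stages alone

schedule = pipeline_schedule(4)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])

n = 100
speedup = cycles_unpipelined(n) / cycles_pipelined(n)
print(round(speedup, 2))
```

For large n the speedup approaches the number of stages, which is why real pipelines fall short of it whenever stalls occur.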
With pipelining, the computer architecture allows the next
instructions to be fetched while the processor is performing
arithmetic operations, holding them in a buffer close to the
processor until each instruction's operation can be
performed.
The staging of instruction fetching is continuous.
Pipelines and pipelining also apply to computer memory
controllers and to moving data through the various stages of memory.
Pipelines can be divided into:
Instruction pipeline
Arithmetic pipeline
An instruction pipeline represents the stages through which
an instruction moves through the processor,
including being fetched, perhaps buffered,
and then executed.
An arithmetic pipeline represents the parts of an
arithmetic operation that can be broken down
and overlapped as they are performed.
Pipeline management problems
Hazards: conditions that lead to incorrect behavior if not fixed.
Limits to pipelining: hazards prevent the next instruction from
executing during its designated clock cycle.
Structural hazards: two different instructions need the same
hardware resource (memory, register, ALU) in the same cycle,
and the hardware cannot support this combination of instructions.
Data hazards (dependency conflicts): two different
instructions use the same storage location, and the result must
appear as if the instructions executed in the correct order.
Typically an instruction depends on the result of a prior
instruction that is still in the pipeline.
Control hazards: caused by pipelining branches and other
instructions that change the PC (one instruction affects
which instruction is next).
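A data hazard of the "depends on the result of a prior instruction" kind can be detected by comparing register names. The sketch below assumes a hypothetical three-operand instruction format and a `window` parameter standing in for how many following instructions can still overlap with a result in flight; neither comes from a specific processor.

```python
from collections import namedtuple

# Hypothetical three-operand instruction format: dest = src1 op src2
Instr = namedtuple("Instr", "op dest src1 src2")

def find_raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    A pair is reported when a later instruction reads a register that an
    earlier instruction, at most `window` positions before it, writes.
    """
    hazards = []
    for j, later in enumerate(program):
        for i in range(max(0, j - window), j):
            earlier = program[i]
            if earlier.dest in (later.src1, later.src2):
                hazards.append((i, j, earlier.dest))
    return hazards

program = [
    Instr("ADD", "R1", "R2", "R3"),   # 0: writes R1
    Instr("SUB", "R4", "R1", "R5"),   # 1: reads R1 right away, hazard with 0
    Instr("AND", "R6", "R7", "R8"),   # 2: independent
    Instr("OR",  "R9", "R4", "R1"),   # 3: reads R4 (from 1); R1 is outside the window
]
print(find_raw_hazards(program))
```

A pipeline that finds such a pair has to delay the consumer or forward the result to it, so that the program behaves as if the instructions ran in order.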
VECTOR PROCESSING
A vector processor, or array processor, is a central processing
unit (CPU) that implements an instruction set containing
instructions that operate on one-dimensional arrays of data
called vectors.
This is in contrast to a scalar processor whose instructions
operate on single data items.
A vector processor is a processor that can operate on entire
vectors with one instruction, i.e. the operands of some
instructions specify complete vectors.
Vector processors can greatly improve performance on certain
workloads, notably numerical simulation and similar tasks.
VECTOR PROCESSING (cont’d)
Vector processing is the processing of sequences of data in
a uniform manner, a common occurrence in the
manipulation of matrices (whose rows and columns are vectors)
or other arrays of data.
A vector processor processes sequences of input
data as a result of obeying a single vector instruction
and generates a result data sequence.
Vector processing techniques have been added to
almost all modern CPU designs, where they are
typically referred to as SIMD.
VECTOR PROCESSING (cont’d)
In these implementations, the vector unit runs beside
the main scalar CPU and is fed data by programs that are
aware of the vector instructions.
In general terms, scalar CPUs are able to manipulate one or
two pieces of data at a time.
For instance, most CPUs have an instruction that
essentially says "add A to B and put the result in C".
The data for A, B and C could, in theory at least, be
encoded directly into the instruction.
Cont’d…
A vector instruction can replace a loop.
» Example: adding vectors A and B and storing the result in C
– n elements in each vector
» We need a loop that iterates n times:
for (i = 0; i < n; i++)
    C[i] = A[i] + B[i];
» This can be done by a single vector instruction:
V3 ← V2 + V1
Assumes that A is in V2 and B is in V1
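The contrast between the loop and the single vector instruction can be modelled in software. The sketch below assumes a toy `VectorUnit` class with three registers and one `vadd` instruction; it counts issued instructions rather than measuring time, and it does not represent any particular vector instruction set.

```python
def scalar_add(a, b):
    """Element-by-element loop: one add per iteration, n iterations."""
    c = [0] * len(a)
    adds_issued = 0
    for i in range(len(a)):
        c[i] = a[i] + b[i]
        adds_issued += 1
    return c, adds_issued

class VectorUnit:
    """Toy register file V1..V3 with a single vector add instruction."""
    def __init__(self):
        self.regs = {"V1": [], "V2": [], "V3": []}
        self.instructions_issued = 0

    def load(self, reg, values):
        self.regs[reg] = list(values)

    def vadd(self, dest, src_a, src_b):
        # One instruction covers every element of the operand vectors.
        a, b = self.regs[src_a], self.regs[src_b]
        if len(a) != len(b):
            raise ValueError("vector lengths differ")
        self.regs[dest] = [x + y for x, y in zip(a, b)]
        self.instructions_issued += 1

A = [1, 2, 3, 4]
B = [10, 20, 30, 40]

c_loop, scalar_adds = scalar_add(A, B)

vu = VectorUnit()
vu.load("V2", A)            # A is in V2
vu.load("V1", B)            # B is in V1
vu.vadd("V3", "V2", "V1")   # V3 <- V2 + V1
print(c_loop, scalar_adds, vu.regs["V3"], vu.instructions_issued)
```

Both paths produce the same result vector; the difference is that the scalar path issues n add instructions while the vector path issues one.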
Applications
Servers
Home Cinema
Super Computing
Cluster Computing
Mainframes
“Astrophysicist Replaces Supercomputer With 8 PS3’s” 2