Multiprocessors and Thread: Unit-4
Learning Objectives
• Multiprocessors
10.1 Multiprocessors
[Figure: Processor organization tree with branches Serial and Parallel, and leaves Symmetric multiprocessor (SMP) and Nonuniform memory access (NUMA)]
Definition of Multiprocessor
A multiprocessor is a multiple instruction stream, multiple data stream (MIMD) computer containing two or
more CPUs that cooperate on common computational tasks.
Multiprocessors and Thread Level Parallelism
[Figure: Processors connected to one another through an interprocessor communication network]
[Figure: CPU 1 to CPU 5 and a memory unit connected through local buses to a single system bus]
Disadvantages
• If a malfunction occurs in any of the bus interface circuits, the complete system will fail.
• Throughput is decreased, since only one processor can communicate with any other unit at a time.
• The total overall transfer rate within the system is limited by the speed of the single path.
• Arbitration logic becomes more complex: as the number of processors and memory units increases,
the bus contention problem increases.
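The throughput limit of a single shared path can be illustrated with a small sketch. It assumes, purely for illustration, that every transfer occupies the bus for exactly one cycle and that pending requests are served in arrival order; the function name and the request format are hypothetical and not taken from the text.

```python
from collections import deque

def simulate_common_bus(requests):
    """Serve (cpu, memory_unit) requests over one shared bus.

    Assumption: each transfer occupies the bus for exactly one cycle,
    and pending requests are served in arrival order.
    Returns a list of (cycle, cpu, memory_unit) tuples.
    """
    pending = deque(requests)
    schedule = []
    cycle = 0
    while pending:
        cpu, unit = pending.popleft()   # only one transfer per cycle
        schedule.append((cycle, cpu, unit))
        cycle += 1
    return schedule

# Three CPUs address three different memory units, yet the single bus
# still forces the transfers to happen one after another.
requests = [("CPU1", "MM1"), ("CPU2", "MM2"), ("CPU3", "MM3")]
schedule = simulate_common_bus(requests)
for cycle, cpu, unit in schedule:
    print(f"cycle {cycle}: {cpu} -> {unit}")
```

Even though no two requests target the same memory unit, the number of cycles grows with the number of requests, which is the contention effect described in the list above.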
Multiport Memory
[Figure: Multiport memory organization: memory modules MM 1 to MM 4, each connected to CPU 1 to CPU 4]
Advantages
• A high transfer rate can be achieved because of the multiple paths.
Disadvantages
• It requires expensive memory control logic and a large number of cables and connections.
Crossbar switch
The crossbar switch organization consists of a number of cross points that are placed at the intersections
between the processor buses and the memory module paths. It provides a separate path for each module.
[Figure: (a) CPU 1 to CPU 4 connected through cross points to memory modules Mm1 to Mm4. (b) One memory module with read/write and memory enable inputs, fed by multiplexers and arbitration logic that receive data, address, and control lines from CPU 1 to CPU 4]
Figure 10.6: (a) Crossbar switch (b) Block diagram of crossbar switch
Each switch point has control logic to set up the transfer path between a processor and a memory module.
It examines the address that is placed on the bus to determine whether its particular module is being
addressed. It also resolves multiple requests for access to the same memory module on a predetermined
priority basis. The crossbar supports simultaneous transfers from all memory modules because there is a
separate path associated with each module.
The functional design of a crossbar switch connected to one memory module is shown in figure 10.6.
The circuit consists of multiplexers that select the data, address, and control from one CPU for
communication with the memory module.
Priority levels are established by the arbitration logic to select one CPU when two or more CPUs
attempt to access the same memory.
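The arbitration behaviour described above can be sketched as follows. This is a minimal model, assuming a fixed priority order among the CPUs and one request per CPU per cycle; the function name, the dictionary layout, and the priority order are illustrative assumptions rather than details from the text.

```python
def crossbar_arbitrate(requests, priority):
    """Grant at most one CPU per memory module in a single cycle.

    requests: dict mapping cpu name -> requested memory module name
    priority: list of cpu names, highest priority first (assumed fixed)
    Returns (granted, waiting), where granted maps module -> cpu.
    """
    granted = {}
    waiting = []
    for cpu in priority:                  # examine CPUs in priority order
        module = requests.get(cpu)
        if module is None:
            continue                      # this CPU issued no request
        if module not in granted:
            granted[module] = cpu         # path through the cross point is set up
        else:
            waiting.append(cpu)           # module busy: lower-priority CPU waits
    return granted, waiting

# CPU1 and CPU2 both address MM2; CPU3 and CPU4 address other modules.
requests = {"CPU1": "MM2", "CPU2": "MM2", "CPU3": "MM1", "CPU4": "MM4"}
granted, waiting = crossbar_arbitrate(requests, ["CPU1", "CPU2", "CPU3", "CPU4"])
print(granted)
print(waiting)
```

Requests to different modules are all granted in the same cycle, while the conflict on MM2 is resolved in favour of the higher-priority CPU.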
Advantages
Supports simultaneous transfers from all memory modules.
Disadvantages
[Figure: Operation of a 2 × 2 interchange switch with inputs A and B and outputs 0 and 1: A connected to 0, A connected to 1, and the corresponding connections for B]
[Figure: Binary tree of 2 × 2 switches connecting processors P1 and P2 to destinations 000 through 111]
Some requests cannot be satisfied simultaneously. For example, if P1 is connected to one of the
destinations 000 through 011, P2 can be connected to only one of the destinations 100 through 111.
Many different topologies have been proposed for multistage switching networks to control
processor-memory communication in a tightly coupled multiprocessor system or to control the
communication between the processing elements in a loosely coupled system.
• One such topology is the omega switching network shown in fig 10.9.
• In this configuration, there is exactly one path from each source to any particular
destination.
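The single path from a source to a destination in an omega network can be traced with destination-tag routing. The sketch below assumes the usual construction of an 8 × 8 omega network: three stages, each consisting of a perfect shuffle followed by a column of 2 × 2 switches, with a destination bit of 0 selecting the upper output and 1 the lower output. The function name and the path representation are hypothetical.

```python
def omega_route(source, destination, n_bits=3):
    """Return the line positions visited from source to destination
    in a 2**n_bits x 2**n_bits omega network.

    Each stage applies a perfect shuffle (rotate the line address left by
    one bit) and then a 2 x 2 switch that replaces the lowest address bit
    with the next destination bit, most significant bit first
    (0 = upper output, 1 = lower output).
    """
    mask = (1 << n_bits) - 1
    position = source
    path = [position]
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit address left by one position.
        position = ((position << 1) | (position >> (n_bits - 1))) & mask
        # Switch setting: the destination bit chooses the output line.
        bit = (destination >> (n_bits - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path

path = omega_route(2, 6)
print([format(p, "03b") for p in path])
```

Because every stage overwrites one address bit with a destination bit, the route is fully determined by the source and destination, which is why there is exactly one path between any pair.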
[Figure 10.9: 8 × 8 omega switching network connecting sources 0 to 7 to destinations 000 to 111]
The major drawback of the common bus is that if the bus fails, the whole system fails. To improve
performance, the crossbar, multiport, hypercube and multistage switch networks evolved.
10.4 Multi-threaded Architecture
Multi-threading is a mechanism by which the instruction stream is divided into several smaller
streams (threads) that can be executed in parallel.
Increasing the utilization of a processor by switching to another thread when one thread is stalled is
known as hardware multi-threading.
• A multi-threaded CPU is not a parallel architecture, strictly speaking; multi-threading is
obtained through a single CPU, but it allows a programmer to design and develop applications
as a set of programs that can virtually execute in parallel; namely, threads.
• Multi-threading is a solution to avoid waiting clock cycles while the missing data is fetched: it makes
the CPU manage more peer threads concurrently; if a thread gets blocked, the CPU can execute
instructions of another thread, thus keeping the functional units busy.
• Each thread must have a private Program Counter and a set of private registers, separate from
other threads.
• The architecture often exposes a register file of words, and the instruction set is composed of
instructions that operate on individual words.
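The requirement for private per-thread state can be illustrated with a toy model. In this sketch each hardware thread owns its program counter and register set, and a single simulated CPU interleaves two threads without either disturbing the other's registers. The class, the two-field instruction format, and the register name r1 are assumptions made for illustration only.

```python
class ThreadContext:
    """Private architectural state of one hardware thread (toy model)."""

    def __init__(self, name, program):
        self.name = name
        self.program = program   # list of (register, increment) pairs
        self.pc = 0              # private program counter
        self.regs = {}           # private register set

    def finished(self):
        return self.pc >= len(self.program)

    def step(self):
        """Execute one instruction using only this thread's own state."""
        reg, inc = self.program[self.pc]
        self.regs[reg] = self.regs.get(reg, 0) + inc
        self.pc += 1

t0 = ThreadContext("T0", [("r1", 1), ("r1", 1), ("r1", 1)])
t1 = ThreadContext("T1", [("r1", 10), ("r1", 10)])

# Interleave the two threads on one simulated CPU.
while not (t0.finished() and t1.finished()):
    for t in (t0, t1):
        if not t.finished():
            t.step()

print(t0.regs, t1.regs)
```

Both threads use a register called r1, yet each ends with its own value because the register sets are separate.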
Computer Architecture
Coarse-grained Multi-threading
A version of hardware multi-threading that implies switching between threads only after significant
events, such as a last-level cache miss.
• This change relieves the need to have thread switching be extremely fast and is much less likely
to slow down the execution of an individual thread, since instructions from other threads will
only be issued when a thread encounters a costly stall.
Advantages
• Relieves the need to have very fast thread switching.
• Doesn't slow down the individual thread.
Disadvantages
• It is hard to overcome throughput losses from shorter stalls, due to pipeline start-up costs.
• Since the CPU issues instructions from one thread, when a stall occurs, the pipeline must be emptied.
• The new thread must fill the pipeline before instructions can complete.
Due to this start-up overhead, coarse-grained multi-threading is much more useful for reducing the
penalty of high-cost stalls, where pipeline refill is negligible compared to the stall time.
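The trade-off between stall time and pipeline refill can be explored with a small simulation. The model below is a sketch under stated assumptions: instructions are either "op" (one cycle) or "miss" (a costly stall), a switch is made only when the running thread is waiting on a miss, and each switch costs a fixed number of refill cycles. The latency and refill values are illustrative and do not come from the text.

```python
def run_coarse_grained(threads, miss_latency=20, refill=3):
    """Simulate coarse-grained multi-threading on one pipeline.

    threads: list of instruction lists; each instruction is "op"
             (completes in one cycle) or "miss" (a costly stall).
    Returns (total_cycles, switches).
    """
    n = len(threads)
    pcs = [0] * n            # private program counter per thread
    ready_at = [0] * n       # cycle at which each thread's miss is resolved
    cycle = 0
    switches = 0
    current = 0
    while any(pcs[i] < len(threads[i]) for i in range(n)):
        unfinished = [i for i in range(n) if pcs[i] < len(threads[i])]
        runnable = [i for i in unfinished if ready_at[i] <= cycle]
        if not runnable:
            # Every unfinished thread is waiting on memory: idle until one is ready.
            cycle = min(ready_at[i] for i in unfinished)
            continue
        if current not in runnable:
            current = runnable[0]
            switches += 1
            cycle += refill          # the new thread must fill the pipeline
        instr = threads[current][pcs[current]]
        pcs[current] += 1
        cycle += 1
        if instr == "miss":
            ready_at[current] = cycle + miss_latency
    return max(cycle, max(ready_at)), switches

threads = [["op", "miss", "op"], ["op", "op", "miss", "op"]]
total_cycles, switches = run_coarse_grained(threads)
print(total_cycles, switches)
```

Calling the function with a small miss_latency (for example 2) makes the refill cycles comparable to the stall itself, which is the case where coarse-grained switching gains little.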
Fine-grained Multi-threading
A version of hardware multi-threading that implies switching between threads after every
instruction, resulting in interleaved execution of multiple threads. It switches from one thread
to another at each clock cycle.
• This interleaving is often done in a round-robin fashion, skipping any threads that are stalled
at that clock cycle.
• To make fine-grained multi-threading practical, the processor must be able to switch threads on
every clock cycle.
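The round-robin interleaving with skipping of stalled threads can be sketched as a cycle-by-cycle issue loop. This is a simplified model, assuming one instruction issued per cycle, zero switching cost, and a fixed stall length; the instruction names and the stall length are hypothetical.

```python
def run_fine_grained(threads, stall_cycles=3):
    """Interleave threads one instruction per clock cycle, round-robin,
    skipping threads that are stalled in that cycle.

    threads: list of instruction lists; "op" issues normally and "stall"
             makes the issuing thread unavailable for stall_cycles cycles.
    Returns the issue trace: one entry per cycle, the index of the thread
    that issued, or None if no thread could issue.
    """
    n = len(threads)
    pcs = [0] * n
    ready_at = [0] * n
    trace = []
    cycle = 0
    last = -1                      # thread that issued most recently
    while any(pcs[i] < len(threads[i]) for i in range(n)):
        issued = None
        for offset in range(1, n + 1):
            i = (last + offset) % n        # next thread in round-robin order
            if pcs[i] < len(threads[i]) and ready_at[i] <= cycle:
                issued = i
                break
        if issued is not None:
            instr = threads[issued][pcs[issued]]
            pcs[issued] += 1
            if instr == "stall":
                ready_at[issued] = cycle + 1 + stall_cycles
            last = issued
        trace.append(issued)
        cycle += 1
    return trace

threads = [["op", "stall", "op"], ["op", "op", "op"], ["op", "op"]]
trace = run_fine_grained(threads)
print(trace)
```

While thread 0 is stalled, the other threads keep issuing, so no cycle in this trace is left empty.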
Advantages
• Vertical waste is eliminated. Pipeline hazards cannot arise.
• Zero switching overhead.
• Ability to hide latency within a thread, i.e., it can hide the throughput losses that arise from both short
and long stalls.
• Instructions from other threads can be executed when one thread stalls.
• High execution efficiency.
• Potentially less complex than alternative high-performance
processors.
Disadvantages
• Clock cycles are wasted if a thread has few operations to execute.
• Needs a lot of threads to execute.
• It is more expensive than coarse-grained multi-threading.
• It slows down the execution of the individual threads, since a thread that is ready to execute without
stalls will be delayed by instructions from other threads.
Advantages
Disadvantages
• It cannot improve performance if any of the shared resources are the limiting bottlenecks for the
performance.
10.5 Distributed Memory MIMD Architecture
The MIMD architecture is one of the recent and popular computer architectures. Multiple instructions
work on multiple data to boost the performance of the computer. MIMD (Multiple Instruction Multiple
Data) computers are basically computers with thread-level and process-level architectures. MIMD is
appropriate for programs restricted by conditional statements. After advancement and development
in integrated circuit technology, MIMD architectures became common.
An MIMD architecture comprises many processing elements (processors) which are connected with
some memory through a common bus. Tasks and data are distributed among these processing elements.
All processing elements then execute their instructions on their own data at the same time to
complete the given task.
• Each processor in the system is directly connected to its own memory and caches. No processor
can directly access another processor's memory.
• Each node has a network interface (NI).
• All communication and synchronization between processors happens via messages
through the NI.
• Since this approach uses messages for communication and synchronization, it is often called
message passing architecture.
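[Figure: Distributed memory MIMD organization: processing elements (nodes) PE0, PE1, ..., PEn, each containing a processor (P0, P1, ..., Pn) and a memory (M0, M1, ..., Mn), connected by an interconnection network]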
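The message passing organization can be modelled in a few lines. In this sketch each node keeps a private memory that no other node touches, and a value held by one node reaches another only as a reply to a request message routed through the network. All class names, the message format, and the single-process simulation are assumptions for illustration; a real machine would run the nodes concurrently on separate hardware.

```python
from collections import deque

class InterconnectionNetwork:
    """Delivers messages between attached nodes (toy model)."""

    def __init__(self):
        self.nodes = {}

    def attach(self, node):
        self.nodes[node.node_id] = node

    def deliver(self, src_id, dest_id, payload):
        self.nodes[dest_id].inbox.append((src_id, payload))


class Node:
    """One processing element: a processor with private memory and a
    network interface (NI), modelled here as an inbox plus send()."""

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.memory = {}          # private: no other node accesses this dict
        self.inbox = deque()      # messages delivered through the NI
        self.network = network
        network.attach(self)

    def send(self, dest_id, payload):
        self.network.deliver(self.node_id, dest_id, payload)

    def handle_messages(self):
        """Process every pending message using only local memory."""
        while self.inbox:
            src, (kind, key, value) = self.inbox.popleft()
            if kind == "read":
                # Reply with the locally stored value.
                self.send(src, ("reply", key, self.memory.get(key)))
            elif kind == "reply":
                self.memory[key] = value


net = InterconnectionNetwork()
pe0 = Node(0, net)
pe1 = Node(1, net)
pe1.memory["x"] = 42

# PE0 does not read pe1.memory directly; it asks for the value by message.
pe0.send(1, ("read", "x", None))
pe1.handle_messages()   # PE1 answers the request
pe0.handle_messages()   # PE0 stores the reply in its own memory
print(pe0.memory["x"])
```

The same request and reply exchange also acts as a synchronization point: PE0 cannot proceed with the value until PE1 has handled the message.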