Multiprocessor System and Interconnection Networks
Dt. 12.10.17
Contents
Characteristics of Multiprocessors
Interconnection Structures
Inter Processor Arbitration
Inter Processor communication and synchronization
Cache Coherence
Introduction
A multiprocessor system is an interconnection of two or more
CPUs with memory and I/O equipment.
IOPs are generally not included in the definition of a
multiprocessor system unless they have computational facilities
comparable to CPUs.
Multiprocessors are MIMD systems.
A multicomputer system consists of a number of computers connected
together by means of communication lines.
Multiprocessing improves the reliability of the system.
If one processor fails, the system as a whole continues to function,
perhaps with reduced efficiency.
The computation can proceed in parallel in two ways:
Multiple independent jobs operate in parallel
Crossbars
excellent performance scalability
Multistage interconnects
compromise between these extremes
Time Shared Common Bus
Multiprocessor systems –system bus
Multiport memory system
Employs separate buses between each memory module and each CPU.
The figure shows this organization for four CPUs and four memory modules (MMs).
Each processor bus is connected to each memory module (MM).
A processor bus consists of the address, data, and control lines
required to communicate with memory.
The MM is said to have 4 ports and each port accommodates one of the buses.
The module must have internal control logic to determine which port will have access to
memory at any given time.
Memory access conflicts are resolved by assigning fixed priorities to each memory port.
Figure shows the functional design of a crossbar switch connected to one memory
module.
The circuit consists of multiplexers that select the data, address, and control from
one CPU for communication with the memory module.
Priority levels are established by the arbitration logic to select one CPU when two or
more CPUs attempt to access the same memory.
A crossbar switch organization supports simultaneous transfers from all memory
modules because there is a separate path associated with each module.
However, the hardware required to implement the switch can become quite large and
complex.
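The arbitration described above can be sketched in a few lines. This is only an illustrative model, not a hardware design: the function name `arbitrate_crossbar` is hypothetical, and it assumes that a lower CPU index means a higher priority.

```python
def arbitrate_crossbar(requests):
    """Grant at most one CPU per memory module.

    requests[i] is the index of the memory module that CPU i wants,
    or None if CPU i is idle.
    Assumption: a lower CPU index means a higher priority.
    Returns a dict mapping each granted module to the winning CPU.
    """
    grants = {}
    for cpu, module in enumerate(requests):
        # A module that is already granted stays with the earlier
        # (higher-priority) CPU; the later request is blocked.
        if module is not None and module not in grants:
            grants[module] = cpu
    return grants


# CPU 0 and CPU 2 both want module 2; CPU 1 wants module 0; CPU 3 is idle.
print(arbitrate_crossbar([2, 0, 2, None]))  # {2: 0, 0: 1}
```

Requests for different modules are all granted in the same cycle, which mirrors the simultaneous transfers that the crossbar supports.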
1(a) Use two-input AND and OR gates to construct NxN
crossbar switch network between N processors and N
memory modules. Use cij signal as the enable signal for the
switch in ith row and jth column. Let the width of each
crosspoint be w bits.
(b) Estimate the total number of AND and OR gates needed
as a function of N and w.
Problem (cont.)
...
Problem (cont.)
The crossbar uses priority to determine which PE goes first when two PEs try to communicate
with a single memory.
P1 has priority over P2, P2 over P3, ..., PN-1 over PN.
Cij are the control signals that determine which crosspoint gets “activated”.
The decoder receives an address that determines which memory the PE wants to communicate with.
So for example, if P1 wants to communicate with M1, it would send 1 to C11 and C21 would get
0 (since there is a NOT gate). What that means if P2 wanted to c
Multistage Network
Multistage interconnection networks (MINs) are a class of high-speed computer networks.
A MIN consists of a sequence of switching stages, each of which consists of
several switches.
Successive switching stages are connected by interstage links.
A MIN is usually composed of processing elements (PEs) on one end of the network
and memory elements (MEs) on the other end, connected by switching elements
(SEs).
The switching elements themselves are usually connected to each other in stages.
Multistage Switching Interconnection Networks (MINs)
The basic component of a MIN is a 2 x 2
interchange switch.
A 2 x 2 switch has two inputs, labeled A and B,
and two outputs, labeled 0 and 1.
Control signals associated with the
switch establish the interconnection
between the input and output terminals.
[Figure: 2 × 2 switching elements and their control signals]
The switch can connect input A to either of the outputs.
Input B of the switch behaves in a similar fashion.
The switch can also arbitrate between conflicting requests.
If inputs A and B both request the same output terminal, only one of them is
connected; the other is blocked.
Using the 2 x 2 switch as a building block, it is possible to build a multistage network
to control the communication between a number of sources and destinations.
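A minimal sketch of the 2 x 2 switch behavior described above. The function name `switch_2x2` is hypothetical, and the tie-break (input A wins a conflict) is an assumption; the slides do not say which input is favored.

```python
def switch_2x2(request_a, request_b):
    """Model one 2 x 2 interchange switch.

    Each request is the output terminal wanted by that input:
    0, 1, or None when the input is idle.
    Assumption: input A wins when both inputs want the same output.
    Returns (connections, blocked), where connections maps an output
    terminal to the input connected to it.
    """
    connections = {}
    blocked = []
    for name, output in (("A", request_a), ("B", request_b)):
        if output is None:
            continue
        if output in connections:
            blocked.append(name)  # conflicting request is blocked
        else:
            connections[output] = name
    return connections, blocked


print(switch_2x2(0, 1))  # ({0: 'A', 1: 'B'}, [])
print(switch_2x2(1, 1))  # ({1: 'A'}, ['B'])
```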
Network Topology
Interconnection networks can be categorized on the basis of their topology.
Topology is the pattern in which one node is connected to other nodes.
There are two main types of topology: static and dynamic.
Static networks are hard-wired and cannot change their configuration.
They use point-to-point communication links that do not change dynamically
(e.g., trees, rings, meshes).
Dynamic networks change their interconnectivity dynamically.
They are implemented with switched communication links
(e.g., system buses, crossbar switches, multistage networks).
A regular structure means that the nodes are arranged in a specific shape
and that the shape is maintained throughout the network.
The way the input units are connected to the output units determines the
functional characteristics of the network, i.e., the allowable interconnections.
In a single-stage network, data may have to pass through the switches
several times before reaching the final destination.
In a multistage network, one pass through the multiple stages of switches is usually
sufficient.
Single Stage Interconnect Network
The input nodes are connected to the output nodes via a single stage of switches.
The figure shows an 8 x 8 single-stage network using the shuffle-exchange pattern.
Static vs. Dynamic
Static networks use direct links, which are fixed once built.
[Figure: perfect shuffle of the inputs of n PEs to n/2 switches]
Omega networks
An Omega network is a multistage interconnection network that uses 2 × 2 switch boxes and a perfect shuffle
interconnect pattern between the stages.
In the Omega MIN there is exactly one path from each input to each
output.
No redundant paths → no fault tolerance and the possibility of blocking.
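The perfect shuffle used between stages can be expressed as a one-bit left rotation of the line address. The sketch below illustrates this under the assumption that lines are numbered 0 to N-1 with N a power of two; the function name `perfect_shuffle` is hypothetical.

```python
def perfect_shuffle(index, n_bits):
    """Rotate an n_bits-wide address left by one bit position."""
    msb = (index >> (n_bits - 1)) & 1
    mask = (1 << n_bits) - 1
    return ((index << 1) & mask) | msb


# Destination line of each of the 8 inputs after one shuffle.
print([perfect_shuffle(i, 3) for i in range(8)])  # [0, 2, 4, 6, 1, 3, 5, 7]
```

Applying the shuffle log2 N times returns every line to its starting position, because the address has been rotated through all of its bits.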
Example:
• Connect input 101 to output 001
• Use the bits of the destination
address, 001, for dynamically
selecting a path
• Routing:
- 0 means use upper output
- 1 means use lower output
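The example above can be traced with a short simulation. This is a sketch under stated assumptions: a perfect shuffle precedes every stage, lines are numbered 0 to N-1, and the name `omega_route` is hypothetical.

```python
def omega_route(source, dest, n_bits):
    """Trace a message through an Omega network of 2 x 2 switches.

    Before each stage the line number is perfect-shuffled (rotated
    left by one bit). The switch then sends the message to its upper
    output (bit 0) or lower output (bit 1), chosen by the next
    destination bit, most significant bit first.
    Returns a list of (output_used, line_number_after_stage).
    """
    mask = (1 << n_bits) - 1
    position = source
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line number left by one bit.
        position = ((position << 1) & mask) | (position >> (n_bits - 1))
        # Destination bit for this stage, MSB first.
        bit = (dest >> (n_bits - 1 - stage)) & 1
        # The chosen switch output replaces the least significant bit.
        position = (position & ~1) | bit
        path.append(("upper" if bit == 0 else "lower", position))
    return path


for output, line in omega_route(0b101, 0b001, 3):
    print(output, format(line, "03b"))
# upper 010
# upper 100
# lower 001
```

The message ends on line 001 after the third stage, using the outputs upper, upper, lower, which are the destination bits 0, 0, 1.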
Omega Network Routing
Let
s = binary representation of the source processor
d = binary representation of the destination processor or memory
The data traverses the link to the first switching node.
if the most significant bits of s and d are the same
    route the data in pass-through mode by the switch
else
    use the crossover path
Strip off the leftmost bit of s and d.
Repeat for each of the log2 N switching stages.
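The rule above amounts to comparing s and d bit by bit, most significant bit first. A minimal sketch, with the hypothetical name `omega_switch_settings`:

```python
def omega_switch_settings(s, d, n_bits):
    """Return the switch setting used at each of the n_bits stages.

    At every stage the current leftmost bits of s and d are compared:
    equal bits mean pass-through, different bits mean crossover.
    Moving to the next bit position plays the role of stripping off
    the leftmost bit.
    """
    settings = []
    for stage in range(n_bits):
        shift = n_bits - 1 - stage
        s_bit = (s >> shift) & 1
        d_bit = (d >> shift) & 1
        settings.append("pass-through" if s_bit == d_bit else "crossover")
    return settings


print(omega_switch_settings(0b001, 0b100, 3))
# ['crossover', 'pass-through', 'crossover']
```

The stages that use the crossover path are exactly the bit positions where s XOR d is 1.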
Omega Network Routing
How to connect PE 001 to Memory module 100 ?
Switch conflict: if both inputs of a switch request the same output (both routing bits are 0 or both are 1), only one of them is
connected. The other is rejected or buffered at the switch.
Hypercube Interconnection
A hypercube, or binary n-cube, multiprocessor structure is a
loosely coupled system.
It is composed of N = 2^n processors interconnected in an
n-dimensional binary cube.
Each processor forms a node of the cube.
Each processor has a direct communication path to n
neighboring processors.
There are 2^n distinct n-bit binary addresses, one of which is assigned
to each processor.
Each processor address differs from that of each of its n neighbors in exactly one bit position.
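Because neighboring addresses differ in exactly one bit, the neighbors of a node can be found by flipping each address bit in turn. A small sketch, with the hypothetical name `hypercube_neighbors`:

```python
def hypercube_neighbors(node, n):
    """Return the n neighbors of a node in a binary n-cube.

    Flipping bit k of the address gives the neighbor along dimension k.
    """
    return [node ^ (1 << bit) for bit in range(n)]


# Neighbors of node 101 in a 3-cube.
print([format(x, "03b") for x in hypercube_neighbors(0b101, 3)])
# ['100', '111', '001']
```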
Hypercube Connection
Hypercube (cont.)
Point-to-point Routing
Compare the IDs of S and D; if S > D, look at the leftmost bit.
Ex. S = 101 and D = 000
Broadcasting (suppose from node 0)
Step 1: 0–1
Step 2: 0–2, 1–3
Step 3: 0–4, 1–5, 2–6, 3–7
[Figure: 3-cube with nodes 000 to 111]
Multistage 3-cube
i) Routing by least significant bit (C0)
0–1, 2–3, 4–5, 6–7
ii) Routing by middle bit (C1)
0–2, 1–3, 4–6, 5–7
iii) Routing by most significant bit (C2)
0–4, 1–5, 2–6, 3–7
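Routing and broadcasting on the cube can be sketched as follows. The function names are hypothetical. The routing function shows one common scheme (correct the differing address bits one at a time, leftmost first), which is an assumption about the rule intended on the slide.

```python
def hypercube_route(source, dest, n):
    """Route by correcting differing address bits, leftmost bit first."""
    path = [source]
    current = source
    for bit in reversed(range(n)):
        if (current ^ dest) & (1 << bit):
            current ^= 1 << bit  # move along dimension `bit`
            path.append(current)
    return path


def broadcast_steps(n, root=0):
    """List the (sender, receiver) pairs of each broadcast step.

    In step k every node that already holds the message forwards it
    along dimension k, so the number of holders doubles each step.
    """
    holders = [root]
    steps = []
    for bit in range(n):
        sends = [(node, node ^ (1 << bit)) for node in holders]
        steps.append(sends)
        holders += [receiver for _, receiver in sends]
    return steps


print([format(x, "03b") for x in hypercube_route(0b101, 0b000, 3)])
# ['101', '001', '000']
for step in broadcast_steps(3):
    print(step)
# [(0, 1)]
# [(0, 2), (1, 3)]
# [(0, 4), (1, 5), (2, 6), (3, 7)]
```

The broadcast reaches all 2^n nodes in n steps, and the pairs in step k are the C0, C1, and C2 pairings that involve nodes already holding the message.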
Interprocessor Arbitration
A computer system contains a number of buses at various levels
to facilitate the transfer of information.
A bus that connects the major components in a multiprocessor
system (such as CPUs, IOPs, and memory) is called a system bus.
Arbitration logic is the part of the system bus controller, placed
between the local bus and the system bus, that resolves
contention among multiple processors for shared resources.
System Bus
A typical system bus consists of approximately 100 signal lines.
The system bus is divided into three functional groups of lines:
data bus, address bus, and control bus.
In addition, there are power distribution lines that supply power to
the components.
Ex. The IEEE standard 796 Multibus system has 16 data lines, 24
address lines, 26 control lines, and 20 power lines, for a total of 86
lines.
Data lines provide a path for the transfer of data between
processors and common memory.
The number of data lines is usually a multiple of 8, with 16 and 32
being the most common.
Address lines are used to identify a memory location or any other
source or destination unit.
The number of address lines determines the maximum possible
memory capacity in the system.
The control lines provide signals for controlling the information
transfer.
Timing signals indicate the validity of data and address information.
Command signals specify the operation to be performed.
Data transfer over the system bus can be either synchronous
or asynchronous.
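As a quick check of the address-line statement, k address lines can distinguish 2^k locations. The sketch below applies this to the 24 address lines of the Multibus example; the function name is hypothetical, and the size of each location (byte or word) depends on the system.

```python
def addressable_locations(address_lines):
    """Number of distinct locations selectable with the given lines."""
    return 2 ** address_lines


print(addressable_locations(24))  # 16777216, i.e. 16M locations
print(16 + 24 + 26 + 20)          # 86 lines in total
```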
Arbitration Procedure
The arbitration procedure services all processor requests on the basis
of established priorities.
It can be implemented either in hardware (static) or in software (dynamic).
Arbitration techniques:
i) Static Techniques
Serial Arbitration (Serial Connection of Units)
Advantage: simple
Disadvantages:
Cannot assure fairness – a low-priority device may be
Daisy chain is a wiring scheme in which multiple devices are wired together in sequence.
The higher priority device will pass the grant line to the lower priority device only if it
does not want to use the bus.
Then priority is forwarded to the next in the sequence.
WORKING
The interrupt request line (IRL) is common to all the devices and the CPU.
When there is no interrupt, the IRL is in the HIGH state.
A device that raises an interrupt places the IRL in the LOW state.
The CPU detects the interrupt request on the line and then enables
the interrupt acknowledge line in response to the request.
This signal is received at the PI (priority in) input of device 1.
If the device has not requested the interrupt, it passes this signal to the next
device through its PO (priority out) output (PI = 1 & PO = 1).
However, if the device has requested the interrupt (PI = 1 & PO = 0):
• The device consumes the acknowledge signal and blocks its further use by
placing 0 at its PO (priority out) output.
• The device then proceeds to place its interrupt vector address (VAD) on
the data bus of the CPU.
• The device returns its interrupt request signal to the HIGH state to indicate that its
interrupt has been taken care of.
NOTE:
The interrupt vector address (VAD) is the address of the service routine
that services the device.
If a device gets 0 at its PI input, it generates 0 at its PO output to
tell the other devices that the acknowledge signal has been blocked (PI =
0 & PO = 0).
Hence, the device having PI = 1 and PO = 0 is the highest-priority
device that is requesting an interrupt.
Therefore, the daisy-chain arrangement establishes a hierarchy and ensures that the highest-priority
interrupt gets serviced first.
The farther a device is from the first device, the lower its priority.
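The PI/PO behavior described above can be simulated directly. This is an illustrative sketch only: the function name `daisy_chain` is hypothetical, and devices are indexed from 0 in chain order, so index 0 is the device closest to the CPU.

```python
def daisy_chain(requests, acknowledge=1):
    """Propagate the acknowledge signal down a daisy chain.

    requests[i] is True if device i has raised a request.
    Returns (states, selected): states holds the (PI, PO) pair of
    every device, and selected is the index of the device that
    consumed the acknowledge, or None if no device did.
    """
    pi = acknowledge
    states = []
    selected = None
    for index, requesting in enumerate(requests):
        if pi == 1 and requesting:
            po = 0  # consume the acknowledge and block it
            selected = index
        else:
            po = pi  # pass PI on unchanged (1 stays 1, 0 stays 0)
        states.append((pi, po))
        pi = po  # PO of this device feeds PI of the next one
    return states, selected


# Devices 1 and 2 both request; device 1 is nearer the CPU and wins.
print(daisy_chain([False, True, True]))
# ([(1, 1), (1, 0), (0, 0)], 1)
```

Exactly one device ends up with PI = 1 and PO = 0, and it is the requesting device nearest the start of the chain.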
Parallel Arbitration Logic
It uses an external priority encoder and decoder.
Each bus arbiter has a bus request output line and a bus
acknowledge input line.
Each arbiter enables its request line when its processor is
requesting the system bus.
The processor with the highest priority, as determined by the output of the
decoder, gets access to the bus.
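A minimal software model of the encoder and decoder pair. The function names are hypothetical, and the sketch assumes that arbiter 0 has the highest priority.

```python
def priority_encode(request_lines):
    """Return the index of the highest-priority active request.

    Assumption: a lower index means a higher priority.
    Returns None when no request line is active.
    """
    for index, requested in enumerate(request_lines):
        if requested:
            return index
    return None


def decode(index, width):
    """Drive exactly one acknowledge line (or none if index is None)."""
    return [1 if line == index else 0 for line in range(width)]


def parallel_arbitrate(request_lines):
    """Map the request lines to the acknowledge lines."""
    winner = priority_encode(request_lines)
    return decode(winner, len(request_lines))


# Arbiters 1 and 3 request the bus; arbiter 1 is acknowledged.
print(parallel_arbitrate([0, 1, 0, 1]))  # [0, 1, 0, 0]
```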
Centralized Parallel Arbitration
Polling
Deadlock
Data Inconsistency
i) The most common way is to set aside a portion of memory that is accessible to
all processors (common memory).
The sending processor puts the data and the address of the receiving
processor in the memory.
All the processors periodically check the memory for any information.
If a processor finds its own address, it reads the data.
This procedure is time consuming.
ii) Use of the interrupt facility
The sending processor sends an interrupt signal to the receiving processor whenever
it leaves a message.
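The polling scheme in (i) can be sketched with a shared mailbox. The class `Mailbox` and its methods are hypothetical names, and the sketch is a single-threaded illustration that leaves out the locking a real shared memory would need.

```python
class Mailbox:
    """Common memory area holding (receiver_address, data) entries."""

    def __init__(self):
        self.slots = []

    def send(self, receiver, data):
        # The sender stores the data with the receiver's address.
        self.slots.append((receiver, data))

    def poll(self, processor_id):
        # A processor scans the area for entries carrying its address,
        # reads them, and removes them from the mailbox.
        mine = [data for receiver, data in self.slots
                if receiver == processor_id]
        self.slots = [(receiver, data) for receiver, data in self.slots
                      if receiver != processor_id]
        return mine


mailbox = Mailbox()
mailbox.send(2, "hello")
print(mailbox.poll(1))  # []
print(mailbox.poll(2))  # ['hello']
```

Every call to `poll` that returns an empty list is wasted work, which is why the slides call polling time consuming and why scheme (ii) uses an interrupt instead.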
In addition to shared memory, a multiprocessor system may have other
shared resources.
There are two primary forms of data exchange between parallel tasks: accessing a
shared data space and exchanging messages.
Platforms that provide a shared data space are called shared-address-space
machines, tightly coupled systems, or multiprocessors.
Platforms that support messaging are called message-passing
platforms, multicomputers, or loosely coupled systems (distributed
memory).
To prevent the conflicting use of shared resources by several processors,
there must be a provision for assigning resources to processors.
This task is handled by the operating system.
OS for Multiprocessors
Three organizations have been used in the design of
operating systems for multiprocessors:
Master-Slave Configuration
Separate OS
Distributed OS
[Figure: two processes sharing a data structure guarded by semaphore S; each executes P(S), busy-waiting if necessary, enters its critical region, and then executes V(S)]
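The P(S) and V(S) pattern can be tried out with Python's standard `threading.Semaphore`, where `acquire` plays the role of P(S) and `release` plays the role of V(S). This is a sketch only: the function `run_workers` is a hypothetical name, and Python's semaphore blocks the waiting thread rather than busy-waiting.

```python
import threading


def run_workers(n_workers=4, increments=1000):
    """Increment a shared counter from several threads under a semaphore."""
    semaphore = threading.Semaphore(1)  # S = 1: one thread at a time
    shared = {"counter": 0}             # the shared data structure

    def worker():
        for _ in range(increments):
            semaphore.acquire()         # P(S): wait until S > 0, then S = S - 1
            shared["counter"] += 1      # critical region
            semaphore.release()         # V(S): S = S + 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["counter"]


print(run_workers())  # 4000
```

Because only one thread can be inside the critical region at a time, no increment is lost and the final count equals the number of workers times the number of increments.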