0% found this document useful (0 votes)
36 views97 pages

Chapter 7

This document summarizes key aspects of multiprocessor system interconnects. It discusses different types of bus systems like local buses, backplane buses, and I/O buses. It also covers different interconnect network topologies like crossbar switches, multistage networks, and multiport memory. Issues related to cache coherence are addressed along with snoopy and directory-based cache coherence protocols. Message passing schemes for multiprocessors like store-and-forward routing and wormhole routing are briefly described.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views97 pages

Chapter 7

This document summarizes key aspects of multiprocessor system interconnects. It discusses different types of bus systems like local buses, backplane buses, and I/O buses. It also covers different interconnect network topologies like crossbar switches, multistage networks, and multiport memory. Issues related to cache coherence are addressed along with snoopy and directory-based cache coherence protocols. Message passing schemes for multiprocessors like store-and-forward routing and wormhole routing are briefly described.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 97

Multiprocessors and multi

computers
-prajwala T R
Dept. of CSE
PESIT
Multiprocessor system interconnects
Network characteristics
• Timing
– Synchronous
– asynchronous
• Switching
– Circuit switching
– Packet switching
• Control
– Centralized
– distributed
Hierarchical bus systems
• Local bus-
buses implemented within processor chip or
PCB
provides communication path among
components mounted on board
Memory bus
Data bus
• Backplane bus
• Is printed circuit on which many connectors
are used to plug in functional boards
• VME bus
• multibusII
• Futurebus+
Backplane bus
• I/O bus
– SCSCI-small computer system interface bus
– Made of coaxial cables with taps connecting to
disks,printer.
– Interface logic
– Ex: encore bus consists of 32 bit address,64 bit
data path and 14 bit vector bus
– Clock speed 12.5MHz
SCSI bus cable
Encore ultramax multiprocessor
architecture
Cross bar switch
• Single stage
• Multistage network
– Blocking ex:omega and baseline network
– non blocking(all possible connections between
i/o)
• Cross bar networks
Single stage
Cross bar networks
• Single stage, permutation and non blocking
network
• Unary switch set to open or close and
establishes point to point connections
• N X M or N=M
• All processors send request asynchronously
and independently
design
• Multiplexer
• Arbitration logic
• Acknowledgement signal
• Memory read or writ
• 16 processors then 4 bit control lines
• Advantages
– High bandwidth
– Interface is cheaper
– Single processor send many requests to multiple
modules
• Disadvantages
– Cost effective only for small number of processors
– Not expandable once built
Multiport memory
• Solution intermediate to bus and switch
• Only one of n processor requests is honored at
a time.
• Drawback
– Not scalable
– Large number of interconnection cables
Multiport memory
Multistage and combining networks
• Omega network
• Base line networks
• Hotspot problem
– ex: memory module
– Semaphore
– Degrade performance
Fetch and add primitive
• Increments content of memory loation.
• Atomic operation
• X,e-value, increments
• When using multiprocessor, when one process
is allowed to make change no other process
can access intermediate result
• Switch performs addition of increments.
• Disadvantage-
• Requires additional switch cycles to make
entire operation atomic.
• Rela time systems ex:IBMRP3
– 512 processors
– Omega network of 128 ports
– Bandwidth 13Gbps
– 50Mhz clock
• 2 methods to solve cache coherence problem
– Snoopy protocol- to monitor the values
– Directory based protocols-no broadcasting of
values. A central directory is maintained for
modifications made in the cache
Snoopy protocols
• Snoopy protocols are used to ensure
coherence of cache.
• The mechanism are
– write invalidate
– Write update
– Write through caches
– Write back caches
– Write once protocol
Snoopy protocols contd…
• Write invalidate protocol
– Will invalidate all remote copies when local cache
block is updates
• Write update policy
– Broadcast new data to all caches containing the
copy of block
Snoopy protocols contd…
• Write through caches
– I and j processors
– VALID o INVALID
• Possible operations:
– Read by same processR(i)
– Read by different processorR( j )
– Write by same processor W(i) Write by different
processor W( j )
– Replace by same processor Z(i) Replace by different
processor Z( j )
Write back caches
• Data item states: o
– RO : Read Only (Valid state)
– RW : Read Write (Valid state)
– INV : Invalid state
• Possible operations:
– Read by same processor R(i)
– Read by different processor R( j )
– Write by same processor W(i)
– Write by different processor W( j )
– Replace by same processor Z(i) Replace by different
processor Z( j )
Write back cache
Snoopy protocols contd..
• Write-once Protocol
• First write using write-through policy
• Subsequent writes using write-back policy
• In both cases, data item copy in remote caches is invalidated
• Data item states:
– Valid :cache block consistent with main memory copy
– Reserved : data has been written exactly once and is consistent
with main memory copy
– Dirty : data is written more than once but is not consistent with
main memory copy
– Invalid :block not found in cache or is inconsistent with main
memory copy
Read hit, read miss, write hit, write miss
• Read hit: The information is supplied by the c
• Read miss: The data is read from main
memory. Check for dirty or reserved states
• Write hit-if in dirty or reserved state update to
dirty state
• Write miss-invalid state
Multilevel cache coherence
• An write invalidate is sent vertically up inorder
to invalidate the shared caches at higher level.
• Higher level caches keep track of dirty blocks.
Protocol Performance issues
Directory based protocols

• Snoopy protocols broadcast the information.


• In large network this is expensive
• Write invalidate protocol leads heavy bus
traffic
• Write update protocol –the updated data may
not be used by remote processors a lot
• Hence use directory based protocol.
• Cache directories-
– List of cached locations
– Number of pointers to specify the copies of block
– Dirty bit
• Cache directories store information on where
copies of cache block resides, list of cached
locations
• Central directory scheme
– Duplicates all cache directories.
– Consistency must be maintained.
– Drawbacks-
• Contention
• Long term searches
• Distributed directory scheme
– Each memory module holds its own directories.
– State information is local to the memory module.
– If read miss in cache 2-request sent to memory
module and memory module controller
retransmits data in cache 1.
– If write hit of c1-controller sends invalidation to all
caches.
Types of directories
• Full map directories
– Each directory has n entries where n is number of
processors.
– 2 bit-entry for processor(valid),dirty bit(whether
block overwritten)
• Steps
– Cache c3 finds block containing x is valid.
– C3 issues write request to memory module
containing x.
– Memory module invalidates requests of c1 and c2
– C1 and c2 set the b it indicating x is no longer
valid.
– Memory module sends write request to c3
– Cache c3 updates value of x
Limited directories
• Directory size problem is solved- entries only if
cache block has the value X else no entry.
• Dirix
– i –number of pointers
– X-no broadcast scheme
• Full map scheme without broadcast
• i<n pointers
• Dir2NB-pointer replacement-eviction
• Directory-set associative mapping
• Scalable protocols
• Dir I B-
– Allow more than I copies of each block of data to
exist
Chained directories
• Singly linked chain
– Initially no shared copies of x
– P1 reads x from shared memory along with chain
termination pointer.
– If p2 requires cache data it is read from p1 along
with CT.
– Memory then keeps a pointer to c2.
– Gossip protocol –info passed from individual to
individual .
• Doubly linked chain
– 2 pointer-backward and forward chain pointers.
– More memory because of more storage of 2
pointers
• Cache design alternatives
• Shared caches-no private cache,will reduce
main memory access time,second level cache
• Shared data can be non cacheable.
• Cache flushing-at synchronization ,I/O and
process migration.
Atomic operation
• Synchronization primitives
– Test and set lock
– Lock-1 set
– Lock 0-reset
– Spin lock
• Wired barrier synchronization
– Wired NOR logic
– Control vector –X
– Common monitor-Y
– Xi connected to input
– Yi output
• Xi -1-process is initiated
• Barrier set to 1-synchronization
• Only one barrier line is needed to initiate and
complete single synchronization operation
3 generations of multicomputers
Message passing schemes
• Message formats
– Fixed length of packets.
– Destination addr, sequence number
– Further dived to flits-flow control digits
– Store and forward routing-packets
– Wormhole-flits
– Size of packets-64-512 bits
Store and forward routing
• Basic units of transfer are packets
• Transmitted through series of intermediate
nodes.
• Buffers are used to store packets, then
transferred to output channels
• Latency is α number of hops
Wormhole routing
• Flits are used.
• Transmission from source to destination is
done through routers
• All flits are transmitted in order ,as
inseparable companions.
• Header flit,dataflits
• Latency is independent of distance or number
of hops
Asynchronous pipelining
• Handshaking protocol
• 1 –bit ready/request line is used between
adjacent routers.
• No global clock
virtual channels
• Virtual channel is a logical link between 2
nodes.
• Flit buffer in source node and a physical
channel between them and flit buffer at
receiver node.
• One source buffer is paired with one receiver..
buffer to form virtual channel
• Physical channel is time shared by all virtual
channels
deadlocks
• Why?
– 4 flits from 4 messages occupy 4 channels
– Circular waits
• How to detect deadlock?
– Channel dependence graph
• Deadlock avoidance
– Use virtual channels
– Can be bidirectional or unidirectional
Flow control strategies
• Packet collision
• Elements –scr buffer and dest buffer holding
slit,channel
• Packet collision resolution-
– Which packet will be allocated to channel?
– What will be done to packet which is denied
Solution 1
• Virtual cut through routing scheme
– Packet 2 temporarily stored in buffer.
– Adv- not wasting resource
– Disadv-requires use of large buffer, storage delay
– Packet buffer should not have cycles
Solution 2
• Blocking flow control
– Second packet is blocked but not abandoned
Solution 3
• Discard and retransmit
– Drops packet
– Disadv-wastage of resources,unstable delivery
rate
– Requires packet retransmission and
acknowledgement.
Solution 4
• Detour after blocked
– Results in idling the resources
– Offers flexibility
– Rerouted packet enters live stock which wastes
resources
Dimension order routing
• Deterministic
– Communication path is completely predetermined
by source and destination.
– Dimension order routing- X Y routing, E cube
routing
• Adaptive routing depends on network
conditions
E cube routing
• N=2n
• Source (s),dest(d),intermediate node (v)
(0,1…n-1),
1.Direction bit ri=si-1 XOR di-1
2.V XOR 2i-1 if ri=1.else ri=0 skip
3.Move to i+1 dimension until dest reached.
example
Adaptive routing
Multicast routing algorithms
• Communication patters

– Unicast
– Broadcast
– multicast
Routing efficiency
• Channel bandwidth
• Communication delay
• Implemented by replicating packet at
intermediate node and multiple copies of
packet reach destination.
Virtual networks
Network portioning

You might also like