Chapter 7

This document summarizes key aspects of multiprocessor system interconnects. It discusses different types of bus systems like local buses, backplane buses, and I/O buses. It also covers different interconnect network topologies like crossbar switches, multistage networks, and multiport memory. Issues related to cache coherence are addressed along with snoopy and directory-based cache coherence protocols. Message passing schemes for multiprocessors like store-and-forward routing and wormhole routing are briefly described.

Uploaded by

K S Sanath Kashyap


Multiprocessors and multicomputers
- Prajwala T R
Dept. of CSE
PESIT
Multiprocessor system interconnects
Network characteristics
• Timing
– Synchronous
– Asynchronous
• Switching
– Circuit switching
– Packet switching
• Control
– Centralized
– Distributed
Hierarchical bus systems
• Local bus
– Buses implemented within a processor chip or on a printed circuit board (PCB)
– Provides a communication path among the components mounted on the board
– Examples: memory bus, data bus
• Backplane bus
– A printed circuit on which many connectors are used to plug in functional boards
– Examples: VME bus, Multibus II, Futurebus+
Backplane bus
• I/O bus
– Ex: SCSI (Small Computer System Interface) bus
– Made of coaxial cables with taps connecting to disks, printers, and other devices
– Interface logic
– Ex: the Encore bus consists of a 32-bit address path, a 64-bit data path, and a 14-bit vector bus
– Clock speed: 12.5 MHz
SCSI bus cable
Encore Ultramax multiprocessor architecture
Crossbar switch
• Single stage
• Multistage network
– Blocking, ex: Omega and baseline networks
– Nonblocking (all possible connections between inputs and outputs can be established)
• Crossbar networks
Single stage
Crossbar networks
• Single-stage, permutation, nonblocking network
• Each crosspoint is a unary switch that is set open or closed to establish a point-to-point connection
• Size n × m, commonly with n = m
• All processors send requests asynchronously and independently
Design
• Multiplexer
• Arbitration logic
• Acknowledgement signal
• Memory read or write
• With 16 processors, 4-bit control lines are needed to identify the selected processor (log2 16 = 4)
• Advantages
– High bandwidth
– Interface is cheaper
– A single processor can send many requests to multiple modules
• Disadvantages
– Cost-effective only for a small number of processors
– Not expandable once built
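The arbitration step at each memory module can be illustrated with a small sketch. It assumes a fixed-priority rule (the lowest-numbered processor wins), which the slides do not specify; the function names `control_bits` and `arbitrate` are hypothetical.

```python
def control_bits(n_processors):
    """Number of control-line bits needed to name one of n processors."""
    return (n_processors - 1).bit_length()


def arbitrate(requests):
    """Resolve one cycle of crossbar requests.

    requests maps processor id -> requested memory module id.
    Assumption: fixed priority, the lowest processor id wins a module.
    Returns (grants, waiting): grants maps module -> winning processor,
    waiting lists the processors that must retry.
    """
    grants = {}
    for proc in sorted(requests):
        module = requests[proc]
        if module not in grants:
            grants[module] = proc
    waiting = sorted(p for p, m in requests.items() if grants[m] != p)
    return grants, waiting
```

Requests to different modules are all granted in the same cycle, which is where the high bandwidth of the crossbar comes from; only requests that collide on one module are serialized.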
Multiport memory
• A solution intermediate between the bus and the crossbar switch
• Each memory module honors only one of the n processor requests at a time
• Drawbacks
– Not scalable
– Requires a large number of interconnection cables
Multiport memory
Multistage and combining networks
• Omega network
• Baseline networks
• Hot-spot problem
– Ex: a memory module that many processors access at the same time
– Typical cause: a shared semaphore variable
– Degrades network performance
Fetch-and-add primitive
• Increments the content of a memory location.
• Atomic operation
• Fetch&Add(x, e): x is the shared value and e is the increment; the operation returns the old x and stores x + e
• On a multiprocessor, while one process is making the change, no other process can access the intermediate result
• The switch performs the addition of the increments (combining).
• Disadvantage
– Requires additional switch cycles to make the entire operation atomic.
• Real-time systems, ex: IBM RP3
– 512 processors
– Omega network of 128 ports
– Bandwidth 13 Gbps
– 50 MHz clock
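A minimal sketch of how a combining switch can merge two Fetch&Add requests aimed at the same location. The dictionary standing in for shared memory and the function names are assumptions for illustration; real hardware performs this inside the network switches.

```python
def fetch_and_add(memory, addr, e):
    """Return the old value at addr and store old + e (conceptually atomic)."""
    old = memory[addr]
    memory[addr] = old + e
    return old


def combined_fetch_and_add(memory, addr, e1, e2):
    """Combine two requests to the same address into one memory access.

    The switch forwards a single request carrying e1 + e2 and remembers e1.
    When the old value comes back, request 1 receives it unchanged and
    request 2 receives old + e1, as if the two had run one after the other.
    """
    old = fetch_and_add(memory, addr, e1 + e2)
    return old, old + e1
```

Memory sees one access instead of two, which is how combining relieves a hot spot.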
• Two methods to solve the cache coherence problem
– Snoopy protocols: each cache monitors the values broadcast on the bus
– Directory-based protocols: no broadcasting of values; a central directory records the modifications made in the caches
Snoopy protocols
• Snoopy protocols are used to ensure cache coherence.
• The mechanisms are
– Write invalidate
– Write update
– Write-through caches
– Write-back caches
– Write-once protocol
Snoopy protocols contd…
• Write-invalidate protocol
– Invalidates all remote copies when a local cache block is updated
• Write-update policy
– Broadcasts the new data to all caches containing a copy of the block
Snoopy protocols contd…
• Write-through caches
– Processors i (local) and j (remote)
– Block states: VALID or INVALID
• Possible operations:
– Read by same processor R(i)
– Read by different processor R(j)
– Write by same processor W(i)
– Write by different processor W(j)
– Replace by same processor Z(i)
– Replace by different processor Z(j)
Write back caches
• Data item states:
– RO : Read Only (Valid state)
– RW : Read Write (Valid state)
– INV : Invalid state
• Possible operations:
– Read by same processor R(i)
– Read by different processor R(j)
– Write by same processor W(i)
– Write by different processor W(j)
– Replace by same processor Z(i)
– Replace by different processor Z(j)
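The three states and six operations above can be sketched as a table-driven state machine. The transitions follow one common textbook formulation of a write-invalidate, write-back snoopy protocol and are an assumption here, since the slides list only the states and operations; `write_back_next` is a hypothetical name.

```python
RW, RO, INV = "RW", "RO", "INV"
OPS = {"R(i)", "R(j)", "W(i)", "W(j)", "Z(i)", "Z(j)"}

# Only state-changing transitions are listed; any other valid
# operation leaves the block in its current state.
TRANSITIONS = {
    RW: {"R(j)": RO, "W(j)": INV, "Z(i)": INV},
    RO: {"W(i)": RW, "W(j)": INV, "Z(i)": INV},
    INV: {"R(i)": RO, "W(i)": RW},
}


def write_back_next(state, op):
    """Next state of a block in cache i after operation op."""
    if op not in OPS:
        raise ValueError(f"unknown operation: {op}")
    return TRANSITIONS[state].get(op, state)
```

For example, starting from INV, the sequence R(i), W(i), R(j), W(j) walks the block through RO, RW, RO and back to INV.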
Write back cache
Snoopy protocols contd..
• Write-once Protocol
• First write using write-through policy
• Subsequent writes using write-back policy
• In both cases, data item copy in remote caches is invalidated
• Data item states:
– Valid: cache block is consistent with the main memory copy
– Reserved: data has been written exactly once and is consistent with the main memory copy
– Dirty: data has been written more than once and is not consistent with the main memory copy
– Invalid: block is not found in the cache or is inconsistent with the main memory copy
Read hit, read miss, write hit, write miss
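• Read hit: the data is supplied by the local cache; no state change
• Read miss: the data is read from main memory, unless a remote cache holds a dirty copy, in which case that cache supplies it; remote dirty or reserved copies become valid
• Write hit: if the block is dirty or reserved, it is written locally and ends in the dirty state; a hit on a valid block is the first write, which is written through and leaves the block reserved
• Write miss: the block is loaded and written, and all remote copies go to the invalid state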
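A small sketch of the write-once state changes for one cache block, seen from a single cache. The event names (`local_read`, `local_write`, `remote_read`, `remote_write`) are hypothetical labels, and the table reflects the usual description of the write-once protocol rather than anything stated in full on the slides.

```python
VALID, RESERVED, DIRTY, INVALID = "valid", "reserved", "dirty", "invalid"
EVENTS = {"local_read", "local_write", "remote_read", "remote_write"}

# (state, event) -> next state; pairs not listed keep the current state.
WRITE_ONCE = {
    (INVALID, "local_read"): VALID,      # read miss
    (INVALID, "local_write"): DIRTY,     # write miss
    (VALID, "local_write"): RESERVED,    # first write: write-through
    (RESERVED, "local_write"): DIRTY,    # later writes: write-back
    (RESERVED, "remote_read"): VALID,
    (DIRTY, "remote_read"): VALID,       # dirty copy is supplied and memory updated
    (VALID, "remote_write"): INVALID,
    (RESERVED, "remote_write"): INVALID,
    (DIRTY, "remote_write"): INVALID,
}


def write_once_next(state, event):
    """Next state of the local copy after a local or snooped event."""
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    return WRITE_ONCE.get((state, event), state)
```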
Multilevel cache coherence
• A write invalidate is sent vertically upward in order to invalidate the shared caches at the higher level.
• Higher level caches keep track of dirty blocks.
Protocol Performance issues
Directory based protocols

• Snoopy protocols broadcast the information.

• In a large network this is expensive.
• The write-invalidate protocol leads to heavy bus traffic.
• With the write-update protocol, the updated data may not be used much by remote processors.
• Hence, use directory-based protocols.
• Cache directories store information on where copies of each cache block reside
– List of cached locations
– A number of pointers specifying the copies of the block
– Dirty bit
• Central directory scheme
– Duplicates all cache directories.
– Consistency must be maintained.
– Drawbacks
• Contention
• Long search times
• Distributed directory scheme
– Each memory module maintains its own directory.
– State information is local to the memory module.
– On a read miss in cache 2, a request is sent to the memory module, and the memory module controller retransmits the request to cache 1, which holds the copy.
– On a write hit in cache 1, the controller sends invalidations to all other caches holding the block.
Types of directories
• Full-map directories
– Each directory entry has n presence bits, where n is the number of processors.
– Two state bits per entry: a valid bit for the processor's copy and a dirty bit (whether the block has been overwritten)
• Steps for a write to x by cache C3 when C1 and C2 also hold copies
– Cache C3 finds that the block containing x is valid.
– C3 issues a write request to the memory module containing x.
– The memory module sends invalidate requests to C1 and C2.
– C1 and C2 set the bit indicating that x is no longer valid.
– The memory module sends write permission to C3.
– Cache C3 updates the value of x.
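The write steps above can be sketched with one presence bit per cache and a single dirty bit. The class and method names are hypothetical, caches are numbered from 0 (so C3 is index 2), and the read path is simplified by assuming any dirty owner has already written the block back.

```python
class FullMapEntry:
    """Directory entry for one memory block in a full-map scheme."""

    def __init__(self, n_caches):
        self.present = [False] * n_caches  # one presence bit per cache
        self.dirty = False

    def read(self, cache):
        # Simplification: assume a dirty owner has written the block back.
        self.dirty = False
        self.present[cache] = True

    def write(self, writer):
        """Invalidate every other copy, then grant write permission.

        Returns the list of caches that received an invalidate request.
        """
        invalidated = [c for c, held in enumerate(self.present)
                       if held and c != writer]
        for c in invalidated:
            self.present[c] = False
        self.present[writer] = True
        self.dirty = True
        return invalidated
```

After caches 0, 1 and 2 have read the block, a write by cache 2 sends invalidations to caches 0 and 1 and leaves a single dirty copy.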
Limited directories
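• Solves the directory size problem: an entry holds a pointer only for a cache that has the block x; otherwise there is no entry.
• Notation: Dir_i X
– i: number of pointers
– X: NB for a no-broadcast scheme
• The full-map scheme without broadcast is Dir_n NB
• A limited directory uses i < n pointers: Dir_i NB
• Dir_2 NB: when both pointers are in use, one more sharer forces pointer replacement (eviction)
• The directory behaves like a set-associative cache of pointers, so it needs a replacement policy
• Scalable protocols
• Dir_i B
– Allows more than i copies of each block of data to exist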
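A sketch of pointer eviction in a Dir_i NB entry. The first-in, first-out replacement policy is an assumption chosen for simplicity (the slides do not name a policy), and `LimitedDirectoryEntry` is a hypothetical name.

```python
from collections import deque


class LimitedDirectoryEntry:
    """Directory entry with at most i pointers and no broadcast."""

    def __init__(self, i):
        self.i = i
        self.pointers = deque()  # caches currently holding a copy

    def read(self, cache):
        """Record a new read copy.

        Returns the cache whose copy was evicted to free a pointer,
        or None if no eviction was needed.
        """
        if cache in self.pointers:
            return None
        evicted = None
        if len(self.pointers) == self.i:
            evicted = self.pointers.popleft()  # assumed FIFO replacement
        self.pointers.append(cache)
        return evicted
```

With i = 2, a third reader evicts the oldest sharer, so the number of copies never exceeds the number of pointers.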
Chained directories
• Singly linked chain
– Initially there are no shared copies of x.
– P1 reads x from shared memory; cache C1 receives the copy along with a chain-termination (CT) pointer, and memory keeps a pointer to C1.
– If P2 then reads x, cache C2 receives the copy along with a pointer to C1, whose own pointer remains CT.
– Memory then keeps a pointer to C2.
– Works like a gossip protocol: information is passed from individual to individual.
• Doubly linked chain
– Two pointers per entry: forward and backward chain pointers.
– Needs more memory because two pointers are stored.
• Cache design alternatives
• Shared caches: no private caches; a shared second-level cache reduces main memory access time
• Shared data can be made non-cacheable.
• Cache flushing: at synchronization, I/O, and process migration.
Atomic operation
• Synchronization primitives
– Test-and-set lock
– Lock = 1: set
– Lock = 0: reset
– Spin lock
• Wired barrier synchronization
– Wired-NOR logic
– Control vector X
– Common monitor vector Y
– Xi is connected to the input
– Yi is the output
• Xi = 1: the process is initiated
• Barrier line set to 1: synchronization
• Only one barrier line is needed to initiate and complete a single synchronization operation
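A minimal simulation of the wired-NOR barrier line. It assumes each process resets its control bit Xi to 0 when it reaches the barrier, which the slides imply but do not state; `barrier_line` and `run_barrier` are hypothetical names.

```python
def barrier_line(x):
    """Wired-NOR: the line is 1 only when every control bit Xi is 0."""
    return int(not any(x))


def run_barrier(finish_order):
    """Simulate processes reaching the barrier in the given order.

    All control bits start at 1 (process initiated). Each arriving
    process resets its bit; the returned trace is the barrier line
    value after each arrival, which every process observes through
    its monitor bit Yi.
    """
    x = [1] * len(finish_order)
    trace = []
    for p in finish_order:
        x[p] = 0
        trace.append(barrier_line(x))
    return trace
```

The line stays at 0 until the last process arrives, so a single shared line is enough to signal that synchronization is complete.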
3 generations of multicomputers
Message passing schemes
• Message formats
– A message is divided into fixed-length packets.
– Each packet carries a destination address and a sequence number.
– Packets are further divided into flits (flow control digits).
– Store-and-forward routing transfers packets.
– Wormhole routing transfers flits.
– Packet size: 64 to 512 bits
Store and forward routing
• Basic units of transfer are packets
• Transmitted through series of intermediate
nodes.
• Buffers are used to store packets, then
transferred to output channels
• Latency is proportional to the number of hops
Wormhole routing
• Flits are the basic unit of transfer.
• Transmission from source to destination is done through routers.
• All flits of a packet are transmitted in order, as inseparable companions.
• A header flit is followed by data flits.
• Latency is almost independent of the distance (number of hops)
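The latency difference between the two schemes can be checked with the standard textbook estimates. The symbols are assumptions not defined on the slides: L is the packet length in bits, F the flit length in bits, W the channel bandwidth in bits per cycle, and D the number of intermediate hops.

```python
def store_and_forward_latency(L, W, D):
    """Whole packet is buffered at every node: (L / W) * (D + 1)."""
    return (L / W) * (D + 1)


def wormhole_latency(L, F, W, D):
    """Flits are pipelined through the routers: L / W + (F / W) * D."""
    return L / W + (F / W) * D
```

With L = 512, F = 8, W = 8 and D = 4, store-and-forward takes 320 cycles while wormhole routing takes 68, and the wormhole figure grows only slowly with D because F is much smaller than L.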
Asynchronous pipelining
• Handshaking protocol
• A 1-bit ready/request line is used between adjacent routers.
• No global clock
Virtual channels
• A virtual channel is a logical link between two nodes.
• It is formed by a flit buffer in the source node, a flit buffer in the receiver node, and the physical channel between them.
• One source buffer is paired with one receiver buffer to form a virtual channel.
• The physical channel is time-shared by all the virtual channels.
Deadlocks
• Why?
– Ex: 4 flits from 4 messages occupy 4 channels
– Circular waits
• How to detect deadlock?
– Channel dependence graph (a cycle in the graph means deadlock is possible)
• Deadlock avoidance
– Use virtual channels
– Channels can be bidirectional or unidirectional
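Checking a channel dependence graph for cycles can be sketched with a depth-first search. The two example graphs are hypothetical: `ring` models four channels waiting on each other in a circle, and `split` models the same ring after virtual channels (named v0, v1) have broken the circular dependence.

```python
def has_cycle(graph):
    """Return True if the directed graph (channel -> list of channels) has a cycle."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = dict.fromkeys(nodes, WHITE)

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            if color[nxt] == GREY:  # back edge: circular wait
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in sorted(nodes))


# Four channels, each waiting for the next one around the ring.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
# Same ring, but the wrap-around traffic moves onto virtual channels.
split = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["v0"], "v0": ["v1"]}
```

Here `has_cycle(ring)` is true and `has_cycle(split)` is false, which is the sense in which virtual channels avoid deadlock.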
Flow control strategies
• Packet collision
• Elements: source buffer and destination buffer holding flits, and the channel between them
• Packet collision resolution
– Which packet will be allocated the channel?
– What will be done with the packet that is denied the channel?
Solution 1
• Virtual cut-through routing scheme
– Packet 2 is temporarily stored in a buffer.
– Adv: does not waste resources
– Disadv: requires a large buffer and adds storage delay
– The packet buffers should not form cycles
Solution 2
• Blocking flow control
– Second packet is blocked but not abandoned
Solution 3
• Discard and retransmit
– Drops the packet
– Disadv: wastage of resources, unstable delivery rate
– Requires packet retransmission and acknowledgement.
Solution 4
• Detour after being blocked
– Results in idling of resources
– Offers flexibility
– A rerouted packet may enter a livelock, which wastes resources
Dimension order routing
• Deterministic routing
– The communication path is completely determined by the source and destination addresses.
– Dimension-order routing: X-Y routing (meshes), E-cube routing (hypercubes)
• Adaptive routing depends on network conditions.
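E-cube routing
• $N = 2^n$ nodes in an n-cube
• Source $s = s_{n-1} \dots s_1 s_0$, destination $d = d_{n-1} \dots d_1 d_0$, intermediate node $v$; dimensions $i = 1, \dots, n$, where dimension $i$ corresponds to bit $i-1$
1. Compute the direction bits $r_i = s_{i-1} \oplus d_{i-1}$ for all dimensions; start with $i = 1$ and $v = s$.
2. If $r_i = 1$, route from $v$ to $v \oplus 2^{i-1}$; if $r_i = 0$, skip this step.
3. Move to dimension $i + 1$ and repeat step 2 until the destination is reached.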
example
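A short sketch of the E-cube steps, with node addresses held as integers. The function name `e_cube_route` is hypothetical, and the sample source and destination (0110 and 1101 in a 4-cube) are chosen here for illustration.

```python
def e_cube_route(s, d, n):
    """Return the list of nodes visited from s to d in an n-cube.

    Dimensions are handled in increasing order; dimension i
    corresponds to address bit i - 1.
    """
    if not (0 <= s < 2 ** n and 0 <= d < 2 ** n):
        raise ValueError("node addresses must fit in n bits")
    path = [s]
    v = s
    for i in range(1, n + 1):
        r_i = ((s >> (i - 1)) & 1) ^ ((d >> (i - 1)) & 1)  # direction bit
        if r_i:
            v ^= 1 << (i - 1)  # move along dimension i
            path.append(v)
    return path
```

For s = 0110 and d = 1101, the direction bits are r = 1011, so the route is 0110, 0111, 0101, 1101: dimension 3 is skipped because the two addresses agree in bit 2.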
Adaptive routing
Multicast routing algorithms
• Communication patterns
– Unicast
– Broadcast
– Multicast
Routing efficiency
• Channel bandwidth
• Communication delay
• Multicast is implemented by replicating the packet at intermediate nodes, so that multiple copies of the packet reach the destinations.
Virtual networks
Network partitioning
