Bus-Based Multiprocessor: A.K.A or Snoopy-Bus Architecture

The document describes different architectures for bus-based multiprocessors, including those with a shared bus connecting processors to memory, those with multiple memory modules, and distributed shared memory architectures. It also discusses cache coherence problems that can arise in multiprocessor systems with write-back caches and different protocols like MSI and MESI that can be used to maintain coherence.

Uploaded by

madhu75

Bus-Based Multiprocessor
A.k.a. SMP or Snoopy-Bus Architecture

[Figure: processors (P), each with a private cache ($), attached to a shared bus with memory]

• Most common form of multiprocessor!

• Small to medium-scale servers: 4-32 processors
• E.g., Intel/DELL Pentium II, Sun UltraEnterprise 450

Shared Cache Multiprocessor
[Figure: processors (P) sharing a single interleaved cache ($) in front of memory]

• Very small number of processors: up to 4

• E.g., Dual-cpu Intel, Stanford Hydra on-chip multiprocessor

Dance Hall Multiprocessor
[Figure: processors (P), each with a private cache ($), and separate memory modules]

• Large-scale machines
• E.g., NYU Ultracomputer, IBM RP3

Distributed Shared Memory (DSM)
[Figure: nodes, each with a processor (P), a cache ($), and local memory]

• Most common form of large shared memory

• E.g., SGI Origin, Sequent NUMA-Q, Convex Exemplar

Cache Coherence Problem

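[Figure: timeline for P1 and P2, each with a private cache ($); memory initially holds X=0. Sequence of accesses over time: Load X, Load X, Store 4, X, Load X]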
• What value of X is in P1 and P2’s caches?
• What value of X is in memory?
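
The two questions above can be explored with a toy model. The sketch below assumes private write-back caches with no coherence mechanism at all, and assumes P1 is the processor that performs the store (the figure does not say which one does). The `Cache` class and its method names are hypothetical, for illustration only.

```python
class Cache:
    """A private write-back cache with NO coherence (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory      # shared backing store
        self.lines = {}           # address -> locally cached value

    def load(self, addr):
        if addr not in self.lines:             # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value               # write-back: memory not updated


memory = {"X": 0}
p1, p2 = Cache(memory), Cache(memory)

p1.load("X")             # P1 caches X=0
p2.load("X")             # P2 caches X=0
p1.store("X", 4)         # assumed: P1 does the store, into its own cache only

p1_value = p1.load("X")  # P1 sees its own write
p2_value = p2.load("X")  # P2 still hits on its old copy
print(p1_value, p2_value, memory["X"])
```

Under these assumptions P1 reads 4 while P2 and memory still hold 0, which is the inconsistency a coherence protocol has to remove.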

Cache-coherence problem
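[Figure: Proc0 and Proc1 with private caches Cache0 and Cache1. Numbered steps:]
1. Proc0: Ld X
2. Cache0: X=0
3. Proc1: Ld X
4. Cache1: X=0
5. Proc0: St X, 1
6. Cache0: X=1
7. Proc1: Ld X
8. Cache1: X=0 (stale copy)
Memory: X=0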

Write-through Coherence Protocol
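[Figure: Proc0 and Proc1 with write-through caches Cache0 and Cache1. Numbered steps:]
0. Memory: X=0
1. Proc0: Ld X
2. Cache0: X=0
3. Proc1: Ld X
4. Cache1: X=0
5. Proc0: St X, 1
5a. Bus: Write X
5b. Memory: X=1
5c. Cache1: X=0 (copy invalidated by the bus write)
6. Cache0: X=1
7. Proc1: Ld X
8. Cache1: X=1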

Write-through State Transition Diagram
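[Figure: state transition diagram with two states, V (valid) and I (invalid)]
• V: PrRd/- and PrWr/BusWr (stay in V); BusWr/- -> I
• I: PrRd/BusRd -> V; PrWr/BusWr (stay in I)

Processor-initiated transactions
• PrRd, PrWr
Transactions on the bus
• BusRd (go to memory, get data)
• BusWr (write data to memory; invalidate cached copies)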

Problem with Write-Through
High bandwidth requirements
• Every write from every processor goes to shared bus and memory
• Consider a 200 MHz processor with a CPI of 1, where 15% of instructions are 8-byte stores
• Each processor generates 30M stores, or 240 MB of data, per second
• A 1 GB/s bus can support only about 4 processors without saturating
• Write-through especially unpopular for SMPs
Write-back caches absorb most writes as cache hits
• Write hits don’t go on bus
• But now how do we ensure write propagation and serialization?
• Need more sophisticated protocols: large design space
Solution?
Write-back-based protocols
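
The bandwidth estimate above can be checked with a few lines of arithmetic. The variable names are illustrative, and 1 GB/s is taken as 10^9 bytes per second, which is an assumption about the slide's units.

```python
clock_hz = 200e6          # 200 MHz processor
cpi = 1.0                 # one cycle per instruction
store_fraction = 0.15     # 15% of instructions are stores
store_bytes = 8           # each store writes 8 bytes
bus_bytes_per_s = 1e9     # assumed: 1 GB/s = 10**9 bytes/s

instr_per_s = clock_hz / cpi
stores_per_s = instr_per_s * store_fraction        # about 30 million
write_bytes_per_s = stores_per_s * store_bytes     # about 240 MB/s
max_processors = bus_bytes_per_s / write_bytes_per_s

print(f"{stores_per_s / 1e6:.0f}M stores/s, "
      f"{write_bytes_per_s / 1e6:.0f} MB/s per processor, "
      f"bus saturates at about {max_processors:.1f} processors")
```

This counts write traffic only; read misses would lower the processor count further.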

Design Space for Snooping Protocols
No need to change processor, main memory, cache …
• Extend cache controller and exploit bus (provides serialization)
Focus on protocols for write-back caches
Design space
• Invalidation versus Update-based protocols
– On write invalidate or update other copies

• Set of states
– Block OWNER:
• thus far data comes only from memory which is always updated
• owner is the one that is responsible for supplying data

Invalidate versus Update
Basic question of program behavior
• Is a block written by one processor read by others before it is
rewritten?
Invalidation:
• Yes => readers will take a miss
• No => multiple writes without additional traffic
– and clears out copies that won’t be used again
Update:
• Yes => readers will not miss if they had a copy previously
– single bus transaction to update all copies
• No => multiple useless updates, even to dead copies
Need to look at program behavior and hardware complexity
Invalidation protocols much more popular (more later)
• Some systems provide both, or even hybrid
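
The trade-off above can be made concrete with a toy traffic count for a single block. This is a rough sketch under simplifying assumptions: every miss, invalidation, or update costs exactly one bus transaction, and caches never evict. The function names and trace format are hypothetical.

```python
def invalidate_traffic(trace):
    """Count bus transactions under a write-invalidate policy."""
    holders = set()       # processors with a valid copy
    exclusive = None      # processor that may write without using the bus
    bus = 0
    for proc, op in trace:
        if op == "R":
            if proc not in holders:
                bus += 1              # read miss
                holders.add(proc)
                exclusive = None      # block is shared again
        elif exclusive != proc:
            bus += 1                  # write miss or invalidation of other copies
            holders = {proc}
            exclusive = proc
    return bus


def update_traffic(trace):
    """Count bus transactions under a write-update policy."""
    holders = set()
    bus = 0
    for proc, op in trace:
        if proc not in holders:
            bus += 1                  # miss: fetch the block
            holders.add(proc)
        if op == "W" and len(holders) > 1:
            bus += 1                  # broadcast new value to the other copies
    return bus


# "Yes" case: each write by P0 is read by P1 before the next write.
producer_consumer = [("P0", "W"), ("P1", "R")] * 4
# "No" case: P1 reads once, then P0 writes repeatedly and nobody reads.
write_run = [("P1", "R")] + [("P0", "W")] * 4

print(invalidate_traffic(producer_consumer), update_traffic(producer_consumer))
print(invalidate_traffic(write_run), update_traffic(write_run))
```

In this model the producer-consumer trace favors update and the write run favors invalidation, matching the Yes/No cases listed above.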
Basic MSI Writeback Inval. Protocol
States
• Invalid (I)
• Shared (S): one or more
• Dirty or Modified (M): one only

Processor Events:
• PrRd (read)
• PrWr (write)

Bus Transactions
• BusRd: asks for copy with no intent to modify
• BusRdX: asks for copy with intent to modify
• BusWB (shown as Flush later on): updates memory

Actions
• Update state, perform bus transaction, flush value onto bus
MSI: Behavior
(1) Read hit: use local copy; no state change (states S or M)
(2) Read miss:
- if M copy exists, it is flushed onto the bus and memory;
all copies set to S
- otherwise, access memory; state set to S
(3) Write hit:
- if local copy is S; request exclusive copy; other copies
are invalidated; local copy set to M
- if local copy M; just write locally; no state changes
(4) Write miss:
- generate read excl. request; all other copies are
invalidated; if M copy exists it is flushed; set local state to M
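
The four cases above can be sketched as a small simulator that tracks the state of one block in every cache. This is a minimal illustration, not a hardware model; the class name `MSIBlock` and the log format are hypothetical.

```python
class MSIBlock:
    """State of ONE memory block across n caches under MSI (illustrative)."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches
        self.log = []                     # bus activity, in bus order

    def read(self, p):
        if self.state[p] in ("S", "M"):
            return                        # (1) read hit: no state change
        self.log.append(f"P{p} BusRd")    # (2) read miss
        for q, s in enumerate(self.state):
            if s == "M":                  # the M copy is flushed, then shared
                self.log.append(f"P{q} Flush")
                self.state[q] = "S"
        self.state[p] = "S"

    def write(self, p):
        if self.state[p] == "M":
            return                        # (3) write hit in M: write locally
        self.log.append(f"P{p} BusRdX")   # (3) write hit in S, or (4) write miss
        for q, s in enumerate(self.state):
            if q != p and s != "I":
                if s == "M":
                    self.log.append(f"P{q} Flush")
                self.state[q] = "I"       # all other copies are invalidated
        self.state[p] = "M"


block = MSIBlock(2)
block.read(0)     # P0: I -> S
block.write(0)    # P0: S -> M
block.read(1)     # P1 misses; P0 flushes and drops to S
print(block.state, block.log)
```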
Simple MSI Protocol: SGI 4D
Write-invalidate for write-back caches
PrRd: Processor read (load)
PrWr: Processor write (store)
BusRd: ReadOnly copy due to a PrRd
BusRdX: Writable copy due to a PrWr
BusWB: Writing back a block
BusInv: Invalidate other copies
BusCache: Cache-to-cache block transfer
BusUpdate: One/Two word update

Simple MSI Protocol: SGI 4D
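I - Invalid, S - Shared, M - Modified
[Figure: state transition diagram over states I, S, M]
• I: PrRd/BusRd -> S; PrWr/BusRdX -> M
• S: PrRd/-, BusRd/- (stay in S); PrWr/BusRdX -> M; BusRdX/- -> I
• M: PrRd/-, PrWr/- (stay in M); BusRd/BusWB -> S; BusRdX/BusWB -> I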

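MSI: State Transition Diagram
[Figure: state transition diagram over states M, S, I]
• M: PrRd/-, PrWr/- (stay in M); BusRd/Flush -> S; BusRdX/Flush -> I
• S: PrRd/-, BusRd/- (stay in S); PrWr/BusRdX -> M; BusRdX/- -> I
• I: PrRd/BusRd -> S; PrWr/BusRdX -> M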

Lower-level Protocol Choices
BusRd observed in M state: what transition to make, to S or to I?
Depends on expectations of access patterns
• S: assumes that I will read again soon, rather than that another processor will write
– good for mostly read data
– what about “migratory” data
• I read and write, then you read and write, then X reads and
writes...
• better to go to I state, so I don’t have to be invalidated on your
write
• Synapse transitioned to I state
• Sequent Symmetry and MIT Alewife use adaptive protocols
Choices can affect performance of memory system

MESI (4-state) Invalidation Protocol

“Problem” with MSI protocol

• Read/modify (e.g., locks) is 2 bus xactions, even if no-one
sharing
– e.g. even in sequential program
– BusRd (I->S) followed by BusRdX or BusUpgr (S->M)

MESI (4-state) Invalidation Protocol

Add exclusive state: write locally without xaction, but not modified
• Main memory is up to date, so cache not necessarily owner
• States
– invalid
– exclusive or exclusive-clean (only this cache has copy, but not
modified)
– shared (two or more caches may have copies)
– modified (dirty)
OWNER: who is responsible for the most up-to-date copy, cache or memory
• I -> E on PrRd if no-one else has copy
– needs a “shared” signal on bus: wired-or line asserted in response to BusRd
– In effect, a way of knowing whether other copies exist

MESI State Transition Diagram
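[Figure: state transition diagram over states M, E, S, I]
• M: PrRd/-, PrWr/- (stay in M); BusRd/Flush -> S; BusRdX/Flush -> I
• E: PrRd/- (stay in E); PrWr/- -> M; BusRd/Flush -> S; BusRdX/Flush -> I
• S: PrRd/-, BusRd/Flush (stay in S); PrWr/BusRdX -> M; BusRdX/Flush -> I
• I: PrRd/BusRd(S) -> S; PrRd/BusRd(!S) -> E; PrWr/BusRdX -> M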

• Same bus transactions as MSI

• Only diff, need for shared signal (BusRd(S) means other copies
exist)
Alternative Description of CC Protocols
Read MISS:
• read to non-existent, INVALID
• generates BUS transaction

Read HIT:
• read to any state other than INVALID
• never generates BUS transaction

Write MISS:
• write to INVALID, Non-existent, or READ-ONLY (SHARED, …)
• generates BUS transaction

Write HIT
• write to READ-WRITE state (Modified,...)

MESI: behavior
(1) Read hit: use local copy; no state change (can be S, M or
E)
(2) Read miss:
- if no other copy exists get from memory; set local copy to E
- if E or S copies exist; get from memory (or the cache w/ E);
set all copies to S
- if M copy exists; get that (could be via memory); set both
copies to S

MESI: behavior
(3) Write hit:
- if local copy in E or M state; write locally; set state to M
- if local copy in S; invalidate all other copies; set state to M
(4) Write miss:
- if no other copy; get from memory; set state to M
- if M copy exists; flush that copy; set state to M
- if E or S copies exist; invalidate them; set state to M
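
The MESI read and write cases can be sketched in the same style. The example counts bus transactions to show the benefit of the E state: a read followed by a write to an unshared block needs only one bus transaction, where MSI needs two. The class name `MESIBlock` is hypothetical, and data movement (flushes, who supplies the block) is not modeled.

```python
class MESIBlock:
    """State of ONE memory block across n caches under MESI (illustrative)."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches
        self.bus_transactions = 0

    def read(self, p):
        if self.state[p] != "I":
            return                                   # (1) read hit in S, E or M
        self.bus_transactions += 1                   # (2) read miss: BusRd
        others = [q for q, s in enumerate(self.state) if s != "I"]
        for q in others:
            self.state[q] = "S"                      # M, E and S copies become S
        self.state[p] = "S" if others else "E"       # shared signal decides S vs E

    def write(self, p):
        if self.state[p] in ("E", "M"):
            self.state[p] = "M"                      # (3) write hit: no bus use
            return
        self.bus_transactions += 1                   # S hit or (4) write miss
        for q in range(len(self.state)):
            if q != p:
                self.state[q] = "I"                  # invalidate other copies
        self.state[p] = "M"


block = MESIBlock(2)
block.read(0)                        # no other copy: I -> E
block.write(0)                       # E -> M silently
after_private_use = block.bus_transactions
block.read(1)                        # P0: M -> S, P1: I -> S
block.write(1)                       # P1: S -> M, P0: S -> I
print(after_private_use, block.bus_transactions, block.state)
```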

Lower-level Protocol Choices
Who supplies data on a miss when the block is not in M state: memory or cache?
Original Illinois MESI: cache, since assumed faster than memory
• Cache-to-cache sharing
Not necessarily true in modern systems
• Intervening in another cache more expensive than getting from
memory

Lower-level Protocol Choices
Cache-to-cache sharing also adds complexity
• How does memory know it should supply data (must wait for
caches)
• Selection algorithm if multiple caches have valid data
But valuable for cache-coherent machines with distributed
memory
• May be cheaper to obtain from nearby cache than distant
memory
• Especially when constructed out of SMP nodes (Stanford DASH)

MOESI: behavior
As MESI but new state O=Owned
Have copy which could be shared but memory does not have
the most up-to-date value

As in MESI but w/ these differences:

A read miss that brings in a remote M copy forces the remote
copy to O state

Upon replacing an O copy, it has to be treated as a dirty block

Why this? Reduces memory traffic

Coherence protocols
And now for a little bit of history

Write-Once (invalidation)
First to be described in the literature
4 states: I, V, R(eserved), D(irty); global invalidate line
(1) Read hit: access from cache, no state change
(2) Read Miss: if another cache has DIRTY copy, it inhibits memory, writes
the line back, and the requesting cache gets copy
Else, the line is loaded from memory
All caches with a copy set it VALID
(3) Write hit : if the line is DIRTY, write proceeds locally
If RESERVED, proceed locally and mark DIRTY
If VALID, write-through and mark RESERVED; other caches set state to
INVALID
(4) Write Miss: like a read miss, the line is copied from memory or from a
DIRTY copy; line marked DIRTY. All other caches invalidate copies
Write-Once Protocol
I - Invalid, V - Valid, R - Reserved, D - Dirty
[Figure: state transition diagram over states I, V, R, D. Arc labels recoverable from the extraction include: PrRd/-, PrRd/BusRd, PrWr/-, PrWr/BusWrOnce, PrWr/BusRdX, BusRd/-, BusRd/BusWB, BusRdX/-, BusRdX/BusWB, BusWB/-]
Reserved: had copy, written once and no-one asked for it yet
BusWrOnce: BusRdX followed by BusWB
Synapse (invalidation)
First bus-based protocol implemented (protocol like SGI 4D)
3 states: I, S, and D
Avoids global inhibit line: uses a tag bit in memory; if set, memory is not up to date
(1) Read hit: access from cache; no state change
(2) Read miss:
• If another cache has a DIRTY copy, it supplies a nack, then writes back to
memory, resets the tag bit in memory and sets its local state to INVALID; then
the requesting cache makes a second miss; the loaded line is set to VALID
• If block Shared, read from memory
(3) Write hit: if DIRTY, proceed locally; no state change
• if VALID, proceed like a write miss; including data transfer. There is no
invalidation signal
(4) Write miss: like a read miss but all VALID copies are set invalid
• line’s tag in main memory is set
Synapse Protocol
I - Invalid, S - Shared, D - Dirty
[Figure: state transition diagram over states I, S, D. Arc labels recoverable from the extraction include: PrRd/-, PrRd/BusRd, PrWr/-, PrWr/BusRdX, BusRd/-, BusRd/BusWB, BusRdX/-, BusRdX/BusWB]

High overhead on misses, probably only of historical interest

Berkeley (invalidation)
Multiprocessor Workstation (SPUR): 4 states: I, S, D and SD
(shared-dirty)
Uses the fourth state to optimize cache-to-cache transfers
(1) Read hit: use local copy; no state changes
(2) Read miss:
• If block Dirty or Shared-Dirty, transfer cache-to-cache; if DIRTY copy exists
it’s changed to Shared-Dirty; local copy is marked Shared
• If block Shared, read from memory; mark Shared
(3) Write hit:
• On Dirty, use local copy
• On Shared-Dirty Invalidate other copies; mark local copy DIRTY
(4) Write miss:
• Copy comes from owner (shared-dirty or memory); local copy set to
DIRTY; others INVALID
Berkeley Protocol
I - Invalid, S - Shared, SD - Shared-Dirty, D - Dirty
[Figure: state transition diagram over states I, S, SD, D. Arc labels recoverable from the extraction include: PrRd/-, PrRd/BusRd, PrWr/-, PrWr/BusRdX, PrWr/BusInv, BusRd/-, BusRd/BusCache, BusRdX/-, BusRdX/BusWB, BusInv/-]

Illinois Protocol
Implemented in SGI multiprocessors
4 states: I, S, VE (valid-exclusive), D (similar to MESI)
Missed data always comes from caches; uses a bus SharedLine
(1) Read hit: use local copy; no state change
(2) Read miss:
• If block Dirty, transfer cache-to-cache, and write back; state to Shared
• If block Shared, transfer cache-to-cache; set state to Shared
• If no cached copy, get from memory; set state to Valid-Exclusive
(3) Write hit:
• local Dirty? Write locally; no state change
• local VE? State to Dirty
• local Shared? Invalidate all other copies; state to Dirty
(4) Write miss:
• Same as read miss; local copy set to Dirty, all others invalidated

Illinois Protocol
I - Invalid, S - Shared, VE - Valid-Exclusive, D - Dirty
[Figure: state transition diagram over states I, S, VE, D. Arc labels recoverable from the extraction include: PrRd/-, PrRd/BusRd, PrWr/-, PrWr/BusRdX, PrWr/BusInv, BusRd/BusCache, BusRd/BusWB, BusRdX/-, BusRdX/BusWB, BusInv/-]
Firefly write-back update protocol
Good performance when multiple processors are repeatedly
reading and updating the same location
3 states: VALID-EXCLUSIVE, SHARED and DIRTY (similar to MESI without I)
Uses a global shared line

Firefly: behavior
(1) Read hit: access from cache; no state change
(2) Read miss:
• if other caches have a copy, they place it on the bus (multiple suppliers possible); all copies are set to SHARED; a DIRTY copy, if one exists, is written back to memory
• otherwise, get from memory and set state to Valid-Exclusive
(3) Write hit:
• if local copy is DIRTY, proceed locally
• if local copy is VE, proceed locally; state set to DIRTY
• if local copy is SHARED, a write to memory is initiated; other caches pick up the new value and keep their state SHARED; local copy is set to SHARED if other copies exist, otherwise to VE
(4) Write miss: like a read miss; if other copies exist, local copy is set to SHARED and memory is also updated; if no other copies exist, local copy is set to DIRTY

Dragon Write-back Update Protocol
4 states
• Exclusive-clean or exclusive (E): I and memory have it
• Shared clean (Sc): I, others, and maybe memory, but I’m not
owner
• Shared modified (Sm): I and others but not memory, and I’m the
owner
– Sm and Sc can coexist in different caches, with only one Sm
• Modified or dirty (D): I and no one else
No invalid state
• If in cache, cannot be invalid
• If not present in cache, can view as being in not-present or
invalid state

Dragon Write-back Update Protocol
New processor events: PrRdMiss, PrWrMiss
• Introduced to specify actions when block not present in cache
New bus transaction: BusUpd
• Broadcasts single word written on bus; updates other relevant
caches

Dragon: behavior
(1) Read hit: proceed locally; no state change
(2) Read miss:
• if another cache has a D or Sm copy, it supplies the data and raises the SharedLine signal; the supplying cache sets its copy to Sm; local copy is set to Sc
• if no D or Sm copy exists, the value comes from memory; any cache with a copy (E or Sc) raises the SharedLine signal; local copy is set to Sc if SharedLine is raised, otherwise to E
(3) Write hit:
• if local copy is D, proceed locally; no state change
• if local copy is E, proceed locally; state set to D
• if local copy is Sm or Sc, delay the write and initiate a bus write; other caches get the new data, update their local copies and set them to Sc; local copy is set to Sm if other copies exist, otherwise to D
(4) Write miss: like a read miss, the copy comes from the cache with an Sm or D copy, otherwise from memory; if other copies exist they are set to Sc; local copy is set to Sm if other copies exist, or D if this is the only copy
Dragon State Transition Diagram
[Figure: state transition diagram over states E, Sc, Sm, M (M is the modified state, called D in the text). Arc labels recoverable from the extraction include: PrRd/-, PrWr/-, PrRdMiss/BusRd(S), PrWrMiss/BusRd(S), PrWrMiss/(BusRd(S); BusUpd), PrWr/BusUpd(S), BusRd/-, BusRd/Flush, BusUpd/Update]

Cache Coherence & Mem Ordering
• Cache coherence
• For a single memory location (address)
– Program order per process
– Value returned by read is the “latest” value written

– Write propagation: writes become visible to other processes

– Write serialization: all writes are seen in the same order by all
processes
• Memory Consistency:
• Order of operations on all memory locations (addresses):
• What order all memory accesses appear to happen
• Sequential consistency:
– as if all accesses (independent of location) happened in some serial
order which is an interleaving of local, individual program orders for
all addresses
Implementing SC
Two kinds of requirements
• Program order
– memory operations issued by a process must appear to
become visible (to others and itself) in program order
• Atomicity
– in the overall total order, one memory operation should appear
to complete with respect to all processes before the next one is
issued
– needed to guarantee that total order is consistent across processes
– tricky part is making writes atomic

Memory Order:
What Programmers Expect

[Figure: processors (P) issuing loads and stores directly to a single shared memory]

Memory is accessed in program order & atomically

Memory Accessing Order

Initially A = 0 and flag = 0

P1: A = 1;
    flag = 1;
P2: while (flag == 0);
    print A;

What should be value of A printed?

Memory Accessing Order: The Reality
What causes A to print 0?
• Out-of-order execution in the processor
• Compiler reordering of memory accesses
• Shared-memory hardware: network, write buffers, etc.

How do you make sure the programmer gets what they want?
• Change the programming interface: memory models
• The programmer enforces order through annotations
• What are the advantages/disadvantages?
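
The A/flag example can be explored by brute force. The sketch below enumerates every interleaving that preserves each processor's own operation order, which is one way to model sequential consistency, and collects the values P2 can print. Reordering is modeled, as an assumption, by simply swapping P1's two stores. The spin loop is modeled as a single "wait" step that only succeeds once flag is 1. All function names are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def printed_values(p1_ops):
    """Set of values P2 can print over all interleavings with P1's ops."""
    p2_ops = [("wait", "flag"), ("print", "A")]
    results = set()
    for order in interleavings(p1_ops, p2_ops):
        mem = {"A": 0, "flag": 0}
        printed = None
        for op in order:
            if op[0] == "store":
                mem[op[1]] = op[2]
            elif op[0] == "wait":
                if mem["flag"] == 0:
                    break             # loop would still be spinning: skip this order
            else:
                printed = mem["A"]
        else:
            results.add(printed)      # reached only when the loop did not break
    return results


in_order = [("store", "A", 1), ("store", "flag", 1)]
reordered = [("store", "flag", 1), ("store", "A", 1)]   # stores swapped

print(printed_values(in_order))
print(printed_values(reordered))
```

With the stores in program order only 1 can be printed; once they are swapped, 0 becomes possible as well.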

Write Atomicity
Write Atomicity: Position in total order at which a write
appears to perform should be the same for all processes
• Nothing a process does after it has seen the new value produced
by a write W should be visible to other processes until they too
have seen W

P1: A=1;
P2: while (A==0); B=1;
P3: while (B==0); print A;

•Transitivity implies A should print as 1 under SC

•Problem if P2 leaves loop, writes B, and P3 sees new B but old A (from
its cache, say)
Sufficient Conditions for SC
• Every process issues memory operations in program order
• After a write operation is issued, the issuing process waits for
the write to complete before issuing its next operation
• After a read operation is issued, the issuing process waits for
the read to complete, and for the write whose value is being
returned by the read to complete, before issuing its next
operation (provides write atomicity)
Sufficient, not necessary, conditions
Clearly, compilers should not reorder for SC, but they do!
• Loop transformations, register allocation (eliminates!)
Even if issued in order, hardware may violate for better performance
• Write buffers, out of order execution
Reason: uniprocessors care only about dependences to same location
• Makes the sufficient conditions very restrictive for performance
Coherence?

A memory operation M2 is subsequent to a memory operation M1 if the operations
are issued by the same processor and M2 follows M1 in program order.
A read is subsequent to write W if the read generates a bus xaction that follows that for W.
A write is subsequent to read or write M if M generates a bus xaction and the xaction for
the write follows that for M.
A write is subsequent to a read if the read does not generate a bus xaction and is not
already separated from the write by another bus xaction.

P0 : R R R W R R

P1 : R R R R R W

P2 : R R R R R R

SC in Write-through Example
Provides SC, not just coherence

Extend arguments used for coherence

• Writes and read misses to all locations serialized by bus into bus
order
• If read obtains value of write W, W guaranteed to have
completed
– since it caused a bus transaction
• When write W is performed w.r.t. any processor, all previous
writes in bus order have completed

Satisfying Coherence
Everything works as in the simple VI write-through protocol, except for:
• Writes that don’t appear on the bus:
– sequence of such writes between two bus xactions for the block
must come from same processor, say P
– in serialization, the sequence appears between these two bus
xactions
– reads by P will see them in this order w.r.t. other bus transactions
– reads by other processors separated from sequence by a bus
xaction, which places them in the serialized order w.r.t the writes
– so reads by all processors see writes in same order

P0 : R R R W W R

P1 : R R R R W

P2 : R R R R R

Write Serialization?
Sequence of events for a block:
• Write on bus from Px
• ... (other reads)
• Write on bus from Py
• Write hit (has to be from the same processor); local reads (Py)
• Read or write from Pz
Observations:
• The bus serializes all bus writes and reads
• Local reads are serialized with respect to those on the bus
• Local writes?

Satisfying Sequential Consistency

• Bus imposes total order on bus xactions for all locations

• Between xactions, procs perform reads/writes locally in program
order
• So any execution defines a natural partial order
– Mj is subsequent to Mi if (i) Mj follows Mi in program order on the same
processor, or (ii) Mj generates a bus xaction that follows the memory
operation for Mi
