Parallel Computer Architecture A Hardware-Software
Homework
Write two procedures -- Put_Task() and Get_Task() -- in any language, e.g. C, using Fetch&Add for synchronization, as follows:
• Define TD[1..n], the ToDo array; n = c * Processors for some constant c
• Define FF, for "first free," pointing to the first free cell in TD
• Define NA, for "next available," pointing to the next task in TD
• Put_Task(a) takes a task as input and places it in TD; Get_Task() returns the next task from TD if there is one
• The management of TD is completely decentralized
Homework
Strategy: TD[1..n] is several times larger than the number of processors; the system assumes at least 1 producer and 1 consumer
Put_Task(a) {
    slot = Fetch&Add(FF, 1);
    if slot == n then Fetch&Add(FF, -n);
    if slot > n then slot = slot - n;
    while TD[slot] != 0 do wait(rand());
    TD[slot] = a;
}

Get_Task() {
    var temp;
    slot = Fetch&Add(NA, 1);
    if slot == n then Fetch&Add(NA, -n);
    if slot > n then slot = slot - n;
    while TD[slot] == 0 do wait(rand());
    temp = TD[slot]; TD[slot] = 0;
    return temp;
}

Put waits if the slot is occupied, to avoid overrun; Get waits if the slot is empty. No task id is 0, since 0 marks an empty slot.
Shared Memory
Shared memory was claimed to be a poor model
because it does not scale
– Many vendors have sold small shared memory
machines
• Some like SMPs work well (but modeled poorly by PRAM)
• Some never worked -- KSR
• Some worked because of a technology opportunity -- slow
processors with a “fast interconnect”
• Some work on small scale, but not beyond 64 processors
and everyone tries to ignore that fact -- Origin-2000
– Many researchers have come up with great ideas,
but they still remain unproved
Architecture of an SMP
• A symmetric multiprocessor (SMP) is a set of
processor/cache pairs connected to a bus
• The bus is both good news and bad news
• The (memory) bus is a point at which all processors can
“see” memory activity, and can know what is happening
• A bus is used “serially,” and becomes a “bottleneck,”
limiting scaling
[Figure: processors P0..P3, each with a cache, connected by a bus to memory]
Recall Caches
• Cache blocks (lines) contain several words
• Blocks have state
  – Valid
  – Invalid
  – Dirty = differs from memory
• Cache writing
  – Write-through means update memory on all writes
  – Write-back means wait and update memory when the block is replaced or invalidated
  – “allocate” vs “no-allocate” on a write miss
Cache Coherence -- The Problem
• Processors can modify shared locations
without other processors being aware of it
unless special hardware is added
Cache Coherency -- The Goal
A multiprocessor memory system is coherent if
for every location there exists a serial order
for the operations on that location consistent
with the results of the execution such that
• The subsequence of operations for any processor is in
the order issued
• The value returned by each read is the value written by
the last write in serial order
Example serial order for one location: p1:i, p3:j, p2:k, p1:i+1, p3:j+1, ...
[Figure: P1, P2, P3 caching location a (value 4), issuing writes (w:a) and reads (r:a) to memory]
Write Serialization
Implied property of Cache Coherency: Write Serialization
… all writes to a location are seen in the same order by all processors
Snooping To Solve Coherency
• The cache controllers can “snoop” on the
bus, meaning that they watch the events on
the bus even if they do not issue them, noting
any action relevant to cache lines they hold
• There are two possible actions when a
location held by processor A is changed by
processor B
• Invalidate -- mark the local copy as invalid
• Update -- make the same change B made
The unit of cache coherency is a cache line or block
Snooping
When the cache controller “snoops,” it sees requests
by its own processor as well as bus activity by
other processors that is not local to it
[Figure: P1, P2, P3 on the bus above memory; arrows distinguish activity from the local processor and activity from others]
Snooping At Work I
By snooping, the cache controller for processor
P3 can take action in response to P1’s write
• P1 reads a into its cache
• P3 reads a into its cache
• P1 changes a to 5 and writes through to main memory; P3 sees the action and invalidates the location
[Figure: P1’s cache holds a = 5; P3’s stale copy of a = 4 is marked invalid; memory goes from 4 to 5]
Snooping At Work II
By snooping, the cache controller for processor
P3 can take action in response to P1’s write
• P1 reads a into its cache
• P3 reads a into its cache
• P1 changes a to 5 and writes through to main memory; P3 sees the action and invalidates the location or updates it
[Figure: same sequence; under update, P3’s copy becomes 5 instead of being invalidated]
Write-through Coherency
• State diagrams show the protocol
Partial Order On Memory Operations
[Figure: per-processor sequences of reads (R) and writes (W) to a location, forming a partial order across processors]
Memory Consistency
• What should it mean for processors to see a
consistent view of memory?
• Coherency is too weak because it only
requires ordering with respect to individual
locations, but there are other ways of binding
values together

P0: [a, flag initially 0]
    a := 1;
    flag := 1;

P1:
    while (flag == 0) { };   (then uses a)

Coherency requires only that the 0 --> 1 transition on flag is seen consistently; it does not by itself order the write of a before the write of flag
Basic Write-back Snoopy Cache Design
• Write-back protocols are more complex than
write-through because modified data remains
in the cache
• Introduce more cache states to handle that:
• Modified, or dirty -- the value differs from memory
• Exclusive -- no other cache has this location
• Consider an MSI protocol with three states:
• Modified -- data is correct locally, different from memory
• Shared (Valid) -- data at this location is correct
• Invalid -- data at this location is not correct
MSI Protocol
• Rdx means that the cache holds a modified value of
the location and asks for exclusive permission to read
• Reply means put the value on the bus for another
processor to read

Transitions (processor or snooped bus event / resulting bus action):
  M: PrRd/--; PrWr/--; snooped BusRd/Reply -> S; snooped BusRdx/Reply -> I
  S: PrRd/--; PrWr/BusRdx -> M; snooped BusRd/--; snooped BusRdx/-- -> I
  I: PrRd/BusRd -> S; PrWr/BusRdx -> M
MSI Protocol In Action
Action    P0   P1   P2   Bus    Data From
P0:r a    S    -    -    BRd    Mem
P2:r a    S    -    S    BRd    Mem
P2:w a    I    -    M    BRdx   --
P0:r a    S    -    S    BRd    P2
P1:r a    S    S    S    BRd    Mem
Critique of MSI
Bad: 2 bus ops to load and then update a value, even
without any sharing

Action    P0   Pi   Bus    Data From
P0:r a    S    -    BRd    Mem
P0:w a    M    -    BRdx   --
Break
Illinois Protocol

Notation: P = processor, B = bus, Rd = Read, Rdx = Read Exclusive,
RdS = Read (shared line asserted), RdS' = Read (shared line not asserted),
Rp = Reply, Rp' = Reply by some other cache (“someone”)

Adds an Exclusive (E) state to MSI: a read miss that no other cache holds
fills the line in E, and a later local write moves E -> M silently, with no
bus operation

Action    P0   Pi   Bus    Data From
P0:r a    E    -    BRd    Mem
P0:w a    M    -    --     --
Alternative … Updating
• One caching issue is “invalidation” vs “update”:
Dragon

States: E (Exclusive), Sc (Shared Clean), Sm (Shared Modified), M (Modified)

Action    P1   P2   P3   Bus    Data From
P1:r a    E    -    -    BRd    Mem
P3:r a    Sc   -    Sc   BRd    Mem
P3:w a    Sc   -    Sm   BUpd   P3
P1:r a    Sc   -    Sm   null   --
P2:r a    Sc   Sc   Sm   BRd    P3
Invalidation vs Update
1 Repeat k times: P1 writes V, P2..Pp read V
  … perhaps representing work allocation
2 Repeat k times: P1 writes V M times, P2 reads
  … perhaps representing a sharing pair

Costs: invalidate = 6B, update = 14B, miss = 70B; P = 16, M = 10, k = 10

Scenario 1: Update = 1,260B; Invalidate = 10,624B
Scenario 2: Update = 1,400B; Invalidate = 824B
Implications of Blocksize
True/False Sharing
• If two processors reference the same cache
line and the same word, they are “truly”
sharing
• If two processors reference the same cache
line but a different word, they are “falsely”
sharing
[Figure: one cache line of several words; P0 references one word, P1 a different word in the same line]
Discussion
[Figure: P0..P7 connected to memory holding locations A, B, C]
Summary
• SMPs solve shared memory by snooping
• Key to SMP’s success is the bus, a site for
serializing memory references
• Buses work, but only for a small number of
processors (64 is an upper limit, and fewer is
better)
• Relative to the two requirements of shared
memory -- acceptable costs, coherency -- the
SMP meets both