R12 U5 MultiProcessor Architectures
Multiprocessor Architectures
P. Raja vadhana
AP CSE
Computer Architecture II
BE CSE G2 S6
15/3/2016
Contents
Multiprocessors
Factors:
Silicon usage
Power consumption
Cost of ILP & TLP
Cache Coherence
Reason
Existence
Shared variable exhibits 2 states
Global state
Local state
Occurrence
View of memory held by two different processors is through their individual caches
Consistency
Properties of Coherency
(each sequence is ordered in time)
Program Order / Spaced:
1. P writes to location X
2. No other writes in between
3. P reads from location X
Access returns the value written by P
Write Serialization:
1. B writes to location X
2. A writes to location X
Access order is seen the same by all processors
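The write serialization property above can be checked mechanically. The following is a minimal sketch, not a hardware model: it assumes a hypothetical log of the values each processor observed at location X, and it accepts the log only if every processor's observations fit the single global write order.

```python
def is_subsequence(seen, order):
    """True if the values in `seen` appear in `order` in the same relative order."""
    remaining = iter(order)
    # `value in remaining` advances the iterator, so order is enforced.
    return all(value in remaining for value in seen)


def writes_serialized(write_order, observations):
    """Check that no processor saw the writes to X in a different order.

    write_order  -- the global order of values written to X (assumed known)
    observations -- hypothetical map: processor name -> values it read from X
    """
    return all(is_subsequence(seen, write_order) for seen in observations.values())


# B writes 1 to X, then A writes 2 to X.
write_order = [1, 2]
coherent = {"P0": [1, 2], "P1": [2]}       # P1 simply missed the first write
incoherent = {"P0": [1, 2], "P1": [2, 1]}  # P1 saw the writes reversed
```

A processor may miss an intermediate value, but it may never see a later write followed by an earlier one.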
Models of Memory Consistency
Delayed Invalidation
Solution:
Sequential Consistency
Delay memory access completion until all invalidations caused by that access are completed
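A small sketch of what sequential consistency rules out. It assumes the usual two-processor example (P1 writes A then reads B, P2 writes B then reads A, both variables initially 0) and enumerates every interleaving that keeps each processor's program order. The names `P1`, `P2`, `interleavings` and `sc_outcomes` are illustrative only.

```python
# Each step is (processor, operation, variable), listed in program order.
P1 = [("P1", "write", "A"), ("P1", "read", "B")]
P2 = [("P2", "write", "B"), ("P2", "read", "A")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each list."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"A": 0, "B": 0}
    seen = {}
    for proc, op, var in schedule:
        if op == "write":
            memory[var] = 1
        else:
            seen[proc] = memory[var]
    return (seen["P1"], seen["P2"])  # (value P1 read from B, value P2 read from A)


def sc_outcomes():
    return {run(schedule) for schedule in interleavings(P1, P2)}
```

Under these assumptions no interleaving lets both reads return 0. A machine with delayed invalidations could produce that result, which is why sequential consistency holds an access back until its invalidations are complete.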
Model: orderings relaxed
Processor Consistency or Total Store Ordering: W->R
Partial Store Ordering: W->R, W->W (R->W and R->R are kept)
Weak Ordering or Release Consistency: W->R, W->W, R->W, R->R
Basics Of Synchronization
Implement an atomic structure using hardware primitives
Implement a Coherence Mechanism
LL-SC
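SWAP:
try:  MOV    R3, R4       ; copy the exchange value into R3
      LL     R2, 0(R1)    ; load linked: read the current value
      SC     R3, 0(R1)    ; store conditional: R3 = 1 on success, 0 on failure
      BEQZ   R3, try      ; retry if the store failed
      MOV    R4, R2       ; put the loaded value in R4
FETCH & INCREMENT:
try:  LL     R2, 0(R1)    ; load linked
      DADDUI R3, R2, #1   ; increment
      SC     R3, 0(R1)    ; store conditional
      BEQZ   R3, try      ; retry if the store failed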
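The retry behaviour of the fetch-and-increment loop can be imitated in software. This is only a toy model under stated assumptions: `LLSCMemory` is a hypothetical single-word memory that tracks which CPUs hold a valid link, and the `interfere` hook is an invented way to let another CPU run between the LL and the SC.

```python
class LLSCMemory:
    """Toy model of one memory word with load-linked / store-conditional."""

    def __init__(self, value=0):
        self.value = value
        self.linked = set()  # CPUs whose link to this word is still valid

    def ll(self, cpu):
        self.linked.add(cpu)
        return self.value

    def sc(self, cpu, new_value):
        if cpu not in self.linked:
            return 0  # link was broken by another store: fail
        self.value = new_value
        self.linked.clear()  # a successful store breaks every outstanding link
        return 1


def fetch_and_increment(mem, cpu, interfere=None):
    """Mirror of the try: loop. Returns (old value, number of attempts)."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.ll(cpu)              # LL     R2, 0(R1)
        if interfere is not None:
            interfere()                # another CPU runs between LL and SC
            interfere = None           # only disturb the first attempt
        if mem.sc(cpu, old + 1):       # DADDUI + SC
            return old, attempts       # BEQZ falls through on success
```

For example, `fetch_and_increment(mem, 0, interfere=lambda: fetch_and_increment(mem, 1))` makes CPU 0's first SC fail, so it loops once more and still produces a correct increment.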
Coherence Mechanism
SPIN LOCKS:
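A spin lock is commonly built from an atomic exchange: a CPU swaps 1 into the lock word and keeps trying until the value it gets back is 0. The sketch below is a simulation, not real concurrency. It assumes a hypothetical round-robin scheduler in which each CPU is a Python generator, and `exchange` stands in for the atomic primitive.

```python
def exchange(mem, key, new):
    """Stand-in for an atomic exchange: store `new`, return the old value."""
    old = mem[key]
    mem[key] = new
    return old


def worker(mem, rounds, use_lock):
    """One simulated CPU; every `yield` lets the other CPUs run."""
    for _ in range(rounds):
        # Spin until the exchange returns 0, meaning the lock was free.
        while use_lock and exchange(mem, "lock", 1) != 0:
            yield
        tmp = mem["count"]       # critical section: read the shared counter
        yield                    # other CPUs run in the middle of the update
        mem["count"] = tmp + 1   # write the incremented value back
        if use_lock:
            mem["lock"] = 0      # release with an ordinary store of 0
        yield


def run(n_cpus=3, rounds=4, use_lock=True):
    mem = {"lock": 0, "count": 0}
    cpus = [worker(mem, rounds, use_lock) for _ in range(n_cpus)]
    while cpus:
        for cpu in list(cpus):   # round-robin over the CPUs still running
            try:
                next(cpu)
            except StopIteration:
                cpus.remove(cpu)
    return mem["count"]
```

With the lock every increment survives. With `use_lock=False` the same schedule loses updates, because several CPUs read the counter before any of them writes it back.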
Simple Illustration of Snoop
Remote CPU access to local cache block
Problem
Bus activity | Cache content in P0 | State of block containing __ | Cache content in P1 | State of block containing __ | Memory content
Solve the problem using the above format under the following two cases:
1. X and Y in SAME cache block/cache line
2. X and Y in DIFFERENT cache block/cache line
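The two cases can be compared with a small write-invalidate simulation. This is a sketch under assumptions that are not in the slides: two processors, an MSI-style protocol, and a trace in which P0 repeatedly writes X while P1 repeatedly writes Y. The `block_of` map decides whether X and Y share a cache block.

```python
def simulate(trace, block_of):
    """Snoop a trace of (cpu, op, var) and return the list of bus transactions."""
    caches = {0: {}, 1: {}}  # cpu -> {block: "M" or "S"}; a missing entry means Invalid
    bus = []                 # (cpu, bus activity, block)
    for cpu, op, var in trace:
        blk, other = block_of[var], 1 - cpu
        state = caches[cpu].get(blk, "I")
        if op == "read" and state == "I":
            if caches[other].get(blk) == "M":
                caches[other][blk] = "S"   # remote owner writes back and downgrades
            caches[cpu][blk] = "S"
            bus.append((cpu, "read miss", blk))
        elif op == "write" and state != "M":
            if blk in caches[other]:
                del caches[other][blk]     # snooped write invalidates the remote copy
                bus.append((cpu, "write miss + invalidate", blk))
            else:
                bus.append((cpu, "write miss", blk))
            caches[cpu][blk] = "M"
        # Any other access is a hit and causes no bus activity.
    return bus


def invalidations(bus):
    return sum(1 for _, activity, _ in bus if "invalidate" in activity)


trace = [(0, "write", "X"), (1, "write", "Y")] * 3
same_block = simulate(trace, {"X": 0, "Y": 0})        # case 1
different_blocks = simulate(trace, {"X": 0, "Y": 1})  # case 2
```

In case 1 the block bounces between the two caches even though the processors never touch the same variable. In case 2 each processor misses once and then hits in its own cache.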
Performance of UMA
Limitations
True Sharing Vs False Sharing
Scalability of Cache Coherence
Simple Illustration of Directory
Home Node vs Requesting/Local Node vs Owner/Remote Node
Performance of NUMA
Transition
Gray: Request from remote node
Black: Actions taken by Home directory
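The home directory's side of these transitions can be sketched as a small state machine. The sketch assumes the common three-state directory (Uncached, Shared, Exclusive) with a sharer set per block. The class name, message names and integer node ids are illustrative, not taken from the slides.

```python
class DirectoryEntry:
    """Home-node directory entry for one memory block."""

    def __init__(self):
        self.state = "Uncached"  # Uncached | Shared | Exclusive
        self.sharers = set()     # nodes holding a copy (the single owner if Exclusive)

    def read_miss(self, node):
        """Request from a remote node; returns the messages the home node sends."""
        msgs = []
        if self.state == "Exclusive":
            owner = next(iter(self.sharers))
            msgs.append(("fetch", owner))  # owner supplies the data and keeps a shared copy
        msgs.append(("data reply", node))
        self.sharers.add(node)
        self.state = "Shared"
        return msgs

    def write_miss(self, node):
        msgs = []
        if self.state == "Shared":
            msgs += [("invalidate", s) for s in sorted(self.sharers - {node})]
        elif self.state == "Exclusive":
            owner = next(iter(self.sharers))
            msgs.append(("fetch/invalidate", owner))  # old owner gives up the block
        msgs.append(("data reply", node))
        self.sharers = {node}
        self.state = "Exclusive"
        return msgs

    def write_back(self, node):
        """Owner evicts the block: memory is up to date again, nobody caches it."""
        self.sharers = set()
        self.state = "Uncached"
        return []
```

Unlike a snooping bus, the home node sends invalidations only to the nodes recorded in the sharer set, which is what lets the scheme scale beyond a single shared bus.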