MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM
Thread-level parallelism is exploited through two software models:
◦ Parallel processing -- execution of a tightly coupled set of threads collaborating on a single task.
◦ Request-level parallelism -- execution of multiple, relatively independent processes that may originate from one or more users.
RLP may be exploited by a single application running on multiple processors, such as a database responding to queries, or by multiple applications running independently, often called multiprogramming.
1. Single instruction stream, single data
stream (SISD) - Uniprocessor
Commodity clusters vs. custom clusters
Commodity clusters are a class of clusters where the nodes are truly commodities:
◦ headless workstations, motherboards, or blade servers
◦ connected with a SAN or LAN, usually accessible via an I/O bus.
Focus on throughput
• Large caches
• Disadvantages:
  ◦ Communication between processors is complex.
  ◦ Requires more effort in software to take advantage of the increased memory bandwidth.
1. Communication through a shared address space.
2. Address space consists of multiple private address spaces.
In the shared model, physically separate memories can be addressed as one logically shared address space.
Small-scale multiprocessors: several processors share a single memory, and communication among them is by reads and writes to that memory.
Caching private data vs. caching shared data:
◦ Private data: when cached, the location is migrated to the cache, reducing the average access time and the memory bandwidth required. No other processor uses the data, so caching behaves as in a uniprocessor.
◦ Shared data: when cached, the value may be replicated in multiple caches, reducing access latency, the memory bandwidth required, and contention. But replication introduces a new problem: cache coherence.
Time | Event                 | Cache for CPU A | Cache for CPU B | Memory loc. X
-----|-----------------------|-----------------|-----------------|--------------
0    |                       |                 |                 | 1
1    | CPU A reads X         | 1               |                 | 1
2    | CPU B reads X         | 1               | 1               | 1
3    | CPU A stores 0 into X | 0               | 1               | 0
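The incoherence in the table can be replayed as a toy sketch (write-through caches modeled as plain C variables; the names are illustrative, not a real protocol):

```c
#include <assert.h>

/* Toy model of the table above: two caches and memory for location X.
 * -1 means "not cached". No coherence mechanism is modeled. */
static int mem_X, cache_A, cache_B;

int stale_value_in_B(void) {
    mem_X = 1; cache_A = -1; cache_B = -1;  /* time 0: X = 1 in memory     */
    cache_A = mem_X;                        /* time 1: CPU A reads X       */
    cache_B = mem_X;                        /* time 2: CPU B reads X       */
    cache_A = 0; mem_X = 0;                 /* time 3: CPU A stores 0 into X */
    return cache_B;  /* without coherence, B still holds the stale value 1 */
}
```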
Coherence and consistency are complementary:
◦ Coherence defines the behavior of reads and writes to the same memory location.
◦ Consistency defines the behavior of reads and writes with respect to accesses to other memory locations (the memory consistency model).
Snooping protocol:
◦ No centralized state -- the sharing status of each block is kept in the caches themselves.
◦ All caches are accessible via some broadcast medium (bus or switch), and every cache controller snoops on the medium to maintain coherence.
◦ Communication with all caches on every cache miss.
◦ No centralized data structure -- inexpensive, but broadcast limits scalability.
Message type     | Source       | Destination  | Contents | Function
-----------------|--------------|--------------|----------|-------------------------------------------
Read miss        | Local cache  | Home dir     | P, A     | Processor P has a read miss at address A; request the data and make P a read sharer.
Write miss       | Local cache  | Home dir     | P, A     | Processor P has a write miss at address A; request the data and make P the exclusive owner.
Fetch            | Home dir     | Remote cache | A        | Fetch the block at address A and send it to its home directory; change the state of A in the remote cache to shared.
Fetch/invalidate | Home dir     | Remote cache | A        | Fetch the block at address A and send it to its home directory; invalidate the block in the cache.
Data value reply | Home dir     | Local cache  | D        | Return a data value from the home memory.
Data write back  | Remote cache | Home dir     | A, D     | Write back a data value for address A.
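The message table above can be encoded directly as C data types; this is an illustrative encoding, not part of any real implementation, with field names following the table columns:

```c
/* Directory protocol messages as plain data. */
enum msg_type { READ_MISS, WRITE_MISS, FETCH, FETCH_INVALIDATE,
                DATA_VALUE_REPLY, DATA_WRITE_BACK };

struct msg {
    enum msg_type type;
    int src, dst;      /* node ids: local cache, home directory, remote cache */
    int P;             /* requesting processor number (read/write miss) */
    unsigned A;        /* block address (miss, fetch, write back) */
    int D;             /* data value (reply, write back) */
};
```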
Assume a simple model of memory consistency. To minimize implementation complexity, make these assumptions:
◦ Messages will be received and acted upon in the same order in which they are sent.
◦ This ensures that invalidates sent by a processor are honored in the order they were sent.
[Figure: state transition diagram for an individual cache block in a directory-based system. States: Invalid, Shared (read only), Modified (read/write). Transitions are caused by CPU read/write hits and misses (which send read miss, write miss, and invalidate messages to the directory) and by requests from the home directory (fetch, invalidate, fetch/invalidate), which can force a data write back of a modified block.]
The write miss operation, which was broadcast on the bus in the snooping protocol, is replaced by data fetch and invalidate operations that are selectively sent by the directory controller.
The directory implements the other half of the coherence protocol. A message sent to the directory causes two actions:
1. updating the directory state;
2. sending additional messages to satisfy the request.
[Figure: state transition diagram for the directory, with states Uncached (sharers = {}), Shared (read only), and Exclusive (read/write). A read miss triggers a data value reply and adds the requestor P to the sharers set (sharers = sharers + {P}); a write miss triggers invalidates to the sharers, sets sharers = {P}, and moves the block to Exclusive; fetch/invalidate and data write back move an Exclusive block back to the Shared or Uncached state.]
Block state                          | Request    | Action
-------------------------------------|------------|-------------------------------------------
Uncached (copy in memory is current) | Read miss  | Requesting processor gets the data from memory; the requestor is the only sharing node; state becomes Shared.
                                     | Write miss | Requesting processor gets the value and becomes the sharing node; state becomes Exclusive; the requestor is the owner.
Shared (memory value up to date)     | Read miss  | Requesting processor gets the data from memory; the requesting processor is added to the sharing set.
                                     | Write miss | Requesting processor gets the data from memory; all sharers are sent invalidate messages; state becomes Exclusive.
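The Uncached and Shared rows of the table can be sketched as a toy directory controller in C (sharers as a bitmask; the Exclusive row, which requires a fetch from the owner, is omitted; names are illustrative):

```c
/* Sketch of the directory's response to read and write misses,
 * following the table above. Not a complete protocol. */
enum dstate { UNCACHED, SHARED, EXCLUSIVE };

struct dir_entry {
    enum dstate state;
    unsigned sharers;   /* bit p set => processor p has a copy */
};

void read_miss(struct dir_entry *e, int p) {
    /* Requestor gets the data from memory and joins the sharing set. */
    e->sharers |= 1u << p;
    e->state = SHARED;          /* Uncached or Shared -> Shared */
}

void write_miss(struct dir_entry *e, int p) {
    /* All other sharers would be sent invalidate messages here. */
    e->sharers = 1u << p;       /* requestor becomes the sole owner */
    e->state = EXCLUSIVE;
}
```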
Synchronization operations:
1. Atomic exchange
2. Test-and-set
3. Fetch-and-increment
Atomic exchange interchanges a value in a register for a value in memory.
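As a sketch, all three primitives are exposed by C11's <stdatomic.h> (the function names below are the standard C11 ones):

```c
#include <stdatomic.h>

int demo(void) {
    atomic_int m = 0;
    /* Atomic exchange doubles as test-and-set: write 1, get the old value. */
    int old = atomic_exchange(&m, 1);     /* old == 0, m == 1 */
    /* Fetch-and-increment: return the old value, add 1 atomically. */
    int prev = atomic_fetch_add(&m, 1);   /* prev == 1, m == 2 */
    return old * 10 + prev;               /* encodes both observed values */
}
```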
An alternative is a pair of instructions:
1. load linked (load locked)
2. store conditional
If the contents of the memory location specified by the load linked are changed before the store conditional to the same address occurs, the store conditional fails. If the processor does a context switch between the two instructions, the store conditional also fails. Store conditional is defined to return 1 if it was successful and 0 otherwise.
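LL/SC is not directly exposed in C, but a compare-and-swap retry loop has the same shape, and compilers lower C11 compare_exchange to LL/SC on architectures that provide it (ARM, RISC-V, MIPS). A sketch of atomic exchange built this way:

```c
#include <stdatomic.h>

/* "Load" the current value (the load-linked step), then attempt a
 * conditional store that fails if the location changed in between.
 * A failed store conditional (return 0) means retry. */
int ll_sc_exchange(atomic_int *loc, int newval) {
    int observed = atomic_load(loc);
    while (!atomic_compare_exchange_weak(loc, &observed, newval))
        ;  /* on failure, observed is refreshed with the current value */
    return observed;  /* the value that was atomically replaced */
}
```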
Spin locks – locks that a processor
continuously tries to acquire, spinning
around a loop until it succeeds.
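A spin lock can be sketched directly on top of atomic exchange (a minimal test-and-set lock, assuming C11 atomics; real implementations add backoff and test-and-test-and-set):

```c
#include <stdatomic.h>

typedef atomic_int spinlock_t;

/* Keep exchanging 1 into the lock until the old value comes back 0,
 * meaning we observed the lock free and now hold it. */
void spin_lock(spinlock_t *l) {
    while (atomic_exchange(l, 1) != 0)
        ;  /* spin: another processor holds the lock */
}

void spin_unlock(spinlock_t *l) {
    atomic_store(l, 0);
}
```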
When are they used? When the lock is expected to be held for a very short time, so acquiring it is low latency when the lock is free.
More formally:
◦ In every possible execution, for every shared data item,
◦ a write by one processor and an access (read or write) by another processor
◦ are separated by a pair of synchronization operations,
◦ one executed after the write and one before the access by the second processor.
That is, the program is data-race-free.
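The rule above can be sketched with POSIX threads (an illustrative example; the variable and function names are made up): the write and the later read of the shared datum are separated by an unlock after the write and a lock before the access.

```c
#include <pthread.h>
#include <stddef.h>

static int shared;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;

static void *writer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mu);
    shared = 42;                 /* the write ...                        */
    pthread_mutex_unlock(&mu);   /* ... then the "after the write" sync op */
    return NULL;
}

int read_shared(void) {
    pthread_mutex_lock(&mu);     /* the "before the access" sync op */
    int v = shared;
    pthread_mutex_unlock(&mu);
    return v;
}
```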
X --> Y
◦ X must complete before Y is done
Four possibilities:
◦ R --> R, R --> W, W --> W, W --> R
[Figure: write-invalidate snooping protocol state diagram for a write-back cache block. States: Invalid, Shared (read only), Exclusive (read/write). CPU-generated transitions: a CPU read miss places a read miss on the bus; a CPU write places a write miss on the bus, invalidating other copies; CPU read and write hits in Exclusive stay in Exclusive. Bus-generated transitions: a write miss observed for this block invalidates a Shared copy; for an Exclusive block, it forces a write back of the block, aborting the other processor's memory access until the write back completes.]