Lecture 11 Post

Uploaded by

John
Copyright
© © All Rights Reserved
LOAD / STORE PROCESSING

CPEN 411
memory
data flow
(Apple won on appeal, Sept 28, 2018)
Learning objectives
• explain when loads can be reordered wrt stores

• motivate, describe, and emulate the store buffer

• motivate and explain the load-store queue (LSQ)

• explain the need for memory dependency prediction


Load bypassing (Tomasulo)
sw r2, 0(r9)
lw r2, 0(r8)
addi r4, r2, #0xf00
What is the add waiting for?
A: the store
B: the load ✓
C: Mr. Godot
• loads are on the critical path, stores are (usually) not
• would like to use cache bandwidth for loads
and send stores to the cache when free
(~20% better performance)
Store buffer
• IDEA: delay stores until bandwidth available

• queue stores in a store buffer


• allow loads to bypass

• mark as non-committed if store still in ROB


• mark as committed if store retired from ROB
[Figure: store buffer with addr and data fields per entry, split into not-yet-committed and committed entries, feeding the D$ write]
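A minimal sketch of emulating the store buffer described above. The class and method names (`StoreBuffer`, `allocate`, `commit_oldest`, `drain_one`) and the dictionary standing in for the D$ are assumptions made for illustration, not part of the lecture.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StoreEntry:
    addr: int
    data: int
    committed: bool = False  # False while the store is still in the ROB


class StoreBuffer:
    """FIFO of stores in program order; the oldest entry is on the left."""

    def __init__(self):
        self.entries = deque()

    def allocate(self, addr, data):
        """Queue a store instead of sending it to the D$ right away."""
        entry = StoreEntry(addr, data)
        self.entries.append(entry)
        return entry

    def commit_oldest(self):
        """Mark the oldest non-committed store as committed (retired from ROB)."""
        for entry in self.entries:
            if not entry.committed:
                entry.committed = True
                return entry
        return None

    def drain_one(self, dcache):
        """When D$ write bandwidth is free, write the oldest committed store."""
        if self.entries and self.entries[0].committed:
            entry = self.entries.popleft()
            dcache[entry.addr] = entry.data
            return entry
        return None


dcache = {}                      # hypothetical stand-in for the D$
sb = StoreBuffer()
sb.allocate(0x100, 7)
sb.allocate(0x104, 9)
first = sb.drain_one(dcache)     # nothing is committed yet, so nothing is written
sb.commit_oldest()
second = sb.drain_one(dcache)    # the store to 0x100 reaches the D$
```

Only committed entries are allowed to leave the buffer, which is what lets stores wait for free bandwidth without ever exposing speculative data to the cache.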
Necessary and sufficient action when a wild exception appears?
A: flush entire store buffer
B: write entire store buffer to D$
C: flush committed entries
D: flush uncommitted entries ✓
E: write committed entries to D$
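A small sketch of the exception case, assuming in-order commit so that every uncommitted store belongs to an instruction being squashed. The dictionary-based entries and the name `flush_uncommitted` are hypothetical.

```python
from collections import deque


def flush_uncommitted(store_buffer):
    """Drop stores that never retired; keep committed ones so they still reach the D$."""
    return deque(entry for entry in store_buffer if entry["committed"])


store_buffer = deque([
    {"addr": 0x100, "data": 1, "committed": True},
    {"addr": 0x104, "data": 2, "committed": True},
    {"addr": 0x108, "data": 3, "committed": False},  # still in the ROB, gets squashed
])
after_exception = flush_uncommitted(store_buffer)
```

Committed entries must survive because their instructions have already retired; they simply have not been written to the D$ yet.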
Is this design enough to allow bypassed loads?
A: yes, I’ll bet all my cookies
B: yes, I’ll bet one cookie
C: no, I’ll bet one cookie
D: no, I’ll bet all my cookies ✓
E: all your cookie are belong to us


problem: we have no way to check for dependencies
Load bypassing, revisited
sw r2, 0(r9)
lw r2, 0(r8)
addi r4, r2, #0xf00

loads can bypass stores only if their results
do not depend on previous stores*

*more on memory access ordering later
before deciding to execute a load, must check for dependencies vs:
A: entire store buffer ✓
B: committed entries
C: uncommitted entries
D: nothing
if load address matches a store buffer entry:
A: flush store buffer
B: flush committed
C: flush uncommitted
D: flush entire ROB
E: stall the load ✓
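The check-then-stall rule can be sketched as below. This assumes every queued store already has a known address and that addresses are compared for exact equality; the function name `load_may_bypass` is made up for the example.

```python
def load_may_bypass(load_addr, store_buffer):
    """True if no queued store, committed or not, targets load_addr."""
    return all(store["addr"] != load_addr for store in store_buffer)


store_buffer = [
    {"addr": 0x200, "data": 5, "committed": True},   # retired, not yet in the D$
    {"addr": 0x204, "data": 6, "committed": False},  # still in the ROB
]
bypass_ok = load_may_bypass(0x300, store_buffer)       # no match: go to the D$
must_stall = not load_may_bypass(0x200, store_buffer)  # match: stall the load
```

The committed entry matters as much as the uncommitted one: until it drains, the D$ still holds the old value for that address.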
Load bypassing so far
• store executed in a store buffer
• if load address matches in the store buffer, stall

• is stalling always necessary?


• what if store value already available?
• forward!
Load forwarding
sw r4, 0(r9)
lw r2, 0(r8)
addi r4, r2, #0xf00

• on RAW dependency:
if store value known, there is no need to stall
• value can be forwarded directly to waiting load
(additional ~5% performance)
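A hedged sketch of forwarding on top of the address check. It assumes the store buffer is ordered oldest to youngest, that all entries are older than the load, and that a store whose value is not yet known carries `data=None`; these conventions and the name `execute_load` are illustrative only.

```python
def execute_load(load_addr, store_buffer, dcache):
    """Return (action, value) for a load given the older stores still queued."""
    for store in reversed(store_buffer):      # the youngest matching store wins
        if store["addr"] == load_addr:
            if store["data"] is None:
                return ("stall", None)        # RAW dependency, value not ready
            return ("forward", store["data"])  # RAW dependency, value known
    return ("bypass", dcache.get(load_addr, 0))  # no dependency: read the D$


dcache = {0x408: 42}                          # hypothetical stand-in for the D$
store_buffer = [
    {"addr": 0x400, "data": 1},
    {"addr": 0x400, "data": 2},               # younger store to the same address
    {"addr": 0x404, "data": None},            # address known, value still pending
]
forwarded = execute_load(0x400, store_buffer, dcache)
stalled = execute_load(0x404, store_buffer, dcache)
bypassed = execute_load(0x408, store_buffer, dcache)
```

Searching from the youngest entry backwards matters when several queued stores hit the same address: the load must see the most recent one.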
Load bypassing can occur when...
A: a memory dependency exists
B: no memory dependency exists ✓
C: not sure
Load forwarding can occur when...
A: a memory dependency exists ✓
B: no memory dependency exists
C: not sure
When can dependencies be resolved?
• problem: must wait until addresses resolved
to determine if there is a dependency
• IDEA: speculate no dependency, cancel if wrong
• as stores complete, cancel mis-executed code
• better alternative: memory dependency prediction
(this is where Apple and Intel got in trouble)
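One simple flavour of memory dependency prediction, sketched under assumptions: a table of one-bit "wait" flags indexed by load PC, set whenever that load was caught mis-speculating. The table size, the indexing, and the name `WaitTable` are all hypothetical, and this is not necessarily the scheme the lecture has in mind.

```python
class WaitTable:
    """Per-load-PC hint: speculate past unresolved stores, or wait for them?"""

    def __init__(self, size=1024):
        self.size = size
        self.wait = [False] * size

    def _index(self, load_pc):
        return (load_pc >> 2) % self.size     # assumes 4-byte instructions

    def should_wait(self, load_pc):
        """True if this load should wait until older store addresses are known."""
        return self.wait[self._index(load_pc)]

    def train_on_violation(self, load_pc):
        """Called when a speculative load turned out to depend on an older store."""
        self.wait[self._index(load_pc)] = True


predictor = WaitTable()
first_time = predictor.should_wait(0x4000)    # no history: speculate
predictor.train_on_violation(0x4000)          # the speculation was wrong
second_time = predictor.should_wait(0x4000)   # now predicted dependent: wait
other_load = predictor.should_wait(0x4004)    # unrelated load still speculates
```

A table like this never forgets, so a practical design would also need some way to clear stale entries; that part is omitted here.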
[Figure: store buffer diagram extended with a further not-yet-committed region; labels: addr, data, not yet committed, committed, not yet committed, D$ write]
Flush ROB starting from finished speculative load when:
A: load addr = store addr, not committed
B: load addr = store addr, committed ✓
C: load value = store value, not committed
D: load value = store value, committed
E: don’t flush, forward load value
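A rough sketch of detecting a mis-speculated load by comparing addresses. It assumes each instruction carries a program-order sequence number (`seq`) and that the check runs once the store's address is known; exactly when a real design performs the check is left open here. The function name `first_violating_load` is made up for the example.

```python
def first_violating_load(store_seq, store_addr, executed_loads):
    """Return the oldest already-executed load that is younger than the store
    and read the same address, or None if the speculation held."""
    victims = [
        load for load in executed_loads
        if load["seq"] > store_seq and load["addr"] == store_addr
    ]
    return min(victims, key=lambda load: load["seq"], default=None)


executed_loads = [
    {"seq": 3, "addr": 0x500},   # older than the store: unaffected
    {"seq": 7, "addr": 0x500},   # younger, same address: read stale data
    {"seq": 9, "addr": 0x500},   # also stale, but seq 7 comes first
    {"seq": 8, "addr": 0x504},   # younger, different address: fine
]
victim = first_violating_load(5, 0x500, executed_loads)
no_victim = first_violating_load(5, 0x600, executed_loads)
```

The ROB flush would start at the returned load, since it and everything after it may have consumed the stale value.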
More memory data flow ideas
• observation: load address can often be predicted
• e.g., cache prefetchers do this pretty well
• this way load can be sent to the D$ even earlier
• cancel if prediction incorrect
• observation: load value often the same as last time
• predict load value directly before accessing D$
• instrs dependent on load value can proceed speculatively
• cancel if D$ returns a different value
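The "same as last time" observation can be sketched as a last-value predictor keyed by load PC. The class name `LastValuePredictor` and the unbounded dictionary are simplifying assumptions; a hardware table would be finite and would likely add confidence bits.

```python
class LastValuePredictor:
    """Predict that a load returns the value it returned last time."""

    def __init__(self):
        self.last = {}                        # load PC -> value seen last time

    def predict(self, load_pc):
        return self.last.get(load_pc)         # None means "no prediction"

    def resolve(self, load_pc, actual):
        """Called when the D$ returns; True means dependents must be cancelled."""
        predicted = self.last.get(load_pc)
        self.last[load_pc] = actual
        return predicted is not None and predicted != actual


lvp = LastValuePredictor()
cold = lvp.predict(0x40)                      # no history yet
lvp.resolve(0x40, 17)
warm = lvp.predict(0x40)                      # predicts 17
ok = lvp.resolve(0x40, 17)                    # D$ agrees: keep speculative work
bad = lvp.resolve(0x40, 18)                   # D$ disagrees: cancel dependents
```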
Summary
• would like to delay stores to save BW for loads
• store buffer keeps in-flight and committed stores
• load bypassing w/ the store buffer
• load forwarding from the store buffer
• speculative loads using the load-store queue
• memory dependency prediction
