Lecture 11 Post
CPEN 411
memory
data flow
(Apple won on appeal, Sept 28, 2018)
Learning objectives
• explain when loads can be reordered wrt stores
Load bypassing
sw r2, 0(r9)
lw r2, 0(r8)
addi r4, r2, #0xf00
What is the add waiting for?
A: the store
B: the load ✓
C: Mr. Godot
Store buffer
• IDEA: delay stores until bandwidth available
• mark as non-committed if store still in ROB
Necessary and sufficient action when a wild exception appears?
[Figure: store buffer; labels: addr, data, not yet committed, committed, D$ write]
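A minimal sketch of the store buffer described above, written in Python purely for illustration. The class and method names, and the dictionary standing in for the D$, are assumptions rather than anything from the lecture. It assumes stores commit in program order, and that on an exception the not-yet-committed entries are discarded while committed entries still drain to the cache.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    addr: int
    data: int
    committed: bool = False  # False while the store is still in the ROB


class StoreBuffer:
    def __init__(self):
        self.entries = []  # oldest first (program order)
        self.dcache = {}   # hypothetical stand-in for the D$: addr -> data

    def execute_store(self, addr, data):
        """A store 'executes' by allocating a not-yet-committed entry."""
        entry = StoreEntry(addr, data)
        self.entries.append(entry)
        return entry

    def commit_oldest(self):
        """The ROB retires the oldest store that is not yet committed."""
        for entry in self.entries:
            if not entry.committed:
                entry.committed = True
                return entry
        return None

    def drain_one(self):
        """When D$ write bandwidth is free, write the oldest committed store."""
        if self.entries and self.entries[0].committed:
            entry = self.entries.pop(0)
            self.dcache[entry.addr] = entry.data
            return entry
        return None

    def flush_on_exception(self):
        """Drop speculative stores only; committed stores must still be written."""
        self.entries = [e for e in self.entries if e.committed]
```

Note that nothing in this model lets a load find out whether it conflicts with a buffered store.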
Is this design enough to
allow bypassed loads?
problem: we have no way to check for dependencies
Load bypassing, revisited
sw r2, 0(r9)
lw r2, 0(r8)
addi r4, r2, #0xf00
before deciding to execute a load, must check for dependencies vs:
A: entire store buffer ✓
B: committed entries
C: uncommitted entries
D: nothing
if load address matches a store buffer entry:
[Figure: store buffer; labels: addr, data, not yet committed, committed, D$ write]
Load bypassing so far
• store executed in a store buffer
• if load address matches in the store buffer, stall
• on RAW dependency:
if store value known, there is no need to stall
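The bypass check summarized above can be sketched as follows. This is an illustrative Python model, not the lecture's design: the function name and the tuple layout are assumptions, and it assumes every buffered store address is already known and that all accesses are the same size.

```python
def can_bypass(load_addr, store_buffer):
    """Return True if the load may go to the D$ ahead of the buffered stores.

    store_buffer: list of (addr, data, committed) tuples, oldest first.
    Every entry is checked, committed or not, because a committed store
    that has not yet been written is still invisible to the D$.
    """
    return all(addr != load_addr for addr, _data, _committed in store_buffer)
```

A load that fails this check stalls until the matching store has been written to the D$.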
Load forwarding
sw r4, 0(r9)
lw r2, 0(r8)
addi r4, r2, #0xf00
• on RAW dependency:
if store value known, there is no need to stall
• value can be forwarded directly to waiting load
(additional ~5% performance)
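A rough sketch of load forwarding under the same simplifications (same-size accesses, all store addresses known). The function name, the dictionary fields, and the `"stall"`, `"forwarded"`, and `"cache"` labels are hypothetical. The buffer is searched from youngest to oldest so that the most recent matching store supplies the value.

```python
def lookup_load(load_addr, store_buffer, dcache):
    """Return (value, source) for a load.

    store_buffer: list of dicts with 'addr' and 'data', oldest first;
    'data' is None while the store value is not yet known.
    dcache: hypothetical stand-in for the D$, mapping addr -> data.
    """
    for entry in reversed(store_buffer):  # youngest matching store wins
        if entry["addr"] == load_addr:
            if entry["data"] is None:
                return None, "stall"            # RAW, value not ready yet
            return entry["data"], "forwarded"   # RAW, value known: no stall
    return dcache.get(load_addr, 0), "cache"    # no match: load bypasses
```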
Load bypassing can occur when...
Load forwarding can occur when...
When can dependencies be resolved?
• problem: must wait until addresses resolved to determine if there is a dependency
Flush ROB starting from finished speculative load when:
[Figure: labels: data, not yet committed, committed, D$ write]
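One common condition for such a flush, offered here as an assumption since the slide leaves the answer open: an older store's address resolves and matches a younger load that has already executed speculatively. The sketch below uses hypothetical program-order sequence numbers (`seq`) and is conservative, since it ignores the case where the load was correctly forwarded from an intervening store.

```python
def find_violation(store_seq, store_addr, finished_loads):
    """Called when an older store's address finally resolves.

    finished_loads: list of (seq, addr) for loads that already executed
    speculatively; a larger seq means later in program order.
    Returns the seq of the oldest load to flush from, or None.
    """
    victims = [seq for seq, addr in finished_loads
               if seq > store_seq and addr == store_addr]
    return min(victims) if victims else None
```

Flushing from the oldest offending load onward also removes every instruction that may have consumed its stale value.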
More memory data flow ideas
• observation: load address can often be predicted
• e.g., cache prefetchers do this pretty well
• this way load can be sent to the D$ even earlier
• cancel if prediction incorrect
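As one possible illustration of address prediction, here is a simple stride predictor indexed by the load's PC. The stride scheme is an assumption chosen because it resembles what simple prefetchers do; the lecture does not name a specific predictor, and all names below are hypothetical.

```python
class StridePredictor:
    def __init__(self):
        self.table = {}  # load PC -> (last_addr, stride)

    def predict(self, pc):
        """Predicted address for this load, or None if the PC is unseen."""
        if pc not in self.table:
            return None
        last_addr, stride = self.table[pc]
        return last_addr + stride

    def update(self, pc, actual_addr):
        """Train with the resolved address; return True if the prediction was right."""
        predicted = self.predict(pc)
        if pc in self.table:
            last_addr, _ = self.table[pc]
            self.table[pc] = (actual_addr, actual_addr - last_addr)
        else:
            self.table[pc] = (actual_addr, 0)
        return predicted == actual_addr
```

When `update` returns False, the early D$ access would be cancelled and the load reissued with the real address.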
• observation: load value often the same as last time
• predict load value directly before accessing D$
• instrs dependent on load value can proceed speculatively
• cancel if D$ returns a different value
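The "same as last time" idea can be sketched as a last-value predictor indexed by the load's PC. This is a minimal model with hypothetical names; real designs would typically add confidence counters, which are omitted here.

```python
class LastValuePredictor:
    def __init__(self):
        self.last = {}  # load PC -> value the load returned last time

    def predict(self, pc):
        """Value to hand to dependent instructions, or None if no history."""
        return self.last.get(pc)

    def resolve(self, pc, actual):
        """Compare the D$ value with the prediction and train the table.

        Returns True if dependents ran on a wrong value and must be cancelled.
        """
        predicted = self.last.get(pc)
        self.last[pc] = actual
        return predicted is not None and predicted != actual
```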
Summary
• would like to delay stores to save BW for loads
• store buffer keeps in-flight and committed stores
• load bypassing w/ the store buffer
• load forwarding from the store buffer
• speculative loads using the load-store queue
• memory dependency prediction