
Lecture 10: Memory Dependence Detection and Speculation

This lecture covers memory dependence detection and speculation: how store and load instructions depend on register values from other instructions and on earlier memory writes, and how dynamic memory disambiguation techniques such as load bypassing, load forwarding, and speculative load execution exploit memory-level parallelism while maintaining memory correctness.


Memory correctness, dynamic memory disambiguation, speculative disambiguation, Alpha 21264 example

Register and Memory Dependences

- Store: SW Rt, A(Rs)
  1. Calculate effective memory address ⇒ dependent on Rs
  2. Write to D-Cache ⇒ dependent on Rt, and cannot be speculative
- Load: LW Rt, A(Rs)
  1. Calculate effective memory address ⇒ dependent on Rs
  2. Read D-Cache ⇒ could be memory-dependent on pending writes!
- Compare "ADD Rd, Rs, Rt": what is the difference? When is the memory dependence known?
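As a concrete illustration of the question above (this example is mine, not from the slides), the C fragment below shows why the store-load dependence is known only at run time: whether the load must wait for the store depends on whether p and q hold the same address, and the hardware learns that only after both effective addresses have been calculated.

  #include <stdio.h>

  /* Whether the load through q depends on the store through p is decided
   * only by the computed addresses, not by the instruction encoding. */
  void store_then_load(int *p, int *q, int x)
  {
      *p = x;          /* SW: address depends on p, data on x        */
      int y = *q;      /* LW: may or may not depend on the SW above  */
      printf("%d\n", y);
  }

  int main(void)
  {
      int a = 1, b = 2;
      store_then_load(&a, &a, 5);   /* aliasing: the load must see 5        */
      store_then_load(&a, &b, 7);   /* no aliasing: the load is independent */
      return 0;
  }

In contrast, the register dependence of "ADD Rd, Rs, Rt" is visible as soon as the instruction is decoded.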

Memory Correctness and Performance

- Correctness conditions:
  - Only committed store instructions can write to memory
  - Any load instruction receives its memory operand from its parent (a store instruction)
  - At the end of execution, any memory word receives the value of the last write
- Performance: exploit memory-level parallelism

Load/store Buffer in Tomasulo

- Original Tomasulo: load/store addresses are pre-calculated before scheduling
  - Loads are not dependent on other instructions
  - Stores are dependent on the instructions producing the store data
- Provide dynamic memory disambiguation: check the memory dependence between stores and loads
- (Figure: IM and Fetch Unit feed Decode/Rename/Regfile with a Reorder Buffer; the S-buf, L-buf, and reservation stations issue to DM, FU1, and FU2)
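A minimal sketch (field and function names are mine, not from the slides or any real Tomasulo implementation) of how the buffer entries described above might be represented: a load entry carries only its pre-calculated address, a store entry also carries the tag of the instruction producing its data, and a simple scan gives the dynamic disambiguation check between a load and the buffered stores.

  #include <stdint.h>
  #include <stdbool.h>

  typedef struct {
      bool     busy;
      uint32_t addr;        /* effective address, pre-calculated before scheduling */
  } load_entry;

  typedef struct {
      bool     busy;
      uint32_t addr;        /* effective address, pre-calculated before scheduling */
      bool     data_ready;  /* false while the producing instruction is in flight  */
      int      data_tag;    /* RS tag of the instruction producing the store data  */
      uint32_t data;
  } store_entry;

  /* Dynamic disambiguation: a load may access memory only if no earlier,
   * still-buffered store writes the same address. */
  bool load_conflicts(const load_entry *ld, const store_entry *sbuf, int n_stores)
  {
      for (int i = 0; i < n_stores; i++)
          if (sbuf[i].busy && sbuf[i].addr == ld->addr)
              return true;
      return false;
  }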

Dynamic Scheduling with Integer Instructions

- Centralized design example:
  - Centralized reservation stations usually include the load buffer
  - Integer units are shared by load/store and ALU instructions
- What is the challenge in detecting memory dependence?
- (Figure: IM and Fetch Unit feed Decode/Rename/Regfile with a Reorder Buffer; a centralized RS issues to I-FU, I-FU, FU, FU; the S-buf holds addr/data pairs in front of the D-Cache)

Load/Store with Dynamic Execution

- Only committed store instructions can write to memory
  ⇒ use the store buffer as a temporary place for store (write) instruction output
- Any memory word receives the value of the last write
  ⇒ store instructions write to memory in program order
- Memory-level parallelism can be exploited
  ⇒ non-speculative solution: load bypassing and load forwarding
  ⇒ speculative solution: speculative load execution

Store Buffer Design Example

- Store instruction:
  1. Wait in RS until the base address and data are ready
  2. Calculate address, move to store buffer
  3. Move data directly to store buffer
  4. Wait for commit
  - If no exception/mis-predict:
    5. Wait for memory port
    6. Write to D-cache
  - Otherwise flushed before writing the D-cache
- (Figure: the RS and I-FU fill store buffer entries holding C, Ry, addr, and data fields, ordered young to old; committed entries form architectural state and drain to the D-Cache)

Memory Dependence

- Any load instruction receives its memory operand from its parent (a store instruction)
- If any previous store has not written the D-cache, what to do?
- If any previous store has not finished, what to do?
- Simple design: delay all following loads; but how about performance?
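The sketch below (structure and function names are my assumptions, not the lecture's) walks through the store lifecycle listed above: an entry is filled once the address and data are ready, is marked committed in program order, drains to the D-cache from the head when a memory port is free, and is discarded if a flush arrives before it commits.

  #include <stdint.h>
  #include <stdbool.h>

  #define SBUF_SIZE 8

  /* Hypothetical store buffer entry. */
  typedef struct {
      bool     valid;
      bool     committed;
      uint32_t addr;
      uint32_t data;
  } sbuf_entry;

  typedef struct {
      sbuf_entry e[SBUF_SIZE];
      int head;   /* oldest entry: the only one allowed to write the D-cache */
      int tail;   /* next free slot for a newly dispatched store             */
  } store_buffer;

  /* Steps 2-3: the calculated address and produced data move the store
   * from the RS into the buffer (the sketch ignores the buffer-full case). */
  void sbuf_insert(store_buffer *sb, uint32_t addr, uint32_t data)
  {
      sbuf_entry *e = &sb->e[sb->tail];
      e->valid = true;  e->committed = false;
      e->addr  = addr;  e->data = data;
      sb->tail = (sb->tail + 1) % SBUF_SIZE;
  }

  /* Step 4: the ROB retires stores in program order, so the oldest
   * not-yet-committed entry is the one that commits. */
  void sbuf_commit_one(store_buffer *sb)
  {
      for (int i = sb->head; i != sb->tail; i = (i + 1) % SBUF_SIZE) {
          if (sb->e[i].valid && !sb->e[i].committed) {
              sb->e[i].committed = true;
              return;
          }
      }
  }

  /* Steps 5-6: when a memory port is free, the committed head entry is
   * written to a toy D-cache; entries drain strictly in program order. */
  void sbuf_write_head(store_buffer *sb, uint32_t dcache[], uint32_t dcache_words)
  {
      sbuf_entry *e = &sb->e[sb->head];
      if (e->valid && e->committed) {
          dcache[e->addr % dcache_words] = e->data;
          e->valid = false;
          sb->head = (sb->head + 1) % SBUF_SIZE;
      }
  }

  /* An exception or mis-prediction discards the uncommitted (youngest)
   * entries before they ever reach the D-cache. */
  void sbuf_flush_uncommitted(store_buffer *sb)
  {
      while (sb->tail != sb->head &&
             !sb->e[(sb->tail + SBUF_SIZE - 1) % SBUF_SIZE].committed) {
          sb->tail = (sb->tail + SBUF_SIZE - 1) % SBUF_SIZE;
          sb->e[sb->tail].valid = false;
      }
  }

Draining only from the head is what enforces the condition that stores write to memory in program order.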

Memory-level Parallelism

- for (i=0;i<100;i++) A[i] = A[i]*2;

  Loop: L.S  F2, 0(R1)
        MULT F2, F2, F4
        SW   F2, 0(R1)
        ADD  R1, R1, 4
        BNE  R1, R3, Loop
  (F4 stores 2.0)
- Significant improvement over strictly sequential reads and writes
- (Figure: the reads and writes of successive iterations overlap instead of alternating read/write in strict sequence)

Load Bypassing and Load Forwarding

- Non-speculative solution
- Dynamic disambiguation: match the load address with all store addresses
- Load bypassing: start the cache read if no match is found
- Load forwarding: use the store buffer value if a match is found
- In-order execution limitation: must wait until all previous stores have finished
- (Figure: the RS issues to the I-FUs and the store unit; the load address is matched against store buffer entries before the D-cache)
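A sketch of the disambiguation step described above (naming is mine): the load address is compared against every older store in the buffer; on no match the load bypasses to the cache, on a match with ready data it is forwarded from the buffer, and otherwise it must wait.

  #include <stdint.h>
  #include <stdbool.h>

  typedef struct {
      bool     valid;
      bool     addr_ready;   /* store address already calculated? */
      bool     data_ready;   /* store data already produced?      */
      uint32_t addr;
      uint32_t data;
  } sb_entry;

  typedef enum { LOAD_BYPASS, LOAD_FORWARD, LOAD_WAIT } load_action;

  /* older_stores[0] is the oldest buffered store, older_stores[n-1] the
   * youngest store that is still older than this load. */
  load_action disambiguate(uint32_t load_addr,
                           const sb_entry *older_stores, int n,
                           uint32_t *forwarded)
  {
      for (int i = n - 1; i >= 0; i--) {          /* youngest matching store wins */
          const sb_entry *s = &older_stores[i];
          if (!s->valid) continue;
          if (!s->addr_ready)
              return LOAD_WAIT;                   /* unknown address: be conservative */
          if (s->addr == load_addr) {
              if (s->data_ready) { *forwarded = s->data; return LOAD_FORWARD; }
              return LOAD_WAIT;                   /* match, but data not produced yet */
          }
      }
      return LOAD_BYPASS;                         /* no match: read the D-cache */
  }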

In-order Execution Limitation

- Example 1:

  for (i=0;i<100;i++) A[i] = A[i]/2;

  Loop: L.S F2, 0(R1)
        DIV F2, F2, F4
        SW  F2, 0(R1)
        ADD R1, R1, 4
        BNE R1, R3, Loop

  When is the SW result available, and when can the next load start?
  Possible solution: start the store address calculation early ⇒ more complex design
- Example 2:

  a->b->c = 100;
  d = x;

  When is the address "a->b->c" available?

Speculative Load Execution

- If no dependence is predicted, send loads out even if the dependence is unknown
- Do address matching at store commit:
  1. Match found: memory dependence violation, flush the pipeline
  2. Otherwise: continue
- Note: may still need load forwarding (not shown)
- (Figure: the RS issues to the I-FUs; the store-q and load-q are matched against each other in front of the D-cache)
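A sketch of the speculative alternative (names are hypothetical): loads are sent out as soon as their address is ready, and each committing store is checked against the younger loads that have already executed; a matching address means a memory dependence violation and a pipeline flush from that load.

  #include <stdint.h>
  #include <stdbool.h>
  #include <stdio.h>

  typedef struct {
      bool     executed;    /* load already returned a (possibly stale) value */
      uint32_t addr;
      uint64_t seq;         /* program-order number of the load               */
  } lq_entry;

  /* Called when a store commits: any younger load that already executed
   * with the same address read stale data, so the pipeline must flush. */
  bool store_commit_check(uint32_t store_addr, uint64_t store_seq,
                          const lq_entry lq[], int n)
  {
      for (int i = 0; i < n; i++) {
          if (lq[i].executed && lq[i].seq > store_seq &&
              lq[i].addr == store_addr) {
              printf("violation at load seq %llu: flush pipeline\n",
                     (unsigned long long)lq[i].seq);
              return true;
          }
      }
      return false;          /* no violation: continue */
  }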

Alpha 21264 Pipeline

- (Figure: the int issue queue feeds two Addr ALUs and two Int ALUs backed by two copies of the 80-entry Int RF; the fp issue queue feeds two FP ALUs backed by the 72-entry FP RF; loads and stores pass through the D-TLB, L-Q, S-Q, and AF to a dual D-Cache)

Alpha 21264 Load/Store Queues

- 32-entry load queue, 32-entry store queue

Load Bypassing, Forwarding, and RAW Detection

- (Figure: load addresses are held in the load-q and store addresses in the store-q in front of the D-cache; matching is done at commit, driven by the ROB head)
- If a load matches a store address in the store-q: forward the store data
- If a store matches a load address in the load-q: mark a store-load trap to flush the pipeline (at commit)
- At commit, depending on whether the ROB head is a load or a store:
  - Load: WAIT if the LQ head is not completed, then move the LQ head
  - Store: mark the SQ head as completed, then move the SQ head

Speculative Memory Disambiguation

- A 1024-entry, 1-bit table helps predict memory dependence:
  - Whenever a load causes a violation, set its stWait bit in the table
  - When the load is fetched, get its stWait bit from the table and send it to the issue queue with the load instruction
  - A load waits in the issue queue if its stWait bit is set and any previous store exists
  - The table is cleared periodically
- (Figure: the commit PC indexes the 1024 1-bit entry table; the renamed load carries the bit into the int issue queue)
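A sketch of the 1-bit stWait table described above; the index function and the way the periodic clear is triggered are my assumptions, not details given on the slide.

  #include <stdint.h>
  #include <stdbool.h>
  #include <string.h>

  #define STWAIT_ENTRIES 1024

  static uint8_t stwait[STWAIT_ENTRIES];   /* 1 bit per entry, stored as a byte */

  /* Assumed index function: low-order bits of the (word-aligned) load PC. */
  static unsigned stwait_index(uint32_t pc) { return (pc >> 2) % STWAIT_ENTRIES; }

  /* When a load causes a violation, remember that loads at this PC should wait. */
  void stwait_train(uint32_t load_pc) { stwait[stwait_index(load_pc)] = 1; }

  /* At fetch, the bit travels with the load to the issue queue; the load is
   * held there while the bit is set and any earlier store is still pending. */
  bool stwait_should_wait(uint32_t load_pc, bool prior_store_pending)
  {
      return stwait[stwait_index(load_pc)] && prior_store_pending;
  }

  /* The table is cleared periodically so stale predictions age out. */
  void stwait_clear(void) { memset(stwait, 0, sizeof stwait); }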

Architectural Memory States

- Completed entries: LQ and SQ
- Committed states: L1-Cache, L2-Cache, L3-Cache (optional), Memory, Disk, Tape, etc.
- Memory request: search the hierarchy from top to bottom

Summary of Superscalar Execution

- Instruction flow techniques: branch prediction, branch target prediction, and instruction prefetch
- Register data flow techniques: register renaming, instruction scheduling, in-order commit, mis-prediction recovery
- Memory data flow techniques: load/store units, memory consistency

Source: Shen & Lipasti reference book
