
Lecture 10: Memory Dependence Detection and Speculation

This lecture covers memory dependence detection and speculation: how store and load instructions depend on register values from other instructions and on earlier memory writes, and how dynamic memory disambiguation techniques such as load bypassing, load forwarding, and speculative load execution exploit memory-level parallelism while maintaining memory correctness.


Memory correctness, dynamic memory disambiguation, speculative disambiguation, Alpha 21264 example

Register and Memory Dependences

- Store: SW Rt, A(Rs)
  1. Calculate effective memory address ⇒ dependent on Rs
  2. Write to D-Cache ⇒ dependent on Rt, and cannot be speculative
- Load: LW Rt, A(Rs)
  1. Calculate effective memory address ⇒ dependent on Rs
  2. Read D-Cache ⇒ could be memory-dependent on pending writes!
- Compare "ADD Rd, Rs, Rt": what is the difference? When is the memory dependence known?
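As a concrete illustration of the question above (this example is mine, not from the slides), the C fragment below shows why the store-load dependence is known only at run time: whether the load must wait for the store depends on whether p and q hold the same address, and the hardware learns that only after both effective addresses have been calculated.

  #include <stdio.h>

  /* Whether the load through q depends on the store through p is decided
   * only by the computed addresses, not by the instruction encoding. */
  void store_then_load(int *p, int *q, int x)
  {
      *p = x;          /* SW: address depends on p, data on x        */
      int y = *q;      /* LW: may or may not depend on the SW above  */
      printf("%d\n", y);
  }

  int main(void)
  {
      int a = 1, b = 2;
      store_then_load(&a, &a, 5);   /* aliasing: the load must see 5        */
      store_then_load(&a, &b, 7);   /* no aliasing: the load is independent */
      return 0;
  }

In contrast, the register dependence of "ADD Rd, Rs, Rt" is visible as soon as the instruction is decoded.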

Memory Correctness and Performance

- Correctness conditions:
  - Only committed store instructions can write to memory
  - Any load instruction receives its memory operand from its parent (a store instruction)
  - At the end of execution, any memory word receives the value of the last write
- Performance: exploit memory-level parallelism

Load/store Buffer in Tomasulo

- Original Tomasulo: load/store addresses are pre-calculated before scheduling
  - Loads are not dependent on other instructions
  - Stores are dependent on the instructions producing the store data
- Provide dynamic memory disambiguation: check the memory dependence between stores and loads
- (Figure: IM and Fetch Unit feed Decode/Rename/Regfile with a Reorder Buffer; the S-buf, L-buf, and reservation stations issue to DM, FU1, and FU2)
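A minimal sketch (field and function names are mine, not from the slides or any real Tomasulo implementation) of how the buffer entries described above might be represented: a load entry carries only its pre-calculated address, a store entry also carries the tag of the instruction producing its data, and a simple scan gives the dynamic disambiguation check between a load and the buffered stores.

  #include <stdint.h>
  #include <stdbool.h>

  typedef struct {
      bool     busy;
      uint32_t addr;        /* effective address, pre-calculated before scheduling */
  } load_entry;

  typedef struct {
      bool     busy;
      uint32_t addr;        /* effective address, pre-calculated before scheduling */
      bool     data_ready;  /* false while the producing instruction is in flight  */
      int      data_tag;    /* RS tag of the instruction producing the store data  */
      uint32_t data;
  } store_entry;

  /* Dynamic disambiguation: a load may access memory only if no earlier,
   * still-buffered store writes the same address. */
  bool load_conflicts(const load_entry *ld, const store_entry *sbuf, int n_stores)
  {
      for (int i = 0; i < n_stores; i++)
          if (sbuf[i].busy && sbuf[i].addr == ld->addr)
              return true;
      return false;
  }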

Dynamic Scheduling with Integer Instructions

- Centralized design example:
  - Centralized reservation stations usually include the load buffer
  - Integer units are shared by load/store and ALU instructions
- What is the challenge in detecting memory dependence?
- (Figure: IM and Fetch Unit feed Decode/Rename/Regfile with a Reorder Buffer; a centralized RS issues to I-FU, I-FU, FU, FU; the S-buf holds addr/data pairs in front of the D-Cache)

Load/Store with Dynamic Execution

- Only committed store instructions can write to memory
  ⇒ use the store buffer as a temporary place for store (write) instruction output
- Any memory word receives the value of the last write
  ⇒ store instructions write to memory in program order
- Memory-level parallelism can be exploited
  ⇒ non-speculative solution: load bypassing and load forwarding
  ⇒ speculative solution: speculative load execution

Store Buffer Design Example

- Store instruction:
  1. Wait in RS until the base address and data are ready
  2. Calculate address, move to store buffer
  3. Move data directly to store buffer
  4. Wait for commit
  - If no exception/mis-predict:
    5. Wait for memory port
    6. Write to D-cache
  - Otherwise flushed before writing the D-cache
- (Figure: the RS and I-FU fill store buffer entries holding C, Ry, addr, and data fields, ordered young to old; committed entries form architectural state and drain to the D-Cache)

Memory Dependence

- Any load instruction receives its memory operand from its parent (a store instruction)
- If any previous store has not written the D-cache, what to do?
- If any previous store has not finished, what to do?
- Simple design: delay all following loads; but how about performance?
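The sketch below (structure and function names are my assumptions, not the lecture's) walks through the store lifecycle listed above: an entry is filled once the address and data are ready, is marked committed in program order, drains to the D-cache from the head when a memory port is free, and is discarded if a flush arrives before it commits.

  #include <stdint.h>
  #include <stdbool.h>

  #define SBUF_SIZE 8

  /* Hypothetical store buffer entry. */
  typedef struct {
      bool     valid;
      bool     committed;
      uint32_t addr;
      uint32_t data;
  } sbuf_entry;

  typedef struct {
      sbuf_entry e[SBUF_SIZE];
      int head;   /* oldest entry: the only one allowed to write the D-cache */
      int tail;   /* next free slot for a newly dispatched store             */
  } store_buffer;

  /* Steps 2-3: the calculated address and produced data move the store
   * from the RS into the buffer (the sketch ignores the buffer-full case). */
  void sbuf_insert(store_buffer *sb, uint32_t addr, uint32_t data)
  {
      sbuf_entry *e = &sb->e[sb->tail];
      e->valid = true;  e->committed = false;
      e->addr  = addr;  e->data = data;
      sb->tail = (sb->tail + 1) % SBUF_SIZE;
  }

  /* Step 4: the ROB retires stores in program order, so the oldest
   * not-yet-committed entry is the one that commits. */
  void sbuf_commit_one(store_buffer *sb)
  {
      for (int i = sb->head; i != sb->tail; i = (i + 1) % SBUF_SIZE) {
          if (sb->e[i].valid && !sb->e[i].committed) {
              sb->e[i].committed = true;
              return;
          }
      }
  }

  /* Steps 5-6: when a memory port is free, the committed head entry is
   * written to a toy D-cache; entries drain strictly in program order. */
  void sbuf_write_head(store_buffer *sb, uint32_t dcache[], uint32_t dcache_words)
  {
      sbuf_entry *e = &sb->e[sb->head];
      if (e->valid && e->committed) {
          dcache[e->addr % dcache_words] = e->data;
          e->valid = false;
          sb->head = (sb->head + 1) % SBUF_SIZE;
      }
  }

  /* An exception or mis-prediction discards the uncommitted (youngest)
   * entries before they ever reach the D-cache. */
  void sbuf_flush_uncommitted(store_buffer *sb)
  {
      while (sb->tail != sb->head &&
             !sb->e[(sb->tail + SBUF_SIZE - 1) % SBUF_SIZE].committed) {
          sb->tail = (sb->tail + SBUF_SIZE - 1) % SBUF_SIZE;
          sb->e[sb->tail].valid = false;
      }
  }

Draining only from the head is what enforces the condition that stores write to memory in program order.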

Memory-level Parallelism

- for (i=0;i<100;i++) A[i] = A[i]*2;

  Loop: L.S  F2, 0(R1)
        MULT F2, F2, F4
        SW   F2, 0(R1)
        ADD  R1, R1, 4
        BNE  R1, R3, Loop
  (F4 stores 2.0)
- Significant improvement over strictly sequential reads and writes
- (Figure: the reads and writes of successive iterations overlap instead of alternating read/write in strict sequence)

Load Bypassing and Load Forwarding

- Non-speculative solution
- Dynamic disambiguation: match the load address with all store addresses
- Load bypassing: start the cache read if no match is found
- Load forwarding: use the store buffer value if a match is found
- In-order execution limitation: must wait until all previous stores have finished
- (Figure: the RS issues to the I-FUs and the store unit; the load address is matched against store buffer entries before the D-cache)
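A sketch of the disambiguation step described above (naming is mine): the load address is compared against every older store in the buffer; on no match the load bypasses to the cache, on a match with ready data it is forwarded from the buffer, and otherwise it must wait.

  #include <stdint.h>
  #include <stdbool.h>

  typedef struct {
      bool     valid;
      bool     addr_ready;   /* store address already calculated? */
      bool     data_ready;   /* store data already produced?      */
      uint32_t addr;
      uint32_t data;
  } sb_entry;

  typedef enum { LOAD_BYPASS, LOAD_FORWARD, LOAD_WAIT } load_action;

  /* older_stores[0] is the oldest buffered store, older_stores[n-1] the
   * youngest store that is still older than this load. */
  load_action disambiguate(uint32_t load_addr,
                           const sb_entry *older_stores, int n,
                           uint32_t *forwarded)
  {
      for (int i = n - 1; i >= 0; i--) {          /* youngest matching store wins */
          const sb_entry *s = &older_stores[i];
          if (!s->valid) continue;
          if (!s->addr_ready)
              return LOAD_WAIT;                   /* unknown address: be conservative */
          if (s->addr == load_addr) {
              if (s->data_ready) { *forwarded = s->data; return LOAD_FORWARD; }
              return LOAD_WAIT;                   /* match, but data not produced yet */
          }
      }
      return LOAD_BYPASS;                         /* no match: read the D-cache */
  }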

In-order Execution Limitation

- Example 1:

  for (i=0;i<100;i++) A[i] = A[i]/2;

  Loop: L.S F2, 0(R1)
        DIV F2, F2, F4
        SW  F2, 0(R1)
        ADD R1, R1, 4
        BNE R1, R3, Loop

  When is the SW result available, and when can the next load start?
  Possible solution: start the store address calculation early ⇒ more complex design
- Example 2:

  a->b->c = 100;
  d = x;

  When is the address "a->b->c" available?

Speculative Load Execution

- If no dependence is predicted, send loads out even if the dependence is unknown
- Do address matching at store commit:
  1. Match found: memory dependence violation, flush the pipeline
  2. Otherwise: continue
- Note: may still need load forwarding (not shown)
- (Figure: the RS issues to the I-FUs; the store-q and load-q are matched against each other in front of the D-cache)
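A sketch of the speculative alternative (names are hypothetical): loads are sent out as soon as their address is ready, and each committing store is checked against the younger loads that have already executed; a matching address means a memory dependence violation and a pipeline flush from that load.

  #include <stdint.h>
  #include <stdbool.h>
  #include <stdio.h>

  typedef struct {
      bool     executed;    /* load already returned a (possibly stale) value */
      uint32_t addr;
      uint64_t seq;         /* program-order number of the load               */
  } lq_entry;

  /* Called when a store commits: any younger load that already executed
   * with the same address read stale data, so the pipeline must flush. */
  bool store_commit_check(uint32_t store_addr, uint64_t store_seq,
                          const lq_entry lq[], int n)
  {
      for (int i = 0; i < n; i++) {
          if (lq[i].executed && lq[i].seq > store_seq &&
              lq[i].addr == store_addr) {
              printf("violation at load seq %llu: flush pipeline\n",
                     (unsigned long long)lq[i].seq);
              return true;
          }
      }
      return false;          /* no violation: continue */
  }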

Alpha 21264 Pipeline

- (Figure: the int issue queue feeds two Addr ALUs and two Int ALUs backed by two copies of the 80-entry Int RF; the fp issue queue feeds two FP ALUs backed by the 72-entry FP RF; loads and stores pass through the D-TLB, L-Q, S-Q, and AF to a dual D-Cache)

Alpha 21264 Load/Store Queues

- 32-entry load queue, 32-entry store queue

Load Bypassing, Forwarding, and RAW Detection

- (Figure: load addresses are held in the load-q and store addresses in the store-q in front of the D-cache; matching is done at commit, driven by the ROB head)
- If a load matches a store address in the store-q: forward the store data
- If a store matches a load address in the load-q: mark a store-load trap to flush the pipeline (at commit)
- At commit, depending on whether the ROB head is a load or a store:
  - Load: WAIT if the LQ head is not completed, then move the LQ head
  - Store: mark the SQ head as completed, then move the SQ head

Speculative Memory Disambiguation

- A 1024-entry, 1-bit table helps predict memory dependence:
  - Whenever a load causes a violation, set its stWait bit in the table
  - When the load is fetched, get its stWait bit from the table and send it to the issue queue with the load instruction
  - A load waits in the issue queue if its stWait bit is set and any previous store exists
  - The table is cleared periodically
- (Figure: the commit PC indexes the 1024 1-bit entry table; the renamed load carries the bit into the int issue queue)
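A sketch of the 1-bit stWait table described above; the index function and the way the periodic clear is triggered are my assumptions, not details given on the slide.

  #include <stdint.h>
  #include <stdbool.h>
  #include <string.h>

  #define STWAIT_ENTRIES 1024

  static uint8_t stwait[STWAIT_ENTRIES];   /* 1 bit per entry, stored as a byte */

  /* Assumed index function: low-order bits of the (word-aligned) load PC. */
  static unsigned stwait_index(uint32_t pc) { return (pc >> 2) % STWAIT_ENTRIES; }

  /* When a load causes a violation, remember that loads at this PC should wait. */
  void stwait_train(uint32_t load_pc) { stwait[stwait_index(load_pc)] = 1; }

  /* At fetch, the bit travels with the load to the issue queue; the load is
   * held there while the bit is set and any earlier store is still pending. */
  bool stwait_should_wait(uint32_t load_pc, bool prior_store_pending)
  {
      return stwait[stwait_index(load_pc)] && prior_store_pending;
  }

  /* The table is cleared periodically so stale predictions age out. */
  void stwait_clear(void) { memset(stwait, 0, sizeof stwait); }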

Architectural Memory States

- Completed entries: LQ and SQ
- Committed states: L1-Cache, L2-Cache, L3-Cache (optional), Memory, Disk, Tape, etc.
- Memory request: search the hierarchy from top to bottom

Summary of Superscalar Execution

- Instruction flow techniques: branch prediction, branch target prediction, and instruction prefetch
- Register data flow techniques: register renaming, instruction scheduling, in-order commit, mis-prediction recovery
- Memory data flow techniques: load/store units, memory consistency

Source: Shen & Lipasti reference book
