Lecture 15
Carnegie Mellon
Register %rdx
• Forward from the memory stage
Register %rax
• Forward from the execute stage
Bypass Paths
Decode Stage
• Forwarding logic selects valA and valB
• Normally from register file
• Forwarding: get valA or valB from later pipeline stage
Forwarding Sources
• Execute: valE
• Memory: valE, valM
• Write back: valE, valM
Out-of-order Execution
• Compiler could do this, but has limitations
• Generally done in hardware
Long-latency instruction: the load r3 = MEM[r0] forces the pipeline to stall.

  In-order:         Out-of-order:
  r0 = r1 + r2      r0 = r1 + r2
  r3 = MEM[r0]      r3 = MEM[r0]
  r4 = r3 + r6      r7 = r5 + r1
  r7 = r5 + r1      …
  …                 r4 = r3 + r6
Out-of-order Execution

Is this reordering correct?
  In-order:         Reordered:
  r0 = r1 + r2      r0 = r1 + r2
  r3 = MEM[r0]      r3 = MEM[r0]
  r4 = r3 + r6      r6 = r5 + r1
  r6 = r5 + r1      …
  …                 r4 = r3 + r6
No: r6 = r5 + r1 now writes r6 before r4 = r3 + r6 has read the old value (a write-after-read hazard).

And is this one correct?
  In-order:         Reordered:
  r0 = r1 + r2      r0 = r1 + r2
  r3 = MEM[r0]      r3 = MEM[r0]
  r4 = r3 + r6      r4 = r5 + r1
  r4 = r5 + r1      …
  …                 r4 = r3 + r6
No: the two writes to r4 now complete in the wrong order, leaving r4 holding r3 + r6 instead of r5 + r1 (a write-after-write hazard).
Ideal Memory
• Low access time (latency)
• High capacity
• Low cost
• High bandwidth (to support multiple accesses in parallel)
The Problem
• Ideal memory’s requirements oppose each other
• Bigger is slower
  • A bigger memory takes longer to determine the location of the requested data
[Figure: a memory array with an n-bit Address input, CE (chip enable) and WE (write enable) control signals, and a Content port.]
Non-volatile Memories
• DFF, DRAM, and SRAM are volatile memories
  • They lose their contents when powered off
• Non-volatile memories retain data even when powered off
  • Flash (~5 years retention)
  • Hard disk (~5 years)
  • Tape (~15-30 years)
Summary of Trade-Offs
• Faster is more expensive (dollars and chip area)
  • SRAM: < $10 per megabyte
  • DRAM: < $1 per megabyte
  • Hard disk: < $1 per gigabyte
• Larger capacity is slower
  • Flip-flops / small SRAM: sub-nanosecond
  • SRAM: KB ~ MB, ~nanoseconds
  • DRAM: gigabytes, ~50 nanoseconds
  • Hard disk: terabytes, ~10 milliseconds
Memory Hierarchy
• Fundamental tradeoff
  • Fast memory: small
  • Large memory: slow
• Balance latency, cost, size, and bandwidth

[Figure: CPU → Registers (DFF) → Cache (SRAM) → Main Memory (DRAM) → Hard Disk]
L1 cache (SRAM): ~32 KB, ~nanoseconds
L2 cache (SRAM): 512 KB ~ 1 MB, many nanoseconds
L3 cache (SRAM): …
Hard disk: 100 GB, ~10 milliseconds
[Figure: a quad-core die: CORE 0-3, each with a private L1 cache (L1 CACHE 0-3) and L2 cache (L2 CACHE 0-3); a SHARED L3 CACHE; and a DRAM memory controller with a DRAM interface connecting to off-chip DRAM modules.]
My Desktop
My Server
[Figure: RF (DFF) → Cache (SRAM) → Main Memory (DRAM) → Disk]
[Figure: the CPU and its registers, a cache ($), and memory.]
Registers vs. Cache
• If the data is in memory, the hardware may keep a copy of it in the cache to speed up access.
Locality
• Temporal locality: recently referenced items are likely to be referenced again in the near future
• Spatial locality: items with nearby addresses tend to be referenced close together in time
Locality Example

int sum_array(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

• Data references
  • Spatial locality: array elements are referenced in succession (a stride-1 reference pattern)
  • Temporal locality: the variable sum is referenced on every iteration
• Instruction references
  • Spatial locality: instructions are referenced in sequence
  • Temporal locality: the loop body is executed repeatedly
Cache Illustrations

[Figure: the CPU sits above a cache (small but fast) holding blocks 8, 9, 14, and 3; below is memory (big but slow) holding blocks 0-15.]

Hit: the data at address 14 is needed, so the CPU requests it. Address 14 is in the cache: hit! The cache supplies the data directly.

Miss: the data at address 12 is needed, so the CPU requests it. Address 12 is not in the cache, so the block is fetched from memory, address 12 is stored in the cache, and the data is supplied to the CPU.
Hit Rate = # Hits / # Accesses
A Simple Cache

[Figure: a 4-entry cache (locations 00-11, each with a Content field and a Valid? bit) alongside a 16-location memory (addresses 0000-1111).]

• 16 memory locations, 4 cache locations
  • A cache location is also called a cache line
• Every cache line has a valid bit, indicating whether that line contains valid data; all valid bits are 0 initially
• For now, assume cache-line size == memory-location size == 1 B
• Assume each memory location can reside in only one cache line
• The cache is smaller than memory (obviously)
  • Thus, not all memory locations can be cached at the same time
Cache Placement
• Given a memory address, say 0x0001, we want to put the data stored there into the cache; where does the data go?
[Figure: the cache address (CA) is formed from the low-order bits of the memory address: CA = addr[1:0].]
Direct-Mapped Cache

[Figure: the 4-line cache, each line now holding a tag (addr[3:2]) alongside its content; CA = addr[1:0]; comparing the stored tag with addr[3:2] of the requested address produces the Hit? signal.]

• Direct-mapped cache
  • CA = ADDR[1], ADDR[0]
  • Always use the low-order address bits
• Multiple addresses can be mapped to the same cache location
  • E.g., 0010 and 1010
• How do we differentiate between different memory locations that are mapped to the same cache location?
  • Add a tag field for that purpose
  • What should the tag field be? ADDR[3] and ADDR[2] in this particular example