
CSC 252/452: Computer Organization

Fall 2024: Lecture 15

Instructor: Yanan Guo

Department of Computer Science


University of Rochester

Data Dependencies: 3 Nops


                        1  2  3  4  5  6  7  8  9  10 11
0x000: irmovq $10,%rdx  F  D  E  M  W
0x00a: irmovq $3,%rax      F  D  E  M  W
0x014: nop                    F  D  E  M  W
0x015: nop                       F  D  E  M  W
0x016: nop                          F  D  E  M  W
0x017: addq %rdx,%rax                  F  D  E  M  W
0x019: halt                               F  D  E  M  W

addq reads the correct %rdx and %rax: the three nops delay addq's decode to cycle 7, one cycle after irmovq $3,%rax writes %rax back in cycle 6.

Data Forwarding Example


                        1  2  3  4  5  6  7  8
0x000: irmovq $10,%rdx  F  D  E  M  W
0x00a: irmovq $3,%rax      F  D  E  M  W
0x014: addq %rdx,%rax         F  D  E  M  W
0x016: halt                      F  D  E  M  W

Register %rdx
• Forward from the memory stage
Register %rax
• Forward from the execute stage


Bypass Paths
Decode Stage
• Forwarding logic selects valA and valB
• Normally from register file
• Forwarding: get valA or valB from a later pipeline stage
Forwarding Sources
• Execute: valE
• Memory: valE, valM
• Write back: valE, valM
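
To make this concrete, here is a minimal C sketch of the decode-stage forwarding mux for valA. It is illustrative only: the stage-value names (e_valE, M_valM, ...) and the "prefer the newest value" priority order follow the textbook PIPE design, while RNONE and the exact signature are assumptions.

#include <stdint.h>

#define RNONE 0xF  /* "no register" ID */

/* Pick valA for the instruction in decode: take the newest pending
   value for register srcA from execute, memory, or write-back;
   otherwise read the register file. */
uint64_t select_valA(uint64_t srcA, uint64_t rf_valA,
                     uint64_t e_dstE, uint64_t e_valE,   /* execute */
                     uint64_t M_dstE, uint64_t M_valE,   /* memory */
                     uint64_t M_dstM, uint64_t M_valM,
                     uint64_t W_dstE, uint64_t W_valE,   /* write-back */
                     uint64_t W_dstM, uint64_t W_valM)
{
    if (srcA == RNONE)  return rf_valA; /* no register operand */
    if (srcA == e_dstE) return e_valE;  /* ALU result, execute stage */
    if (srcA == M_dstM) return M_valM;  /* loaded value, memory stage */
    if (srcA == M_dstE) return M_valE;  /* ALU result, memory stage */
    if (srcA == W_dstM) return W_valM;  /* loaded value, write-back */
    if (srcA == W_dstE) return W_valE;  /* ALU result, write-back */
    return rf_valA;                     /* fall back to register file */
}

In the forwarding example above, this is exactly why addq's %rdx comes from the memory stage (M_valE) and its %rax from the execute stage (e_valE).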


Out-of-order Execution
• Compiler could do this, but has limitations
• Generally done in hardware

Long-latency instruction.
Forces the pipeline to stall.

Original:         Reordered:
r0 = r1 + r2      r0 = r1 + r2
r3 = MEM[r0]      r3 = MEM[r0]
r4 = r3 + r6      r7 = r5 + r1
r7 = r5 + r1      …
…                 r4 = r3 + r6


Out-of-order Execution

Is this correct?

Original:         Reordered:
r0 = r1 + r2      r0 = r1 + r2
r3 = MEM[r0]      r3 = MEM[r0]
r4 = r3 + r6      r6 = r5 + r1
r6 = r5 + r1      …
…                 r4 = r3 + r6

Is this correct?

Original:         Reordered:
r0 = r1 + r2      r0 = r1 + r2
r3 = MEM[r0]      r3 = MEM[r0]
r4 = r3 + r6      r4 = r5 + r1
r4 = r5 + r1      …
…                 r4 = r3 + r6

Neither reordering is correct: the first changes the value of r6 that r4 = r3 + r6 reads, and the second changes the final value left in r4, so the hardware must track such name dependences (e.g., by register renaming).

The Tomasulo algorithm is the algorithm most widely implemented in modern hardware to get out-of-order execution right.
So far in 252…

[Diagram: CPU (PC, register file, condition codes, ALU) exchanging addresses, data, and instructions with memory (code, data, stack)]

• We have been discussing the CPU microarchitecture


• Single Cycle, sequential implementation
• Pipeline implementation
• Resolving data dependency and control dependency
• What about memory?


Ideal Memory
• Low access time (latency)
• High capacity
• Low cost
• High bandwidth (to support multiple accesses in parallel)


The Problem
• Ideal memory’s requirements oppose each other
• Bigger is slower
• Bigger → takes longer to determine the location

• Faster is more expensive


• Memory technology: Flip-flop vs. SRAM vs. DRAM vs. Disk vs.
Tape

• Higher bandwidth is more expensive


• Need more ports, higher frequency, or faster technology


Memory Technology: RAM


• Random access memory
• Random access means you can supply an arbitrary address to the
memory and get a value back

[Diagram: a RAM block with an n-bit Address input, CE (chip enable) and WE (write enable) control signals, and a Content port]
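
To make the interface concrete, here is a toy behavioral model in C (purely illustrative; only the CE/WE/Address/Content signal names come from the slide):

#include <stdint.h>

/* One access to a 2^n-location RAM: CE gates the access,
   WE selects write vs. read. */
uint8_t ram_access(uint8_t *ram, unsigned addr,
                   int ce, int we, uint8_t data_in)
{
    if (!ce)
        return 0;             /* chip not enabled: no access */
    if (we)
        ram[addr] = data_in;  /* write: update the content */
    return ram[addr];         /* read out the content */
}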

Latch vs. DRAM vs. SRAM


• DFF (Data Flip-Flop)
• Fastest
• Low density (27 transistors per bit)
• High cost

• SRAM (Static RAM)


• Faster access than DRAM (no capacitor)
• Lower density than DRAM (6 transistors per bit; there are designs w/ fewer Ts)
• Higher cost than DRAM
• Lower power consumption compared to DRAM
• Manufacturing compatible with logic process (no capacitor)
• DRAM (Dynamic RAM)
• Slower access (capacitor)
• Higher density (1 transistor + 1 capacitor per bit)
• Lower cost
• Higher power consumption compared to SRAM
• Manufacturing requires putting capacitor and logic together

Non-volatile Memories
• DFF, DRAM and SRAM are volatile memories
• Lose information if powered off.
• Nonvolatile memories retain value even if powered off
• Flash (~ 5 years)
• Hard Disk (~ 5 years)
• Tape (~ 15-30 years)


Summary of Trade-Offs
• Faster is more expensive (dollars and chip area)
• SRAM, < $10 per Megabyte
• DRAM, < $1 per Megabyte
• Hard Disk, < $1 per Gigabyte
• Larger capacity is slower
• Flip-flops/Small SRAM, sub-nanosec
• SRAM, KByte~MByte, ~nanosec
• DRAM, Gigabyte, ~50 nanosec
• Hard Disk, Terabyte, ~10 millisec

• Other technologies have their place as well


• PC-RAM, MRAM, RRAM


We Want Both Fast and Large Memory

• But we cannot achieve both with a single level of memory


• Idea: Memory Hierarchy
• Have multiple levels of storage (progressively bigger and slower as the
levels are farther from the processor)
• Key: manage the data such that most of the data the processor
needs in the near future is kept in the fast(er) level(s)


Memory Hierarchy

[Figure: a pyramid with the CPU at the top. Levels near the CPU are small and fast; levels farther away are bigger, slower, and cheaper per byte. Everything is backed up in the big-but-slow bottom level, and the data currently in use is moved up toward the CPU.]

Memory Hierarchy
• Fundamental tradeoff
• Fast memory: small
• Large memory: slow
• Balance latency, cost, size,
bandwidth

[Figure: CPU → Registers (DFF) → Cache (SRAM) → Main Memory (DRAM) → Hard Disk]

A Modern Memory Hierarchy


Register File (DFF): 32 words, sub-nsec

L1 cache (SRAM): ~32 KB, ~nsec

L2 cache (SRAM): 512 KB ~ 1MB, many nsec

L3 cache (SRAM): .....

Main memory (DRAM): GB, ~100 nsec

Hard Disk: 100 GB, ~10 msec

Memory in a Modern System

[Figure: die photo of a four-core processor. CORE 0-3 each have a private L1 cache (L1 CACHE 0-3) and a private L2 cache (L2 CACHE 0-3); all cores share the L3 cache (SHARED L3 CACHE), which connects through the DRAM memory controller and DRAM interface to off-chip DRAM modules.]

My Desktop


My Server


How Things Have Progressed

                     RF (DFF)       Cache (SRAM)   Main Memory (DRAM)   Disk
1995 low-mid range   200B, 5ns      64KB, 10ns     32MB, 100ns          2GB, 5ms
                     (Hennessy & Patterson, Computer Arch., 1996)
2009 low-mid range   ~200B, 0.33ns  8MB, 0.33ns    4GB, <100ns          750GB, 4ms
                     (www.dell.com, $449 including 17” LCD flat panel)
2015 mid range       ~200B, 0.33ns  8MB, 0.33ns    16GB, <100ns         256GB, 10us

How to Make Effective Use of the Hierarchy


• Fundamental question: how do we know what data to put in the fast
and small memory?
• Answer: ensure most of the data the processor needs in the near
future is kept in the fast(er) level(s)
• How do we know what data will be needed in the future?
• Do we know before the program runs?
• If so, programmers or compiler can place the right data at the
right place
• Do we know only after the program runs?
• If so, only the hardware can effectively place the data


How to Make Effective Use of the Hierarchy

[Diagram: CPU → Registers → Cache ($) → Memory]

• Modern computers provide both ways


• Register file: programmers explicitly move data from the main
memory (slow but big DRAM) to registers (small, very fast)
• movq (%rdi), %rdx
• Cache, on the other hand, is automatically managed by hardware
• Sits between registers and main memory, “invisible” to programmers
• The hardware automatically figures out what data will be used in the
near future, and places it in the cache.
• How does the hardware know that??


Register vs. Cache

long a = 10;    movq $10, %rax
long b = 20;    movq $20, 4(%rbx)

• From the programmer’s perspective, data is either in a register or in memory.
• One or the other, not both

• If the data is in memory, the hardware may keep a copy of this data
in cache to speed up access to it.



Locality: An Empirical Observation


• Principle of Locality: Programs tend to use the same data over and
over again, and tend to access data next to each other.

• Temporal locality:
• Recently referenced items are likely
to be referenced again in the near future

• Spatial locality:
• Items with nearby addresses tend
to be referenced close together in time


Locality Example

sum = 0;
for (i = 0; i < n; i++)
sum += a[i];
return sum;

• Data references
• Spatial Locality: Reference array elements in succession (stride-1 reference
pattern)
• Temporal Locality: Reference variable sum each iteration.

• Instruction references
• Spatial Locality: Reference instructions in sequence.
• Temporal Locality: Cycle through loop repeatedly.


Use Locality to Manage Memory Hierarchy

sum = 0;
for (i = 0; i < n; i++)
sum += a[i];
return sum;

• Exploiting temporal locality:


• If a piece of data is recently accessed, very likely it will be needed
again, so move it to cache.
• Exploiting spatial locality:
• When moving a piece of data from the memory to the cache, move its
adjacent data to the cache as well.
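
A rough sense of the payoff, with assumed (not slide-given) parameters: with 64-byte cache lines and 8-byte longs, the stride-1 loop above touches a new line only once every 64 / 8 = 8 accesses, so spatial locality alone makes about 7 of every 8 array reads hit in the cache.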


The Bookshelf Analogy


• Book in your hand
• Desk
• Bookshelf
• Boxes at home
• Library
• Recently-used books tend to stay on the desk, because you will likely use
them again.
• Comp Org. books
• Books for other courses


Cache Illustrations

[Figure: a CPU above a small-but-fast cache currently holding copies of memory locations 8, 9, 14, and 3, above a big-but-slow memory with locations 0-15.]

• The data at address 14 is needed, so the CPU requests it. Address 14 is in the cache: Hit!

• The data at address 12 is needed, so the CPU requests it. Address 12 is not in the cache: Miss! The data at address 12 is fetched from memory and then stored in the cache.

Cache Hit Rate


• A cache hit is when you find the data in the cache
• The hit rate indicates the effectiveness of the cache

Hit Rate = # Hits / # Accesses
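
A quick worked example with illustrative numbers: 90 hits out of 100 accesses gives Hit Rate = 90 / 100 = 90%. If a hit costs 1 ns and a miss must additionally go to a 100 ns memory, the average access time is about 0.9 × 1 ns + 0.1 × (1 + 100) ns ≈ 11 ns, versus 100 ns without the cache.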


Two Fundamental Issues in Cache Management


• Finding the data in the cache
• Given an address, how do we decide whether it’s in the cache or not?
• Kicking data out of the cache
• Cache is smaller than memory, so when there’s no place left in the
cache, we need to kick something out before we can put new data
into it; but what do we kick out?


A Simple Cache

[Figure: a 4-entry cache, indexed 00-11, with a Content field and a Valid? bit per entry, alongside a 16-location memory with addresses 0000-1111.]

• 16 memory locations
• 4 cache locations
• Also called cache-lines
• Every cache location has a valid bit, indicating whether that location
contains valid data; 0 initially.
• For now, assume cache location size == memory location size == 1 B
• Assume each memory location can only reside in one cache-line
• Cache is smaller than memory (obviously)
• Thus, not all memory locations can be cached at the same time

Cache Placement
[Figure: the same 4-entry cache and 16-location memory.]

• Given a memory addr, say 0x0001, we want to put the data there into
the cache; where does the data go?

Function to Address Cache


[Figure: the cache index CA is formed from addr[1:0] of the memory address.]

• Simplest way is to take a subset of the address bits
• Direct-Mapped Cache
• CA = ADDR[1],ADDR[0]

Direct-Mapped Cache

[Figure: each cache entry, indexed by addr[1:0], now also stores a Tag holding addr[3:2]; on an access, the stored tag is compared against addr[3:2] of the requested address to produce Hit?]

• Direct-Mapped Cache
• CA = ADDR[1],ADDR[0]
• Always use the lower-order address bits
• Multiple addresses can be mapped to the same location
• E.g., 0010 and 1010
• How do we differentiate between different memory locations that
are mapped to the same cache location?
• Add a tag field for that purpose
• What should the tag field be?
• ADDR[3] and ADDR[2] in this particular example
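
Putting the pieces together, here is a minimal C sketch of this 4-line direct-mapped cache (1-byte lines, index = addr[1:0], tag = addr[3:2]); the function and structure names are made up for illustration:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 4   /* 4 cache locations */
#define MEM_SIZE  16  /* 16 memory locations, 4-bit addresses */

struct line {
    bool    valid;  /* 0 initially: nothing cached yet */
    uint8_t tag;    /* addr[3:2] of the cached location */
    uint8_t data;   /* one-byte cache line */
};

static struct line cache[NUM_LINES];
static uint8_t memory[MEM_SIZE];

/* Read addr through the cache; on a miss, fetch from memory
   and store in the cache, replacing whatever was there. */
uint8_t cache_read(uint8_t addr, bool *hit)
{
    uint8_t index = addr & 0x3;         /* CA = addr[1:0] */
    uint8_t tag   = (addr >> 2) & 0x3;  /* tag = addr[3:2] */
    struct line *l = &cache[index];

    *hit = l->valid && l->tag == tag;   /* valid line with matching tag: Hit! */
    if (!*hit) {                        /* Miss: fill from memory */
        l->valid = true;
        l->tag   = tag;
        l->data  = memory[addr];
    }
    return l->data;
}

int main(void)
{
    bool hit;
    for (int a = 0; a < MEM_SIZE; a++)
        memory[a] = (uint8_t)a;  /* dummy contents */

    cache_read(14, &hit);        /* cold: miss, now cached */
    cache_read(14, &hit);
    printf("addr 14: %s\n", hit ? "hit" : "miss");  /* hit */
    cache_read(12, &hit);
    printf("addr 12: %s\n", hit ? "hit" : "miss");  /* miss, then cached */
    return 0;
}

Note that 0010 and 1010 both map to index 10 but carry different tags (00 vs. 10), which is exactly what the tag comparison catches.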
