Lecture 4 - Cache 3


ECE 463/563

Microprocessor Architecture
Basic Cache Operation, cont.:
replacement policies, write policies

Prof. Eric Rotenberg


Generic Cache
• The same equations hold for any cache type
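• Equation for # of blocks in the cache:

$$\text{\# blocks} = \frac{\text{SIZE}}{\text{BLOCKSIZE}}$$

• Equation for # of sets in the cache:

$$\text{\# sets} = \frac{\text{\# blocks}}{\text{ASSOC}} = \frac{\text{SIZE}}{\text{ASSOC} \cdot \text{BLOCKSIZE}}$$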

• Fully-associative: ASSOC = # blocks
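
The two equations can be checked with a few lines of Python. This is only a sketch: the function name `cache_geometry` is made up for illustration, and it assumes SIZE, BLOCKSIZE, and ASSOC divide evenly (as they do for power-of-two configurations).

```python
def cache_geometry(size, blocksize, assoc):
    """Return (# blocks, # sets) for a cache of `size` bytes."""
    num_blocks = size // blocksize      # # blocks = SIZE / BLOCKSIZE
    num_sets = num_blocks // assoc      # # sets   = # blocks / ASSOC
    return num_blocks, num_sets


# A 128-byte cache with 16-byte blocks at three associativities.
# ASSOC = 8 equals the number of blocks, i.e. fully-associative.
for assoc in (1, 2, 8):
    num_blocks, num_sets = cache_geometry(128, 16, assoc)
    print(f"ASSOC={assoc}: {num_blocks} blocks, {num_sets} sets")
```

Raising ASSOC while holding SIZE and BLOCKSIZE fixed leaves the number of blocks unchanged and only reduces the number of sets.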

Generic Cache (cont.)
Each cache type can be viewed as a generic “N-way Set-Associative Cache”, where N is equal to:

Cache Type             N
“direct-mapped”        N = 1
“set-associative”      1 < N < # blocks
“fully-associative”    N = # blocks

Example:

               direct-mapped    2-way set-associative    fully-associative
BLOCKSIZE      16 B             16 B                     16 B
SIZE           128 B            128 B                    128 B
ASSOC (N)      1                2                        8
# blocks       8                8                        8
# sets         8                4                        1

[Figure: block layout of the three caches. Direct-mapped: sets 0 to 7, way 0 only. 2-way set-associative: sets 0 to 3, ways 0 and 1. Fully-associative: set 0, ways 0 to 7.]
Generic Cache (cont.)
• What this means for your Project 1 simulator
– You don’t have to treat the three cache types differently in
your simulator
– Support a generic N-way set-associative cache
– Don’t have to specifically worry about the two extremes
(direct-mapped / fully-associative)
• Also, question: “How do I specify ‘fully-associative’ in
the simulator command-line arguments?”
– You don’t specify this explicitly
– Instead, just specify ASSOC to be equal to SIZE/BLOCKSIZE
(the number of blocks)
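
A minimal sketch of why one generic structure covers all three cache types. This is not the Project 1 simulator: the class name `Cache`, its methods, and the placeholder victim choice are assumptions made for illustration. It tracks tags only, and assumes the set index is the block address modulo the number of sets.

```python
class Cache:
    """Generic N-way set-associative cache that tracks tags only (no data)."""

    def __init__(self, size, blocksize, assoc):
        self.blocksize = blocksize
        self.num_sets = size // (assoc * blocksize)
        # One list of `assoc` tags per set; None marks an invalid (empty) way.
        self.sets = [[None] * assoc for _ in range(self.num_sets)]

    def locate(self, address):
        """Split a byte address into (set index, tag)."""
        block_address = address // self.blocksize
        return block_address % self.num_sets, block_address // self.num_sets

    def access(self, address):
        """Return True on a hit; on a miss, allocate the block and return False."""
        index, tag = self.locate(address)
        ways = self.sets[index]
        if tag in ways:
            return True
        if None in ways:
            ways[ways.index(None)] = tag    # fill an invalid way first
        else:
            ways[0] = tag                   # placeholder victim; a real policy goes here
        return False


direct_mapped = Cache(128, 16, assoc=1)
fully_assoc = Cache(128, 16, assoc=128 // 16)   # ASSOC = SIZE / BLOCKSIZE

# Addresses 0 and 128 conflict in the direct-mapped cache but not in the
# fully-associative one.
for cache in (direct_mapped, fully_assoc):
    print([cache.access(a) for a in (0, 128, 0)])
```

The same `access` code runs in both cases; only the ASSOC argument differs.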
Replacement Policy
• Which block in a set should be replaced when
a new block has to be allocated?
– LRU (Least-Recently-Used) is common (or cheaper
variants such as pseudo-LRU)
– Others: FIFO, Random, more advanced policies
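
To see that the policy choice changes which block is replaced, the sketch below runs one set under LRU and under FIFO. The function `simulate` and the trace `A B C D A E` are illustrative assumptions, not taken from the lecture. An ordered dictionary stands in for the recency or arrival order.

```python
from collections import OrderedDict


def simulate(trace, assoc, policy):
    """Return the blocks evicted from one set under 'LRU' or 'FIFO'."""
    resident = OrderedDict()    # first entry is the next victim
    evicted = []
    for block in trace:
        if block in resident:
            if policy == "LRU":
                resident.move_to_end(block)   # only LRU reorders on a hit
            continue
        if len(resident) == assoc:
            victim, _ = resident.popitem(last=False)
            evicted.append(victim)
        resident[block] = True
    return evicted


trace = list("ABCDAE")
print(simulate(trace, 4, "LRU"))    # ['B']: the hit on A made B the LRU block
print(simulate(trace, 4, "FIFO"))   # ['A']: A arrived first, hits do not matter
```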

LRU Implementation
• Small counter per block in set
– # bits in each counter = log2(ASSOC)
– Each block’s counter indicates recency of access w.r.t. other blocks’ counters
– For example: If a set has four blocks, each block gets a 2-bit counter. The block with
“0” (b’00) counter was most-recently referenced, the block with “1” (b’01) was
second-most-recently referenced, … and the block with “3” (b’11) was least-recently
referenced overall.
• If access hits in cache:
– Increment the counters of other blocks whose counters are less than the referenced
block’s counter (i.e., shift these formerly “more-recent” blocks to now be “less-
recent” than the referenced block)
– Set the referenced block’s counter to “0” (now the most-recently-used)
• If access misses in cache:
– Replace the LRU block (the block with the largest counter value, ASSOC − 1; this is
counter “3” in the four-block example) and set the newly allocated block’s counter
to “0” (now the most-recently-used block)
– Increment the counters of all other blocks
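
The counter scheme above can be sketched as follows. The class name `LRUSet` and its methods are hypothetical. The sketch assumes the counters of an empty set start at the distinct values 0 .. ASSOC − 1, so invalid ways are filled before any valid block is replaced.

```python
class LRUSet:
    """One cache set that tracks recency with a small counter per block."""

    def __init__(self, assoc):
        self.blocks = [None] * assoc          # None marks an invalid way
        self.counters = list(range(assoc))    # distinct values 0 .. assoc-1

    def access(self, block):
        """Reference `block`; return True on a hit, False on a miss."""
        hit = block in self.blocks
        if hit:
            way = self.blocks.index(block)
        else:
            # The victim is the way holding the largest counter (the LRU way).
            way = self.counters.index(len(self.counters) - 1)
            self.blocks[way] = block
        old = self.counters[way]
        # Blocks that were more recent than the referenced way age by one.
        # On a miss `old` is assoc-1, so every other counter is incremented.
        for other in range(len(self.counters)):
            if self.counters[other] < old:
                self.counters[other] += 1
        self.counters[way] = 0                # now the most-recently-used
        return hit

    def state(self):
        return list(zip(self.blocks, self.counters))


lru_set = LRUSet(4)
for block in "ABCD":
    lru_set.access(block)
lru_set.access("B")
print(lru_set.state())
```

Because a hit and a miss differ only in how the way is chosen, one update loop serves both cases.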

LRU Example
• Blocks A, B, C, D, and E all map to the same set
• Trace: A B C D D D B D E
• (LRU counters are shown in parentheses)
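
Access     Result    Set contents after the access
(start)              (0)      (1)      (2)      (3)
A          miss      (1)      (2)      (3)      A (0)
B          miss      (2)      (3)      B (0)    A (1)
C          miss      (3)      C (0)    B (1)    A (2)
D          miss      D (0)    C (1)    B (2)    A (3)
D          hit       D (0)    C (1)    B (2)    A (3)
D          hit       D (0)    C (1)    B (2)    A (3)
B          hit       D (1)    C (2)    B (0)    A (3)
D          hit       D (0)    C (2)    B (1)    A (3)
E          miss      D (1)    C (3)    B (2)    E (0)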


Handling Writes
• Two questions
1. The write update question:
Suppose there is a write request to a memory
block that is cached in a given cache C. Is just
C’s copy of the block updated with new data,
or is the next level of the memory hierarchy
updated at the same time?
2. The write allocate question:
Suppose there is a write request to a memory
block that is not cached in a given cache C.
Do we bring the missing block into C (i.e., do
we “allocate” the block)?
The Write Update Question (1)

• Write-through (WT) policy
– A write updates the cache’s copy of the block and the next level of the
memory hierarchy at the same time

[Figure: cache and next level in memory hierarchy]

The Write Update Question (2)

• Write-back (WB) policy
– A write updates only the cache’s copy of the block; the next level of the
memory hierarchy is updated later, when the block is replaced

[Figure: cache and next level in memory hierarchy]

The Write Update Question (3)
• Write-back (WB) policy
– What happens when a block previously written
to needs to be replaced?
1. Need to have a “dirty bit” (D) with each block in
the cache: set it when block is written to
2. When a dirty block is replaced, need to write
entire block back to next level of memory
(“writeback”)
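
A small sketch of how the two write-update policies differ in traffic to the next level. The function `next_level_writes` is a made-up name, and the scenario is an assumption chosen for illustration: a number of write hits to one cached block, followed by that block's replacement.

```python
def next_level_writes(num_writes, policy):
    """Count transfers to the next level for `num_writes` write hits to one
    cached block, followed by replacement of that block."""
    dirty = False
    transfers = 0
    for _ in range(num_writes):
        if policy == "WT":
            transfers += 1      # every write is also sent to the next level
        else:                   # "WB"
            dirty = True        # only the cache's copy changes
    if dirty:
        transfers += 1          # writeback of the entire block on replacement
    return transfers


print(next_level_writes(10, "WT"))   # 10
print(next_level_writes(10, "WB"))   # 1
```

The counts are not directly comparable in bytes: each write-through transfer carries only the written data, while a writeback carries the entire block.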

The Write Update Question (4)

• Write-back (WB) policy

[Figure: cache with a dirty bit (D) per block, above the next level in memory hierarchy; replacement of a dirty block causes writeback]

The Write Allocation Question (1)
• Write-Allocate (WA)
– Bring the block into the cache if the write
misses (handled just like a read miss)
– Typically, used with write-back policy: WBWA
• Write-No-Allocate (NA)
– Do not bring the block into the cache if the
write misses
– Typically, used with write-through policy:
WTNA
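
One consequence of the allocation choice, sketched under simplifying assumptions: a single block that starts out uncached, repeated writes to it, and no replacements in between. The function name `count_write_misses` is hypothetical.

```python
def count_write_misses(num_writes, allocate):
    """Count write misses for repeated writes to one initially uncached block."""
    cached = False
    misses = 0
    for _ in range(num_writes):
        if not cached:
            misses += 1
            cached = allocate   # write-allocate brings the block in on the miss
    return misses


print(count_write_misses(5, allocate=True))    # 1: later writes hit
print(count_write_misses(5, allocate=False))   # 5: every write misses
```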

The Write Allocation Question (2)
• WTNA (scenario: the write misses)
– The write is sent to the next level of the memory hierarchy; the block is
not brought into the cache

[Figure: write miss; cache; next level in memory hierarchy]

The Write Allocation Question (3)
• WBWA (scenario: the write misses)
– The block is brought into the cache (as for a read miss), the write updates
the cache’s copy, and the block’s dirty bit (D) is set

[Figure: write miss; cache with dirty bit (D); next level in memory hierarchy]

Memory Hierarchies, and
Key Meaning of Dirty Bit
• A memory hierarchy can hold multiple copies of
the same block at different levels.
• A block is “dirty” within a given cache, if the
copy of the block in that cache differs from the
nearest copy in a level below it.
• A block is “clean” within a given cache, if the
copy of the block in that cache is the same as
the nearest copy in a level below it.
Memory Hierarchies, and Write Policies
• “Write policy” is a policy associated with a given cache
(how that cache handles write requests to it), not with
the memory hierarchy as a whole.
• That is, different caches at different levels of the
memory hierarchy may be designed with different write
policies.
• For example, you can have a mix of WBWA and
WTNA caches in the same memory hierarchy.
• When a given cache receives a write request, how that
write is handled by that cache depends on that cache’s
local write policy.
Consider three different memory hierarchies. They all have L1 cache, L2 cache, and main memory. They differ in their L1
write policy or L2 write policy or both, as shown.

The block size is only 4 bytes. In the figure below, each byte of a block is shown as a small rectangle with a value inside. For
example, the third byte of block X has the value 99 and all other bytes have the value 0.

INITIAL STATE: Suppose block X was brought into both the L1 and L2 caches but subsequent activity caused block X to be
evicted from the L1 cache. Therefore, initially, block X is not in the L1 cache and block X is in the L2 cache.

                 Mem. Hier. #1      Mem. Hier. #2      Mem. Hier. #3
L1 cache         WBWA               WTNA               WTNA
  block X        (not present)      (not present)      (not present)
L2 cache         WTNA               WTNA               WBWA
  block X        X: 0 0 99 0        X: 0 0 99 0        X: 0 0 99 0
Main Memory
  block X        X: 0 0 99 0        X: 0 0 99 0        X: 0 0 99 0

Suppose the processor issues a write request to the third byte of block X, changing its value from 99 to 55. For each memory
hierarchy:
• Show where copies of block X exist, after the write is performed.
• Show the values of all bytes within block X wherever block X exists, after the write is performed.

After the write is performed:

                 Mem. Hier. #1          Mem. Hier. #2          Mem. Hier. #3
L1 cache         WBWA                   WTNA                   WTNA
  block X        X: 0 0 55 0 (dirty)    (not present)          (not present)
L2 cache         WTNA                   WTNA                   WBWA
  block X        X: 0 0 99 0 (clean)    X: 0 0 55 0 (clean)    X: 0 0 55 0 (dirty)
Main Memory
  block X        X: 0 0 99 0            X: 0 0 55 0            X: 0 0 99 0
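
The three outcomes can be reproduced with a small model in which each cache applies only its own local write policy. This is a sketch under stated assumptions: the `Level` class and `build` helper are hypothetical names, capacity is unlimited (no replacements occur), and only the WBWA and WTNA policies from this lecture are modeled.

```python
class Level:
    """One level holding whole blocks; main memory is a Level with policy None."""

    def __init__(self, name, policy=None, below=None):
        self.name = name
        self.policy = policy    # "WBWA", "WTNA", or None for main memory
        self.below = below      # next level in the memory hierarchy
        self.blocks = {}        # block name -> list of byte values
        self.dirty = set()      # names of dirty blocks (write-back caches only)

    def read_block(self, name):
        """Return a copy of the block, fetching it from below on a miss."""
        if name not in self.blocks:
            self.blocks[name] = self.below.read_block(name)
        return list(self.blocks[name])

    def write(self, name, offset, value):
        if self.policy is None:             # main memory always holds the block
            self.blocks[name][offset] = value
        elif self.policy == "WBWA":
            self.read_block(name)           # allocate the block on a write miss
            self.blocks[name][offset] = value
            self.dirty.add(name)            # next level is not updated now
        else:                               # "WTNA"
            if name in self.blocks:         # update the local copy only on a hit
                self.blocks[name][offset] = value
            self.below.write(name, offset, value)   # always write through


def build(l1_policy, l2_policy):
    """Create the initial state: block X is in L2 and memory but not in L1."""
    memory = Level("Main Memory")
    memory.blocks["X"] = [0, 0, 99, 0]
    l2 = Level("L2", l2_policy, below=memory)
    l2.blocks["X"] = [0, 0, 99, 0]
    l1 = Level("L1", l1_policy, below=l2)
    return l1, l2, memory


for l1_policy, l2_policy in [("WBWA", "WTNA"), ("WTNA", "WTNA"), ("WTNA", "WBWA")]:
    l1, l2, memory = build(l1_policy, l2_policy)
    l1.write("X", 2, 55)        # third byte of block X: 99 -> 55
    for level in (l1, l2, memory):
        print(level.name, level.policy, level.blocks.get("X"), "X" in level.dirty)
    print()
```

Each `write` call consults only that level's own policy, so the same request produces three different end states.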
