
EEDG/CE/CS 6304 Computer Architecture

Lecture 9 - Caches

Benjamin Carrion Schaefer


Associate Professor
Department of Electrical and Computer Engineering
Course Overview
• Fundamentals of Design and Analysis of
Computers (2 lectures)
– History, technological breakthroughs, etc.
– Trends and metrics: performance,
power/energy, cost
• CPU (7 Lectures)
– Instruction Set Architecture
– Arithmetic for Computers (new)
– Instruction Level Parallelism (ILP)
– Dynamic instruction scheduling
– Branch prediction
– Thread-level parallelism
– Modern processors
• Memories (4 Lectures)
– Memory hierarchy
– Caches
– Secondary storage
– Virtual memory
• Buses (1 lecture)
• New computer structures: Heterogeneous
computing (1 lecture)
Learning Objectives
• Upon completion of this lecture, you will be able to:
– Understand what caches are and why they increase CPU performance
– Understand cache associativity
– Understand the cost of cache misses
– Understand the causes of cache misses
– Understand cache trade-offs

Ref: Hennessy & Patterson, 6th Edition, Morgan Kaufmann – Appendix B and Chapter 2
The Principle of Locality

• The Principle of Locality:
– Programs access a relatively small portion of the address space at any instant of time
• Two Different Types of Locality:
– Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
– Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
• For the last 25+ years, hardware has relied on locality for speed
Cache vs. Main Memory

• Data and instructions are stored on DRAM chips – DRAM is a technology with high bit density but relatively poor latency; an access to data in main memory can take as many as 300 cycles today
• Hence, some data is stored on the processor in a structure called the cache – caches employ SRAM technology, which is faster but has lower bit density
Caches ($) - Terminology
• Cache block is a synonym for cache line: minimum unit of address allocation in the cache
• Hit: data appears in some block in the $
• Hit Rate: the fraction of memory accesses found in the $
• Hit Time: time to find data in the $
• Miss Rate: 1 - Hit Rate
• Miss Penalty: time to replace a block in the $ plus the time to deliver the block to the processor
• Average Memory Access Time (AMAT)
– AMAT = Hit time + (Miss rate x Miss penalty)
Typical values
• Approximate values for an L1 cache:

Parameter            Typical values
Block (line) size    4-128 bytes
Hit time             1-4 cycles
Miss penalty         8-32 cycles
  (access time)      (6-10 cycles)
  (transfer time)    (2-22 cycles)
Miss rate            1%-20%
Cache size           1 KB-256 KB
How Does a Cache Work?
• Exploit locality
– Temporal Locality:
• If an item is referenced, it will tend to be referenced
again soon
– Spatial Locality:
• If an item is referenced, items whose addresses are
close by tend to be referenced soon → Move blocks
consisting of contiguous words to the cache
Questions about Caches

1. When we copy data from main memory to the cache, where exactly do we put it?
2. How can we know if a word is already in the cache?
3. Caches will fill up. Which cache block should we replace?
Direct Mapped Cache: The Simplest Cache
• Memory organization: 16x8 (sixteen 1-byte locations, addresses 0000-1111)
• Cache organization: 4x8 (four 1-byte lines, indices 00-11)
• Cache location 0 can be occupied by data from:
– Memory locations 0, 4, 8, 12
• Cache location 1 can be occupied by data from:
– Memory locations 1, 5, 9, 13
• Cache index = 2 LSBs of the address <1:0> → index = i mod 2^k, where i = address and 2^k = cache size
• Issues:
– Which memory location should we place in the cache?
– How can we tell which memory location is in the cache?
(Figure: 16-entry memory mapped onto a 4-byte direct-mapped cache)
Cache Tag and Cache Index
• Tags are added to the cache to supply the rest of the memory address bits
• Assuming the 16x8 memory organization (4-bit addresses):
– Lowest N bits = cache index
– Uppermost (4-N) bits = tag
• Main memory address = tag + cache index
(Figure: 16-entry memory and a 4-entry cache whose entries hold a 2-bit tag and data)
Note: Cache block is a synonym for cache line: minimum unit of address allocation in the cache
Valid Bit

• Caches are empty (hold invalid data) when started
• Need a flag bit to identify whether the data in the cache is valid or not

Cache Index   Valid Bit   Tag   Data
0 (00)        1
1 (01)        1
2 (10)        1
3 (11)        0           ← invalid data
Retrieving Data from the Cache
• When the CPU requires an instruction/data from memory, the address is sent to the cache controller
– Lowest k bits serve as the cache index
– Upper (m-k) bits serve as the tag
• Data is sent to the CPU if valid data is available
Loading Data into the Cache
• A copy of the data read from memory is stored into the cache
• Lowest k bits of the address specify a cache block
• Upper (m-k) bits specify the tag
• Data from memory is stored in the cache's data field
• The valid bit is set to 1
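
The two slides above describe a direct-mapped lookup and fill; the following minimal C sketch puts the steps together (the sizes, names, and 16-byte backing memory are illustrative assumptions for this document, not part of the lecture):

/* Direct-mapped lookup/fill sketch: 1-byte blocks, 4 lines, 16-byte memory. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 4                      /* k = 2 index bits, as in the 4-entry example */

typedef struct {
    bool     valid;                      /* valid bit: line holds real data */
    uint32_t tag;                        /* upper (m-k) address bits        */
    uint8_t  data;                       /* one-byte block                  */
} Line;

static Line    cache[NUM_LINES];
static uint8_t main_memory[16];          /* assumed backing store (16x8 memory) */

uint8_t cache_read(uint32_t addr)
{
    uint32_t index = addr % NUM_LINES;   /* lowest k bits select the line  */
    uint32_t tag   = addr / NUM_LINES;   /* remaining upper bits = the tag */

    if (cache[index].valid && cache[index].tag == tag)
        return cache[index].data;        /* hit: valid data with matching tag */

    cache[index].valid = true;           /* miss: fill the line from memory   */
    cache[index].tag   = tag;
    cache[index].data  = main_memory[addr];
    return cache[index].data;
}

int main(void)
{
    main_memory[5] = 0xAB;
    return cache_read(5) == cache_read(5) ? 0 : 1;   /* first call misses, second hits */
}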
Spatial Locality
• How can we make caches more efficient by exploiting locality?
• Make the cache block size larger than one byte
• E.g., two-byte blocks
– The last address bit indicates which byte within the block
(Figure: 16-entry memory and a cache with two-byte blocks)
Spatial Locality cont.
• When accessing main memory → its entire block (depending on the block size) is written into the cache
• E.g., if the cache has a block size of 2 and address 12h is read from memory, both bytes of that block are brought into the cache
Locating Data in Multi Block Cache
• A block select (block offset) is required to select which byte within the block to read
Example
• For the following addresses, what byte is read (tag / index / block offset)?
– 1010
– 1110
Advantages and Disadvantages of Direct-Mapped Caches
• Advantages of direct-mapped caches
– Simple hardware to implement
– Offset can be computed quickly and efficiently
• Disadvantages
– Cache can have low performance and be underutilized if program addresses map to the same cache index, e.g., 4, 8, 4, 8, …
(Figure: 16-entry memory and a 4-entry direct-mapped cache)
Fully Associative Cache
• Allows data to be stored in any cache line
– When data is fetched from memory → it is placed in any unused cache block
– No conflicts between multiple memory addresses mapped onto a single cache block
Fully Associative Cache
• Pros:
– Makes use of cache space more effectively
– No address conflicts
• Cons:
– It is expensive (area) to implement
• No index field → the entire address is used as the tag, increasing tag storage
• Data can be anywhere in the cache → need to check every tag of every cache block
Set Associativity
• Intermediate possibility
– Cache is divided into groups of blocks called sets
– Each memory address maps to exactly one set in the cache, but data may be placed in any block within that set
• If each set has 2^x blocks, the cache is a 2^x-way set-associative cache
• 1-way set-associative cache = direct-mapped cache

(Figure: 1-way associativity: 8 sets, 1 line each; 2-way associativity: 4 sets, 2 lines each; 4-way associativity: 2 sets, 4 lines each)
Set Associativity
• Each location in main memory can be cached in any of the lines of the one set it maps to (e.g., one of two locations in a 2-way set-associative cache)
Locating a Set Associative Block

• If a cache has 2^s sets and each block has 2^n bytes → the memory address can be partitioned as: [ tag | set index | block offset ]
• Need to compute the set index to select a set within the cache instead of an individual block

Block offset = Memory address mod 2^n
Block address = Memory address / 2^n
Set index = Block address mod 2^s
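
A small C sketch of these three formulas (the constants and the 1833h address come from the following slide; the variable names are illustrative):

#include <stdio.h>

#define BLOCK_BYTES 16u   /* 2^n, n = 4 */
#define NUM_SETS     8u   /* 2^s, s = 3 (1-way, 8-set design below) */

int main(void)
{
    unsigned addr = 0x1833;                              /* example address from the next slide */
    unsigned block_offset  = addr % BLOCK_BYTES;         /* Memory address mod 2^n */
    unsigned block_address = addr / BLOCK_BYTES;         /* Memory address / 2^n   */
    unsigned set_index     = block_address % NUM_SETS;   /* Block address mod 2^s  */
    unsigned tag           = block_address / NUM_SETS;   /* remaining upper bits   */

    printf("offset=%u set=%u tag=0x%x\n", block_offset, set_index, tag);   /* offset=3 set=3 tag=0x30 */
    return 0;
}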
Example of Data in Set-Associative Caches
• Where would data from memory byte address 1833h be placed, assuming the eight-line cache designs below with 16 bytes per block?
– 1833h is 0001 1000 0011 0011 (4 bits for the block offset)
– Each line has 16 bytes, so the lowest 4 bits are the block offset
– For the 1-way cache, the next three bits (011) are the set index
– For the 2-way cache, the next two bits (11) are the set index
– For the 4-way cache, the next single bit (1) is the set index
• Data may go in any block shown in green within the correct set

(Figure: 1-way associativity: 8 sets, 1 line each; 2-way associativity: 4 sets, 2 lines each; 4-way associativity: 2 sets, 4 lines each)
Example
• How many sets are there in a two-way set-associative cache with 32-KB capacity and 64-byte lines, and how many bits of the address are used to select a set in this cache? What about an eight-way set-associative cache with the same capacity and line length?
Solution
• A 32-KB cache with 64-byte lines contains 512 lines of data (32 KB / 64 bytes = 512)
• In a two-way set-associative cache, each set contains 2 lines → 512/2 = 256 sets in the cache
• Number of set-index bits = log2(256) = 8 → 8 bits of an address are used to select the set that the address maps to
• An 8-way set-associative cache has 64 sets (= 512/8) and uses log2(64) = 6 bits of the address to select a set
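
The same arithmetic expressed as a short C sketch (the helper and names are illustrative, not from the lecture); it prints both answers:

#include <stdio.h>

int log2_int(unsigned x) { int b = 0; while (x > 1) { x >>= 1; b++; } return b; }

int main(void)
{
    unsigned capacity = 32 * 1024;                    /* 32 KB capacity  */
    unsigned line     = 64;                           /* 64-byte lines   */
    unsigned ways[]   = { 2, 8 };

    for (int i = 0; i < 2; i++) {
        unsigned lines = capacity / line;             /* 512 lines       */
        unsigned sets  = lines / ways[i];             /* 256 or 64 sets  */
        printf("%u-way: %u sets, %d index bits\n", ways[i], sets, log2_int(sets));
    }
    return 0;                                         /* 2-way: 256 sets, 8 bits; 8-way: 64 sets, 6 bits */
}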
2-Way Associative Cache Implementation
• Only two tag comparators are needed, compared to one per line in a fully associative cache
• The cache tags are shorter
• The index points to a set
4-Way Associativity Implementation
• 2^8 = 256 sets, each with four ways (each way holding one block)
Range of Set Associativity Caches
• For a fixed-size cache, each increase by a factor of two in associativity doubles the number of blocks per set (i.e., the number of ways) and halves the number of sets – this decreases the size of the index by 1 bit and increases the size of the tag by 1 bit
Replacement Policy
• Replacement policy:
– When a line must be evicted from the $ to make room for new data → determines which line to evict
1. Direct-mapped caches:
– No choice → the incoming line can only be placed in one location
2. Set-associative and fully-associative caches:
– Contain multiple candidate lines
– Goal: minimize future cache misses by evicting lines that will not be referenced in the future
Policies
1. Least Recently Used (LRU)
– Cache ranks each line by how recently it has been accessed → evicts the least-recently used line
2. Random Replacement (RR)
– Randomly select a line from the set
3. Least Frequently Used (LFU)
– Similar to LRU except it keeps track of how many times the object was accessed
4. Not Most Recently Used (NMRU)
– Keep track of the most recently referenced line and randomly choose any other
→ LRU is the most popular; it leads to slightly better results than RR, but requires larger area overhead (relatively complex to implement). A sketch of LRU bookkeeping follows below.
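
A minimal C sketch of LRU bookkeeping for one set using per-line age counters (the names and 4-way size are illustrative assumptions; a real design would keep only log2(associativity) bits per line, as noted later in the tag-array slides):

#include <stdio.h>

#define WAYS 4

typedef struct { int tag; int valid; int age; } Way;   /* age 0 = most recently used */

/* On a hit (or after a fill) of way w: every other valid line ages, w becomes MRU. */
void lru_touch(Way set[WAYS], int w)
{
    for (int i = 0; i < WAYS; i++)
        if (i != w && set[i].valid)
            set[i].age++;
    set[w].age = 0;
}

/* Victim = an invalid way if one exists, otherwise the oldest (least recently used). */
int lru_victim(const Way set[WAYS])
{
    int victim = 0;
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid) return i;
        if (set[i].age > set[victim].age) victim = i;
    }
    return victim;
}

int main(void)
{
    Way set[WAYS] = {{10,1,0},{11,1,1},{12,1,2},{13,1,3}};  /* way 3 is currently LRU */
    lru_touch(set, 3);                                      /* reference way 3        */
    printf("next victim: way %d\n", lru_victim(set));       /* now way 2 is LRU       */
    return 0;
}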
Example
• A processor has an 8-bit physical address space and a
physically addressed cache. Memory is byte addressable. The
cache uses perfect LRU replacement. The processor supplies
the following sequence of addresses to the cache. The cache
is initially empty. The hit/miss outcome of each access is
shown below. Determine the block-size, associativity, and
size of the cache. Assume that all three are powers of two.

Address Outcome
0 Miss
2 Hit
4 Miss
128 Miss
0 Hit
128 Hit
64 Miss
4 Hit
0 Miss
32 Miss
64 Hit
Example Solution – Block Size
• The first access (address 0) is a compulsory miss: the cache is initially empty
• Address 2 then hits while address 4 misses → addresses 0-3 share one block → the block size is 4 bytes
(Figure: the address is split into tag, index, and a 2-bit block offset; one block holds addr0, addr1, addr2, addr3)
Example Solution – Associativity
Addr 0   = 0x00 = 0000 0000b
Addr 32  = 0x20 = 0010 0000b
Addr 64  = 0x40 = 0100 0000b
Addr 128 = 0x80 = 1000 0000b
• Address 64 evicts address 0 (0 misses afterwards) → 2-way. If the cache were more than 2-way, 64 would not have evicted 0
• Addresses 0, 64, and 128 must map to the same set → their index bits must be the same, BUT address 32 must map to a different set (it does not evict 64 under LRU) → the index is 4 bits
• The index cannot be more than 4 bits because address 64 must share a set with address 0
Example Solution – Cache Size
• 4 bytes/block, 2-way set-associative, 2^4 sets
➔ Cache size = 4 bytes/block x 2 lines/set x 2^4 sets = 4 x 2 x 16 = 128 bytes

Addr 0   = 0x00 = 0000 0000b
Addr 4   = 0x04 = 0000 0100b
Addr 32  = 0x20 = 0010 0000b
Addr 64  = 0x40 = 0100 0000b
Addr 128 = 0x80 = 1000 0000b
(Figure: cache with sets 0-15; addresses 0, 64, and 128 all map to the same set)
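
To check the result, here is a small C simulation of the derived cache (4-byte blocks, 2 ways, 16 sets, LRU) on the given address trace; it is a sketch written for this document, not part of the lecture, and it reproduces the hit/miss column of the example:

#include <stdio.h>

#define SETS 16
#define WAYS 2

int main(void)
{
    int tag[SETS][WAYS] = {0}, valid[SETS][WAYS] = {0};
    int lru[SETS] = {0};                 /* lru[s] = way to evict next in set s */
    int trace[] = { 0, 2, 4, 128, 0, 128, 64, 4, 0, 32, 64 };
    int n = sizeof trace / sizeof trace[0];

    for (int i = 0; i < n; i++) {
        int block = trace[i] / 4;        /* 4-byte blocks        */
        int set   = block % SETS;        /* 4 index bits         */
        int t     = block / SETS;        /* remaining bits = tag */
        int way   = -1;

        for (int w = 0; w < WAYS; w++)
            if (valid[set][w] && tag[set][w] == t) way = w;

        int hit = (way >= 0);
        if (!hit) {                      /* miss: fill the LRU way */
            way = lru[set];
            valid[set][way] = 1;
            tag[set][way]   = t;
        }
        lru[set] = 1 - way;              /* in a 2-way cache, the other way becomes LRU */
        printf("%4d  %s\n", trace[i], hit ? "Hit" : "Miss");
    }
    return 0;
}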
Example 1
• Given the example code below, and assuming a
virtually-addressed direct-mapped cache of capacity
8KBytes and 64-bit blocks (8 bytes), compute the overall
miss rate (number of misses divided by number of
references). Assume that all variables except array
locations reside in registers, and that arrays A, B, and C
are placed consecutively in memory.

double A[1024], B[1024], C[1024];

for (int i = 0; i < 1000; i++) {
    A[i] = 35.0 * B[i] + C[i];
}
Example 1 Solution
double A[1024], B[1024], C[1024];
for (int i = 0; i < 1000; i++) {
    A[i] = 35.0 * B[i] + C[i];
}
• Double = 64 bits = 8 bytes → one array element per cache line
• The first iteration accesses memory locations &B[0], &C[0], and &A[0]
• Since the arrays are consecutive in memory, these locations are exactly 8 KBytes = 2^13 bytes apart, which is 65,536 bits → 65,536 / 64 bits = 1024 cache lines, i.e., exactly the number of lines in the 8-KB direct-mapped cache, so all three addresses map to the same line
• Hence, in a direct-mapped cache they conflict, and the access to C[0] will evict B[0], while the access to A[0] will evict C[0]. As a result, every reference will miss, leading to a 100% miss rate
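
The conflict pattern can be checked with a short C sketch that models the 8-KB direct-mapped cache and the consecutive array layout (the base addresses and names are illustrative assumptions):

#include <stdio.h>
#include <stdint.h>

#define LINES 1024                       /* 8 KB / 8-byte blocks */

int tags[LINES], valid[LINES];
long refs = 0, misses = 0;

void cache_access(uintptr_t addr)
{
    uintptr_t block = addr / 8;          /* 8-byte blocks               */
    int index = block % LINES;           /* direct-mapped index         */
    int tag   = (int)(block / LINES);    /* remaining bits = tag        */
    refs++;
    if (!valid[index] || tags[index] != tag) {
        misses++;                        /* miss: allocate the line     */
        valid[index] = 1;
        tags[index]  = tag;
    }
}

int main(void)
{
    /* model the arrays by their byte addresses: A at 0, B at 8 KB, C at 16 KB */
    uintptr_t A = 0, B = 8192, C = 16384;
    for (int i = 0; i < 1000; i++) {
        cache_access(B + 8 * i);         /* read B[i]  */
        cache_access(C + 8 * i);         /* read C[i]  */
        cache_access(A + 8 * i);         /* write A[i] */
    }
    printf("miss rate = %.0f%%\n", 100.0 * misses / refs);   /* prints 100% */
    return 0;
}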
Write-Back vs. Write-Through Caches
• Write-through
– Stored values are written into this level and also sent to the next memory level down
– Ensures that the contents of this level and the next level down are the same
• Write-back
– Keeps stored data in this level. When a block that has been written is evicted from this level → the contents of the block are written back to the next level down
– Requires an extra bit (the dirty bit) to indicate whether the data has been written to
• Caches can be implemented as
– Write-through
– Write-back
Dirty Bit
• Indicates that its associated block of memory has been modified and has not been saved yet
• When a block is to be replaced, its dirty bit is checked to see whether the block needs to be written back to the next memory level before being replaced, or whether it can simply be removed
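
A minimal C sketch of how the dirty bit drives write-back behavior (all names and the console stub standing in for the next memory level are illustrative assumptions):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    bool     valid, dirty;      /* dirty: set on a store, cleared on write-back */
    uint32_t tag;
    uint8_t  data[8];
} Line;

/* placeholder for the next memory level */
static void writeback_to_next_level(Line *l) { printf("write back tag %u\n", l->tag); }

void store_byte(Line *l, unsigned offset, uint8_t value)
{
    l->data[offset] = value;    /* the write stays in this level ...           */
    l->dirty = true;            /* ... and the line is only marked as modified */
}

void evict(Line *l, uint32_t new_tag)
{
    if (l->valid && l->dirty)               /* modified data would otherwise be lost  */
        writeback_to_next_level(l);         /* one write instead of one per store     */
    l->valid = true; l->tag = new_tag; l->dirty = false;   /* fill with the new block */
}

int main(void)
{
    Line line = {0};
    evict(&line, 1);                        /* bring in block with tag 1              */
    store_byte(&line, 0, 42);               /* several stores hit in the cache ...    */
    store_byte(&line, 1, 43);
    evict(&line, 2);                        /* ... and only one write-back happens    */
    return 0;
}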
Write-Back vs. Write-Through Caches

• Both have pros and cons
• Write-through pros
– Not necessary to record which lines have been written → data is always consistent with the contents of the next level
• Write-back
– Pros
• If a line receives multiple store requests while in the cache → waiting until the line is evicted can significantly reduce the number of writes sent to the next level
– Cons
• Requires extra hardware to track whether or not each line has been written since it was fetched
• In general, write-back caches have higher performance
Tag Arrays
• Organized as a two-dimensional structure containing a row of tag entries for each set in the cache
• Use a CAM to read tags – all entries of the selected set are checked at the same time
– Number of tag entries in each row = associativity of the cache
(Figure: tag array with one row per set and one tag entry per way; all entries from the selected set are sent to the hit/miss logic)
Tag Arrays
• A tag entry contains the information necessary to record which line of data is stored in the associated line of the data array
• A tag entry consists of:
– A tag field that contains the tag portion of the address
– A valid bit that records whether or not the line associated with this tag entry contains valid data
– A dirty bit
– Additional bits depending on the replacement policy
• LRU: record how many of the other lines in the set have been referenced since the last time the corresponding line was referenced (log2(cache associativity) bits)
(Figure: tag array entry = tag field + valid bit + dirty bit)
Example
• How many bits of storage are required for the tag array of a 32-KB cache with 256-byte cache lines and 4-way associativity if the cache is write-back but does not require any additional bits of data in the tag array to implement the write-back policy? Assume that the system containing the cache uses 32-bit addresses and requires 1 dirty bit and 1 valid bit per line
Solution
• A 32-KB cache with 256-byte lines contains:
– 32 KB / 256 B = 128 lines
• Since the cache is 4-way set-associative, it has:
– 128/4 = 32 sets → m = 5 set-index bits
• Lines that are 256 bytes long mean that log2(256) = n = 8 → 8 + 5 = 13 bits (m + n) of the address are used to select a set and determine the byte within the line that an address points to
• Tag field of each array entry = 32 - 13 = 19 bits
• Adding 2 bits for the dirty and valid bits = 21 bits per entry
• 21 bits x 128 lines = 2,688 bits of storage in the tag array
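
The same calculation as a short C sketch (the helper and names are illustrative, not from the lecture):

#include <stdio.h>

int log2_int(unsigned x) { int b = 0; while (x > 1) { x >>= 1; b++; } return b; }

int main(void)
{
    unsigned capacity = 32 * 1024, line_bytes = 256, ways = 4, addr_bits = 32;

    unsigned lines    = capacity / line_bytes;            /* 128 lines            */
    unsigned sets     = lines / ways;                     /* 32 sets              */
    unsigned idx_bits = log2_int(sets);                   /* m = 5                */
    unsigned off_bits = log2_int(line_bytes);             /* n = 8                */
    unsigned tag_bits = addr_bits - idx_bits - off_bits;  /* 32 - 13 = 19         */
    unsigned entry    = tag_bits + 2;                     /* + valid + dirty = 21 */

    printf("tag array = %u bits\n", entry * lines);       /* 21 x 128 = 2688 bits */
    return 0;
}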
Cache Misses and Performance
• How does this affect performance?
• Performance = Time / Program

  Time / Program = (Instructions / Program) x (Cycles / Instruction) x (Time / Cycle)
                        (code size)                  (CPI)                (cycle time)

• Cache organization affects cycle time
– Hit latency
• Cache misses affect CPI
Average Memory Access Time (AMAT)

• Indicator of how many cycles it takes, on average, to access data in memory

AMAT = Hit time + (Miss rate x Miss penalty)

• Can be used recursively by substituting each level's miss penalty with the AMAT of the next level:

AMAT = Hit_L1 + Miss_rate_L1 x Miss_penalty_L1
Miss_penalty_L1 = Hit_L2 + Miss_rate_L2 x Miss_penalty_L2
Miss_penalty_L2 = Hit_mem + Miss_rate_mem x Miss_penalty_mem

AMAT = Hit_L1 + Miss_rate_L1 x (Hit_L2 + Miss_rate_L2 x (Hit_mem + Miss_rate_mem x Miss_penalty_mem))
Example AMAT
• Calculate the AMAT for a system with the
following properties:
– L1$ hits in 1 cycle with local hit rate of 50%
– L2$ hits in 10 cycles with a local hit rate of 75%
– L3$ hits in 100 cycles with local hit rate of 90%
– Main memory always hits in 1000 cycles
Solution Example AMAT
L1$ hits in 1 cycle with a local hit rate of 50%
L2$ hits in 10 cycles with a local hit rate of 75%
L3$ hits in 100 cycles with a local hit rate of 90%
Main memory always hits in 1000 cycles

AMAT = Hit_L1 + Miss_rate_L1 x Miss_penalty_L1            = 1 + (1 - 0.5) x Miss_penalty_L1
Miss_penalty_L1 = Hit_L2 + Miss_rate_L2 x Miss_penalty_L2 = 10 + (1 - 0.75) x Miss_penalty_L2
Miss_penalty_L2 = Hit_L3 + Miss_rate_L3 x Miss_penalty_L3 = 100 + (1 - 0.9) x Miss_penalty_L3
Miss_penalty_L3 = Hit_mem                                 = 1000

AMAT = 1 + (1-0.5) x (10 + (1-0.75) x (100 + (1-0.9) x 1000)) = 31 cycles
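
A short C sketch that evaluates the same nested expression from the inside out (written for this document as a check of the result above):

#include <stdio.h>

int main(void)
{
    double hit_l1 = 1,   miss_l1 = 1 - 0.50;
    double hit_l2 = 10,  miss_l2 = 1 - 0.75;
    double hit_l3 = 100, miss_l3 = 1 - 0.90;
    double mem    = 1000;                       /* main memory always hits */

    /* work inside-out: each level's miss penalty is the AMAT of the level below */
    double penalty_l3 = mem;
    double penalty_l2 = hit_l3 + miss_l3 * penalty_l3;
    double penalty_l1 = hit_l2 + miss_l2 * penalty_l2;
    double amat       = hit_l1 + miss_l1 * penalty_l1;

    printf("AMAT = %.0f cycles\n", amat);       /* prints 31 */
    return 0;
}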
Cache Misses and Performance
• CPI equation
– Only holds for misses that cannot be overlapped with other
activity
– Store misses often overlapped
• Place store in store queue → Wait for miss to complete →Perform
store
• Allow subsequent instructions to continue in parallel
– Modern out-of-order processors also do this for loads
• Cache performance modeling requires detailed modeling of entire
processor core
Sources of Cache Misses
• Compulsory (cold start or process migration, first reference): first access to
a block
– “Cold” fact of life: not a whole lot you can do about it
– Note: If you are going to run “billions” of instruction, Compulsory
Misses are insignificant
• Capacity:
– Cache cannot contain all blocks accessed by the program
– Solution: increase cache size
• Conflict (collision):
– Multiple memory locations mapped
to the same cache location
– Solution 1: increase cache size
– Solution 2: increase associativity
• Coherence (Invalidation): other process (e.g., I/O) updates memory
Cache Miss Rates
• Subtle tradeoffs between cache organization
parameters
– Large blocks reduce compulsory misses but increase miss
penalty
• #compulsory = (working set) / (block size)
• #transfers = (block size)/(bus width)
– Large blocks increase conflict misses
• #blocks = (cache size) / (block size)
– Associativity reduces conflict misses
– Associativity increases access time
Cache Miss Rates: 3 C's
• Vary size and associativity
– Compulsory misses are constant
– Capacity and conflict misses are reduced
(Figure: misses per instruction (%) split into conflict, capacity, and compulsory components for 8K1W, 8K4W, 16K1W, and 16K4W caches. © Hill, Lipasti)
Cache Miss Rates: 3 C's
• Vary size and block size
– Compulsory misses drop with increased block size
– Capacity and conflict misses can increase with larger blocks
(Figure: misses per instruction (%) split into conflict, capacity, and compulsory components for 8K32B, 8K64B, 16K32B, and 16K64B caches. © Hill, Lipasti)
Basic Cache Optimizations
• Metrics for cache optimizations: hit latency, miss rate, and miss penalty
• Reducing hit latency
– Block size
– Associativity
– Number of blocks
• Reducing Miss Rate
– Larger Block size (compulsory misses)
– Larger Cache size (capacity misses)
– Higher Associativity (conflict misses)
– Compiler optimizations
• Reducing Miss Penalty
– Multilevel Caches
– Hardware prefetching and compiler prefetching
Summary #1/2: The Cache Design Space
• Several interacting dimensions
– cache size
– block size
– associativity
– replacement policy
– write-through vs. write-back
– write allocation
• The optimal choice is a compromise
– depends on access characteristics
• workload
• use (I-cache, D-cache, TLB)
– depends on technology / cost
• Simplicity often wins
(Figure: the cache design space spanned by cache size, associativity, and block size; quality ranges from good to bad as factors A and B go from less to more)
Summary #2/2: The Cache Design Space
• Small & simple caches are faster
• Energy consumption per read increases as cache size and associativity are increased
(Figure: energy consumption per read vs. cache size and associativity)
Summary
• Need for memory hierarchies
• Caches
– Direct mapping
– Fully associative
– N-way set-associative
– Performance and trade-offs
• Replacement policies
• Write-back vs. write-through policies
• Average Memory Access Time (AMAT)
• Sources of cache misses
– Compulsory
– Capacity
– Conflict
– Coherence
