EE6304 Lecture 9 - Memory Caches
Lecture 9 - Caches
Cache
Index    Valid Bit   Tag   Data
0 (00)   1
1 (01)   1
2 (10)   1
3 (11)   0                  (invalid data)
Retrieving Data from the Cache
• When the CPU requires instruction/data from memory,
the address will be sent to the cache controller
– Lowest k bits serve as the cache index
– Upper (m - k) bits serve as the tag
• Data is sent to the CPU if valid data is available
Loading Data into the Cache
• A copy of the data read from memory is stored into
the cache
• Lowest k bits of the address specify a cache block
• Upper (m - k) bits specify the tag
• Data from memory is stored in the cache's data field
• The valid bit is set to 1 (see the lookup/fill sketch below)
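A minimal C sketch of the retrieve/fill behavior just described, assuming the 4-entry direct-mapped cache with one-byte blocks from the figure above; the names (cache_line, lookup, fill) and bit widths are illustrative, not from the slides:

#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES  4   /* 4-entry cache -> k = 2 index bits */
#define INDEX_BITS 2

typedef struct {
    bool    valid;     /* valid bit                  */
    uint8_t tag;       /* upper (m - k) address bits */
    uint8_t data;      /* one-byte block             */
} cache_line;

static cache_line cache[NUM_LINES];

/* Retrieve: returns true on a hit and hands the data to the CPU. */
bool lookup(uint8_t addr, uint8_t *out)
{
    uint8_t index = addr & (NUM_LINES - 1);  /* lowest k bits  */
    uint8_t tag   = addr >> INDEX_BITS;      /* upper m-k bits */
    if (cache[index].valid && cache[index].tag == tag) {
        *out = cache[index].data;
        return true;
    }
    return false;  /* miss: fetch from memory, then call fill() */
}

/* Load: store a copy of the data read from memory and set the valid bit. */
void fill(uint8_t addr, uint8_t data)
{
    uint8_t index = addr & (NUM_LINES - 1);
    cache[index].tag   = addr >> INDEX_BITS;
    cache[index].data  = data;
    cache[index].valid = true;
}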
Spatial Locality
• How can caches be made more efficient by exploiting locality?
• Make the cache block size larger than one byte
• E.g., two-byte blocks
  – The last address bit indicates which data entry within the block
[Figure: 16-byte memory (addresses 0 (0000) through F (1111)) mapped onto a 4-entry cache with Index, Tag, and Data fields]
Spatial Locality cont.
• When main memory is accessed, its entire block (depending on the block size) will be written into the cache
• E.g., if the cache has a block size of 2 and address 12h is read from memory, the whole block containing 12h is fetched (see the sketch below)
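As a small illustration (address and block size from the slide, code mine): masking off the offset bits gives the base address of the block that will be brought in.

#include <stdio.h>

#define BLOCK_SIZE 2   /* bytes per block, as in the slide */

int main(void)
{
    unsigned addr = 0x12;                      /* address 12h           */
    unsigned base = addr & ~(BLOCK_SIZE - 1u); /* clear the offset bits */
    /* The whole block [base, base + BLOCK_SIZE) is written into the cache. */
    printf("miss on %02Xh fills block %02Xh-%02Xh\n",
           addr, base, base + BLOCK_SIZE - 1); /* 12h-13h */
    return 0;
}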
Locating Data in a Multi-Block Cache
• A block select (block offset) is required to select which byte within the block to read
Example
• For the addresses below, what byte is read? (see the decomposition sketch after the figure)
  – 1010
  – 1110
[Figure: address with the block-offset field highlighted, plus cache contents]
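The cache-contents figure did not survive extraction, so the bytes themselves cannot be reproduced here, but the field split can. Assuming the two-byte-block, four-entry cache from the previous slides (1 tag bit, 2 index bits, 1 offset bit in a 4-bit address), a sketch of the decomposition:

#include <stdio.h>

int main(void)
{
    unsigned addrs[] = { 0xA /* 1010 */, 0xE /* 1110 */ };
    for (int i = 0; i < 2; i++) {
        unsigned a      = addrs[i];
        unsigned offset =  a       & 0x1;  /* bit 0: byte within the block */
        unsigned index  = (a >> 1) & 0x3;  /* bits 2..1: cache index       */
        unsigned tag    =  a >> 3;         /* bit 3: tag                   */
        printf("addr %X -> tag %u, index %u, offset %u\n",
               a, tag, index, offset);
    }
    return 0;
}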
Disadvantages of Direct-Mapped Caches
• Advantages of direct-mapped caches:
  – Simple hardware to implement
  – Offset can be computed quickly and efficiently
• Disadvantages:
  – Cache can have low performance and be underutilized if program addresses lead to the same cache index, e.g., 4, 8, 4, 8, ...
[Figure: 16-byte memory (addresses 0 (0000) through F (1111)) mapped onto a 4-entry direct-mapped cache with Index, Tag, and Data fields]
Fully Associative Cache
• Allows data to be stored in any cache line
– When data is fetched from memory → It is placed
in any unused cache block
– No conflicts between multiple memory addresses
mapped onto a single cache block
Fully Associative Cache cont.
• Pros:
– Makes use of cache space more effectively
– No address conflicts
• Cons:
– It is expensive (area) to implement
  • No index field → the entire address is used as the tag, increasing the tag storage
  • Data can be anywhere in the cache → need to check every tag of every cache block (see the sketch below)
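A sketch of why lookups are expensive, as a small software model (names mine): with no index field, every line's tag must be compared against the full address; hardware pays for this with one comparator per line.

#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 4

typedef struct {
    bool     valid;
    uint32_t tag;   /* no index field: the entire address is the tag */
    uint8_t  data;
} fa_line;

static fa_line fa_cache[NUM_LINES];

bool fa_lookup(uint32_t addr, uint8_t *out)
{
    for (int i = 0; i < NUM_LINES; i++) {   /* check every tag */
        if (fa_cache[i].valid && fa_cache[i].tag == addr) {
            *out = fa_cache[i].data;
            return true;
        }
    }
    return false;
}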
Set Associativity
• Intermediate possibility
– Cache is divided into groups of blocks called sets.
– Each memory address maps to exactly one set in the cache,
but data may be placed in any block within that set
• If each set has 2^x blocks, the cache is a 2^x-way set-associative cache
• A 1-way set-associative cache = a direct-mapped cache (see the mapping sketch below)
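A sketch of the mapping step (parameters assumed, not from the slides): the set index is the block number modulo the number of sets, and within the chosen set the block may occupy any way. With one line per set this reduces to direct-mapped; with a single set it becomes fully associative.

#include <stdio.h>

#define BLOCK_SIZE 2   /* bytes per block (assumed)      */
#define NUM_SETS   2   /* e.g., 4 lines, 2-way -> 2 sets */

int main(void)
{
    unsigned addr  = 0xA;
    unsigned block = addr / BLOCK_SIZE;  /* block number                */
    unsigned set   = block % NUM_SETS;   /* exactly one set per address */
    unsigned tag   = block / NUM_SETS;   /* stored to identify the block */
    printf("addr %X -> set %u, tag %u\n", addr, set, tag);
    return 0;
}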
Example
Address   Outcome
0         Miss
2         Hit
4         Miss
128       Miss
0         Hit
128       Hit
64        Miss
4         Hit
0         Miss
32        Miss
64        Hit
Example Solution – Block Size
• Address 0 misses and address 2 then hits, so addresses 0 and 2 share a block; address 4 misses, so the block size is 4 bytes (each block holds addr0, addr1, addr2, addr3, i.e., a 2-bit block offset)
• The remaining address bits split into a tag and an index
[Figure: the reference table above, with the address divided into Tag and Index fields and cache rows (..., 14, 15, ...) each holding addr0-addr3]
Example 1
• Given the example code below, and assuming a
virtually-addressed direct-mapped cache of capacity
8KBytes and 64-bit blocks (8 bytes), compute the overall
miss rate (number of misses divided by number of
references). Assume that all variables except array
locations reside in registers, and that arrays A, B, and C
are placed consecutively in memory.
All entries from the selected set are sent to the hit/miss logic
Tag Arrays
• A tag entry records which line of data is currently stored in the data-cache line associated with it
• A tag entry consists of:
  – A tag field that contains the tag portion of the address
  – A valid bit that records whether or not the line associated with this tag entry contains valid data
  – A dirty bit
  – Depending on the replacement policy, additional state:
    • LRU: records how many of the other lines in the set have been referenced since the last time the corresponding line was referenced (log2(cache associativity) bits)
[Figure: tag array entry layout: valid bit | dirty bit | tag field]
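One way to picture a tag-array entry in C (a sketch; the 19-bit tag width and 2 LRU bits assume the 4-way, 32-bit-address example that follows):

#include <stdint.h>

/* One tag-array entry: valid bit, dirty bit, LRU state, tag field. */
typedef struct {
    uint32_t valid : 1;   /* line holds valid data                           */
    uint32_t dirty : 1;   /* line modified since fill (write-back policy)    */
    uint32_t lru   : 2;   /* log2(4-way associativity) = 2 bits of LRU state */
    uint32_t tag   : 19;  /* tag portion of the address                      */
} tag_entry;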
Example
• How many bits of storage are required for the tag array of a 32-KB cache with 256-byte cache lines and 4-way associativity, if the cache is write-back but does not require any additional bits in the tag array to implement the write-back policy? Assume that the system containing the cache uses 32-bit addresses and requires 1 dirty bit and 1 valid bit.
Solution
• A 32-KB cache with 256-byte lines contains:
  – 32 KB / 256 B = 128 lines
• Since the cache is 4-way set associative, it has:
  – 128 / 4 = 32 sets → m = 5 bits
• Lines that are 256 bytes long mean log2(256) = n = 8 → 8 + 5 = 13 bits (m + n) of the address are used to select a set and determine the byte within the line that an address points to
• Tag field of each tag-array entry = 32 - 13 = 19 bits
• Adding 2 bits for the dirty and valid bits = 21 bits per entry
• 21 bits × 128 lines = 2,688 bits of storage in the tag array
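The same arithmetic in checkable form (a sketch; variable names are mine):

#include <stdio.h>

static unsigned log2u(unsigned x)   /* x assumed to be a power of two */
{
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void)
{
    unsigned cache_bytes = 32 * 1024;   /* 32-KB cache      */
    unsigned line_bytes  = 256;         /* 256-byte lines   */
    unsigned ways        = 4;           /* 4-way set assoc. */
    unsigned addr_bits   = 32;
    unsigned status_bits = 2;           /* valid + dirty    */

    unsigned lines    = cache_bytes / line_bytes;                    /* 128 */
    unsigned sets     = lines / ways;                                /* 32  */
    unsigned tag_bits = addr_bits - log2u(sets) - log2u(line_bytes); /* 19  */

    printf("%u bits\n", (tag_bits + status_bits) * lines);           /* 2688 */
    return 0;
}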
Cache Misses and Performance
• How does this affect performance?
• Performance = Time / Program
  Time/Program = (Instructions/Program) × (Cycles/Instruction) × (Time/Cycle)
               = (code size) × (CPI) × (cycle time)
• Cache organization affects cycle time
– Hit latency
• Cache misses affect CPI
Average Memory Access Time (AMAT)
AMAT = Hit_L1 + MissRate_L1 × (Hit_L2 + MissRate_L2 × (Hit_mem + MissRate_mem × MissPenalty_mem))
Example AMAT
• Calculate the AMAT for a system with the
following properties:
– L1$ hits in 1 cycle with local hit rate of 50%
– L2$ hits in 10 cycles with a local hit rate of 75%
– L3$ hits in 100 cycles with local hit rate of 90%
– Main memory always hits in 1000 cycles
Solution Example AMAT
L1$ hits in 1 cycle with local hit rate of 50%
L2$ hits in 10 cycles with a local hit rate of 75%
L3$ hits in 100 cycles with local hit rate of 90%
Main memory always hits in 1000 cycles
AMAT = 1 + (1 - 0.5) × (10 + (1 - 0.75) × (100 + (1 - 0.9) × 1000)) = 31 cycles
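The nested evaluation maps directly onto a loop that folds the hierarchy in from memory back to L1 (a sketch; array names are mine, values from the problem):

#include <stdio.h>

int main(void)
{
    /* Hit times (cycles) and local hit rates for L1, L2, L3. */
    double hit_time[] = { 1.0, 10.0, 100.0 };
    double hit_rate[] = { 0.50, 0.75, 0.90 };
    double amat = 1000.0;            /* main memory always hits */

    /* Fold each level in, innermost (memory) first. */
    for (int i = 2; i >= 0; i--)
        amat = hit_time[i] + (1.0 - hit_rate[i]) * amat;

    printf("AMAT = %.0f cycles\n", amat);   /* 31 */
    return 0;
}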
Cache Misses and Performance
• CPI equation
– Only holds for misses that cannot be overlapped with other
activity
– Store misses often overlapped
• Place store in store queue → Wait for miss to complete →Perform
store
• Allow subsequent instructions to continue in parallel
– Modern out-of-order processors also do this for loads
• Cache performance modeling requires detailed modeling of the entire processor core
Sources of Cache Misses
• Compulsory (cold start or process migration, first reference): first access to
a block
– “Cold” fact of life: not a whole lot you can do about it
– Note: if you are going to run "billions" of instructions, compulsory misses are insignificant
• Capacity:
– Cache cannot contain all blocks accessed by the program
– Solution: increase cache size
• Conflict (collision):
– Multiple memory locations mapped
to the same cache location
– Solution 1: increase cache size
– Solution 2: increase associativity
• Coherence (invalidation): another process (e.g., I/O) updates memory
Cache Miss Rates
• Subtle tradeoffs between cache organization
parameters
– Large blocks reduce compulsory misses but increase miss
penalty
• #compulsory = (working set) / (block size)
• #transfers = (block size)/(bus width)
– Large blocks increase conflict misses
• #blocks = (cache size) / (block size)
– Associativity reduces conflict misses
– Associativity increases access time
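A quick numeric illustration of the block-size tradeoff (working-set and cache sizes are assumed, not from the slides): doubling the block size halves the compulsory-miss count for a fixed working set, but also halves the number of blocks available, which tends to raise conflict misses.

#include <stdio.h>

int main(void)
{
    unsigned working_set = 64 * 1024;   /* 64 KB touched once (assumed) */
    unsigned cache_size  =  8 * 1024;   /*  8 KB cache (assumed)        */

    for (unsigned block = 32; block <= 64; block *= 2) {
        unsigned compulsory = working_set / block; /* first-touch misses    */
        unsigned blocks     = cache_size / block;  /* fewer places for data */
        printf("block %2uB: %4u compulsory misses, %3u cache blocks\n",
               block, compulsory, blocks);
    }
    return 0;
}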
Cache Miss Rates: 3 C's
• Vary size and associativity
  – Compulsory misses are constant
  – Capacity and conflict misses are reduced
[Chart: misses per instruction (%) for 8K1W, 8K4W, 16K1W, and 16K4W configurations, stacked into conflict, capacity, and compulsory components. © Hill, Lipasti]
Cache Miss Rates: 3 C's cont.
• Vary size and block size
  – Compulsory misses drop with increased block size
  – Capacity and conflict misses can increase with larger blocks
[Chart: misses per instruction (%) for 8K32B, 8K64B, 16K32B, and 16K64B configurations, stacked into conflict, capacity, and compulsory components. © Hill, Lipasti]
Basic Cache Optimizations
• Metrics for cache optimizations: hit latency, miss rate, and miss penalty
• Reducing hit latency, which depends on:
  – Block size
  – Associativity
  – Number of blocks
• Reducing Miss Rate
– Larger Block size (compulsory misses)
– Larger Cache size (capacity misses)
– Higher Associativity (conflict misses)
– Compiler optimizations
• Reducing Miss Penalty
– Multilevel Caches
– Hardware prefetching and compiler prefetching
Summary #1/2: The Cache Design Space
• Several interacting dimensions
  – cache size
  – block size
  – associativity
  – replacement policy
  – write-through vs. write-back
  – write allocation
• The optimal choice is a compromise
  – depends on access characteristics
    • workload
    • use (I-cache, D-cache, TLB)
  – depends on technology / cost
[Figure: design-space sketch plotting "Good" vs. "Bad" against Factor A and Factor B for cache size, associativity, and block size]