05) Cache Memory Introduction

Computer Architecture Lecture Material

Uploaded by

kootty

Computer Architecture:

Memory Hierarchy Design

Dr. Ashok Kumar Turuk

Memory Hierarchy Design
 Cache Performance
 Techniques to Improve Memory Performance
 Memory Organization Technology, case study

What is a cache?
 Small, fast storage used to improve average access time to slow memory.
 Exploits spatial and temporal locality
 In computer architecture, almost everything is a cache!
 Registers are a cache on variables – software managed
 First-level cache a cache on second-level cache
 Second-level cache a cache on memory
 Memory a cache on disk (virtual memory)
 TLB a cache on page table
 Branch-prediction a cache on prediction information?

[Figure: the memory hierarchy. Proc/Regs at the top, then L1-Cache, L2-Cache, Memory, and Disk/Tape at the bottom; each level going down is bigger, and each level going up is faster.]
Some terms
 Cache hit: when the CPU finds a requested data item in the cache.
 Cache miss: when the CPU does not find the requested data item in the cache.
 Block: a fixed-size collection of data, which is retrieved from the main memory
and placed into cache. (cache unit)
 Temporal locality: recently accessed data items are likely to be accessed in the
near future.
 Spatial locality: items whose addresses are near one another tend to be
referenced close together in time.
 The time required for the cache miss depends on both latency and bandwidth of
the memory. Latency determines the time to retrieve the first word of the block,
and bandwidth determines the time to retrieve the rest of this block.
 Virtual memory: the address space is usually broken into a number of
fixed-size blocks (pages). At any time, each page resides either in main
memory or on disk. When the CPU references an item within a page that is not
in main memory, a page fault occurs, and the entire page is then moved from
disk to main memory.
 The cache and main memory have the same relationship as the main memory
and disk.

Review: Cache performance
 Memory stall cycles: the number of cycles during
which the CPU is stalled waiting for a memory access.
 CPU time = (CPU clock cycles + Memory stall cycles) × Clock cycle time
 Memory stall cycles = Number of misses × Miss penalty
   = IC × (Misses / Instruction) × Miss penalty
   = IC × (Memory accesses / Instruction) × Miss rate × Miss penalty
 Miss rate: the fraction of cache accesses that result in a miss.
Impact on Performance
 Example: Assume we have a computer where the CPI is 1.0 when all memory
accesses hit the cache. The only data accesses are loads and stores, and these
total 50% of the instructions. If the miss penalty is 25 clock cycles and the
miss rate is 2%, how much faster would the computer be if all instructions
were cache hits?
CPI = 1, Miss penalty = 25, Miss rate = 0.02,
Memory accesses per instruction = 1 + 0.5 = 1.5
1. Computer that always hits:
CPU execution time = (CPU clock cycles + memory stall cycles) × CCT
  = (IC × CPI + 0) × CCT = IC × CCT
2. Computer with a real cache:
Memory stall cycles = IC × (Memory accesses / Instruction) × Miss rate × Miss penalty
  = IC × 1.5 × 0.02 × 25 = 0.75 × IC
CPU execution time = (CPU clock cycles + memory stall cycles) × CCT
  = (IC + 0.75 × IC) × CCT = 1.75 × IC × CCT
The computer with no cache misses is therefore 1.75 times faster.
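The arithmetic above can be checked with a small sketch (a hypothetical helper, not from the slides); note that IC and the clock cycle time cancel out of the ratio:

```python
# Sketch (illustrative helper, not from the slides) redoing the arithmetic
# of the example above; IC and the clock cycle time cancel out of the ratio.
def slowdown_vs_all_hits(cpi, accesses_per_instr, miss_rate, miss_penalty):
    """Ratio of real CPU time to the all-hits CPU time."""
    stall_cpi = accesses_per_instr * miss_rate * miss_penalty
    return (cpi + stall_cpi) / cpi

ratio = slowdown_vs_all_hits(cpi=1.0, accesses_per_instr=1.5,
                             miss_rate=0.02, miss_penalty=25)
print(ratio)  # ~1.75: the all-hits computer is 1.75x faster
```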
Traditional Four Questions for Memory
Hierarchy Designers

 Q1: Where can a block be placed in the upper level? (Block placement)
 Fully Associative, Set Associative, Direct Mapped
 Q2: How is a block found if it is in the upper level?
(Block identification)
 Tag/Block
 Q3: Which block should be replaced on a miss?
(Block replacement)
 Random, LRU
 Q4: What happens on a write?
(Write strategy)
 Write Back or Write Through (with Write Buffer)

Q1: Where can a block be placed in the upper
level? (Block placement)
 Fully associative: any block in the main memory can be placed
in any block frame.
 It is flexible but expensive due to associativity
 Direct mapping: each block in memory is placed in a fixed block
frame with the following mapping function:
 block frame # = (Block address) MOD (# of block frames in cache)
 It is inflexible but simple and economical.

Q1: Where can a block be placed in the upper
level? (Block placement)
 Set associative: The cache is divided into sets of block frames,
and each block from the memory is first mapped to a fixed set
wherein the block can be placed in any block frame. Mapping to a
set follows the function, called a bit selection:
 set # = (Block address) MOD (# of sets in cache)
 n-way set associative: there are n block frames in a set
 fully associative is m-way set associative if there are m block frames in
the cache, whereas direct mapping is one-way set associative
 one-way, two-way, and four-way are the most frequently used organizations.
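The mapping rules above can be unified in one short sketch (a hypothetical helper; the parameters follow Figure B.2's eight-frame cache, and fully associative is treated as a single set spanning all frames):

```python
# Illustrative helper (not from the slides): set selection by bit selection.
def placement(block_address, num_frames, ways):
    """Return the set index a memory block maps to in an n-way cache."""
    num_sets = num_frames // ways       # fully associative -> one set
    return block_address % num_sets

# Block 12 in an 8-frame cache, as in Figure B.2:
print(placement(12, 8, 1))  # direct mapped (1-way): frame 12 mod 8 = 4
print(placement(12, 8, 2))  # two-way set associative: set 12 mod 4 = 0
print(placement(12, 8, 8))  # fully associative (8-way): set 0 (any frame)
```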

Figure B.2 This example cache has eight block frames and memory has 32 blocks. The three options for caches are shown left
to right. In fully associative, block 12 from the lower level can go into any of the eight block frames of the cache. With direct
mapped, block 12 can only be placed into block frame 4 (12 modulo 8). Set associative, which has some of both features, allows
the block to be placed anywhere in set 0 (12 modulo 4). With two blocks per set, this means block 12 can be placed either in block
0 or in block 1 of the cache. Real caches contain thousands of block frames, and real memories contain millions of blocks. The set
associative organization has four sets with two blocks per set, called two-way set associative. Assume that there is nothing in the
cache and that the block address in question identifies lower-level block 12.

Copyright © 2011, Elsevier Inc. All Rights Reserved.


Q2: How is a block found if it is in the upper
level?(Block identification)
 Each block frame in the cache has an address tag indicating the
block's address in the memory
 All possible tags are searched in parallel
 A valid bit is attached to the tag to indicate whether the block contains valid
information or not
 An address for a datum from CPU, A, is divided into
block address field and block offset field:
 block address = A / block size
 block offset = (A) MOD (block size)
 block address is further divided into tag and index
 index indicates the set in which the block may reside, while tag is compared
to indicate a hit or a miss
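The field split described above can be sketched as follows (assumed power-of-two block size and set count; the concrete values are illustrative only):

```python
# Sketch of the block address / block offset / tag / index split described
# above. Assumes the block size and number of sets are powers of two.
def split_address(addr, block_size, num_sets):
    block_offset = addr % block_size      # byte within the block
    block_address = addr // block_size
    index = block_address % num_sets      # selects the set to search
    tag = block_address // num_sets       # compared against stored tags
    return tag, index, block_offset

# Illustrative values: 64-byte blocks, 512 sets
print(split_address(1000, 64, 512))  # (0, 15, 40)
```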

Q3: Which block should be replaced on a miss?
(Block replacement)
 On a miss, the cache controller must select a block to be
replaced with the desired data.
 The more choices for replacement, the more expensive the hardware
 Direct mapping is the simplest
 Candidate policies: Random, LRU, FIFO


 Random vs. Least-recently used (LRU): the former has uniform
allocation and is simple to build while the latter can take
advantage of temporal locality but can be expensive to implement
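The LRU bookkeeping described above can be sketched per set (an illustrative software sketch, not the slides' hardware):

```python
# Minimal LRU bookkeeping for one cache set (illustrative sketch).
# The OrderedDict's insertion order doubles as the recency stack.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # oldest entry = least recently used

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU tag if the set is full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # refresh recency
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)   # evict least recently used
        self.blocks[tag] = None
        return False

s = LRUSet(ways=2)
hits = [s.access(t) for t in (1, 2, 1, 3, 2)]
print(hits)  # [False, False, True, False, False]: 3 evicts 2, so 2 misses again
```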

Q4: What happens on a write?
(Write strategy)
 Most cache accesses are reads:
 All instruction accesses are reads,
 Most instructions don’t write to memory.
 Optimize reads to make the common case fast
 The processor waits for a read to complete, but need not wait for writes
 Reads are easy: reading the block and comparing the tag can be done in parallel
 Writes are hard: tag checking cannot be overlapped with writing the block (writes are destructive)
 The CPU specifies the write size: only 1–8 bytes
 Write strategies often distinguish cache design:
 Write through (or store through): write information to blocks
in both levels ensuring consistency at the cost of memory and bus
bandwidth
 Write stalls may be alleviated by using write buffers by
overlapping processor execution with memory updating

Q4: What happens on a write?
(Write strategy)
 Write back (store in): write information to blocks only in cache
level
 Minimizing memory and bus traffic at the cost of weak consistency
 Use dirty bit to indicate modification, reduce frequency of write-back
on replacement
 Read misses may result in writes: replacing a dirty block forces it to be
written back first
 On a write miss the data are not needed by the processor:
 Write allocate (the block is allocated on a write miss): typically used with write back
 No-write allocate (the write miss does not affect the cache; the data are
modified only in the lower level): typically used with write through

• Example: Assume a fully associative write-back
cache with many cache entries that start empty.
Below is a sequence of five memory operations (the
address is in square brackets):
Write Mem[100];
Write Mem[100];
Read Mem[200];
Write Mem[200];
Write Mem[100];
What are the number of hits and misses when using no-write
allocate versus write allocate?

16
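The example can be worked by a small simulation sketch (illustrative, not from the slides; since the cache is fully associative with many entries, no block is ever evicted):

```python
# Sketch simulating the five-operation trace above under both write-miss
# policies. Fully associative with many entries, so nothing is evicted.
def run(trace, write_allocate):
    cache, hits, misses = set(), 0, 0
    for op, addr in trace:
        if addr in cache:
            hits += 1
        else:
            misses += 1
            # Read misses always allocate; write misses allocate only
            # under the write-allocate policy.
            if op == "R" or write_allocate:
                cache.add(addr)
    return hits, misses

trace = [("W", 100), ("W", 100), ("R", 200), ("W", 200), ("W", 100)]
print(run(trace, write_allocate=False))  # (1, 4): only Write Mem[200] hits
print(run(trace, write_allocate=True))   # (3, 2): each address misses once
```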
Example: Alpha 21264 Data Cache

[Figure 5.7: block diagram of the Alpha 21264 data cache. For the two-way set-associative placement, round-robin (FIFO) is used to choose which way a block goes on a cache miss.]
Example: Fig. 5.7 Alpha AXP 21064 data cache (64-bit machine). Cache size =
65,536 bytes (64 KB), block size = 64 bytes, two-way set-associative
placement, write back, write allocate on a write miss, 44-bit physical
address. What is the index size?

Index size = f(cache size, block size, associativity)

2^index size = number of sets in cache
2^index size = cache size / (block size × set associativity)
  = 2^16 / (2^6 × 2) = 2^9, so the index is 9 bits
block offset size = 6 bits (64-byte block)
tag length = 44 − 6 − 9 = 29 bits.

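The calculation above can be evaluated directly (a sketch assuming all sizes are powers of two, as in the example):

```python
# Sketch evaluating the index/offset/tag formulas above.
# Assumes the cache size, block size, and associativity are powers of two.
from math import log2

def address_breakdown(cache_bytes, block_bytes, ways, phys_addr_bits):
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = int(log2(block_bytes))   # 64-byte block -> 6 bits
    index_bits = int(log2(num_sets))       # 512 sets -> 9 bits
    tag_bits = phys_addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# 64 KB cache, 64-byte blocks, two-way, 44-bit physical address:
print(address_breakdown(65536, 64, 2, 44))  # (29, 9, 6)
```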
Unified vs Split Caches
• Unified vs Separate I&D
[Diagram: two organizations — a split level 1 (separate I-Cache-1 and D-Cache-1) and a unified level 1 (one Unified Cache-1), each backed by a Unified Cache-2.]

• Example:
– 16KB I&D: Inst miss rate=0.64%, Data miss rate=6.47%
– 32KB unified: Aggregate miss rate=1.99%
• Which is better (ignore L2 cache)?
– Assume 33% data ops → 75% of accesses are instruction fetches
(1.0/1.33)
– hit time=1, miss time=50
– Note that data hit has 1 stall for unified cache (only one port)
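One way to answer the question is to compare average memory access times (a sketch under the example's stated assumptions: hit time 1, miss penalty 50, and one extra stall cycle for data accesses on the single-ported unified cache):

```python
# Sketch (not from the slides) evaluating the split-vs-unified example above.
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

f_inst, f_data = 0.75, 0.25  # 75% instruction accesses, 25% data

# Split caches: each side has its own port and its own miss rate.
split = f_inst * amat(1, 0.0064, 50) + f_data * amat(1, 0.0647, 50)

# Unified cache: data accesses pay one extra stall cycle for the single port.
unified = f_inst * amat(1, 0.0199, 50) + f_data * amat(1 + 1, 0.0199, 50)

print(split, unified)  # split ~2.049 beats unified ~2.245 on AMAT
```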
Discussion: a single (unified) cache and separate caches see different mixes
of accesses (74% instruction vs. 26% data) and hence different miss rates;
see Figure 5.8. (A single/unified cache may also suffer structural hazards
when loads/stores compete with instruction fetches.)
Size     Instruction cache   Data cache   Unified cache
8 KB     8.16                44.0         63.0
16 KB    3.82                40.9         51.0
32 KB    1.36                38.4         43.3
64 KB    0.61                36.9         39.4
128 KB   0.30                35.3         36.2
256 KB   0.02                32.6         32.9

Figure 5.8 Misses per 1000 instructions for instruction, data, and unified caches of different sizes.
