Lecture 8
[Figure: the memory hierarchy — CPU registers at the upper level, cache in between, and main memory / I/O devices at the lower level; accesses found in the cache are hits, the rest are misses.]
Memory Hierarchy Design
• To evaluate the effectiveness of the
memory hierarchy we can use the formula:
• Memory_stall_cycles = IC × Misses per instruction × Miss penalty
• The design of a cache is characterized by:
– Block placement
– Block identification
– Block replacement
– Cache / main memory interactions
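As a worked illustration of the formula above (all numbers assumed for the example, not taken from the lecture):

```python
# Worked example with assumed numbers:
# Memory_stall_cycles = IC x misses per instruction x miss penalty
ic = 1_000_000            # instruction count (assumed)
misses = ic * 2 // 100    # assume 2% of instructions miss -> 20,000 misses
miss_penalty = 50         # assumed cycles to service one miss

memory_stall_cycles = misses * miss_penalty
print(memory_stall_cycles)  # 20,000 x 50 = 1000000
```

So even a 2% per-instruction miss rate can add a full cycle of stall per instruction once the miss penalty is factored in.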
Block Placement
• Three methods to place blocks in the cache:
– Direct mapped: each block has only one slot it can occupy
» (Block address) MOD (Number of blocks in cache)
– Fully associative: a block can be placed in any slot
– Set associative: a block can be placed in any slot within one set
» (Block address) MOD (Number of sets in cache)
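The direct-mapped formula can be sketched in a few lines of Python (the cache size of 8 block frames is an assumption for the example):

```python
NUM_BLOCKS = 8  # assumed number of block frames in the cache

def direct_mapped_slot(block_address: int) -> int:
    # (Block address) MOD (Number of blocks in cache)
    return block_address % NUM_BLOCKS

for addr in (12, 20, 7):
    print(f"{addr} -> {direct_mapped_slot(addr)}")
# prints: 12 -> 4, 20 -> 4, 7 -> 7
```

Note that blocks 12 and 20 map to the same slot, so they would conflict with each other in a direct-mapped cache.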
[Figure: direct-mapped placement example — memory block frames (frame0 ... frame45) mapped onto the cache's block frames.]
Block Identification
• Cache memory consists of two portions:
• Directory
- Address tags (checked to match the block address from the CPU)
- Control bits (indicate that the content of a block frame is valid)
• RAM
- Block frames (contain the data of the blocks)
Block Identification
• Fully associative
» Memory size = 16 MB = 2^24 bytes → 24-bit block address
» Block size = 32 B = 2^5 bytes → 5-bit offset
» Number of bits in tag = 24 - 5 = 19

Address layout (24 bits total): | tag: 19 bits | offset: 5 bits |
Block Identification
• Set associative (2-way, 512 KB cache)
» Memory size = 16 MB = 2^24 bytes → 24-bit block address
» Block size = 32 B = 2^5 bytes → 5-bit offset
» Number of sets in cache = cache size / (set size ×
block size) = 512 KB / (2 × 32 B) = 2^13 → 13-bit index
» Number of bits in tag = 24 - 13 - 5 = 6

Address layout (24 bits total): | tag: 6 bits | index: 13 bits | offset: 5 bits |
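The set-associative address breakdown above (6-bit tag, 13-bit index, 5-bit offset) can be sketched with a few shifts and masks — a small illustration, not code from the lecture:

```python
# Field widths from the lecture's 2-way, 512 KB example.
OFFSET_BITS = 5                            # 32 B blocks
INDEX_BITS = 13                            # 2^13 sets
TAG_BITS = 24 - INDEX_BITS - OFFSET_BITS   # = 6

def split_address(addr: int):
    """Split a 24-bit block address into (tag, set index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

The tag is stored in the directory and compared against the CPU's address; the index selects which set to search; the offset selects the byte within the block.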
Block Replacement
• When a miss occurs, the cache controller must select a block to
be replaced with the desired data.
• With direct-mapped placement the decision is simple because
there is no choice: only one block frame is checked for a hit and
only that block can be replaced
• With fully-associative or set-associative placement, there is
more than one block frame to choose from on a miss
• Strategies
» First In First Out (FIFO)
» Most-Recently Used (MRU)
» Least-Frequently Used (LFU)
» Most-Frequently Used (MFU)
» Random
» Least-Recently Used (LRU)
Block Replacement
• The two most popular strategies:
– Random - to spread allocation uniformly, candidate
blocks are selected at random.
Advantage: simple to implement in hardware
Disadvantage: ignores the principle of locality
– Least-Recently Used (LRU) - the block that has gone
unused for the longest time is replaced, relying on the
principle of locality.
[Figure: a cache of four block frames (F0-F3) shown across a sequence of accesses.]
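LRU replacement for a small fully associative cache can be sketched as follows — a minimal illustration (not the lecture's code), using an ordered dictionary to track recency:

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative cache with LRU replacement."""

    def __init__(self, num_frames: int = 4):
        self.num_frames = num_frames
        self.frames = OrderedDict()  # block address -> data, oldest first

    def access(self, block_addr: int) -> bool:
        """Return True on a hit, False on a miss (loading the block)."""
        if block_addr in self.frames:
            self.frames.move_to_end(block_addr)  # mark most recently used
            return True
        if len(self.frames) >= self.num_frames:
            self.frames.popitem(last=False)      # evict least recently used
        self.frames[block_addr] = None           # load the missing block
        return False
```

For example, with 2 frames, accessing blocks 1, 2, 1, 3 evicts block 2 (not block 1), because block 1 was touched more recently.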
© Gamal Famy
Memory Interaction with Cache
• Reads dominate processor cache accesses. All
instruction accesses are reads, and most
instructions do not write to memory.
• The read policies on a miss are:
Read Through - the word is read from main
memory directly to the CPU
No Read Through - the block is read from main
memory into the cache, and the word is then read
from the cache to the CPU
Memory Interaction with Cache
• The write policies on a write hit often distinguish
cache designs:
• Write Through - the information is written to both
the block in the cache and to the block in the
lower-level memory.
• Advantage:
- a read miss never results in writes to main memory
- easy to implement
- main memory always has the most current copy of the data
(consistent)
• Disadvantage:
- writes are slower
- every write needs a main memory access
- as a result it uses more memory bandwidth
Memory write on a hit (cont.)
• Write back - the information is written only to
the block in the cache. The modified cache
block is written to main memory only when it is
replaced.
• Advantage:
- writes occur at the speed of the cache memory
- multiple writes within a block require only one write to main
memory
- as a result uses less memory bandwidth
• Disadvantage:
- harder to implement
- main memory is not always consistent with cache
- reads that result in replacement may cause writes to main
memory
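The bandwidth difference between the two policies can be shown with a toy model (assumptions for illustration, not the lecture's code): n writes to the same block cost n main-memory writes under write through, but only one, at replacement time, under write back.

```python
def write_through_traffic(num_writes: int) -> int:
    # every write hit also writes main memory
    return num_writes

class WriteBackBlock:
    """Toy model of one cache block under the write-back policy."""

    def __init__(self):
        self.dirty = False

    def write(self):
        self.dirty = True   # write only the cache block, mark it dirty

    def evict(self) -> int:
        # the modified block is written to main memory only when replaced
        mem_writes = 1 if self.dirty else 0
        self.dirty = False
        return mem_writes

blk = WriteBackBlock()
for _ in range(10):
    blk.write()
print(write_through_traffic(10), blk.evict())  # prints: 10 1
```

Ten writes to one block generate ten memory accesses under write through but a single write-back on eviction, which is why write back uses less memory bandwidth.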
Memory write on a miss (cont.)
• Write Allocate - the block is loaded into the cache on a
write miss, and the write-hit action is then applied.