Patterson6e MIPS Ch05 Modified Part2
6th Edition
Chapter 5
Large and Fast: Exploiting
Memory Hierarchy
PART 1
1. Seek Time:
The time taken to position the read/write head over the correct track.
Given as 4ms.
2. Rotational Latency:
Time taken for the desired sector to rotate under the read/write head.
The disk spins at 15,000 RPM, meaning:
Time per revolution = 60 s / 15,000 = 4 ms per revolution
On average, the desired sector will be halfway around, so:
Rotational latency=1/2×4=2 ms
3. Transfer Time:
Time to read the 512B sector.
Given the transfer rate is 100MB/s, we convert:
100 MB/s = 100 × 1024 × 1024 bytes per second = 104,857,600 B/s
Transfer time = 512 / 104,857,600 ≈ 0.00000488 seconds ≈ 0.005 ms
4. Controller Overhead:
Delay due to processing overhead in the disk controller.
Given as 0.2ms.
Total Average Read Time:
4 ms + 2 ms + 0.005 ms + 0.2 ms ≈ 6.2 ms
Thus, the average time required to read a sector from disk is 6.2ms.
Disk Access Example
The average read time for a disk access is the sum of the following four
components: seek time, rotational latency, transfer time, and controller overhead.
The given parameters are the sector size, disk rotation speed (RPM), average seek
time, transfer rate, and controller overhead.
Given
512B sector, 15,000 rpm, 4 ms average seek time, 100 MB/s transfer rate, 0.2 ms controller overhead
Average read time:
4ms seek time
+ ½ / (15,000/60) = 2ms rotational latency
+ 512 / 100MB/s = 0.005ms transfer time
+ 0.2ms controller delay
= 6.2ms
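The same arithmetic can be checked with a short Python sketch; the figures are the ones given above, and 100 MB/s is interpreted as 100 × 1024 × 1024 B/s to match the working shown earlier.

```python
# Average disk read time = seek time + rotational latency + transfer time + controller overhead.

def avg_disk_read_ms(sector_bytes, rpm, seek_ms, transfer_bps, controller_ms):
    rotation_ms = 60_000 / rpm               # one full revolution, in ms (4 ms at 15,000 rpm)
    rotational_latency_ms = rotation_ms / 2  # on average the sector is half a revolution away
    transfer_ms = sector_bytes / transfer_bps * 1000
    return seek_ms + rotational_latency_ms + transfer_ms + controller_ms

# 512 B sector, 15,000 rpm, 4 ms seek, 100 MB/s transfer, 0.2 ms controller overhead
print(avg_disk_read_ms(512, 15_000, 4.0, 100 * 1024 * 1024, 0.2))  # ~6.2 ms
```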
How do we know if the data is present?
Where do we look?
Direct mapped: only one choice. For a given block of main memory, there is exactly
one location in the cache where that block can be placed.
Cache index = (Block address) modulo (#Blocks in cache). The block address is divided
by the number of blocks in the cache, and the remainder (modulo) determines the cache
location, so blocks are distributed evenly across the cache.
If #Blocks is a power of 2, the index is simply the low-order address bits: the least
significant bits of the block address select the cache location where a particular
block of data is stored or looked up, as shown in the sketch below.
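A minimal sketch of the mapping rule, assuming an 8-block cache purely for illustration:

```python
NUM_BLOCKS = 8  # assumed cache size; must be a power of 2 for the bit trick below

def cache_index(block_addr, num_blocks=NUM_BLOCKS):
    # (Block address) modulo (#Blocks in cache); with a power-of-2 block count this
    # is the same as keeping the low-order bits of the block address.
    return block_addr & (num_blocks - 1)

# Blocks 1, 9, 17 and 25 all map to index 1, so they conflict in a direct-mapped cache.
print([cache_index(b) for b in (1, 9, 17, 25)])  # [1, 1, 1, 1]
```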
Cache Example
Processor reads address 18 (binary 10010). The cache index is 010.
The tag stored at index 010 is 11; combining that tag with the index gives 11010,
which does not match 10010, so the access is a miss: the cache does not return data
from this entry. It flags a miss and reads the block from RAM. After the miss is
serviced, Mem[10010] replaces Mem[11010] at index 010 and the tag becomes 10.

Word addr   Binary addr   Hit/miss   Cache block
18          10 010        Miss       010

Cache contents when address 18 is accessed:
Index   V   Tag   Data
000     Y   10    Mem[10000]
001     N
010     Y   11    Mem[11010]
011     Y   00    Mem[00011]
100     N
101     N
110     Y   10    Mem[10110]
111     N
Address layout (32-bit byte address):
Tag: bits 31–10 (22 bits)   Index: bits 9–4 (6 bits)   Offset: bits 3–0 (4 bits)
Index: determines the location of the block in the cache.
Offset: determines the location of the byte inside the block.
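With the 22/6/4 split above, the three fields can be extracted from a 32-bit byte address by shifting and masking; the address 1200 used below is just an arbitrary value chosen for the demonstration.

```python
OFFSET_BITS = 4   # bits [3:0]  - byte within the 16-byte block
INDEX_BITS  = 6   # bits [9:4]  - which of the 64 cache blocks
                  # bits [31:10] - the remaining 22-bit tag

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index  = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Byte address 1200 is block address 75 (1200 // 16), which maps to index 75 mod 64 = 11.
print(split_address(1200))  # (1, 11, 0)
```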
3. Types of Cache Misses:
Instruction Cache Miss:
Happens when the CPU can't find the next instruction in the
cache.
The CPU restarts instruction fetching from memory.
Data Cache Miss:
Happens when the required data is not in the cache.
The CPU waits until the data is fully loaded before continuing.
Types of Data Write in Cache Memory:
Data write refers to the process of storing or updating data in memory when a
program executes a store (write) instruction. This happens when the CPU wants
to write new data into a memory location, either in the cache or main memory.
1. For Write-Through Cache
Allocate on Miss (fetch the block):
Fetch the block from RAM, store it in the cache, and then write the new data.
Ensures future accesses are faster but adds extra memory access time.
Write Around:
Do NOT fetch the block into the cache, just write directly to RAM.
Useful when programs overwrite entire blocks before reading them (e.g., initializing data).
Avoids unnecessary cache pollution but may slow down future reads.
2. For Write-Back Cache
Usually Fetch the Block:
The block is loaded into cache first, and then the write occurs.
Ensures that future writes can be done quickly in cache without writing to RAM
immediately.
Summary
Write-Through: Either fetch block on a miss (allocate on miss) or skip fetching and write only to
RAM (write around).
Write-Back: Usually fetches the block, so future writes stay in cache before updating RAM.
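A toy sketch of the three write-miss behaviors just described; the dictionaries stand in for the cache and main memory, and everything else (tags, eviction) is left out for brevity.

```python
cache, memory, dirty = {}, {}, set()   # block address -> data

def write_through_allocate(block, data):
    cache[block] = data    # allocate on miss: bring the block into the cache...
    memory[block] = data   # ...and write through to main memory every time

def write_through_around(block, data):
    memory[block] = data   # write around: update memory only, never allocate the block

def write_back_write(block, data):
    cache[block] = data    # write-back: fetch the block and write only the cache for now
    dirty.add(block)       # memory is updated later, when the dirty block is evicted
```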
1. Processor Type:
It is an embedded MIPS processor optimized for performance.
Can access both instructions and data in the same cycle for faster execution.
2. Cache Architecture:
Split cache design → Separate Instruction Cache (I-cache) and Data Cache (D-cache).
Each cache is 16KB, containing 256 blocks with 16 words per block.
D-cache supports both write-through and write-back policies for data storage.
I-cache miss rate: 0.4% (low, since instruction fetches have strong locality).
D-cache miss rate: 11.4% (higher, meaning more data requests need to access main
memory).
Weighted average miss rate: 3.2%, showing overall efficiency.
Each memory unit has a fixed width (e.g., 1 word per access).
DRAM is connected to the CPU via a clocked bus, which is slower than the CPU clock.
15 bus cycles are required for each DRAM access.
1 bus cycle per word is needed to transfer data from DRAM to cache.
Summary:
DRAM is slower than the CPU, causing delays when fetching data.
Fetching a block requires multiple bus cycles, making cache performance important.
For a 4-word block, with 1 bus cycle to send the address, the miss penalty is
1 + 4 × 15 + 4 × 1 = 65 bus cycles, and the bandwidth is 16 bytes / 65 cycles ≈ 0.25 B/cycle,
so the miss penalty is high and the bandwidth is low, meaning memory transfers are a bottleneck.
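The 65-cycle figure and 0.25 B/cycle bandwidth can be reproduced as below; the 4-word block size and the single bus cycle to send the address are assumptions taken from the standard one-word-wide memory example, since they are not restated above.

```python
BLOCK_WORDS = 4    # assumed block size, in words
ADDR_CYCLES = 1    # assumed: 1 bus cycle to send the address to DRAM
DRAM_CYCLES = 15   # given: bus cycles per DRAM access
XFER_CYCLES = 1    # given: bus cycles to transfer one word back to the cache

miss_penalty = ADDR_CYCLES + BLOCK_WORDS * (DRAM_CYCLES + XFER_CYCLES)  # 65 bus cycles
bandwidth = (BLOCK_WORDS * 4) / miss_penalty                            # 16 B / 65 cycles

print(miss_penalty, round(bandwidth, 2))  # 65 0.25
```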
4. Speedup Calculation
Speedup = CPI with cache misses / Ideal CPI (without misses)
= 5.44 / 2 = 2.72 (a system with a perfect cache would be 2.72× faster)
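The 5.44 CPI behind this speedup is not derived above; the sketch below assumes the usual parameters for this example (base CPI of 2, 100-cycle miss penalty, 2% instruction-cache miss rate, 4% data-cache miss rate, 36% of instructions being loads or stores), so treat the numbers as illustrative.

```python
BASE_CPI     = 2.0    # assumed ideal CPI with a perfect cache
MISS_PENALTY = 100    # assumed miss penalty, in cycles
I_MISS_RATE  = 0.02   # assumed instruction-cache miss rate
D_MISS_RATE  = 0.04   # assumed data-cache miss rate
LOAD_STORE   = 0.36   # assumed fraction of instructions that access data

inst_stall = I_MISS_RATE * MISS_PENALTY               # 2.00 stall cycles per instruction
data_stall = LOAD_STORE * D_MISS_RATE * MISS_PENALTY  # 1.44 stall cycles per instruction
actual_cpi = BASE_CPI + inst_stall + data_stall       # 5.44

print(actual_cpi, actual_cpi / BASE_CPI)  # 5.44 2.72
```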
Average Access Time
Hit time is also important for performance
Average memory access time (AMAT)
AMAT = Hit time + Miss rate × Miss penalty
Example
CPU with 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
AMAT = 1 + 0.05 × 20 = 2 ns, i.e. 2 clock cycles per instruction fetch
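The AMAT example in code form, using the numbers given above:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

# 1 ns clock (1 cycle = 1 ns), 1-cycle hit time, 5% miss rate, 20-cycle miss penalty
print(amat(1, 0.05, 20))  # 2.0 cycles, i.e. 2 ns per access
```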
Associativity Example – Direct Mapped
Block addr   Cache index   Hit/miss   Cache content after access
0            0             miss       Mem[0]
8            0             miss       Mem[8]
0            0             miss       Mem[0]
6            2             miss       Mem[0]  Mem[6]
8            0             miss       Mem[8]  Mem[6]
1. Accessing block 0 results in a miss, as the cache is initially empty. Block 0 is then loaded into cache line 0.
2. Accessing block 8 also results in a miss because block 8 maps to the same cache line as block 0 due to direct mapping, causing block 0 to be replaced.
3. Accessing block 0 again results in another miss since it was replaced by block 8 in the previous step. Block 0 is loaded back into cache line 0, replacing block 8.
4. Accessing block 6 results in a miss since it has not been loaded into the cache yet. It is then loaded into cache line 2, which is determined by the block address (6 mod 4 = 2).
5. Finally, accessing block 8 again results in a miss because block 0 replaced it earlier. Block 8 is loaded back into cache line 0.
This demonstrates the limitation of direct-mapped caches, where blocks that map to the same cache line can cause frequent cache misses if they are accessed in an alternating pattern. In this case, blocks 0 and 8 map to cache line 0 and keep replacing each other, leading to a cache miss each time they are accessed.
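The all-miss behavior in the table can be reproduced with a small simulation; the 4-block direct-mapped cache matches the example above.

```python
def simulate_direct_mapped(block_addrs, num_blocks=4):
    cache = [None] * num_blocks              # stored block address per line; None = invalid
    for block in block_addrs:
        index = block % num_blocks
        result = "hit" if cache[index] == block else "miss"
        cache[index] = block                 # on a miss, the new block replaces the old one
        print(f"block {block} -> index {index}: {result}")

simulate_direct_mapped([0, 8, 0, 6, 8])      # five accesses, five misses
```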
Associativity Example – 2-way Set Associative
Block addr   Cache set   Hit/miss   Cache content after access
0            0           miss       Mem[0]
8            0           miss       Mem[0]  Mem[8]
0            0           hit        Mem[0]  Mem[8]
6            0           miss       Mem[0]  Mem[6]
8            0           miss       Mem[8]  Mem[6]
Fully associative
Block address   Hit/miss   Cache content after access
0               miss       Mem[0]
8               miss       Mem[0]  Mem[8]
0               hit        Mem[0]  Mem[8]
6               miss       Mem[0]  Mem[8]  Mem[6]
8               hit        Mem[0]  Mem[8]  Mem[6]
Walkthrough of the 2-way set associative example:
1. Block 0 is accessed, resulting in a miss since the cache is initially empty. Block 0 is placed in one of the lines in set 0.
2. Block 8 is accessed and also results in a miss as it is not in the cache yet. It is placed in the other line of set 0.
3. Block 0 is accessed again. This time, it results in a hit because it is already in set 0.
4. Block 6 is accessed, leading to a miss. Because set 0 is already full, the cache must replace one of the existing blocks. With a least-recently-used (LRU) policy, block 8 is replaced, since block 0 was accessed more recently (in step 3).
5. Block 8 is accessed, which results in a miss since it was replaced in step 4. Block 8 is loaded back into set 0, replacing block 0, which is now the least recently used block.
The advantage of 2-way set associative caches over direct-mapped caches is apparent in step 3, where a direct-mapped cache would have resulted in a miss due to a conflict, but the set associative cache gets a hit because it can hold two blocks with the same index.
Walkthrough of the fully associative example:
1. Block 0 is accessed, resulting in a miss since the cache is initially empty. It is placed in a free cache line.
2. Block 8 is accessed and misses as well; it is placed in another free line.
3. Block 0 is accessed again and hits, since it is still in the cache.
4. Block 6 is accessed, leading to a miss since it is not in the cache. It is placed in the next available cache line.
5. Block 8 is accessed once more. This is a hit because block 8 is still in the cache from the previous access.
In a fully associative cache, as long as there is a free line available, new blocks are added without replacing others. This strategy tends to have higher hit rates than direct-mapped or set-associative caches, as it does not suffer from conflicts where multiple blocks compete for the same cache line. However, fully associative caches are more complex and expensive to implement because they require more hardware to search all cache lines simultaneously for a match.
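Both walkthroughs can be verified with one small LRU simulation over 4 cache lines: ways=2 reproduces the 2-way set associative result (one hit), and ways=4 makes the cache fully associative (two hits). This is a sketch rather than a hardware description; recency is tracked simply by list order.

```python
def simulate_lru(block_addrs, num_lines=4, ways=2):
    num_sets = num_lines // ways
    sets = [[] for _ in range(num_sets)]   # each set holds block addresses, LRU first
    hits = 0
    for block in block_addrs:
        s = sets[block % num_sets]
        if block in s:
            hits += 1
            s.remove(block)                # refresh: move the block to the MRU position
        elif len(s) == ways:
            s.pop(0)                       # set is full: evict the least recently used block
        s.append(block)
    return hits

print(simulate_lru([0, 8, 0, 6, 8], ways=2))  # 1 hit  (2-way set associative)
print(simulate_lru([0, 8, 0, 6, 8], ways=4))  # 2 hits (fully associative)
```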
Replacement Policy
1. On a miss, prefer to fill a non-valid (empty) cache line if one exists.
2. If all cache lines in the set are valid, the replacement policy chooses which entry to replace.
3. Least-recently used (LRU):
The cache line that hasn't been used for the longest time is replaced.
Manageable for a 4-way set associative cache, but becomes complex with higher
associativity.
4. Random:
A random cache line within the set is chosen for replacement.
At high associativity it performs similarly to LRU, since tracking the least-recently
used line becomes less cost-effective.
Multilevel Caches
Primary cache attached to CPU
Small, but fast
Level-2 cache services misses from
primary cache
Larger, slower, but still faster than main
memory
Main memory services L-2 cache misses
Some high-end systems include L-3 cache
The effective CPI increases significantly due to cache misses, demonstrating the impact
of memory access delays when using only a primary cache.
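The impact is easiest to see with concrete numbers. None of the figures below appear in the text above; they are assumed from the usual multilevel-cache example (4 GHz clock, base CPI of 1, 2% primary miss rate, 100 ns main-memory access, 5 ns L2 access, 0.5% of instructions missing all the way to main memory), so treat this purely as an illustration.

```python
CLOCK_GHZ        = 4.0    # assumed: cycles per nanosecond
BASE_CPI         = 1.0    # assumed ideal CPI
L1_MISS_RATE     = 0.02   # assumed primary-cache misses per instruction
MAIN_MEM_NS      = 100    # assumed main-memory access time
L2_NS            = 5      # assumed L2 access time
GLOBAL_MISS_RATE = 0.005  # assumed misses per instruction that also miss in L2

main_mem_penalty = MAIN_MEM_NS * CLOCK_GHZ   # 400 cycles
l2_penalty       = L2_NS * CLOCK_GHZ         # 20 cycles

cpi_primary_only = BASE_CPI + L1_MISS_RATE * main_mem_penalty
cpi_with_l2 = BASE_CPI + L1_MISS_RATE * l2_penalty + GLOBAL_MISS_RATE * main_mem_penalty

print(cpi_primary_only, cpi_with_l2)   # 9.0 vs 3.4 -> the L2 cache cuts effective CPI sharply
```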
Figure: memory-access pattern of unoptimized vs. blocked code (an optimization for
memory access), distinguishing older accesses from new accesses.
Service interruption: deviation from specified service.
Hardware caches: reduce comparisons to reduce cost.
Virtual memory: full table lookup makes full associativity feasible; benefit in reduced miss rate.
Address layout for the cache-controller example (32-bit address):
Tag: bits 31–14 (18 bits)   Index: bits 13–4 (10 bits)   Offset: bits 3–0 (4 bits)
Figure: cache controller interface signals.
CPU ↔ Cache: Read/Write, Valid, 32-bit Address, 32-bit Write Data, 32-bit Read Data, Ready.
Cache ↔ Memory: Read/Write, Valid, 32-bit Address, 128-bit Write Data, 128-bit Read Data, Ready.
Multiple cycles per access.
Could partition into separate states to reduce the clock cycle time.
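As a rough sketch of what such a controller looks like, the next-state logic below follows a simple four-state blocking controller (Idle, Compare Tag, Write-Back, Allocate); the state names and input signals are assumptions here, since the controller figure itself is not reproduced above.

```python
from enum import Enum

class State(Enum):
    IDLE        = 0   # wait for a valid CPU request
    COMPARE_TAG = 1   # check the tag; a hit completes the access in this state
    WRITE_BACK  = 2   # write the dirty victim block back to memory
    ALLOCATE    = 3   # fetch the requested block from memory

def next_state(state, cpu_request, hit, victim_dirty, memory_ready):
    if state is State.IDLE:
        return State.COMPARE_TAG if cpu_request else State.IDLE
    if state is State.COMPARE_TAG:
        if hit:
            return State.IDLE                       # hit: access done, back to Idle
        return State.WRITE_BACK if victim_dirty else State.ALLOCATE
    if state is State.WRITE_BACK:
        return State.ALLOCATE if memory_ready else State.WRITE_BACK
    if state is State.ALLOCATE:
        return State.COMPARE_TAG if memory_ready else State.ALLOCATE
    raise ValueError(state)
```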
Cache coherence example (write-through caches), time step 3: CPU A writes 1 to X — CPU A's cache now holds 1, CPU B's cache still holds the stale value 0, and memory holds 1.