
CS 322M Digital Logic & Computer Architecture

Lecture 25 [28.10.2019]
Cache Optimization Techniques-II

John Jose
Assistant Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati, Assam.
Accessing Cache Memory
[Figure: CPU ↔ Cache ↔ Memory, with hit time at the cache and miss penalty for accesses that go to memory]

Average memory access time (AMAT) = Hit time + (Miss rate × Miss penalty)

 Hit Time: Time to find the block in the cache and return it to the processor [indexing, tag comparison, transfer].
 Miss Rate: Fraction of cache accesses that result in a miss.
 Miss Penalty: Number of cycles required to fetch the block from the next level of the memory hierarchy. This is the extra (not total) time for a miss, in addition to the hit time incurred by all accesses (a worked example follows).
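
As a quick sanity check on the formula, a minimal worked example in C, with assumed numbers (1-cycle hit, 5% miss rate, 100-cycle miss penalty):

```c
/* AMAT = hit time + miss rate * miss penalty, with assumed example
   values: 1-cycle hit, 5% miss rate, 100-cycle miss penalty. */
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;    /* cycles paid by every access    */
    double miss_rate    = 0.05;   /* fraction of accesses that miss */
    double miss_penalty = 100.0;  /* extra cycles on a miss         */

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);  /* prints: AMAT = 6.0 cycles */
    return 0;
}
```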
How to optimize the cache?
 Reduce the average memory access time
 AMAT = Hit Time + (Miss Rate × Miss Penalty)
 Approaches:
Reducing the miss rate
Reducing the miss penalty
Reducing the hit time
Multi-banked Caches
 Multi-banked caches increase cache bandwidth
 Rather than a single monolithic unit, divide the cache into many banks that can support simultaneous accesses.
The ARM Cortex-A8 supports 1-4 banks for L2
The Intel i7 uses 4 banks for L1 and 8 banks for L2
 Interleave banks according to the block address
Sequential interleaving: consecutive block addresses map to consecutive banks (see the sketch below)
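
A minimal sketch of sequential interleaving; NUM_BANKS and the function name are assumptions for illustration:

```c
/* Sequential interleaving: block address modulo the number of banks
   selects the bank, so blocks 0,4,8,... go to bank 0, blocks
   1,5,9,... to bank 1, and so on for an assumed 4-bank cache. */
#include <stdint.h>

#define NUM_BANKS 4

static inline unsigned bank_of(uint64_t block_addr) {
    return (unsigned)(block_addr % NUM_BANKS);
}
```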
Non-blocking Caches
 Non-blocking caches increase cache bandwidth
 The cache can serve hits while one or more misses are in progress: (a) hit under miss, (b) hit under multiple misses
 Essential for out-of-order superscalar processors to raise IPC
 L1 tracks outstanding misses in MSHRs (Miss Status Holding Registers); L2 must support multiple outstanding requests
 On an L1 miss, allocate an MSHR entry; clear it when L2 responds with the cache block (see the sketch after the next slide)
 The L1 miss penalty can be hidden to some extent
Non-blocking Caches
 Non-blocking caches increase cache bandwidth
 The processor can hide the L1 miss penalty, but not the L2 miss penalty
 Reduces the effective miss penalty by overlapping miss latencies
 Significantly increases the complexity of the cache controller, since there can be multiple outstanding memory accesses
 Requires a pipelined or banked memory system
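
A simplified MSHR table sketch; the structure, sizes, and names are assumptions, not a real controller design:

```c
/* On an L1 miss, allocate an MSHR entry; when L2 replies with the
   block, clear it. A secondary miss to a block already in flight
   merges into the existing entry instead of re-requesting from L2. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_MSHR 8                /* assumed max outstanding misses */

typedef struct {
    bool     valid;               /* entry tracks an in-flight miss */
    uint64_t block_addr;          /* block being fetched from L2    */
} mshr_entry_t;

static mshr_entry_t mshr[NUM_MSHR];

/* Returns true if the miss was merged or newly allocated;
   false means all MSHRs are busy and the pipeline must stall. */
bool mshr_handle_miss(uint64_t block_addr) {
    int free_slot = -1;
    for (int i = 0; i < NUM_MSHR; i++) {
        if (mshr[i].valid && mshr[i].block_addr == block_addr)
            return true;          /* merge: request already in flight */
        if (!mshr[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;             /* MSHRs full: stall */
    mshr[free_slot].valid = true; /* allocate on a primary miss */
    mshr[free_slot].block_addr = block_addr;
    /* ...issue the fetch request to L2 here... */
    return true;
}

/* Called when L2 responds with the requested cache block. */
void mshr_clear(uint64_t block_addr) {
    for (int i = 0; i < NUM_MSHR; i++)
        if (mshr[i].valid && mshr[i].block_addr == block_addr)
            mshr[i].valid = false;
}
```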
Early Restart
 Early restart reduces the miss penalty
 The CPU does not wait for the entire block to be loaded
 Early restart:
Request the words in normal order
Send the missed word to the processor as soon as it arrives
Generally useful with large blocks
The L2 controller is not involved in this technique
Critical Word First
 Critical word first reduces the miss penalty
 Critical word first:
Request the missed word from memory first
Send it to the processor as soon as it arrives
The processor resumes while the rest of the block fills the cache
The L2 cache controller sends words out of order
The L1 cache controller must re-arrange the words within the block (see the sketch below)
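
A minimal sketch of the wrap-around fill order under critical word first; WORDS_PER_BLOCK and the function name are assumptions:

```c
/* For a miss on word `critical` of an 8-word block, request words
   starting at the critical one and wrapping around, so the processor
   can restart after the very first word arrives. */
#define WORDS_PER_BLOCK 8

void fill_block_critical_first(unsigned critical) {
    for (unsigned i = 0; i < WORDS_PER_BLOCK; i++) {
        unsigned w = (critical + i) % WORDS_PER_BLOCK;
        /* fetch word w: on iteration 0 this is the critical word;
           the L1 controller places each word at its proper offset. */
        (void)w;
    }
}
```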
Merging Write Buffer
 Write buffer merging reduces the miss penalty
 A write buffer allows the processor to continue without waiting for writes to complete
 When performing a store to a block that is already pending in the write buffer, update the existing buffer entry (see the sketch after the figure)
 Reduces stalls due to a full write buffer and improves buffer efficiency
 If the buffer is full, writes stall the processor

[Figure: write buffer entries shown without and with write buffering]


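A write-buffer merge sketch; the structure, depth, and names are assumptions for illustration:

```c
/* A store whose block address already sits in the buffer updates that
   entry in place (merging) instead of taking a new slot. */
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES      4         /* assumed buffer depth           */
#define WORDS_PER_ENTRY 4         /* assumed words per buffer entry */

typedef struct {
    bool     valid;
    uint64_t block_addr;
    uint64_t data[WORDS_PER_ENTRY];
    bool     word_valid[WORDS_PER_ENTRY];
} wb_entry_t;

static wb_entry_t wb[WB_ENTRIES];

/* Returns false if the buffer is full and no merge is possible:
   the processor must stall until an entry drains to memory. */
bool wb_store(uint64_t block_addr, unsigned word, uint64_t value) {
    int free_slot = -1;
    for (int i = 0; i < WB_ENTRIES; i++) {
        if (wb[i].valid && wb[i].block_addr == block_addr) {
            wb[i].data[word] = value;        /* merge into pending entry */
            wb[i].word_valid[word] = true;
            return true;
        }
        if (!wb[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;                        /* buffer full: stall */
    wb[free_slot].valid = true;              /* allocate a fresh entry */
    wb[free_slot].block_addr = block_addr;
    for (int w = 0; w < WORDS_PER_ENTRY; w++)
        wb[free_slot].word_valid[w] = false;
    wb[free_slot].data[word] = value;
    wb[free_slot].word_valid[word] = true;
    return true;
}
```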
Hardware Prefetching
 Prefetching reduces the miss rate and the miss penalty
 Prefetch items before the processor requests them
 Fetch more blocks on a miss, including the next sequential block
 The requested block goes into the I-cache; the next block goes into a stream buffer
 If a missed block is found in the stream buffer, the cache miss is cancelled (see the sketch below)
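
A next-block prefetch sketch with a one-entry stream buffer; this simplification and all names are assumptions:

```c
/* On a miss, fetch the missed block into the cache and prefetch
   block+1 into the stream buffer. A later miss that hits in the
   stream buffer is serviced from there, cancelling the cache miss. */
#include <stdbool.h>
#include <stdint.h>

static uint64_t stream_buf_addr;
static bool     stream_buf_valid;

void on_cache_miss(uint64_t block_addr) {
    if (stream_buf_valid && stream_buf_addr == block_addr) {
        /* miss cancelled: move the block from the stream buffer into
           the cache without touching the next memory level */
    } else {
        /* fetch block_addr from the next level into the cache */
    }
    stream_buf_addr  = block_addr + 1;  /* prefetch next sequential block */
    stream_buf_valid = true;
}
```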
Compiler Optimizations
 Compiler optimizations reduce the miss rate
 Loop Interchange:
Swap nested loops to access memory in sequential order
Maximize the use of data in the cache before it is discarded (see the sketch below)
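
A minimal loop-interchange sketch; N and the array name are assumed example values. C stores arrays in row-major order, so putting j in the inner loop walks memory with unit stride:

```c
#define N 1024
static double x[N][N];

/* Before interchange: the inner loop strides N doubles per access,
   touching a new cache block on almost every iteration. */
void scale_column_major(void) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            x[i][j] = 2.0 * x[i][j];
}

/* After interchange: unit-stride, sequential access; every word of a
   fetched block is used before the block is discarded. */
void scale_row_major(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 2.0 * x[i][j];
}
```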
Compiler Optimizations
 Blocking:
Instead of accessing entire rows or columns, subdivide the matrices into blocks
Requires more accesses but improves the locality of those accesses (see the sketch below)
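
A blocked matrix-multiply sketch; N, the blocking factor B, and the zero-initialized result matrix are illustrative assumptions, with B picked so that three B×B tiles fit in the cache:

```c
#define N 512
#define B 32   /* assumed blocking factor; N must be a multiple of B */

/* x += y * z, computed tile by tile: each B×B tile of z is reused for
   many rows of y before being evicted, instead of streaming whole
   rows/columns through the cache. x is assumed zero-initialized. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N]) {
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + B; j++) {
                    double r = 0.0;
                    for (int k = kk; k < kk + B; k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;   /* accumulate the tile's partial sum */
                }
}
```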
Compiler Controlled Pre-fetching
 Prefetching reduces the miss rate and the miss penalty
 Insert prefetch instructions before the data is needed
 Prefetching pays off only if the processor can read from the cache and keep executing while the prefetch is in progress
 Register prefetch:
Loads data into a register
 Cache prefetch:
Loads data into the cache
 Use loop unrolling and scheduling to prefetch data for adjacent iterations (see the sketch below)
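
A cache-prefetch sketch using GCC/Clang's __builtin_prefetch; the unrolling factor, the prefetch distance DIST, and the function itself are assumed for illustration:

```c
/* The loop is unrolled by 4 so a single prefetch covers data for
   several iterations ahead; DIST tunes how far ahead to fetch.
   Prefetching past the end of the array is harmless: the hint
   instruction never faults. n is assumed to be a multiple of 4. */
#define DIST 16   /* elements ahead to prefetch (assumed) */

double sum(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i += 4) {
        __builtin_prefetch(&a[i + DIST], 0, 1);  /* read, low reuse */
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    return s;
}
```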
[email protected]
http://www.iitg.ac.in/johnjose/
