Microprocessor & Computer Architecture (μpCA): Unit 4: Cache Memory
UE19CS252
Session : 4.2
MPCA - Fourth optimization:
Multilevel Caches to Reduce Miss Penalty
• First level cache can be small enough to match the clock cycle time of the processor.
• Second level cache can be large enough to capture many accesses that would otherwise go
to main memory, thus reducing the effective miss penalty.
• Multilevel caches complicate performance analysis.
• For a two level cache, the subscripts L1 & L2 refer to the first & second level,
respectively.
• The original formula is:
AMAT = Hit TimeL1 + Miss RateL1 x Miss PenaltyL1
and
Miss PenaltyL1 = Hit TimeL2 + Miss RateL2 x Miss PenaltyL2
AMAT = Hit TimeL1 + Miss RateL1 x [Hit TimeL2 + Miss RateL2 x Miss PenaltyL2]
Here,
Miss RateL2 is measured only on the accesses left over from the first level cache, that is, on the L1 misses.
To avoid ambiguity, the following terms are used for a two level cache system.
– Local miss rate
– Global miss rate
Local miss rate:
• The number of misses in the cache divided by the total number of
memory accesses to this cache.
Ex: For first level cache it is, Miss RateL1
For second level cache it is, Miss RateL2
Global miss rate:
• The number of misses in the cache divided by the total number of memory
accesses generated by the processor.
Ex: Global miss rate for level1 cache is still Miss RateL1
but, for level2 cache it is : Miss RateL1 x Miss RateL2
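The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `miss_rates`, its parameters, and the sample counts are hypothetical and do not come from the slides.

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses):
    """Return local and global miss rates for a two level cache.

    cpu_accesses: memory accesses generated by the processor
    l1_misses:    accesses that miss in L1 (these are the accesses L2 sees)
    l2_misses:    accesses that miss in L2 as well
    """
    l1_rate = l1_misses / cpu_accesses     # L1 local rate == L1 global rate
    l2_local = l2_misses / l1_misses       # denominator: accesses reaching L2
    l2_global = l2_misses / cpu_accesses   # denominator: all processor accesses
    return {"l1": l1_rate, "l2_local": l2_local, "l2_global": l2_global}


# Hypothetical counts, chosen only to illustrate the relationship.
rates = miss_rates(cpu_accesses=2000, l1_misses=100, l2_misses=25)
```

With these counts the L2 global miss rate equals Miss RateL1 x Miss RateL2, matching the definition above.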
Avg. memory stalls per instruction = Misses per instructionL1 x Hit TimeL2 + Misses per instructionL2 x Miss PenaltyL2
Suppose that in 1000 memory references there are 40 misses in the first level cache and 20 misses in the
second level cache. What are the various miss rates?
Assume the miss penalty from the L2 cache to memory is 200 clock cycles, the hit time of the L2 cache is 10
clock cycles, hit time for L1 cache is 1 clock cycle and there are 1.5 memory references per instruction.
What is the average memory access time and average stall cycles per instruction?
Ignore impact of writes.
Answer
The miss rate (either local or global) for the first level cache is 40/1000 = 4%.
The local miss rate for the second-level cache is 20/40 = 50%.
The global miss rate of the second level cache is 20/1000 = 2%.
Then,
AMAT = Hit TimeL1 + Miss RateL1 x [Hit TimeL2 + Miss RateL2 x Miss PenaltyL2]
= 1+4% x ( 10 + 50% x 200 ) = 1 + 4% x 110 = 5.4 clock cycles.
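As a quick check of the arithmetic, the two level AMAT formula can be written as a small Python function. The name `amat_two_level` and its parameter names are illustrative assumptions; the numbers are the ones given in the problem statement.

```python
def amat_two_level(hit_time_l1, miss_rate_l1, hit_time_l2,
                   local_miss_rate_l2, miss_penalty_l2):
    """Average memory access time (clock cycles) for a two level cache."""
    # Miss PenaltyL1 = Hit TimeL2 + Miss RateL2 x Miss PenaltyL2
    miss_penalty_l1 = hit_time_l2 + local_miss_rate_l2 * miss_penalty_l2
    # AMAT = Hit TimeL1 + Miss RateL1 x Miss PenaltyL1
    return hit_time_l1 + miss_rate_l1 * miss_penalty_l1


amat = amat_two_level(
    hit_time_l1=1,
    miss_rate_l1=40 / 1000,       # 4%
    hit_time_l2=10,
    local_miss_rate_l2=20 / 40,   # 50%, the local (not global) L2 miss rate
    miss_penalty_l2=200,
)
print(round(amat, 2))  # 5.4
```

Note that the formula takes the local L2 miss rate; passing the global rate (2%) here would undercount the L2 misses.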
# of instructions = # of memory references / # of memory references per instruction
= 1000 / 1.5 ≈ 667 instructions.
Thus, expressed per 1000 instructions:
L1 misses = 40 misses per 1000 memory references x 1.5 = 60 misses per 1000 instructions
L2 misses = 20 misses per 1000 memory references x 1.5 = 30 misses per 1000 instructions.
Average memory stalls per instruction = Misses per instructionL1 x Hit TimeL2 + Misses per instructionL2 x Miss PenaltyL2
= (60/1000) x 10 + (30/1000) x 200
= 0.06 x 10 + 0.03 x 200 = 0.6 + 6.0
= 6.6 clock cycles.
Alternatively,
Average memory stalls per instruction = (AMAT - Hit TimeL1) x Average # of memory references per instruction
= (5.4 - 1.0) x 1.5 = 6.6 clock cycles.
Note: Both methods give the same memory stalls per instruction.
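The agreement between the two methods can be verified with a short Python sketch. The variable names are illustrative assumptions; the values are those of the worked example.

```python
REFS_PER_INSTR = 1.5
HIT_TIME_L1, HIT_TIME_L2, MISS_PENALTY_L2 = 1, 10, 200

# Method 1: convert misses per memory reference to misses per instruction.
l1_misses_per_instr = (40 / 1000) * REFS_PER_INSTR   # 0.06
l2_misses_per_instr = (20 / 1000) * REFS_PER_INSTR   # 0.03
stalls_from_misses = (l1_misses_per_instr * HIT_TIME_L2
                      + l2_misses_per_instr * MISS_PENALTY_L2)

# Method 2: subtract the L1 hit time from AMAT, then scale by references
# per instruction. The local L2 miss rate is 20/40.
amat = HIT_TIME_L1 + (40 / 1000) * (HIT_TIME_L2 + (20 / 40) * MISS_PENALTY_L2)
stalls_from_amat = (amat - HIT_TIME_L1) * REFS_PER_INSTR

print(round(stalls_from_misses, 2), round(stalls_from_amat, 2))  # 6.6 6.6
```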
First perspective :
Second Perspective:
Note: Global cache miss rate should be used when evaluating second level caches.
Two major questions for the design of the second level cache:
NOTE:
• Multi level inclusion is the natural policy for memory hierarchies.
• L1 data is always present in L2.
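The inclusion property can be pictured with a toy Python model. This is a sketch under stated assumptions, not a description of any real cache: the class `InclusiveHierarchy` and its methods are hypothetical, caches are modelled as plain sets of block addresses, and removing a block from L1 whenever L2 evicts it (back-invalidation) is assumed here as one common way to preserve inclusion.

```python
class InclusiveHierarchy:
    """Toy two level hierarchy that keeps every L1 block present in L2."""

    def __init__(self):
        self.l1 = set()   # block addresses currently held in L1
        self.l2 = set()   # block addresses currently held in L2

    def fill(self, block):
        # A block brought in from memory is placed in both levels.
        self.l2.add(block)
        self.l1.add(block)

    def evict_l1(self, block):
        # Evicting from L1 alone never violates inclusion.
        self.l1.discard(block)

    def evict_l2(self, block):
        # Assumption: L2 eviction also invalidates the L1 copy,
        # so that L1 remains a subset of L2.
        self.l2.discard(block)
        self.l1.discard(block)

    def inclusion_holds(self):
        return self.l1 <= self.l2   # subset test


h = InclusiveHierarchy()
h.fill(0x40)
h.fill(0x80)
h.evict_l2(0x40)   # removes 0x40 from L1 as well
```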
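• The second level cache sees far fewer hits than the first level cache, so its design emphasis shifts from fast hits to fewer misses.
• This insight leads to much larger caches and to techniques that lower the miss rate, such as higher associativity & larger blocks.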
MPCA - Fifth optimization:
Prioritizes reads over writes
• Write-back cache:
• The cost of writes by the processor in a write-back cache can also be reduced.
• Consider a read miss replacing a dirty block.
• Instead of writing the dirty block to memory and then reading memory, we could
copy the dirty block to a buffer, then read memory, and then write memory. The read, for which
the processor is probably waiting, will then finish sooner.
• Thus, if a read miss occurs, the processor can either stall until the buffer is empty or
check the addresses of the words in the buffer for conflicts.
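The second option, checking the buffer for conflicts, can be sketched in Python as follows. This is a simplified model under stated assumptions: the names `WriteBuffer` and `read_miss` are hypothetical, memory is modelled as a dictionary from block address to data, and a read that matches a buffered address is assumed to be served from the buffer.

```python
class WriteBuffer:
    """Holds dirty blocks that are waiting to be written to memory."""

    def __init__(self):
        self.pending = {}   # block address -> data not yet in memory

    def add(self, addr, data):
        self.pending[addr] = data

    def drain_one(self, memory):
        # Write the oldest buffered block back to memory, if any.
        if self.pending:
            addr = next(iter(self.pending))
            memory[addr] = self.pending.pop(addr)


def read_miss(addr, memory, write_buffer):
    """Service a read miss without waiting for the buffer to empty."""
    if addr in write_buffer.pending:
        # Conflict: memory is stale, the newest data is still buffered.
        return write_buffer.pending[addr]
    # No conflict: the read goes ahead of the pending writes.
    return memory[addr]


memory = {0x100: "old", 0x200: "other"}
buf = WriteBuffer()
buf.add(0x100, "new")                         # dirty block parked in buffer
value_conflict = read_miss(0x100, memory, buf)
value_clean = read_miss(0x200, memory, buf)
buf.drain_one(memory)                         # write completes later
```

Without the address check, the read of `0x100` would return the stale value from memory; the alternative named on the slide is to stall until the buffer has drained.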
Team MPCA
Department of Computer Science and
Engineering