
Microprocessor & Computer Architecture (Μpca) : Unit 4: Cache Memory

No, the value in R2 may not be equal to the value in R3, due to a read-after-write hazard in memory. With a write-through cache and a write buffer that is not checked on read misses: - The SW instruction writes R3 to address 512, placing the write in the write buffer. - The LW to address 1024 maps to the same cache block, so it misses and replaces that block. - The LW to address 512 therefore also misses and reads memory, which still holds the old value if the write buffer has not yet drained. So there is no guarantee that R2 will hold the new value written from R3 by the SW, because the read can reach memory before the buffered write does.

Uploaded by Pranathi Praveen
Copyright © All Rights Reserved

Microprocessor & Computer Architecture (μpCA)

Unit 4: Cache Memory


4th & 5th Optimizations

UE19CS252

Session: 4.2
MPCA - Fourth optimization:
Multilevel Caches to Reduce Miss Penalty

• Observing the cache performance formula,
– Avg. memory access time = Hit time + Miss rate x Miss penalty,
• Improvements in miss penalty are as beneficial as improvements in miss rate.
• The performance gap between processor & memory raises a question:
– Should the cache be made faster to keep pace with the speed of the processor? Or
– Should the cache be made larger to overcome the widening performance gap between the processor & main memory?
• The answer to these questions is to do both.
– Adding another level of cache between the original cache & memory simplifies the decision.
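A minimal Python sketch of the single-level formula, using hypothetical numbers (1-cycle hit, 5% miss rate, 100-cycle miss penalty) chosen only for illustration. Because miss rate and miss penalty enter the formula as a product, halving either one removes the same number of cycles from the average.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time in clock cycles:
    AMAT = Hit time + Miss rate x Miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical baseline values (not from the slides).
baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)        # 1 + 0.05 x 100 = 6 cycles
halved_penalty = amat(hit_time=1, miss_rate=0.05, miss_penalty=50)   # 1 + 0.05 x 50 = 3.5 cycles
halved_rate = amat(hit_time=1, miss_rate=0.025, miss_penalty=100)    # 1 + 0.025 x 100 = 3.5 cycles

print(f"baseline: {baseline:.2f}, half penalty: {halved_penalty:.2f}, half miss rate: {halved_rate:.2f}")
```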

• First level cache can be small enough to match the clock cycle time of the processor.
• The second level cache can be large enough to capture many accesses that would otherwise go to main memory, thus reducing the miss penalty.
• Multilevel caches complicate performance analysis.
• Consider the average memory access time for a two level cache, using subscripts L1 & L2 to refer to the first & second level, respectively.
• The original formula is:
AMAT = Hit TimeL1 + Miss RateL1 x Miss PenaltyL1
and
Miss PenaltyL1 = Hit TimeL2 + Miss RateL2 x Miss PenaltyL2

Substituting into the original equation, we get:

AMAT = Hit TimeL1 + Miss RateL1 x [Hit TimeL2 + Miss RateL2 x Miss PenaltyL2]

Here,
The second level miss rate (Miss RateL2) is measured on the leftovers from the first level cache, i.e., only on the accesses that missed in L1.
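The substituted formula can be checked with a short Python sketch. The parameters are hypothetical (L1: 1-cycle hit, 5% miss rate; L2: 10-cycle hit, 40% local miss rate; 200 cycles to main memory); the sketch compares an L1 whose misses go straight to memory with the same L1 backed by an L2.

```python
def miss_penalty_l1(hit_time_l2, miss_rate_l2, miss_penalty_l2):
    """Miss PenaltyL1 = Hit TimeL2 + Miss RateL2 x Miss PenaltyL2.
    miss_rate_l2 is the local L2 miss rate (misses / accesses that reach L2)."""
    return hit_time_l2 + miss_rate_l2 * miss_penalty_l2


def amat_two_level(hit_time_l1, miss_rate_l1, hit_time_l2, miss_rate_l2, miss_penalty_l2):
    """AMAT = Hit TimeL1 + Miss RateL1 x [Hit TimeL2 + Miss RateL2 x Miss PenaltyL2]."""
    return hit_time_l1 + miss_rate_l1 * miss_penalty_l1(hit_time_l2, miss_rate_l2, miss_penalty_l2)


# Hypothetical parameters, in clock cycles.
no_l2 = 1 + 0.05 * 200                             # L1 misses go straight to memory: 11 cycles
with_l2 = amat_two_level(1, 0.05, 10, 0.40, 200)   # 1 + 0.05 x (10 + 0.40 x 200) = 5.5 cycles
print(f"without L2: {no_l2:.1f} cycles, with L2: {with_l2:.1f} cycles")
```

The L2 does not change the L1 miss rate; it only shrinks the penalty each L1 miss pays, from 200 cycles to 10 + 0.40 x 200 = 90 cycles in this hypothetical case.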

To avoid ambiguity, the following terms are used for a two level cache system.
– Local miss rate
– Global miss rate
Local miss rate:
• The number of misses in a cache divided by the total number of memory accesses to this cache.
Ex: For the first level cache it is Miss RateL1.
For the second level cache it is Miss RateL2.
Global miss rate:
• The number of misses in a cache divided by the total number of memory accesses generated by the processor.
Ex: The global miss rate of the first level cache is still just Miss RateL1,
but for the second level cache it is Miss RateL1 x Miss RateL2.
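Both definitions can be computed directly from miss counts. A minimal Python sketch, assuming every L1 miss is sent to L2 (so the number of L2 accesses equals the number of L1 misses); the counts in the usage line are hypothetical.

```python
def miss_rates(processor_refs: int, l1_misses: int, l2_misses: int) -> dict:
    """Local and global miss rates for a two-level cache.
    Assumes every L1 miss is sent to L2, so L2 accesses == L1 misses."""
    l2_accesses = l1_misses
    return {
        "l1": l1_misses / processor_refs,          # local == global for L1
        "l2_local": l2_misses / l2_accesses,       # Miss RateL2
        "l2_global": l2_misses / processor_refs,   # fraction of all references that reach memory
    }


# Hypothetical counts: 2000 references, 100 L1 misses, 30 L2 misses.
rates = miss_rates(processor_refs=2000, l1_misses=100, l2_misses=30)
print(rates)
# The global L2 rate should equal Miss RateL1 x (local) Miss RateL2.
print(rates["l1"] * rates["l2_local"], rates["l2_global"])
```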

• The local miss rate is large for second level caches because
– The first level cache skims the cream of the memory accesses.
• The global miss rate is a more useful measure.
– It indicates what fraction of the memory accesses that leave the processor go all the way to memory.
• Here, the misses per instruction metric shines.
– Expanding the memory stalls per instruction to add the impact of a second level:

Avg. memory stalls per instruction = Misses per instructionL1 x Hit TimeL2 + Misses per instructionL2 x Miss PenaltyL2

Suppose that in 1000 memory references there are 40 misses in the first level cache and 20 misses in the second level cache. What are the various miss rates?
Assume the miss penalty from the L2 cache to memory is 200 clock cycles, the hit time of the L2 cache is 10 clock cycles, the hit time of the L1 cache is 1 clock cycle, and there are 1.5 memory references per instruction.
What are the average memory access time and the average stall cycles per instruction?
Ignore the impact of writes.
Answer
The miss rate (either global or local) for the first level cache is 40/1000 = 4%.
The local miss rate for the second-level cache is 20/40 = 50%.
The global miss rate of the second level cache is 20/1000 = 2%.
Then,
AMAT = Hit TimeL1 + Miss RateL1 x [Hit TimeL2 + Miss RateL2 x Miss PenaltyL2]
= 1+4% x ( 10 + 50% x 200 ) = 1 + 4% x 110 = 5.4 clock cycles.
# of instructions = # of memory references / # of memory references per instruction
= 1000 / 1.5 ≈ 667 instructions.
Since there are 1.5 memory references per instruction,
for the L1 cache, 40 misses per 1000 memory references = 40 x 1.5 = 60 misses per 1000 instructions,
and for the L2 cache, 20 misses per 1000 memory references = 20 x 1.5 = 30 misses per 1000 instructions.
Average memory stalls per instruction = Misses per instructionL1 x Hit TimeL2 + Misses per instructionL2 x Miss PenaltyL2
= (60/1000) x 10 + (30/1000) x 200
= 0.06 x 10 + 0.03 x 200 = 0.6 + 6
= 6.6 clock cycles.
Alternatively,
Average memory stalls per instruction = (AMAT - Hit TimeL1) x Average # of memory references per instruction
= (5.4 - 1.0) x 1.5 = 6.6 clock cycles.
Note: Both ways of computing the memory stalls per instruction give the same result.
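The whole worked example can be reproduced in a few lines of Python. All inputs are the values given in the problem; the script computes AMAT and then the memory stalls per instruction both ways, to confirm that they agree.

```python
# Inputs from the example (times in clock cycles).
mem_refs = 1000
l1_misses, l2_misses = 40, 20
hit_time_l1, hit_time_l2 = 1, 10
miss_penalty_l2 = 200
refs_per_instr = 1.5

miss_rate_l1 = l1_misses / mem_refs          # 4% (local == global for L1)
local_miss_rate_l2 = l2_misses / l1_misses   # 50%
global_miss_rate_l2 = l2_misses / mem_refs   # 2%

amat = hit_time_l1 + miss_rate_l1 * (hit_time_l2 + local_miss_rate_l2 * miss_penalty_l2)

# Way 1: misses per instruction = misses per reference x references per instruction.
misses_per_instr_l1 = miss_rate_l1 * refs_per_instr         # 0.06, i.e. 60 per 1000 instructions
misses_per_instr_l2 = global_miss_rate_l2 * refs_per_instr  # 0.03, i.e. 30 per 1000 instructions
stalls_way1 = misses_per_instr_l1 * hit_time_l2 + misses_per_instr_l2 * miss_penalty_l2

# Way 2: the stall part of AMAT times references per instruction.
stalls_way2 = (amat - hit_time_l1) * refs_per_instr

print(f"AMAT = {amat:.1f} cycles")
print(f"stalls/instr = {stalls_way1:.1f} (way 1), {stalls_way2:.1f} (way 2)")
```

Note that way 1 must use the global L2 miss rate: the L2 misses are counted per processor reference, not per L2 access.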

First perspective:

• The global miss rate is very similar to the miss rate the second level cache would have as a single (stand-alone) cache,
• Provided that the second level cache is much larger than the first level cache.

Second perspective:

• The local miss rate is not a good measure of secondary caches.
• It is a function of the miss rate of the first level cache.
• It can vary when the first level cache is changed.

Note: The global miss rate should be used when evaluating second level caches.
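A small Python illustration of why the local rate is misleading. It assumes, hypothetically, a large L2 whose misses stay at 20 per 1000 references regardless of the L1 (consistent with the first perspective), while two different L1 designs produce 40 and 80 misses per 1000 references. The L2 itself is unchanged, yet its local miss rate moves.

```python
def l2_rates(refs, l1_misses, l2_misses):
    """Return (local, global) miss rates of the L2 cache."""
    return l2_misses / l1_misses, l2_misses / refs


REFS = 1000
L2_MISSES = 20  # hypothetical: assumed independent of L1 because L2 is much larger

for l1_misses in (40, 80):  # two hypothetical L1 designs
    local, global_ = l2_rates(REFS, l1_misses, L2_MISSES)
    print(f"L1 misses={l1_misses}: local L2 miss rate={local:.0%}, global L2 miss rate={global_:.0%}")
```

The global rate stays at 2% in both cases, while the local rate drops from 50% to 25% only because the smaller L1 sends more (easy) accesses to L2.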

Parameters for second level caches:

• The key difference between the two levels: the speed of the first level cache affects the clock rate of the processor,
• While the speed of the second level cache only affects the miss penalty of the first level cache.
• Thus, many alternatives that would be ill chosen for the first level cache can be considered for the second level cache.

Two major questions for the design of the second level cache:

1. Will it lower the average memory access time?

2. How much does it cost?

• The initial decision is the size of the second level cache.

• Everything in the first level cache is likely to be in the second level cache,
• So the second level cache should be much larger than the first.
• If the second level cache is just a little bigger, the local miss rate will be high.
• This inspires the design of huge second level caches,
• Perhaps the size of main memory in older computers!

NOTE:
• Multilevel inclusion is the natural policy for memory hierarchies.
• Under inclusion, L1 data is always present in L2.
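One way to keep inclusion is back-invalidation: when L2 evicts a block, any copy of it in L1 is invalidated too. The Python sketch below models this with two fully associative LRU caches counted in blocks; the sizes (2 and 3 blocks), the LRU policy, and the trace are assumptions for illustration, not part of the slides.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of block addresses with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block address -> None, least recently used first

    def lookup(self, block):
        """Return True on a hit and mark the block as most recently used."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def insert(self, block):
        """Insert a block that is not present; return the evicted block, or None."""
        victim = None
        if len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[block] = None
        return victim

    def invalidate(self, block):
        self.blocks.pop(block, None)


class InclusiveHierarchy:
    """Two-level hierarchy that keeps every L1 block also present in L2."""

    def __init__(self, l1_blocks, l2_blocks):
        self.l1 = LRUCache(l1_blocks)
        self.l2 = LRUCache(l2_blocks)

    def access(self, block):
        if self.l1.lookup(block):
            return "L1 hit"
        if self.l2.lookup(block):
            result = "L2 hit"
        else:
            result = "memory"
            victim = self.l2.insert(block)
            if victim is not None:
                self.l1.invalidate(victim)  # back-invalidation preserves inclusion
        self.l1.insert(block)  # an L1 victim is still in L2, so inclusion holds
        return result

    def inclusion_holds(self):
        return set(self.l1.blocks) <= set(self.l2.blocks)


h = InclusiveHierarchy(l1_blocks=2, l2_blocks=3)
for b in [0, 1, 0, 2, 3, 0]:
    print(b, h.access(b), "inclusion holds:", h.inclusion_holds())
```

In this trace the third access hits block 0 in L1, but L2 never sees that hit, so block 0 stays least recently used in L2. When block 3 arrives, L2 evicts block 0 and back-invalidation removes it from L1 as well, so the final access to block 0 goes to memory.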

The essence of all cache designs is:

• Balancing fast hits and few misses.

• For second level caches,
• There are many fewer hits than in the first level cache,
• So the emphasis shifts to fewer misses.

• This insight leads to much larger caches and to techniques that lower the miss rate,
• Such as higher associativity & larger blocks.
MPCA - Fifth optimization:
Prioritizing Read Misses over Writes

• This optimization serves reads before earlier writes have been completed.

• Complexities of a write buffer with a write-through cache:
• The most important improvement is a write buffer of the proper size.
• However, write buffers complicate memory accesses, because the buffer may hold the updated value of a location needed on a read miss.
To resolve this ambiguity, either
• Make the read miss wait until the write buffer is empty,
or
• Check the contents of the write buffer on a read miss, and if there are no conflicts and the memory system is available,
• Let the read miss continue.
• Virtually all processors use the latter approach.
• It gives reads priority over writes.
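A minimal Python sketch of the two options for a read miss in a write-through cache with a write buffer. Memory is a dict and the buffer a FIFO of pending (address, value) writes. As a simplifying assumption, the buffer drains only when forced (a real buffer also drains in the background), and on a conflict the checking policy simply drains the buffer; the trace and addresses are hypothetical.

```python
from collections import deque


class WriteThroughMemory:
    """Main memory behind a FIFO write buffer (simplified: drains only when forced)."""

    def __init__(self):
        self.memory = {}             # address -> value already written to memory
        self.write_buffer = deque()  # pending (address, value) writes, oldest first
        self.reads_delayed = 0       # read misses that had to wait for a drain

    def write(self, addr, value):
        # Write-through: the store goes to the buffer and the processor continues.
        self.write_buffer.append((addr, value))

    def drain(self):
        while self.write_buffer:
            addr, value = self.write_buffer.popleft()
            self.memory[addr] = value

    def read_miss_wait(self, addr):
        """Option 1: the read miss waits until the write buffer is empty."""
        if self.write_buffer:
            self.reads_delayed += 1
            self.drain()
        return self.memory.get(addr, 0)

    def read_miss_check(self, addr):
        """Option 2: check the buffer; wait only if it holds a write to the same address."""
        if any(a == addr for a, _ in self.write_buffer):
            self.reads_delayed += 1
            self.drain()
        # No conflict: the read is served ahead of the buffered writes.
        return self.memory.get(addr, 0)


def run(policy):
    """Hypothetical trace: stores to 512 and 516 interleaved with read misses."""
    m = WriteThroughMemory()
    m.memory.update({1024: 7, 2048: 9})
    read_miss = getattr(m, policy)
    values = []
    m.write(512, 42)
    values.append(read_miss(1024))
    m.write(516, 43)
    values.append(read_miss(2048))
    values.append(read_miss(512))
    return values, m.reads_delayed


for policy in ("read_miss_wait", "read_miss_check"):
    print(policy, run(policy))
```

Both policies return the correct values (7, 9, 42), but in this trace waiting for an empty buffer delays two read misses, while checking the buffer delays only the read of 512, the one that actually conflicts.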

• Write-back cache:
• The cost of writes by the processor in a write-back cache can also be reduced.
• Consider a read miss replacing a dirty block.
• Instead of writing the dirty block to memory and then reading memory, we could copy the dirty block to a buffer, then read memory, and then write memory. This way the read, for which the processor is probably waiting, finishes sooner.
• As with the write-through case, if a read miss occurs, the processor can either stall until the buffer is empty, or
• Check the addresses of the words in the buffer for conflicts.
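A rough timing sketch in Python of the write-back case. The latencies (100 cycles to read a block, 100 to write one) are hypothetical, overlap with other work is ignored, and on a conflict the sketch chooses to stall until the buffer is empty; it only counts how long the processor waits for the missed block.

```python
MEM_READ = 100   # hypothetical cycles to read one block from memory
MEM_WRITE = 100  # hypothetical cycles to write one block to memory


def read_miss_stall(miss_block, victim_block, victim_dirty, pending, use_buffer=True):
    """Cycles the processor waits for miss_block on a read miss (simplified model).

    pending: list of block addresses waiting in the write-back buffer (modified in place).
    """
    if not use_buffer:
        # Write the dirty victim to memory first, then read the missed block.
        return (MEM_WRITE if victim_dirty else 0) + MEM_READ
    stall = 0
    if miss_block in pending:
        # Conflict: memory is stale for this block, so stall until the buffer is empty.
        stall += MEM_WRITE * len(pending)
        pending.clear()
    if victim_dirty:
        # Copy the dirty victim into the buffer; it is written to memory after the read.
        pending.append(victim_block)
    return stall + MEM_READ


buffer = []
print(read_miss_stall(0x40, 0x80, True, [], use_buffer=False))  # write back, then read
print(read_miss_stall(0x40, 0x80, True, buffer))                 # read first, victim buffered
print(read_miss_stall(0x80, 0xC0, True, buffer))                 # conflict with buffered 0x80
```

With these hypothetical latencies the buffer cuts the stall from 200 to 100 cycles; the third call shows that a read miss to a block still sitting in the buffer brings the 200-cycle stall back.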

Consider the following code sequence.


Ex: SW R3, 512(R0)
LW R1, 1024(R0)
LW R2, 512(R0)
Assume a direct-mapped, write-through cache that maps 512 and 1024 to the same block, and a four-word write buffer that is not checked on a read miss. Will the value in R2 always be equal to the value in R3?
Ans:
• This is a read-after-write data hazard in memory.
• The data in R3 are placed into the write buffer after the SW.
• The following LW instruction uses the same cache index but a different tag, so it is a miss and replaces that block.
• The second LW instruction tries to put the value in location 512 into register R2.
• This also results in a miss, since the block for 512 was just replaced.
• If the write buffer hasn't completed writing to location 512 in memory,
• The read of location 512 will put the old, wrong value into the cache block and then into R2.
• Without proper precautions, R3 would not be equal to R2!
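The scenario can be replayed with a small Python model. The cache geometry (4-byte blocks, 128 sets, so 512 and 1024 both map to index 0), the old memory contents, and the value 99 in R3 are hypothetical; the buffer's four-word capacity is not modeled since only one store occurs, and the buffer is assumed not to drain on its own during the three instructions. With checking enabled, a conflicting read miss drains the buffer first.

```python
from collections import deque

BLOCK_SIZE = 4   # hypothetical: one-word (4-byte) blocks
NUM_SETS = 128   # hypothetical: 512-byte cache, so 512 and 1024 both map to index 0


class Machine:
    """Direct-mapped, write-through cache with a write buffer (simplified model)."""

    def __init__(self, check_buffer_on_miss):
        self.check_buffer = check_buffer_on_miss
        self.memory = {512: 0, 1024: 0}               # hypothetical old memory contents
        self.cache = {}                               # index -> (tag, value)
        self.write_buffer = deque()                   # pending (addr, value); no background drain
        self.regs = {"R1": 0, "R2": 0, "R3": 99}      # hypothetical R3 value

    @staticmethod
    def split(addr):
        block = addr // BLOCK_SIZE
        return block % NUM_SETS, block // NUM_SETS   # (index, tag)

    def drain(self):
        while self.write_buffer:
            addr, value = self.write_buffer.popleft()
            self.memory[addr] = value

    def sw(self, src, addr):
        index, tag = self.split(addr)
        if index in self.cache and self.cache[index][0] == tag:
            self.cache[index] = (tag, self.regs[src])     # write hit also updates the cache
        self.write_buffer.append((addr, self.regs[src]))  # write-through via the buffer

    def lw(self, dst, addr):
        index, tag = self.split(addr)
        entry = self.cache.get(index)
        if entry is None or entry[0] != tag:  # read miss
            if self.check_buffer and any(a == addr for a, _ in self.write_buffer):
                self.drain()                  # conflict: let the buffered write reach memory first
            entry = (tag, self.memory[addr])
            self.cache[index] = entry         # replaces the block previously at this index
        self.regs[dst] = entry[1]


def run(check_buffer_on_miss):
    m = Machine(check_buffer_on_miss)
    m.sw("R3", 512)    # SW R3, 512(R0)
    m.lw("R1", 1024)   # LW R1, 1024(R0): same index, different tag -> miss
    m.lw("R2", 512)    # LW R2, 512(R0): block was replaced -> miss
    return m.regs["R2"], m.regs["R3"]


for check in (False, True):
    r2, r3 = run(check)
    print(f"write buffer checked on read miss: {check}, R2 = {r2}, R3 = {r3}")
```

Without the check, R2 receives the stale value 0 from memory while R3 holds 99; with the check, the conflicting write is drained first and R2 equals R3.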
THANK YOU

Team MPCA
Department of Computer Science and Engineering
