Test 6 PracticeQuestion Cachememory 1 Updated
Test 6 PracticeQuestion Cachememory 1 Updated
Programmers want memory to be fast, large, and cheap. Question may be asked whether we need a very
fast and very large memory, since programs access a small proportion of their address space at any time!
Architects have found that they can address these conflicting demands with a hierarchy of memories, with
the fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest, largest,
and cheapest per bit at the bottom. To optimize cost and speed, combination of memory devices are
used. Smaller amount of expensive but fast memory is used close to the processor whereas large
amounts of cheaper but slower memory is used farther from the processor. Hierarchy of memories
give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy and nearly
as big and cheap as the bottom of the hierarchy. Due to hierarchy, to CPU, it would appear as fast as
most expensive memory and as big as the cheapest.
The fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest,
largest, and cheapest per bit at the bottom.
2. What is principle of locality? What is locality of reference? Define temporal locality and spatial locality.
Programs access a small proportion of their address space at any time
Temporal locality
Items accessed recently are likely to be accessed again soon
instructions in a loop
Spatial locality
Items near those accessed recently are likely to be accessed soon
sequential instruction access, array data
3. Cache Memory reduces frequency of access to RAM while CPU is running a program
Reduces average memory access time of a program
Reduces program execution time
Caches give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy
and nearly as big and cheap as the bottom of the hierarchy.
Program (Instructions and data) are loaded into RAM. Cache memory is usually much smaller than the size of
RAM. So only a fraction of information (Instruction and Data) of RAM can be copied/transferred/stored into
cache memory. In a computer having cache memory, CPU is designed to search next instruction or data it
requires in cache. If it is found, it is called cache hit and the CPU will read instruction/data from cache. If the
instruction or data the CPU is searching for is not found in cache, it is called cache miss. In the event of cache
miss, the CPU will access RAM and a number (block) of instructions/data including the one the CPU is
searching for will be copied/transferred from RAM to cache memory following the spatial locality of
reference. The size of the block is only few bytes, i.e., 4B/8B/16B/32B so on. The block transfer from RAM to
cache in the event of cache miss increases the probability of reading following instructions from cache
instead of reading from RAM.
5. Define Hit rate, Hit time, Miss rate, miss penalty, average memory access time, and memory stall cycles.
• Hit Time (HT): The hit time is how long it takes data to be sent from the cache to the processor.
This is usually fast, on the order of 1-3 clock cycles.
• Miss Penalty (MP): The miss penalty is the time to copy data from main memory to the cache.
This often requires dozens of clock cycles (at least). The miss rate is the percentage of misses.
• Miss Rate (MR): 1 – Hit Ratio
• The average memory access time, or AMAT, can then be computed.
AMAT = Hit time + (Miss rate × Miss penalty)
This is just averaging the amount of time for cache hits and the amount of time for cache misses
6. What is cache miss? What is the consequence of cache miss? Explain.
When CPU is searching any instruction or data in cache and if it is not found, called cache miss. In the
event of cache miss, the CPU will access RAM and a number (block) of instructions/data including the one the
CPU is searching for will be copied/transferred from RAM to cache memory following the spatial locality of
reference. The size of the block is only few bytes, i.e., 4B/8B/16B/32B so on. The block transfer from RAM to
cache in the event of cache miss increases the probability of reading following instructions from cache instead of
reading from RAM.
In the event of cache miss, a new block is always copied/transferred from RAM to cache. If the cache is full and
there is a cache miss, a new block will be copied/transferred from RAM to cache by replacing a block transferred
earlier.
7. What are the performance measures of cache memory?
• Hit Time (HT): The hit time is how long it takes data to be sent from the cache to the processor.
This is usually fast, on the order of 1-3 clock cycles.
• Miss Penalty (MP): The miss penalty is the time to copy data from main memory to the cache.
This often requires dozens of clock cycles (at least). The miss rate is the percentage of misses.
• Miss Rate (MR): 1 – Hit Ratio
The average memory access time, or AMAT, can then be computed.
AMAT = Hit time + (Miss rate × Miss penalty)
This is just averaging the amount of time for cache hits and the amount of time for cache misses
8. What is the justification of block transfer from RAM to cache in the event of cache miss? Explain
Instructions in users programs usually follow principle of spatial locality. It means that instruction from
nearby RAM locations is likely to be read next. So in the event of cache miss, a block of instruction,
including the one the CPU is searching for is copied from RAM to cache so that the CPU can read
following few instructions from cache instead of accessing RAM again.
9. Parameters that matter: RAM
10. If a memory system consists of a single external cache with an access time of 20 ns and a hit rate of
0.92, and a main memory with an access time of 60 ns, what is the average memory access time
(AMAT) of this system?
AMAT = Hit time + Miss rate × Miss penalty
Example
CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, cache miss rate = 5%
13. Processor clock cycle: 200 ns, Miss Penalty of 50 clock cycles, Miss Ratio of 0.02 misses/instruction,
and Hit time of 1 clock cycle
14. A machine has a base CPI of 2 clock cycles. Measurements obtained show that the instruction miss rate
is 12% and the data miss rate is 6%, and that on average, 30% of all instructions contain one data
reference. The miss penalty for the cache is 10 cycles. What is the total CPI?
Solution:
Please note that Base CPI means the clock cycles required to read from cache memory. You can also say
that this is hit time in CPU clock cycles, i.e., access time of Cache memory in CPU clock cycles.
Average CPI = 2.0 + instruction miss cycles + data miss cycles
= 2.0 + 0.12 x 10 + 0.30 x 0.06 x 10 = 2.0 + 1.2 + 0.18
= 3.38
15. Memory Stall Cycles = Number of Memory Accesses × Miss rate × Miss penalty
16. Machine has a base CPI of 1 clock cycles. Measurements obtained show that the instruction miss rate is
15% and the data miss rate is 6%, and that on average, 40% of all instructions contain one data
reference. The miss penalty for the cache is 20 cycles. What is the total CPI?
Solution:
Average CPI = Base CPI + instruction miss cycles + data miss cycles
I-cache miss rate = 15%
D-cache miss rate = 6%
Miss penalty = 20 cycles
Base CPI (ideal cache) = 1
Load & stores are 40% of instructions
Miss cycles per instruction
I-cache: 1 x 0.15 × 20 = 3
D-cache: 1x 0.40 × 0.06 × 20 = 0.48
Actual CPI = 1 + 3 + 0.48 = 4.48
17. Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache miss rate is 3% while our Data cache
miss rate is 10%. 40% of our instructions are loads or stores.
a. What is our processor's CPI stall?
Average CPI = CPI base + L1 inst miss cycles + L1 data miss cycles
= 2 + 1 ×0.03 ×100 + 0.4×0.1×100
= 2 +0.07 × 100 = 9 cycles
To improve the performance our processor, we add a unified L2 cache between the L1 caches and
memory. Our L2 cache has a hit time of 10 cycles and a global miss rate of 2%.
b. What is our new Average CPI?
CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles + L2 inst miss cycles + L2 data miss
cycles
= 2 + (1 ×0.03 × 10) + (0.4 ×0.1 × 10) + (1×0.02 × 100) + (0.4 ×0.02 × 100)
= 2 + 0.3 + 0.4 + 2 + 0.8
= 2 + 3.5 = 5.5 cycles
18. Assume
1. I-cache miss rate 3%.
2. D-cache miss rate 5%.
3. 40% of instructions reference data.
4. Miss penalty of 50 cycles.
5. Base CPI is 2.
What is the CPI including the misses? Average CPI
How much slower is the machine when misses are taken into account? Average CPI/base CPI
Redo the above if the I-miss penalty is reduced to 10 (D-miss still 50)
With I-miss penalty back to 50, what is performance if CPU (and the caches) are 100 times faster
19. Assume
o 5% I-cache misses.
o 10% D-cache misses.
o 1/3 of the instructions access data.
What is the CPI if the miss penalty is 12?
What is the CPI if miss penalty is 24 clock)?
20. Suppose our processor has separate L1 instruction cache and data cache. Our CPIbase is 2 clock cycles,
whereas memory accesses take 100 cycles. Our I-cache miss rate is 3% while our D-cache miss rate is
10%. 40% of our instructions are loads or stores.
21. Assume a memory access to main memory on a cache "miss" takes 30 ns and a cache "hit" takes 3 ns. If
80% of the processor's memory requests result in a cache "hit", what is the average memory access time?
Solution:
(0.8 × 3 ns) + (0.2 × 33 ns) = 2.4 ns + 6.6 ns
= 9 ns
25. Suppose our processor has separate L1 instruction cache and data cache. Our CPI base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our instruction miss rate is 3% while our data miss rate is
10%. 40% of our instructions are loads or stores.
Question 2:
You want your AMAT to be <=2 cycles. You have two levels of cache.
L1 Hit Time is 1 cycle; L1 miss rate is 10%;
L2 Hit Time is 3 cycles; L2 Miss Penalty is 100 cycles
What must you optimize your L2 miss rate to be?
2 = 1 + 0.1x ( 3 +100x)
x = 0.07 = 7% miss rate
26. Assume that 33% of the instructions in a program are data accesses. The cache hit ratio is 97% and the
hit time is one cycle, but the miss penalty is 20 cycles.
AMAT = Hit time + (Miss rate x Miss penalty)
miss rate = 1%
CPU time this machine = IC x (1.0 + 1.50 x 0.01 x 50) x clock cycle time
28. Assume that 33% of the instructions in a program are data accesses. The cache hit ratio is 97% and the
hit time is one cycle, but the miss penalty is 20 cycles.
Solution:
Average Memory Access Time (AMAT) = Hit time + (Miss rate x Miss penalty)
29. Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache hit rate is 97% while our Data cache
hit rate is 90%. 40% of our instructions are loads or stores.
[Stalled means the processor was not making forward progress with instructions, and usually happens
because it is waiting on memory I/O.]
Alternate statement:
Suppose our processor has separate L1 instruction cache and data cache. Our CPI-base is 2 clock cycles,
whereas memory accesses take 100 cycles. Our Instruction cache miss rate is 3% while our Data cache
miss rate is 10%. 40% of our instructions are loads or stores.
CPI stall = CPI base + L1 inst miss cycles + L1 data miss cycles
= 2 + 1 ×0.03 ×100 + 0.4×0.1×100
= 2 +0.07 × 100 = 9 cycles
30.
You want your Average Memory Access Time (AMAT) to be <=2 cycles. You have two levels of cache.
L1 Hit Time is 1 cycle; L1 miss rate is 10%;
L2 Hit Time is 3 cycles; L2 Miss Penalty is 100 cycles
What must you optimize your L2 miss rate to be?
2 = 1 +0.1× (3 + x ×100)
x = 0.07 = 7% miss rate
31. Assume
CPI = 1.0 when cache has 100% hit rate
Miss penalty = 50 clock cycles
Miss rate = 1%
How much faster is an ideal machine?
32. A machine has a base CPI of 2 clock cycles. Measurements obtained show that the instruction miss
rate is 12% and the data miss rate is 6%, and that on average, 30% of all instructions contain one data
reference. The miss penalty for the cache is 10 cycles. What is the total CPI?
34. Assume
35. Assume
5% I-cache misses.
10% D-cache misses.
1/3 of the instructions access data.
The CPI = 4 if the miss penalty is 0. A 0miss penalty is not realistic of course.
What is the CPI if the miss penalty is 12?
What is the CPI if we upgrade to a double speed cpu+cache, but keep a single speed memory (i.e., a 24
clock miss penalty)?
How much faster is the double speed machine? It would be double speed if the miss penalty were 0 or if
there was a 0% miss rate.
----
Problem:
HT(L1) = 2ns ;
HT(L2) = 10ns ;
Miss Ratio(L1) = 6%,
Miss Ratio (L2) = 2%
RAM Access time = 100ns.
Calculate average memory access time.
If Average Memory Access Time = 3.358 ns, calculate what percent of instruction CPU reads from L1
cache memory?
Problem:
HT(L1) = 2ns ; Hit Ratio(L1) = 70%
HT(L2) = 10ns ; Hit Ratio(L1) = 80%
HT(L3) = 20ns ;
Global Miss Rate = 4% and RAM Access time = 100ns
Calculate: Average Memory Access Time
Assume:
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 4%
L2 time of 10 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz
Which of the following implementation will be better?
a. A faster L2 of 6 clock cycles
b. A larger L2 of miss rate 3%
Average CPI: base CPI + L-1 Instruction miss cycles + L-1 Data Miss cycles + L-2 instruction miss cycles + L-2
Data Miss Cycles
L-1 Instruction miss cycles = Instruction count x Instruction MR x L-1 Miss Penalty = 1 x 0.03 x 10
L-1 data miss cycles = Data access count x data MR x L-1 Miss Penalty = (1 x0.3)x 0.08 x 10
L-2 Instruction miss cycles = Instruction count x Instruction MR x L-2 Miss Penalty = 1 x 0.04 x 90
L-2 data miss cycles = data access count x data MR x L-2 Miss Penalty = (1x0.3) x 0.04 x 90
Average CPI =
Case-a
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 4%
L2 time of 6 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz
Average CPI=
Case-b
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
30% of instructions reference data L2 miss rate 3%
L2 time of 10 clock cycles Memory access time 90 clock cycles
Base CPI of 2.5 CPU Clock rate 3GHz
Average CPI =
Assume:
L1 I-cache miss rate 3% L1 D-cache miss rate 8%
40% of instructions reference data L2 miss rate 4%
L2 time of 10 clock cycles Memory access time 80 clock cycles
Base CPI of 2 CPU Clock rate 4GHz
Which of the following implementation will be better?
a. A faster L2 of 5 clock cycles
b. A larger L2 of miss rate 2%
Assume
• L1 I-cache miss rate 8%
• L1 D-cache miss rate 10%
• Load Instructions: 15% of total instruction count
• Store Instructions: 25% of total instruction count
• Data ref = 40%
• L2 miss rate 6%
• L2 time of 15ns = 15ns / 0.25ns = 60 cycles
• Memory access time 100ns = 100ns/0.25ns = 400 cycles
• Base CPI of 2 = 0.25 ns x 2 = 0.5 ns
• Clock rate 4GHz
CPU clock period = 1/4x109 = 0.25 x 10-9 sec = 0.25 ns
a) How many instructions per second does this machine execute?
1/ 20 x 10-9 =
b) How many instructions per second would this machine execute if the L2 cache were eliminated?
Average CPI
No of instructions per second does this machine execute = 4GHz /average CPI
c) How many instructions per second would this machine execute if both caches were eliminated?
Average CPI
No of instructions per second does this machine execute = 4GHz /average CPI
d) How many instructions per second would this machine execute if the L2 cache had a 0% miss
rate (L1 as originally specified)?
e) How many instructions per second would this machine execute if both L1 caches had a 0% miss
rate?
Assume
• L1 I-cache miss rate 5%
• L1 D-cache miss rate 8%
• Load Instructions: 25% of total instruction count
• Store Instructions: 10% of total instruction count
• L2 miss rate 4%
• L2 time of 15ns
• Memory access time 90ns
• Base CPI of 2
• Clock rate 3GHz
f) How many instructions per second does this machine execute?
g) How many instructions per second would this machine execute if the L2 cache were eliminated?
h) How many instructions per second would this machine execute if both caches were eliminated?
i) How many instructions per second would this machine execute if the L2 cache had a 0% miss
rate (L1 as originally specified)?
j) How many instructions per second would this machine execute if both L1 caches had a 0% miss
rate?