This document contains six tutorial problems on cache memory optimizations. The problems cover calculating the block size from a memory address and its set index, determining when trading a lower miss rate for a higher hit latency improves average memory access time, calculating misses per 1000 instructions and memory stall cycles, determining the speedup from a perfect cache, and identifying which cache sets are filled when a sequence of instruction and data accesses is executed.
Multicore Computer Architecture - Storage and Interconnects
Tutorial 3 Cache Memory Optimizations
Dr. John Jose
Assistant Professor, Department of Computer Science & Engineering
Indian Institute of Technology Guwahati, Assam.

Tutorial Problem-1
The address of a word in a byte-addressable 16 MB physical memory is 0xAA0C2A. When brought into the cache, this word is mapped to set 48. What is the block size of the cache memory?

16 MB = 2^24 bytes, so the address is 24 bits wide:
A    A    0    C    2    A
1010 1010 0000 1100 0010 1010
Set 48 = 110000 in binary, which matches address bits 11 to 6 (bracketed below). The 6 bits to the right of the set index are the block offset:
1010 1010 0000 [110000] 101010
Offset = 6 bits, so block size = 2^6 = 64 bytes.

Tutorial Problem-2
A cache has an access time (hit latency) of 10 ns and a miss rate of 5%. An optimization reduces the miss rate to 3% but increases the hit latency to 15 ns. Under what condition does this change result in better performance (lower average memory access time)?

With HT = hit time, MR = miss rate, and MP = miss penalty:
AMAT1 = HT1 + MR1 x MP, where HT1 = 10 ns and MR1 = 0.05
AMAT2 = HT2 + MR2 x MP, where HT2 = 15 ns and MR2 = 0.03
The change helps when AMAT2 < AMAT1, that is, 15 + 0.03 x MP < 10 + 0.05 x MP, which gives MP > 250 ns.

Tutorial Problem-3
A cache has a hit rate of 90%, a 64-byte block, and a cache hit latency of 5 ns. Main memory takes 150 ns to return the first word (32 bits) of a block and 10 ns for each subsequent word.
(a) What is the miss latency of the cache?
(b) If doubling the cache block size reduces the miss rate to 3%, does it reduce the average memory access time?

Tutorial Problem-4
A cache has a miss rate of 3% and a miss penalty of 500 cycles. In a program, 50% of the instructions are memory accesses (loads and stores).
(a) Find the misses per 1000 instructions (MPKI).
(b) Find the memory stall cycles per miss.

Miss rate = misses / memory access = (misses / instruction) / (memory accesses / instruction)
MR = MPI / MAPI, so MPI = MR x MAPI
MAPI = 1.5 (one instruction fetch plus 0.5 data accesses per instruction)

Tutorial Problem-5
Consider a cache system in which the miss rate of the I-cache is 2% and that of the D-cache is 4%. The processor has a CPI of 2 without memory stalls, and the miss penalty is 100 cycles for all misses. Determine how much faster the processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%.

CPI real = Base CPI + Stall CPI
CPI ideal = Base CPI = 2
Stall CPI = (% use of IC x stall of IC) + (% use of DC x stall of DC)

Tutorial Problem-6
Consider a 32-bit processor with a 16 KB direct-mapped L1 cache that uses a block size of 4 words. It has a 256 KB L2 cache with 4-way associativity and a block size of 8 words. The system uses a byte-addressable 256 MB DRAM. Upon running a program, 16 consecutive fixed-length instructions (each instruction is one word) starting at main memory address 0x8226620 are executed. These instructions operate on an array A of 8 words with starting address 0x42AF5F8. Assuming the caches are initially empty, indicate the non-empty sets in the L1 cache and the L2 cache after the execution of the program.

32-bit processor: 1 word = 4 bytes
256 MB DRAM: 28-bit address
L1 cache: 16 KB, direct mapped, block size = 4 words (16 B)
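The address-to-set mapping used in Tutorial Problem-6 (and, in reverse, in Tutorial Problem-1) can be checked with a short script. This is a minimal sketch, not the tutorial's official solution. It assumes that the L1 cache is unified (instructions and data share it), that all 16 instructions are fetched and all 8 words of A are accessed, that every L1 miss also fills the L2 cache, and that there is no prefetching. The helper name `touched_sets` is hypothetical. For Problem 1 the number of sets is not given, so the script simply tries every offset and index width that fits in a 24-bit address.

```python
WORD = 4  # bytes per word on the 32-bit processor


def touched_sets(start, length_bytes, block_bytes, num_sets):
    """Set indices touched by the byte range [start, start + length_bytes)."""
    first_block = start // block_bytes
    last_block = (start + length_bytes - 1) // block_bytes
    return {block % num_sets for block in range(first_block, last_block + 1)}


# Problem 1: which block sizes make 0xAA0C2A land in set 48?
# At least 6 index bits are needed to encode the value 48.
ADDR, TARGET_SET = 0xAA0C2A, 48
block_sizes = sorted({
    1 << offset_bits
    for offset_bits in range(24)
    for index_bits in range(6, 24 - offset_bits + 1)
    if (ADDR >> offset_bits) % (1 << index_bits) == TARGET_SET
})
print("Problem 1 block size candidates (bytes):", block_sizes)  # [64]

# Problem 6: cache geometry derived from the problem statement.
L1 = {"block_bytes": 4 * WORD, "num_sets": (16 * 1024) // (4 * WORD)}       # direct mapped
L2 = {"block_bytes": 8 * WORD, "num_sets": (256 * 1024) // (8 * WORD * 4)}  # 4-way
CODE = (0x8226620, 16 * WORD)    # 16 one-word instructions
ARRAY_A = (0x42AF5F8, 8 * WORD)  # array A of 8 words

l1_sets = sorted(touched_sets(*CODE, **L1) | touched_sets(*ARRAY_A, **L1))
l2_sets = sorted(touched_sets(*CODE, **L2) | touched_sets(*ARRAY_A, **L2))
print("L1 sets:", l1_sets)  # [610, 611, 612, 613, 863, 864, 865]
print("L2 sets:", l2_sets)  # [817, 818, 1967, 1968]
```

Under these assumptions the instructions occupy four L1 blocks and two L2 blocks. Array A starts 8 bytes before a 16-byte boundary, so its 32 bytes straddle three L1 blocks and two L2 blocks.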
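The AMAT arithmetic in Tutorial Problem-2 and Tutorial Problem-3 can be worked through as follows. This is a sketch under stated assumptions. It uses AMAT = hit time + miss rate x miss penalty, as in Problem 2, and it takes the miss latency in Problem 3 to be the time to transfer the whole block from main memory, with the hit latency counted separately. Some texts fold the hit latency into the miss latency, which would add 5 ns to part (a). The function names are hypothetical, and `Fraction` is used only to keep the percentages exact.

```python
from fractions import Fraction


def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


def block_fetch_ns(block_bytes, word_bytes=4, first_word_ns=150, next_word_ns=10):
    """Time to bring one block from memory: first word, then the remaining words."""
    words = block_bytes // word_bytes
    return first_word_ns + (words - 1) * next_word_ns


# Problem 2: break-even miss penalty, from 15 + 0.03*MP = 10 + 0.05*MP.
break_even_ns = Fraction(15 - 10) / (Fraction(5, 100) - Fraction(3, 100))
print("Problem 2: change helps when MP >", break_even_ns, "ns")  # 250

# Problem 3(a): miss latency for a 64-byte block (16 words).
penalty_64 = block_fetch_ns(64)    # 150 + 15 * 10 = 300 ns
# Problem 3(b): doubled block (32 words) with a 3% miss rate.
penalty_128 = block_fetch_ns(128)  # 150 + 31 * 10 = 460 ns

amat_64 = amat(5, Fraction(10, 100), penalty_64)
amat_128 = amat(5, Fraction(3, 100), penalty_128)
print("Problem 3: AMAT with 64 B blocks  =", float(amat_64), "ns")   # 35.0
print("Problem 3: AMAT with 128 B blocks =", float(amat_128), "ns")  # 18.8
```

With these numbers the larger block lowers the AMAT even though each miss costs more, because the miss rate falls from 10% to 3%.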
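The stall-cycle formulas in Tutorial Problem-4 and Tutorial Problem-5 can be evaluated with the sketch below. It assumes that every instruction makes one instruction-fetch access and that the load/store fraction adds the data accesses, which matches MAPI = 1.5 in Problem 4. For Problem 4(b), the stall cycles attributable to a single miss are simply the 500-cycle miss penalty. The script also reports stall cycles per instruction, on the assumption that this is the more useful figure. Variable names are illustrative, and `Fraction` keeps the percentages exact.

```python
from fractions import Fraction

# Problem 4: misses per 1000 instructions and memory stall cycles.
miss_rate = Fraction(3, 100)
miss_penalty = 500                # cycles
mapi = 1 + Fraction(50, 100)      # memory accesses per instruction = 1.5
mpi = miss_rate * mapi            # misses per instruction
mpki = mpi * 1000
stall_per_instruction = mpi * miss_penalty
print("Problem 4: MPKI =", float(mpki))                                 # 45.0
print("Problem 4: stall cycles per instruction =",
      float(stall_per_instruction))                                     # 22.5

# Problem 5: speedup of a perfect cache over the real one.
base_cpi = 2
penalty = 100                     # cycles, for all misses
icache_stall = 1 * Fraction(2, 100) * penalty                 # every instruction is fetched
dcache_stall = Fraction(36, 100) * Fraction(4, 100) * penalty  # loads and stores only
cpi_real = base_cpi + icache_stall + dcache_stall
speedup = cpi_real / base_cpi
print("Problem 5: stall CPI =", float(icache_stall + dcache_stall))  # 3.44
print("Problem 5: real CPI  =", float(cpi_real))                     # 5.44
print("Problem 5: speedup with a perfect cache =", float(speedup))   # 2.72
```

The instruction count and clock rate are the same in both configurations, so the speedup reduces to the ratio of the two CPI values.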