This document contains 6 tutorial problems about cache memory optimizations. The problems cover topics like calculating block size based on a memory address, determining when reducing miss rate or increasing hit latency improves average memory access time, calculating misses per 1000 instructions and memory stall cycles per miss, determining speedup from a perfect cache, and identifying which cache sets would be filled when executing a sequence of instructions and data accesses.
Tutorial 3
Multicore Computer Architecture - Storage and Interconnects
Tutorial 3 Cache Memory Optimizations
Dr. John Jose
Assistant Professor, Department of Computer Science & Engineering, Indian Institute of Technology Guwahati, Assam.

Tutorial Problem-1
The address of a word in a byte-addressable 16 MB physical memory is 0xAA0C2A. When brought into the cache, this word is mapped to set 48. What is the block size of the cache memory?
16 MB = 2^24 bytes, so the address has 24 bits:
0xAA0C2A = 1010 1010 0000 1100 0010 1010
Set 48 = 110000 in binary, which matches address bits 11 to 6. The 6 bits below the set index form the block offset, so the block size is 2^6 = 64 bytes.

Tutorial Problem-2
A cache has an access time (hit latency) of 10 ns and a miss rate of 5%. An optimization reduces the miss rate to 3% but increases the hit latency to 15 ns. Under what condition does this change give better performance (lower average memory access time)?
AMAT1 = HT1 + MR1 x MP, with HT1 = 10 ns and MR1 = 0.05
AMAT2 = HT2 + MR2 x MP, with HT2 = 15 ns and MR2 = 0.03
The change helps when AMAT2 < AMAT1:
15 + 0.03 x MP < 10 + 0.05 x MP, that is, MP > 250 ns.

Tutorial Problem-3
A cache has a hit rate of 90%, a 64-byte block, and a cache hit latency of 5 ns. Main memory takes 150 ns to return the first word (32 bits) of a block and 10 ns for each subsequent word.
(a) What is the miss latency of the cache?
(b) If doubling the cache block size reduces the miss rate to 3%, does it reduce the average memory access time?

Tutorial Problem-4
A cache has a miss rate of 3% and a miss penalty of 500 cycles. In a program, 50% of the instructions are memory accesses (load/store).
(a) Find the misses per 1000 instructions (MPKI).
(b) Find the memory stall cycles per miss.
Miss rate = misses / memory access = (misses / instruction) / (memory accesses / instruction)
MR = MPI / MAPI
MPI = MR x MAPI
MAPI = 1.5 (one instruction fetch plus 0.5 data accesses per instruction)

Tutorial Problem-5
Consider a cache system in which the miss rate of the I-cache is 2% and that of the D-cache is 4%. The processor has CPI = 2 without memory stalls, and the miss penalty is 100 cycles for all misses. Determine how much faster the processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%.
CPI real = Base CPI + Stall CPI
CPI ideal = Base CPI = 2
Stall CPI = (% use of IC x stall of IC) + (% use of DC x stall of DC)

Tutorial Problem-6
Consider a 32-bit processor with a 16 KB direct-mapped L1 cache that uses a block size of 4 words. It has an L2 cache of 256 KB with 4-way associativity and a block size of 8 words. The system uses a byte-addressable 256 MB DRAM system. Upon running a program, 16 consecutive fixed-length instructions (each instruction is one word) starting at main memory address 0x8226620 are executed. These instructions operate on an array A of 8 words with starting address 0x42AF5F8. Assuming the caches are initially empty, indicate the non-empty sets of the L1 cache and the L2 cache after the execution of the program.
32-bit processor: 1 word = 4 bytes
256 MB DRAM: 28-bit address
L1 cache: 16 KB, direct mapped, block size = 4 words (16 B)
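The block-size reasoning in Tutorial Problem-1 can be checked by brute force. The sketch below is not part of the original slides. It assumes the usual tag | set index | block offset layout, with the set index taken from the bits directly above the offset, and the function name `candidate_block_sizes` is hypothetical. It tries every offset width and keeps those for which the address 0xAA0C2A lands in set 48.

```python
def candidate_block_sizes(address, address_bits, target_set):
    """Return (offset_bits, index_bits) pairs that map `address` to `target_set`.

    Assumption: the set index occupies the bits directly above the block offset.
    """
    matches = []
    # The index field must be wide enough to hold the target set number.
    min_index_bits = max(1, target_set.bit_length())
    for offset_bits in range(address_bits):
        max_index_bits = address_bits - offset_bits
        for index_bits in range(min_index_bits, max_index_bits + 1):
            set_index = (address >> offset_bits) & ((1 << index_bits) - 1)
            if set_index == target_set:
                matches.append((offset_bits, index_bits))
    return matches


# 16 MB byte-addressable memory gives a 24-bit address.
matches = candidate_block_sizes(0xAA0C2A, 24, 48)
block_sizes = sorted({1 << offset_bits for offset_bits, _ in matches})
print(block_sizes)  # [64]
```

Several index widths are consistent with set 48, because the address bits just above the index are zero, but every match uses a 6-bit offset, so the 64-byte block size is the only one that fits.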
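The arithmetic behind Tutorial Problems 2 to 5 can be sketched in a few lines. This is an illustration rather than the slides' own solution. It assumes that the miss penalty in Problem 3 is the time to transfer a whole block from main memory, and that every instruction makes one instruction fetch (so MAPI = 1.5 in Problem 4 and the I-cache is used once per instruction in Problem 5). The helper names are hypothetical. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction as F


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


def block_transfer_ns(block_bytes, word_bytes=4, first_word_ns=150, next_word_ns=10):
    """Time to fetch one block: first word, then each subsequent word."""
    words = block_bytes // word_bytes
    return first_word_ns + (words - 1) * next_word_ns


# Problem 2: AMAT2 < AMAT1 once the miss penalty exceeds the break-even value.
break_even_mp = F(15 - 10) / (F(5, 100) - F(3, 100))  # 250 ns

# Problem 3: assumed miss penalty = block transfer time.
penalty_64 = block_transfer_ns(64)               # 150 + 15 * 10 = 300 ns
amat_64 = amat(5, F(10, 100), penalty_64)        # 35 ns
penalty_128 = block_transfer_ns(128)             # 150 + 31 * 10 = 460 ns
amat_128 = amat(5, F(3, 100), penalty_128)       # 18.8 ns

# Problem 4: MPI = MR x MAPI, with one fetch plus 0.5 data accesses per instruction.
mapi = 1 + F(1, 2)
mpki = F(3, 100) * mapi * 1000                   # 45 misses per 1000 instructions
stall_cycles_per_instruction = mpki / 1000 * 500 # 22.5 cycles

# Problem 5: stall CPI from the I-cache (every instruction) and D-cache (36%).
stall_cpi = 1 * F(2, 100) * 100 + F(36, 100) * F(4, 100) * 100  # 3.44
cpi_real = 2 + stall_cpi                         # 5.44
speedup = cpi_real / 2                           # 2.72

print(float(break_even_mp), float(amat_64), float(amat_128))
print(float(mpki), float(stall_cycles_per_instruction), float(speedup))
```

Under these assumptions the larger block in Problem 3 lowers the AMAT despite the longer transfer, and the perfect cache in Problem 5 is 2.72 times faster. Each miss in Problem 4 stalls for the full 500-cycle penalty, which averages to 22.5 stall cycles per instruction.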
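For Tutorial Problem-6, the set indices can be enumerated mechanically. The sketch below is an illustration under stated assumptions, not the slides' worked answer: the L1 cache is treated as unified (it holds both instructions and data), every word of array A is accessed, and every L1 miss also fills the corresponding L2 block. The helper name `touched_sets` is hypothetical.

```python
def touched_sets(start, num_bytes, block_bytes, num_sets):
    """Set indices touched by a contiguous byte range [start, start + num_bytes)."""
    first_block = start // block_bytes
    last_block = (start + num_bytes - 1) // block_bytes
    return {block % num_sets for block in range(first_block, last_block + 1)}


WORD = 4                                  # 32-bit processor: 1 word = 4 bytes
L1_BLOCK = 4 * WORD                       # 16 B
L1_SETS = 16 * 1024 // L1_BLOCK           # direct mapped: 1024 sets
L2_BLOCK = 8 * WORD                       # 32 B
L2_SETS = 256 * 1024 // (L2_BLOCK * 4)    # 4-way: 2048 sets

# (start address, size in bytes): 16 instructions and the 8-word array A.
regions = [(0x8226620, 16 * WORD), (0x42AF5F8, 8 * WORD)]

l1_sets, l2_sets = set(), set()
for start, size in regions:
    l1_sets |= touched_sets(start, size, L1_BLOCK, L1_SETS)
    l2_sets |= touched_sets(start, size, L2_BLOCK, L2_SETS)

print(sorted(l1_sets))  # [610, 611, 612, 613, 863, 864, 865]
print(sorted(l2_sets))  # [817, 818, 1967, 1968]
```

Array A starts 8 bytes before a block boundary, so its 32 bytes span three L1 blocks and two L2 blocks even though it would fit in fewer if it were aligned.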