
Tutorial 3

This document contains 6 tutorial problems about cache memory optimizations. The problems cover topics like calculating block size based on a memory address, determining when reducing miss rate or increasing hit latency improves average memory access time, calculating misses per 1000 instructions and memory stall cycles per miss, determining speedup from a perfect cache, and identifying which cache sets would be filled when executing a sequence of instructions and data accesses.

Uploaded by Rama Devi
Copyright © All Rights Reserved

Multicore Computer Architecture - Storage and Interconnects

Tutorial 3
Cache Memory Optimizations

Dr. John Jose


Assistant Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati, Assam.
Tutorial Problem-1
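• The address of a word in a byte-addressable 16 MB physical memory is
0xAA0C2A. When this word is brought into the cache, it is mapped to set 48.
What is the block size of the cache memory?
• 16 MB = 2^24 bytes → 24-bit address
• A A 0 C 2 A
• 1010 1010 0000 1100 0010 1010
• Set 48 = 110000 (binary) matches bits 11 to 6: 1010 1010 0000 | 110000 | 101010
• The lowest 6 bits form the block offset → block size = 2^6 = 64 bytes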
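The bit-field search can be checked mechanically. The sketch below (the function name `candidate_offset_bits` is hypothetical) tries every possible offset width and keeps those for which the bits just above the offset spell out set 48. It assumes the set-index field sits directly above the block offset and is at least as wide as 48 needs (6 bits).

```python
def candidate_offset_bits(address: int, set_index: int, addr_bits: int) -> list[int]:
    """Return every offset width k whose next set_index.bit_length() bits equal set_index.

    Assumption: the index field sits directly above the block offset and is at
    least as wide as set_index needs (6 bits for 48 = 0b110000).
    """
    width = set_index.bit_length()
    mask = (1 << width) - 1
    return [k for k in range(addr_bits - width + 1)
            if (address >> k) & mask == set_index]


ADDR_BITS = (16 * 2**20).bit_length() - 1   # 16 MB = 2^24 bytes -> 24-bit address
offsets = candidate_offset_bits(0xAA0C2A, 48, ADDR_BITS)
print(offsets, [2**k for k in offsets])     # [6] [64]
```

Only a 6-bit offset produces set 48, which agrees with the 64-byte block size.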
Tutorial Problem-2
• A cache has an access time (hit latency) of 10 ns and a miss rate of 5%. An
optimization reduces the miss rate to 3% but increases the hit latency
to 15 ns. Under what condition will this change result in
better performance (lower average memory access time)?
• AMAT1 = HT1 + MR1 x MP, with HT1 = 10 ns; MR1 = 0.05
• AMAT2 = HT2 + MR2 x MP, with HT2 = 15 ns; MR2 = 0.03
• AMAT2 < AMAT1 → 15 + 0.03 x MP < 10 + 0.05 x MP → 0.02 x MP > 5 → MP > 250 ns
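A small sketch of the break-even condition: `amat` and `break_even_penalty` are illustrative helper names, not from the slides. Setting AMAT1 = AMAT2 and solving for MP gives the penalty above which the lower-miss-rate design wins.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


def break_even_penalty(ht1: float, mr1: float, ht2: float, mr2: float) -> float:
    """Miss penalty at which AMAT1 == AMAT2; above it the lower-miss-rate cache wins."""
    return (ht2 - ht1) / (mr1 - mr2)


mp_star = break_even_penalty(10, 0.05, 15, 0.03)
print(f"break-even miss penalty: {mp_star:.1f} ns")   # 250.0 ns
for mp in (200, 300):
    print(f"MP={mp} ns: AMAT1={amat(10, 0.05, mp):.1f} ns, AMAT2={amat(15, 0.03, mp):.1f} ns")
```

At MP = 200 ns the original cache is faster; at MP = 300 ns the optimized one is.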
Tutorial Problem-3
• A cache has a hit rate of 90%, a 64-byte block, and a cache hit latency of 5 ns. Main
memory takes 150 ns to return the first word (32 bits) of a block and 10 ns for
each subsequent word.
(a) What is the miss latency of the cache?
(b) If doubling the cache block size reduces the miss rate to 3%, does it
reduce the average memory access time?
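A worked sketch of both parts, assuming (as a convention, not stated on the slide) that the miss latency is the memory transfer time alone and that AMAT = hit time + miss rate x miss latency. A 64-byte block holds 16 words, so the first word costs 150 ns and the remaining 15 cost 10 ns each.

```python
WORD_BYTES = 4  # one word = 32 bits, as stated in the problem


def miss_latency_ns(block_bytes: int, first_word_ns: int = 150, next_word_ns: int = 10) -> int:
    """Time to fetch a whole block: the first word, then one step per remaining word."""
    words = block_bytes // WORD_BYTES
    return first_word_ns + (words - 1) * next_word_ns


def amat_ns(hit_ns: float, miss_rate: float, penalty_ns: float) -> float:
    return hit_ns + miss_rate * penalty_ns


penalty_64 = miss_latency_ns(64)     # 150 + 15 x 10
penalty_128 = miss_latency_ns(128)   # 150 + 31 x 10
amat_64 = amat_ns(5, 0.10, penalty_64)
amat_128 = amat_ns(5, 0.03, penalty_128)
print(penalty_64, penalty_128)                     # 300 460
print(f"{amat_64:.1f} ns -> {amat_128:.1f} ns")    # 35.0 ns -> 18.8 ns
```

Under this convention the larger block raises the miss latency but still lowers AMAT, because the miss rate falls from 10% to 3%.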
Tutorial Problem-4
• A cache has a miss rate of 3% and a miss penalty of 500 cycles. In
a program, 50% of the instructions are memory accesses (loads/stores).
• (a) Find the misses per 1000 instructions (MPKI)
• (b) Find memory stall cycles per miss
• Miss rate = misses / memory access = (misses / instruction) / (memory accesses / instruction)
• MR = MPI / MAPI, so MPI = MR x MAPI, where MAPI = 1 + 0.5 = 1.5 (one instruction fetch plus 0.5 data accesses per instruction)
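The MPKI arithmetic as a short sketch. Per miss, the stall is simply the 500-cycle miss penalty; the last line goes one step beyond the slide and also shows stall cycles per instruction (MPI x miss penalty), the quantity usually derived next.

```python
miss_rate = 0.03          # misses per memory access
miss_penalty = 500        # cycles per miss
data_access_frac = 0.5    # loads/stores per instruction

mapi = 1 + data_access_frac          # every instruction is fetched once, plus its data accesses
mpi = miss_rate * mapi               # misses per instruction
mpki = mpi * 1000                    # misses per 1000 instructions
stall_cycles_per_instr = mpi * miss_penalty

print(f"MAPI={mapi}, MPI={mpi:.3f}, MPKI={mpki:.0f}")               # MAPI=1.5, MPI=0.045, MPKI=45
print(f"stall cycles per instruction = {stall_cycles_per_instr:.1f}")  # 22.5
```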
Tutorial Problem-5
• Consider a cache system in which the I-cache miss rate is 2% and the D-cache
miss rate is 4%. The processor has CPI = 2 without memory stalls, and the miss penalty
is 100 cycles for all misses. Determine how much faster the processor
would run with a perfect cache that never misses. Assume the frequency of all
loads and stores is 36%.
• Actual CPI_real = Base CPI + Stall CPI; CPI_ideal = Base CPI = 2
• Stall CPI = (fraction of instructions using IC x IC miss rate x miss penalty) + (fraction using DC x DC miss rate x miss penalty)
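A sketch of the stall-CPI formula (the function name `perfect_cache_speedup` is hypothetical). Every instruction is fetched through the I-cache, while only the 36% loads/stores touch the D-cache. With the same instruction count and clock rate, speedup is the ratio of the two CPIs.

```python
def perfect_cache_speedup(base_cpi, ic_miss_rate, dc_miss_rate, ls_frac, miss_penalty):
    """Return (real CPI, speedup of a perfect cache over the real one)."""
    ic_stall = 1.0 * ic_miss_rate * miss_penalty      # every instruction uses the I-cache
    dc_stall = ls_frac * dc_miss_rate * miss_penalty  # only loads/stores use the D-cache
    real_cpi = base_cpi + ic_stall + dc_stall
    return real_cpi, real_cpi / base_cpi             # same instruction count and clock


real_cpi, speedup = perfect_cache_speedup(2, 0.02, 0.04, 0.36, 100)
print(f"CPI_real = {real_cpi:.2f}, speedup = {speedup:.2f}")  # CPI_real = 5.44, speedup = 2.72
```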
Tutorial Problem-6
• Consider a 32-bit processor with a 16 KB direct-mapped L1 cache that uses
a block size of 4 words. It has a 256 KB, 4-way set-associative L2 cache with a
block size of 8 words. The system uses a byte-addressable 256 MB DRAM. While running a program, 16
consecutive fixed-length instructions (each instruction is one word)
starting at main memory address 0x8226620 are executed. These
instructions operate on an array A of 8 words with starting address
0x42AF5F8. Assuming the caches are initially empty, indicate the non-empty
sets in the L1 and L2 caches after the program executes.
Tutorial Problem-6
 32 bit processor: 1 word  4 bytes: 256 MB DRAM  28 bit address
 L1 Cache: 16KB, direct mapped, block size= 4 words (16B)

 L2 Cache : 256 KB, 4-way, block size= 8 words (32B).

 Instruction 0x 8226620, 16 consecutive fixed length instructions (each


instruction is one word) Data 0x 42AF5F8 , array of 8 words.
Tutorial Problem-6
• Non-empty sets
• L1: Sets 610, 611, 612, 613 (4 words x 4 = 16 instructions)
  Sets 863, 864, 865 (2 + 4 + 2 words of data array A)
• L2: Sets 817, 818 (8 words x 2 = 16 instructions)
  Sets 1967, 1968 (2 + 6 words of data array A)
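The set lists can be reproduced with a short sketch, assuming conventional modulo indexing (set = block number mod number of sets) and that the instructions and array A each occupy contiguous bytes. `cache_sets` and the configuration names are illustrative, not from the slides.

```python
KB = 1024
WORD = 4  # bytes per word on the 32-bit processor


def cache_sets(start: int, nbytes: int, cache_bytes: int, block_bytes: int, ways: int) -> list[int]:
    """Set index of every block touched by bytes [start, start + nbytes)."""
    num_sets = cache_bytes // (block_bytes * ways)
    first_block = start // block_bytes
    last_block = (start + nbytes - 1) // block_bytes
    return [block % num_sets for block in range(first_block, last_block + 1)]


L1 = dict(cache_bytes=16 * KB, block_bytes=4 * WORD, ways=1)   # 1024 sets
L2 = dict(cache_bytes=256 * KB, block_bytes=8 * WORD, ways=4)  # 2048 sets
INSTR = (0x8226620, 16 * WORD)   # 16 one-word instructions
DATA = (0x42AF5F8, 8 * WORD)     # array A of 8 words

for name, cfg in (("L1", L1), ("L2", L2)):
    print(name, cache_sets(*INSTR, **cfg), cache_sets(*DATA, **cfg))
# L1 [610, 611, 612, 613] [863, 864, 865]
# L2 [817, 818] [1967, 1968]
```

Because array A starts 8 bytes before a 16-byte boundary (and 8 bytes before a 32-byte boundary), it straddles three L1 blocks but only two L2 blocks.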
http://www.iitg.ac.in/johnjose/
