Cache Memory Test 2 Papers
October 8, 1997
The exam is composed of four problems containing a total of 14 sub-problems, adding up to 100
points overall. The point value is indicated for each subproblem and is roughly proportional to
difficulty. Attempt all problems and budget your time according to the problem’s difficulty. Show
all work in the space provided. If you have to make an assumption then state what it was. Answers
unaccompanied by supporting work will not be graded. The exam is closed book, closed notes, and
closed neighbors. You are on your honor to have erased any course-relevant material from your
calculator. If there is a single numeric answer to a problem, please circle it so that it stands out on
the page and we grade the answer you want considered as your best shot.
Please print your initials at the top of each page in case the pages of your test get accidentally
separated.
Good luck!
18-742 Test 1 October 8, 1997 Printed Initials: ________
Grading Sheet
1. Cache Policies.
Consider two alternate caches, each with 4 sectors holding 1 block per sector and one 32-bit word
per block. One cache is direct mapped and the other is fully associative with LRU replacement
policy. The machine is byte addressed on word boundaries and uses write allocation with write
back.
1a) (7 points) What would the overall miss ratio be for the following address stream on the direct
mapped cache? Assume the cache starts out completely invalidated.
read 0x00
read 0x04
write 0x08
read 0x10
read 0x08
write 0x00
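One way to check a hand trace of 1a is a small simulator. The following is a minimal sketch, assuming the index is taken from the low-order block-address bits and that a write miss allocates exactly like a read miss; the function and macro names are hypothetical and not part of the exam.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SECTORS 4u   /* 4 sectors, 1 block per sector */
#define BLOCK_BYTES 4u   /* one 32-bit word per block */

/* Count misses for a direct-mapped, write-allocate cache that starts out
 * completely invalidated. Reads and writes are treated alike because write
 * allocation brings the block in on a write miss (assumption of this sketch). */
static unsigned count_misses_direct_mapped(const uint32_t *addrs, size_t n)
{
    uint32_t tags[NUM_SECTORS] = {0};
    int valid[NUM_SECTORS] = {0};
    unsigned misses = 0;

    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t block = addrs[i] / BLOCK_BYTES;   /* strip byte offset */
        uint32_t index = block % NUM_SECTORS;      /* low block-address bits */
        uint32_t tag = block / NUM_SECTORS;        /* remaining high bits */
        if (!valid[index] || tags[index] != tag) {
            misses++;                              /* allocate on miss */
            valid[index] = 1;
            tags[index] = tag;
        }
    }
    return misses;
}
```

Called on the six addresses of 1a in order, this sketch counts 5 misses out of 6 accesses; only the second access to 0x08 hits.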
1b) (6 points) Give an example address stream consisting of only reads that would result in a lower
miss ratio if fed to the direct mapped cache than if it were fed to the fully associative cache.
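A candidate stream for 1b can be checked against a fully associative LRU model. The sketch below is one possible model, assuming 4 sectors of one 4-byte block each and true LRU ordering; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FA_SECTORS 4u
#define FA_BLOCK_BYTES 4u

/* Count misses for a fully associative cache with LRU replacement that
 * starts out completely invalidated. */
static unsigned count_misses_fully_assoc_lru(const uint32_t *addrs, size_t n)
{
    uint32_t blocks[FA_SECTORS] = {0};
    unsigned long last_used[FA_SECTORS] = {0};   /* 0 marks an invalid sector */
    unsigned misses = 0;

    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t block = addrs[i] / FA_BLOCK_BYTES;
        unsigned long now = (unsigned long)i + 1; /* timestamps start at 1 */
        size_t victim = 0;
        int hit = 0;

        for (size_t s = 0; s < FA_SECTORS; s++) {
            if (last_used[s] != 0 && blocks[s] == block) {
                last_used[s] = now;               /* refresh recency on a hit */
                hit = 1;
                break;
            }
            if (last_used[s] < last_used[victim])
                victim = s;                       /* oldest (or invalid) sector */
        }
        if (!hit) {
            misses++;
            blocks[victim] = block;
            last_used[victim] = now;
        }
    }
    return misses;
}
```

For example, cycling twice through reads of 0x00, 0x04, 0x08, 0x0C, 0x10 misses on all 10 accesses in this model, because LRU always evicts the block needed next. A hand trace of a direct-mapped cache on the same stream gives 7 misses, since only 0x00 and 0x10 compete for the same sector.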
2a) (10 points) Sketch a block diagram of how the virtual address is mapped into the physical
address (assuming a TLB hit). Be sure to label exactly which/how many of the address bits go
where, and how many bits are in each of the 3 fields in a TLB entry.
2b) (14 points) Given that you have the address output of a TLB and the original virtual address,
sketch a block diagram of how the cache is accessed to determine whether there is a cache hit (you
may ignore data access -- just indicate enough to say whether a hit or miss occurs; also include
only tag fields in your picture of the cache organization). Again label exactly which/how many
address bits go where and how big an address tag is.
3. Multi-Level Caches.
You have a computer with two levels of cache memory and the following specifications:
CPU Clock: 200 MHz Bus speed: 50 MHz
Processor: 32-bit RISC scalar CPU, single data address maximum per instruction
L1 cache on-chip, 1 CPU cycle access
block size = 32 bytes, 1 block/sector, split I & D cache
each single-ported with one block available for access, non-blocking
L2 cache off-chip, 3 CPU cycles transport time (L1 miss penalty)
block size = 32 bytes, 1 block/sector, unified single-ported cache, blocking, non-pipelined
Main memory has 12+4+4+4 CPU cycles transport time for 32 bytes (L2 miss penalty)
CMDLINE: dinero -b32 -i8K -d8K -a1 -ww -An -W8 -B8
CACHE (bytes): blocksize=32, sub-blocksize=0, wordsize=8, Usize=0,
Dsize=8192, Isize=8192, bus-width=8.
POLICIES: assoc=1-way, replacement=l, fetch=d(1,0), write=w, allocate=n.
CTRL: debug=0, output=0, skipcount=0, maxcount=10000000, Q=0.
3a) (6 points) What is the available (as opposed to used) sustained bandwidth between:
3b) (9 points) How long does an instruction take to execute (in ns), assuming 1 clock cycle per
instruction in the absence of memory hierarchy stalls, no write buffering at the L1 cache level, and
a 0% L2 miss rate?
3c) (7 points) A design study is performed to examine replacing the L2 cache with a victim cache.
Compute a measure of speed for each alternative and indicate which is the faster solution. Assume
the performance statistics are
L2 cache local miss ratio = 0.19
Victim cache miss ratio = 0.26; and its transport time from L1 miss = 1 clock
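One possible "measure of speed" for 3c is the average number of CPU cycles added to each L1 miss. The sketch below assumes that a miss in either the L2 cache or the victim cache pays the full 12+4+4+4 = 24 cycle main-memory transport time on top of the transport time to that structure; the function name is hypothetical and other measures are equally defensible.

```c
#include <assert.h>

/* Average cycles added per L1 miss: time to reach the second-level
 * structure, plus the main-memory penalty weighted by that structure's
 * local miss ratio (assumed model, see lead-in). */
static double l1_miss_cost_cycles(double transport_cycles,
                                  double local_miss_ratio,
                                  double memory_cycles)
{
    assert(local_miss_ratio >= 0.0 && local_miss_ratio <= 1.0);
    return transport_cycles + local_miss_ratio * memory_cycles;
}
```

With the stated figures, `l1_miss_cost_cycles(3.0, 0.19, 24.0)` is about 7.56 cycles for the L2 cache and `l1_miss_cost_cycles(1.0, 0.26, 24.0)` is about 7.24 cycles for the victim cache, so under this model the victim cache comes out slightly ahead.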
You have instrumented only data references from the subroutine “sum_array” in the following
program using Atom on an Alpha workstation (“long” values are 64 bits). The resultant data reads
and writes have been run through dinero with a particular cache configuration. In this question
you’ll deduce the cache configuration used.
#include <stdio.h>
#include <stdlib.h>
void sum_array(int N, long *a, long *b, long *c, long *d, long *e)
{
    int i;
    for (i = 0; i < N; i++) {
        *(a++) = *(b++) + *(c++) + *(d++) + *(e++);
    }
}
if (argc != 3)
{ fprintf(stderr, "\nUsage: test <size> <offset>\n"); exit(-1); }
sscanf(argv[1], "%d", &N); sscanf(argv[2], "%d", &offset);
sum_array(N, a, b, c, d, e);
The program was executed with successively higher values of N on the command line, from 1 to
2100, and an offset value of 0. The graph below shows the number of combined data misses for
each value of N.
• Bus size and word size are both 8 bytes.
• The cache has one block per sector and a block size of 128 bytes (16 words).
• Assume a completely invalidated cache upon entry to sum_array.
[Figure: number of combined data misses (vertical axis, 0 to 6500) versus value of N (horizontal axis, 0 to 2500); single data series.]
Answer the following questions, giving brief support for your answer (unsupported answers
will not receive full credit).
4a) (5 points) Ignoring overhead for the subroutine call, what is the theoretical minimum possible
total data traffic (in 8-byte words) that this program has to move (combined into and out of the
cache) for N = 500?
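The counting argument for 4a can be written down directly. This is a rough sketch, assuming each loop iteration of sum_array must read one word each from b, c, d, and e and write one word to a, with no word moved more than once; the function name is hypothetical.

```c
#include <assert.h>

/* Lower bound on data traffic in 8-byte words: every iteration reads four
 * source words and writes one result word (assumed to move exactly once). */
static long min_data_traffic_words(long n)
{
    const long reads_per_iter = 4;    /* *b, *c, *d, *e */
    const long writes_per_iter = 1;   /* *a */

    assert(n >= 0);
    return n * (reads_per_iter + writes_per_iter);
}
```

Under that assumption, `min_data_traffic_words(500)` evaluates to 2500 words.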
4c) (6 points) Assuming it is not direct mapped, does this data look like it came from an LRU
replacement policy or a random replacement policy? Why?
4d) (6 points) Assume that you have a direct mapped cache. What is the best value for the input
parameter offset if you want to improve performance for N=512 (“best” means the smallest
value guaranteed to have 100% effectiveness for N=512).
4e) (10 points) What is the actual associativity of the cache that produced the data given? (and
how did you figure that out?)
4f) (8 points) How many bytes does the cache hold (data only, not counting control+tag bits)?
18-742 Test 1 February 24, 1998 Printed Initials: ________
The exam is composed of four problems containing a total of 11 sub-problems, adding up to 100
points overall. The point value is indicated for each sub-problem and is roughly proportional to
difficulty. Attempt all problems and budget your time according to the problem’s difficulty. Show
all work in the space provided. If you have to make an assumption then state what it was. Answers
unaccompanied by supporting work will not receive full credit. The exam is closed book, closed
notes, and “closed neighbors.” You are on your honor to have erased any course-relevant material
from your calculator prior to the start of the test. Please print your initials at the top of each page
in case the pages of your test get accidentally separated. You may separate the pages of the test if
you like, and re-staple them when handing the test in.
Good luck!
Grading Sheet
1. Cache Policies.
1a) (5 points) Assume the cache starts out completely invalidated. Circle one of four choices next
to every access to indicate whether the access is a hit, or which type of miss it is. Put a couple
words under each printed line giving a reason for your choice.
1b) (8 points) Assume the cache starts out completely invalidated. Circle “hit” or “miss” next to
each access, and indicate the number of words of memory traffic generated by the access.
Consider a system with a two-level cache having the following characteristics. The system does
not use “shared” bits in its caches.
L1 cache:
- Physically addressed, byte addressed
- Split I/D
- 8 KB combined, evenly split -- 4 KB each
- Direct mapped
- 2 blocks per sector
- 1 word per block (8 bytes/word)
- Write-through
- No write allocation
- L1 hit time is 1 clock
- L1 average local miss rate is 0.15
L2 cache:
- Physically addressed, byte addressed
- Unified
- 160 KB
- 5-way set associative; LRU replacement
- 2 blocks per sector
- 1 word per block (8 bytes/word)
- Write-back
- Write allocate
- L2 hit time is 5 clocks (after L1 miss)
- L2 average local miss rate is 0.05
System-wide:
- The system has a 40-bit physical address space and a 52-bit virtual address space.
- L2 miss (transport time) takes 50 clock cycles
- The system uses a sequential forward access model (the “usual” one from class)
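The specification above fixes how a 40-bit physical address splits into tag, set index, and within-sector offset. The following is a minimal sketch of that arithmetic, assuming one tag per sector and a sector of 2 blocks x 8 bytes = 16 bytes; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Integer log2, valid only for exact powers of two. */
static unsigned log2_exact(unsigned long x)
{
    unsigned bits = 0;

    assert(x != 0 && (x & (x - 1)) == 0);
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

struct cache_fields {
    unsigned offset_bits;   /* byte within sector */
    unsigned index_bits;    /* set select */
    unsigned tag_bits;      /* remaining high-order address bits */
};

/* Split an address of addr_bits bits for a cache of cache_bytes data bytes,
 * the given associativity, and sector_bytes per sector. */
static struct cache_fields split_address(unsigned long cache_bytes,
                                         unsigned ways,
                                         unsigned sector_bytes,
                                         unsigned addr_bits)
{
    struct cache_fields f;
    unsigned long sets = cache_bytes / sector_bytes / ways;

    f.offset_bits = log2_exact(sector_bytes);
    f.index_bits = log2_exact(sets);
    f.tag_bits = addr_bits - f.offset_bits - f.index_bits;
    return f;
}
```

Under these assumptions, each 4 KB L1 cache has 256 sets (4 offset, 8 index, 28 tag bits), and the 160 KB 5-way L2 cache has 2048 sets (4 offset, 11 index, 25 tag bits).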
2a) (9 points)
Compute the following elements that would be necessary to determine how to configure the cache
memory array:
2b) (5 points)
Compute the average effective memory access time (t_ea, in clocks) for the given 2-level cache under
the stated conditions.
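For a sequential forward access model, the effective access time is commonly written as the L1 hit time plus the L1 miss ratio times the cost of going to L2, which in turn includes the L2 miss ratio times the memory transport time. The sketch below encodes that formula; it assumes local miss ratios and ignores write-through traffic, and the function name is hypothetical.

```c
#include <assert.h>

/* Effective access time in clocks for a two-level cache under a
 * sequential forward model:
 *   t_ea = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem)
 * where m_L1 and m_L2 are local miss ratios. */
static double effective_access_clocks(double l1_hit_clocks,
                                      double l1_miss_ratio,
                                      double l2_hit_clocks,
                                      double l2_miss_ratio,
                                      double mem_transport_clocks)
{
    assert(l1_miss_ratio >= 0.0 && l1_miss_ratio <= 1.0);
    assert(l2_miss_ratio >= 0.0 && l2_miss_ratio <= 1.0);
    return l1_hit_clocks
         + l1_miss_ratio * (l2_hit_clocks + l2_miss_ratio * mem_transport_clocks);
}
```

With the figures given for this system, `effective_access_clocks(1.0, 0.15, 5.0, 0.05, 50.0)` works out to 1 + 0.15 x (5 + 2.5) = 2.125 clocks.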
3. Virtual Memory.
3a) (9 points) What is the size (in number of address bits) of the virtual address space supported by
the above virtual memory configuration?
3b) (10 points) What is the maximum required size (in KB, MB, or GB) of an inverted page table
for the above virtual memory configuration?
4. Cache simulation
The following program was instrumented with Atom and simulated with a cache simulator. The
cache simulator was fed only the data accesses (not instruction accesses) for the following
subroutine:
    limit = a + size;
    for (i = 0; i < 100; i++) {
        b = a;
        while (b < limit) {
            result += *b;
            b += stride;
        }
    }
    return(result);
}
The result of sending the Atom outputs through the cache simulator for various values of “size”
and “stride” (each for a single execution of test_routine) are depicted in the graph below.
The cache word size was 8 bytes (matching the size of a “long” on an Alpha), and a direct-mapped
cache was simulated. The simulated cache was invalidated just before entering test_routine.
There was no explicit prefetching used. The code was compiled with an optimization switch that
got rid of essentially all reads and writes for loop handling overhead.
Use only “Cragon” terminology for this problem (not dinero terminology).
[Figure: miss ratio (vertical axis, 0 to 0.9) versus value of "stride" (horizontal axis, 0 to 32), with one curve for each value of "size": 256, 512, 1024, 2048, 4096, 8192.]
4a) (9 points)
For the given program and given results graph, what was the size of the cache in KB (support your
answer with a brief reason)? Remember that in the C programming language pointer arithmetic is
automatically scaled according to the size of the data value being pointed to (so, in this code,
adding 1 to a pointer actually adds the value 8 to the address).
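The pointer-scaling reminder can be illustrated with a short helper. This is a small sketch with a hypothetical function name; it reports the byte distance in terms of sizeof(long), which is 8 on the Alpha described here but may differ on other machines.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance covered when a pointer to long advances by `stride`
 * elements. The compiler scales the addition by sizeof(long). */
static size_t stride_in_bytes(size_t stride)
{
    long buf[64];                       /* scratch array, never read */
    const long *base = buf;
    const long *next;

    assert(stride <= 64);               /* stay within (or one past) buf */
    next = base + stride;               /* pointer arithmetic in elements */
    return (size_t)((const char *)next - (const char *)base);
}
```

So `stride_in_bytes(4)` returns `4 * sizeof(long)`, which is 32 bytes when a long is 8 bytes: a "stride" of 4 in the test program touches every 32nd byte.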