Cose222 HW4
Assignment #4
Due: Dec 15, 2021 (Wednesday) 11:59pm on Blackboard
Solutions (Total score: 150)
Please answer the questions below. Write your student ID and name at the top of the document. Submit
your homework in PDF format only. (You can easily generate PDF files from Microsoft Word or HWP.
You may also handwrite your answers and scan the handwritten pages to PDF; document-scanning
applications such as Microsoft Lens work well for scanning with a smartphone.)
The answer rules:
(1) You may write your answers in either Korean or English.
(2) Round final numeric answers to three decimal places.
(3) The performance of A is improved by NN% compared to the performance of B if PerfA / PerfB = 1.NN.
1. The following code is written in C, where the elements of an array are allocated contiguously. Array a
has 1024x1024 elements. Arrays b and c each have 1024 elements. Assume each array element is an
8-byte integer, and that all variables are 8-byte integers as well. Cache blocks are allocated on a write
miss, and the size of a cache line is 64 bytes. Assume that the cache size is infinite.
(Hint: in this code, there are 6 variables: a, b, c, i, j, and sum)
int sum;
for (int i = 0; i < 1024; i++)
{
    sum = 0;
    for (int j = 0; j < 1024; j++)
        sum += a[j + i*1024] + b[j];
    c[i] = sum;
}
(c) Let’s focus on the data transfers for arrays a, b, and c. How many ld and sd instructions are issued
while executing this code? [3]
Array a:
Array b:
Array c:
(d) Let’s focus on the cache miss rates for arrays a, b, and c. Calculate the compulsory (cold) miss rate
for each array. (Hint: we assume the cache size is infinite.) [3]
Cold miss rate of a:
Cold miss rate of b:
Cold miss rate of c:
2. Below is a list of 64-bit memory address references, given as word addresses. (1 word = 4 bytes)
0xFD, 0xBA, 0x2C, 0xB5, 0x0E, 0xBE, 0x58, 0xBF, 0x02, 0x2B, 0xB4, 0x03
(a) Let us assume a direct-mapped cache has 16 blocks and a single block includes two words. What is
the size of this cache? [2]
(b) For each of these references, identify the binary word address, the tag, and the index given a direct-
mapped cache with 16 two-word blocks (i.e. the cache has 16 blocks, and the size of a single block is
two words.) Also list whether each reference is a hit or miss, assuming the cache is initially empty. [12]
(c) Calculate the hit rate (in percentage) of the above cache. [2]
(d) Let us assume that the size of a single block increases to four words while the size of the direct-
mapped cache stays the same. For each of these references, identify the binary word address, the tag,
and the index. Also list whether each reference is a hit or a miss, assuming the cache is initially empty. [12]
(e) Calculate the hit rate (in percentage) of the above cache. [2]
(f) Assume that the miss penalty of above caches is proportional to the size of fetched data from the
main memory or lower-level cache. Which cache configuration from questions (b) and (d) exhibits
better performance for the above word address stream? Explain your answer. [4]
3. For a direct-mapped cache design with a 64-bit address, the following bits of the address are used to
access the cache. (1 word = 4 bytes)
(c) What is the ratio of the total bits required for such a cache implementation to the data storage bits?
Assume each cache block includes a 1-bit valid field. [4]
4. Cache access time is usually proportional to the cache capacity. Assume that a main memory access
takes 50 ns and that 36% of all instructions access data memory. The following table shows data for the
L1 caches attached to each of two processors, P1 and P2.
(a) Assuming that the L1 hit time determines the cycle time for P1 and P2, what are their respective
clock rates? [2]
(b) What is the Average Memory Access Time (AMAT) for P1 and P2 (in cycles)? [4]
(c) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 and P2? Which
processor is faster? When we say a “base CPI of 1.0”, we mean that instructions complete in one cycle
unless either the instruction access or the data access causes a cache miss. [4]
For the next problems, we will consider the addition of an L2 cache to P1 (to presumably make up for
its limited L1 cache capacity). Use the L1 cache capacities and hit times from the previous table when
solving these problems. The L2 miss rate indicated is its local miss rate, namely the L2 miss counts
divided by the total L2 access counts.
(d) What is the AMAT for P1 with the addition of an L2 cache? Is the AMAT better or worse with the L2
cache? [4]
(e) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 with the addition
of an L2 cache? [4]
(f) What would the L2 miss rate need to be in order for P1 with an L2 cache to be faster than P1 without
an L2 cache? [4]
(g) What would the L2 miss rate need to be in order for P1 with an L2 cache to be faster than P2
without an L2 cache? [10]
5. Assume that the cache size is 128 bytes and the size of a single cache block is 32 bytes. Below is a
series of memory read references to the cache. Addresses point to bytes.
0x07, 0x15, 0x4D, 0x2A, 0x79, 0xAB, 0xCE, 0x2E, 0x20, 0x4B, 0x6D, 0x32
0x8A, 0xAF, 0x29, 0xC7, 0xCE, 0x01, 0x18, 0x07, 0x08, 0xAA, 0x08, 0x30
Classify each memory reference as a hit or a miss for each of the following caches. Also calculate the
total number of misses.
(a) Direct-mapped cache [12]
(b) Fully associative cache with LRU replacement [12]
(c) Fully associative cache with FIFO (First In First Out) replacement [12]
6. Let us assume that the virtual address size is 48 bits and the physical memory size is 8 GB. The word
size is 32 bits and the page size is 4 KB. All addresses are byte-addressed.
(a) What is the maximum size of the virtual memory supported by this system? [2]
(c) Let us assume the TLB has 512 entries and is two-way set associative. Which virtual address bits
are used to index the TLB? Which virtual address bits are used as the TLB tag? [8]