Sheet 5

Uploaded by

mohmadkhairy44

CCE 406 + ELE355

Computer Architecture

Sheet #5
Question 1
Suppose we use 16-Gbit DRAM chips, each costing $10, to build DIMMs.
We have three choices for the data width: x4, x8, or x16 (16-bit
data per chip).
1. Which DRAM chip would you choose to build the lowest-cost
single-ranked non-ECC module, and what will be its capacity?
2. Which DRAM chip would you choose to build the highest-capacity
quad-ranked module with ECC, and what will be its cost?
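One way to organize the arithmetic for Question 1 is sketched below. It assumes a 64-bit data bus per rank, widened to 72 bits when ECC is present; the sheet does not state these widths, and the function names are hypothetical helpers, not part of the sheet.

```c
#include <assert.h>

/* Assumed (not stated in the sheet): each rank drives a 64-bit data bus,
   widened to 72 bits when ECC is present. */
enum { DATA_BUS_BITS = 64, ECC_BUS_BITS = 72 };

/* Chips needed to fill one rank; rounds up if the chip width
   does not divide the bus width evenly. */
int chips_per_rank(int chip_width_bits, int with_ecc) {
    assert(chip_width_bits > 0);
    int bus = with_ecc ? ECC_BUS_BITS : DATA_BUS_BITS;
    return (bus + chip_width_bits - 1) / chip_width_bits;
}

/* Total chips on the module. */
int module_chips(int chip_width_bits, int ranks, int with_ecc) {
    return ranks * chips_per_rank(chip_width_bits, with_ecc);
}

/* Module cost in dollars. */
int module_cost_dollars(int chip_width_bits, int ranks, int with_ecc,
                        int dollars_per_chip) {
    return module_chips(chip_width_bits, ranks, with_ecc) * dollars_per_chip;
}

/* Usable capacity in GB: only the chips covering the 64 data bits count,
   ECC chips add no user-visible capacity. */
int module_capacity_gbytes(int chip_width_bits, int ranks, int chip_gbits) {
    assert(chip_width_bits > 0);
    int data_chips = ranks * (DATA_BUS_BITS / chip_width_bits);
    return data_chips * chip_gbits / 8;
}
```

Under these assumptions, the widest chip (x16) needs the fewest chips per rank, and the narrowest chip (x4) packs the most chips, and therefore the most capacity, into each rank. For example, `module_cost_dollars(16, 1, 0, 10)` evaluates to 40 and `module_capacity_gbytes(4, 4, 16)` evaluates to 128.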
Question 2
A 2-Gbit DRAM is organized as: 8 banks, 8K rows × 2K columns × 16
bits.
Refresh window = 64 ms (specified by the standard)
Each refresh operation takes 40 ns (tRFC = 40 ns)
What fraction of the memory bandwidth is lost to refresh
operations?
If refresh commands are distributed evenly over the refresh window,
what is the average refresh interval?
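The refresh overhead in Question 2 can be sketched as follows. The sketch assumes that one refresh command refreshes the same row in all 8 banks at once, so 8K (8192) commands are issued per refresh window; the sheet does not say this explicitly, and a per-bank refresh scheme would give different numbers. The function names are hypothetical.

```c
#include <assert.h>

/* Fraction of time the DRAM is busy refreshing:
   (commands per window * tRFC) / refresh window.
   Units: tRFC in ns, window in ms (1 ms = 1e6 ns). */
double refresh_overhead(long refresh_cmds, double trfc_ns, double window_ms) {
    assert(window_ms > 0.0);
    return ((double)refresh_cmds * trfc_ns) / (window_ms * 1e6);
}

/* Average spacing between evenly distributed refresh commands,
   in microseconds (1 ms = 1e3 us). */
double avg_refresh_interval_us(long refresh_cmds, double window_ms) {
    assert(refresh_cmds > 0);
    return window_ms * 1e3 / (double)refresh_cmds;
}
```

With 8192 commands, tRFC = 40 ns, and a 64 ms window, `refresh_overhead` returns about 0.00512 (roughly 0.5% of the time) and `avg_refresh_interval_us` returns 7.8125.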
Question 3
Consider two memory modules:
DDR3-1600 memory with timings: CL-tRCD-tRP = 8-8-8 cycles
DDR3-2666 memory with timings: CL-tRCD-tRP = 12-12-12 cycles
Which memory module has the lower latency?
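To compare the two modules in Question 3, cycle counts have to be converted to time. The sketch below assumes the number in the module name is the transfer rate in MT/s and that the clock runs at half that rate (double data rate); `timing_ns` is a hypothetical helper.

```c
#include <assert.h>

/* Convert a timing given in clock cycles to nanoseconds.
   DDR memory performs two transfers per clock, so the clock
   frequency in MHz is half the MT/s rating. */
double timing_ns(int cycles, double mega_transfers_per_s) {
    assert(mega_transfers_per_s > 0.0);
    double clock_mhz = mega_transfers_per_s / 2.0;
    return (double)cycles * 1000.0 / clock_mhz;   /* 1000 / MHz = ns per cycle */
}
```

Here `timing_ns(8, 1600.0)` gives 10 ns per timing parameter and `timing_ns(12, 2666.0)` gives roughly 9 ns, so a higher cycle count does not by itself imply a higher latency.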
Question 4

The transpose of a matrix interchanges its rows and columns. Here
is the code:
for (i=0; i<N; i++)
    for (j=0; j<N; j++)
        output[j][i] = input[i][j];

Both the input and output matrices are stored in row-major order.
Assume that you are executing an N×N double-precision (8 bytes per
element) matrix transpose on a processor with a 16 KB D-Cache
that is 2-way set-associative with 64-byte blocks. The D-Cache
is write-back with a write-allocate policy on a write miss.
a) Assume each set in the D-Cache stores one block from the input
matrix and a second block from the output matrix. How many sets
exist in the D-Cache? What is the maximum value of N such that
both the input and output matrices can fit in the 16-KB D-Cache?
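The geometry asked about in part (a) can be sketched as follows. The second helper relies on the part's own assumption that each matrix occupies exactly one way of every set, so each matrix may use at most half of the cache; both function names are hypothetical.

```c
#include <assert.h>

/* Number of sets = cache size / (associativity * block size). */
int cache_sets(int cache_bytes, int ways, int block_bytes) {
    assert(ways > 0 && block_bytes > 0);
    return cache_bytes / (ways * block_bytes);
}

/* Largest N such that an N x N matrix of elem_bytes-sized elements
   fits in one way of the cache (cache_bytes / ways). */
int max_n_one_way(int cache_bytes, int ways, int elem_bytes) {
    assert(ways > 0 && elem_bytes > 0);
    int elems = cache_bytes / ways / elem_bytes;
    int n = 0;
    while ((n + 1) * (n + 1) <= elems)   /* integer square root by search */
        n++;
    return n;
}
```

For the 16 KB, 2-way, 64-byte-block cache, `cache_sets(16 * 1024, 2, 64)` evaluates to 128 and `max_n_one_way(16 * 1024, 2, 8)` evaluates to 32.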

b) A compulsory cache miss occurs when a block is referenced for
the first time. Given that N=16, how many cache misses occur
in the 16 KB 2-way set-associative cache? If each cache miss stalls
the processor for 8 cycles (assuming a hit in the L2 cache), what is
the total number of stall cycles for the matrix transpose when N=16?
What is the impact on the CPI if the execution CPI = 1.1
(excluding cache misses)? Assume six instructions are fetched and
executed per inner-loop iteration, plus two instructions per
outer-loop iteration.
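A minimal sketch of how part (b) could be checked is given below: a small 2-way LRU cache model that replays the transpose's address stream and counts misses, plus the CPI arithmetic. The base addresses passed in, the block-aligned placement of the matrices, and the LRU replacement policy are assumptions for illustration; the sheet specifies none of them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK_BYTES = 64, WAYS = 2, SETS = 128, ELEM_BYTES = 8 };

typedef struct {
    uint64_t tag[SETS][WAYS];
    int valid[SETS][WAYS];
    int lru[SETS];              /* index of the least recently used way */
} Cache;

/* Returns 1 on a miss and 0 on a hit, updating the LRU state.
   Reads and writes are treated alike because the cache is write-allocate. */
static int cache_access(Cache *c, uint64_t addr) {
    uint64_t block = addr / BLOCK_BYTES;
    int set = (int)(block % SETS);
    uint64_t tag = block / SETS;
    for (int w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->lru[set] = 1 - w;        /* the other way becomes LRU */
            return 0;
        }
    }
    int victim = c->lru[set];
    c->valid[set][victim] = 1;
    c->tag[set][victim] = tag;
    c->lru[set] = 1 - victim;
    return 1;
}

/* Replays output[j][i] = input[i][j] for row-major n x n matrices. */
long transpose_misses(int n, uint64_t in_base, uint64_t out_base) {
    Cache c;
    memset(&c, 0, sizeof c);
    long misses = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            misses += cache_access(&c, in_base + (uint64_t)(i * n + j) * ELEM_BYTES);
            misses += cache_access(&c, out_base + (uint64_t)(j * n + i) * ELEM_BYTES);
        }
    return misses;
}

/* Six instructions per inner iteration plus two per outer iteration. */
long transpose_instructions(int n) {
    return 6L * n * n + 2L * n;
}

/* Effective CPI = base CPI + stall cycles per instruction. */
double cpi_with_stalls(double base_cpi, long misses, int stall_cycles,
                       long instructions) {
    assert(instructions > 0);
    return base_cpi + (double)(misses * stall_cycles) / (double)instructions;
}
```

With the input at address 0 and the output at the assumed address 0x10000, `transpose_misses(16, 0, 0x10000)` returns 64, which matches the number of distinct 64-byte blocks in the two 2 KB matrices, so all misses are compulsory. `transpose_instructions(16)` returns 1568, and `cpi_with_stalls(1.1, 64, 8, 1568)` is about 1.43.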
