Parallel Computing Platforms-Dr Nausheen
Parallel Computing Platforms-Dr Nausheen
What is the average memory access time if cache hit is 85% for a 100nsec memory access time.
Impact of Caches: Example
Consider the architecture from the previous example. In this case, we
introduce a cache of size 32 KB with a latency of 1 ns or one cycle. We
use this setup to multiply two matrices A and B of dimensions 32 × 32.
We have carefully chosen these numbers so that the cache is large enough to
store matrices A and B, as well as the result matrix C.
for(i=0;i<r;i++) { O (N3)
for(j=0;j<c;j++) {
for(k=0;k<c;k++) {
mul[i][j]+=a[i][k]*b[k][j];
}}}
► Multiplying two n × n matrices takes 2n3 operations. For our problem, this corresponds to
64K operations, which can be performed in 16K cycles (or 16 µs) at four instructions per
cycle.
► The total time for the computation is therefore approximately the sum of time for
load/store operations and the time for the computation itself, i.e., 200 + 16 µs.
http://katecpp.github.io/cache-prefetching/
Impact of Memory Bandwidth
► The vector column_sum is small and easily fits into the cache
► The matrix b is accessed in a column order.
► The strided access results in very poor performance.