CS3350B Computer Architecture Memory Hierarchy: Why?: Marc Moreno Maza
http://www.csd.uwo.ca/~moreno/cs3350_moreno/index.html
Department of Computer Science
University of Western Ontario, Canada
Locality Example:
    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
- Reference to array elements in succession (stride-1 reference pattern): spatial locality
- Reference to sum in each iteration: temporal locality
Question: Can you permute the loops so that the function scans the
3D array a[] with a stride-1 reference pattern (and thus has good
spatial locality)?
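A minimal sketch of a stride-1 answer, assuming the array is a C array stored in row-major order; the function name sumarray3d and the dimensions M, N, P are illustrative and not from the slides. The key point is that the last index must vary in the innermost loop, so that successive accesses touch adjacent memory locations.
    #define M 4
    #define N 5
    #define P 6

    /* Row-major C array: a[i][j][k] and a[i][j][k+1] are adjacent in memory,
       so the innermost loop varies k to obtain a stride-1 reference pattern. */
    int sumarray3d(int a[M][N][P]) {
        int sum = 0;
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < P; k++)
                    sum += a[i][j][k];
        return sum;
    }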
CPU looks first for data in L1, then in L2, ..., then in main memory.
The TLB (translation lookaside buffer) stores recent translations of virtual addresses to physical addresses; it can be viewed as an address-translation cache.
The Andrew File System (AFS) and Network File System (NFS) are distributed file
system protocols
A proxy server is a server (a computer system or an application) that acts as an
intermediary for requests from clients seeking resources from other servers.
Claim
Being able to look at code and get a qualitative sense of its locality
properties is a key skill for a professional programmer.
Examples of projects driven by data locality (and other features):
BLAS (Basic Linear Algebra Subprograms)
http://www.netlib.org/blas/
SPIRAL, Software/Hardware Generation for DSP Algorithms
http://www.spiral.net/
FFTW, by Matteo Frigo and Steven G. Johnson
http://www.fftw.org/
Cache-Oblivious Algorithms, by Matteo Frigo, Charles E. Leiserson,
Harald Prokop, and Sridhar Ramachandran, 1999
https://en.wikipedia.org/wiki/Cache-oblivious_algorithm
...
Assuming that the cache hit costs are included as part of the normal
CPU execution cycle, we have:
With advancing technology, the CPU has more room on die for bigger
L1 caches and for a second-level cache, normally a unified L2 cache
(i.e., it holds both instructions and data), and in some cases even a
unified L3 cache.
New AMAT Calculation, with an L2 cache:
AMAT = L1 Hit Time + L1 Miss Rate ∗ L1 Miss Penalty,
L1 Miss Penalty = L2 Hit Time + L2 Miss Rate ∗ L2 Miss Penalty
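To see how the two levels combine, here is a small worked example with illustrative numbers (not taken from the slides): suppose the L1 hit time is 1 cycle, the L1 miss rate is 5%, the L2 hit time is 10 cycles, the L2 miss rate is 20%, and the L2 miss penalty (a main-memory access) is 100 cycles. Then
L1 Miss Penalty = 10 + 0.20 ∗ 100 = 30 cycles,
AMAT = 1 + 0.05 ∗ 30 = 2.5 cycles.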