Lecture 2 - Cache 1

ECE 463/563

Microprocessor Architecture
Memory Hierarchies, Cache Memories
H&P: Appendix B and Chapter 2

Prof. Eric Rotenberg


Processor-memory performance gap
[Figure: performance (log scale, 1 to 1000) vs. time, 1980-2000. CPU performance improved ~60%/year (2x every 1.5 years), while DRAM performance improved ~9%/year (2x every 10 years). The resulting processor-memory performance gap grew ~50% per year.]
Why main memory is slow
• Implementation technology optimized for density (cost), not speed
– DRAM 1T-1C cell
• Dense: many bits per unit area, so cost per bit is low (more storage for the same cost as faster technologies)
• Slower to access than other technologies (e.g., SRAM 6T cell)
• Main memory is a very large RAM
– Accessing a larger RAM is inherently slower than accessing a smaller RAM, regardless of the implementation technology
– Larger address decoders, longer routing to banks, longer wordlines/bitlines within banks, etc.
• Going off chip is slow
– On-chip memory controller → I/O pins → memory bus → DRAM chip
– High latency
– Low bandwidth (limited number of I/O pins); the measurement sketch below makes these latency levels visible
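
These claims can be observed from software. Below is a minimal pointer-chasing microbenchmark sketch, an illustration rather than anything from the lecture, assuming a POSIX system (for clock_gettime); each load's address depends on the previous load, so the loads cannot overlap and the time per load approximates the raw access latency of whichever level the working set fits in.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Time one dependent load through a random cyclic permutation of n
   pointer-sized elements. Serialized loads expose raw latency. */
static double ns_per_load(size_t n, long iters) {
    size_t *next = malloc(n * sizeof *next);
    if (!next) { perror("malloc"); exit(1); }
    for (size_t i = 0; i < n; i++) next[i] = i;
    /* Sattolo's algorithm: shuffle into a single cycle. */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long k = 0; k < iters; k++) p = next[p];   /* serialized loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    volatile size_t sink = p; (void)sink;           /* keep the loop alive */
    free(next);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}

int main(void) {
    srand(1);
    /* Working sets spanning typical L1 / L2 / L3 / DRAM sizes. */
    for (size_t kb = 16; kb <= 64 * 1024; kb *= 4)
        printf("%6zu KB: %5.1f ns/load\n",
               kb, ns_per_load(kb * 1024 / sizeof(size_t), 10000000L));
    return 0;
}

The output typically shows a staircase: a few nanoseconds per load while the array fits in a cache, then a jump at each capacity boundary, with the largest jump when the working set spills off chip to DRAM.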
Quote from 1946
“Ideally one would desire an indefinitely large memory
capacity such that any particular … word would be
immediately available … We are … forced to recognize
the possibility of constructing a hierarchy of
memories, each of which has greater capacity than the
preceding but which is less quickly accessible.”

A. W. Burks, H. H. Goldstine, and J. von Neumann


Preliminary Discussion of the Logical Design of an
Electronic Computing Instrument (1946)
Locality of Reference
• Temporal locality:
– Recently accessed items are likely to be re-accessed soon
– Implies: Keep recently accessed items close by (in a cache)
• Spatial locality:
– Items with addresses near one another are likely to be accessed close together in time
– Sequential locality (special case): the next item accessed is likely to be at the next sequential address in memory
– Implies: Fetch the "nearest neighbors" of an item when fetching the item, i.e., fetch a large memory block (the code sketch below shows both kinds of locality)
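
A concrete illustration (mine, not from the slides): both kinds of locality appear in a simple matrix sum. The function names and the array size are arbitrary choices for the example.

#include <stdio.h>

enum { N = 1024 };                       /* arbitrary example size */

/* Row-wise walk: stride-1 accesses have spatial (sequential) locality,
   so each fetched memory block supplies several consecutive elements;
   the accumulator is reused every iteration (temporal locality). */
double sum_rows(size_t n, double a[n][n]) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            sum += a[i][j];              /* consecutive addresses */
    return sum;
}

/* Column-wise walk over the same row-major array: consecutive accesses
   are n*sizeof(double) bytes apart, so most of each fetched block is
   wasted. Same result, much worse locality for large n. */
double sum_cols(size_t n, double a[n][n]) {
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            sum += a[i][j];              /* stride of n doubles */
    return sum;
}

int main(void) {
    static double a[N][N];               /* 8 MB array */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] = (double)(i + j);
    printf("rows: %.0f\n", sum_rows(N, a));
    printf("cols: %.0f\n", sum_cols(N, a));  /* identical sum, slower */
    return 0;
}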
How caches work (very high level)
• A cached memory block is “tagged” with its
address in main memory (so that it can be
searched for within the cache)
• As a black box:
– The CPU asks for a word that is part of memory block X (X is the address of the referenced memory block)
– The cache searches for memory block X
– If the block is present, the CPU gets the requested word (it never talks to main memory)
– If not there, the cache asks main memory for memory block X (called a cache miss)
– The sketch after this list shows one possible concrete lookup
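
To make the tagging and searching concrete, here is a minimal sketch of one possible organization, a direct-mapped cache; the slide does not commit to any organization, and the 64-byte blocks and 64 sets below are made-up parameters. Block address X is split into an index (which entry to search) and a tag (which block is resident there).

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative parameters: 64 B blocks x 64 sets = 4 KB cache. */
#define BLOCK_BYTES 64
#define NUM_SETS    64

typedef struct {
    bool     valid;
    uint64_t tag;                    /* block address bits above the index */
    uint8_t  data[BLOCK_BYTES];
} CacheLine;

static CacheLine cache[NUM_SETS];

/* Returns true on a hit and copies the requested byte into *out.
   On a miss the caller fetches block X from memory and installs it. */
bool cache_lookup(uint64_t addr, uint8_t *out) {
    uint64_t block_addr = addr / BLOCK_BYTES;    /* "memory block X" */
    uint64_t index      = block_addr % NUM_SETS; /* where to search */
    uint64_t tag        = block_addr / NUM_SETS; /* which block */
    CacheLine *line = &cache[index];
    if (line->valid && line->tag == tag) {       /* tag match => hit */
        *out = line->data[addr % BLOCK_BYTES];
        return true;
    }
    return false;                                /* cache miss */
}

/* Install block X after a miss (data fetched from main memory). */
void cache_fill(uint64_t addr, const uint8_t block[BLOCK_BYTES]) {
    uint64_t block_addr = addr / BLOCK_BYTES;
    CacheLine *line = &cache[block_addr % NUM_SETS];
    line->valid = true;
    line->tag   = block_addr / NUM_SETS;
    memcpy(line->data, block, BLOCK_BYTES);
}

int main(void) {
    uint8_t block[BLOCK_BYTES] = { [5] = 42 };   /* pretend DRAM data */
    uint8_t v;
    uint64_t addr = 0x1005;                      /* byte 5 of block 0x40 */
    if (!cache_lookup(addr, &v)) {               /* first access: miss */
        cache_fill(addr, block);                 /* fetch and install X */
        cache_lookup(addr, &v);                  /* retry: now a hit */
    }
    printf("value = %u\n", v);                   /* prints 42 */
    return 0;
}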

Simple Memory Hierarchy
• CPU: registers; loads and stores go to the Level 1 or "L1" caches
– Separate L1 instruction cache and L1 data cache
• On an L1 cache miss: a unified (data + instruction) L2 cache
• On an L2 cache miss: across the memory bus to main memory (DRAM)
• On a page fault: the OS page fault handler brings the page in from disk
– The "swap file" on disk gives the appearance of a larger main memory
– The disk also holds user files in the file system (not relevant to the main memory discussion)

Example Hierarchy: Intel Skylake
Source: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)

• Core 0 through Core n, each with private caches:
– L1 I$: 32 KB and L1 D$: 32 KB per core
– L2 $: 1 MB per core
• Shared L3 $: 1.375 × n MB, reached through an interconnection network
– Physically, each core has a 1.375 MB "slice" of the L3 cache, but each core can access any slice
• Access latencies: L1 D$: 4 cycles, L2: 14 cycles, L3: 50-70 cycles (see the AMAT sketch below)
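
As a rough worked example (not from the slide): combining these latencies into an average memory access time. The hit rates and the ~200-cycle DRAM latency are illustrative assumptions, not Skylake data.

#include <stdio.h>

int main(void) {
    /* Latencies in cycles; L1/L2/L3 from the slide, DRAM assumed. */
    double l1 = 4, l2 = 14, l3 = 60, dram = 200;
    /* Hit rates are made-up illustrative numbers. */
    double h1 = 0.95, h2 = 0.80, h3 = 0.50;
    /* Each level's latency is paid only by accesses that reach it:
       AMAT = L1 + miss1*(L2 + miss2*(L3 + miss3*DRAM)). */
    double amat = l1 + (1 - h1) * (l2 + (1 - h2) * (l3 + (1 - h3) * dram));
    printf("AMAT = %.1f cycles\n", amat);   /* 6.3 cycles here */
    return 0;
}

Even with these generous hit rates, the DRAM term grows quickly as the L1 hit rate drops, which is the quantitative argument for the hierarchy.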

