Lecture 2 - Cache 1
Microprocessor Architecture
Memory Hierarchies, Cache Memories
H&P: Appendix B and Chapter 2
[Figure: the processor-memory performance gap, 1980-2000 (relative performance, log scale 1-100, vs. time). The gap grew ~50% per year, as DRAM performance improved only ~9% per year (2X per 10 years).]
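The "2X/10 yrs" annotation is just the 9%/year rate compounded: 1.09^10 ≈ 2.4. A back-of-the-envelope sketch using the figure's growth rates (the 1980 baseline of 1.0 is an assumption for illustration):

```c
#include <stdio.h>

/* Compound the annual improvement rates from the figure to see
 * how the processor-DRAM performance gap widens over 20 years. */
int main(void) {
    double cpu = 1.0, dram = 1.0;            /* assumed 1980 baseline */
    for (int year = 1980; year <= 2000; year++) {
        printf("%d: CPU %9.1fx  DRAM %5.2fx  gap %8.1fx\n",
               year, cpu, dram, cpu / dram);
        cpu  *= 1.50;    /* processor: ~50% per year */
        dram *= 1.09;    /* DRAM: ~9% per year (1.09^10 ~= 2.4, i.e., ~2X/decade) */
    }
    return 0;
}
```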
Why main memory is slow
• Implementation technology optimized for density (cost), not speed
– DRAM 1T-1C cell
• Dense
• Slower to access than other technologies (e.g., SRAM 6T cell)
– Many bits per unit area
• Cost per bit is low
• More storage for same cost as faster technologies
• Main memory is a very large RAM
– Accessing a larger RAM is inherently slower than accessing a smaller RAM
• Regardless of the implementation technology
– Larger address decoders, longer routing to banks, longer wordlines/bitlines within banks, etc.
• Going off chip is slow
– On-chip memory controller → I/O pins → memory bus → DRAM chip
– High latency
– Low bandwidth (limited number of I/O pins)
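These effects are easy to observe empirically. The sketch below (array sizes, step count, and timing method are all illustrative assumptions, not from the slides) chases pointers through a randomly permuted array, so each load's address depends on the previous load; as the working set outgrows each cache level, the time per load jumps toward DRAM latency.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    for (size_t n = (size_t)1 << 10; n <= (size_t)1 << 24; n <<= 2) {
        size_t *next = malloc(n * sizeof *next);
        if (!next) return 1;
        /* Build one big random cycle (Sattolo's construction): any
         * choice of j < i still yields a single cycle to chase. */
        for (size_t i = 0; i < n; i++) next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;   /* rand() quality ignored for brevity */
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
        /* Chase pointers: the serial data dependence prevents the
         * out-of-order core from overlapping the misses. */
        size_t p = 0;
        const size_t steps = 20 * 1000 * 1000;
        clock_t start = clock();
        for (size_t s = 0; s < steps; s++) p = next[p];
        double ns = (double)(clock() - start) / CLOCKS_PER_SEC / steps * 1e9;
        printf("%9zu KB: %6.2f ns per load (p=%zu)\n",
               n * sizeof *next / 1024, ns, p);  /* printing p defeats dead-code elimination */
        free(next);
    }
    return 0;
}
```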
Quote from 1946 (Burks, Goldstine, and von Neumann)
“Ideally one would desire an indefinitely large memory
capacity such that any particular … word would be
immediately available … We are … forced to recognize
the possibility of constructing a hierarchy of
memories, each of which has greater capacity than the
preceding but which is less quickly accessible.”
[Figure: a multicore memory hierarchy. Each core has a private L1 D$ (~4-cycle hit) and a private 1 MB unified data+instruction L2 cache (~14-cycle hit). An interconnection network connects the cores to a shared L3 $ of 1.375 × n MB total (~50-70-cycle hit). Physically, each core has a 1.375 MB "slice" of the L3 cache, but each core can access any slice. A miss in the last-level cache goes over the memory bus to main memory (DRAM), and beyond that to disk.]
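These hit times feed directly into the average memory access time formula from Appendix B, AMAT = hit time + miss rate × miss penalty, applied recursively at each level. A minimal sketch using the figure's latencies; the DRAM latency and all miss rates below are illustrative assumptions, not from the slides:

```c
#include <stdio.h>

/* AMAT = hit_time + miss_rate * (AMAT of the next level), evaluated
 * from the outermost level inward.  Hit latencies (cycles) follow the
 * figure; DRAM latency and miss rates are assumptions. */
int main(void) {
    double mem  = 200.0;                  /* assumed DRAM latency           */
    double l3   = 60.0 + 0.30 * mem;      /* 50-70 cyc hit; 30% miss (assumed) */
    double l2   = 14.0 + 0.20 * l3;       /* 14 cyc hit; 20% miss (assumed)    */
    double amat =  4.0 + 0.10 * l2;       /* L1: 4 cyc hit; 10% miss (assumed) */
    printf("AMAT = %.1f cycles\n", amat); /* 4 + 0.1*(14 + 0.2*(60 + 0.3*200)) = 7.8 */
    return 0;
}
```

Note how the 200-cycle DRAM term would dominate if each level did not filter most of the misses from the level above it.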