Cache Memory
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking a larger cache for data takes more time
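This sketch shows why more cache helps only up to a point. It uses the common two-level model T_avg = h * T1 + (1 - h) * (T1 + T2): every access pays the cache check T1, and a miss also pays the main memory access T2. The hit ratios and latencies below are made-up assumptions chosen for illustration, not measurements of any real cache.

```python
def average_access_time(hit_ratio: float, t_cache: float, t_memory: float) -> float:
    """Average access time under a simple two-level model.

    Assumes every access checks the cache first (cost t_cache) and that a miss
    then goes to main memory (extra cost t_memory).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * (t_cache + t_memory)


# Hypothetical numbers (ns): the larger cache raises the hit ratio slightly,
# but is assumed to take four times as long to check.
small = average_access_time(hit_ratio=0.95, t_cache=1.0, t_memory=100.0)
large = average_access_time(hit_ratio=0.97, t_cache=4.0, t_memory=100.0)
print(f"small cache: {small:.2f} ns, large cache: {large:.2f} ns")
# prints: small cache: 6.00 ns, large cache: 7.00 ns
```

With these assumed numbers the larger cache is slower on average, because its higher check time outweighs the two extra hits per hundred accesses.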
Direct mapping address structure (24 bit address):
Tag s-r: 8 bits | Line or slot r: 14 bits | Word w: 2 bits
No two blocks that map to the same line have the same Tag field
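The address structure above can be sketched as bit masking: the low w = 2 bits select the word, the next r = 14 bits select the line, and the remaining 24 - 14 - 2 = 8 bits form the tag. The function name and the sample address 0x16339C are illustrative choices, not part of the original slides.

```python
# Field widths for the direct-mapping example: 24-bit address,
# r = 14 line bits, w = 2 word bits, so the tag gets 24 - 14 - 2 = 8 bits.
ADDRESS_BITS = 24
LINE_BITS = 14
WORD_BITS = 2
TAG_BITS = ADDRESS_BITS - LINE_BITS - WORD_BITS


def split_direct_mapped(address: int) -> tuple[int, int, int]:
    """Split a 24-bit address into (tag, line, word) fields for a direct-mapped cache."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address out of range")
    word = address & ((1 << WORD_BITS) - 1)                  # lowest 2 bits
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)  # next 14 bits
    tag = address >> (WORD_BITS + LINE_BITS)                 # top 8 bits
    return tag, line, word


tag, line, word = split_direct_mapped(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")
# prints: tag=16 line=0CE7 word=0
```

Two addresses with the same line field but different tags compete for the same slot, which is why the tag must be stored and compared on every access.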
[Figure: illustration of the example]
Cons:
One fixed location for a given block
If a program repeatedly accesses 2 blocks that map to
the same line, the cache miss rate is very high
(thrashing) and the cache becomes counterproductive
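Thrashing can be reproduced with a toy direct-mapped cache. This sketch assumes block numbers (address with the word bits dropped), maps block b to line b mod number_of_lines, and uses 2^14 lines to match the r = 14 line field. The function and variable names are hypothetical.

```python
def simulate_direct_mapped(block_addresses, num_lines):
    """Count (hits, misses) for a direct-mapped cache.

    Block b maps to line b % num_lines; each line stores the tag
    b // num_lines of the block it currently holds.
    """
    tags = [None] * num_lines  # None marks an empty line
    hits = misses = 0
    for block in block_addresses:
        line = block % num_lines
        tag = block // num_lines
        if tags[line] == tag:
            hits += 1
        else:
            misses += 1
            tags[line] = tag  # the new block evicts whatever held this line
    return hits, misses


NUM_LINES = 2 ** 14  # r = 14 line bits

# Blocks 0 and 16384 both map to line 0, so alternating between them misses every time.
thrash = simulate_direct_mapped([0, NUM_LINES] * 50, NUM_LINES)
# Blocks 0 and 1 map to different lines, so only the first access to each misses.
no_thrash = simulate_direct_mapped([0, 1] * 50, NUM_LINES)
print(thrash, no_thrash)
# prints: (0, 100) (98, 2)
```

The cache has 16,383 other lines free in the thrashing case, yet every access misses; an associative organization would let both blocks stay resident.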
Random
Write through
All writes go to main memory as well as to the cache
(Typically 15% or less of memory references are
writes)
Challenges:
Multiple CPUs must monitor main memory traffic to
keep each CPU's local cache up to date
Lots of traffic may cause bottlenecks
Potentially slows down writes
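A minimal sketch of the write-through policy, counting how many writes reach main memory. It assumes an unbounded cache that allocates a line on a write (no capacity limit and no replacement), so that only the write traffic is visible; the class and attribute names are hypothetical.

```python
class WriteThroughCache:
    """Toy cache using the write-through policy.

    Every write updates main memory as well as the cache. Capacity and
    replacement are deliberately ignored (an assumption of this sketch).
    """

    def __init__(self, memory):
        self.memory = memory      # dict: block address -> value
        self.lines = {}           # cached copies: block address -> value
        self.memory_writes = 0    # number of writes that reached main memory

    def read(self, block):
        if block not in self.lines:  # miss: fetch the block from main memory
            self.lines[block] = self.memory.get(block, 0)
        return self.lines[block]

    def write(self, block, value):
        self.lines[block] = value    # update the cache...
        self.memory[block] = value   # ...and main memory, on every write
        self.memory_writes += 1


memory = {}
cache = WriteThroughCache(memory)
for i in range(10):
    cache.write(0x40, i)  # ten writes to the same block
print(cache.memory_writes, memory[0x40])
# prints: 10 9
```

Main memory is always current, which keeps other CPUs and I/O consistent, but each of the ten writes generates bus traffic even though only the last value survives.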
Write back
Updates are initially made in the cache only
(The update bit for the cache slot is set when an update occurs;
other caches holding the block become out of date and must be updated)
If a block is to be replaced, main memory is overwritten only if the
update bit is set
(15% or less of memory references are writes)
I/O must either access main memory through the cache or
update the cache
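A minimal sketch of write back with an update (dirty) bit. It assumes a tiny direct-mapped cache of 4 lines indexed by block number; the line count, class name, and entry layout are hypothetical choices for illustration.

```python
class WriteBackCache:
    """Toy direct-mapped write-back cache.

    A write updates only the cache and sets the line's update (dirty) bit.
    Main memory is written only when a dirty line is replaced.
    """

    def __init__(self, memory, num_lines):
        self.memory = memory               # dict: block address -> value
        self.num_lines = num_lines
        self.lines = [None] * num_lines    # each entry: [block, value, dirty] or None
        self.memory_writes = 0             # number of write-backs to main memory

    def _load(self, block):
        index = block % self.num_lines
        entry = self.lines[index]
        if entry is not None and entry[0] == block:
            return entry                   # hit
        if entry is not None and entry[2]:  # replacing a dirty line: write it back first
            self.memory[entry[0]] = entry[1]
            self.memory_writes += 1
        entry = [block, self.memory.get(block, 0), False]
        self.lines[index] = entry
        return entry

    def read(self, block):
        return self._load(block)[1]

    def write(self, block, value):
        entry = self._load(block)
        entry[1] = value
        entry[2] = True                    # set the update bit; memory is now stale


memory = {}
cache = WriteBackCache(memory, num_lines=4)
for i in range(10):
    cache.write(0, i)                      # ten writes, all absorbed by the cache
print(cache.memory_writes, memory.get(0))  # prints: 0 None
cache.read(4)                              # block 4 maps to line 0 and evicts dirty block 0
print(cache.memory_writes, memory.get(0))  # prints: 1 9
```

Ten writes cost one memory write, but until the eviction main memory holds a stale value, which is exactly why other caches and I/O need special handling under this policy.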
Intel Caches
80386: no on-chip cache
80486: 8 KB using 16 byte lines and a four-way set associative organization
Pentium (all versions): two on-chip L1 caches (data & instructions)
Pentium 4 L1 caches:
8 KB
64 byte lines
Four-way set associative
L2 cache
L3 cache on chip
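The cache parameters listed above determine how an address is split. This sketch computes the number of lines, the number of sets, and the widths of the byte-offset and set-index fields for a set-associative cache, assuming power-of-two sizes; the function names are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def cache_geometry(size_bytes: int, line_bytes: int, ways: int):
    """Return (lines, sets, offset_bits, set_index_bits) for a set-associative cache."""
    for value in (size_bytes, line_bytes, ways):
        if not is_power_of_two(value):
            raise ValueError("sizes are assumed to be powers of two")
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    set_index_bits = sets.bit_length() - 1     # log2 of the number of sets
    return lines, sets, offset_bits, set_index_bits


print(cache_geometry(8 * 1024, 16, 4))  # 80486: 8 KB, 16-byte lines, four-way
# prints: (512, 128, 4, 7)
print(cache_geometry(8 * 1024, 64, 4))  # 8 KB, 64-byte lines, four-way L1
# prints: (128, 32, 6, 5)
```

Moving from 16-byte to 64-byte lines at the same capacity and associativity cuts the number of sets from 128 to 32, so two more address bits go to the byte offset and two fewer to the set index.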
Intel cache evolution
Problem | Solution | Processor on which feature first appears
External memory slower than the system bus. | Add external cache using faster memory technology. | 386
Increased processor speed results in external bus becoming a bottleneck for cache access. | Move external cache on-chip, operating at the same speed as the processor. | 486
Internal cache is rather small, due to limited space on chip. | Add external L2 cache using faster technology than main memory. | 486
Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place. | Create separate data and instruction caches. | Pentium
Increased processor speed results in external bus becoming a bottleneck for L2 cache access. | Create separate back-side bus (BSB) that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. | Pentium Pro
(as above) | Move L2 cache onto the processor chip. | Pentium II
Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Add external L3 cache. | Pentium III
(as above) | Move L3 cache on chip. | Pentium 4
PowerPC G5 L1 caches
64 KB instruction cache
32 KB data cache
Cache sizes of some processors
Processor | Type | Year of introduction | Primary cache (L1) | 2nd level cache (L2) | 3rd level cache (L3)
IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | - | -
PDP-11/70 | Minicomputer | 1975 | 1 KB | - | -
VAX-11/780 | Minicomputer | 1978 | 16 KB | - | -
IBM 3033 | Mainframe | 1978 | 64 KB | - | -
IBM 3090 | Mainframe | 1985 | 128 to 256 KB | - | -
Intel 80486 | PC | 1989 | 8 KB | - | -
Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | -
PowerPC 601 | PC | 1993 | 32 KB | - | -
PowerPC 620 | PC | 1996 | 32 KB/32 KB | - | -
PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB
IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB
IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | -
Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | -
IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | -
CRAY MTA | Supercomputer | 2000 | 8 KB | 2 MB | -
Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB
SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | -
Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB
IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB
CRAY XD1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | -