Computer Organization & Architecture: Cache Memory
Chapter 4
Cache Memory
Characteristics of Computer Memory
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location of Memory
• CPU
—Registers
• Internal
—Cache, Main Memory
• External
—Accessible via I/O module
—Hard disk, Optical disk
Memory Hierarchy - Diagram
Memory Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Virtual memory (OS)
• Disk
• Optical
• Tape
Memory Capacity
• Word size
—The natural unit of organisation
—Number of bits used to represent an integer; usually also related to the instruction length
• Number of words
—or Bytes
• Word length = 8, 16, 32 bits
Unit of Transfer
• Internal
—Unit of transfer = no. of lines in data bus
—32, 64, 128, 256 bits
• External
—Usually a block which is much larger than a
word
• Addressable unit
—Smallest location which can be uniquely
addressed
—Word or block
Access Methods (1)
• Sequential
—Start at the beginning and read through in
order
—Access time depends on location of data and
previous location
—e.g. tape
• Direct
—Individual blocks have unique addresses
—Access is by jumping to vicinity plus sequential
search
—Access time depends on location and previous
location
—e.g. disk
Access Methods (2)
• Random
—Individual addresses identify locations exactly
—Access time is independent of location or
previous access
—e.g. RAM
• Associative
—Data is located by a comparison with contents
of a portion of the store
—Access time is independent of location or
previous access
—Word retrieved based on a portion of its
contents rather than its address
—e.g. cache
Performance Units
• Access time (latency)
—Time between presenting the address and
getting the valid data
• Memory Cycle time
—Time may be required for the memory to
“recover” before next access
—Cycle time = access time + recovery time
• Transfer Rate
—Rate at which data can be moved
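As a rough illustration of how these quantities combine (a minimal sketch in C; the timings and rate below are assumed values, not figures from the slides), the time to move n bytes can be estimated as access (latency) time plus n divided by the transfer rate:

#include <stdio.h>

/* Hedged sketch: total transfer time = access time + n / transfer rate.
   All numbers are illustrative assumptions. */
int main(void)
{
    double access_time_s = 50e-9;    /* 50 ns latency (assumed)        */
    double rate_bytes_s  = 1e9;      /* 1 GB/s transfer rate (assumed) */
    double n_bytes       = 64.0;     /* size of the transfer (assumed) */

    double total_s = access_time_s + n_bytes / rate_bytes_s;
    printf("total transfer time: %.0f ns\n", total_s * 1e9);
    return 0;
}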
Physical Types
• Semiconductor
—RAM
• Magnetic
—Disk & Tape
• Optical
—CD & DVD
• Others
Physical Characteristics
• Decay (e.g. DRAM requires periodic refresh circuitry)
• Volatility (requires power to retain data)
• Erasability (re-writeability)
• Power consumption
• Organization
—Physical arrangement of bits in word
The Bottom Line
• How much?
—Memory Capacity
• How fast?
—Transfer Time
• How expensive?
—Monetary cost
So you want fast?
• It is possible to build a computer which
uses only static RAM
• This would be very fast
• But it would also be extremely expensive, which is why a small, fast
cache is paired with larger, slower main memory
Cache/Main Memory Structure
Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from
main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which
block of main memory is in each cache
slot
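A minimal sketch of this read flow in C (the structure, field names, and direct-mapped placement are assumptions for illustration, not the layout of any particular cache; mapping functions are covered later):

#include <stdint.h>
#include <string.h>

#define NUM_LINES  16384   /* example: 16k cache lines */
#define BLOCK_SIZE 4       /* example: 4-byte blocks   */

struct cache_line {
    int      valid;               /* does this line hold a block?        */
    uint32_t tag;                 /* identifies which memory block it is */
    uint8_t  data[BLOCK_SIZE];    /* the cached block itself             */
};

static struct cache_line cache[NUM_LINES];

/* Read one byte: check the cache; on a miss, fetch the whole block
   from main memory into the cache, then deliver from the cache. */
uint8_t cache_read(uint32_t addr, const uint8_t *main_memory)
{
    uint32_t block  = addr / BLOCK_SIZE;
    uint32_t line   = block % NUM_LINES;    /* direct-mapped placement */
    uint32_t tag    = block / NUM_LINES;
    uint32_t offset = addr % BLOCK_SIZE;

    struct cache_line *l = &cache[line];
    if (!(l->valid && l->tag == tag)) {     /* miss: load block from memory */
        memcpy(l->data, &main_memory[block * BLOCK_SIZE], BLOCK_SIZE);
        l->tag   = tag;
        l->valid = 1;
    }
    return l->data[offset];                 /* hit path: serve from cache */
}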
Cache Read Operation - Flowchart
Typical Cache Interconnection
Cache Design Parameters
• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches
Size does matter
• Cost
—More cache is expensive
• Speed
—More cache is faster (up to a point)
—Checking cache for data takes time
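The speed benefit can be quantified with the usual two-level average access time relation (a standard formula, not stated on the slide; the example timings are assumptions):

/* Hedged sketch: average access time as a function of hit ratio.
   Hits are served from cache; misses pay the main-memory penalty. */
double effective_access_time(double hit_ratio,       /* fraction of hits    */
                             double cache_time_ns,   /* cache access time   */
                             double memory_time_ns)  /* main memory penalty */
{
    return hit_ratio * cache_time_ns
         + (1.0 - hit_ratio) * (cache_time_ns + memory_time_ns);
}
/* e.g. effective_access_time(0.95, 1.0, 50.0) = 0.95*1 + 0.05*51 = 3.5 ns */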
Comparison of Cache Sizes
Finding Cache Size on Computer
http://www.cpuid.com/downloads/cpu-z/1.57-setup-en.exe
Typical Mapping Function
• Cache of 64kByte
—Cache block of 4 bytes
—i.e. cache is 16k (2^14) lines of 4 bytes
• 16MBytes main memory
—24-bit address (2^24 = 16M)
Direct Mapping
• Each block of main memory maps to only
one cache line
—i.e. if a block is in cache, it must be in one
specific place
• Address is in two parts
• Least Significant w bits identify unique
word
• Most Significant s bits specify one memory
block
• The MSBs are split into a cache line field r
and a tag of s-r (most significant)
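For the running example (24-bit address, 4-byte blocks, 16k lines) the split is w = 2 word bits, r = 14 line bits, and s - r = 8 tag bits. A minimal sketch of the split in C (bit widths taken from the example; the sample address and macro names are my own):

#include <stdint.h>
#include <stdio.h>

#define WORD_BITS 2    /* 4-byte block -> 2 offset bits            */
#define LINE_BITS 14   /* 16k lines    -> 14 line bits             */
                       /* 24 - 14 - 2  = 8 tag bits                */

int main(void)
{
    uint32_t addr = 0x16339C;   /* example 24-bit address (arbitrary) */

    uint32_t word = addr & ((1u << WORD_BITS) - 1);
    uint32_t line = (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
    uint32_t tag  = addr >> (WORD_BITS + LINE_BITS);

    printf("tag=%u line=%u word=%u\n", tag, line, word);
    return 0;
}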
Direct Mapping
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
—If a program repeatedly accesses 2 blocks that map to the same line,
cache misses are very high (this constant swapping of blocks is called
thrashing)
Associative Mapping
• A main memory block can load into any
line of cache
• Every line’s tag is examined for a match
• Cache searching gets expensive
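A minimal sketch of the lookup this implies (a sequential loop here for illustration; real hardware compares all tags in parallel; the struct and names are assumptions):

#include <stdint.h>

struct assoc_line { int valid; uint32_t tag; };

/* Fully associative lookup: the tag is the whole block number, so
   every line's tag must be compared against it. */
int find_fully_associative(const struct assoc_line *lines, int num_lines,
                           uint32_t block_number)
{
    for (int i = 0; i < num_lines; i++)
        if (lines[i].valid && lines[i].tag == block_number)
            return i;      /* hit: index of matching line */
    return -1;             /* miss */
}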
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given
set
—e.g. Block B can be in any line of set i
• e.g. 2 lines per set
—2 way associative mapping
—A given block can be in one of 2 lines in only
one set
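A minimal sketch of 2-way placement under these rules (set count and field handling are assumptions consistent with the earlier 16k-line example):

#include <stdint.h>

#define WAYS     2         /* 2-way set associative (example)     */
#define NUM_SETS 8192      /* e.g. 16k lines / 2 ways = 8k sets   */

struct sa_line { int valid; uint32_t tag; };

/* A block maps to exactly one set (block number mod NUM_SETS) but may
   occupy either line (way) within that set. */
int find_set_associative(struct sa_line lines[NUM_SETS][WAYS],
                         uint32_t block_number)
{
    uint32_t set = block_number % NUM_SETS;
    uint32_t tag = block_number / NUM_SETS;

    for (int way = 0; way < WAYS; way++)
        if (lines[set][way].valid && lines[set][way].tag == tag)
            return way;    /* hit in this way of the set */
    return -1;             /* miss: a victim must be chosen within the set */
}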
Cache Hit Ratio & L2 Cache Size
Cache Misses & Associativity
Replacement Algorithms (1): Direct mapping
• No choice
• Each block only maps to one line
• Replace that line
Replacement Algorithms (2)
Associative & Set Associative
• Hardware implemented algorithm (fast)
• Least Recently used (LRU)
—e.g. in 2 way set associative
—Which of the 2 blocks is LRU?
• First in first out (FIFO)
—replace block that has been in cache longest
• Least frequently used
—replace block which has fewest hits
• Others
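For 2-way set associative, LRU needs only one bit per set, which is why it is cheap to implement in hardware. A minimal sketch pairing with the 2-way layout above (names and layout are assumptions):

#include <stdint.h>

#define NUM_SETS 8192      /* matches the 2-way example above */

/* One LRU bit per set: it names the way that was used least recently,
   i.e. the victim to replace on the next miss in that set. */
static uint8_t lru_way[NUM_SETS];

void lru_touch(uint32_t set, int way_used)   /* call on every access/hit     */
{
    lru_way[set] = (uint8_t)(1 - way_used);  /* the other way is now the LRU */
}

int lru_victim(uint32_t set)                 /* call when a miss needs a line */
{
    return lru_way[set];
}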
Write Policy
• Must not overwrite a cache block unless
main memory is up to date
• Multiple CPUs may have individual caches
• I/O may address main memory directly
Write Through Policy
• All writes go to main memory as well as
cache
• Multiple CPUs can monitor main memory
traffic to keep local cache up to date
• Lots of traffic
• Slows down writes
Write Back Policy
• Updates are initially made in cache only
• Update bit for cache slot is set when
update occurs
• If block is to be replaced, write to main
memory only if update bit is set
• I/O must access main memory through
cache
• 15% of memory references are writes
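A minimal sketch of the write-back bookkeeping (field names are assumptions; a write-through cache would instead forward every store to main memory immediately):

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4

struct wb_line {
    int      valid;
    int      dirty;               /* the "update bit" from the slide      */
    uint32_t block_number;        /* which memory block this line holds   */
    uint8_t  data[BLOCK_SIZE];
};

/* Write back: a store only updates the cache and sets the dirty bit. */
void wb_write_byte(struct wb_line *l, uint32_t offset, uint8_t value)
{
    l->data[offset] = value;
    l->dirty = 1;
}

/* On eviction, the block is copied to main memory only if it was updated. */
void wb_evict(struct wb_line *l, uint8_t *main_memory)
{
    if (l->valid && l->dirty)
        memcpy(&main_memory[l->block_number * BLOCK_SIZE], l->data, BLOCK_SIZE);
    l->valid = 0;
    l->dirty = 0;
}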
Intel Cache Evolution
Pentium 4 Cache
• Pentium (all versions) – two on chip L1 caches
— Data & instructions
• Pentium III – L3 cache added off chip
• Pentium 4
— L1 caches
– 8k bytes
– four way set associative
— L2 cache
– Feeding both L1 caches
– 256k
– 8 way set associative
— L3 cache on chip
Pentium 4 Block Diagram
Pentium 4 Core Processor
• Fetch/Decode Unit
— Fetches instructions from L2 cache
— Decode into micro-ops
— Store micro-ops in L1 cache
• Out of order execution logic
— Schedules micro-ops
— Based on data dependence and resources
— May speculatively execute
• Execution units
— Execute micro-ops
— Data from L1 cache
— Results in registers
• Memory subsystem
— L2 cache and systems bus
Intel Core i7 Block Diagram
IBM PowerPC Cache Organization
• 601 – single 32kB, 8 way set associative
• 603 – 16kB (2 x 8kB), two way set
associative
• 604 – 32kB
• 620 – 64kB
• G3 & G4
—64kB L1 cache
– 8 way set associative
—256kB, 512kB or 1MB L2 cache
– two way set associative
• G5
—32kB instruction cache
—64kB data cache
Questions ???
Virtual Memory
• An Operating System construct