Computer Organization & Architecture: Cache Memory


Computer Organization & Architecture

Chapter 4
Cache Memory
Characteristics of Computer Memory
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location of Memory
• CPU
—Registers
• Internal
—Cache, Main Memory
• External
—Accessible via I/O module
—Hard disk, Optical disk
Memory Hierarchy - Diagram
Memory Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Virtual memory (OS)
• Disk
• Optical
• Tape
Memory Capacity
• Word size
—The natural unit of organisation
—Number of bits used to represent an integer; often
related to the instruction length
• Number of words
—or number of bytes
• Typical word lengths: 8, 16, or 32 bits
Unit of Transfer
• Internal
—Unit of transfer = no. of lines in data bus
—32, 64, 128, 256 bits
• External
—Usually a block which is much larger than a
word
• Addressable unit
—Smallest location which can be uniquely
addressed
—Word or block
Access Methods (1)
• Sequential
—Start at the beginning and read through in
order
—Access time depends on location of data and
previous location
—e.g. tape
• Direct
—Individual blocks have unique addresses
—Access is by jumping to vicinity plus sequential
search
—Access time depends on location and previous
location
—e.g. disk
Access Methods (2)
• Random
—Individual addresses identify locations exactly
—Access time is independent of location or
previous access
—e.g. RAM
• Associative
—Data is located by a comparison with contents
of a portion of the store
—Access time is independent of location or
previous access
—Word retrieved based on a portion of its
contents rather than its address
—e.g. cache
Performance Units
• Access time (latency)
—Time between presenting the address and
getting the valid data
• Memory Cycle time
—Time may be required for the memory to
“recover” before next access
—Cycle time = access time + recovery time
• Transfer Rate
—Rate at which data can be moved
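These units combine into one common figure of merit for a memory hierarchy, the average memory access time. This is not on the slide itself; the formula and numbers below are a standard illustration, and the values are hypothetical.

```python
# Average memory access time (AMAT): the cache hit time plus the
# extra penalty paid on the fraction of accesses that miss.
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """AMAT = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical numbers: 1 ns cache hit, 5% miss rate, 60 ns miss penalty.
print(amat(1.0, 0.05, 60.0))  # -> 4.0
```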
Physical Types
• Semiconductor
—RAM
• Magnetic
—Disk & Tape
• Optical
—CD & DVD
• Others
Physical Characteristics
• Decay (dynamic RAM requires refresh circuitry)
• Volatility (volatile memory needs power to retain data)
• Erasability (re-writeability)
• Power consumption

• Organization
—Physical arrangement of bits in word
The Bottom Line
• How much?
—Memory Capacity
• How fast?
—Transfer Time
• How expensive?
—Monetary cost
So you want fast?
• It is possible to build a computer which
uses only static RAM
• This would be very fast
• This would need no cache
• But it would be very expensive
Locality of Reference
• During the execution of a program,
memory references tend to cluster
—Sequential execution
—Consecutive instructions
—Repetitive access to variables
—Loops
• If one instruction is being executed,
then it is very likely that nearby
instructions will also be executed
—Fetch block of instructions rather than a single
instruction
Spatial & Temporal Locality
• Spatial Locality
—Tendency of execution to involve a number of
memory locations that are clustered
—Sequential execution of instructions
—Sequential access to data values e.g. from a
table
• Temporal Locality
—Tendency of a processor to access memory
locations that have been used recently
—Loop execution
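The two kinds of locality can be seen in how a program walks a 2-D array. This sketch is illustrative, not from the slides: row-major traversal touches consecutive addresses (spatial locality), while column-major traversal jumps a whole row at a time.

```python
# Spatial locality sketch: a Python list-of-lists stands in for a 2-D
# array stored row by row in memory.
ROWS, COLS = 4, 4
matrix = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]

# Row-major order visits addresses 0, 1, 2, 3, ... (good spatial locality)
row_major = [matrix[r][c] for r in range(ROWS) for c in range(COLS)]

# Column-major order visits 0, 4, 8, 12, ... (poor spatial locality)
col_major = [matrix[r][c] for c in range(COLS) for r in range(ROWS)]

print(row_major[:5])  # [0, 1, 2, 3, 4]
print(col_major[:5])  # [0, 4, 8, 12, 1]
```

A loop that re-reads the same few variables on every iteration shows temporal locality in the same way: the cache keeps those words close to the CPU.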
Cache
• Small amount of fast memory
• Sits between CPU and main memory
• May be located on CPU chip
Cache Hierarchy
• L1 Cache -> Closest to the CPU
• L2 Cache -> Next
• L3 Cache -> Farthest from CPU
Cache Hierarchy: On-Chip Cache
Instruction Cache (iCache) vs. Data Cache (dCache)
• Instruction cache: holds fetched instructions
• Data cache: holds data operands
Cache/Main Memory Structure

[email protected] 23
Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from
main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which
block of main memory is in each cache
slot
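The lookup sequence above can be sketched in a few lines. This is a toy model, not real hardware: a dictionary keyed by block number plays the role of the tagged cache slots, and the block size is an assumed value.

```python
# Minimal sketch of the cache read sequence: check the cache first,
# and on a miss fetch the whole containing block from main memory.
BLOCK_SIZE = 4                                   # bytes per block (assumed)
main_memory = {a: a % 256 for a in range(64)}    # toy memory contents
cache = {}                                       # block number -> block bytes

def read(addr):
    block = addr // BLOCK_SIZE
    if block not in cache:                       # miss: load block from memory
        base = block * BLOCK_SIZE
        cache[block] = [main_memory[base + i] for i in range(BLOCK_SIZE)]
    return cache[block][addr % BLOCK_SIZE]       # deliver word from cache

print(read(10))   # miss: loads block 2, then serves the word
print(read(11))   # hit: block 2 is already cached
```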
Cache Read Operation - Flowchart
Typical Cache Interconnection
Cache Design Parameters
• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches

Size does matter
• Cost
—More cache is expensive
• Speed
—More cache is faster (up to a point)
—Checking cache for data takes time

Comparison of Cache Sizes

Finding Cache Size on Computer

http://www.cpuid.com/downloads/cpu-z/1.57-setup-en.exe
Typical Mapping Function
• Cache of 64 kByte
—Cache block of 4 bytes
—i.e. cache is 16k (2^14) lines of 4 bytes
• 16 MBytes main memory
—24-bit address
—(2^24 = 16M)

Direct Mapping
• Each block of main memory maps to only
one cache line
—i.e. if a block is in cache, it must be in one
specific place
• Address is in two parts
• Least Significant w bits identify unique
word
• Most Significant s bits specify one memory
block
• The MSBs are split into a cache line field r
and a tag of s-r (most significant)
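For the example cache above (64 kB cache, 4-byte lines, 24-bit addresses) the split is w = 2 word bits, r = 14 line bits, and a tag of s - r = 22 - 14 = 8 bits. A small sketch of extracting the three fields with shifts and masks (the sample address is arbitrary):

```python
# Field widths for the 64 kB / 4-byte-line / 24-bit-address example:
W = 2     # word bits (4 bytes per line)
R = 14    # line bits (16k lines)
          # tag = 24 - R - W = 8 bits

def split_address(addr):
    word = addr & ((1 << W) - 1)           # low 2 bits: byte within the line
    line = (addr >> W) & ((1 << R) - 1)    # next 14 bits: cache line number
    tag = addr >> (W + R)                  # top 8 bits: tag stored in that line
    return tag, line, word

print(split_address(0xFFFFFC))  # -> (255, 16383, 0)
```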
Direct Mapping

Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
—If a program repeatedly accesses 2 blocks that
map to the same line, the miss rate is very
high (this low hit ratio is called thrashing)

Associative Mapping
• A main memory block can load into any
line of cache
• Every line’s tag is examined for a match
• Cache searching gets expensive

Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given
set
—e.g. Block B can be in any line of set i
• e.g. 2 lines per set
—2 way associative mapping
—A given block can be in one of 2 lines in only
one set
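Reusing the 64 kB / 4-byte-line geometry from earlier as an assumed example: with 2 lines per set, the 16k lines form 8k sets, and a block maps to set (block number mod 8192). A minimal sketch:

```python
# 2-way set-associative geometry for the running 64 kB example (assumed).
LINES = 16 * 1024
WAYS = 2
SETS = LINES // WAYS          # 8192 sets

def set_index(block_number):
    # The block may occupy either of the 2 lines in this one set.
    return block_number % SETS

print(set_index(0))      # set 0
print(set_index(8192))   # also set 0: blocks 0 and 8192 compete for set 0
```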

Cache Hit Ratio & L2 Cache Size

Cache Misses & Associativity

Replacement Algorithms (1): Direct mapping
• No choice
• Each block only maps to one line
• Replace that line

Replacement Algorithms (2)
Associative & Set Associative
• Hardware-implemented algorithms (for speed)
• Least recently used (LRU)
—e.g. in 2-way set associative:
—which of the 2 blocks was used least recently?
• First in first out (FIFO)
—Replace the block that has been in cache longest
• Least frequently used (LFU)
—Replace the block with the fewest hits
• Others
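LRU behaviour can be modelled in software with an ordered map that tracks recency of use (real 2-way hardware just keeps a single use bit per set). This is an illustrative sketch, not the hardware mechanism:

```python
# Software model of one LRU cache set: an OrderedDict keeps the lines
# ordered from least to most recently used.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()          # block -> data, oldest first

    def access(self, block):
        if block in self.lines:             # hit: mark most recently used
            self.lines.move_to_end(block)
            return "hit"
        if len(self.lines) >= self.ways:    # set full: evict the LRU line
            self.lines.popitem(last=False)
        self.lines[block] = None            # load the new block
        return "miss"

s = LRUSet(ways=2)
print([s.access(b) for b in (1, 2, 1, 3, 2)])
# -> ['miss', 'miss', 'hit', 'miss', 'miss']
```

The fourth access (block 3) evicts block 2, not block 1, because block 1 was touched more recently; that is exactly the LRU choice.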

Write Policy
• Must not overwrite a cache block unless
main memory is up to date
• Multiple CPUs may have individual caches
• I/O may address main memory directly

Write Through Policy
• All writes go to main memory as well as
cache
• Multiple CPUs can monitor main memory
traffic to keep local cache up to date
• Lots of traffic
• Slows down writes
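The policy above can be sketched in two lines of logic; this toy model (dictionaries standing in for the cache and main memory) just shows that both copies are updated on every write:

```python
# Write-through sketch: every write updates the cache AND main memory,
# so memory is always up to date, at the cost of write traffic.
cache = {}
main_memory = {}

def write_through(block, data):
    cache[block] = data         # update the cache copy
    main_memory[block] = data   # ...and main memory at the same time

write_through(3, "value")
print(cache[3], main_memory[3])   # both hold "value"
```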

Write Back Policy
• Updates are initially made in cache only
• Update bit for cache slot is set when
update occurs
• If block is to be replaced, write to main
memory only if update bit is set
• I/O must access main memory through
cache
• 15% of memory references are writes
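By contrast, write-back defers the memory update until eviction, using the update (dirty) bit described above. Another toy sketch under the same assumptions as before:

```python
# Write-back sketch: writes set a dirty (update) bit; a block is copied
# to main memory only if that bit is set when the block is evicted.
cache = {}          # block -> (data, dirty)
main_memory = {}

def write(block, data):
    cache[block] = (data, True)     # update cache only; mark block dirty

def evict(block):
    data, dirty = cache.pop(block)
    if dirty:                       # write back only modified blocks
        main_memory[block] = data

write(7, "new value")               # main memory is now stale...
evict(7)                            # ...until the dirty block is evicted
print(main_memory)                  # {7: 'new value'}
```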

Intel Cache Evolution

Pentium 4 Cache
• Pentium (all versions) – two on chip L1 caches
— Data & instructions
• Pentium III – L3 cache added off chip
• Pentium 4
— L1 caches
– 8k bytes
– four way set associative
— L2 cache
– Feeding both L1 caches
– 256k
– 8 way set associative
— L3 cache on chip

Pentium 4 Block Diagram

Pentium 4 Core Processor
• Fetch/Decode Unit
— Fetches instructions from L2 cache
— Decode into micro-ops
— Store micro-ops in L1 cache
• Out of order execution logic
— Schedules micro-ops
— Based on data dependence and resources
— May speculatively execute
• Execution units
— Execute micro-ops
— Data from L1 cache
— Results in registers
• Memory subsystem
— L2 cache and systems bus
Intel Core i7 Block Diagram

IBM PowerPC Cache Organization
• 601 – single 32 kB, 8-way set associative
• 603 – 16 kB (2 x 8 kB), two-way set
associative
• 604 – 32 kB
• 620 – 64 kB
• G3 & G4
—64 kB L1 cache
– 8-way set associative
—256 kB, 512 kB or 1 MB L2 cache
– two-way set associative
• G5
—32 kB instruction cache
—64 kB data cache
Questions ???

Virtual Memory
• An Operating System construct
• Virtual memory combines your computer’s
RAM with temporary space on your hard
disk
• When RAM runs low, virtual memory
moves data from RAM to a space called
a paging file

Virtual Memory
