Fundamentals
Cache memories and the principle of locality

[Figure: CPU (CU, ALU, Clock, Registers, Cache) connected to Memory over the I/O BUS]
Definition and Cache Concepts
Cache: Definition
cache (kash), n.
• A hiding place used especially for storing provisions.
• A place for concealment and safekeeping, as of
valuables.
• The store of goods or valuables concealed in a hiding
place.
• Computer Science. A fast storage buffer in the central
processing unit (CPU) of a computer. In this sense, also
called cache memory.
Cache Memory
Small, fast SRAM-based memory managed
automatically in hardware (i.e., not software)
• Located on the CPU
• Holds frequently accessed blocks of main memory (RAM)

[Figure: CPU (CU, ALU, Clock, Registers, Cache) exchanging block data with the RAM controller over the BUS]
General Cache Concept
Main memory is partitioned into equal-size chunks called blocks
• Not physically partitioned
• A block is a contiguous range of physical address locations
For example,
• RAM partitioned into N blocks (numbered 0 to N-1)

[Figure: RAM controller in front of main memory divided into blocks 0 through N-1]
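A minimal C sketch of the idea, under assumed parameters (a 256-byte RAM and 64-byte blocks, both hypothetical): each block is just an equal-size, contiguous range of addresses.

#include <stdio.h>

#define MEM_SIZE   256  /* hypothetical RAM size in bytes */
#define BLOCK_SIZE 64   /* assumed block size in bytes    */

int main(void) {
    int n_blocks = MEM_SIZE / BLOCK_SIZE;  /* N blocks */

    /* Block k covers the contiguous address range
       [k * BLOCK_SIZE, (k + 1) * BLOCK_SIZE - 1]. */
    for (int k = 0; k < n_blocks; k++) {
        printf("block %d: addresses %d .. %d\n",
               k, k * BLOCK_SIZE, (k + 1) * BLOCK_SIZE - 1);
    }
    return 0;
}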
Block Read Operation
Basic steps (for a block not in the cache):
• CPU sends the RAM controller the start address of the block
• RAM controller puts a copy of the block on the BUS
• CPU controller reads the BUS and puts the copy of the block in the cache

[Figure: a copy of block 3 travels from RAM, over the BUS, into the cache]
Block Write Operation
Basic steps:
• CPU controller puts a copy of the block (in the cache) on the BUS
• RAM controller reads the BUS and replaces the block with the copy

[Figure: a copy of block 3 travels from the cache, over the BUS, back into RAM]
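A minimal C sketch of both transfers, treating RAM and a single cache line as plain byte arrays (the names ram, cache_line, and the 4-byte block size are illustrative assumptions, not any real controller interface):

#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4   /* assumed block size in bytes  */
#define N_BLOCKS   16  /* assumed number of RAM blocks */

static unsigned char ram[N_BLOCKS * BLOCK_SIZE];  /* stand-in for main memory */
static unsigned char cache_line[BLOCK_SIZE];      /* one cache line           */

/* Block read: copy block k from RAM into the cache line. */
static void block_read(int k) {
    memcpy(cache_line, &ram[k * BLOCK_SIZE], BLOCK_SIZE);
}

/* Block write: copy the cache line back over block k in RAM. */
static void block_write(int k) {
    memcpy(&ram[k * BLOCK_SIZE], cache_line, BLOCK_SIZE);
}

int main(void) {
    ram[3 * BLOCK_SIZE] = 0xB1;  /* first byte of block 3 */
    block_read(3);               /* RAM -> cache          */
    cache_line[1] = 0xFF;        /* CPU modifies the copy */
    block_write(3);              /* cache -> RAM          */
    printf("0x%02X\n", ram[3 * BLOCK_SIZE + 1]);  /* prints 0xFF */
    return 0;
}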
Main Memory
Block Partitioning
Memory Partition: Blocks
Simple example:
• memory address bits = 4
• block size = 4 bytes
• each block spans 4 consecutive addresses, e.g., 1100 to 1111

[Table: Address (binary) 0000-1111, 1 byte of storage per address, grouped into four 4-byte blocks]
Block Offset Bits
Block address bits are split into block-number bits and block offset (b) bits
• e.g., block 00 spans addresses 00 00 to 00 11
• the offset bits select a byte within the block, e.g., the byte at offset 01
• block size = 2^b bytes
• total number of blocks = 2^m / 2^b = 2^(m-b)

[Table: Address (binary) 0000-1111 with the two offset bits separated, 1 byte of storage per address, grouped into four 4-byte blocks]
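A small C sketch of the bit arithmetic, using the slide's parameters (m = 4 address bits, b = 2 offset bits); the shift/mask style is one common way to do it, not the only one:

#include <stdio.h>

int main(void) {
    unsigned m = 4, b = 2;       /* address bits and block offset bits  */
    unsigned addr = 0x9;         /* binary 1001                         */

    unsigned offset   = addr & ((1u << b) - 1);  /* low b bits  -> 01   */
    unsigned block    = addr >> b;               /* high bits   -> 10   */
    unsigned n_blocks = 1u << (m - b);           /* 2^m / 2^b = 2^(m-b) */

    printf("address 0x%X -> block %u, offset %u (of %u blocks)\n",
           addr, block, offset, n_blocks);
    return 0;
}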
Cache Mapping
Block Placement Algorithms
Cache Mapping Algos
Three types:
• Fully associative
• Direct mapping
• Set associative
Fully Associative (FA)
Important concepts:
1. Block data could be anywhere in the cache
2. Flexible block storage strategy
3. Expensive to evict and replace a block
• Requires a block replacement algorithm
FA Read Example
memory address (m) bits = 4
Block size = 4 bytes
Tag bits (t) = 2 (msb)
Block offset (b) bits = 2 (lsb)

Main memory (Address -> Data, 1 byte each):
0000-0011: 0xA1 0xA2 0xA3 0xA4
0100-0111: 0xB1 0xB2 0xB3 0xB4
1000-1011: 0xC1 0xC2 0xC3 0xC4
1100-1111: 0xD1 0xD2 0xD3 0xD4

CPU: Load data instruction, put data in register $8
• Search the tags -> cache miss
• Oh snap, cache is full!!
• Must evict a valid line (i.e., invalidate it) and replace it with the new block data!

Cache contents:
line  valid  tag (t)  block offset (b): 00 01 10 11
 0      1      01     0xB1 0xB2 0xB3 0xB4
 1      1      11     0xD1 0xD2 0xD3 0xD4
 2      1      00     0xA1 0xA2 0xA3 0xA4
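A C sketch of a fully associative lookup over the cache state above; every tag must be compared because the block could be in any line. The struct layout and function name are illustrative assumptions.

#include <stdio.h>
#include <stdbool.h>

#define N_LINES    3
#define BLOCK_SIZE 4

struct line {
    bool          valid;
    unsigned      tag;
    unsigned char data[BLOCK_SIZE];
};

/* Fully associative lookup: compare the tag against EVERY line. */
static bool fa_lookup(const struct line cache[], unsigned tag,
                      unsigned offset, unsigned char *out) {
    for (int i = 0; i < N_LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            *out = cache[i].data[offset];  /* cache hit  */
            return true;
        }
    }
    return false;                          /* cache miss */
}

int main(void) {
    /* Cache state from the example: tags 01, 11, 00 are resident. */
    struct line cache[N_LINES] = {
        { true, 0x1, {0xB1, 0xB2, 0xB3, 0xB4} },
        { true, 0x3, {0xD1, 0xD2, 0xD3, 0xD4} },
        { true, 0x0, {0xA1, 0xA2, 0xA3, 0xA4} },
    };
    unsigned char byte;
    if (!fa_lookup(cache, 0x2, 0x1, &byte))  /* tag 10 is not resident */
        printf("miss: the cache is full, so a line must be evicted\n");
    return 0;
}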
FA Replacement Algorithm
When the cache is full and a line must be evicted, how do we
pick which line to replace?
• LRU (Least-recently used)
• replaces the line that has gone UNACCESSED the LONGEST
• favors the most recently accessed data
• FIFO/LRR (first-in, first-out/least-recently replaced)
• replaces the OLDEST line in cache
• favors recently loaded items over older STALE items
• Random
• replace some line at RANDOM
• no favoritism – uniform distribution
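A tiny C sketch of LRU victim selection, assuming one last-access timestamp per line (the timestamps here are made-up example values):

#include <stdio.h>

#define N_LINES 3

/* One last-access timestamp per cache line, updated on every access. */
static unsigned long last_access[N_LINES] = { 40, 17, 25 };  /* example values */

/* LRU: evict the line that has gone unaccessed the longest,
   i.e., the one with the smallest timestamp.                 */
static int lru_victim(void) {
    int victim = 0;
    for (int i = 1; i < N_LINES; i++)
        if (last_access[i] < last_access[victim])
            victim = i;
    return victim;
}

int main(void) {
    printf("evict line %d\n", lru_victim());  /* line 1 in this example */
    return 0;
}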
Direct Mapping (DM)
Important concepts:
1. Line bits determine the exact location of the block
data in cache.
2. Fairly rigid storage strategy (see 1 above)
3. Simple to evict and replace a block (see 1 above)
• No block replacement algorithm
DM Read Example
memory address (m) bits = 4
Block size = 4 bytes
Tag bits (t) = 1 (msb)
Line bits (s) = 1
Block offset (b) bits = 2 (lsb)
Valid bit: 0 = invalid

Two-line cache design:
Line (s)  valid  tag (t)  block offset (b): 00 01 10 11
   0        0
   1        0

Main memory (Address -> Data, 1 byte each):
0000-0011: 0xA1 0xA2 0xA3 0xA4
0100-0111: 0xB1 0xB2 0xB3 0xB4
1000-1011: 0xC1 0xC2 0xC3 0xC4
1100-1111: 0xD1 0xD2 0xD3 0xD4
DM Read Example (Cont.)
CPU: Load data instruction, put the data at address 0111 in register $8
• Go to line s = 1 and check the tag -> cache miss (valid = 0)
• Put a copy of the block at line s = 1 in the cache
• Set valid = 1
• Set the tag bit = 0
• Put 0xB4 in register $8

Cache contents after the miss:
Line (s)  valid  tag (t)  block offset (b): 00 01 10 11
   0        0
   1        1       0     0xB1 0xB2 0xB3 0xB4
DM Read Example (Cont.)
CPU: Load data instruction, put the data at address 0101 in register $9
• Go to line s = 1 and check the tag (0) -> cache hit
• Put 0xB2 in register $9
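A C sketch of the whole direct-mapped load path for the example above (m = 4, t = 1, s = 1, b = 2); the line index comes straight from the address bits, so only one tag comparison is needed. Names such as dm_load are illustrative.

#include <stdio.h>
#include <stdbool.h>

#define N_LINES    2
#define BLOCK_SIZE 4

struct line {
    bool          valid;
    unsigned      tag;
    unsigned char data[BLOCK_SIZE];
};

static struct line cache[N_LINES];
static unsigned char ram[16] = {
    0xA1, 0xA2, 0xA3, 0xA4, 0xB1, 0xB2, 0xB3, 0xB4,
    0xC1, 0xC2, 0xC3, 0xC4, 0xD1, 0xD2, 0xD3, 0xD4,
};

/* Direct-mapped load: line index s comes straight from the address,
   so only the tag of that one line has to be checked.               */
static unsigned char dm_load(unsigned addr) {
    unsigned offset = addr & 0x3;         /* low 2 bits (b)      */
    unsigned s      = (addr >> 2) & 0x1;  /* 1 line bit (s)      */
    unsigned tag    = addr >> 3;          /* 1 tag bit (t, msb)  */

    if (!cache[s].valid || cache[s].tag != tag) {
        /* Miss: copy the whole block from RAM into line s. */
        for (int i = 0; i < BLOCK_SIZE; i++)
            cache[s].data[i] = ram[(addr & ~0x3u) + i];
        cache[s].valid = true;
        cache[s].tag   = tag;
    }
    return cache[s].data[offset];
}

int main(void) {
    printf("0x%02X\n", dm_load(0x7));  /* address 0111 -> miss, returns 0xB4 */
    printf("0x%02X\n", dm_load(0x5));  /* address 0101 -> hit,  returns 0xB2 */
    return 0;
}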
Cache: Bytes, Shorts, and Words
In general, the size of a block in physical memory is
one (or more) words!
• Never put a single short or byte from DRAM into the cache
• Instead, put the entire word (much more efficient!)
• This is why memory alignment is important!
Once the word is in cache, the byte or short can be accessed through hardware operations:
• i.e., bit masking and shifting
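A short C sketch of extracting a byte or short from a word that is already cached, using mask-and-shift as the hardware does; the little-endian byte numbering is an assumption.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Assume the whole 32-bit word is already in the cache. */
    uint32_t word = 0xD4D3D2D1;

    /* Select a byte or short with shifting and masking,
       mirroring what the hardware does on a byte/short load. */
    uint8_t  byte2  = (word >> (2 * 8)) & 0xFF;  /* byte 2  -> 0xD3   */
    uint16_t short0 = word & 0xFFFF;             /* short 0 -> 0xD2D1 */

    printf("byte 2 = 0x%02X, short 0 = 0x%04X\n", byte2, short0);
    return 0;
}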
Principle of Locality

[Figure: memory reference trace, address (program, data, stack regions) vs. time, showing block organization]
Principle of Locality
Def: Programs tend to use data and instructions in memory that have addresses near or equal to those they have used recently!

Data references:
• Reference array elements in succession (stride-1 reference pattern) -> Spatial locality
• Reference the variable sum each iteration -> Temporal locality
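A classic C loop that shows both kinds of locality in the data references: the array is read with stride-1 (spatial locality) and sum is reused every iteration (temporal locality).

#include <stdio.h>

#define N 16

int main(void) {
    int a[N];
    for (int i = 0; i < N; i++)
        a[i] = i;

    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += a[i];  /* a[i]: stride-1 accesses  -> spatial locality   */
                      /* sum : reused each iteration -> temporal locality */

    printf("sum = %d\n", sum);
    return 0;
}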