IT3030E CA Chap6 Memory
❑ Flash memory
5,000ns – 50,000ns, $0.75 – $1 per GB
❑ Magnetic disk
5,000,000ns – 20,000,000ns, $0.05 – $0.1 per GB
❑ Fact:
Large memories are slow
Fast memories are small (and expensive)
[Figure: Memory hierarchy. On-chip components: Control, Datapath, RegFile, ITLB, DTLB, and first-level instruction and data caches. Off chip: second-level cache (SRAM), main memory (DRAM), secondary memory (disk)]
[Figure: CPU, cache, and main memory. Blocks of data move between the cache and main memory on instruction fetches and memory reads/writes]
❑ Direct mapped
Each memory block is mapped to exactly one block in the cache
- many lower-level (memory) blocks must share a block in the cache
Address mapping (to answer Q2):
(block address) modulo (# of blocks in the cache)
The tag field: stored with each cache block; holds the upper portion of the
address, which identifies the memory block held there (to answer Q1)
The valid bit: indicates whether the block currently holds valid data
(a small mapping sketch follows below)
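As an illustration of the mapping and tag fields, the following minimal C sketch splits a byte address into tag, index, and byte-offset fields for a direct mapped cache. The 1024 one-word blocks match the cache in the figure that follows, but the concrete address is just an assumed example.

```c
/* Sketch: splitting a byte address for a direct mapped cache.
 * Assumes 1024 one-word (4-byte) blocks, giving a 2-bit byte offset,
 * 10-bit index, and 20-bit tag; the example address is arbitrary. */
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS  1024u   /* blocks in the cache       */
#define BLOCK_BYTES 4u      /* one 32-bit word per block */

int main(void) {
    uint32_t addr = 0x12345678u;                  /* example byte address  */

    uint32_t block_addr = addr / BLOCK_BYTES;     /* word (block) address  */
    uint32_t index = block_addr % NUM_BLOCKS;     /* Q2: which cache block */
    uint32_t tag   = block_addr / NUM_BLOCKS;     /* Q1: value to compare  */

    printf("tag=0x%05x index=%u byte offset=%u\n",
           (unsigned)tag, (unsigned)index, (unsigned)(addr % BLOCK_BYTES));
    /* Hit condition: valid[index] == 1 && tag_array[index] == tag */
    return 0;
}
```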
Example: main memory word reference string 0 1 2 3 4 3 4 15
[Figure: Direct mapped cache with 1024 one-word entries. The address supplies a 20-bit Tag and a 10-bit Index; each entry holds a Valid bit, a Tag, and 32-bit Data; a tag comparison on the indexed entry produces the Hit signal]
[Worked example: reference string 0 1 2 3 4 3 4 15 on a direct mapped cache with two-word blocks: 8 requests, 4 misses]
[Figure: Miss rate (%) vs. block size for cache sizes 8 KB, 16 KB, 64 KB, and 256 KB]
❑ Capacity:
Cache cannot contain all blocks accessed by the program
Solution: increase cache size (may increase access time)
❑ Conflict (collision):
Multiple memory locations mapped to the same cache location
Solution 1: increase cache size
Solution 2: increase associativity (may increase access time)
Example: reference string 0 4 0 4 0 4 0 4 on a direct mapped cache: 8 requests, 8 misses
❑ Ping pong effect due to conflict misses - two memory
locations that map into the same cache block
Set Associative Cache Example
[Figure: Two-way set associative cache with 2 sets (Way 0/1, Set 0/1; each entry has a Valid bit, Tag, and Data) and a main memory of sixteen one-word blocks at addresses 0000xx-1111xx; the two low-order address bits select the byte within the 32-bit word]
Q2: How do we find it? Use the next low-order memory address bit to determine the cache set (i.e., block address modulo the number of sets in the cache).
Q1: Is it there? Compare all the cache tags in the set to the high-order 3 memory address bits to tell if the memory block is in the cache.
Another Reference String Mapping
❑ Consider the main memory word reference string 0 4 0 4 0 4 0 4
Start with an empty cache, all blocks initially marked as not valid
On the two-way set associative cache: 8 requests, 2 misses, so the ping pong conflict misses of the direct mapped cache are eliminated (a small simulation reproducing both miss counts follows below)
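The sketch below replays the reference string 0 4 0 4 0 4 0 4 against a small direct mapped cache and a two-set, two-way cache with LRU replacement. The four-block geometry is an assumption chosen so that words 0 and 4 conflict, matching the worked examples above; the function names are illustrative.

```c
/* Sketch: replay the reference string 0 4 0 4 0 4 0 4 against
 *   (a) a direct mapped cache with 4 one-word blocks, and
 *   (b) a 2-way set associative cache with 2 sets (LRU replacement).
 * The cache geometry is an assumed small example. */
#include <stdio.h>

#define REFS 8
static const int refs[REFS] = {0, 4, 0, 4, 0, 4, 0, 4};

static int direct_mapped_misses(void) {
    int tag[4], valid[4] = {0}, misses = 0;
    for (int i = 0; i < REFS; i++) {
        int idx = refs[i] % 4, t = refs[i] / 4;
        if (!valid[idx] || tag[idx] != t) {        /* miss: fill the block */
            valid[idx] = 1; tag[idx] = t; misses++;
        }
    }
    return misses;
}

static int two_way_misses(void) {
    int tag[2][2], valid[2][2] = {{0}}, lru[2] = {0}, misses = 0;
    for (int i = 0; i < REFS; i++) {
        int set = refs[i] % 2, t = refs[i] / 2, hit = 0;
        for (int w = 0; w < 2; w++)
            if (valid[set][w] && tag[set][w] == t) { hit = 1; lru[set] = !w; }
        if (!hit) {                                 /* miss: replace LRU way */
            int w = lru[set];
            valid[set][w] = 1; tag[set][w] = t;
            lru[set] = !w; misses++;
        }
    }
    return misses;
}

int main(void) {
    printf("direct mapped: %d misses\n", direct_mapped_misses()); /* 8 */
    printf("2-way LRU:     %d misses\n", two_way_misses());       /* 2 */
    return 0;
}
```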
[Figure: Four-way set associative cache. The address supplies a 22-bit Tag and an 8-bit Index selecting one of 256 sets; each set has four ways (Way 0 to Way 3), each with a Valid bit, Tag, and 32-bit Data; the four tag comparisons drive a 4-to-1 select that produces the Hit signal and the Data output]
Range of Set Associative Caches
❑ For a fixed-size cache, increasing the number of blocks per set (the associativity) decreases the number of sets
Address fields: Tag (used for tag compare), Index (selects the set), Block offset (selects the word in the block), Byte offset
Decreasing associativity: direct mapped (only one way) has smaller tags and needs only a single comparator
Increasing associativity: fully associative (only one set) has a tag that is all the bits except the block and byte offset
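To make this trade-off concrete, here is a small sketch that prints the address field widths for each associativity of a fixed-size cache. The 4 KB capacity, 4-byte blocks, and 32-bit addresses are assumptions chosen only for illustration.

```c
/* Sketch: tag/index/offset widths for a fixed-size cache as associativity
 * grows. The 4 KB size, 4-byte blocks, and 32-bit addresses are assumed. */
#include <stdio.h>

static int log2i(int x) { int b = 0; while (x > 1) { x >>= 1; b++; } return b; }

int main(void) {
    const int cache_bytes = 4096, block_bytes = 4, addr_bits = 32;
    int blocks = cache_bytes / block_bytes;            /* 1024 blocks total */

    for (int ways = 1; ways <= blocks; ways *= 2) {
        int sets        = blocks / ways;               /* fully assoc: 1 set */
        int offset_bits = log2i(block_bytes);
        int index_bits  = log2i(sets);
        int tag_bits    = addr_bits - index_bits - offset_bits;
        printf("%5d-way: sets=%5d  index=%2d bits  tag=%2d bits\n",
               ways, sets, index_bits, tag_bits);
    }
    return 0;
}
```

Each doubling of the associativity halves the number of sets, so one bit moves from the index into the tag; at one extreme (1-way) the index is largest, at the other (fully associative) the index disappears and the tag is everything except the offsets.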
[Figure: Miss rate vs. associativity (1-way, 2-way, 4-way, 8-way) for cache sizes 32 KB, 64 KB, 128 KB, 256 KB, and 512 KB. Data from Hennessy & Patterson, Computer Architecture, 2003]
[Worked example: cache contents with a 'last used' bit per set for LRU replacement, annotated per reference as 0 miss, 0 hit, 0 hit, 4 hit]
❑ A new L2 is added
Access time from L1 to L2 is 5 ns.
Instruction miss rate (to main memory) reduced to 0.5%.
[Figure: CPU, L1 cache, L2 cache, and main memory (RAM)]
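Only the 5 ns L1-to-L2 access time and the 0.5% miss rate to main memory are given above; the sketch below fills in the remaining parameters (base CPI of 1, a 4 GHz clock, a 2% L1 miss rate, and a 100 ns main memory access time) as assumptions, purely to show the shape of the CPI calculation.

```c
/* Sketch: effect of adding an L2 cache on effective CPI.
 * Only the 5 ns L2 access time and the 0.5% miss rate to main memory come
 * from the slides; base CPI, clock rate, L1 miss rate, and the 100 ns main
 * memory access time are assumptions for illustration. */
#include <stdio.h>

int main(void) {
    const double clock_ghz    = 4.0;    /* assumed: 4 GHz -> 0.25 ns/cycle */
    const double base_cpi     = 1.0;    /* assumed: ideal CPI              */
    const double l1_miss_rate = 0.02;   /* assumed: 2% misses in L1        */
    const double mem_ns       = 100.0;  /* assumed: main memory access     */
    const double l2_ns        = 5.0;    /* from the slides                 */
    const double l2_miss_rate = 0.005;  /* from the slides: 0.5% to memory */

    double mem_cycles = mem_ns * clock_ghz;   /* 400 cycles */
    double l2_cycles  = l2_ns  * clock_ghz;   /*  20 cycles */

    double cpi_l1_only = base_cpi + l1_miss_rate * mem_cycles;
    double cpi_with_l2 = base_cpi + l1_miss_rate * l2_cycles
                                  + l2_miss_rate * mem_cycles;

    printf("CPI without L2: %.1f\n", cpi_l1_only);            /* 9.0 */
    printf("CPI with L2:    %.1f\n", cpi_with_l2);            /* 3.4 */
    printf("Speedup:        %.2fx\n", cpi_l1_only / cpi_with_l2);
    return 0;
}
```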
Do programs need to be allocated in contiguous physical pages?
Example: CPU with 32-bit addresses, but the computer has only 1 GB of physical memory
Address Translation
❑ CPU accesses a memory location based on virtual
address: Virtual page number + Page offset
❑ If the virtual page number can be translated to a physical page number
(hit) → the memory access proceeds normally.
❑ Otherwise (miss): page fault → a very expensive operation
A new physical page is allocated for the running process
- If no free physical page is available, move an “old” page to disk to
make space for the new page ➔ page replacement
The content of the new page is then loaded from disk
❑ Page present:
Handler copies the PTE from memory into the TLB
Then restarts the instruction (a minimal translation sketch follows below)
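A minimal sketch of the translation and page-fault path described above, assuming 4 KB pages, 32-bit virtual addresses, and a one-level page table; page_table, pte_t, and page_fault_handler are hypothetical names for illustration, not interfaces given in the slides, and the TLB is omitted for brevity.

```c
/* Sketch: virtual-to-physical address translation with a one-level page
 * table. Assumes 4 KB pages and 32-bit virtual addresses; page_table,
 * pte_t, and page_fault_handler are hypothetical names for illustration. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_BITS  12u                        /* 4 KB pages          */
#define NUM_VPAGES (1u << (32 - PAGE_BITS))   /* 2^20 virtual pages  */

typedef struct {
    uint32_t ppn;      /* physical page number                            */
    bool present;      /* valid bit: page is currently in physical memory */
} pte_t;

static pte_t page_table[NUM_VPAGES];          /* per-process page table   */

/* Hypothetical stub: on a page fault, allocate a physical page (a real OS
 * may first evict an "old" page to disk) and load the page's content from
 * disk. Here it just hands out the next free physical page number.       */
static uint32_t page_fault_handler(uint32_t vpn) {
    static uint32_t next_free_ppn = 0;
    (void)vpn;
    return next_free_ppn++;
}

static uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;                /* virtual page no. */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1u);  /* page offset      */

    if (!page_table[vpn].present) {                      /* miss: page fault */
        page_table[vpn].ppn     = page_fault_handler(vpn);
        page_table[vpn].present = true;
    }
    return (page_table[vpn].ppn << PAGE_BITS) | offset;  /* physical address */
}

int main(void) {
    uint32_t pa = translate(0x00403ABCu);   /* first access triggers a fault */
    printf("virtual 0x00403ABC -> physical 0x%08X\n", (unsigned)pa);
    return 0;
}
```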
❑ Sharing data
The OS creates a page table entry for a virtual page of one process that
points to the physical page of another process.
Write protection: use the write protection bit in the page table entry (see the sketch below).
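A small sketch of how a write-protection bit in the PTE guards a shared page, extending the earlier PTE sketch with a writable field; the field and function names are illustrative assumptions.

```c
/* Sketch: a page table entry with a write-protection bit, and the check a
 * store access would perform. Field names are illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    uint32_t ppn;       /* physical page number                      */
    bool present;       /* page is in physical memory                */
    bool writable;      /* write-protection bit: 0 = read-only share */
} pte_t;

/* Returns true if the access is allowed; a real MMU would raise a
 * protection fault (trap to the OS) when a store hits a read-only page. */
static bool access_ok(const pte_t *pte, bool is_store) {
    return pte->present && (!is_store || pte->writable);
}

int main(void) {
    /* Two processes share physical page 42; only one may write it. */
    pte_t proc_a = { .ppn = 42, .present = true, .writable = true  };
    pte_t proc_b = { .ppn = 42, .present = true, .writable = false };

    printf("A store: %s\n", access_ok(&proc_a, true) ? "ok" : "fault");
    printf("B store: %s\n", access_ok(&proc_b, true) ? "ok" : "fault");
    return 0;
}
```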
❑ Virtual memory
Address translation
TLB
Protection