Chapter 5
COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
Large and Fast: Exploiting Memory Hierarchy
§5.1 Introduction
Principle of Locality
■ Programs access a small proportion of their address space at any time
■ Temporal locality
  ■ Items accessed recently are likely to be accessed again soon
  ■ e.g., instructions in a loop, induction variables
■ Spatial locality
  ■ Items near those accessed recently are likely to be accessed soon
  ■ e.g., sequential instruction access, array data (both kinds are illustrated in the sketch below)
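As a concrete illustration (a minimal sketch added here, not taken from the slides), the loop below shows both kinds of locality: the loop instructions and the variables i and sum are reused on every iteration (temporal), while a[i] walks through consecutive addresses (spatial).

```c
#include <stdio.h>

int main(void) {
    int a[1024];

    for (int i = 0; i < 1024; i++)   /* sequential writes: spatial locality */
        a[i] = i;

    long sum = 0;
    for (int i = 0; i < 1024; i++)   /* i and sum reused every iteration: temporal locality */
        sum += a[i];                 /* a[i] read from consecutive addresses: spatial locality */

    printf("sum = %ld\n", sum);
    return 0;
}
```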
Taking Advantage of Locality
■ Memory hierarchy
  ■ Store everything on disk
  ■ Copy recently accessed (and nearby) items from disk to smaller DRAM memory
    ■ Main memory
  ■ Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
    ■ Cache memory attached to CPU
■ How do we know if the data is present?
■ Where do we look?
■ #Blocks is a power of 2
■ Use low-order address bits (see the index sketch below)
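A minimal sketch (added, not from the slides) of how the low-order bits of the block address select a direct-mapped cache entry; the 8-block size matches the example tables that follow.

```c
#include <stdio.h>

#define NUM_BLOCKS 8   /* number of blocks must be a power of 2 */

/* Index = block address modulo #blocks; with a power-of-2 block count,
 * that is just the low-order log2(NUM_BLOCKS) address bits. */
unsigned cache_index(unsigned block_addr) {
    return block_addr & (NUM_BLOCKS - 1);
}

int main(void) {
    unsigned addrs[] = {22, 26, 16, 3, 18};
    for (unsigned i = 0; i < sizeof addrs / sizeof addrs[0]; i++)
        printf("block address %2u -> index %u\n", addrs[i], cache_index(addrs[i]));
    return 0;
}
```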
Word address access sequence: 22, 26, 22, 26, 16, 3, 16, 18, 16
(8-block direct-mapped cache, one word per block; index = low-order 3 bits of the word address)

After accessing word address 22 (10110, index 110, tag 10):

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

After also accessing word address 26 (11010, index 010, tag 11):

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

After the remaining accesses 16 (10000), 3 (00011), and 18 (10010); 18 maps to index 010, so its block replaces the one holding Mem[11010]:

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]   (replaced 11 / Mem[11010])
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
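The short simulation below is an added sketch (not from the original slides) that replays the same access sequence through an 8-entry direct-mapped cache and reports each hit or miss plus the final contents; tags print in decimal here (e.g., binary 10 = 2).

```c
#include <stdio.h>
#include <stdbool.h>

#define NUM_BLOCKS 8

/* One entry of a direct-mapped cache holding one word per block. */
struct line { bool valid; unsigned tag; unsigned word_addr; };

int main(void) {
    struct line cache[NUM_BLOCKS] = {0};
    unsigned seq[] = {22, 26, 22, 26, 16, 3, 16, 18, 16};

    for (unsigned i = 0; i < sizeof seq / sizeof seq[0]; i++) {
        unsigned addr  = seq[i];
        unsigned index = addr & (NUM_BLOCKS - 1);  /* low-order 3 bits */
        unsigned tag   = addr >> 3;                /* remaining bits   */
        bool hit = cache[index].valid && cache[index].tag == tag;
        printf("access %2u -> index %u: %s\n", addr, index, hit ? "hit" : "miss");
        if (!hit) {                                /* allocate on miss */
            cache[index].valid = true;
            cache[index].tag = tag;
            cache[index].word_addr = addr;
        }
    }

    puts("\nFinal contents:");
    for (unsigned i = 0; i < NUM_BLOCKS; i++)
        if (cache[i].valid)
            printf("index %u: tag %u, Mem[word %u]\n", i, cache[i].tag, cache[i].word_addr);
    return 0;
}
```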
■ 64 blocks × 16 bytes/block
■ Block offset: 4 bits
■ Index: 6 bits (block mapping worked through in the sketch below)
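Given the 64-block, 16-bytes-per-block geometry above, the added sketch below shows how a byte address splits into block number, cache index, and block offset (the address 1200 is just an illustrative value, not taken from this extract).

```c
#include <stdio.h>

#define BLOCK_BYTES 16   /* 16 bytes/block -> 4 offset bits */
#define NUM_BLOCKS  64   /* 64 blocks      -> 6 index bits  */

int main(void) {
    unsigned byte_addr  = 1200;                        /* illustrative address */
    unsigned block_addr = byte_addr / BLOCK_BYTES;     /* which memory block   */
    unsigned index      = block_addr % NUM_BLOCKS;     /* low-order 6 bits     */
    unsigned offset     = byte_addr % BLOCK_BYTES;     /* low-order 4 bits     */

    printf("byte address %u -> block address %u, cache index %u, offset %u\n",
           byte_addr, block_addr, index, offset);
    return 0;
}
```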
[Figure residue: miss-cycle calculation shown separately for data accesses and for instruction accesses; the previously computed effective CPI was 5.44]
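The 5.44 figure matches the standard cache-performance calculation in this chapter. The sketch below reconstructs it under assumed parameters (base CPI of 2, 100-cycle miss penalty, 2% instruction-cache and 4% data-cache miss rates, 36% of instructions accessing data); these values are assumptions for illustration, not stated in this extract.

```c
#include <stdio.h>

int main(void) {
    /* Assumed parameters (not given in this extract): */
    double base_cpi      = 2.0;   /* ideal CPI with a perfect cache            */
    double miss_penalty  = 100;   /* cycles per miss                           */
    double i_miss_rate   = 0.02;  /* instruction-cache miss rate               */
    double d_miss_rate   = 0.04;  /* data-cache miss rate                      */
    double mem_per_instr = 0.36;  /* fraction of instructions accessing data   */

    double i_miss_cycles = i_miss_rate * miss_penalty;                 /* 2.00 */
    double d_miss_cycles = mem_per_instr * d_miss_rate * miss_penalty; /* 1.44 */
    double cpi = base_cpi + i_miss_cycles + d_miss_cycles;             /* 5.44 */

    printf("miss cycles/instr: I = %.2f, D = %.2f\n", i_miss_cycles, d_miss_cycles);
    printf("effective CPI     = %.2f\n", cpi);
    return 0;
}
```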
■ 4-way set-associative cache organization
■ An alternate implementation: remove the multiplexor and use enable signals to select which way drives the data output
■ Direct Mapped:
  ■ 4096 (2^12) one-block sets => 12-bit index
  ■ Tag = 28 - 12 = 16 bits
  ■ Total tag storage = 4096 entries × 16 bits = 64 Kbits (checked in the sketch below)
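As a quick check of the arithmetic above, the added sketch below computes the index width and total tag storage for a direct-mapped cache with 4096 blocks and a 28-bit block address.

```c
#include <stdio.h>

int main(void) {
    unsigned block_addr_bits = 28;    /* e.g., 32-bit byte address minus 4 offset bits (assumption) */
    unsigned num_blocks      = 4096;  /* direct mapped: one block per set                           */

    unsigned index_bits = 0;
    while ((1u << index_bits) < num_blocks)
        index_bits++;                                  /* log2(4096) = 12      */

    unsigned tag_bits  = block_addr_bits - index_bits; /* 28 - 12 = 16         */
    unsigned total_tag = num_blocks * tag_bits;        /* 4096 * 16 = 65536    */

    printf("index = %u bits, tag = %u bits, total tag storage = %u Kbits\n",
           index_bits, tag_bits, total_tag / 1024);
    return 0;
}
```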
■ Service interruption: deviation from the specified service
■ What is the size of the page table?
  ■ 2^20 (= 1M) entries
  ■ Each entry needs 19 bits, but entries are usually 32 bits wide (total size computed in the sketch below)
[Figure residue: address field widths of 52 bits and 12 bits]
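The total follows directly from the numbers above: 2^20 entries, each allocated 32 bits (4 bytes). A tiny added sketch of the arithmetic:

```c
#include <stdio.h>

int main(void) {
    unsigned long entries     = 1ul << 20;  /* 2^20 = 1M page table entries      */
    unsigned long entry_bytes = 4;          /* 19 bits needed, but 32 bits (4 B) */
                                            /* allocated per entry in practice   */
    unsigned long table_bytes = entries * entry_bytes;

    printf("page table size = %lu bytes (= %lu MiB)\n",
           table_bytes, table_bytes >> 20);
    return 0;
}
```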
■ Stack: one segment grows from the highest address down
■ Heap: the other segment grows from the lowest address up
■ So the address space is divided into two segments
■ The high-order bit of an address usually determines which segment, i.e., which page table to use for that address
■ So each segment can be as large as one half of the address space
■ A limit register for each segment specifies the current size of the segment, which grows in units of pages (see the sketch below)
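A minimal added sketch (structure and function names are hypothetical, with assumed 4 KiB pages and a 32-bit address) of using the high-order address bit to choose between the two per-segment page tables and checking the segment's limit register:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                               /* assumed 4 KiB pages */
#define HALF_SPACE_PAGES (1u << (31 - PAGE_SHIFT))  /* pages per segment   */

/* Hypothetical per-segment state: its own page table plus a limit register
 * holding the segment's current size in pages. */
struct segment {
    const uint32_t *page_table;  /* indexed by the page's position within the segment */
    uint32_t        limit_pages; /* current segment size, grown in units of pages     */
};

/* seg[0]: heap side, grows up from address 0.
 * seg[1]: stack side, grows down from the highest address. */
bool translate(const struct segment seg[2], uint32_t vaddr, uint32_t *pte_out)
{
    unsigned which = vaddr >> 31;                      /* high-order bit picks the segment */
    uint32_t vpn   = (vaddr & 0x7FFFFFFFu) >> PAGE_SHIFT;

    /* Distance of this page from the end the segment grows from. */
    uint32_t page_in_segment = (which == 1) ? (HALF_SPACE_PAGES - 1 - vpn) : vpn;

    if (page_in_segment >= seg[which].limit_pages)
        return false;                                  /* beyond the segment's limit register */

    *pte_out = seg[which].page_table[page_in_segment]; /* index the chosen page table */
    return true;
}

int main(void) {
    uint32_t heap_pt[4]  = {1, 2, 3, 4};               /* toy page tables */
    uint32_t stack_pt[4] = {9, 8, 7, 6};
    struct segment seg[2] = { {heap_pt, 4}, {stack_pt, 4} };

    uint32_t pte;
    printf("heap  access ok: %d\n", translate(seg, 0x00002000u, &pte));  /* page 2 of heap    */
    printf("stack access ok: %d\n", translate(seg, 0xFFFFF000u, &pte));  /* topmost stack page */
    printf("out of segment : %d\n", translate(seg, 0x00400000u, &pte));  /* beyond heap limit  */
    return 0;
}
```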
■ Cache: 256 blocks × 16 words/block
Example:
■ P2 wants to let P1 access one of its pages
■ P2 asks the OS to create a page table entry for a virtual page in P1’s address space that points to the same physical page that P2 wants to share
■ Any bits that determine the access rights for a page must be included in both the page table and the TLB, because the page table is accessed only on a TLB miss (see the PTE sketch below)
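A minimal added sketch (the field layout is an assumption, not the book's exact format) of a page table entry whose access-right bits must also be carried in each TLB entry:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical page table entry layout: a physical page number plus the
 * status and protection bits discussed in this chapter. */
struct pte {
    uint32_t ppn      : 20;  /* physical page number                   */
    uint32_t valid    : 1;   /* page is present in main memory         */
    uint32_t dirty    : 1;   /* page has been written                  */
    uint32_t ref      : 1;   /* referenced bit (for replacement)       */
    uint32_t writable : 1;   /* access right: writes allowed           */
};

/* A TLB entry must cache the same rights bits, because the page table
 * itself is consulted only on a TLB miss. */
struct tlb_entry {
    uint32_t   vpn;          /* virtual page number (the TLB tag)      */
    struct pte pte;          /* copy of the PTE, including rights bits */
    bool       entry_valid;
};

int main(void) {
    /* P2 shares a physical page with P1: the OS writes a PTE in P1's page
     * table pointing at the shared physical page, here made read-only. */
    struct pte shared = { .ppn = 0x12345, .valid = 1, .writable = 0 };

    /* On P1's first access, a TLB miss copies the PTE, rights included. */
    struct tlb_entry tlb = { .vpn = 0x00042, .pte = shared, .entry_valid = true };

    printf("TLB entry: vpn=0x%05x ppn=0x%05x writable=%u\n",
           tlb.vpn, (unsigned)tlb.pte.ppn, (unsigned)tlb.pte.writable);
    return 0;
}
```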
■ When the OS switches control to another process, after the switch the cache will contain data from the running process
■ Hardware caches
  ■ Reduce comparisons to reduce cost
■ Virtual memory
  ■ Full table lookup makes full associativity feasible
  ■ Benefit in reduced miss rate
Address fields: bits 31-10 = Tag (18 bits), bits 9-4 = Index (10 bits), bits 3-0 = Offset (4 bits)
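A minimal added sketch of splitting a 32-bit address into these tag, index, and offset fields with shifts and masks (the example address is arbitrary):

```c
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 4    /* bits 3-0  */
#define INDEX_BITS  10   /* bits 9-4  */
                         /* bits 31-10 form the 18-bit tag */

int main(void) {
    uint32_t addr = 0x12345678u;  /* illustrative address */

    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("addr 0x%08x -> tag 0x%05x, index %u, offset %u\n",
           addr, tag, index, offset);
    return 0;
}
```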
Cache controller interface signals (widths in bits):

Signal        CPU <-> Cache    Cache <-> Memory
Read/Write          1                 1
Valid               1                 1
Address            32                32
Write Data         32               128
Read Data          32               128
Ready               1                 1
■ Multiple cycles per access
■ Could partition into separate states to reduce the clock cycle time (a controller sketch follows)
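A minimal added sketch of a blocking cache controller finite-state machine with the four states commonly used with this interface (Idle, Compare Tag, Write-Back, Allocate); the condition inputs are stubbed so the printed trace shows one access taking multiple cycles.

```c
#include <stdio.h>
#include <stdbool.h>

/* States of a simple blocking cache controller; each access may pass
 * through several states, i.e. multiple cycles per access. */
enum state { IDLE, COMPARE_TAG, WRITE_BACK, ALLOCATE };

static const char *name[] = { "Idle", "CompareTag", "WriteBack", "Allocate" };

/* Stubbed condition inputs for the sketch: one request that misses on a
 * clean block, so the controller goes Idle -> CompareTag -> Allocate ->
 * CompareTag -> Idle.  A real controller would read these from the
 * Valid / tag-compare / dirty / Ready signals of the interface above. */
static bool cpu_request    = true;   /* CPU has a pending access        */
static bool hit_after_fill = false;  /* tag matches only after Allocate */
static bool dirty          = false;  /* victim block is clean           */
static bool mem_ready      = true;   /* memory answers every cycle here */

static enum state next_state(enum state s) {
    switch (s) {
    case IDLE:
        return cpu_request ? COMPARE_TAG : IDLE;
    case COMPARE_TAG:
        if (hit_after_fill) {         /* hit: access completes           */
            cpu_request = false;
            return IDLE;
        }
        return dirty ? WRITE_BACK : ALLOCATE;
    case WRITE_BACK:
        return mem_ready ? ALLOCATE : WRITE_BACK;
    case ALLOCATE:
        if (mem_ready) {              /* block fetched; tag now matches  */
            hit_after_fill = true;
            return COMPARE_TAG;
        }
        return ALLOCATE;
    }
    return IDLE;
}

int main(void) {
    enum state s = IDLE;
    for (int cycle = 0; cycle < 6; cycle++) {   /* a few clock edges */
        printf("cycle %d: %s\n", cycle, name[s]);
        s = next_state(s);
    }
    return 0;
}
```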