Address Translation Wrap-up
Demand Paging

Arvind Krishnamurthy
Spring 2001


TLB & Caches

- TLB: contains the most recent address translations
  - A table of (virtual address, physical address) pairs
  - Usually fully associative
  - Small: 16 to 32 entries
  - Access time <= one cycle
- Cache: contains the values of recently accessed memory cells
  - Usually direct mapped (or very small associativity)
  - Direct mapped → the entry to check is (address % cache size)
- Virtually addressed caches:
  - Suffer from synonym problems
    - Multiple cache entries could be caching the same physical location
  - Cache entry: the data plus the physical address from which it was obtained
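A minimal sketch of the TLB just described: a small, fully associative table of (virtual page, physical page) pairs. The table size and entry layout are illustrative assumptions; real hardware searches all entries in parallel, which the linear scan below only stands in for.

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 32          /* "small: 16 to 32" */

    typedef struct {
        uint32_t vpage, ppage;
        bool     valid;
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Returns true and sets *ppage on a hit. */
    bool tlb_lookup(uint32_t vpage, uint32_t *ppage) {
        for (int i = 0; i < TLB_ENTRIES; i++) {   /* parallel in HW */
            if (tlb[i].valid && tlb[i].vpage == vpage) {
                *ppage = tlb[i].ppage;
                return true;
            }
        }
        return false;
    }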

Overlapped TLB & Cache Access

- Sequential access: slow
  - [Diagram: CPU → virtual address → translation → physical address → cache; data returned on a hit, main memory consulted on a miss]
- Overlapped access:
  - Virtual address → TLB → physical address
  - VPage # | offset  →  PPage # | offset
  - Direct mapped cache: use the low address bits to index the cache
  - Therefore, if cache size <= page size, overlapped access is possible
  - Or if we guarantee that translation does not modify the low address bits even when cache size > page size (requires more complex page allocation)
  - (A sketch of the overlapped index computation follows the next slide.)

Hardware-controlled TLB

- On a TLB miss:
  - Hardware loads the PTE into the TLB
    - Need to write back if there is no free entry
  - Hardware generates a page fault if the page containing the PTE is invalid
  - VM software performs the fault handling
  - Restart the CPU
- On a TLB hit, hardware checks the valid bit:
  - If valid, the entry points to the page in memory
  - If invalid, the hardware generates a page fault
    - Perform page fault handling
    - Restart the faulting instruction
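A minimal sketch of why the overlap works, assuming 4 KB pages and a 4 KB direct-mapped cache with 32-byte lines (illustrative sizes, not from the slides): the cache index falls entirely within the page offset, so it is identical in the virtual and physical address, and the cache can be indexed before translation completes.

    #include <stdint.h>

    #define PAGE_SIZE  4096u
    #define CACHE_SIZE 4096u       /* cache size <= page size */
    #define LINE_SIZE    32u

    /* Index computed from the virtual address; because these bits lie
       inside the page offset, translation never changes them. */
    static inline uint32_t cache_index(uint32_t vaddr) {
        return (vaddr % CACHE_SIZE) / LINE_SIZE;
    }

The TLB lookup proceeds in parallel; the physical page number it produces is then compared against the tag stored in the selected cache line to confirm the hit.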

Software-controlled TLB

- On a TLB miss, fault into OS software
- On a TLB hit, hardware checks the valid bit
- Hardware approach:
  - Efficient
  - Inflexible
- Software approach:
  - Inefficient
  - Flexible

Summary

- Physical memory:
  - no protection
  - limited size
  - almost forces contiguous allocation
  - sharing visible to the program
  - easy to share data
- Virtual memory:
  - each program isolated from the others
  - transparent: can't tell where it is running
  - can share code, data
  - non-contiguous allocation
- [Diagram: gcc and emacs sharing one physical memory vs. each gcc process in its own virtual address space]
- Today: the illusion of infinite memory

Why partial residency?

- Assumptions we made to simplify things:
  - All of a process's data is in memory
  - Load the entire process into memory before it can run
- Problems?
  - wasteful of space (a process doesn't use all of its memory)
  - limits multiprogramming
  - slow (especially with big processes)
- Solution: partial residency
  - demand paging: only bring in pages actually used
  - paging: only keep frequently used pages in memory
- Mechanism:
  - use virtual memory to map some addresses to physical pages, some to disk

Expanding physical memory

- Virtual addresses are translated to:
  - Physical memory ($1/meg): very fast, but small
  - Disk ($.01/meg): very large, but very slow (millis vs. nanos)
- Disk also holds persistent data
- [Diagram: page table mapping virtual pages to physical memory and to disk]

Making disk-sized memory run faster

- Want: disk-sized memory that's as fast as physical memory
- 90/10 rule: 10% of memory gets 90% of memory references
  - so, keep that 10% in real memory and the other 90% on disk
  - how to pick which 10%? (look at past references)
- [Figure: number of references vs. memory address, showing a small heavily referenced region]

Demand paging mechanism

- Enhancement: the page table has a "present" (valid) bit
  - if present, the entry points to a page frame in memory
  - if not present, go to disk (the entry holds a disk block number)
- Hardware traps to the OS on a reference to a missing page
  (in MIPS/Nachos, trap on TLB miss; the OS checks the page table valid bit)
- OS software then:
  - chooses an old page to replace
  - if the old page has been modified, writes its contents back to disk
  - changes its page table entry and TLB entry
  - loads the new page into memory from disk
  - updates the page table entry
  - continues the thread
- All of this is transparent; the OS can run another job in the meantime!
- [Diagram: disk and physical memory]
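A minimal sketch of the OS steps listed above. The helper names (pick_victim, frame_owner, disk_read, disk_write, tlb_flush_entry) and the PTE layout are hypothetical, not a real kernel API.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t frame;       /* frame number, valid when present */
        uint32_t disk_block;  /* backing location on disk */
        bool     present;     /* the "present" (valid) bit */
        bool     modified;    /* dirty bit */
    } pte_t;

    /* Hypothetical helpers, assumed for the sketch: */
    uint32_t pick_victim(void);                      /* replacement policy */
    pte_t   *frame_owner(uint32_t frame);            /* PTE holding a frame */
    void     disk_read(uint32_t block, uint32_t frame);
    void     disk_write(uint32_t block, uint32_t frame);
    void     tlb_flush_entry(pte_t *pte);

    void handle_missing_page(pte_t *pte) {
        uint32_t frame  = pick_victim();             /* choose an old page */
        pte_t   *victim = frame_owner(frame);

        if (victim->modified)                        /* write back if dirty */
            disk_write(victim->disk_block, frame);
        victim->present = false;                     /* update victim's PTE */
        tlb_flush_entry(victim);                     /* ...and its TLB entry */

        disk_read(pte->disk_block, frame);           /* load the new page */
        pte->frame    = frame;                       /* update the PTE */
        pte->present  = true;
        pte->modified = false;
        /* return from the trap: the faulting thread continues */
    }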

Main problems

- How to resume a process after a fault?
  - need to save state and resume
  - the process might have been in the middle of an instruction!
- What to fetch?
  - just the needed page, or more?
- What to eject?
  - physical memory is always too small; which page to replace?
  - may need to write the evicted page back to disk
- How many pages for each process?
  - what to do when there is not enough memory?
  - how to deal with thrashing?

Problem: resuming after a fault

- The fault might have happened in the middle of an instruction!
- [Diagram: user program executes "add r1, r2, r3" then "move (sp)++, r2"; the OS handles the fault: allocate page, read from disk, set mapping, resume]
- Key constraint: we don't want the user process to be aware that a page fault happened (just like context switching)
- Can we skip the faulting instruction? No.
- Can we restart the instruction from the beginning?
  - Not if it has partial side effects.
- Can we inspect the instruction to figure out what to do?
  - It may be ambiguous how far it had gotten.

Solution: hardware support

- RISC machines are pretty simple:
  - instructions tend to have one memory reference and one side effect
  - thus, we only need the faulting address and the faulting PC
- Example: MIPS
  - 0xffdcc: add r1, r2, r3
  - 0xffdd0: ld r1, 0(sp)   ← Fault: epc = 0xffdd0, bad va = 0x0ef80
  - fault handler: load the page, then jump back to 0xffdd0
- CISC is harder:
  - multiple memory references and side effects; interpret the instruction?

Deciding what page(s) to fetch

- Page selection: when to bring pages into memory
- Like all caches, we need to know the future
  - Doesn't the user know? Not reliably.
  - And how would we communicate that to the OS?
- Easy load-time hack: demand paging
  - Load initial page(s). Run. Load others on fault: ld init pages, ld page, ld page, ...
  - When will startup be slower? When is memory less utilized?
- Most systems do some variant of this
  - Tweak: pre-paging. Fetch a page and its neighbors.
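A sketch of the restart logic, assuming the hardware latches the faulting PC and bad virtual address as MIPS does; trap_frame_t and map_missing_page are hypothetical names.

    #include <stdint.h>

    typedef struct {
        uint32_t epc;    /* PC of the faulting instruction (e.g., 0xffdd0) */
        uint32_t badva;  /* faulting virtual address (e.g., 0x0ef80) */
    } trap_frame_t;

    void map_missing_page(uint32_t vpage);   /* hypothetical fault handler */

    void page_fault_trap(trap_frame_t *tf) {
        map_missing_page(tf->badva >> 12);   /* assuming 4 KB pages */
        /* Return to tf->epc. Because a RISC instruction has at most one
           memory reference and no partial side effects, re-executing it
           from the beginning is safe. */
    }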

Deciding what page to eject

- Find some page in memory and swap it out
- Goal: the minimum number of page faults
  - page fault rate 0 ≤ p ≤ 1.0
    - if p = 0, there are no page faults
    - if p = 1, every reference is a fault
- Effective memory access time (EAT):

    EAT = (1 - p) × memory access time
        + p × (page fault overhead
               + swap page out      // often hidden by a context switch
               + swap page in       // often hidden by a context switch
               + restart overhead)

- Example: memory access takes 1 microsecond; a swap costs 10 milliseconds

Page replacement algorithms

- Basic algorithms:
  - Random (sometimes used for TLBs)
  - FIFO
  - Optimal (MIN)
  - LRU
  - LRU approximations (Clock, FIFO extensions, etc.)
- Goal: few page faults, but cheap and simple to support
- The examples below use the memory reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 and assume 3 physical pages
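A worked instance of the EAT formula with the slide's numbers; the fault rate p = 0.0002 is an assumed illustration, not from the slides.

    #include <stdio.h>

    int main(void) {
        double mem_access = 1e-6;    /* 1 microsecond */
        double fault_cost = 10e-3;   /* 10 milliseconds per fault */
        double p = 0.0002;           /* 1 fault per 5,000 references */

        double eat = (1 - p) * mem_access + p * fault_cost;
        printf("EAT = %.2f microseconds\n", eat * 1e6);  /* ~3.00 us */
        return 0;
    }

Even one fault per 5,000 references triples the effective access time, which is why the replacement policy matters so much.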

First-In-First-Out (FIFO)

- [Diagram: FIFO queue 5, 3, 4, 7, 9, 11, 2, 1, 15; recently loaded pages enter at one end, the oldest page exits ("page out") at the other]
- Algorithm:
  - Throw out the oldest page
- Pros:
  - Low-overhead implementation
- Cons:
  - May replace heavily used pages

Optimal or MIN

- Algorithm:
  - Replace the page that won't be used for the longest time
- Pros:
  - Minimal page faults
  - This is an off-line algorithm, useful for performance analysis
- Cons:
  - No on-line implementation
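A small, self-contained simulation of FIFO replacement on the reference string above; fifo_faults is a sketch name. Its output also reproduces the counts in the Belady's-anomaly slide below.

    #include <stdio.h>

    int fifo_faults(const int *refs, int n, int frames) {
        int mem[16];                    /* resident pages, frames <= 16 */
        int next = 0, used = 0, faults = 0;

        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < used; j++)
                if (mem[j] == refs[i]) { hit = 1; break; }
            if (!hit) {
                faults++;
                if (used < frames)      /* fill a free frame first */
                    mem[used++] = refs[i];
                else {                  /* evict the oldest page */
                    mem[next] = refs[i];
                    next = (next + 1) % frames;
                }
            }
        }
        return faults;
    }

    int main(void) {
        int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        printf("3 frames: %d faults\n", fifo_faults(refs, 12, 3));  /* 9  */
        printf("4 frames: %d faults\n", fifo_faults(refs, 12, 4));  /* 10 */
        return 0;
    }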

Least Recently Used (LRU)

- Algorithm:
  - Replace the page that hasn't been used for the longest time
- [Diagram: list 5, 3, 4, 7, 9, 11, 2, 1, 15, ordered from most recently used to least recently used]
- Question:
  - What hardware mechanisms are required to implement LRU?

Implementing LRU

- Perfect LRU:
  - Use a timestamp on each reference
  - Keep a list of pages ordered by time of reference
- Is this practical?
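A minimal sketch of the "perfect" timestamp scheme. Note that lru_touch would have to run on every memory reference, which is exactly why this is impractical in hardware and mainly useful in simulation.

    #include <stdint.h>

    #define NUM_FRAMES 3

    static uint64_t last_use[NUM_FRAMES];   /* timestamp per frame */
    static uint64_t now;                    /* logical clock */

    void lru_touch(int frame) { last_use[frame] = ++now; }

    int lru_victim(void) {
        int victim = 0;
        for (int f = 1; f < NUM_FRAMES; f++)
            if (last_use[f] < last_use[victim])
                victim = f;                 /* oldest timestamp loses */
        return victim;
    }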

More page frames → fewer faults?

- Consider the following reference string with 4 page frames, FIFO replacement:
  - 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  - 10 page faults
- Consider the same reference string with 3 page frames, FIFO replacement:
  - 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
  - 9 page faults!
- This is called Belady's anomaly (the FIFO simulation sketched earlier reproduces both counts)

FIFO with 2nd chance

- [Diagram: FIFO queue 5, 3, 4, 7, 9, 11, 2, 1, 15; if the oldest page's reference bit is 1, it is recycled to the back of the queue instead of being paged out]
- Main idea: add a "reference bit" (or use bit) per PTE
- Check the reference bit of the oldest page:
  - If it is 0, replace it
  - If it is 1, clear the reference bit, put the page at the end of the list, and continue searching
- Pros:
  - Fast, and does not replace a heavily used page
- Cons:
  - The worst case may take a long time

Clock: Simple FIFO + 2nd Chance

- [Diagram: pages arranged in a circle; the clock hand points to the oldest page]
- FIFO clock algorithm:
  - The hand points to the oldest page
  - On a page fault, follow the hand to inspect pages
- Second chance:
  - If the reference bit is 1, set it to 0 and advance the hand
  - If the reference bit is 0, use that page for replacement
- What is the difference between Clock and the previous one? (see the sketch after this slide pair)

Not Recently Used (NRU)

- Algorithm:
  - Randomly pick a page from the first non-empty class, in this order:
    - Not referenced and not modified
    - Not referenced and modified
    - Referenced and not modified
    - Referenced and modified
- Pros:
  - Easy to implement
- Cons:
  - Not very good performance, and classifying the pages takes time
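A minimal clock-hand sketch, assuming NUM_FRAMES frames with hardware-set reference bits. It also answers the question above: unlike the list-based second-chance FIFO, Clock never moves pages to the back of a list; it only advances a hand around a fixed ring, so each "second chance" is a single bit clear.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_FRAMES 8

    static bool     ref_bit[NUM_FRAMES];   /* set by hardware on access */
    static uint32_t hand;                  /* points at the oldest page */

    uint32_t clock_victim(void) {
        for (;;) {
            if (!ref_bit[hand]) {          /* no second chance: evict */
                uint32_t victim = hand;
                hand = (hand + 1) % NUM_FRAMES;
                return victim;
            }
            ref_bit[hand] = false;         /* clear bit, give 2nd chance */
            hand = (hand + 1) % NUM_FRAMES;
        }
    }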

Enhanced FIFO with 2nd chance

- Same as the basic FIFO with 2nd chance, except that this method considers both the reference bit and the modified bit:
  - (0,0): neither recently used nor modified
  - (0,1): not recently used but modified
  - (1,0): recently used but clean
  - (1,1): recently used and modified
- Pros:
  - Avoids write-backs by preferring clean victims
- Cons:
  - More complicated

State per page table entry

Many machines maintain four bits per page table entry:

- use (aka reference): set when the page is referenced, cleared by the clock algorithm
- modified (aka dirty): set when the page is modified, cleared when the page is written to disk
- valid (aka present): OK for the program to reference this page
- read-only: OK for the program to read the page, but not to modify it (e.g., for catching modifications to code pages)
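A sketch of how the four bits might be packed, with the frame number, into one 32-bit page table entry. The exact layout is an illustration; real hardware formats differ by machine.

    #include <stdint.h>

    typedef struct {
        uint32_t frame    : 20;  /* physical frame number */
        uint32_t use      : 1;   /* set on reference, cleared by clock */
        uint32_t modified : 1;   /* set on write, cleared on write-back */
        uint32_t valid    : 1;   /* page may be referenced */
        uint32_t readonly : 1;   /* reads OK, writes trap */
        uint32_t unused   : 8;
    } pte_t;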

State in a software-loaded TLB

- What if we have a software-loaded TLB (as in Nachos)?
- Hardware sets the use bit in the TLB; when a TLB entry is replaced, software copies the use bit back to the page table
- Software manages the TLB entries as a FIFO list; everything not in the TLB is on a second-chance list
- (A sketch of this bookkeeping follows the next slide.)

How many pages are allocated to each process?

- Each process needs a minimum number of pages
- Example: the IBM 370 needs 6 pages to handle the SS MOVE instruction:
  - the instruction is 6 bytes and might span 2 pages
  - 2 pages to handle the "from" operand
  - 2 pages to handle the "to" operand
- Two major allocation schemes:
  - fixed allocation
  - priority allocation
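A sketch of the use-bit copy-back described two slides back for a software-loaded TLB. The entry layout, table size, and page_table_use array are assumed names for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t vpage;
        uint32_t frame;
        bool     use;            /* set by hardware on each access */
    } tlb_entry_t;

    #define TLB_SIZE 64
    static tlb_entry_t tlb[TLB_SIZE];
    static uint32_t    tlb_fifo;             /* index of oldest entry */

    static bool page_table_use[1 << 20];     /* per-vpage use bits (sketch) */

    /* Replace the oldest TLB entry with a new translation (FIFO),
       preserving the hardware-set use bit in the page table. */
    void tlb_refill(uint32_t vpage, uint32_t frame) {
        tlb_entry_t *old = &tlb[tlb_fifo];
        page_table_use[old->vpage] |= old->use;  /* copy use bit back */
        old->vpage = vpage;
        old->frame = frame;
        old->use   = false;
        tlb_fifo = (tlb_fifo + 1) % TLB_SIZE;
    }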

Fixed allocation

- Equal allocation: e.g., with 100 frames and 5 processes, give each process 20 frames
- Proportional allocation: allocate according to the size of the process
  - s_i = size of process p_i
  - S = ∑ s_i
  - m = total number of frames
  - a_i = allocation for p_i = (s_i / S) × m
- Example: m = 64, s_1 = 10, s_2 = 127
  - a_1 = 10/137 × 64 ≈ 5
  - a_2 = 127/137 × 64 ≈ 59

Priority allocation

- Use a proportional allocation scheme based on priorities rather than size
- If process P_i generates a page fault:
  - select for replacement one of its own frames, or
  - select for replacement a frame from a process with a lower priority number
- Global replacement: a process selects a replacement frame from the set of all frames; one process can take a frame from another
- Local replacement: each process selects only from its own set of allocated frames
