
CS350: Operating Systems

Lecture 8: Virtual Memory OS

Ali Mashtizadeh

University of Waterloo

1 / 44
Outline

1 Paging

2 Eviction policies

3 Thrashing

4 User-level API

5 Case study: 4.4 BSD

2 / 44
Paging

• Use disk to simulate larger virtual than physical mem


3 / 44
Working set model

(Figure: histogram of # of accesses vs. virtual address; a small set of hot pages receives most of the accesses)

• Disk much, much slower than memory
I Goal: run at memory speed, not disk speed
• 80/20 rule: 20% of memory gets 80% of memory accesses
I Keep the hot 20% in memory
I Keep the cold 80% on disk

4 / 44
Paging challenges

• How to resume a process after a fault?


I Need to save state and resume
I Process might have been in the middle of an instruction!
• What to fetch from disk?
I Just needed page or more?
• What to eject?
I How to allocate physical pages amongst processes?
I Which of a particular process’s pages to keep in memory?

5 / 44
Re-starting instructions

• Hardware provides kernel with information about page fault


I Faulting virtual address (In %c0_vaddr reg on MIPS)
I Address of instruction that caused fault (%c0_epc reg)
I Was the access a read or write? Was it an instruction fetch?
Was it caused by user access to kernel-only memory?
• Hardware must allow resuming after a fault
• Idempotent instructions are easy
I E.g., simple load or store instruction can be restarted
I Just re-execute any instruction that only accesses one address

6 / 44
What to fetch

• Bring in page that caused page fault


• Pre-fetch surrounding pages?
I Reading two disk blocks approximately as fast as reading one
I As long as no track/head switch, seek time dominates
I If application exhibits spatial locality, then big win to store and read multiple
contiguous pages
• Also pre-zero unused pages in idle loop
I Need 0-filled pages for stack, heap, anonymously mmapped memory
I Zeroing them only on demand is slower
I Hence, many OSes zero freed pages while CPU is idle

7 / 44
Selecting physical pages

• May need to eject some pages


I More on eviction policy in two slides
• May also have a choice of physical pages
• Direct-mapped physical caches
I Virtual → Physical mapping can affect performance
I In old days: Physical address A conflicts with kC + A
(where k is any integer, C is cache size)
I Applications can conflict with each other or themselves
I Scientific applications benefit if consecutive virtual pages do not conflict in the
cache
I Many other applications do better with random mapping
I These days: CPUs more sophisticated than kC + A

8 / 44
Superpages

• How should OS make use of “large” mappings?


I x86 has 2/4MB pages that might be useful
I Alpha has even more choices: 8KB, 64KB, 512KB, 4MB
• Sometimes more pages in L2 cache than TLB entries
I Don’t want costly TLB misses going to main memory
• Or have two-level TLBs
I Want to maximize hit rate in faster L1 TLB
• OS can transparently support superpages [Navarro]
I “Reserve” appropriate physical pages if possible
I Promote contiguous pages to superpages
I Does complicate evicting (esp. dirty pages) – demote

9 / 44
Outline

1 Paging

2 Eviction policies

3 Thrashing

4 User-level API

5 Case study: 4.4 BSD

10 / 44
Straw man: FIFO eviction

• Evict oldest fetched page in system
• Example: reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 physical pages: 9 page faults
• 4 physical pages: 10 page faults

11 / 44
Belady’s Anomaly

• More physical memory doesn’t always mean fewer faults


12 / 44
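A quick way to see the anomaly concretely is to simulate FIFO on the reference string from the previous slide. The following standalone C sketch (not part of the lecture materials) counts faults for 3 and 4 frames and reproduces the 9-versus-10 result:

/* Minimal FIFO page-replacement simulator (illustrative sketch).
 * Reference string from the slides: 3 frames -> 9 faults, 4 frames -> 10. */
#include <stdio.h>

static int fifo_faults(const int *refs, int nrefs, int nframes)
{
    int frames[16];              /* resident pages; assumes nframes <= 16 */
    int next = 0;                /* index of oldest frame (FIFO victim)   */
    int used = 0, faults = 0;

    for (int i = 0; i < nrefs; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (hit)
            continue;
        faults++;
        if (used < nframes) {
            frames[used++] = refs[i];        /* free frame available */
        } else {
            frames[next] = refs[i];          /* evict oldest page    */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void)
{
    const int refs[] = { 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 };
    const int n = sizeof(refs) / sizeof(refs[0]);

    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3)); /* prints 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4)); /* prints 10 */
    return 0;
}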
Optimal page replacement

• What is optimal (if you knew the future)?
I Replace page that will not be used for longest period of time
• Example: reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• With 4 physical pages: 6 page faults

13 / 44
LRU page replacement

• Approximate optimal with least recently used
I Because past often predicts the future
• Example: reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• With 4 physical pages: 8 page faults
• Problem 1: Can be pessimal – example?
I Looping over memory (then want MRU eviction)
• Problem 2: How to implement?

14 / 44
Straw man LRU implementations

• Stamp PTEs with timer value


I E.g., CPU has cycle counter
I Automatically writes value to PTE on each page access
I Scan page table to find oldest counter value = LRU page
I Problem: Would double memory traffic!
• Keep doubly-linked list of pages
I On access remove page, place at tail of list
I Problem: again, very expensive
• What to do?
I Just approximate LRU, don’t try to do it exactly

15 / 44
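For reference, here is a minimal sketch (not from the lecture) of the doubly-linked-list approach above: every access unlinks the page and appends it at the tail, so the head is always the LRU victim. Paying this cost on every memory access is exactly why it is only a straw man.

#include <stddef.h>

/* One node per resident page (names are invented for illustration). */
struct lru_page {
    int page;
    struct lru_page *prev, *next;
};

struct lru_list {
    struct lru_page *head;   /* least recently used */
    struct lru_page *tail;   /* most recently used  */
};

/* Unlink p and re-append it at the tail; called on every access. */
static void lru_touch(struct lru_list *l, struct lru_page *p)
{
    if (l->tail == p)
        return;                               /* already most recent */
    if (p->prev) p->prev->next = p->next; else l->head = p->next;
    if (p->next) p->next->prev = p->prev;
    p->prev = l->tail;                        /* append at tail */
    p->next = NULL;
    if (l->tail) l->tail->next = p;
    l->tail = p;
}

/* The eviction victim is simply the head of the list. */
static struct lru_page *lru_victim(struct lru_list *l)
{
    return l->head;
}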
Clock algorithm

• Use accessed bit supported by most hardware
I E.g., Pentium will write 1 to A bit in PTE on first access
I Software managed TLBs like MIPS can do the same
• Do FIFO but skip accessed pages
• Keep pages in circular FIFO list
• Scan:
I page’s A bit = 1, set to 0 & skip
I else if A = 0, evict
• A.k.a. second-chance replacement

(Figure: pages in a circular list with A bits; the clock hand clears A=1 entries as it passes and evicts the first A=0 entry it finds)

16 / 44
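A minimal sketch of the scan described above, assuming a circular array of frames whose accessed flags mirror the hardware A bits (all names here are illustrative, not from any particular kernel):

struct frame {
    int accessed;   /* software copy of the PTE's A bit */
    int page;       /* resident page number             */
};

/* Advance the hand until a frame with A == 0 is found; clear A bits
 * along the way ("second chance").  Returns the victim frame's index. */
static int clock_evict(struct frame *frames, int nframes, int *hand)
{
    for (;;) {
        struct frame *f = &frames[*hand];
        if (f->accessed) {
            f->accessed = 0;                 /* give a second chance */
        } else {
            int victim = *hand;
            *hand = (*hand + 1) % nframes;
            return victim;                   /* evict this frame     */
        }
        *hand = (*hand + 1) % nframes;
    }
}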
Clock algorithm (continued)

• Large memory may be a problem
I Most pages referenced in long interval
• Add a second clock hand
I Two hands move in lockstep
I Leading hand clears A bits
I Trailing hand evicts pages with A=0
• Can also take advantage of hardware Dirty bit
I Each page can be (Unaccessed, Clean), (Unaccessed, Dirty), (Accessed, Clean), or (Accessed, Dirty)
I Consider clean pages for eviction before dirty
• Or use n-bit accessed count instead of just the A bit
I On sweep: count = (A << (n − 1)) | (count >> 1)
I Evict page with lowest count

(Figure: two clock hands sweeping the circular list; the leading hand clears A bits, the trailing hand evicts frames whose A bit is still 0)

17 / 44
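The n-bit accessed count is the classic "aging" approximation of LRU. A small sketch of the per-sweep update from the slide, with n = 8 and invented names:

#include <stdint.h>

/* On each clock sweep, shift the counter right and fold the current A bit
 * into the top bit: count = (A << (n-1)) | (count >> 1), here with n = 8. */
static void age_counter(uint8_t *count, unsigned accessed)
{
    *count = (uint8_t)(((accessed & 1u) << 7) | (*count >> 1));
}

/* Recently used pages accumulate high counts; the frame with the lowest
 * count has gone longest without access and is the eviction candidate. */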
Other replacement algorithms

• Random eviction
I Simple to implement
I Not overly horrible (avoids Belady & pathological cases)
I Used in hypervisors to avoid double swap [Waldspurger]
• LFU (least frequently used) eviction
• MFU (most frequently used) algorithm
• Neither LFU nor MFU used very commonly
• Workload specific policies: Databases

18 / 44
Naïve paging

• Naïve page replacement: 2 disk I/Os per page fault


19 / 44
Page buffering

• Idea: reduce # of I/Os on the critical path


• Keep pool of free page frames
I On fault, still select victim page to evict
I But read fetched page into already free page
I Can resume execution while writing out victim page
I Then add victim page to free pool
• Can also yank pages back from free pool
I Contains only clean pages, but may still have data
I If page fault on page still in free pool, recycle

20 / 44
Page allocation

• Allocation can be global or local


• Global allocation doesn’t consider page ownership
I E.g., with LRU, evict least recently used page of any proc
I Works well if P1 needs 20% of memory and P2 needs 70%:
(Figure: physical memory split between P1 and P2 in proportion to their needs)
I Doesn’t protect you from memory pigs
(imagine P2 keeps looping through array that is size of mem)
• Local allocation isolates processes (or users)
I Separately determine how much memory each process should have
I Then use LRU/clock/etc. to determine which pages to evict within each process

21 / 44
Outline

1 Paging

2 Eviction policies

3 Thrashing

4 User-level API

5 Case study: 4.4 BSD

22 / 44
Thrashing

Thrashing is when an application is constantly swapping pages in and out, preventing it from making forward progress at any reasonable rate.

• Processes require more memory than system has


I Each time one page is brought in, another page, whose contents will soon be
referenced, is thrown out
I Processes will spend all of their time blocked, waiting for pages to be fetched from
disk
I I/O devs at 100% utilization but system not getting much useful work done
• What we wanted: virtual memory the size of disk with access time the speed
of physical memory
• What we got: memory with access time of disk

23 / 44
Reasons for thrashing

• Access pattern has no temporal locality (past ≠ future)
(80/20 rule has broken down)
• Hot memory does not fit in physical memory
(Figure: P1’s hot memory alone is larger than physical memory)
• Each process fits individually, but too many for system
(Figure: P1 through P16 each fit alone, but together exceed physical memory)
I At least this case is possible to address

24 / 44
Dealing with thrashing

• Approach 1: working set


I Thrashing viewed from a caching perspective: given locality of reference, how big
a cache does the process need?
I Or: how much memory does the process need in order to make reasonable
progress (its working set)?
I Only run processes whose memory requirements can be satisfied
• Approach 2: page fault frequency
I Thrashing viewed as poor ratio of fetch to work
I PFF = page faults / instructions executed
I If PFF rises above threshold, process needs more memory.
Not enough memory on the system? Swap out.
I If PFF sinks below threshold, memory can be taken away

25 / 44
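As an illustration of approach 2, a toy page-fault-frequency check might look like the sketch below. The thresholds and the function name are invented; a real kernel would also decide when to swap a whole process out rather than just adjust frame counts.

/* Toy PFF policy: grow a process's frame allocation when its fault rate
 * is high, shrink it when the rate is low.  Thresholds are made up. */
#define PFF_HIGH 1e-4   /* faults per instruction: needs more frames      */
#define PFF_LOW  1e-6   /* faults per instruction: can give frames back   */

static int adjust_frames(unsigned long faults, unsigned long instrs,
                         int frames)
{
    double pff = (double)faults / (double)instrs;

    if (pff > PFF_HIGH)
        return frames + 1;      /* or swap the process out if memory is short */
    if (pff < PFF_LOW && frames > 1)
        return frames - 1;
    return frames;
}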
Working sets

(Figure: working set size over time, spiking at phase transitions)

• Working set changes across phases
I Balloons during phase transitions

26 / 44
Outline

1 Paging

2 Eviction policies

3 Thrashing

4 User-level API

5 Case study: 4.4 BSD

27 / 44
Recall typical virtual address space

(Figure: address space layout, top to bottom: kernel, stack, unallocated gap down to the breakpoint, heap, uninitialized data (bss), initialized data, read-only data, code (text))

• Dynamically allocated memory goes in heap
• Top of heap called breakpoint
I Addresses between breakpoint and stack all invalid

28 / 44
Early VM system calls

• OS keeps “Breakpoint” – top of heap


I Memory regions between breakpoint & stack fault on access
• char *brk (const char *addr);
I Set and return new value of breakpoint
• char *sbrk (int incr);
I Increment value of the breakpoint & return old value
• Can implement malloc in terms of sbrk
I But hard to “give back” physical memory to system

29 / 44
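To make the malloc-on-sbrk point concrete, here is a minimal bump-allocator sketch (illustrative only). It only ever moves the breakpoint up, which is exactly the "hard to give memory back" limitation noted above.

#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Allocate n bytes by pushing the breakpoint up; memory is never freed. */
static void *bump_alloc(size_t n)
{
    n = (n + 15) & ~(size_t)15;          /* keep 16-byte alignment        */
    void *p = sbrk((intptr_t)n);         /* returns old breakpoint on success */
    return (p == (void *)-1) ? NULL : p;
}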
Memory mapped files

(Figure: same layout as before, with mmapped regions between the heap and the stack: kernel, stack, mmapped regions, heap, uninitialized data (bss), initialized data, read-only data, code (text))

• Other memory objects between heap and stack

30 / 44
mmap system call

• void *mmap (void *addr, size_t len, int prot,
              int flags, int fd, off_t offset)
I Map file specified by fd at virtual address addr
I If addr is null, let kernel choose the address
• prot – protection of region
I OR of PROT_EXEC, PROT_READ, PROT_WRITE, PROT_NONE
• flags
I MAP_ANON – anonymous memory (fd should be -1)
I MAP_PRIVATE – modifications are private
I MAP_SHARED – modifications seen by everyone

31 / 44
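A small usage sketch of the call above: one private file-backed mapping and one anonymous mapping. Error handling is abbreviated, and MAP_ANON is spelled MAP_ANONYMOUS on some systems.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0)
        return 1;

    /* Private, read-only, file-backed mapping; kernel picks the address. */
    char *file = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* Anonymous, zero-filled memory: fd is -1, offset is 0. */
    char *anon = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    if (file == MAP_FAILED || anon == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    printf("first byte of %s: 0x%02x\n", argv[1], (unsigned char)file[0]);

    munmap(file, st.st_size);
    munmap(anon, 1 << 20);
    close(fd);
    return 0;
}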
More VM system calls

• int munmap(void *addr, size_t len)


I Removes memory-mapped object
• int mprotect(void *addr, size_t len, int prot)
I Changes protection on pages to OR of PROT_…
• int msync(void *addr, size_t len, int flags);
I Flush changes of mmapped file to backing store
• int mincore(void *addr, size_t len, char *vec)
I Returns in vec which pages present
• int madvise(void *addr, size_t len, int behav)
I Advise the OS on memory use

32 / 44
Exposing page faults

struct sigaction {
union { /* signal handler */
void (*sa_handler)(int);
void (*sa_sigaction)(int, siginfo_t *, void *);
};
sigset_t sa_mask; /* signal mask to apply */
int sa_flags;
};

int sigaction (int sig, const struct sigaction *act,
               struct sigaction *oact)
• Can specify function to run on SIGSEGV
(Unix signal raised on invalid memory access)

33 / 44
Example: OpenBSD/i386 siginfo

struct sigcontext {
int sc_gs; int sc_fs; int sc_es; int sc_ds;
int sc_edi; int sc_esi; int sc_ebp; int sc_ebx;
int sc_edx; int sc_ecx; int sc_eax;

int sc_eip; int sc_cs; /* instruction pointer */


int sc_eflags; /* condition codes, etc. */
int sc_esp; int sc_ss; /* stack pointer */

int sc_onstack; /* sigstack state to restore */


int sc_mask; /* signal mask to restore */

int sc_trapno;
int sc_err;
};
• Linux uses ucontext_t – same idea, just uses nested structures that won’t all
fit on one slide
34 / 44
VM tricks at user level

• Combination of mprotect/sigaction very powerful


I Can use OS VM tricks in user-level programs [Appel]
I E.g., fault, unprotect page, return from signal handler
• Technique used in object-oriented databases
I Bring in objects on demand
I Keep track of which objects may be dirty
I Manage memory as a cache for much larger object DB
• Other interesting applications
I Useful for some garbage collection algorithms
I Snapshot processes (copy on write)

35 / 44
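A minimal sketch of the fault/unprotect/resume trick described above, assuming a POSIX system with SA_SIGINFO (si_addr gives the faulting address). Real uses, such as Appel's, are considerably more careful; error handling is omitted here.

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t pgsz;

static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    /* Round the faulting address down to its page and unprotect it;
     * returning from the handler re-runs the faulting instruction. */
    uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(pgsz - 1);
    mprotect((void *)page, pgsz, PROT_READ | PROT_WRITE);
}

int main(void)
{
    pgsz = (size_t)sysconf(_SC_PAGESIZE);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* Inaccessible region: the first touch raises SIGSEGV. */
    char *p = mmap(NULL, 4 * pgsz, PROT_NONE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
    p[0] = 'x';               /* faults once; handler unprotects the page */
    printf("recovered after the fault: %c\n", p[0]);
    return 0;
}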
Outline

1 Paging

2 Eviction policies

3 Thrashing

4 User-level API

5 Case study: 4.4 BSD

36 / 44
Overview

• Windows and most UNIX systems separate the VM system into two parts
I VM PMap: Manages the hardware interface (e.g. TLB in MIPS)
I VM Map: Machine independent representation of memory

• 4.4 BSD VM is based on [Mach VM]

• VM Map consists of one or more objects (or segments)


• Each object corresponds to a contiguous region of memory (e.g., one mmap() region)
• Objects can be backed by files and/or shared between processes
• VM PMap manages the hardware (often caches mappings)

37 / 44
Operation

• Calls into mmap(), munmap(), mprotect()


I Update VM Map
I VM Map routines call into the VM PMap to invalidate and update the TLB

• Page faults
I Exception handler calls into the VM PMap to load the TLB
I If the page isn’t in the PMap we call VM Map code

• Low memory options


I PMap is a cache and can be discarded during a low memory condition

38 / 44
4.4 BSD VM system [McKusick]

• Each process has a vmspace structure containing


I vm_map – machine-independent virtual address space
I vm_pmap – machine-dependent data structures
I statistics – e.g. for syscalls like getrusage ()
• vm_map is a linked list of vm_map_entry structs
I vm_map_entry covers contiguous virtual memory
I points to vm_object struct
• vm_object is source of data
I e.g. vnode object for memory mapped file
I points to list of vm_page structs (one per mapped page)
I shadow objects point to other objects for copy on write

39 / 44
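A heavily simplified sketch of how these structures relate. Field names follow the slide, but the types and layout are invented here; the real 4.4BSD (Mach-derived) definitions are much richer.

#include <stddef.h>

typedef unsigned long vaddr_t;    /* illustrative typedefs, not kernel ones */
typedef int vm_prot_t;

struct vm_object;                 /* backing store: vnode, anonymous, or shadow */

struct vm_map_entry {
    vaddr_t              start, end;   /* contiguous range of virtual memory */
    vm_prot_t            protection;
    struct vm_object    *object;       /* where the data comes from          */
    struct vm_map_entry *next;         /* list ordered by address            */
};

struct vm_map {                        /* machine-independent address space  */
    struct vm_map_entry *entries;
};

struct pmap {                          /* machine-dependent; opaque here     */
    void *md_private;
};

struct vmspace {
    struct vm_map vm_map;
    struct pmap   vm_pmap;
    /* ... statistics used by getrusage(), etc. ... */
};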
4.4 BSD VM data structures

(Figure: a vmspace contains the vm_map, vm_pmap, and statistics; the vm_map’s list of vm_map_entry structs points to shadow and vnode vm_objects, each holding a list of vm_page structs, with shadow objects chaining to the underlying vnode objects)

40 / 44
Pmap (machine-dependent) layer

• Pmap layer holds architecture-specific VM code


• VM layer invokes pmap layer
I On page faults to install mappings
I To protect or unmap pages
I To ask for dirty/accessed bits
• Pmap layer is lazy and can discard mappings
I No need to notify VM layer
I Process will fault and VM layer must reinstall mapping
• Pmap handles restrictions imposed by cache

41 / 44
Example uses

• vm_map_entry structs for a process


I r/o text segment → file object
I r/w data segment → shadow object → file object
I r/w stack → anonymous object
• New vm_map_entry objects after a fork:
I Share text segment directly (read-only)
I Share data through two new shadow objects
(must share pre-fork but not post-fork changes)
I Share stack through two new shadow objects
• Must discard/collapse superfluous shadows
I E.g., when child process exits

42 / 44
What happens on a fault?

• Traverse vm_map_entry list to get appropriate entry


I No entry? Protection violation? Send process a SIGSEGV
• Traverse list of [shadow] objects
• For each object, traverse vm_page structs
• Found a vm_page for this object?
I If first vm_object in chain, map page
I If read fault, install page read only
I Else if write fault, install copy of page
• Else get page from object
I Page in from file, zero-fill new page, etc.

43 / 44
Paging in day-to-day use

• Demand paging
I Read pages from vm_object of executable file
• Copy-on-write (fork, mmap, etc.)
I Use shadow objects
• Growing the stack, BSS page allocation
I A bit like copy-on-write for /dev/zero
I Can have a single read-only zero page for reading
I Special-case write handling with pre-zeroed pages
• Shared text, shared libraries
I Share vm_object (shadow will be empty where read-only)
• Shared memory
I Two processes mmap same file, have same vm_object (no shadow)

44 / 44
