OS - Virtual Memory
Dr. N.Nalini
SCOPE
VIT
Background
• Code needs to be in memory to execute, but entire program rarely used
• Error code, unusual routines, large data structures
• Entire program code not needed at same time
• Consider ability to execute partially-loaded program
• Program no longer constrained by limits of physical memory
• Each program takes less memory while running -> more programs run at the
same time
• Increased CPU utilization and throughput with no increase in response
time or turnaround time
• Less I/O needed to load or swap programs into memory -> each user
program runs faster
Virtual memory
• Virtual memory – separation of user logical memory from physical memory
• Only part of the program needs to be in memory for execution
• Logical address space can therefore be much larger than physical address
space
• Allows address spaces to be shared by several processes
• Allows for more efficient process creation
• More programs running concurrently
• Less I/O needed to load or swap processes
Virtual memory (Cont.)
• Virtual address space – logical view of how process is stored in
memory
• Usually start at address 0, contiguous addresses until end of
space
• Meanwhile, physical memory organized in page frames
• MMU must map logical to physical
• Virtual memory can be implemented via:
• Demand paging
• Demand segmentation
Virtual Memory That is Larger Than Physical Memory
Virtual-address Space
• Usually design logical address space for stack to start
at Max logical address and grow “down” while heap
grows “up”
• Maximizes address space use
• Unused address space between the two is hole
• No physical memory needed until heap or
stack grows to a given new page
• Enables sparse address spaces with holes left for
growth, dynamically linked libraries, etc.
• System libraries shared via mapping into virtual
address space
• Shared memory by mapping pages read-write into
virtual address space
• Pages can be shared during fork(), speeding
process creation
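A minimal POSIX C sketch (not from the slides) showing the effect: pages shared at fork() behave as private copies once written, so the child's write does not disturb the parent:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char *buf = malloc(4096);            /* roughly one page of heap data */
    strcpy(buf, "parent data");
    if (fork() == 0) {                   /* child starts by sharing parent's pages */
        strcpy(buf, "child data");       /* write gives the child its own copy */
        printf("child sees:  %s\n", buf);
        exit(0);
    }
    wait(NULL);
    printf("parent sees: %s\n", buf);    /* still "parent data" */
    return 0;
}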
Shared Library Using Virtual Memory
Demand Paging
• Could bring entire process into
memory at load time
• Or bring a page into memory only
when it is needed
– Less I/O needed, no
unnecessary I/O
– Less memory needed
– Faster response
– More users
• Similar to paging system with
swapping
Basic Concepts
• With swapping, the pager guesses which pages will be used before the
process is swapped out again
• The pager brings only those pages into memory
• How to determine that set of pages?
• Need new MMU functionality to implement demand paging
• If pages needed are already memory resident
• No difference from non demand-paging
• If page needed and not memory resident
• Need to detect and load the page into memory from storage
• Without changing program behavior
• Without programmer needing to change code
Valid-Invalid Bit
• With each page table entry a valid–invalid bit is associated
(v in-memory – memory resident, i not-in-memory)
• Initially valid–invalid bit is set to i on all entries
(Figure: example of a page-table snapshot showing the valid–invalid bits)
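A minimal C sketch of how the valid–invalid bit drives demand paging (the field widths, page size, and page_fault hook are illustrative, not any real MMU):

#include <stdint.h>

extern void page_fault(uint32_t vpn);    /* trap into the OS to load the page */

typedef struct {
    uint32_t frame : 20;   /* physical frame number */
    uint32_t valid : 1;    /* 1 (v) = memory resident, 0 (i) = not in memory */
} pte_t;

uint32_t translate(pte_t *page_table, uint32_t vaddr) {
    uint32_t vpn = vaddr >> 12;          /* virtual page number, 4 KB pages */
    uint32_t off = vaddr & 0xFFF;        /* offset within the page */
    if (!page_table[vpn].valid)
        page_fault(vpn);                 /* demand the page from storage */
    return ((uint32_t)page_table[vpn].frame << 12) | off;
}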
Note that a page fault may now require two page transfers (one to write out a victim, one to read in the needed page), increasing the effective access time (EAT)
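A worked EAT example with textbook-typical values (assumed here; the slide gives no numbers): memory-access time 200 ns, average page-fault service time 8 ms.

EAT = (1 - p) × 200 + p × 8,000,000 ns

Even a fault rate of p = 0.001 gives EAT ≈ 8.2 µs, slowing memory access by a factor of about 40.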
Page Replacement
Page and Frame Replacement Algorithms
(Figure: FIFO page replacement on a sample reference string with 3 frames, 15 page faults)
• Giving a process more frames should improve its performance
– But this assumption is not always true
• The fault count varies with the reference string: consider 1,2,3,4,1,2,5,1,2,3,4,5 and count the page faults with 3 frames and with 4
• Adding more frames can cause more page faults!
• Belady’s Anomaly
• How to track ages of pages?
• Just use a FIFO queue
FIFO Illustrating Belady’s Anomaly
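A small self-contained C simulation (a sketch, not from the slides) that reproduces Belady's anomaly for the reference string above, giving 9 faults with 3 frames but 10 with 4:

#include <stdio.h>

/* count page faults for a reference string under FIFO replacement */
int fifo_faults(const int *ref, int n, int nframes) {
    int frames[16];
    int head = 0, used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < used; f++)
            if (frames[f] == ref[i]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (used < nframes)
            frames[used++] = ref[i];       /* a free frame is still available */
        else {
            frames[head] = ref[i];         /* evict the oldest page */
            head = (head + 1) % nframes;
        }
    }
    return faults;
}

int main(void) {
    int ref[] = {1,2,3,4,1,2,5,1,2,3,4,5};
    int n = sizeof ref / sizeof ref[0];
    printf("3 frames: %d faults\n", fifo_faults(ref, n, 3)); /* prints 9  */
    printf("4 frames: %d faults\n", fifo_faults(ref, n, 4)); /* prints 10 */
    return 0;
}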
• Second-chance algorithm
• Generally FIFO, plus hardware-provided reference
bit
• Clock replacement
• If page to be replaced has
• Reference bit = 0 -> replace it
• reference bit = 1 then:
– set reference bit 0, leave page in memory
– replace next page, subject to same rules
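A minimal sketch of the clock-hand victim selection described above (the frame-table layout is illustrative):

typedef struct {
    int page;     /* page currently held in this frame */
    int refbit;   /* reference bit, set by hardware on each access */
} frame_t;

/* advance the hand until a frame with refbit == 0 is found */
int clock_select(frame_t *frames, int nframes, int *hand) {
    for (;;) {
        frame_t *f = &frames[*hand];
        int here = *hand;
        *hand = (*hand + 1) % nframes;   /* move the hand forward */
        if (f->refbit == 0)
            return here;                 /* no second chance left: victim */
        f->refbit = 0;                   /* clear bit, give a second chance */
    }
}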
Second-chance Algorithm
Enhanced Second-Chance Algorithm
Counting Algorithms
• Keep a counter of the number of references that have been
made to each page
• Not common
• Least Frequently Used (LFU) Algorithm:
• Replaces page with smallest count
• Most Frequently Used (MFU) Algorithm:
• Based on the argument that the page with the smallest count
was probably just brought in and has yet to be used
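A minimal LFU victim-selection sketch (the per-frame reference-count array is illustrative):

/* pick the frame whose page has the smallest reference count */
int lfu_victim(const int *count, int nframes) {
    int victim = 0;
    for (int f = 1; f < nframes; f++)
        if (count[f] < count[victim])
            victim = f;
    return victim;
}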
Page-Buffering Algorithms
• Keep a pool of free frames, always
– A frame is then available when needed, rather than being found at fault time
– Read the page into a free frame; select a victim to evict and add it to the free pool
– Evict the victim when convenient
• Possibly, keep list of modified pages
– When backing store otherwise idle, write pages there and set to non-dirty
• Possibly, keep free frame contents intact and note what is in them
– If referenced again before reused, no need to load contents again from disk
– Generally useful to reduce penalty if wrong victim frame selected
Applications and Page Replacement
• All of these algorithms have OS guessing about future page access
• Some applications have better knowledge – i.e. databases
• Memory intensive applications can cause double buffering
• OS keeps copy of page in memory as I/O buffer
• Application keeps page in memory for its own work
• Operating system can give direct access to the disk, getting out of
the way of the applications
• Raw disk mode
• Bypasses buffering, locking, etc.
Allocation of Frames
• Each process needs minimum number of frames
• Example: IBM 370 – 6 pages to handle SS MOVE instruction:
– instruction is 6 bytes, might span 2 pages
– 2 pages to handle the from operand
– 2 pages to handle the to operand
• Maximum of course is total frames in the system
• Two major allocation schemes
– fixed allocation
– priority allocation
• Many variations
Fixed Allocation
• Equal allocation – if there are 93 frames and 5 processes, each
process will get 18 frames. The 3 leftover frames can be used as a
free-frame buffer pool.
• Proportional allocation – Allocate according to the size of process
– Dynamic as degree of multiprogramming, process sizes change
s_i = size of process p_i
S = Σ s_i
m = total number of frames
a_i = allocation for p_i = (s_i / S) × m

Example: m = 62, s_1 = 10, s_2 = 127
a_1 = (10 / 137) × 62 ≈ 4
a_2 = (127 / 137) × 62 ≈ 57
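A tiny C sketch of the proportional calculation above (integer truncation, matching the example):

#include <stdio.h>

int main(void) {
    int s[] = {10, 127};        /* process sizes s1, s2 */
    int m   = 62;               /* total frames to allocate */
    int S   = s[0] + s[1];      /* S = 137 */
    for (int i = 0; i < 2; i++)
        printf("a%d = %d\n", i + 1, s[i] * m / S);   /* a1 = 4, a2 = 57 */
    return 0;
}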
Global vs. Local Allocation
• Global replacement – process selects a replacement frame from the
set of all frames; one process can take a frame from another
(the victim frame may belong to a different process)
• But then process execution time can vary greatly
• But greater throughput so more common
• Local replacement – each process selects from only its own set of
allocated frames (the victim frame always belongs to the faulting process)
• More consistent per-process performance
• But possibly underutilized memory
Reclaiming Pages
• A strategy to implement global page-replacement policy
• All memory requests are satisfied from the free-frame list, rather
than waiting for the list to drop to zero before beginning to select
pages for replacement
• Page replacement is triggered when the list falls below a certain
threshold
• This strategy attempts to ensure there is always sufficient
free memory to satisfy new requests.
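A sketch of such a threshold-triggered reclaimer (all names and thresholds are illustrative, not a real kernel API):

#define MIN_FREE  64   /* wake the reclaimer when the list drops below this */
#define MAX_FREE 128   /* replenish the list up to this */

extern int  free_frame_count(void);
extern int  select_victim(void);       /* runs the page-replacement algorithm */
extern void free_frame(int frame);
extern void sleep_until_woken(void);   /* woken when count < MIN_FREE */

void reclaimer(void) {
    for (;;) {
        while (free_frame_count() < MAX_FREE)
            free_frame(select_victim());   /* evict until replenished */
        sleep_until_woken();
    }
}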
Reclaiming Pages Example
Non-Uniform Memory Access
• So far, we assumed that all memory is accessed with equal speed
• Many systems are NUMA – speed of access to memory varies
• Consider system boards containing CPUs and memory, interconnected
over a system bus
• NUMA multiprocessing architecture
Non-Uniform Memory Access (Cont.)
• Optimal performance comes from allocating memory “close to” the CPU on
which the thread is scheduled
• And modifying the scheduler to schedule the thread on the same system
board when possible
• Solaris solves this by creating lgroups
• Structure to track low-latency CPU/memory groups
• Used by the scheduler and pager
• When possible schedule all threads of a process and allocate all
memory for that process within the lgroup
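On Linux, the analogous facility is libnuma; a sketch (link with -lnuma) that allocates memory on the node local to the calling CPU:

#define _GNU_SOURCE
#include <numa.h>       /* Linux libnuma */
#include <sched.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        printf("no NUMA support\n");
        return 1;
    }
    int node = numa_node_of_cpu(sched_getcpu());  /* node local to this CPU */
    void *buf = numa_alloc_onnode(1 << 20, node); /* 1 MB on that node */
    /* ... use buf from threads scheduled near this node ... */
    numa_free(buf, 1 << 20);
    return 0;
}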
Thrashing
• If a process does not have “enough” pages, the page-fault rate is
very high
– Page fault to get page
– Replace existing frame
– But quickly need replaced frame back
– This leads to:
• Low CPU utilization
• Operating system thinking that it needs to increase the
degree of multiprogramming
• Another process added to the system
Thrashing (Cont.)
• Thrashing: a process is busy swapping pages in and out.
A process is thrashing if it is spending more time paging
than executing.
Demand Paging and Thrashing
• Why does demand paging work?
Locality model
• Process migrates from one locality to another
• Localities may overlap
• Why does thrashing occur?
– Σ (size of localities) > total memory size
• Program structure affects locality; consider initializing a 128 × 128 array:
– Program 2 (row-major traversal)
for (i = 0; i < 128; i++)
    for (j = 0; j < 128; j++)
        data[i][j] = 0;
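For contrast, assuming (as in the classic textbook form of this example) 128-word pages so that each array row occupies one page, the column-major Program 1 below touches a different page on every assignment; with fewer than 128 frames allocated it can cause 128 × 128 = 16,384 page faults, while Program 2 above causes only 128:

/* Program 1: column-major traversal, poor locality */
for (j = 0; j < 128; j++)
    for (i = 0; i < 128; i++)
        data[i][j] = 0;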
Windows
• Uses demand paging with clustering. Clustering brings in pages surrounding
the faulting page
• Processes are assigned working set minimum and working set maximum
• Working set minimum is the minimum number of pages the process is
guaranteed to have in memory
• A process may be assigned as many pages as it needs, up to its working set maximum
• When the amount of free memory in the system falls below a threshold,
automatic working set trimming is performed to restore the amount of
free memory
• Working set trimming removes pages from processes that have pages in
excess of their working set minimum
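Windows exposes these limits through the Win32 call SetProcessWorkingSetSize; a sketch with illustrative sizes:

#include <windows.h>
#include <stdio.h>

int main(void) {
    /* request a 2 MB working-set minimum and an 8 MB maximum
       for the current process (sizes are illustrative) */
    if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                  2 * 1024 * 1024,
                                  8 * 1024 * 1024))
        printf("SetProcessWorkingSetSize failed: %lu\n", GetLastError());
    return 0;
}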
Solaris
• Maintains a list of free pages to assign faulting processes
• Lotsfree – threshold parameter (amount of free memory) to begin paging
• Desfree – threshold parameter to increase paging
• Minfree – threshold parameter to begin swapping
• Paging is performed by pageout process
• Pageout scans pages using modified clock algorithm
• Scanrate is the rate at which pages are scanned. This ranges from slowscan
to fastscan
• Pageout is called more frequently depending upon the amount of free
memory available
• Priority paging gives priority to process code pages
Solaris 2 Page Scanner