
MODULE-4

MEMORY MANAGEMENT: Memory-Management Strategies, Virtual-Memory Management

Memory Management Strategies


Every program to be executed must be in memory; an instruction must be fetched from memory before it can be executed.
• In a multi-tasking OS, memory management is complex because, as processes are swapped in and out of the CPU, their code and data must be swapped in and out of memory.
Basic Hardware
Main memory, cache and the CPU registers are the only storage that the CPU can access directly.
A program and its data must be brought into memory from disk for the process to run. Each process has a separate memory space and must access only its own range of legal addresses. Protection of memory is required to ensure correct operation, and this protection is provided by the hardware. Two registers are used - a base register and a limit register. The base register holds the smallest legal physical memory address; the limit register specifies the size of the range.

For example, if the base register holds 1000 and the limit register holds 500, then the program can legally access all addresses from 1000 through 1499 (inclusive).
This is how protection of the memory space is achieved: any attempt by an executing program to access operating-system memory or another program's memory results in a trap to the operating system, which treats the attempt as a fatal error. This scheme prevents a user program from (accidentally or deliberately) modifying the code or data structures of either the operating system or other users.

The base and limit registers can be loaded only by the operating system, which uses a special privileged instruction. Since privileged instructions can be executed only in kernel mode, only the operating system can load the base and limit registers.
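
To make the check concrete, here is a minimal sketch in C of the protection test the hardware performs on every CPU-generated address. The register values are the ones from the example above; this is illustrative pseudo-hardware, not code from any real OS:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative base/limit values from the example above. */
    static unsigned base = 1000, limit = 500;

    /* The hardware allows an access only if base <= addr < base + limit. */
    bool address_is_legal(unsigned addr)
    {
        return addr >= base && addr < base + limit;   /* 1000 .. 1499 */
    }

    int main(void)
    {
        printf("%d\n", address_is_legal(1499));  /* 1 : last legal address   */
        printf("%d\n", address_is_legal(1500));  /* 0 : would trap to the OS */
        return 0;
    }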

Address Binding

User programs typically refer to memory addresses with symbolic names such as "i", "count", and "averageTemperature". These symbolic names must be mapped, or bound, to physical memory addresses, which typically occurs in several stages:

Compile Time - If it is known at compile time where a program will reside in physical memory, then absolute code can be generated by the compiler, containing actual physical addresses. However, if the load address changes at some later time, then the program will have to be recompiled.

Load Time - If the location at which a program will be loaded is not known at compile time, then the compiler must generate relocatable code, which references addresses relative to the start of the program. If that starting address changes, then the program must be reloaded but not recompiled.

Execution Time - If a program can be moved around in memory during the course of its execution, then binding must be delayed until execution time.
Swapping

• A process must be loaded into memory in order to execute.

• If there is not enough memory available to keep all running processes in memory at
the same time, then some processes that are not currently using the CPU may have
their memory swapped out to a fast local disk called the backing store.

• Swapping is the process of moving a process from memory to the backing store and moving another process from the backing store into memory. Swapping is a very slow process compared to other operations.

• It is important to swap processes out of memory only when they are idle, or more to
the point, only when there are no pending I/O operations. (Otherwise the pending I/O
operation could write into the wrong process's memory space.) The solution is to either
swap only totally idle processes, or do I/O operations only into and out of OS buffers,
which are then transferred to or from process's main memory as a second step.

• Most modern OSes no longer use swapping, because it is too slow and there are faster
alternatives available. (e.g. Paging. ) However, some UNIX systems will still invoke
swapping if the system gets extremely full, and then discontinue swapping when the
load reduces again.

Windows 3.1 used a modified version of swapping that was somewhat controlled by the user, swapping processes out if necessary and then only swapping them back in when the user focused on that particular window.

Contiguous Memory Allocation
One approach to memory management is to load each process into a contiguous space.
The operating system is allocated space first, usually at either low or high memory
locations, and then the remaining available memory is allocated to processes as
needed. ( The OS is usually loaded low, because that is where the interrupt vectors are
located).
Here each process is contained in a single contiguous section of memory.

1.) Memory Mapping and Protection

• The system shown in figure below allows protection against user programs accessing
areas that they should not, allows programs to be relocated to different memory
starting addresses as needed, and allows the memory space devoted to the OS to grow
or shrink dynamically as needs change.

2.) Memory Allocation


• One method of allocating contiguous memory is to divide all available memory into equal-sized partitions and to assign each process its own partition (called MFT). This restricts both the number of simultaneous processes and the maximum size of each process, and is no longer used.

• An alternate approach is to keep a list of unused (free) memory blocks (holes), and to find a hole of a suitable size whenever a process needs to be loaded into memory (called MVT). There are many different strategies for finding the "best" allocation of memory to processes, including the three most commonly discussed:

1. First fit - Search the list of holes until one is found that is big enough to
satisfy the request, and assign a portion of that hole to that process.
Whatever fraction of the hole not needed by the request is left on the free
list as a smaller hole. Subsequent requests may start looking either from the
beginning of the list or from the point at which this search ended.

2. Best fit - Allocate the smallest hole that is big enough to satisfy the
request. This saves large holes for other process requests that may need
them later, but the resulting unused portions of holes may be too small to
be of any use, and will therefore be wasted. Keeping the free list sorted can
speed up the process of finding the right hole.

3. Worst fit - Allocate the largest hole available, thereby increasing the
likelihood that the remaining portion will be usable for satisfying future
requests. Simulations show that either first or best fit are better than worst
fit in terms of both time and storage utilization. First and best fits are about
equal in terms of storage utilization, but first fit is faster.

3.) Fragmentation

The allocation of memory to processes leads to fragmentation of memory. A hole is the free space available within memory.

The two types of fragmentation are -

1.) External fragmentation - holes present in between processes.
2.) Internal fragmentation - holes present within the process itself, i.e. there is unused free space inside the memory allocated to a process.

• Internal fragmentation occurs with all memory allocation strategies. It is caused by the fact that memory is allocated in blocks of a fixed size, whereas the actual memory needed will rarely be that exact size.

• If the programs in memory are relocatable (using execution-time address binding), then the external fragmentation problem can be reduced via compaction, i.e. moving all processes down to one end of physical memory so as to place all free memory together in one large free block. This only involves updating the relocation register for each process, as all internal work is done using logical addresses.

• Another solution to external fragmentation is to allow processes to use non-contiguous blocks of physical memory - paging and segmentation.

Paging

Paging is a memory-management scheme that allows a process's memory to be stored in physical memory non-contiguously. It avoids external fragmentation by allocating memory in equal-sized blocks known as pages.
• Paging eliminates most of the problems of the other methods discussed previously, and is the predominant memory-management technique used today.
The basic idea behind paging is to divide physical memory into a number of equal-sized blocks called frames, and to divide a program's logical memory space into blocks of the same size called pages.

• Any page ( from any process ) can be placed into any available frame.
• The page table is used to look up which frame a particular page is stored in at the
moment. In the following example, for instance, page 2 of the program's logical
memory is currently stored in frame 3 of physical memory.

• A logical address consists of two parts: the page number in which the address resides, and an offset from the beginning of that page. (The number of bits in the page number limits how many pages a single process can address. The number of bits in the offset determines the maximum size of each page, and should correspond to the system frame size.)

• The page table maps the page number to a frame number, to yield a physical address which also has two parts: the frame number and the offset within that frame. The number of bits in the frame number determines how many frames the system can address, and the number of bits in the offset determines the size of each frame.

• Page numbers, frame numbers, and frame sizes are determined by the architecture,
but are typically powers of two, allowing addresses to be split at a certain number of
bits.

For example, if the size of the logical address space is 2^m and the page size is 2^n, then the high-order m-n bits of a logical address designate the page number and the remaining n bits represent the offset.
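
As a small illustration of this split, the following C sketch extracts the page number and offset from a logical address, assuming a hypothetical page size of 4 KB (n = 12):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS   12                     /* assumed: 4 KB pages, n = 12 */
    #define PAGE_SIZE   (1u << PAGE_BITS)
    #define OFFSET_MASK (PAGE_SIZE - 1)

    int main(void)
    {
        uint32_t logical = 0x12345;
        uint32_t page    = logical >> PAGE_BITS;   /* high-order m-n bits */
        uint32_t offset  = logical & OFFSET_MASK;  /* low-order n bits    */
        printf("page %u, offset %u\n", page, offset);  /* page 18, offset 837 */
        return 0;
    }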

• Note that paging is like having a table of relocation registers, one for each page of the
logical memory.

• There is no external fragmentation with paging. All blocks of physical memory are used, and there are no gaps in between and no problems with finding the right-sized hole for a particular chunk of memory.

• There is, however, internal fragmentation. Memory is allocated in chunks the size of
a page, and on the average, the last page will only be half full, wasting on the average
half a page of memory per process.

• Larger page sizes waste more memory, but are more efficient in terms of overhead.
Modern trends have been to increase page sizes, and some systems even have multiple
size pages to try and make the best of both worlds.

• Consider the following example, in which a process has 16 bytes of logical memory,
mapped in 4 byte pages into 32 bytes of physical memory. (Presumably some other
processes would be consuming the remaining 16 bytes of physical memory. )

When a process requests memory ( e.g. when its code is loaded in from disk ), free
frames are allocated from a free-frame list, and inserted into that process's page table.

• Processes are blocked from accessing anyone else's memory because all of their
memory requests are mapped through their page table. There is no way for them to
generate an address that maps into any other process's memory space.

• The operating system must keep track of each individual process's page table,
updating it whenever the process's pages get moved in and out of memory, and
applying the correct page table when processing system calls for a particular process.

This all increases the overhead involved when swapping processes in and out of the
CPU. ( The currently active page table must be updated to reflect the process that is
currently running. )

Structure of the Page Table

1. Hierarchical Paging
• This structure supports two or more page tables at different levels (tiers).
• Most modern computer systems support logical address spaces of 2^32 to 2^64.

The VAX architecture divides its 32-bit logical address space into 4 equal-sized sections, and each page is 512 bytes, yielding an address of the form <section (2 bits), page (21 bits), offset (9 bits)>.

With a 64-bit logical address space and 4-KB pages, there are 52 bits worth of page numbers, which is still too many even for two-level paging. One could increase the number of paging levels, but with 10-bit page tables it would take 7 levels of indirection, making memory access prohibitively slow. So some other approach must be used.

2. Hashed Page Tables
One common data structure for accessing data that is sparsely distributed over a broad range of possible values is the hash table.

Figure below illustrates a hashed page table using chain-and-bucket hashing:
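
Since the figure is only descriptive, a minimal C sketch of the lookup may help; the entry layout (virtual page number, frame, chain pointer) and the table size are assumptions made purely for illustration:

    #include <stddef.h>

    /* Hypothetical hashed page-table entry: each bucket holds a chain of
     * (virtual page number, frame number) pairs. */
    struct hpt_entry {
        unsigned long vpn;        /* virtual page number                 */
        unsigned long frame;      /* frame currently holding that page   */
        struct hpt_entry *next;   /* next entry hashing to the same slot */
    };

    #define TABLE_SIZE 1024
    static struct hpt_entry *table[TABLE_SIZE];

    /* Walk the chain for the bucket the page hashes to; return the frame
     * number, or -1 if the page is not resident (a page fault). */
    long lookup_frame(unsigned long vpn)
    {
        for (struct hpt_entry *e = table[vpn % TABLE_SIZE]; e != NULL; e = e->next)
            if (e->vpn == vpn)
                return (long)e->frame;
        return -1;
    }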

3. Inverted Page Tables


Another approach is to use an inverted page table. Instead of a table listing all of
the pages for a particular process, an inverted page table lists all of the pages currently
loaded in memory, for all processes. ( i.e. there is one entry per frame instead of one
entry per page. )

• Access to an inverted page table can be slow, as it may be necessary to search the entire table in order to find the desired page.

• The id of the process running in each frame, together with the corresponding page number, is stored in each table entry.

Segmentation
Most users (programmers) do not think of their programs as existing in one
continuous linear address space.

• Rather they tend to think of their memory in multiple segments, each dedicated to
a particular use, such as code, data, the stack, the heap, etc.

• Memory segmentation supports this view by providing addresses with a segment number (mapped to a segment base address) and an offset from the beginning of that segment.

• The logical address is thus a 2-tuple:

<segment-number, offset>

• For example, a C compiler might generate 5 segments for the user code, library code,
global ( static ) variables, the stack, and the heap, as shown in Figure

Hardware
A segment table maps segment-offset addresses to physical addresses, and
simultaneously checks for invalid addresses.

Each entry in the segment table has a segment base and a segment limit. The segment
base contains the starting physical address where the segment resides in memory,
whereas the segment limit specifies the length of the segment.

• A logical address consists of two parts: a segment number, s, and an offset into
that segment, d. The segment number is used as an index to the segment table. The
offset d of the logical address must be between 0 and the segment limit.
When an offset is legal, it is added to the segment base to produce the address in
physical memory of the desired byte.

The segment table is thus essentially an array of base and limit register pairs.
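
A short C sketch of this translation, using illustrative base/limit values (not tied to any particular figure), is shown below:

    #include <stdio.h>
    #include <stdlib.h>

    /* Each segment-table entry is simply a base/limit pair (illustrative values). */
    struct segment { unsigned base, limit; };

    static struct segment seg_table[] = {
        { 1400, 1000 },   /* segment 0 */
        { 6300,  400 },   /* segment 1 */
        { 4300,  400 },   /* segment 2 */
    };

    /* Translate <s, d> to a physical address, trapping if d is out of range. */
    unsigned translate(unsigned s, unsigned d)
    {
        if (d >= seg_table[s].limit) {
            fprintf(stderr, "trap: offset %u beyond limit of segment %u\n", d, s);
            exit(EXIT_FAILURE);
        }
        return seg_table[s].base + d;
    }

    int main(void)
    {
        printf("%u\n", translate(2, 53));   /* 4300 + 53 = 4353 */
        return 0;
    }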

Virtual-Memory Management

Virtual memory is a technique that allows the execution of processes that are not completely in memory. One major advantage of this scheme is that programs can be larger than physical memory.
Background

In practice, most real processes do not need all their pages, or at least not all at once,
for several reasons:
1. Error handling code is not needed unless that specific error occurs, some of
which are quite rare.
2. Certain features of certain programs are rarely used.

The ability to load only the portions of processes that are actually needed has several
benefits:
 Programs could be written for a much larger address space (virtual
memory space) than physically exists on the computer.
 Because each process is only using a fraction of its total address space, there is more memory left for other programs, improving CPU utilization and system throughput.
 Less I/O is needed for swapping processes in and out of RAM, speeding
things up.

The figure below shows the general layout of virtual memory, which can be much
larger than physical memory:

The figure below shows virtual address space, which is the programmer’s logical
view of process memory storage. The actual physical layout is controlled by the
process's page table.

Note that the address space shown in the figure is sparse - a great hole in the middle of the address space is never used, unless the stack and/or the heap grow to fill the hole.

Virtual memory also allows the sharing of files and memory by multiple processes,
with several benefits:
 System libraries can be shared by mapping them into the virtual address
space of more than one process.
 Processes can also share virtual memory by mapping the same block of
memory to more than one process.
 Process pages can be shared during a fork( ) system call, eliminating the
need to copy all of the pages of the original ( parent ) process.

Shared Library using Virtual Memory

Demand Paging

The basic idea behind demand paging is that when a process is swapped in, its pages are not brought in all at once. Rather, they are brought in only when the process needs them (on demand). The entity doing this is sometimes called a lazy swapper, although pager is the more accurate term: when a process is swapped in, the pager loads into memory only those pages that are needed presently.

 Pages that are not loaded into memory are marked as invalid in the page
table, using the invalid bit. Pages loaded in memory are marked as valid.
 If the process only ever accesses pages that are loaded in memory (memory-resident pages), then the process runs exactly as if all the pages were loaded into memory.

On the other hand, if a page is needed that was not originally loaded up, then a page
fault trap is generated, which must be handled in a series of steps:

1. The memory address requested is first checked, to make sure it was a valid memory
request.
2. If the reference is to an invalid page, the process is terminated. Otherwise, if the
page is not present in memory, it must be paged in.
3. A free frame is located, possibly from a free-frame list.
4. A disk operation is scheduled to bring in the necessary page from disk.
5. After the page is loaded to memory, the process's page table is updated with the new
frame number, and the invalid bit is changed to indicate that this is now a valid page
reference.
6. The instruction that caused the page fault must now be restarted from the
beginning.

In an extreme case, the program starts execution with zero pages in memory. Here NO
pages are swapped in for a process until they are requested by page faults. This is
known as pure demand paging.

The hardware necessary to support demand paging is the same as for paging and
swapping: A page table and secondary memory.

Performance of Demand Paging

o There is some slowdown and performance hit whenever a page fault occurs (as the required page is not available in memory) and the system has to go get it from disk.
o There are many steps that occur when servicing a page fault and some of
the steps are optional or variable. But just for the sake of discussion,
suppose that a normal memory access requires 200 nanoseconds, and
that servicing a page fault takes 8 milliseconds. ( 8,000,000
nanoseconds, or 40,000 times a normal memory access. ) With a page
fault rate of p, ( on a scale from 0 to 1 ), the effective access time is now:

Effective access time = p * (page-fault service time) + (1 - p) * (memory access time)
                      = p * 8,000,000 + (1 - p) * 200
                      = 200 + 7,999,800 * p

Even if only one access in 1000 causes a page fault, the effective access time increases from 200 nanoseconds to 8.2 microseconds, a slowdown by a factor of 40. In order to keep the slowdown less than 10%, the page-fault rate must be less than 0.0000025, or one in 399,990 accesses.
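
The same arithmetic can be checked with a few lines of C (the 200 ns and 8 ms figures are the assumed values used above):

    #include <stdio.h>

    int main(void)
    {
        double mem_ns   = 200.0;        /* assumed memory access time        */
        double fault_ns = 8000000.0;    /* assumed page-fault service time   */
        double p        = 1.0 / 1000.0; /* one fault per 1000 accesses       */

        double eat = p * fault_ns + (1.0 - p) * mem_ns;
        printf("effective access time = %.1f ns\n", eat);  /* about 8199.8 ns */
        return 0;
    }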

Copy-on-Write

 The idea behind copy-on-write is that the pages of a parent process are shared by the child process, until one or the other of the processes changes a page. Only when a process changes a page's content is that page copied.

 Only pages that can be modified need to be labeled as copy-on-write. Code segments can simply be shared.

 Some systems provide an alternative to the fork( ) system call called a virtual
memory fork, vfork( ). In this case the parent is suspended, and the child
uses the parent's memory pages. This is very fast for process creation, but
requires that the child not modify any of the shared memory pages before
performing the exec( ) system call.

Page Replacement

 In order to make the most use of virtual memory, we load several processes into memory at the same time. Since we only load the pages that are actually needed by each process at any given time, there are enough frames to load many more processes into memory.
 If some process suddenly decides to use more pages and there aren't any free frames available, there are several possible solutions to consider:

1. Adjust the memory used by I/O buffering, etc., to free up some frames for user
processes.

2. Put the process requesting more pages into a wait queue until some free frames
become available.

3. Swap some process out of memory completely, freeing up its page frames.

4. Find some page in memory that isn't being used right now, and swap that page only
out to disk, freeing up a frame that can be allocated to the process requesting it. This
is known as page replacement, and is the most common solution. There are many
different algorithms for page replacement.

The previously discussed page-fault processing assumed that there would be free
frames available on the free-frame list. Now the page-fault handling must be modified
to free up a frame if necessary, as follows:

1. Find the location of the desired page on the disk.


2. Find a free frame:
a. If there is a free frame, use it.
b. If there is no free frame, use a page-replacement algorithm to select
an existing frame to be replaced, known as the victim frame.
c. Write the victim frame to disk. Change all related page tables to
indicate that this page is no longer in memory.

3. Read in the desired page and store it in the frame. Change the entries in page table.

4. Restart the process that was waiting for this page.

Note that step 2c adds an extra disk write to the page-fault handling, thus doubling the
time required to process a page fault. This can be reduced by assigning a modify bit,
or dirty bit to each page in memory, indicating whether or not it has been changed
since it was last loaded in from disk.

If the page is not modified the bit is not set. If the dirty bit has not been set, then the
page is unchanged, and does not need to be written out to disk. Many page replacement
strategies specifically look for pages that do not have their dirty bit set.

There are two major requirements to implement a successful demand paging system.
A frame-allocation algorithm and a page-replacement algorithm.

The former centers around how many frames are allocated to each process, and the
latter deals with how to select a page for replacement when there are no free frames
available.

 The overall goal in selecting and tuning these algorithms is to generate the
fewest number of overall page faults. Because disk access is so slow relative to
memory access, even slight improvements to these algorithms can yield large
improvements in overall system performance.

 Algorithms are evaluated using a given string of page accesses known as a reference string.

A few page-replacement algorithms –


a) FIFO Page Replacement

 A simple and obvious page replacement strategy is FIFO, i.e. first-in-first-out.


 This algorithm associates with each page the time when that page was brought
into memory. When a page must be replaced, the oldest page is chosen.

 Or a FIFO queue can be created to hold all pages in memory. As new pages are
brought in, they are added to the tail of a queue, and the page at the head of the
queue is the next victim.
 In the following example, a reference string is given and there are 3 free frames. There are 20 page requests, which result in 15 page faults (a small simulation sketch is given at the end of this subsection).

 Although FIFO is simple and easy to understand, it is not always optimal, or
even efficient.

 Belady's anomaly states that, for some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases.
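
The FIFO bookkeeping can be simulated in a few lines of C; the reference string below is the one assumed in the example above, and with 3 frames the program counts 15 faults:

    #include <stdio.h>

    int main(void)
    {
        int ref[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1};
        int n = sizeof ref / sizeof ref[0];
        int frames[3] = {-1, -1, -1};
        int next = 0, faults = 0;            /* next = oldest (FIFO) position */

        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int f = 0; f < 3; f++)
                if (frames[f] == ref[i])
                    hit = 1;
            if (!hit) {                      /* page fault: replace oldest page */
                frames[next] = ref[i];
                next = (next + 1) % 3;
                faults++;
            }
        }
        printf("page faults = %d\n", faults);   /* prints 15 */
        return 0;
    }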

b) Optimal Page Replacement

 The discovery of Belady's anomaly led to the search for an optimal page-replacement algorithm, which is simply the one that yields the lowest possible page-fault rate and does not suffer from Belady's anomaly.
 Such an algorithm does exist, and is called OPT or MIN. This algorithm is
"Replace the page that will not be used for the longest time in the future."
 The same reference string used for the FIFO example is used in the example
below, here the minimum number of possible page faults is 9.

 Unfortunately, OPT cannot be implemented in practice, because it requires knowledge of the future reference string, but it makes a nice benchmark for the comparison and evaluation of real proposed new algorithms.

C) LRU Page Replacement

 The LRU (Least Recently Used) algorithm predicts that the page that has not been used for the longest time is the one that will not be used again in the near future.
 Some view LRU as analogous to OPT, but here we look backwards in time
instead of forwards.
 Figure illustrates LRU for our sample string, yielding 12 page faults (as compared to 15 for FIFO and 9 for OPT).

LRU is considered a good replacement policy, and is often used. There are two simple
approaches commonly used to implement this:

1. Counters. With each page-table entry a time-of-use field is associated. Whenever


a reference to a page is made, the contents of the clock register are copied to the time-
of-use field in the page-table entry for that page. In this way, we always have the "time"
of the last reference to each page. This scheme requires a search of the page table to
find the LRU page and a write to memory for each memory access.

2. Stack. Another approach is to use a stack, and whenever a page is accessed, pull
that page from the middle of the stack and place it on the top. The LRU page will always
be at the bottom of the stack. Because this requires removing objects from the middle
of the stack, a doubly linked list is the recommended data structure.
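
A minimal sketch of the counter approach in C, assuming a hypothetical per-page time-of-use field updated from a global logical clock:

    #include <limits.h>

    #define NPAGES 64

    static unsigned long clock_ticks;          /* incremented on every reference */
    static unsigned long time_of_use[NPAGES];  /* hypothetical per-page field    */
    static int in_memory[NPAGES];              /* 1 if the page is resident      */

    /* Called on every reference to a resident page. */
    void touch(int page)
    {
        time_of_use[page] = ++clock_ticks;
    }

    /* The LRU victim is the resident page with the smallest time-of-use value. */
    int lru_victim(void)
    {
        int victim = -1;
        unsigned long oldest = ULONG_MAX;
        for (int p = 0; p < NPAGES; p++)
            if (in_memory[p] && time_of_use[p] < oldest) {
                oldest = time_of_use[p];
                victim = p;
            }
        return victim;
    }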

Neither LRU nor OPT exhibits Belady's anomaly. Both belong to a class of page-replacement algorithms called stack algorithms, which can never exhibit Belady's anomaly.

d) LRU-Approximation Page Replacement

 Many systems offer some degree of hardware support, enough to approximate LRU.
 In particular, many systems provide a reference bit for every entry in a page
table, which is set anytime that page is accessed. Initially all bits are set to zero,
and they can also all be cleared at any time. One bit distinguishes pages that
have been accessed since the last clear from those that have not been accessed.

d.1 Additional-Reference-Bits Algorithm

 An 8-bit byte (a history of reference bits) is stored for each page in a table in memory.
 At regular intervals (say, every 100 milliseconds), a timer interrupt transfers
control to the operating system. The operating system shifts the reference bit
for each page into the high-order bit of its 8-bit byte, shifting the other bits right
by 1 bit and discarding the low-order bit.
 These 8-bit shift registers contain the history of page use for the last eight time
periods.
 If the shift register contains 00000000, then the page has not been used for
eight time periods.
 A page with a history register value of 11000100 has been used more recently
than one with a value of 01110111.
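
The shift performed at each timer interrupt can be written in one line; a C sketch with an assumed history byte and reference bit per page:

    #include <stdint.h>

    #define NPAGES 64

    static uint8_t history[NPAGES];   /* 8-bit use history for each page     */
    static uint8_t ref_bit[NPAGES];   /* hardware reference bit (0 or 1)     */

    /* Run by the OS on each timer interrupt (e.g. every 100 ms): shift the
     * reference bit into the high-order bit of the history byte, discarding
     * the low-order bit, then clear the reference bit for the next period. */
    void age_pages(void)
    {
        for (int p = 0; p < NPAGES; p++) {
            history[p] = (uint8_t)((history[p] >> 1) | (ref_bit[p] << 7));
            ref_bit[p] = 0;
        }
    }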

d.2 Second-Chance Algorithm

 The second-chance algorithm is a FIFO replacement algorithm, except that the reference bit is used to give pages a second chance at staying in memory.
 When a page must be replaced, the page table is scanned in a FIFO ( circular
queue ) manner.

 If a page is found with its reference bit set to ‘0’, then that page is selected as the next victim.
 If the reference bit value is ‘1’, then the page is given a second chance and its reference bit is cleared (set to ‘0’).

 Thus, a page that is given a second chance will not be replaced until all other
pages have been replaced (or given second chances). In addition, if a page is
used often, then it sets its reference bit again.
 This algorithm is also known as the clock algorithm.

 One way to implement the second-chance algorithm is as a circular queue. A


pointer indicates which page is to be replaced next. When a frame is needed,
the pointer advances until it finds a page with a 0 reference bit. As it advances,
it clears the reference bits. Once a victim page is found, the page is replaced,
and the new page is inserted in the circular queue in that position.
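
A sketch of that circular-queue (clock) scan in C, assuming a simple array of reference bits, one per frame:

    #define NFRAMES 8

    static int reference[NFRAMES];   /* reference bit for each frame        */
    static int hand;                 /* clock hand: next frame to examine   */

    /* Advance the hand until a frame with reference bit 0 is found, clearing
     * bits along the way (the "second chance"); that frame is the victim. */
    int clock_victim(void)
    {
        for (;;) {
            if (reference[hand] == 0) {
                int victim = hand;
                hand = (hand + 1) % NFRAMES;
                return victim;            /* replace the page in this frame */
            }
            reference[hand] = 0;          /* give the page a second chance  */
            hand = (hand + 1) % NFRAMES;
        }
    }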

d.3 Enhanced Second-Chance Algorithm

The enhanced second-chance algorithm looks at the reference bit and the modify bit (dirty bit) as an ordered pair, and classifies pages into one of four classes:

1. (0, 0) - Neither recently used nor modified.
2. (0, 1) - Not recently used, but modified.
3. (1, 0) - Recently used, but clean.
4. (1, 1) - Recently used and modified.

 This algorithm searches the page table in a circular fashion, looking for the first
page it can find in the lowest numbered category. i.e. it first makes a pass
looking for a ( 0, 0 ), and then if it can't find one, it makes another pass looking
for a ( 0, 1 ), etc.
 The main difference between this algorithm and the previous one is the
preference for replacing clean pages if possible.

e) Counting-Based Page Replacement

There are several algorithms based on counting the number of references that have
been made to a given page, such as:
 Least Frequently Used, LFU: Replace the page with the lowest
reference count. A problem can occur if a page is used frequently initially
and then not used any more, as the reference count remains high. A
solution to this problem is to right-shift the counters periodically,
yielding a time-decaying average reference count.

 Most Frequently Used, MFU: Replace the page with the highest
reference count. The logic behind this idea is that pages that have already
been referenced a lot have been in the system a long time, and we are
probably done with them, whereas pages referenced only a few times
have only recently been loaded, and we still need them.

f) Page-Buffering Algorithms

 Maintain a certain minimum number of free frames at all times. When a page-
fault occurs, go ahead and allocate one of the free frames from the free list first,
so that the requesting process is in memory as early as possible, and then select
a victim page to write to disk and free up a frame.

 Keep a list of modified pages, and when the I/O system is idle, these pages are
written to disk, and then clear the modify bits, thereby increasing the chance of
finding a "clean" page for the next potential victim and page replacement can
be done much faster.

Allocation of Frames

The absolute minimum number of frames that a process must be allocated is dependent on system architecture.
The maximum number is defined by the amount of available physical memory.

Allocation Algorithms

After the OS is loaded, there are two ways in which frames can be allocated to processes.

 Equal Allocation - If there are m frames available and n processes to share them, each process gets m / n frames, and the leftovers are kept in a free-frame buffer pool.

 Proportional Allocation - Allocate frames in proportion to the size of each process. If the size of process i is Si, and S is the sum of the sizes of all processes in the system, then the allocation for process Pi is ai = m * Si / S, where m is the number of free frames available in the system.

Consider a system with a 1 KB frame size, in which a small student process of 10 KB and an interactive database of 127 KB are the only two processes running, and 62 frames are free.
With proportional allocation, we would split 62 frames between two processes, as
follows-
m=62, S = (10+127)=137
Allocation for process 1 = 62 X 10/137 ~ 4
Allocation for process 2 = 62 X 127/137 ~57

Thus 4 frames and 57 frames are allocated to the student process and the database, respectively.
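
The same computation as a short C sketch (the sizes and frame count are the values from the example above):

    #include <stdio.h>

    int main(void)
    {
        int m = 62;                 /* free frames                       */
        int size[] = {10, 127};     /* process sizes, in 1-KB frames     */
        int S = size[0] + size[1];  /* 137                               */

        for (int i = 0; i < 2; i++) {
            int ai = m * size[i] / S;          /* integer truncation     */
            printf("process %d gets %d frames\n", i + 1, ai);
        }
        return 0;                   /* prints 4 and 57 */
    }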

Variations on proportional allocation could consider the priority of processes rather than just their size.

Global versus Local Allocation

 Page replacement can occur at either a local or a global level.


 With local replacement, the number of pages allocated to a process is fixed, and
page replacement occurs only amongst the pages allocated to this process.
 With global replacement, any page may be a potential victim, whether it
currently belongs to the process seeking a free frame or not.
 Local page replacement allows processes to better control their own page fault
rates, and leads to more consistent performance of a given process over
different system load levels.

 Global page replacement is overall more efficient, and is the more commonly
used approach.

Non-Uniform Memory Access ( New )

 Usually the time required to access all memory in a system is equivalent.


 This may not be the case in multiple-processor systems, especially where each
CPU is physically located on a separate circuit board which also holds some
portion of the overall system memory.
 In such systems, CPUs can access memory that is physically located on the same
board much faster than the memory on the other boards.
 The basic solution is akin to processor affinity - At the same time that we try to
schedule processes on the same CPU to minimize cache misses, we also try to
allocate memory for those processes on the same boards, to minimize access
times.

Thrashing
Thrashing is the state of a process where there is high paging activity.
A process that is spending more time paging than executing is said to be thrashing.

Cause of Thrashing

 When memory is filled up and processes starts spending lots of time waiting for
their pages to page in, then CPU utilization decreases (Processes are not
executed as they are waiting for some pages), causing the scheduler to add in
even more processes and increase the degree of multiprogramming even more.
Thrashing has occurred, and system throughput plunges. No work is getting
done, because the processes are spending all their time paging.

 In the graph given below, CPU utilization is plotted against the degree of
multiprogramming. As the degree of multiprogramming increases, CPU
utilization also increases, although more slowly, until a maximum is reached. If
the degree of multiprogramming is increased even further, thrashing sets in,
and CPU utilization drops sharply. At this point, to increase CPU utilization and
stop thrashing, we must decrease the degree of multiprogramming.

Local page replacement policies can prevent thrashing process from taking pages away
from other processes, but it still tends to clog up the I/O queue.

Working-Set Model

The working-set model is based on the concept of locality, and defines a working-set window of length delta. Whatever pages are included in the most recent delta page references are said to be in the process's working-set window, and comprise its current working set, as illustrated in the figure.

 The selection of delta is critical to the success of the working set model - If it is
too small then it does not encompass all of the pages of the current locality, and
if it is too large, then it encompasses pages that are no longer being frequently
accessed.

 The total demand for frames, D, is the sum of the sizes of the working sets of all processes (D = Σ WSSi). If D exceeds the total number of available frames, then at least one process is thrashing, because there are not enough frames available to satisfy its minimum working set. If D is significantly less than the number of currently available frames, then additional processes can be launched.

 The hard part of the working-set model is keeping track of what pages are in the
current working set, since every reference adds one to the set and removes one
older page.

Page-Fault Frequency

When page-fault rate is too high, the process needs more frames and when it is too
low, the process may have too many frames.

The upper and lower bounds can be established on the page-fault rate. If the actual
page-fault rate exceeds the upper limit, allocate the process another frame or suspend
the process. If the page-fault rate falls below the lower limit, remove a frame from the
process. Thus, we can directly measure and control the page-fault rate to prevent
thrashing.

Allocating Kernel Memory

When a process running in user mode requests additional memory, pages are
allocated from the list of free page frames maintained by the kernel. This list is
typically populated using a page-replacement algorithm and most likely contains free
pages scattered throughout physical memory, as explained earlier.

Remember, too, that if a user process requests a single byte of memory, internal fragmentation will result, as the process will be granted an entire page frame. Kernel memory, however, is often allocated from a free-memory pool different from the list used to satisfy ordinary user-mode processes.

There are two primary reasons for this:

1. The kernel requests memory for data structures of varying sizes, some of which are
less than a page in size. As a result, the kernel must use memory conservatively and
attempt to minimize waste due to fragmentation. This is especially important because
many operating systems do not subject kernel code or data to the paging system.

2. Pages allocated to user-mode processes do not necessarily have to be in contiguous physical memory. However, certain hardware devices interact directly with physical memory - without the benefit of a virtual memory interface - and consequently may require memory residing in physically contiguous pages.

In the following sections, we examine two strategies for managing free memory that is assigned to kernel processes.

1. Buddy System

The "buddy system" allocates memory from a fixed-size segment consisting of


physically contiguous pages.

Memory is allocated from this segment using a power-of-2 allocator, which satisfies
requests in units sized as a power of 2 (4 KB, 8 KB, 16 KB, and so forth).

A request in units not appropriately sized is rounded up to the next highest power of
2.

For example, if a request for 11 KB is made, it is satisfied with a 16-KB segment. Next,
we explain the operation of the buddy system with a simple example. Let's assume
the size of a memory segment is initially 256 KB and the kernel requests 21 KB of
memory.

The segment is initially divided into two buddies - which we will call AL and AR - each 128 KB in size. One of these buddies is further divided into two 64-KB buddies - BL and BR.

However, the next-highest power of 2 from 21 KB is 32 KB, so either BL or BR is again divided into two 32-KB buddies, CL and CR. One of these buddies is used to satisfy the 21-KB request. This scheme is illustrated in Figure 9.27, where CL is the segment allocated to the 21-KB request.
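
The rounding step itself is simple; a small C sketch is given below (requests expressed in KB, purely illustrative):

    #include <stdio.h>

    /* Round a request up to the next power of two, as the buddy system's
     * power-of-2 allocator would. */
    unsigned next_pow2(unsigned kb)
    {
        unsigned size = 1;
        while (size < kb)
            size <<= 1;
        return size;
    }

    int main(void)
    {
        printf("%u KB\n", next_pow2(11));   /* 16 : the 11-KB request above */
        printf("%u KB\n", next_pow2(21));   /* 32 : the 21-KB request above */
        return 0;
    }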

An advantage of the buddy system is how quickly adjacent buddies can be combined to form larger segments, using a technique known as coalescing. In the figure, for example, when the kernel releases the CL unit it was allocated, the system can coalesce CL and CR into a 64-KB segment. This segment, BL, can in turn be coalesced with its buddy BR to form a 128-KB segment.

Ultimately, we can end up with the original 256-KB segment. The obvious drawback to the buddy system is that rounding up to the next highest power of 2 is very likely to cause fragmentation within allocated segments. For example, a 33-KB request can only be satisfied with a 64-KB segment. In fact, we cannot guarantee that less than 50 percent of the allocated unit will be wasted due to internal fragmentation. In the following section, we explore a memory allocation scheme where no space is lost due to fragmentation.

2.Slab Allocation

A second strategy for allocating kernel memory is known as slab allocation.

A slab is made up of one or more physically contiguous pages. A cache consists of one
or more slabs.

There is a single cache for each unique kernel data structure —for example, a separate
cache for the data structure representing process descriptors, a separate cache for file
objects, a separate cache for semaphores, and so forth. Each cache is populated with
objects that are instantiations of the kernel data structure the cache represents.

For example, the cache representing semaphores stores instances of semaphore objects, the cache representing process descriptors stores instances of process-descriptor objects, etc. The relationship between slabs, caches, and objects is shown in the figure.

The figure shows two kernel objects 3 KB in size and three objects 7 KB in size, stored in their respective caches. The slab-allocation algorithm uses caches to store kernel objects.

When a cache is created, a number of objects, which are initially marked as free, are allocated to the cache. The number of objects in the cache depends on the size of the associated slab.

For example, a 12-KB slab (comprised of three contiguous 4-KB pages) could store six 2-KB objects. Initially, all objects in the cache are marked as free. When a new object for a kernel data structure is needed, the allocator can assign any free object from the cache to satisfy the request.

The object assigned from the cache is marked as used.

Let's consider a scenario in which the kernel requests memory from the slab allocator for an object representing a process descriptor. In Linux systems, a process descriptor is of the type struct task_struct, which requires approximately 1.7 KB of memory.

When the Linux kernel creates a new task, it requests the necessary memory for the struct task_struct object from its cache. The cache will fulfill the request using a struct task_struct object that has already been allocated in a slab and is marked as free.

In Linux, a slab may be in one of three possible states:

1. Full. All objects in the slab are marked as used.

2. Empty. All objects in the slab are marked as free.

3. Partial. The slab consists of both used and free objects.

The slab allocator first attempts to satisfy the request with a free object in a partial
slab. If none exist, a free object is assigned from an empty slab. If no empty slabs are
available, a new slab is allocated from contiguous physical pages and assigned to a
cache; memory for the object is allocated from this slab.

The slab allocator provides two main benefits:

1. No memory is wasted due to fragmentation. Fragmentation is not an issue because each unique kernel data structure has an associated cache, and each cache is comprised of one or more slabs that are divided into chunks the size of the objects being represented. Thus, when the kernel requests memory for an object, the slab allocator returns the exact amount of memory required to represent the object.

2. Memory requests can be satisfied quickly. The slab allocation scheme is thus
particularly effective for managing memory where objects are frequently allocated
and deallocated, as is often the case with requests from the kernel.

The act of allocating—and releasing—memory can be a time-consuming process.


However, objects are created in advance and thus can be quickly allocated from the
cache. Furthermore, when the kernel has finished with an object and releases it, it is
marked as free and returned to its cache, thus making it immediately available for
subsequent requests from the kernel. The slab allocator first appeared in the Solaris
2.4 kernel.

Because of its general-purpose nature, this allocator is now also used for certain user-
mode memory requests in Solaris. Linux originally used the buddy system; however,
beginning with version 2.2, the Linux kernel adopted the slab allocator.

Memory Mapped Files

Consider a sequential read of a file on disk using the standard system calls open(), read(), and write(). Each file access requires a system call and a disk access. Alternatively, we can use the virtual memory techniques discussed so far to treat file I/O as routine memory accesses. This approach, known as memory-mapping a file, allows a part of the virtual address space to be logically associated with the file. As we shall see, this can lead to significant performance increases when performing I/O.

Working sets and page fault rates

There is a direct relationship between the working set of a process and its page-fault
rate.
Typically, as shown in the figure, the working set of a process changes over time as references to data and code sections move from one locality to another. Assuming there is sufficient memory to store the working set of a process (that is, the process is not thrashing), the page-fault rate of the process will transition between peaks and valleys over time. This general behaviour is shown in the figure.

A peak in the page-fault rate occurs when we begin demand-paging a new locality.

However, once the working set of this new locality is in memory, the page-fault rate
falls. When the process moves to a new working set, the page fault rate rises toward a
peak once again, returning to a lower rate once the new working set is loaded into
memory.
The span of time between the start of one peak and the start of the next peak represents
the transition from one working set to another.

Basic Mechanism

Memory mapping a file is accomplished by mapping a disk block to a page (or pages)
in memory. Initial access to the file proceeds through ordinary demand paging,
resulting in a page fault. However, a page-sized portion of the file is read from the file
system into a physical page (some systems may opt to read in more than a page-sized
chunk of memory at a time).
Subsequent reads and writes to the file are handled as routine memory accesses, thereby simplifying file access and usage by allowing the system to manipulate files through memory rather than incurring the overhead of using the read() and write() system calls. Similarly, as file I/O is done in memory - as opposed to using system calls that involve disk I/O - file access is much faster as well. Note that writes to the file mapped in memory are not necessarily immediate (synchronous) writes to the file on disk.
Some systems may choose to update the physical file when the operating system
periodically checks whether the page in memory has been modified. When the file is
closed, all the memory-mapped data are written back to disk and removed from the
virtual memory of the process.

Some operating systems provide memory mapping only through a specific system call and use the standard system calls to perform all other file I/O.

However, some systems choose to memory-map a file regardless of whether the file
was specified as memory-mapped. Let's take Solaris as an example. If a file is specified
as memory-mapped (using the mmap () system call), Solaris maps the file into the
address space of the process. If a file is opened and accessed using ordinary system
calls, such as open(), read(), and write(), Solaris still memory-maps the file; however,
the file is mapped to the kernel address space.

Regardless of how the file is opened, then, Solaris treats all file I/O as memory-mapped, allowing file access to take place via the efficient memory subsystem. Multiple processes may be allowed to map the same file concurrently, to allow sharing of data. Writes by any of the processes modify the data in virtual memory and can be seen by all others that map the same section of the file. Given our earlier discussions of virtual memory, it should be clear how the sharing of memory-mapped sections of memory is implemented: the virtual memory map of each sharing process points to the same page of physical memory - the page that holds a copy of the disk block. This memory sharing is illustrated in the figure.

The memory-mapping system calls can also support copy-on-write functionality, allowing processes to share a file in read-only mode but to have their own copies of any data they modify. So that access to the shared data is coordinated, the processes involved might use one of the mechanisms for achieving mutual exclusion described earlier.

In many ways, the sharing of memory-mapped files is similar to shared memory, but not all systems use the same mechanism for both. On UNIX and Linux systems, for example, memory mapping is accomplished with the mmap() system call, whereas shared memory is achieved with the POSIX-compliant shmget() and shmat() system calls (Section 3.5.1). On Windows NT, 2000, and XP systems, however, shared memory is accomplished by memory-mapping files. On these systems, processes can communicate using shared memory by having the communicating processes memory-map the same file into their virtual address spaces. The memory-mapped file serves as the region of shared memory between the communicating processes. In the following section, we illustrate support in the Win32 API for shared memory using memory-mapped files.

Shared Memory in the Win32 API


The general outline for creating a region of shared memory using memory mapped
files in the Win32 API involves first creating a file mapping for the file to be mapped
and then establishing a view of the mapped file in a process's virtual address space. A
second process can then open and create a view of the mapped file in its virtual address
space. The mapped file represents the shared-memory object that will enable communication to take place between the processes.

We next illustrate these steps in more detail. In this example, a producer process first
creates a shared-memory object using the memory-mapping features available in the
Win32 API. The producer then writes a message to shared memory. After that, a consumer process opens a mapping to the shared-memory object and reads the message written by the producer. To establish a memory-mapped file, a process first
opens the file to be mapped with the CreateFile () function, which returns a HANDLE
to the opened file.

The process then creates a mapping of this file HANDLE using the CreateFileMapping() function. Once the file mapping is established, the process then establishes a view of the mapped file in its virtual address space with the MapViewOfFile() function. The view of the mapped file represents the portion of the file being mapped in the virtual address space of the process; the entire file or only a portion of it may be mapped.

The call to CreateFileMapping() creates a named shared-memory object called SharedObject. The consumer process will communicate using this shared-memory segment by creating a mapping to the same named object. The producer then creates a view of the memory-mapped file in its virtual address space. By passing the value 0 for the last three parameters, it indicates that the mapped view is the entire file. It could
instead have passed values specifying an offset and size, thus creating a view
containing only a subsection of the file. (It is important to note that the entire mapping
may not be loaded into memory when the mapping is established. Rather, the mapped
file may be demand-paged, thus bringing pages into memory only as they are
accessed.)

The MapViewOfFile() function returns a pointer to the shared-memory object; any accesses to this memory location are thus accesses to the memory-mapped file. In this instance, the producer process writes the message "Shared memory message" to shared memory.
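
A condensed producer sketch along these lines is given below. It follows the sequence of Win32 calls named in the text; for brevity it assumes temp.txt already exists with a nonzero size and omits the error checking a real program would perform on every HANDLE and pointer:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Open the file that will back the shared region. */
        HANDLE hFile = CreateFile(TEXT("temp.txt"),
                                  GENERIC_READ | GENERIC_WRITE,
                                  0, NULL, OPEN_ALWAYS,
                                  FILE_ATTRIBUTE_NORMAL, NULL);

        /* Create a named file mapping that the consumer can open by name. */
        HANDLE hMapFile = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
                                            0, 0, TEXT("SharedObject"));

        /* Map a view of the entire file into this process's address space. */
        LPVOID lpMapAddress = MapViewOfFile(hMapFile, FILE_MAP_ALL_ACCESS,
                                            0, 0, 0);

        /* Write the message into the shared region. */
        sprintf((char *)lpMapAddress, "Shared memory message");

        UnmapViewOfFile(lpMapAddress);
        CloseHandle(hMapFile);
        CloseHandle(hFile);
        return 0;
    }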

A program illustrating how the consumer process establishes a view of the named shared-memory object is shown. This program is somewhat simpler than the one shown above, as all that is necessary is for the process to create a mapping to the existing named shared-memory object. The consumer process must also create a view of the mapped file, just as the producer process did. The consumer then reads from shared memory the message "Shared memory message" that was written by the producer process.

Finally, both processes remove the view of the mapped file with a call to
UnmapViewOfFile (). We provide a programming exercise at the end of this chapter
using shared memory with memory mapping in the Win32 API.

Memory-Mapped I/O

In the case of I/O, as mentioned in Section 1.2.1, each I/O controller includes registers to hold commands and the data being transferred. Usually, special I/O instructions allow data transfers between these registers and system memory.

To allow more convenient access to I/O devices, many computer architectures provide memory-mapped I/O. In this case, ranges of memory addresses are set aside and are mapped to the device registers. Reads and writes to these memory addresses cause the data to be transferred to and from the device registers. This method is appropriate for devices that have fast response times, such as video controllers. In the IBM PC, each location on the screen is mapped to a memory location. Displaying text on the screen is almost as easy as writing the text into the appropriate memory-mapped locations.

Memory-mapped I/O is also convenient for other devices, such as the serial and parallel ports used to connect modems and printers to a computer. The CPU transfers data through these kinds of devices by reading and writing a few device registers, called an I/O port. To send out a long string of bytes through a memory-mapped serial port, the CPU writes one data byte to the data register and sets a bit in the control register to signal that the byte is available. The device takes the data byte and then clears the bit in the control register to signal that it is ready for the next byte. Then the CPU can transfer the next byte. If the CPU uses polling to watch the control bit, constantly looping to see whether the device is ready, this method of operation is called programmed I/O.
If the CPU does not poll the control bit, but instead receives an interrupt when the device is ready for the next byte, the data transfer is said to be interrupt driven.
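
A short C sketch of programmed I/O through memory-mapped device registers is given below; the register addresses and the bit layout are invented purely for illustration:

    #include <stdint.h>

    /* Invented register addresses and bit layout, for illustration only. */
    #define DATA_REG  ((volatile uint8_t *)0x40001000u)
    #define CTRL_REG  ((volatile uint8_t *)0x40001004u)
    #define CTRL_BUSY 0x01u        /* set by the CPU, cleared by the device */

    /* Programmed I/O: write a byte to the data register, signal the device,
     * then poll the control bit until the device is ready for the next byte. */
    void send_string(const char *s)
    {
        while (*s) {
            *DATA_REG  = (uint8_t)*s++;     /* place the byte in the data register */
            *CTRL_REG |= CTRL_BUSY;         /* signal that a byte is available     */
            while (*CTRL_REG & CTRL_BUSY)   /* busy-wait (poll) until the device   */
                ;                           /* clears the bit                      */
        }
    }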

*******All The Best*******

