Module-4 MEMORY MANAGEMENT: Memory-Management Strategies, Virtual-Memory Management
For example, if the base register holds 1000 and the limit register holds 500, then the program
can legally access all addresses from 1000 through 1499 (inclusive).
The base and limit registers provide protection of the memory space. Any attempt by an
executing program to access operating-system memory or another program's memory results
in a trap to the operating system, which treats the attempt as a fatal error. This scheme
prevents a user program from (accidentally or deliberately) modifying the code or data
structures of either the operating system or other users.
The base and limit registers can be loaded only by the operating system, which uses a special
privileged instruction. Since privileged instructions can be executed only in kernel mode,
only the operating system can load the base and limit registers.
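As a rough sketch (not from the text), the hardware check can be modeled as a simple range comparison; the function name is illustrative, and with base 1000 and limit 500 it accepts exactly the addresses 1000 through 1499.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch of the hardware's base/limit check: every CPU-generated
 * address is compared against the two registers, and an out-of-range access
 * traps to the operating system. */
bool address_is_legal(uint32_t addr, uint32_t base, uint32_t limit)
{
    /* Legal range: base <= addr < base + limit.
     * With base = 1000 and limit = 500 this accepts 1000 .. 1499. */
    return addr >= base && addr < base + limit;
}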
Address Binding
User programs typically refer to memory addresses with symbolic names such as "i",
"count", and "averageTemperature". These symbolic names must be mapped, or bound, to
physical memory addresses, which typically occurs in several stages:
1. Compile time - If the memory location is known in advance, absolute code can be
generated; the code must be recompiled if the starting location changes.
2. Load time - If the location is not known at compile time, relocatable code is
generated, and binding is delayed until load time.
3. Execution time - If the process can be moved in memory during its execution, binding
must be delayed until run time; this requires hardware support such as base and limit
registers.
Swapping
• If there is not enough memory available to keep all running processes in memory at
the same time, then some processes that are not currently using the CPU may have
their memory swapped out to a fast local disk called the backing store.
• It is important to swap processes out of memory only when they are idle, or more to
the point, only when there are no pending I/O operations. (Otherwise the pending I/O
operation could write into the wrong process's memory space.) The solution is either to
swap only totally idle processes, or to do I/O operations only into and out of OS buffers,
which are then transferred to or from the process's main memory as a second step.
• Most modern OSes no longer use swapping, because it is too slow and there are faster
alternatives available. (e.g. Paging. ) However, some UNIX systems will still invoke
swapping if the system gets extremely full, and then discontinue swapping when the
load reduces again.
Windows 3.1 used a modified version of swapping that was somewhat controlled by the
user, swapping processes out if necessary and then only swapping them back in when the
user focused on that particular window.
Contiguous Memory Allocation
One approach to memory management is to load each process into a contiguous space.
The operating system is allocated space first, usually at either low or high memory
locations, and then the remaining available memory is allocated to processes as
needed. (The OS is usually loaded low, because that is where the interrupt vectors are
located.)
Here each process is contained in a single contiguous section of memory.
• The system shown in figure below allows protection against user programs accessing
areas that they should not, allows programs to be relocated to different memory
starting addresses as needed, and allows the memory space devoted to the OS to grow
or shrink dynamically as needs change.
• An alternate approach is to keep a list of unused (free) memory blocks (holes), and
to find a hole of a suitable size whenever a process needs to be loaded into memory
(a scheme known as MVT). There are many different strategies for finding the "best"
allocation of memory to processes, including the three most commonly discussed:
1. First fit - Search the list of holes until one is found that is big enough to
satisfy the request, and assign a portion of that hole to that process (see the
C sketch after this list). Whatever fraction of the hole is not needed by the
request is left on the free list as a smaller hole. Subsequent requests may start
looking either from the beginning of the list or from the point at which this
search ended.
2. Best fit - Allocate the smallest hole that is big enough to satisfy the
request. This saves large holes for other process requests that may need
them later, but the resulting unused portions of holes may be too small to
be of any use, and will therefore be wasted. Keeping the free list sorted can
speed up the process of finding the right hole.
3. Worst fit - Allocate the largest hole available, thereby increasing the
likelihood that the remaining portion will be usable for satisfying future
requests. Simulations show that either first or best fit are better than worst
fit in terms of both time and storage utilization. First and best fits are about
equal in terms of storage utilization, but first fit is faster.
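The following is a minimal, illustrative sketch of the first-fit strategy in C; the free-hole list layout (struct hole) is an assumption made for this example, not a prescribed implementation.

#include <stddef.h>

/* Illustrative free-hole list for first fit. */
struct hole {
    size_t start;        /* starting address of the hole */
    size_t size;         /* size of the hole in bytes    */
    struct hole *next;   /* next hole on the free list   */
};

/* First fit: scan the list and take the first hole that is big enough.
 * The unused remainder of the hole stays on the free list as a smaller hole.
 * Returns the allocated start address, or (size_t)-1 if no hole fits. */
size_t first_fit(struct hole *free_list, size_t request)
{
    for (struct hole *h = free_list; h != NULL; h = h->next) {
        if (h->size >= request) {
            size_t addr = h->start;
            h->start += request;   /* shrink the hole from the front */
            h->size  -= request;   /* (a zero-size hole could be unlinked) */
            return addr;
        }
    }
    return (size_t)-1;             /* no hole large enough */
}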
Fragmentation
Both the first-fit and best-fit strategies suffer from external fragmentation: as processes
are loaded into and removed from memory, the free memory space is broken into small pieces,
so that enough total memory may exist to satisfy a request even though no single hole is
large enough. Internal fragmentation, by contrast, is unused memory internal to an allocated
block, which arises when memory is handed out in fixed-size units larger than the request.
Paging
Paging is a memory-management scheme that permits a process's physical address space to be
noncontiguous. Physical memory is divided into fixed-sized blocks called frames, and logical
memory is divided into blocks of the same size called pages.
• Any page (from any process) can be placed into any available frame.
• The page table is used to look up which frame a particular page is stored in at the
moment. In the following example, for instance, page 2 of the program's logical
memory is currently stored in frame 3 of physical memory.
• A logical address consists of two parts: the number of the page in which the address
resides, and an offset from the beginning of that page. (The number of bits in the page
number limits how many pages a single process can address. The number of bits in the offset
determines the maximum size of each page, and should correspond to the system frame size.)
• The page table maps the page number to a frame number, to yield a physical address
which also has two parts:
1. The frame number and the offset within that frame.
2. The number of bits in the frame number determines how many frames the
system can address, and the number of bits in the offset determines the size
of each frame.
• Page numbers, frame numbers, and frame sizes are determined by the architecture,
but are typically powers of two, allowing addresses to be split at a certain number of
bits.
For example, if the size of the logical address space is 2^m and the page size is 2^n, then
the high-order m-n bits of a logical address designate the page number and the low-order
n bits represent the page offset.
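A small, self-contained C example of splitting a logical address into page number and offset; the 16-bit address size and 10-bit offset are arbitrary values chosen only for illustration.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Assume 16-bit logical addresses (m = 16) and 10-bit offsets (n = 10,
     * i.e. 1 KB pages); the numbers are illustrative. */
    const unsigned n = 10;                        /* offset bits           */
    uint32_t logical = 0x2C7A;                    /* some logical address  */

    uint32_t page   = logical >> n;               /* high-order m-n bits   */
    uint32_t offset = logical & ((1u << n) - 1);  /* low-order n bits      */

    printf("page %u, offset %u\n", (unsigned)page, (unsigned)offset); /* page 11, offset 122 */
    return 0;
}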
• Note that paging is like having a table of relocation registers, one for each page of the
logical memory.
• There is, however, internal fragmentation. Memory is allocated in chunks the size of
a page, and on the average, the last page will only be half full, wasting on the average
half a page of memory per process.
• Larger page sizes waste more memory, but are more efficient in terms of overhead.
Modern trends have been toward larger page sizes, and some systems even support multiple
page sizes to try to get the best of both worlds.
• Consider the following example, in which a process has 16 bytes of logical memory,
mapped in 4 byte pages into 32 bytes of physical memory. (Presumably some other
processes would be consuming the remaining 16 bytes of physical memory. )
When a process requests memory ( e.g. when its code is loaded in from disk ), free
frames are allocated from a free-frame list, and inserted into that process's page table.
• Processes are blocked from accessing anyone else's memory because all of their
memory requests are mapped through their page table. There is no way for them to
generate an address that maps into any other process's memory space.
• The operating system must keep track of each individual process's page table,
updating it whenever the process's pages get moved in and out of memory, and
applying the correct page table when processing system calls for a particular process.
This all increases the overhead involved when swapping processes in and out of the
CPU. ( The currently active page table must be updated to reflect the process that is
currently running. )
Structure of the Page Table
1. Hierarchical Paging
• This structure supports two or more page tables at different levels (tiers).
• Most modern computer systems support logical address spaces of 2^32 to 2^64.
The VAX architecture divides its 32-bit logical address space into four equal-sized sections,
and each page is 512 bytes, yielding an address of the form: a 2-bit section number, a 21-bit
page number, and a 9-bit offset.
With a 64-bit logical address space and 4-KB pages, there are 52 bits' worth of page
numbers, which is still too many entries even for two-level paging. One could increase the
number of paging levels, but with 10 bits per page-table level it would take seven levels of
indirection, making each memory access prohibitively slow. So some other approach must be used.
2. Hashed Page Tables
A common data structure for handling address spaces that are sparsely populated over a broad
range of possible values is the hash table. The virtual page number is hashed into the table;
each entry contains a chain of elements (virtual page number, mapped frame number, pointer to
the next element) that hash to the same location, and the chain is searched for a match.
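A hedged sketch of a chained hashed-page-table lookup in C; the entry layout, table size, and function name are assumptions for illustration only.

#include <stddef.h>
#include <stdint.h>

/* Illustrative chained hashed page table. */
struct hpt_entry {
    uint64_t vpn;              /* virtual page number                   */
    uint64_t frame;            /* frame currently holding that page     */
    struct hpt_entry *next;    /* next entry hashing to the same slot   */
};

#define HPT_SIZE 1024
extern struct hpt_entry *hash_table[HPT_SIZE];

/* Hash the virtual page number, then walk the chain until a matching
 * entry is found; (uint64_t)-1 signals that the page is not mapped. */
uint64_t hpt_lookup(uint64_t vpn)
{
    for (struct hpt_entry *e = hash_table[vpn % HPT_SIZE]; e != NULL; e = e->next)
        if (e->vpn == vpn)
            return e->frame;
    return (uint64_t)-1;       /* not present: would cause a page fault */
}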
3. Inverted Page Tables
• An inverted page table has one entry for each frame of physical memory: the id of the
process owning the page in that frame and its corresponding page number are stored in
each entry.
• Access to an inverted page table can be slow, as it may be necessary to search the
entire table in order to find the desired page.
Segmentation
Most users (programmers) do not think of their programs as existing in one
continuous linear address space.
• Rather they tend to think of their memory in multiple segments, each dedicated to
a particular use, such as code, data, the stack, the heap, etc.
• For example, a C compiler might generate 5 segments for the user code, library code,
global ( static ) variables, the stack, and the heap, as shown in Figure
Hardware
A segment table maps segment-offset addresses to physical addresses, and
simultaneously checks for invalid addresses.
Each entry in the segment table has a segment base and a segment limit. The segment
base contains the starting physical address where the segment resides in memory,
whereas the segment limit specifies the length of the segment.
• A logical address consists of two parts: a segment number, s, and an offset into
that segment, d. The segment number is used as an index to the segment table. The
offset d of the logical address must be between 0 and the segment limit.
When an offset is legal, it is added to the segment base to produce the address in
physical memory of the desired byte.
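A minimal sketch of the segment-table lookup just described; the structure and function names are illustrative, not from the text.

#include <stdint.h>
#include <stdbool.h>

/* Illustrative segment-table entry holding the base and limit described above. */
struct segment_entry {
    uint32_t base;    /* starting physical address of the segment */
    uint32_t limit;   /* length of the segment                    */
};

/* Translate a logical address (s, d) into a physical address.
 * Returns false where the hardware would trap (offset beyond the limit). */
bool segment_translate(const struct segment_entry *table, uint32_t s, uint32_t d,
                       uint32_t *physical)
{
    const struct segment_entry *e = &table[s];  /* s indexes the segment table */
    if (d >= e->limit)
        return false;                           /* invalid offset: trap to OS  */
    *physical = e->base + d;                    /* legal: base + offset        */
    return true;
}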
The segment table is thus essentially an array of base and limit register pairs.
Virtual-Memory Management
In practice, most real processes do not need all their pages, or at least not all at once,
for several reasons:
1. Error handling code is not needed unless that specific error occurs, some of
which are quite rare.
2. Certain features of certain programs are rarely used.
The ability to load only the portions of processes that are actually needed has several
benefits:
Programs could be written for a much larger address space (virtual
memory space) than physically exists on the computer.
Because each process uses only a fraction of its total address space,
there is more memory left for other programs, improving CPU
utilization and system throughput.
Less I/O is needed for swapping processes in and out of RAM, speeding
things up.
The figure below shows the general layout of virtual memory, which can be much
larger than physical memory:
The figure below shows virtual address space, which is the programmer’s logical
view of process memory storage. The actual physical layout is controlled by the
process's page table.
Virtual memory also allows the sharing of files and memory by multiple processes,
with several benefits:
System libraries can be shared by mapping them into the virtual address
space of more than one process.
Processes can also share virtual memory by mapping the same block of
memory to more than one process.
Process pages can be shared during a fork( ) system call, eliminating the
need to copy all of the pages of the original ( parent ) process.
Demand Paging
The basic idea behind demand paging is that when a process is swapped in, its pages are
not all swapped in at once. Rather, the pager loads into memory only those pages that are
needed presently (on demand). The swapper that works at this page granularity is termed a
lazy swapper, although pager is a more accurate term.
Pages that are not loaded into memory are marked as invalid in the page
table, using the invalid bit. Pages loaded in memory are marked as valid.
If the process only ever accesses pages that are loaded in memory
(memory-resident pages), then the process runs exactly as if all of its
pages had been loaded into memory.
On the other hand, if a page is needed that was not originally loaded up, then a page
fault trap is generated, which must be handled in a series of steps:
1. The memory address requested is first checked, to make sure it was a valid memory
request.
2. If the reference is to an invalid page, the process is terminated. Otherwise, if the
page is not present in memory, it must be paged in.
3. A free frame is located, possibly from a free-frame list.
4. A disk operation is scheduled to bring in the necessary page from disk.
5. After the page is loaded to memory, the process's page table is updated with the new
frame number, and the invalid bit is changed to indicate that this is now a valid page
reference.
6. The instruction that caused the page fault must now be restarted from the
beginning.
In an extreme case, the program starts execution with zero pages in memory. Here NO
pages are swapped in for a process until they are requested by page faults. This is
known as pure demand paging.
The hardware necessary to support demand paging is the same as for paging and
swapping: A page table and secondary memory.
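The six steps above can be summarized in a toy C sketch; every type and helper here is hypothetical and only mirrors the numbered list, it is not real kernel code.

#include <stdbool.h>

#define NPAGES 8

struct pte { int frame; bool valid; };
static struct pte page_table[NPAGES];

static int  get_free_frame(void) { static int next; return next++; }              /* step 3 */
static void read_page_from_disk(int page, int frame) { (void)page; (void)frame; } /* step 4 */

/* Returns true if the faulting instruction may be restarted (step 6). */
bool handle_page_fault(int page)
{
    if (page < 0 || page >= NPAGES)      /* steps 1-2: invalid reference?    */
        return false;                    /* the process would be terminated  */

    int frame = get_free_frame();        /* step 3: locate a free frame      */
    read_page_from_disk(page, frame);    /* step 4: schedule the disk read   */
    page_table[page].frame = frame;      /* step 5: update the page table    */
    page_table[page].valid = true;       /*         and mark the entry valid */
    return true;                         /* step 6: restart the instruction  */
}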
Performance of Demand Paging
Assume a memory-access time of 200 nanoseconds and a page-fault service time of 8
milliseconds (8,000,000 ns), and let p be the probability of a page fault. Then
Effective access time = p * (page-fault service time) + (1 - p) * (memory-access time)
= p * 8,000,000 + (1 - p) * 200
= 200 + 7,999,800 * p nanoseconds.
Even if only one access in 1,000 causes a page fault, the effective access time rises from
200 nanoseconds to about 8.2 microseconds, a slowdown by a factor of roughly 40. In order
to keep the slowdown below 10 percent, the page-fault rate must be less than 0.0000025,
that is, less than one fault in 399,990 accesses.
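The arithmetic above can be reproduced with a few lines of C, using the same assumed figures (200 ns memory access, 8 ms fault service time):

#include <stdio.h>

int main(void)
{
    double ma = 200.0, fault_time = 8000000.0;       /* ns */
    double p  = 1.0 / 1000.0;                        /* one fault per 1000 accesses */

    double eat = (1.0 - p) * ma + p * fault_time;    /* effective access time */
    printf("EAT = %.1f ns\n", eat);                  /* about 8199.8 ns (~8.2 us) */
    return 0;
}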
Copy-on-Write
The idea behind copy-on-write is that the pages of a parent process are initially shared
by the child process, until one or the other of the processes changes a page. Only when
one of the processes modifies a shared page is a copy of that page made.
Some systems provide an alternative to the fork( ) system call called a virtual
memory fork, vfork( ). In this case the parent is suspended, and the child
uses the parent's memory pages. This is very fast for process creation, but
requires that the child not modify any of the shared memory pages before
performing the exec( ) system call.
Page Replacement
In order to make the most use of virtual memory, we load several processes into
memory at the same time. Since we load only the pages that are actually needed
by each process at any given time, there are frames free to load many more processes
into memory.
If some process suddenly decides to use more pages and there aren't any free
frames available, there are several possible solutions to consider:
1. Adjust the memory used by I/O buffering, etc., to free up some frames for user
processes.
2. Put the process requesting more pages into a wait queue until some free frames
become available.
3. Swap some process out of memory completely, freeing up its page frames.
4. Find some page in memory that isn't being used right now, and swap that page only
out to disk, freeing up a frame that can be allocated to the process requesting it. This
is known as page replacement, and is the most common solution. There are many
different algorithms for page replacement.
The previously discussed page-fault processing assumed that there would be free
frames available on the free-frame list. Now the page-fault handling must be modified
to free up a frame if necessary, as follows:
1. Find the location of the desired page on the disk.
2. Find a free frame:
a. If there is a free frame, use it.
b. If there is no free frame, use a page-replacement algorithm to select a victim frame.
c. Write the victim frame to the disk; change the page and frame tables accordingly.
3. Read the desired page into the newly freed frame; change the page and frame tables.
4. Restart the user process.
Note that step 2c adds an extra disk write to the page-fault handling, thus doubling the
time required to process a page fault. This can be reduced by assigning a modify bit,
or dirty bit to each page in memory, indicating whether or not it has been changed
since it was last loaded in from disk.
The bit is set whenever the page is modified and left clear otherwise. If the dirty bit has
not been set, then the page is unchanged and does not need to be written out to disk. Many
page-replacement strategies specifically look for pages that do not have their dirty bit set.
There are two major requirements to implement a successful demand paging system.
A frame-allocation algorithm and a page-replacement algorithm.
The former centers around how many frames are allocated to each process, and the
latter deals with how to select a page for replacement when there are no free frames
available.
The overall goal in selecting and tuning these algorithms is to generate the
fewest number of overall page faults. Because disk access is so slow relative to
memory access, even slight improvements to these algorithms can yield large
improvements in overall system performance.
a) FIFO Page Replacement
The simplest page-replacement algorithm is FIFO: when a page must be replaced, the oldest
page (the one brought into memory first) is chosen. A FIFO queue can be created to hold all
pages in memory; as new pages are brought in, they are added to the tail of the queue, and
the page at the head of the queue is the next victim.
In the following example, a reference string is given and there are 3 free frames.
There are 20 page requests, which results in 15 page faults.
Although FIFO is simple and easy to understand, it is not always optimal, or even
efficient. FIFO also suffers from Belady's anomaly, in which increasing the number of
allocated frames can, for some reference strings, actually increase the number of page faults.
b) Optimal Page Replacement
The discovery of Belady's anomaly led to the search for an optimal page-replacement
algorithm, which is simply the one that yields the lowest possible page-fault rate and
does not suffer from Belady's anomaly.
Such an algorithm does exist, and is called OPT or MIN. This algorithm is
"Replace the page that will not be used for the longest time in the future."
The same reference string used for the FIFO example is used in the example
below, here the minimum number of possible page faults is 9.
C) LRU Page Replacement
The LRU (Least Recently Used) algorithm, predicts that the page that has
not been used in the longest time is the one that will not be used again in the
near future.
Some view LRU as analogous to OPT, but here we look backwards in time
instead of forwards.
The figure illustrates LRU for our sample string, yielding 12 page faults (as
compared to 15 for FIFO and 9 for OPT).
LRU is considered a good replacement policy, and is often used. There are two simple
approaches commonly used to implement it:
1. Counters. Associate with each page-table entry a time-of-use field, and add a logical
clock or counter to the CPU. Every memory reference copies the clock into the time-of-use
field of the referenced page, and the page with the smallest time value is replaced.
2. Stack. Another approach is to use a stack, and whenever a page is accessed, pull
that page from the middle of the stack and place it on the top. The LRU page will always
be at the bottom of the stack. Because this requires removing objects from the middle
of the stack, a doubly linked list is the recommended data structure.
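A hedged sketch of the stack approach using a doubly linked list, as suggested above; the structures and function name are assumptions for the example.

#include <stddef.h>

/* Illustrative doubly linked "stack" for LRU: the most recently used page
 * sits at the head, and the least recently used page (the next victim)
 * sits at the tail. */
struct node { int page; struct node *prev, *next; };
struct lru_stack { struct node *head, *tail; };

/* On every reference, unlink the node from wherever it is and push it on
 * top, so the tail always holds the LRU page. */
void lru_touch(struct lru_stack *s, struct node *n)
{
    if (s->head == n)
        return;                               /* already the most recent */

    /* unlink n from its current position */
    if (n->prev) n->prev->next = n->next;
    if (n->next) n->next->prev = n->prev;
    if (s->tail == n) s->tail = n->prev;

    /* push n on top of the stack */
    n->prev = NULL;
    n->next = s->head;
    if (s->head) s->head->prev = n;
    s->head = n;
    if (s->tail == NULL) s->tail = n;
}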
d) LRU-Approximation Page Replacement
The basic second-chance algorithm is a FIFO replacement algorithm augmented with a
hardware-set reference bit. If a page is found with its reference bit as '0', then that
page is selected as the next victim.
If the reference bit value is '1', then the page is given a second chance and its
reference bit is cleared (set to '0').
Thus, a page that is given a second chance will not be replaced until all other
pages have been replaced (or given second chances). In addition, if a page is
used often, then it sets its reference bit again.
This algorithm is also known as the clock algorithm.
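A minimal sketch of clock (second-chance) victim selection as described above; the fixed frame array and field names are assumptions for the example.

#include <stdbool.h>

#define NFRAMES 4

struct frame { int page; bool reference; };
static struct frame frames[NFRAMES];
static int hand;                           /* the clock hand */

/* Advance the hand, clearing reference bits, until a frame whose bit is 0
 * is found; that frame holds the next victim page. */
int select_victim(void)
{
    for (;;) {
        if (!frames[hand].reference) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        frames[hand].reference = false;    /* give the page a second chance */
        hand = (hand + 1) % NFRAMES;
    }
}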
d.3 Enhanced Second-Chance Algorithm
The enhanced second-chance algorithm looks at the reference bit and the modify
bit (dirty bit) as an ordered pair, and classifies pages into one of four classes:
1. (0, 0) - neither recently used nor modified: the best page to replace.
2. (0, 1) - not recently used but modified: must be written out before replacement.
3. (1, 0) - recently used but clean: probably will be used again soon.
4. (1, 1) - recently used and modified: probably will be used again soon, and must be
written out before replacement.
This algorithm searches the page table in a circular fashion, looking for the first
page it can find in the lowest-numbered non-empty class, i.e. it first makes a pass
looking for a (0, 0) page, and then, if it cannot find one, it makes another pass looking
for a (0, 1) page, etc.
The main difference between this algorithm and the previous one is the
preference for replacing clean pages if possible.
e) Counting-Based Page Replacement
There are several algorithms based on counting the number of references that have
been made to a given page, such as:
Least Frequently Used, LFU: Replace the page with the lowest
reference count. A problem can occur if a page is used frequently initially
and then not used any more, as the reference count remains high. A
solution to this problem is to right-shift the counters periodically,
yielding a time-decaying average reference count.
Most Frequently Used, MFU: Replace the page with the highest
reference count. The logic behind this idea is that pages that have already
been referenced a lot have been in the system a long time, and we are
probably done with them, whereas pages referenced only a few times
have only recently been loaded, and we still need them.
f) Page-Buffering Algorithms
Maintain a certain minimum number of free frames at all times. When a page-
fault occurs, go ahead and allocate one of the free frames from the free list first,
so that the requesting process is in memory as early as possible, and then select
a victim page to write to disk and free up a frame.
Keep a list of modified pages, and when the I/O system is idle, these pages are
written to disk, and then clear the modify bits, thereby increasing the chance of
finding a "clean" page for the next potential victim and page replacement can
be done much faster.
Allocation of Frames
Allocation Algorithms
After the OS has been loaded, there are two common ways in which frames can be allocated
to processes: equal allocation, in which each of the n processes gets an equal share m/n of
the m available frames, and proportional allocation, in which frames are allocated in
proportion to each process's size.
Consider a system with a 1 KB frame size, in which a small student process of 10 KB and an
interactive database of 127 KB are the only two processes running, and 62 frames are free.
With proportional allocation, we would split the 62 frames between the two processes as
follows:
m = 62, S = 10 + 127 = 137
Allocation for process 1 = 62 x 10 / 137 ~ 4
Allocation for process 2 = 62 x 127 / 137 ~ 57
Thus the student process and the database are allocated 4 frames and 57 frames respectively.
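The proportional-allocation arithmetic can be checked with a short C program using the same numbers (m = 62 frames, sizes 10 and 127):

#include <stdio.h>

int main(void)
{
    /* Proportional allocation a_i = (s_i / S) * m with the numbers above. */
    int s[2] = { 10, 127 };
    int m = 62, S = s[0] + s[1];

    for (int i = 0; i < 2; i++) {
        int a = s[i] * m / S;            /* integer truncation, as in the text */
        printf("process %d gets %d frames\n", i + 1, a);   /* 4 and 57 */
    }
    return 0;
}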
With global page replacement, a process can select a replacement frame from the set of all
frames, even one currently allocated to another process; with local replacement, each process
selects only from its own set of allocated frames. Global page replacement is overall more
efficient, and is the more commonly used approach.
Thrashing
Thrashing is the state of a process where there is high paging activity.
A process that is spending more time paging than executing is said to be thrashing.
Cause of Thrashing
When memory fills up and processes start spending lots of time waiting for their pages
to be paged in, CPU utilization decreases (processes are not executing because they are
waiting for pages), causing the scheduler to add even more processes and increase the
degree of multiprogramming even further.
Thrashing has occurred, and system throughput plunges. No work is getting
done, because the processes are spending all their time paging.
In the graph given below, CPU utilization is plotted against the degree of
multiprogramming. As the degree of multiprogramming increases, CPU
utilization also increases, although more slowly, until a maximum is reached. If
the degree of multiprogramming is increased even further, thrashing sets in,
and CPU utilization drops sharply. At this point, to increase CPU utilization and
stop thrashing, we must decrease the degree of multiprogramming.
Local page replacement policies can prevent thrashing process from taking pages away
from other processes, but it still tends to clog up the I/O queue.
Working-Set Model
The working set model is based on the concept of locality, and defines a working
set window, of length delta. Whatever pages are included in the most recent delta
page references are said to be in the processes working set window, and comprise its
current working set, as illustrated in Figure
The selection of delta is critical to the success of the working set model - If it is
too small then it does not encompass all of the pages of the current locality, and
if it is too large, then it encompasses pages that are no longer being frequently
accessed.
The total demand for frames, D, is the sum of the working-set sizes of all processes
(D = sum of WSSi). If D exceeds the total number of available frames, then at least one
process is thrashing, because there are not enough frames available to satisfy its minimum
working set. If D is significantly less than the number of currently available frames, then
additional processes can be launched.
The hard part of the working-set model is keeping track of what pages are in the
current working set, since every reference adds one to the set and removes one
older page.
Page-Fault Frequency
When page-fault rate is too high, the process needs more frames and when it is too
low, the process may have too many frames.
The upper and lower bounds can be established on the page-fault rate. If the actual
page-fault rate exceeds the upper limit, allocate the process another frame or suspend
the process. If the page-fault rate falls below the lower limit, remove a frame from the
process. Thus, we can directly measure and control the page-fault rate to prevent
thrashing.
Allocating Kernel Memory
When a process running in user mode requests additional memory, pages are allocated from
the list of free page frames maintained by the kernel. This list is typically populated
using a page-replacement algorithm and most likely contains free pages scattered throughout
physical memory, as explained earlier.
Remember, too, that if a user process requests a single byte of memory, internal
fragmentation will result, as the process will be granted an entire page frame. Kernel
memory, however, is often allocated from a free-memory pool different from the list used
to satisfy ordinary user-mode processes, for two primary reasons:
1. The kernel requests memory for data structures of varying sizes, some of which are
less than a page in size. As a result, the kernel must use memory conservatively and
attempt to minimize waste due to fragmentation. This is especially important because
many operating systems do not subject kernel code or data to the paging system.
2. Certain hardware devices interact directly with physical memory, without the benefit of
a virtual memory interface, and consequently may require memory residing in physically
contiguous pages. In the following sections, we examine two strategies for managing free
memory that is assigned to kernel processes.
1. Buddy System
The buddy system allocates memory from a fixed-size segment consisting of physically
contiguous pages. Memory is allocated from this segment using a power-of-2 allocator, which
satisfies requests in units sized as a power of 2 (4 KB, 8 KB, 16 KB, and so forth).
A request that is not an exact power of 2 in size is rounded up to the next highest power
of 2.
For example, if a request for 11 KB is made, it is satisfied with a 16-KB segment. Next,
we explain the operation of the buddy system with a simple example. Let's assume
the size of a memory segment is initially 256 KB and the kernel requests 21 KB of
memory.
The segment is initially divided into two buddies, which we will call AL and AR, each
128 KB in size. One of these buddies is further divided into two 64-KB buddies, BL and BR.
The next-highest power of 2 above 21 KB is 32 KB, so one of the 64-KB buddies is divided
again into two 32-KB buddies, CL and CR, one of which is used to satisfy the 21-KB request.
An obvious drawback of the buddy system is that rounding up to the next highest power of 2
is very likely to cause fragmentation within allocated segments. For example, a 33-KB
request can only be satisfied with a 64-KB segment. In fact, we cannot guarantee that less
than 50 percent of the allocated unit will be wasted due to internal fragmentation. In the
following section, we explore a memory allocation scheme where no space is lost due to
fragmentation.
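The power-of-2 rounding that causes this fragmentation can be sketched in a few lines of C (sizes in KB, purely illustrative):

#include <stdio.h>

/* Round a request up to the next power of 2, as the buddy system's
 * power-of-2 allocator does. */
static unsigned next_pow2(unsigned kb)
{
    unsigned size = 1;
    while (size < kb)
        size <<= 1;
    return size;
}

int main(void)
{
    printf("11 KB request -> %u KB segment\n", next_pow2(11));  /* 16 KB */
    printf("21 KB request -> %u KB segment\n", next_pow2(21));  /* 32 KB */
    printf("33 KB request -> %u KB segment\n", next_pow2(33));  /* 64 KB */
    return 0;
}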
2. Slab Allocation
A slab is made up of one or more physically contiguous pages. A cache consists of one
or more slabs.
There is a single cache for each unique kernel data structure —for example, a separate
cache for the data structure representing process descriptors, a separate cache for file
objects, a separate cache for semaphores, and so forth. Each cache is populated with
objects that are instantiations of the kernel data structure the cache represents.
The slab-allocation algorithm uses caches to store kernel objects. The figure shows two
kernel objects 3 KB in size and three objects 7 KB in size, stored in their respective
caches.
As another example, a 12-KB slab (comprising three contiguous 4-KB pages) could store six
2-KB objects. Initially, all objects in the cache are marked as free. When a new object for
a kernel data structure is needed, the allocator can assign any free object from the cache
to satisfy the request.
Let's consider a scenario in which the kernel requests memory from the slab allocator
for an object representing a process descriptor. In Linux systems, a process descriptor
is of type struct task_struct, which requires approximately 1.7 KB of memory.
When the Linux kernel creates a new task, it requests the necessary memory for the
struct task_struct object from its cache. The cache fulfills the request using a struct
task_struct object that has already been allocated in a slab and is marked as free.
The slab allocator first attempts to satisfy the request with a free object in a partial
slab. If none exist, a free object is assigned from an empty slab. If no empty slabs are
available, a new slab is allocated from contiguous physical pages and assigned to a
cache; memory for the object is allocated from this slab.
The slab allocator provides two main benefits:
1. No memory is wasted due to fragmentation, because each unique kernel data structure has
an associated cache, and each cache is made up of chunks sized to match the objects it stores.
2. Memory requests can be satisfied quickly. The slab allocation scheme is thus
particularly effective for managing memory where objects are frequently allocated
and deallocated, as is often the case with requests from the kernel.
Because of its general-purpose nature, this allocator is now also used for certain user-
mode memory requests in Solaris. Linux originally used the buddy system; however,
beginning with version 2.2, the Linux kernel adopted the slab allocator.
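For reference, kernel code typically uses the Linux slab interface roughly as sketched below; struct my_object and the cache name are made-up examples, while kmem_cache_create(), kmem_cache_alloc(), and kmem_cache_free() are the real kernel calls.

#include <linux/slab.h>
#include <linux/errno.h>

/* Illustrative kernel data structure with its own cache. */
struct my_object { int id; char name[32]; };

static struct kmem_cache *my_cache;

static int my_cache_setup(void)
{
    /* One cache per unique kernel data structure, holding pre-sized objects. */
    my_cache = kmem_cache_create("my_object_cache",
                                 sizeof(struct my_object), 0, 0, NULL);
    return my_cache ? 0 : -ENOMEM;
}

static void my_cache_use(void)
{
    /* The allocator returns a free object from a partial or empty slab. */
    struct my_object *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
    if (obj)
        kmem_cache_free(my_cache, obj);   /* mark the object free again */
}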
Consider a sequential read of a file on disk using the standard system calls open(),
read(), and write(). Each file access requires a system call and disk access.
Alternatively, we can use the virtual memory techniques discussed so far to treat file
I/O as routine memory accesses. This approach, known as memory mapping a file,
allows a part of the virtual address space to be logically associated with the file. As we
shall see, this can lead to significant performance increases when performing I/O.
There is a direct relationship between the working set of a process and its page-fault
rate.
Typically, as shown in the figure, the working set of a process changes over time as
references to data and code sections move from one locality to another. Assuming there is
sufficient memory to store the working set of a process (that is, the process is not
thrashing), the page-fault rate of the process will transition between peaks and valleys
over time. This general behaviour is shown in the figure.
A peak in the page-fault rate occurs when we begin demand-paging a new locality.
However, once the working set of this new locality is in memory, the page-fault rate
falls. When the process moves to a new working set, the page fault rate rises toward a
peak once again, returning to a lower rate once the new working set is loaded into
memory.
The span of time between the start of one peak and the start of the next peak represents
the transition from one working set to another.
Basic Mechanism
Memory mapping a file is accomplished by mapping a disk block to a page (or pages)
in memory. Initial access to the file proceeds through ordinary demand paging,
resulting in a page fault. However, a page-sized portion of the file is read from the file
system into a physical page (some systems may opt to read in more than a page-sized
chunk of memory at a time).
Subsequent reads and writes to the file are handled as routine memory accesses,
thereby simplifying file access and usage by allowing the system to manipulate files
through memory rather than incurring the overhead of using the read() and write()
system calls. Similarly, as file I/O is done in memory, as opposed to using system calls
that involve disk I/O, file access is much faster as well. Note that writes to the file
mapped in memory are not necessarily immediate (synchronous) writes to the file on
disk.
Some systems may choose to update the physical file when the operating system
periodically checks whether the page in memory has been modified. When the file is
closed, all the memory-mapped data are written back to disk and removed from the
virtual memory of the process.
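On POSIX systems, memory mapping a file for reading looks roughly like the sketch below; the file name data.txt is just an example.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.txt", O_RDONLY);          /* example file name */
    if (fd < 0) { perror("open"); return 1; }

    struct stat sb;
    fstat(fd, &sb);

    /* Map the whole file into the address space; later reads of p[] are
     * ordinary memory accesses, demand-paged in from the file. */
    char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    fwrite(p, 1, sb.st_size, stdout);             /* touching pages faults them in */

    munmap(p, sb.st_size);
    close(fd);
    return 0;
}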
Some operating systems provide memory mapping only through a specific system call
and use the standard system calls to perform all other file I/O.
However, some systems choose to memory-map a file regardless of whether the file
was specified as memory-mapped. Let's take Solaris as an example. If a file is specified
as memory-mapped (using the mmap () system call), Solaris maps the file into the
address space of the process. If a file is opened and accessed using ordinary system
calls, such as open(), read(), and write(), Solaris still memory-maps the file; however,
the file is mapped to the kernel address space.
Regardless of how the file is opened, then, Solaris treats all file I/O as memory-
mapped, allowing file access to take place via the efficient memory subsystem.
Multiple processes may be allowed to map the same file concurrently, to allow sharing
of data. Writes by any of the processes modify the data in virtual memory and can be
seen by all others that map the same section of the file. Given our earlier discussions
of virtual memory, it should be clear how the sharing of memory-mapped sections of
memory is implemented: the virtual memory map of each sharing process points to
the same page of physical memory: the page that holds a copy of the disk block. This
memory sharing is illustrated in the figure.
The memory-mapping system calls can also support copy-on-write functionality,
allowing processes to share a file in read-only mode but to have their own copies of
any data they modify. So that access to the shared data is coordinated, the processes
involved might use one of the mechanisms for achieving mutual exclusion described.
We next illustrate these steps in more detail. In this example, a producer process first
creates a shared-memory object using the memory-mapping features available in the
Win32 API. The producer then writes a message to shared memory. After that, a
consumer process opens a mapping to the shared-memory object and reads the
message written by the producer. To establish a memory-mapped file, a process first
opens the file to be mapped with the CreateFile () function, which returns a HANDLE
to the opened file.
The process then creates a mapping of this file HANDLE using the
CreateFileMapping() function. Once the file mapping is established, the process then
establishes a view of the mapped file in its virtual address space with the
MapViewOfFile() function. The view of the mapped file represents the portion of the
file being mapped into the virtual address space of the process; the entire file or only a
portion of it may be mapped.
The MapViewOfFile() function returns a pointer to the shared-memory object; any
accesses to this memory location are thus accesses to the memory-mapped file. In this
instance, the producer process writes the message "Shared memory message" to
shared memory.
A program illustrating how the consumer process establishes a view of the named
shared-memory object is shown. This program is somewhat simpler than the producer's,
as all that is necessary is for the process to create a mapping to the existing
named shared-memory object. The consumer process must also create a view of the
mapped file, just as the producer process did. The consumer then reads from shared
memory the message "Shared memory message" that was written by the producer process.
Finally, both processes remove the view of the mapped file with a call to
UnmapViewOfFile (). We provide a programming exercise at the end of this chapter
using shared memory with memory mapping in the Win32 API.
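A condensed sketch of the producer side using the Win32 calls named above; the object name "SharedObject" and the 4-KB size are assumptions for the example.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Create a named shared-memory object backed by the paging file. */
    HANDLE hMap = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                                    PAGE_READWRITE, 0, 4096,
                                    TEXT("SharedObject"));
    if (hMap == NULL) return 1;

    /* Map a view of the object into this process's virtual address space. */
    LPVOID view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if (view == NULL) return 1;

    sprintf((char *)view, "Shared memory message");  /* the producer's write */

    /* A consumer would OpenFileMapping(FILE_MAP_ALL_ACCESS, FALSE,
     * TEXT("SharedObject")) and MapViewOfFile() to read the message while
     * the producer still holds the mapping open. */
    UnmapViewOfFile(view);
    CloseHandle(hMap);
    return 0;
}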
Memory-Mapped I/O
In the case of I/O, as mentioned in Section 1.2.1, each I/O controller includes registers
to hold commands and the data being transferred. Usually, special I/O instructions
allow data transfers between these registers and system memory.
To allow more convenient access to I/O devices, many computer architectures provide
memory-mapped I/O. In this case, ranges of memory addresses are set aside and are
mapped to the device registers. Reads and writes to these memory addresses cause the
data to be transferred to and from the device registers. This method is appropriate for
devices that have fast response times, such as video controllers. In the IBM PC, each
location on the screen is mapped to a memory location. Displaying text on the screen
is almost as easy as writing the text into the appropriate memory-mapped locations.
Memory-mapped I/O is also convenient for other devices, such as the serial and
parallel ports used to connect modems and printers to a computer. The CPU transfers
data through these kinds of devices by reading and writing a few device registers,
called an I/O port. To send out a long string of bytes through a memory-mapped serial
port, the CPU writes one data byte to the data register and sets a bit in the control
register to signal that the byte is available. The device takes the data byte and then
clears the bit in the control register to signal that it is ready for the next byte. Then the
CPU can transfer the next byte. If the CPU uses polling to watch the control bit,
constantly looping to see whether the device is ready, this method of operation is
called programmed I/O.
If the CPU does not poll the control bit, but instead receives an interrupt when the
device is ready for the next byte, the data transfer is said to be interrupt driven.
*******All The Best*******