Module 3 Memory
The main memory of a computer is semiconductor memory. The main memory unit of a computer basically consists of two kinds of memory:
RAM : Random access memory; which is volatile in nature.
ROM: Read only memory; which is non-volatile.
The permanent information is kept in ROM, and the user space is basically in RAM.
The smallest unit of information is known as a bit (binary digit), and in one memory cell we can store one bit of information. 8 bits together are termed a byte.
The maximum size of main memory that can be used in any computer is determined by the addressing scheme.
A computer that generates 16-bit addresses is capable of addressing up to 2^16 = 64K memory locations.
Similarly, for 32-bit addresses, the total capacity will be 2^32 = 4G memory locations.
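As a quick sanity check of these figures, the short sketch below (illustrative only) works out the number of addressable locations for the two address sizes mentioned above.

# Number of addressable locations for a k-bit address is 2^k.
for bits in (16, 32):
    locations = 2 ** bits
    print(bits, "bit address ->", locations, "locations")
# 16-bit address -> 65536 locations (64K)
# 32-bit address -> 4294967296 locations (4G)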
In some computers, the smallest addressable unit of information is a memory word, and the machine is called word-addressable.
In other computers, an individual address is assigned to each byte of information, and such a machine is called a byte-addressable computer. In this case, one memory word contains one or more memory bytes which can be addressed individually.
In a byte-addressable 32-bit computer, each memory word contains 4 bytes. A possible way of address assignment is shown in the figure. The address of a word is always an integer multiple of 4.
The main memory is usually designed to store and
retrieve data in word length quantities. The word length of
a computer is generally defined by the number of bits
actually stored or retrieved in one main memory access.
Consider a machine with a 32-bit address bus. If the word size is 32 bits, then the high-order 30 bits specify the address of a word. If we want to access any particular byte of the word, it is specified by the lower two bits of the address bus.
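A minimal sketch of this split is shown below; the example address is assumed purely for illustration.

def split_byte_address(addr):
    word_address = addr >> 2      # high-order 30 bits: which 4-byte word
    byte_offset = addr & 0b11     # low-order 2 bits: which byte within that word
    return word_address, byte_offset

print(split_byte_address(11))     # (2, 3): byte 3 of the word starting at byte address 8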
The data transfer between main memory and the CPU takes place through two CPU
registers.
MAR : Memory Address Register
MDR : Memory Data Register
If the MAR is k bits long, then the total number of addressable memory locations will be 2^k.
If the MDR is n bits long, then n bits of data are transferred in one memory cycle.
The transfer of data takes place through the memory bus, which consists of an address bus and a data bus. In the above example, the size of the data bus is n bits and the size of the address bus is k bits.
It also includes control lines like Read, Write and Memory Function Complete (MFC) for coordinating data transfer. In the case of a byte-addressable computer, another control line is added to indicate a byte transfer instead of a whole-word transfer.
The CPU initiates a memory operation by loading the appropriate address into the MAR.
If it is a memory read operation, the CPU sets the read memory control line to 1. The contents of the memory location are then brought to the MDR, and the memory control circuitry indicates this to the CPU by setting MFC to 1.
If the operation is a memory write operation, the CPU places the data into the MDR and sets the write memory control line to 1. Once the contents of the MDR are stored in the specified memory location, the memory control circuitry indicates the end of the operation by setting MFC to 1.
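The toy sketch below (not any specific hardware, just an illustration of the handshake described above) models these two operations with a small word-addressed memory.

memory = [0] * 16      # a tiny word-addressed main memory
MFC = False            # Memory Function Complete flag

def memory_read(mar):
    global MFC
    mdr = memory[mar]  # memory places the addressed word into the MDR
    MFC = True         # memory signals completion to the CPU
    return mdr

def memory_write(mar, mdr):
    global MFC
    memory[mar] = mdr  # contents of the MDR are stored at the addressed location
    MFC = True         # completion is again signalled through MFC

memory_write(5, 0xABCD)
print(hex(memory_read(5)))   # 0xabcd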
A useful measure of the speed of a memory unit is the time that elapses between the initiation of an operation and its completion (for example, the time between Read and MFC). This is referred to as the Memory Access Time. Another measure is the memory cycle time. This is the minimum time delay between the initiation of two independent memory operations (for example, two successive memory read operations). The memory cycle time is slightly larger than the memory access time.
Binary Storage Cell:
Both static and dynamic RAMs are volatile, that is, they retain the information only as long as the power supply is applied.
A dynamic memory cell is simpler and smaller than a static memory cell. Thus a DRAM is denser, i.e., its packing density is higher (more cells per unit area). A DRAM is also less expensive than the corresponding SRAM.
DRAM requires supporting refresh circuitry. For larger memories, the fixed cost of the refresh circuitry is more than compensated for by the lower cost of the DRAM cells.
SRAM cells are generally faster than DRAM cells. Therefore, to construct faster memory modules (like cache memory), SRAM is used.
Internal Organization of Memory Chips
Consider a memory chip consisting of 16 words of 8 bits each, usually referred to as a 16 x 8 organization. The data input and data output lines of each Sense/Write circuit are connected to a single bidirectional data line in order to reduce the number of pins required. For 16 words, we need an address bus of size 4. In addition to the address and data lines, two control lines, R/W and CS, are provided. The R/W line is used to specify whether a read or a write operation is required. The CS (Chip Select) line is required to select a given chip in a multi-chip memory system.
Consider a slightly larger memory unit that has 1K (1024) memory cells...
128 x 8 memory chips:
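The example above is truncated in the source; the sketch below only works out the arithmetic implied by the numbers given. A 1K-word memory built from 128 x 8 chips needs 1024/128 = 8 chips, so a 10-bit address can be split into a 3-bit chip-select field and a 7-bit address within the selected chip.

def decode_1k_address(addr):
    chip_select = addr >> 7      # high-order 3 bits drive the CS lines (one of 8 chips)
    within_chip = addr & 0x7F    # low-order 7 bits select one of the 128 words on that chip
    return chip_select, within_chip

print(decode_1k_address(0b101_0000011))   # chip 5, word 3 on that chip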
Cache Memory
The memory control circuitry is designed to take advantage of the property of locality of reference. Some assumptions are made while designing the memory control circuitry:
1. The CPU does not need to know explicitly about the existence of the cache.
2. The CPU simply makes Read and Write requests. The nature of these two operations is the same whether the cache is present or not.
3. The addresses generated by the CPU always refer to locations in main memory.
4. The memory access control circuitry determines whether or not the requested word currently exists in the cache.
When a Read request is received from the CPU, the contents of a block of memory words containing the
location
specified are transferred into the cache. When any of the locations in this block is referenced by the
program, its
contents are read directly from the cache.
The cache memory can store a number of such blocks at any given time.
The correspondence between the Main Memory Blocks and those in the cache is specified by means of a
mapping
function.
When the cache is full and a memory word is referenced that is not in the cache, a decision must be made as to which block should be removed from the cache to create space for the new block that contains the referenced word. Replacement algorithms are used to make the proper selection of the block that must be replaced by the new one.
When a write request is received from the CPU, there are two ways that the system can proceed. In the first case,
the cache location and the main memory location are updated simultaneously. This is called the store through method
or write through method.
The alternative is to update the cache location only. At replacement time, the cache block is written back to the main memory. If there has been no write operation on the cache block, it is not necessary to write it back to main memory. This information can be kept with the help of an associated bit. This bit is set when there is a write operation on the cache block. During replacement, this bit is checked: if it is set, the cache block is written back to main memory, otherwise not. This bit is known as the dirty bit. If the bit is dirty (set to one), writing back to main memory is required.
The write through method is simpler, but it results in unnecessary write operations in the main memory when a given cache word is updated a number of times during its cache residency period.
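The toy sketch below (dictionaries standing in for the cache and main memory, purely for illustration) contrasts the two write policies and the role of the dirty bit.

cache = {}         # block number -> (data, dirty_bit)
main_memory = {}   # block number -> data

def write_through(block, data):
    cache[block] = (data, False)   # cache and main memory are updated together
    main_memory[block] = data

def write_back(block, data):
    cache[block] = (data, True)    # only the cache is updated; dirty bit is set

def replace(block):
    data, dirty = cache.pop(block)
    if dirty:                      # write back only if the block was modified
        main_memory[block] = data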
Consider the case where the addressed word is not in the cache and the operation is a read. First the block of words containing it is brought to the cache, and then the requested word is forwarded to the CPU. Alternatively, the word can be forwarded to the CPU as soon as it arrives at the cache, instead of waiting for the whole block to be loaded. This is called load through, and there is some scope to save time by using the load-through policy.
During a write operation, if the addressed word is not in the cache, the information is written directly into the main memory. A write operation normally refers to locations in data areas, and the property of locality of reference is not as pronounced in accessing data when a write operation is involved. Therefore, it is not advantageous to bring the data block into the cache when there is a write operation and the addressed word is not present in the cache.
Mapping functions
The mapping functions are used to map a particular block of main memory to a particular block of cache. This mapping function is used to transfer the block from main memory to cache memory. Three different mapping functions are available:
Direct mapping:
A particular block of main memory can be brought only to a particular block of cache memory. So, it is not flexible.
Associative mapping:
In this mapping function, any block of main memory can potentially reside in any cache block position. This is a much more flexible mapping method.
Block-set-associative mapping:
In this method, blocks of cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. From the flexibility point of view, it lies between the other two methods.
All these three mapping methods are explained with the help of an
example.
Consider a cache of 4096 (4K) words with a block size of 32 words. Therefore, the cache is organized as 128 blocks. For 4K words, 12 address lines are required. To select one block out of 128 blocks, we need 7 address bits, and to select one word out of 32 words, we need 5 address bits. So the total 12 bits of address are divided into two groups: the lower 5 bits are used to select a word within a block, and the higher 7 bits are used to select a block of cache memory.
Let us consider a main memory system consisting of 64K words. The size of the address bus is 16 bits. Since the block size of the cache is 32 words, the main memory is also organized in blocks of 32 words. Therefore, the total number of blocks in main memory is 2048 (2K x 32 words = 64K words). To identify any one of these 2K blocks, we need 11 address bits. Out of the 16 address lines of main memory, the lower 5 bits are used to select a word within a block and the higher 11 bits are used to select a block out of 2048 blocks.
The number of blocks in cache memory is 128 and the number of blocks in main memory is 2048, so at any instant of time only 128 of the 2048 blocks can reside in the cache. Therefore, we need a mapping function to put a particular block of main memory into an appropriate block of cache memory.
Direct Mapping Technique:
The simplest way of associating main memory blocks with cache blocks is the direct mapping technique. In this technique, block k of main memory maps into block k modulo m of the cache, where m is the total number of blocks in the cache. In this example, the value of m is 128. In the direct mapping technique, one particular block of main memory can be transferred only to the particular block of cache that is derived by the modulo function.
Since more than one main memory block is mapped onto a given cache block position, contention may arise for that position. This situation may occur even when the cache is not full. Contention is resolved by allowing the new block to overwrite the currently resident block, so the replacement algorithm is trivial.
The detailed operation of the direct mapping technique is as follows:
The main memory address is divided into three fields. The field sizes depend on the memory capacity and the block size of the cache. In this example, the lower 5 bits of the address are used to identify a word within a block. The next 7 bits are used to select a block out of 128 blocks (which is the capacity of the cache). The remaining 4 bits are used as a TAG to identify the proper block of main memory that is mapped to the cache.
When a new block is first brought into the cache, the high order 4 bits of the main memory address are stored in
four
TAG bits associated with its location in the cache. When the CPU generates a memory request, the 7-bit block
address determines the corresponding cache block. The TAG field of that block is compared to the TAG field of the
address. If they match, the desired word specified by the low-order 5 bits of the address is in that block of the
cache.
If there is no match, the required word must be accessed from the main memory; that is, the contents of that cache block are replaced by the new block specified by the address generated by the CPU, and correspondingly the TAG bits are changed to the high-order 4 bits of that address. The whole arrangement for the direct mapping technique is shown in the figure.
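As a minimal sketch of this address split (the example address is assumed purely for illustration), a 16-bit main memory address can be decomposed into its TAG, cache block and word fields as follows.

def direct_map_fields(addr):
    word = addr & 0x1F                  # low-order 5 bits: word within the block
    cache_block = (addr >> 5) & 0x7F    # next 7 bits: one of the 128 cache blocks
    tag = addr >> 12                    # high-order 4 bits: TAG stored with the block
    return tag, cache_block, word

# main memory block k always lands in cache block k mod 128
print(direct_map_fields(0b1011_0000101_10011))   # (11, 5, 19)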
Associative Mapping Technique:
In the associative mapping technique, a main memory block can potentially reside in any cache block position. In this case, the main memory address is divided into two groups: the low-order bits identify the location of a word within a block and the high-order bits identify the block. In the example here, 11 bits are required to identify a main memory block when it is resident in the cache, so the high-order 11 bits are used as TAG bits and the low-order 5 bits are used to identify a word within a block. The TAG bits of an address received from the CPU must be compared to the TAG bits of each block of the cache to see if the desired block is present.
In associative mapping, any block of main memory can go to any block of the cache, so it has complete flexibility, and we have to use a proper replacement policy to replace a block from the cache if the currently accessed block of main memory is not present in the cache. It might not be practical to use this complete flexibility of the associative mapping technique due to the searching overhead, because the TAG field of the main memory address has to be compared with the TAG field of all the cache blocks. In this example, there are 128 blocks in the cache and the size of the TAG is 11 bits. The whole arrangement of the associative mapping technique is shown in the figure (next page).
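A minimal sketch of this lookup is given below (a dictionary stands in for the 128 parallel TAG comparators, purely for illustration).

cache_tags = {}   # cache block index -> TAG of the main memory block it currently holds

def associative_lookup(addr):
    tag = addr >> 5          # high-order 11 bits identify the main memory block
    word = addr & 0x1F       # low-order 5 bits identify the word within the block
    for blk, stored_tag in cache_tags.items():   # TAG compared against every cache block
        if stored_tag == tag:
            return ("hit", blk, word)
    return ("miss", tag, word)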
Block-Set-Associative Mapping Technique:
This mapping technique is intermediate between the above two techniques. Blocks of the cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. Therefore, the flexibility of associative mapping is reduced from full freedom to a set of specific blocks. This also reduces the searching overhead, because the search is restricted to the blocks of one set instead of all the blocks of the cache. Also, the contention problem of direct mapping is eased by having a few choices for block replacement.
Consider the same cache memory and main memory organization of the previous example, and organize the cache with 4 blocks in each set. The TAG field of the associative mapping technique is divided into two groups: one is termed the SET field and the other the TAG field. Since each set contains 4 blocks, the total number of sets is 32. The main memory address is grouped into three parts: the low-order 5 bits are used to identify a word within a block. Since there are 32 sets in total, the next 5 bits are used to identify the set. The high-order 6 bits are used as TAG bits.
The 5-bit set field of the address determines which set of the cache might contain the desired block. This is similar to the direct mapping technique: in the case of direct mapping it selects a block, but in the case of block-set-associative mapping it selects a set. The TAG field of the address must then be compared with the TAGs of the four blocks of that set. If a match occurs, the block is present in the cache; otherwise the block containing the addressed word must be brought into the cache. This block can only go to the corresponding set. Since there are four blocks in the set, we have to choose appropriately which block to replace if all the blocks are occupied. Since the search is restricted to four blocks only, the search complexity is reduced. The whole arrangement of the block-set-associative mapping technique is shown in the figure.
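A minimal sketch of this 4-way set-associative split and lookup is shown below (the data structures are illustrative only).

cache_sets = {s: [] for s in range(32)}   # set index -> TAGs of the blocks held in that set

def set_associative_lookup(addr):
    word = addr & 0x1F               # low-order 5 bits: word within the block
    set_index = (addr >> 5) & 0x1F   # next 5 bits: one of the 32 sets
    tag = addr >> 10                 # high-order 6 bits: TAG
    if tag in cache_sets[set_index]: # only the 4 blocks of this set are searched
        return ("hit", set_index, word)
    return ("miss", set_index, tag)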
It is clear that if we increase the number of blocks per set, the number of bits in the SET field is reduced. With more blocks per set, the complexity of the search also increases. The extreme case of 128 blocks per set requires no SET bits and corresponds to the fully associative mapping technique with 11 TAG bits. The other extreme of one block per set is the direct mapping method.
Replacement Algorithms
When a new block must be brought into the cache and all the positions that it may occupy are full, a decision must be made as to which of the old blocks is to be overwritten. In general, the policy should keep blocks in the cache when they are likely to be referenced in the near future. However, it is not easy to determine directly which of the blocks in the cache are about to be referenced. The property of locality of reference gives some clue for designing a good replacement policy.
Least Recently Used (LRU) replacement policy:
Since programs usually stay in localized areas for reasonable periods of time, it can be assumed that there is a high probability that blocks which have been referenced recently will also be referenced in the near future. Therefore, when a block is to be overwritten, it is a good decision to overwrite the one that has gone the longest time without being referenced. This is defined as the least recently used (LRU) block. Keeping track of the LRU block must be done as the computation proceeds.
Consider a specific example of a four-block set. It is required to track the LRU block of this four-block set. A 2-bit counter may be used for each block.
When a hit occurs, that is, when a read request is received for a word that is in the cache, the counter of the referenced block is set to 0. All counters whose values were originally lower than that of the referenced block are incremented by 1, and all other counters remain unchanged.
When a miss occurs, that is, when a read request is received for a word that is not present in the cache, we have to bring the block into the cache. There are two possibilities in case of a miss:
If the set is not full, the counter associated with the new block loaded from the main memory is set to 0, and the values of all other counters are incremented by 1.
If the set is full, the block with the counter value 3 is removed, and the new block is put in its place. Its counter value is set to 0, and the other three block counters are incremented by 1.
It is easy to verify that the counter values of occupied blocks are always distinct, and that the highest counter value indicates the least recently used block.
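The sketch below (an illustrative model, not the hardware counters themselves) implements these 2-bit counters for a four-block set.

class LRUSet:
    def __init__(self):
        self.blocks = []     # main memory blocks currently resident (at most 4)
        self.counters = []   # one 2-bit counter per resident block; 3 marks the LRU block

    def access(self, block):
        if block in self.blocks:                    # hit
            i = self.blocks.index(block)
            ref = self.counters[i]
            for j in range(len(self.counters)):     # counters lower than the referenced one
                if self.counters[j] < ref:          # are incremented; higher ones unchanged
                    self.counters[j] += 1
            self.counters[i] = 0                    # referenced block becomes most recent
        elif len(self.blocks) < 4:                  # miss, set not full
            self.counters = [c + 1 for c in self.counters]
            self.blocks.append(block)
            self.counters.append(0)
        else:                                       # miss, set full: evict counter value 3
            victim = self.counters.index(3)
            self.blocks[victim] = block
            self.counters = [c + 1 for c in self.counters]
            self.counters[victim] = 0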
First In First Out (FIFO) replacement policy:
A reasonable rule is to remove the oldest block from a full set when a new block must be brought in. With this technique, no update is required when a hit occurs. When a miss occurs and the set is not full, the new block is put into an empty block position and the counter values of the occupied blocks are incremented by one. When a miss occurs and the set is full, the block with the highest counter value is replaced by the new block, whose counter is set to 0, and the counter values of all other blocks of that set are incremented by 1. The overhead of this policy is low, since no update is required on a hit.
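For comparison with the LRU sketch above, here is a corresponding FIFO sketch (again illustrative only); note that a hit changes nothing.

class FIFOSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.blocks = []
        self.counters = []   # the highest counter value marks the oldest block

    def access(self, block):
        if block in self.blocks:             # hit: no counters are updated
            return "hit"
        if len(self.blocks) < self.ways:     # miss, set not full
            self.counters = [c + 1 for c in self.counters]
            self.blocks.append(block)
            self.counters.append(0)
        else:                                # miss, set full: replace the oldest block
            victim = self.counters.index(max(self.counters))
            self.blocks[victim] = block
            self.counters = [c + 1 for c in self.counters]
            self.counters[victim] = 0
        return "miss"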
Random replacement policy:
The simplest algorithm is to choose the block to be overwritten at random. Interestingly
enough, this simple algorithm
has been found to be very effective in practice.
Main Memory
The main working principle of a digital computer is the von Neumann stored-program principle. First of all, we have to keep all the information in some storage, mainly known as main memory, and the CPU interacts with the main memory only. Therefore, memory management is an important issue while designing a computer system.
On the other hand, everything cannot be implemented in hardware, otherwise the cost of the system would be very high. Therefore some of the tasks are performed by software programs. A collection of such software programs is basically known as an operating system. So the operating system is viewed as an extended machine. Many functions or instructions are implemented through software routines. The operating system is mainly memory resident, i.e., the operating system is loaded into main memory.
Due to that, the main memory of a computer is divided into two parts. One part is reserved for the operating system. The other part is for user programs. The program currently being executed by the CPU is loaded into the user part of the memory.
In a uni-programming system, the program currently being executed is loaded into the user part of the memory.
In a multiprogramming system, the user part of memory is subdivided to accommodate multiple processes. The task of subdivision is carried out dynamically by the operating system and is known as memory management.
Efficient memory management is vital in a multiprogramming system. If only a few processes are in memory, then for much of the time all of the processes will be waiting for I/O and the processor will be idle. Thus memory needs to be allocated efficiently to pack as many processes into main memory as possible.
When memory holds multiple processes, the processor can switch from one process to another when one process is waiting. But the processor is so much faster than I/O that it will be common for all the processes in memory to be waiting for I/O. Thus, even with multiprogramming, a processor could be idle most of the time.
Due to the speed mismatch of the processor and the I/O devices, the status of a process at any point in time is referred to as a state. There are five defined states of a process, as shown in the figure. When we start to execute a process, it is placed in the process queue and it is in the new state. As resources become available, the process is placed in the ready queue.
Figure : Five State process model
1. New : A program is admitted by the scheduler, but is not yet ready to execute. The operating system will initialize the process by moving it to the ready state.
2. Ready : The process is ready to execute and is waiting for access to the processor.
3. Running : The process is being executed by the processor. At any given time, only one process is in the running state.
4. Waiting : The process is suspended from execution, waiting for some system resource, such as I/O.
5. Exit : The process has terminated and will be destroyed by the operating system.
The processor alternates between executing operating system instructions and executing user processes. While the operating system is in control, it decides which process in the queue should be executed next.
A process being executed may be suspended for a variety of reasons. If it is suspended because the process requests I/O, then it is placed in the appropriate I/O queue. If it is suspended because of a timeout or because the operating system must attend to some of its own tasks, then it is placed in the ready state.
We know that the information of all the processes that are in execution must be placed in main memory. Since there is a fixed amount of memory, memory management is an important issue.
Memory Management
In a uniprogramming system, main memory is divided into two parts: one part for the operating system and the other part for the program currently being executed.
In a multiprogramming system, the user part of memory is subdivided to accommodate multiple processes. The task of subdivision is carried out dynamically by the operating system and is known as memory management.
In a uniprogramming system, only one program is in execution. After completion of one program, another program may start.
In general, most programs involve I/O operations. They must take input from some input device and place the result in some output device.
To utilize the idle time of the CPU, we shift the paradigm from a uniprogram environment to a multiprogram environment.
Since the size of main memory is fixed, it is possible to accommodate only a few processes in the main memory. If all of them are waiting for I/O operations, then again the CPU remains idle.
To utilize the idle time of the CPU, some of the processes must be off-loaded from memory and new processes must be brought into this memory space. This is known as swapping.
What is swapping :
1. A process waiting for some I/O to complete must be stored back on disk.
2. A new ready process is swapped into main memory as space becomes available.
3. As a process completes, it is moved out of main memory.
4. If none of the processes in memory are ready,
   Swap out a blocked process to an intermediate queue of blocked processes.
   Swap in a ready process from the ready queue.
But swapping is itself an I/O process, so it also takes time. Instead of leaving the CPU idle, it is sometimes advantageous to swap in a ready process and start executing it.
The main question that arises is where to put a new process in the main memory. It must be done in such a way that the memory is utilized properly.
Partitioning
Both unequal fixed-size and variable-size partitions are inefficient in the use of memory. It has been observed that both schemes lead to memory wastage; therefore we are not using the memory efficiently.
There is another scheme for the use of memory, which is known as paging. In this scheme:
The memory is partitioned into equal fixed-size chunks that are relatively small. These chunks of memory are known as frames or page frames.
Each process is also divided into small fixed-size chunks of the same size. The chunks of a program are known as pages.
A page of a program can be assigned to any available page frame.
In this scheme, the space wasted in memory for a process is a fraction of a page frame, corresponding to the last page of the program.
At any given point of time some of the frames in memory are in use and some are free. The list of free frames is maintained by the operating system.
Process A, stored on disk, consists of six pages. At the time of execution of process A, the operating system finds six free frames and loads the six pages of process A into the six frames. These six frames need not be contiguous frames in main memory. The operating system maintains a page table for each process.
Within the program, each logical address consists of a page number and a relative address within the page.
In the case of simple partitioning, a logical address is the location of a word relative to the beginning of the program; the processor translates that into a physical address.
With paging, a logical address is the location of a word relative to the beginning of the page of the program, because the whole program is divided into several pages of equal length, and the length of a page is the same as the length of a page frame.
A logical address consists of a page number and a relative address within the page; the processor uses the page table to produce the physical address, which consists of a frame number and a relative address within the frame.
The figure on the next page shows the allocation of frames to a new process in main memory. A page table is maintained for each process. This page table helps us to find the physical address in a frame which corresponds to a logical address within a process.
The conversion of a logical address to a physical address is shown in the figure for Process A.
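As a minimal sketch of this translation (the page size and page table contents below are assumed purely for illustration), a logical address splits into a page number and an offset, and the page table supplies the frame number.

PAGE_SIZE = 1024                      # assumed page/frame size in words
page_table = {0: 5, 1: 2, 2: 7}       # page number -> frame number (assumed values)

def translate(logical_addr):
    page, offset = divmod(logical_addr, PAGE_SIZE)   # page number, offset within page
    frame = page_table[page]                         # page table gives the frame number
    return frame * PAGE_SIZE + offset                # physical address within that frame

print(translate(1050))   # page 1, offset 26 -> frame 2, physical address 2074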
This approach solves the problems mentioned above. Main memory is divided into many small equal-size frames. Each process is divided into frame-size pages. A smaller process requires fewer pages, a larger process requires more. When a process is brought in, its pages are loaded into available frames and a page table is set up.
Virtual Memory
The basic mechanism for reading a word from memory involves the translation of a virtual or logical address, consisting of a page number and an offset, into a physical address, consisting of a frame number and an offset, using a page table.
There is one page table for each process, but each process can occupy a huge amount of virtual memory. However, the virtual memory of a process cannot go beyond a certain limit, which is restricted by the underlying hardware of the MMU. One such component may be the size of the virtual address register.
The size of a page is relatively small, so the size of the page table increases as the size of the process increases. Therefore, the size of the page table could be unacceptably large.
To overcome this problem, most virtual memory schemes store the page table in virtual memory rather than in real memory. This means that the page table is subject to paging just as other pages are.
When a process is running, at least a part of its page table must be in main memory, including the page table entry of the currently executing page.
An alternative is the inverted page table. There is one entry in the hash table and the inverted page table for each real memory page, rather than one per virtual page. Thus a fixed portion of real memory is required for the page tables, regardless of the number of processes or virtual pages supported. Because more than one virtual address may map into the same hash table entry, a chaining technique is used for managing the overflow. The hashing technique results in chains that are typically short – either one or two entries. The inverted page table is shown in the figure on the next page.
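The sketch below (hypothetical structures, purely for illustration) shows the idea: one entry per real memory frame, reached by hashing the virtual page number, with a short chain to resolve collisions.

NUM_FRAMES = 8                        # assumed number of real memory frames

hash_table = [None] * NUM_FRAMES      # hash of (pid, virtual page) -> first frame in chain
inverted_table = [None] * NUM_FRAMES  # frame -> (pid, virtual page, next frame in chain)

def lookup(pid, vpage):
    frame = hash_table[hash((pid, vpage)) % NUM_FRAMES]
    while frame is not None:          # follow the (typically short) collision chain
        p, v, nxt = inverted_table[frame]
        if (p, v) == (pid, vpage):
            return frame              # the real frame holding this virtual page
        frame = nxt
    return None                       # not resident: a page fault must be handled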
Translation Lookaside Buffer (TLB)
Every virtual memory reference can cause two physical memory accesses.
One to fetch the appropriate page table entry.
One to fetch the desired data.
Thus a straightforward virtual memory scheme would have the effect of doubling the memory access time.
To overcome this problem, most virtual memory schemes make use of a special cache for page table entries, usually called the Translation Lookaside Buffer (TLB).
This cache functions in the same way as a memory cache and contains those page table entries
that have been most
recently used.
In addition to the information that constitutes a page table entry, the TLB must also include the
virtual address of the
entry.
The figure on the next page shows a possible organization of a TLB where the associative mapping technique is used.
Set-associative mapped TLBs are also found in commercial
products.
An essential requirement is that the contents of the TLB be coherent with the contents of the page table in the main memory. When the operating system changes the contents of the page table, it must simultaneously invalidate the corresponding entries in the TLB. One of the control bits in the TLB is provided for this purpose.
Address translation proceeds as follows:
Given a virtual address, the MMU looks in the TLB for the referenced page.
If the page table entry for this page is found in the TLB, the physical address is obtained immediately.
If there is a miss in the TLB, then the required entry is obtained from the page table in the main memory and the TLB is updated.
When a program generates an access request to a page that is not in the main memory, a page fault is said to have occurred.
The whole page must be brought from the disk into the memory before the access can proceed.
When it detects a page fault, the MMU asks the operating system to intervene by raising an exception (interrupt).
Processing of the active task is interrupted, and control is transferred to the operating system.
The operating system then copies the requested page from the disk into the main memory and returns control to the interrupted task. Because a long delay occurs while the page transfer takes place, the operating system may suspend execution of the task that caused the page fault and begin execution of another task whose pages are in the main memory.
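A minimal sketch of this flow (hypothetical structures and page size, purely for illustration): the TLB is consulted first, then the page table, and a page fault is raised if the page is not resident.

PAGE_SIZE = 1024     # assumed page size
tlb = {}             # virtual page -> frame (most recently used page table entries)
page_table = {}      # virtual page -> frame, or None if the page is only on disk

class PageFault(Exception):
    pass

def translate(virtual_addr):
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    if page in tlb:                        # TLB hit: physical address obtained immediately
        return tlb[page] * PAGE_SIZE + offset
    frame = page_table.get(page)           # TLB miss: consult the page table in main memory
    if frame is None:
        raise PageFault(page)              # page fault: OS must bring the page in from disk
    tlb[page] = frame                      # update the TLB with this entry
    return frame * PAGE_SIZE + offset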