Unit - IV
Characteristics

Name              | Register           | Cache            | Main Memory             | Secondary Memory
Size              | < 1 KB             | less than 16 MB  | < 16 GB                 | > 100 GB
Implementation    | Multi-ports        | On-chip / SRAM   | DRAM (capacitor memory) | Magnetic
Access Time       | 0.25 ns to 0.5 ns  | 0.5 ns to 25 ns  | 80 ns to 250 ns         | about 5,000,000 ns (5 ms)
Bandwidth (MB/s)  | 20,000 to 100,000  | 5,000 to 15,000  | 1,000 to 5,000          | 20 to 150
Managed by        | Compiler           | Hardware         | Operating System        | Operating System
Backing Mechanism | From cache         | From main memory | From secondary memory   | -
MAIN MEMORY:
Static and Dynamic RAM
SRAMs are faster but their cost is high because their cells
require many transistors. RAMs can be obtained at a lower
cost if simpler cells are used. A MOS storage cell based on
capacitors can be used to replace the SRAM cells. Such a
storage cell cannot preserve the charge (that is, data)
indefinitely and must be recharged periodically. Therefore,
these cells are called dynamic storage cells. RAMs using
these cells are referred to as Dynamic RAMs or simply
DRAMs.
Associative Memory
HARDWARE ORGANIZATION
[Figure: the keyboard, display, and printer connect to the CPU board, memory board, and I/O board.]
Data flow between the CPU, RAM, and I/O devices takes place with the help of buses.
Match Logic
The match logic for each word can be derived from the comparison
algorithm for two binary numbers.
First, we neglect the key bits and compare the argument in A with the
bits stored in the cells of the words. Word i is equal to the argument in
A if Aj = Fij for j = 1, 2, ... , n. Two bits are equal if they are both 1 or
both 0.
We now include the key bit Kj in the comparison logic. The
requirement is that if Kj = 0, the corresponding bits of Aj and Fij need
no comparison; only when Kj = 1 must they be compared. This
requirement is achieved by ORing each term with K'j, thus:

xj + K'j, where xj = Aj Fij + A'j F'ij

The ORing with K'j is necessary because the terms are all ANDed
together to form the match signal Mi = (x1 + K'1)(x2 + K'2) ... (xn + K'n),
so a term forced to 1 has no effect on the AND. The comparison of the
bits has an effect only when Kj = 1.
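The match logic can be sketched in software (an illustrative model of the comparison, not the gate-level circuit):

```python
def match_word(A, K, F_i):
    """Return 1 if stored word F_i matches argument A under key K.

    A, K, F_i are equal-length lists of bits (0/1). Bit position j
    participates in the comparison only when K[j] == 1; when K[j] == 0
    the term is forced to 1 by ORing with K'_j.
    """
    M_i = 1
    for A_j, K_j, F_ij in zip(A, K, F_i):
        x_j = int(A_j == F_ij)      # x_j = A_j.F_ij + A'_j.F'_ij
        M_i &= x_j | (1 - K_j)      # term (x_j + K'_j), ANDed over all j
    return M_i

# Note: an all-zero key matches any word irrespective of A -- the
# situation the text says must be avoided in normal operation.
```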
The circuit for matching one word is shown in Fig. 9. Each cell
requires two AND gates and one OR gate. The inverters for Aj and
Kj are needed once for each column and are used for all bits in the
column.
The outputs of all OR gates in the cells of the same word go to the
inputs of a common AND gate to generate the match signal for Mi.
Mi will be logic 1 if a match occurs and 0 if no match occurs.
Note that if the key register contains all 0's, output Mi will be a 1
irrespective of the value of A or the word. This occurrence must be
avoided during normal operation.
Read Operation
If more than one word in memory matches the unmasked argument
field, all the matched words will have 1's in the corresponding bit
position of the match register.
In most applications, the associative memory stores a table with no
two identical items under a given key.
In this case, only one word may match the unmasked argument field.
By connecting output Mi directly to the read line in the same word
position (instead of the M register), the content of the matched word
will be presented automatically at the output lines and no special read
command signal is needed.
Write Operation
For every active word stored in memory, the corresponding bit in the
tag register is set to 1. A word is deleted from memory by clearing its
tag bit to 0. Words are stored in memory by scanning the tag register
until the first 0 bit is encountered.
This gives the first available inactive word and a position for writing a
new word. After the new word is stored in memory it is made active
by setting its tag bit to 1. An unwanted word when deleted from
memory can be cleared to all 0' s if this value is used to specify an
empty location.
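The tag-register bookkeeping for storing and deleting words can be sketched as follows (a list-based software model; the function names are illustrative):

```python
def write_word(tags, words, new_word):
    """Store new_word in the first inactive slot.

    `tags` and `words` are parallel lists: tags[i] is the tag bit (0/1)
    for words[i]. The tag register is scanned until the first 0 bit is
    found, giving the first available inactive word.
    """
    for i, t in enumerate(tags):
        if t == 0:
            words[i] = new_word
            tags[i] = 1            # make the stored word active
            return i
    raise MemoryError("no inactive word available")

def delete_word(tags, words, i):
    """Delete word i by clearing its tag bit; the word itself is cleared
    to all 0's so that value can denote an empty location."""
    tags[i] = 0
    words[i] = 0
```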
If the active portions of the program and data are placed in a fast
small memory, the average memory access time can be reduced,
thus reducing the total execution time of the program.
The cache memory access time is less than the access time of
main memory by a factor of 5 to 10. The cache is the fastest
component in the memory hierarchy and approaches the speed of
CPU components.
If the word addressed by the CPU is not found in the cache, the
main memory is accessed to read the word. A block of words
containing the one just accessed is then transferred from main
memory to cache memory. The block size may vary from one word
(the one just accessed) to about 16 words adjacent to the one just
accessed. In this manner, some data are transferred to cache so that
future references to memory find the required words in the fast cache
memory.
The hit ratio is the number of hits divided by the total number of
CPU references to memory (hits plus misses).
Hit ratios of 0.9 and higher have been reported. This high ratio
verifies the validity of the locality of reference property.
For example, a computer with cache access time of 100 ns, a main
memory access time of 1000 ns, and a hit ratio of 0.9 produces an
average access time of 200 ns. This is a considerable improvement
over a similar computer without a cache memory, whose access time
is 1000 ns.
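The arithmetic behind this example can be checked with a short function, assuming a miss costs the cache probe plus the main-memory access:

```python
def avg_access_time(t_cache, t_main, hit_ratio):
    """Average memory access time: a hit costs t_cache; a miss costs
    the cache probe plus the main-memory access (the accounting that
    reproduces the 200 ns figure in the text)."""
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_main)

print(avg_access_time(100, 1000, 0.9))   # approximately 200 ns
```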
Associative Mapping
The associative memory stores both the address and content (data)
of the memory word.
This permits any location in cache to store any word from main
memory.
The diagram shows three words presently stored in the cache.
If the address is found, the corresponding 12-bit data is read and
sent to the CPU.
If no match occurs, the main memory is accessed for the word. The
address-data pair is then transferred to the associative cache
memory.
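Associative mapping behaves like a small key-value store over full addresses. A minimal sketch, with main memory modeled as a dictionary for illustration:

```python
cache = {}   # address -> data; any word can occupy any cache location

def read(address, main_memory):
    """Associative-mapped read: match the address against all stored
    addresses; on a miss, access main memory and store the
    address-data pair in the cache."""
    if address in cache:            # associative match on the address
        return cache[address]
    data = main_memory[address]     # miss: access main memory
    cache[address] = data           # transfer the address-data pair
    return data
```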
Direct Mapping
The CPU address of 15 bits is divided into two fields. The nine least
significant bits constitute the index field and the remaining six bits
form the tag field.
The figure shows that main memory needs an address that includes
both the tag and the index bits.
Each word in cache consists of the data word and its associated tag.
When a new word is first brought into the cache, the tag bits are
stored alongside the data bits.
When the CPU generates a memory request, the index field is used
for the address to access the cache.
The tag field of the CPU address is compared with the tag in the
word read from the cache.
If the two tags match, there is a hit and the desired data word is in
cache.
The disadvantage of direct mapping is that the hit ratio can drop
considerably if two or more words whose addresses have the same
index but different tags are accessed repeatedly.
Suppose that the CPU now wants to access the word at address
02000. The index address is 000, so it is used to access the cache.
The two tags are then compared.
The cache tag is 00 but the address tag is 02, which does not
produce a match. Therefore, the main memory is accessed and the
data word 5670 is transferred to the CPU.
The cache word at index address 000 is then replaced with a tag of
02 and data of 5670.
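The address split and the 02000 trace can be checked numerically (octal literals; the 6-bit tag / 9-bit index split is the one given in the text):

```python
def split_address(addr):
    """Split a 15-bit CPU address into a 6-bit tag and a 9-bit index."""
    return addr >> 9, addr & 0o777   # (tag, index)

tag, index = split_address(0o02000)
print(oct(tag), oct(index))          # prints 0o2 0o0

# The cache word at index 000 holds tag 00, but the address tag is 02:
# no match, so main memory supplies word 5670 and the cache entry at
# index 000 is replaced with tag 02 and data 5670.
```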
The index field is now divided into two parts: the block field and the
word field.
In a 512-word cache there are 64 blocks of 8 words each, since 64 x
8 = 512.
The block number is specified with a 6-bit field and the word within
the block is specified with a 3-bit field.
The tag field stored within the cache is common to all eight words of
the same block.
Although this takes extra time, the hit ratio will most likely improve
with a larger block size because of the sequential nature of computer
programs.
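The split of the 9-bit index into a 6-bit block field and a 3-bit word field is a further bit slice, sketched as:

```python
def split_index(index):
    """Split a 9-bit index into (6-bit block number, 3-bit word-in-block).

    64 blocks x 8 words = 512 cache words, matching the text."""
    return index >> 3, index & 0b111
```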
Set-Associative Mapping
Each data word is stored together with its tag, and the number of
tag-data items in one word of cache is said to form a set.
Each index address refers to two data words and their associated
tags. Each tag requires six bits and each data word has 12 bits, so
the word length is 2(6 + 12) = 36 bits. An index address of nine bits
can accommodate 512 words. Thus the size of cache memory is 512
x 36.
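The sizing arithmetic can be verified in a few lines (the names are illustrative):

```python
SET_SIZE = 2               # two tag-data items per cache word
TAG_BITS, DATA_BITS = 6, 12
INDEX_BITS = 9

word_length = SET_SIZE * (TAG_BITS + DATA_BITS)   # 2 * (6 + 12) = 36 bits
num_words = 2 ** INDEX_BITS                       # 512 index addresses
print(num_words, "x", word_length)                # cache organized as 512 x 36
```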
When the CPU generates a memory request, the index value of the
address is used to access the cache. The tag field of the CPU
address is then compared with both tags in the cache to determine if
a match occurs.
The comparison logic is done by an associative search of the tags in
the set similar to an associative memory search: thus the name "set-
associative."
The hit ratio will improve as the set size increases because more
words with the same index but different tags can reside in cache.
With the random replacement policy the control chooses one tag-
data item for replacement at random. The FIFO procedure selects for
replacement the item that has been in the set the longest.
The LRU algorithm selects for replacement the item that has been
least recently used by the CPU.
Both FIFO and LRU can be implemented by adding a few extra bits
in each word of cache.
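An LRU set can be modeled in software with an ordered mapping, the analogue of those extra usage bits; this is an illustrative sketch, not a hardware description:

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement: tags are kept in order of
    use, and the least recently used entry is evicted when full."""
    def __init__(self, set_size=2):
        self.set_size = set_size
        self.entries = OrderedDict()          # tag -> data

    def access(self, tag, fetch):
        """Return the data for `tag`; on a miss, call `fetch` (a stand-in
        for a main-memory read) and replace the LRU entry if needed."""
        if tag in self.entries:
            self.entries.move_to_end(tag)     # hit: mark most recently used
            return self.entries[tag]
        if len(self.entries) >= self.set_size:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[tag] = fetch()
        return self.entries[tag]
```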
Writing into Cache
However, if the operation is a write, there are two ways that the
system can proceed.
The simplest procedure is to update main memory with every memory
write operation, with cache memory being updated in parallel if it
contains the word at the specified address. This is called the
write-through method.
This method has the advantage that main memory always contains
the same data as the cache.
It ensures that the data residing in main memory are valid at all times
so that an I/O device communicating through DMA would receive the
most recently updated data.
The second procedure is called the write-back method. In this
method only the cache location is updated during a write operation.
The location is then marked by a flag so that later when the word is
removed from the cache it is copied into main memory.
The reason for the write-back method is that during the time a word
resides in the cache, it may be updated several times; however, as
long as the word remains in the cache, it does not matter whether the
copy in main memory is out of date, since requests for the word are
filled from the cache.
It is only when the word is displaced from the cache that an accurate
copy need be rewritten into main memory.
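The write-back bookkeeping can be sketched with a one-word cache and a dirty flag (a deliberately simplified model):

```python
class WriteBackCache:
    """Write-back: a write updates only the cache and sets a dirty
    flag; main memory is updated only when the word is displaced."""
    def __init__(self, main_memory):
        self.main = main_memory
        self.line = None                 # (address, data, dirty) or None

    def write(self, address, data):
        self.evict()                     # displace whatever is cached
        self.line = (address, data, True)   # flag marks it as modified

    def evict(self):
        if self.line and self.line[2]:   # copy back only if dirty
            addr, data, _ = self.line
            self.main[addr] = data
        self.line = None
```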
Cache Initialization
The valid bit of a particular cache word is set to 1 the first time this
word is loaded from main memory and stays set unless the cache
has to be initialized again.
The introduction of the valid bit means that a word in cache is not
replaced by another word unless the valid bit is set to 1 and a
mismatch of tags occurs.
Thus the initialization condition has the effect of forcing misses from
the cache until it fills with valid data.
Virtual Memory
Thus the CPU will reference instructions and data with a 20-bit
address, but the information at this address must be taken from
physical memory, because access to auxiliary storage for individual
words would be prohibitively long.
In the second case, the table takes space from main memory and two
accesses to memory are required, with the program running at half
speed.
The term page refers to groups of address space of the same size.
For example, if a page or block consists of 1K words, then, using the
previous example, address space is divided into 1024 pages and
main memory is divided into 32 blocks.
Although both a page and a block are split into groups of 1K words,
a page refers to the organization of address space, while a block
refers to the organization of memory space.
Since each page consists of 2^10 = 1024 words, the high-order three
bits of a virtual address will specify one of the eight pages and the
low-order 10 bits give the line address within the page. Note that the
line address in address space and memory space is the same; the
only mapping required is from a page number to a block number.
The memory-page table consists of eight words, one for each page.
The address in the page table denotes the page number and the
content of the word gives the block number where that page is stored
in main memory.
The table shows that pages 1, 2, 5, and 6 are now available in main
memory in blocks 3, 0, 1, and 2, respectively.
A presence bit in each location indicates whether the page has been
transferred from auxiliary memory into main memory. A 0 in the
presence bit indicates that this page is not available in main memory.
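The page-table lookup with presence bits can be sketched as follows; the page-to-block assignments (pages 1, 2, 5, 6 in blocks 3, 0, 1, 2) are taken from the example above:

```python
PAGE_BITS, LINE_BITS = 3, 10   # 8 pages of 2**10 = 1024 words each

# memory page table: page number -> (presence bit, block number)
page_table = {0: (0, None), 1: (1, 3), 2: (1, 0), 3: (0, None),
              4: (0, None), 5: (1, 1), 6: (1, 2), 7: (0, None)}

def map_address(virtual):
    """Map a 13-bit virtual address to a main-memory address. The line
    address passes through unchanged; only page -> block is translated.
    A presence bit of 0 means the page is not in main memory."""
    page = virtual >> LINE_BITS
    line = virtual & (2 ** LINE_BITS - 1)
    present, block = page_table[page]
    if not present:
        raise LookupError("page fault: page %d not in main memory" % page)
    return (block << LINE_BITS) | line
```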
The content of the word in the memory page table at the page
number address is read out into the memory table buffer register. If
the presence bit is 1, the block number is placed in the main memory
address register together with the line number, and a read signal
transfers the content of the word to the main memory buffer register,
ready to be used by the CPU.
If the presence bit in the word read from the page table is 0, it
signifies that the content of the word referenced by the virtual address
does not reside in main memory.
Since main memory holds only 32 blocks, at least 992 of the 1024
page-table locations will be empty and not in use at any given time.
A more efficient way to organize the page table would be to
construct it with a number of words equal to the number of blocks in
main memory.
In this way the size of the memory is reduced and each location is
fully utilized. This method can be implemented by means of an
associative memory with each word in memory containing a page
number together with its corresponding block number.
The page field in each word is compared with the page number in
the virtual address.
If a match occurs, the word is read from memory and its
corresponding block number is extracted. Consider again the case of
eight pages and four blocks as in the example of Fig. 19.
If the page number is found, the 5-bit word is read out from memory.
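The associative page table can be sketched as a search over (page, block) pairs; with eight pages (3 bits) and four blocks (2 bits), each stored pair is the 5-bit word the text mentions:

```python
# associative page table: only pages resident in main memory have
# entries, so the table needs only as many words as there are blocks
entries = [(1, 3), (2, 0), (5, 1), (6, 2)]   # (page number, block number)

def lookup(page):
    """Compare the page field of every stored word with the requested
    page; on a match, read out the corresponding block number."""
    for stored_page, block in entries:
        if stored_page == page:
            return block
    return None          # no match: the page is not in main memory
```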