6.0 INTRODUCTION
In the last unit, the concept of memory hierarchy was discussed. The unit also
discussed different types of memories, including RAM, ROM, flash memory,
secondary storage technologies etc. The memory system of a computer uses a variety of
memories for program execution. These memories vary in size, access speed, cost and
type, such as volatility (volatile/non-volatile), read only or read-write memories etc.
As you know, a program is loaded into the main memory for execution. Thus, the size
and speed of the main memory affect the performance of a computer system. This
unit will introduce you to the concepts of cache memory, which is a small memory between
the processing unit and the main memory. Cache memory enhances the performance of a
computer system. Interleaved memory and associative memories are also used as
faster memories. Finally, the unit discusses the concept of virtual memory, which
allows execution of programs larger than the physical memory.
6.1 OBJECTIVES
After going through this Unit, you will be able to:
• explain the concept of locality of reference;
• explain the different cache organisation schemes;
• explain the characteristics of interleaved and associative memories;
• explain the concept of virtual memory.
The important task of a computer is to execute instructions. It has been observed
that, on an average, 80-85 percent of the execution time is spent by the processor in
accessing the instructions or data from the main memory. The situation becomes even
worse when the instruction to be executed or the data to be processed is not present in the
main memory.

Another factor, observed by analysing various programs, is that during
program execution the processor tends to access a section of the program
instructions or data for a specific time period. For example, when a program enters
a loop structure, it continues to access and execute the loop statements as long as the
looping condition is satisfied. Similarly, whenever a program calls a subroutine, the
subroutine statements are going to execute. In another case, when a data item stored
in an array or array-like structure is accessed, it is very likely that either the next data
item or the previous data item will be accessed by the processor. All these phenomena
are known as Locality of Reference or the Principle of Locality.

So, according to the principle of locality, for a specific time period, the processor
tends to make memory references close to each other or accesses the same memory
addresses again and again. The earlier type is known as spatial locality. Spatial
locality specifies that if a data item is accessed, then a data item stored in a location near
the data item just accessed may be accessed in the near future. There can be a special
case of spatial locality, which is termed sequence locality. Consider a program that
accesses the elements of a single dimensional array, which is a linear data structure,
in the sequence of its index. Such accesses will read/write a sequence of memory
locations one after the other. This type of locality, which is a case of spatial locality, is
referred to as sequence locality.

Another type of locality is temporal locality: if a data item is accessed or
referenced at a particular time, then the same data item is expected to be accessed again for
some time in the near future. Typically, it is observed in loop structures and subroutine
calls.

As shown in Figure 6.1, when the program enters the loop structure at line 7, it will
execute the loop statements again and again till the loop terminates. In
this case, the processor needs to access instructions 9 and 10 frequently. On the other
hand, when a program accesses a data item stored in an array, then in the next iteration
it accesses a data item stored in a memory location adjacent to the previous one.
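Both kinds of locality can be seen in a short program fragment. The following C sketch is only an illustrative stand-in for the program of Figure 6.1, which is not reproduced here:

```c
#include <stdio.h>

int main(void) {
    int a[100];
    int sum = 0;

    /* Temporal locality: the loop-control variable i and the loop
       instructions themselves are accessed again and again. */
    for (int i = 0; i < 100; i++) {
        /* Spatial (sequence) locality: a[0], a[1], a[2], ... occupy
           consecutive memory locations and are accessed in index order. */
        a[i] = i;
        sum += a[i];
    }

    printf("sum = %d\n", sum);
    return 0;
}
```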
If you keep the content of the cluster of expected memory references in a small,
extremely fast memory, then the processing time of an instruction can be reduced by a
significant amount. Cache memory is a very high speed and expensive memory as
compared to the main memory, and its access time is closer to the processing speed of
the processor. Cache memory acts as a buffer memory between the processor and the
main memory.
Because cache is an expensive memory, its size in a computer system is very
small as compared to the main memory. Thus, the cache stores only those memory
clusters containing data/instructions which have just been accessed or are going to be
accessed in the near future. Data in the cache is updated based on the principle of locality
explained in the previous section.
Data in main memory is stored in the form of fixed size blocks/pages. Cache memory
contains some blocks of the main memory. When the processor wants to read a data item
from the main memory, a check is made in the cache whether the data item to be accessed
is present in the cache or not. If the data item is present in the cache, then it
is read by the processor from the cache. If the data item is not found in the cache, a
memory reference is made to read the data item from the main memory, and a copy of
the block containing the data item is also copied into the cache for near-future references,
as explained by the principle of locality. So, whenever the processor attempts to read the
data item next time, it is likely that the data item is found in the cache, which saves the
time of a memory reference to the main memory.
As shown in Figure 6.2, if the requested data item is found in the cache, it is called a
cache hit and the data item will be read by the processor from the cache. If the requested
data item is not found in the cache, called a cache miss, then a reference to the main
memory is made, the requested data item is read, and the block containing the data item is
also copied into the cache.
The average access time for any data item is reduced significantly by using a cache as
compared to not using a cache. For example, suppose a memory reference takes 200 ns and the cache takes
20 ns to read a data item. Then five continuous references to the same data item will take:
Time taken with cache : 20 (for cache miss) + 200 (memory reference)
+ 4 × 20 (cache hits for subsequent accesses)
= 300 ns
Time taken without cache : 5 × 200 = 1000 ns
Effective access time is defined as the average access time of a memory access when a
cache is used. The access time is reduced in case of a cache hit,
whereas it increases in case of a cache miss. In the above mentioned example the processor
takes 20 + 200 ns for a cache miss, whereas it takes only 20 ns for each cache hit.
Now suppose we have a hit ratio of 80%, i.e. 80 percent of the times a data item would
be found in the cache and 20% of the times it would be accessed from the main
memory. The effective access time (EAT) will be computed as:
effective access time = (hit ratio × data access time from cache only)
+ (miss ratio × data access time from cache and main memory)
= 0.8 × 20 + 0.2 × (20 + 200) = 16 + 44 = 60 ns
From the example it is clear that a cache reduces the average access time and effective
access time for a data item significantly and enhances computer performance.
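The same computation can be expressed in a few lines of code. This C sketch simply evaluates the EAT formula with the figures of the example above (20 ns cache, 200 ns main memory, 80% hit ratio):

```c
#include <stdio.h>

int main(void) {
    double t_cache = 20.0;   /* cache access time in ns       */
    double t_mem   = 200.0;  /* main memory access time in ns */
    double hit     = 0.80;   /* hit ratio                     */

    /* On a hit only the cache is accessed; on a miss the cache
       is probed first and then the main memory is referenced. */
    double eat = hit * t_cache + (1.0 - hit) * (t_cache + t_mem);

    printf("Effective access time = %.1f ns\n", eat);  /* 60.0 ns */
    return 0;
}
```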
3. The hit ratio of a computer system is 90%. The cache has an access time of 10 ns,
whereas the main memory has an access time of 50 ns. Compute the effective
access time for the system.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
Cache is an extremely fast but very expensive memory as compared to the main
memory. So a large cache memory may shoot up the cost of the computer system, and
too small a cache might not be very useful in practice. Based on various statistical
analyses, if a computer system has 4 GB of main memory then the size of the cache
may go up to 1 MB.
What would be the block size for data transfer between cache and main memory?
Block size directly affects the cache performance. A larger block size results in
fewer blocks in the cache, whereas a smaller block size means each block contains fewer data
items. As you increase the block size, the hit ratio first increases but then decreases as
you increase the block size further. Further increase in block size will not necessarily
result in access of newer data items, as the probability of accessing the data items in a block
tends to decrease as the block holds a larger number of data items. So, an optimal size of the block
should be chosen to maximise the hit ratio.
As execution of the process continues, the processor requests new data items. For
new data items, and thus new blocks, to be present in the cache, the blocks containing
old data items must be replaced. So there must be a mechanism which selects for
replacement the block that is least likely to be needed in the near future.
When will the changes in the blocks be written back to the main memory?
During program execution, the value of a data item in a cache block may get
changed. The changed block must be written back to the main memory in order to
reflect those changes and ensure data consistency. So there must be a policy which
decides when a changed cache block is written back to the main memory.
In certain computer organisations, the cache memories for data and instructions are
placed separately. This results in separate address spaces for the instructions and data.
These separate caches for instructions and data are known as the instruction cache and
the data cache respectively. If the processor requests an instruction, then it is provided by the
instruction cache, whereas a requested data item is provided by the data cache. Using
separate cache memories for instructions and data enhances computer performance.
While some computer systems implement separate cache memories for data and
instructions, others implement multiple levels of cache memory. A two-level cache,
popularly known as L1 cache and L2 cache, is most commonly used. The size of the level 1
cache or L1 cache is smaller than that of the level 2 or L2 cache. Comparatively more
frequently used data/instructions are stored in the L1 cache.
As discussed earlier, the main memory is divided into blocks/frames/pages of 2^k
words each. Each word of the memory unit has a unique address. A processor requests
read/write of a memory word. When a processor's request for a data item cannot be
serviced by the cache memory, i.e. a cache miss occurs, the block containing the requested
data item is read from the main memory and a copy of the same is stored in the cache
memory. A cache memory is organised as a sequence of lines. Each cache line is
identified by a cache line number. A cache line stores a tag and a block of data. The cache
and main memory structure is shown in Figure 6.3. The general structure of a cache
memory having M lines, and of a main memory of size N = 2^n words, is shown in Figure 6.3(a) and
Figure 6.3(b) respectively.
Main memory address (n bits): Block address ((n-k) bits) + Word within block (k bits)
For direct mapping, the block address is further divided into: Tag (((n-k)-m) bits) + Cache line number (m bits), followed by Word within block (k bits)
Now, the following steps will be taken by the processing logic of the processing unit and the
hardware of the cache memory:
1. The tag number (FE in this case) is compared against the tag of the data
stored in the cache line (DCB in this case).
2. In case both are identical (this is the case of a cache hit): the Ath word from
cache line DCB is accessed by the processing logic.
Otherwise (this is the case of a cache miss): the block of 16 words containing the
required word is read from the main memory into the cache line (DCB) and the
tag of the line is set to FE. The required Ath word is then accessed by the processing logic.
Direct mapping is very easy to implement but has a disadvantage: the cache location in which
a specific block is to be stored is fixed. This arrangement can lead to a low hit
ratio. When the processor wants to read two data items belonging to two different blocks
which map to the same cache location, then each time the other data item is requested, the
block in the cache must be replaced by the requested one. This phenomenon is also
known as thrashing.
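The field extraction performed by direct-mapping hardware can be sketched in C. The widths below are inferred by reading the FE, DCB and A of the example above as hexadecimal fields (a 24-bit address, 16 words per block so k = 4, and m = 12 bits of line number); they are assumptions for illustration, not values fixed by the unit:

```c
#include <stdint.h>
#include <stdio.h>

#define K 4    /* bits for the word within a block (16 words)  */
#define M 12   /* bits for the cache line number               */

/* Split a physical address into its direct-mapping fields. */
static void direct_map(uint32_t addr, uint32_t *tag,
                       uint32_t *line, uint32_t *word) {
    *word = addr & ((1u << K) - 1);          /* lowest k bits          */
    *line = (addr >> K) & ((1u << M) - 1);   /* next m bits            */
    *tag  = addr >> (K + M);                 /* remaining (n-k)-m bits */
}

int main(void) {
    uint32_t tag, line, word;
    direct_map(0xFEDCBA, &tag, &line, &word);
    printf("tag=%X line=%X word=%X\n", tag, line, word);  /* FE DCB A */
    return 0;
}
```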
Associative Mapping:
Associative mapping is the most flexible mapping in cache organisation, as it allows any
block of the main memory to be stored in any of the cache lines/locations. It uses the
complete (n-k) bits of the block address field as the tag field. Each cache line stores an (n-k)-bit
tag and (2^k × word size in bits) of data. When a data item/word is requested, the
(n-k)-bit tag field is used by the cache control logic to search all the tag fields stored
in the cache simultaneously. If there is a match (cache hit), then the corresponding data
item is read from the cache; otherwise (cache miss) the block of data that contains the
word to be accessed is read from the main memory. It may replace any of the cache lines,
and the block address of the accessed block from the main memory replaces
the tag of that cache line. Associative mapping is also the fastest mapping amongst all types. Different
block replacement policies are used for replacing the existing cache content by newly
read data; however, those are beyond the scope of this unit. This mapping requires the
most complex circuitry, as it requires all the cache tags to be checked simultaneously
against the block address of the access request.
Main memory address: Block address, which is the same as the tag ((n-k) bits) + Address bits for identifying a word in a block (k bits)
(Cache line structure for associative mapping: Tag ((n-k) bits) + Data block of 2^k words)
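A software view of the associative search can be sketched as follows. The loop simulates the comparison that real hardware performs on all tags in parallel; the cache size and tag values are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINES 4

struct line {
    bool     valid;
    uint32_t tag;   /* full (n-k)-bit block address              */
    /* the data block of 2^k words would also be stored here     */
};

/* Search every line for the block address; hardware does this in parallel. */
static int assoc_lookup(const struct line cache[], uint32_t block_addr) {
    for (int i = 0; i < LINES; i++)
        if (cache[i].valid && cache[i].tag == block_addr)
            return i;                      /* cache hit: line index */
    return -1;                             /* cache miss            */
}

int main(void) {
    struct line cache[LINES] = {
        {true, 0x07}, {true, 0x3F}, {true, 0x12}, {false, 0}
    };
    printf("block 0x3F -> line %d\n", assoc_lookup(cache, 0x3F)); /* 1  */
    printf("block 0x2A -> line %d\n", assoc_lookup(cache, 0x2A)); /* -1 */
    return 0;
}
```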
Set-Associative Mapping:
Cache mapping logic uses d bits to identify the set, where the number of sets v = 2^d, and ((n-k)-d) bits are
used to represent the tag field. In set-associative mapping, a block j can be stored in
any of the cache lines of set i. To read a data item, the cache control logic first
looks simultaneously into all the cache lines of the set identified by the d bits of the set field,
comparing their ((n-k)-d)-bit tag fields; if a match is found, the data item is read from the
cache, otherwise the data item is read from the main memory and the corresponding block is
copied into the cache accordingly. Set-associative mapping with w lines per set is also known
as w-way set-associative mapping. It uses a smaller number of
bits (((n-k)-d) bits) in the tag field as compared to the (n-k) bits of associative mapping.
A comprehensive example showing the possible locations of main memory blocks
in the cache for different cache mapping schemes is discussed next.
A word with address 00011110 can be stored in any cache line; for example, in the cache memory
shown above it is in line 2, from where it can be accessed.
(iii) 2-way set-associative mapping:
The size of cache = 32 words
The block size of main memory = words in one line of cache = 4 ⇒ k = 2 bits
The number of lines in a set (w) = 2 (this is a 2-way set-associative memory)
The number of sets (v) = size of cache in words/(words per line × w)
= 32/(4×2) = 4
Thus, the set number can be identified using d = 2 bits, as 2^2 = 4
Tag size = (n-k) - d = (8 - 2) - 2 = 4 bits
The address mapping for the address 11111101:
Block address of main memory: 111111; address of a word in the block: 01
Tag: 1111; set number: 11
Set number = 11 = 3 in decimal
Tag = 1111
The address mapping for the address 00001011:
Block address of main memory: 000010; address of a word in the block: 11
Tag: 0000; set number: 10
Set number = 10 = 2 in decimal
Tag = 0000
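The address decomposition used in this example can be sketched in C. The field widths match the example above (an 8-bit address with k = 2 word bits and d = 2 set bits):

```c
#include <stdint.h>
#include <stdio.h>

#define K 2   /* bits for the word within a block */
#define D 2   /* bits for the set number (v = 4)  */

/* Split an 8-bit address into tag, set number and word-in-block. */
static void set_assoc_map(uint8_t addr, unsigned *tag,
                          unsigned *set, unsigned *word) {
    *word = addr & ((1u << K) - 1);
    *set  = (addr >> K) & ((1u << D) - 1);
    *tag  = addr >> (K + D);
}

int main(void) {
    unsigned tag, set, word;
    set_assoc_map(0xFD, &tag, &set, &word);              /* 11111101 */
    printf("tag=%u set=%u word=%u\n", tag, set, word);   /* 15 3 1   */
    set_assoc_map(0x0B, &tag, &set, &word);              /* 00001011 */
    printf("tag=%u set=%u word=%u\n", tag, set, word);   /* 0 2 3    */
    return 0;
}
```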
• Cache and main memory can be altered by multiple processes, which may
result in inconsistency between the values of a data item in the cache and in the main
memory.
• If there are multiple CPUs with individual cache memories, a data item written
by one processor into one cache may invalidate the value of the data item in
other cache memories.
These issues can be addressed in two different ways:
1. Write through: This write policy ensures that if a CPU updates its cache,
then it has to make the changes in the main memory as well. In
multiple processor systems, the caches of other CPUs need to monitor the
updates made by a processor's cache to the main memory and make
suitable changes accordingly. This policy can create a bottleneck, as many CPUs try to
access the main memory.
2. Write back: The cache control logic uses an update bit. Changes are written
only to the cache, and whenever a data item is updated in the cache, the
update bit of its block is set. As long as the data item is in the cache, no update is
made in the main memory. A block whose update bit is set is written back
to the main memory only at the time when the block is being replaced in the cache.
This policy ensures that all accesses to the main memory are only through the
cache, and this may create a bottleneck. A minimal simulation of both policies
is sketched after this list.
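The following C sketch simulates the two write policies. It assumes, for illustration only, a single one-word cache block and a single main memory cell:

```c
#include <stdbool.h>
#include <stdio.h>

/* One cache block with an update (dirty) bit, as used by write back. */
struct block {
    int  data;
    bool update;   /* set when the cached copy differs from main memory */
};

int main_memory = 0;

/* Write through: every write goes to the cache and to main memory. */
void write_through(struct block *b, int value) {
    b->data = value;
    main_memory = value;          /* memory is always up to date */
}

/* Write back: write only to the cache and set the update bit;
   memory is updated later, when the block is replaced. */
void write_back(struct block *b, int value) {
    b->data = value;
    b->update = true;
}

void replace_block(struct block *b) {
    if (b->update) {              /* flush the changed block */
        main_memory = b->data;
        b->update = false;
    }
}

int main(void) {
    struct block b = {0, false};
    write_back(&b, 42);
    printf("before replace: memory=%d\n", main_memory); /* 0  */
    replace_block(&b);
    printf("after replace:  memory=%d\n", main_memory); /* 42 */
    return 0;
}
```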
You may refer to further readings for more details on cache memories.
Check Your Progress 2
1. Assume that a computer system has the following memories:
RAM of 64 words, with each word of 16 bits
Cache memory of 8 blocks (block size of cache is 32 bits)
Find in which location of the cache memory the decimal address 21 can be found if
associative mapping is used.
……………………………………………………………………………………
……………………………………………………………………………………
2. For the system given above, find in which location of the cache memory the decimal
address 27 will be located if direct mapping is used.
…………………………………………………………………………………………
………………………………………………………………………………
3. For the system given above, find in which location of the cache memory the decimal
address 12 will be located if two-way set-associative mapping is used.
……………………………………………………………………………………
……………………………………………………………………………………
Hardware Organization
Associative memory consists of a memory array and logic for m words with n bits per
word, as shown in the block diagram in Figure 6.15. The argument register (A) and key
register (K) have n bits each. Each bit of the argument and key registers corresponds to one bit of a
word. The match register M has m bits, one for each memory word.
The key register provides a mask for choosing a particular field or key in the argument
word. The entire argument is compared with each memory word only if the key
register contains all 1s. Otherwise, only those bits of the argument that have 1s in their
corresponding positions of the key register are compared. Thus, the key provides a
mask or identifying information, which specifies how the reference to memory is made.
The content of the argument register is simultaneously matched with every word in the
memory. The words that match the content of the argument register set their
corresponding bits in the match register. Set bits of the match register indicate
that the corresponding words have a match. Thereafter, memory is accessed sequentially
to read only those words whose corresponding bits in the match register have been set.
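The masked comparison described above can be sketched in C. The loop stands in for the parallel match logic of real hardware; the memory contents, argument and key values are illustrative assumptions:

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS 4   /* m words of n = 8 bits each (illustrative sizes) */

/* Compare the argument with every word, but only in the bit positions
   where the key register holds a 1; set the match bit for each word
   that agrees in all unmasked positions. */
uint8_t match(const uint8_t memory[], uint8_t argument, uint8_t key) {
    uint8_t m = 0;                 /* match register, one bit per word */
    for (int i = 0; i < WORDS; i++)
        if ((memory[i] & key) == (argument & key))
            m |= (uint8_t)(1u << i);
    return m;
}

int main(void) {
    uint8_t memory[WORDS] = {0xF3, 0x5A, 0xF0, 0x0F};
    /* Key 0xF0: only the four most significant bits take part. */
    uint8_t m = match(memory, 0xF5, 0xF0);
    printf("match register = 0x%X\n", m);  /* words 0 and 2 match: 0x5 */
    return 0;
}
```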
Example: Consider an associative memory of just 2 bytes. The key and
argument registers are also shown in the diagram.
Please note that since the four most significant bits of the key register are 1, only
those bit positions are compared.
Let us say you have a main memory of size 256K (2^18) words. This requires 18 bits to
specify a physical address in the main memory. The system also has an auxiliary memory as
large as the capacity of 16 main memories. So, the size of the auxiliary memory is
256K × 16 = 4096K = 2^22 words, which requires 22 bits to address the auxiliary memory. A 22-bit
virtual address will be generated by the processor, which will be mapped into an 18-bit
physical address by the address mapping mechanism, as shown in Figure 6.17.
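The mapping can be sketched in C. The 22-bit virtual and 18-bit physical address sizes come from the example above; the 1K-word page size and the single-level page table are assumptions for illustration only:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed page size: 1K (2^10) words. A 22-bit virtual address then
   splits into a 12-bit virtual page number and a 10-bit offset; an
   18-bit physical address holds an 8-bit frame number (256 frames). */
#define PAGE_BITS 10
#define VPAGES    (1u << (22 - PAGE_BITS))   /* 4096 virtual pages */

/* A hypothetical page table: virtual page number -> physical frame. */
static uint32_t page_table[VPAGES];

uint32_t translate(uint32_t vaddr) {
    uint32_t vpage  = vaddr >> PAGE_BITS;            /* upper 12 bits   */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    uint32_t frame  = page_table[vpage];             /* 8-bit frame no. */
    return (frame << PAGE_BITS) | offset;            /* 18-bit address  */
}

int main(void) {
    page_table[3] = 5;                     /* map virtual page 3 to frame 5 */
    uint32_t v = (3u << PAGE_BITS) | 123;  /* word 123 of virtual page 3    */
    printf("virtual %u -> physical %u\n", v, translate(v));
    return 0;
}
```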
6.8 SUMMARY
This unit introduced you to the concepts relating to cache memory. The unit defined
some of the basic issues of cache design. The cache mapping schemes were
explained in detail. The direct mapping cache uses a simple modulo function, but has
limited use. Associative mapping allows flexibility but uses complex circuitry
and more bits for the tag field. Set-associative mapping combines the concepts of associative
and direct mapping caches. The unit also explained the use of memory interleaving,
which allows multiple words to be accessed in a single access cycle. The concept of
content addressable memories was also discussed. Cache memory, memory
interleaving and associative memories are primarily used to increase the speed of
memory access. Finally, the unit discussed the concept of virtual memory, which
allows execution of programs requiring more than the physical memory space of a
computer. You may refer to the further readings of this block for more details on the memory
system.
6.9 ANSWERS
Check Your Progress 1
In a set-associative memory the given tag can be stored in any of the 8
lines.
2. Main memory size = 64 words (a word = 16 bits) = 2^6 ⇒ n = 6 bits
Block size = 32 bits = 2 words = 2^1 ⇒ k = 1 bit
The size of cache = 8 blocks of 32 bits each = 8 lines ⇒ m = 3 bits
Tag size for direct mapping = (n-k) - m = (6 - 1) - 3 = 2 bits
The address mapping for the address 27 in decimal, that is 011011: the block
address is the upper (n-k) = 5 bits, 01101; the cache line number is the lower
m = 3 bits of the block address, 101 = 5; and the tag is 01. So address 27 will be
located in cache line 5.
1. Memory interleaving divides the main memory into modules. Each of these
modules stores the words of the main memory as follows (the example uses 4 modules
and a 16-word main memory):
Module 0: Words 0, 4, 8, 12 Module 1: Words 1, 5, 9, 13
Module 2: Words 2, 6, 10, 14 Module 3: Words 3, 7, 11, 15
Thus, several consecutive memory words can be fetched from the interleaved
memory in one access. For example, in a typical access, words 4, 5, 6 and 7 can
be accessed simultaneously from modules 0, 1, 2 and 3 respectively. A minimal
sketch of this address-to-module calculation is given below.
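The sketch assumes low-order interleaving, in which the word address modulo the number of modules selects the module and the quotient selects the location within it:

```c
#include <stdio.h>

#define MODULES 4   /* number of memory modules, as in the answer above */

int main(void) {
    /* Low-order interleaving: consecutive addresses fall in
       consecutive modules, so a block can be fetched in parallel. */
    for (int addr = 4; addr <= 7; addr++)
        printf("word %d -> module %d, location %d\n",
               addr, addr % MODULES, addr / MODULES);
    return 0;
}
```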
2. Associative memories do not use addresses; they are accessed by content. They
are very fast.