CO Unit 4

The Memory System

Outline
• Basic memory circuits & memory organization
• Memory technology
• Direct memory access
• Memory hierarchy concepts
• Cache memory and virtual memory
• Magnetic and optical disks
Basic Concepts
• Access is provided by the processor-memory interface
• The processor uses the address lines to specify the memory
location involved in a data transfer, and the data lines to carry
the data; the control lines carry the command indicating a Read
or a Write operation
• Speed of the memory unit is measured by
1. Memory access time
2. Memory cycle time
• Memory access time is the time from initiation to completion of a
word or byte transfer
• Memory cycle time is the minimum time delay between the initiation
of two successive transfers [e.g., the time between two successive
Read operations]
• The cycle time is usually slightly longer than the access time
• Random-access memory (RAM) means that the access time is the
same, independent of the location accessed
Cache and Virtual Memory
• The main memory is slower than the processor
• Cache memory is a smaller, faster memory that is used to
reduce the effective access time
• It holds a subset of program instructions and data
• The information for one or more active programs may exceed
the physical capacity of the main memory
• Virtual memory provides larger apparent size by transparently
using secondary storage. Sections of the program are transferred
back and forth between the main memory and the secondary
storage device in a manner that is transparent to the application
program
• Both approaches need efficient block transfers. These transfers
do not occur one word at a time. Data are always transferred in
contiguous blocks involving tens, hundreds or thousands of
words.
Semiconductor RAM Memories
• Semiconductor Random Access Memories (RAMs) are available
in a wide range of speeds. Their cycle times range from 100 ns
to less than 10 ns.
• Memory chips have a common organization
• Cells holding single bits arranged in an array
• Words are rows; cells connected to word lines
• Cells in columns connect to bit lines
• Sense/Write circuits are interfaces between internal bit lines and
data I/O pins of chip
• Typical control pin connections include Read/Write command
and chip select (CS)
Internal Organization and Operation
• Example: a 16-word × 8-bit memory chip has a decoder that
selects a word line from a 4-bit address
• Two complementary bit lines for each data bit
• An external source provides stable address bits and asserts the
chip-select input with a command
• For a Read operation, Sense/Write circuits transfer data from
the selected row to the I/O pins
• For a Write operation, Sense/Write circuits transfer data from
the I/O pins to the selected cells
• A 16 × 8 chip has only 128 storage cells, yet it needs 16 pins:
• Address (4), data I/O lines (8), R/W and CS (2), and
power/ground (2). In total it needs 4 + 8 + 2 + 2 = 16 pins
More on Chip Organization
• Larger chips: similar organization, more pins [costly]
• For example, a 1K chip with 1024 cells could be organized as
128 × 8, needing 7 + 8 + 2 + 2 = 19 pins
How can the number of pins be reduced?
• Alternatives:
• 1. The cells can be organized in a 1K × 1 format. A 10-bit
address is then needed, but there is only one data line; with
R/W and CS (2) and power/ground (2), this results in 15 pins
• 2. Use a 32 × 32 array and divide the address bits into 5 upper
bits for the row and 5 lower bits for the column; with one data
line, R/W and CS (2), and power/ground (2), this also results in 15 pins
Organization of a 1K × 1 memory chip using a 5-bit row address and a 5-bit
column address
Static RAMs and CMOS Cell
• SRAM: data, once written, are retained as long as power is
on; usually has a short access time (a few nanoseconds)
• A static RAM cell consists of two cross-connected inverters
forming a latch [storing 1 bit of data]
• Chip implementations typically use a CMOS cell, whose
advantage is low power consumption
• Two transistors controlled by the word line act as switches
between the cell and the bit lines
• To write, the bit lines are driven with the desired data
PMOS and NMOS transistors act as switches: an NMOS transistor is ON
when its gate input X = 1, and a PMOS transistor is ON when X = 0;
in the ON state there is a conducting path from the source (S) to
the drain (D).
1-bit SRAM cell using inverters (NOT gates)
The transistors are turned ON and OFF under the control of the
word line. To retain the state of the latch, the word line can be
grounded, which turns the transistors OFF.
Read operation:
1. The word line is activated (= 1) to turn T1 and T2 ON.
2. With both T1 and T2 ON, the values at X and Y (i.e., the
value stored in the latch) are available on b and b’.
3. The Sense/Write circuit connected to the bit lines
monitors the states of b and b’.
Write Operation
To write 1: The bit line b is set to 1 and bit line b’ is set
to 0. Then the word line is activated and the data is written
into the latch.

To write 0: The bit line b is set to 0 and bit line b’ is set
to 1. Then the word line is activated and the data is written
into the latch.

6 Transistor static memory cell


1-bit SRAM cells with 6 transistors are used in modern-day
SRAM implementations.
Transistors (T3, T5) and (T4, T6) form the CMOS inverters
in the latch.
In state 1: the voltage at X is high and the voltage at Y is low.
1. When the voltage at X is high, transistor T6 is ON and T4
is OFF.
2. Similarly, since the voltage at Y is low, transistor T5 is OFF
and T3 is ON.
Thus, (T4, T5) are OFF and (T3, T6) are ON.
3. When the word line is activated, T1 and T2 are ON, so b = 1
and b’ = 0.

In state 0: the voltage at X is low and the voltage at Y is high.
1. When the voltage at X is low, transistor T6 is OFF and T4 is
ON.
2. Similarly, since the voltage at Y is high, transistor T5 is ON
and T3 is OFF. Thus, (T4, T5) are ON and (T3, T6) are OFF.
3. When the word line is activated, T1 and T2 are ON, so b = 0
and b’ = 1.
Dynamic RAMs
• Static RAMs have short access times, but need several
transistors per cell, so density is lower
• Dynamic RAMs are simpler for higher density and lower
cost, but access times are longer
• Density/cost advantages outweigh slowness
• Dynamic RAMs are widely used in computers
• Cell consists of a transistor and a capacitor
• State is presence/absence of capacitor charge
• Charge leaks away and must be refreshed
Sense/Write Circuit
Read Operation:
1. The transistor of the particular cell is turned ON by
activating the word line.
2. The sense circuit connected to the bit line senses the charge
stored in the capacitor.
3. If the charge is above the threshold, the bit line is at a high
voltage, which represents 1.
4. If the charge is below the threshold, the bit line is at a low
voltage, which represents 0.
Write Operation:
1. The transistor of the particular cell is turned ON by
activating the word line.
2. Depending on the value to be written (0 or 1), an
appropriate voltage is applied to the bit line.
3. The capacitor gets charged to the required voltage state.
4. The capacitor must subsequently be refreshed.
32MB Dynamic RAM Chips
• Consider a 32M × 8 chip with a 16K × 16K cell array
• 16,384 rows, each row with 16,384 cells
• The 16,384 cells per row are organized as 2048 bytes
• 14 bits select the row, 11 bits select a byte within the row
• In total, a 25-bit address is needed to access a byte in this
memory. The high-order 14 bits and the low-order 11 bits of the
address constitute the row and column addresses of a byte,
respectively. To reduce the number of pins needed for external
connections, the row and column addresses are multiplexed on 14 pins
• The row/column address is identified by the address strobe
signals (RAS, CAS)
• Row and column address latches capture the bits
• Asynchronous DRAMs: delay-based access; an external controller
refreshes the rows periodically
• During a Read or Write operation, the row address is applied
first. It is loaded into the row address latch under control of
the input line called the Row Address Strobe (RAS).
• After the row address is loaded, the column address is applied
to the address pins and loaded into the column address latch
under control of a second control line called the Column
Address Strobe (CAS).
• The information in this latch is decoded and the appropriate
group of 8 Sense/Write circuits is selected.
• If the R/W control signal indicates a Read operation, the
output values of the selected circuits are transferred to the data
lines, D7−0.
• For a Write operation, the information on the D7−0 lines is
transferred to the selected circuits.
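As a sanity check on this addressing scheme, the following short Python sketch (illustrative only; the constants mirror the 32M × 8 chip above) splits a 25-bit byte address into the 14-bit row and 11-bit column parts that would be presented under RAS and CAS:

ROW_BITS, COL_BITS = 14, 11  # 16K rows, 2048 bytes per row

def split_dram_address(addr):
    # High-order 14 bits select the row; low-order 11 bits select
    # the byte (column) within the 2048-byte row.
    assert 0 <= addr < 1 << (ROW_BITS + COL_BITS)
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

print(split_dram_address(0b1010101010101010101010101))  # (row, column)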
Fast Page Mode
• In preceding example, all 16,384 cells in a row are
accessed
• But only 8 bits of data are actually transferred
for each full row/column addressing sequence
• For more efficient access to data in same row, latches
in sense amplifiers hold cell contents
• For consecutive data, just assert CAS signal and
increment column address in same row
• This fast page mode is useful in block transfers
Synchronous DRAMs
• In early 1990s, DRAM technology enhanced by including
clock signal with other chip pins
• More circuitry also added for enhancements
• These chips are called synchronous DRAMs
• Sense amplifiers still have latching capability
• Additional benefits from internal buffering and availability
of synchronizing clock signal
• Internal row counter enables built-in refresh instead of
relying on external controller
SDRAM Features
• Synchronous DRAM (SDRAM) chips include data
registers as well as address latches
• New access operation can be initiated while data are
transferred to or from these registers
• Also have more sophisticated control circuitry
• SDRAM chips require power-up configuration
• Memory controller initializes mode register
• Used to specify burst length for block transfers and also
to set delays for control of timing
Efficient Block Transfers
• Asynchronous DRAM incurs longer delay from CAS
assertion for each column address
• Synchronous DRAM reduces delay by having CAS
assertion once for initial column address
• SDRAM circuitry increments column counter and
transfers consecutive data automatically
• Burst length determines number of transfers
• Consider example with burst length of 4, RAS delay of 2
cycles, CAS delay of 2 cycles
• The first word of data is transferred after five clock
cycles
Latency
Data transfers to and from the main memory often involve blocks
of data. During block transfers, memory latency is the amount of
time it takes to transfer the first word of a block. The time required
to transfer a complete block depends also on the rate at which
successive words can be transferred and on the size of the block.
The time between successive words of a block is much shorter than
the time needed to transfer the first word.

[Ex] The first word of data is transferred after five clock cycles, so
the latency is five clock cycles. If the clock rate is 500 MHz, the
latency = 5/(500 × 10^6) s = 10 ns. The remaining three words of the
burst are transferred in consecutive clock cycles, at a rate of one
word every 2 ns (= 1/(500 × 10^6) s). The time between successive words
of a block is much shorter than the time needed to transfer the first
word; here 2 ns < 10 ns. The total transfer time = 5 + 4 = 9 cycles
= 10 + (4 × 2) = 18 ns.
Bandwidth
It is the number of bits or bytes that can be transferred in one
second. It depends on the speed of access to the stored data and on
the number of bits that can be accessed in parallel.

EX1: Consider a main memory built with SDRAM chips. Data are
transferred in bursts of length 8. Assume that 32 bits of data are
transferred in parallel. If a 400-MHz clock is used, how much time
does it take to transfer: (a) 32 bytes of data, (b) 64 bytes of data?
What is the latency in each case?
Each column address strobe causes 8 × 4 = 32 bytes to be
transferred.
(a) The first word of data is transferred after five clock cycles, so
the latency is five clock cycles. If the clock rate is 400 MHz, the
latency = 5/(400 × 10^6) = 12.5 ns. The remaining seven words follow
at a rate of one word every 2.5 ns (= 1/(400 × 10^6) s). So the total
time = 5 + 8 = 13 clock cycles = 12.5 + (8 × 2.5) = 32.5 ns.
(b) A second column strobe is needed to transfer the second
burst of 32 bytes. Therefore:
Latency = 5 clock cycles, or 12.5 ns
Total time = 5 + 8 + 2 + 8 = 23 clock cycles, or 57.5 ns
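The cycle accounting follows a simple pattern: an initial 5-cycle delay, then 8 cycles per burst, plus a 2-cycle CAS delay before each additional burst. The Python sketch below (an illustrative model, not from the text) reproduces both answers:

CLOCK_HZ    = 400e6
PERIOD_NS   = 1e9 / CLOCK_HZ    # 2.5 ns per cycle
BURST_LEN   = 8                 # words per column strobe
WORD_BYTES  = 4                 # 32 bits transferred in parallel
FIRST_DELAY = 5                 # cycles before the first burst
EXTRA_CAS   = 2                 # cycles per additional column strobe

def transfer_cycles(num_bytes):
    bursts = -(-num_bytes // (BURST_LEN * WORD_BYTES))  # ceiling division
    return FIRST_DELAY + bursts * BURST_LEN + (bursts - 1) * EXTRA_CAS

for size in (32, 64):
    c = transfer_cycles(size)
    print(f"{size} bytes: {c} cycles = {c * PERIOD_NS} ns")
# 32 bytes: 13 cycles = 32.5 ns
# 64 bytes: 23 cycles = 57.5 ns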

Double-Data-Rate (DDR) SDRAM


•Early SDRAMs transferred on rising clock edge
•Later enhanced to use both rising and falling edges
•Doubles effective rate after RAS/CAS assertion
•Requires more complex clock/control circuitry
Structure of Larger Memories

Larger memories combine multiple chips. How are these chips
connected together?
Memories can be extended using both
1. Static RAM technology
2. Dynamic RAM technology
Static RAM technology
• Problem: Construct a 2M memory [each word is 32 bits] using
512K chips [each word is 8 bits], based on static memory systems.
• Memory size is 2M words, each word 32 bits in size
• Implement with 512K-word, 8-bit SRAM chips
• Number of memory chips = (2 × 2^20 × 32)/(512 × 2^10 × 8) =
2^26/2^22 = 16 chips, arranged as 4 rows × 4 columns
4 chips in parallel provide 32 bits; then 4 groups of 4 chips cover the 2M words
Address Decoder
• A 2M word-addressable memory needs 21 address bits.
• Each chip has only 19 address bits (512K = 2^9 × 2^10 = 2^19)
• Address bits A20 and A19 select one of the 4 groups
• The outputs of a 2-bit decoder drive the chip-select pins, and only
the selected chips respond to a request. When a chip is not selected
(i.e., its CS input is 0), its I/O pins are electrically disconnected
Dynamic RAM technology
Modern computers use very large memories. A large memory leads
to better performance, because more of the programs and data used in
processing can be held in the memory, thus reducing the frequency of
access to secondary storage.
Because of their high bit density and low cost, dynamic RAMs,
mostly of the synchronous type, are widely used in the memory units
of computers. They are slower than static RAMs, but they use less
power and have a lower cost per bit. A single chip can have a capacity
as high as 2G bits, and even larger chips are being developed.
Memory controller: The address applied to dynamic RAM chips is
divided into two parts. The high-order address bits, which select a row
in the cell array, are provided first and latched into the memory chip
under control of the RAS signal. Then, the low-order address bits,
which select a column, are provided on the same address pins and
latched under control of the CAS signal. Since a typical processor
issues all bits of an address at the same time, a multiplexer is
required. This function is performed by a memory controller circuit.
Refresh Circuit: Dynamic RAMs must be refreshed periodically.
The circuitry required to initiate refresh cycles is included as part
of the internal control circuitry of synchronous DRAMs.
Refresh Overhead: A dynamic RAM cannot respond to read or
write requests while an internal refresh operation is taking place.
Such requests are delayed until the refresh cycle is completed.
Choice of Technology: The choice of a RAM chip for a given
application depends on several factors. Foremost among these are
the cost, speed, power dissipation, and size of the chip.
Comparison between Static and Dynamic RAM
Static RAMs are characterized by their very fast operation. However,
their cost and bit density are adversely affected by the circuitry that
realizes the basic cell. They are used mostly where a small but very
fast memory is needed.
Dynamic RAMs, on the other hand, have high bit densities and a
low cost per bit. Synchronous DRAMs are the predominant choice
for implementing the main memory.
Solved Problem
1. Describe a structure for an 8M × 32 memory using 512K × 8
memory chips and find the number of address bits.
Solution:
Memory size is 8M words, each word is 32 bits in size
Implement with 512K words, each word is 8 bits using SRAM
chips
Number of memory chips = (8 × 2^20 × 32)/(512 × 2^10 × 8) =
64 chips, arranged as 16 rows × 4 columns
Four chips per row provide 32 bits, and 16 such rows cover the 8M
words. So the required structure is 16 rows, each row with four
512K × 8 chips.
Address Decoder
An 8M word-addressable memory needs 23 bits (8M = 2^3 × 2^20 = 2^23)
Each chip has only 19 address bits (512K = 2^9 × 2^10 = 2^19).
Address lines A18−0 should be connected to all chips.
Address lines A22−19 should be connected to a 4-bit decoder to
select one of the 16 rows.
EX2: Describe a structure for a 16M × 32 memory using 1M × 4
memory chips.
Number of memory chips = (16 × 2^20 × 32)/(1 × 2^20 × 4) =
128 chips, arranged as 16 rows × 8 columns
A 16M word-addressable memory needs 24 bits (16M = 2^4 × 2^20 = 2^24)
Each chip has only 20 address bits (1M = 2^20). Address lines
A19−0 should be connected to all chips.
Address lines A23−20 should be connected to a 4-bit decoder to
select one of the 16 rows.
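Both sizing exercises follow the same arithmetic, captured in this illustrative Python sketch (the function and parameter names are ours):

from math import log2

def plan(mem_words, mem_width, chip_words, chip_width):
    columns = mem_width // chip_width   # chips in parallel per word
    rows    = mem_words // chip_words   # chip groups chosen by the decoder
    return {"chips":        rows * columns,
            "address_bits": int(log2(mem_words)),
            "chip_bits":    int(log2(chip_words)),
            "decoder_bits": int(log2(rows))}   # high-order bits -> chip select

print(plan(8 * 2**20, 32, 512 * 2**10, 8))
# {'chips': 64, 'address_bits': 23, 'chip_bits': 19, 'decoder_bits': 4}
print(plan(16 * 2**20, 32, 1 * 2**20, 4))
# {'chips': 128, 'address_bits': 24, 'chip_bits': 20, 'decoder_bits': 4}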
Read-only Memories
• Static and dynamic RAM chips are volatile: information is
retained only while power is on. Some applications require
information to be retained in memory chips when power is off
• Read-only memory (ROM) chips provide non-volatile storage
for such applications. A special writing process sets the memory
contents
• The Read operation is similar to that of volatile memories
Basic ROM Cell
• A read-only memory (ROM) has its contents written only once, at
the time of manufacture
• The basic ROM cell in such a memory contains a single transistor
switch for the bit line
• The other end of the bit line is connected to the power supply
through a resistor
• If the transistor is connected to ground, bit line voltage is zero, so
cell stores a 0
• Otherwise, bit line voltage is high for a 1
PROM
Some ROM designs allow the data to be loaded by the user, thus
providing a programmable ROM (PROM). Programmability is
achieved by inserting a fuse at point P in ROM cell.
Before it is programmed, the memory contains all 0s. The user can
insert 1s at the required locations by burning out the fuses at these
locations using high-current pulses. This process is irreversible.
EPROM
•It allows the stored data to be erased and new data to be written into
it. Such an erasable, reprogrammable ROM is usually called an
EPROM. Since EPROMs are capable of retaining stored information
for a long time, they can be used in place of ROMs or PROMs while
software is being developed.
•An EPROM cell has a structure similar to the ROM cell . However,
the connection to ground at point P is made through a special
transistor. The transistor is normally turned off, creating an open
switch. It can be turned ON by injecting charge into it.
• Erasure requires removing the charge from the transistors that form
the memory cells. This can be done by exposing the chip to ultraviolet
light, which erases the entire contents of the chip. To make this
possible, EPROM chips are mounted in packages that have transparent windows.
EEPROM
An EPROM must be physically removed from the circuit for
reprogramming. Also, the stored information cannot be erased
selectively. The entire contents of the chip are erased when exposed to
ultraviolet light.
Another type of erasable PROM can be programmed, erased, and
reprogrammed electrically. Such a chip is called an electrically
erasable PROM, or EEPROM. It does not have to be removed for
erasure. Also, it is possible to erase the cell contents selectively.
One disadvantage of EEPROMs is that different voltages are
needed for erasing, writing, and reading the stored data, which
increases circuit complexity.
This disadvantage is outweighed by the many advantages of
EEPROMs. They have replaced EPROMs in practice.
Flash Memory
• Flash memory is based on EEPROM cells
• For higher density, Flash cells are designed
to be erased in larger blocks, not individually
• Writing individual cells requires reading block, erasing
block, then writing block with changes
• Greater density & lower cost
• Widely used in cell phones, digital cameras, and solid-state
drives (e.g., USB memory keys)
Direct Memory Access
• To transfer blocks of data between the main memory and I/O
devices, two techniques are used:
1. Program-controlled I/O 2. Direct Memory Access (DMA)
Program-controlled I/O: requires the processor to intervene
frequently. The overhead is high because each transfer involves
only a single word or byte, and interrupt state-saving and
operating-system handling add further overhead for such small
transfers. Each word takes two clock cycles to transfer (one to
read it into the processor, one to write it out).
Direct Memory Access (DMA): an alternative to program-controlled
I/O. A special control unit is provided to manage the transfer,
without continuous intervention by the processor. The unit that
controls DMA transfers is referred to as a DMA controller (DMAC).
It may be part of the I/O device interface, or it may be a separate
unit shared by a number of I/O devices. Each word takes one clock
cycle to transfer.
DMA Controller
• Although a DMA controller transfers data without intervention by
the processor, its operation must be under the control of a program
executed by the processor.
• To initiate the transfer of a block of words, the processor sends
to the DMA controller the starting address, the number of words
in the block, and the direction of the transfer. The DMA
controller then proceeds to perform the requested operation.
• When the entire block has been transferred, it informs the
processor by raising an interrupt.
• DMA controller examples: disk and Ethernet
Typical registers in a DMA controller
•Two registers are used for storing the starting address and the
word count.
•The third register contains status and control flags. The R/W bit
determines the direction of the transfer. When this bit is set to 1 by
a program instruction, the controller performs a Read operation,
that is, it transfers data from the memory to the I/O device.
Otherwise, it performs a Write operation.
•When the controller has completed transferring a block of data and
is ready to receive another command, it sets the Done flag to 1. Bit
30 is the Interrupt-enable flag, IE. When this flag is set to 1, it
causes the controller to raise an interrupt after it has completed
transferring a block of data. Finally, the controller sets the IRQ bit
to 1 when it has requested an interrupt.
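The flag layout can be pictured with a small Python sketch. Bit 30 (IE) comes from the text above; the bit positions assumed here for R/W, Done, and IRQ are illustrative only:

RW   = 1 << 0    # 1 = Read (memory -> I/O device); position assumed
DONE = 1 << 1    # set when a block transfer completes; position assumed
IE   = 1 << 30   # interrupt-enable flag (bit 30, per the text)
IRQ  = 1 << 31   # set when an interrupt is requested; position assumed

def describe(status):
    parts = ["Read" if status & RW else "Write"]
    for name, mask in (("Done", DONE), ("IE", IE), ("IRQ", IRQ)):
        if status & mask:
            parts.append(name)
    return ", ".join(parts)

# A completed Read transfer with interrupts enabled and requested:
print(describe(RW | DONE | IE | IRQ))   # Read, Done, IE, IRQ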
Use of DMA controllers in a computer system
One DMA controller connects a high-speed Ethernet to the
computer’s I/O bus. The disk controller, which controls two
disks, also has DMA capability and provides two DMA channels; it
can perform two independent DMA operations, as if each disk had
its own DMA controller.

To start a DMA transfer of a block of data from the main memory


to one of the disks, an OS routine writes the address and word
count information into the registers of the disk controller. The
DMA controller proceeds independently to implement the specified
operation. When the transfer is completed, this is recorded in the
status and control register of the DMA channel by setting the Done
bit. At the same time, if the IE bit is set, the controller sends an
interrupt request to the processor and sets the IRQ bit.
Memory Hierarchy
• Ideal memory is fast, large, and inexpensive
• Not feasible, so use memory hierarchy instead
• Exploits program behaviour to make it appear as though
memory is fast and large
• Recognizes speed/capacity/cost features of different
memory technologies
• Fast static memories are closest to processor
• Slower dynamic memories for more capacity
• Slowest disk memory for even more capacity
Memory Hierarchy
• Processor registers are fastest.
• There are often two or more levels of cache. A primary cache is
always located on the processor chip. This cache is small and its
access time is comparable to that of processor registers. The
primary cache is referred to as the level 1 (L1) cache. A larger,
and hence somewhat slower, secondary cache is placed between
the primary cache and the primary memory. It is referred to as
the level 2 (L2) cache. Often, the L2 cache is also housed on the
processor chip.
• Some computers have a level 3 (L3) cache of even larger size, in
addition to the L1 and L2 caches. An L3 cache, also implemented
in SRAM technology, may or may not be on the same chip with the
processor and the L1 and L2 caches.
• The caches hold copies of program instructions and data stored in
the large external main memory. Very large programs, or
multiple programs active at the same time, need more storage
• Disks are used to hold what exceeds the main memory
Caches and Locality of Reference
• The cache is between processor and memory
• Makes large, slow main memory appear fast
• Effectiveness is based on locality of reference
• Typical program behaviour involves executing instructions
in loops and accessing array data
• Temporal locality: instructions/data that have been recently
accessed are likely to be needed again soon
• Spatial locality: nearby instructions or data are likely to be
accessed after current access
• Cache block : refers to a set of contiguous address
locations of some size.
Use of a cache memory
Cache Concepts
• To exploit spatial locality, transfer cache block with multiple
adjacent words from memory
• Later accesses to nearby words are fast, provided that cache still
contains the block
• Mapping function determines where a block from main memory
is to be located in the cache
• When cache is full, replacement algorithm determines which
block to remove for space
• Processor issues Read and Write requests as if it were accessing
main memory directly
• But the control circuitry first checks the cache; if the desired
information is present in the cache, a read or write hit occurs
• For a read hit, main memory is not involved; the cache provides
the desired information
• For a write hit, there are two approaches
Handling Cache Writes
• Write-through protocol: update both the cache and the main memory.
• Write-back (or copy-back) protocol: update only the cache;
the main memory is updated later, when the block is replaced
• Write-back scheme needs modified or dirty bit to mark blocks
that are updated in the cache
• The write-through protocol is simpler than the write-back
protocol, but it results in unnecessary Write operations in the
main memory when a given cache word is updated several times
during its cache residency.
• The write-back protocol also involves unnecessary Write
operations, because all words of the block are eventually written
back, even if only a single word has been changed while the
block was in the cache.
• The write-back protocol is used most often, to take advantage of
the high speed with which data blocks can be transferred to
memory chips.
Handling Cache Misses
• If desired information is not present in cache, a read or write
miss occurs
• For a read miss, the block with desired word is transferred from
main memory to the cache
• For a write miss under write-through protocol, information is
written to the main memory
• Under write-back protocol, first transfer block containing the
addressed word into the cache, then overwrite specific word in
cached block
Mapping Functions
• Block of consecutive words in main memory must be transferred
to the cache after a miss
• The mapping function determines the location
• There are three different mapping functions
1. Direct mapping
2. Associative mapping
3. Set Associative mapping
• [Example]:Consider a cache consisting of 128 blocks of 16
words each, for a total of 2048 (2K) words, and assume that the
main memory is addressable by a 16-bit address. The main
memory has 64K words, which is 4K blocks of 16 words each.
Direct Mapping
•The simplest way to determine cache locations in which to store
memory blocks is the direct-mapping technique.
•In this technique, block j of the main memory maps onto block
j modulo 128 of the cache. Thus, whenever one of the main
memory blocks 0, 128, 256, . . . is loaded into the cache, it is stored
in cache block 0.
•Blocks 1, 129, 257, . . . are stored in cache block 1, and so on.
Since more than one memory block is mapped onto a given cache
block position, contention may arise for that position even when
the cache is not full.
For example, instructions of a program may start in block 1 and
continue in block 129, possibly after a branch. As this program is
executed, both of these blocks must be transferred to the block-1
position in the cache. Contention is resolved by allowing the new
block to overwrite the currently resident block.
The memory address is divided into three fields: Tag, Block, and
Word.
Word: 4 bits are enough to distinguish 16 words. The low-order 4
bits select one of the 16 words in a block.
Block: There are 128 blocks in the cache, so 7 bits are enough to
represent them. When a new block enters the cache, the 7-bit cache
block field determines the cache position in which this block must
be stored.
Tag: The remaining 16 − 7 − 4 = 5 bits. These high-order 5 bits of
the memory address of the block are stored in the 5 tag bits associated
with its location in the cache. The tag bits identify which of the 32
main memory blocks mapped into this cache position is currently
resident in the cache.
Problem: Find the cache block and word (in decimal) for the 16-bit
main memory address 0101010101010101. Consider a cache consisting
of 128 blocks of 16 words each, for a total of 2048 (2K) words.
The main memory has 64K words, which is 4K blocks of 16 words each.
Tag = 01010, Cache block = 1010101, Word = 0101
That is, word 5 of cache block 85.
The 16-bit main memory address 0101010101010101 represents word
W21845, which lies in main memory block 1365 (= 21845/16); block
1365 maps to cache block 1365 mod 128 = 85.
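The field extraction can be verified with a few lines of Python (a sketch that simply applies the 5 + 7 + 4 bit split):

def decode(addr):
    word  = addr & 0xF           # low-order 4 bits
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block number
    tag   = addr >> 11           # high-order 5 bits
    return tag, block, word

addr = 0b0101010101010101       # = 21845, i.e. word W21845
print(decode(addr))             # (10, 85, 5): tag 01010, block 85, word 5
print(addr // 16)               # 1365: the main memory block number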
Searching a word using direct mapping
As execution proceeds, the 7-bit cache block field of each
address generated by the processor points to a particular block
location in the cache. The high-order 5 bits of the address are
compared with the tag bits associated with that cache location.
If they match, then the desired word is in that block of the
cache. If there is no match, then the block containing the
required word must first be read from the main memory and
loaded into the cache.
Advantage: The hardware design is simple, so it is easy to
implement. Only one comparator is needed, since a given address
can map to only one cache block whose tag must be compared.
Disadvantage: If two blocks contend for the same cache position,
one of them has to be removed even though space may be available
elsewhere in the cache. So it is not very flexible.
Practice problems based on Direct mapping
1. Consider a direct mapped cache of size 16 KB with block size 256
bytes. The size of main memory is 128 KB. Find i) the number of bits
in the tag, ii) the tag directory size (total tag bits of the cache).
Given: Cache memory size = 16 KB, Block size = 256 bytes, Main
memory size = 128 KB
i) Main memory = 128 KB = 2^17 bytes, so a main memory address
requires 17 bits.
Tag = 3 bits | Cache block = 6 bits | Word (bytes) = 8 bits → 17 bits
a) A block contains 256 bytes, so the word field is 8 bits.
b) Number of cache blocks = cache size/block size = 16 KB/256 B =
2^14/2^8 = 64 blocks, so 6 bits represent the cache block.
c) Tag bits = 17 − 6 − 8 = 3 bits
ii) Tag directory size = number of cache blocks × tag bits per block =
64 × 3 = 192 bits = 24 B
2. Consider a direct mapped cache of size 512 KB with block size 1 KB.
There are 7 bits in the tag. Find i) the size of main memory, ii) the
tag directory size.
Given: Cache memory size = 512 KB, Block size = 1 KB, Number of bits
in tag = 7
a) A block contains 1024 bytes, so the word field is 10 bits.
b) Number of cache blocks = cache size/block size = 512 KB/1024 B =
2^19/2^10 = 2^9 = 512 blocks, so 9 bits represent the cache block.
Tag = 7 bits | Cache block = 9 bits | Word (bytes) = 10 bits
c) Number of bits in the main memory (physical) address = 7 + 9 + 10 =
26 bits
d) Size of main memory = 2^26 bytes = 2^6 × 2^20 bytes = 64 MB
e) Tag directory size = number of cache blocks × tag bits per block =
512 × 7 = 3584 bits = 448 bytes
3. Consider a direct mapped cache with block size 4 KB. The size of
main memory is 16 GB and there are 10 bits in the tag. Find i) the
size of cache memory, ii) the tag directory size.
Given: Block size = 4 KB, Size of main memory = 16 GB, Number of bits
in tag = 10
a) Size of main memory = 16 GB = 2^34 bytes, so the physical address
has 34 bits.
Tag = 10 bits | Cache block = 12 bits | Word (bytes) = 12 bits → 34 bits
b) A block contains 4 KB = 4 × 1024 = 4096 bytes, so the word field
is 12 bits.
c) Cache block bits = physical address bits − (tag bits + word bits) =
34 − (10 + 12) = 12 bits
d) Cache size = number of cache blocks × block size = 2^12 × 4 KB =
2^12 × 2^2 × 2^10 B = 2^24 B = 16 MB
e) Tag directory size = number of cache blocks × tag bits per block =
2^12 × 10 = 40960 bits = 5120 bytes
4. Consider a direct mapped cache of size 32 KB with block
size 32 bytes. The CPU generates 32-bit addresses. The
number of bits needed for the cache block field and the number
of tag bits are, respectively:
A. 10, 17 B. 10, 22 C. 15, 17 D. 5, 17
5. Consider a machine with a byte-addressable main memory
of 2^32 bytes divided into blocks of size 32 bytes. Assume that a
direct mapped cache having 512 cache blocks (or lines) is used
with this machine. The size of the tag field in bits is ______.
A. 18 B. 22 C. 17 D. 5
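A small Python helper (names are ours) reproduces the direct-mapped field calculations above and can be used to check problems 4 and 5:

from math import log2

def direct_mapped(cache_bytes, block_bytes, addr_bits):
    word_bits  = int(log2(block_bytes))
    blocks     = cache_bytes // block_bytes
    block_bits = int(log2(blocks))
    tag_bits   = addr_bits - block_bits - word_bits
    tag_dir    = blocks * tag_bits        # total tag storage, in bits
    return tag_bits, block_bits, word_bits, tag_dir

# Problem 1: 16 KB cache, 256 B blocks, 17-bit addresses:
print(direct_mapped(16 * 1024, 256, 17))   # (3, 6, 8, 192)
# Problem 4: 32 KB cache, 32 B blocks, 32-bit addresses:
print(direct_mapped(32 * 1024, 32, 32))    # (17, 10, 5, 17408)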
Associative Mapping
• A main memory block can be placed into any cache block
position, so this scheme is more flexible. The memory address is
divided into two fields: Tag and Word.
• The tag field is enlarged to include the block bits, so a larger tag
is stored in the cache with each block
• The tag bits of an address received from the processor are
compared to the tag bits of each block of the cache to see if the
desired block is present. This is called the associative-mapping
technique
• The complexity of an associative cache is higher than that of a
direct-mapped cache.
• To determine a hit or miss, all 128 tag patterns must be searched.
To avoid a long delay, the tags are searched in parallel; this is
called an associative search
• Flexible mapping also requires an appropriate replacement algorithm
Searching a word using associative mapping
Advantage: Any main memory block can be placed in any
cache block if space is available in the cache. So it is very
flexible and reduces cache misses compared to direct mapping.
Disadvantage: The hardware design is complex. It requires as
many comparators as there are cache blocks.
Practice problems based on associative mapping
1. Consider a fully associative mapped cache with block size
4 KB. The size of main memory is 16 GB. Find the number
of bits in the tag.
Given: Block size = 4 KB, Size of main memory = 16 GB
a) Size of the main memory = 16 GB = 2^34 B, so the main memory
address has 34 bits.
b) A block contains 4 KB = 2^12 B, so the word field is 12 bits.
c) Number of bits in the Tag field = 34 − 12 = 22 bits
Tag = 22 bits | Word = 12 bits
2. Consider a fully associative mapped cache of size 16 KB with block
size 256 bytes. The size of main memory is 128 KB. Find i) the number
of bits in the tag, ii) the tag directory size.
Given: Cache memory size = 16 KB, Block size = 256 bytes,
Main memory size = 128 KB
a) Size of the main memory = 128 KB = 2^17 B, so the main memory
address has 17 bits.
b) A block contains 256 bytes, so the word field is 8 bits.
c) Number of bits in the Tag field = 17 − 8 = 9 bits
Tag = 9 bits | Word = 8 bits
d) Tag directory size = number of cache blocks × tag bits per block
Number of cache blocks = cache size/block size = 16 KB/256 B = 2^6 =
64 blocks
Tag directory = 64 × 9 bits = 576 bits = 72 bytes
Set-Associative Mapping
• Combination of direct & associative mapping
• The blocks of the cache are grouped into sets, and the mapping
allows a block of the main memory to reside in any block of a
specific set. Hence, the contention problem of the direct method is
eased by having a few choices for block placement. At the same
time, the hardware cost is reduced by decreasing the size of the
associative search.
• Group blocks of cache into sets. [Example]: A cache with two
blocks per set. In this case, memory blocks 0, 64, 128, . . . , 4032
map into cache set 0, and they can occupy either of the two block
positions within this set.
• Having 64 sets means that the 6-bit set field of the address
determines which set of the cache might contain the desired block.
• The tag field of the address must then be associatively compared to
the tags of the two blocks of the set to check if the desired block is
present. This two-way associative search is simple to implement.
Searching a word using Set associative mapping
Stale Data [not fresh data]
• When power is first turned on, the cache contains no valid data. A
control bit, usually called the valid bit, must be provided for each
cache block to indicate whether the data in that block are valid.
• The valid bits of all cache blocks are set to 0 when power is
initially applied to the system. Some valid bits may also be set to 0
when new programs or data are loaded from the disk into the main
memory. Data transferred from the disk to the main memory using
the DMA mechanism are usually loaded directly into the main
memory, bypassing the cache.
• If the memory blocks being updated are currently in the cache, the
valid bits of the corresponding cache blocks are set to 0.
• As program execution proceeds, the valid bit of a given cache
block is set to 1 when a memory block is loaded into that location.
• The processor fetches data from a cache block only if its valid bit is
equal to 1. The use of the valid bit ensures that the processor will
not fetch stale data from the cache.
A similar precaution is needed in a system that uses the write-back
protocol. Under this protocol, new data written into the cache are
not written to the memory at the same time. Hence, data in the
memory do not always reflect the changes that may have been
made in the cached copy. It is important to ensure that such stale
data in the memory are not transferred to the disk.
The solution is to flush the cache by forcing all dirty blocks to be
written back to the memory before performing the transfer.
The operating system issues a flush command to the cache before
initiating the DMA operation that transfers the data to the disk.
Flushing the cache does not affect performance greatly.
Practice problems based on Set-associative mapping
1. Consider a 2-way set associative mapped cache of size 16 KB
with block size 256 bytes. The size of main memory is 128 KB.
Find i) the number of bits in the tag, ii) the tag directory size.
Given: Set size = 2, Cache memory size = 16 KB, Block size = 256 bytes,
Main memory size = 128 KB
a) Size of the main memory = 128 KB = 2^17 B, so the main memory
address has 17 bits.
b) A block contains 256 bytes, so the word field is 8 bits.
c) Total number of blocks in cache = cache size/block size =
16 KB/256 B = 2^14/2^8 = 64 blocks
d) Total number of sets in cache = number of blocks/set size =
64/2 = 32 sets, so 5 bits represent the set.
e) Number of bits in the Tag field = 17 − 5 − 8 = 4 bits
Tag = 4 bits | Set = 5 bits | Word (bytes) = 8 bits
f) Tag directory = 64 × 4 bits = 256 bits = 32 bytes
2. Consider an 8-way set associative mapped cache of size 512 KB
with block size 1 KB. There are 7 bits in the tag. Find i) the size
of main memory, ii) the tag directory size.
Tag = 7 bits | Set = 6 bits | Word (bytes) = 10 bits
i) Size of main memory = 2^23 bytes = 8 MB
ii) Tag directory size = 512 × 7 bits = 3584 bits = 448 bytes

3. Consider a 4-way set associative mapped cache with block size 4
KB. The size of main memory is 16 GB and there are 10 bits in the
tag. Find i) the size of cache memory, ii) the tag directory size.
Tag = 10 bits | Set = 12 bits | Word (bytes) = 12 bits
i) Size of cache memory = 2^14 blocks × 4 KB = 64 MB
ii) Tag directory size = 2^14 × 10 bits = 20480 bytes = 20 KB
Solved Problem
A computer system uses 32-bit memory addresses and has a main
memory of 1G bytes. It has a 4K-byte cache organized in
the block-set-associative manner, with 4 blocks per set and 64 bytes
per block. Calculate the number of bits in each of the Tag, Set, and
Word fields of the memory address.
Given: Main memory = 1 GB, Cache = 4 KB, 4 blocks per set, 64
bytes per block
A block has 64 bytes; hence the Word field is 6 bits long.
A set contains 4 blocks × 64 bytes = 256 bytes.
Number of sets = 4K bytes/256 bytes = 16 sets, requiring a Set
field of 4 bits.
So the Tag field of the memory address has 32 − 4 − 6 = 22 bits.
Tag = 22 bits | Set = 4 bits | Word (bytes) = 6 bits
Ex2: A block-set-associative cache consists of a total of 64
blocks, divided into 4-block sets. The main memory contains
4096 blocks, each consisting of 32 words. Assuming a 32-bit
byte-addressable address space, how many bits are there in
each of the Tag, Set, and Word fields?
Given: Main memory = 4096 blocks, Cache = 64 blocks, divided
into 4-block sets
A block has 32 words and each word has 4 bytes, so each
block contains 32 × 4 = 128 bytes; hence the Word field is 7 bits.
Number of sets = 64/4 = 16 sets, requiring a Set field of 4
bits.
So the Tag field of the memory address has 32 − 4 − 7 = 21 bits.
Tag = 21 bits | Set = 4 bits | Word (bytes) = 7 bits
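The same field arithmetic for set-associative caches fits in a short Python sketch (illustrative), which reproduces both solved problems:

from math import log2

def set_assoc(cache_bytes, block_bytes, ways, addr_bits):
    word_bits = int(log2(block_bytes))
    sets      = cache_bytes // (block_bytes * ways)
    set_bits  = int(log2(sets))
    tag_bits  = addr_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits

# 4 KB cache, 4 blocks/set, 64 B blocks, 32-bit addresses:
print(set_assoc(4 * 1024, 64, 4, 32))    # (22, 4, 6)
# Ex2: 64 blocks of 128 B in 4-block sets, 32-bit addresses:
print(set_assoc(64 * 128, 128, 4, 32))   # (21, 4, 7)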
G1: A processor can support a maximum memory of 4 GB,
where the memory is word-addressable (a word consists of
two bytes). The size of the address bus of the processor is at
least ____ bits. (GATE 2016)
Maximum memory = 4 GB = 2^32 bytes. Size of a word = 2
bytes. Therefore, number of words = 2^32/2 = 2^31, so the
address bus of the processor requires at least 31 bits.
G2. An 8 KB direct-mapped write-back cache is organized as
multiple blocks, each of size 32 bytes. The processor generates 32-bit
addresses. The cache controller maintains the tag information
for each cache block, comprising: 1 valid bit, 1 modified bit, and
as many bits as the minimum needed to identify the memory block
mapped into the cache. What is the total size of memory needed at
the cache controller to store the meta-data (tags) for the cache?
(GATE CS 2011)
Cache size = 8 KB, Block size = 32 bytes
Number of cache blocks = cache size/block size
= (8 × 1024 bytes)/32 bytes = 256
Tag = 19 bits | Cache block = 8 bits | Word (bytes) = 5 bits
Total bits required to store the meta-data for 1 block = valid bit +
modified bit + tag bits = 1 + 1 + 19 = 21 bits
Total memory required to store the meta-data for 256 blocks =
21 × 256 = 5376 bits
G3. The width of the physical address on a machine is 40 bits. The
width of the tag field in a 512 KB, 8-way set associative cache is
____________ bits. (GATE CS 2016, Set 2)
Cache size = number of sets × blocks per set × block size
Let the number of sets = 2^x and the block size = 2^y bytes.
Applying the formula:
2^19 = 2^x × 2^3 × 2^y ⇒ 2^16 = 2^(x+y), so x + y = 16
Addressing the set number and the byte within a block together
needs 16 bits, so the remaining bits form the tag:
40 − 16 = 24 bits
Tag = 24 bits | Set + Word = 16 bits
G4. A 4-way set-associative cache memory unit with a capacity of
16 KB is built using a block size of 8 words. The word length is 32
bits. The size of the physical address space is 4 GB. The number of
bits for the TAG field is _____. (GATE CS 2014, Set 2)
The main memory size is 4 GB = 2^2 × 2^30 = 2^32 bytes; identifying
2^32 bytes requires addresses of 32 bits.
Tag = 20 bits | Set = 7 bits | Word (bytes) = 5 bits
Now split the 32 address bits:
1. Block size in bytes = 8 words × 4 bytes = 32 bytes, requiring 5 bits
2. Number of blocks = cache size/block size = 16 KB/32 B =
2^14/2^5 = 2^9 = 512 blocks
3. Number of sets = number of blocks/4 ways = 512/4 = 128 sets
4. Bits required for 128 sets = 7 bits
5. Tag bits = 32 − 7 − 5 = 20 bits
Performance Consideration
Two key factors in the commercial success of a computer are
performance and cost. A common measure is the
price/performance ratio. Performance depends on how fast
machine instructions can be brought into the processor and
how fast they can be executed.
The main purpose of this hierarchy is to create a memory
that the processor sees as having a short access time and a
large capacity. When a cache is used, the processor is able to
access instructions and data more quickly when the data from
the referenced memory locations are in the cache. Therefore,
the extent to which caches improve performance is dependent
on how frequently the requested instructions and data are
found in the cache.
Hit Rate and Miss Penalty
• A successful access to data in the cache is called a hit.
• Hit rate: the number of hits divided by the number of attempted
accesses.
• Miss rate: the number of misses divided by the number of attempted
accesses.
• Miss penalty: performance is affected by the actions taken when a
miss occurs. A penalty is incurred because of the extra time
needed to bring a block of data from a slower unit in the memory
hierarchy to a faster unit; during that period, the processor is
stalled waiting for instructions or data. The total access time seen
by the processor when a miss occurs is called the miss penalty.
• Let h be the hit rate, M the miss penalty, and C the time to access
information in the cache. Then the average access time
experienced by the processor is tavg = hC + (1 − h)M
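For instance, with assumed values h = 0.95, C = 1 cycle and M = 17 cycles (numbers chosen only for illustration), a quick Python check gives:

def t_avg(h, C, M):
    # h = hit rate, C = cache access time, M = miss penalty (same units)
    return h * C + (1 - h) * M

print(t_avg(0.95, 1, 17))   # 1.8 cycles on average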
Virtual Memory
• In most modern computer systems, the physical main memory is
not as large as the address space of the processor.
• For example, a processor that issues 32-bit addresses has an
addressable space of 4G bytes. But, the size of the main memory
in a typical computer with a 32-bit processor may range from 1G
to 4G bytes.
• If a program does not completely fit into the main memory, the
parts of it not currently being executed are stored on a secondary
storage device, typically a magnetic disk.
• As these parts are needed for execution, they must first be
brought into the main memory, possibly replacing other parts
that are already in the memory. These actions are performed
automatically by the operating system, using a scheme known as
virtual memory.Needed portions are automatically loaded into
the memory, replacing other portions. Programmers need not be
aware of actions.
• Programs written assuming full address space
• Processor issues virtual or logical address, it must be translated
into physical address
• Proceed with normal memory operation, when addressed
contents are in the memory
• When no current physical address exists, perform actions to
place contents in memory
Memory Management Unit
• Implementation of virtual memory relies on a memory
management unit (MMU)
• Maintains virtualphysical address mapping to perform the
necessary translation. It is available in CPU itself.
• When no current physical address exists, MMU invokes
operating system services
• Causes transfer of desired contents from disk to the main
memory using DMA scheme
• MMU mapping information is also updated
Figure: Matches between logical and physical addresses. Logical
memory pages 0 to n map through the page table to physical memory
frames; in the example, valid entries map page 0 to frame 4, page 2
to frame 6, and page 5 to frame 9, while the remaining pages reside
on the hard disk.
Address Translation
• Use fixed-length unit of pages (2K-16K bytes)
• Larger size than cache blocks due to slow disks
• For translation, divide address bits into 2 fields
• Lower bits give offset of word within page
• Upper bits give virtual page number (VPN)
• Translation preserves offset bits, but causes VPN bits to be
replaced with page frame bits
• Page table (stored in the main memory) provides information
to perform translation
Virtual-memory address translation
Page Table
• MMU must know location of page table
• Page table base register has starting address
• Adding VPN to base register contents gives location of
corresponding entry about page
• If page is in memory, table gives frame bits
• Otherwise, table may indicate disk location
• Control bits for each entry include a valid bit and modified
bit indicating needed copy-back
• Also have bits for page read/write permissions
Translation Lookaside Buffer
• MMU must perform lookup in page table for translation of
every virtual address
• For large physical memory, MMU cannot hold entire page
table with all of its information
• A translation lookaside buffer (TLB) in the MMU holds
recently accessed page-table entries; it is implemented as a
small, separate cache-like unit
• Associative searches are performed on the TLB with virtual
addresses to find matching entries
• If miss in TLB, access full table and update TLB
Use of an associative-mapped TLB
Page Faults
• When a program generates an access request to a page that is not
in the main memory, a page fault occurs. The entire page
must be brought from the disk into the memory before the access can
proceed.
• When it detects a page fault, the MMU asks the operating
system to intervene by raising an exception (interrupt).
Processing of the program that generated the page fault is
interrupted, and control is transferred to the operating system.
• The operating system copies the requested page from the disk
into the main memory
• If a new page is brought from the disk when the main memory is
full, it must replace one of the resident pages.
• A modified page has to be written back to the disk before it is
removed from the main memory.
Practice problems based on Virtual memory
1. An address space is specified by 24 bits and the
corresponding memory space by 16 bits. How many words are
there in the virtual and main memory?
Number of words in virtual memory = 2^24 words = 16M words
Number of words in main memory = 2^16 words = 64K words

2. Consider a system with byte-addressable memory, 32-bit
logical addresses, 4-kilobyte page size and page table entries
of 4 bytes each. What is the size of the page table?
Given: logical addresses = 32 bits, page size = 4 KB,
page table entry = 4 bytes
Page table size = number of entries in the page table × page table
entry size
Number of entries in the page table = number of pages
Number of pages = size of the virtual memory/page size
= 2^32 bytes/4 KB = 2^32/2^12 = 2^20
So, number of entries in the page table = 2^20
Size of the page table = number of entries in the page table × page
table entry size = 2^20 × 4 bytes = 4 MB

3. Consider a machine with 64 MB physical memory and a 32-bit
virtual address space. If the page size is 4 KB, what is the
size of the page table? Also find the frame bits and frame offset.
Given: physical memory = 64 MB, page (frame) size = 4 KB,
virtual address = 32 bits
Page table size = number of entries in the page table × page table
entry size
Number of entries in the page table = number of pages
Number of pages = size of the virtual memory/page size
= 2^32 bytes/4 KB = 2^32/2^12 = 2^20
Number of frames = size of the physical memory/frame size
= 64 MB/4 KB = 2^26/2^12 = 2^14
Number of bits for the frame number = 14 bits
Frame offset = log2(4 KB) = 12 bits
Frame number = 14 bits | Frame offset = 12 bits → physical
address = 26 bits
Page table entry size = control bit + frame-number bits =
1 + 14 = 15 bits, rounded up to 2 bytes
Size of the page table = 2^20 × 2 B = 2 MB
4. For each configuration (a-c), state how many bits are needed for
each of the following: i) virtual address, ii) physical address,
iii) virtual page number, iv) physical page number, v) offset.
a. 32-bit operating system, 4-KB pages, 1 GB of RAM
b. 32-bit operating system, 16-KB pages, 2 GB of RAM
c. 64-bit operating system, 16-KB pages, 16 GB of RAM
Virtual address = OS address length in bits
Physical address = log2(RAM size) bits
Offset = log2(page size) bits
Virtual address = virtual page number bits + offset, so
Virtual page number bits = virtual address − offset
Similarly,
Physical page number bits = physical address − offset
These formulas are applied to all three configurations in the sketch below.
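A minimal Python sketch (ours) applies these formulas to configurations a-c:

from math import log2

def vm_bits(os_bits, page_bytes, ram_bytes):
    offset = int(log2(page_bytes))
    phys   = int(log2(ram_bytes))
    return {"virtual": os_bits, "physical": phys, "offset": offset,
            "vpn": os_bits - offset, "ppn": phys - offset}

print(vm_bits(32, 4 * 2**10, 1 * 2**30))    # a: vpn 20, ppn 18, offset 12
print(vm_bits(32, 16 * 2**10, 2 * 2**30))   # b: vpn 18, ppn 17, offset 14
print(vm_bits(64, 16 * 2**10, 16 * 2**30))  # c: vpn 50, ppn 20, offset 14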
Magnetic Hard Disks
• Computers often use magnetic hard disks for large secondary
storage devices
• One or more platters on a common spindle
• Platters are covered with thin magnetic film
• Platters rotate on spindle at constant rate
• Read/write heads in close proximity to surface can access data
arranged in concentric tracks
• Magnetic yoke and magnetizing coil can change or sense
polarity of areas on surface. It uses a magnetization process.
• The disk is made up of platters.
• Each platter consists of concentric circles called tracks.
• These tracks are further divided into sectors, which are the
smallest divisions in the disk.
• A cylinder is formed by combining the tracks at a given radius
of a disk pack.
There is a mechanical arm carrying the read/write head. It is
used to read from and write to the disk.
The head has to reach a particular track and then wait for the
rotation of the platter to bring the required sector of the
track under the head.
Each platter has two surfaces, top and bottom, and both surfaces
are used to store data. Each surface has its own read/write
head.
Phase encoding, or Manchester encoding, is used to magnetize
the surface for each data bit.
The disk system consists of three key parts.
One part is the assembly of disk platters, which is usually
referred to as the disk.
The second part comprises the electromechanical mechanism
that spins the disk and moves the read/write heads; it is called the
disk drive.
The third part is the disk controller, which is the electronic
circuitry that controls the operation of the system. The disk
controller may be implemented as a separate module, or it may be
incorporated into the enclosure that contains the entire disk system.
Access Time
There are two components of the time delay between the disk
receiving an address and the beginning of the actual data
transfer.
The first, called the seek time, is the time required to move the
read/write head to the proper track.
The second component is the rotational delay, also called
latency time, which is the time taken for the addressed sector to
reach the read/write head after it is positioned over the correct
track. On average, this is the time for half a rotation of the
disk.
The sum of these two delays is called the disk access time.
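For illustration, assuming a 6 ms average seek time and a 7200-RPM drive (numbers not from the text), the access time can be computed as:

def access_time_ms(seek_ms, rpm):
    half_rotation_ms = 0.5 * 60_000 / rpm    # 60,000 ms per minute
    return seek_ms + half_rotation_ms        # seek + average rotational delay

print(access_time_ms(6, 7200))   # about 10.17 ms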
