CO Unit 4
Outline
• Basic memory circuits & memory organization
• Memory technology
• Direct memory access
• Memory hierarchy concepts
• Cache memory and virtual memory
• Magnetic and optical disks
Basic Concepts
• Access to memory is provided by the processor-memory interface
• The processor uses the address lines to specify the memory location involved in a data transfer operation and uses the data lines to transfer the data; the control lines carry the command indicating a Read or a Write operation
• Speed of the memory unit is measured by
1. Memory access time
2. Memory cycle time
• Memory access time is the time from initiation to completion of a word or byte transfer
• Memory cycle time is the minimum time delay between the initiation of two successive transfers [e.g., the time between two successive Read operations]
• The cycle time is usually slightly longer than the access time
• Random-access memory (RAM) means that the access time is the same, independent of the location accessed
Cache and Virtual Memory
• The main memory is slower than the processor
• Cache memory is smaller and faster memory that is used to
reduce effective access time
• Holds subset of program instructions and data
• The information for one or more active programs may exceed the physical capacity of the main memory
• Virtual memory provides larger apparent size by transparently
using secondary storage. Sections of the program are transferred
back and forth between the main memory and the secondary
storage device in a manner that is transparent to the application
program
• Both approaches need efficient block transfers. These transfers do not occur one word at a time; data are always transferred in contiguous blocks involving tens, hundreds, or thousands of words.
Semiconductor RAM Memories
• Semiconductor Random Access Memories (RAMs) are available
in a wide range of speeds. Their cycle times range from 100 ns
to less than 10 ns.
• Memory chips have a common organization
• Cells holding single bits arranged in an array
• Words are rows; cells connected to word lines
• Cells in columns connect to bit lines
• Sense/Write circuits are interfaces between internal bit lines and
data I/O pins of chip
• Typical control pin connections include Read/Write command
and chip select (CS)
Internal Organization and Operation
• An example 16-word × 8-bit memory chip has a decoder that selects a word line from a 4-bit address
• Two complementary bit lines for each data bit
• An external source provides stable address bits and asserts the chip-select input along with the Read/Write command
• For Read operation, Sense/Write circuits transfer data from
selected row to I/O pins
• For Write operation, Sense/Write circuits transfer data from
I/O pins to selected cells
• A 16 × 8 chip has only 128 storage cells, yet it needs 8 data I/O pins
• Address inputs (4), data I/O lines (8), R/W and CS (2), and power/ground (2): in total it needs 4 + 8 + 2 + 2 = 16 pins
More on Chip Organization
• Larger chips have a similar organization but need more pins [costly]
• For example, a 1K chip with 1024 cells could be organized as 128 × 8 (7 + 8 + 2 + 2 = 19 pins)
How to reduce the pin count?
• Alternatives:
• 1. The cells can be organized in a 1K × 1 format. A 10-bit address is then needed, but there is only one data line; with R/W and CS (2) and power/ground (2), this results in 10 + 1 + 2 + 2 = 15 pins
• 2. Use a 32 × 32 array and divide the address bits into 5 upper bits for the row and 5 lower bits for the column, with one data line, R/W and CS (2), and power/ground (2), again resulting in 15 pins (see the pin-count sketch after this list)
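A rough way to compare these organizations is to count external pins the same way as above: address lines + data lines + 2 (R/W, CS) + 2 (power, ground). The short Python sketch below is only illustrative; the function name pin_count is made up, and the model ignores packaging details such as multiplexed row/column addressing.

import math

def pin_count(num_words, bits_per_word):
    # Pins = address lines + data lines + 2 (R/W, CS) + 2 (power, ground)
    address_pins = math.ceil(math.log2(num_words))
    return address_pins + bits_per_word + 2 + 2

print(pin_count(16, 8))     # 16 x 8  -> 4 + 8 + 2 + 2 = 16 pins
print(pin_count(128, 8))    # 128 x 8 -> 7 + 8 + 2 + 2 = 19 pins
print(pin_count(1024, 1))   # 1K x 1  -> 10 + 1 + 2 + 2 = 15 pins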
Organization of a 1K × 1 memory chip using 5 bit row address and 5 bit column
address
Static RAMs and CMOS Cell
• SRAM: data, once written, are retained as long as power is ON. SRAMs usually have short access times (a few nanoseconds)
• A static RAM cell in a chip consists of two cross-connected inverters that form a latch [storing 1 bit of data]
• Chip implementations typically use a CMOS cell, whose advantage is low power consumption
• Two transistors controlled by word line act as switches
between the cell and the bit lines
• To write, bit lines driven with desired data
MOS transistors as switches (gate signal X):
• NMOS: when X = 1, the transistor is ON and there is a conducting path from S to D
• PMOS: when X = 0, the transistor is ON and there is a conducting path from S to D
1-bit SRAM cell built from inverters (NOT gates)
Transistors T1 and T2 can be turned ON and OFF under the control of the word line. To retain the state of the latch, the word line is grounded, which turns the transistors OFF.
Read operation:
1. The word line is activated (= 1) to turn T1 and T2 ON.
2. With both T1 and T2 ON, the values at X and Y (i.e., the value stored in the latch) are available on the bit lines b and b’.
3. The Sense/Write circuit connected to the bit lines monitors the states of b and b’.
Write Operation
To write 1: bit line b is set to 1 and bit line b’ is set to 0. Then the word line is activated and the data are written into the latch.
To write 0: bit line b is set to 0 and bit line b’ is set to 1. Then the word line is activated and the data are written into the latch.
[Ex] The first word of data is transferred after five clock cycles, so the latency is five clock cycles. If the clock rate is 500 MHz, the latency = 5/(500×10^6) = 10 ns. The remaining three words are transferred in consecutive clock cycles, at the rate of one word every 2 ns (1/(500×10^6)). The time between successive words of a block is therefore much shorter than the time needed to transfer the first word (2 ns < 10 ns). Total time = 5 + 4 = 9 clock cycles, or 10 + (4 × 2 ns) = 18 ns.
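This calculation can be captured in a small helper. The sketch below is only illustrative; burst_transfer_time_ns is a made-up name, and it assumes the simple model used here: a fixed latency in clock cycles followed by one word per cycle.

def burst_transfer_time_ns(clock_hz, latency_cycles, burst_words):
    # Total time for one burst = (latency cycles + one cycle per word) x cycle time
    cycle_ns = 1e9 / clock_hz
    return (latency_cycles + burst_words) * cycle_ns

# 500-MHz clock, 5-cycle latency, burst of 4 words
print(burst_transfer_time_ns(500e6, 5, 4))   # 18.0 ns (10 ns latency + 4 x 2 ns)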
Bandwidth
It is the number of bits or bytes that can be transferred in one
second. It depends on the speed of access to the stored data and on
the number of bits that can be accessed in parallel.
EX1: Consider a main memory built with SDRAM chips. Data are
transferred in burst length of 8. Assume that 32 bits of data are
transferred in parallel. If a 400-MHz clock is used, how much time
does it take to transfer: (a) 32 bytes of data (b) 64 bytes of data.
What is the latency in each case?
Each column address strobe causes 8 × 4 = 32 bytes to be transferred (a burst of 8 words, 4 bytes per word).
(a) The first word of data is transferred after five clock cycles, so the latency is five clock cycles. If the clock rate is 400 MHz, the latency = 5/(400×10^6) = 12.5 ns. The remaining seven words are transferred at the rate of one word every 2.5 ns (1/(400×10^6)). So the total time = 5 + 8 = 13 clock cycles, or 12.5 + (8 × 2.5 ns) = 32.5 ns.
(b) A second column address strobe is needed to transfer the second burst of 32 bytes, and 2 additional clock cycles elapse before it is issued. Therefore:
Latency = 5 clock cycles, or 12.5 ns
Total time = 5 + 8 + 2 + 8 = 23 clock cycles, or 57.5 ns
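Reusing the same hypothetical helper from the earlier example, the two cases of EX1 work out as follows (the 2 extra cycles in case (b) account for the second column address strobe, as assumed above):

cycle_ns = 1e9 / 400e6                # 2.5 ns per cycle at 400 MHz

# (a) one burst of 8 words (32 bytes): 5 + 8 = 13 cycles
print((5 + 8) * cycle_ns)             # 32.5 ns

# (b) two bursts of 8 words (64 bytes), 2 extra cycles before the second strobe:
#     5 + 8 + 2 + 8 = 23 cycles
print((5 + 8 + 2 + 8) * cycle_ns)     # 57.5 ns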
DMA Controller
• Although a DMA controller transfers data without intervention by the processor, its operation must be under the control of a program executed by the processor.
• To initiate the transfer of a block of words, the processor sends
to the DMA controller the starting address, the number of words
in the block, and the direction of the transfer. The DMA
controller then proceeds to perform the requested operation.
• When the entire block has been transferred, it informs the
processor by raising an interrupt.
• DMA controller examples: disk and Ethernet
Typical registers in a DMA controller
• Two registers are used for storing the starting address and the word count.
• The third register contains status and control flags. The R/W bit determines the direction of the transfer. When this bit is set to 1 by a program instruction, the controller performs a Read operation, that is, it transfers data from the memory to the I/O device. Otherwise, it performs a Write operation.
• When the controller has completed transferring a block of data and is ready to receive another command, it sets the Done flag to 1. Bit 30 is the Interrupt-enable flag, IE. When this flag is set to 1, it causes the controller to raise an interrupt after it has completed transferring a block of data. Finally, the controller sets the IRQ bit to 1 when it has requested an interrupt.
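A software view of these three registers can be sketched with simple bit tests on the status/control word. Only the IE position (bit 30) is given above; the bit positions assumed here for R/W, Done, and IRQ, and the class itself, are illustrative only.

# Sketch of a DMA controller's register set (bit positions other than IE = 30 are assumptions)
RW_BIT   = 1 << 0    # 1 = Read (memory -> I/O device), 0 = Write (assumed position)
DONE_BIT = 1 << 1    # set by the controller when a block transfer completes (assumed position)
IE_BIT   = 1 << 30   # Interrupt-enable flag (bit 30, as stated above)
IRQ_BIT  = 1 << 31   # set by the controller when it has requested an interrupt (assumed position)

class DMAController:
    def __init__(self):
        self.starting_address = 0   # first register: starting address of the block
        self.word_count = 0         # second register: number of words in the block
        self.status_control = 0     # third register: status and control flags

    def start_transfer(self, address, count, read, enable_interrupt):
        # The processor programs these registers; the hardware then performs the transfer.
        self.starting_address = address
        self.word_count = count
        self.status_control = (RW_BIT if read else 0) | (IE_BIT if enable_interrupt else 0)

    def transfer_complete(self):
        return bool(self.status_control & DONE_BIT)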
Use of DMA controllers in a computer system
One DMA controller connects a high-speed Ethernet interface to the computer’s I/O bus. The disk controller, which controls two disks, also has DMA capability and provides two DMA channels; it can perform two independent DMA operations, as if each disk had its own DMA controller.
Address format: Tag (9 bits) | Word (8 bits)
d). Tag directory size = number of cache blocks × tag bits per block
Number of cache blocks = cache size / block size = 16 KB / 256 B = 2^6 = 64 blocks
Then tag directory = 64 × 9 bits = 576 bits = 72 bytes
Set-Associative Mapping
• Combination of direct & associative mapping
• The blocks of the cache are grouped into sets, and the mapping
allows a block of the main memory to reside in any block of a
specific set. Hence, the contention problem of the direct method is
eased by having a few choices for block placement. At the same
time, the hardware cost is reduced by decreasing the size of the
associative search.
• Group blocks of cache into sets. [Example]: A cache with two
blocks per set. In this case, memory blocks 0, 64, 128, . . . , 4032
map into cache set 0, and they can occupy either of the two block
positions within this set.
• Having 64 sets means that the 6-bit set field of the address
determines which set of the cache might contain the desired block.
• The tag field of the address must then be associatively compared to
the tags of the two blocks of the set to check if the desired block is
present. This two-way associative search is simple to implement.
Searching a word using Set associative mapping
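The two-way mapping described above can be checked with a few lines of arithmetic: with 64 sets, the set index is the memory block number modulo 64, and the remaining bits form the tag. This is only a sketch of the mapping, not the cache hardware.

NUM_SETS = 64                      # 128 cache blocks / 2 blocks per set

def map_block(memory_block):
    # Return (set index, tag) for a memory block under 2-way set-associative mapping
    return memory_block % NUM_SETS, memory_block // NUM_SETS

# Memory blocks 0, 64, 128, ..., 4032 all map into set 0, with different tags
for block in (0, 64, 128, 4032):
    print(block, map_block(block))   # set index 0; tags 0, 1, 2, 63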
Stale Data [ Not fresh data]
• When power is first turned on, the cache contains no valid data. A
control bit, usually called the valid bit, must be provided for each
cache block to indicate whether the data in that block are valid.
• The valid bits of all cache blocks are set to 0 when power is
initially applied to the system. Some valid bits may also be set to 0
when new programs or data are loaded from the disk into the main
memory. Data transferred from the disk to the main memory using
the DMA mechanism are usually loaded directly into the main
memory, bypassing the cache.
• If the memory blocks being updated are currently in the cache, the
valid bits of the corresponding cache blocks are set to 0.
• As program execution proceeds, the valid bit of a given cache
block is set to 1 when a memory block is loaded into that location.
• The processor fetches data from a cache block only if its valid bit is equal to 1. The use of the valid bit ensures that the processor will not fetch stale data from the cache.
A similar precaution is needed in a system that uses the write-back
protocol. Under this protocol, new data written into the cache are
not written to the memory at the same time. Hence, data in the
memory do not always reflect the changes that may have been
made in the cached copy. It is important to ensure that such stale
data in the memory are not transferred to the disk.
The solution is to flush the cache by forcing all dirty blocks to be written back to memory before performing the transfer.
The operating system issues a flush command to the cache before initiating the DMA operation that transfers the data to the disk.
Flushing the cache does not affect performance greatly.
Practice problems based on Set-associative mapping
1. Consider a 2-way set associative mapped cache of size 16 KB
with block size 256 bytes. The size of main memory is 128 KB.
Find i).Number of bits in tag ii).Tag directory size
Given: Set size = 2, Cache memory size = 16 KB, Block size = 256 bytes
Main memory size = 128 KB
a). Size of the main memory = 128 KB = 2^17 B, so a main memory address has 17 bits.
b). A block contains 256 bytes = 2^8 B, so the word field has 8 bits.
c). Total number of blocks in cache = cache size / block size = 16 KB / 256 B = 2^14 B / 2^8 B = 64 blocks.
d). Total number of sets in cache = total number of blocks in cache / set size = 64/2 = 32 sets, so 5 bits are needed to represent 32 sets.
e). Number of bits in the tag field = 17 - 5 - 8 = 4 bits.
Address format: Tag (4 bits) | Set (5 bits) | Word (8 bits)
f). Tag directory = 64 × 4 bits = 256 bits = 32 bytes
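These steps generalize to any set-associative configuration. The routine below is an illustrative sketch (the name set_associative_params is made up); it reproduces the numbers derived above for the 2-way, 16-KB cache.

import math

def set_associative_params(main_memory_bytes, cache_bytes, block_bytes, ways):
    # Address-field widths and tag-directory size for a set-associative cache
    address_bits = int(math.log2(main_memory_bytes))
    word_bits    = int(math.log2(block_bytes))
    blocks       = cache_bytes // block_bytes
    sets         = blocks // ways
    set_bits     = int(math.log2(sets))
    tag_bits     = address_bits - set_bits - word_bits
    tag_dir_bits = blocks * tag_bits
    return tag_bits, set_bits, word_bits, tag_dir_bits

print(set_associative_params(128 * 1024, 16 * 1024, 256, 2))
# (4, 5, 8, 256) -> tag 4 bits, set 5 bits, word 8 bits, tag directory 256 bits = 32 bytes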
2. Consider an 8-way set-associative mapped cache of size 512 KB
with block size 1 KB. There are 7 bits in the tag. Find i).Size of main
memory ii).Tag directory size
Number of blocks in cache = 512 KB / 1 KB = 512 blocks; number of sets = 512/8 = 64, so the set field has 6 bits; a 1-KB block gives a 10-bit word field.
Address format: Tag (7 bits) | Set (6 bits) | Word (10 bits)
i) Size of main memory = 2^(7+6+10) = 2^23 bytes = 8 MB
ii) Tag directory size = 512 × 7 bits = 3584 bits = 448 bytes
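Problem 2 runs the calculation in the opposite direction: the tag width is given and the main-memory size is derived. A short sketch of that reasoning (variable names are illustrative):

blocks    = (512 * 1024) // (1 * 1024)    # 512 cache blocks
sets      = blocks // 8                   # 64 sets -> 6 set bits
set_bits  = 6
word_bits = 10                            # 1-KB blocks
tag_bits  = 7

address_bits = tag_bits + set_bits + word_bits
print(address_bits, 2 ** address_bits)    # 23 bits -> 8388608 B = 8 MB of main memory
print(blocks * tag_bits // 8)             # tag directory: 512 x 7 = 3584 bits = 448 bytes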
Size of the page table = number of entries in the page table × page table entry size
Page table entry size = control bit + number of bits in the frame number = 1 + 14 = 15 bits, rounded up to 2 bytes
Size of the page table = 2^20 × 2 B = 2 MB
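The same arithmetic as a short check (entry size rounded up to whole bytes, as in the calculation above; the variable names are illustrative):

entries     = 2 ** 20                  # number of page-table entries
entry_bits  = 1 + 14                   # control bit + frame-number bits
entry_bytes = (entry_bits + 7) // 8    # 15 bits rounded up to 2 bytes
print(entries * entry_bytes // (1024 * 1024))   # 2 MB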
4. For each configuration (a-c), state how many bits are needed for
each of the following: i). Virtual address ii).Physical address
iii).Virtual page number iv).Physical page number v).Offset
a. 32-bit operating system, 4-KB pages, 1 GB of RAM
b. 32-bit operating system, 16-KB pages, 2 GB of RAM
c. 64-bit operating system, 16-KB pages, 16 GB of RAM
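A sketch of the computation for each configuration, under the usual textbook assumptions: the virtual address is as wide as the operating system's word size, the physical address just covers the installed RAM, and the offset is log2 of the page size (the function name address_bits is made up).

import math

def address_bits(os_bits, page_bytes, ram_bytes):
    offset           = int(math.log2(page_bytes))
    virtual_address  = os_bits
    physical_address = int(math.log2(ram_bytes))
    virtual_page     = virtual_address - offset
    physical_page    = physical_address - offset
    return virtual_address, physical_address, virtual_page, physical_page, offset

print(address_bits(32, 4 * 1024, 1 * 2**30))    # a. (32, 30, 20, 18, 12)
print(address_bits(32, 16 * 1024, 2 * 2**30))   # b. (32, 31, 18, 17, 14)
print(address_bits(64, 16 * 1024, 16 * 2**30))  # c. (64, 34, 50, 20, 14)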