21cs401 CA Unit IV
A bus is a shared communication link, which uses one set of wires to connect multiple
subsystems. Most modern computers use a single-bus arrangement for connecting I/O devices to
the CPU and memory. The bus enables all the devices connected to it to exchange information.
Advantages
1. Versatility 2. Low cost.
A bus consists of three sets of lines:
Address: Processor places a unique address for an I/O device on address lines.
Data: The data will be placed on Data lines. The data lines of the bus carry information between
the source and the destination. This information may consist of data, complex commands, or
addresses.
Control: The control lines are used to signal requests and acknowledgments, and to indicate
what type of information is on the data lines. The processor uses them to request either a Read or a Write.
INPUT OUTPUT SYSTEM
• Memory mapped I/O
• Programmed I/O
• Interrupts
• DMA (Direct memory Access)
1. Memory mapped I/O and Isolated I/O
• I/O devices and the memory share the same address space; the arrangement is called
Memory-mapped I/O.
• In Memory-mapped I/O portions of address space are assigned to I/O devices and reads
and writes to those addresses are interpreted as commands to the I/O device.
• Memory mapped I/O can also be used to transmit data by writing or reading to select
addresses.
• The device uses the address to determine the type of command, and the data may be
provided by a write or obtained by a read.
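To make this concrete, here is a minimal C sketch of memory-mapped I/O. The device and its register addresses are invented for illustration (on real hardware they come from the board's memory map); the point is that ordinary loads and stores, through volatile pointers, become device commands.

/* Minimal sketch of memory-mapped I/O. The addresses below are
 * hypothetical placeholders, not from these notes. 'volatile' stops the
 * compiler from caching or reordering the device accesses. */
#include <stdint.h>

#define DEV_DATA   ((volatile uint8_t *)0x40001000u)  /* hypothetical data register   */
#define DEV_STATUS ((volatile uint8_t *)0x40001004u)  /* hypothetical status register */

void mmio_write_byte(uint8_t b)
{
    *DEV_DATA = b;        /* an ordinary store becomes a device command */
}

uint8_t mmio_read_byte(void)
{
    return *DEV_DATA;     /* an ordinary load returns device data */
}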
Separate Address space
• Separate I/O instructions
• Separate address space for I/O devices & memory
• Fewer address lines
• Not necessarily a separate bus. The same address bus is used but, control lines tell
whether the requested R/W is an I/O operation or memory operation.
• Address decoder recognizes its address
• Data register holds data to be transferred
• Status register holds information relevant to the operation of the I/O device
Because of the difference in speed between I/O devices and the processor, buffers are used.
• DATAIN is the address of the input buffer associated with the keyboard.
Move DATAIN, R0    ; reads the data from DATAIN and stores it in processor register R0
Move R0, DATAOUT   ; sends the contents of register R0 to location DATAOUT
2. Programmed I/O
• Programmed I/O is a method, included in every computer, for controlling I/O operations. It
is most useful in small, low-speed systems where hardware cost must be minimized.
• Programmed I/O requires that all I/O operations be executed under the direct control of
the CPU.
• Generally, data transfer takes place between two registers: one is a CPU register and the
other is attached to the I/O device. The I/O device does not have direct access to main memory.
• A data transfer from an I/O device to memory therefore requires the CPU to execute several
instructions: an input instruction to transfer a word from the I/O device to the CPU, and a
store instruction to transfer the word from the CPU to memory.
Techniques:
1. I/O Addressing 2. I/O-Mapped I/O 3. I/O Instruction 4. I/O Interface Circuit
For example, the contents of the keyboard character buffer DATAIN can be transferred to
register R1 in the processor by the instruction
MoveByte DATAIN, R1
Similarly, the contents of register R1 can be transferred to DATAOUT by the instruction
MoveByte R1, DATAOUT
The status flags SIN and SOUT are automatically cleared when the buffer registers DATAIN
and DATAOUT are referenced, respectively.
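A hedged C sketch of the polling loops implied above, using the DATAIN/DATAOUT buffers and SIN/SOUT flags. The register addresses and flag bit positions are assumptions for illustration only.

/* Programmed (polled) I/O sketch. Addresses and status-bit layout are
 * assumed; the flag behavior follows the description above. */
#include <stdint.h>

#define DATAIN  ((volatile uint8_t *)0x40000000u)  /* keyboard buffer (assumed address) */
#define DATAOUT ((volatile uint8_t *)0x40000004u)  /* display buffer  (assumed address) */
#define STATUS  ((volatile uint8_t *)0x40000008u)  /* SIN = bit 0, SOUT = bit 1 (assumed) */

uint8_t read_char(void)
{
    while ((*STATUS & 0x01u) == 0u)   /* busy-wait until SIN = 1 (data ready)      */
        ;
    return *DATAIN;                   /* referencing DATAIN clears SIN automatically */
}

void write_char(uint8_t c)
{
    while ((*STATUS & 0x02u) == 0u)   /* busy-wait until SOUT = 1 (display ready)  */
        ;
    *DATAOUT = c;                     /* referencing DATAOUT clears SOUT           */
}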
Interrupts
The I/O device sends a special signal (an interrupt) over the bus whenever it is ready for a data
transfer operation.
Direct Memory Access (DMA)
o Used for high speed I/O devices
o Device interface transfers data directly to or from the memory
o Processor not continuously involved
1.2 BUS OPERATIONS
Synchronous Bus
In synchronous buses, the steps of data transfer take place at fixed clock cycles. Everything is
synchronized to bus clock and clock signals are made available to both master and slave.
The bus clock is a square wave signal. A cycle starts at one rising edge of the clock and ends
at the next rising edge, which is the beginning of the next cycle.
A transfer may take multiple bus cycles depending on the speed parameters of the bus and the
two ends of the transfer.
One scenario would be that on the first clock cycle, the master puts an address on the address
bus, puts data on the data bus, and asserts the appropriate control lines.
Slave recognizes its address on the address bus on the first cycle and reads the new value from
the bus in the second cycle.
Synchronous buses are simple and easily implemented. However, when connecting devices
with varying speeds to a synchronous bus, the slowest device determines the speed of the
bus. Also, the length of a synchronous bus must be limited to avoid clock-skew problems.
During the first clock cycle the CPU places the address of the location it wants to read, on the
address lines of the bus. Later during the same clock cycle, once the address lines have
stabilized, the READ request is asserted by the CPU.
Asynchronous Bus
An asynchronous bus has no system clock. Handshaking is done to properly conduct the
transmission of data between the sender and the receiver. For example, in an asynchronous
read operation, the bus master puts the address and control signals on the bus and then asserts
a synchronization signal.
The synchronization signal from the master prompts the slave to get synchronized, and once it
has accessed the data, it asserts its own synchronization signal.
The slave's synchronization signal indicates to the processor that there is valid data on the bus,
and it reads the data.
The master then deasserts its synchronization signal, which indicates to the slave that the master
has read the data. The slave then deasserts its synchronization signal.
This method of synchronization is referred to as a full handshake. Note that there is no clock
and that starting and ending of the data transfer are indicated by special synchronization signals.
An asynchronous communication protocol can be considered as a pair of Finite State machines
(FSMs) that operate in such a way that one FSM does not proceed until the other FSM has
reached a certain state.
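The toy C program below walks through the five steps of the full handshake described above, modeling the two synchronization signals as variables. It illustrates the ordering of the protocol only; it is not bus-driver code.

/* Toy model of the full-handshake read described above. */
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    bool master_ready = false, slave_ready = false;
    int bus_data = 0;
    int slave_memory = 42;        /* value the slave will supply (illustrative) */

    master_ready = true;          /* 1. master asserts its sync signal          */
    bus_data = slave_memory;      /* 2. slave sees it, drives the data,         */
    slave_ready = true;           /*    and asserts its own sync signal         */
    int captured = bus_data;      /* 3. master sees valid data and reads it     */
    master_ready = false;         /* 4. master deasserts: "data taken"          */
    slave_ready = false;          /* 5. slave deasserts: transfer complete      */

    printf("master captured %d\n", captured);
    return 0;
}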
1.3 BUS ARBITRATION
• Bus master: Device that initiates data transfers on the bus.
• The next device can take control of the bus after the current master surrenders control.
• Bus Arbitration: Process by which the next device to become master is selected.
Types of bus arbitration:
1. Centralized Arbitration
2. Distributed Arbitration
Centralized Arbitration
• BR (Bus Request): an open-drain line; the signal on this line is the logical OR of the
bus requests from all the DMA devices.
• BG (Bus Grant): the processor activates this line to indicate (acknowledge) to all the
DMA devices, connected in daisy-chain fashion, that the bus may be used when it is
free.
• BBSY (Bus Busy): an open-collector line; the current bus master signals on this line to
indicate to the other devices that it is currently using the bus.
• Processor is normally the bus master, unless it grants bus mastership to DMA.
• In the timing diagram referred to above, DMA controller 2 requests and acquires bus
mastership and later releases the bus.
• During its tenure as the bus master, it may perform one or more data transfer operations,
depending on whether it is operating in the cycle stealing or block mode.
• After it releases the bus, the processor resumes bus mastership.
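A small sketch of the daisy-chain grant idea: assuming, for illustration, that devices closer to the processor see the Bus Grant first, the first requesting device in the chain absorbs the grant.

/* Daisy-chain grant sketch. Device numbering is illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define NDEV 4

int main(void)
{
    bool request[NDEV] = { false, true, false, true };  /* devices 1 and 3 assert BR */
    int granted = -1;

    /* BG ripples down the chain until a requesting device absorbs it. */
    for (int dev = 0; dev < NDEV; dev++) {
        if (request[dev]) { granted = dev; break; }
    }

    printf("bus granted to device %d\n", granted);  /* device 1: closest to the CPU */
    return 0;
}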
Distributed Arbitration
All devices waiting to use the bus carry out the arbitration process themselves; there is no
central arbiter.
• Each device on the bus is assigned with a 4-bit identification number.
• One or more devices request the bus by asserting the start-arbitration signal and place
their identification number on the four open collector lines.
• ARB0 through ARB3 are the four open collector lines.
The winner is the device with the highest ID number among those competing, as determined
from the code on the lines.
• Assume that two devices, A and B, having ID numbers 5 and 6, respectively, are
requesting the use of the bus.
• Device A transmits the pattern 0101, and device B transmits the pattern 0110. The code
seen by both devices is 0111.
• Each device compares the pattern on the arbitration lines to its own ID, starting from the
most significant bit.
• If it detects a difference at any bit position, it disables its drivers at that bit position and
for all lower-order bits. It does so by placing a 0 at the input of these drivers.
In the case of our example, device A detects a difference on line ARB1. Hence, it disables
its drivers on lines ARB1 and ARB0.
• This causes the pattern on the arbitration lines to change to 0110, which means that B has
won the contention.
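The contention above can be simulated directly. The sketch below models the open-collector lines as a logical OR and lets each device disable its low-order drivers on a mismatch, reproducing the 0101-versus-0110 example.

/* Simulation of the distributed-arbitration example: devices with IDs
 * 5 (0101) and 6 (0110) compete on four wired-OR lines. */
#include <stdio.h>

#define NDEV 2

int main(void)
{
    unsigned id[NDEV]    = { 5u, 6u };   /* devices A and B         */
    unsigned drive[NDEV] = { 5u, 6u };   /* bits each device drives */
    unsigned lines = 0;

    /* Repeat until the wired-OR pattern stabilizes (4 bits suffice). */
    for (int pass = 0; pass < 4; pass++) {
        lines = 0;
        for (int d = 0; d < NDEV; d++)
            lines |= drive[d];           /* open collector: logical OR */

        for (int d = 0; d < NDEV; d++) {
            for (int bit = 3; bit >= 0; bit--) {
                unsigned mask = 1u << bit;
                if ((lines & mask) && !(id[d] & mask)) {
                    /* Mismatch: disable drivers at this bit and below. */
                    drive[d] &= ~((mask << 1) - 1u);
                    break;
                }
            }
        }
    }

    printf("winning pattern: %u\n", lines);  /* prints 6: device B wins */
    return 0;
}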
Parallel Port – Input Port
Observe the parallel input port that connects the keyboard to the processor. Whenever a
key is tapped on the keyboard, an electrical connection is established that generates an electrical
signal. This signal is encoded by the encoder to produce the ASCII code for the
corresponding character pressed at the keyboard.
The encoder then outputs one byte of data that represents the encoded character, along with
one valid bit. This valid bit changes from 0 to 1 when a key is pressed.
So, when the valid bit is 1, the ASCII code of the corresponding character is loaded into the
KBD_DATA register of the input interface circuit.
Now, when the data is loaded into the DATAIN register, the SIN status flag is set to 1, which
causes the processor to read the data from DATAIN.
Once the processor reads the data from the DATAIN register, the SIN flag is cleared to 0. Here
the input interface is connected to the processor using an asynchronous bus.
So, the way they alert each other is using the master ready line and the slave ready line.
Whenever the processor is ready to accept the data, it activates its master-ready line and
whenever the interface is ready with the data to transmit it to the processor it activates its slave-
ready line.
The bus connecting the processor and the interface has one more control line, R/W, which is
set to 1 for a read operation.
Figure shows a suitable circuit for an input interface. The output lines of the DATAIN register
are connected to the data lines of the bus by means of three-state drivers, which are turned on
when the processor issues a read instruction with the address that selects this register.
The SIN signal is generated by a status flag circuit. This signal is also sent to the bus through
a three-state driver.
It is connected to bit D0, which means it will appear as bit 0 of the status register. Other bits of
this register do not contain valid information.
An address decoder is used to select the input interface when the high-order 31 bits of an
address correspond to any of the addresses assigned to this interface.
Address bit A0 determines whether the status register or the data register is to be read when
the Master-ready signal is active.
The control handshake is accomplished by activating the Slave-ready signal when either Read-
status or Read-data is equal to 1.
Parallel Port – Output Port
Observe the output interface shown in the figure above that connects the display and the processor.
The display side uses two handshake signals, Ready and New-data; the bus side uses the
Master-ready and Slave-ready lines.
When the display unit is ready to display a character, it activates its Ready line, which sets
the SOUT flag to 1. This indicates to the processor that it may place the next character into
the DATAOUT register.
As soon as the processor loads a character into DATAOUT, the SOUT flag is cleared to 0 and
the New-data line is set to 1. When the display senses that the New-data line is activated, it
sets the Ready line back to 0 and accepts the character from the DATAOUT register to display it.
The circuit in figure has separate input and output data lines for connection to an I/O device. A
more flexible parallel port is created if the data lines to I/O devices are bidirectional.
Figure shows a general-purpose parallel interface circuit that can be configured in a variety of
ways. Data lines P7 through P0 can be used for either input or output purposes. For increased
flexibility, the circuit makes it possible for some lines to serve as inputs and some lines to serve
as outputs, under program control.
The DATAOUT register is connected to these lines via three-state drivers that are controlled
by a data direction register, DDR. The processor can write any 8-bit pattern into DDR.
For a given bit, if the DDR value is 1, the corresponding data line acts as an output line;
otherwise, it acts as an input line.
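A brief C sketch of how a program might use such a data direction register. The register addresses are invented; the DDR semantics (1 = output, 0 = input) follow the description above.

/* Data-direction-register sketch for lines P7..P0. Addresses assumed. */
#include <stdint.h>

#define DDR      ((volatile uint8_t *)0x40002000u)  /* data direction register (assumed) */
#define PDATAOUT ((volatile uint8_t *)0x40002004u)  /* drives lines configured as output */
#define PDATAIN  ((volatile uint8_t *)0x40002008u)  /* reads lines configured as input   */

void configure_port(void)
{
    *DDR = 0x0Fu;                /* P3..P0 as outputs, P7..P4 as inputs       */
    *PDATAOUT = 0x05u;           /* drive pattern 0101 on the output lines    */
    uint8_t hi = *PDATAIN >> 4;  /* sample the four input lines P7..P4       */
    (void)hi;                    /* sketch only: value unused here            */
}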
1.10 DIRECT MEMORY ACCESS (DMA)
• To transfer large blocks of data at high speed between external devices and main
memory, the DMA approach is often used.
• A DMA controller allows data transfer directly between an I/O device and memory, with
minimal intervention from the processor.
• The DMA controller acts as a processor on the bus, but it is controlled by the CPU.
• To initiate the transfer of a block of words, the processor sends the following information
to the controller:
• The starting address of the memory block
• The word count
• Control to specify the mode of transfer such as read or write
• A control to start the DMA transfer
• The DMA controller performs the requested I/O operation and sends an interrupt to the
processor upon completion.
• In the DMA interface, the first register stores the starting address, the second register
stores the word count, and the third register contains status and control flags (see the
sketch below).
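A hedged C sketch of how a driver might program the three registers just listed. The base address, register order, and control-bit layout are assumptions for illustration; real DMA controllers differ.

/* DMA register-programming sketch. Layout and bits are assumed. */
#include <stdint.h>

typedef struct {
    volatile uint32_t start_addr;   /* first register: starting memory address   */
    volatile uint32_t word_count;   /* second register: number of words          */
    volatile uint32_t control;      /* third register: status and control flags  */
} dma_regs_t;

#define DMA       ((dma_regs_t *)0x40003000u)  /* assumed base address     */
#define DMA_READ  (1u << 0)                    /* assumed: 1 = read mode   */
#define DMA_START (1u << 1)                    /* assumed: start transfer  */
#define DMA_DONE  (1u << 31)                   /* assumed: completion flag */

void dma_read_block(uint32_t mem_addr, uint32_t nwords)
{
    DMA->start_addr = mem_addr;              /* where the block goes in memory */
    DMA->word_count = nwords;                /* how many words to move         */
    DMA->control    = DMA_READ | DMA_START;  /* set the mode and start         */
    /* The controller now moves the data without the CPU; on completion it
     * raises an interrupt (or sets DMA_DONE, which could be polled). */
}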
MEMORY HIERARCHY
• Figure shows that the faster memory is close to the processor and the slower, less expensive
memory is below it.
• A memory hierarchy can consist of multiple levels, but data is copied between only two
adjacent levels at a time.
• The upper level, the one closer to the processor, is smaller and faster than the lower
level, since the upper level uses more expensive technology.
• Block: The minimum unit of information that can be either present or not present in the
two-level hierarchy is called a block or a line.
• Hit: If the data requested by the processor appears in some block in the upper level, this
is called a hit.
• Miss: If the data is not found in the upper level, the request is called a miss.
• The lower level in the hierarchy is then accessed to retrieve the block containing the
requested data.
• Hit rate or Hit ratio: It is the fraction of memory accesses found in the upper level;
it is often used as a measure of the performance of the memory hierarchy.
• Miss rate: Miss rate (1−hit rate) is the fraction of memory accesses not found in the upper
level.
• Hit time: It is the time to access the upper level of the memory hierarchy, which includes
the time needed to determine whether the access is a hit or a miss.
Example:1
Assume there are three small caches, each consisting of four one-word blocks. One cache is
fully associative, a second is two-way set-associative, and the third is direct-mapped. Find the
number of misses for each cache organization given the following sequence of block addresses:
0, 8, 0, 6, and 8.
Solution:
Direct Map: Number of blocks: 4
• The direct-mapped cache generates five misses for the five accesses.
Set Associative:
• The set-associative cache has two sets (with indices 0 and 1) with two elements per set.
Let’s first determine to which set each block address maps:
• Set-associative caches usually replace the least recently used block within a set;
• Notice that when block 6 is referenced, it replaces block 8, since block 8 has been less
recently referenced than block 0. The two-way set-associative cache has four misses, one
less than the direct-mapped cache.
Fully Associative:
• The fully associative cache has four cache blocks (in a single set); any memory block can
be stored in any cache block.
• The fully associative cache has the best performance, with only three misses:
Replacement Algorithm
Least Recently Used (LRU): replace the block that has gone unused for the longest time.
First in First out (FIFO): replace the block that has been resident the longest.
• Writes are more complicated. For a write-through scheme, we have two sources of
stalls: write misses, where we fetch the block before continuing the write, and write
buffer stalls, which occur when the write buffer is full when a write occurs. Thus, the
cycles stalled for writes equal the sum of these two, as shown below.
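The sum referred to above takes the standard textbook form (the write-buffer term has no simple closed form, for the reason given in the next bullet):

Write-stall cycles = (Writes / Program × Write miss rate × Write miss penalty) + Write buffer stalls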
• Because the write buffer stalls depend on the proximity of writes, and not just their
frequency, they cannot be captured by a simple formula. Fortunately, in systems
with a reasonable write buffer depth (e.g., four or more words) and a memory capable of
accepting writes at a rate that significantly exceeds the average write frequency in
programs (e.g., by a factor of 2), the write buffer stalls will be small, and we can safely
ignore them.
• If a system did not meet these criteria, it would not be well designed; instead, the designer
should have used either a deeper write buffer or a write-back organization.
• Write-back schemes also have potential additional stalls, arising from the need to write
a cache block back to memory when the block is replaced. If we assume that the write
buffer stalls are negligible, we can combine the reads and writes by using a single miss
rate and the miss penalty, as shown below.
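Under that assumption, the standard combined expression is:

Memory-stall clock cycles = (Memory accesses / Program) × Miss rate × Miss penalty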
VIRTUAL MEMORY
• Pages: Both the virtual memory and the physical memory are broken into pages, so that
a virtual page is mapped to a physical page.
• Physical pages can be shared by having two virtual addresses point to the same physical
address. This capability is used to allow two different programs to share data or code.
• Segmentation: It is a variable-size address mapping scheme in which an address consists
of two parts: a segment number, which is mapped to a physical address, and a segment
offset.
• Page table: It is the table containing the virtual to physical address translations in a virtual
memory system. The table, which is stored in memory, is typically indexed by the
virtual page number; each entry in the table contains the physical page number for that
virtual page if the page is currently in memory.
Mapping from virtual address into physical address
• The difficulty in using fully associative placement is in locating an entry, since it can be
anywhere in the upper level of the hierarchy. A full search is impractical.
• In virtual memory systems, we locate pages by using a table that indexes the memory;
this structure is called a page table, and it resides in memory.
• A page table is indexed with the page number from the virtual address to discover the
corresponding physical page number. Each program has its own page table, which maps
the virtual address space of that program to main memory.
• Page Table Register: To indicate the location of the page table in memory, the hardware
includes a register that points to the start of the page table; we call this the page table
register. Assume for now that the page table is in a fixed and contiguous area of memory.
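A minimal C sketch of the table walk described above, assuming 4 KiB pages and a valid bit in each page table entry (both illustrative choices, not from these notes).

/* Single-level page-table translation sketch. The page_table pointer
 * plays the role of the page table register. */
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* assumed 4 KiB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)
#define PTE_VALID  (1u << 31)                 /* assumed valid bit   */

uint32_t translate(const uint32_t *page_table, uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;   /* virtual page number           */
    uint32_t pte = page_table[vpn];       /* indexed lookup in the table   */
    if (!(pte & PTE_VALID))
        return 0;                         /* page fault: the OS must bring the page in */
    uint32_t ppn = pte & ~PTE_VALID;      /* physical page number          */
    /* Sketch only: assumes the physical address fits in 32 bits. */
    return (ppn << PAGE_SHIFT) | (vaddr & PAGE_MASK);
}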
• When a page fault occurs, if all the pages in main memory are in use, the operating system
must choose a page to replace.
• Because we want to minimize the number of page faults, most operating systems try to
choose a page that they hypothesize will not be needed in the near future.
• Using the past to predict the future, operating systems follow the least recently used
(LRU) replacement scheme. The operating system searches for the least recently used
page, assuming that a page that has not been used in a long time is less likely to be needed
than a more recently accessed page. The replaced pages are written to swap space on the
disk.
• Swap space: It is the space on the disk reserved for the full virtual memory space of a
process.
• Reference bit: It is also called use bit. A field that is set whenever a page is accessed
and that is used to implement LRU or other replacement schemes.
SRAM Technology
• SRAMs are simply integrated circuits that are memory arrays with (usually) a single
access port that can provide either a read or a write.
• SRAMs have a fixed access time to any datum, though the read and write access times
may differ.
• SRAMs don’t need to refresh and so the access time is very close to the cycle time.
• SRAMs typically use six to eight transistors per bit to prevent the information from being
disturbed when read.
• SRAM needs only minimal power to retain the charge in standby mode.
• All levels of caches are integrated onto the processor chip.
• In a SRAM, as long as power is applied, the value can be kept indefinitely.
DRAM Technology
• In a dynamic RAM (DRAM), the value kept in a cell is stored as a charge in a capacitor.
• A single transistor is then used to access this stored charge, either to read the value or to
overwrite the charge stored there. Because DRAMs use only a single transistor per bit of
storage, they are much denser and cheaper per bit than SRAM.
• As DRAMs store the charge on a capacitor, it cannot be kept indefinitely and must
periodically be refreshed. That is why this memory structure is called dynamic, as
opposed to the static storage in an SRAM cell.
• To refresh the cell, we merely read its contents and write it back. The charge can be kept
for several milliseconds.
• Figure shows the internal organization of a DRAM, and shows the density, cost, and
access time of DRAMs.
• Modern DRAMs are organized in banks, typically four for DDR3. Each bank consists of
a series of rows. Sending a row address with an activate command causes that row to be
transferred to a row buffer.
• When the row is in the buffer, it can be transferred by successive column addresses at
whatever the width of the DRAM is (typically 4, 8, or 16 bits in DDR3) or by specifying
a block transfer and the starting address.
• To further improve the interface to processors, DRAMs added clocks and are properly
called Synchronous DRAMs or SDRAMs.
• The advantage of SDRAMs is that the use of a clock eliminates the time for the memory
and processor to synchronize.
• The speed advantage of synchronous DRAMs comes from the ability to transfer the bits
in the burst without having to specify additional address bits.
• The fastest version is called Double Data Rate (DDR) SDRAM. The name means data
transfers on both the rising and falling edge of the clock, thereby getting twice as much
bandwidth as you might expect based on the clock rate and the data width.
• Sending an address to several banks permits them all to read or write simultaneously. For
example, with four banks, there is just one access time and then accesses rotate between
the four banks to supply four times the bandwidth. This rotating access scheme is called
address interleaving.
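As an illustrative calculation (the DDR3-1600 figures are a common example, not taken from these notes): such an interface runs its bus clock at 800 MHz, transfers data on both clock edges, and is 8 bytes wide, so

Peak bandwidth = 800 MHz × 2 transfers per cycle × 8 bytes = 12.8 GB/s

Interleaving accesses across the four banks is what allows the slower DRAM arrays to keep up with this interface rate.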
Flash Memory
• Flash memory is a type of electrically erasable programmable read-only memory
(EEPROM).
• Like other EEPROM technologies, flash memory bits can wear out after many writes.
• Most flash products include a controller to spread the writes by remapping blocks that
have been written many times to less trodden blocks. This technique is called wear
leveling.
• Personal mobile devices are very unlikely to exceed the write limits in the flash.
Disk Memory
• A magnetic hard disk consists of a collection of platters, which rotate on a spindle at 5400
to 15,000 revolutions per minute.
• The metal platters are covered with magnetic recording material on both sides, similar to
the material found on a cassette or videotape.
• To read and write information on a hard disk, a movable arm containing a small
electromagnetic coil called a read-write head is located just above each surface.
The disk heads are positioned very close to the drive surface.
• Tracks: Each disk surface is divided into concentric circles, called tracks. There are
typically tens of thousands of tracks per surface.
• Sector: Each track is in turn divided into sectors that contain the information; each track
may have thousands of sectors. Sectors are typically 512 to 4096 bytes in size.
• The sequence recorded on the magnetic media is a sector number, a gap, the information
for that sector including error correction code.
• The disk heads for each surface are connected together and move in conjunction, so that
every head is over the same track of every surface.
• The term cylinder is used to refer to all the tracks under the heads at a given point on all
surfaces.
• Seek Time: To access data, the operating system must direct the disk through a three-
stage process. The first step is to position the head over the proper track. This operation
is called a seek, and the time to move the head to the desired track is called the seek time.
• Average seek times are usually advertised as 3 ms to 13 ms, but the actual average seek
time may be substantially lower, depending on the application and the scheduling of disk
requests.
• Rotational Latency: Once the head has reached the correct track, we must wait for the
desired sector to rotate under the read/write head. This time is called the rotational
latency or rotational delay.
• The average latency to the desired information is halfway around the disk. Disks rotate
at 5400 RPM to 15,000 RPM. At 5400 RPM:
Average rotational latency = 0.5 rotation / 5400 RPM = 0.5 / (5400/60) seconds = 0.0056 seconds = 5.6 ms
• The last component of a disk access, transfer time, is the time to transfer a block of bits.
• The transfer time is a function of the sector size, the rotation speed, and the recording
density of a track.
• In summary, the two primary differences between magnetic disks and semiconductor
memory technologies are access time and cost per bit. Disks have a slower access time
because they are mechanical devices: flash is about 1000 times as fast, and DRAM about
100,000 times as fast.
• Magnetic disk memory is cheaper per bit because it offers very high storage capacity at
a modest cost: disk is 10 to 100 times cheaper per bit.
• Magnetic disks are nonvolatile like flash, but unlike flash there is no write wear-out
problem. However, flash is much more rugged and hence a better match to the jostling
inherent in personal mobile devices.
PROBLEMS:
Q.No:1
Assume there are three small caches, each consisting of four one-word blocks. One cache is
fully associative, a second is two-way set-associative, and the third is direct-mapped.
a. Find the number of misses and hits for each cache organization given the following
sequence of block addresses:
32 64 32 64 8 5 8 26 8
b. Draw the necessary diagrams for each cache organization.
Solution
CACHE MEMORY MAPPING TECHNIQUES:
1. Direct Mapping
• Direct-Mapped Cache: A cache structure in which each memory location is mapped to
exactly one location in the cache, found as
(Block address) modulo (Number of blocks in the cache)
• For the given sequence, the direct-mapped cache generates seven misses and two hits, as
the sketch below confirms.
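A short C program that applies this modulo rule to the sequence from part (a); it reports the seven misses and two hits stated above.

/* Direct-mapped cache check for the sequence in part (a), 4 blocks. */
#include <stdio.h>

int main(void)
{
    int addrs[] = { 32, 64, 32, 64, 8, 5, 8, 26, 8 };
    int n = sizeof addrs / sizeof addrs[0];
    int cache[4];                     /* block address held per cache line */
    int valid[4] = { 0 }, misses = 0;

    for (int i = 0; i < n; i++) {
        int index = addrs[i] % 4;     /* (Block address) modulo (Number of blocks) */
        if (!valid[index] || cache[index] != addrs[i]) {
            misses++;                 /* miss: fetch the block into the line */
            cache[index] = addrs[i];
            valid[index] = 1;
        }
    }
    printf("misses = %d, hits = %d\n", misses, n - misses);  /* 7 and 2 */
    return 0;
}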
2. Set Associative:
• The set-associative cache has two sets (with indices 0 and 1) with two elements per set.
Let’s first determine to which set each block address maps:
Set-associative caches usually replace the least recently used block within a set;
Notice that when block 8 is referenced, it replaces block 32, since block 32 has been less
recently referenced than block 64.
Notice that when block 26 is referenced, it replaces block 64, since block 64 has been less
recently referenced than block 8.
The two-way set-associative cache has five misses, two less than the direct-mapped cache,
and four hits.
3. Fully Associative:
• The fully associative cache has four cache blocks (in a single set); any memory block can
be stored in any cache block.
• The fully associative cache has five misses and four hits:
Address of memory    Hit or    Contents of cache blocks after reference
block accessed       Miss      Block 0      Block 1      Block 2     Block 3
32                   Miss      Memory[32]
64                   Miss      Memory[32]   Memory[64]
32                   Hit       Memory[32]   Memory[64]
64                   Hit       Memory[32]   Memory[64]
8                    Miss      Memory[32]   Memory[64]   Memory[8]
5                    Miss      Memory[32]   Memory[64]   Memory[8]   Memory[5]
8                    Hit       Memory[32]   Memory[64]   Memory[8]   Memory[5]
26                   Miss      Memory[26]   Memory[64]   Memory[8]   Memory[5]
8                    Hit       Memory[26]   Memory[64]   Memory[8]   Memory[5]
Q.No:2
Consider the following reference string:
7 8 5 4 8 1 8 6 4 1 8 1 4 5 4 8 5 7 8 5
and assume that the number of page frames the cache can hold at a time is 3.
a. How is the LRU caching scheme implemented? Find the number of page faults using the
Least Recently Used (LRU) page replacement algorithm.
b. How is the FIFO caching scheme implemented? Find the number of page faults using the
First In First Out (FIFO) page replacement algorithm.
c. Which of the two algorithms performs better?
Solution:
a. Least Recently Used (LRU):
The algorithm which replaces the page that has not been used for the longest period of
time is referred to as the Least Recently Used (LRU) algorithm. The LRU algorithm
produces 12 page faults.
Reference string: 7 8 5 4 8 1 8 6 4 1 8 1 4 5 4 8 5 7 8 5
Page frames after each fault (columns show the 12 faulting references only):
Faulting reference:  7  8  5  4  1  6  4  1  8  5  8  7
Frame 1:             7  7  7  4  4  6  6  6  8  5  5  5
Frame 2:                8  8  8  8  8  8  1  1  1  8  8
Frame 3:                   5  5  1  1  4  4  4  4  4  7
b. First in First out (FIFO):
A replacement scheme in which the block replaced is the one that entered the cache first.
The FIFO algorithm produces 15 page faults.
Reference string: 7 8 5 4 8 1 8 6 4 1 8 1 4 5 4 8 5 7 8 5
Page frames after each fault (columns show the 15 faulting references only):
Faulting reference:  7  8  5  4  1  8  6  4  1  8  5  4  7  8  5
Frame 1:             7  7  7  4  4  4  6  6  6  8  8  8  7  7  7
Frame 2:                8  8  8  1  1  1  4  4  4  5  5  5  8  8
Frame 3:                   5  5  5  8  8  8  1  1  1  4  4  4  5
c. LRU replacement with 12 faults is still much better than FIFO replacement with 15 faults.
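Both traces can be checked mechanically. The sketch below simulates LRU and FIFO replacement with three frames over the given reference string and prints the 12 and 15 faults found above.

/* Page-replacement simulation: LRU vs. FIFO with 3 frames. */
#include <stdio.h>

#define FRAMES 3

static int simulate(const int *refs, int n, int lru)
{
    int page[FRAMES], stamp[FRAMES];
    int faults = 0, used = 0, fifo_next = 0;

    for (int t = 0; t < n; t++) {
        int hit = -1;
        for (int f = 0; f < used; f++)
            if (page[f] == refs[t]) hit = f;

        if (hit >= 0) {
            if (lru) stamp[hit] = t;          /* LRU: refresh recency on a hit */
            continue;
        }
        faults++;
        if (used < FRAMES) {                  /* a free frame is available     */
            page[used] = refs[t];
            stamp[used] = t;
            used++;
        } else if (lru) {                     /* evict least recently used     */
            int victim = 0;
            for (int f = 1; f < FRAMES; f++)
                if (stamp[f] < stamp[victim]) victim = f;
            page[victim] = refs[t];
            stamp[victim] = t;
        } else {                              /* FIFO: evict oldest arrival    */
            page[fifo_next] = refs[t];
            fifo_next = (fifo_next + 1) % FRAMES;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = { 7,8,5,4,8,1,8,6,4,1,8,1,4,5,4,8,5,7,8,5 };
    int n = sizeof refs / sizeof refs[0];
    printf("LRU faults:  %d\n", simulate(refs, n, 1));   /* 12 */
    printf("FIFO faults: %d\n", simulate(refs, n, 0));   /* 15 */
    return 0;
}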