Computer Architecture Unit IV
1 byte = 8 bits
Each register consists of storage elements. A storage element, which stores one bit of data, is called a
cell (storage cell).
I. PROCESSOR REGISTER
The fastest access to data is possible only if the data is available in processor registers.
Registers are at the top of the memory hierarchy in terms of access speed, but they hold only a very small portion of the required memory.
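[Figure: memory hierarchy, top to bottom: processor, cache (L2), main memory, magnetic disk (secondary memory). The bottom level is the biggest in size, the slowest in speed and the lowest in cost.]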
V. SECONDARY MEMORY
Magnetic disk storage units are called secondary memory. They provide a
huge amount of low-cost storage. They are very slow compared to the semiconductor
devices which are used to implement primary memory.
Table 4.1 Characteristics of Memory
Memory             Speed                         Size        Cost per bit
Primary cache      High                          Lower       High
Secondary cache    Low                           Low         Lower than primary cache
Main memory        Lower than secondary cache    High        Low
Secondary memory   Very low                      Very high   Very low
The metal platters are covered with magnetic recording material on both sides.
The arm carries a small electromagnetic coil called a read-write head.
Each disk surface is divided into concentric circles called tracks.
Each track is in turn divided into sectors that contain the information.
MEMORY AND I/O ORGANIZATION
Computer Architecture
The memory that holds the programs and data currently being executed, and that is
placed between the processor and main memory to speed
up the execution process, is called cache memory.
Cache is a small amount of memory used to speed up system performance.
It is placed between the processor and main memory.
It is used to store the programs and data currently being executed and temporary data
frequently used by the CPU.
To speed up the execution process, high-speed memory such as cache (SRAM) is
used. It is a type of memory that holds a small portion of the code and data.
To execute a program in a computer system:
o The program is loaded into the main memory.
o The processor fetches the instructions (code) and data from the main memory.
o The processor executes the program.
4.4.1 Cache Memory System
A cache memory system includes a small amount of fast memory (SRAM) and a large amount
of slow memory (DRAM).
If the processor requests data that is not available in the cache, this is
referred to as a cache miss; the desired block is then copied from the main memory
to the cache by the cache controller.
The cache controller decides which memory blocks are moved between the cache
and main memory.
LOCALITY OF REFERENCE
The effectiveness of the cache mechanism is based on a property of computer
programs called locality of reference.
Cache was the name chosen to represent the level of memory hierarchy between
the processor and main memory.
Cache is a safe place for hiding or storing things that we need to examine.
When the CPU finds a requested data item in the cache, it is called a cache hit.
When the CPU does not find a requested data item in the cache, it is called a
cache miss.
A fixed-size collection of data containing the requested word is called a
block.
4.4.3 Types of Cache Memory
1. Primary Cache
Also referred to as processor cache, Level 1 or L1 cache
Always located on the processor chip
2. Secondary Cache
Also referred as Level2 or L2 cache
Placed between the primary cache and the main memory
Advantages
Cache memory is faster than main memory
Consumes less access time as compared to main memory
Stores program that can be executed within a short period of time
Stores data for temporary use
Disadvantages
Limited capacity
Very expensive
4.4.4 BLOCK REPLACEMENT
When the cache is full and a memory word that is not in the cache is referenced,
the cache control hardware removes a block from the cache to make room for the new
block.
The algorithm used for this process is called the replacement algorithm.
The fraction of memory accesses found in a level of the memory hierarchy
is called the hit rate or hit ratio.
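As a minimal sketch of block replacement and hit ratio, the Python fragment below simulates a small cache that can hold any block in any position and evicts the least recently used (LRU) block when it is full. LRU is only one possible replacement algorithm, chosen here for illustration; the function name `simulate_lru`, the capacity and the reference sequence are assumptions, not part of these notes.

```python
from collections import OrderedDict

def simulate_lru(block_refs, capacity):
    """Return (hits, misses) for a cache with LRU replacement."""
    cache = OrderedDict()  # block numbers, ordered from oldest to newest use
    hits = misses = 0
    for block in block_refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:     # cache full: evict the LRU block
                cache.popitem(last=False)
            cache[block] = True
    return hits, misses

refs = [0, 1, 2, 0, 3, 0, 4, 1]            # hypothetical block reference sequence
hits, misses = simulate_lru(refs, capacity=3)
hit_ratio = hits / len(refs)
print(hits, misses, hit_ratio)
```

For this sequence the sketch counts 2 hits in 8 references, so the hit ratio is 0.25.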
WRITE OPERATION
Write operations are handled using the following techniques:
Write-through.
Write-back.
Write-buffer.
WRITE-THROUGH
The cache location and the main memory location are updated simultaneously.
WRITE-BACK
Only the cache location is updated, and a flag bit is used to record that the block has been modified.
This flag is called the dirty bit or modified bit. The main memory location is updated later, when the block is removed from the cache.
WRITE-BUFFER
A write buffer stores the data while it is waiting to be written into the memory.
After writing the data into the write buffer, the processor can continue execution.
When the write to memory completes, the write buffer entry is free again.
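The difference between the write policies can be sketched with a toy cache that counts main memory writes. This is an illustration under simplifying assumptions (one value per address, no write buffer, eviction triggered by hand); the class `ToyCache` and the helper `count_writes` are hypothetical names.

```python
class ToyCache:
    """Toy cache that counts how often main memory is written."""

    def __init__(self, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError(policy)
        self.policy = policy
        self.data = {}          # address -> value held in the cache
        self.dirty = set()      # addresses whose dirty bit is set
        self.memory = {}        # stands in for main memory
        self.memory_writes = 0

    def write(self, addr, value):
        self.data[addr] = value
        if self.policy == "write-through":
            self._write_memory(addr, value)   # memory updated on every write
        else:
            self.dirty.add(addr)              # write-back: only set the dirty bit

    def evict(self, addr):
        if addr in self.dirty:                # modified block is written back now
            self._write_memory(addr, self.data[addr])
            self.dirty.discard(addr)
        self.data.pop(addr, None)

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

def count_writes(policy):
    cache = ToyCache(policy)
    for value in range(5):                    # five writes to the same address
        cache.write(0x10, value)
    cache.evict(0x10)
    return cache.memory_writes, cache.memory[0x10]

print(count_writes("write-through"), count_writes("write-back"))
```

Five writes to one address cost five memory writes under write-through but a single memory write under write-back, performed when the block is evicted.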
Table 4.3 Both write-back and write-through have their advantages
WRITE-BACK
1) Individual words can be written by the processor at the rate that the cache, rather than the memory, can accept them.
2) Multiple words written within a block require only one write to the lower level in the hierarchy.
WRITE-THROUGH
1) Misses are simpler and cheaper because they never require a block to be written back to the lower level.
2) Write-through is easier to implement than write-back.
READ MISS
When the addressed word in a read operation is not in the cache, it is
called a read miss.
The block containing the requested word is copied from the main memory into the cache, and the word is then forwarded to the processor.
Alternatively, the word may be sent to the processor as soon as it is
read from the memory. This approach is called load-through or early
restart.
WRITE MISS
When the addressed word in a write operation is not in the cache, it is called a write miss.
MISS RATE
The fraction of memory accesses not found in a level of the memory
hierarchy is called the miss rate.
MISS RATE = 1 - HIT RATE
MISS PENALTY
It is the time to replace a block in the upper level with the corresponding
block from the lower level, plus the time to deliver this block to
the processor.
AMAT (AVERAGE MEMORY ACCESS TIME)
It is the average time to access memory, considering both hits and misses
and the frequency of the different accesses.
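These quantities are commonly combined as AMAT = hit time + miss rate x miss penalty. The notes do not state this expression, so it is an assumption here, as are the function name `amat` and the example figures.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time."""
    return hit_time + miss_rate * miss_penalty

hit_rate = 0.95                      # assumed example value
miss_rate = 1 - hit_rate             # MISS RATE = 1 - HIT RATE
print(amat(hit_time=1, miss_rate=miss_rate, miss_penalty=100))
```

With a 1-cycle hit time, a 5% miss rate and a 100-cycle miss penalty, the average access takes about 6 cycles, which shows how strongly a small miss rate can dominate the average.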
[Figure: direct mapping; tagged cache blocks 0, 1, ..., 127 shown beside main memory blocks 0, ..., 127, 128, ...]
TAG
A field in a table used for a memory hierarchy that contains the address
information required to identify whether the associated block in the hierarchy
corresponds to a requested word.
ADVANTAGE
It is simple and easy to implement.
DISADVANTAGE
It is not flexible.
Contention may arise for a cache position even when the cache is not full.
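The contention problem can be illustrated by splitting an address into its fields. The sketch assumes a 16-bit address, 16 words per block (4-bit word field) and a cache of 128 blocks (7-bit block field), which leaves a 5-bit tag; these field widths and the function name `split_direct` are assumptions chosen to match the 128-block cache and 16-bit address that appear elsewhere in these notes.

```python
WORD_BITS = 4    # assumed: 16 words per block
BLOCK_BITS = 7   # assumed: 128 cache blocks

def split_direct(addr):
    """Split a 16-bit address into (tag, cache block, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word

# Main memory blocks 0 and 128 start at word addresses 0 and 128 * 16.
print(split_direct(0))          # (0, 0, 0)
print(split_direct(128 * 16))   # (1, 0, 0)
```

Both memory blocks map to cache block 0 and differ only in the tag, so they compete for the same position even if every other cache block is empty.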
4.5.2.1 B. ASSOCIATIVE MAPPING
A cache structure in which a block can be placed in any location in the cache
is called fully associative cache.
It is much more flexible than direct mapping.
Main memory block can be placed into any cache block position.
Fig 4.8 Associative mapping
16-bit main memory address: TAG (12 bits, MSB) | WORD (4 bits, LSB)
[Figure: any main memory block (Block 0, ..., Block i, ...) can be placed in any tagged cache block.]
ADVANTAGE
It gives complete freedom in choosing a cache location to load a
memory block in a cache.
Space in the cache can be used efficiently.
A new block replaces an existing block only if the cache is full.
DISADVANTAGE
Cost of associative mapping cache is higher than the cost of direct mapping
cache.
All the 128 tags are searched to find whether a given block is in the cache
or not.
4.5.2.1 C. SET-ASSOCIATIVE MAPPING
A cache that has a fixed number of locations (at least two) where each
block can be placed is called a set-associative cache.
Set-associative mapping is a combination of the direct and associative mapping
techniques.
Blocks of the cache are grouped into sets. In a set-associative cache, the
set containing a memory block is given by
(Block number) modulo (Number of sets in the cache)
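A minimal sketch of set selection follows, using the usual rule that a memory block goes to set (block number) modulo (number of sets). It assumes 64 sets of 2 blocks each and, purely for brevity, replaces the oldest block within a full set; `set_index` and `SetAssociativeCache` are hypothetical names, and whole block numbers are stored in place of tags to keep the code short.

```python
NUM_SETS = 64   # assumed number of sets
WAYS = 2        # assumed blocks per set

def set_index(block_number, num_sets=NUM_SETS):
    """Set that a main memory block maps to."""
    return block_number % num_sets

class SetAssociativeCache:
    def __init__(self, num_sets=NUM_SETS, ways=WAYS):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]   # oldest block first

    def access(self, block_number):
        """Return True on a hit, False on a miss (loading the block)."""
        blocks = self.sets[set_index(block_number, self.num_sets)]
        if block_number in blocks:
            return True
        if len(blocks) == self.ways:
            blocks.pop(0)            # set is full: drop the oldest block
        blocks.append(block_number)
        return False

cache = SetAssociativeCache()
print([cache.access(b) for b in (0, 64, 0, 128, 0)])
```

Blocks 0, 64 and 128 all map to set 0. Two of them can reside there at once, so the second access to block 0 hits, whereas a direct-mapped cache would already have replaced it.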
[Figure: set-associative mapped cache with two tagged blocks per set (Set 0: blocks 0 and 1; Set 1: blocks 2 and 3; ...) shown beside main memory blocks 0, 1, ..., 63, 64, ...]
PAGE TABLE
[Figure: virtual address translation. The virtual address is interpreted as a virtual page number followed by an offset. PTBR + virtual page number gives the entry of the page in the page table; this entry holds the starting location of the page.]
Page table contains information about the main memory location of each page.
This information includes the main memory address where the page is stored and
current status of the page.
An area in the main memory that holds one page is called page frame.
Starting address of the page table is kept in a page table base register.
The content of page table base register is added with the virtual page number
to get the corresponding entry in the page table.
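The lookup described above can be sketched in a few lines. The sketch assumes a 4096-byte page, one memory location per page table entry, and a dictionary standing in for main memory; `translate`, `frame_start` and the addresses used are illustrative assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def translate(virtual_addr, memory, ptbr, page_size=PAGE_SIZE):
    """Translate a virtual address using a page table stored in memory."""
    vpn, offset = divmod(virtual_addr, page_size)   # virtual page number, offset
    entry = memory[ptbr + vpn]                      # PTBR + page number selects the entry
    return entry["frame_start"] + offset            # start of the page frame + offset

# Hypothetical page table at address 1000 with entries for pages 0 and 1.
memory = {
    1000: {"frame_start": 0x8000},
    1001: {"frame_start": 0x3000},
}
print(hex(translate(4096 + 20, memory, ptbr=1000)))
```

Virtual address 4116 lies in virtual page 1 at offset 20, so it is translated to an address 20 bytes into the frame that starts at 0x3000.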
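Each entry in the page table also includes some control bits.
Two important control bits are the valid bit, which indicates whether the page is present in the main memory, and the modified (dirty) bit, which indicates whether the page has been changed while in the main memory.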
[Figure: TLB lookup. The virtual page number is compared (=?) with the TLB entries; a match is a hit and no match is a miss.]
Each TLB entry contains the virtual address of the page in addition to the page table entry.
The following steps are used in the address translation process with a TLB:
The processor generates the virtual address.
The MMU looks in the TLB for the referenced page.
If the page table entry for this page is found in the TLB, the physical address is
obtained immediately.
If the page table entry is not found, the corresponding entry is obtained
from the page table in the main memory.
The TLB is then updated accordingly.
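The steps above can be sketched as a small class. It is an illustration only: the TLB is modelled as a tiny dictionary that discards its oldest entry when full, the page table maps virtual page numbers to frame numbers, and names such as `MMU`, `tlb_size` and the 4096-byte page size are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

class MMU:
    def __init__(self, page_table, tlb_size=2):
        self.page_table = page_table   # virtual page number -> frame number
        self.tlb = {}                  # small cache of recent page table entries
        self.tlb_size = tlb_size
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, virtual_addr):
        vpn, offset = divmod(virtual_addr, PAGE_SIZE)
        if vpn in self.tlb:                        # entry found in the TLB
            self.tlb_hits += 1
            frame = self.tlb[vpn]
        else:                                      # fall back to the page table
            self.tlb_misses += 1
            frame = self.page_table[vpn]           # a KeyError here stands in for a page fault
            if len(self.tlb) >= self.tlb_size:
                del self.tlb[next(iter(self.tlb))] # drop the oldest TLB entry
            self.tlb[vpn] = frame                  # update the TLB
        return frame * PAGE_SIZE + offset

mmu = MMU({0: 5, 1: 9, 2: 7})
print(mmu.translate(10), mmu.translate(20), mmu.translate(4096))
print(mmu.tlb_hits, mmu.tlb_misses)
```

The second access falls in the same page as the first, so it is served from the TLB without touching the page table in main memory.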
Page fault
Page replacement.
Write operation.
4.6.3.1 PAGE FAULT
When a program generates an address request to a page that is not available
in the main memory, a page fault occurs.
If the valid bit for a virtual page is off, an access to that page causes a page fault. When a page fault
occurs, control is given to the operating system.
When the MMU detects a page fault, it asks the OS to handle the situation
by raising an exception (interrupt). The operating system copies the
contents of the page from the disk into the main memory and transfers
control back to the interrupted task.
4.6.3.2. PAGE REPLACEMENT
When a new page is brought from the disk and the main memory is full, a
page must be removed from the main memory to make room for it.
An LRU (least recently used) replacement algorithm can be used for page replacement.
4.6.3.3 WRITE OPERATION
A modified page has to be written back to the disk before it is removed from
the main memory.
Because the access time of the disk is so long, only the write-back protocol is suitable for
virtual memories; the write-through protocol is not suitable.
[Figure: the CPU, main memory and IO ports 1 to 3 are connected by the data bus, the address bus and the READ and WRITE control lines; IO devices A and B are attached to the IO ports.]
Fig 4.17 Programmed IO with IO-mapped IO
4.7.2 IO INSTRUCTIONS
Two IO instructions are used to implement programmed IO. The Intel 80x86 series
of microprocessors has two IO instructions called IN and OUT.
IN: The instruction IN X causes a word to be transferred from IO port X to
the accumulator register A.
OUT: The instruction OUT X transfers a word from the accumulator register A
to the port X.
4.7.3 IO DATA TRANSFER
When the CPU executes an IO instruction such as IN or OUT, the addressed IO port
must be ready to respond to the instruction.
In programmed IO, the CPU can be programmed to test the IO device’s status before
initiating the IO data transfer. The status is specified by a single bit of
information.
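Testing the status bit before each transfer is often called polling, and it can be sketched as a loop. The `Port` class below is a hypothetical stand-in for an IO port: its status bit becomes 1 on every third status read while data remains, and `read_data` plays the role of an IN instruction. Neither the class nor `read_word` is part of any real instruction set.

```python
class Port:
    """Hypothetical IO port with a one-bit status flag and a data register."""

    def __init__(self, incoming, ready_every=3):
        self._incoming = list(incoming)
        self._ready_every = ready_every
        self._reads = 0

    def status(self):
        # The device reports "ready" on every `ready_every`-th status read.
        self._reads += 1
        return bool(self._incoming) and self._reads % self._ready_every == 0

    def read_data(self):
        return self._incoming.pop(0)       # stands in for IN X

def read_word(port, max_polls=1000):
    """Poll the status bit, then transfer one word. Returns (word, polls)."""
    for polls in range(1, max_polls + 1):
        if port.status():                  # test the status bit first
            return port.read_data(), polls # transfer only when the port is ready
    raise TimeoutError("device never became ready")

port = Port([0x41, 0x42])
print(read_word(port), read_word(port))
```

Every unsuccessful poll is CPU time spent waiting, which is the overhead that interrupt-driven IO and DMA aim to reduce.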
The CPU must perform the following steps to determine the status of an IO device
iii. Data Register (IODR): used to store the data to be transferred. When the data
transfer is over, the DMA controller sends an interrupt signal to the CPU to notify it of the
end of the IO data transfer.
[Figure: main memory and the IO devices connected through the data, address and control lines.]
Fig 4.18 Direct memory access hardware
Modes of Operation:
DMA works in different modes, which differ in the degree of overlap between
CPU and DMA operations. They are:
i. Block transfer
ii. Cycle stealing
iii. Transparent DMA
-> Block transfer:
In block transfer mode, a block of data of arbitrary length is transferred in a
single burst. It is also called burst transfer.
During this data transfer, the DMA controller is the master of the memory bus. This mode is
useful for secondary memories such as disk drives.
Disadvantage: it makes the CPU inactive for a long period.
-> Cycle stealing:
In cycle stealing, the DMA controller uses the system bus to transfer one data word, after which it
returns control of the bus to the CPU.
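The two modes described above can be compared with a toy bus timeline. The sketch assumes one bus cycle per word and that, in cycle stealing, the CPU receives exactly one cycle after each DMA word; both are simplifications, and `bus_schedule` and `longest_cpu_stall` are hypothetical helper names.

```python
def bus_schedule(words, mode):
    """Return the bus owner for each cycle while `words` words are transferred."""
    if mode == "block":
        return ["DMA"] * words             # one uninterrupted burst
    if mode == "cycle-stealing":
        schedule = []
        for _ in range(words):
            schedule += ["DMA", "CPU"]     # one word, then the bus goes back to the CPU
        return schedule
    raise ValueError(mode)

def longest_cpu_stall(schedule):
    """Longest run of consecutive cycles in which the CPU does not own the bus."""
    longest = run = 0
    for owner in schedule:
        run = run + 1 if owner == "DMA" else 0
        longest = max(longest, run)
    return longest

for mode in ("block", "cycle-stealing"):
    schedule = bus_schedule(4, mode)
    print(mode, schedule, longest_cpu_stall(schedule))
```

Block transfer finishes in fewer cycles but keeps the CPU off the bus for the whole burst, while cycle stealing takes longer overall and never stalls the CPU for more than one cycle at a time.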
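[Figure: DMA transfer over the bus between the CPU, the DMA controller (address, count and control registers), the disk controller (drive and buffer) and main memory; the disk controller acknowledges the transfer (step 4, Ack) and the DMA controller interrupts the CPU when done.]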
[Figure: flowchart of a DMA transfer. Boxes in order: Start; request bus; bus free? (No / Yes); activate address lines; perform data transfer; release bus; total data bytes transferred? (No / Yes); notify CPU of completion. Side label: the CPU performs other activities.]
BG (Bus Grant): the bus grant signal is propagated through all of the devices
on this line.
BBSY (Bus Busy): the current bus master indicates to all devices that it is using
the bus by activating this open-collector line, called bus busy.
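How a grant signal that is passed from device to device selects a bus master can be sketched as follows. The sketch assumes a daisy-chain arrangement in which the first requesting device along the BG line keeps the grant and does not pass it on; the function name and the list-based representation of the devices are assumptions.

```python
def daisy_chain_grant(requests, bus_busy=False):
    """Return the position of the device that becomes bus master, or None.

    `requests` lists, from the device nearest the arbiter to the farthest,
    whether each device is currently requesting the bus.
    """
    if bus_busy or not any(requests):
        return None                 # BBSY active or nobody is asking: no grant issued
    for position, requesting in enumerate(requests):
        if requesting:
            return position         # this device keeps BG and blocks it for the rest
    return None

print(daisy_chain_grant([False, True, True]))        # 1
print(daisy_chain_grant([False, True, True], True))  # None
```

In such a chain the position of a device fixes its priority: the device nearest the arbiter always wins when several request at once.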
[Figure: bus arbitration; the processor acts as the arbiter and is connected to DMA controller 1 and DMA controller 2 through the address lines and the BBSY and BR lines.]
Advantages –
This method generates a fast response.
Disadvantages –
Hardware cost is high, as a large number of control lines are required.
The decentralized arbitration offers high reliability because operation of the bus
is not dependent on any single device.
[Figure: distributed arbitration using the arbitration lines ARB3, ARB2, ARB1 and ARB0 and a Start-Arbitration line.]
Non-Maskable Interrupt: a hardware interrupt which cannot be delayed and must be processed
by the processor immediately.
o Software Interrupts: Software interrupts can also be divided into two types.
Normal Interrupts: the interrupts which are caused by software instructions are
called normal interrupts.
Exception: an unplanned interrupt that occurs while executing a program is called an exception. For
example, if a value is divided by zero while executing a program, an exception
is raised.
5. The PC and other processor registers are saved on the stack, as in a subroutine call.
6. The PC is loaded with the address of the interrupt handler. Execution proceeds until
a return instruction is reached. Control is then transferred back to the interrupted
program.
Interrupt Selection
o Interrupt selection is similar to bus arbitration process.
1. Single Line Interrupt system
2. Multi - Line Interrupt system
3. Vectored interrupt
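[Figure: single-line interrupt system; the IO devices share one INTERRUPT REQUEST line that sets the interrupt flip-flop of the CPU.]
[Figure: multi-line interrupt system; the IO devices reach the CPU through IO ports 0 to 3.]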
3. Vectored interrupt
In a vectored interrupt, the interrupting device must supply the CPU with the starting
address, or interrupt vector, of the interrupt handler program. This method is the most flexible in handling
interrupts.
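A vectored interrupt can be sketched as a table lookup followed by a call. The sketch models the handler table as a dictionary from vector addresses to Python functions and the stack as a list; the vector addresses and the names `service_interrupt` and `handler_table` are illustrative assumptions.

```python
def service_interrupt(vector_address, handler_table, pc, stack):
    """Run the handler selected by the device-supplied vector and return the restored PC."""
    stack.append(pc)                          # save the return address, as in a subroutine call
    handler = handler_table[vector_address]   # vector supplied by the interrupting device
    handler()                                 # handler runs until it returns
    return stack.pop()                        # control goes back to the interrupted program

log = []
handler_table = {
    0x20: lambda: log.append("keyboard handler"),
    0x24: lambda: log.append("disk handler"),
}
stack = []
resumed_pc = service_interrupt(0x24, handler_table, pc=1234, stack=stack)
print(log, resumed_pc)
```

Because the device itself names its handler, the CPU does not have to poll the devices to find out which one raised the interrupt.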
[Figure: vectored interrupts; a priority control circuit between the IO devices and the CPU, with the interrupt handler address sent over the data bus.]
4.8.2 IO PROCESSORS (IOP)
The DMA mode of data transfer reduces CPU’s overhead in handling I/O
operations.
It also allows parallelism in CPU and I/O operations. Such parallelism is necessary
to avoid wastage of valuable CPU time while handling I/O devices whose speeds
are much slower as compared to CPU.
The concept of DMA operation can be extended to relieve the CPU further from
getting involved with the execution of I/O operations. This gives rise to
the development of a special-purpose processor called the Input-Output Processor
(IOP) or IO channel.
The I/O processor is capable of performing actions without interruption or
intervention from the CPU. The CPU only needs to initiate the I/O processor by
telling it what activity to perform.
Once the necessary actions are performed, the I/O processor provides the
results to the CPU. This allows the I/O processor to act as a bus to
the CPU, like a CPU bus, carrying out activities by interacting directly
with memory and other devices in the computer. A more advanced I/O processor
may also have memory built into it, allowing it to perform actions and activities
more quickly.
For example, without an I/O processor, a computer would require the CPU to
perform all actions and activities, reducing overall computer performance. However, a
computer with an I/O processor allows the CPU to send some activities to the I/O
processor. While the I/O processor is performing the necessary actions for those
activities, the CPU is free to carry out other activities, making the computer more
efficient and increasing performance.
Advantages –
In I/O processor based systems, the I/O devices can directly access the main memory without intervention by
the processor.
It is used to address the problems that arise in the direct memory access method.
IO instructions:
When an IOP is used, the CPU executes only a few instructions that initiate
and terminate the execution of IO programs via the IOP.
o IO instructions executed by the CPU
START IO - initiates the IO operation.
HALT IO - causes the IOP to terminate IO program
execution.
TEST IO - used to determine the status of the
named IO device and IOP.
o IO instructions executed by the IOP
DATA TRANSFER INSTRUCTIONS - read, write, sense.
BRANCH INSTRUCTION - next CCW.
IO DEVICE CONTROL INSTRUCTIONS - rewind, seek address, print page.
IO Organization
o In the IOP organization, the CPU and the IOP share access to a
common memory via the system bus.
o There are two kinds of programs:
CPU program
These programs are executed by the CPU.
IOP program
These programs are executed by the IOP.
Both kinds of programs are stored in memory. The memory also contains a communication
region called IOCR, which is used for passing information between the two processors in the form
of messages.
Computer containing an IOP: (a) system organization and (b) CPU-IOP interaction
IOP Organization
CPU – IOP Interaction:-
There are four phases in the CPU-IOP interaction.
They are:
WAIT
o The IOP checks the ATTENTION line; if it is 1, the IOP fetches parameters from
IOCR.
SETUP
o DMA control registers are set up by CPU. The CPU begins IO program
execution by sending commands to the IO device.
SEND
o Data is transmitted between the IO device and memory by the IOP without the
intervention of the CPU. If any error occurs during transmission, control is
transferred to the EXIT phase.
EXIT
o The IOP places the termination status in IOCR and goes to the WAIT phase.
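The four phases can be traced with a small simulation. It is a sketch under stated assumptions: IOCR is modelled as a dictionary with hypothetical `buffer_address` and `status` fields, memory as a dictionary, and a transmission error is injected with the made-up `fail_at` parameter.

```python
def iop_cycle(attention, iocr, device_words, memory, fail_at=None):
    """Run one pass through the WAIT, SETUP, SEND and EXIT phases.

    Returns the list of phases visited.
    """
    phases = ["WAIT"]
    if not attention:
        return phases                        # ATTENTION is 0: remain in WAIT
    start = iocr["buffer_address"]           # parameters fetched from IOCR
    phases.append("SETUP")                   # control registers are prepared here
    phases.append("SEND")
    status = "ok"
    for i, word in enumerate(device_words):
        if i == fail_at:                     # simulated transmission error
            status = "error"
            break                            # an error sends control to EXIT
        memory[start + i] = word             # device-to-memory transfer
    phases.append("EXIT")
    iocr["status"] = status                  # termination status placed in IOCR
    phases.append("WAIT")
    return phases

iocr = {"buffer_address": 100}
memory = {}
print(iop_cycle(True, iocr, [1, 2, 3], memory), memory, iocr["status"])
```

The CPU appears only at the edges of this cycle: it raises ATTENTION at the start and reads the status from IOCR at the end.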
o Computer mice and some of the controls and manipulators used with video
games are good examples.
o Consider now a different source of data. Many computers have a microphone,
either externally attached or built in. The sound picked up by the
microphone produces an analog electrical signal, which must be converted
into a digital form before it can be handled by the computer. This is
accomplished by sampling the analog signal periodically. For each
sample, an analog-to-digital (A/D) converter generates an n-bit number
representing the magnitude of the sample. The number of bits, n, is selected
based on the desired precision with which to represent each sample. Later, when
these data are sent to a speaker, a digital-to-analog (D/A) converter is used to
restore the original analog signal from the digital format. A similar approach is
used with video information from a camera.
o The sampling process yields a continuous stream of digitized samples that
arrive at regular intervals, synchronized with the sampling clock. Such a data
stream is called isochronous, meaning that successive events are separated by
equal periods of time.
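Periodic sampling and n-bit conversion can be sketched as follows. The sketch assumes an input signal in the range -1 to +1 and a simple linear mapping onto the 2^n output codes; the function name, the 8 kHz sampling rate and the 440 Hz test tone are illustrative assumptions.

```python
import math

def sample_and_quantize(signal, sample_rate, duration, n_bits):
    """Sample `signal(t)` (values in [-1, 1]) and convert each sample to an n-bit code."""
    levels = 2 ** n_bits
    count = int(sample_rate * duration)
    codes = []
    for k in range(count):
        t = k / sample_rate                       # samples are equally spaced in time
        x = max(-1.0, min(1.0, signal(t)))        # clamp to the converter's input range
        codes.append(round((x + 1) / 2 * (levels - 1)))
    return codes

def tone(t):
    return math.sin(2 * math.pi * 440 * t)        # hypothetical 440 Hz test tone

codes = sample_and_quantize(tone, sample_rate=8000, duration=0.001, n_bits=8)
print(len(codes), min(codes), max(codes))
```

Each extra bit doubles the number of levels, so a larger n gives finer precision at the cost of more data per sample, while the fixed spacing of 1 / sample_rate between samples is what makes the stream isochronous.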
Electrical Characteristics:-
USB connections consist of four wires, of which two carry power, +5 V and Ground,
and two carry data.
Two methods are used to send data over a USB cable.
When sending data at low speed, a high voltage relative to Ground is transmitted
on one of the two data wires to represent a 0 and on the other to represent a 1.
The Ground wire carries the return current in both cases. Such a scheme, in which
a signal is injected on a wire relative to ground, is referred to as single-ended
transmission.
The High-Speed USB uses an alternative arrangement known as differential signalling. The
data signal is injected between two data wires twisted together. The ground wire is not
involved. The receiver senses the voltage difference between the two signal wires directly,
without reference to ground. The ground wire acts as a shield for the data on the twisted pair
against interference from nearby wires. Differential signalling allows much lower voltages
and much higher speeds to be used compared to single-ended signalling.
Endpoints
In USB, the information flows between host and device. The endpoints are
source or sink of information in a communication channel.
These are blocks of memory in a controller chip containing buffers for
transmission and reception.
The two endpoints can have same endpoint number but different directions.
When the device is plugged in, only the default endpoint 0 is accessible. This
endpoint receives control and status request from the host during enumeration
process.
The other endpoints are declared as per requirement after configuration of the device.
Pipes
A pipe is a logical data connection between host controller’s software and
device endpoint.
The information is exchanged through this pipe. It is created during
enumeration process. When the device is unplugged, unneeded pipes are
removed.
There are two types of pipes:
Message pipes: These are bi-directional pipes which follow defined
packet format. They are controlled by host and only support control
transfer.
Stream pipes: These are unidirectional pipes which don’t follow any specific
data format. They can be controlled by host or device (peripheral) and
support bulk, isochronous and interrupt types of transfer.
The Default Control pipe is a special type of message pipe which is
bidirectional and supports the control transfer type. It uses endpoint 0-IN and
endpoint 0-OUT.
This pipe can be accessed as soon as a device is plugged in.
Transfer Types
There are four types of transfer modes or types which can be used for communication:
• Control Transfer: This transfer type is used to transfer the control information
while identifying and configuring the device.
• Bulk Transfer: In this transfer type, large amount of data is moved and time is not
a critical factor here. It can be used as fillers.
• Interrupt Transfer: This type of transfer is for small data transmission with
immediate attention.
• Isochronous Transfer: It is a transfer in which time is a critical factor. The
transfer has to be completed in a given time.
Transaction
A single transaction contains transmission of up to three packets. These packets are:
• Token Packet: This packet is always sent by Host
• Data Packet: This packet can be sent by Host or Device.
• Handshake Packet: This packet provides success/failure information for the data
packet received. If Host transmits the data packet, the device sends the handshake packet
and vice versa.
Handshaking
Handshaking is a mechanism to check the success/failure of a request or to check the
delivery of a packet. It is done to avoid loss of packets and to ensure successful
transmission. Terms related to handshaking:
ACK – acknowledgment that data was received (success)
NAK – negative acknowledgment, meaning no data
STALL – request not supported or endpoint halted
NYET – not yet; for example, when the device is busy and not ready for the next data packet
ERR – split transaction error
Advantages of USB
• Ease of Use: USB was for obvious reasons designed to be a simplified interface. The
simplicity of the interface lies with following aspects: