Computer Organization Notes
TYPES OF COMPUTERS
Desktop Computers
• These are the most commonly used computers in homes, schools and offices.
• A desktop computer has
→ processing- & storage-units
→ video & audio output-units
→ keyboard & mouse input-units.
Notebook Computers (Laptops)
• This is a compact version of a personal-computer (PC) made as a portable-unit.
Workstations
• These have more computational-power than PCs.
Enterprise Systems (Mainframes)
• These are used for business data-processing.
• These have large computational-power and larger storage-capacity than workstations.
• These are referred to as
→ server at low-end and
→ Super-computers at high end.
Servers
• These have large database storage-units and can also execute requests from other
computers.
• These are used in banks & educational institutions.
Super Computers
• These are used for very complex numerical-calculations.
• These are used in weather forecasting, aircraft design and military applications.
FUNCTIONAL UNITS
• A computer consists of 5 functionally independent main parts: 1) input,
2) memory, 3) arithmetic & logic, 4) output and 5) control units.
Input Unit
• The computer accepts the information in the form of program & data through an input-
device.
Eg: keyboard
• Whenever a key is pressed, the corresponding letter/digit is automatically translated into its
corresponding binary-code and transmitted over a cable to either the memory or the
processor.
Memory Unit
• This unit is used to store programs & data.
• There are 2 classes of storage:
1) Primary-storage is a fast-memory that operates at electronic-speed. Programs must
be stored in the memory while they are being executed.
2) Secondary-storage is used when large amounts of data & many programs have to
be stored. Eg: magnetic disks and optical disks(CD-ROMs).
• The memory contains a large number of semiconductor storage cells(i.e. flip-flops), each
capable of storing one bit of information.
• The memory is organized so that the contents of one word can be stored or retrieved in one
basic operation.
Output Unit
• This unit is used to send processed-results to the outside world.
Eg: printer, graphic displays etc.
Control Unit
• This unit is used for controlling the activities of the other units (such as memory, I/O
device).
• This unit sends control-signals (read/write) to other units and senses their states.
• Data transfers between processor and memory are also controlled by the control-unit
through timing-signals.
• Timing-signals are signals that determine when a given action is to take place.
PROCESSOR CLOCK
• Processor circuits are controlled by a timing signal called a clock.
• The clock defines regular time intervals called clock cycles.
• To execute a machine instruction, the processor divides the action to be performed into a
sequence of basic steps such that each step can be completed in one clock cycle.
• Let P = length of one clock cycle and R = clock rate. The relation between P and R is given
by R = 1/P, which is measured in cycles per second.
• Cycles per second is also called hertz (Hz).
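• As an illustrative worked example (values not from the original notes): if P = 0.5 ns, then
R = 1/P = 2×10^9 cycles per second, i.e. a clock rate of 2 GHz.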
• As shown in the above figure, 6 clock cycles are required to complete two operations.
• As shown in the above figure, if we use pipelining & prefetching, only 4 cycles are required
to complete the same two operations.
• While executing the Add instruction, the processor can read the Move instruction from
memory.
• In the ideal case, if all instructions are overlapped to the maximum degree possible,
execution proceeds at the rate of one instruction completed in each clock cycle.
• A higher degree of concurrency can be achieved if multiple instruction pipelines are
implemented in the processor i.e. multiple functional units can be used to execute different
instructions in parallel. This mode of operation is known as superscalar execution.
• With Superscalar arrangement, it is possible to complete the execution of more than one
instruction in every clock cycle.
• A SPEC rating of 50 means that the computer under test is 50 times as fast as the reference
computer.
• The test is repeated for all the programs in the SPEC suite, and the geometric mean of the
results is computed.
Let SPECi be the rating for program i in the suite. The overall SPEC rating for the computer
is given by
SPEC rating = (SPEC1 × SPEC2 × ... × SPECn)^(1/n)
where n is the number of programs in the suite.
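• The geometric mean above can be computed as in the following C sketch (illustrative only;
the per-program ratings are made-up values):

    #include <math.h>
    #include <stdio.h>

    /* Overall SPEC rating = (SPEC1 * SPEC2 * ... * SPECn)^(1/n). */
    double spec_rating(const double *spec, int n) {
        double log_sum = 0.0;
        for (int i = 0; i < n; i++)
            log_sum += log(spec[i]);   /* summing logs avoids overflow */
        return exp(log_sum / n);
    }

    int main(void) {
        double spec[] = {40.0, 50.0, 62.5};   /* hypothetical ratings */
        printf("Overall SPEC rating = %.2f\n", spec_rating(spec, 3));
        return 0;
    }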
BYTE ADDRESSABILITY
• In byte addressable memory, successive addresses refer to successive byte locations in the
memory.
• Byte locations have addresses 0, 1, 2, ...
• If the word length is 32 bits, successive words are located at addresses 0, 4, 8, ..., with each
word having 4 bytes.
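• A minimal C sketch (illustrative, not from the notes) that prints the byte offsets of
successive 32-bit words, showing the 0, 4, 8, ... pattern:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t words[3];          /* three successive 32-bit words */
        for (int i = 0; i < 3; i++)
            /* each word begins 4 byte-addresses after the previous one */
            printf("word %d starts at byte offset %zu\n", i,
                   (size_t)((char *)&words[i] - (char *)&words[0]));
        return 0;
    }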
WORD ALIGNMENT
• Words are said to be aligned in memory if they begin at a byte address that is a multiple of
the number of bytes in a word.
• For example, if the word length is 16 bits (2 bytes), aligned words begin at byte addresses
0, 2, 4, ... and for a word length of 64 bits, aligned words begin at byte addresses 0, 8, 16, ...
• Words are said to have unaligned addresses, if they begin at an arbitrary byte address.
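• The alignment test described above is a simple modulo check, sketched here in C:

    #include <stdbool.h>
    #include <stdint.h>

    /* True if addr begins at a multiple of the word size in bytes
       (word_bytes = 2, 4 or 8 for 16-, 32- or 64-bit words). */
    bool is_aligned(uintptr_t addr, unsigned word_bytes) {
        return addr % word_bytes == 0;
    }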
MEMORY OPERATIONS
• Two basic operations involving the memory are: Load(Read/Fetch) and Store(Write).
• The Load operation transfers a copy of the contents of a specific memory location to the
processor. The memory contents remain unchanged.
• The steps for Load operation:
1) Processor sends the address of the desired location to the memory
2) Processor issues a 'read' signal to memory to fetch the data
3) Memory reads the data stored at that address
4) Memory sends the read data to the processor
• The Store operation transfers the information from the processor register to the specified
memory location. This will destroy the original contents of that memory location.
• The steps for Store operation are:
1) Processor sends the address of the memory location where it wants to store data
2) Processor issues a 'write' signal to memory to store the data
3) Content of register(MDR) is written into the specified memory location.
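• The two operations can be modelled with a toy memory array in C (an illustrative sketch;
real hardware performs the signal-level steps listed above):

    #include <stdint.h>

    static uint32_t memory[1024];    /* toy main memory of 1024 words */

    /* Load: processor supplies the address, memory returns the data;
       the memory contents remain unchanged. */
    uint32_t load(uint32_t addr) {
        return memory[addr];
    }

    /* Store: processor supplies address and data; the original contents
       of that location are overwritten. */
    void store(uint32_t addr, uint32_t data) {
        memory[addr] = data;
    }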
INSTRUCTIONS & INSTRUCTION SEQUENCING
• A computer must have instructions capable of performing 4 types of operations:
1) Data transfers between the memory and the processor registers (MOV, PUSH,
POP, XCHG),
2) Arithmetic and logic operations on data (ADD, SUB, MUL, DIV, AND, OR,
NOT),
3) Program sequencing and control (CALL, RET, LOOP, INT),
4) I/0 transfers (IN, OUT)
REGISTER TRANSFER NOTATION (RTN)
• We identify a memory location by a symbolic name (in uppercase letters).
For example, LOC, PLACE, NUM etc. indicate memory locations; R0, R5 etc. indicate
processor registers; DATAIN, OUTSTATUS etc. indicate I/O registers.
• For example,
R1←[LOC] means that the contents of memory location LOC are transferred into
processor register R1. (The contents of a location are denoted by placing square
brackets around the name of the location.)
R3←[R1]+[R2] indicates the operation that adds the contents of registers R1 and R2,
and then places their sum into register R3.
• This type of notation is known as RTN(Register Transfer Notation).
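• For readers more familiar with a high-level language, the two RTN examples above
correspond roughly to the following C statements (an informal analogy; mem stands for the
memory array and r1, r2, r3 for registers):

    #include <stdint.h>

    #define LOC 100                  /* symbolic memory address */
    static uint32_t mem[1024];       /* toy memory */

    void rtn_examples(void) {
        uint32_t r1, r2 = 7, r3;
        r1 = mem[LOC];               /* R1 <- [LOC] */
        r3 = r1 + r2;                /* R3 <- [R1] + [R2] */
        (void)r3;                    /* silence unused-variable warning */
    }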
CONDITION CODES
• The processor keeps track of information about the results of various operations. This is
accomplished by recording the required information in individual bits, called condition code
flags.
• These flags are grouped together in a special processor-register called the condition code
register (or status register).
ADDRESSING MODES
• The different ways in which the location of an operand is specified in an instruction are
referred to as addressing modes (Table 2.1).
• Clearly, the immediate mode is only used to specify the value of a source-operand.
INDIRECTION AND POINTERS
• In this case, the instruction does not give the operand or its address explicitly; instead, it
provides information from which the memory-address of the operand can be determined. We
refer to this address as the effective address(EA) of the operand.
Indirect Mode
• The EA of the operand is the contents of a register(or memory-location) whose address
appears in the instruction.
• The register (or memory-location) that contains the address of an operand is called a
pointer. {The indirection is denoted by ( ) sign around the register or memory-location}.
E.g.: Add (R1),R0 ;The operand is in memory. Register R1 gives the effective-
address (B) of the operand. The data is read from location B and added to the
contents of register R0.
* To execute the Add instruction in fig (a), the processor uses the value which is in register
R1, as the EA of the operand.
* It requests a read operation from the memory to read the contents of location B. The value
read is the desired operand, which the processor adds to the contents of register R0.
* Indirect addressing through a memory location is also possible as shown in fig (b). In this
case, the processor first reads the contents of memory location A, then requests a second read
operation using the value B as an address to obtain the operand
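• The same idea is exactly what a C pointer expresses: a variable holds an address, and
dereferencing it fetches the operand. A minimal sketch:

    #include <stdint.h>

    uint32_t add_indirect(const uint32_t *r1, uint32_t r0) {
        /* r1 plays the role of the pointer register: it holds the
           effective address B; *r1 reads the operand stored at B. */
        return r0 + *r1;             /* equivalent of: Add (R1),R0 */
    }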
RELATIVE MODE
• This is similar to index-mode with an exception: The effective address is determined using
the PC in place of the general purpose register Ri.
• The operation is indicated as X(PC).
ASSEMBLY LANGUAGE
• A complete set of symbolic names and rules for their use constitute an assembly language.
• The set of rules for using the mnemonics in the specification of complete instructions and
programs is called the syntax of the language.
• Programs written in an assembly language can be automatically translated into a sequence
of machine instructions by a program called an assembler.
• The user program in its original alphanumeric text format is called a source program, and
the assembled machine language program is called an object program.
• The Move instruction is written as
MOVE R0,SUM ;The mnemonic MOVE represents the binary
pattern, or OP code, for the operation performed by
the instruction.
• The instruction
ADD #5,R3 ;Adds the number 5 to the contents of register R3 and puts
the result back into register R3.
MEMORY-MAPPED I/O
• Some address values are used to refer to peripheral device buffer-registers such as DATAIN
and DATAOUT.
• No special instructions are needed to access the contents of the registers; data can be
transferred between these registers and the processor using instructions such as Move, Load
or Store.
• For example, contents of the keyboard character buffer DATAIN can be
transferred to register R1 in the processor by the instruction
MoveByte DATAIN,R1
• The MoveByte operation code signifies that the operand size is a byte.
• The Testbit instruction tests the state of one bit in the destination, where the bit
position to be tested is indicated by the first operand.
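• In C, memory-mapped device registers are usually accessed through volatile pointers at
fixed addresses. A hedged sketch (the address 0x4000 is made up for illustration):

    #include <stdint.h>

    /* Hypothetical address of the keyboard character buffer DATAIN. */
    #define DATAIN_ADDR 0x4000u

    /* volatile tells the compiler every access really touches the device. */
    #define DATAIN (*(volatile uint8_t *)DATAIN_ADDR)

    uint8_t read_keyboard(void) {
        return DATAIN;               /* counterpart of: MoveByte DATAIN,R1 */
    }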
STACKS
• A stack is a list of data elements with the accessing restriction that elements can be added
or removed at one end of the list only. This end is called the top of the stack, and the other
end is called the bottom (Figure: 2.21).
• The terms push and pop are used to describe placing a new item on the stack and removing
the top item from the stack, respectively.
• A processor-register is used to keep track of the address of the element of the stack that is
at the top at any given time. This register is called the SP (Stack Pointer).
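• A minimal C sketch of push and pop using an array and a stack-pointer index (illustrative;
here the stack grows toward lower addresses, a common convention):

    #include <stdint.h>

    #define STACK_SIZE 64

    static uint32_t stack[STACK_SIZE];
    static int sp = STACK_SIZE;      /* SP starts past the bottom of the stack */

    void push(uint32_t item) {
        stack[--sp] = item;          /* decrement SP, then store the new top */
    }

    uint32_t pop(void) {
        return stack[sp++];          /* read the top item, then increment SP */
    }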
SUBROUTINES
• A subtask consisting of a set of instructions which is executed many times is called a
subroutine.
• The program branches to a subroutine with a Call instruction (Figure: 2.24).
• Once the subroutine is executed, the calling-program must resume execution starting from
the instruction immediately following the Call instruction i.e. control is to be transferred
back to the calling-program. This is done by executing a Return instruction at the end of the
subroutine.
• The way in which a computer makes it possible to call and return from subroutines is
referred to as its subroutine linkage method.
• The simplest subroutine linkage method is to save the return-address in a specific location,
which may be a register dedicated to this function. Such a register is called the link register.
• When the subroutine completes its task, the Return instruction returns to the calling-
program by branching indirectly through the link-register.
• The Call instruction is a special branch instruction that performs the following operations:
→ Store the contents of PC into link-register.
→ Branch to the target-address specified by the instruction.
• The Return instruction is a special branch instruction that performs the operation:
→ Branch to the address contained in the link-register.
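• A toy simulation in C of the link-register mechanism described above (purely illustrative;
pc, link and target are hypothetical names):

    #include <stdint.h>

    static uint32_t pc;              /* program counter */
    static uint32_t link;            /* link register */

    /* Call: save the return address, then branch to the subroutine. */
    void do_call(uint32_t target) {
        link = pc;                   /* store contents of PC into link register */
        pc = target;                 /* branch to the target address */
    }

    /* Return: branch back through the link register. */
    void do_return(void) {
        pc = link;
    }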
STACK FRAME
• Stack frame refers to locations that constitute a private work-space for the subroutine
(Figure:2.26).
• The work-space is
→ created at the time the subroutine is entered &
→ freed up when the subroutine returns control to the calling-program.
• Following is a program for adding a list of numbers using subroutine with the parameters
passed to stack
Multiply Ri,Rj ; Rj ← [Ri] * [Rj]
Division Ri,Rj ; Rj ← [Ri] / [Rj]
INTERRUPTS
• An I/O device can initiate an action instead of the processor. This is done by sending a special hardware signal
to the processor, called an interrupt (INTR), on the interrupt-request line.
• The processor can be performing its own task without the need to continuously check the I/O device.
• When device gets ready, it will "alert" the processor by sending an interrupt-signal (Figure 4.5).
• The routine executed in response to an interrupt-request is called ISR(Interrupt Service Routine).
• Once the interrupt-request signal comes from the device, the processor has to inform the device that its request
has been recognized and will be serviced soon. This is indicated by a special control signal on the bus called
interrupt-acknowledge(INTA).
Difference between subroutine & ISR
• A subroutine performs a function required by the program from which it is called.
However, the ISR may not have anything in common with the program being executed at the
time the interrupt-request is received. Before starting execution of ISR, any information that
may be altered during the execution of that routine must be saved. This information must be
restored before the interrupted-program is resumed.
• Another difference is that an interrupt is a mechanism for coordinating I/O transfers, whereas
a subroutine is just a linkage of 2 or more functions related to each other.
• The speed of operation of the processor and I/O devices differ greatly. Also, since I/O devices are manually
operated in many cases (like pressing a key on keyboard), there may not be synchronization between the CPU
operations and I/O operations with reference to CPU clock. To cater to the different needs of I/O operations, 3
mechanisms have been developed for interfacing I/O devices. 1) Program controlled I/O 2) Interrupt I/O 3) Direct
memory access (DMA).
• Saving registers increases the delay between the time an interrupt request is received and the start of execution
of the ISR. This delay is called interrupt latency.
• Since interrupts can arrive at any time, they may alter the sequence of events. Hence, facility must be provided
to enable and disable interrupts as desired.
• Consider the case of a single interrupt request from one device. The device keeps the interrupt request signal
activated until it is informed that the processor has accepted its request. This activated signal, if not deactivated
may lead to successive interruptions, causing the system to enter into an infinite loop.
INTERRUPT HARDWARE
• An I/O device requests an interrupt by activating a bus-line called interrupt-request(IR).
• A single IR line can be used to serve 'n' devices (Figure 4.6).
• All devices are connected to IR line via switches to ground.
• To request an interrupt, a device closes its associated switch. Thus, if all IR signals are inactive(i.e. if all switches
are open), the voltage on the IR line will be equal to Vdd.
• When a device requests an interrupt by closing its switch, the voltage on the line drops to 0, causing the INTR
received by the processor to go to 1.
• The value of INTR is the logical OR of the requests from individual devices
INTR=INTR1+ INTR2+ . . . . . +INTRn
• Special gates known as open-collector (or open-drain) gates are used to drive the INTR line.
• Resistor R is called a pull-up resistor because
it pulls the line voltage up to the high-voltage state when the switches are open.
Vectored Interrupts
• A device requesting an interrupt identifies itself by sending a special-code to processor over bus. (This enables
processor to identify individual devices even if they share a single interrupt-request line).
• The code represents starting-address of ISR for that device.
• ISR for a given device must always start at same location.
• The address stored at the location pointed to by interrupting-device is called the interrupt-vector.
• Processor
→ loads interrupt-vector into PC &
→ executes appropriate ISR
• The interrupting-device must put data on the bus only when the processor is ready to receive it.
• When processor is ready to receive interrupt-vector code, it activates INTA line.
• I/O device responds by sending its interrupt-vector code & turning off the INTR signal.
CONTROLLING DEVICE REQUESTS
• There are 2 independent mechanisms for controlling interrupt requests.
• At device-end, an interrupt-enable bit in a control register determines whether device is allowed to generate an
interrupt request.
• At processor-end, either an interrupt-enable bit in the PS register or a priority structure determines whether a
given interrupt-request will be accepted.
INTERRUPT NESTING
• A multiple-priority scheme is implemented by using separate INTR & INTA lines for each device.
• Each of the INTR lines is assigned a different priority-level (Figure 4.7).
• Priority-level of processor is the priority of program that is currently being executed.
• During execution of an ISR, interrupt-requests will be accepted from some devices but not from others
depending upon device’s priority.
• Processor accepts interrupts only from devices that have priority higher than its own.
• When execution of an ISR for some device is started, the priority of the processor is raised to that of the device.
• Processor's priority is encoded in a few bits of processor-status (PS) word. This can be changed by program
instructions that write into PS. These are called privileged instructions.
• Privileged-instructions can be executed only while processor is running in supervisor-mode.
• Processor is in supervisor-mode only when executing operating-system routines. (An attempt to execute a
privileged-instruction while in the user-mode leads to a special type of interrupt called a privileged exception).
SIMULTANEOUS REQUESTS
• INTR line is common to all devices (Figure 4.8).
• INTA line is connected in a daisy-chain fashion such that INTA signal propagates serially through devices.
• When several devices raise an interrupt-request and INTR line is activated, processor responds by setting INTA
line to 1. This signal is received by device 1.
• Device 1 passes signal on to device 2 only if it does not require any service.
• If device 1 has a pending-request for interrupt, it blocks INTA signal and proceeds to put its identifying code on
data lines.
• Device that is electrically closest to processor has highest priority.
• Main advantage: This allows the processor to accept interrupt-requests from some devices but not
from others depending upon their priorities.
EXCEPTIONS
• An interrupt is an event that causes
→ execution of one program to be suspended &
→ execution of another program to begin.
• Exception refers to any event that causes an interruption.
I/O interrupts are one example of an exception.
Recovery from Errors
• Computers use a variety of techniques to ensure that all hardware-components are operating properly. For e.g.
many computers include an error-checking code in main-memory which allows detection of errors in stored-data.
• If an error occurs, control-hardware detects it & informs processor by raising an interrupt.
• When exception processing is initiated (as a result of errors), the processor
→ suspends the program being executed &
→ starts an ESR (Exception Service Routine). This routine takes appropriate action to recover from the
error, or to inform the user about it.
Debugging
• Debugger
→ helps programmer find errors in a program and
→ uses exceptions to provide 2 important facilities: 1) Trace & 2) Breakpoints
• When a processor is operating in trace-mode, an exception occurs after execution of every instruction (using
debugging-program as ESR).
• Debugging-program enables user to examine contents of registers (AX, BX), memory-locations and so on.
• On return from debugging-program,
next instruction in program being debugged is executed,
then debugging-program is activated again.
• Breakpoints provide a similar facility except that program being debugged is interrupted only at specific points
selected by user. An instruction called Trap(or Software interrupt) is usually provided for this purpose.
Privilege Exception
• To protect OS of computer from being corrupted by user-programs, certain instructions can be executed only
while processor is in supervisor-mode. These are called privileged instructions.
• For e.g. when the processor is running in user-mode, it will not execute an instruction that changes priority-level
of processor.
• An attempt to execute such an instruction will produce a privilege-exception. As a result, processor switches to
supervisor-mode & begins to execute an appropriate routine in OS.
DIRECT MEMORY ACCESS (DMA)
• The transfer of a block of data directly between an external device & main memory, without continuous
involvement by the processor, is called DMA.
• DMA transfers are performed by a control-circuit that is part of the I/O device interface. This circuit is called a
DMA controller (Figure 4.19).
• The DMA controller performs the functions that would normally be carried out by the processor.
• In controller, 3 registers are accessed by processor to initiate transfer operations (Figure 4.18):
1) Two registers are used for storing starting-address & word-count
2) Third register contains status- & control-flags
• The R/W bit determines direction of transfer.
When R/W=1, controller performs a read operation(i.e. it transfers data from memory to I/O),
Otherwise it performs a write operation (i.e. it transfers data from I/O device to memory).
• When Done=1, controller
→ completes transferring a block of data &
→ is ready to receive another command.
• When IE=1, controller raises an interrupt after it has completed transferring a block of data (IE=Interrupt
Enable).
• Finally, when IRQ=1, controller requests an interrupt. (Requests by DMA devices for using the bus are always
given higher priority than processor requests).
• There are 2 ways in which the DMA operation can be carried out:
1) In one method, the processor originates most memory-access cycles. The DMA controller is said to "steal"
memory cycles from the processor. Hence, this technique is usually called cycle stealing.
2) In the second method, the DMA controller is given exclusive access to main-memory to transfer a block of
data without any interruption. This is known as block mode (or burst mode).
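• A hedged C sketch of the three controller registers described above (the struct layout and
bit positions are invented for illustration, not a real device map):

    #include <stdint.h>

    /* Hypothetical DMA controller register block. */
    typedef struct {
        uint32_t start_addr;         /* starting address of the block */
        uint32_t word_count;         /* number of words to transfer */
        uint32_t status_ctrl;        /* status- & control-flags */
    } dma_controller_t;

    #define DMA_RW   (1u << 0)       /* 1 = read (memory -> I/O), 0 = write */
    #define DMA_DONE (1u << 1)       /* block transfer completed */
    #define DMA_IE   (1u << 2)       /* raise an interrupt when done */
    #define DMA_IRQ  (1u << 3)       /* controller is requesting an interrupt */

    /* Program the controller for a read transfer with interrupt enabled. */
    void dma_start_read(volatile dma_controller_t *dma,
                        uint32_t addr, uint32_t count) {
        dma->start_addr  = addr;
        dma->word_count  = count;
        dma->status_ctrl = DMA_RW | DMA_IE;
    }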
BUS ARBITRATION
• The device that is allowed to initiate data transfers on bus at any given time is called bus-master.
• There can be only one bus master at any given time.
• Bus arbitration is the process by which next device to become the bus-master is selected and bus-mastership is
transferred to it.
• There are 2 approaches to bus arbitration:
1) In centralized arbitration, a single bus-arbiter performs the required arbitration.
2) In distributed arbitration, all devices participate in the selection of the next bus-master.
CENTRALIZED ARBITRATION
• A single bus-arbiter performs the required arbitration (Figure: 4.20 & 4.21).
• Normally, the processor is the bus-master unless it grants bus-mastership to one of the DMA controllers.
• A DMA controller indicates that it needs to become the bus-master by activating the Bus-Request line (BR).
• The signal on the BR line is the logical OR of bus-requests from all devices connected to it.
• When BR is activated, processor activates Bus-Grant signal(BG1) indicating to DMA controllers that they may
use bus when it becomes free. (This signal is connected to all DMA controllers using a daisy-chain arrangement).
• If DMA controller-1 is requesting the bus, it blocks propagation of grant-signal to other devices.
Otherwise, it passes the grant downstream by asserting BG2.
• Current bus-master indicates to all devices that it is using bus by activating Bus-Busy line (BBSY).
• The arbiter circuit ensures that only one request is granted at any given time, according to a predefined
priority scheme.
• A conflict may arise if both the processor and a DMA controller try to use the bus at the same time to access the
main memory. To resolve these conflicts, a special circuit called the bus arbiter is provided to coordinate the
activities of all devices requesting memory transfers.
DISTRIBUTED ARBITRATION
• All devices participate in the selection of the next bus-master (Figure 4.22).
• Each device on bus is assigned a 4-bit identification number (ID).
• When 1 or more devices request bus, they
→ assert Start-Arbitration signal &
→ place their 4-bit ID numbers on four open-collector lines ARB0 through ARB3.
• A winner is selected as a result of interaction among signals transmitted over these lines by all contenders.
• The net outcome is that the code on the 4 lines represents the request that has the highest ID number.
• Main advantage: This approach offers higher reliability since operation of bus is not dependent on any single
device.
BUSES
• Bus
→ is used to inter-connect main-memory, processor & I/O devices
→ includes lines needed to support interrupts & arbitration
• Primary function: To provide a communication-path for transfer of data.
• Bus protocol is set of rules that govern the behaviour of various devices connected to the buses.
• Bus-protocol specifies parameters such as:
→ asserting control-signals
→ timing of placing information on bus
→ rate of data-transfer
• A typical bus consists of 3 sets of lines: 1) Address, 2) Data and 3) Control lines.
• Control-signals specify whether a read or a write operation is to be performed.
• R/W line specifies
→ read operation when R/W=1
→ write operation when R/W=0
• In a data-transfer operation, one device plays the role of a bus-master, which initiates data transfers by issuing
Read or Write commands on the bus (hence it may be called an initiator).
• Device addressed by master is referred to as a slave (or target).
• Timing of data transfers over a bus is classified into 2 types:
1) Synchronous and 2) Asynchronous
SYNCHRONOUS BUS
• All devices derive timing-information from a common clock-line.
• Equally spaced pulses on this line define equal time intervals.
• Each of these intervals constitutes a bus-cycle during which one data transfer can take place.
A sequence of events during a read operation:
• At time t0, the master (processor)
→ places the device-address on address-lines &
→ Sends an appropriate command on control-lines (Figure 4.23).
• Information travels over bus at a speed determined by its physical & electrical characteristics.
• Clock pulse width (t1-t0) must be longer than the maximum propagation-delay between 2 devices connected to
bus.
• Information on bus is unreliable during the period t0 to t1 because signals are changing state.
• Slave places requested input-data on data-lines at time t1.
• At end of clock cycle(at time t2), master strobes(captures) data on data-lines into its input-buffer
• For data to be loaded correctly into any storage device (such as a register built with flip-flops), data must be
available at input of that device for a period greater than setup-time of device.
ASYNCHRONOUS BUS
• This method uses handshake-signals between master and slave for coordinating data transfers.
• There are 2 control-lines:
1) Master-ready(MR) to indicate that master is ready for a transaction
2) Slave-ready(SR) to indicate that slave is ready to respond
The read operation proceeds as follows:
• At t0, master places address- & command-information on bus. All devices on bus begin to
decode this information.
• At t1, master sets MR-signal to 1 to inform all devices that the address- & command-information
is ready.
• At t2, selected slave performs required input-operation & sets SR signal to 1 (Figure 4.26).
• At t3, the SR signal arrives at the master, indicating that the input-data are available on the bus.
• At t4, master removes address- & command-information from bus.
• At t5, when the device-interface receives the 1-to-0 transition of the MR signal, it removes the data and
the SR signal from the bus. This completes the input transfer.
INTERFACE CIRCUITS
• I/O interface consists of the circuitry required to connect an I/O device to a computer bus.
• The side of the interface which connects to the computer has bus signals for:
→ Address
→ Data
→ Control
• The side of the interface which connects to the I/O device has a datapath and associated
controls to transfer data between the interface and the I/O device.
This side is called a "port".
• Ports can be classified into two types: 1) parallel port and 2) serial port.
• A parallel port transfers data in the form of a number of bits, normally 8 or 16, to or from
the device.
• A serial port transmits and receives data one bit at a time.
• The processor communicates with the bus in the same way, whether it is a parallel port or a
serial port.
• Conversion from parallel to serial format and vice versa takes place inside the interface
circuit.
[Figure: Input interface. The processor bus (Data, Address, R/W, Master-ready, Slave-ready)
connects to an input interface containing the DATAIN register and the SIN status flag; on the
device side, the keyboard's encoder and debouncing circuit drive the data lines and a Valid
signal.]
[Figure: Output interface. The processor bus connects to an output interface containing the
DATAOUT register and the SOUT status flag; the printer side uses the Valid and Idle
handshake signals.]
• Data lines of the processor bus are connected to the DATAOUT register of the interface.
• The status flag SOUT is connected to the data line D1 using a three-state driver.
• The three-state driver is turned on, when the control Read-status line is 1.
• Address decoder selects the output interface using address lines A1 through A31.
• Address line A0 determines whether the data is to be loaded into the DATAOUT register or
status flag is to be read.
• If the Load-data line is 1, then the Valid line is set to 1.
• If the Idle line is 1, then the status flag SOUT is set to 1.
• Address bits A2 through A31, that is, 30 bits, are used to select the overall interface.
• Address bits A1 through A0, that is, 2 bits select one of the three registers, namely,
DATAIN, DATAOUT, and the status register.
• Status register contains the flags SIN and SOUT in bits 0 and 1.
• Data lines PA0 through PA7 connect the input device to the DATAIN register.
• DATAOUT register connects the data lines on the processor bus to lines PB0 through PB7
which connect to the output device.
• A combined interface has separate input and output data lines for connection to an I/O
device. Refer fig no. 4.33.
SERIAL PORT
Serial port is used to connect the processor to I/O devices that require transmission of
data one bit at a time.
Serial port communicates in a bit-serial fashion on the device side and bit parallel fashion
on the bus side.
Transformation between the parallel and serial formats is achieved with shift
registers that have parallel access capability.
• Input shift register accepts input one bit at a time from the I/O device. Refer fig no.4.37
• Once all the 8 bits are received, the contents of the input shift register are loaded in
parallel into DATAIN register.
• Output data in the DATAOUT register are loaded into the output shift register.
• Bits are shifted out of the output shift register and sent out to the I/O device one bit at a
time.
• As soon as data from the input shift registers are loaded into DATAIN, it can start
accepting another 8 bits of data.
• Input shift register and DATAIN registers are both used at input so that the input shift
register can start receiving another set of 8 bits from the input device after loading the
contents to DATAIN, before the processor reads the contents of DATAIN. This is called as
double-buffering.
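• A small C model of the input side (illustrative only; LSB-first order is an assumption):
bits are shifted in one at a time, and once 8 have arrived the byte is moved into DATAIN so
the shift register can immediately start on the next byte:

    #include <stdint.h>

    static uint8_t shift_reg;        /* input shift register */
    static int     bit_count;        /* bits received so far */
    static uint8_t DATAIN;           /* parallel buffer read by the processor */

    /* Called once per incoming serial bit. */
    void receive_bit(int bit) {
        shift_reg = (uint8_t)((shift_reg >> 1) | (bit ? 0x80 : 0));
        if (++bit_count == 8) {
            DATAIN = shift_reg;      /* load all 8 bits in parallel into DATAIN */
            bit_count = 0;           /* shift register is free for the next byte */
        }
    }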
• Serial interfaces require fewer wires, and hence serial transmission is convenient for
connecting devices that are physically distant from the computer.
• Speed of transmission of the data over a serial interface is known as the “bit rate”.
• Bit rate depends on the nature of the devices connected.
• In order to accommodate devices with a range of speeds, a serial interface must be able to
use a range of clock speeds.
• Several standard serial interfaces have been developed:
• Universal Asynchronous Receiver Transmitter (UART) for low-speed serial devices.
• RS-232-C for connection to communication links.
PCI BUS
Peripheral Component Interconnect
Introduced in 1992
Low-cost bus
Processor independent
Plug-and-play capability
In today’s computers, most memory transfers involve a burst of data rather than just one
word. The PCI is designed primarily to support this mode of operation.
The bus supports three independent address spaces: memory, I/O, and configuration.
Earlier, we assumed that the master maintains the address information on the bus until the
data transfer is completed. But the address is needed only long enough for the slave to be
selected. Thus, the address is needed on the bus for one clock cycle only, freeing the
address lines to be used for sending data in subsequent clock cycles. The result is a
significant cost reduction.
A master is called an initiator in PCI terminology. The addressed device that responds to
read and write commands is called a target.
Refer to Table 4.3 and Figure 4.40 from the text.
Device configuration
When an I/O device is connected to a computer, several actions are needed to configure
both the device and the software that communicates with it.
PCI incorporates in each I/O device interface a small configuration ROM memory that
stores information about that device.
The configuration ROMs of all devices are accessible in the configuration address space.
The PCI initialization software reads these ROMs and determines whether the device is a
printer, a keyboard, an Ethernet interface, or a disk controller. It can further learn about
various device options and characteristics.
Devices are assigned addresses during the initialization process.
This means that during the bus configuration operation, devices cannot be accessed based
on their address, as they have not yet been assigned one.
Hence, the configuration address space uses a different mechanism. Each device has an
input signal called Initialization Device Select (IDSEL#).
Electrical characteristics:
The PCI bus has been defined for operation with either a 5 V or 3.3 V power supply.
SCSI BUS
The acronym SCSI stands for Small Computer System Interface.
It refers to a standard bus defined by the American National Standards Institute (ANSI)
under the designation X3.131.
In the original specifications of the standard, devices such as disks are connected to a
computer via a 50-wire cable, which can be up to 25 meters in length and can transfer
data at rates up to 5 megabytes/s.
The SCSI bus standard has undergone many revisions, and its data transfer capability has
increased very rapidly, almost doubling every two years.
SCSI-2 and SCSI-3 have been defined, and each has several options.
Because of these various options, a SCSI connector may have 50, 68 or 80 pins.
Devices connected to the SCSI bus are not part of the address space of the processor.
The SCSI bus is connected to the processor bus through a SCSI controller. This controller
uses DMA to transfer data packets from the main memory to the device, or vice versa.
A packet may contain a block of data, commands from the processor to the device, or
status information about the device.
A controller connected to a SCSI bus is one of two types – an initiator or a target.
An initiator has the ability to select a particular target and to send commands specifying
the operations to be performed. The disk controller operates as a target. It carries out the
commands it receives from the initiator.
The initiator establishes a logical connection with the intended target.
Once this connection has been established, it can be suspended and restored as needed to
transfer commands and bursts of data.
While a particular connection is suspended, other devices can use the bus to transfer
information.
This ability to overlap data transfer requests is one of the key features of the SCSI bus
that leads to its high performance.
Data transfers on the SCSI bus are always controlled by the target controller.
To send a command to a target, an initiator requests control of the bus and, after winning
arbitration, selects the controller it wants to communicate with and hands control of the
bus over to it.
Then the controller starts a data transfer operation to receive a command from the
initiator.
Assume that the processor needs to read a block of data from a disk drive and that the data
are stored in disk sectors that are not contiguous.
The processor sends a command to the SCSI controller, which causes the following
sequence of events to take place:
The SCSI controller, acting as an initiator, contends for control of the bus.
When the initiator wins the arbitration process, it selects the target controller and
hands over control of the bus to it.
The target starts an output operation (from initiator to target); in response to this,
the initiator sends a command specifying the required read operation.
The target, realizing that it first needs to perform a disk seek operation, sends a
message to the initiator indicating that it will temporarily suspend the connection
between them. Then it releases the bus.
The target controller sends a command to the disk drive to move the read head to
the first sector involved in the requested read operation. Then, it reads the data
stored in that sector and stores them in a data buffer. When it is ready to begin
transferring data to the initiator, the target requests control of the bus. After it
wins arbitration, it reselects the initiator controller, thus restoring the suspended
connection.
The target transfers the contents of the data buffer to the initiator and then
suspends the connection again.
The target controller sends a command to the disk drive to perform another seek
operation. Then, it transfers the contents of the second disk sector to the initiator
as before. At the end of this transfer, the logical connection between the two
controllers is terminated.
As the initiator controller receives the data, it stores them into the main memory
using the DMA approach.
The SCSI controller sends an interrupt to the processor to inform it that the
requested operation has been completed.
The maximum size of the Main Memory (MM) that can be used in any computer is
determined by its addressing scheme. For example, a 16-bit computer that generates 16-bit
addresses is capable of addressing up to 2^16 = 64K memory locations. If a machine
generates 32-bit addresses, it can access up to 2^32 = 4G memory locations. This number
represents the size of the address space of the computer.
Word address    Byte addresses
0               0  1  2  3
4               4  5  6  7
8               8  9  10 11
...             .....
With the above structure, a READ or WRITE may involve an entire memory word or
only a byte. In the case of a byte read, other bytes may also be read but are ignored
by the CPU. However, during a write cycle, the control circuitry of the MM must ensure that
only the specified byte is altered. In this case, the higher-order 30 bits specify the word
and the lower-order 2 bits specify the byte within the word.
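In C, this split into a 30-bit word address and a 2-bit byte offset is just a shift and a mask
(a sketch for a byte-addressable machine with 4-byte words):

    #include <stdint.h>

    uint32_t word_address(uint32_t byte_addr) {
        return byte_addr >> 2;       /* higher-order 30 bits select the word */
    }

    uint32_t byte_in_word(uint32_t byte_addr) {
        return byte_addr & 0x3;      /* lower-order 2 bits select the byte */
    }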
Memory Access Time :-
It is a useful measure of the speed of the memory unit. It is the time that elapses
between the initiation of an operation and the completion of that operation (for example,
the time between READ and MFC).
Memory Cycle Time :-
It is an important measure of the memory system. It is the minimum time delay
required between the initiations of two successive memory operations (for example, the
time between two successive READ operations). The cycle time is usually slightly longer
than the access time.
RAM: A memory unit is called a Random Access Memory if any location can be
accessed for a READ or WRITE operation in some fixed amount of time that is
independent of the location's address. Main memory units are of this type. This
distinguishes them from serial or partly serial access storage devices such as magnetic
tapes and disks, which are used as secondary storage devices.
Cache Memory:-
The CPU of a computer can usually process instructions and data faster than they can be
fetched from compatibly priced main memory unit. Thus the memory cycle time becomes
the bottleneck in the system. One way to reduce the memory access time is to use cache
memory. This is a small and fast memory that is inserted between the larger, slower main
memory and the CPU. This holds the currently active segments of a program and its data.
Because of the locality of address references, the CPU can, most of the time, find the
relevant information in the cache memory itself (cache hit) and only infrequently needs
access to the main memory (cache miss). With a suitable size of the cache memory, cache
hit rates of over 90% are possible, leading to a cost-effective increase in the performance
of the system.
Memory Interleaving: -
This technique divides the memory system into a number of memory modules and
arranges addressing so that successive words in the address space are placed in different
modules. When requests for memory access involve consecutive addresses, the access
will be to different modules. Since parallel access to these modules is possible, the
average rate of fetching words from the Main Memory can be increased.
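A sketch in C of low-order interleaving, one common arrangement: the module number is
taken from the low-order address bits, so consecutive words land in different modules and
can be accessed in parallel.

    #include <stdint.h>

    #define NUM_MODULES 4u           /* assumed power-of-two module count */

    /* Consecutive addresses hit consecutive modules. */
    uint32_t module_of(uint32_t addr)        { return addr % NUM_MODULES; }
    uint32_t offset_in_module(uint32_t addr) { return addr / NUM_MODULES; }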
Virtual Memory: -
In a virtual memory System, the address generated by the CPU is referred to as a virtual
or logical address. The corresponding physical address can be different and the required
mapping is implemented by a special memory control unit, often called the memory
management unit. The mapping function itself may be changed during program execution
according to system requirements.
Because of the distinction made between the logical (virtual) address space and the
physical address space, while the former can be as large as the addressing capability of
the CPU, the actual physical memory can be much smaller. Only the active portion of the
virtual address space is mapped onto the physical memory and the rest of the virtual
address space is mapped onto the bulk storage device used. If the addressed information
is in the Main Memory (MM), it is accessed and execution proceeds. Otherwise, an
exception is generated, in response to which the memory management unit transfers a
contiguous block of words containing the desired word from the bulk storage unit to the
MM, displacing some block that is currently inactive. If the memory is managed in such a
way that such transfers are required relatively infrequently (i.e. the CPU will generally
find the required information in the MM), the virtual memory system can provide a
reasonably good performance and succeed in creating an illusion of a large memory with
a small, inexpensive MM.
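A toy page-table translation in C (illustrative; the page size, table size and fault handling
are assumptions, not details from the notes):

    #include <stdint.h>

    #define PAGE_SIZE 4096u                  /* assumed 4 KB pages */
    #define NUM_PAGES 1024u

    static uint32_t page_table[NUM_PAGES];   /* virtual page -> physical frame */
    static uint8_t  present[NUM_PAGES];      /* 1 if the page is in MM */

    /* Translate a virtual address; a miss would raise a page-fault exception. */
    int translate(uint32_t vaddr, uint32_t *paddr) {
        uint32_t vpage  = vaddr / PAGE_SIZE;
        uint32_t offset = vaddr % PAGE_SIZE;
        if (!present[vpage])
            return -1;                       /* page fault: fetch from bulk storage */
        *paddr = page_table[vpage] * PAGE_SIZE + offset;
        return 0;
    }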
Memory cells are usually organized in the form of an array, in which each cell is
capable of storing one bit of information. Each row of cells constitutes a memory word,
and all cells of a row are connected to a common line referred to as the word line, which
is driven by the address decoder on the chip. The cells in each column are connected to a
Sense/Write circuit by two bit lines. The Sense/Write circuits are connected to the data
I/O lines of the chip. During the read operation, these circuits sense, or read, the
information stored in the cells selected by a word line and transmit this information to the
output data lines. During the write operation, the Sense/Write circuits receive the input
information and store it in the cells of the selected word.
The above figure is an example of a very small memory chip consisting of 16 words of 8
bits each. This is referred to as a 16×8 organization. The data input and the data output of
each Sense/Write circuit are connected to a single bidirectional data line that can be
connected to the data bus of a computer. There are two control lines: the R/W (Read/Write)
input specifies the required operation, and the CS (Chip Select) input selects a given chip
in a multichip memory system.
The memory circuit given above stores 128 bits and requires 14 external connections for
address, data and control lines. Of course, it also needs two lines for power supply and
ground connections. Consider now a slightly larger memory circuit, one that has 1K
(1024) memory cells. For a 1K×1 memory organization, the representation is given next.
The required 10-bit address is divided into two groups of 5 bits each to form the row and
column addresses for the cell array. A row address selects a row of 32 cells, all of which
are accessed in parallel. However, according to the column address, only one of these
cells is connected to the external data line by the output multiplexer and input de-
multiplexer.
5.2.2 Static Memories
Memories that consist of circuits capable of retaining their state as long as power is
applied are known as static memories.
[Figure: SRAM cell. A latch formed by two cross-connected inverters is connected to the
bit lines b and b′ through transistors T1 and T2, which are controlled by the word line;
X and Y are the two latch nodes.]
The above figure illustrates how a static RAM (SRAM) cell may be implemented. Two
inverters are cross-connected to form a latch. The latch is connected to two bit lines by
transistors T1 and T2. These transistors act as switches that can be opened or closed under
control of the word line. When the word line is at ground level, the transistors are turned
off and the latch retains its state. For example, let us assume that the cell is in state 1 if the
logic value at point X is 1 and at point Y is 0. This state is maintained as long as the signal
on the word line is at ground level.
Read Operation
In order to read the state of the SRAM cell, the word line is activated to close switches T1
and T2. If the cell is in state 1, the signal on bit line b is high and the signal on bit
line b′ is low. The opposite is true if the cell is in state 0. Thus, b and b′ are complements
of each other. Sense/Write circuits at the end of the bit lines monitor the state of b and b′
and set the output accordingly.
Write Operation
The state of the cell is set by placing the appropriate value on bit line b and its
complement b′, and then activating the word line. This forces the cell into the
corresponding state. The required signals on the bit lines are generated by the Sense/Write
circuit.
CMOS Cell
A sense amplifier connected to the bit line detects whether the charge stored on the
capacitor is above the threshold. If so, it drives the bit line to a full voltage that represents
logic value 1. This voltage recharges the capacitor to the full charge that corresponds to
logic value 1. If the sense amplifier detects that the charge on the capacitor is below the
threshold, it pulls the bit line down to ground level, ensuring that the capacitor will have
no charge, representing logic value 0.
A 16-megabit DRAM chip, configured as 2M×8, is shown below.
Each row can store 512 bytes, so 12 bits are needed to select a row and 9 bits to select a
group (byte) in a row, for a total of 21 address bits.
• First apply the row address; RAS signal latches the row address. Then apply the
column address, CAS signal latches the address.
In these DRAMs, operation is directly synchronized with a clock signal. The below
given figure indicates the structure of an SDRAM.
The above figure shows the timing diagram for a burst read of length 4.
First, the row address is latched under control of the RAS signal.
Then, the column address is latched under control of the CAS signal.
After a delay of one clock cycle, the first set of data bits is placed on the
data lines.
The SDRAM automatically increments the column address to access next
three sets of the bits in the selected row, which are placed on the data lines
in the next clock cycles.
Memory latency is the time it takes to transfer a word of data to or from memory.
Memory bandwidth is the number of bits or bytes that can be transferred in one
second.
To assist the processor in accessing data at high enough rate, the cell array is organized
in two banks. Each bank can be accessed separately. Consecutive words of a given block
are stored in different banks. Such interleaving of words allows simultaneous access to
two words that are transferred on the successive edges of the clock. This type of SDRAM
is called Double Data Rate SDRAM (DDR- SDRAM).
Placing large memory systems directly on the motherboard will occupy a large
amount of space.
Memory modules are an assembly of memory chips on a small board that plugs
vertically onto a single socket on the motherboard.
Recall that in a dynamic memory chip, to reduce the number of pins, multiplexed
addresses are used.
Refresh Operation:-
The Refresh control block periodically generates Refresh requests, causing the access
control block to start a memory cycle in the normal way. This block allows the refresh
operation by activating the Refresh Grant line. The access control block arbitrates
between Memory Access requests and Refresh requests, with priority given to Refresh
requests in the case of a tie, to ensure the integrity of the stored data.
As soon as the Refresh control block receives the Refresh Grant signal, it activates the
Refresh line. This causes the address multiplexer to select the Refresh counter as the
source and its contents are thus loaded into the row address latches of all memory chips
when the RAS signal is activated.
Data are written into a ROM at the time of manufacture. Programmable ROM
(PROM) devices allow the data to be loaded by the user. Programmability is achieved by
connecting a fuse between the emitter and the bit line. Thus, prior to programming, the
memory contains all 1s. The user can insert 0s at the required locations by burning out
the fuses at these locations using high-current pulses. This process is irreversible.
ROMs are attractive when high production volumes are involved. For smaller numbers,
PROMs provide a faster and considerably less expensive approach. Some chips allow
the stored data to be erased and new data to be loaded. Such a chip is an erasable,
programmable ROM, usually called an EPROM. It provides considerable flexibility
during the development phase. An EPROM cell bears considerable resemblance to the
dynamic memory cell. As in the case of dynamic memory, information is stored in the
form of a charge on a capacitor. The main difference is that the capacitor in an EPROM
cell is very well insulated. Its rate of discharge is so low that it retains the stored
information for very long periods. Information is written by allowing charge to be stored
on the capacitor.
The contents of EPROM cells can be erased by increasing the discharge rate of the
storage capacitor by several orders of magnitude. This can be accomplished by allowing
ultraviolet light into the chip through a window provided for that purpose, or by the
application of a high voltage similar to that used in a write operation. If ultraviolet light is
used, all cells in the chip are erased at the same time. When electrical erasure is used,
however, the process can be made selective. An electrically erasable EPROM is often
referred to as an EEPROM. However, the circuit must now include high-voltage
generation. Some EEPROM chips incorporate the circuitry for generating these voltages
on the chip itself. Depending on the requirements, a suitable device can be selected.
Flash memory:
Read the contents of a single cell, but write the contents of an entire block
of cells.
Single flash chips are not sufficiently large, so larger memory modules are
implemented using flash cards and flash drives.
(Refer slides for point-wise notes on ROM and types of ROM)
Static RAM: Very fast, but expensive, because a basic SRAM cell has a complex circuit
making it impossible to pack a large number of cells onto a single chip.
Dynamic RAM: Simpler basic cell circuit, hence are much less expensive, but
significantly slower than SRAMs.
Magnetic disks: Storage provided by DRAMs is higher than SRAMs, but is still less than
what is necessary. Secondary storage such as magnetic disks provides a large amount of
storage, but is much slower than DRAMs.
Fastest access is to the data held in processor registers. Registers are at the top of the
memory hierarchy. Relatively small amount of memory that can be implemented on the
processor chip. This is processor cache. Two levels of cache. Level 1 (L1) cache is on the
processor chip. Level 2 (L2) cache is in between main memory and processor. Next level
is main memory, implemented as SIMMs. Much larger, but much slower than cache
memory. Next level is magnetic disks. Huge amount of inexpensive storage. Speed of
memory access is critical, the idea is to bring instructions and data that will be used in the
near future as close to the processor as possible.
5.5 Cache memories
Processor is much faster than the main memory. As a result, the processor has to spend
much of its time waiting while instructions and data are being fetched from the main
memory. This serves as a major obstacle towards achieving good performance. Speed of
the main memory cannot be increased beyond a certain point. So we use Cache
memories. Cache memory is an architectural arrangement which makes the main memory
appear faster to the processor than it really is. Cache memory is based on the property of
computer programs known as “locality of reference”.
Analysis of programs indicates that many instructions in localized areas of a program are
executed repeatedly during some period of time, while the others are accessed relatively
less frequently. These instructions may be the ones in a loop, nested loop or few
procedures calling each other repeatedly. This is called "locality of reference". Its types
are:
1) Temporal locality: a recently executed instruction is likely to be executed again very soon.
2) Spatial locality: instructions in close proximity to a recently executed instruction (in terms
of addresses) are also likely to be executed soon.
• Processor issues a Read request, a block of words is transferred from the main
memory to the cache, one word at a time.
• Subsequent references to the data in this block of words are found in the cache.
• At any given time, only some blocks in the main memory are held in the cache.
Which blocks in the main memory are in the cache is determined by a “mapping
function”.
• When the cache is full, and a block of words needs to be transferred from the main
memory, some block of words in the cache must be replaced. This is determined
by a “replacement algorithm”.
Cache hit:
Existence of a cache is transparent to the processor. The processor issues Read and
Write requests in the same manner. If the data is in the cache it is called a Read or Write
hit.
Write hit: The cache has a replica of the contents of the main memory. Contents of the cache
and the main memory may be updated simultaneously. This is the write-through protocol.
Alternatively, update only the contents of the cache, and mark it as updated by setting a bit
known as the dirty bit or modified bit; the contents of the main memory are updated when
this block is replaced. This is the write-back or copy-back protocol.
Cache miss:
• If the data is not present in the cache, then a Read miss or Write miss occurs.
• Read miss: Block of words containing this requested word is transferred from the
memory. After the block is transferred, the desired word is forwarded to the
processor. The desired word may also be forwarded to the processor as soon as it
is transferred without waiting for the entire block to be transferred. This is called
load-through or early-restart.
• Write miss: If the write-through protocol is used, then the contents of the main memory
are updated directly. If the write-back protocol is used, the block containing the
addressed word is first brought into the cache. The desired word is then overwritten
with new information.
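The two write-hit policies can be contrasted in a short C sketch (a toy single-block cache;
the names are illustrative):

    #include <stdint.h>
    #include <stdbool.h>

    static uint32_t main_mem[1024];
    static uint32_t cache_data;      /* toy one-word cache block */
    static uint32_t cache_addr;      /* memory address the block mirrors */
    static bool     dirty;           /* the dirty / modified bit */

    /* Write-through: update cache and main memory simultaneously. */
    void write_through(uint32_t data) {
        cache_data = data;
        main_mem[cache_addr] = data;
    }

    /* Write-back: update only the cache and set the dirty bit;
       main memory is updated later, when the block is replaced. */
    void write_back(uint32_t data) {
        cache_data = data;
        dirty = true;
    }

    void replace_block(void) {
        if (dirty) {
            main_mem[cache_addr] = cache_data;   /* copy back on replacement */
            dirty = false;
        }
    }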
A bit called as “valid bit” is provided for each block. If the block contains valid data, then
the bit is set to 1, else it is 0. Valid bits are set to 0, when the power is just turned on.
When a block is loaded into the cache for the first time, the valid bit is set to 1. Data
transfers between main memory and disk occur directly bypassing the cache. When the
data on a disk changes, the main memory block is also updated. However, if the data is
also resident in the cache, then the valid bit is set to 0.
The copies of the data in the cache and the main memory are different. This is called the
cache coherence problem.
Mapping functions: Mapping functions determine how memory blocks are placed in the
cache.
1. Direct mapping
2. Associative mapping
3. Set-associative mapping.
Direct mapping:
[Figure: Direct-mapped cache. Cache of 128 blocks of 16 words each; main memory of
4096 blocks. Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits).]
• Block j of the main memory maps to block j modulo 128 of the cache: 0 maps to 0, 129
maps to 1.
• More than one memory block is mapped onto the same position in the cache.
• May lead to contention for cache blocks even if the cache is not full.
• Resolve the contention by allowing the new block to replace the old block, leading to a
trivial replacement algorithm.
• Memory address is divided into three fields:
- Low-order 4 bits determine one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits determine which cache
block this new block is placed in.
- High-order 5 bits determine which of the possible 32 blocks is currently present in
the cache. These are the tag bits.
• Simple to implement but not very flexible.
Associative mapping:
[Figure: an associative-mapped cache — any main memory block may occupy any of the 128 cache blocks]
• A main memory block can be placed into any cache position.
• Memory address is divided into two fields:
- Low-order 4 bits identify the word within a block.
- High-order 12 bits (the tag bits) identify a memory block when it is resident in the cache.
Main memory address: Tag (12 bits) | Word (4 bits)
• Flexible, and uses cache space efficiently.
• Replacement algorithms can be used to replace an existing block in the cache when the
cache is full.
• Cost is higher than direct-mapped cache because of the need to search all 128 tag
patterns to determine whether a given block is in the cache.
Set-associative mapping:
[Figure: a set-associative-mapped cache with two blocks per set — 64 sets, 128 cache blocks]
• Blocks of the cache are grouped into sets, and the mapping function allows a block of
the main memory to reside in any block of a specific set.
• Divide the cache into 64 sets, with two blocks per set. Memory blocks 0, 64, 128, etc.
map to set 0, and they can occupy either of the two block positions within that set.
• Memory address is divided into three fields:
- Low-order 4 bits select the word within a block.
- The 6-bit Set field determines the set number.
- The high-order 6 tag bits are compared to the tag fields of the two blocks in the
selected set.
Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits)
• Set-associative mapping is a combination of direct and associative mapping.
• Number of blocks per set is a design parameter.
- One extreme is to have all the blocks in one set, requiring no set bits (fully
associative mapping).
- The other extreme is to have one block per set, which is the same as direct mapping.
The three address layouts can be checked with the short sketch below.
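The three address breakdowns above can be verified with a small Python sketch. This is
illustrative only; the geometry (16-word blocks, 128 cache blocks, 64 sets, 4096
main-memory blocks) is taken from the figures above.

    # Sketch: splitting a 16-bit word address under the three mapping schemes.
    def direct_fields(addr):
        word = addr & 0xF            # low-order 4 bits: word within block
        block = (addr >> 4) & 0x7F   # next 7 bits: cache block number
        tag = addr >> 11             # high-order 5 bits: tag
        return tag, block, word

    def associative_fields(addr):
        word = addr & 0xF            # low-order 4 bits: word within block
        tag = addr >> 4              # high-order 12 bits: tag
        return tag, word

    def set_associative_fields(addr):
        word = addr & 0xF            # low-order 4 bits: word within block
        set_no = (addr >> 4) & 0x3F  # next 6 bits: set number
        tag = addr >> 10             # high-order 6 bits: tag
        return tag, set_no, word

    addr = 0b10110_0000001_0110      # an arbitrary 16-bit address
    print(direct_fields(addr))           # (22, 1, 6)
    print(associative_fields(addr))      # (2817, 6)
    print(set_associative_fields(addr))  # (44, 1, 6)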
Replacement Algorithm
In a direct-mapped cache, the position of each block is fixed; hence, no replacement strategy
exists. In associative and set-associative caches, when a new block is to be brought into the
cache and all the positions that it may occupy are full, the cache controller must decide which
of the old blocks to overwrite. This is an important issue because the decision can be a factor
in system performance. The objective is to keep blocks in the cache that are likely to be
referenced in the near future. It is not easy to determine which blocks are about to be
referenced. The property of locality of reference gives a clue to a reasonable strategy: when a
block is to be overwritten, it is sensible to overwrite the one that has gone the longest time
without being referenced. This block is called the least recently used (LRU) block, and the
technique is called the LRU replacement algorithm. The LRU algorithm has been used
extensively for many access patterns, but it can lead to poor performance in some cases. For
example, it produces disappointing results when accesses are made to sequential elements of
an array that is slightly too large to fit into the cache. Performance of the LRU algorithm can be
improved by introducing a small amount of randomness in deciding which block to replace.
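As an illustration of LRU bookkeeping, here is a minimal Python sketch for one cache set.
The two-way set size and the OrderedDict representation are assumptions for the example,
not part of the notes.

    from collections import OrderedDict

    class LRUSet:
        def __init__(self, ways=2):
            self.ways = ways
            self.blocks = OrderedDict()   # tag -> block; order = recency

        def access(self, tag):
            if tag in self.blocks:                # hit: mark most recently used
                self.blocks.move_to_end(tag)
                return "hit"
            if len(self.blocks) == self.ways:     # set full: evict the LRU tag
                self.blocks.popitem(last=False)
            self.blocks[tag] = "data"             # miss: bring the block in
            return "miss"

    s = LRUSet()
    print([s.access(t) for t in [5, 9, 5, 7, 9]])
    # ['miss', 'miss', 'hit', 'miss', 'miss']  (7 evicts 9, then 9 evicts 5)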
Solved Problems:-
1. A block-set-associative cache consists of a total of 64 blocks divided into 4-block
sets. The main memory contains 4096 blocks, each containing 128 words.
b) How many bits are there in each of the TAG, SET & WORD fields?
Each block holds 128 = 2^7 words, so the WORD field needs 7 bits. There are
64/4 = 16 = 2^4 sets, so the SET field needs 4 bits. The main memory holds
4096 x 128 = 2^19 words, so the address is 19 bits, leaving 19-4-7 = 8 bits for the TAG.
Answer: TAG = 8 bits, SET = 4 bits, WORD = 7 bits.
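The same field widths can be re-derived with a few lines of Python:

    from math import log2

    words_per_block = 128
    blocks_in_cache = 64
    blocks_per_set = 4
    mm_blocks = 4096

    word_bits = int(log2(words_per_block))                   # 7
    set_bits = int(log2(blocks_in_cache // blocks_per_set))  # 16 sets -> 4
    addr_bits = int(log2(mm_blocks * words_per_block))       # 2**19 -> 19
    tag_bits = addr_bits - set_bits - word_bits              # 8
    print(tag_bits, set_bits, word_bits)                     # 8 4 7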
Performance of a processor depends on:
1) how fast machine instructions can be brought into the processor for execution, and
2) how fast the instructions can be executed.
Interleaving
Divides the memory system into a number of memory modules. Each module has its own
address buffer register (ABR) and data buffer register (DBR). Arranges addressing so that
successive words in the address space are placed in different modules. When requests for
memory access involve consecutive addresses, the access will be to different modules.
Since parallel access to these modules is possible, the average rate of fetching words from the
Main Memory can be increased.
Methods of address layout:
1) Consecutive words in a module:
High-order k bits of a memory address determine the module, and low-order m bits of the
address determine the word within that module. When a block of words is transferred from
main memory to cache, only one module is busy at a time.
2) Consecutive words in consecutive modules (memory interleaving):
Low-order k bits of the memory address select the module, and high-order m bits select the
word within that module. Since consecutive addresses are located in consecutive modules,
several memory modules can be kept busy at the same time while transferring a block of
data.
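The difference between the two layouts can be seen in a small Python sketch. The
4-module, 256-words-per-module geometry is assumed purely for illustration.

    # Module selection under the two address layouts.
    K, M = 2, 8   # k = 2 module bits (4 modules), m = 8 word bits (256 words)

    def high_order(addr):
        # consecutive words fall in the SAME module
        return addr >> M, addr & ((1 << M) - 1)      # (module, word)

    def low_order(addr):
        # consecutive words fall in CONSECUTIVE modules (interleaved)
        return addr & ((1 << K) - 1), addr >> K      # (module, word)

    for a in range(4):   # four consecutive addresses
        print(a, high_order(a), low_order(a))
    # high_order keeps them all in module 0; low_order spreads them over 0..3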
Hit rate and miss penalty
The number of hits stated as a fraction of all attempted accesses is called the hit rate,
and the number of misses stated as a fraction of all attempted accesses is called the
miss rate.
The extra time needed to bring the desired information into the cache is called the miss
penalty.
Hit rate can be improved by increasing the block size while keeping the cache size
constant. Block sizes that are neither very small nor very large give the best results.
Miss penalty can be reduced if the load-through approach is used when loading new
blocks into the cache.
The average access time experienced by the processor is
tave = hC + (1-h)M
where h = hit rate, M = miss penalty, and C = time to access information in the cache.
For a system with two levels of cache, the average access time is
tave = h1c1 + (1-h1)h2c2 + (1-h1)(1-h2)M
where h1, h2 = hit rates of cache 1 and cache 2, c1, c2 = times to access information from
cache 1 and cache 2, and M = miss penalty for access to the main memory.
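A quick numeric check of these formulas in Python (the hit rates and access times below
are assumed sample values, not figures from the notes):

    def t_ave_one_level(h, C, M):
        return h * C + (1 - h) * M

    def t_ave_two_level(h1, c1, h2, c2, M):
        return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * M

    # e.g. 95% L1 hit rate, 1-cycle L1, 90% L2 hit rate, 10-cycle L2,
    # 100-cycle main memory access
    print(t_ave_one_level(0.95, 1, 100))            # 5.95 cycles
    print(t_ave_two_level(0.95, 1, 0.90, 10, 100))  # 1.9 cycles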
Write buffer
Write-through: Each write operation involves writing to the main memory. If the
processor has to wait for the write operation to be complete, it slows down the processor.
However, the processor does not depend on the result of the write operation, so a write
buffer can be included for temporary storage of write requests. The processor places each
write request into the buffer and continues execution. If a subsequent Read request
references data which is still in the write buffer, then this data is taken from the write
buffer.
Write-back: The block is written back to the main memory when it is replaced. If the
processor waits for this write to complete before reading the new block, it is slowed down.
Instead, a fast write buffer can hold the block to be written, and the new block can be read
first.
In addition to these, prefetching and lockup-free caches can also be used to improve
performance.
VIRTUAL MEMORY
The addressable memory space depends on the number of address bits in a computer. For
example, if a computer issues 32-bit addresses, the addressable memory space is 4G bytes.
Physical main memory in a computer is generally not as large as the entire possible
addressable space. Physical memory typically ranges from a few hundred megabytes to 1G
bytes. Large programs that cannot fit completely into the main memory have their parts
stored on secondary storage devices such as magnetic disks. Pieces of programs must be
transferred to the main memory from secondary storage before they can be executed.
Techniques that automatically move program and data between main memory and secondary
storage when they are required for execution are called virtual-memory techniques.
Programs and processors reference instructions and data independently of the size of the
main memory.
Processor issues binary addresses for instructions and data. These binary addresses are called
logical or virtual addresses. Virtual addresses are translated into physical addresses by a
combination of hardware and software subsystems. If virtual address refers to a part of the
program that is currently in the main memory, it is accessed immediately. If the address
refers to a part of the program that is not currently in the main memory, it is first transferred
to the main memory before it can be used.
Memory management unit (MMU) translates virtual addresses into physical addresses. If the
desired data or instructions are in the main memory they are fetched as described previously.
If the desired data or instructions are not in the main memory, they must be transferred from
secondary storage to the main memory. MMU causes the operating system to bring the data
from the secondary storage into the main memory.
Address Translation
Assume that program and data are composed of fixed-length units called pages. A page
consists of a block of words that occupy contiguous locations in the main memory. Page is a
basic unit of information that is transferred between secondary storage and main memory.
Size of a page commonly ranges from 2K to 16K bytes. Pages should not be too small,
because the access time of a secondary storage device is much longer than that of the main
memory.
Pages should not be too large, else a large portion of the page may not be used, and it will
occupy valuable space in the main memory.
Cache memory: Introduced to bridge the speed gap between the processor and the main
memory. Implemented in hardware.
Virtual memory: Introduced to make the main memory appear larger than it physically is,
bridging the gap between the main memory and secondary storage. Implemented in part by
software.
Virtual page number generated by the processor is added to the contents of the page table
base register. This provides the address of the corresponding entry in the page table. The
contents of this location in the page table give the starting address of the page if the page is
currently in the main memory.
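A minimal Python sketch of this lookup (illustrative: 4K-byte pages are assumed, and a
dictionary stands in for the page-table-base-register arithmetic):

    PAGE_SIZE = 4096          # assumed 4K-byte pages
    OFFSET_BITS = 12

    # Assumed page table: virtual page number -> page frame number
    page_table = {0: 7, 1: 3, 2: None}   # None = page not in main memory

    def translate(virtual_addr):
        vpn = virtual_addr >> OFFSET_BITS          # virtual page number
        offset = virtual_addr & (PAGE_SIZE - 1)    # offset within the page
        frame = page_table[vpn]
        if frame is None:
            raise RuntimeError("page fault: page must be brought from disk")
        return (frame << OFFSET_BITS) | offset     # physical address

    print(hex(translate(0x1ABC)))   # VPN 1 -> frame 3 -> 0x3abc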
Page table entry for a page also includes some control bits which describe the status of the
page while it is in the main memory. One bit indicates the validity of the page. Indicates
whether the page is actually loaded into the main memory. Allows the operating system to
invalidate the page without actually removing it. One bit indicates whether the page has been
modified during its residency in the main memory. This bit determines whether the page
should be written back to the disk when it is removed from the main memory. Similar to the
dirty or modified bit in case of cache memory. Other control bits indicate various other
types of restrictions that may be imposed. For example, a program may have only read
permission for a page, but not write or modify permissions.
The page table is used by the MMU for every read and write access to the memory. Ideal
location for the page table is within the MMU. Page table is quite large. MMU is
implemented as part of the processor chip. Impossible to include a complete page table on the
chip. Page table is kept in the main memory. A copy of a small portion of the page table can
be accommodated within the MMU. Portion consists of page table entries that correspond to
the most recently accessed pages.
A small cache called the Translation Lookaside Buffer (TLB) is included in the MMU. The TLB
holds page table entries of the most recently accessed pages. The cache memory holds most
recently accessed blocks from the main memory. Operation of the TLB and page table in the
main memory is similar to the operation of the cache and main memory. Page table entry for
a page includes: Address of the page frame where the page resides in the main memory.
Some control bits. In addition to the above for each page, TLB must hold the virtual page
number for each page.
Associative-mapped TLB: The high-order bits of the virtual address generated by the
processor select the virtual page. These bits are compared to the virtual page numbers in
the TLB. If there is a match, a hit occurs and the corresponding address of the page frame
is read. If there is no match, a miss occurs and the page table within the main memory
must be consulted. Set-associative mapped TLBs are also found in commercial processors.
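Extending the translation sketch above, a TLB in front of the page table might be modelled
as follows (again illustrative; the 4-entry capacity and the naive replacement are
assumptions for the example):

    PAGE_SIZE, OFFSET_BITS = 4096, 12          # as in the previous sketch
    page_table = {0: 7, 1: 3, 2: None}         # VPN -> frame (None = on disk)
    tlb = {}                                   # VPN -> frame, recent entries
    TLB_ENTRIES = 4                            # assumed capacity

    def translate_with_tlb(virtual_addr):
        vpn = virtual_addr >> OFFSET_BITS
        offset = virtual_addr & (PAGE_SIZE - 1)
        if vpn in tlb:                         # TLB hit: no page-table access
            frame = tlb[vpn]
        else:                                  # TLB miss: consult page table
            frame = page_table[vpn]
            if frame is None:
                raise RuntimeError("page fault")
            if len(tlb) == TLB_ENTRIES:        # naive replacement of one entry
                tlb.pop(next(iter(tlb)))
            tlb[vpn] = frame
        return (frame << OFFSET_BITS) | offset

    print(hex(translate_with_tlb(0x1ABC)))     # 0x3abc (miss, entry cached)
    print(hex(translate_with_tlb(0x1DEF)))     # 0x3def (TLB hit)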
If a program generates an access to a page that is not in the main memory a page fault is said
to occur. Whole page must be brought into the main memory from the disk, before the
execution can proceed. When the MMU detects a page fault, the following actions occur:
The MMU asks the operating system to intervene by raising an exception. Processing of the
active task which caused the page fault is interrupted. Control is transferred to the
operating system.
Operating system copies the requested page from secondary storage to the main memory.
Once the page is copied, control is returned to the task which was interrupted.
Platters and Read/Write Heads: - The heart of the disk drive is the stack of rotating platters
that contain the encoded data, and the read and write heads that access that data. The drive
contains five or more platters. There are read/write heads on the top and bottom of each
platter, so information can be recorded on both surfaces. All heads move together across the
platters. The platters rotate at a constant speed, usually 3600 rpm.
Drive Electronics: - The disk drive electronics are located on a printed circuit board
attached to the disk drive. After a read request, the electronics must seek out and find the
block requested, stream it off the surface, error-check and correct it, assemble it into bytes,
store it in an on-board buffer and signal the processor that the task is complete. To assist in
these tasks, the drive electronics include a disk controller, a special-purpose processor.
Data organization on the Disk:- The drive needs to know where the data to be accessed is
located on the disk. In order to provide that location information, data is organized on the
disk platters by tracks and sectors. Fig below shows simplified view of the organization of
tracks and sectors on a disk. The fig. shows a disk with 1024 tracks, each of which has 64
sectors. The head can determine which track it is on by counting tracks from a known
location and sector identities are encoded in a header written on the disk at the front of each
sector. The number of bytes per sector is fixed for a given disk drive, varying in size from
512 bytes to 2KB. All tracks with the same number, but on different surfaces, form a
cylinder. The information is recorded on the disk surface 1 bit at a time by magnetizing a
small area on the track with the write head. That bit is detected by sensing the direction of
that magnetization as the magnetized area passes under the read head as shown in fig below.
The header usually contains both synchronization and location information. The
synchronization information allows the head positioning circuitry to keep the heads centered
on the track, and the location information allows the disk controller to determine the
sector's identity as the header passes, so that the data can be captured if it is a read, or
stored if it is a write. The 12 bytes of ECC (Error Correcting Code) information are used to
detect and
correct errors in the 512 byte data field.
Dynamic properties are those that deal with the access time for the reading and writing of
data. The calculation of data access time is not simple. It depends not only on the
rotational speed of the disk, but also on the location of the read/write head when it begins
the access. There are several measures of data access time.
1. Seek time: - The average time required to move the read/write head to the desired
track. Actual seek times depend on where the head is when the request is received
and how far it has to travel, but since there is no way to know what these values
will be when an access request is made, the average figure is used. Average seek
time must be determined by measurement. It will depend on the physical size of
the drive components and how fast the heads can be accelerated and decelerated.
Seek times are generally in the range of 8-20 ms and have not changed much in
recent years.
2. Track to track access time: - The time required to move the head from one track
to an adjoining one. This time is in the range of 1-2 ms.
3. Rotational latency: - The average time required for the needed sector to pass under
the head once the head has been positioned at the correct track. Since on average
the desired sector will be half way around the track from where the head is when it
first arrives at the track, rotational latency is taken to be 1/2 the rotation time.
Current rotation speeds are from 3600 to 7200 rpm, which yield rotational
latencies in the 4-8 ms range.
6. Sustained data rate: - The rate at which data can be accessed over a sustained
period of time.
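These measures combine into a rough access-time estimate, sketched below in Python. The
parameter values are assumed, chosen within the ranges quoted above.

    # Rough disk access-time estimate: seek + rotational latency + transfer.
    avg_seek_ms = 10.0
    rpm = 7200
    bytes_per_track = 64 * 512          # 64 sectors of 512 bytes (as in the fig.)

    rotation_ms = 60_000 / rpm          # one full rotation: about 8.33 ms
    rot_latency_ms = rotation_ms / 2    # on average, half a rotation

    def access_time_ms(nbytes):
        transfer_ms = rotation_ms * nbytes / bytes_per_track
        return avg_seek_ms + rot_latency_ms + transfer_ms

    print(round(access_time_ms(512), 2))   # one 512-byte sector: ~14.3 ms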
Optical Disks
Compact Disk (CD) Technology:- The optical technology that is used for CD systems is
based on a laser light source. A laser beam is directed onto the surface of the spinning disk.
Physical indentations in the surface are arranged along the tracks of the disk. They reflect
the focused beam towards a photodetector, which detects the stored binary patterns.
The laser emits a coherent light beam that is sharply focused on the surface of the disk.
Coherent light consists of synchronized waves that have the same wavelength. If a coherent
light beam is combined with another beam of the same kind and the two beams are in
phase, the result is a brighter beam; but if the two beams are out of phase, they cancel each
other. Hence, a photodetector will detect a bright spot in the first case and a dark spot in
the second case. A cross-section of a small portion of a CD is shown in fig. below. The
bottom layer is polycarbonate plastic, which functions as a clear glass base. The surface of
this plastic is programmed to store data by indenting it with pits. The unindented parts are
called lands. A thin layer of reflecting aluminium material is placed on top of a
programmed disk. The aluminium is then covered by a protective acrylic. Finally, the
topmost layer is deposited and stamped with a label.
The laser source and the photodetector are positioned below the polycarbonate plastic. The
emitted beam travels through this plastic, reflects off the aluminium layer and travels back
toward the photodetector.
1. CD-ROM
MODULE 4 : ARITHMETIC
CARRY-LOOKAHEAD ADDITION
• The logic expressions for si (sum) and ci+1 (carry-out) of stage i are
si = xi ⊕ yi ⊕ ci ------(1)
ci+1 = xiyi+xici+yici ------(2)
• Factoring (2) into
ci+1 = xiyi+(xi+yi)ci
we can write
ci+1 = Gi+Pici where Gi=xiyi and Pi=xi+yi
• The expressions Gi and Pi are called generate and propagate functions (Figure 6.4).
• If Gi=1, then ci+1=1, independent of the input carry ci. This occurs when both xi and yi are 1. Propagate
function means that an input-carry will produce an output-carry when either xi=1 or yi=1.
• All Gi and Pi functions can be formed independently and in parallel in one logic-gate delay.
• Expanding ci in terms of i-1 subscripted variables and substituting into the ci+1 expression, we obtain
ci+1 = Gi+PiGi-1+PiPi-1Gi-2+ . . . +PiPi-1 . . . P1G0+PiPi-1 . . . P0c0
• Conclusion: Delay through the adder is 3 gate delays for all carry-bits &
4 gate delays for all sum-bits.
• Consider the design of a 4-bit adder. The carries can be implemented as
c1=G0+P0c0
c2=G1+P1G0+P1P0c0
c3=G2+P2G1+P2P1G0+P2P1P0c0
c4=G3+P3G2+P3P2G1+P3P2P1G0+P3P2P1P0c0
• The carries are implemented in the block labeled carry-lookahead logic. An adder implemented in this form is
called a carry-lookahead adder.
• Limitation: If we try to extend the carry-lookahead adder for longer operands, we run into a problem of gate
fan-in constraints.
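The generate/propagate equations can be exercised with a small Python sketch of a 4-bit
adder (illustrative only; the carries are evaluated from the recurrence ci+1 = Gi + Pici,
which yields the same values as the expanded lookahead forms):

    # Sketch: 4-bit carry-lookahead adder, following equations (1) and (2).
    def cla_4bit(x, y, c0=0):
        xb = [(x >> i) & 1 for i in range(4)]
        yb = [(y >> i) & 1 for i in range(4)]
        G = [xb[i] & yb[i] for i in range(4)]   # generate:  Gi = xi.yi
        P = [xb[i] | yb[i] for i in range(4)]   # propagate: Pi = xi + yi
        c = [c0]
        for i in range(4):
            c.append(G[i] | (P[i] & c[i]))       # ci+1 = Gi + Pi.ci
        s = [xb[i] ^ yb[i] ^ c[i] for i in range(4)]  # si = xi XOR yi XOR ci
        total = sum(bit << i for i, bit in enumerate(s))
        return total, c[4]                       # 4-bit sum and carry-out c4

    print(cla_4bit(0b0111, 0b0101))  # 7 + 5 = 12 -> (12, 0)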
HIGHER-LEVEL GENERATE & PROPAGATE FUNCTIONS
• 16-bit adder can be built from four 4-bit adder blocks (Figure 6.5).
• These blocks provide new output functions defined as GIk and PIk,
where k=0 for the first 4-bit block,
k=1 for the second 4-bit block and so on,
and the superscript I denotes the second-level (block) generate and propagate functions.
• In the first block,
PI0 = P3P2P1P0
and
GI0 = G3+P3G2+P3P2G1+P3P2P1G0
• The first-level Gi and Pi functions determine whether bit stage i generates or propagates a carry, and the
second-level GIk and PIk functions determine whether block k generates or propagates a carry.
• Carry c16 is formed by one of the carry-lookahead circuits as
c16 = GI3+PI3GI2+PI3PI2GI1+PI3PI2PI1GI0+PI3PI2PI1PI0c0
• Conclusion: All carries are available 5 gate delays after X, Y and c0 are applied as inputs.
MULTIPLICATION OF POSITIVE NUMBERS
ARRAY MULTIPLICATION
• The main component in each cell is a full adder (FA).
• The AND gate in each cell determines whether a multiplicand bit mj is added to the incoming partial-product
bit, based on the value of the multiplier bit qi (Figure 6.6).
SEQUENTIAL CIRCUIT BINARY MULTIPLIER
• Registers A and Q combined hold PPi(partial product)
while the multiplier bit qi generates the signal Add/Noadd.
• The carry-out from the adder is stored in flip-flop C (Figure 6.7).
• Procedure for multiplication:
1) Multiplier is loaded into register Q, Multiplicand
is loaded into register M and
C & A are cleared to 0.
2) If q0=1, add M to A and store sum in A. Then C, A and Q are shifted right one bit-position. If
q0=0, no addition performed and C, A & Q are shifted right one bit-position.
3) After n cycles, the high-order half of the product is held in register A and
the low-order half is held in register Q.
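The procedure can be traced with a short Python sketch (illustrative; the registers are
modelled as integers, and C, A, Q are shifted right together):

    # Sketch: n-cycle sequential shift-and-add multiplication (unsigned).
    def sequential_multiply(m, q, n=4):
        a, c = 0, 0                      # register A and flip-flop C cleared
        for _ in range(n):               # n cycles
            if q & 1:                    # q0 = 1: add M to A, carry into C
                total = a + m
                a = total & ((1 << n) - 1)
                c = total >> n
            # shift C, A, Q right one bit position
            q = (q >> 1) | ((a & 1) << (n - 1))
            a = (a >> 1) | (c << (n - 1))
            c = 0
        return (a << n) | q              # high half in A, low half in Q

    print(sequential_multiply(13, 11))   # 13 * 11 = 143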
SIGNED OPERAND MULTIPLICATION
BOOTH ALGORITHM
• This algorithm
→ generates a 2n-bit product
→ treats both positive & negative 2's-complement n-bit operands uniformly(Figure 6.9-6.12).
• Attractive feature: This algorithm achieves some efficiency in the number of additions required when the
multiplier has a few large blocks of 1s.
• This algorithm suggests that we can reduce the number of operations required for multiplication by
representing the multiplier as a difference between 2 numbers.
For e.g. the multiplier(Q) 14 (001110) can be represented as
010000 (16)
-000010 (2)
001110 (14)
• Therefore, the product P=M*Q can be computed by adding 2^4 times M to the 2's-complement of 2^1 times M.
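A minimal Booth-recoding sketch in Python (illustrative; it derives the signed digits, LSB
first, and checks that they reconstruct the multiplier):

    # Booth digit at position i is q[i-1] - q[i], with an implied q[-1] = 0.
    def booth_recode(q, n=6):
        bits = [(q >> i) & 1 for i in range(n)]
        prev = 0                        # implied bit to the right of the LSB
        digits = []
        for b in bits:
            digits.append(prev - b)     # -1 on a 0->1 boundary, +1 on a 1->0
            prev = b
        return digits                   # LSB first

    d = booth_recode(0b001110)          # multiplier 14
    print(d)                            # [0, -1, 0, 0, 1, 0], i.e. +16 - 2
    print(sum(di << i for i, di in enumerate(d)))   # 14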
FAST MULTIPLICATION
BIT-PAIR RECODING OF MULTIPLIERS
• This method
→ derived from the booth algorithm
→ reduces the number of summands by a factor of 2
• Group the Booth-recoded multiplier bits in pairs. (Figure 6.14 & 6.15).
• The pair (+1 -1) is equivalent to the pair (0 +1).
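Continuing the Booth sketch above, pairing the recoded digits gives one base-4 digit in
{-2, -1, 0, +1, +2} per pair, halving the number of summands (illustrative Python):

    # Bit-pair recoding: pair (d[2k+1], d[2k]) becomes the single digit
    # 2*d[2k+1] + d[2k], applied at weight 4**k.
    def bit_pair_recode(digits):            # digits: Booth digits, LSB first
        return [2 * digits[k + 1] + digits[k]
                for k in range(0, len(digits), 2)]

    p = bit_pair_recode([0, -1, 0, 0, 1, 0])        # Booth digits of 14
    print(p)                                        # [-2, 0, 1]
    print(sum(d * 4**k for k, d in enumerate(p)))   # -2 + 0 + 16 = 14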
CARRY-SAVE ADDITION OF SUMMANDS
• Consider the array for 4*4 multiplication. (Figure 6.16 & 6.18).
• Instead of letting the carries ripple along the rows, they can be "saved" and introduced into the next row, at the
correct weighted positions.
INTEGER DIVISION
• An n-bit positive-divisor is loaded into register M.
An n-bit positive-dividend is loaded into register Q at the start of the operation.
Register A is set to 0 (Figure 6.21).
• After division operation, the n-bit quotient is in register Q,
and the remainder is in register A.
NON-RESTORING DIVISION
• Procedure:
Step 1: Do the following n times
i) If the sign of A is 0, shift A and Q left one bit position and subtract M from A;
otherwise, shift A and Q left and add M to A (Figure 6.23).
ii) Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A (restore).
RESTORING DIVISION
• Procedure: Do the following n times
1) Shift A and Q left one binary position (Figure 6.22).
2) Subtract M from A, and place the answer back in A.
3) If the sign of A is 1, set q0 to 0 and add M back to A (restore A).
If the sign of A is 0, set q0 to 1 and no restoring is done.
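A Python sketch of the restoring procedure (illustrative; the registers are modelled as
integers and the sign of A is tested as an ordinary comparison):

    # Sketch: n-cycle restoring division of an n-bit dividend by a divisor.
    def restoring_divide(dividend, divisor, n=4):
        a, q, m = 0, dividend, divisor   # registers A, Q and M
        for _ in range(n):
            # 1) shift A and Q left one binary position
            a = (a << 1) | ((q >> (n - 1)) & 1)
            q = (q << 1) & ((1 << n) - 1)
            # 2) subtract M from A
            a -= m
            # 3) test the sign of A
            if a < 0:
                q &= ~1                  # set q0 to 0 ...
                a += m                   # ... and restore A
            else:
                q |= 1                   # set q0 to 1
        return q, a                      # quotient in Q, remainder in A

    print(restoring_divide(8, 3))        # (2, 2): 8 = 3*2 + 2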
FLOATING-POINT NUMBERS & OPERATIONS
IEEE STANDARD FOR FLOATING POINT NUMBERS
• Single precision representation occupies a single 32-bit word.
The scale factor has a range of 2^-126 to 2^+127 (which is approximately equal to 10^±38).
• The 32 bit word is divided into 3 fields: sign(1 bit), exponent(8 bits) and mantissa(23 bits).
• Signed exponent=E
Unsigned exponent E'=E+127. Thus, E' is in the range 0<E'<255.
• The last 23 bits represent the fractional part of the mantissa. Since binary normalization is used, the MSB of
the mantissa is always equal to 1, so it need not be stored explicitly (M represents the fractional part).
• The resulting 24-bit mantissa provides a precision equivalent to about 7 decimal-digits (Figure 6.24).
• Double precision representation occupies a single 64-bit word, with E'=E+1023; E' is in the range 0<E'<2047.
• The 53-bit mantissa provides a precision equivalent to about 16 decimal-digits.
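The single-precision fields can be inspected with Python's struct module; a minimal
decode sketch:

    import struct

    # Unpack an IEEE 754 single-precision value into sign, exponent, mantissa.
    def decode_ieee754(value):
        (bits,) = struct.unpack(">I", struct.pack(">f", value))
        sign = bits >> 31
        e_prime = (bits >> 23) & 0xFF      # E' = E + 127 (excess-127)
        mantissa = bits & 0x7FFFFF         # 23-bit fractional part M
        return sign, e_prime, e_prime - 127, mantissa

    print(decode_ieee754(6.5))
    # (0, 129, 2, 5242880): 6.5 = +1.625 * 2^2, M = 0x500000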
ARITHMETIC OPERATIONS ON FLOATING-POINT NUMBERS
Multiply Rule
1) Add the exponents & subtract 127.
2) Multiply the mantissas & determine sign of the result.
3) Normalize the resulting value if necessary.
Divide Rule
1) Subtract the exponents & add 127.
2) Divide the mantissas & determine sign of the result.
3) Normalize the resulting value if necessary.
Add/Subtract Rule
1) Choose the number with the smaller exponent & shift its mantissa right a number of steps equal to the
difference in exponents(n).
2) Set exponent of the result equal to larger exponent.
3) Perform addition/subtraction on the mantissas & determine sign of the result.
4) Normalize the resulting value if necessary.
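The Multiply rule can be demonstrated on unpacked fields (a Python sketch building on
the decoder above; rounding and special values are ignored, normal numbers only):

    # Multiply rule: add exponents and subtract 127, multiply the mantissas,
    # then normalize the result if necessary.
    def fp_multiply_fields(e1_prime, m1, e2_prime, m2):
        e_prime = e1_prime + e2_prime - 127             # step 1
        sig = (1 + m1 / 2**23) * (1 + m2 / 2**23)       # step 2: 1.M1 * 1.M2
        if sig >= 2.0:                                  # step 3: normalize
            sig /= 2.0
            e_prime += 1
        return sig * 2.0 ** (e_prime - 127)

    # 6.5 (E'=129, M=0x500000) times 2.0 (E'=128, M=0)
    print(fp_multiply_fields(129, 0x500000, 128, 0))    # 13.0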
IMPLEMENTING FLOATING-POINT OPERATIONS
• First compare exponents to determine how far to shift the mantissa of the number with the smaller exponent.
• The shift-count value n
→ is determined by 8 bit subtractor &
→ is sent to SHIFTER unit.
• In step 1, the sign of the exponent difference is sent to the SWAP network (Figure 6.26).
If sign=0, then EA≥EB and the mantissas MA & MB are sent straight through the SWAP network.
If sign=1, then EA<EB and the mantissas are swapped before they are sent to the SHIFTER.
• In step 2, a 2:1 MUX is used. The exponent of the result, E, is tentatively determined as EA if EA>EB, or EB
if EA<EB.
• In step 3, CONTROL logic
→ determines whether mantissas are to be added or subtracted.
→ determines sign of the result.
• In step 4, result of step 3 is normalized. The number of leading zeros in M determines number of bit shifts(X) to
be applied to M.
Write the complete control sequence for the instruction: Move (Rs),Rd
• This instruction copies the contents of memory-location pointed to by Rs into Rd. This is a memory read
operation. This requires the following actions
→ fetch the instruction
→ fetch the operand (i.e. the contents of the memory-location pointed by Rs).
→ transfer the data to Rd.
• The control-sequence is written as follows
1) PCout, MARin, Read, Select4, Add, Zin
2) Zout, PCin, Yin, WMFC
3) MDRout, IRin
4) Rsout, MARin, Read
5) MDRinE, WMFC
6) MDRout, Rdin, End
FETCHING A WORD FROM MEMORY
• To fetch instruction/data from memory, processor transfers required address to MAR (whose output is connected
to address-lines of memory-bus).
At the same time, processor issues Read signal on control-lines of memory-bus.
• When requested-data are received from memory, they are stored in MDR. From MDR, they are transferred to
other registers
• MFC (Memory Function Completed): Addressed-device sets MFC to 1 to indicate that the contents of the
specified location
→ have been read &
→ are available on data-lines of memory-bus
• Consider the instruction Move (R1),R2. The sequence of steps is:
1) R1out, MARin, Read ;desired address is loaded into MAR & Read command is issued
2) MDRinE, WMFC ;load MDR from memory bus & Wait for MFC response from memory
3) MDRout, R2in ;load R2 from MDR
where WMFC=control signal that causes processor's control
circuitry to wait for arrival of MFC signal
BRANCHING INSTRUCTIONS
• Control sequence for an unconditional branch instruction is as follows:
1) PCout, MARin, Read, Select4, Add, Zin
2) Zout, PCin, Yin, WMFC
3) MDRout, IRin
4) Offset-field-of-IRout, Add, Zin
5) Zout, PCin, End
• The processing starts as usual; the fetch phase ends in step 3.
• In step 4, the offset-value is extracted from IR by instruction-decoding circuit.
• Since the updated value of PC is already available in register Y, the offset X is gated onto the bus, and an
addition operation is performed.
• In step 5, the result, which is the branch-address, is loaded into the PC.
• The offset X used in a branch instruction is usually the difference between the branch target-address and the
address immediately following the branch instruction. (For example, if the branch instruction is at location 1000
and branch target-address is 1200, then the value of X must be 196, since the PC will be containing the address
1004 after fetching the instruction at location 1000).
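The offset arithmetic in this example is easy to check:

    # Branch offset X = target address - address following the branch instruction
    branch_addr, target = 1000, 1200
    x = target - (branch_addr + 4)      # PC holds 1004 after the fetch
    print(x)                            # 196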
• In case of conditional branch, we need to check the status of the condition-codes before loading a new value into
the PC.
e.g.: Offset-field-of-IRout, Add, Zin, If N=0 then End
If N=0, processor returns to step 1 immediately after step 4.
If N=1, step 5 is performed to load a new value into PC.
MULTIPLE BUS ORGANIZATION
• All general-purpose registers are combined into a single block called the register file.
• Register-file has 3 ports.
1) Two output-ports allow the contents of 2 different registers to be simultaneously placed on buses A & B.
2) Third input-port allows data on bus C to be loaded into a third register during the same clock-cycle.
• Buses A and B are used to transfer source-operands to A & B inputs of ALU.
• Result is transferred to destination over bus C.
• Incrementer-unit is used to increment PC by 4.
• Control sequence for the instruction Add R4,R5,R6 is as follows
1) PCout, R=B, MARin, Read, IncPC
2) WMFC
3) MDRout, R=B, IRin
4) R4outA, R5outB, SelectA, Add, R6in, End
• Instruction execution proceeds as follows:
Step 1--> Contents of PC are passed through ALU using the R=B control-signal and loaded into MAR
to start a memory Read operation. At the same time, PC is incremented by 4.
Step 2--> Processor waits for the MFC signal from memory.
Step 3--> Processor loads the requested data into MDR, and then transfers it to IR.
Step 4--> The instruction is decoded and the add operation takes place in a single step.
Note:
To execute instructions, the processor must have some means of generating the control signals needed in the
proper sequence. There are two approaches for this purpose:
1) Hardwired control and 2) Microprogrammed control.
HARDWIRED CONTROL
• Decoder/encoder block is a combinational-circuit that generates required control-outputs depending on state of
all its inputs.
• Step-decoder provides a separate signal line for each step in the control sequence.
Similarly, output of instruction-decoder consists of a separate line for each machine instruction.
• For any instruction loaded in IR, one of the output-lines INS1 through INSm is set to 1, and all other lines are set
to 0.
• The input signals to encoder-block are combined to generate the individual control-signals Yin, PCout, Add, End
and so on.
• For example, Zin = T1 + T6.ADD + T4.BR
This signal is asserted during time-slot T1 for all instructions,
during T6 for an Add instruction, and
during T4 for an unconditional branch instruction.
• When RUN=1, counter is incremented by 1 at the end of every clock cycle.
When RUN=0, counter stops counting.
• Sequence of operations carried out by this machine is determined by wiring of logic elements, hence the name
“hardwired”.
• Advantage: Can operate at high speed.
Disadvantage: Limited flexibility.
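The encoder equation for Zin shown above maps directly onto boolean logic; a Python
sketch (signal names follow the notes):

    # Zin = T1 + T6.ADD + T4.BR, evaluated as a boolean function of the
    # current time step and the decoded instruction lines.
    def z_in(t, ins_add=False, ins_br=False):
        return (t == 1) or (t == 6 and ins_add) or (t == 4 and ins_br)

    print(z_in(1))                 # True: asserted in T1 for every instruction
    print(z_in(6, ins_add=True))   # True: asserted in T6 for Add
    print(z_in(4))                 # False: in T4 only a branch asserts Zin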
COMPLETE PROCESSOR
• This has separate processing-units to deal with integer data and floating-point data.
• A data-cache is inserted between these processing-units & main-memory.
• Instruction-unit fetches instructions
→ from an instruction-cache or
→ from main-memory when desired instructions are not already in cache
• Processor is connected to system-bus &
hence to the rest of the computer by means of a bus interface
• Using separate caches for instructions & data is common practice in many processors today.
• A processor may include several units of each type to increase the potential for concurrent operations.
MICROPROGRAMMED CONTROL
• Control-signals are generated by a program similar to machine language programs.
• Control word(CW) is a word whose individual bits represent various control-signals(like Add, End, Zin). {Each of
the control-steps in control sequence of an instruction defines a unique combination of 1s & 0s in the CW}.
• A sequence of CWs corresponding to the control-sequence of a machine instruction constitutes the microroutine.
• The individual control-words in the microroutine are referred to as microinstructions.
• The microroutines for all instructions in the instruction-set of a computer are stored in a special memory called
the control store(CS).
• Control-unit generates control-signals for any instruction by sequentially reading CWs of corresponding
microroutine from CS.
• Microprogram counter(µPC) is used to read CWs sequentially from CS.
• Every time a new instruction is loaded into IR, output of "starting address generator" is loaded into µPC.
• Then, µPC is automatically incremented by clock,
causing successive microinstructions to be read from CS.
Hence, control-signals are delivered to various parts of processor in correct sequence.
ORGANIZATION OF MICROPROGRAMMED CONTROL UNIT (TO SUPPORT CONDITIONAL BRANCHING)
• In case of conditional branching, microinstructions specify which of the external inputs and condition-codes
should be checked as a condition for branching to take place.
• The starting and branch address generator block loads a new address into µPC when a microinstruction instructs
it to do so.
• To allow implementation of a conditional branch, inputs to this block consist of
→ external inputs and condition-codes
→ contents of IR
• µPC is incremented every time a new microinstruction is fetched from microprogram memory except in following
situations
i) When a new instruction is loaded into IR, µPC is loaded with starting-address of microroutine for that
instruction.
ii) When a Branch microinstruction is encountered and branch condition is satisfied, µPC is loaded with
branch-address.
iii) When an End microinstruction is encountered, µPC is loaded with address of first CW in microroutine for
instruction fetch cycle.
MICROINSTRUCTIONS
• Drawbacks of microprogrammed control:
1) Assigning individual bits to each control-signal results in long microinstructions
because the number of required signals is usually large.
2) Available bit-space is poorly used because
only a few bits are set to 1 in any given microinstruction.
• Solution: Signals can be grouped because
1) Most signals are not needed simultaneously.
2) Many signals are mutually exclusive.
• Grouping control-signals into fields requires a little more hardware because
decoding-circuits must be used to decode bit patterns of each field into individual control signals.
• Advantage: This method results in a smaller control-store (only 20 bits are needed to store the patterns for the
42 signals).
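The saving can be estimated with a short Python calculation (the field sizes below are an
assumed example, not the exact grouping behind the 42-signal/20-bit figure in the notes):

    from math import ceil, log2

    # Signals within a field are mutually exclusive, so a field of n signals
    # needs ceil(log2(n + 1)) bits (one spare code means "no signal active").
    fields = [15, 7, 7, 7, 3, 1, 1, 1]        # assumed example grouping

    def bits_for(n):
        return 1 if n == 1 else ceil(log2(n + 1))

    print(sum(fields))                        # 42 individual control signals
    print(sum(bits_for(n) for n in fields))   # 18 bits vs 42 with one bit each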
MICROPROGRAM SEQUENCING
• The microprogram requires several branch microinstructions which perform no useful operation. Thus, they
detract from the operating speed of the computer.
• Solution: Include an address-field as a part of every microinstruction to indicate the location of the next
microinstruction to be fetched. (This means every microinstruction becomes a branch microinstruction).
• The flexibility of this approach comes at the expense of additional bits for the address-field.
• Advantage: Separate branch microinstructions are virtually eliminated. There are few limitations in assigning
addresses to microinstructions. There is no need for a counter to keep track of sequential addresses. Hence, the
μPC is replaced with a μAR (Microinstruction Address Register), which is loaded from the next-address field in
each microinstruction.
• The next-address bits are fed through the OR gate to the μAR, so that the address can be modified on the basis
of the data in the IR, external inputs and condition-codes.
• The decoding circuits generate the starting-address of a given microroutine on the basis of the opcode in the IR.
PREFETCHING MICROINSTRUCTIONS
• Drawback of microprogrammed control: Slower operating speed because of the time it takes to fetch
microinstructions from the control-store.
• Solution: Faster operation is achieved if the next microinstruction is pre-fetched while the current one is being
executed.
Emulation
• The main function of microprogrammed control is to provide a means for simple, flexible and relatively
inexpensive execution of machine instructions.
• Its flexibility in using a machine's resources allows diverse classes of instructions to be implemented.
• Suppose we add to the instruction repertoire of a given computer M1 an entirely new set of instructions that is
in fact the instruction-set of a different computer M2.
• Programs written in the machine language of M2 can then be run on computer M1, i.e. M1 emulates M2.
• Emulation allows us to replace obsolete equipment with more up-to-date machines.
• If the replacement computer fully emulates the original one, then no software changes have to be made to run
existing programs.
• Emulation is easiest when the machines involved have similar architectures.