Computer Architecture
Instruction code is usually divided into two parts: an opcode and an address (operand)
o Operation Code (opcode):
group of bits that define the operation
Eg: add, subtract, multiply, shift, complement.
No. of bits required for the opcode depends on the no. of operations available in the
computer:
an n-bit opcode can specify up to 2^n operations
o Address (operand):
specifies the location of operands (registers or memory words)
Memory words are specified by their address
Registers are specified by their k-bit binary code
a k-bit code can specify up to 2^k registers
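These bit counts follow directly from binary encoding; a small Python sketch (the function name is illustrative):

```python
import math

def bits_needed(n_items):
    """Minimum number of bits required to encode n_items distinct codes."""
    return max(1, math.ceil(math.log2(n_items)))

# A 3-bit opcode covers up to 2**3 = 8 operations;
# a 6-bit register code covers up to 2**6 = 64 registers.
print(bits_needed(8), bits_needed(64))
```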
The data register (DR) holds the operand read from memory.
The accumulator (AC) register is a general purpose processing register.
The instruction read from memory is placed in the instruction register (IR).
The temporary register (TR) is used for holding temporary data during the
processing.
The memory address register (AR) has 12 bits since this is the width of a memory
address.
The program counter (PC) also has 12 bits and it holds the address of the next
instruction to be read from memory after the current instruction is executed.
Two registers are used for input and output.
The input register (INPR) receives an 8-bit character from an input
device.
The output register (OUTR) holds an 8-bit character for an output
device.
Common Bus System:
The basic computer has eight registers, a memory unit, and a control unit
Paths must be provided to transfer information from one register to another and
between memory and registers.
The operation code (opcode) part of the instruction contains three bits and the
meaning of the remaining 13 bits depends on the operation code encountered.
A memory-reference instruction uses 12 bits(0-11) to specify an address,
next 3 bits for operation code(opcode) and one bit to specify the addressing
mode I. I is equal to 0 for direct address and to 1 for indirect address.
The register-reference instructions are recognized by the operation code 111 with a 0
in the leftmost bit (bit 15) of the instruction.
A register-reference instruction specifies an operation on the AC register. So an
operand from memory is not needed. Therefore, the other 12(0-11) bits are used to
specify the operation to be executed.
Hardwired control: the control logic is implemented with gates, flip-flops,
decoders, and other digital circuits. Microprogrammed control: the control
information is stored in a control memory, which is programmed to initiate the
required sequence of microoperations.
Hardwired control has the advantage that it can be optimized to produce a fast
mode of operation; compared with hardwired control, microprogrammed operation
is slow.
Hardwired control requires changes in the wiring among the various components
if the design has to be modified or changed; in microprogrammed control, the
required changes or modifications can be done by updating the microprogram in
control memory.
The block diagram of the hardwired control unit is shown in Fig. 5-6.
It consists of two decoders, a sequence counter, and a number of control logic gates.
An instruction read from memory is placed in the instruction register (IR). It is
divided into three parts: The I bit, the operation code, and bits 0 through 11.
The operation code in bits 12 through 14 are decoded with a 3 x 8 decoder. The
eight outputs of the decoder are designated by the symbols D0 through D7.
Bit 15 of the instruction is transferred to a flip-flop designated by the symbol I.
Bits 0 through 11 are applied to the control logic gates.
The 4-bit sequence counter can count in binary from 0 through 15.
The outputs of the counter are decoded into 16 timing signals T0 through T15.
The sequence counter SC can be incremented or cleared synchronously.
The counter is incremented to provide the sequence of timing signals out of the 4 x 16
decoder.
As an example, consider the case where SC is incremented to provide timing signals
T0, T1, T2, T3 and T4 in sequence. At time T4, SC is cleared to 0 if decoder output D3
is active.
This is expressed symbolically by the statement
D3T4: SC ← 0
The timing diagram of Fig. 5-7 shows the time relationship of the control signals.
The sequence counter SC responds to the positive transition of the clock.
Initially, the CLR input of SC is active. The first positive transition of the clock clears
SC to 0, which in turn activates the timing signal T0 out of the decoder. T0 is active
during one clock cycle.
SC is incremented with every positive clock transition, unless its CLR input is active.
This produces the sequence of timing signals T0, T1, T2, T3, T4, and so on, as shown in the
diagram.
The last three waveforms in Fig.5-7 show how SC is cleared when D3T4 = 1.
Output D3 from the operation decoder becomes active at the end of timing signal T2.
When timing signal T4 becomes active, the output of the AND gate that
implements the control function D3T4 becomes active.
This signal is applied to the CLR input of SC. On the next positive clock
transition (the one marked T4 in the diagram) the counter is cleared to 0.
This causes the timing signal T0 to become active instead of T5, which would
have been active had SC been incremented instead of cleared.
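The behaviour of SC and the decoded timing signals can be simulated in a few lines (a simplified model of the 4 x 16 decoder; the function name is illustrative):

```python
def timing_signals(d3_active, cycles):
    """Simulate the 4-bit sequence counter SC.  On each positive clock
    transition SC is cleared when the control function D3*T4 is active,
    otherwise incremented; the decoder asserts T<SC> each cycle."""
    sc, emitted = 0, []
    for _ in range(cycles):
        emitted.append(f"T{sc}")
        if d3_active and sc == 4:     # D3T4: SC <- 0
            sc = 0
        else:
            sc = (sc + 1) % 16
    return emitted

# With D3 active, T0 follows T4 instead of T5:
print(timing_signals(True, 7))    # ['T0', 'T1', 'T2', 'T3', 'T4', 'T0', 'T1']
```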
5. Instruction Cycle:
A program residing in the memory unit of the computer consists of a sequence of
instructions.
The program is executed in the computer by going through a cycle for each instruction.
Each instruction cycle in turn is subdivided into a sequence of sub cycles or phases.
In the basic computer each instruction cycle consists of the following phases:
1. Fetch an instruction from memory.
2. Decode the instruction.
3. Read the effective address from memory if the instruction has an indirect address.
4. Execute the instruction.
Upon the completion of step 4, the control goes back to step 1 to
fetch, decode, and execute the next instruction.
Fetch and Decode:
Initially, the program counter PC is loaded with the address of the first instruction in the
program.
The sequence counter SC is cleared to 0, providing a decoded timing signal T0.
The microoperations for the fetch and decode phases can be specified by the
following register transfer statements.
Figure 5-8 shows how the first two register transfer statements are implemented in the
bus system.
To provide the data path for the transfer of PC to AR we must apply timing signal
T0 to achieve the following connection:
o Place the content of PC onto the bus by making the bus selection inputs S2,
S1, S0 equal to 010.
o Transfer the content of the bus to AR by enabling the LD input of AR.
In order to implement the second statement it is necessary to use timing signal T1
to provide the following connections in the bus system.
o Enable the read input of memory.
o Place the content of memory onto the bus by making S2S1S0=111.
o Transfer the content of the bus to IR by enabling the LD input of IR.
o Increment PC by enabling the INR input of PC.
Multiple input OR gates are included in the diagram because there are other control
functions that will initiate similar operations.
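The two fetch transfers above can be sketched as plain assignments (a simplified model; the bus selects and LD/INR signals are implicit, and memory is modeled as a dictionary):

```python
def fetch(memory, pc):
    """T0: AR <- PC;  T1: IR <- M[AR], PC <- PC + 1."""
    ar = pc                    # T0: PC on the bus (S2S1S0 = 010), LD(AR)
    ir = memory[ar]            # T1: memory read (S2S1S0 = 111), LD(IR)
    pc = (pc + 1) & 0xFFF      # T1: INR(PC); the 12-bit counter wraps
    return ir, pc

ir, pc = fetch({0: 0x2041, 1: 0x7800}, 0)   # hypothetical instruction words
print(hex(ir), pc)
```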
Determine the Type of Instruction:
• Initialise SC =0.
• Fetch phase
- At time T0 get the address of the next instruction to be executed from the Program
Counter (PC).
AR ← PC
- At time T1 get the instruction from the memory address in AR into the Instruction
Register (IR) and increment PC.
IR ← M[AR]
PC ← PC + 1
• Decode phase
- At time T2 decode the instruction: bits 12-14 of IR form the opcode, bits 0-11
of IR are copied to AR, and bit 15 is copied to the I flip-flop.
• The decode phase at time T3 determines the type of instruction read from memory.
• If decoder output D7 is equal to 1, the instruction is a register-reference or
I/O instruction.
- If I = 0 it is a register-reference instruction. Execute the instruction and set SC = 0.
- If I = 1 it is an I/O instruction. Execute the instruction and set SC = 0.
• If D7 = 0 it is a memory-reference instruction, and the addressing mode is
determined from the value of I.
- If I = 1, we have a memory-reference instruction with an indirect address.
The effective address is read from memory using the microoperation
AR ← M[AR]
- If I = 0, the instruction uses a direct address and nothing further is done;
AR already holds the effective address. Execute the instruction and set SC = 0.
After executing the instruction control shifts to process the next instruction.
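The decision tree above can be summarized in a short sketch that splits a 16-bit instruction word into its I, opcode, and address fields (function and label names are illustrative):

```python
def decode(ir):
    """Classify a basic-computer instruction word at T3."""
    i = (ir >> 15) & 1           # bit 15 -> I flip-flop
    opcode = (ir >> 12) & 0b111  # bits 12-14 -> decoder outputs D0..D7
    addr = ir & 0xFFF            # bits 0-11 -> AR
    if opcode == 0b111:          # D7 = 1
        return ("I/O" if i else "register-reference"), addr
    if i:                        # D7 = 0, I = 1
        return "memory-reference (indirect)", addr
    return "memory-reference (direct)", addr

print(decode(0x7800)[0])   # register-reference
print(decode(0x9123)[0])   # memory-reference (indirect)
```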
6. Microprogrammed Control (Control Memory)
• Microinstruction : Each word in control memory contains within it a
microinstruction.
• Microoperation : A microinstruction specifies one or more microoperations.
• Microprogram : A sequence of microinstructions forms what is called a
microprogram.
• A computer that employs a microprogrammed control unit will have two separate
memories:
1. The main memory : This memory is available to the user for storing programs. The
user's program in main memory consists of machine instructions and data.
2. The control memory : This memory contains a fixed microprogram that cannot be
altered by the user. The microprogram consists of microinstructions that specify various
internal control signals for execution of register microoperations.
• Block diagram of microprogrammed control unit
• Advantage: Once the hardware configuration is established no need for further
hardware or wiring changes. For different control sequence a different set of
microinstructions is used in control memory.
7. Address Sequencing
• Microinstructions are usually stored in groups, where each group specifies a
routine and each routine specifies how to carry out an instruction.
• Each routine must be able to branch to the next routine in the sequence.
• An initial address is loaded into the CAR when power is turned on; this is usually the
address of the first microinstruction in the instruction fetch routine.
• Next, the control unit must determine the effective address of the instruction.
Mapping :
The next step is to generate the microoperations that execute the instruction.
– This involves taking the instruction's opcode and transforming it into the
address of the instruction's microprogram in control memory. This process is
called mapping.
– While microinstruction sequences are usually determined by incrementing the
CAR, this is not always the case. If the processor‘s control unit can support
subroutines in a microprogram, it will need an external register for storing return
addresses.
• When instruction execution is finished, control must be returned to the fetch routine.
This is done using an unconditional branch.
• Addressing sequencing capabilities of control memory include:
– Incrementing the CAR
– Unconditional and conditional branching (depending on status bit).
– Mapping instruction bits into control memory addresses
– Handling subroutine calls and returns.
Conditional Branching
• Status bits
– provide parameter information such as the carry-out from the adder, sign of a number,
mode bits of an instruction, etc.
– control the conditional branch decisions made by the branch logic together with the
field in the microinstruction that specifies a branch address.
Branch Logic
• Branch Logic - may be implemented in one of several ways:
– The simplest way is to test the specified condition and branch if the condition is true;
else increment the address register.
– This is implemented using a multiplexer:
• The status bit to test is one of eight status bits, indicated by a 3-bit select number.
• If the selected status bit is 1, the output is 1; else it is 0.
• A 1 generates the control signal for the branch; a 0 generates the signal to increment the
CAR.
• Unconditional branching occurs by fixing the status bit as always being 1.
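The multiplexer-based branch logic can be sketched like this (the status-bit layout and names are assumptions for illustration):

```python
def next_address(car, branch_addr, status_bits, select, unconditional=False):
    """Pick the next control-memory address: take the branch address if
    the status bit chosen by the 3-bit select number is 1 (or the branch
    is unconditional), otherwise increment CAR."""
    bit = 1 if unconditional else status_bits[select]
    return branch_addr if bit else car + 1

status = [0, 1, 0, 0, 0, 0, 0, 0]              # e.g. bit 1 = carry-out set
print(next_address(10, 40, status, select=1))  # branch taken -> 40
print(next_address(10, 40, status, select=0))  # not taken    -> 11
```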
Mapping of Instruction
• Branching to the first word of a microprogram is a special type of branch. The branch is
indicated by the opcode of the instruction.
• The mapping scheme shown in the figure allows four microinstructions per
routine, as well as overflow space from 1000000 to 1111111.
Mapping of Instruction Code to Microoperation address
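One common scheme places the 4-bit opcode between a leading 0 and two trailing 0s, giving a 7-bit control address of the form 0|opcode|00. A sketch:

```python
def map_opcode(opcode):
    """Map a 4-bit opcode to a 7-bit control-memory address 0xxxx00.
    Each routine gets four consecutive microinstruction words; addresses
    1000000 to 1111111 remain free as overflow space for longer routines."""
    assert 0 <= opcode < 16
    return opcode << 2          # append the two trailing zero bits

print(format(map_opcode(0b0101), '07b'))   # 0010100
```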
Subroutines
• Subroutine calls are a special type of branch where we return to the
instruction that follows the calling instruction.
– Provision must be made to save the return address, since it cannot be written
into ROM.
UNIT – II
CENTRAL PROCESSING UNIT
Control Word
ALU CONTROL
Encoding of ALU Operations:
2. STACK ORGANIZATION
• The two operations performed on a stack are insertion and deletion.
- The operation of insertion is called PUSH.
- The operation of deletion is called POP.
• These operations are simulated by incrementing and decrementing the stack pointer
register (SP).
Adv:
• Efficient computation of complex arithmetic expressions.
• Execution of instructions is fast because operand data are stored in
consecutive memory locations.
• Instructions are short because they do not need an address field.
Disadv:
• The size of the program increases.
REGISTER STACK ORGANIZATION
• A stack can be placed in a portion of a large memory or it can be organized as a
collection of a finite number of memory words or registers.
• The figure shows the organization of a 64-word register stack
• The stack pointer register SP contains a binary number whose value is equal to the
address of the word that is currently on top of the stack.
• Three items are placed in the stack: A, B, C, in that order.
• In above figure C is on top of the stack so that the content of SP is 3.
• For removing the top item, the stack is popped by reading the memory word at address
3 and decrementing the content of SP.
• Now the top of the stack is B, so that the content of SP is 2.
• Similarly, for inserting a new item, the stack is pushed by incrementing SP and
writing a word in the next higher location in the stack.
• In a 64-word stack, the stack pointer contains 6 bits because 2^6 = 64.
• Since SP has only six bits, it cannot hold a number greater than 63 (111111 in
binary).
• When 63 is incremented by 1, the result is 0, since 111111 + 1 = 1000000 in
binary, but SP can accommodate only the six least significant bits.
• The one-bit register FULL is set to 1 when the stack is full.
• Similarly, when 000000 is decremented by 1, the result is 111111, and the
one-bit register EMTY is set to 1 when the stack is empty of items.
• DR is the data register that holds the binary data to be written into or read out of the
stack.
PUSH:
• Initially, SP is cleared to 0, EMTY is set to 1, and FULL is cleared to 0, so that SP
points to the word at address 0 and the stack is marked empty and not full.
• If the stack is not full (if FULL = 0), a new item is inserted with a push operation.
• The push operation is implemented with the following sequence of microoperations:
SP ← SP + 1
M[SP] ← DR
IF (SP = 0) then (FULL ← 1)
EMTY ← 0
• The stack pointer is incremented so that it points to the address of the
next-higher word.
• A memory write operation inserts the word from DR into the top of the stack.
• The first item stored in the stack is at address 1. The last item is stored at address 0.
• If SP reaches 0, the stack is full of items, so FULL is set to 1. This condition is
reached if the top item prior to the last push was in location 63 and, after
incrementing SP, the last item is stored in location 0.
• Once an item is stored in location 0, there are no more empty registers in the stack, so
EMTY is cleared to 0.
POP:
• A new item is deleted from the stack if the stack is not empty (if EMTY = 0).
• The pop operation consists of the following sequence of microoperations:
DR ← M[SP]
SP ← SP − 1
IF (SP = 0) then (EMTY ← 1)
FULL ← 0
• The top item is read from the stack into DR.
• The stack pointer is then decremented. If its value reaches zero, the stack is empty, so
EMTY is set to 1. This condition is reached if the item read was in location 1: once this
item is read out, SP is decremented and reaches the value 0, which is the initial value of SP.
• If a pop operation reads the item from location 0 and SP is then decremented, SP
changes to 111111, which is equivalent to decimal 63. In this configuration, the
word in address 0 receives the last item in the stack.
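The push and pop microoperation sequences above can be simulated directly (a sketch; the class name is illustrative):

```python
class RegisterStack:
    """64-word register stack with SP, FULL, and EMTY flags."""
    def __init__(self, size=64):
        self.size = size
        self.mem = [0] * size
        self.sp = 0                  # SP points to the word at address 0
        self.full = False
        self.emty = True

    def push(self, dr):
        if self.full:                # guarded by "if FULL = 0" in the notes
            raise OverflowError("stack full")
        self.sp = (self.sp + 1) % self.size   # SP <- SP + 1 (6-bit wrap)
        self.mem[self.sp] = dr                # M[SP] <- DR
        if self.sp == 0:                      # IF (SP = 0) then FULL <- 1
            self.full = True
        self.emty = False                     # EMTY <- 0

    def pop(self):
        if self.emty:                # guarded by "if EMTY = 0" in the notes
            raise IndexError("stack empty")
        dr = self.mem[self.sp]                # DR <- M[SP]
        self.sp = (self.sp - 1) % self.size   # SP <- SP - 1
        if self.sp == 0:                      # IF (SP = 0) then EMTY <- 1
            self.emty = True
        self.full = False                     # FULL <- 0
        return dr

s = RegisterStack()
for item in ("A", "B", "C"):
    s.push(item)
print(s.sp, s.pop())    # 3 C
```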
MEMORY STACK ORGANIZATION
The common arithmetic expressions are written in infix notation, with each operator written
between the operands. An expression can also be expressed in Prefix or Postfix notation as
follows:
Eg: A+B → Infix notation
+AB → Prefix or Polish notation
AB+ → Postfix or reverse Polish notation
3. Instruction Formats:
• The bits of the instruction are divided into groups called fields.
• The most common fields found in instruction formats are:
• An operation code field that specifies the operation to be performed.
• An address field that designates a memory address or a processor register.
• A mode field that specifies the way the operand or the effective address is determined.
• Computers may have instructions of several different lengths containing varying
numbers of addresses.
• The number of address fields in the instruction format of a computer depends on the
internal organization of its registers.
• Most computers fall into one of three types of CPU organizations:
1. Single accumulator organization.
2. General register organization.
3. Stack organization.
Three Address Instruction
• This has three address fields to specify registers or memory locations.
• Programs created are much shorter in size, but the number of bits per
instruction increases.
• Makes program creation easier.
• Programs run more slowly because each instruction contains many bits of
information.
• Expression: X = (A+B)*(C+D)
• R1, R2 are registers. M[] is any memory location
• It can be implemented using three address instruction as
ADD R1, A, B //R1 = M[A] + M[B]
ADD R2, C, D //R2 = M[C] + M[D]
MUL X, R1, R2 //M[X] = R1 * R2
Two Address Instruction
• This is common in commercial computers. Here two addresses can be specified in
the instruction.
• Expression: X = (A+B)*(C+D)
• R1, R2 are registers. M[] is any memory location.
• Program using two address instruction
MOV R1, A //R1 = M[A]
ADD R1, B //R1 = R1 + M[B]
MOV R2, C //R2 = M[C]
ADD R2, D //R2 = R2 + M[D]
MUL R1, R2 //R1 = R1 * R2
MOV X, R1 //M[X] = R1
One address instruction
• These instructions use an implied ACCUMULATOR register for data manipulation.
• One operand is in the accumulator and the other is in a register or memory
location.
• Implied means that the CPU already knows that one operand is in the
accumulator, so there is no need to specify it.
• Expression: X = (A+B)*(C+D)
• AC is the accumulator. M[] is any memory location; M[T] is a temporary location.
• Program using one address instruction
LOAD A //AC = M[A]
ADD B //AC = AC + M[B]
STORE T //M[T] = AC
LOAD C //AC = M[C]
ADD D //AC = AC + M[D]
MUL T //AC = AC * M[T]
STORE X //M[X] = AC
Zero address instruction
• A stack-based computer does not use an address field in its instructions. To
evaluate an expression, it is first converted to reverse Polish notation, i.e.,
postfix notation.
• Expression: X = (A+B)*(C+D)
• Postfixed : X = AB+CD+*
• TOP means top of stack. M[X] is any memory location
• Program using zero address instruction
PUSH A //TOP = A
PUSH B //TOP = B
ADD //TOP = A+B
PUSH C //TOP = C
PUSH D //TOP = D
ADD //TOP = C+D
MUL //TOP = (C+D)*(A+B)
POP X //M[X] = TOP
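The zero-address program can be traced with an ordinary list as the stack (a sketch; the instruction format is taken from the listing above):

```python
def run_zero_address(program, memory):
    """Evaluate a zero-address stack program.  PUSH/POP name a memory
    location; ADD and MUL operate on the top two stack items."""
    stack = []
    for op, *arg in (line.split() for line in program):
        if op == "PUSH":
            stack.append(memory[arg[0]])
        elif op == "POP":
            memory[arg[0]] = stack.pop()
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return memory

mem = {"A": 2, "B": 3, "C": 4, "D": 5}
prog = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
print(run_zero_address(prog, mem)["X"])    # (2+3)*(4+5) = 45
```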
4. ADDRESSING MODES
• Addressing modes refers to the way in which the operand of an instruction is
specified.
• The addressing mode specifies a rule for interpreting or modifying the address
field of the instruction before the operand is actually referenced.
Implied Mode
In this addressing mode, the instruction itself specifies the operands implicitly. It is
also called as implicit addressing mode. All register-reference instructions that make
use of the Accumulator are Implied Mode instructions.
Examples:
- The instruction "Complement Accumulator" is an implied-mode instruction.
- RAL – Rotate Left with Carry
Immediate Mode
In this addressing mode, the operand is specified in the instruction explicitly.
Instead of address field, an operand field is present that contains the operand.
• Examples:
- ADD 10 will increment the value stored in the accumulator by 10.
- MOV R,20 initializes register R to a constant value 20.
Register Mode
In this mode the operand is stored in a register inside the CPU. The
instruction has the address of the register where the operand is stored.
Advantages
• Shorter instructions and faster instruction fetch.
• Faster memory access to the operand(s)
Disadvantages
• Very limited address space
• Using multiple registers helps performance but it complicates the instructions.
Register Indirect Mode
• In this mode, the instruction specifies the register whose contents give us the
address of operand which is in memory. Thus, the register contains the address
of operand rather than the operand itself.
Effective Address
• The memory address of an operand consists of two components:
- Starting address of memory segment.
- Effective address or Offset: An offset is determined by adding any
combination of three address elements: displacement, base and index.
• Displacement: It is an 8 bit or 16 bit immediate value given in the instruction.
• Base: Contents of base register, BX or BP.
• Index: Content of index register SI or DI.
Direct Addressing Mode
• In this mode, the effective address of the operand is given in the instruction
itself. A single memory reference is needed to access the data, and no additional
calculation is required to find the effective address of the operand.
Example: ADD R1, 4000 - In this the 4000 is effective address of operand.
Relative Addressing Mode
In this mode the content of the PC (program counter) is added to the address
part of the instruction to obtain the effective address.
EA = A + (PC), where EA is the effective address and PC is the program counter.
The operand is A cells away from the current cell (the one pointed to by the PC).
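A small sketch of the EA computation, assuming a 12-bit address field interpreted as a signed displacement (an assumption of this illustration; some machines use unsigned offsets):

```python
def relative_ea(pc, a, width=12):
    """EA = A + (PC), with A sign-extended from `width` bits."""
    if a >= 1 << (width - 1):          # sign-extend the address field
        a -= 1 << width
    return (pc + a) & ((1 << width) - 1)

# After the fetch has incremented PC to 0x3FF, an address field of 0x005
# names the operand 5 cells beyond the next instruction.
print(hex(relative_ea(0x3FF, 0x005)))   # 0x404
```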
Base Register Addressing Mode
In this mode the content of a base register is added to the address part of the
instruction to obtain the effective address: EA = A + (R), where A is the
displacement and R holds a pointer to the base address.
Data Manipulation Instruction
• Data Manipulation Instructions perform operations on data and provide the
computational capabilities for the computer.
• They are divided into three basic types:
1) Arithmetic,
2) Logical and bit manipulation,
3) Shift Instruction
Arithmetic Instructions
Shift Instructions
• Shifts are operations in which the bits of a word are moved left or right.
6. PROGRAM CONTROL
• Program control instructions specify conditions for altering the content of the
program counter, while data transfer and manipulation instructions specify
conditions for data-processing operations.
Subroutine Call and Return
• A subroutine is a self-contained sequence of instructions that performs a given
computational task.
• During the execution of a program, a subroutine may be called; when it is called,
a branch is executed to the beginning of the subroutine to start executing its set
of instructions.
• After the subroutine has been executed, a branch is made back to the main
program.
• A subroutine call is implemented with the following microoperations:
CALL:
SP ← SP-1 // Decrement stack pointer
M[SP] ←PC // Push content of PC onto the stack
PC←Effective Address //Transfer control to the subroutine
RETURN:
PC ← M[SP] // Pop stack and transfer to PC
SP ← SP+1 // Increment stack pointer
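The CALL/RETURN microoperations map naturally onto a memory-resident stack that grows toward lower addresses; a minimal sketch (the class name and initial SP value are arbitrary choices for illustration):

```python
class Machine:
    """Minimal model of the CALL/RETURN microoperation sequences."""
    def __init__(self, sp=0x100):
        self.mem = {}
        self.pc = 0
        self.sp = sp

    def call(self, effective_address):
        self.sp -= 1                    # SP <- SP - 1
        self.mem[self.sp] = self.pc     # M[SP] <- PC (save return address)
        self.pc = effective_address     # PC <- effective address

    def ret(self):
        self.pc = self.mem[self.sp]     # PC <- M[SP] (pop return address)
        self.sp += 1                    # SP <- SP + 1

m = Machine()
m.pc = 0x020                            # address of instruction after CALL
m.call(0x200)                           # branch to the subroutine
m.ret()
print(hex(m.pc))                        # back at 0x20
```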
Program Interrupt
Types of Interrupts
1) External Interrupts
Arise from an I/O device, a timing device, a circuit monitoring the power
supply, or any other external source.
2) Internal Interrupts or TRAP
Caused by register overflow, attempt to divide by zero, an invalid operation
code, stack overflow, and protection violation
3) Software Interrupts
Initiated by executing an instruction (INT or RST); used by the programmer to
initiate an interrupt procedure at any desired point in the program.
A computer with relatively few instructions is classified as a reduced instruction
set computer, abbreviated RISC.
CISC Characteristics:
• A large number of instructions, typically from 100 to 250.
• Some instructions that perform specialized tasks and are used infrequently.
• A large variety of addressing modes, typically from 5 to 20 different modes.
• Variable-length instruction formats
• Instructions that manipulate operands in memory
RISC Characteristics:
• Relatively few instructions
• Relatively few addressing modes
• Memory access limited to load and store instructions
• All operations done within the registers of the CPU
• Fixed-length, easily decoded instruction format
• Single-cycle instruction execution
• Hardwired rather than microprogrammed control
• A relatively large number of registers in the processor unit
• Efficient instruction pipeline
UNIT III
COMPUTER ARITHMETIC
Arithmetic Processor
• Arithmetic instructions in digital computers manipulate data to produce results
necessary for the solution of computational problems.
• An arithmetic processor is the part of a processor unit that executes arithmetic
instructions.
• An arithmetic instruction may specify binary or decimal data, and it may be
represented in, Fixed point (integer or fraction) OR floating point form.
• The simplest arithmetic processor supports only the binary fixed-point add instruction.
• Data types considered for the arithmetic operations are,
o Fixed-point binary data in signed-magnitude representation
o Fixed-point binary data in signed-2's complement representation
o Floating-point binary data
o Binary-coded decimal (BCD) data
• Negative fixed-point binary numbers can be represented in three ways:
o Signed magnitude (most computers use this for floating-point operations)
o Signed 1's complement
o Signed 2's complement (most computers use this for integers)
Addition(Subtraction) algorithm
• When the signs of A and B are identical(different), add the two magnitudes and
attach the sign of A to the result.
• When the sign of A and B are different(identical), compare the magnitudes and
subtract the smaller number from the larger.
• Choose the sign of the result to be the same as A if A > B, or the complement of
the sign of A if A < B.
• For equal magnitudes, subtract B from A and make the sign of the result positive.
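The rules can be checked with a direct sketch over (sign, magnitude) pairs, where sign 0 means plus and 1 means minus (a convention chosen for this illustration):

```python
def sm_add(a_sign, a_mag, b_sign, b_mag):
    """Signed-magnitude addition following the rules above.  For equal
    magnitudes with unlike signs the result is +0."""
    if a_sign == b_sign:               # like signs: add the magnitudes,
        return a_sign, a_mag + b_mag   # attach the sign of A
    if a_mag > b_mag:                  # unlike signs: subtract the
        return a_sign, a_mag - b_mag   # smaller from the larger
    if a_mag < b_mag:
        return b_sign, b_mag - a_mag   # complement of A's sign
    return 0, 0                        # equal magnitudes -> +0

print(sm_add(0, 7, 1, 3))    # (+7) + (-3) = (0, 4)
print(sm_add(1, 2, 0, 2))    # (-2) + (+2) = (0, 0)
```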
Hardware Implementation
• Let A and B be two registers that hold the magnitudes of the numbers.
• Let As and Bs be two flip-flops that hold the corresponding signs.
• The result is transferred into A and As.
Hardware Algorithm
• The sign of the result is the same as the sign of A, so no change in A is
required. However, when A < B, the sign of the result is the complement of the
original sign of A.
• The final result is found in register A and its sign in As.
• The leftmost bits in AC and BR represent the sign bits of the numbers.
• The overflow flip-flop V is set to 1 if there is an overflow. The output carry in
this case is discarded.
• The sum is obtained by adding the contents of AC and BR(including their sign
bits).
• The overflow bit V is set to 1 if the exclusive-OR of the last two carries is 1,
and it is cleared to 0 otherwise.
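The rule V = (carry into the sign position) XOR (carry out of the sign position) can be sketched for n-bit 2's-complement words:

```python
def add_with_overflow(x, y, bits=8):
    """Add two `bits`-wide 2's-complement words; return (sum, V) where V
    is the XOR of the carries into and out of the sign position."""
    mask = (1 << bits) - 1
    total = (x & mask) + (y & mask)
    carry_out = (total >> bits) & 1          # carry out of the sign bit
    # sum bit = x ^ y ^ carry-in, so carry into the sign bit is
    # recovered as (x ^ y ^ sum) at that position:
    carry_in = ((x & mask ^ y & mask) ^ (total & mask)) >> (bits - 1) & 1
    return total & mask, carry_in ^ carry_out

# 0x70 + 0x70: two positives give a negative bit pattern -> V = 1
print(add_with_overflow(0x70, 0x70))    # (224, 1)
```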
2. Multiplication algorithms
• The multiplicand is stored in B register and its sign in Bs.
• The sequence counter SC is initially set to a number equal to the number of bits
in the multiplier. The counter is decremented by 1 after forming each partial
product
• The sum of A and B forms a partial product, which is transferred to the EA
register.
• The right shift is denoted by the statement shr EAQ.
• The least significant bit of A is shifted into the most significant position of Q.
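The shift-and-add loop can be sketched on unsigned magnitudes (sign handling via As, Bs is omitted; register names follow the description above):

```python
def multiply(b, q, bits):
    """Shift-and-add multiplication of two `bits`-wide magnitudes.
    B holds the multiplicand, Q the multiplier; the double-length
    product accumulates in (E)A and Q via shr EAQ."""
    a = 0
    for _ in range(bits):              # SC counts the multiplier bits
        if q & 1:                      # low multiplier bit set:
            a += b                     # EA <- A + B (E is the carry)
        q = (q >> 1) | ((a & 1) << (bits - 1))   # shr EAQ: A's low bit
        a >>= 1                                  # enters Q's high bit
    return (a << bits) | q             # product in A (high) and Q (low)

print(multiply(23, 19, 5))    # 23 * 19 = 437
```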
Hardware Algorithm
3. Booth’s Multiplication Algorithm
• The Booth algorithm gives a procedure for multiplying binary integers in
signed-2's complement representation efficiently, i.e., fewer
additions/subtractions are required.
• In this method, a negative multiplier or multiplicand is represented in 2's
complement form.
Hardware Implementation of Booths Algorithm
• The hardware implementation of the booth algorithm requires the register
configuration shown in the figure below.
• The registers used are AC, BR, and QR.
• Qn designates the least significant bit of the multiplier in the register QR.
• An extra flip-flop Qn+1 is appended to QR to facilitate a double bit inspection
of the multiplier.
Algorithm
Example
• Multiplying -9 and -13.
• Since both are negative numbers, they are represented in 2's complement form.
• Binary of 9 = 01001
1's complement of 9 = 10110
2's complement of 9 = 10110 + 1 = 10111 = -9
• Binary of 13 = 01101
1's complement of 13 = 10010
2's complement of 13 = 10010 + 1 = 10011 = -13
• The result is in AC & QR: 0001110101 = 117
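The whole example can be verified with a sketch of the algorithm, with the registers modeled as Python ints and 5-bit operands as in the example above:

```python
def booth_multiply(multiplicand, multiplier, bits=5):
    """Booth's algorithm over `bits`-wide 2's-complement operands."""
    mask = (1 << bits) - 1
    ac, qr, qn1 = 0, multiplier & mask, 0
    br = multiplicand & mask
    for _ in range(bits):                      # SC iterations
        pair = (qr & 1, qn1)
        if pair == (1, 0):                     # 10: AC <- AC - BR
            ac = (ac - br) & mask
        elif pair == (0, 1):                   # 01: AC <- AC + BR
            ac = (ac + br) & mask
        # arithmetic shift right of AC, QR, Qn+1
        qn1 = qr & 1
        qr = (qr >> 1) | ((ac & 1) << (bits - 1))
        sign = (ac >> (bits - 1)) & 1
        ac = (ac >> 1) | (sign << (bits - 1))
    product = (ac << bits) | qr                # result in AC & QR
    if product >= 1 << (2 * bits - 1):         # reinterpret as signed
        product -= 1 << (2 * bits)
    return product

print(booth_multiply(-9, -13))    # 117
```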
4. DIVISION ALGORITHMS
Division of two fixed-point binary numbers in signed-magnitude representation is
done as with paper and pencil, by a process of successive compare, shift, and
subtract operations.
Binary Division
Divide overflow
• A divide-overflow condition occurs if the high-order half of the dividend
constitutes a number greater than or equal to the divisor.
• It may result in a quotient overflow: the length of the registers is finite, so
they cannot hold a number of bits greater than their length.
• The overflow condition is detected when a special flip-flop is set, which we will
call the divide-overflow flip-flop and label DVF.
• In some computers it is the responsibility of the programmers to check if DVF is set
after each divide instruction.
• In older computers the occurrence of a divide overflow stopped the computer and this
condition was referred to as a DIVIDE STOP.
• The best way to avoid a divide overflow is to use floating point data. The divide
overflow can be handled very simply if numbers are in floating point representation.
Hardware algorithm
• The dividend is in A and Q and the divisor in B. The sign of the result is
transferred into Qs to become part of the quotient.
• A divide-overflow condition is tested by subtracting the divisor in B from the
half of the dividend stored in A. If A ≥ B, the divide-overflow flip-flop DVF is
set and the operation is terminated prematurely.
• By doing the process as shown in the flowchart the quotient magnitude is formed in
register Q and the remainder is found in the register A. The quotient sign is in Qs
and the sign of the remainder in As is the same as the original sign of the dividend.
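A simplified sketch of the compare/shift/subtract loop on magnitudes (restoring style; the initial DVF test mirrors the overflow check above):

```python
def divide(dividend, divisor, bits=8):
    """Divide a double-length magnitude (high half in A, low half in Q)
    by a `bits`-wide divisor in B; return (quotient, remainder)."""
    a, q, b = dividend >> bits, dividend & ((1 << bits) - 1), divisor
    if a >= b:                                # DVF: quotient won't fit
        raise OverflowError("divide overflow (DVF set)")
    for _ in range(bits):
        a = (a << 1) | (q >> (bits - 1))      # shl AQ
        q = (q << 1) & ((1 << bits) - 1)
        if a >= b:                            # compare and subtract
            a -= b
            q |= 1                            # set the quotient bit
    return q, a                               # quotient in Q, remainder in A

print(divide(100, 7))    # (14, 2)
```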
FLOATING POINT ARITHMETIC OPERATIONS
Register Configuration
• There are three registers: BR, AC, and QR.
• Each register is subdivided into two parts: a mantissa part (uppercase symbol)
and an exponent part (lowercase symbol).
• The AC has a mantissa whose sign is in As and a magnitude that is in A. The most
significant bit of A is A1; this bit must be a 1 to normalize the number. 'a' is
the exponent part of AC.
• AC is a combination of As, A and a.
• Register BR is subdivided into Bs, B, and b and QR into Qs, Q and q.
• A parallel-adder adds the two mantissas and loads the sum into A and the carry into E.
A separate parallel adder can be used for the exponents.
• The exponents do not have a distinct sign bit; they are represented as a biased
positive quantity.
• The exponents are also connected to a magnitude comparator that provides three binary
outputs to indicate their relative magnitude.
• The numbers in the registers should initially be normalized. After each arithmetic
operation, the result will be normalized. Thus all floating-point operands are always
normalized.
• During addition or subtraction, the two floating-point operands are kept in AC and BR.
The sum or difference is formed in the AC.
• The algorithm can be divided into four consecutive parts:
- Check for zeros.
- Align the mantissas.
- Add or subtract the mantissas
- Normalize the result
• A floating-point number cannot be normalized if it is 0. If this number is used for
computation, the result may also be zero. Instead of checking for zeros during the
normalization process we check for zeros at the beginning and terminate the process if
necessary.
• The alignment of the mantissas must be carried out prior to their operation.
• After the mantissas are added or subtracted, the result may be un-normalized. The
normalization procedure ensures that the result is normalized before it is
transferred to memory. If the magnitudes were subtracted, the result may be zero or
may have an underflow.
• If the mantissa is equal to zero the entire floating-point number in the AC is cleared to
zero. Otherwise, the mantissa must have at least one bit that is equal to 1.
• The mantissa has an underflow if the most significant bit, in position A1, is 0. In that
case, the mantissa is shifted left and the exponent decremented.
• The bit in A1 is checked again and the process is repeated until A1 = 1. When A1 = 1,
the mantissa is normalized and the operation is completed
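The check-underflow/shift-left loop above can be sketched in Python. This is a simplified model that represents the A register as a list of bits; the function name and widths are illustrative, not from the text:

```python
def normalize(mantissa_bits, exponent):
    """Normalize a mantissa after add/subtract.
    mantissa_bits: list of 0/1 bits, MSB first (position A1).
    Clears the whole number to zero if the mantissa is all zeros."""
    if all(b == 0 for b in mantissa_bits):
        return mantissa_bits, 0              # entire number cleared to zero
    while mantissa_bits[0] == 0:             # underflow: A1 = 0
        mantissa_bits = mantissa_bits[1:] + [0]  # shift mantissa left
        exponent -= 1                        # and decrement the exponent
    return mantissa_bits, exponent           # A1 = 1: normalized
```

The loop repeats exactly as the text describes: the bit in A1 is checked, and shifting continues until A1 = 1.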
7. FLOATING POINT DIVISION
• Floating-point division requires that the exponents be subtracted and the mantissas divided
as in fixed point division.
• Steps
- Check for zeros.
- Initialize registers and evaluate the sign
- Align the dividend (check for overflow).
- Subtract the exponents.
- Divide the mantissas.
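The five steps above can be sketched as follows, assuming biased exponents and mantissas represented as fractions in [0.5, 1); the names and the bias value are illustrative, not from the text:

```python
def fp_divide(a_sign, a_mant, a_exp, b_sign, b_mant, b_exp, bias=64):
    """Sketch of floating-point division with biased exponents."""
    if b_mant == 0:
        raise ZeroDivisionError("divisor is zero")   # check for zeros
    if a_mant == 0:
        return 0, 0.0, 0                             # dividend zero -> result zero
    sign = a_sign ^ b_sign                           # evaluate the sign
    if a_mant >= b_mant:                             # align the dividend so the
        a_mant /= 2                                  # quotient stays below 1
        a_exp += 1
    exp = (a_exp - b_exp) + bias                     # subtract the exponents (re-bias)
    mant = a_mant / b_mant                           # divide the mantissas
    return sign, mant, exp
```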
UNIT-IV
INPUT-OUTPUT ORGANIZATION
Peripheral Devices:
The input/output organization of a computer depends upon the size of the computer
and the peripherals connected to it. The I/O subsystem of the computer
provides an efficient mode of communication between the central system and
the outside environment.
The most common input/output devices are the monitor, keyboard, mouse, printer, magnetic
tapes, etc. Devices that are under the direct control of the computer are said to be
connected online.
1. INPUT - OUTPUT INTERFACE
• Input Output Interface provides a method for transferring information between
internal storage and external I/O devices.
• Peripherals connected to a computer need special communication links for interfacing
them with the central processing unit.
• The purpose of the communication link is to resolve the differences that exist between
the central computer and each peripheral.
• The Major Differences are:-
• Peripherals are electromechanical and electromagnetic devices, whereas the CPU and
memory are electronic devices. Therefore, a conversion of signal values may be needed.
• The data transfer rate of peripherals is usually slower than the transfer rate of CPU
and consequently, a synchronization mechanism may be needed.
• Data codes and formats in the peripherals differ from the word format in the CPU and
memory.
• The operating modes of peripherals are different from each other and must be
controlled so as not to disturb the operation of other peripherals connected to the
CPU.
• To resolve these differences, computer systems include special hardware components
between the CPU and peripherals to supervise and synchronize all input and output
transfers. These components are called Interface Units because they interface
between the processor bus and the peripheral devices.
I/O BUS and Interface Module
• The I/O bus defines the typical link between the processor and several peripherals.
• The I/O bus consists of data lines, address lines and control lines. The I/O bus from
the processor is attached to all peripheral interfaces.
• To communicate with a particular device, the processor places a device address on the
address lines.
• Each interface decodes the address and control received from the I/O bus, interprets
them for the peripheral, and provides signals for the peripheral controller.
• It also synchronizes the data flow and supervises the transfer between the peripheral
and the processor.
• Each peripheral has its own controller.
For example, the printer controller controls the paper motion and the print timing. The control
lines are referred to as I/O commands. The commands are as follows:
Control command- A control command is issued to activate the peripheral and to inform it
what to do.
Status command- A status command is used to test various status conditions in the
interface and the peripheral.
Data Output command- A data output command causes the interface to respond by
transferring data from the bus into one of its registers.
Data Input command- The data input command is the opposite of the data output command.
In this case the interface receives an item of data from the peripheral and places it in its
buffer register.
I/O Versus Memory Bus
• In addition to communicating with I/O, the processor must communicate with the memory unit.
• Like the I/O bus, the memory bus contains data, address and read/write control lines.
• There are 3 ways that computer buses can be used to communicate with memory and
I/O:
i. Use two separate buses, one for memory and the other for I/O.
ii. Use one common bus for both memory and I/O but separate control lines for each.
iii. Use one common bus for memory and I/O with common control lines.
I/O Processor
• In the first method, the computer has independent sets of data, address and control
buses, one for accessing memory and the other for I/O. This is done in computers that
provide a separate I/O processor (IOP). The purpose of the IOP is to provide an
independent pathway for the transfer of information between external devices and
internal memory.
2. ASYNCHRONOUS DATA TRANSFER :
• This scheme is used when the speed of the I/O devices does not match that of the
microprocessor, and the timing characteristics of the I/O devices are not predictable.
• In this method, the processor initiates the device and checks its status. As a result, the
CPU has to wait till the I/O device is ready to transfer data.
• When the device is ready, the CPU issues an instruction for the I/O transfer. In this
method two types of techniques are used, based on the signals exchanged before data transfer.
i. Strobe Control
ii. Handshaking
STROBE CONTROL :
The strobe control method of Asynchronous data transfer employs a single control line to
time each transfer. The strobe may be activated by either the source or the destination unit.
Source-Initiated Data Transfer:
• In the block diagram fig. (a), the data bus carries the binary information from source to
destination unit. Typically, the bus has multiple lines to transfer an entire byte or word.
The strobe is a single line that informs the destination unit when a valid data word is
available.
• In the timing diagram fig. (b), the source unit first places the data on the data bus. The
information on the data bus and the strobe signal remain in the active state long enough
to allow the destination unit to receive the data.
Destination-Initiated Data Transfer:
• In this method, the destination unit activates the strobe pulse, informing the source to
provide the data. The source responds by placing the requested binary information on
the data bus.
• The data must be valid and remain on the bus long enough for the destination unit to accept
it. The destination unit then disables the strobe and the source unit removes the data
from the bus.
HANDSHAKING:
The handshaking method solves the problem of the strobe method by introducing a second
control signal that provides a reply to the unit that initiates the transfer.
Principle of Handshaking:
The basic principle of the two-wire handshaking method of data transfer is as follows:
one control line is in the same direction as the data flow on the bus, from the source to the
destination. It is used by the source unit to inform the destination unit whether there is valid
data on the bus. The other control line is in the opposite direction, from the destination to the
source. It is used by the destination unit to inform the source whether it can accept the data.
The sequence of control during the transfer depends on the unit that initiates the transfer.
Source Initiated Transfer using Handshaking:
The sequence of events shows four possible states that the system can be in at any given time.
The source unit initiates the transfer by placing the data on the bus and enabling its data valid
signal. The data accepted signal is activated by the destination unit after it accepts the data
from the bus. The source unit then disables its data valid signal, the destination unit disables
its data accepted signal, and the system returns to its initial state.
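The four-state sequence can be modelled as a small simulation. The `bus` dictionary and the log of states are invented for illustration; real hardware implements this with two control wires:

```python
def source_initiated_transfer(word, bus, log):
    """Simulate one source-initiated handshake over a shared bus."""
    bus["data"], bus["data_valid"] = word, 1  # source: place data, enable data valid
    log.append("data valid")
    received = bus["data"]                    # destination: take data from the bus
    bus["data_accepted"] = 1                  # destination: assert data accepted
    log.append("data accepted")
    bus["data_valid"] = 0                     # source: invalidate data on the bus
    bus["data_accepted"] = 0                  # destination: back to initial state
    log.append("initial state")
    return received
```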
Destination Initiated Transfer Using Handshaking:
The name of the signal generated by the destination unit has been changed to ready for data
to reflect its new meaning. The source unit in this case does not place data on the bus until
after it receives the ready for data signal from the destination unit. From there on, the
handshaking procedure follows the same pattern as in the source-initiated case.
The only difference between the source-initiated and the destination-initiated transfer is in
their choice of initial state.
Advantage of the Handshaking method:
The handshaking scheme provides a high degree of flexibility and reliability because the
successful completion of a data transfer relies on active participation by both units.
If one of the units is faulty, the data transfer will not be completed. Such an error can
be detected by means of a timeout mechanism, which provides an alarm if the transfer is
not completed within a given time.
i. Start Bit- The first bit, called the start bit, is always zero and is used to indicate the
beginning of a character.
ii. Stop Bit- The last bit, called the stop bit, is always one and is used to indicate the end
of the character. The stop bit is always in the 1-state and frames the end of the character
to signify the idle or wait state.
iii. Character Bits- The bits between the start bit and the stop bit are known as character
bits. The character bits always follow the start bit.
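The framing described above can be sketched in a few lines; the function names are illustrative:

```python
def frame_character(char_bits):
    """Frame character bits with a start bit (0) and a stop bit (1)."""
    assert all(b in (0, 1) for b in char_bits)
    return [0] + list(char_bits) + [1]

def deframe(frame):
    """Recover the character bits, checking the start and stop bits."""
    assert frame[0] == 0 and frame[-1] == 1, "bad framing"
    return frame[1:-1]
```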
Asynchronous serial transmission is handled in two ways:
a) Asynchronous Communication Interface
• The transmitter register accepts a data byte from the CPU through the data bus; the byte
is then transferred to a shift register for serial transmission.
• The receiver portion receives information into another shift register, and when a
complete data byte has been received it is transferred to the receiver register.
• The CPU can select the receiver register and read the byte through the data bus. Bits in
the status register are used for input and output flags.
b) First In First Out (FIFO) Buffer
• A First In First Out (FIFO) buffer is a memory unit that stores information in such a
manner that the item first in is the item first out. A FIFO buffer comes with separate
input and output terminals. The important feature of this buffer is that it can input data
and output data at two different rates.
• When placed between two units, the FIFO can accept data from the source unit at one
rate of transfer and deliver the data to the destination unit at another rate.
• If the source is faster than the destination, the FIFO is useful when source data arrives
in bursts that fill the buffer. The FIFO is thus useful in applications where data are
transferred asynchronously.
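A minimal FIFO sketch, using Python's `collections.deque` as the storage; the capacity handling is illustrative:

```python
from collections import deque

class FIFOBuffer:
    """FIFO that accepts input at one rate and delivers output at another."""
    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity
    def can_accept(self):                 # source checks this before writing
        return len(self.buf) < self.capacity
    def put(self, item):                  # source side (input terminal)
        if not self.can_accept():
            raise OverflowError("FIFO full")
        self.buf.append(item)
    def get(self):                        # destination side (output terminal)
        if not self.buf:
            raise IndexError("FIFO empty")
        return self.buf.popleft()         # first item in is first item out
```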
Transfer of data is required between the CPU and peripherals or memory, or sometimes between
any two devices or units of a computer system. To transfer data from one unit to another,
one should be sure that both units are properly connected and that at the time of transfer the
receiving unit is not busy. Such data transfer within the computer is an internal operation.
All the internal operations in a digital system are synchronized by means of clock pulses
supplied by a common clock pulse Generator. The data transfer can be
i. Synchronous or
ii. Asynchronous
• When both the transmitting and receiving units use the same clock pulse, the data
transfer is called synchronous.
• On the other hand, if there is no common clock and the sender operates at a
different moment than the receiver, the data transfer is called asynchronous.
• Data transfer can be handled in various modes. Some of the modes use the CPU as an
intermediate path; others transfer the data directly to and from the memory unit. Transfers
can be handled in the following 3 ways:
i. Programmed I/O
ii. Interrupt-Initiated I/O
iii. Direct Memory Access (DMA)
Programmed I/O :
• In this mode of data transfer the operations are the result of I/O instructions that are
a part of the computer program. Each data transfer is initiated by an instruction in the
program. Normally the transfer is from a CPU register to a peripheral device or vice-
versa.
• Once a transfer is initiated, the CPU starts monitoring the interface to see when the next
transfer can be made. The instructions of the program keep close tabs on everything that
takes place in the interface unit and the I/O devices.
⚫ The transfer of each data item requires three instructions: read the status register;
check the status of the flag bit and branch back if the device is not ready; read the
data register.
In this technique the CPU is responsible for extracting data from memory for output and
storing data in memory for input, as shown in the Programmed I/O flowchart:
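The three-instruction polling loop can be sketched as follows, modelling the interface as a dictionary; the flag bit position and register names are assumptions made for illustration:

```python
def programmed_io_read(interface):
    """Poll the flag, then read the data register (three-step loop)."""
    while True:
        status = interface["status"]      # 1. read the status register
        if status & 0x01:                 # 2. check the flag bit
            break                         #    proceed only when device is ready
    word = interface["data"]              # 3. read the data register
    interface["status"] &= ~0x01          # clear the flag for the next transfer
    return word
```

Note how the CPU stays in the `while` loop until the flag is set, which is exactly the time-wasting behaviour the drawback paragraph below describes.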
• The main drawback of programmed I/O is that the CPU has to monitor the units at all
times while the program is executing. Thus the CPU stays in a program loop until the
I/O unit indicates that it is ready for data transfer. This is a time-consuming process
and a great deal of CPU time is wasted in monitoring the I/O.
• To remove this problem an Interrupt facility and special commands are used.
Interrupt-Initiated I/O :
⚫ In this method an interrupt facility and an interrupt command are used to inform the device
about the start and end of the transfer. In the meantime the CPU executes other programs.
When the interface determines that the device is ready for data transfer, it generates
an interrupt request and sends it to the computer.
⚫ When the CPU receives such a signal, it temporarily stops the execution of the
program, branches to a service program to process the I/O transfer, and after
completing it returns to the task it was originally performing.
⚫ In this type of I/O, the computer does not check the flag; it continues to perform its task.
⚫ Whenever a device wants attention, it sends an interrupt signal to the CPU.
⚫ The CPU then deviates from what it was doing, stores the return address from the PC,
and branches to the address of the service routine.
There are two types of interrupts:
⚫ Vectored Interrupt
⚫ Non-vectored Interrupt
⚫ In a vectored interrupt, the source that interrupts the CPU provides the branch
information. This information is called the interrupt vector.
⚫ In a non-vectored interrupt, the branch address is assigned to a fixed location in
memory.
4. PRIORITY INTERRUPT:
⚫ When interrupts are generated by more than one device, a priority interrupt system
is used to determine which device is to be serviced first.
⚫ Devices with high-speed transfer are given higher priority and slow devices are given
lower priority.
Priority can be established in two ways:
⚫ Using Software
⚫ Using Hardware
Polling Procedure :
⚫ The branch address contains a program that polls the interrupt sources in sequence. The
highest-priority source is tested first.
⚫ The disadvantage is that, with a large number of I/O devices, the time required to poll
them can exceed the time available to service them.
Using Hardware:
⚫ To speed up the operation, each interrupting device has its own interrupt vector.
⚫ A device that wants attention sends an interrupt request to the CPU.
⚫ The CPU then sends an INTACK signal, which is applied to the PI (priority in) input of
the first device.
⚫ If that device had requested attention, it places its VAD (vector address) on the bus and
blocks the signal by placing 0 on its PO (priority out) output.
⚫ If not, it passes the signal to the next device through PO by placing 1 on it.
⚫ The device whose PI is 1 and PO is 0 is the device that sent the interrupt request.
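The daisy-chain logic can be modelled as a short function; the device ordering (device 0 closest to the CPU) and the VAD values are illustrative:

```python
def daisy_chain(requests, vads):
    """Return the VAD of the highest-priority requesting device.
    requests[i] is True if device i raised an interrupt; device 0 is
    nearest the CPU and therefore has the highest priority."""
    pi = 1                               # INTACK enters PI of the first device
    for req, vad in zip(requests, vads):
        if pi and req:
            return vad                   # PI=1, PO=0: this device answers
        # a non-requesting device passes PI along to the next PO
    return None                          # no device requested service
```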
Parallel Priority Interrupt :
⚫ It consists of an interrupt register whose bits are set separately by the interrupting
devices.
⚫ Each interrupt bit and its corresponding mask bit are ANDed and applied to a priority encoder.
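A sketch of the mask-and-encode step; taking bit 0 as the highest priority is an assumption made here, not stated in the text:

```python
def priority_encoder(interrupt_reg, mask_reg):
    """AND interrupt and mask bits, then encode the highest-priority set bit.
    Bit 0 is treated as the highest priority. Returns (valid, line_number)."""
    masked = [i & m for i, m in zip(interrupt_reg, mask_reg)]
    for line, bit in enumerate(masked):
        if bit:
            return True, line            # first (highest-priority) active line
    return False, None                   # no unmasked interrupt pending
```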
The Execution process of Interrupt–Initiated I/O is represented in the flowchart:
5.Direct Memory Access (DMA):
• In Direct Memory Access (DMA) the interface transfers data into and out of the
memory unit through the memory bus. The transfer of data between a fast storage
device such as a magnetic disk and memory is often limited by the speed of the CPU.
Removing the CPU from the path and letting the peripheral device manage the
memory buses directly improves the speed of transfer. This transfer technique is
called Direct Memory Access (DMA).
• During a DMA transfer, the CPU is idle and has no control of the memory buses. A
DMA controller takes over the buses to manage the transfer directly between the I/O
device and memory.
• The CPU may be placed in an idle state in a variety of ways. One common method
extensively used in microprocessors is to disable the buses through two special control
signals: Bus Request (BR) and Bus Grant (BG).
These two control signals in the CPU facilitate the DMA transfer. The Bus Request
(BR) input is used by the DMA controller to request that the CPU relinquish control of the
buses. When this input is active, the CPU terminates the execution of the current instruction
and places the address bus, data bus and read/write lines into a high-impedance state.
High-impedance state means that the output is disconnected.
• The CPU activates the Bus Grant (BG) output to inform the external DMA that it can
now take control of the buses to conduct memory transfers without processor
intervention.
• When the DMA terminates the transfer, it disables the Bus Request (BR) line. The
CPU then disables the Bus Grant (BG) output, takes control of the buses and returns to its
normal operation.
• Cycle Stealing :- Cycle stealing allows the DMA controller to transfer one data word
at a time, after which it must return control of the buses to the CPU.
DMA Controller:
• The DMA controller needs the usual circuits of an interface to communicate with the
CPU and I/O device. The DMA controller has three registers:
i. Address Register
ii. Word Count Register
iii. Control Register
i. Address Register :- holds the address of the memory location to be accessed; it is
incremented after each word that is transferred.
ii. Word Count Register :- WC holds the number of words to be transferred. The
register is decremented by one after each word transfer and internally tested for
zero.
iii. Control Register :- specifies the mode of transfer.
• The unit communicates with the CPU via the data bus and control lines. The
registers in the DMA are selected by the CPU through the address bus by enabling
the DS (DMA select) and RS (Register select) inputs. The RD (read) and WR (write)
inputs are bidirectional.
• When the BG (Bus Grant) input is 0, the CPU can communicate with the DMA
registers through the data bus to read from or write to the DMA registers. When BG
=1, the DMA can communicate directly with the memory by specifying an address
in the address bus and activating the RD or WR control.
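The behaviour of the address and word-count registers during a transfer can be sketched as follows. Memory is modelled as a dictionary for illustration; a real controller would release the bus between words when cycle stealing:

```python
def dma_transfer(memory, start_addr, words, device_words):
    """Sketch of a DMA burst from a device into memory."""
    ar, wc = start_addr, words           # address and word-count registers
    for w in device_words[:words]:
        memory[ar] = w                   # one word moved per stolen cycle
        ar += 1                          # address register incremented
        wc -= 1                          # word count decremented,
        if wc == 0:                      # internally tested for zero
            break                        # transfer terminates
    return ar, wc
```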
DMA Transfer:
• The CPU communicates with the DMA through the address and data buses as with any
interface unit. The DMA has its own address, which activates the DS and RS lines. The
CPU initializes the DMA through the data bus. Once the DMA receives the start control
command, it can begin the transfer between the peripheral and the memory.
• When BG = 0, the RD and WR lines are inputs allowing the CPU to communicate with
the internal DMA registers. When BG = 1, the RD and WR lines are outputs from the
DMA controller to the random-access memory, specifying the read or write operation
on the data.
UNIT – 5
Memory Organization
Volatile Memory: This loses its data, when power is switched off.
Non-Volatile Memory: This is a permanent storage and does not lose any
data when power is switched off.
Memory Hierarchy
Auxiliary memory consists of devices that provide backup storage. Its access time
is generally 1000 times that of main memory; hence it is at the bottom
of the hierarchy.
When a program not residing in main memory is needed by the CPU, it is
brought in from auxiliary memory. Programs not currently needed in
main memory are transferred to auxiliary memory to provide space in
main memory for the programs that are currently in use.
Cache memory is a very-high-speed memory used to supply instructions and data to the
CPU at a rapid rate. The approximate access-time ratio between cache memory
and main memory is about 1 to 7~10.
Main Memory
The memory unit that communicates directly with the CPU, auxiliary
memory and cache memory is called main memory. It is the central storage
unit of the computer system. It is a relatively large and fast memory used to store programs
and data during computer operations. Main memory is made up of RAM and ROM.
RAM Chip
A RAM chip has one or more control inputs (CS1, CS2) that select the
chip only when needed.
A bidirectional data bus allows data transfer from memory to CPU during a
read operation and from CPU to memory during a write operation. It has three-
state buffers with the states 0, 1 and high impedance.
It uses a 7-bit address bus (AD7). The read (RD) and write (WR) inputs
and the chip-select inputs decide the operation. When the chip is selected,
either the RD or the WR input is active and the corresponding operation is carried out.
The block diagram and function table of the RAM chip are shown below.
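The function table's behaviour can be sketched in Python. A 128 x 8 chip with an active-high CS1 and an active-low CS2 is assumed here, following the usual textbook configuration; the text itself does not fix the polarities:

```python
def ram_chip(cs1, cs2_bar, rd, wr, addr, data_in, ram):
    """Function-table behaviour of a 128 x 8 RAM chip.
    Returns data_out, or None when the data bus is in high impedance."""
    if not (cs1 == 1 and cs2_bar == 0):
        return None                      # chip not selected: bus in high-Z
    if wr:                               # write: take data from the bus
        ram[addr] = data_in & 0xFF
        return None
    if rd:                               # read: drive the data bus
        return ram.get(addr, 0)
    return None                          # selected but no operation
```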
ROM Chip
Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For
example: Magnetic disks and tapes are commonly used auxiliary devices.
Other devices used as auxiliary memory are magnetic drums, magnetic
bubble memory and optical disks. It is not directly accessible to the CPU,
and is accessed using the Input/Output channels.
Magnetic Disks:
number. When the track is reached, the disk rotates to the specified sector and
data transfer starts.
Tracks near the circumference are longer than those near the center of the disk. To
ensure an equal number of bits in all tracks, the data is stored with variable density.
Disks that are permanently attached to the unit assembly and are not
removed often are called hard disks.
A removable disk is called a floppy disk.
Magnetic Tape
Associative Memory
The argument register contains the value to search for, and therefore is n
bits wide to match the size of a word in the memory.
The key register holds a mask that allows searching based on part of the
argument. If a bit in the key register is 1, then the corresponding bit in
the argument and each memory word must be the same to be considered
a match. If a bit in the key register is 0, then the corresponding bit is
considered a match whether or not the argument and memory word are
equal for that bit. This allows searches for words where any subset of the
bits match the argument register.
The match register, M, is m bits wide (one bit per word, so it can be large), and will
contain a 1 for each word that matches the masked argument and a 0 for each
word that does not.
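The masked comparison can be sketched with integers standing in for the registers; the word width and values are illustrative:

```python
def associative_search(argument, key, words):
    """Return the match register: bit i is 1 when word i agrees with the
    argument in every bit position where the key register holds a 1."""
    match = []
    for w in words:
        # 0-bits in the key are "don't care"; only key=1 positions compare
        match.append(1 if (w & key) == (argument & key) else 0)
    return match
```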
Cache Memory
Cache memory is a small, high-speed RAM buffer located between the CPU
and main memory.
Cache memory holds a copy of the instructions (instruction cache) or data
(operand or data cache) currently being used by the CPU.
The contents of main memory that are frequently used by the CPU are
stored in the cache memory so that they can be accessed faster.
Whenever the CPU needs to access memory, it first checks the cache
memory. If the data is not found in the cache, the CPU moves
on to main memory. It also transfers a block of recent data into the cache
and keeps deleting old data in the cache to accommodate the new data.
If the processor finds that the memory location is in the cache, a cache
hit has occurred and the data is read from the cache.
If the processor does not find the memory location in the cache, a cache
miss has occurred. For a cache miss, the cache allocates a new entry and
copies in data from main memory, then the request is fulfilled from the
contents of the cache.
The performance of cache memory is frequently measured in terms of a
quantity called Hit ratio.
Hit ratio = hit / (hit + miss) = no. of hits/total accesses
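The formula can be demonstrated with a toy fully-associative cache; the LRU replacement policy and the access trace are invented for illustration:

```python
def simulate_cache(accesses, cache_size):
    """Count hits and misses for a tiny fully-associative LRU cache."""
    cache, hits, misses = [], 0, 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.remove(addr)
            cache.append(addr)           # refresh LRU position
        else:
            misses += 1
            if len(cache) == cache_size:
                cache.pop(0)             # evict the least recently used entry
            cache.append(addr)
    return hits, misses, hits / (hits + misses)   # hit ratio
```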
The transformation of data from main memory to cache memory is referred to as
the mapping process. The types of mapping procedures are:
Direct mapping
Associative mapping,
Set-Associative mapping.
Set-associative mapping: This form of mapping is an enhanced form of
direct mapping in which the drawbacks of direct mapping are removed. Set-
associative mapping addresses the problem of possible thrashing in the direct
mapping method. Instead of having exactly one line that a block can map to in
the cache, a few lines are grouped together to create a set. A block in memory
can then map to any one of the lines of a specific set. Set-associative mapping
thus allows each index address in the cache to hold two or more words from
main memory. Set-associative cache mapping combines the best of the direct
and associative cache mapping techniques.
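The set-selection rule can be sketched as follows; the modulo indexing and the set layout follow the usual textbook convention and are assumed here:

```python
def set_index(block_addr, num_sets):
    """A memory block may go in any line of set (block address mod sets)."""
    return block_addr % num_sets

def lookup(cache_sets, block_addr, num_sets):
    """Check every line of the selected set for the block."""
    s = set_index(block_addr, num_sets)
    return block_addr in cache_sets[s]   # any line of the set may hold it
```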
If the word is updated only in the cache and the main memory copy is updated later, when
the word is removed from the cache, then it is a write-back method. This method is used
when the data is updated several times while it is in the cache.
Virtual Memory